U.S. patent application number 17/691815 was filed with the patent office on 2022-03-10 and published on 2022-08-25 for non-invasive determination of tissue source of cell-free DNA.
The applicant listed for this patent is The Chinese University of Hong Kong. Invention is credited to Kwan Chee Chan, Wai Man Chan, Rossa Wai Kwun Chiu, Peiyong Jiang, Yuk-Ming Dennis Lo, Miu Fan Lun.
Publication Number | 20220267861 |
Application Number | 17/691815 |
Publication Date | 2022-08-25 |
Kind Code | A1 |
United States Patent Application 20220267861
Lo; Yuk-Ming Dennis; et al.
August 25, 2022
NON-INVASIVE DETERMINATION OF TISSUE SOURCE OF CELL-FREE DNA
Abstract
Systems, methods, and apparatuses can determine and use
methylation profiles of various tissues and samples. Examples are
provided. A methylation profile can be deduced for fetal/tumor
tissue based on a comparison of the methylation of plasma (or
another sample containing cell-free DNA) to a methylation profile
of the mother/patient. A methylation profile can be determined for
fetal/tumor tissue by using tissue-specific alleles to identify DNA
from the fetus/tumor when the sample contains a mixture of DNA. A
methylation profile can be used to determine copy number variations
in the genome of a fetus/tumor. Methylation markers for a fetus
have been identified via various techniques. The methylation
profile can be determined by determining a size parameter of a size
distribution of DNA fragments, where reference values for the size
parameter can be used to determine methylation levels. Additionally,
a methylation level can be used to determine a level of cancer.
Inventors: Lo; Yuk-Ming Dennis (Homantin, CN); Chiu; Rossa Wai Kwun
(Shatin, CN); Chan; Kwan Chee (Mei Foo Sun Chuen, CN); Lun; Miu Fan
(Kwai Chung, CN); Chan; Wai Man (Shaukeiwan, CN); Jiang; Peiyong
(Pak Shek Kok, CN)
Applicant: The Chinese University of Hong Kong (Shatin, HK)
Appl. No.: 17/691815
Filed: March 10, 2022
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number | Child Application |
15647824 | Jul 12, 2017 | 11274347 | 17691815 |
13842209 | Mar 15, 2013 | 9732390 | 15647824 |
61703512 | Sep 20, 2012 | | |
International Class: C12Q 1/6886 (2006.01); C12Q 1/6881 (2006.01);
G16B 30/00 (2006.01); C12Q 1/6809 (2006.01); C12Q 1/6874 (2006.01);
G16B 30/10 (2006.01)
Claims
1. A method of analyzing a biological sample from a subject, the
method comprising: (a) obtaining sequence reads for cell-free DNA
molecules from the biological sample of the subject, wherein the
sequence reads include methylation statuses for the cell-free DNA
molecules at single nucleotide resolution, and wherein the sequence
reads comprise at least 100,000 sequence reads; (b) analyzing,
including aligning to a reference genome, the at least 100,000
sequence reads to determine a methylation profile for a plurality
of sites based on the methylation statuses for the plurality of
sites; and (c) determining a tissue source for at least a portion
of the cell-free DNA molecules from the biological sample based, at
least in part, on the methylation profile.
2. The method of claim 1, wherein determining the tissue source
comprises: comparing the methylation profile to one or more
reference methylation profiles.
3. The method of claim 2, wherein at least one of the one or more
reference methylation profiles is determined from methylation
statuses of one or more reference samples obtained from another
subject known to have cancer.
4. The method of claim 3, wherein at least another one of the one
or more reference methylation profiles is obtained from methylation
statuses of at least one other sample obtained from a healthy
subject.
5. The method of claim 2, wherein the comparison of the methylation
profile to the one or more reference methylation profiles detects
changes in methylation status of CpG islands.
6. The method of claim 1, wherein the methylation profile comprises
a pattern of the cell-free DNA molecules that are methylated at the
plurality of sites, wherein the plurality of sites includes at
least 20,000 sites.
7. The method of claim 1, wherein at least a portion of the
cell-free DNA molecules are cancer-derived molecules, the method
further comprising determining a type of cancer of the subject
based, at least in part, on the methylation profile.
8. The method of claim 7, wherein the type of cancer is selected
from the group consisting of lung cancer, breast cancer, colorectal
cancer, prostate cancer, nasopharyngeal cancer, gastric cancer,
testicular cancer, skin cancer, cancer affecting the nervous
system, bone cancer, ovarian cancer, liver cancer, hematologic
malignancies, pancreatic cancer, endometriocarcinoma, and kidney
cancer.
9. The method of claim 1, further comprising sequencing the
cell-free DNA molecules to obtain the sequence reads.
10. The method of claim 9, wherein the sequencing comprises
methylation-aware sequencing.
11. The method of claim 10, wherein the methylation-aware
sequencing comprises bisulfite sequencing.
12. The method of claim 10, further comprising enriching the
cell-free DNA molecules before the sequencing, and wherein the
enriching comprises use of hybridization probes, polymerase chain
reaction amplification, or solid phase hybridization.
13. The method of claim 1, wherein the plurality of sites comprise
one or more CpG sites.
14. The method of claim 13, wherein the one or more CpG sites
comprise a plurality of CpG sites that are organized into one or
more CpG islands.
15. The method of claim 14, wherein the determining the methylation
profile for the plurality of sites comprises, for each CpG island
of a plurality of CpG islands, determining a number of sequence
reads showing methylation at the CpG sites in the CpG island.
16. The method of claim 1, wherein the determining the methylation
profile for the plurality of sites comprises, for each site of the
plurality of sites, determining a total number of sequence reads at
the plurality of sites.
17. The method of claim 1, wherein the analyzing further comprises
determining locations of the cell-free DNA molecules in a
genome.
18. The method of claim 1, wherein the determining the methylation
profile comprises determining a number of sequence reads of the
cell-free DNA molecules showing methylation at sites in a genomic
region.
19. The method of claim 1, wherein the biological sample is
selected from a group consisting of blood, plasma, serum, urine,
vaginal fluid, uterine or vaginal flushing fluids, pleural fluid,
ascitic fluid, cerebrospinal fluid, saliva, sweat, tears, sputum,
bronchoalveolar lavage fluid, and stool.
20. A non-transitory computer-readable medium comprising
instructions that, upon execution by one or more computer
processors of a computer system, cause the computer system to
perform a method, the method comprising: (a) obtaining sequence
reads for cell-free DNA molecules from a biological sample of a
subject, wherein the sequence reads include methylation statuses
for the cell-free DNA molecules at single nucleotide resolution,
and wherein the sequence reads comprise at least 100,000 sequence
reads; (b) analyzing, including aligning to a reference genome, the
at least 100,000 sequence reads to determine a methylation profile
for a plurality of sites based on the methylation statuses for the
plurality of sites; and (c) determining a tissue source for at
least a portion of the cell-free DNA molecules from the biological
sample based, at least in part, on the methylation profile.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application is a divisional of U.S. patent application
Ser. No. 15/647,824, entitled "NON-INVASIVE DETERMINATION OF
METHYLOME OF FETUS OR TUMOR FROM PLASMA," filed on Jul. 12, 2017,
which is a continuation of U.S. patent application Ser. No.
13/842,209, entitled "Non-Invasive Determination Of Methylome Of
Fetus Or Tumor From Plasma," filed on Mar. 15, 2013, now U.S. Pat.
No. 9,732,390, which is a non-provisional of and claims the benefit
of U.S. Provisional Patent Application No. 61/703,512, entitled
"Method Of Determining The Whole Genome DNA Methylation Status Of
The Placenta By Massively Parallel Sequencing Of Maternal Plasma,"
filed on Sep. 20, 2012, each of which is herein incorporated by
reference in its entirety for all purposes.
FIELD
[0002] The present disclosure relates generally to a determination of
a methylation pattern (methylome) of DNA, and more particularly to
analyzing a biological sample (e.g., plasma) that includes a
mixture of DNA from different genomes (e.g., from fetus and mother,
or from tumor and normal cells) to determine the methylation
pattern (methylome) of the minority genome. Uses of the determined
methylome are also described.
BACKGROUND
[0003] Embryonic and fetal development is a complex process and
involves a series of highly orchestrated genetic and epigenetic
events. Cancer development is also a complex process, typically
involving multiple genetic and epigenetic steps. Abnormalities in
the epigenetic control of developmental processes are implicated in
infertility, spontaneous abortion, intrauterine growth
abnormalities and postnatal consequences. DNA methylation is one of
the most frequently studied epigenetic mechanisms. It mostly
involves the addition of a methyl group to the 5' carbon of cytosine
residues within CpG dinucleotides.
Cytosine methylation adds a layer of control to gene transcription
and DNA function. For example, hypermethylation of gene promoters
enriched with CpG dinucleotides, termed CpG islands, is typically
associated with repression of gene function.
[0004] Despite the important role of epigenetic mechanisms in
mediating developmental processes, human embryonic and fetal
tissues are not readily accessible for analysis (tumors may
similarly not be accessible). Studies of the dynamic changes of
such epigenetic processes in health and disease during the prenatal
period in humans are virtually impossible. Extraembryonic tissues,
particularly the placenta, which can be obtained as part of
prenatal diagnostic procedures or after birth, have provided one of
the main avenues for such investigations. However, obtaining such
tissues requires invasive procedures.
[0005] The DNA methylation profile of the human placenta has
intrigued researchers for decades. The human placenta exhibits a
plethora of peculiar physiological features involving DNA
methylation. On a global level, placental tissues are
hypomethylated when compared with most somatic tissues. At the gene
level, the methylation status of selected genomic loci is a
specific signature of placental tissues. Both the global and
locus-specific methylation profiles show gestational-age dependent
changes. Imprinted genes, namely genes for which expression is
dependent on the parental origin of alleles, serve key functions in
the placenta. The placenta has been described as pseudomalignant,
and hypermethylation of several tumor suppressor genes has been
observed.
[0006] Studies of the DNA methylation profile of placental tissues
have provided insights into the pathophysiology of
pregnancy-associated or developmentally-related diseases, such as
preeclampsia and intrauterine growth restriction. Disorders in
genomic imprinting are associated with developmental disorders,
such as Prader-Willi syndrome and Angelman syndrome. Altered
profiles of genomic imprinting and global DNA methylation in
placental and fetal tissues have been observed in pregnancies
resulting from assisted reproductive techniques (H. Hiura et al.
2012 Hum Reprod; 27: 2541-2548). A number of environmental factors
such as maternal smoking (K. E. Haworth et al. 2013 Epigenomics; 5:
37-49), maternal dietary factors (X. Jiang et al. 2012 FASEB J; 26:
3563-3574) and maternal metabolic status such as diabetes (N. Hajj
et al., Diabetes. doi: 10.2337/db12-0289) have been associated with
epigenetic aberrations in the offspring.
[0007] Despite decades of efforts, there had not been any practical
means available to study the fetal or tumor methylome and to
monitor the dynamic changes throughout pregnancy or during disease
processes, such as malignancies. Therefore, it is desirable to
provide methods for analyzing all or portions of a fetal methylome
and a tumor methylome noninvasively.
SUMMARY
[0008] Embodiments provide systems, methods, and apparatuses for
determining and using methylation profiles of various tissues and
samples. Examples are provided. A methylation profile can be
deduced for fetal/tumor tissue based on a comparison of the
methylation of plasma (or another sample containing cell-free DNA)
to a methylation profile of the mother/patient. A methylation
profile can be determined for fetal/tumor tissue by using
tissue-specific alleles to identify DNA from the fetus/tumor when
the sample contains a mixture of DNA. A methylation profile can be
used to determine copy number variations in the genome of a
fetus/tumor. Methylation markers for a
fetus have been identified via various techniques. The methylation
profile can be determined by determining a size parameter of a size
distribution of DNA fragments, where reference values for the size
parameter can be used to determine methylation levels.
[0009] Additionally, a methylation level can be used to determine a
level of cancer. In the context of cancer, measuring methylomic
changes in plasma can be used for detection (e.g., for screening
purposes), for monitoring (e.g., to detect a response to
anti-cancer treatment or to detect cancer relapse), and for
prognostication (e.g., to measure the load of cancer cells in the
body, for staging, or to assess the chance of death from the
disease or of disease progression).
[0010] A better understanding of the nature and advantages of
embodiments of the present invention may be gained with reference
to the following detailed description and the accompanying
drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
[0011] FIG. 1A shows a table 100 of sequencing results for maternal
blood, placenta, and maternal plasma according to embodiments of
the present invention.
[0012] FIG. 1B shows methylation density in 1-Mb windows of
sequenced samples according to embodiments of the present
invention.
[0013] FIGS. 2A-2C show plots of the beta-values against the
methylation indices: (A) Maternal blood cells, (B) Chorionic villus
sample, (C) Term placental tissue.
[0014] FIGS. 3A and 3B show bar charts of percentage of methylated
CpG sites in plasma and blood cells collected from an adult male
and a non-pregnant adult female: (A) Autosomes, (B) Chromosome
X.
[0015] FIGS. 4A and 4B show plots of methylation densities of
corresponding loci in blood cell DNA and plasma DNA: (A)
Non-pregnant adult female, (B) Adult male.
[0016] FIGS. 5A and 5B show bar charts of percentage of methylated
CpG sites among samples collected from the pregnancy: (A)
Autosomes, (B) Chromosome X.
[0017] FIG. 6 shows a bar chart of methylation level of different
repeat classes of the human genome for maternal blood, placenta and
maternal plasma.
[0018] FIG. 7A shows a Circos plot 700 for first trimester samples.
FIG. 7B shows a Circos plot 750 for third trimester samples.
[0019] FIGS. 8A-8D show plots of comparisons of the methylation
densities of genomic tissue DNA against maternal plasma DNA for CpG
sites surrounding the informative single nucleotide
polymorphisms.
[0020] FIG. 9 is a flowchart illustrating a method 900 for
determining a first methylation profile from a biological sample of
an organism according to embodiments of the present invention.
[0021] FIG. 10 is a flowchart illustrating a method 1000 of
determining a first methylation profile from a biological sample of
an organism according to embodiments of the present invention.
[0022] FIGS. 11A and 11B show graphs of the performance of the
predicting algorithm using maternal plasma data and fractional
fetal DNA concentration according to embodiments of the present
invention.
[0023] FIG. 12A is a table 1200 showing details of 15 selected
genomic loci for methylation prediction according to embodiments of
the present invention. FIG. 12B is a graph 1250 showing the deduced
categories of the 15 selected genomic loci together with their
corresponding methylation levels in the placenta.
[0024] FIG. 13 is a flowchart of a method 1300 for detecting a
fetal chromosomal abnormality from a biological sample of a female
subject pregnant with at least one fetus.
[0025] FIG. 14 is a flowchart of a method 1400 for identifying
methylation markers by comparing a placental methylation profile to
a maternal methylation profile according to embodiments of the
present invention.
[0026] FIG. 15A is a table 1500 showing a performance of first
trimester differentially methylated region (DMR) identification
algorithm using placental methylome with reference to 33 previously
reported first trimester markers. FIG. 15B is a table 1550 showing
a performance of third trimester DMR identification algorithm using
the placental methylome measured using the placenta sample obtained
at delivery.
[0027] FIG. 16 is a table 1600 showing the numbers of loci
predicted to be hypermethylated or hypomethylated based on direct
analysis of the maternal plasma bisulfite-sequencing data.
[0028] FIG. 17A is a plot 1700 showing size distribution of
maternal plasma, non-pregnant female control plasma, placental and
peripheral blood DNA. FIG. 17B is a plot 1750 of size distribution
and methylation profile of maternal plasma, adult female control
plasma, placental tissue and adult female control blood.
[0029] FIGS. 18A and 18B are plots of methylation densities and
size of plasma DNA molecules according to embodiments of the
present invention.
[0030] FIG. 19A shows a plot 1900 of methylation densities and the
sizes of sequenced reads for an adult non-pregnant female. FIG. 19B
is a plot 1950 showing size distribution and methylation profile of
fetal-specific and maternal-specific DNA molecules in maternal
plasma.
[0031] FIG. 20 is a flowchart of a method 2000 for estimating a
methylation level of DNA in a biological sample of an organism
according to embodiments of the present invention.
[0032] FIG. 21A is a table 2100 showing the methylation densities
of the pre-operative plasma and the tissue samples of an HCC
patient. FIG. 21B is a table 2150 showing the number of sequence
reads and the sequencing depth achieved per sample.
[0033] FIG. 22 is a table 2200 showing that the methylation densities
in the autosomes ranged from 71.2% to 72.5% in the plasma samples of
the healthy controls.
[0034] FIGS. 23A and 23B show methylation density of buffy coat,
tumor tissue, non-tumoral liver tissue, the pre-operative plasma
and post-operative plasma of the HCC patient.
[0035] FIG. 24A is a plot 2400 showing the methylation densities of
the pre-operative plasma from the HCC patient. FIG. 24B is a plot
2450 showing the methylation densities of the post-operative plasma
from the HCC patient.
[0036] FIGS. 25A and 25B show z-scores of the plasma DNA
methylation densities for the pre-operative (plot 2500) and
post-operative (plot 2550) plasma samples of the HCC patient using
the plasma methylome data of the four healthy control subjects as
reference for chromosome 1.
[0037] FIG. 26A is a table 2600 showing data for z-scores for
pre-operative and post-operative plasma. FIG. 26B is a Circos plot
2620 showing the z-score of the plasma DNA methylation densities
for the pre-operative and post-operative plasma samples of the HCC
patient using the four healthy control subjects as reference for 1
Mb bins analyzed from all autosomes. FIG. 26C is a table 2640
showing a distribution of the z-scores of the 1 Mb bins for the
whole genome in both the pre-operative and post-operative plasma
samples of the HCC patient. FIG. 26D is a table 2660 showing that the
methylation levels of the tumor tissue and the pre-operative plasma
sample overlapped with those of some of the control plasma samples
when the CHH and CHG contexts were used.
[0038] FIGS. 27A-27H show Circos plots of methylation density for 8
cancer patients according to embodiments of the present invention.
FIG. 27I is a table 2780 showing the number of sequence reads and the
sequencing depth achieved per sample. FIG. 27J is a table 2790
showing a distribution of the z-scores of the 1 Mb bins for the
whole genome in plasma of patients with different malignancies.
CL=adenocarcinoma of lung; NPC=nasopharyngeal carcinoma;
CRC=colorectal carcinoma; NE=neuroendocrine carcinoma; SMS=smooth
muscle sarcoma.
[0039] FIG. 28 is a flowchart of method 2800 of analyzing a
biological sample of an organism to determine a classification of a
level of cancer according to embodiments of the present
invention.
[0040] FIG. 29A is a plot 2900 showing the distribution of the
methylation densities in reference subjects assuming that this
distribution follows a normal distribution. FIG. 29B is a plot 2950
showing the distribution of the methylation densities in cancer
subjects assuming that this distribution follows a normal
distribution and the mean methylation level is 2 standard
deviations below the cutoff.
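Under the normality assumption stated for FIGS. 29A and 29B, the expected detection rate can be worked out directly. The following is an illustrative sketch only: it assumes that the cancer distribution has the same standard deviation as the reference distribution, and the cutoff, mean and SD values in the usage line are hypothetical. With the cancer mean 2 standard deviations below the cutoff, the expected fraction of cancer subjects falling below the cutoff is Φ(2), approximately 97.7%.

```python
from math import erf, sqrt


def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Cumulative distribution function of a normal distribution N(mu, sigma^2)."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))


def fraction_below_cutoff(cutoff: float, mean: float, sd: float) -> float:
    """Expected fraction of subjects whose methylation density falls below the cutoff,
    assuming their densities are normally distributed with the given mean and SD."""
    return normal_cdf(cutoff, mean, sd)


# Hypothetical values: cutoff 0.70, cancer mean 0.68, SD 0.01 (mean is 2 SD below cutoff).
sensitivity = fraction_below_cutoff(cutoff=0.70, mean=0.68, sd=0.01)
print(f"{sensitivity:.4f}")  # 0.9772
```

Only the distance between the mean and the cutoff in units of SD matters here, so any other values 2 SD apart give the same result.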
[0041] FIG. 30 is a plot 3000 showing the distribution of
methylation densities of the plasma DNA of healthy subjects and
cancer patients.
[0042] FIG. 31 is a graph 3100 showing the distribution of the
differences in methylation densities between the mean of the plasma
DNA of healthy subjects and the tumor tissue of the HCC
patient.
[0043] FIG. 32A is a table 3200 showing the effect of reducing the
sequencing depth when the plasma sample contained 5% or 2% tumor
DNA.
[0044] FIG. 32B is a graph 3250 showing the methylation densities
of the repeat elements and non-repeat regions in the plasma of the
four healthy control subjects, the buffy coat, the normal liver
tissue, the tumor tissue, the pre-operative plasma and the
post-operative plasma samples of the HCC patient.
[0045] FIG. 33 shows a block diagram of an example computer system
3300 usable with systems and methods according to embodiments of the
present invention.
DEFINITIONS
[0046] A "methylome" provides a measure of an amount of DNA
methylation at a plurality of sites or loci in a genome. The
methylome may correspond to all of the genome, a substantial part
of the genome, or relatively small portion(s) of the genome. A
"fetal methylome" corresponds to the methylome of a fetus of a
pregnant female. The fetal methylome can be determined using a
variety of fetal tissues or sources of fetal DNA, including
placental tissues and cell-free fetal DNA in maternal plasma. A
"tumor methylome" corresponds to the methylome of a tumor of an
organism (e.g., a human). The tumor methylome can be determined
using tumor tissue or cell-free tumor DNA in plasma. The
fetal methylome and the tumor methylome are examples of a methylome
of interest. Other examples of methylomes of interest are the
methylomes of organs that can contribute DNA into a bodily fluid
(e.g. methylomes of brain cells, bones, the lungs, the heart, the
muscles and the kidneys, etc.). The organs may be transplanted
organs.
[0047] A "plasma methylome" is the methylome determined from the
plasma or serum of an animal (e.g., a human). The plasma methylome
is an example of a cell-free methylome since plasma and serum
include cell-free DNA. The plasma methylome is also an example of a
mixed methylome since it is a mixture of fetal/maternal methylome
or tumor/patient methylome. The "placental methylome" can be
determined from a chorionic villus sample (CVS) or a placental
tissue sample (e.g. obtained following delivery). The "cellular
methylome" corresponds to the methylome determined from cells
(e.g., blood cells) of the patient. The methylome of the blood
cells is called the blood cell methylome (or blood methylome).
[0048] A "site" corresponds to a single base position or a group of
correlated base positions, e.g., a CpG site. A "locus" may
correspond to a region that includes multiple
sites. A locus can include just one site, which would make the
locus equivalent to a site in that context.
[0049] The "methylation index" for each genomic site (e.g., a CpG
site) refers to the proportion of sequence reads showing
methylation at the site over the total number of reads covering
that site. The "methylation density" of a region is the number of
reads at sites within the region showing methylation divided by the
total number of reads covering the sites in the region. The sites
may have specific characteristics, e.g., be CpG sites. Thus, the
"CpG methylation density" of a region is the number of reads
showing CpG methylation divided by the total number of reads
covering CpG sites in the region (e.g., a particular CpG site, CpG
sites within a CpG island, or a larger region). For example, the
methylation density for each 100-kb bin in the human genome can be
determined from the total number of unconverted cytosines (which
correspond to methylated cytosines) at CpG sites as a proportion of
all CpG sites covered by sequence reads mapped to the 100-kb
region. This analysis can also be performed for other bin sizes,
e.g. 50-kb or 1-Mb, etc. A region could be the entire genome or a
chromosome or part of a chromosome (e.g. a chromosomal arm). The
methylation index of a CpG site is the same as the methylation
density for a region when the region only includes that CpG site.
The "proportion of methylated cytosines" refers to the number of
cytosine sites, "C's", that are shown to be methylated (for example
unconverted after bisulfite conversion) over the total number of
analyzed cytosine residues, i.e. including cytosines outside of the
CpG context, in the region. The methylation index, methylation
density and proportion of methylated cytosines are examples of
"methylation levels."
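The definitions in the preceding paragraph reduce to simple ratios of read counts. The following is a minimal sketch, not the implementation used in the embodiments: it assumes a hypothetical input format in which each methylation call is a (chromosome, position, is_methylated) tuple for one CpG site observed on one sequence read. It computes the methylation index of a site, the methylation density of a region, and the density per fixed-size bin (100 kb by default; 50-kb or 1-Mb bins work the same way).

```python
from collections import defaultdict

# Hypothetical input: one (chromosome, position, is_methylated) tuple per CpG site per read.


def methylation_index(calls, chrom: str, pos: int):
    """Proportion of reads covering one CpG site that show methylation there."""
    covering = [m for c, p, m in calls if c == chrom and p == pos]
    return sum(covering) / len(covering) if covering else None


def methylation_density(calls, chrom: str, start: int, end: int):
    """Methylated calls divided by all calls at CpG sites in the half-open region [start, end)."""
    in_region = [m for c, p, m in calls if c == chrom and start <= p < end]
    return sum(in_region) / len(in_region) if in_region else None


def binned_methylation_density(calls, bin_size: int = 100_000) -> dict:
    """Methylation density for every (chromosome, bin index) with at least one call."""
    methylated = defaultdict(int)
    total = defaultdict(int)
    for chrom, pos, is_methylated in calls:
        key = (chrom, pos // bin_size)
        total[key] += 1
        methylated[key] += int(is_methylated)
    return {key: methylated[key] / total[key] for key in total}


calls = [
    ("chr1", 100, True), ("chr1", 100, False), ("chr1", 100, True),  # one CpG, 3 reads
    ("chr1", 250, False),
    ("chr1", 150_000, True),
]
print(methylation_index(calls, "chr1", 100))         # 0.6666666666666666
print(methylation_density(calls, "chr1", 0, 1_000))  # 0.5
print(binned_methylation_density(calls))             # {('chr1', 0): 0.5, ('chr1', 1): 1.0}
```

As the definition states, the methylation index of a CpG site equals the methylation density of a region containing only that site; here, `methylation_density(calls, "chr1", 100, 101)` returns the same value as `methylation_index(calls, "chr1", 100)`.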
[0050] A "methylation profile" (also called methylation status)
includes information related to DNA methylation for a region.
Information related to DNA methylation can include, but is not limited
to, a methylation index of a CpG site, a methylation density of CpG
sites in a region, a distribution of CpG sites over a contiguous
region, a pattern or level of methylation for each individual CpG
site within a region that contains more than one CpG site, and
non-CpG methylation. A methylation profile of a substantial part of
the genome can be considered equivalent to the methylome. "DNA
methylation" in mammalian genomes typically refers to the addition
of a methyl group to the 5' carbon of cytosine residues (i.e.
5-methylcytosines) among CpG dinucleotides. DNA methylation may
occur in cytosines in other contexts, for example CHG and CHH,
where H is adenine, cytosine or thymine. Cytosine methylation may
also be in the form of 5-hydroxymethylcytosine. Non-cytosine
methylation, such as N6-methyladenine, has also been reported.
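To illustrate the CpG, CHG and CHH contexts, and the difference between CpG methylation density and the proportion of methylated cytosines, here is a sketch for a single forward-strand bisulfite read that is already aligned to the reference without gaps. The gap-free alignment, the forward-strand-only handling and the function names are simplifying assumptions made here for illustration. A reference C that is read as C is counted as methylated, and a reference C read as T as unmethylated.

```python
def cytosine_context(ref: str, i: int):
    """Return 'CpG', 'CHG' or 'CHH' for the cytosine at ref[i] (H = A, C or T),
    or None if the context cannot be determined (sequence end or ambiguous base)."""
    if ref[i] != "C":
        raise ValueError(f"position {i} is not a cytosine")
    nxt, nxt2 = ref[i + 1:i + 2], ref[i + 2:i + 3]
    if nxt == "G":
        return "CpG"
    if nxt in ("A", "C", "T"):
        if nxt2 == "G":
            return "CHG"
        if nxt2 in ("A", "C", "T"):
            return "CHH"
    return None


def methylation_by_context(ref: str, read: str) -> dict:
    """Count (methylated, analyzed) cytosines per context for a gap-free aligned read."""
    counts = {}
    for i, base in enumerate(ref):
        if base != "C" or read[i] not in ("C", "T"):
            continue
        ctx = cytosine_context(ref, i)
        if ctx is None:
            continue
        methylated, analyzed = counts.get(ctx, (0, 0))
        counts[ctx] = (methylated + int(read[i] == "C"), analyzed + 1)
    return counts


counts = methylation_by_context("CGACAGCTTA", "CGATAGTTTA")
print(counts)  # {'CpG': (1, 1), 'CHG': (0, 1), 'CHH': (0, 1)}

# CpG methylation density uses only the CpG entry; the proportion of methylated
# cytosines pools all analyzed cytosines regardless of context.
methylated_all = sum(m for m, _ in counts.values())
analyzed_all = sum(n for _, n in counts.values())
print(methylated_all / analyzed_all)
```

In this toy read the CpG cytosine is methylated while the CHG and CHH cytosines are converted, so the CpG methylation density (1 of 1) is higher than the proportion of methylated cytosines (1 of 3).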
[0051] A "tissue" corresponds to any cells. Different types of
tissue may correspond to different types of cells (e.g., liver,
lung, or blood), but also may correspond to tissue from different
organisms (mother vs. fetus) or to healthy cells vs. tumor cells. A
"biological sample" refers to any sample that is taken from a
subject (e.g., a human, such as a pregnant woman, a person with
cancer, a person suspected of having cancer, an organ transplant
recipient, or a subject suspected of having a disease process
involving an organ, such as the heart in myocardial infarction or
the brain in stroke) and that contains one or more nucleic acid
molecule(s) of interest. The biological sample can be a bodily
fluid, such as blood, plasma, serum, urine, vaginal fluid, uterine
or vaginal flushing fluids, pleural fluid, ascitic fluid,
cerebrospinal fluid, saliva, sweat, tears, sputum, bronchoalveolar
lavage fluid, etc. Stool samples can also be used.
[0052] The term "level of cancer" can refer to whether cancer
exists, a stage of a cancer, a size of tumor, whether there is
metastasis, the total tumor burden of the body, and/or other
measure of a severity of a cancer. The level of cancer could be a
number or other characters. The level could be zero. The level of
cancer also includes premalignant or precancerous conditions
(states) associated with mutations or a number of mutations. The
level of cancer can be used in various ways. For example, screening
can check if cancer is present in someone who is not previously
known to have cancer. Assessment can investigate someone who has
been diagnosed with cancer to monitor the progress of cancer, study
the effectiveness of therapies, or determine the prognosis.
Detection can mean "screening" or can mean checking whether someone
with features suggestive of cancer (e.g., symptoms or other positive
tests) has cancer.
DETAILED DESCRIPTION
[0053] Epigenetic mechanisms play an important role in embryonic
and fetal development. However, human embryonic and fetal tissues
(including placental tissues) are not readily accessible (U.S. Pat.
No. 6,927,028). Certain embodiments have addressed this problem by
analyzing a sample that has cell-free fetal DNA molecules present
in maternal circulation. The fetal methylome can be deduced in a
variety of ways. For example, the maternal plasma methylome can be
compared to a cellular methylome (from blood cells of the mother)
and the difference is shown to be correlated to the fetal
methylome. As another example, fetal-specific alleles can be used
to determine the methylation of the fetal methylome at specific
loci. Additionally, the size of a fragment can be used as an
indicator of a methylation percentage, since fragment size and
methylation percentage are shown to be correlated.
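The comparison of the maternal plasma methylome with the maternal blood cell methylome can be illustrated with a simple two-component mixture model. This model is an assumption made here for illustration, not a formula stated in this section: at a given locus, the plasma methylation density is taken to be f·F + (1 − f)·B, where f is the fractional fetal DNA concentration, F is the fetal methylation density, and B is the maternal blood cell methylation density used as a proxy for the maternal contribution. Solving for F gives a deduced fetal density; the function name below is hypothetical.

```python
def deduce_fetal_density(plasma: float, blood: float, fetal_fraction: float) -> float:
    """Deduce a fetal methylation density from plasma and maternal blood cell densities,
    assuming plasma = f * fetal + (1 - f) * blood (a hypothetical two-component mixture)."""
    if not 0.0 < fetal_fraction <= 1.0:
        raise ValueError("fetal_fraction must be in (0, 1]")
    estimate = (plasma - (1.0 - fetal_fraction) * blood) / fetal_fraction
    # Sampling noise can push the estimate outside the valid range, so clip to [0, 1].
    return min(1.0, max(0.0, estimate))


# Hypothetical locus: blood cells 80% methylated, fetal DNA 40% methylated, f = 20%.
plasma = 0.2 * 0.40 + 0.8 * 0.80  # expected plasma density, about 0.72
print(round(deduce_fetal_density(plasma, blood=0.80, fetal_fraction=0.20), 3))  # 0.4
```

Because the estimate divides by f, noise in the measured plasma density is amplified when the fetal fraction is low. Under this assumption, grouping loci into broad categories (for example, hypomethylated or hypermethylated) may be more robust than relying on exact deduced values.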
[0054] In one embodiment, genome-wide bisulfite sequencing is used
to analyze the methylation profile (part or all of a methylome) of
maternal plasma DNA at single nucleotide resolution. By exploiting
the polymorphic differences between the mother and the fetus, the
fetal methylome could be assembled from maternal blood samples. In
another implementation, polymorphic differences are not used;
instead, a differential between the plasma methylome and the blood
cell methylome can be used.
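The use of polymorphic differences can be sketched as follows. At an informative single nucleotide polymorphism where the mother is homozygous (e.g., A/A) and the fetus is heterozygous (e.g., A/G), plasma DNA molecules carrying the allele absent from the mother (G) must be fetal-derived, so methylation calls on those molecules reflect the fetal methylome near that site. The `Read` structure and function name are hypothetical; the genotypes are simply taken as inputs; and the sketch assumes that the SNP alleles are not confounded by bisulfite conversion (a C/T polymorphism, for example, would be).

```python
from dataclasses import dataclass, field


@dataclass
class Read:
    """A plasma DNA read covering an informative SNP (hypothetical structure)."""
    snp_allele: str                                # base observed at the SNP
    cpg_calls: list = field(default_factory=list)  # True = methylated CpG on this read


def fetal_methylation_density(reads, maternal_genotype: str, fetal_genotype: str):
    """Methylation density of CpG sites on reads carrying a fetal-specific allele."""
    maternal_alleles = set(maternal_genotype)
    if len(maternal_alleles) != 1:
        raise ValueError("informative SNP requires a homozygous mother")
    fetal_specific = set(fetal_genotype) - maternal_alleles
    if not fetal_specific:
        raise ValueError("fetus carries no allele absent from the mother")
    calls = [c for r in reads if r.snp_allele in fetal_specific for c in r.cpg_calls]
    return sum(calls) / len(calls) if calls else None


reads = [
    Read("A", [True, True]),    # shared allele: could be maternal or fetal, so ignored
    Read("G", [True, False]),   # fetal-specific allele
    Read("G", [False, False]),  # fetal-specific allele
]
print(fetal_methylation_density(reads, maternal_genotype="AA", fetal_genotype="AG"))  # 0.25
```

Reads carrying the shared allele are excluded because, in a maternal plasma sample, they are predominantly maternal in origin.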
[0055] In another embodiment, by exploiting single nucleotide
variations and/or copy number aberrations between a tumor genome
and a nontumor genome, and sequencing data from plasma (or other
sample), methylation profiling of a tumor can be performed in the
sample of a patient suspected or known to have cancer. A difference
in a methylation level in a plasma sample of a test individual when
compared with the plasma methylation level of a healthy control or
a group of healthy controls can allow the identification of the
test individual as harboring cancer. Additionally, the methylation
profile can act as a signature that reveals the type of cancer that
the person has developed (for example, its organ of origin) and
whether metastasis has occurred.
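A comparison of a test individual's plasma methylation level with that of a group of healthy controls can be expressed as a z-score per genomic bin, in the spirit of the 1 Mb bin z-scores described for FIGS. 25A to 26C. The sketch below is a hedged illustration: it assumes the conventional z-score (deviation from the control mean in units of the control standard deviation), and the bin keys, input dictionaries, function names and cutoff of -3 are hypothetical choices, not values taken from this disclosure.

```python
from statistics import mean, stdev


def methylation_z_scores(test_bins: dict, control_bins: dict) -> dict:
    """z-score per bin: (test density - mean of controls) / SD of controls.

    test_bins maps a bin key to the test plasma methylation density;
    control_bins maps the same key to a list of densities from at least two controls.
    """
    scores = {}
    for key, value in test_bins.items():
        ref = control_bins[key]
        sd = stdev(ref)
        if sd == 0:
            raise ValueError(f"controls show no variation in bin {key}")
        scores[key] = (value - mean(ref)) / sd
    return scores


def hypomethylated_bins(scores: dict, cutoff: float = -3.0) -> list:
    """Bins whose z-score falls below a (hypothetical) cutoff."""
    return [key for key, z in scores.items() if z < cutoff]


controls = {
    ("chr1", 0): [0.71, 0.72, 0.73, 0.72],
    ("chr1", 1): [0.70, 0.72, 0.74, 0.72],
}
test = {("chr1", 0): 0.65, ("chr1", 1): 0.72}
scores = methylation_z_scores(test, controls)
print(hypomethylated_bins(scores))  # [('chr1', 0)]
```

The sketch flags only bins below the cutoff, consistent with FIG. 29B, where the cancer distribution lies below the cutoff; flagging bins above a positive cutoff would be a straightforward extension.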
[0056] Due to the noninvasive nature of this approach, we were able
to serially assess the fetal and maternal plasma methylomes from
maternal blood samples collected in the first trimester, third
trimester and after delivery. Gestation-related changes were
observed. The approach can also be applied to samples obtained
during the second trimester. The fetal methylome deduced from
maternal plasma during pregnancy resembled the placental methylome.
Imprinted genes and differentially methylated regions were
identified from the maternal plasma data.
[0057] We have therefore developed an approach to study the fetal
methylome noninvasively, serially and comprehensively, thus
offering the possibility for identifying biomarkers or direct
testing of pregnancy-related pathologies. Embodiments can also be
used to study the tumor methylome noninvasively, serially and
comprehensively, for screening or detecting if a subject is
suffering from cancer, for monitoring malignant diseases in a
cancer patient and for prognostication. Embodiments can be applied
to any cancer type, including, but not limited to, lung cancer,
breast cancer, colorectal cancer, prostate cancer, nasopharyngeal
cancer, gastric cancer, testicular cancer, skin cancer, cancer
affecting the nervous system, bone cancer, ovarian cancer, liver
cancer, hematologic malignancies, pancreatic cancer,
endometriocarcinoma, kidney cancer, etc.
[0058] How to determine a methylome or methylation profile is
discussed first, and then different methylomes are
described (such as fetal methylomes, a tumor methylome, methylomes
of the mother or a patient, and a mixed methylome, e.g., from
plasma). The determination of a fetal methylation profile is then
described using fetal-specific markers or by comparing a mixed
methylation profile to a cellular methylation profile. Fetal
methylation markers are determined by comparing methylation
profiles. A relationship between size and methylation is discussed.
Uses of methylation profiles to detect cancer are also
provided.
I. Determination of a Methylome
[0059] A myriad of approaches have been used to investigate the
placental methylome, but each approach has its limitations. For
example, sodium bisulfite, a chemical that modifies unmethylated
cytosine residues to uracil and leaves methylated cytosine
unchanged, converts the differences in cytosine methylation into a
genetic sequence difference for further interrogation. The gold
standard method of studying cytosine methylation is based on
treating tissue DNA with sodium bisulfite followed by direct
sequencing of individual clones of bisulfite-converted DNA
molecules. After the analysis of multiple clones of DNA molecules,
the cytosine methylation pattern and quantitative profile per CpG
site can be obtained. However, cloned bisulfite sequencing is a low
throughput and labor-intensive procedure that cannot be readily
applied on a genome-wide scale.
[0060] Methylation-sensitive restriction enzymes that typically
digest unmethylated DNA provide a low cost approach to study DNA
methylation. However, data generated from such studies are limited
to loci with the enzyme recognition motifs and the results are not
quantitative. Immunoprecipitation of DNA bound by anti-methylated
cytosine antibodies can be used to survey large segments of the
genome but tends to bias towards loci with dense methylation due to
higher strength of antibody binding to such regions.
Microarray-based approaches are dependent on the a priori design of
the interrogation probes and hybridization efficiencies between the
probes and the target DNA.
[0061] To interrogate a methylome comprehensively, some embodiments
use massively parallel sequencing (MPS) to provide genome-wide
information and quantitative assessment of the level of methylation
on a per nucleotide and per allele basis. Recently, bisulfite
conversion followed by genome-wide MPS has become feasible (R.
Lister et al 2008 Cell; 133: 523-536).
[0062] Among the small number of published studies (R. Lister et
al. 2009 Nature; 462: 315-322; L. Laurent et al. 2010 Genome Res;
20: 320-331; Y. Li et al. 2010 PLoS Biol; 8: e1000533; and M. Kulis
et al. 2012 Nat Genet; 44: 1236-1242) that applied genome-wide
bisulfite sequencing to the investigation of human methylomes, two
studies focused on embryonic stem cells and fetal fibroblasts (R.
Lister et al. 2009 and L. Laurent et al. 2010). Both studies
analyzed cell-line-derived DNA.
[0063] A. Genome-Wide Bisulfite Sequencing
[0064] Certain embodiments can overcome the aforesaid challenges
and enable interrogation of a fetal methylome comprehensively,
noninvasively and serially. In one embodiment, genome-wide
bisulfite sequencing was used to analyze cell-free fetal DNA
molecules that are found in the circulation of pregnant women.
Despite the low abundance and fragmented nature of plasma DNA
molecules, we were able to assemble a high resolution fetal
methylome from maternal plasma and serially observe the changes
with pregnancy progression. Given the intense interest in
noninvasive prenatal testing (NIPT), embodiments can provide a
powerful new tool for fetal biomarker discovery or serve as a
direct platform for achieving NIPT of fetal or pregnancy-associated
diseases. Data from the genome-wide bisulfite sequencing of various
samples, from which the fetal methylome can be derived, is now
provided. In one embodiment, this technology can be applied to
methylation profiling in pregnancies complicated by preeclampsia,
intrauterine growth retardation, or preterm labor. Because of its
noninvasive nature, this technology can be used serially in such
complicated pregnancies to allow monitoring, prognostication and/or
assessment of the response to treatment.
[0065] FIG. 1A shows a table 100 of sequencing results for maternal
blood, placenta, and maternal plasma according to embodiments of
the present invention. In one embodiment, bisulfite-converted DNA
libraries were prepared using methylated DNA library adaptors
(Illumina) (R. Lister et al. 2008) and subjected to whole genome
sequencing. The samples were the blood cells of the blood sample
collected in the first trimester, the chorionic villus sample (CVS),
the placental tissue collected at term, and the maternal plasma
samples collected during the first and third trimesters and the
postpartum period. Blood cell and plasma DNA samples obtained from
one adult male and one adult non-pregnant female were also
analyzed. A total of 9.5 billion pairs of raw
sequence reads were generated in this study. The sequencing
coverage of each sample is shown in table 100.
[0066] The sequence reads that were uniquely mappable to the human
reference genome reached average haploid genomic coverages of
50-fold, 34-fold and 28-fold, respectively, for the first
trimester, third trimester and post-delivery maternal plasma
samples. The coverage of the CpG sites in the genome ranged from
81% to 92% for the samples obtained from the pregnancy. The
sequence reads that spanned CpG sites amounted to average haploid
coverages of 33-fold, 23-fold and 19-fold per strand,
respectively, for the first trimester, third trimester
and post-delivery maternal plasma samples. The bisulfite conversion
efficiencies for all samples were >99.9% (table 100).
[0067] In table 100, the ambiguous rate (marked "a") refers to the
proportion of reads that could be mapped to a specific genomic
locus of the reference human genome but could not be assigned to
either the Watson or the Crick strand. "b" refers to paired reads
with identical start and end coordinates. For "c", lambda DNA was
spiked into each sample before bisulfite conversion as an internal
control. The lambda conversion rate refers to the proportion of
unmethylated cytosines in the lambda DNA that were converted to
"thymine" residues by bisulfite modification; it is derived from
the proportion of lambda cytosine nucleotides that remain as
cytosine after bisulfite conversion and is used as an indication of
the rate of successful bisulfite conversion. "d" refers to the
number of cytosine nucleotides present in the reference human
genome and remaining as a cytosine sequence after bisulfite
conversion. H generically denotes A, C, or T.
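As a minimal sketch of how a lambda conversion rate of the kind reported in table 100 could be computed, the following Python assumes that the spiked-in lambda DNA is entirely unmethylated (which is why it can serve as a conversion control), so every lambda cytosine still read as C counts as a conversion failure. The function name and its inputs are hypothetical; the conversion rate is taken as one minus the proportion of lambda cytosines that remain as cytosine.

```python
def lambda_conversion_rate(lambda_c_calls: int, lambda_t_calls: int) -> float:
    """Return the fraction of lambda-DNA cytosines converted by bisulfite.

    lambda_c_calls: lambda cytosine positions still read as C (unconverted).
    lambda_t_calls: lambda cytosine positions read as T (converted).
    Assumes the lambda control is fully unmethylated.
    """
    total = lambda_c_calls + lambda_t_calls
    if total == 0:
        raise ValueError("no lambda cytosine calls observed")
    retained_fraction = lambda_c_calls / total  # proportion remaining as C
    return 1.0 - retained_fraction


# Hypothetical counts: 12 retained C calls out of 100,000 lambda cytosines.
rate = lambda_conversion_rate(12, 99_988)
print(f"{rate:.5f}")  # 0.99988
```

A rate above 0.999 in this sketch corresponds to the >99.9% conversion efficiencies stated for all samples.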
[0068] During bisulfite modification, unmethylated cytosines are
converted to uracils and subsequently thymines after PCR
amplifications while the methylated cytosines would remain intact
(Frommer M, et al. 1992 Proc Natl Acad Sci USA; 89:1827-31). After
sequencing and alignment, the methylation status of an individual
CpG site could thus be inferred from the count of methylated
sequence reads "M" (methylated) and the count of unmethylated
sequence reads "U" (unmethylated) at the cytosine residue in CpG
context. Using the bisulfite sequencing data, the entire methylomes
of maternal blood, placenta and maternal plasma were constructed.
The mean methylated CpG density (also called methylation density m)
of specific loci in the maternal plasma can be calculated using the
equation:
$$m = \frac{M}{M + U}$$
where M is the count of methylated reads and U is the count of
unmethylated reads at the CpG sites within the genetic locus. If
there is more than one CpG site within a locus, then M and U
correspond to the counts across the sites.
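The inference of M and U and the calculation of m can be sketched in Python. This sketch assumes Watson-strand reads already aligned in bisulfite space, so that at a reference CpG cytosine a C in the read counts as methylated and a T as unmethylated (Crick-strand reads would instead be read at the complementary G position, as G versus A). The helper names are illustrative and are not part of any pipeline described here.

```python
from typing import Iterable, Optional


def call_cpg(read_base: str) -> Optional[str]:
    """Classify the read base aligned to a reference CpG cytosine.

    Watson-strand assumption: C = unconverted (methylated, "M"),
    T = converted (unmethylated, "U"); anything else is uninformative.
    """
    base = read_base.upper()
    if base == "C":
        return "M"
    if base == "T":
        return "U"
    return None


def methylation_density(read_bases: Iterable[str]) -> float:
    """Compute m = M / (M + U), pooling all CpG calls within one locus."""
    methylated = unmethylated = 0
    for base in read_bases:
        call = call_cpg(base)
        if call == "M":
            methylated += 1
        elif call == "U":
            unmethylated += 1
    if methylated + unmethylated == 0:
        raise ValueError("locus has no informative CpG calls")
    return methylated / (methylated + unmethylated)


# Bases observed at the CpG cytosines of one locus across several reads.
print(round(methylation_density("CCTCTNC"), 3))  # 0.667 (M=4, U=2)
```

Pooling the calls before dividing, rather than averaging per-site ratios, matches the statement that M and U correspond to the counts across all sites in the locus.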
[0069] B. Various Techniques
[0070] As described above, methylation profiling can be performed
using massively parallel sequencing (MPS) of bisulfite converted
plasma DNA. The MPS of the bisulfite converted plasma DNA can be
performed in a random or shotgun fashion. The depth of the
sequencing can be varied according to the size of the region of
interest.
[0071] In another embodiment, the region(s) of interest in the
bisulfite converted plasma DNA can be first captured using a
solution-phase or solid-phase hybridization-based process, followed
by the MPS. The massively parallel sequencing can be performed
using a sequencing-by-synthesis platform such as the Illumina
platform, a sequencing-by-ligation platform such as the SOLiD
platform from Life Technologies, a semiconductor-based sequencing
system such as the Ion Torrent or Ion Proton platforms from Life
Technologies, or a single molecule sequencing system such as the
Helicos system, the Pacific Biosciences system or a nanopore-based
sequencing system. Nanopore-based sequencing includes the use of
protein nanopores constructed in lipid bilayers as well as
solid-state nanopores (such as graphene-based nanopores). Because
selected single molecule sequencing platforms can elucidate the
methylation status of DNA molecules (including N6-methyladenine,
5-methylcytosine and 5-hydroxymethylcytosine) directly without
bisulfite conversion (B. A. Flusberg et al. 2010 Nat Methods; 7:
461-465; J. Shim et al. 2013 Sci Rep; 3:1389. doi:
10.1038/srep01389), such platforms allow the methylation status of
non-bisulfite converted sample DNA (e.g., plasma DNA) to be
analyzed.
[0072] Besides sequencing, other techniques can be used. In one
embodiment, methylation profiling can be done by
methylation-specific PCR or methylation-sensitive restriction
enzyme digestion followed by PCR or ligase chain reaction followed
by PCR. In yet other embodiments, the PCR is a form of single
molecule or digital PCR (B. Vogelstein et al. 1999 Proc Natl Acad
Sci USA; 96: 9236-9241). In yet further embodiments, the PCR can be
a real-time PCR. In other embodiments, the PCR can be multiplex
PCR.
II. Analysis of Methylomes
[0073] Some embodiments can determine the methylation profile of
plasma DNA using whole genome bisulfite sequencing. The methylation
profile of a fetus can be determined by sequencing maternal plasma
DNA samples, as is described below. Thus, the fetal DNA molecules
(and fetal methylome) were accessed noninvasively during the
pregnancy, and changes were monitored serially as the pregnancy
progressed. Due to the comprehensiveness of the sequencing data, we
were able to study the maternal plasma methylomes on a genome-wide
scale at single nucleotide resolution.
[0074] Since the genomic coordinates of the sequenced reads were
known, these data enabled one to study the overall methylation
levels of the methylome or any region of interest in the genome and
to make comparisons between different genetic elements. In addition,
multiple sequence reads covered each CpG site or locus. A
description of some of the metrics used to measure the methylome
is now provided.
[0075] A. Methylation of Plasma DNA Molecules
[0076] DNA molecules are present in human plasma at low
concentrations and in a fragmented form, typically in lengths
resembling mononucleosomal units (Y. M. D. Lo et al. 2010 Sci
Transl Med; 2: 61ra91; and Zheng et al. 2012 Clin Chem; 58:
549-558). Despite these limitations, a genome-wide
bisulfite-sequencing pipeline was able to analyze the methylation
of the plasma DNA molecules. In other embodiments, because selected
single molecule sequencing platforms can elucidate the methylation
status of DNA molecules directly without bisulfite conversion
(Flusberg B A et al. 2010 Nat Methods; 7: 461-465; Shim J et al.
2013 Sci Rep; 3:1389. doi: 10.1038/srep01389), non-bisulfite
converted plasma DNA can be analyzed on such platforms to determine
the methylation levels of plasma DNA or the plasma methylome. Such
platforms can detect N6-methyladenine, 5-methylcytosine and
5-hydroxymethylcytosine.
[0077] FIG. 1B shows methylation density in 1-Mb windows of
sequenced samples according to embodiments of the present
invention. Plot 150 is a Circos plot depicting the methylation
density in the maternal plasma and genomic DNA in 1-Mb windows
across the genome. From outside to inside: chromosome ideograms are
oriented pter-qter in a clockwise direction (centromeres are
shown in red), maternal blood (red), placenta (yellow), maternal
plasma (green), shared reads in maternal plasma (blue), and
fetal-specific reads in maternal plasma (purple). The overall CpG
methylation levels (i.e., density levels) of maternal blood cells,
placenta and maternal plasma can be found in table 100. The
methylation level of maternal blood cells is in general higher than
that of the placenta across the whole genome.
[0078] B. Comparison of Bisulfite Sequencing to Other
Techniques
[0079] We studied the placental methylome using massively parallel
bisulfite sequencing. In addition, we studied the placental
methylome using an oligonucleotide array platform that covered
about 480,000 CpG sites in the human genome (Illumina) (M. Kulis et
al. 2012 Nat Genet; 44: 1236-1242; and C. Clark et al. 2012 PLoS
One; 7: e50233). In one embodiment using beadchip-based genotyping
and methylation analysis, genotyping was performed using the
Illumina HumanOmni2.5-8 genotyping array according to the
manufacturer's protocol. Genotypes were called using the GenCall
algorithm of the GenomeStudio software (Illumina). The call rates
were over 99%. For the microarray based methylation analysis,
genomic DNA (500-800 ng) was treated with sodium bisulfite using
the Zymo EZ DNA Methylation Kit (Zymo Research, Orange, Calif.,
USA) according to the manufacturer's recommendations for the
Illumina Infinium Methylation Assay.
[0080] The methylation assay was performed on 4 µl of
bisulfite-converted genomic DNA at 50 ng/µl according to the
Infinium HD Methylation Assay protocol. The hybridized beadchip was
scanned on an Illumina iScan instrument. DNA methylation data were
analyzed by the GenomeStudio (v2011.1) Methylation Module (v1.9.0)
software, with normalization to internal controls and background
subtraction. The methylation index for each individual CpG site was
represented by a beta value (β), which was calculated using
the ratio of fluorescent intensities between methylated and
unmethylated alleles:
$$\beta = \frac{\text{Intensity of methylated allele}}{\text{Intensity of unmethylated allele} + \text{Intensity of methylated allele} + 100}$$
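The beta-value formula can be written as a small Python function. This is a sketch of the arithmetic only, not of the GenomeStudio software; the function name and the `offset` parameter are illustrative, with the offset defaulting to the constant 100 that appears in the denominator above. The offset keeps the ratio defined when both intensities are zero and pulls it toward zero when both are low.

```python
def beta_value(methylated_intensity: float,
               unmethylated_intensity: float,
               offset: float = 100.0) -> float:
    """Beta = methylated / (unmethylated + methylated + offset)."""
    if methylated_intensity < 0 or unmethylated_intensity < 0:
        raise ValueError("intensities must be non-negative")
    return methylated_intensity / (
        unmethylated_intensity + methylated_intensity + offset
    )


print(beta_value(900.0, 0.0))                 # 0.9
print(round(beta_value(4000.0, 1000.0), 4))   # 0.7843
```

Because of the offset, a fully methylated probe pair never reaches exactly 1, which is one reason beta values and sequencing-derived methylation indices are compared by correlation rather than by equality.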
[0081] For CpG sites that were represented on the array and
sequenced to a coverage of at least 10-fold, we compared the
beta-value obtained by the array to the methylation index as
determined by sequencing of the same site. Beta-values represented
the intensity of methylated probes as a proportion of the combined
intensity of the methylated and unmethylated probes covering the
same CpG site. The methylation index for each CpG site refers to
the proportion of methylated reads over the total number of reads
covering that CpG.
[0082] FIGS. 2A-2C show plots of the beta-values determined by the
Illumina Infinium HumanMethylation 450K beadchip array against the
methylation indices determined by genome-wide bisulfite sequencing
of corresponding CpG sites that were interrogated by both
platforms: (A) Maternal blood cells, (B) Chorionic villus sample,
(C) Term placental tissue. The data from both platforms were highly
concordant and the Pearson correlation coefficients were 0.972,
0.939 and 0.954, and R² values were 0.945, 0.882 and 0.910 for
the maternal blood cells, CVS and term placental tissue,
respectively.
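The cross-platform comparison can be sketched in Python under stated assumptions: each shared CpG site is represented by a hypothetical tuple of (array beta value, methylated reads, total reads), sites below the 10-read coverage threshold are dropped, and the sequencing methylation index is the proportion of methylated reads. The Pearson coefficient is computed directly; for a simple linear fit, R² equals r², which is consistent with the reported pairs (0.972² ≈ 0.945, 0.939² ≈ 0.882, 0.954² ≈ 0.910).

```python
import math
from typing import Iterable, List, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def compare_platforms(
    sites: Iterable[Tuple[float, int, int]], min_coverage: int = 10
) -> Tuple[float, float]:
    """sites: (array beta, methylated reads, total reads) per shared CpG.

    Keeps CpGs sequenced to >= min_coverage reads, converts read counts to
    a methylation index, and returns (Pearson r, R^2 of a linear fit).
    """
    betas: List[float] = []
    indices: List[float] = []
    for beta, methylated_reads, total_reads in sites:
        if total_reads >= min_coverage:
            betas.append(beta)
            indices.append(methylated_reads / total_reads)
    r = pearson_r(betas, indices)
    return r, r * r


# Hypothetical sites; the last one is below the coverage threshold.
demo = [(0.12, 1, 12), (0.55, 6, 11), (0.91, 18, 20), (0.40, 2, 4)]
r, r2 = compare_platforms(demo)
print(f"r={r:.3f}, R^2={r2:.3f}")
```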
[0083] We further compared our sequencing data with those reported
by Chu et al., who investigated the methylation profiles of 12 pairs
of CVS and maternal blood cell DNA samples using an oligonucleotide
array that covered about 27,000 CpG sites (T. Chu et al. 2011 PLoS
One; 6: e14723). When the sequencing results of the CVS and
maternal blood cell DNA were compared with each of the 12 pairs of
samples in the previous study, the average Pearson coefficient and
R² were 0.967 and 0.935, respectively, for maternal blood, and
0.943 and 0.888, respectively, for the CVS. Among the CpG sites
represented on both arrays, our data correlated highly with the
published data. The rates of non-CpG methylation were <1% for the
maternal blood cells, CVS and placental tissues (table 100). These
results were consistent with the current belief that substantial
amounts of non-CpG methylation are mainly restricted to pluripotent
cells (R. Lister et al. 2009 and L. Laurent et al. 2010).
[0084] C. Comparison of Plasma and Blood Methylomes for
Non-Pregnant Subjects
[0085] FIGS. 3A and 3B show bar charts of percentage of methylated
CpG sites in plasma and blood cells collected from an adult male
and a non-pregnant adult female: (A) Autosomes, (B) Chromosome X.
The charts show a similarity between plasma and blood methylomes of
a male and a non-pregnant female. The overall proportions of CpG
sites that were methylated in the male and non-pregnant female
plasma samples were almost the same as those of the corresponding
blood cell DNA (table 100 and FIGS. 3A and 3B).
[0086] We next studied the correlation of the methylation profiles
of the plasma and blood cell samples in a locus-specific manner. We
determined the methylation density of each 100-kb bin in the human
genome by determining the total number of unconverted cytosines at
CpG sites as a proportion of all cytosines at CpG sites covered by
sequence reads mapped to the 100-kb region. The methylation densities were
highly concordant between the plasma sample and corresponding blood
cell DNA of the male as well as the female samples.
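The per-bin calculation can be sketched in Python. The input format is an assumption: one tuple per CpG site giving the chromosome, a 0-based position, and the counts of methylated (unconverted) and unmethylated (converted) reads at that site. Counts are pooled within each bin before dividing, so the result is unconverted CpG cytosines over all sequenced CpG cytosines in the bin. Passing `bin_size=1_000_000` would give the 1-Mb windows used for the Circos plots.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

BinKey = Tuple[str, int]


def bin_methylation_density(
    cpg_counts: Iterable[Tuple[str, int, int, int]],
    bin_size: int = 100_000,
) -> Dict[BinKey, float]:
    """cpg_counts: (chromosome, 0-based position, M reads, U reads) per CpG.

    Returns {(chromosome, bin index): unconverted Cs / all CpG Cs in bin}.
    """
    methylated: Dict[BinKey, int] = defaultdict(int)
    unmethylated: Dict[BinKey, int] = defaultdict(int)
    for chrom, pos, m_reads, u_reads in cpg_counts:
        key = (chrom, pos // bin_size)
        methylated[key] += m_reads
        unmethylated[key] += u_reads
    return {
        key: methylated[key] / (methylated[key] + unmethylated[key])
        for key in methylated
        if methylated[key] + unmethylated[key] > 0
    }


calls = [("chr1", 10, 3, 1), ("chr1", 99_999, 1, 3), ("chr1", 100_000, 2, 0)]
print(bin_methylation_density(calls))
# {('chr1', 0): 0.5, ('chr1', 1): 1.0}
```

The resulting per-bin densities for a blood cell sample and its matched plasma sample could then be correlated locus by locus, as in the comparison reported below.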
[0087] FIGS. 4A and 4B show plots of methylation densities of
corresponding loci in blood cell DNA and plasma DNA: (A)
Non-pregnant adult female, (B) Adult male. The Pearson correlation
coefficient and R² value for the non-pregnant female samples
were 0.963 and 0.927, respectively, and those for the male samples
were 0.953 and 0.908, respectively. These data are consistent with
previous findings based on the assessment of genotypes of plasma
DNA molecules of recipients of allogeneic hematopoietic stem cell
transplantation, which showed that hematopoietic cells are the
predominant source of DNA in human plasma (Zheng et al., 2012).
[0088] D. Methylation Levels Across Methylomes
[0089] We next determined the DNA methylation levels of maternal
plasma DNA, maternal blood cells, and placental tissue for repeat
regions, non-repeat regions, and the genome overall.
[0090] FIGS. 5A and 5B show bar charts of percentage of methylated
CpG sites among samples collected from the pregnancy: (A)
Autosomes, (B) Chromosome X. The overall proportions of methylated
CpGs were 67.0% and 68.2% for the first and third trimester
maternal plasma samples, respectively. Unlike the results obtained
from the non-pregnant individuals, these proportions were lower
than that of the first trimester maternal blood cell sample but
higher than that of the CVS and term placental tissue samples
(table 100). Of note, the percentage of methylated CpGs for the
post-delivery maternal plasma sample was 73.1% which was similar to
the blood cell data (table 100). These trends were observed in CpGs
distributed over all autosomes as well as chromosome X and spanned
across both the non-repeat regions and multiple classes of repeat
elements of the human genome.
[0091] Both the repeat and non-repeat elements in the placenta were
found to be hypomethylated relative to maternal blood cells. The
results were concordant with findings in the literature that the
placenta is hypomethylated relative to other tissues, including
peripheral blood cells.
[0092] Between 71% and 72% of the sequenced CpG sites were
methylated in the blood cell DNA from the pregnant woman,
non-pregnant woman and adult male (table 100 of FIG. 1A). These data
are comparable with the 68.4% of CpG sites methylated in blood
mononuclear cells reported by Li et al. 2010. Consistent with the
previous reports on the hypomethylated nature of placental tissues,
55% and 59% of the CpG sites were methylated in the CVS and term
placental tissue, respectively (table 100).
[0093] FIG. 6 shows a bar chart of methylation level of different
repeat classes of the human genome for maternal blood, placenta and
maternal plasma. The repeat classes are as defined by the UCSC
genome browser. Data shown are from the first trimester samples.
Unlike earlier data suggesting that the hypomethylated nature of
placental tissues was mainly observed in certain repeat classes in
the genome (B. Novakovic et al. 2012 Placenta; 33: 959-970), here
we show that the placenta was in fact hypomethylated in most
classes of genomic elements with reference to blood cells.
[0094] E. Similarity of Methylomes
[0095] Embodiments can determine the methylomes of placental
tissues, blood cells and plasma using the same platform. Hence,
direct comparisons of the methylomes of those biological sample
types were possible. The high level of resemblance between
methylomes of the blood cells and plasma for the male and
non-pregnant female as well as between the maternal blood cells and
the post-delivery maternal plasma sample further affirmed that
hematopoietic cells were the main sources of DNA in human plasma
(Zheng et al., 2012).
[0096] The resemblances are evident both in terms of the overall
proportion of methylated CpGs in the genome as well as from the
high correlation of methylation densities between corresponding
loci in the blood cell DNA and plasma DNA. Yet, the overall
proportions of methylated CpGs in the first trimester and third
trimester maternal plasma samples were reduced when compared with
the maternal blood cell data or the post-delivery maternal plasma
sample. The reduced methylation levels during pregnancy were due to
the hypomethylated nature of the fetal DNA molecules present in
maternal plasma.
[0097] The reversal of the methylation profile in the post-delivery
maternal plasma sample to become more similar to that of the
maternal blood cells suggests that the fetal DNA molecules had been
removed from the maternal circulation. Calculation of the fetal DNA
concentrations based on SNP markers of the fetus indeed showed that
the concentration changed from 33.9% before delivery to just 4.5%
in the post-delivery sample.
[0098] F. Other Applications
[0099] Embodiments have successfully assembled DNA methylomes
through the MPS analysis of plasma DNA. The ability to determine
the placental or fetal methylome from maternal plasma provides a
noninvasive method to determine, detect and monitor the aberrant
methylation profiles associated with pregnancy-associated
conditions such as preeclampsia, intrauterine growth restriction,
preterm labor and others. For example, the detection of a
disease-specific aberrant methylation signature allows the
screening, diagnosis and monitoring of such pregnancy-associated
conditions. Measuring the maternal plasma methylation level
likewise allows the screening, diagnosis and monitoring of such
pregnancy-associated conditions. Besides the direct applications to
the investigation of pregnancy-associated conditions, the approach
could be applied to other areas of medicine where plasma DNA
analysis is of interest. For example, the methylomes of cancers
could be determined from plasma DNA of cancer patients. Cancer
methylomic analysis from plasma, as described herein, is
potentially a synergistic technology to cancer genomic analysis
from plasma (K. C. A. Chan et al. 2013 Clin Chem; 59:211-224 and
Leary R J et al. 2012 Sci Transl Med; 4:162ra154).
[0100] For example, the determination of a methylation level of a
plasma sample could be used to screen for cancer. When the
methylation level of the plasma sample shows aberrant levels
compared with healthy controls, cancer may be suspected. Then
further confirmation and assessment of the type of cancer or tissue
origin of the cancer may be performed by determining the plasma
profile of methylation at different genomic loci or by plasma
genomic analysis to detect tumor-associated copy number
aberrations, chromosomal translocations and single nucleotide
variants. Alternatively, radiological and imaging investigations
(e.g. computed tomography, magnetic resonance imaging, positron
emission tomography) or endoscopy (e.g. upper gastrointestinal
endoscopy or colonoscopy) could be used to further investigate
individuals who were suspected of having cancer based on the plasma
methylation level analysis.
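The text states only that cancer may be suspected when the plasma methylation level is aberrant compared with healthy controls; it does not specify a statistic. As one hypothetical way to make that comparison concrete, the Python sketch below standardizes a test level against a set of control levels (a z-score) and flags values beyond an assumed cutoff of 3 standard deviations in either direction. The cutoff, the function names, and the control values are all illustrative assumptions.

```python
import statistics
from typing import Sequence


def methylation_z_score(test_level: float, control_levels: Sequence[float]) -> float:
    """Standardize a test plasma methylation level against healthy controls."""
    if len(control_levels) < 2:
        raise ValueError("need at least two control samples")
    mean = statistics.mean(control_levels)
    sd = statistics.stdev(control_levels)  # sample standard deviation
    if sd == 0:
        raise ValueError("control levels have zero spread")
    return (test_level - mean) / sd


def is_aberrant(test_level: float, control_levels: Sequence[float],
                cutoff: float = 3.0) -> bool:
    """Flag levels more than `cutoff` SDs from the control mean (either side)."""
    return abs(methylation_z_score(test_level, control_levels)) > cutoff


controls = [0.70, 0.72, 0.71, 0.73, 0.69]  # hypothetical healthy controls
print(is_aberrant(0.60, controls))   # True
print(is_aberrant(0.715, controls))  # False
```

A two-sided test is used here because the text speaks of "aberrant" levels without fixing a direction; a flagged result would only prompt the further investigations listed above.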
[0101] For cancer screening or detection, the determination of a
methylation level of a plasma (or other biologic) sample can be
used in conjunction with other modalities for cancer screening or
detection such as prostate specific antigen measurement (e.g. for
prostate cancer), carcinoembryonic antigen (e.g. for colorectal
carcinoma, gastric carcinoma, pancreatic carcinoma, lung carcinoma,
breast carcinoma, medullary thyroid carcinoma), alpha fetoprotein
(e.g. for liver cancer or germ cell tumors) and CA19-9 (e.g. for
pancreatic carcinoma).
[0102] Additionally, other tissues may be sequenced to obtain a
cellular methylome. For example, liver tissue can be analyzed to
determine a methylation pattern specific to the liver, which may be
used to identify liver pathologies. Other tissues which can also be
analyzed include brain cells, bones, the lungs, the heart, the
muscles and the kidneys, etc. The methylation profiles of various
tissues may change from time to time, e.g. as a result of
development, aging, disease processes (e.g. inflammation or
cirrhosis) or treatment (e.g. treatment with demethylating agents
such as 5-azacytidine and 5-azadeoxycytidine). The dynamic nature
of DNA methylation makes such analysis potentially very valuable
for monitoring of physiological and pathological processes. For
example, if one detects a change in the plasma methylome of an
individual compared to a baseline value obtained when they were
healthy, one could then detect disease processes in organs that
contribute plasma DNA.
[0103] Also, the methylomes of transplanted organs could be
determined from plasma DNA of organ transplantation recipients.
Transplant methylomic analysis from plasma, as described in this
invention, is potentially a synergistic technology to transplant
genomic analysis from plasma (Y. W. Zheng et al., 2012; Y. M. D. Lo
et al. 1998 Lancet; 351: 1329-1330; and T. M. Snyder et al. 2011
Proc Natl Acad Sci USA; 108: 6229-6234).
III. Determining Fetal Methylome Using SNPs
[0104] As described above, the plasma methylome corresponds to the
blood methylome for a non-pregnant normal person. However, for a
pregnant female, the methylomes differ. Fetal DNA molecules
circulate in maternal plasma among a majority background of
maternal DNA (Y. M. D. Lo et al. 1998 Am J Hum Genet; 62:
768-775).
[0105] Thus, for a pregnant female, the plasma methylome is largely
a composite of the placental methylome and the blood methylome.
Accordingly, one can extract the placental methylome from
plasma.
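If one further assumes that, at a given locus, the plasma methylation density is a linear mixture of the placental and blood densities weighted by the fractional fetal DNA concentration f (an assumption made here for illustration; the text says only that the plasma methylome is largely a composite of the two), the placental density can be solved for directly. The Python sketch below does that and clips the estimate to the valid range [0, 1]; the function name and example values are hypothetical.

```python
def deduce_placental_density(plasma_density: float, blood_density: float,
                             fetal_fraction: float) -> float:
    """Solve plasma = f * placenta + (1 - f) * blood for the placenta term.

    Assumes a two-component linear mixture; the result is clipped to [0, 1]
    because sampling noise can push the raw estimate out of range.
    """
    if not 0.0 < fetal_fraction <= 1.0:
        raise ValueError("fetal_fraction must be in (0, 1]")
    raw = (plasma_density - (1.0 - fetal_fraction) * blood_density) / fetal_fraction
    return min(1.0, max(0.0, raw))


# Hypothetical locus: blood 0.72, plasma 0.686, fetal fraction 20%.
print(round(deduce_placental_density(0.686, 0.72, 0.20), 3))  # 0.55
```

Dividing by f amplifies any error in the plasma or blood measurements, so under this assumption a low fetal fraction would make per-locus estimates noisy.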
[0106] In one embodiment, single nucleotide polymorphism (SNP)
differences between the mother and the fetus are used to identify
the fetal DNA molecules in maternal plasma. An aim was to identify
SNP loci where the mother was homozygous but the fetus was
heterozygous; the fetal-specific allele can be used to determine
which DNA fragments are from the fetus. Genomic DNA from the
maternal blood cells was analyzed using a SNP genotyping array, the
Illumina HumanOmni2.5-8.
[0107] A. Correlation of Methylation of Fetal-Specific Reads and
Placental Methylome
[0108] Loci having two different alleles, where the amount of one
allele (B) was significantly less than the other allele (A), were
identified from sequencing results of a biological sample. Reads
covering the B alleles were regarded as fetal-specific
(fetal-specific reads). The mother is determined to be homozygous
for A and the fetus heterozygous for A/B, and thus reads covering
the A allele were shared by the mother and fetus (shared
reads).
[0109] The mother was found to be homozygous at 1,945,516 loci on
the autosomes. The maternal plasma DNA sequencing reads that
covered these SNPs were inspected. Reads carrying a non-maternal
allele were detected at 107,750 loci, and these were considered the
informative loci. At each informative SNP, the allele that was not
from the mother was termed a fetal-specific allele while the other
one was termed a shared allele.
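The selection of informative loci can be sketched in Python. The inputs are assumptions: a hypothetical mapping from each locus where the mother is homozygous to her single allele, and one (locus, allele) pair per plasma read covering such a locus. Following the description above, any non-maternal allele makes a locus informative; if more than one non-maternal allele appears, this sketch keeps the most frequent one. The sketch applies no minimum read count to guard against sequencing errors, which a practical pipeline might add.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def find_informative_loci(
    maternal_homozygous: Dict[str, str],
    plasma_allele_calls: Iterable[Tuple[str, str]],
) -> Dict[str, Dict[str, object]]:
    """maternal_homozygous: {locus: the single maternal allele}.
    plasma_allele_calls: (locus, allele observed in one plasma read).

    A locus is informative if any plasma read carries a non-maternal allele;
    that allele is labelled fetal-specific and the maternal one shared.
    """
    counts: Dict[str, Counter] = {}
    for locus, allele in plasma_allele_calls:
        if locus in maternal_homozygous:
            counts.setdefault(locus, Counter())[allele] += 1
    informative: Dict[str, Dict[str, object]] = {}
    for locus, allele_counts in counts.items():
        shared = maternal_homozygous[locus]
        others = {a: n for a, n in allele_counts.items() if a != shared}
        if not others:
            continue
        # If several non-maternal alleles appear, keep the most frequent one.
        fetal = max(others, key=others.get)
        informative[locus] = {
            "shared": shared,
            "fetal_specific": fetal,
            "p": allele_counts[fetal],   # reads with the fetal-specific allele
            "q": allele_counts[shared],  # reads with the shared allele
        }
    return informative


maternal = {"chr1:1000": "A", "chr1:2000": "G"}
reads = [("chr1:1000", "A")] * 5 + [("chr1:1000", "T"), ("chr1:2000", "G")]
print(find_informative_loci(maternal, reads))
# {'chr1:1000': {'shared': 'A', 'fetal_specific': 'T', 'p': 1, 'q': 5}}
```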
[0110] A fractional fetal/tumor DNA concentration (also called
fetal DNA percentage) in the maternal plasma can be determined. In
one embodiment, the fractional fetal DNA concentration in the
maternal plasma, f, is determined by the equation:
$$f = \frac{2p}{p + q}$$
where p is the number of sequenced reads with the fetal-specific
allele and q is the number of sequenced reads with the shared
allele between the mother and the fetus (Y. M. D. Lo et al. 2010
Sci Transl Med; 2:61ra91). The fetal DNA proportions in the first
trimester, third trimester and post-delivery maternal plasma
samples were found to be 14.4%, 33.9% and 4.5%, respectively. The
fetal DNA proportions were also calculated using the numbers of
reads that aligned to chromosome Y. Based on the chromosome Y data,
the results were 14.2%, 34.9% and 3.7%, respectively, in the first
trimester, third trimester and post-delivery maternal plasma
samples.
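The factor of 2 in the equation follows from the genotypes at an informative locus: the fetus is heterozygous, so its DNA molecules carry the fetal-specific allele and the shared allele in roughly equal numbers, and the total number of fetal-derived reads is therefore about 2p out of p + q reads. This assumes both alleles are sequenced and mapped with equal efficiency. A minimal Python sketch, with a hypothetical function name and hypothetical read totals summed across informative loci:

```python
def fetal_dna_fraction(p: int, q: int) -> float:
    """f = 2p / (p + q), with counts summed over informative loci.

    p: reads with the fetal-specific allele; q: reads with the shared allele.
    """
    if p < 0 or q < 0 or p + q == 0:
        raise ValueError("counts must be non-negative and not both zero")
    return 2 * p / (p + q)


# Hypothetical totals across informative loci.
print(fetal_dna_fraction(72, 928))  # 0.144
```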
[0111] By separately analyzing the fetal-specific or shared
sequence reads, embodiments demonstrate that the circulating fetal
DNA molecules were much more hypomethylated than the background DNA
molecules. Comparisons of the methylation densities of
corresponding loci in the fetal-specific maternal plasma reads and
the placental tissue data for both the first and third trimesters
revealed high levels of correlation. These data provided genome-level
evidence that the placenta is the predominant source of
fetal-derived DNA molecules in maternal plasma and represented a
major step forward compared with previous evidence based on
information derived from selected loci.
[0112] We determined the methylation density of each 1-Mb region in
the genome using either the fetal-specific or shared reads that
covered CpG sites adjacent to the informative SNPs. The fetal and
non-fetal-specific methylomes assembled from the maternal plasma
sequence reads can be displayed, for example, in Circos plots (M.
Krzywinski et al. 2009 Genome Res; 19: 1639-1645). The methylation
densities per 1-Mb bin were also determined for the maternal blood
cells and placental tissue samples.
[0113] FIG. 7A shows a Circos plot 700 for first trimester samples.
FIG. 7B shows a Circos plot 750 for third trimester samples. The
plots 700 and 750 show methylation density per 1-Mb bin. Chromosome
ideograms (outermost ring) are oriented pter-qter in a clockwise
direction (centromeres are shown in red). The second outermost
track shows the number of CpG sites in the corresponding 1-Mb
regions up to 20,000 sites. The methylation densities of the
corresponding 1-Mb regions are shown in the other tracks based on
the color scheme shown in the center.
[0114] For the first trimester samples (FIG. 7A), from inside to
outside, the tracks are: chorionic villus sample, fetal-specific
reads in maternal plasma, maternal-specific reads in maternal
plasma, combined fetal and non-fetal reads in maternal plasma, and
maternal blood cells. For the third trimester samples (FIG. 7B),
the tracks are: term placental tissue, fetal-specific reads in
maternal plasma, maternal-specific reads in maternal plasma,
combined fetal and non-fetal reads in maternal plasma,
post-delivery maternal plasma and maternal blood cells (from the
first trimester blood sample). It can be appreciated that for both
the first and third trimester plasma samples, the fetal methylomes
were more hypomethylated than those of the non-fetal-specific
methylomes.
[0115] The overall methylation profile of the fetal methylomes more
closely resembled that of the CVS or placental tissue samples. In
contrast, the DNA methylation profile of the shared reads in
plasma, which were predominantly maternal DNA, more closely
resembled that of the maternal blood cells. We then performed a
systematic locus-by-locus comparison of the methylation densities
of the maternal plasma DNA reads and the maternal or fetal tissues.
We determined the methylation densities of CpG sites that were
present on the same sequence read as the informative SNPs and were
covered by at least 5 maternal plasma DNA sequence reads.
[0116] FIGS. 8A-8D show plots of comparisons of the methylation
densities of genomic tissue DNA against maternal plasma DNA for CpG
sites surrounding the informative single nucleotide polymorphisms.
FIG. 8A shows methylation densities for fetal-specific reads in the
first trimester maternal plasma sample relative to methylation
densities for reads in a CVS sample. As can be seen, the
fetal-specific values correspond well to the CVS values.
[0117] FIG. 8B shows methylation densities for fetal-specific reads
in the third trimester maternal plasma sample relative to
methylation densities for reads in a term placental tissue. Again,
the sets of densities correspond well, indicating that a fetal
methylation profile can be obtained by analyzing reads with
fetal-specific alleles.
[0118] FIG. 8C shows methylation densities for shared reads in the
first trimester maternal plasma sample relative to methylation
densities for reads in maternal blood cells. Given that most of the
shared reads are from the mother, the two sets of values correspond
well. FIG. 8D shows methylation densities for shared reads in the
third trimester maternal plasma sample relative to methylation
densities for reads in maternal blood cells.
[0119] For the fetal-specific reads in maternal plasma, the
Spearman correlation coefficient between the first trimester
maternal plasma and the CVS was 0.705 (P < 2.2 × 10^-16), and that
between the third trimester maternal plasma and term placental
tissue was 0.796 (P < 2.2 × 10^-16) (FIGS. 8A and 8B). A similar
comparison was performed for the shared reads in maternal plasma
with the maternal blood cell data. The Pearson correlation
coefficient was 0.653 (P < 2.2 × 10^-16) for the first trimester
plasma sample and was 0.638 (P < 2.2 × 10^-16) for the third trimester
plasma sample (FIGS. 8C and 8D).
[0120] B. Fetal Methylome
[0121] In one embodiment, to assemble the fetal methylome from
maternal plasma, we sorted for sequence reads that spanned at least
one informative fetal SNP site and contained at least one CpG site
within the same read. Reads that showed the fetal-specific alleles
were included in the assembly of the fetal methylome. Reads that
showed the shared allele, i.e. non-fetal-specific allele, were
included in the assembly of the non-fetal-specific methylome, which
was composed predominantly of maternally derived DNA molecules.
[0122] The fetal-specific reads covered 218,010 CpG sites on the
autosomes for the first trimester maternal plasma sample. The
corresponding figures for the third trimester and post-delivery
maternal plasma samples were 263,611 and 74,020, respectively. On
average, the shared reads covered those CpG sites 33.3, 21.7 and
26.3 times, respectively, and the fetal-specific reads covered them
3.0, 4.4 and 1.8 times, respectively, for the first trimester, third
trimester and post-delivery maternal plasma samples.
[0123] Fetal DNA represents a minor population in maternal plasma
and therefore the coverage of those CpG sites by fetal-specific
reads was proportional to the fetal DNA percentage of the sample.
For the first trimester maternal plasma sample, the overall
percentage of methylated CpG among the fetal reads was 47.0%, while
that for the shared reads was 68.1%. For the third trimester
maternal plasma sample, the percentage of methylated CpG of the
fetal reads was 53.3%, while that for the shared reads was 68.8%.
These data showed that the fetal-specific reads in maternal plasma
were more hypomethylated than the shared reads in maternal
plasma.
[0124] C. Method
[0125] The techniques described above can also be used to determine
a tumor methylation profile. Methods for determining fetal and
tumor methylation profiles are now described.
[0126] FIG. 9 is a flowchart illustrating a method 900 for
determining a first methylation profile from a biological sample of
an organism according to embodiments of the present invention.
Method 900 can construct an epigenetic map of the fetus from the
methylation profile of maternal plasma. The biological sample
includes cell-free DNA comprising a mixture of cell-free DNA
originating from a first tissue and from a second tissue. As
examples, the first tissue can be from a fetus, a tumor, or a
transplanted organ.
[0127] At block 910, a plurality of DNA molecules are analyzed from
the biological sample. The analysis of a DNA molecule can include
determining a location of the DNA molecule in a genome of the
organism, determining a genotype of the DNA molecule, and
determining whether the DNA molecule is methylated at one or more
sites.
[0128] In one embodiment, the DNA molecules are analyzed using
sequence reads of the DNA molecules, where the sequencing is
methylation aware. Thus, the sequence reads include methylation
status of DNA molecules from the biological sample. The sequence
reads can be obtained from various sequencing techniques,
PCR-techniques, arrays, and other suitable techniques for
identifying sequences of fragments. The methylation status of sites
of the sequence read can be obtained as described herein.
[0129] At block 920, a plurality of first loci are identified at
which a first genome of the first tissue is heterozygous for a
respective first allele and a respective second allele and a second
genome of the second tissue is homozygous for the respective first
allele. For example, fetal-specific reads may be identified at the
plurality of first loci. Or, tumor-specific reads may be identified
at the plurality of first loci. The tissue-specific reads can be
identified from sequencing reads where the percentage of sequence
reads carrying the second allele falls within a particular range,
e.g., about 3%-25%, thereby indicating a minority population of DNA
fragments from a heterozygous genome at the locus and a majority
population from a homozygous genome at the locus.
[0130] At block 930, DNA molecules located at one or more sites of
each of the first loci are analyzed. A number of DNA molecules
that are methylated at a site and correspond to the respective
second allele of the locus is determined. There may be more than
one site per locus. For example, a SNP might indicate that a
fragment is fetal-specific, and that fragment may have multiple
sites whose methylation status is determined. The number of reads
at each site that are methylated can be determined, and the total
number of methylated reads for the locus can be determined.
[0131] The locus may be defined by a specific number of sites, a
specific set of sites, or a particular size for a region around a
variation that comprises the tissue-specific allele. A locus can
have just one site. The sites can have specific properties, e.g.,
be CpG sites. The determination of a number of reads that are
unmethylated is equivalent, and is encompassed within the
determination of the methylation status.
[0132] At block 940, for each of the first loci, a methylation
density is calculated based on the numbers of DNA molecules
methylated at the one or more sites of the locus and corresponding
to the respective second allele of the locus. For example, a
methylation density can be determined for CpG sites corresponding
to a locus.
[0133] At block 950, the first methylation profile of the first
tissue is created from the methylation densities for the first
loci. The first methylation profile can correspond to particular
sites, e.g., CpG sites. The methylation profile can be for all loci
having a fetal-specific allele, or just some of those loci.
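Blocks 920 to 950 can be sketched as follows for reads that each span one informative SNP. The `Read` record and the function names are illustrative assumptions. The sketch also assumes that each locus shows exactly two alleles among the reads and that the minor allele is the tissue-specific (second) allele, inferred from the 3%-25% read fraction rather than from separate genotyping.

```python
from collections import Counter, defaultdict
from typing import NamedTuple, Tuple


class Read(NamedTuple):
    locus: str                   # informative SNP locus spanned by the read
    allele: str                  # base observed at the SNP
    cpg_calls: Tuple[bool, ...]  # methylation call per CpG on the read (True = methylated)


def informative_loci(reads, low=0.03, high=0.25):
    """Block 920 (sketch): {locus: minor allele} where the minor-allele read fraction is in [low, high]."""
    allele_counts = defaultdict(Counter)
    for read in reads:
        allele_counts[read.locus][read.allele] += 1
    result = {}
    for locus, counts in allele_counts.items():
        if len(counts) != 2:
            continue  # require exactly two alleles at the locus
        (_, n_major), (minor, n_minor) = counts.most_common(2)
        if low <= n_minor / (n_major + n_minor) <= high:
            result[locus] = minor
    return result


def tissue_methylation_profile(reads, low=0.03, high=0.25):
    """Blocks 930-950 (sketch): methylation density of reads carrying the tissue-specific allele."""
    specific = informative_loci(reads, low, high)
    methylated, total = Counter(), Counter()
    for read in reads:
        if specific.get(read.locus) == read.allele:
            methylated[read.locus] += sum(read.cpg_calls)
            total[read.locus] += len(read.cpg_calls)
    return {locus: methylated[locus] / total[locus]
            for locus in specific if total[locus] > 0}


reads = (
    [Read("snp1", "A", (True, True))] * 9
    + [Read("snp1", "G", (False, True))]
    + [Read("snp2", "C", (True,))] * 5
    + [Read("snp2", "T", (False,))] * 5
)
print(tissue_methylation_profile(reads))  # {'snp1': 0.5}
```

Here "snp1" qualifies because 1 of its 10 reads (10%) carries the minor allele. "snp2" is excluded because its two alleles are balanced at 50%.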
IV. Using Difference of Plasma and Blood Methylomes
[0134] Above, it was shown that the fetal-specific reads from
plasma correlate to the placental methylome. As the maternal
component of the maternal plasma methylome is primarily contributed
by the blood cells, the difference between the plasma methylome and
blood methylome can be used to determine the placental methylome
for all loci, and not just locations of fetal-specific alleles.
[0135] A. Method
[0136] FIG. 10 is a flowchart illustrating a method 1000 of
determining a first methylation profile from a biological sample of
an organism according to embodiments of the present invention. The
biological sample (e.g., plasma) includes cell-free DNA comprising
a mixture of cell-free DNA originating from a first tissue and from
a second tissue. The first methylation profile corresponds to a
methylation profile of the first tissue (e.g., fetal tissue or
tumor tissue). Method 1000 can provide a deduction of
differentially methylated regions from maternal plasma.
[0137] At block 1010, a biological sample is received. The
biological sample could simply be received at a machine (e.g., a
sequencing machine). The biological sample may be in the form taken
from the organism or may be in a processed form, e.g., the sample
may be plasma that is extracted from a blood sample.
[0138] At block 1020, a second methylation profile corresponding to
DNA of the second tissue is obtained. The second methylation
profile could be read from memory, as it may have been determined
previously. The second methylation profile can be determined from
the second tissue, e.g., a different sample that contains only or
predominantly cells of the second tissue. The second methylation
profile can correspond to a cellular methylation profile and be
obtained from cellular DNA. As another example, the second profile
can be determined from plasma collected before pregnancy or
before the development of cancer, because the plasma methylome of a
non-pregnant person without cancer is very similar to the methylome
of blood cells.
[0139] The second methylation profile can provide a methylation
density at each of a plurality of loci in a genome of the organism.
The methylation density at a particular locus corresponds to a
proportion of DNA of the second tissue that is methylated. In one
embodiment, the methylation density is a CpG methylation density,
where CpG sites associated with the locus are used to determine the
methylation density. If there is one site for a locus, then the
methylation density can equal the methylation index. The
methylation density also corresponds to an unmethylation density as
the two values are complementary.
[0140] In one embodiment, the second methylation profile is
obtained by performing methylation-aware sequencing of cellular DNA
from a sample of the organism. One example of methylation-aware
sequencing includes treating DNA with sodium bisulfite and then
performing DNA sequencing. In another example, the
methylation-aware sequencing can be performed without using sodium
bisulfite, using a single molecule sequencing platform that would
allow the methylation status of DNA molecules (including
N6-methyladenine, 5-methylcytosine and 5-hydroxymethylcytosine) to
be elucidated directly without bisulfite conversion (A. B. Flusberg
et al. 2010 Nat Methods; 7: 461-465; Shim J et al. 2013 Sci Rep;
3:1389. doi: 10.1038/srep01389); or through the immunoprecipitation
of methylated cytosine followed by sequencing; or through the use
of methylation-sensitive restriction enzymes followed by
sequencing. In another embodiment, non-sequencing techniques are
used, such as arrays and digital PCR.
[0141] In another embodiment, the second methylation density of the
second tissue could be obtained previously from control samples of
the subject or from other subjects. The methylation density from
another subject can act as a reference methylation profile having
reference methylation densities. The reference methylation
densities can be determined from multiple samples, where a mean
level (or other statistical value) of the different methylation
densities at a locus can be used as the reference methylation
density at the locus.
[0142] At block 1030, a cell-free methylation profile is determined
from the cell-free DNA of the mixture. The cell-free methylation
profile provides a methylation density at each of the plurality of
loci. The cell-free methylation profile can be determined by
receiving sequence reads from a sequencing of the cell-free DNA,
where the methylation information is obtained with the sequence
reads. The cell-free methylation profile can be determined in a
same manner as the cellular methylome.
[0143] At block 1040, a percentage of the cell-free DNA from the
first tissue in the biological sample is determined. In one
embodiment, the first tissue is fetal tissue, and the corresponding
DNA is fetal DNA. In another embodiment, the first tissue is tumor
tissue, and the corresponding DNA is tumor DNA. The percentage can
be determined in a variety of ways, e.g., using a fetal-specific
allele or a tumor-specific allele. Copy number can also be used to
determine the percentage, e.g., as described in U.S. patent
application Ser. No. 13/801,748 entitled "Mutational Analysis Of
Plasma DNA For Cancer Detection" filed on Mar. 13, 2013, which is
incorporated by reference.
[0144] At block 1050, a plurality of loci for determining the first
methylome are identified. These loci may correspond to each of the
loci used to determine the cell-free methylation profile and the
second methylation profile. Thus, the same plurality of loci may be
used for all three profiles, although more loci may be used to
determine the cell-free methylation profile and the second
methylation profile.
[0145] In some embodiments, loci that were hypermethylated or
hypomethylated in the second methylation profile can be identified,
e.g., using maternal blood cells. To identify the loci that were
hypermethylated in the maternal blood cells, one can scan from one
end of a chromosome for a CpG site with a methylation index ≥ 80%.
One can then search for the next CpG site within the downstream
200-bp region. If the immediately downstream CpG site also has a
methylation index ≥ 80%, the first and the second CpG sites can be
grouped. The grouping can continue until either there is no other
CpG site within the next downstream region of 200 bp, or the
immediately downstream CpG site has a methylation index < 80%. The
region of the grouped CpG sites can be reported as hypermethylated
in maternal blood cells if the region contains at least five
immediately adjacent hypermethylated CpG sites. A similar analysis
can be performed to search for loci that are hypomethylated in
maternal blood cells, using CpG sites with methylation indices
≤ 20%. The methylation densities for the second methylation profile
can be calculated for the short-listed loci and used to deduce the
first methylation profile (e.g., placental tissue methylation
density) of the corresponding loci, e.g., from maternal plasma
bisulfite-sequencing data.
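The scanning and grouping procedure above can be sketched in Python. This is a minimal illustration, not the authors' code: it assumes the CpG sites of one chromosome are supplied as hypothetical `(position, methylation_index)` pairs sorted by position, with methylation indices expressed as fractions (0.80 for 80%). The function names are illustrative.

```python
def find_regions(sites, qualifies, max_gap=200, min_sites=5):
    """Group adjacent CpG sites on one chromosome that all satisfy `qualifies`.

    sites: (position, methylation_index) pairs sorted by position.
    A run is extended while the immediately downstream CpG lies within
    `max_gap` bp and also qualifies. Runs with at least `min_sites` CpGs
    are reported as (start, end, n_sites).
    """
    regions = []
    run = []
    for pos, mi in sites:
        if qualifies(mi) and run and pos - run[-1] <= max_gap:
            run.append(pos)  # extend the current run
            continue
        # The run is broken by a gap > max_gap or a non-qualifying CpG.
        if len(run) >= min_sites:
            regions.append((run[0], run[-1], len(run)))
        run = [pos] if qualifies(mi) else []
    if len(run) >= min_sites:
        regions.append((run[0], run[-1], len(run)))
    return regions


def hypermethylated_regions(sites):
    """Runs of >= 5 adjacent CpGs with methylation index >= 80%."""
    return find_regions(sites, lambda mi: mi >= 0.80)


def hypomethylated_regions(sites):
    """Runs of >= 5 adjacent CpGs with methylation index <= 20%."""
    return find_regions(sites, lambda mi: mi <= 0.20)
```

Because both scans share `find_regions`, the hyper- and hypomethylated searches differ only in their threshold predicate.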
[0146] At block 1060, the first methylation profile of the first
tissue is determined by calculating a differential parameter that
includes a difference between the methylation density of the second
methylation profile and the methylation density of the cell-free
methylation profile for each of the plurality of loci. The
difference is scaled by the percentage.
[0147] In one embodiment, the first methylation density of a locus
in the first (e.g., placental) tissue (D) was deduced using the
equation:
$$D = \mathrm{mbc} - \frac{\mathrm{mbc} - \mathrm{mp}}{f \times \mathrm{CN}} \qquad (1)$$
where mbc denotes the methylation density of the second methylation
profile at a locus (e.g., a short-listed locus as determined in the
maternal blood cell bisulfite-sequencing data); mp denotes the
methylation density of the corresponding locus in the maternal
plasma bisulfite-sequencing data; f represents the percentage of
cell-free DNA from the first tissue (e.g., fractional fetal DNA
concentration), and CN represents copy number at the locus (e.g., a
higher value for amplifications or a lower number for deletions
relative to normal). If there is no amplification or deletion in
the first tissue then CN can be one. For trisomy (or a duplication
of the region in a tumor or a fetus), CN would be 1.5 (as the
increase is from 2 copies to 3 copies) and monosomy would have 0.5.
Higher amplification can increase by increments of 0.5. In this
example, D can correspond to the differential parameter.
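Equation (1) translates directly into code. The sketch below assumes that densities and f are expressed as fractions (e.g., 0.15 for 15%). The function name is hypothetical. The result is not clipped to [0, 1], so noisy measurements can produce deduced values outside that range.

```python
def deduce_tissue_density(mbc, mp, f, cn=1.0):
    """Equation (1): D = mbc - (mbc - mp) / (f * CN).

    mbc: methylation density of the locus in the second (e.g., maternal blood cell) profile
    mp:  methylation density of the same locus in the cell-free (plasma) profile
    f:   fraction of cell-free DNA from the first tissue
    cn:  copy number factor (1 = normal, 1.5 = trisomy/duplication, 0.5 = monosomy)
    """
    if not 0 < f <= 1:
        raise ValueError("f must be a fraction in (0, 1]")
    if cn <= 0:
        raise ValueError("CN must be positive")
    return mbc - (mbc - mp) / (f * cn)
```

For example, with mbc = 0.8, mp = 0.74 and f = 0.2, the deduced density is 0.8 - 0.06 / 0.2 = 0.5. Setting CN = 1.5 for the same locus gives 0.6.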
[0148] At block 1070, the first methylation density is transformed
to obtain a corrected first methylation density of the first
tissue. The transformation can account for fixed differences
between the differential parameters and the actual methylation
profile of the first tissue. For example, the values may differ by
a fixed constant or by a slope. The transformation can be linear or
non-linear.
[0149] In one embodiment, the distribution of the deduced values,
D, was found to be lower than the actual methylation level of the
placental tissue. For example, the deduced values can be linearly
transformed using data from CpG islands, which were genomic
segments that had an overrepresentation of CpG sites. The genomic
positions of CpG islands used in this study were obtained from the
UCSC Genome Browser database (NCBI build 36/hg18) (P. A. Fujita et
al. 2011 Nucleic Acids Res; 39: D876-882). For example, a CpG
island can be defined as a genomic segment with GC content ≥ 50%,
genomic length > 200 bp and a ratio of observed/expected CpG
number ≥ 0.6 (M. Gardiner-Garden et al. 1987 J Mol Biol; 196:
261-282).
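The CpG island definition cited above can be checked with a small function. This sketch assumes the usual Gardiner-Garden and Frommer form of the observed/expected ratio: (number of CpG × sequence length) / (number of C × number of G). The function name is illustrative, and the default thresholds mirror the text.

```python
def is_cpg_island(seq, min_gc=0.50, min_length=200, min_obs_exp=0.6):
    """Return True if `seq` meets the GC-content, length and CpG obs/exp criteria."""
    seq = seq.upper()
    n = len(seq)
    if n <= min_length:  # the length must exceed 200 bp
        return False
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return False  # obs/exp is undefined without both C and G
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_content = (c + g) / n
    obs_exp = cpg * n / (c * g)
    return gc_content >= min_gc and obs_exp >= min_obs_exp
```

A GC-rich sequence lacking CpG dinucleotides (e.g., repeats of "CCAGG") fails the obs/exp test even though its GC content is 80%.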
[0150] In one implementation, to derive the linear transformation
equation, CpG islands with at least 4 CpG sites and an average read
depth ≥ 5 per CpG site in the sequenced samples can be included.
After determining the linear relationships between the methylation
densities of CpG islands in the CVS or term placenta and the
deduced values, D, the following equations were used to determine
the predicted values:
$$\text{First trimester predicted value} = D \times 1.6 + 0.2$$
$$\text{Third trimester predicted value} = D \times 1.2 + 0.05$$
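The linear relationship can be estimated by regressing the tissue methylation densities on the deduced values. The source does not say how the fit was performed, so the sketch below assumes an ordinary least-squares fit. The coefficient pairs are the ones quoted above. The function names are hypothetical.

```python
def fit_linear_transform(deduced, actual):
    """Ordinary least-squares fit of actual = slope * deduced + intercept."""
    n = len(deduced)
    if n < 2 or n != len(actual):
        raise ValueError("need at least two paired values")
    mean_x = sum(deduced) / n
    mean_y = sum(actual) / n
    sxx = sum((x - mean_x) ** 2 for x in deduced)
    if sxx == 0:
        raise ValueError("deduced values must vary")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(deduced, actual))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# (slope, intercept) pairs quoted in the text.
FIRST_TRIMESTER = (1.6, 0.2)
THIRD_TRIMESTER = (1.2, 0.05)


def predicted_value(d, coefficients):
    """Apply a linear transformation to a deduced value D."""
    slope, intercept = coefficients
    return d * slope + intercept
```

In practice `deduced` and `actual` would be the per-island D values and the CVS or term placenta densities for the CpG islands that pass the inclusion criteria.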
[0151] B. Fetal Example
[0152] As mentioned above, method 1000 can be used to deduce a
methylation landscape of the placenta from maternal plasma.
Circulating DNA in plasma originates predominantly from
hematopoietic cells. Still, an unknown proportion of cell-free DNA
is contributed by other internal organs. Moreover,
placenta-derived cell-free DNA accounts for approximately 5-40% of
the total DNA in maternal plasma, with a mean of approximately 15%.
Thus, one can make an assumption that the methylation level in
maternal plasma is equivalent to an existing background methylation
plus a placental contribution during pregnancy, as described
above.
[0153] The maternal plasma methylation level, MP, can be determined
using the following equation:
$$\mathrm{MP} = \mathrm{BKG} \times (1 - f) + \mathrm{PLN} \times f$$
where BKG is the background DNA methylation level in plasma derived
from blood cells and internal organs, PLN is the methylation level
of placenta and f is the fractional fetal DNA concentration in
maternal plasma.
[0154] The methylation level of placenta can theoretically be
deduced by:
$$\mathrm{PLN} = \frac{\mathrm{MP} - \mathrm{BKG} \times (1 - f)}{f} \qquad (2)$$
Equations (1) and (2) are equivalent when CN equals one, D equals
PLN, and BKG equals mbc.
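The following code numerically checks the mixing model and the stated equivalence of Equations (1) and (2) when CN = 1. The function names and input values are illustrative only.

```python
def plasma_methylation(bkg, pln, f):
    """Mixing model: MP = BKG * (1 - f) + PLN * f."""
    return bkg * (1 - f) + pln * f


def deduce_placental_eq2(mp, bkg, f):
    """Equation (2): PLN = (MP - BKG * (1 - f)) / f."""
    return (mp - bkg * (1 - f)) / f


def deduce_eq1(mbc, mp, f, cn=1.0):
    """Equation (1): D = mbc - (mbc - mp) / (f * CN)."""
    return mbc - (mbc - mp) / (f * cn)


# Simulate a plasma level from known (made-up) components, then invert it.
bkg, pln, f = 0.8, 0.3, 0.15
mp = plasma_methylation(bkg, pln, f)
pln_eq2 = deduce_placental_eq2(mp, bkg, f)
pln_eq1 = deduce_eq1(bkg, mp, f)  # mbc plays the role of BKG, CN = 1
```

Both inversions recover the simulated placental level of 0.3, which illustrates that Equation (1) reduces to Equation (2) when CN = 1.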
[0155] The methylation level of maternal blood was taken to
represent the background methylation of maternal plasma. Besides
the loci that were hypermethylated or hypomethylated in maternal
blood cells, we further explored the deduction approach by focusing
on defined regions with clinical relevance, for instance, CpG
islands in the human genome.
[0156] The mean methylation density of a total of 27,458 CpG
islands (NCBI Build36/hg18) on the autosomes and chrX was derived
from the sequencing data of maternal plasma and placenta. Only
those with ≥ 10 CpG sites covered and an average read
depth ≥ 5 per covered CpG site in all analyzed samples,
including the placenta, maternal blood and maternal plasma, were
selected. As a result, 26,698 CpG islands (97.2%) remained as valid
and their methylation level was deduced using the plasma
methylation data and the fractional fetal DNA concentration
according to the above equation.
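The selection criteria can be expressed as a filter. This sketch assumes a hypothetical input layout in which each CpG island maps every analyzed sample to a list of per-CpG read depths, where a depth of 0 means the site was not covered. The function names are illustrative.

```python
def passes_coverage(depths, min_sites=10, min_mean_depth=5.0):
    """True if >= min_sites CpGs are covered and their mean depth is >= min_mean_depth."""
    covered = [d for d in depths if d > 0]
    if len(covered) < max(min_sites, 1):
        return False
    return sum(covered) / len(covered) >= min_mean_depth


def select_valid_islands(islands, min_sites=10, min_mean_depth=5.0):
    """Keep islands that pass the coverage criteria in every analyzed sample.

    islands: {island_id: {sample_name: [read depth per CpG site]}}
    """
    return [
        island_id
        for island_id, per_sample in islands.items()
        if per_sample and all(
            passes_coverage(depths, min_sites, min_mean_depth)
            for depths in per_sample.values()
        )
    ]
```

Requiring every sample (placenta, maternal blood and maternal plasma) to pass means a single poorly covered sample is enough to exclude an island.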
[0157] It was noticed that the distribution of deduced PLN values
was lower than the actual methylation level of CpG islands in the
placental tissue. Thus, in one embodiment, the deduced PLN values,
or simply deduced values (D), were used as an arbitrary unit for
estimating the methylation level of CpG islands in the placenta.
After a linear transformation of the deduced values, their
distribution became more similar to that of the actual dataset. The
transformed deduced values were named methylation predictive values
(MPV) and subsequently used for predicting the methylation level of
genetic loci in the placenta.
[0158] In this example, the CpG islands were classified into 3
categories based on their methylation densities in the placenta:
Low (≤ 0.4), Intermediate (> 0.4 to < 0.8) and High
(≥ 0.8). Using the deduction equation, we calculated the MPV
of the same set of CpG islands and then used the values to classify
them into 3 categories with the same cutoffs. By comparing the
actual and the deduced datasets, we found that 75.1% of the
short-listed CpG islands could be matched correctly to the same
categories in the tissue data according to their MPV. About 22% of
the CpG islands were assigned to groups with a 1-level difference
(high versus intermediate, or intermediate versus low), and less
than 3% were completely misclassified (high versus low) (FIG.
11A). The overall classification performance was also determined:
86.1%, 31.4% and 68.8% of CpG islands with methylation densities
≤ 0.4, > 0.4 to < 0.8 and ≥ 0.8 in the placenta were deduced
correctly to be "Low", "Intermediate" and "High", respectively
(FIG. 11B).
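The three-category scheme and the agreement measures can be sketched as follows. The category boundaries follow the text: Low ≤ 0.4, Intermediate > 0.4 to < 0.8, High ≥ 0.8. The function names and the output layout are assumptions made for illustration.

```python
from collections import Counter

LEVEL = {"Low": 0, "Intermediate": 1, "High": 2}
OUTCOMES = ("correct", "one_level", "misclassified")


def categorize(density):
    """Low: <= 0.4; Intermediate: > 0.4 and < 0.8; High: >= 0.8."""
    if density <= 0.4:
        return "Low"
    if density < 0.8:
        return "Intermediate"
    return "High"


def agreement(actual, predicted):
    """Fractions of exact matches, 1-level differences and High-versus-Low errors."""
    outcome = Counter()
    for a, p in zip(actual, predicted):
        gap = abs(LEVEL[categorize(a)] - LEVEL[categorize(p)])
        outcome[OUTCOMES[gap]] += 1
    n = len(actual)
    return {key: outcome[key] / n for key in OUTCOMES}


def per_category_recall(actual, predicted):
    """For each actual category, the fraction deduced into that same category."""
    hits, totals = Counter(), Counter()
    for a, p in zip(actual, predicted):
        category = categorize(a)
        totals[category] += 1
        hits[category] += categorize(p) == category
    return {category: hits[category] / totals[category] for category in totals}
```

`agreement` corresponds to the breakdown into correct, 1-level and complete misclassifications. `per_category_recall` corresponds to the per-category percentages.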
[0159] FIGS. 11A and 11B show graphs of the performance of the
predicting algorithm using maternal plasma data and fractional
fetal DNA concentration according to embodiments of the present
invention. FIG. 11A is a graph 1100 showing the accuracy of CpG
island classification using the MPV: correct classification (the
deduced category exactly matches the actual dataset); 1-level
difference (the deduced category is 1-level different from the
actual dataset); and misclassification (the deduced category is
opposite to the actual dataset). FIG. 11B is a graph 1150 showing
the proportion of CpG islands classified in each deduced
category.
[0160] Provided that the maternal background methylation is low in
the respective genomic regions, the presence of hypermethylated
placental-derived DNA in the circulation would increase the overall
plasma methylation level to a degree depending on the fractional
fetal DNA concentration. A marked change could be observed when the
fetal DNA released is fully methylated. Conversely, when the
maternal background methylation is high, the degree of change in
the plasma methylation level would be greater if hypomethylated
fetal DNA is released. Therefore, the deduction scheme may be more
practical when the methylation level is deduced for genetic loci
which are known to be distinct between the
maternal background and the placenta, especially for those
hypermethylated and hypomethylated markers in the placenta.
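Rearranging the mixing equation MP = BKG × (1 - f) + PLN × f makes the reasoning in this paragraph explicit. The derivation below is added for illustration and uses only the definitions given above:

```latex
\Delta \mathrm{MP} = \mathrm{MP} - \mathrm{BKG}
  = \mathrm{BKG}\,(1 - f) + \mathrm{PLN}\,f - \mathrm{BKG}
  = f\,(\mathrm{PLN} - \mathrm{BKG}),
\qquad
\mathrm{PLN} = \mathrm{BKG} + \frac{\Delta \mathrm{MP}}{f}.
```

The shift in the plasma level is proportional to the contrast between placenta and background, so loci with a large |PLN - BKG| move the plasma level most. For example, with f = 0.15, BKG = 0.1 and PLN = 0.9, the shift is 0.15 × 0.8 = 0.12. A locus with PLN close to BKG produces almost no shift. The second form also shows that any error in the measured shift is amplified by 1/f in the deduced PLN.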
[0161] FIG. 12A is a table 1200 showing details of 15 selected
genomic loci for methylation prediction according to embodiments of
the present invention. To validate the technique, we selected 15
differentially methylated genomic loci that had been studied
previously. The methylation levels of these regions were deduced
and compared with the results of previous studies of these differentially methylated
genetic loci (R. W. K. Chiu et al. 2007 Am J Pathol; 170: 941-950;
S. S. C. Chim et al. 2008 Clin Chem; 54: 500-511; S. S. C. Chim et
al. 2005 Proc Natl Acad Sci USA; 102: 14753-14758; D. W. Y. Tsui et
al. 2010 PLoS One; 5: e15069).
[0162] FIG. 12B is a graph 1250 showing the deduced categories of
the 15 selected genomic loci together with their corresponding
methylation levels in the placenta. Deduced methylation categories
are: Low, ≤ 0.4; Intermediate, > 0.4 to < 0.8; High, ≥ 0.8.
Table 1200 and graph 1250 show that the methylation levels of these
loci in the placenta could be deduced correctly, with some exceptions:
RASSF1A, CGI009, CGI137 and VAPA. Out of these 4 markers, only
CGI009 showed a marked discrepancy with the actual dataset. The
others were just marginally misclassified.
[0163] In table 1200, "1" refers to the deduced values (D) being
calculated by the equation:
$$D = \frac{\mathrm{MP} - \mathrm{BKG} \times (1 - f)}{f}$$
where f is the fractional fetal DNA concentration. The label "2"
refers to the methylation predictive values (MPV), i.e., the
deduced values linearly transformed using the equation
MPV = D × 1.6 + 0.25. Label "3" refers to the classification cutoffs
for the deduced values: Low, ≤ 0.4; Inter(mediate), > 0.4 to < 0.8;
High, ≥ 0.8. Label "4" refers to the classification cutoffs for the
actual placental dataset: Low, ≤ 0.4; Inter(mediate), > 0.4 to
< 0.8; High, ≥ 0.8.
Label "5" denotes that placental status refers to the methylation
status of placenta relative to that of maternal blood cells.
[0164] C. Calculation of Fractional Concentrations of Fetal DNA
[0165] In one embodiment, when the fetus is male, the percentage of
fetal DNA (i.e., DNA from the first tissue) can be determined using
chromosome Y. The
proportion of chromosome Y (% chrY) sequences in a maternal plasma
sample was a composite of the chromosome Y reads derived from the
male fetus and the number of maternal (female) reads that were
misaligned to chromosome Y (R. W. K. Chiu et al. 2011 BMJ; 342:
c7401). Thus, the relationship between % chrY and the fractional
fetal DNA concentration (f) in the sample can be given by:
$$\%\,\mathrm{chrY} = \%\,\mathrm{chrY}_{\mathrm{male}} \times f + \%\,\mathrm{chrY}_{\mathrm{female}} \times (1 - f)$$
where % chrY_male refers to the proportion of reads aligned to
chromosome Y in a plasma sample containing 100% male DNA, and
% chrY_female refers to the proportion of reads aligned to
chromosome Y in a plasma sample containing 100% female DNA.
[0166] % chrY can be determined from reads that were aligned to
chromosome Y with no mismatches for a sample from a female pregnant
with a male fetus, e.g., where the reads are from
bisulfite-converted samples. The % chrY_male value can be
obtained from the bisulfite-sequencing of two adult male plasma
samples. The % chrY_female value can be obtained from the
bisulfite-sequencing of two non-pregnant adult female plasma
samples.
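Solving the relationship above for f gives f = (% chrY - % chrY_female) / (% chrY_male - % chrY_female). The following minimal sketch uses a hypothetical function name and made-up reference proportions for illustration.

```python
def fetal_fraction_from_chry(pct_chry, pct_chry_male, pct_chry_female):
    """Solve % chrY = % chrY_male * f + % chrY_female * (1 - f) for f.

    All inputs are proportions of reads aligned to chromosome Y.
    The result is not clipped, so noisy inputs can fall outside [0, 1].
    """
    span = pct_chry_male - pct_chry_female
    if span <= 0:
        raise ValueError("% chrY_male must exceed % chrY_female")
    return (pct_chry - pct_chry_female) / span


# Illustrative (not measured) reference proportions.
f = fetal_fraction_from_chry(0.039, pct_chry_male=0.30, pct_chry_female=0.01)
```

Subtracting % chrY_female removes the contribution of maternal reads misaligned to chromosome Y before the remaining signal is scaled to a fraction.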
[0167] In other embodiments, the fetal DNA percentage can be
determined from fetal-specific alleles on an autosome. As another
example, epigenetic markers may be used to determine the fetal DNA
percentage. Other ways of determining the fetal DNA percentage may
also be used.
[0168] D. Method of Using Methylation to Determine Copy Number
[0169] The placental genome is more hypomethylated than the
maternal genome. As discussed above, the methylation of the plasma
of a pregnant woman is dependent on the fractional concentration of
placentally-derived fetal DNA in the maternal plasma. Therefore,
through the analysis of the methylation density of a chromosomal
region, it is possible to detect the difference in the contribution
of fetal tissues to the maternal plasma. For example, in a pregnant
woman carrying a trisomic fetus (e.g. suffering from trisomy 21 or
trisomy 18 or trisomy 13), the fetus would contribute an additional
amount of the DNA from the trisomic chromosome to the maternal
plasma when compared with the disomic chromosomes. In this
situation, the plasma methylation density for the trisomic
chromosome (or any chromosomal region that has an amplification)
would be lower than those for the disomic chromosomes. The degree
of difference can be predicted by mathematical calculation by
taking into account the fractional fetal DNA concentration in the
plasma sample. The higher the fractional fetal DNA concentration in
the plasma sample the larger the difference in methylation density
between the trisomic and disomic chromosomes would be. For regions
having a deletion, the methylation density would be higher.
[0170] From the previous discussion, the plasma methylation density
for a disomic chromosome (MP_Non-aneu) can be calculated as:
$$\mathrm{MP}_{\text{Non-aneu}} = \mathrm{BKG} \times (1 - f) + \mathrm{PLN} \times f,$$
where BKG is the background DNA methylation level in plasma derived
from blood cells and internal organs, PLN is the methylation level
of placenta and f is the fractional fetal DNA concentration in
maternal plasma.
[0171] The plasma methylation density for a trisomic chromosome
(MP_Aneu) can be calculated as:
$$\mathrm{MP}_{\text{Aneu}} = \mathrm{BKG} \times (1 - f) + \mathrm{PLN} \times f \times 1.5,$$
where the factor 1.5 corresponds to the copy number CN, as the
addition of one more chromosome is a 50% increase. The difference
between the trisomic and disomic chromosomes (MP_Diff) would be:
$$\mathrm{MP}_{\text{Diff}} = \mathrm{PLN} \times f \times 0.5.$$
[0172] In one embodiment, a comparison of the methylation density
of the potentially aneuploid chromosome (or chromosomal region) to
one or more other presumed non-aneuploid chromosome(s) or the
overall methylation density of the genome can be used to
effectively normalize the fetal DNA concentration in the plasma
sample. The comparison can be made via a calculation of a parameter
(e.g., involving a ratio or a difference) between the methylation
densities of the two regions to obtain a normalized methylation
density. The comparison can remove the dependence of the resulting
methylation level (e.g., a parameter determined from the two
methylation densities) on the fractional fetal DNA concentration.
[0173] If the methylation density of the potentially aneuploid
chromosome is not normalized to the methylation density of one or
more other chromosome(s), or other parameters that reflect the
fractional concentration of fetal DNA, the fractional concentration
would be a major factor affecting the methylation density in the
plasma. For example, the plasma methylation density of chromosome
21 of a pregnant woman carrying a trisomy 21 fetus with a
fractional fetal DNA concentration of 10% would be the same as that
of a pregnant woman carrying a euploid fetus and the fractional
fetal DNA concentration is 15%, whereas a normalized methylation
density would show a difference.
[0174] In another embodiment, the methylation density of the
potentially aneuploid chromosome can be normalized to the
fractional fetal DNA concentration. For example, the following
equation can be applied to normalize the methylation density:
$$\mathrm{MP}_{\text{Normalized}} = \mathrm{MP}_{\text{non-normalized}} + (\mathrm{BKG} - \mathrm{PLN}) \times f,$$
where MP_Normalized is the methylation density normalized with the
fractional fetal DNA concentration in the plasma,
MP_non-normalized is the measured methylation density, BKG is
the background methylation density from maternal blood cells or
tissues, PLN is the methylation density in the placental tissues,
and f is the fractional fetal DNA concentration. The methylation
densities of BKG and PLN could be based on reference values
previously established from maternal blood cells and placental
tissues obtained from healthy pregnancies. Different genetic and
epigenetic methods can be used for the determination of the
fractional fetal DNA concentration in the plasma sample, for
example by the measurement of the percentage of sequence reads from
the chromosome Y using massively parallel sequencing or PCR on
non-bisulfite-converted DNA.
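A minimal sketch of this normalization follows. It assumes, for illustration only, that the plasma methylation density of a euploid region behaves as a linear two-component mixture, (1 - f) x BKG + f x PLN; under that assumption the normalized value equals BKG for any f, so the dependence on the fractional fetal DNA concentration disappears. The function name and the reference values are hypothetical, and densities are expressed as fractions between 0 and 1.

```python
def normalize_methylation(mp_measured: float, bkg: float, pln: float, f: float) -> float:
    """MP_Normalized = MP_non-normalized + (BKG - PLN) * f (equation of [0174])."""
    return mp_measured + (bkg - pln) * f


# Hypothetical reference values, expressed as fractions (0-1).
bkg, pln, f = 0.80, 0.60, 0.10
# Plasma density of a euploid region under the assumed two-component mixture.
plasma = (1.0 - f) * bkg + f * pln
print(round(normalize_methylation(plasma, bkg, pln, f), 6))  # 0.8
```

A region whose fetal contribution is altered by an aneuploidy would no longer satisfy the mixture relation, and its normalized value would depart from BKG.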
[0175] In one implementation, the normalized methylation density
for a potentially aneuploid chromosome can be compared to a
reference group consisting of pregnant women carrying euploid
fetuses. The mean and SD of the normalized methylation density of
the reference group can be determined. Then the normalized
methylation density of the tested case can be expressed as a
z-score, which indicates the number of SDs from the mean of the
reference group:
$$z\text{-score} = \frac{MP_{\text{Normalized}} - \text{Mean}}{\text{SD}},$$
where $MP_{\text{Normalized}}$ is the normalized methylation density for
the tested case, Mean is the mean of the normalized methylation
density of the reference cases and SD is the standard deviation of
the normalized methylation density of the reference cases. A
cutoff, for example a z-score < -3, can be used to classify whether a
chromosome is significantly hypomethylated and, hence, to determine
the aneuploidy status of the sample.
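The z-score classification can be sketched with the Python standard library. The function names are hypothetical, and the use of the sample standard deviation (`statistics.stdev`) rather than the population SD is an assumption, since the text does not specify which is meant.

```python
from statistics import mean, stdev


def z_score(value: float, reference_values: list[float]) -> float:
    """Number of SDs that `value` lies from the reference-group mean."""
    return (value - mean(reference_values)) / stdev(reference_values)


def is_significantly_hypomethylated(value: float, reference_values: list[float],
                                    cutoff: float = -3.0) -> bool:
    """Apply the example cutoff of [0175]: a z-score below -3."""
    return z_score(value, reference_values) < cutoff
```

`reference_values` would hold the normalized methylation densities of the euploid reference pregnancies; at least two are needed for an SD.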
[0176] In another embodiment, $MP_{\text{Diff}}$ can be used as the
normalized methylation density. In such an embodiment, PLN can be
deduced, e.g., using method 1000. In some implementations, a
reference methylation density (which can be normalized using f) can
be determined from a methylation level of a non-aneuploid region.
For example, the Mean could be determined from one or more
chromosomal regions of the same sample. The cutoff could be scaled
by f, or simply set to a fixed level that is sufficient provided a
minimum fractional concentration exists.
[0177] Accordingly, a comparison of a methylation level for a
region to a cutoff can be accomplished in various ways. The
comparison can involve a normalization (e.g., as described above),
which may be performed equivalently on the methylation level or the
cutoff value, depending on how the values are defined. Thus,
whether the determined methylation level of a region is
statistically different from a reference level (determined from
the same sample or other samples) can be determined in a variety of
ways.
[0178] The above analysis can be applied to the analysis of
chromosomal regions, which can include a whole chromosome or parts
of the chromosome, including contiguous or disjoint subregions of a
chromosome. In one embodiment, the potentially aneuploid chromosome
can be divided into a number of bins. The bins can be of the same
or different sizes. The methylation density of each bin can be
normalized to the fractional concentration of the sample or to the
methylation density of one or more presumed non-aneuploid
chromosome(s) or the overall methylation density of the genome. The
normalized methylation density of each bin can then be compared
with a reference group to determine if it is significantly
hypomethylated. Then the percentage of bins being significantly
hypomethylated can be determined. A cutoff, for example, more than
5%, 10%, 15%, 20% or 30% of the bins being significantly
hypomethylated, can be used to classify the aneuploidy status of the
case.
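The bin-counting rule can be sketched as follows. The function names are hypothetical; the per-bin z-score cutoff of -3 and the 5% bin-fraction cutoff are the example values from the text, and "more than" is implemented as a strict inequality.

```python
def fraction_of_bins_hypomethylated(bin_z_scores: list[float],
                                    z_cutoff: float = -3.0) -> float:
    """Fraction of bins whose normalized-methylation z-score is below z_cutoff."""
    if not bin_z_scores:
        raise ValueError("no bins supplied")
    flagged = sum(1 for z in bin_z_scores if z < z_cutoff)
    return flagged / len(bin_z_scores)


def classify_aneuploid(bin_z_scores: list[float], z_cutoff: float = -3.0,
                       bin_fraction_cutoff: float = 0.05) -> bool:
    """True if more than bin_fraction_cutoff of the bins are significantly hypomethylated."""
    return fraction_of_bins_hypomethylated(bin_z_scores, z_cutoff) > bin_fraction_cutoff
```

Other cutoffs from the text (10%, 15%, 20% or 30%) can be passed through `bin_fraction_cutoff`.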
[0179] When one is testing for an amplification or a deletion, one
can compare the methylation density to a reference methylation
density, which may be specific for a particular region being
tested. Each region may have a different reference methylation
density as methylation can vary from region to region, particularly
depending on the size of the regions (e.g., smaller regions will
show more variation).
[0180] As mentioned above, one or more pregnant women each carrying
a euploid fetus can be used to define the normal range of the
methylation density for a region of interest or a difference in
methylation density between two chromosomal regions. A normal range
can also be determined for the PLN (e.g., by direct measurement or
as deduced by method 1000). In other embodiments, a ratio between
two methylation densities, e.g., of a potentially aneuploid
chromosome and a non-aneuploid chromosome, can be used for the
analysis instead of their difference. This methylation analysis
approach can be combined with sequence read counting approach (R W
K Chiu et al. 2008 Proc Natl Acad Sci USA; 105:20458-20463) and
approaches involving size analysis of plasma DNA (US patent
2011/0276277) to determine or confirm an aneuploidy.
[0181] The use of BKG can account for variations in the background
between samples. For example, one female might have different BKG
methylation levels than another female, but a difference between
the BKG and PLN can be used across samples in such situations. The
cutoff for different chromosomal regions can be different, e.g.,
when a methylation density of one region of the genome differs
relative to another region of the genome.
[0182] This approach can be generalized to detect any chromosomal
aberrations, including deletion and amplification, in the fetal
genome. In addition, the resolution of this analysis can be
adjusted to the desired level, for example, the genome can be
divided into 10 Mb, 5 Mb, 2 Mb, 1 Mb, 500 kb, 100 kb bins. Hence,
this technology can also be used for detecting subchromosomal
duplication or subchromosomal deletion. This technology would thus
allow a prenatal fetal molecular karyotype to be obtained
noninvasively. When used in this manner, this technology can be
used in combination with the noninvasive prenatal testing methods
that are based on the counting of molecules (A. Srinivasan et al.
2013 Am J Hum Genet; 92:167-176). In other embodiments, the size of
the bins need not be identical. For example, the size of the bins
may be adjusted so that each bin contains an identical number of
CpG dinucleotides. In this case, the physical size of the bins
would be different.
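Bins holding an identical number of CpG dinucleotides can be built by grouping consecutive CpG sites, as in the sketch below. The function names are hypothetical and the toy sequence is made up for illustration; in practice the CpG positions would come from the reference genome.

```python
def cpg_positions(sequence: str) -> list[int]:
    """0-based positions of the C in every CpG dinucleotide."""
    seq = sequence.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def equal_cpg_bins(positions: list[int], cpgs_per_bin: int) -> list[tuple[int, int]]:
    """Group consecutive CpG sites into bins of cpgs_per_bin sites each.

    Bins are half-open (start, end) coordinates and the last bin may hold
    fewer sites. Physical bin sizes differ because CpG density varies.
    """
    if cpgs_per_bin <= 0:
        raise ValueError("cpgs_per_bin must be positive")
    ordered = sorted(positions)
    bins = []
    for i in range(0, len(ordered), cpgs_per_bin):
        chunk = ordered[i:i + cpgs_per_bin]
        bins.append((chunk[0], chunk[-1] + 2))  # +2 covers the G of the last CpG
    return bins


seq = "ACGTTCGCGAACG"
print(equal_cpg_bins(cpg_positions(seq), cpgs_per_bin=2))  # [(1, 7), (7, 13)]
```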
[0183] The equation can be rewritten to apply to different types of
chromosome aberrations as
$$MP_{\text{Diff}} = (BKG - PLN) \times f \times 0.5 \times CN.$$
Here CN represents the change in copy number at the affected region.
CN equals 1 for the gain of 1 copy of a chromosome, 2 for the
gain of 2 copies of a chromosome and -1 for the loss of one of the
two homologous chromosomes (e.g., for detecting fetal Turner
syndrome, in which a female fetus has lost one of the X chromosomes,
leading to an XO karyotype). This equation need not be changed when
the size of the bins is changed. However, the sensitivity and
specificity may decrease when a smaller bin size is used because a
smaller number of CpG dinucleotides (or other nucleotide
combinations showing differential methylation between fetal DNA and
maternal DNA) would be present in smaller bins, leading to
increased stochastic variation in the measurement of methylation
densities. In one embodiment, the number of reads required can be
determined by analyzing the coefficient of variation of the
methylation density and the desired level of sensitivity.
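One hedged way to turn the coefficient of variation and a desired sensitivity into a sequencing requirement is sketched below. It assumes (an assumption not stated in the text) that each CpG observation in a bin is an independent Bernoulli draw with methylation probability p, so the SD of the bin's methylation density is sqrt(p(1 - p)/n) and its coefficient of variation is sqrt((1 - p)/(n p)). The expected shift $MP_{\text{Diff}}$ is then required to exceed z such SDs. Biological variation and mapping effects are ignored, and the function names are hypothetical.

```python
import math


def expected_mp_diff(bkg: float, pln: float, f: float, cn: int) -> float:
    """MP_Diff = (BKG - PLN) * f * 0.5 * CN (equation of [0183])."""
    return (bkg - pln) * f * 0.5 * cn


def cpg_observations_needed(p: float, delta: float, z: float = 3.0) -> int:
    """Smallest n with z * sqrt(p * (1 - p) / n) <= |delta|.

    Assumes binomial sampling noise only, so the coefficient of variation of
    the bin's methylation density is sqrt((1 - p) / (n * p)).
    """
    if delta == 0:
        raise ValueError("delta must be non-zero")
    return math.ceil((z / abs(delta)) ** 2 * p * (1.0 - p))
```

The required number of observations grows with the inverse square of the expected shift, which is consistent with the text's point that small bins or low fractional fetal DNA concentrations demand more reads.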
[0184] To demonstrate the feasibility of this approach, we have
analyzed the plasma samples from 9 pregnant women. Five of the
women each carried a euploid fetus and the other four each carried
a trisomy 21 (T21) fetus. Three of the five euploid
pregnancies were randomly selected to form a reference group. The
remaining two euploid pregnancy cases (Eu1 and Eu2) and the four
T21 cases (T21-1, T21-2, T21-3 and T21-4) were analyzed using this
approach to test for a potential T21 status. The plasma DNA was
bisulfite-converted and sequenced using the Illumina HiSeq2000
platform. In one embodiment, the methylation density of individual
chromosomes was calculated. The difference in methylation density
between chromosome 21 and the mean of the other 21 autosomes was
then determined to obtain a normalized methylation density. The
mean and SD of the reference group were used to calculate the
z-scores of the six test cases.
TABLE 1. Using a z-score cutoff of < -3 to classify a sample as T21,
all the euploid and T21 cases were classified correctly.

| Case | Eu1 | Eu2 | T21-1 | T21-2 | T21-3 | T21-4 |
|---|---|---|---|---|---|---|
| z-score for $MP_{\text{Diff}}$ between chr 21 and other autosomes | -1.48 | 1.09 | -4.46 | -5.30 | -8.06 | -5.69 |
[0185] In another embodiment, the genome was divided into 1 Mb bins
and the methylation density of each 1 Mb bin was determined.
The methylation density of all the bins on the potentially
aneuploid chromosome can be normalized with the median methylation
density of all the bins located on the presumed non-aneuploid
chromosomes. In one implementation, for each bin, the difference in
methylation density from the median of the non-aneuploid bins can
be calculated. Then z-score can be calculated for these values
using the mean and SD values of the reference group.
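The bin-wise normalization and per-bin z-scores can be sketched as follows. The function names are hypothetical; using the sample SD (`statistics.stdev`) across the reference pregnancies is an assumption.

```python
from statistics import mean, median, stdev


def bin_differences(test_bins: list[float], non_aneuploid_bins: list[float]) -> list[float]:
    """Subtract the median methylation density of the bins on presumed
    non-aneuploid chromosomes from each bin of the tested chromosome."""
    m = median(non_aneuploid_bins)
    return [d - m for d in test_bins]


def bin_z_scores(sample_diffs: list[float], reference_diffs: list[list[float]]) -> list[float]:
    """reference_diffs[j] holds bin j's differences across the reference pregnancies."""
    if len(sample_diffs) != len(reference_diffs):
        raise ValueError("one list of reference values is needed per bin")
    return [(x - mean(ref)) / stdev(ref) for x, ref in zip(sample_diffs, reference_diffs)]
```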
TABLE 2. Using 5% as a cutoff for the proportion of significantly
hypomethylated bins on chromosome 21, all the cases were classified
correctly for T21 status.

| Case | Eu1 | Eu2 | T21-1 | T21-2 | T21-3 | T21-4 |
|---|---|---|---|---|---|---|
| Percentage of bins on chr 21 with a z-score of $MP_{\text{Diff}}$ < -3 | 0% | 0% | 33.3% | 58.3% | 19.4% | 52.8% |
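The classifications stated for Tables 1 and 2 can be reproduced directly from the tabulated values. The snippet only re-applies the stated cutoffs to the published numbers; it does not recompute the underlying analysis. Treating "5% as a cutoff" as a strict inequality is an assumption that does not affect these values.

```python
# Values transcribed from Tables 1 and 2.
TABLE_1_Z = {"Eu1": -1.48, "Eu2": 1.09, "T21-1": -4.46,
             "T21-2": -5.30, "T21-3": -8.06, "T21-4": -5.69}
TABLE_2_PCT = {"Eu1": 0.0, "Eu2": 0.0, "T21-1": 33.3,
               "T21-2": 58.3, "T21-3": 19.4, "T21-4": 52.8}

chromosome_calls = {c: z < -3 for c, z in TABLE_1_Z.items()}  # z-score cutoff of -3
bin_calls = {c: pct > 5 for c, pct in TABLE_2_PCT.items()}    # 5% bin cutoff

for case in TABLE_1_Z:
    print(case, "T21" if chromosome_calls[case] else "euploid",
          "T21" if bin_calls[case] else "euploid")
```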
[0186] This DNA methylation-based approach for detecting fetal
chromosomal or subchromosomal aberrations can be used in
conjunction with those based on the counting of molecules such as
by sequencing (R. W. Chiu et al. 2008 Proc Natl Acad Sci USA; 105:
20458-20463) or digital PCR (Y. M. Lo et al. 2007 Proc Natl Acad
Sci USA; 104: 13116-13121), or the sizing of DNA molecules (US
Patent Publication 2011/0276277). Such combination (e.g. DNA
methylation plus molecular counting, or DNA methylation plus
sizing, or DNA methylation plus molecular counting plus sizing)
would have a synergistic effect which would be advantageous in a
clinical setting, e.g. improving the sensitivity and specificity.
For example, the number of DNA molecules that would need to be
analyzed, e.g. by sequencing, can be reduced without adversely
impacting the diagnostic accuracy. This feature would allow such
tests to be done more economically. As another example, for a given
number of DNA molecules analyzed, a combined approach would allow
fetal chromosomal or subchromosomal aberrations to be detected at a
lower fractional concentration of fetal DNA.
[0187] FIG. 13 is a flowchart of a method 1300 for detecting a
chromosomal abnormality from a biological sample of an organism.
The biological sample includes cell-free DNA comprising a mixture
of cell-free DNA originating from a first tissue and from a second
tissue. The first tissue may be from a fetus or tumor and the
second tissue may be from a pregnant female or a patient.
[0188] At block 1310, a plurality of DNA molecules from the
biological sample are analyzed. The analysis of a DNA molecule can
include determining a location of the DNA molecule in a genome of
the organism and determining whether the DNA molecule is methylated
at one or more sites. The analysis can be performed by receiving
sequence reads from a methylation-aware sequencing, and thus the
analysis can be performed just on data previously obtained from the
DNA. In other embodiments, the analysis can include the actual
sequencing or other active steps of obtaining the data.
[0189] The determining of a location can include mapping the DNA
molecules (e.g., via sequence reads) to respective parts of the
human genome, e.g., to specific regions. In one implementation, if
a read does not map to a region of interest, then the read can be
ignored.
[0190] At block 1320, a respective number of DNA molecules that are
methylated at the site is determined for each of a plurality of
sites. In one embodiment, the sites are CpG sites, and may be only
certain CpG sites, as selected using one or more criteria mentioned
herein. Determining the number of DNA molecules that are methylated
is equivalent to determining the number that are unmethylated once normalization is
performed using a total number of DNA molecules analyzed at a
particular site, e.g., a total number of sequence reads.
[0191] At block 1330, a first methylation level of a first
chromosomal region is calculated based on the respective numbers of
DNA molecules methylated at sites within the first chromosomal
region. The first chromosomal region can be of any size, e.g.,
sizes mentioned above. The methylation level can account for a
total number of DNA molecules aligned to the first chromosomal
region, e.g., as part of a normalization procedure.
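Blocks 1320 and 1330 can be sketched as counting, for each CpG site, how many mapped molecules are methylated, and then pooling those counts over a region. The input format, one `(chrom, pos, is_methylated)` tuple per CpG observation extracted from methylation-aware reads, and the function names are assumptions for illustration. The region level is computed as methylated observations divided by all observations in the region, which accounts for the total number of molecules aligned there; averaging per-site fractions would be an alternative.

```python
from collections import defaultdict


def count_methylated(calls):
    """calls: iterable of (chrom, pos, is_methylated) CpG observations taken
    from mapped reads. Returns per-site methylated and total counts keyed by
    (chrom, pos)."""
    methylated = defaultdict(int)
    total = defaultdict(int)
    for chrom, pos, is_meth in calls:
        total[(chrom, pos)] += 1
        if is_meth:
            methylated[(chrom, pos)] += 1
    return methylated, total


def region_methylation_level(methylated, total, chrom, start, end):
    """Methylated observations divided by all observations over the sites in
    [start, end) on chrom; sites outside the region are ignored."""
    m = t = 0
    for (c, pos), n in total.items():
        if c == chrom and start <= pos < end:
            t += n
            m += methylated.get((c, pos), 0)
    return m / t if t else float("nan")
```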
[0192] The first chromosomal region may be of any size (e.g., a
whole chromosome) and may be composed of disjoint subregions, i.e.,
subregions are separated from each other. Methylation levels of
each subregion can be determined and then combined, e.g., as an
average or median, to determine a methylation level for the first
chromosomal region.
[0193] At block 1340, the first methylation level is compared to a
cutoff value. The cutoff value may be a reference methylation level
or be related to a reference methylation level (e.g., a specified
distance from a normal level). The cutoff value may be determined
from other pregnant female subjects carrying fetuses without a
chromosomal abnormality for the first chromosomal region, from
samples of individuals without cancer, or from loci of the organism
that are known to not be associated with an aneuploidy (i.e.,
regions that are disomic).
[0194] In one embodiment, the cutoff value can be defined as having
a difference from a reference methylation level of
$(BKG - PLN) \times f \times 0.5 \times CN$, where BKG is the background of the female
(or an average or median from other subjects), f is the
concentration of cell-free DNA originating from the first tissue,
and CN is a copy number being tested. CN is an example of a scale
factor corresponding to a type of abnormality (deletion or
duplication). A cutoff for a CN of 1 can be used to test all
amplifications initially, and then further cutoffs can be used to
determine the degree of amplification. The cutoff value can also be
based on a concentration of cell-free DNA originating from the
first tissue using other formulas.
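Blocks 1340 and 1350 can be sketched with this cutoff. The decision rule below is an interpretation, not stated verbatim in the text: a region is flagged when its methylation level reaches the cutoff in the direction predicted by the sign of the expected shift (with BKG > PLN, a gain lowers the level and a loss raises it). The function names are hypothetical.

```python
def expected_shift(bkg: float, pln: float, f: float, cn: int) -> float:
    """(BKG - PLN) * f * 0.5 * CN, the expected difference from the reference level."""
    return (bkg - pln) * f * 0.5 * cn


def is_abnormal(level: float, reference_level: float, bkg: float, pln: float,
                f: float, cn: int = 1) -> bool:
    """Compare a region's methylation level to a cutoff placed the expected
    shift away from the reference level."""
    shift = expected_shift(bkg, pln, f, cn)
    if shift == 0:
        raise ValueError("expected shift is zero; the comparison cannot discriminate")
    cutoff = reference_level - shift
    # A positive shift predicts a level below the reference; a negative shift
    # (e.g., a loss with BKG > PLN) predicts a level above it.
    return level <= cutoff if shift > 0 else level >= cutoff
```

Using `cn=1` first corresponds to the initial screen for amplifications described above; larger CN values could then be tried to estimate the degree of amplification.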
[0195] At block 1350, a classification of an abnormality for the
first chromosomal region is determined based on the comparison. A
statistically significant difference in levels can indicate
increased risk of the fetus having a chromosomal abnormality. In
various embodiments, the chromosomal abnormality can be trisomy 21,
trisomy 18, trisomy 13, Turner syndrome, or Klinefelter syndrome.
Other examples are a subchromosomal deletion, subchromosomal
duplication, or DiGeorge syndrome.
V. Determination of Markers
[0196] As noted above, certain parts of the fetal genome are
methylated differently than the maternal genome. These differences
can be common across pregnancies. The regions of different
methylation can be used to identify DNA fragments that are from the
fetus.
[0197] A. Method to Determine DMRs from Placental Tissue and
Maternal Tissue
[0198] The placenta has tissue-specific methylation signatures.
Fetal-specific DNA methylation markers have been developed for
maternal plasma detection and for noninvasive prenatal diagnostic
applications based on loci that are differentially methylated
between placental tissues and maternal blood cells (S. S. C. Chim
et al. 2008 Clin Chem; 54: 500-511; E. A. Papageorgiou et al 2009
Am J Pathol; 174: 1609-1618; and T. Chu et al. 2011 PLoS One; 6:
e14723). Embodiments for mining for such differentially methylated
regions (DMRs) on a genome-wide basis are provided.
[0199] FIG. 14 is a flowchart of a method 1400 for identifying
methylation markers by comparing a placental methylation profile to
a maternal methylation profile (e.g., determined from blood cells)
according to embodiments of the present invention. Method 1400 may
also be used to determine markers for a tumor by comparing a tumor
methylation profile to a methylation profile corresponding to
healthy tissue.
[0200] At block 1410, a placental methylome and a blood methylome
are obtained. The placental methylome can be determined from a
placental sample, e.g., CVS or a term placenta. A methylome should be
understood to possibly include methylation densities of only part
of a genome.
[0201] At block 1420, a region is identified that includes a
specified number of sites (e.g., 5 CpG sites) and for which a
sufficient number of reads have been obtained. In one embodiment,
the identification began from one end of each chromosome to locate
the first 500-bp region that contained at least five qualified CpG
sites. A CpG site may be deemed qualified if the site was covered
by at least five sequence reads.
[0202] At block 1430, a placental methylation index and a blood
methylation index are calculated for each site. For example, the
methylation index was calculated individually for all qualified CpG
sites within each 500-bp region.
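Site qualification and the per-site methylation index can be sketched as below. The input format (position mapped to methylated and total read counts) and the function names are hypothetical, and requiring a site to qualify in both the blood and the placental data before it is compared is an assumption of this sketch.

```python
def qualified_methylation_indices(site_counts: dict[int, tuple[int, int]],
                                  min_reads: int = 5) -> dict[int, float]:
    """site_counts maps CpG position -> (methylated reads, total reads).

    Keeps sites covered by at least min_reads reads and returns each site's
    methylation index (methylated reads / total reads)."""
    return {pos: m / t for pos, (m, t) in site_counts.items() if t >= min_reads}


def paired_indices(blood_counts, placenta_counts, min_reads=5):
    """(position, blood index, placenta index) for sites qualified in both tissues."""
    blood = qualified_methylation_indices(blood_counts, min_reads)
    placenta = qualified_methylation_indices(placenta_counts, min_reads)
    shared = sorted(blood.keys() & placenta.keys())
    return [(pos, blood[pos], placenta[pos]) for pos in shared]
```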
[0203] At block 1440, the methylation indices were compared between
the maternal blood cells and the placental sample to determine if
the sets of indices differed from each other. For example,
the methylation indices were compared between the maternal blood
cells and the CVS or the term placenta using, for example, the
Mann-Whitney test. A P-value of, for example, ≤0.01 was
considered statistically significantly different, although other
values may be used, where a lower threshold would reduce false
positive regions.
[0204] In one embodiment, if the number of qualified CpG sites was
less than five or the Mann-Whitney test was non-significant, the
500-bp region was shifted downstream by 100 bp. The region continued
to be shifted downstream until the Mann-Whitney test became
significant for a 500-bp region. The next 500-bp region would then
be considered. If the next region was found to exhibit statistical
significance by the Mann-Whitney test, it would be added to the
current region as long as the combined contiguous region is no
larger than 1,000 bp.
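The sliding-window search can be sketched as follows. The P-value function is passed in as a parameter so that any two-sample test can be used. Starting the scan at the first qualified site rather than at the chromosome end, and the function name, are assumptions of this sketch.

```python
def scan_regions(sites, p_value, window=500, step=100, min_sites=5,
                 alpha=0.01, max_len=1000):
    """sites: (position, blood_index, placenta_index) for qualified CpG sites on
    one chromosome. p_value(blood, placenta) returns a two-sample P-value.
    Returns half-open (start, end) regions that tested significant."""
    sites = sorted(sites)
    if not sites:
        return []
    last = sites[-1][0]

    def significant(start, end):
        in_window = [s for s in sites if start <= s[0] < end]  # O(n) scan; fine for a sketch
        if len(in_window) < min_sites:
            return False
        blood = [s[1] for s in in_window]
        placenta = [s[2] for s in in_window]
        return p_value(blood, placenta) <= alpha

    regions = []
    start = sites[0][0]
    while start <= last:
        if not significant(start, start + window):
            start += step  # slide downstream by 100 bp and retry
            continue
        end = start + window
        # Absorb the next adjacent window while it is significant and the
        # combined region stays within max_len.
        while end + window - start <= max_len and significant(end, end + window):
            end += window
        regions.append((start, end))
        start = end
    return regions
```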
[0205] At block 1450, adjacent regions that were statistically
significantly different (e.g., by the Mann-Whitney test) can be
merged. Note the difference is between the methylation indices for
the two samples. In one embodiment, if the adjacent regions are
within a specified distance (e.g., 1,000 bp) of each other and if
they showed a similar methylation profile then they would be
merged. In one implementation, the similarity of the methylation
profile between adjacent regions can be defined using any of the
following: (1) showing the same trend in the placental tissue with
reference to the maternal blood cells, e.g. both regions were more
methylated in the placental tissues than the blood cells; (2) with
differences in methylation densities of less than 10% for the
adjacent regions in the placental tissue; and (3) with differences
in methylation densities of less than 10% for the adjacent regions
in the maternal blood cells.
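The merging rule of block 1450 can be sketched as below. The text allows similarity to be defined using any of the three criteria; this sketch assumes all three are required together, reads "less than 10%" as an absolute difference of 0.10 in methylation density, and measures the distance between regions as the gap from the end of one to the start of the next. Merged regions are returned as coordinates only, since methylation densities are recalculated for the merged regions afterwards. The function names and dictionary keys are hypothetical.

```python
def similar_profiles(a, b, max_gap=1000, max_delta=0.10):
    """a, b: dicts with 'start', 'end', 'placenta' and 'blood' methylation
    densities (0-1). All three similarity criteria are required here."""
    gap = b["start"] - a["end"]
    same_trend = (a["placenta"] > a["blood"]) == (b["placenta"] > b["blood"])
    close_placenta = abs(a["placenta"] - b["placenta"]) < max_delta
    close_blood = abs(a["blood"] - b["blood"]) < max_delta
    return gap <= max_gap and same_trend and close_placenta and close_blood


def merge_regions(regions, **criteria):
    """Merge runs of adjacent similar regions; returns (start, end) tuples."""
    merged = []
    prev = None
    for r in sorted(regions, key=lambda r: r["start"]):
        if prev is not None and similar_profiles(prev, r, **criteria):
            merged[-1] = (merged[-1][0], max(merged[-1][1], r["end"]))
        else:
            merged.append((r["start"], r["end"]))
        prev = r
    return merged
```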
[0206] At block 1460, methylation densities of the blood methylome
from maternal blood cell DNA and placental sample (e.g., CVS or
term placental tissue) at the regions were calculated. The
methylation densities can be determined as described herein.
[0207] At block 1470, putative DMRs are determined as regions where
the total placental methylation density and the total blood
methylation density for all the sites in the region are
statistically significantly different. In one embodiment, all
qualified CpG sites within a merged region are subjected to a χ²
test. The χ² test assessed whether the number of methylated
cytosines as a proportion of the methylated and unmethylated
cytosines among all the qualified CpG sites within the merged region
was statistically significantly different between the maternal
blood cells and placental tissue. In one implementation, for the χ²
test, a P-value of ≤0.01 may be considered statistically
significant. The merged segments that showed significance by the
χ² test were considered putative DMRs.
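The χ² comparison of pooled methylated and unmethylated cytosine counts can be sketched as a Pearson test on a 2x2 table with one degree of freedom, using the identity P(X > x) = erfc(sqrt(x / 2)) for a χ² variable with 1 df. This sketch applies no Yates continuity correction; `scipy.stats.chi2_contingency`, for example, applies that correction to 2x2 tables by default, so its P-values would differ slightly. The function name is hypothetical.

```python
import math


def chi2_2x2_p(meth_a: int, unmeth_a: int, meth_b: int, unmeth_b: int):
    """Pearson chi-square statistic and P-value (1 df, no Yates correction)
    for methylated vs unmethylated cytosine counts in two tissues."""
    table = [[meth_a, unmeth_a], [meth_b, unmeth_b]]
    row = [sum(r) for r in table]
    col = [meth_a + meth_b, unmeth_a + unmeth_b]
    total = sum(row)
    if 0 in row or 0 in col:
        raise ValueError("every row and column must have a non-zero total")
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    return stat, math.erfc(math.sqrt(stat / 2))
```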
[0208] At block 1480, loci where the methylation densities of the
maternal blood cell DNA were above a high cutoff or below a low
cutoff were identified. In one embodiment, loci were identified
where the methylation densities of the maternal blood cell DNA were
either ≤20% or ≥80%. In other embodiments, bodily
fluids other than maternal blood can be used, including, but not
limited to, saliva, uterine or cervical lavage fluid from the female
genital tract, tears, sweat, and urine.
[0209] A key to the successful development of DNA methylation
markers that are fetal-specific in maternal plasma can be that the
methylation status of the maternal blood cells is either as highly
methylated or as unmethylated as possible. This can reduce (e.g.,
minimize) the chance of having maternal DNA molecules interfering
with the analysis of the placenta-derived fetal DNA molecules which
show an opposite methylation profile. Thus, in one embodiment,
candidate DMRs were selected by further filtering. The candidate
hypermethylated loci (i.e., more methylated in the placenta) were
those that showed methylation densities ≤20% in the maternal blood
cells and at least 20% higher methylation densities in the
placental tissues. The candidate hypomethylated loci were those that
showed methylation densities ≥80% in the maternal blood cells and at
least 20% lower methylation densities in the placental tissues.
Other percentages may be used.
[0210] At block 1490, DMRs were then identified among the subset of
loci where the placental methylation densities are significantly
different from the blood methylation densities by comparing the
difference to a threshold. In one embodiment, the threshold is 20%,
so the methylation densities differed by at least 20% from the
methylation densities of the maternal blood cells. Accordingly, a
difference between placental methylation densities and blood
methylation densities at each identified locus can be calculated.
The difference can be a simple subtraction. In other embodiments,
scaling factors and other functions can be used to determine the
difference (e.g., the difference can be the result of a function
applied to the simple subtraction).
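The locus filter of blocks 1480 and 1490 can be sketched as below, with densities as fractions. It follows the convention used in [0218] and FIG. 16, where placenta-hypermethylated loci are sought among loci with low methylation in maternal blood cells and placenta-hypomethylated loci among loci with high methylation. The difference is taken as a simple subtraction, and the function name is hypothetical.

```python
def classify_locus(blood: float, placenta: float, low: float = 0.20,
                   high: float = 0.80, min_diff: float = 0.20):
    """Return 'hypermethylated' or 'hypomethylated' (placenta relative to
    maternal blood cells), or None if the locus does not qualify."""
    if blood <= low and placenta - blood >= min_diff:
        return "hypermethylated"
    if blood >= high and blood - placenta >= min_diff:
        return "hypomethylated"
    return None
```

Because the filter compares floating-point differences with a threshold, a difference of exactly 20% may fall on either side of the cutoff through rounding; in practice a small tolerance could be added.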
[0211] In one implementation, using this method, 11,729
hypermethylated and 239,747 hypomethylated loci were identified
from the first trimester placental sample. The top 100
hypermethylated loci are listed in table S2A of the appendix. The
top 100 hypomethylated loci are listed in table S2B of the
appendix. The tables list the chromosome, the start and end
location, the size of the region, the methylation density in
maternal blood, the methylation density in the placenta sample, the
P-values (which are all very small), and the methylation
difference. The locations correspond to reference genome hg18,
which can be found at
hgdownload.soe.ucsc.edu/goldenPath/hg18/chromosomes.
[0212] 11,920 hypermethylated and 204,768 hypomethylated loci were
identified from the third trimester placental sample. The top 100
hypermethylated loci for the 3.sup.rd trimester are listed in table
S2C, and the top 100 hypomethylated loci are listed in table S2D.
Thirty-three loci that were previously reported to be
differentially methylated between maternal blood cells and first
trimester placental tissues were used to validate our list of first
trimester candidates. 79% of the 33 loci had been identified as
DMRs using our algorithm.
[0213] FIG. 15A is a table 1500 showing a performance of first
trimester DMR identification algorithm using placental methylome
with reference to 33 previously reported first trimester markers.
In the table, "a" indicates that loci 1 to 15 were previously
described in (R. W. K. Chiu et al. 2007 Am J Pathol; 170:941-950
and S. S. C. Chim et al. 2008 Clin Chem; 54:500-511); loci 16 to 23
were previously described in (K. C. Yuen, thesis 2007, The Chinese
University of Hong Kong, Hong Kong); and loci 24 to 33 were
previously described in (E. A. Papageorgiou et al. 2009 Am J
Pathol; 174:1609-1618). "b" indicates that these data were derived
from the above publications. "c" indicates that methylation
densities of maternal blood cells and chorionic villus sample and
their differences were observed from the sequencing data generated
in the present study but based on the genomic coordinates provided
by the original studies. "d" indicates data on the loci
identified using embodiments of method 1400 on the bisulfite
sequencing data without reference to the publications
cited above by Chiu et al (2007), Chim et al (2008), Yuen (2007)
and Papageorgiou et al (2009). The span of the loci included the
previously reported genomic regions but in general spanned larger
regions. "e" indicates that a candidate DMR was classified as
true-positive (TP) or false-negative (FN) based on the requirement
of observing >0.20 difference between the methylation densities
of the corresponding genome coordinates of the DMRs in maternal
blood cells and chorionic villus sample.
[0214] FIG. 15B is a table 1550 showing a performance of third
trimester DMR identification algorithm using the placental
methylome measured using the placenta sample obtained at delivery.
"a" indicates that the same list of 33 loci as described in FIG.
15A was used. "b" indicates that as the 33 loci were previously
identified from early pregnancy samples, they might not be
applicable to the third trimester data. Hence, the bisulfite
sequencing data generated in the present study on the term
placental tissue based on the genomic coordinates provided by the
original studies were reviewed. A difference of >0.20 in the
methylation densities between the maternal blood cell and term
placental tissue was used to determine if the loci were indeed true
DMRs in the third trimester. "c" indicates data on the
loci identified using method 1400 on the bisulfite sequencing
data without reference to the previously cited publications by
Chiu et al (2007), Chim et al (2008), Yuen (2007) and Papageorgiou
et al (2009). The span of the loci included the previously reported
genomic regions but in general spanned larger regions. "d"
indicates that candidate DMRs that contained loci which qualified
as differentially methylated in the third trimester were classified
as true-positive (TP) or false-negative (FN) based on the
requirement of observing >0.20 difference between the
methylation densities of the corresponding genome coordinates of
the DMRs in maternal blood cells and term placental tissue. For
loci that did not qualify as differentially methylated in the third
trimester, their absence in the DMR list or the presence of a DMR
containing the loci but showing methylation difference of <0.20
was considered as true negative (TN) DMRs.
[0215] B. DMRs from the Maternal Plasma Sequencing Data
[0216] One should be able to identify placental tissue DMRs
directly from the maternal plasma DNA bisulfite-sequencing data
provided that the fractional fetal DNA concentration of the sample
is also known. This is possible because the placenta is the
predominant source of fetal DNA in maternal plasma (S. S. Chim et
al. 2005 Proc Natl Acad Sci USA 102, 14753-14758) and we showed in
this study that the methylation status of fetal-specific DNA in
maternal plasma correlated with the placental methylome.
[0217] Therefore, aspects of method 1400 may be implemented using a
plasma methylome to determine a deduced placental methylome instead
of using a placental sample. Thus, method 1000 and method 1400 can
be combined to determine DMRs. Method 1000 can be used to determine
the predicted values for the placental methylation profile, which
can then be used in method 1400. For this analysis, the example also
focuses on loci that were either ≤20% or ≥80% methylated in the
maternal blood cells.
[0218] In one implementation, to deduce loci that were
hypermethylated in the placental tissues with respect to maternal
blood cells, we selected loci that showed ≤20% methylation
in maternal blood cells and ≥60% methylation according to
the predicted value, with a difference of at least 50% between the
blood cell methylation density and the predicted value. To deduce
loci that were hypomethylated in the placental tissues with respect
to maternal blood cells, we selected loci that showed ≥80%
methylation in maternal blood cells and ≤40% methylation
according to the predicted value, with a difference of at least 50%
between the blood cell methylation density and the predicted
value.
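Method 1000 is described elsewhere in the document and is not reproduced here. As a hedged stand-in, the sketch below assumes that the plasma methylation density at a locus is a two-component mixture, plasma = (1 - f) x blood + f x placenta, which is consistent with the normalization equation in [0174], and inverts it to predict the placental density. Clipping the prediction to [0, 1] is also an assumption of this sketch. The selection thresholds are those stated in [0218]; the function names are hypothetical.

```python
def predicted_placental_density(plasma: float, blood: float, f: float) -> float:
    """Invert plasma = (1 - f) * blood + f * placenta for the placental term,
    clipped to [0, 1] (clipping is an assumption of this sketch)."""
    if not 0 < f <= 1:
        raise ValueError("fractional fetal DNA concentration must be in (0, 1]")
    value = (plasma - (1.0 - f) * blood) / f
    return min(1.0, max(0.0, value))


def deduce_locus(plasma: float, blood: float, f: float):
    """Apply the [0218] criteria; densities are fractions (0-1)."""
    pred = predicted_placental_density(plasma, blood, f)
    if blood <= 0.20 and pred >= 0.60 and pred - blood >= 0.50:
        return "hypermethylated"
    if blood >= 0.80 and pred <= 0.40 and blood - pred >= 0.50:
        return "hypomethylated"
    return None
```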
[0219] FIG. 16 is a table 1600 showing the numbers of loci
predicted to be hypermethylated or hypomethylated based on direct
analysis of the maternal plasma bisulfite-sequencing data. "N/A"
means not applicable. "a" indicates that the search for
hypermethylated loci started from the list of loci showing
methylation densities <20% in the maternal blood cells. "b"
indicates that the search for hypomethylated loci started from the
list of loci showing methylation densities >80% in the maternal
blood cells. "c" indicates that bisulfite-sequencing data from the
chorionic villus sample was used for verifying the first trimester
maternal plasma data, and the term placental tissue was used for
verifying the third trimester maternal plasma data.
[0220] As shown in table 1600, a majority of the noninvasively
deduced loci showed the expected methylation pattern in the tissues
and overlapped with the DMRs mined from the tissue data and
presented in the earlier section. The appendix lists DMRs
identified from the plasma. Table S3A lists the top 100 loci
deduced to be hypermethylated from the first trimester maternal
plasma bisulfite-sequencing data. Table S3B lists the top 100 loci
deduced to be hypomethylated from the first trimester maternal
plasma bisulfite-sequencing data. Table S3C lists the top 100 loci
deduced to be hypermethylated from the third trimester maternal
plasma bisulfite-sequencing data. Table S3D lists the top 100 loci
deduced to be hypomethylated from the third trimester maternal
plasma bisulfite-sequencing data.
[0221] C. Gestational Variation in Placental and Fetal
Methylomes
[0222] The overall proportion of methylated CpGs in the CVS was 55%
while it was 59% for the term placenta (table 100 of FIG. 1). More
hypomethylated DMRs could be identified from CVS than the term
placenta while the number of hypermethylated DMRs was similar for
the two tissues. Thus, it was evident that the CVS was more
hypomethylated than the term placenta. This gestational trend was
also apparent in the maternal plasma data. The proportion of
methylated CpGs among the fetal-specific reads was 47.0% in the
first trimester maternal plasma but was 53.3% in the third
trimester maternal plasma. The numbers of validated hypermethylated
loci were similar in the first (1,457 loci) and third trimester
(1,279 loci) maternal plasma samples but there were substantially
more hypomethylated loci in the first (21,812 loci) than the third
trimester (12,677 loci) samples (table 1600 of FIG. 16).
[0223] D. Use of Markers
[0224] The differentially methylated markers, or DMRs, are useful
in several aspects. The presence of such markers in maternal plasma
indicates and confirms the presence of fetal or placental DNA. This
confirmation can be used as a quality control for noninvasive
prenatal testing. DMRs can serve as generic fetal DNA markers in
maternal plasma and have advantages over markers that rely on
genotyping differences between the mother and fetus, such as
polymorphism based markers or those based on chromosome Y. DMRs are
generic fetal markers that are useful for all pregnancies. The
polymorphism based markers are only applicable to the subset of
pregnancies where the fetus has inherited the marker from its
father and where the mother does not possess this marker in her
genome. In addition, one could measure the fetal DNA concentration
in a maternal plasma sample by quantifying the DNA molecules
originating from those DMRs. By knowing the profile of DMRs
expected for normal pregnancies, pregnancy-associated
complications, particularly those involving placental tissue
changes, could be detected by observing a deviation in the maternal
plasma DMR profile or methylation profile from that expected for
normal pregnancies. Pregnancy-associated complications that involve
placental tissue changes include, but are not limited to, fetal
chromosomal aneuploidies (such as trisomy 21), preeclampsia,
intrauterine growth retardation and preterm labor.
[0225] E. Kits Using Markers
[0226] Embodiments can provide compositions and kits for practicing
the methods described herein and other applicable methods. Kits can
be used for carrying out assays for analyzing fetal DNA, e.g.,
cell-free fetal DNA in maternal plasma. In one embodiment, a kit
can include at least one oligonucleotide useful for specific
hybridization with one or more loci identified herein. A kit can
also include at least one oligonucleotide useful for specific
hybridization with one or more reference loci. In one embodiment,
placental hypermethylated markers are measured. The test locus may
be the methylated DNA in maternal plasma and the reference locus
may be the methylated DNA in maternal plasma. A similar kit could
be composed for analyzing tumor DNA in plasma.
[0227] In some cases, the kits may include at least two
oligonucleotide primers that can be used in the amplification of at
least a section of a target locus (e.g., a locus in the appendix)
and a reference locus. Instead of or in addition to primers, a kit
can include labeled probes for detecting a DNA fragment
corresponding to a target locus and a reference locus. In various
embodiments, one or more oligonucleotides of the kit correspond to
a locus in the tables of the appendix. Typically, the kits also
provide instruction manuals to guide users in analyzing test
samples and assessing the state of physiology or pathology in a
test subject.
[0228] In various embodiments, a kit for analyzing fetal DNA in a
biological sample containing a mixture of fetal DNA and DNA from a
female subject pregnant with a fetus is provided. The kit may
comprise one or more oligonucleotides for specifically hybridizing
to at least a section of a genomic region listed in tables S2A,
S2B, S2C, S2D, S3A, S3B, S3C, and S3D. Thus, any number of
oligonucleotides, drawn from across the tables or from just one table,
may be used. The oligonucleotides may act as primers, and may be
organized as pairs of primers, where a pair corresponds to a
particular region from the tables.
VI. Relationship of Size and Methylation Density
[0229] Plasma DNA molecules are known to exist in circulation in
the form of short molecules, with the majority of molecules about
160 bp in length (Y. M. D. Lo et al. 2010 Sci Transl Med; 2:
61ra91; Y. W. Zheng et al. 2012 Clin Chem; 58: 549-558).
Interestingly, our data revealed a relationship between the
methylation status and the size of plasma DNA molecules. Thus,
plasma DNA fragment length is linked to DNA methylation level. The
characteristic size profiles of plasma DNA molecules suggest that
the majority are associated with mononucleosomes, possibly derived
from enzymatic degradation during apoptosis.
[0230] Circulating DNA is fragmented in nature. In particular,
circulating fetal DNA is shorter than maternally-derived DNA in
maternal plasma samples (KCA Chan et al. 2004 Clin Chem; 50:
88-92). As paired-end alignment enables the size analysis of
bisulfite-treated DNA, one could assess directly if any correlation
exists between the size of plasma DNA molecules and their
respective methylation levels. We explored this in the maternal
plasma as well as a non-pregnant adult female control plasma
sample.
[0231] Paired-end sequencing for both ends of each DNA molecule was
used to analyze each sample in this study. By aligning the pair of
end sequences of each DNA molecule to the reference human genome
and noting the genome coordinates of the extreme ends of the
sequenced reads, one can determine the lengths of the sequenced DNA
molecules. Plasma DNA molecules are naturally fragmented into small
molecules and the sequencing libraries for plasma DNA are typically
prepared without any fragmentation steps. Hence, the lengths
deduced by the sequencing represented the sizes of the original
plasma DNA molecules.
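As an illustration of how the lengths can be deduced from paired-end alignments, the following minimal Python sketch assumes a hypothetical `ReadPair` record holding 0-based, half-open coordinates for the two aligned end reads of one molecule, and ignores clipping, insertions and deletions. The size of the molecule is taken as the distance between the outermost aligned coordinates.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    """Hypothetical record for one properly paired alignment.

    Coordinates are 0-based and half-open (end = one past the last aligned base).
    """
    chrom: str
    read1_start: int
    read1_end: int
    read2_start: int
    read2_end: int


def fragment_length(pair: ReadPair) -> int:
    """Size of the original plasma DNA molecule: the span between the
    outermost aligned coordinates of the two end reads."""
    left = min(pair.read1_start, pair.read2_start)
    right = max(pair.read1_end, pair.read2_end)
    return right - left


# Two 75-bp end reads whose outer ends are 166 bp apart.
pair = ReadPair("chr1", 10_000, 10_075, 10_091, 10_166)
print(fragment_length(pair))  # 166
```

Aligners that write SAM/BAM records report a comparable observed template length in the TLEN field; the sketch above does not depend on any particular aligner.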
[0232] In a previous study, we determined the size profiles of the
fetal and maternal DNA molecules in maternal plasma (Y. M. D. Lo et
al. 2010 Sci Transl Med; 2: 61ra91). We showed that the plasma DNA
molecules had sizes that resembled mononucleosomes and fetal DNA
molecules were shorter than the maternal ones. In this study, we
have determined the relationship between the methylation status of
plasma DNA molecules and their sizes.
[0233] A. Results
[0234] FIG. 17A is a plot 1700 showing size distribution of
maternal plasma, non-pregnant female control plasma, placental and
peripheral blood DNA. For the maternal sample and the non-pregnant
female control plasma, the two bisulfite-treated plasma samples
displayed the same characteristic size distribution as previously
reported (Y. M. D. Lo et al. 2010 Sci Transl Med; 2: 61ra91) with
the most abundant sequences being 166-167 bp in length and a
10-bp periodicity among DNA molecules shorter than 143 bp.
[0235] FIG. 17B is a plot 1750 of size distribution and methylation
profile of maternal plasma, adult female control plasma, placental
tissue and adult female control blood. For DNA molecules of the
same size and containing at least one CpG site, their mean
methylation density was calculated. We then plotted the
relationship between the sizes of the DNA molecules and their
methylation densities. Specifically, the mean methylation density
was determined for each fragment length ranging from 50 bp up to
180 bp for sequenced reads covering at least 1 CpG site.
Interestingly, the methylation density increased with the plasma
DNA size and peaked at around 166-167 bp. This pattern, however,
was not observed in the placenta and control blood DNA samples
which were fragmented using an ultrasonicator system.
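The per-size calculation described above can be sketched in Python as follows. Each molecule is represented, as an assumption for illustration, by its length and its counts of methylated and total CpG calls; the mean is taken over per-molecule densities, which is one possible reading of "mean methylation density" and not necessarily the exact computation used in the study.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (fragment length in bp, methylated CpG calls, total CpG calls) for one molecule
Fragment = Tuple[int, int, int]


def mean_density_by_size(fragments: Iterable[Fragment],
                         min_size: int = 50,
                         max_size: int = 180) -> Dict[int, float]:
    """Mean per-molecule methylation density for each fragment length.

    Only molecules covering at least one CpG site and whose length lies in
    [min_size, max_size] are used.
    """
    density_sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for length, methylated, total in fragments:
        if total < 1 or not min_size <= length <= max_size:
            continue  # no CpG covered, or outside the analysed size window
        density_sums[length] += methylated / total
        counts[length] += 1
    return {size: density_sums[size] / counts[size] for size in sorted(counts)}


reads = [(166, 4, 5), (166, 2, 2), (120, 1, 4), (120, 0, 3), (200, 5, 5), (90, 0, 0)]
print(mean_density_by_size(reads))  # {120: 0.125, 166: 0.9}
```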
[0236] FIG. 18 shows plots of methylation densities and size of
plasma DNA molecules. FIG. 18A is a plot 1800 for the first
trimester maternal plasma. FIG. 18B is a plot 1850 for the third
trimester maternal plasma. Data for all the sequenced reads that
covered at least one CpG site are represented by the blue curve
1805. Data for reads that also contained a fetal-specific SNP
allele are represented by the red curve 1810. Data for reads that
also contained a maternal-specific SNP allele are represented by
the green curve 1815.
[0237] Reads that contained a fetal-specific SNP allele were
considered fetal DNA molecules. Reads that contained a
maternal-specific SNP allele were considered maternal DNA
molecules. In general, DNA molecules with high methylation densities
were longer in size. This trend was present in both the fetal and
maternal DNA molecules in both the first and third trimesters. The
overall sizes of the fetal DNA molecules were shorter than the
maternal ones as previously reported.
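The assignment of reads to fetal or maternal origin can be sketched as below. This minimal Python example assumes that the maternal and fetal genotypes at the SNP are already known (for example, from maternal blood cells and placental tissue) and treats reads carrying an allele shared by mother and fetus as uninformative; the function name and genotype encoding are hypothetical.

```python
from typing import Optional


def origin_from_snp(read_allele: str, maternal_genotype: str,
                    fetal_genotype: str) -> Optional[str]:
    """Assign a read to the fetus or the mother from the allele it carries
    at an informative SNP; genotypes are two-letter strings such as "AC"."""
    maternal = set(maternal_genotype)
    fetal = set(fetal_genotype)
    if read_allele in fetal and read_allele not in maternal:
        return "fetal"      # fetal-specific allele, e.g. mother AA, fetus AC, read C
    if read_allele in maternal and read_allele not in fetal:
        return "maternal"   # maternal-specific allele, e.g. mother AG, fetus AA, read G
    return None             # allele shared by mother and fetus: uninformative


print(origin_from_snp("C", "AA", "AC"))  # fetal
print(origin_from_snp("A", "AA", "AC"))  # None
print(origin_from_snp("G", "AG", "AA"))  # maternal
```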
[0238] FIG. 19A shows a plot 1900 of methylation densities and the
sizes of sequenced reads for an adult non-pregnant female. The
plasma DNA sample from the adult non-pregnant female also showed
the same relationship between the sizes and methylation state of
the DNA molecules. On the other hand, the genomic DNA samples were
fragmented by an ultrasonication step before MPS analysis. As shown
in plot 1900, the data from the blood cell and placental tissue
samples did not reveal the same trend. Since the fragmentation of
these genomic DNA samples is artificial, no relationship between
size and methylation density would be expected. Since the naturally
fragmented DNA molecules in plasma do show a dependence on size, it
can be presumed that lower methylation densities make it more
likely for molecules to break into smaller fragments.
[0239] FIG. 19B is a plot 1950 showing size distribution and
methylation profile of fetal-specific and maternal-specific DNA
molecules in maternal plasma. Fetal-specific and maternal-specific
plasma DNA molecules also exhibited the same correlation between
fragment size and methylation level. Both the fragment length of
placenta-derived and maternal circulating cell-free DNA increased
with the methylation level. Moreover, the distribution of their
methylation status did not overlap with each other, suggesting that
the phenomenon exists irrespective of the original fragment length
of the sources of circulating DNA molecules.
[0240] B. Method
[0241] Accordingly, a size distribution can be used to estimate a
total methylation percentage of a plasma sample. This methylation
measurement can then be tracked during pregnancy or during cancer
treatment by serial measure of the size distributions of the plasma
DNA according to the relationship shown in FIGS. 18A and 18B. The
methylation measurement can also be used to look for increased or
decreased release of DNA from an organ or a tissue of interest. For
example, one can specifically look for DNA methylation signatures
specific to a particular organ (e.g. the liver) and measure the
concentrations of these signatures in plasma. As DNA is released
into plasma when cells die, an increase in levels could mean an
increase in cell death or damage in that particular organ or
tissue. A decrease in level from a particular organ can mean that
treatment to counter damage or pathological processes in that organ
is under control.
[0242] FIG. 20 is a flowchart of a method 2000 for estimating a
methylation level of DNA in a biological sample of an organism
according to embodiments of the present invention. The methylation
level can be estimated for a particular region of a genome or the
entire genome. If a specific region is desired, then DNA fragments
only from that specific region may be used.
[0243] At block 2010, amounts of DNA fragments corresponding to
various sizes are measured. For each size of a plurality of sizes,
an amount of a plurality of DNA fragments from the biological
sample corresponding to the size can be measured. For instance, the
number of DNA fragments having a length of 140 bases may be
measured. The amounts may be saved as a histogram. In one
embodiment, a size of each of the plurality of nucleic acids from
the biological sample is measured, which may be done on an
individual basis (e.g., by single molecule sequencing of a whole
molecule or just ends of the molecule) or on a group basis (e.g.,
via electrophoresis). The sizes may correspond to a range. Thus, an
amount can be for DNA fragments that have a size within a
particular range. When paired-end sequencing is performed, the DNA
fragments (as determined by the paired sequence reads) mapping
(aligning) to a particular region may be used to determine the
methylation level of the region.
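A minimal Python sketch of block 2010 follows, assuming each fragment is given by hypothetical (chromosome, start, end) coordinates derived from paired-end reads. It counts fragments per size or per size range and, when a region is specified, keeps only fragments lying entirely inside that region; this containment rule is an illustrative choice, and overlapping fragments could be handled differently.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# (chromosome, start, end) of one fragment, 0-based half-open coordinates
Interval = Tuple[str, int, int]


def size_histogram(fragments: Iterable[Interval], bin_width: int = 1,
                   region: Optional[Interval] = None) -> Counter:
    """Count fragments per size (bin_width=1) or per size range.

    With bin_width > 1, each fragment is counted under the lower bound of
    its range [k * bin_width, (k + 1) * bin_width). If a region is given,
    only fragments lying entirely inside it are counted.
    """
    histogram: Counter = Counter()
    for chrom, start, end in fragments:
        if region is not None:
            r_chrom, r_start, r_end = region
            if chrom != r_chrom or start < r_start or end > r_end:
                continue  # fragment not contained in the region of interest
        size = end - start
        histogram[(size // bin_width) * bin_width] += 1
    return histogram


frags = [("chr1", 0, 140), ("chr1", 200, 366), ("chr1", 400, 566), ("chr2", 0, 140)]
print(dict(sorted(size_histogram(frags).items())))  # {140: 2, 166: 2}
```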
[0244] At block 2020, a first value of a first parameter is
calculated based on the amounts of DNA fragments at multiple sizes.
In one aspect, the first parameter provides a statistical measure
of a size profile (e.g., a histogram) of DNA fragments in the
biological sample. The parameter may be referred to as a size
parameter since it is determined from the sizes of the plurality of
DNA fragments.
[0245] The first parameter can be of various forms. One parameter
is the percentage of DNA fragments of a particular size or range of
sizes relative to all DNA fragments or relative to DNA fragments of
another size or range. Such a parameter is a number of DNA
fragments at a particular size divided by the total number of
fragments, which may be obtained from a histogram (any data
structure providing absolute or relative counts of fragments at
particular sizes). As another example, a parameter could be a
number of fragments at a particular size or within a particular
range divided by a number of fragments of another size or range.
The division can act as a normalization to account for a different
number of DNA fragments being analyzed for different samples. A
normalization can be accomplished by analyzing a same number of DNA
fragments for each sample, which effectively provides a same result
as dividing by the total number of fragments analyzed. Additional
examples of parameters and about size analysis can be found in U.S.
patent application Ser. No. 13/789,553, which is incorporated by
reference for all purposes.
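The two kinds of size parameter described above, a fraction of all fragments and a ratio between two size ranges, can be computed from such a histogram as in the following Python sketch. The size ranges used in the example (up to 150 bp versus 151-250 bp) are hypothetical and are not values prescribed by this disclosure.

```python
from collections import Counter
from typing import Mapping, Tuple

SizeRange = Tuple[int, int]  # inclusive (low, high) in bp


def count_in_range(histogram: Mapping[int, int], size_range: SizeRange) -> int:
    low, high = size_range
    return sum(n for size, n in histogram.items() if low <= size <= high)


def fraction_in_range(histogram: Mapping[int, int], size_range: SizeRange) -> float:
    """Fragments in size_range divided by all fragments (normalizes for depth)."""
    total = sum(histogram.values())
    if total == 0:
        raise ValueError("empty histogram")
    return count_in_range(histogram, size_range) / total


def ratio_of_ranges(histogram: Mapping[int, int], numerator: SizeRange,
                    denominator: SizeRange) -> float:
    """Fragments in one size range divided by fragments in another range."""
    return count_in_range(histogram, numerator) / count_in_range(histogram, denominator)


hist = Counter({100: 10, 140: 30, 166: 50, 200: 10})
print(fraction_in_range(hist, (0, 150)))            # 0.4
print(ratio_of_ranges(hist, (0, 150), (151, 250)))  # 0.6666666666666666
```

Both forms divide by a count taken from the same sample, so they are insensitive to how many fragments were sequenced in total.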
[0246] At block 2030, the first size value is compared to a
reference size value. The reference size value can be calculated
from DNA fragments of a reference sample. To determine the
reference size values, the methylation profile can be calculated
and quantified for a reference sample, as well as a value of the
first size parameter. Thus, when the first size value is compared
to the reference size value, a methylation level can be
determined.
[0247] At block 2040, the methylation level is estimated based on
the comparison. In one embodiment, one can determine if the first
value of the first parameter is above or below the reference size
value, and thereby determine if the methylation level of the
instant sample is above or below the methylation level corresponding to the
reference size value. In another embodiment, the comparison is
accomplished by inputting the first value into a calibration
function. The calibration function can effectively compare the
first value to calibration values (a set of reference size values)
by identifying the point on a curve corresponding to the first
value. The estimated methylation level is then provided as the
output value of the calibration function.
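One way to realize such a calibration function is piecewise-linear interpolation between reference points, sketched below in Python. The reference points, the use of linear interpolation and the clamping of out-of-range values to the end points are all assumptions for illustration; other curve forms could equally serve as the calibration function.

```python
from bisect import bisect_left
from typing import Callable, List, Tuple


def make_calibration(points: List[Tuple[float, float]]) -> Callable[[float], float]:
    """Build a piecewise-linear calibration function from reference samples.

    points: (size-parameter value, measured methylation level) pairs.
    Values outside the calibrated range are clamped to the end points.
    """
    pts = sorted(points)
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]

    def calibrate(value: float) -> float:
        if value <= xs[0]:
            return ys[0]
        if value >= xs[-1]:
            return ys[-1]
        i = bisect_left(xs, value)  # guarantees xs[i - 1] < value <= xs[i]
        x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
        return y0 + (value - x0) / (x1 - x0) * (y1 - y0)

    return calibrate


# Hypothetical reference data: fraction of short fragments vs. methylation density.
calibrate = make_calibration([(0.30, 0.75), (0.40, 0.70), (0.50, 0.60)])
print(round(calibrate(0.45), 3))  # 0.65
```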
[0248] Accordingly, one can calibrate a size parameter to a
methylation level. For example, a methylation level can be measured
and associated with a particular size parameter for that sample.
Then data points from various samples can be fit to a calibration
function. In one implementation, different calibration functions
can be used for different subsets of DNA. Thus, there may be some
form of calibration based on prior knowledge about the relationship
between methylation and size for a particular subset of DNA. For
example, the calibration for fetal and maternal DNA could be
different.
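Fitting data points to a calibration function, with a separate fit for each subset of DNA, might look like the following Python sketch. It assumes, purely for illustration, a straight-line relationship fitted by ordinary least squares; the subset names and calibration data are hypothetical.

```python
from typing import Dict, List, Tuple


def fit_linear_calibration(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Ordinary least-squares fit of methylation level = intercept + slope * size value.

    Returns (slope, intercept). Needs at least two distinct size values.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("need at least two distinct size-parameter values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_per_subset(data: Dict[str, List[Tuple[float, float]]]) -> Dict[str, Tuple[float, float]]:
    """Fit a separate calibration line for each subset of DNA (e.g. fetal, maternal)."""
    return {subset: fit_linear_calibration(points) for subset, points in data.items()}


# Hypothetical calibration data (size parameter, methylation level) per subset.
fits = fit_per_subset({
    "maternal": [(0.2, 0.80), (0.4, 0.70), (0.6, 0.60)],
    "fetal": [(0.3, 0.60), (0.5, 0.50), (0.7, 0.40)],
})
slope, intercept = fits["maternal"]
print(round(slope, 3), round(intercept, 3))  # -0.5 0.9
```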
[0249] As shown above, the placenta is more hypomethylated when
compared with maternal blood, and thus the fetal DNA is smaller due
to the lower methylation. Accordingly, an average size of the
fragments of a sample (or other statistical value) can be used to
estimate the methylation density. As the fragment sizes can be
measured using paired-end sequencing, rather than the potentially
technically more complex methylation-aware sequencing, this
approach would potentially be cost-effective if used clinically.
This approach can be used for monitoring the methylation changes
associated with the progress of pregnancy, or with
pregnancy-associated disorders such as preeclampsia, preterm labor
and fetal disorders (such as those caused by chromosomal or genetic
abnormalities or intrauterine growth retardation).
[0250] In another embodiment, this approach can be used for
detecting and monitoring cancer. For example, with the successful
treatment of cancer, the methylation profile in plasma or another
bodily fluid as measured using this size-based approach would
change towards that of healthy individuals without cancer.
Conversely, in the event that the cancer is progressing, then the
methylation profile in plasma or another bodily fluid would diverge
from that of healthy individuals without cancer.
[0251] In summary, the hypomethylated molecules were shorter than
the hypermethylated ones in plasma. The same trend was observed in
both the fetal and maternal DNA molecules. Since DNA methylation is
known to influence nucleosome packing, our data suggest that
perhaps the hypomethylated DNA molecules were less densely packed
with histones and were therefore more susceptible to enzymatic
degradation. On the other hand, the data presented in FIGS. 18A and
18B also showed that despite the fetal DNA being much more
hypomethylated than the maternal DNA, the size distributions of
the fetal and maternal DNA do not separate from one another
completely. In FIG. 19B, one can see that even for the same size
category, the methylation levels of fetal- and maternal-specific
reads differ from one another. This observation suggests that the
hypomethylated state of fetal DNA is not the only factor that
accounted for its relative shortness with reference to the maternal
DNA.
VII. Imprinting Status of Gene Loci
[0252] Fetal-derived DNA molecules that share the same genotype as
the mother but carry different epigenetic signatures can be detected
in maternal plasma (L. L. Poon et al. 2002 Clin Chem; 48:
35-41). To demonstrate that the sequencing approach is sensitive in
picking up fetal-derived DNA molecules in maternal plasma, we
applied the same strategy to detect the imprinted fetal alleles in
a maternal plasma sample. Two genomic imprinted regions were
identified: H19 (chr11:1,977,419-1,977,821, NCBI Build36/hg18) and
MEST (chr7:129,917,976-129,920,347, NCBI Build36/hg18). Both of
them contain informative SNPs for differentiation between the
maternal and fetal sequences. For H19, a maternally expressed gene,
the mother was homozygous (A/A) and the fetus was heterozygous
(A/C) for the SNP rs2071094 (chr11:1,977,740) in the region. One of
the maternal A alleles is fully methylated and the other is
unmethylated. In the placenta, however, the A allele is
unmethylated while the paternally inherited C allele is fully
methylated. We detected two methylated reads with the C genotype,
corresponding to the imprinted paternal alleles derived from the
placenta, in maternal plasma.
[0253] MEST, also known as PEG1, is a paternally expressed gene.
Both the mother and the fetus were heterozygous (A/G) for the SNP
rs2301335 (chr7:129,920,062) within the imprinted locus. The G
allele is methylated while the A allele is unmethylated in maternal
blood. The methylation pattern is reversed in the placenta with the
maternal A allele being methylated and the paternal G allele
unmethylated. Three unmethylated G alleles, which were paternally
derived, were detectable in maternal plasma. In contrast, VAV1, a
non-imprinted gene locus on chromosome 19
(chr19:6,723,621-6,724,121), did not display any allelic
methylation pattern in the tissue as well as in the plasma DNA
samples.
[0254] Thus, methylation status can be used to determine which DNA
fragments are from the fetus. For example, just detecting the A
allele in maternal plasma cannot be used as a fetal marker when the
mother is GA heterozygous. But if one distinguishes the methylation
status of the A molecules in plasma, the methylated A molecules are
fetal-specific while the unmethylated A molecules are
maternal-specific, or vice versa.
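The combined use of genotype and methylation status at an imprinted locus can be expressed as a lookup table, as in this Python sketch. The table encodes the MEST pattern described in this section (A unmethylated and G methylated in maternal blood cells, reversed in the placenta); each read is assumed to be already summarized as its SNP allele and a single methylated/unmethylated call, and the names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical lookup for the MEST example: in maternal blood cells the G allele
# is methylated and the A allele unmethylated; in the placenta this is reversed.
MEST_ORIGIN: Dict[Tuple[str, bool], str] = {
    ("A", False): "maternal",
    ("G", True): "maternal",
    ("A", True): "fetal (maternally inherited)",
    ("G", False): "fetal (paternally inherited)",
}


def tally_origins(reads: Iterable[Tuple[str, bool]],
                  origin_table: Dict[Tuple[str, bool], str]) -> Counter:
    """Count reads by origin; each read is (SNP allele, is_methylated)."""
    return Counter(origin_table.get(read, "unassigned") for read in reads)


reads = [("A", True), ("A", False), ("G", False), ("G", True), ("A", True), ("T", False)]
counts = tally_origins(reads, MEST_ORIGIN)
print(counts["fetal (maternally inherited)"], counts["maternal"])  # 2 2
```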
[0255] We next focused on loci that have been reported to
demonstrate genomic imprinting in placental tissues. Based on the
list of loci reported by Woodfine et al. (2011 Epigenetics
Chromatin; 4: 1), we further sorted for those that contained SNPs
within the imprinting control region. Four loci fulfilled the
criteria and they were H19, KCNQ1OT1, MEST and NESP.
[0256] Regarding the reads of the maternal blood cell sample for
H19 and KCNQ1OT1, the maternal reads were homozygous for the SNP
and there were approximately equal proportions of methylated and
unmethylated reads. The CVS and term placental tissue samples
revealed that the fetus was heterozygous for both loci and each
allele was either exclusively methylated or unmethylated, i.e.
showing monoallelic methylation. In the maternal plasma samples,
the paternally inherited fetal DNA molecules were detected for both
loci. For H19, the paternally inherited molecules were represented
by the sequenced reads that contained the fetal-specific allele and
were methylated. For KCNQ1OT1, the paternally inherited molecules
were represented by the sequenced reads that contained the
fetal-specific allele and were unmethylated.
[0257] On the other hand, the mother was heterozygous for both MEST
and NESP. For MEST, both the mother and fetus were GA heterozygotes
for the SNP. However, as evident from the data for the Watson
strand for the maternal blood cells and placental tissue, the
methylation status for the CpGs adjacent to the SNP was opposite in
the mother and fetus. The A-allele was unmethylated in the mother's
DNA but methylated in the fetus's DNA. For MEST, the maternal
allele was methylated. Hence, one could pinpoint that the fetus had
inherited the A-allele from its mother (methylated in the CVS) and
the mother had inherited the A-allele from her father (unmethylated
in the maternal blood cells). Interestingly, in the maternal plasma
samples, all four groups of molecules could be readily
distinguished, including each of the two alleles of the mother and
each of the two alleles of the fetus. Thus, by combining the
genotype information with the methylation status at the imprinted
loci, we could readily distinguish the maternally inherited fetal
DNA molecules from the background maternal DNA molecules (L. L.
Poon et al. 2002).
[0258] This approach could be used to detect uniparental disomy.
For example, if the father of this fetus is known to be homozygous
for the G-allele, the failure to detect the unmethylated G-allele
in maternal plasma signifies the lack of contribution of the
paternal allele. In addition, under such a circumstance, when both
methylated G-allele and methylated A-allele were detected in the
plasma of this pregnancy, it would suggest that the fetus has
heterodisomy from the mother, i.e. inheriting two different alleles
from the mother with no inheritance from the father. Alternatively,
if both methylated A-allele (fetal allele inherited from the
mother) and unmethylated A-allele (maternal allele inherited from
the maternal grandfather) were detected in maternal plasma without
the unmethylated G-allele (paternal allele that should have been
inherited by the fetus), it would suggest that the fetus has
isodisomy from the mother, i.e. inheriting two identical alleles
from the mother with no inheritance from the father.
[0259] For NESP, the mother was a GA heterozygote at the SNP while
the fetus was homozygous for the G-allele. The paternal allele was
methylated for NESP. In the maternal plasma samples, the
paternally-inherited fetal G-alleles that were methylated could be
readily distinguished from the background maternal G-alleles which
were unmethylated.
VIII. Cancer/Donors
[0260] Some embodiments can be used for the detection, screening,
monitoring (e.g. for relapse, remission, or response (e.g. presence
or absence) to treatment), staging, classification (e.g. for aid in
choosing the most appropriate treatment modality) and
prognostication of cancer using methylation analysis of circulating
plasma/serum DNA.
[0261] Cancer DNA is known to demonstrate aberrant DNA methylation
(J. G. Herman et al. 2003 N Engl J Med; 349: 2042-2054). For
example, the CpG island promoters of genes, e.g. tumor suppressor
genes, are hypermethylated while the CpG sites in the gene body are
hypomethylated when compared with non-cancer cells. Provided that
the methylation profile of the cancer cells could be reflected by
the methylation profile of the tumor-derived plasma DNA molecules
using methods herein described, we expect that the overall
methylation profile in plasma of individuals with cancer would
differ from that of healthy individuals without cancer or of
individuals whose cancer had been cured. The types of differences in the methylation profile
could be in terms of quantitative differences in the methylation
densities of the genome and/or methylation densities of segments of
the genomes. For example, due to the general hypomethylated nature
of DNA from cancer tissues (Gama-Sosa M A et al. 1983 Nucleic Acids
Res; 11: 6883-6894), reduction in methylation densities in the
plasma methylome or segments of the genome would be observed in
plasma of cancer patients.
[0262] Qualitative changes in the methylation profile should also
be reflected among the plasma methylome data. For example, plasma
DNA molecules originating from genes that are hypermethylated only
in cancer cells would show hypermethylation in plasma of a cancer
patient when compared with plasma DNA molecules originating from
the same genes but in a sample of a healthy control. Because
aberrant methylation occurs in most cancers, the methods herein
described could be applied to the detection of all forms of
malignancies with aberrant methylation, for example, malignancies
in, but not limited to, the lung, breast, colorectum, prostate,
nasopharynx, stomach, testes, skin, nervous system, bone, ovary,
liver, hematologic tissues, pancreas, uterus, kidney, lymphoid
tissues, etc. The malignancies may be of a variety of histological
subtypes, for example, carcinomas, adenocarcinomas, sarcomas,
fibroadenocarcinoma, neuroendocrine, undifferentiated.
[0263] On the other hand, we expect that tumor-derived DNA
molecules can be distinguished from the background
non-tumor-derived DNA molecules because the overall short size
profile of tumor-derived DNA is accentuated for DNA molecules
originating from loci with tumor-associated aberrant
hypomethylation which would have an additional effect on the size
of the DNA molecule. Also, tumor-derived plasma DNA molecules can
be distinguished from the background non-tumor-derived plasma DNA
molecules using multiple characteristic features that are
associated with tumor DNA, including but not limited to single
nucleotide variants, copy number gains and losses, translocations,
inversions, aberrant hyper- or hypo-methylation and size profiling.
As all of these changes could occur independently, the combined use
of these features may provide additive advantage for the sensitive
and specific detection of cancer DNA in plasma.
[0264] A. Size and Cancer
[0265] The sizes of tumor-derived DNA molecules in plasma also
resemble the sizes of mononucleosomal units and are shorter than
the background non-tumor-derived DNA molecules, which co-exist in
plasma of cancer patients. Size parameters have been shown to be
correlated with cancer, as described in U.S. patent application
Ser. No. 13/789,553, which is incorporated by reference for all
purposes.
[0266] Since both fetal-derived and maternal-derived DNA in plasma
showed a relationship between the size and methylation status of
the molecule, tumor-derived DNA molecules are expected to exhibit
the same trend. For example, the hypomethylated molecules would be
shorter than the hypermethylated molecules in the plasma of cancer
patients or in subjects screened for cancer.
[0267] B. Methylation Densities of Different Tissues in a Cancer
Patient
[0268] In this example, we analyzed the plasma and tissue samples
of a hepatocellular carcinoma (HCC) patient. Blood samples were
collected from the HCC patient before and at 1 week after surgical
resection of the tumor. Plasma and buffy coat were harvested after
centrifugation of the blood samples. The resected tumor and the
adjacent non-tumor liver tissue were collected. The DNA samples
extracted from the plasma and tissue samples were analyzed using
massively parallel sequencing with and without prior bisulfite
treatment. The plasma DNA from four healthy individuals without
cancer was also analyzed as controls. The bisulfite treatment of a
DNA sample would convert the unmethylated cytosine residues to
uracil. In the downstream polymerase chain reaction and sequencing,
these uracil residues would behave as thymidine. On the other hand,
the bisulfite treatment would not convert the methylated cytosine
residues to uracil. After massively parallel sequencing, the
sequencing reads were analyzed by the Methy-Pipe (P. Jiang, et al.
Methy-Pipe: An integrated bioinformatics data analysis pipeline for
whole genome methylome analysis, paper presented at the IEEE
International Conference on Bioinformatics and Biomedicine
Workshops, Hong Kong, 18 to 21 Dec. 2010), to determine the
methylation status of the cytosine residues at all CG dinucleotide
positions, i.e. CpG sites.
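The bisulfite logic above can be illustrated with a minimal Python sketch of per-CpG methylation calling and the resulting methylation density. This is not Methy-Pipe; it assumes, for simplicity, a read aligned to the Watson strand with no insertions or deletions and complete conversion of unmethylated cytosines, and the function names are hypothetical.

```python
from typing import List, Tuple


def call_cpg_methylation(reference: str, read: str, start: int) -> List[Tuple[int, bool]]:
    """Call methylation at reference CpG sites covered by a bisulfite read.

    Assumes the read is aligned to the Watson (forward) strand at 0-based
    offset `start` without insertions or deletions. At a CpG cytosine, a
    C in the read means methylated (protected from conversion) and a T
    means unmethylated (converted to uracil, then read as T).
    """
    calls = []
    for i, base in enumerate(read):
        pos = start + i
        if pos + 1 >= len(reference):
            break
        if reference[pos] == "C" and reference[pos + 1] == "G":
            if base == "C":
                calls.append((pos, True))
            elif base == "T":
                calls.append((pos, False))
            # any other base (sequencing error, SNP) gives no call
    return calls


def methylation_density(calls: List[Tuple[int, bool]]) -> float:
    """Proportion of CpG calls that are methylated."""
    if not calls:
        raise ValueError("no CpG calls")
    return sum(methylated for _, methylated in calls) / len(calls)


calls = call_cpg_methylation("ACGTTCGACG", "ATGTTCGATG", start=0)
print(calls)                                 # [(1, False), (5, True), (8, False)]
print(round(methylation_density(calls), 3))  # 0.333
```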
[0269] FIG. 21A is a table 2100 showing the methylation densities
of the pre-operative plasma and the tissue samples of an HCC
patient. The CpG methylation density for the regions of interest
(e.g. CpG sites, promoter, or repeat regions etc.) refers to the
proportion of reads showing CpG methylation over the total number
of reads covering genomic CpG dinucleotides. The methylation
densities of the buffy coat and the non-tumoral liver tissue are
similar. The overall methylation density of the tumor tissue, based
on data from all autosomes, was 25% lower than those of the buffy
coat and the non-tumoral liver tissue. The hypomethylation was
consistent across each individual chromosome. The methylation
density of the plasma was between the values of the non-malignant
tissues and the cancer tissues. This observation is consistent with
the fact that both cancer and non-cancer tissues would contribute
to the circulating DNA of a cancer patient. It has been shown that
the hematopoietic system is the main source of the circulating DNA
in individuals without an active malignant condition (Y. Y. Lui, et
al. 2002 Clin Chem; 48: 421-7). We therefore also analyzed plasma
samples obtained from four healthy controls. The number of sequence
reads and the sequencing depth achieved per sample are shown in
table 2150 of FIG. 21B.
[0270] FIG. 22 is a table 220 showing that the methylation densities in
the autosomes ranged from 71.2% to 72.5% in the plasma samples of
the healthy controls. These data showed the expected level of DNA
methylation in plasma samples obtained from individuals without a
source of tumor DNA. In a cancer patient, the tumor-tissue would
also release DNA into the circulation (K. C. Chan et al. 2013 Clin
Chem; 59: 211-224; R. J. Leary et al. 2012 Sci Transl Med; 4:
162ra154). Due to the hypomethylated nature of the HCC tumor, the
presence of both tumor- and non-tumor-derived DNA in the
pre-operative plasma of the patient resulted in a reduction in the
methylation density when compared with plasma levels of healthy
controls. In fact, the methylation density of the pre-operative
plasma sample was between the methylation densities of the tumor
tissue and the plasma of the healthy controls. This is because
the methylation level of the plasma DNA of cancer patients
would be influenced by the degree of aberrant
methylation, hypomethylation in this case, of the tumor tissue and
the fractional concentration of the tumor-derived DNA in the
circulation. A lower methylation density of the tumor tissue and a
higher fractional concentration of tumor-derived DNA in the
circulation would lead to a lower methylation density of the plasma
DNA in a cancer patient. Most tumors are reported to show global
hypomethylation (J. G. Herman et al. 2003 N Engl J Med; 349:
2042-2054; Gama-Sosa M A et al. 1983 Nucleic Acids Res; 11:
6883-6894). Thus, the current observations seen in the HCC samples
should also be applicable to other types of tumors.
[0271] In one embodiment, the methylation density of the plasma DNA
can be used to determine the fractional concentration of
tumor-derived DNA in a plasma/serum sample when the methylation
level of the tumor tissue is known. The methylation level, e.g.
methylation density, of the tumor tissue can be obtained if the
tumor sample is available or a biopsy of the tumor is available. In
another embodiment, the information regarding the methylation level
of the tumor tissue can be obtained from survey of the methylation
level in a group of tumors of a similar type and this information
(e.g. a mean level or a median level) is applied to the patient to
be analyzed using the technology described in this invention. The
methylation level of the tumor tissue can be determined by the
analysis of the tumor tissue of the patient or inferred from the
analysis of the tumor tissues of other patients with the same or a
similar cancer type. The methylation of tumor tissues can be
determined using a range of methylation-aware platforms, including
but not limited to massively parallel sequencing, single-molecule
sequencing, microarrays (such as methylated cytosine
immunoprecipitation or methylation-aware restriction enzyme
digestion followed by microarray analysis, or oligonucleotide
arrays), or mass spectrometry (such as EpiTYPER analysis from
Sequenom, Inc.). When the methylation level of a tumor is known,
the fractional concentration of tumor DNA in the plasma of cancer
patients could be calculated after plasma methylome analysis.
[0272] The relationship of the plasma methylation level, P, to the
fractional tumor DNA concentration, f, and the tumor tissue
methylation level, TUM, can be described as:
$$P = \mathrm{BKG} \times (1 - f) + \mathrm{TUM} \times f,$$
where BKG is the background DNA methylation level in plasma derived
from blood cells and other internal organs. For example, the overall
methylation density of all autosomes was shown to be 42.9% in the
tumor biopsy tissue obtained from this HCC patient, i.e. the TUM
value for this case. The mean methylation density of the plasma
samples from the four healthy controls was 71.6%, i.e. the BKG value
of this case. The plasma methylation density for the pre-operative
plasma was 59.7%. Rearranging the relationship gives
$f = (\mathrm{BKG} - P)/(\mathrm{BKG} - \mathrm{TUM}) = (71.6\% - 59.7\%)/(71.6\% - 42.9\%)$,
so f is estimated to be 41.5%.
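The mixing relationship in [0272] can be solved for f in a few lines. The following is a minimal Python sketch, not the inventors' implementation; the function name `tumor_fraction` is hypothetical, and it assumes that P, BKG and TUM are all expressed on the same scale (percent methylation here). It reproduces the HCC example above.

```python
def tumor_fraction(p: float, bkg: float, tum: float) -> float:
    """Estimate the fractional tumor DNA concentration f (hypothetical helper).

    Rearranges P = BKG * (1 - f) + TUM * f into f = (BKG - P) / (BKG - TUM).
    All three levels must use the same units (e.g. percent methylation).
    """
    if bkg == tum:
        # Tumor and background methylation are indistinguishable; f is undefined.
        raise ValueError("BKG and TUM must differ to estimate f")
    return (bkg - p) / (bkg - tum)


# HCC example from the text: TUM = 42.9%, BKG = 71.6%, pre-operative P = 59.7%.
f = tumor_fraction(p=59.7, bkg=71.6, tum=42.9)
print(f"f = {f:.1%}")
```

The estimate is only meaningful when TUM differs clearly from BKG. As the gap shrinks, small measurement errors in P are amplified in f.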
[0273] In another embodiment, the methylation level of the tumor
tissue can be estimated noninvasively based on the plasma methylome
data when the fractional concentration of the tumor-derived DNA in
the plasma sample is known. The fractional concentration of the
tumor-derived DNA in the plasma sample can be determined by other
genetic analysis, for example the genomewide analysis of allelic
loss (GAAL) and the analysis of single nucleotide mutations as
previously described (U.S. patent application Ser. No. 13/308,473;
Chan K C et al. 2013 Clin Chem; 59: 211-24). The calculation is
based on the same relationship described above except that in this
embodiment, the value of f is known and the value of TUM becomes the
unknown, i.e. $\mathrm{TUM} = [P - \mathrm{BKG} \times (1 - f)]/f$. The
deduction can be performed for the whole genome or for parts of the
genome, similar to the approach described for determining the
placental tissue methylation level from maternal plasma data.
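The inverse calculation in [0273] can be applied to the whole genome or to each genomic bin separately. The sketch below assumes that f has already been measured by an independent method (such as GAAL or mutation analysis, as the text describes), and that P and BKG are supplied as parallel lists. The function name `tumor_methylation` is hypothetical.

```python
from typing import List, Sequence


def tumor_methylation(p: Sequence[float], bkg: Sequence[float], f: float) -> List[float]:
    """Deduce TUM for each region from plasma (P) and background (BKG) levels.

    Rearranges P = BKG * (1 - f) + TUM * f into TUM = (P - BKG * (1 - f)) / f.
    p and bkg are parallel sequences, e.g. one entry for the whole genome or
    one entry per bin.
    """
    if not 0.0 < f <= 1.0:
        raise ValueError("f must lie in (0, 1]")
    if len(p) != len(bkg):
        raise ValueError("p and bkg must have the same length")
    return [(p_i - b_i * (1.0 - f)) / f for p_i, b_i in zip(p, bkg)]


# Round trip on the genome-wide HCC values: with f = 11.9 / 28.7 the
# deduced tumor methylation density should come back to about 42.9%.
print(tumor_methylation([59.7], [71.6], f=11.9 / 28.7))
```

Because the result is divided by f, low tumor fractions inflate noise in the deduced TUM values. Per-bin deductions are therefore most reliable when f is large.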
[0274] In another embodiment, one can use the inter-bin variation
or profile in the methylation densities to differentiate subjects
with cancer and those without cancer. The resolution of the
methylation analysis can be further increased by dividing the
genome into bins of a particular size, e.g., 1 Mb. In such an
embodiment, the methylation density of each 1 Mb bin was calculated
for the collected samples, e.g., buffy coat, the resected HCC
tissue, the non-tumoral liver tissue adjacent to the tumor and the
plasma collected before and after tumor resection. In another
embodiment, the bin sizes do not need to be kept constant. In one
implementation, the number of CpG sites is kept constant within
each bin while the bin itself can vary in size.
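Both binning schemes in [0274] can be sketched in Python. This sketch makes several assumptions: per-CpG counts of methylated and total reads have already been derived from methylation-aware (e.g. bisulfite) sequencing; the input covers a single chromosome; and the methylation density of a bin is taken to be methylated CpG calls divided by all CpG calls in that bin. The type and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One CpG site on one chromosome: (position, methylated read count, total read count).
SiteCall = Tuple[int, int, int]


def density_fixed_bins(sites: List[SiteCall], bin_size: int = 1_000_000) -> Dict[int, float]:
    """Methylation density per fixed-size bin, keyed by bin index (pos // bin_size)."""
    methylated: Dict[int, int] = defaultdict(int)
    covered: Dict[int, int] = defaultdict(int)
    for pos, n_meth, n_total in sites:
        b = pos // bin_size
        methylated[b] += n_meth
        covered[b] += n_total
    return {b: methylated[b] / covered[b] for b in sorted(covered) if covered[b] > 0}


def density_constant_cpg_bins(sites: List[SiteCall], cpgs_per_bin: int) -> List[Tuple[int, int, float]]:
    """Bins holding a fixed number of consecutive CpG sites (the last bin may hold fewer).

    Returns (first position, last position, methylation density) for each bin,
    so the genomic span of a bin varies with the local CpG density.
    """
    if cpgs_per_bin < 1:
        raise ValueError("cpgs_per_bin must be >= 1")
    ordered = sorted(sites)
    bins = []
    for i in range(0, len(ordered), cpgs_per_bin):
        chunk = ordered[i:i + cpgs_per_bin]
        n_meth = sum(site[1] for site in chunk)
        n_total = sum(site[2] for site in chunk)
        if n_total > 0:
            bins.append((chunk[0][0], chunk[-1][0], n_meth / n_total))
    return bins
```

Bins with a constant number of CpG sites give each density estimate a comparable number of informative sites. The trade-off is that bins become long in CpG-poor regions.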
[0275] FIGS. 23A and 23B show the methylation densities of buffy coat,
tumor tissue, non-tumoral liver tissue, the pre-operative plasma
and post-operative plasma of the HCC patient. FIG. 23A is a plot
2300 of results for chromosome 1. FIG. 23B is a plot 2350 of
results for chromosome 2.
[0276] For most of the 1 Mb windows, the methylation densities for
the buffy coat and the non-tumoral liver tissue adjacent to the
tumor were similar whereas those of the tumor tissues were lower.
Similar to the results in the whole chromosome analyses as shown in
Table 1, the methylation densities of the pre-operative plasma lie
between those of the tumor and the non-malignant tissues. The
methylation densities of the interrogated genomic regions in the
tumor tissues could be deduced using the methylation data of the
pre-operative plasma and the fractional tumor DNA concentration.
The method is the same as that described above using the methylation
density values of all the autosomes; that is, the deduction of the
tumor methylation level described above can also be performed using
this higher-resolution methylation data of the plasma DNA. Other bin sizes,
such as 300 kb, 500 kb, 2 Mb, 3 Mb, 5 Mb or more than 5 Mb can also
be used. In one embodiment, the bin sizes do not need to be kept
constant. In one implementation, the number of CpG sites is kept
constant within each bin while the bin itself can vary in size.
[0277] C. Comparison of Plasma Methylation Density Between the
Cancer Patient and Healthy Individuals
[0278] As shown in 2100, the methylation densities of the
pre-operative plasma DNA were lower than those of the non-malignant
tissues in the cancer patient. This is likely to result from the
presence of DNA from the tumor tissue which was hypomethylated.
This lower plasma DNA methylation density can potentially be used
as a biomarker for the detection and monitoring of cancer. For
cancer monitoring, if a cancer is progressing, then there will be
an increased amount of cancer-derived DNA in plasma with time. In
this example, an increased amount of circulating cancer-derived DNA
in plasma will lead to a further reduction in the plasma DNA
methylation density on a genome wide level.
[0279] Conversely, if a cancer responds to treatment, then the
amount of cancer-derived DNA in plasma will decrease with time. In
this example, a decrease in the amount of cancer-derived DNA in
plasma will lead to an increase in the plasma DNA methylation
density. For example, if a lung cancer patient with epidermal
growth factor receptor mutation has been treated with a targeted
therapy, e.g. tyrosine kinase inhibition, then an increase in
plasma DNA methylation density would signify a response.
Subsequently, the emergence of a tumor clone resistant to tyrosine
kinase inhibition would be associated with a decrease in plasma DNA
methylation density which would indicate a relapse.
[0280] Plasma methylation density measurements can be performed
serially and the rate of change of such measurements can be
calculated and used to predict or correlate with clinical
progression or remission or prognosis. For selected genomic loci
which are hypermethylated in cancer tissues but hypomethylated in
normal tissues, e.g. the promoter regions of a number of tumor
suppressor genes, the directions of change associated with cancer
progression and with a favorable response to treatment will be
opposite to the patterns described above.
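The text does not specify how the rate of change in [0280] is calculated. The sketch below assumes an ordinary least-squares slope of methylation density against sampling time, which is one reasonable choice among several. The function names, the `tolerance` parameter and the toy time series are hypothetical.

```python
from typing import Sequence


def rate_of_change(times: Sequence[float], densities: Sequence[float]) -> float:
    """Least-squares slope of plasma methylation density versus time (e.g. % per day)."""
    n = len(times)
    if n != len(densities) or n < 2:
        raise ValueError("need at least two paired measurements")
    mean_t = sum(times) / n
    mean_d = sum(densities) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("measurements must be taken at different times")
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(times, densities))
    return sxy / sxx


def interpret_trend(slope: float, tumor_hypomethylated: bool = True, tolerance: float = 0.0) -> str:
    """Map a slope onto the directions described in the text.

    For hypomethylated tumor DNA, a rising plasma density implies less tumor DNA;
    for loci hypermethylated in the tumor, the direction is reversed.
    """
    if abs(slope) <= tolerance:
        return "no clear change"
    rising = slope > 0
    return "consistent with response" if rising == tumor_hypomethylated else "consistent with progression"


# Toy series (days, % methylation), not study data.
slope = rate_of_change([0, 30, 60], [60.0, 63.0, 66.0])
print(slope, interpret_trend(slope))
```

In practice, `tolerance` would need to reflect the measurement variability seen in repeated samples from stable individuals.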
[0281] To demonstrate the feasibility of this approach, we compared
the DNA methylation densities of plasma samples collected from the
cancer patient before and after surgical removal of the tumor with
plasma DNA obtained from four healthy control subjects.
[0282] Table 2200 shows the DNA methylation densities of each
autosome and the combined values of all autosomes of the
pre-operative and post-operative plasma samples of the cancer
patient and those of the four healthy control subjects. For all
chromosomes, the methylation density of the pre-operative plasma
DNA sample was lower than those of the post-operative sample and the
plasma samples from the four healthy subjects. The difference in the
plasma DNA methylation densities between the pre-operative and
post-operative samples provided supportive evidence that the lower
methylation densities in the pre-operative plasma sample were due
to the presence of DNA from the HCC tumor.
[0283] The reversal of the DNA methylation densities in the
post-operative plasma sample to levels similar to those of the plasma
samples of the healthy controls suggested that much of the
tumor-derived DNA had disappeared due to the surgical removal of the
source, i.e. the tumor. These data suggest that the methylation
density of the pre-operative plasma, as determined using data from
large genomic regions such as all autosomes or individual
chromosomes, was sufficiently lower than that of the healthy
controls to allow the identification, i.e. diagnosis or screening,
of the test case as having cancer.
[0284] The data of the pre-operative plasma also showed much lower
methylation level than that of the post-operative plasma indicating
that the plasma methylation level could also be used to monitor the
tumor load, hence prognosticate and monitor the progress of cancer
in the patient. Reference values can be determined from plasma of
healthy controls or persons at risk for the cancer but currently
without cancer. Persons at risk for HCC include those with chronic
hepatitis B or hepatitis C infection, those with hemochromatosis,
and those with liver cirrhosis.
[0285] Plasma methylation density values beyond, for example lower
than, a defined cutoff based on the reference values can be used to
assess if a nonpregnant person's plasma has tumor DNA or not. To
detect the presence of hypomethylated circulating tumor DNA, the
cutoff can be defined as lower than the 5.sup.th or 1.sup.st
percentiles of the values of the control population, or based on a
number of standard deviations, for example, 2 or 3 standard
deviations (SDs), below the mean methylation density values of the
controls, or based on determining a multiple of the median (MoM).
For hypermethylated tumor DNA, the cutoff can be defined as higher
than the 95.sup.th or 99.sup.th percentile of the values of the
control population, or based on a number of standard deviations,
for example, 2 or 3 SDs, above the mean methylation density values
of the controls, or based on determining a multiple of the median
(MoM). In one embodiment, the control population is matched in age
to the test subject. The age matching does not need to be exact and
can be performed in age bands (e.g. 30 to 40 years, for a test
subject of 35 years).
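The cutoff options in [0285] can be computed from a set of control values as shown below. This is a minimal sketch using Python's standard `statistics` module. The function names and the four control values are hypothetical. With only a handful of controls, the percentile cutoffs are interpolated and not meaningful; a real reference population would need to be much larger.

```python
import statistics
from typing import Dict, Sequence


def reference_cutoffs(controls: Sequence[float], n_sd: float = 3.0) -> Dict[str, float]:
    """Candidate cutoffs from control methylation densities (same units as the input)."""
    mean = statistics.mean(controls)
    sd = statistics.stdev(controls)  # sample SD; needs at least two controls
    # 99 cut points; index k - 1 holds the k-th percentile.
    pct = statistics.quantiles(controls, n=100, method="inclusive")
    return {
        "mean_minus_sd": mean - n_sd * sd,  # for hypomethylated tumor DNA
        "mean_plus_sd": mean + n_sd * sd,   # for hypermethylated tumor DNA
        "p1": pct[0],
        "p5": pct[4],
        "p95": pct[94],
        "p99": pct[98],
    }


def multiple_of_median(value: float, controls: Sequence[float]) -> float:
    """Express a test value as a multiple of the control median (MoM)."""
    return value / statistics.median(controls)


controls = [70.0, 71.0, 72.0, 73.0]  # toy control values (%), not study data
cut = reference_cutoffs(controls)
test_value = 59.7
print(test_value < cut["mean_minus_sd"], round(multiple_of_median(test_value, controls), 3))
```

Age-matched reference groups, as described above, would simply mean passing a different `controls` list for each age band.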
[0286] We next compared the methylation densities of 1 Mb bins
between the plasma samples of the cancer patient and the four
control subjects. For illustration purposes, the results of
chromosome 1 are shown.
[0287] FIG. 24A is a plot 2400 showing the methylation densities of
the pre-operative plasma from the HCC patient. FIG. 24B is a plot
2450 showing the methylation densities of the post-operative plasma
from the HCC patient. The blue dots represent the results of the
control subjects; the red dots represent the results of the plasma
sample of the HCC patient.
[0288] As shown in FIG. 24A, the methylation densities of the
pre-operative plasma from the HCC patient were lower than those of
the control subjects for most of the bins. Similar patterns were
observed for other chromosomes. As shown in FIG. 24B, the
methylation densities of the post-operative plasma from the HCC
patient were similar to those of the control subjects for most of
the bins. Similar patterns were observed for other chromosomes.
[0289] To assess whether a tested subject has cancer, the result
of the tested subject would be compared to the values of a
reference group. In one embodiment, the reference group can
consist of a number of healthy subjects. In another embodiment,
the reference group can consist of subjects with non-malignant
conditions, for example, chronic hepatitis B infection or
cirrhosis. The difference in the methylation densities between the
tested subject and the reference group can then be quantified.
[0290] In one embodiment, a reference range can be derived from the
values of the control group. Then deviations in the result of the
tested subject from the upper or lower limits of the reference
group can be used to determine if the subject has a tumor. This
quantity would be affected by the fractional concentration of
tumor-derived DNA in the plasma and the difference in the level of
methylation between malignant and non-malignant tissues. Higher
fractional concentration of tumor-derived DNA in plasma would lead
to larger methylation density differences between the test plasma
sample and the controls. A larger difference between the
methylation levels of the malignant and non-malignant tissues is
also associated with larger methylation density differences between
the test plasma sample and the controls. In yet another embodiment,
different reference groups are chosen for test subjects of
different age ranges.
[0291] In another embodiment, the mean and SD of the methylation
densities of the four control subjects were calculated for each 1
Mb bin. Then for corresponding bins, the difference between the
methylation densities of the HCC patient and the mean value of the
control subjects was calculated. In one embodiment, this difference
was then divided by the SD of the corresponding bin to determine
the z-score. In other words, the z-score represents the difference
in methylation densities between the test and control plasma
samples expressed as a number of SDs from the mean of the control
subjects. A z-score >3 in a bin indicates that the plasma DNA of
the HCC patient is more hypermethylated than the control subjects
by more than 3 SDs in that bin whereas a z-score of <-3 in a bin
indicates that the plasma DNA of the HCC patient is more
hypomethylated than the control subjects by more than 3 SDs in that
bin.
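The per-bin z-score described in [0291] can be sketched as follows. The bin identifiers (such as "chr1:0"), the function name and the toy densities are hypothetical. Bins that are missing from the controls, or that have zero control variability, are skipped because no z-score can be formed for them.

```python
import statistics
from typing import Dict, List


def bin_zscores(test: Dict[str, float], controls: List[Dict[str, float]]) -> Dict[str, float]:
    """z = (test density - control mean) / control SD, computed bin by bin."""
    z: Dict[str, float] = {}
    for bin_id, value in test.items():
        ref = [c[bin_id] for c in controls if bin_id in c]
        if len(ref) < 2:
            continue  # a sample SD needs at least two reference values
        sd = statistics.stdev(ref)
        if sd == 0:
            continue
        z[bin_id] = (value - statistics.mean(ref)) / sd
    return z


# Toy example with four control plasma samples (densities as fractions, not study data).
controls = [{"chr1:0": 0.70}, {"chr1:0": 0.72}, {"chr1:0": 0.71}, {"chr1:0": 0.73}]
z = bin_zscores({"chr1:0": 0.60}, controls)
print(z)  # a value below -3 flags the bin as significantly hypomethylated
```

With only four controls, the SD of each bin is itself imprecise, so a larger reference group would make the ±3 thresholds more reliable.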
[0292] FIGS. 25A and 25B show z-scores of the plasma DNA
methylation densities for the pre-operative (plot 2500) and
post-operative (plot 2550) plasma samples of the HCC patient using
the plasma methylome data of the four healthy control subjects as
reference for chromosome 1. Each dot represents the result of one 1
Mb bin. The black dots represent the bins with z-score between -3
and 3. Red dots represent bins with z-score <-3.
[0293] FIG. 26A is a table 2600 showing data for z-scores for
pre-operative and post-operative plasma. Most of the bins on
chromosome 1 (80.9%) in the pre-operative plasma sample had a
z-score of <-3 indicating that the pre-operative plasma DNA of
the HCC patient was significantly more hypomethylated than that of
the control subjects. In contrast, the number of red dots
decreased substantially in the post-operative plasma sample (8.3%
of the bins on chromosome 1) suggesting that most of the tumor DNA
had been removed from the circulation due to surgical resection of
the source of circulating tumor DNA.
[0294] FIG. 26B is a Circos plot 2620 showing the z-score of the
plasma DNA methylation densities for the pre-operative and
post-operative plasma samples of the HCC patient using the four
healthy control subjects as reference for 1 Mb bins analyzed from
all autosomes. The outermost ring shows the ideograms of the human
autosomes. The middle ring shows the data for the pre-operative
plasma sample. The innermost ring shows the data for the
post-operative plasma sample. Each dot represents the result of one
1 Mb bin. The black dots represent the bins with z-scores between
-3 and 3. The red dots represent bins with z-scores <-3. The
green dots represent bins with z-scores >3.
[0295] FIG. 26C is a table 2640 showing a distribution of the
z-scores of the 1 Mb bins for the whole genome in both the
pre-operative and post-operative plasma samples of the HCC patient.
The results indicate that the pre-operative plasma DNA of the HCC
patient was more hypomethylated than that of the controls for the
majority of regions (85.2% of the 1 Mb bins) in the whole genome.
In contrast, the majority of the regions (93.5% of the 1 Mb bins)
in the post-operative plasma sample showed no significant
hypermethylation or hypomethylation compared with controls. These
data indicate that much of the tumor DNA, mainly hypomethylated in
nature for this HCC, was no longer present in the post-operative
plasma sample.
[0296] In one embodiment, the number, percentage or proportion of
bins with z-scores <-3 can be used to indicate if a cancer is
present. For example, as shown in table 2640, 2330 of the 2734 bins
analyzed (85.2%) showed z-scores <-3 in the pre-operative plasma
while only 171 of the 2734 analyzed bins (6.3%) showed z-scores
<-3 in the post-operative plasma. The data indicated that the
tumor DNA load in the pre-operative plasma was much higher than in
the post-operative plasma.
[0297] The cutoff values of the number of bins may be determined
using statistical methods. For example, approximately 0.15% of the
bins would be expected to have a z-score of <-3 based on a
normal distribution. Therefore, the cutoff number of bins can be
0.15% of the total number of bins being analyzed. In other words,
if a plasma sample from a nonpregnant individual shows more than
0.15% of bins with z-scores <-3, this indicates a source of
hypomethylated DNA in plasma, namely cancer. For example, 0.15% of
the 2734 1 Mb bins analyzed in this example is about 4
bins. Using this value as a cutoff, both the pre-operative and
post-operative plasma samples contained hypomethylated
tumor-derived DNA, though the amount was much greater in the
pre-operative plasma sample than in the post-operative plasma sample.
For the four healthy control subjects, none of the bins showed
significant hypermethylation or hypomethylation.
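The bin-counting rule in [0296] and [0297] reduces to a proportion and a comparison. The sketch below assumes that z-scores have already been computed per bin; the function names are hypothetical. For reference, the exact one-sided normal tail below -3 SD is about 0.135%, which the text rounds to approximately 0.15%. The default `expected_fraction` follows the text.

```python
from statistics import NormalDist
from typing import Iterable


def fraction_below(zscores: Iterable[float], threshold: float = -3.0) -> float:
    """Proportion of bins whose z-score is below the threshold."""
    z = list(zscores)
    if not z:
        raise ValueError("no bins supplied")
    return sum(1 for v in z if v < threshold) / len(z)


def exceeds_expected(zscores: Iterable[float], expected_fraction: float = 0.0015) -> bool:
    """True if more bins fall below -3 than would be expected by chance."""
    return fraction_below(zscores) > expected_fraction


print(NormalDist().cdf(-3.0))   # one-sided tail below -3 SD, about 0.00135
print(round(2734 * 0.0015, 1))  # cutoff for 2734 bins, about 4 bins
```

With the counts reported in table 2640 (2330 and 171 of 2734 bins), both the pre-operative and the post-operative samples exceed this cutoff, matching the conclusion in the text.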
[0298] In another embodiment, the cutoff number can be determined
by receiver operating characteristic (ROC) curve analysis by
analyzing a number of cancer patients and individuals without
cancer. To further validate the specificity of this approach, a
plasma sample from a patient seeking medical consultation for a
non-malignant condition (C06) was analyzed. 1.1% of the bins had a
z-score of <-3. In one embodiment, different thresholds can be
used to classify different levels of disease status. A lower
percentage threshold can be used to differentiate healthy status
from benign conditions and a higher percentage threshold to
differentiate benign conditions from malignancies.
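The text does not say how a point on the ROC curve would be chosen. One common criterion is Youden's J statistic (sensitivity + specificity - 1), which is assumed in the sketch below. The two-threshold rule for separating healthy, benign and malignant status is also illustrated. All names, thresholds and scores are hypothetical, and the toy scores are not study data.

```python
import math
from typing import Optional, Sequence, Tuple


def youden_cutoff(cancer: Sequence[float], non_cancer: Sequence[float]) -> Tuple[Optional[float], float]:
    """Choose the score cutoff that maximizes Youden's J = sensitivity + specificity - 1.

    A sample is called positive when its score (e.g. % of bins with z < -3) is >= cutoff.
    """
    best_cutoff: Optional[float] = None
    best_j = -math.inf
    for c in sorted(set(cancer) | set(non_cancer)):
        sensitivity = sum(s >= c for s in cancer) / len(cancer)
        specificity = sum(s < c for s in non_cancer) / len(non_cancer)
        j = sensitivity + specificity - 1.0
        if j > best_j:
            best_cutoff, best_j = c, j
    return best_cutoff, best_j


def tiered_status(score: float, benign_cutoff: float, malignant_cutoff: float) -> str:
    """Two thresholds: below benign_cutoff, between the two, or at/above malignant_cutoff."""
    if score >= malignant_cutoff:
        return "suggests malignancy"
    if score >= benign_cutoff:
        return "suggests benign condition"
    return "suggests healthy"


# Toy percentages of bins with z < -3 (hypothetical groups).
print(youden_cutoff([5.0, 12.0, 40.0], [0.0, 0.2, 1.1]))
print(tiered_status(1.1, benign_cutoff=0.5, malignant_cutoff=5.0))
```

Other ROC criteria, such as fixing specificity at a required level, would fit the same structure.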
[0299] In yet another embodiment, the sum of the z-scores for all
the bins can be used to determine if cancer is present or used for
the monitoring of the serial changes of the level of plasma DNA
methylation. Due to the overall hypomethylated nature of tumor DNA,
the sum of z-scores would be lower in plasma collected from an
individual with cancer than healthy controls. The sum of z-scores
for the pre- and post-operative plasma sample of the HCC patient
were -49843.8 and -3132.13, respectively.
[0300] In other embodiments, other methods can be used to survey
the methylation level of plasma DNA. For example, the proportion of
methylated cytosine residues over the total content of cytosine
residues can be determined using mass spectrometry (M. L. Chen et
al. 2013 Clin Chem; doi: 10.1373/clinchem.2012.193938) or massively
parallel sequencing. However, as most of the cytosine residues are
not in the CpG dinucleotide context, the proportion of methylated
cytosine among total cytosine residues would be relatively small
when compared to methylation levels estimated in the context of CpG
dinucleotides. We determined the methylation level of the tissue
and plasma samples obtained from the HCC patient as well as the
four plasma samples obtained from the healthy controls. The
methylation levels were measured in the CpG context, for all
cytosines, and in the CHG and CHH contexts using the genome-wide massively
parallel sequencing data. H refers to adenine, thymine or cytosine
residues.
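The sequence contexts in [0300] (CpG, CHG and CHH, where H is A, C or T) can be assigned from the reference sequence around each cytosine. The sketch below assumes that cytosines are read on the top strand only and that methylation calls are supplied as (position, methylated) pairs. Reverse-strand handling is omitted, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def cytosine_context(seq: str, i: int) -> str:
    """Classify the C at index i of a top-strand sequence as CpG, CHG or CHH (H = A, C or T)."""
    s = seq.upper()
    if s[i] != "C":
        raise ValueError(f"position {i} is not a cytosine")
    if i + 1 >= len(s):
        return "unknown"
    if s[i + 1] == "G":
        return "CpG"
    if s[i + 1] not in "ACT" or i + 2 >= len(s):
        return "unknown"
    if s[i + 2] == "G":
        return "CHG"
    return "CHH" if s[i + 2] in "ACT" else "unknown"


def levels_by_context(seq: str, calls: Iterable[Tuple[int, bool]]) -> Dict[str, float]:
    """Methylated fraction per context, plus 'all' for cytosines in any context."""
    methylated: Counter = Counter()
    total: Counter = Counter()
    for pos, is_methylated in calls:
        for ctx in (cytosine_context(seq, pos), "all"):
            total[ctx] += 1
            methylated[ctx] += int(is_methylated)
    return {ctx: methylated[ctx] / total[ctx] for ctx in total}
```

Because most cytosines in the genome lie outside CpG dinucleotides, the "all" fraction is dominated by CHG and CHH sites. This is why, as the text notes, it is much smaller than the CpG-based methylation density.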
[0301] FIG. 26D is a table 2660 showing the methylation levels of
these samples in the different sequence contexts. The methylation
levels of the tumor tissue and pre-operative plasma sample
overlapped with some of the control plasma samples when using the
CHH and CHG contexts. The methylation levels of the tumor tissue and
pre-operative plasma sample were consistently lower when compared
with the buffy coat, non-tumor liver tissue, post-operative plasma
sample and healthy control plasma samples, both among the CpGs
and among cytosines in any context. However, the data based on the
methylated CpGs, i.e. methylation densities, showed a wider dynamic
range than the data based on the methylated cytosines.
[0302] In other embodiments, the methylation status of the plasma
DNA can be determined by methods using antibodies against
methylated cytosine, for example, methylated DNA
immunoprecipitation (MeDIP). However, the precision of these
methods is expected to be inferior to sequencing-based methods
because of the variability in antibody binding. In yet another
embodiment, the level of 5-hydroxymethylcytosine in plasma DNA can
be determined. In this regard, a reduction in the level of
5-hydroxymethylcytosine has been found to be an epigenetic feature
of certain cancers, e.g. melanoma (C. G. Lian, et al. 2012 Cell;
150: 1135-1146).
[0303] In addition to HCC, we also investigated if this approach
could be applied to other types of cancers. We analyzed the plasma
samples from 2 patients with adenocarcinoma of the lung (CL1 and
CL2), 2 patients with nasopharyngeal carcinoma (NPC1 and NPC2), 2
patients with colorectal cancer (CRC1 and CRC2), 1 patient with
metastatic neuroendocrine tumor (NE1) and 1 patient with metastatic
smooth muscle sarcoma (SMS1). The plasma DNA of these subjects was
bisulfite-converted and sequenced using the Illumina HiSeq2000
platform for 50 bp at one end. The four healthy control subjects
mentioned above were used as a reference group for the analysis of
these 8 patients. 50 bp of the sequence reads at one end were used.
The whole genome was divided into 1 Mb bins. The mean and SD of
methylation density were calculated for each bin using the data
from the reference group. Then the results of the 8 cancer patients
were expressed as z-scores which represent the number of SDs from
the mean of the reference group. A positive value indicates that
the methylation density of the test case is higher than the mean of
the reference group, and vice versa. The number of sequence reads
and the sequencing depth achieved per sample are shown in table
2780 of FIG. 27I.
[0304] FIGS. 27A-27H show Circos plots of methylation density of 8
cancer patients according to embodiments of the present invention.
Each dot represents the result of a 1 Mb bin. The black dots
represent the bins with z-scores between -3 and 3. The red dots
represent bins with z-scores <-3. The green dots represent bins
with z-scores >3. The interval between two consecutive lines
represents a z-score difference of 20.
[0305] Significant hypomethylation was observed in multiple regions
across the genomes for patients with most types of cancers,
including lung cancer, nasopharyngeal carcinoma, colorectal cancer
and metastatic neuroendocrine tumor. Interestingly, in addition to
hypomethylation, significant hypermethylation was observed in
multiple regions across the genome in the case with metastatic
smooth muscle sarcoma. The embryonic origin of the smooth muscle
sarcoma is the mesoderm whereas the embryonic origin of the other
types of cancers in the remaining 7 patients is the ectoderm.
Therefore, it is possible that the DNA methylation pattern of
sarcoma may be different from that of carcinoma.
[0306] As can be seen from this case, the methylation pattern of
plasma DNA can also be useful for differentiating different types
of cancer, which in this example is a differentiation of carcinoma
and sarcoma. These data also suggest that the approach could be
used to detect aberrant hypermethylation associated with the
malignancy. For all these 8 cases, only plasma samples were
available and no tumor tissue had been analyzed. This showed that
even without the prior methylation profile or methylation levels of
the tumor tissue, tumor-derived DNA can be readily detected in
plasma using the methods described.
[0307] FIG. 27J is a table 2790 showing a distribution
of the z-scores of the 1 Mb bins for the whole genome in plasma of
patients with different malignancies. The percentages of bins with
z-score <-3, -3 to 3 and >3 are shown for each case. More
than 5% of the bins had a z-score of <-3 for all the cases.
Therefore, if we use a cutoff of 5% of the bins being significantly
hypomethylated for classifying a sample being positive for cancer,
then all of these cases would be classified as positive for cancer.
Our results show that hypomethylation is likely to be a general
phenomenon for different types of cancers and the plasma methylome
analysis would be useful for detecting different types of
cancers.
[0308] D. Method
[0309] FIG. 28 is a flowchart of method 2800 of analyzing a
biological sample of an organism to determine a classification of a
level of cancer according to embodiments of the present invention.
The biological sample includes DNA originating from normal cells
and may potentially include DNA from cells associated with cancer.
At least some of the DNA may be cell-free in the biological
sample.
[0310] At block 2810, a plurality of DNA molecules from the
biological sample are analyzed. The analysis of a DNA molecule can
include determining a location of the DNA molecule in a genome of
the organism and determining whether the DNA molecule is methylated
at one or more sites. The analysis can be performed by receiving
sequence reads from a methylation-aware sequencing, and thus the
analysis can be performed just on data previously obtained from the
DNA. In other embodiments, the analysis can include the actual
sequencing or other active steps of obtaining the data.
[0311] At block 2820, a respective number of DNA molecules that are
methylated at the site is determined for each of a plurality of
sites. In one embodiment, the sites are CpG sites, and may be only
certain CpG sites, as selected using one or more criteria mentioned
herein. Determining the number of DNA molecules that are methylated is
equivalent to determining the number that are unmethylated once normalization is
performed using a total number of DNA molecules analyzed at a
particular site, e.g., a total number of sequence reads. For
example, an increase in the CpG methylation density of a region is
equivalent to a decrease in the density of unmethylated CpGs of the
same region.
[0312] At block 2830, a first methylation level is calculated based
on the respective numbers of DNA molecules methylated at the
plurality of sites. The first methylation level can correspond to a
methylation density that is determined based on the number of DNA
molecules corresponding to the plurality of sites. The sites can
correspond to a plurality of loci or just one locus.
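Blocks 2810 to 2830 can be loosely mirrored in a minimal Python sketch. It assumes that the methylation-aware sequencing reads have already been aligned, and that each DNA molecule has been reduced to a list of CpG calls. The `Call` tuple layout and the function names are hypothetical and are not part of the described method.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# One CpG call on one molecule: (chromosome, CpG position, methylated?)
Call = Tuple[str, int, bool]
Site = Tuple[str, int]


def count_sites(molecules: Iterable[List[Call]]) -> Tuple[Counter, Counter]:
    """Per-site counts of methylated molecules and of all molecules covering the site."""
    methylated: Counter = Counter()
    total: Counter = Counter()
    for calls in molecules:
        for chrom, pos, is_methylated in calls:
            site = (chrom, pos)
            total[site] += 1
            if is_methylated:
                methylated[site] += 1
    return methylated, total


def methylation_level(methylated: Counter, total: Counter,
                      sites: Optional[Iterable[Site]] = None) -> float:
    """Methylated calls divided by all calls over the chosen sites (a methylation density)."""
    keys = list(total) if sites is None else [s for s in sites if s in total]
    n_total = sum(total[s] for s in keys)
    if n_total == 0:
        raise ValueError("no coverage at the selected sites")
    return sum(methylated[s] for s in keys) / n_total
```

Passing a `sites` subset corresponds to restricting the analysis to CpG sites selected by the criteria mentioned in the text.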
[0313] At block 2840, the first methylation level is compared to a
first cutoff value. The first cutoff value may be a reference
methylation level or be related to a reference methylation level
(e.g., a specified distance from a normal level). The reference
methylation level may be determined from samples of individuals
without cancer or from loci of the organism that are known not to
be associated with a cancer of the organism. The first cutoff value
may be established from a reference methylation level determined
from a biological sample of the organism obtained prior to the
biological sample being tested.
[0314] In one embodiment, the first cutoff value is a specified
distance (e.g., a specified number of standard deviations) from a
reference methylation level established from a biological sample
obtained from a healthy organism. The comparison can be performed
by determining a difference between the first methylation level and
a reference methylation level, and then comparing the difference to
a threshold corresponding to the first cutoff value (e.g., to
determine if the methylation level is statistically different than
the reference methylation level).
[0315] At block 2850, a classification of a level of cancer is
determined based on the comparison. Examples of a level of cancer
include whether the subject has cancer or a premalignant
condition, or an increased likelihood of developing cancer. In one
embodiment, the first cutoff value may be determined from a
previously obtained sample from the subject (e.g., a reference
methylation level may be determined from the previous sample).
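Blocks 2840 and 2850 can be sketched using the "specified number of standard deviations" form of the first cutoff value from [0314]. The reference mean and SD are assumed to come from non-cancer samples or from an earlier sample of the same subject. The SD value and the returned labels below are hypothetical.

```python
from typing import Tuple


def classify_level(level: float, ref_mean: float, ref_sd: float,
                   n_sd: float = 3.0, tumor_hypomethylated: bool = True) -> Tuple[str, float]:
    """Compare a methylation level with a cutoff n_sd reference SDs from the reference mean."""
    if ref_sd <= 0:
        raise ValueError("reference SD must be positive")
    z = (level - ref_mean) / ref_sd
    # Hypomethylated tumor DNA pushes the level down; hypermethylated loci push it up.
    beyond = z < -n_sd if tumor_hypomethylated else z > n_sd
    return ("cancer signal detected" if beyond else "no cancer signal detected"), z


# Hypothetical reference: mean 0.716, SD 0.01 (fractions); test level 0.597.
print(classify_level(0.597, ref_mean=0.716, ref_sd=0.01))
```

More graded classifications, such as premalignant versus malignant, could use several cutoffs in the same way.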
[0316] In some embodiments, the first methylation level can
correspond to a number of regions whose methylation levels exceed a
threshold value. For example, a plurality of regions of a genome of
the organism can be identified. The regions can be identified using
criteria mentioned herein, e.g., of certain lengths or certain
number of sites. One or more sites (e.g., CpG sites) can be
identified within each of the regions. A region methylation level
can be calculated for each region. The first methylation level is
for a first region. Each of the region methylation levels is
compared to a respective region cutoff value, which may be the same
or vary among regions. The region cutoff value for the first region
is the first cutoff value. The respective region cutoff values can
be a specified amount (e.g., 0.5) from a reference methylation
level, thereby counting only regions that have a significant
difference from a reference, which may be determined from
non-cancer subjects.
[0317] A first number of regions whose region methylation level
exceeds the respective region cutoff value can be determined, and
compared to a threshold value to determine the classification. In
one implementation, the threshold value is a percentage. Comparing
the first number to a threshold value can include dividing the
first number of regions by a second number of regions (e.g., all of
the regions) before comparing to the threshold value, e.g., as part
of a normalization process.
[0318] As described above, a fractional concentration of tumor DNA
in the biological sample can be used to calculate the first cutoff
value. The fractional concentration can simply be estimated to be
greater than a minimum value, whereas a sample with less can be
flagged, e.g., as not being suitable for analysis. The minimum
value can be determined based on an expected difference in
methylation levels for a tumor relative to a reference methylation
level. For example, if a difference of 0.5 is used as a cutoff
value, then the tumor concentration would need to be high enough
for this difference to be observed.
[0319] Specific techniques from method 1300 can be applied for
method 2800. In method 1300, copy number variations can be
determined for a tumor (e.g., where the first chromosomal region of
a tumor can be tested for having a copy number change relative to a
second chromosomal region of the tumor). Thus, method 1300 can
presume that a tumor exists. In method 2800, a sample can be tested
for whether there is any indication that a tumor exists at all,
regardless of any copy number characteristics. Some techniques of
the two methods can be similar. However, the cutoff values and
methylation parameters (e.g., normalized methylation levels) for
method 2800 can detect a statistical difference from a reference
methylation level for non-cancer DNA as opposed to a difference
from a reference methylation level for a mixture of cancer DNA and
non-cancer DNA with some regions possibly having copy number
variations. Thus, the reference values for method 2800 can be
determined from samples without cancer, such as from organisms
without cancer or from non-cancer tissue of the same patient (e.g.,
plasma taken previously or from contemporaneously acquired samples
that are known to not have cancer, which may be determined from
cellular DNA).
[0320] E. Prediction of the Minimal Fractional Concentration of
Tumor-DNA to be Detected Using Plasma DNA Methylation Analysis
[0321] One measure of the sensitivity of the approach to detect
cancer using the methylation level of plasma DNA is the minimal
fractional concentration of tumor-derived DNA that is required
to reveal a change in plasma DNA methylation level when compared
with those of controls. The test sensitivity is also dependent on
the extent of difference in DNA methylation between the tumor
tissue and the baseline plasma DNA methylation levels in healthy
controls or blood cell DNA. Blood cells are the predominant source
of DNA in plasma of healthy individuals. The larger the difference,
the more easily the cancer patients can be discriminated from the
non-cancer individuals, which would be reflected as a lower detection
limit of tumor-derived DNA in plasma and a higher clinical sensitivity
in detecting the cancer patients. In addition, the variations in
the plasma DNA methylation in the healthy subjects or in subjects
with different ages (G. Hannum et al. 2013 Mol Cell; 49: 359-367)
would also affect the sensitivity of detecting the methylation
changes associated with the presence of a cancer. A smaller
variation in the plasma DNA methylation in the healthy subjects
would make the detection of the change caused by the presence of a
small amount of cancer-derived DNA easier.
[0322] FIG. 29A is a plot 2900 showing the distribution of the
methylation densities in reference subjects assuming that this
distribution follows a normal distribution. This analysis assumes
that each plasma sample provides only one methylation density value,
for example, the methylation density of all autosomes or of a
particular chromosome. It illustrates how the specificity of the
analysis would be affected. In one embodiment, a cutoff of 3 SDs
below the mean DNA methylation density of the reference subjects is
used to determine if a tested sample is significantly more
hypomethylated than samples from the reference subjects. When this
cutoff is used, it is expected that approximately 0.15% of
non-cancer subjects would have false-positive results of being
classified as having cancer resulting in a specificity of
99.85%.
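The specificity figure can be checked under the same normality assumption. The following minimal Python sketch (the helper name `false_positive_rate` is hypothetical) evaluates the one-sided normal tail below a cutoff expressed in SDs. The exact tail below 3 SDs is about 0.135%; the 0.15% quoted above corresponds to halving the roughly 0.3% lying outside ±3 SDs under the 68-95-99.7 rule.

```python
import math


def false_positive_rate(cutoff_z: float) -> float:
    """Fraction of reference (non-cancer) samples expected below cutoff_z.

    Assumes reference methylation densities are normally distributed, so this
    is the standard normal CDF evaluated at cutoff_z (a negative number of SDs).
    """
    return 0.5 * math.erfc(-cutoff_z / math.sqrt(2))


fpr = false_positive_rate(-3.0)
specificity = 1.0 - fpr
print(f"false-positive rate: {fpr:.4%}, specificity: {specificity:.3%}")
```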
[0323] FIG. 29B is a plot 2950 showing the distributions of
methylation densities in reference subjects and cancer patients.
The cutoff value is 3 SDs below the mean of the methylation
densities of the reference subjects. If the mean of methylation
densities of the cancer patients is 2 SDs below the cutoff value
(i.e. 5 SDs below the mean of the reference subjects), 97.5% of the
cancer subjects would be expected to have a methylation density
below the cutoff value. In other words, the expected sensitivity
would be 97.5% if one methylation density value is provided for
each subject, for example when the total methylation density of the
whole genome, of all autosomes or a particular chromosome is
analyzed. The difference between the mean methylation densities of
the two populations is affected by two factors, namely the degree
of difference in the methylation level between cancer and
non-cancer tissues and the fractional concentration of
tumor-derived DNA in the plasma sample. The higher the values of
these two parameters, the larger the difference between the mean
methylation densities of the two populations. In addition, the
smaller the SD of the distributions of methylation densities of the
two populations, the less these two distributions overlap.
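The expected sensitivity can be sketched in the same way, assuming (as the figure does) that the cancer-patient distribution is normal with the same SD as the reference distribution. The function names below are hypothetical. With the cutoff at -3 SDs and the cancer mean at -5 SDs, the exact normal value is about 97.7%; the 97.5% in the text follows from the rounded 95% rule for 2 SDs.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


def expected_sensitivity(cutoff_z: float, cancer_mean_z: float) -> float:
    """Fraction of cancer samples expected to fall below the cutoff.

    Both arguments are in units of the reference SD, measured from the
    reference mean; the cancer distribution is assumed normal with that SD.
    """
    return normal_cdf(cutoff_z - cancer_mean_z)


sens = expected_sensitivity(cutoff_z=-3.0, cancer_mean_z=-5.0)
print(f"expected sensitivity: {sens:.2%}")
```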
[0324] Here we use a hypothetical example to illustrate this
concept. Let's assume that the methylation density of the tumor
tissue is approximately 0.45 and that of the plasma DNA of the
healthy subjects is approximately 0.7. These assumed values are
similar to those obtained from our HCC patient, where the overall
methylation density of the autosomes is 42.9% and the mean
methylation density of the autosomes for the plasma samples from
healthy controls was 71.6%. Assuming that the CV of measuring the
plasma DNA methylation density for the whole genome is 1% (i.e. an
SD of $0.7 \times 1\% = 0.007$), the cutoff value would be
$0.7 \times (100\% - 3 \times 1\%) = 0.679$. To achieve a
sensitivity of 97.5%, the mean methylation density of the plasma
DNA for the cancer patients needs to be approximately
$0.679 - 0.7 \times (2 \times 1\%) = 0.665$. Let f represent the
fractional concentration of tumor-derived DNA in the plasma sample.
Then f can be calculated from $(0.7 - 0.45) \times f = 0.7 - 0.665$.
Therefore, f is approximately 14%. From this calculation, it is
estimated that the minimal fractional concentration that can be
detected in the plasma is 14% to achieve a diagnostic sensitivity
of 97.5% if the total methylation density of the whole genome is
used as the diagnostic parameter.
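The arithmetic of this hypothetical example can be reproduced with a short Python sketch. The function `minimal_tumor_fraction` and its parameter names are illustrative assumptions; the defaults mirror the 3-SD cutoff and 2-SD margin used above, and the SD is taken as the mean density multiplied by the CV.

```python
def minimal_tumor_fraction(bkg: float, tum: float, cv: float,
                           cutoff_sd: float = 3.0, margin_sd: float = 2.0) -> float:
    """Smallest tumor DNA fraction f whose expected plasma density lies
    margin_sd SDs below a cutoff placed cutoff_sd SDs below bkg.

    bkg: mean plasma methylation density of healthy subjects
    tum: methylation density of the tumor tissue
    cv:  coefficient of variation of the measurement (SD = bkg * cv)
    """
    sd = bkg * cv
    cutoff = bkg - cutoff_sd * sd
    required_mean = cutoff - margin_sd * sd
    # Mixing model: plasma = bkg * (1 - f) + tum * f, solved for f.
    return (bkg - required_mean) / (bkg - tum)


f = minimal_tumor_fraction(bkg=0.7, tum=0.45, cv=0.01)
print(f"minimal tumor fraction: {f:.1%}")
```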
[0325] Next, we performed this analysis on the data obtained from
the HCC patient. For this illustration, only one methylation
density measurement, based on the value estimated from all
autosomes, was made for each sample. The mean methylation density
among the plasma samples obtained from the four healthy subjects was
71.6%, and the SD of the methylation densities of these four samples
was 0.631%. Therefore, the cutoff value for plasma methylation
density would need to be $71.6\% - 3 \times 0.631\% = 69.7\%$ to
reach a z-score <-3 and a specificity of 99.85%. To achieve a
sensitivity of 97.5%, the mean plasma methylation density of the
cancer patients would need to be 2 SDs below the cutoff, i.e. 68.4%.
Since the methylation density of the tumor tissue was 42.9%, using
the formula $P = BKG \times (1 - f) + TUM \times f$, where P is the
plasma methylation density, BKG is the background methylation
density (here 71.6%) and TUM is the methylation density of the tumor
tissue, f would need to be at least 11.1%.
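Carrying the HCC arithmetic through in Python shows the effect of intermediate rounding. This is an illustrative sketch; the constant and function names are hypothetical. Without rounding, the required mean is 68.445% and f is about 11.0%; the 11.1% in the text follows from first rounding the required mean to 68.4%.

```python
# Values (in %) taken from the HCC example above.
REF_MEAN = 71.6   # mean plasma methylation density of the healthy subjects
REF_SD = 0.631    # SD across the healthy subjects' plasma samples
TUMOR = 42.9      # methylation density of the tumor tissue

cutoff = REF_MEAN - 3 * REF_SD          # z-score of -3
required_mean = cutoff - 2 * REF_SD     # 2 SDs below the cutoff


def fraction_for(plasma: float) -> float:
    """Solve P = BKG * (1 - f) + TUM * f for f, with BKG = REF_MEAN."""
    return (REF_MEAN - plasma) / (REF_MEAN - TUMOR)


print(f"cutoff {cutoff:.3f}%, required mean {required_mean:.3f}%")
print(f"f (unrounded mean) = {fraction_for(required_mean):.2%}")
print(f"f (mean rounded to 68.4%) = {fraction_for(68.4):.2%}")
```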
[0326] In another embodiment, the methylation densities of
different genomic regions can be analyzed separately as shown in
FIG. 3 or 4. In other words, multiple measurements of the
methylation level were made for each sample. As shown below,
significant hypomethylation could be detected at a much lower
fractional tumor DNA concentration in plasma, and thus the
diagnostic performance of plasma DNA methylation analysis for
cancer detection would be enhanced. The number of genomic regions
showing a significant deviation in methylation density from the
reference population can be counted. This number can then be
compared to a cutoff value to determine if there is an overall
significant hypomethylation of plasma DNA across the population of
genomic regions surveyed, for example, the 1 Mb bins of the whole
genome. The cutoff value can be established by the analysis of a
group of reference subjects without cancer or derived
mathematically, for example, according to the normal distribution
function.
[0327] FIG. 30 is a plot 3000 showing the distribution of
methylation densities of the plasma DNA of healthy subjects and
cancer patients. The methylation density of each 1 Mb bin is
compared with the corresponding values of the reference group. The
percentage of bins showing significant hypomethylation (3 SDs below
the mean of the reference group) was determined. A cutoff of 10%
of bins being significantly hypomethylated was used to determine if
tumor-derived DNA is present in the plasma sample. Other cutoff
values such as 5%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 60%,
70%, 80% or 90% can also be used according to the desired
sensitivity and specificity of the test.
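The bin-counting procedure can be sketched as follows, assuming per-bin methylation densities have already been computed for the tested sample and for each reference subject. The function names and the toy data are hypothetical; a real analysis would use the 1 Mb bins of the whole genome and a larger reference group.

```python
import statistics
from typing import List, Sequence


def hypomethylated_bin_fraction(test: Sequence[float],
                                reference: List[Sequence[float]],
                                z_cutoff: float = -3.0) -> float:
    """Fraction of bins whose z-score versus the reference group is below z_cutoff.

    test:      methylation density per bin for the tested plasma sample
    reference: one per-bin density list for each reference (non-cancer) subject
    """
    n_low = 0
    for i, value in enumerate(test):
        ref_values = [sample[i] for sample in reference]
        mean = statistics.mean(ref_values)
        sd = statistics.stdev(ref_values)  # needs at least 2 reference subjects
        if sd == 0:
            raise ValueError(f"bin {i} has zero variance in the reference group")
        if (value - mean) / sd < z_cutoff:
            n_low += 1
    return n_low / len(test)


def is_positive(test: Sequence[float], reference: List[Sequence[float]],
                bin_fraction_cutoff: float = 0.10) -> bool:
    """Positive if more than bin_fraction_cutoff of the bins are hypomethylated."""
    return hypomethylated_bin_fraction(test, reference) > bin_fraction_cutoff


# Toy data: 4 reference subjects, 10 bins; the tested sample has 2 low bins.
reference = [[0.69] * 10, [0.70] * 10, [0.71] * 10, [0.70] * 10]
tested = [0.60, 0.60] + [0.70] * 8
print(hypomethylated_bin_fraction(tested, reference), is_positive(tested, reference))
```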
[0328] For example, to classify a sample as containing
tumor-derived DNA, we can use 10% of the 1 Mb bins showing
significant hypomethylation (z-score <-3) as a cutoff. If there
are more than 10% of the bins being significantly more
hypomethylated than the reference group, then the sample is
classified as positive for the cancer test. For each 1 Mb bin, a
cutoff of 3 SDs below the mean methylation density of the reference
group is used to define a sample as significantly more
hypomethylated. For each of the 1 Mb bins, if the mean plasma DNA
methylation density of the cancer patients is 1.72 SDs lower than
the mean plasma DNA methylation densities of the reference
subjects, then there is a 10% chance that the methylation density
value of any particular bin of a cancer patient would be lower than
the cutoff (i.e. z-score <-3) and give a positive result. Then,
if we look at all the 1 Mb bins for the whole genome, then
approximately 10% of the bins would be expected to show positive
results of having significantly lower methylation densities (i.e.
z-scores <-3). Assuming that the overall methylation density of
the plasma DNA of the healthy subjects is approximately 0.7 and the
coefficient of variation (CV) of measuring the plasma DNA
methylation density for each 1 Mb bin is 1%, the mean methylation
density of the plasma DNA of the cancer patients would need to be
$0.7 \times (100\% - 1.72 \times 1\%) = 0.68796$. Let f be the
fractional concentration of tumor-derived DNA in plasma required to
achieve this mean plasma DNA methylation density. Assuming that the
methylation density of the tumor tissue is 0.45, f can be calculated
using the equation

$$(M_{P_{ref}} - M_{tumor}) \times f = M_{P_{ref}} - M_{P_{cancer}}$$

where $M_{P_{ref}}$ represents the mean methylation density of
plasma DNA in the reference individuals; $M_{tumor}$ represents the
methylation density of the tumor tissue in the cancer patient; and
$M_{P_{cancer}}$ represents the mean methylation density of plasma
DNA in the cancer patients.
[0329] Using this equation, $(0.7 - 0.45) \times f = 0.7 - 0.68796$.
Thus, the minimal fractional concentration that can be detected using
this approach would be deduced as 4.8%. The sensitivity can be further
enhanced by decreasing the cutoff percentage of bins being
significantly more hypomethylated, for example, from 10% to 5%.
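The 1.72-SD shift and the resulting detection limit can be derived with Python's `statistics.NormalDist` (Python 3.8+), again assuming per-bin densities are normally distributed. The function names are hypothetical. Without rounding the shift is about 1.718 SDs and f is about 4.8%; calling the same function with a tumor-normal difference of 0.5 gives about 2.4%, the figure obtained for the selected bins in paragraph [0333].

```python
from statistics import NormalDist


def per_bin_shift(bin_fraction: float = 0.10, z_cutoff: float = -3.0) -> float:
    """SDs by which the cancer mean must lie below the reference mean so that a
    fraction `bin_fraction` of bins falls below z_cutoff (normality assumed)."""
    return NormalDist().inv_cdf(bin_fraction) - z_cutoff


def minimal_fraction_per_bin(ref_mean: float, delta: float, cv: float,
                             bin_fraction: float = 0.10,
                             z_cutoff: float = -3.0) -> float:
    """Solve (M_Pref - M_tumor) * f = M_Pref - M_Pcancer for f.

    delta is M_Pref - M_tumor; the per-bin SD is ref_mean * cv.
    """
    shift = per_bin_shift(bin_fraction, z_cutoff)
    cancer_mean = ref_mean * (1 - shift * cv)
    return (ref_mean - cancer_mean) / delta


shift = per_bin_shift()
f_all_bins = minimal_fraction_per_bin(ref_mean=0.7, delta=0.7 - 0.45, cv=0.01)
print(f"shift = {shift:.4f} SD, f = {f_all_bins:.2%}")
```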
[0330] As shown in the above example, the sensitivity of this
method is determined by the degree of difference in methylation
level between cancer and non-cancer tissues, for example, blood
cells. In one embodiment, only the chromosomal regions which show a
large difference in methylation densities between the plasma DNA of
the non-cancer subjects and the tumor tissue are selected. In one
embodiment, only regions with a difference in methylation density
of >0.5 are selected. In other embodiments, a difference of 0.4,
0.6, 0.7, 0.8 or 0.9 can be used for selecting the suitable regions.
In yet another embodiment, the physical size of the genomic regions
is not fixed. Instead, the genomic regions are defined, for example,
based on a fixed read depth or a fixed number of CpG sites. The
methylation levels at multiple such genomic regions are assessed
for each sample.
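Defining regions by a fixed number of CpG sites, rather than a fixed physical size, might be sketched as below. The function name is hypothetical, it works on sorted CpG coordinates of a single chromosome, and it keeps a smaller final region rather than discarding it; how trailing sites are handled is a design choice the text does not specify.

```python
from typing import List, Sequence, Tuple


def regions_by_cpg_count(cpg_positions: Sequence[int],
                         cpgs_per_region: int) -> List[Tuple[int, int]]:
    """Group CpG coordinates on one chromosome into consecutive regions
    containing `cpgs_per_region` CpG sites each (the last may be smaller).

    Returns (first_cpg_position, last_cpg_position) for each region.
    """
    if cpgs_per_region < 1:
        raise ValueError("cpgs_per_region must be positive")
    positions = sorted(cpg_positions)
    regions = []
    for start in range(0, len(positions), cpgs_per_region):
        chunk = positions[start:start + cpgs_per_region]
        regions.append((chunk[0], chunk[-1]))
    return regions


print(regions_by_cpg_count([10, 20, 30, 40, 50], cpgs_per_region=2))
```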
[0331] FIG. 31 is a graph 3100 showing the distribution of the
differences in methylation densities between the mean of the plasma
DNA of healthy subjects and the tumor tissue of the HCC patient. A
positive value signifies that the methylation density is higher in
the plasma DNA of the healthy subjects and a negative value
signifies that the methylation density is higher in the tumor
tissue.
[0332] In one embodiment, the bins with the greatest difference
between the methylation density of the cancer and non-cancer
tissues can be selected, for example, those with a difference of
>0.5, regardless of whether the tumor is hypomethylated or
hypermethylated for these bins. The detection limit of fractional
concentration of tumor-derived DNA in plasma can be lowered by
focusing on these bins because of the greater differences between
the distributions of the plasma DNA methylation levels between
cancer and non-cancer subjects given the same fractional
concentration of tumor-derived DNA in the plasma. For example, if
only bins with differences >0.5 are used and a cutoff of 10% of
the bins being significantly more hypomethylated is adopted to
determine if a tested individual has a cancer, the minimal
fractional concentration (f) of tumor derived DNA detected can be
calculated using the following equation:

$$(M_{P_{ref}} - M_{tumor}) \times f = M_{P_{ref}} - M_{P_{cancer}},$$

where $M_{P_{ref}}$ represents the mean methylation density of
plasma DNA in the reference individuals; $M_{tumor}$ represents the
methylation density of the tumor tissue in the cancer patient; and
$M_{P_{cancer}}$ represents the mean methylation density of plasma
DNA in the cancer patients.
[0333] For these bins, the difference in methylation density between
the plasma of the reference subjects and the tumor tissues is at
least 0.5. Then, we have $0.5 \times f = 0.7 - 0.68796$ and f = 2.4%. Therefore,
by focusing on bins with a higher difference in methylation density
between cancer and non-cancer tissues, the lower limit of
fractional tumor-derived DNA can be lowered from 4.8% to 2.4%. The
information regarding which bins would show larger degrees of
methylation differences between cancer and non-cancer tissues, for
example, blood cells, could be determined from tumor tissues of the
same organ or same histological type obtained from other
individuals.
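Selecting the informative bins from a reference tumor profile could be sketched as follows. The function name, bin labels and density values are hypothetical; per the text, the tumor profile could come from tumors of the same organ or histological type in other individuals. The absolute difference is used so that bins where the tumor is hypermethylated are kept as well.

```python
from typing import Dict, List


def select_informative_bins(ref_plasma: Dict[str, float],
                            tumor: Dict[str, float],
                            min_abs_diff: float = 0.5) -> List[str]:
    """Return bins whose reference-plasma vs tumor methylation difference
    exceeds min_abs_diff in either direction (hypo- or hypermethylated tumor)."""
    return [b for b in ref_plasma
            if b in tumor and abs(ref_plasma[b] - tumor[b]) > min_abs_diff]


ref_plasma = {"chr1:0-1Mb": 0.80, "chr1:1-2Mb": 0.75, "chr1:2-3Mb": 0.20}
tumor_profile = {"chr1:0-1Mb": 0.20, "chr1:1-2Mb": 0.60, "chr1:2-3Mb": 0.75}
print(select_informative_bins(ref_plasma, tumor_profile))
```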
[0334] In another embodiment, a parameter can be derived from the
methylation densities of the plasma DNA of all bins, taking into
account the difference in methylation densities between cancer and
non-cancer tissues. Bins with a greater difference can be given a
heavier weight. In one embodiment, the difference in methylation
density between cancer and non-cancer tissue of each bin can
directly be used as the weight of that bin in calculating the
final parameter.
[0335] In yet another embodiment, different types of cancer may
have different patterns of methylation in the tumor tissue. A
cancer-specific weight profile can be derived from the degree of
methylation of the specific type of cancer.
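The text does not fix how the weighted parameter is combined, so the following is only one possible form, stated as an assumption: a weighted average of per-bin z-scores in which each weight is the signed (non-cancer minus cancer) methylation difference of that bin, for example taken from a cancer-specific weight profile. With signed weights, hypomethylated-tumor bins (positive weight, negative z) and hypermethylated-tumor bins (negative weight, positive z) both push the score negative. The function name is hypothetical.

```python
from typing import Sequence


def weighted_methylation_score(z_scores: Sequence[float],
                               weights: Sequence[float]) -> float:
    """Weighted average of per-bin z-scores.

    weights[i] is the (non-cancer minus tumor) methylation difference of bin i;
    the sum is normalized by the total absolute weight.
    """
    if len(z_scores) != len(weights):
        raise ValueError("z_scores and weights must have the same length")
    total_weight = sum(abs(w) for w in weights)
    if total_weight == 0:
        raise ValueError("at least one weight must be non-zero")
    return sum(z * w for z, w in zip(z_scores, weights)) / total_weight


score = weighted_methylation_score([-2.0, 1.0, 0.1], [0.5, -0.4, 0.05])
print(f"weighted score: {score:.3f}")
```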
[0336] In yet another embodiment, the inter-bin relationship of
methylation density can be determined in subjects with and without
cancer. In FIG. 8, we can observe that in a small number of bins,
the tumor tissues were more methylated than the plasma DNA of the
reference subjects. Thus, the bins with the most extreme values of
difference, e.g. difference >0.5 and difference <0, can be
selected. The ratio of the methylation densities of these two groups
of bins can then be used to indicate if the tested individual has a cancer. In
other embodiments, the difference and quotient of the methylation
density of different bins can be used as parameters for indicating
the inter-bin relationship.
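An inter-bin ratio parameter might be computed as in the sketch below, assuming the two groups of bins (difference >0.5 and difference <0) have already been chosen. The function name, bin labels and densities are hypothetical. In the presence of tumor DNA, the first group would be expected to lose methylation and the second to gain it, so the ratio would fall.

```python
from statistics import mean
from typing import Dict, Sequence


def inter_bin_ratio(plasma: Dict[str, float],
                    hypo_bins: Sequence[str],
                    hyper_bins: Sequence[str]) -> float:
    """Mean plasma methylation density over bins where the tumor is much less
    methylated (e.g. difference > 0.5), divided by that over bins where the
    tumor is more methylated (difference < 0)."""
    return mean(plasma[b] for b in hypo_bins) / mean(plasma[b] for b in hyper_bins)


plasma = {"a": 0.80, "b": 0.78, "c": 0.30, "d": 0.34}
ratio = inter_bin_ratio(plasma, hypo_bins=["a", "b"], hyper_bins=["c", "d"])
print(f"inter-bin ratio: {ratio:.3f}")
```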
[0337] We further assessed the detection sensitivity of the
approach to detect or assess tumors using the methylation densities
of multiple genomic regions as illustrated by the data obtained
from the HCC patient. First, we mixed reads from the pre-operative
plasma with those obtained from the plasma samples of the healthy
controls to simulate plasma samples that contained fractional tumor
DNA that ranged from 20% to 0.5%. We then scored the percentage of
1 Mb bins (out of 2,734 bins in the whole genome) with methylation
densities equivalent to z-scores <-3. When the fractional tumor
DNA concentration in plasma was 20%, 80.0% of the bins showed
significant hypomethylation. The corresponding data for fractional
tumor DNA concentration in plasma of 10%, 5%, 2%, 1% and 0.5% were
67.6%, 49.7%, 18.9%, 3.8% and 0.77% of the bins showing
hypomethylation, respectively. Since the theoretical proportion of
bins showing z-scores <-3 in the control samples is 0.15%, our data
show that the proportion of such bins (0.77%) still exceeded this
theoretical cutoff limit even when the tumor fractional
concentration was just 0.5%.
[0338] FIG. 32A is a table 3200 showing the effect of reducing the
sequencing depth when the plasma sample contained 5% or 2% tumor
DNA. A high proportion of bins (>0.15%) showing significant
hypomethylation could still be detected when the mean sequencing
depth was just 0.022 times the haploid genome.
[0339] FIG. 32B is a graph 3250 showing the methylation densities
of the repeat elements and non-repeat regions in the plasma of the
four healthy control subjects, the buffy coat, the normal liver
tissue, the tumor tissue, the pre-operative plasma and the
post-operative plasma samples of the HCC patient. It can be
observed that the repeat elements were more methylated (higher
methylation density) than the non-repeat regions in both cancer and
non-cancer tissues. However, the difference in methylation between
repeat elements and non-repeat regions was bigger in the non-cancer
tissues and the plasma DNA of the healthy subjects when compared
with the tumor tissues.
[0340] As a result, the plasma DNA of the cancer patient had a
larger reduction in methylation density at the repeat elements than
in the non-repeat regions. The difference in plasma DNA methylation
density between the mean of the four healthy controls and the HCC
patient was 0.163 and 0.088 for the repeat elements and the
non-repeat regions, respectively. The data on the pre-operative and
post-operative plasma samples also showed that the dynamic range in
the change in methylation density was larger in the repeat than the
non-repeat regions. In one embodiment, the plasma DNA methylation
density of the repeat elements can be used for determining if a
patient is affected by cancer or for monitoring disease
progression.
[0341] As discussed above, the variation in methylation densities
in the plasma of the reference subjects would also affect the
accuracy of differentiating cancer patients from non-cancer
individuals. The tighter the distribution of methylation densities
(i.e. smaller standard deviation), the more accurate it would be to
differentiate cancer and non-cancer subjects. In another
embodiment, the coefficient of variation (CV) of the methylation
densities of the 1 Mb bins can be used as a criterion for selecting
the bins with low variability of plasma DNA methylation densities
in the reference group. For example, only bins with CV<1% are
selected. Other values, for example 0.5%, 0.75%, 1.25% and 1.5% can
also be used as criteria for selecting the bins with low
variability in methylation density. In yet another embodiment, the
selection criteria can include both the CV of the bin and the
difference in methylation density between cancer and non-cancer
tissues.
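Selecting bins by the CV of the reference group, optionally combined with the cancer vs non-cancer difference, could look like the sketch below. The function name, bin labels and values are hypothetical; the 1% and 0.5 defaults follow the examples in the text.

```python
import statistics
from typing import Dict, List, Optional, Sequence


def select_low_cv_bins(reference: Dict[str, Sequence[float]],
                       max_cv: float = 0.01,
                       tumor_diff: Optional[Dict[str, float]] = None,
                       min_abs_diff: float = 0.5) -> List[str]:
    """Keep bins whose reference plasma methylation densities have CV < max_cv.

    If tumor_diff (non-cancer minus tumor difference per bin) is given, also
    require an absolute difference greater than min_abs_diff.
    """
    selected = []
    for bin_id, values in reference.items():
        cv = statistics.stdev(values) / statistics.mean(values)
        if cv >= max_cv:
            continue
        if tumor_diff is not None and abs(tumor_diff.get(bin_id, 0.0)) <= min_abs_diff:
            continue
        selected.append(bin_id)
    return selected


reference = {"bin1": [0.70, 0.705, 0.695, 0.70], "bin2": [0.60, 0.70, 0.80, 0.65]}
print(select_low_cv_bins(reference))
```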
[0342] The methylation density can also be used to estimate the
fractional concentration of tumor-derived DNA in a plasma sample
when the methylation density of the tumor tissue is known. This
information can be obtained by the analysis of the tumor of the
patient or from the survey of the tumors from a number of patients
having the same type of cancer. As discussed above, the plasma
methylation density (P) can be expressed using the following
equation: $P = BKG \times (1 - f) + TUM \times f$, where BKG is the
background methylation density from the blood cells and other
organs, TUM is the methylation density in the tumor tissue, and f is
the fractional concentration of tumor-derived DNA in the plasma
sample. This can be rewritten as:

$$f = \frac{BKG - P}{BKG - TUM}.$$
[0343] The values of BKG can be determined by analyzing the
patient's plasma sample at a time point when cancer is not
present or from the survey of a reference group of individuals
without cancer. Therefore, after measuring the plasma methylation
density, f can be determined.
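Estimating f from a measured plasma methylation density can be sketched as below, taking BKG as the mean of a reference group of cancer-free samples. The function name and the numeric values are hypothetical. Because of measurement noise, the estimate may fall slightly outside the range 0 to 1 and might be clipped in practice.

```python
import statistics
from typing import Sequence


def estimate_tumor_fraction(plasma: float, tumor: float,
                            reference_plasma: Sequence[float]) -> float:
    """f = (BKG - P) / (BKG - TUM), with BKG taken as the mean plasma
    methylation density of reference (cancer-free) samples."""
    bkg = statistics.mean(reference_plasma)
    if bkg == tumor:
        raise ValueError("BKG equals TUM, so f is undefined")
    return (bkg - plasma) / (bkg - tumor)


f = estimate_tumor_fraction(plasma=0.68, tumor=0.43, reference_plasma=[0.70, 0.72])
print(f"estimated tumor fraction: {f:.1%}")
```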
[0344] F. Combination
[0345] The methylation analysis approach described in this
invention can be used in combination with other methods that are
based on the genetic changes of tumor-derived DNA in plasma.
Examples of such methods include the analysis for cancer-associated
chromosomal aberrations (K. C. Chan et al. 2013 Clin Chem;
59:211-224; R. J. Leary et al. 2012 Sci Transl Med; 4:162ra154) and
cancer-associated single nucleotide variations in plasma (K. C.
Chan et al. 2013 Clin Chem; 59:211-224). There are advantages of
the methylation analysis approach over those genetic
approaches.
[0346] As shown in FIG. 21A, the hypomethylation of the tumor DNA
is a global phenomenon involving regions distributed across almost
the entire genome. Therefore, the DNA fragments from all
chromosomal regions would be informative regarding the potential
contribution of the tumor-derived hypomethylated DNA to the
plasma/serum DNA in the patient. In contrast, chromosomal
aberrations (either amplification or deletion of a chromosomal
region) are only present in some chromosomal regions, and DNA
fragments from the regions without a chromosomal aberration in the
tumor tissue would not be informative in the analysis (K. C. Chan et
al. 2013 Clin Chem; 59: 211-224). Similarly, only a few thousand
single nucleotide alterations are observed in each cancer genome
(K. C. Chan et al. 2013 Clin Chem; 59: 211-224). DNA fragments that
do not overlap with these single nucleotide changes would not be
informative in determining if tumor-derived DNA is present in the
plasma. Therefore, this methylation analysis approach is
potentially more cost-effective than those genetic approaches for
detecting cancer-associated changes in the circulation.
[0347] In one embodiment, the cost-effectiveness of plasma DNA
methylation analysis can further be enhanced by enriching for DNA
fragments from the most informative regions, for example, regions
with the greatest difference in methylation between cancer and
non-cancer tissues. Examples of methods for enriching these
regions include the use of hybridization probes (e.g. Nimblegen
SeqCap system and Agilent SureSelect Target Enrichment system), PCR
amplification and solid phase hybridization (e.g. Illumina TruSeq
Enrichment kit).
[0348] G. Tissue-Specific Analysis/Donors
[0349] Tumor-derived cells invade and metastasize to adjacent or
distant organs. The invaded tissues or metastatic foci contribute
DNA into plasma as a result of cell death. By analyzing the
methylation profile of DNA in the plasma of cancer patients and
detecting the presence of tissue-specific methylation signatures,
one could detect the types of tissues that are involved in the
disease process. This approach provides a noninvasive anatomic scan
of the tissues involved in the cancerous process to aid in the
identification of the organs involved as the primary and metastatic
sites. Monitoring the relative concentrations of the methylation
signatures of the involved organs in plasma would also allow one to
assess the tumor burden of those organs and determine if the cancer
process in that organ is deteriorating, is improving or has been
cured. For example, if a gene X is specifically methylated in the
liver, then metastatic involvement of the liver by a cancer (e.g.
colorectal cancer) would be expected to increase the concentration
of methylated sequences from gene X in the plasma. There would also
be another sequence or groups of sequences with similar methylation
characteristics as gene X. One could then combine the results from
such sequences. Similar considerations are applicable to other
tissues, e.g. the brain, bones, lungs and kidneys, etc.
[0350] On the other hand, DNA from different organs is known to
exhibit tissue-specific methylation signatures (B. W. Futscher et
al. 2002 Nat Genet; 31:175-179; S. S. C. Chim et al. 2008 Clin
Chem; 54: 500-511). Thus, methylation profiling in plasma can be
used for elucidating the contribution of tissues from various
organs into plasma. The elucidation of such contribution can be
used for assessing organ damage, as plasma DNA is believed to be
released when cells die. For example, liver pathology such as
hepatitis (e.g. caused by viruses, autoimmune processes, etc.) or
hepatotoxicity (e.g. caused by a drug overdose (such as of
paracetamol) or by toxins (such as alcohol)) is associated with liver
cell damage and would be expected to be associated with an increased
level of liver-derived DNA in plasma. For example, if a gene X is
specifically methylated in the liver, then liver pathology would be
expected to increase the concentration of methylated sequences from
gene X in the plasma. Conversely, if a gene Y is specifically
hypomethylated in the liver, then liver pathology would be expected
to decrease the concentration of methylated sequences from gene Y
in the plasma.
[0351] The present described approach could also be applied to the
assessment of donor-derived DNA in the plasma of organ
transplantation recipients (Y. M. D. Lo et al. 1998 Lancet;
351:1329-1330). Polymorphic differences between the donor and
recipient have been used to distinguish the donor-derived DNA from
the recipient-derived DNA in plasma (Y. W. Zheng et al. 2012 Clin
Chem; 58: 549-558). We propose that tissue-specific methylation
signatures of the transplanted organ could also be used as a method
to detect the donor's DNA in the recipient's plasma.
[0352] By monitoring the concentration of the donor's DNA, one
could noninvasively assess the status of the transplanted organ.
For example, transplant rejection is associated with a higher rate of
cell death and hence the concentration of the donor's DNA, as
reflected by the methylation signature of the transplanted organ,
would be increased when compared with the time when the patient is
in stable condition or when compared to other stable transplant
recipients or healthy controls without transplantation. Similar to
what has been described for cancer, the donor-derived DNA could be
identified in the plasma of transplantation recipients by detecting
all or some of the characteristic features, including
polymorphic differences, shorter size DNA for the transplanted
solid organs (Y. W. Zheng et al. 2012 Clin Chem; 58: 549-558) and
tissue-specific methylation profile.
IX. Materials and Methods
[0353] A. Preparation of Bisulfite-Treated DNA Libraries and
Sequencing
[0354] Genomic DNA (5 μg) spiked with 0.5% (w/w) unmethylated
lambda DNA (Promega) was fragmented by a Covaris S220 System
(Covaris) to approximately 200 bp in length. DNA libraries were
prepared using the Paired-End Sequencing Sample Preparation Kit
(Illumina) according to the manufacturer's instructions, except
that methylated adapters (Illumina) were ligated to the DNA
fragments. Following two rounds of purification using AMPure XP
magnetic beads (Beckman Coulter), the ligation products were split
into 2 portions, one of which was subjected to 2 rounds of
bisulfite modification with an EpiTect Bisulfite Kit (Qiagen).
Unmethylated cytosines at CpG sites in the inserts were converted
to uracils while the methylated cytosines remained unchanged. The
adapter-ligated DNA molecules, either treated or untreated with
sodium bisulfite, were enriched by 10 cycles of PCR using the
following recipe: 2.5 U PfuTurboCx hotstart DNA polymerase (Agilent
Technologies), 1× PfuTurboCx reaction buffer, 25 μM dNTPs,
1 μl PCR Primer PE 1.0 and 1 μl PCR Primer PE 2.0 (Illumina)
in a 50 μl reaction. The thermocycling profile was: 95 °C for
2 min, 98 °C for 30 s, then 10 cycles of 98 °C for 15 s, 60 °C for
30 s and 72 °C for 4 min, with a final step of 72 °C for 10 min
(R. Lister, et al. 2009 Nature 462, 315-322). The PCR products were
purified using AMPure XP magnetic beads.
[0355] Plasma DNA extracted from 3.2-4 ml of maternal plasma
samples was spiked with fragmented lambda DNA (25 pg per ml plasma)
and subjected to library construction as described above (R. W. K.
Chiu et al. 2011 BMJ; 342: c7401). After ligating to the methylated
adapters, the ligation products were split into 2 halves and one
half was subjected to 2 rounds of bisulfite modification. The
bisulfite-treated or untreated ligation products were then enriched
by 10 cycles of PCR as described above.
[0356] Bisulfite-treated or untreated DNA libraries were sequenced
for 75 bp in a paired-end format on HiSeq2000 instruments
(Illumina). DNA clusters were generated with a Paired-End Cluster
Generation Kit v3 on a cBot instrument (Illumina). Real-time image
analysis and base calling were performed using the HiSeq Control
Software (HCS) v1.4 and Real Time Analysis (RTA) Software v1.13
(Illumina), by which the automated matrix and phasing calculations
were based on the spiked-in PhiX control v3 sequenced with the DNA
libraries.
[0357] B. Sequence Alignment and Identification of Methylated
Cytosines
[0358] After base calling, adapter sequences and low quality bases
(i.e. quality score <20) on the fragment ends were removed. The
trimmed reads in FASTQ format were then processed by a methylation
data analysis pipeline called Methy-Pipe (P. Jiang, et al.
Methy-Pipe: An integrated bioinformatics data analysis pipeline for
whole genome methylome analysis, paper presented at the IEEE
International Conference on Bioinformatics and Biomedicine
Workshops, Hong Kong, 18 to 21 Dec. 2010). In order to align the
bisulfite converted sequencing reads, we first performed in silico
conversion of all cytosine residues to thymines, on the Watson and
Crick strands separately, using the reference human genome (NCBI
build 36/hg18). We then performed in silico conversion of each
cytosine to thymine in all the processed reads and kept the
position information of each converted residue. SOAP2 (R. Li, et
al. 2009 Bioinformatics; 25: 1966-1967) was used to align the
converted reads to the two pre-converted reference human genomes,
with a maximum of two mismatches allowed for each aligned read.
Only reads mappable to a unique genomic location were selected.
Ambiguous reads which mapped to both the Watson and Crick strands
and duplicated (clonal) reads which had the same start and end
genomic positions were removed. Sequenced reads with insert size
≤600 bp were retained for the methylation and size
analyses.
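The in silico conversion step described above can be illustrated with a minimal Python sketch. The function names are hypothetical, and this is not the Methy-Pipe implementation; the alignment itself (performed with SOAP2 in the text) is not shown. The Crick-strand reference is represented in forward-strand coordinates, where C-to-T conversion on the reverse strand appears as G-to-A.

```python
from typing import List, Tuple


def convert_read(read: str) -> Tuple[str, List[int]]:
    """In silico C->T conversion of a sequenced read.

    Returns the fully converted read plus the 0-based offsets of every
    original cytosine, so the cytosines can be restored after alignment.
    """
    read = read.upper()
    c_offsets = [i for i, base in enumerate(read) if base == "C"]
    return read.replace("C", "T"), c_offsets


def convert_reference(reference: str) -> Tuple[str, str]:
    """Build the two pre-converted references, both in forward-strand coordinates.

    Watson: C->T on the forward strand.
    Crick:  C->T on the reverse strand, which appears as G->A on the forward strand.
    """
    reference = reference.upper()
    return reference.replace("C", "T"), reference.replace("G", "A")


def restore_cytosines(converted: str, c_offsets: List[int]) -> str:
    """Put the original cytosines back into a converted read after alignment."""
    bases = list(converted)
    for i in c_offsets:
        bases[i] = "C"
    return "".join(bases)


converted, offsets = convert_read("ACGTC")
print(converted, offsets, convert_reference("ACGT"), restore_cytosines(converted, offsets))
```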
[0359] Cytosine residues in the CpG dinucleotide context were the
major targets for the downstream DNA methylation studies. After
alignment, the cytosines originally present on the sequenced reads
were recovered based on the positional information kept during the
in silico conversion. The recovered cytosines among the CpG
dinucleotides were scored as methylated. Thymines among the CpG
dinucleotides were scored as unmethylated. The unmethylated lambda
DNA included during library preparation served as an internal
control for estimating the efficiency of sodium bisulfite
modification. All cytosines on the lambda DNA should have been
converted to thymines if the bisulfite conversion efficiency was
100%.
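The methylation calls and the lambda-based conversion check can be sketched as follows, assuming a Watson-strand read already restored to its original bases and aligned without indels to the reference segment it covers. The function names and sequences are hypothetical illustrations, not part of the described pipeline.

```python
from typing import Tuple


def call_cpg_methylation(reference: str, aligned_read: str) -> Tuple[int, int]:
    """Count methylated and unmethylated CpG calls on a Watson-strand read.

    At each reference CpG, a C in the read is scored as methylated and a T as
    unmethylated; any other base is ignored.
    """
    methylated = unmethylated = 0
    for i in range(len(reference) - 1):
        if reference[i] == "C" and reference[i + 1] == "G":
            if aligned_read[i] == "C":
                methylated += 1
            elif aligned_read[i] == "T":
                unmethylated += 1
    return methylated, unmethylated


def conversion_efficiency(lambda_reference: str, lambda_read: str) -> float:
    """Fraction of cytosines on the unmethylated lambda DNA that were read as T."""
    total = converted = 0
    for ref_base, read_base in zip(lambda_reference, lambda_read):
        if ref_base == "C":
            total += 1
            if read_base == "T":
                converted += 1
    return converted / total if total else float("nan")


meth, unmeth = call_cpg_methylation("ACGTTCGA", "ACGTTTGA")
density = meth / (meth + unmeth)  # methylation density of this read
print(meth, unmeth, density, conversion_efficiency("ACCGTC", "ATTGTT"))
```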
X. Summary
[0360] With the use of embodiments described herein, one could
screen, detect, monitor or prognosticate cancer noninvasively using
for example the plasma of a subject. One could also carry out
prenatal screening, diagnosis, investigation or monitoring of a
fetus by deducing the methylation profile of fetal DNA from
maternal plasma. To illustrate the power of the approach, we showed
that information that was conventionally obtained via the study of
placental tissues could be assessed directly from maternal plasma.
For example, the imprinting status of gene loci, identification of
loci with differential methylation between the fetal and maternal
DNA and the gestational variation in the methylation profile of
gene loci were achieved through the direct analysis of maternal
plasma DNA. The major advantage of our approach is that the fetal
methylome could be assessed comprehensively during pregnancy
without disruption to the pregnancy or the need for invasive
sampling of fetal tissues. Given the known association between
altered DNA methylation status and many pregnancy-associated
conditions, the approach described in this study can serve as an
important tool for investigating the pathophysiology of, and
identifying biomarkers for, those conditions. By focusing on
the imprinted loci, we showed that both the paternally-transmitted
as well as the maternally-transmitted fetal methylation profiles
could be assessed from maternal plasma. This approach may
potentially be useful for the investigation of imprinting diseases.
Embodiments can also be applied directly for the prenatal
assessment of fetal or pregnancy-associated diseases.
[0361] This is also the first study where genome-wide bisulfite
sequencing has been applied to investigate the DNA methylation
profile of placental tissues. There are approximately 28M CpG sites
in the human genome (C. Clark, et al. 2012 PLoS One; 7: e50233).
Our bisulfite sequencing data of the CVS and term placental tissue
sample covered more than 80% of the CpGs. This represents a
substantially broader coverage than those achievable using other
high-throughput platforms. For example, the Illumina Infinium
HumanMethylation 27K beadchip array that was used in a previous
study on placental tissues (T. Chu, et al. 2011 PLoS One; 6:
e14723) only covered 0.1% of the CpGs in the genome. The Illumina
Infinium HumanMethylation 450K beadchip array that was available
more recently only covered 1.7% of the CpGs (C. Clark, et al.
2012). Because the MPS approach is free from restrictions related
to probe design, hybridization efficiency or strength of antibody
capture, CpGs within or beyond CpG islands and in most sequence
contexts could be assessed.
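The coverage comparison above is simple arithmetic: the number of CpG sites an assay can interrogate divided by the roughly 28 million CpG sites in the genome. The sketch below reproduces the quoted percentages and shows one way a "fraction of CpGs covered" could be computed from per-site read depths. The array probe counts (27,578 and 485,577) are commonly cited vendor figures assumed here, not values given in this text, and `covered_fraction` with its `min_depth` threshold is a hypothetical helper for illustration only.

```python
# Illustrative sketch only; probe counts are assumed vendor figures, not from the text.
GENOME_CPG_SITES = 28_000_000  # approximately 28M CpG sites (C. Clark, et al. 2012)

# Assumed, commonly cited probe counts for the two Illumina arrays.
ARRAY_PROBES = {
    "Infinium HumanMethylation27": 27_578,
    "Infinium HumanMethylation450": 485_577,
}


def coverage_percent(sites_assessed, total_sites=GENOME_CPG_SITES):
    """Percentage of genomic CpG sites that an assay can interrogate."""
    return 100.0 * sites_assessed / total_sites


def covered_fraction(depths, min_depth=1):
    """Hypothetical helper: fraction of CpG sites with at least min_depth reads.

    `depths` holds one sequencing read depth per CpG site in the genome.
    """
    if not depths:
        return 0.0
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)


if __name__ == "__main__":
    for name, probes in ARRAY_PROBES.items():
        print(f"{name}: {coverage_percent(probes):.1f}% of CpGs")
    # A bisulfite sequencing run covering >80% of CpGs would span >22.4M sites.
    print(f"80% of CpGs = {0.8 * GENOME_CPG_SITES:,.0f} sites")
```

With these assumed probe counts, the 27K and 450K arrays come out at about 0.1% and 1.7% of CpGs, in line with the figures quoted above.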
XI. Computer System
[0362] Any of the computer systems mentioned herein may utilize any
suitable number of subsystems. Examples of such subsystems are
shown in FIG. 33 in computer apparatus 3300. In some embodiments, a
computer system includes a single computer apparatus, where the
subsystems can be the components of the computer apparatus. In
other embodiments, a computer system can include multiple computer
apparatuses, each being a subsystem, with internal components.
[0363] The subsystems shown in FIG. 33 are interconnected via a
system bus 3375. Additional subsystems such as a printer 3374,
keyboard 3378, storage device(s) 3379, monitor 3376, which is
coupled to display adapter 3382, and others are shown. Peripherals
and input/output (I/O) devices, which couple to I/O controller
3371, can be connected to the computer system by any number of
means known in the art, such as serial port 3377. For example,
serial port 3377 or external interface 3381 (e.g. Ethernet, Wi-Fi,
etc.) can be used to connect computer system 3300 to a wide area
network such as the Internet, a mouse input device, or a scanner.
The interconnection via system bus 3375 allows the central
processor 3373 to communicate with each subsystem and to control
the execution of instructions from system memory 3372 or the
storage device(s) 3379 (e.g., a fixed disk), as well as the
exchange of information between subsystems. The system memory 3372
and/or the storage device(s) 3379 may embody a computer readable
medium. Any of the values mentioned herein can be output from one
component to another component and can be output to the user.
[0364] A computer system can include a plurality of the same
components or subsystems, e.g., connected together by external
interface 3381 or by an internal interface. In some embodiments,
computer systems, subsystems, or apparatuses can communicate over a
network. In such instances, one computer can be considered a client
and another computer a server, where each can be part of the same
computer system. A client and a server can each include multiple
systems, subsystems, or components.
[0365] It should be understood that any of the embodiments of the
present invention can be implemented in the form of control logic
using hardware (e.g. an application specific integrated circuit or
field programmable gate array) and/or using computer software with
a generally programmable processor in a modular or integrated
manner. As used herein, a processor includes a multi-core processor
on a single integrated chip, or multiple processing units on a single
circuit board or networked. Based on the disclosure and teachings
provided herein, a person of ordinary skill in the art will know
and appreciate other ways and/or methods to implement embodiments
of the present invention using hardware and a combination of
hardware and software.
[0366] Any of the software components or functions described in
this application may be implemented as software code to be executed
by a processor using any suitable computer language such as, for
example, Java, C++ or Perl using, for example, conventional or
object-oriented techniques. The software code may be stored as a
series of instructions or commands on a computer readable medium
for storage and/or transmission. Suitable media include random
access memory (RAM), a read only memory (ROM), a magnetic medium
such as a hard-drive or a floppy disk, or an optical medium such as
a compact disk (CD) or DVD (digital versatile disk), flash memory,
and the like. The computer readable medium may be any combination
of such storage or transmission devices.
[0367] Such programs may also be encoded and transmitted using
carrier signals adapted for transmission via wired, optical, and/or
wireless networks conforming to a variety of protocols, including
the Internet. As such, a computer readable medium according to an
embodiment of the present invention may be created using a data
signal encoded with such programs. Computer readable media encoded
with the program code may be packaged with a compatible device or
provided separately from other devices (e.g., via Internet
download). Any such computer readable medium may reside on or
within a single computer program product (e.g. a hard drive, a CD,
or an entire computer system), and may be present on or within
different computer program products within a system or network. A
computer system may include a monitor, printer, or other suitable
display for providing any of the results mentioned herein to a
user.
[0368] Any of the methods described herein may be totally or
partially performed with a computer system including one or more
processors, which can be configured to perform the steps. Thus,
embodiments can be directed to computer systems configured to
perform the steps of any of the methods described herein,
potentially with different components performing a respective step
or a respective group of steps. Although presented as numbered
steps, steps of methods herein can be performed at the same time or
in a different order. Additionally, portions of these steps may be
used with portions of other steps from other methods. Also, all or
portions of a step may be optional. Additionally, any of the steps
of any of the methods can be performed with modules, circuits, or
other means for performing these steps.
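As a minimal sketch of a step being performed by a module and run concurrently rather than in a fixed order, the example below computes a per-region methylation level on a thread pool and checks that the result matches a sequential run. The definition used (methylated calls divided by total methylated plus unmethylated calls) and the names `methylation_level`, `run_step_concurrently`, `run_step_sequentially` and the region labels are illustrative assumptions, not the specific implementation described in this application.

```python
from concurrent.futures import ThreadPoolExecutor


def methylation_level(counts):
    """Assumed definition: methylated calls / (methylated + unmethylated calls)."""
    methylated, unmethylated = counts
    total = methylated + unmethylated
    return methylated / total if total else float("nan")


def run_step_concurrently(regions, max_workers=4):
    """Hypothetical module: evaluate the step for all regions on a thread pool.

    ThreadPoolExecutor.map returns results in input order, so the output
    can be zipped back onto the region names.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        levels = list(pool.map(methylation_level, regions.values()))
    return dict(zip(regions.keys(), levels))


def run_step_sequentially(regions):
    """The same step evaluated one region at a time, for comparison."""
    return {name: methylation_level(c) for name, c in regions.items()}


if __name__ == "__main__":
    # Hypothetical (methylated, unmethylated) call counts per region.
    regions = {"region_A": (30, 10), "region_B": (5, 15), "region_C": (0, 8)}
    print(run_step_concurrently(regions))
```

Because each region is evaluated independently, the concurrent and sequential runs return the same values, which illustrates why such steps need not be performed in the numbered order.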
[0369] The specific details of particular embodiments may be
combined in any suitable manner without departing from the spirit
and scope of embodiments of the invention. However, other
embodiments of the invention may be directed to specific
embodiments relating to each individual aspect, or specific
combinations of these individual aspects.
[0370] The above description of exemplary embodiments of the
invention has been presented for the purposes of illustration and
description. It is not intended to be exhaustive or to limit the
invention to the precise form described, and many modifications and
variations are possible in light of the teaching above. The
embodiments were chosen and described in order to best explain the
principles of the invention and its practical applications to
thereby enable others skilled in the art to best utilize the
invention in various embodiments and with various modifications as
are suited to the particular use contemplated.
[0371] A recitation of "a", "an" or "the" is intended to mean "one
or more" unless specifically indicated to the contrary.
[0372] All patents, patent applications, publications, and
descriptions mentioned here are incorporated by reference in their
entirety for all purposes. None is admitted to be prior art.