U.S. patent application number 13/863992 was filed with the patent office on 2013-04-16 and published on 2013-08-15 for fetal aneuploidy detection by sequencing.
The applicant listed for this patent is Barb Ariel Cohen, Ronald Davis, Ravi Kapur, Roland Stoughton, Mehmet Toner. Invention is credited to Barb Ariel Cohen, Ronald Davis, Ravi Kapur, Roland Stoughton, Mehmet Toner.
Application Number | 20130210644 13/863992 |
Family ID | 38832873 |
Filed Date | 2013-04-16 |
United States Patent Application | 20130210644 |
Kind Code | A1 |
Stoughton; Roland; et al. | August 15, 2013 |
FETAL ANEUPLOIDY DETECTION BY SEQUENCING
Abstract
The present invention provides apparatus and methods for
enriching components or cells from a sample and conducting genetic
analysis, such as SNP genotyping to provide diagnostic results for
fetal disorders or conditions.
Inventors: | Stoughton; Roland; (The Sea Ranch, CA); Kapur; Ravi; (Sharon, MA); Toner; Mehmet; (Wellesley, MA); Davis; Ronald; (Palo Alto, CA); Cohen; Barb Ariel; (Watertown, MA) |
Applicant: |
Name | City | State | Country | Type |
Stoughton; Roland | The Sea Ranch | CA | US | |
Kapur; Ravi | Sharon | MA | US | |
Toner; Mehmet | Wellesley | MA | US | |
Davis; Ronald | Palo Alto | CA | US | |
Cohen; Barb Ariel | Watertown | MA | US | |
Family ID: | 38832873 |
Appl. No.: | 13/863992 |
Filed: | April 16, 2013 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
12751940 | Mar 31, 2010 | |
13863992 | | |
11763133 | Jun 14, 2007 | |
12751940 | | |
60820778 | Jul 28, 2006 | |
60804816 | Jun 14, 2006 | |
Current U.S. Class: | 506/2 |
Current CPC Class: | C12Q 1/6883 20130101; C12Q 2600/156 20130101; C12Q 1/6874 20130101 |
Class at Publication: | 506/2 |
International Class: | C12Q 1/68 20060101 C12Q001/68 |
Claims
1. A method of determining a likelihood of the presence or absence
of a fetal aneuploidy in a fetus using a maternal sample derived
from a pregnant human female comprising fetal and maternal DNA, the
method comprising: (a) selectively amplifying a plurality of single
nucleotide polymorphism (SNP) sites of a first chromosome selected
from the group consisting of chromosomes 13, 18, 21, X, and Y in
the maternal sample comprising fetal and maternal DNA; (b)
sequencing the amplified plurality of SNP sites of the fetal and
maternal DNA of (a) and determining abundances of alleles at the
plurality of SNP sites to obtain a first set of genotype data; (c)
selectively amplifying the plurality of SNP sites of the first
chromosome in a maternal-only sample comprising maternal DNA,
wherein the maternal-only sample is essentially free of fetal DNA;
(d) sequencing the amplified plurality of SNP sites of the first
chromosome of the maternal DNA of (c) and determining abundances of
alleles at the plurality of SNP sites to obtain a second set of
genotype data; (e) creating model sets corresponding to a plurality
of fetal ploidy states using the second set of genotype data; (f)
comparing the first set of genotype data to the model sets and
selecting from the model sets a model that provides a best fit to
the first set of genotype data; and (g) determining the likelihood
of the presence or absence of a fetal aneuploidy of the first
chromosome using the selected model that provides the best fit to
the first set of genotype data.
2. The method of claim 1, wherein (a) and (c) each comprise
selectively amplifying at least 100 SNP sites of the first
chromosome.
3. The method of claim 1, wherein the fetal aneuploidy comprises a
trisomy of the first chromosome.
4. The method of claim 3, wherein the fetal aneuploidy comprises
trisomy 13, trisomy 18, or trisomy 21.
5. The method of claim 1, wherein the fetal aneuploidy comprises an
aneuploidy of chromosome X.
6. The method of claim 5, wherein the fetal aneuploidy comprises
monosomy X.
7. The method of claim 1, wherein the fetal aneuploidy comprises an
aneuploidy of chromosome Y.
8. The method of claim 1, wherein the fetal aneuploidy comprises
XXX, XXY, or XYY.
9. The method of claim 1, wherein selectively amplifying the
plurality of SNP sites of the first chromosome of (a) comprises
performing polymerase chain reaction (PCR) amplification.
10. The method of claim 1, wherein selectively amplifying the
plurality of SNP sites of the first chromosome of (c) comprises
performing polymerase chain reaction (PCR) amplification.
11. The method of claim 1, wherein the plurality of SNP sites of
(a) are amplified in parallel.
12. The method of claim 1, wherein the plurality of SNP sites of
(c) are amplified in parallel.
13. The method of claim 2, wherein the at least 100 SNP sites of
each of (a) and (c) are amplified in parallel.
14. The method of claim 1, wherein (a) further comprises
selectively amplifying a plurality of SNP sites of a second
chromosome that is different from the first chromosome selected
from the group consisting of chromosomes 13, 18, 21, X, and Y.
15. The method of claim 1, wherein (c) further comprises
selectively amplifying a plurality of SNP sites of a second
chromosome that is different from the first chromosome selected
from the group consisting of chromosomes 13, 18, 21, X, and Y.
16. The method of claim 1, wherein sequencing of (b) and sequencing
of (d) each comprise sequencing millions of molecules in
parallel.
17. The method of claim 1, wherein steps (c) and (d) are performed
prior to steps (a) and (b).
18. The method of claim 1, further comprising performing paternal
genotyping to obtain a third set of genotype data.
19. The method of claim 18, wherein (e) further comprises creating
model sets corresponding to a plurality of fetal ploidy states
using the second and third sets of genotype data.
20. The method of claim 1, wherein the plurality of fetal
chromosomal ploidy states comprise disomy, trisomy, and
monosomy.
21. A method of determining a likelihood of the presence or absence
of a fetal aneuploidy in a fetus using a maternal sample derived
from a pregnant human female comprising fetal and maternal DNA, the
method comprising: (a) amplifying by PCR amplification a plurality
of selected genomic DNA regions of a first chromosome
selected from the group consisting of chromosomes 13, 18, 21, X,
and Y in the maternal sample comprising fetal and maternal DNA, and
amplifying by PCR amplification a plurality of selected genomic DNA
regions of SNPs of the first chromosome in a maternal-only sample
comprising maternal DNA, wherein the maternal-only sample is
essentially free of fetal DNA, and wherein each genomic DNA region
comprises a locus comprising one or more single nucleotide
polymorphism (SNP) positions of interest; (b) sequencing the
amplified plurality of selected genomic DNA regions of the fetal
and maternal DNA of the maternal sample and determining abundances
of alleles at one or more SNP positions to obtain a first set of
genotype data, and sequencing the amplified plurality of selected
genomic DNA of the maternal DNA of the maternal-only sample and
determining abundances of alleles at one or more SNP positions to
obtain a second set of genotype data; (c) creating model sets
corresponding to a plurality of fetal ploidy states using the
second set of genotype data; (d) comparing the first set of
genotype data to the model sets and selecting from the model sets a
model that provides a best fit to the first set of genotype data;
and (e) determining the likelihood of the presence or absence of a
fetal aneuploidy of the first chromosome using the selected model
that provides the best fit to the first set of genotype data.
22. The method of claim 21, wherein (c) comprises amplifying a
plurality of selected genomic DNA regions of a second chromosome
that is different from the first chromosome and is selected from
the group consisting of chromosomes 13, 18, 21, X, and Y.
23. The method of claim 21, wherein the plurality of fetal
chromosomal ploidy states comprise disomy, trisomy, and
monosomy.
24. The method of claim 23, wherein the fetal aneuploidy comprises
trisomy 13, trisomy 18, trisomy 21, or monosomy X.
25. The method of claim 21, wherein (a) comprises amplifying a
plurality of selected genomic DNA regions of each of chromosomes
13, 18, 21, X, and Y.
26. The method of claim 21, wherein at least one SNP position
comprises an informative SNP.
Description
CROSS-REFERENCE
[0001] This application is a continuation application of U.S.
patent application Ser. No. 12/751,940, filed Mar. 31, 2010, which
is a continuation application of U.S. patent application Ser. No.
11/763,133, filed Jun. 14, 2007, now abandoned, which claims the
benefit of U.S. Provisional Application No. 60/804,816, filed Jun.
14, 2006, which applications are incorporated herein by reference.
U.S. patent application Ser. No. 11/763,133 also claims the benefit
of U.S. Provisional Application No. 60/820,778, filed Jul. 28,
2006.
SEQUENCE LISTING
[0002] The instant application contains a Sequence Listing which
has been submitted via EFS-Web and is hereby incorporated by
reference in its entirety. Said ASCII copy, created on Apr. 11,
2013, is named 32047-718-305-Seqlisting.txt and is 3 Kilobytes in
size.
BACKGROUND OF THE INVENTION
[0003] Analysis of specific cells can give insight into a variety
of diseases. These analyses can provide non-invasive tests for
detection, diagnosis and prognosis of diseases, thereby eliminating
the risk of invasive diagnosis. For instance, social developments
have resulted in an increased number of prenatal tests. However,
the available methods today, amniocentesis and chorionic villus
sampling (CVS), are potentially harmful to the mother and to the
fetus. The rate of miscarriage for pregnant women undergoing
amniocentesis is increased by 0.5-1%, and that figure is slightly
higher for CVS. Because of the inherent risks posed by
amniocentesis and CVS, these procedures are offered primarily to
older women, i.e., those over 35 years of age, who have a
statistically greater probability of bearing children with
congenital defects. As a result, a pregnant woman at the age of 35
has to balance an average risk of 0.5-1% to induce an abortion by
amniocentesis against an age related probability for trisomy 21 of
less than 0.3%.
[0004] Some non-invasive methods have already been developed to
diagnose specific congenital defects. For example, maternal serum
alpha-fetoprotein, and levels of unconjugated estriol and human
chorionic gonadotropin can be used to identify a proportion of
fetuses with Down's syndrome; however, these tests are not one
hundred percent accurate. Similarly, ultrasonography is used to
determine congenital defects involving neural tube defects and limb
abnormalities, but is useful only after fifteen weeks'
gestation.
[0005] The methods of the present invention allow for the detection
of fetal cells and fetal abnormalities when fetal cells are mixed
with a population of maternal cells, even when the maternal cells
dominate the mixture.
SUMMARY OF THE INVENTION
[0006] The presence of fetal cells within the blood of pregnant
women offers the opportunity to develop a prenatal diagnostic that
replaces amniocentesis and thereby eliminates the risk of today's
invasive diagnosis. However, fetal cells represent a small number
of cells against the background of a large number of maternal cells
in the blood, which makes the analysis time consuming and prone to
error. Current technologies and protocols for highly parallel SNP
detection with DNA microarray readout result in inaccurate calls
when there are too few starting DNA copies or when a particular
allele represents a small fraction in the population of input DNA
molecules.
[0007] The present invention relates to methods for detecting a
fetal abnormality by determining the ratio of the abundance of one
or more maternal alleles to the abundance of one or more paternal
alleles in the genomic DNA of a sample. The genomic region includes
a single nucleotide polymorphism (SNP), which can preferably be an
informative SNP. The SNP can be detected by methods that include
using a DNA microarray, bead microarray, or high throughput
sequencing. In some embodiments, determining the ratio involves
detecting an abundance of a nucleotide base at a SNP position. In
other embodiments, determining the ratio also comprises calculating
an error rate based on amplification. Prior to determining the abundance
of allele(s), the sample can be enriched for fetal cells.
[0008] The method of detection is provided by highly parallel SNP
detection that can be used to determine the ratios of abundance of
maternal and paternal alleles at a plurality of genomic regions
present in the sample. In some embodiments, the ratios of abundance
are determined in at least 100 genomic regions, which can comprise
a single locus, different loci, a single chromosome, or different
chromosomes. In some embodiments, a first genomic region (SNP)
analyzed is in a genomic region suspected of being trisomic or is
trisomic and a second genomic region (SNP) analyzed is in a
non-trisomic region or a region suspected of being non-trisomic.
The ratio of alleles in the first genomic region can then be
compared to the ratio of alleles in the second genomic region, and
in some embodiments, the comparison is made by determining the
difference in the means of the ratios in the first and second
genomic regions. An increase in paternal abundance can be
indicative of paternal trisomy, while an increase in maternal
abundance can be indicative of maternal trisomy. Alternatively, an
increase in paternal abundance or maternal abundance of one or more
alleles is indicative of partial trisomy. The first and second
genomic regions can be on the same or different chromosomes.
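The comparison of mean ratios described in paragraph [0008] can be sketched as follows. This is a minimal illustration only, assuming that a paternal-to-maternal abundance ratio has already been measured at each SNP; the function name `mean_ratio_difference` and the sample values are hypothetical and do not come from the specification.

```python
from statistics import mean

def mean_ratio_difference(suspect_ratios, reference_ratios):
    """Return mean(suspect) - mean(reference) for per-SNP allele ratios.

    Each ratio is assumed (for this sketch) to be paternal abundance
    divided by maternal abundance at one SNP.
    """
    if not suspect_ratios or not reference_ratios:
        raise ValueError("both regions need at least one ratio")
    return mean(suspect_ratios) - mean(reference_ratios)

# Hypothetical per-SNP ratios, invented for illustration.
suspect = [0.15, 0.17, 0.16, 0.16]      # region suspected of trisomy
reference = [0.10, 0.11, 0.09, 0.10]    # region assumed non-trisomic
diff = mean_ratio_difference(suspect, reference)
print(f"difference in mean ratio: {diff:.3f}")
```

A positive difference corresponds to increased paternal abundance in the suspect region. How large a difference is meaningful depends on the number of SNPs and the measurement noise, which this sketch does not model.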
[0009] In an embodiment, the invention provides for a method for
detecting a fetal abnormality comprising comparing an abundance of
one or more maternal alleles in a first genomic region in a
maternal blood sample, where said genomic region is suspected of
trisomy with an abundance of one or more maternal alleles in a
second genomic region in said blood sample wherein said second
genomic region is non-trisomic. Up to 20 ml of blood can be used to
detect the fetal abnormality. The first genomic region that is
suspected of trisomy and the second genomic region that is a
non-trisomic region can each be present on chromosomes 13, 18, 21
and on the X chromosome.
[0010] In some embodiments, a ratio of the abundance of the
maternal alleles in the first genomic region to the abundance of
the maternal alleles in the second genomic region can be determined
and compared to a second ratio obtained for a control sample. The
control sample can comprise a diluted portion of the maternal
sample, which can be diluted by a factor of at least 1,000.
[0011] In some embodiments, detecting the fetal abnormality further
involves estimating the number of fetal cells present in the
maternal sample. This can be performed by, e.g., ranking the
alleles detected according to their abundance. The ranking can then
be used to determine an abundance of one or more paternal alleles.
In some embodiments, data models can be fitted for optimal
detection of aneuploidy. The methods herein can be used to identify
monoploidy, triploidy, tetraploidy, pentaploidy and other multiples
of the normal haploid state. For example, the data models can be
used to determine estimates for the fraction of fetal cells present
in a sample and for detecting a fetal abnormality or condition.
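One way to picture the data-model fitting mentioned in paragraph [0011] is a least-squares comparison against a few candidate ploidy models. The sketch below rests on simplifying assumptions that are not in the specification: only SNPs where the mother is homozygous and the fetus carries one distinct paternal allele are used, the fetal cell fraction is already known, and every chromosome copy is sampled equally. The names `expected_paternal_fraction` and `best_fit_model` are hypothetical.

```python
MODELS = ("disomy", "maternal_trisomy", "paternal_trisomy")

def expected_paternal_fraction(fetal_fraction, model):
    """Expected fraction of paternal-allele copies at a SNP.

    Maternal cells contribute 2 maternal copies each. A fetal cell
    contributes 1 maternal + 1 paternal copy (disomy), 2 maternal +
    1 paternal (extra maternal copy), or 1 maternal + 2 paternal
    (extra paternal copy).
    """
    f = fetal_fraction
    if model == "disomy":
        return f / 2
    if model == "maternal_trisomy":
        return f / (2 + f)
    if model == "paternal_trisomy":
        return 2 * f / (2 + f)
    raise ValueError(f"unknown model: {model}")

def best_fit_model(observed_fractions, fetal_fraction):
    """Pick the model with the smallest sum of squared errors."""
    def sse(model):
        expected = expected_paternal_fraction(fetal_fraction, model)
        return sum((x - expected) ** 2 for x in observed_fractions)
    return min(MODELS, key=sse)

# Hypothetical observations at a 10% fetal cell fraction.
print(best_fit_model([0.094, 0.097, 0.095], 0.10))
print(best_fit_model([0.050, 0.051, 0.049], 0.10))
```

Under these assumptions the disomy and maternal-trisomy expectations differ only slightly at low fetal fractions, so telling them apart would take many SNPs and low measurement noise.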
[0012] In some embodiments, the abundance of one or more paternal
alleles can be compared to the abundance of the maternal alleles at
one or more genetic regions. In other embodiments, one or more
ratios of the abundance of the paternal allele(s) to the abundance
of the maternal allele(s) at one or more genetic regions can be
compared with an estimated fraction of fetal cells. A statistical
analysis can be performed on the one or more ratios of the
abundance of paternal alleles to the abundance of the maternal
alleles to determine the presence of fetal DNA in the sample with a
level of confidence that exceeds 90%.
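The specification does not say which statistical analysis is used to reach the stated confidence level. As one hedged illustration, the sketch below applies a one-sided binomial test. It assumes that reads of a non-maternal allele at SNPs where the mother is homozygous arise only from a known per-read error rate when no fetal DNA is present. Treating one minus the p-value as the confidence level is a simplification, and the names `binomial_upper_tail` and `fetal_dna_detected` are hypothetical.

```python
from math import comb

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def fetal_dna_detected(nonmaternal_reads, total_reads, error_rate,
                       confidence=0.90):
    """True if the non-maternal read count is unlikely under error alone."""
    p_value = binomial_upper_tail(nonmaternal_reads, total_reads, error_rate)
    return (1 - p_value) > confidence

# Hypothetical counts: 200 reads, assumed 1% error rate.
print(fetal_dna_detected(10, 200, 0.01))  # far above the ~2 reads expected from error
print(fetal_dna_detected(2, 200, 0.01))   # consistent with error alone
```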
INCORPORATION BY REFERENCE
[0013] All publications and patent applications mentioned in this
specification are herein incorporated by reference to the same
extent as if each individual publication or patent application was
specifically and individually indicated to be incorporated by
reference.
BRIEF DESCRIPTION OF THE DRAWINGS
[0014] The novel features of the invention are set forth with
particularity in the appended claims. A better understanding of the
features and advantages of the present invention will be obtained
by reference to the following detailed description that sets forth
illustrative embodiments, in which the principles of the invention
are utilized, and the accompanying drawings of which:
[0015] FIG. 1 illustrates an overview of the process of the
invention.
[0016] FIGS. 2A-2D illustrate one embodiment of a size-based
separation module.
[0017] FIGS. 3A-3C illustrate one embodiment of an affinity
separation module.
[0018] FIG. 4 illustrates one embodiment of a magnetic separation
module.
[0019] FIG. 5 illustrates an overview for a typical parallel SNP
genotyping assay.
[0020] FIG. 6 illustrates the types of SNP calls that result
depicting allele strengths at different loci.
[0021] FIG. 7 illustrates the concept of rank ordering of allele
strengths.
[0022] FIG. 8 illustrates a histogram of paternal allele strength
normalized relative to maternal alleles.
[0023] FIGS. 9A-9B illustrate cell smears of the product and waste
fractions.
[0024] FIGS. 10A-10F illustrate isolated fetal cells confirmed by
the reliable presence of male Y chromosome.
[0025] FIG. 11 illustrates trisomy 21 pathology in an isolated
fetal nucleated red blood cell.
[0026] FIGS. 12A-12D illustrate various embodiments of a size-based
separation module.
[0027] FIG. 13 illustrates the detection of single copies of a
fetal cell genome by qPCR.
[0028] FIG. 14 illustrates detection of single fetal cells in
binned samples by SNP analysis.
[0029] FIG. 15 illustrates a method of trisomy testing. The trisomy
21 screen is based on scoring of target cells obtained from
maternal blood. Blood is processed using a cell separation module
for hemoglobin enrichment (CSM-HE). Isolated cells are transferred
to slides that are first stained and subsequently probed by FISH.
Images are acquired, such as from bright field or fluorescent
microscopy, and scored. The proportion of trisomic cells of certain
classes serves as a classifier for risk of fetal trisomy 21. Fetal
genome identification can be performed using assays such as: (1) STR
markers; (2) qPCR using primers and probes directed to loci, such
as the multi-repeat DYZ locus on the Y-chromosome; (3) SNP
detection; and (4) CGH (comparative genome hybridization) array
detection.
[0030] FIG. 16 illustrates assays that can produce information on
the presence of aneuploidy and other genetic disorders in target
cells. Information on aneuploidy and other genetic disorders in
target cells may be acquired using technologies such as: (1) a CGH
array established for chromosome counting, which can be used for
aneuploidy determination and/or detection of intra-chromosomal
deletions; (2) SNP/taqman assays, which can be used for detection
of single nucleotide polymorphisms; and (3) ultra-deep sequencing,
which can be used to produce partial or complete genome sequences
for analysis.
[0031] FIG. 17 illustrates methods of fetal diagnostic assays.
Fetal cells are isolated by CSM-HE enrichment of target cells from
blood. The designation of the fetal cells may be confirmed using
techniques comprising FISH staining (using slides or membranes and
optionally an automated detector), FACS, and/or binning. Binning
may comprise distribution of enriched cells across wells in a plate
(such as a 96 or 384 well plate), microencapsulation of cells in
droplets that are separated in an emulsion, or by introduction of
cells into microarrays of nanofluidic bins. Fetal cells are then
identified using methods that may comprise the use of biomarkers
(such as fetal (gamma) hemoglobin), allele-specific SNP panels that
could detect fetal genome DNA, detection of differentially
expressed maternal and fetal transcripts (such as Affymetrix
chips), or primers and probes directed to fetal specific loci (such
as the multi-repeat DYZ locus on the Y-chromosome). Binning sites
that contain fetal cells are then analyzed for aneuploidy and/or
other genetic defects using a technique such as CGH array
detection, ultra deep sequencing (such as Solexa, 454, or mass
spectrometry), STR analysis, or SNP detection.
[0032] FIG. 18 illustrates methods of fetal diagnostic assays,
further comprising the step of whole genome amplification prior to
analysis of aneuploidy and/or other genetic defects.
DETAILED DESCRIPTION OF THE INVENTION
[0033] The methods herein are used for detecting the presence and
condition of fetal cells in a mixed sample wherein the fetal cells
are at a concentration of less than 90, 80, 70, 60, 50, 40, 30, 20,
10, 5 or 1% of all cells in the sample, or at a concentration less than
1:2, 1:4, 1:10, 1:50, 1:100, 1:1000, 1:10,000, 1:100,000,
1:1,000,000, 1:10,000,000 or 1:100,000,000 of all cells in the
sample.
[0034] FIG. 1 illustrates an overview of the methods and systems
herein.
[0035] In step 100, a sample to be analyzed for rare cells (e.g.
fetal cells) is obtained from an animal. Such animal can be
suspected of being pregnant, pregnant, or one that has been
pregnant. Such sample can be analyzed by the systems and methods
herein to determine a condition in the animal or fetus of the
animal. In some embodiments, the methods herein are used to detect
the presence of a fetus, sex of a fetus, or condition of the fetus.
The animal from whom the sample is obtained can be, for example, a
human or a domesticated animal such as a cow, chicken, pig, horse,
rabbit, dog, cat, or goat. Samples derived from an animal or human
include, e.g., whole blood, sweat, tears, ear flow, sputum, lymph,
bone marrow suspension, urine, saliva, semen, vaginal flow,
cerebrospinal fluid, brain fluid, ascites, milk, and secretions of the
respiratory, intestinal or genitourinary tracts.
[0036] To obtain a blood sample, any technique known in the art may
be used, e.g. a syringe or other vacuum suction device. A blood
sample can be optionally pre-treated or processed prior to
enrichment. Examples of pre-treatment steps include the addition of
a reagent such as a stabilizer, a preservative, a fixative, a lysing
reagent, a diluent, an anti-apoptotic reagent, an anti-coagulation
reagent, an anti-thrombotic reagent, a magnetic property regulating
reagent, a buffering reagent, an osmolality regulating reagent, a
pH regulating reagent, and/or a cross-linking reagent.
[0037] When a blood sample is obtained, a preservative such as an
anti-coagulation agent and/or a stabilizer is often added to the
sample prior to enrichment. This allows for extended time for
analysis/detection. Thus, a sample, such as a blood sample, can be
enriched and/or analyzed under any of the methods and systems
herein within 1 week, 6 days, 5 days, 4 days, 3 days, 2 days, 1
day, 12 hrs, 6 hrs, 3 hrs, 2 hrs, or 1 hr from the time the sample
is obtained.
[0038] A blood sample can be combined with an agent that
selectively lyses one or more cells or components in a blood
sample. For example, fetal cells can be selectively lysed releasing
their nuclei when a blood sample including fetal cells is combined
with deionized water. Such selective lysis allows for the
subsequent enrichment of fetal nuclei using, e.g., size or affinity
based separation. In another example, platelets and/or enucleated
red blood cells are selectively lysed to generate a sample enriched
in nucleated cells, such as fetal nucleated red blood cells (fnRBC)
and maternal nucleated red blood cells (mnRBC). The fnRBCs can
subsequently be separated from the mnRBCs using, e.g., antigen-i
affinity or differences in hemoglobin.
[0039] When obtaining a sample from an animal (e.g., blood sample),
the amount can vary depending upon animal size, its gestation
period, and the condition being screened. Up to 50, 40, 30, 20, 10,
9, 8, 7, 6, 5, 4, 3, 2, or 1 mL of a sample is obtained. The volume
of sample obtained can be 1-50, 2-40, 3-30, or 4-20 mL.
Alternatively, more than 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55,
60, 65, 70, 75, 80, 85, 90, 95 or 100 mL of a sample is
obtained.
[0040] To detect fetal abnormality, a blood sample can be obtained
from a pregnant animal or human within 36, 24, 22, 20, 18, 16, 14,
12, 10, 8, 6 or 4 weeks of gestation.
[0041] In step 101, a reference or control sample is obtained by
any means known in the art. A reference sample is any sample that
consists essentially of, or only of, non-fetal cells or non-fetal
DNA. A reference sample is preferably a maternal only cell or DNA
sample. In some embodiments, a reference sample is a maternal only
blood sample. When obtaining a reference sample such as a maternal
blood sample from a pregnant female, or one suspected of being
pregnant, the sample can be diluted enough to ensure that
<<1 fetal cell is expected in the sample. Dilution can be by
a factor of about 10 to 1000 fold, or by a factor of greater than
5, 10, 50, 100, 200, 500 or 1000 fold. Alternatively, white blood
cells can be obtained from the same organism from whom the mixed
sample is obtained. In some cases, the reference sample is obtained
by diluting a portion of the mixed sample.
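The requirement that far fewer than one fetal cell be expected in the reference sample can be checked with simple arithmetic. The sketch below assumes that cells distribute into the diluted aliquot according to a Poisson distribution, which is a modeling choice made for this illustration and is not stated in the specification; the function names and the starting count of 10 fetal cells are hypothetical.

```python
from math import exp

def expected_fetal_cells(fetal_cells_in_sample, dilution_factor):
    """Mean number of fetal cells remaining after dilution."""
    return fetal_cells_in_sample / dilution_factor

def prob_at_least_one_fetal_cell(fetal_cells_in_sample, dilution_factor):
    """P(at least one fetal cell) under the assumed Poisson model."""
    lam = expected_fetal_cells(fetal_cells_in_sample, dilution_factor)
    return 1 - exp(-lam)

# Hypothetical example: 10 fetal cells in the mixed sample, 1000-fold dilution.
mean_cells = expected_fetal_cells(10, 1000)
p_contaminated = prob_at_least_one_fetal_cell(10, 1000)
print(f"expected fetal cells: {mean_cells:.3f}, P(>=1): {p_contaminated:.4f}")
```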
[0042] In step 102, when the sample to be tested or analyzed is a
mixed sample (e.g. maternal blood sample), it is enriched for rare
cells or rare DNA (e.g. fetal cells, fetal DNA or fetal nuclei)
using one or more methods known in the art or disclosed herein.
Such enrichment increases the ratio of fetal cells to non-fetal
cells, the concentration of fetal DNA to non-fetal DNA, and/or the
concentration of fetal cells in volume per total volume of the
mixed sample.
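The change in the fetal to non-fetal ratio described in paragraph [0042] can be expressed as a fold enrichment. The sketch below is a minimal illustration; the function name `fold_enrichment` and the cell counts are hypothetical and are not measurements from the specification.

```python
def fold_enrichment(fetal_before, nonfetal_before, fetal_after, nonfetal_after):
    """Ratio of fetal:non-fetal after enrichment divided by the ratio before."""
    ratio_before = fetal_before / nonfetal_before
    ratio_after = fetal_after / nonfetal_after
    return ratio_after / ratio_before

# Hypothetical counts: 10 fetal cells among 1e9 others before enrichment,
# 8 fetal cells among 1e5 others afterwards.
enrichment = fold_enrichment(10, 1e9, 8, 1e5)
print(f"fold enrichment: {enrichment:.0f}")
```

In this invented example two fetal cells are lost during processing, yet the ratio still improves by several thousand fold because most non-fetal cells are removed.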
[0043] In some embodiments, enrichment occurs by selective lysis as
described above. For example, enucleated cells may be selectively
lysed prior to subsequent enrichment steps or fetal nucleated cells
may be selectively lysed prior to separation of the fetal nuclei
from other cells and components in the sample.
[0044] In some embodiments, enrichment of fetal cells or fetal
nuclei occurs using one or more size-based separation modules.
Size-based separation modules include filtration modules, sieves,
matrixes, etc., including those disclosed in International
Publication Nos. WO 2004/113877, WO 2004/0144651, and US
Application Publication No. 2004/011956.
[0045] In some embodiments, a size-based separation module includes
one or more arrays of obstacles that form a network of gaps. The
obstacles are configured to direct particles (e.g. cells or nuclei)
as they flow through the array/network of gaps into different
directions or outlets based on the particle's hydrodynamic size.
For example, as a blood sample flows through an array of obstacles,
nucleated cells or cells having a hydrodynamic size larger than a
predetermined size, e.g., 8 microns, are directed to a first outlet
located on the opposite side of the array of obstacles from the
fluid flow inlet, while the enucleated cells or cells having a
hydrodynamic size smaller than a predetermined size, e.g., 8
microns, are directed to a second outlet also located on the
opposite side of the array of obstacles from the fluid flow
inlet.
[0046] An array can be configured to separate cells smaller than a
predetermined size from those larger than a predetermined size by
adjusting the size of the gaps, obstacles, and offset in the period
between each successive row of obstacles. For example, in some
embodiments, obstacles and/or gaps between obstacles can be up to
10, 20, 50, 70, 100, 120, 150, 170, or 200 microns in length or
about 2, 4, 6, 8 or 10 microns in length. In some embodiments, an
array for size-based separation includes more than 100, 500, 1,000,
5,000, 10,000, 50,000 or 100,000 obstacles that are arranged into
more than 10, 20, 50, 100, 200, 500, or 1000 rows. Preferably,
obstacles in a first row of obstacles are offset from a previous
(upstream) row of obstacles by up to 50% the period of the previous
row of obstacles. In some embodiments, obstacles in a first row of
obstacles are offset from a previous row of obstacles by up to 45,
40, 35, 30, 25, 20, 15 or 10% the period of the previous row of
obstacles. Furthermore, the distance between a first row of
obstacles and a second row of obstacles can be up to 10, 20, 50,
70, 100, 120, 150, 170 or 200 microns. A particular offset can be
continuous (repeating for multiple rows) or non-continuous. In some
embodiments, a separation module includes multiple discrete arrays
of obstacles fluidly coupled such that they are in series with one
another. Each array of obstacles has a continuous offset. But each
subsequent (downstream) array of obstacles has an offset that is
different from the previous (upstream) offset. Preferably, each
subsequent array of obstacles has a smaller offset than the
previous array of obstacles. This allows for a refinement in the
separation process as cells migrate through the array of obstacles.
Thus, a plurality of arrays can be fluidly coupled in series or in
parallel, (e.g., more than 2, 4, 6, 8, 10, 20, 30, 40, 50). Fluidly
coupling separation modules (e.g., arrays) in parallel allows for
high-throughput analysis of the sample, such that at least 1, 2, 5,
10, 20, 50, 100, 200, or 500 mL per hour flows through the
enrichment modules or at least 1, 5, 10, or 50 million cells per
hour are sorted or flow through the device.
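The row-offset geometry in paragraph [0046] can be illustrated with a short calculation. The sketch below assumes a continuous offset given as a whole-number percentage of the row period. It computes the lateral shift per row and the number of rows after which the obstacle pattern realigns with the first row. The function names and the 20 micron period are hypothetical.

```python
from fractions import Fraction

def row_shift_microns(period_microns, offset_percent):
    """Lateral shift of each row relative to the previous row."""
    return period_microns * offset_percent / 100

def rows_until_realigned(offset_percent):
    """Rows needed before the cumulative shift is a whole number of periods.

    With the offset written as a reduced fraction p/q of the period,
    the pattern realigns after q rows.
    """
    return Fraction(offset_percent, 100).denominator

# Hypothetical array: 20 micron period, offsets taken from the listed values.
for pct in (50, 25, 10, 45):
    print(pct, row_shift_microns(20, pct), rows_until_realigned(pct))
```

A 10% offset repeats every 10 rows, whereas a 45% offset (9/20 of the period) repeats only every 20 rows, so the offset sets the repeat length of the array as well as the size of each row's shift.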
[0047] FIGS. 2A-2D illustrate an example of a size-based
separation module. Obstacles (which may be of any shape) are
coupled to a flat substrate to form an array of gaps. A transparent
cover or lid may be used to cover the array. The obstacles form a
two-dimensional array with each successive row shifted horizontally
with respect to the previous row of obstacles, where the array of
obstacles directs components having a hydrodynamic size smaller than
a predetermined size in a first direction and components having a
hydrodynamic size larger than a predetermined size in a second
direction. The flow of sample into the array of obstacles can be
aligned at a small angle (flow angle) with respect to a
line-of-sight of the array. Optionally, the array is coupled to an
infusion pump to perfuse the sample through the obstacles. The flow
conditions of the size-based separation module described herein are
such that cells are sorted by the array with minimal damage. This
allows for downstream analysis of intact cells and intact nuclei to
be more efficient and reliable.
[0048] In one embodiment, a size-based separation module comprises
an array of obstacles configured to direct fetal cells larger than
a predetermined size to migrate along a line-of-sight within the
array towards a first outlet or bypass channel leading to a first
outlet, while directing cells and analytes smaller than a
predetermined size through the array of obstacles in a different
direction towards a second outlet.
[0049] A variety of enrichment protocols may be utilized although,
in most embodiments, gentle handling of the cells is needed to
reduce any mechanical damage to the cells or their DNA. This gentle
handling also preserves the small number of fetal cells in the
sample. Integrity of the nucleic acid being evaluated is an
important feature to permit the distinction between the genomic
material from the fetal cells and other cells in the sample. In
particular, the enrichment and separation of the fetal cells using
the arrays of obstacles produces gentle treatment which minimizes
cellular damage and maximizes nucleic acid integrity permitting
exceptional levels of separation and the ability to subsequently
utilize various formats to very accurately analyze the genome of
the cells which are present in the sample in extremely low
numbers.
[0050] In some embodiments, enrichment of fetal cells occurs using
one or more capture modules that selectively inhibit the mobility
of one or more cells of interest. Preferably, a capture module is
fluidly coupled downstream to a size-based separation module.
Capture modules can include a substrate having multiple obstacles
that restrict the movement of cells or analytes greater than a
predetermined size. Examples of capture modules that inhibit the
migration of cells based on size are disclosed in U.S. Pat. Nos.
5,837,115 and 6,692,952.
[0051] In some embodiments, a capture module includes a
two-dimensional array of obstacles that selectively filters or captures
cells or analytes having a hydrodynamic size greater than a
particular gap size, e.g., predetermined size. Arrays of obstacles
adapted for separation by capture can include obstacles having one
or more shapes and can be arranged in a uniform or non-uniform
order. In some embodiments, a two-dimensional array of obstacles is
staggered such that each subsequent row of obstacles is offset from
the previous row of obstacles to increase the number of
interactions between the analytes being sorted (separated) and the
obstacles.
[0052] Another example of a capture module is an affinity-based
separation module. An affinity-based separation module captures
analytes or cells of interest based on their affinity to a
structure or particle as opposed to their size. One example of an
affinity-based separation module is an array of obstacles that are
adapted for complete sample flow through, but for the fact that the
obstacles are covered with binding moieties that selectively bind
one or more analytes (e.g., cell population) of interest (e.g., red
blood cells, fetal cells, or nucleated cells) or analytes
not-of-interest (e.g., white blood cells). Binding moieties can
include e.g., proteins (e.g., ligands/receptors), nucleic acids
having complementary counterparts in retained analytes, antibodies,
etc. In some embodiments, an affinity-based separation module
comprises a two-dimensional array of obstacles covered with one or
more antibodies selected from the group consisting of: anti-CD71,
anti-CD235a, anti-CD36, anti-carbohydrates, anti-selectin,
anti-CD45, anti-GPA, and anti-antigen-i.
[0053] FIG. 3A illustrates a path of a first analyte through an
array of posts wherein an analyte that does not specifically bind
to a post continues to migrate through the array, while an analyte
that does bind a post is captured by the array.
[0054] FIG. 3B is a picture of antibody coated posts. FIG. 3C
illustrates coupling of antibodies to a substrate (e.g., obstacles,
side walls, etc.) as contemplated by the present invention.
Examples of such affinity-based separation modules are described in
International Publication No. WO 2004/029221.
[0055] In some embodiments, a capture module utilizes a magnetic
field to separate and/or enrich one or more analytes (cells) that
have a magnetic property or magnetic potential. For example, red
blood cells which are slightly diamagnetic (repelled by magnetic
field) in physiological conditions can be made paramagnetic
(attracted by magnetic field) by deoxygenation of the hemoglobin or
its oxidation into methemoglobin. This magnetic property can be
achieved through
physical or chemical treatment of the red blood cells. Thus, a
sample containing one or more red blood cells and one or more
non-red blood cells can be enriched for the red blood cells by
first inducing a magnetic property and then separating the above
red blood cells from other analytes using a magnetic field (uniform
or non-uniform). For example, a maternal blood sample can flow
first through a size-based separation module to remove enucleated
cells and cellular components (e.g., analytes having a hydrodynamic
size less than 6 .mu.m) based on size. Subsequently, the enriched
nucleated cells (e.g., analytes having a hydrodynamic size greater
than 6 .mu.m), namely white blood cells and nucleated red blood cells, are
treated with a reagent, such as CO.sub.2, N.sub.2 or NaNO.sub.2,
that changes the magnetic property of the red blood cells'
hemoglobin. The treated sample then flows through a magnetic field
(e.g., a column coupled to an external magnet), such that the
paramagnetic analytes (e.g., red blood cells) will be captured by
the magnetic field while the white blood cells and any other
non-red blood cells will flow through the device to result in a
sample enriched in nucleated red blood cells (including fnRBC's).
Additional examples of magnetic separation modules are described in
U.S. application Ser. No. 11/323,971, filed Dec. 29, 2005 entitled
"Devices and Methods for Magnetic Enrichment of Cells and Other
Particles" and U.S. application Ser. No. 11/227,904, filed Sep. 15,
2005, entitled "Devices and Methods for Enrichment and Alteration
of Cells and Other Particles".
[0056] Subsequent enrichment steps can be used to separate the rare
cells (e.g. fnRBC's) from the non-rare maternal nucleated red blood
cells (mnRBC's). In some embodiments, a sample enriched by
size-based separation followed by affinity/magnetic separation is
further enriched for rare cells using fluorescence activated cell
sorting (FACS) or selective lysis of a subset of the cells (e.g.
fetal cells). In some embodiments, fetal cells are selectively
bound to an anti-antigen i binding moiety (e.g. an antibody) to
separate them from the mnRBC's. In some embodiments, fetal cells or
fetal DNA is distinguished from non-fetal cells or non-fetal DNA by
forcing the rare cells (fetal cells) to become apoptotic, thus
condensing their nuclei and optionally ejecting their nuclei. Rare
cells such as fetal cells can be forced into apoptosis using
various means including subjecting the cells to hyperbaric pressure
(e.g. 4% CO.sub.2). The condensed nuclei can be detected and/or
isolated for further analysis using any technique known in the art
including DNA gel electrophoresis, in situ labeling of DNA nicks by
terminal deoxynucleotidyl transferase (TdT)-mediated dUTP in situ
nick labeling (also known as TUNEL) (Gavrieli, Y., et al. J. Cell
Biol. 119:493-501 (1992)), and ligation of DNA strand breaks having
one- or two-base 3' overhangs (Taq polymerase-based in situ
ligation) (Didenko V., et al. J. Cell Biol. 135:1369-76
(1996)).
[0057] In some embodiments, when the analyte desired to be
separated (e.g., red blood cells or white blood cells) is not
ferromagnetic or does not have a magnetic property, a magnetic
particle (e.g., a bead) or compound (e.g., Fe.sup.3+) can be
coupled to the analyte to give it a magnetic property. In some
embodiments, a bead coupled to an antibody that selectively binds
to an analyte of interest can be decorated with an antibody selected
from the group of anti-CD71 or anti-CD75. In some embodiments a magnetic
compound, such as Fe.sup.3+, can be coupled to an antibody such as
those described above. The magnetic particles or magnetic
antibodies herein may be coupled to any one or more of the devices
described herein prior to contact with a sample or may be mixed
with the sample prior to delivery of the sample to the
device(s).
[0058] The magnetic field used to separate analytes/cells in any of
the embodiments herein can be uniform or non-uniform as well as
external or internal to the device(s) herein. An external magnetic
field is one whose source is outside a device herein (e.g.,
container, channel, obstacles). An internal magnetic field is one
whose source is within a device contemplated herein. An example of
an internal magnetic field is one where magnetic particles may be
attached to obstacles present in the device (or manipulated to
create obstacles) to increase surface area for analytes to interact
with to increase the likelihood of binding. Analytes captured by a
magnetic field can be released by demagnetizing the magnetic
regions retaining the magnetic particles. For selective release of
analytes from regions, the demagnetization can be limited to
selected obstacles or regions. For example, the magnetic field can
be designed to be electromagnetic, enabling turn-on and turn-off of
the magnetic fields for each individual region or obstacle at
will.
[0059] FIG. 4 illustrates an embodiment of a device configured for
capture and isolation of cells expressing the transferrin receptor
from a complex mixture. Monoclonal antibodies to CD71 receptor are
readily available off-the-shelf and can be covalently coupled to
magnetic materials, such as, but not limited to any conventional
ferroparticles including ferrous doped polystyrene and
ferroparticles or ferro-colloids (e.g., from Miltenyi or Dynal).
The anti CD71 bound to magnetic particles is flowed into the
device. The antibody coated particles are drawn to the obstacles
(e.g., posts), floor, and walls and are retained by the strength of
the magnetic field interaction between the particles and the
magnetic field. The particles between the obstacles, and those
loosely retained within the sphere of influence of the local magnetic
fields away from the obstacles, are removed by a rinse.
[0060] One or more of the enrichment modules herein (e.g.,
size-based separation module(s) and capture module(s)) may be
fluidly coupled in series or in parallel with one another. For
example a first outlet from a separation module can be fluidly
coupled to a capture module. In some embodiments, the separation
module and capture module are integrated such that a plurality of
obstacles acts both to deflect certain analytes according to size
and direct them in a path different than the direction of
analyte(s) of interest, and also as a capture module to capture,
retain, or bind certain analytes based on size, affinity, magnetism
or other physical property.
[0061] In any of the embodiments herein, the enrichment steps
performed have a specificity and/or sensitivity >50, 60, 70, 80,
90, 95, 96, 97, 98, 99, 99.1, 99.2, 99.3, 99.4, 99.5, 99.6, 99.7,
99.8, 99.9 or 99.95%. The retention rate of the enrichment module(s)
herein is such that >50, 60, 70, 80, 90, 91, 92, 93, 94, 95, 96,
97, 98, 99, or 99.9% of the analytes or cells of interest (e.g.,
nucleated cells, nucleated red blood cells, or nuclei from red
blood cells) are retained. Simultaneously, the enrichment modules
are configured to remove >50, 60, 70, 80, 85, 90, 91, 92, 93,
94, 95, 96, 97, 98, 99, or 99.9% of all unwanted analytes (e.g.,
red blood-platelet enriched cells) from a sample.
[0062] Any or all of the enrichment steps can occur with minimal
dilution of the sample. For example, in some embodiments the
analytes of interest are retained in an enriched solution that is
less than 50, 40, 30, 20, 10, 9.0, 8.0, 7.0, 6.0, 5.0, 4.5, 4.0,
3.5, 3.0, 2.5, 2.0, 1.5, 1.0, or 0.5 fold diluted from the original
sample. In some embodiments, any or all of the enrichment steps
increase the concentration of the analyte of interest (e.g. fetal
cell), for example, by transferring them from the fluid sample to
an enriched fluid sample (sometimes in a new fluid medium, such as
a buffer). The new concentration of the analyte of interest may be
at least 2, 4, 6, 8, 10, 20, 50, 100, 200, 500, 1,000, 2,000,
5,000, 10,000, 20,000, 50,000, 100,000, 200,000, 500,000,
1,000,000, 2,000,000, 5,000,000, 10,000,000, 20,000,000,
50,000,000, 100,000,000, 200,000,000, 500,000,000, 1,000,000,000,
2,000,000,000, or 5,000,000,000 fold more concentrated than in the
original sample. For example, a 10 times concentration increase of
a first cell type out of a blood sample means that the ratio of
first cell type/all cells in a sample is 10 times greater after the
sample was applied to the apparatus herein. Such concentration can
take a fluid sample (e.g., a blood sample) of greater than 10, 15,
20, 50, or 100 mL total volume comprising rare components of
interest, and it can concentrate such rare component of interest
into a concentrated solution of less than 0.5, 1, 2, 3, 5, or 10 mL
total volume.
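As an illustrative sketch of the fold-concentration definition
above, the ratio of a cell type to all cells can be compared before
and after enrichment. The function name and the cell counts below
are hypothetical and are not taken from any measurement.

```python
from fractions import Fraction


def fold_enrichment(target_before, total_before, target_after, total_after):
    """Fold increase in (target cells / all cells) after enrichment."""
    before = Fraction(target_before, total_before)
    after = Fraction(target_after, total_after)
    return after / before


# Hypothetical counts: 20 target cells among 10,000,000 cells before
# enrichment, 18 target cells among 900,000 cells afterwards.
fold = fold_enrichment(20, 10_000_000, 18, 900_000)
print(f"{float(fold):.1f}-fold concentration increase")
```

Exact fractions are used so that very small ratios are compared
without floating-point rounding.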
[0063] The final concentration of rare cells in relation to
non-rare cells after enrichment can be about 1/10,000 to 1/10, or
1/1,000 to 1/100. In some embodiments, the concentration of fetal
cells to maternal cells may be up to 1/1,000, 1/100, or 1/10, or as
low as 1/1,000 or 1/10,000.
[0064] Thus, detection and analysis of the fetal cells can occur
even if the non-fetal (e.g. maternal) cells are >50%, 60%, 70%,
80%, 90%, 95%, or 99% of all cells in a sample. In some
embodiments, fetal cells are at a concentration of less than 1:2,
1:4, 1:10, 1:50, 1:100, 1:1,000, 1:10,000, 1:100,000, 1:1,000,000,
1:10,000,000 or 1:100,000,000 of all cells in a mixed sample to be
analyzed or at a concentration of less than 1.times.10.sup.-3,
1.times.10.sup.-4, 1.times.10.sup.-5, or
1.times.10.sup.-6 cells/.mu.L of the mixed sample. Overall, a
mixed sample (e.g., an enriched sample) has
up to 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 30, 40, 50, or 100 total
fetal cells.
[0065] Enriched target cells (e.g., fnRBC) can be "binned" prior to
analysis of the enriched cells (FIGS. 17 and 18). Binning is any
process which results in the reduction of complexity and/or total
cell number of the enriched cell output. Binning may be performed
by any method known in the art or described herein. One method of
binning the enriched cells is by serial dilution. Such dilution may
be carried out using any appropriate platform (e.g., PCR wells,
microtiter plates). Other methods include nanofluidic systems which
separate samples into droplets (e.g., BioTrove, Raindance,
Fluidigm). Such nanofluidic systems may result in the presence of a
single cell present in a nanodroplet.
[0066] Binning may be preceded by positive selection for target
cells including, but not limited to affinity binding (e.g. using
anti-CD71 antibodies). Alternately, negative selection of
non-target cells may precede binning. For example, output from the
size-based separation module may be passed through a magnetic
hemoglobin enrichment module (MHEM) which selectively removes WBCs
from the enriched sample.
[0067] For example, the possible cellular content of output from
enriched maternal blood which has been passed through a size-based
separation module (with or without further enrichment by passing
the enriched sample through a MHEM) may consist of: 1)
approximately 20 fnRBC; 2) 1,500 mnRBC; 3) 4,000-40,000 WBC; 4)
15.times.10.sup.6 RBC. If this sample is separated into 100 bins
(PCR wells or other acceptable binning platform), the bins would be
expected to contain: 1) 80 negative bins and 20 bins positive for
one fnRBC; 2) 15 mnRBC per bin; 3) 40-400 WBC per bin; 4)
15.times.10.sup.4 RBC per bin. If separated into 10,000 bins, the
bins would be expected to contain: 1) 9,980 negative bins and 20
bins positive for one fnRBC; 2) 8,500 negative bins and 1,500 bins
positive for one mnRBC; 3) <1-4 WBC per bin; 4) 15.times.10.sup.2
RBC per bin. One of skill in the art
will recognize that the number of bins may be increased depending
on experimental design and/or the platform used for binning. The
reduced complexity of the binned cell populations may facilitate
further genetic and cellular analysis of the target cells.
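The per-bin arithmetic of this example can be sketched as follows,
under the simplifying assumption that every cell type is split
evenly across the bins. The function and dictionary names are
hypothetical; the cell counts are the illustrative totals given
above, with WBC taken at the low and high ends of the stated range.

```python
def expected_bin_content(cell_counts, n_bins):
    """Mean number of cells of each type per bin, assuming an even split."""
    return {name: count / n_bins for name, count in cell_counts.items()}


# Illustrative totals for the enriched output described in the text.
sample = {
    "fnRBC": 20,
    "mnRBC": 1_500,
    "WBC_low": 4_000,
    "WBC_high": 40_000,
    "RBC": 15_000_000,
}

for n_bins in (100, 10_000):
    print(n_bins, expected_bin_content(sample, n_bins))
```

A mean below one cell per bin (for example, 20 fnRBC over 100 bins)
corresponds to most bins being negative and at most 20 bins holding
a single fnRBC.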
[0068] Analysis may be performed on individual bins to confirm the
presence of target cells (e.g. fnRBC) in the individual bin. Such
analysis may consist of any method known in the art, including, but
not limited to, FISH, PCR, STR detection, SNP analysis, biomarker
detection, and sequence analysis (FIGS. 17 and 18).
[0069] Fetal Biomarkers
[0070] In some embodiments fetal biomarkers may be used to detect
and/or isolate fetal cells, after enrichment or after detection of
fetal abnormality or lack thereof. For example, this may be
performed by distinguishing between fetal and maternal nRBCs based
on relative expression of a gene (e.g., DYS1, DYZ, CD-71,
.epsilon.- and .zeta.-globin) that is differentially expressed
during fetal development. In preferred embodiments, biomarker genes
are differentially expressed in the first and/or second trimester.
"Differentially expressed," as applied to nucleotide sequences or
polypeptide sequences in a cell or cell nuclei, refers to
differences in over/under-expression of that sequence when compared
to the level of expression of the same sequence in another sample,
a control or a reference sample. In some embodiments, expression
differences can be temporal and/or cell-specific. For example, for
cell-specific expression of biomarkers, differential expression of
one or more biomarkers in the cell(s) of interest can be higher or
lower relative to background cell populations. Detection of such
difference in expression of the biomarker may indicate the presence
of a rare cell (e.g., fnRBC) versus other cells in a mixed sample
(e.g., background cell populations). In other embodiments, a ratio
of two or more such biomarkers that are differentially expressed
can be measured and used to detect rare cells.
[0071] In one embodiment, fetal biomarkers comprise differentially
expressed hemoglobins. Erythroblasts (nRBCs) are very abundant in
the early fetal circulation and virtually absent in normal adult
blood, and because they have a short, finite lifespan, there is no
risk of obtaining fnRBC which may persist from a previous pregnancy.
Furthermore, unlike trophoblast cells, fetal erythroblasts are not
prone to mosaic characteristics.
[0072] Yolk sac erythroblasts synthesize .epsilon.-, .zeta.-,
.gamma.- and .alpha.-globins, which combine to form the embryonic
hemoglobins. Between six and eight weeks, the primary site of
erythropoiesis shifts from the yolk sac to the liver; the three
embryonic hemoglobins are replaced by fetal hemoglobin (HbF) as the
predominant oxygen transport system, and .epsilon.- and
.zeta.-globin production gives way to .gamma.-, .alpha.- and
.beta.-globin production within definitive erythrocytes (Peschle et
al., 1985). HbF remains the principal hemoglobin until birth, when
the second globin switch occurs and .beta.-globin production
accelerates.
[0073] Hemoglobin (Hb) is a heterotetramer composed of two identical
.alpha. globin chains and two copies of a second globin. Due to
differential gene expression during fetal development, the
composition of the second chain changes from .epsilon. globin
during early embryonic development (1 to 4 weeks of gestation) to
.gamma. globin during fetal development (6 to 8 weeks of gestation)
to .beta. globin in neonates and adults, as illustrated in Table
1.
TABLE 1. Relative expression of .epsilon., .gamma. and .beta. in
maternal and fetal RBCs.

                            .epsilon.   .gamma.   .beta.
1st trimester   Fetal       ++          ++        -
                Maternal    -           +/-       ++
2nd trimester   Fetal       -           ++        +/-
                Maternal    -           +/-       ++
[0074] In the late-first trimester, the earliest time that fetal
cells may be sampled by CVS, fnRBCs contain, in addition to .alpha.
globin, primarily .epsilon. and .gamma. globin. In the early to mid
second trimester, when amniocentesis is typically performed, fnRBCs
contain primarily .gamma. globin with some adult .beta. globin.
Maternal cells contain almost exclusively .alpha. and .beta.
globin, with traces of .gamma. detectable in some samples.
Therefore, by measuring the relative expression of the .epsilon.,
.gamma. and .beta. genes in RBCs purified from maternal blood
samples, the presence of fetal cells in the sample can be
determined. Furthermore, positive controls can be utilized to
assess failure of the FISH analysis itself.
[0075] In various embodiments, fetal cells are distinguished from
maternal cells based on the differential expression of hemoglobins
.beta., .gamma. or .epsilon.. Expression levels or RNA levels can
be determined in the cytoplasm or in the nucleus of cells. Thus in
some embodiments, the methods herein involve determining levels of
messenger RNA (mRNA), ribosomal RNA (rRNA), or nuclear RNA
(nRNA).
[0076] In some embodiments, identification of fnRBCs can be
achieved by measuring the levels of at least two hemoglobins in the
cytoplasm or nucleus of a cell. In various embodiments,
identification and assay is from 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15
or 20 fetal nuclei. Furthermore, total nuclei arrayed on one or
more slides can number from about 100, 200, 300, 400, 500, 700,
800, 5000, 10,000, 100,000, 1,000,000, 2,000,000 to about
3,000,000. In some embodiments, a ratio for .gamma./.beta. or
.epsilon./.beta. is used to determine the presence of fetal cells,
where a number less than one indicates that a fnRBC(s) is not
present. In some embodiments, the relative expression of
.gamma./.beta. or .epsilon./.beta. provides a fnRBC index ("FNI"),
as measured by .gamma. or .epsilon. relative to .beta.. In some
embodiments, a FNI for .gamma./.beta. greater than 5, 10, 15, 20,
25, 30, 35, 40, 45, 90, 180, 360, 720, 975, 1020, 1024, or
about 1250 indicates that a fnRBC(s) is present. In yet other
embodiments, a FNI for .gamma./.beta. of less than about 1
indicates that a fnRBC(s) is not present. Preferably, the above FNI
is determined from a sample obtained during a first trimester.
However, similar ratios can be used during second trimester and
third trimester.
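A minimal sketch of how such an index might be evaluated is given
below. The function names, the "indeterminate" label, and the input
expression levels are hypothetical; the cut-offs use the lowest
"present" threshold listed above (5) and the "not present"
threshold of about 1.

```python
def fnrbc_index(gamma_level, beta_level):
    """Return the gamma/beta expression ratio (the FNI of the text)."""
    if beta_level <= 0:
        raise ValueError("beta_level must be positive")
    return gamma_level / beta_level


def classify(fni, present_threshold=5.0, absent_threshold=1.0):
    """Label a ratio using two cut-offs; values in between are left open."""
    if fni > present_threshold:
        return "fnRBC present"
    if fni < absent_threshold:
        return "fnRBC not present"
    return "indeterminate"


# Hypothetical relative expression levels in arbitrary units.
print(classify(fnrbc_index(40.0, 2.0)))
print(classify(fnrbc_index(1.0, 4.0)))
```

The same two functions could be applied to an .epsilon./.beta.
ratio by passing the .epsilon. level as the first argument.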
[0077] In some embodiments, the expression levels are determined by
measuring nuclear RNA transcripts including, nascent or unprocessed
transcripts. In another embodiment, expression levels are
determined by measuring mRNA, including ribosomal RNA. There are
many methods known in the art for imaging (e.g., measuring) nucleic
acids or RNA including, but not limited to, using expression arrays
from Affymetrix, Inc. or Illumina, Inc.
[0078] RT-PCR primers can be designed by targeting the globin
variable regions, selecting the amplicon size, and adjusting the
primer annealing temperature to achieve equal PCR amplification
efficiency. Thus TaqMan probes can be designed for each of the
amplicons with well-separated fluorescent dyes, Alexa
Fluor.RTM.-355 for .epsilon., Alexa Fluor.RTM.-488 for .gamma., and Alexa
Fluor-555 for .beta.. The specificity of these primers can be first
verified using .epsilon., .gamma., and .beta. cDNA as templates.
The primer sets that give the best specificity can be selected for
further assay development. As an alternative, the primers can be
selected from two exons spanning an intron sequence to amplify only
the mRNA to eliminate the genomic DNA contamination.
[0079] The primers selected can be tested first in a duplex format
to verify their specificity, limit of detection, and amplification
efficiency using target cDNA templates. The best combinations of
primers can be further tested in a triplex format for their
amplification efficiency, detection dynamic range, and limit of
detection.
[0080] Various reagents are commercially available for
RT-PCR, such as one-step RT-PCR reagents, including the Qiagen One-Step
RT-PCR Kit and the Applied Biosystems TaqMan One-Step RT-PCR Master Mix
Reagents kit. Such reagents can be used to establish the expression
ratio of .epsilon., .gamma., and .beta. using purified RNA from
enriched samples. Forward primers can be labeled for each of the
targets, using Alexa fluor-355 for .epsilon., Alexa fluor-488 for
.gamma., and Alexa fluor-555 for .beta.. Enriched cells can be
deposited by cytospinning onto glass slides. Additionally,
cytospinning the enriched cells can be performed after in situ
RT-PCR. Thereafter, the presence of the fluorescent-labeled
amplicons can be visualized by fluorescence microscopy. The reverse
transcription time and PCR cycles can be optimized to maximize the
amplicon signal:background ratio to have maximal separation of
fetal over maternal signature. Preferably, signal:background ratio
is greater than 5, 10, 50 or 100 and the overall cell loss during
the process is less than 50, 10 or 5%.
[0081] Fetal Cell Analysis
[0082] In step 125, DNA is extracted and purified from cells/nuclei
of the enriched product (mixed sample enriched) and reference
sample. Methods for extracting DNA are known to those skilled in
the art.
[0083] In step 131, the DNA is optionally pre-amplified to increase
the overall quantity of DNA for subsequent analysis.
Pre-amplification of DNA can be conducted using any amplification
method known in the art, including for example, amplification via
multiple displacement amplification (MDA) (Gonzalez J M, et al.
Cold Spring Harb Symp Quant Biol; 68:69-78 (2003), Murthy et al.
Hum Mutat 26(2):145-52 (2005) and Paulland et al., Biotechniques;
38(4):553-4, 556, 558-9 (2005)), and linear amplification methods
such as in vitro transcription (Liu, et al., BMC Genomics; 4(1):19
(2003)).
[0084] Other methods for pre-amplification include PCR methods
including quantitative PCR, quantitative fluorescent PCR (QF-PCR),
multiplex fluorescent PCR (MF-PCR), real time PCR (RT-PCR), single
cell PCR, PCR-RFLP/RT-PCR-RFLP, hot start PCR and Nested PCR. For
example, the PCR products can be directly sequenced
bi-directionally by dye-terminator sequencing. PCR can be performed
in a 384-well plate in a volume of 15 .mu.l containing 5 ng genomic
DNA, 2 mM MgCl.sub.2, 0.75 .mu.l DMSO, 1 M Betaine, 0.2 mM dNTPs, 20
pmol primers, 0.2 .mu.l AmpliTaq Gold.RTM. (Applied Biosystems), and
1.times. buffer (supplied with AmpliTaq Gold). Thermal cycling
conditions are as follows: 95.degree. C. for 10 minutes; 95.degree.
C. for 30 seconds, 60.degree. C. for 30 seconds, 72.degree. C. for
1 minute for 30 cycles; and 72.degree. C. for 10 minutes. PCR
products can be purified with Ampure.RTM. Magnetic Beads (Agencourt)
and can be optionally separated by capillary electrophoresis on an
ABI3730 DNA Analyzer (Applied Biosystems).
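As a rough sketch, the thermal profile and reaction volume stated
above can be written down as data and checked arithmetically. The
constant and function names are hypothetical, and the time
calculation ignores ramping between temperatures, so it is a lower
bound on instrument run time rather than a measured value.

```python
# Thermal profile from the text, as (temperature in deg C, seconds).
INITIAL_DENATURATION = (95, 10 * 60)
CYCLE_STEPS = [(95, 30), (60, 30), (72, 60)]
N_CYCLES = 30
FINAL_EXTENSION = (72, 10 * 60)


def programmed_hold_seconds():
    """Sum of programmed hold times; ramp times are not included."""
    per_cycle = sum(seconds for _, seconds in CYCLE_STEPS)
    return INITIAL_DENATURATION[1] + N_CYCLES * per_cycle + FINAL_EXTENSION[1]


def percent_v_v(component_ul, total_ul=15.0):
    """Volume share of one component in the reaction, in percent."""
    return 100.0 * component_ul / total_ul


print(programmed_hold_seconds() / 60, "minutes of programmed holds")
print(percent_v_v(0.75), "% (v/v) DMSO in the 15 uL reaction")
```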
[0085] Other suitable amplification methods include the ligase
chain reaction (LCR), transcription amplification, self-sustained
sequence replication, selective amplification of target
polynucleotide sequences, consensus sequence primed polymerase
chain reaction (CP-PCR), arbitrarily primed polymerase chain
reaction (AP-PCR) and nucleic acid sequence based amplification
(NASBA). Other amplification methods that may be used in step 131
include those described in U.S. Pat. Nos. 5,242,794, 5,494,810,
4,988,617 and 6,582,938, each of which is incorporated herein by
reference.
[0086] The pre-amplification step increases the amount of enriched
fetal DNA thus allowing analysis to be performed even if up to 1
.mu.g, 500 ng, 200 ng, 100 ng, 50 ng, 40 ng, 30 ng, 20 ng, 10 ng, 5
ng, 1 ng, 500 pg, 200 pg, 100 pg, 50 pg, 40 pg, 30 pg, 20 pg, 10
pg, 5 pg, or 1 pg of fetal or total DNA was obtained from the mixed
sample, or between 1-5 .mu.g, 5-10 .mu.g, 10-50 .mu.g of fetal or
total DNA was obtained from the mixed sample.
[0087] In step 141, SNP(s) are detected from DNA of both mixed and
reference samples using any method known in the art. Detection can
involve detecting an abundance of a nucleotide base at a SNP
position. Detection can be accomplished using a DNA microarray,
bead microarray, or high throughput sequencing. In some instances
SNPs are detected using highly parallel SNP detection methods such
as those described in Fan J B, et al. Cold Spring Harb Symp Quant
Biol; 68:69-78 (2003); Moorhead M, et al. Eur. J. Hum Genet.
14:207-215 (2005); Wang Y, et al. Nucleic Acids Res; 33(21):e183
(2005). Highly parallel SNP detection provides information about
genotype and gene copy numbers at a large number of loci scattered
across the genome in one procedure. In some cases, highly parallel
SNP detection involves performing SNP specific ligation-extension
reactions, followed by amplification of the products. The readout
of the SNP types can be done using DNA microarrays (Gunderson et
al. Nat. Genet. 37(5):549-54 (2005)), bead arrays (Shen, et al.,
Mutat. Res; 573 (1-2):70-82 (2005)), or by sequencing, such as high
throughput sequencing (e.g. Margulies et al. Nature, 437
(7057):376-80 (2005)) of individual amplicons.
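As a simple illustration of detecting the abundance of a nucleotide
base at a SNP position, the base calls observed at one position can
be tallied and converted to fractions. This is only a sketch: the
function name and the base calls are hypothetical, and it does not
represent any of the cited detection platforms.

```python
from collections import Counter


def allele_fractions(base_calls):
    """Fraction of observations for each base called at one SNP position."""
    counts = Counter(base_calls)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no base calls supplied")
    return {base: n / total for base, n in counts.items()}


# Hypothetical base calls at a single SNP position.
calls = list("AAAAAAGGGG")
print(allele_fractions(calls))
```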
[0088] In some embodiments, cDNAs, which are reverse transcribed
from mRNAs obtained from fetal or maternal cells, are analyzed for
the presence of SNPs using the methods disclosed herein. The type
and abundance of the cDNAs can be used to determine whether a cell
is a fetal cell (such as by the presence of Y chromosome specific
transcripts) or whether the fetal cell has a genetic abnormality
(such as aneuploidy, abundance of alternative transcripts or
problems with DNA methylation or imprinting).
[0089] In one embodiment, fetal or maternal cells or nuclei are
enriched using one or more methods disclosed herein. Preferably,
fetal cells are enriched by flowing the sample through an array of
obstacles that selectively directs particles or cells of different
hydrodynamic sizes into different outlets such that fetal cells and
cells larger than fetal cells are directed into a first outlet and
one or more cells or particles smaller than the rare cells are
directed into a second outlet.
[0090] Total RNA or poly-A mRNA is then obtained from enriched
cell(s) (fetal or maternal cells) using purification techniques
known in the art. Generally, about 1 .mu.g-2 .mu.g of total RNA is
sufficient. Next, a first-strand complementary DNA (cDNA) is
synthesized using reverse transcriptase and a single T7-oligo(dT)
primer. Next, a second-strand cDNA is synthesized using DNA ligase,
DNA polymerase, and RNase enzyme. Next, the double stranded cDNA
(ds-cDNA) is purified.
[0091] In another embodiment, total RNA is extracted from enriched
cells (fetal cells or maternal cells). Next, two one-quarter
scale MessageAmp II reactions (Ambion, Austin, Tex.) are performed
for each RNA extraction using 200 ng of total RNA. MessageAmp is a
procedure based on antisense RNA (aRNA) amplification, and involves
a series of enzymatic reactions resulting in linear amplification
of exceedingly small amounts of RNA for use in array analysis.
Unlike exponential RNA amplification methods, such as NASBA and
RT-PCR, aRNA amplification maintains representation of the starting
mRNA population. The procedure begins with total or poly(A) RNA
that is reverse transcribed using a primer containing both
oligo(dT) and a T7 RNA polymerase promoter sequence. After
first-strand synthesis, the reaction is treated with RNase H to
cleave the mRNA into small fragments. These small RNA fragments
serve as primers during a second-strand synthesis reaction that
produces a double-stranded cDNA template.
[0092] Any DNA microarray that is capable of detecting one or more
SNPs can be used with the methods herein. DNA microarrays comprise
a plurality of genetic probes immobilized at discrete sites (i.e.,
defined locations or assigned positions) on a substrate surface. A
DNA microarray preferably monitors at least 5, 10, 20, 50, 100,
200, 500, 1,000, 2,000, 5,000, 10,000, 20,000, 50,000, 100,000,
200,000 or 500,000 different SNPs. Such SNPs can be located in one
or more target chromosomes or over the entire genome. Methods for
manufacturing DNA microarrays for detecting SNPs are known in the
art. Microarrays that can be used in the systems herein include
those commercially available from Affymetrix (Santa Clara, Calif.),
Illumina (San Diego, Calif.), Spectral Genomics, Inc. (Houston,
Tex.), and Vysis Corporation (Downers Grove, Ill.). Methods for
detecting SNPs using microarrays are further described in U.S. Pat.
Nos. 6,300,063, 5,837,832, 6,969,589, 6,040,138, and 6,858,412.
[0093] In one embodiment, SNPs are detected using molecular
inversion probes (MIPs). MIPs are nearly circularized probes having
a first end of the probe complementary to a region immediately
upstream of the SNP to be detected, and a second end of the probe
complementary to a region immediately downstream of the SNP. To use
MIPs, both ends are allowed to hybridize to genomic regions
surrounding the SNP and an enzymatic reaction fills the gap at the
SNP position in an allele specific manner. The fully circular probe
now can be separated by a simple exonuclease reaction which leaves
a primer sequence coupled to a label unique to the allele. The
primer is subsequently used to amplify the label which is then
hybridized to an array for detection.
[0094] FIG. 5 illustrates one embodiment of an allele specific
extension and ligation reaction. Genomic DNA fragments are first
annealed to a solid support. Subsequently, probes designed to be
unique for each allele (P1 and P2) are annealed to the target DNA.
After a washing step, allele-specific primer extension is conducted
to extend the probes if such probes have 3' ends that are
complementary to their cognate SNP in the genomic DNA template. The
extension is followed by ligation of the extended templates to
their corresponding locus-specific probes (P3) to create PCR
templates. Requiring the joining of two fragments (P1 and P3 or P2
and P3) to create a PCR template provides an additional level of
genomic specificity, because any residual incorrectly hybridized
allele-specific or locus-specific probes are unlikely to be
adjacent and thus should not be able to ligate. Next, fluorescently
labeled primers, each with a different dye, are added for PCR
amplification, thus providing a means for detection and
quantification of each SNP by providing data points. In addition,
each SNP is assigned a different address sequence (P3) which is
contained within the locus-specific probe. Each address sequence is
complementary to a unique capture sequence that can be contained by
one of several bead types present in an array. Furthermore, the use
of universal PCR primers to associate a fluorescent dye with each
SNP allele provides a cost-saving element, because only three
primers, two labeled and one unlabeled, are needed regardless of
the number of SNPs to be assayed.
[0095] If the addresses are captured by beads, multiple SNPs can be
amplified in the same or in different reactions using bead
amplification. When more than one DNA polymorphism is used in the
same amplification reaction, primers are chosen to be multiplexable
(fairly uniform melting temperature, absence of cross-priming on
the human genome, and absence of primer-primer interaction based on
sequence analysis) with other pairs of primers. Furthermore,
primers and loci may be chosen so that the amplicon lengths from a
given locus do not overlap with those from another locus. Multiple
dyes and multi-color fluorescence readout may be used to increase
the multiplexing capacity.
[0096] In some embodiments, highly parallel SNP detection is
performed by arrayed primer extension (APEX). In order to perform
APEX, a gene locus is chosen where one wishes to analyze SNPs or
mutations, for example, loci for abnormal ploidy disorders (e.g.,
chromosomes X, 13, 18, and 21). Oligonucleotides (e.g., about 20-,
25-, 30-, 40-, 50-mers) are designed to be complementary to the
gene up to, but not including the base where the mutation or SNP
exists. In one example, the oligonucleotides are modified with an
amine group at the 5' end to facilitate covalent binding to
activated glass slides, in this case epoxy silanized surfaces. The
locus in question is PCR amplified and the DNA enzymatically
sheared to facilitate hybridization to the oligos. The PCR
reactions contain dTTP and dUTP at about a 5 to 1 ratio, and the
incorporation of the dUTP allows the amplified DNA to be
enzymatically cut with uracil N-glycosylase (UNG). The optimal size
of the sheared DNA is about 100 base pairs. The sheared DNA is then
hybridized to the bound oligos and a primer extension reaction
carried out using a thermostable DNA polymerase such as Thermo
Sequenase (Amersham Pharmacia Biotech) or AmpliTaq FS (Roche
Molecular Systems). The primer extension reaction contains four
dideoxynucleotides (ddNTPs) corresponding to A, G, C & T, with
each ddNTP containing a distinct fluor molecule. In the above
example, ddNTPs can be conjugated to either fluorescein, Cy3™,
Texas Red or Cy5™. Depending on which base is next in the
sequence (wild type, mutant or SNP), the primer extension reaction
will incorporate one nucleotide with one and only one of the four
dyes. Thus, by applying a simple four-laser scan, one can tell which
base is next in the sequence, as each of the above dyes is easily
spectrally separable. A large number of different oligos (e.g.,
5-, 10-, 15-, 20-, 30-, 40-, 50-, 60-, 70-, 80-, 90-, 100-thousand
probes) may be attached to a slide for this type of analysis with
the requirement that very little cross hybridization occurs among
all the sequences. It may be helpful to increase the length of the
oligos (e.g., 50-, 60-, 70-, 80-mers) so that the initial
hybridization can be done at higher stringency resulting in less
background from non-homologous hybridization. In the APEX method,
the signal to noise ratio is about 40 to 1, a level which is more
than sufficient for unambiguously identifying SNPs and mutations.
To design such large arrays for SNP analysis, a computational
screen can be conducted to favor a subset of sequences with similar
GC content and thermodynamic properties, and eliminate sequences
with possible secondary structure or sequence similarity to other
tags. Shoemaker et al. Nature Genetics 14:450-456 (1996); Giaever
et al. Nature Genetics 21:278-283 (1999); Winzeler et al. Science
285:901-906 (1999). For example, in a high density tag array, 64,000
probes, each probe occupying an area of 30×30 µm, are used
for parallel genotyping of human SNPs.
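By way of illustration only, the computational screen described above can be sketched in Python as follows. The function names, the GC window of 0.4 to 0.6, and the crude 4-base self-complementarity test are assumptions made for this sketch and are not taken from the cited references; a practical screen would also compare thermodynamic properties and similarity between tags.

```python
def gc_content(seq):
    """Fraction of G or C bases in a candidate probe sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    comp = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def has_self_complement(seq, stem=4):
    """Crude secondary-structure flag (assumed heuristic).

    True if any window of length `stem` also occurs in the reverse
    complement, i.e. the probe could pair with itself over that window.
    """
    seq = seq.upper()
    rc = reverse_complement(seq)
    return any(seq[i:i + stem] in rc for i in range(len(seq) - stem + 1))


def screen_probes(probes, gc_min=0.4, gc_max=0.6, stem=4):
    """Keep probes with similar GC content and no flagged self-pairing."""
    return [p for p in probes
            if gc_min <= gc_content(p) <= gc_max
            and not has_self_complement(p, stem)]


candidates = ["ATGCATGCAT", "AAAAAAAAAA", "ACGGACTTCA"]
print(screen_probes(candidates))
```

In this toy input, the first candidate is rejected because it is its own reverse complement and the second because it contains no G or C bases.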
[0097] In some cases, it may be desirable to introduce a novel
restriction site in the region of the mutation to create
cleavage-based detection. Gasparini, et al., Mol. Cell. Probes 6:1
(1992). Amplification is subsequently performed using Taq ligase
and the like. Barany, Proc. Natl. Acad. Sci. USA 88:189 (1991). In
such cases, ligation will occur only if there is a perfect match at
the 3'-terminus of the 5' sequence, making it possible to detect
the presence of a known mutation at a specific site by looking for
the presence or absence of amplification.
[0098] Alternatively, detection of single strand conformation
polymorphism (SSCP) may be used to detect differences in
electrophoretic mobility between mutant and wild type nucleic acids
(e.g., SNP). Orita, et al., Proc. Natl. Acad. Sci. USA: 86: 2766
(1989); Cotton, Mutat. Res. 285: 125-144 (1993); and Hayashi,
Genet. Anal. Tech. Appl. 9: 73-79 (1992). Single-stranded DNA
fragments of sample and control nucleic acids will be denatured and
allowed to renature. The secondary structure of single-stranded
nucleic acids varies according to sequence, and the resulting
alteration in electrophoretic mobility enables the detection of
even a single base change. The DNA fragments may be labeled or
detected with labeled probes. The sensitivity of the assay may be
enhanced by using RNA (rather than DNA), in which the secondary
structure is more sensitive to a change in sequence. The subject
method utilizes heteroduplex analysis to separate double-stranded
heteroduplex molecules on the basis of changes in electrophoretic
mobility. Keen, et al., Trends Genet. 7: 5 (1991).
[0099] Other methods for detecting SNPs include methods in which
protection from cleavage agents is used to detect mismatched bases
in DNA/RNA or RNA/DNA heteroduplexes. Myers, et al., Science 230:
1242 (1985). In general, the art technique of "mismatch cleavage"
starts by providing heteroduplexes formed by hybridizing
(labeled) RNA or DNA containing the control sequence with
potentially mutant RNA or DNA obtained from a tissue sample. The
double-stranded duplexes are treated with an agent that cleaves
single stranded regions of the duplex such as those that exist due
to "base pair mismatches" between the control and sample strands.
For instance, RNA/DNA duplexes can be treated with RNase and
DNA/DNA hybrids treated with S1 nuclease to enzymatically digest
the mismatched regions. Furthermore, either DNA/DNA or RNA/DNA
duplexes can be treated with hydroxylamine or osmium tetroxide and
with piperidine in order to digest mismatched regions. After
digestion of the mismatched regions, the resulting material is then
separated by size on denaturing polyacrylamide gels to determine
the site of mutation. Cotton, et al., Proc. Natl. Acad. Sci. USA
85:4397 (1988); and Saleeba, et al., Methods Enzymol. 217: 286-295
(1992). The control DNA or RNA can be labeled for detection.
[0100] SNPs can also be detected and quantified using sequencing
methods including the classic Sanger sequencing method as well as
high throughput sequencing, which may be capable of generating at
least 1,000, 5,000, 10,000, 20,000, 30,000, 40,000, 50,000, 100,000
or 500,000 sequence reads per hour, with at least 50, 60, 70, 80,
90, 100, 120 or 150 bases per read.
[0101] High throughput sequencing can involve
sequencing-by-synthesis, sequencing-by-ligation, and ultra deep
sequencing.
[0102] Sequence-by-synthesis can be initiated using sequencing
primers complementary to the sequencing element on the nucleic acid
tags. The method involves detecting the identity of each nucleotide
immediately after (substantially real-time) or upon (real-time) the
incorporation of a labeled nucleotide or nucleotide analog into a
growing strand of a complementary nucleic acid sequence in a
polymerase reaction. After the successful incorporation of a labeled
nucleotide, a signal is measured and then nulled by methods known
in the art. Examples of sequence-by-synthesis methods are described
in U.S. Application Publication Nos. 2003/0044781, 2006/0024711,
2006/0024678 and 2005/0100932. Examples of labels that can be used
to label nucleotide or nucleotide analogs for
sequencing-by-synthesis include, but are not limited to,
chromophores, fluorescent moieties, enzymes, antigens, heavy metal,
magnetic probes, dyes, phosphorescent groups, radioactive
materials, chemiluminescent moieties, scattering or fluorescent
nanoparticles, Raman signal generating moieties, and
electrochemical detection moieties. Sequencing-by-synthesis can
generate at least 1,000, at least 5,000, at least 10,000, at least
20,000, at least 30,000, at least 40,000, at least 50,000, at least 100,000
or at least 500,000 reads per hour. Such reads can have at least
50, at least 60, at least 70, at least 80, at least 90, at least
100, at least 120 or at least 150 bases per read.
[0103] Another sequencing method involves hybridizing the amplified
regions to a primer complementary to the sequence element in an
LST. This hybridization complex is incubated with a polymerase, ATP
sulfurylase, luciferase, apyrase, and the substrates luciferin and
adenosine 5' phosphosulfate. Next, deoxynucleotide triphosphates
corresponding to the bases A, C, G, and T (U) are added
sequentially. Each base incorporation is accompanied by release of
pyrophosphate, converted to ATP by sulfurylase, which drives
synthesis of oxyluciferin and the release of visible light. Since
pyrophosphate release is equimolar with the number of incorporated
bases, the light given off is proportional to the number of
nucleotides added in any one step. The process is repeated until
the entire sequence is determined.
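The proportionality between light output and the number of nucleotides incorporated in each step can be illustrated with a short Python sketch. The function name, the fixed dispensation order, and the assumption of a noise-tolerant signal normalized so that a single incorporation yields an intensity of about 1.0 are illustrative assumptions and are not part of the method as described.

```python
def decode_flowgram(signals, flow_order="ACGT"):
    """Convert per-dispensation light signals into the synthesized sequence.

    signals[i] is the normalized light intensity observed after the i-th
    nucleotide dispensation; dispensations cycle through flow_order.
    """
    bases = []
    for i, signal in enumerate(signals):
        base = flow_order[i % len(flow_order)]
        # Light is proportional to the number of bases added in this step,
        # so rounding the normalized signal gives the homopolymer length.
        n_incorporated = int(round(signal))
        bases.append(base * n_incorporated)
    return "".join(bases)


# Dispensations A, C, G, T, A, C with the signals below
print(decode_flowgram([1.0, 0.0, 2.1, 0.9, 0.1, 1.0]))  # prints AGGTC
```

A signal near 2.0 after the G dispensation is read as two consecutive G incorporations, which is how homopolymer runs are resolved in this idealized picture.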
[0104] Yet another sequencing method involves a four-color
sequencing by ligation scheme (degenerate ligation), which involves
hybridizing an anchor primer to one of four positions. Then an
enzymatic ligation reaction of the anchor primer to a population of
degenerate nonamers that are labeled with fluorescent dyes is
performed. At any given cycle, the population of nonamers that is
used is structured such that the identity of one of its positions is
correlated with the identity of the fluorophore attached to that
nonamer. To the extent that the ligase discriminates for
complementarity at that queried position, the fluorescent signal
allows the inference of the identity of the base. After performing
the ligation and four-color imaging, the anchor primer:nonamer
complexes are stripped and a new cycle begins. Methods to image
sequence information after performing ligation are known in the
art.
[0105] In some cases, high throughput sequencing involves the use
of ultra-deep sequencing, such as described in Margulies et al.,
Nature 437 (7057): 376-80 (2005). Briefly, the amplicons are
diluted and mixed with beads such that each bead captures a single
molecule of the amplified material. The DNA molecule on each bead
is then amplified to generate millions of copies of the sequence
which all remain bound to the bead. Such amplification can occur by
PCR. Each bead can be placed in a separate well, which can be a
(optionally addressable) picolitre-sized well. In some embodiments,
each bead is captured within a droplet of a
PCR-reaction-mixture-in-oil-emulsion and PCR amplification occurs
within each droplet. The amplification on the bead results in each
bead carrying at least one million, at least 5 million, or at least
10 million copies of the original amplicon coupled to it. Finally,
the beads are placed into a highly parallel sequencing by synthesis
machine which generates over 400,000 reads (~100 bp per read) in a
single 4 hour run.
[0106] Other methods for ultra-deep sequencing that can be used are
described in Hong, S. et al. Nat. Biotechnol. 22(4):435-9 (2004);
Bennett, B. et al. Pharmacogenomics 6(4):373-82 (2005); Shendure,
P. et al. Science 309 (5741):1728-32 (2005).
[0107] The microarray or sequencing methods described herein
provide a readout, which can be visualized via apparatus and
methods known in the art. For example, for a given marker or at a
given tag probe position, the fluorescence intensity of each of the
fluorophores utilized (e.g., tagged sequencing or PCR primers)
provides a signal which is detected by apparatus or automated
systems/machines known in the art. The fluorophore markers can be
utilized either in an array-based or sequencing-based analysis.
[0108] In step 151, SNP data is used to determine aneuploidy by,
e.g., determining the ratio of maternal allele(s) to paternal
allele(s) (or vice versa); or determining the ratio of maternal
allele(s) in a region suspected of aneuploidy versus in a control
region.
[0109] Aneuploidy means the condition of having less than or more
than the normal diploid number of chromosomes. In other words, it
is any deviation from euploidy. Aneuploidy includes conditions such
as monosomy (the presence of only one chromosome of a pair in a
cell's nucleus), trisomy (having three chromosomes of a particular
type in a cell's nucleus), tetrasomy (having four chromosomes of a
particular type in a cell's nucleus), pentasomy (having five
chromosomes of a particular type in a cell's nucleus), triploidy
(having three of every chromosome in a cell's nucleus), and
tetraploidy (having four of every chromosome in a cell's nucleus).
Birth of a live triploid is extraordinarily rare and such
individuals are quite abnormal; however, triploidy occurs in about
2-3% of all human pregnancies and appears to be a factor in about
15% of all miscarriages. Tetraploidy occurs in approximately 8% of
all miscarriages. (www.emedicine.com/med/topic3241.htm).
[0110] In one embodiment, kits are provided which include a
separation device, optionally a capture device and the reagents and
devices used for the analysis of the genomic sequences. For
example, the kit may include the separation arrays and DNA
microarrays for detecting one or more SNPs. Any of the devices
mentioned for the DNA determination may be combined with the
separation devices. The combination of the array separation devices
with DNA analysis devices provides gentle handling and accurate
analysis.
[0111] A simple intuitive understanding of the effect of trisomy is
that it increases the abundances of fetal alleles at loci within the
affected region. Trisomies are predominantly from maternal
non-disjunction events, so typically both maternal alleles, and a
single paternal allele, are increased, and the ratio of maternal
allele abundance to paternal allele abundance is higher in the
trisomic region. These signatures may be masked by differences in
DNA amplification and hybridization efficiency from locus to locus,
and from allele to allele.
[0112] In one embodiment, trisomies are determined by comparing
abundance (e.g. intensities) of maternal and paternal alleles in a
genomic region. Within a locus, the PCR differences are smaller
than between loci, because the same primers are responsible for all
the different allele amplicons at that locus. Therefore, the allele
ratios may be more stable than the overall allele abundances. This
can be exploited by identifying loci where the paternal allele is
distinct from the maternal allele and taking the ratio of the
paternal allele strength to the average of the maternal allele
strengths. These allele ratios then can be averaged over the
hypothesized aneuploidy region and compared to the average over a
control region. The distributions of these ratio values in the
hypothesized aneuploidy region and in the control region can be
compared to create an estimate of statistical significance for the
observed difference in means. A simple example of this procedure
would use Student's t-test.
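A minimal Python sketch of this comparison is given below, using only the standard library. The function names and the choice of the pooled-variance (equal-variance) form of Student's t statistic are assumptions made for illustration; converting the statistic to a p-value would additionally require the t distribution with n1 + n2 - 2 degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance


def allele_ratio(paternal, maternal_1, maternal_2):
    """Paternal allele strength over the average maternal allele strength."""
    return paternal / ((maternal_1 + maternal_2) / 2.0)


def student_t(test_ratios, control_ratios):
    """Two-sample Student's t statistic with pooled variance."""
    n1, n2 = len(test_ratios), len(control_ratios)
    pooled = ((n1 - 1) * variance(test_ratios)
              + (n2 - 1) * variance(control_ratios)) / (n1 + n2 - 2)
    standard_error = sqrt(pooled * (1.0 / n1 + 1.0 / n2))
    return (mean(test_ratios) - mean(control_ratios)) / standard_error


# Hypothetical per-locus allele ratios in the two regions
test_region = [0.12, 0.10, 0.11, 0.13]
control_region = [0.20, 0.22, 0.19, 0.21]
print(student_t(test_region, control_region))
```

A large absolute value of the statistic indicates that the mean allele ratio in the hypothesized aneuploidy region differs from that in the control region by more than the locus-to-locus scatter would explain.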
[0113] Thus, the present invention contemplates detection of fetal
abnormality by determining a ratio of abundance of maternal
allele(s) and abundance of paternal allele(s) (or vice versa) in
one or more genomic regions of interest. (Preferably the paternal
allele differs from one or both of the maternal alleles). The genomic
region can be derived from a mixed sample comprising fetal and
maternal cells. The sample can be obtained from a pregnant animal
and can be, e.g., a blood sample. In some cases, the genomic region
includes a SNP and/or an informative SNP. In some cases at least
10, 20, 50, 100, 200 or 500 SNPs are analyzed per sample. The SNPs
analyzed can be in a single locus, different loci, single
chromosome, or different chromosomes. In some cases, a first
genomic region (SNP) analyzed is in a genomic region suspected of
being trisomic or is trisomic and a second genomic region (SNP)
analyzed is in a control region that is non-trisomic or a region
suspected of being non-trisomic. The ratio of alleles (e.g.,
maternal/paternal) in a first genomic region or first plurality of
genomic regions (trisomic) (hereinafter test regions) is then
compared with a ratio of alleles (e.g., maternal/paternal) in the
second genomic region or second plurality of genomic regions
(hereinafter control regions). The control region(s) and test
region(s) can be on the same or different chromosomes. In some
instances, comparison is made by determining the difference in
means of the ratios in the first regions and second regions.
Detection of an increase of paternal abundance in the test
region(s) is indicative of paternal trisomy, while detection of an
increase of maternal abundance in the first region(s) is indicative
of maternal trisomy. Furthermore, calculation of error rate based
on amplification can be performed prior to making a call if a fetus
has a specific condition (e.g., trisomy) or not.
[0114] Alternatively, the maternal allele strengths over the
suspected aneuploidy region(s) can be compared to those in the
control region(s), all without forming ratios to paternal alleles.
In this approach, errors in the measurement of the paternal allele
abundances are not calculated. However, differences in
amplification efficiency between primer pairs are calculated. These
measurements can be larger than differences between alleles in the
same locus. In this approach there also may be a residual bias
between the efficiencies averaged over certain chromosomes.
Therefore it may be useful to also perform the same detection
process in a reference sample (e.g. maternal only cell sample) and
then take the ratio of ratios. In other words, the ratio obtained
for the mixed sample of the abundance in test genomic region(s) to
that in control genomic region(s) is divided by the same ratio
obtained from the reference sample. The ratios obtained for the mixed and
reference samples reflect allele strength over suspected aneuploidy
region over allele strength over control region, and the ratio of
ratios presents an estimate that is normalized to the reference
(maternal) sample. Such ratio of ratios is therefore free of
chromosome bias, but may include errors in the measurements of the
reference sample, as that sample is used as the control or
normalizer.
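The ratio of ratios can be sketched in Python as follows. The function names and the numerical values are hypothetical and are chosen only to show how a chromosome-specific efficiency bias, here a factor of 1.2 in the test region, cancels when the mixed-sample ratio is divided by the reference-sample ratio.

```python
def region_ratio(test_strengths, control_strengths):
    """Mean maternal allele strength in the test region(s) divided by the
    mean maternal allele strength in the control region(s)."""
    test_mean = sum(test_strengths) / len(test_strengths)
    control_mean = sum(control_strengths) / len(control_strengths)
    return test_mean / control_mean


def ratio_of_ratios(mixed_test, mixed_control, ref_test, ref_control):
    """Mixed-sample ratio normalized by the same ratio in the reference
    (e.g., maternal-only) sample."""
    return (region_ratio(mixed_test, mixed_control)
            / region_ratio(ref_test, ref_control))


# Hypothetical strengths: the test chromosome amplifies 1.2x more efficiently
# in both samples, and the mixed sample carries a 10% excess in the test region.
print(ratio_of_ratios([1.32, 1.32], [1.0, 1.0], [1.2, 1.2], [1.0, 1.0]))
```

The result is close to 1.1, the excess in the test region, because the 1.2 efficiency factor is common to both samples and divides out.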
[0115] In some cases, the methods herein contemplate detecting
fetal abnormality by comparing an abundance of one or more maternal
allele(s) in a first genomic region or regions (test region(s))
with one or more maternal alleles in a second genomic region or
regions (control region(s)) in a mixed sample (e.g., maternal blood
sample from pregnant animal). This ratio can then be compared to a
similar ratio measured in a control sample (e.g., maternal-cell
only sample). The control sample can be a diluted subset of the
mixed sample, wherein the dilution is by a factor of at least 10,
100, 1000, or 10,000. In some cases, such methods further involve
estimating the number of fetal cells in the mixed sample. This can
be performed by, e.g., ranking the alleles detected according to
their abundance. The ranking can be used to determine abundance of
one or more paternal alleles. Ranking is described in more detail
herein.
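As a simple illustration of ranking, the following Python sketch orders the allele signals at one locus by abundance and reports the third-ranked signal relative to the mean of the first two. The function name, the dictionary representation of the four base bins, and the example values are assumptions made for this sketch.

```python
def third_allele_fraction(bin_signals):
    """Rank the allele signals at one locus by abundance and return the
    third-ranked signal relative to the mean of the two largest signals.

    bin_signals maps each base ('A', 'C', 'G', 'T') to its measured signal.
    """
    ranked = sorted(bin_signals.values(), reverse=True)
    top_two_mean = (ranked[0] + ranked[1]) / 2.0
    return ranked[2] / top_two_mean


# Hypothetical locus: two strong (maternal) signals and a weaker third
# signal that could reflect a distinct paternal allele.
locus = {"A": 1.0, "C": 0.9, "G": 0.19, "T": 0.01}
print(third_allele_fraction(locus))
```

Collecting this relative magnitude over many loci gives the distribution from which the abundance of paternal alleles, and hence the number of fetal cells, can be estimated.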
[0116] Aneuploidy can be determined by modeling SNP data. One
example of a model for SNP data in the context of fetal diagnosis
is given in Equations 1-3 below.
[0117] A normal (diploid) fetus results in data x.sub.k at locus k
and is represented by:
$$x_k = A_k\left[(1-f)(m_{k1}+m_{k2}) + f\left((m_{k1}\ \text{or}\ m_{k2}) + p_k\right)\right] + \text{residual} \tag{1}$$
[0118] A trisomy caused by maternal non-disjunction is represented
by
$$x_k = A_k\left[(1-f)(m_{k1}+m_{k2}) + f\left(m_{k1}+m_{k2}+p_k\right)\right] + \text{residual} \tag{2}$$
[0119] and a paternally inherited trisomy is represented by
$$x_k = A_k\left[(1-f)(m_{k1}+m_{k2}) + f\left((m_{k1}\ \text{or}\ m_{k2}) + p_{k1}+p_{k2}\right)\right] + \text{residual} \tag{3}$$
[0120] In Equations 1-3, A.sub.k denotes a scale factor which
subsumes the efficiencies of amplification, hybridization, and
readout common to the alleles at locus k. In this model
amplification differences between different primer pairs are fitted
and do not appear in the residuals. Alternatively, a single A
parameter could be used and the residuals would reflect these
differences. Further, f represents the fraction of fetal cells in
the mixture, m.sub.k1 and m.sub.k2 denote the maternal alleles at locus k,
and p.sub.k denotes the paternal allele at locus k. The allele
symbols actually represent unit data contributions that can be
arithmetically summed; e.g., m.sub.k1 might be a detection of the
`C` genotype represented by unit contribution to the `C` bin at
that locus.
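To make the unit-contribution interpretation concrete, the following Python sketch computes the expected signal in each base bin at one locus under Equations 1-3, omitting the residual term. The function name, argument names, and model labels are hypothetical and are introduced only for illustration.

```python
from collections import Counter


def expected_bins(model, a_k, f, m1, m2, p1, p2=None, inherited=None):
    """Expected signal per base bin at one locus (residual omitted).

    model: 'normal' (Eq. 1), 'maternal_trisomy' (Eq. 2) or
           'paternal_trisomy' (Eq. 3).
    m1, m2: maternal alleles; p1, p2: paternal allele(s).
    inherited: the maternal allele (m1 or m2) carried by the fetus,
               needed for Equations 1 and 3.
    """
    bins = Counter()
    # Maternal cells contribute both maternal alleles with weight (1 - f).
    for allele in (m1, m2):
        bins[allele] += 1 - f
    # Fetal cells contribute their alleles with weight f.
    if model == "normal":
        fetal = [inherited, p1]
    elif model == "maternal_trisomy":
        fetal = [m1, m2, p1]
    elif model == "paternal_trisomy":
        fetal = [inherited, p1, p2]
    else:
        raise ValueError("unknown model: %s" % model)
    for allele in fetal:
        bins[allele] += f
    return {base: a_k * value for base, value in bins.items()}


# Hypothetical locus: mother C/A, fetus inherits C and a paternal G, f = 0.2
print(expected_bins("normal", 1.0, 0.2, "C", "A", "G", inherited="C"))
print(expected_bins("maternal_trisomy", 1.0, 0.2, "C", "A", "G"))
```

Under the maternal trisomy model both maternal bins rise relative to the normal model, while the paternal bin is unchanged, which is the signature the model fit looks for.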
[0121] FIG. 6 illustrates the SNP calls that result under this data
model. At Locus 1, the fetal genotype was GC. There is a paternally
inherited `G` allele contribution in the mixed sample that results
in an increase of G signal above the noise level observed in the
maternal-only sample, and a maternally inherited `C` allele
contribution that increases the C signal. The effective value that
has been assumed in these illustrations is f=0.2. At Locus 2, the
paternal allele is `T`. At Locus 3, the fetus is homozygous GG. In
the third row of FIG. 2, the effect of a fetal trisomy is
represented by the dashed red lines, superposed on a normal
(diploid) mixed-sample pattern. The trisomy is assumed to include
Loci 1 and 2, but not Loci 3 and 4. At Loci 1 and 2 both maternal
allele strengths are increased in the mixed sample, as well as the
separate paternal allele contribution. At Locus 3, it was assumed
that the fetus was `GG` and the paternal allele is the same as the
first maternal allele. Note that the ratio between the average of
the two maternal alleles and the paternal allele will be slightly
greater at Loci 1 and 2 than at Locus 4--this is one indicator of
trisomy.
[0122] The location and abundance of SNPs can be used to determine
whether the fetus has an abnormal genotype, such as Down syndrome
or Klinefelter syndrome (XXY). Other examples of abnormal fetal
genotypes include, but are not limited to, aneuploidy such as
monosomy of one or more chromosomes (X chromosome monosomy, also
known as Turner's syndrome), trisomy of one or more chromosomes
(13, 18, 21, and X), tetrasomy and pentasomy of one or more
chromosomes (which in humans is most commonly observed in the sex
chromosomes, e.g. XXXX, XXYY, XXXY, XYYY, XXXXY, XXXYY, XYYYY and
XXYYY), triploidy (three of every chromosome, e.g. 69 chromosomes
in humans), tetraploidy (four of every chromosome, e.g. 92
chromosomes in humans) and multiploidy. In some embodiments, an
abnormal fetal genotype is a segmental aneuploidy. Examples of
segmental aneuploidy include, but are not limited to, 1p36
duplication, dup(17)(p11.2p11.2) syndrome, Down syndrome,
Pelizaeus-Merzbacher disease, dup(22)(q11.2q11.2) syndrome, and
cat-eye syndrome. In some cases, an abnormal fetal genotype is due
to one or more deletions of sex or autosomal chromosomes, which may
result in a condition such as Cri-du-chat syndrome,
Wolf-Hirschhorn, Williams-Beuren syndrome, Charcot-Marie-Tooth
disease, Hereditary neuropathy with liability to pressure palsies,
Smith-Magenis syndrome, Neurofibromatosis, Alagille syndrome,
Velocardiofacial syndrome, DiGeorge syndrome, Steroid sulfatase
deficiency, Kallmann syndrome, Microphthalmia with linear skin
defects, Adrenal hypoplasia, Glycerol kinase deficiency,
Pelizaeus-Merzbacher disease, Testis-determining factor on Y,
Azoospermia (factor a), Azoospermia (factor b), Azoospermia (factor
c), or 1p36 deletion. In some embodiments, a decrease in
chromosomal number results in an XO syndrome.
[0123] In some cases, data models are fitted for optimal detection
of aneuploidy. For example, the data models can be used to
simultaneously recover estimates of the fraction of fetal cells,
and efficient detection of aneuploidy in hypothesized chromosomes
or chromosomal segments. This integrated approach results in more
reliable and sensitive declarations of aneuploidy.
[0124] Equations 1-3 represent five different models because of the
ambiguity between m.sub.k1 and m.sub.k2 in the last term of
Equations 1 and 3. In other words, since Equations 1 and 3 are
different and in each equation there are two possibilities (i.e.,
m.sub.k1 or m.sub.k2), it follows that each of Equations 1 and
3 represents two different models. Therefore, Equations 1-3
represent five different models. Testing for aneuploidy of
Chromosomes 13, 18, and 21, for example, entails
5.times.5.times.5=125 different model variants that would be fit to
the data.
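The count of model variants can be checked by direct enumeration, as in the following Python sketch. The labels used for the five per-chromosome models are hypothetical names introduced for illustration.

```python
from itertools import product

# Five models per chromosome: Equations 1 and 3 each split into two
# variants depending on which maternal allele the fetus inherits.
PER_CHROMOSOME_MODELS = [
    ("normal", "m1"),
    ("normal", "m2"),
    ("maternal_trisomy", None),
    ("paternal_trisomy", "m1"),
    ("paternal_trisomy", "m2"),
]
CHROMOSOMES = ("13", "18", "21")

# One model choice per chromosome, for every combination.
variants = [dict(zip(CHROMOSOMES, combo))
            for combo in product(PER_CHROMOSOME_MODELS, repeat=len(CHROMOSOMES))]

print(len(variants))  # prints 125
```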
[0125] The parameter values for the maternal allele identities are
taken from the results for the reference (i.e. maternal-only)
sample and the remaining parameters are fit to the data from the
mixed sample. Because the number of parameters is very large when
the number of loci is large, a global optimization requires
iterative search techniques. One possible approach is to do the
following for each model variant:
i) Set f to 0 and solve for A.sub.k at each locus.
ii) Set f to a value equal to the smallest fetal/maternal cell ratio
for which fetal cells are likely to be detectable.
iii) Solve for paternal allele(s) identities and strengths at each
locus, one locus at a time, that minimize data-model residuals.
iv) Fix the paternal alleles and adjust f to minimize residuals over
all the data.
v) Now vary only the A.sub.k to minimize residuals. Repeat iv and v
until convergence.
vi) Repeat iii through v until convergence.
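A simplified Python sketch of the alternating part of this search (steps i, iv and v) is given below. It assumes, for illustration only, that the maternal and fetal allele identities at each locus are already known, so that steps ii, iii and vi are omitted, and that residuals are minimized in the unweighted least-squares sense. All function names and the data layout are hypothetical.

```python
def solve_scales(loci, f):
    """With f fixed, solve each A_k by least squares (steps i and v)."""
    scales = []
    for x, maternal, fetal in loci:
        num = den = 0.0
        for base, signal in x.items():
            # Unit contribution to this bin from maternal and fetal cells
            u = (1 - f) * maternal.get(base, 0) + f * fetal.get(base, 0)
            num += signal * u
            den += u * u
        scales.append(num / den)
    return scales


def solve_f(loci, scales):
    """With all A_k fixed, solve f by least squares over all data (step iv)."""
    num = den = 0.0
    for a_k, (x, maternal, fetal) in zip(scales, loci):
        for base, signal in x.items():
            # The model is linear in f: A_k*maternal + f*A_k*(fetal - maternal)
            d = a_k * (fetal.get(base, 0) - maternal.get(base, 0))
            num += (signal - a_k * maternal.get(base, 0)) * d
            den += d * d
    return min(1.0, max(0.0, num / den))


def fit_fetal_fraction(loci, n_iter=50):
    """loci: list of (x, maternal, fetal) dictionaries keyed by base, where x
    holds observed bin signals and the others hold allele counts per bin."""
    scales = solve_scales(loci, 0.0)        # step i
    f = 0.0
    for _ in range(n_iter):                 # repeat iv and v
        f = solve_f(loci, scales)           # step iv
        scales = solve_scales(loci, f)      # step v
    return f, scales


# Hypothetical noise-free locus: mother A/C, fetus C/G, true f = 0.2, A_k = 1
loci = [({"A": 0.8, "C": 1.0, "G": 0.2}, {"A": 1, "C": 1}, {"C": 1, "G": 1})]
print(fit_fetal_fraction(loci))
```

A fixed iteration count is used here in place of a convergence test to keep the sketch short; on this noise-free example the estimates approach f = 0.2 and A.sub.k = 1 within a few iterations.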
[0126] The best overall fit of model to data is selected from among
all the model variants. The best overall fit yields the values of f
and A.sub.k, which we will call f.sub.max and A.sub.kmax. The likelihood of
observing the data given f.sub.max can be compared to the
likelihood given f=0. The ratio is a measure of the amount of
evidence for fetal DNA. A typical threshold for declaring fetal DNA
would be a likelihood ratio of 1000 or more. The likelihood
calculation can be approximated by a more familiar Chi-squared
calculation involving the sum of squared residuals between the data
and the model, where each residual is normalized by the expected
rms error. This Chi-squared is a good approximation to the
Log(likelihood) to the extent the expected errors in the data are
Gaussian additive errors, or can be made so by some amplitude
transformation of the data.
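Under the Gaussian additive error assumption stated above, the log-likelihood equals minus one half of the Chi-squared value up to a constant, so the likelihood ratio can be approximated from the difference of two Chi-squared values. The following Python sketch illustrates this; the function names are hypothetical, and the default threshold of 1000 is the typical value mentioned above.

```python
import math


def chi_squared(data, model, rms_error):
    """Sum of squared residuals, each normalized by its expected rms error."""
    return sum(((d - m) / s) ** 2 for d, m, s in zip(data, model, rms_error))


def log_likelihood_ratio(chi2_f0, chi2_fmax):
    """Natural log of likelihood(f_max) / likelihood(f = 0), assuming
    Gaussian additive errors so that log-likelihood = -chi2 / 2 + constant."""
    return 0.5 * (chi2_f0 - chi2_fmax)


def fetal_dna_detected(chi2_f0, chi2_fmax, threshold=1000.0):
    """Declare fetal DNA when the likelihood ratio reaches the threshold."""
    return log_likelihood_ratio(chi2_f0, chi2_fmax) >= math.log(threshold)


print(chi_squared([1.0, 2.0], [1.0, 1.0], [0.5, 0.5]))  # prints 4.0
print(fetal_dna_detected(20.0, 5.0))                    # prints True
```

With a threshold of 1000, the f=0 fit must exceed the best fit by about 13.8 in Chi-squared, since 2 ln(1000) is approximately 13.8.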
[0127] If based on the above determination of likelihood ratio it
is decided that fetal DNA is not present, then the test is declared
to be non-informative. If it is decided that fetal DNA is present,
then the likelihoods of the data given the different data model
types can be compared to declare aneuploidy. The likelihood ratios
of aneuploidy models (Equations 2 and 3) to the normal model
(Equation 1) are calculated and these ratios are compared to a
predefined threshold. Typically this threshold is set so that in
controlled tests all the trisomic cases are declared aneuploid.
Thus, it is expected that the vast majority (>99.9%) of all
truly trisomic cases are declared aneuploid by the test. Another
approach to accomplish approximately the 99.9% detection rate is to
increase the likelihood ratio threshold beyond that necessary to
declare all the known trisomic cases in the validation set by a
factor of 1000/N, where N is the number of trisomy cases in the
validation set.
[0128] In step 161 (FIG. 1), which is optional, the presence of
fetal cells and ratio of fetal alleles/maternal alleles is
determined. Because the fraction of fetal cells can be small or
even zero, the aneuploidy signal (the departure of the observed
ratio from unity) may be weak even when fetal aneuploidy is
present. An independent estimate of fetal cell fraction, including
a confidence estimate of whether measurable fetal DNA is present at
all, is useful in interpreting the observed aneuploidy ratios. FIG.
7 illustrates allele signals re-ordered by rank. Assuming the
mother has no more than two alleles at each locus, the magnitude of
the third ranked allele is potentially a robust indicator of the
presence of fetal DNA. Although measurement errors can artificially
inflate the size of the third and fourth alleles, they are very
unlikely to result in a bimodal distribution for the relative
magnitude of the third allele with respect to the first two. Such a
bimodal distribution is illustrated in FIG. 8. The secondary peak
of this distribution occurs at a value approximately equal to the
fraction of fetal cells. (This is one way to determine the value of
the variable f in the data model.) The statistical confidence that
the bimodality is real can be used to assign a confidence that
fetal DNA was present in the mixed sample. Statistical tests for
bimodality are discussed in M. Y. Cheng and P. Hall, J. R. Statist.
Soc. B (1998), 60 (Part 3) pp. 579-589. If this confidence level
exceeds a threshold, e.g., 90%, 95%, 99% or 99.9%, an aneuploidy
call may be made. The threshold set can be stringent (e.g. 99.9%)
to avoid declaring a fetus normal when in fact it is not. Thus, the
independently estimated fetal cell fraction can be used to
interpret the aneuploidy statistic. For example, a value f=0.5
along with an estimated aneuploidy ratio from the fetal-maternal
mixture of 0.05±0.01 would weaken the evidence for aneuploidy
because the ratio is too small to be consistent with the
independently determined f value (the ratio should be
~1+f/2). As another example, a value f=0.1 along with an
estimated aneuploidy ratio from the fetal-maternal mixture of
0.05±0.02 would tend to strengthen the evidence for aneuploidy
because the observed ratio is consistent with the independently
derived value of f.
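The two examples above can be reproduced with a short Python sketch. It assumes, as an interpretation of the quoted figures, that the estimated values of 0.05 denote the departure of the aneuploidy ratio from unity, so that the expected departure for a trisomy is about f/2. The function names are hypothetical.

```python
def expected_excess(f):
    """Expected departure of the aneuploidy ratio from unity for a trisomy,
    taken from the approximation ratio ~ 1 + f/2."""
    return f / 2.0


def discrepancy_in_sigma(f, observed_excess, observed_error):
    """Distance between observed and expected departure from unity,
    in units of the measurement error."""
    return abs(observed_excess - expected_excess(f)) / observed_error


# f = 0.5 with 0.05 +/- 0.01: far from the expected 0.25, evidence weakened
print(discrepancy_in_sigma(0.5, 0.05, 0.01))
# f = 0.1 with 0.05 +/- 0.02: matches the expected 0.05, evidence strengthened
print(discrepancy_in_sigma(0.1, 0.05, 0.02))
```

The first case lies roughly twenty error units from the expectation, while the second lies essentially on it.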
[0129] In any of the embodiments herein SNP data may be analyzed
for possible errors. For example, in some instances SNP data can
contain small additive errors associated with the readout
technology, multiplicative errors associated with DNA amplification
and hybridization efficiencies being different from locus to locus
and from allele to allele within a locus, and errors associated
with imperfect specificity in the process. By including the many
parameters (A_k) in the model, rather than a single scale parameter,
the residuals include allele-to-allele efficiency differences but
not locus-to-locus differences. These tend to be multiplicative
errors in the resulting observed allele strengths or peak heights (e.g.,
two signals may be 20% different in strength although the starting
concentrations of the alleles are identical). In other words, by
providing many parameters, the errors that are otherwise
attributable to locus-to-locus differences are minimized. As a
first approximation, one can assume errors are random from allele
to allele; errors have relatively small additive measurement noise
error components; and larger Poisson and multiplicative error
components exist. The magnitudes of these error components can be
estimated from repeated processing of identical samples. A
Chi-square residuals calculation for any data-model fit then can be
supported with these modeled squared errors for any peak height or
data bin.
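One way to combine the three error components into a Chi-square residuals calculation is sketched below. This is an illustration under stated assumptions: the variance is taken as the sum of an additive term, a Poisson term proportional to the expected signal, and a multiplicative term proportional to its square. The parameter names and default magnitudes are hypothetical; in practice they would be estimated from repeated processing of identical samples.

```python
def modeled_variance(expected, additive_sd=1.0, poisson_scale=1.0, mult_frac=0.2):
    """Modeled squared error for one peak height or data bin.

    additive_sd   : standard deviation of additive readout noise (assumed)
    poisson_scale : variance per unit of expected signal (assumed)
    mult_frac     : fractional multiplicative error, e.g. 0.2 for 20% (assumed)
    """
    return additive_sd ** 2 + poisson_scale * expected + (mult_frac * expected) ** 2


def chi_square(observed, expected, **error_params):
    """Sum of squared residuals, each weighted by its modeled variance."""
    return sum(
        (o - e) ** 2 / modeled_variance(e, **error_params)
        for o, e in zip(observed, expected)
    )


# Hypothetical observed and model-predicted bin values.
print(chi_square([105.0, 48.0], [100.0, 50.0]))
```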
[0130] For example, we anticipate a large scale SNP genotyping
platform such as the GoldenGate assay by Illumina will provide
~100 SNP loci per chromosome of interest. Measurements of
repeated `normal` pregnancy samples would give ratios of paternal
to maternal allele strengths which varied by ~20% due to
assay errors. Averaged over the 100 loci in a chromosome, the ratio
error would be reduced to 20%/sqrt(100), or ~2%. For an
assumed fetal/maternal cell ratio of 0.2 in a sample, the expected
observed aneuploidy ratio in the case of a trisomy would be 1.10
with an estimation error of 0.02, yielding a confident (5 sigma)
detection of aneuploidy.
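The arithmetic in this example can be reproduced with a short sketch. It assumes, as the text does, that per-locus errors are independent so that averaging over N loci reduces the error by sqrt(N), and that the expected trisomy ratio is 1 + f/2. The function name is illustrative.

```python
import math


def trisomy_detection_sigma(f, per_locus_error, n_loci):
    """Return (expected ratio, ratio error, detection significance in sigma)."""
    ratio_error = per_locus_error / math.sqrt(n_loci)  # 20% / sqrt(100) = 2%
    expected_ratio = 1.0 + f / 2.0                     # 1.10 for f = 0.2
    significance = (expected_ratio - 1.0) / ratio_error
    return expected_ratio, ratio_error, significance


ratio, error, sigma = trisomy_detection_sigma(f=0.2, per_locus_error=0.20, n_loci=100)
print(f"ratio={ratio:.2f} error={error:.2f} significance={sigma:.1f} sigma")
```

With the values from the text this gives a ratio of 1.10, an error of 0.02 and a significance of about 5 sigma.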
[0131] Alternatively, when using a single A parameter, the
residuals will be larger and will contain a component which is
correlated between alleles at the same locus. Calculation of
likelihood will need to take this correlation into account.
[0132] Another aspect of the invention involves a computer
executable logic for determining the presence of fetal cells in a
mixed sample and fetal abnormalities and/or conditions in such
cells. A computer program product is described comprising a
computer usable medium having the computer executable logic
(computer software program, including program code) stored therein.
Computer executable logic when executed by the processor causes the
processor to perform one or more functions described herein. For
example, a computer executable logic can be utilized to automate,
process or control sample collection, sample enrichment,
pre-amplification, SNP data modeling, estimating fetal/maternal
allele ratio, comparing maternal allele intensity from suspected
aneuploid region and control region and determining the existence
of aneuploidy and the type of aneuploidy if one exists.
[0133] For example, the computer executable logic can determine the
presence and ratio of fetal cells to maternal cells in a mixed
sample. The executable code can also receive data for one or more
SNPs, and apply such data to one or more data models. The computer
executable logic can then calculate a set of values for each of the
data sets associated with each data model; select the data model
that best fits the data; and model and calculate any potential
errors in the data models. For example, a computer executable logic
can determine the ratio of maternal alleles to paternal alleles at
one or more SNP locations, and/or the ratio of maternal alleles in
a region suspected of aneuploidy and a control region. One example
of a data model provides a determination of a fetal abnormality
from given data signals of SNPs at two genomic regions. The
executable logic can establish the presence or absence of trisomy,
and conclude whether the trisomy is paternally derived or if it
originated from a maternal non-disjunction event. For example, the
program can fit SNP data to the following model, which can provide
the diagnosis as follows:
[0134] A normal (diploid) fetus results in data x_k at locus k
and is represented by:
x_k = A_k[(1-f)(m_{k1}+m_{k2}) + f((m_{k1} or m_{k2}) + p_k)] + residual (1)
[0135] A trisomy caused by maternal non-disjunction is represented
by
x_k = A_k[(1-f)(m_{k1}+m_{k2}) + f(m_{k1}+m_{k2}+p_k)] + residual (2)
[0136] and a paternally inherited trisomy is represented by
x_k = A_k[(1-f)(m_{k1}+m_{k2}) + f((m_{k1} or m_{k2}) + p_{k1}+p_{k2})] + residual (3)
[0137] In Equations 1-3, A_k denotes a scale factor which
subsumes the efficiencies of amplification, hybridization, and
readout common to the alleles at locus k. In this model
amplification differences between different primer pairs are fitted
and do not appear in the residuals. Alternatively, a single A
parameter could be used and the residuals would reflect these
differences. Further, f represents the fraction of fetal cells in
the mixture, m_{k1} and m_{k2} denote the maternal alleles at
locus k, and p_k denotes the paternal allele at locus k. The
allele symbols actually represent unit data contributions that can
be arithmetically summed; e.g., m_{k1} might be a detection of
the `C` genotype represented by unit contribution to the `C` bin at
that locus.
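The three models of Equations 1-3 can be expressed as expected signals per allele bin, as in the sketch below. This is an illustration rather than the program described herein: each allele is treated as a unit contribution to its bin, the residual term is omitted, and the function names and the example genotypes are hypothetical.

```python
from collections import Counter


def expected_bins(A_k, f, maternal, fetal):
    """Expected signal per allele bin at locus k (residual omitted).

    maternal: the two maternal alleles (m_k1, m_k2), weighted by (1 - f)
    fetal:    the fetal alleles under the chosen model, weighted by f
    """
    bins = Counter()
    for allele in maternal:
        bins[allele] += A_k * (1.0 - f)
    for allele in fetal:
        bins[allele] += A_k * f
    return dict(bins)


def normal_fetus(A_k, f, m1, m2, p, inherited):
    # Equation (1): one inherited maternal allele plus one paternal allele.
    return expected_bins(A_k, f, [m1, m2], [inherited, p])


def maternal_trisomy(A_k, f, m1, m2, p):
    # Equation (2): both maternal alleles plus one paternal allele.
    return expected_bins(A_k, f, [m1, m2], [m1, m2, p])


def paternal_trisomy(A_k, f, m1, m2, p1, p2, inherited):
    # Equation (3): one inherited maternal allele plus two paternal alleles.
    return expected_bins(A_k, f, [m1, m2], [inherited, p1, p2])


# Hypothetical locus: mother C/C, paternal allele T, fetal fraction 0.2.
print(normal_fetus(1.0, 0.2, "C", "C", "T", inherited="C"))
print(maternal_trisomy(1.0, 0.2, "C", "C", "T"))
print(paternal_trisomy(1.0, 0.2, "C", "C", "T", "T", inherited="C"))
```

Fitting would then compare the observed bins against each model's prediction and select the model with the best fit.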
[0138] In some cases, the computer executable logic records data
measurements corresponding to readouts (e.g., SNP intensities from
a DNA microarray or a sequencing machine). Such measurements can be
processed by the computer executable logic to determine
fetal/maternal allele ratios and provide a call with
respect to detection of aneuploidy. Moreover, computer executable
logic can control display of such results in print or electronic
formats, which an operator can view. Thus, a computer executable
logic can include code for receiving data on one or more target DNA
polymorphisms (i.e., SNP loci); calculating a set of values for each
of the data sets associated with each data model; and selecting the
data model that best fits the data, wherein the best model will be
an indication of the presence of fetal cells in the mixed sample
and fetal abnormalities and/or conditions in said cells. The
determination of presence of fetal cells in the mixed sample and
fetal abnormalities and/or conditions in said cells can be made by
the computer executable logic or a user. Therefore, the computer
based logic can provide results for estimating fetal/maternal
ratios, allele strength and aneuploidy, which can be observed by a
technician or operator.
EXAMPLES
Example 1
Separation of Fetal Cord Blood
[0139] FIG. 12A-D shows a schematic of the device used to separate
nucleated cells from fetal cord blood.
[0140] Dimensions: 100 mm × 28 mm × 1 mm
[0141] Array design: 3 stages, gap size=18, 12 and 8 µm for the
first, second and third stage, respectively.
[0142] Device fabrication: The arrays and channels were fabricated
in silicon using standard photolithography and deep silicon
reactive etching techniques. The etch depth is 140 µm. Through holes
for fluid access are made using KOH wet etching. The silicon
substrate was sealed on the etched face to form enclosed fluidic
channels using a blood compatible pressure sensitive adhesive
(9795, 3M, St Paul, Minn.).
[0143] Device packaging: The device was mechanically mated to a
plastic manifold with external fluidic reservoirs to deliver blood
and buffer to the device and extract the generated fractions.
[0144] Device operation: An external pressure source was used to
apply a pressure of 2.0 PSI to the buffer and blood reservoirs to
modulate fluidic delivery and extraction from the packaged
device.
[0145] Experimental conditions: Human fetal cord blood was drawn
into phosphate buffered saline containing Acid Citrate Dextrose
anticoagulants. 1 mL of blood was processed at 3 mL/hr using the
device described above at room temperature and within 48 hrs of
draw. Nucleated cells from the blood were separated from enucleated
cells (red blood cells and platelets), and plasma delivered into a
buffer stream of calcium and magnesium-free Dulbecco's Phosphate
Buffered Saline (14190-144, Invitrogen, Carlsbad, Calif.)
containing 1% Bovine Serum Albumin (BSA) (A8412-100ML,
Sigma-Aldrich, St Louis, Mo.) and 2 mM EDTA (15575-020, Invitrogen,
Carlsbad, Calif.).
[0146] Measurement techniques: Cell smears of the product and waste
fractions (FIG. 8A-8B) were prepared and stained with modified
Wright-Giemsa (WG16, Sigma Aldrich, St. Louis, Mo.).
[0147] Performance: Fetal nucleated red blood cells were observed
in the product fraction (FIG. 8A) and absent from the waste
fraction (FIG. 8B).
Example 2
Isolation of Fetal Cells from Maternal Blood
[0148] The device and process described in detail in Example 1 were
used in combination with immunomagnetic affinity enrichment
techniques to demonstrate the feasibility of isolating fetal cells
from maternal blood.
[0149] Experimental conditions: blood from consenting maternal
donors carrying male fetuses was collected into K2EDTA
vacutainers (366643, Becton Dickinson, Franklin Lakes, N.J.)
immediately following elective termination of pregnancy. The
undiluted blood was processed using the device described in Example
1 at room temperature and within 9 hrs of draw. Nucleated cells
from the blood were separated from enucleated cells (red blood
cells and platelets), and plasma delivered into a buffer stream of
calcium and magnesium-free Dulbecco's Phosphate Buffered Saline
(14190-144, Invitrogen, Carlsbad, Calif.) containing 1% Bovine
Serum Albumin (BSA) (A8412-100ML, Sigma-Aldrich, St Louis, Mo.).
Subsequently, the nucleated cell fraction was labeled with
anti-CD71 microbeads (130-046-201, Miltenyi Biotech Inc., Auburn,
Calif.) and enriched using the MiniMACS™ MS column (130-042-201,
Miltenyi Biotech Inc., Auburn, Calif.) according to the
manufacturer's specifications. Finally, the CD71-positive fraction
was spotted onto glass slides.
[0150] Measurement techniques: Spotted slides were stained using
fluorescence in situ hybridization (FISH) techniques according to
manufacturer's specifications using Vysis probes (Abbott
Laboratories, Downer's Grove, Ill.). Samples were stained for the
presence of X and Y chromosomes. In one case, a sample prepared
from a known Trisomy 21 pregnancy was also stained for chromosome
21.
[0151] Performance: Isolation of fetal cells was confirmed by the
reliable presence of male cells in the CD71-positive population
prepared from the nucleated cell fractions (FIGS. 10A-10F). In the
single abnormal case tested, the trisomy 21 pathology was also
identified (FIG. 11).
Example 3
Confirmation of the Presence of Male Fetal Cells in Enriched
Samples
[0152] Confirmation of the presence of a male fetal cell in an
enriched sample is performed using qPCR with primers specific for
DYZ, a marker repeated in high copy number on the Y chromosome.
After enrichment of fnRBC by any of the methods described herein,
the resulting enriched fnRBC are binned by dividing the sample into
100 PCR wells. Prior to binning, enriched samples may be screened
by FISH to determine the presence of any fnRBC containing an
aneuploidy of interest. Because of the low number of fnRBC in
maternal blood, only a portion of the wells will contain a single
fnRBC (the other wells are expected to be negative for fnRBC). The
cells are fixed in 2% paraformaldehyde and stored at 4 °C.
Cells in each bin are pelleted and resuspended in 5 µl PBS plus
1 µl 20 mg/ml Proteinase K (Sigma #P-2308). Cells are lysed by
incubation at 65 °C for 60 minutes followed by inactivation
of the Proteinase K by incubation for 15 minutes at 95 °C.
For each reaction, primer sets (DYZ forward primer
TCGAGTGCATTCCATTCCG (SEQ ID NO: 1); DYZ reverse primer
ATGGAATGGCATCAAACGGAA (SEQ ID NO: 2); and DYZ TaqMan Probe
6FAM-TGGCTGTCCATTCCA-MGBNFQ (SEQ ID NO: 3)), TaqMan Universal PCR
master mix, No AmpErase and water are added. The samples are run
and analysis is performed on an ABI 7300: 2 minutes at 50 °C,
10 minutes at 95 °C followed by 40 cycles of 95 °C
(15 seconds) and 60 °C (1 minute). Following confirmation
of the presence of male fetal cells, further analysis of bins
containing fnRBC is performed. Positive bins may be pooled prior to
further analysis.
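The expectation that only a portion of the wells will contain a single fnRBC can be illustrated with a simple occupancy estimate. This sketch assumes that cells distribute independently and at random among the wells, so that the number of fnRBC per well is approximately Poisson distributed. The count of 10 fnRBC is a hypothetical input, and the function name is illustrative.

```python
import math


def well_occupancy(n_cells, n_wells):
    """Poisson estimate of the fraction of wells that are empty, hold exactly
    one cell, or hold more than one cell (assumes random, independent binning)."""
    lam = n_cells / n_wells          # mean number of cells per well
    p_empty = math.exp(-lam)
    p_single = lam * math.exp(-lam)
    return {
        "empty": p_empty,
        "single": p_single,
        "multiple": 1.0 - p_empty - p_single,
    }


# Hypothetical case: 10 fnRBC binned into 100 PCR wells.
for state, fraction in well_occupancy(10, 100).items():
    print(f"{state}: {fraction:.3f}")
```

Under these assumptions most wells are negative for fnRBC, and wells holding more than one fnRBC are rare.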
[0153] FIG. 13 shows the results expected from such an experiment.
The data in FIG. 13 was collected by the following protocol.
Nucleated red blood cells were enriched from cord cell blood of a
male fetus by sucrose gradient and two Heme Extractions (HE). The cells
were fixed in 2% paraformaldehyde and stored at 4 °C.
Approximately 10 × 1000 cells were pelleted and resuspended
each in 5 µl PBS plus 1 µl 20 mg/ml Proteinase K (Sigma
#P-2308). Cells were lysed by incubation at 65 °C for 60
minutes followed by inactivation of the Proteinase K for 15 minutes
at 95 °C. Cells were combined and serially diluted 10-fold
in PBS to obtain final concentrations of 100, 10 and 1 cell per 6 µl.
Six µl of each dilution was assayed in quadruplicate
in 96 well format. For each reaction, primer sets (DYZ forward
primer TCGAGTGCATTCCATTCCG (SEQ ID NO: 1); 0.9 µM DYZ reverse
primer ATGGAATGGCATCAAACGGAA (SEQ ID NO: 2); and 0.5 µM DYZ TaqMan
Probe 6FAM-TGGCTGTCCATTCCA-MGBNFQ (SEQ ID NO: 3)), TaqMan Universal
PCR master mix, No AmpErase and water were added to a final volume
of 25 µl per reaction. Plates were run and analyzed on an ABI
7300: 2 minutes at 50 °C, 10 minutes at 95 °C followed
by 40 cycles of 95 °C (15 seconds) and 60 °C (1
minute). These results show that detection of a single fnRBC in a
bin is possible using this method.
Example 4
Confirmation of the Presence of Fetal Cells in Enriched Samples by
STR Analysis
[0154] Maternal blood is processed through a size-based separation
module, with or without subsequent MHEM enhancement of fnRBCs. The
enhanced sample is then subjected to FISH analysis using probes
specific to the aneuploidy of interest (e.g., trisomy 13,
trisomy 18, and XYY). Individual positive cells are isolated by
"plucking" individual positive cells from the enhanced sample using
standard micromanipulation techniques. Using a nested PCR protocol,
STR marker sets are amplified and analyzed to confirm that the
FISH-positive aneuploid cell(s) are of fetal origin. For this
analysis, comparison to the maternal genotype is typical. An
example of a potential resulting data set is shown in Table 2.
Non-maternal alleles may be proven to be paternal alleles by
paternal genotyping or genotyping of known fetal tissue samples. As
can be seen, the presence of paternal alleles in the resulting
cells demonstrates that the cell is of fetal origin (cells #1, 2,
9, and 10). Positive cells may be pooled for further analysis to
diagnose aneuploidy of the fetus, or may be further analyzed
individually.
TABLE 2. STR locus alleles in maternal and fetal cells
DNA Source | STR locus D14S | STR locus D16S | STR locus D8S | STR locus F13B | STR locus vWA
Maternal alleles | 14, 17 | 11, 12 | 12, 14 | 9, 9 | 16, 17
Cell #1 alleles: 8; 19
Cell #2 alleles: 17; 15
Cell #3 alleles: 14
Cell #4 alleles:
Cell #5 alleles: 17; 12; 9
Cell #6 alleles:
Cell #7 alleles: 19
Cell #8 alleles:
Cell #9 alleles: 17; 14; 7, 9; 17, 19
Cell #10 alleles: 15
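The comparison to the maternal genotype can be sketched as a set difference per STR locus. This is an illustration only. The maternal genotypes are taken from Table 2, but the per-locus assignment of the alleles in the example cell is hypothetical, and the function name is not part of the described method.

```python
# Maternal genotypes per STR locus, as listed in Table 2.
MATERNAL = {
    "D14S": {14, 17},
    "D16S": {11, 12},
    "D8S": {12, 14},
    "F13B": {9},
    "vWA": {16, 17},
}


def non_maternal_alleles(cell_alleles, maternal=MATERNAL):
    """Return, per locus, the alleles detected in a cell that are absent
    from the maternal genotype. Loci with no such alleles are omitted."""
    result = {}
    for locus, alleles in cell_alleles.items():
        extra = sorted(set(alleles) - maternal[locus])
        if extra:
            result[locus] = extra
    return result


# Hypothetical single-cell result; the locus assignments are assumed.
cell = {"D14S": [17], "F13B": [7, 9], "vWA": [17, 19]}
print(non_maternal_alleles(cell))
```

A cell that yields one or more non-maternal alleles would be a candidate fetal cell, subject to confirmation by paternal genotyping as described above.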
Example 5
Confirmation of the Presence of Fetal Cells in Enriched Samples by
SNP Analysis
[0155] Maternal blood is processed through a size-based separation
module, with or without subsequent MHEM enhancement of fnRBCs. The
enhanced sample is then subjected to FISH analysis using probes
specific to the aneuploidy of interest (e.g., trisomy 13,
trisomy 18, and XYY). Samples testing positive with FISH analysis
are then binned into 96 microtiter wells, each well containing 15
µl of the enhanced sample. Of the 96 wells, 5-10 are expected to
contain a single fnRBC and each well should contain approximately
1000 nucleated maternal cells (both WBC and mnRBC). Cells are
pelleted and resuspended in 5 µl PBS plus 1 µl 20 mg/ml
Proteinase K (Sigma #P-2308). Cells are lysed by incubation at
65 °C for 60 minutes followed by inactivation of the
Proteinase K for 15 minutes at 95 °C.
[0156] In this example, the maternal genotype (BB) and fetal
genotype (AB) for a particular set of SNPs is known. The genotypes
A and B encompass all three SNPs and differ from each other at all
three SNPs. The following sequence from chromosome 7 contains these
three SNPs (rs7795605, rs7795611 and rs7795233 indicated in
brackets, respectively)
(ATGCAGCAAGGCACAGACTAA[G/A]CAAGGAGA[G/C]GCAAAATTTTC[A/G]TAGGGGAGAGAAATGGGTCATT,
SEQ ID NO: 4).
[0157] In the first round of PCR, genomic DNA from binned enriched
cells is amplified using primers specific to the outer portion of
the fetal-specific allele A and which flank the interior SNP
(forward primer ATGCAGCAAGGCACAGACTACG (SEQ ID NO: 5); reverse
primer AGAGGGGAGAGAAATGGGTCATT (SEQ ID NO: 6)). In the second round
of PCR, amplification using real time SYBR Green PCR is performed
with primers specific to the inner portion of allele A and which
encompass the interior SNP (forward primer
CAAGGCACAGACTAAGCAAGGAGAG (SEQ ID NO: 7); reverse primer
GGCAAAATTTTCATAGGGGAGAGAAATGGGTCATT (SEQ ID NO: 8)).
[0158] Expected results are shown in FIG. 14. Here, six of the 96
wells test positive for allele A, confirming the presence of cells
of fetal origin, because the maternal genotype (BB) is known and
cannot be positive for allele A. DNA from positive wells may be
pooled for further analysis or analyzed individually.
Example 6
Use of Highly Parallel Genotyping and High Throughput Sequencing
for Fetal Diagnosis
[0159] Fetal cells or nuclei can be isolated as described in the
enrichment section or as described in example 1. The enrichment
process described in example 1 may generate a final mixture
containing approximately 500 maternal white blood cells (WBCs),
approximately 100 maternal nucleated red blood cells (mnRBCs), and a
minimum of approximately 10 fetal nucleated red blood cells
(fnRBCs) starting from an initial 20 ml blood sample taken late in
the first trimester. In the context of fetal diagnosis, it is very
valuable to have a reference sample containing only the mother's
genotype. When the diagnosis procedure is based on enriching for
circulating fetal cells in the mother's blood, the reference sample
can be created simply by not enriching for fetal cells, and then
diluting enough to ensure that <<1 fetal cell is expected in
the sample used as input to the SNP detection process.
Alternatively, white blood cells can be selected, for which the
circulating fetal fraction is negligible.
[0160] Perform Multiple Displacement Amplification (MDA): Current
technologies and protocols for highly parallel SNP detection with
DNA microarray readout result in inaccurate calls when there are
too few starting DNA copies or when a particular allele represents
a small fraction in the population of input DNA molecules. In the
methods described herein a ratio-preserving pre-amplification of
the DNA, such as multiple displacement amplification, is done to
provide enough copies to support accurate SNP detection via primer
extension ligation methods described below. This pre-amplification
method is chosen to produce as close as possible the same
amplification factor for all target regions of the genome.
[0161] Multiple displacement amplification protocols can be
performed as described in Gonzalez et al. Environmental
Microbiology 7(7) 1024-1028, (2005). Briefly, samples are suspended
in 100 µl 10 mM Tris-HCl buffer (pH 7.5). Cells are lysed by adding
100 µl of alkaline lysis solution (400 mM KOH, 100 mM DTT, 10 mM
EDTA) and incubating cells for 10 min on ice. Lysed cells are
neutralized with 100 µl of neutralization solution (2 ml 1 M HCl
and 3 ml 1 M Tris-HCl). Lysed cells are used directly as template
in MDA and PCR reactions.
[0162] 1 µl template DNA in 9 µl sample buffer (50 mM
Tris-HCl (pH 8.2), 0.5 mM EDTA) containing random hexamers is
denatured at 95 °C for 3 min and placed on ice. Buffer (9
µl) containing dNTPs and 1 µl enzyme mix containing Φ29
DNA polymerase are added to the 10 µl of denatured DNA
template-random hexamers solution and incubated at 30 °C
for 6 h. A final incubation at 65 °C for 10 min inactivates
the Φ29 DNA polymerase.
[0163] Highly Parallel Genotyping: Highly parallel SNP detection
can be used to obtain information about genotype and gene copy
numbers at a large number of loci scattered across the genome, in
one procedure. Highly parallel SNP genotyping can be performed as
described in Fan et al. Cold Spring Harb Symp Quant Biol; 68:
69-78, (2003). Genomic DNA is immobilized to streptavidin-coated
magnetic beads by mixing 20 µl of DNA (100 ng/µl) with 5
µl of photobiotin (0.2 µg/µl) and 15 µl of mineral oil,
and incubating at 95 °C for 30 minutes. Trizma base (25
µl of 0.1 M) is added, followed by two extractions with 75 µl
of sec-butanol to remove unreacted photobiotin. The extracted gDNA
(20 µl) is then mixed with 34 µl of Paramagnetic Particle A
Reagent (MPA; Illumina) and incubated at room temperature for 90
minutes. The immobilized gDNA is then washed twice with DNA wash
buffer (WDI) (Illumina) and resuspended at 10 ng/µl in WDI. In each
subsequent reaction, 200 ng (10 µl) of DNA is used.
[0164] Assay oligonucleotides are then annealed to the genomic DNA
by combining the immobilized DNA (10 µl) with annealing reagent
(MAI; Illumina; 30 µl) and SNP-specific oligonucleotides (10
µl containing 25 nM of each oligonucleotide) to a final volume
of 50 µl. LSOs are synthesized with a 5' phosphate to enable ligation.
Annealing is carried out by ramping temperature from 70 °C
to 30 °C over ~8 hours, then holding at 30 °C
until the next processing step.
[0165] After annealing, excess and mishybridized oligonucleotides
are washed away, and 37 µl of master mix for extension (MME;
Illumina) is added to the beads. Extension is carried out at room
temperature for 15 minutes. After washing, 37 µl of master mix
for ligation (MML; Illumina) is added to the extension products,
and incubated for 20 minutes at 57 °C to allow the extended
upstream oligo to ligate to the downstream oligo.
[0166] The extension products are then amplified by PCR. After
extension and ligation, the beads are washed with universal buffer
1 (UB1; Illumina), resuspended in 35 µl of elution buffer (IPI;
Illumina) and heated at 95 °C for one minute to release the
ligated products. The supernatant is then used in a 60-µl PCR.
PCR reactions are thermocycled as follows: 10 seconds at 25 °C;
34 cycles of (35 seconds at 95 °C, 35 seconds at
56 °C, 2 minutes at 72 °C); 10 minutes at
72 °C; and cooled to 4 °C for 5 minutes. The three
universal PCR primers (P1, P2, and P3) are labeled with Cy3, Cy5,
and biotin, respectively.
[0167] High throughput sequencing: After the SNP-specific
ligation-extension reaction, and amplification of the products,
readout of the SNP types can be done using high throughput
sequencing as described in Margulies et al. Nature 437 376-380,
(2005). Briefly, the amplicons are diluted and mixed with beads
such that each bead captures a single molecule of the amplified
material. The DNA-carrying beads are isolated in separate 100 µm
aqueous droplets made through the creation of a
PCR-reaction-mixture-in-oil emulsion. The DNA molecule on each bead
is then amplified to generate millions of copies of the sequence,
which all remain bound to the bead. Finally, the beads are placed
into a highly parallel sequencing-by-synthesis machine which can
generate over 400,000 sequence reads (~100 bp per read) in a
single 4 hour run.
[0168] Fetal Diagnosis: The SNP data obtained from the high
throughput sequencing is analyzed for fetal diagnosis using the
methods described in Example 9.
Example 7
Use of Highly Parallel Genotyping and Bead Arrays for Fetal
Diagnosis
[0169] Fetal cells or nuclei can be isolated as described in the
enrichment section or as described in example 1. The enrichment
process described in example 1 may generate a final mixture
containing approximately 500 maternal white blood cells (WBCs),
approximately 100 maternal nucleated red blood cells (mnRBCs), and a
minimum of approximately 10 fetal nucleated red blood cells
(fnRBCs) starting from an initial 20 ml blood sample taken late in
the first trimester. In the context of fetal diagnosis, it is very
valuable to have a reference sample containing only the mother's
genotype. When the diagnosis procedure is based on enriching for
circulating fetal cells in the mother's blood, the reference sample
can be created simply by not enriching for fetal cells, and then
diluting enough to ensure that <<1 fetal cell is expected in
the sample used as input to the SNP detection process.
Alternatively, white blood cells can be selected, for which the
circulating fetal fraction is negligible.
[0170] Perform Linear Amplification of Genomic DNA: Current
technologies and protocols for highly parallel SNP detection with
DNA microarray readout result in inaccurate calls when there are
too few starting DNA copies or when a particular allele represents
a small fraction in the population of input DNA molecules. In the
methods described herein a ratio-preserving pre-amplification of
the DNA, such as linear amplification of genomic DNA, is done to
provide enough copies to support accurate SNP detection via primer
extension ligation methods described below. This pre-amplification
method is chosen to produce as close as possible the same
amplification factor for all target regions of the genome.
[0171] Linear amplification protocols can be performed as described
in Liu et al. BMC Genomics 4(1) 19-30 (2003). This protocol uses a
terminal transferase tailing step and second strand synthesis to
incorporate T7 promoters at the ends of the DNA fragments prior to
in vitro transcription (IVT). Briefly, genomic DNA can be obtained
either by ChIP or by restriction digests. ChIP DNA is fragmented by
sonication and isolated using antibody against di-methyl-H3 K4.
Restricted genomic DNA is prepared as follows: genomic DNA isolated
by bead lysis, phenol/chloroform extraction, and ethanol
precipitation, is restricted either with Alu I or with Rsa I (New
England BioLabs (NEB)). Digested products then undergo
electrophoresis on a 2% agarose gel. Restriction fragments in the
100-700 bp size range are excised from the gel and purified using
the QIAquick Gel Extraction Kit (Qiagen).
[0172] Calf intestinal alkaline phosphatase (CIP) (NEB) is used to
remove 3' phosphate groups from DNA samples prior to IVT. Up to 500
ng DNA is incubated with 2.5 U enzyme in a 10 µl volume with the
supplied buffer at 37 °C for 1 hour. The reaction is
cleaned up with the MinElute Reaction Cleanup Kit (Qiagen) per
manufacturer instructions except that the elution volume is
increased to 20 µl.
[0173] PolyT tails are generated using terminal transferase (TdT)
as follows. Up to 50 ng of CIP-treated template DNA is incubated
for 20 minutes at 37 °C in a 10 µl solution containing
20 U TdT (NEB), 0.2 M potassium cacodylate, 25 mM Tris-HCl pH 6.6,
0.25 mg/ml BSA, 0.75 mM CoCl2, 4.6 µM dTTP and 0.4 µM
ddCTP. The reaction is halted by the addition of 2 µl of 0.5 M EDTA pH
8.0, and product isolated with the MinElute Reaction Cleanup Kit
(Qiagen), increasing the elution volume to 20 µl.
[0174] Second strand synthesis and incorporation of the T7 promoter
sequence is carried out as follows: the 20 µl tailing reaction
product is mixed with 0.6 µl of 25 µM T7-A18B primer
(5'-CATTAGCGGCCGCGAAATTAATACGACTCACTATAGGGAG(A)18 [B], where B
refers to C, G or T, SEQ ID NO: 9), 5 µl 10× EcoPol buffer
(100 mM Tris-HCl pH 7.5, 50 mM MgCl2, 75 mM DTT), 2 µl 5.0
mM dNTPs, and 20.4 µl nuclease-free water. In experiments with
10-50 ng starting material, the end primer concentration is kept at
300 nM, while the reaction volume is scaled down to maintain an end
concentration of 1 ng/µl starting material. For starting amounts
less than 10 ng, the volume is kept at 10 µl. If necessary,
volume reduction of the eluate from the TdT tailing is performed in
a vacuum centrifuge on medium heat. Samples are incubated at
94 °C for 2 minutes to denature, ramped down at -1 °C/sec
to 35 °C, held at 35 °C for 2 minutes to
anneal, ramped down at -0.5 °C/sec to 25 °C and
held while Klenow enzyme (NEB) is added to an end concentration of
0.2 U/µl. The sample is then incubated at 37 °C for 90
minutes for extension. The reaction is halted by addition of 5
µl 0.5 M EDTA pH 8.0 and product is isolated with the MinElute
Reaction Cleanup Kit (Qiagen), increasing the elution volume to 20
µl.
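The stated end primer concentration of 300 nM can be checked against the listed volumes with a simple dilution calculation. This sketch assumes a 50 µl final reaction volume, inferred from the 5 µl of 10× buffer; the listed components sum to 48 µl before the Klenow enzyme is added. The function and variable names are illustrative.

```python
def final_concentration(stock_conc, stock_vol, final_vol):
    """Dilution relation C1*V1 = C2*V2, solved for C2 (units follow the inputs)."""
    return stock_conc * stock_vol / final_vol


# Volumes listed in the text, in microliters.
components_ul = {
    "tailing reaction product": 20.0,
    "T7-A18B primer (25 uM)": 0.6,
    "10x EcoPol buffer": 5.0,
    "dNTPs (5.0 mM)": 2.0,
    "nuclease-free water": 20.4,
}
listed_volume_ul = sum(components_ul.values())

# Assumption: 5 ul of 10x buffer implies a 50 ul final reaction volume.
final_volume_ul = 5.0 * 10
primer_nM = 1000.0 * final_concentration(25.0, 0.6, final_volume_ul)

print(f"listed components: {listed_volume_ul:.1f} ul")
print(f"end primer concentration: {primer_nM:.0f} nM")
```

Under this assumption the calculation reproduces the 300 nM figure given in the text.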
[0175] Prior to in vitro transcription, samples are concentrated in
a vacuum centrifuge at medium heat to 8 µl volume. The in vitro
transcription is performed with the T7 Megascript Kit (Ambion) per
manufacturer's instructions, except that the 37 °C
incubation is increased to 16 hours. The samples are purified with
the RNeasy Mini Kit (Qiagen) per manufacturer's RNA cleanup
protocol, except with an additional 500 µl wash with buffer RPE.
RNA is quantified by absorbance at 260 nm, and visualized on a
denaturing 1.25× MOPS-EDTA-Sodium Acetate gel.
[0176] Highly Parallel Genotyping: Highly parallel SNP detection
can be used to obtain information about genotype and gene copy
numbers at a large number of loci scattered across the genome, in
one procedure. Highly parallel SNP genotyping can be performed as
described in Example 6.
[0177] Bead Array: After the SNP-specific ligation-extension
reaction, and amplification of the products, readout of the SNP
types can be done using bead arrays as described in Shen et al.
Mutation Research 573 70-82, (2005). Double-stranded PCR products
are immobilized onto paramagnetic particles by adding 20 µl of
Paramagnetic Particle B Reagent (MPB; Illumina) to each 60-µl
PCR, and incubated at room temperature for a minimum of 60 minutes.
The bound PCR products are washed with universal buffer 2 (UB2;
Illumina), and denatured by adding 30 µl of 0.1 N NaOH. After
one minute at room temperature, 25 µl of the released ssDNAs is
neutralized with 25 µl of hybridization reagent (MH1; Illumina)
and hybridized to arrays.
[0178] Arrays are hydrated in UB2 for 3 minutes at room
temperature, and then preconditioned in 0.1 N NaOH for 30 seconds.
Arrays are returned to the UB2 reagent for at least 1 minute to
neutralize the NaOH. The pretreated arrays are exposed to the
labeled ssDNA samples described above. Hybridization is conducted
under a temperature gradient program from 60 °C to
45 °C over ~12 hours. The hybridization is held at
45 °C until the array is processed. After hybridization,
the arrays are first rinsed twice in UB2 and once with IS1 (IS 1;
Illumina) at room temperature with mild agitation, and then imaged
at a resolution of 0.8 microns using a BeadArray Reader (Illumina).
PMT settings are optimized for dynamic range, channel balance, and
signal-to-noise ratio. Cy3 and Cy5 dyes are excited by lasers
emitting at 532 nm and 635 nm, respectively.
[0179] The automatic calling of genotypes is performed by genotype
calling software (GenCall), which uses a Bayesian model that
compares intensities between probes for allele A and allele B
across a large number of samples to create archetypal clustering
patterns. These patterns allow the genotyping data to be assigned
membership to clusters using a probabilistic model and allow
assignment of a corresponding GenCall score. For example, data
points falling between two clusters are assigned a low probability
score of being a member of either cluster and have a
correspondingly low GenCall score. The cluster quality can be
assessed by evaluating the CSS, a measure of statistical separation
between clusters. It is defined as
$$\mathrm{CSS} = \min\left( \frac{\theta_{AB} - \theta_{AA}}{\sigma_{AB} + \sigma_{AA}},\ \frac{\theta_{AB} - \theta_{BB}}{\sigma_{AB} + \sigma_{BB}} \right)$$
[0180] Loci with cluster scores around the cutoff of 3.0 are
visually evaluated and the training clusters refined by manual
intervention. A cutoff value of 3.0 can be chosen for the CSS on
the basis of minimizing strand concordance errors. Loci with
questionable clusters are scored as unsuccessful and excluded from
further analysis.
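The cluster separation score and the 3.0 cutoff described above can be illustrated with a minimal sketch. The function names, the dictionary layout of the inputs, and the `review_band` width used to decide which loci count as "around the cutoff" are hypothetical. The sketch also takes absolute differences of the cluster positions, which is an assumption made here so that the score is positive whatever the ordering of the clusters.

```python
def cluster_separation_score(theta, sigma):
    """Return the CSS for one locus.

    theta and sigma are dicts keyed by 'AA', 'AB', 'BB' holding each
    cluster's position and spread (hypothetical input layout).
    """
    # Absolute differences are an assumption of this sketch so that the
    # score does not depend on the ordering of the clusters.
    left = abs(theta["AB"] - theta["AA"]) / (sigma["AB"] + sigma["AA"])
    right = abs(theta["AB"] - theta["BB"]) / (sigma["AB"] + sigma["BB"])
    return min(left, right)


def triage_locus(css, cutoff=3.0, review_band=0.5):
    """Sort a locus into 'pass', 'review' or 'fail' around the cutoff.

    review_band is a hypothetical width for "around the cutoff"; loci
    inside it would be inspected visually.
    """
    if abs(css - cutoff) <= review_band:
        return "review"
    return "pass" if css > cutoff else "fail"


theta = {"AA": 0.1, "AB": 0.5, "BB": 0.9}
sigma = {"AA": 0.03, "AB": 0.05, "BB": 0.05}
css = cluster_separation_score(theta, sigma)
print(css, triage_locus(css))
```

The score is limited by the less well separated of the two neighboring cluster pairs, which is why the minimum is taken.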
[0181] Fetal Diagnosis: The SNP data obtained from the bead array
assay is analyzed for fetal diagnosis using the methods described
in Example 9.
Example 8
Use of Highly Parallel Genotyping and DNA Arrays for Fetal
Diagnosis
[0182] Fetal cells or nuclei can be isolated as described in the
enrichment section or as described in Example 1. The enrichment
process described in Example 1 may generate a final mixture
containing approximately 500 maternal white blood cells (WBCs),
approximately 100 maternal nucleated red blood cells (mnRBCs), and a
minimum of approximately 10 fetal nucleated red blood cells
(fnRBCs) starting from an initial 20 ml blood sample taken late in
the first trimester. In the context of fetal diagnosis, it is very
valuable to have a reference sample containing only the mother's
genotype. When the diagnosis procedure is based on enriching for
circulating fetal cells in the mother's blood, the reference sample
can be created simply by not enriching for fetal cells, and then
diluting enough to ensure that <<1 fetal cell is expected in
the sample used as input to the SNP detection process.
Alternatively, white blood cells can be selected, for which the
circulating fetal fraction is negligible.
[0183] Perform Multiple Displacement Amplification: Current
technologies and protocols for highly parallel SNP detection with
DNA microarray readout result in inaccurate calls when there are
too few starting DNA copies or when a particular allele represents
a small fraction in the population of input DNA molecules. In the
methods described herein a ratio-preserving pre-amplification of
the DNA, such as multiple displacement amplification, is done to
provide enough copies to support accurate SNP detection via primer
extension ligation methods described below. This pre-amplification
method is chosen to produce, as nearly as possible, the same
amplification factor for all target regions of the genome. Multiple
displacement amplification protocols can be performed as described
in Example 6.
[0184] Highly Parallel Genotyping: Highly parallel SNP detection
can be used to obtain information about genotype and gene copy
numbers at a large number of loci scattered across the genome, in
one procedure. Highly parallel SNP genotyping can be performed as
described in Example 6.
[0185] DNA Array: After the SNP-specific ligation-extension
reaction, and amplification of the products, readout of the SNP
types can be done using DNA arrays as described in Gunderson et al.
Nature Genetics 37(5) 549-554, (2005). The array data can be
obtained using Illumina's Sentrix BeadArray matrix. Oligonucleotide
probes on the beads are 75 bases in length; 25 bases at the 5' end
are used for decoding and the remaining 50 bases are
locus-specific. The oligonucleotides are immobilized on activated
beads using a 5' amino group. The array can contain probes for SNP
assays (probe pairs, allele A and allele B).
[0186] The amplification products of the SNP-specific
ligation-extension reaction are denatured at 95.degree. C. for 5
min and then exposed to the Sentrix array matrix, which is mated
to a microtiter plate, submerging the fiber bundles in 15 ml of
hybridization sample. The entire assembly is incubated for 14-18 h
at 48.degree. C. with shaking. After hybridization, arrays are
washed in 1.times. hybridization buffer and 20% formamide at
48.degree. C. for 5 min.
[0187] Allele Specific Primer Extension (ASPE) can be used to score
SNPs. Before carrying out the array-based primer extension
reaction, Sentrix array matrices are washed for 1 min with wash
buffer (33.3 mM NaCl, 3.3 mM potassium phosphate and 0.1% Tween-20,
pH 7.6) and then incubated for 15 min in 50 .mu.l of ASPE reaction
buffer (Illumina EMM, containing polymerase, a mix of
biotin-labeled and unlabeled nucleotides, single-stranded binding
protein, bovine serum albumin and appropriate buffers and salts) at
37.degree. C. After the reaction, the arrays are immediately
stripped in freshly prepared 0.1 N NaOH for 2 min and then washed
and neutralized twice in 1.times. hybridization buffer for 30 s.
The biotin-labeled nucleotides incorporated during primer extension
are then detected using a sandwich assay as described in Pinkel et
al. PNAS 83 (1986) 2934-2938. The arrays are blocked at room
temperature for 10 min in 1 mg ml.sup.-1 bovine serum albumin in
1.times. hybridization buffer and then washed for 1 min in 1.times.
hybridization buffer. The arrays are then stained with
streptavidin-phycoerythrin solution (1.times. hybridization buffer,
3 .mu.g ml.sup.-1 streptavidin-phycoerythrin (Molecular Probes) and
1 mg ml.sup.-1 bovine serum albumin) for 10 min at room
temperature. The arrays are washed with 1.times. hybridization
buffer for 1 min and then counterstained with an antibody
reagent (10 mg ml.sup.-1 biotinylated antibody to streptavidin
(Vector Labs) in 1.times.PBST (137 mM NaCl, 2.7 mM KCl, 4.3 mM
sodium phosphate, 1.4 mM potassium phosphate and 0.1% Tween-20)
supplemented with 6 mg ml.sup.-1 goat normal serum) for 20 min.
After counterstaining, the arrays are washed in 1.times.
hybridization buffer and restained with
streptavidin-phycoerythrin solution for 10 min. The arrays are
washed one final time in 1.times. hybridization buffer before
imaging them in 1.times. hybridization buffer on a custom CCD-based
BeadArray imaging system. The intensities are extracted using
custom image analysis software.
[0188] The automatic calling of genotypes is performed by genotype
calling software (GenCall) as described in Example 7.
[0189] Fetal Diagnosis: The SNP data obtained from the DNA array
assay is analyzed for fetal diagnosis using the methods described
in Example 9.
Example 9
Fetal Diagnosis
[0190] Results obtained in Examples 6, 7, and 8 can be used for
fetal diagnosis.
[0191] A model for SNP data in the context of fetal diagnosis is
given in Equations 1-3. A normal (diploid) fetus will result in
data x.sub.k at locus k
x.sub.k=A.sub.k[(m.sub.k1+m.sub.k2)+f((m.sub.k1 or m.sub.k2)+p.sub.k)]+residual (1)
[0192] A trisomy caused by maternal non-disjunction would be
represented by
x.sub.k=A.sub.k[(m.sub.k1+m.sub.k2)+f(m.sub.k1+m.sub.k2+p.sub.k)]+residual (2)
[0193] and a paternally inherited trisomy would be represented
by
x.sub.k=A.sub.k[(m.sub.k1+m.sub.k2)+f((m.sub.k1 or m.sub.k2)+p.sub.k1+p.sub.k2)]+residual (3)
[0194] In Equations 1-3, A.sub.k denotes a scale factor which
subsumes the efficiencies of amplification, hybridization, and
readout common to the alleles at locus k. In this model
amplification differences between different primer pairs are fitted
and do not appear in the residuals. Alternatively, a single A
parameter could be used and the residuals would reflect these
differences. f represents the fraction of fetal cells in the
mixture, m.sub.k1 and m.sub.k2 denote the maternal alleles at locus
k, and p.sub.k denotes the paternal allele at locus k. The allele
symbols actually represent unit data contributions that can be
arithmetically summed; e.g., m.sub.k1 might be a detection of the
`C` genotype represented by unit contribution to the `C` bin at
that locus.
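As an illustration of how the allele symbols act as unit contributions to bins, the following minimal sketch computes the expected (residual-free) signal at one locus under the data model. The function names, the four-bin `ACGT` layout, and the example maternal genotype are hypothetical choices made for this sketch. The maternal-trisomy helper assumes that both maternal alleles and one paternal allele are present in the fetus.

```python
BASES = "ACGT"


def normal_fetus(inherited_maternal, paternal):
    """Fetal alleles for a diploid fetus: one maternal, one paternal."""
    return [inherited_maternal, paternal]


def maternal_trisomy(m1, m2, paternal):
    """Fetal alleles when both maternal alleles are inherited."""
    return [m1, m2, paternal]


def paternal_trisomy(inherited_maternal, p1, p2):
    """Fetal alleles when two paternal alleles are inherited."""
    return [inherited_maternal, p1, p2]


def expected_signal(scale, maternal, fetal_alleles, f):
    """Expected signal per allele bin at one locus, without residual.

    scale plays the role of A_k, maternal is the pair (m_k1, m_k2),
    and f is the fraction of fetal cells in the mixture.
    """
    signal = {base: 0.0 for base in BASES}
    for allele in maternal:       # each maternal allele adds one unit
        signal[allele] += 1.0
    for allele in fetal_alleles:  # each fetal allele adds f units
        signal[allele] += f
    return {base: scale * value for base, value in signal.items()}


# Hypothetical locus: mother CC, fetus inherits C and a paternal G, f = 0.2.
print(expected_signal(1.0, ("C", "C"), normal_fetus("C", "G"), 0.2))
```

In this example the paternal `G` allele appears only through the fetal term, so its bin carries a signal proportional to f.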
[0195] FIG. 6 illustrates the kinds of SNP calls that result under
this data model. At Locus 1, the fetal genotype was GC. There is a
paternally inherited `G` allele contribution in the mixed sample
that results in an increase of G signal above the noise level
observed in the maternal-only sample, and a maternally inherited
`C` allele contribution that increases the C signal. The effective
value of f that has been assumed in these illustrations is f=0.2.
At Locus 2, the paternal allele is `T`. At Locus 3, the fetus is
homozygous GG. In the third row of FIG. 6, the effect of a fetal
trisomy is represented by the dashed red lines, superposed on a
normal (diploid) mixed-sample pattern. The trisomy is assumed to
include Loci 1 and 2, but not Loci 3 and 4. At Loci 1 and 2 both
maternal allele strengths are increased in the mixed sample, as
well as the separate paternal allele contribution. At Locus 3, it
was assumed that the fetus was `GG` and the paternal allele is the
same as the first maternal allele. Note that the ratio between the
average of the two maternal alleles and the paternal allele will be
slightly greater at Loci 1 and 2 than at Locus 4--this is one
indicator of trisomy.
Simple, Suboptimal Detection Methods
[0196] A simple intuitive understanding of the effect of trisomy is
that it increases the abundances of fetal alleles at loci within
the affected region. Trisomies are predominately from maternal
non-disjunction events, so typically both maternal alleles, and a
single paternal allele, are increased, and the ratio of maternal
allele abundance to paternal allele abundance is higher in the
trisomic region. These signatures may be masked by differences in
DNA amplification and hybridization efficiency from locus to locus,
and from allele to allele.
[0197] Within a locus, the PCR differences are smaller than between
loci, because the same primers are responsible for all the
different allele amplicons at that locus. Therefore, the allele
ratios may be more stable than the overall allele abundances. This
can be exploited by identifying loci where the paternal allele is
distinct from the maternal allele and taking the ratio of the
paternal allele strength to the average of the maternal allele
strengths. These allele ratios then can be averaged over the
hypothesized aneuploidy region and compared to the average over a
control region. The distributions of these ratio values in the
hypothesized aneuploidy region and in the control region can be
compared to create an estimate of statistical significance for the
observed difference in means. A simple example of this procedure
would use Student's t-test.
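The allele-ratio comparison can be sketched as follows, assuming per-locus allele strengths are already available as plain numbers. The function names are hypothetical, and the pooled-variance two-sample form of Student's t statistic is one possible choice; the p-value lookup from the t distribution is left out of this sketch.

```python
import math
from statistics import mean, variance


def paternal_to_maternal_ratio(paternal, maternal_1, maternal_2):
    """Ratio of the paternal allele strength to the mean maternal strength."""
    return paternal / ((maternal_1 + maternal_2) / 2.0)


def student_t(sample_a, sample_b):
    """Two-sample Student's t statistic with pooled variance.

    Returns (t, degrees_of_freedom). sample_a would hold the per-locus
    ratios in the hypothesized aneuploidy region and sample_b those in
    the control region.
    """
    na, nb = len(sample_a), len(sample_b)
    dof = na + nb - 2
    pooled = ((na - 1) * variance(sample_a)
              + (nb - 1) * variance(sample_b)) / dof
    standard_error = math.sqrt(pooled * (1.0 / na + 1.0 / nb))
    return (mean(sample_a) - mean(sample_b)) / standard_error, dof


# Hypothetical per-locus ratios in the two regions.
t_value, dof = student_t([0.16, 0.18, 0.17, 0.15], [0.20, 0.19, 0.21, 0.22])
print(t_value, dof)
```

The statistic would then be compared with the t distribution for the returned degrees of freedom to obtain a significance estimate.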
[0198] Alternatively, the maternal allele strengths over the
suspected aneuploid region can be compared to those in the control
region, all without forming any ratios to paternal alleles. In this
approach, errors in the measurement of the paternal allele
abundances do not enter; however, the differences in amplification
efficiency between primer pairs do enter, and these typically will
be larger than differences between alleles in the same locus. In
this approach there also may be a residual bias between the
efficiencies averaged over certain chromosomes; therefore it may be
useful to perform the entire detection process resulting in an
observed abundance ratio for the mixed sample, do it also for the
maternal sample, and then take the ratio of ratios. This ratio of
ratios will be free of the chromosome bias; however, it will
include errors in the measurements of the maternal sample.
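The ratio of ratios can be written out in a few lines. This is a minimal sketch with hypothetical function names; it assumes the maternal allele strengths for each region have been collected into plain lists for both the mixed sample and the maternal-only sample.

```python
from statistics import mean


def abundance_ratio(test_region, control_region):
    """Mean maternal allele strength in the test region over the control."""
    return mean(test_region) / mean(control_region)


def ratio_of_ratios(mixed_test, mixed_control, maternal_test, maternal_control):
    """Mixed-sample abundance ratio divided by the maternal-sample ratio.

    A chromosome-level efficiency bias multiplies both ratios equally
    and therefore cancels in the quotient.
    """
    return (abundance_ratio(mixed_test, mixed_control)
            / abundance_ratio(maternal_test, maternal_control))


# Hypothetical data with a 10% efficiency bias on the test chromosome
# and no aneuploidy: the ratio of ratios comes out close to 1.
print(ratio_of_ratios([2.2, 2.2], [2.0, 2.0], [1.1, 1.1], [1.0, 1.0]))
```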
[0199] Because the fraction of fetal cells can be small or even
zero, the aneuploidy signal (the departure of the observed ratio
from unity) may be weak even when fetal aneuploidy is present. An
independent estimate of the fetal cell fraction, including a
confidence estimate of whether measurable fetal DNA is present at
all, is useful in interpreting the observed aneuploidy ratios. FIG.
7 illustrates allele signals re-ordered by rank. Assuming the
mother has no more than two alleles at each locus, the magnitude of
the third ranked allele is potentially a robust indicator of the
presence of fetal DNA. Although measurement errors can artificially
inflate the size of the third and fourth alleles, they are very
unlikely to result in a bimodal distribution for the relative
magnitude of the third allele with respect to the first two. Such a
bimodal distribution is cartooned in FIG. 8. The secondary peak of
this distribution occurs at a value approximately equal to the
fraction of fetal cells. This is one way to determine the value of
the variable f in the data model. The statistical confidence that
the bimodality is real can be used to assign a confidence that
fetal DNA was present in the mixed sample. Statistical tests for
bimodality are discussed in M Y Cheng and P Hall, J. R. Statist.
Soc. B (1998), 60 (Part 3) pp 579-589, and these authors prefer
bootstrap based methods. Only if this confidence exceeds a
threshold, say 99.9%, would an aneuploidy call be attempted. This
threshold needs to be quite stringent to avoid the expensive
mistake of declaring a fetus normal when in fact it is not. The
estimated fetal cell fraction can be used to interpret the
aneuploidy statistic: a large value of f and an observed aneuploidy
ratio very close to unity would suggest no aneuploidy; a small
value of f along with an aneuploidy ratio approximately equal to
1+f/2 would suggest trisomy, but it is still necessary to decide
whether the observed aneuploidy is significantly different from
unity and this requires an error model. A simple robust estimate of
the error distribution could come from repeated processing of
nominally identical samples.
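The value 1+f/2 can be checked against the data model with a short calculation. The sketch below assumes a maternally derived trisomy, in which both maternal alleles gain a fetal contribution f, and compares the summed maternal allele signal with that of a diploid region, in which only the inherited maternal allele gains f. The function name is hypothetical.

```python
def expected_maternal_allele_ratio(f):
    """Expected ratio of maternal allele signal, trisomic over diploid.

    Trisomic region: both maternal alleles carry 1 + f, total 2 + 2f.
    Diploid region: one allele carries 1 + f, the other 1, total 2 + f.
    """
    trisomic = 2.0 + 2.0 * f
    diploid = 2.0 + f
    return trisomic / diploid


for f in (0.0, 0.02, 0.2):
    # The exact ratio is 1 + f / (2 + f), close to 1 + f / 2 for small f.
    print(f, expected_maternal_allele_ratio(f), 1.0 + f / 2.0)
```

For small f the exact expression 1 + f/(2+f) and the approximation 1 + f/2 differ only at second order in f.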
Fitting of Data to the Model for Optimal Detection of
Aneuploidy
[0200] The data model can be used to simultaneously recover
estimates of the fraction of fetal cells, and efficient detection
of aneuploidies in hypothesized chromosomes or chromosomal
segments. This integrated approach should result in more reliable
and sensitive declarations of aneuploidy.
[0201] Equations 1-3 actually represent five different models
because of the ambiguity between m.sub.k1 and m.sub.k2 in the last
term of Equations 1 and 3. Testing for aneuploidy of Chromosomes 13,
18, and 21 then would entail 5.times.5.times.5=125 different model
variants that would be fit to the data.
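The count of 125 follows from assigning one of the five models independently to each of the three chromosomes. A minimal sketch of the enumeration is given below; the labels used for the models and the dictionary layout are hypothetical.

```python
from itertools import product

# Five per-chromosome models: the normal and paternal-trisomy models each
# come in two versions depending on which maternal allele is inherited.
MODELS_PER_CHROMOSOME = [
    ("normal", "m1"),
    ("normal", "m2"),
    ("maternal_trisomy", None),
    ("paternal_trisomy", "m1"),
    ("paternal_trisomy", "m2"),
]
CHROMOSOMES = (13, 18, 21)


def enumerate_model_variants():
    """Return every assignment of one model to each tested chromosome."""
    return [
        dict(zip(CHROMOSOMES, combination))
        for combination in product(MODELS_PER_CHROMOSOME,
                                   repeat=len(CHROMOSOMES))
    ]


variants = enumerate_model_variants()
print(len(variants))
```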
[0202] The parameter values for the maternal allele identities are
taken from the results for the maternal-only sample and the
remaining parameters are fit to the data from the mixed sample.
Because the number of parameters is very large when the number of
loci is large, a global optimization requires iterative search
techniques. One possible approach is to do the following for each
model variant:
i) Set f to 0 and solve for A.sub.k at each locus.
ii) Set f to a value equal to the smallest fetal/maternal cell ratio for which fetal cells are likely to be detectable.
iii) Solve for paternal allele(s) identities and strengths at each locus, one locus at a time, that minimize data-model residuals.
iv) Fix the paternal alleles and adjust f to minimize residuals over all the data.
v) Now vary only the A.sub.k to minimize residuals. Repeat iv and v until convergence.
vi) Repeat iii through v until convergence.
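Steps i through vi can be sketched for a single model variant as an alternating least-squares search. This is a simplified illustration under stated assumptions, not a definitive implementation: it covers only the normal (diploid) model, assumes the fetus inherits the first listed maternal allele at every locus, treats each locus as a four-bin signal vector in `ACGT` order, and uses unweighted squared residuals. All function and parameter names are hypothetical.

```python
BASES = "ACGT"


def unit(allele):
    return [1.0 if base == allele else 0.0 for base in BASES]


def add(u, v, scale=1.0):
    """Return u + scale * v, element by element."""
    return [a + scale * b for a, b in zip(u, v)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def rss(x, model):
    return sum((a - b) ** 2 for a, b in zip(x, model))


def fit_diploid_model(data, maternal, f_start=0.01, max_iter=100, tol=1e-9):
    """Fit f, the per-locus scales A_k and the paternal alleles.

    data: one 4-element signal vector per locus.
    maternal: one (m1, m2) pair per locus; m1 is assumed inherited.
    """
    mat = [add(unit(m1), unit(m2)) for m1, m2 in maternal]
    inherited = [unit(m1) for m1, _ in maternal]
    # i) With f = 0, solve for each A_k by least squares.
    scale = [dot(x, u) / dot(u, u) for x, u in zip(data, mat)]
    # ii) Start f at the smallest ratio considered detectable.
    f = f_start
    paternal = None
    for _ in range(max_iter):
        # iii) Pick, locus by locus, the paternal allele with least residual.
        chosen = []
        for x, u, h, a in zip(data, mat, inherited, scale):
            chosen.append(min(BASES, key=lambda p: rss(
                x, [a * t for t in add(u, add(h, unit(p)), f)])))
        fetal = [add(h, unit(p)) for h, p in zip(inherited, chosen)]
        for _ in range(max_iter):
            # iv) Paternal alleles fixed: least-squares update of f.
            num = sum(a * dot(v, add(x, u, -a))
                      for x, u, v, a in zip(data, mat, fetal, scale))
            den = sum(a * a * dot(v, v) for v, a in zip(fetal, scale))
            new_f = max(0.0, num / den)
            # v) f fixed: least-squares update of every A_k.
            models = [add(u, v, new_f) for u, v in zip(mat, fetal)]
            scale = [dot(x, m) / dot(m, m) for x, m in zip(data, models)]
            converged = abs(new_f - f) < tol
            f = new_f
            if converged:
                break
        # vi) Stop once the paternal allele calls no longer change.
        if chosen == paternal:
            break
        paternal = chosen
    return f, scale, paternal


# Hypothetical noise-free data generated with f = 0.2.
data = [[0.0, 2.2, 0.2, 0.0], [2.4, 0.0, 0.4, 2.0], [0.0, 0.0, 1.1, 0.1]]
maternal = [("C", "C"), ("A", "T"), ("G", "G")]
print(fit_diploid_model(data, maternal))
```

A fuller treatment would repeat the fit for every model variant, weight the residuals by their expected errors, and allow for two paternal alleles in the paternal-trisomy case.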
[0203] The best overall fit of model to data is selected from among
all the model variants. The best overall fit yields the values of f
and A.sub.k, which we will call f.sub.max and A.sub.kmax. The likelihood of
observing the data given f.sub.max can be compared to the
likelihood given f=0. The ratio is a measure of the amount of
evidence for fetal DNA. A typical threshold for declaring fetal DNA
would be a likelihood ratio of 1000 or more. The likelihood
calculation can be approximated by a more familiar Chi-squared
calculation involving the sum of squared residuals between the data
and the model, where each residual is normalized by the expected
rms error. This Chi-squared is a good approximation to the
Log(likelihood) to the extent the expected errors in the data are
Gaussian additive errors, or can be made so by some amplitude
transformation of the data.
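For Gaussian additive errors the log-likelihood equals minus one half of the Chi-squared plus a constant, so a likelihood ratio threshold translates into a threshold on a Chi-squared difference. The following minimal sketch shows that relationship; the function names are hypothetical and the inputs are assumed to be flat lists of data values, model values, and expected rms errors.

```python
import math


def chi_squared(data, model, rms_error):
    """Sum of squared residuals, each normalized by its expected rms error."""
    return sum(((d - m) / s) ** 2 for d, m, s in zip(data, model, rms_error))


def log_likelihood_ratio(chi2_reference, chi2_fit):
    """Log of L(fit) / L(reference) under Gaussian additive errors.

    With log L = -chi2 / 2 + constant, the constant cancels in the ratio.
    """
    return 0.5 * (chi2_reference - chi2_fit)


def fetal_dna_detected(chi2_f_zero, chi2_f_max, threshold=1000.0):
    """True when the likelihood ratio of f_max to f = 0 reaches the threshold."""
    return log_likelihood_ratio(chi2_f_zero, chi2_f_max) >= math.log(threshold)


# Hypothetical Chi-squared values for the f = 0 fit and the best fit.
print(fetal_dna_detected(20.0, 4.0))
```

A likelihood ratio of 1000 corresponds to a Chi-squared improvement of 2 ln(1000), or about 13.8.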
[0204] If based on the above determination of likelihood ratio it
is decided that fetal DNA is not present, then the test is declared
to be non-informative. If it is decided that fetal DNA is present,
then the likelihoods of the data given the different data model
types can be compared to declare aneuploidy. The likelihood ratios
of aneuploid models (Equations 2 and 3) to the normal model
(Equation 1) are calculated and these ratios are compared to a
predefined threshold. Typically this threshold would be set so that
in controlled tests all the trisomic cases would be declared
aneuploid, and so that it would be expected that the vast majority
(>99.9%) of all truly trisomic cases would be declared aneuploid
by the test. Given a limited patient cohort size for test
validation, one strategy to accomplish approximately the 99.9%
detection rate is to increase the likelihood ratio threshold beyond
that necessary to declare all the known trisomic cases in the
validation set by a factor of 1000/N, where N is the number of
trisomy cases in the validation set.
Error Modeling
[0205] The data contain small additive errors associated with the
readout technology, multiplicative errors associated with DNA
amplification and hybridization efficiencies being different from
locus to locus and from allele to allele within a locus, and errors
associated with imperfect specificity in the process. By including
the many parameters A.sub.k in the model, rather than a single
scale parameter, the residuals will include allele-to-allele
efficiency differences but not locus-to-locus differences. These
tend to be multiplicative errors in the resulting observed allele
strengths; e.g., two signals may be 20% different in
strength although the starting concentrations of the alleles were
identical. As a first approximation we can assume errors are random
from allele to allele, and have relatively small additive errors,
and larger Poisson and multiplicative error components. The
magnitudes of these error components can be estimated from repeated
processing of identical samples. The Chi-square residuals
calculation for any data-model fit then can be supported with these
modeled squared errors for any peak height or data bin.
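One way to combine the three error components into a modeled squared error is to add their variances. The sketch below assumes the components are independent and uses hypothetical parameter names (`additive_sd`, `poisson_gain`, `multiplicative_cv`); their values would be estimated from repeated processing of identical samples.

```python
def modeled_variance(signal, additive_sd, poisson_gain, multiplicative_cv):
    """Modeled squared error for one peak height or data bin.

    additive_sd: standard deviation of the readout noise.
    poisson_gain: variance per unit signal from counting statistics.
    multiplicative_cv: fractional error from efficiency differences.
    """
    return (additive_sd ** 2
            + poisson_gain * signal
            + (multiplicative_cv * signal) ** 2)


def normalized_squared_residual(observed, predicted, **error_parameters):
    """Squared residual divided by the modeled squared error."""
    return (observed - predicted) ** 2 / modeled_variance(
        predicted, **error_parameters)


# Hypothetical parameters: small additive noise, 20% multiplicative error.
print(modeled_variance(100.0, additive_sd=2.0, poisson_gain=1.0,
                       multiplicative_cv=0.2))
```

Summing `normalized_squared_residual` over all bins gives the Chi-square value for a data-model fit.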
[0206] Alternatively, when using a single A parameter, the
residuals will be larger and will contain a component which is
correlated between alleles at the same locus. Calculation of
likelihood will need to take this correlation into account.
Sequence Listing (9 sequences; each is DNA, Artificial Sequence)
SEQ ID NO: 1 (19 bases; synthetic primer): tcgagtgcat tccattccg
SEQ ID NO: 2 (21 bases; synthetic primer): atggaatggc atcaaacgga a
SEQ ID NO: 3 (15 bases; synthetic probe): tggctgtcca ttcca
SEQ ID NO: 4 (65 bases; synthetic oligonucleotide): atgcagcaag gcacagacta arcaaggaga sgcaaaattt tcrtagggga gagaaatggg tcatt
SEQ ID NO: 5 (22 bases; synthetic primer): atgcagcaag gcacagacta cg
SEQ ID NO: 6 (23 bases; synthetic primer): agaggggaga gaaatgggtc att
SEQ ID NO: 7 (25 bases; synthetic primer): caaggcacag actaagcaag gagag
SEQ ID NO: 8 (35 bases; synthetic primer): ggcaaaattt tcatagggga gagaaatggg tcatt
SEQ ID NO: 9 (59 bases; synthetic primer): cattagcggc cgcgaaatta atacgactca ctatagggag aaaaaaaaaa aaaaaaaab
* * * * *