U.S. patent application number 15/186774, for systems and methods for determining aneuploidy risk using sample fetal fraction, was filed with the patent office on 2016-06-20 and published on 2016-12-22.
This patent application is currently assigned to Natera, Inc. The applicant listed for this patent is Natera, Inc. Invention is credited to Zachary Demko, Susan J. Gross, Katie Kobara, Allison Ryan.
Publication Number | 20160371428 |
Application Number | 15/186774 |
Family ID | 57587067 |
Publication Date | 2016-12-22 |
United States Patent Application | 20160371428 |
Kind Code | A1 |
Ryan; Allison; et al. | December 22, 2016 |
SYSTEMS AND METHODS FOR DETERMINING ANEUPLOIDY RISK USING SAMPLE
FETAL FRACTION
Abstract
Disclosed herein are system, method, and computer program
product embodiments for determining aneuploidy risk in a target
sample of maternal blood or plasma based on the amount of fetal
DNA. An embodiment operates by receiving known genetic data from
known prenatal testing samples and genetic data for the target
sample. A fetal fraction distribution is determined for the known
genetic data based on gestational age and the maternal weight
associated with the target sample. A model is then generated based
on a fixed ratio reduction of the determined fetal fraction
distribution. A fetal fraction based data likelihood for the target
sample is then determined for each of the plurality of ploidy
states using the generated model. An aneuploidy risk score is then
outputted based on applying a Bayesian probability determination
that combines each fetal fraction based data likelihood with a
previously determined risk score as a conditional value.
Inventors: |
Ryan; Allison (Redwood City, CA); Kobara; Katie (San Francisco, CA);
Demko; Zachary (San Francisco, CA); Gross; Susan J. (Bronx, NY) |
Applicant: |
Name | City | State | Country | Type |
Natera, Inc. | San Carlos | CA | US | |
Assignee: | Natera, Inc., San Carlos, CA |
Family ID: | 57587067 |
Appl. No.: | 15/186774 |
Filed: | June 20, 2016 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
62182085 | Jun 19, 2015 | |
Current U.S. Class: | 1/1 |
Current CPC Class: | G16B 20/00 20190201; C12Q 1/6883 20130101; C12Q 1/6869 20130101 |
International Class: | G06F 19/18 20060101 G06F019/18; C12Q 1/68 20060101 C12Q001/68 |
Claims
1. A method for determining aneuploidy risk in a target sample,
comprising: receiving known genetic data from a plurality of known
noninvasive prenatal testing samples; receiving genetic data for
the target sample, the genetic data including a gestational age, a
maternal weight, and a fetal fraction associated with the target
sample; determining a fetal fraction distribution for the received
known genetic data based on the gestational age and the maternal
weight associated with the target sample; generating a model for a
plurality of ploidy states based on a fixed ratio reduction of the
determined fetal fraction distribution compared to an expected
average fetal fraction for the gestational age and the maternal
weight associated with the target sample; determining a fetal
fraction based data likelihood for the target sample for each of
the plurality of ploidy states using the generated model and the
fetal fraction associated with the target sample; applying a
Bayesian probability determination to combine each fetal fraction
based data likelihood with a previously determined risk score as a
conditional value; and outputting an aneuploidy risk score for the
target sample based on the applying.
2. The method of claim 1, wherein the previously determined risk
score is a SNP based risk score.
3. The method of claim 1, further comprising: transforming the
determined fetal fraction distribution to logarithm space, wherein
a logarithm of the fetal fraction is assumed Gaussian distributed
with a mean and standard deviation that are a function of
gestational age and maternal weight for the known prenatal testing
samples.
4. The method of claim 1, wherein determining a fetal fraction
based data likelihood for the target sample comprises computing an
integral of a probability density function of the generated
model.
5. The method of claim 1, wherein the generated model is associated
with trisomy 13.
6. The method of claim 1, wherein the generated model is associated
with trisomy 18.
7. The method of claim 1, wherein the generated model is associated
with maternal triploidy.
8. The method of claim 1, wherein determining a fetal fraction
distribution for the received known genetic data comprises:
grouping the genetic data for the plurality of known prenatal
testing samples into sets according to gestational age and maternal
weight; and generating a grid of distribution parameters
corresponding to each set, wherein the distribution parameters
include average fetal fraction and standard deviation.
9. A system for determining aneuploidy risk in a target sample,
comprising: means for receiving known genetic data from a plurality
of known noninvasive prenatal testing samples; means for receiving
genetic data for the target sample, the genetic data including a
gestational age, a maternal weight, and a fetal fraction associated
with the target sample; means for determining a fetal fraction
distribution for the received known genetic data based on the
gestational age and the maternal weight associated with the target
sample; means for generating a model for a plurality of ploidy
states based on a fixed ratio reduction of the determined fetal
fraction distribution compared to an expected average fetal
fraction for the gestational age and the maternal weight associated
with the target sample; means for determining a fetal fraction
based data likelihood for the target sample for each of the
plurality of ploidy states using the generated model and the fetal
fraction associated with the target sample; means for applying a
Bayesian probability determination to combine each fetal fraction
based data likelihood with a previously determined risk score as a
conditional value; and means for outputting an aneuploidy risk
score for the target sample based on the applying.
10. The method of claim 9, wherein the previously determined risk
score is a SNP based risk score.
11. The method of claim 9, further comprising: means for
transforming the determined fetal fraction distribution to
logarithm space, wherein a logarithm of the fetal fraction is
assumed Gaussian distributed with a mean and standard deviation
that are a function of gestational age and maternal weight for the
known prenatal testing samples.
12. The method of claim 9, wherein the means for determining a
fetal fraction based data likelihood for the target sample
comprises means for computing an integral of a probability density
function of the generated model.
13. The method of claim 9, wherein the generated model is
associated with trisomy 13.
14. The method of claim 9, wherein the generated model is
associated with trisomy 18.
15. The method of claim 9, wherein the generated model is
associated with maternal triploidy.
16. The method of claim 9, wherein the means for determining a
fetal fraction distribution for the received known genetic data
comprises: means for grouping the genetic data for the plurality of
known prenatal testing samples into sets according to gestational
age and maternal weight; and means for generating a grid of
distribution parameters corresponding to each set, wherein the
distribution parameters include average fetal fraction and standard
deviation.
17. A system for determining aneuploidy risk in a target sample,
comprising: a known testing samples database containing known
genetic data from a plurality of known noninvasive prenatal testing
samples; a target sample database containing genetic data for at
least the target sample, the genetic data including a gestational
age, a maternal weight, and a fetal fraction associated with the
target sample; an aneuploidy risk analysis system in communication
with the known testing samples database and the target sample
database, the aneuploidy risk analysis system comprises: a logical
element configured to determine a fetal fraction distribution for
the received known genetic data based on the gestational age and
the maternal weight associated with the target sample; a modeling
engine configured to generate a model for a plurality of ploidy
states based on a fixed ratio reduction of the determined fetal
fraction distribution compared to an expected average fetal
fraction for the gestational age and the maternal weight associated
with the target sample; and a probability engine configured to
determine a fetal fraction based data likelihood for the target
sample for each of the plurality of ploidy states using the
generated model and the fetal fraction associated with the target
sample, apply a Bayesian probability determination to combine each
fetal fraction based data likelihood with a previously determined
risk score as a conditional value, and output an aneuploidy risk
score for the target sample based on the Bayesian probability
determination.
18. The system of claim 17, wherein the previously determined risk
score is a SNP based risk score.
19. The system of claim 17, wherein the modeling engine is further
configured to transform the determined fetal fraction distribution
to logarithm space, wherein a logarithm of the fetal fraction is
assumed Gaussian distributed with a mean and standard deviation
that are a function of gestational age and maternal weight for the
known prenatal testing samples.
20. The system of claim 17, wherein the logic element is configured
to determine a fetal fraction based data likelihood for the target
sample by computing an integral of a probability density function
of the generated model.
21. The system of claim 17, wherein the generated model is
associated with trisomy 13.
22. The system of claim 17, wherein the generated model is
associated with trisomy 18.
23. The system of claim 17, wherein the generated model is
associated with maternal triploidy. The system of claim 17, wherein
the probability engine is configured to determine a fetal fraction
distribution for the received known genetic data by grouping the
genetic data for the plurality of known prenatal testing samples
into sets according to gestational age and maternal weight, and
generating a grid of distribution parameters corresponding to each
set, wherein the distribution parameters include average fetal
fraction and standard deviation.
24. The system of claim 17, further comprising a DNA sequencer in
communication with the target sample database and configured to
supply genetic data about the target sample.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of U.S. Provisional
Patent Application No. 62/182,085, filed Jun. 19, 2015, which is
hereby incorporated by reference herein in its entirety.
FIELD OF THE INVENTION
[0002] The present invention generally relates to molecular biology
methods and systems, and more specifically to methods and systems
for determining aneuploidy risk in a target maternal blood
sample.
BACKGROUND
[0003] Noninvasive prenatal testing using cell-free DNA (cfDNA) can
be used to detect abnormalities in a fetus. As a result,
noninvasive prenatal testing is rapidly becoming part of clinical
care for pregnant women.
[0004] Noninvasive prenatal testing is used to determine the
genetic state of a fetus from genetic material that is obtained in
a noninvasive manner, for example from a blood draw on the pregnant
mother. The blood can be separated and the plasma isolated, and
size selection can optionally be used to isolate the DNA of the
appropriate length. This isolated DNA can then be measured by a
number of means, such as by hybridizing to a genotyping array and
measuring the fluorescence, or by sequencing on a high throughput
sequencer.
[0005] Single Nucleotide Polymorphism (SNP) based noninvasive
prenatal testing is one type of noninvasive prenatal test. SNP
based noninvasive prenatal testing is often used to screen for
fetal aneuploidies. But the accuracy of SNP based tests is
dependent on the amount of fetal DNA present in a maternal blood or
plasma sample. SNP based testing returns no call results when the
amount of fetal DNA is not sufficient to provide the desired
accuracy.
[0006] Low amounts of fetal DNA may be caused by a number of
factors. One common factor is maternal weight. For example, as
maternal weight increases, the amount of fetal DNA in maternal
blood, plasma, or other fluids often decreases. Thus, SNP based
noninvasive prenatal tests that screen for fetal aneuploidies are
sometimes unavailable to pregnant women.
BRIEF DESCRIPTION OF THE DRAWINGS
[0007] The presently disclosed embodiments will be further
explained with reference to the attached drawings, wherein like
structures are referred to by like numerals throughout the several
views. The drawings shown are not necessarily to scale, with
emphasis instead generally being placed upon illustrating the
principles of the presently disclosed embodiments.
[0008] FIG. 1 illustrates a fetal fraction distribution, according
to an example embodiment.
[0009] FIG. 2 illustrates a log normal fetal fraction distribution,
according to an example embodiment.
[0010] FIG. 3A illustrates a generated model for 19 weeks
gestational age, according to an example embodiment.
[0011] FIG. 3B illustrates a generated model for 13 weeks
gestational age, according to an example embodiment.
[0012] FIG. 4 is a flow chart of a method according to one
embodiment of the invention.
[0013] FIG. 5 illustrates an example computer system for performing
embodiments of the present invention.
[0014] FIG. 6 illustrates an example system for performing
embodiments of the present invention.
[0015] FIG. 7 illustrates a posterior fetal fraction risk
distribution, according to an example embodiment.
[0016] FIG. 8 illustrates a result set for an example embodiment
for fetal fraction-based high risk assessment that predicts an
aneuploidy in cases with low fetal fraction.
[0017] FIG. 9 illustrates a redraw success rate distribution,
according to an example embodiment.
[0018] FIG. 10 illustrates a distribution of fetal fraction based
risk scores in cases identified as high risk and low fetal
fraction, according to an example embodiment.
[0019] FIG. 11A illustrates an estimated detection rate for trisomy
13 and 18, according to an example embodiment.
[0020] FIG. 11B illustrates an estimated detection rate for digynic
triploidy, according to an example embodiment.
[0021] FIG. 12 illustrates a probability density function (PDF) of
normalized euploid data, according to an example embodiment.
[0022] FIG. 13 illustrates a cumulative distribution function (CDF)
of normalized euploid data, according to an example embodiment.
[0023] FIG. 14 illustrates a plot of redraw success rate, according
to an example embodiment.
[0024] FIG. 15 illustrates a result set for identified high risk
samples, according to an example embodiment.
[0025] While the above-identified drawings set forth presently
disclosed embodiments, other embodiments are also contemplated, as
noted in the discussion. This disclosure presents illustrative
embodiments by way of representation and not limitation. Numerous
other modifications and embodiments can be devised by those skilled
in the art which fall within the scope and spirit of the principles
of the presently disclosed embodiments.
DETAILED DESCRIPTION
[0026] Provided herein are system, method and/or computer program
product embodiments, and/or combinations and sub-combinations
thereof, for determining aneuploidy risk in a target sample of
maternal blood, plasma, or other fluid based on the amount of fetal
DNA. Such embodiments may be used in situations where a low or
extremely low fetal fraction renders traditional aneuploidy risk
methodologies inconclusive or inaccurate. For example, such
embodiments may be used to determine the risk of trisomy 13,
trisomy 18, or maternal triploidy, which are all aneuploidies
associated with a low or extremely low fetal fraction. An
embodiment operates by receiving genetic data for a target sample
(sample of interest) of maternal blood, plasma, or other fluid, and
known genetic data from a plurality of known noninvasive prenatal
testing samples. A fetal fraction distribution is determined for
the received known genetic data based on gestational age and the
maternal weight associated with the target sample. A model for a
plurality of ploidy states is then generated based on a fixed ratio
reduction of the determined fetal fraction distribution. A fetal
fraction based data likelihood for the target sample is then
determined for each of the plurality of ploidy states using the
generated model and the fetal fraction associated with the target
sample. An aneuploidy risk score is then output for the target
sample based on applying a Bayesian probability determination that
combines each fetal fraction based data likelihood with a
previously determined risk score as a conditional value.
Accordingly, aneuploidy risk can be determined in a target sample
even when the sample contains a low amount of fetal DNA. This
allows aneuploidy risk to be determined even when SNP based
noninvasive prenatal testing would be unreliable or
unavailable.
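The Bayesian combination step summarized above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the function name `combine_risk`, the hypothesis labels, and the idea that the previously determined risk score can be expressed as a prior probability for each hypothesis are all hypothetical, and the numbers are illustrative only.

```python
def combine_risk(prior, likelihood):
    """Combine prior risk with fetal fraction based likelihoods (Bayes' rule).

    prior:      dict mapping hypothesis -> prior probability (assumed here to
                be derived from a previously determined risk score).
    likelihood: dict mapping hypothesis -> data likelihood of the observed
                fetal fraction under that hypothesis.
    Returns a dict of posterior probabilities that sums to 1.
    """
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    if total <= 0:
        raise ValueError("priors and likelihoods give zero total probability")
    return {h: value / total for h, value in unnormalized.items()}


# Illustrative numbers only; they are not taken from the application.
prior = {"euploid": 0.98, "trisomy": 0.01, "triploidy": 0.01}
likelihood = {"euploid": 0.2, "trisomy": 0.5, "triploidy": 3.0}
posterior = combine_risk(prior, likelihood)
```

In this sketch a hypothesis whose likelihood is high relative to the others gains posterior probability even when its prior is small, which is the behavior the paragraph describes for low fetal fraction samples.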
[0027] The term "obtaining genetic data" as used herein refers,
unless indicated otherwise by context, to both (1) acquiring DNA
sequence information by laboratory techniques, e.g., use of an
automated high throughput DNA sequencer, and (2) acquiring
information that had been previously obtained by laboratory
techniques, wherein the information is electronically transmitted
to an analyzer, e.g., by computer over the Internet, by electronic
transfer from the sequencing device, etc.
[0028] The term "aneuploidy" refers to the state where the wrong
number of chromosomes is present in a cell. In the case of a
somatic human cell it refers to the case where a cell does not
contain 22 pairs of autosomal chromosomes and one pair of sex
chromosomes. In the case of a human gamete, it refers to the case
where a cell does not contain one of each of the 23 chromosomes. In
the case of a single chromosome, it refers to the case where more
or fewer than two homologous but nonidentical chromosomes are
present, and where each of the two chromosomes originates from a
different parent.
[0029] The term "ploidy state" refers to the quantity and
chromosomal identity of one or more chromosomes in a cell.
[0030] Certain aneuploidies are often associated with a reduced
amount of fetal DNA in the target sample. For example, trisomy 13,
trisomy 18, and maternal triploidy are often associated with a
reduced amount of fetal DNA in the target sample. Embodiments of
the invention determine aneuploidy risk in a target maternal sample
based on a relationship between the amount of fetal DNA in the
target sample and the presence of certain aneuploidies.
[0031] FIG. 6 illustrates a system for performing embodiments of
the present invention.
[0032] FIG. 6 includes an analysis system 602 for determining a
risk of fetal aneuploidy. Analysis system 602 may include one or
more processors for executing the functions described herein. Such
functions may be implemented on the processor as engines or logical
elements that perform the analytical functionality described
herein, such as a modeling engine and a probability engine.
Interaction of a user with such analytical engines may be conducted
through an appropriate user interface. Analysis system 602 may be
coupled to a database of known samples 604 via, for example, a
network 610. Network 610 may be any type of communication network,
including intranets, local area networks, or wide area networks
such as the Internet. Genetic data from samples with a known ploidy
state may be used to form a baseline for comparison with a target
sample in question, as discussed further below. Database 604 may be
a collection of data from a variety of sources including clinical
studies and commercial data sets.
[0033] In an embodiment, a fetal fraction distribution is defined
for such known genetic data from the plurality of known prenatal
testing samples by analysis system 602. The fetal fraction
distribution may be based on the maternal weight and the
gestational age corresponding to each sample. This is because
gestational age and maternal weight are often factors for the
amount of fetal DNA present in a maternal blood sample. A fetal
fraction distribution may be defined for known genetic data from a
plurality of known noninvasive prenatal testing samples.
[0034] The plurality of known prenatal testing samples may be
selected based on various criteria to ensure an accurate and
representative fetal fraction distribution. In an embodiment, known
genetic data for a known prenatal testing sample may be selected or
filtered for inclusion in the fetal fraction distribution based on
an associated low aneuploidy risk result, a no call result due to
low fetal fraction, and a low confidence result. Known genetic data
for a known prenatal testing sample may also be selected based on
whether the maternal weight associated with the sample is available
or whether the sample was collected in a clinical trial in the
United States or a foreign country. A selection based on country of
origin may be done to prevent unit conversion uncertainty in
maternal weight for the sample.
[0035] In an embodiment, known genetic data for a plurality of
known prenatal testing samples may be grouped into sets according
to gestational age and maternal weight. In an embodiment, known
genetic data for the plurality of known prenatal testing samples
may include sample data taken at a gestational age ranging from 9
to 20 weeks at one week increments. Known genetic data for the
plurality of known prenatal testing samples may also include sample
data corresponding to a maternal weight ranging from 110 to 250
pounds at 20 pound increments. In an embodiment, sampling of the
known genetic data from known prenatal testing samples may be
accurate to within plus or minus ten days of gestational age and
plus or minus five pounds of maternal weight.
[0036] In an embodiment, the average fetal fraction is computed for
the known genetic data in each set of known prenatal testing
samples. The standard deviation may also be computed. In an
embodiment, the average fetal fraction and standard deviation are
computed only for sets of known prenatal testing samples containing
at least 50 samples. This may be done to ensure an accurate and
representative fetal fraction distribution. The result is a grid of
distribution parameters (e.g. average fetal fractions and standard
deviations) that correspond to the grid of sample conditions.
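The grouping and grid construction described above can be sketched in Python as follows. This is an illustrative reading of the text, not the applicant's code: the function name `build_grid`, the tuple layout of each sample, and the use of the sample standard deviation are assumptions. The grid spacing, the tolerances, and the 50 sample minimum are taken from the paragraphs above.

```python
import statistics

GEST_WEEKS = range(9, 21)         # 9 to 20 weeks, one week increments
WEIGHTS_LB = range(110, 251, 20)  # 110 to 250 pounds, 20 pound increments
MIN_SAMPLES = 50                  # minimum set size stated in the text


def build_grid(samples, age_tol_days=10, weight_tol_lb=5):
    """Return {(week, weight_lb): (mean_ff, sd_ff)} for well-populated sets.

    samples: list of (gestational_age_days, weight_lb, fetal_fraction)
             tuples (hypothetical layout).
    A sample joins a set when it lies within the stated tolerances of the
    grid point; sets with fewer than MIN_SAMPLES samples are skipped.
    """
    grid = {}
    for week in GEST_WEEKS:
        for weight in WEIGHTS_LB:
            ffs = [ff for age, w, ff in samples
                   if abs(age - 7 * week) <= age_tol_days
                   and abs(w - weight) <= weight_tol_lb]
            if len(ffs) >= MIN_SAMPLES:
                grid[(week, weight)] = (statistics.mean(ffs),
                                        statistics.stdev(ffs))
    return grid


# Synthetic demonstration data (not from the application): 60 samples at
# 70 days (10 weeks) and 230 lb, alternating fetal fractions 0.05 and 0.07.
demo = [(70, 230, 0.05 if i % 2 == 0 else 0.07) for i in range(60)]
demo_grid = build_grid(demo)
```

Because the stated age tolerance (ten days) is wider than the one week grid spacing, a sample can contribute to neighboring gestational age sets in this sketch.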
[0037] FIG. 1 illustrates an example fetal fraction distribution
based on known genetic data from a plurality of known prenatal
testing samples grouped according to gestational age and maternal
weight, according to an example embodiment. In the example of FIG.
1, a set of known prenatal testing samples is grouped together
based on the gestational ages of 9 weeks, 12 weeks, and 18 weeks.
Moreover, each set of known prenatal testing samples is further
grouped together based on maternal weight. The average fetal
fraction is computed for the known genetic data of each resulting
set of known prenatal testing samples.
[0038] For example, in FIG. 1, the average fetal fraction of
prenatal testing samples with a maternal weight of 200 lbs. and a
gestational age of 9 weeks is around 0.06. The average fetal
fraction of prenatal testing samples with a maternal weight of 200
lbs. and a gestational age of 12 weeks is around 0.07. The average
fetal fraction of prenatal testing samples with a maternal weight
of 200 lbs. and a gestational age of 18 weeks is around 0.08.
[0039] Given a particular gestational age and maternal weight, a
fetal fraction distribution may become more symmetric when
transformed to log space. Therefore, modeling of fetal fraction may
be conducted in log space.
[0040] In an embodiment, the fetal fraction distribution may be
transformed to a log-normal distribution. In other words, the fetal
fraction distribution may be transformed to a continuous probability
distribution of the fetal fraction whose logarithm is normally
distributed. Specifically, the logarithm of the fetal fraction is
assumed Gaussian distributed with a mean and standard deviation
that are a function of gestational age and maternal weight for the
known genetic data of the known prenatal testing samples.
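As a minimal sketch of the log space transformation, the following Python estimates the Gaussian parameters of the log fetal fraction for one set of samples and evaluates the corresponding log-normal density. The function names are hypothetical, and the use of the natural logarithm is an assumption, since the text does not state a logarithm base.

```python
import math
import statistics


def fit_log_normal(fetal_fractions):
    """Estimate (m, s): mean and standard deviation of log(fetal fraction)."""
    logs = [math.log(ff) for ff in fetal_fractions]
    return statistics.mean(logs), statistics.stdev(logs)


def log_normal_pdf(ff, m, s):
    """Density of fetal fraction ff when log(ff) is Gaussian with (m, s)."""
    z = (math.log(ff) - m) / s
    return math.exp(-0.5 * z * z) / (ff * s * math.sqrt(2.0 * math.pi))


# Illustrative values only: the geometric mean of these three is 0.10.
m, s = fit_log_normal([0.05, 0.10, 0.20])
```

In practice one pair (m, s) would be estimated for each gestational age and maternal weight set, so that both parameters become functions of those two covariates.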
[0041] FIG. 2 illustrates an example log normal fetal fraction
distribution based on the transformation of a fetal fraction
distribution to log space, according to an example embodiment. In
the example of FIG. 2, the logarithm of the fetal fraction is
assumed Gaussian distributed with a mean and standard deviation
that are a function of gestational age and maternal weight for the
known prenatal testing samples.
[0042] In the example of FIG. 2, for known genetic data for around
800 known prenatal testing samples, the gestational age is 10 weeks
plus or minus 10 days and the maternal weight is 230 pounds plus or
minus 5 pounds. Thus, in FIG. 2, the log normal fetal fraction
distribution represents a probability density function (PDF) that
describes the relative likelihood for fetal fraction to take on a
given value where the gestational age is around 10 weeks plus or
minus 10 days and the maternal weight is 230 pounds plus or minus 5
pounds.
[0043] In an embodiment, the probability of having an aneuploidy
can be computed from the log normal fetal fraction distribution.
Specifically, the probability of having an aneuploidy can be
computed as the integral of the PDF over a defined range.
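The integral of the PDF over a range can be evaluated in closed form through the Gaussian cumulative distribution function. The sketch below assumes natural logarithms and uses hypothetical function names and illustrative parameter values; it shows only the integration step, not how the integration range would be chosen.

```python
import math


def normal_cdf(x, m, s):
    """Cumulative distribution function of a Gaussian with mean m, sd s."""
    return 0.5 * (1.0 + math.erf((x - m) / (s * math.sqrt(2.0))))


def prob_ff_between(lo, hi, m, s):
    """P(lo <= fetal fraction <= hi) when log(fetal fraction) ~ N(m, s)."""
    return normal_cdf(math.log(hi), m, s) - normal_cdf(math.log(lo), m, s)


# Illustrative parameters: median fetal fraction 0.08, log space sd 0.4.
m, s = math.log(0.08), 0.4
p_one_sigma = prob_ff_between(math.exp(m - s), math.exp(m + s), m, s)
```

Integrating from one standard deviation below the log space mean to one above recovers the familiar Gaussian mass of roughly 68%, which is a convenient sanity check on the implementation.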
[0044] In an embodiment, the effect of an aneuploidy may be modeled
as a fixed rate reduction in the average fetal fraction compared to
the expected average fetal fraction for a given maternal weight and
gestational age. For example, the average fetal fraction of a
trisomy 13 pregnancy may be 80% of the average fetal fraction for a
euploid pregnancy of the same maternal weight and gestational age.
Trisomy 13, trisomy 18, and maternal triploidy may be modeled using
a fixed rate reduction in the average fetal fraction. As would be
appreciated by a person of ordinary skill in the art, the effect of
an aneuploidy may be modeled according to various other reductions
in the average fetal fraction compared to the expected average
fetal fraction for a given maternal weight and gestational age.
[0045] In an embodiment, a model may be generated for a plurality
of ploidy states based on the fixed ratio reduction of the fetal
fraction distribution. A ploidy state may be referred to as a
hypothesis.
[0046] A fetal fraction distribution may be transformed to a
log-normal distribution of fetal fraction prior to generation of a
model. In an embodiment, a model may be generated for three
hypotheses: trisomy 13, trisomy 18, and maternal triploidy.
[0047] In an embodiment for a log-normal distribution of fetal
fraction, a fixed rate reduction in the average fetal fraction
corresponds to a constant subtracted offset. Thus, for a pregnancy
with a particular gestational age and maternal weight, the log
fetal fraction for euploid prenatal testing samples is Gaussian
distributed with a mean m and a standard deviation s, but the log
fetal fraction for prenatal testing samples with an aneuploidy is
Gaussian distributed with a mean m-c and a standard deviation s,
where c is a constant subtracted offset for a given aneuploidy; a
constant offset in log space shifts the mean but leaves the
standard deviation unchanged. As
would be appreciated by a person of ordinary skill in the art, a
constant subtracted offset for a given aneuploidy may be determined
by an analysis of empirical data.
[0048] In an embodiment, there may be a single constant subtracted
offset for trisomies 13 and 18 and a different offset for maternal
triploidy. In an embodiment, the constant subtracted offset for
trisomies 13 and 18 is log(0.79). In other words, in this example,
the means of the trisomy 13 and 18 hypothesis distributions are
reduced by log(0.79).
[0049] In an embodiment, the constant subtracted offset for
maternal triploidy is log(0.22). In other words, in this example,
the mean of the maternal triploidy hypothesis distribution is
reduced by log(0.22).
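The hypothesis distributions can be sketched in Python as follows. The sketch assumes natural logarithms and assumes that the intended effect is to scale the average fetal fraction to 79% (trisomies 13 and 18) or 22% (maternal triploidy) of the euploid value, so that each log space mean becomes m + log(ratio), which is lower than m because log(ratio) is negative. The names `RATIO` and `hypothesis_params` are hypothetical.

```python
import math

# Fixed ratios implied by the offsets log(0.79) and log(0.22) in the text.
RATIO = {
    "trisomy 13": 0.79,
    "trisomy 18": 0.79,
    "maternal triploidy": 0.22,
}


def hypothesis_params(m, s):
    """Return {hypothesis: (mean, sd)} in log space.

    m, s: euploid log space mean and standard deviation for one
          gestational age and maternal weight.
    Only the mean is shifted; the standard deviation is unchanged.
    """
    return {h: (m + math.log(r), s) for h, r in RATIO.items()}


# Illustrative euploid parameters: median fetal fraction 0.10, sd 0.4.
params = hypothesis_params(math.log(0.10), 0.4)
```

With these illustrative euploid parameters, the median fetal fraction under the trisomy hypotheses is 0.079 and under maternal triploidy is 0.022.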
[0050] Returning to FIG. 6, analysis system 602 may also be coupled
to a database 606 containing genetic data for a target sample,
either directly or over network 610. Genetic data about the target
sample, stored in database 606, may have been obtained from, for
example, a sequencer 608. The target sample is one for which a
fetal aneuploidy risk is to be determined. While the examples
herein will refer to maternal blood, one of skill in the art will
recognize that the target sample may be, for example, a maternal
blood or plasma sample containing both maternal DNA and fetal DNA. Such
DNA may be, for example, cell-free DNA. As would be appreciated by
a person of ordinary skill in the art, a target maternal blood
sample that contains fetal DNA may be obtained using various
methods.
[0051] In some embodiments of the invention, the obtained prenatal
target sample is modified using standard molecular biology
techniques in order to be sequenced on a DNA sequencer, such as
sequencer 608. In some embodiments, the technique will involve
forming a genetic library containing priming sites for the DNA
sequencing procedure. A plurality of loci may be targeted for site
specific amplification. In some embodiments the targeted loci are
polymorphic loci, e.g., single nucleotide polymorphisms. In
embodiments employing the formation of genetic libraries, libraries
may be encoded using a DNA sequence that is specific for the
patient, e.g. barcoding, thereby permitting multiple patients to be
analyzed in a single flow cell (or flow cell equivalent) of a high
throughput DNA sequencer. Although the samples are mixed together
in the DNA sequencer flow cell, the determination of the sequence
of the barcode permits identification of the patient source that
contributed the DNA that had been sequenced.
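The barcode based assignment of reads to patients can be illustrated with a short Python sketch. The read layout is an assumption made only for illustration: each read is taken to be a string whose first `barcode_len` bases are the patient barcode, and the function name `demultiplex` and the sample sequences are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_patient, barcode_len):
    """Group pooled reads by patient using a leading barcode.

    reads:              list of read strings from one pooled run.
    barcode_to_patient: dict mapping barcode sequence -> patient identifier.
    barcode_len:        number of leading bases that form the barcode.
    Reads whose barcode is not recognized are discarded.
    """
    by_patient = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_len], read[barcode_len:]
        patient = barcode_to_patient.get(barcode)
        if patient is not None:
            by_patient[patient].append(insert)
    return dict(by_patient)


# Made-up sequences for demonstration only.
pooled = ["ACGTTTGA", "TGCAGGCC", "NNNNAAAA"]
barcodes = {"ACGT": "patient_1", "TGCA": "patient_2"}
assigned = demultiplex(pooled, barcodes, 4)
```

Real pipelines typically also tolerate sequencing errors within the barcode; this sketch requires an exact match.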
[0052] Methods are known in the art for obtaining genetic data from
a sample. Typically this involves amplification of DNA in the
sample, a process which transforms a small amount of genetic
material to a larger amount of genetic material that contains a
similar set of genetic data. This can be done by a wide variety of
methods, including, but not limited to, Polymerase Chain Reaction
(PCR), ligation-mediated PCR, degenerate oligonucleotide primed
PCR, Multiple Displacement Amplification, allele-specific
amplification techniques, Molecular Inversion Probes (MIP), padlock
probes, other circularizing probes, and combinations thereof. Many
variants of the standard protocol can be used, for example
increasing or decreasing the times of certain steps in the
protocol, increasing or decreasing the temperature of certain
steps, increasing or decreasing the amounts of various reagents,
etc. The DNA amplification transforms the initial sample of DNA
into a sample of DNA that is similar in the set of sequences, but
of much greater quantity. In some cases, amplification may not be
required.
[0053] The genetic data of the target sample can be transformed
from a molecular state to an electronic state by measuring the
appropriate genetic material using tools and/or techniques taken
from a group including, but not limited to: genotyping microarrays,
and high throughput sequencing. Some high throughput sequencing
methods and systems include Sanger DNA sequencing, pyrosequencing,
the ILLUMINA SOLEXA platform, ILLUMINA's GENOME ANALYZER,
ILLUMINA's HISEQ or MISEQ, APPLIED BIOSYSTEM's SOLiD platform, ION
TORRENT'S PGM or PROTON platforms, HELICOS's TRUE SINGLE MOLECULE
SEQUENCING platform, HALCYON MOLECULAR's electron microscope
sequencing method, or any other sequencing method. All of these
methods physically transform the genetic data stored in a sample of
DNA into a set of genetic data that is typically stored in a memory
device en route to being processed.
[0054] In an embodiment, a fetal fraction based data likelihood for
a target sample may be computed by analysis system 602 for each
ploidy state (e.g., trisomy 13, trisomy 18, and maternal triploidy)
using the generated model and the fetal fraction associated with
the target sample, where each ploidy state corresponds to a
hypothesis. Specifically, a fetal fraction based data likelihood
for a target sample may be computed for each hypothesis (e.g.,
trisomy 13, trisomy 18, maternal triploidy, etc.) by evaluating the
Gaussian probability density function at the observed log value of
the fetal fraction associated with the target sample under each of
the three hypotheses.
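By way of illustration only, the likelihood evaluation described
above may be sketched as follows. The sketch assumes that the "log
value" is the natural logarithm and that each hypothesis is
described by a mean and standard deviation of log fetal fraction.
The function names and the numeric parameters are hypothetical and
are not taken from this disclosure.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and SD at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fetal_fraction_likelihoods(fetal_fraction, model):
    """Evaluate each hypothesis's Gaussian PDF at log(fetal_fraction).

    `model` maps a hypothesis name to (mean, sd) of log fetal fraction.
    """
    log_ff = math.log(fetal_fraction)  # natural log is an assumption
    return {name: gaussian_pdf(log_ff, mean, sd)
            for name, (mean, sd) in model.items()}


# Hypothetical model parameters, for illustration only.
example_model = {
    "trisomy 13": (math.log(0.07), 0.4),
    "trisomy 18": (math.log(0.07), 0.4),
    "maternal triploidy": (math.log(0.04), 0.4),
}

likelihoods = fetal_fraction_likelihoods(0.05, example_model)
for name, value in likelihoods.items():
    print(f"{name}: {value:.3f}")
```

Each returned value is a probability density rather than a
probability, so the values are meaningful relative to one another
and need not sum to one.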
[0055] FIG. 3A illustrates an example of a generated model for
trisomy 13, trisomy 18, and maternal triploidy based on a fixed
ratio reduction of a determined fetal fraction distribution,
according to an embodiment. Specifically, FIG. 3A illustrates an
example of a generated model for trisomy 13, trisomy 18, and
maternal triploidy where the gestational age is 19 weeks and the
maternal weight is 166 pounds. Thus, in FIG. 3A, a fetal fraction
based data likelihood for a target sample with a gestational age of
19 weeks and a maternal weight of 166 pounds may be computed for
trisomy 13, trisomy 18, and maternal triploidy by evaluating the
respective Gaussian probability density function at the observed
log value of the fetal fraction associated with the target
sample.
[0056] For example, in FIG. 3A, the fetal fraction based data
likelihood of trisomy 13 or trisomy 18 for a target sample with a
fetal fraction of 0.10, a maternal weight of 166 pounds, and a
gestational age of 19 weeks is around 35%. Similarly, the fetal
fraction based data likelihood of trisomy 13 or trisomy 18 for a
target sample with a fetal fraction of 0.20, a maternal weight of
166 pounds, and a gestational age of 19 weeks is around 10%.
[0057] FIG. 3B illustrates an example of a generated model for
trisomy 13, trisomy 18, and maternal triploidy based on a fixed
ratio reduction of a determined fetal fraction distribution,
according to an embodiment. Specifically, FIG. 3B illustrates an
example of a generated model for trisomy 13, trisomy 18, and
maternal triploidy where the gestational age is 13 weeks and the
maternal weight is 166 pounds. Thus, in FIG. 3B, a fetal fraction
based data likelihood for a target sample with a gestational age of
13 weeks and a maternal weight of 166 pounds may be computed for
trisomy 13, trisomy 18, and maternal triploidy by evaluating the
respective Gaussian probability density function at the observed
log value of the fetal fraction associated with the target
sample.
[0058] By determining fetal fraction based data likelihoods for
different ploidy states using a generated model for a target
sample, an aneuploidy risk score for the fetus associated with the
target sample may be determined. Specifically, in an embodiment,
each fetal fraction based data likelihood can be combined with a
previously determined risk score in order to determine the
aneuploidy risk score for the fetus associated with the target
sample. A previously determined risk score may be, for example, an
age based prior risk score for the mother associated with the
target sample. In another example, a previously determined risk
score may be a SNP-based prior risk score. As would be appreciated
by a person of ordinary skill in the art, a previously determined
risk score may be based on other prior risk factors, including a
combination of prior risk factors.
[0059] In an embodiment, an aneuploidy risk score for the fetus
associated with the target sample may be determined based on the
posterior probability of the presence of any of trisomy 13, trisomy
18, and maternal triploidy. Specifically, the fetal fraction based
data likelihoods may be combined with previously determined risk
scores for trisomy 13, trisomy 18, and maternal triploidy using
Bayes' theorem to determine an aneuploidy risk score for the fetus
associated with the target sample. In an embodiment, the previously
determined risk scores for trisomy 13 and trisomy 18 depend on
maternal age and gestational age and may be determined empirically.
In an embodiment, the previously determined risk score for maternal
triploidy is 1/5505.
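A minimal sketch of such a Bayesian combination is given below for
illustration. It assumes that the hypotheses are mutually exclusive
and that a euploid hypothesis carries the remaining prior
probability. Only the 1/5505 prior for maternal triploidy comes from
this description; the other priors, the likelihood values, and the
function name are hypothetical.

```python
def posterior_risks(priors, likelihoods):
    """Apply Bayes' theorem over a set of mutually exclusive hypotheses."""
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    evidence = sum(joint.values())
    return {h: joint[h] / evidence for h in joint}


priors = {
    "trisomy 13": 1 / 4000,          # hypothetical age based prior
    "trisomy 18": 1 / 2000,          # hypothetical age based prior
    "maternal triploidy": 1 / 5505,  # prior stated in the description
}
# Assumption: the euploid hypothesis takes the remaining prior mass.
priors["euploid"] = 1.0 - sum(priors.values())

# Hypothetical fetal fraction based data likelihoods (density values).
likelihoods = {
    "trisomy 13": 0.6,
    "trisomy 18": 0.6,
    "maternal triploidy": 0.9,
    "euploid": 0.05,
}

posteriors = posterior_risks(priors, likelihoods)
for hypothesis, risk in posteriors.items():
    print(f"{hypothesis}: {risk:.5f}")
```

Because the likelihoods enter only through their ratios, any common
scale factor in the density values cancels in the normalization.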
[0060] FIG. 4 is a flowchart of a method 400 for determining
aneuploidy risk in a target maternal blood sample, according to an
embodiment. Method 400 can be performed by processing logic that
can comprise hardware (e.g., circuitry, dedicated logic,
programmable logic, microcode, etc.), software (e.g., instructions
run on a processing device), or a combination thereof. Such
processing logic may be implemented in, for example, analysis
system 602.
[0061] In step 402 of FIG. 4, known genetic data from a plurality
of known noninvasive prenatal testing samples is received. As would
be appreciated by a person of ordinary skill in the art, the known
genetic data from the plurality of known prenatal testing samples
may be received from a variety of sources including clinical
studies and commercial data sets. Moreover, as would be appreciated
by a person of ordinary skill in the art, a fetal fraction
distribution may be defined for known genetic data from a plurality
of noninvasive known prenatal testing samples, a plurality of
invasive known prenatal testing samples, or a combination of
both.
[0062] The received known genetic data from the plurality of known
prenatal testing samples may be optionally filtered based on
various criteria to ensure that an accurate and representative
fetal fraction distribution is determined in step 406. In an
embodiment, known genetic data for the known prenatal testing
samples may be filtered based on an associated low aneuploidy risk
result, a no call result due to low fetal fraction, and a low
confidence result. The received known genetic data for the known
prenatal testing samples may also be filtered based on whether the
maternal weight associated with a sample is available or whether a
sample was collected in a clinic in the United States or a
foreign country. The filtering based on country of origin may be
done to prevent unit conversion uncertainty in maternal weight for
a sample.
[0063] In step 404 of FIG. 4, genetic data for a target maternal
blood sample containing fetal DNA is received. The genetic data
includes at least a gestational age of the associated fetus, a
maternal weight, and a fetal DNA fraction of the target sample. As
would be appreciated by a person of ordinary skill in the art, a
target maternal blood sample that contains fetal DNA may be
obtained using various methods.
[0064] In step 406 of FIG. 4, a fetal fraction distribution is
determined for the known genetic data from step 402. The determined
fetal fraction distribution is based on the maternal weight and the
gestational age associated with the target blood sample of step
404. In other words, the received known genetic data for the
plurality of known prenatal testing samples is grouped into sets
according to gestational age and maternal weight. As discussed
above, the sampling of the known genetic data from known prenatal
testing samples may be done at various intervals for gestational
age and maternal weight.
[0065] For each set of known prenatal testing samples, the average
fetal fraction is then computed. In an embodiment, the average
fetal fraction may only be computed where a set of known prenatal
testing samples includes a minimum number of 50 samples. This may
be done to ensure an accurate and representative fetal fraction
distribution.
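The grouping and averaging described above may be sketched as
follows, for illustration only. The minimum of 50 samples per set
comes from this description. The bin widths (one week of
gestational age, 20 pounds of maternal weight), the tuple layout of
a sample, and the function names are assumptions.

```python
from collections import defaultdict
from statistics import mean

MIN_SAMPLES = 50  # minimum set size stated in the description


def bucket_key(gestational_age_weeks, maternal_weight_lb,
               ga_step=1, weight_step=20):
    """Map a sample to the lower edges of its (age, weight) bin.

    The bin widths are hypothetical defaults.
    """
    ga_bin = int(gestational_age_weeks // ga_step) * ga_step
    weight_bin = int(maternal_weight_lb // weight_step) * weight_step
    return (ga_bin, weight_bin)


def average_fetal_fraction_by_set(samples):
    """Average fetal fraction per set, skipping sets that are too small.

    `samples` is an iterable of
    (gestational_age_weeks, maternal_weight_lb, fetal_fraction).
    """
    sets = defaultdict(list)
    for ga, weight, fetal_fraction in samples:
        sets[bucket_key(ga, weight)].append(fetal_fraction)
    return {key: mean(values)
            for key, values in sets.items()
            if len(values) >= MIN_SAMPLES}
```

Sets with fewer than 50 samples are simply omitted from the result,
so a caller can detect that no average is available for a given
gestational age and maternal weight.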
[0066] In step 408 of FIG. 4, the fetal fraction distribution is
transformed to a log-normal distribution. In an embodiment, the
logarithm of the fetal fraction is assumed Gaussian distributed
with a mean and standard deviation that are a function of
gestational age and maternal weight for the received known genetic
data of step 402. As would be appreciated by a person of ordinary
skill in the art, the log-normal fetal fraction distribution
represents a PDF that describes the relative likelihood for fetal
fraction to take on a given value where the gestational age and the
maternal weight are equal to the gestational age and the maternal
weight associated with the received genetic data for the target
sample of step 404.
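As an illustrative sketch only, the parameters of the log-normal
distribution for one set of known samples may be estimated by taking
the mean and standard deviation of the logarithm of fetal fraction.
The use of the natural logarithm, the use of the sample standard
deviation, and the function name are assumptions.

```python
import math
from statistics import mean, stdev


def fit_log_normal(fetal_fractions):
    """Return (mean, sd) of the natural log of the fetal fractions.

    Under the Gaussian assumption on log fetal fraction, these two
    values fully describe the distribution for one set of samples.
    """
    logs = [math.log(ff) for ff in fetal_fractions]
    return mean(logs), stdev(logs)


# Hypothetical fetal fractions for one (gestational age, weight) set.
mu, sigma = fit_log_normal([0.05, 0.10, 0.20])
print(f"mean of log FF = {mu:.4f}, SD of log FF = {sigma:.4f}")
```

In this sketch the exponential of the fitted mean is the geometric
mean of the observed fetal fractions, which is also the median of
the fitted log-normal distribution.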
[0067] In step 410 of FIG. 4, a model is generated for a plurality
of ploidy states based on the log-normal distribution of fetal
fraction of step 408. In an embodiment, trisomy 13, trisomy 18, and
maternal triploidy distributions are generated from the log-normal
distribution of fetal fraction of step 408. This involves reducing
the mean for the trisomy 13, trisomy 18, and maternal triploidy
distributions by a respective constant subtracted offset. As would be
appreciated by a person of ordinary skill in the art, the constant
subtracted offsets for the trisomy 13, trisomy 18, and maternal
triploidy distributions may be determined experimentally.
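For illustration, subtracting a constant from the mean of log fetal
fraction is equivalent to multiplying the median fetal fraction by a
fixed ratio, exp(-offset), when the natural logarithm is used. The
sketch below assumes the natural logarithm and assumes that the
standard deviation is left unchanged. The offsets, the euploid
parameters, and the function name are hypothetical.

```python
import math


def build_ploidy_model(euploid_mean, euploid_sd, offsets):
    """Shift the log fetal fraction mean down by a per-state offset.

    Keeping the standard deviation unchanged is an assumption of
    this sketch.
    """
    return {state: (euploid_mean - offset, euploid_sd)
            for state, offset in offsets.items()}


# Hypothetical offsets, written as the fixed ratios they correspond to.
example_offsets = {
    "trisomy 13": -math.log(0.7),          # median reduced to 70%
    "trisomy 18": -math.log(0.7),          # median reduced to 70%
    "maternal triploidy": -math.log(0.4),  # median reduced to 40%
}

example_model = build_ploidy_model(math.log(0.10), 0.4, example_offsets)
for state, (mean, sd) in example_model.items():
    print(f"{state}: median FF = {math.exp(mean):.3f}, SD = {sd}")
```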
[0068] In step 412 of FIG. 4, fetal fraction based data likelihoods
for the received target sample of step 404 are computed for each of
the ploidy states using the generated model of step 410 and the
fetal fraction associated with the target sample. In an embodiment,
a fetal fraction based data likelihood for the received target
sample is computed for trisomy 13, trisomy 18, and maternal
triploidy by evaluating the Gaussian probability density functions
for trisomy 13, trisomy 18, and maternal triploidy at the observed
log value of the fetal fraction associated with the target
sample.
[0069] In step 414 of FIG. 4, a Bayesian probability determination
is applied to combine the fetal fraction based data likelihoods of
step 412 with previously determined risk scores. As would be
appreciated by a person of ordinary skill in the art, a previously
determined risk score may be an age based prior risk score for the
mother associated with the target sample or an SNP-based prior risk
score.
[0070] In step 416 of FIG. 4, aneuploidy risk scores for trisomy
13, trisomy 18, and maternal triploidy are output based on the
Bayesian probability determination applied in step 414. As would be
appreciated by a person of
ordinary skill in the art, the outputting may be performed using
various methods and mediums.
[0071] In an embodiment, the aneuploidy risk scores for trisomy
13, trisomy 18, and maternal triploidy are independently
determined. Because each aneuploidy risk score is an independent
posterior probability of the presence of either trisomy 13, trisomy
18, or maternal triploidy, the resulting aneuploidy risk scores can
be compared to identify the most likely ploidy state.
[0072] In an embodiment, a probability that the sample is euploid
is also determined and taken into account.
[0073] In this manner, an additional type of analysis is made
available to individuals whose aneuploidy risk may not be able to
be determined by traditional methods, such as SNP-based methods.
This analysis may also be used to confirm a previously determined
risk score in situations where extremely low fetal fraction is an
issue.
[0074] FIG. 7 illustrates a posterior fetal fraction risk
distribution, according to an example embodiment. In the example of
FIG. 7, a posterior risk distribution is computed by combining data
likelihoods with prior risk for a gestational age between 9 and 11
weeks. The cutoff is at 1/100 risk. This sets the fetal fraction
limit for a high risk call.
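One way to locate such a fetal fraction limit is sketched below, for
illustration only. The sketch simplifies to two hypotheses
(aneuploid and euploid) with equal standard deviations, so that the
posterior risk decreases as fetal fraction increases, and it finds
the crossing of the 1/100 cutoff by bisection. The prior, the
distribution parameters, and the function names are hypothetical.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and SD at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_risk(fetal_fraction, prior, aneuploid, euploid):
    """Posterior probability of aneuploidy under a two-hypothesis model."""
    x = math.log(fetal_fraction)
    affected = prior * gaussian_pdf(x, *aneuploid)
    unaffected = (1.0 - prior) * gaussian_pdf(x, *euploid)
    return affected / (affected + unaffected)


def fetal_fraction_limit(prior, aneuploid, euploid, cutoff=1 / 100,
                         lo=0.001, hi=0.5, iterations=60):
    """Bisect for the fetal fraction at which the risk equals the cutoff.

    Assumes the risk is above the cutoff at `lo`, below it at `hi`,
    and decreasing in between.
    """
    for _ in range(iterations):
        mid = math.sqrt(lo * hi)  # midpoint on the log scale
        if posterior_risk(mid, prior, aneuploid, euploid) > cutoff:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


# Hypothetical inputs, for illustration only.
PRIOR = 1 / 1000
ANEUPLOID = (math.log(0.05), 0.4)  # (mean, sd) of log fetal fraction
EUPLOID = (math.log(0.10), 0.4)

limit = fetal_fraction_limit(PRIOR, ANEUPLOID, EUPLOID)
print(f"fetal fraction limit for a high risk call: {limit:.4f}")
```

Samples with a fetal fraction below the returned limit would have a
posterior risk above the cutoff under these assumed inputs.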
[0075] FIG. 8 illustrates a result set for a pilot study of an
example embodiment for fetal fraction-based high risk assessment
that predicts an aneuploidy in cases with low fetal fraction. The
result set of FIG. 8 indicates that the example embodiment for
fetal fraction-based risk assessment is able to predict
abnormalities in a clinical data sample set. Specifically, in the
example of FIG. 8, there were 143 cases with high risk and low
fetal fraction, of which 70 had a karyotype available. There was a
10% positive predictive value (PPV) if the associated clinical
sample data set was restricted to cases with karyotype and a 4.9%
PPV if missing karyotypes were assumed unaffected. FIG. 8
illustrates some of the abnormalities detected in the pilot study.
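The two PPV figures can be reproduced with the short calculation
below, shown for illustration. It uses the 143 high risk cases and
70 karyotyped cases stated above, together with the 7 aneuploid
cases reported for this cohort in the description of FIG. 15. The
function and variable names are hypothetical.

```python
def positive_predictive_value(true_positives, positive_calls):
    """Fraction of positive calls that are confirmed as affected."""
    return true_positives / positive_calls


HIGH_RISK_CASES = 143       # high risk, low fetal fraction cases
CASES_WITH_KARYOTYPE = 70   # cases with karyotype available
ANEUPLOID_CASES = 7         # confirmed aneuploid among karyotyped cases

# Restricting the denominator to cases with karyotype.
ppv_karyotyped = positive_predictive_value(ANEUPLOID_CASES,
                                           CASES_WITH_KARYOTYPE)
# Treating every case without karyotype as unaffected.
ppv_lower_bound = positive_predictive_value(ANEUPLOID_CASES,
                                            HIGH_RISK_CASES)

print(f"PPV, karyotyped cases only: {ppv_karyotyped:.1%}")
print(f"PPV, missing assumed unaffected: {ppv_lower_bound:.1%}")
```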
[0076] FIG. 9 illustrates a redraw success rate distribution,
according to an example embodiment. FIG. 9 shows fetal fraction
change observed from approximately 3,000 Non-Invasive Prenatal
Testing (NIPT) redraws. The example embodiment of FIG. 9 provides
useful information when an embodiment for NIPT single-nucleotide
polymorphism (SNP) fails to provide a prediction. Specifically, the
example embodiment of FIG. 9 provides a fetal fraction-based risk
score and a probability of successful call on redraw, making it
possible to predict redraw success based on a predicted range of
redraw fetal fraction.
[0077] FIG. 10 illustrates a distribution of fetal fraction based
risk scores in cases identified as high risk and low fetal
fraction, according to a follow-up study of an example embodiment.
For example, FIG. 10 shows that roughly 5 cases had a fetal
fraction based risk score of 0.2. In the follow-up study, the
objective was to test whether high fetal fraction-based risk
predicts aneuploidy in cases with unusually low fetal fraction. An
attempt to collect follow-up was made for 896 samples, where the
adjusted fetal fraction was below approximately the 2nd
percentile, and the maternal weight was available. 525 samples were
eligible for inclusion in the follow-up study, from domestic
clinics and direct sales clinics. 143 samples were identified as
having high fetal fraction-based risk with low fetal fraction. In
particular, the fetal fraction-based risk was greater than 0.01 and
the fetal fraction was 2.5 SD below the mean. Karyotype was available
for 70 samples.
[0078] FIG. 11A illustrates an estimated detection rate for trisomy
13 and 18, according to an example embodiment. Specifically, FIG.
11A illustrates what fraction of affected cases that are not
identified by a NIPT SNP embodiment will be identified by the fetal
fraction-based risk score >1/100. The estimated detection rate
is based on the sample data set of FIG. 10. In FIG. 11A, the
estimated detection rate for trisomy 13/18 is 91.4%.
[0079] FIG. 11B illustrates an estimated detection rate for digynic
triploidy, according to an example embodiment. Specifically, FIG.
11B illustrates what fraction of affected cases that are not
identified by a NIPT SNP embodiment will be identified by the fetal
fraction-based risk score >1/100. The estimated detection rate
is based on the sample data set of FIG. 10. In FIG. 11B, the
estimated detection rate for digynic triploidy is 96.6%.
Retroactive application of such high risk fetal fraction criteria
to 29,000 NIPT cases would have resulted in 432 high risk calls
(1.5%). Application of the SNP method would result in 115 (0.4%)
high risk calls (for T13, T18, digynic triploidy). This results in
a 1.8% combined high risk call rate. The expected aneuploidy rate
based on priors was 0.3%. The theoretical PPV was thus 16%
(0.3%/1.8%).
[0080] FIG. 12 illustrates a PDF of normalized euploid data,
according to an example embodiment. Specifically, FIG. 12 shows
empirical density plots of fetal fractions after normalization.
There are 39 density curves. Each of the 39 density curves comes
from a set of data with approximately the same maternal weight and
gestational age, with between 400 and 500 samples each. Each data
set is normalized by its observed mean and variance. The plot in
FIG. 12 shows that the Gaussian fit is appropriate because the
distributions are very similar.
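The per-set normalization may be sketched as follows, for
illustration only. The sketch subtracts the observed mean and
divides by the observed standard deviation (the square root of the
variance). Applying the normalization to the logarithm of fetal
fraction, rather than to the raw fetal fraction, is an assumption,
as are the sample values and the function name.

```python
import math
from statistics import mean, pstdev


def normalize(values):
    """Center on the observed mean and scale by the observed SD."""
    center = mean(values)
    spread = pstdev(values)  # square root of the observed variance
    return [(v - center) / spread for v in values]


# Hypothetical fetal fractions for one (weight, gestational age) set.
log_ff = [math.log(ff) for ff in (0.04, 0.08, 0.10, 0.12, 0.20)]
z_scores = normalize(log_ff)
print([round(z, 3) for z in z_scores])
```

After this step every set has zero mean and unit variance, so the
density curves of different sets can be overlaid and compared
against a single standard Gaussian.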
[0081] FIG. 13 illustrates a CDF of the normalized euploid data of
FIG. 12, according to an example embodiment. Specifically, FIG. 13
shows empirical cumulative distribution plots of fetal fractions
after normalization. There are 39 curves. Each of the 39 curves
comes from a set of data with approximately the same
maternal weight and gestational age, with between 400 and 500
samples each. Each data set is normalized by its observed mean and
variance. The plot in FIG. 13 shows that the Gaussian fit is
appropriate because the distributions are very similar.
[0082] FIG. 14 illustrates a plot of redraw success rate, according
to an example embodiment. Specifically, FIG. 14 plots the redraw
success rate against maternal weight bucket center. This plot
shows that another characteristic of the fetal fraction
distribution is the redraw success rate. Specifically, the ability
to make a call is strongly dependent on fetal fraction and a
successful redraw is often based on an increase in fetal fraction
between the first and second draw. The ability to predict the
probability of success for a redraw is often useful for doctors and
patients. This is because many cases with low fetal fraction will
not be at high risk for aneuploidy, but still have low probability
of a successful redraw, and so other testing embodiments may be
preferred.
[0083] FIG. 15 illustrates an example result set for identified
high risk samples, according to an embodiment. Specifically, FIG.
15 illustrates a result set for 143 sample cases that were
identified as having high extremely low fetal fraction (ELFF) risk
based on not having received a successful high or low risk draw
call, and having a computed ELFF risk score greater than 0.01. FIG.
15 further illustrates that follow-up results were successfully
collected for 70 of these sample cases. Of these 70 sample cases, 7
were found to be aneuploid.
[0084] FIG. 15 shows that among the cohort with successful
follow-up, the positive predictive value of high ELFF risk is
7/58=12.07%. FIG. 15 further shows that assuming all cases without
follow-up are euploid, the positive predictive value is
7/113=6.19%. This value can be considered the lower bound PPV based
on the data set of FIG. 15.
[0085] Various embodiments can be implemented, for example, using
one or more well-known computer systems, such as computer system
500 shown in FIG. 5. Computer system 500 can be any well-known
computer capable of performing the functions described herein.
[0086] Computer system 500 includes one or more processors (also
called central processing units, or CPUs), such as a processor 504.
Processor 504 is connected to a communication infrastructure or bus
506.
[0087] One or more processors 504 may each be a graphics processing
unit (GPU). In an embodiment, a GPU is a processor that is a
specialized electronic circuit designed to process mathematically
intensive applications. The GPU may have a parallel structure that
is efficient for parallel processing of large blocks of data, such
as mathematically intensive data common to computer graphics
applications, images, videos, etc.
[0088] Computer system 500 also includes user input/output
device(s) 503, such as monitors, keyboards, pointing devices, etc.,
that communicate with communication infrastructure 506 through user
input/output interface(s) 502.
[0089] Computer system 500 also includes a main or primary memory
508, such as random access memory (RAM). Main memory 508 may
include one or more levels of cache. Main memory 508 has stored
therein control logic (i.e., computer software) and/or data.
[0090] Computer system 500 may also include one or more secondary
storage devices or memory 510. Secondary memory 510 may include,
for example, a hard disk drive 512 and/or a removable storage
device or drive 514. Removable storage drive 514 may be a floppy
disk drive, a magnetic tape drive, a compact disk drive, an optical
storage device, tape backup device, and/or any other storage
device/drive.
[0091] Removable storage drive 514 may interact with a removable
storage unit 518. Removable storage unit 518 includes a computer
usable or readable storage device having stored thereon computer
software (control logic) and/or data. Removable storage unit 518
may be a floppy disk, magnetic tape, compact disk, DVD, optical
storage disk, and/or any other computer data storage device.
Removable storage drive 514 reads from and/or writes to removable
storage unit 518 in a well-known manner.
[0092] According to an exemplary embodiment, secondary memory 510
may include other means, instrumentalities or other approaches for
allowing computer programs and/or other instructions and/or data to
be accessed by computer system 500. Such means, instrumentalities
or other approaches may include, for example, a removable storage
unit 522 and an interface 520. Examples of the removable storage
unit 522 and the interface 520 may include a program cartridge and
cartridge interface (such as that found in video game devices), a
removable memory chip (such as an EPROM or PROM) and associated
socket, a memory stick and USB port, a memory card and associated
memory card slot, and/or any other removable storage unit and
associated interface.
[0093] Computer system 500 may further include a communication or
network interface 524. Communication interface 524 enables computer
system 500 to communicate and interact with any combination of
remote devices, remote networks, remote entities, etc.
(individually and collectively referenced by reference number 528).
For example, communication interface 524 may allow computer system
500 to communicate with remote devices 528 over communications path
526, which may be wired and/or wireless, and which may include any
combination of LANs, WANs, the Internet, etc. Control logic and/or
data may be transmitted to and from computer system 500 via
communications path 526.
[0094] In an embodiment, a tangible apparatus or article of
manufacture comprising a tangible computer useable or readable
medium having control logic (software) stored thereon is also
referred to herein as a computer program product or program storage
device. This includes, but is not limited to, computer system 500,
main memory 508, secondary memory 510, and removable storage units
518 and 522, as well as tangible articles of manufacture embodying
any combination of the foregoing. Such control logic, when executed
by one or more data processing devices (such as computer system
500), causes such data processing devices to operate as described
herein.
[0095] Based on the teachings contained in this disclosure, it will
be apparent to persons skilled in the relevant art(s) how to make
and use embodiments of the invention using data processing devices,
computer systems and/or computer architectures other than that
shown in FIG. 5. In particular, embodiments may operate with
software, hardware, and/or operating system implementations other
than those described herein.
[0096] It is to be appreciated that the Detailed Description
section, and not the Summary and Abstract sections (if any), is
intended to be used to interpret the claims. The Summary and
Abstract sections (if any) may set forth one or more but not all
exemplary embodiments of the invention as contemplated by the
inventor(s), and thus, are not intended to limit the invention or
the appended claims in any way.
[0097] While the invention has been described herein with reference
to exemplary embodiments for exemplary fields and applications, it
should be understood that the invention is not limited thereto.
Other embodiments and modifications thereto are possible, and are
within the scope and spirit of the invention. For example, and
without limiting the generality of this paragraph, embodiments are
not limited to the software, hardware, firmware, and/or entities
illustrated in the figures and/or described herein. Further,
embodiments (whether or not explicitly described herein) have
significant utility to fields and applications beyond the examples
described herein.
[0098] Embodiments have been described herein with the aid of
functional building blocks illustrating the implementation of
specified functions and relationships thereof. The boundaries of
these functional building blocks have been arbitrarily defined
herein for the convenience of the description. Alternate boundaries
can be defined as long as the specified functions and relationships
(or equivalents thereof) are appropriately performed. Also,
alternative embodiments may perform functional blocks, steps,
operations, methods, etc. using orderings different than those
described herein.
[0099] References herein to "one embodiment," "an embodiment," "an
example embodiment," or similar phrases, indicate that the
embodiment described may include a particular feature, structure,
or characteristic, but every embodiment may not necessarily include
the particular feature, structure, or characteristic. Moreover,
such phrases are not necessarily referring to the same embodiment.
Further, when a particular feature, structure, or characteristic is
described in connection with an embodiment, it would be within the
knowledge of persons skilled in the relevant art(s) to incorporate
such feature, structure, or characteristic into other embodiments
whether or not explicitly mentioned or described herein.
[0100] The breadth and scope of the invention should not be limited
by any of the above-described exemplary embodiments, but should be
defined only in accordance with the following claims and their
equivalents.
* * * * *