U.S. patent application number 15/431395 was filed with the patent office on 2017-02-13 and published on 2017-09-14 for detection and treatment of disease exhibiting disease cell heterogeneity and systems and methods for communicating test results.
The applicant listed for this patent is Guardant Health, Inc. Invention is credited to Helmy ELTOUKHY, Nnamdi IHUEGBU, Bahram Ghaffarzadeh KERMANI, AmirAli TALASAZ.
Application Number | 20170260590 15/431395 |
Document ID | / |
Family ID | 56284976 |
Filed Date | 2017-02-13 |
United States Patent Application | 20170260590 |
Kind Code | A1 |
ELTOUKHY; Helmy; et al. | September 14, 2017 |
DETECTION AND TREATMENT OF DISEASE EXHIBITING DISEASE CELL
HETEROGENEITY AND SYSTEMS AND METHODS FOR COMMUNICATING TEST
RESULTS
Abstract
This disclosure provides, among other things, methods for
generating and applying therapeutic interventions. The methods
involve, for example, (a) sequencing polynucleotides from cancer
cells from a subject; (b) identifying and quantifying somatic
mutations in the polynucleotides; (c) developing a profile of tumor
heterogeneity in the subject indicating the presence and relative
quantity of a plurality of the somatic mutations in the
polynucleotides, wherein different relative quantities indicate
tumor heterogeneity; and (d) determining a therapeutic intervention
for a cancer exhibiting the tumor heterogeneity, wherein the
therapeutic intervention is effective against a cancer having the
profile of tumor heterogeneity determined.
Inventors: |
ELTOUKHY; Helmy; (Atherton,
CA) ; TALASAZ; AmirAli; (Menlo Park, CA) ;
KERMANI; Bahram Ghaffarzadeh; (Los Altos, CA) ;
IHUEGBU; Nnamdi; (San Francisco, CA) |
|
Applicant: |
Name |
City |
State |
Country |
Type |
Guardant Health, Inc. |
Redwood City |
CA |
US |
|
|
Family ID: |
56284976 |
Appl. No.: |
15/431395 |
Filed: |
February 13, 2017 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
PCT/US2015/067717 | Dec 28, 2015 | |
15431395 | | |
62098426 | Dec 31, 2014 | |
62155763 | May 1, 2015 | |
Current U.S.
Class: |
1/1 |
Current CPC
Class: |
C12Q 1/6827 20130101;
C12Q 2600/118 20130101; C12Q 1/6869 20130101; G16B 30/00 20190201;
C12Q 2600/156 20130101; C12Q 1/6886 20130101; C12Q 1/6827 20130101;
C12Q 2535/113 20130101; C12Q 2545/114 20130101; C12Q 1/6869
20130101; C12Q 2527/113 20130101; C12Q 2535/122 20130101; C12Q
2537/149 20130101; C12Q 2537/165 20130101 |
International
Class: |
C12Q 1/68 20060101
C12Q001/68; G06F 19/22 20060101 G06F019/22 |
Claims
1. A method comprising: (a) sequencing polynucleotides from cancer
cells from a biological sample of a subject; (b) identifying and
quantifying somatic mutations in the polynucleotides; (c)
developing a profile of tumor heterogeneity in the subject
indicating a presence and a relative quantity of a plurality of the
somatic mutations in the polynucleotides, wherein different
relative quantities indicate tumor heterogeneity; and (d)
determining a therapeutic intervention for a cancer exhibiting the
tumor heterogeneity, wherein the therapeutic intervention is
effective against a cancer having the profile of tumor
heterogeneity.
2.-14. (canceled)
15. A system comprising a computer readable medium comprising
machine-executable code that, upon execution by a computer
processor, implements a method comprising: (a) receiving into
memory sequence reads of polynucleotides from cancer cells from a
biological sample of a subject mapping to a genetic locus; (b)
determining, among said sequence reads, identity of one or more
bases that differ from a base of a reference sequence at the
locus in a total number of sequence reads mapping to the locus; (c)
reporting the determined identity, a relative quantity, and a
location in the reference sequence of the one or more bases; and
(d) inferring tumor heterogeneity of the cancer cells from the
biological sample based on the determined identity, the relative
quantity, and the location in the reference sequence of the one or
more bases.
16.-42. (canceled)
43. A method, comprising: a) providing a plurality of nucleic acid
samples from a subject, the nucleic acid samples collected at
serial time points; b) sequencing polynucleotides from the
plurality of nucleic acid samples to generate sequences; c)
determining a quantitative measure of each of a plurality of
genetic variants among the polynucleotides from each of the
plurality of nucleic acid samples; d) graphically representing by
computer relative quantities of each of the plurality of genetic
variants at each of the serial time points for somatic mutations
present at a non-zero quantity for at least one of the serial time
points.
44. The method of claim 43 wherein the quantitative measure is a
frequency of the genetic variant among all sequences mapping to the
same genetic locus.
45. The method of claim 43 wherein the relative quantities are
graphically represented as a stacked area graph.
46. The method of claim wherein the relative quantities are
stacked, at the earliest time point of the serial time points,
highest to lowest from bottom to top of the graph, and wherein a
genetic variant first appearing at a non-zero quantity at a later
time point of the serial time points is stacked at the top of the
graph.
47. A method to generate a paper or electronic patient test report
from data generated by a genetic analyzer, comprising: a)
summarizing data from two or more testing time points, whereby a
union of all non-zero testing results is reported at each
subsequent testing time point of the two or more time points after
the first testing time point of the two or more time points, to
generate testing results; and b) rendering the testing results on
the paper or electronic patient test report.
48. The method of claim 47 wherein summarizing and rendering are
performed on a computer by executing code with a computer processor
to (i) identify all non-zero testing results, (ii) generate the
electronic patient test report and (iii) display the electronic
patient test report on a graphical user interface.
49. A method of graphically representing evolution of genetic
variants of a tumor in a subject from data generated by a genetic
analyzer, the method comprising: a) generating by computer a
stacked representation of the genetic variants detected at each of
a plurality of time points in the subject, wherein a height or
width of each layer in the stacked representation that corresponds
to a genetic variant represents a quantitative contribution of the
genetic variant to a total quantity of the genetic variants at each
of the plurality of time points; and b) displaying the stacked
representation on a computer monitor or a paper report.
50. The method of claim 49 wherein displaying comprises: a)
receiving data representing the detected tumor genetic variants
into computer memory; b) executing code with a computer processor
to graphically represent the quantitative contribution of each
genetic variant at a time point as a line or area proportional to a
relative contribution of the quantitative contribution; and c)
displaying the graphical representation on a graphical user
interface.
51. The method of claim 43, wherein the graphical representation
further indicates, for each time point, the quantitative measure of
the predominant genetic variant among the plurality of genetic
variants.
52. The method of claim 43, wherein graphically representing
comprises normalizing and scaling the quantitative measures.
53. The method of claim 43, wherein the polynucleotides comprise
cell-free DNA (cfDNA).
54. The method of claim 44, wherein the genetic locus is located in
an oncogene.
55. The method of claim 43, wherein the plurality of genetic
variants (i) maps to different genes in the genome or (ii) maps to
the same gene in the genome.
56. The method of claim 47, wherein the paper or electronic patient
test report includes one or more annotations to help a physician
interpret the testing results or recommend treatment options.
57. The method of claim 49, wherein the stacked representation
includes one or more annotations to help a physician interpret the
stacked representation or recommend treatment options.
58. The method of claim 49, further comprising using allele
fractions of the detected genetic variants, allelic imbalances of
the detected genetic variants, and gene-specific coverage of the
genetic variants detected using a bodily-fluid based test to infer
the disease burden.
59. The method of claim 49, wherein an overall stack height is
representative of overall disease burden or a disease burden score
in the subject.
60. The method of claim 49, wherein only a subset of the detected
genetic variants is plotted, the subset chosen based on (i) a
likelihood of a genetic variant being a driver alteration or (ii)
an association with increased or reduced response to treatment.
61. The method of claim 49, wherein the method comprises estimating
a disease progression or remission based on rate of change and/or
quantitative precision of each testing result.
62. The method of claim 49, wherein the method comprises displaying
a therapeutic intervention between intervening testing points.
63. The method of claim 43, wherein sequencing the polynucleotides
comprises generating a plurality of sequence reads for parent
polynucleotides, and collapsing the plurality of sequence reads to
generate consensus calls for bases in each parent
polynucleotide.
64. The method of claim 43, wherein the serial time points comprise
a first time point before a treatment is administered to the
subject and a second time point after the treatment is administered
to the subject.
65. The method of claim 43, wherein the serial time points comprise
a first time point and a second time point, wherein the second time
point is about one month after the first time point.
66. The method of claim 45, wherein the stacked area graph
comprises areas represented by different colors.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of U.S. Provisional
Application No. 62/098,426, filed Dec. 31, 2014 and U.S.
Provisional Application No. 62/155,763, filed on May 1, 2015, each
of which is incorporated entirely herein by reference.
BACKGROUND
[0002] Health care is just now starting to effectively use
information from the human genome to diagnose and treat disease.
Nowhere is this more crucial than in the treatment of cancer, from
which 7.6 million people in the U.S. die each year, and for which
the US spends $87 billion a year on treatment. Cancer refers to any
disorder of various malignant neoplasms characterized by the
proliferation of anaplastic cells that tend to invade surrounding
tissue and metastasize to new body sites and the pathological
conditions characterized by such growths.
[0003] One of the reasons cancer is difficult to treat is that
current testing methods may not help doctors match specific cancers
with effective drug treatments. And it is a moving target--cancer
cells are constantly changing and mutating. Cancers can accumulate
genetic variants through, e.g., somatic cell mutation. Such
variants include, for example, sequence variants and copy number
variants. Analysis of tumors has indicated that different cells in
a tumor can bear different genetic variants. Such differentiation
between tumor cells has been referred to as tumor
heterogeneity.
[0004] Cancers can evolve over time, becoming resistant to a
therapeutic intervention. Certain variants are known to correlate
with responsiveness or resistance to specific therapeutic
interventions. More effective treatments for cancers exhibiting
tumor heterogeneity would be beneficial. Such cancers may be
treated with a second, different, therapeutic intervention to which
the cancer responds.
[0005] DNA sequencing methods allow detection of genetic variants
in DNA from tumor cells. Cancer tumors continually shed their
unique genomic material into the bloodstream. Unfortunately, these
telltale genomic "signals" are so weak that current genomic
analysis technologies, including next-generation sequencing, may
only detect such signals sporadically or in patients with
terminally high tumor burden. The main reason for this is that such
technologies are plagued by error rates and bias that can be orders
of magnitude higher than what is required to reliably detect de
novo genomic alterations associated with cancer.
[0006] In a parallel trend, to understand the clinical significance
of a genetic test, treating professionals must have a working
knowledge of basic principles of genetic inheritance and reasonable
facility with the interpretation of probabilistic data. Some
studies suggest that many treating professionals are not adequately
prepared to interpret genetic tests for disease susceptibility.
Some physicians have difficulty interpreting probabilistic data
related to the clinical utility of diagnostic tests, such as the
positive or negative predictive value of a laboratory test.
[0007] The error rates and bias in detecting de novo genomic
alterations associated with cancer, along with inadequate
explanation of the implications of the genetic tests for cancer,
have lowered the quality of care for cancer patients. Professional
societies, such as the College of American Pathologists (CAP) and
the American College of Medical Genetics (ACMG), have published
standards or guidelines for laboratories that provide genetic
testing, which require that reports containing genetic information
include interpretive content that is understandable by generalist
physicians.
SUMMARY
[0008] In an aspect provided herein is a method comprising: (a)
sequencing polynucleotides from cancer cells from a biological
sample of a subject; (b) identifying and quantifying somatic
mutations in the polynucleotides; (c) developing a profile of tumor
heterogeneity in the subject indicating the presence and relative
quantity of a plurality of the somatic mutations in the
polynucleotides, wherein different relative quantities indicate
tumor heterogeneity; and (d) determining a therapeutic intervention
for a cancer exhibiting the tumor heterogeneity, wherein the
therapeutic intervention is effective against a cancer having the
profile of tumor heterogeneity determined. In some embodiments, the
cancer cells are spatially distinct. In some embodiments, the
therapeutic intervention is more effective against a cancer
presenting with the plurality of somatic mutations than it is
against a cancer presenting with any one, but not all, of the
somatic mutations. In some embodiments, the method further
comprises: (e) monitoring changes in tumor heterogeneity in the
subject over time and determining different therapeutic
interventions over time based on the changes. In some embodiments,
the method further comprises: (e) displaying the therapeutic
intervention. In some embodiments, the method further comprises:
(e) implementing the therapeutic intervention. In some embodiments,
the method further comprises: (e) generating a phylogeny of tumor
evolution based on the tumor profile; wherein determining the
therapeutic intervention takes into account the phylogeny.
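Steps (b) and (c) of this method can be illustrated with a minimal sketch. The function name `heterogeneity_profile`, the input layout, the mutation labels, and the `tolerance` threshold are assumptions made for illustration only; the disclosure does not specify how large a difference in relative quantity is treated as indicating heterogeneity.

```python
def heterogeneity_profile(counts, tolerance=0.05):
    """Build a profile of relative quantities for detected somatic mutations.

    counts maps a mutation label to (variant_molecules, total_molecules)
    observed at that mutation's locus. Both the layout and the tolerance
    are illustrative assumptions, not values taken from the disclosure.
    """
    profile = {}
    for label, (variant, total) in counts.items():
        if total <= 0:
            raise ValueError(f"no molecules observed at locus of {label}")
        if variant > 0:  # only mutations that are present enter the profile
            profile[label] = variant / total

    values = list(profile.values())
    # Different relative quantities across mutations suggest heterogeneity.
    heterogeneous = len(values) >= 2 and (max(values) - min(values)) > tolerance
    return profile, heterogeneous


profile, heterogeneous = heterogeneity_profile(
    {"mutA": (120, 1000), "mutB": (15, 1000), "mutC": (0, 1000)}
)
```

In this sketch a mutation with zero supporting molecules is left out of the profile, so only mutations actually present contribute to the comparison.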
[0009] In some embodiments, determining is performed with the aid
of a computer-executed algorithm. In some embodiments, sequence reads
generated by sequencing are subject to noise reduction before
identifying and quantifying. In some embodiments, noise reduction
comprises molecular tracking of sequences generated from a single
polynucleotide in the sample.
[0010] In some embodiments, determining a therapeutic intervention
takes into account the relative frequencies of the tumor-related
genetic alterations. In some embodiments, the therapeutic
intervention comprises administering, in combination or in series,
a plurality of drugs, wherein each drug is relatively more
effective against a cancer presenting with a different one of
somatic mutations that occur at different relative frequency. In
some embodiments, a drug that is relatively more effective against
a cancer presenting with a somatic mutation occurring at higher
relative frequency is administered in higher amount. In some
embodiments, the drugs are delivered at doses that are stratified
to reflect the relative amounts of the variants in the DNA. In some
embodiments, cancers presenting with at least one of the genetic
variants are resistant to at least one of the drugs. In some
embodiments, determining a therapeutic intervention takes into
account the tissue of origin of the cancer. In some embodiments,
the therapeutic intervention is determined based on a database of
interventions shown to be therapeutic for cancers having tumor
heterogeneity characterized by each of the somatic mutations.
[0011] In some embodiments, the polynucleotides comprise cfDNA from
a blood sample. In some embodiments, the polynucleotides comprise
polynucleotides from spatially distinct cancer cells. In some
embodiments, the polynucleotides comprise polynucleotides from
different metastatic tumor sites. In some embodiments, the
polynucleotides comprise polynucleotides from a solid tumor or a
diffuse tumor. In some embodiments, the polynucleotides are
comprised in a blood sample or in a solid tumor biopsy.
[0012] In some embodiments, identifying comprises generating a
plurality of sequence reads for parent polynucleotides from the
sample, and collapsing the sequence reads to generate consensus
calls for bases in each parent polynucleotide. In some embodiments,
quantifying comprises determining frequency at which the somatic
mutations are detected in the population of polynucleotides from
the biological sample. In some embodiments, the biological sample
comprises biological molecules from non-disease cells. In some
embodiments, the biological sample comprises biological molecules
from a plurality of different tissues. In some embodiments, the
biomolecules are comprised in one biological sample. In some
embodiments, the biomolecules are comprised in a plurality of
biological samples. In some embodiments, the plurality of
biological samples are tumors from a plurality of metastases.
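The collapsing of sequence reads into consensus calls for each parent polynucleotide can be sketched as follows. The sketch assumes that each read already carries a tag identifying its parent molecule and that reads in a family are aligned to the same start; the function name `collapse_reads`, the tag scheme, and the `min_agreement` threshold are hypothetical and are not taken from the disclosure.

```python
from collections import Counter, defaultdict


def collapse_reads(tagged_reads, min_agreement=0.6):
    """Collapse reads that share a parent-molecule tag into one consensus.

    tagged_reads is an iterable of (tag, sequence) pairs. A position is
    called "N" when no base reaches min_agreement within the family.
    """
    families = defaultdict(list)
    for tag, seq in tagged_reads:
        families[tag].append(seq)

    consensus = {}
    for tag, seqs in families.items():
        length = min(len(s) for s in seqs)  # compare only shared positions
        calls = []
        for i in range(length):
            base, count = Counter(s[i] for s in seqs).most_common(1)[0]
            calls.append(base if count / len(seqs) >= min_agreement else "N")
        consensus[tag] = "".join(calls)
    return consensus


reads = [("t1", "ACGT"), ("t1", "ACGT"), ("t1", "ACTT"), ("t2", "GGCA")]
consensus = collapse_reads(reads)
```

A base seen in only one read of a family is outvoted by the other reads, which is how this kind of collapsing can reduce sequencing noise before variants are identified and quantified.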
[0013] In some embodiments, sequencing comprises sequencing all or
part of a subset of genes in the subject's genome. In some
embodiments, the somatic mutations are selected from single
nucleotide variations (SNVs), insertions, deletions, inversions,
transversions, translocations, copy number variations (CNVs) (e.g.,
aneuploidy, partial aneuploidy, polyploidy), chromosomal
instability, chromosomal structure alterations, gene fusions,
chromosome fusions, gene truncations, gene amplification, gene
duplications, chromosomal lesions, DNA lesions, abnormal changes in
nucleic acid chemical modifications, abnormal changes in epigenetic
patterns and abnormal changes in nucleic acid methylation. In some
embodiments, genetic loci are selected from single nucleotides,
genes and chromosomes.
[0014] In some embodiments, the cancer is selected from carcinomas,
sarcomas, leukemias, lymphomas, myelomas and central nervous system
cancers (e.g., breast cancer, prostate cancer, colorectal cancer,
brain cancer, esophageal cancer, head and neck cancer, bladder
cancer, gynecological cancer, liposarcoma, and multiple myeloma).
In some embodiments, cancer cells of the tumor are derived from a
common parent disease cell. In some embodiments, cancer cells of
the tumor are derived from different parent cancer cells of the
same or different cancer type. In some embodiments, the method
further comprises determining a measure of the somatic mutations to
one or more control references to determine the relative
quantity.
[0015] In some embodiments, the polynucleotides are sourced from
both circulating cancer polynucleotides and from solid tumor
biopsy. In some embodiments, profiles are separately developed for
polynucleotides sourced from the circulating cancer polynucleotides
and from the solid tumor biopsy.
[0016] In an aspect provided herein is a method comprising
providing a therapeutic intervention for a subject having a cancer
having a tumor profile from which tumor heterogeneity can be
inferred, wherein the therapeutic intervention is effective against
cancers with the tumor profile. In some embodiments, the tumor
profile indicates relative frequency of a plurality of somatic
mutations. In some embodiments, the method further comprises
monitoring changes in the relative frequencies in the subject over
time and determining different therapeutic interventions over time
based on the changes. In some embodiments, the therapeutic
intervention is more effective against a cancer presenting with
each of the somatic mutations than it is against a cancer
presenting with any one, but not all, of the somatic mutations. In
some embodiments, the therapeutic intervention comprises
administering, in combination or in series, a plurality of drugs,
wherein each drug is relatively more effective against a cancer
presenting with a different one of somatic mutations that occur at
different relative frequency. In some embodiments, a drug that is
relatively more effective against a cancer presenting with a
somatic mutation occurring at higher relative frequency is
administered in higher amount. In some embodiments, the drugs are
delivered at doses that are stratified to reflect the relative
amounts of the variants in the DNA. In some embodiments, cancers
presenting with at least one of the genetic variants are resistant
to at least one of the drugs. In some embodiments, the cancer is
selected from carcinomas, sarcomas, leukemias, lymphomas, myelomas
and central nervous system cancers (e.g., breast cancer, prostate
cancer, colorectal cancer, brain cancer, esophageal cancer, head
and neck cancer, bladder cancer, gynecological cancer, liposarcoma,
and multiple myeloma).
[0017] In an aspect provided herein is a method comprising
administering to a subject a therapeutic intervention that is
effective against a tumor exhibiting tumor heterogeneity, wherein
the therapeutic intervention is based on a profile of tumor
heterogeneity in the subject indicating the presence and relative
quantity of a plurality of the somatic mutations in the
polynucleotides, wherein different relative quantities indicate
tumor heterogeneity.
[0018] In an aspect provided herein is a system comprising a
computer readable medium comprising machine-executable code that,
upon execution by a computer processor, implements a method
comprising: (a) receiving into memory sequence reads of
polynucleotides mapping to a genetic locus; (b) determining, among
said sequence reads, identity of bases that are different than a
base of a reference sequence at the locus of the total number of
sequence reads mapping to a locus; (c) reporting the identity and
relative quantity of the determined bases and their location in the
genome; and (d) inferring heterogeneity of a given sample based on
information in (c). In some embodiments, the method implemented
further comprises receiving into memory sequence reads derived from
samples at a plurality of different times and calculating a
difference in relative amount and identity of a plurality of bases
between the two samples.
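Steps (b) and (c) of the system, together with the comparison between samples taken at different times, might look like the following sketch. It assumes the reads have already been aligned so that the base each read shows at the locus is available as a single character; the function names, the record layout, and the locus label are illustrative assumptions.

```python
from collections import Counter


def report_variants(bases_at_locus, reference_base, locus):
    """Report identity, relative quantity and location of non-reference bases.

    bases_at_locus holds the base observed at the locus in each mapped read.
    The relative quantity is the fraction of all reads mapping to the locus.
    """
    total = len(bases_at_locus)
    if total == 0:
        return []
    counts = Counter(b for b in bases_at_locus if b != reference_base)
    return [
        {"locus": locus, "base": base, "fraction": n / total}
        for base, n in sorted(counts.items())
    ]


def fraction_change(earlier, later):
    """Difference in relative quantity of each reported base between samples."""
    a = {(r["locus"], r["base"]): r["fraction"] for r in earlier}
    b = {(r["locus"], r["base"]): r["fraction"] for r in later}
    return {key: b.get(key, 0.0) - a.get(key, 0.0) for key in sorted(set(a) | set(b))}


first = report_variants(list("AAAAAAAATG"), "A", "locus1")
second = report_variants(list("AAAAAATTTT"), "A", "locus1")
change = fraction_change(first, second)
```

A base absent from one sample is treated as having a relative quantity of zero there, so a variant that appears or disappears between time points still shows up in the comparison.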
[0019] In an aspect provided herein is a kit comprising a first
pharmaceutical drug and a second pharmaceutical drug, wherein a
combination of the first drug and the second drug is more
therapeutically effective against a cancer presenting with a first
and a second somatic mutation than it is against a cancer
presenting with any one, but not all, of the somatic mutations. In
some embodiments, the combination is contained in a mixture or each
drug is contained in a separate container.
[0020] In an aspect provided herein is a method comprising: (a)
performing biomolecular analysis of biomolecular polymers from
disease cells (e.g., spatially distinct disease cells) from a
subject; (b) identifying and quantifying biomolecular variants in
the biomolecular macromolecules; (c) developing a profile of
disease cell heterogeneity in the subject indicating the presence
and relative quantity of a plurality of the variants in the
biomolecular macromolecules, wherein different relative quantities
indicate disease cell heterogeneity; and (d) determining a
therapeutic intervention for a disease exhibiting the disease cell
heterogeneity, wherein the therapeutic intervention is effective
against a disease having the profile of disease cell heterogeneity
determined. In some embodiments, the disease cells are spatially
distinct disease cells. In some embodiments, the therapeutic
intervention is determined based on a database of interventions
shown to be therapeutic for cancers having tumor heterogeneity
characterized by each of the somatic mutations.
[0021] In an aspect provided herein is a method of detecting disease cell
heterogeneity in a subject comprising: a) quantifying
polynucleotides that bear a sequence variant at each of a plurality
of genetic loci in polynucleotides from a sample from the subject,
wherein the sample comprises polynucleotides from somatic cells and
from disease cells; b) determining for each locus a measure of copy
number variation (CNV) for polynucleotides bearing the sequence
variant; c) determining for each locus a weighted measure of
quantity of polynucleotides bearing a sequence variant at the locus
as a function of CNV at the locus; and d) comparing the weighted
measures at each of the plurality of loci, wherein different
weighted measures indicate disease cell heterogeneity. In some
embodiments, the disease cells are tumor cells. In some
embodiments, polynucleotides comprise cfDNA.
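Steps (c) and (d) of this method can be illustrated under a simple assumption. The disclosure says only that the weighted measure is a function of CNV at the locus; the sketch below assumes, purely for illustration, that the variant fraction is divided by a copy number ratio (1.0 meaning no copy number change). The function names and the `rel_tolerance` threshold are likewise hypothetical.

```python
def weighted_measures(loci):
    """Weight the variant fraction at each locus by copy number at that locus.

    loci maps a locus label to (variant_molecules, total_molecules, copy_ratio).
    Dividing by copy_ratio is an assumed weighting, not the disclosed formula.
    """
    measures = {}
    for label, (variant, total, copy_ratio) in loci.items():
        if total <= 0 or copy_ratio <= 0:
            raise ValueError(f"invalid counts or copy ratio for {label}")
        measures[label] = (variant / total) / copy_ratio
    return measures


def indicates_heterogeneity(measures, rel_tolerance=0.25):
    """Return True when weighted measures differ by more than the tolerance."""
    values = list(measures.values())
    if len(values) < 2:
        return False
    return (max(values) - min(values)) > rel_tolerance * max(values)


amplified = weighted_measures({"locusA": (200, 1000, 2.0), "locusB": (100, 1000, 1.0)})
mixed = weighted_measures({"locusA": (200, 1000, 1.0), "locusB": (20, 1000, 1.0)})
```

In the first example the higher variant fraction at locusA is explained by amplification, so the weighted measures agree; in the second they differ, which under this sketch indicates heterogeneity.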
[0022] In an aspect provided herein is a method comprising: a)
subjecting a subject to one or more pulsed therapy cycles, each
pulsed therapy cycle comprising: (i) a first period during which
one or more drugs is administered at a first amount and (ii) a
second period during which the one or more drugs is administered at
a second, reduced (e.g., completely not administered) amount;
wherein: (A) the first period is characterized by a tumor burden
detected above a first clinical level; and (B) the second period is
characterized by a tumor burden detected below a second clinical
level. In some embodiments, tumor burden is measured as a function
of a quantity of a selected somatic variant in tumor
polynucleotides. In some embodiments, one or more drugs is a
plurality of drugs and each amount of each drug in each cycle is
determined as a function of tumor burden measured as a function of
a quantity of each of a plurality of different selected somatic
variants in tumor polynucleotides. In some embodiments, the method
comprises subjecting the subject to a plurality of pulsed therapy
cycles. In some embodiments, the method further comprises: b) when
the subject exhibits resistance to the one or more drugs,
subjecting the subject to one or more pulsed therapy cycles, each
pulsed therapy cycle comprising: (i) a first period during which a
different one or more drugs is administered at a first amount and
(ii) a second period during which the different one or more drugs
is administered at a second, reduced (e.g., completely not
administered) amount; wherein: (A) the first period is
characterized by a tumor burden detected above a first clinical
level; and (B) the second period is characterized by a tumor burden
detected below a second clinical level.
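The alternation between the two periods of a pulsed therapy cycle can be sketched as a two-threshold switch. The function name `pulsed_schedule`, the parameter names, and the numeric values in the example are hypothetical; the disclosure does not fix the clinical levels or the amounts, and the sketch assumes the first clinical level is at or above the second.

```python
def pulsed_schedule(burden_series, upper_level, lower_level,
                    full_amount, reduced_amount=0.0):
    """Choose an amount for each tumor burden measurement in a series.

    The first (full) amount starts once burden is detected above upper_level
    and continues until burden is detected below lower_level, after which the
    second (reduced) amount applies until burden again exceeds upper_level.
    """
    if lower_level > upper_level:
        raise ValueError("lower_level must not exceed upper_level")
    amounts = []
    dosing = False
    for burden in burden_series:
        if burden > upper_level:
            dosing = True       # enter the first period
        elif burden < lower_level:
            dosing = False      # enter the second period
        amounts.append(full_amount if dosing else reduced_amount)
    return amounts


schedule = pulsed_schedule([5, 12, 8, 3, 6, 11], upper_level=10,
                           lower_level=4, full_amount=1.0)
```

Using two separate levels means a measurement that falls between them leaves the current period unchanged, so the schedule does not flip on every small fluctuation in measured burden.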
[0023] In an aspect provided herein is a method comprising: (a)
sequencing polynucleotides from cancer cells from a subject; (b)
identifying and quantifying somatic mutations in the
polynucleotides; and (c) developing a profile of tumor
heterogeneity in the subject for use in determining a therapeutic
intervention effective for a cancer exhibiting tumor heterogeneity,
wherein the profile indicates the presence and relative quantity of
a plurality of the somatic mutations in the polynucleotides,
wherein different relative quantities indicate tumor
heterogeneity.
[0024] In an aspect provided herein is a method comprising
providing a therapeutic intervention for a subject wherein the
therapeutic intervention is determined from a profile of disease
cell heterogeneity in the subject, wherein the profile indicates
the presence and relative quantity of a plurality of the somatic
mutations in the polynucleotides, wherein different relative
quantities indicate disease cell heterogeneity; and wherein the
therapeutic intervention is effective against a disease having the
profile of disease cell heterogeneity determined, e.g., more
effective against a disease presenting with the plurality of
somatic mutations than it is against a disease presenting with any
one, but not all, of the somatic mutations.
[0025] In an aspect provided herein is a method comprising: a)
determining a measure of deviation from a value of central tendency
(e.g., standard deviation, variance) of copy number in
polynucleotides in a sample across a region of at least 1 kb, at
least 10 kb, at least 100 kb, at least 1 Mb, at least 10 Mb or at
least 100 Mb of a genome; b) inferring a measure of burden of DNA
from cells undergoing cell division in the sample based on the
measure of deviation. In some embodiments, the value of central
tendency is mean, median or mode. In some embodiments, determining
comprises partitioning the region into a plurality of
non-overlapping intervals, determining a measure of copy number at
each interval and determining the measure of deviation based on
measures of copy number at each interval. In some embodiments, the
interval is no more than any of 1 base, 10 bases, 100 bases, 1 kb
or 10 kb.
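The partitioning and deviation steps can be sketched as follows. The sketch assumes that the number of molecules starting in each interval serves as the measure of copy number and that the population standard deviation is the chosen measure of deviation; the function name and parameters are illustrative, and inferring burden from the resulting value is left out.

```python
import statistics


def copy_number_deviation(read_starts, region_start, region_end, interval):
    """Partition a region into non-overlapping intervals and measure spread.

    Returns the per-interval molecule counts and their population standard
    deviation. Counting molecule start positions per interval is an assumed
    proxy for copy number.
    """
    n_bins = (region_end - region_start) // interval
    if n_bins < 2:
        raise ValueError("region must span at least two intervals")
    counts = [0] * n_bins
    for pos in read_starts:
        idx = (pos - region_start) // interval
        if 0 <= idx < n_bins:   # ignore positions outside the region
            counts[idx] += 1
    return counts, statistics.pstdev(counts)


counts, deviation = copy_number_deviation(
    [0, 1, 2, 10, 11, 25], region_start=0, region_end=30, interval=10
)
```

Perfectly even coverage gives a deviation of zero, so under this sketch a larger value reflects more uneven copy number across the region.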
[0026] In an aspect provided herein is a method of inferring a
measure of burden of DNA from cells undergoing cell division in a
sample comprising measuring copy number variation induced by
proximity of one or more genomic loci to cells' origins of
replication, wherein increased CNV indicates cells undergoing cell
division. In some embodiments, the burden is measured in cell-free
DNA. In some embodiments, the measure of burden relates to the
fraction of tumor cells or genome-equivalents of DNA from tumor
cells in the sample. In some embodiments, CNV due to proximity to
origins of replication is inferred from a set of control samples or
cell lines. In some embodiments, a hidden Markov model, regression
model, principal component analysis-based model, or
genotype-modified model is used to approximate variations due to
origins of replication. In some embodiments, the measure of burden
is presence or absence of cells undergoing cell division. In some
embodiments, proximity is within 1 kb of an origin of
replication.
[0027] In an aspect provided herein is a method of increasing
sensitivity and/or specificity of determining gene-related
copy-number variations by ameliorating the effect of variations due
to proximity to origins of replication. In some embodiments, the
method comprises measuring CNV at a locus, determining amount of
CNV due to proximity of the locus to an origin of replication, and
correcting the measured CNV to reflect genomic CNV, e.g., by
subtracting amount of CNV attributable to cell division. In some
embodiments, the genomic data is obtained from cell-free DNA. In
some embodiments, the measure of burden relates to the fraction of
tumor cells or genome-equivalents of DNA in a sample. In some
embodiments, variations due to origins of replication are inferred
from a set of control samples or cell lines. In some embodiments, a
hidden Markov model, regression model, principal component
analysis-based model, or genotype-modified model is used to
approximate variations due to origins of replication.
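The correction by subtraction can be illustrated with a deliberately simple sketch. Instead of the model types named above, it assumes the component attributable to proximity to an origin of replication is estimated as the mean copy number across control samples minus a diploid baseline of 2.0. That estimator, the function names, and the example values are assumptions made for illustration.

```python
from statistics import mean


def replication_component(control_samples, baseline=2.0):
    """Estimate per-locus CNV attributable to origins of replication.

    control_samples is a list of dicts mapping locus label to measured copy
    number. Using the control mean minus a baseline is an assumed estimator.
    """
    loci = control_samples[0].keys()
    return {
        locus: mean(sample[locus] for sample in control_samples) - baseline
        for locus in loci
    }


def correct_cnv(measured, component):
    """Subtract the replication-attributable component from measured CNV."""
    return {
        locus: value - component.get(locus, 0.0)
        for locus, value in measured.items()
    }


controls = [{"locusA": 2.2, "locusB": 2.0}, {"locusA": 2.4, "locusB": 2.0}]
component = replication_component(controls)
corrected = correct_cnv({"locusA": 4.3, "locusB": 2.0}, component)
```

A locus with no estimate in the control set is left uncorrected, which keeps the sketch conservative when control data are incomplete.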
[0028] In an aspect provided herein is a method comprising: a)
determining a baseline measure of copies of DNA molecules at one or
more loci from one or more control samples, wherein one or more of
the loci includes an origin of replication, each containing DNA
from cells undergoing a predetermined level of cell division; b)
determining a test measure of DNA molecules in a test sample,
wherein the measure in the test sample is from one or more loci
partitioned into one or more partitions and wherein one or more of
the loci includes an origin of replication; c) comparing the test
measure and the baseline measure, wherein a test measure above a
baseline measure indicates DNA in the test sample from cells
dividing at a rate faster than cells providing DNA to the control
sample. In some embodiments, the measure is selected from molecule
count, a measure of central tendency of molecule count across
partitions or a measure of variation of molecule count across
partitions.
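The comparison in paragraph [0028] can be illustrated with a short sketch. It assumes molecule counts are available per partition for the test sample and for each control sample; the function names, the choice of population variance and the way control samples are averaged into a single baseline are assumptions made for illustration.

```python
from statistics import mean, pvariance


def summarize(counts):
    """Summarize per-partition molecule counts with the three measures named above."""
    return {
        "count": sum(counts),            # total molecule count
        "mean": mean(counts),            # central tendency across partitions
        "variance": pvariance(counts),   # variation across partitions
    }


def exceeds_baseline(test_counts, control_counts_list, measure="mean"):
    """True if the test measure is above the baseline averaged over control samples."""
    baseline = mean([summarize(c)[measure] for c in control_counts_list])
    return summarize(test_counts)[measure] > baseline


# Hypothetical counts at loci that include an origin of replication.
controls = [[10, 10, 10, 10], [11, 9, 10, 10]]
dividing = exceeds_baseline([10, 14, 10, 15], controls, measure="mean")
```

A result of `True` corresponds to the case in which the test sample appears to contain DNA from cells dividing faster than those contributing to the controls.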
[0029] In an aspect provided herein is a method comprising: (a)
administering to a subject an intervention that increases an amount
of tumor-derived DNA in the subject's circulation; and (b) when
said amount is increased, collecting from the subject a sample
containing tumor-derived DNA. In some embodiments, the intervention
preferentially kills tumor cells. In some embodiments, the
intervention comprises exposing the subject or suspected diseased
areas of the subject to radiation. In some embodiments, the
intervention comprises exposing the subject or suspected diseased
areas of the subject to ultrasound. In some embodiments, the
intervention comprises exposing the subject or suspected diseased
areas of the subject to physical agitation. In some embodiments, the
intervention comprises administering to the subject a low dose of
chemotherapy. In some embodiments, the method comprises
administering the intervention to the subject within 1 week before
collecting the sample. In some embodiments, the sample is selected
from blood, plasma, serum, urine, saliva, cerebrospinal fluid,
vaginal secretion, mucus and semen.
[0030] In an aspect provided herein is a method comprising
compiling a database, wherein the database includes, for each of a
plurality of subjects having cancer, tumor genomic testing data,
including somatic alterations, collected at two or more time
intervals per subject, one or more therapeutic interventions
administered to each of the subjects at one or more times and
efficacy of the therapeutic interventions, wherein the database is
useful to infer efficacy of the therapeutic interventions in
subjects with a tumor genomic profile. In some embodiments, the
plurality is at least 50, at least 500 or at least 5000. In some
embodiments, the tumor genomic testing data is collected via serial
biopsy, cell-free DNA, cell-free RNA or circulating tumor cells. In
some embodiments, relative frequencies of detected genetic variants
are used to classify treatment efficacy. In some embodiments,
additional information is used to help classify treatment efficacy,
including but not limited to, weight, adverse treatment effects,
histological testing, blood testing, radiographic information,
prior treatments, and cancer type. In some embodiments, treatment
response per patient is collected and classified quantitatively
through additional testing. In some embodiments, the additional
testing is blood or urine based testing.
[0031] In an aspect provided herein is a method comprising use of a
database to identify one or more effective therapeutic
interventions for a subject having cancer, wherein the database
includes, for each of a plurality of subjects having cancer, tumor
genomic testing data, including somatic alterations, collected at
two or more time intervals per subject, one or more therapeutic
interventions administered to each of the subjects at one or more
times and efficacy of the therapeutic interventions. In some
embodiments, identified therapeutic interventions are stratified by
efficacy. In some embodiments, quantitative bounds on predicted
therapeutic interventions efficacy or lack thereof are reported. In
some embodiments, the therapeutic interventions use information of
predicted tumor genomic evolution or acquired resistance mechanisms
in similar patients in response to treatment.
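One way such a database might be queried is sketched below. The record layout, the `efficacy` score on a 0 to 1 scale, and the matching rule (a stored subject matches when it carries every alteration in the query profile) are all assumptions chosen for illustration; the disclosure does not prescribe a schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class TreatmentRecord:
    subject_id: str
    alterations: frozenset   # somatic alterations detected before treatment
    intervention: str
    efficacy: float          # hypothetical response score in [0, 1]


def rank_interventions(records, profile):
    """Stratify interventions by mean efficacy among subjects matching the profile.

    Returns a list of (intervention, mean efficacy, number of matching records),
    sorted from highest to lowest mean efficacy.
    """
    scores = defaultdict(list)
    for rec in records:
        if profile <= rec.alterations:   # record carries every queried alteration
            scores[rec.intervention].append(rec.efficacy)
    ranked = [(name, mean(vals), len(vals)) for name, vals in scores.items()]
    ranked.sort(key=lambda row: row[1], reverse=True)
    return ranked


# Hypothetical records; drug names are placeholders.
records = [
    TreatmentRecord("s1", frozenset({"EGFR L858R"}), "drug A", 0.5),
    TreatmentRecord("s2", frozenset({"EGFR L858R", "TP53 R273H"}), "drug A", 1.0),
    TreatmentRecord("s3", frozenset({"EGFR L858R"}), "drug B", 0.25),
    TreatmentRecord("s4", frozenset({"KRAS G12D"}), "drug B", 1.0),
]
ranking = rank_interventions(records, frozenset({"EGFR L858R"}))
```

Reporting the number of matching records alongside each mean gives a rough sense of how much evidence supports each stratum; formal quantitative bounds would need a statistical model not shown here.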
[0032] In some embodiments, the method comprises classifying
effectiveness of treatment using a classification algorithm, e.g.,
linear regression processes (e.g., multiple linear regression
(MLR), partial least squares (PLS) regression and principal
components regression (PCR)), binary decision trees (e.g.,
recursive partitioning processes such as CART--classification and
regression trees), artificial neural networks such as back
propagation networks, discriminant analyses (e.g., Bayesian
classifier or Fisher analysis), logistic classifiers, and support
vector classifiers (e.g., support vector machines).
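As an illustration of one of the listed options, a logistic classifier can be written in a few lines. The sketch below is a generic textbook implementation trained by batch gradient descent, not the method of the disclosure; the single feature (change in mutant allele frequency between two test points) and the toy labels are hypothetical.

```python
import math


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(features, labels, lr=0.5, epochs=500):
    """Fit weights and bias of a logistic classifier by batch gradient descent."""
    n = len(features)
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            err = _sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    """Return 1 (classified as effective) or 0 (classified as not effective)."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if _sigmoid(z) >= 0.5 else 0


# Toy data: feature is the change in mutant allele frequency after treatment;
# label 1 marks a treatment judged effective (allele frequency fell).
X = [[-0.8], [-0.5], [-0.6], [0.4], [0.7], [0.9]]
y = [1, 1, 1, 0, 0, 0]
weights, bias = train_logistic(X, y)
```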
[0033] In an aspect disclosed herein is a method to report results
of one or more genetic tests comprising: capturing genetic
information including genetic variants and quantitative measures
thereof over one or more test points using a genetic analyzer;
normalizing the quantitative measures for rendering with the one or
more test points and generating a scaling factor; applying the
scaling factor to render a tumor response map; and generating a
summary of genetic variants. In some embodiments, the method
comprises analyzing non-CNV (copy number variation) mutant allele
frequencies. In some embodiments, the method comprises transforming
an absolute value into a relative metric for rendering the tumor
response map. In some embodiments, the method comprises multiplying
a mutant allele frequency by a predetermined value and taking a log
thereof. In some embodiments, the method comprises: multiplying the
scaling factor by a transformed value for each gene to determine a
quantity indicator to be rendered on the tumor response map; and
assigning a unique visual indicator for each alteration in a visual
panel. In some embodiments, the method comprises Y-centering or
vertically centering the quantity indicator in a contiguously
placed panel that indicates continuity. In some embodiments, the
assigning further comprises providing a unique color for each
alteration.
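The transform and scaling steps of paragraph [0033] might be sketched as follows. The multiplier of 100, the base-10 logarithm, the clamping of negative transformed values to zero and the panel height of 100 units are assumptions for illustration; the scaling factor is derived here from the maximum transformed value, which is one option among several.

```python
import math


def transform(maf, multiplier=100.0):
    """Multiply a mutant allele frequency by a predetermined value and take a log.

    Absent alterations (maf <= 0) and values whose log would be negative map to 0.
    """
    if maf <= 0:
        return 0.0
    return max(0.0, math.log10(maf * multiplier))


def quantity_indicators(mafs, panel_height=100.0, multiplier=100.0):
    """Map alteration -> rendered height, scaled so the largest fills the panel."""
    transformed = {name: transform(v, multiplier) for name, v in mafs.items()}
    largest = max(transformed.values(), default=0.0)
    scale = panel_height / largest if largest > 0 else 0.0   # scaling factor
    return {name: scale * t for name, t in transformed.items()}


# Hypothetical mutant allele frequencies, in percent.
heights = quantity_indicators(
    {"EGFR L858R": 10.0, "TP53 R273H": 1.0, "KRAS G12D": 0.0}
)
```

The logarithm compresses the range so that low-frequency alterations remain visible next to dominant ones, which is the usual reason for converting an absolute value into a relative metric before rendering.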
[0034] In some embodiments, the method comprises analyzing genetic
information from another test point or test time. In some
embodiments, wherein a new test result does not differ from a prior
test result, the method comprises rendering the prior visual panel.
In some embodiments, wherein if alterations remain the same, but
quantities have changed, the method comprises: maintaining the
order and unique visual indicator for each alteration; and
determining a new quantity indicator and generating a new visual
panel for all test points. In some embodiments, the method
comprises determining a new alteration in the genetic information
and adding the alteration to the top of existing alterations. In
some embodiments, the method comprises determining a new alteration
in the genetic information and determining new transform values and
scaling factor and assigning a unique visual indicator for each new
alteration. In some embodiments, the method comprises determining a
new alteration in the genetic information and re-generating the
tumor response map including alterations from a prior test point
that are still detected in current test point and the new
alteration. In some embodiments, the method comprises determining
if a prior alteration is no longer present and if so, comprising
using a height of zero when rendering the quantity of the
alteration of the prior alteration for subsequent test points. In
some embodiments, the method comprises determining if a prior
alteration is no longer present and if so, reserving the unique
visual indicator associated with the prior alteration from future
use.
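The bookkeeping rules of paragraph [0034] can be sketched as a small state object. The class name, the colour list and the tuple layout are hypothetical; the sketch assumes quantities have already been transformed and scaled.

```python
class ResponseMapState:
    """Tracks stacking order and visual indicators across test points."""

    def __init__(self, palette):
        self.order = []               # alterations, bottom to top
        self.colors = {}              # alteration -> assigned colour
        self._unused = list(palette)  # colours never assigned so far

    def add_test_point(self, quantities):
        """Return [(alteration, colour, height)] for one test point, bottom to top."""
        new = [a for a, q in quantities.items() if q > 0 and a not in self.colors]
        # New alterations go on top of existing ones, largest first among themselves.
        for alt in sorted(new, key=lambda a: -quantities[a]):
            if not self._unused:
                raise ValueError("palette exhausted")
            self.colors[alt] = self._unused.pop(0)
            self.order.append(alt)
        # Order and colours are kept; an alteration no longer detected gets height 0
        # and its colour is never returned to the pool, so it stays reserved.
        return [(a, self.colors[a], quantities.get(a, 0.0)) for a in self.order]


state = ResponseMapState(["red", "green", "blue"])
panel_1 = state.add_test_point({"A": 5.0, "B": 2.0})
panel_2 = state.add_test_point({"B": 3.0, "C": 1.0})
```

In this example alteration A disappears at the second test point, so it is rendered with zero height while keeping its colour, and the newly detected alteration C is placed at the top with a fresh colour.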
[0035] In some embodiments, the method comprises analyzing CNV
mutant allele frequencies and methylation mutant allele
frequencies. In some embodiments, the method comprises grouping of
maximum mutant allele frequencies for rendering first on the tumor
response map. In some embodiments, the method comprises rendering
alterations for the gene in decreasing mutant allele frequency
order of alterations. In some embodiments, the method comprises
rendering alterations for the gene in a decreasing order. In some
embodiments, the method comprises selecting a next gene with next
highest mutant allele frequency.
[0036] In some embodiments, for each reported alteration, the
method comprises generating a trend indicator for the alteration
over the different test points. In some embodiments, the method
comprises generating a summary of alterations. In some embodiments,
the method comprises generating a summary of treatment options. In
some embodiments, the method comprises generating a summary of
mutant allele frequency, cell free amplification, clinical approval
indication, and clinical trial. In some embodiments, the method
comprises generating a panel based on a biological pathway. In some
embodiments, the method comprises generating a panel based on an
evidence level. In some embodiments, the genetic information
includes one or more of single-nucleotide variations, copy number
variations, insertions and deletions, and gene rearrangements. In
some embodiments, the method comprises generating a clinical
relevance report on detected alterations. In some embodiments, the
method comprises generating a therapy result summary.
[0037] In an aspect provided herein is a method to generate a
genetic report comprising: generating non-copy number variation
(CNV) data using a genetic analyzer; determining a scaling factor
for each non-CNV mutant allele frequency; for a first test,
generating a visual panel for each non-CNV alteration using the scaling
factor; and for each subsequent test, generating changes in the
non-CNV alteration for the visual panel using the scaling
factor.
[0038] In some embodiments, the method comprises transforming an
absolute value into a relative metric for rendering. In some
embodiments, the method comprises multiplying a mutant allele
frequency by a predetermined value and taking a log of the
product. In some embodiments, the method comprises
determining a scaling factor using a maximum observed value. In
some embodiments, for each non-CNV alteration, the method comprises
multiplying a scaling factor by a transformed value for each gene
variant as a quantity indicator for visualizing the gene
variant.
[0039] In some embodiments, the method comprises assigning a unique
visual indicator for each alteration. In some embodiments, for the
subsequent test, the method comprises using the visual panel if the
test result is unchanged. In some embodiments, if alterations
remain the same in the subsequent test, the method comprises
maintaining the order and unique visual indicator for each
alteration; and recomputing a quantity indicator for visualizing
that variant and re-rendering updated values in existing panel(s)
and new panel for the latest test. In some embodiments, if a new
alteration is found in the subsequent test, the method comprises
adding the alteration to the top of all existing alterations;
computing transform values and the scaling factor; and assigning a
unique visual indicator for each new alteration.
[0040] In some embodiments, the method comprises: re-rendering
alterations in the prior test point and the new alteration; and
vertically centering an image of the alterations in a contiguously
placed panel that indicates continuity. In some embodiments, if a
prior alteration is not present in a subsequent test, the method
comprises using a height of zero as the quantity of the alteration
for a subsequent rendering. In some embodiments, the method
comprises rendering subject or intervention information associated
with alteration changes. In some embodiments, the method comprises
identifying an alteration with the maximum Mutant Allele
Frequency.
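Vertical centering of a stack, as described in paragraph [0040], amounts to starting the stack at minus half its total height. The following sketch assumes heights are listed bottom to top and that the panel's vertical midpoint is at y = 0; both conventions are assumptions for illustration.

```python
def centered_layers(heights):
    """Return (lower, upper) y-bounds for each layer so the stack is centred on y = 0."""
    y = -sum(heights) / 2.0
    layers = []
    for h in heights:
        layers.append((y, y + h))
        y += h
    return layers


# Three alterations with heights 2, 4 and 2 occupy the band from -4 to +4.
layers = centered_layers([2.0, 4.0, 2.0])
```

Because each test point is centered on the same midline, adjacent panels join smoothly, which gives the streamgraph-like appearance of continuity between test points.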
[0041] In some embodiments, the method comprises: reporting
alterations for that gene in decreasing mutant allele frequency
order of non-CNV alterations; and reporting CNV alterations for
that gene in decreasing order of CNV value. In some embodiments,
the method comprises selecting the next gene with next highest
non-CNV mutant allele frequency and reporting alterations for that
gene in decreasing mutant allele frequency order of non-CNV
alterations; and reporting CNV alterations for that gene in
decreasing order of CNV value.
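The reporting order of paragraph [0041] can be expressed as a two-level sort. In this sketch each alteration is a tuple of (gene, label, kind, value), where kind is "cnv" or anything else for non-CNV alterations; this layout, and placing genes with only CNV alterations last, are assumptions.

```python
def report_order(alterations):
    """Order alterations gene by gene, as described above.

    Genes are ranked by their highest non-CNV mutant allele frequency. Within a
    gene, non-CNV alterations come first in decreasing frequency, followed by
    CNV alterations in decreasing CNV value.
    """
    by_gene = {}
    for alt in alterations:
        by_gene.setdefault(alt[0], []).append(alt)

    def top_maf(gene):
        return max((a[3] for a in by_gene[gene] if a[2] != "cnv"), default=0.0)

    ordered = []
    for gene in sorted(by_gene, key=top_maf, reverse=True):
        group = by_gene[gene]
        non_cnv = sorted((a for a in group if a[2] != "cnv"),
                         key=lambda a: a[3], reverse=True)
        cnv = sorted((a for a in group if a[2] == "cnv"),
                     key=lambda a: a[3], reverse=True)
        ordered.extend(non_cnv + cnv)
    return ordered


# Hypothetical findings: (gene, label, kind, value)
findings = [
    ("TP53", "R273H", "snv", 3.1),
    ("EGFR", "amplification", "cnv", 4.2),
    ("EGFR", "T790M", "snv", 1.5),
    ("EGFR", "L858R", "snv", 12.0),
]
ordered = report_order(findings)
```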
[0042] In some embodiments, the method comprises rendering a trend
indicator for an alteration over different test dates. In some
embodiments, the method comprises grouping of maximum mutant allele
frequencies and generating annotations including biological
pathways or evidence level. In some embodiments, the method
comprises generating a panel based on an evidence level. In some
embodiments, the method comprises generating a panel based on a
biological pathway. In some embodiments, the genetic information
includes one or more of single-nucleotide variations, copy number
variations, insertions and deletions, and gene rearrangements.
[0043] In an aspect provided herein is a method comprising: a)
providing a plurality of nucleic acid samples from a subject, the
samples collected at serial time points; b) sequencing
polynucleotides from the samples to generate sequences; c)
determining a quantitative measure of each of a plurality of
genetic variants among the polynucleotides in each sample; d)
graphically representing by computer relative quantities of genetic
variants at each serial time point for those somatic mutations
present at a non-zero quantity at at least one of the serial time
points. In some embodiments, the quantitative measure is the
frequency of the genetic variant among all sequences mapping to the
same genetic locus. In some embodiments, the relative quantities
are represented as a stacked area graph. In some embodiments, the
relative quantities are stacked, at the earliest time point,
highest to lowest from the bottom to the top of the graph, and
wherein a genetic variant first appearing at a non-zero quantity at
a later time point is stacked at the top of the graph. In some
embodiments, the areas are represented by different colors. In some
embodiments, the graphical representation further indicates, for
each time point, the quantitative measure of the predominant
genetic variant. In some embodiments, the graphical representation
further includes a key identifying genetic variants represented on
the graph. In some embodiments, graphically representing comprises
normalizing and scaling the quantitative measures.
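The stacking convention of paragraph [0043] can be sketched without any plotting library by computing the lower and upper bound of each variant's band at every time point. The input layout (variant name mapped to a list of quantities, one per time point) and the function names are assumptions for illustration.

```python
def stack_order(series):
    """Bottom-to-top order: highest first at the earliest point, later arrivals on top."""
    first_seen = {}
    for name, values in series.items():
        idx = next((i for i, v in enumerate(values) if v > 0), None)
        if idx is not None:          # variants never seen at non-zero quantity are dropped
            first_seen[name] = idx
    return sorted(first_seen, key=lambda n: (first_seen[n], -series[n][first_seen[n]]))


def stacked_bands(series):
    """Return (order, bands) where bands[name][t] is the (lower, upper) bound at time t."""
    order = stack_order(series)
    n_points = len(series[order[0]]) if order else 0
    bands = {name: [] for name in order}
    for t in range(n_points):
        base = 0.0
        for name in order:
            top = base + series[name][t]
            bands[name].append((base, top))
            base = top
    return order, bands


# Hypothetical variant frequencies at three serial time points.
series = {
    "TP53": [2.0, 1.0, 0.0],
    "EGFR": [5.0, 3.0, 1.0],
    "KRAS": [0.0, 0.5, 4.0],
    "MET": [0.0, 0.0, 0.0],
}
order, bands = stacked_bands(series)
```

The resulting bands could be passed to any charting tool that fills the area between two curves; the variant that first appears at a later time point (KRAS here) ends up at the top of the stack.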
[0044] In some embodiments, the polynucleotides comprise cfDNA. In
some embodiments, the loci are located in oncogenes. In some
embodiments, the plurality of the genetic variants maps to a
different gene in the genome. In some embodiments, the plurality of
the genetic variants maps to the same gene in the genome. In some
embodiments, at least 10 different oncogenes are sequenced.
[0045] In some embodiments, determining comprises receiving the
sequences into computer memory and using a computer processor to
execute software to determine the quantitative measurement. In some
embodiments, graphically representing comprises using a computer
processor to execute software that transforms the quantitative
measures into a graphical format and representing the graphical
format on an electronic graphical user interface, e.g., a display
screen.
[0046] In an aspect provided herein is a method to generate a paper
or electronic patient test report from data generated by a genetic
analyzer comprising: a) summarizing data from two or more testing
time points, whereby a union of all non-zero testing results is
reported at each subsequent test point after the first test; and b)
rendering the testing results on the paper or electronic patient
test report. In some embodiments, summarizing and rendering are
performed on a computer by executing code with a computer processor
to (i) identify all non-zero testing results, (ii) generate the
test report and (iii) display the test report on a graphical user
interface.
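One reading of the summarizing step in paragraph [0046] is that every alteration detected at a non-zero level at any test point so far remains listed at later test points, with a zero shown where it is no longer detected. The sketch below implements that reading, which is an interpretation rather than a definition taken from the text; the function name and data layout are hypothetical.

```python
def summarize_tests(tests):
    """Build one report row per test point.

    tests: chronological list of dicts mapping alteration -> measured value.
    Each row lists every alteration seen at a non-zero value up to and
    including that test point; alterations not detected at that point get 0.0.
    """
    seen = []   # running union, in order of first detection
    rows = []
    for result in tests:
        for alt, value in result.items():
            if value > 0 and alt not in seen:
                seen.append(alt)
        rows.append({alt: result.get(alt, 0.0) for alt in seen})
    return rows


rows = summarize_tests([
    {"A": 1.2, "B": 0.4},
    {"A": 0.0, "C": 0.7},
])
```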
[0047] In an aspect provided herein is a method of graphically
representing evolution of genetic variants of a tumor in a subject
from data generated by a genetic analyzer comprising: a) generating
by computer a stacked representation of genetic variants detected
at each of a plurality of time points in the subject, wherein a
height or width of each layer in the stack that corresponds to a
genetic variant represents a quantitative contribution of the
genetic variant to a total quantity of genetic variants at each
time point; and b) displaying the stacked representation on a
computer monitor or a paper report. In some embodiments, the method
further comprises using a combination of a magnitude of detected
genetic variants in a body-fluid based test to infer a disease
burden. In some embodiments, the method further comprises using
allele fractions of detected mutations, allelic imbalances, or
gene-specific coverage to infer the disease burden.
[0048] In some embodiments, an overall stack height is
representative of overall disease burden or a disease burden score
in the subject. In some embodiments, a distinct color is used to
represent each genetic variant. In some embodiments, only a subset
of detected genetic variants is plotted. In some embodiments, the
subset is chosen based on likelihood of being a driver alteration
or association with increased or reduced response to treatment.
[0049] In some embodiments, the method comprises producing a test
report for a genomic test. In some embodiments, a non-linear scale
is used for representing the heights or widths of each represented
genetic variant. In some embodiments, a plot of previous test
points is depicted on the report. In some embodiments, the method
comprises estimating a disease progression or remission based on
rate of change and/or quantitative precision of each testing
result. In some embodiments, the method comprises displaying a
therapeutic intervention between intervening testing points. In
some embodiments, displaying comprises: a) receiving data
representing the detected tumor genetic variants into computer
memory; b) executing code with a computer processor to graphically
represent the quantitative contribution of each genetic variant at
a time point as a line or area proportional to the relative
contribution; and c) displaying the graphical representation on a
graphical user interface.
BRIEF DESCRIPTION OF THE DRAWINGS
[0050] FIG. 1 shows a flow chart of an exemplary method of
determination and use of a therapeutic intervention.
[0051] FIG. 2 shows a flow chart of an exemplary method of
determining frequency of variants in a sample corrected based on
CNV at a locus.
[0052] FIG. 3 shows a flow chart of an exemplary method of
providing pulsed therapy cycles which can delay drug
resistance.
[0053] FIG. 4 shows a flow chart of an exemplary method of
detecting tumor burden using CNV at origins of replication to
detect DNA from dividing cells.
[0054] FIG. 5 shows an exemplary computer system.
[0055] FIG. 6 shows an exemplary scan of CNV across a region of a
genome from samples containing cells in a resting state and in a
state of cell division. No genomic CNV is seen in loci a and b, but
locus c shows gene duplication. In the resting state cells, copy
number is relatively equal in all intervals in the region, except
those intervals overlapping the locus of gene duplication. In the
sample containing DNA from tumor cells, which are undergoing cell
division, copy number appears to increase immediately after origins
of replication, providing variance in CNV over the region.
Deviation is particularly dramatic at a locus exhibiting CNV at an
origin of replication (c).
[0056] FIG. 7 shows an exemplary course of monitoring and treatment
of disease in a subject.
[0057] FIG. 8 shows an exemplary panel of 70 genes that exhibit
genetic variation in cancer.
[0058] FIG. 9A shows an exemplary system for communicating cancer
test results.
[0059] FIG. 9B shows an exemplary process to reduce error rates and
bias in DNA sequence readings and generate genetic reports for
users.
[0060] FIGS. 10A-10C show exemplary processes for reporting genetic
test results to users.
[0061] FIGS. 10D-10I-2 show pages from an exemplary genetic test
report.
[0062] FIGS. 10J-10P show various exemplary modified
streamgraphs.
[0063] FIGS. 11A-11B show exemplary processes for detecting
mutation and reporting test results to users.
DETAILED DESCRIPTION
[0064] Methods of the present disclosure can detect biomolecular
mosaicism (e.g., genetic mosaicism) in a biological sample, such as
a heterogeneous genomic population of cells or deoxyribonucleic
acid (DNA). Genetic mosaicism can exist at the organismal level.
For example, genetic variants that arise early in development can
result in different somatic cells having different genomes. An
individual can be a chimera, e.g., produced by the fusion of two
zygotes. Organ transplant from an allogeneic donor can result in
genetic mosaics, which also can be detected by examining
polynucleotides shed into the blood from the transplanted organ.
Disease cell heterogeneity, in which diseased cells have different
genetic variants, is another form of genetic mosaicism. Methods
provided herein can detect mosaicism and, in the case of disease,
provide therapeutic intervention. In certain embodiments, this
disclosure provides methods for performing body-wide profiling of
biomolecular mosaicism through the use of circulating
polynucleotides, which may derive or otherwise originate from cells
in diverse locations of the body of a subject.
[0065] Diseased cells, such as tumors, may evolve over time,
resulting in different clonal sub-populations having new genetic
and phenotypic characteristics. This may result from natural
mutations as the cells divide, or it may be driven by treatments
that target certain clonal sub-populations, allowing clones more
resistant to the treatment to proliferate by negative selection.
The existence of sub-populations of diseased cells that bear
different genotypic or phenotypic characteristics is referred to
herein as disease cell heterogeneity, or, in the case of cancer,
tumor heterogeneity.
[0066] Presently, cancers are treated based on mutant forms found
in a cancer biopsy. For example, the finding of Her2+ in even small
amounts of breast cancer cells may be indicative of breast cancer,
which may be followed through with a treatment using an anti-Her2
therapy. As another example, a colorectal cancer in which a KRAS
mutant is found in small amounts may be treated with a therapy to
which KRAS is responsive.
[0067] Tools for fine analysis of diseased cells (e.g., tumors)
allow detection of disease cell heterogeneity. Furthermore, the
analysis of polynucleotides sourced from diseased cells located
throughout the body allows for a whole-body profile of disease cell
heterogeneity. The use of cell-free DNA, or circulating DNA, is
particularly powerful because polynucleotides in the blood are not
sourced from physically localized cells. Rather, they include
polynucleotides from cells at metastatic sites throughout the body.
For example, analysis
may show that a population of breast-cancer cells includes 90% that
are Her2+ and 10% that are Her2-. This may be determined, for
example, by quantifying DNA for each form in a sample, e.g., cell
free DNA (cfDNA), thereby detecting heterogeneity in the tumor.
[0068] This information can be used by a health care provider,
e.g., a physician, to develop therapeutic interventions. For
example, a subject that has a heterogeneous tumor can be treated as
if they had two tumors, and a therapeutic intervention can treat
each of the tumors. The therapeutic intervention could include, for
example, a combination therapy including a first drug effective
against the first tumor type and a second drug effective against
the second tumor type. The drugs can be given in amounts that
reflect the relative amounts of the mutant forms detected. For
example, a drug to treat the mutant form that is found in higher
relative amounts can be delivered at greater dose than a drug to
treat the mutant form in lesser relative amount. Or, treatment for
the mutant in the lesser relative amount can be delayed or
staggered with respect to the mutant in greater amount.
[0069] Monitoring changes in the profile of disease cell
heterogeneity over time allows therapeutic intervention to be
calibrated to an evolving tumor. For example, analysis may show
increasing amounts of polynucleotides bearing drug resistance
mutants. In this case, the therapeutic intervention can be modified
to decrease the amount of drug effective to treat a tumor that does
not bear the resistance mutant and increase administration of a
drug that does treat a tumor bearing the resistance marker.
[0070] Therapeutic interventions can be determined by a healthcare
provider or by a computer algorithm, or a combination of the two. A
database can contain the results of therapeutic interventions
against diseases having various profiles of disease cell
heterogeneity. The database can be consulted in determining a
therapeutic intervention for a disease with a particular
profile.
[0071] The present disclosure provides, among other things,
methods of determining a therapeutic intervention for a subject
having a disease, such as cancer, that exhibits disease cell
heterogeneity, e.g., tumor heterogeneity. In one embodiment, the
method involves analyzing biological macromolecules (e.g.,
sequencing polynucleotides) of disease cells (e.g., spatially
distinct disease cells) from a subject having the disease. A
profile of disease cell heterogeneity is developed that indicates
the existence of genetic variants specific to the disease cells and
the amount of these variants relative to each other. This
information, in turn, is used to determine a therapeutic
intervention that takes the profile into account.
Disease Cells
[0072] A subject of the methods of this disclosure is any
multicellular organism. More specifically, the subject can be a
plant or an animal, a vertebrate, a mammal, a mouse, a primate, a
simian or a human. Animals include, but are not limited to, farm
animals, sport animals, and pets. A subject can be a healthy
individual, an individual that has or is suspected of having a
disease or a pre-disposition to the disease, or an individual that
is in need of therapy or suspected of needing therapy. A subject
can be a patient, e.g., a subject under the care of a professional
healthcare provider.
[0073] The subject can have a pathological condition (disease).
Cells exhibiting pathology of disease are referred to herein as
disease cells.
[0074] In particular, the disease can be a cancer. Cancer is a
condition characterized by abnormal cells that divide out of
control. Cancers include, without limitation, carcinomas, sarcomas,
leukemias, lymphomas, myelomas and central nervous system cancers.
More specific examples of cancers are breast cancer, prostate
cancer, colorectal cancer, brain cancer, esophageal cancer, head
and neck cancer, bladder cancer, gynecological cancer, liposarcoma,
and multiple myeloma.
[0075] Other cancers include, for example, acute lymphoblastic
leukemia (ALL), acute myeloid leukemia (AML), adrenocortical
carcinoma, Kaposi Sarcoma, anal cancer, basal cell carcinoma, bile
duct cancer, bladder cancer, bone cancer, osteosarcoma, malignant
fibrous histiocytoma, brain stem glioma, brain cancer,
craniopharyngioma, ependymoblastoma, ependymoma, medulloblastoma,
medulloepithelioma, pineal parenchymal tumor, breast cancer,
bronchial tumor, Burkitt lymphoma, Non-Hodgkin lymphoma, carcinoid
tumor, cervical cancer, chordoma, chronic lymphocytic leukemia
(CLL), chronic myelogenous leukemia (CML), colon cancer, colorectal
cancer, cutaneous T-cell lymphoma, ductal carcinoma in situ,
endometrial cancer, esophageal cancer, Ewing Sarcoma, eye cancer,
intraocular melanoma, retinoblastoma, fibrous histiocytoma,
gallbladder cancer, gastric cancer, glioma, hairy cell leukemia,
head and neck cancer, heart cancer, hepatocellular (liver) cancer,
Hodgkin lymphoma, hypopharyngeal cancer, kidney cancer, laryngeal
cancer, lip cancer, oral cavity cancer, lung cancer, non-small cell
carcinoma, small cell carcinoma, melanoma, mouth cancer,
myelodysplastic syndromes, multiple myeloma, medulloblastoma, nasal
cavity cancer, paranasal sinus cancer, neuroblastoma,
nasopharyngeal cancer, oral cancer, oropharyngeal cancer,
osteosarcoma, ovarian cancer, pancreatic cancer, papillomatosis,
paraganglioma, parathyroid cancer, penile cancer, pharyngeal
cancer, pituitary tumor, plasma cell neoplasm, prostate cancer,
rectal cancer, renal cell cancer, rhabdomyosarcoma, salivary gland
cancer, Sezary syndrome, skin cancer, nonmelanoma, small intestine
cancer, soft tissue sarcoma, squamous cell carcinoma, testicular
cancer, throat cancer, thymoma, thyroid cancer, urethral cancer,
uterine cancer, uterine sarcoma, vaginal cancer, vulvar cancer,
Waldenstrom macroglobulinemia, and/or Wilms Tumor.
[0076] A tumor is a collection of cancer cells (cancer disease
cells). This includes, for example, a collection of cells in a
single mass of cells (e.g., a solid tumor), a collection of cells
from different metastatic tumor sites (metastatic tumors), and
diffuse tumors (e.g., circulating tumor cells). A tumor can include
cells of a single cancer (e.g., colorectal cancer), or multiple
cancers (e.g., colorectal cancer and pancreatic cancer). A tumor
can include cells originating from a single original somatic cell
or from different somatic cells.
[0077] In certain embodiments, disease cells in the subject are
spatially distinct. Disease cells are spatially distinct if the
cells are located at least 1 cm, at least 2 cm, at least 5 cm or at
least 10 cm apart in a body, e.g., in different tissues or organs,
or the same tissue or organ. In the case of cancer, examples of
spatially distinct cancer cells include cancer cells from diffuse
cancers (such as leukemias), cancer cells at different metastatic
sites, and cancer cells from the same mass of tumor cells that are
separated by at least 1 cm.
[0078] Disease cell burden (e.g., "tumor burden") is a quantitative
measure of the amount of disease cells in a subject. One measure of
disease cell burden is the fraction of total biological
macromolecules in a sample that are disease biological
macromolecules, e.g., the relative amount of tumor polynucleotides
in a sample of cell free polynucleotides. For example, if cfDNA
from a first subject has 10% cancer polynucleotides, the subject
may be said to have a cell-free tumor burden of 10%. If cfDNA from
a second subject has 5% cancer polynucleotides, the second
subject may be said to have half the cell-free tumor burden of the
first subject. These measures are much more relevant on an
intra-subject basis than on an inter-subject basis, as cell-free
tumor burdens in one individual can be much higher or lower than
another individual despite differing levels of disease burden.
However, these measures can be used quite effectively for
monitoring disease burden within an individual, e.g., an increase
from a 5% to a 15% cell-free DNA tumor burden may indicate
significant progression of disease, while a decrease from 10% to 1%
may indicate partial response to treatment.
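The arithmetic behind these examples is a simple fraction and a ratio between two time points. The sketch below assumes counts of tumor-derived and total cell-free molecules are available; the function names are hypothetical, and no clinical threshold for progression or response is implied.

```python
def cell_free_tumor_burden(tumor_molecules, total_molecules):
    """Fraction of cell-free polynucleotides in a sample that are tumor-derived."""
    if total_molecules <= 0:
        raise ValueError("total_molecules must be positive")
    return tumor_molecules / total_molecules


def burden_fold_change(earlier, later):
    """Ratio of a later burden to an earlier burden in the same subject."""
    if earlier <= 0:
        raise ValueError("earlier burden must be positive")
    return later / earlier


# Same subject at two time points: 5% rising to 15% is a threefold increase.
before = cell_free_tumor_burden(50, 1000)
after = cell_free_tumor_burden(150, 1000)
change = burden_fold_change(before, after)
```

As the paragraph notes, such ratios are most meaningful when both measurements come from the same subject.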
[0079] Polynucleotides to be sequenced can be sourced from
spatially distinct sites. This includes polynucleotides sourced
from biopsies of different locations in a single tumor mass. It
also includes polynucleotides sourced from cells at different
metastatic tumor sites. Cells shed polynucleotides into the blood
where they are detectable as cell free polynucleotides (e.g.,
circulating tumor DNA). Cell free polynucleotides also can be found
in other bodily fluids such as urine. Therefore, cfDNA provides a
more accurate profile of tumor heterogeneity across the entire
disease cell population than DNA sourced from a single tumor
location. DNA sampled from cells across the disease cell population
in a body is referred to as "disease burden DNA" or, in the case of
cancer, "tumor burden DNA".
[0080] Disease cells, such as tumors, can share the same or similar
biomolecular profiles. For example, tumors may share one, two,
three or more genetic variants. Such variants may share the same
stratification, for example highest frequency, second highest
frequency, etc. Profiles can also share similar disease cell
burdens, e.g., cfDNA burdens, e.g., within 15%, within 10%, within
5% or within 2%.
Analytes
[0081] As used herein, a macromolecule is a molecule formed from
monomeric subunits. Monomeric subunits forming biological
macromolecules include, for example, nucleotides, amino acids,
monosaccharides and fatty acids. Biological macromolecules include,
for example, biopolymers and non-polymeric macromolecules.
[0082] A polynucleotide is a macromolecule comprising a polymer of
nucleotides. Polynucleotides include, for example,
polydeoxyribonucleotides (DNA) and polyribonucleotides (RNA). A
polypeptide is a macromolecule comprising a polymer of amino acids.
A polysaccharide is a macromolecule comprising a polymer of
monosaccharides. Lipids are a diverse group of organic compounds
including, for example, fats, oils and hormones that share the
functional characteristic of not interacting appreciably with
water. For example, a triglyceride is a fat formed from three fatty
acid chains.
[0083] A cancer polynucleotide (e.g., cancer DNA) is a
polynucleotide (e.g., DNA) derived from a cancer cell. Cancer DNA
and/or RNA can be extracted from tumors, from isolated cancer cells
or from biological fluids (e.g., saliva, serum, blood or urine) in
the form of cell free DNA (cfDNA) or cell free RNA.
[0084] Cell free DNA is DNA located outside of a cell in a bodily
fluid, e.g., in blood or urine. Circulating nucleic acids (CNA) are
nucleic acids found in the blood stream. Cell free DNA in the blood
is a form of circulating nucleic acid. Cell free DNA is believed to
arise from dying cells that shed their DNA into the blood. Because
spatially distinct cancer cells will shed DNA into bodily fluids,
such as blood, cfDNA of cancer subjects typically comprises cancer
DNA from spatially distinct cancer cells.
Biological Samples
[0085] Analytes for analysis in the methods of this disclosure can
derive from a biological sample, e.g., a sample comprising a
biological macromolecule. A biological sample can be derived from
any organ, tissue or biological fluid. A biological sample can
comprise, for example, a bodily fluid or a solid tissue sample. An
example of a solid tissue sample is a tumor sample, e.g., from a
solid tumor biopsy. Bodily fluids include, for example, blood,
serum, tumor cells, saliva, urine, lymphatic fluid, prostatic
fluid, seminal fluid, milk, sputum, stool and tears. Bodily fluids
are particularly good sources of biological macromolecules from
spatially distinct disease cells, as such cells from many locations
in a body can shed these molecules into the bodily fluid. For
example, blood and urine are good sources of cell free
polynucleotides. Macromolecules from such sources can provide a
more accurate profile of the diseased cells than macromolecules
derived from a localized disease cell mass.
[0086] Amounts of disease polynucleotides in a bodily fluid sample
can be increased. Such increases can increase sensitivity of
detection of disease polynucleotides. In one method, an
intervention, such as a therapeutic intervention, is administered
to a subject that causes disease cells to lyse, emptying their DNA
into the surrounding fluid. Such interventions can include
administration of chemotherapy. It also can include administering
radiation or ultrasound to the whole body of a subject, or to a
portion of the body of a subject, such as being directed to a tumor
or a diseased organ. After administration of the intervention and
when the amount of disease polynucleotides in the fluid is increased,
a fluid sample is collected for analysis. The interval between
administration of the intervention and collection can be long
enough for the disease polynucleotides to increase, but not so long
that they are cleared from the body. For example, a low dose of
chemotherapy can be administered about a week before collection of
the sample.
Analytic Methods
[0087] This disclosure contemplates several types of biomolecular
analysis including, for example, genomic, epigenetic (e.g.,
methylation), RNA expression and proteomic. Genomic analysis can be
performed by, for example, a genetic analyzer, e.g., using DNA
sequencing. Methylation analysis can be performed by, for example,
conversion of methylated bases followed by DNA sequencing. RNA
expression analysis can be performed by, for example,
polynucleotide array hybridization. Proteomic analysis can be
performed by, for example, mass spectrometry.
[0088] As used herein, the term "genetic analyzer" refers to a
system including a DNA sequencer for generating DNA sequence
information and a computer comprising software that performs
bioinformatic analysis on the DNA sequence information.
Bioinformatic analysis can include, without limitation, assembling
sequence data, detecting and quantifying genetic variants in a
sample, including either of germline variants (e.g.,
heterozygosity) and somatic cell variants (e.g., cancer cell
variants).
[0089] Analytic methods can include generating and capturing
genetic information. Genetic information can include genetic
sequence information, ploidy states, the identity of one or more
genetic variants, as well as a quantitative measure of the
variants. The term "quantitative measure" refers to any measure of
quantity including absolute and relative measures. A quantitative
measure can be, for example, a number (e.g., a count), a
percentage, a frequency, a degree or a threshold amount.
[0090] Polynucleotides can be analyzed by any method known in the
art. Typically, the DNA sequencer will employ next generation
sequencing (e.g., Illumina, 454, Ion Torrent, SOLiD). Sequence
analysis can be performed by massively parallel sequencing, that
is, simultaneously (or in rapid succession) sequencing any of at
least 100,000, 1 million, 10 million, 100 million, or 1 billion
polynucleotide molecules. Sequencing methods may include, but are
not limited to: high-throughput sequencing, pyrosequencing,
sequencing-by-synthesis, single-molecule sequencing, nanopore
sequencing, semiconductor sequencing, sequencing-by-ligation,
sequencing-by-hybridization, RNA-Seq (Illumina), Digital Gene
Expression (Helicos), Next generation sequencing, Single Molecule
Sequencing by Synthesis (SMSS) (Helicos), massively-parallel
sequencing, Clonal Single Molecule Array (Solexa), shotgun
sequencing, Maxam-Gilbert or Sanger sequencing, primer walking,
sequencing using PacBio, SOLiD, Ion Torrent, Genius (GenapSys) or
Nanopore (e.g., Oxford Nanopore) platforms and any other sequencing
methods known in the art.
[0091] The DNA sequencer can apply Gilbert's sequencing method
based on chemical modification of DNA followed by cleavage at
specific bases, or it can apply Sanger's technique which is based
on dideoxynucleotide chain termination. The Sanger method became
popular due to its increased efficiency and low radioactivity. The
DNA sequencer can use techniques that do not require DNA
amplification (polymerase chain reaction--PCR), which speeds up the
sample preparation before sequencing and reduces errors. In
addition, sequencing data is collected from the reactions caused by
the addition of nucleotides in the complementary strand in real
time. For example, the DNA sequencers can utilize a method called
Single-molecule real-time (SMRT), where sequencing data is produced
by light (captured by a camera) emitted when a nucleotide is added
to the complementary strand by enzymes containing fluorescent
dyes.
[0092] Sequencing of the genome can be selective, e.g., directed to
portions of the genome of interest. For example, many genes (and
mutant forms of these genes) are known to be associated with
various cancers. Sequencing of select genes, or portions of genes
may suffice for the analysis desired. Polynucleotides mapping to
specific loci in the genome that are the subject of interest can be
isolated for sequencing by, for example, sequence capture or
site-specific amplification.
[0093] A nucleotide sequence (e.g., DNA sequence) can refer to raw
sequence reads or processed sequence reads, such as unique
molecular counts inferred from raw sequence reads.
[0094] Sequence reads generated from sequencing are subject to
analysis including, for example, identifying genetic variants. This
can include identifying sequence variants and quantifying numbers
of base calls at each locus. Quantifying can involve, for example,
counting the number of reads mapping to a particular genetic locus.
Different numbers of reads at different loci can indicate copy
number variation (CNV).
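As a minimal, non-limiting sketch of the read counting described above, the following example tallies reads per locus and scales the counts to a relative copy number. It assumes, purely for illustration, that each read has already been assigned a locus label and that the median locus is present at two copies; the function names and locus labels are hypothetical.

```python
from collections import Counter
from statistics import median


def reads_per_locus(read_loci):
    """Count how many mapped reads fall on each locus."""
    return Counter(read_loci)


def relative_copy_number(counts, baseline_copies=2):
    """Scale per-locus read counts so the median locus has baseline_copies.

    Assumption: most loci are at the baseline copy number, so the median
    read count is a reasonable stand-in for the diploid state.
    """
    if not counts:
        raise ValueError("counts must not be empty")
    typical = median(counts.values())
    if typical == 0:
        raise ValueError("median read count is zero")
    return {locus: baseline_copies * n / typical for locus, n in counts.items()}


# Hypothetical mapped reads: locus_A has twice the depth of the others.
reads = ["locus_A"] * 40 + ["locus_B"] * 20 + ["locus_C"] * 20
print(relative_copy_number(reads_per_locus(reads)))
```

In practice, raw counts would also need correction for sample preparation and sequencing bias before being interpreted as copy number.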
[0095] Sequencing and bioinformatics methods that reduce noise and
distortion are particularly useful when the number of target
polynucleotides in a sample is small compared with non-target
polynucleotides. When the target molecules are few in number, the
signal from the target may be weak. This can be the case, for
example, in the case of cell free DNA, where a small number of
tumor polynucleotides may be mixed with a much larger number of
polynucleotides from healthy cells. Molecular tracking methods can
be useful in such situations. Molecular tracking involves tracking
sequence reads from a sequencing protocol back to molecules in an
original sample (e.g., before amplification and/or sequencing) from
which the reads are derived. Certain methods involve tagging
molecules in such a way that multiple sequence reads produced from
original molecules can be grouped into families of sequences
derived from original molecules. In this way, base calls
representing noise can be filtered out. Such methods are described
in more detail in, for example, WO 2013/142389 (Schmitt et al.), US
2014/0227705 (Vogelstein et al.) and WO 2014/149134 (Talasaz et
al.). Up-sampling methods also are useful to more accurately
determine counts of molecules in a sample. In some embodiments,
up-sampling methods involve determining a quantitative measure of
individual DNA molecules for which both strands (Watson and Crick
strands) are detected; determining a quantitative measure of
individual DNA molecules for which only one of the DNA strands is
detected; inferring from these measures a quantitative measure of
individual DNA molecules for which neither strand was detected; and
using these measures to determine the quantitative measure
indicative of a number of individual double-stranded DNA molecules
in the sample. This method is described in more detail in
PCT/US2014/072383, filed Dec. 24, 2014.
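The up-sampling inference above can be illustrated with a simple model. The sketch below assumes, as a simplification that is not stated in this disclosure, that each of the two strands of a molecule is detected independently with the same probability p. Under that assumption the expected counts are N*p^2 (both strands seen), 2*N*p*(1-p) (one strand seen) and N*(1-p)^2 (neither strand seen), from which p and N can be solved. The function name is hypothetical.

```python
def estimate_total_molecules(n_both: int, n_single: int) -> dict:
    """Estimate unseen and total double-stranded molecules.

    Assumed model (illustrative only): each strand is detected
    independently with probability p, so
        n_both   ~ N * p**2
        n_single ~ 2 * N * p * (1 - p)
    which gives p = 2*n_both / (2*n_both + n_single).
    """
    if n_both <= 0:
        raise ValueError("n_both must be positive to estimate p")
    if n_single < 0:
        raise ValueError("n_single must be non-negative")
    p = 2 * n_both / (2 * n_both + n_single)
    total = n_both / p ** 2
    unseen = total - n_both - n_single
    return {"strand_detection_prob": p, "unseen": unseen, "total": total}


# Hypothetical counts: 640 molecules seen on both strands, 320 on one strand.
print(estimate_total_molecules(640, 320))
```

Other detection models could be substituted; the point of the sketch is only that the two observed counts suffice to infer the number of molecules for which neither strand was detected.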
Genetic Variants
[0096] Methods of the present disclosure can be used in the
detection of genetic variants (also referred to as "gene
alterations"). Genetic variants are alternative forms at a genetic
locus. In the human genome, approximately 0.1% of nucleotide
positions are polymorphic, that is, exist in a second genetic form
occurring in at least 1% of the population. Mutations can introduce
genetic variants into the germ line, and also into disease cells,
such as cancer. Reference sequences, such as hg19 or NCBI Build 37
or Build 38, are intended to represent a "wild type" or "normal" genome.
However, to the extent they have a single sequence, they do not
identify common polymorphisms which may also be considered
normal.
[0097] Genetic variants include sequence variants, copy number
variants and nucleotide modification variants. A sequence variant
is a variation in a genetic nucleotide sequence. A copy number
variant is a deviation from wild type in the number of copies of a
portion of a genome. Genetic variants include, for example, single
nucleotide variations (SNVs), insertions, deletions, inversions,
transversions, translocations, gene fusions, chromosome fusions,
gene truncations, copy number variations (e.g., aneuploidy, partial
aneuploidy, polyploidy, gene amplification), abnormal changes in
nucleic acid chemical modifications, abnormal changes in epigenetic
patterns and abnormal changes in nucleic acid methylation.
[0098] Genetic variants can be detected by comparing sequences from
polynucleotides in a sample to a reference, e.g., to a reference
genome sequence, to an index or to a database of known mutations.
In one embodiment, the reference sequence is a publicly available
reference sequence, such as the human genome sequence hg19 or NCBI
Build 37. In another embodiment, the reference sequence is a
sequence in a non-public database. In another embodiment, the
reference sequence is a germ line sequence of an organism inferred
or determined from sequencing polynucleotides from the
organism.
[0099] A somatic mutation or somatic alteration is a genetic
variant that arises in a somatic cell. Somatic mutations are
distinguished from mutations that arise in the genome of a germ
line cell (i.e., sperm or egg) or a zygote, of an individual.
Somatic mutations, e.g., those found in cancer cells, are
distinguishable from the germ line genome of a subject in which the
cancer arose. They also can be detected by comparing the cancer
genome with the germ line genome or with a reference genome. There
also are known genetic variants that are common in cancer cells. A
database of SNVs in human cancer can be found at the website:
cancer.sanger.ac.uk/cancergenome/projects/cosmic/.
[0100] FIG. 8 shows genes known, in cancer, to exhibit point
mutations, amplifications, fusions and indels.
CNV Deviation in Rapidly Dividing Cells
[0101] During the S phase of the cell cycle, the cell replicates
DNA. A diploid cell having 2N chromosomes with replicated DNA may
correspond to about 4× DNA content, whereas a diploid cell
having 2N chromosomes without replicated DNA may correspond to
about 2× DNA content. Replication proceeds from origins of
replication. In mammals, origins of replication are spaced at
intervals of about 15 kb to 300 kb. During this period, portions of
the genome exist in polyploid form. Those areas between origins of
replication and the position of the polymerase are duplicated,
while those areas beyond the position of the polymerase (or just
before the origin of replication) are still in single copy number
in the strand undergoing replication. When scanned across the
genome, copy number appears uneven or distorted, having regions
that exist in polyploid form and regions that exist in diploid
form. Such a scan appears noisy. This is true even for cells that
do not bear copy number variations in the genome in the resting
state. In contrast, a scan of CNV in cells in G0 shows a
profile in which copy number is relatively flat or undistorted
across the genome. Because cancer cells divide rapidly, their CNV
profile across the genome exhibits distortion, whether or not the
genome also bears CNVs at certain loci.
[0102] One can take advantage of this fact to detect tumor burden
in DNA from samples comprising heterogeneous DNA, e.g., a mixture
of disease DNA and healthy DNA, such as cfDNA. One method to detect
tumor burden involves determining copy-number variation due to
proximity of examined locus or loci to various origins of
replication. Regions that include a replication origin will have
very close to 4 copies of DNA in that locus (in a diploid cell),
while regions that are far removed from a replication origin will
have closer to 2 copies (in a diploid cell). In certain
embodiments, the examined locus or loci include at least 1 kb, at
least 10 kb, at least 100 kb, at least 1 Mb, at least 10 Mb, at
least 100 Mb, or extend across an entire chromosome or across an entire
genome. A measure of replication origin CNV (ROCNV) across the
region is determined. This can be, for example, a measure of
deviation in copy number from a value of central tendency. The
value of central tendency can be, for example, mean, median or
mode. The measure of deviation can be, for example, variance or
standard deviation. This measure can be compared with a measure of
ROCNVs across the same region in a control sample, e.g., from a
healthy individual or cells in resting state. ROCNVs can be
determined by partitioning the region or regions analyzed into
non-overlapping partitions of various lengths and taking a measure
of CNV in this partition. This measure of CNV can be derived from
the number of reads or fragments determined to map to those regions
after sequencing. The partitions can have various sizes, to produce
various levels of resolution, e.g., a single base level
(base-per-base), 10 bases, 100 bases, 1 kb, 10 kb or 100 kb.
Deviations that are greater than a control indicate the presence of
DNA undergoing replication, which, in turn, indicates malignancy.
The greater the degree of deviation, the greater the amount of DNA
from cells undergoing cell division in the sample.
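The partition-and-compare procedure above can be sketched as follows, by way of example only. The sketch assumes read start positions are already mapped to a single region, uses equal-size non-overlapping partitions, and divides the standard deviation by the mean so that samples sequenced to different depths remain comparable. That normalization, the function names and the example counts are illustrative assumptions rather than part of this disclosure.

```python
from statistics import mean, pstdev


def bin_counts(read_positions, region_start, region_end, bin_size):
    """Count reads in non-overlapping partitions of a half-open region."""
    n_bins = (region_end - region_start + bin_size - 1) // bin_size
    counts = [0] * n_bins
    for pos in read_positions:
        if region_start <= pos < region_end:
            counts[(pos - region_start) // bin_size] += 1
    return counts


def rocnv_deviation(counts):
    """Standard deviation of partition counts divided by their mean."""
    mu = mean(counts)
    if mu == 0:
        raise ValueError("no reads in region")
    return pstdev(counts) / mu


def exceeds_control(test_counts, control_counts, margin=0.0):
    """True when the test sample is more distorted than the control."""
    return rocnv_deviation(test_counts) > rocnv_deviation(control_counts) + margin


# Hypothetical partition counts: flat control, uneven test sample.
control = [100, 100, 100, 100]
test = [100, 150, 100, 150]
print(rocnv_deviation(test), exceeds_control(test, control))
```

The `margin` parameter is a placeholder for whatever statistical threshold an implementation would use to decide that a deviation is meaningfully greater than the control.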
[0103] Various methods can be used to calculate true genetic copy
number variations that differ from replication origin based
distortion. For example, heterozygous SNP positions at affected CNV
loci can be used to infer copy number variation by calculating the
deviation from 50% or the allelic imbalance at those loci.
Distortion due to replication origin proximity should not affect
this imbalance, since both copies would generally be copied at
similar time intervals and are thus self-normalizing (although allelic
changes could conceivably change the origin of replication between
the two allelic variants). For example, duplication of a chromosome
segment containing a SNP could be detected in around 67% of reads,
while duplication resulting from ROCNV would be detected in about
50% of reads. In another method, counting-based techniques that use
the density of detected fragments or reads at a certain locus are
used to calculate relative copy number. These techniques are
generally limited by Poisson noise and systematic bias due to DNA
sample preparation and sequencing bias. A combination of these
methods may also be used to obtain even greater accuracy.
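The allele-fraction reasoning above can be made concrete with a short sketch. It assumes a heterozygous SNP in a diploid background and a pure population of cells; the function names are hypothetical.

```python
def expected_variant_fraction(variant_copies: int, total_copies: int) -> float:
    """Fraction of reads expected to carry the variant allele."""
    if total_copies <= 0 or not 0 <= variant_copies <= total_copies:
        raise ValueError("copies must satisfy 0 <= variant_copies <= total_copies")
    return variant_copies / total_copies


def allelic_imbalance(variant_reads: int, total_reads: int) -> float:
    """Absolute deviation of the observed variant fraction from 50%."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return abs(variant_reads / total_reads - 0.5)


# True duplication of the SNP-bearing segment: 2 variant copies out of 3.
print(expected_variant_fraction(2, 3))
# Replication in progress: both alleles are doubled, 2 variant copies out of 4.
print(expected_variant_fraction(2, 4))
```

A locus that shows elevated read depth but no allelic imbalance is therefore more consistent with replication-origin distortion than with a true copy number change.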
[0104] ROCNV can be calculated for a given sample and be used to
give a value on cell-free tumor burden despite lack of detection of
traditional somatic variants, such as, SNVs, gene-specific CNVs,
genomic rearrangements, epigenetic variants, loss of
heterozygosity, etc. ROCNVs can also be used to subtract distortion
for a given sample to increase sensitivity and/or specificity of a
given CNV detection/estimation method by removing variation that is
related to replication origin proximity rather than due to true
copy number changes in a cell. Cell-lines with known or no copy
number changes over a reference can also be used as a reference of
ROCNVs for use in estimating its contribution to a given
sample.
[0105] In one embodiment, the method involves determining a
baseline level of copies of DNA molecules at one or more loci from
one or more control samples, each containing DNA from cells
undergoing a predetermined level of cell division, e.g., cells in
resting state or rapidly dividing tumor cells. A measure of copies
of DNA molecules in a test sample is also determined. The measure
in test samples can be from one or more loci partitioned into one
or more partitions. In each case, a plurality of loci each include
an origin of replication. The measure of copies from the test
sample can be an average across all partitions, or a level of
variance across loci. A measure of central tendency or of variation
(e.g., variance or standard deviation) in copy number in the test
sample is compared to the control sample. A measure that is greater
in a test sample than in a control of cells in resting state, or
slowly dividing, indicates that cells generating the DNA in the
test sample are dividing more rapidly than cells providing DNA to
the control sample, e.g., are cancerous. Similarly, measures that
are similar between a test sample and a control of cells in an
actively dividing state indicate that cells generating the DNA in
the test sample are dividing at a rate similar to the rapidly
dividing cells, e.g., are cancerous.
Disease Cell Heterogeneity
[0106] Disease cell heterogeneity, e.g., tumor heterogeneity, is
the occurrence of diseased cells having different genetic variants.
Disease cell heterogeneity can be determined by examination of
polynucleotides isolated from diseased cells and detection of
differences in their genomes. Disease cell heterogeneity also can
be inferred from examination of polynucleotides from a sample
containing polynucleotides from both diseased and healthy cells
based on differences in relative frequency of somatic mutations.
For example, cancer is characterized by changes at the genetic
level, e.g., through the accumulation of somatic mutations in
different clonal groups of cells. These changes can contribute to
unregulated growth of the cancer cells, or function as markers of
responsiveness or non-responsiveness to various therapeutic
interventions.
[0107] Tumor heterogeneity is a condition in which a tumor is
characterized by cancer cells containing different combinations of
genetic variants, e.g., different combinations of somatic
mutations. That is, the tumor can have different cells containing
alterations in different genes, or containing different alterations
in the same gene. For example, a first cell could include a mutant
form of BRAF, while a second cell could include mutant forms of
both BRAF and ERBB2. Alternatively, a first cancer cell could
include the single nucleotide polymorphism EGFR 55249063 G>A,
while a second cell could include the single nucleotide
polymorphism EGFR 55238874 T>A. (Numbers refer to nucleotide
position in genomic reference sequence.)
[0108] For example, an original tumor cell can include a genetic
variant in a gene, e.g., an oncogene. As the cells continue to
divide, some progeny cells, which carry the original mutation, may
independently develop genetic variants in other genes or in
different parts of the same gene. In subsequent divisions, tumor
cells can accumulate still more genetic variants.
Profile of Disease Heterogeneity
[0109] Methods of this disclosure allow quantitative as well as
qualitative profiling of disease mosaicism, e.g., tumor
heterogeneity. In one embodiment, the profile includes information
from polynucleotides from spatially distinct disease cells. In one
embodiment, the profile is a whole body profile containing
information from cells distributed throughout the body. Analysis of
polynucleotides in cfDNA allows sampling of DNA across the entire
geographic extent of a tumor, in contrast with sampling of a
localized area of a tumor. In particular, it allows sampling of
diffuse and metastatic tumors. This contrasts with methods that
detect the mere existence of tumor heterogeneity through the
localized sampling of a tumor. The profile can indicate the exact
nucleotide sequence of the variant, or may simply indicate a gene
bearing the somatic mutation.
[0110] In one embodiment of a profile of disease cell
heterogeneity, such as tumor cell heterogeneity, the profile
identifies genetic variations and the relative amounts of each
variant. From this information, one can infer possible
distributions of the variants in different cell sub-populations. For
example, a cancer may begin with a cell bearing somatic mutation X.
As a result of clonal evolution, some progeny of this cell may
develop variant Y. Other progeny may develop variant Z. At the
cellular level, after analysis, the tumor may be characterized as
50% X, 35% XY and 15% XZ. At the DNA level (and considering DNA
from tumor cells only), the profile may indicate 100% X, 35% Y and
15% Z. One may also detect both CNV at a first locus and sequence
variants at a second locus.
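The conversion from the cellular-level description to the DNA-level profile in the example above can be sketched as follows. The sketch assumes, for illustration, that every variant is present at the same dose per cell and that only tumor-derived DNA is considered; the function name is hypothetical.

```python
def variant_fractions(clones):
    """Sum the cell fractions of every clone that carries each variant.

    `clones` maps a tuple of variant names to the fraction of tumor
    cells in that clone.
    """
    totals = {}
    for variants, fraction in clones.items():
        for variant in variants:
            totals[variant] = totals.get(variant, 0.0) + fraction
    return totals


# Cellular-level composition from the example: 50% X, 35% XY, 15% XZ.
clones = {("X",): 0.50, ("X", "Y"): 0.35, ("X", "Z"): 0.15}
for variant, fraction in sorted(variant_fractions(clones).items()):
    print(f"{variant}: {fraction:.0%}")
```

The reverse inference, from observed variant fractions back to possible clone compositions, is generally not unique, which is why the text speaks of possible distributions.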
[0111] Tumor heterogeneity can be detected from analysis of
sequences of cancer polynucleotides, based on the existence of
genomic variations at different loci occurring at different
frequencies. For example, in a sample of cell free DNA (which is
likely to contain germ line DNA as well as cancer DNA), it may be
found that a sequence variant of BRAF occurs at a frequency of 17%,
a sequence variant of CDKN2A occurs at a frequency of 6%, a
sequence variant of ERBB2 occurs at a frequency of 3% and a
sequence variant of ATM occurs at a frequency of 1%. These
different frequencies of sequence variants indicate tumor
heterogeneity. Similarly, genetic sequences exhibiting different
amounts of copy number variation also indicate tumor heterogeneity.
For example, analysis of a sample may show different levels of
amplification for the EGFR and CCNE1 genes. This also indicates
tumor heterogeneity.
[0112] In the case of cell free DNA, detection of somatic mutations
can be made by comparing base calls in the sample to a reference
sequence or, internally, as less frequent base calls to more common
base calls, presumed to be in the germ line sequence. In either
case, the existence of sub-dominant forms (e.g., less than 40% of
total base calls) at different loci and at different frequencies
indicates disease cell heterogeneity.
[0113] Cell free DNA typically comprises a preponderance of DNA
from normal cells having the germ line genome sequence and, in the
case of a disease, such as cancer, a small percentage of DNA from
cancer cells and having a cancer genome sequence. Sequences
generated from polynucleotides in a sample of cfDNA can be compared
with a reference sequence to detect differences between the
reference sequence and the polynucleotides in the cfDNA. At any
locus, all or nearly all of the polynucleotides from a test sample
may be identical to a nucleotide in the reference sequence.
Alternatively, a nucleotide detected at nearly 100% frequency in a
sample may be different than a nucleotide in the reference
sequence. This most likely indicates a normal polymorphic form at
this locus. If a first nucleotide that matches a reference
nucleotide is detected at about 50% and a second nucleotide that is
different than a reference nucleotide is detected at about 50%,
this most likely indicates normal heterozygosity. Heterozygosity
may present at allele ratios divergent from 50:50, e.g., 60:40 or
even 70:30. However, if the sample comprises a nucleotide
detectable above noise at a frequency below (or above) an
unambiguously heterozygote range (for example, less than about 45%,
less than 40%, less than 30%, less than 20%, less than 10% or less
than 5%), this can be attributed to the existence of somatic
mutations in a percentage of the cells contributing DNA to the
cfDNA population. These may come from disease cells, e.g., cancer
cells. (The exact percentage is a function of tumor load.) If the
frequencies of somatic mutations at two different genetic loci are
different, e.g., 16% at one locus and 5% at another locus, this
indicates that the disease cells, e.g., the cancer cells, are
heterogeneous.
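As a hedged illustration of the frequency-based reasoning above, the following sketch sorts an observed non-reference allele fraction into the categories discussed and flags heterogeneity when candidate somatic variants differ in frequency. All threshold values (noise floor, heterozygote range, homozygous minimum, tolerance) and all names are hypothetical choices made for this example; this disclosure lists several alternative cutoffs and does not prescribe these.

```python
def classify_locus(variant_fraction, noise_floor=0.001,
                   het_low=0.40, het_high=0.60, hom_min=0.95):
    """Label a non-reference allele fraction observed in cfDNA.

    Thresholds are illustrative assumptions, not values from the source.
    """
    if variant_fraction < noise_floor:
        return "noise"
    if variant_fraction >= hom_min:
        return "germline_homozygous"
    if het_low <= variant_fraction <= het_high:
        return "germline_heterozygous"
    if variant_fraction < het_low:
        return "somatic_candidate"
    return "ambiguous"


def indicates_heterogeneity(somatic_fractions, tolerance=0.02):
    """True when somatic candidates differ in frequency beyond tolerance."""
    if len(somatic_fractions) < 2:
        return False
    return max(somatic_fractions) - min(somatic_fractions) > tolerance


# Frequencies from the example above: 16% at one locus, 5% at another.
observed = [0.16, 0.05]
print([classify_locus(f) for f in observed], indicates_heterogeneity(observed))
```

A fuller treatment would replace the fixed cutoffs with a statistical test that accounts for read depth and sequencing error at each locus.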
[0114] In the case of DNA from solid tumors, which is expected to
predominantly comprise tumor DNA, somatic mutations also can be
detected by comparison to a reference sequence. Detection of
somatic mutations that exist in 100% of the tumor cells may require
reference to a standard sequence or information about known
mutants. However, the existence of sub-dominant sequences among the
polynucleotide pool at different loci and at different relative
frequencies, indicates tumor heterogeneity.
[0115] The profile may include genetic variants in genes that are
known to be actionable. Knowledge of such variants can contribute
to selecting therapeutic interventions, as therapies can be
targeted to such variants. In the case of cancer, many actionable
genetic variants are already known.
CNV and SNV in Disease Cell Heterogeneity
[0116] In general, the copy number state of a gene should be
reflected in the frequency of a genetic form of the gene in the
sample. For example, a sequence variant may be detected at a
frequency consistent with homozygosity or heterozygosity (e.g.,
about 100% or about 50%, respectively) with no copy number
variation. This is consistent with a germ line polymorphism or
mutation. A sequence variant may be detected at frequency of about
67% (or, alternatively, at about 33%) of polynucleotides at a
locus, and also in a gene measured at increased copy number
(generally, n=2). This is consistent with gene duplication in the
germ line. For example, a trisomy would present in this fashion.
However, if a sequence variant is detected at a level consistent
with homozygosity (e.g., about 100%) but at amounts consistent with
copy number variation, this is more likely to reflect the presence
of disease cell polynucleotides having undergone gene
amplification. Similarly, if a sequence variant is detected at a
level not inconsistent with heterozygosity (e.g., deviating
somewhat from 50%) but at amounts consistent with copy number
variation, this also is more likely to reflect the presence of
disease cell polynucleotides; the diseased polynucleotides create
some level of imbalance in allele frequency away from 50:50.
[0117] This observation can be used to infer whether a sequence
variant is more likely present in the germ line level or resulted
from a somatic cell mutation, e.g., in a cancer cell. For example,
a sequence variant in a gene detected at levels arguably consistent
with heterozygosity in the germ line is more probably the product
of a somatic mutation in disease cells if copy number variation
also is detected in that gene.
[0118] Also, to the extent we expect that a gene duplication in the
germ line should bear a variant consistent with increased genetic
dose (e.g., about 67% for trisomy at a locus), detection of gene
amplification with a sequence variant dose that deviates
significantly from this expected amount indicates that the CNV is
more likely present as a result of somatic cell mutation.
[0119] The fact that somatic mutations at different loci may be
present at single or multiple copy number in the same disease cell
also can be used to infer tumor heterogeneity. More specifically,
tumor heterogeneity can be inferred when two genes are detected at
different frequencies but their copy number is relatively equal.
Alternatively, tumor homogeneity can be inferred when the
difference in frequency between two sequence variants is consistent
with difference in copy number for the two genes. Thus, if an EGFR
variant is detected at 11% and a KRAS variant is detected at 5%,
and no CNV is detected at these genes, the difference in frequency
likely reflects tumor heterogeneity (e.g., all tumor cells carry an
EGFR mutant and half the tumor cells also carry a KRAS mutant).
Alternatively, if the EGFR gene carrying the mutant is detected at
increased copy number, one consistent interpretation is a
homogeneous population of tumor cells, each cell carrying a mutant
in the EGFR and KRAS genes, but in which the EGFR gene is
duplicated. Accordingly, both the frequency of a sequence variant
and a measure of CNV at the locus of the sequence variant in a
sample can be determined. The frequency can then be corrected to
reflect the relative number of cells bearing the variant by
weighting the frequency based on dose per cell determined from the
measure of CNV. This result is now more comparable in terms of
number of cells carrying the variant to a sequence variant that
does not vary in copy number.
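The dose correction described above can be sketched, by way of example only, as a division of the observed variant frequency by the number of variant-bearing copies per cell. The sketch assumes that the copy number per cell has already been estimated from the CNV measure; the function names and the tolerance are hypothetical.

```python
def dose_corrected_frequency(variant_frequency: float,
                             variant_copies_per_cell: float) -> float:
    """Scale a variant frequency to a single-copy-per-cell equivalent."""
    if variant_copies_per_cell <= 0:
        raise ValueError("variant_copies_per_cell must be positive")
    return variant_frequency / variant_copies_per_cell


def consistent_with_single_population(freq_a, freq_b, tolerance=0.01):
    """True when two dose-corrected frequencies agree within tolerance."""
    return abs(freq_a - freq_b) <= tolerance


# Example values from the text: EGFR variant at 11%, KRAS variant at 5%.
egfr_no_cnv = dose_corrected_frequency(0.11, 1)
egfr_duplicated = dose_corrected_frequency(0.11, 2)
kras = dose_corrected_frequency(0.05, 1)
print(consistent_with_single_population(egfr_no_cnv, kras))
print(consistent_with_single_population(egfr_duplicated, kras))
```

With no CNV the two frequencies remain far apart, suggesting heterogeneity; with the EGFR variant at two copies per cell the corrected frequencies roughly agree, which is consistent with a single population of tumor cells.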
Communicating Test Results
[0120] A report of results from genetic variant analysis (e.g.,
sequence variants, CNV, disease cell heterogeneity, and
combinations thereof) may be provided by a report generator, for
example to a healthcare practitioner, e.g., a physician, to aid the
interpretation of the test results (e.g., data) and selection of
treatment options. A report generated by a report generator may
provide additional information, such as clinical lab results, that
may be useful for diagnosing disease and selecting treatment
options.
[0121] Referring now to FIG. 9A, a system with a report generator 1
for reporting on, e.g., cancer test results and treatment options
therefrom is schematically illustrated. The report generator system
can be a central data processing system configured to establish
communications directly with: a remote data site or lab 2, a
medical practice/healthcare provider (treating professional) 4,
and/or a patient/subject 6 through communication links. The lab 2
can be a medical laboratory, diagnostic laboratory, medical
facility, medical practice, point-of-care testing device, or any
other remote data site capable of generating subject clinical
information. Subject clinical information includes, but is not
limited to, laboratory test data, e.g., analysis of genetic
variants; imaging and X-ray data; examination results; and
diagnosis. The healthcare provider or practice 4 may include
medical services providers, such
as doctors, nurses, home health aides, technicians and physician's
assistants, and the practice may be any medical care facility
staffed with healthcare providers. In certain instances the
healthcare provider/practice is also a remote data site. Where
cancer is a disease to be treated, the subject may be afflicted
with cancer, among other possible diseases or disorders.
[0122] Other clinical information for a cancer subject 6 can
include the results of laboratory tests, e.g., analysis of genetic
variants, metabolic panel, complete blood count, etc.; medical
imaging data; and/or medical procedures directed to diagnosing the
condition, providing a prognosis, monitoring the progression of the
disease, determining relapse or remission, or combinations thereof.
The list of appropriate sources of clinical information for cancer
includes, but is not limited to, CT scans, MRI scans, ultrasound
scans, bone scans, PET scans, bone marrow tests, barium X-ray,
endoscopies, lymphangiograms, IVU (Intravenous urogram) or IVP (IV
pyelogram), lumbar punctures, cystoscopy, immunological tests
(anti-malignin antibody screen), and cancer marker tests.
[0123] The subject 6's clinical information may be obtained from
the lab 2 manually or automatically. Where simplicity of the system
is desired, the information may be obtained automatically at
predetermined or regular time intervals. A regular time interval
can refer to a time interval at which the collection of the
laboratory data is carried out automatically by the methods and
systems described herein based on a measurement of time such as
hours, days, weeks, months, years etc. In one embodiment, the
collection of data and processing is carried out at least once a
day. In one embodiment, the transfer and collection of data is
carried out about any of monthly, biweekly, weekly, several times a
week or daily. Alternatively the retrieval of information may be
carried out at predetermined time intervals, which may not be
regular time intervals. For instance, a first retrieval step may
occur after one week and a second retrieval step may occur after
one month. The transfer and collection of data can be customized
according to the nature of the disorder that is being managed and
the frequency of required testing and medical examinations of the
subjects.
[0124] FIG. 9B shows an exemplary process to generate genetic
reports, including a tumor response map and associated summary of
alterations. A tumor response map is a graphical representation of
genetic information indicating changes over time in genetic
information from a tumor, e.g., qualitative and quantitative
changes. Such changes can reflect response of a subject to a
therapeutic intervention. This process can reduce error rates and
bias that may be orders of magnitude higher than what is required
to reliably detect de novo genetic variants associated with cancer.
The process can comprise first capturing genetic information by
collecting body fluid samples as sources of genetic material (e.g.,
blood, saliva, sweat, urine, etc). Then, the process can comprise
sequencing the materials (11). For example, polynucleotides in a
sample can be sequenced, producing a plurality of sequence reads.
The tumor burden in a sample that comprises polynucleotides can be
estimated as the relative number of sequence reads bearing a
variant to the total number of sequence reads generated from the
sample. Where copy number variants are analyzed, the tumor burden
can be estimated as the relative excess (e.g., in the case of gene
duplication) or relative deficit (e.g., in the case of gene
elimination) of the total number of sequence reads at test and
control loci. For example, a run may produce 1000 reads mapping to
an oncogene locus of which 900 correspond to wild type and 100
correspond to a cancer mutant, indicating a tumor burden of about
10%. More details on exemplary specimen collection and
sequencing of the genetic materials are discussed below in FIGS.
10-11.
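The two estimates described above can be sketched as follows. The
function names are illustrative, and the read counts used for the
copy number case are hypothetical values chosen only to show the
arithmetic.

```python
def variant_tumor_burden(variant_reads, total_reads):
    """Fraction of reads at a locus that bear the variant."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return variant_reads / total_reads


def relative_copy_change(test_reads, control_reads):
    """Relative excess (> 0) or deficit (< 0) of reads at a test
    locus compared with a control locus."""
    if control_reads <= 0:
        raise ValueError("control_reads must be positive")
    return (test_reads - control_reads) / control_reads


# 1000 reads at an oncogene locus: 900 wild type and 100 mutant.
burden = variant_tumor_burden(100, 1000)

# Hypothetical counts: 1200 reads at a test locus, 1000 at a control.
excess = relative_copy_change(1200, 1000)
```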
[0125] Next, genetic information can be processed (12). Genetic
variants can then be identified. The process can comprise
determining the frequency of genetic variants in the sample
containing the genetic material. The process can comprise
separating information from noise (13) if this process is
noisy.
[0126] The sequencing methods for genetic analysis may have error
rates. For example, the MiSeq system of Illumina can produce
percent error rates in the low single digits. For 1000 sequence
reads mapping to a locus, about 50 reads (about 5%) may be expected
to include errors. Certain methodologies, such as those described
in WO 2014/149134 can significantly reduce the error rate. Errors
create noise that can obscure signals from cancer present at low
levels in a sample. For example, if a sample has a tumor burden at
a level around the sequencing system error rate, e.g., around
0.1%-5%, it may be difficult to distinguish a signal corresponding
to a genetic variant due to cancer from one due to noise.
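One way to picture why signals near the error rate are hard to
call is to ask how often sequencing error alone would produce the
observed number of variant reads. The sketch below uses a simple
binomial model, which is an assumption made here for illustration:
it pessimistically treats every erroneous read as mimicking the
variant and is not the error model of the disclosure.

```python
from math import exp, lgamma, log, log1p


def noise_tail_probability(variant_reads, total_reads, error_rate):
    """P(at least variant_reads erroneous reads) under a binomial
    model with a per-read error probability of error_rate."""
    if not 0.0 < error_rate < 1.0:
        raise ValueError("error_rate must be strictly between 0 and 1")

    def log_pmf(k):
        # Log of the binomial probability mass, computed with lgamma
        # to avoid overflow for large read counts.
        return (
            lgamma(total_reads + 1)
            - lgamma(k + 1)
            - lgamma(total_reads - k + 1)
            + k * log(error_rate)
            + (total_reads - k) * log1p(-error_rate)
        )

    return sum(
        exp(log_pmf(k)) for k in range(variant_reads, total_reads + 1)
    )


# With 1000 reads and a 5% error rate, 55 variant reads are easily
# explained by noise, whereas 100 variant reads are not.
near_noise = noise_tail_probability(55, 1000, 0.05)
above_noise = noise_tail_probability(100, 1000, 0.05)
```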
[0127] Analysis of genetic variants may be used for diagnosing in
the presence of noise. The analysis can be based on the frequency
of Sequence Variants or Level of CNV (14) and a diagnosis
confidence indication or level for detecting genetic variants in
the noise range can be established (15).
[0128] Next, the process can comprise increasing the diagnosis
confidence. This can be done using a plurality of measurements to
increase confidence of diagnosis (16), or alternatively using
measurements at a plurality of time points to determine whether
cancer is advancing, in remission or stabilized (17). The
diagnostic confidence can be used to identify disease states. For
example, cell free polynucleotides taken from a subject can include
polynucleotides derived from normal cells, as well as
polynucleotides derived from diseased cells, such as cancer cells.
Polynucleotides from cancer cells may bear genetic variants, such
as somatic cell mutations and copy number variants. When cell free
polynucleotides from a sample from a subject are sequenced, these
cancer polynucleotides are detected as sequence variants or as copy
number variants.
[0129] Measurements of a parameter, whether or not they are in the
noise range, may be provided with a confidence interval. Tested
over time, one can determine whether a cancer is advancing,
stabilized or in remission by comparing confidence intervals over
time. When confidence intervals overlap, one may not be able to
tell whether disease is increasing or decreasing, because there is
no statistically significant difference between the measures.
However, where the confidence intervals do not overlap, this
indicates the direction of disease. For example, comparing the
lowest point on a confidence interval at one time point and the
highest point on a confidence interval at a second time point
indicates the direction.
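The interval comparison can be sketched as below. The disclosure
does not specify how the confidence intervals are computed; the
Wilson score interval for a proportion is used here purely as an
assumption, and the function names and labels are illustrative.

```python
from math import sqrt


def wilson_interval(variant_reads, total_reads, z=1.96):
    """Wilson score interval for a variant fraction (95% by default)."""
    p = variant_reads / total_reads
    denom = 1.0 + z * z / total_reads
    center = (p + z * z / (2.0 * total_reads)) / denom
    half = (
        z
        * sqrt(
            p * (1.0 - p) / total_reads
            + z * z / (4.0 * total_reads ** 2)
        )
        / denom
    )
    return center - half, center + half


def disease_direction(earlier, later):
    """Compare two (low, high) intervals taken at two time points."""
    if later[0] > earlier[1]:
        return "advancing"        # later interval lies entirely above
    if later[1] < earlier[0]:
        return "in remission"     # later interval lies entirely below
    return "indeterminate"        # intervals overlap


first = wilson_interval(50, 1000)
second = wilson_interval(10, 1000)
direction = disease_direction(first, second)
```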
[0130] Next, the process can comprise generating genetic
Report/Diagnosis. The process can comprise generating genetic graph
for a plurality of measurements showing mutation trend (18) and
generating report showing treatment results and options (19).
[0131] FIGS. 10A-10C show in more detail one embodiment for
generating genetic reports and diagnosis (e.g., Report/Diagnosis).
In one implementation, FIG. 10C shows an exemplary pseudo-code
executed by the system of FIG. 9A to process non-CNV reported
mutant allele frequencies. However, the system can process CNV
reported mutant allele frequencies as well.
[0132] Samples comprising genetic material, such as cfDNA, can be
collected from a subject at a plurality of time points, that is,
serially. The genetic material can be sequenced, e.g., using a
high-throughput sequencing system. Sequencing can target loci of
interest to detect genetic variants, such as genes bearing somatic
mutations, genes that undergo copy number variation, or genes
involved in gene fusions, for example, in cancer. At each time
point, a quantitative measure of the genetic variants found can be
determined. For example, in the case of cfDNA, the quantitative
measure can be the frequency or percentage of a genetic variant
among polynucleotides mapping to a locus, or the absolute number of
sequence reads or polynucleotides mapping to a locus. Genetic
variants having a non-zero quantity at at least one time point can
then be represented graphically through all time points. For
example, in a collection of 1000 sequences, variant 1 may be found
at time points 1, 2 and 3 in amounts of 50, 30 and 0, respectively.
Variant 2 may be found in amounts 0, 10 and 20 at these time
points. These amounts can be normalized, for variant 1, to 5%, 3%
and 0%, and, for variant 2, 0%, 1% and 2%. A graphical
representation showing the union of all non-zero results can
indicate these amounts for both variants at all of the time points.
The normalized amounts can be scaled so that each percentage is
represented by a layer, for example, having height 1 mm. So, for
example, in this case the heights would be at time point 1: heights
5 mm (variant 1) and 0 mm (variant 2); at time point 2: heights 3
mm (variant 1) and 1 mm (variant 2), at time point 3: heights 0 mm
(variant 1) and 2 mm (variant 2). The graphical representation can
be in the form of a stacked area graph, such as a streamgraph. A
"zero" time point (before the first time point) can be represented
by a point, with all values at 0. The height of the quantity of the
variants in the graphical representation can be, for example,
relative or proportional to each other. For example, a variant
frequency 5% at one time point could be represented with a height
of twice that of a variant with frequency of 2.5% at the same time
point. The order of stacking can be chosen for ease of
understanding. For example, variants can be stacked in order of
quantity high to low from bottom to top. Or, they can be stacked in
a streamgraph with the variant of largest initial amount in the
middle, and other variants of decreasing quantity on either side.
In certain embodiments, the areas can be color coded based on
variant. Variants in the same gene can be shown in different hues
of the same color. For example, KRAS mutants can be shown in
different shades of blue, EGFR mutants in different shades of
red.
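The worked example above (1000 sequences, two variants, three time
points, 1 mm per percent) can be reproduced with a short sketch.
The function name and the data layout are illustrative assumptions.
Variants are stacked in the order given, from bottom to top.

```python
def stacked_layers(counts, total_reads=1000, mm_per_percent=1.0):
    """Convert per-variant read counts into stacked bands.

    counts maps a variant name to its count at each time point.
    Returns one dict per time point mapping each variant to the
    (bottom, top) of its layer in millimetres.
    """
    # Keep only variants with a non-zero quantity at some time point.
    kept = {v: c for v, c in counts.items() if any(c)}
    n_points = len(next(iter(kept.values()))) if kept else 0
    bands = []
    for t in range(n_points):
        bottom, layer = 0.0, {}
        for variant, series in kept.items():
            height = mm_per_percent * 100.0 * series[t] / total_reads
            layer[variant] = (bottom, bottom + height)
            bottom += height
        bands.append(layer)
    return bands


bands = stacked_layers(
    {"variant 1": [50, 30, 0], "variant 2": [0, 10, 20]}
)
```

At the second time point, for instance, variant 1 occupies 0 mm to
3 mm and variant 2 occupies 3 mm to 4 mm.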
[0133] Turning now to FIG. 10A, the process can comprise receiving
genetic information from a DNA sequencer (30). The process can then
comprise determining specific gene alterations and quantities
thereof (32).
[0134] Next, a tumor response map is generated. To generate the
map, the process can comprise normalizing the quantities for each
gene alteration for rendering across all test points and then
generating a scaling factor (34). As used herein, the term
"normalize" generally refers to adjusting values measured on
different scales to a notionally common scale. For example, data
measured at different points are converted/adjusted so that all
values can be resized to a common scale. As used herein, the term
"scaling factor" generally refers to a number which scales, or
multiplies, some quantity. For example, in the equation y=Cx, C is
the scale factor for x. C is also the coefficient of x, and may be
called the constant of proportionality of y to x. The values are
normalized to allow plotting on a common scale that is
visually-friendly. And the scaling factor is used to know the exact
heights that correspond to the values to be plotted (e.g. 10%
mutant allele frequency may represent 1 cm on the report wherein
the total height is 10 cm). The scaling factor is applied to all
test points and thus is considered to be a universal scaling
factor. For each test point, the process can comprise rendering
information on a tumor response map (36). In operation 36, the
process can comprise rendering alterations and relative heights
using the determined scaling factor (38) and assigning a unique
visual indicator for each alteration (40). In addition to the
response map, the process can comprise generating a summary of
alterations and treatment options (42). Also, information from
clinical trials that may be relevant to the particular genetic
alterations, other helpful treatment suggestions, explanations of
terminology, test methodology, and other information are added to
the report and rendered for the user.
[0135] In one implementation, the copy number variation may be
reported as a graph, indicating various positions in the genome and a
corresponding increase or decrease or maintenance of copy number
variation at each respective position. Additionally, copy number
variation may be used to report a percentage score indicating how
much disease material (or nucleic acids having a copy number
variation) exists in the cell free polynucleotide sample.
[0136] In another embodiment, the report includes annotations to
help physicians interpret the results and recommend treatment
options. The annotating can include annotating a report for a
condition in the NCCN Clinical Practice Guidelines in Oncology.TM.
or the American Society of Clinical Oncology (ASCO) clinical
practice guidelines. The annotating can include listing one or more
FDA-approved drugs for off-label use, one or more drugs listed in a
Centers for Medicare and Medicaid Services (CMS) anti-cancer
treatment compendia, and/or one or more experimental drugs found in
scientific literature, in the report. The annotating can include
connecting a listed drug treatment option to a reference containing
scientific information regarding the drug treatment option. The
scientific information can be from a peer-reviewed article from a
medical journal. The annotating can include providing a link to
information on a clinical trial for a drug treatment option in the
report. The annotating can include presenting information in a
pop-up box or fly-over box near provided drug treatment options in
an electronic based report. The annotating can include adding
information to a report selected from the group consisting of one
or more drug treatment options, scientific information concerning
one or more drug treatment options, one or more links to scientific
information regarding one or more drug treatment options, one or
more links to citations for scientific information regarding one or
more drug treatment options, and clinical trial information
regarding one or more drug treatment options.
[0137] FIG. 10B shows an exemplary process to generate a tumor
response map pathway which may be used by a healthcare
practitioner, e.g., physician, for example to make patient care
decisions. In this embodiment, the process can comprise first
determining a global scaling factor (43). In one embodiment, for
all non-CNV (copy number variation) reported mutant allele
frequencies, the process can comprise transforming the absolute
value into a relative metric/scale that may be more amenable for
plotting (e.g., multiplying the mutant allele frequency by 100 and
taking the log of that value) and determining a global scaling
factor using the maximum observed value. The process then involves
visualizing information from the earliest test dataset (44).
Visualizing can comprise graphically representing the information
on a user interface (e.g., a computer screen) or in tangible form
(e.g., on a piece of paper). For each non-CNV alteration, the
process can comprise multiplying the scaling factor by the
transformed value for each gene, using the product as a quantity
indicator for plotting that variant, and then assigning a
color/unique visual indicator for each alteration. Then the process
can comprise visualizing information for subsequent test points
(45) using the following pseudo-code:
[0138] If the composition of the test results is unchanged, continue
the prior panel date visual in the new panel
[0139] If the alterations remain the same, but quantities have changed
[0140] Recompute the quantity indicator for plotting that variant
and re-plot all updated values in the existing panel(s) and the new
panel for the latest test date.
[0141] If new alterations are added
[0142] Add the alterations to the top of all existing alterations
[0143] Compute transform values
[0144] Recompute the scaling factor
[0145] Re-draw the response map, re-plotting alterations in the
prior test date that are still detected in the current test date as
well as newly emerging alterations
[0146] If a prior existing alteration is not among the set of
detected alterations
[0147] Use a height of zero and plot the quantity of the alteration
for all subsequent test dates
[0148] Still include its color in the set of unavailable colors
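The pseudo-code above can be sketched in Python as follows. This is
an illustrative sketch, not the code of FIG. 10C. The function
names, the data layout, and the 10-unit plot height are assumptions.
The transform used here is log10(1 + 100 x MAF) rather than the
plain log of 100 x MAF, an assumption made so that an undetected
alteration maps to a height of exactly zero. The whole map is
recomputed on each call, which covers the re-plot and re-draw
branches in one step.

```python
import math


def transform(maf):
    """Map a mutant allele frequency (fraction in 0..1) to a
    plotting value; 0 maps to 0 (assumed variant of the log)."""
    return math.log10(1.0 + 100.0 * maf)


def build_response_map(tests, max_height=10.0):
    """tests is a list of {alteration: maf} dicts ordered by date.

    Returns (stacking order, global scaling factor, panels), where
    each panel maps every alteration to its plotted height.
    """
    # New alterations are added on top of all existing ones, and an
    # alteration keeps its slot (and color) once it has appeared.
    order = []
    for test in tests:
        for alteration in test:
            if alteration not in order:
                order.append(alteration)

    # Global scaling factor from the maximum transformed value.
    peak = max(
        (transform(m) for t in tests for m in t.values()), default=0.0
    )
    scale = max_height / peak if peak > 0 else 0.0

    # Alterations not detected at a test date get a height of zero.
    panels = [
        {a: scale * transform(test.get(a, 0.0)) for a in order}
        for test in tests
    ]
    return order, scale, panels


# Hypothetical serial results for two alterations.
order, scale, panels = build_response_map(
    [
        {"EGFR T790M": 0.05},
        {"EGFR T790M": 0.03, "KRAS G12D": 0.01},
        {"KRAS G12D": 0.02},
    ]
)
```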
[0149] Each subsequent panel denoting a test date may also include
additional patient or intervention information that may correlate
with the alteration changes seen in the remainder of the map.
Similar scaling, plotting, and transformation may be also
implemented on CNV and other types of DNA alterations (e.g.
methylation) to display these quantities in separate or combined
charts. These additional annotations may themselves also be
quantifiable and similarly plotted on the map.
[0150] The process can then comprise determining a summary of
alterations and treatment options (46). In one embodiment, for the
gene harboring the alteration with the maximum mutant allele
frequency, the following actions are done:
[0151] Report all alterations for that gene in decreasing mutant
allele frequency order of non-CNV alterations
[0152] Report all CNV alterations for that gene in decreasing order
of CNV value
[0153] Repeat for next gene with next highest non-CNV mutant allele
frequency not yet reported
[0154] For each reported alteration, the process can comprise
including a trend indicator for that alteration over the different
test date points.
[0155] Grouping of maximum mutant allele frequencies may also
extend beyond just the genes they are harbored in to greater
encapsulating annotations such as biological pathways, evidence
level, etc.
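The reporting order described above can be sketched as follows. The
record layout, the function name, and the sample values are
hypothetical and serve only to illustrate the ordering: genes are
taken in decreasing order of their highest non-CNV mutant allele
frequency, and within each gene the non-CNV alterations are listed
before the CNV alterations.

```python
def summarize(alterations):
    """alterations is a list of dicts with keys 'gene', 'name',
    'kind' ('snv' or 'cnv') and 'value' (MAF or CNV value)."""
    by_gene = {}
    for a in alterations:
        by_gene.setdefault(a["gene"], []).append(a)

    def top_maf(gene):
        # Highest non-CNV mutant allele frequency found in the gene.
        return max(
            (a["value"] for a in by_gene[gene] if a["kind"] != "cnv"),
            default=0.0,
        )

    report = []
    for gene in sorted(by_gene, key=top_maf, reverse=True):
        non_cnv = sorted(
            (a for a in by_gene[gene] if a["kind"] != "cnv"),
            key=lambda a: a["value"],
            reverse=True,
        )
        cnv = sorted(
            (a for a in by_gene[gene] if a["kind"] == "cnv"),
            key=lambda a: a["value"],
            reverse=True,
        )
        report.extend(a["name"] for a in non_cnv + cnv)
    return report


# Hypothetical alterations and values.
summary = summarize(
    [
        {"gene": "KRAS", "name": "KRAS G12D", "kind": "snv", "value": 0.02},
        {"gene": "EGFR", "name": "EGFR T790M", "kind": "snv", "value": 0.05},
        {"gene": "EGFR", "name": "EGFR amplification", "kind": "cnv", "value": 3.2},
        {"gene": "EGFR", "name": "EGFR L858R", "kind": "snv", "value": 0.08},
    ]
)
```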
[0156] FIGS. 10D-10I show one exemplary report generated by the
system of FIG. 9A. In FIG. 10D, a patient identification section 52
provides patient information, reporting date, and physician contact
information. A tumor response map 54 includes a modified
streamgraph 56 that shows tumor activities with unique colors for
each mutant gene. The graph 56 has accompanying summary explanation
textbox 58. More details are provided in a summary of alterations
and treatment option section 60. The alterations 62 and 64 are
presented in section 60, along with mutation trend, mutant allele
frequency, cell-free amplification, FDA Approved Drug Indication,
FDA Approved Drugs with other Indications, and Clinical Drug Trial
information. FIGS. 10D-1, 10D-2, and 10D-3 provide enlarged views
of FIG. 10D.
[0157] FIG. 10E shows an exemplary report section providing
definitions, comments, and interpretation of the tests. FIGS. 10E-1
and 10E-2 provide enlarged views of FIG. 10E. FIG. 10F shows an
exemplary detailed therapy result portion of the report. FIGS.
10F-1 and 10F-2 provide enlarged views of FIG. 10F. FIG. 10G shows
an exemplary discussion of the clinical relevance of detected
alterations. FIGS. 10G-1 and 10G-2 provide enlarged views of FIG.
10G. FIG. 10H shows potentially available medications that are
going through clinical trials. FIG. 10I shows the test methods and
limitations thereof. FIGS. 10I-1 and 10I-2 provide enlarged views
of FIG. 10I.
[0158] FIGS. 10J-10P show various exemplary modified streamgraphs
56. A streamgraph, or stream graph, is a type of stacked area graph
which is displaced around a central axis, resulting in a flowing,
organic shape. Streamgraphs are a generalization of stacked area
graphs where the baseline is free. By shifting the baseline, it is
possible to minimize the change in slope (or "wiggle") in
individual series, thereby making it easier to perceive the
thickness of any given layer across the data.
[0159] For example, FIG. 10J shows seven layers representing at
least 8 mutants over three time periods, and a "0" time point (all
values "0"). FIG. 10K shows a single mutant over 4 time periods. No
mutants are detected at the second, third and fourth time points.
FIG. 10L indicates frequency of dominant allele at each time point.
FIG. 10M shows a single time point with a total of four mutants in
two genes. Mutants are identified by amino acid at a position
changed (e.g., EGFR T790M).
[0160] One embodiment renders a streamgraph so that it is not
x-axis reflective. The modified graph applies a unique scaling to
denote proportional attributes. The graph can indicate the addition
of new attributes over time. The presence or absence of a mutation
may be reflected in graphical form, indicating various positions in
the genome and a corresponding increase or decrease or maintenance
of a frequency of mutation at each respective position.
Additionally, mutations may be used to report a percentage score
indicating how much disease material exists in the cell free
polynucleotide sample. A confidence score may accompany each
detected mutation, given known statistics of typical variances at
reported positions in non-disease reference sequences. Mutations
may also be ranked in order of abundance in the subject or ranked
by clinically actionable importance.
[0161] The mapping of genome positions and copy number variation
for the subject with cancer can indicate that a particular cancer
is aggressive and resistant to treatment. The subject may be
monitored for a period and retested. If at the end of the period,
the copy number variation profile, e.g., as depicted in a tumor
response map, begins to increase dramatically, this may indicate
that the current treatment is not working. A comparison can also be
done with genetic profiles of other subjects. For example, if it is
determined that this increase in copy number variation indicates
that the cancer is advancing, then the original treatment regimen
as prescribed is no longer treating the cancer and a new treatment
is prescribed.
[0162] These reports can be submitted and accessed electronically
via the internet. Analysis of sequence data may occur at a site
other than the location of the subject. The report can be generated
and transmitted to the subject's location. Via an internet enabled
computer, the subject may access the reports reflecting his tumor
burden.
[0163] Next, details of exemplary gene testing processes are
disclosed. Turning now to FIG. 11A, an exemplary process receives
genetic materials from a blood sample or other body samples (1102).
The process can comprise converting the polynucleotides from the
genetic materials into tagged parent polynucleotides (1104). The
tagged parent polynucleotides are amplified to produce amplified
progeny polynucleotides (1106). A subset of the amplified
polynucleotides is sequenced to produce sequence reads (1108),
which are grouped into families, each generated from a unique
tagged parent polynucleotide (1110). At a selected locus, the
process can comprise assigning each family a confidence score
(1112). Next, a consensus is determined using prior readings. This
is done by reviewing prior confidence scores for each family, and
if consistent prior confidence scores exist, then the current
confidence score is increased (1114). If there are prior confidence
scores, but they are inconsistent, the current confidence score is
not modified in one embodiment (1116). In other embodiments, the
confidence score is adjusted in a predetermined manner for
inconsistent prior confidence scores. If this is a first time the
family is detected, the current confidence score can be reduced as
it may be a false reading (1118). The process can comprise
inferring the frequency of the family at the locus in the set of
tagged parent polynucleotides based on the confidence score. Then
genetic test reports are generated as discussed above (1120).
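The adjustment of a family's confidence score using prior readings
(1114, 1116, 1118) can be sketched as below. The disclosure does
not define what makes prior scores consistent or how large the
adjustment is; the spread tolerance, the step size, and the
function name are assumptions chosen for illustration.

```python
def adjust_confidence(current, prior_scores, step=0.1, tolerance=0.2):
    """Adjust a family's confidence score (0..1) using prior scores.

    Prior scores are treated as consistent when their spread
    (max - min) is within tolerance (an assumed criterion).
    """
    if not prior_scores:
        # First detection of this family: it may be a false reading.
        adjusted = current - step
    elif max(prior_scores) - min(prior_scores) <= tolerance:
        # Consistent prior scores: increase the current score.
        adjusted = current + step
    else:
        # Inconsistent prior scores: leave the score unmodified.
        adjusted = current
    # Keep the score within the 0..1 range.
    return min(1.0, max(0.0, adjusted))
```

For example, under these assumptions a family scored 0.5 with
prior scores of 0.6 and 0.7 would be raised to about 0.6, while the
same family seen for the first time would be lowered to about 0.4.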
[0164] While temporal information has been used in FIGS. 11A-11B to
enhance the information for mutation or copy number variation
detection, other consensus methods can be applied. In other
embodiments, the historical comparison can be used in conjunction
with other consensus sequences mapping to a particular reference
sequence to detect instances of genetic variation. Consensus
sequences mapping to particular reference sequences can be measured
and normalized against control samples. Measures of molecules
mapping to reference sequences can be compared across a genome to
identify areas in the genome in which copy number varies, or
heterozygosity is lost. Consensus methods include, for example,
linear or non-linear methods of building consensus sequences (e.g.,
voting, averaging, statistical, maximum a posteriori or maximum
likelihood detection, dynamic programming, Bayesian, hidden Markov
or support vector machine methods, etc.) derived from digital
communication theory, information theory, or bioinformatics. After
the sequence read coverage has been determined, a stochastic
modeling algorithm is applied to convert the normalized nucleic
acid sequence read coverage for each window region to the discrete
copy number states. In some cases, this algorithm may comprise one
or more of the following: Hidden Markov Model, dynamic programming,
support vector machine, Bayesian network, trellis decoding, Viterbi
decoding, expectation maximization, Kalman filtering methodologies
and neural networks.
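As one illustration of converting normalized window coverage into
discrete copy number states, the sketch below applies Viterbi
decoding to a small hidden Markov model. All model parameters are
assumptions made for the example: three states (1, 2, or 3 copies),
Gaussian emissions centered at copies/2 with a shared standard
deviation, and a fixed probability of staying in the same state
between adjacent windows.

```python
import math


def viterbi_copy_number(coverage, states=(1, 2, 3), sigma=0.1, stay=0.9):
    """Most likely copy number state for each window.

    coverage holds normalized read coverage per window, where 1.0
    corresponds to the normal two-copy state.
    """
    if not coverage:
        return []
    means = {s: s / 2.0 for s in states}
    switch = (1.0 - stay) / (len(states) - 1)

    def emit(state, x):
        # Gaussian log-likelihood up to a constant shared by all states.
        return -((x - means[state]) ** 2) / (2.0 * sigma ** 2)

    def trans(a, b):
        return math.log(stay if a == b else switch)

    score = {s: emit(s, coverage[0]) for s in states}
    back = []
    for x in coverage[1:]:
        prev, score, ptr = score, {}, {}
        for s in states:
            best = max(states, key=lambda p: prev[p] + trans(p, s))
            score[s] = prev[best] + trans(best, s) + emit(s, x)
            ptr[s] = best
        back.append(ptr)

    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


# Hypothetical coverage with a three-window gain in the middle.
calls = viterbi_copy_number([1.0, 1.02, 0.98, 1.5, 1.48, 1.52, 1.0])
```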
[0165] As depicted in FIG. 11B, a comparison of sequence coverage
to a control sample or reference sequence may aid in normalization
across windows. In this embodiment, cell free DNAs are extracted
and isolated from a readily accessible bodily fluid such as blood,
sweat, saliva, urine, etc. For example, cell free DNAs can be
extracted using a variety of methods known in the art, including
but not limited to isopropanol precipitation and/or silica based
purification. Cell free DNAs may be extracted from any number of
subjects, such as subjects without cancer, subjects at risk for
cancer, or subjects known to have cancer (e.g. through other
means).
[0166] Following the isolation/extraction step, any of a number of
different sequencing operations may be performed on the cell free
polynucleotide sample. Samples may be processed before sequencing
with one or more reagents (e.g., enzymes, unique identifiers (e.g.,
barcodes), probes, etc.). In some cases if the sample is processed
with a unique identifier such as a barcode, the samples or
fragments of samples may be tagged individually or in subgroups
with the unique identifier. The tagged sample may then be used in a
downstream application such as a sequencing reaction and individual
molecules may be tracked to parent molecules.
[0167] The cell free polynucleotides can be tagged or tracked in
order to permit subsequent identification and origin of the
particular polynucleotide. The assignment of an identifier to
individual or subgroups of polynucleotides may allow for a unique
identity to be assigned to individual sequences or fragments of
sequences. This may allow acquisition of data from individual
samples and is not limited to averages of samples. In some
examples, nucleic acids or other molecules derived from a single
strand may share a common tag or identifier and therefore may be
later identified as being derived from that strand. Similarly, all
of the fragments from a single strand of nucleic acid may be tagged
with the same identifier or tag, thereby permitting subsequent
identification of fragments from the parent strand. In other cases,
gene expression products (e.g., mRNA) may be tagged in order to
quantify expression. A barcode or barcode in combination with
sequence to which it is attached can be counted. In still other
cases, the systems and methods can be used as a PCR amplification
control. In such cases, multiple amplification products from a PCR
reaction can be tagged with the same tag or identifier. If the
products are later sequenced and demonstrate sequence differences,
differences among products with the same identifier can then be
attributed to PCR error. Additionally, individual sequences may be
identified based upon characteristics of sequence data for the reads
themselves. For example, the detection of unique sequence data at
the beginning (start) and end (stop) portions of individual
sequencing reads may be used, alone or in combination, with the
length, or number of base pairs of each sequence read to assign
unique identities to individual molecules. Fragments from a single
strand of nucleic acid, having been assigned a unique identity, may
thereby permit subsequent identification of fragments from the
parent strand. This can be used in conjunction with bottlenecking
the initial starting genetic material to limit diversity.
[0168] Further, unique sequence data at the beginning (start) and
end (stop) portions of individual sequencing reads and sequencing
read length may be used, alone or in combination, with the use of
barcodes. In some cases, the barcodes may be unique as
described herein. In other cases, the barcodes themselves may not
be unique. In this case, the use of non-unique barcodes, in
combination with sequence data at the beginning (start) and end
(stop) portions of individual sequencing reads and sequencing read
length may allow for the assignment of a unique identity to
individual sequences. Similarly, fragments from a single strand of
nucleic acid having been assigned a unique identity may thereby
permit subsequent identification of fragments from the parent
strand.
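A minimal sketch of assigning identities with non-unique barcodes
is shown below. Each read is keyed by its barcode together with its
start position, stop position, and length, so that two molecules
sharing a barcode are still separated when their endpoints differ.
The record fields and the function name are assumptions made for
illustration.

```python
from collections import defaultdict


def group_into_families(reads):
    """Group reads that share barcode, start, stop and length.

    reads is an iterable of dicts with the keys 'barcode', 'start',
    'stop' and 'length'. Returns a dict mapping each identity key
    to the list of reads assigned to it.
    """
    families = defaultdict(list)
    for read in reads:
        key = (read["barcode"], read["start"], read["stop"], read["length"])
        families[key].append(read)
    return dict(families)


# Hypothetical reads: the same barcode appears on two molecules.
families = group_into_families(
    [
        {"barcode": "ACGT", "start": 100, "stop": 265, "length": 165},
        {"barcode": "ACGT", "start": 100, "stop": 265, "length": 165},
        {"barcode": "ACGT", "start": 340, "stop": 510, "length": 170},
    ]
)
```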
[0169] Generally, the methods and systems provided herein are
useful for preparation of cell free polynucleotide sequences for a
downstream application such as a sequencing reaction. Often, a sequencing
method is classic Sanger sequencing. Sequencing methods may
include, but are not limited to: high-throughput sequencing,
pyrosequencing, sequencing-by-synthesis, single-molecule
sequencing, nanopore sequencing, semiconductor sequencing,
sequencing-by-ligation, sequencing-by-hybridization, RNA-Seq
(Illumina), Digital Gene Expression (Helicos), Next generation
sequencing, Single Molecule Sequencing by Synthesis (SMSS)
(Helicos), massively-parallel sequencing, Clonal Single Molecule
Array (Solexa), shotgun sequencing, Maxam-Gilbert sequencing,
primer walking, and any other sequencing methods known in the
art.
[0170] Sequencing methods typically involve sample preparation,
sequencing of polynucleotides in the prepared sample to produce
sequence reads and bioinformatic manipulation of the sequence reads
to produce quantitative and/or qualitative genetic information
about the sample. Sample preparation typically involves converting
polynucleotides in a sample into a form compatible with the
sequencing platform used. This conversion can involve tagging
polynucleotides. In certain embodiments of this invention the tags
comprise polynucleotide sequence tags. Conversion methodologies
used in sequencing may not be 100% efficient. For example, it is
not uncommon to convert polynucleotides in a sample with a
conversion efficiency of about 1-5%, that is, about 1-5% of the
polynucleotides in a sample are converted into tagged
polynucleotides. Polynucleotides that are not converted into tagged
molecules are not represented in a tagged library for sequencing.
Accordingly, polynucleotides having genetic variants represented at
low frequency in the initial genetic material may not be
represented in the tagged library and, therefore may not be
sequenced or detected. By increasing conversion efficiency, the
probability that a polynucleotide in the initial genetic material
will be represented in the tagged library and, consequently,
detected by sequencing is increased. Furthermore, rather than
directly address the low conversion efficiency issue of library
preparation, most protocols to date call for greater than 1
microgram of DNA as input material. However, when input sample
material is limited or detection of polynucleotides with low
representation is desired, high conversion efficiency is needed to
efficiently sequence the sample and/or to adequately detect such
polynucleotides.
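By way of illustration only, the effect of conversion efficiency on
detection can be sketched with a short calculation. The sketch
assumes that each input molecule is converted into a tagged molecule
independently and with the same probability; that independence
assumption, and the function name prob_variant_in_library, are
illustrative and are not taken from the disclosure.

```python
def prob_variant_in_library(variant_copies: int,
                            conversion_efficiency: float) -> float:
    """Probability that at least one variant molecule is tagged.

    Assumes (hypothetically) that each of the `variant_copies` input
    molecules is converted independently with probability
    `conversion_efficiency`.
    """
    if variant_copies < 0:
        raise ValueError("variant_copies must be non-negative")
    if not 0.0 <= conversion_efficiency <= 1.0:
        raise ValueError("conversion_efficiency must be in [0, 1]")
    # P(at least one converted) = 1 - P(none converted)
    return 1.0 - (1.0 - conversion_efficiency) ** variant_copies


if __name__ == "__main__":
    # Compare a low (5%) and a higher (50%) conversion efficiency
    # for a variant present in 10 copies of the input material.
    for eff in (0.05, 0.50):
        print(eff, round(prob_variant_in_library(10, eff), 4))
```

Under this model, raising the conversion efficiency raises the
chance that a rare variant is represented in the tagged library at
all, which is the point made in the preceding paragraph.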
[0171] Generally, mutation detection may be performed on
selectively enriched regions of the genome or transcriptome
purified and isolated (1302). As described herein, specific
regions, which may include but are not limited to genes, oncogenes,
tumor suppressor genes, promoters, regulatory sequence elements,
non-coding regions, miRNAs, snRNAs and the like may be selectively
amplified from a total population of cell free polynucleotides.
This may be performed as herein described. In one example,
multiplex sequencing may be used, with or without barcode labels
for individual polynucleotide sequences. In other examples,
sequencing may be performed using any nucleic acid sequencing
platforms known in the art. This step generates a plurality of
genomic fragment sequence reads (1304). Additionally, a reference
sequence is obtained from a control sample, taken from another
subject. In some cases, the control subject may be a subject known
not to have genetic aberrations or disease. In some cases,
these sequence reads may contain barcode information. In other
examples, barcodes are not utilized.
[0172] After sequencing, reads can be assigned a quality score. A
quality score may be a representation of reads that indicates
whether those reads may be useful in subsequent analysis based on a
threshold. In some cases, some reads are not of sufficient quality
or length to perform the subsequent mapping step. Sequencing reads
with a quality score at least 90%, 95%, 99%, 99.9%, 99.99% or
99.999% may be filtered out of the data set. In other cases,
sequencing reads assigned a quality score less than 90%, 95%, 99%,
99.9%, 99.99% or 99.999% may be filtered out of the data set. In
step 1306, the genomic fragment reads that meet a specified quality
score threshold are mapped to a reference genome, or a reference
sequence that is known not to contain mutations. After mapping
alignment, sequence reads are assigned a mapping score. A mapping
score may be a representation of reads mapped back to the reference
sequence indicating whether each position is or is not uniquely
mappable. In some instances, reads may be sequences unrelated to
mutation analysis. For example, some sequence reads may originate
from contaminant polynucleotides. Sequencing reads with a mapping
score at least 90%, 95%, 99%, 99.9%, 99.99% or 99.999% may be
filtered out of the data set. In other cases, sequencing reads
assigned a mapping score less than 90%, 95%, 99%, 99.9%, 99.99% or
99.999% may be filtered out of the data set.
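By way of illustration only, the two filtering steps described above
can be sketched as follows. The sketch assumes that quality and
mapping scores are expressed as fractions between 0 and 1 and that
reads meeting both thresholds are retained, as in step 1306; the Read
type, its field names and the default thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Read:
    """Hypothetical minimal read record used only for this sketch."""
    name: str
    quality: float        # read-level quality score, as a fraction
    mapping_score: float  # mapping score, as a fraction


def filter_reads(reads: Iterable[Read],
                 min_quality: float = 0.99,
                 min_mapping: float = 0.99) -> List[Read]:
    """Keep reads whose quality and mapping scores meet both thresholds."""
    return [
        r for r in reads
        if r.quality >= min_quality and r.mapping_score >= min_mapping
    ]


if __name__ == "__main__":
    reads = [
        Read("r1", 0.999, 0.999),  # passes both thresholds
        Read("r2", 0.50, 0.999),   # fails the quality threshold
        Read("r3", 0.999, 0.20),   # fails the mapping threshold
    ]
    print([r.name for r in filter_reads(reads)])
```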
[0173] For each mappable base, bases that do not meet the minimum
threshold for mappability, or low quality bases, may be replaced by
the corresponding bases as found in the reference sequence.
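By way of illustration only, the replacement of low quality bases by
reference bases can be sketched as follows. The sketch assumes a read
already aligned to a reference segment of equal length and a per-base
quality expressed as a fraction; the function name and the threshold
value are hypothetical.

```python
from typing import Sequence


def replace_low_quality_bases(read_bases: str,
                              base_qualities: Sequence[float],
                              reference: str,
                              min_quality: float = 0.99) -> str:
    """Replace bases below `min_quality` with the reference base."""
    if not (len(read_bases) == len(base_qualities) == len(reference)):
        raise ValueError("read, qualities and reference must align 1:1")
    return "".join(
        ref if q < min_quality else base
        for base, q, ref in zip(read_bases, base_qualities, reference)
    )


if __name__ == "__main__":
    # The second and fourth bases fall below the threshold and are
    # replaced by the corresponding reference bases.
    print(replace_low_quality_bases("ACGT", [0.999, 0.5, 0.999, 0.2], "AAGG"))
```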
[0174] The frequency of variant bases may be calculated as the
number of reads containing the variant divided by the total number
of reads 1308 after ascertaining read coverage and identifying
variant bases relative to the control sequence in each read. This
may be expressed as a ratio for each mappable position in the
genome.
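By way of illustration only, the per-position ratio described above
can be sketched as follows. The sketch assumes ungapped alignments
given as (start position, read sequence) pairs against a reference
string; that input representation and the function name are
hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def variant_frequencies(reference: str,
                        aligned_reads: Iterable[Tuple[int, str]]
                        ) -> Dict[int, float]:
    """Return, per covered position, variant reads / total reads."""
    coverage: Counter = Counter()
    variants: Counter = Counter()
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if pos < 0 or pos >= len(reference):
                continue  # ignore bases that fall off the reference
            coverage[pos] += 1
            if base != reference[pos]:
                variants[pos] += 1
    # Ratio of variant-containing reads to all reads covering the position.
    return {pos: variants[pos] / coverage[pos] for pos in sorted(coverage)}


if __name__ == "__main__":
    reads = [(0, "ACGT"), (0, "ATGT"), (2, "GA")]
    for pos, freq in variant_frequencies("ACGT", reads).items():
        print(pos, round(freq, 3))
```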
[0175] For each base position, the frequencies of all four
nucleotides, cytosine, guanine, thymine, adenine can be analyzed in
comparison to the reference sequence. A stochastic or statistical
modeling algorithm can be applied to convert the normalized ratios
for each mappable position to reflect frequency states for each
base variant. In some cases, this algorithm may comprise one or
more of the following: Hidden Markov Model, dynamic programming,
support vector machine, Bayesian or probabilistic modeling, trellis
decoding, Viterbi decoding, expectation maximization, Kalman
filtering methodologies, and neural networks.
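By way of illustration only, one of the listed options, Viterbi
decoding of a hidden Markov model, can be sketched as follows. The
sketch assumes a small set of discrete frequency states, a Gaussian
emission model with a fixed spread, a uniform starting distribution
and a single stay probability; all of these modeling choices and
parameter values are hypothetical and are not taken from the
disclosure.

```python
import math
from typing import List, Sequence


def viterbi_states(freqs: Sequence[float],
                   state_means: Sequence[float],
                   sigma: float = 0.05,
                   stay_prob: float = 0.9) -> List[int]:
    """Decode the most likely discrete state index for each position."""
    n_states = len(state_means)
    if n_states < 2:
        raise ValueError("need at least two states")
    if not freqs:
        return []
    log_stay = math.log(stay_prob)
    log_switch = math.log((1.0 - stay_prob) / (n_states - 1))

    def log_emit(obs: float, mean: float) -> float:
        # Gaussian log-likelihood up to a constant shared by all states.
        return -((obs - mean) ** 2) / (2.0 * sigma ** 2)

    def log_trans(prev: int, cur: int) -> float:
        return log_stay if prev == cur else log_switch

    score = [log_emit(freqs[0], m) for m in state_means]
    backpointers: List[List[int]] = []
    for obs in freqs[1:]:
        new_score, pointers = [], []
        for cur, mean in enumerate(state_means):
            prev = max(range(n_states),
                       key=lambda p: score[p] + log_trans(p, cur))
            new_score.append(score[prev] + log_trans(prev, cur)
                             + log_emit(obs, mean))
            pointers.append(prev)
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    observed = [0.0, 0.01, 0.48, 0.52, 0.5, 0.02]
    print(viterbi_states(observed, state_means=[0.0, 0.5]))
```

Here state 0 stands for a position matching the reference and state 1
for a variant present at roughly half of the reads; other state sets
could be substituted.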
[0176] The discrete mutation states of each base position can be
utilized to identify a base variant with high frequency of variance
as compared to the baseline of the reference sequence. In some
cases, the baseline might represent a frequency of at least
0.0001%, 0.001%, 0.01%, 0.1%, 1.0%, 2.0%, 3.0%, 4.0%, 5.0%, 10%, or
25%. In other cases the baseline might represent a frequency of at
least 0.0001%, 0.001%, 0.01%, 0.1%, 1.0%, 2.0%, 3.0%, 4.0%, 5.0%,
10%, or 25%. In some cases, all adjacent base positions with the
base variant or mutation can be merged into a segment to report the
presence or absence of a mutation. In some cases, various positions
can be filtered before they are merged with other segments.
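By way of illustration only, merging adjacent variant positions into
segments can be sketched as follows. The sketch assumes zero-based
integer positions and treats only directly neighboring positions as
adjacent; the function name is hypothetical.

```python
from typing import Iterable, List, Tuple


def merge_adjacent_positions(positions: Iterable[int]
                             ) -> List[Tuple[int, int]]:
    """Merge consecutive positions into inclusive (start, end) segments."""
    segments: List[Tuple[int, int]] = []
    for pos in sorted(set(positions)):
        if segments and pos == segments[-1][1] + 1:
            # Extend the current segment by one position.
            segments[-1] = (segments[-1][0], pos)
        else:
            segments.append((pos, pos))
    return segments


if __name__ == "__main__":
    print(merge_adjacent_positions([5, 3, 4, 10, 11, 20]))
```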
[0177] After calculation of frequencies of variance for each base
position, the variant with largest deviation for a specific
position in the sequence derived from the subject as compared to
the reference sequence can be identified as a mutation. In some
cases, a mutation may be a cancer mutation. In other cases, a
mutation might be correlated with a disease state.
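By way of illustration only, selecting the variant with the largest
deviation at a single position can be sketched as follows. The sketch
assumes that base counts from the subject and base frequencies from
the control are available for the position, and measures deviation as
a simple difference of frequencies; that measure and the function
name are hypothetical.

```python
from typing import Dict, Optional, Tuple


def largest_deviation_variant(subject_counts: Dict[str, int],
                              control_freqs: Dict[str, float],
                              reference_base: str
                              ) -> Optional[Tuple[str, float]]:
    """Return (base, deviation) for the non-reference base that deviates
    most from the control frequency, or None if there is no coverage."""
    total = sum(subject_counts.values())
    if total == 0:
        return None
    best: Optional[Tuple[str, float]] = None
    for base in "ACGT":
        if base == reference_base:
            continue
        deviation = (subject_counts.get(base, 0) / total
                     - control_freqs.get(base, 0.0))
        if best is None or deviation > best[1]:
            best = (base, deviation)
    return best


if __name__ == "__main__":
    counts = {"A": 90, "C": 2, "G": 8, "T": 0}
    control = {"A": 0.99, "C": 0.01}
    print(largest_deviation_variant(counts, control, reference_base="A"))
```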
[0178] A mutation or variant may comprise a genetic aberration that
includes, but is not limited to a single base substitution, or
small indels, transversions, translocations, inversion, deletions,
truncations or gene truncations. In some cases, a mutation may be
at most 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15 or 20 nucleotides in
length. In other cases a mutation may be at least 1, 2, 3, 4, 5, 6,
7, 8, 9, 10, 15 or 20 nucleotides in length.
[0179] Next, a consensus is determined using prior readings. This
is done by reviewing prior confidence scores for the corresponding
bases, and if consistent prior confidence scores exist, then the
current confidence score is increased (1314). If there are prior
confidence scores, but they are inconsistent, the current
confidence score is not modified in one embodiment (1316). In other
embodiments, the confidence score is adjusted in a predetermined
manner for inconsistent prior confidence scores. If this is a first
time the family is detected, the current confidence score can be
reduced as it may be a false reading (1318). The process can
comprise then converting the frequency of variance per each base
into discrete variant states for each base position (1320).
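By way of illustration only, the confidence adjustment of steps 1314,
1316 and 1318 can be sketched as follows. The sketch assumes that
prior readings are "consistent" when every prior call at the position
equals the current call, and uses fixed step sizes; that
interpretation, the step sizes and the function name are
hypothetical.

```python
from typing import Sequence


def adjust_confidence(current_score: float,
                      current_call: str,
                      prior_calls: Sequence[str],
                      boost: float = 0.1,
                      penalty: float = 0.1) -> float:
    """Adjust a confidence score in [0, 1] using prior readings."""
    if not prior_calls:
        # First time detected: may be a false reading (1318).
        adjusted = current_score - penalty
    elif all(call == current_call for call in prior_calls):
        # Consistent prior readings: increase confidence (1314).
        adjusted = current_score + boost
    else:
        # Inconsistent prior readings: unchanged in one embodiment (1316).
        adjusted = current_score
    return min(1.0, max(0.0, adjusted))


if __name__ == "__main__":
    print(adjust_confidence(0.5, "T", []))
    print(adjust_confidence(0.5, "T", ["T", "T"]))
    print(adjust_confidence(0.5, "T", ["T", "C"]))
```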
[0180] Numerous cancers may be detected using the methods and
systems described herein. Cancer cells, as most cells, can be
characterized by a rate of turnover, in which old cells die and are
replaced by newer cells. Generally dead cells, in contact with
vasculature in a given subject, may release DNA or fragments of DNA
into the blood stream. This is also true of cancer cells during
various stages of the disease. Cancer cells may also be
characterized, dependent on the stage of the disease, by various
genetic aberrations such as copy number variation as well as
mutations. This phenomenon may be used to detect the presence or
absence of cancers in individuals using the methods and systems
described herein.
[0181] For example, blood from subjects at risk for cancer may be
drawn and prepared as described herein to generate a population of
cell free polynucleotides. In one example, this might be cell free
DNA. The systems and methods of the disclosure may be employed to
detect mutations or copy number variations that may exist in
certain cancers present. The method may help detect the presence of
cancerous cells in the body, despite the absence of symptoms or
other hallmarks of disease.
[0182] The types and number of cancers that may be detected may
include but are not limited to blood cancers, brain cancers, lung
cancers, skin cancers, nose cancers, throat cancers, liver cancers,
bone cancers, lymphomas, pancreatic cancers, bowel
cancers, rectal cancers, thyroid cancers, bladder cancers, kidney
cancers, mouth cancers, stomach cancers, solid state tumors,
heterogeneous tumors, homogenous tumors and the like.
[0183] The system and methods may be used to detect any number of
genetic aberrations that may cause or result from cancers. These
may include but are not limited to mutations, indels,
copy number variations, transversions, translocations, inversion,
deletions, aneuploidy, partial aneuploidy, polyploidy, chromosomal
instability, chromosomal structure alterations, gene fusions,
chromosome fusions, gene truncations, gene amplification, gene
duplications, chromosomal lesions, DNA lesions, abnormal changes in
nucleic acid chemical modifications, abnormal changes in epigenetic
patterns, abnormal changes in nucleic acid methylation, infection
and cancer.
[0184] Additionally, the systems and methods described herein may
also be used to help characterize certain cancers. Genetic data
produced from the system and methods of this disclosure may allow
practitioners to help better characterize a specific form of
cancer. Often times, cancers are heterogeneous in both composition
and staging. Genetic profile data may allow characterization of
specific sub-types of cancer that may be important in the diagnosis
or treatment of that specific sub-type. This information may also
provide a subject or practitioner clues regarding the prognosis of
a specific type of cancer.
[0185] The systems and methods provided herein may be used to
monitor already known cancers, or other diseases in a particular
subject. This may allow either a subject or practitioner to adapt
treatment options in accord with the progress of the disease. In
this example, the systems and methods described herein may be used
to construct genetic profiles of a particular subject over the course
of the disease. In some instances, cancers can progress, becoming
more aggressive and genetically unstable. In other examples,
cancers may remain benign, inactive or dormant. The system and
methods of this disclosure may be useful in determining disease
progression.
[0186] Further, the systems and methods described herein may be
useful in determining the efficacy of a particular treatment
option. In one example, successful treatment options may actually
increase the amount of copy number variation or mutations detected
in a subject's blood, as more cancer cells
may die and shed DNA. In other examples, this may not occur. In
another example, perhaps certain treatment options may be
correlated with genetic profiles of cancers over time. This
correlation may be useful in selecting a therapy. Additionally, if
a cancer is observed to be in remission after treatment, the
systems and methods described herein may be useful in monitoring
residual disease or recurrence of disease.
[0187] The methods and systems described herein may not be limited
to detection of mutations and copy number variations associated
with only cancers. Various other diseases and infections may result
in other types of conditions that may be suitable for early
detection and monitoring. For example, in certain cases, genetic
disorders or infectious diseases may cause a certain genetic
mosaicism within a subject. This genetic mosaicism may cause copy
number variation and mutations that could be observed. In another
example, the system and methods of the disclosure may also be used
to monitor the genomes of immune cells within the body. Immune
cells, such as B cells, may undergo rapid clonal expansion in the
presence of certain diseases. Clonal expansions may be monitored using
copy number variation detection and certain immune states may be
monitored. In this example, copy number variation analysis may be
performed over time to produce a profile of how a particular
disease may be progressing.
[0188] Further, the systems and methods of this disclosure may also
be used to monitor systemic infections themselves, as may be caused
by a pathogen such as a bacterium or virus. Copy number variation or
even mutation detection may be used to determine how a population
of pathogens is changing during the course of infection. This may
be particularly important during chronic infections, such as
HIV/AIDS or hepatitis infections, whereby viruses may change life
cycle state and/or mutate into more virulent forms during the
course of infection.
[0189] Yet another example that the system and methods of this
disclosure may be used for is the monitoring of transplant
subjects. Generally, transplanted tissue undergoes a certain degree
of rejection by the body upon transplantation. The methods of this
disclosure may be used to determine or profile rejection activities
of the host body, as immune cells attempt to destroy transplanted
tissue. This may be useful in monitoring the status of transplanted
tissue as well as altering the course of treatment or prevention of
rejection.
[0190] Further, the methods of the disclosure may be used to
characterize the heterogeneity of an abnormal condition in a
subject, the method comprising generating a genetic profile of
extracellular polynucleotides in the subject, wherein the genetic
profile comprises a plurality of data resulting from copy number
variation and mutation analyses. In some cases, including but not
limited to cancer, a disease may be heterogeneous. Disease cells
may not be identical. In the example of cancer, some tumors are
known to comprise different types of tumor cells, some cells in
different stages of the cancer. In other examples, heterogeneity
may comprise multiple foci of disease. Again, in the example of
cancer, there may be multiple tumor foci, perhaps where one or more
foci are the result of metastases that have spread from a primary
site.
[0191] The methods of this disclosure may be used to generate a
profile, fingerprint or set of data that is a summation of genetic
information derived from different cells in a heterogeneous
disease. This set of data may comprise copy number variation and
mutation analyses alone or in combination.
[0192] Additionally, the systems and methods of the disclosure may
be used to diagnose, prognose, monitor or observe cancers or other
diseases of fetal origin. That is, these methodologies may be
employed in a pregnant subject to diagnose, prognose, monitor or
observe cancers or other diseases in an unborn subject whose DNA and
other polynucleotides may co-circulate with maternal molecules.
[0193] Further, these reports are submitted and accessed
electronically via the internet. Analysis of sequence data occurs
at a site other than the location of the subject. The report is
generated and transmitted to the subject's location. Via an
internet enabled computer, the subject accesses the reports
reflecting his tumor burden.
[0194] The annotated information can be used by a health care
provider to select other drug treatment options and/or provide
information about drug treatment options to an insurance company.
The method can include annotating the drug treatment options for a
condition in, for example, the NCCN Clinical Practice Guidelines in
Oncology™ or the American Society of Clinical Oncology (ASCO)
clinical practice guidelines.
[0195] The drug treatment options that are stratified in a report
can be annotated in the report by listing additional drug treatment
options. An additional drug treatment can be an FDA-approved drug
for an off-label use. A provision in the 1993 Omnibus Budget
Reconciliation Act (OBRA) requires Medicare to cover off-label uses
of anticancer drugs that are included in standard medical
compendia. The drugs used for annotating lists can be found in CMS
approved compendia, including the National Comprehensive Cancer
Network (NCCN) Drugs and Biologics Compendium™, Thomson
Micromedex DrugDex®, Elsevier Gold Standard's Clinical
Pharmacology compendium, and American Hospital Formulary
Service--Drug Information Compendium®.
[0196] The drug treatment options can be annotated by listing an
experimental drug that may be useful in treating a cancer with one
or more molecular markers of a particular status. The experimental
drug can be a drug for which in vitro data, in vivo data, animal
model data, pre-clinical trial data, or clinical-trial data are
available. The data can be published in peer-reviewed medical
literature found in journals listed in the CMS Medicare Benefit
Policy Manual, including, for example, American Journal of
Medicine, Annals of Internal Medicine, Annals of Oncology, Annals
of Surgical Oncology, Biology of Blood and Marrow Transplantation,
Blood, Bone Marrow Transplantation, British Journal of Cancer,
British Journal of Hematology, British Medical Journal, Cancer,
Clinical Cancer Research, Drugs, European Journal of Cancer
(formerly the European Journal of Cancer and Clinical Oncology),
Gynecologic Oncology, International Journal of Radiation, Oncology,
Biology, and Physics, The Journal of the American Medical
Association, Journal of Clinical Oncology, Journal of the National
Cancer Institute, Journal of the National Comprehensive Cancer
Network (NCCN), Journal of Urology, Lancet, Lancet Oncology,
Leukemia, The New England Journal of Medicine, and Radiation
Oncology.
[0197] The drug treatment options can be annotated by providing a
link on an electronic based report connecting a listed drug to
scientific information regarding the drug. For example, a link can
be provided to information regarding a clinical trial for a drug
(clinicaltrials.gov). If the report is provided via a computer or
computer website, the link can be a footnote, a hyperlink to a
website, a pop-up box, or a fly-over box with information, etc. The
report and the annotated information can be provided on a printed
form, and the annotations can be, for example, a footnote to a
reference.
[0198] The information for annotating one or more drug treatment
options in a report can be provided by a commercial entity that
stores scientific information. A health care provider can treat a
subject, such as a cancer patient, with an experimental drug listed
in the annotated information, and the health care provider can
access the annotated drug treatment option, retrieve the scientific
information (e.g., print a medical journal article) and submit it
(e.g., a printed journal article) to an insurance company along
with a request for reimbursement for providing the drug treatment.
Physicians can use any of a variety of Diagnosis-related group
(DRG) codes to enable reimbursement.
[0199] A drug treatment option in a report can also be annotated
with information regarding other molecular components in a pathway
that a drug affects (e.g., information on a drug that targets a
kinase downstream of a cell-surface receptor that is a drug
target). The drug treatment option can be annotated with
information on drugs that target one or more other molecular
pathway components. The identification and/or annotation of
information related to pathways can be outsourced or subcontracted
to another company.
[0200] The annotated information can be, for example, a drug name
(e.g., an FDA approved drug for off-label use; a drug found in a
CMS approved compendium, and/or a drug described in a scientific
(medical) journal article), scientific information concerning one
or more drug treatment options, one or more links to scientific
information regarding one or more drugs, clinical trial information
regarding one or more drugs (e.g., information from
clinicaltrials.gov/), one or more links to citations for scientific
information regarding drugs, etc.
[0201] The annotated information can be inserted into any location
in a report. Annotated information can be inserted in multiple
locations on a report. Annotated information can be inserted in a
report near a section on stratified drug treatment options.
Annotated information can be inserted into a report on a separate
page from stratified drug treatment options. A report that does not
contain stratified drug treatment options can be annotated with
information.
[0202] The system can also include reports on the effects of drugs
on a sample (e.g., tumor cells) isolated from a subject (e.g., a cancer
patient). An in vitro culture using a tumor from a cancer patient
can be established using techniques known to those skilled in the
art. The system can also include high-throughput screening of FDA
approved off-label drugs or experimental drugs using said in vitro
culture and/or xenograft model. The system can also include
monitoring tumor antigen for recurrence detection.
[0203] The system can provide internet enabled access of reports of
a subject with cancer. The system can use a handheld DNA sequencer
or a desktop DNA sequencer. The DNA sequencer is a scientific
instrument used to automate the DNA sequencing process. Given a
sample of DNA, a DNA sequencer is used to determine the order of
the four bases: adenine, guanine, cytosine, and thymine. The order
of the DNA bases is reported as a text string, called a read. Some
DNA sequencers can be also considered optical instruments as they
analyze light signals originating from fluorochromes attached to
nucleotides.
[0204] The data is sent by the DNA sequencers over a direct
connection or over the internet to a computer for processing. The
data processing aspects of the system can be implemented in digital
electronic circuitry, or in computer hardware, firmware, software,
or in combinations of them. Data processing apparatus of the
invention can be implemented in a computer program product tangibly
embodied in a machine-readable storage device for execution by a
programmable processor; and data processing method steps of the
invention can be performed by a programmable processor executing a
program of instructions to perform functions of the invention by
operating on input data and generating output. The data processing
aspects of the invention can be implemented advantageously in one
or more computer programs that are executable on a programmable
system including at least one programmable processor coupled to
receive data and instructions from and to transmit data and
instructions to a data storage system, at least one input device,
and at least one output device. Each computer program can be
implemented in a high-level procedural or object-oriented
programming language, or in assembly or machine language, if
desired; and, in any case, the language can be a compiled or
interpreted language. Suitable processors include, by way of
example, both general and special purpose microprocessors.
Generally, a processor will receive instructions and data from a
read-only memory and/or a random access memory. Storage devices
suitable for tangibly embodying computer program instructions and
data include all forms of nonvolatile memory, including by way of
example semiconductor memory devices, such as EPROM, EEPROM, and
flash memory devices; magnetic disks such as internal hard disks
and removable disks; magneto-optical disks; and CD-ROM disks. Any
of the foregoing can be supplemented by, or incorporated in, ASICs
(application-specific integrated circuits).
[0205] To provide for interaction with a user, the invention can be
implemented using a computer system having a display device such as
a monitor or LCD (liquid crystal display) screen for displaying
information to the user and input devices by which the user can
provide input to the computer system such as a keyboard, a
two-dimensional pointing device such as a mouse or a trackball, or
a three-dimensional pointing device such as a data glove or a
gyroscopic mouse. The computer system can be programmed to provide
a graphical user interface through which computer programs interact
with users. The computer system can be programmed to provide a
virtual reality, three-dimensional display interface.
Therapeutic Intervention
[0206] The methods of this disclosure allow one to provide
therapeutic interventions more precisely directed to the form of a
disease in a subject, and to calibrate these therapeutic
interventions over time. This precision reflects, in part, the
precision by which one is able to profile the whole body tumor
status of a subject as reflected in tumor heterogeneity. Thus, the
therapeutic intervention is more effective against cancers with
this profile than against cancers with any single one of these
variants.
[0207] A therapeutic intervention is an intervention that produces
a therapeutic effect (i.e., is therapeutically effective).
Therapeutically effective interventions prevent, slow the
progression of, improve the condition of (e.g., cause remission
of), or cure a disease, such as a cancer. A therapeutic
intervention can include, for example, administration of a
treatment, such as chemotherapy, radiation therapy, surgery,
immunotherapy, administration of a pharmaceutical or a
nutraceutical, or a change in behavior, such as diet. One measure
of therapeutic effectiveness is effectiveness for at least 90% of
subjects undergoing the intervention over at least 100
subjects.
[0208] Drug targets in cancer and drugs efficacious against these
targets are set forth in Tables 1 and 2 (taken from Bailey et al.,
Discovery Medicine, v. 18 #92, 2/7/14).
TABLE 1 Selected Examples of Commercially Available Diagnostic
Tests, Associated Therapy Implication, and Relevant Cancer Type.
Therapy Implications | Test | Cancer Type | Drug-Biomarker Clinical Association
IHC Assays
Cetuximab; Panitumumab | EGFR | CRC | Established
Imatinib | C-KIT | GIST | Established
Trastuzumab | HER2 | Breast Cancer; Gastric Cancer | Established
Resistance to PI3K, AKT, and MEK inhibitors | LKB1 | NSCLC | Investigational (Mahoney et al., 2009)
Crizotinib | C-MET | NSCLC | Investigational (Sadiq & Salgia, 2013)
Akt/mTOR Inhibitors; resistance to anti-EGFR therapies | PTEN | CRC, NSCLC | Investigational (Di Nicolantonio et al., 2010; Sos et al., 2009; Wang et al., 2012)
In Situ Hybridization Assays
Crizotinib | ALK Fusion FISH | NSCLC | Established
Trastuzumab; Pertuzumab | HER2 FISH | Breast Cancer, Gastric Cancer | Established
Trastuzumab | HER2 CISH | Breast Cancer | Established
Trastuzumab | HER2 ISH | Breast Cancer | Established
Mutation Assays
Cetuximab, Panitumumab | KRAS | CRC, NSCLC, Pancreatic Cancer | Established
Erlotinib, Gefitinib | EGFR | NSCLC, CRC | Established
Vemurafenib, Trametinib, Dabrafenib, Resistance to Anti-EGFR therapies | BRAF | CRC, Thyroid Cancer, Melanoma | Established
Imatinib; 2nd Generation TKIs | BCR-ABL | CML, Ph+ AML | Established
Crizotinib | ALK | NSCLC | Established
RAF and MEK inhibitors, resistance to anti-EGFR therapies | NRAS | Melanoma, CRC, NSCLC | Investigational (Ascierto et al., 2013; De Mattos-Arruda et al., 2011; De Roock et al., 2010; Huang et al., 2013)
Imatinib | PDGFRA | GIST | Established
PI3K/mTOR Inhibitors | PIK3CA | Breast Cancer, CRC, Lung Cancer | Investigational (Di Nicolantonio et al., 2010; Janku et al., 2013)
Akt/mTOR Inhibitors; resistance to anti-EGFR therapies | PTEN | CRC, NSCLC, Breast | Investigational (Di Nicolantonio et al., 2010, Jerusalem et al., 2013; Sos et al., 2009; Wang et al., 2012)
Resistance to PI3K, AKT, and MEK inhibitors | LKB1 | NSCLC | Investigational (Averette-Byers et al., 2012)
Other
Imatinib | BCR-ABL1 Quantitative Transcript Analysis | CML, Ph+ AML | Established
Resistance to Imatinib | BCR-ABL1 Copy Number | CML, Ph+ AML | Investigational (Hochhaus et al., 2002)
PI3K Inhibitors | PIK3CA Amplification | Multiple Cancer Types | Investigational (Rodon et al., 2013)
Erlotinib; Gefitinib; Cetuximab; Panitumumab | EGFR Amplification | NSCLC, CRC | Investigational (Gupta et al., 2009)
Note: The drug-biomarker clinical associations denoted
'Established' reflect well known drug FDA indications. The ones
denoted 'Investigational' are associations that are hypothesized
and demonstrated by scientific literature.
TABLE 2 US FDA Approved Targeted Therapies and Indications.
Agent | Trade Name | Target(s) | FDA-approved Indication(s) | Company
Monoclonal Antibodies
Ado-trastuzumab emtansine (T-DM1)* | Kadcyla | HER2 | Breast cancer (HER2+)* | Genentech
Bevacizumab | Avastin | VEGF | CRC; GBM; NSCLC; RCC | Genentech
Cetuximab* | Erbitux | EGFR | CRC (KRAS wild-type)*; HNSCC | Eli Lilly
Ipilimumab | Yervoy | CTLA-4 | Melanoma | Bristol-Myers Squibb
Obinutuzumab | Gazyva | CD-20 | CLL | Genentech
Panitumumab* | Vectibix | EGFR | CRC (KRAS wild-type)* | Amgen
Pertuzumab | Perjeta | HER2 | Breast Cancer (HER2+)* | Genentech
Trastuzumab* | Herceptin | HER2 | Breast cancer (HER2+)*; Gastric cancer (HER2+)* | Genentech
Small Molecule Inhibitors
Afatinib* | Gilotrif | EGFR, HER2 | NSCLC (with EGFR exon 19 deletions or L858R substitution)* | Boehringer Ingelheim
Axitinib | Inlyta | KIT, PDGFRβ, VEGFR1/2/3 | RCC | Pfizer
Bosutinib* | Bosulif | ABL | CML (Philadelphia chromosome positive)* | Pfizer
Cabozantinib | Cometriq | FLT3, KIT, MET, RET, VEGFR2 | Medullary thyroid cancer | Exelixis
Crizotinib* | Xalkori | ALK, MET | NSCLC (with ALK fusion)* | Pfizer
Dabrafenib* | Tafinlar | BRAF | Melanoma (with BRAF V600E mutation)* | GlaxoSmithKline
Dasatinib* | Sprycel | ABL | CML (Philadelphia chromosome positive)*; ALL (Philadelphia chromosome positive)* | Bristol-Myers Squibb
Denosumab | Xgeva | RANKL | Giant cell tumor of bone | Amgen
Erlotinib* | Tarceva | EGFR | NSCLC (with exon 19 deletions or L858R substitutions)*; Pancreatic cancer | Genentech & OSI
Everolimus* | Afinitor | mTOR | Pancreatic neuroendocrine tumor; RCC; Breast cancer (ER/PR+) in combination with exemestane*; Nonresectable subependymal giant cell astrocytoma associated with tuberous sclerosis | Novartis
Gefitinib | Iressa | EGFR | NSCLC with known prior benefit from gefitinib (limited approval) | AstraZeneca
Ibrutinib | Imbruvica | BTK | Mantle cell lymphoma | Pharmacyclics
Imatinib* | Gleevec | KIT, PDGFR, ABL | GI stromal tumor; Dermatofibrosarcoma protuberans; Multiple hematologic malignancies including Philadelphia chromosome-positive ALL and CML* | Novartis
Lapatinib* | Tykerb | HER2, EGFR | Breast cancer (HER2+)* | GlaxoSmithKline
Nilotinib* | Tasigna | ABL | CML (Philadelphia chromosome positive)* | Novartis
Pazopanib | Votrient | VEGFR, PDGFR, KIT | RCC; Soft tissue sarcoma | GlaxoSmithKline
Regorafenib | Stivarga | KIT, PDGFRβ, RAF, RET, VEGFR1/2/3 | CRC; Gastrointestinal stromal tumors | Bayer
Ruxolitinib | Jakafi | JAK1/2 | Myelofibrosis | Incyte
Sorafenib | Nexavar | VEGFR, PDGFR, KIT, RAF | Hepatocellular carcinoma; RCC | Bayer
Sunitinib | Sutent | VEGFR, PDGFR, KIT, RET | GIST; Pancreatic neuroendocrine tumor; RCC | Pfizer
Temsirolimus | Torisel | mTOR | RCC | Wyeth
Trametinib* | Mekinist | MEK | Melanoma (with BRAF V600E or V600K mutations)* | GlaxoSmithKline
Vandetanib | Caprelsa | EGFR, RET, VEGFR2 | Medullary thyroid cancer | AstraZeneca
Vemurafenib* | Zelboraf | BRAF | Melanoma (with BRAF V600 mutation)* | Roche
Note: ALL, acute lymphoblastic leukemia; CML, chronic myeloid
leukemia; GIST, gastrointestinal stromal tumor; ER, estrogen
receptor; PR, progesterone receptor; NSCLC, non-small cell lung
cancer; CRC, colorectal cancer; GBM, glioblastoma; RCC, renal cell
carcinoma; HNSCC, head and neck squamous cell carcinoma; CLL,
chronic lymphocytic leukemia; BTK, Bruton's tyrosine kinase.
*Targeted therapy that is associated with a molecular-specific
cancer subtype alteration. There are approximately 17 targeted
therapies that are associated with 10 molecular-specific subtypes
of cancer.
[0209] In one embodiment, based on the profile of disease
heterogeneity, a therapeutic intervention is determined that takes
into account both the type of genetic variants found in the disease
cells and their relative amounts (e.g., proportion). The
therapeutic intervention can treat the subject as if each clonal
variant were a different cancer to be treated independently. In
some cases, when one or more genetic variants are detected at
sub-clinical amounts, e.g., at least 5× lower, at least
10× lower, or at least 100× lower than the dominant
detected clones, these variants may be left out of the therapeutic
intervention until they rise to a clinical threshold or significant
relative frequency (e.g., greater than the threshold stated
above).
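The deferral rule described in this paragraph can be sketched as follows. This is an illustrative sketch only: the function name, the variant labels, the example amounts and the default 10-fold cutoff are assumptions chosen for the example, not values fixed by this disclosure.

```python
def split_by_clinical_threshold(variant_amounts, fold_below_dominant=10.0):
    """Partition variants into those addressed now and those deferred.

    variant_amounts maps a (hypothetical) variant label to its relative
    amount, e.g. percent of cfDNA. A variant is deferred when its amount
    is at least `fold_below_dominant` times lower than the dominant one.
    """
    dominant = max(variant_amounts.values())
    cutoff = dominant / fold_below_dominant
    treat_now = {v: a for v, a in variant_amounts.items() if a > cutoff}
    deferred = {v: a for v, a in variant_amounts.items() if a <= cutoff}
    return treat_now, deferred


# Hypothetical profile: amounts are illustrative only.
profile = {"variant_A": 8.0, "variant_B": 2.0, "variant_C": 0.05}
treat_now, deferred = split_by_clinical_threshold(profile, fold_below_dominant=10.0)
print("treat now:", sorted(treat_now))
print("deferred:", sorted(deferred))
```

Re-running the same partition on each later sample would let a deferred variant move into the treated set once it rises above the cutoff.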
[0210] When a plurality of different genetic variants is found in
different quantities, e.g., different numbers or different relative
amounts, a therapeutic intervention can include treatments
effective against diseases with each of the genetic variants. For
example, in the case of cancer, genetic variants, such as mutant
forms of a gene or gene amplification, may be detected in several
genes (e.g., a major clone and a minor clone). Each of these forms
may be actionable, that is, a treatment may be known for which
cancers with the particular variant are responsive. However, the
profile of tumor heterogeneity may indicate that one of the
variants is present in the polynucleotides at, for example, five
times the level of each of the other two variants. A therapeutic
intervention can be determined that involves delivering three
different drugs to the subject, each drug relatively more effective
against cancers bearing each of the variants. The drugs can be
delivered as a cocktail, or sequentially.
[0211] In a further embodiment, the drugs can be administered in
doses stratified to reflect the relative amounts of the variants in
the DNA. For example, a drug effective against the most common
variant can be administered in greater amount than drugs effective
against the two less common variants.
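One way to picture dose stratification is to convert the relative amounts of the variants into relative weights, one per drug. The sketch below assumes a simple proportional rule; the function name, variant labels and drug labels are hypothetical, and the weights are unitless proportions rather than clinical doses.

```python
def stratified_dose_weights(variant_amounts, drug_for_variant):
    """Return relative weights per drug, proportional to variant amounts.

    Assumption: weights scale linearly with the relative amount of the
    variant each drug targets. Weights sum to 1 and carry no units.
    """
    total = sum(variant_amounts.values())
    if total <= 0:
        raise ValueError("at least one variant must have a positive amount")
    weights = {}
    for variant, amount in variant_amounts.items():
        drug = drug_for_variant[variant]
        weights[drug] = weights.get(drug, 0.0) + amount / total
    return weights


# One variant at five times the level of each of the other two.
amounts = {"variant_1": 5.0, "variant_2": 1.0, "variant_3": 1.0}
drugs = {"variant_1": "drug_A", "variant_2": "drug_B", "variant_3": "drug_C"}
weights = stratified_dose_weights(amounts, drugs)
print(weights)
```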
[0212] Alternatively, the profile of tumor heterogeneity can show
the presence of a sub-population of cancer cells bearing a genetic
variant that is resistant to a drug to which the disease typically
responds. In this case, the therapeutic intervention can involve
including both a first drug effective against tumor cells without
the resistance variant and a second drug effective against tumor
cells with the resistant variant. Again, doses can be stratified to
reflect relative amounts of each variant detected in the
profile.
[0213] In another embodiment, changes in the profile of tumor
heterogeneity are examined over time, and therapeutic interventions
are developed to treat the changing tumor. For example, disease
heterogeneity can be determined at a plurality of different times.
Using the profiling methods of this disclosure, more precise
inferences can be made about tumor evolution. This allows the
practitioner to monitor the evolution of the disease, in particular
as new clonal sub-populations emerge after remission effected by a
first wave of therapy. In this case, therapeutic interventions can
be calibrated over time to treat the changing tumor. For example, a
profile may show that a cancer has a form that is responsive to a
certain treatment. The treatment is delivered and the tumor burden
is seen to decrease over time. At some point, a genetic variant is
found in the tumor indicating the presence of a population of
cancer cells that is not responsive to the treatment. A new
therapeutic intervention is determined that targets the cells
bearing the marker of non-responsiveness.
[0214] In response to chemotherapy, a dominant tumor form can
eventually give way through Darwinian selection to cancer cells
carrying mutants that render the cancer unresponsive to the therapy
regimen. Appearance of these resistance mutants can be delayed
through methods of this disclosure. In one embodiment of this
method, a subject is subjected to one or more pulsed therapy
cycles, each pulsed therapy cycle comprising a first period during
which a drug is administered at a first amount and a second period
during which the drug is administered at a second, reduced amount.
The first period is characterized by a tumor burden detected above
a first clinical level. The second period is characterized by a
tumor burden detected below a second clinical level. First and
second clinical levels can be different in different pulsed therapy
cycles. So, for example, the first clinical level can be lower in
succeeding cycles. A plurality of cycles can include at least 2, 3,
4, 5, 6, 7, 8 or more cycles. For example, the BRAF mutant V600E
may be detected in disease cell polynucleotides at an amount
indicating a tumor burden of 5% in cfDNA. Chemotherapy can commence
with dabrafenib. Subsequent testing can show that the amount of the
BRAF mutant in the cfDNA falls below 0.5% or to undetectable
levels. At this point, dabrafenib therapy can stop or be
significantly curtailed. Further subsequent testing may find that
DNA bearing the BRAF mutation has risen to 2.5% of polynucleotides
in cfDNA. At this point, dabrafenib therapy is re-started, e.g., at
the same level as the initial treatment. Subsequent testing may
find that DNA bearing the BRAF mutation has decreased to 0.5% of
polynucleotides in cfDNA. Again, dabrafenib therapy is stopped or
reduced. The cycle can be repeated a number of times.
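The pulsed therapy cycle above behaves like a two-threshold switch. The following sketch assumes a single start level and a single stop level for all cycles (the text notes these levels can differ between cycles); the function name and the chosen levels of 2.5% and 0.5% are taken from the dabrafenib illustration only as an example.

```python
def pulsed_therapy_schedule(burden_series, start_level, stop_level, on_initially=False):
    """Return, for each measurement, whether therapy is on afterwards.

    Therapy is switched on when the measured burden reaches start_level
    and switched off (or curtailed) when it falls to stop_level or below.
    """
    on = on_initially
    decisions = []
    for burden in burden_series:
        if not on and burden >= start_level:
            on = True
        elif on and burden <= stop_level:
            on = False
        decisions.append(on)
    return decisions


# Percent of cfDNA bearing the mutation at successive tests (illustrative).
measurements = [5.0, 0.4, 2.5, 0.5]
schedule = pulsed_therapy_schedule(measurements, start_level=2.5, stop_level=0.5)
print(schedule)
```

Keeping the start level well above the stop level avoids switching therapy on and off in response to small fluctuations near a single threshold.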
[0215] FIG. 7 shows an exemplary course of monitoring and treatment
of disease in a subject. A subject tested at the time of blood draw
1 has a tumor burden of 1.4% and presents with genetic alterations
in genes 1, 2 and 3. The subject is treated with Drug A. After a
time, treatment is discontinued. At a second later time, a second
blood draw shows the cancer in remission. At a third later time, a
third blood draw indicates that the cancer has recurred, in this
instance, presenting with a genetic variant in Gene 4. The subject
is now put on a course of Drug B, to which cancers having this
variant are responsive.
[0216] In another embodiment, a therapeutic intervention can be
changed upon detection of the rise of a mutant form resistant to an
original drug. For example, cancers with the EGFR mutation L858R
respond to therapy with erlotinib. However, cancers with the EGFR
mutation T790M are resistant to erlotinib. However, they are
responsive to ruxolitinib. A method of this disclosure involves
monitoring changes in tumor profile and changing a therapeutic
intervention when a genetic variant associated with drug resistance
rises to a predetermined clinical level.
Database
[0217] In another embodiment, a database is built in which genetic
information from serial samples collected from cancer patients is
recorded. This database may also contain intervening treatment and
other clinically relevant information, such as, weight, adverse
effects, histological testing, blood testing, radiographic
information, prior treatments, cancer type, etc. Serial test
results can be used to infer efficacy of treatment, especially when
used with blood samples, which can give a more unbiased estimate of
tumor burden than self-reporting or radiographic reporting by a
medical practitioner. Treatment efficacy can be clustered by those
with similar genomic profiles and vice versa. Genomic profiles can
be organized around, for example, primary genetic alteration,
secondary genetic alteration(s), relative amounts of these genetic
alterations, and tumor load. This database can be used for decision
support for subsequent patients. Both germline and somatic
alterations can be used for determining treatment efficacy as well.
Acquired resistance alterations can also be inferred from the
database when treatments that were effective initially begin to
fail. This failure can be detected through radiographic, blood or
other means. The primary data used for inference of acquired
resistance mechanisms are genomic tumor profiles collected after
treatment per patient. This data can also be used to place
quantitative bounds on likely treatment response as well as predict
time to treatment failure. Based on likely acquired resistance
alterations for a given treatment and tumor genomic profile, a
treatment regimen can be modified to suppress acquisition of most
likely resistance alterations.
Computer Systems
[0218] Methods of the present disclosure can be implemented using,
or with the aid of, computer systems. FIG. 5 shows a computer
system 1501 that is programmed or otherwise configured to implement
the methods of the present disclosure. The computer system 1501
includes a central processing unit (CPU, also "processor" and
"computer processor" herein) 1505. The computer system 1501 also
includes memory or memory location 1510 (e.g., random-access
memory, read-only memory, flash memory), electronic storage unit
1515 (e.g., hard disk), communication interface 1520 (e.g., network
adapter) for communicating with one or more other systems, and
peripheral devices 1525, such as cache, other memory, data storage
and/or electronic display adapters. The memory 1510, storage unit
1515, interface 1520 and peripheral devices 1525 are in
communication with the CPU 1505 through a communication bus (solid
lines). The storage unit 1515 can be a data storage unit (or data
repository) for storing data. The computer system 1501 can be
operatively coupled to a computer network ("network") 1530 with the
aid of the communication interface 1520. The network 1530 can be
the Internet, an internet and/or extranet, or an intranet and/or
extranet that is in communication with the Internet. The network
1530 in some cases is a telecommunication and/or data network. The
network 1530 can include one or more computer servers, which can
enable distributed computing, such as cloud computing. The CPU 1505
can execute a sequence of machine-readable instructions, which can
be embodied in a program or software. The instructions may be
stored in a memory location, such as the memory 1510. The storage
unit 1515 can store files, such as drivers, libraries and saved
programs. The computer system 1501 can communicate with one or more
remote computer systems through the network 1530. Methods as
described herein can be implemented by way of machine (e.g.,
computer processor) executable code stored on an electronic storage
location of the computer system 1501, such as, for example, on the
memory 1510 or electronic storage unit 1515. The machine executable
or machine readable code can be provided in the form of software.
Aspects of the systems and methods provided herein, such as the
computer system 1501, can be embodied in programming. Various
aspects of the technology may be thought of as "products" or
"articles of manufacture" typically in the form of machine (or
processor) executable code and/or associated data that is carried
on or embodied in a type of machine readable medium.
Machine-executable code can be stored on an electronic storage
unit, such as memory (e.g., read-only memory, random-access memory,
flash memory) or a hard disk. "Storage" type media can include any
or all of the tangible memory of the computers, processors or the
like, or associated modules thereof, such as various semiconductor
memories, tape drives, disk drives and the like, which may provide
non-transitory storage at any time for the software programming.
All or portions of the software may at times be communicated
through the Internet or various other telecommunication networks.
The computer system 1501 can include or be in communication with an
electronic display that comprises a user interface (UI) for
providing, for example, one or more results of sample analysis.
EXAMPLES
[0219] Nucleotide positions (e.g., loci) in the genome can be
designated by number, as depicted in FIG. 2. Positions at which
about 100% of the base calls are identical to the reference
sequence or at which about 100% of the base calls are different
than the reference sequence are inferred to represent homozygosity
of the cfDNA (presumed normal). Positions at which about 50% of the
base calls are identical to the reference sequence are inferred to
represent heterozygosity of the cfDNA (also presumed normal).
Positions at which the percentage of base calls at a locus are
substantially below 50% and above the detection limit of the base
calling system are inferred to represent tumor-associated genetic
variants.
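The inference rules in this paragraph can be sketched as a simple classifier on the fraction of base calls that differ from the reference. The numeric cutoffs below (detection limit of 0.1%, heterozygous window of 40% to 60%, homozygous cutoff of 90%) are assumptions made for illustration; the text itself only says "about 100%", "about 50%" and "substantially below 50%".

```python
def classify_locus(variant_fraction, detection_limit=0.001,
                   het_window=(0.4, 0.6), hom_cutoff=0.9):
    """Classify a locus from the fraction of non-reference base calls.

    All thresholds are illustrative assumptions, not values from the text.
    """
    if variant_fraction >= hom_cutoff:
        return "homozygous (non-reference)"
    if variant_fraction < detection_limit:
        return "homozygous (reference)"
    if het_window[0] <= variant_fraction <= het_window[1]:
        return "heterozygous"
    if variant_fraction < het_window[0]:
        return "tumor-associated candidate"
    return "indeterminate"


for fraction in (1.0, 0.5, 0.089, 0.0, 0.7):
    print(fraction, classify_locus(fraction))
```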
Example 1
Methods for Copy Number Variation Detection
[0220] Blood Collection
[0221] 10-30 mL Blood samples are collected at room temperature.
The samples are centrifuged to remove cells. Plasma is collected
after centrifugation.
[0222] cfDNA Extraction
[0223] The sample is subjected to proteinase K digestion. DNA is
precipitated with isopropanol. DNA is captured on a DNA
purification column (e.g., a QIAamp DNA Blood Mini Kit) and eluted
in 100 .mu.l solution. DNAs below 500 bp are selected with Ampure
SPRI magnetic bead capture (PEG/salt). The resulting product is
suspended in 30 .mu.L H.sub.2O. Size distribution is checked (major
peak=166 nucleotides; minor peak=330 nucleotides) and quantified. 5
ng of extracted DNA contains approximately 1700 haploid genome
equivalents ("HGE"). The general correlation between the amount of
DNA and HGE is as follows: 3 pg DNA=1 HGE; 3 ng DNA=1K HGE; 3 .mu.g
DNA=1M HGE; 10 pg DNA=3 HGE; 10 ng DNA=3K HGE; 10 .mu.g DNA=3M
HGE.
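The mass-to-HGE correlation above amounts to dividing the DNA mass by roughly 3 pg per haploid genome. A minimal sketch, assuming that approximate conversion factor (the constant and function names are illustrative):

```python
PG_PER_HGE = 3.0  # approximate mass of one haploid genome equivalent, per the text


def haploid_genome_equivalents(dna_mass_ng):
    """Convert a DNA mass in nanograms to haploid genome equivalents."""
    return dna_mass_ng * 1000.0 / PG_PER_HGE


print(round(haploid_genome_equivalents(5.0)))   # the text rounds this to about 1700
print(round(haploid_genome_equivalents(2.5)))   # the text rounds this to about 800
```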
[0224] "Single Molecule" Library Prep
[0225] High-efficiency DNA tagging (>80%) is performed by end
repair, A-tailing and sticky-end ligation with 2 different octomers
(i.e., 4 combinations) with overloaded hairpin adaptors. 2.5 ng DNA
(i.e. approximately 800 HGE) is used as the starting material. Each
hairpin adaptor comprises a random sequence on its
non-complementary portion. Both ends of each DNA fragment are
attached with hairpin adaptors. Each tagged fragment can be
identified by a combination of the octomer sequence on the hairpin
adaptors and endogenous portions of the insert sequence.
[0226] Tagged DNA is amplified by 12 cycles of PCR to produce about
1-7 .mu.g DNA that contain approximately 500 copies of each of the
800 HGE in the starting material.
[0227] Buffer optimization, polymerase optimization and cycle
reduction may be performed to optimize the PCR reactions.
Amplification bias, e.g., non-specific bias, GC bias, and/or size
bias are also reduced by optimization. Noise(s) (e.g.,
polymerase-introduced errors) are reduced by using high-fidelity
polymerases.
[0228] Sequences may be enriched as follows: DNAs with regions of
interest (ROI) are captured using biotin-labeled beads with probes to
the ROIs. The ROIs are amplified with 12 cycles of PCR to generate a
2000-fold amplification.
[0229] Massively Parallel Sequencing
[0230] 0.1 to 1% of the sample (approximately 100 pg) are used for
sequencing. The resulting DNA is then denatured and diluted to 8 pM
and loaded into an Illumina sequencer.
[0231] Digital Bioinformatics
[0232] Sequence reads are grouped into families, with about 10
sequence reads in each family. Families are collapsed into
consensus sequences by voting (e.g., biased voting) each position
in a family. A base is called for consensus sequence if 8 or 9
members agree. A base is not called for consensus sequence if no
more than 60% of the members agree.
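Collapsing a family into a consensus sequence by per-position voting can be sketched as below. The sketch uses simple (unbiased) majority voting, assumes the reads in a family are already aligned and of equal length, and assumes an agreement threshold of 80%, which is consistent with calling a base when 8 or 9 of about 10 members agree and with making no call at 60% or less; the text does not state how 70% agreement is handled, so treating it as a no-call is an assumption.

```python
from collections import Counter


def consensus_base(bases, min_agreement=0.8):
    """Return the majority base if its share reaches min_agreement, else 'N'."""
    base, count = Counter(bases).most_common(1)[0]
    return base if count / len(bases) >= min_agreement else "N"


def collapse_family(reads, min_agreement=0.8):
    """Collapse equal-length, aligned reads into one consensus sequence."""
    return "".join(consensus_base(column, min_agreement) for column in zip(*reads))


# Nine of ten members agree at every position: all bases are called.
print(collapse_family(["ACGT"] * 9 + ["ACTT"]))
# Only six of ten agree at the third position: that base is not called.
print(collapse_family(["ACGT"] * 6 + ["ACTT"] * 4))
```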
[0233] The resulting consensus sequences are mapped to a reference
genome, such as hg19. Each base in a consensus sequence is covered
by about 3000 different families. A quality score for each sequence
is calculated and sequences are filtered based on their quality
scores. Base calls at each position in a consensus sequence are
compared with the hg19 reference sequence. At each position at
which a base call differs from the reference sequence, the identity
of the different base or bases, and their percentage as a function
of total base calls at the locus is determined and reported.
[0234] Sequence variation is detected by counting distribution of
bases at each locus. If 98% of the reads have the same base
(homozygous) and 2% have a different base, the locus is likely to
have a sequence variant, presumably from cancer DNA.
[0235] CNV is detected by counting the total number of sequences
(bases) mapping to a locus and comparing with a control locus. To
increase CNV detection, CNV analysis is performed on specific regions,
including regions on ALK, APC, BRAF, CDKN2A, EGFR, ERBB2, FBXW7,
KRAS, MYC, NOTCH1, NRAS, PIK3CA, PTEN, RB1, TP53, MET, AR, ABL1,
AKT1, ATM, CDH1, CSF1R, CTNNB1, ERBB4, EZH2, FGFR1, FGFR2, FGFR3,
FLT3, GNA11, GNAQ, GNAS, HNF1A, HRAS, IDH1, IDH2, JAK2, JAK3, KDR,
KIT, MLH1, MPL, NPM1, PDGFRA, PROC, PTPN11, RET, SMAD4, SMARCB1,
SMO, SRC, STK11, VHL, TERT, CCND1, CDK4, CDKN2B, RAF1, BRCA1,
CCND2, CDK6, NF1, TP53, ARID1A, BRCA2, CCNE1, ESR1, RIT1, GATA3,
MAP2K1, RHEB, ROS1, ARAF, MAP2K2, NFE2L2, RHOA, or NTRK1 genes.
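The count comparison used for CNV detection can be sketched as a ratio of molecules mapping to a target locus versus a control locus. The sketch assumes the control locus is present at two copies and that both loci are captured and sequenced with equal efficiency; the function name and the example counts are hypothetical.

```python
def copy_number_estimate(target_count, control_count, control_copies=2.0):
    """Estimate copy number at a target locus relative to a control locus.

    Assumptions: the control locus is diploid and both loci are sampled
    with equal efficiency, so counts scale linearly with copy number.
    """
    if control_count <= 0:
        raise ValueError("control locus must have at least one mapped sequence")
    ratio = target_count / control_count
    return ratio, control_copies * ratio


ratio, copies = copy_number_estimate(target_count=6000, control_count=3000)
print(ratio, copies)
```

In practice the counts would be unique-molecule (family) counts rather than raw reads, so that PCR duplicates do not inflate the ratio.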
Example 2
Method for Correcting Base Calling by Determining the Total Number
of Unseen Molecules in a Sample
[0236] After fragments are amplified and the sequences of amplified
fragments are read and aligned, the fragments are subjected to base
calling. Variations in the number of amplified fragments and unseen
amplified fragments can introduce errors in base calling. These
variations are corrected by calculating the number of unseen
amplified fragments.
[0237] When base calling for locus A (an arbitrary locus), it is
first assumed that there are N amplified fragments. The sequence
readouts can come from two types of fragments: double-strand
fragments and single-strand fragments. The following is a
theoretical example of calculating the total number of unseen
molecules in a sample.
[0238] N is the total number of molecules in the sample.
[0239] Assume that 1000 is the number of duplexes detected.
[0240] Assume that 500 is the number of single-stranded molecules
detected.
[0241] P is the probability of detecting a strand.
[0242] Q is the probability of not detecting a strand.
[0243] By definition, Q=1-P.
[0244] $1000 = N P^{2}$.
[0245] $500 = 2 N P Q$.
[0246] $1000 / P^{2} = N$.
[0247] $500 / (2 P Q) = N$.
[0248] $1000 / P^{2} = 500 / (2 P Q)$.
[0249] $1000 \cdot 2 P Q = 500 P^{2}$.
[0250] $2000 P Q = 500 P^{2}$.
[0251] $2000 Q = 500 P$.
[0252] $2000 (1 - P) = 500 P$.
[0253] $2000 - 2000 P = 500 P$.
[0254] $2000 = 500 P + 2000 P$.
[0255] $2000 = 2500 P$.
[0256] $2000 / 2500 = P$.
[0257] $0.8 = P$.
[0258] $1000 / P^{2} = N$.
[0259] $1000 / 0.64 = N$.
[0260] $1562 \approx N$.
[0261] Number of unseen fragments=62.
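The calculation of this example can be sketched in closed form. With D duplexes detected (both strands seen, expected count N*P^2) and S single-stranded molecules detected (exactly one strand seen, expected count 2*N*P*Q), dividing the two expressions gives P = 2D / (2D + S), from which N and the number of molecules with neither strand seen (N*Q^2) follow. The function and variable names are illustrative.

```python
def estimate_unseen_molecules(duplex_count, single_strand_count):
    """Estimate strand detection probability, total and unseen molecules.

    Model: D = N * P**2 and S = 2 * N * P * (1 - P), so
    P = 2 * D / (2 * D + S), N = D / P**2, unseen = N * (1 - P)**2.
    """
    p = 2 * duplex_count / (2 * duplex_count + single_strand_count)
    n_total = duplex_count / p ** 2
    unseen = n_total * (1 - p) ** 2
    return p, n_total, unseen


p, n_total, unseen = estimate_unseen_molecules(1000, 500)
print(p, n_total, unseen)
```

For 1000 duplexes and 500 single strands this gives P = 0.8, N of about 1562.5 and about 62.5 unseen molecules, which the example above truncates to 1562 and 62.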
Example 3
Identification of Genetic Variants in Cancer-associated Somatic
Variants in a Patient
[0262] An assay is used to analyze a panel of genes to identify
genetic variants in cancer-associated somatic variants with high
sensitivity.
[0263] Cell-free DNA is extracted from plasma of a patient and
amplified by PCR. Genetic variants are analyzed by massively
parallel sequencing of the amplified target genes. For one set of
genes, all exons are sequenced, as such sequencing coverage has been
shown to have clinical utility (Table 3). For another set of
genes, sequencing coverage included those exons with a previously
reported somatic mutation (Table 4). The minimum detectable mutant
allele (limit of detection) is dependent on the patient's sample
cell-free DNA concentration, which varied from less than 10 to over
1,000 genomic equivalents per mL of peripheral blood. Amplification
may not be detected in samples with lower amounts of cell-free DNA
and/or low-level gene copy amplification. Certain sample or variant
characteristics resulted in reduced analytic sensitivity, such as
low sample quality or improper collection.
[0264] The percentage of genetic variants found in cell-free DNA
circulating in blood is related to the unique tumor biology of this
patient. Factors that affected the amount/percentages of detected
genetic variants in circulating cell-free DNA in blood include
tumor growth, turn-over, size, heterogeneity, vascularization,
disease progression or treatment. Table 5 annotates the percentage,
or allele frequency, of altered circulating cell-free DNA (% cfDNA)
detected in this patient. Some of the detected genetic variants are
listed in descending order by % cfDNA.
[0265] Genetic variants are detected in the circulating cell-free
DNA isolated from this patient's blood specimen. These genetic
variants are cancer-associated somatic variants, some of which have
been associated with either increased or reduced clinical response
to specific treatment. "Minor Alterations" are defined as those
alterations detected at less than 10% the allele frequency of
"Major Alterations". A Major Alteration is the predominant
alteration at a locus. The detected allele frequencies of these
alterations (Table 5) and associated treatments for this patient
are annotated.
[0266] All genes listed in Tables 3 and 4 are analyzed as part of
the test. Amplification is not detected for ERBB2, EGFR, or MET in
the circulating cell-free DNA isolated from this patient's blood
specimen.
[0267] Patient test results comprising the genetic variants are
listed in Table 6.
[0268] Referring to the table of genomic alterations detected in
selected genes, at 13 positions, a nucleotide detected at a
frequency of at least 99.8% in the sample is different than a
nucleotide in the reference sequence, indicating homozygosity at
these loci. For example, in the KRAS gene, at position 25368462, T
was detected rather than reference nucleotide C in 100% of cases.
[0269] At 35 positions, a nucleotide detected at between 41.4% and
55% frequency in the sample is different than a nucleotide in the
reference sequence, indicating heterozygosity at these loci. For
example, in the ALK gene, at position 29455267, G was detected
rather than reference nucleotide A in 50% of cases.
[0270] At 3 positions a nucleotide detected at less than 9%
frequency is different than a nucleotide in the reference sequence.
These include variants in BRAF (140453136 A>T, 8.9%), NRAS
(115256530 G>T 2.6%) and JAK2 (5073770 G>T 1.5%). They are
presumed to be somatic mutations from cancer DNA.
[0271] The relative amounts of tumor-associated genetic variants
are calculated. The ratio of amounts of BRAF:NRAS:JAK2 is
8.9:2.6:1.5, or 1:0.29: 0.17. From this result one can infer the
presence of tumor heterogeneity. For example, one possible
interpretation is that 100% of tumor cells contain a variant in
BRAF, 83% contain variants in BRAF and NRAS, and 17% contain
variants in BRAF, NRAS and JAK2. However, analysis of CNV may show
amplification of BRAF, in which case 100% of tumor cells may have
variants in both BRAF and NRAS.
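The ratio quoted in paragraph [0271] can be reproduced by normalizing each variant's allele frequency to the most abundant variant. A minimal sketch, using the percentages stated in paragraph [0270]; the function name is illustrative.

```python
def relative_amounts(allele_frequencies):
    """Normalize allele frequencies to the most abundant variant."""
    top = max(allele_frequencies.values())
    return {gene: round(freq / top, 2) for gene, freq in allele_frequencies.items()}


# Percent cfDNA for the three low-frequency variants given in the text.
observed = {"BRAF": 8.9, "NRAS": 2.6, "JAK2": 1.5}
print(relative_amounts(observed))
```

Unequal normalized values such as these are what the text treats as an indication of tumor heterogeneity, subject to the copy number caveat noted above.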
TABLE 3 Genes in which all exons are sequenced
Gene | LOD
ALK | <0.1%
APC | <0.1%
AR | <0.1%
BRAF | <0.1%
CDKN2A | <0.1%
EGFR | <0.1%
ERBB2 | <0.1%
FBXW7 | <0.1%
KRAS | <0.1%
MET | <0.1%
MYC | <0.1%
NOTCH1 | <0.1%
NRAS | <0.1%
PIK3CA | <0.1%
PTEN | <0.1%
PROC | <0.1%
RB1 | <0.1%
TP53 | <0.1%
LOD: Limit of Detection. The minimum detectable mutant allele
frequency for this specimen in which 80% of somatic variants is
detected.
TABLE 4 Genes in which exons with a previously reported somatic
mutation are sequenced
Gene | LOD
ABL1 | <0.1%
AKT1 | <0.1%
ATM | <0.1%
CDH1 | <0.1%
CSF1R | <0.1%
CTNNB1 | <0.1%
ERBB4 | <0.1%
EZH2 | <0.1%
FGFR1 | <0.1%
FGFR2 | <0.1%
FGFR3 | <0.1%
FLT3 | <0.1%
GNA11 | <0.1%
GNAQ | <0.1%
GNAS | <0.1%
HNF1A | <0.1%
HRAS | <0.1%
IDH1 | <0.1%
IDH2 | <0.1%
JAK2 | <0.1%
JAK3 | <0.1%
KDR | <0.1%
KIT | <0.1%
MLH1 | <0.1%
MPL | <0.1%
NPM1 | <0.1%
PDGFRA | <0.1%
PTPN11 | <0.1%
RET | <0.1%
SMAD4 | <0.1%
SMARCB1 | <0.1%
SMO | <0.1%
SRC | <0.1%
STK11 | <0.1%
TERT | <0.1%
VHL | <0.1%
LOD: Limit of Detection. The minimum detectable mutant allele
frequency for this specimen in which 80% of somatic variants is
detected.
TABLE 6 Genomic alterations detected in selected genes
Detected: 51 Genomic Alterations
Gene | Chromosome | Position | Mutation (nt) | Mutation (AA) | Percentage | Cosmic ID | DBSNP ID
KRAS | 12 | 25368462 | C > T | | 100.0% | | rs4362222
ALK | 2 | 29416572 | T > C | I1461V | 100.0% | | rs1670283
ALK | 2 | 29444095 | C > T | | 100.0% | | rs1569156
ALK | 2 | 29543663 | T > C | Q500Q | 100.0% | | rs2293564
ALK | 2 | 29940529 | A > T | P234P | 100.0% | | rs2246745
APC | 5 | 112176756 | T > A | V1822D | 100.0% | | rs459552
CDKN2A | 9 | 21968199 | C > G | | 100.0% | COSM14251 | rs11515
FGFR3 | 4 | 1807894 | G > A | T651T | 100.0% | | rs7688609
NOTCH1 | 9 | 139410424 | A > G | | 100.0% | | rs3125006
PDGFRA | 4 | 55141055 | A > G | P567P | 100.0% | | rs1873778
HRAS | 11 | 534242 | A > G | H27H | 100.0% | COSM249860 | rs12628
EGFR | 7 | 55214348 | C > T | N158N | 99.9% | COSM42978 | rs2072454
TP53 | 17 | 7579472 | G > C | P72R | 99.8% | | rs1042522
APC | 5 | 112162854 | T > C | Y486Y | 55.0% | | rs2229992
APC | 5 | 112177171 | G > A | P1960P | 53.8% | | rs465899
EGFR | 7 | 55266417 | T > C | T903T | 53.6% | | rs1140475
APC | 5 | 112176325 | G > A | G1678G | 53.2% | | rs42427
APC | 5 | 112176559 | T > G | S1756S | 53.0% | | rs866006
EGFR | 7 | 55229255 | G > A | R521K | 53.0% | |
MET | 7 | 116397572 | A > G | Q648Q | 52.7% | |
APC | 5 | 112175770 | G > A | T1493T | 52.7% | | rs41115
EGFR | 7 | 55249063 | G > A | Q787Q | 52.6% | | rs1050171
NOTCH1 | 9 | 139411714 | T > C | | 52.4% | | rs11145767
EGFR | 7 | 55238874 | T > A | T629T | 52.0% | | rs2227984
ERBB2 | 17 | 37879588 | A > G | I655V | 51.6% | | rs1136201
NOTCH1 | 9 | 139397707 | G > A | D1698D | 51.3% | COSM33747 | rs10521
ALK | 2 | 30143499 | G > C | L9L | 51.0% | | rs4358080
APC | 5 | 112164561 | G > A | A545A | 51.0% | | rs351771
FLT3 | 13 | 28610183 | A > G | | 50.8% | | rs2491231
NOTCH1 | 9 | 139418260 | A > G | N104N | 50.5% | | rs4489420
ALK | 2 | 29444076 | G > T | | 50.4% | | rs1534545
PIK3CA | 3 | 178917005 | A > G | | 50.3% | | rs3729674
NOTCH1 | 9 | 139412197 | G > A | | 50.2% | | rs9411208
ALK | 2 | 29455267 | A > G | G845G | 50.0% | COSM148825 | rs2256740
KIT | 4 | 55593464 | A > C | M541L | 49.9% | COSM28026 |
NOTCH1 | 9 | 139391636 | G > A | D2185D | 48.9% | | rs2229974
PDGFRA | 4 | 55152040 | C > T | V824V | 48.9% | COSM22413 | rs2228230
ALK | 2 | 29416481 | T > C | K1491R | 48.9% | COSM1130802 | rs1881420
ALK | 2 | 29445458 | G > T | G1125G | 48.6% | | rs3795850
NOTCH1 | 9 | 139410177 | T > C | | 48.5% | | rs3124603
RET | 10 | 43613843 | G > T | L769L | 48.2% | | rs1800861
EGFR | 7 | 55214443 | G > A | | 48.0% | | rs7801956
ALK | 2 | 29416366 | G > C | D1529E | 47.2% | | rs1881421
EGFR | 7 | 55238087 | C > T | | 45.5% | | rs10258429
RET | 10 | 43615633 | C > G | S904S | 44.8% | | rs1800863
BRAF | 7 | 140453136 | A > T | V600E | 8.9% | COSM476 |
NRAS | 1 | 115256530 | G > T | Q61K | 6.2% | COSM580 | rs121913254
JAK2 | 9 | 5073770 | G > T | V617F | 1.5% | COSM12600 | rs77375493
Example 4
Determining patient-specific Limits of Detection for Genes Analyzed
by Assays
[0272] Using the method of Example 3, genetic alterations in
cell-free DNA of a patient are detected. The sequence reads of
these genes include exon and/or intron sequences.
Example 5
Correcting Sequence Errors Comparing Watson and Crick Sequences
[0273] Double-stranded cell-free DNA is isolated from the plasma of
a patient. The cell-free DNA fragments are tagged using 16
different bubble-containing adaptors, each of which comprises a
distinctive barcode. The bubble-containing adaptors are attached to
both ends of each cell-free DNA fragment by ligation. After
ligation, each cell-free DNA fragment can be distinctly
identified by the sequence of the distinct barcodes and two 20 bp
endogenous sequences at each end of the cell-free DNA fragment.
[0274] The tagged cell-free DNA fragments are amplified by PCR. The
amplified fragments are enriched using beads comprising
oligonucleotide probes that specifically bind to a group of
cancer-associated genes. Therefore, cell-free DNA fragments from
the group of cancer-associated genes are selectively enriched.
[0275] Sequencing adaptors, each of which comprises a sequencing
primer binding site, a sample barcode, and a flow cell sequence,
are attached to the enriched DNA molecules. The resulting molecules
are amplified by PCR.
[0276] Both strands of the amplified fragments are sequenced.
Because each bubble-containing adaptor comprises a
non-complementary portion (e.g., the bubble), the sequence of the
one strand of the bubble-containing adaptor is different from the
sequence of the other strand (complement). Therefore, the sequence
reads of amplicons derived from the Watson strand of an original
cell-free DNA can be distinguished from amplicons from the Crick
strand of the original cell-free DNA by the attached
bubble-containing adaptor sequences.
[0277] The sequence reads from a strand of an original cell-free
DNA fragment are compared to the sequence reads from the other
strand of the original cell-free DNA fragment. If a variant occurs
in the sequence reads from only one strand, but not the other
strand, of the original cell-free DNA fragment, this variant will be
identified as an error (e.g., resulting from PCR and/or
amplification), rather than a true genetic variant.
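The strand comparison can be sketched as a set operation on the variant calls from each strand. The sketch assumes calls from both strands have already been expressed in the same reference coordinates as (position, alternate base) pairs; the function name and the example positions are hypothetical.

```python
def compare_strands(watson_calls, crick_calls):
    """Split variant calls into strand-confirmed variants and likely errors.

    Each call is a (position, alternate_base) tuple in reference
    coordinates. A call seen on both strands is kept; a call seen on
    only one strand is flagged as a likely error.
    """
    watson, crick = set(watson_calls), set(crick_calls)
    confirmed = watson & crick
    likely_errors = watson ^ crick
    return confirmed, likely_errors


watson = [(100, "T"), (150, "G")]
crick = [(100, "T")]
confirmed, likely_errors = compare_strands(watson, crick)
print(confirmed)
print(likely_errors)
```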
[0278] The sequence reads are grouped into families. Errors in the
sequence reads are corrected. The consensus sequence of each family
is generated by collapsing.
Example 6
Therapeutic Intervention
[0279] A therapeutic intervention is determined to treat the
cancer. Cancers with BRAF mutants respond to treatment with
vemurafenib, regorafenib, trametinib and dabrafenib. Cancers with
NRAS mutants respond to treatment with trametinib. Cancers with
JAK2 mutants respond to treatment with ruxolitinib. A therapeutic
intervention including administration of trametinib and ruxolitinib
is determined to be more effective against this cancer than
treatment with any one of the aforementioned drugs alone. The
subject is treated with a combination of trametinib and ruxolitinib
at a dose ratio of 5:1.
[0280] After several rounds of treatment, the cfDNA from the
subject is tested again for the presence of tumor heterogeneity.
Results show that the ratio of the BRAF:NRAS:JAK2 is now about
4:2:1.5. This indicates that the therapeutic intervention has
reduced the number of cells with the BRAF and NRAS mutants, and has
halted growth of cells with JAK2 mutants. A second therapeutic
intervention is determined in which trametinib and ruxolitinib are
determined to be effective in a dose ratio of 1:1. The subject is
given a course of chemotherapy at amounts at this ratio. Subsequent
testing shows that BRAF, NRAS and JAK2 mutants are present in cfDNA
at amounts below 1%.
Example 7
Therapeutic Intervention
[0281] A blood sample is collected from an individual with melanoma
pre-treatment and the patient is determined to have a BRAF V600E
mutation at a concentration of 2.8% and no detectable NRAS
mutations using cell-free DNA analysis. The patient is put on an
anti-BRAF therapy (dabrafenib). After 3 weeks, another blood sample
is collected and tested. The BRAF V600E level is determined to have
dropped to 0.1%. The therapy is stopped and the test repeated every
2 weeks. The BRAF V600E level rises again and therapy is
reinitiated when the BRAF V600E level rises to 1.5%. Therapy is
again stopped when the level drops down to 0.1% again. This cycle
is repeated.
Example 8
Correcting CNV Based on ROCNV Measurements
[0282] Copy number variations in a patient sample are determined.
Methods for determining can include molecular tracking and
upsampling, as described above. A hidden Markov model based on
expected locations of origins of replication is used to remove the
effect of replication origin proximity from the estimated copy
number variations in the patient sample. The standard deviation of
copy-number variations for each gene is subsequently reduced by
40%. The replication origin proximity model is also used to infer
cell-free tumor burden in the patient.
[0283] In many cases, the level of cell-free tumor-derived DNA may
be low or below the detection limit of a particular technology. This
can be the case when the number of human genome equivalents of
tumor-derived DNA in plasma is below 1 copy per 5 mL. Radiation and
chemotherapies have been shown to affect rapidly dividing cells
more than stable, healthy cells, hence their efficacy in treating
advanced cancer patients. Hence, a procedure with minimal adverse
effects is administered to a patient pre-blood collection to
preferentially increase the fraction of tumor-derived DNA
collected. For example, a low dose of chemotherapy could be
administered to the patient and a blood sample could be collected
within 24 hours, 48 hours, 72 hours or less than 1 week. For
effective chemotherapies, this blood sample contains higher
concentrations of cell-free tumor-derived DNA due to potentially
higher rates of cell-death of cancer cells. Alternatively, low-dose
radiation therapy is applied via a whole-body radiographic
instrument or locally to the affected regions instead of low-dose
chemotherapy. Other procedures are envisioned, including subjecting
a patient to ultrasound, sound waves, exercise, stress, etc.
[0284] All publications, patents, and patent applications mentioned
in this specification are herein incorporated by reference to the
same extent as if each individual publication, patent, or patent
application was specifically and individually indicated to be
incorporated by reference.
[0285] While preferred embodiments of the present disclosure have
been shown and described herein, it will be obvious to those
skilled in the art that such embodiments are provided by way of
example only. Numerous variations, changes, and substitutions will
now occur to those skilled in the art without departing from the
disclosure. It should be understood that various alternatives to
the embodiments of the invention described herein may be employed
in practicing the invention. It is intended that the following
claims define the scope of the invention and that methods and
structures within the scope of these claims and their equivalents
be covered thereby.
* * * * *