U.S. patent application number 17/545611 was filed with the patent office on 2021-12-08 and published on 2022-06-30 for methods and materials for assessing allelic imbalance.
This patent application is currently assigned to Myriad Genetics, Inc. The applicant listed for this patent is Myriad Genetics, Inc. Invention is credited to Alexander Gutin, Jerry Lanchbury, and Kirsten Timms.
Publication Number | 20220205022 |
Application Number | 17/545611 |
Family ID | 1000006200400 |
Publication Date | 2022-06-30 |
United States Patent Application | 20220205022
Kind Code | A1
Gutin; Alexander; et al. | June 30, 2022
METHODS AND MATERIALS FOR ASSESSING ALLELIC IMBALANCE
Abstract
Methods and systems for detecting allelic imbalance using
nucleic acid sequencing are provided.
Inventors: | Gutin; Alexander (Salt Lake City, UT); Timms; Kirsten (Salt Lake City, UT); Lanchbury; Jerry (Salt Lake City, UT)
Applicant: |
Name | City | State | Country | Type
Myriad Genetics, Inc. | Salt Lake City | UT | US |
Assignee: | Myriad Genetics, Inc., Salt Lake City, UT
Family ID: | 1000006200400
Appl. No.: | 17/545611
Filed: | December 8, 2021
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number | Child Application
16815963 | Mar 11, 2020 | 11225685 | 17545611
15412404 | Jan 23, 2017 | 10626449 | 16815963
15010721 | Jan 29, 2016 | 9574229 | 15412404
14109163 | Dec 17, 2013 | 9279156 | 15010721
PCT/US12/42668 | Jun 15, 2012 | | 14109163
61498418 | Jun 17, 2011 | |
Current U.S. Class: | 1/1
Current CPC Class: | C12Q 1/6883 20130101; C12Q 1/6874 20130101; G16B 30/00 20190201; C12Q 1/6858 20130101; C12Q 1/6827 20130101
International Class: | C12Q 1/6827 20060101 C12Q001/6827; G16B 30/00 20060101 G16B030/00; C12Q 1/6858 20060101 C12Q001/6858; C12Q 1/6874 20060101 C12Q001/6874; C12Q 1/6883 20060101 C12Q001/6883
Claims
1. An in vitro method for detecting copy number at a plurality of
single nucleotide polymorphism loci, comprising: (1) extracting DNA
from a formalin-fixed paraffin-embedded sample comprising at least
one tumor cell obtained from a patient to produce at least one
solution comprising genomic DNA of the at least one tumor cell; (2)
enriching the at least one solution in (1) for test DNA molecules
each comprising at least one locus from the plurality of single
nucleotide polymorphism loci, wherein the plurality of single
nucleotide polymorphism loci comprises at least 1,000 single
nucleotide polymorphism loci and wherein there is at least one
single nucleotide polymorphism locus located on average every 500
kb within each chromosome; and (3) quantitatively sequencing the
test DNA molecules to detect the copy number of each allele at each
such locus in the plurality of single nucleotide polymorphism
loci.
2. The method of claim 1, wherein the plurality of single
nucleotide polymorphism loci comprises at least 2,500 single
nucleotide polymorphism loci.
3. The method of claim 1, wherein the plurality of single
nucleotide polymorphism loci comprises at least 5,000 single
nucleotide polymorphism loci.
4. The method of claim 1, wherein the plurality of single
nucleotide polymorphism loci comprises at least 10,000 single
nucleotide polymorphism loci.
5. The method of claim 1, wherein the plurality of single
nucleotide polymorphism loci comprises at least 50,000 single
nucleotide polymorphism loci.
6. The method of claim 1, wherein there is at least one single
nucleotide polymorphism locus located on average every 1 Mb within
each chromosome.
7. The method of claim 1, wherein there is at least one single
nucleotide polymorphism locus in the plurality of single nucleotide
polymorphism loci located on average every 500 kb within each
chromosome.
8. The method of claim 1, wherein there is at least one single
nucleotide polymorphism locus in the plurality of single nucleotide
polymorphism loci located on average every 100 kb within each
chromosome.
9. The method of claim 1, wherein there is at least one single
nucleotide polymorphism locus in the plurality of single nucleotide
polymorphism loci located on average every 50 kb within each
chromosome.
10. The method of claim 1, wherein there is at least one single
nucleotide polymorphism locus in the plurality of single nucleotide
polymorphism loci located on average every 10 kb within each
chromosome.
11. The method of claim 1, wherein the genomic spacing of the
plurality of single nucleotide polymorphism loci in the plurality
of single nucleotide polymorphism loci is less than or equal to
50%.
12. The method of claim 1, wherein the genomic spacing of the
plurality of single nucleotide polymorphism loci in the plurality
of single nucleotide polymorphism loci is less than or equal to
25%.
13. The method of claim 1, wherein the genomic spacing of the
plurality of single nucleotide polymorphism loci in the plurality
of single nucleotide polymorphism loci is less than or equal to
10%.
14. An in vitro method for detecting copy number at a plurality of
single nucleotide polymorphism loci, comprising: (1) extracting DNA
from a formalin-fixed paraffin-embedded sample comprising at least
one tumor cell obtained from a patient to produce at least one
solution comprising genomic DNA of the at least one tumor cell; (2)
enriching the at least one solution in (1) for test DNA molecules
each comprising at least one locus from the plurality of single
nucleotide polymorphism loci, wherein the plurality of single
nucleotide polymorphism loci comprises at least 1,000 single
nucleotide polymorphism loci, wherein there is at least one single
nucleotide polymorphism locus from the plurality of single
nucleotide polymorphism loci located on average every 5 Mb within
each chromosome analyzed, and wherein enriching the at least one
solution comprises: (a) (i) separating the test DNA molecules from
the rest of the genomic DNA in the at least one solution by contacting
genomic DNA in the at least one solution with a plurality of
oligonucleotide probes, wherein there is at least one probe in the
plurality of oligonucleotide probes complementary to each locus in
the plurality of single nucleotide polymorphism loci and (ii)
amplifying the test DNA molecules separated in (i) by PCR; or (b)
(i) amplifying the genomic DNA in the at least one solution and
(ii) separating the test DNA molecules from the rest of the
amplified genomic DNA in (i) by contacting the amplified genomic
DNA with a plurality of oligonucleotide probes, wherein there is at
least one probe in the plurality of oligonucleotide probes
complementary to each locus in the plurality of single nucleotide
polymorphism loci; or (c) directly amplifying the test DNA
molecules from the at least one solution of genomic DNA; and (3)
quantitatively sequencing the test DNA molecules to detect the copy
number of each allele at each such locus in the plurality of single
nucleotide polymorphism loci.
15. The method of claim 14, wherein the plurality of single
nucleotide polymorphism loci comprises at least 2,500 single
nucleotide polymorphism loci.
16. The method of claim 14, wherein the plurality of single
nucleotide polymorphism loci comprises at least 5,000 single
nucleotide polymorphism loci.
17. The method of claim 14, wherein the plurality of single
nucleotide polymorphism loci comprises at least 10,000 single
nucleotide polymorphism loci.
18. The method of claim 14, wherein the plurality of single
nucleotide polymorphism loci comprises at least 50,000 single
nucleotide polymorphism loci.
19. The method of claim 14, wherein there is at least one single
nucleotide polymorphism locus located on average every 1 Mb, every
500 kb, every 100 kb, every 50 kb or every 10 kb within each
chromosome.
20.-23. (canceled)
24. The method of claim 14, wherein the genomic spacing of the
plurality of single nucleotide polymorphism loci in the plurality
of single nucleotide polymorphism loci is less than or equal to
50%, 25%, or 10%.
25.-39. (canceled)
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application is a continuation of U.S. application Ser.
No. 16/815,963, filed Mar. 11, 2020, which is a continuation of
U.S. application Ser. No. 15/412,404, filed Jan. 23, 2017 (issued
as U.S. Pat. No. 10,626,449), which is a continuation of U.S.
application Ser. No. 15/010,721, filed Jan. 29, 2016 (issued as
U.S. Pat. No. 9,574,229), which is a continuation of U.S.
application Ser. No. 14/109,163, filed Dec. 17, 2013 (issued as
U.S. Pat. No. 9,279,156), which is a continuation of International
Application No. PCT/US12/042668, filed Jun. 15, 2012, which claims
the priority benefit of U.S. Provisional Application No.
61/498,418, filed Jun. 17, 2011, the entire contents of each of
which are hereby incorporated by reference.
FIELD OF THE INVENTION
[0002] The invention generally relates to molecular diagnosis, and
particularly to a method and system for detecting allelic imbalance
in patient samples.
BACKGROUND OF THE INVENTION
[0003] In general, a comparison of sequences present at the same
locus on each chromosome (each autosomal chromosome for males) of a
chromosome pair can reveal whether that particular locus is
homozygous or heterozygous within the genome of a cell. Polymorphic
loci within the human genome are generally heterozygous within an
individual since that individual typically receives one copy from
the biological father and one copy from the biological mother. In
some cases, a polymorphic locus or a string of polymorphic loci
within an individual are homozygous as a result of inheriting
identical copies from both biological parents. In other cases,
homozygosity results from a loss of heterozygosity (LOH) from the
germline. Because LOH and copy number information can be clinically
useful, there is a need for improved methods of identifying loci
and regions of LOH in samples.
BRIEF SUMMARY OF THE INVENTION
[0004] Copy number (including allelic imbalance and LOH) analysis
of tumor tissues has been traditionally performed using single
nucleotide polymorphism (SNP) arrays. The data quality is often
highly variable and, especially for FFPE samples, tends to be poor.
The inventors have developed a method of genome-wide copy number
analysis that produces high quality data from all sample types. The
method is based on in-solution capture of DNA fragments spanning
target loci (e.g., SNPs), followed by parallel sequencing to identify
and quantitate the alleles. The resulting data allow high quality LOH
and copy number analysis of the sample.
[0005] Accordingly, in one aspect of the present invention, a
method of detecting allelic imbalance status in a plurality of
genomic loci in a tumor sample from a cancer patient is provided,
comprising the steps of enriching a genomic DNA sample for DNA
molecules each comprising a locus of interest; sequencing said DNA
molecules to determine the genotype at each such locus; and determining
for each locus whether there is allelic imbalance.
[0006] In another aspect of the present invention, a method of
detecting LOH status in a plurality of genomic loci in a tumor
sample from a cancer patient is provided, comprising the steps of
enriching a genomic DNA sample for DNA molecules each comprising a
locus of interest; sequencing said DNA molecules to determine the
genotype at each such locus; and determining for each homozygous locus
whether it is homozygous due to LOH.
[0007] Unless otherwise defined, all technical and scientific terms
used herein have the same meaning as commonly understood by one of
ordinary skill in the art to which this invention pertains.
Although methods and materials similar or equivalent to those
described herein can be used in the practice or testing of the
present invention, suitable methods and materials are described
below. In case of conflict, the present specification, including
definitions, will control. In addition, the materials, methods, and
examples are illustrative only and not intended to be limiting.
[0008] Other features and advantages of the invention will be
apparent from the following detailed description, and from the
claims.
BRIEF DESCRIPTION OF THE DRAWINGS
[0009] FIG. 1 is a graph plotting allele dosages of breast cancer
cells from a breast cancer patient along chromosome 1 as determined
using a SNP array. The chromosome region between the arrows is an
LOH region that is about 103 Mb in length.
[0010] FIG. 2 is a graph plotting allele dosages of breast cancer
cells for the same breast cancer patient as in FIG. 1 along
chromosome 1 as determined using high-throughput sequencing. The
chromosome region between the arrows is an LOH region that is about
103 Mb in length.
[0011] FIG. 3 is a diagram of an example of a computer device and a
mobile computer device that can be used to implement the techniques
described herein.
DETAILED DESCRIPTION OF THE INVENTION
[0012] It has been surprisingly discovered that determining allelic
imbalance (e.g., abnormal copy number, LOH) in formalin-fixed
paraffin-embedded ("FFPE") samples using sequencing of genomic
regions comprising loci of interest (e.g., SNPs) yields far
superior quality data when compared to copy number and allelic
imbalance data generated using microarrays. This invention enables
large-scale (e.g., whole genome) copy number (e.g., allelic
imbalance) analysis of samples of varying quality. In particular,
it enables high quality data to be produced from FFPE-derived DNA.
Current array-based platforms are unable to produce data of
sufficient quality from this sample type.
[0013] Accordingly, in one aspect of the present invention, a
method of detecting allelic imbalance status in a plurality of
genomic loci in a tumor sample from a cancer patient is provided,
comprising the steps of enriching a genomic DNA sample for DNA
molecules each comprising a locus of interest; sequencing said DNA
molecules to determine the genotype at each such locus; and determining
for each locus whether there is allelic imbalance. "Locus" as used
herein has its usual meaning in the art. As used herein, "region"
means a plurality of substantially adjacent loci. Unless stated
otherwise or unless the context clearly indicates otherwise,
statements made about a locus will generally apply to a region.
[0014] As used herein, "allelic imbalance" means any instance where
the somatic copy number differs from the germline copy number at a
genomic locus or region. In some embodiments allelic imbalance is
expressed in terms of major copy proportion ("MCP"). Major copy
proportion and MCP, as used herein, mean the ratio of the major
allele copy number to the major+minor allele copy number, as
follows:
MCP=[major allele copy number]/([major allele copy number]+[minor
allele copy number])
In some embodiments, a locus or region shows allelic imbalance if
the MCP at such locus or region is 0.51, 0.52, 0.53, 0.54, 0.55,
0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, or 1.
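The MCP definition above can be illustrated with a minimal Python sketch. The function names, the default cutoff of 0.6, and the reading of the listed values as a minimum threshold are assumptions made for illustration only; they are not taken from the specification.

```python
def major_copy_proportion(copies_a: float, copies_b: float) -> float:
    """Return MCP = major / (major + minor) for two allele copy numbers."""
    total = copies_a + copies_b
    if total <= 0:
        # A locus with no copies of either allele has no defined MCP.
        raise ValueError("total copy number must be positive")
    return max(copies_a, copies_b) / total


def shows_imbalance(copies_a: float, copies_b: float, cutoff: float = 0.6) -> bool:
    """Flag a locus whose MCP reaches the cutoff (assumed here to be a minimum)."""
    return major_copy_proportion(copies_a, copies_b) >= cutoff
```

Under these assumptions, a balanced locus with one copy of each allele has an MCP of 0.5 and is not flagged, while a locus with two copies of one allele and none of the other has an MCP of 1 and is flagged.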
[0015] One example of allelic imbalance is loss of heterozygosity
("LOH"), in which a locus is heterozygous in the germline but
homozygous in somatic tissue. In this sense, homozygosity can
include homozygous loss (i.e., deletion) of the locus in somatic
tissue. The different types of possible LOH and allelic imbalance
are discussed in more detail below.
[0016] Thus in some embodiments the present invention provides a
method of detecting LOH status in a plurality of genomic loci in a
tumor sample from a cancer patient, comprising enriching a genomic
DNA sample for DNA molecules each comprising a locus of interest;
sequencing said DNA molecules to determine the genotype at each
such locus; and determining for each homozygous locus whether it is
homozygous due to LOH.
[0017] According to the present invention, nucleic acid sequencing
techniques can be used to identify loci and/or regions as having
allelic imbalance. For example, genomic DNA from a cell sample
(e.g., a cancer cell sample) can be extracted and fragmented. Any
appropriate method can be used to extract and fragment genomic
nucleic acid including, without limitation, commercial kits such as
QIAamp DNA Mini Kit (Qiagen), MagNA Pure DNA Isolation Kit (Roche
Applied Science) and GenElute Mammalian Genomic DNA Miniprep Kit
(Sigma-Aldrich). Once extracted and fragmented, either targeted or
untargeted sequencing can be done to determine the sample's
genotypes at loci of interest. For example, whole genome, whole
transcriptome, or whole exome sequencing can be done to determine
genotypes at millions or even billions of base pairs (i.e., base
pairs can be "loci" to be evaluated).
[0018] In some cases, targeted sequencing of known polymorphic loci
(e.g., SNPs and surrounding sequences) can be done as an
alternative to microarray analysis. For example, the genomic DNA
can be enriched for those fragments containing a locus (e.g., SNP
location) to be analyzed using kits designed for this purpose
(e.g., Agilent SureSelect, Illumina TruSeq Capture, Nimblegen
SeqCap EZ Choice, Raindance Thunderstorm™). For example, genomic
DNA containing the loci to be analyzed can be hybridized to
biotinylated capture RNA fragments to form biotinylated RNA/genomic
DNA complexes. Alternatively, DNA capture probes may be utilized
resulting in the formation of biotinylated DNA/genomic DNA hybrids.
Streptavidin coated magnetic beads and a magnetic force can be used
to separate the biotinylated RNA/genomic DNA complexes from those
genomic DNA fragments not present within a biotinylated RNA/genomic
DNA complex. The obtained biotinylated RNA/genomic DNA complexes
can be treated to remove the captured RNA from the magnetic beads,
thereby leaving intact genomic DNA fragments containing a locus to
be analyzed. These intact genomic DNA fragments containing the loci
to be analyzed can be amplified using, for example, PCR techniques.
Alternatively, a multiplex PCR reaction can be employed to enrich
for loci of interest. PCR primers can be designed to flank loci of
interest and a PCR reaction can be run to amplify sequences
comprising such loci.
[0019] The enriched genomic DNA fragments can be sequenced using
any sequencing technique. Beyond Sanger sequencing, numerous
suitable sequencing machines and strategies are well known in the
art, including but not limited to those developed by Illumina (the
Genome Analyzer; Bennett et al. (2005) Pharmacogenomics, 6:373-382;
HiSeq; MiSeq); by Applied Biosystems, Inc. (the SOLiD™
Sequencer; solid.appliedbiosystems.com); by Roche (e.g., the 454 GS
FLX™ sequencer; Margulies et al. (2005) Nature, 437:376-380;
U.S. Pat. Nos. 6,274,320; 6,258,568; 6,210,891); by Helicos
Biosciences (Heliscope™ system, see, e.g., U.S. Patent App. Pub.
No. 2007/0070349); by Oxford Nanopore (e.g., GridION™ and
MinION™, see, e.g., International Application No.
PCT/GB2009/001690, pub. no. WO/2010/004273); and by others.
[0020] The sequencing results from the genomic DNA fragments can be
used to identify loci as having allelic imbalance. In some cases,
an analysis of the allelic imbalance status of loci over a length
of a chromosome can be performed to determine the length of regions
of allelic imbalance. For example, a stretch of SNP locations that
are spaced apart (e.g., spaced about 25 kb to about 100 kb apart)
along a chromosome can be evaluated by sequencing, and the
sequencing results used to determine not only the presence of a
region of allelic imbalance (e.g., somatic homozygosity) along a
chromosome but also the length of that region of imbalance.
Obtained sequencing results can be used to generate a graph that
plots allele dosages along a chromosome. Allele dosage $d_i$ for
SNP $i$ can be calculated from the adjusted number of captured probes
for two alleles ($A_i$ and $B_i$): $d_i = A_i/(A_i + B_i)$.
An example of such a graph is presented
in FIG. 2.
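As a rough sketch of the dosage calculation described above, the following Python computes the allele dosage for each SNP from a pair of allele counts and orders the results by chromosomal position, ready for plotting. The record layout `(position, A, B)` and the function name are hypothetical, and the counts are assumed to have been adjusted upstream.

```python
from typing import List, Tuple


def allele_dosages(counts: List[Tuple[int, float, float]]) -> List[Tuple[int, float]]:
    """Map (position, A, B) records to (position, dosage), dosage = A / (A + B).

    Loci with no signal for either allele are skipped because the dosage
    is undefined there. Results are sorted by position along the chromosome.
    """
    dosages = []
    for position, a, b in counts:
        total = a + b
        if total > 0:
            dosages.append((position, a / total))
    return sorted(dosages)
```

In a plot of these values, a heterozygous region with balanced alleles would be expected to cluster near 0.5, while a region of somatic homozygosity would be expected to split toward 0 and 1.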
[0021] Once a sample's genotype (e.g., homozygosity) has been
determined for a plurality of loci (e.g., SNPs), common techniques
can be used to identify loci and regions of allelic imbalance due
to somatic change (e.g., LOH). One way to determine whether
imbalance is due to somatic change is to compare the somatic
genotype to the germline. For example, the genotype for a plurality
of loci (e.g., SNPs) can be determined in both a germline (e.g.,
blood) sample and a somatic (e.g., tumor) sample. The genotypes for
each sample can be compared (typically computationally) to
determine where the genome of the germline cell was, e.g.,
heterozygous and the genome of the somatic cell is, e.g.,
homozygous. Such loci are LOH loci and regions of such loci are LOH
regions.
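The germline-versus-somatic comparison described above can be sketched in Python as follows. This is an illustrative sketch only: genotypes are assumed to be short strings such as "AG", positions are assumed to lie on a single chromosome, and the function names and the `min_loci` parameter are hypothetical.

```python
from typing import Dict, List, Tuple


def is_heterozygous(genotype: str) -> bool:
    """A genotype such as 'AG' is heterozygous; 'AA' is homozygous."""
    return len(set(genotype)) > 1


def loh_loci(germline: Dict[int, str], tumor: Dict[int, str]) -> List[int]:
    """Positions that are heterozygous in germline and homozygous in tumor."""
    return sorted(
        pos
        for pos, genotype in germline.items()
        if pos in tumor
        and is_heterozygous(genotype)
        and not is_heterozygous(tumor[pos])
    )


def loh_regions(
    germline: Dict[int, str], tumor: Dict[int, str], min_loci: int = 2
) -> List[Tuple[int, int]]:
    """Group runs of consecutive LOH loci into (start, end) regions.

    Germline-homozygous loci are uninformative and do not break a run;
    a locus that stays heterozygous in the tumor ends the current run.
    """
    regions: List[Tuple[int, int]] = []
    run: List[int] = []
    for pos in sorted(set(germline) & set(tumor)):
        if not is_heterozygous(germline[pos]):
            continue
        if is_heterozygous(tumor[pos]):
            if len(run) >= min_loci:
                regions.append((run[0], run[-1]))
            run = []
        else:
            run.append(pos)
    if len(run) >= min_loci:
        regions.append((run[0], run[-1]))
    return regions
```

Treating germline-homozygous loci as uninformative is a design choice of this sketch: such loci cannot show LOH either way, so they are skipped rather than used to split a region.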
[0022] Computational techniques can also be used to determine
whether allelic imbalance is somatic (e.g., due to LOH). Such
techniques are particularly useful when a germline sample is not
available for analysis and comparison. For example, algorithms such
as those described elsewhere can be used to detect allelic
imbalance regions using information from SNP arrays (Nannya et al.,
Cancer Res., 65:6071-6079 (2005)). Typically these algorithms do
not explicitly take into account contamination of tumor samples
with benign tissue. Cf. International Application No.
PCT/US2011/026098 to Abkevich et al.; Goransson et al., PLoS One
(2009) 4(6):e6057. This contamination is often high enough to make
the detection of allelic imbalance regions challenging. Improved
analytical methods according to the present invention for
identifying allelic imbalance, even in spite of contamination,
include those embodied in computer software products as described
below.
[0023] The following is one example. If the observed ratio (e.g.,
MCP) of the signals of two alleles, A and B, is two to one, there
are two possibilities. The first possibility is that cancer cells
have LOH with deletion of allele B in a sample with 50%
contamination with normal cells. The second possibility is that
there is no LOH but allele A is duplicated in a sample with no
contamination with normal cells. An algorithm can be implemented as
a computer program as described herein to reconstruct LOH regions
based on genotype (e.g., SNP genotype) data. One point of the
algorithm is to first reconstruct allele specific copy numbers
(ASCN) at each locus (e.g., SNP). ASCNs are the numbers of copies
of both paternal and maternal alleles. An LOH region is then
determined as a stretch of SNPs with one of the ASCNs (paternal or
maternal) being zero. The algorithm can be based on maximizing a
likelihood function and can be conceptually akin to a previously
described algorithm designed to reconstruct total copy number
(rather than ASCN) at each locus (e.g., SNP). See International
Application No. PCT/US2011/026098 (pub. no. WO/2011/106541) (hereby
incorporated by reference in its entirety). The likelihood function
can be maximized over ASCN of all loci, level of contamination with
benign tissue, total copy number averaged over the whole genome,
and sample-specific noise level. The input data for the algorithm
can include or consist of (1) sample-specific normalized signal
intensities for both alleles of each locus and (2) an assay-specific
set of parameters (specific to different SNP arrays and to the
sequence-based approach) defined based on analysis of a large number
of samples with known ASCN profiles.
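The two-to-one ambiguity described above can be reproduced with a small Python sketch. It is not the likelihood-based algorithm itself; it only shows why the level of contamination has to be estimated together with the ASCNs. The mixing model is an assumption made for illustration: normal cells carry one copy of each allele, and contamination is expressed as the fraction of cells that are normal. The function name is hypothetical.

```python
def expected_mcp(tumor_a: int, tumor_b: int, normal_fraction: float) -> float:
    """Expected MCP when tumor cells with ASCN (tumor_a, tumor_b) are mixed
    with normal cells assumed to carry one copy of each allele."""
    if not 0.0 <= normal_fraction <= 1.0:
        raise ValueError("normal_fraction must be between 0 and 1")
    tumor_fraction = 1.0 - normal_fraction
    # Average copies of each allele per cell across the mixed sample.
    signal_a = tumor_fraction * tumor_a + normal_fraction * 1
    signal_b = tumor_fraction * tumor_b + normal_fraction * 1
    total = signal_a + signal_b
    if total == 0:
        raise ValueError("no copies of either allele in the sample")
    return max(signal_a, signal_b) / total
```

Under this model, `expected_mcp(1, 0, 0.5)` (LOH by deletion of allele B with 50% normal cells) and `expected_mcp(2, 1, 0.0)` (duplication of allele A with no contamination) both give an MCP of two thirds, so a single locus cannot distinguish the two scenarios.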
[0024] In some cases, a selection process can be used to select
loci (e.g., SNP loci) to be evaluated using an assay configured to
identify loci as having allelic imbalance (e.g., SNP array-based
assays and sequencing-based assays). For example, any human SNP
location can be selected for inclusion in a SNP array-based assay
or a sequencing-based assay configured to identify loci as having
allelic imbalance within the genome of cells. In some cases, 0.5,
1.0, 1.5, 2.0, 2.5 million or more loci (e.g., SNP locations)
present within the human genome can be evaluated to identify those
loci that (a) are not present on the Y chromosome, (b) are not
mitochondrial loci, (c) have a minor allele frequency of at least
about 5% in the population of interest (e.g., Caucasians), (d) have
a minor allele frequency of at least about 1% in three populations
other than the population of interest (e.g., Chinese, Japanese, and
Yoruba), and/or (e) do not have a significant deviation from
Hardy-Weinberg equilibrium in any of these populations. In some
cases, more than 100,000, 150,000, or 200,000 human loci can be
selected that meet criteria (a) through (e). Of the human loci
meeting criteria (a) through (e), a group of loci (e.g., top 2,500,
5,000, 7,500, 10,000, 20,000, 30,000, 40,000, 50,000, 75,000,
100,000, 150,000, or 200,000 loci) can be selected such that the
loci have a high degree of allele frequency in the population of
interest, cover the human genome in a somewhat evenly spaced manner
(e.g., at least one locus of interest every about 5 kb, 10 kb, 25
kb, 50 kb, 75 kb, 100 kb, 150 kb, 200 kb, 300 kb, 400 kb, 500 kb or
more), and are not in linkage disequilibrium with another selected
locus in any of the populations used for analysis. In some cases,
about 40, 50, 60, 70, 80, 90, 100, 110, 120, 130 thousand or more
loci can be selected as meeting each of these criteria and included
in an assay configured to identify allelic imbalance regions across
a human genome. For example, between about 70,000 and about 90,000
(e.g., about 80,000) SNPs can be selected for analysis with a SNP
array-based assay, and between about 45,000 and about 55,000 (e.g.,
about 54,000) SNPs can be selected for analysis with a
sequencing-based assay.
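A selection process of the kind described above might be sketched in Python as follows. The `Snp` record, its field names, the chromosome labels "Y" and "MT", and the greedy thinning step are all hypothetical choices made for illustration. The sketch applies criteria (a) through (e) together, although the text allows them to be combined in other ways, and it does not implement the linkage disequilibrium filter.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Snp:
    snp_id: str
    chromosome: str
    position: int
    maf: Dict[str, float]    # minor allele frequency per population
    hwe_ok: Dict[str, bool]  # True if no significant Hardy-Weinberg deviation


def passes_filters(
    snp: Snp,
    primary: str,
    others: List[str],
    primary_maf: float = 0.05,
    other_maf: float = 0.01,
) -> bool:
    """Apply criteria (a) through (e) to one candidate locus."""
    if snp.chromosome in {"Y", "MT"}:                               # (a), (b)
        return False
    if snp.maf.get(primary, 0.0) < primary_maf:                     # (c)
        return False
    if any(snp.maf.get(pop, 0.0) < other_maf for pop in others):    # (d)
        return False
    return all(snp.hwe_ok.get(pop, False) for pop in [primary, *others])  # (e)


def thin_by_spacing(snps: List[Snp], min_gap: int) -> List[Snp]:
    """Greedy thinning: keep loci at least min_gap bases apart per chromosome."""
    kept: List[Snp] = []
    last_position: Dict[str, Optional[int]] = {}
    for snp in sorted(snps, key=lambda s: (s.chromosome, s.position)):
        previous = last_position.get(snp.chromosome)
        if previous is None or snp.position - previous >= min_gap:
            kept.append(snp)
            last_position[snp.chromosome] = snp.position
    return kept
```

Greedy thinning is only one simple way to approximate "somewhat evenly spaced" coverage; a real panel design would likely also rank candidates by allele frequency before thinning.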
[0025] Accordingly, in one aspect of the present invention, a
method of detecting allelic imbalance status in a plurality of
genomic loci in a sample from a patient is provided, comprising the
steps of enriching a genomic DNA sample for DNA molecules each
comprising a locus of interest; sequencing said DNA molecules to
determine the genotype at each such locus; and determining for each
locus whether it has allelic imbalance.
[0026] In another aspect of the present invention, a method of
detecting LOH status in a plurality of genomic loci in a sample
from a patient is provided, comprising the steps of enriching a
genomic DNA sample for DNA molecules each comprising a locus of
interest; sequencing said DNA molecules to determine the genotype
at each such locus; and determining for each homozygous locus whether
it is homozygous due to LOH.
[0027] In another aspect of the present invention, a method of
detecting copy number status in a plurality of genomic loci in a
sample from a patient is provided, comprising the steps of
enriching a genomic DNA sample for DNA molecules each comprising a
locus of interest; sequencing said DNA molecules; and quantitating
each allele at each such locus to determine its copy number.
[0028] In some embodiments at least 10, 50, 100, 1,000, 10,000,
50,000, 55,000, 75,000, 100,000, 150,000, 200,000, 300,000,
400,000, 500,000, 750,000, 1,000,000, 2,000,000 or more loci are
evaluated. In some embodiments these loci are spaced evenly along
the genome. As used herein, loci are "evenly spaced along the
genome" when the percentage difference between the
$\mathrm{distance}_{AB}$ between any two loci A and B and the
$\mathrm{distance}_{CD}$ between any other two loci C and D (i.e.,
$100 \times (\mathrm{distance}_{AB} - \mathrm{distance}_{CD})/\mathrm{distance}_{AB}$ or
$100 \times (\mathrm{distance}_{AB} - \mathrm{distance}_{CD})/\mathrm{distance}_{CD}$) is less than
or equal to 50%, 40%, 30%, 20%, 15%, 10%, 9%, 8%, 7%, 6%, 5%, 4%,
3%, 2%, or 1%. Such percentage difference is referred to herein as
the "genomic spacing" of loci. In some embodiments the sample is an
FFPE tissue sample. In some embodiments the sample is a tumor
sample from the patient.
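The "genomic spacing" definition above can be sketched in Python under two stated assumptions that are not spelled out in the text: distances are taken between adjacent loci on a single chromosome, and the worst case is reported by dividing the largest difference by the smaller distance. The function names are hypothetical.

```python
from typing import Sequence


def genomic_spacing_percent(positions: Sequence[int]) -> float:
    """Worst-case percentage difference between adjacent inter-locus distances.

    Assumes all positions lie on one chromosome and that distances are
    measured between neighboring loci after sorting.
    """
    ordered = sorted(positions)
    gaps = [right - left for left, right in zip(ordered, ordered[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three loci to compare two distances")
    if min(gaps) <= 0:
        raise ValueError("duplicate positions are not allowed")
    # Largest gap versus smallest gap, relative to the smaller distance.
    return 100.0 * (max(gaps) - min(gaps)) / min(gaps)


def evenly_spaced(positions: Sequence[int], limit_percent: float = 50.0) -> bool:
    """True when the genomic spacing is at or below the chosen limit."""
    return genomic_spacing_percent(positions) <= limit_percent
```

For example, loci at positions 0, 100 and 220 have adjacent distances of 100 and 120, giving a genomic spacing of 20% under these assumptions, which satisfies a 50% or 25% limit but not a 10% limit.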
[0029] Another aspect of the invention provides a system for
determining allelic imbalance status in a plurality of loci in a
sample comprising: a sample analyzer for (1) enriching a genomic
DNA sample for DNA molecules each comprising a locus of interest
and (2) sequencing said DNA molecules to produce a plurality of
quantitative signals about each such locus; and a computer program for
analyzing said plurality of quantitative signals to determine
whether each such locus has allelic imbalance.
[0030] Another aspect of the invention provides a system for
determining LOH status in a plurality of loci in a sample
comprising: a sample analyzer for (1) enriching a genomic DNA
sample for DNA molecules each comprising a locus of interest and
(2) sequencing said DNA molecules to produce a plurality of
quantitative signals about each such locus; a computer program for
analyzing said plurality of quantitative signals to determine
whether each such locus is homozygous in the sample; and a computer
program for determining for each homozygous locus whether it is
homozygous due to LOH.
[0031] Another aspect of the invention provides a system for
detecting copy number status in a plurality of genomic loci in a
sample from a patient comprising: a sample analyzer for (1)
enriching a genomic DNA sample for DNA molecules each comprising a
locus of interest and (2) sequencing said DNA molecules to produce
a plurality of quantitative signals about each such locus; and a
computer program for analyzing said plurality of quantitative
signals to quantitate each allele at each such locus to determine
its copy number.
[0032] In some embodiments of the systems of the invention, one
sample analyzer both enriches the sample for DNA of interest and
sequences that DNA. In other embodiments two or more sample
analyzers perform these functions. In some embodiments, one
software program analyzes the plurality of quantitative signals to
determine whether each locus is homozygous in the sample and also
determines for each homozygous locus whether it is homozygous due
to LOH.
[0033] FIG. 3 is a diagram of an example of a computer device 1400
and a mobile computer device 1450, which may be used with the
techniques described herein. Computing device 1400 is intended to
represent various forms of digital computers, such as laptops,
desktops, workstations, personal digital assistants, servers, blade
servers, mainframes, and other appropriate computers. Computing
device 1450 is intended to represent various forms of mobile
devices, such as personal digital assistants, cellular telephones,
smart phones, and other similar computing devices. The components
shown here, their connections and relationships, and their
functions, are meant to be exemplary only, and are not meant to
limit implementations of the inventions described and/or claimed in
this document.
[0034] Computing device 1400 includes a processor 1402, memory
1404, a storage device 1406, a high-speed interface 1408 connecting
to memory 1404 and high-speed expansion ports 1410, and a low speed
interface 1415 connecting to low speed bus 1414 and storage device
1406. Each of the components 1402, 1404, 1406, 1408, 1410, and
1415, are interconnected using various busses, and may be mounted
on a common motherboard or in other manners as appropriate. The
processor 1402 can process instructions for execution within the
computing device 1400, including instructions stored in the memory
1404 or on the storage device 1406 to display graphical information
for a GUI on an external input/output device, such as display 1416
coupled to high speed interface 1408. In other implementations,
multiple processors and/or multiple buses may be used, as
appropriate, along with multiple memories and types of memory.
Also, multiple computing devices 1400 may be connected, with each
device providing portions of the necessary operations (e.g., as a
server bank, a group of blade servers, or a multi-processor
system).
[0035] The memory 1404 stores information within the computing
device 1400. In one implementation, the memory 1404 is a volatile
memory unit or units. In another implementation, the memory 1404 is
a non-volatile memory unit or units. The memory 1404 may also be
another form of computer-readable medium, such as a magnetic or
optical disk.
[0036] The storage device 1406 is capable of providing mass storage
for the computing device 1400. In one implementation, the storage
device 1406 may be or contain a computer-readable medium, such as a
floppy disk device, a hard disk device, an optical disk device, or
a tape device, a flash memory or other similar solid state memory
device, or an array of devices, including devices in a storage area
network or other configurations. A computer program product can be
tangibly embodied in an information carrier. The computer program
product may also contain instructions that, when executed, perform
one or more methods, such as those described herein. The
information carrier is a computer- or machine-readable medium, such
as the memory 1404, the storage device 1406, memory on processor
1402, or a propagated signal.
[0037] The high speed controller 1408 manages bandwidth-intensive
operations for the computing device 1400, while the low speed
controller 1415 manages less bandwidth-intensive operations. Such
allocation of functions is exemplary only. In one implementation,
the high-speed controller 1408 is coupled to memory 1404, display
1416 (e.g., through a graphics processor or accelerator), and to
high-speed expansion ports 1410, which may accept various expansion
cards (not shown). In the implementation, low-speed controller 1415
is coupled to storage device 1406 and low-speed expansion port
1414. The low-speed expansion port, which may include various
communication ports (e.g., USB, Bluetooth, Ethernet, or wireless
Ethernet) may be coupled to one or more input/output devices, such
as a keyboard, a pointing device, a scanner, an optical reader, a
fluorescent signal detector, or a networking device such as a
switch or router, e.g., through a network adapter.
[0038] The computing device 1400 may be implemented in a number of
different forms, as shown in the figure. For example, it may be
implemented as a standard server 1420, or multiple times in a group
of such servers. It may also be implemented as part of a rack
server system 1424. In addition, it may be implemented in a
personal computer such as a laptop computer 1422. Alternatively,
components from computing device 1400 may be combined with other
components in a mobile device (not shown), such as device 1450.
Each of such devices may contain one or more of computing device
1400, 1450, and an entire system may be made up of multiple
computing devices 1400, 1450 communicating with each other.
[0039] Computing device 1450 includes a processor 1452, memory
1464, an input/output device such as a display 1454, a
communication interface 1466, and a transceiver 1468, among other
components (e.g., a scanner, an optical reader, a fluorescent
signal detector). The device 1450 may also be provided with a
storage device, such as a microdrive or other device, to provide
additional storage. The components 1450, 1452, 1464, 1454, 1466,
and 1468 are interconnected using various buses, and several
of the components may be mounted on a common motherboard or in
other manners as appropriate.
[0040] The processor 1452 can execute instructions within the
computing device 1450, including instructions stored in the memory
1464. The processor may be implemented as a chipset of chips that
include separate and multiple analog and digital processors. The
processor may provide, for example, for coordination of the other
components of the device 1450, such as control of user interfaces,
applications run by device 1450, and wireless communication by
device 1450.
[0041] Processor 1452 may communicate with a user through control
interface 1458 and display interface 1456 coupled to a display
1454. The display 1454 may be, for example, a TFT LCD
(Thin-Film-Transistor Liquid Crystal Display) or an OLED (Organic
Light Emitting Diode) display, or other appropriate display
technology. The display interface 1456 may comprise appropriate
circuitry for driving the display 1454 to present graphical and
other information to a user. The control interface 1458 may receive
commands from a user and convert them for submission to the
processor 1452. In addition, an external interface 1462 may be
provided in communication with processor 1452, so as to enable near
area communication of device 1450 with other devices. External
interface 1462 may provide, for example, for wired communication in
some implementations, or for wireless communication in other
implementations, and multiple interfaces may also be used.
[0042] The memory 1464 stores information within the computing
device 1450. The memory 1464 can be implemented as one or more of a
computer-readable medium or media, a volatile memory unit or units,
or a non-volatile memory unit or units. Expansion memory 1474 may
also be provided and connected to device 1450 through expansion
interface 1472, which may include, for example, a SIMM (Single In
Line Memory Module) card interface. Such expansion memory 1474 may
provide extra storage space for device 1450, or may also store
applications or other information for device 1450. For example,
expansion memory 1474 may include instructions to carry out or
supplement the processes described herein, and may include secure
information also. Thus, for example, expansion memory 1474 may be
provided as a security module for device 1450, and may be programmed
with instructions that permit secure use of device 1450. In
addition, secure applications may be provided via the SIMM cards,
along with additional information, such as placing identifying
information on the SIMM card in a non-hackable manner.
[0043] The memory may include, for example, flash memory and/or
NVRAM memory, as discussed below. In one implementation, a computer
program product is tangibly embodied in an information carrier. The
computer program product contains instructions that, when executed,
perform one or more methods, such as those described herein. The
information carrier is a computer- or machine-readable medium, such
as the memory 1464, expansion memory 1474, memory on processor
1452, or a propagated signal that may be received, for example,
over transceiver 1468 or external interface 1462.
[0044] Device 1450 may communicate wirelessly through communication
interface 1466, which may include digital signal processing
circuitry where necessary. Communication interface 1466 may provide
for communications under various modes or protocols, such as GSM
voice calls, SMS, EMS, or MMS messaging, CDMA, TDMA, PDC, WCDMA,
CDMA2000, or GPRS, among others. Such communication may occur, for
example, through radio-frequency transceiver 1468. In addition,
short-range communication may occur, such as using a Bluetooth,
WiFi, or other such transceiver (not shown). In addition, GPS
(Global Positioning System) receiver module 1470 may provide
additional navigation- and location-related wireless data to device
1450, which may be used as appropriate by applications running on
device 1450.
[0045] Device 1450 may also communicate audibly using audio codec
1460, which may receive spoken information from a user and convert
it to usable digital information. Audio codec 1460 may likewise
generate audible sound for a user, such as through a speaker, e.g.,
in a handset of device 1450. Such sound may include sound from
voice telephone calls, may include recorded sound (e.g., voice
messages, music files, etc.) and may also include sound generated
by applications operating on device 1450.
[0046] The computing device 1450 may be implemented in a number of
different forms, as shown in the figure. For example, it may be
implemented as a cellular telephone 1480. It may also be
implemented as part of a smartphone 1482, personal digital
assistant, or other similar mobile device.
[0047] Various implementations of the systems and techniques
described herein can be realized in digital electronic circuitry,
integrated circuitry, specially designed ASICs (application
specific integrated circuits), computer hardware, firmware,
software, and/or combinations thereof. These various
implementations can include implementation in one or more computer
programs that are executable and/or interpretable on a programmable
system including at least one programmable processor, which may be
special or general purpose, coupled to receive data and
instructions from, and to transmit data and instructions to, a
storage system, at least one input device, and at least one output
device.
[0048] These computer programs (also known as programs, software,
software applications or code) include machine instructions for a
programmable processor, and can be implemented in a high-level
procedural and/or object-oriented programming language, and/or in
assembly/machine language. As used herein, the terms
"machine-readable medium" and "computer-readable medium" refer to
any computer program product, apparatus and/or device (e.g.,
magnetic discs, optical disks, memory, and Programmable Logic
Devices (PLDs)) used to provide machine instructions and/or data to
a programmable processor, including a machine-readable medium that
receives machine instructions as a machine-readable signal. The
term "machine-readable signal" refers to any signal used to provide
machine instructions and/or data to a programmable processor.
[0049] To provide for interaction with a user, the systems and
techniques described herein can be implemented on a computer having
a display device (e.g., a CRT (cathode ray tube) or LCD (liquid
crystal display) monitor) for displaying information to the user
and a keyboard and a pointing device (e.g., a mouse or a trackball)
by which the user can provide input to the computer. Other kinds of
devices can be used to provide for interaction with a user as well;
for example, feedback provided to the user can be any form of
sensory feedback (e.g., visual feedback, auditory feedback, or
tactile feedback); and input from the user can be received in any
form, including acoustic, speech, or tactile input.
[0050] The systems and techniques described herein can be
implemented in a computing system that includes a back end
component (e.g., as a data server), or that includes a middleware
component (e.g., an application server), or that includes a front
end component (e.g., a client computer having a graphical user
interface or a Web browser through which a user can interact with
an implementation of the systems and techniques described herein),
or any combination of such back end, middleware, or front end
components. The components of the system can be interconnected by
any form or medium of digital data communication (e.g., a
communication network). Examples of communication networks include
a local area network ("LAN"), a wide area network ("WAN"), and the
Internet.
[0051] The computing system can include clients and servers. A
client and server are generally remote from each other and
typically interact through a communication network. The
relationship of client and server arises by virtue of computer
programs running on the respective computers and having a
client-server relationship to each other.
[0052] In some cases, a system provided herein can be configured to
include one or more sample analyzers. A sample analyzer can be
configured to produce a plurality of signals about genomic DNA of a
cancer cell. For example, a sample analyzer can produce signals
that are capable of being interpreted in a manner that identifies
the allelic imbalance status of loci along a chromosome. In some
cases, a sample analyzer can be configured to carry out one or more
steps of a sequencing-based assay and can be configured to produce
and/or capture signals from such assays. In some cases, a computing
system provided herein can be configured to include a computing
device. In such cases, the computing device can be configured to
receive signals from a sample analyzer.
[0053] The computing device can include computer-executable
instructions or a computer program (e.g., software) containing
computer-executable instructions for carrying out one or more of
the methods or steps described herein. In some cases, such
computer-executable instructions can instruct a computing device to
analyze signals from a sample analyzer, from another computing
device, or from a sequencing-based assay. The analysis of such
signals can be carried out to determine genotypes, allelic
imbalance at certain loci, regions of allelic imbalance, the number
of allelic imbalance regions, to determine the size of allelic
imbalance regions, to determine the number of allelic imbalance
regions having a particular size or range of sizes, or to determine
a combination of these items.
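By way of a non-limiting illustration, the analysis outlined in the
preceding paragraph can be sketched in Python as follows. The sketch
assumes that each locus is summarized by read counts for two alleles,
a representation this document does not specify. The names Locus,
call_status, and imbalance_regions, the threshold values, and the
rule for merging adjacent loci into regions are all hypothetical and
are not taken from the specification.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Locus:
    chrom: str
    pos: int
    a_reads: int  # reads supporting allele A (assumed input format)
    b_reads: int  # reads supporting allele B


def call_status(locus: Locus, tolerance: float = 0.15,
                homozygous_cutoff: float = 0.05,
                min_depth: int = 20) -> Optional[bool]:
    """Return True (imbalanced), False (balanced), or None (uninformative).

    All thresholds are illustrative assumptions.
    """
    depth = locus.a_reads + locus.b_reads
    if depth < min_depth:
        return None  # too few reads to judge
    fraction = locus.b_reads / depth
    if fraction <= homozygous_cutoff or fraction >= 1 - homozygous_cutoff:
        return None  # looks homozygous; treated here as uninformative
    return abs(fraction - 0.5) > tolerance


def imbalance_regions(loci: List[Locus],
                      min_loci: int = 2) -> List[Tuple[str, int, int]]:
    """Merge runs of consecutive imbalanced loci on one chromosome."""
    regions: List[Tuple[str, int, int]] = []
    current: List[Locus] = []

    def close() -> None:
        if len(current) >= min_loci:
            regions.append((current[0].chrom, current[0].pos, current[-1].pos))

    for locus in sorted(loci, key=lambda l: (l.chrom, l.pos)):
        status = call_status(locus)
        if status is None:
            continue  # uninformative loci neither extend nor break a run
        if status and (not current or current[-1].chrom == locus.chrom):
            current.append(locus)
        else:
            close()
            current = [locus] if status else []
    close()
    return regions


def region_sizes(regions: List[Tuple[str, int, int]]) -> List[int]:
    """Size of each region as the distance between its outermost loci."""
    return [end - start for _, start, end in regions]
```

The number of regions is then len(regions), and the number of regions
in a given size range can be obtained by filtering the output of
region_sizes.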
[0054] In some cases, a system provided herein can include
computer-executable instructions or a computer program (e.g.,
software) containing computer-executable instructions for
formatting an output providing an indication about copy number,
allelic imbalance, LOH, or a combination of these items.
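One possible form of such output formatting is sketched below in
Python. The tab-delimited layout, the column names, and the function
name format_report are assumptions made for illustration only; this
document does not prescribe an output format.

```python
import csv
import io
from typing import Dict, Iterable


def format_report(rows: Iterable[Dict]) -> str:
    """Render per-locus results as tab-delimited text (hypothetical layout)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["chrom", "pos", "copy_number", "allelic_imbalance", "loh"])
    for row in rows:
        writer.writerow([
            row["chrom"],
            row["pos"],
            row["copy_number"],
            "yes" if row["allelic_imbalance"] else "no",
            "yes" if row["loh"] else "no",
        ])
    return buffer.getvalue()
```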
[0055] In some cases, a system provided herein can include a
pre-processing device configured to process a sample (e.g., cancer
cells) such that a sequencing-based assay can be performed.
Examples of pre-processing devices include, without limitation,
devices configured to enrich cell populations for cancer cells as
opposed to non-cancer cells, devices configured to lyse cells
and/or extract genomic nucleic acid, and devices configured to
enrich a sample for particular genomic DNA fragments.
[0056] Additional embodiments of the invention are as follows:
Embodiment 1
[0057] An in vitro method of detecting allelic imbalance status in
a plurality of genomic loci in a sample from a patient, comprising:
[0058] enriching a genomic DNA sample for DNA molecules each
comprising a locus of interest;
[0059] sequencing said DNA molecules to determine the genotype at
each such locus;
[0060] determining for each locus whether it has allelic imbalance.
Embodiment 2
[0061] An in vitro method of detecting LOH status in a plurality of
genomic loci in a sample from a patient, comprising:
[0062] enriching a genomic DNA sample for DNA molecules each
comprising a locus of interest;
[0063] sequencing said DNA molecules to determine the genotype at
each such locus;
[0064] determining for each homozygous locus whether it is
homozygous due to LOH.
Embodiment 3
[0065] A system for determining allelic imbalance status in a
plurality of genomic loci in a sample comprising:
[0066] a sample analyzer for (1) enriching a genomic DNA sample for
DNA molecules each comprising a locus of interest and (2) sequencing
said DNA molecules to produce a plurality of quantitative signals
about each such locus;
[0067] a computer program for analyzing said plurality of
quantitative signals to determine the genotype of each such locus in
the sample; and
[0068] a computer program for determining for each locus whether it
has allelic imbalance.
Embodiment 4
[0069] A system for determining LOH status in a plurality of
genomic loci in a sample comprising:
[0070] a sample analyzer for (1) enriching a genomic DNA sample for
DNA molecules each comprising a locus of interest and (2) sequencing
said DNA molecules to produce a plurality of quantitative signals
about each such locus;
[0071] a computer program means for analyzing said plurality of
quantitative signals to determine the genotype of each such locus in
the sample; and
[0072] a computer means for determining for each homozygous locus
whether it is homozygous due to LOH.
Embodiment 5
[0073] The method of either Embodiment 1 or Embodiment 2 or the
system of either Embodiment 3 or Embodiment 4, wherein said
plurality of genomic loci comprises at least 10, 50, 100, 1,000,
10,000, 50,000, 55,000, 75,000, 100,000, 150,000, 200,000, 300,000,
400,000, 500,000, 750,000, 1,000,000, or 2,000,000 or more
loci.
Embodiment 6
[0074] The method or system of Embodiment 5, wherein said genomic
loci are evenly spaced along the genome.
Embodiment 7
[0075] The method or system of Embodiment 6, wherein the genomic
spacing of said plurality of genomic loci is less than or equal to
50%, 40%, 30%, 20%, 15%, 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, or
1%.
Embodiment 8
[0076] The method of either Embodiment 1 or Embodiment 2 or the
system of either Embodiment 3 or Embodiment 4, wherein said sample
is a formalin-fixed, paraffin-embedded tissue sample.
Embodiment 9
[0077] The method or system of Embodiment 8, wherein said sample is
a tumor sample extracted from the patient.
Examples
[0078] The process described here utilized an Agilent SureSelect
Capture system followed by Illumina HiSeq sequencing; however, any
in-solution or solid-support-based capture method and
high-throughput parallel sequencing platform could be used.
[0079] The initial design selection process utilized the
approximately 2.5 million SNPs on the Illumina Omni2.5 SNP array.
This list of SNPs was chosen because it is currently the largest
list of SNPs for which genotyping information is available for
multiple different population groups. All 2,448,785 SNP locations
were input into the Agilent eArray SureSelect Target Enrichment
wizard for Single End Long Reads using the default settings. Of
these, 1,353,042 passed the selection criteria and had baits
designed.
[0080] Then, 110,000 SNPs with high minor allele frequencies and
evenly covering the genome were selected. In the selection, SNPs in
strong linkage disequilibrium and SNPs with strong deviation from
Hardy-Weinberg equilibrium were discarded.
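A minimal Python sketch of a selection of this kind is given below.
It assumes that genotype counts from a reference population are
available for each SNP, and it uses a chi-square test for deviation
from Hardy-Weinberg equilibrium and fixed-width genomic windows to
approximate even coverage. The names Snp and select_snps, the
thresholds, and the window size are hypothetical; the actual criteria
used are not stated in this document. Linkage disequilibrium pruning
is omitted from the sketch because it requires pairwise genotype
data.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Snp:
    chrom: str
    pos: int
    n_aa: int  # genotype counts in a reference population (assumed input)
    n_ab: int
    n_bb: int


def minor_allele_frequency(snp: Snp) -> float:
    n = snp.n_aa + snp.n_ab + snp.n_bb  # assumed to be greater than zero
    p = (2 * snp.n_aa + snp.n_ab) / (2 * n)
    return min(p, 1 - p)


def hwe_chi_square(snp: Snp) -> float:
    """Chi-square statistic comparing observed and HWE-expected genotypes."""
    n = snp.n_aa + snp.n_ab + snp.n_bb
    p = (2 * snp.n_aa + snp.n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (snp.n_aa, snp.n_ab, snp.n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def select_snps(snps: List[Snp], min_maf: float = 0.3,
                max_chi2: float = 3.84, window: int = 25_000) -> List[Snp]:
    """Keep the highest-MAF passing SNP in each fixed-width window.

    3.84 approximates the 5% critical value of chi-square with one
    degree of freedom; every threshold here is illustrative.
    """
    best: Dict[Tuple[str, int], Snp] = {}
    for snp in snps:
        maf = minor_allele_frequency(snp)
        if maf < min_maf or hwe_chi_square(snp) > max_chi2:
            continue
        key = (snp.chrom, snp.pos // window)
        if key not in best or maf > minor_allele_frequency(best[key]):
            best[key] = snp
    return sorted(best.values(), key=lambda s: (s.chrom, s.pos))
```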
[0081] Two preliminary library designs were constructed, each
comprising 55,000 probes targeting 55,000 different SNP locations.
Testing was carried out using a high-quality normal DNA sample to
check for even capture of both alleles of every SNP. In addition, 4
FFPE samples were captured and used to select the best-performing
probes. We looked for probes that showed robust capture and even
sequence depth without over- or underrepresentation of sequence
reads in the final sequencing library.
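The probe evaluation described above might be sketched in Python as
follows. The sketch assumes that, for each probe, reference-allele
and alternate-allele read counts are available at a SNP that is
heterozygous in the normal sample; probes at homozygous SNPs cannot
be assessed for allele balance in this way. The function name
select_probes and the cutoffs max_fold and max_allele_skew are
hypothetical and are not taken from this document.

```python
from statistics import median
from typing import Dict, List, Tuple


def select_probes(probes: Dict[str, Tuple[int, int]],
                  max_fold: float = 2.0,
                  max_allele_skew: float = 0.1) -> List[str]:
    """Return IDs of probes with even depth and balanced allele capture.

    probes maps a probe ID to (ref_reads, alt_reads) at a heterozygous SNP.
    """
    if not probes:
        return []
    depths = {pid: ref + alt for pid, (ref, alt) in probes.items()}
    typical = median(depths.values())
    if typical == 0:
        return []
    kept = []
    for pid, (ref, _alt) in probes.items():
        depth = depths[pid]
        if depth == 0:
            continue  # no capture at all
        fold = depth / typical
        if fold > max_fold or fold < 1 / max_fold:
            continue  # over- or underrepresented relative to the median
        if abs(ref / depth - 0.5) > max_allele_skew:
            continue  # one allele captured preferentially
        kept.append(pid)
    return kept
```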
[0082] The final capture probe library design comprised the 55,000
optimal probes identified using the preliminary capture
designs.
[0083] The results of measuring copy number and LOH using the above
sequencing technique are shown in FIG. 2 (with FIG. 1 showing
microarray analysis on fresh frozen tissue as a comparison).
[0084] All publications and patent applications mentioned in the
specification are indicative of the level of those skilled in the
art to which this invention pertains. All publications and patent
applications are herein incorporated by reference to the same
extent as if each individual publication or patent application was
specifically and individually indicated to be incorporated by
reference. The mere mentioning of the publications and patent
applications does not necessarily constitute an admission that they
are prior art to the instant application.
[0085] Although the foregoing invention has been described in some
detail by way of illustration and example for purposes of clarity
of understanding, it will be obvious that certain changes and
modifications may be practiced within the scope of the appended
claims.
* * * * *