U.S. patent application number 12/619726 was filed with the patent office on 2009-11-17 and published on 2010-06-03 for integrated analyses of breast and colorectal cancers.
This patent application is currently assigned to The Johns Hopkins University. Invention is credited to Kenneth W. Kinzler, Rebecca J. Leary, Victor E. Velculescu, Bert Vogelstein.
Application Number | 20100136560 12/619726 |
Family ID | 42223167 |
Filed Date | 2009-11-17 |
United States Patent
Application |
20100136560 |
Kind Code |
A1 |
Vogelstein; Bert ; et
al. |
June 3, 2010 |
Integrated Analyses of Breast and Colorectal Cancers
Abstract
Genome-wide analysis of copy number changes in breast and
colorectal tumors used approaches that can reliably detect
homozygous deletions and amplifications. The number of genes
altered by major copy number changes--deletion of all copies or
amplification of at least twelve copies per cell--averaged seventeen
per tumor. These data were integrated with previous mutation
analyses of the Reference Sequence genes in these same tumor types
to identify genes and cellular pathways affected by both copy
number changes and point alterations. Pathways enriched for genetic
alterations include those controlling cell adhesion, intracellular
signaling, DNA topological change, and cell cycle control. These
analyses provide an integrated view of copy number and sequencing
alterations on a genome-wide scale and identify genes and pathways
that are useful for cancer diagnosis and therapy.
Inventors: |
Vogelstein; Bert;
(Baltimore, MD) ; Kinzler; Kenneth W.; (Baltimore,
MD) ; Leary; Rebecca J.; (Baltimore, MD) ;
Velculescu; Victor E.; (Dayton, MD) |
Correspondence
Address: |
BANNER & WITCOFF, LTD.
1100 13th STREET, N.W., SUITE 1200
WASHINGTON
DC
20005-4051
US
|
Assignee: |
The Johns Hopkins
University
Baltimore
MD
|
Family ID: |
42223167 |
Appl. No.: |
12/619726 |
Filed: |
November 17, 2009 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
61119103 | Dec 2, 2008 | |
Current U.S.
Class: |
435/6.12 |
Current CPC
Class: |
C12Q 2600/156 20130101;
A61P 35/00 20180101; C12Q 1/6886 20130101 |
Class at
Publication: |
435/6 |
International
Class: |
C12Q 1/68 20060101
C12Q001/68 |
Government Interests
[0001] The disclosed invention was made using funds from the U.S.
government, particularly National Institutes of Health grants CA
043460, CA 057345, CA 062924, and CA 121113. The U.S. government
therefore retains certain rights in the invention.
Claims
1. A method of characterizing a breast or colon tumor in a human,
comprising the steps of: determining a mutated pathway selected
from those listed in Table 3 or SI Table 6 in a breast or colon
tumor sample by determining at least one somatic mutation in a gene
in the pathway in a test sample relative to a normal sample of the
human; assigning the breast or colon tumor to a first group of
breast or colon tumors that have a somatic mutation in a gene in
said pathway.
2. The method of claim 1 wherein the first group comprises tumors
with mutations in a plurality of genes in said pathway.
3. The method of claim 1 wherein the first group consists of tumors
with mutation in a gene in said pathway.
4. The method of claim 1 wherein a mutation is determined by DNA
sequencing.
5. The method of claim 1 wherein a mutation is determined by DNA
sequencing in which a polymerase or ligase enzyme is used to join
together nucleotides and oligonucleotides complementary to the
gene.
6. The method of claim 1 wherein a mutation is determined by DNA
sequencing in which a single stranded DNA molecule obtained from
the gene is hybridized to a single stranded DNA reagent.
7. The method of claim 1 wherein a mutation is determined by DNA
sequencing in which DNA molecules are separated by length or
mass.
8. The method of claim 1 wherein a mutation is determined by DNA
sequencing employing a dideoxynucleotide inhibitor.
9. The method of claim 1 wherein a mutation is determined by DNA
sequencing in an automated sequencing machine.
10. The method of claim 1 further comprising the steps of:
administering a drug or drug candidate to the first group.
11. The method of claim 10 further comprising the steps of:
administering a drug or drug candidate to a second group of breast
or colon tumors that does not have a mutation in said pathway.
12. The method of claim 11 further comprising the steps of:
comparing efficacy of the drug or drug candidate on the first group
to efficacy on the second group; identifying a pathway which
correlates with increased or decreased efficacy of the drug or
candidate drug in the first group relative to the second group.
13. The method of claim 1 further comprising the steps of:
comparing efficacy of a candidate or known anti-cancer therapeutic
on the first group to efficacy on a second group of breast or colon
tumors that does not have a mutation in said pathway; identifying a
pathway which correlates with increased or decreased efficacy of
the candidate or known anti-cancer therapeutic in the first group
relative to other groups.
14. The method of claim 1 wherein the mutation is selected from the
group consisting of a point mutation, a homozygous deletion, and a
genomic amplification.
15. A method of detecting or diagnosing a breast or colon tumor or
minimal residual disease of a breast or colon tumor or molecular
relapse of a breast or colon tumor in a human, comprising the steps
of: determining in a test sample of a tumor or suspected tumor of
the human, a genomic amplification of at least one genomic region,
said genomic region selected from the group consisting of those
listed in SI Table 4 or Table 1; identifying the human as likely to
have a breast or colon tumor, minimal residual disease, or
molecular relapse of breast or colon tumor when the amplification
is determined.
16. The method of claim 15 wherein genomic amplification is
determined by generating fragments of genomic DNA from the test
sample, ligating the fragments into a concatenate, and sequencing
the concatenate.
17. The method of claim 15 wherein genomic amplification is
determined by hybridizing genomic DNA from the test sample to an
array of oligonucleotides.
18. The method of claim 15 wherein the genomic amplification is at
least 6-fold increased relative to a normal sample of the
human.
19. A method of detecting or diagnosing a breast or colon tumor or
minimal residual disease of a breast or colon tumor or molecular
relapse of a breast or colon tumor in a human, comprising the steps
of: determining in a test sample of a tumor or suspected tumor of
the human, a genomic deletion of at least one genomic region, said
genomic region selected from the group consisting of those listed
in SI Table 5 or Table 2; identifying the human as likely to have a
breast or colon tumor, minimal residual disease, or molecular
relapse of breast or colon tumor when the homozygous deletion is
determined.
20. The method of claim 19 wherein genomic deletion is determined
by generating fragments of genomic DNA from the test sample,
ligating the fragments into a concatenate, and sequencing the
concatenate.
21. The method of claim 19 wherein genomic deletion is determined
by hybridizing genomic DNA from the test sample to an array of
oligonucleotides.
22. The method of claim 19 wherein the genomic deletion is
homozygous.
Description
TECHNICAL FIELD OF THE INVENTION
[0002] This invention is related to the area of classifying,
characterizing, detecting and diagnosing cancers. In particular, it
relates to breast and colorectal cancers.
BACKGROUND OF THE INVENTION
[0003] It is well accepted that cancer is the result of the
sequential mutations of oncogenes and tumor suppressor genes (1).
Historically, the discovery of these genes has been accomplished
through analyses of individual candidate genes chosen on the basis
of functional or biologic data implicating them in the tumorigenic
process. Recent advances in genomic technologies and bioinformatics
have permitted simultaneous evaluation of many genes, thereby
offering more comprehensive and unbiased information (2, 3). For
example, the sequence of large families of genes, and even the
human genes in the Reference Sequence (RefSeq) database, have been
determined in subsets of human cancers (4, 5). However, the
alterations detected by sequencing represent only one category of
genetic change that occurs in human cancer. Other alterations
include gains (amplifications) and losses (deletions) of discrete
chromosomal sequences that occur during tumor progression. Dramatic
amplifications of oncogenes such as ERBB2 (6) or MYC (7) and
deletions of tumor suppressor genes such as CDKN2A (8), PTEN (9,
10) and SMAD4 (11) have demonstrated the importance of these
mechanisms of genetic alteration in particular tumor types. A
comprehensive picture of genetic alterations in human cancer should
therefore include the integration of sequence based alterations
together with copy number gains and losses.
[0004] Evaluations of copy number changes in cancers using a
variety of array types have been previously reported (12). Several
of the more recent studies employed oligonucleotide arrays capable
of distinguishing >100,000 genomic loci in colon, breast, lung,
pancreatic, and skin cancers as well as certain leukemias (13-20).
However, identification of focal, high copy amplifications or
homozygous deletions (HDs) has infrequently been reported because
many prior copy number analyses on arrays have used genomic DNA
purified from primary tumors. Primary tumors contain varying
proportions of non-neoplastic cells thereby reducing the apparent
extent of amplification and obscuring focal amplifications--defined
by the increased copy number of a small region of the genome--from
simple gains of whole chromosome arms. Furthermore, HDs can be
difficult to discern in primary tumors due to confounding
hybridization signals from non-neoplastic cells (17).
[0005] Many of the problems encountered with primary tumor samples
can be overcome by use of early passage cancer cell lines or
xenografts which are devoid of human non-neoplastic cells. Previous
studies have shown that the process of generating such in vitro or
in vivo cultures is not associated with the development of
additional genetic alterations (21). It is now widely recognized
that HDs found in cell lines and xenografts represent true genetic
alterations that are present in clonal fashion in primary tumors
but are difficult to document in the latter because of
contaminating non-neoplastic cells (22, 23).
[0006] There is a continuing need in the art for methods to
characterize, classify, detect and diagnose breast and colorectal
cancers.
SUMMARY OF THE INVENTION
[0007] According to one embodiment of the invention a method of
characterizing a breast or colon tumor in a human is provided. A
mutated pathway selected from those listed in Table 3 or SI Table 6
is determined in a breast or colon tumor by determining at least
one somatic mutation in a gene in the pathway in a test sample
relative to a normal sample of the human. The breast or colon tumor
is assigned to a first group of breast or colon tumors that have a
somatic mutation in at least one gene in said pathway.
[0008] According to another embodiment of the invention a method of
detecting or diagnosing a breast or colon tumor or minimal residual
disease of a breast or colon tumor or molecular relapse of a breast
or colon tumor in a human is provided. A genomic amplification of
at least one genomic region is determined in a test sample of a
tumor or suspected tumor of the human. The genomic region is
selected from the group consisting of those listed in SI Table 4 or
Table 1. The human is identified as likely to have a breast or
colon tumor, minimal residual disease, or molecular relapse of
breast or colon tumor when the amplification is determined.
[0009] According to another embodiment a method is provided of
detecting or diagnosing a breast or colon tumor or minimal residual
disease of a breast or colon tumor or molecular relapse of a breast
or colon tumor in a human. A genomic deletion of at least one
genomic region is determined in a test sample of a tumor or
suspected tumor of the human. The genomic region is selected from
the group consisting of those listed in SI Table 5 or Table 2. The
human is identified as likely to have a breast or colon tumor,
minimal residual disease, or molecular relapse of breast or colon
tumor when the homozygous deletion is determined.
[0010] These and other embodiments which will be apparent to those
of skill in the art upon reading the specification provide the art
with methods for detecting, classifying, characterizing and
diagnosing breast and colorectal tumors.
BRIEF DESCRIPTION OF THE DRAWINGS
[0011] FIG. 1 shows alterations in the combined FGF, EGFR, ERBB2
and PI3K pathways. Genes affected by copy number alterations are
circled in red, while those altered by point mutations are circled
in blue. The number of breast (B) and colorectal (C) tumors
containing alterations are indicated in boxes adjacent to each
gene.
[0012] FIGS. 2A-2B show the genomic landscape of copy number and
nucleotide alterations in two typical cancer samples. FIG. 2A
indicates breast cancer alterations while FIG. 2B indicates
colorectal cancer alterations. The telomere of the short arm of
chromosome 1 is represented in the rear left corner of the green
plane and ascending chromosomal positions continue in the direction
of the arrow. Chromosomal positions that follow the front edge of
the plane are continued at the back edge of the plane of the
adjacent row and chromosomes are appended end to end. Peaks
indicate the 60 highest-ranking candidate cancer genes for each
tumor type, with peak heights reflecting the passenger probability
scores. The yellow and blue peaks correspond to genes that are
altered by copy number changes, while those altered only by point
mutations are purple. The dots represent genes that were altered by
copy number changes (red squares) or point mutations (white
circles) in the B9C breast or Mx27 colorectal tumor samples.
Altered genes participating in significant gene groups or pathways
(FIG. 13; SI Table 6) are indicated as black circles or
squares.
[0013] FIG. 3 (SI FIG. 1.) shows a schematic of the experimental
approach for integration of copy number and sequence alterations in
breast and colorectal cancers.
[0014] FIG. 4 (SI FIG. 2.) shows detection of amplifications and
homozygous deletions using Illumina arrays and Digital Karyotyping.
Digital Karyotyping results are shown in the top graphs, with the
chromosomal coordinates indicated on the horizontal axis and the
Digital Karyotyping tag density ratio indicated on the vertical
axis. Illumina array results are shown in the bottom graphs, with
the chromosomal coordinates indicated on the horizontal axis and
the Log R Ratio indicated on the vertical axis. Digital Karyotyping
data were used to validate the Illumina arrays and to develop
approaches for sensitive and specific detection of focal
amplifications and homozygous deletions.
[0015] FIG. 5 (Table 1) Top candidate cancer genes in breast and
colorectal cancer amplifications.
[0016] FIG. 6 (Table 2) Top candidate cancer genes in breast and
colorectal cancer homozygous deletions.
[0017] FIG. 7 (Table 3) Candidate cancer pathways altered in breast
and colorectal cancers.
[0018] FIG. 8 (SI Table 1) Comparison between Illumina™ array
and Digital Karyotyping copy number analyses.
[0019] FIG. 9 (SI Table 2) Copy number changes detected by Digital
Karyotyping in colorectal cancer.
[0020] FIG. 10 (SI Table 3) Amplifications and homozygous deletions
detected by Illumina™ arrays in breast and colorectal
cancers.
[0021] FIG. 11 (SI Table 4) Amplified genes in breast and
colorectal cancers.
[0022] FIG. 12 (SI Table 5) Homozygously deleted genes in breast
and colorectal cancers.
[0023] FIG. 13 (SI Table 6) Pathways enriched for copy number
alterations and point mutations.
[0024] FIG. 14 (SI Table 7) Breast and colorectal cancer samples
used in these analyses.
DETAILED DESCRIPTION OF THE INVENTION
[0025] The inventors have developed means of diagnosing,
classifying, characterizing, and detecting breast and colorectal
tumors based on somatic mutations in genes in pathways, including
point mutations, genomic amplifications, and genomic deletions.
[0026] Xenografts or cell lines derived from breast and colorectal
cancers were examined to obtain high resolution analyses of copy
number and nucleotide alterations. Tumors were evaluated with
microarrays containing at least 317,000 SNP probes and selected
samples were also evaluated with Digital Karyotyping (24). This
latter method provides a highly quantitative measure of gene copy
number and was used to validate the sensitivity and specificity of
the microarray data. The sequences of the 18,191 genes from the
RefSeq database previously determined for breast and colorectal
cancers were integrated with these results, providing a genome-wide
analysis of sequence and copy number alterations.
[0027] The integrated mutational analysis described here provides a
global picture of the genetic alterations of breast and colorectal
cancers. The combination of sequencing and copy number analysis at
the whole genome level permits the identification of genes and
pathways that may not be easily detected by either analysis alone.
The analysis of point mutations can provide independent information
that can help identify candidate target genes in regions of
amplification or HD. As gene groups and pathways can be affected by
sequence and copy number changes, a combined analysis can highlight
the groups that are enriched for these somatic alterations.
[0028] The analysis of copy number changes can also provide general
insights into the functional effects of point mutations. Single
nucleotide substitutions in genes that are observed to be deleted
are more likely to be inactivating, while substitutions in genes
that are amplified are more likely to be activating. This was
confirmed by the observation of HDs and point mutations in TP53,
SMAD2, SMAD3, and PTEN, all of which are thought to be tumor
suppressors. If copy number changes faithfully reflect the overall
effect of target genes, one would expect to infrequently see both
amplifications and HDs of the same set of genes in human tumors.
Accordingly, we observed an under-representation of genes that are
homozygously deleted in one tumor and amplified in another (only
two of the 1148 altered genes identified were altered by both
amplification and HD (p<0.01, binomial test) and neither was
considered good candidates by the integrated statistical
analyses).
[0029] In addition to identifying genes through the integrated
analysis of point mutations and copy number changes, a number of
issues arise from these studies that have implications for future
large scale genomic analyses (36). One is that the complexity of
genetic alterations in human cancer increases when considering both
point alterations and copy number changes. In addition to a median
of 84 and 76 genes altered by point mutation, breast and colorectal
cancers have a median of 24 and 9 genes altered by a major copy
number change. These observations support a view of the breast and
colorectal cancer genomic landscape where a few commonly affected
"gene mountains" are scattered among a much larger number of "gene
hills" that are infrequently altered by either point mutation or
copy number changes. An example of a cancer genome landscape that
incorporates copy number changes, illustrated in FIG. 2, shows new
gene mountains and hills that result from the combined
analysis.
[0030] Though cancer genome landscapes are complex, they may be
better understood by placing all genetic alterations within defined
cellular pathways. Our analyses identified several converging gene
pathways, including the ERBB2, EGFR and PI3K pathways, that were
affected by copy number changes and point alterations in both
breast and colorectal cancers. In addition, many pathways
implicated in colorectal tumor progression (Notch, AKT, and MAPK)
were enriched for alterations. Interestingly, many gene groups
contained genes that were both amplified and others that were
deleted, suggesting that different genes within the same group or
pathway may be affected through alternate mechanisms. This is
consistent with the observation that most signaling pathways
contain both positive and negative regulators and alterations in
any of these can lead to dysregulated signaling.
[0031] The copy number and sequence alterations reported here
should be placed in the context of other analyses to reveal the
full compendium of molecular changes in a tumor cell. One
limitation of our approach is that the copy number analyses we
performed may have missed very small regions (<20 kb) that were
amplified or deleted. Use of arrays with higher numbers of SNPs or
larger DK libraries generated using next generation sequencing
approaches will help improve the sensitivity of these analyses.
Additionally, the incorporation of approaches that detect
structural changes (e.g. translocations) and epigenetic alterations
will likely prove to be useful. Finally, as has been done with
karyotypic abnormalities (37), it will be important to determine
the timing of these alterations within each tumor type by analysis
of additional tumor samples from different stages. In this regard,
it should be noted that other methods of tumor isolation may not
result in tumor DNA purity that will allow the sensitive and
quantitative detection of copy number alterations afforded by our
studies (38, 39).
[0032] The development of approaches to identify genetic
alterations on a genome-wide scale has made the discovery of
mutations the "easy" part of cancer gene discovery efforts.
Functional studies to identify the culprits underlying the 1077
copy number changes discovered from our study would currently be
impractical. The statistical techniques we developed highlight the
best candidates for future functional studies, but it remains
possible that specific loci are more likely to be altered by copy
number changes than others because they are located near fragile
sites or other hotspots for recombination (40). Therefore, these
genetic analyses can only identify candidate genes that may play a
role in cancer and do not definitively implicate any gene in the
neoplastic process.
[0033] Several of the pathways identified affected a relatively
high fraction of cancers and may be useful for cancer diagnosis or
therapy. Alterations in signaling pathways of FGFR, EGFR, ERBB2 and
PI3K were detected in nearly two thirds of breast and colorectal
tumors that were comprehensively examined in this study. These data
suggest that the ERBB2 inhibitors may be useful not only in breast
cancer but also in selected colorectal cancer patients in
combination with existing therapeutic agents. Additionally, a
significant fraction of the breast tumors analyzed had genetic
alterations in a process regulating DNA topology. Although TOP2A is
co-amplified with ERBB2 and therefore does not represent the likely
driver of this amplicon, alterations of TOP2A may still be of
clinical utility. As higher doses of anthracyclines may improve
clinical outcomes in breast cancer patients with TOP2A
amplifications (41, 42), our observations suggest that the
additional alterations that we identified could be used to select
patients that may respond to topoisomerase-targeted therapies. In a
similar fashion, tumor cells deficient in certain cellular
processes as a result of HDs could be targeted pharmacologically
through synthetic lethality. In a general sense, our discovery that
a typical colorectal or breast cancer has 4 to 7 genes homozygously
deleted suggests that further development of strategies targeting
such HDs (43) could be widely applicable.
[0034] Mutations, including homozygous deletions, genomic
amplifications, and point mutations can be determined by any means
known in the art, including but not limited to the methods
described below. Sequencing, digital karyotyping, and hybridization
to SNP arrays are non-limiting examples of techniques which can be
used. DNA sequencing can be performed using any techniques which
are known in the art, for example, based on chemical degradation,
enzymatic synthesis, ligation, hybridization, etc. Enzymes which
can be used include but are not limited to polymerases and ligases.
Synthesized or degraded nucleic acids can be analyzed using
techniques which separate molecules based on length or mass, for
example. Sequence determinations can be performed manually or in an
automated fashion. Some techniques which can be exploited utilize
radiolabeled or fluorescently labeled nucleotides. Single stranded
oligonucleotides can be employed as probes or primers, both of
which may hybridize to the analyte. Some methods utilize
dideoxynucleotides which act as monomers and terminators of DNA
synthesis.
[0035] Mutation, deletion, or amplification determination involves
one or more ex vivo samples which are processed in order to analyze
the genetic material (or sometimes the proteins encoded by the
genetic material). Typically this involves purification or
enrichment of nucleic acids and removal or de-enrichment of other
cellular components, such as proteins, lipids, and carbohydrates. The
nucleic acids are further reacted chemically or enzymatically to
yield readily detectable products which correspond to the nucleic
acids in the ex vivo samples. Determination of a somatic mutation
is done by comparing a tumor sample or characteristic to a normal
sample of the same individual. Differences can be observed and
recorded by a human or a machine or a computer.
[0036] Changes in copy number of a genomic segment can be
determined by any means known in the art. In one technique,
fragments (enzymatically generated or random) are generated and
ligated together to form a chain or concatenate. The concatenates
can be sequenced, and underrepresented or overrepresented fragments
of the genome can be noted. Alternatively genomic DNA fragments can
be hybridized to an array of oligonucleotides and their relative
prevalence scored. Such techniques may detect deletions or
amplifications. Changes in copy number may be from diploid to
homozygous deletion, or amplifications ranging from diploid to at
least 5-, at least 6-, at least 7-, at least 8-, at least 9-, at
least 10-, at least 15-, at least 20-, at least 25-fold of
diploid.
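The counting approach described in paragraph [0036] can be illustrated with a minimal Python sketch. It assumes, for illustration only, that sequence tags have already been assigned to genomic bins and that the expected tag count per bin for a diploid genome is known. The function names, the bin representation, and the counts are hypothetical and are not taken from the disclosure.

```python
from typing import List

DIPLOID_COPIES = 2  # normal copy number per nucleus


def tag_density_ratios(observed: List[int], expected: List[float]) -> List[float]:
    """Return the observed/expected tag count for each genomic bin."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return [obs / exp for obs, exp in zip(observed, expected)]


def estimated_copies(ratio: float) -> float:
    """Convert a density ratio into an approximate copy number per nucleus."""
    return ratio * DIPLOID_COPIES


# Hypothetical counts: the third bin is overrepresented, the fourth has no tags.
ratios = tag_density_ratios([10, 10, 60, 0], [10.0, 10.0, 10.0, 10.0])
copies = [estimated_copies(r) for r in ratios]
```

In this simplified picture, a bin whose ratio reaches zero would correspond to a candidate homozygous deletion, and a bin with a ratio of six would correspond to roughly twelve copies per nucleus.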
[0037] Tumors or patients bearing tumors can be divided into or
assigned to groups based on the presence or absence of a particular
somatic mutation. The group with the mutation may optionally
contain tumors with a particular mutation in a particular gene,
tumors with mutations in a single gene, or tumors with mutations in
a single pathway. Groups comprising tumors with mutations in a
single gene or a single pathway may include the same or different types
of mutations.
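The grouping described in paragraph [0037] can be sketched as follows. This is only an illustration under stated assumptions: the function name `assign_groups`, the tumor identifiers, and the pathway membership shown are hypothetical, and the actual pathways are those listed in Table 3 and SI Table 6.

```python
from typing import Dict, List, Set, Tuple


def assign_groups(
    tumor_mutations: Dict[str, Set[str]], pathway_genes: Set[str]
) -> Tuple[List[str], List[str]]:
    """Split tumors into those with and without a somatic mutation in the pathway."""
    first_group: List[str] = []   # at least one mutated gene in the pathway
    second_group: List[str] = []  # no mutated gene in the pathway
    for tumor_id in sorted(tumor_mutations):
        if tumor_mutations[tumor_id] & pathway_genes:
            first_group.append(tumor_id)
        else:
            second_group.append(tumor_id)
    return first_group, second_group


# Illustrative inputs only; these are not data from the disclosure.
pathway = {"ERBB2", "EGFR", "PTEN"}
tumors = {
    "tumor_1": {"ERBB2", "TP53"},
    "tumor_2": {"SMAD4"},
    "tumor_3": {"PTEN", "EGFR"},
}
first, second = assign_groups(tumors, pathway)
```

Here `first` would hold the tumors assigned to the first group and `second` the tumors lacking a mutation in the pathway.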
[0038] Groups that are divided on the basis of a mutation in a gene
or in a pathway may be used to evaluate drugs or other therapeutic
treatments. This permits the determination of groups which are
susceptible or refractory to the treatment. Thus patients who are
susceptible can be successfully treated, and patients who are
refractory can avoid expensive, potentially hazardous, and
ultimately ineffective treatments.
[0039] The mutations in genes and pathways, including point
mutations, homozygous deletions, and amplifications, can also be
used to detect or diagnose breast or colon tumors, or minimal
residual disease of such tumors, or molecular relapse of such
tumors. The mutations and genes and pathways which have been found
are characteristic of these cancers and can be used to identify
them in various stages of disease. Characteristic mutations are not
necessarily present in all or even in a majority of tumors of the
breast or colon.
[0040] Mutations found in tumors can be determined or confirmed by
comparison to normal tissue. Somatic mutations are ones that occur
in the tumor but are not found in normal tissue of the individual.
Thus a comparison between tumor and normal can be used for
identification and confirmation.
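The tumor-versus-normal comparison in paragraph [0040] amounts to a set difference, which a short Python sketch can make concrete. The variant representation (a gene label paired with a description of the change) and all names below are hypothetical placeholders chosen for illustration.

```python
from typing import Set, Tuple

Variant = Tuple[str, str]  # (gene label, description of the change)


def somatic_variants(tumor: Set[Variant], normal: Set[Variant]) -> Set[Variant]:
    """Return variants present in the tumor but absent from matched normal tissue."""
    return tumor - normal


# Placeholder variants; not data from the disclosure.
tumor_sample = {("GENE_A", "variant_1"), ("GENE_B", "variant_2")}
normal_sample = {("GENE_B", "variant_2")}
somatic = somatic_variants(tumor_sample, normal_sample)
```

The variant shared by both samples is treated as germline and dropped, leaving only the tumor-specific change.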
[0041] The above disclosure generally describes the present
invention. All references disclosed herein are expressly
incorporated by reference. A more complete understanding can be
obtained by reference to the following specific examples which are
provided herein for purposes of illustration only, and are not
intended to limit the scope of the invention.
Example 1
Optimization of Copy Number Analysis with Digital Karyotyping
[0042] Digital Karyotyping (DK) was used as a standard to develop
criteria for assessing amplifications and HDs with Illumina high
density SNP arrays. Analysis of DK libraries from 18 colorectal
tumor samples identified a total of 21 amplification events, each
containing relatively small chromosomal regions (41 kb to 2.3 Mb)
with 12 to 186 copies per nucleus (SI Table 2). We also found 4
regions within the autosomal chromosomes where the tag density
reached zero, representing HDs. As expected, we identified
low-amplitude gains and losses of entire chromosomes, chromosomal
arms, or other large genomic regions. We did not pursue these
low-amplitude copy number changes as it is difficult to reliably
identify candidate cancer genes from such large regions. To ensure
that the copy number changes identified by DK were bona fide
amplifications or HDs, we independently examined 12 alterations by
quantitative PCR and confirmed the presence of the genomic
alterations in every case examined.
[0043] We then directly compared DK data to those obtained through
genomic hybridization of the same DNA samples to Illumina high
density oligonucleotide arrays. The Illumina platform employs a two
step procedure based on oligo hybridization and single base
extension for analysis of genomic SNPs (25). The combination of
these two steps leads to greater fidelity of SNP calls and
decreases false hybridization signals. Using fluorescence intensity
measurements we developed an approach to detect amplifications
resulting in 12 or more copies per nucleus (6-fold or greater
amplification compared to the diploid genome) as well as deletions
of both copies of a gene (HDs) (see SI Methods).
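The thresholds stated in paragraph [0043] can be illustrated with a minimal Python sketch. It assumes that an absolute copy number per nucleus is already available for a region; the procedure for deriving calls from fluorescence intensities is described in the SI Methods and is not reproduced here. The function names and labels are hypothetical.

```python
DIPLOID_COPIES = 2             # copies per nucleus in a normal diploid genome
AMPLIFICATION_MIN_COPIES = 12  # threshold stated in the text


def fold_change(copies: float) -> float:
    """Return copy number relative to the diploid genome."""
    return copies / DIPLOID_COPIES


def classify_copy_number(copies: float) -> str:
    """Label a region according to the major copy number changes in the text."""
    if copies == 0:
        return "homozygous deletion"
    if copies >= AMPLIFICATION_MIN_COPIES:
        return "amplification"
    return "not a major change"
```

With these definitions, twelve copies per nucleus corresponds to a 6-fold change relative to diploid, which matches the equivalence stated in the paragraph.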
[0044] Using this new approach, 14 amplification events and 3 HD
events identified by DK in 3 representative tumor samples were
detected by Illumina arrays (SI Table 1 and SI FIG. 2). In all
cases, the genomic boundaries identified by both approaches were
similar and within the resolution expected for both methods. The
copy number of amplifications determined by DK was higher than
that of the same regions identified by Illumina arrays. It is known that
arrays underestimate the copy number of amplifications (26) and
this was confirmed by real-time PCR for amplifications in SI Table
1. No additional copy number changes of the sizes expected to be
detected by DK were identified by the microarray approach in these
samples. We did identify 25 additional small HDs; all of these were
<250 kb in length and would not have been possible to detect
with DK given the number of tags analyzed (24). To independently
validate such smaller HDs, we used PCR and Sanger sequencing to
examine genes located within small HDs and found that in each case,
multiple exons of each gene could not be amplified or sequenced.
These results suggested that our approach for analysis of Illumina
array data provided a sensitive and specific method for
identification of amplifications and HDs, including relatively
small alterations of either type.
Example 2
Detection of Amplifications and Homozygous Deletions
[0045] A total of 45 breast and 36 colorectal tumors were analyzed
by Illumina arrays containing either approximately 317,000 or
approximately 550,000 SNPs (SI FIG. 1). To determine the fraction of
alterations that were likely to be somatic (i.e., tumor derived),
we analyzed these regions in 23 matched normal samples. In the
normal samples, no amplifications and only four distinct HDs were
detected. We removed these alterations from further analysis, as
well as those corresponding to known copy number variation in
normal human cells (27, 28). Finally, we removed any copy number
changes where the boundaries were identical in two or more samples,
as these were likely to represent germline variants. Based on this
conservative strategy, we estimated that >95% of the 614
amplifications and 463 HDs (SI Table 3) represented true somatic
alterations.
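The boundary-based filtering step described above can be illustrated
with a minimal Python sketch. The tuple layout (sample, chromosome,
start, end) and the function name are assumptions made for
illustration only; they are not taken from the SI Methods.

```python
from collections import Counter

def drop_recurrent_boundaries(events, max_samples=1):
    """Remove copy number events whose exact boundaries recur across samples.

    events: iterable of (sample, chrom, start, end) tuples (hypothetical layout).
    An event is dropped when its (chrom, start, end) boundaries are seen in
    more than `max_samples` distinct samples, since such events are likely
    to be germline variants rather than somatic alterations.
    """
    events = list(events)
    samples_per_boundary = Counter()
    # Count each boundary once per sample, even if a sample lists it twice.
    for _sample, boundary in {(e[0], e[1:]) for e in events}:
        samples_per_boundary[boundary] += 1
    return [e for e in events if samples_per_boundary[e[1:]] <= max_samples]
```

With the default setting, any event whose boundaries are identical in
two or more samples is removed, and the order of the remaining events
is preserved.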
[0046] Breast cancers contributed the majority of the alterations
identified, comprising 68% and 81% of the total HDs and
amplifications, respectively. Individual colorectal and breast
tumors had on average 7 and 18 copy number alterations,
respectively. Each colorectal cancer had an average of 4 HDs and 3
amplifications. Breast cancers had on average 7 HDs and 11
amplifications. Several of the tumor samples contained copy number
alterations that were separated by short sequences that were not
themselves amplified or deleted, presumably reflecting the complex structure of these
alterations (29, 30).
[0047] The copy number alterations observed encompassed on average
1.7 and 2.4 Mb of colorectal and breast haploid genomic sequence,
respectively. Each HD affected the coding region of one gene on
average, while an average amplicon contained two genes. The average
numbers of protein-coding genes that were affected by either
amplification or HD were 24 and 9 per breast and colorectal cancer,
respectively.
Example 3
Genes Altered in More than One Tumor
[0048] One of the main challenges in the analysis of somatic
alterations in cancers is distinguishing those changes that are
selected for during tumorigenesis (driver alterations) from those
that provide no selective advantage (passenger alterations). Even
in regions that have multiple copy
number alterations, this distinction can be particularly difficult
because regions of amplification and HD can contain multiple genes,
only a subset of which are presumably the underlying targets. We
reasoned that the integration of copy number analyses with sequence
data would help reveal the driver genes that were more likely to
contain genetic alterations. To accomplish this integration, we
developed a new statistical approach for determining whether the
observed genetic alterations of any type in any gene were likely to
reflect an underlying mutation frequency that was significantly
higher than the passenger rate. To analyze the probability that a
given gene would be involved in a copy number alteration, we made
the conservative assumption that the frequency of all
amplifications and HDs observed in each tumor type represented the
passenger mutation frequency (i.e., we assumed that all copy number
changes were passengers). The number of actual copy number
alterations affecting each gene in all tumors was then compared to
the simulated number of expected passenger alterations taking into
account gene size, the distribution of SNP locations, and the
frequency of passenger amplifications and HDs in breast and
colorectal cancers.
[0049] We integrated these copy number analyses with the sequence
data of the Sjoblom et al. and Wood et al. studies (5, 31). In these
studies, the protein coding sequences of 20,857 transcripts from
the 18,191 genes in the RefSeq database were determined in 11
breast and 11 colorectal cancer samples, allowing detection of
somatic sequence alterations. Genes containing somatic alterations
were subsequently analyzed for mutations in additional tumors of
the same type. In the current study, the same 22 breast and
colorectal tumor samples were analyzed in parallel by Illumina
arrays, together with additional samples of each tumor type (SI
FIG. 1 and SI Table 7). To integrate these different mutational
data for each tumor type, we combined the probability that a gene
was a driver gene based on the type and frequency of point
mutations previously observed with the probability that the gene
was a driver based on the number of observed amplifications and
HDs.
[0050] Table 1 lists the loci that were amplified in at least one
tumor and had the highest probability of containing driver genes as
determined by the combined mutation analysis (a complete list of
amplifications is provided in SI Table 3 and amplified genes in SI
Table 4). For genes to be considered potential targets of the
amplification, the entire coding region of the gene was required to
be contained within a focal amplicon. A few candidate genes in this
list (e.g. CCNE1 (cyclin E) and ERBB2) were amplified in multiple
tumors but were not found to be mutated by sequencing. The majority
of candidate genes, however, harbored point mutations in some
tumors and amplifications in others. The most striking aspect of
this list of candidate genes is that only some of them had been
implicated in cancer in the past. Of the 19 genes indicated in
Table 1, only 8 had been previously implicated in tumorigenesis.
The known cancer genes included MYC, ERBB2 (HER2/NEU), CCNE1,
CCND1, EGFR, FGFR2, and IRS2, each of which had been shown to be
amplified. In addition, MRE11, which was amplified in breast
cancers, has been shown to be mutated in a small fraction of
colorectal cancers and is thought to play an essential role in
maintaining chromosomal stability (32). Some genes were shown to be
altered in both breast and colorectal cancers, with at least one of
the tumors containing amplifications. Interestingly, among these
genes, ERBB2 was found to be amplified in both breast and
colorectal cancers, and FGFR2 was found to be mutated in breast
cancers and amplified in colorectal cancers.
[0051] Table 2 similarly lists the loci that were homozygously
deleted in at least one tumor and had the highest probability of
containing drivers as determined by the combined mutation analysis
(a complete list of HDs is provided in SI Table 3 and homozygously
deleted genes in SI Table 5). For each of these genes, a portion of
the coding region was affected by the HD. A number of genes
previously known to be inactivated in colorectal or breast
tumorigenesis, such as CDKN2A, PTEN, and TP53, are found in this
list. We also identified genes, such as CHD5, MAP2K4, SMAD2, and
SMAD3, that have been previously shown to be deleted in other tumor
types, but not in colorectal or breast cancers. Finally, we
discovered a number of genes not previously known to be affected by
HD in any tumor type. For example, HDs as well as point mutations
were found in OMA1 and ZNF521 in colorectal cancers and in MANEA,
PCDH8, SATL1, and ZNF674 in breast cancers. During the course of
preparing this manuscript, we identified through independent
experimentation that PCDH8 is mutated and homozygously deleted in
breast cancer (33). A number of genes that were less frequently
altered in any one tumor type were shown to be affected at
significant levels in both tumor types, including CDH20, FHOD3 and
FNDC1.
Example 4
Pathways Enriched for Copy Number and Point Alterations
[0052] We examined whether groups of genes belonging to certain
cellular processes or pathways were preferentially affected by
genetic alterations. For this purpose, we developed a statistical
approach that provided a probability that a pathway contained
driver alterations, taking into account both the copy number
changes and point mutations. This approach was similar to that
described above for evaluating individual genes but in this case
was applied to entire groups of genes involved in specific pathways
or functional groups. Because the net effect of a pathway can be
the same whether certain components are amplified or others
deleted, all copy number alterations within a gene group were
considered. The analysis was performed using three well-annotated
GeneGo MetaCore databases: gene ontology (GO), canonical gene
pathway maps (MA), and genes participating in defined cellular
processes and networks (GG) (34). For each gene group, we
considered whether the component genes were more likely to be
affected by point mutations, amplifications, or HDs, as compared to
all genes analyzed. Importantly, these analyses were based on
analysis of the rankings of altered genes within each group using a
modified version of gene set enrichment analysis (GSEA) (35),
rather than the total number of mutations within individual groups.
This approach limits the effects of single highly mutated genes and
requires the involvement of multiple genes to score a pathway as
significantly affected.
[0053] These analyses identified gene groups that were enriched for
genetic alterations in these tumor types (Table 3). In particular,
the EGFR and ERBB gene families were enriched for alterations.
Interestingly, both of these signaling pathways involved various
components of the PI3 kinase pathway, suggesting that the observed
alterations may result in similar effects in these tumor cells
(FIG. 1). A third of genes in these combined pathways were mutated
by sequence alterations, amplifications, or HDs. Enrichment of
alterations in other canonical gene groups including Notch and G1-S
cell cycle transition pathways was also detected. The latter group
included HDs of CDKN2A and CDKN2B genes as well as amplifications
of cyclin D1, cyclin D3, and cyclin E3 genes in breast cancers. For
all these gene groups, new genes were identified that had not
previously been implicated by genetic alterations in these cellular
processes. Finally, a variety of gene groups not previously known
to be enriched for copy number changes in tumorigenesis were
identified. These included genes implicated in cell-cell
interaction and adhesion, including cadherins and metalloproteases,
as well as other genes implicated in cellular interactions during
early embryonic and neural development.
[0054] As an example, in colorectal cancers, a total of 33 cadherin
and protocadherin genes were detected as being affected by copy
number or sequence changes. In breast cancers, there was also
enrichment in genes implicated in DNA topological control,
including alterations in a number of topoisomerases (TOP1, TOP2A,
TOP2B and TOP3A) and helicases. All pathways showing significant
enrichment for genetic alterations are listed in SI Table 6.
REFERENCES
[0055] The disclosure of each reference cited is expressly
incorporated herein. [0056] 1. Vogelstein, B. & Kinzler, K. W.
(2004) Cancer genes and the pathways they control. Nat Med
10:789-99. [0057] 2. Bardelli, A. & Velculescu, V. E. (2005)
Mutational analysis of gene families in human cancer. Curr Opin
Genet Dev 15:5-12. [0058] 3. Strausberg, R. L., Levy, S. &
Rogers, Y. H. (2008) Emerging DNA sequencing technologies for human
genomic medicine. Drug Discov Today 13:569-77. [0059] 4. Greenman,
C., et al. (2007) Patterns of somatic mutation in human cancer
genomes. Nature 446:153-8. [0060] 5. Wood, L. D., et al. (2007) The
genomic landscapes of human breast and colorectal cancers. Science
318:1108-13. [0061] 6. Slamon, D. J., et al. (1987) Human breast
cancer: correlation of relapse and survival with amplification of
the HER-2/neu oncogene. Science 235:177-82. [0062] 7. Collins, S.
& Groudine, M. (1982) Amplification of endogenous myc-related
DNA sequences in a human myeloid leukaemia cell line. Nature
298:679-81. [0063] 8. Kamb, A., et al. (1994) A cell cycle
regulator potentially involved in genesis of many tumor types.
Science 264:436-40. [0064] 9. Li, J., et al. (1997) PTEN, a
putative protein tyrosine phosphatase gene mutated in human brain,
breast, and prostate cancer. Science 275:1943-7. [0065] 10. Steck,
P. A., et al. (1997) Identification of a candidate tumour
suppressor gene, MMAC1, at chromosome 10q23.3 that is mutated in
multiple advanced cancers. Nat Genet 15:356-62. [0066] 11. Hahn, S.
A., et al. (1996) Dpc4, a Candidate Tumor Suppressor Gene At Human
Chromosome 18q21.1. Science 271:350-353. [0067] 12. Pinkel, D.
& Albertson, D. G. (2005) Array comparative genomic
hybridization and its applications in cancer. Nat Genet 37
Suppl:S11-7. [0068] 13. Camps, J., et al. (2008) Chromosomal
breakpoints in primary colon cancer cluster at sites of structural
variants in the genome. Cancer Res 68:1284-95. [0069] 14. Weir, B.
A., et al. (2007) Characterizing the cancer genome in lung
adenocarcinoma. Nature 450:893-8. [0070] 15. Haverty, P. M., et al.
(2008) High-resolution genomic and expression analyses of copy
number alterations in breast tumors. Genes Chromosomes Cancer
47:530-42. [0071] 16. Mullighan, C. G., et al. (2007) Genome-wide
analysis of genetic alterations in acute lymphoblastic leukaemia.
Nature 446:758-64. [0072] 17. Harada, T., et al. (2008) Genome-wide
DNA copy number analysis in pancreatic cancer using high-density
single nucleotide polymorphism arrays. Oncogene 27:1951-60. [0073]
18. Stark, M. & Hayward, N. (2007) Genome-wide loss of
heterozygosity and copy number analysis in melanoma using
high-density single-nucleotide polymorphism arrays. Cancer Res
67:2632-42. [0074] 19. Nagayama, K., et al. (2007) Homozygous
deletion scanning of the lung cancer genome at a 100-kb resolution.
Genes Chromosomes Cancer 46:1000-10. [0075] 20. Zhao, X., et al.
(2005) Homozygous deletions and chromosome amplifications in human
lung carcinomas revealed by single nucleotide polymorphism array
analysis. Cancer Res 65:5561-70. [0076] 21. Jones, S., et al.
(2008) Comparative lesion sequencing provides insights into tumor
evolution. Proc Natl Acad Sci USA 105:4283-8. [0077] 22. Cairns,
P., et al. (1995) Frequency of homozygous deletion at p16/CDKN2 in
primary human tumours. Nat Genet 11:210-2. [0078] 23. Liggett, W.
H., Jr. & Sidransky, D. (1998) Role of the p16 tumor suppressor
gene in cancer. J Clin Oncol 16:1197-206. [0079] 24. Wang, T. L.,
et al. (2002) Digital karyotyping. Proc Natl Acad Sci USA
99:16156-61. [0080] 25. Steemers, F. J., et al. (2006) Whole-genome
genotyping with the single-base extension assay. Nat Methods
3:31-3. [0081] 26. Peiffer, D. A., et al. (2006) High-resolution
genomic profiling of chromosomal aberrations using Infinium
whole-genome genotyping. Genome Res 16:1136-48. [0082] 27. Conrad,
D. F., Andrews, T. D., Carter, N. P., Hurles, M. E. &
Pritchard, J. K. (2006) A high-resolution survey of deletion
polymorphism in the human genome. Nat Genet 38:75-81. [0083] 28.
Sebat, J., et al. (2004) Large-scale copy number polymorphism in
the human genome. Science 305:525-8. [0084] 29. Volik, S., et al.
(2003) End-sequence profiling: sequence-based analysis of aberrant
genomes. Proc Natl Acad Sci USA 100:7696-701. [0085] 30. Bignell,
G. R., et al. (2007) Architectures of somatic genomic rearrangement
in human cancer amplicons at sequence-level resolution. Genome Res
17:1296-303. [0086] 31. Sjoblom, T., et al. (2006) The consensus
coding sequences of human breast and colorectal cancers. Science
314:268-74. [0087] 32. Wang, Z., et al. (2004) Three classes of
genes mutated in colorectal cancers with chromosomal instability.
Cancer Res 64:2998-3001. [0088] 33. Yu, J. S., et al. (2008) PCDH8,
the human homolog of PAPC, is a candidate tumor suppressor of
breast cancer. Oncogene. [0089] 34. Ekins, S., Nikolsky, Y.,
Bugrim, A., Kirillov, E. & Nikolskaya, T. (2007) Pathway
mapping tools for analysis of high content data. Methods Mol Biol
356:319-50. [0090] 35. Subramanian, A., et al. (2005) Gene set
enrichment analysis: a knowledge-based approach for interpreting
genome-wide expression profiles. Proc Natl Acad Sci USA
102:15545-50. [0091] 36. Collins, F. S. & Barker, A. D. (2007)
Mapping the cancer genome. Pinpointing the genes involved in cancer
will help chart a new course across the complex landscape of human
malignancies. Sci Am 296:50-7. [0092] 37. Hoglund, M., et al.
(2002) Dissecting karyotypic patterns in colorectal tumors: two
distinct but overlapping pathways in the adenoma-carcinoma
transition. Cancer Res 62:5939-46. [0093] 38. Paez, J. G., et al.
(2004) Genome coverage and sequence fidelity of phi29
polymerase-based multiple strand displacement whole genome
amplification. Nucleic Acids Res 32:e71. [0094] 39. Arriola, E., et
al. (2007) Evaluation of Phi29-based whole-genome amplification for
microarray-based comparative genomic hybridisation. Lab Invest
87:75-83. [0095] 40. Popescu, N. C. (2004) Fragile sites and cancer
genes on the short arm of chromosome 8. Lancet Oncol 5:77;
discussion 77. [0096] 41. Tanner, M., et al. (2006) Topoisomerase
IIalpha gene amplification predicts favorable treatment response to
tailored and dose-escalated anthracycline-based adjuvant
chemotherapy in HER-2/neu-amplified breast cancer: Scandinavian
Breast Group Trial 9401. J Clin Oncol 24:2428-36. [0097] 42. Coon,
J. S., et al. (2002) Amplification and overexpression of
topoisomerase IIalpha predict response to anthracycline-based
therapy in locally advanced breast cancer. Clin Cancer Res
8:1061-7. [0098] 43. Varshavsky, A. (2007) Targeting the absence:
homozygous DNA deletions as immutable signposts for cancer therapy.
Proc Natl Acad Sci USA 104:14935-40. [0099] 44. Leary, R. J.,
Cummins, J., Wang, T. L. & Velculescu, V. E. (2007) Digital
karyotyping. Nat Protoc 2:1973-86.
Example 5
Materials and Methods
[0100] DNA samples from tumor derived xenografts and cell lines
were obtained and purified. DK libraries were generated and
analyzed as previously described (24, 44). The Illumina SNP arrays
were used to analyze tumor samples. Bioinformatic analyses were
used to determine focal amplifications and HDs. Statistical methods
were employed to determine the likelihood that genetic alterations
occurred at a frequency higher than the passenger rate, and to
identify gene groups enriched for copy number and sequence
alterations.
Clinical Samples and Cell Lines
[0101] DNA samples were obtained from xenografts and cell lines of
ductal breast and colorectal carcinoma. Normal DNA samples were
obtained from matched normal tissue or peripheral blood. Twenty-two
of the DNA samples were those used in the Discovery Screen of
Sjoblom et al. and Wood et al. (1, 2). All tumor samples analyzed
for copy number analyses are listed in SI Table 9. For the Illumina
analyses, the colorectal cancer samples used were cell lines (10)
or xenografts (26), each developed from a liver metastasis of a
different patient. The breast cancer samples used were cell lines
(22) and xenografts (23), each developed from a different patient.
In addition, 11 colorectal cancer metastases (immunopurified using
the BerEP4 antibody as previously described (3)) and 7 cell lines
were analyzed by Digital Karyotyping. Clinical
information for samples that were analyzed by copy number and
sequence analyses is available in Table S2 of reference (2). All
samples were obtained in accordance with the Health Insurance
Portability and Accountability Act (HIPAA).
Digital Karyotyping
[0102] Digital Karyotyping libraries were constructed as previously
described (4, 5). In brief, 17 bp tags of genomic DNA were
generated using the NlaIII mapping and SacI fragmenting restriction
enzymes. For each library, the experimental tags obtained were
concatenated, cloned and sequenced. SAGE2002 software was used to
extract the experimental tags from the sequencing data. The
sequences of the experimental tags were compared to the predicted
virtual tags extracted from the human genome reference sequence
hg16 (NCBI Build 34, July 2003) and were visualized using the
SageGenie DKView to identify potential alterations
(http://cgap.nci.nih.gov/SAGE/DKViewHome). The coordinates of all
identified alterations were translated to the human genome
reference sequence hg17 (NCBI Build 35, May 2004) to allow
comparison to Illumina data.
[0103] Homozygous deletions were identified using a sliding window
size of 175 virtual tags (~700 kb in size). Windows with a
tag density ratio (observed tags in window/expected tags in window)
<0.01 were considered to represent putative homozygous deletions
and were further examined. Regions of homozygous deletions were
defined as containing no experimental tags and the boundaries were
determined as the outermost virtual tags with no matching
experimental tags.
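The sliding-window screen for putative homozygous deletions can be
sketched as follows. This is a minimal illustration, not the SageGenie
DKView implementation. It assumes that the expected number of tags in
a window is the genome-wide average tag count per virtual tag
multiplied by the window size; the function name and the input layout
(one observed count per virtual tag, in genomic order) are
hypothetical.

```python
def putative_hd_windows(tag_counts, window=175, max_ratio=0.01):
    """Return start indices of windows whose tag density ratio is < max_ratio.

    tag_counts[i] is the number of experimental tags matching virtual tag i,
    with virtual tags listed in genomic order (hypothetical input layout).
    Tag density ratio = observed tags in window / expected tags in window.
    ASSUMPTION: expected tags are spread uniformly over all virtual tags.
    """
    n = len(tag_counts)
    if n < window:
        return []
    expected = sum(tag_counts) * window / n
    if expected == 0:
        return []
    hits = []
    running = sum(tag_counts[:window])  # observed tags in the first window
    for start in range(n - window + 1):
        if start > 0:
            # Slide by one tag: add the entering tag, drop the leaving tag.
            running += tag_counts[start + window - 1] - tag_counts[start - 1]
        if running / expected < max_ratio:
            hits.append(start)
    return hits
```

Windows flagged in this way would still need the follow-up described
above, in which the deleted region is narrowed to the outermost
virtual tags that have no matching experimental tags.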
[0104] Amplifications were identified using sliding windows of
variable sizes, as the most accurate window size for detection and
quantification of amplifications is the exact size of the altered
region. Windows with tag density ratios ≥6 were considered
to represent amplified regions. Boundaries of the amplified region
were determined by the outermost tag contained in a window with a
tag density ratio >3 or by the virtual tag position after which
there was a sharp decline in the observed experimental tags.
High Density SNP Arrays
[0105] The Illumina Infinium II Whole Genome Genotyping Assay
employing the BeadChip platform was used to analyze tumor samples
at 317,503 (317 k), 555,351 (550 k V1), or 561,466 (550 k V3) SNP
loci from the Human HapMap collection. All SNP positions were based
on hg17 (NCBI Build 35, May 2004) version of the human genome
reference sequence. The genotyping assay is a two-step procedure
that is based on hybridization to a 50-nucleotide oligo, followed
by a two-color fluorescent single base extension. The image files
of fluorescence intensities were processed using Illumina
BeadStation software to provide intensity values for each SNP
position. For each SNP, the normalized experimental intensity value
(R) was compared to the intensity values for that SNP from a
training set of normal samples and represented as a ratio (called
the "Log R Ratio") of log2(R_experimental/R_training set).
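The Log R Ratio defined above is a simple base-2 logarithm of an
intensity ratio. The short Python sketch below shows the calculation
and, for comparison, the value an ideal assay would give for a given
copy number. The function names and the diploid reference of two
copies are illustrative assumptions.

```python
import math

def log_r_ratio(r_experimental, r_training):
    """log2 of the sample intensity over the training-set intensity for one SNP."""
    if r_experimental <= 0 or r_training <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(r_experimental / r_training)

def ideal_log_r_ratio(copies, reference_copies=2):
    """Value expected if intensity scaled linearly with copy number (assumption)."""
    if copies == 0:
        return float("-inf")  # complete loss of signal for a homozygous deletion
    return math.log2(copies / reference_copies)
```

Under the linear assumption, 12 copies per nucleus would correspond
to log2(6), or about 2.58. Because arrays underestimate the copy
number of amplifications, as noted in Example 1, thresholds applied
to measured Log R Ratios can be considerably lower than this ideal
value.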
Bioinformatic Analysis of High Density SNP Array Data
[0106] Digital Karyotyping was used to inform and optimize the
criteria for detection of focal homozygous deletions and high-copy
amplifications using the Illumina arrays. Three colorectal cancer
samples (Co44, Co82 and Co84) were assessed by Digital Karyotyping
tag libraries as well as the Illumina arrays (SI Table 1). From
these analyses criteria were developed to permit sensitive and
specific detection of the Digital Karyotyping alterations using the
Illumina platform as described below. These criteria were
subsequently used to analyze an additional 46 breast and 33
colorectal cancers.
Detection of Homozygous Deletions
[0107] Homozygous deletions (HDs) were defined as two or more
consecutive SNPs with a Log R Ratio value of ≤-2. The first
and last SNPs of the identified HD region were considered to be the
boundaries of the alteration for subsequent analyses. The deletion
breakpoint would be expected to be located between the boundary
deleted SNPs and adjacent non-deleted SNPs; use of the inner
deleted SNP boundaries provides the most conservative approach as
use of the outer boundaries may include non-deleted regions. To
eliminate chip artifacts and potential copy number polymorphisms,
we removed all HDs that were included in copy number polymorphism
databases (6, 7). As these analyses showed that copy number
polymorphisms had conserved boundaries, we also removed all
observed HDs with identical boundaries that occurred in multiple
samples. Adjacent homozygous deletions separated by one or two SNPs
were considered to be part of the same alteration. Adjacent HDs
were evaluated separately for the purposes of determining affected
genes, but were counted as single entries in Table 2 and SI Table
5. To identify genes affected by HDs, we compared the location of
coding exons in the RefSeq and CCDS databases with the genomic
coordinates of the observed HDs. Any gene with a portion of its
coding region contained within a homozygous deletion was considered
to be affected by the deletion.
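The SNP-level calling rule for homozygous deletions can be sketched
in Python as shown below. The sketch assumes that Log R Ratios are
supplied for one chromosome in genomic order, and that each run must
independently contain at least two SNPs before neighboring runs are
merged; the latter ordering is an assumption, since the text does not
specify it. The function name is hypothetical.

```python
def call_homozygous_deletions(lrr, threshold=-2.0, min_snps=2, max_gap=2):
    """Return (first_index, last_index) pairs of called HD regions.

    lrr: Log R Ratio values for consecutive SNPs on one chromosome.
    A run is called when at least `min_snps` consecutive SNPs have
    lrr <= threshold. Called runs separated by at most `max_gap`
    SNPs are merged into a single alteration.
    """
    runs = []
    start = None
    for i, value in enumerate(lrr):
        if value <= threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_snps:
                runs.append((start, i - 1))
            start = None
    if start is not None and len(lrr) - start >= min_snps:
        runs.append((start, len(lrr) - 1))

    merged = []
    for run in runs:
        # Number of SNPs lying strictly between this run and the previous one.
        if merged and run[0] - merged[-1][1] - 1 <= max_gap:
            merged[-1] = (merged[-1][0], run[1])
        else:
            merged.append(run)
    return merged
```

The returned indices are the inner (deleted) SNP boundaries, which
matches the conservative boundary choice described above.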
Detection of Amplifications
[0108] High copy amplifications (i.e. >12 chromosomal copies as
determined by Digital Karyotyping) were defined as regions having
at least one SNP with a Log R Ratio ≥1.4, at least one in ten
SNPs with a Log R Ratio ≥1, and an average Log R Ratio of the
entire region of ≥0.9. The boundaries of amplified regions
were delimited by the outermost SNPs with Log R Ratios >1.
Similar to analyses of homozygous deletions, we removed all
amplifications that had identical boundaries and occurred in
multiple samples.
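The three amplification criteria and the boundary rule can be checked
for a candidate region with a few lines of Python. How candidate
regions are first proposed is not specified here, so the sketch below
only tests a region that is already given; the function names are
hypothetical.

```python
def is_high_copy_amplification(region_lrr, peak=1.4, high=1.0,
                               min_high_fraction=0.1, min_mean=0.9):
    """Apply the three Log R Ratio criteria to one candidate region."""
    if not region_lrr:
        return False
    has_peak = max(region_lrr) >= peak                      # at least one SNP >= 1.4
    high_fraction = sum(v >= high for v in region_lrr) / len(region_lrr)
    mean = sum(region_lrr) / len(region_lrr)
    return has_peak and high_fraction >= min_high_fraction and mean >= min_mean

def amplification_boundaries(lrr, start, end, boundary=1.0):
    """Outermost SNP indices in [start, end] with Log R Ratio > boundary."""
    above = [i for i in range(start, end + 1) if lrr[i] > boundary]
    return (above[0], above[-1]) if above else None
```

A region such as [1.5, 0.95, 0.9, 1.1, 0.8] passes all three tests,
whereas a region with a single high SNP surrounded by values near
zero fails the average criterion.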
[0109] As focal amplifications are more likely to be useful in
identifying specific target genes, a second set of criteria was
used to remove large chromosomal regions or entire chromosomes that
showed copy number gains. These large alterations, called "complex
amplifications", were thus distinguished from small focal
alterations, called "simple amplifications". Based on observations
from Digital Karyotyping, several steps were used to identify and
remove complex amplifications. First, amplifications >3 Mb in
size and groups of nearby amplifications (within 1 Mb) that were
also >3 Mb in size were considered complex. Second, amplifications or
groups of amplifications that occurred at a frequency of ≥4
amplifications in a 10 Mb region, or ≥5 amplifications per
chromosome, were deemed to be complex. The amplifications remaining
after these filtering steps were considered to be simple
amplifications and were further examined. The complex regions were
not included in subsequent statistical analyses but those
containing candidate cancer genes are indicated in Table 1. To
identify protein coding genes affected by amplifications, we
compared the location of the start and stop positions of each gene
within the RefSeq and CCDS databases with the genomic coordinates
of the observed amplifications. As amplifications of a sub-genic
region (i.e. containing only a fraction of a gene) are less likely
to have a functional consequence, we focused our analyses on genes
whose entire coding regions were included in the observed
amplifications.
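One possible reading of the simple versus complex classification is
sketched below for the amplifications on a single chromosome. Several
details are assumptions rather than statements from the text: nearby
amplifications are chained when the gap between neighbors is at most
1 Mb, the size of a group is measured from the start of its first
member to the end of its last, and the 10 Mb density rule is anchored
at the start of each amplification. The function name is
hypothetical.

```python
def classify_amplifications(amps, max_size=3_000_000, near=1_000_000,
                            dense_window=10_000_000, dense_count=4,
                            max_per_chrom=5):
    """Label each (start, end) amplification on one chromosome.

    Returns a list of ((start, end), label) with label 'simple' or 'complex'.
    Assumes the amplifications do not overlap one another.
    """
    amps = sorted(amps)
    n = len(amps)
    # Rule: five or more amplifications on the chromosome are all complex.
    if n >= max_per_chrom:
        return [(a, "complex") for a in amps]
    is_complex = [False] * n
    # Rule: single events, or chains of nearby events, spanning > max_size.
    i = 0
    while i < n:
        j = i
        while j + 1 < n and amps[j + 1][0] - amps[j][1] <= near:
            j += 1
        if amps[j][1] - amps[i][0] > max_size:
            for k in range(i, j + 1):
                is_complex[k] = True
        i = j + 1
    # Rule: dense_count or more events inside any dense_window span.
    for i in range(n):
        members = [k for k in range(i, n)
                   if amps[k][1] - amps[i][0] <= dense_window]
        if len(members) >= dense_count:
            for k in members:
                is_complex[k] = True
    return [(a, "complex" if c else "simple") for a, c in zip(amps, is_complex)]
```

Only the amplifications labeled simple would be carried forward to
the gene-level comparison, in which a gene is retained when its
entire coding region lies inside the amplified interval.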
[0110] A number of genes co-amplified or co-deleted with known
oncogenes (CCND1, ERBB2, CCNE1, EGFR, MYC) or tumor suppressors
(CDKN2A, PTEN, MAP2K4, TP53) were considered "known passengers" and
eliminated from further statistical analysis. However, for
completeness, these known passengers were listed along with their
respective copy number alterations in SI Tables 4 and 5. Copy
number alterations of known passengers were also listed in SI
Tables 6 and 7, but these alterations were not used to calculate
the passenger probabilities listed in the same tables. Alterations
of known passengers were also excluded from statistical analysis of
pathways (SI Table 8).
Statistical Analysis of Deletions and Amplifications
[0111] For each of the genes involved in amplifications or
deletions, we quantify the strength of the evidence that they may
be drivers of carcinogenesis by reporting a driver probability,
separately for amplifications and deletions. In each case, the
passenger probability is an a posteriori probability that
integrates information from the somatic mutation analysis of Wood
et al. (2) with the data presented in this article. The passenger
probabilities reported in Wood et al. (2) serve as a priori
probabilities. These are available for three different scenarios of
passenger mutation rates and results are presented separately for
each. If a gene was not found to be mutated in Wood et al. (2), the
prior passenger probability is set to the estimated proportion of
passengers in the RefSeq set. Then, a likelihood ratio for "driver"
versus "passenger" was evaluated using as evidence the number of
samples in which a gene was found to be amplified (or deleted).
Analysis is carried out separately by type of array, and then
combined by multiplication of the relevant likelihood terms. The
passenger term is the probability that the gene in question is
amplified (deleted). For each sample, we begin by computing the
probability that the observed amplifications (deletions) will
include the gene in question by chance. Inclusion of all available
SNPs is required for amplification, while any overlap of SNPs is
sufficient for deletions. Specifically, if in a specific sample N
SNPs are typed, and K amplifications are found, whose sizes, in
terms of SNPs involved, are A1 . . . AK, a gene with G SNPs will be
included at random with probability
(A1-G+1)/N+ . . . +(AK-G+1)/N
for amplifications and
(A1+G-1)/N+ . . . +(AK+G-1)/N
for deletions.
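The two expressions above can be evaluated directly. In the Python
sketch below, the symbols follow the text (N typed SNPs, event sizes
A1 . . . AK in SNPs, and a gene with G SNPs). Two guards are added as
assumptions that do not appear in the formulas: an amplification
smaller than the gene contributes zero rather than a negative term,
and the result is capped at 1. The function name is hypothetical.

```python
def chance_inclusion_probability(event_sizes, gene_snps, total_snps, kind):
    """Probability that a gene with `gene_snps` SNPs is hit at random.

    event_sizes: sizes A1..AK of the events in one sample, in SNPs.
    kind: 'amplification' (all gene SNPs must be inside the event) or
          'deletion' (any overlap with the event is sufficient).
    """
    if kind == "amplification":
        # (Ai - G + 1) placements contain the whole gene; none if Ai < G.
        terms = [max(a - gene_snps + 1, 0) for a in event_sizes]
    elif kind == "deletion":
        # (Ai + G - 1) placements overlap the gene by at least one SNP.
        terms = [a + gene_snps - 1 for a in event_sizes]
    else:
        raise ValueError("kind must be 'amplification' or 'deletion'")
    return min(sum(terms) / total_snps, 1.0)
```

For example, with N = 1000, event sizes of 10 and 3 SNPs, and a gene
spanning 4 SNPs, the amplification probability is 7/1000 and the
deletion probability is 19/1000.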
[0112] We then compute the probability of the observed number of
amplifications (deletions) assuming that the samples are
independent but not identically distributed Bernoulli random
variables, using the Thomas and Taub algorithm (8), as implemented
in R by M. Newton. Our approach to evaluating the passenger
probabilities provides an upper bound, as it assumes that all the
deletions and amplifications observed include only passengers. The
driver term of the likelihood ratio was approximated as for the
passenger term, after multiplying the sample-specific passenger
rates above by a gene-specific factor reflecting the increase
(alternative hypothesis) of interest. This increase is estimated by
the ratio between the empirical deletion rate of the gene and the
expected deletion rate for that gene.
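The distribution of a sum of independent, non-identically distributed
Bernoulli variables can be computed in several ways. The Python
sketch below uses a straightforward dynamic-programming convolution,
which is a stand-in chosen for clarity and is not claimed to be the
Thomas and Taub algorithm or the R implementation referred to above.
The likelihood ratio function follows the description of the driver
term, with the gene-specific factor passed in as a parameter; both
function names are hypothetical.

```python
def poisson_binomial_pmf(probs):
    """pmf[k] = probability that exactly k of the independent events occur."""
    pmf = [1.0]
    for p in probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must lie in [0, 1]")
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)   # this sample is not altered
            nxt[k + 1] += mass * p       # this sample is altered
        pmf = nxt
    return pmf

def driver_likelihood_ratio(passenger_probs, observed, factor):
    """Ratio of driver to passenger likelihood for `observed` altered samples.

    passenger_probs: per-sample chance-inclusion probabilities for the gene.
    factor: gene-specific rate increase under the driver hypothesis.
    Scaled rates are capped at 1 (an added assumption).
    """
    driver_probs = [min(p * factor, 1.0) for p in passenger_probs]
    passenger = poisson_binomial_pmf(passenger_probs)[observed]
    driver = poisson_binomial_pmf(driver_probs)[observed]
    return driver / passenger
```

For two samples with passenger probabilities of 0.1 each, observing
the gene altered in both samples and using a factor of 2 gives a
likelihood ratio of 0.04/0.01, or 4.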
[0113] For each of the gene sets considered we quantify the
strength of the evidence that they may include a
higher-than-average proportion of driver genes. For each set, in a
list of all the RefSeq genes sorted by a score combining
information on mutations, amplifications and deletions, we compared
the ranking of the genes contained in the set with the ranking of
those outside, using the rank-sum test, as implemented by the Limma
package in Bioconductor (9). Scores were obtained by adding three
log likelihood ratios for mutations, amplifications and deletions.
This combination approach makes an approximating assumption of
independence of amplifications and deletions. In general, amplified
genes cannot be deleted, so independence is technically violated.
However, because of the relatively small number of dramatic
amplifications and deletions, this assumption is tenable for the
purposes of gene set analysis. Inspection of the log likelihoods
suggests that they are roughly linear in the number of events,
supporting the validity of this approximation as a scoring system.
The statistical significance of deviation from the null hypothesis
of a random distribution was calculated using Limma and then
corrected for multiplicity by the q-value method (10) as
implemented in version 1.1 of the package "q-value".
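The gene set scoring and ranking comparison can be illustrated with a
self-contained Python sketch. It is not the Limma implementation: it
computes a one-sided Wilcoxon rank-sum test with a normal
approximation and no tie or continuity correction, which is an
assumption made to keep the example short. The multiplicity
correction by the q-value method is not shown. Function names and the
dictionary input are hypothetical.

```python
from statistics import NormalDist

def combined_score(llr_mutation, llr_amplification, llr_deletion):
    """Sum of the three log likelihood ratios for one gene."""
    return llr_mutation + llr_amplification + llr_deletion

def rank_sum_enrichment(scores, gene_set):
    """One-sided rank-sum test that genes in `gene_set` score higher.

    scores: dict mapping gene name to combined score, for all genes.
    Returns (U statistic, z score, approximate one-sided p-value).
    """
    genes = sorted(scores, key=scores.get)
    ranks = {}
    i = 0
    while i < len(genes):
        j = i
        while j + 1 < len(genes) and scores[genes[j + 1]] == scores[genes[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # tied genes share the average rank
        for g in genes[i:j + 1]:
            ranks[g] = average_rank
        i = j + 1
    inside = [g for g in genes if g in gene_set]
    n1, n2 = len(inside), len(genes) - len(inside)
    if n1 == 0 or n2 == 0:
        raise ValueError("gene set must be a proper, non-empty subset")
    u = sum(ranks[g] for g in inside) - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (u - mean) / sd
    return u, z, 1.0 - NormalDist().cdf(z)
```

Because the test depends on ranks rather than on the raw scores, a
single gene with a very large score cannot by itself make a set
significant, which is consistent with the rationale given in
Example 4.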
METHODS REFERENCES
Example 5
[0114] 1. Sjoblom, T., Jones, S., Wood, L. D., Parsons, D. W., Lin,
J., Barber, T. D., Mandelker, D., Leary, R. J., Ptak, J., Silliman,
N., Szabo, S., Buckhaults, P., Farrell, C., Meeh, P., Markowitz, S.
D., Willis, J., Dawson, D., Willson, J. K., Gazdar, A. F.,
Hartigan, J., Wu, L., Liu, C., Parmigiani, G., Park, B. H.,
Bachman, K. E., Papadopoulos, N., Vogelstein, B., Kinzler, K. W.
& Velculescu, V. E. (2006) The consensus coding sequences of
human breast and colorectal cancers. Science 314:268-74. [0115] 2.
Wood, L. D., Parsons, D. W., Jones, S., Lin, J., Sjoblom, T.,
Leary, R. J., Shen, D., Boca, S. M., Barber, T., Ptak, J.,
Silliman, N., Szabo, S., Derso, Z., Ustyanksky, V., Nikolskaya, T.,
Nikolsky, Y., Karchin, R., Wilson, P. A., Kaminker, J. S., Zhang,
Z., Croshaw, R., Willis, J., Dawson, D., Shipitsin, M., Willson, J.
K., Sukumar, S., Polyak, K., Park, B. H., Pethiyagoda, C. L., Pant,
P. V., Ballinger, D. G., Sparks, A. B., Hartigan, J., Smith, D. R.,
Suh, E., Papadopoulos, N., Buckhaults, P., Markowitz, S. D.,
Parmigiani, G., Kinzler, K. W., Velculescu, V. E. & Vogelstein,
B. (2007) The genomic landscapes of human breast and colorectal
cancers. Science 318:1108-13. [0116] 3. Saha, S., Bardelli, A.,
Buckhaults, P., Velculescu, V. E., Rago, C., St Croix, B., Romans,
K. E., Choti, M. A., Lengauer, C., Kinzler, K. W. & Vogelstein,
B. (2001) A phosphatase associated with metastasis of colorectal
cancer. Science 294:1343-6. [0117] 4. Wang, T. L., Maierhofer, C.,
Speicher, M. R., Lengauer, C., Vogelstein, B., Kinzler, K. W. &
Velculescu, V. E. (2002) Digital karyotyping. Proc Natl Acad Sci
USA 99:16156-61. [0118] 5. Leary, R. J., Cummins, J., Wang, T. L.
& Velculescu, V. E. (2007) Digital karyotyping. Nat Protoc
2:1973-86. [0119] 6. Conrad, D. F., Andrews, T. D., Carter, N. P.,
Hurles, M. E. & Pritchard, J. K. (2006) A high-resolution
survey of deletion polymorphism in the human genome. Nat Genet
38:75-81. [0120] 7. Sebat, J., Lakshmi, B., Troge, J., Alexander,
J., Young, J., Lundin, P., Maner, S., Massa, H., Walker, M., Chi,
M., Navin, N., Lucito, R., Healy, J., Hicks, J., Ye, K., Reiner,
A., Gilliam, T. C., Trask, B., Patterson, N., Zetterberg, A. &
Wigler, M. (2004) Large-scale copy number polymorphism in the human
genome. Science 305:525-8. [0121] 8. Thomas, M. A. & Taub, A.
E. (1982) Calculating binomial probabilities when the trial
probabilities are unequal. Journal of Statistical Computation and
Simulation 14:125-131. [0122] 9. Smyth, G. K. (2005) in
Bioinformatics and Computational Biology Solutions using R and
Bioconductor, eds. Gentleman, R., Carey, V., Dudoit, S., Irizarry,
R. & Huber, W. (Springer, New York), pp. 397-420. [0123] 10.
Storey, J. D. & Tibshirani, R. (2003) Statistical significance
for genomewide studies. Proc Natl Acad Sci USA 100:9440-5.
* * * * *