U.S. patent application number 14/757779 was filed with the patent office on 2016-07-14 for methods for constructing association maps of imaging data and biological data.
The applicant listed for this patent is Howard Yuan-Hao Chang, Michael David Kuo, Eran Segal. Invention is credited to Howard Yuan-Hao Chang, Michael David Kuo, Eran Segal.
Application Number | 20160203597 14/757779 |
Document ID | / |
Family ID | 39344895 |
Filed Date | 2016-07-14 |
United States Patent
Application |
20160203597 |
Kind Code |
A1 |
Chang; Howard Yuan-Hao ; et
al. |
July 14, 2016 |
METHODS FOR CONSTRUCTING ASSOCIATION MAPS OF IMAGING DATA AND
BIOLOGICAL DATA
Abstract
A method for constructing an association map between imaging
features and biological data is described. The method comprises
combining one or more image features relating to a clinical subject
with biological data and using an algorithm to make predictions
based on the features and data.
Inventors: |
Chang; Howard Yuan-Hao;
(Stanford, CA) ; Segal; Eran; (Rehovot, IL)
; Kuo; Michael David; (San Diego, CA) |
|
Applicant: |
Name | City | State | Country | Type |
Chang; Howard Yuan-Hao | Stanford | CA | US | |
Segal; Eran | Rehovot | | IL | |
Kuo; Michael David | San Diego | CA | US | |
Family ID: |
39344895 |
Appl. No.: |
14/757779 |
Filed: |
December 23, 2015 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
12447890 | Mar 15, 2010 | |
PCT/US07/22973 | Oct 30, 2007 | |
14757779 | | |
60856386 | Oct 31, 2006 | |
Current U.S.
Class: |
382/128 |
Current CPC
Class: |
G06T 2207/30096
20130101; A61B 5/4842 20130101; F23Q 3/004 20130101; A61B 5/7271
20130101; G06T 7/0012 20130101; G16B 25/00 20190201; G16B 40/00
20190201; G06K 9/46 20130101; G06K 9/66 20130101 |
International
Class: |
G06T 7/00 20060101
G06T007/00; G06K 9/46 20060101 G06K009/46; A61B 5/00 20060101
A61B005/00; G06K 9/66 20060101 G06K009/66 |
Government Interests
STATEMENT REGARDING GOVERNMENT INTEREST
[0001] This work was supported in part by grant number 1 K08
AR050007 from the National Institutes of Health. The U.S. Government
has certain rights in the invention.
Claims
1.-24. (canceled)
25. A method of assessing the disease state of a tumor comprising:
a. identifying one or more tumor imaging features from images from
a plurality of subjects; b. applying a module networks algorithm to
identify relationships between the one or more tumor imaging
features and biological data relating to the images; c.
constructing, based on the identified relationships, an association
map between the one or more tumor imaging features and the
biological data; and d. determining the disease state of a tumor
by visually inspecting the association map.
26. The method of claim 1 wherein the tumor imaging features are
associated with a disease.
27. The method of claim 1, wherein the identifying comprises
identifying one or more tumor imaging features based on frequency
of the one or more features in the images.
28. The method of claim 1, wherein the identifying comprises
identifying one or more tumor imaging features based on its
independence from other features.
29. The method of claim 1, wherein said identifying comprises
identifying one or more tumor imaging features from images obtained
using an imaging technique.
Description
TECHNICAL FIELD
[0002] The subject matter described herein relates to methods for
predicting disease risk, prognosis, and best treatment regimens in
clinical subjects. The methods involve evaluating a subject's
non-invasively obtained imaging features in view of an association
map that correlates imaging features with biological data.
BACKGROUND
[0003] Scientists and clinicians routinely use non-invasive imaging
to detail the physical and structural composition of living matter.
Assessing the genetic and biochemical makeup of living tissue
through non-invasive imaging is a desirable goal of current
research. Recent developments in genomic and proteomic methods have
enabled molecular profiling of biological specimens by
simultaneously revealing the expression level of thousands of genes
and proteins. For example, gene expression patterns of cancer can
reveal its etiology, prognosis, and therapeutic potential (Chung,
C. H. et al., Nat. Genet., 32 Suppl.:533-540 (2002); Segal, E. et
al., Nat. Genet., 37 Suppl.:S38-45 (2005); Chen, X. et al., Mol
Biol Cell, 13:1929-1939 (2002)).
[0004] Current methods of molecular profiling often require
invasive surgeries for tissue procurement and specialized
equipment, thus limiting their routine use. In some cases, current
profiling methods provide a single snapshot in time because they
are destructive by nature in that cells must be disintegrated to
extract nucleic acids or proteins for analysis. Another barrier to
widespread use of molecular profiling is that human tissues
exhibit diverse distinctive features on noninvasive radiographic
imaging, many of which currently have no known significance.
Because imaging features of tissues reflect the dynamic and
physiologic interplay of parenchymal cells, blood vessels, and
stroma, it would be desirable if imaging features could be used to
predict specific gene expression patterns in human diseases.
[0005] The foregoing examples of the related art and limitations
related therewith are intended to be illustrative and not
exclusive. Other limitations of the related art will become
apparent to those of skill in the art upon a reading of the
specification and a study of the drawings.
BRIEF SUMMARY
[0006] The following aspects and embodiments thereof described and
illustrated below are meant to be exemplary and illustrative, not
limiting in scope.
[0007] In one aspect, a method of constructing an association map
between imaging features and biological data is provided,
comprising: [0008] identifying one or more imaging features from a
plurality of images of a subject; [0009] applying an algorithm to
identify relationships between the one or more imaging features and
biological data relating to the subject, wherein the identified
relationships are used to construct an association map between the
one or more imaging features and the biological data; [0010]
evaluating the statistical significance of the association map to
test its predictive value.
[0011] In some embodiments, the features from a plurality of images
of a subject are associated with a disease.
[0012] In some embodiments, the identifying comprises identifying
one or more imaging features based on frequency of the one or more
features in the plurality of images.
[0013] In some embodiments, the identifying comprises identifying
one or more imaging features based on its independence from other
features.
[0014] In some embodiments, the identifying comprises identifying
one or more imaging features from images obtained using an imaging
technique selected from the group consisting of computerized
tomography imaging, magnetic resonance imaging (MRI), positron
emission tomography (PET), ultrasonography (US), optical imaging,
infrared imaging, and x-ray radiography. In particular embodiments,
the imaging technique comprises the use of an imaging agent or
image-enhancing agent.
[0015] In some embodiments, the applying comprises applying a
module networks algorithm.
[0016] In some embodiments, the applying comprises applying an
algorithm that applies an iterative Bayesian probabilistic
procedure that identifies combinations of imaging features that
relate to the biological data.
[0017] In some embodiments, the applying comprises applying an
algorithm to gene expression data.
[0018] In some embodiments, the gene expression data is from a DNA
microarray assay. In some embodiments, the gene expression data is
from a cDNA microarray assay. In some embodiments, the gene
expression data is from an RNA microarray assay.
[0019] In some embodiments, the applying comprises applying an
algorithm to protein expression data.
[0020] In some embodiments, the evaluating the statistical
significance of the association map comprises evaluating by
comparison of the map with permuted data sets.
[0021] In some embodiments, the evaluating the statistical
significance of the association map comprises evaluating by testing
the prediction using an independent biological data set,
independent images, or both.
[0022] In a related aspect, a method for predicting a gene or
protein expression level in a biological sample is provided,
comprising: [0023] providing an image of the biological sample,
[0024] comparing the image to an association map as above to
predict a gene or protein expression of the biological sample.
[0025] In some embodiments, the method further comprises, based on
the predicting, providing a treatment prognosis of said patient
based on the presence and/or absence of certain imaging
features.
[0026] In some embodiments, the providing comprises providing a
prediction of a patient's response to a drug. In some embodiments,
the providing comprises providing a prediction of a patient's
probable survival. In particular embodiments, the probable survival
is disease free survival.
[0027] In some embodiments, the providing comprises providing a
likelihood of disease recurrence.
[0028] In some embodiments, the providing comprises providing a
likelihood of metastasis.
[0029] In another aspect, an association map constructed using the
above method is provided.
[0030] In addition to the exemplary aspects and embodiments
described above, further aspects and embodiments will become
apparent by reference to the drawings and by study of the following
descriptions.
BRIEF DESCRIPTION OF THE DRAWINGS
[0031] FIGS. 1A-1C are computerized tomography (CT) images of
distinct features in human hepatocellular carcinomas (HCC), the
features referred to as internal arteries (FIG. 1A), hypodense halo
(FIG. 1B), and texture heterogeneity (FIG. 1C);
[0032] FIG. 1D illustrates a strategy for constructing an
association map between imaging features and gene expression;
[0033] FIG. 2A shows an overview of an association map of imaging
features and global gene expression, where each column is a sample;
each row is a module. For each module, a decision tree of imaging
features is associated with variation in the expression level of
module genes. Knowledge of the imaging features thus allows an
approximate reconstruction of the gene expression pattern.
[0034] FIG. 2B is a graph showing the cumulative fraction of gene
expression variation across the full complement of gene activities
that is predicted by the number of imaging features in the
model.
[0035] FIG. 2C shows a matrix of modules, associated imaging
features, and their enriched gene ontology annotations. Only
modules and annotations with significant enrichment (false
discovery rate <0.05 after accounting for multiple hypothesis
testing) are shown.
[0036] FIGS. 3A-3C show molecular portraits of HCC from imaging
features, where modules associated with HCC proliferation (FIG.
3A), liver synthetic function (FIG. 3B), and extracellular matrix
remodeling (FIG. 3C) are shown; each column is a tumor sample; each
row is a gene. Imaging features specifying each module are outlined
on top; expression pattern of genes within the module as
distinguished by imaging features are shown on bottom.
[0037] FIGS. 4A-4B show that imaging features predict venous
invasion and survival, where a two-feature decision tree associated
with a gene expression signature of venous invasion is shown to
predict histologic venous invasion (FIG. 4A), and Kaplan-Meier
survival curves of HCC patients with and without "internal
arteries" imaging feature are shown in FIG. 4B.
[0038] FIG. 5 is a Table showing examples of image features.
DETAILED DESCRIPTION
[0039] In one aspect, a method is provided wherein an image or one
or more imaging features is correlated to an association map of
imaging features and biological data. The method finds use in
various fields, including medical diagnostics and therapeutics. The
methods have use in clinical subject/patient disease screening,
diagnosis, characterization, and treatment selection.
[0040] The method is based on correlating biological data with
associated imaging data, to construct a bidirectional association
map, as will be illustrated below in Example 1. The biological data
for construction of the association map can be obtained from a
database or generated from patient biological samples. Databases of
polynucleotide and protein expression data are well known. Such
gene expression data can also be obtained, for example, using a DNA
microarray that surveys the expression levels of thousands of genes
simultaneously. For example, a 21-gene assay, termed Oncotype Dx,
is a commercially available DNA microarray to determine prognosis
and predict response of primary breast tumors to chemotherapy. A
70-gene signature known as Mammaprint is known for use in
determining an adjuvant chemotherapeutic regimen in primary breast
cancer. Gene expression signatures have also been identified to
predict prognosis or therapeutic response in lung cancer, leukemia,
and prostate cancer.
[0041] Data from any or all of these sources, preexisting or
generated for the purpose of building an association map, are
examples of biological data suitable for use in the method
described herein. It will be appreciated that the gene expression
data can be for any tissue source, such as cancerous tissue, tissue
associated with a malignant or benign growth, infected tissue,
inflamed tissue, and the like. Gene expression data may relate to
expression levels, splicing patterns, gene copy number, chromosomal
alterations (e.g., deletions, amplifications, inversions, and
translocations), single nucleotide polymorphisms, and the like.
Gene expression data include epigenetic data, e.g., relating to DNA
methylation and histone modifications (e.g. acetylation,
methylation, and ubiquitination). Gene expression data may be based
on analyses of DNA, cDNA, mRNA, snRNA, iRNA, or other nucleic
acids.
[0042] Biological data include data based on protein-based
analyses, including tissue protein expression profiles of different
tissues (e.g., cancerous, infected, inflamed, etc.).
Particular examples include biological data from Serial Analysis of
Gene Expression (SAGE), nuclear magnetic resonance,
protein-interaction screens, chromatin immunoprecipitation-chips,
isotope coded affinity tagging, activity based reagents, gel or
chromatographic separation, RNAi screens, tissue arrays, or mass
spectrometry. Data from any experiment or assay in which a large
number of genes, proteins, or metabolites are measured at once are
also contemplated. Biological data also include data from serological
tests, EKGs, EEGs, urinalysis, and other clinical and forensic
analyses.
[0043] As noted above, the method combines the association map with
imaging data. Such imaging data can be obtained from a wide variety
of sources, including but not limited to magnetic resonance imaging
(MRI), positron emission tomography (PET), computerized tomography
(CT), ultrasonography (US), optical imaging, infrared imaging, and
x-ray radiography. Imaging can be coupled with drugs or compounds,
contrast agents or other agents or stimuli, or medical devices to
elicit additional information from the imaging. Images are obtained
using these modalities applied to a tissue sample, a lesion, or an
organism imaged in whole or in part.
[0044] In a general embodiment, the method of constructing an
association map comprises providing a plurality of images of, for
example, a tissue or a whole or part of an organism, such as a
human subject, and biological data that has some relation to the
images. For example, images of a solid tumor would preferably be
accompanied by biological data based on the imaged solid tumor or
on a like solid tumor. That is, images of tumors in the thyroid or
images of infected tissue on a limb would have corresponding
biological data from thyroid tumors or infected limb tissue,
respectively. In a preferred embodiment, the image and the
biological data derive from the same tissue or organism; however, a
population of images and a population of biological data need not
have a one-to-one correspondence.
[0045] An exemplary association map relating to human
hepatocellular carcinoma is constructed by inspecting the imaging
data and identifying distinctive features in the image. Examples of
distinctive image features (or traits) for human hepatocellular
carcinomas are shown in FIGS. 1A-1C, where computerized tomography
(CT) images of features referred to as internal arteries (FIG. 1A),
hypodense halo (FIG. 1B), and texture heterogeneity (FIG. 1C) were
identified. As will be illustrated below (Example 1), the image or
images may be scrutinized to extract a certain feature or features
that inform gene expression. Such features include observations
related to morphology, composition, structure, and/or physiology.
Examples of distinct features that inform gene expression analyses
include tissue necrosis, tissue heterogeneity, tumor margin score,
internal septa, enhancement pattern, internal arteries, hypodense
halo, wash-out, wash-in, texture heterogeneity, capsule,
infiltration, and other imaging features familiar to artisans.
[0046] Such imaging features (and representative data) are
associated with a unique image, imaging study, examination, subject
or population, all of which are data relating to the image. Such
image data independently or in combination define elements or
components of the image, or the composite imaging appearance
itself, which are included in the biological data used to construct
an association map.
[0047] It will be appreciated that a single imaging feature may be
sufficient to add value to an association map; however, more (and
more detailed) features/data are generally preferred.
[0048] In some embodiments, the method of constructing an
association map further includes one or both of (i) using an
algorithm to identify relationships between one or more imaging
features and the biological data and/or (ii) evaluating the
statistical significance of the association map.
[0049] With respect to (i), algorithms that identify relationships
between the imaging features and the biological data are known in
the art, and such identified relationships form the basis for
constructing an association map between such imaging features and
biological data. For example, a module network algorithm is
suitable for use (Segal, E. et al., Nat. Genet., 34:166-176 (2003))
wherein the algorithm identifies groups of genes, termed modules,
which demonstrate coherent variation in expression across multiple
samples. This algorithm further applies an iterative Bayesian
probabilistic analysis to identify combinations of imaging
features that can predict the expression levels of gene
modules.
[0050] As used herein, Bayesian probabilistic analysis refers
broadly to a genus of related models and their derivatives.
Multiple regression analysis and other analyses are known in the
art. Classification algorithms such as neural networks, support
vector machines, decision trees, Markov networks, and their
derivatives may be applied. An exemplary analysis involves
application of the Cox proportional hazard model. Other algorithms
that can identify multi-way relationships may also be used.
[0051] With respect to (ii), evaluating the statistical
significance of the association map ensures that the map is
applicable to, and predictive for, images and/or biological data
that were not used in the construction of the map. Such statistical
analysis thereby provides a means to validate the association map
as being generally applicable (i.e., generalizable) to other images
and biological data.
[0052] For example, when two large biological data sets are
compared, many apparent associations will occur by chance alone.
These spurious associations are not useful, and in fact interfere
with the identification of significant (i.e., "real" or "actual")
associations that have predictive value. Thus, a feature of some
embodiments of the present method is confirmation of the
statistical significance and predictive value of the association
map.
[0053] Statistical significance can be evaluated in several ways,
for example, by comparing the actual/observed association map with
theoretical maps derived from modified/permuted data sets, e.g.,
where the imaging features and biological data have been scrambled.
Observation of the same image feature-biological data association
at equal frequency using such scrambled data strongly suggests
that the image feature or gene module is noisy and
non-specific.
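The comparison with permuted data described above can be sketched as a simple permutation test. This is a minimal illustration, not the procedure used to build the association map: the scoring function (difference in mean expression between samples with and without a feature), the function names, and the toy values are all assumptions introduced here.

```python
import random

def mean(values):
    return sum(values) / len(values)

def association_score(feature, expression):
    """Absolute difference in mean expression between samples with
    and without the imaging feature (an assumed, simplified score)."""
    present = [e for f, e in zip(feature, expression) if f]
    absent = [e for f, e in zip(feature, expression) if not f]
    return abs(mean(present) - mean(absent))

def permutation_p_value(feature, expression, n_permutations=2000, seed=0):
    """Fraction of scrambled data sets whose score is at least as
    large as the observed score."""
    rng = random.Random(seed)
    observed = association_score(feature, expression)
    shuffled = list(feature)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # scramble feature labels across samples
        if association_score(shuffled, expression) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)

# Toy values for illustration only (not data from the study).
feature = [1, 1, 1, 1, 0, 0, 0, 0]
expression = [2.1, 1.9, 2.3, 2.0, 0.2, -0.1, 0.1, 0.0]
p = permutation_p_value(feature, expression)
```

A small p indicates that the observed association rarely arises once the feature labels are scrambled; an association that appears about as often in scrambled data yields a p near 1.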
[0054] In addition, statistical significance and predictive value
can be evaluated by cross-validation, of which leave-one-out
analysis is one form. This means that an association map is
constructed on some fraction of the subject biological data or image
features, and the resulting map is used to predict the outcome in
the remaining subjects not used to "train" the algorithm. In
practice, half, ten percent, or a single individual can be left out
as the test, and the procedure is iterated until each individual
subject in the data set has been used both as the test and for
training. Such iterative learning procedures may be a component of
the module network algorithm, described above.
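The leave-one-out case can be sketched as follows. The predictor here is a deliberately simple stand-in (the mean expression of training samples that share the held-out sample's feature value), chosen only to show the iteration over held-out subjects; it is an assumption, not the module network procedure, and the values are illustrative.

```python
def leave_one_out_error(feature, expression):
    """Mean squared error when each sample is predicted from a model
    built on all of the other samples."""
    squared_errors = []
    for i in range(len(feature)):
        # Training set: every sample except the held-out one.
        train = [
            (f, e)
            for j, (f, e) in enumerate(zip(feature, expression))
            if j != i
        ]
        same = [e for f, e in train if f == feature[i]]
        # Fall back to the overall training mean if no training
        # sample shares the held-out sample's feature value.
        pool = same if same else [e for _, e in train]
        prediction = sum(pool) / len(pool)
        squared_errors.append((expression[i] - prediction) ** 2)
    return sum(squared_errors) / len(squared_errors)

# Toy values for illustration only.
expression = [2.0, 2.0, 0.0, 0.0]
informative = leave_one_out_error([1, 1, 0, 0], expression)
uninformative = leave_one_out_error([1, 0, 1, 0], expression)
```

A feature that tracks expression gives a low held-out error, while a feature unrelated to expression does not, which is the sense in which cross-validation tests predictive value.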
[0055] Finally, the most robust method for confirming statistical
significance and predictive value is to test the association map
against a completely independent set of subjects. Because the
association map has not been trained on the new set of patients,
the ability of the map to predict the outcomes in the test set
provides strong evidence that the association map is
generalizable--meaning that the map can be used to give diagnostic
and prognostic information on most, if not all, future
subjects.
[0056] An approach of constructing an association map is
illustrated in Example 1 using imaging features on three-phase
contrast-enhanced CT and gene expression patterns of 28 human
hepatocellular carcinomas (HCC). As will become apparent, global
gene expression patterns of human cancers are encoded in their
dynamic imaging features. In order to relate gene expression to
imaging, distinctive features from qualitative imaging were
identified, and coherent patterns of variation from gene expression
profiles were defined.
[0057] In another aspect, methods for using an association map
constructed as described above, and as exemplified in Example 1,
are provided. In one embodiment, the association map is used to
guide treatment or provide a diagnosis of a subject. For example,
an image of a tumor in a subject, such as a brain, breast, lung, or
prostate tumor, can be viewed in light of the association map to
inform the clinician of the gene or protein expression of the
patient. Knowledge of the gene or protein expression profile, i.e.,
molecular based information, about the patient informs the
clinician about a patient's likely response to a drug, probability
of relapse, survival rate, disease free survival, and the like.
Such information will guide the treatment regimen, including the
drug selection, dose, dosing regimen, and whether additional
treatments should be considered, such as radiotherapy or tumor
resection. Thus, a noninvasive image of a patient informs the
clinician of molecular information useful in guiding treatment.
[0058] While the methods have been exemplified mainly using disease
conditions, the methods can also be used for preventative medicine,
in which case the biological data, with indeterminate image data,
may suggest further imaging to be performed on a subject, e.g., to
watch for likely diseases or conditions. This situation would
arise, for example, when a subject was at risk for a disease, based
on genetic data, lifestyle data, and laboratory tests but the
presence of the disease could not be definitively shown by imaging
or other methods.
[0059] Association maps are also suited for use in predicting
subject outcome. Gene expression data or sequence variation
patterns that predict treatment response to particular therapies
are reported in the medical literature. For example, subjects with
breast cancer that express particular cell surface receptors, such
as HER2, are more responsive to certain chemotherapeutic agents
than subjects that do not express those receptors.
Thus, an image of a tumor or other diseased tissue in a subject,
viewed in light of an association map, can be used to predict
response to a selected treatment.
[0060] It will also be appreciated that association maps can be
constructed from images and biological data generated or gathered
solely for this purpose, or another particular purpose. For
example, images of patients that were not responsive to a
particular drug and biological data from the subjects can be used
to build an association map.
[0061] An association map between imaging and biological data can
also be used to design a targeted therapeutic treatment regimen for
a patient, providing a personalized care program. Based on an image
of a tumor viewed in light of an association map for that tumor
type, information about the gene and/or protein expression of the
tumor can be determined. Understanding the tumor cell surface
receptors permits selection of targeting agents, such as antibody
fragments or other agents that have binding specificity for
particular cell surface receptors, that can guide or direct a drug
to the tumor cell. The targeting agent can be attached directly to
the drug, or attached to a carrier for the drug, such as a
liposome.
[0062] It will be appreciated that the method described herein can
be accompanied, if desired, by additional clinical information for
a patient, such as a
III. EXAMPLES
[0063] The following examples are illustrative in nature and are in
no way intended to be limiting.
Materials and Methods
[0064] Imaging features/traits. One hundred thirty eight (138)
distinct imaging features that were present in at least one tumor
sample were defined and were scored across all tumor samples.
Features were selected a priori based on intrinsic radiological
interest (e.g., internal arteries and hypodense halos). Features
were also filtered based on their frequency and prominence in the
data, inter-observer agreement and independence from other features
based on Pearson correlation (cut off value of 0.9). Thirty-two
(32) imaging features were used as input in the Bayesian model, and
28 of 32 were found to be informative of gene expression (FIG.
5).
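The independence filter can be sketched as a greedy pass that keeps a feature only if its absolute Pearson correlation with every feature already kept is below the cutoff. The greedy order, the feature names, and the scores below are assumptions for illustration; the source states only that a Pearson correlation cutoff of 0.9 was used.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length score lists.
    Assumes neither list is constant (nonzero variance)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def filter_redundant(features, cutoff=0.9):
    """Keep each feature only if |r| with all kept features < cutoff."""
    kept = {}
    for name, scores in features.items():
        if all(abs(pearson(scores, other)) < cutoff for other in kept.values()):
            kept[name] = scores
    return kept

# Hypothetical feature scores across six tumors (illustration only).
features = {
    "internal_arteries": [0, 1, 1, 0, 1, 0],
    "internal_arteries_rank": [0, 2, 2, 0, 2, 0],  # redundant with the first
    "hypodense_halo": [1, 1, 0, 0, 0, 1],
}
kept = filter_redundant(features)
```

In this toy input the second feature is a rescaled copy of the first and is dropped, while the weakly correlated third feature is retained.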
[0065] Microarray data. Gene expression profiles of imaged HCCs
were downloaded from Stanford Microarray Database, which is
available via the Stanford website. Data from array elements that
had hybridization signal over background by 1.5 fold in both Cy5
and Cy3 channels and were present in 70% of samples were centered
by mean across samples. Data from replicate probes representing the
same gene (as determined by LocusLink ID) were averaged. 6732 genes
met these criteria for data quality and were used for subsequent
analysis.
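The filtering, centering, and replicate-averaging steps can be sketched as below. The sketch assumes the 1.5-fold signal filter has already been applied upstream, with failing measurements marked as None, and it reads "present in 70% of samples" as at least 70%. The function names, gene identifiers, and values are hypothetical.

```python
def average_present(values):
    """Mean of the non-missing entries, or None if all are missing."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None

def preprocess(probes, min_present=0.7):
    """probes: list of (gene_id, values), one value per sample, with
    None where the spot failed the signal-over-background filter."""
    by_gene = {}
    for gene_id, values in probes:
        present = [v for v in values if v is not None]
        # Drop probes measured in too few samples.
        if len(present) / len(values) < min_present:
            continue
        # Center each probe by its mean across samples.
        center = sum(present) / len(present)
        centered = [None if v is None else v - center for v in values]
        by_gene.setdefault(gene_id, []).append(centered)
    # Average replicate probes of the same gene, sample by sample.
    return {
        gene_id: [average_present(column) for column in zip(*rows)]
        for gene_id, rows in by_gene.items()
    }

# Hypothetical probes (illustration only).
probes = [
    ("GENE_A", [1.0, 3.0, None, 2.0]),
    ("GENE_A", [2.0, 4.0, 3.0, 3.0]),
    ("GENE_B", [1.0, None, None, 2.0]),  # present in only 50% of samples
]
profiles = preprocess(probes)
```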
[0066] Module network. A module network procedure previously
developed was applied (Segal, E. et al., Nat. Genet., 34:166-176
(2003)) to construct an association map between imaging features
and gene expression profiles. The module network procedure takes as
input gene expression data and a set of potential regulatory
inputs, and attempts to partition the expression data into distinct
and mutually exclusive modules, such that the genes assigned to each
module can be well predicted by a small decision tree of
regulatory inputs. The regulatory inputs were set to be the
real-valued imaging features and were applied to the expression
data described above. The 116 imaging networks can be interactively
searched (Segal et al. (2007) Nat. Biotechnol. 25:675-80).
[0067] Module enrichment in Gene Ontology annotations. Significance
of overlap between genes in modules and gene ontology annotations
was calculated by comparison to the degree of overlap expected by
chance alone using the hypergeometric distribution. Multiple
hypothesis testing was accounted for by calculating a false
discovery rate, and results with FDR<0.05 are presented.
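The enrichment calculation can be sketched as an upper-tail hypergeometric probability followed by a false discovery rate adjustment. The source does not say which FDR procedure was used; the Benjamini-Hochberg adjustment below is an assumption, as are the function names and the example numbers.

```python
from math import comb

def hypergeom_p(overlap, module_size, annotated, total):
    """P(X >= overlap) when module_size genes are drawn from total
    genes, of which `annotated` carry the annotation."""
    upper = min(module_size, annotated)
    favorable = sum(
        comb(annotated, k) * comb(total - annotated, module_size - k)
        for k in range(overlap, upper + 1)
    )
    return favorable / comb(total, module_size)

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Illustrative numbers only: a module of 2 genes, both annotated,
# drawn from 4 genes of which 2 carry the annotation.
p_overlap = hypergeom_p(2, 2, 2, 4)
q_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
significant = [q < 0.05 for q in q_values]
```

Only module-annotation pairs whose adjusted value falls below 0.05 would be reported under this scheme.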
[0068] Mapping venous invasion genes to imaging features. To find
imaging features that correspond to the set of 91 genes associated
with venous invasion, seven (7) modules that were significantly
enriched for these genes were identified using the hypergeometric
distribution as described above. The associated imaging feature
trees of the 7 modules were analyzed (Table, below), and two
features, internal arteries and halos, were found to be
overrepresented among the top splits. To identify the consensus
threshold of applying these features for this purpose, the p-value
weighted average of the splits from the 7 image feature trees was
calculated. The consensus thresholds were used for the imaging
feature decision tree of FIG. 4A.
TABLE-US-00001 TABLE Venous Invasion Module Analysis
Imaging Trait | Node Level | Frequency | Module |
Internal Arteries, Density | 1 | 4 | 595, 720, 651, 773 |
Hypodense Halo | 1 | 2 | 479, 556 |
Tumor--Liver Difference, Minimum | 1 | 1 | 697 |
Tumor Margin Score, Maximum | 2 | 2 | 720, 773 |
Attenuation Heterogeneity, Maximum | 2 | 2 | 595, 697 |
Internal Arteries, Rank | 2 | 1 | 556 |
Internal Septa | 2 | 1 | 651 |
Tumor Margin Score, Minimum | 2 | 1 | 479 |
Tumor Margin Score, Minimum | 3 | 3 | 773, 556, 720 |
Wash-out, Maximum | 3 | 1 | 651 |
Necrosis, Density | 3 | 1 | 595 |
Tumor Margin Score, Maximum | 3 | 1 | 697 |
Attenuation Heterogeneity, Maximum | 4 | 1 | 651 |
The position (node level) of each imaging feature/trait used to
construct the decision trees that predict the 7 venous invasion
modules, and the frequency of occurrence at each node level, are
displayed. Internal Arteries, followed by Hypodense Halos, are
over-represented in the imaging networks at the top node level
and were thus used to construct the venous
invasion predictor.
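The consensus threshold for a feature can be sketched as a weighted average of the split thresholds from the individual trees. The source says only that the average was "p-value weighted"; weighting each split by -log10(p), so that more significant splits count more, is an assumption made here, and the thresholds and p-values are hypothetical.

```python
import math

def consensus_threshold(splits):
    """splits: list of (threshold, p_value) pairs for one imaging
    feature, one pair per tree in which the feature is a split.
    Each split is weighted by -log10(p) (an assumed scheme)."""
    weights = [-math.log10(p) for _, p in splits]
    weighted_sum = sum(w * t for w, (t, _) in zip(weights, splits))
    return weighted_sum / sum(weights)

# Hypothetical splits on one feature (illustration only).
threshold = consensus_threshold([(1.0, 0.001), (4.0, 0.1)])
```

With these inputs the more significant split (p = 0.001) carries three times the weight of the other, pulling the consensus toward its threshold.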
[0069] Clinical data analysis. Microscopic venous invasion status
on histologic analysis was available for 30 patients in the
training set and 32 patients in the test set. Within each data set,
patients were partitioned into two groups based on the two-feature
decision tree ("internal arteries" and "hypodense halos" on CT
scan, FIG. 4A). Significance of association between the two-feature
imaging groups and histologic venous invasion was calculated using
two-by-two contingency tables and the chi-square test. Overall survival
data were available for 23 patients in the training set and 32
patients in the test set; only patients with clear surgical margin
after HCC resection were used in this analysis. Within each data
set, patients were partitioned based on the presence or absence of
the "internal arteries" feature on CT scan, and survival analysis
by the method of Kaplan and Meier for the two groups of patients
was implemented in Winstat (R. Fitch Software, Bad Krozingen,
DE).
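The two statistical steps in this paragraph can be sketched in a few lines. The chi-square statistic below is the Pearson form without a continuity correction (the source does not specify one), and the Kaplan-Meier function is the standard product-limit estimator rather than the Winstat implementation. All counts and follow-up times are hypothetical.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value for a 2x2 table
    (1 degree of freedom, no continuity correction)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    statistic = n * (a * d - b * c) ** 2 / denominator
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

def kaplan_meier(times, events):
    """Product-limit survival estimate at each distinct event time.
    events[i] is 1 for an observed death and 0 for censoring."""
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored subjects leave the risk set
    return curve

# Hypothetical counts: rows = imaging group, columns = invasion (yes, no).
statistic, p_value = chi_square_2x2([[10, 2], [3, 15]])
# Hypothetical follow-up times; the second subject is censored.
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

In practice one survival curve would be computed for each group (with and without the "internal arteries" feature) and the two curves compared.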
[0070] Construction of Association Map. In this example, a
three-step strategy was used to create an "association map" between
imaging features and gene expression patterns. More particularly, an
association map between imaging features on three-phase
contrast-enhanced CT and gene expression patterns of 28 human
hepatocellular carcinomas (HCC; Chen, X. et al., Mol. Biol. Cell,
13:1929-1939 (2002)) was constructed, as shown in FIG. 1D. In the
analysis, 138 distinctive imaging features present in one or more
HCCs were defined and quantified. To identify informative features,
features were filtered based on their frequency and prominence in
the data, inter-observer agreement between two radiologists, and
independence from other features as determined by Pearson
correlation among the features (r=0.9). Thirty-two imaging features
were judged most promising by these criteria and used for
subsequent analysis (FIG. 5). For instance, and with reference to
FIGS. 1A-1C, channels of radio-dense signal within certain tumors
on the arterial phase of the CT scan were noted, and this feature
was termed "internal arteries".
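The independence criterion described above can be sketched, by way
of example only, as a redundancy filter based on Pearson
correlation. The sketch assumes that r=0.9 acts as a cutoff above
which a feature is treated as redundant with a feature already
kept; that reading, the function names, and the greedy order of
evaluation are assumptions and are not taken from the analysis
itself.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant feature carries no correlation information
    return cov / math.sqrt(var_x * var_y)


def drop_redundant(features, threshold=0.9):
    """Keep a feature only if |r| with every kept feature is below threshold.

    features -- dict mapping feature name to its scores across tumors
    threshold -- assumed cutoff (0.9 here), not confirmed by the source
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept
```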
[0071] Next, a module networks algorithm (Segal, E. et al., Nat.
Genet., 34:166-176 (2003)) was adopted to systematically search for
associations between expression levels of 6732 well-measured genes
determined by microarray analysis (Chen, X. et al., Mol. Biol. Cell,
13:1929-1939 (2002)) and combinations of imaging features. The
algorithm identifies groups of genes, termed modules, which
demonstrate coherent variation in expression across multiple
samples. The algorithm further applies an iterative Bayesian
probabilistic procedure to identify combinations of imaging
features that can predict the expression levels of gene modules. An
end result is identification of specific networks of imaging
features that predict the expression level of gene modules. Each
network of imaging features predicts the expression level of one
gene module.
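The notion of an imaging feature that predicts the expression
level of a gene module can be illustrated with a greatly simplified
sketch. The code below is not the module networks algorithm, which
uses an iterative Bayesian procedure; it merely selects, for one
module, the single present/absent imaging feature whose split of
the samples leaves the smallest within-group sum of squares. All
names and the scoring rule are assumptions made for illustration.

```python
def sum_sq(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(features, module_expr):
    """Pick the feature that best separates samples by module expression.

    features    -- dict mapping feature name to 0/1 flags, one per sample
    module_expr -- mean expression of the module's genes, one per sample
    Returns (feature_name, residual_sum_of_squares), or None if no
    feature divides the samples into two non-empty groups.
    """
    best = None
    for name, present in features.items():
        with_feature = [e for p, e in zip(present, module_expr) if p]
        without_feature = [e for p, e in zip(present, module_expr) if not p]
        if not with_feature or not without_feature:
            continue  # feature does not split the samples
        score = sum_sq(with_feature) + sum_sq(without_feature)
        if best is None or score < best[1]:
            best = (name, score)
    return best
```

Applying such a selection recursively within each resulting group
would yield a tree of imaging features for the module.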
[0072] Next, statistical significance of the association map was
validated by comparison with permuted data sets, and also by
testing the prediction of the association map in an independent set
of tumors.
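Comparison with permuted data sets can be sketched, by way of
example only, as a label-permutation test. In the analysis
described herein the compared quantity was the log-likelihood of
the association map; the sketch below substitutes a much simpler
statistic (the absolute difference in mean expression between
samples with and without a feature) purely for illustration. The
function names, the default number of permutations, and the seed
are assumptions.

```python
import random


def mean_difference(labels, values):
    """Absolute difference in mean value between the two label groups."""
    group_a = [v for l, v in zip(labels, values) if l]
    group_b = [v for l, v in zip(labels, values) if not l]
    return abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))


def permutation_p_value(labels, values, n_perm=1000, seed=0):
    """Empirical p-value from shuffling sample labels.

    Both label groups must be non-empty. Shuffling preserves the group
    sizes, so only the assignment of labels to samples changes.
    """
    rng = random.Random(seed)
    observed = mean_difference(labels, values)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mean_difference(shuffled, values) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```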
[0073] The association map of imaging features and gene expression
revealed that a surprisingly large fraction of the gene expression
program can be reconstructed from a small number of imaging
features, as seen in FIGS. 2A-2B. The expression variation in 6732
genes was captured by 116 gene modules, each of which was
associated with specific combinations of imaging features. For each
module, presence or absence of combinations of imaging features
predicted the aggregate expression level of genes within the module
(FIG. 2A). The combinations of relevant imaging features are
depicted in decision trees: each split in the tree is specified by
variation of an imaging feature; each terminal leaf in the tree is
a cluster of samples that share similar expression pattern of
module genes. Thus, the association map allowed one to predict the
relative expression level of a gene (by mapping to a module) in a
given HCC sample (by mapping to a cluster).
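The two mappings described above (gene to module, and sample to
cluster) can be sketched as a simple lookup followed by a walk down
a decision tree. The tree layout, the gene and module names, and
the leaf values below are hypothetical and serve only to show the
structure of the prediction.

```python
def predict(tree, sample):
    """Walk a decision tree until a terminal leaf (cluster) is reached.

    tree   -- nested dicts; inner nodes have "feature", "present" and
              "absent" keys, leaves have a "leaf" key holding the
              cluster's relative expression level
    sample -- dict mapping imaging feature name to 0/1
    """
    node = tree
    while "leaf" not in node:
        branch = "present" if sample[node["feature"]] else "absent"
        node = node[branch]
    return node["leaf"]


# Hypothetical module tree and gene assignment, for illustration only.
module_trees = {
    "module_A": {
        "feature": "internal_arteries",
        "present": {
            "feature": "hypodense_halo",
            "present": {"leaf": 0.1},
            "absent": {"leaf": 1.4},
        },
        "absent": {"leaf": -0.8},
    }
}
gene_to_module = {"GENE_X": "module_A"}

sample = {"internal_arteries": 1, "hypodense_halo": 0}
level = predict(module_trees[gene_to_module["GENE_X"]], sample)
```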
[0074] The hierarchical combination of only 28 imaging features was
sufficient to predict the variation of all 116 gene modules. As
shown in FIG. 2B, only nine features were sufficient to predict the
expression patterns of 50% of the full complement of gene
activities, and the prediction plateaus to above 80% of the full
complement of gene activities with more than 23 features. For each
gene, the number of features needed to predict its variation was on
average three and no more than four in any instance. The
association of imaging features and gene expression was highly
significant by several independent statistical criteria.
Specification of the entire module network involved 355 splits
based on imaging features. The average gene expression level
differed significantly between the two sides of the split in 299
of 355 splits (p<0.05 after applying the conservative Bonferroni
correction), accounting for 5282 of 6732 input genes (78.5%).
Comparison of the observed association map of imaging features and
gene expression with maps derived from data sets with permuted
sample labels confirmed that the predictive power of imaging
features for expression patterns was highly unlikely to be due to
chance alone. The log-likelihood was -18 per microarray, compared
to only -23±0.1 expected by chance (10 permutations; p<10^-50).
Thus, the variation in gene expression is densely encoded by a
small number of imaging features. Once discovered, such "coding"
image features can be quickly used to translate visual images into
the underlying gene expression.
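The Bonferroni correction applied to the splits can be sketched as
follows. With 355 splits tested at an overall level of 0.05, each
unadjusted p-value must fall below 0.05/355 (about 0.00014) to be
called significant. The function name and the example p-values are
hypothetical.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag each test as significant after Bonferroni correction.

    Each p-value is multiplied by the number of tests (capped at 1.0)
    and compared with alpha.
    """
    m = len(p_values)
    return [min(1.0, p * m) < alpha for p in p_values]


# Hypothetical unadjusted p-values for three splits, illustration only.
flags = bonferroni_significant([1e-5, 1e-3, 0.04])
```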
[0075] Using the association map, imaging features predictive of
expression level of specific genes are directly revealed, and the
potential physiologic significance of many imaging features can be
inferred from their associated genes. The distribution of genes
into modules defined by imaging features was not random, but was
highly enriched for specific and diverse biological functions and
processes. Comparison of gene membership in modules versus the
published Gene Ontology annotation (Ashburner, M. et al., Nat.
Genet., 25:25-29 (2000)) revealed significant overlaps, as shown in
FIG. 2C, allowing many key physiologic properties of tumors to be
gleaned from CT images. For example, three image features predicted
the expression level of module 697 that is highly enriched in genes
involved in cell proliferation, including PCNA, cyclin A, MCM5,
MCM6, and geminin, as shown in FIG. 3A. In addition, expression
level of VEGF, an important driver of tumor angiogenesis and target
of the approved chemotherapy drug bevacizumab (Kerr, D. J., Nat.
Clin. Pract. Oncol., 1:39-43 (2004)), co-varies with these cell
cycle genes and is predicted by the same imaging features, as seen
in FIG. 3A.
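The overlap between the genes of a module and the genes carrying a
given Gene Ontology annotation can be scored, for example, with a
hypergeometric tail probability. The test actually used for the
comparison is not stated herein, so the choice of the
hypergeometric test, the function name, and the parameter names are
assumptions made for illustration.

```python
from math import comb


def hypergeom_enrichment_p(total_genes, annotated, module_size, overlap):
    """Probability of seeing at least `overlap` annotated genes in a module.

    Models the module as module_size genes drawn without replacement
    from total_genes genes, of which `annotated` carry the annotation.
    """
    upper = min(annotated, module_size)
    tail = sum(
        comb(annotated, k) * comb(total_genes - annotated, module_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(total_genes, module_size)
```

A small probability indicates that the module contains more genes
with the annotation than would be expected from a random grouping
of the same size.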
[0076] Thus, in one embodiment, the association provides a method
for non-invasively delineating a molecularly distinct subset of
tumors for a targeted therapeutic strategy. For example, the liver
synthetic function of HCC patients is an important guide of disease
severity (Thomas, M. B. et al., J. Clin. Oncol., 23:8093-8108
(2005)), and this information is evident in module 595, which
details the expression level of albumin, pyruvate kinase,
transferrin receptor 2, as well as revealing clotting function
(thrombin, factor V, factor X), and detoxification activity (GSTO1,
CYP27A1, epoxide hydroxylase), as seen in FIG. 3B.
[0077] It will also be appreciated that identity of genes in a
module can reveal the physiologic basis of an imaging feature. The
imaging feature "Tumor Margin Score, Minimum" denotes tumors that
show an ill-defined transition zone between tumor and surrounding
liver tissue. It was found that the presence of this feature was
associated with elevated expression of a group of genes associated
with extracellular matrix remodeling, such as MMP2, MMP7, COL3A1,
COL6A2, and thrombospondin 1 and thrombospondin 2, as seen in FIG.
3C. Several of these genes, notably MMP2 (Giannelli, G. et al.,
Int. J. Cancer, 97:425-431 (2002); Qin, L. X. et al., World J.
Gastroenterol., 8:385-392 (2002)) and thrombospondin (Qin, L. X. et
al., World J. Gastroenterol., 8:385-392 (2002); Poon, R. T. et al.,
Clin. Cancer Res., 10:4150-4157 (2004)) are known to increase tumor
invasiveness into surrounding stroma, which may lead to the poor
demarcation of tumor margins on CT imaging.
[0078] The association map also enables systematic mapping of a
predetermined group of genes to their corresponding imaging
features. A group of 91 genes whose expression variation is
associated with microscopic venous invasion has been identified
(Chen, X. et al., Mol. Biol. Cell, 13:1929-1939 (2002)).
Microscopic venous invasion is a well-established sign of poor
prognosis (Thomas, M. B. et al., J. Clin. Oncol., 23:8093-8108
(2005)) that is extremely difficult to predict using conventional
imaging methods in the absence of gross venous invasion. Here, the
91 genes in the "venous invasion signature" were enriched in 7
modules and associated with two predominant imaging features: the
presence of "Internal Arteries" and the absence of "Hypodense
Halos", as seen in FIG. 4A and FIG. 5.
Therefore, whether this pair of imaging features, as observed
during the pre-operative CT scan, predicted the occurrence of
microscopic venous invasion on histologic analysis was evaluated.
In 30 patients with HCC, tumors with this combination of imaging
features had a twelve-fold increased risk of microscopic venous
invasion (p=0.004).
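The two-by-two contingency analysis underlying such a risk estimate
can be sketched as follows. The sketch computes the Pearson
chi-square statistic without continuity correction, its p-value for
one degree of freedom, and the relative risk. The function name and
the counts in the example are hypothetical; they are not the
patient counts of the study.

```python
import math


def two_by_two(a, b, c, d):
    """Chi-square test and relative risk for a 2x2 table.

    a -- feature pair present, venous invasion present
    b -- feature pair present, venous invasion absent
    c -- feature pair absent,  venous invasion present
    d -- feature pair absent,  venous invasion absent
    Assumes no row or column total is zero and c > 0.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    relative_risk = (a / (a + b)) / (c / (c + d))
    return chi2, p_value, relative_risk


# Hypothetical counts, for illustration only.
chi2, p_value, relative_risk = two_by_two(8, 2, 2, 8)
```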
[0079] The predictive value of the two-feature predictor of venous
invasion was validated in an independent set of 32 patients that
were not used for training the association map (FIG. 4A, p=0.03).
The presence of the feature "Internal Arteries" in the
pre-operative CT scan of HCCs was a significant univariate
predictor of overall survival in both groups of patients, as seen
in FIG. 4B. Thus, the association map can identify novel imaging
features corresponding to gene expression signatures and provide
useful information to guide clinical decision making.
[0080] In summary, the global gene expression profiles of liver
cancer are embodied in their imaging features. The systematic
association between imaging features and gene expression allowed
useful inference from both directions: on one hand, the association
map identified biological processes, based on specific gene
expression programs, which underlie specific imaging features. On
the other hand, the association map enabled the use of imaging
features to reconstruct the global gene expression programs of
cancer, thereby creating a noninvasive "molecular portrait" of the
tumor (FIGS. 3A-3C). The utility of this approach was shown by
identifying and validating a two-feature predictor of venous
invasion in HCC (FIG. 4). Moreover, the "Internal Arteries" feature
that
emerged from this analysis was a significant predictor of survival
in two independent groups of patients. These results demonstrate
that existing imaging technology may be used to reconstruct the
molecular anatomy of disease, such as cancer, in a noninvasive
fashion. The examples and data set forth herein using liver cancer
as an exemplary disease illustrate the robustness of the method.
Canonical association maps constructed from large representative
series of tumors will enable routine noninvasive diagnosis of
genetically heterogeneous tumors, reveal their prognosis, and allow
serial profiling of tumors during therapy. This type of
imaging-based molecular profiling permits personalized medicine.
[0081] While a number of exemplary aspects and embodiments have
been discussed above, those of skill in the art will recognize
certain modifications, permutations, additions and sub-combinations
thereof. It is therefore intended that the following appended
claims and claims hereafter introduced are interpreted to include
all such modifications, permutations, additions and
sub-combinations as are within their true spirit and scope.
* * * * *