U.S. patent application number 11/141258 was filed with the patent office on 2005-05-31 and published on 2005-12-08 for method and system for profiling biological systems.
Invention is credited to Adourian, Aram S., Afeyan, Noubar B., Neumann, Eric K., Oresic, Matej, Regnier, Frederick E., van der Greef, Jan, Verheij, Elwin Robbert.
Application Number | 20050273275 11/141258 |
Family ID | 23210073 |
Filed Date | 2005-05-31 |
United States Patent
Application |
20050273275 |
Kind Code |
A1 |
Afeyan, Noubar B. ; et
al. |
December 8, 2005 |
Method and system for profiling biological systems
Abstract
The present invention provides methods and systems for
developing profiles of a biological system based on the discernment
of similarities, differences, and/or correlations between
biomolecular components, of a single biomolecular component type,
of a plurality of biological samples. Preferably, the method
comprises utilizing hierarchical multivariate analysis of
spectrometric data at one or more levels of correlation.
Inventors: |
Afeyan, Noubar B. (Lexington, MA);
van der Greef, Jan (Driebergen-Rijsenburg, NL);
Regnier, Frederick E. (W. Lafayette, IN);
Adourian, Aram S. (Woburn, MA);
Neumann, Eric K. (Lexington, MA);
Oresic, Matej (Espoo, FI);
Verheij, Elwin Robbert (Zeist, NL) |
Correspondence
Address: |
KIRKPATRICK & LOCKHART NICHOLSON GRAHAM LLP
(FORMERLY KIRKPATRICK & LOCKHART LLP)
75 STATE STREET
BOSTON
MA
02109-1808
US
|
Family ID: |
23210073 |
Appl. No.: |
11/141258 |
Filed: |
May 31, 2005 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
11141258 | May 31, 2005 | |
10218880 | Aug 13, 2002 | |
60312145 | Aug 13, 2001 | |
Current U.S.
Class: |
702/22 |
Current CPC
Class: |
G16B 40/10 20190201;
H01J 49/0036 20130101; Y10T 436/24 20150115; G16B 40/00
20190201 |
Class at
Publication: |
702/022 |
International
Class: |
G01N 031/00 |
Claims
1-44. (canceled)
45. A method of profiling a state of a biological system in an
animal, the method comprising the steps of: (a) providing a
plurality of data sets derived from a biological sample type, the
plurality of data sets comprising different types of measurements
of a sample of a biological system; (b) preprocessing the plurality
of the data sets to generate preprocessed data sets; (c) evaluating
the preprocessed data sets with a multivariate analysis and
correlating features among the preprocessed data sets to determine
one or more sets of differences among at least a portion of the
plurality of data sets; (d) generating a profile of one or more
biomarkers in response to one or more correlations, the profile
characterizing a state of the biological system; and (e) displaying
at least a portion of the data relevant to the profile.
46. The method of claim 45 wherein the step of preprocessing
comprises the step of normalizing the plurality of the data
sets.
47. The method of claim 45 wherein the step of preprocessing
comprises the step of reducing the noise of the plurality of the
data sets.
48. The method of claim 45 wherein the step of preprocessing
comprises the step of applying a partial linear fit technique to
the plurality of the data sets.
49. The method of claim 45, wherein the plurality of data sets
comprises different types of measurements of a biomolecule
component type.
50. The method of claim 49, wherein the biomolecule component type
is a gene transcript, a protein, or a metabolite.
51. The method of claim 45, wherein the different types of
measurements comprise different types of spectrometric
measurements.
52. The method of claim 51, wherein the spectrometric measurements
comprise data from different instrument configurations of the same
spectrometric technique.
53. The method of claim 45, wherein the biological sample type is
selected from the group consisting of blood, blood plasma, blood
serum, cerebrospinal fluid, bile, saliva, synovial fluid, pleural
fluid, pericardial fluid, peritoneal fluid, feces, nasal fluid,
ocular fluid, intracellular fluid, intercellular fluid, lymph
fluid, and urine.
54. The method of claim 45, wherein the biological sample type is
selected from the group consisting of liver cells, epithelial
cells, endothelial cells, kidney cells, prostate cells, blood
cells, lung cells, brain cells, skin cells, adipose cells, tumor
cells, and mammary cells.
55. The method of claim 45, wherein the plurality of data sets is
derived from one biological sample type.
56. The method of claim 45, wherein the plurality of data sets is
derived from one biological sample type that is treated
differently.
57. The method of claim 45, wherein the plurality of data sets are
derived from biological samples taken at different times for the
same organism.
58. The method of claim 45, wherein the biological system is in a
mammal.
59. An article of manufacture having a computer-readable medium
with computer-readable instructions embodied thereon for performing
the method of claim 45.
60. A method of profiling a state of a biological system of an
animal, the method comprising the steps of: (a) analyzing a sample
of a biological system to provide a plurality of data sets, the
plurality of data sets comprising different types of measurements
of the sample of the biological system; (b) preprocessing the
plurality of data sets to generate preprocessed data sets; (c)
evaluating the preprocessed data sets with a multivariate analysis
and correlating features among the plurality of the data sets to
determine one or more sets of differences among at least a portion
of the plurality of data sets; and (d) generating a profile of one
or more biomarkers in response to one or more correlations, the
profile characterizing a state of the biological system.
61. The method of claim 60, wherein the step of preprocessing
comprises the step of normalizing the plurality of the data
sets.
62. The method of claim 60, wherein the step of preprocessing
comprises the step of reducing the noise of the plurality of the
data sets.
63. The method of claim 60, wherein the step of preprocessing
comprises the step of applying a partial linear fit technique to
the plurality of the data sets.
64. The method of claim 60, wherein the plurality of data sets
comprises different types of measurements of a biomolecule
component type.
65. The method of claim 64, wherein the biomolecule component type
is a gene transcript, a protein, or a metabolite.
66. The method of claim 60, wherein the different types of
measurements comprise different types of spectrometric
measurements.
67. The method of claim 66, wherein the spectrometric measurements
comprise data from different instrument configurations of the same
spectrometric technique.
68. The method of claim 60, wherein the biological sample type is
selected from the group consisting of blood, blood plasma, blood
serum, cerebrospinal fluid, bile, saliva, synovial fluid, pleural
fluid, pericardial fluid, peritoneal fluid, feces, nasal fluid,
ocular fluid, intracellular fluid, intercellular fluid, lymph
fluid, and urine.
69. The method of claim 60, wherein the biological sample type is
selected from the group consisting of liver cells, epithelial
cells, endothelial cells, kidney cells, prostate cells, blood
cells, lung cells, brain cells, skin cells, adipose cells, tumor
cells, and mammary cells.
70. The method of claim 60, wherein the plurality of data sets is
derived from one biological sample type.
71. The method of claim 60, wherein the plurality of data sets is
derived from one biological sample type that is treated
differently.
72. The method of claim 60, wherein the plurality of data sets are
derived from biological samples taken at different times for the
same organism.
73. The method of claim 60, wherein the biological system is in a
mammal.
74. An article of manufacture having a computer-readable medium
with computer-readable instructions embodied thereon for performing
the method of claim 60.
75. A method of profiling a state of an animal, the method
comprising the steps of: (a) providing a plurality of data sets
comprising different types of measurements of a gene transcript, a
protein, or a metabolite derived from a sample of an animal; (b)
preprocessing the plurality of the data sets to generate
preprocessed data sets; (c) evaluating the preprocessed data sets
with a multivariate analysis and correlating features among the
preprocessed data sets to determine one or more sets of differences
among at least a portion of the plurality of data sets; (d)
generating a profile of one or more biomarkers in response to one
or more correlations, the profile characterizing a state of the
animal; and (e) displaying at least a portion of the data relevant
to the profile.
76. A method of profiling a state of an animal, the method
comprising the steps of: (a) analyzing a sample of an animal to
provide a plurality of data sets, the plurality of data sets
comprising different types of measurements of a gene transcript, a
protein, or a metabolite; (b) preprocessing the plurality of data
sets to generate preprocessed data sets; (c) evaluating the
preprocessed data sets with a multivariate analysis and correlating
features among the plurality of the data sets to determine one or
more sets of differences among at least a portion of the plurality
of data sets; and (d) generating a profile of one or more
biomarkers in response to one or more correlations, the profile
characterizing a state of the animal.
Description
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] The present application claims the benefit of and priority
to copending U.S. provisional application No. 60/312,145, filed
Aug. 13, 2001, the entire disclosure of which is herein
incorporated by reference.
FIELD OF THE INVENTION
[0002] The invention relates to the field of data processing and
evaluation. In particular, the invention relates to an analytical
technology platform for separating and measuring multiple
components of a biological sample, and statistical data processing
methods for identifying components and revealing patterns and
relationships between and among the various measured
components.
BACKGROUND
[0003] The characterization of complex mixtures has become
important in a variety of research and application areas, including
pharmaceuticals, biotechnological research, and nutraceutical
(functional food) topics. One important area is the study of small
molecules in pharmaceutical and biotechnology research, often
referred to as metabolomics.
[0004] For example, an important challenge in the development of
new drugs for complex (multi-factorial) diseases is the tracing and
validation of biomarkers/surrogate markers. Moreover, it appears
that instead of single biomarkers, biomarker-patterns may be
necessary to characterize and diagnose homeostasis or disease
states for such diseases.
[0005] In the discipline of metabolomics, the current art in the
field of biological sample profiling is based either on measurement
by nuclear magnetic resonance ("NMR") or by mass spectrometry
("MS") that focuses on a limited number of small molecule
compounds. Both of these profiling approaches have limitations. The
NMR approaches are limited in that they typically provide reliable
profiles only of compounds present at high concentration. On the
other hand, focused mass spectrometry based approaches do not
require high concentrations but can provide profiles of only
limited portions of the metabolome. What is needed is an approach
that can address limitations in current profiling techniques and
that facilitates the discernment of correlations between components
or patterns of components (such as biomarker patterns).
SUMMARY OF THE INVENTION
[0006] The present invention addresses limitations in current
profiling techniques by providing a method and system (or
collectively "technology platform") utilizing hierarchical
multivariate analysis of spectrometric data on one or more levels.
The present invention further provides a technology platform that
facilitates the discernment of similarities, differences, and/or
correlations not only between single biomolecular components of a
sample or biological system, but also between patterns of
biomolecular components of a single biomolecular component type.
[0007] As used herein, the term "biomolecule component type" refers
to a class of biomolecules generally associated with a level of a
biological system. For example, gene transcripts are one example of
a biomolecule component type and are generally associated with
gene expression in a biological system, and with the level of a
biological system referred to as genomics or functional genomics.
Proteins are another example of a biomolecule component type and are
generally associated with protein expression and modification,
etc., and with the level of a biological system referred to as
proteomics. Further, metabolites are another example of a biomolecule
component type and are generally associated with the level
of a biological system referred to as metabolomics.
[0008] The present invention provides a method and system for
profiling a biological system utilizing a hierarchical multivariate
analysis of spectrometric data to generate a profile of a state of
a biological system. The states of a biological system that may be
profiled by the invention include, but are not limited to, disease
state, pharmacological agent response, toxicological state,
biochemical regulation (e.g., apoptosis), age response,
environmental response, and stress response. The present invention
may use data on a biomolecule component type (e.g., metabolites,
proteins, gene transcripts, etc.) from multiple biological sample
types (e.g., body fluids, tissue, cells) obtained from multiple
sources (such as, for example, blood, urine, cerebrospinal fluid,
epithelial cells, endothelial cells, different subjects, the same
subject at different times, etc.). In addition, the present
invention may use spectrometric data obtained on one or more
platforms including, but not limited to, MS, NMR, liquid
chromatography ("LC"), gas-chromatography ("GC"), high performance
liquid chromatography ("HPLC"), capillary electrophoresis ("CE"),
and any known form of hyphenated mass spectrometry in low or high
resolution mode, such as LC-MS, GC-MS, CE-MS, LC-UV, MS-MS,
MSⁿ, etc.
[0009] As used herein, the term "spectrometric data" includes data
from any spectrometric or chromatographic technique and the term
"spectrometric measurement" includes measurements made by any
spectrometric or chromatographic technique. Spectrometric
techniques include, but are not limited to, resonance spectroscopy,
mass spectroscopy, and optical spectroscopy. Chromatographic
techniques include, but are not limited to, liquid phase
chromatography, gas phase chromatography, and electrophoresis.
[0010] As used herein, the terms "small molecule" and "metabolite"
are used interchangeably. Small molecules and metabolites include,
but are not limited to, lipids, steroids, amino acids, organic
acids, bile acids, eicosanoids, peptides, trace elements, and
pharmacophore and drug breakdown products.
[0011] In one aspect, the present invention provides a method of
spectrometric data processing utilizing multiple steps of a
multivariate analysis to process data in a hierarchal procedure. In
one embodiment, the method uses a first multivariate analysis on a
plurality of data sets to discern one or more sets of differences
and/or similarities between them and then uses a second
multivariate analysis to determine a correlation (and/or
anti-correlation, i.e., negative correlation) between at least one
of these sets of differences (or similarities) and one or more of
the plurality of data sets. The method may further comprise
developing a profile for a state of a biological system based on
the correlation.
[0012] As used herein, the term "data sets" refers to the
spectrometric data associated with one or more spectrometric
measurements. For example, where the spectrometric technique is
NMR, a data set may comprise one or more NMR spectra. Where the
spectrometric technique is UV spectroscopy, a data set may comprise
one or more UV emission or absorption spectra. Similarly, where the
spectrometric technique is MS, a data set may comprise one or more
mass spectra. Where the spectrometric technique is a
chromatographic-MS technique (e.g., LC-MS, GC-MS, etc.), a data set
may comprise one or more mass chromatograms. Alternatively, a data
set of a chromatographic-MS technique may comprise one or more
total ion current ("TIC") chromatograms or reconstructed TIC
chromatograms. In addition, it should be realized that the term
"data set" includes both raw spectrometric data and data that has
been preprocessed (e.g., to remove noise, baseline, detect peaks,
to normalize, etc.).
[0013] Moreover, as used herein, the term "data sets" may refer to
substantially all or a sub-set of the spectrometric data associated
with one or more spectrometric measurements. For example, the data
associated with the spectrometric measurements of different sample
sources (e.g., experimental group samples v. control group samples)
may be grouped into different data sets. As a result, a first data
set may refer to experimental group sample measurements and a
second data set may refer to control group sample measurements. In
addition, data sets may refer to data grouped based on any other
classification considered relevant. For example, data associated
with the spectrometric measurements of a single sample source
(e.g., experimental group) may be grouped into different data sets
based, for example, on the instrument used to perform the
measurement, the time a sample was taken, the appearance of the
sample, etc. Accordingly, one data set (e.g., grouping of
experimental group samples based on appearance) may comprise a
sub-set of another data set (e.g., the experimental group data
set).
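The grouping described in paragraphs [0012] and [0013] can be illustrated with a minimal Python sketch. The record layout (`Measurement` with `sample_id`, `group`, `instrument`, `spectrum`) and the helper name `group_into_data_sets` are assumptions made for this illustration only; they do not appear in the disclosure.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """One spectrometric measurement (hypothetical record layout)."""
    sample_id: str
    group: str        # e.g. "experimental" or "control"
    instrument: str   # e.g. "NMR" or "LC-MS"
    spectrum: tuple   # intensity values, raw or preprocessed


def group_into_data_sets(measurements, key):
    """Group measurements into data sets by any classification key."""
    data_sets = defaultdict(list)
    for m in measurements:
        data_sets[key(m)].append(m)
    return dict(data_sets)


measurements = [
    Measurement("s1", "experimental", "NMR", (0.1, 0.9, 0.3)),
    Measurement("s2", "experimental", "LC-MS", (0.2, 0.8, 0.1)),
    Measurement("s3", "control", "NMR", (0.1, 0.2, 0.3)),
]

# First grouping: experimental group samples v. control group samples.
by_group = group_into_data_sets(measurements, key=lambda m: m.group)

# A data set may be a sub-set of another data set: here, the
# experimental group data set regrouped by instrument.
by_instrument = group_into_data_sets(
    by_group["experimental"], key=lambda m: m.instrument
)
```

Passing the classification as a function keeps the grouping open to any criterion considered relevant (instrument, sampling time, sample appearance) without changing the helper.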
[0014] In another aspect, the present invention provides a method
of spectrometric data processing utilizing multivariate analysis to
process data at two or more hierarchal levels of correlation. In
one embodiment, the method uses a multivariate analysis on a
plurality of data sets to discern correlations (and/or
anti-correlations) between data sets at a first level of
correlation, and then uses the multivariate analysis to discern
correlations (and/or anti-correlations) between data sets at a
second level of correlation. The method may further comprise
developing a profile for a state of a biological system based on
the correlations discerned at one or more levels of
correlation.
[0015] In yet another aspect, the present invention provides a
method of spectrometric data processing utilizing multiple steps of
a multivariate analysis to process data sets in a hierarchal
procedure, wherein one or more of the multivariate analysis steps
further comprises processing data at two or more hierarchal levels
of correlation. For example, in one embodiment, the method
comprises: (1) using a first multivariate analysis on a plurality
of data sets to discern one or more sets of differences and/or
similarities between them; (2) using a second multivariate analysis
to determine a first level of correlation (and/or anti-correlation)
between a first set of differences (or similarities) and one or
more of the data sets; and (3) using the second multivariate
analysis to determine a second level of correlation (and/or
anti-correlation) between the first set of differences (or
similarities) and one or more of the data sets. The method of this
aspect may also comprise developing a profile for a state of a
biological system based on the correlations discerned at one or
more levels of correlation.
[0016] In other aspects of the invention, the present invention
provides systems adapted to practice the methods of the invention
set forth above. In one embodiment, the system comprises a
spectrometric instrument and a data processing device. In another
embodiment, the system further comprises a database accessible by
the data processing device. The data processing device may comprise
an analog and/or digital circuit adapted to implement the
functionality of one or more of the methods of the present
invention.
[0017] In some embodiments, the data processing device may
implement the functionality of the methods of the present invention
as software on a general purpose computer. In addition, such a
program may set aside portions of a computer's random access memory
to provide control logic that affects the hierarchical multivariate
analysis, data preprocessing and the operations with and on the
measured interference signals. In such an embodiment, the program
may be written in any one of a number of high-level languages, such
as FORTRAN, PASCAL, C, C++, or BASIC. Further, the program may be
written in a script, macro, or functionality embedded in
commercially available software, such as EXCEL or VISUAL BASIC.
Additionally, the software could be implemented in an assembly
language directed to a microprocessor resident on a computer. For
example, the software could be implemented in Intel 80x86
assembly language if it were configured to run on an IBM PC or PC
clone. The software may be embedded on an article of manufacture
including, but not limited to, "computer-readable program means"
such as a floppy disk, a hard disk, an optical disk, a magnetic
tape, a PROM, an EPROM, or CD-ROM.
[0018] In a further aspect, the present invention provides an
article of manufacture where the functionality of a method of the
present invention is embedded on a computer-readable medium, such
as, but not limited to, a floppy disk, a hard disk, an optical
disk, a magnetic tape, a PROM, an EPROM, CD-ROM, or DVD-ROM.
BRIEF DESCRIPTION OF THE DRAWINGS
[0019] The foregoing and other features and advantages of the
invention, as well as the invention itself, will be more fully
understood from the description, drawings, and claims that follow.
The drawings are not necessarily drawn to scale, and like reference
numerals refer to the same parts throughout the different
views.
[0020] FIG. 1A is a flow diagram of analyzing a plurality of data
sets according to various embodiments of the present invention.
[0021] FIG. 1B is a flow diagram of analyzing a plurality of data
sets according to various other embodiments of the present
invention.
[0022] FIGS. 2A and 2B are flow diagrams of the analysis performed
according to various embodiments of the present invention on a
plurality of data sets of multiple biological sample types obtained
from wildtype mice and APO E3 Leiden mice.
[0023] FIGS. 3A and 3B are examples of partial 400 MHz ¹H-NMR
spectra for urine samples of wildtype mouse samples, FIG. 3A, and
APO E3 mouse samples, FIG. 3B.
[0024] FIGS. 4A and 4B are examples of partial 400 MHz ¹H-NMR
spectra for urine samples of wildtype mouse samples, FIG. 4A, and
APO E3 mouse samples, FIG. 4B.
[0025] FIGS. 5A and 5B are examples of partial 400 MHz ¹H-NMR
spectra for blood plasma samples of wildtype mouse samples, FIG.
5A, and APO E3 mouse samples, FIG. 5B.
[0026] FIGS. 6A and 6B are examples of partial 400 MHz ¹H-NMR
spectra for blood plasma samples of wildtype mouse samples, FIG.
6A, and APO E3 mouse samples, FIG. 6B.
[0027] FIGS. 7A and 7B are examples of a blood plasma lipid profile
obtained by a LC-MS spectrometric technique using ESI on APO E3
mouse blood plasma samples, FIG. 7A, and wildtype mouse samples,
FIG. 7B.
[0028] FIG. 8 is an example of a PCA-DA score plot of the NMR data
for the urine samples of data sets 1 and 2 of FIGS. 2A and 2B.
[0029] FIG. 9 is an example of a PCA-DA score plot of the NMR data
for the urine samples of data set 1 (wildtype mouse) of FIGS. 2A
and 2B.
[0030] FIG. 10 is an example of a PCA-DA score plot of the NMR data
for the urine samples of data set 2 (APO E3 mouse) of FIGS. 2A and
2B.
[0031] FIG. 11 is an example of a PCA-DA score plot of the NMR data
for the urine samples of both wildtype and APO E3 mice.
[0032] FIG. 12 is an example of a PCA-DA score plot of the NMR data
for the blood plasma samples of data sets 3 and 4 of FIGS. 2A and
2B.
[0033] FIG. 13 is an example of a PCA-DA score plot of the LC-MS
data on the blood plasma samples of data sets 5, 6 of FIGS. 2A and
2B and human samples.
[0034] FIG. 14 is an example of a loading plot for axis D2 of FIG.
13.
[0035] FIG. 15 is an example of the comparison of normalized blood
plasma lipid profiles obtained by an LC-MS spectrometric technique
for wildtype mouse samples (thin solid line) and APO E3 mouse
samples (thick solid line).
[0036] FIG. 16 is an example of the comparison of normalized blood
plasma lipid profiles obtained by an LC-MS spectrometric technique
for wildtype mouse samples (thin solid line) and APO E3 mouse
samples (thick solid line).
[0037] FIG. 17 is an example of a canonical correlation score plot
for spectrometric data for one biological sample type (blood
plasma) from two different spectrometric techniques (NMR and
LC-MS).
[0038] FIG. 18 is an example of a canonical correlation score plot
for spectrometric data for one biological sample type (blood
plasma) from the same general spectrometric technique but different
instrument configurations.
[0039] FIG. 19 is a schematic representation of one embodiment of a
system adapted to practice the methods of the invention.
DETAILED DESCRIPTION
[0040] Referring to FIG. 1A, a flow chart of one embodiment of a
method according to the present invention is shown. One or more of
a plurality of data sets 110 are preferably subjected to a
preprocessing step 120 prior to multivariate analysis. Suitable
forms of preprocessing include, but are not limited to, data
smoothing, noise reduction, baseline correction, normalization and
peak detection. Preferable forms of data preprocessing include
entropy-based peak detection (such as disclosed in pending U.S.
patent application Ser. No. 09/920,993, filed Aug. 2, 2001, the
entire contents of which are hereby incorporated by reference) and
partial linear fit techniques (such as found in J. T. W. E. Vogels
et al., "Partial Linear Fit: A New NMR Spectroscopy Processing Tool
for Pattern Recognition Applications," Journal of Chemometrics,
vol. 10, pp. 425-38 (1996)). A multivariate analysis is then
performed at a first level of correlation 130 to discern
differences (and/or similarities) between the data sets. Suitable
forms of multivariate analysis include, for example, principal
component analysis ("PCA"), discriminant analysis ("DA"), PCA-DA,
canonical correlation ("CC"), partial least squares ("PLS"),
predictive linear discriminant analysis ("PLDA"), neural networks,
and pattern recognition techniques. In one embodiment, PCA-DA is
performed at a first level of correlation that produces a score
plot (i.e., a plot of the data in terms of two principal
components; see, e.g., FIGS. 8-12 which are described further
below). Subsequently, the same or a different multivariate analysis
is performed on the data sets at a second level of correlation 140
based on the differences (and/or similarities) discerned from the
first level of correlation.
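As a rough illustration of the preprocessing step 120, the following Python sketch chains three of the generic operations named above: baseline correction, smoothing, and normalization. It is a simplified stand-in under stated assumptions and is not the entropy-based peak detection or the partial linear fit technique cited in this paragraph; the function names, the minimum-based baseline, and the window size are all choices made for this example.

```python
def subtract_baseline(spectrum):
    """Crude baseline correction: subtract the minimum intensity."""
    floor = min(spectrum)
    return [x - floor for x in spectrum]


def moving_average(spectrum, window=3):
    """Smooth with a centered moving average; edges use a shorter window."""
    half = window // 2
    smoothed = []
    for i in range(len(spectrum)):
        lo = max(0, i - half)
        hi = min(len(spectrum), i + half + 1)
        smoothed.append(sum(spectrum[lo:hi]) / (hi - lo))
    return smoothed


def normalize_total(spectrum):
    """Scale intensities so that they sum to 1 (total-intensity normalization)."""
    total = sum(spectrum)
    if total == 0:
        return list(spectrum)
    return [x / total for x in spectrum]


def preprocess(spectrum, window=3):
    """Baseline correction, then smoothing, then normalization."""
    return normalize_total(moving_average(subtract_baseline(spectrum), window))


raw = [2.0, 2.0, 5.0, 11.0, 5.0, 2.0, 2.0]  # hypothetical intensities
clean = preprocess(raw)
```

Normalizing last means every preprocessed spectrum has the same total intensity, so that later multivariate analysis compares the shape of the spectra and not their overall signal level.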
[0041] For example, in one embodiment, where the first level
comprises a PCA-DA score plot, the second level of correlation
comprises a loading plot produced by a PCA-DA analysis. This second
level of correlation bears a hierarchical relationship to the first
level in that loading plots provide information on the
contributions of individual input vectors to the PCA-DA that in
turn are used to produce a score plot. For example, where each data
set comprises a plurality of mass chromatograms, a point on a score
plot represents mass chromatograms originating from one sample
source. In comparison, a point on a loading plot represents the
contribution of a particular mass (or range of masses) to the
correlations between data sets. Similarly, where each data set
comprises a plurality of NMR spectra, a point on a score plot
represents one NMR spectrum. In comparison, a point on the
corresponding loading plot represents the contribution of a
particular NMR chemical shift value (or range of values) to the
correlations between data sets.
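The relationship between score plots and loading plots can be sketched in a few lines of Python. This example uses plain principal component analysis, computed by power iteration on the covariance matrix, as a simplified stand-in for the PCA-DA named above; the function names and the four hypothetical three-variable spectra are assumptions for illustration only.

```python
import math


def center_columns(rows):
    """Subtract each variable's mean across all samples."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[r[j] - means[j] for j in range(p)] for r in rows]


def first_principal_component(rows, iterations=200):
    """Return (scores, loadings) for the first principal component.

    rows:     one list of intensities per sample (e.g. one spectrum each).
    scores:   one value per sample (a coordinate on a score plot).
    loadings: one value per variable (a coordinate on a loading plot).
    """
    x = center_columns(rows)
    n, p = len(x), len(x[0])
    # Covariance matrix of the variables (p x p).
    cov = [[sum(x[k][i] * x[k][j] for k in range(n)) / (n - 1)
            for j in range(p)] for i in range(p)]
    # Power iteration converges to the dominant eigenvector.
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0:
            break
        v = [c / norm for c in w]
    scores = [sum(row[j] * v[j] for j in range(p)) for row in x]
    return scores, v


# Two hypothetical groups that differ mainly in the second variable.
spectra = [
    [1.0, 5.0, 2.0],
    [1.1, 5.2, 2.0],
    [1.0, 1.0, 2.1],
    [0.9, 0.8, 1.9],
]
scores, loadings = first_principal_component(spectra)
```

Each entry of `scores` corresponds to one sample, while each entry of `loadings` corresponds to one measured variable (a mass range or chemical shift value); the variable with the largest absolute loading contributes most to the separation seen in the scores.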
[0042] Referring again to FIG. 1A, based on the correlations
discerned in the analysis at the first level of correlation 130
and/or that at the second level of correlation 140 a profile may be
developed 151 ("NO" to inspect spectra query 160). For example, the
region in a score plot where the data points fall for a certain
group of data sets may comprise a profile for the state of a
biological system associated with that group. Further, the profile
may comprise both the above region in a score plot and a specific
level of contribution from one or more points in an associated
loading plot. For example, where the data sets comprise mass
chromatograms and/or mass spectra, a biological system may only fit
into the profile of a state if spectrometric data sets from
appropriate samples fall in a certain region of the score plot and
if the mass chromatograms for a particular range of masses provide
a significant contribution to the correlation observed in the score
plot. Similarly, where the data sets comprise NMR spectra, a
biological system may only fit into the profile of a state if
spectrometric data sets from appropriate samples fall in a certain
region of the score plot and if a particular range of chemical
shift values in the NMR spectra provide a significant contribution
to the correlation observed in the score plot.
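The two-part test described in paragraph [0042], a region of the score plot combined with a required level of contribution in the loading plot, can be expressed as a small predicate. The sketch below is hypothetical: the `Profile` fields, the rectangular shape of the region, and the threshold values are assumptions chosen for illustration and are not specified in the disclosure.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Profile:
    """Hypothetical profile: a score-plot region plus a loading requirement."""
    # ((x_min, x_max), (y_min, y_max)) on the two score-plot axes
    score_region: Tuple[Tuple[float, float], Tuple[float, float]]
    # Indices of the variables of interest, e.g. a range of chemical shifts
    variable_indices: Tuple[int, ...]
    # Minimum absolute loading counted as a significant contribution
    min_abs_loading: float


def fits_profile(profile, score_point, loadings):
    """True only if both the score-plot and loading-plot conditions hold."""
    (x_min, x_max), (y_min, y_max) = profile.score_region
    x, y = score_point
    in_region = x_min <= x <= x_max and y_min <= y <= y_max
    contributes = any(
        abs(loadings[i]) >= profile.min_abs_loading
        for i in profile.variable_indices
    )
    return in_region and contributes


profile = Profile(
    score_region=((1.0, 3.0), (-1.0, 1.0)),
    variable_indices=(1, 2),
    min_abs_loading=0.5,
)
inside = fits_profile(profile, (2.0, 0.2), [0.05, 0.9, 0.1])
outside = fits_profile(profile, (-2.0, 0.2), [0.05, 0.9, 0.1])
weak = fits_profile(profile, (2.0, 0.2), [0.9, 0.1, 0.1])
```

Here only `inside` satisfies both conditions: `outside` falls outside the score-plot region, and `weak` lies in the region but lacks a significant loading on the variables of interest.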
[0043] In addition, the method may further include a step of
inspection 155 of one or more specific spectra of the data sets
("YES" to inspect spectra query 160) based on the correlations
discerned in the analysis at the first level of correlation 130
and/or that at the second level of correlation 140. A profile based
on this inspection is then developed 152. For example, where the
spectra of the data sets comprise mass chromatograms, the method
inspects the mass chromatograms of those mass ranges showing a
significant contribution to the correlation based on the loading
plot. Inspection of these mass chromatograms, for example, may
reveal what species of chemical compounds are associated with the
profile. Such information may be of particular importance for
biomarker identification and drug target identification.
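Selecting which spectra to inspect amounts to ranking variables by their contribution in the loading plot. A minimal Python sketch follows, assuming a hypothetical list of loadings, hypothetical mass range labels, and an arbitrary significance threshold; none of these values come from the disclosure.

```python
def variables_to_inspect(loadings, labels, threshold):
    """Return labels whose absolute loading meets the threshold, largest first."""
    ranked = sorted(
        zip(labels, loadings), key=lambda pair: abs(pair[1]), reverse=True
    )
    return [label for label, value in ranked if abs(value) >= threshold]


# Hypothetical mass ranges and their loadings on one axis of a loading plot.
mass_ranges = ["m/z 400-450", "m/z 450-500", "m/z 500-550", "m/z 550-600"]
loadings = [0.05, -0.72, 0.10, 0.68]

selected = variables_to_inspect(loadings, mass_ranges, threshold=0.5)
```

The absolute value is used because a strongly negative loading indicates as large a contribution as a strongly positive one; the mass chromatograms for the selected ranges would then be the ones inspected.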
[0044] Referring to FIG. 1B, a flow chart of another embodiment of
a method according to the present invention is shown. One or more
of a plurality of data sets 210 are preferably subjected to a
preprocessing step 220 prior to multivariate analysis. A first
multivariate analysis is then performed 230 on a plurality of data
sets to discern one or more sets of differences and/or similarities
between them. The first multivariate analysis may be performed
between sub-sets of the data sets. For example, the first
multivariate analysis may be performed between data set 1 and data
set 2, 231 and the first multivariate analysis may be performed
separately between data set 2 and data set 3, 232. The method then
uses a second multivariate analysis 240 to determine a correlation
between at least one of the sets of differences (or similarities)
discerned in the first multivariate analysis and one or more of the
data sets. This second multivariate analysis 240 bears a hierarchal
relationship to the first 230 in that the differences between data
sets are discerned in a hierarchal fashion. For example, the
differences between data sets 1 and 2 (and data sets 2 and 3) are
first discerned 231, 232 and then those differences are subjected
to further multivariate analysis 240. In one embodiment, a profile
based on the correlations discerned in the second multivariate
analysis 240 is developed 250.
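The hierarchical flow of FIG. 1B can be outlined in Python as two stages: differences are first computed between pairs of data sets, and those differences are then correlated with a data set. The sketch is deliberately simplified, using a difference of mean spectra in place of the first multivariate analysis and a Pearson correlation in place of the second; the function names and the example values are assumptions for illustration.

```python
import math


def mean_spectrum(data_set):
    """Average the spectra of one data set, variable by variable."""
    n = len(data_set)
    return [sum(s[j] for s in data_set) / n for j in range(len(data_set[0]))]


def difference(data_set_a, data_set_b):
    """First stage: per-variable difference between mean spectra."""
    a, b = mean_spectrum(data_set_a), mean_spectrum(data_set_b)
    return [x - y for x, y in zip(a, b)]


def pearson(u, v):
    """Second stage: correlation; negative values indicate anti-correlation."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((x - mu) * (y - mv) for x, y in zip(u, v))
    su = math.sqrt(sum((x - mu) ** 2 for x in u))
    sv = math.sqrt(sum((y - mv) ** 2 for y in v))
    if su == 0 or sv == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (su * sv)


# Three hypothetical data sets, two spectra each, three variables per spectrum.
data_set_1 = [[1.0, 4.0, 2.0], [1.0, 6.0, 2.0]]
data_set_2 = [[1.0, 1.0, 2.0], [1.0, 1.0, 2.0]]
data_set_3 = [[1.0, 1.0, 5.0], [1.0, 1.0, 5.0]]

diff_12 = difference(data_set_1, data_set_2)      # corresponds to step 231
diff_23 = difference(data_set_2, data_set_3)      # corresponds to step 232
r = pearson(diff_12, mean_spectrum(data_set_1))   # corresponds to step 240
```

The second stage operates on the output of the first stage and not on the raw data sets alone, which is what makes the procedure hierarchical.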
[0045] In addition, any of the multivariate analysis steps 231,
232, 240 may further comprise a step of performing the same or a
different multivariate analysis at another level of correlation 260
(for example, such as described with respect to FIG. 1A) based on
the differences (and/or similarities) discerned from the level of
correlation used in a prior multivariate analysis step 231, 232,
240. A profile based on the information from one or more of these
levels of correlation may then be developed 250, 251 ("NO" to
inspect spectra query 270). Alternatively, the method may further
include a step of inspection 255 of one or more specific spectra of
the data sets ("YES" to inspect spectra query 270) based on the
correlations discerned in the analysis at one or more levels of
correlation and/or one or more multivariate analysis steps. A
profile based on this inspection then may be developed 252.
[0046] The methods of the present invention may be used to develop
profiles on any biomolecular component type. Such profiles
facilitate the development of comprehensive profiles of different
levels of a biological system, such as, for example, genome
profiles, transcriptomic profiles, proteome profiles, and
metabolome profiles. Further, such methods may be used for data
analysis of spectrometric measurements (of, for example, plasma
samples from a control and patient group), and may be used to
evaluate whether any differences in single components or patterns of
components exist between the two groups in order to obtain better
insight into underlying biological mechanisms, to detect novel
biomarkers/surrogate markers, and/or to develop intervention
routes.
[0047] In various embodiments, the present invention provides
methods for developing profiles of metabolites and small molecules.
Such profiles facilitate the development of comprehensive
metabolome profiles. In other various embodiments, the present
invention provides methods for developing profiles of proteins,
protein-complexes and the like. Such profiles facilitate the
development of comprehensive proteome profiles. In yet other
various embodiments, the present invention provides methods for
developing profiles of gene transcripts, mRNA and the like. Such
profiles facilitate the development of comprehensive genome
profiles.
[0048] In one version of these embodiments, the method is generally
based on the following steps: (1) selection of biological samples,
for example body fluids (plasma, urine, cerebral spinal fluid,
saliva, synovial fluid etc.); (2) sample preparation based on the
biochemical components to be investigated and the spectrometric
techniques to be employed (e.g., investigation of lipids, proteins,
trace elements, gene expression, etc.); (3) measurement of the high
concentration components in the biological samples using mass
spectrometry and NMR methods; (4) measurement of selected molecule
subclasses using NMR-profiles and preferred MS-approaches to study
compounds such as, for example, lipids, steroids, bile acids,
eicosanoids, (neuro)peptides, vitamins, organic acids,
neurotransmitters, amino acids, carbohydrates, ionic organics,
nucleotides, inorganics, xenobiotics etc.; (5) raw data
preprocessing; (6) data analysis using multivariate analysis
according to any of the methods of the present invention (e.g., to
identify patterns in measurements of single subclasses of molecules
or in measurements of high concentration components using NMR or
mass spectrometry); and (7) use of multivariate analysis to
combine data sets from distinct experiments and find patterns of
interest in the data. In addition, the method may further comprise
a step of (8) acquiring data sets at a number of points in time to
facilitate the monitoring of temporal changes in the multivariate
patterns of interest.
[0049] The methods of the present invention may be used to develop
profiles on a biomolecular component type obtained from a wide
variety of biological sample types including, but not limited to,
blood, blood plasma, blood serum, cerebrospinal fluid, bile acid,
saliva, synovial fluid, pleural fluid, pericardial fluid,
peritoneal fluid, feces, nasal fluid, ocular fluid, intracellular
fluid, intercellular fluid, lymph, urine, tissue, liver cells,
epithelial cells, endothelial cells, kidney cells, prostate cells,
blood cells, lung cells, brain cells, adipose cells, tumor cells
and mammary cells.
[0050] In another aspect, the present invention provides an article
of manufacture where the functionality of a method of the present
invention is embedded on a computer-readable medium, such as, but
not limited to, a floppy disk, a hard disk, an optical disk, a
magnetic tape, a PROM, an EPROM, CD-ROM, or DVD-ROM. The
functionality of the method may be embedded on the
computer-readable medium in any number of computer-readable
instructions, or languages such as, for example, FORTRAN, PASCAL,
C, C++, BASIC and assembly language. Further, the computer-readable
instructions can, for example, be written in a script, macro, or
functionally embedded in commercially available software (such as,
e.g., EXCEL or VISUAL BASIC).
[0051] In other aspects, the present invention provides systems
adapted to practice the methods of the present invention. Referring
to FIG. 19, in one embodiment, the system comprises one or more
spectrometric instruments 1910 and a data processing device 1920 in
electrical communication, wireless communication, or both. The
spectrometric instrument may comprise any instrument capable of
generating spectrometric measurements useful in practicing the
methods of the present invention. Suitable spectrometric
instruments include, but are not limited to, mass spectrometers,
liquid phase chromatographers, gas phase chromatographers, and
electrophoresis instruments, and combinations thereof. In another
embodiment, the system further comprises an external database 1930
storing data accessible by the data processing device, wherein the
data processing device implements the functionality of one or more
of the methods of the present invention using at least in part data
stored in the external database.
[0052] The data processing device may comprise an analog and/or
digital circuit adapted to implement the functionality of one or
more of the methods of the present invention using at least in part
information provided by the spectrometric instrument. In some
embodiments, the data processing device may implement the
functionality of the methods of the present invention as software
on a general purpose computer. In addition, such a program may set
aside portions of a computer's random access memory to provide
control logic that affects the spectrometric measurement
acquisition, multivariate analysis of data sets, and/or profile
development for a biological system. In such an embodiment, the
program may be written in any one of a number of high-level
languages, such as FORTRAN, PASCAL, C, C++, or BASIC. Further, the
program can be written in a script, macro, or functionality
embedded in proprietary software or commercially available
software, such as EXCEL or VISUAL BASIC. Additionally, the software
could be implemented in an assembly language directed to a
microprocessor resident on a computer. For example, the software
can be implemented in Intel 80x86 assembly language if it is
configured to run on an IBM PC or PC clone. The software may be
embedded on an article of manufacture including, but not limited
to, a computer-readable program medium such as a floppy disk, a
hard disk, an optical disk, a magnetic tape, a PROM, an EPROM, or
CD-ROM.
EXAMPLE
Small Molecule Study of the E3 Mouse Model for Atherosclerosis
[0053] An example of the practice of various embodiments of the
present invention is illustrated below in the context of a small
molecule study of the APO E3 Leiden transgenic mouse model.
[0054] A. The APO E3 Leiden Mouse
[0055] The APO E3 Leiden mouse model is a transgenic animal model
described in "The Use of Transgenic Mice in Drug Discovery and Drug
Development," by P. L. B. Bruijnzeel, TNO Pharma, Oct. 24, 2000.
Briefly, the APO E3-Leiden allele is identical to the APO E4
(Cys112→Arg) allele, but includes an in-frame repeat of 21
nucleotides in exon 4, resulting in a tandem repeat of codons 120-126
or 121-127. Transgenic mice expressing the APO E3-Leiden mutation are
known to have hyperlipidemic phenotypes that under specific
conditions lead to the development of atherosclerotic plaques. The
model has a high predicted success rate in finding differences at
the small molecule (metabolite) and protein levels, while the gene
level is very well characterized.
[0056] In the present example, 10 wildtype and 10 APO E3 male mice
were sacrificed after collection of urine in metabolic cages. The
APO E3 mice were created by insertion of a well-defined human gene
cluster (APO E3-APC1), and a very homogeneous population was
generated by at least 20 inbred generations.
[0057] The following samples were available for analysis: (1) 10
wildtype and 10 APO E3 urine samples (about 0.5 ml/animal or more);
(2) 10 wildtype and 10 APO E3 (heparin) plasma samples (about 350
µl/animal); and (3) 10 wildtype and 10 APO E3 liver samples. From
the plasma samples, 100 µl was used for NMR and the same
samples were used for LC-MS; about 250 µl was available for protein
work and duplicates. All samples were stored at -20° C. In total, 19
plasma samples were received; one sample, animal #6 (APO-E3 Leiden
group), was not present. After cleanup (described below), the
portions reserved for proteomics research were transferred to
-70° C.
[0058] B. Experimental Details, Plasma and Urine Samples
[0059] Plasma sample extraction was accomplished with isopropanol
(protein precipitation). LC-MS lipid profile measurements of the
plasma samples were obtained on an electrospray ionization
("ESI") and atmospheric pressure chemical ionization ("APCI") LC-MS
system. The resultant raw data was preprocessed with an
entropy-based peak detection technique substantially similar to
that disclosed in pending U.S. patent application Ser. No.
09/920,993, filed Aug. 2, 2001. The preprocessed data was then
subjected to principal component analysis ("PCA") and/or
discriminant analysis ("DA") according to the methods of the
present invention. The raw data from the NMR measurements of the
plasma samples was subjected to a pattern recognition analysis
("PARC"), which included preprocessing (such as a partial linear
fit), peak detection and multivariate statistical analysis.
[0060] Urine samples were prepared and NMR measurements of the
urine samples were obtained. The raw NMR data on the urine samples
was also subjected to a PARC analysis, which included
preprocessing, peak detection and multivariate statistical
analysis.
[0061] B.1. Mouse Blood Plasma Preparation and Cleanup
[0062] The mouse plasma samples were thawed at room temperature.
Aliquots of 100 µl were transferred to clean eppendorf vials
and stored at -70° C. The sample volume for sample #12 was
low and only 50 µl was transferred. For NMR and LC-MS lipid
analysis, 150 µl aliquots were transferred to clean eppendorf
vials.
[0063] Plasma samples were cleaned up and handled substantially
according to the following protocol: (1) add 0.6 ml of isopropanol;
(2) vortex; (3) centrifuge at 10,000 rpm for 5 min.; (4) transfer
500 µl to clean tube for NMR analysis; (5) transfer 100 µl to
clean eppendorf vial; (6) add 400 µl water and mix; and (7)
transfer 200 µl to autosampler vial insert. The remaining
extract and pellet (precipitated protein) were stored at
-20° C.
[0064] B.2. Human Blood Plasma Preparation and Cleanup
[0065] Human heparin plasma was obtained from a blood bank. In a
glass tube, 1 ml of human plasma and 4 ml of isopropanol were mixed
(vortexed). After centrifugation, 1 ml of extract was transferred
to a tube and 4 ml of water was added. The resulting solution was
transferred to 4 autosampler vials (1 ml).
[0066] B.3. LC-MS of Blood Plasma Samples:
[0067] Spectrometric measurements of plasma samples were made with a
combination HPLC-time-of-flight MS instrument. Effluent emerging
from the chromatograph was ionized by electrospray ionization
("ESI") and atmospheric pressure chemical ionization ("APCI").
Typical instrument parameters used with the HPLC instrument are given
in Table 1 and details of the gradient in Table 2; typical
parameters for the ESI source are given in Table 3, and those for
the APCI source are given in Table 4.
TABLE 1 HPLC Parameters
Column:            Inertsil ODS3 5 µm, 100 × 3 mm i.d. (Chrompack);
                   R2 guard column (Chrompack)
Mobile phase A:    5% acetonitrile, 50 ml MeCN, water ad 1000 ml,
                   10 ml ammonium acetate solution (1 mol/l),
                   1 ml formic acid
Mobile phase B:    30% isopropanol in acetonitrile, 300 ml isopropanol,
                   acetonitrile ad 1000 ml, 10 ml ammonium acetate
                   solution (1 mol/l), 1 ml formic acid
Mobile phase C:    50% dichloromethane in isopropanol, 500 ml
                   isopropanol, dichloromethane ad 1000 ml, 10 ml
                   ammonium acetate solution (1 mol/l), 1 ml formic acid
Temperature:       ca. 20° C. (conditioned laboratory)
Injection volume:  75 µl
[0068]
TABLE 2 HPLC Gradient
Time (min)  Flow (ml/min)  % A  % B  % C
 0          0.7            70   30
 2          0.7            70   30
15          0.7             5   95
35          0.7             5   35   60
40          0.7             5   35   60
41          0.7             5   95
45          0.7            70   30
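The gradient of Table 2 may, for example, be encoded as data and
interpolated to estimate the mobile phase composition at an arbitrary
time. The sketch below rests on two assumptions that the table itself
does not state: that rows listing only two percentages give % A and % B
with % C equal to zero, and that the composition changes linearly
between the listed time points. The names GRADIENT and composition_at
are hypothetical.

```python
# (time in min, % A, % B, % C); a % C of 0 is ASSUMED where the table
# lists only two percentages.
GRADIENT = [
    (0, 70, 30, 0),
    (2, 70, 30, 0),
    (15, 5, 95, 0),
    (35, 5, 35, 60),
    (40, 5, 35, 60),
    (41, 5, 95, 0),
    (45, 70, 30, 0),
]


def composition_at(t, table=GRADIENT):
    """Linearly interpolate (% A, % B, % C) at time t (minutes)."""
    if t < table[0][0] or t > table[-1][0]:
        raise ValueError("time outside the gradient program")
    for (t0, *c0), (t1, *c1) in zip(table, table[1:]):
        if t0 <= t <= t1:
            f = (t - t0) / (t1 - t0)
            return tuple(a + f * (b - a) for a, b in zip(c0, c1))
```

Under these assumptions, composition_at(25) lies halfway along the ramp
from 15 to 35 minutes.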
[0069]
TABLE 3 Electrospray (ESI) Parameters
Mode:           positive (+)
Cap. Heater:    250° C.
Spray voltage:  4 kV
Sheath gas:     70 units
Aux. Gas:       15 units
Scan:           200 to 1750, 1 s/scan
[0070]
TABLE 4 Atmospheric Pressure Chemical Ionization (APCI) Parameters
Mode:         positive (+)
Cap. Heater:  175° C.
Vaporizer:    450° C.
Corona:       5 µA
Sheath gas:   70 units
Aux. Gas:     0 units
Scan:         200 to 1750, 1 s/scan
[0071] The injection sequence for samples was as follows. The mouse
plasma extracts were injected twice in a random order. The human
plasma extract was injected twice at the start of the sequence and
after every 5 injections of the mouse plasma extracts to monitor
the stability of the LC-MS conditions. The random sequence was
applied to prevent the detrimental effects of possible drift on the
multivariate statistics.
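One possible way of generating such an injection sequence is sketched
below. The sketch assumes that the duplicate injections of each mouse
extract are randomized independently of one another and that a single
human plasma injection follows every fifth mouse injection; neither
detail is specified above. The function name build_injection_sequence,
its parameters, and the label "human_plasma" are hypothetical.

```python
import random


def build_injection_sequence(mouse_samples, qc_label="human_plasma",
                             replicates=2, qc_every=5, seed=0):
    """Randomized injection order with periodic quality-control injections."""
    rng = random.Random(seed)
    # Each mouse extract is injected `replicates` times, in random order.
    injections = [s for s in mouse_samples for _ in range(replicates)]
    rng.shuffle(injections)
    # The control (human plasma) extract is injected twice at the start.
    sequence = [qc_label, qc_label]
    for count, sample in enumerate(injections, start=1):
        sequence.append(sample)
        # ASSUMPTION: one control injection after every `qc_every` samples.
        if count % qc_every == 0:
            sequence.append(qc_label)
    return sequence
```

For 19 mouse plasma extracts this yields 38 mouse injections interleaved
with the control injections.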
[0072] B.4. NMR of Plasma and Urine Samples:
[0073] NMR spectrometric measurements of plasma samples were made
with a 400 MHz 1H-NMR. Samples for the NMR were prepared and
handled substantially in accord with the following protocol.
Isopropanol plasma extracts (500 µl from 2.3.1) were dried under
nitrogen, whereafter the residues were dissolved in deuterated
methanol (MeOD). Deuterated methanol was selected because it gave
the best NMR spectra when chloroform, water, methanol and
dimethylsulfoxide (all deuterated) were compared.
[0074] NMR spectrometric measurements of urine samples were also
made with a 400 MHz 1H-NMR.
[0075] C. Spectrometric Measurements and Analysis
[0076] The following spectrometric measurements were made at the
metabolite/small molecule level:
[0077] NMR--measurements of urine, multiple measurements
(preferably triplicate measurements) on a total of 40 samples;
[0078] NMR--measurement of plasma, multiple measurements
(preferably triplicate measurements) on a total of 40 samples;
and
[0079] LC/MS--measurement of plasma (plasma lipid profile), multiple
measurements (preferably triplicate measurements) on a total of 40
samples.
[0080] A flow chart illustrating the analysis of the spectrometric
data of this example according to one embodiment of the present
invention is shown in FIGS. 2A and 2B.
[0081] Referring to FIG. 2A, the spectrometric data obtained was
grouped into eight data sets 301-308. The data sets were as
follows: (1) data set 1 comprised 400 MHz 1H-NMR spectra of
wildtype mouse urine samples 301; (2) data set 2 comprised 400 MHz
1H-NMR spectra of APO E3 mouse urine samples 302; (3) data set 3
comprised 400 MHz 1H-NMR spectra of APO E3 mouse blood plasma
samples 303; (4) data set 4 comprised 400 MHz 1H-NMR spectra of
wildtype mouse blood plasma samples 304; (5) data set 5 comprised
LC-MS spectra (using ESI) of wildtype mouse blood plasma lipid
samples 305; (6) data set 6 comprised LC-MS spectra (using ESI) of
APO E3 mouse blood plasma lipid samples 306; (7) data set 7
comprised LC-MS spectra (using APCI) of APO E3 mouse blood plasma
lipid samples 307; and (8) data set 8 comprised LC-MS spectra
(using APCI) of wildtype mouse blood plasma lipid samples 308.
Examples of the spectrometric measurements obtained for each of
these data sets are as follows: FIGS. 3A and 4A for data set 1;
FIGS. 3B and 4B for data set 2; FIGS. 5B and 6B for data set 3;
FIGS. 5A and 6A for data set 4; FIG. 7B for data set 5; and FIG. 7A
for data set 6. Various features were noted in the data of FIGS.
3A-7B.
[0082] Referring to FIGS. 3A and 3B, it was noted that peaks
associated with hippuric acid 410 were observed in the wildtype
mouse urine sample 1H-NMR spectra, while such peaks were
substantially absent from the APO E3 mouse urine sample 1H-NMR
spectra, indicating a possible biochemical process unique to the
APO E3 mouse. Referring to FIGS. 4A and 4B, in addition, peaks
associated with an unidentified component 420 were observed in the
wildtype mouse urine sample 1H-NMR spectra, which were also
substantially absent from corresponding 1H-NMR spectra of the
APO E3 mouse urine samples.
[0083] Referring to FIGS. 5A and 5B, two series of peaks 510, 520
were observed in the APO E3 mouse blood plasma sample 1H-NMR
spectra, which were either substantially absent from the wildtype
spectra 510 or substantially reduced 520. As shown in FIGS. 6A and
6B, the peaks associated with the first series of peaks 510 are
substantially absent from the resonance shift region in wildtype
spectra 610, while the second series of peaks 520 are present but
reduced in the wildtype spectra 620.
[0084] Referring to FIGS. 7A and 7B, it was noted that peaks
associated with lyso-phosphatidylcholines ("lyso-PC") 710 were
slightly reduced in intensity in the APO E3 mouse spectra relative
to those for the wildtype, that peaks associated with phospholipids
720 were substantially equal in intensity between the APO E3 and
wildtype spectra, and that peaks associated with triglycerides 730
were substantially increased in intensity in the APO E3 mouse
spectra relative to those for the wildtype.
[0085] The raw data from data sets 1 to 8 was preprocessed 320 and
a first multivariate analysis was performed between data sets 1 and
2, 3 and 4, 5 and 6, and 7 and 8, respectively, each at a first
level of correlation 330, i.e., PCA-DA score plots. Examples of the
results of the first multivariate analysis at a first level of
correlation are illustrated in FIGS. 8-11 for data sets 1 and 2;
FIG. 12 for data sets 3 and 4; and FIG. 13 for data sets 5 and 6
(which includes data from human samples). Data from the first
multivariate analysis was then used to produce an analysis at a
second level of correlation 340, i.e., PCA-DA loading plots. An
example of one such PCA-DA loading plot is shown in FIG. 14.
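A minimal sketch of how PCA-DA score and loading values of the kind
plotted in such figures might be computed is given below. It assumes a
two-class problem, a PCA step computed by singular value decomposition,
and a Fisher linear discriminant applied to the PCA scores; the exact
PCA-DA procedure used in this example is not specified here, and the
function name pca_da and the parameter n_pcs are hypothetical.

```python
import numpy as np


def pca_da(X, y, n_pcs=3):
    """Two-class PCA-DA: Fisher discriminant computed on PCA scores.

    X is a samples-by-variables array and y a NumPy array of class labels.
    Returns (discriminant scores per sample, loadings per original variable).
    """
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :n_pcs] * s[:n_pcs]          # PCA scores
    P = Vt[:n_pcs].T                      # PCA loadings
    c0, c1 = np.unique(y)[:2]
    # Pooled within-class scatter matrix in PCA-score space.
    Sw = np.zeros((n_pcs, n_pcs))
    for c in (c0, c1):
        Tc = T[y == c]
        Sw += (Tc - Tc.mean(axis=0)).T @ (Tc - Tc.mean(axis=0))
    # Fisher direction: Sw^-1 (difference of class means).
    w = np.linalg.solve(Sw, T[y == c1].mean(axis=0) - T[y == c0].mean(axis=0))
    w /= np.linalg.norm(w)
    return T @ w, P @ w
```

The first returned array corresponds to a single score-plot axis and the
second to the matching loading-plot axis, so that variables with large
absolute loadings are those contributing most to the class separation.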
[0086] Referring to FIG. 8, a PCA-DA score plot of the NMR data for
the urine samples of data sets 1 and 2 is shown. As illustrated,
the analysis groups the NMR data for the APO E3 and wildtype mice into
two substantially distinct regions in the score plot, an APO E3 region
810 and a wildtype region 820, indicating that urine samples alone
may suffice to develop a profile that reflects the transgenic
nature of the APO E3 mice and serve as a body fluid biomarker
profile for distinguishing APO E3 mice from other types of
mice.
[0087] Referring to FIG. 9, a score plot of the NMR data for the
urine samples of data set 1 is shown. As illustrated, the analysis
indicates that there are similarities and differences within the
urine samples of data set 1 that correlate with urine color.
Specifically, the analysis illustrates three distinct regions in
the score plot correlated to deep brown urine 910, brown urine 920,
and yellow urine 930. FIG. 9 illustrates that there are three
distinct subgroups of mouse urine profiles in the wildtype mouse
cohort.
[0088] Similarly in FIG. 10, a score plot of the NMR data for the
urine samples of data set 2 is shown. As illustrated, the analysis
indicates that there are similarities and differences within the
urine samples of data set 2 that correlate with urine color.
Specifically, the analysis illustrates three regions in the score
plot, one correlated to brown urine 1010, and another to pale brown
urine 1020, that slightly overlaps with a yellow urine correlated
region 1030. FIG. 10 illustrates that there are three subgroups of
mouse urine profiles in the APO E3 mouse cohort.
[0089] Referring to FIG. 11, a PCA-DA score plot of the NMR data
for the urine samples of both wildtype and APO E3 mice is shown. As
illustrated, the analysis indicates that there are similarities and
differences within the urine samples of data sets 1 and 2 even for
urine with the same color. Specifically, the analysis illustrates
three regions in the score plot, one correlated to yellow APO E3
mouse urine 1110, one to pale brown APO E3 mouse urine 1120, and
another to yellow wildtype mouse urine 1130. FIG. 11 illustrates
that there are three distinct subgroups of mouse urine profiles
which can be used as profiles to distinguish APO E3 animals
from wildtype animals, and to distinguish animals producing yellow
urine from those producing pale brown urine.
[0090] Referring to FIG. 12, a PCA-DA score plot of the NMR data
for the blood plasma samples of data sets 3 and 4 is shown. As
illustrated, the analysis groups the NMR data for the APO E3 and
wildtype mice into two substantially distinct regions in the score
plot, a wildtype region 1210 and an APO E3 region 1220, indicating
that blood samples alone may suffice to develop a profile that
distinguishes APO E3 mice from wildtype mice.
[0091] Referring to FIG. 13, a PCA-DA score plot of the NMR data
for the blood plasma samples of data sets 5, 6 and the human
samples is shown. As illustrated, the analysis groups the NMR data
into regions corresponding to each organism type, a human region 1310, a
wildtype region 1320 and an APO E3 region 1330. FIG. 13 indicates
that blood plasma samples may suffice to develop a profile that
distinguishes organisms and genotypes. In one embodiment,
information at a second level of correlation is obtained from the
analysis illustrated in FIG. 13 to investigate, for example, the
contribution of each metabolite measured by the NMR technique to
the segregation of the data into three regions. In one version a
loading plot is used to determine a second level of correlation. An
example of a loading plot for axis D2 of FIG. 13 is shown in FIG.
14.
[0092] Referring to FIGS. 14 and 2A, four ranges of numbers are
circled 1401-1404. The abscissa corresponds to masses (or
mass-to-charge ranges). Points with positive values along the
ordinate indicate component masses that are lower in abundance in
the APO E3 mouse versus wildtype, and negative values indicate the
reverse. As can be seen in FIG. 14, the circled ranges make a
significant contribution to the correlations of, for example, FIG.
13. The mass chromatograms associated with these regions were
investigated 350 and the upper circled ranges 1401, 1403 were found
to be associated with lyso-phosphatidylcholines ("lyso-PC"), and the
lower ranges 1402, 1404 with triglycerides. An example of the
phosphatidylcholine mass chromatograms for both wildtype and APO E3
mouse is shown in FIG. 15, and an example of the
lyso-phosphatidylcholine mass chromatograms for both wildtype and
APO E3 mouse is shown in FIG. 16.
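The selection of mass ranges for inspection from a loading plot may be
sketched as follows. The sketch simply ranks masses by the absolute
value of their loading and splits the top-ranked masses by sign; the
number of masses retained (k) is an assumption, and the function name
top_loading_masses is hypothetical. How the sign of a loading maps onto
higher or lower abundance depends on the orientation of the axis in the
particular analysis.

```python
def top_loading_masses(masses, loadings, k=4):
    """Return (positive, negative) lists of the k masses with largest |loading|."""
    ranked = sorted(zip(masses, loadings),
                    key=lambda pair: abs(pair[1]), reverse=True)[:k]
    positive = sorted(m for m, value in ranked if value > 0)
    negative = sorted(m for m, value in ranked if value < 0)
    return positive, negative
```

The mass chromatograms of the returned masses would then be inspected
individually, as described above.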
[0093] Referring to FIG. 15, a series of peaks corresponding to
phosphatidylcholines, where n refers to the number of residues, is
shown for both wildtype (thin solid line) and APO E3 (thick solid
line) plasma samples. The chromatograms in FIG. 15 are each
normalized such that the maximum intensity of the n=3 peak 1510 is
equal for all the spectra and it should be noted that although some
n=1 is present, the majority of the signal corresponding to this
peak location 1540 is not believed to arise from a
phosphatidylcholine. As illustrated, it was observed that the peaks
corresponding to n=5 1520, 1530 were substantially reduced in the
APO E3 mouse spectra relative to wildtype.
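The normalization described for FIG. 15 may be illustrated by the
following sketch, which scales each chromatogram so that the maximum
intensity within a chosen reference window equals one. The representation
of a chromatogram as a list of intensities and of the reference peak as
an index window are assumptions; the function name
normalize_to_reference is hypothetical.

```python
def normalize_to_reference(intensities, ref_window):
    """Scale a chromatogram so the reference peak maximum equals 1.0.

    ref_window is a (start, stop) pair of indices bracketing the
    reference peak (stop exclusive).
    """
    start, stop = ref_window
    ref_max = max(intensities[start:stop])
    if ref_max <= 0:
        raise ValueError("reference peak has no positive intensity")
    return [value / ref_max for value in intensities]
```

After this scaling, peaks outside the reference window can be compared
between wildtype and APO E3 chromatograms on a common relative scale.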
[0094] Referring to FIG. 16, a series of peaks corresponding to
lyso-phosphatidylcholines, where the designation x:y refers to x
number of carbon atoms on the fatty acids and y carbon bonds, is
shown for both wildtype (thin solid line) and APO E3 (thick solid
line) plasma samples. The chromatograms in FIG. 16 are each
normalized such that the maximum intensity of peak 1610 is equal
for all the spectra. As illustrated, it was observed that the peaks
corresponding to arachidonic acid 1620, and linoleic acid 1630 were
substantially reduced in the APO E3 mouse spectra relative to
wildtype.
[0095] Referring again to FIGS. 2A and 2B, a second multivariate
analysis was also performed ("YES" to query 360) comprising a
canonical correlation. This second multivariate analysis was
performed on data sets 3, 4, 5, and 6, 371, to produce a canonical
correlation score plot 381. An example of the results of this
second multivariate analysis is shown in FIG. 17. It should be
noted that analysis 371 correlates data from two very different
spectrometric techniques: data sets 3 and 4 from NMR, and 5 and 6
from LC-MS. Such an analysis, for example, may discern whether
different information is being provided by such different
techniques.
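A canonical correlation between two blocks of measurements made on the
same samples may be sketched as follows. The sketch uses the standard
QR-and-SVD formulation and assumes that each block has already been
reduced to a small number of variables (for example, a few principal
component scores), so that there are more samples than variables and
each block has full column rank. The function name
canonical_correlation is hypothetical, and the sketch is not asserted
to be the procedure used to produce FIG. 17.

```python
import numpy as np


def canonical_correlation(X, Y):
    """Canonical correlations and variates for two blocks with matching rows."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the column spaces of the centered blocks.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx^T Qy are the canonical correlations.
    U, s, Vt = np.linalg.svd(Qx.T @ Qy)
    k = min(X.shape[1], Y.shape[1])
    return s[:k], Qx @ U[:, :k], Qy @ Vt.T[:, :k]
```

Plotting the first canonical variate of one block against that of the
other gives a score plot in which samples that both techniques rank
similarly fall close to a straight line.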
[0096] As illustrated in FIG. 17, the canonical correlation groups
both NMR and LC-MS results for the APO E3 mouse and wildtype mouse
into two substantially distinct regions in the plot, a wildtype
region 1710 and an APO E3 region 1720, indicating that both NMR and
LC-MS techniques result in segregation into distinct regions;
however, the LC-MS method yielded a more pronounced separation.
[0097] A second multivariate analysis was performed on data sets 5,
6, 7 and 8, 372, to produce a canonical correlation score plot 382.
An example of the results of this second multivariate analysis is
shown in FIG. 18. It should be noted that analysis 372 correlates
data from what is in many respects the same spectrometric technique,
LC-MS, but with different instrument configurations: data sets 5 and 6 using
ESI, and 7 and 8 using APCI. Such an analysis, for example, may
discern whether different information is being provided by such
different instrument configurations. In addition, such a
multivariate analysis may be used to discern whether different
machines (that use the exact same instrumentation) provide
different information. In cases where different machines provide
significantly different information (on the same sample, using the
same technique, parameters, and instrumentation) user or machine
errors may be detected.
[0098] As illustrated in FIG. 18, the canonical correlation groups
both ESI LC-MS results (crosses +) and APCI LC-MS results
(asterisks *) for the APO E3 mouse and wildtype mouse into two
substantially distinct regions in the plot, a wildtype region 1810
and an APO E3 region 1820, indicating that both ESI LC-MS and APCI
LC-MS techniques result in segregation into distinct regions.
[0099] While the invention has been particularly shown and
described with reference to specific embodiments, it should be
understood by those skilled in the art that various changes in form
and detail may be made therein without departing from the spirit
and scope of the invention as defined by the appended claims. The
scope of the invention is thus indicated by the appended claims and
all changes which come within the meaning and range of equivalency
of the claims are therefore intended to be embraced.
* * * * *