U.S. patent application number 10/475734, for methods for analysis of spectral data and their applications, was published by the patent office on 2005-06-16.
Invention is credited to Brindle, Joanne Tracey; Graiger, David John; Holmes, Elaine; Lindon, John Christopher; Nicholson, Jeremy K.
Application Number | 10/475734 |
Publication Number | 20050130321 |
Family ID | 32715151 |
Publication Date | 2005-06-16 |
United States Patent Application | 20050130321 |
Kind Code | A1 |
Nicholson, Jeremy K.; et al. | June 16, 2005 |
Methods for analysis of spectral data and their applications
Abstract
This invention pertains to chemometric methods for the analysis
of chemical, biochemical, and biological data, for example,
spectral data, for example, nuclear magnetic resonance (NMR)
spectra, and their applications, including, e.g., classification,
diagnosis, and prognosis.
Inventors: |
Nicholson, Jeremy K. (Croydon, GB); Holmes, Elaine (London, GB);
Lindon, John Christopher (Westerham, GB); Brindle, Joanne Tracey
(Watchfield, GB); Graiger, David John (Cambridge, GB) |
Correspondence Address: |
Robert K Cerpa, Morrison & Foerster, 755 Page Mill Road, Palo Alto,
CA 94304-1018, US |
Appl. No.: | 10/475734 |
Filed: | November 29, 2004 |
PCT Filed: | April 23, 2002 |
PCT NO: | PCT/GB02/01881 |
Current U.S. Class: | 436/518; 436/84; 702/19 |
Current CPC Class: | A61B 5/412 20130101; A61B 5/055 20130101;
G01R 33/4625 20130101; G01R 33/465 20130101; A61B 5/7264 20130101;
A61P 19/08 20180101; A61B 5/7267 20130101; A61B 5/7232 20130101;
A61B 5/7203 20130101 |
Class at Publication: | 436/518; 436/084; 702/019 |
International Class: | G06F 019/00; G01N 033/48; G01N 033/50;
G01N 033/20; G01N 033/543 |
Foreign Application Data
Date | Code | Application Number
Apr 23, 2001 | GB | 00109930.8
Jul 17, 2001 | GB | 0117428.3
Claims
1. A method of classifying a sample, said method comprising the
step of relating NMR spectral intensity at one or more
predetermined diagnostic spectral windows for said sample with a
predetermined condition.
2-182. (canceled)
183. A method of classifying a sample, said method comprising the
step of relating NMR spectral intensity at one or more
predetermined diagnostic spectral windows for said sample with a
predetermined condition.
184. A method according to claim 183, wherein said sample is a
sample from a subject and said predetermined condition is a
predetermined condition of said subject.
185. A method according to claim 183, wherein said relating with a
predetermined condition is relating with the presence or absence of
a predetermined condition.
186. A method according to claim 184, wherein said relating with a
predetermined condition is relating with the presence or absence of
a predetermined condition.
187. A method according to claim 183, wherein said relating NMR
spectral intensity is relating a modulation of NMR spectral
intensity, relative to a control value.
188. A method according to claim 184, wherein said relating NMR
spectral intensity is relating a modulation of NMR spectral
intensity, relative to a control value.
189. A method according to claim 185, wherein said relating NMR
spectral intensity is relating a modulation of NMR spectral
intensity, relative to a control value.
190. A method according to claim 186, wherein said relating NMR
spectral intensity is relating a modulation of NMR spectral
intensity, relative to a control value.
191. A method according to claim 183, wherein said one or more
predetermined diagnostic spectral windows is: a single
predetermined diagnostic spectral window.
192. A method according to claim 183, wherein said one or more
predetermined diagnostic spectral windows is: a plurality of
predetermined diagnostic spectral windows.
193. A method according to claim 183, wherein said one or more
predetermined diagnostic spectral windows is: a plurality of
diagnostic spectral windows, and, said NMR spectral intensity at
one or more predetermined diagnostic spectral windows is: a
combination of a plurality of NMR spectral intensities, each of
which is NMR spectral intensity for one of said plurality of
predetermined diagnostic spectral windows.
194. A method according to claim 193, wherein said combination is a
linear combination.
195. A method according to claim 183, wherein said one or more
predetermined diagnostic spectral windows are associated with one
or more diagnostic species.
196. A method according to claim 183, wherein at least one of said
one or more predetermined diagnostic spectral windows encompasses a
chemical shift value for an NMR resonance of a diagnostic
species.
197. A method according to claim 183, wherein each of a plurality of said
one or more predetermined diagnostic spectral windows encompasses a
chemical shift value for an NMR resonance of a diagnostic
species.
198. A method of classifying a sample, said method comprising the
step of relating the amount of, or relative amount of one or more
diagnostic species present in said sample with a predetermined
condition.
199. A method according to claim 198, wherein said sample is a
sample from a subject and said predetermined condition is a
predetermined condition of said subject.
200. A method according to claim 198, wherein said relating with a
predetermined condition is relating with the presence or absence of
a predetermined condition.
201. A method according to claim 199, wherein said relating with a
predetermined condition is relating with the presence or absence of
a predetermined condition.
202. A method according to claim 198, wherein said relating the
amount of, or relative amount of one or more diagnostic species is
relating a modulation of the amount of, or relative amount of one
or more diagnostic species.
203. A method according to claim 199, wherein said relating the
amount of, or relative amount of one or more diagnostic species is
relating a modulation of the amount of, or relative amount of one
or more diagnostic species.
204. A method according to claim 200, wherein said relating the
amount of, or relative amount of one or more diagnostic species is
relating a modulation of the amount of, or relative amount of one
or more diagnostic species.
205. A method according to claim 201, wherein said relating the
amount of, or relative amount of one or more diagnostic species is
relating a modulation of the amount of, or relative amount of one
or more diagnostic species.
206. A method according to claim 198, wherein said classification
is performed on the basis of an amount, or a relative amount, of a
single diagnostic species.
207. A method according to claim 198, wherein said classification
is performed on the basis of an amount, or a relative amount, of a
plurality of diagnostic species.
208. A method according to claim 198, wherein said classification
is performed on the basis of a total amount, or a relative total
amount, of a plurality of diagnostic species.
209. A method according to claim 198, wherein: said one or more
diagnostic species is: a plurality of diagnostic species; and, said
amount of, or relative amount of one or more diagnostic species is:
a combination of a plurality of amounts, or relative amounts, each
of which is the amount of, or relative amount of one of said
plurality of diagnostic species.
210. A method according to claim 209, wherein said combination is a
linear combination.
211. A method of classifying a test sample, said method comprising
the step of: using a predictive mathematical model; wherein said
model is formed by applying a modelling method to modelling data;
to classify said test sample.
212. A method according to claim 211, wherein said modelling data
comprises a plurality of data sets for modelling samples of known
class; and said classifying is classifying said test sample as
being a member of one of said known classes.
213. A method according to claim 211, wherein said modelling data
comprises at least one data set for each of a plurality of
modelling samples; wherein said modelling samples define a class
group consisting of a plurality of classes; wherein each of said
modelling samples is of a known class selected from said class
group; and said model is used with a data set for said test sample;
and said classifying is classifying said test sample as being a
member of one class selected from said class group.
214. A method according to claim 213, wherein said class group
comprises classes associated with said predetermined condition.
215. A method according to claim 211, wherein said modelling method
is a multivariate statistical analysis modelling method.
216. A method according to claim 211, wherein said modelling method
is a multivariate statistical analysis modelling method which
employs a pattern recognition method.
217. A method according to claim 211, wherein said modelling method
is, or employs PCA, PLS, or PLS-DA.
218. A method according to claim 211, wherein said modelling method
includes a step of data filtering, orthogonal data filtering, or
OSC.
219. A method according to claim 211, wherein said modelling data
comprise NMR spectral data.
220. A method according to claim 211, wherein said modelling data
comprise both NMR spectral data and non-NMR spectral data.
221. A method according to claim 211, wherein said modelling data
comprises at least one data set for each of a plurality of
modelling samples.
222. A method according to claim 221, wherein each of said data
sets comprises NMR spectral data.
223. A method according to claim 221, wherein each of said data
sets comprises both NMR spectral data and non-NMR spectral
data.
224. A method of classifying a subject comprising classifying a
sample from said subject by a method according to claim 183.
225. A method of diagnosis of a predetermined condition of a
subject comprising classifying a sample from said subject by a
method according to claim 183.
226. A method of prognosis of a subject which employs a method
according to claim 183.
227. A method of therapeutic monitoring of a subject undergoing
therapy which employs a method according to claim 183.
228. A method of evaluating drug therapy and/or drug efficacy which
employs a method according to claim 183.
229. A method of identifying a diagnostic species, or a combination
of a plurality of diagnostic species, for a predetermined
condition, said method comprising the steps of: (a) applying a
multivariate statistical analysis method to experimental data;
wherein said experimental data comprises at least one data set
comprising experimental parameters measured for each of a plurality
of experimental samples; wherein said experimental samples define a
class group consisting of a plurality of classes; wherein at least
one of said plurality of classes is a class associated with said
predetermined condition, e.g., a class associated with the presence
of said predetermined condition; wherein at least one of said
plurality of classes is a class not associated with said
predetermined condition, e.g., a class associated with the absence
of said predetermined condition; wherein each of said experimental
samples is of known class selected from said class group; and: (b)
identifying one or more critical experimental parameters; wherein
each of said critical experimental parameters is statistically
significantly different for classes of said class group, e.g., is
statistically significant for discriminating between classes of
said class group; and, (c) matching each of one or more of said one
or more critical experimental parameters with said diagnostic
species; or: (b) identifying a combination of a plurality of
critical experimental parameters; wherein said combination of a
plurality of critical experimental parameters is statistically
significantly different for classes of said class group, e.g., is
statistically significant for discriminating between classes of
said class group; and, (c) matching each of one or more of said
plurality of critical experimental parameters with said combination
of a plurality of diagnostic species.
230. A method according to claim 229, wherein: one or more of said
critical experimental parameters is a spectral parameter, and said
identifying and matching steps are: (b) identifying one or more
critical experimental spectral parameters; and, (c) matching each
of one or more of said one or more critical experimental spectral
parameters with a spectral feature, e.g., a spectral peak; and
matching one or more of said spectral peaks with said diagnostic
species; or: (b) identifying a combination of a plurality of
critical experimental spectral parameters; and, (c) matching each
of a plurality of said plurality of critical experimental spectral
parameters with a spectral feature, e.g., a spectral peak; and
matching one or more of said spectral peaks with said combination
of a plurality of diagnostic species.
231. A method according to claim 229, wherein said multivariate
statistical analysis method is a multivariate statistical analysis
method which employs a pattern recognition method.
232. A method according to claim 229, wherein said multivariate
statistical analysis method is, or employs PCA, PLS or PLS-DA.
233. A method according to claim 229, wherein said multivariate
statistical analysis method includes a step of data filtering, a
step of orthogonal data filtering, or a step of OSC.
234. A method according to claim 229, wherein said experimental
parameters comprise NMR spectral data.
235. A method according to claim 229, wherein said experimental
parameters comprise both NMR spectral data and non-NMR spectral
data.
236. A method according to claim 229, wherein said class group
comprises classes associated with said predetermined condition.
237. A method according to claim 236, said method further
comprising the additional step of: (d) confirming the identity of
said diagnostic species.
238. A diagnostic species identified by a method according to claim
229.
239. A method of classification which employs or relies upon one or
more diagnostic species identified by a method according to claim
229.
240. An assay for use in a method of classification, which assay
relies upon one or more diagnostic species identified by a method
according to claim 229.
241. A computer program, optionally embodied on a computer readable
medium, comprising computer program means adapted to perform a
method according to claim 183, when said program is run on a
computer.
242. A system comprising: (a) a first component comprising a device
for obtaining NMR spectral intensity data for a sample; and, (b) a
second component comprising a computer system or device, such as a
computer or linked computers, operatively configured to implement a
method according to claim 183, and operatively linked to said first
component.
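Claims 183 and 193 to 194 describe classifying a sample by relating NMR spectral intensity in one or more predetermined diagnostic spectral windows, optionally combined linearly, with a predetermined condition. The Python sketch below shows one way such a computation could be organised. It is not the inventors' implementation: the window boundaries, weights and threshold are hypothetical placeholders, the intensity of a window is assumed to be the simple sum of the spectrum points inside it, and the spectrum is synthetic.

```python
import numpy as np

def window_intensity(ppm, spectrum, window):
    """Sum the intensity of all points whose chemical shift lies in [lo, hi] ppm."""
    lo, hi = window
    mask = (ppm >= lo) & (ppm <= hi)
    return float(spectrum[mask].sum())

def window_score(ppm, spectrum, windows, weights):
    """Linear combination of per-window intensities (cf. claims 193-194)."""
    intensities = np.array([window_intensity(ppm, spectrum, w) for w in windows])
    return float(np.dot(np.asarray(weights, dtype=float), intensities))

def classify(score, threshold):
    """Relate the score to a condition using a single hypothetical cut-off."""
    return "present" if score > threshold else "absent"

# Synthetic spectrum: a 0-10 ppm axis sampled every 0.01 ppm.
ppm = np.linspace(0.0, 10.0, 1001)
spectrum = np.zeros_like(ppm)
spectrum[np.abs(ppm - 3.22) < 0.015] = 1.0   # three points near 3.22 ppm
spectrum[np.abs(ppm - 1.32) < 0.015] = 2.0   # three points near 1.32 ppm

# Hypothetical diagnostic windows and weights, for illustration only.
windows = [(3.20, 3.25), (1.30, 1.35)]
weights = [1.0, 0.5]

score = window_score(ppm, spectrum, windows, weights)
print(score, classify(score, threshold=4.0))
```

In practice the weights and threshold would come from a model fitted to samples of known class (as in claims 211 to 218) rather than being chosen by hand.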
Description
RELATED APPLICATIONS
[0001] This application is related to (and where permitted by law,
claims priority to):
[0002] (a) United Kingdom patent application GB 0109930.8 filed 23
Apr. 2001;
[0003] (b) United Kingdom patent application GB 0117428.3 filed 17
Jul. 2001;
[0004] (c) United States Provisional patent application U.S. Ser.
No. 60/307,015 filed 20 Jul. 2001;
[0005] the contents of each of which are incorporated herein by
reference in their entirety.
[0006] This application is one of five applications filed on even
date naming the same applicant:
[0007] (1) attorney reference number WJW/LP5995600
(PCT/GB02/______);
[0008] (2) attorney reference number WJW/LP5995618
(PCT/GB02/______);
[0009] (3) attorney reference number WJW/LP5995626
(PCT/GB02/______);
[0010] (4) attorney reference number WJW/LP5995634
(PCT/GB02/______);
[0011] (5) attorney reference number WJW/LP5995642
(PCT/GB02/______);
[0012] the contents of each of which are incorporated herein by
reference in their entirety.
TECHNICAL FIELD
[0013] This invention pertains generally to the field of
metabonomics, and, more particularly, to chemometric methods for
the analysis of chemical, biochemical, and biological data, for
example, spectral data, for example, nuclear magnetic resonance
(NMR) spectra, and their applications, including, e.g.,
classification, diagnosis, prognosis, etc.
BACKGROUND
[0014] Throughout this specification, including the claims which
follow, unless the context requires otherwise, the word "comprise,"
and variations such as "comprises" and "comprising," will be
understood to imply the inclusion of a stated integer or step or
group of integers or steps but not the exclusion of any other
integer or step or group of integers or steps.
[0015] It must be noted that, as used in the specification and the
appended claims, the singular forms "a," "an," and "the" include
plural referents unless the context clearly dictates otherwise.
Ranges are often expressed herein as from "about" one particular
value, and/or to "about" another particular value. When such a
range is expressed, another embodiment includes from the one
particular value and/or to the other particular value. Similarly,
when values are expressed as approximations, by the use of the
antecedent "about," it will be understood that the particular value
forms another embodiment.
[0016] Biosystems
[0017] Biosystems can conveniently be viewed at several levels of
bio-molecular organisation based on biochemistry, i.e., genetic and
gene expression (genomic and transcriptomic), protein and
signalling (proteomic) and metabolic control and regulation
(metabonomic). There are also important cellular ionic regulation
variations that relate to genetic, proteomic and metabolic
activities, and these should also be studied systematically, even at
the cellular and sub-cellular level, to complete the full description
of the bio-molecular organisation of a bio-system.
[0018] Significant progress has been made in developing methods to
determine and quantify the biochemical processes occurring in
living systems. Such methods are valuable in the diagnosis,
prognosis and treatment of disease, the development of drugs, for
improving therapeutic regimes for current drugs, and the like.
[0019] Many diseases of the human or animal body (such as cancers,
degenerative diseases, autoimmune diseases and the like) have an
underlying basis in alterations in the expression of certain genes.
The expressed gene products, proteins, mediate effects such as
abnormal cell growth, cell death or inflammation. Some of these
effects are caused directly by protein-protein interactions; others
are caused by proteins acting on small molecules (e.g. "second
messengers") which trigger effects including further gene
expression.
[0020] Likewise, disease states caused by external agents such as
viruses and bacteria provoke a multitude of complex responses in
the infected host.
[0021] In a similar manner, the treatment of disease through the
administration of drugs can result in a wide range of desired
effects and unwanted side effects in a patient.
[0022] In recent years, it has been appreciated that the reaction
of human and animal subjects to disease and treatments for them can
vary according to the genomic makeup of an individual. This has led
to the development of the field of "pharmacogenomics." A fuller
understanding of how an individual's own genome reacts to a
particular disease and/or drug treatment will allow the development
of new therapies, as well as the refinement of existing ones.
[0023] At the genetic level, methods for examining gene expression
in response to these types of events are often referred to as
"genomic methods," and are concerned with the detection and
quantification of the expression of an organism's genes,
collectively referred to as its "genome," usually by detecting
and/or quantifying genetic molecules, such as DNA and RNA. Genomic
studies often exploit proprietary "gene chips," which are small
disposable devices encoded with an array of genes that respond to
extracted mRNAs produced by cells (see, for example, Klenk et al.,
1997). Many genes can be placed on a chip array and patterns of
gene expression, or changes therein, can be monitored rapidly,
although at some considerable cost.
[0024] However, the biological consequences of gene expression, or
altered gene expression following perturbation, are extremely
complex. This has led to the development of "proteomic methods"
which are concerned with the semi-quantitative measurement of the
production of cellular proteins of an organism, collectively
referred to as its "proteome" (see, for example, Geisow, 1998).
Proteomic measurements utilise a variety of technologies, but all
involve a protein separation method, e.g., 2D gel-electrophoresis,
allied to a chemical characterisation method, usually, some form of
mass spectrometry.
[0025] At present, genomic methods have a high associated
operational cost, and proteomic methods require investment in
expensive capital equipment and are labour intensive, but both
have the potential to be powerful tools for studying biological
response. The choice of method is still uncertain since careful
studies have sometimes shown a low correlation between the pattern
of gene expression and the pattern of protein expression, probably
due to sampling for the two technologies at inappropriate time
points. See, e.g., Gygi et al., 1999. Even in combination, genomic
and proteomic methods still do not provide the range of information
needed for understanding integrated cellular function in a living
system, since they do not take account of the dynamic metabolic
status of the whole organism.
[0026] For example, genomic and proteomic studies may implicate a
particular gene or protein in a disease or a xenobiotic response
because the level of expression is altered, but the change in gene
or protein level may be transitory or may be counteracted
downstream and as a result there may be no effect at the cellular
and/or biochemical level. Conversely, sampling tissue for genomic
and proteomic studies at inappropriate time points may result in a
relevant gene or protein being overlooked.
[0027] Gene-based prognosis has yet to become a clinical reality
for any major prevalent disease, almost all of which have
multi-gene modes of inheritance and significant environmental
impact, making it difficult to identify the gene panels responsible
for susceptibility.
[0028] While genomic and proteomic methods may be useful aids, for
example, in drug development, they do suffer from substantial
limitations. For example, while genomic and proteomic methods may
ultimately give profound insights into toxicological mechanisms and
provide new surrogate biomarkers of disease, at present it is very
difficult to relate genomic and proteomic findings to classical
cellular or biochemical indices or endpoints. One simple reason for
this is that, with current technology and approaches, correlating
the time-response to drug exposure is difficult. Further
difficulties arise with in vitro cell-based studies. These
difficulties are particularly important in the many known cases
where metabolism of the compound is a prerequisite for a toxic
effect, especially where the target organ is not the site of
primary metabolism. This is particularly true for pro-drugs,
where some aspect of in situ chemical (e.g., enzymatic)
modification is required for activity.
[0029] Metabonomics
[0030] A new "metabonomic" approach has been developed which is
aimed at augmenting and complementing the information provided by
genomics and proteomics. "Metabonomics" is conventionally defined
as "the quantitative measurement of the multiparametric metabolic
response of living systems to pathophysiological stimuli or genetic
modification" (see, for example, Nicholson et al., 1999). This
concept has arisen primarily from the application of ¹H NMR
spectroscopy to study the metabolic composition of biofluids,
cells, and tissues and from studies utilising pattern recognition
(PR), expert systems and other chemoinformatic tools to interpret
and classify complex NMR-generated metabolic data sets. Metabonomic
methods have the potential, ultimately, to determine the entire
dynamic metabolic make-up of an organism.
[0031] As outlined above, each level of bio-molecular organisation
requires a series of analytical bio-technologies appropriate to the
recovery of the individual types of bio-molecular data. Genomic,
proteomic and metabonomic technologies by definition generate
massive data sets which require appropriate multi-variate
statistical tools (chemometrics, bio-informatics) for data mining
and to extract useful biological information. These data
exploration tools also allow the inter-relationships between
multivariate data sets from the different technologies to be
investigated; they also facilitate dimension reduction and the
extraction of latent properties, and allow multidimensional visualization.
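Principal component analysis (PCA), one of the methods named in the claims, is a common example of the dimension reduction and latent-variable extraction described above. The sketch below is a minimal NumPy implementation via singular value decomposition, assuming a samples-by-variables matrix such as binned NMR spectra. The synthetic data and group structure are invented for illustration, and no scaling or data filtering (e.g., OSC) is applied.

```python
import numpy as np

def pca(X, n_components=2):
    """PCA of an (n_samples, n_variables) matrix via SVD of the column-centred data.

    Returns the sample scores, the variable loadings, and the fraction of the
    total variance explained by each retained component.
    """
    Xc = X - X.mean(axis=0)                            # centre each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]    # equals Xc @ Vt[:k].T
    loadings = Vt[:n_components].T
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, loadings, explained[:n_components]

# Synthetic data: 20 samples x 50 variables; two groups differ only in variable 5.
rng = np.random.default_rng(0)
X = rng.normal(0.0, 0.1, size=(20, 50))
X[:10, 5] += 2.0
X[10:, 5] -= 2.0

scores, loadings, explained = pca(X)
print(explained.round(3))
print(np.sign(scores[:, 0]))   # the first component separates the two groups
```

The scores give the low-dimensional view used for visualization, and the loadings indicate which variables (here variable 5) drive the separation. Supervised variants such as PLS-DA instead use the class labels when extracting the latent components.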
[0032] This leads to the concept of "bionomics", the quantitative
measurement and understanding of the integrated function (and
dysfunction) of biological systems at all major levels of
bio-molecular organisation. In the study of altered gene
expression (known as transcriptomics), the variables are mRNA
responses measured using gene chips; in proteomics, protein
synthesis and associated post-translational modifications are
typically measured using (mainly) gel-electrophoresis coupled to
mass spectrometry. In both cases, thousands of variables can be
measured and related to biological end-points using statistical
methods. In metabolic (metabonomic) studies, only NMR (especially
¹H) and mass spectrometry have been used to provide this level
of data density on bio-materials, although these data can be
supplemented by conventional biochemical assays.
[0033] For in vivo mammalian studies, the ability to perform
metabonomic studies on biofluids such as plasma, CSF and urine is
very important because it gives integrated systems-based
information on the whole organism. Furthermore, in clinical
settings, for the full utilization of functional genomic knowledge
in patient screening, diagnostics and prognostics, it is much more
practical and ethically acceptable to analyze biofluid samples than
to perform human tissue biopsies and measure gene responses.
[0034] A pathological condition or a xenobiotic may act at the
pharmacological level only and hence may not affect gene regulation
or expression directly. Alternatively, significant disease or
toxicological effects may be completely unrelated to gene
switching. For example, exposure to ethanol in vivo may cause many
changes in gene expression but none of these events explains
drunkenness. In cases such as these, genomic and proteomic methods
are likely to be ineffective. However, all disease or drug-induced
pathophysiological perturbations result in disturbances in the
ratios and concentrations, binding or fluxes of endogenous
biochemicals, either by direct chemical reaction or by binding to
key enzymes or nucleic acids that control metabolism. If these
disturbances are of sufficient magnitude, effects will result which
will affect the efficient functioning of the whole organism. In
body fluids, metabolites are in dynamic equilibrium with those
inside cells and tissues and, consequently, abnormal cellular
processes in tissues of the whole organism following a toxic insult
or as a consequence of disease will be reflected in altered
biofluid compositions.
[0035] Fluids secreted, excreted, or otherwise derived from an
organism ("biofluids") provide a unique window into its biochemical
status since the composition of a given biofluid is a consequence
of the function of the cells that are intimately concerned with the
fluid's manufacture and secretion. For example, the composition of
a particular fluid (e.g., urine, blood plasma, milk, etc.) can
carry biochemical information on details of organ function (or
dysfunction), for example, as a result of xenobiotics, disease,
and/or genetic modification. Similarly, the composition and
condition of an organism's tissues are also indicators of the
organism's biochemical status.
[0036] In general, a xenobiotic is a substance (e.g., compound,
composition) which is administered to an organism, or to which the
organism is exposed. In general, xenobiotics are chemical,
biochemical or biological species (e.g., compounds) which are not
normally present in that organism, or are normally present in that
organism, but not at the level obtained following
administration/exposure. Examples of xenobiotics include drugs,
formulated medicines and their components (e.g., vaccines,
immunological stimulants, inert carrier vehicles), infectious
agents, pesticides, herbicides, substances present in foods (e.g.
plant compounds administered to animals), and substances present in
the environment.
[0037] In general, a disease state pertains to a deviation from the
normal healthy state of the organism. Examples of disease states
include, but are not limited to, bacterial, viral, and parasitic
infections; cancer in all its forms; degenerative diseases (e.g.,
arthritis, multiple sclerosis); trauma (e.g., as a result of
injury); organ failure (including diabetes); cardiovascular disease
(e.g., atherosclerosis, thrombosis); and, inherited diseases caused
by genetic composition (e.g., sickle-cell anaemia).
[0038] In general, a genetic modification pertains to alteration of
the genetic composition of an organism. Examples of genetic
modifications include, but are not limited to: the incorporation of
a gene or genes into an organism from another species; increasing
the number of copies of an existing gene or genes in an organism;
removal of a gene or genes from an organism; and, rendering a gene
or genes in an organism non-functional.
[0039] Biofluids often exhibit very subtle changes in metabolite
profile in response to external stimuli. This is because the body's
cellular systems attempt to maintain homeostasis (constancy of
internal environment), for example, in the face of cytotoxic
challenge. One means of achieving this is to modulate the
composition of biofluids. Hence, even when cellular homeostasis is
maintained, subtle responses to disease or toxicity are expressed
in altered biofluid composition. However, dietary, diurnal and
hormonal variations may also influence biofluid compositions, and
it is clearly important to differentiate these effects if correct
biochemical inferences are to be drawn from their analysis.
[0040] Metabonomics offers a number of distinct advantages (over
genomics and proteomics) in a clinical setting: firstly, it can
often be performed on standard preparations (e.g., of serum,
plasma, urine, etc.), circumventing the need for specialist
preparations of cellular RNA and protein required for genomics and
proteomics, respectively. Secondly, many of the risk factors
already identified (e.g., levels of various lipids in blood) are
small molecule metabolites which will contribute to the metabonomic
dataset.
[0041] Application of NMR to Metabonomics
[0042] One of the most successful approaches to biofluid analysis
has been the use of NMR spectroscopy (see, for example, Nicholson
et al., 1989); similarly, intact tissues have been successfully
analysed using magic-angle-spinning ¹H NMR spectroscopy (see,
for example, Moka et al., 1998; Tomlins et al., 1998).
[0043] The NMR spectrum of a biofluid provides a metabolic
fingerprint or profile of the organism from which the biofluid was
obtained, and this metabolic fingerprint or profile is
characteristically changed by a disease, toxic process, or genetic
modification. For example, NMR spectra may be collected for various
states of an organism (e.g., pre-dose and various times post-dose,
for one or more xenobiotics, separately or in combination; healthy
(control) and diseased animal; unmodified (control) and genetically
modified animal).
[0044] For example, in the evaluation of undesired toxic
side-effects of drugs, each compound or class of compound produces
characteristic changes in the concentrations and patterns of
endogenous metabolites in biofluids that provide information on the
sites and basic mechanisms of the toxic process. ¹H NMR
analysis of biofluids has successfully uncovered novel metabolic
markers of organ-specific toxicity in the laboratory rat, and it is
in this "exploratory" role that NMR as an analytical biochemistry
technique excels. However, the biomarker information in NMR spectra
of biofluids is very subtle, as hundreds of compounds representing
many pathways can often be measured simultaneously, and it is this
overall metabonomic response to toxic insult that so well
characterises the lesion.
[0045] Another important advantage of NMR-based metabonomics over
genomics or proteomics is the intrinsic analytical accuracy of NMR
spectroscopy. Reanalysis of the same sample by ¹H NMR spectroscopy
results in a typical coefficient of variation for the measurement
of peak intensities in a spectrum of less than 5% across the whole
range of peaks. Thus, if the appropriate experiments are undertaken,
on average the value of each peak intensity will lie in the range
0.95 to 1.05 of the true value. In addition, it is possible using
NMR spectroscopy to measure absolute amounts or concentrations of a
number of analytes, whereas using gene chip technology only fold
changes can be determined. The best available accuracy achieved
using gene chips is a two-fold change (i.e., the value for each
parameter lies in the range 0.50 to 2.00 fold of the "true" value),
and proteomic technology, which is subject to a similar limitation,
is even less intrinsically accurate.
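By way of illustration only, the coefficient of variation (CV) referred to above is, for each peak, the standard deviation of its intensity across repeated acquisitions of the same sample divided by the mean intensity. The minimal Python sketch below assumes the replicate peak intensities have already been measured and arranged with one row per replicate and one column per peak; the function names and the 5% cut-off argument are hypothetical.

```python
import numpy as np


def peak_cv(replicates):
    """Coefficient of variation of each peak across replicate spectra.

    replicates: array of shape (n_replicates, n_peaks) holding the peak
    intensities measured on repeated acquisitions of the same sample.
    """
    replicates = np.asarray(replicates, dtype=float)
    mean = replicates.mean(axis=0)
    sd = replicates.std(axis=0, ddof=1)  # sample standard deviation
    return sd / mean


def fraction_below(cv, limit=0.05):
    """Fraction of peaks whose CV is below the given limit (default 5%)."""
    return float(np.mean(np.asarray(cv) < limit))


# Three hypothetical replicate acquisitions of two peaks.
cv = peak_cv([[100.0, 50.0], [102.0, 49.0], [98.0, 51.0]])
print(cv, fraction_below(cv))
```

Using ddof=1 gives the sample (rather than population) standard deviation, which is the usual choice when only a few replicates are available.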
[0046] Although technology is undoubtedly improving at a rapid
rate, the gap between the intrinsic accuracies of NMR spectroscopy
and gene chip technology is so wide that it will require a
revolutionary rather than evolutionary improvement in gene
expression quantification methodology before it can rival the
accuracy of NMR spectroscopy.
[0047] The intrinsic accuracy of NMR provides a distinct advantage
when applying pattern recognition techniques. The multivariate
nature of the NMR data means that classification of samples is
possible using a combination of descriptors even when one
descriptor is not sufficient, because of the inherently low
analytical variation in the data. All biological fluids and tissues
have their own characteristic physico-chemical properties, and
these affect the types of NMR experiment that may be usefully
employed. One major advantage of using NMR spectroscopy to study
complex biomixtures is that measurements can often be made with
minimal sample preparation (usually with only the addition of 5-10%
D₂O) and a detailed analytical profile can be obtained on the
whole biological sample. Sample volumes are small, typically 0.3 to
0.5 mL for standard probes, and as low as 3 µL for microprobes.
Acquisition of simple NMR spectra is rapid and efficient using
flow-injection technology. It is usually necessary to suppress the
water NMR resonance.
[0048] Many biofluids are not chemically stable, and for this reason
care should be taken in their collection and storage. For example,
cell lysis in erythrocytes can easily occur. If a substantial
amount of D₂O has been added, then it is possible that certain
¹H NMR resonances will be lost by H/D exchange. Freeze-drying
of biofluid samples also causes the loss of volatile components
such as acetone. Biofluids are also very prone to microbiological
contamination, especially fluids, such as urine, which are
difficult to collect under sterile conditions. Many biofluids
contain significant amounts of active enzymes, either normally or
due to a disease state or organ damage, and these enzymes may alter
the composition of the biofluid following sampling. Samples should
be stored deep frozen to minimise the effects of such
contamination. Sodium azide is usually added to urine at the
collection point to act as an antimicrobial agent. Chelating agents
(e.g., EDTA) may be added to bind endogenous metal ions (e.g.,
Ca²⁺, Mg²⁺ and Zn²⁺), and/or metal ions may be added to bind
endogenous chelating agents (e.g., free amino acids, especially
glutamate, cysteine, histidine and aspartate; citrate), in order to
intentionally alter and/or enhance the NMR spectrum.
[0049] In all cases the analytical problem usually involves the
detection of "trace" amounts of analytes in a very complex matrix
of potential interferences. It is, therefore, critical to choose a
suitable analytical technique for the particular class of analyte
of interest in the particular biomatrix, which could be, for
example, a biofluid or a tissue. High resolution NMR spectroscopy
(in particular ¹H NMR) appears to be particularly appropriate.
The main advantages of using ¹H NMR spectroscopy in this area
are the speed of the method (with spectra being obtained in 5 to 10
minutes), the requirement for minimal sample preparation, and the
fact that it provides a non-selective detector for all metabolites
in the biofluid regardless of their structural type, provided only
that they are present above the detection limit of the NMR
experiment and that they contain non-exchangeable hydrogen atoms.
The speed advantage is of crucial importance in this area of work
because a patient's clinical condition may require rapid diagnosis
and can change very rapidly, so that correspondingly rapid changes
must be made to the therapy provided.
[0050] NMR studies of body fluids should ideally be performed at
the highest magnetic field available to obtain maximal dispersion
and sensitivity, and most ¹H NMR studies have been performed at
400 MHz or greater. With every increase in available spectrometer
frequency, the number of resonances that can be resolved in a
biofluid increases; although this solves some assignment problems,
it also poses new ones.
Furthermore, there are still important problems of spectral
interpretation that arise due to compartmentation and binding of
small molecules in the organised macromolecular domains that exist
in some biofluids such as blood plasma and bile. All this
complexity need not reduce the diagnostic capabilities and
potential of the technique, but demonstrates the problems of
biological variation and the influence of variation on diagnostic
certainty.
[0051] The information content of biofluid spectra is very high, and
the complete assignment of the ¹H NMR spectrum of most
biofluids is usually not possible (even using 900 MHz NMR
spectroscopy). However, the assignment problems vary considerably
between biofluid types. Some fluids have near constant composition
and concentrations and in these the majority of the NMR signals
have been assigned. In contrast, urine composition can be very
variable and there is enormous variation in the concentration range
of NMR-detectable metabolites; consequently, complete analysis is
much more difficult. Those metabolites present close to the limits
of detection for 1-dimensional (1D) NMR spectroscopy (typically ca.
100 nM at 800 MHz) pose severe NMR spectral assignment problems.
(In absolute terms, the detection limit may be ca. 4 nmol, e.g., 1
µg of a 250 g/mol compound in a 0.5 mL sample volume.) Even at
the present level of technology in NMR, it is not yet possible to
detect many important biochemical substances (e.g., hormones, some
proteins, nucleic acids) in body fluids because of problems with
sensitivity, line widths, dispersion and dynamic range, and this
area of research will continue to be technology-limited. In
addition, the collection of NMR spectra of biofluids may be
complicated by the relative intensity of the water signal, sample
viscosity, protein content, lipid content, and overlap of peaks
from low molecular weight species.
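The absolute figure given in parentheses above follows from the relation n = m/M: 1 µg divided by 250 g/mol gives 4 nmol. The short Python sketch below reproduces that arithmetic and, additionally, divides by the 0.5 mL sample volume to give the corresponding concentration; the helper names are hypothetical and the inputs are simply the values quoted in the text.

```python
MICROGRAM = 1e-6   # grams
MILLILITRE = 1e-3  # litres


def amount_mol(mass_g, molar_mass_g_per_mol):
    """Amount of substance n = m / M, in moles."""
    return mass_g / molar_mass_g_per_mol


def concentration_mol_per_l(amount, volume_l):
    """Concentration c = n / V, in mol/L."""
    return amount / volume_l


n = amount_mol(1 * MICROGRAM, 250.0)               # 1 µg of a 250 g/mol compound
c = concentration_mol_per_l(n, 0.5 * MILLILITRE)   # in a 0.5 mL sample volume
print(f"{n * 1e9:.1f} nmol, {c * 1e6:.1f} µmol/L")
```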
[0052] Usually, in order to assign ¹H NMR spectra, comparison
is made with spectra of authentic materials and/or by standard
addition of an authentic reference standard to the sample.
Additional confirmation of assignments is usually sought from the
application of other NMR methods, including, for example,
2-dimensional (2D) NMR methods, particularly COSY (correlation
spectroscopy), TOCSY (total correlation spectroscopy),
inverse-detected heteronuclear correlation methods such as HMBC
(heteronuclear multiple bond correlation), HSQC (heteronuclear
single quantum coherence), and HMQC (heteronuclear multiple quantum
coherence), 2D J-resolved (JRES) methods, spin-echo methods,
relaxation editing, diffusion editing (e.g., both 1D NMR and 2D NMR
such as diffusion-edited TOCSY), and multiple quantum filtering.
Detailed ¹H NMR spectroscopic data for a wide range of
metabolites and biomolecules found in biofluids have been published
(see, for example, Lindon et al., 1999) and supplementary
information is available in several literature compilations of data
(see, for example, Fan, 1996; Sze et al., 1994).
[0053] The successful application of ¹H NMR
spectroscopy of biofluids to study a variety of metabolic diseases
and toxic processes has now been well established, and many novel
metabolic markers of organ-specific toxicity have been discovered
(see, for example, Nicholson et al., 1989; Lindon et al., 1999).
For example, the NMR spectrum of urine is identifiably altered in
situations where damage has occurred to the kidney or liver. It has
been shown that specific and identifiable changes can be observed
which distinguish the organ that is the site of a toxic lesion.
It is also possible to focus on particular parts of an organ,
such as the cortex of the kidney, and, in favourable cases, even on
very localised parts of the cortex.
[0054] It is also possible to deduce the biochemical mechanism of
the xenobiotic toxicity, based on a biochemical interpretation of
the changes in the urine. A wide range of toxins has now been
investigated, mostly kidney and liver toxins, but also testicular,
mitochondrial and muscle toxins.
[0055] Pattern Recognition
[0056] However, a limiting factor in understanding the biochemical
information from both 1D and 2D-NMR spectra of tissues and
biofluids is their complexity. The most efficient way to
investigate these complex multiparametric data is to employ the 1D and
2D NMR metabonomic approach in combination with computer-based
"pattern recognition" (PR) methods and expert systems. These
statistical tools are similar to those currently being explored by
workers in the fields of genomics and proteomics.
[0057] Pattern recognition (PR) methods can be used to reduce the
complexity of data sets, to generate scientific hypotheses and to
test hypotheses. In general, the use of pattern recognition
algorithms allows the identification, and, with some methods, the
interpretation of some non-random behaviour in a complex system
which can be obscured by noise or random variations in the
parameters defining the system. Moreover, the number of parameters
can be so large that visualising the regularities, which the human
brain does best in no more than three dimensions, becomes
difficult. Usually the number of measured descriptors is much
greater than three, and so simple scatter plots cannot be used to
visualise any similarity between samples. Pattern recognition
methods have been used widely to characterise many different types
of problem, ranging, for example, over linguistics, fingerprinting,
chemistry and psychology. In the context of the methods described
herein, pattern recognition is the use of multivariate statistics,
both parametric and non-parametric, to analyse spectroscopic data,
and hence to classify samples and to predict the value of some
dependent variable based on a range of observed measurements. There
are two main approaches. One set of methods is termed
"unsupervised" and these simply reduce data complexity in a
rational way and also produce display plots which can be
interpreted by the human eye. The other approach is termed
"supervised" whereby a training set of samples with known class or
outcome is used to produce a mathematical model and this is then
evaluated with independent validation data sets.
[0058] Unsupervised PR methods are used to analyse data without
reference to any other independent knowledge, for example, without
regard to the identity or nature of a xenobiotic or its mode of
action. Examples of unsupervised pattern recognition methods
include principal component analysis (PCA), hierarchical cluster
analysis (HCA), and non-linear mapping (NLM).
[0059] One of the most useful and easily applied unsupervised PR
techniques is principal components analysis (PCA) (see, for
example, Kowalski et al., 1986). Principal components (PCs) are new
variables created from linear combinations of the starting
variables with appropriate weighting coefficients. The properties
of these PCs are such that (i) each PC is orthogonal to
(uncorrelated with) all other PCs, and (ii) the first PC contains
the largest part of the variance of the data set (information
content) with subsequent PCs containing correspondingly smaller
amounts of variance.
[0060] PCA, a dimension reduction technique, takes m objects or
samples, each described by values in K dimensions (descriptor
vectors), and extracts a set of eigenvectors, which are linear
combinations of the descriptor vectors. The eigenvectors and
eigenvalues are obtained by diagonalisation of the covariance
matrix of the data. The eigenvectors can be thought of as a new set
of orthogonal plotting axes, called principal components (PCs). The
extraction of the systematic variations in the data is accomplished
by projection and modelling of variance and covariance structure of
the data matrix. The primary axis is a single eigenvector
describing the largest variation in the data, and is termed
principal component one (PC1). Subsequent PCs, ranked by decreasing
eigenvalue, describe successively less variability. The variation
in the data that has not been described by the PCs is called
residual variance and signifies how well the model fits the data.
The projections of the descriptor vectors onto the PCs are defined
as scores, which reveal the relationships between the samples or
objects. In a graphical representation (a "scores plot" or
eigenvector projection), objects or samples having similar
descriptor vectors will group together in clusters. Another
graphical representation is called a loadings plot, and this
connects the PCs to the individual descriptor vectors, and displays
both the importance of each descriptor vector to the interpretation
of a PC and the relationship among descriptor vectors in that PC.
In fact, a loading value is simply the cosine of the angle which
the original descriptor vector makes with the PC. Descriptor
vectors which fall close to the origin in this plot carry little
information in the PC, while descriptor vectors distant from the
origin (high loading) are important in interpretation.
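By way of illustration only, the diagonalisation, scores, loadings and residual variance described above can be sketched in Python with NumPy. This minimal sketch assumes the spectra have already been reduced to an m x K matrix of descriptor values (for example, integrated spectral regions) and that mean-centring is the only pre-processing; scaling choices are not addressed, and the function name pca is hypothetical.

```python
import numpy as np


def pca(X, n_components):
    """PCA by diagonalising the covariance matrix of an m x K data matrix."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                 # mean-centre each descriptor
    cov = np.cov(Xc, rowvar=False)          # K x K covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # rank PCs by decreasing eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Each loading column is a unit vector, so its entries are the direction
    # cosines between the PC and the original descriptor axes.
    loadings = eigvecs[:, :n_components]    # K x n_components
    scores = Xc @ loadings                  # m x n_components projections
    residual = Xc - scores @ loadings.T     # variation not described by the PCs
    explained = eigvals[:n_components] / eigvals.sum()
    return scores, loadings, explained, residual


rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))
X[:20, 0] += 3.0                            # give half the samples an offset
scores, loadings, explained, residual = pca(X, n_components=2)
print(explained.round(3))
```

Plotting the two columns of scores against each other gives the scores plot, and plotting the two columns of loadings gives the loadings plot.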
[0061] Thus a plot of the first two or three PC scores gives the
"best" representation, in terms of information content, of the data
set in two or three dimensions, respectively. A plot of the first
two principal component scores, PC1 and PC2, provides the maximum
information content of the data in two dimensions. Such PC maps can
be used to visualise inherent clustering behaviour, for example,
for drugs and toxins based on similarity of their metabonomic
responses and hence mechanism of action. Of course, the clustering
information might lie in later (lower-variance) PCs, and these must
also be examined.
[0062] Hierarchical Cluster Analysis, another unsupervised pattern
recognition method, permits the grouping of data points which are
similar by virtue of being "near" to one another in some
multidimensional space. Individual data points may be, for example,
the signal intensities for particular assigned peaks in an NMR
spectrum. A "similarity matrix," S, is constructed with elements
$s_{ij} = 1 - r_{ij}/r^{\max}$, where $r_{ij}$ is the
interpoint distance between points i and j (e.g., Euclidean
interpoint distance), and $r^{\max}$ is the largest
interpoint distance for all points. The most distant pair of points
will have $s_{ij}$ equal to 0, since $r_{ij}$ then equals
$r^{\max}$. Conversely, the closest pair of points will have
the largest $s_{ij}$. For two identical points, $s_{ij}$ is 1.
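The similarity matrix can be computed directly from a set of data points, as in the following minimal NumPy sketch. It assumes Euclidean interpoint distances (other distance measures are permitted by the text), and the function name is hypothetical. As stated above, the most distant pair receives a similarity of 0 and each point's similarity with itself is 1.

```python
import numpy as np


def similarity_matrix(X):
    """Similarity matrix with elements s_ij = 1 - r_ij / r_max.

    X: array of shape (n_points, n_dims); r_ij is the Euclidean distance.
    """
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    r = np.sqrt((diff ** 2).sum(axis=-1))   # interpoint distances r_ij
    r_max = r.max()
    if r_max == 0.0:                        # all points identical
        return np.ones_like(r)
    return 1.0 - r / r_max


S = similarity_matrix([[0.0, 0.0], [3.0, 4.0], [0.0, 1.0]])
print(S.round(3))
```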
[0063] The similarity matrix is scanned for the closest pair of
points. The pair of points is reported with its separation
distance, and then the two points are deleted and replaced with a
single combined point. The process is then repeated iteratively
until only one point remains. A number of different methods may be
used to determine how two clusters will be joined, including the
nearest neighbour method (also known as the single link method),
the furthest neighbour method, and the centroid method (including
centroid link, incremental link, median link, group average link,
and flexible link variations).
[0064] The reported connectivities are then plotted as a dendrogram
(a treelike chart which allows visualisation of clustering),
showing sample-sample connectivities versus increasing separation
distance (or equivalently, versus decreasing similarity). The
dendrogram has the property that the branch lengths are
proportional to the distances between the various clusters and
hence the length of the branches linking one sample to the next is
a measure of their similarity. In this way, similar data points may
be identified algorithmically.
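The scan, merge and repeat procedure, together with the merge record that a dendrogram plots, can be sketched in Python/NumPy as follows. This is a deliberately naive illustration (roughly cubic in the number of points). It measures cluster separation with the original interpoint distances, which is equivalent to searching for the largest similarity because s decreases monotonically with r. Only the nearest neighbour (single link), furthest neighbour (complete link) and group average variants are included. Optimised implementations exist, for example scipy.cluster.hierarchy.linkage; the function below is hypothetical and does not reproduce that library's interface.

```python
import numpy as np


def agglomerate(X, method="single"):
    """Naive agglomerative clustering by repeated closest-pair merging.

    Returns a list of merges (cluster_a, cluster_b, separation_distance). The
    original points are clusters 0..n-1; each merge creates a new cluster
    numbered n, n+1, ... in order. method: "single" (nearest neighbour),
    "complete" (furthest neighbour) or "average" (group average link).
    """
    link = {"single": np.min, "complete": np.max, "average": np.mean}
    if method not in link:
        raise ValueError(f"unknown method: {method}")
    X = np.asarray(X, dtype=float)
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    clusters = {i: [i] for i in range(len(X))}
    next_id = len(X)
    merges = []
    while len(clusters) > 1:
        ids = list(clusters)
        best = None
        for pos, a in enumerate(ids):           # scan every pair of clusters
            for b in ids[pos + 1:]:
                d = link[method](dist[np.ix_(clusters[a], clusters[b])])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((a, b, float(d)))         # report pair and separation
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)  # merge into one
        next_id += 1
    return merges


print(agglomerate([[0.0], [1.0], [10.0], [11.5]]))
```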
[0065] Non-linear mapping (NLM) is a simple concept which involves
calculation of the distances between all of the points in the
original K dimensions. This is followed by construction of a map of
points in 2 or 3 dimensions where the sample points are placed in
random positions or at values determined by a prior principal
components analysis. A least squares criterion is then used to move
the sample points in the lower-dimensional map so that their
inter-point distances fit those in the K-dimensional space.
Non-linear mapping is therefore an approximation
to the true inter-point distances, but points close in
K-dimensional space should also be close in 2 or 3 dimensional
space (see, for example, Brown et al., 1996; Farrant et al.,
1992).
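The least squares fitting step can be sketched as follows. This sketch makes its own assumptions and is a stand-in, not the procedure of the cited references. It minimises the unweighted ("raw") stress, i.e., the sum over point pairs of the squared difference between original and mapped distances, using the SMACOF (Guttman transform) update. Sammon's classic non-linear mapping instead weights each pair by its original distance and uses gradient steps. Starting positions come from a prior principal components analysis, one of the two options mentioned above, and all function names are hypothetical.

```python
import numpy as np


def pairwise_distances(A):
    """Euclidean distances between all rows of A."""
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def raw_stress(D, d):
    """Least-squares misfit between original (D) and mapped (d) distances."""
    iu = np.triu_indices_from(D, k=1)
    return float(((d[iu] - D[iu]) ** 2).sum())


def nonlinear_map(X, n_dims=2, n_iter=100):
    """Map K-dimensional points to n_dims by least-squares distance fitting.

    Starts from a PCA projection and applies SMACOF (Guttman transform)
    updates, each of which cannot increase the raw stress.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    D = pairwise_distances(X)                     # distances in K dimensions
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Y = Xc @ Vt[:n_dims].T                        # PCA starting positions
    history = [raw_stress(D, pairwise_distances(Y))]
    for _ in range(n_iter):
        d = pairwise_distances(Y)
        with np.errstate(divide="ignore", invalid="ignore"):
            B = np.where(d > 0, -D / d, 0.0)      # off-diagonal terms
        np.fill_diagonal(B, 0.0)
        np.fill_diagonal(B, -B.sum(axis=1))       # rows of B sum to zero
        Y = B @ Y / n                             # Guttman transform
        history.append(raw_stress(D, pairwise_distances(Y)))
    return Y, history


rng = np.random.default_rng(1)
X = rng.normal(size=(25, 8))
Y, history = nonlinear_map(X)
print(Y.shape, round(history[0], 2), round(history[-1], 2))
```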
[0066] In this simple metabonomic approach, a sample from an animal
treated with a compound of unknown toxicity is compared with a
database of NMR-generated metabolic data from control and
toxin-treated animals. By observing its position on the PR map
relative to samples of known effect, the unknown toxin can often be
classified. The same approach can be used for human samples for
classification according to disease. However, such data are often
more complex, with time-related biochemical changes detected by
NMR. Also, it is more rigorous to compare effects of xenobiotics in
the original K-dimensional NMR metabonomic space.
[0067] Alternatively, and in order to develop automatic
classification methods, it has proved efficient to use a
"supervised" approach to NMR data analysis. Here, a "training set"
of NMR metabonomic data is used to construct a statistical model
that correctly predicts the "class" of each sample. The model is
then tested with independent data (referred to as a test or
validation set) to determine the robustness of the computer-based
model. These models are sometimes termed "expert systems," but may
be based on a range of different mathematical procedures.
Supervised methods can use a data set with reduced dimensionality
(for example, the first few principal components), but typically
use unreduced data with full dimensionality. In all cases the
methods allow the quantitative description of the multivariate
boundaries that characterise and separate each class, for example,
each class of xenobiotic in terms of its metabolic effects. It is
also possible to obtain confidence limits on any predictions, for
example, a level of probability to be placed on the goodness of fit
(see, for example, Kowalski et al., 1986). The robustness of the
predictive models can also be checked using cross-validation, by
leaving out selected samples from the analysis.
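The cross-validation by leaving out samples, mentioned above, can be illustrated with a deliberately simple classifier. In the sketch below, assignment to the nearest class centroid stands in for the supervised methods listed later; it is not one of them. Each sample is left out in turn, the model is rebuilt from the remaining samples, and the left-out sample is predicted. The function names and the example class labels are hypothetical. Each class is assumed to contain at least two samples, so that every class remains represented when one sample is left out.

```python
import numpy as np


def nearest_centroid_predict(X_train, y_train, X_test):
    """Assign each test row to the class whose training mean is closest."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    d2 = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return classes[d2.argmin(axis=1)]


def leave_one_out_accuracy(X, y):
    """Fraction of samples correctly predicted when each is left out in turn."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    correct = 0
    for i in range(len(X)):
        keep = np.arange(len(X)) != i              # train on all but sample i
        pred = nearest_centroid_predict(X[keep], y[keep], X[i:i + 1])[0]
        correct += int(pred == y[i])
    return correct / len(X)


rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 1.0, (15, 5)), rng.normal(3.0, 1.0, (15, 5))])
y = np.array(["control"] * 15 + ["treated"] * 15)
print(leave_one_out_accuracy(X, y))
```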
[0068] Expert systems may operate to generate a variety of useful
outputs, for example, (i) classification of the sample as "normal"
or "abnormal" (this is a useful tool in the control of spectrometer
automation, e.g., using sequential flow injection NMR
spectroscopy); (ii) classification of the target organ for toxicity
and of the site of action within the tissue, where, in certain cases,
the mechanism of toxic action may also be classified; and, (iii)
identification of the biomarkers of a pathological disease
condition or toxic effect for the particular compound under study.
For example, a sample can be classified as belonging to a single
class of toxicity, to multiple classes of toxicity (more than one
target organ), or to no class. The latter case would indicate
deviation from normality (control) based on the training set model
but having a dissimilar metabolic effect to any toxicity class
modelled in the training set (unknown toxicity type). Under (ii), a
system could also be generated to support decisions in clinical
medicine (e.g., for efficacy of drugs) rather than toxicity.
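The single-class, multiple-class and no-class outcomes described above can be sketched with one model per class. The Python below is a simplified illustration in the spirit of class-modelling methods such as SIMCA (listed below), not a faithful implementation of any of them. Each class is given its own PCA model, and a sample is accepted by that class when both its residual distance from the class model and its scores fall within limits set from the training samples. These limits are widened by an arbitrary, hypothetical slack factor. A sample accepted by no class corresponds to the "no class" case, i.e., deviation from every class modelled in the training set. The class names and data are hypothetical.

```python
import numpy as np


def fit_class_model(X, n_components=2, slack=1.5):
    """PCA model of one class plus acceptance limits from its training data."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                        # K x a loadings
    T = Xc @ P                                     # training scores
    resid = np.linalg.norm(Xc - T @ P.T, axis=1)   # residual distances
    return {
        "mean": mean,
        "P": P,
        "resid_limit": slack * resid.max(),
        "score_limit": slack * np.abs(T).max(axis=0),
    }


def accepts(model, x):
    """True if x lies within both the residual and the score limits."""
    xc = np.asarray(x, dtype=float) - model["mean"]
    t = model["P"].T @ xc
    resid = np.linalg.norm(xc - model["P"] @ t)
    return bool(resid <= model["resid_limit"]
                and np.all(np.abs(t) <= model["score_limit"]))


def assign(models, x):
    """Names of every class model that accepts x (possibly none, or several)."""
    return sorted(name for name, m in models.items() if accepts(m, x))


# Hypothetical training data for two toxicity classes.
rng = np.random.default_rng(3)
models = {
    "liver": fit_class_model(rng.normal(0.0, 1.0, (30, 10))),
    "kidney": fit_class_model(rng.normal(8.0, 1.0, (30, 10))),
}
print(assign(models, np.zeros(10)), assign(models, np.full(10, 40.0)))
```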
[0069] Examples of supervised pattern recognition methods include
the following:
[0070] soft independent modelling of class analogy (SIMCA) (see,
for example, Wold, 1976);
[0071] partial least squares analysis (PLS) (see, for example,
Wold, 1966; Joreskog, 1982; Frank, 1984; Bro, 1997);
[0072] linear discriminant analysis (LDA) (see, for example,
Nillson, 1965);
[0073] K-nearest neighbour analysis (KNN) (see, for example, Brown
et al., 1996);
[0074] artificial neural networks (ANN) (see, for example,
Wasserman, 1989; Anker et al., 1992; Hare, 1994);
[0075] probabilistic neural networks (PNNs) (see, for example,
Parzen, 1962; Bishop, 1995; Speckt, 1990; Broomhead et al., 1988;
Patterson, 1996);
[0076] rule induction (RI) (see, for example, Quinlan, 1986);
and,
[0077] Bayesian methods (see, for example, Bretthorst, 1990a,
1990b, 1988).
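Of the methods listed, PLS in its discriminant form (PLS-DA) is the one used for several of the models described in the drawings. The sketch below is a textbook NIPALS PLS1 regression of a 0/1 class indicator on the spectral descriptors, written with NumPy only. It assumes a two-class problem, mean-centring as the only pre-processing, and a 0.5 decision threshold on the predicted y. It omits the orthogonal signal correction (OSC) filtering applied for some of the figures. The function names are hypothetical, and the returned coefficients B are simply one coefficient per descriptor, analogous in kind to the regression coefficients plotted in FIG. 1-2B-TW.

```python
import numpy as np


def pls_da_fit(X, y, n_components=2):
    """Two-class PLS-DA via NIPALS PLS1; y holds 0/1 class membership."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean                  # residual X block and y
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)                     # weight vector
        t = E @ w                                  # scores
        tt = t @ t
        p = E.T @ t / tt                           # X loadings
        q = f @ t / tt                             # y loading
        E = E - np.outer(t, p)                     # deflate X
        f = f - q * t                              # deflate y
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    B = W @ np.linalg.solve(P.T @ W, Q)            # regression coefficients
    return x_mean, y_mean, B


def pls_da_predict(model, X, threshold=0.5):
    """Return predicted y values and the 0/1 class on which each falls."""
    x_mean, y_mean, B = model
    y_pred = (np.asarray(X, dtype=float) - x_mean) @ B + y_mean
    return y_pred, (y_pred >= threshold).astype(int)


rng = np.random.default_rng(4)
X = rng.normal(size=(40, 8))
y = np.array([0] * 20 + [1] * 20)
X[y == 1, 0] += 4.0                                # class 1 differs in variable 0
model = pls_da_fit(X, y, n_components=2)
y_pred, labels = pls_da_predict(model, X)
print((labels == y).mean())
```

Plotting y_pred for training and validation samples gives a y-predicted plot of the kind described for FIG. 1-3-TW. With as many components as descriptors, PLS1 reduces to ordinary least squares, which is a convenient check on an implementation.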
[0078] As the size of metabonomic databases increases together with
improvements in rapid throughput of NMR samples (>300 samples
per day per spectrometer is now possible with the first generation
of flow injection systems), more subtle expert systems may be
necessary, for example, using techniques such as "fuzzy logic"
which permit greater flexibility in decision boundaries.
[0079] Application to Metabonomics
[0080] Pattern recognition methods have been applied to the
analysis of metabonomic data. See, for example, Lindon et al.,
2001. A number of spectroscopic techniques have been used to
generate the data, including NMR spectroscopy and mass
spectrometry. Pattern recognition analysis of such data sets has
been successful in some cases. The successful studies include, for
example, complex NMR data from biofluids (see, for example,
Anthony et al., 1994; Anthony et al., 1995; Beckwith-Hall et al.,
1998; Gartland et al., 1990a; Gartland et al., 1990b; Gartland et
al., 1991; Holmes et al., 1998a; Holmes et al., 1998b; Holmes et
al., 1992; Holmes et al., 1994; Spraul et al., 1994; Tranter et
al., 1999), conventional NMR spectra from tissue samples (Somorjai
et al., 1995), magic-angle-spinning (MAS) NMR spectra of tissues
(Garrod et al., 2001), in vivo NMR spectra (Morvan et al., 1990;
Howells et al., 1993; Stoyanova et al., 1995; Kuesel et al., 1996;
Confort-Gouny et al., 1992; Weber et al., 1998), wines (Martin et
al., 1998, 1999) and plant tissues (Kopka et al., 2000).
[0081] Although the utility of the metabonomic approach is well
established, its full potential has not yet been exploited. The
metabolic variation is often subtle, and powerful analysis methods
are required for detection of particular analytes, especially when
the data (e.g., NMR spectra) are so complex. For example, the
approaches proposed previously are still not generally sufficient to
achieve clinically useful diagnosis of disease. New methods to
extract useful metabolic information from biofluids are needed.
[0082] The inventors have developed novel methods (which employ
multivariate statistical analysis and pattern recognition (PR)
techniques, and optionally data filtering techniques) of analysing
data (e.g., NMR spectra) from a test population which yield
accurate mathematical models which may subsequently be used to
classify a test sample or subject, and/or in diagnosis.
[0083] Unlike methods previously described, the methods described
herein have the power to provide clinically useful and accurate
diagnostic and prognostic information in a medical setting.
[0084] The methods described herein represent a significant advance
over chemometric methodologies described previously. Although
chemometrics has been able to provide some classification of types
previously, the studies have required that the classification be
done under a series of restrictions which limit the ability to
apply the method to analysis of complex datasets as would be
required to apply the method for the practical diagnosis/prognosis
of diseases that could be useful clinically.
[0085] For example, several studies have reported on the
classification of animals on the basis of an NMR spectrum of urine
or plasma. Although these studies clearly demonstrate the potential
of the technique, they are limited because the animals which
compose each class are genetically homogeneous (in-bred
populations). As a result, these methods have been demonstrated to
be able to detect patterns but only against "low noise"
backgrounds. Application of metabonomics to "real" populations
(e.g., in human clinical practice) requires the ability to detect
patterns against the substantial noise due to the genetic variation
of outbred populations and also due to dietary and hormonal
differences.
[0086] Similarly, many of the studies described to date have
examined relatively major differences between groups, for example,
the ability to differentiate renally acting toxins from liver
acting toxins. The two groups under study differed in a broad
spectrum of metabolites making the pattern relatively easy to
detect. In conjunction with the restriction of using in-bred
populations of animals, most studies published to date have only
demonstrated metabonomics to be practicable under conditions of
high "signal to noise" ratio, conditions which are very different
from the human clinical environment.
[0087] Some studies have begun to attempt classifications of
out-bred human populations where the data variation is high.
However, to date, all these studies have simplified the system
substantially to focus on specific molecules: for example, some
studies have looked specifically at the resonances associated with
lipoproteins. Since lipoproteins are major constituents of plasma,
the variance they contribute readily exceeds the background
variance due to genetic and environmental differences between
individuals. Unfortunately, such an approach is insufficiently
powerful to identify weak patterns against the background
biochemical noise, and could not be used, for example, to determine
the extent of coronary heart disease or to distinguish identical
from non-identical twins. Identification of such low "signal to
noise" ratio patterns requires the application of the methods of
this invention, which represent a significant advance over what has
been previously reported.
SUMMARY OF THE INVENTION
[0088] One aspect of the present invention pertains to a method of
classifying a sample, as described herein.
[0089] One aspect of the present invention pertains to a method of
classifying a subject as described herein.
[0090] One aspect of the present invention pertains to a method of
diagnosing a subject as described herein.
[0091] One aspect of the present invention pertains to a method of
identifying a diagnostic species, or a combination of a plurality
of diagnostic species, for a predetermined condition, as described
herein.
[0092] One aspect of the present invention pertains to a diagnostic
species identified by a method as described herein.
[0093] One aspect of the present invention pertains to a diagnostic
species identified by a method as described herein, for use in a
method of classification.
[0094] One aspect of the present invention pertains to a method of
classification which employs or relies upon one or more diagnostic
species identified by a method as described herein.
[0095] One aspect of the present invention pertains to use of one
or more diagnostic species identified by a method of classification
as described herein.
[0096] One aspect of the present invention pertains to an assay for
use in a method of classification, which assay relies upon one or
more diagnostic species identified by a method as described
herein.
[0097] One aspect of the present invention pertains to use of an
assay in a method of classification, which assay relies upon one or
more diagnostic species identified by a method as described
herein.
[0098] One aspect of the present invention pertains to a method of
therapeutic monitoring of a subject undergoing therapy which
employs a method of classification as described herein.
[0099] One aspect of the present invention pertains to a method of
evaluating drug therapy and/or drug efficacy which employs a method
of classification, as described herein.
[0100] One aspect of the present invention pertains to a computer
system or device, such as a computer or linked computers,
operatively configured to implement a method as described herein;
and related computer code, computer programs, data carriers carrying
such code and programs, and the like.
[0101] These and other aspects of the present invention are
described herein.
[0102] As will be appreciated by one of skill in the art, features
and preferred embodiments of one aspect of the present invention
will also pertain to other aspects of the present invention.
BRIEF DESCRIPTION OF THE DRAWINGS
[0103] Twins
[0104] FIG. 1-1A-TW is a scores scatter plot for PC2 and PC1 (t2
vs. t1) for the PCA model (principal components analysis) derived
from 1-D ¹H NMR spectra of serum from monozygotic twins (MZ)
(triangles, ▲) and dizygotic twins (DZ) (circles,
●).
[0105] FIG. 1-1B-TW is the corresponding loadings scatter plot (p2
vs. p1) for the PCA shown in FIG. 1-1A-TW.
[0106] FIG. 1-1C-TW is a scores scatter plot for PC2 and PC1 (t2
vs. t1) for the PCA model derived from 1-D ¹H NMR spectra of
serum from MZ twins (triangles, ▲) and DZ twins
(circles, ●). Prior to PCA, the data were filtered (in
this case, using orthogonal signal correction, OSC).
[0107] FIG. 1-1D-TW is the corresponding loadings scatter plot (p2
vs. p1) for the PCA shown in FIG. 1-1C-TW.
[0108] FIG. 1-1E-TW is a scores scatter plot for PC2 and PC1 (t2
vs. t1) for the PLS-DA model derived from 1-D ¹H NMR spectra
of serum from MZ twins (triangles, ▲) and DZ twins
(circles, ●). Prior to PLS-DA, the data were filtered (in
this case, using orthogonal signal correction, OSC).
[0109] FIG. 1-1F-TW is the corresponding loadings scatter plot (p2
vs. p1) for the PLS-DA shown in FIG. 1-1E-TW.
[0110] FIG. 1-2A-TW shows a section of the variable importance plot
(VIP) for the OSC-PLS-DA model, showing the calculated importance
of the 21 most important variables.
[0111] FIG. 1-2B-TW is a plot of the regression coefficients of the
1-D ¹H NMR variables for the MZ and DZ serum samples, derived
from the OSC-PLS-DA model. Each bar represents a spectral region
covering δ 0.04.
[0112] FIG. 1-3-TW is a y-predicted scatter plot, showing MZ twins
(triangles, ▲) and DZ twins (circles,
●) samples and validation samples (diamonds,
◆), for a partial least squares discriminant analysis
(PLS-DA) model calculated for the same data, following OSC.
[0113] Hypertension
[0114] FIG. 2-1A-HYP is a scores scatter plot for PC2 and PC1 (t2
vs. t1) for the principal components analysis (PCA) model derived
from 1-D ¹H NMR spectra from serum samples from patients with
low SBP (triangles, ▲), middle SBP (circles, ●), or high SBP
(squares, ■).
[0115] FIG. 2-1B-HYP is the corresponding loadings scatter plot (p2
vs. p1) for the PCA shown in FIG. 2-1A-HYP.
[0116] FIG. 2-1C-HYP is a scores scatter plot for PC2 and PC1 (t2
vs. t1) for the PCA model derived from 1-D ¹H NMR spectra from
serum samples from patients with low SBP (triangles, ▲), middle
SBP (circles, ●), or high SBP (squares, ■). Prior to PCA, the data
were filtered (in this case, using orthogonal signal correction,
OSC).
[0117] FIG. 2-1D-HYP is the corresponding loadings scatter plot (p2
vs. p1) for the PCA shown in FIG. 2-1C-HYP.
[0118] FIG. 2-1E-HYP shows three pairs of plots (a scores scatter
plot for PC2 and PC1 (t2 vs. t1) for a PCA model calculated from
1-D ¹H NMR data for pairs of classes of serum samples, and the
corresponding loadings plot (p2 vs. p1)). In the scores plots,
serum samples from patients with low SBP are denoted with triangles
(▲), middle SBP with circles (●), and high SBP with squares (■):
[0119] FIG. 2-1E(1)-HYP: Low and Middle SBP scores scatter
plot.
[0120] FIG. 2-1E(2)-HYP: Low and Middle SBP loadings scatter
plot.
[0121] FIG. 2-1E(3)-HYP: Middle and High SBP scores scatter
plot.
[0122] FIG. 2-1E(4)-HYP: Middle and High SBP loadings scatter
plot.
[0123] FIG. 2-1E(5)-HYP: Low and High SBP scores scatter plot.
[0124] FIG. 2-1E(6)-HYP: Low and High SBP loadings scatter plot.
[0125] FIG. 2-1F-HYP shows three pairs of plots (a scores scatter
plot for PC2 and PC1 (t2 vs. t1) for an OSC-PLS-DA model calculated
from 1-D ¹H NMR data for pairs of classes of serum samples,
and the corresponding loadings plot (p2 vs. p1)). In the scores
plots, serum samples from patients with low SBP are denoted with
triangles (▲), middle SBP with circles (●), and high SBP with
squares (■):
[0126] FIG. 2-1F(1)-HYP: Low and Middle SBP scores scatter
plot.
[0127] FIG. 2-1F(2)-HYP: Low and Middle SBP loadings scatter
plot.
[0128] FIG. 2-1F(3)-HYP: Middle and High SBP scores scatter
plot.
[0129] FIG. 2-1F(4)-HYP: Middle and High SBP loadings scatter
plot.
[0130] FIG. 2-1F(5)-HYP: Low and High SBP scores scatter plot.
[0131] FIG. 2-1F(6)-HYP: Low and High SBP loadings scatter
plot.
[0132] FIG. 2-2-HYP shows, for each of the OSC-PLS-DA models
described in FIG. 2-1F-HYP, both a section of the variable
importance plot (VIP) and a plot of regression coefficients for
each of the respective models:
[0133] FIG. 2-2-(1)-HYP: VIP for low and middle SBP samples.
[0134] FIG. 2-2-(2)-HYP: Regression coefficients, low with respect
to middle SBP.
[0135] FIG. 2-2-(3)-HYP: VIP for middle and high SBP samples.
[0136] FIG. 2-2-(4)-HYP: Regression coefficients, middle with
respect to high SBP.
[0137] FIG. 2-2-(5)-HYP: VIP for low and high SBP samples.
[0138] FIG. 2-2-(6)-HYP: Regression coefficients, low with respect
to high SBP.
[0139] FIG. 2-3A-HYP is a plot of distance-to-model (DModXPS) for a
model constructed from 14 low SBP samples, and tested with low and
middle SBP samples. DCrit at 1.41. Prediction rate: 84%.
[0140] FIG. 2-3B-HYP is a plot of distance-to-model (DModXPS) for a
model constructed from 9 middle SBP samples, and tested with low
and middle SBP samples. DCrit at 1.50. Prediction rate: 84%.
[0141] FIG. 2-4A-HYP is a plot of distance-to-model (DModXPS) for a
model constructed from 9 middle SBP samples, and tested with middle
and high SBP samples. DCrit at 1.50. Prediction rate: 59%.
[0142] FIG. 2-4B-HYP is a plot of distance-to-model (DModXPS) for a
model constructed from 9 high SBP samples, and tested with middle
and high SBP samples. DCrit at 1.50. Prediction rate: 37%.
[0143] FIG. 2-5A-HYP is a plot of distance-to-model (DModXPS) for a
model constructed from 15 low SBP samples, and tested with low and
high SBP samples. DCrit at 1.41. Prediction rate: 80%.
[0144] FIG. 2-5B-HYP is a plot of distance-to-model (DModXPS) for a
model constructed from 9 high SBP samples, and tested with low and
high SBP samples. DCrit at 1.50. Prediction rate: 83%.
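The DModXPS plots above assign a test sample to the modelled class when its distance to model falls below DCrit. A minimal sketch of that logic follows, assuming a SIMCA-style PCA class model in which DModX is a sample's residual standard deviation divided by the pooled residual standard deviation of the training set. The function names and synthetic data are hypothetical, the DCrit value of 1.41 is simply reused as a threshold, and any correction factors that SIMCA-P may apply to DModXPS are omitted.

```python
import numpy as np


def fit_class_model(X_train, n_components):
    """Fit a one-class PCA model and return its mean, loadings and pooled residual SD."""
    mean = X_train.mean(axis=0)
    Xc = X_train - mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T
    E = Xc - Xc @ P @ P.T                      # residuals not described by the model
    n, k = X_train.shape
    # Pooled residual SD; degrees of freedom follow the SIMCA convention (A0 = 1).
    s0 = np.sqrt(np.sum(E**2) / ((n - n_components - 1) * (k - n_components)))
    return mean, P, s0


def dmodx(X_new, mean, P, s0):
    """Normalised distance to model for new samples (no small-sample corrections)."""
    Xc = X_new - mean
    E = Xc - Xc @ P @ P.T
    k, a = P.shape
    return np.sqrt(np.sum(E**2, axis=1) / (k - a)) / s0


def prediction_rate(d, belongs, d_crit):
    """Fraction of samples assigned correctly: inside the class iff d <= d_crit."""
    return float(np.mean((d <= d_crit) == belongs))


rng = np.random.default_rng(2)
train = rng.normal(size=(14, 40))                 # hypothetical class used to build the model
same = rng.normal(size=(5, 40))                   # test samples from that class
other = rng.normal(loc=3.0, size=(5, 40))         # test samples from a shifted class
mean, P, s0 = fit_class_model(train, n_components=2)
d = dmodx(np.vstack([same, other]), mean, P, s0)
rate = prediction_rate(d, np.array([True] * 5 + [False] * 5), d_crit=1.41)
```

Under these assumptions, a prediction rate such as 84% would correspond to the fraction of test samples falling on the correct side of DCrit.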
[0145] Atherosclerosis/Coronary Heart Disease
[0146] FIG. 3-1-CHD is a 600 MHz 1-D ¹H NMR spectrum for serum
obtained from (A) a patient with normal coronary arteries (NCA)
and (B) a patient with triple vessel disease (TVD). The spectra
were recorded at a temperature of 300 K and corrected for phase and
baseline distortions, and chemical shifts were referenced to that of
lactate (CH₃; δ 1.33).
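The referencing step can be sketched as follows, assuming the lactate CH₃ doublet dominates a narrow window around δ 1.33 and estimating its centre as an intensity-weighted centroid. The window half-width, the centroid rule, the function names and the synthetic doublet (with an assumed coupling of about 7 Hz and assumed line widths) are illustrative assumptions, not the processing actually used for FIG. 3-1-CHD.

```python
import numpy as np


def reference_to_lactate(ppm, intensity, target=1.33, half_width=0.02):
    """Shift the ppm axis so the lactate CH3 doublet is centred at `target`.

    The doublet centre is estimated as the intensity-weighted centroid of the
    points within +/- half_width ppm of the target (an illustrative choice).
    """
    mask = np.abs(ppm - target) <= half_width
    w = np.clip(intensity[mask], 0.0, None)
    if w.sum() == 0:
        raise ValueError("no positive intensity in the referencing window")
    centre = np.sum(ppm[mask] * w) / np.sum(w)
    return ppm + (target - centre)


def lorentzian(x, x0, hwhm):
    """Unit-height Lorentzian line centred at x0 with half-width at half-maximum hwhm."""
    return hwhm**2 / ((x - x0) ** 2 + hwhm**2)


# Synthetic doublet centred at 1.34 ppm (mis-referenced by 0.01 ppm), J assumed 7 Hz at 600 MHz
ppm = np.linspace(0.5, 2.0, 15001)
j_ppm = 7.0 / 600.0
spectrum = lorentzian(ppm, 1.34 - j_ppm / 2, 0.0008) + lorentzian(ppm, 1.34 + j_ppm / 2, 0.0008)
ppm_ref = reference_to_lactate(ppm, spectrum)
```

A centroid is used rather than the single tallest point because the tallest point of a doublet is one of its two lines, not its centre; truncation of the Lorentzian wings at the window edges leaves a small residual bias (well under 0.002 ppm for this synthetic case).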
[0147] FIG. 3-2A-CHD is a scores scatter plot for PC3 and PC2 (t3
vs. t2) for the principal components analysis (PCA) model derived
from 1-D ¹H NMR spectra from serum samples from NCA (circles, ●)
and TVD (squares, ■) patients.
[0148] FIG. 3-2B-CHD is the corresponding loadings scatter plot (p3
vs. p2) for the PCA shown in FIG. 3-2A-CHD.
[0149] FIG. 3-2C-CHD is a scores scatter plot for PC2 and PC1 (t2
vs. t1) for the PCA model derived from 1-D ¹H NMR spectra from
serum samples from NCA (circles, ●) and TVD (squares, ■) patients.
Prior to PCA, the data were filtered (in this case, using
orthogonal signal correction, OSC).
[0150] FIG. 3-2D-CHD is the corresponding loadings scatter plot (p2
vs. p1) for the PCA shown in FIG. 3-2C-CHD.
[0151] FIG. 3-2E-CHD is a scores scatter plot for PC2 and PC1 (t2
vs. t1) for the PLS-DA model derived from 1-D ¹H NMR spectra
from serum samples from NCA (circles, ●) and TVD (squares, ■)
patients. Prior to PLS-DA, the data were filtered (in this case,
using orthogonal signal correction, OSC).
[0152] FIG. 3-2F-CHD is the corresponding loadings scatter plot
(w*c2 vs. w*c1) for the PLS-DA shown in FIG. 3-2E-CHD.
[0153] FIG. 3-3A-CHD shows a section of the variable importance
plot (VIP) for the OSC-PLS-DA model, showing the calculated
importance of the 13 most important variables.
[0154] FIG. 3-3B-CHD is a plot of the regression coefficients of
the 1-D ¹H NMR variables for the TVD serum samples, derived from
the OSC-PLS-DA model. Each bar represents a spectral region covering
δ 0.04.
[0155] FIG. 3-4-CHD is a y-predicted scatter plot, showing NCA
(circles, ●) and TVD (squares, ■) samples and validation samples
(triangles, ▲, NCA or TVD as marked), for an OSC-PLS-DA model.
[0156] FIG. 3-5A-CHD is the scores scatter plot for PC2 and PC1 (t2
vs. t1) for the PCA model calculated from 1-D ¹H NMR data for
all three classes of serum sample: type "1" vessel disease
(triangles, ▲), type "2" vessel disease (circles, ●), and type "3"
vessel disease (squares, ■).
[0157] FIG. 3-5B-CHD is the corresponding loadings scatter plot (p2
vs. p1) for the PCA shown in FIG. 3-5A-CHD.
[0158] FIG. 3-5C-CHD shows three pairs of plots (a scores scatter
plot for PC2 and PC1 (t2 vs. t1) for a PLS-DA model calculated from
1-D ¹H NMR data for pairs of classes of serum samples, and the
corresponding w*c loadings plot (w*c2 vs. w*c1)). In the scores
plots, type "1" samples are denoted by triangles (▲); type "2"
samples are denoted by circles (●); and type "3" samples are denoted
by squares (■).
[0159] FIG. 3-5C-(1)-CHD: type "1" and "2" scores scatter plot.
[0160] FIG. 3-5C-(2)-CHD: type "1" and "2" loadings w*c scatter
plot.
[0161] FIG. 3-5C-(3)-CHD: type "2" and "3" scores scatter plot.
[0162] FIG. 3-5C-(4)-CHD: type "2" and "3" loadings w*c scatter
plot.
[0163] FIG. 3-5C-(5)-CHD: type "1" and "3" scores scatter plot.
[0164] FIG. 3-5C-(6)-CHD: type "1" and "3" loadings w*c scatter
plot.
[0165] FIG. 3-6A-CHD is a scores scatter plot for PC2 and PC1 (t2
vs. t1) for a PCA model calculated using filtered 1-D ¹H NMR data
(in this case, filtered using orthogonal signal correction, OSC),
for all three classes of serum sample: type "1" vessel disease
(triangles, ▲); type "2" vessel disease (circles, ●); and type "3"
vessel disease (squares, ■).
[0166] FIG. 3-6B-CHD is the corresponding loadings scatter plot (p2
vs. p1) for the PCA shown in FIG. 3-6A-CHD.
[0168] FIG. 3-6C-CHD shows three pairs of plots (a scores scatter
plot for PC2 and PC1 (t2 vs. t1) for a PLS-DA model calculated from
1-D ¹H NMR data for pairs of classes of serum samples, following
OSC, and the corresponding w*c loadings plot (w*c2 vs. w*c1)). In
the scores plots, type "1" samples are denoted by triangles (▲);
type "2" samples are denoted by circles (●); and type "3" samples
are denoted by squares (■).
[0169] FIG. 3-6C-(1)-CHD: type "1" and "2" scores scatter plot.
[0170] FIG. 3-6C-(2)-CHD: type "1" and "2" loadings w*c scatter
plot.
[0171] FIG. 3-6C-(3)-CHD: type "2" and "3" scores scatter plot.
[0172] FIG. 3-6C-(4)-CHD: type "2" and "3" loadings w*c scatter
plot.
[0173] FIG. 3-6C-(5)-CHD: type "1" and "3" scores scatter plot.
[0174] FIG. 3-6C-(6)-CHD: type "1" and "3" loadings w*c scatter
plot.
[0175] FIG. 3-7-CHD shows, for each of the three models described
in FIG. 3-6C, both a section of the variable importance plot (VIP)
and a plot of the regression coefficients for the respective
OSC-PLS-DA model. Each bar represents a spectral region covering
δ 0.04.
[0176] FIG. 3-7-(1)-CHD: VIP for "1" and "2" vessel disease
samples.
[0177] FIG. 3-7-(2)-CHD: Regression coefficients, "1" with respect
to "2" vessel disease.
[0178] FIG. 3-7-(3)-CHD: VIP for "2" and "3" vessel disease
samples.
[0179] FIG. 3-7-(4)-CHD: Regression coefficients, "2" with respect
to "3" vessel disease.
[0180] FIG. 3-7-(5)-CHD: VIP for "1" and "3" vessel disease
samples.
[0181] FIG. 3-7-(6)-CHD: Regression coefficients, "1" with respect
to "3" vessel disease.
[0182] FIG. 3-8-CHD shows three y-predicted scatter plots, showing
type "1" (triangles, ▲), type "2" (circles, ●), and type "3"
(squares, ■) samples and validation samples (diamonds, ◆), for
PLS-DA models calculated for the same data, following OSC.
[0183] FIG. 3-8A-CHD: type "1" and "2".
[0184] FIG. 3-8B-CHD: type "2" and "3".
[0185] FIG. 3-8C-CHD: type "1" and "3".
[0186] FIG. 3-9A-CHD is a scores scatter plot for PC2 and PC1 (t2
vs. t1) for a PCA model calculated from established clinical
parameters for subjects with type "1" (triangles, ▲), type "2"
(circles, ●), or type "3" (squares, ■) vessel disease.
[0187] FIG. 3-9B-CHD is the corresponding loadings scatter plot (p2
vs. p1) for the PCA shown in FIG. 3-9A-CHD.
[0188] FIG. 3-9C-CHD shows three pairs of plots (a scores scatter
plot for PC2 and PC1 (t2 vs. t1) for a PLS-DA model calculated
using established clinical parameters, and the corresponding
loadings w*c plot (w*c2 vs. w*c1)). In the scores plots, type "1"
samples are denoted by triangles (▲); type "2" samples are denoted
by circles (●); and type "3" samples are denoted by squares (■).
[0189] FIG. 3-9C-(1)-CHD: type "1" and "2" scores scatter plot.
[0190] FIG. 3-9C-(2)-CHD: type "1" and "2" loadings w*c scatter
plot.
[0191] FIG. 3-9C-(3)-CHD: type "2" and "3" scores scatter plot.
[0192] FIG. 3-9C-(4)-CHD: type "2" and "3" loadings w*c scatter
plot.
[0193] FIG. 3-9C-(5)-CHD: type "1" and "3" scores scatter plot.
[0194] FIG. 3-9C-(6)-CHD: type "1" and "3" loadings w*c scatter
plot.
[0195] FIG. 3-10-CHD shows, for each of the three models described
in FIG. 3-9C, both a section of the variable importance plot (VIP)
and a plot of the regression coefficients for the respective
OSC-PLS-DA models. Each bar represents a spectral region covering
δ 0.04.
[0196] FIG. 3-10-(1)-CHD: VIP for "1" and "2" vessel disease
samples.
[0197] FIG. 3-10-(2)-CHD: Regression coefficients, "1" with respect
to "2" vessel disease.
[0198] FIG. 3-10-(3)-CHD: VIP for "2" and "3" vessel disease
samples.
[0199] FIG. 3-10-(4)-CHD: Regression coefficients, "2" with respect
to "3" vessel disease.
[0200] FIG. 3-10-(5)-CHD: VIP for "1" and "3" vessel disease
samples.
[0201] FIG. 3-10-(6)-CHD: Regression coefficients, "1" with respect
to "3" vessel disease.
[0202] Osteoporosis
[0203] FIG. 4-1A-OP is a scores scatter plot for PC2 and PC1 (t2
vs. t1) for the principal components analysis (PCA) model derived
from 1-D ¹H NMR spectra from serum samples from control
subjects (triangles, ▲) and patients with osteoporosis
(circles, ●).
[0204] FIG. 4-1B-OP is the corresponding loadings scatter plot (p2
vs. p1) for the PCA shown in FIG. 4-1A-OP.
[0205] FIG. 4-1C-OP is a scores scatter plot for PC2 and PC1 (t2
vs. t1) for the PCA model derived from 1-D ¹H NMR spectra from
serum samples from control subjects (triangles, ▲) and patients
with osteoporosis (circles, ●). Prior to PCA, the data were
filtered (in this case, using orthogonal signal correction, OSC).
[0206] FIG. 4-1D-OP is the corresponding loadings scatter plot (p2
vs. p1) for the PCA shown in FIG. 4-1C-OP.
[0207] FIG. 4-1E-OP is a scores scatter plot for PC2 and PC1 (t2
vs. t1) for the PLS-DA model derived from 1-D ¹H NMR spectra
from serum samples from control subjects (triangles, ▲) and
patients with osteoporosis (circles, ●). Prior to PLS-DA, the data
were filtered (in this case, using orthogonal signal correction,
OSC).
[0208] FIG. 4-1F-OP is the corresponding loadings scatter plot (p2
vs. p1) for the PLS-DA shown in FIG. 4-1E-OP.
[0209] FIG. 4-2A-OP shows a section of the variable importance plot
(VIP) derived from the PLS-DA model described in FIG. 4-1E-OP.
[0210] FIG. 4-2B-OP shows a section of the regression coefficient
plot derived from the PLS-DA model described in FIG. 4-1E-OP.
[0211] FIG. 4-3-OP is a y-predicted scatter plot for a PLS-DA model
calculated using ~85% of the control (triangles, ▲) and
osteoporosis (circles, ●) samples, which was then used to predict
the presence of disease in the remaining ~15% of samples (squares,
■) (the validation set).
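The ~85%/15% design described for FIG. 4-3-OP can be sketched as a stratified random split followed by thresholding of the predicted y values. The stratification by class, the fixed random seed, the 0.5 threshold and the function names are assumptions for illustration; the model fitted on the training set (for example, a PLS-DA model) is not shown.

```python
import random


def stratified_split(labels, train_fraction=0.85, seed=0):
    """Split sample indices into training and validation sets within each class."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    train, valid = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(train_fraction * len(indices))
        train.extend(indices[:n_train])
        valid.extend(indices[n_train:])
    return sorted(train), sorted(valid)


def assign_class(y_predicted, threshold=0.5):
    """Call a sample diseased (1) when its predicted y exceeds the threshold."""
    return [1 if v > threshold else 0 for v in y_predicted]


labels = ["control"] * 20 + ["osteoporosis"] * 20   # hypothetical cohort
train_idx, valid_idx = stratified_split(labels)
```

Splitting within each class keeps the class proportions of the validation set close to those of the full cohort, which matters when the groups are small.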
[0212] Osteoarthritis
[0213] FIG. 5-1A-OA is a scores scatter plot for PC2 and PC1 (t2
vs. t1) for the principal components analysis (PCA) model derived
from 1-D ¹H NMR spectra from serum samples from control
subjects (triangles, ▲) and patients with osteoarthritis
(circles, ●).
[0214] FIG. 5-1B-OA is the corresponding loadings scatter plot (p2
vs. p1) for the PCA shown in FIG. 5-1A-OA.
[0215] FIG. 5-1C-OA is a scores scatter plot for PC3 and PC2 (t3
vs. t2) for the PCA model derived from 1-D ¹H NMR spectra from
serum samples from control subjects (triangles, ▲) and patients
with osteoarthritis (circles, ●). Prior to PCA, the data were
filtered (in this case, using orthogonal signal correction, OSC).
[0216] FIG. 5-1D-OA is the corresponding loadings scatter plot (p3
vs. p2) for the PCA shown in FIG. 5-1C-OA.
[0217] FIG. 5-1E-OA is a scores scatter plot for PC2 and PC1 (t2
vs. t1) for the PLS-DA model derived from 1-D ¹H NMR spectra
from serum samples from control subjects (triangles, ▲) and
patients with osteoarthritis (circles, ●). Prior to PLS-DA, the
data were filtered (in this case, using orthogonal signal
correction, OSC).
[0218] FIG. 5-1F-OA is the corresponding loadings scatter plot (p2
vs. p1) for the PLS-DA shown in FIG. 5-1E-OA.
[0219] FIG. 5-2A-OA shows a section of the variable importance plot
(VIP) derived from the PLS-DA model described in FIG. 5-1E-OA.
[0220] FIG. 5-2B-OA shows a section of the regression coefficient
plot derived from the PLS-DA model described in FIG. 5-1E-OA.
[0221] FIG. 5-3-OA is a y-predicted scatter plot for a PLS-DA model
calculated using ~85% of the control (triangles, ▲) and
osteoarthritis (circles, ●) samples, which was then used to predict
the presence of disease in the remaining ~15% of samples (squares,
■) (the validation set).
DETAILED DESCRIPTION OF THE INVENTION
[0222] Introduction
[0223] The inventors have developed novel methods of analysing data
(e.g., NMR spectra) from a test population. These methods employ
multivariate statistical analysis and pattern recognition (PR)
techniques, and optionally data filtering techniques, and yield
accurate mathematical models which may subsequently be used to
classify a test sample or subject and/or in diagnosis.
[0224] An NMR spectrum provides a fingerprint or profile for the
sample to which it pertains. Such spectra represent a measure of
all NMR-detectable species present in the sample (rather than a
select few) and also, to some extent, interactions between these
species. As such, these spectra are characterised by a high data
density which, heretofore, has not been fully exploited.
[0225] The methods described herein facilitate the analysis of such
spectra, and the subsequent use of the results of that analysis to
classify test spectra (and therefore the associated samples and
subjects, if applicable) according to one or more distinguishing
criteria, at a discrimination level never before achieved.
[0226] These methods find particular application in the field of
medicine. For example, analysis of NMR spectra for samples taken
from a population characterised by a certain condition yields a
mathematical model which can be used to classify an NMR spectrum
for a sample from a test subject as positive (also having the
condition) or negative (not having the condition) with a high
degree of confidence.
[0227] In effect, these methods facilitate the identification of
the particular combination of amounts of (e.g., endogenous) species
which are invariably associated with the presence of the condition.
These combinations (patterns), which typically comprise many (often
small) uncorrelated variances which together are diagnostic, are
encoded within the high data density of the NMR spectra. The
methods described herein permit their identification and subsequent
use for classification.
[0228] However, it must be stressed that metabonomic analysis based
on NMR spectra is much more powerful than simply using a high
technology analytical tool (the NMR spectrometer) to measure the
levels of known metabolites. That is, the methods described herein
are distinct from methods which simply carry out multiple
independent measures of discrete chemical entities (e.g., LDL
cholesterol concentration).
[0229] For example, considering the variance in NMR spectral
intensity (total peak intensity) in any particular defined chemical
shift region (known as a bucket or bin), a part of that variance
may be associated with a given molecule (a biomarker), the level of
which varies consistently as a result of the condition under study.
The remainder of the variance may be due to differences in the
levels of other molecules which give peaks in that integral region
but which are unrelated to the condition under study (e.g.,
individual to individual differences such as dietary factors, age,
gender, etc.).
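Bucketing of the kind described in this paragraph can be sketched as follows, assuming a spectrum supplied as arrays of chemical shifts (ppm) and intensities, a δ 0.2-10.0 range and a 0.04 ppm bucket width; with these defaults the bucket labelled δ 1.30 covers δ 1.28-1.32, consistent with the bucket labelling used in paragraph [0231]. Summing the intensities of the data points in each bucket approximates the integral; the range and the function name bucket_spectrum are hypothetical.

```python
import numpy as np


def bucket_spectrum(ppm, intensity, low=0.2, high=10.0, width=0.04):
    """Sum spectral intensity into fixed-width chemical-shift buckets.

    Returns bucket centres (ppm) and summed intensities. With the defaults,
    the bucket labelled 1.30 spans 1.28-1.32 ppm.
    """
    n_buckets = int(round((high - low) / width))
    edges = low + width * np.arange(n_buckets + 1)
    # np.histogram with weights adds each point's intensity to the bucket it falls in
    sums, _ = np.histogram(ppm, bins=edges, weights=intensity)
    centres = edges[:-1] + width / 2
    return centres, sums


ppm = np.array([1.30, 1.26, 5.01])        # three hypothetical data points
intensity = np.array([5.0, 2.0, 1.0])
centres, sums = bucket_spectrum(ppm, intensity)
```

The total intensity in a bucket is what enters the multivariate models; as the surrounding text explains, only part of its variance need relate to the condition under study.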
[0230] The methods described herein, which employ pattern
recognition techniques, permit identification of that NMR peak
intensity which is related to the condition under study, even
though only a small part of the variance in a spectral region
(bucket) may be related to the condition under study. The
identification power is enhanced by the application of data
filtering techniques (e.g., orthogonal signal correction, OSC)
which can lower the influence of buckets with variance unrelated to
the condition of interest. Actual identification of the molecular
biomarkers contributing to significant buckets is carried out by
reexamination of the original NMR spectra by NMR experts, and could
involve additional NMR spectroscopic experiments such as
2-dimensional NMR spectroscopy; separation of putative substances
and their identification using HPLC-NMR-MS; addition of authentic
substance to the sample and re-measuring the NMR spectrum, checking
for coincidence of NMR peaks; etc.
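As an illustration of the data filtering idea, the following sketch removes one component of spectral variation that is orthogonal to the class vector y. It is a simplified orthogonal signal correction step in the spirit of Wold et al. (1998): the inner PLS step of that algorithm is replaced here by least-squares weights computed with a pseudo-inverse, which is an assumption rather than the implementation used by the inventors. The function name and synthetic data are hypothetical.

```python
import numpy as np


def osc_one_component(X, y, tol=1e-10, max_iter=100):
    """Remove one component of X that is orthogonal to the class vector y.

    Simplified orthogonal signal correction: the inner PLS step of the
    original algorithm is replaced by least-squares weights (pseudo-inverse).
    Returns the corrected (mean-centred) X, the removed score t and loading p.
    """
    Xc = X - X.mean(axis=0)
    yc = (y - y.mean()).reshape(-1, 1)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    t = U[:, 0] * s[0]                        # start from the first PC score
    proj_y = yc @ np.linalg.pinv(yc)          # projector onto the span of y
    X_pinv = np.linalg.pinv(Xc)
    for _ in range(max_iter):
        t_orth = t - proj_y @ t               # remove the part of t related to y
        w = X_pinv @ t_orth                   # weights such that Xc @ w ~ t_orth
        t_new = Xc @ w
        converged = np.linalg.norm(t_new - t) <= tol * np.linalg.norm(t_new)
        t = t_new
        if converged:
            break
    p = Xc.T @ t / (t @ t)                    # loading of the orthogonal component
    return Xc - np.outer(t, p), t, p


rng = np.random.default_rng(3)
X = rng.normal(size=(12, 40))                 # 12 hypothetical spectra x 40 buckets
X[:, :10] += rng.normal(size=(12, 1)) * 3.0   # strong variation unrelated to class
y = np.repeat([0.0, 1.0], 6)
X_osc, t, p = osc_one_component(X, y)
```

The filtered matrix X_osc would then be passed to PCA or PLS-DA; under this sketch, new samples would be corrected with the same weights and loading, and further components could be removed by repeating the step on X_osc.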
[0231] For example, in NMR spectra of blood plasma, a number of peaks
appear in the region around δ 1.2-1.3, all of which contribute to
the intensity in the buckets labelled δ 1.30 (e.g., the chemical
shift region δ 1.32-1.28), δ 1.26 (e.g., the region δ 1.28-1.24),
and δ 1.22 (e.g., the region δ 1.24-1.20). Given the bucket width
of 0.04 ppm (i.e., 24 Hz at 600 MHz), the wings of the Lorentzian
lines of the NMR resonances will contribute to most or all of these
buckets, even though the peak maximum appears in a single bucket.
The two main broad NMR peak envelopes in this region of the spectrum
have been assigned to the long-chain methylene groups of the fatty
acyl chains of lipoproteins; in addition, a number of small-molecule
metabolites have NMR resonances in this region, some of which have
been assigned. See, e.g., Nicholson et al., 1995. These include the
methyl resonances of lactate (a doublet at δ 1.33), threonine (a
doublet at δ 1.32), fucose (a doublet at δ 1.31), in some cases
3-hydroxybutyrate (a doublet at δ 1.20), and part of the methylene
resonance of isoleucine (a multiplet at δ 1.28). The two overlapping
lipoprotein peaks have been assigned as mainly VLDL at δ 1.29 and
mainly LDL at δ 1.25. However, both of these signals are asymmetric
in appearance and comprise a number of overlapping resonances. By
examination of the ¹H NMR spectra of individual lipoprotein
fractions, it has been possible to use mathematical deconvolution
techniques to show that this composite envelope in the δ 1.3-1.2
region comprises two bands from VLDL, three bands from LDL, and two
bands from HDL. See, e.g., M. Ala-Korpela, Progress in NMR
Spectroscopy, 27, 475-554 (1995). In fact, the inventors have shown
that the variance in the spectral intensity in the bucket at δ 1.30
is only weakly correlated with the LDL level measured independently
for a panel of 100 patients. The correlation coefficient (r) between
the level of LDL as measured by a conventional method and the bucket
intensity at δ 1.30 in the NMR spectra of the same samples is only
0.45. Therefore, the changes in the concentration of LDL over the
samples in this panel of 100 patients account for only about 20% of
the variance in this bucket intensity, since the fraction of variance
accounted for by a linear relationship is r² (0.45² ≈ 0.20). Thus
the variance in the intensity in the δ 1.30 bucket, over the sample
population, contains much more information than solely the variance
in the LDL concentration. The methods of the present invention permit
the determination and exploitation of this additional, until now
hidden, information.
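The arithmetic in this paragraph can be checked with a short Python sketch: r = 0.45 implies that a linear relationship with LDL accounts for r² ≈ 0.20 of the bucket variance, and a 0.04 ppm bucket corresponds to 24 Hz at 600 MHz. The helper pearson_r is included only to show how r would be computed from paired measurements; it is not taken from the source.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = 0.45                               # correlation reported for LDL vs. the delta 1.30 bucket
variance_explained = r ** 2            # about 0.20, i.e. about 20% of the bucket variance
unexplained = 1 - variance_explained   # about 80% is unrelated to LDL

# Bucket width in Hz: 1 ppm corresponds to 600 Hz on a 600 MHz (1H) spectrometer
bucket_width_hz = 0.04 * 600           # 24 Hz
```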
[0232] Furthermore, the methods can be applied to achieve
classification into multiple categories on the basis of a single
dataset (e.g., an NMR spectrum for a single sample). Due to the
very high data density of the input dataset, the analysis method
can separately (i.e., in parallel) or sequentially (i.e., in
series) perform multiple classifications. For example, a single
blood sample could be used to determine (e.g., diagnose) the
presence or absence of several, or indeed, many, (e.g., unrelated)
conditions or diseases.
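Running several classifications on one dataset can be sketched as applying independent, previously trained models to the same bucketed spectrum. In the minimal sketch below, each model is assumed to be linear in the bucket intensities with a decision threshold; the class LinearRule, the condition names and all coefficients are hypothetical and chosen only to illustrate the parallel use of one spectrum.

```python
from dataclasses import dataclass


@dataclass
class LinearRule:
    """Hypothetical trained linear model: score = intercept + sum(coef * bucket)."""
    coefficients: dict        # bucket centre (ppm) -> coefficient
    intercept: float
    threshold: float = 0.5

    def predict(self, buckets):
        score = self.intercept + sum(
            c * buckets.get(centre, 0.0) for centre, c in self.coefficients.items()
        )
        return score > self.threshold


def classify_all(buckets, models):
    """Apply several independent classification models to one bucketed spectrum."""
    return {name: model.predict(buckets) for name, model in models.items()}


# Made-up coefficients for two unrelated conditions, for illustration only
models = {
    "condition_A": LinearRule({1.30: 0.5}, intercept=0.0),
    "condition_B": LinearRule({3.22: -1.0}, intercept=1.0),
}
spectrum_buckets = {1.30: 2.0, 3.22: 0.8}   # one hypothetical sample
results = classify_all(spectrum_buckets, models)
```

Because each model only reads the shared bucket values, the models can be evaluated in parallel or one after another without re-measuring the sample.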
[0233] Thus, one aspect of the present invention pertains to
improved methods for the analysis of chemical, biochemical, and
biological data, for example spectra, for example, nuclear magnetic
resonance (NMR) and other types of spectra.
[0234] Methods of Classifying, Diagnosing
[0235] One aspect of the present invention pertains to a method of
classifying a sample, as described herein.
[0236] One aspect of the present invention pertains to a method of
classifying a subject by classifying a sample from said subject,
wherein said method of classifying a sample is as described
herein.
[0237] One aspect of the present invention pertains to a method of
diagnosing a subject by classifying a sample from said subject,
wherein said method of classifying a sample is as described
herein.
[0238] Classifying a Sample: by NMR Spectral Intensity
[0239] One aspect of the present invention pertains to a method of
classifying a sample, said method comprising the step of relating
NMR spectral intensity at one or more predetermined diagnostic
spectral windows for said sample with a predetermined
condition.
[0240] One aspect of the present invention pertains to a method of
classifying a sample from a subject, said method comprising the
step of relating NMR spectral intensity at one or more
predetermined diagnostic spectral windows for said sample with a
predetermined condition of said subject.
[0241] One aspect of the present invention pertains to a method of
classifying a sample, said method comprising the step of relating
NMR spectral intensity at one or more predetermined diagnostic
spectral windows for said sample with the presence or absence of a
predetermined condition.
[0242] One aspect of the present invention pertains to a method of
classifying a sample from a subject, said method comprising the
step of relating NMR spectral intensity at one or more
predetermined diagnostic spectral windows for said sample with the
presence or absence of a predetermined condition of said
subject.
[0243] One aspect of the present invention pertains to a method of
classifying a sample, said method comprising the step of relating a
modulation of NMR spectral intensity, relative to a control value,
at one or more predetermined diagnostic spectral windows for said
sample with a predetermined condition.
[0244] One aspect of the present invention pertains to a method of
classifying a sample from a subject, said method comprising the
step of relating a modulation of NMR spectral intensity, relative
to a control value, at one or more predetermined diagnostic
spectral windows for said sample with a predetermined condition of
said subject.
[0245] One aspect of the present invention pertains to a method of
classifying a sample, said method comprising the step of relating a
modulation of NMR spectral intensity, relative to a control value,
at one or more predetermined diagnostic spectral windows for said
sample with the presence or absence of a predetermined
condition.
[0246] One aspect of the present invention pertains to a method of
classifying a sample from a subject, said method comprising the
step of relating a modulation of NMR spectral intensity, relative
to a control value, at one or more predetermined diagnostic
spectral windows for said sample with the presence or absence of a
predetermined condition of said subject.
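One way to picture relating a modulation of spectral intensity, relative to a control value, at predetermined diagnostic windows to a condition is sketched below. The window limits, control values, the relative-change measure, the function names and the decision rule (every window must move in its expected direction by at least 20%) are all hypothetical assumptions for illustration; the source does not specify them.

```python
def window_intensity(ppm, intensity, lo, hi):
    """Sum spectral intensity over points with lo <= ppm < hi."""
    return sum(v for p, v in zip(ppm, intensity) if lo <= p < hi)


def modulation(ppm, intensity, windows, control):
    """Relative change of each window's intensity with respect to its control value."""
    return {
        name: window_intensity(ppm, intensity, lo, hi) / control[name] - 1.0
        for name, (lo, hi) in windows.items()
    }


def meets_pattern(mod, expected_sign, min_change=0.2):
    """Hypothetical rule: every listed window must change in its expected
    direction (+1 up, -1 down) by at least min_change."""
    return all(expected_sign[name] * mod[name] >= min_change for name in expected_sign)


ppm = [1.29, 1.31, 3.21, 3.23]                        # hypothetical data points
intensity = [2.0, 2.0, 1.0, 1.0]
windows = {"w1": (1.28, 1.32), "w2": (3.20, 3.24)}    # hypothetical diagnostic windows
control = {"w1": 3.0, "w2": 2.0}                      # hypothetical control values
mod = modulation(ppm, intensity, windows, control)
```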
[0247] Classifying a Subject: by NMR Spectral Intensity
[0248] One aspect of the present invention pertains to a method of
classifying a subject, said method comprising the step of relating
NMR spectral intensity at one or more predetermined diagnostic
spectral windows for a sample from said subject with a
predetermined condition of said subject.
[0249] One aspect of the present invention pertains to a method of
classifying a subject, said method comprising the step of relating
NMR spectral intensity at one or more predetermined diagnostic
spectral windows for a sample from said subject with the presence
or absence of a predetermined condition of said subject.
[0250] One aspect of the present invention pertains to a method of
classifying a subject, said method comprising the step of relating
a modulation of NMR spectral intensity, relative to a control
value, at one or more predetermined diagnostic spectral windows for
a sample from said subject with a predetermined condition of said
subject.
[0251] One aspect of the present invention pertains to a method of
classifying a subject, said method comprising the step of relating
a modulation of NMR spectral intensity, relative to a control
value, at one or more predetermined diagnostic spectral windows for
a sample from said subject with the presence or absence of a
predetermined condition of said subject.
[0252] Diagnosing a Subject: by NMR Spectral Intensity
[0253] One aspect of the present invention pertains to a method of
diagnosing a predetermined condition of a subject, said method
comprising the step of relating NMR spectral intensity at one or
more predetermined diagnostic spectral windows for a sample from
said subject with said predetermined condition of said subject.
[0254] One aspect of the present invention pertains to a method of
diagnosing a predetermined condition of a subject, said method
comprising the step of relating NMR spectral intensity at one or
more predetermined diagnostic spectral windows for a sample from
said subject with the presence or absence of said predetermined
condition of said subject.
[0255] One aspect of the present invention pertains to a method of
diagnosing a predetermined condition of a subject, said method
comprising the step of relating a modulation of NMR spectral
intensity, relative to a control value, at one or more
predetermined diagnostic spectral windows for a sample from said
subject with said predetermined condition of said subject.
[0256] One aspect of the present invention pertains to a method of
diagnosing a predetermined condition of a subject, said method
comprising the step of relating a modulation of NMR spectral
intensity, relative to a control value, at one or more
predetermined diagnostic spectral windows for a sample from said
subject with the presence or absence of said predetermined
condition of said subject.
[0257] Classifying a Sample: by Amount of Diagnostic Species
[0258] One aspect of the present invention pertains to a method of
classifying a sample, said method comprising the step of relating
the amount of, or relative amount of, one or more diagnostic species
present in said sample with a predetermined condition.
[0259] One aspect of the present invention pertains to a method of
classifying a sample from a subject, said method comprising the
step of relating the amount of, or relative amount of, one or more
diagnostic species present in said sample with a predetermined
condition of said subject.
[0260] One aspect of the present invention pertains to a method of
classifying a sample, said method comprising the step of relating
the amount of, or relative amount of, one or more diagnostic species
present in said sample with the presence or absence of a
predetermined condition.
[0261] One aspect of the present invention pertains to a method of
classifying a sample from a subject, said method comprising the
step of relating the amount of, or the relative amount of, one or
more diagnostic species present in said sample with the presence or
absence of a predetermined condition of said subject.
[0262] One aspect of the present invention pertains to a method of
classifying a sample, said method comprising the step of relating a
modulation of the amount of, or relative amount of, one or more
diagnostic species present in said sample, as compared to a control
sample, with a predetermined condition.
[0263] One aspect of the present invention pertains to a method of
classifying a sample from a subject, said method comprising the
step of relating a modulation of the amount of, or relative amount
of, one or more diagnostic species present in said sample, as
compared to a control sample, with a predetermined condition of
said subject.
[0264] One aspect of the present invention pertains to a method of
classifying a sample, said method comprising the step of relating a
modulation of the amount of, or relative amount of, one or more
diagnostic species present in said sample, as compared to a control
sample, with the presence or absence of a predetermined
condition.
[0265] One aspect of the present invention pertains to a method of
classifying a sample from a subject, said method comprising the
step of relating a modulation of the amount of, or relative amount
of, one or more diagnostic species present in said sample, as
compared to a control sample, with the presence or absence of a
predetermined condition of said subject.
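One way to make "modulation ... as compared to a control sample" concrete is a log2 fold change of each species' amount relative to the control. The sketch below assumes that measure, a two-fold change (log2 cutoff of 1.0) as the threshold for "modulated", and a rule that any modulated species indicates presence of the condition; all three are illustrative assumptions, not choices stated in this disclosure.

```python
import math

# Minimal sketch, assuming modulation is measured as a log2 fold change of each
# species' amount relative to a control sample. The cutoff and the
# "any species modulated" decision rule are hypothetical choices.

def log2_fold_change(sample_amount: float, control_amount: float) -> float:
    return math.log2(sample_amount / control_amount)


def modulated_species(sample: dict[str, float], control: dict[str, float],
                      cutoff: float = 1.0) -> dict[str, bool]:
    """Flag each species whose amount differs from the control by >= cutoff (log2 units)."""
    return {sp: abs(log2_fold_change(sample[sp], control[sp])) >= cutoff for sp in sample}


def classify_by_modulation(sample, control, cutoff=1.0) -> str:
    """Relate the modulation pattern to presence/absence of the condition."""
    flags = modulated_species(sample, control, cutoff)
    return "present" if any(flags.values()) else "absent"


control = {"species_A": 1.0, "species_B": 1.0}
sample = {"species_A": 4.0, "species_B": 1.1}
print(modulated_species(sample, control))       # {'species_A': True, 'species_B': False}
print(classify_by_modulation(sample, control))  # present
```

Using the absolute log ratio treats increases and decreases symmetrically; a directional rule would be equally compatible with the text.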
[0266] Classifying a Subject: by Amount of Diagnostic Species
[0267] One aspect of the present invention pertains to a method of
classifying a subject, said method comprising the step of relating
the amount of, or relative amount of, one or more diagnostic species
present in a sample from said subject with a predetermined
condition of said subject.
[0268] One aspect of the present invention pertains to a method of
classifying a subject, said method comprising the step of relating
the amount of, or relative amount of, one or more diagnostic species
present in a sample from said subject with the presence or absence
of a predetermined condition of said subject.
[0269] One aspect of the present invention pertains to a method of
classifying a subject, said method comprising the step of relating
a modulation of the amount of, or relative amount of, one or more
diagnostic species present in a sample from said subject, as
compared to a control sample, with a predetermined condition of
said subject.
[0270] One aspect of the present invention pertains to a method of
classifying a subject, said method comprising the step of relating
a modulation of the amount of, or relative amount of, one or more
diagnostic species present in a sample from said subject, as
compared to a control sample, with the presence or absence of a
predetermined condition of said subject.
[0271] Diagnosing a Subject: by Amount of Diagnostic Species
[0272] One aspect of the present invention pertains to a method of
diagnosing a predetermined condition of a subject, said method
comprising the step of relating the amount of, or relative amount
of, one or more diagnostic species present in a sample from said
subject with said predetermined condition of said subject.
[0273] One aspect of the present invention pertains to a method of
diagnosing a predetermined condition of a subject, said method
comprising the step of relating the amount of, or relative amount
of, one or more diagnostic species present in a sample from said
subject with the presence or absence of said predetermined
condition of said subject.
[0274] One aspect of the present invention pertains to a method of
diagnosing a predetermined condition of a subject, said method
comprising the step of relating a modulation of the amount of, or
relative amount of, one or more diagnostic species present in a
sample from said subject, as compared to a control sample, with
said predetermined condition of said subject.
[0275] One aspect of the present invention pertains to a method of
diagnosing a predetermined condition of a subject, said method
comprising the step of relating a modulation of the amount of, or
relative amount of, one or more diagnostic species present in a
sample from said subject, as compared to a control sample, with the
presence or absence of said predetermined condition of said
subject.
[0276] Classifying a Sample: by Mathematical Modelling
[0277] One aspect of the present invention pertains to a method of
classification, said method comprising the steps of:
[0278] (a) forming a predictive mathematical model by applying a
modelling method to modelling data;
[0279] (b) using said model to classify a test sample.
[0280] One aspect of the present invention pertains to a method of
classifying a test sample, said method comprising the steps of:
[0281] (a) forming a predictive mathematical model by applying a
modelling method to modelling data;
[0282] wherein said modelling data comprises a plurality of data
sets for modelling samples of known class;
[0283] (b) using said model to classify said test sample as being a
member of one of said known classes.
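The two steps, (a) forming a model from data sets of known class and (b) using it to assign a test sample to one of those classes, can be sketched as follows. A nearest-centroid rule stands in for the modelling method purely for illustration; it is an assumption and is not one of the multivariate methods (e.g., PCA, PLS-DA) named elsewhere in this description.

```python
import numpy as np

# Minimal sketch of steps (a) and (b). A nearest-centroid rule is used as a
# stand-in modelling method; it is an illustrative assumption only.

def form_model(data_sets, labels):
    """Step (a): store the mean data set (centroid) of each known class."""
    X = np.asarray(data_sets, dtype=float)
    y = np.asarray(labels)
    return {c: X[y == c].mean(axis=0) for c in sorted(set(labels))}


def classify(model, test_data_set):
    """Step (b): assign the class whose centroid is closest (Euclidean distance)."""
    x = np.asarray(test_data_set, dtype=float)
    return min(model, key=lambda c: float(np.linalg.norm(x - model[c])))


modelling_data = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 4.9]]
known_classes = ["absent", "absent", "present", "present"]
model = form_model(modelling_data, known_classes)
print(classify(model, [4.8, 5.2]))  # present
```

Because the model is just a mapping from class label to centroid, any other modelling method exposing the same "fit, then assign" pair of operations could be substituted.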
[0284] One aspect of the present invention pertains to a method of
classifying a test sample, said method comprising the steps of:
[0285] (a) forming a predictive mathematical model by applying a
modelling method to modelling data;
[0286] wherein said modelling data comprises at least one data set
for each of a plurality of modelling samples;
[0287] wherein said modelling samples define a class group
consisting of a plurality of classes;
[0288] wherein each of said modelling samples is of a known class
selected from said class group; and,
[0289] (b) using said model with a data set for said test sample to
classify said test sample as being a member of one class selected
from said class group.
[0290] One aspect of the present invention pertains to a method of
classification, said method comprising the step of:
[0291] using a predictive mathematical model;
[0292] wherein said model is formed by applying a modelling method
to modelling data;
[0293] to classify a test sample.
[0294] One aspect of the present invention pertains to a method of
classifying a test sample, said method comprising the step of:
[0295] using a predictive mathematical model;
[0296] wherein said model is formed by applying a modelling method
to modelling data;
[0297] wherein said modelling data comprises a plurality of data
sets for modelling samples of known class;
[0298] to classify said test sample as being a member of one of
said known classes.
[0299] One aspect of the present invention pertains to a method of
classifying a test sample, said method comprising the step of:
[0300] using a predictive mathematical model;
[0301] wherein said model is formed by applying a modelling method
to modelling data;
[0302] wherein said modelling data comprises at least one data set
for each of a plurality of modelling samples;
[0303] wherein said modelling samples define a class group
consisting of a plurality of classes;
[0304] wherein each of said modelling samples is of a known class
selected from said class group;
[0305] with a data set for said test sample to classify said test
sample as being a member of one class selected from said class
group.
[0306] Classifying a Subject: by Mathematical Modelling
[0307] One aspect of the present invention pertains to a method of
classification, said method comprising the steps of:
[0308] (a) forming a predictive mathematical model by applying a
modelling method to modelling data;
[0309] (b) using said model to classify a subject.
[0310] One aspect of the present invention pertains to a method of
classifying a subject, said method comprising the steps of:
[0311] (a) forming a predictive mathematical model by applying a
modelling method to modelling data;
[0312] wherein said modelling data comprises a plurality of data
sets for modelling samples of known class;
[0313] (b) using said model to classify a test sample from said
subject as being a member of one of said known classes, and thereby
classify said subject.
[0314] One aspect of the present invention pertains to a method of
classifying a subject, said method comprising the steps of:
[0315] (a) forming a predictive mathematical model by applying a
modelling method to modelling data;
[0316] wherein said modelling data comprises at least one data set
for each of a plurality of modelling samples;
[0317] wherein said modelling samples define a class group
consisting of a plurality of classes;
[0318] wherein each of said modelling samples is of a known class
selected from said class group; and,
[0319] (b) using said model with a data set for a test sample from
said subject to classify said test sample as being a member of one
class selected from said class group, and thereby classify said
subject.
[0320] One aspect of the present invention pertains to a method of
classification, said method comprising the step of:
[0321] using a predictive mathematical model;
[0322] wherein said model is formed by applying a modelling method
to modelling data;
[0323] to classify a subject.
[0324] One aspect of the present invention pertains to a method of
classifying a subject, said method comprising the step of:
[0325] using a predictive mathematical model;
[0326] wherein said model is formed by applying a modelling method
to modelling data;
[0327] wherein said modelling data comprises a plurality of data
sets for modelling samples of known class;
[0328] to classify a test sample from said subject as being a
member of one of said known classes, and thereby classify said
subject.
[0329] One aspect of the present invention pertains to a method of
classifying a subject, said method comprising the step of:
[0330] using a predictive mathematical model;
[0331] wherein said model is formed by applying a modelling method
to modelling data;
[0332] wherein said modelling data comprises at least one data set
for each of a plurality of modelling samples;
[0333] wherein said modelling samples define a class group
consisting of a plurality of classes;
[0334] wherein each of said modelling samples is of a known class
selected from said class group;
[0335] with a data set for a test sample from said subject to
classify said test sample as being a member of one class selected
from said class group, and thereby classify said subject.
[0336] Diagnosing a Subject: by Mathematical Modelling
[0337] One aspect of the present invention pertains to a method of
diagnosis, said method comprising the steps of:
[0338] (a) forming a predictive mathematical model by applying a
modelling method to modelling data;
[0339] (b) using said model to diagnose a subject.
[0340] One aspect of the present invention pertains to a method of
diagnosing a predetermined condition of a subject, said method
comprising the steps of:
[0341] (a) forming a predictive mathematical model by applying a
modelling method to modelling data;
[0342] wherein said modelling data comprises a plurality of data
sets for modelling samples of known class;
[0343] (b) using said model to classify a test sample from said
subject as being a member of one of said known classes, and thereby
diagnose said subject.
[0344] One aspect of the present invention pertains to a method of
diagnosing a predetermined condition of a subject, said method
comprising the steps of:
[0345] (a) forming a predictive mathematical model by applying a
modelling method to modelling data;
[0346] wherein said modelling data comprises at least one data set
for each of a plurality of modelling samples;
[0347] wherein said modelling samples define a class group
consisting of a plurality of classes;
[0348] wherein each of said modelling samples is of a known class
selected from said class group; and,
[0349] (b) using said model with a data set for a test sample from
said subject to classify said test sample as being a member of one
class selected from said class group, and thereby diagnose said
subject.
[0350] One aspect of the present invention pertains to a method of
diagnosis, said method comprising the step of:
[0351] using a predictive mathematical model;
[0352] wherein said model is formed by applying a modelling method
to modelling data;
[0353] to diagnose a subject.
[0354] One aspect of the present invention pertains to a method of
diagnosing a predetermined condition of a subject, said method
comprising the step of:
[0355] using a predictive mathematical model;
[0356] wherein said model is formed by applying a modelling method
to modelling data;
[0357] wherein said modelling data comprises a plurality of data
sets for modelling samples of known class;
[0358] to classify a test sample from said subject as being a
member of one of said known classes, and thereby diagnose said
subject.
[0359] One aspect of the present invention pertains to a method of
diagnosing a predetermined condition of a subject, said method
comprising the step of:
[0360] using a predictive mathematical model;
[0361] wherein said model is formed by applying a modelling method
to modelling data;
[0362] wherein said modelling data comprises at least one data set
for each of a plurality of modelling samples;
[0363] wherein said modelling samples define a class group
consisting of a plurality of classes;
[0364] wherein each of said modelling samples is of a known class
selected from said class group;
[0365] with a data set for a test sample from said subject to
classify said test sample as being a member of one class selected
from said class group, and thereby diagnose said subject.
[0366] Certain Preferred Embodiments
[0367] In one embodiment, said sample is a sample from a subject,
and said predetermined condition is a predetermined condition of
said subject.
[0368] In one embodiment, said test sample is a test sample from a
subject, and said predetermined condition is a predetermined
condition of said subject.
[0369] In one embodiment, said one or more predetermined diagnostic
spectral windows are associated with one or more diagnostic
species.
[0370] In one embodiment, said relating step involves the use of a
predictive mathematical model; for example, as described
herein.
[0371] The nature of a predictive mathematical model is determined
primarily by the modelling method employed when forming that
model.
[0372] In one embodiment, said modelling method is a multivariate
statistical analysis modelling method.
[0373] In one embodiment, said modelling method is a multivariate
statistical analysis modelling method which employs a pattern
recognition method.
[0374] In one embodiment, said modelling method is, or employs,
PCA.
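As a minimal sketch of PCA, assuming the data sets are arranged as a matrix with one row per sample (e.g., one binned spectrum per row), the scores and loadings can be obtained from a singular value decomposition of the mean-centred matrix. Any scaling applied before this step (e.g., unit variance) is left as a user choice.

```python
import numpy as np

# Minimal PCA sketch via SVD of mean-centred data.
# Rows are samples, columns are variables (e.g., spectral bins).

def pca(X, n_components):
    """Return scores, loadings, and the fraction of variance each component explains."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]   # sample coordinates
    loadings = Vt[:n_components].T                     # variable weights
    explained = S[:n_components] ** 2 / np.sum(S ** 2)
    return scores, loadings, explained


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 6))
scores, loadings, explained = pca(X, 2)
print(scores.shape, loadings.shape)  # (20, 2) (6, 2)
```

PCA is unsupervised: class labels play no part in forming the components, so any class separation seen in a scores plot arises from the data alone.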
[0375] In one embodiment, said modelling method is, or employs,
PLS.
[0376] In one embodiment, said modelling method is, or employs,
PLS-DA.
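A minimal PLS-DA sketch follows, assuming a two-class problem: class membership is coded as a 0/1 response, a single-response PLS (PLS1) regression is fitted with the NIPALS algorithm, and predicted responses above 0.5 are assigned to class 1. The 0/1 coding, the 0.5 cutoff, and the number of components are illustrative assumptions; model validation (e.g., cross-validation) is not shown.

```python
import numpy as np

# Minimal PLS-DA sketch: PLS1 regression (NIPALS) on a 0/1 class code,
# followed by a 0.5 cutoff. Coding, cutoff, and component count are assumptions.

def pls1_fit(X, y, n_components):
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean           # residual X block and residual response
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)              # weight vector
        t = E @ w                           # scores
        tt = t @ t
        p = E.T @ t / tt                    # X loadings
        c = (f @ t) / tt                    # y loading
        E = E - np.outer(t, p)              # deflate X
        f = f - c * t                       # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)  # regression coefficients in original X space
    return x_mean, y_mean, coef


def plsda_predict(model, X_new, cutoff=0.5):
    x_mean, y_mean, coef = model
    y_hat = (np.asarray(X_new, dtype=float) - x_mean) @ coef + y_mean
    return (y_hat > cutoff).astype(int)


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
X[10:, 0] += 10.0                           # class 1 samples differ in variable 0
y = np.array([0] * 10 + [1] * 10)
model = pls1_fit(X, y, n_components=2)
print(plsda_predict(model, X))
```

Unlike PCA, the components here are chosen to covary with the class code, which is what makes PLS-DA a discriminant method.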
[0377] In one embodiment, said modelling method includes a step of
data filtering.
[0378] In one embodiment, said modelling method includes a step of
orthogonal data filtering.
[0379] In one embodiment, said modelling method includes a step of
OSC.
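The idea behind orthogonal data filtering can be illustrated with a simplified, single-component sketch in the spirit of OSC: take the first principal component score of the centred data, remove from it everything correlated with the class or response vector y, and subtract the resulting y-orthogonal component from the data. This is an assumption-laden simplification. Published OSC algorithms additionally re-express the score through PLS weights and iterate, which is what allows new samples to be filtered; that step is omitted here.

```python
import numpy as np

# Simplified single-component orthogonal filter (in the spirit of OSC, not the
# full published algorithm): removes one component of X that is orthogonal to y.

def orthogonal_filter(X, y):
    Xc = np.asarray(X, dtype=float)
    Xc = Xc - Xc.mean(axis=0)
    yc = np.asarray(y, dtype=float)
    yc = yc - yc.mean()
    if yc @ yc == 0:
        raise ValueError("y must not be constant")
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    t = U[:, 0] * S[0]                       # first principal component score
    t = t - yc * (yc @ t) / (yc @ yc)        # make the score orthogonal to y
    tt = t @ t
    if tt < 1e-12:                           # nothing orthogonal to remove
        return Xc
    p = Xc.T @ t / tt                        # loading of the orthogonal component
    return Xc - np.outer(t, p)               # filtered (still centred) data


rng = np.random.default_rng(1)
X = rng.normal(size=(12, 4))
y = np.array([0.0] * 6 + [1.0] * 6)
X_filtered = orthogonal_filter(X, y)
```

Because the removed component is orthogonal to y, the covariance between each variable and y is left unchanged; only variation unrelated to the classes is removed.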
[0380] In one embodiment, said model takes account of one or more
diagnostic species.
[0381] The precise details of the predictive mathematical model are
determined primarily by the modelling data (e.g., modelling data
sets).
[0382] In one embodiment, said modelling data comprise spectral
data.
[0383] In one embodiment, said modelling data comprise both
spectral data and non-spectral data (and are referred to as
"composite data").
[0384] In one embodiment, said modelling data comprise NMR spectral
data.
[0385] In one embodiment, said modelling data comprise both NMR
spectral data and non-NMR spectral data.
[0386] In one embodiment, said NMR spectral data comprises ¹H
NMR spectral data and/or ¹³C NMR spectral data.
[0387] In one embodiment, said NMR spectral data comprises ¹H
NMR spectral data.
[0388] In one embodiment, said modelling data comprise spectra.
[0389] In one embodiment, said modelling data are spectra.
[0390] In one embodiment, said modelling data comprises a plurality
of data sets for modelling samples of known class.
[0391] In one embodiment, said modelling data comprises at least
one data set for each of a plurality of modelling samples.
[0392] In one embodiment, said modelling data comprises exactly one
data set for each of a plurality of modelling samples.
[0393] In one embodiment, said using step is: using said model with
a data set for said test sample to classify said test sample as
being a member of one class selected from said class group.
[0394] In one embodiment, each of said data sets comprises spectral
data.
[0395] In one embodiment, each of said data sets comprises both
spectral data and non-spectral data (and is referred to as a
"composite data set").
[0396] In one embodiment, each of said data sets comprises NMR
spectral data.
[0397] In one embodiment, each of said data sets comprises both NMR
spectral data and non-NMR spectral data.
[0398] In one embodiment, said NMR spectral data comprises ¹H
NMR spectral data and/or ¹³C NMR spectral data.
[0399] In one embodiment, said NMR spectral data comprises ¹H
NMR spectral data.
[0400] In one embodiment, each of said data sets comprises a
spectrum.
[0401] In one embodiment, each of said data sets comprises a
¹H NMR spectrum and/or ¹³C NMR spectrum.
[0402] In one embodiment, each of said data sets comprises a
¹H NMR spectrum.
[0403] In one embodiment, each of said data sets is a spectrum.
[0404] In one embodiment, each of said data sets is a ¹H NMR
spectrum and/or ¹³C NMR spectrum.
[0405] In one embodiment, each of said data sets is a ¹H NMR
spectrum.
[0406] In one embodiment, said non-spectral data is non-spectral
clinical data.
[0407] In one embodiment, said non-NMR spectral data is
non-spectral clinical data.
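A composite data set may be assembled, for example, by concatenating each sample's spectral vector with its non-spectral clinical variables. The sketch below assumes that layout and then autoscales every column (mean 0, unit variance) so that a few clinical variables are not swamped by many spectral variables; the autoscaling step and the clinical variable shown (age) are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of composite data sets: spectral vector + clinical variables,
# followed by column-wise autoscaling. Scaling choice and variables are assumptions.

def composite_data_set(spectrum, clinical):
    return np.concatenate([np.asarray(spectrum, float), np.asarray(clinical, float)])


def autoscale(X):
    X = np.asarray(X, dtype=float)
    sd = X.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0                 # leave constant columns unscaled
    return (X - X.mean(axis=0)) / sd


spectra = [[0.1, 0.4, 0.2], [0.2, 0.3, 0.1], [0.15, 0.5, 0.3]]
ages = [[45.0], [62.0], [51.0]]        # hypothetical clinical variable
X = np.vstack([composite_data_set(s, c) for s, c in zip(spectra, ages)])
print(X.shape)  # (3, 4)
Z = autoscale(X)
```

The scaled matrix can then be passed to any of the modelling methods above in place of purely spectral data.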
[0408] In one embodiment, said class group comprises classes
associated with said predetermined condition (e.g., presence,
absence, degree, etc.).
[0409] In one embodiment, said class group comprises exactly two
classes.
[0410] In one embodiment, said class group comprises exactly two
classes: presence of said predetermined condition; and absence of
said predetermined condition.
[0411] Classification, Classifying, and Classes
[0412] As discussed above, many aspects of the present invention
pertain to methods of classifying things, for example, a sample, a
subject, etc. In such methods, the thing is classified, that is, it
is associated with an outcome, or, more specifically, it is
assigned membership to a particular class (i.e., it is assigned
class membership), and is said "to be of," "to belong to," or "to be a
member of" a particular class. Classification is made (i.e., class
membership is assigned) on the basis of diagnostic criteria. The
step of considering such diagnostic criteria, and assigning class
membership, is described by the word "relating," for example, in
the phrase "relating NMR spectral intensity at one or more
predetermined diagnostic spectral windows for said sample (i.e.,
diagnostic criteria) with the presence or absence of a
predetermined condition (i.e., class membership)."
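As an illustration of "NMR spectral intensity at one or more predetermined diagnostic spectral windows", the sketch below sums the intensity of all points whose chemical shift falls inside each window; the resulting vector could then serve as the diagnostic criteria passed to a classifier. Simple summation (rather than, e.g., trapezoidal integration) is an assumption, and the window bounds shown are placeholders, not the diagnostic windows of any particular condition.

```python
import numpy as np

# Minimal sketch: summed intensity within each (low_ppm, high_ppm) window.
# Window bounds below are placeholders only.

def window_intensities(ppm, intensity, windows):
    """Return one summed intensity per (low_ppm, high_ppm) window, inclusive."""
    ppm = np.asarray(ppm, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    out = []
    for low, high in windows:
        mask = (ppm >= low) & (ppm <= high)   # works for ascending or descending axes
        out.append(intensity[mask].sum())
    return np.array(out)


ppm = np.arange(20) * 0.5          # 0.0, 0.5, ..., 9.5 ppm
intensity = np.ones_like(ppm)
print(window_intensities(ppm, intensity, [(1.0, 2.0), (3.0, 3.4)]))  # [3. 1.]
```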
[0413] For example, "presence of a predetermined condition" is one
class, and "absence of a predetermined condition" is another class;
in such cases, classification (i.e., assignment to one of these
classes) is equivalent to diagnosis.
[0414] Samples
[0415] As discussed above, many aspects of the present invention
pertain to methods which involve a sample, e.g., a particular
sample under study ("study sample").
[0416] In general, a sample may be in any suitable form. For
methods which involve spectra obtained or recorded for a sample,
the sample may be in any form which is compatible with the
particular type of spectroscopy, and therefore may be, as
appropriate, homogeneous or heterogeneous, comprising one or a
combination of, for example, a gas, a liquid, a liquid crystal, a
gel, and a solid.
[0417] Samples which originate from an organism (e.g., subject,
patient) may be in vivo; that is, not removed from or separated
from the organism. Thus, in one embodiment, said sample is an in
vivo sample. For example, the sample may be circulating blood,
which is "probed" in situ, in vivo, for example, using NMR
methods.
[0418] Samples which originate from an organism may be ex vivo;
that is, removed from or separated from the organism (e.g., an ex
vivo blood sample, an ex vivo urine sample). Thus, in one
embodiment, said sample is an ex vivo sample.
[0419] In one embodiment, said sample is an ex vivo blood or
blood-derived sample.
[0420] In one embodiment, said sample is an ex vivo blood
sample.
[0421] In one embodiment, said sample is an ex vivo plasma
sample.
[0422] In one embodiment, said sample is an ex vivo serum
sample.
[0423] In one embodiment, said sample is an ex vivo urine
sample.
[0424] In one embodiment, said sample is removed from or separated
from an/said organism, and is not returned to said organism (e.g.,
an ex vivo blood sample, an ex vivo urine sample).
[0425] In one embodiment, said sample is removed from or separated
from an/said organism, and is returned to said organism (i.e., "in
transit") (e.g., as with dialysis methods). Thus, in one
embodiment, said sample is an ex vivo in transit sample.
[0426] Examples of samples include:
[0427] a whole organism (living or dead, e.g., a living human);
[0428] a part or parts of an organism (e.g., a tissue sample, an
organ);
[0429] a pathological tissue such as a tumour;
[0430] a tissue homogenate (e.g. a liver microsome fraction);
[0431] an extract prepared from an organism or a part of an organism
(e.g., a tissue sample extract, such as perchloric acid
extract);
[0432] an infusion prepared from an organism or a part of an
organism (e.g., tea, Chinese traditional herbal medicines);
[0433] an in vitro tissue such as a spheroid;
[0434] a suspension of a particular cell type (e.g.
hepatocytes);
[0435] an excretion, secretion, or emission from an organism
(especially a fluid);
[0436] material which is administered and collected (e.g., dialysis
fluid);
[0437] material which develops as a function of pathology (e.g.,
cysts, blisters); and,
supernatant from a cell culture.
[0438] Examples of fluid samples include, for example, blood
plasma, blood serum, whole blood, urine, (gall bladder) bile,
cerebrospinal fluid, milk, saliva, mucus, sweat, gastric juice,
pancreatic juice, seminal fluid, prostatic fluid, seminal vesicle
fluid, seminal plasma, amniotic fluid, foetal fluid, follicular
fluid, synovial fluid, aqueous humour, ascitic fluid, cystic fluid,
blister fluid, and cell suspensions; and extracts thereof.
[0439] Examples of tissue samples include liver, kidney, prostate,
brain, gut, blood, blood cells, skeletal muscle, heart muscle,
lymphoid, bone, cartilage, and reproductive tissues.
[0440] Still other examples of samples include air (e.g., exhaust),
water (e.g., seawater, groundwater, wastewater, e.g., from
factories), liquids from the food industry (e.g. juices, wines,
beers, other alcoholic drinks, tea, milk), solid-like food samples
(e.g. chocolate, pastes, fruit peel, fruit and vegetable flesh such
as banana, leaves, meats, whether cooked or raw, etc.).
[0441] A few preferred samples are discussed below.
[0442] Blood, Plasma, Serum
[0443] Blood is the fluid that circulates in the blood vessels of
the body, that is, the fluid that is circulated through the heart,
arteries, veins, and capillaries. The function of the blood and the
circulation is to service the needs of other tissues: to transport
oxygen and nutrients to the tissues, to transport carbon dioxide
and various metabolic waste products away, to conduct hormones from
one part of the body to another, and in general to maintain an
appropriate environment in all tissue fluids for optimal survival
and function of the cells.
[0444] Blood consists of a liquid component, plasma, and a solid
component, cells and formed elements (e.g., erythrocytes,
leukocytes, and platelets), suspended within it. Erythrocytes, or
red blood cells, account for about 99.9% of the cells suspended in
human blood. They contain hemoglobin which is involved in the
transport of oxygen and carbon dioxide. Leukocytes, or white blood
cells, account for about 0.1% of the cells suspended in human
blood. They play a role in the body's defense mechanism and repair
mechanism, and may be classified as agranular or granular.
Agranular leukocytes include monocytes and small, medium and large
lymphocytes, with small lymphocytes accounting for about 20-25% of
the leukocytes in human blood. T cells and B cells are important
examples of lymphocytes. Three classes of granular leukocytes are
known, neutrophils, eosinophils, and basophils, with neutrophils
accounting for about 60% of the leukocytes in human blood.
Platelets (i.e., thrombocytes) are not cells but small
spindle-shaped or rodlike bodies about 3 microns in length which
occur in large numbers in circulating blood. Platelets play a major
role in clot formation.
[0445] Plasma is the liquid component of blood. It serves as the
primary medium for the transport of materials among cellular,
tissue, and organ systems and their various external environments,
and it is essential for the maintenance of normal hemostasis. One
of the most important functions of many of the major tissue and
organ systems is to maintain specific components of plasma within
acceptable physiological limits.
[0446] Plasma is the residual fluid of blood which remains after
removal of suspended cells and formed elements. Whole blood is
typically processed to remove suspended cells and formed elements
(e.g., by centrifugation) to yield blood plasma. Serum is the fluid
which is obtained after blood has been allowed to clot and the clot
removed. Blood serum may be obtained by forming a blood clot (e.g.,
optionally initiated by the addition of thrombin and calcium ion)
and subsequently removing the clot (e.g., by centrifugation). Serum
and plasma differ primarily in their content of fibrinogen and
several components which are removed in the clotting process.
Plasma may be effectively prevented from clotting by the addition
of an anticoagulant (e.g., sodium citrate, heparin, lithium
heparin) to permit handling or storage. Plasma is composed
primarily of water (approximately 90%), with approximately 7%
proteins, 0.9% inorganic salts, and smaller amounts of
carbohydrates, lipids, and organic salts. The term "blood sample,"
as used herein, pertains to a sample of whole blood. The term
"blood-derived sample," as used herein, pertains to an ex vivo
sample derived from the blood of the subject under study.
[0447] Examples of blood and blood-derived samples include, but are
not limited to, whole blood (WB), blood plasma (including, e.g.,
fresh frozen plasma (FFP)), blood serum, blood fractions, plasma
fractions, serum fractions, blood fractions comprising red blood
cells (RBC), platelets (PLT), leukocytes, etc., and cell lysates
including fractions thereof (for example, cells, such as red blood
cells, white blood cells, etc., may be harvested and lysed to
obtain a cell lysate).
[0448] Methods for obtaining, preparing, handling, and storing
blood and blood-derived samples (e.g., plasma, serum) are well
known in the art. Typically, blood is collected from subjects using
conventional techniques (e.g., from the antecubital fossa),
typically pre-prandially.
[0449] For use in the methods described herein, the method used to
prepare the blood fraction (e.g., serum) should be reproduced as
carefully as possible from one subject to the next. It is important
that the same or similar procedure be used for all subjects. It may
be preferable to prepare serum (as opposed to plasma or other blood
fractions) for two reasons: (a) the preparation of serum is more
reproducible from individual to individual than the preparation of
plasma, and (b) the preparation of plasma requires the addition of
anticoagulants (e.g., EDTA, citrate, or heparin) which will be
visible in the NMR metabonomic profile and may reduce the data
density available.
[0450] A typical method for the preparation of serum suitable for
analysis by the methods described herein is as follows: 10 mL of
blood is drawn from the antecubital fossa of an individual who has
fasted overnight, using an 18 gauge butterfly needle. The blood is
immediately dispensed into a polypropylene tube and allowed to clot
at room temperature for 3 hours. The clotted blood is then
subjected to centrifugation (e.g., 4,500 × g for 5 minutes) and
the serum supernatant removed to a clean tube. If necessary, the
centrifugation step can be repeated to ensure the serum is
efficiently separated from the clot. The serum supernatant may be
analysed "fresh" or it may be stored frozen for later analysis.
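Centrifugation conditions are given as relative centrifugal force (multiples of g), whereas many centrifuges are set in revolutions per minute. The standard conversion is RCF = 1.118 × 10⁻⁵ × r × N², with r the rotor radius in cm and N the speed in rpm. The sketch below applies it; the 10 cm radius is a hypothetical example, and the radius of the actual rotor should be used.

```python
import math

# RCF <-> rpm conversion: RCF = 1.118e-5 * r_cm * rpm**2.
# The 10 cm rotor radius used below is a hypothetical example.

def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    return 1.118e-5 * radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    return math.sqrt(rcf / (1.118e-5 * radius_cm))


speed = rpm_from_rcf(4500, 10.0)   # speed for a 4,500 x g spin
print(round(speed))                # 6344
```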
[0451] A typical method for the preparation of plasma suitable for
analysis by the methods described herein is as follows: High-quality
platelet-poor plasma is made by drawing the blood from the
antecubital fossa using a 19 gauge butterfly needle, without the use
of a tourniquet. The first 2 mL of blood drawn is discarded and
the remainder is rapidly mixed and aliquoted into Diatube H
anticoagulant tubes (Becton Dickinson). After gentle mixing by
inversion the anticoagulated blood is cooled on ice for 15 minutes
then subjected to centrifugation to pellet the cells and platelets
(approximately 1,200 × g for 15 minutes). The platelet-poor
plasma supernatant is carefully removed by drawing off the middle
third of the supernatant, discarding the upper third (which may
contain floating platelets) and the lower third, which is too close
to the readily disturbed platelet layer on the top of the cell
pellet. The plasma may then be aliquoted and stored frozen at
-20 °C or colder, and then thawed when required for
assay.
[0452] Samples may be analysed immediately ("fresh"), or may be
frozen and stored (e.g., at -80 °C) ("fresh frozen") for
future analysis. If frozen, samples are completely thawed prior to
NMR analysis.
[0453] In one embodiment, said sample is a blood sample or a
blood-derived sample.
[0454] In one embodiment, said sample is a blood sample.
[0455] In one embodiment, said sample is a blood plasma sample.
[0456] In one embodiment, said sample is a blood serum sample.
[0457] Urine
[0458] The composition of urine is complex and highly variable both
between species and within species according to lifestyle. A wide
range of organic acids and bases, simple sugars and
polysaccharides, heterocycles, polyols, low molecular weight
proteins and polypeptides are present together with inorganic
species such as Na⁺, K⁺, Ca²⁺, Mg²⁺,
HCO₃⁻, SO₄²⁻ and phosphates.
[0459] The term "urine," as used herein, pertains to whole (or
intact) urine, whether in vivo (e.g., foetal urine) or ex vivo,
e.g., by excretion or catheterisation.
[0460] The term "urine-derived sample," as used herein, pertains to
an ex vivo sample derived from the urine of the subject under study
(e.g., obtained by dilution, concentration, addition of additives,
solvent- or solid-phase extraction, etc.). Analysis may be
performed using, for example, fresh urine; urine which has been
frozen and then thawed; urine which has been dried (e.g.,
freeze-dried) and then reconstituted, e.g., with water or
D₂O.
[0461] Methods for the collection, handling, storage, and
pre-analysis preparation of many classes of sample, especially
biological samples (e.g., biofluids) are well known in the art.
See, for example, Lindon et al., 1999.
[0462] In one embodiment, said sample is a urine sample or a
urine-derived sample.
[0463] In one embodiment, said sample is a urine sample.
[0464] Organisms, Subjects, Patients
[0465] As discussed above, in many cases, samples are, or originate
from, or are drawn or derived from, an organism (e.g., subject,
patient). In such cases, the organism may be as defined below.
[0466] In one embodiment, the organism is a prokaryote (e.g.,
bacteria) or a eukaryote (e.g., protoctista, fungi, plants,
animals).
[0468] In one embodiment, the organism is a protoctista, an alga,
or a protozoan.
[0469] In one embodiment, the organism is a plant, an angiosperm, a
dicotyledon, a monocotyledon, a gymnosperm, a conifer, a ginkgo, a
cycad, a fern, a horsetail, a clubmoss, a liverwort, or a moss.
[0470] In one embodiment, the organism is an animal.
[0471] In one embodiment, the organism is a chordate, an
invertebrate, an echinoderm (e.g., starfish, sea urchins,
brittlestars), an arthropod, an annelid (segmented worms) (e.g.,
earthworms, lugworms, leeches), a mollusk (cephalopods (e.g.,
squids, octopi), pelecypods (e.g., oysters, mussels, clams),
gastropods (e.g., snails, slugs)), a nematode (round worms), a
platyhelminthes (flatworms) (e.g., planarians, flukes, tapeworms),
a cnidaria (e.g., jelly fish, sea anemones, corals), or a porifera
(e.g., sponges).
[0472] In one embodiment, the organism is an arthropod, an insect
(e.g., beetles, butterflies, moths), a chilopoda (centipedes), a
diplopoda (millipedes), a crustacean (e.g., shrimps, crabs,
lobsters), or an arachnid (e.g., spiders, scorpions, mites).
[0473] In one embodiment, the organism is a chordate, a vertebrate,
a mammal, a bird, a reptile (e.g., snakes, lizards, crocodiles), an
amphibian (e.g., frogs, toads), a bony fish (e.g., salmon, plaice,
eel, lungfish), a cartilaginous fish (e.g., sharks, rays), or a
jawless fish (e.g., lampreys, hagfish).
[0474] In one embodiment, the organism (e.g., subject, patient) is
a mammal.
[0475] In one embodiment, the organism (e.g., subject, patient) is
a placental mammal, a marsupial (e.g., kangaroo, wombat), a
monotreme (e.g., duck-billed platypus), a rodent (e.g., a guinea
pig, a hamster, a rat, a mouse), murine (e.g., a mouse), a
lagomorph (e.g., a rabbit), avian (e.g., a bird), canine (e.g., a
dog), feline (e.g., a cat), equine (e.g., a horse), porcine (e.g.,
a pig), ovine (e.g., a sheep), bovine (e.g., a cow), a primate,
simian (e.g., a monkey or ape), a monkey (e.g., marmoset, baboon),
an ape (e.g., gorilla, chimpanzee, orangutan, gibbon), or a
human.
[0476] Furthermore, the organism may be in any of its forms of
development, for example, a spore, a seed, an egg, a larva, a pupa,
or a foetus.
[0477] In one embodiment, the organism (e.g., subject, patient) is
a human.
[0478] The subject (e.g., a human) may be characterised by one or
more criteria, for example, sex, age (e.g., 40 years or more, 50
years or more, 60 years or more, etc.), ethnicity, medical history,
lifestyle (e.g., smoker, non-smoker), hormonal status (e.g.,
pre-menopausal, post-menopausal), etc.
[0479] The term "population," as used herein, refers to a group of
organisms (e.g., subjects, patients). If desired, a population
(e.g., of humans) may be selected according to one or more of the
criteria listed above.
[0480] Conditions
[0481] As discussed above, many methods of the present invention
involve assigning class membership, for example, to one of one or
more classes, for example, to one of the two classes: (i) presence
of a predetermined condition, or (ii) absence of a predetermined
condition.
[0482] A condition is "predetermined" in the sense that it is the
condition in respect of which the invention is practised; a
condition is predetermined by a step of selecting a condition for
consideration, study, etc.
[0483] As used herein, the term "condition" relates to a state
which is, in at least one respect, distinct from the state of
normality, as determined by a suitable control population.
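The text does not say how "distinct from the state of normality" is measured against the control population. One common convention, assumed here purely for illustration and not stated in [0483], is to call a measurement distinct when it lies more than k standard deviations from the control-population mean. The minimal sketch below applies that rule; the function name `outside_control_range`, the default k = 2, and the control values are all hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence


def outside_control_range(value: float, controls: Sequence[float], k: float = 2.0) -> bool:
    """Return True if `value` lies more than k sample standard deviations
    from the mean of the control population.

    The k-SD rule is an illustrative assumption, not a criterion from the text.
    """
    if len(controls) < 2:
        raise ValueError("need at least two control measurements")
    mu = mean(controls)
    sd = stdev(controls)  # sample SD, n - 1 denominator
    if sd == 0:
        # Degenerate control group: anything different from the common value is distinct.
        return value != mu
    return abs(value - mu) > k * sd


if __name__ == "__main__":
    # Made-up control values for illustration only.
    controls = [118.0, 122.0, 120.0, 119.0, 121.0]
    print(outside_control_range(150.0, controls))  # True
    print(outside_control_range(121.0, controls))  # False
```

Other reference-range rules, such as percentile cut-offs, would fit the same definition equally well. The choice of k sets the trade-off between false positives and false negatives.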
[0484] A condition may be pathological (e.g., a disease) or
physiological (e.g., phenotype, genotype, fasting, water load,
exercise, hormonal cycles, e.g., oestrus, etc.).
[0485] Included among conditions is the state of "at risk of" a
condition, "predisposition towards a" condition, and the like,
again as compared to the state of normality, as determined by a
suitable control population. In this way, osteoporosis, at risk of
osteoporosis, and predisposition towards osteoporosis are all
conditions (and are also conditions associated with
osteoporosis).
[0486] Where the condition is the state of "at risk of,"
"predisposition towards," and the like, a method of diagnosis may
be considered to be a method of prognosis.
[0487] In this context, the phrases "at risk of," "predisposition
towards," and the like, indicate a probability of being
classified/diagnosed (or being able to be classified/diagnosed)
with the predetermined condition which is greater (e.g.,
1.5×, 2×, 5×, 10×, etc.) than for the
corresponding control. Often, a time period (e.g., within the next
5 years, 10 years, 20 years, etc.) is associated with the
probability. For example, a subject who is 2× more likely to
be diagnosed with the predetermined condition within the next 5
years, as compared to a suitable control, is "at risk of" that
condition.
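The definition in [0487] is a ratio test. It compares the probability of diagnosis within a fixed time window for the subject against the same probability for a suitable control. The minimal sketch below shows that test. The function names and the default threshold of 2 are hypothetical. The comparison uses `>=` so that the document's own "2× more likely within 5 years" example counts as at risk.

```python
def relative_risk(p_subject: float, p_control: float) -> float:
    """Ratio of the probability of being diagnosed with the condition within a
    fixed time window (e.g. 5 years) for the subject versus a suitable control."""
    if not 0.0 <= p_subject <= 1.0:
        raise ValueError("p_subject must lie in [0, 1]")
    if not 0.0 < p_control <= 1.0:
        raise ValueError("p_control must lie in (0, 1]")
    return p_subject / p_control


def is_at_risk(p_subject: float, p_control: float, threshold: float = 2.0) -> bool:
    """'At risk of' if the subject is at least `threshold` times as likely as the
    control to be diagnosed in the same window (threshold is a free choice,
    e.g. 1.5, 2, 5 or 10)."""
    return relative_risk(p_subject, p_control) >= threshold


if __name__ == "__main__":
    # Hypothetical 5-year probabilities, mirroring the 2x example in the text.
    print(relative_risk(0.10, 0.05))  # 2.0
    print(is_at_risk(0.10, 0.05))      # True
    print(is_at_risk(0.06, 0.05))      # False
```

Both probabilities must refer to the same time window. A 2× ratio over 5 years and a 2× ratio over 20 years are different statements.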
[0488] Included among conditions is the degree of a condition, for
example, the progress or phase of a disease, or a recovery
therefrom. For example, each of the different states in the progress of
a disease, or in the recovery from a disease, is itself a
condition. In this way, the degree of a condition may refer to how
temporally advanced the condition is. Another example of a degree
of a condition relates to its maximum severity (e.g., a disease can
be classified as mild, moderate, or severe). Yet another example of
a degree of a condition relates to the nature of the condition
(e.g., anatomical site, extent of tissue involvement, etc.).
[0489] Indications
[0490] The term "indication," as used herein, pertains to any
condition (e.g., pathologic, physiologic) which affects more than
about 100 individuals worldwide, and includes conditions or
syndromes which have yet to be identified by medical research. The
lower limit of "about 100 individuals" is determined by the need
for a sufficiently large population from which an effective model
can be constructed and verified.
[0491] A great many indications are known.
[0492] Specific examples of indications include, but are not
limited to, the following:
[0493] Diseases of the cardiovascular system, including
atherosclerosis and diseases such as myocardial infarction, stroke,
acute ischemia which is secondary to atherosclerosis; local or
systemic vasculitis, e.g., Behcet's Syndrome, giant cell arteritis,
polymyalgia rheumatica, Wegener's granulomatosis, Churg-Strauss
syndrome vasculitis, Henoch-Schonlein purpura, Kawasaki disease,
microscopic polyangiitis, Takayasu's arteritis, essential
cryoglobulinemic vasculitis, cutaneous leukocytoclastic angiitis,
polyarteritis nodosa, primary granulomatous central nervous system
vasculitis, drug-induced antineutrophil cytoplasmic autoantibodies
(ANCA)-associated vasculitis, cryoglobulinemic vasculitis, lupus
vasculitis, rheumatoid vasculitis, Sjogren's syndrome vasculitis,
hypocomplementemic urticarial vasculitis, Goodpasture's syndrome,
serum-sickness vasculitis, drug-induced immune complex vasculitis,
paraneoplastic small vessel vasculitis (e.g., lymphoproliferative
neoplasm-induced vasculitis, myeloproliferative neoplasm-induced
vasculitis, and carcinoma-induced vasculitis), and inflammatory
bowel disease vasculitis; hypertension; reperfusion injury to a
range of tissues, but particularly brain and heart; aortic
aneurysms; vein graft hyperplasia; angiogenesis;
hypercholesterolemia, including xanthoma; congestive heart failure;
Kawasaki's disease; stenosis or restenosis, particularly in
patients undergoing angioplasty, or in stents, or in arteriovenous
shunts or fistulas; thromboembolic disease; deep-vein thrombosis;
sudden death syndrome; arrhythmias; varicose veins.
[0494] Diseases of the skeleton, including osteoporosis and related
disorders associated with pathologically low bone mineral density;
osteoarthritis; osteopetrosis; Paget's disease; and ectopic bone
formation.
[0495] Disorders of the central nervous system, including obesity,
anorexia, migraine, chronic pain, neuralgia, clinical depression,
epilepsy; psychiatric disorders such as manic depression,
schizophrenia, catalepsy, hebephrenia, kleptomania, dipsomania,
sadomasochism, and addictive behaviour; sexual dysfunction,
including male erectile dysfunction; Alzheimer's disease and
idiopathic dementia; spongiform encephalopathies, including
Creutzfeldt-Jakob disease (CJD) and new variant CJD; Parkinson's
disease; multiple myositis; Meniere's disease; Guillain-Barre
syndrome; amyotrophic lateral sclerosis; myalgic encephalomyelitis;
solitary neuritis; radiculopathy.
[0496] Diseases of the respiratory system including asthma, chronic
obstructive pulmonary disease (COPD); pulmonary fibrosis such as
idiopathic pulmonary fibrosis; cystic fibrosis; lung disease, e.g.,
due to respiratory syncytial virus infection, or lung injury
(Lukacs et al., Adv. Immunol., 62, 257 (1996)); adult respiratory
distress syndrome (Robbins, Pathologic Basis of Disease, Cotran et
al. (Eds.), 5th ed.); Loeffler's syndrome; chronic eosinophilic
pneumonia; acute interstitial pneumonitis; pulmonary fibrosis;
emphysema; pleurisy.
[0497] Renal disorders, including nephritis due to, for example,
autosomal dominant polycystic kidney disease, diabetic nephropathy,
IgA nephropathy, interstitial fibrosis, or lupus;
glomerulonephritis (Gesualdo et al., Kidney International, 51, 155
(1997)); hemolytic uremic syndrome (Van Setten et al., Pediatr.
Res., 43, 759 (1998)); kidney stones; urinary incontinence.
[0498] Skin disorders, including urticaria; eczema; psoriasis;
dermatomyositis; leukoderma vulgaris; photosensitivity; cutaneous
T-cell lymphoma.
[0499] Eye diseases such as uveitis or blinding Herpes stromal
keratitis; cataracts; myopia, astigmatism and related disorders;
detached retina; macular degeneration; corneal damage; siderosis
bulbi; retinitis pigmentosa.
[0500] Diseases of the liver and related organs, including
cirrhosis and other forms of liver fibrosis such as drug-induced
fibrosis; diseases of the gall bladder, such as gall stones.
[0501] Diseases with an inflammatory or autoimmune component,
including allergic diseases, such as atopy, allergic rhinitis,
atopic dermatitis, anaphylaxis, allergic bronchopulmonary
aspergillosis, and hypersensitivity pneumonitis (pigeon breeder's
disease, farmer's lung disease, humidifier lung disease, malt
workers' lung disease); allergies, including flea allergy
dermatitis in mammals such as domestic animals, e.g., dogs and
cats, contact allergens including mosquito bites or other insect
sting allergies, poison ivy, poison oak, poison sumac, or other
skin allergens; autoimmune disorders, including, but not limited
to, type I diabetes, Crohn's disease, multiple sclerosis,
arthritis, rheumatoid arthritis (Ogata et al., J. Pathol., 182, 106
(1997); Gong et al., J. Exp. Med., 186, 131 (1997)), systemic lupus
erythematosus, autoimmune (Hashimoto's) thyroiditis, autoimmune
liver diseases such as hepatitis and primary biliary cirrhosis,
hyperthyroidism (Graves' disease; thyrotoxicosis),
insulin-resistant diabetes, autoimmune adrenal insufficiency
(Addison's disease), autoimmune oophoritis, autoimmune orchitis,
autoimmune hemolytic anemia, paroxysmal cold hemoglobinuria,
Behcet's disease, autoimmune thrombocytopenia, autoimmune
neutropenia, pernicious anemia, pure red cell anemia, autoimmune
coagulopathies, myasthenia gravis, experimental allergic
encephalomyelitis, autoimmune polyneuritis, pemphigus and other
bullous diseases, rheumatic carditis, Goodpasture's syndrome,
postcardiotomy syndrome, Sjogren's syndrome, polymyositis,
dermatomyositis, and scleroderma; disease states resulting from
inappropriate inflammation, either local or systemic, for example,
irritable or inflammatory bowel syndrome (Mazzucchelli et al., J.
Pathol., 178, 201 (1996)), skin diseases such as psoriasis and
lichen planus, delayed type hypersensitivity, chronic pulmonary
inflammation, e.g., pulmonary alveolitis and pulmonary granuloma,
gingival inflammation or other periodontal disease, and osseous
inflammation associated with lesions of endodontic origin
(Volejnikova et al., Am. J. Pathol., 150, 1711 (1997)),
hypersensitivity lung diseases such as hypersensitivity pneumonitis
(Sugiyama et al., Eur. Respir. J., 8, 1084 (1995)), and
inflammation related to histamine release from basophils (Dvorak et
al., J. Allergy Clin. Immunol., 98, 355 (1996)), such as hay fever,
histamine release from mast cells (Martin et al., 1989), or mast
cell tumors, types of type 1 hypersensitivity reactions
(anaphylaxis, skin allergy, hives, allergic rhinitis, and allergic
gastroenteritis); ulcerative colitis.
[0502] Infection by one or more pathogens, including but not
limited to viruses, protozoa, fungi and bacteria. For example,
infection with human immunodeficiency virus (HIV), other
lentiviruses or retroviruses, or infection with other viruses,
e.g., cytomegalovirus, herpesvirus, viral meningitis; common cold;
influenza; fever, general viremia; measles; mumps; smallpox;
poliomyelitis; protozoal infections, including malaria, cerebral
malaria, and other consequences of infection by parasites related
to plasmodium; parasitic infections, e.g., trypanosome,
Mycobacterium leprae or Mycobacterium tuberculosis infection,
helminth infections, such as nematodes (round worms) (Trichuriasis,
Enterobiasis, Ascariasis, Hookworm, Strongyloidiasis, Trichinosis,
filariasis); trematodes (flukes) (Schistosomiasis, Clonorchiasis),
cestodes (tapeworms) (Echinococcosis, Taeniasis saginata,
Cysticercosis); visceral worms, visceral larva migrans (e.g.,
Toxocara), eosinophilic gastroenteritis (e.g., Anisakis spp.,
Phocanema spp.), cutaneous larva migrans (Ancylostoma braziliense,
Ancylostoma caninum); bacterial infection, e.g., bacterial
peritonitis, meningitis or gram negative sepsis; toxic shock
syndrome; lethal endotoxemia; granulomatous diseases such as
Mycobacteriosis, Pneumocystosis, Histoplasmosis, Blastomycosis,
Coccidioidomycosis, Cryptococcosis, Aspergillosis, granulomatous
enteritis; foreign body granulomas and peritonitis, pulmonary
granulomatosis; syphilis and other sexually transmitted infections;
cat-scratch disease; ehrlichiosis or Lyme disease, including Lyme
arthritis; Helicobacter pylori infection, and similar chronic
infections; infection by Chlamydia species; septic and nonseptic
shock; infections with fungi, including Candida albicans, tinea
pedis, tinea cruris.
[0503] Neoplasia in any organ system, both as a primary tumour and
as metastases, e.g., histiocytoma, glioma, astrocytoma, sarcoma,
osteosarcoma, osteoma (Zheng et al., J. Cell Biochem., 70, 121
(1998)), melanoma, Kaposi's sarcoma, ovarian carcinoma, breast
carcinoma, bowel cancer, lung cancer, small cell lung cancer,
various leukemias, testicular cancer, prostate cancer as well as
myelosuppression and mucositis associated with chemotherapy; as
well as benign growths and tumours, including inflammatory
pseudotumor of the lung.
[0504] Metabolic disorders, including non-insulin-dependent
diabetes (or type II diabetes); lipoproteinemias; gout;
malnourishment (including vitamin deficiency, mineral deficiency,
imbalanced dietary intake, etc.); hypothyroidism; hyperthyroidism;
Basedow's disease; Addison's disease.
[0505] Miscellaneous diseases including aberrant hematopoiesis;
anemia; otitis externa; pancreatitis; cachexia of tuberculosis or
cancer; dental disorders, including dental caries, gum disease,
gingivitis; cosmetic disorders (e.g., halitosis, alopecia,
baldness, excessive body odour, etc.); sterility and related
reproductive dysfunction; diarrhoea; headaches.
[0506] Trauma induced pathology, either surgical or non-surgical in
origin, including complications of organ transplantation, such as
acute transplant rejection or delayed graft function, allograft
rejection and graft versus host disease; transplant vasculopathy;
intraperitoneal adhesions, e.g., adhesions which develop
post-surgery, particularly after gynecologic or intestinal
surgeries (Zeyneloglu et al., Am. J. Obstet. Gynecol., 179, 438
(1998)); scarring after surgery; radiation-induced fibrosis;
post-trauma inflammation, e.g., post-surgical inflammation such as
that following orthopedic surgeries, e.g., prosthetic implants, as
well as atherectomy, circulatory surgeries, and tissue
replacements; complications associated with peritoneal dialysis
(such as inflammation, e.g., Sach et al., Nephrol. Dial. Transplant,
12, 315 (1997)); brain or spinal cord trauma, such as after disc
surgery (Ghirnikar et al., J. Neurosci. Res., 46, 727 (1996); Berman
et al., J. Immunol., 156, 3017 (1996)); scarring during normal
wound healing; complications associated with trauma which is not
externally obvious, including bone fractures and breaks, internal
organ damage; internal bleeding, edema (in particular cerebral
swelling), ruptured kidney, pancreas or liver, brain damage,
persistent vegetative state; coma; presence or severity of chemical
burns; presence or severity of thermal burns.
[0507] States arising from poisoning or substance abuse, including
silicosis, sarcoidosis (Iida et al., Thorax, 52, 431 (1997); Car et
al., Am. J. Respir. Crit. Care Med., 149, 655 (1994)) and
berylliosis; poisoning (e.g., with heavy metals such as lead or
cadmium; with arsenic or mercuric salts; with organics such as
strychnine, ergots, etc.); substance abuse (e.g., overdoses with
therapeutics such as paracetamol or other common substances such as
alcohol or nicotine, or use of illegal substances such as heroin,
cocaine or LSD, as well as anabolic steroids, etc.).
[0508] In one embodiment, said indication is selected from:
neurodegeneration (including Alzheimer's disease, Parkinson's disease,
and Creutzfeldt-Jakob disease (CJD)); osteoporosis; osteoarthritis;
atherosclerosis (including coronary heart disease and stroke);
hypertension; respiratory diseases (including asthma, COPD,
emphysema, and respiratory distress syndrome); autoimmune diseases
(including rheumatoid arthritis, multiple sclerosis, systemic lupus
erythematosus, and various allergies); cancer (primary tumours and
metastatic disease); systemic or local fibrosis; and, infection
(with viruses, bacteria, fungi, or protozoa).
[0509] In one embodiment, said indication is selected from:
hypertension; atherosclerosis/coronary heart disease; osteoporosis;
and osteoarthritis.
[0510] In one embodiment, said predetermined condition is
associated with one of the above-mentioned indications, for
example, the indication, degree of the indication, predisposition
to the indication, etc.
[0511] Predetermined Condition
[0512] In one embodiment, said predetermined condition is
associated with genetic and/or fraternal twinning.
[0513] In one embodiment, said predetermined condition is
associated with hypertension.
[0514] In one embodiment, said predetermined condition is
associated with atherosclerosis/coronary heart disease.
[0515] In one embodiment, said predetermined condition is
associated with osteoporosis.
[0516] In one embodiment, said predetermined condition is
associated with osteoarthritis.
[0517] Hypertension
[0518] Hypertension, or high blood pressure, is one of the most
prevalent clinically significant abnormalities in the Western
world. More than 20% of individuals suffer from mild to moderate
elevations of blood pressure. The consequences of hypertension
include headaches, nausea and visual disturbances as direct
consequences, and an increased risk of atherosclerosis (and hence
heart attacks and ischemic stroke) as well as diabetes,
haemorrhagic stroke and kidney failure.
[0519] In most cases, the causes of hypertension are not well
understood. In the minority of cases, a single well-defined cause
of the hypertension can be identified (for example, tumours of the
adrenal gland can lead to overproduction of the pressor hormones
such as adrenalin, leading to a chronic elevation in blood
pressure). In the majority of cases, however, the cause of the
hypertension remains unidentified by the clinician, and the disorder
is classified as "essential hypertension."
[0520] Much of the risk of hypertension is thought to be genetic,
since individuals who have the highest blood pressure within a
population sample when they are children mostly have the highest
blood pressures within a population sample as adults. Since blood
pressure rises in everyone with age (as the artery walls harden,
reducing vessel elasticity and compliance), these individuals are
readily identified as those subjects at highest risk of developing
hypertension as they age.
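The argument in [0520] rests on tracking. Children with the highest blood pressure in a population tend to keep that rank as adults, even though everyone's pressure rises. One standard way to quantify such tracking is Spearman's rank correlation between the childhood and adult measurements of the same individuals. This is offered as an illustration, not as a method the document uses. Values near +1 mean the childhood rank order persists. The minimal pure-Python sketch below has hypothetical function names. It gives tied values their average rank.

```python
from typing import Sequence


def ranks(values: Sequence[float]) -> list[float]:
    """Rank values 1..n in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # mean of 1-based positions start..end
        for pos in range(start, end + 1):
            result[order[pos]] = average_rank
        start = end + 1
    return result


def spearman_rho(childhood: Sequence[float], adulthood: Sequence[float]) -> float:
    """Spearman rank correlation: the Pearson correlation of the two rank vectors."""
    n = len(childhood)
    if n != len(adulthood) or n < 2:
        raise ValueError("need two equal-length sequences with at least two subjects")
    rx, ry = ranks(childhood), ranks(adulthood)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined when all values in a sequence are equal")
    return cov / (var_x * var_y) ** 0.5
```

Only ranks enter the calculation, so an age-related rise that is uneven across individuals still gives rho = 1 when it leaves the rank order unchanged.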
[0521] It is cheap, trivial and non-invasive to measure blood
pressure using a sphygmomanometer. Hence, unlike coronary artery
disease, the major difficulty in the treatment of hypertension lies
not in identifying people with the disease, or at risk of the disease,
but in treating the root cause of the abnormality. Since the cause
of hypertension is not understood in the vast majority of cases,
treatment consists of symptomatic relief. For
example, people with hypertension are often treated with inhibitors
of the enzyme ACE (angiotensin converting enzyme) since angiotensin
II is a major acute pressor signal, and reducing angiotensin II
production reduces blood pressure. A typical ACE inhibitor,
captopril, allows rapid and effective symptomatic control of high
blood pressure.
[0522] While it is clear that relieving the phenotype of
hypertension reduces some of the consequential co-morbidity (e.g.,
it reduces the risk of haemorrhagic stroke, which is directly caused by
the high blood pressure), it is much less clear whether intervening
to reduce blood pressure normalises the risk of diabetes or
coronary artery disease that is associated with hypertension. In
other words, a hypertensive individual may have the same elevated
risk of coronary heart disease whether or not they are successfully
treated with agents that lower blood pressure. The likely reason
for this is that a common pathogenic mechanism underlies both the
hypertension and the coronary artery disease and that the
treatments to lower blood pressure do not do so by attacking the
root cause of the two problems.
[0523] Atherosclerosis/Coronary Heart disease
[0524] Coronary heart disease (CHD) is a major cause of mortality
and morbidity in developed countries, affecting as many as 1 in 3
individuals before the age of 70 years (see, e.g., Kannel et al.,
1974).
[0525] Atherosclerosis (commonly called "hardening of the
arteries") is a vascular condition in which arteries narrow. It is
associated with deposits of oxidised lipid on the walls of
arteries, which accumulate and eventually harden into plaques. The
arteries become calcified and lose elasticity, and as this process
continues, blood flow slows. It can affect any artery, including,
e.g., the coronary arteries.
[0526] In order to perform the arduous task of pumping blood, the
heart muscle needs a plentiful supply of oxygen-rich blood, which
is provided through a network of coronary arteries. Coronary artery
disease is the end result of atherosclerosis, preventing sufficient
oxygen-rich blood from reaching the heart. Oxygen deprivation in
vital cells (called ischaemia) causes injury to the tissues of the
heart. If the artery becomes completely blocked, damage becomes so
extensive that cell death, a heart attack, occurs. A heart attack
usually occurs when a blood clot forms, completely sealing off the
passage of blood in a coronary artery. This typically happens when
the plaque itself develops fissures or tears; blood platelets
adhere to the site to seal off the plaque and a blood clot
(thrombus) forms.
[0527] Angina is not a disease itself but is the primary symptom of
coronary artery disease. It is typically experienced as chest pain,
which can be mild, moderate, or severe, but is often reported as a
dull, heavy pressure that may resemble a crushing object on the
chest. Pain often radiates to the neck, jaw, or left shoulder and
arm. Less commonly, patients report mild burning chest discomfort,
sharp chest pain, or pain that radiates to the right arm or back.
Sometimes a patient experiences shortness of breath, fatigue, or
palpitations instead of pain. Classic angina is precipitated by
exertion, stress, or exposure to cold and is relieved by rest or
administration of nitroglycerin. Angina can also be precipitated by
large meals, which place an immediate demand upon the heart for
more oxygen. The intensity of the pain does not always relate to
the severity of the medical problem. Some people may feel a
crushing pain from mild ischemia, while others might experience
only mild discomfort from severe ischemia. Some people have also
reported a higher sensitivity to heat on the skin with the onset of
angina.
[0528] Although atherosclerosis is far and away the leading cause
of angina, other conditions can impair the delivery of oxygen to
the heart muscle and cause pain. Such conditions include: spasm in
the coronary artery, abnormalities of the heart muscle itself,
hyperthyroidism, anaemia, vasculitis (a group of disorders that
cause inflammation of the blood vessels), and, in rare cases,
exposure to high altitudes. Many conditions may cause chest pains
unrelated to heart or blood vessel abnormalities. High on the list
are anxiety attacks, gastrointestinal disorders (gallstone attacks,
peptic ulcer disease, hiatal hernia, heartburn), lung disorders
(asthma, blood clots, bronchitis, pneumonia, collapsed lung), and
problems affecting the ribs and chest muscles (injured muscles,
fractures, arthritis, spasms, infections).
[0529] Stable angina can be extremely painful, but its occurrence
is predictable; it is usually triggered by exertion or stress and
relieved by rest. Stable angina responds well to medical treatment.
Any event that increases oxygen demand can cause angina, including
exercise, cold weather, emotional tension, and even large meals.
Angina attacks can occur at any time during the day, but a high
proportion seems to take place between the hours of 6:00 AM and
noon.
[0530] Unstable angina is a much more serious situation and is
often an intermediate stage between stable angina and a heart
attack. A patient is usually diagnosed with unstable angina under
the following conditions: pain awakens a patient or occurs during
rest, a patient who has never experienced angina has severe or
moderate pain during mild exertion (walking two level blocks or
climbing one flight of stairs), or stable angina has progressed in
severity and frequency within a two-month period. Medications are
less effective in relieving pain of unstable angina.
[0531] Another type of angina, called variant or Prinzmetal's
angina, is caused by a spasm of a coronary artery. It almost always
occurs when the patient is at rest. Irregular heartbeats are
common, but the pain is generally relieved immediately with
treatment.
[0532] Some people with severe coronary artery disease do not
experience angina pain, a condition known as silent ischaemia,
which some experts attribute to abnormal processing of heart pain
by the brain.
[0533] Coronary artery disease (premature blockage of one or more
of the coronary arteries) is the leading killer in the USA of both
men and women, responsible for over 475,000 deaths in 1996. On the
positive side, mortality rates from coronary artery disease have
significantly declined in industrialised countries over the past
few decades, although they are on the rise in developing nations.
When the necessary lifestyle changes are enacted in combination
with appropriate medical or surgical treatments, a person suffering
angina and heart disease has a good chance of living a normal life.
Experts have believed, for example, that unstable angina indicates
a very high risk for death after a heart attack, but a recent study
indicated that after the first year of treatment, such a patient's
risk for death is only 1.2% above the risk in the normal
population. Much evidence exists, in fact, that onset of angina
less than 48 hours before a heart attack is actually protective,
possibly by conditioning the heart to resist the damage resulting
from the attack. In one study, people without chest pain
experienced much higher complication and mortality rates than those
with pain.
[0534] Angiographic x-ray imaging ("angiography") has grown into
its own classification of x-ray imaging over time. The basic
principle is the same as a conventional x-ray scan: x-rays are
generated by an x-ray tube and as they pass through the body part
being imaged, they are attenuated (weakened) at different levels.
These differences in x-ray attenuation are then measured by an
image intensifier and the resulting image is picked up by a TV
camera. In modern angiography systems, each frame of the analogue
TV signal is then converted to a digital frame and stored by a
computer in memory and/or on hard magnetic disk. These x-ray
"movies" can be viewed in real time as the angiography is being
performed, or they can be reviewed later using recall from digital
memory.
[0535] During angiography, physicians inject streams of contrast
agents or dyes into the area of interest using catheters to create
detailed images of the blood vessels in real time. During the
angiographic procedure, physicians can guide a catheter into the
area of interest to remove stenoses (blockages) of blood vessels.
Patients with blockages of the major leg vessels, for instance, can
have nearly total recovery after such angioplasty is performed to
remove the constriction.
[0536] X-ray angiography is performed to specifically image and
diagnose diseases of the blood vessels of the body, including the
brain and heart. Traditionally, angiography was used to diagnose
pathology of these vessels, such as blockage caused by plaque
build-up. However, in recent decades, radiologists, cardiologists
and vascular surgeons have used the x-ray angiography procedure to
guide minimally invasive surgery of the blood vessels and arteries
of the heart. In the last several years, diagnostic vascular images
have often been made using magnetic resonance imaging, computed x-ray
tomography or ultrasound, whilst x-ray angiography is reserved
for therapy. Conventional x-ray angiography has a leading role in the
detection, diagnosis and treatment of heart disease, heart attack,
acute stroke and vascular disease which can lead to stroke.
[0537] Most conventional x-ray angiography procedures are similar.
Patient preparation involves removing clothing and jewellery and
wearing a patient gown. In all cases, angiography requires that an
intravenous contrast agent is administered. For interventional or
therapeutic angiography, a small incision is made in the groin or
arm so that a catheter can be inserted during the study. The
patient is positioned on the examination table by the technologist
so that the anatomy of interest (e.g. coronary arteries) is in the
proper field of view between the x-ray tube and image intensifier.
The technologist and radiologist remain at table-side during the
procedure to operate the angiography system and work with the
catheters, contrast injectors and related devices. Typically the
patient simply needs to relax and stay calm during angiography.
Some angiography procedures can take up to two hours while other
procedures take less than an hour. Once the procedure is finished,
the patient will be given a period of time to recover. During this
period, the patient's case is reviewed on film or monitor.
Depending on the type of angiographic procedure and the patient's
medical condition, an inpatient recovery may be required or the
patient may be released after a short time. In some cases, more
images may need to be taken.
[0538] Using angiography to see inside the body, doctors can repair
blood vessels without the use of a scalpel and fully invasive
surgical methods. Advances in the design and use of catheters
(small tubes that are guided into the blood vessels through tiny
incisions in the groin area or upper arm) allow physicians to
perform very complex therapeutic procedures from within the blood
vessel. Pathology of the blood vessels such as plaque build up in
the arms and legs, neck and brain, and heart can be treated using a
variety of interventional angiographic surgery (e.g. coronary
angioplasty).
[0539] Although coronary angiography is the gold standard for CHD
(including detection, diagnosis, and treatment), this technique is
not without its problems. Coronary angiography is an extremely
invasive technique and is associated with a morbidity rate of 1%
and a mortality rate of 0.1%. In addition to the invasive nature of
angiography, the technique is also very expensive and
time-consuming. In the UK, the average cost for coronary
angiography is approximately £8,000-£10,000 per case. The disadvantages associated with coronary
angiography make the technique unsuitable as a routine screening
procedure.
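The per-procedure figures in [0539] are 1% morbidity, 0.1% mortality, and £8,000-£10,000 per case. Scaling them to a cohort makes the case against routine screening concrete. The minimal sketch below does only that scaling. It assumes, for simplicity, that every procedure carries the same independent risk and cost. The names `ScreeningBurden` and `angiography_screening_burden` and the 10,000-person cohort are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningBurden:
    complications: float  # expected number of procedure-related morbidity events
    deaths: float         # expected number of procedure-related deaths
    cost_low_gbp: int     # total cost at the low per-case estimate
    cost_high_gbp: int    # total cost at the high per-case estimate


def angiography_screening_burden(
    n_procedures: int,
    morbidity_per_1000: int = 10,   # 1% morbidity, as stated in [0539]
    mortality_per_1000: int = 1,    # 0.1% mortality, as stated in [0539]
    cost_low_gbp: int = 8_000,      # low end of the UK per-case cost
    cost_high_gbp: int = 10_000,    # high end of the UK per-case cost
) -> ScreeningBurden:
    """Scale the per-procedure figures linearly to a cohort (assumes every
    procedure carries the same independent risk and cost)."""
    if n_procedures < 0:
        raise ValueError("n_procedures must be non-negative")
    return ScreeningBurden(
        complications=n_procedures * morbidity_per_1000 / 1000,
        deaths=n_procedures * mortality_per_1000 / 1000,
        cost_low_gbp=n_procedures * cost_low_gbp,
        cost_high_gbp=n_procedures * cost_high_gbp,
    )


if __name__ == "__main__":
    # Hypothetical screening cohort of 10,000 people.
    print(angiography_screening_burden(10_000))
```

For the hypothetical 10,000-person cohort, the sketch gives 100 expected complications, 10 expected deaths, and a total cost of £80-100 million. Rates are kept as integers per 1,000 procedures so the arithmetic stays exact.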
[0540] Over the past three decades a range of environmental and
biochemical risk factors for the development of CHD have been
identified in cross-sectional studies (see, e.g., Kjelsberg et al.,
1997). Examples are listed in Table 3-1-CHD. For example, tobacco
smoking is associated with an approximately 2-fold increased risk
of CHD (see, e.g., Kuller et al., 1991). Similarly, high levels of
cholesterol in large, triglyceride-rich lipoprotein particles
(mainly VLDL and LDL) and low levels of cholesterol in HDL
particles are well known to be associated with increased risk of CHD
(see, e.g., MRFIT Research Group, 1986; Despres et al., 2000).
TABLE 3-1-CHD Risk Factors for Coronary Heart Disease
Fixed Risk Factors | Potentially Changeable Risk Factors: Strong Association | Potentially Changeable Risk Factors: Weak Association
age | hyperlipidaemia | personality
male sex | cigarette smoking | obesity
positive family history | hypertension | gout
- | diabetes mellitus | soft water
- | - | lack of exercise
- | - | contraceptive pill
- | - | heavy alcohol intake
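Returning to the smoking example in paragraph [0540], a minimal sketch of what an "approximately 2-fold increased risk" means: the relative risk is the incidence of CHD among exposed individuals (e.g., smokers) divided by the incidence among unexposed individuals. The function name and the cohort counts below are hypothetical, chosen only to illustrate the arithmetic; they are not data from the studies cited above.

```python
def relative_risk(exposed_cases: int, exposed_total: int,
                  unexposed_cases: int, unexposed_total: int) -> float:
    """Relative risk = incidence in the exposed group / incidence in the unexposed group."""
    if exposed_total <= 0 or unexposed_total <= 0:
        raise ValueError("group sizes must be positive")
    if unexposed_cases == 0:
        raise ValueError("relative risk is undefined when the unexposed group has no cases")
    incidence_exposed = exposed_cases / exposed_total
    incidence_unexposed = unexposed_cases / unexposed_total
    return incidence_exposed / incidence_unexposed


# Hypothetical cohort: 1,000 smokers and 1,000 non-smokers followed for the same period.
rr = relative_risk(exposed_cases=80, exposed_total=1000,
                   unexposed_cases=40, unexposed_total=1000)
print(f"relative risk = {rr:.1f}")  # incidence 0.08 vs 0.04
```

A relative risk of 2 says that the exposed group develops the condition twice as often on average; it says nothing about whether any particular individual will, which is the limitation discussed in paragraph [0542].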
[0541] These epidemiological studies have been tremendously useful
in a number of ways. Firstly, they have underpinned public health
policy on a range of issues, discouraging tobacco smoking and
promoting low cholesterol diets (see, e.g., McIlvain et al., 1992;
Dolecek et al., 1986). Secondly, they have provided vital clues as
to the underlying molecular mechanisms which cause atherosclerosis
and CHD (see, e.g., Ross, 1999). For example, once the association
between elevated levels of LDL-cholesterol and CHD had been
identified, it was possible to demonstrate, using reverse genetic
techniques in mice, that increased LDL-cholesterol actually causes
atherosclerosis (see, e.g., Plump et al., 1992; Yokode et al.,
1990; Breslow, 1993). Extending these studies, therapies were then
designed on the basis of their ability to lower LDL-cholesterol.
These lipid-lowering therapies have now been shown to be broadly
effective in reducing the risk of myocardial infarction, even among
people with normal levels of LDL-cholesterol.
[0542] However, the risk factors identified to date from
cross-sectional epidemiological studies are insufficiently powerful
to provide a clinically useful diagnosis of CHD. Although
algorithms based on a range of risk factors (such as age, sex,
lipoprotein levels and blood pressure) have been designed that can
identify sub-populations at very significant excess risk of CHD,
even the best of these, based on the excellent PROCAM study in
Münster, Germany, cannot diagnose the presence of CHD on an
individual-by-individual basis (see, e.g., Cullen et al., 1998). It
is likely that CHD is weakly associated with a very large number of
environmental, physiological and biochemical variables; as a
result, even the full range of risk factors discovered to date
provides insufficient data density to discriminate accurately
between CHD patients and healthy controls on an individual basis
(see, e.g., Isles et al., 2000).
[0543] Recent technical advances have made it possible to construct
datasets of extremely high data density from individual subjects.
Techniques such as genomics
(examining the cellular gene expression pattern of thousands of
genes simultaneously, see, e.g., Collins et al., 2001), proteomics
(examining the cellular contents of multiple proteins
simultaneously, see, e.g., Dutt et al., 2000) and metabonomics
(examining the changes in hundreds or thousands of low molecular
weight metabolites in an intact tissue or biofluid) offer the
prospect of efficiently distinguishing individuals with particular
disease or toxic states (see, e.g., Nicholson et al., 1999).
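The data-density argument of paragraphs [0542] and [0543] can be illustrated with an idealised calculation. The sketch assumes k independent, normally distributed markers, each with the same variance in cases and controls and each shifted by a small effect size d (in standard-deviation units) between the two groups; these assumptions, the function names and the 0.2 SD effect size are ours, not the source's. Under them, the best linear combination of the markers separates the groups by a Mahalanobis distance of d·√k, and the area under the ROC curve (AUC) is Φ(d·√k/√2), where Φ is the standard normal cumulative distribution function.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ideal_auc(effect_size: float, n_markers: int) -> float:
    """AUC of the optimal linear classifier for n independent Gaussian markers,
    each shifted by `effect_size` SDs between cases and controls (equal variances)."""
    mahalanobis = effect_size * math.sqrt(n_markers)
    return normal_cdf(mahalanobis / math.sqrt(2.0))


# Hypothetical weak effect (0.2 SD per marker): a handful of classical risk
# factors versus the hundreds of variables a metabonomic profile might supply.
for k in (1, 5, 50, 500):
    print(k, round(ideal_auc(0.2, k), 3))
```

Real biomarkers are correlated and rarely Gaussian, so this overstates the gain from adding variables; it only shows why many individually weak variables can, in principle, discriminate on an individual basis where a few cannot.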
[0544] Whereas a firm diagnosis of CHD can currently be made only
through angiography, which is both expensive and invasive, the
introduction of metabonomic screening, as described herein, would
allow a diagnosis to be made simply and cheaply on the basis of a
single blood sample, providing a non-invasive diagnosis of
CHD. Such changes would revolutionize the provision of health care
for CHD, allowing both widespread population screening and
efficient targeting of drugs such as statins which, while being
broadly effective in reducing the risk of myocardial infarction,
are difficult to target to those most in need of treatment.
[0545] Atherosclerotic Load and Atherosclerotic Conditions
[0546] In one embodiment, the predetermined condition is related to
atherosclerotic load, for example, a state of abnormally high
atherosclerotic load.
[0547] The terms "atherosclerotic load" and "atherosclerotic
burden," as used herein, pertain to the total volume of
atherosclerotic plaque tissue found throughout the vascular tree of
a subject. Although most direct diagnostic procedures, such as
angiography, examine only a particular site (e.g., the coronary
arteries), most biochemical tests which depend on analysis of the
blood are associated with the total atherosclerotic load throughout
the vascular tree. In most cases, however, the presence of
atherosclerosis in one organ system is indicative of its presence
in others. Thus, subjects with coronary artery atherosclerosis
will, in general, have higher total atherosclerotic load than
subjects without coronary artery atherosclerosis. The converse is
also true: individuals with high total atherosclerotic loads are
much more likely to have coronary artery disease than individuals
with low atherosclerotic loads. Different conditions are associated
with the presence of atherosclerosis in particular arteries, for
example, coronary heart disease is associated with atherosclerosis,
at least in part, in the coronary arteries; stroke is associated
with atherosclerosis, at least in part, in the carotid
arteries.
[0548] In one embodiment, the predetermined condition is related to
an atherosclerotic condition.
[0549] The term "atherosclerotic condition," as used herein,
pertains to a condition associated with an abnormally high
atherosclerotic load, as compared to a suitable control
population.
[0550] Examples of atherosclerotic conditions include, but are not
limited to, the following, which are organised by the arterial system
most affected or most relevant:
[0551] Peripheral vascular disease (PVD). This can lead to ischemia
in the extremities, leading to pain, morbidity and in severe cases
to amputation.
[0552] Deep vein thrombosis (DVT). This is a common cause of
ischemia, often secondary to PVD, but may have other causes (e.g.,
long periods of inactivity on long-haul flights).
[0553] Diabetes macrovascular atherosclerosis. This is one of the
most common complications of diabetes. It may also include
complications at specific vascular beds, most commonly diabetic
retinopathy and diabetic nephropathy, where the vascular beds of
the eye and kidney, respectively, are particularly badly
affected.
[0554] Coronary artery disease (CAD). This is the most common cause
of heart attacks, and is atherosclerosis of one or more major
coronary arteries.
[0555] Angina. This describes the specific symptoms of CAD, and can
be stable or unstable.
[0556] Ischemic stroke. The most common cause of stroke is ischemia
secondary to atherosclerosis of the major arteries supplying the
brain. This includes all forms of stroke except haemorrhagic
stroke.
[0557] Transient ischemic attack syndrome (TIA). This is the brain
equivalent of angina, in which the blood supply to the brain is
reduced--not sufficiently to cause infarction (tissue death), but
sufficiently to lead to symptoms resembling epilepsy.
[0558] Renal hypertension. One of the most common causes of
hypertension is atherosclerosis of the renal artery, which reduces
kidney perfusion and upsets the blood volume regulatory
mechanisms.
[0559] Marfan Syndrome. A relatively common inherited monogenic
disorder due to mutation in the fibrillin genes, which results in
vascular changes which can resemble atherosclerosis.
[0560] MoyaMoya disease. This condition is similar to Marfan
syndrome, but affects predominantly the brain vasculature.
[0561] Mönckeberg Syndrome. A rare monogenic disorder in which
vascular calcification, similar to that seen in atherosclerosis,
affects the aorta. This condition resembles Marfan syndrome and can
lead to dissection of the vessel and death.
[0562] Functions of Bone
[0563] The function of bone is to provide mechanical support for
joints, tendons and ligaments, to protect vital organs from damage
and to act as a reservoir for calcium and phosphate in the
preservation of normal mineral homeostasis. Diseases of bone
compromise these functions, leading to clinical problems such as
fracture, bone pain, bone deformity and abnormalities of calcium
and phosphate homeostasis.
[0564] Types of Bone
[0565] The normal skeleton contains two types of bone: cortical or
compact bone, which makes up most of the shafts (diaphysis) of the
long bones such as the femur and tibia, and trabecular or spongy
bone, which makes up most of the vertebral bodies and the ends of
the long bones.
[0566] All bone is subject to continual turnover, with old bone
being actively resorbed, and new bone being deposited. This
turnover, or "remodelling," is essential for maintenance of
structural competence because continual loading results in the
formation of numerous microfractures in the bone matrix which, if
left unchecked, would be weak points that could seed catastrophic
failures of the bone, i.e., clinically obvious fractures. Such a
process can be likened to a stone-chip on an automobile windscreen:
the small crack can act as a catalyst for the sudden failure of the
entire structure.
[0567] Remodelling is therefore an essential process for
maintaining bone strength. As the bone is resorbed and
re-deposited, the microfractures and structural imperfections are
removed.
[0568] Trabecular bone has a greater surface area than cortical
bone and because of this is remodeled more rapidly. Consequently,
conditions associated with increased bone turnover tend to affect
trabecular bone more quickly and more profoundly than cortical
bone. Cortical bone is arranged in so-called Haversian systems,
each of which consists of a series of concentric lamellae of collagen
fibres surrounding a central canal that contains blood vessels.
Nutrients reach the central parts of the bone by an interconnecting
system of canaliculi that run between osteocytes buried deep within
bone matrix and lining cells on the bone surface. Trabecular bone
has a similar structure, but here the lamellae run in parallel to
the bone surface, rather than concentrically as in cortical
bone.
[0569] Bone Composition
[0570] The organic component of bone matrix consists mainly of
type I collagen: a fibrillar protein formed from three protein
chains, wound together in a triple helix. Collagen type I is laid
down by bone forming cells (osteoblasts) in organised parallel
sheets (lamellae). Type I collagen is a member of the collagen
superfamily of related proteins which all share the unique
structural motif of a left-handed triple helix. The presence of
this structural motif, which is responsible for the mechanical
strength of collagen sheets, imposes certain absolute requirements
on the primary amino acid sequence of the protein. If these
requirements are not met, the protein cannot form into the triple
helix characteristic of collagens. The most important structural
requirements are the presence of glycine amino acid residues at
every third position (where the amino acid side chain points in
towards the center of the triple helix) and proline residues at
every third position to provide both structural rigidity and
periodicity on the helix. Glycine is required because it has the
smallest side chain of all the proteinogenic amino acids (just a
single hydrogen atom) and so can be accommodated in the spatially
constrained interior of the helix. Proline is required because
it is the only secondary amine among the 20 proteinogenic amino
acids, a property that introduces a rigid 'bend' in the polypeptide, such that the
presence of proline residues at repeated intervals will result in
the adoption of a helical conformation.
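The glycine-at-every-third-position requirement described above can be checked mechanically. The sketch below tests whether a one-letter amino-acid sequence follows the Gly-X-Y repeat and reports the fraction of the X and Y positions occupied by proline. The function name and both example peptides are hypothetical illustrations, not sequences taken from the source.

```python
def gly_xy_report(seq: str) -> dict:
    """Check a one-letter protein sequence for the collagen Gly-X-Y repeat.

    Returns whether every third residue (positions 0, 3, 6, ...) is glycine,
    the positions that break the repeat, and the fraction of the remaining
    (X and Y) positions occupied by proline ('P').
    """
    seq = seq.upper()
    breaks = [i for i in range(0, len(seq), 3) if seq[i] != "G"]
    xy = [aa for i, aa in enumerate(seq) if i % 3 != 0]
    proline_fraction = xy.count("P") / len(xy) if xy else 0.0
    return {
        "is_gly_xy_repeat": not breaks and len(seq) % 3 == 0,
        "breaks": breaks,
        "proline_fraction_xy": proline_fraction,
    }


# Hypothetical idealised collagen-like peptide (Gly-Pro-Pro) x 4, and a variant
# with serine in place of glycine in the fourth triplet, which violates the
# requirement stated above.
print(gly_xy_report("GPP" * 4))
print(gly_xy_report("GPPGPPGPPSPP"))
```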
[0571] After synthesis, the collagen protein is the subject of
post-translational modifications which are essential for the
structural rigidity required in bone. Firstly, collagen becomes
hydroxylated on certain proline and lysine residues (i.e., to form
hydroxyproline and hydroxylysine, respectively). This hydroxylation
depends on the activity of enzymes that require vitamin C as a
cofactor. Vitamin C deficiency leads to scurvy, a disease in which
bone and other collagen-containing tissues (such as skin, tendon
and connective tissue) are structurally weakened. This demonstrates
the essential requirement for normal collagen hydroxylation.
[0572] After deposition into the bone, the collagen chains become
cross-linked by specialised covalent bonds (pyridinium cross-links)
which help to give bone its tensile strength. These cross links are
formed by the action of enzymes on the hydroxylated amino acids
(particularly hydroxylysine) in the collagen. It is the absence of
these cross-links that results in the weakened state of the tissue
in scurvy when hydroxylation is inhibited by the absence of
sufficient vitamin C.
[0573] The biochemical structure of collagen is an important factor
in the strength of bone, but the pattern in which it is laid down
is also important. The collagen fibres should be laid down in
ordered sheets for maximal tensile strength. However, when bone is
formed rapidly (for example in Paget's disease, or in bone
metastases), the lamellae are laid down in a disorderly fashion
giving rise to "woven bone," which is mechanically weak and easily
fractured.
[0574] Bone matrix also contains small amounts of other collagens
and several non-collagenous proteins and glycoproteins. The
function of non-collagenous bone proteins is unclear, but it is
thought that they are involved in mediating the attachment of bone
cells to bone matrix, and in regulating bone cell activity during
the process of bone remodelling. The organic component of bone
forms a framework (called osteoid) upon which mineralisation
occurs. After a lag phase of about 10 days, the matrix becomes
mineralised, as hydroxyapatite (Ca₁₀(PO₄)₆(OH)₂) crystals are
deposited in the spaces between collagen fibrils.
Mineralisation confers upon bone the property of mechanical
rigidity, which complements the tensile strength and elasticity
derived from bone collagen.
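As a worked example of the mineral composition, the sketch below computes the molar mass and the Ca:P molar ratio of stoichiometric hydroxyapatite, Ca₁₀(PO₄)₆(OH)₂, from rounded standard atomic weights. Biological apatite is non-stoichiometric, so real bone mineral will deviate somewhat from these idealised figures.

```python
# Standard atomic weights (g/mol), rounded to three decimals.
ATOMIC_WEIGHT = {"Ca": 40.078, "P": 30.974, "O": 15.999, "H": 1.008}

# Atom counts in Ca10(PO4)6(OH)2: 10 Ca, 6 P, 6*4 + 2 = 26 O, 2 H.
HYDROXYAPATITE = {"Ca": 10, "P": 6, "O": 26, "H": 2}


def molar_mass(formula: dict) -> float:
    """Sum of atom count x atomic weight over all elements in the formula."""
    return sum(count * ATOMIC_WEIGHT[element] for element, count in formula.items())


mass = molar_mass(HYDROXYAPATITE)
ca_p_ratio = HYDROXYAPATITE["Ca"] / HYDROXYAPATITE["P"]
print(f"molar mass approx. {mass:.1f} g/mol, Ca:P molar ratio = {ca_p_ratio:.2f}")
```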
[0575] Bone Cell Function and Bone Remodelling
[0576] The mechanical integrity of the skeleton is maintained by
the process of bone remodelling, which occurs throughout life, in
order that damaged bone can be replaced by new bone. Remodelling
can be divided into four phases: resorption, reversal, formation,
and quiescence (see, e.g., Raisz, 1988; Mundy, 1996). At any one
time, approximately 10% of the bone surface in the adult skeleton is
undergoing active remodelling, whereas the remaining 90% is
quiescent.
[0577] Osteoclast Formation and Differentiation
[0578] Remodelling commences with the attraction of bone-resorbing
cells (osteoclasts) to the site to be resorbed. These are
multinucleated phagocytic cells, rich in the enzyme
tartrate-resistant acid phosphatase, which are formed by fusion of
precursors derived from the cells of monocyte/macrophage lineage.
Osteoclast formation and activation are dependent on close contact
between osteoclast precursors and bone marrow stromal cells.
Stromal cells secrete the cytokine M-CSF, which is essential for
differentiation of both osteoclasts and macrophages from a common
precursor.
[0579] Mature osteoclasts form a tight seal over the bone surface
and resorb bone by secreting hydrochloric acid and proteolytic
enzymes through the "ruffled border" into a space beneath the
osteoclast (Howship's lacuna). The hydrochloric acid secreted by
osteoclasts dissolves hydroxyapatite and allows proteolytic enzymes
(mainly Cathepsin K and matrix metalloproteinases) to degrade
collagen and other matrix proteins. Deficiency of these proteins
causes osteopetrosis, a disease associated with increased
bone mineral density and osteoclast dysfunction. After resorption
is completed, osteoclasts undergo programmed cell death (apoptosis),
in the so-called reversal phase which heralds the start of bone
formation.
[0580] Osteoblast Formation and Differentiation
[0581] Bone formation begins with attraction of osteoblast
precursors, which are derived from mesenchymal stem cells in the
bone marrow, to the bone surface. Although these cells have the
potential to differentiate into many cell types including
adipocytes, myocytes, and chondrocytes, in the bone matrix they are
driven towards an osteoblastic fate. Mature osteoblasts are plump
cuboidal cells, which are responsible for the production of bone
matrix. They are rich in the enzyme alkaline phosphatase and the
protein osteocalcin, which are used clinically as serum markers of
osteoblast activity. Osteoblasts lay down bone matrix which is
initially unmineralised (osteoid), but which subsequently becomes
calcified after about 10 days to form mature bone. During bone
formation, some osteoblasts become trapped within the matrix and
differentiate into osteocytes, whereas others differentiate into
flattened "lining cells" which cover the bone surface. Osteocytes
connect with one another and with lining cells on the bone surface
by an intricate network of cytoplasmic processes, running through
canaliculi in bone matrix. Osteocytes appear to act as sensors of
mechanical strain in the skeleton, and release signalling molecules
such as prostaglandins and nitric oxide (NO), which modulate the
function of neighbouring bone cells.
[0582] Regulation of Bone Remodelling
[0583] Bone remodelling is a highly organised process, but the
mechanisms which determine where and when remodelling occurs are
poorly understood. Mechanical stimuli and areas of micro-damage are
likely to be important in determining the sites at which
remodelling occurs in the normal skeleton. Increased bone
remodelling may result from local or systemic release of
inflammatory cytokines like interleukin-1 and tumour necrosis
factor in inflammatory diseases. Calciotropic hormones such as
parathyroid hormone (PTH) and 1,25-dihydroxyvitamin D act together
to increase bone remodelling on a systemic basis, allowing skeletal
calcium to be mobilised for maintenance of plasma calcium
homeostasis. Bone remodelling is also increased by other hormones
such as thyroid hormone and growth hormone, but suppressed by
oestrogen, androgens and calcitonin. There has been considerable
study of the processes which regulate the bone resorption side of
the balance, but the factors regulating the rate of bone deposition
are considerably less well understood.
[0584] Bone Disorders
[0585] There is a range of disorders of bone that result from the
failure to properly regulate the metabolic processes that govern
bone turnover (i.e., metabolic bone disorders). Osteoporosis (OP)
is the most prevalent metabolic bone disease. It is characterized
by reduced bone mineral density (BMD), deterioration of bone
tissue, and increased risk of fracture, e.g., of the hip, spine,
and wrist. Many factors contribute to the pathogenesis of
osteoporosis, including poor diet, lack of exercise, smoking, and
excessive alcohol intake. Osteoporosis may also arise in
association with inflammatory diseases such as rheumatoid
arthritis, endocrine diseases such as thyrotoxicosis, and with
certain drug treatments such as glucocorticoids. However, there is
also a strong genetic component in the pathogenesis of
osteoporosis.
[0586] Osteoporosis is a major health problem in developed
countries. As many as 60% of women suffer from osteoporosis, as
defined by the World Health Organisation (WHO), with half of these
sufferers also having clinically relevant skeletal fractures. Thus 1
in 3 of all women in developed countries will have a skeletal
fracture due to osteoporosis. This is a major cause of morbidity
and mortality leading to massive health care costs (an estimated
$14 billion per annum in the USA alone) (see, e.g., Melton et al.,
1992).
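The step from the prevalence figure to the "1 in 3" estimate is the product of the two proportions given in the paragraph above:

```latex
P(\text{osteoporotic fracture}) = P(\text{osteoporosis}) \times P(\text{fracture} \mid \text{osteoporosis}) = 0.60 \times 0.50 = 0.30 \approx \tfrac{1}{3}
```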
[0587] Osteopetrosis, the opposite of osteoporosis, is
characterised by excessive bone mineral density. It is, however,
much rarer than osteoporosis with as few as 1 in 25,000 women
affected.
[0588] After osteoporosis, the next most prevalent bone disease is
osteoarthritis. Osteoarthritis (OA) is the most common form of
arthritis in adults, with symptomatic disease affecting roughly 10%
of the US population over the age of 30 (see, e.g., Felson et al.,
1998). Because OA affects the weight-bearing joints of the knee and
hip more frequently than other joints, osteoarthritis accounts for
more physical disability among the elderly than any other disease
(see, e.g., Guccione et al., 1994). Osteoarthritis is the most
common cause of total knee and hip replacement surgery, and hence
imposes a significant economic as well as quality-of-life burden.
Recent estimates suggest the total cost of osteoarthritis to the
economy, accounting for lost working days, early retirement and
medical treatment may exceed 2% of the gross domestic product (see,
e.g., Yelin, 1998).
[0589] The physiological mechanisms which underlie osteoarthritis
remain hotly debated (see, e.g., Felson et al., 2000) but it seems
certain that several environmental factors contribute, including
excess mechanical loading of the joints, acute joint injury, and
diet, as well as a strong genetic component. The disease is
characterised by the narrowing of the synovial space in the joint,
inflammatory and fibrous changes to the connective tissue, and
altered turnover of connective tissue proteins, including the
primary connective tissue collagen, type II. The most recent
studies suggest that osteoarthritis may result from misregulated
connective tissue remodelling in much the same way that
osteoporosis results from misregulated bone remodelling. Whereas
osteoporosis is a disease of quantitatively low bone mineral
density, osteoarthritis is a disease of spatially inappropriate
bone mineralisation.
[0590] There is a range of other, less common bone disorders,
including:
[0591] Rickets and osteomalacia are the result of vitamin D
deficiency. Vitamin D is required for absorption of calcium and
phosphate and for their proper incorporation into bone mineral.
Deficiency of vitamin D (called rickets in children and
osteomalacia in adults) results in a range of symptoms including
low bone mineral density, bone deformation and in severe cases
muscle tetany due to depletion of extracellular calcium ion
stores.
[0592] Hyperparathyroidism (overproduction of parathyroid hormone
or PTH) can have similar symptoms to rickets. This is unsurprising,
since PTH production is stimulated in rickets in an attempt to
maintain the free calcium ion concentration. PTH stimulates bone
resorption by promoting osteoclast activity, and hence can result
in symptoms resembling osteoporosis. Osteomalacia and
hyperparathyroidism combined contribute only a very small fraction
of all cases of adult osteoporosis. In almost every case, adult
osteoporosis is due to defective bone deposition rather than
overactive resorption (see, e.g., Guyton, 1991).
[0593] Paget's disease of bone is a relatively common condition
(affecting as many as 1 in 1000 people in some areas of the world)
of unknown cause, characterized by increased bone turnover and
disorganized bone remodeling, with areas of increased osteoclastic
and osteoblastic activity. Although Pagetic bone is often denser than
normal bone, the abnormal architecture causes the bone to be
mechanically weak, resulting in bone deformity and increased
susceptibility to pathological fracture.
[0594] Multiple myeloma is a cancer of plasma cells. In contrast to
most other haematological malignancies, the tumour cells do not
circulate in the blood, but accumulate in the bone marrow where
they give rise to high levels of cytokines that activate
osteoclastic bone resorption (e.g., interleukin-6). The disease
accounts for approximately 20% of all haematological cancers and is
mainly a disease of elderly people.
[0595] Balance Between Bone Deposition and Bone Resorption
[0596] All of the bone pathologies listed above result from an
imbalance between bone deposition and bone resorption. If the
mechanisms regulating these two processes become uncoupled, then
pathological changes in bone mineral density result. In just a few
cases, the cause of the imbalance seems clear: for example, prolonged
estrogen deficiency (such as that due to surgical sterilisation) and
lengthy treatment with glucocorticoids (such as for asthma) both
perturb the balance and can lead to rapid demineralisation of the
bone and osteoporosis.
[0597] Unfortunately, in the vast majority of cases the mechanisms
resulting in loss of balance are much less clear. The difficulty in
identifying the causes stems in part from the small-scale imbalances
that must be occurring. For example, most osteoporotic fractures do
not occur until 20-30 years after the menopause. If, as is
generally assumed, the osteoporosis was initiated by the reduction
in estrogen levels after the menopause, then the demineralisation
has been occurring steadily over two or three decades. Since the
bone remodelling process is relatively rapid (complete within 28
days in any given osteon) we must assume that the imbalance in
favour of demineralisation is very small.
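The scale of the imbalance argued for above can be estimated with a back-of-the-envelope calculation. The sketch assumes a hypothetical total loss of 30% of bone mass over 25 years (an illustrative figure, not one given in the source) and uses the 28-day remodelling cycle quoted above; it solves for the constant fractional net loss per cycle that would compound to that total. The `active_fraction` parameter and the function name are added assumptions: because only about 10% of the bone surface is being remodelled at any one time (paragraph [0576]), a given site passes through fewer cycles, so the per-cycle deficit is correspondingly larger.

```python
import math


def per_cycle_deficit(total_loss: float, years: float,
                      cycle_days: float = 28.0, active_fraction: float = 1.0) -> float:
    """Constant fractional net bone loss per remodelling cycle that compounds
    to `total_loss` (e.g. 0.30 for 30%) over `years`.

    active_fraction < 1 models sites that are remodelled only part of the time,
    which reduces the number of cycles each site goes through.
    """
    if not 0.0 < total_loss < 1.0:
        raise ValueError("total_loss must be between 0 and 1")
    n_cycles = years * 365.25 / cycle_days * active_fraction
    # Solve (1 - deficit) ** n_cycles == 1 - total_loss for deficit.
    return 1.0 - math.exp(math.log(1.0 - total_loss) / n_cycles)


# Hypothetical 30% loss over 25 years.
continuous = per_cycle_deficit(0.30, 25)
intermittent = per_cycle_deficit(0.30, 25, active_fraction=0.10)
print(f"continuous remodelling: {continuous:.3%} net loss per cycle")
print(f"10% of surface active:  {intermittent:.3%} net loss per cycle")
```

Under either assumption the net deficit per cycle comes out at roughly 0.1% to 1%, which is consistent with the point that the imbalance in favour of demineralisation is very small.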
[0598] Current Treatments
[0599] There are currently two major classes of drugs used in the
prevention and treatment of osteoporosis: (1) Hormonally active
medications (estrogens, selective estrogen receptor modulators
(SERMs)); and (2) anti-resorptives.
[0600] There is presently good data to suggest that the long term
use of hormonally active medications (usually estrogen, estrogen
analogs or conjugated estrogens) after the menopause in women can
prevent bone demineralisation and hence delay the onset of
osteoporosis. The molecular mechanisms involved are not clearly
defined, possibly because they are so complex. However, there are
plausible mechanisms which involve both stimulation of bone
deposition and suppression of resorption.
[0601] To date, such hormonally active medications, including the
new generation of SERMs, such as Raloxifene™, which have the
beneficial effects of estrogen on bone and the cardiovascular
system but do not have the side effects of breast and uterine
hyperplasia that can increase the risk of cancer, have not achieved
widespread use for the treatment of existing osteoporosis.
[0602] At present, treatment of known or suspected bone mineral
deficiency is most commonly by the use of drugs to suppress
osteoclast activity. The two most important drug groups in this
class are bisphosphonates (BPs) and non-steroidal anti-inflammatory
drugs (NSAIDs).
[0603] Bisphosphonates (also known as diphosphonates) are an
important class of drugs used in the treatment of bone diseases
involving excessive bone destruction or resorption, e.g., Paget's
disease, tumour-associated osteolysis, and also in post-menopausal
osteoporosis where the defect might be in either bone deposition or
resorption. Bisphosphonates are structural analogues of naturally
occurring pyrophosphate. Whereas pyrophosphate consists of two
phosphate groups linked by an oxygen atom (P--O--P),
bisphosphonates have two phosphate groups linked by a carbon atom
(P--C--P).
[0604] This makes bisphosphonates very stable and resistant to
degradation. Furthermore, like pyrophosphate, bisphosphonates have
very high affinity for calcium and therefore target to bone mineral
in vivo. The carbon atom that links the two phosphate groups has
two side chains attached to it, which can be altered in structure.
This gives rise to a multitude of bisphosphonate compounds with
different anti-resorptive potencies. Bone resorption is mediated by
highly specialised, multinucleated osteoclast cells. Bisphosphonate
drugs specifically inhibit the activity and survival of these
cells. Firstly, after intravenous or oral administration, the
bisphosphonates are rapidly cleared from the circulation and bind
to bone mineral. As the mineral is then resorbed and dissolved by
osteoclasts, it is thought that the drug is released from the bone
mineral and is internalised by osteoclasts. Intracellular
accumulation of the drugs inhibits the ability of the cells to
resorb bone (probably by interfering with signal transduction
pathways or cellular metabolism) and causes osteoclast apoptosis
(see, e.g., Hughes et al., 1997).
[0605] NSAIDs are widely used in the treatment of inflammatory
diseases, but often cause severe gastro-intestinal (GI) side
effects, due to their inhibition of the prostaglandin-generating
enzyme, cyclooxygenase (COX). Recently developed selective
cyclooxygenase-2 (COX-2) inhibitors offer new treatment strategies
which are likely to be less toxic to the GI tract. NSAIDs developed
by Nicox SA (Sophia Antipolis, France), that contain a nitric oxide
(NO)-donor group (NO-NSAID) exhibit anti-inflammatory properties
without causing GI side effects. The mechanisms responsible for the
beneficial effects of NSAIDs on bone are not definitively
identified, but since the bone resorbing osteoclast cells are
derived from the circulating monocyte pool, it is not difficult to
imagine why generalised anti-inflammatory treatments might have
anti-resorptive effects. However, another class of powerful
anti-inflammatory molecules, the glucocorticoids and their analogs
such as dexamethasone, have the opposite effect to NSAIDs: chronic
dexamethasone treatment (for example, in asthma) induces
demineralisation and leads to symptoms of rapid-onset osteoporosis.
Consequently, while NSAIDs empirically have anti-resorptive
properties, further investigations into the detailed mechanism of
action of these drugs are clearly required.
[0606] It has recently been discovered that many of the drugs
used clinically to inhibit bone resorption, such as
bisphosphonates and oestrogen, do so by promoting osteoclast
apoptosis (see, e.g., Hughes et al., 1997). At present, the most
commonly used drugs for suppressing osteoclast activity
in these diseases are bisphosphonates (BPs) and non-steroidal
anti-inflammatory drugs (NSAIDs).
[0607] Limitations of Current Treatments
[0608] There are a number of limitations which impact on the
clinical utility of all the available therapeutic and preventative
modalities. For example, both hormonal medications (HRT and SERMs)
and antiresorptives (BPs and NSAIDs) primarily target resorption.
While this may be useful in, for example, Paget's disease, it is
likely to be less useful in osteoporosis, where the majority of
cases have reduced deposition rates as the primary defect. Of
course, because bone mineral density is a balance between
deposition and resorption rates, antiresorptive strategies can have
some efficacy even where the primary defect is in the rate of
deposition.
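The observation in paragraph [0608] that an antiresorptive can partly offset a primary deposition defect follows from treating the net change in bone mass as deposition minus resorption. The sketch below is a toy model with hypothetical, dimensionless rates and a hypothetical function name (none of these values come from the source): it shows a deposition defect producing net loss, and an antiresorptive narrowing that loss without correcting the deposition defect itself.

```python
def net_balance(deposition: float, resorption: float) -> float:
    """Net change in bone mass per unit time: positive = gain, negative = loss."""
    return deposition - resorption


# Hypothetical rates in arbitrary units; healthy adult bone is roughly in balance.
healthy = net_balance(deposition=1.00, resorption=1.00)
# Primary defect: deposition reduced by 3%, resorption unchanged.
deposition_defect = net_balance(deposition=0.97, resorption=1.00)
# Same defect treated with an antiresorptive assumed to cut resorption by 2%.
treated = net_balance(deposition=0.97, resorption=0.98)

for label, value in [("healthy", healthy), ("deposition defect", deposition_defect),
                     ("defect + antiresorptive", treated)]:
    print(f"{label:>24}: {value:+.2f}")
```

In this toy model the antiresorptive acts only on the resorption term, so the underlying deposition defect remains; an agent that restored deposition would act on the other term.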
[0609] Possibly because current therapeutics target resorption when
suppressed deposition is the primary defect in osteoporosis, none
of the current agents can build bone, but instead only halt further
demineralisation. Because of the limited availability of diagnostic
techniques, particularly for population screening, treatment cannot
usually begin until clinical symptoms exist (such as fracture) by
which point the bones may already be dangerously demineralised. In
such cases (which are the majority), a therapy which increases bone
mineral density would be desirable. A new treatment based on
abolishing proline deficiency would stimulate the deposition rate and
hence be a new category of therapeutic: one which targets
deposition preferentially over resorption. Therapeutics of this
category would be expected to overcome the limitation of being
unable to increase bone mineral density.
[0610] Another limitation of existing therapies is the failure to
treat the underlying cause of the pathology, but rather to try and
alleviate the symptoms. In part, this is because few direct causes
of osteoporosis have been identified. The inventors have identified
a novel contributory mechanism to the development of osteoporosis
and hence have provided the first therapeutic approach to target
one of the direct mechanisms resulting in pathologically low bone
mineral density.
[0611] Bone Disorder Diagnostics
[0612] It has long been clear that early diagnosis of bone
disorders is essential for good therapeutic management. Although
there are now several effective treatments for osteoporosis, each
one is only able to arrest the further loss of bone mineral
density. No treatment to date has been effective in reversing loss
which has already occurred. Thus early, reliable diagnosis of
declining bone mineral density is of the utmost clinical
importance.
[0613] Existing diagnosis methods for bone disorders fall into two
categories:
[0614] (a) direct observation (for example, bone mineral density
scans for osteoporosis or radiographic assessment for
osteoarthritis); and,
[0615] (b) indirect observation of molecular markers of remodelling
(for example, collagen breakdown products).
[0616] Of the major determinants for bone fracture, only bone
mineral density can presently be determined with any precision and
accuracy.
[0617] Bone densitometers typically give results in absolute terms
(i.e., bone mineral density, BMD, typically in units of g/cm²)
or in relative terms (T-scores or Z-scores) which are derived from
the BMD value. The Z-score compares a patient's BMD result with BMD
measurements taken from a suitable control population, which is
usually a group of healthy people matched for sex and age, and
often also for weight. The T-score compares the patient's BMD result with
BMD measurements taken from a control population of healthy young
adults, matched for sex. In other words, for Z-scores, age- and
sex-matched controls are used; for T-scores, only sex-matched (young
adult) controls are used. The World Health Organisation (WHO) defines
osteoporosis as a bone mineral density (BMD) below a cut-off value
which is 1.5 standard deviations (SDs) below the mean value for the
age- and sex-matched controls (Z-scores), or a bone mineral density
(BMD) below a cut-off value which is 2.5 standard deviations (SDs)
below the mean value for the sex-matched controls (T-scores) (see,
e.g., World Health Organisation, 1994).
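Both scores described above are standard scores: the patient's BMD minus the mean of the chosen control population, divided by that population's standard deviation. The following minimal Python sketch illustrates the arithmetic. The reference means and SDs are hypothetical values chosen for illustration (not taken from any published reference database), the function names are illustrative, and the cut-offs are applied exactly as stated in the preceding paragraph.

```python
"""Sketch: T- and Z-scores from a BMD measurement (hypothetical reference values)."""
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    """Mean and standard deviation of BMD (g/cm^2) in a control population."""
    mean: float
    sd: float


def standard_score(bmd: float, ref: Reference) -> float:
    """SDs by which a patient's BMD lies above (+) or below (-) the reference mean."""
    return (bmd - ref.mean) / ref.sd


def below_cutoff(score: float, sds_below_mean: float) -> bool:
    """True if the score lies below a cut-off set `sds_below_mean` SDs under the mean."""
    return score < -sds_below_mean


# Hypothetical reference populations, for illustration only.
YOUNG_ADULT_FEMALE = Reference(mean=1.00, sd=0.12)  # sex-matched young adults: T-score
AGE_MATCHED_FEMALE = Reference(mean=0.88, sd=0.12)  # age- and sex-matched: Z-score

patient_bmd = 0.67  # g/cm^2, hypothetical DEXA result
t_score = standard_score(patient_bmd, YOUNG_ADULT_FEMALE)
z_score = standard_score(patient_bmd, AGE_MATCHED_FEMALE)

# Cut-offs as stated in the text: 2.5 SDs for T-scores, 1.5 SDs for Z-scores.
print(f"T = {t_score:.2f}, below T-score cut-off: {below_cutoff(t_score, 2.5)}")
print(f"Z = {z_score:.2f}, below Z-score cut-off: {below_cutoff(z_score, 1.5)}")
```

With these hypothetical values the patient's T-score is about -2.75 and the Z-score about -1.75, so both fall below their respective cut-offs. Note that the same BMD gives a less extreme Z-score because the age-matched reference mean is itself lower.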
[0618] The two most widely used methods for assessing bone mineral
density (BMD) are DEXA scanning (dual-energy X-ray absorptiometry)
and ultrasound. The DEXA method is considered the gold
standard diagnostic tool for bone mineral density, providing a
reliable estimate of average bone mineral density in units of grams
per square centimetre. It can be applied to a number of different
bones, but is most commonly used to measure lumbar spine density
(as a measure of predominantly trabecular bone) and femoral neck
density (as a measure of predominantly cortical bone). Ultrasound is easier
and cheaper to perform than DEXA scanning, but provides a less
reliable estimate of bone mineral density and its accuracy is
compromised by the surrounding soft tissue. As a result, ultrasound
is usually performed on the heel, where interference by soft tissue
is minimised, but it is unclear whether heel measurements are
representative of whole-body bone mineral density, and in any case they do not allow an
assessment of cortical bone. See, for example, Pocock et al., 2000;
Prince, 2001.
[0619] Almost all of the molecular diagnostics currently employed
are based on measurements of bone breakdown products. The steady
state level of breakdown products should be related to the bone
remodelling rate, although it will be biased towards detection of
overactive resorption rather than underactive deposition. It may
be, in part, for this reason that all therapies currently on trial
for osteoporosis (such as estrogen receptor modulators or
bisphosphonates) are based on an antiresorptive strategy rather
than on promoting deposition, even though (as noted above) most
cases of osteoporosis are not due to overactive resorption.
[0620] Examples of molecular diagnostics include the measurement of
free crosslinks, hydroxyproline, collagen propeptides, or alkaline
phosphatase in serum or urine. Free crosslinks are produced when
collagen is degraded during resorption. Although the collagen can
mostly be broken down to free amino acids, the trimerised
hydroxylysine residues that formed the crosslinks cannot be further
metabolised and so accumulate in the blood until excreted by the
kidney into urine. Thus the levels of crosslinks in serum or in urine
will be related to the rate of collagen breakdown (most, but not
all, of which will be occurring in the bone). Tests for
hydroxyproline rely on a similar principle: free proline (that is,
proline not incorporated into protein) is never in the hydroxylated
form, hydroxyproline. As a result, the only source of free
hydroxyproline in blood is from collagen breakdown. As for
crosslinks, the free hydroxyproline generated during breakdown
cannot be metabolised any further and accumulates until excreted by
the kidney. Unfortunately, the level of both of these metabolites
(in either serum or urine) is significantly affected by kidney
function.
[0621] Collagen is produced as a proprotein which has both an
N-terminal and C-terminal extension cleaved off prior to
incorporation into the extracellular matrix. These extensions, or
propeptides, are then metabolised or excreted. Because they are
released only when new collagen is made, the steady-state level of
the propeptides has been suggested as a marker for collagen
deposition, some, but not all, of which is likely to be occurring
in the bone.
[0622] Problems with Current Diagnostic Methods
[0623] The gold standard bone densitometry method, DEXA scanning,
is too cumbersome and expensive for routine screening procedures in
women without clinical signs of osteoporosis. It requires
specialist apparatus (which is large and expensive to install and
maintain) as well as specialist training for its operation. Despite
accurately measuring bone mineral density, and hence providing the
benchmark diagnosis of osteoporosis, DEXA does not
accurately predict future fracture risk, suggesting that bone
quality as well as density may also be important (see, for example,
the comments above).
[0624] Ultrasound measurements on the heel are simpler to perform,
using cheaper apparatus and requiring less operator training, but
the results are generally less able to predict the presence of
either osteoporosis or future fracture risk.
[0625] Molecular diagnostics are considerably easier to implement,
although in many cases the reagents required for the assays are
expensive to obtain. The major disadvantage of the markers which
have been evaluated to date is that the levels of the breakdown
products in serum or urine are not particularly temporally stable,
changing with diurnal rhythm and also from day to day. As a result,
spot measures (i.e., a single specimen taken at a randomly chosen
time) have virtually no diagnostic or prognostic power. Series of
measurements can be used to provide some indication of relative
risk for osteoporosis, but the odds ratio for having osteoporosis
is only approximately 2-fold among individuals with high levels of
the turnover markers (see, e.g., Gamero, 1996). Such a weak
association is of little or no practical clinical value, and as a
result, biochemical markers of bone metabolism have not found
widespread application in the clinical arena, and have not been
considered for population screening.
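To illustrate why an odds ratio of about 2 carries little screening value, the following minimal Python sketch computes the odds ratio, sensitivity and specificity from a 2x2 table. The counts are entirely hypothetical and are not taken from the cited study or any other data set; the function names are likewise illustrative.

```python
"""Sketch: what an odds ratio of about 2 means for a screening marker (hypothetical counts)."""


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio of a 2x2 table laid out as
                         osteoporosis   no osteoporosis
        marker high            a               b
        marker normal          c               d
    """
    return (a * d) / (b * c)


def sensitivity(a: int, c: int) -> float:
    """Fraction of osteoporosis cases flagged by a high marker level."""
    return a / (a + c)


def specificity(b: int, d: int) -> float:
    """Fraction of unaffected individuals correctly left unflagged."""
    return d / (b + d)


# Hypothetical screening cohort of 740 women (illustrative numbers only).
a, b, c, d = 40, 160, 60, 480
print(f"odds ratio  = {odds_ratio(a, b, c, d):.1f}")
print(f"sensitivity = {sensitivity(a, c):.2f}")
print(f"specificity = {specificity(b, d):.2f}")
```

In this hypothetical cohort the odds ratio is 2.0, yet the marker detects only 40% of cases and flags 160 of the 640 unaffected women, which is consistent with the statement above that an association of this strength is of little practical clinical value.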
[0626] Another important limitation of current molecular
diagnostics is the focus on the products of bone metabolism (such
as crosslinks, hydroxyproline, and collagen propeptides). These
species might offer diagnostic potential, but they provide no
information at all about the underlying causes of the imbalance
between deposition and resorption. Identification of a risk factor
that is not a direct marker of bone turnover may offer the
prospect of identifying therapeutic targets as well as having
prognostic potential.
[0627] Osteoarthritis
[0628] Osteoarthritis (OA) is the most prevalent type of arthritis,
particularly in adults 65 years and older. OA is a chronic
degenerative arthropathy that frequently leads to chronic pain and
disability. With the aging of the population, this condition is
becoming increasingly prevalent and its treatment increasingly
financially burdensome. Finding better treatments for OA is a major
focus of research at this time. Reported incidence and prevalence
rates of OA in specific joints vary widely, due to differences in
the case definition of OA. OA may be defined by radiographic
criteria alone (radiographic OA), by typical symptoms (symptomatic
OA), or by both. Using radiographic criteria, the distal and
proximal interphalangeal joints of the hand have been identified as
the joints most commonly affected by OA, but they are the least
likely to be symptomatic. In contrast, the knee and hip, which
constitute the second and third most common locations of
radiographic OA, respectively, are nearly always symptomatic. The
first metatarsophalangeal and carpometacarpal joints are also
frequent sites of radiographic OA, while the shoulder, elbow, wrist
and metacarpophalangeal joints rarely develop idiopathic OA.
[0629] In demographic studies, age is the most consistently
identified risk factor for OA, regardless of the joint being
studied. Prevalence rates for both radiographic OA and, to a lesser
extent, symptomatic OA rise steeply after age 50 in men and age 40
in women. Female gender is also a well-recognized risk factor for
OA. Hand OA is particularly prevalent among women. In addition,
polyarticular OA and isolated knee OA are slightly more common in
women than men, while hip OA occurs more commonly in men. Women are
more likely to report pain in all affected joints, including the
hip, than men. Cohort studies have demonstrated a clear association
of obesity with the development of radiographic knee OA in women
and a weaker association with hip OA. Whether obesity is a risk
factor for the development of hand OA remains controversial.
[0630] Occupation-related repetitive injury and physical trauma
contribute to the development of secondary (non-idiopathic) OA,
sometimes occurring in joints that are not affected by primary
(idiopathic) OA, such as the metacarpophalangeal joints, wrists and
ankles. Although the prevalence of knee OA is greater in adults who
have engaged in occupations that require repetitive bending and
strenuous activities, an association with regular, intense exercise
remains controversial. While early studies failed to find a higher
prevalence of knee OA in joggers than in non-joggers, a recent
study of the Framingham database in elderly adults provided the
first longitudinal evidence of an association between a high level
of physical activity and incident knee OA. Low-impact and
recreational exercises are unlikely to constitute a risk factor for
knee OA, and are likely to benefit the cardiovascular system. Prior
meniscectomy is a significant risk factor in men for the development
of OA in the knee.
[0631] Signs and Symptoms of OA
[0632] OA is diagnosed by a triad of typical symptoms, physical
findings and radiographic changes. The American College of
Rheumatology has set forth classification criteria to aid in the
identification of patients with symptomatic OA that include, but do
not rely solely on, radiographic findings. Patients with early
disease experience localized joint pain that worsens with activity
and is relieved by rest, while those with severe disease may have
pain at rest. Weight bearing joints may "lock" or "give way" due to
internal derangement that is a consequence of advanced disease.
Stiffness in the morning or following inactivity ("gel phenomenon")
rarely exceeds 30 minutes.
[0633] Physical findings in osteoarthritic joints include bony
enlargement, crepitus, cool effusions, and decreased range of
motion. Tenderness on palpation at the joint line and pain on
passive motion are also common, although not unique to OA.
Radiographic findings in OA include osteophyte formation, joint
space narrowing, subchondral sclerosis and cysts. The presence of
an osteophyte is the most specific radiographic marker for OA
although it is indicative of relatively advanced disease.
[0634] Diagnosis
[0635] If a patient has the typical symptoms and radiographic
features described above, the diagnosis of OA is relatively
straightforward and is unlikely to be confused with other entities.
In less straightforward cases, however, other diagnoses should be
considered. For example, periarticular pain that is not reproduced
by passive motion or palpation of the joint should suggest an
alternative etiology such as bursitis, tendonitis or periostitis. If
the distribution of painful joints includes the metacarpophalangeal
(MCP) joints, wrist, elbow, ankle or shoulder, OA is unlikely.
Prolonged stiffness (greater than one hour) should raise suspicion
of an inflammatory arthritis such as rheumatoid arthritis. Marked
warmth and erythema in a joint suggest an infectious or
microcrystalline etiology. Weight loss, fatigue, fever and loss of
appetite suggest a systemic illness such as polymyalgia rheumatica,
rheumatoid arthritis, lupus, sepsis or malignancy.
[0636] Radiographs are considered the "gold standard" test for the
diagnosis of OA, but radiographic changes are evident only
relatively late in the disease. The need is great for a sensitive
and specific biological marker that would enable early diagnosis of
OA, and monitoring of its progression. Routine laboratory studies,
such as the erythrocyte sedimentation rate and C-reactive protein
(CRP), are not useful as markers for OA, although a recent study
suggests that elevated CRP predicts more rapidly progressive disease.
[0637] Several epitopes of cartilage components, however, have been
described that offer some promise as markers of OA. For example,
chondroitin sulfate epitope 846, normally expressed only in fetal
and neonatal cartilage, has been observed in OA, but not normal
adult, cartilage and synovial fluid. An epitope unique to type II
collagen has been described in OA cartilage, and can be unmasked in
vitro by exposing normal cartilage to matrix metalloproteinases (MMPs). This epitope can be
measured in blood and urine and may prove useful in diagnosing or
monitoring OA progression. Elevated serum hyaluronan levels have
also been shown by some to correlate with radiographic OA. The
finding of elevated cartilage oligomeric matrix protein (COMP) levels in
synovial fluid after traumatic joint injury may portend development
of OA in the injured joint. Other potential markers of OA have been
proposed, but they are either not easily accessible or lack the
sensitivity and specificity required for use as OA markers.
[0638] Current treatment for OA is relatively limited. Because
there are currently no pharmacological agents capable of retarding
or preventing disease, treatment is predominantly focused on relief
of pain, and maintenance of quality of life and functional
independence.
[0639] Several studies have shown acetaminophen (paracetamol) to be
superior to placebo and equivalent to nonsteroidal
anti-inflammatory agents (NSAIDs) for the short-term management of
OA pain. At present, acetaminophen (up to 4,000 mg daily) is the
recommended initial analgesic of choice for symptomatic OA.
However, many patients eventually require NSAIDs or more potent
analgesics to control pain.
[0640] NSAIDs exert their anti-inflammatory and analgesic effects by
inhibiting the prostaglandin-generating enzyme cyclooxygenase (COX).
In addition to their pro-inflammatory role, prostaglandins also
contribute to
important homeostatic functions, such as maintenance of the gastric
lining, renal blood flow, and platelet aggregation. Reduction of
prostaglandin levels in these organs can result in the
well-recognized side effects of traditional non-selective
NSAIDs--that is, gastric ulceration, renal insufficiency, and
prolonged bleeding time. The elderly are at higher risk for these
side effects. For example, adults over the age of 60 who are taking
NSAIDs have a 4-5 fold higher risk of gastrointestinal bleeding or
ulceration than age-matched counterparts not taking NSAIDs. Other risk factors
for NSAID-induced GI bleed include prior peptic ulcer disease and
concomitant steroid use. Potential renal toxicities of NSAIDs
include azotemia, proteinuria, and renal failure requiring
hospitalization. Hematologic and cognitive abnormalities have also
been reported with several NSAIDs. Therefore, in elderly patients,
and those with a documented history of NSAID-induced ulcers,
traditional non-selective NSAIDs should be used with caution,
usually in lower dose and in conjunction with a proton pump
inhibitor. Renal function should be monitored in the elderly. In
addition, prophylactic treatment to reduce risk of gastrointestinal
ulceration, perforation and bleeding is recommended in patients
>60 years of age with: prior history of peptic ulcer disease;
anticipated duration of therapy of >3 months; moderate to high
dose of NSAIDs; and, concurrent corticosteroids. Misoprostol, at a
dose of 200 µg four times daily, constitutes effective anti-ulcer
prophylaxis but is often poorly tolerated due to diarrhea.
Omeprazole, and other proton pump inhibitors, are also very
effective anti-ulcer prophylactic agents, although cost can be
limiting. The recent development of selective cyclooxygenase-2
(COX-2) inhibitors offers a new strategy for the management of pain
and inflammation that is likely to be less toxic to the GI
tract.
[0641] The development of selective COX-2 inhibitors has been an
exciting advance in the management of pain and inflammation, as
discussed above. If the safety profile of the specific COX-2
inhibitors is confirmed in additional long-term studies to be
superior to non-selective COX inhibitors, they are likely to
replace traditional NSAIDs in the management of arthritis and other
painful, inflammatory conditions. It should be kept in mind,
however, that no matter what their degree of selectivity, COX
inhibitors (NSAIDs) do not alter the natural history of OA and a
"disease modifying" agent is still critically needed.
[0642] Local analgesic therapies include topical capsaicin and
methyl salicylate creams. Occasionally in late stage disease,
patients will require narcotic analgesics to control pain. Oral
glucosamine and chondroitin sulfate have been shown (each
individually) to have a mild to moderate analgesic effect in
several double-blind, placebo-controlled studies.
[0643] Judicious use of intra-articular glucocorticoid injections
is appropriate for OA patients who cannot tolerate, or whose pain
is not well controlled by, oral analgesic and anti-inflammatory
agents. Periarticular injections may effectively treat bursitis or
tendonitis that can accompany OA. A requirement for four or more
intra-articular injections suggests the need for orthopedic
intervention.
[0644] Intra-articular injection of hyaluronate preparations has
been demonstrated in several small clinical trials to reduce pain
in OA of the knee. These injections are given in a series of 3 or 5
weekly injections (depending on the specific preparation) and may
reduce pain for up to 6 months in some patients.
[0645] Non-Pharmacological Management
[0646] Weight reduction in obese patients has been shown to
significantly relieve pain, presumably by reducing biomechanical
stress on weight bearing joints. Exercise has also been shown to be
safe and beneficial in the management of OA. It has been suggested
that joint loading and mobilization are essential for articular
integrity. In addition, quadriceps weakness, which develops early in
OA, may contribute independently to progressive articular damage.
Several studies in older adults with symptomatic knee OA have shown
consistent improvements in physical performance, pain and
self-reported disability after 3 months of aerobic or resistance
exercise. Other studies have shown that resistive strengthening
improves gait, strength and overall function. Low-impact
activities, including water-resistive exercises or bicycle
training, may enhance peripheral muscle tone and strength and
cardiovascular endurance, without causing excessive force across,
or injury to, joints. Studies of nursing home and
community-dwelling elderly clearly demonstrate that one additional
important benefit of exercise is a reduction in the number of
falls.
[0647] Surgical Management
[0648] Patients in whom function and mobility remain compromised
despite maximal medical therapy, and those in whom the joint is
structurally unstable, should be considered for surgical
intervention. Patients in whom pain has progressed to unacceptable
levels--that is, pain at rest and/or nighttime pain--should also be
considered as surgical candidates. Surgical options include
arthroscopy, osteotomy and arthroplasty.
[0649] Arthroscopic removal of intra-articular loose bodies and
repair of degenerative menisci may be indicated in some patients
with knee OA. Tibial osteotomy is an option for some patients who
have a relatively small varus angulation (less than 10 degrees) and
stable ligamentous support. Total knee arthroplasty is recommended
for patients with more severe varus, or any valgus, deformity and
ligamentous instability. Arthroplasty is also indicated for
patients who have had ineffective pain relief following a tibial
osteotomy, and for those with advanced hip OA. Patients who have
not yet developed appreciable muscle weakness, generalized or
cardiovascular deconditioning and who would medically withstand the
stress of surgery are ideal surgical candidates. In contrast, full
mobility and function may not be realistically expected in patients
with significant cognitive impairment or symptomatic
cardiopulmonary disease, since these conditions can impede
post-operative rehabilitation.
[0650] Several questionnaires have been established as validated,
reliable research instruments for assessing functional outcomes in
patients with arthritis. These include the Lequesne index, the
Western Ontario and McMaster Universities Osteoarthritis Index
(WOMAC), activities of daily living (ADL), etc. However, several
performance-based tests of function can be done rapidly and easily
in the office and may be more sensitive in predicting impending
disability than direct questions about disability and impairment. Such measures
include grip strength, a timed walk, and sequential chair-stands.
These tests can provide the clinician with valuable information on
the patient's current level of function, as well as serve
longitudinally to assess decline in function.
[0651] Osteoarthritis is the most prevalent articular disease in
the elderly. Disease markers that will detect early disease, and
agents that will slow down or halt disease progression are
critically needed. Current management should include safe and
adequate pain relief using systemic and local therapies, and should
include medical and rehabilitative interventions that limit
functional deterioration.
[0652] Although epidemiological studies have promoted understanding
of the risk factors that predispose to OA, the initiating events
that trigger the disease are not yet understood.
[0653] Cartilage is a unique tissue with viscoelastic and
compressive properties which are imparted by its extracellular
matrix, composed predominantly of type II collagen and
proteoglycans. Under normal conditions, this matrix is subjected to
a dynamic remodeling process in which low levels of degradative and
synthetic enzyme activities are balanced, such that the volume of
cartilage is maintained. In OA cartilage, however, matrix degrading
enzymes are overexpressed, shifting this balance in favor of net
degradation, with resultant loss of collagen and proteoglycans from
the matrix. Presumably in response to this loss, chondrocytes
initially proliferate and synthesize enhanced amounts of
proteoglycan and collagen molecules. As the disease progresses,
however, reparative attempts are outmatched by progressive
cartilage degradation. Fibrillation, erosion and cracking initially
appear in the superficial layer of cartilage and progress over time
to deeper layers, resulting eventually in large clinically
observable erosions. OA, in simplistic terms, therefore, can be
thought of as a process of progressive cartilage matrix degradation
to which an ineffectual attempt at repair is made.
[0654] A critical question is whether OA is truly a disease or a
natural consequence of aging. Several differences between aging
cartilage and OA cartilage have been described, suggesting that OA
is a distinct disease. For example, although denatured type II collagen is found
in both normal aging and OA cartilage, it is more predominant in
OA. In addition, OA and normal aging cartilage differ in the amount
of water and in the ratio of chondroitin sulfate to keratan
sulfate constituents. The expression of a chondroitin-sulfate
epitope (epitope 846) in OA cartilage, that is otherwise only
present in fetal and neonatal cartilage, provides further evidence
that OA is a distinct pathologic process. A final but important
distinction is that degradative enzyme activity is increased in OA,
but not in normal aging cartilage.
[0655] The primary enzymes responsible for the degradation of
cartilage are the matrix metalloproteinases (MMPs). These enzymes
are secreted by both synovial cells and chondrocytes and are
grouped into three general categories: a) collagenases; b)
stromelysins; and, c) gelatinases. Under normal conditions, MMP
synthesis and activation are tightly regulated at several levels.
They are secreted as inactive proenzymes that require enzymatic
cleavage in order to become activated. Once activated, MMPs become
susceptible to the plasma-derived MMP inhibitor,
alpha-2-macroglobulin, and to tissue inhibitors of MMPs (TIMPs)
that are also secreted by synovial cells and chondrocytes. In OA,
synthesis of MMPs is greatly enhanced and the available inhibitors
are overwhelmed, resulting in net degradation. Interestingly,
stromelysin can serve as an activator for its own proenzyme, as
well as for procollagenase and prostromelysin, thus creating a
positive feedback loop of proMMP activation in cartilage.
[0656] One candidate mediator of this enhanced MMP synthesis is interleukin-1 (IL-1). IL-1 is a potent
pro-inflammatory cytokine that, in vitro, is capable of inducing
chondrocytes and synovial cells to synthesize MMPs. Furthermore,
IL-1 suppresses the synthesis of type II collagen and
proteoglycans, and inhibits chondrocyte proliferation stimulated by
transforming growth factor-β. The presence of IL-1 RNA and
protein has been confirmed in OA joints. Thus, IL-1 may not only
actively promote cartilage degradation, but may also suppress
attempts at repair, in OA. In addition to these effects, IL-1
induces nitric oxide production, chondrocyte apoptosis, and
prostaglandin synthesis, which further contribute to cartilage
deterioration. Under normal conditions, an endogenous IL-1 receptor
antagonist regulates IL-1 activity. A relative excess of IL-1
and/or deficiency of the IL-1 receptor antagonist could conceivably
result in the cartilage destruction that is characteristic of OA.
It is likely that other cytokines or particulate material from
damaged cartilage may also contribute to this inflammatory,
degradative process.
[0657] Growth factors are produced locally in cartilage and
synovium and are likely to contribute to local cartilage remodeling
by stimulating the de novo synthesis of collagen and proteoglycans.
Transforming growth factor β (TGFβ) is the best
characterized and most potent of the chondrocyte growth factors.
Not only does TGFβ stimulate de novo matrix synthesis, but it
also counteracts cartilage degradation by down-regulating IL-1
receptor expression and by increasing IL-1 receptor antagonist
release and TIMP expression. Insulin-like growth factor (IGF-1) and
basic fibroblast growth factor (b-FGF) are also present in OA
cartilage and likely to contribute to reparative attempts,
although, as noted, degradation ultimately outstrips repair in OA
cartilage.
[0658] One disease modifying strategy is to suppress the
progressive degradation of cartilage that occurs in OA. To
accomplish this, the ratio of MMP inhibitors to MMP enzymes must be
shifted in favor of the former. This could be accomplished by
enhancing articular levels of TIMP by recombinant gene therapy or
by administration of exogenous TIMP. Studies of exogenous TIMP
administration to animals with OA-like disease have had
inconclusive results, however, perhaps due to ineffective
penetration of this relatively high molecular weight protein into
the cartilage matrix. An alternate approach, that has progressed
more rapidly, is to develop oral inhibitors of MMPs. In fact,
several synthetic low-molecular-weight inhibitors of MMPs have
proven efficacious in animal models of arthritis and are entering
Phase III clinical trials in humans. The antibiotic tetracycline
and its semisynthetic derivatives doxycycline and minocycline
have modest MMP inhibitory properties and have prompted
investigations of these agents in the treatment of both OA and RA.
Finally, inhibition of IL-1, via administration of a soluble IL-1
receptor or receptor antagonist, represents another rational
strategy for suppressing MMP synthesis, and preliminary studies in
RA are also promising.
[0659] Enhancing the repair of damaged cartilage constitutes
another rational strategy for the treatment and/or prevention of
OA. Administration of exogenous growth factors, such as IGF and
bFGF, to stimulate chondrocyte proliferation and/or matrix
synthesis has had beneficial effects in animal models of OA, and
TGF-β has the added advantage of suppressing MMP synthesis.
The same effect could theoretically also be achieved by
transplanting, into the damaged site, healthy autologous chondrocytes
that have been genetically engineered to over-express one or more of
these growth factors. However, because the area of cartilage loss in OA can
be quite extensive and because older chondrocytes are metabolically
less active than young chondrocytes, it is unclear that either
autologous or heterologous transplants will be practical for the
management of OA.
[0660] In summary, MMPs and pro-inflammatory cytokines (e.g., IL-1)
appear to be important mediators of cartilage destruction in OA.
Synthesis and secretion of growth factors and of inhibitors of MMPs
and cytokines are apparently inadequate to counteract these
degradative forces. Progressive cartilage degradation and OA
result. New therapies, focused on reducing MMP activity and on
stimulating matrix synthesis, are in development.
[0661] NMR Spectroscopy
[0662] As discussed above, many aspects of the present invention
pertain to methods which employ NMR spectra, or data obtained or
derived from NMR spectra.
[0663] The principal nucleus studied in biomedical NMR spectroscopy
is the proton or ¹H nucleus. This is the most sensitive of all
naturally occurring nuclei. The chemical shift range is about 10
ppm for organic molecules. In addition, ¹³C NMR spectroscopy,
using either the naturally abundant 1.1% ¹³C nuclei or
isotopic enrichment, is useful for identifying
metabolites. The ¹³C chemical shift range is about 200 ppm.
Other nuclei find special application. These include ¹⁵N (in
natural abundance or enriched), ¹⁹F for studies of drug
metabolism, and ³¹P for studies of endogenous phosphate
biochemistry either in vitro or in vivo.
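Because chemical shifts are expressed in parts per million of the observation frequency, a given shift range corresponds to a frequency range (in Hz) that scales with the spectrometer frequency. The minimal Python sketch below converts between the two; it assumes a hypothetical spectrometer operating at 600 MHz for ¹H, at which ¹³C is observed at roughly 150.9 MHz. The function names are illustrative only.

```python
"""Sketch: converting between chemical shift (ppm) and frequency offset (Hz)."""


def ppm_to_hz(shift_ppm: float, observe_mhz: float) -> float:
    """Frequency offset in Hz corresponding to a chemical-shift difference in ppm."""
    # One ppm of an observe frequency given in MHz is exactly that many Hz.
    return shift_ppm * observe_mhz


def hz_to_ppm(offset_hz: float, observe_mhz: float) -> float:
    """Chemical-shift difference in ppm for a frequency offset in Hz."""
    return offset_hz / observe_mhz


# Assumed spectrometer: 600 MHz for 1H, which puts 13C at roughly 150.9 MHz.
print(ppm_to_hz(10, 600.0))    # 1H range of about 10 ppm
print(ppm_to_hz(200, 150.9))   # 13C range of about 200 ppm
print(hz_to_ppm(1200, 600.0))  # a 1200 Hz offset at 600 MHz
```

On this assumed instrument the ¹H range of about 10 ppm spans 6000 Hz, while the ¹³C range of about 200 ppm spans roughly 30 kHz. Reporting shifts in ppm keeps them independent of the magnetic field strength, which is why spectra from different instruments can be compared on the same scale.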
[0664] In order to obtain an NMR spectrum, it is necessary to
define a "pulse program". At its simplest, this is application of a
radio-frequency (RF) pulse followed by acquisition of a free
induction decay (FID)--a time-dependent oscillating, decaying
voltage which is digitised in an analog-to-digital converter (ADC). At
equilibrium, the nuclear spins are present in a number of quantum
states and the RF pulse disturbs this equilibrium. The FID is the
result of the spins returning towards the equilibrium state. It is
necessary to choose the length of the pulse (usually a few
microseconds) to give the optimum response.
[0665] This and other experimental parameters are chosen on the
basis of the knowledge and experience of the
spectroscopist. See, for example, T. D. W. Claridge,
High-Resolution NMR Techniques in Organic Chemistry: A Practical
Guide to Modern NMR for Chemists, Oxford University Press, 2000.
These choices depend on the observation frequency to be used and on the known
properties of the nucleus under study (e.g., the expected chemical
shift range determines the spectral width, the desired peak
resolution determines the number of data points, and the relaxation
times determine the recycle time between scans). The number
of scans to be added is determined by the concentration of the
analyte, the inherent sensitivity of the nucleus under study, and
its abundance (either natural or enhanced by isotopic
enrichment).
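The arithmetic linking these parameters can be sketched numerically. The Python below is a minimal illustration. It assumes quadrature detection, with N complex points sampled at a dwell time of 1/SW, and the usual square-root growth of signal-to-noise ratio with the number of scans. The function names are hypothetical and are not taken from any spectrometer software.

```python
import math

def spectral_width_hz(shift_range_ppm, observe_mhz):
    """Spectral width in Hz: at an observation frequency of F MHz, 1 ppm spans F Hz."""
    return shift_range_ppm * observe_mhz

def acquisition_time_s(n_complex_points, sw_hz):
    """Time needed to sample n complex points at a dwell time of 1/SW (assumed)."""
    return n_complex_points / sw_hz

def digital_resolution_hz(sw_hz, n_points):
    """Frequency spacing between adjacent points of the transformed spectrum."""
    return sw_hz / n_points

def scans_for_snr(single_scan_snr, target_snr):
    """Signal adds coherently and noise as sqrt(NS), so S/N grows as sqrt(NS)."""
    return math.ceil((target_snr / single_scan_snr) ** 2)

sw = spectral_width_hz(12, 600)          # a 12 ppm window on a 600 MHz instrument
aq = acquisition_time_s(32768, sw)       # 32,768 complex points
res = digital_resolution_hz(sw, 32768)
ns = scans_for_snr(5, 100)               # hypothetical single-scan S/N of 5, target 100
```

Under these assumptions, a 12 ppm window at 600 MHz corresponds to 7200 Hz. Sampling 32,768 complex points then takes about 4.55 s, giving a digital resolution of about 0.22 Hz. Raising S/N from 5 to 100 needs 400 scans, which shows why dilute analytes or insensitive nuclei are so costly in instrument time.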
[0666] After data acquisition, a number of manipulations
are possible. The FID can be multiplied by a mathematical (window)
function to improve the signal-to-noise ratio or to reduce the peak
line widths; the operator chooses such parameters. The
FID is then often extended with a number of zeros ("zero-filling")
and subjected to Fourier transformation. After this conversion from
time-dependent data to frequency-dependent data, it is necessary to
phase the spectrum so that all peaks appear upright. This is done by
adjusting two parameters (the zero-order and first-order phase
corrections), traditionally by visual inspection on screen, although
automatic routines are now available with reasonable success. At
this point the spectrum baseline may be curved. To remedy this, one
defines points in the spectrum where no peaks appear, and these are
taken to be baseline. Usually a polynomial function is fitted to
these points (other methods are available), and this function is
subtracted from the spectrum to provide a flat baseline. This can
also be done automatically. Other manipulations are also possible.
It is possible to extend the FID forwards or backwards by "linear
prediction" to improve resolution or to remove so-called truncation
artefacts, which occur if data acquisition of a scan is stopped
before the FID has decayed into the noise. All of these decisions
are also applicable to 2- and 3-dimensional NMR spectroscopy.
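A toy version of this processing chain may make the order of operations clearer. The standard-library Python sketch below performs five steps:

- It applies an exponential window. This is one common apodisation function; the text does not name a specific one.
- It zero-fills the FID.
- It performs a plain discrete Fourier transform.
- It applies zero- and first-order phase correction.
- It subtracts a straight-line baseline fitted to chosen peak-free points, which is the degree-1 case of the polynomial fit described above.

Parameter names such as `lb_hz`, `ph0_rad` and `ph1_rad` are illustrative assumptions. Real processing software uses an FFT and interactive or automatic phasing.

```python
import cmath
import math

def exponential_window(fid, dwell_s, lb_hz):
    """Multiply the FID by exp(-pi * LB * t); LB > 0 improves S/N but broadens lines."""
    return [v * math.exp(-math.pi * lb_hz * k * dwell_s) for k, v in enumerate(fid)]

def zero_fill(fid, size):
    """Append zeros so the FID has `size` points (finer digital resolution)."""
    return list(fid) + [0j] * (size - len(fid))

def dft(x):
    """Plain O(N^2) discrete Fourier transform; production code would use an FFT."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * j * k / n) for k in range(n))
            for j in range(n)]

def phase_correct(spec, ph0_rad, ph1_rad=0.0):
    """Zero-order phase (ph0) plus a first-order term varying linearly across the spectrum."""
    n = len(spec)
    return [s * cmath.exp(1j * (ph0_rad + ph1_rad * j / n)) for j, s in enumerate(spec)]

def subtract_linear_baseline(y, baseline_idx):
    """Least-squares straight line through peak-free points, subtracted from every point."""
    xs = list(baseline_idx)
    ys = [y[i] for i in xs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (v - my) for x, v in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return [v - (intercept + slope * i) for i, v in enumerate(y)]

# Synthetic FID: one decaying resonance at 0.25 cycles per point (bin 8 after
# zero-filling to 32 points), recorded with a deliberate 0.5 rad phase error.
fid = [cmath.exp(2j * math.pi * 0.25 * k + 0.5j) * math.exp(-k / 6) for k in range(16)]
spectrum = dft(zero_fill(exponential_window(fid, dwell_s=1.0, lb_hz=0.05), 32))
phased = phase_correct(spectrum, ph0_rad=-0.5)
```

In practice the phase angles are found by inspection or by an automatic routine. They are known in advance here only because the signal is synthetic.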
[0667] An NMR spectrum consists of a series of digital data points,
each with a y-value (relating to signal strength), as a function of
equally spaced x-values (frequency). These data points run
over the whole of the spectrum. Individual peaks in the spectrum
are identified by the spectroscopist or automatically by software,
and the area under each peak is determined either by integration
(summation of the y-values of all points over the peak) or by curve
fitting. A peak can be a single resonance or a multiplet of
resonances corresponding to a single type of nucleus in a
particular chemical environment (e.g., the two protons ortho to the
carboxyl group in benzoic acid). In 2-dimensional NMR spectra, peak
volumes can be integrated in the same way.
The intensity of a peak in an NMR spectrum is proportional to the
number of nuclei giving rise to that peak (if the experiment is
conducted under conditions where each successive accumulated free
induction decay (FID) is taken starting at equilibrium). Also, the
relative intensities of peaks from different analytes in the same
sample are proportional to the relative concentrations of those
analytes, after allowing for the number of nuclei contributing to
each peak (again provided that equilibrium prevails at the start of
each scan).
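As a minimal sketch of these two statements, the Python below first integrates a peak by summing the points in a chemical-shift window. It then converts the peak areas into a concentration relative to an internal standard of known concentration, dividing each area by its number of contributing nuclei. The function names, shift windows and intensities are hypothetical, and the calculation holds only under the fully relaxed conditions stated above. The standard shown is TSP, whose trimethylsilyl group contributes nine equivalent protons.

```python
def integrate(ppm, intensity, lo, hi):
    """Peak area by summation of all points with lo <= chemical shift <= hi."""
    return sum(y for d, y in zip(ppm, intensity) if lo <= d <= hi)

def relative_concentration(area_analyte, n_analyte, area_ref, n_ref, conc_ref):
    """c_analyte = c_ref * (A_analyte / n_analyte) / (A_ref / n_ref).

    Assumes full relaxation between scans, so area is proportional to the number of nuclei."""
    return conc_ref * (area_analyte / n_analyte) / (area_ref / n_ref)

ppm = [3.05, 3.04, 3.03, 3.02, 0.02, 0.01, 0.00, -0.01]
intensity = [0.5, 2.0, 3.0, 0.5, 1.0, 4.0, 3.0, 1.0]
area_analyte = integrate(ppm, intensity, 3.015, 3.055)  # hypothetical 3-proton singlet
area_tsp = integrate(ppm, intensity, -0.015, 0.025)     # 9 equivalent TSP methyl protons
conc = relative_concentration(area_analyte, 3, area_tsp, 9, conc_ref=1.0)  # hypothetical mM
```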
[0668] Thus, the term "NMR spectral intensity," as used herein,
pertains to some measure related to the NMR peak area, and may be
absolute or relative. NMR spectral intensity may be, for example, a
combination of a plurality of NMR spectral intensities, e.g., a
linear combination of a plurality of NMR spectral intensities.
[0669] In the context of NMR spectral intensity, the term "NMR"
refers to any type of NMR spectroscopy.
[0670] NMR spectroscopic techniques can be classified according to
the number of frequency axes; these include 1D-, 2D-, and
3D-NMR. 1D spectra include, for example, single-pulse spectra;
water-peak-eliminated spectra, in which the water signal is removed
either by saturation or by non-excitation; spin-echo spectra, such
as CPMG (i.e., edited on the basis of spin-spin relaxation);
diffusion-edited spectra; and selective excitation of specific
spectral regions. 2D spectra include, for example, J-resolved (JRES);
1H-1H correlation methods, such as NOESY, COSY, TOCSY, and variants
thereof; and heteronuclear correlation, including direct-detection
methods, such as HETCOR, and inverse-detected methods, such as
1H-13C HMQC, HSQC, and HMBC. 3D spectra include many variants, all of
which are combinations of 2D methods, e.g., HMQC-TOCSY, NOESY-TOCSY,
etc. All of these NMR spectroscopic techniques can also be combined
with magic-angle spinning (MAS) in order to study samples other than
isotropic liquids, such as tissues, which are characterised by
anisotropic composition.
[0671] Preferred nuclei include ¹H and ¹³C. Preferred
techniques for use in the present invention include water-peak
eliminated, spin-echo such as CPMG, diffusion edited, JRES, COSY,
TOCSY, HMQC, HSQC, and HMBC.
[0672] NMR analysis (especially of biofluids) is carried out at as
high a field strength as is practical, subject to availability
(very high field machines are not widespread), cost (a 600 MHz
instrument costs about £500,000, but a shielded 800
MHz instrument can cost more than £3,500,000,
depending on the nature of accessory equipment purchased), and
the ability to accommodate the physical size of the instrument.
Maintenance and operational costs do not vary greatly and are small
compared to the capital cost of the machine and the personnel
costs.
[0673] Typically, the ¹H observation frequency is from about
200 MHz to about 900 MHz, more typically from about 400 MHz to
about 900 MHz, yet more typically from about 500 MHz to about 750
MHz. ¹H observation frequencies of 500 and 600 MHz may be
particularly preferred. Instruments with the following ¹H
observation frequencies are or have been commercially available:
200, 250, 270 (discontinued), 300, 360 (discontinued), 400, 500,
600, 700, 750, 800, and 900 MHz.
[0674] Higher frequencies are used to obtain a better signal-to-noise
ratio and greater spectral dispersion of resonances. This gives
a better chance of identifying the molecules giving rise to the
peaks. The benefit is not linear because, in addition to the better
dispersion, coupled multiplets can move from being
"second-order", where analysis by inspection is not possible,
towards "first-order", where it is. Both peak positions and
intensities within multiplets change in a non-linear fashion as
this progression occurs. Lower observation frequencies may be
used where cost is an issue, but this is likely to reduce
effectiveness for classification and identification of
biomarkers.
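A short numerical sketch shows why the benefit is non-linear. A chemical shift difference fixed in ppm grows in Hz with the observation frequency, whereas a scalar coupling constant J (in Hz) does not. The ratio Δν/J, which governs how far a multiplet departs from first-order behaviour, therefore rises with field.

The Python below assumes a hypothetical pair of coupled protons 0.2 ppm apart with J = 7 Hz. It uses the common rule of thumb that spectra look approximately first-order when Δν/J exceeds about 10. It also converts the ¹H frequency to magnetic field strength using γ/2π ≈ 42.577 MHz/T for the proton.

```python
PROTON_GAMMA_MHZ_PER_T = 42.577  # gamma / (2*pi) for 1H, in MHz per tesla

def field_tesla(observe_mhz):
    """Magnetic field B0 corresponding to a given 1H observation frequency."""
    return observe_mhz / PROTON_GAMMA_MHZ_PER_T

def shift_separation_hz(delta_ppm, observe_mhz):
    """A separation fixed in ppm scales linearly with the observation frequency."""
    return delta_ppm * observe_mhz

def dispersion_ratio(delta_ppm, j_hz, observe_mhz):
    """delta-nu / J: small values give second-order multiplets, large values first-order."""
    return shift_separation_hz(delta_ppm, observe_mhz) / j_hz

for mhz in (200, 400, 600, 900):
    ratio = dispersion_ratio(0.2, 7.0, mhz)
    regime = "approximately first-order" if ratio > 10 else "second-order effects likely"
    print(f"{mhz} MHz ({field_tesla(mhz):.1f} T): delta-nu/J = {ratio:.1f}, {regime}")
```

The threshold of 10 is a heuristic rather than a sharp boundary. Intensity distortions ("roofing") fade gradually as the ratio increases.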
[0675] NMR Spectroscopy: Sample Preparation
[0676] NMR spectra can be measured in solid, liquid, liquid crystal
or gas states over a range of temperatures from 120 K to 420 K and
outside this range with specialised equipment. Typically, NMR
analysis of biofluids is performed in the liquid state with a
sample temperature of from about 274 K to about 328 K, but more
typically from about 283 K to about 321 K. An example of a typical
temperature is about 300 K.
[0677] Lower temperatures may be used to ensure that the biofluid
does not decompose or show any effects of chemical or enzymatic
reactions during data acquisition. Higher
temperatures may be used to improve detection of certain species.
For example, in plasma or serum, lipoproteins undergo a series of
phase changes as the temperature is increased; in particular, the
low density lipoprotein (LDL) peak intensities are appreciably
temperature dependent: as the lipoprotein becomes more "liquid", the
lines sharpen and broader, more-difficult-to-detect components
become visible.
[0678] Typically, biofluid samples are diluted with solvent prior
to NMR analysis. This is done for a variety of reasons, including:
to lessen solution viscosity, to control the pH of the solution,
and to allow addition of reagents and reference materials.
[0679] An example of a typical dilution solvent is a solution of
0.9% by weight of sodium chloride in D₂O. The D₂O lowers
the overall concentration of H₂O and eases the technical
requirements for suppressing the solvent water NMR resonance, which
is necessary for optimum detection of metabolite NMR signals. The
deuterium nuclei of the D₂O also provide an NMR signal for
locking the magnetic field, enabling exact co-registration of
successive scans.
[0680] Depending on the amount of biofluid available, the dilution
ratio (sample:solvent) is typically from about 1:50 to about 5:1 by
volume, but more typically from about 1:20 to about 1:1 by volume.
An example of a typical dilution ratio is 3:7 by volume (e.g., 150
µL sample, 350 µL solvent), typical for conventional 5 mm NMR
tubes and for flow-injection NMR spectroscopy.
[0681] Typical sample volumes for NMR analysis are from about 50
µL (e.g., for microprobes) to about 2 mL. An example of a
typical sample volume is about 500 µL.
[0682] NMR peak positions (chemical shifts) are measured relative
to that of a known standard compound, usually added directly to the
sample. For biofluids such as urine, this is commonly a partially
deuterated form of TSP, i.e.,
3-trimethylsilyl-[2,2,3,3-²H₄]-propionate sodium salt.
For biofluids containing high levels of proteins, this substance is
not suitable, since it binds to proteins and shows a broadened NMR
line. In such cases, for example blood plasma, added formate anion
(e.g., as a salt) can be used instead.
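Referencing amounts to a rigid shift of the chemical shift axis. The Python sketch below locates the most intense point within a small search window around the expected reference position. It then shifts the whole axis so that this point lies at the reference value. The search window, the target value and the function name are assumptions for illustration. By convention the TSP signal is set to δ 0.0; a formate reference would use its own accepted shift value.

```python
def reference_axis(ppm, intensity, window=(-0.2, 0.2), target=0.0):
    """Return a copy of the ppm axis shifted so the tallest point within
    `window` (assumed to be the reference peak) sits exactly at `target`."""
    candidates = [i for i, d in enumerate(ppm) if window[0] <= d <= window[1]]
    if not candidates:
        raise ValueError("no points inside the reference search window")
    peak = max(candidates, key=lambda i: intensity[i])
    offset = target - ppm[peak]
    return [d + offset for d in ppm]

# Hypothetical spectrum whose TSP peak was recorded at 0.05 instead of 0.0.
ppm = [0.30, 0.20, 0.10, 0.05, 0.00, -0.10]
intensity = [0.1, 0.2, 1.0, 9.0, 1.0, 0.1]
corrected = reference_axis(ppm, intensity)
```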
[0683] NMR Spectroscopy: Manipulation of NMR Spectra
[0684] NMR spectra are typically acquired, and subsequently
handled, in digitised form. Conventional methods of spectral
pre-processing of (digital) spectra are well known and include,
where applicable, signal averaging, Fourier transformation (and
other transformation methods), phase correction, baseline
correction, smoothing, and the like (see, for example, Lindon et
al., 1980).
[0685] Modern spectroscopic methods often permit the collection of
high or very high resolution spectra. In digital form, even a
simple spectrum (e.g., signal versus spectroscopic parameter) may
have many thousands, if not tens of thousands, of data points. It is
often desirable to reduce or compress the data to give fewer data
points, both for computational practicality and to effect
some degree of signal averaging that compensates for physical
effects, such as pH variation, compartmentalisation, and the like.
The resulting data may be referred to as "spectral data."
[0686] For example, a typical ¹H NMR spectrum is recorded as
signal intensity versus chemical shift (δ), which ranges from
about δ 0 to δ 10. At a typical chemical shift
resolution of about 10⁻⁴ to 10⁻³ ppm, the spectrum
in digital form comprises about 10,000 to 100,000 data points. As
discussed above, it is often desirable to compress these data, for
example, by a factor of about 10 to 100, to about 1000 data
points.
[0687] For example, in one approach, the chemical shift axis,
δ, is "segmented" into "buckets" or "bins" of a specific
length. For a 1-D ¹H NMR spectrum which spans the range from
δ 0 to δ 10, using a bucket length, Δδ, of
0.04 yields 250 buckets, for example, δ 10.0-9.96, δ
9.96-9.92, δ 9.92-9.88, etc., usually reported by their
midpoints, for example, δ 9.98, δ 9.94, δ 9.90,
etc. The signal intensity within a given bucket may be averaged or
integrated, and the resulting value reported. In this way, a
spectrum with, for example, 100,000 original data points can be
compressed to an equivalent spectrum with, for example, 250 data
points.
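The bucketing step can be sketched in a few lines of Python. The hypothetical helper below is not the AMIX implementation mentioned later. It assigns each point to a bucket counted down from the high-δ end, as in the example above. It then either sums (integrates) or averages the intensities, and labels each bucket by its midpoint. A point lying exactly on a bucket boundary is assigned to the lower-δ bucket; this is an arbitrary choice.

```python
def bucket_spectrum(ppm, intensity, hi=10.0, lo=0.0, width=0.04, mode="sum"):
    """Compress a 1-D spectrum into buckets of `width` ppm between `lo` and `hi`.

    Returns (midpoints, values) ordered from high to low chemical shift.
    mode="sum" integrates each bucket; mode="mean" averages it."""
    n_buckets = round((hi - lo) / width)
    sums = [0.0] * n_buckets
    counts = [0] * n_buckets
    for d, y in zip(ppm, intensity):
        if lo <= d <= hi:
            # Bucket 0 spans hi down to hi - width; clamp d == lo into the last bucket.
            idx = min(int((hi - d) / width), n_buckets - 1)
            sums[idx] += y
            counts[idx] += 1
    if mode == "mean":
        values = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    else:
        values = sums
    midpoints = [hi - (i + 0.5) * width for i in range(n_buckets)]
    return midpoints, values

# 10,000 evenly spaced points of constant intensity between delta 10 and delta 0.
ppm = [10.0 - (k + 0.5) * 0.001 for k in range(10_000)]
mids, vals = bucket_spectrum(ppm, [1.0] * len(ppm))
```

Each of the 250 buckets here receives 40 points. Summing therefore gives 40 per bucket, while `mode="mean"` would give 1.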
[0688] A similar approach can be applied to 2-D spectra, 3-D
spectra, and the like. For 2-D spectra, the "bucket" approach may
be extended to a "patch." For 3-D spectra, the "bucket" approach
may be extended to a "volume." For example, for a 2-D ¹H NMR
spectrum which spans the range from δ 0 to δ 10 on both
axes, using patches of Δδ 0.1 × Δδ 0.1
yields 10,000 patches. In this way, a spectrum with perhaps
10⁸ original data points can be compressed to an equivalent
spectrum of 10⁴ data points.
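For a 2-D spectrum held as a regular grid, patching reduces to summing non-overlapping rectangular blocks. The Python sketch below assumes the grid dimensions are exact multiples of the patch size; any remainder rows or columns are simply dropped. The function name is illustrative.

```python
def patch_spectrum(grid, rows_per_patch, cols_per_patch):
    """Sum a 2-D intensity grid (list of rows) over non-overlapping rectangular patches."""
    n_row_patches = len(grid) // rows_per_patch
    n_col_patches = len(grid[0]) // cols_per_patch
    patches = []
    for a in range(n_row_patches):
        row_block = grid[a * rows_per_patch:(a + 1) * rows_per_patch]
        patches.append([
            sum(sum(r[b * cols_per_patch:(b + 1) * cols_per_patch]) for r in row_block)
            for b in range(n_col_patches)
        ])
    return patches

# A 100 x 100 grid (10^4 points) reduced to 10 x 10 patches of 10 x 10 points each.
grid = [[1.0] * 100 for _ in range(100)]
patched = patch_spectrum(grid, 10, 10)
```

The same logic scales to the 10⁸-point example above. At that size, however, a compiled array library would normally be used rather than pure Python.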
[0689] In this context, the equivalent spectrum may be referred to
as "a spectral data set," "a data set comprising spectral data,"
etc.
[0690] Software for such processing of NMR spectra, for example
AMIX (Analysis of MIXture, V 2.5, Bruker Analytik, Rheinstetten,
Germany) is commercially available.
[0691] Often, certain spectral regions carry no real diagnostic
information, or carry conflicting biochemical information, and it
is often useful to remove these "redundant" regions before
performing detailed analysis. In the simplest approach, the data
points are deleted. In another simple approach, the data in the
redundant regions are replaced with zero values.
[0692] For example, because water is present at a far higher
concentration than the other molecules (a dynamic range problem),
the water resonance (around δ 4.7) is suppressed. However, small
variations in water suppression remain, and these variations can
undesirably complicate analysis. Variations in water suppression
may also affect the urea signal (around δ 6.0) through
cross-saturation. Therefore, it is often useful to delete certain
spectral regions, for example, from about δ 4.5 to 6.0 (e.g.,
δ 4.52 to 6.00).
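Both simple approaches to removing redundant regions, deleting the points or replacing them with zeros, can be sketched as follows. The Python below operates on a data matrix whose columns are labelled by bucket midpoints in ppm. The default region, δ 4.52 to 6.00, follows the example above; the function name and the `mode` parameter are illustrative assumptions.

```python
def exclude_regions(ppm, X, regions=((4.52, 6.00),), mode="delete"):
    """Remove redundant spectral regions from a data matrix X (rows = spectra).

    mode="delete" drops the affected columns; mode="zero" keeps them but sets them to 0."""
    drop = [any(lo <= d <= hi for lo, hi in regions) for d in ppm]
    if mode == "delete":
        keep = [j for j, flag in enumerate(drop) if not flag]
        return [ppm[j] for j in keep], [[row[j] for j in keep] for row in X]
    return list(ppm), [[0.0 if drop[j] else v for j, v in enumerate(row)] for row in X]

ppm = [9.98, 5.50, 4.70, 4.50, 1.30]
X = [[1.0, 2.0, 3.0, 4.0, 5.0]]
kept_ppm, kept_X = exclude_regions(ppm, X)
_, zeroed_X = exclude_regions(ppm, X, mode="zero")
```

Zeroing keeps the column layout identical across datasets. Deletion removes the columns entirely, so they cannot contribute to later scaling steps.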
[0693] In general, NMR data are handled as a data matrix. Typically,
each row in the matrix corresponds to an individual sample (often
referred to as a "data vector"), and the entries in the columns
are, for example, the spectral intensity of a particular data point
at a particular δ or Δδ (the columns often being referred to as
"descriptors").
[0694] It is often useful to pre-process data, for example, by
addressing missing data, translation, scaling, weighting, etc.
[0695] Multivariate projection methods, such as principal component
analysis (PCA) and partial least squares analysis (PLS), are
so-called scaling-sensitive methods. By using prior knowledge and
experience about the type of data studied, the quality of the data
prior to multivariate modelling can be enhanced by scaling and/or
weighting. Adequate scaling and/or weighting can reveal the
important and interesting variation hidden within the data, and
therefore make subsequent multivariate modelling more efficient.
Scaling and weighting may be used to place the data in the correct
metric, based on knowledge and experience of the studied system,
and therefore reveal patterns already inherently present in the
data.
[0696] If at all possible, missing data, for example, gaps in
column values, should be avoided. However, if necessary, such
missing data may be replaced or "filled" with, for example, the mean
value of a column ("mean fill"); a random value ("random fill"); or
a value based on a principal component analysis ("principal
component fill"). Each of these different approaches will have a
different effect on subsequent PR analysis.
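Of these, mean fill is the simplest to illustrate. The Python sketch below marks missing entries with `None`, which is an assumed convention. It replaces each missing entry with the mean of the observed values in the same column. Random fill and principal component fill are not shown.

```python
import statistics

def mean_fill(X):
    """Replace missing entries (None) by the mean of the observed values in that column."""
    means = []
    for col in zip(*X):
        observed = [v for v in col if v is not None]
        # A column with no observed values is filled with 0.0 (an arbitrary choice).
        means.append(statistics.fmean(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in X]

X = [[1.0, None], [3.0, 4.0], [None, 6.0]]
filled = mean_fill(X)
```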
[0697] "Translation" of the descriptor coordinate axes can be
useful. Examples of such translation include normalisation and mean
centring.
[0698] "Normalisation" may be used to remove sample-to-sample
variation. Many normalisation approaches are possible, and they can
often be applied at any of several points in the analysis. Usually,
normalisation is applied after redundant spectral regions have been
removed. In one approach, each spectrum is normalised (scaled) by a
factor of 1/A, where A is the sum of the absolute values of all of
the descriptors for that spectrum. In this way, each data vector
has the same length, measured as the sum of absolute values (the
L1 norm), specifically, 1. For example, if the sum of
the absolute values of intensities for each bucket in a particular
spectrum is 1067, then the intensity for each bucket for this
particular spectrum is scaled by 1/1067.
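A minimal Python sketch of this total-area normalisation follows, assuming each spectrum is a list of bucket intensities. An all-zero spectrum is returned unchanged to avoid division by zero.

```python
def normalise_total_area(X):
    """Scale each spectrum (row) by 1/A, where A is the sum of absolute intensities."""
    out = []
    for row in X:
        a = sum(abs(v) for v in row)
        out.append([v / a for v in row] if a > 0 else list(row))
    return out

X = [[1.0, 3.0, 0.0], [2.0, 2.0, 4.0]]
Xn = normalise_total_area(X)
```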
[0699] "Mean centring" may be used to simplify interpretation.
Usually, for each descriptor, the average value of that descriptor
over all samples is subtracted. In this way, the mean of each
descriptor coincides with the origin, and all descriptors are
"centred" at zero. For example, if the average intensity at δ
10.0-9.96, over all spectra, is 1.2 units, then the intensity at
δ 10.0-9.96, for all spectra, is reduced by 1.2 units.
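Column-wise mean centring can be sketched in Python as follows, with rows as spectra and columns as descriptors. The column means are returned as well, on the assumption that the same values would be reused to centre any new samples.

```python
import statistics

def mean_centre(X):
    """Subtract each descriptor's (column's) mean over all samples; rows are spectra."""
    means = [statistics.fmean(col) for col in zip(*X)]
    centred = [[v - m for v, m in zip(row, means)] for row in X]
    return centred, means

X = [[1.2, 5.0], [0.8, 3.0], [1.6, 4.0]]
Xc, means = mean_centre(X)
```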
[0700] In "unit variance scaling," data are scaled to equal
variance. Usually, the value of each descriptor is scaled by
1/StDev, where StDev is the standard deviation of that descriptor
over all samples. For example, if the standard deviation at δ
10.0-9.96, over all spectra, is 2.5 units, then the intensity at
δ 10.0-9.96, for all spectra, is scaled by 1/2.5, or 0.4. Unit
variance scaling may be used to reduce the impact of "noisy" data.
For example, some metabolites in biofluids show a strong degree of
physiological variation (e.g., diurnal variation, dietary-related
variation) that is unrelated to any pathophysiological process.
Without unit variance scaling, these noisy metabolites may dominate
subsequent analysis.
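Unit variance scaling can be sketched in Python as below. The text does not say which standard deviation is meant, so the sample standard deviation (n − 1 denominator) is an assumption here. Descriptors with zero variance are left unscaled to avoid division by zero.

```python
import statistics

def unit_variance_scale(X):
    """Divide each descriptor (column) by its sample standard deviation over all samples."""
    sds = [statistics.stdev(col) for col in zip(*X)]
    factors = [1.0 / s if s > 0 else 1.0 for s in sds]  # zero-variance columns unchanged
    return [[v * f for v, f in zip(row, factors)] for row in X], sds

X = [[1.0, 2.0], [3.0, 2.0], [5.0, 2.0]]
Xs, sds = unit_variance_scale(X)
```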
[0701] "Pareto scaling" is, in some sense, intermediate between
mean centring and unit variance scaling. In effect, smaller peaks
in the spectra can influence the model to a greater degree than in
the mean-centred case. Also, the loadings are, in general, more
interpretable than for unit-variance-based models. In Pareto
scaling, the value of each descriptor is scaled by 1/sqrt(StDev),
where StDev is the standard deviation of that descriptor over all
samples. In this way, each descriptor has a variance numerically
equal to its initial standard deviation. Pareto scaling may be
performed, for example, on raw data or on mean-centred data.
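The statement that each Pareto-scaled descriptor ends up with a variance equal to its original standard deviation follows because scaling by 1/√s multiplies the variance s² by 1/s. The Python sketch below checks this directly. As in the other scaling examples, the sample standard deviation is an assumed convention.

```python
import math
import statistics

def pareto_scale(X):
    """Divide each descriptor (column) by the square root of its standard deviation."""
    sds = [statistics.stdev(col) for col in zip(*X)]
    factors = [1.0 / math.sqrt(s) if s > 0 else 1.0 for s in sds]
    return [[v * f for v, f in zip(row, factors)] for row in X]

X = [[0.0, 1.0], [4.0, 1.5], [8.0, 2.0]]
Xp = pareto_scale(X)
```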
[0702] "Logarithmic scaling" may be used to assist interpretation
when data have a positive skew and/or when data span a large
range, e.g., several orders of magnitude. Usually, for each
descriptor, the value is replaced by the logarithm of that value.
For example, the intensity at δ 10.0-9.96 is replaced by the
logarithm of the intensity at δ 10.0-9.96, for all
spectra.
[0703] In "equal range scaling," each descriptor is divided by the
range of that descriptor over all samples. In this way, all
descriptors have the same range, that is, 1. For example, if, at
δ 10.0-9.96, for all spectra, the largest value is 87 units
and the smallest value is 1, then the range is 86 units, and the
intensity at δ 10.0-9.96, for all spectra, is divided by 86
units. However, this method is sensitive to the presence of outlier
points.
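Logarithmic and equal range scaling can be sketched together in Python. The text does not fix a logarithm base, so base 10 is assumed. The `offset` argument is a hypothetical addition that shifts values to be positive, since logarithms are undefined for zero or negative intensities (for example, after mean centring).

```python
import math

def log_scale(X, offset=0.0):
    """Replace each value by its base-10 logarithm (after adding an optional offset)."""
    return [[math.log10(v + offset) for v in row] for row in X]

def range_scale(X):
    """Divide each descriptor (column) by its range (max - min) over all samples."""
    ranges = [max(col) - min(col) for col in zip(*X)]
    return [[v / r if r > 0 else v for v, r in zip(row, ranges)] for row in X]

X = [[87.0, 10.0], [1.0, 100.0], [44.0, 1000.0]]
Xr = range_scale(X)  # first column divided by 87 - 1 = 86, as in the example above
Xl = log_scale(X)
```

A single extreme value changes max(col) − min(col), and therefore rescales every other entry in that column. This is the outlier sensitivity noted above.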
[0704] In "autoscaling," each descriptor is mean centred and unit
variance scaled. This technique is very useful because each
descriptor is then weighted equally and, in the case of NMR
descriptors, large and small peaks are treated with equal emphasis.
This can be important for metabolites present at very low, but
still detectable, levels.
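Autoscaling combines the two previous operations column by column. In the Python sketch below, a large peak and a very small peak whose intensities vary in proportion end up with identical autoscaled values. This illustrates the "equal emphasis" described above. The sample standard deviation is an assumed convention, and the data are hypothetical.

```python
import statistics

def autoscale(X):
    """Mean centre and unit variance scale each descriptor (column) of X."""
    cols = list(zip(*X))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]
    return [[(v - m) / s if s > 0 else v - m for v, m, s in zip(row, means, sds)]
            for row in X]

# Column 1: a large peak; column 2: a peak five orders of magnitude smaller.
X = [[1000.0, 0.010], [1200.0, 0.012], [800.0, 0.008], [1100.0, 0.011]]
Xa = autoscale(X)
```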
[0705] Several supervised methods of scaling data are also known.
Some of these can provide a measure of the ability of a parameter
(e.g., a descriptor) to discriminate between classes, and can be
used to improve classification by stretching the separation between
classes.
[0706] For example, in "variance weighting," the variance weight of
a single parameter (e.g., a descriptor) is calculated as the ratio
of the inter-class variance to the sum of the intra-class
variances. A large value means that this variable discriminates
between the classes. For example, if the samples are known to fall
into two classes (e.g., a training set), it is possible to examine
the mean and variance of each descriptor within each class. If a
descriptor has very different class means and small within-class
variances, then it will be good at separating the classes.
[0707] "Feature weighting" is a more general form of
variance weighting, in which not only the mean and standard deviation
of each descriptor are calculated, but other well-known weighting
factors, such as the Fisher weight, are used.
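For two classes, one widely used form of such a weight is the Fisher weight, (m₁ − m₂)² / (s₁² + s₂²). Here m and s² are the class means and within-class variances of a descriptor. The Python below is a minimal sketch of that formula applied descriptor by descriptor. It illustrates the idea rather than reproducing the exact weighting of any particular software, and the use of sample variances (n − 1 denominator) is an assumption.

```python
import statistics

def fisher_weight(values_a, values_b):
    """(mean_a - mean_b)^2 / (var_a + var_b) for one descriptor in two classes.

    Raises ZeroDivisionError if both within-class variances are zero."""
    diff = statistics.fmean(values_a) - statistics.fmean(values_b)
    denom = statistics.variance(values_a) + statistics.variance(values_b)
    return diff * diff / denom

def descriptor_weights(X, labels):
    """Fisher weight of every column of X (rows = samples) for a two-class labelling."""
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    weights = []
    for col in zip(*X):
        a = [v for v, c in zip(col, labels) if c == classes[0]]
        b = [v for v, c in zip(col, labels) if c == classes[1]]
        weights.append(fisher_weight(a, b))
    return weights

X = [[1.0, 5.0], [2.0, 3.0], [3.0, 4.0], [7.0, 4.0], [8.0, 6.0], [9.0, 2.0]]
labels = ["control", "control", "control", "case", "case", "case"]
w = descriptor_weights(X, labels)
```

The first descriptor separates the hypothetical classes cleanly and receives a weight of 18. The second has equal class means and receives a weight of 0.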
[0708] Multivariate Statistical Analysis
[0709] As discussed above, multivariate statistical analysis
methods, including pattern recognition methods, are often the most
convenient and efficient way to analyse complex data, such as NMR
spectra.
[0710] For example, such analysis methods may be used to identify
diagnostic spectral windows and/or diagnostic species
for a particular condition under study.
[0711] Also, such analysis methods may be used to form a predictive
model, which can then be used to classify test data. For example,
one convenient and particularly effective method of classification
employs multivariate statistical analysis modelling, first to form
a model (a "predictive mathematical model") using data ("modelling
data") from samples of known class (e.g., from subjects known to
have, or not to have, a particular condition), and second to
classify an unknown sample (e.g., "test data") as having, or not
having, that condition.
[0712] Examples of pattern recognition methods include, but are not
limited to, Principal Component Analysis (PCA) and Partial Least
Squares-Discriminant Analysis (PLS-DA).
[0713] PCA is a bilinear decomposition method used for overviewing
"clusters" within multivariate data. The data are represented in
K-dimensional space (where K is equal to the number of variables)
and reduced to a few principal components (or latent variables)
which describe the maximum variation within the data, independent
of any knowledge of class membership (i.e., "unsupervised"). The
principal components are displayed as a set of "scores" (t) which
highlight clustering, trends, or outliers, and a set of "loadings"
(p) which highlight the influence of input variables on t (see, for
example, Kowalski et al., 1986).
[0714] The PCA decomposition can be described by the following
equation:
X=TP'+E
[0715] where T is the set of scores explaining the systematic
variation between the observations in X, and P is the set of
loadings, which describes the variation between variables and
accounts for the clusters, trends, and outliers in the score space.
The non-systematic part of the variation not explained by the model
forms the residuals, E.
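The decomposition X = TP' + E can be computed one component at a time with the NIPALS algorithm, the same algorithm referred to below for OSC. The following is a minimal, dependency-free Python sketch. It assumes X has already been mean centred (and scaled, if desired), and it returns the score vectors T, unit-length loading vectors P and the residual matrix E. Production work would use an established linear-algebra library.

```python
import math

def nipals_pca(X, n_components, tol=1e-14, max_iter=500):
    """Return scores T, unit-length loadings P and residuals E with X = T P' + E.

    X is a list of rows (samples) that has already been mean centred. Asking for
    more components than the rank of X would divide by zero in this sketch."""
    E = [list(row) for row in X]
    n, k = len(E), len(E[0])
    T, P = [], []
    for _ in range(n_components):
        # Start from the column with the largest sum of squares.
        j0 = max(range(k), key=lambda c: sum(E[i][c] ** 2 for i in range(n)))
        t = [E[i][j0] for i in range(n)]
        for _ in range(max_iter):
            tt = sum(v * v for v in t)
            p = [sum(E[i][c] * t[i] for i in range(n)) / tt for c in range(k)]  # p = E't / t't
            norm = math.sqrt(sum(v * v for v in p))
            p = [v / norm for v in p]
            t_new = [sum(E[i][c] * p[c] for c in range(k)) for i in range(n)]  # t = E p
            change = sum((a - b) ** 2 for a, b in zip(t, t_new))
            t = t_new
            if change < tol * sum(v * v for v in t):
                break
        E = [[E[i][c] - t[i] * p[c] for c in range(k)] for i in range(n)]  # deflate
        T.append(t)
        P.append(p)
    return T, P, E

# Rank-one example: every sample is a multiple of the loading (0.6, 0.8).
X = [[0.6, 0.8], [-0.6, -0.8], [1.2, 1.6], [-1.2, -1.6]]
T, P, E = nipals_pca(X, n_components=1)
```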
[0716] PLS-DA is a supervised multivariate method yielding latent
variables describing maximum separation between known classes of
samples. PLS-DA is based on PLS, which is the regression extension
of the PCA method explained earlier. Whereas PCA works to explain
maximum variation between the studied samples, PLS-DA aims to
explain maximum separation between known classes of samples in the
data (X). This is done by a PLS regression against a "dummy vector
or matrix" (Y) carrying the class-separating information. The
calculated PLS components will thereby be more focused on
describing the variation separating the classes in X, if this
information is present in the data. From an interpretation point of
view, all the features of PLS can be used, which means that the
variation can be interpreted in terms of scores (t, u), loadings
(p, c), PLS weights (w), and regression coefficients (b). The fact
that a regression is carried out against a known class separation
means that PLS-DA is a supervised method and that the class
membership has to be known prior to the actual modelling. Once a
model is calculated and validated, it can be used for prediction of
class membership for "new" unknown samples. Class membership is
judged on the basis of predicted class membership (Ypred),
predicted scores (tpred), and predicted residuals (DModXpred), using
statistical significance limits for the decision. See, for example,
Sjostrom et al., 1986; Stahle et al., 1987.
[0717] In PLS, the variation between the objects in X is described
by the X-scores, T, and the variation in the Y-block regressed
against is described by the Y-scores, U. In PLS-DA the Y-block is a
"dummy vector or matrix" describing the class membership of each
observation. In essence, PLS maximises the covariance
between T and U. For each component, a PLS weight vector, w, is
calculated, containing the influence of each X-variable on the
explanation of the variation in Y. Together, the weight vectors
form a matrix, W, containing the variation in X that maximises the
covariance between the scores T and U for each calculated
component. For PLS-DA this means that the weights, W, contain the
variation in X that is correlated to the class separation described
in Y. The Y-block matrix of weights is designated C. A matrix of
X-loadings, P, is also calculated. Apart from their use in
interpretation, these loadings are used to perform the proper
decomposition of X.
[0718] The PLS decomposition of X and Y can hence be described as
follows:
X=TP'+E
Y=TC'+F
[0719] The PLS regression coefficients, B, are then given by:
B=W(P'W)⁻¹C'
[0720] The estimate of Y, Ŷ, can then be calculated
according to the following formula:
Ŷ=XW(P'W)⁻¹C'=XB
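For a two-class problem the dummy Y-block reduces to a single column, and PLS-DA becomes PLS1 regression. The Python sketch below is a minimal NIPALS-style PLS1. It assumes mean-centred X and a mean-centred dummy response coded −1/+1, which is a common but not the only coding. Prediction replays the deflation steps for the new observation. This gives the same result as multiplying by B = W(P'W)⁻¹C' without forming the inverse. Function names are hypothetical, and no validation or significance limits are included.

```python
import math

def pls1_fit(X, y, n_components):
    """Return weights W, X-loadings P and Y-weights C for PLS1 on centred X and y."""
    E = [list(row) for row in X]
    f = list(y)
    n, k = len(E), len(E[0])
    W, P, C = [], [], []
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(k)]  # w proportional to X'y
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break  # no remaining covariance with y to explain
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(k)) for i in range(n)]  # scores t = Xw
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(k)]  # X-loadings
        c = sum(f[i] * t[i] for i in range(n)) / tt  # Y-weight
        E = [[E[i][j] - t[i] * p[j] for j in range(k)] for i in range(n)]  # deflate X
        f = [f[i] - t[i] * c for i in range(n)]  # deflate y
        W.append(w)
        P.append(p)
        C.append(c)
    return W, P, C

def pls1_predict(x, W, P, C):
    """Predicted y for one centred observation x; equals x B with B = W(P'W)^-1 C'."""
    x = list(x)
    y_hat = 0.0
    for w, p, c in zip(W, P, C):
        t = sum(a * b for a, b in zip(x, w))
        y_hat += t * c
        x = [a - t * b for a, b in zip(x, p)]
    return y_hat

# Hypothetical centred data: class -1 has low values of the first variable, class +1 high.
X = [[-2.0, 0.1], [-1.8, -0.1], [2.0, 0.1], [1.8, -0.1]]
y = [-1.0, -1.0, 1.0, 1.0]
W, P, C = pls1_fit(X, y, n_components=1)
```

Assigning a new sample by the sign of its prediction is the simplest decision rule. The text above instead describes using statistical limits on Ypred, tpred and DModXpred.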
[0721] Both of the pattern recognition algorithms exemplified
herein (PCA, PLS-DA) rely on extraction of linear associations
between the input variables. When such linear relationships are
insufficient, neural network-based pattern recognition techniques
can in some cases improve the ability to classify individuals on
the basis of the many inter-related input variables (see, e.g.,
Ala-Korpela et al., 1995; Hiltunen et al., 1995). Nevertheless, the
methods applied herein are sufficiently powerful to allow
classification of the individuals studied, and they provide an
additional benefit over neural network methods in that they allow
some information to be gained as to what aspects of the input
dataset were particularly important in allowing classification to
be made.
[0722] Spurious or irregular data in spectra ("outliers"), which
are not representative, are preferably identified and removed.
Common reasons for irregular data ("outliers") include spectral
artefacts such as poor phase correction, poor baseline correction,
poor chemical shift referencing, poor water suppression, and
biological effects such as bacterial contamination, shifts in the
pH of the biofluid, toxin- or disease-induced biochemical response,
and other conditions, e.g., pathological conditions, which have
metabolic consequences, e.g., diabetes.
[0723] Outliers are identified in different ways depending on the
method of analysis used. For example, when using principal
component analysis (PCA), small numbers of samples lying far from
the rest of the replicate group can be identified by eye as
outliers. A more objective means of identification for PCA is to
use Hotelling's T² test, which is the multivariate generalisation of
the well-known Student's t test used in univariate statistics. For
any given sample, the T² value can be calculated and compared
with a standard value within which a chosen fraction (e.g., 95%) of
the samples would normally lie. Samples with T² values
substantially outside this limit can then be flagged as
outliers.
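For a PCA model with A components, a sample's T² is commonly computed from its scores as T²ᵢ = Σₐ tᵢₐ² / sₐ². Here sₐ² is the variance of score vector a over the samples. The Python sketch below computes this and flags samples above a limit. The limit is passed in as a parameter because its usual derivation from the F-distribution, for a chosen confidence level such as 95%, would need a statistics library and is not reproduced here. The limit value shown is hypothetical.

```python
import statistics

def hotelling_t2(scores):
    """T^2 for each sample from its PCA scores (rows = samples, columns = components)."""
    variances = [statistics.variance(col) for col in zip(*scores)]
    return [sum(t * t / v for t, v in zip(row, variances)) for row in scores]

def flag_outliers(t2_values, limit):
    """Indices of samples whose T^2 exceeds the supplied critical value."""
    return [i for i, t2 in enumerate(t2_values) if t2 > limit]

scores = [[1.0, 0.5], [-1.0, -0.5], [1.0, -0.5], [-1.0, 0.5], [6.0, 0.0], [-6.0, 0.0]]
t2 = hotelling_t2(scores)
suspects = flag_outliers(t2, limit=2.0)  # hypothetical critical value
```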
[0724] Also, when using more sophisticated supervised methods, such
as SIMCA or PNNs, a similar method is used. A confidence level
(e.g., 95%) is selected and the region of multivariate space
corresponding to confidence values above this limit is determined.
This region can be displayed graphically in several different ways
(for example by plotting the critical T2 ellipse on a PCA scores
plot). Any samples falling outside the high confidence region are
flagged as potential outliers.
[0725] Confidence limits for outlier detection are also calculated
in the residual direction, expressed as the distance to model in X
(DModX).
[0726] Briefly, DModX is the perpendicular distance of an object to
the principal component (or to the plane or hyperplane made up by
two or more principal components). In the SIMCA software, DModX is
calculated as:
DModX=v*sqrt(Σe²/(K-A))
[0727] wherein e are the residuals for a single observation, the
sum running over the K variables;
[0728] K is the number of original variables in the data set;
[0729] A is the number of principal components in the model;
[0730] v is a correction factor, based on the number of
observations (N) and the number of principal components (A), and is
slightly larger than one.
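Given the residual matrix E of a fitted model, DModX follows directly from the formula above. The Python sketch below takes the correction factor v as an input, defaulting to 1.0, rather than reproducing any particular software's expression for it. The example residuals are hypothetical.

```python
import math

def dmodx(residuals, n_components, v=1.0):
    """DModX for each observation: v * sqrt(sum of squared residuals / (K - A))."""
    k = len(residuals[0])
    if k <= n_components:
        raise ValueError("need more variables than model components")
    return [v * math.sqrt(sum(e * e for e in row) / (k - n_components))
            for row in residuals]

E = [[0.3, -0.4, 0.0, 0.0], [0.1, 0.1, -0.1, 0.1]]
distances = dmodx(E, n_components=2)
```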
[0731] Outliers in this direction are not as severe as those
occurring in the score direction, but should always be carefully
examined before deciding whether to include them in the
modelling. In general, all outliers are thoroughly
investigated, for example, by examining the contributing loadings
and distance to model (DModX) as well as visually inspecting the
original NMR spectrum for deviating features, before removing them
from the model. Outlier detection by automatic algorithm is a
possibility using the features of scores and residual distance to
model (DModX) described above.
[0732] When using PLS methods, the distance to the model in Y
(DModY) can also be calculated in the same way.
[0733] Data Filtering
[0734] Although pattern recognition methods may be applied to
"unfiltered" data, it is often preferable first to filter the data
to remove irrelevant variation.
[0735] In one method, latent variables which are of no interest may
be removed by "filtering."
[0736] Examples of filtering methods include the regression of
descriptor variables against an index based on sample class to
eliminate variables with low correlation to the predefined classes.
Related methods include target rotation (see, e.g., Kvalheim et
al., 1989) and PCT filtering (see, e.g., Sun, 1997). In these
methods, the removed variation is not necessarily completely
uncorrelated with sample class (i.e., orthogonal).
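The first of these filtering ideas, regressing each descriptor against a class index and eliminating those with low correlation, can be sketched as below. Three elements are assumptions for illustration: the 0/1 coding of the class index, the use of the Pearson correlation coefficient as the measure of association, and the threshold value.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant descriptor carries no class information
    return sxy / math.sqrt(sxx * syy)

def filter_by_class_correlation(X, class_index, threshold=0.5):
    """Column indices of X whose |r| with the class index reaches the threshold."""
    return [j for j, col in enumerate(zip(*X))
            if abs(pearson_r(col, class_index)) >= threshold]

X = [[1.0, 3.0], [2.0, 1.0], [8.0, 2.0], [9.0, 2.0]]
class_index = [0, 0, 1, 1]
kept = filter_by_class_correlation(X, class_index)
```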
[0737] In another method, latent variables which are orthogonal to
some variation or class index of interest are removed by
"orthogonal filtering." Here, variation in the data which is not
correlated to (i.e., is orthogonal to) the class separating
variation of interest may be removed. Such methods are, in general,
more efficient than non-orthogonal filtering methods.
[0738] Various orthogonal filtering methods have been described
(see, e.g., Wold et al., 1998a; Fearn, 2000; Anderson, 1999;
Westerhuis et al., 2001; Wise et al., 2001).
[0739] One preferred orthogonal filtering method is conventionally
referred to as Orthogonal Signal Correction (OSC), wherein latent
variables orthogonal to the variation of interest are removed. See,
for example, Wold et al., 1998a.
[0740] The class identity is used as a response vector, Y, to
describe the variation between the sample classes. The OSC method
then locates the longest vector describing the variation between
the samples which is not correlated with the Y-vector, and removes
it from the data matrix. The resultant dataset has been filtered to
allow pattern recognition focused on the variation correlated to
features of interest within the sample population, rather than
non-correlated, orthogonal variation.
[0741] OSC is a method for spectral filtering that solves the
problem of unwanted systematic variation in the spectra by removing
components (latent variables) that are orthogonal to the response
being calibrated against. In PLS, the weights, w, are calculated to
maximise the covariance between X and Y. In OSC, in contrast, the
weights, w, are calculated to minimise the covariance between X and
Y, which is the same as calculating components as close to
orthogonal to Y as possible. These components, which are orthogonal
to Y and contain unwanted systematic variation, are then subtracted
from the spectral data, X, to produce a filtered predictor matrix
describing the variation of interest. Briefly, OSC can be described
as a bilinear decomposition of the spectral matrix, X, into a set of
scores, T**, and a set of corresponding loadings, P**, containing
variation orthogonal to the response, Y. The unexplained part, or
residuals, E, is equal to the filtered X-matrix, $X_{\mathrm{osc}}$,
which contains less unwanted variation. The decomposition is
described by the following equations:
$$X = T^{**}P^{**\prime} + E$$
$$X_{\mathrm{osc}} = E$$
[0742] The OSC procedure starts by calculating the first latent
variable, or principal component, describing the variation in the
data, X. The calculation is done according to the NIPALS algorithm:
$$X = t\,p' + E$$
[0743] The first score vector, t, which summarises the
between-sample variation in X, is then orthogonalised against the
response, Y, giving the orthogonalised score vector t*:
$$t^{*} = \left(I - Y(Y'Y)^{-1}Y'\right)t$$
[0744] After orthogonalisation, the PLS weights, w, are calculated
with the aim of making Xw = t*. Because t* is orthogonal to Y, this
sets the weights, w, to minimise the covariance between X and Y. The
weights, w, are given by:
$$w = X^{-}t^{*}$$
where $X^{-}$ denotes a generalised inverse of X.
[0745] An estimate of the orthogonal score, t**, is calculated
from:
$$t^{**} = Xw$$
[0746] The estimated (updated) score vector, t**, is then again
orthogonalised to Y, and the iteration proceeds until t** has
converged. This ensures that t** converges towards the longest
vector orthogonal to the response, Y, that still gives a good
description of the variation in X. The data, X, can then be
described as the score, t**, orthogonal to Y, times the
corresponding loading vector, p**, plus the unexplained part, the
residual, E:
$$X = t^{**}p^{**\prime} + E$$
[0747] The residual, E, equals the filtered X, $X_{\mathrm{osc}}$,
after subtraction of the first component orthogonal to the
response, Y:
$$E = X - t^{**}p^{**\prime}$$
$$X_{\mathrm{osc}} = E$$
[0748] If more than one component needs to be removed, the same
procedure is repeated using the residual, E, as the starting data
matrix, X.
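The single-component iteration of paragraphs [0742] to [0748] can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the reference implementation: the starting score is taken from an SVD instead of NIPALS (both give the first principal component, up to sign), and the generalised inverse $X^{-}$ is assumed to be the Moore-Penrose pseudo-inverse rather than the internal PLS estimate used in some OSC implementations. The function name `osc_component` and its `tol`/`max_iter` parameters are hypothetical.

```python
import numpy as np


def osc_component(X, Y, tol=1e-10, max_iter=500):
    """Sketch: remove one latent variable orthogonal to Y from X.

    X: N x K mean-centred data matrix; Y: length-N vector or N x M matrix.
    Returns (t, p, w, E), where E = X - t p' is the filtered matrix X_osc.
    """
    Y = np.asarray(Y, dtype=float).reshape(X.shape[0], -1)
    n = X.shape[0]
    # Projector onto the orthogonal complement of Y: I - Y (Y'Y)^-1 Y'
    Q = np.eye(n) - Y @ np.linalg.pinv(Y.T @ Y) @ Y.T
    # Starting score: first principal component of X (SVD instead of NIPALS)
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    t = U[:, 0] * S[0]
    X_ginv = np.linalg.pinv(X)  # assumed generalised inverse X^-
    for _ in range(max_iter):
        t_star = Q @ t          # t* : score orthogonalised against Y
        w = X_ginv @ t_star     # weights chosen so that X w approximates t*
        t_new = X @ w           # t** = X w
        converged = np.linalg.norm(t_new - t) <= tol * np.linalg.norm(t_new)
        t = t_new
        if converged:
            break
    p = X.T @ t / (t @ t)       # loading p** from regressing X on t**
    E = X - np.outer(t, p)      # filtered data X_osc = E
    return t, p, w, E
```

To remove a second component (paragraph [0748]), the function would be called again on the returned `E`; keeping every `(w, p)` pair allows new data to be filtered in the same way later.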
[0749] New external data not present in the model calculation must
be filtered in the same way as the modelling data. This is done by
using the weights, w, calculated during filtering to compute a score
vector, $t_{\mathrm{new}}$, for the new data, $X_{\mathrm{new}}$:
$$t_{\mathrm{new}} = X_{\mathrm{new}}w$$
[0750] Subtracting $t_{\mathrm{new}}$ times the loading vector from
the calibration, p**, from the new external data, $X_{\mathrm{new}}$,
gives the residual, $E_{\mathrm{new}}$, which is the OSC-filtered
matrix for the new external data:
$$E_{\mathrm{new}} = X_{\mathrm{new}} - t_{\mathrm{new}}p^{**\prime}$$
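Filtering new data needs only the weights and loadings stored from calibration. The sketch below assumes one `(w, p)` pair was kept per removed component, applied in the order the components were removed, and that new spectra are centred with the calibration means (because the calibration data were mean centered). The function name `apply_osc` and the `x_mean` argument are illustrative, not taken from the original text.

```python
import numpy as np


def apply_osc(X_new, components, x_mean=None):
    """Sketch: OSC-filter new spectra with components from a calibration run.

    components: list of (w, p) pairs, one per removed OSC component.
    x_mean: calibration column means (hypothetical; needed if the
    calibration data were mean centered).
    """
    E = np.asarray(X_new, dtype=float)
    if x_mean is not None:
        E = E - x_mean                  # centre with the calibration means
    for w, p in components:
        t_new = E @ w                   # t_new = X_new w
        E = E - np.outer(t_new, p)      # E_new = X_new - t_new p**'
    return E
```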
[0751] If PCA suggests separation between the classes under
investigation, orthogonal signal correction (OSC) can be used to
optimize the separation, thus improving the performance of
subsequent multivariate pattern recognition analysis and enhancing
the predictive power of the model. In the examples described
herein, both PCA and PLS-DA analyses were improved by prior
application of OSC.
[0752] An example of a typical OSC process includes the following
steps:
[0753] (a) ¹H NMR data are segmented using AMIX, normalised, and
optionally scaled and/or mean centered. The default for orthogonal
filtering of spectral data is to use only mean centered data, which
means that the mean of each variable (spectral bucket) is subtracted
from every value of that variable in the data matrix.
[0754] (b) a response vector (y) describing the class separating
variation is created by assigning class membership to each
sample.
[0755] (c) one latent variable orthogonal to the response vector
(y) is removed according to the OSC algorithm.
[0756] (d) if desired, the removed orthogonal variation can be
viewed and interpreted in terms of scores (T) and loadings (P).
[0757] (e) the filtered data matrix, which contains less variation
not correlated to class separation, is next used for further
multivariate modelling after optional scaling and/or mean
centering.
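Steps (a) and (b) can be sketched as two small helpers. Mean centring follows the text; unit-variance scaling is shown only as one common optional choice, since the text does not say which scaling is used. The class coding 1/2 follows paragraph [0788], and centring y is an assumption. The mean and scale are returned so that test data can later be pre-processed identically (paragraph [0775]). Both function names are hypothetical.

```python
import numpy as np


def preprocess(X, uv_scale=False):
    """Mean-centre each bucket (column); optionally scale to unit variance."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean                            # subtract each bucket's mean
    scale = np.ones(X.shape[1])
    if uv_scale:
        sd = Xc.std(axis=0, ddof=1)
        scale = np.where(sd > 0, sd, 1.0)    # leave constant buckets unscaled
    return Xc / scale, mean, scale


def class_response(labels, positive_class):
    """Response vector y: 2 for the positive class, 1 otherwise, centred."""
    y = np.where(np.asarray(labels) == positive_class, 2.0, 1.0)
    return y - y.mean()
```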
[0758] Any particular model is only as good as the data used to
formulate it. Therefore, it is preferable that all modelling data
and test data are obtained under the same (or similar) conditions
and using the same (or similar) experimental parameters. Such
conditions and parameters include, for example, sample type (e.g.,
plasma, serum), sample collection and handling protocol, sample
dilution, NMR analysis (e.g., type, field strength/frequency,
temperature), and data-processing (e.g., referencing, baseline
correction, normalisation). If appropriate, it may be desirable to
formulate models for a particular sub-group of cases, e.g.,
according to any of the parameters mentioned above (e.g., field
strength/frequency), or others, such as sex, age, ethnicity,
medical history, lifestyle (e.g., smoker, nonsmoker), hormonal
status (e.g., pre-menopausal, post-menopausal).
[0759] In general, the quality of the model improves as the amount
of modelling data increases. Nonetheless, as shown in the examples
below, even relatively small sets of modelling data (e.g., about
50-100 subjects) are sufficient to achieve a confident
classification (e.g., diagnosis).
[0760] A typical unsupervised modelling process includes the
following steps:
[0761] (a) optionally scaling and/or mean centering modelling
data;
[0762] (b) classifying data (e.g., as control or positive, e.g.,
diseased);
[0763] (c) fitting the model (e.g., using PCA, PLS-DA);
[0764] (d) identifying and removing outliers, if any;
[0765] (e) re-fitting the model;
[0766] (f) optionally repeating (c), (d), and (e) as necessary.
[0767] Optionally (and preferably), data filtering, for example
orthogonal filtering (e.g., OSC), is performed following step (d)
and before step (e).
[0768] An example of a typical PLS-DA modelling process, using OSC
filtered data, includes the following steps:
[0769] (a) OSC filtered data are optionally scaled and/or mean
centered.
[0770] (b) a response vector (y) describing the class separating
variation is created by assigning class membership to all
samples.
[0771] (c) a PLS regression model is calculated between the OSC
filtered data and the response vector (y). The calculated latent
variables or PLS components will be focused on describing maximum
separation between the known classes.
[0772] (d) the model is interpreted by viewing scores (T), loadings
(P), PLS weights (W), PLS coefficients (B), and residuals (E).
Together, these describe the separation between the classes and
help explain the observed separation.
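Step (c) can be sketched with the standard NIPALS PLS1 algorithm, which handles a single response column such as the two-class y above. This is a generic textbook PLS, not necessarily the software used by the inventors; the function name `pls1_fit` and the dictionary keys (chosen to mirror T, P, W, B, and E in step (d)) are hypothetical. The regression coefficients are computed as $b = W(P'W)^{-1}c$.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """NIPALS PLS1 on centred X (N x K) and centred y (length N)."""
    X = np.array(X, dtype=float)   # working copies, deflated below
    y = np.array(y, dtype=float)
    N, K = X.shape
    T = np.zeros((N, n_components))
    P = np.zeros((K, n_components))
    W = np.zeros((K, n_components))
    c = np.zeros(n_components)
    for a in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)           # PLS weights: max covariance with y
        t = X @ w                        # scores
        tt = t @ t
        p = X.T @ t / tt                 # loadings
        c[a] = y @ t / tt                # y-weight for this component
        X -= np.outer(t, p)              # deflate X ...
        y -= c[a] * t                    # ... and y
        T[:, a], P[:, a], W[:, a] = t, p, w
    B = W @ np.linalg.solve(P.T @ W, c)  # regression coefficients b
    return {"T": T, "P": P, "W": W, "c": c, "B": B, "E": X}
```

Because the scores satisfy $T = XW(P'W)^{-1}$, the fitted response can be obtained either as `X @ B` or as `T @ c`.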
[0773] Once the model has been calculated, it may be verified using
data for samples of known class which were not used to calculate
the model. In this way, the ability of the model to accurately
predict classes may be tested. This may be achieved, for example,
in the method above, with the following additional step:
[0774] (e) a set of external samples of known class membership,
which were not used in the (e.g., PLS) model calculation, is used to
validate the model's predictive ability. The prediction results are
investigated, for example, in terms of predicted response
($y_{\mathrm{pred}}$), predicted scores ($T_{\mathrm{pred}}$), and
predicted residuals expressed as predicted distance to model
($\mathrm{DModX}_{\mathrm{pred}}$).
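Predicting external samples, as in step (e), can be sketched from the W, P, and c of a fitted PLS model. The DModX shown is a simplified, unnormalised form (residual standard deviation per sample with K - A degrees of freedom); commercial software such as SIMCA normalises it and applies further degrees-of-freedom corrections, so the values are not interchangeable. The function name `pls_predict` is hypothetical, and `y_pred` is on the centred scale, so the training mean of y must be added back before comparing with the class codes.

```python
import numpy as np


def pls_predict(X_new, W, P, c):
    """Predicted scores, response and a simple DModX for new rows.

    X_new must already be pre-processed exactly like the modelling data.
    """
    W_star = W @ np.linalg.inv(P.T @ W)          # W* = W (P'W)^-1
    T_pred = X_new @ W_star                      # predicted scores
    y_pred = T_pred @ c                          # predicted (centred) response
    E_pred = X_new - T_pred @ P.T                # predicted residuals
    K, A = P.shape
    dmodx = np.sqrt((E_pred ** 2).sum(axis=1) / (K - A))  # absolute DModX
    return T_pred, y_pred, dmodx
```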
[0775] The model may then be used to classify test data, of unknown
class. Before classification, the test data are numerically
pre-processed in the same manner as the modelling data.
[0776] Interpreting the output from the pattern recognition (PR)
analysis provides useful information on the biomarkers responsible
for the separation of the biological classes. Of course, the PR
output differs somewhat depending on the data analysis method used.
As mentioned above, methods for PR and interpretation of the
results are known in the art. Interpretation methods for two PR
techniques (PCA and PLS-DA) are discussed briefly herein.
[0777] Interpreting PCA Results
[0778] The data matrix (X) is built up from N observations (samples,
rats, patients, etc.) and K variables (spectral buckets carrying
the biomarker information in terms of ¹H NMR resonances).
[0779] In PCA, the N*K matrix (X) is decomposed into a few latent
variables or principal components (PCs) describing the systematic
variation in the data. Since PCA is a bilinear decomposition
method, each PC can be divided into two vectors, scores (t) and
loadings (p). The scores can be described as the projection of each
observation on to each PC and the loadings as the contribution of
each variable (spectral bucket) to the PC expressed in terms of
direction.
[0780] Any clustering of observations (samples) along a direction
found in scores plots (e.g., PC1 versus PC2) can be explained by
identifying which variables (spectral buckets) have high loadings
for this particular direction in the scores. A high loading is
defined as a variable (spectral bucket) that changes between the
observations in a systematic way showing a trend which matches the
sample positions in the scores plot. Each spectral bucket with a
high loading, or a combination thereof, is defined by its ¹H
NMR chemical shift position; this is its diagnostic spectral
window. These chemical shift values then allow the skilled NMR
spectroscopist to examine the original NMR spectra and identify the
molecules giving rise to the peaks in the relevant buckets; these
are the biomarkers. This is typically done using a combination of
standard 1- and 2-dimensional NMR methods.
[0781] If, in a scores plot, separation of two classes of sample
can be seen in a particular direction, then examination of those
loadings which are in the same direction as in the scores plots
indicates which loadings are important for the class
identification. The loadings plot shows points which are labelled
according to the bucket chemical shift. This is the ¹H NMR
spectroscopic chemical shift which corresponds to the centre of the
bucket. This bucket defines a diagnostic spectral window. Given a
list of these bucket identifiers, the skilled NMR spectroscopist
then re-examines the ¹H NMR spectra and identifies, within the
bucket width, which of several possible NMR resonances are changed
between the two classes. The important resonance is characterised
in terms of exact chemical shift, intensity, and peak multiplicity.
Using other NMR experiments, such as 2-D NMR spectroscopy and/or
separation of the specific molecule using HPLC-NMR-MS for example,
other resonances from the same molecule are identified and
ultimately, on the basis of all of the NMR data and other data if
appropriate, an identification of the molecule (biomarker) is
made.
[0782] In a classification situation as described herein, one
procedure for finding relevant biomarkers using PCA is as
follows:
[0783] (a) PCA of the data matrix (X) containing N observations
belonging to either of two known classes (healthy or diseased). The
description of the observations lies in the K variables (spectral
buckets) containing the biomarker information in terms of ¹H
NMR resonances.
[0784] (b) Interpretation of the scores (t) to find the direction
for the separation between the two known classes in X.
[0785] (c) Interpretation of loadings (p) reveals which variables
(spectral buckets) have the largest impact on the direction for
separation described in the scores (t). This identifies the
relevant diagnostic spectral windows.
[0786] (d) Assignment of the spectral buckets or combinations
thereof to certain biomarkers. This is done, for example, by
interpretation of the resonances in ¹H NMR spectra and by
using previously assigned spectra of the same type as a library for
assignments.
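Steps (a) to (c) can be sketched as a PCA followed by a ranking of buckets by loading size. Two simplifications are assumed: the PCA is computed by SVD rather than NIPALS, and buckets are ranked by absolute loading on a single component, whereas step (c) considers loadings along whatever direction separates the classes in the scores plot (possibly a combination of components). The `shifts` array of bucket centres and both function names are hypothetical.

```python
import numpy as np


def pca(X, n_components=2):
    """PCA of centred X by SVD: scores T (N x A) and loadings P (K x A)."""
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    T = U[:, :n_components] * S[:n_components]
    P = Vt[:n_components].T
    return T, P


def top_buckets(P, component, shifts, n=5):
    """Bucket centres (ppm) with the largest |loading| on one component."""
    order = np.argsort(-np.abs(P[:, component]))[:n]
    return [(shifts[i], P[i, component]) for i in order]
```

The chemical shifts returned are candidate diagnostic spectral windows; assigning them to biomarkers (step (d)) still requires inspection of the NMR spectra.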
[0787] Interpreting PLS-DA Results
[0788] In PLS-DA, which is a regression extension of the PCA
method, the options for interpretation are more extensive compared
to the PCA case. PLS-DA performs a regression between the data
matrix (X) and a "dummy matrix" (Y) containing the class membership
information (e.g., samples may be assigned the value 1 for healthy
and 2 for diseased classes). The calculated PLS components will
describe the maximum covariance between X and Y which in this case
is the same as maximum separation between the known classes in X.
The interpretation of scores (t) and loadings (p) is the same in
PLS-DA as in PCA. Interpretation of the PLS weights (w) for each
component provides an explanation of the variables in X correlated
to the variation in Y. This will give biomarker information for the
separation between the classes.
[0789] Since PLS-DA is a regression method, the regression
coefficients (b) can also be used for discovery and interpretation
of biomarkers. The regression coefficients (b) in PLS-DA summarise
which variables in X (spectral buckets) are most important both for
describing variation in X and for correlating with Y. Variables
(spectral buckets) with large-magnitude regression coefficients are
therefore important for separating the known classes in X, because
the Y matrix against which X is regressed contains only information
on the class identity of each sample.
[0790] Again, as discussed above, the scores plot is examined to
identify important loadings, diagnostic spectral windows, relevant
NMR resonances, and ultimately the associated biomarkers.
[0791] In a classification situation as described herein, one
procedure for finding relevant biomarkers using PLS-DA is as
follows:
[0792] (a) A PLS model of the N*K data matrix (X) against a
"dummy matrix" Y, containing information on class membership for
the observations in X, is calculated yielding a few latent
variables (PLS components) describing maximum separation between
the two classes in X (e.g., healthy and diseased).
[0793] (b) Interpretation of the scores (t) to find the direction
for the separation between the two known classes in X.
[0794] (c) Interpretation of loadings (p) reveals which variables
(spectral buckets) have the largest impact on the direction for
separation described in the scores (t); these are diagnostic
spectral windows.
[0795] In PLS-DA, a variable importance plot (VIP) is another
method of evaluating the significance of loadings in causing a
separation of class of sample in a scores plot. Typically, the VIP
is a squared function of PLS weights, and therefore only positive
numerical values are encountered; in addition, for a given model,
there is only one set of VIP-values. Variables with a VIP value of
greater than 1 are considered most influential for the model. The
VIP shows each loading in a decreasing order of importance for
class separation based on the PLS regression against class
variable.
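One widely used definition of VIP for variable j over A PLS components is $\mathrm{VIP}_j = \sqrt{K\sum_a SS_a\,(w_{ja}/\lVert w_a\rVert)^2 / \sum_a SS_a}$, where $SS_a = c_a^2\,t_a't_a$ is the Y-variance explained by component a. The sketch below assumes this definition (the text does not give a formula) and a hypothetical function name `vip_scores`. Because the squared VIP values always average to exactly 1, the "greater than 1" rule in the text selects variables with above-average influence.

```python
import numpy as np


def vip_scores(T, W, c):
    """VIP per X variable from PLS scores T (N x A), weights W (K x A)
    and y-weights c (length A)."""
    K = W.shape[0]
    ss = (c ** 2) * (T ** 2).sum(axis=0)      # y-variance explained per component
    w_norm = W / np.linalg.norm(W, axis=0)    # normalise each weight vector
    return np.sqrt(K * ((w_norm ** 2) @ ss) / ss.sum())
```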
[0796] A (w*c) plot is another diagnostic plot obtained from a
PLS-DA analysis. It shows which descriptors are mainly responsible
for class separation. The (w*c) parameters are an attempt to
describe the total variable correlations in the model, i.e.,
between the descriptors (e.g., NMR intensities in buckets), between
the NMR descriptors and the class variables, and between class
variables if they exist (in the present two class case, where
samples are assigned by definition to class 1 and class 2 there is
no correlation). Thus for a situation in a scores plot (e.g., t1
vs. t2), if class 1 samples are clustered in the upper right hand
quadrant and class 2 samples are clustered in the lower left hand
quadrant, then the (w*c) plot will show descriptors also in these
quadrants. Descriptors in the upper right hand quadrant are
increased in class 1 compared to class 2 and vice versa for the
lower left hand quadrant.
[0797] (d) Interpretation of PLS weights (w) reveals which
variables (spectral buckets) in X are important for correlation to
Y (class separation); these, too, are diagnostic spectral
windows.
[0798] (e) Interpretation of the PLS regression coefficients (b)
reveals an overall summary of which variables (spectral buckets)
have the largest impact on the direction for separation described
in the scores; these, too, are diagnostic spectral windows.
[0799] In a typical regression coefficient plot for ¹H NMR,
each bar represents a spectral region (e.g., 0.04 ppm) and shows
how the ¹H NMR profile of one class of samples differs from
the ¹H NMR profile of a second class of samples. A positive
value on the x-axis indicates there is a relatively greater
concentration of metabolite (assigned using NMR chemical shift
assignment tables) in one class as compared to the other class, and
a negative value on the x-axis indicates a relatively lower
concentration in one class as compared to the other class.
[0800] (f) Assignment of the spectral buckets or combinations
thereof to certain biomarkers. This is done, for example, by
interpretation of the resonances in ¹H NMR spectra and by
using previously assigned spectra of the same type as a library for
assignments.
[0801] Timed Sampling
[0802] The analysis methods described herein can be applied to a
single sample, or alternatively, to a timed series of samples.
These samples may be taken relatively close together in time (e.g.,
daily) or less frequently (e.g., monthly or yearly).
[0803] The timed series of samples may be used for one or more
purposes: e.g., to make sequential diagnoses, applying the same
classification method to each sample as if it were a single sample,
which allows greater confidence in the diagnosis than a single
sample would; or to monitor temporal changes in the subject (e.g.,
changes in the underlying condition being diagnosed, treated, etc.).
[0804] Alternatively, the timed series of samples can be
collectively treated as a single dataset increasing the information
density of the input dataset and hence increasing the power of the
analysis method to identify weaker patterns.
[0805] As yet another alternative, the timed series of samples can
be collectively processed to yield a single dataset in which the
temporal changes (e.g., in each bin) are included as an extra list
of variables (e.g., as in composite data sets). Temporal changes in
the amount of (e.g., endogenous) diagnostic species may greatly
improve the ability of the analysis method to classify patterns
accurately (especially when patterns are weak).
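The composite timed dataset of paragraph [0805] can be sketched by unfolding each subject's series into one row and appending the per-bin changes. Using differences between consecutive timepoints as the "temporal changes" is an assumption; change from a baseline sample would be an equally reasonable choice. The function name `timed_composite` is hypothetical.

```python
import numpy as np


def timed_composite(spectra):
    """spectra: array (subjects, timepoints, buckets) of processed spectra.

    Returns one row per subject: all timepoint spectra, followed by the
    change in each bucket between consecutive timepoints.
    """
    spectra = np.asarray(spectra, dtype=float)
    S, J, K = spectra.shape
    levels = spectra.reshape(S, J * K)                           # stacked timepoints
    changes = np.diff(spectra, axis=1).reshape(S, (J - 1) * K)   # per-bin changes
    return np.hstack([levels, changes])
```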
[0806] Batch Modelling
[0807] The methods described herein, including their applications
(e.g., diagnosis, prognosis), may be further improved by employing
batch modelling.
[0808] Statistical batch processing can be divided into two levels
of multivariate modelling. The lower or the observation level is
usually based on Partial Least Squares (PLS) regression against
time (or any other index describing process maturity), whereas the
upper or batch level consists of a PCA based on the scores from the
lower level PLS model. PLS can also be used in the upper level to
correlate the matrix based on the lower level scores with the end
properties of the separate batches. This is common in industrial
applications where properties of the end product are used as a
description of quality.
[0809] At the lower level of batch modelling, the evolution of the
studied process with time (maturity) can be monitored and
interpreted in terms of PLS scores and loadings. Since the PLS
performs a regression against sampling time (maturity), the
calculated components focus on the evolution with time. Because the
calculated PLS components are orthogonal to each other, it is
possible to detect independent time (maturity) profiles and to
interpret which measured variables cause these profiles. Confidence
limits are used to detect deviating behaviour of any spectrum at
any time point at a chosen significance level, usually 95% and/or
99%.
[0810] At the lower level, the residuals expressed as distance to
model (DModX) are another important tool for detecting outlying
batches, or deviating behaviour of a specific batch at a specific
time point. The upper (batch) level makes it possible to look only
at the differences between the separate batches. This is done by
using the lower-level scores for all time points of each batch as
new variables describing that batch, and then performing a PCA on
this new data matrix. Scores, loadings, and DModX are used in the
same way as in ordinary PCA, except that the upper-level loadings
can be traced back to the lower level for a more detailed
explanation in terms of the original loadings.
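The upper (batch) level of paragraph [0810] can be sketched as follows, assuming the lower-level PLS scores are already available as an array of shape (batches, timepoints, components) with synchronised sampling. Only mean centring is applied to the unfolded matrix; any further scaling is left out as an unstated choice. The loadings are reshaped so that each one can be traced back to a (timepoint, lower-level component) pair. The function name `batch_level_pca` is hypothetical.

```python
import numpy as np


def batch_level_pca(lower_scores, n_components=2):
    """Upper (batch) level: PCA on lower-level scores unfolded per batch.

    lower_scores: array (batches, timepoints, A) of lower-level PLS scores,
    with every batch sampled at the same synchronised time points.
    """
    lower_scores = np.asarray(lower_scores, dtype=float)
    B, J, A = lower_scores.shape
    Z = lower_scores.reshape(B, J * A)          # one row per batch (e.g. animal)
    Z = Z - Z.mean(axis=0)                      # mean-centre the new variables
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    loadings = Vt[:n_components].T              # (J*A) x n_components
    # Reshape so each loading maps back to (timepoint, lower-level component)
    return scores, loadings.reshape(J, A, n_components)
```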
[0811] Predictions for "new" batches can be made at both levels of
the batch model. At the lower level, monitoring the evolution with
time using scores and DModX is a powerful tool for detecting
deviation from normal behaviour for a batch at any time point. At
the upper level, the behaviour of a single batch can be predicted
in terms of scores and DModX.
[0812] By definition, and as a requirement for batch modelling, a
batch process is one in which all batches have equal duration and
are synchronised with respect to sample collection: for example,
samples taken from a cohort of animals at identical fixed time
points to monitor the effects of an administered xenobiotic
substance.
[0813] The advantage of using batch modelling for such studies is
the possibility of detecting known, or discovering new, metabolic
processes which evolve with time in the lower level scores, and
also the identification of the actual metabolites involved in the
different processes from the contributing lower level loadings. The
lower level analysis also makes it possible to differentiate
between single observations (e.g., individual animals at specific
time points).
[0814] Applications for the lower level modelling include, for
example, distinguishing between undosed controls and dosed animals
in terms of metabolic effects of dosing in certain time points; and
creating models for normality and using the models as a
classification tool for new samples, e.g., as normal or abnormal.
This may be achieved using a PLS prediction of the new sample's
class using the model describing normality. Decisions can then be
made on basis of the combination of the predicted scores and
residuals (DmodX).
[0815] An automated expert system can be used for early fault
detection in the lower level batch modelling, and this can be used
to further enhance the analysis procedure and improve
efficiency.
[0816] The upper level provides the possibility of making
predictions for new animals using the existing model. Abnormal
animals can then be detected by judging predicted scores and
residuals (DmodX) together. Since the upper level model is based on
the lower level scores, the interpretation of an animal predicted
to be abnormal can be traced back to the original lower level
scores and loadings as well as the original raw variables making up
the NMR spectra. Combining the upper and lower levels for prediction
of the status of a new animal, the classification can be based on
four parameters: upper-level scores and residuals (DModX) and
lower-level scores and residuals (DModX). Batch modelling is
therefore an efficient tool for determining whether an animal is
normal or abnormal and, if abnormal, why and when it deviates from
normality.
[0817] See, for example, Wold et al., 1998b and Eriksson et al.,
1999.
[0818] Integrated Metabonomics
[0819] As discussed above, many of the methods of the present
invention may also be applied to composite data or composite data
sets. The term "composite data set," as used herein, pertains to a
spectrum (or data vector) which comprises spectral data (e.g., NMR
spectral data, e.g., an NMR spectrum) as well as at least one other
datum or data vector. Examples of other data vectors include, e.g.,
one or more other NMR spectral data (e.g., NMR spectra) obtained for
the same sample using a different NMR technique; other types of
spectra (e.g., mass spectra), numerical representations of images,
etc.; data obtained for another sample of the same sample type
(e.g., blood, urine, tissue, tissue extract) from the same subject
at a different timepoint; data obtained for another sample of a
different sample type (e.g., blood, urine, tissue, tissue extract)
from the same subject; and the like.
[0820] Examples of other data include, e.g., one or more clinical
parameters. Clinical parameters which are suitable for use in
composite methods include, but are not limited to, the
following:
[0821] (a) established clinical parameters routinely measured in
hospital clinical labs: age; sex; body mass index; height; weight;
family history; medication history; cigarette smoking; alcohol
intake; blood pressure; full blood cell count (FBCs); red blood
cells; white blood cells; monocytes; lymphocytes; neutrophils;
eosinophils; basophils; platelets; haematocrit; haemoglobin; mean
corpuscular volume and related haemodilution indicators;
fibrinogen; functional clotting parameters (thromboplastin and
partial thromboplastin); electrolytes (sodium, potassium, calcium,
phosphate); urea; creatinine; total protein; albumin; globulin;
bilirubin; protein markers of liver function (alanine
aminotransferase, alkaline phosphatase, gamma glutamyl
transferase); glucose; HbA1c (a measure of glucose-haemoglobin
conjugates used to monitor diabetes); lipoprotein profile; total
cholesterol; LDL; HDL; triglycerides; blood group.
[0822] (b) established research parameters routinely measured in
research laboratories but not usually measured in hospitals:
hormonal status; testosterone; estrogen; progesterone; follicle
stimulating hormone; inhibin; transforming growth factor-beta1;
transforming growth factor-beta2; chemokines; MCP-1; eotaxin;
plasminogen activator inhibitor-1; cystatin C.
[0823] (c) early-stage research parameters measured in one or a
small number of specialist labs: antibodies to sRII; antibodies to
blood group A antigen; antibodies to blood group B antigen;
immunoglobulin (IgD) against alpha-gal; immunoglobulin (IgD)
against penta-gal.
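A composite data set as defined in paragraph [0819] can be sketched by joining the NMR buckets and clinical parameters column-wise. This assumes all variables are already numeric (categorical parameters such as sex or blood group would first need an encoding) and applies block scaling, a common multiblock choice that the text does not specify: each block is autoscaled and then divided by the square root of its number of variables, so that a few clinical parameters are not swamped by hundreds of spectral buckets. The function name `composite_dataset` is hypothetical.

```python
import numpy as np


def composite_dataset(nmr, clinical):
    """Join NMR buckets and clinical parameters into one composite matrix.

    Each block is autoscaled, then divided by sqrt(number of variables)
    so that every block contributes a total variance of 1 (block scaling).
    """
    blocks = []
    for block in (nmr, clinical):
        B = np.asarray(block, dtype=float)
        B = B - B.mean(axis=0)
        sd = B.std(axis=0, ddof=1)
        B = B / np.where(sd > 0, sd, 1.0)
        blocks.append(B / np.sqrt(B.shape[1]))
    return np.hstack(blocks)
```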
[0824] Diagnostic Spectral Windows
[0825] As discussed above, many of the methods of the present
invention involve relating NMR spectral intensity at one or more
predetermined diagnostic spectral windows with a predetermined
condition.
[0826] Examples of methods for identifying one or more suitable
diagnostic spectral windows for a given condition, using, for
example, pattern recognition methods, are described herein.
[0827] The term "diagnostic spectral window," as used herein,
pertains to a narrow range of chemical shift ($\Delta\delta$) values
encompassing an index value, $\delta_r$ (that is, $\delta_r$ falls
within the range $\Delta\delta$). Each index value and its
associated spectral window define a range of chemical shift
($\Delta\delta$) in which the NMR spectral intensity is indicative
of the presence of one or more chemical species.
[0828] For 2D NMR methods, the diagnostic spectral window refers to
a chemical shift patch ($\Delta\delta_1$, $\Delta\delta_2$) which
encompasses an index value, [$\delta_{r1}$, $\delta_{r2}$]. For 3D
NMR methods, the diagnostic spectral window refers to a chemical
shift volume ($\Delta\delta_1$, $\Delta\delta_2$, $\Delta\delta_3$)
which encompasses an index value, [$\delta_{r1}$, $\delta_{r2}$,
$\delta_{r3}$].
[0829] In one embodiment, the spectral window is centred with
respect to its index value (e.g., $\delta_r = 1.30$,
$|\Delta\delta| = 0.04$, and $\Delta\delta$ spanning $\delta$ 1.28
to 1.32).
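For a window centred on its index value, the limits are $\delta_r \pm |\Delta\delta|/2$, which reproduces the example of paragraph [0829] (1.30 ± 0.02 gives 1.28 to 1.32). In the sketch below, treating both limits as inclusive is an assumption, and both function names are hypothetical.

```python
def spectral_window(index_value, breadth):
    """Chemical-shift limits (ppm) of a window centred on its index value."""
    half = breadth / 2.0
    return index_value - half, index_value + half


def in_window(shift, index_value, breadth):
    """True if a chemical shift falls inside the window (limits inclusive)."""
    low, high = spectral_window(index_value, breadth)
    return low <= shift <= high
```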
[0830] The breadth of the range, $|\Delta\delta|$, is determined
largely by spectroscopic parameters such as field
strength/frequency, temperature, sample viscosity, etc. The breadth
of the range is often chosen to encompass a typical spin-coupled
multiplet pattern. For peaks whose position varies with sample pH,
the breadth of the range may be widened to encompass the expected
range of positions.
[0831] Typically, the breadth of the range, $|\Delta\delta|$, is
from about $\delta$ 0.001 to about $\delta$ 0.2.
[0832] In one embodiment, the breadth is from about $\delta$ 0.005
to about $\delta$ 0.1.
[0833] In one embodiment, the breadth is from about $\delta$ 0.005
to about $\delta$ 0.08.
[0834] In one embodiment, the breadth is from about $\delta$ 0.01 to
about $\delta$ 0.08.
[0835] In one embodiment, the breadth is from about $\delta$ 0.02 to
about $\delta$ 0.08.
[0836] In one embodiment, the breadth is from about $\delta$ 0.005
to about $\delta$ 0.06.
[0837] In one embodiment, the breadth is from about $\delta$ 0.01 to
about $\delta$ 0.06.
[0838] In one embodiment, the breadth is from about $\delta$ 0.02 to
about $\delta$ 0.06.
[0839] In one embodiment, the breadth is about $\delta$ 0.04.
[0840] In one embodiment, the breadth is equal to the "bucket" or
"bin" width. In one embodiment, the breadth is equal to an integer
multiple of the "bucket" or "bin" width.
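When the window breadth equals the bucket width, each bucket is itself a candidate window. The text uses AMIX for bucketing; the sketch below is not AMIX, only a generic illustration that sums intensities into equal-width buckets. The default range of $\delta$ 10.0 to 0.2 and the 0.04 ppm width are assumed values, and the function name `bucket_spectrum` is hypothetical.

```python
import numpy as np


def bucket_spectrum(ppm, intensity, start=10.0, stop=0.2, width=0.04):
    """Integrate a 1-D spectrum into equal-width chemical-shift buckets.

    Returns bucket centres (ppm) and the summed intensity in each bucket,
    ordered from high to low ppm as in conventional NMR plots.
    """
    ppm = np.asarray(ppm, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    n = int(round((start - stop) / width))
    edges = stop + width * np.arange(n + 1)              # ascending bucket edges
    idx = np.searchsorted(edges, ppm, side="right") - 1  # bucket of each point
    keep = (idx >= 0) & (idx < n)                        # drop points outside range
    sums = np.bincount(idx[keep], weights=intensity[keep], minlength=n)
    centres = edges[:-1] + width / 2
    return centres[::-1], sums[::-1]
```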
[0841] Although the diagnostic spectral windows are determined in
relation to the condition under study, the precise index values for
such windows may vary with the experimental parameters employed,
for example, the digital resolution of the original spectra, the
width of the buckets used, and the temperature of spectral data
acquisition. The exact composition of the sample (e.g., biofluid,
tissue) can affect peak positions through compartmentation, metal
complexation, protein-small molecule binding, etc. The observation
frequency also has an effect, because it changes the degree of peak
overlap and the extent to which spectra are first- or second-order.
[0842] In one embodiment, said one or more predetermined diagnostic
spectral windows is: a single predetermined diagnostic spectral
window.
[0843] In one embodiment, said one or more predetermined diagnostic
spectral windows is: a plurality of predetermined diagnostic
spectral windows. In practice, this may be preferred.
[0844] Although the theoretical limit on the number of
predetermined diagnostic spectral windows is a function of the data
density (e.g., the number of variables, e.g., buckets), typically
the number of predetermined diagnostic spectral windows is from 1
to about 30. It is possible for the actual number to be in any
sub-range within these general limits. Examples of lower limits
include 1, 2, 3, 4, 5, 6, 8, 10, and 15. Examples of upper limits
include 3, 4, 5, 6, 8, 10, 15, 20, 25, and 30.
[0845] In one embodiment, the number is from 1 to about 20.
[0846] In one embodiment, the number is from 1 to about 15.
[0847] In one embodiment, the number is from 1 to about 10.
[0848] In one embodiment, the number is from 1 to about 8.
[0849] In one embodiment, the number is from 1 to about 6.
[0850] In one embodiment, the number is from 1 to about 5.
[0851] In one embodiment, the number is from 1 to about 4.
[0852] In one embodiment, the number is from 1 to about 3.
[0853] In one embodiment, the number is 1 or 2.
[0854] In one embodiment, said one or more predetermined diagnostic
spectral windows is: a plurality of diagnostic spectral windows;
and, said NMR spectral intensity at one or more predetermined
diagnostic spectral windows is: a combination of a plurality of NMR
spectral intensities, each of which is NMR spectral intensity for
one of said plurality of predetermined diagnostic spectral
windows.
[0855] In one embodiment, said combination is a linear
combination.
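A linear combination of window intensities, as in paragraphs [0854] and [0855], can be sketched as a weighted sum compared with a threshold. The coefficients (for example, PLS-DA regression coefficients restricted to the chosen windows), the offset, the threshold of 0, and the class labels are all illustrative assumptions; the function names are hypothetical.

```python
import numpy as np


def window_score(intensities, coefficients, offset=0.0):
    """Linear combination of NMR intensities from several diagnostic windows."""
    return offset + float(np.dot(coefficients, intensities))


def classify(intensities, coefficients, threshold=0.0, offset=0.0):
    """Assign a class by comparing the combined score with a threshold."""
    score = window_score(intensities, coefficients, offset)
    return "positive" if score > threshold else "control"
```

For example, with coefficients (0.5, -1.0), window intensities (4.0, 1.0) give a score of 1.0 and are classified as positive.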
[0856] In one embodiment, at least one of said one or more
predetermined diagnostic spectral windows encompasses a chemical
shift value for an NMR resonance of a diagnostic species (e.g., a
¹H NMR resonance of a diagnostic species).
[0857] In one embodiment, each of a plurality of said one or more
predetermined diagnostic spectral windows encompasses a chemical
shift value for an NMR resonance of a diagnostic species (e.g., a
¹H NMR resonance of a diagnostic species).
[0858] In one embodiment, each of said one or more predetermined
diagnostic spectral windows encompasses a chemical shift value for
an NMR resonance of a diagnostic species (e.g., a ¹H NMR
resonance of a diagnostic species).
[0859] Diagnostic Spectral Windows--Hypertension
[0860] It is believed that the index values, and the associated
diagnostic spectral windows, primarily reflect the species
described in Table 2-1-HYP, Table 2-2-HYP, and/or Table
2-3-HYP.
[0861] In one embodiment, said predetermined diagnostic spectral
windows are defined by one or more index values, $\delta_r$,
corresponding to the bucket regions listed in Table 2-1-HYP, Table
2-2-HYP, and/or Table 2-3-HYP.
[0862] In one embodiment, said predetermined diagnostic spectral
windows are defined by one or more index values, .delta..sub.r,
corresponding to the bucket regions listed in Table 2-1-HYP, Table
2-2-HYP, and/or Table 2-3-HYP, and breadth of the range value,
.vertline..DELTA..delta..vertlin- e. about 0.04.
[0863] In one embodiment, said predetermined diagnostic spectral
windows are defined by one or more index values, .delta..sub.r,
corresponding to the bucket regions listed in Table 2-1-HYP, Table
2-2-HYP, and/or Table 2-3-HYP, and which are determined using the
conditions set forth in the section entitled "NMR Experimental
Parameters."
[0864] Diagnostic Spectral Windows--Atherosclerosis/CHD
[0865] It is believed that the index values, and the associated
diagnostic spectral windows, primarily reflect the species
described in Table 3-4-CHD.
[0866] In one embodiment, said predetermined diagnostic spectral
windows are defined by one or more index values, δ_r,
corresponding to the bucket regions listed in Table 3-4-CHD.
[0867] In one embodiment, said predetermined diagnostic spectral
windows are defined by one or more index values, δ_r,
corresponding to the bucket regions listed in Table 3-4-CHD, and
by a breadth of the range value, |Δδ|, of about 0.04.
[0868] In one embodiment, said predetermined diagnostic spectral
windows are defined by one or more index values, δ_r,
corresponding to the bucket regions listed in Table 3-4-CHD, and
which are determined using the conditions set forth in the section
entitled "NMR Experimental Parameters."
[0869] Diagnostic Spectral Windows--Osteoporosis
[0870] It is believed that the index values, and the associated
diagnostic spectral windows, primarily reflect the species
described in Table 4-1-OP and/or Table 4-2-OP.
[0871] In one embodiment, said predetermined diagnostic spectral
windows are defined by one or more index values, δ_r,
corresponding to the bucket regions listed in Table 4-1-OP and/or
Table 4-2-OP.
[0872] In one embodiment, said predetermined diagnostic spectral
windows are defined by one or more index values, δ_r,
corresponding to the bucket regions listed in Table 4-1-OP and/or
Table 4-2-OP, and by a breadth of the range value, |Δδ|, of
about 0.04.
[0873] In one embodiment, said predetermined diagnostic spectral
windows are defined by one or more index values, δ_r,
corresponding to the bucket regions listed in Table 4-1-OP and/or
Table 4-2-OP, and which are determined using the conditions set
forth in the section entitled "NMR Experimental Parameters."
[0874] Diagnostic Spectral Windows--Osteoarthritis
[0875] It is believed that the index values, and the associated
diagnostic spectral windows, primarily reflect the species
described in Table 5-1-OA and/or Table 5-2-OA.
[0876] In one embodiment, said predetermined diagnostic spectral
windows are defined by one or more index values, δ_r,
corresponding to the bucket regions listed in Table 5-1-OA and/or
Table 5-2-OA.
[0877] In one embodiment, said predetermined diagnostic spectral
windows are defined by one or more index values, δ_r,
corresponding to the bucket regions listed in Table 5-1-OA and/or
Table 5-2-OA, and by a breadth of the range value, |Δδ|, of
about 0.04.
[0878] In one embodiment, said predetermined diagnostic spectral
windows are defined by one or more index values, δ_r,
corresponding to the bucket regions listed in Table 5-1-OA and/or
Table 5-2-OA, and which are determined using the conditions set
forth in the section entitled "NMR Experimental Parameters."
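The windows above are described by an index value, δ_r, and a breadth, |Δδ|, of about 0.04. The following minimal sketch divides a digitised spectrum into contiguous buckets of that breadth. It assumes, which this section does not state, that δ_r labels the centre of each bucket and that buckets are laid out from a chosen starting shift. The actual bucketing conditions are those given in the section entitled "NMR Experimental Parameters" and may differ, for example in excluded regions or normalisation.

```python
import math
from collections import defaultdict
from typing import Dict, Sequence


def bucket_spectrum(ppm: Sequence[float],
                    intensity: Sequence[float],
                    width: float = 0.04,
                    start: float = 0.0) -> Dict[float, float]:
    """Sum spectral intensity into contiguous buckets of `width` ppm.

    Returns {index value: summed intensity}, where the index value is
    taken to be the bucket centre (an assumption for illustration).
    """
    if width <= 0:
        raise ValueError("bucket width must be positive")
    buckets: Dict[float, float] = defaultdict(float)
    for x, y in zip(ppm, intensity):
        k = math.floor((x - start) / width)           # bucket number
        centre = round(start + (k + 0.5) * width, 4)  # rounded to avoid float noise in keys
        buckets[centre] += y
    return dict(buckets)


# Usage: points at 3.21 and 3.23 ppm fall in the same 0.04 ppm bucket.
print(bucket_spectrum([3.21, 3.23, 1.31], [1.0, 2.0, 4.0]))
```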
[0879] Diagnostic Species and Biomarkers
[0880] The index values, and the associated diagnostic spectral
windows, define ranges of chemical shift in which NMR spectral
intensity is indicative of the presence of one or more chemical
species, one or more of which are diagnostic species (e.g.,
biomarkers), for example, for a condition (e.g., indication) under
study.
[0881] In one embodiment, said one or more diagnostic species are
endogenous diagnostic species.
[0882] In one embodiment, said one or more diagnostic species are
associated with NMR spectral intensity at predetermined diagnostic
spectral windows.
[0883] In one embodiment, said one or more diagnostic species are a
plurality of diagnostic species (i.e., a combination of diagnostic
species).
[0884] In one embodiment, said one or more diagnostic species is a
single diagnostic species.
[0885] The term "endogenous species," as used herein, pertains to
chemical species which originated from the subject under study, for
example, which were present in the sample of the subject.
[0886] Once an index value, and its associated diagnostic spectral
window, is identified (e.g., by the application of modelling
methods as described herein), it is often possible to identify one
or more putative biomarkers which give rise to NMR spectral
intensity in that particular window.
[0887] The (e.g., integrated) NMR spectral intensity in a
particular spectral window (e.g., bucket) is the sum of the
spectral intensity for all of the NMR peaks in that window. Usually
for small molecules which give sharp NMR peaks, it is possible to
examine the raw NMR data and determine which of the peaks is
responsible for that particular spectral window being selected as
significant by the applied pattern recognition method. The relevant
peak(s) are then assigned.
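The check described above can be made concrete. List the peaks whose positions fall inside a selected window, then rank them by their share of the window's integrated intensity. This sketch assumes a peak list (position and area) has already been obtained by a peak-picking step not described here. It assigns each peak wholly to the window containing its centre and ignores peaks that straddle a window edge. The peak values are made up.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Peak(NamedTuple):
    shift: float  # peak centre, ppm
    area: float   # integrated peak area (arbitrary units)


def window_contributions(peaks: Sequence[Peak], low: float, high: float
                         ) -> Tuple[float, List[Tuple[Peak, float]]]:
    """Total intensity in [low, high) and each peak's fractional share, largest first."""
    inside = [p for p in peaks if low <= p.shift < high]
    total = float(sum(p.area for p in inside))
    if total == 0.0:
        return 0.0, []
    ranked = sorted(inside, key=lambda p: p.area, reverse=True)
    return total, [(p, p.area / total) for p in ranked]


peaks = [Peak(3.21, 6.0), Peak(3.23, 2.0), Peak(3.30, 9.0)]
total, shares = window_contributions(peaks, 3.20, 3.24)
print(total, [(p.shift, round(f, 2)) for p, f in shares])
```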
[0888] Such assignments may be made, for example, by reference to
published data; by comparison with spectra of authentic materials;
by standard addition of an authentic reference standard to the
sample; or by separating the individual component (e.g., by
HPLC-NMR) and identifying it using NMR and mass spectrometry.
Additional confirmation of assignments is usually sought from the
application of other NMR methods, including, for example,
2-dimensional (2D) NMR methods.
[0889] In another approach, concentrations of candidate chemical
species are measured by another specific method (e.g., ELISA,
chromatography, RIA, etc.) and compared with the spectral intensity
observed in the relevant diagnostic spectral window, and any
correlation noted. This will reveal how much of the variance in the
diagnostic spectral window is contributed by the candidate chemical
species. This may also reveal that suspected diagnostic species
are, in fact, not highly correlated with the condition under
examination.
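The comparison in paragraph [0889] can be quantified with a Pearson correlation. It relates the independently measured concentrations of a candidate species to the integrated intensity in the diagnostic window across the same samples. The squared coefficient, r², is the fraction of the window's variance that a straight-line fit on the candidate's concentration accounts for. This is a minimal sketch with made-up numbers. The text does not prescribe a particular correlation measure.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: assay concentrations vs. window intensity, four samples.
conc = [1.0, 2.0, 3.0, 4.0]
window = [0.9, 2.1, 2.9, 4.2]
r = pearson_r(conc, window)
print(f"r = {r:.3f}, fraction of window variance explained = {r * r:.3f}")
```

A low r² suggests that other species also contribute to the window, or that the candidate is not what drives the discrimination.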
[0890] Methods of Identifying Diagnostic Species
[0891] Thus, the methods described herein also facilitate the
identification of species (often referred to as biomarkers or
diagnostic species) which are indicative (e.g., diagnostic) of a
particular condition. For example, particular metabolites (e.g., in
blood, urine, etc.) may be diagnostic of a particular
condition.
[0892] One aspect of the present invention pertains to a method of
identifying such diagnostic species (e.g., biomarkers), as
described herein.
[0893] One aspect of the present invention pertains to a method of
identifying a diagnostic species, or a combination of a plurality
of diagnostic species, for a predetermined condition, said method
comprising the steps of:
[0894] (a) applying a multivariate statistical analysis method to
experimental data;
[0895] wherein said experimental data comprises at least one data set
comprising experimental parameters measured for each of a plurality
of experimental samples;
[0896] wherein said experimental samples define a class group
consisting of a plurality of classes;
[0897] wherein at least one of said plurality of classes is a class
associated with said predetermined condition, e.g., a class
associated with the presence of said predetermined condition;
[0898] wherein at least one of said plurality of classes is a class
not associated with said predetermined condition, e.g., a class
associated with the absence of said predetermined condition;
[0899] wherein each of said experimental samples is of known class
selected from said class group;
[0900] and:
[0901] (b) identifying one or more critical experimental
parameters;
[0902] wherein each of said critical experimental parameters is
statistically significantly different for classes of said class
group, e.g., is statistically significant for discriminating
between classes of said class group; and,
[0903] (c) matching each of one or more of said one or more
critical experimental parameters with said diagnostic species;
[0904] or:
[0905] (b) identifying a combination of a plurality of critical
experimental parameters;
[0906] wherein said combination of a plurality of critical
experimental parameters is statistically significantly different
for classes of said class group, e.g., is statistically significant
for discriminating between classes of said class group; and,
[0907] (c) matching each of one or more of said plurality of
critical experimental parameters with said combination of a
plurality of diagnostic species.
[0908] In one embodiment, one or more of said critical experimental
parameters is a spectral parameter (i.e., a critical experimental
spectral parameter); and said identifying and matching steps
are:
[0909] (b) identifying one or more critical experimental spectral
parameters; and,
[0910] (c) matching each of one or more of said one or more
critical experimental spectral parameters with a spectral feature,
e.g., a spectral peak; and
[0911] matching one or more of said spectral peaks with said
diagnostic species;
[0912] or:
[0913] (b) identifying a combination of a plurality of critical
experimental spectral parameters; and,
[0914] (c) matching each of a plurality of said plurality of
critical experimental spectral parameters with a spectral feature,
e.g., a spectral peak; and
[0915] matching one or more of said spectral peaks with said
combination of a plurality of diagnostic species.
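Steps (b) and (c) can be illustrated with a deliberately simplified sketch. The text calls for a multivariate method (e.g., PCA or PLS-DA, optionally with OSC). The code below instead swaps in a univariate Welch t-statistic per experimental parameter, purely to show the shape of the workflow. It scores each parameter for how well it separates two classes, keeps the highest-ranked parameters as candidate critical parameters, and looks each one up in an assignment table. The parameter names, data and assignment table are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Dict, List, Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch's t-statistic for the difference in means of two groups."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    diff = mean(a) - mean(b)
    if se == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def rank_parameters(present: Sequence[Sequence[float]],
                    absent: Sequence[Sequence[float]],
                    names: Sequence[str]) -> List[Tuple[str, float]]:
    """Step (b) stand-in: rank parameters (columns) by |t|, largest first."""
    scores = []
    for j, name in enumerate(names):
        t = welch_t([s[j] for s in present], [s[j] for s in absent])
        scores.append((name, t))
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


def match_species(ranked: List[Tuple[str, float]],
                  assignments: Dict[str, str], top: int = 1
                  ) -> List[Tuple[str, float, str]]:
    """Step (c): attach a putative species to each top-ranked parameter."""
    return [(name, t, assignments.get(name, "unassigned"))
            for name, t in ranked[:top]]


names = ["3.22", "1.30"]                        # hypothetical bucket index values
present = [[5.0, 1.0], [6.0, 1.2], [5.5, 0.9]]  # condition present
absent = [[2.0, 1.1], [2.5, 1.0], [1.5, 1.05]]  # condition absent
ranked = rank_parameters(present, absent, names)
print(match_species(ranked, {"3.22": "species X (hypothetical assignment)"}))
```

With many buckets and few samples, any per-parameter screen of this kind needs a multiple-comparison correction, and the identity of each matched species still requires confirmation (step (d)).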
[0916] In one embodiment, said multivariate statistical analysis
method is a multivariate statistical analysis method which employs
a pattern recognition method.
[0917] In one embodiment, said multivariate statistical analysis
method is, or employs, PCA.
[0918] In one embodiment, said multivariate statistical analysis
method is, or employs, PLS.
[0919] In one embodiment, said multivariate statistical analysis
method is, or employs, PLS-DA.
[0920] In one embodiment, said multivariate statistical analysis
method includes a step of data filtering.
[0921] In one embodiment, said multivariate statistical analysis
method includes a step of orthogonal data filtering.
[0922] In one embodiment, said multivariate statistical analysis
method includes a step of OSC.
[0923] In one embodiment, said experimental parameters comprise
spectral data.
[0924] In one embodiment, said experimental parameters comprise
both spectral data and non-spectral data (referred to herein as
"composite experimental data").
[0925] In one embodiment, said experimental parameters comprise NMR
spectral data.
[0926] In one embodiment, said experimental parameters comprise
both NMR spectral data and non-NMR spectral data.
[0927] In one embodiment, said NMR spectral data comprises ¹H
NMR spectral data and/or ¹³C NMR spectral data.
[0928] In one embodiment, said NMR spectral data comprises ¹H
NMR spectral data.
[0929] In one embodiment, said non-spectral data is non-spectral
clinical data.
[0930] In one embodiment, said non-NMR spectral data is
non-spectral clinical data.
[0931] In one embodiment, said critical experimental parameters are
spectral parameters.
[0932] In one embodiment, said class group comprises classes
associated with said predetermined condition (e.g., presence,
absence, degree, etc.).
[0933] In one embodiment, said class group comprises exactly two
classes.
[0934] In one embodiment, said class group comprises exactly two
classes: presence of said predetermined condition; and absence of
said predetermined condition.
[0935] In one embodiment, said class associated with said
predetermined condition is a class associated with the presence of
said predetermined condition.
[0936] In one embodiment, said class not associated with said
predetermined condition is a class associated with the absence of
said predetermined condition.
[0937] In one embodiment, said method further comprises the
additional step of:
[0938] (d) confirming the identity of said diagnostic species.
[0939] One aspect of the present invention pertains to novel
diagnostic species (e.g., biomarkers) which are identified by such a
method.
[0940] One aspect of the present invention pertains to one or more
diagnostic species (e.g., biomarkers) which are identified by such
a method for use in a method of classification (e.g.,
diagnosis).
[0941] One aspect of the present invention pertains to a method of
classification (e.g., diagnosis) which employs or relies upon one
or more diagnostic species (e.g., biomarkers) which are identified
by such a method.
[0942] One aspect of the present invention pertains to use of one
or more diagnostic species (e.g., biomarkers) which are identified
by such a method in a method of classification (e.g.,
diagnosis).
[0943] One aspect of the present invention pertains to an assay for
use in a method of classification (e.g., diagnosis), which assay
relies upon one or more diagnostic species (e.g., biomarkers) which
are identified by such a method.
[0944] One aspect of the present invention pertains to use of an
assay in a method of classification (e.g., diagnosis), which assay
relies upon one or more diagnostic species (e.g., biomarkers) which
are identified by such a method.
[0945] Diagnostic Species--Hypertension
[0946] In one embodiment, at least one of said one or more
predetermined diagnostic species is a species described in Table
2-1-HYP, Table 2-2-HYP, and/or Table 2-3-HYP.
[0947] In one embodiment, each of a plurality of said one or more
predetermined diagnostic species is a species described in Table
2-1-HYP, Table 2-2-HYP, and/or Table 2-3-HYP.
[0948] In one embodiment, each of said one or more predetermined
diagnostic species is a species described in Table 2-1-HYP, Table
2-2-HYP, and/or Table 2-3-HYP.
[0949] Diagnostic Species--Atherosclerosis/CHD
[0950] In one embodiment, at least one of said one or more
predetermined diagnostic species is a species described in Table
3-4-CHD.
[0951] In one embodiment, each of a plurality of said one or more
predetermined diagnostic species is a species described in Table
3-4-CHD.
[0952] In one embodiment, each of said one or more predetermined
diagnostic species is a species described in Table 3-4-CHD.
[0953] Diagnostic Species--Osteoporosis
[0954] In one embodiment, at least one of said one or more
predetermined diagnostic species is a species described in Table
4-1-OP and/or Table 4-2-OP.
[0955] In one embodiment, each of a plurality of said one or more
predetermined diagnostic species is a species described in Table
4-1-OP and/or Table 4-2-OP.
[0956] In one embodiment, each of said one or more predetermined
diagnostic species is a species described in Table 4-1-OP and/or
Table 4-2-OP.
[0957] Diagnostic Species--Osteoarthritis
[0958] In one embodiment, at least one of said one or more
predetermined diagnostic species is a species described in Table
5-1-OA and/or Table 5-2-OA.
[0959] In one embodiment, each of a plurality of said one or more
predetermined diagnostic species is a species described in Table
5-1-OA and/or Table 5-2-OA.
[0960] In one embodiment, each of said one or more predetermined
diagnostic species is a species described in Table 5-1-OA and/or
Table 5-2-OA.
[0961] Amount or Relative Amount
[0962] As discussed above, many of the methods of the present
invention involve classification on the basis of an amount, or a
relative amount, of one or more diagnostic species.
[0963] In one embodiment, said classification is performed on the
basis of an amount, or a relative amount, of a single diagnostic
species.
[0964] In one embodiment, said classification is performed on the
basis of an amount, or a relative amount, of a plurality of
diagnostic species.
[0965] In one embodiment, said classification is performed on the
basis of an amount, or a relative amount, of each of a plurality of
diagnostic species.
[0966] In one embodiment, said classification is performed on the
basis of a total amount, or a relative total amount, of a plurality
of diagnostic species.
[0967] In one embodiment (wherein said one or more diagnostic
species is: a plurality of diagnostic species), said amount of, or
relative amount of one or more diagnostic species is: a combination
of a plurality of amounts, or relative amounts, each of which is
the amount of, or relative amount of one of said plurality of
diagnostic species.
[0968] In one embodiment, said combination is a linear
combination.
[0969] The term "amount," as used in this context, pertains to the
amount regardless of the terms of expression.
[0970] The term "amount," as used herein in the context of "amount
of, or relative amount of (e.g., diagnostic) species," pertains to
the amount regardless of the terms of expression.
[0971] Absolute amounts may be expressed, for example, in terms of
mass (e.g., µg), moles (e.g., µmol), volume (e.g., µL),
concentration (molarity, µg/mL, µg/g, wt %, vol %, etc.), etc.
[0972] Relative amounts may be expressed, for example, as ratios of
absolute amounts (e.g., as a fraction, as a multiple, as a %) with
respect to another chemical species. For example, the amount may be
expressed as a relative amount, relative to an internal standard,
for example, another chemical species which is endogenous or
added.
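For NMR data, a relative amount against an internal standard is commonly computed by normalising each integral by the number of protons giving rise to it: C_x = C_std × (I_x / N_x) / (I_std / N_std). The sketch below assumes well-resolved signals and fully relaxed spectra. Without these, the integrals are not proportional to concentration. The numbers are illustrative only.

```python
def relative_to_standard(i_x: float, n_x: int, i_std: float, n_std: int) -> float:
    """Molar ratio of species x to the standard from proton-normalised integrals."""
    if n_x <= 0 or n_std <= 0:
        raise ValueError("proton counts must be positive")
    if i_std <= 0:
        raise ValueError("standard integral must be positive")
    return (i_x / n_x) / (i_std / n_std)


def concentration_from_standard(i_x: float, n_x: int, i_std: float,
                                n_std: int, c_std: float) -> float:
    """Absolute concentration, in the units of c_std, via a standard of known concentration."""
    return c_std * relative_to_standard(i_x, n_x, i_std, n_std)


# Illustrative: a 3-proton signal integrating to 2.0 against a 9-proton
# standard integrating to 9.0 at 1.0 mM gives (2.0/3) / (9.0/9) = 0.667 mM.
print(round(concentration_from_standard(2.0, 3, 9.0, 9, 1.0), 3))
```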
[0973] The amount may be indicated indirectly, in terms of another
quantity (possibly a precursor quantity) which is indicative of the
amount. For example, the other quantity may be a spectrometric or
spectroscopic quantity (e.g., signal, intensity, absorbance,
transmittance, extinction coefficient, conductivity, etc.;
optionally processed, e.g., integrated) which is itself indicative of
the amount.
[0974] The amount may be indicated, directly or indirectly, in
regard to a different chemical species (e.g., a metabolic
precursor, a metabolic product, etc.), which is indicative of the
amount.
[0975] Diagnostic Shift
[0976] As discussed above, many of the methods of the present
invention involve classification on the basis of a modulation,
e.g., of NMR spectral intensity at one or more predetermined
diagnostic spectral windows; of the amount, or a relative amount,
of diagnostic species; etc. In this context, "modulation" pertains
to a change, and may be, for example, an increase or a decrease. In
one embodiment, said "a modulation of" is "an increase or decrease
in."
[0977] In one embodiment, the modulation (e.g., increase, decrease)
is at least 10%, as compared to a suitable control. In one
embodiment, the modulation (e.g., increase, decrease) is at least
20%, as compared to a suitable control. In one embodiment, the
modulation is a decrease of at least 50% (i.e., a factor of 0.5).
In one embodiment, the modulation is an increase of at least 100%
(i.e., a factor of 2).
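The thresholds in paragraph [0977] can be expressed as a fold change relative to the control value:

- a modulation of at least 10% means |fold - 1| >= 0.10;
- a decrease of at least 50% means fold <= 0.5;
- an increase of at least 100% means fold >= 2.

This is a minimal sketch. It assumes the "suitable control" has already been reduced to a single positive reference value, such as a control mean.

```python
def fold_change(test_value: float, control_value: float) -> float:
    """Ratio of the test value to the control value."""
    if control_value <= 0:
        raise ValueError("control value must be positive")
    return test_value / control_value


def is_modulated(test_value: float, control_value: float,
                 min_change: float = 0.10) -> bool:
    """True if the value differs from control by at least `min_change` (0.10 = 10%)."""
    return abs(fold_change(test_value, control_value) - 1.0) >= min_change


def is_large_decrease(test_value: float, control_value: float) -> bool:
    """Decrease of at least 50%, i.e. a factor of 0.5 or less."""
    return fold_change(test_value, control_value) <= 0.5


def is_large_increase(test_value: float, control_value: float) -> bool:
    """Increase of at least 100%, i.e. a factor of 2 or more."""
    return fold_change(test_value, control_value) >= 2.0


print(is_modulated(1.05, 1.0), is_modulated(0.8, 1.0),
      is_large_decrease(0.4, 1.0), is_large_increase(3.0, 1.0))
```

Values lying exactly on a threshold may need a small tolerance because of floating-point rounding.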
[0978] Each of a plurality of predetermined diagnostic spectral
windows, and each of a plurality of diagnostic species, may have
independent modulations, which may be the same or different. For
example, if there are two predetermined diagnostic spectral
windows, NMR spectral intensity may increase in one window and
decrease in the other window. In this way, combinations of
modulations of NMR spectral intensity in different diagnostic
spectral windows may be diagnostic. Similarly, if there are two
diagnostic species, the amount of one may increase, and the amount
of the other may decrease. Again, combinations of modulations of
amounts, or relative amounts of, different diagnostic species may
be diagnostic. See, for example, the data in the Examples below,
which illustrate cases where different species have different
modulations.
[0979] The term "diagnostic shift," as used herein, pertains to a
modulation (e.g., increase, decrease), as compared to a suitable
control.
[0980] A diagnostic shift may be in regard to, for example, NMR
spectral intensity at one or more predetermined diagnostic spectral
windows; or the amount of, or relative amount of, diagnostic
species.
[0981] Control Samples, Control Subjects, Control Data
[0982] Suitable controls are usually selected on the basis of the
organism (e.g., subject, patient) under study (test subject, study
subject, etc.), and the nature of the study (e.g., type of sample,
type of spectra, etc.). Usually, controls are selected to represent
the state of "normality." As described herein, deviations from
normality (e.g., higher than normal, lower than normal) in test
data, test samples, test subjects, etc. are used in classification,
diagnosis, etc.
[0983] For example, in most cases, control subjects are the same
species as the test subject and are chosen to be representative of
the equivalent normal (e.g., healthy) organism. A control
population is a population of control subjects. If appropriate,
control subjects may have characteristics in common (e.g., sex,
ethnicity, age group, etc.) with the test subject. If appropriate,
control subjects may have characteristics (e.g., age group, etc.)
which differ from those of the test subject. For example, it may be
desirable to choose, as control subjects, healthy 20-year-olds of the
same sex and ethnicity as the study subject.
[0984] In most cases, control samples are taken from control
subjects. Usually, control samples are of the same sample type
(e.g., serum), and are collected and handled (e.g., treated,
processed, stored) under the same or similar conditions, as the
sample under study (e.g., test sample, study sample).
[0985] In most cases, control data (e.g., control values) are
obtained from control samples which are taken from control
subjects. Usually, control data (e.g., control data sets, control
spectral data, control spectra, etc.) are of the same type (e.g.,
1-D ¹H NMR, etc.), and are collected and handled (e.g.,
recorded, processed) under the same or similar conditions (e.g.,
parameters), as the test data.
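The following minimal sketch flags deviations from normality against a control population. The control values are summarised by their mean and sample standard deviation. A test value is labelled higher or lower than normal when its z-score exceeds a cut-off. The cut-off of 2, and the assumption that control values are roughly normally distributed, are illustrative choices that are not taken from the text.

```python
from statistics import mean, stdev
from typing import Sequence


def z_score(test_value: float, control_values: Sequence[float]) -> float:
    """Standardised distance of a test value from the control mean."""
    sd = stdev(control_values)  # requires at least two control values
    if sd == 0:
        raise ValueError("control values have zero spread")
    return (test_value - mean(control_values)) / sd


def compare_to_normal(test_value: float, control_values: Sequence[float],
                      z_limit: float = 2.0) -> str:
    """Label a test value relative to the control population (hypothetical cut-off)."""
    z = z_score(test_value, control_values)
    if z > z_limit:
        return "higher than normal"
    if z < -z_limit:
        return "lower than normal"
    return "within normal range"


controls = [9.0, 10.0, 11.0]  # hypothetical control-population values
for value in (13.0, 10.5, 7.0):
    print(value, compare_to_normal(value, controls))
```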
[0986] Implementation
[0987] The methods of the present invention, or parts thereof, may
be conveniently performed electronically, for example, using a
suitably programmed computer system.
[0988] One aspect of the present invention pertains to a computer
system or device, such as a computer or linked computers,
operatively configured to implement a method of the present
invention, as described herein.
[0989] One aspect of the present invention pertains to computer
code suitable for implementing a method of the present invention,
as described herein, on a suitable computer system.
[0990] One aspect of the present invention pertains to a computer
program comprising computer program means adapted to perform a
method according to the present invention, as described herein,
when said program is run on a computer.
[0991] One aspect of the present invention pertains to a computer
program, as described above, embodied on a computer readable
medium.
[0992] One aspect of the present invention pertains to a data
carrier which carries computer code suitable for implementing a
method of the present invention, as described herein, on a suitable
computer.
[0993] In one embodiment, the above-mentioned computer code or
computer program includes, or is accompanied by, computer code
and/or computer readable data representing a predictive
mathematical model, as described herein.
[0994] In one embodiment, the above-mentioned computer code or
computer program includes, or is accompanied by, computer code
and/or computer readable data representing data from which a
predictive mathematical model, as described herein, may be
calculated.
[0995] One aspect of the present invention pertains to computer
code and/or computer readable data representing a predictive
mathematical model, as described herein.
[0996] One aspect of the present invention pertains to a data
carrier which carries computer code and/or computer readable data
representing a predictive mathematical model, as described
herein.
[0997] One aspect of the present invention pertains to a computer
system or device, such as a computer or linked computers,
programmed or loaded with computer code and/or computer readable
data representing a predictive mathematical model, as described
herein.
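As one illustration of "computer readable data representing a predictive mathematical model," a linear model could be stored as JSON and reloaded before being applied to new spectra. The stored model consists of coefficients per index value, an intercept, and a decision threshold. The model form, field names and values below are hypothetical. The text does not specify a storage format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass
class LinearModel:
    weights: Dict[str, float]  # index value (ppm, as text) -> coefficient
    intercept: float
    threshold: float

    def score(self, intensities: Dict[str, float]) -> float:
        return self.intercept + sum(w * intensities[k] for k, w in self.weights.items())

    def predict(self, intensities: Dict[str, float]) -> bool:
        return self.score(intensities) > self.threshold


def model_to_json(model: LinearModel) -> str:
    """Serialise the model to text that can be written to any data carrier."""
    return json.dumps(asdict(model), sort_keys=True)


def model_from_json(text: str) -> LinearModel:
    """Rebuild the model from its JSON representation."""
    return LinearModel(**json.loads(text))


model = LinearModel({"3.22": 1.5, "1.30": -0.8}, intercept=0.0, threshold=0.0)
stored = model_to_json(model)
restored = model_from_json(stored)
print(restored == model, restored.predict({"3.22": 2.0, "1.30": 1.0}))
```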
[0998] Computers may be linked, for example, internally (e.g., on
the same circuit board, on different circuit boards which are part
of the same unit), by cabling (e.g., networking, ethernet,
internet), using wireless technology (e.g., radio, microwave,
satellite link, cell-phone), etc., or by a combination thereof.
[0999] Examples of data carriers and computer readable media
include chip media (e.g., ROM, RAM, flash memory (e.g., Memory
Stick™, CompactFlash™, SmartMedia™)), magnetic disk media
(e.g., floppy disks, hard drives), optical disk media (e.g.,
compact disks (CDs), digital versatile disks (DVDs),
magneto-optical (MO) disks), and magnetic tape media.
[1000] Although the ¹H NMR spectra analysed here were
generated using a conventional (and hence large and expensive) 600
MHz NMR spectrometer, ongoing technological advances suggest that
spectrometers of similar resolving power may soon be available as
desktop units (provided the sample to be analysed is small, as is
the case with plasma or serum samples). Such units, together with a
personal computer to perform automated pattern recognition, could
then be deployed not only in large hospitals but also in primary
healthcare settings.
[1001] One aspect of the present invention pertains to a system
(e.g., an "integrated analyser", "diagnostic apparatus") which
comprises:
[1002] (a) a first component comprising a device for obtaining NMR
spectral intensity data for a sample (e.g., a NMR spectrometer,
e.g., a Bruker INCA 500 MHz); and,
[1003] (b) a second component comprising a computer system or device,
such as a computer or linked computers, operatively configured to
implement a method of the present invention, as described herein,
and operatively linked to said first component.
[1004] In one embodiment, the first and second components are in
close proximity, e.g., so as to form a single console, unit,
system, etc. In one embodiment, the first and second components are
remote (e.g., in separate rooms, in separate buildings).
[1005] A simple process for the use of such a system is described
below. In a first step, a sample (e.g., blood, urine, etc.) is
obtained from a subject, for example, by a suitably qualified
medical technician, nurse, etc., and the sample is processed as
required. For example, a blood sample may be drawn, and
subsequently processed to yield a serum sample, within about three
hours.
[1006] In a second step, the sample is appropriately processed
(e.g., by dilution, as described herein), and an NMR spectrum is
obtained for the sample, for example, by a suitably qualified NMR
technician. Typically, this would require about fifteen
minutes.
[1007] In a third step, the NMR spectrum is analysed and/or
classified using a method of the present invention, as described
herein. This may be performed, for example, using a computer system
or device, such as a computer or linked computers, operatively
configured to implement the methods described herein. In one
embodiment, this step is performed at a location remote from the
previous step. For example, an NMR spectrometer located in a
hospital or clinic may be linked, for example, by ethernet,
internet, or wireless connection, to a remote computer which
performs the analysis/classification. If appropriate, the result is
then forwarded to the appropriate destination, e.g., the attending
physician. Typically, this would require about fifteen minutes.
[1008] Applications
[1009] The methods described herein can be used in the analysis of
chemical, biochemical, and biological data.
[1010] The methods described herein provide powerful means for the
diagnosis and prognosis of disease, for assisting medical
practitioners in providing optimum therapy for disease, and for
understanding the benefits and side-effects of xenobiotic compounds,
thereby aiding the drug development process.
[1011] Furthermore, the methods described herein can be applied in
a non-medical setting, such as in post mortem examinations,
forensic science, and the analysis of complex chemical mixtures
other than mammalian cells or biofluids.
[1012] Examples of these and other applications of the methods
described herein include, but are not limited to, the
following:
[1013] Medical Diagnostic Applications
[1014] (a) Early detection of abnormality/problem. For example, the
technique can be used to identify subjects suffering from cerebral
edema immediately on arrival in the acute emergency department of a
hospital. At present, when patients present with head trauma, it is
difficult to tell whether cerebral edema will be a problem: as a
result, it may not be possible to intervene until clinical symptoms
of cerebral edema become evident, which may be too late to save the
patient.
[1015] In a similar example, patients arriving at acute emergency
departments can be screened for internal bleeding and organ
rupture, to facilitate early surgical intervention.
[1016] In a third example, the methods described herein can be used
to identify a clinically silent disease (e.g., low bone mineral
density (e.g., osteoporosis); infection with Helicobacter pylori)
prior to the onset of clinical symptoms (e.g., fracture;
development of ulcers).
[1017] (b) Diagnosis (identification of disease), especially cheap,
rapid, and non-invasive diagnosis. For example, the methods
described herein can be used to replace treadmill exercise tests,
echocardiograms, electrocardiograms, and invasive angiography as
the collective method for the identification of coronary heart
disease. Since the current tests for coronary heart disease are
slow, expensive, and invasive (with associated morbidity and
mortality), the methods described herein offer significant
advantages.
[1018] (c) Differential diagnosis, e.g., classification of disease,
severity of disease, etc., for example, the ability to distinguish
patients with coronary artery disease affecting 1, 2, or all 3
coronary arteries (see example below); the ability to distinguish
disease at different anatomical sites, e.g., in the left coronary
artery versus the circumflex artery, or in the carotid arteries as
opposed to the coronary arteries.
[1019] (d) Population targeting. A condition (e.g., coronary heart
disease, osteoporosis) may be clinically silent for many years
prior to an acute event (e.g., heart attack, bone fracture), which
may have significant associated morbidity or mortality. Drugs may
exist to help prevent the acute event (e.g., statins for heart
disease, bisphosphonates for osteoporosis), but often they cannot
be efficiently targeted at the population level. To be useful for
population screening, a test must be cheap and non-invasive. The
methods described herein are ideally
suited to population screening. Screens for multiple diseases with
a single blood sample (e.g., osteoporosis, heart disease, and
cancer) further improve the cost/benefit ratio for screening.
[1020] (e) Classification, fingerprinting, and diagnosis of
metabolic diseases (e.g., inborn errors of metabolism).
[1021] (f) Identifying, classifying, determining the progress of,
and monitoring the treatment of, infectious diseases.
[1022] (g) Characterization and identification of drugs used in
overdose. For example, a patient may be unconscious following an
overdose and/or the nature of the drug taken in overdose may not be
known. The methods described herein can be used to characterise the
biological consequences of the overdose and to rapidly identify
candidate agents, facilitating rapid intervention to reverse the
effects. Thus an overdose of opioids could rapidly be countered
with naloxone.
[1023] (h) Characterization and identification of poisons, and the
metabolic or biological consequences of poisoning. Many victims of
poisoning (e.g., children) are unaware of the nature of the
substance they have taken. Furthermore, the subject may be
unconscious or unable to communicate. The methods described herein
can be used to characterise the biological consequences of the
poisoning and to rapidly identify candidate poisons. This would
facilitate administration of appropriate antidote, which typically
must be done as quickly as possible after exposure to (e.g.,
ingestion of) the toxic substance.
[1024] Medical Prognosis Applications
[1025] (a) Prognosis (prediction of future outcome), including, for
example, analysis of "old" samples to effect retrospective
prognosis. For example, a sample can be used to assess the risk of
myocardial infarction among sufferers of angina, permitting a more
aggressive therapeutic strategy to be applied to those at greatest
risk of progressing to a heart attack.
[1026] (b) Risk assessment, to identify people at risk of suffering
from a particular indication. The methods described herein can be
used for population screening (as for diagnosis) but in this case
to screen for the risk of developing a particular disease. Such an
approach will be useful where an effective prophylaxis is known but
must be applied prior to the development of the disease in order to
be effective. For example, bisphosphonates are effective at
preventing bone loss in osteoporosis but they do not increase
pathologically low bone mineral density. Ideally, therefore, these
drugs are applied prior to any bone loss occurring. This can only
be done with a technique which facilitates prediction of future
disease (prognosis). The methods described herein can be used to
identify those people at high risk of losing bone mineral density
in the future, so that prophylaxis may begin prior to disease
inception.
[1027] (c) Antenatal screening for a wide range of disease
susceptibilities. The methods described herein can be used to
analyse blood or tissue drawn from a fetus (e.g., during
chorionic villus sampling or amniocentesis) for the purposes of
antenatal screening.
[1028] Aids to Therapeutic Intervention
[1029] (a) Therapeutic monitoring, e.g., to monitor the progress of
treatment. For example, by making serial diagnostic tests, it will
be possible to determine whether and to what extent the subject is
returning to normal following initiation of a therapeutic
regimen.
[1030] (b) Patient compliance, e.g., monitoring patient compliance
with therapy. Patient compliance is often very poor, particularly
with therapies that have significant side-effects. Patients often
claim to comply with the therapeutic regimen, but this may not
always be the case. The methods described herein permit the patient
compliance to be monitored, both by directly measuring the drug
concentration and also by examining biological consequences of the
drug. Thus, the methods described herein offer significant
advantages over existing methods of monitoring compliance (such as
measuring plasma concentrations of the drug) since the patient may
take the drug just prior to the investigation, while having failed
to comply for previous weeks or months. By monitoring the
biological consequences of therapy, it is possible to assess
long-term compliance.
[1031] (c) Toxicology, including sophisticated monitoring of any
adverse reactions suffered, e.g., on a patient-by-patient basis.
This will facilitate investigation of idiosyncratic toxicity. Some
patients may suffer real, clinically significant side-effects from
a therapy that are not seen in the majority of patients. Application
of the methods described herein facilitates rapid identification of these
rare, idiosyncratic toxicities so that the therapy can be
discontinued or modified as appropriate. Such an approach allows
the therapy to be tailored to the individual metabolism of each
patient.
[1032] (d) The methods described herein can be used for
"pharmacometabonomics," in analogy to pharmacogenomics, e.g.,
subjects could be divided into "responders" and "nonresponders"
using the metabonomic profile as evidence of "response," and
features of the metabonomic profile could then be used to target
future patients who would likely respond to a particular
therapeutic course. For example, patients given statins could be
monitored using the methods described herein for beneficial changes
in the subtle composition of the lipoproteins which are associated
with coronary heart disease. On this basis, the patients could be
categorised into "statin responsive" or "statin unresponsive". In a
second stage, the methods described herein could be re-applied to
the untreated metabonomic fingerprint to identify pattern elements
which predict future responses to statins. Thus, the clinician
would know whether other patients should be treated with
statins, without having to wait weeks or months to assess the
outcome.
[1033] Tools for Drug Development
[1034] (a) Clinical evaluations of drug therapy and efficacy. As
for therapeutic monitoring, the methods described herein can be
used as one end-point in clinical trials for efficacy of new
therapies. The extent to which sequential diagnostic fingerprints
move towards normal can be used as one measure of the efficacy of
the candidate therapy.
[1035] (b) Detection of toxic side-effects of drugs and model
compounds (e.g., in the drug development process and in clinical
trials). For example, it will be possible to identify the major
sites of toxic effects (e.g., liver, kidney, etc.) for new
treatments during Phase I studies, as well as identifying
idiosyncratic toxicities during later stage clinical trials.
[1036] (c) Improvement in the quality control of transgenic animal
models of disease; aiding the design of transgenic models of
disease. Transgenic models of various diseases have been useful for
the preclinical development of new therapies. Although the
transgenic model may recapitulate many of the phenotypic markers of
the human disease, it is often unclear whether similar biochemical
mechanisms underlie the resulting phenotype.
[1037] (d) Other animal models of disease. For example, injection
of bovine type II collagen into mice has often been used as a model
of rheumatoid arthritis, resulting in joint swelling and
autoantibodies, but the mechanisms resulting in the phenotype have
little in common with the human disease. As a result, therapies
which are effective in the animal model may be ineffective in man.
The methods described herein can be used to examine the metabolic
and phenotypic consequences of gene manipulation or other
interventions used to yield an animal model of disease, and to
compare those with the metabolic and phenotypic changes
characteristic of the disease in man, and thereby validate a range
of animal models of human diseases.
[1038] (e) Searching for new biochemical markers of disease and/or
tissue or organ damage.
[1039] For example, the NMR bin around δ 3.22 was identified as
being particularly associated with coronary heart disease (see
examples below), and the associated species has been identified as
a novel metabolic marker of coronary heart disease which may be
amenable to therapeutic intervention.
[1040] Commercial and Other Non-Medical Applications
[1041] (a) Commercial classification for actuarial assessment, to
address the commercial need for insurance companies to assess
future risk of disease. Examples include the provision of health
insurance and general life cover. This application is similar to
prognostic assessment and risk assessment in population screening,
except that the purpose is to provide accurate actuarial
information.
[1042] (b) Clinical trial enrollment, to address the commercial
need for the ability to select individuals suffering from, or at
risk of suffering from, a particular condition for enrolment in
clinical trials. For example, at present to perform a clinical
trial to assess efficacy of a drug intended to prevent heart
disease it would be necessary to enroll at least 4,000 subjects and
follow them for 4 years. If it were possible to select individuals
who were suffering from heart disease, it is estimated that it
would be possible to use 400 subjects followed for 2 years, reducing
the cost by 25-fold or more.
[1043] (c) Characterization and identification of illicit drugs,
and the metabolic or biological consequences of substance abuse. As
for monitoring patient compliance with desired therapeutics, the
methods described herein can be used to examine the metabolic
consequences of illegal substance abuse, permitting confirmation of
the use of the substance, even if none of the substance or its
metabolites are present in the system at the time of investigation.
This defeats the strategy of using proscribed substances
chronically but temporarily suspending their use to avoid being
identified. This approach could be applied to the identification of
habitual users of illegal drugs (such as heroin, cocaine,
amphetamines, etc.) for police use, or for monitoring use of banned
substances in sports (e.g., to detect use of anabolic steroids
among athletes, etc.).
[1044] (d) Application to pathology and post-mortem studies. For
example, the methods described herein could be used to identify the
proximate cause of death in a subject undergoing post-mortem
examination.
[1045] (e) Application to forensic science. For example, the
methods described herein can be used to identify the metabolic
consequences of a range of actions on a subject (who may be either
dead or alive at the time of the investigation). For example, the
methods described herein can be applied to identify metabolic
consequences of asphyxiation, poisoning, sexual arousal, or
fear.
[1046] (f) Analysis of samples other than mammalian cells or
biofluids. For example, the methods described herein can be applied
to a panel of wines, classified by experts for their quality. By
recognising patterns associated with good quality, the methods
described herein can be used by wine manufacturers during the
preparation of blends, as well as by wine purchasers to facilitate
a rapid and independent assessment of the quality of a given
wine.
[1047] (g) The methods described herein can also be used to
identify (known or novel) genotypes and/or phenotypes, and to
determine an organism's phenotype or genotype. This may assist with
the choice of a suitable treatment or facilitate assessment of its
relevance in a drug development process. For example, the
generation of metabonomic data in panels of individuals with
disease states, infected states, or undergoing treatment may
indicate response profiles of groups of individuals which can be
differentiated into two or more subgroups, indicating that an
allelic genetic basis for response to the disease, state, or
treatment exists. For example, a particular phenotype may not be
susceptible to treatment with a certain drug, while another
phenotype may be susceptible to treatment. Conversely, one
phenotype might show toxicity because of a failure to metabolise
and hence excrete a drug, which drug might be safe in another
phenotype as it does not exhibit this effect. For example,
metabonomic methods can be used to determine the acetylator status
of an organism: there are two phenotypes, corresponding to "fast"
and "slow" acetylation of drug metabolites. Phenotyping can be
achieved on the basis of the urine alone (i.e., without dosing a
xenobiotic), or on the basis of urine following dosing with a
xenobiotic which has the potential for acetylation (e.g.,
galactosamine). Similar methods can also be used to determine other
differences, such as other enzymatic polymorphisms, for example,
cytochrome P450 polymorphism.
[1048] As shown below, the methods described herein can be used
successfully to discriminate between identical and non-identical
twins.
[1049] The methods described herein may also be used in studies of
the biochemical consequences of genetic modification, for example,
in "knock-out animals" where one or more genes have been removed or
made non-functional; in "knock-in" animals where one or more genes
have been incorporated from the same or a different species; and in
animals where the number of copies of a gene has been increased (as
in the model of Alzheimer's disease in which the beta amyloid
protein is over-expressed in mouse brain).
Genes can be transferred between bacterial, plant and animal
species.
[1050] The combination of genomic, proteomic, and metabonomic data
sets into comprehensive "bionomic" systems may permit a holistic
evaluation of perturbed in vivo function.
[1051] The methods described herein may be used as an alternative
or adjunct to other methods, e.g., the various genomic,
pharmacogenomic, and proteomic methods.
EXAMPLES
[1052] The following examples are provided solely to illustrate
the present invention and are not intended to limit the scope of
the present invention, as described herein.
Example 1
Identical Versus Non-Identical Twins
[1053] As discussed above, the inventors have developed novel
methods (which employ multivariate statistical analysis and pattern
recognition (PR) techniques, and optionally data filtering
techniques) of analysing data (e.g., NMR spectra) from a test
population which yield accurate mathematical models which may
subsequently be used to classify a test sample or subject, and/or
in diagnosis.
[1054] These techniques have been applied to the analysis of blood
serum in the context of identifying identical and non-identical
twins. The metabonomic analysis can distinguish between identical
and non-identical twins. Novel diagnostic biomarkers for identical
and non-identical twins have been identified, and methods for
associated classification have been developed.
[1055] This example describes how lifelong differences in
metabolism between identical monozygotic (MZ) and non-identical
dizygotic (DZ) twins are revealed by ¹H NMR based
metabonomics, specifically, by the methods described herein. Although
the two types of twin differ so little, the methods described herein
can detect differences that arose in utero, in this case more than
half a century earlier.
[1056] Comparison of identical (MZ) and non-identical (DZ) twins
has been used extensively to assess the likely contribution of
genetic differences to variations in a wide range of phenotypes,
both physiological (such as blood pressure) and molecular (such as
the plasma concentration of various cytokines). These studies
depend for their accuracy on the assumption that MZ and DZ twin
pairs differ only in the proportion of genetic material which they
share. While this assumption has been the subject of some
controversy, there is little direct evidence for any other
systematic differences between MZ and DZ twins in adulthood.
[1057] Here, a ¹H-NMR based metabonomic analysis was applied
to a panel of MZ and DZ twin pairs which had previously been used
for the genetic analysis of a range of phenotypes.
[1058] The clinical data for the 204 subjects are summarised in
Table 1-1-TW, below. For each parameter, the mean value is given
together with one standard deviation (SD); n is the number of
subjects. In general, there is no difference between MZ
and DZ twins based on the measured parameters.
TABLE 1-1-TW
Parameter                        | Monozygotic twins (MZ) | Dizygotic twins (DZ)
Number of subjects (n)           | 98                     | 106
Age (years)                      | 58.6 ± 6.3             | 56.2 ± 8.6
BMI (kg/m²)                      | 24.8 ± 3.5             | 24.5 ± 3.8
Height (cm)                      | 160 ± 6.2              | 161.6 ± 5.9
Systolic blood pressure (mmHg)   | 137.9 ± 28.1           | 134.6 ± 24.6
Diastolic blood pressure (mmHg)  | 83.3 ± 15.6            | 82.0 ± 13.5
Smoker: current                  | 28                     | 38
Smoker: ex                       | 12                     | 22
Smoker: never                    | 58                     | 46
[1059] Obtaining NMR Spectra
[1060] Blood was drawn from each patient, allowed to clot in
plastic tubes for 2 hours at room temperature, and the serum was
collected by centrifugation. Aliquots of serum were stored at
-80 °C until assayed.
[1061] Prior to NMR analysis, serum samples (150 μl) were diluted with
350 μl of solvent solution (10% D₂O v/v, 0.9% NaCl w/v). The
diluted samples were then placed in 5 mm high quality NMR tubes
(Goss Scientific Instruments Ltd).
[1062] Conventional 1-D ¹H NMR spectra of the blood serum
samples were measured on a Bruker DRX-600 spectrometer using the
conditions set forth in the section entitled "NMR Experimental
Parameters."
[1063] NMR Experimental Parameters
[1064] (a) General:
[1065] Samples were NON-SPINNING in the spectrometer
[1066] Temperature: 300 K
[1067] Operating Frequency: 600.22 MHz
[1068] Spectral Width: 8389.3 Hz
[1069] Number of data points (TD): 32K
[1070] Number of scans: 64
[1071] Number of dummy scans: 4 (once only, before the start of the
acquisition).
[1072] Acquisition time: 1.95 s
[1073] (b) Pulse Sequence:
[1074] noesypr1d (Bruker standard NOESY-presaturation sequence, as
listed in the Bruker manual):
RD - 90° - t1 - 90° - tm - 90° - FID
[1075] Relaxation delay (RD): 1.5 s
[1076] Fixed interval (t1): 4 μs
[1077] Mixing time (tm): 150 ms
[1078] 90° pulse length: 10.9 μs
[1079] Total recycle period: 3.6 s
[1080] Secondary irradiation at the water resonance during RD and tm
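As a consistency check on the acquisition parameters listed above, a short Python sketch can recompute the acquisition time and the total recycle period. It assumes the Bruker convention that the dwell time between points is 1/(2·SW) with TD counting all (real plus imaginary) points, and that the recycle period is RD + tm + acquisition time, with pulse lengths and the 4 μs t1 interval neglected.

```python
# Consistency check of the acquisition parameters listed above.
# Assumption: dwell time DW = 1 / (2 * SW_Hz) and acquisition time AQ = TD * DW,
# where TD counts all (real + imaginary) points.

SW_HZ = 8389.3          # spectral width (Hz)
TD = 32 * 1024          # number of data points ("32K")
SFO1_MHZ = 600.22       # operating frequency (MHz)
RD_S = 1.5              # relaxation delay (s)
TM_S = 0.150            # mixing time (s)


def acquisition_time(td: int, sw_hz: float) -> float:
    """Acquisition time in seconds: TD dwell periods of 1/(2*SW)."""
    dwell = 1.0 / (2.0 * sw_hz)
    return td * dwell


def spectral_width_ppm(sw_hz: float, sfo_mhz: float) -> float:
    """Convert a spectral width in Hz to ppm at the given operating frequency."""
    return sw_hz / sfo_mhz


aq = acquisition_time(TD, SW_HZ)
# Pulse lengths and the 4 us t1 interval are negligible on this scale.
recycle = RD_S + TM_S + aq

print(f"AQ = {aq:.2f} s")                                      # AQ = 1.95 s
print(f"recycle = {recycle:.1f} s")                            # recycle = 3.6 s
print(f"SW = {spectral_width_ppm(SW_HZ, SFO1_MHZ):.2f} ppm")   # SW = 13.98 ppm
```

Both recomputed values agree with the listed 1.95 s acquisition time and 3.6 s total recycle period.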
[1081] (c) Phase Cycling
[1082] The phase of the RF pulses and the receiver was cycled on
successive scans to remove artefacts according to the following
scheme, where PH1 refers to the first 90° pulse, PH2 refers
to the second, PH3 refers to the third and PH31 refers to the phase
of the receiver. In the following scheme:
[1083] 0 denotes 0° phase increment
[1084] 1 denotes 90° phase increment
[1085] 2 denotes 180° phase increment
[1086] 3 denotes 270° phase increment
[1087] PH1=0 2
[1088] PH2=0 0 0 0 0 0 0 0 2 2 2 2 2 2 2 2
[1089] PH3=0 0 2 2 1 1 3 3
[1090] PH31=0 2 2 0 1 3 3 1 2 0 0 2 3 1 1 3
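The receiver phase list can be checked against the coherence-transfer pathway the sequence is meant to select (0 → ±1 → 0 → −1; both signs after the first pulse survive because PH1 is cycled only in 180° steps). The sketch below assumes that each phase list is cycled modulo its own length on successive scans, as in Bruker pulse programs, and uses the common convention φ_rec = −Σ Δp·φ. Under these assumptions the listed PH31 equals (−PH1 + PH2 + PH3) mod 4 on every scan.

```python
# Phase lists from the text, in units of 90 degrees (0, 1, 2, 3 = 0, 90, 180, 270 deg).
PH1 = [0, 2]
PH2 = [0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 2, 2]
PH3 = [0, 0, 2, 2, 1, 1, 3, 3]
PH31 = [0, 2, 2, 0, 1, 3, 3, 1, 2, 0, 0, 2, 3, 1, 1, 3]


def phase(lst, scan):
    """Phase used on a given scan, assuming the list wraps around (cycled modulo its length)."""
    return lst[scan % len(lst)]


def expected_receiver(scan):
    """Receiver phase for the pathway 0 -> +1 -> 0 -> -1 (delta p = +1, -1, -1),
    using phi_rec = -sum(delta_p * phi). The 0 -> -1 -> 0 -> -1 pathway gives the
    same value because PH1 only takes the values 0 and 2 (0 and 180 deg)."""
    return (-phase(PH1, scan) + phase(PH2, scan) + phase(PH3, scan)) % 4


cycle_length = 16  # least common multiple of the list lengths (2, 16, 8)
mismatches = [s for s in range(cycle_length) if expected_receiver(s) != phase(PH31, s)]
print("mismatched scans:", mismatches)  # mismatched scans: []
```

With 64 scans, the 16-step cycle is completed exactly four times.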
[1091] (d) Processing of the FIDs:
[1092] This was done using XWINNMR (version 2.1, Bruker GmbH,
Germany).
[1093] Automatic zero-filling by a factor of 2 at the end of the FID.
[1094] Line broadening by multiplying the FID by a negative
exponential equivalent to a line broadening of +0.3 Hz.
[1095] Fourier transform.
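The three FID processing steps above can be sketched with NumPy on a synthetic FID. This is not XWINNMR's implementation; it assumes the common exponential window exp(−π·LB·t), zero-filling by appending zeros to double the length, and a plain complex FFT. The single-line test signal is hypothetical.

```python
import numpy as np


def process_fid(fid: np.ndarray, dwell_s: float, lb_hz: float = 0.3) -> np.ndarray:
    """Exponential line broadening, zero-filling x2 and Fourier transform."""
    n = fid.size
    t = np.arange(n) * dwell_s
    # Multiply by a decaying exponential that adds lb_hz of line width.
    apodized = fid * np.exp(-np.pi * lb_hz * t)
    # Zero-fill to twice the original length (zeros appended at the end of the FID).
    padded = np.concatenate([apodized, np.zeros(n, dtype=apodized.dtype)])
    # Fourier transform; fftshift puts zero frequency at the centre of the axis.
    return np.fft.fftshift(np.fft.fft(padded))


# Hypothetical test signal: one line at +500 Hz with a 2 Hz natural line width.
sw_hz = 8389.3
dwell = 1.0 / sw_hz          # spacing of complex points
n_points = 16384             # 32K real + imaginary values = 16384 complex points
t = np.arange(n_points) * dwell
fid = np.exp(2j * np.pi * 500.0 * t) * np.exp(-np.pi * 2.0 * t)

spectrum = process_fid(fid, dwell)
freqs = np.fft.fftshift(np.fft.fftfreq(spectrum.size, d=dwell))
peak_hz = freqs[np.argmax(np.abs(spectrum))]
print(spectrum.size, peak_hz)
```

Zero-filling does not add information, but it halves the point spacing of the spectrum (here to about 0.26 Hz), which helps define line shapes before integration.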
[1096] (e) Processing of the NMR spectra:
[1097] This was done using XWINNMR (version 2.1, Bruker GmbH,
Germany).
[1098] Spectrum peak phase adjusted manually using the zero and
first order parameters PHC0, PHC1.
[1099] Baseline corrected manually using the command "basl." This
allows the subtraction of polynomial baselines of various degrees.
The simplest is to subtract a constant to remove a DC offset, and
this was sufficient in the present case. In other cases, it can be
necessary to subtract a straight line of adjustable slope or a
baseline defined by a quadratic function. The software supports
baseline functions up to quartic.
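The idea behind polynomial baseline correction can be sketched as fitting a low-order polynomial to signal-free regions and subtracting it from the whole spectrum. This is an illustrative sketch, not the algorithm behind the XWINNMR "basl" command. The signal-free windows (taken here outside δ 0 to 10) and the synthetic spectrum are assumptions.

```python
import numpy as np


def subtract_polynomial_baseline(ppm, intensity, degree=0,
                                 signal_free=((-1.0, 0.0), (10.0, 11.0))):
    """Fit a polynomial of the given degree (0 = constant DC offset, up to 4 = quartic)
    to points inside the signal-free ppm windows, then subtract it everywhere."""
    if not 0 <= degree <= 4:
        raise ValueError("degree must be between 0 (constant) and 4 (quartic)")
    mask = np.zeros(ppm.shape, dtype=bool)
    for lo, hi in signal_free:
        mask |= (ppm >= lo) & (ppm <= hi)
    coeffs = np.polyfit(ppm[mask], intensity[mask], degree)
    return intensity - np.polyval(coeffs, ppm)


# Hypothetical spectrum: one peak at 1.33 ppm sitting on a constant DC offset of 5.
ppm = np.linspace(11.0, -1.0, 6001)
spectrum = 5.0 + 100.0 * np.exp(-((ppm - 1.33) / 0.01) ** 2)
corrected = subtract_polynomial_baseline(ppm, spectrum, degree=0)
```

With `degree=0` this reduces to the DC-offset subtraction that was sufficient here; `degree=1` or `2` corresponds to the sloped and quadratic baselines mentioned above.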
[1100] Once properly phased and baseline corrected, the full
spectra showed a flat featureless baseline on both sides of the
main set of signals (i.e., outside the range δ 0 to 10), and
the peaks of interest showed a clear in-phase absorption
profile.
[1101] ¹H NMR chemical shifts in the spectra were referenced to the
lactate methyl group, taking the midpoint of its doublet to be at
δ 1.33.
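Referencing to lactate amounts to shifting the whole chemical-shift axis by a constant so that the midpoint of the lactate methyl doublet lands at δ 1.33. The following minimal sketch assumes the two doublet line positions have already been located on the uncorrected axis. The input values are hypothetical.

```python
LACTATE_CH3_PPM = 1.33  # reference value used in the text


def reference_to_lactate(ppm_axis, doublet_peaks_ppm):
    """Shift a chemical-shift axis so the centre of the lactate CH3 doublet is at 1.33 ppm.

    doublet_peaks_ppm: the two observed line positions of the doublet (uncorrected axis).
    """
    if len(doublet_peaks_ppm) != 2:
        raise ValueError("expected the two lines of the lactate doublet")
    observed_centre = sum(doublet_peaks_ppm) / 2.0
    offset = LACTATE_CH3_PPM - observed_centre
    return [p + offset for p in ppm_axis]


# Hypothetical example: doublet observed at 1.3515 and 1.3399 ppm (about 7 Hz apart at 600 MHz).
axis = [10.0, 1.3515, 1.3399, 0.2]
referenced = reference_to_lactate(axis, (1.3515, 1.3399))
print([round(p, 4) for p in referenced])
```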
[1102] (f) Reduction of the NMR Spectra to Descriptors
[1103] The ¹H NMR spectra in the region δ 10 to δ 0.2
were segmented into 245 regions or "buckets" of equal width
(δ 0.04) using AMIX (Analysis of MIXtures software, version
2.5, Bruker, Germany). The integral of the spectrum in each segment
was calculated. In order to remove the effects of variation in the
suppression of the water resonance, and also the effects of
variation in the urea signal caused by partial cross-saturation via
solvent-exchanging protons, the region δ 6.0
to 4.5 was set to zero integral. The following AMIX profile was
used:
[1104] command=bucket_d_table
[1105] input-file=<namesfile>
[1106] output_file=<mydata.amix>
[1107] left_ppm=10
[1108] right_ppm=0.2
[1109] exclude1_left_ppm=6.0
[1110] exclude1_right_ppm=4.5
[1111] exclude2_left_ppm=(intentionally undefined)
[1112] exclude2_right_ppm=(intentionally undefined)
[1113] bucket_width=0.04
[1114] bucket_mode=0
[1115] bucket_scale_mode=3
[1116] bucket_multiplier=0.01
[1117] bucket_output_format=2
[1118] normalization_region_left=10
[1119] normalization_region_right=0.2
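The bucketing set up by this profile can be sketched in Python. The δ 10 to 0.2 range is split into 0.04 ppm buckets ((10 − 0.2)/0.04 = 245 buckets), the points in each bucket are summed as a simple stand-in for the integral, and buckets inside the δ 6.0 to 4.5 exclusion window (exclude1_left_ppm/exclude1_right_ppm) are set to zero. The function is hypothetical. Exact AMIX edge handling and its scaling options (bucket_scale_mode, bucket_multiplier) are not modelled. With these settings the bucket centres fall at 9.98, 9.94, ..., 0.22 ppm, which matches bucket labels such as 3.22 and 0.86 used in the tables of this document.

```python
def bucket_spectrum(ppm, intensity, left_ppm=10.0, right_ppm=0.2, width=0.04,
                    exclude=((6.0, 4.5),)):
    """Sum intensities into equal-width buckets running from left_ppm down to right_ppm.

    Returns (centres, sums). Buckets whose centre lies inside an exclusion
    window, given as (left, right) in ppm, are set to zero.
    """
    n_buckets = round((left_ppm - right_ppm) / width)   # (10 - 0.2) / 0.04 = 245
    sums = [0.0] * n_buckets
    for p, y in zip(ppm, intensity):
        if right_ppm < p <= left_ppm:
            idx = min(int((left_ppm - p) / width), n_buckets - 1)
            sums[idx] += y
    # Centres rounded to 4 decimals so they read 9.98, 9.94, ..., 0.22.
    centres = [round(left_ppm - (i + 0.5) * width, 4) for i in range(n_buckets)]
    for ex_left, ex_right in exclude:
        for i, c in enumerate(centres):
            if ex_right <= c <= ex_left:
                sums[i] = 0.0
    return centres, sums


# Hypothetical points: choline at 3.22, water at 5.0 (excluded), lipid at 1.30.
centres, sums = bucket_spectrum([3.22, 5.0, 1.30], [2.0, 50.0, 3.0])
```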
[1120] The integral data were normalized to the total spectral area
using Excel (Microsoft, USA). Intensity was integrated over all
included regions, and each region was then divided by the total
integral and multiplied by a constant (i.e., 100, so that final
integrated intensities are expressed as percentages of the total
intensity).
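The total-area normalization amounts to dividing each bucket by the sum over all included buckets and multiplying by 100. The following minimal sketch uses hypothetical input values. Excluded buckets are already zero, so they drop out of the total automatically.

```python
def normalize_to_total_area(buckets, scale=100.0):
    """Express each bucket as a percentage of the total integrated intensity."""
    total = sum(buckets)
    if total == 0:
        raise ValueError("spectrum has zero total intensity")
    return [scale * b / total for b in buckets]


print(normalize_to_total_area([2.0, 0.0, 6.0]))  # [25.0, 0.0, 75.0]
```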
[1121] The normalized data were then exported to the SIMCA-P
(version 8.0, Umetrics, Sweden) software package and each descriptor
was mean-centered. All subsequent analysis was therefore performed
on normalised mean-centered data.
[1122] Data Analysis
[1123] Visual comparison of the NMR spectra did not reveal any
obvious differences between the individuals from an MZ twin pair
compared with those from a DZ pair.
[1124] Application of the principal component analysis (PCA)
pattern recognition technique showed some separation; however,
there was much overlap between DZ and MZ twins. Some clustering of
DZ twins was evident on the left-hand side of the plot shown in
FIG. 1-1A-TW, suggesting that significant systematic differences
exist between individuals composing MZ and DZ twin pairs. The
corresponding loadings plot, which reveals which elements of the
spectra made the greatest contribution to distinguishing the two
types of twins, is shown in FIG. 1-1B-TW; the most influential
loadings are 1.34, 1.30, 1.26, 1.22, 0.90 and 0.86 ppm.
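Mean-centering followed by PCA, as used above, can be sketched with NumPy's singular value decomposition. This is a generic PCA, not the SIMCA-P implementation. The scores are what a scores plot shows, and the loadings indicate which buckets drive the separation. The data matrix is hypothetical random data.

```python
import numpy as np


def pca(X: np.ndarray, n_components: int = 2):
    """PCA of a samples-by-buckets matrix after mean-centering each column.

    Returns (scores, loadings, explained_variance_ratio).
    """
    Xc = X - X.mean(axis=0)                            # mean-center each descriptor (bucket)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]    # sample coordinates (scores plot)
    loadings = Vt[:n_components].T                     # bucket weights (loadings plot)
    explained = s ** 2 / np.sum(s ** 2)
    return scores, loadings, explained[:n_components]


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 245))                         # hypothetical: 20 spectra x 245 buckets
scores, loadings, ratio = pca(X)
```

The buckets with the largest absolute loadings on a component, for example `np.argsort(-np.abs(loadings[:, 0]))[:6]`, correspond to the "most influential loadings" quoted in the text.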
[1125] Application of OSC to the spectra from the twins, followed
by PCA, emphasised the extent of the difference between the groups,
although the spectral regions contributing to the separation did
not change. The improved separation between MZ and DZ twins is
evident in the scores scatter plot shown in FIG. 1-1C-TW, with MZ
twin samples dominating in the upper right of the plot and the DZ
twin samples in the lower left of the plot. Optimum separation is
now observed in PC1 and PC2. The corresponding loadings plot is
shown in FIG. 1-1D-TW; the most influential loadings are: 3.22,
1.38, 1.34, 1.30, 1.26, 1.22, 1.18, 0.90, 0.86, and 0.82 ppm.
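OSC removes from the data a component that explains much of the variance in X but is uncorrelated with the class vector y. The sketch below is a simplified, single-pass variant written for illustration. It is not the iterative algorithm of Wold and co-workers, nor the SIMCA-P implementation. It takes the first principal-component score, orthogonalizes it against y, and deflates X along that score. The data and the 1/0 class coding are hypothetical.

```python
import numpy as np


def osc_one_component(X: np.ndarray, y: np.ndarray):
    """Simplified single-component orthogonal signal correction.

    X: mean-centered samples-by-buckets matrix; y: class vector (e.g. 1 = MZ, 0 = DZ).
    Returns (X_filtered, t_orth), where t_orth is the removed score vector.
    """
    yc = (y - y.mean()).reshape(-1, 1)
    # Start from the first principal-component score of X.
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    t = U[:, 0] * s[0]
    # Remove the part of t that is correlated with y (least-squares projection onto y).
    t_orth = t - yc @ np.linalg.lstsq(yc, t, rcond=None)[0]
    # Loading of X on the orthogonal score, then deflate X along it.
    p = X.T @ t_orth / (t_orth @ t_orth)
    return X - np.outer(t_orth, p), t_orth


# Hypothetical data: 30 spectra with a large structured component unrelated to class.
rng = np.random.default_rng(3)
y = np.repeat([1.0, 0.0], 15)
X = rng.normal(size=(30, 50)) + 5.0 * np.outer(rng.normal(size=30), rng.normal(size=50))
X = X - X.mean(axis=0)
X_filtered, t_orth = osc_one_component(X, y)
```

The removed score is orthogonal to y by construction, so the filtering cannot remove class information carried along y itself. This is why a subsequent PCA or PLS-DA can show sharper class separation.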
[1126] Partial least squares discriminant analysis (PLS-DA) was
performed on the same data following application of OSC. The
resulting scores plot of PC2 against PC1 is shown in FIG. 1-2E-TW.
The corresponding loadings plot is shown in FIG. 1-2F-TW.
[1127] A section of the variable importance plot (VIP) for the
PLS-DA model calculated from OSC-filtered NMR data from MZ and DZ
twins is shown in FIG. 1-2A-TW.
[1128] The regression coefficients for the PLS-DA model calculated
from OSC-filtered NMR data from MZ and DZ twins are shown
graphically in FIG. 1-2B-TW.
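A two-class PLS-DA model, its regression coefficients and its VIP scores can be sketched with the NIPALS algorithm for a single response (PLS1). This is a generic textbook formulation, not the SIMCA-P code. The class coding (1 = MZ, 0 = DZ) follows the validation described below, and the data are hypothetical. The squared VIP values sum to the number of variables, which gives a quick check on the computation.

```python
import numpy as np


def pls1(X: np.ndarray, y: np.ndarray, n_components: int = 2):
    """NIPALS PLS1 on mean-centered X and y.

    Returns weights W, scores T, X-loadings P, y-loadings q and regression
    coefficients b, such that T @ q equals (X - X.mean(0)) @ b.
    """
    Xk = X - X.mean(axis=0)
    yk = y - y.mean()
    n, m = Xk.shape
    W = np.zeros((m, n_components))
    P = np.zeros((m, n_components))
    T = np.zeros((n, n_components))
    q = np.zeros(n_components)
    for a in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)            # weight vector, unit length
        t = Xk @ w                        # scores for this component
        tt = t @ t
        P[:, a] = Xk.T @ t / tt           # X loadings
        q[a] = yk @ t / tt                # y loading
        Xk = Xk - np.outer(t, P[:, a])    # deflate X
        yk = yk - q[a] * t                # deflate y
        W[:, a], T[:, a] = w, t
    b = W @ np.linalg.solve(P.T @ W, q)   # coefficients in terms of the original X
    return W, T, P, q, b


def vip(W: np.ndarray, T: np.ndarray, q: np.ndarray) -> np.ndarray:
    """Variable importance in projection (VIP) for a PLS1 model."""
    m = W.shape[0]
    ss = q ** 2 * np.sum(T ** 2, axis=0)        # y sum of squares explained per component
    w_norm = W / np.linalg.norm(W, axis=0)      # unit-length weight columns
    return np.sqrt(m * (w_norm ** 2 @ ss) / ss.sum())


# Hypothetical data: 40 spectra x 30 buckets; buckets 0-2 are higher in class 1.
rng = np.random.default_rng(1)
y = np.repeat([1.0, 0.0], 20)                    # 1 = MZ, 0 = DZ
X = rng.normal(size=(40, 30))
X[:, :3] += 4.0 * y[:, None]
W, T, P, q, b = pls1(X, y)
vip_scores = vip(W, T, q)
```

A common rule of thumb treats buckets with VIP > 1 as influential. The sign of `b` shows whether a bucket is relatively higher or lower in the class coded 1.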
[1129] The 11 loadings (variables) that are most influential in
causing separation between MZ and DZ twins are summarised in Table
1-2-TW, below. The assignments were made by comparing the loadings
with published tables of NMR data.
TABLE 1-2-TW
#  | Bucket (ppm) | Assignment                    | Chem. shift (ppm) and multiplicity | NMR spectral intensity in MZ wrt DZ
1  | 3.22         | choline -N(CH₃)₃⁺             | 3.21 (s)                           | decreased
2  | 2.26         | lipid (-CH₂CO)                | 2.23 (m)                           | increased
3  | 2.02         | lipid (-CH₂-C=C)              | 2.00 (m)                           | increased
4  | 1.58         | lipid (-CH₂CH₂CO)             | 1.57 (m)                           | increased
5  | 1.38         | lipid (-CH₂CH₂CH₂CO)          | -                                  | increased
6  | 1.34         | lipid (-CH₂CH₂CH₂CO)          | -                                  | increased
7  | 1.30         | lipid (-CH₂CH₂CH₂CO)          | 1.29 (m)                           | increased
8  | 1.26         | lipid, LDL (-CH₂CH₂(CH₂)ₙ)    | 1.25 (m)                           | increased
9  | 1.18         | lipid, ethanol                | 1.18 (t)                           | decreased
10 | 0.90         | lipid, VLDL (-CH₃)            | 0.87 (t)                           | increased
11 | 0.86         | lipid, LDL (-CH₃)             | 0.84 (t)                           | increased
[1130] Validation
[1131] Prediction analysis was performed in order to validate the
models separating MZ and DZ twins. Samples (six MZ and six DZ) were
removed to comprise a validation set; a model was calculated for
the remaining samples and subsequently used to predict class
membership of samples in the validation set. The model was
calculated using partial least squares discriminant analysis
(PLS-DA) using OSC filtered data. The resulting y-predicted scatter
plot is shown in FIG. 1-3-TW, in which samples are assigned as
either class 1 (MZ) or class 0 (DZ) using a cut-off of 0.5. The
PLS-DA model predicted zygosity correctly in 83% of cases;
furthermore, for a two-component model, class could be predicted with
a significance of >83% at a 99% confidence limit.
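The hold-out validation described above can be sketched as follows. Six samples of each class are removed, a PLS-DA model is fitted to the rest, and the held-out samples are predicted, with class 1 (MZ) assigned when the predicted y is at least 0.5. The text does not state how a prediction of exactly 0.5 is handled, so that choice is an assumption. The compact PLS1 fit is a generic NIPALS formulation, and the data are hypothetical.

```python
import numpy as np


def pls1_fit(X, y, n_components=2):
    """Compact NIPALS PLS1; returns (x_mean, y_mean, coefficients)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)
        t = Xk @ w
        p = Xk.T @ t / (t @ t)
        c = yk @ t / (t @ t)
        Xk, yk = Xk - np.outer(t, p), yk - c * t      # deflate X and y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return x_mean, y_mean, W @ np.linalg.solve(P.T @ W, q)


def pls1_predict(model, X):
    x_mean, y_mean, b = model
    return y_mean + (X - x_mean) @ b


def holdout_validation(X, y, n_per_class=6, cutoff=0.5, seed=0):
    """Hold out n_per_class samples of each class, train on the rest,
    and return the fraction of held-out samples classified correctly."""
    rng = np.random.default_rng(seed)
    test_idx = np.concatenate([
        rng.choice(np.flatnonzero(y == cls), n_per_class, replace=False)
        for cls in (1, 0)
    ])
    train = np.ones(len(y), dtype=bool)
    train[test_idx] = False
    model = pls1_fit(X[train], y[train])
    predicted_class = (pls1_predict(model, X[test_idx]) >= cutoff).astype(int)
    return float(np.mean(predicted_class == y[test_idx]))


# Hypothetical, well-separated data: 100 spectra x 20 buckets, 1 = MZ, 0 = DZ.
rng = np.random.default_rng(2)
y = np.repeat([1, 0], 50)
X = rng.normal(size=(100, 20))
X[:, :5] += 5.0 * y[:, None]
accuracy = holdout_validation(X, y)
```

With only 12 held-out samples, each misclassification changes the accuracy by about 8 percentage points, so a single split gives a coarse estimate. Repeating the split with different seeds gives a better sense of its variability.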
[1132] Clearly, there is a range of systematic metabolic
differences between individuals composing MZ twin pairs compared
with DZ pairs which persist into late adulthood. Consequently, the
assumption that MZ and DZ pairs differ only in the proportion of
their genetic material which they share is invalid.
[1133] The metabolic differences between the two groups are likely
to be biologically as well as statistically significant. Both
lipoprotein metabolism and the ketone body pathway occupy a central
location in the web of intermediary metabolism. As a result, it is
plausible that many of the other phenotypes (such as blood
pressure, inflammatory cell populations, and cytokine levels) could
be affected.
[1134] It is important to note that the observation that many
phenotypes do not themselves differ between MZ and DZ twins does
not eliminate the concern that identical and non-identical twins
differ by more than the percentage of the genome they share. Based
on these observations, it is plausible that twin based studies of
heritability over-estimate, in some cases very significantly, the
contribution of genetic variation to the control of a range of
phenotypes. Twin studies can only provide an upper estimate for the
heritability of any characteristic, and caution should be exercised
before initiating the search for linked polymorphisms on the basis
of twin-derived heritability estimates.
Example 2
Hypertension
[1135] As discussed above, the inventors have developed novel
methods (which employ multivariate statistical analysis and pattern
recognition (PR) techniques, and optionally data filtering
techniques) of analysing data (e.g., NMR spectra) from a test
population which yield accurate mathematical models which may
subsequently be used to classify a test sample or subject, and/or
in diagnosis.
[1136] These techniques have been applied to the analysis of blood
serum in the context of hypertension. The metabonomic analysis can
distinguish between individuals with and without hypertension.
Novel diagnostic biomarkers for hypertension have been identified,
and methods for associated diagnosis have been developed.
[1137] Obtaining NMR Spectra
[1138] Analysis was performed on serum samples collected as part of
the coronary heart disease (CHD) NCA/TVD study described
herein.
[1139] The data were classified according to systolic blood
pressure (SBP), as follows:
[1140] low SBP (≤130 mmHg; 28 samples), shown as triangles (▲).
[1141] middle SBP (131-149 mmHg; 19 samples), shown as circles (●).
[1142] high SBP (≥150 mmHg; 17 samples), shown as boxes (■).
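The grouping rule can be written as a small function. The text gives integer bands (≤130, 131-149, ≥150 mmHg) and does not say how a non-integer reading falling between bands would be assigned, so the sketch below assumes readings are recorded in whole mmHg.

```python
def sbp_class(sbp_mmhg: int) -> str:
    """Assign a systolic blood pressure reading (whole mmHg) to one of the three groups."""
    if sbp_mmhg <= 130:
        return "low"
    if sbp_mmhg <= 149:
        return "middle"
    return "high"   # 150 mmHg or above


print([sbp_class(v) for v in (120, 130, 131, 149, 150, 172)])
# ['low', 'low', 'middle', 'middle', 'high', 'high']
```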
[1143] Blood was drawn from each patient, allowed to clot in
plastic tubes for 2 hours at room temperature, and the serum was
collected by centrifugation. Aliquots of serum were stored at
-80 °C until assayed.
[1144] Prior to NMR analysis, serum samples (150 μl) were diluted with
350 μl of solvent solution (10% D₂O v/v, 0.9% NaCl w/v). The
diluted samples were then placed in 5 mm high quality NMR tubes
(Goss Scientific Instruments Ltd).
[1145] Conventional 1-D ¹H NMR spectra of the blood serum
samples were measured on a Bruker DRX-600 spectrometer using the
conditions set forth in the section entitled "NMR Experimental
Parameters."
[1146] Data Analysis
[1147] A Principal Components Analysis (PCA) model was calculated
from the 1D ¹H NMR spectra of serum from patients with low,
middle and high SBP. The corresponding scores and loadings plots
are shown in FIG. 2-1A-HYP and FIG. 2-1B-HYP, respectively. Those
regions of the NMR spectrum which are responsible for causing
separation between the different SBP samples are also indicated in
FIG. 2-1B-HYP. There is substantial overlap between the
samples.
[1148] A Principal Components Analysis (PCA) model was calculated
from the 1D ¹H NMR spectra of serum from patients with low,
middle and high SBP, but, in this case, prior to PCA, the data were
filtered by application of orthogonal signal correction (OSC),
which removes variation that is not correlated with class
and therefore improves subsequent data analysis. The corresponding
scores and loadings plots are shown in FIG. 2-1C-HYP and
FIG. 2-1D-HYP, respectively.
[1149] The improved separation between the different SBP classes is
evident in FIG. 2-1C-HYP, especially for the low SBP samples, which
dominate on the right of the PCA plot.
[1150] Those regions of the NMR spectrum which are responsible for
causing separation between the different SBP samples are also
indicated in FIG. 2-1D-HYP. The regions influencing separation of
the low SBP samples lie around δ 1.30 and δ 1.26
((CH₂)ₙ chains of lipids, in particular VLDL and LDL).
The regions influencing separation of the middle and high SBP
samples lie around δ 0.86 (lipid CH₃), δ 1.22
((CH₂)ₙ of lipids, in particular HDL), and δ 3.22
(-N(CH₃)₃⁺ of choline).
[1151] Because the pattern recognition software package (SIMCA)
displays data only in two dimensions, and this example has three
sample classes, a separate model was calculated and plotted for each
pair of classes. A scores plot and the corresponding loadings for each
pair ("low" and "middle"; "middle" and "high"; "low" and "high") are
shown in FIG. 2-1E-HYP.
[1152] Improved separation is possible using PLS-DA (rather than
the unsupervised PCA). Again, a separate PLS-DA model was calculated
and plotted for each pair of classes. A scores plot and the
corresponding loadings for each pair (low and middle; middle and
high; low and high) are shown in FIG. 2-1F-HYP.
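Building the three two-class data sets (low vs middle, low vs high, middle vs high) amounts to iterating over pairs of classes and recoding each pair as a binary response for a two-class model. The sketch below uses `itertools.combinations`. The labels and the 1/0 coding (first class of each pair = 1) are illustrative assumptions.

```python
from itertools import combinations

CLASSES = ("low", "middle", "high")


def pairwise_datasets(samples, labels, classes=CLASSES):
    """Yield ((class_a, class_b), X_pair, y_pair) for every pair of classes.

    y_pair is 1 for class_a and 0 for class_b, ready for a two-class (e.g. PLS-DA) model.
    """
    for a, b in combinations(classes, 2):
        keep = [i for i, lab in enumerate(labels) if lab in (a, b)]
        X_pair = [samples[i] for i in keep]
        y_pair = [1 if labels[i] == a else 0 for i in keep]
        yield (a, b), X_pair, y_pair


# Hypothetical mini data set: one-bucket "spectra" with SBP class labels.
samples = [[0], [1], [2], [3], [4]]
labels = ["low", "middle", "high", "low", "high"]
pairs = {pair: (Xp, yp) for pair, Xp, yp in pairwise_datasets(samples, labels)}
```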
[1153] FIG. 2-1F-HYP illustrates separation between low and middle
SBP (FIG. 2-1F(1)-HYP) and also between low and high SBP (FIG.
2-1F(5)-HYP). There is, however, overlap between the middle and
high SBP samples (FIG. 2-1F(3)-HYP). These results suggest that the
NMR profiles of low SBP samples are different from the NMR profiles
of middle and high SBP samples. In addition, there appears to be a large
degree of similarity in the NMR profiles of middle and high SBP
samples, which accounts for the overlap observed in FIG.
2-1F(3)-HYP. Note, with the removal of the low SBP samples, the
region .delta. 1.30 becomes less influential (FIG.
2-1F(4)-HYP).
[1154] FIG. 2-2-HYP shows sections of the variable importance plots
(VIP) and regression coefficient plots derived from the PLS-DA
models described in FIG. 2-1F-HYP.
[1155] In the regression coefficient plots, each bar represents a
spectral region covering 0.04 ppm and shows how the ¹H NMR
profile of one class of SBP samples differs from the ¹H NMR
profile of a second class of SBP samples. A positive value on the
x-axis indicates there is a relatively greater concentration of
metabolite (assigned using NMR chemical shift assignment tables)
and a negative value on the x-axis indicates a relatively lower
concentration.
[1156] The 10 most important chemical shift windows for each of the
three models are summarised in the following tables. The
assignments were made by comparing the loadings with published
tables of NMR data.
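Building a table such as Table 2-1-HYP (ranking the windows and reading the direction of change from the coefficient sign) can be sketched as below. The text does not state how the windows were ranked, so this sketch assumes, hypothetically, ranking by VIP, and assumes that the class named first in the table heading (e.g., low SBP) was coded 1 in the PLS-DA model. The function name `top_windows` is hypothetical.

```python
import numpy as np


def top_windows(centres_ppm, vip, coef, n=10):
    """Rank spectral windows by VIP and label the direction of change.

    A positive coefficient is reported as "increase" (more signal in the
    class coded 1); zero or negative is reported as "decrease".
    """
    order = np.argsort(vip)[::-1][:n]
    return [(float(centres_ppm[i]), float(vip[i]),
             "increase" if coef[i] > 0 else "decrease")
            for i in order]
```

Each returned tuple corresponds to one row of the tables below: bucket centre, importance, and the direction of the intensity change.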
TABLE 2-1-HYP
Bucket # | Region (ppm) | Assignment | Chem. Shift (ppm) and Multiplicity | NMR spectral intensity in low SBP wrt middle SBP
1 | 1.30 | lipid (CH₂)n | 1.29 (m) | increase
2 | 1.26 | lipid (CH₂)n | 1.26 (m), 1.25 (m) | increase
3 | 0.90 | lipid (CH₃) | 0.93 (m) | increase
4 | 1.34 | lipid (CH₂)n | 1.32 (m) | increase
5 | 1.58 | lipid (CH₂CH₂CO) | 1.57 (m) | increase
6 | 2.02 | lipid (CH₂C=C) | 2.00 (m) | increase
7 | 2.22 | lipid (CH₂CO) | 2.23 (m) | increase
8 | 0.82 | lipid (CH₃)/cholesterol | 0.84 | decrease
9 | 1.22 | lipid (CH₂)n | 1.22 (m) | increase
10 | 0.78 | lipid (CH₃) | 0.6-0.8 (m) | decrease
[1157]
TABLE 2-2-HYP
Bucket # | Region (ppm) | Assignment | Chem. Shift (ppm) and Multiplicity | NMR spectral intensity in middle SBP wrt high SBP
1 | 1.22 | lipid (CH₂)n | 1.22 (m) | decrease
2 | 0.86 | lipid (CH₃) | 0.84 (t), 0.87 (t) | increase
3 | 1.26 | lipid (CH₂)n | 1.26 (m), 1.25 (m) | increase
4 | 1.34 | lipid (CH₂)n | 1.32 (m) | increase
5 | 3.22 | choline N(CH₃)₃⁺ | 3.21 (s) | decrease
6 | 1.30 | lipid (CH₂)n | 1.29 (m) | decrease
7 | 2.06 | lipid CH₂C=C | 2.00 (m) | decrease
8 | 0.82 | lipid (CH₃)/cholesterol | 0.84 | decrease
9 | 1.98 | lipid (CH₂C=C) | 1.97 (m) | increase
10 | 2.74 | lipid (C=CCH₂C=C) | 2.71 (m) | decrease
[1158]
TABLE 2-3-HYP
Bucket # | Region (ppm) | Assignment | Chem. Shift (ppm) and Multiplicity | NMR spectral intensity in low SBP wrt high SBP
1 | 1.30 | lipid (CH₂)n | 1.29 (m) | increase
2 | 1.26 | lipid (CH₂)n | 1.26 (m), 1.25 (m) | increase
3 | 1.34 | lipid (CH₂)n | 1.32 (m) | increase
4 | 0.90 | lipid (CH₃) | 0.91 | increase
5 | 1.22 | lipid (CH₂)n | 1.22 (m) | decrease
6 | 1.58 | lipid (CH₂CH₂CO) | 1.57 (m) | increase
7 | 0.82 | lipid (CH₃)/cholesterol | 0.84 | decrease
8 | 2.22 | lipid (CH₂CO) | 2.23 (m) | increase
9 | 2.02 | lipid (CH₂C=C) | 2.00 (m) | increase
10 | 3.22 | choline N(CH₃)₃⁺ | 3.21 (s) | decrease
[1159] Validation
[1160] Validation was performed for several models. In each case, a
model was constructed from samples of one class (low, middle, or
high SBP), and the distance to the model was calculated for the
remaining samples of the pair under consideration (low and middle;
middle and high; or low and high).
[1161] The distance-to-model plot shows the ability of the model to
predict the class membership of the remaining samples. The greater
the distance (DModX), the more dissimilar the sample is to the
model, and the sample is therefore classed as a non-member; the
smaller the distance, the more similar the sample is to the model,
and the sample is therefore classed as a member. The DCrit line is
the critical distance that separates samples considered close to
the model from those considered far from it.
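A simplified distance-to-model calculation in the spirit of the DModX/DCrit approach described above is sketched below. It assumes a mean-centred PCA model of one class and normalises each sample's residual standard deviation by the pooled residual SD of the training set. SIMCA's own DModX includes additional corrections and derives DCrit from an F-distribution; those details are omitted here, so DCrit is simply passed in (e.g., 1.41 as reported for FIG. 2-2A-HYP). All function names are hypothetical.

```python
import numpy as np


def fit_class_pca(X, n_components):
    """Mean-centred PCA model of a single class (SIMCA-style class model)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                  # loadings, (K, A)
    E = Xc - Xc @ P @ P.T                    # training residuals
    n, k = X.shape
    a = n_components
    # pooled residual SD of the training set (one degree lost to centring)
    s0 = np.sqrt(np.sum(E ** 2) / ((n - a - 1) * (k - a)))
    return {"mean": mean, "P": P, "s0": s0, "K": k, "A": a}


def dmodx(model, X):
    """Normalised distance to model: residual SD of each row divided by s0."""
    Xc = X - model["mean"]
    E = Xc - Xc @ model["P"] @ model["P"].T
    s = np.sqrt(np.sum(E ** 2, axis=1) / (model["K"] - model["A"]))
    return s / model["s0"]


def prediction_rate(dist, is_member, d_crit):
    """Fraction of samples whose member/non-member call agrees with truth."""
    predicted_member = dist <= d_crit        # below DCrit -> member
    return float(np.mean(predicted_member == np.asarray(is_member)))
```

In this sketch, a sample whose residual SD matches that of the training class scores about 1, and samples from a different class score well above DCrit.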
[1162] FIG. 2-2A-HYP
[1163] Model calculated using 14 low SBP samples.
[1164] Tested with remaining low and middle SBP samples.
[1165] DCrit at 1.41.
[1166] Prediction Rate: 84%
[1167] Overall, in FIG. 2-2A-HYP, the low SBP samples lie close to
the model (fall below the DCrit line), whilst the middle SBP
samples lie far from the model (fall above the DCrit line).
[1168] FIG. 2-2B-HYP
[1169] Model calculated using 9 middle SBP samples.
[1170] Tested with remaining low and middle SBP samples.
[1171] DCrit at 1.50.
[1172] Prediction Rate: 84%
[1173] Overall, in FIG. 2-2B-HYP, the middle SBP samples lie close
to the model (fall below the DCrit line), whilst the low SBP
samples lie far from the model (fall above the DCrit line).
[1174] FIG. 2-3A-HYP
[1175] Model calculated using 9 middle SBP samples.
[1176] Tested with remaining middle and high SBP samples.
[1177] DCrit at 1.50.
[1178] Prediction Rate: 59%
[1179] The distance-to-model plot shown in FIG. 2-3A-HYP suggests
that, overall, most of the samples lie close to the model (fall
below the DCrit line) and would therefore be classed as middle SBP
samples. This, however, is not the case, as some of the samples are
known to be high SBP. This result may arise because the NMR spectra
of middle and high SBP samples are very similar, so that the
samples cannot be reliably predicted as either middle or high. This
is consistent with the poor separation observed in the scores
scatter plot for middle vs. high SBP samples (FIG. 2-1E(3)-HYP).
[1180] FIG. 2-3B-HYP
[1181] Model calculated using 9 high SBP samples.
[1182] Tested with remaining middle and high SBP samples.
[1183] DCrit at 1.50.
[1184] Prediction Rate: 37%
[1185] The distance-to-model plot shown in FIG. 2-3B-HYP suggests
that, overall, most of the samples lie close to the model (fall
below the DCrit line) and would therefore be classed as high SBP
samples. This, however, is not the case, as some of the samples are
known to be middle SBP. This result may arise because the NMR
spectra of middle and high SBP samples are very similar, so that
the samples cannot be reliably predicted as either middle or high.
This is consistent with the poor separation observed in the scores
scatter plot for middle vs. high SBP samples (FIG. 2-1E(3)-HYP).
[1186] FIG. 2-4A-HYP
[1187] Model calculated using 15 low SBP samples.
[1188] Tested with remaining low and high SBP samples.
[1189] DCrit at 1.41.
[1190] Prediction Rate: 80%
[1191] Overall, in FIG. 2-4A-HYP, the low SBP samples lie close to
the model (fall below the DCrit line), whilst the high SBP samples
lie far from the model (fall above the DCrit line).
[1192] FIG. 2-4B-HYP
[1193] Model calculated using 9 high SBP samples.
[1194] Tested with remaining low and high SBP samples.
[1195] DCrit at 1.50.
[1196] Prediction Rate: 83%
[1197] Overall, in FIG. 2-4B-HYP, the high SBP samples lie close to
the model (fall below the DCrit line), whilst the low SBP samples
lie far from the model (fall above the DCrit line).
Example 3
Diagnosis of Coronary Heart Disease (CHD)
[1198] As discussed above, the inventors have developed novel
methods of analysing data (e.g., NMR spectra) from a test
population. These methods employ multivariate statistical analysis
and pattern recognition (PR) techniques, optionally together with
data filtering techniques, and yield accurate mathematical models
that may subsequently be used to classify a test sample or subject
and/or for diagnosis.
[1199] In the context of atherosclerosis/CHD, the inventors have
applied these techniques to the analysis of serum or plasma taken
from individuals who had been extensively characterized, both for
the presence of atherosclerosis/CHD by the gold-standard
angiographic technique and for a wide range of conventional risk
factors. The metabonomic analysis can distinguish individuals with
and without atherosclerosis/CHD and/or discriminate between degrees
of atherosclerosis/CHD. Novel diagnostic biomarkers for
atherosclerosis/CHD have been identified, and associated diagnostic
methods have been developed.
[1200] Obtaining NMR Spectra
[1201] Patients with significant coronary artery disease (defined as
a reduction of more than 50% in the intraluminal diameter) of all
three coronary arteries (left anterior descending, circumflex, and
right coronary arteries) were recruited to the TVD (triple vessel
disease) group. The symptoms of angina had been stable for at least
one month, and no patient had suffered a myocardial infarction in
the preceding three months.
[1202] Patients with chest pain and a positive exercise
electrocardiogram, but normal coronary angiograms (judged by two
independent observers), were recruited to the NCA (normal coronary
artery) group. The Bruce protocol (see, e.g., Bruce, 1974; Berman et
al., 1978; Guyton, 1991) was used, and the presence of at least 1 mm
of horizontal or downward-sloping ST segment depression at 80 ms
after the J point was considered positive. NCA patients with
hypertension, diabetes mellitus, valvular heart disease, or left
ventricular hypertrophy were excluded.
[1203] Consecutive patients presenting at Papworth Hospital
(Cambridgeshire, UK) who met the above criteria for either the TVD
or the NCA group were recruited to the study. Thirty-six patients
with severe CHD (TVD patients) and 30 patients with angiographically
normal coronary arteries (NCA patients) were enrolled. The clinical
data for these patient groups are shown in Table 3-2-CHD, below.
Values are given as mean ± one standard deviation.
TABLE 3-2-CHD
Parameter | TVD | NCA
Age (years) | 64.1 ± 7.2 | 57.2 ± 9.0
Sex: Male (n) | 34 | 7
Sex: Female (n) | 2 | 23
Myocardial infarction (n) | 19 | 1
Systolic Blood Pressure (mmHg) | 138 ± 23 | 141 ± 22
Diastolic Blood Pressure (mmHg) | 75 ± 12 | 78 ± 12
Smokers (n) | 1 | 2
Urea (mM) | 5.6 ± 1.6 | 5.0 ± 1.2
Creatinine (µM) | 108 ± 18 | 93 ± 14
Glucose (mM) | 5.6 ± 0.9 | 5.2 ± 0.6
Total cholesterol (mM) | 6.2 ± 0.8 | 5.9 ± 1.1
HDL-cholesterol (mM) | 0.8 ± 0.2 | 1.1 ± 0.2
LDL-cholesterol (mM) | 4.5 ± 0.7 | 4.3 ± 1.1
Total Chol:HDL-Chol ratio | 8.3 ± 1.9 | 5.8 ± 1.8
PAI-1 (ng/dl) | 49.1 ± 16.6 | 37.9 ± 17.4
Triglycerides (mM) | 2.1 ± 1.1 | 1.5 ± 1.2
TGF-beta | 1.6 ± 1.4 | 4.4 ± 4.8
Total protein (g) | 69.4 ± 4.0 | 70.4 ± 6.3
Albumin (g) | 37.4 ± 2.6 | 38.6 ± 3.2
% Globulin | 46 ± 4 | 45 ± 5
[1204] Blood was drawn from each patient, allowed to clot in
plastic tubes for 2 hours at room temperature, and the serum was
collected by centrifugation. Aliquots of serum were stored at
-80 °C until assayed.
[1205] Prior to NMR analysis, each sample (150 µl) was diluted with
350 µl of solvent solution (10% D₂O v/v, 0.9% NaCl w/v). The
diluted samples were then placed in high-quality 5 mm NMR tubes
(Goss Scientific Instruments Ltd).
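The dilution step above fixes the final composition of each NMR sample. The arithmetic is shown below as a small helper (the name `diluted_composition` is hypothetical). It counts only the D₂O and NaCl contributed by the solvent and, as a simplifying assumption, ignores the salts already present in serum.

```python
def diluted_composition(sample_ul=150.0, solvent_ul=350.0,
                        d2o_pct_v=10.0, nacl_pct_w=0.9):
    """Final composition after mixing a serum sample with the NMR solvent.

    Only the solvent's D2O and NaCl are counted; salts already present in
    the serum are ignored (an assumption of this sketch).
    """
    total = sample_ul + solvent_ul
    return {
        "total_ul": total,
        "sample_fraction": sample_ul / total,            # v/v serum
        "d2o_pct_v": d2o_pct_v * solvent_ul / total,     # % v/v D2O
        "added_nacl_pct_w": nacl_pct_w * solvent_ul / total,
    }
```

With the stated volumes, each tube holds 500 µl that is 30% serum by volume and about 7% D₂O v/v.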
[1206] Conventional 1-D ¹H NMR spectra of the blood serum
samples were measured on a Bruker DRX-600 spectrometer using the
conditions set forth in the section entitled "NMR Experimental
Parameters."
[1207] Visual Analysis of Spectra
[1208] The 600 MHz ¹H NMR spectra of human sera from patients
with severe CHD (TVD patients) and patients with angiographically
normal coronary arteries (NCA patients) were visually compared
(see, e.g., FIG. 3-1-CHD). Few systematic differences could be
detected between the two groups.
[1209] Chemical components visible in the spectra were assigned on
the basis of previously published data (see, e.g., Nicholson et
al., 1995; Lui et al., 1997; Ala-Korpela, 1995). The features
assigned in FIG. 3-1-CHD are summarised in Table 3-3-CHD,
below.
TABLE 3-3-CHD
No. | Chemical Shift (δ) | Assignment
1 | 0.66 | Lipid, HDL; C18 methyl group of HDL-C
2 | 0.84, 0.87 | Lipid, mainly LDL and VLDL; CH₃
3 | 0.97, 1.02 | Valine
4 | 1.25, 1.29 | Lipid, mainly LDL and VLDL; (CH₂)n
5 | 1.33 | Lactate
6 | 1.46 | Alanine
7 | 1.57 | Lipid; CH₂CH₂CO
8 | 1.69 | Lipid; CH₂CH₂C=C
9 | 1.97 | Lipid; CH₂C=C
10 | 2.04 | Acetyl signal from α-1 acid glycoprotein
11 | 2.23 | Lipid; CH₂CO
12 | 2.41 | Glutamine
13 | 2.52, 2.69 | Citrate
14 | 2.69 | Lipid; -C=CCH₂C=C
15 | 2.89 | Albumin lysyl
16 | 3.05 | Creatinine
17 | 3.21 | Choline
18 | 3.24 | H-2 of β-glucose
19 | 3.3-4.0 | CH protons from glycerol, glucose, and amino acids
20 | 4.11 | Lactate
21 | 4.64 | H-1 of β-glucose
22 | 4.7 | Residual water
23 | 5.23 | H-1 of α-glucose
24 | 5.26-5.33 | Lipids; =CH
[1210] Data Analysis
[1211] To determine whether it was possible to distinguish TVD and
NCA patients on the basis of the NMR spectra, principal component
analysis (PCA) was performed.
[1212] The scores plot of PC2 and PC3 (FIG. 3-2A-CHD) shows that,
although there is much overlap between NCA and TVD samples, some
clustering is evident, with NCA samples dominating in the upper
right quadrant and TVD samples dominating in the lower left
quadrant. Optimum separation was seen in PC2 and PC3, and hence t2
vs. t3 is shown in FIG. 3-2A-CHD.
[1213] The corresponding PCA loadings scatter plot (FIG. 3-2B-CHD)
shows which regions of the NMR spectrum are responsible for the
separation between NCA and TVD samples; the most influential
loadings are the regions at δ 1.30, δ 1.22, δ 3.22, δ 0.86, and
δ 1.26.
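The PCA scores and loadings referred to above can be reproduced in outline with a singular value decomposition of the mean-centred bucket table. This is a sketch under stated assumptions: rows are spectra and columns are 0.04 ppm buckets; no scaling beyond mean-centring is applied (the text does not specify the scaling); and "most influential" is taken to mean the largest loading magnitude in the plotted plane (for PC2 and PC3, zero-based columns 1 and 2). The function names are hypothetical.

```python
import numpy as np


def pca(X, n_components):
    """PCA by SVD of the mean-centred data matrix.

    Returns scores T (N x A), loadings P (K x A) and the fraction of
    total variance explained by each retained component.
    """
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :n_components] * S[:n_components]   # t1, t2, ...
    P = Vt[:n_components].T                       # p1, p2, ...
    explained = S[:n_components] ** 2 / np.sum(S ** 2)
    return T, P, explained


def most_influential(P, components, centres_ppm, n=5):
    """Bucket centres with the largest loading magnitude in a PC plane."""
    magnitude = np.linalg.norm(P[:, components], axis=1)
    return [float(centres_ppm[i]) for i in np.argsort(magnitude)[::-1][:n]]
```

For a t2 vs. t3 scores plot, one would plot `T[:, 1]` against `T[:, 2]` and call `most_influential(P, [1, 2], centres)` on the matching loadings.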
[1214] Following application of OSC, the TVD and NCA groups were
well separated in the scores plot of PC1 and PC2 (FIG. 3-2C-CHD,
compared with FIG. 3-2A-CHD). Here, NCA samples (circles) dominate
in the lower left quadrant and TVD samples (squares) dominate in the
upper right quadrant. Optimum separation was observed in PC1 and
PC2, and hence t1 vs. t2 is shown in FIG. 3-2C-CHD.
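The text does not specify which OSC algorithm was used. As an illustration only, the sketch below removes one y-orthogonal component from the data in the style of O-PLS (orthogonal projections to latent structures), a closely related orthogonal filter rather than necessarily the one applied here. The removed score is uncorrelated with the class variable by construction, so the filter discards variation unrelated to class. The function name is hypothetical.

```python
import numpy as np


def remove_orthogonal_component(X, y):
    """Remove one y-orthogonal component from X (O-PLS-style filter).

    X and y are mean-centred internally. Returns the filtered X (still
    centred) and the orthogonal score vector that was removed.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    w = Xc.T @ yc
    w /= np.linalg.norm(w)              # predictive weight (unit length)
    t = Xc @ w                          # predictive score
    p = Xc.T @ t / (t @ t)              # loading of the predictive score
    w_o = p - (w @ p) * w               # part of p orthogonal to w
    w_o /= np.linalg.norm(w_o)
    t_o = Xc @ w_o                      # orthogonal score: t_o'y = 0
    p_o = Xc.T @ t_o / (t_o @ t_o)      # orthogonal loading
    return Xc - np.outer(t_o, p_o), t_o
```

Because only a y-orthogonal direction is removed, the covariance between X and the class variable (`Xc.T @ yc`) is unchanged, while large class-unrelated variation is reduced before the subsequent PCA or PLS-DA.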
[1215] The corresponding loadings plot (FIG. 3-2D-CHD) shows which
regions of the NMR spectrum are responsible for the separation
between NCA and TVD samples. Importantly, the same regions of the
spectra that contributed to the clustering in the unfiltered data
set (FIG. 3-2B-CHD) also contributed to the clustering seen after
application of OSC (FIG. 3-2D-CHD): δ 1.30, δ 1.34, δ 1.22, δ 3.22,
δ 0.86, and δ 1.26.
[1216] Partial least squares discriminant analysis (PLS-DA)
performed on the same data, following application of OSC, yielded
excellent separation. In the resulting scores plot of PC2 and PC1
(FIG. 3-2E-CHD), NCA samples (circles) dominate the right-hand side
and TVD samples (squares) dominate the left-hand side. The
corresponding loadings plot (FIG. 3-2F-CHD) shows which regions of
the NMR spectrum are responsible for the separation between NCA and
TVD samples. Again, the same regions appear: δ 1.30, δ 1.22,
δ 1.26, δ 1.34, δ 3.22, δ 0.86, etc.
[1217] A section of the variable importance plot (VIP) for the
PLS-DA model calculated from OSC-filtered NMR data is shown in FIG.
3-3A-CHD.
[1218] The regression coefficients for the OSC filtered data are
shown graphically in FIG. 3-3B-CHD. For the regression
coefficients, a positive value indicates a relatively greater
concentration of a metabolite (e.g., assigned using NMR chemical
shift assignment tables) present in TVD samples and a negative
value indicates a relatively lower concentration, both with respect
to control samples.
[1219] The regression coefficients for the PLS-DA model (whether
obtained using the unfiltered data or OSC-filtered data) again
indicated that the same spectral regions contributed most strongly
to the discrimination of the classes: lipid, mostly VLDL and LDL,
and choline.
[1220] The loadings (variables) that are most influential in
causing separation between NCA and TVD samples are summarised in
Table 3-4-CHD, below, and are listed in order of decreasing
importance. The assignments were made by comparing the loadings
with published tables of NMR data.
TABLE 3-4-CHD
Bucket # | Region (ppm) | Assignment | Chem. Shift (ppm) and Multiplicity | NMR spectral intensity in TVD vs. NCA
1 | 1.30 | lipid (CH₂)n | 1.29 (m) | increased
2 | 1.22 | lipid (CH₂)n | 1.22 (m) | decreased
3 | 1.26 | lipid (CH₂)n | 1.26 (m), 1.25 (m) | increased
4 | 1.34 | lipid (CH₂)n | 1.32 (m) | increased
5 | 3.22 | choline N(CH₃)₃⁺ | 3.21 (s) | decreased
6 | 0.86 | lipid (CH₃) | 0.84 (t), 0.87 (t) | increased
7 | 0.90 | lipid (CH₃) | 0.91 | increased
8 | 0.82 | lipid (CH₃)/cholesterol | 0.84 | decreased
9 | 2.02 | lipid (CH₂C=C) | 2.00 (m) | increased
10 | 1.58 | lipid (CH₂CH₂CO) | 1.57 (m) | increased
11 | 2.22 | lipid (CH₂CO) | 2.23 (m) | increased
12 | 1.98 | lipid (CH₂C=C) | 1.97 (m) | decreased
[1221] The region at δ 3.22 is assigned to -N(CH₃)₃⁺ groups in
molecules containing the choline moiety, principally
phosphatidylcholine from lipoproteins (mainly HDL), based on the
known phospholipid content of lipoproteins.
[1222] The regions at δ 1.30, 1.22, 1.26, and 1.34 all arise from
the (CH₂)n chains of fatty acyl groups, which are present in all
lipoproteins as phospholipids, cholesteryl esters, and
triacylglycerols. The proportions of these three classes of
compounds vary across the types of lipoprotein. There are two broad
¹H NMR peaks in the region δ 1.22-1.34, which are usually assigned
to LDL and VLDL; however, because of their line widths, both peaks
contribute to all of these regions.
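The line-width argument in the preceding paragraph can be illustrated by integrating a Lorentzian line shape over 0.04 ppm buckets. In the sketch below, the peak positions (δ 1.25 and 1.29) are taken from Table 3-3-CHD, while the Lorentzian shape, a half-width at half-maximum of 0.02 ppm, and buckets centred on their labelled shifts are all hypothetical assumptions chosen for illustration.

```python
import numpy as np


def lorentzian_bucket_fractions(centre_ppm, hwhm_ppm, bucket_centres,
                                width=0.04):
    """Fraction of a unit-area Lorentzian peak that falls in each bucket.

    Uses the Lorentzian cumulative area (1/pi) * arctan((x - x0) / hwhm),
    with each bucket spanning its centre +/- width/2.
    """
    centres = np.asarray(bucket_centres, dtype=float)

    def cumulative(x):
        return np.arctan((x - centre_ppm) / hwhm_ppm) / np.pi

    return cumulative(centres + width / 2) - cumulative(centres - width / 2)
```

Calling this for each peak with buckets at δ 1.22, 1.26, 1.30, and 1.34 and the assumed 0.02 ppm half-width shows each peak placing a few percent or more of its area in every one of the four windows. This is consistent with the statement that both peaks contribute to all of these regions.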
[1223] Lipoproteins account for approximately 10% of total human
blood protein. Lipoproteins are water-soluble complexes comprising
protein components (e.g., apolipoproteins) and lipid components
(e.g., cholesterol, cholesteryl esters, phospholipids, and
triglycerides). Lipoproteins are often conveniently considered to
comprise a hydrophobic core (primarily cholesteryl esters and
triglycerides) surrounded by a relatively more hydrophilic shell
(primarily apolipoproteins, phospholipids, and unesterified
cholesterol) that projects its hydrophilic domains into the aqueous
environment. Lipoproteins serve as transport vehicles for lipids
such as triacylglycerols, cholesterol (and cholesteryl esters), and
other lipids (e.g., phospholipids).
[1224] Several classes of lipoproteins (e.g., α, β, broad-β,
pre-β) can be distinguished in human blood according to their
electrophoretic behaviour. However, lipoproteins are more
conveniently characterized by their ultracentrifugal flotation
behavior in high-salt media, which separates them according to
density, as follows: chylomicra, less than 1.006 g/mL; very low
density (VLDL), 1.006-1.019 g/mL; low density (LDL), 1.019-1.063
g/mL; high density (HDL), 1.063-1.21 g/mL; very high density
(VHDL), >1.21 g/mL. Lipoproteins are often approximately spherical
in shape and range in diameter from about 0.1 micron (for
chylomicra) to about 5 nanometers (for VHDL). Lipoproteins range in
molecular weight from 200 kDa to 10,000 kDa and from 4 to 95% lipid
(the higher the density, the lower the lipid content). Chylomicra
and VLDLs are rich in triglycerides (~90% and ~60% of the total
lipid content, respectively), while LDLs are rich in cholesterol
(~60% of total lipid content) and HDLs are rich in phospholipids
(~50% of total lipid content).
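The density ranges listed above amount to a simple lookup, sketched below with a hypothetical function name. The text gives the ranges but does not say to which class a density lying exactly on a boundary belongs; this sketch assigns boundary values to the denser class.

```python
def lipoprotein_class(density_g_per_ml):
    """Assign a lipoprotein class from its density (g/mL).

    Boundaries follow the ranges given in the text; a density exactly on a
    boundary is assigned to the denser class (a choice made for this sketch).
    """
    upper_bounds = [(1.006, "chylomicra"), (1.019, "VLDL"),
                    (1.063, "LDL"), (1.21, "HDL")]
    for upper, name in upper_bounds:
        if density_g_per_ml < upper:
            return name
    return "VHDL"
```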
[1225] Choline (HO-CH₂CH₂-N(CH₃)₃⁺) is incorporated into many
biologically important species, including phosphorylcholine,
glycerophosphocholine, and phosphatidylcholine (a class of
phospholipids). Phospholipids are components of lipid membranes and
also of lipoproteins. The predominant choline-containing species in
blood plasma are phosphatidylcholines.
[1226] Validation
[1227] Having established the presence of "clusters" by PCA, the
data were analysed by PLS-DA to test the predictive power of the
model.
[1228] For cross-validation purposes, approximately 80% of the
samples under study were selected at random as a training set and
used to construct a PLS-DA model, which was then used to predict
the class membership of the remaining 20% of the samples. Class
membership was predicted using a 0.5 dividing line between the two
classes and a class membership probability value >0.01 (99%
confidence interval).
[1229] The PLS-DA model calculated for the OSC-filtered data was
then used to predict the class membership of the samples not
included in the training set (FIG. 3-4-CHD). Using approximately
80% of the NCA (circles) and TVD (squares) samples, a PLS-DA model
was calculated and used to predict the presence of TVD in the
remaining 20% of samples (the validation set; triangles, marked NCA
or TVD). The y-predicted scatter plot assigns samples to either
class 1 (here, TVD) or class 0 (here, NCA), with a cut-off of 0.5.
The PLS-DA model predicted the presence and absence of TVD with a
sensitivity of 92% and a specificity of 93%, based on a 99%
confidence limit for class membership.
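The validation procedure described above (a random ~80/20 split, a 0.5 cut-off on the y-predicted value, and sensitivity and specificity for TVD) can be sketched as follows. The helper names are hypothetical, and the additional class-membership probability criterion (>0.01) used in the text is not modelled here.

```python
import numpy as np


def split_80_20(n_samples, seed=0):
    """Random ~80/20 split of sample indices into training and validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = int(round(0.8 * n_samples))
    return idx[:n_train], idx[n_train:]


def sensitivity_specificity(y_pred, y_true, cutoff=0.5):
    """Sensitivity and specificity of y-predicted values at a cut-off.

    y_true is 1 for TVD (disease) and 0 for NCA (control).
    """
    called = np.asarray(y_pred) > cutoff
    truth = np.asarray(y_true).astype(bool)
    sens = np.sum(called & truth) / np.sum(truth)      # TP / (TP + FN)
    spec = np.sum(~called & ~truth) / np.sum(~truth)   # TN / (TN + FP)
    return float(sens), float(spec)
```

In use, a PLS-DA model fitted on the training indices would supply `y_pred` for the validation indices, and these two figures correspond to the reported sensitivity and specificity.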
[1230] This demonstrates that ¹H NMR-based metabonomic analysis of
plasma samples, which is itself minimally invasive and
nondestructive of the sample, can achieve clinically useful
diagnostic performance when compared with invasive angiography.
[1231] This example demonstrates that it is possible to completely
separate CHD patients with stenosis of all three major arteries
from subjects with normal coronary arteries using principal
component analysis (PCA).
[1232] Furthermore, using the supervised PLS-DA algorithm, it is
possible to predict the artery status of unknown samples using a
training set comprising only 24 NCA and 30 TVD individuals. The
small size of the training set required to achieve >90%
sensitivity and specificity highlights the power of this technique.
Substantially larger training sets, obtained by applying this
technique in clinical practice, should further improve its
diagnostic sensitivity and specificity.
[1233] While the peaks around δ 1.30 are known to result
predominantly from lipid CH₂ resonances, the values of the NMR
descriptors in this region correlate only weakly with the level of
LDL-cholesterol (r² = 0.20). This means that there is considerable
NMR signal intensity information in these windows that is
uncorrelated with the level of LDL-cholesterol. This arises from the
presence of some small-molecule metabolites, such as lactate and
threonine, and also from contributions of other lipoproteins
(mainly VLDL) present in the biofluid. The line widths of the LDL
and VLDL CH₂ peaks are such that the two peaks overlap
considerably, and both contribute to all of the windows in this
region to varying amounts. The remaining variance is likely
to result from subtle chemical differences in the lipid composition
of LDL particles between individuals, for example, degree of fatty
acid side chain unsaturation and lipoprotein-protein molecular
interactions. Such observations will contribute to on-going studies
using both NMR and other analytical techniques to understand the
contribution of lipoprotein particle composition to the development
of CHD. It does, however, emphasize an important facet of high data
density metabolic analysis in that it is entirely unnecessary to
understand fully the complex molecular differences that underlie
the spectral features associated with CHD to be able to correctly
classify individuals with very high sensitivity and specificity.
Further analysis of the molecular basis of the spectral
differences, however, will give insight into the mechanistic
processes involved.
Example 4
Determination of Severity of Coronary Heart Disease (CHD)
[1234] As discussed above, the inventors have developed novel
methods of analysing data (e.g., NMR spectra) from a test
population. These methods employ multivariate statistical analysis
and pattern recognition (PR) techniques, optionally together with
data filtering techniques, and yield accurate mathematical models
that may subsequently be used to classify a test sample or subject
and/or for diagnosis.
[1235] In the context of atherosclerosis/CHD, the inventors have
applied these techniques to the analysis of serum or plasma taken
from individuals who had been extensively characterized, both for
the presence of atherosclerosis/CHD by the gold-standard
angiographic technique and for a wide range of conventional risk
factors. The metabonomic analysis can distinguish individuals with
and without atherosclerosis/CHD and/or discriminate between degrees
of atherosclerosis/CHD. Novel diagnostic biomarkers for
atherosclerosis/CHD have been identified, and associated diagnostic
methods have been developed.
[1236] Obtaining NMR Spectra--Severity of CHD
[1237] To determine whether ¹H NMR-based metabonomic analysis could
distinguish the severity of CHD present, samples were collected
from individuals with stenosis of one, two, or three major coronary
arteries. Although this is a crude indicator of disease severity,
it is plausible that the number of stenosed vessels correlates (at
least weakly) with whole-body atherosclerotic plaque load.
[1238] Using plasma from 76 patients (28 with 1 vessel stenosed:
type "1" vessel disease; 20 with 2 vessels stenosed: type "2"
vessel disease; 28 with 3 vessels stenosed: type "3" vessel
disease), ¹H NMR spectral analysis was used to classify the
severity of CHD. The methods for sample collection, NMR
spectroscopy, data processing, and pattern recognition were all as
described above, unless specified otherwise.
[1239] Patients were recruited according to the same criteria as
described above, except that patients with more than 50% stenosis
of either one, two or all three coronary arteries (assessed by two
independent observers) were recruited and females were excluded.
The clinical data that were measured (conventionally) for these
patient groups are shown in Table 3-5-CHD, below. For each
parameter, the average value is given together with one standard
deviation.
TABLE 3-5-CHD
# | Parameter | Type "1" | Type "2" | Type "3"
1 | Number (n, all male) | 28 | 20 | 28
2 | Height (m) | 1.76 ± 0.07 | 1.80 ± 0.05 | 1.78 ± 0.06
3 | Weight (kg) | 83.5 ± 14.7 | 91.1 ± 10.0 | 86.7 ± 9.6
4 | BMI (kg/m²) | 26.77 ± 4.01 | 28.07 ± 3.55 | 27.32 ± 2.22
5 | Erythrocytes | 4.64 ± 0.35 | 4.54 ± 0.55 | 4.66 ± 0.25
6 | Haemoglobin (g/dL) | 13.9 ± 0.82 | 13.53 ± 1.52 | 13.54 ± 0.95
7 | Hematocrit | 0.418 ± 0.026 | 0.410 ± 0.053 | 0.409 ± 0.025
8 | MCV (fl) | 90.2 ± 4.3 | 90.2 ± 4.3 | 87.7 ± 5.3
9 | MCHC (g/dL) | 30.1 ± 1.6 | 29.8 ± 1.5 | 29.1 ± 2.0
10 | Platelets (10⁹/L) | 210 ± 45 | 210 ± 27 | 214 ± 57
11 | Leukocytes | 6.30 ± 1.21 | 6.74 ± 1.74 | 6.22 ± 1.50
12 | Neutrophils (10⁹/L) | 3.63 ± 0.89 | 4.09 ± 1.77 | 3.61 ± 1.14
13 | Lymphocytes (10⁹/L) | 1.88 ± 0.52 | 1.84 ± 0.55 | 1.79 ± 0.44
14 | Monocytes (10⁹/L) | 0.53 ± 0.14 | 0.51 ± 0.17 | 0.53 ± 0.14
15 | Eosinophils (10⁹/L) | 0.21 ± 0.12 | 0.19 ± 0.12 | 0.16 ± 0.10
16 | Basophils (10⁹/L) | 0.02 ± 0.01 | 0.02 ± 0.01 | 0.02 ± 0.01
17 | LUC | 0.08 ± 0.03 | 0.08 ± 0.04 | 0.09 ± 0.05
18 | Fibrinogen | 3.52 ± 0.86 | 3.76 ± 1.01 | 3.57 ± 0.84
19 | PT test (s) | 13.6 ± 0.9 | 13.6 ± 1.2 | 13.7 ± 0.8
20 | APTT test | 29.0 ± 2.9 | 30.1 ± 4.0 | 30.2 ± 3.1
21 | Sodium (mmol/L) | 140 ± 2 | 139 ± 2 | 140 ± 2
22 | Potassium (mmol/L) | 4.1 ± 0.3 | 4.1 ± 0.2 | 4.2 ± 0.3
23 | Urea (mmol/L) | 6.1 ± 1.7 | 6.6 ± 1.4 | 6.1 ± 1.3
24 | Creatinine (µmol/L) | 104 ± 10 | 103 ± 10 | 107 ± 11
25 | Protein (g/L) | 72 ± 4 | 72 ± 6 | 72 ± 3
26 | Albumin (g/L) | 42 ± 3 | 41 ± 4 | 42 ± 3
27 | Immunoglobulins (g/L) | 31 ± 4 | 30 ± 5 | 30 ± 3
28 | Bilirubin (µmol/L) | 9 ± 4 | 11 ± 4 | 10 ± 4
29 | ALT (U/L) | 19 ± 6 | 23 ± 10 | 22 ± 8
30 | ALP (U/L) | 183 ± 41 | 178 ± 39 | 173 ± 41
31 | γGT (U/L) | 12.1 ± 7.0 | 14.0 ± 10.3 | 12.9 ± 7.5
32 | Glucose (mmol/L) | 5.8 ± 1.3 | 5.9 ± 1.4 | 6.1 ± 2.3
33 | HbA1c | 5.6 ± 0.5 | 5.9 ± 1.3 | 6.3 ± 0.6
34 | Cholesterol (mmol/L) | 5.3 ± 0.9 | 5.6 ± 1.4 | 5.2 ± 0.9
35 | LDL-C (mmol/L) | 3.3 ± 0.8 | 3.6 ± 1.3 | 3.2 ± 0.9
36 | HDL-C (mmol/L) | 1.01 ± 0.23 | 0.97 ± 0.17 | 1.04 ± 0.34
37 | Triglycerides (mmol/L) | 2.0 ± 1.1 | 2.2 ± 1.0 | 2.1 ± 0.8
[1240] Blood samples from these patients were drawn into Diatube H
tubes, and platelet-poor plasma was prepared as previously
described. Aliquots of plasma were stored at -80 °C until assayed.
[1241] Samples were obtained, and 1-D ¹H NMR spectra were
collected using the same methods and parameters as described in the
NCA/TVD section.
[1242] Data Analysis
[1243] A principal components analysis (PCA) model was calculated
using 1-D ¹H NMR spectra for serum samples from patients with
either 1, 2, or 3 vessels stenosed (i.e., type "1", type "2", and
type "3" vessel disease, respectively).
[1244] The scores scatter plot for the PCA model is shown in FIG.
3-5A-CHD. Although there is much overlap between the three classes
of sample, some separation is evident, particularly for the type
"1" vessel disease samples, which dominate the lower left of the
plot. Optimum separation was observed in PC2 and PC1, hence t2 vs.
t1 is plotted in the figure.
[1245] The corresponding loadings plot, shown in FIG. 3-5B-CHD,
indicates which regions of the NMR spectrum are responsible for the
separation between the three degrees of severity of CHD. Because of
the extent of overlap, the loadings plot is difficult to interpret;
however, the most influential loadings are the regions at 3.22,
1.38, 1.34, 1.30, 1.26, 1.22, 0.90, 0.86, and 0.82 ppm.
[1246] Improved separation is possible using PLS-DA (rather than
the unsupervised PCA). Because the pattern recognition software
package (SIMCA) displays data only in two dimensions, and in this
example there are three sample classes, it is necessary to plot the
classes two at a time, using the model (e.g., PLS-DA) calculated for
each pair. A scores plot and the corresponding loadings for each
pair ("1" and "2"; "1" and "3"; "2" and "3") are shown in FIG.
3-5C-CHD. Much overlap remains between the classes; however, some
separation is evident.
[1247] Another PCA model was calculated using the same data.
However, prior to PCA, the NMR data were filtered by application of
OSC, which removes variation that is not correlated with class and
thereby improves the subsequent multivariate analysis.
[1248] The scores scatter plot for the resulting PCA model is shown
in FIG. 3-6A-CHD. The improved separation between the classes of
different severity of CHD is evident, with type "1" vessel disease
dominating in the lower left quadrant.
[1249] The corresponding loadings scatter plot, shown in FIG.
3-6B-CHD, indicates which regions of the NMR spectrum are
responsible for distinguishing the severity of CHD. Importantly,
these are the same regions as those that distinguish NCA from TVD,
and the same as those depicted in FIG. 3-5B-CHD, namely: 3.22;
1.38; 1.34; 1.30; 1.26; 1.22; 0.90; 0.86; and 0.82 ppm.
[1250] Again, improved separation is possible using PLS-DA (rather
than the unsupervised PCA). A scores plot and the corresponding
loadings for each pair ("1" and "2"; "1" and "3"; "2" and "3") are
shown in FIG. 3-6C-CHD. Most separation is observed between types
"1" and "2" (FIG. 3-6C-(1)-CHD) and between types "1" and "3" (FIG.
3-6C-(5)-CHD).
[1251] This suggests that the metabolic profile (NMR spectrum) for
type "1" vessel disease differs the most compared to the profiles
for type "2" and type "3", which are more similar to each
other.
[1252] Pairs of variable importance plots (VIPs) and regression
coefficient plots for each of the three PLS-DA models described in
FIG. 3-6C-(1)-CHD through (6)-CHD are shown in FIG. 3-7-(1)-CHD
through (6)-CHD.
[1253] The regression coefficients in the loadings plots indicated
that the spectral windows at ca. δ 1.30 and δ 1.26, dominated by
lipid resonances, contributed most of the separation between the
severity classes, with the window at δ 3.22 (choline) being
relatively less important than in the comparison of TVD and NCA
patients.
[1254] Validation
[1255] Y-predicted scatter plots for the OSC-PLS-DA models are
shown in FIG. 3-8A-CHD, FIG. 3-8B-CHD, and FIG. 3-8C-CHD; these
demonstrate the ability of ¹H NMR-based metabonomics to predict the
class membership (severity of CHD; 1, 2, or 3 vessels affected) of
unknown samples. For each plot, about 80% of the samples were used
to calculate a PLS-DA model, which was then used to predict the
severity in the remaining 20% of the samples. The y-predicted
scatter plots assign samples to either class 1 or class 0, with a
cut-off of 0.5.
[1256] The type "1" and type "2" vessel disease PLS-DA model (FIG.
3-8A-CHD) predicted the severity accurately in 88% of cases.
Furthermore, for a two-component model, severity was predicted with
a significance level ≥90% using a 99% confidence limit.
[1257] The type "2" and type "3" vessel disease PLS-DA model (FIG.
3-8B-CHD) predicted the severity accurately in 88% of cases.
Furthermore, for a two-component model, severity was predicted with
a significance level ≥85% using a 99% confidence limit.
[1258] The type "1" and type "3" vessel disease PLS-DA model (FIG.
3-8C-CHD) predicted the severity accurately in 75% of cases.
Furthermore, for a two-component model, severity was predicted with
a significance level >92% using a 99% confidence limit.
[1259] This metabonomic analysis can distinguish individuals with
different severity of CHD. Even using the crude parameter of number
of major coronary vessels with >50% stenosis, this example
demonstrates that both PCA and PLS-DA are capable of categorizing
CHD patients on the basis of severity. The failure to achieve
complete separation of the classes is as likely to reflect the
crude nature of severity designations based solely on coronary
angiography as any lack of power in the metabonomic analysis to
discriminate individuals.
Example 5 (Comparison Example)
Use of Established Clinical Risk Factors
[1260] In this example, multivariate data analysis was used to
classify the severity of CHD on the basis of established clinical
parameters.
[1261] This allows direct comparison of the performance of the
metabonomic analysis as a diagnostic technique with algorithms
based on conventional risk factors.
[1262] A PCA model was calculated using established clinical
parameters measured for patients with 1, 2 or 3 vessels stenosed.
The scores scatter plot for PC1 and PC2 is shown in FIG. 3-9A-CHD.
The PCA model shows much overlap between the samples and no evident
separation (compare FIG. 3-5A-CHD and FIG. 3-6A-CHD). The absence of
separation in the PCA scores plot suggests that the clinical
parameters do not distinguish between "1", "2", or "3" vessel
disease.
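The text does not say how the clinical parameters were scaled before PCA. Because they are measured in different units, a common choice is autoscaling: mean-centre each parameter, then divide it by its standard deviation, so that no parameter dominates merely because of its units. A minimal sketch under that assumption, using hypothetical values:

```python
import statistics


def autoscale(rows):
    """Mean-centre each column and scale it to unit (sample) standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]
    if any(sd == 0 for sd in sds):
        raise ValueError("a column with zero variance cannot be autoscaled")
    return [[(v - m) / sd for v, m, sd in zip(row, means, sds)] for row in rows]


# Hypothetical rows: [age (years), systolic blood pressure (mmHg), total cholesterol (mmol/l)]
patients = [[50, 120, 5.0], [60, 140, 6.0], [70, 160, 7.0]]
for row in autoscale(patients):
    print(row)
```

After scaling, each column in this toy example becomes -1, 0, 1, so age, blood pressure and cholesterol carry equal weight in the subsequent PCA.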
[1263] The corresponding loadings plot is shown in FIG. 3-9B-CHD,
and shows which of the established clinical parameters are
responsible for causing separation between the three different
degrees of severity of CHD. Due to the extent of overlap, the
loadings plot is difficult to interpret.
[1264] Improved separation is possible using PLS-DA (rather than
the unsupervised PCA). Because the pattern recognition package
(SIMCA) displays data only in two dimensions, and this example has
three sample classes, it is necessary to calculate PLS-DA models for
two classes at a time. A scores plot and the corresponding loadings
plot for each pair are shown in FIG. 3-9C-CHD. As can be seen from
the figures, the separation based on established clinical
parameters is not as evident as the separation based on the NMR
data.
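Building one two-class PLS-DA dataset per pair of classes can be sketched as below. The helper name is hypothetical. Coding the first class of each pair as 1 and the second as 0 is an assumption: the text says only that each model assigns samples to class 1 or class 0.

```python
from itertools import combinations


def pairwise_class_datasets(samples, labels):
    """Yield (class_a, class_b, X, y) for every pair of classes.

    Only samples from the two classes are kept; y is 1 for class_a and 0 for class_b.
    """
    for a, b in combinations(sorted(set(labels)), 2):
        kept = [(s, lab) for s, lab in zip(samples, labels) if lab in (a, b)]
        X = [s for s, _ in kept]
        y = [1 if lab == a else 0 for _, lab in kept]
        yield a, b, X, y


# Hypothetical sample identifiers labelled by number of diseased vessels.
samples = ["s1", "s2", "s3", "s4", "s5", "s6"]
labels = ["1", "2", "3", "1", "2", "3"]
for a, b, X, y in pairwise_class_datasets(samples, labels):
    print(a, "vs", b, X, y)
```

With three classes this gives three models ("1" vs "2", "1" vs "3", "2" vs "3"). Each model contributes a scores plot and a loadings plot, which matches the six panels (1) to (6) referred to for each figure.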
[1265] Pairs of variable importance plots (VIPs) and regression
coefficient plots for each of the three PLS-DA models described in
FIG. 3-9C-(1)-CHD through (6)-CHD are shown in FIG. 3-10-(1)-CHD
through (6)-CHD.
[1266] None of the risk factors measured (including age, blood
pressure, LDL and HDL cholesterol, total cholesterol, total
triglyceride, fibrinogen, PAI-1, white blood cell count, creatinine
or history of cigarette smoking) were significantly different
between the three groups (p>0.05 by ANOVA in each case).
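The ANOVA comparison above tests, for each risk factor, whether the group means differ. Here is a minimal one-way ANOVA F-statistic sketch; the LDL values are hypothetical. The p-value would then come from the F distribution with (df_between, df_within) degrees of freedom, for example via `scipy.stats.f.sf(F, dfn, dfd)`. `scipy.stats.f_oneway` returns both the statistic and the p-value directly.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over a list of groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical LDL cholesterol values (mmol/l) for 1-, 2- and 3-vessel disease groups.
ldl = [[3.1, 3.4, 2.9, 3.6], [3.3, 3.0, 3.5, 3.2], [3.4, 3.1, 3.6, 2.8]]
print(one_way_anova_f(ldl))
```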
[1267] This demonstrates that the ¹H NMR-based metabonomic methods
described above are substantially better able to distinguish the
severity of CHD based on a single blood sample than any of the
conventional risk factors yet identified.
[1268] No other conventional risk factors measured in these
subjects (including age, blood pressure, lipoprotein levels or
clotting parameters) differed between the severity classes, even in
a cross-sectional analysis, and hence were completely unable to
distinguish individuals within the population on the basis of CHD
severity. This demonstrates the extent to which metabonomics
improves upon conventional risk factor analysis.
Example 6
Osteoporosis
[1269] As discussed above, the inventors have developed novel
methods (which employ multivariate statistical analysis and pattern
recognition (PR) techniques, and optionally data filtering
techniques) of analysing data (e.g., NMR spectra) from a test
population which yield accurate mathematical models which may
subsequently be used to classify a test sample or subject, and/or
in diagnosis.
[1270] These techniques have been applied to the analysis of blood
serum in the context of osteoporosis. The metabonomic analysis can
distinguish between individuals with and without osteoporosis.
Novel diagnostic biomarkers for osteoporosis have been identified,
and methods for associated diagnosis have been developed.
[1271] Briefly, metabonomic methods were applied to blood serum
samples from subjects in an osteoporosis study. Biomarkers, including
free proline, were identified as being diagnostic for osteoporosis.
Subsequently, proline levels were used to classify (e.g., diagnose)
patients, specifically, by using predictive mathematical models
which take account of free proline levels.
[1272] Collection of NMR Spectra
[1273] Analysis was performed on serum samples collected from
subjects under study. Serum was taken from control subjects (n=40)
and from patients with osteoporosis (n=29) prior to a formal
diagnosis of bone disease.
[1274] The data were classified as "control" (triangle, ▲) or
"osteoporosis" (circle, ●).
[1275] Osteoporosis was diagnosed according to bone mineral density
(BMD) of the lumbar spine (LS), which was expressed as a Z-score.
Osteoporosis in a subject was diagnosed using the World Health
Organisation (WHO) definition of osteoporosis, namely a BMD 1.5 or
more standard deviations (SDs) below the age- and sex-matched mean
(i.e., a Z-score of -1.5 or below), or by the presence of spinal
fractures (see, e.g., World Health Organisation, 1994). Control
subjects had a Z-score above this cut-off value and no history of
fractures.
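The grouping rule can be written out as a small function. This is a sketch of the rule as stated in the text. The reference mean and SD are hypothetical inputs, because the text does not give the reference population values. The "unclassified" outcome covers subjects who meet neither definition, which the text does not discuss.

```python
def z_score(bmd, ref_mean, ref_sd):
    """Z-score of a BMD measurement against an age- and sex-matched reference."""
    return (bmd - ref_mean) / ref_sd


def classify_subject(bmd, ref_mean, ref_sd, spinal_fracture, any_fracture_history,
                     cutoff=-1.5):
    """Apply the study's grouping rule as described in the text."""
    z = z_score(bmd, ref_mean, ref_sd)
    if z <= cutoff or spinal_fracture:
        return "osteoporosis"
    if not any_fracture_history:
        return "control"
    # Z-score above the cut-off but with a (non-spinal) fracture history:
    # such a subject meets neither definition.
    return "unclassified"


# Hypothetical reference values for lumbar-spine BMD (g/cm^2).
print(classify_subject(0.80, ref_mean=1.00, ref_sd=0.12,
                       spinal_fracture=False, any_fracture_history=False))
```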
[1276] Blood was drawn from each patient, allowed to clot in
plastic tubes for 2 hours at room temperature, and the serum was
collected by centrifugation. Aliquots of serum were stored at
-80 °C until assayed.
[1277] Prior to NMR analysis, samples (150 µl) were diluted with
solvent solution (350 µl; 10% D₂O v/v, 0.9% NaCl w/v). The diluted
samples were then placed in high-quality 5 mm NMR tubes (Goss
Scientific Instruments Ltd).
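The sample-preparation arithmetic works out as follows, assuming the volumes are additive. The helper name is hypothetical. The final D₂O content counts only the D₂O contributed by the solvent.

```python
def diluted_sample(serum_ul, solvent_ul, d2o_percent_in_solvent):
    """Return total volume (µl), dilution factor of the serum, and final D2O content (% v/v).

    Assumes the volumes are additive.
    """
    total_ul = serum_ul + solvent_ul
    dilution_factor = total_ul / serum_ul
    d2o_percent_final = d2o_percent_in_solvent * solvent_ul / total_ul
    return total_ul, dilution_factor, d2o_percent_final


total, factor, d2o = diluted_sample(150, 350, 10)
print(total, round(factor, 2), round(d2o, 2))  # 500 3.33 7.0
```

So each 500 µl NMR sample contains serum diluted about 3.3-fold, with about 7% D₂O v/v available for the field-frequency lock.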
[1278] Conventional 1-D ¹H NMR spectra of the blood serum samples
were measured on a Bruker DRX-600 spectrometer using the conditions
set forth in the section entitled "NMR Experimental
Parameters."
[1279] Data Analysis
[1280] A Principal Components Analysis (PCA) model was calculated
from the 1D ¹H NMR spectra of serum samples from control subjects
(▲) and patients with osteoporosis (●).
[1281] The corresponding scores and loadings plots are shown in
FIG. 4-1A-OP and FIG. 4-1B-OP, respectively. Those regions of the
NMR spectrum which are responsible for causing separation between
the different samples are also indicated in FIG. 4-1B-OP.
Separation between controls and osteoporosis is evident in PC2,
with control samples dominating the lower two quadrants and
osteoporosis samples dominating the upper two quadrants.
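Scores and loadings of the kind shown in such plots can be obtained as in the PCA sketch below. It uses power iteration with deflation, which is one standard way to compute principal components. It is illustrative only and is not the software used in the study. The sign of each component is arbitrary, so which quadrant a class "dominates" can flip between otherwise equivalent models.

```python
import math
import random


def pca(X, n_components=2, iters=500, seed=0):
    """PCA by power iteration with deflation.

    X is a list of rows (samples) by columns (e.g., spectral buckets).
    Returns (loadings, scores): one loading vector and one score vector per component.
    """
    n, p = len(X), len(X[0])
    means = [sum(col) / n for col in zip(*X)]
    E = [[v - m for v, m in zip(row, means)] for row in X]  # mean-centred copy
    rng = random.Random(seed)
    loadings, scores = [], []
    for _ in range(n_components):
        v = [rng.random() + 0.1 for _ in range(p)]
        for _ in range(iters):
            t = [sum(a * b for a, b in zip(row, v)) for row in E]          # t = E v
            w = [sum(t[i] * E[i][j] for i in range(n)) for j in range(p)]  # w = E^T t
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left for this component
                break
            v = [x / norm for x in w]
        t = [sum(a * b for a, b in zip(row, v)) for row in E]
        # Deflate: remove the variation captured by this component.
        E = [[e - ti * vj for e, vj in zip(row, v)] for row, ti in zip(E, t)]
        loadings.append(v)
        scores.append(t)
    return loadings, scores


# Hypothetical two-bucket data with most variance in the first bucket.
X = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
loadings, scores = pca(X)
print(loadings)
print(scores)
```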
[1282] A Principal Components Analysis (PCA) model was calculated
from the 1D ¹H NMR spectra of serum samples from control subjects
(▲) and patients with osteoporosis (●), but in this case the data
were filtered by orthogonal signal correction (OSC) prior to PCA.
OSC removes variation that is not correlated with class and
therefore improves subsequent data analysis. The corresponding
scores and loadings plots are shown in FIG. 4-1C-OP and FIG.
4-1D-OP, respectively.
[1283] The improved separation between the control and osteoporosis
samples is evident, with controls dominating the left hand side of
the plot and osteoporosis dominating the right hand side. Note also
that application of OSC results in maximum variation being observed
in PC1 rather than in PC2.
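The text does not give the OSC algorithm. The sketch below is a simplified, single-pass, one-component filter in the spirit of OSC. It is not the exact procedure of Wold et al. or of the SIMCA implementation. It takes the first principal component score of X, removes the part of that score correlated with the class vector y, and subtracts the resulting y-orthogonal component from X. The function names are hypothetical.

```python
import math


def _centre(values):
    m = sum(values) / len(values)
    return [v - m for v in values]


def osc_one_component(X, y, iters=200):
    """Remove one y-orthogonal component from X (simplified OSC-style filter).

    Returns (X_filtered, t_orth), where X_filtered is mean-centred.
    """
    Xc = [list(r) for r in zip(*[_centre(list(c)) for c in zip(*X)])]
    yc = _centre(y)
    n, p = len(Xc), len(Xc[0])
    # 1. First principal component score of X (power iteration).
    v = [1.0] * p
    for _ in range(iters):
        t = [sum(a * b for a, b in zip(row, v)) for row in Xc]
        w = [sum(t[i] * Xc[i][j] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        v = [x / norm for x in w]
    t = [sum(a * b for a, b in zip(row, v)) for row in Xc]
    # 2. Orthogonalise the score against y, so it carries no class information.
    yy = sum(a * a for a in yc)
    ty = sum(a * b for a, b in zip(t, yc))
    t_orth = [ti - yi * ty / yy for ti, yi in zip(t, yc)]
    tt = sum(a * a for a in t_orth)
    if tt < 1e-12:
        return Xc, t_orth  # nothing orthogonal to y to remove
    # 3. Loadings of X on the orthogonal score, then subtract that component.
    p_vec = [sum(t_orth[i] * Xc[i][j] for i in range(n)) / tt for j in range(p)]
    X_filtered = [[Xc[i][j] - t_orth[i] * p_vec[j] for j in range(p)] for i in range(n)]
    return X_filtered, t_orth


# Hypothetical data: bucket 0 carries the class, bucket 1 large class-unrelated variation.
X = [[1.0, 5.0], [1.1, -5.0], [0.0, 4.0], [0.1, -4.0]]
y = [1, 1, 0, 0]
X_filtered, t_orth = osc_one_component(X, y)
print(X_filtered)
```

After filtering, the dominant class-unrelated variation is gone. This is why PC1 rather than PC2 then carries the class separation.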
[1284] Improved separation is possible using PLS-DA (rather than
the unsupervised PCA). A scores plot and the corresponding loadings
plot are shown in FIG. 4-1E-OP and FIG. 4-1F-OP, respectively.
Improved separation is evident, with controls dominating the right
hand side of the plot and osteoporosis dominating the left hand
side.
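PLS-DA can be viewed as PLS regression of a 0/1 class vector on the spectral matrix. The NIPALS-style PLS1 sketch below illustrates that idea. It is not the SIMCA implementation used here. The 1/0 class coding follows the y-predicted plots described in this document, and the toy data are hypothetical.

```python
def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pls1_fit(X, y, n_components=1):
    """Fit a PLS1 model (NIPALS) of a single response y on the rows of X."""
    n, p = len(X), len(X[0])
    x_mean = [sum(col) / n for col in zip(*X)]
    y_mean = sum(y) / n
    E = [[v - m for v, m in zip(row, x_mean)] for row in X]  # X residuals
    f = [v - y_mean for v in y]                              # y residuals
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = _dot(w, w) ** 0.5
        if norm < 1e-12:  # y is already fully explained
            break
        w = [v / norm for v in w]
        t = [_dot(row, w) for row in E]                  # scores
        tt = _dot(t, t)
        p_vec = [sum(t[i] * E[i][j] for i in range(n)) / tt for j in range(p)]
        q = _dot(f, t) / tt                              # y-loading
        E = [[E[i][j] - t[i] * p_vec[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        W.append(w)
        P.append(p_vec)
        Q.append(q)
    return {"x_mean": x_mean, "y_mean": y_mean, "W": W, "P": P, "Q": Q}


def pls1_predict(model, x):
    """Predicted y for one sample, deflating it with the fitted weights and loadings."""
    e = [v - m for v, m in zip(x, model["x_mean"])]
    y_hat = model["y_mean"]
    for w, p_vec, q in zip(model["W"], model["P"], model["Q"]):
        t = _dot(e, w)
        y_hat += q * t
        e = [ei - t * pj for ei, pj in zip(e, p_vec)]
    return y_hat


# Hypothetical two-bucket "spectra": class 1 (e.g. control) vs class 0.
X = [[1.0, 0.0], [1.2, 0.1], [0.0, 1.0], [0.1, 1.1]]
y = [1, 1, 0, 0]
model = pls1_fit(X, y, n_components=1)
print([round(pls1_predict(model, x), 2) for x in X])
```

Applying the 0.5 cut-off to `pls1_predict` output gives the class assignments shown in y-predicted scatter plots.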
[1285] FIG. 4-2A-OP shows sections of the variable importance plots
(VIP) and regression coefficient plots derived from the PLS-DA
model described in FIG. 4-1E-OP.
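A commonly used definition of the variable importance in projection for variable j in a PLS model with p variables is VIP_j = sqrt(p · Σ_a SS_a (w_aj / ‖w_a‖)² / Σ_a SS_a). Here SS_a = q_a² · t_aᵀt_a is the y-variance explained by component a. The sketch below computes it from the weights, y-loadings and scores of a fitted PLS model; the example inputs are hypothetical. Values above 1 are often, heuristically, taken to mark important variables.

```python
import math


def vip_scores(W, Q, T):
    """Variable importance in projection for each X variable.

    W: list of weight vectors (one per component, length = number of variables)
    Q: list of y-loadings (one per component)
    T: list of score vectors (one per component, length = number of samples)
    """
    n_vars = len(W[0])
    # Sum of squares of y explained by each component.
    ss = [q * q * sum(t_i * t_i for t_i in t) for q, t in zip(Q, T)]
    total = sum(ss)
    vip = []
    for j in range(n_vars):
        weighted = sum(s * w[j] ** 2 / sum(x * x for x in w) for w, s in zip(W, ss))
        vip.append(math.sqrt(n_vars * weighted / total))
    return vip


# Hypothetical one-component model with three spectral buckets.
print(vip_scores(W=[[0.6, 0.8, 0.0]], Q=[2.0], T=[[1.0, -1.0, 0.5, -0.5]]))
```

By construction, the squared VIP values sum to the number of variables, so their average is 1. This is the basis of the "VIP > 1" heuristic.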
[1286] FIG. 4-2B-OP shows a section of the regression coefficient
plot derived from the PLS-DA model described in FIG. 4-1E-OP. In
the regression coefficient plot, each bar represents a spectral
region covering 0.04 ppm and shows how the ¹H NMR profile of the
control samples differs from that of the osteoporosis samples. A
positive value on the x-axis indicates a relatively greater
concentration of metabolite (assigned using NMR chemical shift
assignment tables), and a negative value on the x-axis indicates a
relatively lower concentration of metabolite.
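Data reduction into 0.04 ppm regions can be sketched as below. Bucket labels used in this document, such as 0.86, 1.30 and 3.38, are consistent with bucket centres whose edges fall on integer multiples of 0.04 ppm. That convention is an inference from the labels and an assumption of this sketch. Summing the intensities of digitised points is likewise a simplified stand-in for whatever integration the original software performed.

```python
import math
from collections import defaultdict


def bucket_centre(ppm, width=0.04):
    """Centre of the bucket containing ppm, with bucket edges at integer multiples of width."""
    index = math.floor(ppm / width)
    return round(index * width + width / 2, 2)  # 2 decimals suits a 0.04 ppm width


def bucket_spectrum(points, width=0.04):
    """Sum (ppm, intensity) points into buckets keyed by bucket centre (ppm)."""
    buckets = defaultdict(float)
    for ppm, intensity in points:
        buckets[bucket_centre(ppm, width)] += intensity
    return dict(buckets)


# Hypothetical digitised spectrum points (ppm, intensity).
points = [(1.29, 2.0), (1.31, 3.0), (0.85, 1.0), (3.45, 0.5)]
print(bucket_spectrum(points))
```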
[1287] The 10 most important chemical shift windows for the PLS-DA
model are summarised in the following table. The assignments were
made by comparing the loadings with published tables of NMR
data.
TABLE 4-1-OP
Region # | Bucket (ppm) | Assignment | Chem. shift (ppm) and multiplicity | NMR spectral intensity in osteoporosis wrt control
1 | 1.34 | predominantly lipid CH₂CH₂CH₂CO; also lactate CH₃ | 1.32 (m); 1.33 (d) | decreased*; increased*
2 | 1.30 | lipid CH₂ | 1.30 (m) | decreased
3 | 1.26 | lipid (CH₂)ₙ, mainly LDL | 1.25 (m) | decreased
4 | 0.86 | lipid CH₃, mainly LDL, VLDL | 0.84 (t) & 0.87 (t) | decreased
5 | 3.38 | proline half δ-CH₂ | 3.34 (m) | decreased
6 | 2.06 | proline half β-CH₂ | 2.05 (m) | decreased
7 | 2.02 | proline γ-CH₂ | 1.99 (m) | decreased
8 | 4.10 | lactate CH | 4.11 (q) | increased
9 | 3.34 | proline half δ-CH₂ | 3.34 (m) | decreased
10 | 3.22 | choline N(CH₃)₃ | 3.21 (s) | decreased
*Intensity changes of these overlapped peaks were determined by referral to the original ¹H NMR spectra.
[1288] In summary, with respect to control samples, osteoporosis
samples appear to have decreased levels of lipids, proline,
choline, and 3-hydroxybutyrate, and increased levels of lactate,
alanine, creatine, creatinine, glucose, and aromatic amino acids.
Additional data for the buckets associated with these species are
described in the following table. Again, the assignments were made
by comparing the loadings with published tables of NMR data.
TABLE 4-2-OP
Species | Bucket (ppm) | Assignment | Chem. shift (ppm) and multiplicity | NMR spectral intensity in osteoporosis wrt control
lipid | 1.34 | CH₂CH₂CH₂CO | 1.32 (m) | decreased
lipid | 1.30 | CH₂ | 1.30 (m) | decreased
lipid | 1.26 | (CH₂)ₙ, LDL | 1.25 (m) | decreased
lipid | 1.22 | CH₃CH₂CH₂ | 1.22 (m) |
lipid | 0.86 | CH₃, LDL, VLDL | 0.84 (t) & 0.87 (t) | decreased
proline | 3.38 | half δ-CH₂ | 3.34 (m) | decreased
proline | 3.46 | half δ-CH₂ | 3.45 (m) | decreased
proline | 3.42 | half δ-CH₂ | 3.45 (m) | decreased
proline | 2.34 | half β-CH₂ | 2.36 (m) | decreased
proline | 2.06 | half β-CH₂ | 2.05 (m) | decreased
proline | 2.02 | γ-CH₂ | 1.99 (m) | decreased
choline | 3.22 | N(CH₃)₃ | 3.21 (s) | decreased
choline | 3.66 | NCH₂ | 3.66 (m) | decreased
3-hydroxybutyrate | 4.14 | β-CH | 4.13 (m) | decreased
3-hydroxybutyrate | 2.38 | half α-CH₂ | 2.38 (m) | decreased
3-hydroxybutyrate | 2.30 | half α-CH₂ | 2.31 (m) | decreased
3-hydroxybutyrate | 1.14 | γ-CH₃ | 1.20 (d) | decreased
lactate | 4.14 & 4.10 | CH | 4.11 (q) | increased
lactate | 1.34 | CH₃ | 1.33 (d) | increased
alanine | 3.74 | α-CH | 3.76 (q) | increased
alanine | 1.46 | CH₃ | 1.46 (d) | increased
creatine | 3.90 | CH₂ | 3.93 (s) | increased
creatine | 3.02 | CH₃ | 3.04 (s) | increased
creatinine | 4.06 | CH₂ | 4.05 (s) | increased
creatinine | 3.06 | CH₃ | 3.05 (s) | increased
glucose | 3.66-4.42 | various | 3.2-5.5 | increased
aromatic amino acids | 7.00-8.00 | various | 7.00-8.00 | increased
[1289] The intensity changes for the proline resonance at δ 3.42
and δ 3.46, the choline resonance at δ 3.66, the lactate resonance
at δ 1.34 and the β-hydroxybutyrate resonance at δ 4.14, all of
which overlap with other peaks, were confirmed by referral to the
original ¹H NMR spectra.
[1290] Validation
[1291] Validation was performed using a y-predicted scatter plot.
FIG. 4-SOP shows the y-predicted scatter plot, and hence the
ability of ¹H NMR-based metabonomics to predict class membership
(control or osteoporosis) of unknown samples. Using about 85% of the
control and osteoporosis samples, a PLS-DA model was constructed and
used to predict the presence of disease in the remaining 15% of
samples (the validation set). The y-predicted scatter plot assigns
samples to either class 1 (in this case corresponding to control)
or class 0 (in this case corresponding to osteoporosis); 0.5 is the
cut-off. The PLS-DA model predicted the presence or absence of
osteoporosis in 100% of cases; furthermore, for a four-component
model, class can be predicted with a significance level ≥88%, using
a 99% confidence limit.
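The text states only that about 85% of the samples were used for training and about 15% for validation. It does not say how they were chosen. A stratified random hold-out keeps the control/osteoporosis ratio similar in both sets; the sketch below assumes that approach, and the function name is hypothetical. With the group sizes given above (40 controls, 29 osteoporosis), it holds out 10 samples (about 14.5%).

```python
import random


def stratified_holdout(labels, train_fraction=0.85, seed=0):
    """Return sorted (train_indices, test_indices), splitting each class separately."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    train, test = [], []
    for label in sorted(by_class):
        idx = by_class[label][:]
        rng.shuffle(idx)
        k = round(train_fraction * len(idx))  # training samples taken from this class
        train.extend(idx[:k])
        test.extend(idx[k:])
    return sorted(train), sorted(test)


labels = ["control"] * 40 + ["osteoporosis"] * 29
train_idx, test_idx = stratified_holdout(labels)
print(len(train_idx), len(test_idx))  # 59 10
```

A model is fitted on `train_idx` only, and the 0.5 cut-off is applied to its predictions for `test_idx`.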
[1292] Proline as Diagnostic Species/Biomarker
[1293] Following this analysis, the buckets designated 3.38, 2.06,
2.02 and 3.34 were identified as having lower intensity in
osteoporosis patient plasma as compared to control samples.
[1294] Re-examination of the original NMR spectra, rather than the
data-reduced, segmented files derived from them that are used for
the statistical analysis, enables visual inspection of the NMR
peaks in those specific regions. Identification of the peak
multiplicities in these regions leads a trained NMR spectroscopist
to suggest free proline as the molecule responsible for the peaks.
Evidence that these peaks are spin-coupled to each other, and hence
are part of the same molecule, comes from interpretation of the
cross-peaks seen in a two-dimensional COSY spectrum. The NMR peaks
seen in the conventional one-dimensional NMR spectrum are then
compared visually with those of authentic proline dissolved in
water at a comparable pH value. See, for example, Ellenberger et
al., 1975; Lindon et al., 1999.
[1295] The regions 3.38 and 3.34 are both seen to include part of a
multiplet at δ 3.34 assignable to one of the protons of the δ-CH₂
pair of hydrogen atoms. The region designated 2.06 shows a resonance
at δ 2.05 identifiable as one of the protons from the β-CH₂ group.
Similarly, the region designated 2.02 contains a resonance at δ 1.99
identified as one or both of the γ-CH₂ protons of proline (the
chemical shift difference between the two γ protons is small). The
peak multiplicity of each of these peaks is consistent with an
authentic sample of proline measured under comparable conditions.
[1296] There are four other proton resonances for proline which
should also show a change in level with osteoporosis if proline is a
biomarker. These are the other β-, γ-, and δ-CH₂ protons at δ 2.34,
~δ 2.0, and δ 3.45, respectively, and the α-CH proton at δ 4.14.
Indeed, examination of the spectra shows that the intensities of the
signals for the other β-CH₂ and δ-CH₂ protons also correlate with
the diagnosis. It is not possible to distinguish the other γ-CH₂
proton because its shift is close to that of the first γ-CH₂ proton
and may already have been included above. Nor is it possible to
observe the chemical shift of the α-CH proton because of spectral
overlap.
[1297] Finally, confirmation that proline is the substance
responsible for the diagnostic NMR peaks is obtained by adding a
sample of authentic proline to a plasma sample and noting complete
coincidence of all of the endogenous signals assigned to proline
with those of the added proline.
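The coincidence criterion used in the spiking experiment can be restated numerically, as in the sketch below. In practice the spectra are compared visually. The 0.005 ppm tolerance is a hypothetical choice. The endogenous shifts are those given in the text, while the "standard" positions are made up for illustration.

```python
def peaks_coincide(endogenous_ppm, standard_ppm, tol=0.005):
    """True if every endogenous peak lies within tol ppm of an added-standard peak, and vice versa."""
    def all_matched(source, target):
        return all(any(abs(s - t) <= tol for t in target) for s in source)

    return all_matched(endogenous_ppm, standard_ppm) and all_matched(standard_ppm, endogenous_ppm)


# Endogenous shifts from the text; the "standard" positions are hypothetical.
endogenous = [3.34, 3.45, 2.05, 2.34, 1.99]
standard = [3.341, 3.449, 2.052, 2.338, 1.990]
print(peaks_coincide(endogenous, standard))
```

Checking in both directions reflects the requirement that all endogenous signals coincide with the added proline and that no added-proline peak appears elsewhere.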
[1298] The ¹H NMR chemical shifts for all amino acids including
proline are dependent on the solution pH because of the presence of
ionisable groups. In the case of proline, these are the carboxylic
acid group (-COOH) and the secondary amine group (-NH-). Hence it is
important to compare the NMR spectra of plasma with that of an
authentic sample of proline at the same pH. This has been done as
described above.
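The pH sensitivity can be illustrated with the standard fast-exchange model for a single ionisable group. The observed shift is the population-weighted average of the protonated and deprotonated shifts, with populations given by the Henderson-Hasselbalch relation. The pKa and limiting shifts below are illustrative values only, not measured proline parameters. Real amino acids have more than one ionisable group.

```python
def fraction_protonated(ph, pka):
    """Fraction of a single ionisable group in its protonated form (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10 ** (ph - pka))


def observed_shift(ph, pka, shift_protonated, shift_deprotonated):
    """Population-weighted average shift, assuming fast exchange between the two forms."""
    f = fraction_protonated(ph, pka)
    return f * shift_protonated + (1.0 - f) * shift_deprotonated


# Illustrative values only: a group with pKa 4.0 whose neighbouring proton
# resonates at 3.40 ppm when protonated and 3.05 ppm when deprotonated.
for ph in (3.0, 4.0, 5.0, 7.0, 7.4):
    print(ph, round(observed_shift(ph, 4.0, 3.40, 3.05), 3))
```

Near the pKa, a small pH difference moves the peak appreciably. Far from the pKa, the shift is almost constant. This is why the comparison with authentic proline must be made at matched pH.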
[1299] In addition, amino acids can react with bicarbonate ion
(HCO₃⁻) in a biological sample to form carbamate adducts between the
amino acid amino group and the bicarbonate ion. The resulting adduct
has NMR chemical shifts different from those of the parent amino
acid. This problem has not been seen with proline specifically.
However, this problem of changed chemical shifts can be overcome by
adding authentic proline to the appropriate plasma sample and noting
exact coincidence of all of the added proline proton peaks with
those of the endogenous biomarker peaks.
Example 7
Osteoarthritis
[1300] As discussed above, the inventors have developed novel
methods (which employ multivariate statistical analysis and pattern
recognition (PR) techniques, and optionally data filtering
techniques) of analysing data (e.g., NMR spectra) from a test
population which yield accurate mathematical models which may
subsequently be used to classify a test sample or subject, and/or
in diagnosis.
[1301] These techniques have been applied to the analysis of blood
serum in the context of osteoarthritis. The metabonomic analysis
can distinguish between individuals with and without
osteoarthritis. Novel diagnostic biomarkers for osteoarthritis have
been identified, and methods for associated diagnosis have been
developed.
[1302] Obtaining NMR Spectra
[1303] Analysis was performed on serum samples collected from
subjects under study. Serum was taken from control subjects (n=40)
and from patients with osteoarthritis (n=29) prior to a formal
diagnosis of bone disease.
[1304] The data were classified as "control" (▲) or
"osteoarthritis" (●) on the basis of x-ray examination
of the knee and wrist joints. The presence of any bony outgrowth
into the cartilage in any of these joints defined the subject as
having osteoarthritis. In marked contrast to osteoporosis, bone
mineral density is not used in the clinical diagnosis of
osteoarthritis, since patients with osteoarthritis can have any
degree of average bone mineralisation (including pathologically low
bone mineral density) although on average patients with
osteoarthritis have slightly higher bone mineral density than
control subjects.
[1305] Blood was drawn from each patient, allowed to clot in
plastic tubes for 2 hours at room temperature, and the serum was
collected by centrifugation. Aliquots of serum were stored at
-80 °C until assayed.
[1306] Prior to NMR analysis, samples (150 µl) were diluted with
solvent solution (350 µl; 10% D₂O v/v, 0.9% NaCl w/v). The diluted
samples were then placed in high-quality 5 mm NMR tubes (Goss
Scientific Instruments Ltd).
[1307] Conventional 1-D ¹H NMR spectra of the blood serum
samples were measured on a Bruker DRX-600 spectrometer using the
conditions set forth in the section entitled "NMR Experimental
Parameters."
[1308] Data Analysis
[1309] A Principal Components Analysis (PCA) model was calculated
from the 1D ¹H NMR spectra of serum samples from control subjects
(▲) and patients with osteoarthritis (●). The corresponding scores
and loadings plots are shown in FIG. 5-1A-OA and FIG. 5-1B-OA,
respectively. Those regions of the NMR spectrum which are
responsible for causing separation between the different samples
are also indicated in FIG. 5-1B-OA. Little or no separation was
evident.
[1310] A Principal Components Analysis (PCA) model was calculated
from the 1D ¹H NMR spectra of serum samples from control subjects
(▲) and patients with osteoarthritis (●), but in this case the data
were filtered by orthogonal signal correction (OSC) prior to PCA.
OSC removes variation that is not correlated with class and
therefore improves subsequent data analysis. The corresponding
scores and loadings plots are shown in FIG. 5-1C-OA and FIG.
5-1D-OA, respectively. PC2 was plotted against PC3, and the improved
separation between the control and osteoarthritis samples is
evident, with controls dominating the upper right of the plot and
osteoarthritis samples dominating the lower left.
[1311] Improved separation is possible using PLS-DA (rather than
the unsupervised PCA). The corresponding scores and loadings plots
are shown in FIG. 5-1E-OA and FIG. 5-1F-OA, respectively. Improved
separation is evident, with controls dominating the right hand side
of the plot and osteoarthritis samples dominating the left hand
side.
[1312] FIG. 5-2A-OA shows sections of the variable importance plots
(VIP) and regression coefficient plots derived from the PLS-DA
model described in FIG. 5-1E-OA.
[1313] FIG. 5-2B-OA shows a section of the regression coefficient
plot derived from the PLS-DA model described in FIG. 5-1E-OA. In the
regression coefficient plot, each bar represents a spectral region
covering 0.04 ppm and shows how the ¹H NMR profile of the control
samples differs from that of the osteoarthritis samples. A positive
value on the x-axis indicates a relatively greater concentration of
metabolite (assigned using NMR chemical shift assignment tables),
and a negative value on the x-axis indicates a relatively lower
concentration of metabolite.
[1314] The 7 most important chemical shift windows for the PLS-DA
model which contain NMR peaks from identified metabolites are
summarised in the following table. The assignments were made by
comparing the loadings with published tables of NMR data.
TABLE 5-1-OA
Region # | Bucket (ppm) | Assignment | Chem. shift (ppm) and multiplicity | NMR spectral intensity in osteoarthritis wrt control
1 | 1.30 | lipid CH₂ | 1.30 (m) | increased
2 | 1.26 | lipid (CH₂)ₙ, LDL | 1.25 (m) | increased
3 | 1.34 | predominantly lipid CH₂CH₂CH₂CO; also lactate CH₃ | 1.32 (m); 1.33 (d) | increased*; decreased*
4 | 3.22 | choline N(CH₃)₃ | 3.21 (s) | decreased
5 | 0.86 | lipid CH₃, LDL, VLDL | 0.84 (t) & 0.87 (t) | increased
6 | 1.22 | lipid CH₃CH₂CH₂ | 1.22 (m) | increased
7 | 3.38 | proline half δ-CH₂ | 3.34 (m) | decreased
*Intensity changes of these overlapped peaks were determined by referral to the original ¹H NMR spectra.
[1315] In summary, with respect to control samples, osteoarthritis
samples appear to have decreased levels of proline, choline, and
3-hydroxybutyrate; and increased levels of lipids, creatine, and
creatinine. Additional data for the buckets associated with these
species are described in the following table. Again, the
assignments were made by comparing the loadings with published
tables of NMR data.
TABLE 5-2-OA
Species | Bucket (ppm) | Assignment | Chem. shift (ppm) and multiplicity | NMR spectral intensity in osteoarthritis wrt control
proline | 3.38 | half δ-CH₂ | 3.34 (m) | decreased
proline | 3.46 | half δ-CH₂ | 3.45 (m) | decreased
proline | 3.42 | half δ-CH₂ | 3.45 (m) | decreased
proline | 2.34 | half β-CH₂ | 2.36 (m) | decreased
proline | 2.06 | half β-CH₂ | 2.05 (m) | decreased
proline | 2.02 | γ-CH₂ | 1.99 (m) | decreased
choline | 3.22 | N(CH₃)₃ | 3.21 (s) | decreased
choline | 3.66 | NCH₂ | 3.66 (m) | decreased
3-hydroxybutyrate | 4.14 | β-CH | 4.13 (m) | decreased
3-hydroxybutyrate | 2.38 | half α-CH₂ | 2.38 (m) | decreased
3-hydroxybutyrate | 2.30 | half α-CH₂ | 2.31 (m) | decreased
3-hydroxybutyrate | 1.14 | γ-CH₃ | 1.20 (d) | decreased
lipid | 1.34 | CH₂CH₂CH₂CO | 1.32 (m) | increased
lipid | 1.30 | CH₂ | 1.30 (m) | increased
lipid | 1.26 | (CH₂)ₙ, LDL | 1.25 (m) | increased
lipid | 1.22 | CH₃CH₂CH₂ | 1.22 (m) | increased
lipid | 0.86 | CH₃, LDL, VLDL | 0.84 (t) & 0.87 (t) | increased
creatine | 3.90 | CH₂ | 3.93 (s) | increased
creatine | 3.02 | CH₃ | 3.04 (s) | increased
creatinine | 4.06 | CH₂ | 4.05 (s) | increased
creatinine | 3.06 | CH₃ | 3.05 (s) | increased
[1316] The intensity changes for the proline resonance, the choline
resonance at δ 3.66, the lactate resonance at δ 1.34 and the
β-hydroxybutyrate resonance at δ 4.14, all of which overlap with
other peaks, were confirmed by referral to the original ¹H NMR
spectra.
[1317] Validation
[1318] Validation was performed using a y-predicted scatter plot.
FIG. 5-3-OA shows the y-predicted scatter plot, and hence the
ability of ¹H NMR-based metabonomics to predict class membership
(control or osteoarthritis) of unknown samples. Using about 85% of
the control and osteoarthritis samples, a PLS-DA model was
constructed and used to predict the presence of disease in the
remaining 15% of samples (the validation set). The y-predicted
scatter plot assigns samples to either class 1 (in this case
corresponding to control) or class 0 (in this case corresponding to
osteoarthritis); 0.5 is the cut-off. The PLS-DA model predicted the
presence or absence of osteoarthritis in 90% of cases; furthermore,
for a two-component model, class can be predicted with a
significance level ≥70%, using a 99% confidence limit.
[1319] The foregoing has described the principles, preferred
embodiments, and modes of operation of the present invention.
However, the invention should not be construed as limited to the
particular embodiments discussed. Instead, the above-described
embodiments should be regarded as illustrative rather than
restrictive, and it should be appreciated that variations may be
made in those embodiments by workers skilled in the art without
departing from the scope of the present invention as defined by the
appended claims.
References
[1320] A number of patents and publications are cited herein in
order to more fully describe and disclose the invention and the
state of the art to which the invention pertains. Full citations
for these references are provided herein. Each of these references
is incorporated herein by reference in its entirety into the
present disclosure, to the same extent as if each individual
reference was specifically and individually indicated to be
incorporated by reference.
[1321] Ala-Korpela, M., 1995, "H-1 NMR spectroscopy of human blood
plasma," Progress in Nuclear Magnetic Resonance Spectroscopy, Vol.
27, pp. 475-554.
[1322] Ala-Korpela, M., Hiltunen, Y. and Bell, J. D., 1995,
"Quantification of biomedical NMR data using artificial neural
network analysis: Lipoprotein lipid profiles from H-1 NMR data of
human plasma," NMR Biomed., Vol. 8, pp. 235-244.
[1323] Andersson, C. A., 1999, "Direct orthogonalization,"
Chemometrics and Intelligent Laboratory Systems, Vol. 47, pp.
51-63.
[1324] Anker, L. S., and Jurs, P. C., 1992, "Prediction of C-13
nuclear magnetic resonance chemical shifts by artificial neural
networks," Anal. Chem., Vol. 64, pp. 1157-1164.
[1325] Anthony, M. L. et al., 1994, "Pattern recognition
classification of the site of nephrotoxicity based on metabolic
data derived from proton nuclear magnetic resonance spectra of
urine," Mol. Pharmacol., Vol. 46, pp. 199-211.
[1326] Anthony, M. L. et al., 1995, "Classification of
toxin-induced changes in ¹H NMR spectra of urine using an
artificial neural network," J. Pharm. Biomed. Anal., Vol. 13, pp.
205-211.
[1327] Beckwith-Hall, B. M. et al., 1998, "Nuclear magnetic resonance
spectroscopic and principal components analysis investigations into
biochemical effects of three model hepatotoxins," Chem. Res. Tox.,
Vol. 11, pp. 260-272.
[1328] Berman, J. W., Guida, M. P., Warren, J., Amat, J., and
Brosnan, C. F., 1996, "Localization of monocyte chemoattractant
peptide-1 expression in the central nervous system in experimental
autoimmune encephalomyelitis and trauma in the rat," Journal of
Immunology, Vol. 156, pp. 3017-3023.
[1329] Berman, J. L., Wynne, J., and Cohn, P. F., 1978, "A
multivariate approach for interpreting treadmill exercise tests in
coronary artery disease," Circulation, Vol. 58, pp. 505-512.
[1330] Bishop, C., 1995, Neural Networks for Pattern Recognition,
Oxford University Press, Oxford, England, pp. 164-193.
[1331] Breslow, J. L., 1993, "Transgenic mouse models of
lipoprotein metabolism and atherosclerosis," Proc. Natl. Acad. Sci.
USA, Vol. 90, pp. 8314-8318.
[1332] Bretthorst, G. L., 1990a, "Bayesian Analysis. 2.
Signal-Detection and Model Selection," J. Magn. Reson., Vol. 88,
pp. 552-570.
[1333] Bretthorst, G. L., 1990b, "Bayesian Analysis. 3. Applications
to NMR Signal-Detection, Model Selection, and
Parameter-Estimation," J. Magn. Reson., Vol. 88, pp. 571-595.
[1334] Bretthorst, G. L., Hung, C. C., Davignon, D. A., et al.,
1988, "Bayesian-Analysis of Time-Domain Magnetic Resonance
Signals," J. Magn. Reson., Vol. 79, pp. 369-376.
[1335] Bro, R., 1997, "PARAFAC. Tutorial and applications,"
Chemometrics and Intelligent Laboratory Systems, Vol. 38, pp.
149-171.
[1336] Broomhead, D. S., and Lowe, D., 1988, "Multi-variable
functional interpolation and adaptive networks," Complex Systems,
Vol. 2, pp. 321-355.
[1337] Brown, T. R. and Stoyanova, R., 1996, "NMR spectral
quantitation by principal-component analysis. 2. Determination of
frequency and phase shifts," J. Magn. Reson., Series B, Vol. 112,
pp. 32-43.
[1338] Bruce, R. A., 1974, "The value of the Balke protocol," Am.
Heart J., Vol. 88, pp. 533-534.
[1339] Claridge, T. D. W., High-Resolution NMR Techniques in
Organic Chemistry: A Practical Guide to Modern NMR for Chemists,
Oxford University Press, 2000.
[1340] Collins, F. S. and McKusick, V. A., 2001, "Implications of
the Human Genome Project for medical science," JAMA, Vol. 285, pp.
540-544.
[1341] Confort-Gouny, S., Vion-Dury, J., Nicoli, F., Dano, P.,
Gastaut, J.-L., and Cozzone, P. J., 1992, "Metabolic
characterization of neurological diseases by proton localized
NMR spectroscopy of the human brain," Comptes Rendus de l'Académie
des Sciences Serie III--Sciences de la Vie-Life Sciences, Vol. 315,
pp. 287-293.
[1342] Cullen, P., Funke, H., Schulte, H. and Assmann, G., 1998,
"Lipoproteins and cardiovascular risk--from genetics to CHD
prevention," European Heart Journal, Vol. 19, pp. C5-C11, Suppl.
C.
[1343] Despres, J., Lemieux, I., Dagenais, G., Cantin, B. and
Lamarche, B., 2000, "HDL-cholesterol as a marker of coronary heart
disease risk: the Québec cardiovascular study," Atherosclerosis,
Vol. 153, pp. 263-272.
[1344] Dolecek, T. A., Milas, N. C., Van Horn, L. V., Farrand, M.
E., Gorder, D. D., Duchene, A. G., Dyer, J. R., Stone, P. A. and
Randall, B. L., 1986, "A long-term nutrition intervention
experience--lipid responses and dietary adherence patterns in the
multiple risk factor intervention trial," J. Am. Diet Assoc., Vol.
86, pp. 752-758.
[1345] Dutt, M. J. and Lee, K. H., 2000, "Proteomic analysis,"
Curr. Opin. Biotechnol., Vol. 11, pp. 176-179.
[1346] Dvorak, A. M., Schroeder, J. T., MacGlashan, D. W., Bryan,
K. P., Morgan, E. S., Lichtenstein, L. M., and MacDonald, S. M.,
1996, "Comparative ultrastructural morphology of human basophils
stimulated to release histamine by anti-IgE, recombinant
IgE-dependent histamine-releasing factor, or monocyte chemotactic
protein-1," Journal of Allergy and Clinical Immunology, Vol. 98,
pp. 355-370.
[1347] Eriksson, L., Johansson, E., Kettaneh-Wold, H., and Wold,
S., 1999, Introduction to Multi- and Megavariate Analysis using
Projection Methods (PCA & PLS), UMETRICS Inc. (Box 7960,
SE90719 Umea, SWEDEN), pp. 267-296.
[1348] Fan, T. W.-M., 1996, "Metabolite profiling by one- and
two-dimensional NMR analysis of complex mixtures," Prog. NMR
Spectrosc., Vol. 28, pp. 161-219.
[1349] Farrant, R. D., et al., 1992, "An automatic data reduction
and transfer method to aid pattern-recognition analysis and
classification of NMR spectra," J. Pharm. Biomed. Anal., Vol. 10,
pp. 141-144.
[1350] Fearn, T., 2000, "On orthogonal signal correction,"
Chemometrics and Intelligent Laboratory Systems, Vol. 50, pp.
47-52.
[1351] Frank, I. E., et al., 1984, "Prediction of product quality
from spectral data using the partial least-squares method," J.
Chem. Info. Comp., Vol. 24, pp. 20-24.
[1352] Garrod, S., Humpfer, E., Connor, S. C., Connelly, J. C.,
Spraul, M., Nicholson, J. K., and Holmes, E., 2001,
"High-resolution H-1 NMR and magic angle spinning NMR spectroscopic
investigation of the biochemical effects of 2-bromoethanamine in
intact renal and hepatic tissue," Magn. Reson. Med., Vol. 45, pp.
781-790.
[1353] Gartland, K. P. R. et al., 1990a, "A pattern recognition
approach to the comparison of ¹H NMR and clinical chemical
data for classification of nephrotoxicity," J. Pharm. Biomed.
Anal., Vol. 8, pp. 963-968.
[1354] Gartland, K. P. R. et al., 1990b, "Pattern recognition
analysis of high resolution ¹H NMR spectra of urine. A
nonlinear mapping approach to the classification of toxicological
data," NMR in Biomed., Vol. 3, pp. 166-172.
[1355] Gartland, K. P. R. et al., 1991, "The application of pattern
recognition methods to the analysis and classification of
toxicological data derived from proton NMR spectroscopy of urine,"
Mol. Pharmacol., Vol. 39, pp. 629-642.
[1356] Geisow, M. J., 1998, "Proteomics: One small step for a
digital computer, one giant leap for humankind," Nature
Biotechnology, Vol. 16, p. 206.
[1357] Ghirnikar R. S., Lee Y. L., He T. R., Eng L. F., 1996,
"Chemokine expression in rat stab wound brain injury," Journal of
Neuroscience Research, Vol. 46, pp. 727-733.
[1358] Gong J.-H., Ratkay L. G., Waterfield J. D., and Clark-Lewis
I., 1997, "An antagonist of monocyte chemoattractant protein 1
(MCP-1) inhibits arthritis in the MRL-lpr mouse model," Journal of
Experimental Medicine, Vol. 186, pp. 131-137.
[1359] Guyton, A. C., 1991, "Chapter 12: Electrocardiographic
interpretation of cardiac muscle and coronary abnormalities," In: A
Textbook of Medical Physiology, Eighth Edition (WB Saunders,
London), pp. 124-137.
[1360] Gygi, S. P., Rochon, Y., Franza, B. R., and Aebersold, R., 1999,
"Correlation between protein and mRNA abundance in yeast,"
Molecular and Cellular Biology, Vol. 19, pp. 1720-1730.
[1361] Hare, B. J., and Prestegard, J. H., 1994, "Application of
neural networks to automated assignment of NMR spectra of
proteins," J. Biomol. NMR, Vol. 4, pp. 35-46.
[1362] Hiltunen, Y., Heiniemi, E. and Ala-Korpela, M., 1995,
"Lipoprotein lipid quantification by neural-network analysis of H-1
NMR data from human blood-plasma," J. Mag. Res. Ser. B, Vol. 106,
pp. 191-194.
[1363] Holmes, E. et al., 1998a, "Development of a model for
classification of toxin-induced lesions using ¹H NMR
spectroscopy of urine combined with pattern recognition," NMR in
Biomed., Vol. 11, pp. 235-244.
[1364] Holmes, E. et al., 1998b, "The identification of novel
biomarkers of renal toxicity using automatic data reduction
techniques and PCA of proton NMR spectra of urine," Chemometrics and
Intelligent Laboratory Systems, Vol. 44, pp. 245-255.
[1365] Holmes, E., et al., 1992, "NMR spectroscopy and pattern
recognition analysis of the biochemical processes associated with
the progression and recovery from nephrotoxic lesions in the rat
induced by mercury(II) chloride and 2-bromoethanamine," Mol.
Pharmacol., Vol. 42, pp. 922-930.
[1366] Holmes, E., et al., 1994, "Automatic data reduction and
pattern recognition methods for analysis of ¹H NMR spectra of
human urine from normal and pathological states," Anal. Biochem.,
Vol. 220, pp. 284-296.
[1367] Howells, S. L., Maxwell, R. J., Howe, F. A., Peet, A. C.,
Stubbs, M., Rodrigues, L. M., Robinson, S. P., Baluch, S., and
Griffiths, J. R., 1993, "Pattern-recognition of P-31
magnetic-resonance spectroscopy tumor spectra obtained in-vivo,"
NMR Biomed., Vol. 6, pp. 237-241.
[1368] Iida K., Kadota J., Kawakami K., Matsubara Y., Shirai R., and
Kohno S., 1997, "Analysis of T cell subsets and beta chemokines in
patients with pulmonary sarcoidosis," Thorax, Vol. 52, pp.
431-437.
[1369] Isles, C. G. and Paterson, J. R., 2000, "Identifying
patients at risk for coronary heart disease: implications from
trials of lipid-lowering drug therapy," Q. J. Med., Monthly Journal
of the Association of Physicians, Vol. 93, pp. 567-574.
[1370] Joreskog, K. G., and Wold, H., 1982, Systems under Indirect
Observation, North Holland, Amsterdam.
[1371] Kannel, W. B. and Gordon, T. (eds.), February 1974, The
Framingham Study. An epidemiological investigation of
cardiovascular disease, DHEW pub. no. (NIH) 74-599, Public Health
Service, Washington, DC (U.S. Government Printing Office).
[1372] Kjelsberg, M. O., Cutler, J. A. and Dolecek, T. A., 1997,
"Brief description of the Multiple Risk Factor Intervention Trial,"
Amer. J. Clinical Nutrition, Vol. 65 (supplement), pp.
S191-S195.
[1373] Klenk, H. P., et al., 1997, "The complete genome sequence of
the hyperthermophilic, sulphate-reducing archaeon Archaeoglobus
fulgidus," Nature, Vol. 390, pp. 364-370.
[1374] Fiehn, O., Kopka, J., Dormann, P., Altmann, T., Trethewey, R. N.,
and Willmitzer, L., 2000, "Metabolite profiling for plant functional
genomics," Nature Biotechnology, Vol. 18, pp. 1157-1161.
[1375] Kowalski, B. R., Sharaf, M. and Illman, D., Chemometrics
(John Wiley & Sons, Chichester, 1986).
[1376] Kuesel, A. C., Stoyanova, R., Aiken, N. R., Li, C.-W.,
Szwergold, B. S., Shaller, C. and Brown, T. R., 1996, "Quantitation
of resonances in biological P-31 NMR spectra via principal
component analysis: Potential and limitations," NMR Biomed., Vol.
9, pp. 93-104.
[1377] Kuller, L. H., Ockene, J. K., Meilahn, E., Wentworth, D. N.,
Svendsen, K. H. and Neaton, J. D., 1991, "Cigarette-smoking and
mortality," Preventive Medicine, Vol. 20, pp. 638-654.
[1378] Kvalheim, O. M., Karstang, T. V., 1989, "Interpretation of
latent-variable regression models," Chemometrics and Intelligent
Laboratory Systems, Vol. 7, pp. 39-51.
[1379] Lindon, J. C., et al., 1980, "Digitisation and Data
Processing in Fourier Transform NMR," Progress in NMR Spectroscopy,
Vol. 14, pp. 27-66.
[1380] Lindon, J. C., et al., 1999, "NMR spectroscopy of
biofluids," in Annual Reports on NMR Spectroscopy (Webb, G. A.,
ed), Academic Press (London), Vol. 38, pp. 1-88.
[1381] Lindon, J. C., Holmes, E., and Nicholson, J. K., 2001, "Pattern
recognition methods and applications in biomedical magnetic
resonance," Progress in NMR Spectroscopy, Vol. 39, pp. 1-40.
[1382] Martin, G. J., 1998, "Recent advances in site-specific
natural isotope fractionation studied by nuclear magnetic
resonance," Isotopes in Environmental and Health Studies, Vol. 34,
pp. 233-243.
[1383] Martin, M. L. and Martin, G. J., 1999, "Site-specific
isotope effects and origin inference," Analusis, Vol. 27, pp.
209-213.
[1384] Martin T. R., Galli S. J., Katona I. M. and Drazen J. M.,
1989, "Role of mast-cells in anaphylaxis--evidence for the
importance of mast-cells in the cardiopulmonary alterations and
death induced by anti-IgE in mice," Journal of Clinical
Investigation, Vol. 83, pp. 1375-1383.
[1385] Mazzucchelli L., Hauser C., Zgraggen K., Wagner H. E., Hess
M. W., Laissue J. A. and Mueller C., 1996, "Differential in situ
expression of the genes encoding the chemokines MCP-1 and RANTES in
human inflammatory bowel disease," Journal of Pathology, Vol. 178,
pp. 201-206.
[1386] McIlvain, H. E., McKinney, M. E., Thompson, A. V. and Todd,
G. L., 1992, "Application of the MRFIT smoking cessation program to
a healthy, mixed-sex sample," Am. J. Prev. Med., Vol. 8, pp.
165-170.
[1387] Moka, D., et al., 1998, "Biochemical classification of
kidney carcinoma biopsy samples using magic angle spinning NMR
spectroscopy," J. Pharm. Biomed. Anal., Vol. 17, pp. 125-132.
[1388] Morvan, D., Jehenson, P., Duboc, D., and Syrota, A., 1990,
"Discriminant factor-analysis of P-31 NMR spectroscopic data in
myopathies," Magn. Reson. Med., Vol. 13, pp. 216-227.
[1389] Multiple Risk Factor Intervention Trial (MRFIT) Research
Group, 1986, "Relationship between baseline risk factors and
coronary heart disease and total mortality in the Multiple Risk
Factor Intervention Trial," Prev. Med., Vol. 15, pp. 254-273.
[1390] Nicholson, J. K. et al., 1989, "High resolution proton
magnetic resonance spectroscopy of biological fluids," Prog. NMR
Spectrosc., Vol. 21, pp. 449-501.
[1391] Nicholson, J. K. et al., 1995, "750 MHz ¹H and
¹H-¹³C NMR spectroscopy of human blood plasma,"
Analytical Chemistry, Vol. 67, pp. 793-811.
[1392] Nicholson, J. K., et al., 1999, "Metabonomics--understanding
the metabolic responses of living systems to pathophysiological
stimuli via multivariate statistical analysis of biological NMR
spectroscopic data," Xenobiotica, Vol. 29, pp. 1181-1189.
[1393] Nilsson, N. J., 1965, Learning Machines, McGraw-Hill, New
York.
[1394] Ogata H., Takeya M., Yoshimura T., Takagi K. and Takahashi
K., 1997, "The role of monocyte chemoattractant protein-1 (MCP-1) in
the pathogenesis of collagen-induced arthritis in rats," Journal of
Pathology, Vol. 182, pp. 106-114.
[1395] Parzen, E., 1962, "On estimation of a probability density
function and mode," Ann. Math. Stat., Vol. 33, pp.
1065-1076.
[1396] Patterson, D., 1996, Artificial Neural Networks, Prentice
Hall, Singapore.
[1397] Plump, A. S., Smith, J. D., Hayek, T., Aalto-Setala, K.,
Walsh, A., Verstuyft, J. G., Rubin, E. M. and Breslow, J. L., 1992,
"Severe hypercholesterolemia and atherosclerosis in apolipoprotein
E-deficient mice created by homologous recombination in ES cells,"
Cell, Vol. 71, pp. 343-353.
[1398] Press, William H., Teukolsky, Saul A., Vetterling, William
T., Flannery, Brian P., January 1993, Numerical Recipes in C: The
Art of Scientific Computing, 2nd edition, Cambridge University
Press.
[1399] Quinlan, J. R., 1986, "Induction of decision trees," Machine
Learning, Vol. 1, pp. 81-106.
[1400] Ross, R., 1999, "Mechanisms of disease--Atherosclerosis--An
inflammatory disease," The New England Journal of Medicine, Vol.
340, pp. 115-126.
[1401] Sach M., Bauermeister K., Burger J., Loetscher P., Elsner J.,
Schollmeyer P., and Dobos G., 1997, "Inverse MCP-1/IL-8 ratio in
effluents of CAPD patients with peritonitis and in isolated
cultured human peritoneal macrophages," Nephrology Dialysis
Transplantation, Vol. 12, pp. 315-320.
[1402] Sjostrom, M., Wold, S., and Soderstrom, B., 1986, "PLS
Discriminant Plots," Proceedings of PARC in Practice, Amsterdam,
Jun. 19-21, 1985, Elsevier Science Publishers B.V., North
Holland.
[1403] Somorjai, R. L., Nikulin, A. E., Pizzi, N., Jackson, D.,
Scarth, G., Dolenko, B., Gordon, H., Russell, P., Lean, C. L.,
Delbridge, L., Mountford, C. E., and Smith, I. C. P., 1995,
"Computerized consensus diagnosis--a classification strategy for
the robust analysis of MR spectra. 1. Application to H-1 spectra of
thyroid neoplasms," Magn. Reson. Med., Vol. 33, pp. 257-263.
[1404] Specht, D. F., 1990, "Probabilistic Neural Networks," Neural
Networks, Vol. 3, pp. 109-118.
[1405] Spraul, M. et al., 1994, "Automatic reduction of NMR
spectroscopic data for statistical and pattern recognition
classification of samples," J. Pharm. Biomed. Anal., Vol. 12, pp.
1215-1225.
[1406] Stahle, L., and Wold, S., 1987, "Partial Least Squares
Analysis with Cross-Validation for the Two-Class Problem: A Monte
Carlo Study," Journal of Chemometrics, Vol. 1, pp. 185-196.
[1407] Stoyanova, R., Kuesel, A. C., and Brown, T. R., 1995,
"Application of principal-component analysis for NMR spectral
quantitation," J. Magn. Reson., Series A, Vol. 115, pp.
265-269.
[1408] Sugiyama Y., Kasahara T., Mukaida N., Matsushima K. and
Kitamura S., 1995, "Chemokines in bronchoalveolar lavage fluid in
summer-type hypersensitivity pneumonitis," European Respiratory
Journal, Vol. 8, pp. 1084-1090.
[1409] Sun, J., 1997, "Statistical analysis of NIR data: data
pretreatment," Journal of Chemometrics, Vol. 11, pp. 525-532.
[1410] Sze, D. Y., et al., 1994, "High-resolution proton NMR
studies of lymphocyte extracts," Immunomethods, Vol. 4, pp.
113-126.
[1411] Tomlins, A. M. et al., 1998, "High resolution magic angle
spinning ¹H NMR analysis of intact prostatic hyperplastic and
tumour tissues," Anal. Comm., Vol. 35, pp. 113-115.
[1412] Tranter, G. E., et al., 1999, "Metabonomic prediction of
drug toxicity via probabilistic neural network analysis of NMR
biofluid data," Abstr. 9th North American ISSX Meeting, Oct.
24-28, 1999, p. 246.
[1413] Volejnikova S., Laskari M., Marks S. C., Jr., and Graves D.
T., 1997, "Monocyte recruitment and expression of monocyte
chemoattractant protein-1 are developmentally regulated in
remodeling bone in the mouse," American Journal of Pathology, Vol.
150, pp. 1711-1721.
[1414] Wasserman, P. D., 1998, Neural Computing: Theory and
Practice, Van Nostrand Reinhold, New York, USA.
[1415] Weber, O. M., Duc, C. O., Meier, D., and Boesiger, P., 1998,
"Heuristic optimization algorithms applied to the quantification of
spectroscopic data," Magn. Reson. Med., Vol. 39, pp. 723-730.
[1416] Westerhuis, J. A., de Jong, S., and Smilde, A. K., 2001, "Direct
orthogonal signal correction," Chemometrics and Intelligent
Laboratory Systems, Vol. 56, pp. 13-25.
[1417] Wise, B. M., Gallagher, N. B., 2001,
http://www.eigenvector.com/MATLAB/OSC.html.
[1418] Wold, H., 1966, in Multivariate Analysis (P. R. Krishnaiah,
Ed.) Academic Press, New York.
[1419] Wold, S., 1976, "Pattern recognition by means of disjoint
principal components models," Pattern Recog., Vol. 8, pp.
127-139.
[1420] Wold, S., Antti, H., Lindgren, F., and Ohman, J., 1998a,
"Orthogonal Signal Correction of Near-infrared Spectra,"
Chemometrics and Intelligent Laboratory Systems, Vol. 44, pp.
175-185.
[1421] Wold, S., Kettaneh, N., Friden, H., and Holmberg, A., 1998b,
"Modelling and Diagnostics of Batch Processes and Analogous
Kinetic Experiments," Chemometrics and Intelligent Laboratory
Systems, Vol. 44, pp. 331-340.
[1422] Yokode, M., Hammer, R. E., Ishibashi, S., Brown, M. S. &
Goldstein, J. L., 1990, "Diet-induced hypercholesterolemia in mice:
prevention by over-expression of LDL receptors," Science, Vol. 250,
pp. 1273-1275.
[1423] Zeyneloglu H. B., Seli E., Senturk L. M., Gutierrez L. S.,
Olive D. L. and Arici A., 1998, "The effect of monocyte chemotactic
protein 1 in intraperitoneal adhesion formation in a mouse model,"
American Journal of Obstetrics and Gynecology, Vol. 179, pp.
438-443.
[1424] Zheng M. H., Fan Y., Smith A., Wysocki S., Papadimitriou J.
M., and Wood D. J., 1998, "Gene expression of monocyte chemoattractant
protein-1 in giant cell tumors of bone osteoclastoma: possible
involvement in CD68⁺ macrophage-like cell migration," Journal
of Cellular Biochemistry, Vol. 70, pp. 121-129.
References