U.S. patent application number 14/493141, for methods and systems for determining autism spectrum disorder risk, was filed with the patent office on 2014-09-22 and published on 2015-10-15.
The applicant listed for this patent is SynapDx Corporation. Invention is credited to Doris Damian, Mark A. DePristo, Ute Geigenmuller, Stanley N. Lapidus, Jeffrey R. Luber, Maciej Pacula.
Application Number | 20150294081 14/493141 |
Document ID | / |
Family ID | 54265279 |
Publication Date | 2015-10-15 |
United States Patent Application | 20150294081 |
Kind Code | A1 |
Geigenmuller; Ute; et al. | October 15, 2015 |
METHODS AND SYSTEMS FOR DETERMINING AUTISM SPECTRUM DISORDER
RISK
Abstract
In certain embodiments, the invention stems from the discovery
that analysis of population distribution curves of metabolite
levels in blood can be used to facilitate predicting risk of autism
spectrum disorder (ASD) and/or to differentiate between ASD and
non-ASD developmental delay (DD) in a subject. In certain aspects,
information from assessment of the presence, absence, and/or
direction (upper or lower) of a tail effect in a metabolite
distribution curve is utilized to predict risk of ASD and/or to
differentiate between ASD and DD.
Inventors: |
Geigenmuller; Ute (Lexington, MA) |
Damian; Doris (Lexington, MA) |
Pacula; Maciej (Lexington, MA) |
DePristo; Mark A. (Lexington, MA) |
Luber; Jeffrey R. (Lexington, MA) |
Lapidus; Stanley N. (Lexington, MA) |
Applicant: |
Name | City | State | Country | Type |
SynapDx Corporation | Lexington | MA | US | |
Family ID: | 54265279 |
Appl. No.: | 14/493141 |
Filed: | September 22, 2014 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
62002169 | May 22, 2014 | |
61978773 | Apr 11, 2014 | |
Current U.S. Class: | 506/12; 702/19 |
Current CPC Class: | G06F 19/00 20130101; G16H 50/30 20180101; G01N 33/492 20130101; G01N 2800/28 20130101; G01N 2800/38 20130101; G01N 33/6896 20130101; G16H 50/20 20180101 |
International Class: | G06F 19/00 20060101 G06F019/00; G01N 33/49 20060101 G01N033/49 |
Claims
1-29. (canceled)
30. A method comprising steps of: (i) obtaining a first set of
blood samples from human subjects having autism spectrum disorder
(ASD) and a second set of blood samples from human subjects having
non-ASD developmental delay (DD); (ii) measuring levels of
metabolites in the first and second set of samples, wherein the
metabolites comprise xanthine, gamma-CEHC, hydroxy-chlorothalonil,
5-hydroxyindoleacetate (5-HIAA), indoleacetate, p-cresol sulfate,
1,5-anhydroglucitol (1,5-AG), 3-(3-hydroxyphenyl)propionate,
3-carboxy-4-methyl-5-propyl-2-furanpropanoate (CMPF), 3-indoxyl
sulfate, 4-ethylphenyl sulfate, 8-hydroxyoctanoate,
hydroxyisovaleroylcarnitine (C5), isovalerylglycine, lactate,
N1-Methyl-2-pyridone-5-carboxamide, pantothenate (Vitamin B5),
phenylacetylglutamine, pipecolate, 3-hydroxyhippurate, and
octenoylcarnitine; (iii) determining a distribution profile for the
metabolites in the first set and second set of samples, wherein the
distribution profile has a lower end and an upper end; (iv)
defining for each metabolite: (a) a first threshold at the lower
end of the distribution profile (left tail), wherein the first
threshold is the 15th percentile; and (b) a second threshold
at the upper end of the distribution profile (right tail), wherein
the second threshold is the 90th percentile.
31. (canceled)
32. The method of claim 30, wherein the blood samples are plasma
samples.
33. The method of claim 30, wherein the metabolite levels are
measured by mass spectrometry.
34-35. (canceled)
36. A method of differentiating between autism spectrum disorder
(ASD) and non-ASD developmental delay (DD) in a human subject, the
method comprising steps of: (i) measuring the levels of a plurality
of metabolites in a blood sample obtained from the subject, wherein
the plurality of metabolites comprises at least two metabolites
selected from the group consisting of xanthine, gamma-CEHC,
hydroxy-chlorothalonil, 5-hydroxyindoleacetate (5-HIAA),
indoleacetate, p-cresol sulfate, 1,5-anhydroglucitol (1,5-AG),
3-(3-hydroxyphenyl)propionate,
3-carboxy-4-methyl-5-propyl-2-furanpropanoate (CMPF), 3-indoxyl
sulfate, 4-ethylphenyl sulfate, 8-hydroxyoctanoate,
hydroxyisovaleroylcarnitine (C5), isovalerylglycine, lactate,
N1-Methyl-2-pyridone-5-carboxamide, pantothenate (Vitamin B5),
phenylacetylglutamine, pipecolate, 3-hydroxyhippurate, and
octenoylcarnitine; (ii) calculating the number of metabolites
measured in step (i) selected from
3-carboxy-4-methyl-5-propyl-2-furanpropanoate (CMPF), 3-indoxyl
sulfate, 4-ethylphenyl sulfate, 8-hydroxyoctanoate, gamma-CEHC,
hydroxyisovaleroylcarnitine (C5), indoleacetate, isovalerylglycine,
N1-Methyl-2-pyridone-5-carboxamide, p-cresol sulfate,
phenylacetylglutamine, octenoylcarnitine, and 1,5-anhydroglucitol
(1,5-AG) with a level in the sample at or below the lower threshold
as defined in step (iv) of claim 30 (ASD left tail); (iii)
calculating the number of metabolites measured in step (i) selected
from 5-hydroxyindoleacetate, lactate, pantothenate (Vitamin B5),
pipecolate, xanthine, and hydroxy-chlorothalonil with a level in
the sample at or above the upper threshold as defined in step (iv)
of claim 30 (ASD right tail); wherein a higher number of
metabolites calculated in (ii) and (iii) is indicative of a higher
likelihood that a subject has ASD; (iv) calculating the number of
metabolites measured in step (i) selected from
3-(3-hydroxyphenyl)propionate, pipecolate, xanthine, and
3-hydroxyhippurate with a level in the sample at or below the lower
threshold as defined in step (iv) of claim 30 (DD left tail); (v)
calculating the number of metabolites measured in step (i) selected
from 3-indoxyl sulfate, isovalerylglycine, p-cresol sulfate, and
phenylacetylglutamine with a level in the sample at or above the
upper threshold as defined in step (iv) of claim 30 (DD right
tail); wherein a higher number of metabolites calculated in (iv)
and (v) is indicative of a higher likelihood that a subject has DD;
and (vi) determining that the subject has ASD or DD based on the
number obtained in steps (ii)-(iii) and/or (iv)-(v).
37. The method of claim 36, wherein the blood sample is a plasma
sample.
38. The method of claim 36, wherein the metabolites are measured by
mass spectrometry.
39. The method of claim 36, wherein the subject is no greater than
about 54 months of age.
40. The method of claim 36, wherein the subject is no greater than
about 36 months of age.
Description
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] The present application claims the benefit of U.S.
Provisional Patent Application Ser. No. 61/978,773 filed Apr. 11,
2014, and U.S. Provisional Patent Application Ser. No. 62/002,169
filed May 22, 2014; the contents of each of which are incorporated
by reference herein in their entirety.
FIELD OF THE INVENTION
[0002] The present invention relates generally to the prediction of
risk for Autism Spectrum Disorder (ASD) and other disorders.
BACKGROUND
[0003] Autism Spectrum Disorders (ASD) are pervasive developmental
disorders characterized by reciprocal social interaction deficits,
language difficulties, and repetitive behaviors and restrictive
interests that often manifest during the first 3 years of life. The
etiology of ASD is poorly understood but is thought to be
multifactorial, with both genetic and environmental factors
contributing to disease development.
[0004] Data show that although the average age at which parents
begin to suspect an ASD in their child is 20 months, the median age
of diagnosis is not until 54 months. An important challenge from a
clinical perspective is determining, as early as possible, whether
a child has ASD and requires specialist referral for an autism
treatment plan.
SUMMARY
[0005] Diagnosis of ASD is typically made by developmental
pediatricians and other specialists only after careful assessment
of children using criteria spelled out in the Diagnostic and
Statistical Manual of Mental Disorders. Reliable diagnosis often
entails intense assessment of subjects by multiple experts
including developmental pediatricians, neurologists, psychiatrists,
psychologists, speech and hearing specialists and occupational
therapists. Moreover, the median age of diagnosis of ASD is 54
months despite the fact that the average age at which parents
suspect ASD is as early as 20 months. The CDC (Centers for Disease
Control) has observed that only 18% of children who end up with an
ASD diagnosis are identified by age 36 months. Regrettably, young
children suffering from undiagnosed ASD miss an opportunity to
benefit from early therapeutic intervention during an important
window of childhood development. A medical diagnostic test to
reliably determine ASD risk is needed, particularly to identify
younger children earlier when therapeutic intervention is likely to
be more effective.
[0006] Embodiments of the present invention stem from the discovery
that analysis of distribution curves of measured analytes, such as
metabolites, within and across populations provides information
that can be utilized to build or improve a classifier for
prediction of risk for a condition or disorder, such as ASD. In
particular, analysis of population distribution curves of
metabolite levels in blood facilitates prediction of the risk of
autism spectrum disorder (ASD) in a subject. For example, analysis
of population distribution curves of metabolite levels in blood can
be used to differentiate, in a subject, between autism spectrum
disorder (ASD) and non-ASD developmental disorders such as
developmental delay (DD) not due to autism spectrum disorder.
[0007] The statistical analysis of a biomarker differentiating two
groups usually assumes that the two populations differ in their
mean biomarker levels and that variation around this mean is due to
experimental and/or population variation best characterized by a
Gaussian distribution. Contrary to this baseline model, it is
observed herein that for some analytes, but not for others, the
distribution in ASD, or sometimes in DD, is best characterized as
itself composed of multiple sub-distributions--one sub-distribution
that is essentially undifferentiated from the other health state
(e.g., where ASD and DD distributions are undifferentiated), and
another sub-distribution that is far removed from the mean in a
minority of subjects, e.g., a "tail" of the combined distribution
for that population. This insight leads to a significantly
different analytic framework from the baseline; it is found that
for certain analytes, better results are achieved by defining a
threshold based on a top or bottom portion of the population
distribution, e.g., by establishing a ranking that does not require
an underlying Gaussian distribution model.
[0008] Thus, a metabolite is described herein as exhibiting a "tail
enrichment" or "tail" effect, where there is an enrichment of
samples from a particular population (e.g., either ASD or DD) at a
distal portion of the distribution curve of metabolite levels for
that metabolite. Information from assessment of the presence,
absence, and/or direction (upper or lower) of a tail effect in a
metabolite distribution curve can be utilized to predict risk of
ASD. It has been discovered that for particular metabolites,
metabolite levels corresponding to a top or bottom portion (e.g.,
decile) of the distribution curve, i.e., within a `tail` of the
distribution curve (whether in a `right tail` or `left tail`), are
highly informative of the presence or absence of ASD.
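As an illustration of the tail effect described above, the following minimal Python sketch counts how many ASD and DD samples fall in a tail of the pooled distribution of one metabolite. The nearest-rank percentile rule, the pooling of both populations before the cutoff is taken, the function names, and the toy levels are all assumptions made for illustration; none of them is specified in this application.

```python
import math

def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def tail_counts(asd_levels, dd_levels, pct, side):
    """Return (cutoff, ASD count, DD count) for the tail cut at the pct-th pooled percentile."""
    cutoff = nearest_rank_percentile(asd_levels + dd_levels, pct)
    if side == "right":
        in_tail = lambda level: level >= cutoff   # at or above the upper threshold
    else:
        in_tail = lambda level: level <= cutoff   # at or below the lower threshold
    n_asd = sum(1 for level in asd_levels if in_tail(level))
    n_dd = sum(1 for level in dd_levels if in_tail(level))
    return cutoff, n_asd, n_dd

# Hypothetical metabolite levels (arbitrary units); not measured data.
asd = [1.0, 1.1, 0.9, 1.2, 1.0, 3.5, 3.8, 1.1, 0.95, 4.0]
dd = [1.0, 1.05, 0.9, 1.15, 1.0, 1.1, 0.98, 1.2, 1.02, 0.97]

cutoff, n_asd, n_dd = tail_counts(asd, dd, 90, "right")
print(cutoff, n_asd, n_dd)
```

In this toy data most ASD and DD levels overlap, yet every sample in the right tail is an ASD sample, which is the pattern the text calls a right-tail ASD enrichment.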
[0009] Furthermore, it is found that risk prediction improves as
multiple metabolites having a low degree of overlapping, mutual
information are incorporated. For example, for assessment of
ASD, there are particular groups of metabolites that provide
complementary diagnostic/risk assessment information. That is,
ASD-positive individuals who are identifiable by analysis of the
level of a first metabolite (e.g., individuals within an identified
tail of the first metabolite) are not the same as the ASD-positive
individuals who are identifiable by analysis of a second metabolite
(or there may be a low, non-zero degree of overlap). Without
wishing to be bound to a particular theory, this discovery may be
reflective of the multi-faceted nature of ASD, itself.
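The complementarity described above can be sketched by comparing the sets of ASD-positive individuals flagged by two different metabolite tails. The Jaccard index used here is only one possible overlap measure and is an assumption, as are the function name and the subject identifiers; the application does not prescribe a particular measure.

```python
def overlap_summary(flagged_a, flagged_b):
    """Summarize how much two sets of tail-flagged subjects overlap."""
    a, b = set(flagged_a), set(flagged_b)
    shared = a & b
    covered = a | b
    # Jaccard index: 0.0 means disjoint sets, 1.0 means identical sets.
    jaccard = len(shared) / len(covered) if covered else 0.0
    return {"shared": len(shared), "covered": len(covered), "jaccard": jaccard}

# Hypothetical subject identifiers flagged by the tails of two metabolites.
first_tail = {"s01", "s02", "s03"}
second_tail = {"s03", "s07", "s09", "s11"}

summary = overlap_summary(first_tail, second_tail)
print(summary)
```

A low overlap with a large combined coverage is the situation in which adding the second metabolite identifies individuals the first one misses.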
[0010] Thus, in certain embodiments, the risk assessment method
includes identifying whether a subject falls within any of a
multiplicity of identified metabolite tails involving a plurality
of metabolites, e.g., where the predictors of the different
metabolite tails are at least partially disjoint, e.g., they have
low mutual information, such that risk prediction improves as
multiple metabolites are incorporated with low mutual information.
The classifier has a predetermined level of predictability, e.g.,
in the form of AUC--i.e., area under a ROC curve for the classifier
that plots false positive rate (1-specificity) against true
positive rate (sensitivity)--where AUC increases upon addition of
metabolites to the classifier that exhibit tail effects with low
mutual information.
[0011] In some embodiments, the invention stems from the discovery
that certain threshold values of metabolite levels in blood can be
used to facilitate predicting risk of autism spectrum disorder
(ASD) in a subject. In certain aspects, these threshold values of
metabolites deduced from assessment of the presence, absence,
and/or direction (upper or lower) of a tail effect in a metabolite
distribution curve are utilized to predict risk of ASD. In certain
aspects, these threshold values could be at either the upper or
lower end of the distribution of metabolite levels in a population.
It has been discovered that, for particular metabolites, levels of
the metabolite above an upper threshold value and/or below a lower
threshold value are highly informative of the presence or absence
of ASD.
[0012] In some embodiments, levels of these metabolites are useful
in distinguishing ASD from other forms of developmental delay
(e.g., developmental delay (DD) not due to autism spectrum
disorder).
[0013] In one aspect, the invention is directed to a method of
differentiating between autism spectrum disorder (ASD) and non-ASD
developmental delay (DD) in a subject, the method comprising: (i)
measuring the level of a first metabolite of a plurality of
metabolites from a sample obtained from the subject, the population
distributions of the first metabolite being previously
characterized in a first population of subjects with ASD and in a
second population of subjects with non-ASD developmental delay
(DD), wherein the first metabolite is predetermined to exhibit an
ASD tail effect and/or a DD tail effect, each tail effect
comprising an associated right tail or left tail enriched in
members of the corresponding (ASD or DD) population, and where the
first metabolite exhibits an ASD tail effect with a right tail, the
level of the first metabolite in the sample is within the ASD tail
when the level of the first metabolite in the sample is greater
than a predetermined upper (minimum) threshold defining the right
tail enriched in first (ASD) population members, and, where the
first metabolite exhibits an ASD tail effect with a left tail, the
level of the first metabolite in the sample is within the ASD tail
when the level of the first metabolite in the sample is less than a
predetermined lower (maximum) threshold defining the left tail
enriched in first (ASD) population members, and where the first
metabolite exhibits a DD tail effect with a right tail, the level
of the first metabolite in the sample is within the DD tail when
the level of the first metabolite in the sample is greater than a
predetermined upper (minimum) threshold defining the right tail
enriched in second (DD) population members, and, where the first
metabolite exhibits a DD tail effect with a left tail, the level of
the first metabolite in the sample is within the DD tail when the
level of the first metabolite in the sample is less than a
predetermined lower (maximum) threshold defining the left tail
enriched in second (DD) population members; (ii) measuring the
level of at least one additional metabolite of the plurality of
metabolites from the sample, the population distribution of each of
the at least one additional metabolite being previously
characterized in the first population and in the second population
and predetermined to exhibit at least one of an ASD tail effect and
a DD tail effect, and, for each of the at least one additional
metabolite, identifying whether the level of said metabolite in the
sample is within the corresponding ASD tail and/or DD tail,
according to step (i); and (iii) determining with a predetermined
level of predictability that (a) the subject has ASD and not DD or
(b) the subject has DD and not ASD, based on the identified ASD
tails and/or the identified DD tails within which the sample lies
for the metabolites analyzed in step (i) and step (ii).
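Steps (i) through (iii) above can be sketched as a count of the ASD tails and DD tails within which a sample lies. The tail directions below follow the metabolite groupings recited in claim 36, but the numeric thresholds, the function names, and the final rule of comparing the two counts are hypothetical placeholders chosen for illustration; the application does not fix a particular decision rule here.

```python
# metabolite -> list of (population, tail side, threshold).
# Thresholds are hypothetical placeholders, not values from the application.
TAILS = {
    "xanthine": [("ASD", "right", 2.0), ("DD", "left", 0.5)],
    "p-cresol sulfate": [("ASD", "left", 0.4), ("DD", "right", 1.8)],
    "lactate": [("ASD", "right", 1.5)],
    "3-hydroxyhippurate": [("DD", "left", 0.3)],
}

def tail_hits(sample):
    """Count the ASD tails and DD tails within which the sample lies."""
    hits = {"ASD": 0, "DD": 0}
    for metabolite, level in sample.items():
        for population, side, threshold in TAILS.get(metabolite, []):
            in_right = side == "right" and level > threshold
            in_left = side == "left" and level < threshold
            if in_right or in_left:
                hits[population] += 1
    return hits

def classify(sample):
    """Assumed rule: the population with more tail hits wins; ties are left undecided."""
    hits = tail_hits(sample)
    if hits["ASD"] > hits["DD"]:
        return "ASD"
    if hits["DD"] > hits["ASD"]:
        return "DD"
    return "indeterminate"

# Hypothetical sample levels (arbitrary units).
sample = {"xanthine": 2.4, "p-cresol sulfate": 0.3, "lactate": 1.0, "3-hydroxyhippurate": 0.6}
print(tail_hits(sample), classify(sample))
```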
[0014] In certain embodiments, the first metabolite is
predetermined to exhibit an ASD tail effect with an associated
upper (minimum) or lower (maximum) threshold, said threshold
predetermined such that the odds that a sample of unknown
classification (a previously uncharacterized sample) meeting this
criterion is ASD as opposed to DD are no less than 1.6:1 with
p ≤ 0.3. In certain embodiments, the odds are no less than
2:1, or no less than 2.5:1, or no less than 2.75:1, or no less than
3:1, or no less than 3.25:1, or no less than 3.5:1, or no less than
3.75:1, or no less than 4:1. In any of the preceding, the p-value
(statistical significance value) satisfies p ≤ 0.3, or
p ≤ 0.25, or p ≤ 0.2, or p ≤ 0.15, or p ≤ 0.1,
or p ≤ 0.05.
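One way to check a candidate threshold against the odds and p-value criteria above is sketched below. The application does not state which statistical test yields the p-value; this sketch assumes a one-sided exact binomial test against the ASD share of the whole reference population, and the function name and counts are hypothetical.

```python
from math import comb

def tail_odds_and_p(asd_in_tail, dd_in_tail, asd_total, dd_total):
    """Return (ASD:DD odds within the tail, one-sided exact binomial p-value)."""
    odds = asd_in_tail / dd_in_tail if dd_in_tail else float("inf")
    n = asd_in_tail + dd_in_tail
    # Null hypothesis: tail membership is unrelated to diagnosis, so each
    # tail sample is ASD with the reference-population proportion p0.
    p0 = asd_total / (asd_total + dd_total)
    p_value = sum(
        comb(n, k) * p0 ** k * (1 - p0) ** (n - k)
        for k in range(asd_in_tail, n + 1)
    )
    return odds, p_value

# Hypothetical counts: 12 ASD and 4 DD samples in the tail,
# drawn from a reference population of 100 ASD and 100 DD samples.
odds, p_value = tail_odds_and_p(12, 4, 100, 100)
meets_criteria = odds >= 1.6 and p_value <= 0.3
print(odds, round(p_value, 4), meets_criteria)
```

With unequal population sizes, the raw count ratio would need to be adjusted for the class balance before being read as odds; the equal sizes used here sidestep that step.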
[0015] In certain embodiments, the first metabolite is
predetermined to exhibit a DD tail effect with an associated upper
(minimum) or lower (maximum) threshold, said threshold
predetermined such that the odds that a sample of unknown
classification (a previously uncharacterized sample) meeting this
criterion is DD as opposed to ASD are no less than 1.6:1 with
p ≤ 0.3. In certain embodiments, the odds are no less than
2:1, or no less than 2.5:1, or no less than 2.75:1, or no less than
3:1, or no less than 3.25:1, or no less than 3.5:1, or no less than
3.75:1, or no less than 4:1. In any of the preceding, the p-value
(statistical significance value) satisfies p ≤ 0.3, or
p ≤ 0.25, or p ≤ 0.2, or p ≤ 0.15, or p ≤ 0.1,
or p ≤ 0.05.
[0016] In certain embodiments, the predetermined level of
predictability corresponds to a Receiver Operating Characteristic
(ROC) curve that plots false positive rate (1-specificity) against
true positive rate (sensitivity) having an AUC (area under curve)
of at least 0.70.
[0017] In certain embodiments, the predetermined upper (minimum)
threshold for one or more of the metabolites is a percentile from
85th to 95th percentile (e.g., about the 90th
percentile, or about the 85th, 86th, 87th,
88th, 89th, 91st, 92nd, 93rd, 94th,
or 95th percentile, rounded to the nearest percentile), and
wherein the predetermined lower (maximum) threshold for one or more
of the metabolites is a percentile from 10th to 20th
percentile (e.g., about the 15th percentile, or about the
10th, 11th, 12th, 13th, 14th, 16th,
17th, 18th, 19th, or 20th percentile, rounded
to the nearest percentile).
[0018] In certain embodiments, the plurality of metabolites
comprises at least two metabolites selected from the group
consisting of 5-hydroxyindoleacetate (5-HIAA), 1,5-anhydroglucitol
(1,5-AG), 3-(3-hydroxyphenyl)propionate,
3-carboxy-4-methyl-5-propyl-2-furanpropanoate (CMPF), 3-indoxyl
sulfate, 4-ethylphenyl sulfate, 8-hydroxyoctanoate, gamma-CEHC,
hydroxyisovaleroylcarnitine (C5), indoleacetate, isovalerylglycine,
lactate, N1-Methyl-2-pyridone-5-carboxamide, p-cresol sulfate,
pantothenate (Vitamin B5), phenylacetylglutamine, pipecolate,
xanthine, hydroxy-chlorothalonil, octenoylcarnitine, and
3-hydroxyhippurate.
[0019] In certain embodiments, the plurality of metabolites
comprises at least two metabolites selected from the group
consisting of phenylacetylglutamine, xanthine, octenoylcarnitine,
p-cresol sulfate, isovalerylglycine, gamma-CEHC, indoleacetate,
pipecolate, 1,5-anhydroglucitol (1,5-AG), lactate,
3-(3-hydroxyphenyl)propionate, 3-indoxyl sulfate, pantothenate
(Vitamin B5), and hydroxy-chlorothalonil.
[0020] In certain embodiments, the plurality of metabolites
comprises at least three metabolites selected from the group
consisting of phenylacetylglutamine, xanthine, octenoylcarnitine,
p-cresol sulfate, isovalerylglycine, gamma-CEHC, indoleacetate,
pipecolate, 1,5-anhydroglucitol (1,5-AG), lactate,
3-(3-hydroxyphenyl)propionate, 3-indoxyl sulfate, pantothenate
(Vitamin B5), and hydroxy-chlorothalonil.
[0021] In certain embodiments, the plurality of metabolites
comprises at least one pair of metabolites selected from the pairs
listed in Table 6.
[0022] In certain embodiments, the plurality of metabolites
comprises at least one triplet of metabolites selected from the
triplets listed in Table 7.
[0023] In certain embodiments, the plurality of metabolites
comprises at least one pair of metabolites that, combined together
as a set of two metabolites, provides an AUC of at least 0.62
(e.g., at least about 0.63, 0.64, or 0.65), where AUC is area under
a ROC curve that plots false positive rate (1-specificity) against
true positive rate (sensitivity) for a classifier based only on the
set of two metabolites.
[0024] In certain embodiments, the plurality of metabolites
comprises at least one triplet of metabolites that, combined
together as a set of three metabolites, provide an AUC of at least
0.66 (e.g., at least about 0.67 or 0.68), where AUC is area under a
ROC curve that plots false positive rate (1-specificity) against
true positive rate (sensitivity) for a classifier based only on the
set of three metabolites.
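The pair and triplet criteria above can be illustrated by scoring each subject with the number of tails hit and computing the AUC for every candidate pair. The sketch below uses the fact that the area under a ROC curve equals the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half. The tail-count score, the metabolite labels `m1` to `m3`, and the 0/1 flags are hypothetical; the application does not specify how the classifier score is formed.

```python
from itertools import combinations

def auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison (ties count one half)."""
    total = 0.0
    for p in pos_scores:
        for n in neg_scores:
            total += 1.0 if p > n else 0.5 if p == n else 0.0
    return total / (len(pos_scores) * len(neg_scores))

def rank_pairs(asd_flags, dd_flags, min_auc):
    """Return {pair: AUC} for metabolite pairs whose tail-count score reaches min_auc."""
    kept = {}
    for a, b in combinations(sorted(asd_flags), 2):
        asd_scores = [x + y for x, y in zip(asd_flags[a], asd_flags[b])]
        dd_scores = [x + y for x, y in zip(dd_flags[a], dd_flags[b])]
        value = auc(asd_scores, dd_scores)
        if value >= min_auc:
            kept[(a, b)] = value
    return kept

# Hypothetical flags: 1 if the subject lies in the metabolite's ASD tail, else 0.
asd_flags = {"m1": [1, 1, 0, 0, 0, 0], "m2": [0, 0, 1, 1, 0, 0], "m3": [1, 1, 0, 0, 0, 0]}
dd_flags = {"m1": [0, 0, 0, 0, 0, 1], "m2": [0, 0, 0, 0, 0, 0], "m3": [0, 0, 0, 0, 1, 0]}

good_pairs = rank_pairs(asd_flags, dd_flags, min_auc=0.62)
print(good_pairs)
```

In this toy data `m1` and `m3` flag the same ASD subjects, so pairing them adds little, while pairing either with `m2` raises the AUC. The same loop extends to triplets by passing `3` to `combinations` and summing three flag lists.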
[0025] In another aspect, the invention is directed to a method of
determining autism spectrum disorder (ASD) risk in a subject, the
method comprising: (i) analyzing the level of a first metabolite of
a plurality of metabolites from a sample obtained from the subject,
the population distribution of the first metabolite being
previously characterized in a reference population of subjects
having known classifications, wherein the first metabolite is
predetermined to exhibit an ASD tail effect comprising an
associated right tail or left tail enriched in ASD members, and
where the first metabolite exhibits an ASD tail effect with a right
tail, the level of the first metabolite in the sample is within the
ASD tail when the level of the first metabolite in the sample is
greater than a predetermined upper (minimum) threshold defining the
right tail enriched in ASD population members, and, where the first
metabolite exhibits an ASD tail effect with a left tail, the level
of the first metabolite in the sample is within the ASD tail when
the level of the first metabolite in the sample is less than a
predetermined lower (maximum) threshold defining the left tail
enriched in ASD population members; (ii) measuring the level of at
least one additional metabolite of the plurality of metabolites
from the sample, the population distribution of each of the at
least one additional metabolite being previously characterized in
the reference population and predetermined to exhibit an ASD tail
effect, and, for each of the at least one additional metabolite,
identifying whether the level of said metabolite in the sample is
within the corresponding ASD tail, according to step (i); and (iii)
determining with a predetermined level of predictability the risk
of the subject having ASD based on the identified ASD tails within
which the sample lies for the metabolites analyzed in step (i) and
step (ii).
[0026] In certain embodiments, the first metabolite is
predetermined to exhibit an ASD tail effect with an associated
upper (minimum) or lower (maximum) threshold, said threshold
predetermined such that the odds that a sample of unknown
classification (a previously uncharacterized sample) meeting this
criteria is ASD as opposed to DD are no less than 1.6:1 with
p.ltoreq.0.3. In certain embodiments, the odds are no less than
2:1, or no less than 2.5:1, or no less than 2.75:1, or no less than
3:1, or no less than 3.25:1, or no less than 3.5:1, or no less than
3.75:1, or no less than 4:1. In any of the preceding, p-value
(statistical significance value) satisfies p.ltoreq.0.3, or
p.ltoreq.0.25, or p.ltoreq.0.2, or p.ltoreq.0.15, or p.ltoreq.0.1,
or p.ltoreq.0.05.
[0027] In certain embodiments, the predetermined level of
predictability corresponds to a Receiver Operating Characteristic
(ROC) curve that plots false positive rate (1-specificity) against
true positive rate (sensitivity) having an AUC (area under curve)
of at least 0.70.
[0028] In certain embodiments, the plurality of metabolites
comprises at least two metabolites selected from the group
consisting of 5-hydroxyindoleacetate (5-HIAA), 1,5-anhydroglucitol
(1,5-AG), 3-(3-hydroxyphenyl)propionate,
3-carboxy-4-methyl-5-propyl-2-furanpropanoate (CMPF), 3-indoxyl
sulfate, 4-ethylphenyl sulfate, 8-hydroxyoctanoate, gamma-CEHC,
hydroxyisovaleroylcarnitine (C5), indoleacetate, isovalerylglycine,
lactate, N1-Methyl-2-pyridone-5-carboxamide, p-cresol sulfate,
pantothenate (Vitamin B5), phenylacetylglutamine, pipecolate,
xanthine, hydroxy-chlorothalonil, octenoylcarnitine, and
3-hydroxyhippurate.
[0029] In another aspect, the invention is directed to a method of
determining autism spectrum disorder (ASD) risk in a subject,
comprising: (i) analyzing levels of a plurality of metabolites in a
sample obtained from the subject, the plurality of metabolites
comprising at least two metabolites selected from the group
consisting of 5-hydroxyindoleacetate (5-HIAA), 1,5-anhydroglucitol
(1,5-AG), 3-(3-hydroxyphenyl)propionate,
3-carboxy-4-methyl-5-propyl-2-furanpropanoate (CMPF), 3-indoxyl
sulfate, 4-ethylphenyl sulfate, 8-hydroxyoctanoate, gamma-CEHC,
hydroxyisovaleroylcarnitine (C5), indoleacetate, isovalerylglycine,
lactate, N1-Methyl-2-pyridone-5-carboxamide, p-cresol sulfate,
pantothenate (Vitamin B5), phenylacetylglutamine, pipecolate,
xanthine, hydroxy-chlorothalonil, octenoylcarnitine, and
3-hydroxyhippurate; and (ii) determining the risk that the subject
has ASD based on the quantified levels of the plurality of
metabolites.
[0030] In certain embodiments, the subject is no greater than about
54 months of age. In certain embodiments, the subject is no greater
than about 36 months of age.
[0031] In certain embodiments, the plurality of metabolites
comprises at least two metabolites selected from the group
consisting of phenylacetylglutamine, xanthine, octenoylcarnitine,
p-cresol sulfate, isovalerylglycine, gamma-CEHC, indoleacetate,
pipecolate, 1,5-anhydroglucitol (1,5-AG), lactate,
3-(3-hydroxyphenyl)propionate, 3-indoxyl sulfate, pantothenate
(Vitamin B5), and hydroxy-chlorothalonil.
[0032] In certain embodiments, the plurality of metabolites
comprises at least three metabolites selected from the group
consisting of phenylacetylglutamine, xanthine, octenoylcarnitine,
p-cresol sulfate, isovalerylglycine, gamma-CEHC, indoleacetate,
pipecolate, 1,5-anhydroglucitol (1,5-AG), lactate,
3-(3-hydroxyphenyl)propionate, 3-indoxyl sulfate, pantothenate
(Vitamin B5), and hydroxy-chlorothalonil.
[0033] In certain embodiments, the plurality of metabolites
comprises at least one pair of metabolites selected from the pairs
listed in Table 6.
[0034] In certain embodiments, the plurality of metabolites
comprises at least one triplet of metabolites selected from the
triplets listed in Table 7.
[0035] In certain embodiments, the plurality of metabolites
comprises at least one pair of metabolites that, combined together
as a set of two metabolites, provides an AUC of at least 0.62
(e.g., at least about 0.63, 0.64, or 0.65), where AUC is area under
a ROC curve that plots false positive rate (1-specificity) against
true positive rate (sensitivity) for a classifier based only on the
set of two metabolites.
[0036] In certain embodiments, the plurality of metabolites
comprises at least one triplet of metabolites that, combined
together as a set of three metabolites, provide an AUC of at least
0.66 (e.g., at least about 0.67 or 0.68), where AUC is area under a
ROC curve that plots false positive rate (1-specificity) against
true positive rate (sensitivity) for a classifier based only on the
set of three metabolites.
[0037] In certain embodiments, the sample is a plasma sample.
[0038] In certain embodiments, measuring the levels of metabolites
comprises performing mass spectrometry. In certain embodiments,
performing mass spectrometry comprises performing one or more
members selected from the group consisting of pyrolysis mass
spectrometry, Fourier-transform infrared spectrometry, Raman
spectrometry, gas chromatography-mass spectroscopy, high pressure
liquid chromatography/mass spectroscopy (HPLC/MS), liquid
chromatography (LC)-electrospray mass spectroscopy, cap-LC-tandem
electrospray mass spectroscopy, and ultrahigh performance liquid
chromatography/electrospray ionization tandem mass
spectrometry.
[0039] In another aspect, the invention is directed to a method of
differentiating between autism spectrum disorder (ASD) and non-ASD
developmental delay (DD) in a subject, comprising: (i) analyzing
levels of a plurality of metabolites in a sample obtained from the
subject, the plurality of metabolites comprising at least two
metabolites selected from the group consisting of
5-hydroxyindoleacetate (5-HIAA), 1,5-anhydroglucitol (1,5-AG),
3-(3-hydroxyphenyl)propionate,
3-carboxy-4-methyl-5-propyl-2-furanpropanoate (CMPF), 3-indoxyl
sulfate, 4-ethylphenyl sulfate, 8-hydroxyoctanoate, gamma-CEHC,
hydroxyisovaleroylcarnitine (C5), indoleacetate, isovalerylglycine,
lactate, N1-Methyl-2-pyridone-5-carboxamide, p-cresol sulfate,
pantothenate (Vitamin B5), phenylacetylglutamine, pipecolate,
xanthine, hydroxy-chlorothalonil, octenoylcarnitine, and
3-hydroxyhippurate, the levels and/or population distributions of
the plurality of metabolites being previously characterized in a
reference population; and (ii) determining with a predetermined
level of predictability that
(a) the subject has ASD and not DD or (b) the subject has DD and
not ASD by comparing the levels of the plurality of metabolites
from the sample from the subject with predetermined thresholds
(e.g., thresholds determined from a reference population of samples
having known classifications).
[0040] In certain embodiments, the invention provides methods for
analyzing metabolites by assigning weights to different metabolites
to reflect their respective functions in risk prediction. In some
embodiments, the weight assignment can be deduced from the
biological functions of the metabolites (e.g., the pathways to
which they belong), their clinical utility, or their significance
from statistical or epidemiological analyses.
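A weighted variant of the tail count can be sketched as follows. The weight values, the default weight of 1.0, and the function name are hypothetical; the application states only that weights may be assigned, not how they are computed or combined.

```python
def weighted_score(tail_flags, weights, default_weight=1.0):
    """Sum the weights of the metabolites whose tail the sample falls in."""
    return sum(
        weights.get(metabolite, default_weight)
        for metabolite, in_tail in tail_flags.items()
        if in_tail
    )

# Hypothetical tail membership for one sample and hypothetical weights.
flags = {"xanthine": True, "lactate": False, "pipecolate": True}
weights = {"xanthine": 2.0, "pipecolate": 0.5}

score = weighted_score(flags, weights)
print(score)
```

Setting every weight to 1.0 reduces this score to the plain count of tails hit.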
[0041] In certain embodiments, the invention provides methods for
measuring metabolites using different techniques, including, but
not limited to, a chromatography assay, a mass spectrometry assay,
a fluorimetry assay, an electrophoresis assay, an immune-affinity
assay, and an immunochemical assay.
[0042] In certain embodiments, the invention provides methods for
determining autism spectrum disorder (ASD) risk in a subject,
comprising analyzing levels of a plurality of metabolites from a
sample from the subject; and determining with a predetermined level
of predictability whether the subject has ASD instead of non-ASD
developmental disorders based on the quantified levels of the
plurality of metabolites.
[0043] In certain embodiments, the plurality of metabolites
includes at least one metabolite selected from the group consisting
of 5-hydroxyindoleacetate (5-HIAA), 1,5-anhydroglucitol (1,5-AG),
3-(3-hydroxyphenyl)propionate,
3-carboxy-4-methyl-5-propyl-2-furanpropanoate (CMPF), 3-indoxyl
sulfate, 4-ethylphenyl sulfate, 8-hydroxyoctanoate, gamma-CEHC,
hydroxyisovaleroylcarnitine (C5), indoleacetate, isovalerylglycine,
lactate, N1-Methyl-2-pyridone-5-carboxamide, p-cresol sulfate,
pantothenate (Vitamin B5), phenylacetylglutamine, pipecolate,
xanthine, hydroxy-chlorothalonil, octenoylcarnitine,
3-hydroxyhippurate, and combinations thereof.
[0044] In certain embodiments, the plurality of metabolites includes
at least two metabolites selected from the group consisting of
5-hydroxyindoleacetate (5-HIAA), 1,5-anhydroglucitol (1,5-AG),
3-(3-hydroxyphenyl)propionate,
3-carboxy-4-methyl-5-propyl-2-furanpropanoate (CMPF), 3-indoxyl
sulfate, 4-ethylphenyl sulfate, 8-hydroxyoctanoate, gamma-CEHC,
hydroxyisovaleroylcarnitine (C5), indoleacetate, isovalerylglycine,
lactate, N1-Methyl-2-pyridone-5-carboxamide, p-cresol sulfate,
pantothenate (Vitamin B5), phenylacetylglutamine, pipecolate,
xanthine, hydroxy-chlorothalonil, octenoylcarnitine,
3-hydroxyhippurate, and combinations thereof.
[0045] In certain embodiments, the plurality of metabolites
includes at least 3, at least 4, at least 5, at least 6, at least
7, at least 8, at least 9, or at least 10 metabolites selected from
the group consisting of 5-hydroxyindoleacetate (5-HIAA),
1,5-anhydroglucitol (1,5-AG), 3-(3-hydroxyphenyl)propionate,
3-carboxy-4-methyl-5-propyl-2-furanpropanoate (CMPF), 3-indoxyl
sulfate, 4-ethylphenyl sulfate, 8-hydroxyoctanoate, gamma-CEHC,
hydroxyisovaleroylcarnitine (C5), indoleacetate, isovalerylglycine,
lactate, N1-Methyl-2-pyridone-5-carboxamide, p-cresol sulfate,
pantothenate (Vitamin B5), phenylacetylglutamine, pipecolate,
xanthine, hydroxy-chlorothalonil, octenoylcarnitine,
3-hydroxyhippurate, and combinations thereof.
[0046] In certain embodiments, the plurality of metabolites
includes additional metabolites. In some embodiments, the plurality
of metabolites includes more than 21 metabolites.
[0047] In certain embodiments, the invention provides methods for
differentiating between autism spectrum disorder (ASD) and non-ASD
developmental disorders in a subject, comprising steps of analyzing
levels of a plurality of metabolites from a sample from the
subject, comparing the levels of the metabolites to their
respective population distributions in one reference population,
and determining with a predetermined level of predictability
whether the subject has ASD instead of non-ASD developmental
disorders by comparing the levels of the plurality of metabolites
from the sample from the subject to the previously-characterized
levels and/or population distributions of the plurality of
metabolites in the reference population.
[0048] For example, in certain embodiments, the invention provides
a diagnostic criterion including at least one metabolite that could
predict the risk of ASD in a subject with a ROC curve having an AUC
of at least 0.60, at least 0.65, at least 0.70, at least 0.75, at
least 0.80, at least 0.85 or at least 0.90. AUC is the area under a ROC
curve that plots false positive rate (1-specificity) against true
positive rate (sensitivity) for the classifier.
[0049] In certain embodiments, at least one metabolite for analysis
is selected from the group consisting of 5-hydroxyindoleacetate
(5-HIAA), 1,5-anhydroglucitol (1,5-AG),
3-(3-hydroxyphenyl)propionate,
3-carboxy-4-methyl-5-propyl-2-furanpropanoate (CMPF), 3-indoxyl
sulfate, 4-ethylphenyl sulfate, 8-hydroxyoctanoate, gamma-CEHC,
hydroxyisovaleroylcarnitine (C5), indoleacetate, isovalerylglycine,
lactate, N1-Methyl-2-pyridone-5-carboxamide, p-cresol sulfate,
pantothenate (Vitamin B5), phenylacetylglutamine, pipecolate,
xanthine, hydroxy-chlorothalonil, octenoylcarnitine,
3-hydroxyhippurate, and combinations thereof.
[0050] In certain embodiments, the at least one metabolite for
analysis comprises at least two or more members (e.g., 1, 2, 3, 4,
5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21)
selected from the group consisting of 5-hydroxyindoleacetate
(5-HIAA), 1,5-anhydroglucitol (1,5-AG),
3-(3-hydroxyphenyl)propionate,
3-carboxy-4-methyl-5-propyl-2-furanpropanoate (CMPF), 3-indoxyl
sulfate, 4-ethylphenyl sulfate, 8-hydroxyoctanoate, gamma-CEHC,
hydroxyisovaleroylcarnitine (C5), indoleacetate, isovalerylglycine,
lactate, N1-Methyl-2-pyridone-5-carboxamide, p-cresol sulfate,
pantothenate (Vitamin B5), phenylacetylglutamine, pipecolate,
xanthine, hydroxy-chlorothalonil, octenoylcarnitine, and
3-hydroxyhippurate, in which a non-ASD population distribution
curve and an ASD population distribution curve are established for
each of the metabolites (e.g., each of said metabolites
demonstrating a tail effect).
[0051] In certain embodiments, a metabolite for analysis is
selected from the group consisting of gamma-CEHC, xanthine,
p-cresol sulfate, octenoylcarnitine, phenylacetylglutamine, and
combinations thereof.
[0052] In certain embodiments, a metabolite for analysis is
gamma-CEHC.
[0053] In certain embodiments, a metabolite for analysis is
xanthine.
[0054] In certain embodiments, a metabolite for analysis is
p-cresol sulfate.
[0055] In certain embodiments, a metabolite for analysis is
octenoylcarnitine.
[0056] In certain embodiments, a metabolite for analysis is
phenylacetylglutamine.
[0057] In certain embodiments, a metabolite for analysis is
isovalerylglycine.
[0058] In certain embodiments, a metabolite for analysis is
pipecolate.
[0059] In certain embodiments, a metabolite for analysis is
indoleacetate.
[0060] In certain embodiments, a metabolite for analysis is
octenoylcarnitine.
[0061] In certain embodiments, a metabolite for analysis is
hydroxy-chlorothalonil.
[0062] In certain embodiments, the plurality of metabolites
comprises at least a first metabolite and a second metabolite that
are complementary (e.g., ASD tail samples for the first and second
metabolites are substantially non-overlapping such that the
predictors provided by the metabolites are partially disjoint and
have low mutual information). In certain embodiments, risk
prediction improves as multiple metabolites are incorporated with
low mutual information.
[0063] In certain embodiments, the plurality of metabolites
comprises two metabolites, wherein the two metabolites combined
together as a set of two metabolites provide an AUC of at least
0.62, 0.63, 0.64, or 0.65.
[0064] In certain embodiments, the plurality of metabolites
comprises three metabolites, wherein the three metabolites combined
together as a set of three metabolites provide an AUC of at least
0.66, 0.67, or 0.68.
[0065] In certain embodiments, the invention provides methods of
differentiating between autism spectrum disorder (ASD) and a
non-ASD developmental disorder in a subject, by analyzing levels of
two groups of previously defined metabolites. In certain
embodiments, the first group of metabolites represents metabolites
that are closely associated with ASD, while the second group of
metabolites represents those that are associated with a control
condition (e.g., DD). By analyzing both groups of metabolites from
a sample from a subject, the risk of the subject having ASD instead
of the control condition can be determined by a variety of methods
described in the present disclosure. For example, this can be
achieved by comparing the aggregated ASD tail effects for the first
group of metabolites to the aggregated non-ASD tail effects for the
second group of metabolites.
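The comparison of aggregated tail effects described above can be sketched as follows. This is an illustrative sketch only: the metabolite names, tail directions, and thresholds are hypothetical placeholders, and the simple difference of counts is an assumed aggregation rule rather than the method of the disclosure.

```python
def in_tail(level, direction, threshold):
    """Return True if a level falls in the given tail of a distribution."""
    if direction == "upper":
        return level >= threshold
    if direction == "lower":
        return level <= threshold
    raise ValueError("direction must be 'upper' or 'lower'")


def count_tail_hits(levels, group):
    """Count metabolites in `group` whose measured level lies in its tail.

    group: dict mapping metabolite name -> (direction, threshold).
    """
    return sum(in_tail(levels[name], d, t) for name, (d, t) in group.items())


def compare_groups(levels, asd_group, non_asd_group):
    """Return (ASD tail count, non-ASD tail count, difference)."""
    asd_hits = count_tail_hits(levels, asd_group)
    non_asd_hits = count_tail_hits(levels, non_asd_group)
    return asd_hits, non_asd_hits, asd_hits - non_asd_hits


# Hypothetical metabolites, thresholds, and measured levels.
levels = {"met_A": 2.5, "met_B": 0.2, "met_C": 1.0, "met_D": 3.1}
asd_group = {"met_A": ("upper", 2.0), "met_B": ("lower", 0.5)}
non_asd_group = {"met_C": ("upper", 1.5), "met_D": ("upper", 3.0)}
result = compare_groups(levels, asd_group, non_asd_group)
```

In this sketch a positive difference would point toward ASD and a negative difference toward the control condition; where to draw the decision boundary is left open.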
[0066] In certain embodiments, the invention provides methods for
determining ASD risk in a subject by measuring both levels of
certain metabolites and genetic information from the subject. In
some embodiments, the genetic information includes copy number
variations (CNVs) and/or Fragile X (FXS) testing.
[0067] In additional embodiments, limitations described with
respect to certain aspects of the invention can be applied to other
aspects of the invention. For example, the limitations of a claim
depending from one independent claim may, in some embodiments, be
applied to another independent claim.
BRIEF DESCRIPTION OF THE DRAWINGS
[0068] FIG. 1 illustrates the distribution of an exemplary
metabolite in two populations (e.g., ASD and DD), and the mean
shift of this metabolite between these two populations.
[0069] FIG. 2 illustrates the distribution of an exemplary
metabolite in two populations (e.g., ASD and DD), and a tail effect
(e.g., the ASD distribution has a more densely populated tail) of
this metabolite between these two populations.
[0070] FIG. 3 illustrates the distribution of the metabolite
5-hydroxyindoleacetate, in two populations (e.g., ASD and DD),
which exhibits a statistically significant mean shift (t-test;
p<0.01) and a statistically significant right tail effect
between the two populations ("extremes" signifies tail effect;
Fisher's test; p=0.001).
[0071] FIG. 4 illustrates the distribution of the metabolite,
gamma-CEHC, in two populations (e.g., ASD and DD), which exhibits a
statistically significant left tail effect between the two
populations ("extremes" signifies tail effect; Fisher's test;
p=0.008).
[0072] FIG. 5 illustrates the distribution of the metabolite,
phenylacetylglutamine, in two populations (e.g., ASD and DD), which
exhibits a statistically significant mean shift (t-test; p=0.001),
and statistically significant left and right tail effects between
the two populations ("extremes" signifies tail effect; Fisher's
test; p=0.0001). The distributions appear as shifted Gaussian
curves in the two populations.
[0073] FIG. 6 illustrates the correlation of two exemplary
metabolites and demonstrates that these two metabolites possess
distinct profiles of tail effects and are complementary.
[0074] FIG. 7 illustrates the tail effects of 12 exemplary
metabolites in 180 subjects, and their predictive power for ASD and
DD.
[0075] FIG. 8A illustrates a plot of ASD and non-ASD tail effects
for 180 samples using an exemplary 12-metabolite panel,
demonstrating that samples from ASD patients show aggregation of
ASD tail effects.
[0076] FIG. 8B illustrates a plot of ASD and non-ASD tail effects
for 180 samples using an exemplary 12-metabolite panel, and an
exemplary method of binning the data.
[0077] FIG. 8C illustrates a plot of ASD and non-ASD tail effects
for 180 samples using an exemplary 21-metabolite panel,
demonstrating that samples from ASD patients show aggregation of
ASD tail effects.
[0078] FIG. 9 illustrates increases in the predictability of ASD
for an exemplary 12-metabolite panel as the number of metabolites
assessed increases.
[0079] FIG. 10A illustrates the effects of trichotomization on the
predictability of ASD using an exemplary 12-metabolite panel.
[0080] FIG. 10B illustrates the effects of trichotomization on the
predictability of ASD in the analysis of an exemplary 21-metabolite
panel.
[0081] FIG. 11A illustrates an improvement in the predictability of
ASD using voting methods compared to a non-voting method for
analysis of an exemplary 12-metabolite panel.
[0082] FIG. 11B illustrates an improvement in the predictability of
ASD using a voting method compared to a non-voting method using an
exemplary 21-metabolite panel.
[0083] FIG. 12 illustrates the validation process for using an
exemplary 12-metabolite panel to achieve a high predictability of
ASD.
[0084] FIGS. 13A-13U illustrate the population distribution of 21
exemplary metabolites in an ASD population and a non-ASD
population.
[0085] FIGS. 14A-B illustrate the effects on the predictability of
ASD by the inclusion and exclusion of an exemplary 12-metabolite
panel, an exemplary 21-metabolite panel, and a set of 84 candidate
metabolites from a total number of 600 metabolites, as assessed by
tail effect analysis and mean shift analysis. (Blacklist=excluded,
Whitelist=included, mx_12=exemplary 12-metabolite panel,
mx_targeted 21=exemplary 21-metabolite panel, mx_all_candidates=84
candidate metabolites, all features=total set of 600
metabolites)
[0086] FIGS. 14C-D illustrate the effects on the predictability of
ASD by the inclusion (whitelists) and exclusion (blacklists)
of an exemplary 12-metabolite panel and an exemplary 21-metabolite
panel from a total number of 600 metabolites as assessed by tail
effect analysis and mean shift analysis, and by comparing logistic
regression to Bayes analysis, in two cohorts of samples (i.e.,
"Christmas" and "Easter"). (Blacklist=excluded, Whitelist=included,
mx_12=exemplary 12-metabolite panel, mx_targeted 21=exemplary
21-metabolite panel, mx_all_candidates=84 candidate metabolites,
all features=total set of 600 metabolites)
[0087] FIG. 15 illustrates the effects on the predictability of ASD
by using an increasing number of metabolites selected from subsets
of an exemplary 21-metabolite panel.
[0088] FIG. 16A illustrates the effects of adding genetic
information to the tail effect analysis using an exemplary
12-metabolite panel, demonstrating improved power of separating ASD
from non-ASD.
[0089] FIG. 16B illustrates the effects of adding genetic
information to the tail effect analysis using an exemplary
21-metabolite panel, demonstrating improved power of separating ASD
from non-ASD.
[0090] FIGS. 17A-B illustrate the effects on the predictability of
ASD by the inclusion and exclusion of an exemplary 21-metabolite
panel from the total number of metabolites, by comparing tail effect
analysis to mean shift analysis, and by comparing logistic
regression to Bayes analysis. (Blacklist=excluded,
Whitelist=included, mx_12=exemplary 12-metabolite panel,
mx_targeted 21=exemplary 21-metabolite panel, mx_all_candidates=84
candidate metabolites, all features=total set of 600
metabolites)
[0091] FIGS. 18A-B illustrate the effects on the predictability of
ASD by the inclusion and exclusion of an exemplary 21-metabolite
panel from the total number of metabolites, by comparing tail effect
analysis to mean shift analysis, and by using logistic regression
in two cohorts (i.e., "Christmas" and "Easter").
(Blacklist=excluded, Whitelist=included, mx_12=exemplary
12-metabolite panel, mx_targeted 21=exemplary 21-metabolite panel,
mx_all_candidates=84 candidate metabolites, all features=total set
of 600 metabolites)
[0092] FIGS. 19A-D illustrate the effects on the predictability of
ASD by the inclusion and exclusion of an exemplary 21-metabolite
panel from the total number of metabolites, by comparing tail effect
analysis to mean shift analysis, and by comparing logistic
regression to Bayes analysis using either the "Christmas" cohort,
or the "Easter" cohort, or both combined. (Blacklist=excluded,
Whitelist=included, mx_12=exemplary 12-metabolite panel,
mx_targeted 21=exemplary 21-metabolite panel, mx_all_candidates=84
candidate metabolites, all features=total set of 600
metabolites)
[0093] FIG. 20 illustrates a representative plot of the specificity
and sensitivity of tail effect analysis for an exemplary
21-metabolite panel for prediction of ASD.
DEFINITIONS
[0094] In order for the present invention to be more readily
understood, certain terms are first defined below. Additional
definitions for the following terms and other terms are set forth
throughout the specification.
[0095] In this application, unless otherwise clear from context,
(i) the term "a" may be understood to mean "at least one"; (ii) the
term "or" may be understood to mean "and/or"; (iii) the terms
"comprising" and "including" may be understood to encompass
itemized components or steps whether presented by themselves or
together with one or more additional components or steps; and (iv)
the terms "about" and "approximately" may be understood to permit
standard variation as would be understood by those of ordinary
skill in the art; and (v) where ranges are provided, endpoints are
included.
[0096] Agent: The term "agent" as used herein may refer to a
compound or entity of any chemical class including, for example,
polypeptides, nucleic acids, saccharides, lipids, small molecules,
metals, or combinations thereof.
[0097] Approximately: As used herein, the terms "approximately" and
"about" are intended to encompass normal statistical variation as
would be understood by those of ordinary skill in the art as
appropriate to the relevant context. In certain embodiments, the
term "approximately" or "about" refers to a range of values that
fall within 25%, 20%, 19%, 18%, 17%, 16%, 15%, 14%, 13%, 12%, 11%,
10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, 1%, or less in either
direction (greater than or less than) of the stated reference value
unless otherwise stated or otherwise evident from the context
(except where such number would exceed 100% of a possible
value).
[0098] Area under curve (AUC): A classifier has an associated ROC
curve (Receiver Operating Characteristic curve) that plots false
positive rate (1-specificity) against true positive rate
(sensitivity). The area under the ROC curve (AUC) is a measure of
how well the classifier can distinguish between two diagnostic
groups. A perfect classifier has an AUC of 1.0, as compared with a
random classifier, which has an AUC of 0.5.
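For illustration, the AUC can be computed without drawing the ROC curve by using the equivalent rank formulation: the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as one half. The sketch below assumes binary labels encoded as 1 and 0 and uses made-up scores; the function name is hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: classifier outputs, higher meaning "more likely positive".
    labels: 1 for the positive class, 0 for the negative class.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Made-up scores for four samples (two negatives, two positives).
example_auc = roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

The pairwise loop is quadratic in the number of samples, which is acceptable for cohorts of a few hundred subjects; a sort-based version would scale better.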
[0099] Associated with: Two events or entities are "associated"
with one another, as that term is used herein, if the presence,
level and/or form of one is correlated with that of the other. For
example, a particular entity is considered to be associated with a
particular disease, disorder, or condition, if its presence, level
and/or form correlates with incidence of and/or susceptibility of
the disease, disorder, or condition (e.g., across a relevant
population).
[0100] Autism spectrum disorder: As used herein, the term "autistic
spectrum disorder" is recognized by those of skill in the art to
refer to a developmental disorder on the autism "spectrum"
characterized by one or more of reciprocal social interaction
deficits, language difficulties, repetitive behaviors and
restrictive interests. Autism spectrum disorder has been
characterized in the DSM-V (May 2013) as a disorder comprising a
continuum of symptoms including, for example, communication
deficits, such as responding inappropriately in conversations,
misreading nonverbal interactions, difficulty building friendships
appropriate to age, overdependence on routines, high sensitivity to
changes in their environment, and/or intense focus on
inappropriate items. Autism spectrum disorder has additionally been
characterized, for example, by DSM-IV-TR, to be inclusive of
Autistic Disorder, Asperger's Disorder, Rett's Disorder, Childhood
Disintegrative Disorder, and Pervasive Developmental Disorder Not
Otherwise Specified (including Atypical Autism). In some
embodiments, autism spectrum disorder (ASD) is characterized using
standardized testing instruments such as questionnaires and
observation schedules. For example, in some embodiments, ASD is
characterized by (i) a score meeting the cutoff for autism on
Communication plus Social Interaction Total in the Autism
Diagnostic Observation Schedule (ADOS) and a score meeting the
cutoff value on Social Interaction, Communication, Patterns of
Behavior, and Abnormality of Development at ≤36 months in
Autism Diagnostic Interview-Revised (ADI-R); and/or (ii) a score
meeting the ASD cutoff on Communication and Social Interaction
Total in ADOS and a score meeting the cutoff value on Social
Interaction, Communication, Patterns of Behavior, and Abnormality
of Development at ≤36 months in ADI-R and (ii)(a) a score
meeting the cutoff value for Social Interaction and Communication
in ADI-R or (ii)(b) a score meeting the cutoff value for Social
Interaction or Communication and within 2 points of the cutoff
value on Social Interaction or Communication (whichever did not
meet the cutoff value) in ADI-R or (ii)(c) a score within 1
point of cutoff value for Social Interaction and Communication in
ADI-R.
[0101] Classification: As used herein, "classification" is the
process of learning to separate data points into different classes
by finding common features between collected data points which are
within known classes and then using mathematical methods or other
methods to assign data points to one of the different classes. In
statistics, classification is the problem of identifying the
sub-population to which new observations belong, where the identity
of the sub-population is unknown, on the basis of a training set of
data containing observations whose sub-population is known. Thus
the requirement is that new individual items are placed into groups
based on quantitative information on one or more measurements,
traits or characteristics, etc., and based on the training set in
which previously decided groupings are already established.
Classification has many applications. In some cases, it is employed
as a data mining procedure, while in others more detailed
statistical modeling is undertaken.
[0102] Classifier: As used herein, a "classifier" is a method,
algorithm, computer program, or system for performing data
classification. Examples of widely used classifiers include, but
are not limited to, the neural network (multi-layer perceptron),
logistic regression, support vector machines, k-nearest neighbors,
Gaussian mixture model, Gaussian naive Bayes, decision tree,
partial least squares discriminant analysis (PLS-DA), Fisher's
linear discriminant, Logistic regression, Naive Bayes classifier,
Perceptron, support vector machines, quadratic classifiers, Kernel
estimation, Boosting, Neural networks, Bayesian networks, Hidden
Markov models, and Learning vector quantization.
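As one concrete instance of the classifiers listed above, a k-nearest neighbors classifier can be sketched in a few lines. The sketch assumes numeric feature tuples, Euclidean distance, and a majority vote among the k closest training points; the training data and class names are hypothetical and carry no clinical meaning.

```python
import math
from collections import Counter


def knn_classify(training_points, training_labels, query, k=3):
    """Assign `query` to the majority class among its k nearest neighbors.

    training_points: list of equal-length numeric tuples with known classes.
    training_labels: class label for each training point.
    """
    if len(training_points) != len(training_labels) or not training_points:
        raise ValueError("training points and labels must match and be non-empty")
    if not 1 <= k <= len(training_points):
        raise ValueError("k must be between 1 and the number of training points")
    # Rank training points by Euclidean distance to the query.
    ranked = sorted(
        zip(training_points, training_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature training set with two known classes.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
labels = ["class_1", "class_1", "class_1", "class_2", "class_2", "class_2"]
predicted = knn_classify(points, labels, (0.15, 0.15), k=3)
```

An odd k with two classes avoids tied votes; with more classes or an even k, a tie-breaking rule would need to be chosen.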
[0103] Determine: Many methodologies described herein include a
step of "determining". Those of ordinary skill in the art, reading
the present specification, will appreciate that such "determining"
can utilize or be accomplished through use of any of a variety of
techniques available to those skilled in the art, including for
example specific techniques explicitly referred to herein. In some
embodiments, determining involves manipulation of a physical
sample. In some embodiments, determining involves consideration
and/or manipulation of data or information, for example utilizing a
computer or other processing unit adapted to perform a relevant
analysis. In some embodiments, determining involves receiving
relevant information and/or materials from a source. In some
embodiments, determining involves comparing one or more features of
a sample or entity to a comparable reference.
[0104] Determining risk: As used herein, determining risk includes
calculating or quantifying a probability that a given subject has,
or does not have, a particular condition or disorder. In some
embodiments, a positive or negative diagnosis for a disorder or
condition, for example, autism spectrum disorder (ASD) or
developmental delay (DD) may be made based in whole or in part on a
determined risk or risk score (e.g., an odds ratio, or range).
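By way of a hedged illustration of one such risk score, an odds ratio can be computed from a 2x2 table of counts, here whether a marker signal is present or absent, cross-tabulated against a known classification. The function name and the counts are hypothetical and are not data from the disclosure.

```python
def odds_ratio(present_cases, present_controls, absent_cases, absent_controls):
    """Odds ratio from a 2x2 table of counts.

    present_*: subjects in whom the marker signal is present.
    absent_*:  subjects in whom the marker signal is absent.
    Returns (present_cases / present_controls) / (absent_cases / absent_controls).
    """
    if present_controls == 0 or absent_cases == 0:
        raise ValueError("odds ratio is undefined when a denominator cell is zero")
    return (present_cases * absent_controls) / (present_controls * absent_cases)


# Hypothetical counts for illustration only.
example_or = odds_ratio(present_cases=20, present_controls=5,
                        absent_cases=80, absent_controls=95)
```

An odds ratio above 1 indicates that the signal is more common among cases than controls; a confidence interval would normally accompany it in practice.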
[0105] Developmental delay: As used herein, the phrase
developmental delay (DD) refers to ongoing major or minor delay in
one or more processes of child development, including, for example,
physical development, cognitive development, communication
development, social or emotional development, or adaptive
development that is not due to autism spectrum disorder. Even
though an individual with ASD may be considered to be
developmentally delayed, the classification of ASD as used herein
will be considered to trump that of DD such that the
classifications of ASD and DD are mutually exclusive. In other
words, unless indicated otherwise, the classification of DD is
assumed to mean non-ASD developmental delay. In some embodiments,
DD is characterized by non-autism (AU) and non-ASD, yet with (i)
score of 69 or lower on a Mullen Scale, score of 69 or lower on
Vineland Scale, and score of 14 or lower on SCQ, or (ii) score of
69 or lower on either Mullen or Vineland and within half a standard
deviation of cutoff value on the other assessment (score 77 or
lower).
[0106] Diagnostic information: As used herein, diagnostic
information or information for use in diagnosis is any information
that is useful in determining whether a patient has a disease or
condition and/or in classifying the disease or condition into a
phenotypic category or any category having significance with regard
to prognosis of the disease or condition, or likely response to
treatment (either treatment in general or any particular treatment)
of the disease or condition. Similarly, diagnosis refers to
providing any type of diagnostic information, including, but not
limited to, whether a subject is likely to have a disease or
condition (such as autism spectrum disorder), state, staging or
characteristic of the disease or condition as manifested in the
subject, information related to the nature or classification of the
disorder, information related to prognosis and/or information
useful in selecting an appropriate treatment. Selection of
treatment may include the choice of a particular therapeutic agent
or other treatment modality such as behavioral therapy, diet
modification, etc., a choice about whether to withhold or deliver
therapy, a choice relating to dosing regimen (e.g., frequency or
level of one or more doses of a particular therapeutic agent or
combination of therapeutic agents), etc.
[0107] Marker: A marker, as used herein, refers to an agent whose
presence or level is associated with, or has a correlation to, a
particular disease or condition. Alternatively or additionally, in
some embodiments, a presence or level of a particular marker
correlates with activity (or activity level) of a particular
signaling pathway, for example that may be characteristic of a
particular disorder. The marker may or may not play an etiological
role in the disease or condition. The statistical significance of
the presence or absence of a marker may vary depending upon the
particular marker. In some embodiments, detection of a marker is
highly specific in that it reflects a high probability that the
disorder is of a particular subclass. According to the present
invention a useful marker need not distinguish disorders of a
particular subclass with 100% accuracy.
[0108] Metabolite: As used herein, the term metabolite refers to a
substance produced during a bodily chemical or physical process.
The term "metabolite" includes any chemical or biochemical product
of a metabolic process, such as any compound produced by the
processing, cleavage or consumption of a biological molecule.
Examples of such molecules include, but are not limited to: acids
and related compounds; mono-, di-, and tri-carboxylic acids
(saturated, unsaturated aliphatic and cyclic, aryl, alkaryl);
aldo-acids, keto-acids; lactone forms; gibberellins; abscisic
acid; alcohols, polyols, derivatives, and related compounds; ethyl
alcohol, benzyl alcohol, methanol; propylene glycol, glycerol,
phytol; inositol, furfuryl alcohol, menthol; aldehydes, ketones,
quinones, derivatives, and related compounds; acetaldehyde,
butyraldehyde, benzaldehyde, acrolein, furfural, glyoxal; acetone,
butanone; anthraquinone; carbohydrates; mono-, di-,
tri-saccharides; alkaloids, amines, and other bases; pyridines
(including nicotinic acid, nicotinamide); pyrimidines (including
cytidine, thymine); purines (including guanine, adenine,
xanthines/hypoxanthines, kinetin); pyrroles; quinolines (including
isoquinolines); morphinans, tropanes, cinchonans; nucleotides,
oligonucleotides, derivatives, and related compounds; guanosine,
cytosine, adenosine, thymidine, inosine; amino acids, oligopeptides,
derivatives, and related compounds; esters; phenols and related
compounds; heterocyclic compounds and derivatives; pyrroles,
tetrapyrroles (corrinoids and porphines/porphyrins, with or without
metal ion); flavonoids; indoles; lipids (including fatty acids and
triglycerides), derivatives, and related compounds; carotenoids,
phytoene; and sterols, isoprenoids including terpenes; and modified
versions of the above molecules. In some embodiments, a metabolite
is the product of metabolism of an endogenous substance. In some
embodiments, a metabolite is the product of metabolism of an
exogenous substance. In some embodiments, a metabolite is the
product of metabolism of an endogenous substance and an exogenous
substance. As used herein, the term "metabolome" refers to the
chemical profile or fingerprint of the metabolites in a bodily
fluid, a cell, a tissue, an organ, or an organism.
[0109] Metabolite distribution curve: As used herein, a metabolite
distribution curve is a probability distribution curve defined by a
function derived from metabolite level plotted against population
density (e.g., ASD or DD). In some embodiments, the distribution
curve is a standard curve fit of the data. In some embodiments, the
distribution curve is a least squares polynomial curve fit. In some
embodiments, the distribution curve is asymmetric, or non-Gaussian.
In some embodiments, the distribution curve is simply a plot of
cases with associated diagnostic category vs. metabolite values
(e.g., a `rug plot`), where there is no curve fit.
[0110] Mutual information: As used herein, mutual information
refers to a measure of the mutual dependence of two variables
(i.e., a degree to which knowing one variable reduces uncertainty
about another variable). High mutual information indicates a large
reduction in uncertainty; low mutual information indicates a small
reduction; and zero mutual information between two random variables
means the variables are independent.
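The definition above can be made concrete with a small sketch that estimates mutual information, in bits, from paired discrete observations (for example, per-sample indicators of whether each of two metabolites shows a tail effect). The function name and the toy indicator sequences are assumptions for illustration; the estimate uses empirical frequencies with no small-sample correction.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (bits) between two discrete sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts of each (x, y) pair
    count_x = Counter(xs)          # marginal counts of x
    count_y = Counter(ys)          # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        # p_xy / (p_x * p_y) simplifies to c * n / (count_x * count_y)
        mi += p_xy * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


# Toy indicators: identical sequences share 1 bit; independent ones share 0.
identical = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])
independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

Under this measure, two metabolites whose tail indicators give a value near zero would be candidates for the complementary pairing described elsewhere in this specification.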
[0111] Non-autism spectrum disorder (Non-ASD): As used herein,
non-autism spectrum disorder (non-ASD) refers to a classification
that is not of a child or adult with an autistic spectrum disorder.
In some embodiments, "non-ASD" is normally developing subjects. In
some embodiments, a non-ASD population consists of or comprises
subjects with developmental delay (DD). In some embodiments,
"non-ASD" consists of or comprises both DD and normally developing
subjects.
[0112] Patient: As used herein, the term "patient" or "subject"
refers to any organism to which a test or composition is or may be
administered, e.g., for experimental, diagnostic, prophylactic,
and/or therapeutic purposes. In some embodiments, a patient is
suffering from or susceptible to one or more disorders or
conditions. In some embodiments, a patient displays one or more
symptoms of a disorder or condition. In some embodiments, a patient
is suspected to have one or more disorders or conditions.
[0113] Predictability: As used herein, predictability refers to the
degree to which a correct prediction or forecast of a subject's
disease status can be made either qualitatively or quantitatively.
Perfect predictability implies strict determinism, but lack of
predictability does not necessarily imply lack of determinism.
Limitations on predictability could be caused by factors such as a
lack of information or excessive complexity.
[0114] Prognostic and predictive information: As used herein, the
terms prognostic and predictive information are used
interchangeably to refer to any information that may be used to
indicate any aspect of the course of a disease or condition either
in the absence or presence of treatment. Such information may
include, but is not limited to, the likelihood that a patient will
be cured of a disease, the likelihood that a patient's disease will
respond to a particular therapy (wherein response may be defined in
any of a variety of ways). Prognostic and predictive information
are included within the broad category of diagnostic
information.
[0115] Reference: The term "reference" is often used herein to
describe a standard or control agent, individual, population,
sample, sequence or value against which an agent, individual,
population, sample, sequence or value of interest is compared. In
some embodiments, a reference agent, individual, population,
sample, sequence or value is tested and/or determined substantially
simultaneously with the testing or determination of the agent,
individual, population, sample, sequence or value of interest. In
some embodiments, a reference agent, individual, population,
sample, sequence or value is a historical reference, optionally
embodied in a tangible medium. Typically, as would be understood by
those skilled in the art, a reference agent, individual,
population, sample, sequence or value is determined or
characterized under conditions comparable to those utilized to
determine or characterize the agent, individual, population,
sample, sequence or value of interest.
[0116] Regression analysis: As used herein, "regression analysis"
includes any techniques for modeling and analyzing several
variables, when the focus is on the relationship between a
dependent variable and one or more independent variables. More
specifically, regression analysis helps understand how the typical
value of the dependent variable changes when any one of the
independent variables is varied, while the other independent
variables are held fixed. Most commonly, regression analysis
estimates the conditional expectation of the dependent variable
given the independent variables--that is, the average value of the
dependent variable when the independent variables are held fixed.
Less commonly, the focus is on a quantile, or other location
parameter of the conditional distribution of the dependent variable
given the independent variables. In all cases, the estimation
target is a function of the independent variables called the
regression function. In regression analysis, it is also of interest
to characterize the variation of the dependent variable around the
regression function, which can be described by a probability
distribution. Regression analysis is widely used for prediction and
forecasting, where its use has substantial overlap with the field
of machine learning. Regression analysis is also used to understand
which among the independent variables are related to the dependent
variable, and to explore the forms of these relationships. In
restricted circumstances, regression analysis can be used to infer
causal relationships between the independent and dependent
variables. A large body of techniques for carrying out regression
analysis has been developed. Familiar methods such as linear
regression and ordinary least squares regression are parametric, in
that the regression function is defined in terms of a finite number
of unknown parameters that are estimated from the data.
Nonparametric regression refers to techniques that allow the
regression function to lie in a specified set of functions, which
may be infinite-dimensional.
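By way of illustration only, the parametric case described above can be sketched as an ordinary least squares fit of one dependent variable on one independent variable. The function names and the example data below are hypothetical and are not part of the described methods; the sketch assumes a single independent variable and a straight-line regression function.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Fit y ~ intercept + slope * x by ordinary least squares.

    Returns (intercept, slope), the two unknown parameters of the
    straight-line regression function estimated from the data.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(intercept, slope, x_new):
    """Estimate the conditional expectation of y at x_new."""
    return intercept + slope * x_new


# Hypothetical data: independent variable x, dependent variable y.
x_values = [1.0, 2.0, 3.0, 4.0]
y_values = [3.0, 5.0, 7.0, 9.0]
intercept, slope = fit_simple_ols(x_values, y_values)
predicted = predict(intercept, slope, 5.0)
```

A nonparametric method would replace the two-parameter line with a more flexible family of functions; that case is not shown here.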
[0117] Risk: As will be understood from context, a "risk" of a
disease, disorder or condition is a degree of likelihood that a
particular individual will be diagnosed with or will develop the
disease, disorder, or condition. In some embodiments, risk is
expressed as a percentage. In some embodiments, risk is from 0, 1,
2, 3, 4, 5, 6, 7, 8, 9, or 10 up to 100%. In some embodiments risk
is expressed as a risk relative to a risk associated with a
reference sample or group of reference samples. In some
embodiments, a reference sample or group of reference samples have
a known risk of a disease, disorder, or condition. In some
embodiments, a reference sample or group of reference samples are
from individuals comparable to a particular individual. In some
embodiments, relative risk is 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or
more. In some embodiments, relative risk can be expressed as
Relative Risk (RR) or Odds Ratio (OR).
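As a non-limiting illustration, Relative Risk and Odds Ratio can be computed from counts of affected and unaffected individuals in a group of interest and in a reference group. The function names and the counts below are hypothetical and serve only to show the arithmetic.

```python
def relative_risk(cases_group, total_group, cases_reference, total_reference):
    """Risk in the group of interest divided by risk in the reference group."""
    risk_group = cases_group / total_group
    risk_reference = cases_reference / total_reference
    return risk_group / risk_reference


def odds_ratio(cases_group, noncases_group, cases_reference, noncases_reference):
    """Odds of being a case in the group of interest divided by the
    odds of being a case in the reference group."""
    return (cases_group * noncases_reference) / (noncases_group * cases_reference)


# Hypothetical counts: 20 of 100 affected in the group of interest,
# 10 of 100 affected in the reference group.
rr = relative_risk(20, 100, 10, 100)
or_value = odds_ratio(20, 80, 10, 90)
```

With these counts the relative risk is 2 and the odds ratio is 2.25; the two measures diverge more as the condition becomes more common in either group.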
[0118] Sample: As used herein, the term "sample" typically refers
to a biological sample obtained or derived from a source of
interest, as described herein. In some embodiments, a source of
interest comprises an organism, such as an animal or human. In some
embodiments, a biological sample is or comprises biological tissue
or fluid. In some embodiments, a biological sample may be or
comprise bone marrow; blood; plasma; serum; blood cells; ascites;
tissue or fine needle biopsy samples; cell-containing body fluids;
free floating nucleic acids; sputum; saliva; urine; cerebrospinal
fluid; peritoneal fluid; pleural fluid; feces; lymph; gynecological
fluids; skin swabs; vaginal swabs; oral swabs; nasal swabs;
washings or lavages such as ductal lavages or bronchoalveolar
lavages; aspirates; scrapings; bone marrow specimens; tissue biopsy
specimens; surgical specimens; feces, other body fluids,
secretions, and/or excretions; and/or cells therefrom, etc. In some
embodiments, a biological sample is or comprises cells obtained
from an individual. In some embodiments, obtained cells are or
include cells from an individual from whom the sample is obtained.
In some embodiments, a sample is a "primary sample" obtained
directly from a source of interest by any appropriate means. For
example, in some embodiments, a primary biological sample is
obtained by methods selected from the group consisting of biopsy
(e.g., fine needle aspiration or tissue biopsy), surgery,
collection of body fluid (e.g., blood, lymph, feces etc.), etc. In
some embodiments, as will be clear from context, the term "sample"
refers to a preparation that is obtained by processing (e.g., by
removing one or more components of and/or by adding one or more
agents to) a primary sample, for example, by filtering using a
semi-permeable membrane. Such a "processed sample" may comprise,
for example nucleic acids or proteins extracted from a sample or
obtained by subjecting a primary sample to techniques such as
amplification or reverse transcription of mRNA, isolation and/or
purification of certain components, etc.
[0119] Subject: By "subject" is meant a mammal (e.g., a human, in
some embodiments including prenatal human forms). In some
embodiments, a subject is suffering from a relevant disease,
disorder or condition. In some embodiments, a subject is
susceptible to a disease, disorder, or condition. In some
embodiments, a subject displays one or more symptoms or
characteristics of a disease, disorder or condition. In some
embodiments, a subject does not display any symptom or
characteristic of a disease, disorder, or condition. In some
embodiments, a subject is someone with one or more features
characteristic of susceptibility to or risk of a disease, disorder,
or condition. A subject can be a patient, which refers to a human
presenting to a medical provider for diagnosis or treatment of a
disease. In some embodiments, a subject is an individual to whom
therapy is administered.
[0120] Substantially: As used herein, the term "substantially"
refers to the qualitative condition of exhibiting total or
near-total extent or degree of a characteristic or property of
interest. One of ordinary skill in the biological arts will
understand that biological and chemical phenomena rarely, if ever,
go to completion and/or proceed to completeness or achieve or avoid
an absolute result. The term "substantially" is therefore used
herein to capture the potential lack of completeness inherent in
many biological and chemical phenomena.
[0121] Suffering from: An individual who is "suffering from" a
disease, disorder, or condition has been diagnosed with and/or
exhibits or has exhibited one or more symptoms or characteristics
of the disease, disorder, or condition.
[0122] Susceptible to: An individual who is "susceptible to" a
disease, disorder, or condition is at risk for developing the
disease, disorder, or condition. In some embodiments, such an
individual is known to have one or more susceptibility factors that
are statistically correlated with increased risk of development of
the relevant disease, disorder, and/or condition. In some
embodiments, an individual who is susceptible to a disease,
disorder, or condition does not display any symptoms of the
disease, disorder, or condition. In some embodiments, an individual
who is susceptible to a disease, disorder, or condition has not
been or not yet been diagnosed with the disease, disorder, and/or
condition. In some embodiments, an individual who is susceptible to
a disease, disorder, or condition is an individual who has been
exposed to conditions associated with development of the disease,
disorder, or condition. In some embodiments, a risk of developing a
disease, disorder, and/or condition is a population-based risk
(e.g., family members of individuals suffering from allergy,
etc.).
[0123] Tail enrichment and tail effect: As used herein, the terms
"tail enrichment" or "tail effect" refer to a
classification-enhancing property exhibited by a metabolite (or
other analyte) that has a relatively high concentration of samples
from a particular population at a distal portion of a distribution
curve of metabolite levels. An "upper tail" or "right tail" refers
to a distal portion of a distribution curve that is greater than
the mean. A "lower tail" or "left tail" refers to a distal portion
of a distribution curve that is lower than the mean. In some
embodiments, a tail is determined by a predetermined threshold
value based on ranking. For example, a sample is designated to be
within a tail if its measurement for a certain metabolite is higher
than the value corresponding to a percentile from 85.sup.th to
95.sup.th (e.g., 90.sup.th) in a population for that metabolite, or
is lower than the value corresponding to a percentile from
10.sup.th to 20.sup.th (e.g., 15.sup.th) in the population for that
metabolite.
[0124] Therapeutic agent: As used herein, the phrase "therapeutic
agent" refers to any agent that has a therapeutic effect and/or
elicits a desired biological and/or pharmacological effect, when
administered to a subject. In some embodiments, an agent is
considered to be a therapeutic agent if its administration to a
relevant population is statistically correlated with a desired or
beneficial therapeutic outcome in the population, whether or not a
particular subject to whom the agent is administered experiences
the desired or beneficial therapeutic outcome.
[0125] Training set: As used herein, a "training set" is a set of
data used in various areas of information science to discover
potentially predictive relationships. Training sets are used in
artificial intelligence, machine learning, genetic programming,
intelligent systems, and statistics. In all these fields, a
training set has much the same role and is often used in
conjunction with a test set.
[0126] Test set: As used herein, a "test set" is a set of data used
in various areas of information science to assess the strength and
utility of a predictive relationship. Test sets are used in
artificial intelligence, machine learning, genetic programming,
intelligent systems, and statistics. In all these fields, a test
set has much the same role.
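For illustration, a collection of samples may be partitioned into a training set and a test set as sketched below. The function name, the 25% test fraction, and the fixed seed are assumptions chosen for the example, not values taken from this disclosure.

```python
import random


def split_train_test(samples, test_fraction=0.25, seed=0):
    """Randomly partition samples into (training_set, test_set).

    The two sets are disjoint, so a relationship discovered on the
    training set can be assessed on data it has not seen.
    """
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical sample identifiers.
sample_ids = list(range(20))
training_set, test_set = split_train_test(sample_ids)
```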
[0127] Treatment: As used herein, the term "treatment" (also
"treat" or "treating") refers to any administration of a substance
or therapy (e.g., behavioral therapy) that partially or completely
alleviates, ameliorates, relieves, inhibits, delays onset of,
reduces severity of, and/or reduces frequency, incidence or
severity of one or more symptoms, features, and/or causes of a
particular disease, disorder, and/or condition. Such treatment may
be of a subject who does not exhibit signs of the relevant disease,
disorder and/or condition and/or of a subject who exhibits only
early signs of the disease, disorder, and/or condition.
Alternatively or additionally, such treatment may be of a subject
who exhibits one or more established signs of the relevant disease,
disorder and/or condition. In some embodiments, treatment may be of
a subject who has been diagnosed as suffering from the relevant
disease, disorder, and/or condition. In some embodiments, treatment
may be of a subject known to have one or more susceptibility
factors that are statistically correlated with increased risk of
development of the relevant disease, disorder, and/or
condition.
DETAILED DESCRIPTION
[0128] The present invention provides methods and systems for
determining risk of autism spectrum disorder (ASD) in a subject
based on specific analysis of metabolite levels in a sample, e.g.,
a blood sample or a plasma sample. Various aspects of the invention
are described in detail in the following sections. The use of
sections and headers is not meant to limit the invention. Each
section can apply to any aspect of the invention. In this
application, the use of "or" means "and/or" unless otherwise
apparent.
Autism Spectrum Disorder
[0129] Criteria for a clinical diagnosis of autism spectrum
disorder (ASD) have been set forth in the Diagnostic and
Statistical Manual of Mental Disorders, version 5 (DSM-V, published
in May 2013).
[0130] ASD has additionally been characterized, for example, by
DSM-IV-TR, to be inclusive of Autistic Disorder, Asperger's
Disorder, Rett's Disorder, Childhood Disintegrative Disorder, and
Pervasive Developmental Disorder Not Otherwise Specified (including
Atypical Autism).
[0131] In some embodiments, ASD is characterized by (i) a score
meeting the cutoff for autism on Communication plus Social
Interaction Total in ADOS and a score meeting the cutoff value on
Social Interaction, Communication, Patterns of Behavior, and
Abnormality of Development at .ltoreq.36 months in ADI-R; and/or
(ii) a score meeting the ASD cutoff on Communication and Social
Interaction Total in ADOS and a score meeting the cutoff value on
Social Interaction, Communication, Patterns of Behavior, and
Abnormality of Development at .ltoreq.36 months in ADI-R and
(ii)(a) a score meeting the cutoff value for Social Interaction and
Communication in ADI-R or (ii)(b) a score meeting the cutoff value
for Social Interaction or Communication and within 2 points of the
cutoff value on Social Interaction or Communication (whichever did
not meet the cutoff value) in ADI-R or (ii)(c) a score is within 1
point of cutoff value for Social Interaction and Communication in
ADI-R.
Developmental Delay
[0132] Developmental delay (DD) is a major or minor delay in one or more
processes of child development, including, for example, physical
development, cognitive development, communication development,
social or emotional development, or adaptive development that is
not due to ASD. In some embodiments, DD is characterized by
non-Autism (AU) and non-ASD with (i) score of 69 or lower on a
Mullen Scale, score of 69 or lower on Vineland Scale, and score of
14 or lower on SCQ, or (ii) score of 69 or lower on either Mullen
or Vineland and within half a standard deviation of cutoff value on
the other assessment (score 77 or lower). Even though an individual
with ASD may be considered to be developmentally delayed, the
classification of ASD as used herein will be considered to trump
that of DD such that the classifications of ASD and DD are mutually
exclusive.
Risk Assessment of ASD
[0133] Children who present with symptoms of impaired language,
behavioral, or social development are often seen by clinicians,
most commonly in a primary care setting, who are unable to
determine whether that child has ASD, or some other condition,
disorder, or classification (e.g., DD). It is difficult to diagnose
children, particularly at an age prior to extensive language
development, and many primary care physicians do not have the
ability or resources to make a differential diagnosis of their
patients. For example, ASD may not be easily distinguished from
other developmental disorders, conditions, or classifications, such
as DD.
[0134] It is useful to assess risk of ASD in a subject (including
probability of non-ASD and DD), and to differentiate ASD from DD.
Risk assessment of ASD provides opportunities for early
intervention and treatment. For example, a non-specialist physician
may use ASD risk assessment to initiate a referral to a specialist.
A specialist may use ASD risk assessment to prioritize further
evaluation of patients. Assessment of ASD risk may also be used to
establish a provisional diagnosis, prior to a final diagnosis,
during which time facilitative services can be provided to a high
risk child and his or her family.
[0135] Described herein are methods for determining risk of ASD in
a subject. In some embodiments, determining ASD risk includes
determining that the subject has a greater than about a 50% chance
of having ASD. In some embodiments, determining ASD risk includes
determining the subject has a greater than about 60%, 65%, 70%,
74%, 80%, 85%, 90%, 95%, or 98% chance of having ASD. In some
embodiments determining ASD risk includes determining that a
subject has ASD. In some embodiments, determining ASD risk includes
determining that a subject does not have ASD (i.e., non-ASD).
[0136] In some embodiments, the invention provides methods for
differentiating ASD from a non-ASD classification (e.g., DD) in a
subject. In some embodiments, differentiating ASD from the non-ASD
classification/condition includes determining the subject has a
greater than about 60%, 65%, 70%, 74%, 80%, 85%, 90%, 95%, or 98%
chance of having ASD instead of the non-ASD classification (i.e.,
chance of having ASD and not having the non-ASD classification). In
some embodiments, the non-ASD classification is DD. In some
embodiments, the non-ASD classification is "normal".
[0137] In some embodiments, the invention provides methods for
determining that a subject does not have either ASD or DD.
Analytical Methods
[0138] Described herein are methods for assessing ASD risk, or
differentiating ASD from other non-ASD developmental disorders. In
some embodiments, the risk assessment is based (at least in part)
on measurement and characterization of metabolites in a sample from
a subject, e.g., a blood sample. In some embodiments, a plasma
sample is derived from the blood sample, and the plasma sample is
analyzed.
[0139] Metabolites can be detected in a variety of ways, including
assays based on chromatography and/or mass spectrometry,
fluorimetry, electrophoresis, immune-affinity, hybridization,
immunochemistry, ultra-violet spectroscopy (UV), fluorescence
analysis, radiochemical analysis, near-infrared spectroscopy
(near-IR), nuclear magnetic resonance spectroscopy (NMR), light
scattering analysis (LS), and nephelometry.
[0140] In some embodiments, the metabolites are analyzed by liquid
or gas chromatography or ion mobility (electrophoresis) alone or
coupled with mass spectrometry or by mass spectrometry alone. Such
methods have been used to identify and quantify biomolecules, such
as cellular metabolites. (See, for example, Li et al., 2000; Rowley
et al., 2000; and Kuster and Mann, 1998). Mass spectrometry methods
may be based on, for example, quadrupole, ion-trap, or
time-of-flight mass spectrometry, with single, double, or triple
mass-to-charge scanning and/or filtering (MS, MS/MS, or MS.sup.3)
and preceded by appropriate ionization methods such as electrospray
ionization, atmospheric pressure chemical ionization, atmospheric
pressure photo ionization, matrix-assisted laser desorption
ionization (MALDI), or surface-enhanced laser desorption ionization
(SELDI). (See, for example, International Patent Application
Publication Nos. WO 2004056456 and WO 2004088309). In some
embodiments, the first separation of metabolites from a biological
sample can be achieved by using gas or liquid chromatography or ion
mobility/electrophoresis. In some embodiments, the ionization for
mass spectrometry procedures can be achieved by electrospray
ionization, atmospheric pressure chemical ionization, or
atmospheric pressure photoionization. In some embodiments, mass
spectrometry instruments include quadrupole, ion-trap, or
time-of-flight, or Fourier transform instruments.
[0141] In some embodiments, metabolites are analyzed on a mass
scale via a non-targeted ultrahigh performance liquid or gas
chromatography/electrospray or atmospheric pressure chemical
ionization tandem mass spectrometry platform optimized for the
identification and relative quantification of the small-molecule
complement of biological systems. (See, for example, Evans et al.,
Anal. Chem., 2009, 81, 6656-6667).
[0142] In some embodiments, the first separation of metabolites
from a biological sample can be achieved by using gas or liquid
chromatography or ion mobility/electrophoresis. In some
embodiments, the ionization for mass spectrometry procedures can be
achieved by electrospray ionization, atmospheric pressure chemical
ionization, or atmospheric pressure photoionization. In some
embodiments, mass spectrometry instruments include quadrupole,
ion-trap, or time-of-flight, or Fourier transform instruments.
[0143] In some embodiments, a blood sample containing metabolites
of interest is centrifuged to separate plasma from other blood
components. In certain embodiments, internal standards are
unnecessary. In some embodiments, defined amounts of internal
standards are added to (a portion of) the plasma, and then methanol
is added to precipitate plasma components such as proteins.
Precipitates are separated from supernatant by centrifugation, and
the supernatant is harvested. If the concentration of a metabolite
of interest is to be increased for more accurate detection, the
supernatant is evaporated and the residue is dissolved in the
appropriate amount of solvent. If the concentration of a metabolite
of interest is undesirably high, the supernatant is diluted in the
appropriate solvent. An appropriate amount of metabolite-containing
sample is loaded onto a liquid-chromatography column equilibrated
with the appropriate mixture of mobile phase A and mobile phase B.
In the case of reversed-phase liquid chromatography, mobile phase A
typically is water with or without a small amount of an additive
such as formic acid, and mobile phase B typically is methanol or
acetonitrile. An appropriate gradient of mobile phase A and mobile
phase B is pumped through the column to achieve separation of
metabolites of interest by retention time--or time of elution from
the column. As metabolites elute from the column, they are ionized
and brought into the gas phase, and the ions are detected and
quantified by mass spectrometry. Specificity of detection is
achieved by double-filtering for a specific precursor ion and a
specific product ion generated from the precursor ion. Absolute
quantification may be achieved by normalizing ion counts derived
from the metabolite of interest to the ion counts derived from
known amounts of an internal standard for a given metabolite and by
comparing the normalized ion count to a calibration curve
established with known amounts of pure metabolite and internal
standards. Internal standards typically are stable-isotope labeled
forms of the pure metabolite or pure forms of a structural analogue
of the metabolite. Alternatively, relative quantification of a
given metabolite in arbitrary units may be calculated by
normalization to a selected internal reference value (e.g., the
median value for metabolite levels on all samples run from a given
group).
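The two quantification routes described above can be sketched as follows. This is an illustrative sketch only: it assumes a straight-line calibration curve fitted by least squares, and the function names, ion counts, and calibrator concentrations are hypothetical.

```python
from statistics import mean, median


def response_ratio(analyte_counts, standard_counts):
    """Normalize metabolite ion counts to internal-standard ion counts."""
    if standard_counts <= 0:
        raise ValueError("internal standard counts must be positive")
    return analyte_counts / standard_counts


def fit_calibration_line(concentrations, ratios):
    """Least-squares line: ratio = intercept + slope * concentration.

    Assumption: the calibration curve is linear over the range used.
    """
    x_bar, y_bar = mean(concentrations), mean(ratios)
    sxx = sum((x - x_bar) ** 2 for x in concentrations)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(concentrations, ratios))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def absolute_concentration(ratio, intercept, slope):
    """Read a concentration off the calibration line (absolute quantification)."""
    return (ratio - intercept) / slope


def relative_levels(levels):
    """Relative quantification: divide each level by the group median."""
    reference = median(levels)
    return [level / reference for level in levels]


# Hypothetical calibrators of known concentration and their measured ratios.
calibrator_conc = [1.0, 2.0, 4.0]
calibrator_ratio = [0.5, 1.0, 2.0]
intercept, slope = fit_calibration_line(calibrator_conc, calibrator_ratio)

# Hypothetical sample: 3000 analyte counts, 2000 internal-standard counts.
sample_ratio = response_ratio(3000.0, 2000.0)
sample_conc = absolute_concentration(sample_ratio, intercept, slope)

# Hypothetical levels for one metabolite across a group of samples.
group_relative = relative_levels([2.0, 4.0, 8.0])
```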
[0144] In some embodiments, one or more metabolites are measured by
immunoassay. Numerous specific immunoassay formats and variations
thereof may be utilized for measurement of metabolites. (See, for
example, E. Maggio, Enzyme-Immunoassay, (1980) (CRC Press, Inc.,
Boca Raton, Fla.); see also U.S. Pat. No. 4,727,022 "Methods for
Modulating Ligand-Receptor Interactions and their application";
U.S. Pat. No. 4,659,678 "Immunoassay of Antigens"; U.S. Pat. No.
4,376,110, "Immunometric Assays Using Monoclonal Antibodies,"; U.S.
Pat. No. 4,275,149, "Macromolecular Environment Control in Specific
Receptor Assays,"; U.S. Pat. No. 4,233,402, "Reagents and Method
Employing Channeling," and U.S. Pat. No. 4,230,767, "Heterogenous
Specific Binding Assay Employing a Coenzyme as Label."). Antibodies
can be conjugated to a solid support suitable for a diagnostic
assay (e.g., beads such as protein A or protein G agarose,
microspheres, plates, slides or wells formed from materials such as
latex or polystyrene) in accordance with known techniques, such as
passive binding. Antibodies as described herein may likewise be
conjugated to detectable labels or groups such as radio labels
(e.g., .sup.35S, .sup.125I, .sup.131I), enzyme labels (e.g.,
horseradish peroxidase, alkaline phosphatase), and fluorescent
labels (e.g., fluorescein, Alexa, green fluorescent protein) in
accordance with known techniques.
Determination of ASD Risk
[0145] In some embodiments, methods of the present invention allow
one of skill in the art to identify, diagnose, or otherwise assess
subjects based at least in part on measuring metabolite levels in
samples obtained from subjects who may not presently exhibit signs
or symptoms of ASD and/or other developmental disorders, but who
nonetheless may be at risk for having or developing ASD and/or
other developmental disorders.
[0146] In certain embodiments, levels of metabolites, or other
analytes (e.g., proteomic or genomic information) can be measured
in a test sample and compared to normal control levels, or to
levels in subjects having a developmental disorder, condition, or
classification that is not ASD (e.g., non-ASD developmental delay,
DD). In some embodiments, the term "normal control level" refers to
the level of one or more metabolites, or other analytes, or
indices, typically found in subjects not suffering from ASD or not
likely to have ASD or other developmental disorder. In some
embodiments, a normal control level is a range or an index. In some
embodiments, a normal control level is determined from a database
of previously tested subjects. A difference in the level of one or
more metabolites, or other analytes, compared to a normal control
level can indicate that a subject has ASD or is at risk of
developing ASD. Conversely, a lack of difference in the level of
one or more metabolites compared to a normal control level of one
or more metabolites, or other analytes, can indicate that the
subject does not have ASD, or is at low risk of developing ASD.
[0147] In some embodiments, a reference value is that which has
been obtained from a control subject or population whose diagnosis
is known (i.e., has been diagnosed with or identified as suffering
from ASD, or has not been diagnosed with or identified as suffering
from ASD). In some embodiments, a reference value is an index value
or baseline value, such as, for example, a "normal control level"
as described herein. In some embodiments, a reference sample or
index value or baseline value is taken or derived from one or more
subjects who have been exposed to treatment for ASD, or may be
taken or derived from one or more subjects who are at low risk of
developing ASD, or may be taken or derived from subjects who have
shown improvements in ASD risk factors as a result of exposure to
treatment. In some embodiments, a reference sample or index value
or baseline value is taken or derived from one or more subjects who
have not been exposed to a treatment for ASD. In some embodiments,
samples are collected from subjects who have received initial
treatment for ASD and/or subsequent treatment for ASD to monitor
the progress of the treatment. In some embodiments, a reference
value has been derived from risk prediction algorithms or computed
indices from population studies of ASD. In some embodiments, a
reference value is from subjects or populations that have a disease
or disorder other than ASD, such as another developmental disorder,
e.g., non-ASD Developmental Delay (DD).
[0148] In some embodiments, differences in the level of metabolites
measured by the methods of the present invention comprise increases
or decreases in the level of the metabolites as compared to a
normal control level, reference value, index value, or baseline
value. In some embodiments, increases or decreases in levels of
metabolites relative to a reference value from a normal control
population, a general population, or from a population with another
disease, are indicative of presence of ASD, progression of ASD,
exacerbation of ASD or amelioration of ASD or ASD symptoms. In some
embodiments, increases or decreases in levels of metabolites
relative to a reference value from a normal control population, a
general population, or from a population with another disease, are
indicative of an increase or decrease in the risk of developing
ASD, or complications relating thereto. The increase or decrease
can be indicative of the success of one or more treatment regimens
for ASD, or can indicate improvements or regression of ASD risk
factors. The increase or decrease can be, for example, at least 5%,
at least 10%, at least 15%, at least 20%, at least 25%, at least
30%, at least 35%, at least 40%, at least 45%, or at least 50% of a
reference value.
[0149] In some embodiments, differences in the level of metabolites
as described herein are statistically significant differences.
"Statistically significant" refers to differences that are greater
than what might be expected to happen by chance alone. Statistical
significance can be determined by any method known in the art. For
example, statistical significance can be determined by p-value. The
p-value is the probability of observing a difference between groups
at least as large as the one measured if chance alone were at work.
For example, a p-value of 0.01 means that a result at least this
extreme would be expected by chance about 1 time in 100. The lower
the p-value, the stronger the evidence that a measured difference
between groups is not due to chance. A
difference is considered to be statistically significant if the
p-value is at or below 0.05. In some embodiments, a statistically
significant p-value is at or below 0.04, 0.03, 0.02, 0.01, 0.005,
or 0.001. In some embodiments, a statistically significant p-value
is at or below 0.30, 0.25, 0.20, 0.15, or 0.10 (e.g., in the case
of identifying whether a single particular metabolite has additive
predictive value when used in a classifier including other
metabolites). In some embodiments, a p value is determined by
t-test. In some embodiments, a p value is obtained by Fisher's
test. In some embodiments, statistical significance is achieved by
analyzing combinations of several metabolites in panels, combined
with mathematical algorithms, to achieve a statistically
significant risk prediction.
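As one illustration of obtaining a p-value by Fisher's test, the sketch below computes a one-sided Fisher's exact test on a 2x2 table of counts. The choice of a one-sided test, the layout of the table (two groups, each split by whether a sample meets some criterion), and all counts are assumptions made for the example.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table

                     meets criterion   does not
        group 1            a               b
        group 2            c               d

    Returns the probability, with all margins fixed, of seeing a or
    more group-1 samples meeting the criterion by chance alone.
    """
    row1 = a + b          # size of group 1
    col1 = a + c          # total samples meeting the criterion
    n = a + b + c + d     # all samples
    upper = min(row1, col1)
    # Sum hypergeometric probabilities for tables at least as extreme.
    numerator = sum(
        comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1)
    )
    return numerator / comb(n, col1)


# Hypothetical counts: 8 of 10 group-1 samples meet the criterion,
# versus 2 of 10 group-2 samples.
p_value = fisher_exact_greater(8, 2, 2, 8)
is_significant = p_value <= 0.05
```

For these hypothetical counts the p-value is 2126/184756, or about 0.0115, which falls at or below the 0.05 level noted above.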
[0150] A classification test, assay, or method has an associated
ROC curve (Receiver Operating Characteristic curve) that plots
true positive rate (sensitivity) against false positive rate
(1-specificity). The area under the ROC curve (AUC) is a measure of
how well the classifier can distinguish between two diagnostic
groups. The maximum AUC is 1.0 (a perfect test) and the practical
minimum is 0.5 (i.e., the area at which there is no discrimination of
normal versus disease). It is appreciated that as an AUC approaches
one, the accuracy of a test increases.
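For illustration, the AUC can be computed directly from classifier scores without drawing the curve, using the equivalent pairwise formulation: the AUC equals the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as one half. The function name and the scores below are hypothetical.

```python
def roc_auc(scores_positive, scores_negative):
    """Area under the ROC curve via pairwise comparison of scores."""
    if not scores_positive or not scores_negative:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for pos in scores_positive:
        for neg in scores_negative:
            if pos > neg:
                wins += 1.0      # correctly ordered pair
            elif pos == neg:
                wins += 0.5      # tie counts as half
    return wins / (len(scores_positive) * len(scores_negative))


# Hypothetical classifier scores for two diagnostic groups.
auc = roc_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
auc_perfect = roc_auc([0.9, 0.8], [0.2, 0.1])
auc_chance = roc_auc([0.5, 0.5], [0.5, 0.5])
```

Here 8 of the 9 pairs are correctly ordered, giving an AUC of 8/9; the perfectly separated example gives 1.0 and the fully tied example gives 0.5.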
[0151] In some embodiments, a high degree of risk prediction
accuracy is a test or assay wherein the AUC is at least 0.60. In
some embodiments, a high degree of risk prediction accuracy is a
test or assay wherein the AUC is at least 0.65, at least 0.70, at
least 0.75, at least 0.80, at least 0.85, at least 0.90, or at
least 0.95.
Predicting ASD Risk by Assessment of Tail Effects
[0152] In some embodiments, a mean difference of metabolite levels
is assessed among or between populations, e.g., between an ASD
population and a DD population, or compared to a normal control
population. In some embodiments, metabolites from samples of a
given population (e.g., ASD) are assessed for enrichment in a tail
of a distribution curve; that is, it is determined whether a greater
proportion of samples from a designated population (e.g., ASD) as
compared to a second population (e.g., DD) resides in a tail of the
distribution curve (i.e., a "tail effect"). In some embodiments,
both mean differences and tail effects are identified and utilized.
In some embodiments, a tail is determined by a predetermined
threshold value. For example, a sample is designated to be within a
tail if its measurement for a certain metabolite is higher than the
value corresponding to a 90.sup.th percentile in a population for
that metabolite (right tail, or upper tail), or is lower than the
value corresponding to a 15.sup.th percentile (left tail, or lower
tail). In some embodiments, the threshold for a right (upper) tail
for a given metabolite is the value corresponding to the 80.sup.th,
81.sup.st, 82.sup.nd, 83.sup.rd, 84.sup.th, 85.sup.th, 86.sup.th,
87.sup.th, 88.sup.th, 89.sup.th, 90.sup.th, 91.sup.st, 92.sup.nd,
93.sup.rd, 94.sup.th, 95.sup.th, 96.sup.th, 97.sup.th, 98.sup.th,
or 99.sup.th percentile (e.g., where a sample is designated to be
within a right tail if its measurement for the given metabolite is
higher than the value associated with this percentile). In some
embodiments, the threshold for a left (lower) tail for a given
metabolite is the value corresponding to the 25.sup.th, 24.sup.th,
23.sup.rd, 22.sup.nd, 21.sup.st, 20.sup.th, 19.sup.th, 18.sup.th,
17.sup.th, 16.sup.th, 15.sup.th, 14.sup.th, 13.sup.th, 12.sup.th,
11.sup.th, 10.sup.th, 9.sup.th, 8.sup.th, 7.sup.th, 6.sup.th,
5.sup.th, 4.sup.th, 3.sup.rd, 2.sup.nd, or 1.sup.st percentile
(e.g., where a sample is designated to be within a left tail if its
measurement for the given metabolite is lower than the value
associated with this percentile). Percentile values shown are
inclusive of fractional values.
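As a non-limiting sketch of the thresholding just described, a sample may be assigned to a right or left tail by comparing its level to percentile values of a reference population. The helper names (`percentile`, `tail_of`), the linear-interpolation percentile convention, and the default 15th/90th percentile cut-offs (taken from the example above) are assumptions made for illustration.

```python
def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between order
    statistics. The interpolation convention is an assumption."""
    xs = sorted(values)
    if not xs:
        raise ValueError("no values")
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def tail_of(level, reference_levels, lower_q=15.0, upper_q=90.0):
    """Return "right", "left", or None for a single measurement."""
    if level > percentile(reference_levels, upper_q):
        return "right"  # upper tail
    if level < percentile(reference_levels, lower_q):
        return "left"   # lower tail
    return None         # not within either tail
```

Because `q` is a floating-point value, fractional percentiles (e.g., 92.5) are handled without modification.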
[0153] In some embodiments, a distribution curve is generated from
a plot of metabolite levels for one or more populations. In some
embodiments, a distribution curve is generated from a single
reference population, e.g., a general population. In some
embodiments, distribution curves are generated from two
populations, e.g., an ASD population and a non-ASD population, such
as DD. In some embodiments, distribution curves are generated from
three or more populations, e.g., an ASD population, a non-ASD
population but with another developmental
disorder/condition/classification such as DD, and a healthy (e.g.,
no developmental disorder) control population. Metabolite
distribution curves from each of the populations may be utilized to
make more than one risk assessment (e.g. diagnosing ASD, diagnosing
DD, differentiating between ASD and DD). The methods of assessment
utilizing tail effects described herein may be applied to more
than two populations.
[0154] In some embodiments, a plurality of metabolites and their
distributions are used for risk assessment. In some embodiments,
levels of two or more metabolites are utilized to predict ASD risk.
In some embodiments, at least two of the metabolites are selected
from the metabolites listed in Table 1. In some embodiments, at
least three of the metabolites are selected from the metabolites
listed in Table 1. In some embodiments, 4, 5, 6, 7, 8, 9, 10, 11,
12, 13, 14, 15, 16, 17, 18, 19, 20, or 21 metabolites selected from
the metabolites listed in Table 1 are used to predict ASD risk.
[0155] Further discussion of Table 1 (Tables 1A through 1C) appears
in the Examples section below.
TABLE-US-00001 TABLE 1

A. Exemplary 21-metabolite panel with tail effects predictive of ASD vs. DD

Metabolite
3-(3-hydroxyphenyl)propionate
3-carboxy-4-methyl-5-propyl-2-furanpropanoate (CMPF)
3-indoxyl sulfate
4-ethylphenyl sulfate
5-hydroxyindoleacetate
8-hydroxyoctanoate
gamma-CEHC
hydroxyisovaleroylcarnitine (C5)
indoleacetate
isovalerylglycine
lactate
N1-Methyl-2-pyridone-5-carboxamide
p-cresol sulfate
pantothenate (Vitamin B5)
phenylacetylglutamine
pipecolate
xanthine
hydroxy-chlorothalonil
octenoylcarnitine
3-hydroxyhippurate
1,5-anhydroglucitol (1,5-AG)

B. Exemplary metabolites with tail enrichment predictive of ASD

Metabolite                                           | Tail Effect     | Odds Ratio | Confidence Interval (90%)
3-carboxy-4-methyl-5-propyl-2-furanpropanoate (CMPF) | Left; p = 0.23  | 1.61       | 1.19-3.65
3-indoxyl sulfate                                    | Left; p = 0.01  | 3.03       | 1.91-6.12
4-ethylphenyl sulfate                                | Left; p = 0.02  | 2.54       | 1.70-5.37
5-hydroxyindoleacetate                               | Right; p < 0.01 | 4.91       | 2.22-15.35
8-hydroxyoctanoate                                   | Left; p = 0.01  | 3.03       | 1.64-5.34
gamma-CEHC                                           | Left; p = 0.01  | 3.03       | 2.08-8.09
hydroxyisovaleroylcarnitine (C5)                     | Left; p = 0.23  | 1.61       | 1.01-2.73
indoleacetate                                        | Left; p = 0.06  | 2.16       | 1.40-4.17
isovalerylglycine                                    | Left; p = 0.12  | 1.86       | 1.09-3.14
lactate                                              | Right; p = 0.06 | 2.64       | 1.23-4.64
N1-Methyl-2-pyridone-5-carboxamide                   | Left; p = 0.23  | 1.61       | 0.98-2.73
p-cresol sulfate                                     | Left; p < 0.01  | 3.69       | 1.94-6.68
pantothenate (Vitamin B5)                            | Right; p = 0.06 | 2.64       | 1.58-7.04
phenylacetylglutamine                                | Left; p = 0.06  | 2.16       | 1.38-4.03
pipecolate                                           | Right; p < 0.01 | 4.91       | 1.79-15.32
xanthine                                             | Right; p = 0.15 | 2.08       | 1.25-4.92
hydroxy-chlorothalonil                               | Right; p < 0.01 | 4.94       | 2.77-17.71
octenoylcarnitine                                    | Left; p = 0.01  | 3.03       | 1.84-7.31
1,5-anhydroglucitol (1,5-AG)                         | Left; p = 0.01  | 3.03       | 1.76-6.44

C. Exemplary metabolites with tail enrichment predictive of DD

Metabolite                    | Tail Effect     | Odds Ratio | Confidence Interval (90%)
3-(3-hydroxyphenyl)propionate | Left; p < 0.01  | 0.36       | 0.24-0.62
3-indoxyl sulfate             | Right; p = 0.1  | 0.52       | 0.32-0.91
isovalerylglycine             | Right; p = 0.01 | 0.33       | 0.19-0.66
p-cresol sulfate              | Right; p < 0.01 | 0.28       | 0.17-0.50
phenylacetylglutamine         | Right; p < 0.01 | 0.20       | 0.15-0.46
pipecolate                    | Left; p = 0.30  | 0.69       | 0.40-0.95
xanthine                      | Left; p = 0.01  | 0.40       | 0.28-0.70
3-hydroxyhippurate            | Left; p = 0.02  | 0.45       | 0.29-0.71
[0156] In some embodiments, at least two metabolites for analysis
are selected from the group consisting of phenylacetylglutamine,
xanthine, octenoylcarnitine, p-cresol sulfate, isovalerylglycine,
gamma-CEHC, indoleacetate, pipecolate, 1,5-anhydroglucitol
(1,5-AG), lactate, 3-(3-hydroxyphenyl)propionate, 3-indoxyl
sulfate, pantothenate (Vitamin B5), hydroxy-chlorothalonil, and
combinations thereof.
[0157] In some embodiments, at least three metabolites for analysis
are selected from the group consisting of phenylacetylglutamine,
xanthine, octenoylcarnitine, p-cresol sulfate, isovalerylglycine,
gamma-CEHC, indoleacetate, pipecolate, 1,5-anhydroglucitol
(1,5-AG), lactate, 3-(3-hydroxyphenyl)propionate, 3-indoxyl
sulfate, pantothenate (Vitamin B5), hydroxy-chlorothalonil, and
combinations thereof.
[0158] In some embodiments, information on the lack of a tail
effect for a particular set of metabolites is used for risk
assessment. In some embodiments, a lack of tail effects is
determined to provide a null result (i.e., no information as
opposed to negative information). In some embodiments, a lack of
tail effects is determined to be indicative of one classification
over another (e.g., more indicative of DD over ASD).
[0159] In some embodiments, the distribution curve is asymmetrical,
or non-Gaussian. In some embodiments, the distribution curve does
not follow a parametric distribution pattern.
[0160] In some embodiments, information from mean differences
(e.g., mean shifts) is combined with tail effect information for
risk assessment. In some embodiments, information from mean
differences is used for risk assessment without use of tail effect
information.
[0161] In some embodiments, analysis of metabolites is combined
with other types of information, e.g., genetic information,
demographic information, and/or behavior assessment to determine a
subject's risk for ASD or other disorders.
[0162] In some embodiments, ASD risk-assessment is performed based
at least in part on measured amounts of certain metabolites in a
biological sample (e.g., blood, plasma, urine, saliva, stool)
obtained from a subject, where the certain metabolites are found
herein to exhibit "tail effects." It has been found by the
inventors that there is not necessarily a statistically significant
mean shift between two populations associated with a tail effect.
Thus, a tail effect is a specific phenomenon distinct from mean
shift.
[0163] In certain embodiments, a particular metabolite exhibits a
right tail effect indicative of ASD over a non-ASD population
(e.g., a DD population) when the metabolite is characterized as
follows:
[0164] a non-ASD population distribution curve is established for
the metabolite in a non-ASD population (e.g., a DD population) with
x-axis indicative of the level of the first metabolite and y-axis
indicative of corresponding population;
[0165] an ASD population distribution curve is established for the
metabolite in an ASD population with x-axis indicative of the level
of the first metabolite and y-axis indicative of corresponding
population; and
[0166] the non-ASD population distribution curve and the ASD
population distribution curve are characterized in that one or both
of (A) and (B) hold(s):
[0167] (A) the ratio of (i) area under the ASD population
distribution curve for x>level n of the metabolite to (ii) area
under the non-ASD population distribution curve for x>level n of
the metabolite is greater than 150%
(e.g., >200%, >300%, >500%, >1000%, etc.), thereby providing
predictive utility for differentiating between an ASD
classification and a non-ASD classification for samples having
>level n of the metabolite, and
[0168] (B) where n' is the minimum threshold metabolite level
corresponding to the top decile (or, any cutoff from about 5% to
about 20%) of combined non-ASD and ASD populations used to create
the distribution curves, then for an unknown sample (e.g. a random
sample selected from a population having an equal number of ASD and
non-ASD members) having a metabolite level of at least n', the odds
of the sample being ASD as opposed to non-ASD are no less than
1.6:1 (e.g., no less than 2:1, no less than 3:1, no less than 4:1,
no less than 5:1, no less than 6:1, no less than 7:1, no less than
8:1, no less than 9:1, or no less than 10:1) (e.g., where p<0.3,
p<0.2, p<0.1, p<0.05, p<0.03, or p<0.01, e.g., statistically
significant classification), thereby providing predictive utility
for differentiating between an ASD classification and a non-ASD
classification for samples having >level n' of the metabolite.
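Criteria (A) and (B) for a right tail effect indicative of ASD can be sketched as follows, purely by way of illustration. The sketch assumes that each distribution curve is normalized so that the area under the curve beyond a threshold equals the fraction of that population's samples beyond the threshold, and that, for a population with equal numbers of ASD and non-ASD members, the odds in (B) equal the ratio of those fractions. All function names and default values other than the 150% and 1.6:1 limits and the top-decile cutoff are assumptions.

```python
import math


def fraction_above(levels, threshold, inclusive=False):
    """Fraction of samples above the threshold (area under a normalized curve)."""
    if inclusive:
        hits = sum(1 for x in levels if x >= threshold)
    else:
        hits = sum(1 for x in levels if x > threshold)
    return hits / len(levels)


def ratio(numerator, denominator):
    """Ratio that reports infinity when only the numerator is non-zero."""
    if denominator == 0:
        return math.inf if numerator > 0 else math.nan
    return numerator / denominator


def top_fraction_threshold(asd, non_asd, top=0.10):
    """n': smallest level within the top `top` fraction of the combined samples."""
    combined = sorted(list(asd) + list(non_asd), reverse=True)
    k = max(1, int(len(combined) * top))
    return combined[k - 1]


def right_tail_effect_for_asd(asd, non_asd, n, min_ratio=1.5, min_odds=1.6):
    """Evaluate criteria (A) and (B); a tail effect requires one or both."""
    # (A) ratio of tail areas above level n, ASD over non-ASD (> 150%).
    area_ratio = ratio(fraction_above(asd, n), fraction_above(non_asd, n))
    # (B) odds of ASD for a level of at least n', assuming balanced classes.
    n_prime = top_fraction_threshold(asd, non_asd)
    odds = ratio(fraction_above(asd, n_prime, inclusive=True),
                 fraction_above(non_asd, n_prime, inclusive=True))
    return {"A": area_ratio > min_ratio, "B": odds >= min_odds,
            "area_ratio": area_ratio, "n_prime": n_prime, "odds": odds}
```

The left tail criteria and the criteria indicative of non-ASD follow the same pattern with the inequality direction or the roles of the two populations exchanged.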
[0169] In certain embodiments, a particular metabolite exhibits a
left tail effect indicative of ASD over a non-ASD population (e.g.,
a DD population) when the metabolite is characterized as follows:
[0170] a non-ASD population distribution curve is established for
the metabolite in a non-ASD population (e.g., a DD population) with
x-axis indicative of the level of the first metabolite and y-axis
indicative of corresponding population;
[0171] an ASD population distribution curve is established for the
metabolite in an ASD population with x-axis indicative of the level
of the first metabolite and y-axis indicative of corresponding
population; and
[0172] the non-ASD population distribution curve and the ASD
population distribution curve are characterized in that one or both
of (A) and (B) hold(s):
[0173] (A) the ratio of (i) area under the ASD population
distribution curve for x<level m of the metabolite to (ii) area
under the non-ASD population distribution curve for x<level m of
the metabolite is greater than 150%
(e.g., >200%, >300%, >500%, >1000%, etc.), thereby providing
predictive utility for differentiating between an ASD
classification and a non-ASD classification for samples having
<level m of the metabolite, and
[0174] (B) where m' is the maximum threshold metabolite level
corresponding to the bottom decile (or, any cutoff from about 5% to
about 20%) of combined non-ASD and ASD populations used to create
the distribution curves, then for an unknown sample (e.g. a random
sample selected from a population having an equal number of ASD and
non-ASD members) having a metabolite level of less than m', the
odds of the sample being ASD as opposed to non-ASD are no less than
1.6:1 (e.g., no less than 2:1, no less than 3:1, no less than 4:1,
no less than 5:1, no less than 6:1, no less than 7:1, no less than
8:1, no less than 9:1, or no less than 10:1) (e.g., where p<0.3,
p<0.2, p<0.1, p<0.05, p<0.03, or p<0.01, e.g., statistically
significant classification), thereby providing predictive utility
for differentiating between an ASD classification and a non-ASD
classification for samples having <level m' of the metabolite.
[0175] In certain embodiments, a particular metabolite exhibits a
right tail effect indicative of non-ASD (e.g., DD) over an ASD
population when the metabolite is characterized as follows:
[0176] a non-ASD population distribution curve is established for
the metabolite in a non-ASD population (e.g., a DD population) with
x-axis indicative of the level of the first metabolite and y-axis
indicative of corresponding population;
[0177] an ASD population distribution curve is established for the
metabolite in an ASD population with x-axis indicative of the level
of the first metabolite and y-axis indicative of corresponding
population; and
[0178] the non-ASD population distribution curve and the ASD
population distribution curve are characterized in that one or both
of (A) and (B) hold(s):
[0179] (A) the ratio of (i) area under the non-ASD population
distribution curve for x>level n of the metabolite to (ii) area
under the ASD population distribution curve for x>level n of the
metabolite is greater than 150%
(e.g., >200%, >300%, >500%, >1000%, etc.), thereby providing
predictive utility for differentiating between a non-ASD
classification and an ASD classification for samples having
>level n of the metabolite, and
[0180] (B) where n' is the minimum threshold metabolite level
corresponding to the top decile (or, any cutoff from about 5% to
about 20%) of combined non-ASD and ASD populations used to create
the distribution curves, then for an unknown sample (e.g. a random
sample selected from a population having an equal number of ASD and
non-ASD members) having a metabolite level of greater than n', the
odds of the sample being non-ASD as opposed to ASD are no less than
1.6:1 (e.g., no less than 2:1, no less than 3:1, no less than 4:1,
no less than 5:1, no less than 6:1, no less than 7:1, no less than
8:1, no less than 9:1, or no less than 10:1) (e.g., where p<0.3,
p<0.2, p<0.1, p<0.05, p<0.03, or p<0.01, e.g., statistically
significant classification), thereby providing predictive utility
for differentiating between a non-ASD classification and an ASD
classification for samples having >level n' of the metabolite.
[0181] In certain embodiments, a particular metabolite exhibits a
left tail effect indicative of non-ASD (e.g., DD) over an ASD
population when the metabolite is characterized as follows:
[0182] a non-ASD population distribution curve is established for
the metabolite in a non-ASD population (e.g., a DD population) with
x-axis indicative of the level of the first metabolite and y-axis
indicative of corresponding population;
[0183] an ASD population distribution curve is established for the
metabolite in an ASD population with x-axis indicative of the level
of the first metabolite and y-axis indicative of corresponding
population; and
[0184] the non-ASD population distribution curve and the ASD
population distribution curve are characterized in that one or both
of (A) and (B) hold(s):
[0185] (A) the ratio of (i) area under the non-ASD population
distribution curve for x<level m of the metabolite to (ii) area
under the ASD population distribution curve for x<level m of the
metabolite is greater than 150%
(e.g., >200%, >300%, >500%, >1000%, etc.), thereby providing
predictive utility for differentiating between a non-ASD
classification and an ASD classification for samples having
<level m of the metabolite, and
[0186] (B) where m' is the maximum threshold metabolite level
corresponding to the bottom decile (or, any cutoff from about 5% to
about 20%) of combined non-ASD and ASD populations used to create
the distribution curves, then for an unknown sample (e.g. a random
sample selected from a population having an equal number of ASD and
non-ASD members) having a metabolite level of less than m', the
odds of the sample being non-ASD as opposed to ASD are no less than
1.6:1 (e.g., no less than 2:1, no less than 3:1, no less than 4:1,
no less than 5:1, no less than 6:1, no less than 7:1, no less than
8:1, no less than 9:1, or no less than 10:1) (e.g., where p<0.3,
p<0.2, p<0.1, p<0.05, p<0.03, or p<0.01, e.g., statistically
significant classification), thereby providing predictive utility
for differentiating between a non-ASD classification and an ASD
classification for samples having <level m' of the metabolite.
[0187] In certain embodiments, a risk assessment is performed using
a plurality of metabolites that exhibit tail effects. It has been
observed that, for assessment of ASD, there are particular groups
of metabolites (e.g., two or more metabolites) which provide
complementary diagnostic/risk assessment information. For example,
ASD-positive individuals who are identifiable by analysis of the
level of a first metabolite (e.g., individuals within an identified
tail of the first metabolite) are not the same ASD-positive
individuals who are identifiable by analysis of a second metabolite
(or there may be a low, non-zero degree of overlap). The tail of a
first metabolite is predictive of certain ASD individuals, while
the tail of the second metabolite is predictive of other ASD
individuals. Without wishing to be bound to a particular theory,
this discovery may be reflective of the multi-faceted nature of
ASD itself.
[0188] Thus, in certain embodiments, the risk assessment method
includes identifying whether a subject falls within any of a
multiplicity of identified metabolite tails involving a plurality
of metabolites, e.g., where the predictors of the different
metabolite tails are at least partially disjoint, e.g., they have
low mutual information, such that risk prediction improves as
multiple metabolites with low mutual information are
incorporated.
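One way to quantify whether two metabolite tails carry overlapping or complementary information is to compute the mutual information between their binary tail indicators across subjects. The following is an illustrative sketch only; the function name and the 0/1 encoding of tail membership are assumptions, and no particular estimator is prescribed herein.

```python
from collections import Counter
from math import log2


def mutual_information(tail_a, tail_b):
    """Mutual information, in bits, between two binary tail indicators.

    tail_a[i] and tail_b[i] are 1 if subject i falls within the tail of
    metabolite A or B, respectively, and 0 otherwise (assumed encoding).
    """
    if len(tail_a) != len(tail_b) or not tail_a:
        raise ValueError("indicators must be non-empty and of equal length")
    n = len(tail_a)
    joint = Counter(zip(tail_a, tail_b))
    count_a = Counter(tail_a)
    count_b = Counter(tail_b)
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a, b) * log2( p(a, b) / (p(a) * p(b)) )
        mi += (c / n) * log2(c * n / (count_a[a] * count_b[b]))
    return mi
```

Values near zero indicate that the two tails identify largely independent sets of subjects, which is the situation in which combining the metabolites is expected to be most informative.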
EXAMPLES
Subjects
[0189] Blood samples were collected from subjects between the ages
of 18 and 60 months who were referred to nineteen developmental
evaluation centers for evaluation of a possible developmental
disorder other than isolated motor problems. Informed consent was
obtained for all subjects. Subjects with a prior diagnosis of ASD
from a clinic specialized in pediatric development evaluation or
who were unable or unwilling to complete study procedures were
excluded from the study.
[0190] The subjects are those who enrolled in the SynapDx Autism
Spectrum Disorder Gene Expression Analysis (STORY) study. The STORY
study was performed in accordance with current ICH guidelines on
Good Clinical Practice (GCP), and applicable regulatory
requirements. GCP is an international ethical and scientific
quality standard for designing, conducting, recording, and
reporting studies that involve the participation of human subjects.
Compliance with this standard provides public assurance that the
rights, safety, and wellbeing of study subjects are protected,
consistent with the principles that have originated in the
Declaration of Helsinki and that the clinical study data are
credible.
[0191] Results shown in FIGS. 1 to 12 are based on 180 blood
samples from males in the STORY study. The sample set included 122
ASD samples, and 58 DD (non-ASD) samples. ASD diagnosis followed
DSM-V diagnostic criteria. Additional results are based on a
broader set of 299 blood samples from male subjects in the STORY
study. The broader sample set included 198 ASD samples and 101 DD
samples.
[0192] For all tests, approximately 3 mL blood samples were
collected in EDTA tubes, and plasma was prepared by centrifuging
the tubes. The plasma was then frozen and shipped to a laboratory
for analysis. At the laboratory, methanol extraction of the samples
was conducted, and the extracts were analyzed by an optimized
ultrahigh performance liquid or gas chromatography/tandem mass
spectrometry (UHPLC/MS/MS or GC/MS/MS) method (See, for example,
Anal. Chem., 2009, 81, 6656-6667).
Data Analysis
[0193] Metabolites in blood samples were quantified for both male
and female subjects. Samples were assayed for levels of metabolites
and quantified as a concentration in arbitrary units normalized to
a median concentration for all samples measured on a given day. For
example, a unit of greater than 1 refers to a quantity of
metabolite that is greater than the median of samples for the day,
and a unit of less than 1 refers to a quantity that is less than
the median. A cross-validation was then carried out, where samples
were randomly divided into non-overlapping training/testing sets on
which the unbiased performance of machine learning classifiers was
evaluated. Twenty-one metabolites have been identified that are
highly informative individually and collectively for predicting
ASD, particularly in male subjects.
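The normalization and cross-validation steps described above can be sketched as follows. This is an illustration under stated assumptions: the tuple layout of `measurements`, the function names, the use of k equally sized folds, and the fixed random seed are all hypothetical choices and do not reflect the specific procedure used in the study.

```python
import random
from collections import defaultdict
from statistics import median


def normalize_by_run_day(measurements):
    """Express each raw level relative to the median of its run day.

    measurements: list of (sample_id, run_day, raw_level) tuples.
    Returns {sample_id: level}, where 1.0 equals that day's median.
    """
    by_day = defaultdict(list)
    for _, day, raw in measurements:
        by_day[day].append(raw)
    day_median = {day: median(vals) for day, vals in by_day.items()}
    return {sid: raw / day_median[day] for sid, day, raw in measurements}


def cross_validation_folds(n_samples, k, seed=0):
    """Yield (train, test) index lists that never overlap."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test
```

A classifier would be fit on each `train` list and scored on the corresponding `test` list, so that reported performance is not computed on samples used for fitting.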
Example 1
Discerning Metabolite Level Information
[0194] This example shows that valuable information for risk
assessment for ASD can be discerned from identification and
analysis of tail effects in a sample distribution that would
otherwise be missed by traditional analyses (e.g., mean shift-based
analysis).
[0195] Once a metabolite level is determined, there are multiple
ways to implement the information for risk assessment, including
mean shifts and tail effects. Singularly, mean shifts were found to
provide some, but not optimal, predictive information. An exemplary
mean shift is shown in FIG. 1. In this figure, the ASD distribution
shifts to the right of the non-ASD distribution (DD).
[0196] In addition to traditional mean shift analysis, the
inventors discerned additional information from the samples.
Metabolite distribution curves were plotted for ASD and non-ASD
(here, DD) samples, and it was discovered that for a subset of
metabolites measured, samples from either the ASD or the DD
population were enriched in a right (upper) or left (lower) tail
(i.e., a tail effect). A representative tail effect is shown in
FIG. 2. Notably, the two distributions shared nearly identical mean
values (i.e., there was minimal or no mean shift). Thus, the
predictive value of the metabolite would not be discernible from
traditional analysis of mean shifts.
[0197] Metabolites may exhibit a right (upper) tail effect, or a
left (lower) tail effect, or both. ASD and non-ASD (here, DD)
distribution curves for a representative metabolite, 5-HIAA, are
shown in FIG. 3. A clear right tail effect is observed, i.e., the
ASD distribution has a larger area under the curve in the right
tail. Thus, it is
demonstrated that samples with high levels of this metabolite are
highly enriched with ASD-population members. With this metabolite,
both the mean shift (indicated by t-test value) and the right tail
(indicated by 'extremes' Fisher test value) are statistically
significant.
[0198] ASD and non-ASD (here, DD) distribution curves for another
illustrative metabolite, gamma-CEHC, are shown in FIG. 4. A clear
left tail effect is observed, i.e., the ASD distribution has a
larger area under the curve in the left tail. Thus, it is
demonstrated that samples
with low levels of this metabolite are highly enriched with
ASD-population members. With this metabolite, the mean shift
(indicated by t-test value) is not statistically significant, while
the left tail is statistically significant.
[0199] These data illustrate that identification and analysis of
tail effects provide additional information for risk assessment
that cannot be obtained via traditional mean shift analysis.
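The distinction between a mean shift and a tail effect can be illustrated with a small synthetic data set. The values below are invented solely for illustration and are not study data; the names `SYNTHETIC_ASD`, `SYNTHETIC_DD`, `mean_shift`, and `tail_fraction` are likewise assumptions.

```python
from statistics import fmean

# Synthetic, illustrative levels only: both groups have a mean of 1.0,
# but the first group has more samples at both extremes.
SYNTHETIC_ASD = [0.4, 0.5, 1.0, 1.0, 1.0, 1.0, 1.5, 1.6]
SYNTHETIC_DD = [0.8, 0.9, 1.0, 1.0, 1.0, 1.0, 1.1, 1.2]


def mean_shift(group_a, group_b):
    """Difference between group means (the traditional comparison)."""
    return fmean(group_a) - fmean(group_b)


def tail_fraction(levels, threshold, side):
    """Fraction of a group lying beyond the threshold on the given side."""
    if side == "right":
        hits = sum(1 for x in levels if x > threshold)
    elif side == "left":
        hits = sum(1 for x in levels if x < threshold)
    else:
        raise ValueError("side must be 'left' or 'right'")
    return hits / len(levels)
```

For these synthetic values the mean shift is essentially zero, whereas a quarter of the first group and none of the second group lie above a level of 1.3, so only the tail comparison separates the groups.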
Example 2
Strong Prediction of ASD from Selected Metabolites Demonstrating
Tail Effects
[0200] This example illustrates the assessment of tail effects for
prediction of ASD. The inventors identified statistically
significant tail effects for a number of metabolites in samples
obtained from male subjects. The tail effects were singly and
cumulatively informative about which population the subject
belonged to--i.e., the ASD population or the DD population. Table 1
shows an exemplary panel of twenty-one metabolites exhibiting ASD
vs. DD tail effects with high predictive power.
[0201] Table 1B shows metabolites of the 21-metabolite panel that
have tail effects predictive of ASD. The statistical significance
(p-value) of each tail effect as well as its location on a
distribution curve (i.e., left tail effect or right tail effect) is
indicated. An odds ratio of greater than one indicates predictive
power for ASD. For example, 5HIAA has a right tail with an odds
ratio of 4.91, indicating that in the STORY study data set (in
which the ratio of ASD to DD samples was 2:1), approximately 10 ASD
samples for every DD sample were in the right tail. The confidence
intervals were estimated by bootstrap methods. One thousand
individual bootstraps were generated from the STORY data by
resampling with replacement. For each bootstrap, the position of
the tail and the corresponding odds ratio were determined. The 90%
confidence interval was calculated from the distribution of
observed odds ratios.
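The odds ratio and bootstrap confidence interval described above can be sketched as follows. The sketch assumes that the reported odds ratio is the ratio of the fraction of ASD samples in the tail to the fraction of DD samples in the tail, which is consistent with the 5HIAA example (4.91 multiplied by the 2:1 sample ratio gives approximately 10 ASD samples per DD sample). It further assumes that resampling is performed within each group and that the tail position is the 90th percentile of the pooled resample; these details, and all function names, are assumptions rather than a description of the actual analysis.

```python
import math
import random


def percentile(values, q):
    """q-th percentile (0-100), linear interpolation between order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def normalized_tail_odds(asd, dd, threshold):
    """Right-tail odds of ASD versus DD, corrected for unequal group sizes."""
    asd_frac = sum(1 for x in asd if x > threshold) / len(asd)
    dd_frac = sum(1 for x in dd if x > threshold) / len(dd)
    if dd_frac == 0:
        return math.inf if asd_frac > 0 else math.nan
    return asd_frac / dd_frac


def raw_tail_ratio(odds_ratio, asd_to_dd_sample_ratio):
    """Expected ASD samples per DD sample in the tail of the actual data set."""
    return odds_ratio * asd_to_dd_sample_ratio


def bootstrap_odds_ci(asd, dd, tail_q=90.0, n_boot=1000, seed=0):
    """90% interval of the odds ratio over bootstrap resamples."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_boot):
        a = rng.choices(asd, k=len(asd))  # resample with replacement
        d = rng.choices(dd, k=len(dd))
        threshold = percentile(a + d, tail_q)  # tail position per resample
        odds = normalized_tail_odds(a, d, threshold)
        if not math.isnan(odds):  # skip resamples with an empty tail
            ratios.append(odds)
    if not ratios:
        raise ValueError("no usable bootstrap resamples")
    ratios.sort()
    m = len(ratios)
    return ratios[m * 5 // 100], ratios[max(m * 95 // 100 - 1, 0)]
```

The interval is read from the 5th and 95th percentiles of the sorted bootstrap odds ratios, which together bound the central 90% of the observed values.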
[0202] Based on these criteria, nineteen metabolites of the
21-metabolite panel were found to be predictive of ASD.
[0203] Table 1C shows metabolites having tail effects that are
predictive of DD. The statistical significance (p-value) of each
tail effect as well as its location on a distribution curve (i.e.,
left tail effect or right tail effect) is indicated. An odds ratio
of less than one indicates predictive power for DD. Based on these
criteria, eight metabolites of the 21-metabolite panel were found
to be predictive of DD. The odds ratios and 90% confidence
intervals were determined as described for ASD, taking into account
the 1:2
ratio of DD to ASD samples in the STORY study.
[0204] Notably, certain metabolites demonstrate a single tail
effect (either left or right) with predictive power for either ASD
or DD, whereas other metabolites demonstrate both a left and right
tail effect, together providing predictive power for both ASD and
DD. For example, phenylacetylglutamine and p-cresol sulfate
demonstrate both right and left tail effects.
[0205] The tail effects of the 21 metabolites listed in Table 1 are
shown individually in the graphs of FIGS. 13A to 13U. For each
graph, distributions of one metabolite in both the ASD and DD
populations are shown. The legend at the top of each panel shows
the statistical significance of the left and right tails for the
metabolite (p-value generated by Fisher's test).
[0206] Some metabolites, e.g., phenylacetylglutamine, exhibit mean
shifts and tail effects. As shown in FIG. 5, phenylacetylglutamine
exhibits a statistically significant mean shift (t-test; p=0.001),
and statistically significant left and right tail effects between
the two populations ('extremes' signifies tail effect; Fisher's
test; p=0.0001). The distributions appear as shifted Gaussian
curves between the ASD and DD populations.
[0207] Table 2 shows threshold values used to determine the tail
effects for the 21-metabolite panel, based on the underlying
population distribution of each metabolite in the ASD and non-ASD
populations. Illustratively, the upper threshold value corresponds
to the 90.sup.th percentile distribution, while the lower threshold
value corresponds to the 15.sup.th percentile distribution. The
absolute measurements of the threshold values (e.g., ng/mL, nM,
etc.) can be calculated by using values in Table 2 with average
concentrations of the metabolites in a population.
TABLE-US-00002 TABLE 2

Threshold levels for left tail (at or below 15.sup.th percentile)
and right tail (at or above 90.sup.th percentile) of metabolite
distribution curve

Metabolite                                           | Left tail cut-off | Right tail cut-off
1,5-anhydroglucitol (1,5-AG)                         | 0.680             | 1.561
3-(3-hydroxyphenyl)propionate                        | 0.270             | 3.462
3-carboxy-4-methyl-5-propyl-2-furanpropanoate (CMPF) | 0.396             | 13.734
3-indoxyl sulfate                                    | 0.584             | 1.601
4-ethylphenyl sulfate                                | 0.281             | 4.054
5-hydroxyindoleacetate                               | 0.729             | 2.027
8-hydroxyoctanoate                                   | 0.711             | 1.411
gamma-CEHC                                           | 0.505             | 2.199
hydroxyisovaleroylcarnitine (C5)                     | 0.619             | 1.767
indoleacetate                                        | 0.707             | 1.690
isovalerylglycine                                    | 0.438             | 3.182
lactate                                              | 0.801             | 1.288
N1-Methyl-2-pyridone-5-carboxamide                   | 0.554             | 2.254
p-cresol sulfate                                     | 0.378             | 2.231
pantothenate (Vitamin B5)                            | 0.675             | 1.980
phenylacetylglutamine                                | 0.498             | 2.305
pipecolate                                           | 0.651             | 1.711
xanthine                                             | 0.731             | 1.507
hydroxy-chlorothalonil                               | 0.597             | 2.645
octenoylcarnitine                                    | 0.479             | 2.214
3-hydroxyhippurate                                   | 0.375             | 3.651
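By way of illustration, the Table 2 cut-offs may be applied to a sample's normalized metabolite levels as sketched below. Only three rows of Table 2 are reproduced for brevity. The dictionary layout and function names are assumptions, as is the conversion to absolute units by simple multiplication with a population average concentration, which is one plausible reading of the calculation mentioned above.

```python
# (left cut-off, right cut-off) in normalized units, copied from Table 2.
TABLE_2_CUTOFFS = {
    "5-hydroxyindoleacetate": (0.729, 2.027),
    "gamma-CEHC": (0.505, 2.199),
    "lactate": (0.801, 1.288),
}


def tails_for_sample(sample, cutoffs=TABLE_2_CUTOFFS):
    """Map each metabolite of a sample to "left" or "right" if in a tail.

    sample: {metabolite: normalized level}. Cut-offs are inclusive,
    following the "at or below" / "at or above" wording of Table 2.
    """
    tails = {}
    for metabolite, level in sample.items():
        if metabolite not in cutoffs:
            continue
        left, right = cutoffs[metabolite]
        if level <= left:
            tails[metabolite] = "left"
        elif level >= right:
            tails[metabolite] = "right"
    return tails


def absolute_cutoff(relative_cutoff, average_concentration):
    """Assumed conversion of a normalized cut-off to absolute units."""
    return relative_cutoff * average_concentration
```

Metabolites whose levels fall between the two cut-offs are omitted from the returned mapping, reflecting the absence of a tail designation.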
Example 3
Predicting ASD with Multiple Metabolites
[0208] The information provided by multiple metabolites (e.g.,
those listed in Table 1) can be used individually or as a group to
assist in disease risk prediction. Particularly informative sets of
metabolites include members that do not correlate to each other
well and have low collinearity (i.e. low mutuality). For example,
FIG. 6 shows 5HIAA levels compared against gamma-CEHC levels
demonstrating a lack of correlation between informative levels of
the two metabolites. For example, the ASD individuals identified in
the tail of 5HIAA (FIG. 3) are generally not the same ASD
individuals identified in the tail of gamma-CEHC. Thus, the
metabolites 5HIAA and gamma-CEHC are deemed to provide
complementary information. Tail enriched metabolites with low
mutuality provide complementary classification information.
[0209] FIG. 7 is a chart indicating, for each of the 180 samples,
whether the sample was within a tail or not within a tail of each
of the metabolites of a 12-metabolite panel. In this exemplary
panel, tails for two metabolites, xanthine and p-cresol sulfate,
are predictive of non-ASD (e.g., DD), while tails for the other ten
metabolites are predictive of ASD.
[0210] When multiple metabolites are assessed, the number of
combinations of the aggregated tail effect counts increases, as does
the potential aggregated tail effect count. The distribution of
aggregated tail effect counts from ASD and from non-ASD populations
can be plotted and the resulting distribution can be used to
determine suitable separation between ASD and non-ASD when an
unknown sample is measured. As shown in FIG. 8A, ASD and non-ASD
(here, DD) samples can be further analyzed by employing a voting
(e.g., binning) scheme to further utilize the complementary
information provided by the metabolites for which a tail effect was
observed. Data for a total of 12 metabolites are shown. In one
particular scheme, for a given sample, the number of metabolites
for which the sample fell within an ASD-predictive tail was summed,
as was the number of metabolites for which the sample fell within a
non-ASD (here, DD)-predictive tail. These two values are shown
plotted as x- and y-coordinates (FIG. 8A). Notably, as the number
of ASD enriched metabolites increased (higher in y-axis) and as the
number of non-ASD enriched metabolites decreased (lower in x-axis),
there appeared to be less mixing of non-ASD dots among ASD dots,
e.g., suggesting a lower likelihood for a false positive diagnosis
for ASD. On the other hand, as the number of ASD enriched
metabolites decreased (lower in y-axis) and as the number of
non-ASD enriched metabolites increased (higher in x-axis), there
was less mixing of ASD dots among non-ASD dots, e.g., suggesting a
lower likelihood for a false positive diagnosis of DD.
[0211] The samples were divided into four different bins, shown in
FIG. 8B. The bins on the top and on the bottom right in particular
showed clear separation, facilitative of ASD or DD risk
evaluation.
[0212] Of the four bins shown in FIG. 8B, the bin most strongly
predictive of ASD included samples having 2 or more ASD-enriched
features and either 0 or 1 non-ASD enriched features. The bin
having 1 ASD-enriched feature and either 0 or 1 non-ASD enriched
features was also predictive of ASD, though less strongly than the
bin above. The bin having 1 or more non-ASD enriched features and 0
ASD-enriched features was strongly predictive of non-ASD. A bin of
samples having no ASD-enriched features and no non-ASD-enriched
features may also provide predictive information in some
circumstances.
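The binning just described can be sketched as a simple rule on two counts per sample. The bin labels and function names below are illustrative assumptions. Combinations of counts that are not described above (e.g., one or more ASD-enriched features together with two or more non-ASD-enriched features) are returned as "not described" rather than being assigned to any bin.

```python
def count_tail_features(sample_tails, asd_tails, non_asd_tails):
    """Count ASD-enriched and non-ASD-enriched tails a sample falls within.

    sample_tails: {metabolite: "left" | "right"} for the sample.
    asd_tails / non_asd_tails: sets of (metabolite, side) pairs.
    """
    observed = set(sample_tails.items())
    return len(observed & asd_tails), len(observed & non_asd_tails)


def assign_bin(asd_count, non_asd_count):
    """Assign a sample to one of the bins described in the text."""
    if asd_count >= 2 and non_asd_count <= 1:
        return "strongly ASD-predictive"
    if asd_count == 1 and non_asd_count <= 1:
        return "ASD-predictive"
    if asd_count == 0 and non_asd_count >= 1:
        return "strongly non-ASD-predictive"
    if asd_count == 0 and non_asd_count == 0:
        return "no tail features"
    return "not described"
```

The two counts correspond to the y- and x-coordinates, respectively, of the plot described above.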
[0213] In one exemplary voting scheme, votes are tallied for a
given sample, for example, with ASD-enriched metabolites scoring a
point and non-ASD-enriched metabolites subtracting a point. A
sample with a positive result (e.g., equal to or greater than 1)
may be considered ASD (or having significant risk of ASD), while a sample
with a negative result (equal to or less than -1) may be considered
non-ASD (or having a significant likelihood of non-ASD). A sample
with a zero result may be considered likely non-ASD or ASD,
depending on the distribution of ASD to non-ASD in the samples, or
may be returned as an indeterminate or "no classification result"
sample. Similarly, FIG. 8C shows vote tallying results for the
21-metabolite panel described in Table 1.
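The exemplary voting scheme can be sketched as follows. This is a minimal illustration under stated assumptions: the function names tally_votes and classify_vote are hypothetical, the inputs are per-metabolite true/false flags for a single sample, and a zero result is returned as indeterminate by default, which is only one of the options described above.

```python
from typing import Sequence


def tally_votes(in_asd_tail: Sequence[bool],
                in_non_asd_tail: Sequence[bool]) -> int:
    """Add one point per ASD-enriched tail hit and subtract one point
    per non-ASD-enriched tail hit for a single sample."""
    return sum(in_asd_tail) - sum(in_non_asd_tail)


def classify_vote(score: int, zero_label: str = "indeterminate") -> str:
    """Turn a vote tally into a call.

    zero_label is an assumption: the text allows a zero result to be
    called ASD, non-ASD, or returned with no classification.
    """
    if score >= 1:
        return "ASD"
    if score <= -1:
        return "non-ASD"
    return zero_label


# Hypothetical sample: two ASD-tail hits, one non-ASD-tail hit.
score = tally_votes([True, True, False], [False, True, False])
call = classify_vote(score)  # score is 1, so the call is "ASD"
```

Keeping the tally and the decision rule separate makes it straightforward to change how a zero result is handled without touching the counting step.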
[0214] Tail effect information may be used to differentiate a
subject having ASD or a non-ASD condition. Likewise, tail effect
information may be used to predict the risk for another disease or
condition, e.g., DD, for a subject.
[0215] For example, tail effect distribution for a non-ASD
population, e.g., DD, as shown in FIGS. 8A and 8C, can be used to
establish a reference value for the average tail effect sum for a
given number of metabolites in that population. This average value
can be used as a reference to compare to the sum of average tail
effects from a sample from an unknown subject, and can be used to
assess the subject's risk for ASD without having to obtain the
population distribution curves of metabolites in both ASD and
non-ASD populations.
[0216] Tail effect information, e.g., as described in the above
exemplary voting schemes, or similar schemes, may also be combined
with traditional mean-shift information and/or other classification
information for improved classification results.
[0217] It is demonstrated herein that the predictability of ASD
risk can be increased by analysis of combinations of certain
metabolites. For example, FIGS. 9-11, and 13-14A-D illustrate how
use of a voting scheme can increase AUC of the classifier and
improve predictive ability. Use of subsets of a 12-metabolite panel
increased ASD predictive power (y-axis) as the number of
metabolites in the subsets increased (from 1 to 12) (FIG. 9). Use
of different classifiers (i.e., logistic regression, naive Bayes,
or support vector machine (SVM)), and selection of different
features also affect the AUC (FIG. 9). FIG. 10A shows for the same
population, using a 12-metabolite panel, the trichotomized
prediction of ASD risk using different features and classifiers,
while FIG. 10B shows the results using a 21-metabolite panel. FIGS.
11A and 11B show the improvements in ASD risk prediction using
voting schemes of the 12-metabolite panel (FIG. 11A) and the
21-metabolite panel (FIG. 11B). Together, these analyses
demonstrate that by selecting targeted metabolites and using
appropriate statistical tools, a high degree of confidence for ASD
risk assessment can be achieved. For example, as shown in FIG. 12,
an AUC of at least 0.74 was obtained following the methods
described above using 12 metabolites.
Example 4
Selection of High Impact Metabolites from Metabolomics Data
[0218] Samples from ASD and DD subjects were screened for detection
of approximately 600 known metabolites (shown in Table 3). From the
initial set of 600, 84 candidate metabolites were identified to
exhibit a tail effect. A subset of the 84 metabolites detected in
the samples were elucidated and are identified by name in Table 4.
Metabolite panels (e.g., 12- and 21-metabolite panels) were selected from the
set of 84 candidate metabolites based on high individual
metabolite AUCs. Certain candidate metabolites were excluded from
panels based on factors such as an association with medication or
age.
TABLE-US-00003 TABLE 3 Four hundred sixty five (465) elucidated
metabolites of the initial set of 600 metabolites assayed glycine
N-acetylglycine sarcosine (N- Methylglycine) serine N-acetylserine
threonine N-acetylalanine aspartate asparagine glutamine
N-acetylglutamate N-acetyl-aspartyl-glutamate (NAAG)
N-acetylhistidine 1-methylhistidine 3-methylhistidine imidazole
lactate lysine N6-acetyllysine glutarate (pentanedioate)
glutaroylcarnitine (C5) 3-methylglutarylcarnitine-1 phenylalanine
N-acetylphenylalanine phenylpyruvate phenylacetylglutamine tyrosine
N-acetyltyrosine phenol sulfate p-cresol sulfate o-cresol sulfate
3-methoxytyramine sulfate 3-(3-hydroxyphenyl)propionate
3-phenylpropionate (hydrocinnamate) tryptophan N-acetyltryptophan
indolelactate 3-indoxyl sulfate kynurenine kynurenate
indoleacetylglutamine tryptophan betaine C-glycosyltryptophan
N-acetylleucine 4-methyl-2-oxopentanoate isovalerate (C5)
beta-hydroxyisovalerate hydroxyisovaleroylcarnitine (C5)
alpha-hydroxyisovalerate 3-methyl-2-oxovalerate
2-methylbutyroylcarnitine (C5) tiglyl carnitine (C5) valine
N-acetylvaline 3-methyl-2-oxobutyrate 3-hydroxyisobutyrate
alpha-hydroxyisocaproate methionine S-adenosylhomocysteine
alpha-ketobutyrate 2-aminobutyrate (SAH) S-methylcysteine taurine
arginine proline citrulline homoarginine N-delta-acetylornithine
N-methyl proline hydroxyproline creatinine acisoga
5-methylthioadenosine (MTA) 4-guanidinobutanoate glutathione,
oxidized (GSSG) cys-gly, oxidized gamma-glutamylisoleucine
gamma-glutamylleucine gamma-glutamylmethionine
gamma-glutamyltyrosine gamma-glutamylvaline N-acetylcarnosine
cyclo(gly-pro) cyclo(leu-pro) cyclo(L-phe-L-pro) isoleucylglutamine
isoleucylglycine isoleucylvaline leucylglutamate leucylglycine
leucylphenylalanine phenylalanylalanine phenylalanylarginine
phenylalanylaspartate phenylalanylleucine phenylalanylmethionine
phenylalanylphenylalanine pyroglutamylglycine pyroglutamylvaline
serylleucine tryptophylphenylalanine valylglycine valylleucine
glucose 3-phosphoglycerate pyruvate ribitol xylonate xylose
arabitol sucrose fructose mannitol glucuronate erythronate
succinylcarnitine (C4) succinate fumarate valerate (5:0) caproate
(6:0) heptanoate (7:0) caprate (10:0) laurate (12:0) 5-dodecenoate
(12:1n7) 2-hydroxyglutarate suberate (octanedioate) azelate
(nonanedioate; C9) dodecanedioate (C12) tetradecanedioate (C14)
hexadecanedioate (C16) 3-carboxy-4-methyl-5-propyl-
2-aminoheptanoate 2-aminooctanoate 2-furanpropanoate (CMPF)
propionylcarnitine (C3) propionylglycine (C3) N-octanoylglycine
hydroxybutyrylcarnitine valerylcarnitine (C5) hexanoylcarnitine
(C6) cis-4-decenoyl carnitine laurylcarnitine (C12)
myristoylcarnitine linoleoylcarnitine oleoylcarnitine (C18)
deoxycarnitine 3-hydroxybutyrate (BHBA) alpha-hydroxycaproate
2-hydroxyoctanoate 2-hydroxystearate 3-hydroxypropanoate
3-hydroxyoctanoate 5-hydroxyhexanoate 8-hydroxyoctanoate
16-hydroxypalmitate oleic ethanolamide palmitoyl ethanolamide
N-oleoyltaurine myo-inositol scyllo-inositol choline
1-myristoyl-GPC (14:0) 2-myristoyl-GPC (14:0) 1-
pentadecanoylglycerophosphocholine (15:0) 1-palmitoleoyl-GPC (16:1)
2-palmitoleoyl-GPC (16:1) 1-heptadecanoyl-GPC (17:0) 1-oleoyl-GPC
(18:1) 2-oleoyl-GPC (18:1) 1-linoleoyl-GPC (18:2) 1-
1-eicosadienoyl-GPC (20:2) 1-arachidoyl-GPC (20:0)
nonadecanoylglycerophosphocholine (19:0) 2-eicosatrienoyl-GPC
(20:3) 1-arachidonoyl-GPC (20:4) 2-arachidonoyl-GPC (20:4)
1-docosapentaenoyl-GPC 1-docosahexaenoyl-GPC (22:6) 1- (22:5n6)
palmitoylplasmenylethanolamine 1-palmitoyl-GPE (16:0)
2-palmitoyl-GPE (16:0) 1-stearoyl-GPE (18:0) 2-oleoyl-GPE (18:1)
1-linoleoyl-GPE (18:2) 2-linoleoyl-GPE (18:2) 1- 1- 1-palmitoyl-GPI
(16:0) eicosatrienoylglycerophosphoethanolamine
docosahexaenoylglycerophosphoethanolamine 1-linoleoyl-GPI (18:2)
1-arachidonoyl-GPI (20:4) 1- arachidonoylglyercophosphate glycerol
glycerol 3-phosphate (G3P) 1-myristoylglycerol (14:0)
1-oleoylglycerol (18:1) 1-linoleoylglycerol (18:2) sphinganine
lathosterol cholesterol 7-beta-hydroxycholesterol
21-hydroxypregnenolone 5alpha-pregnan-3beta,20beta-diol
5alpha-pregnan- disulfate monosulfate 1 3beta,20alpha-diol
monsulfate 2 cortisol corticosterone cortisone epiandrosterone
sulfate androsterone sulfate 4-androsten-3alpha,17alpha- diol
monosulfate 3 5alpha-androstan- cholate glycocholate
3beta,17beta-diol disulfate taurochenodeoxycholate
tauro-beta-muricholate deoxycholate ursodeoxycholate
glycoursodeoxycholate tauroursodeoxycholate glycocholenate sulfate
taurocholenate sulfate 7-ketodeoxycholate xanthine xanthosine urate
AMP adenosine 3',5'-cyclic adenosine monophosphate (cAMP)
N6-methyladenosine N6-carbamoylthreonyladenosine guanosine
N2,N2-dimethylguanosine uridine pseudouridine 3-ureidopropionate
beta-alanine N-acetyl-beta-alanine 5,6-dihydrothymine
3-aminoisobutyrate nicotinamide N1-Methyl-2-pyridone-5- adenosine
5'-diphosphoribose riboflavin (Vitamin B2) carboxamide (ADP-ribose)
threonate arabonate alpha-tocopherol gamma-CEHC glucuronide heme
bilirubin pyridoxate hippurate 2-hydroxyhippurate (salicylurate)
benzoate catechol sulfate O-methylcatechol sulfate 4-methylcatechol
sulfate 4-ethylphenyl sulfate 4-vinylphenol sulfate theobromine
theophylline 1-methylurate 7-methylxanthine 2-piperidinone
levulinate (4-oxovalerate) gluconate cinnamoylglycine
dihydroferulic acid methyl indole-3-acetate N-(2-furoyl)glycine
piperine 4-allylphenol sulfate methyl glucopyranoside (alpha +
tartronate beta) (hydroxymalonate) 6-oxopiperidine-2-carboxylic
hydroquinone sulfate salicylate acid O-sulfo-L-tyrosine
2-aminophenol sulfate 2-ethylhexanoic acid EDTA glycerol
2-phosphate glycolate (hydroxyacetate) pyroglutamylglutamine
betaine phenylalanylglycine threonylleucine alanine
phenylalanyltryptophan 1,5-anhydroglucitol (1,5-AG) glutamate
serylphenyalanine glycerate histidine valylvaline threitol
imidazole propionate lactate mannose 2-aminoadipate arabinose
alpha-ketoglutarate pipecolate sorbitol phosphate
4-hydroxyphenylacetate citrate pelargonate (9:0)
3-(4-hydroxyphenyl)lactate malate (HPLA) 17-methylstearate
3-methoxytyrosine caprylate (8:0) undecanedioate
2-hydroxyphenylacetate methylpalmitate (15 or 2) docosadioate
indolepropionate sebacate (decanedioate) butyrylcarnitine (C4)
5-hydroxyindoleacetate octadecanedioate (C18) acetylcarnitine (C2)
leucine 2-methylmalonyl carnitine decanoylcarnitine (C10)
isovalerylcarnitine (C5) N-palmitoyl glycine stearoylcarnitine
(C18) N-acetylisoleucine octanoylcarnitine (C8) acetoacetate
3-hydroxy-2-ethylpropionate palmitoylcarnitine (C16)
2-hydroxypalmitate isobutyrylglycine (C4) carnitine
3-hydroxysebacate N-formylmethionine 2-hydroxydecanoate
12,13-DiHOME cysteine 3-hydroxydecanoate N-palmitoyltaurine
ornithine 13-HODE + 9-HODE ethanolamine N-acetylarginine
N-stearoyltaurine 2-palmitoyl-GPC (16:0) creatine
glycerophosphorylcholine (GPC) 2-stearoyl-GPC (18:0)
4-acetamidobutanoate 1-palmitoyl-GPC (16:0) 1-
gamma-glutamylalanine 1-stearoyl-GPC (18:0)
linolenoylglycerophosphocholine (18:3n3) 1-eicosatrienoyl-GPC
(20:3) gamma-glutamyltryptophan 2-linoleoyl-GPC (18:2)
1-docosapentaenoyl-GPC asparagylleucine 1- (22:5n3)
eicosenoylglycerophosphocholine (20:1n9) 1- isoleucylalanine 1-
oleoylplasmenylethanolamine eicosapentaenoylglycerophosphocholine
(20:5n3) 1-oleoyl-GPE (18:1) leucylaspartate 1-
stearoylplasmenylethanolamine 2-arachidonoyl-GPE (20:4)
methionylalanine 2- stearoylglycerophosphoethanolamine 1-oleoyl-GPI
(18:1) phenylalanylisoleucine 1-arachidonoyl-GPE (20:4) 1-
dimethylglycine 1-stearoyl-GPI (18:0) oleoylglycerophosphoglycerol
1-stearoylglycerol (18:0) N-acetylthreonine 1-
palmitoylglycerophosphoglycerol sphingosine N-acetylaspartate (NAA)
1-palmitoylglycerol (16:0) pregnenolone sulfate pyroglutamine
sphingosine 1-phosphate 5alpha-pregnan-3(alpha or trans-urocanate
7-HOCA beta),20beta-diol disulfate 16a-hydroxy DHEA 3-sulfate
N-6-trimethyllysine 5alpha-pregnan- 3beta,20alpha-diol disulfate
4-androsten-3beta,17beta-diol 3-methylglutarylcarnitine-2
dehydroisoandrosterone disulfate 2 sulfate (DHEA-S)
glycochenodeoxycholate phenyllactate (PLA)
4-androsten-3beta,17beta- diol disulfate 1 taurolithocholate
3-sulfate 4-hydroxyphenylpyruvate taurocholate glycohyocholate
vanillylmandelate (VMA) glycolithocholate sulfate hypoxanthine
p-toluic acid hyocholate ADP indoleacetate inosine
1-methyladenosine xanthurenate allantoin 1-methylguanosine
indole-3-carboxylic acid adenine 5,6-dihydrouracil
isovalerylglycine 7-methylguanine N4-acetylcytidine isoleucine
5-methyluridine (ribothymidine) trigonelline (N'-
2-hydroxy-3-methylvalerate cytidine methylnicotinate) pantothenate
(Vitamin B5) isobutyrylcarnitine (C4) 1-methylnicotinamide
gamma-CEHC N-acetylmethionine FAD biliverdin 2-hydroxybutyrate
(AHB) gamma-tocopherol 4-hydroxyhippurate urea bilirubin (E,E)
3-methyl catechol sulfate 2 dimethylarginine (ADMA +
3-hydroxyhippurate SDMA) paraxanthine prolylhydroxyproline 3-methyl
catechol sulfate 1 3-methylxanthine N-acetylputrescine caffeine
2-isopropylmalate 5-oxoproline 1-methylxanthine homostachydrine
gamma-glutamylphenylalanine 1,6-anhydroglucose thymol sulfate
alanylleucine erythritol 4-acetylphenyl sulfate glycylleucine
stachydrine 2-pyrrolidinone leucylalanine 4-acetaminophen sulfate
dimethyl sulfone leucylserine 1,2-propanediol phenylcarnitine
iminodiacetate (IDA) 2-hydroxyisobutyrate
TABLE-US-00004 TABLE 4 Identified candidate metabolites exhibiting
a tail effect 1-arachidonoyl-GPC (20:4) 1-arachidonoyl-GPE (20:4)
1-docosahexaenoylglycerophosphoethanolamine
1-oleoylplasmenylethanolamine 1-palmitoyl-GPC (16:0)
1-palmitoylglycerol (16:0) 1-palmitoylplasmenylethanolamine
1-stearoylglycerol (18:0) 1,5-anhydroglucitol (1,5-AG)
17-methylstearate 2-hydroxyisobutyrate 2-isopropylmalate
2-pyrrolidinone 3-(3-hydroxyphenyl)propionate
3-carboxy-4-methyl-5-propyl-2-furanpropanoate (CMPF)
3-hydroxyhippurate 3-indoxyl sulfate 4-ethylphenyl sulfate
4-hydroxyphenylpyruvate 5-hydroxyhexanoate 5-hydroxyindoleacetate
8-hydroxyoctanoate caffeine caprate (10:0) dihydroferulic acid
dimethylarginine (ADMA + SDMA) ethanolamine gamma-CEHC gamma-CEHC
glucuronide hexadecanedioate (C16) homoarginine
hydroxyisovaleroylcarnitine (C5) indoleacetate indolelactate
isobutyrylglycine (C4) isovalerylglycine lactate methionylalanine
methylpalmitate (15 or 2) N-acetylaspartate (NAA)
N-formylmethionine N1-Methyl-2-pyridone-5-carboxamide p-cresol
sulfate pantothenate (Vitamin B5) phenylacetylglutamine
phenylalanylarginine pipecolate serine serylphenylalanine sorbitol
urea 3,4-methylene-heptanoylcarnitine sulfated methylparaben
cyclo(prolylproline) hydroxy-Chlorothalonil phenylacetylcarnitine
xanthine
[0219] Two panels of metabolites (a 12-metabolite panel composed of
the metabolites of FIG. 7 and a 21 metabolite panel composed of the
metabolites of Table 1) were tested for ASD risk prediction. The
results show that the 12- and 21-metabolite panels contributed
strongly to prediction of ASD. An overview of the effects of
including and excluding metabolites of the 12- or 21-metabolite panel on ASD
prediction is shown in FIGS. 14A-D. Whitelists indicate AUC values
of classifiers using the data from the 12- or 21-metabolite panels
only, while blacklists indicate AUC values of classifiers excluding
the 12- or 21-metabolite panels but using other metabolites, either
from the group of 84 candidate metabolites or the full group of 600
metabolites (all_candidates=84 candidate metabolites; all
features=600 metabolites). Mean shift (top panel) and tail analysis
(bottom panel) were performed. These data show that the predictive
information for ASD is attributable to metabolites within the 12-
or 21-metabolite panels, whether assessed by mean shift or tail
analysis. Thus, the metabolites observed to exhibit strong tail
effects (the metabolites on the 12- and 21-metabolite groups) have
much greater ASD vs. DD predictive power than the other metabolites
from the 600 metabolite panel which do not exhibit strong tail
effects.
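The whitelist and blacklist feature selections described above can be sketched as follows. This is a minimal illustration: the function name select_features is hypothetical, and the short lists below are abbreviated stand-ins for the panel and the larger metabolite set, not the actual panel membership.

```python
from typing import List, Sequence


def select_features(all_features: Sequence[str],
                    panel: Sequence[str],
                    mode: str) -> List[str]:
    """Return the features a classifier would be allowed to use.

    mode="whitelist": only metabolites in the panel.
    mode="blacklist": every metabolite except those in the panel.
    """
    panel_set = set(panel)
    if mode == "whitelist":
        return [f for f in all_features if f in panel_set]
    if mode == "blacklist":
        return [f for f in all_features if f not in panel_set]
    raise ValueError("mode must be 'whitelist' or 'blacklist'")


# Abbreviated, hypothetical stand-ins for the full lists.
all_features = ["xanthine", "p-cresol sulfate", "glycine", "serine"]
panel = ["xanthine", "p-cresol sulfate"]

whitelisted = select_features(all_features, panel, "whitelist")
blacklisted = select_features(all_features, panel, "blacklist")
```

Training one classifier on each selection and comparing the resulting AUC values is the kind of comparison summarized in the whitelist and blacklist results.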
[0220] FIGS. 14C-D expand the results from FIGS. 14A-B and include
additional analyses using naive Bayes classification alongside
logistic regression. In addition, FIG. 14B shows results broken up
into different cohorts of samples (i.e., "Christmas" and "Easter").
The far left panel shows AUC results in which the classifier was
trained on 192 samples and cross validated on the Christmas cohort
only; the middle left panel shows AUC results in which the
classifier was trained on 299 samples and cross validated on
Christmas and Easter cohorts; the middle right panel shows AUC
results in which the classifier was trained on samples from and
cross validated on Easter cohorts only; and the far right panel
shows AUC results in which the classifier was trained on samples
from Christmas and Easter cohorts and cross validated on Easter
cohorts. The highest AUCs were achieved using metabolites within
the 12- or 21-metabolite panels (e.g., the metabolites exhibiting
tail effects).
[0221] FIGS. 17A-B, 18A-B and 19A-D further expand the results of
FIGS. 14A-D by showing the AUC predictions by including the 12 or
21 metabolite panels (whitelists) and by excluding them
(blacklists) according to the number of features added to the
statistical analysis. Panels on top show results from mean shift
analysis while those on the bottom show tail effect analysis.
Within each individual panel, the bars represent different
metabolite panels as indicated by the symbols below and in the
legend.
[0222] An exemplary plot describing cumulative AUC for ASD risk
prediction when subsets of a total of 21 metabolites are assessed is
shown in FIG. 15. In this figure, the x-axis shows the number of
metabolites in subsets selected from a group of 21 metabolites.
The y-axis shows the predictive power for ASD. For each number on
the x-axis, a number of random metabolite combinations was analyzed
and their AUC values plotted (dots). The curve shows the increased
AUC that results from an increase in the number of metabolites used
(selected from the group of 21). On the other hand, the figure
demonstrates that even subsets having a small number of metabolites
(e.g., 3 or 5) exhibit a high AUC. Thus, certain metabolites appear
to have particularly important predictive tails.
[0223] An exemplary table describing representative subsets of the
21 metabolites from Table 1 containing 3, 4, 5, 6, and 7
metabolites that yield high AUC values is shown in Table 5. For
each subset size (3, 4, 5, 6 or 7), 50 random selections of
metabolite sets were analyzed. For example, for a subset of 3 from
a 21-metabolite panel, 50 random combinations of a 3-metabolite
subset were assessed (out of a total of 1330 possible
combinations). Combinations from the 50 random sets with the
highest AUC are shown. Thus, certain metabolite combinations
containing fewer than 21 metabolites yielded high AUC values.
Metabolites such as gamma-CEHC, p-cresol sulfate, xanthine,
phenylacetylglutamine, isovalerylglycine, octenoylcarnitine, and
hydroxy-chlorothalonil, appeared in multiple subsets that yielded
high AUC values, indicating that these metabolites may be closely
related to ASD status of a patient. Thus, these metabolites, alone
or in combination with each other or additional metabolites, appear
to be particularly useful for predicting the ASD risk of a
patient.
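The figure of 1330 quoted above is the number of unordered 3-metabolite subsets of a 21-metabolite panel, i.e., 21!/(3! x 18!). The random subset selection can be sketched as follows. The function name sample_subsets, the fixed seed, and the placeholder metabolite names are assumptions made for illustration only.

```python
import math
import random
from itertools import combinations
from typing import List, Sequence, Tuple


def sample_subsets(items: Sequence[str], k: int, n_draws: int,
                   seed: int = 0) -> List[Tuple[str, ...]]:
    """Draw n_draws distinct k-element subsets at random.

    Enumerating every subset first is practical for a 21-member panel;
    it would not scale to very large panels.
    """
    rng = random.Random(seed)
    all_subsets = list(combinations(items, k))
    return rng.sample(all_subsets, min(n_draws, len(all_subsets)))


# Placeholder names standing in for the 21 metabolites of the panel.
panel = [f"metabolite_{i}" for i in range(1, 22)]

n_possible = math.comb(21, 3)           # 1330 unordered subsets of size 3
subsets = sample_subsets(panel, 3, 50)  # 50 random 3-metabolite subsets
```

Each sampled subset would then be scored (e.g., by AUC), and the highest-scoring subsets reported.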
TABLE-US-00005 TABLE 5 Exemplary subsets of metabolites and prediction of ASD
Number of metabolites | Representative subset with high AUC | AUC
3 | gamma-CEHC, isovalerylglycine, p-cresol sulfate | 0.675
4 | Octenoylcarnitine, gamma-CEHC, xanthine, phenylacetylglutamine | 0.700
5 | 3-indoxyl sulfate, 3-(3-hydroxyphenyl)propionate, p-cresol sulfate, gamma-CEHC, Hydroxy-Chlorothalonil | 0.692
6 | phenylacetylglutamine, indoleacetate, xanthine, Octenoylcarnitine, hydroxyisovaleroylcarnitine (C5), pantothenate (Vitamin B5) | 0.731
7 | Octenoylcarnitine, pantothenate (Vitamin B5), phenylacetylglutamine, pipecolate, xanthine, indoleacetate, 8-hydroxyoctanoate | 0.720
[0224] Two-metabolite subsets of the 21 metabolites from Table 1
were assessed for predictability of ASD in paired combinations.
Representative paired combinations having a robust AUC are shown in
Table 6. Similarly, three-metabolite subsets of the 21 metabolites
from Table 1 were assessed for predictability of ASD in triplet
combinations. Representative triplet combinations having a robust
AUC are shown in Table 7.
TABLE-US-00006 TABLE 6 Exemplary metabolite pairs providing robust AUC
Metabolites | AUC
phenylacetylglutamine, xanthine | 0.651
phenylacetylglutamine, octenoylcarnitine | 0.647
p-cresol sulfate, xanthine | 0.646
isovalerylglycine, p-cresol sulfate | 0.646
octenoylcarnitine, p-cresol sulfate | 0.645
phenylacetylglutamine, isovalerylglycine | 0.643
gamma-CEHC, p-cresol sulfate | 0.641
indoleacetate, p-cresol sulfate | 0.635
gamma-CEHC, xanthine | 0.633
octenoylcarnitine, xanthine | 0.632
isovalerylglycine, pipecolate | 0.632
hydroxy-chlorothalonil, p-cresol sulfate | 0.631
phenylacetylglutamine, indoleacetate | 0.629
pipecolate, p-cresol sulfate | 0.629
phenylacetylglutamine, p-cresol sulfate | 0.628
1,5-anhydroglucitol (1,5-AG), p-cresol sulfate | 0.628
phenylacetylglutamine, lactate | 0.627
p-cresol sulfate, lactate | 0.627
3-(3-hydroxyphenyl)propionate, 3-indoxyl sulfate | 0.625
pantothenate (Vitamin B5), p-cresol sulfate | 0.625
TABLE-US-00007 TABLE 7 Exemplary metabolite triplets providing robust AUC
Metabolites | AUC
phenylacetylglutamine, octenoylcarnitine, xanthine | 0.685
phenylacetylglutamine, octenoylcarnitine, indoleacetate | 0.681
phenylacetylglutamine, isovalerylglycine, octenoylcarnitine | 0.678
isovalerylglycine, octenoylcarnitine, p-cresol sulfate | 0.678
isovalerylglycine, octenoylcarnitine, pipecolate | 0.677
indoleacetate, isovalerylglycine, p-cresol sulfate | 0.676
octenoylcarnitine, p-cresol sulfate, xanthine | 0.673
phenylacetylglutamine, isovalerylglycine, xanthine | 0.671
pantothenate (Vitamin B5), p-cresol sulfate, xanthine | 0.671
isovalerylglycine, octenoylcarnitine, lactate | 0.670
phenylacetylglutamine, isovalerylglycine, indoleacetate | 0.670
gamma-CEHC, isovalerylglycine, p-cresol sulfate | 0.670
indoleacetate, octenoylcarnitine, p-cresol sulfate | 0.668
phenylacetylglutamine, pipecolate, xanthine | 0.668
pipecolate, p-cresol sulfate, xanthine | 0.668
octenoylcarnitine, hydroxy-chlorothalonil, p-cresol sulfate | 0.667
phenylacetylglutamine, isovalerylglycine, gamma-CEHC | 0.667
phenylacetylglutamine, xanthine, gamma-CEHC | 0.667
phenylacetylglutamine, p-cresol sulfate, xanthine | 0.666
indoleacetate, hydroxy-chlorothalonil, p-cresol sulfate | 0.666
Example 5
Validation of the 12-Metabolite Panel Classifier
[0225] Data from 180 samples tested, of which approximately two
thirds were ASD, was used to generate a classifier based on the 12
highly informative metabolites shown in FIG. 7. The classifier was
tested for the ability to discriminate ASD from non-ASD (here, DD)
in a second cohort of 130 samples. This method provided an unbiased
estimate of true predictive performance, corresponding to an AUC of
0.74. A schematic of the process is shown in FIG. 12.
Example 6
Adding Genetic Information to Metabolites May Improve ASD Risk
Prediction
[0226] Adding genetic information to metabolite information was
found to improve ASD risk prediction for certain groups. For
example, combining copy number variation (CNV) data with
metabolite information significantly reduces the confidence
interval of ASD risk prediction as shown in FIGS. 16A and 16B. As
FIG. 16A demonstrates, adding genetic information further enhances
the separation between ASD and non-ASD groups. In addition to CNV,
other genetic information, including, but not limited to, Fragile X
(FXS) status, may further contribute to a diagnostic test that can
predict ASD risk with improved accuracy and reduced type I and/or
type II errors. As shown in FIG. 16B, including such additional
information (e.g., "PathoCV"), increased the separation between ASD
and DD groups, and thus helped differentiate between these two
conditions.
Example 7
Prominent Biological Pathways Emerging from Metabolite Analysis
[0227] Further analysis of metabolite information revealed clusters
of metabolites presented in Table 1 that play a prominent role in
distinct biological pathways. For example, 7 of 21 metabolites are
related to gut microbial activities (33%) and are shown in Table 8.
All 7 are amino acid metabolites. Six of 7 are metabolites of
aromatic amino acids and have a benzene ring.
TABLE-US-00008 TABLE 8 Seven metabolites involved in gut microbial activity
Metabolite | Change in ASD in the STORY cohort | Benzene ring | Bacterially derived | Original precursor
3-indoxyl sulfate | ASD down | Yes | yes | Tryptophan
indoleacetate | ASD down | Yes | yes | Tryptophan
p-cresol sulfate | ASD down | Yes | yes | Phenylalanine or Tyrosine
4-ethylphenyl sulfate | ASD down | Yes | yes | Phenylalanine or Tyrosine
phenylacetylglutamine | ASD down | Yes | yes | Phenylalanine or Tyrosine
3-(3-hydroxyphenyl)propionate | DD down | Yes | yes | Phenylalanine or Tyrosine
pipecolate | ASD up | No | yes | Lysine
[0228] Analysis of the metabolites that are strongly associated
with ASD, as shown in Table 1, reveals connections with certain
biological pathways. For example, particular metabolites that
provide predictive information for ASD suggested impairment of
phase II biotransformation, impaired ability to metabolize benzene
rings, dysregulation of reabsorption in kidneys, dysregulation of
carnitine metabolism, and imbalance of transport of large neutral
amino acids into brain. Biological pathway information can be
further utilized to improve ASD risk assessment and/or explore
etiology and pathophysiology of ASD. Such information can also be
used to develop medicinal therapeutics for treatment of ASD.
* * * * *