U.S. patent application number 17/469800 was filed with the patent office on 2021-09-08 and published on 2022-08-04 for assessing risk of reading and language impairment.
This patent application is currently assigned to Yale University. The applicant listed for this patent is Yale University. Invention is credited to Jeffrey R. Gruen, Natalie Renee Powers.
Publication Number | 20220243271 |
Application Number | 17/469800 |
Filed Date | 2021-09-08 |
Publication Date | 2022-08-04 |
United States Patent Application | 20220243271 |
Kind Code | A1 |
Gruen; Jeffrey R.; et al. | August 4, 2022 |
ASSESSING RISK OF READING AND LANGUAGE IMPAIRMENT
Abstract
Described herein are the association of BV677278 (READ1) with
reading disability and language impairment, as well as the
synergistic interaction of DCDC2 risk haplotypes or alleles with
a KIAA0319 risk allele.
Inventors: | Gruen; Jeffrey R. (Hamden, CT); Powers; Natalie Renee (Bar Harbor, ME) |
Applicant: |
Name | City | State | Country | Type |
Yale University | New Haven | CT | US | |
Assignee: | Yale University, New Haven, CT |
Appl. No.: | 17/469800 |
Filed: | September 8, 2021 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
14441076 | May 6, 2015 | 11155871 |
PCT/US2013/069015 | Nov 7, 2013 | |
17469800 | | |
61723774 | Nov 7, 2012 | |
International Class: | C12Q 1/6883 20060101 C12Q001/6883; G09B 19/00 20060101 G09B019/00 |
Government Interests
FEDERAL FUNDING
[0002] This invention was made with government support under
R01NS043530 awarded by the National Institutes of Health. The
government has certain rights in the invention.
Claims
1-7. (canceled)
8. A method of assaying a sample for a haplotype, comprising: (a)
obtaining a sample comprising nucleic acids from an individual; (b)
sequencing nucleic acids in the sample; and (c) determining if the
sample comprises at least one of the following: (i) a CGCGAG
haplotype in a doublecortin domain containing 2 (DCDC2) gene in the
DYX2 locus at positions rs33914824, rs807694, rs707864, rs10456301,
rs16889066, and rs9379651, respectively; (ii) a CACGAG haplotype in
a DCDC2 gene in the DYX2 locus at positions rs33914824, rs807694,
rs707864, rs10456301, rs16889066, and rs9379651, respectively; or
(iii) both the CGCGAG haplotype of (i) and the CACGAG haplotype of
(ii) in a DCDC2 gene in the DYX2 locus.
9-11. (canceled)
12. The method of claim 8, wherein the sample is blood, cells or
tissue.
13. The method of claim 8, further comprising in (b): sequencing
the sample and determining if the sample comprises: (iv) allele 5
of a DCDC2 gene (SEQ ID NO: 35) in the DYX2 locus, (v) allele 6 of
a DCDC2 gene (SEQ ID NO: 36) in the DYX2 locus, or (vi) both allele
5 of a DCDC2 gene and allele 6 of a DCDC2 gene.
14-16. (canceled)
17. The method of claim 13, wherein the sample is blood, cells or
tissue.
18-62. (canceled)
63. The method of claim 8, wherein if the sample comprises any one
of (i)-(iii), the individual is identified as having or susceptible
for developing a learning disability (LD) and is treated, wherein
treating comprises providing interventions using books on tape;
using word-processing programs with spell-check features; helping
the individual learn through multisensory experiences; teaching
coping tools; and providing services to strengthen the individual's
ability to recognize and pronounce words.
64. The method of claim 13, wherein if the sample comprises any one
of (i)-(vi), the individual is identified as at risk for developing
a learning disability and is monitored to assess whether
development of a learning disability occurs and, if development
occurs, treating the individual, wherein treating comprises
providing interventions including services and materials, including
but not limited to: using special teaching techniques; making
classroom modifications; using books on tape; using word-processing
programs with spell-check features; helping the individual learn
through multisensory experiences; teaching coping tools; and
providing services to strengthen the individual's ability to
recognize and pronounce words.
65. The method of claim 63, wherein the classroom modifications
comprise providing extra time to complete tasks or taped tests to
permit the individual to hear, rather than read, the tests.
66. The method of claim 64, wherein the classroom modifications
comprise providing extra time to complete tasks or taped tests to
permit the individual to hear, rather than read, the tests.
Description
RELATED APPLICATIONS
[0001] This application is a continuation of U.S. application Ser.
No. 14/441,076, filed May 6, 2015, which is a national stage filing
under 35 U.S.C. § 371 of International Application No.
PCT/US2013/069015, which was filed Nov. 7, 2013, and claims the
benefit of U.S. Provisional Application No. 61/723,774, filed Nov.
7, 2012. The teachings of the referenced applications are
incorporated by reference herein in their entirety.
BACKGROUND
[0003] Specific learning disabilities (LDs) are disorders
characterized by unexpected difficulty with a specific mode of
learning, despite adequate IQ and educational opportunity. LDs can
involve reading, math, writing, and speech skills, among others,
but the most common involve language. The National Institute of
Child Health and Human Development (NICHD) estimates 15-20% of Americans
have a language-based LD, of which reading disability (RD) afflicts
the majority (1). RD, also known as dyslexia, is a specific
impairment in processing written language (2). Another LD, language
impairment (LI), is characterized by difficulty processing and
expressing spoken language (3). These LDs are frequently comorbid;
children with LI have increased risk of developing RD (3). Because
reading and language skills are fundamental to academic success,
affected individuals are at risk for adverse psychological
outcomes, as well as limited educational and occupational prospects
(2). Additionally, the prevalence of these LDs makes the cost of
remediation burdensome to the educational system (4). Intervention
is more effective the earlier it is administered (2), making early
detection of high-risk individuals an attractive prospect.
SUMMARY
[0004] As described herein, two haplotypes, both in the same
six-marker haplotype block in the reading disability (RD) risk gene
DCDC2, are associated, respectively, with reading disability and
language impairment (LI). Each of the haplotypes is in very strong
linkage disequilibrium with an allele of BV677278 (also known as
READ1), which is a polymorphic compound STR associated with reading
disability and capable of modulating expression from the DCDC2
promoter. BV677278 (READ1) has been shown to specifically bind
ETV6, a potent transcriptional regulator and proto-oncogene, in
vivo. BV677278 binds the brain-expressed nuclear protein with very
high specificity and is capable of modulating reporter gene
expression from the DCDC2 promoter in an allele-specific manner.
Activation patterns in reading-related areas of the brain, as
measured by functional magnetic resonance imaging, are influenced
by BV677278 alleles. Work described herein shows that BV677278 is
associated with both reading and language, and that at least two
BV677278 alleles have a deleterious effect on reading and language.
Allele 5 is important for dyslexia (RD) and allele 6 is important
for language impairment. BV677278 has been renamed and is also
referred to herein as READ1, which stands for "regulatory element
associated with dyslexia 1." The two terms are used
interchangeably.
[0005] As also described herein, the DCDC2 risk haplotypes or
alleles interact with a KIAA0319 risk haplotype in a synergistic
manner. The synergy between the BV677278 (READ1) alleles or the
DCDC2 risk haplotypes and the KIAA0319 haplotypes in decreasing
performance in phoneme deletion (very important to reading),
spelling, and total IQ and performance IQ, has not previously been
described. Together, the effect is 3-fold to 8-fold greater than if
the deleterious version of either risk allele or haplotype is
present alone.
[0006] In particular, described herein is a six-marker haplotype
block within DCDC2, of which two haplotypes (CGCGAG and GACGAG) are
associated with very poor performance on a phoneme deletion task
and composite language measure, respectively. The two haplotypes
show strong association with their respective phenotypes: CGCGAG
with RD and GACGAG with LI. Carriers of the CGCGAG haplotype, on
average, showed significantly poorer performance on eight
reading-related measures, compared to non-carriers, and carriers of
the GACGAG haplotype showed significantly lower average performance
on the WOLD/WR composite language measure.
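By way of illustration only, the six-marker haplotype check described above can be sketched in Python. The sketch assumes that phased allele calls are already available for each chromosome copy as a mapping from rsID to allele; the marker order is the one recited in claim 8, while the function names, the data layout, and the example alleles are hypothetical and are not part of the disclosure.

```python
# Marker order as recited in claim 8; the tuple layout is an assumption.
DCDC2_BLOCK = ("rs33914824", "rs807694", "rs707864",
               "rs10456301", "rs16889066", "rs9379651")

# Haplotypes named in paragraph [0006] and the phenotype each is associated with.
RISK_HAPLOTYPES = {"CGCGAG": "RD", "GACGAG": "LI"}


def haplotype_string(phased_alleles):
    """Concatenate the alleles of one chromosome copy in block order."""
    return "".join(phased_alleles[rsid] for rsid in DCDC2_BLOCK)


def risk_haplotypes_carried(copy_a, copy_b):
    """Return {haplotype: phenotype} for risk haplotypes found on either copy."""
    found = {}
    for copy in (copy_a, copy_b):
        hap = haplotype_string(copy)
        if hap in RISK_HAPLOTYPES:
            found[hap] = RISK_HAPLOTYPES[hap]
    return found


# Hypothetical example: one copy carries CGCGAG, the other a non-risk haplotype.
copy_a = dict(zip(DCDC2_BLOCK, "CGCGAG"))
copy_b = dict(zip(DCDC2_BLOCK, "TGCGAG"))
carried = risk_haplotypes_carried(copy_a, copy_b)
```

The sketch deliberately leaves phasing and allele calling to upstream tools; it only shows how a carrier status could be derived once the six alleles on each copy are known.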
[0007] Further described herein are methods of identifying or
aiding in identifying an individual who is at risk of developing at
least one (a, one or more) learning disability (LD), in which a
sample obtained from the individual is assayed for the presence of
at least one (a, one or more) haplotype in the DYX2 locus
(chromosome 6p; 6p22) that is associated with susceptibility for
developing at least one (a, one or more) LD in humans. The presence
of at least one haplotype that is associated with susceptibility
for developing at least one (a, one or more) LD in humans in the
DYX2 locus indicates that the individual is at risk for developing
at least one (a, one or more) LD. In some embodiments, the at least
one LD is reading disability (RD) or language impairment (LI). The
at least one haplotype can be located in the DCDC2 gene within the
DYX2 locus or in the KIAA0319 gene within the DYX2 locus.
Alternatively, a sample is assayed (analyzed) for the presence of a
haplotype located in the DCDC2 gene within the DYX2 locus and the
presence of a haplotype located in the KIAA0319 gene within the
DYX2 locus. The at least one haplotype can comprise (a) CGCGAG,
GACGAG or both in a DCDC2 gene within the DYX2 locus; (b) one or
more single nucleotide polymorphisms (SNPs) associated with a
variant KIAA0319, such as rs4504469, rs2038137, or rs2143340 or any
combination of two or three of rs4504469, rs2038137 and rs2143340;
or (c) any combination of the haplotypes in (a) and (b). Also
described herein are methods of treating an individual suspected or
identified as having a LD. Treatment can include, for example,
inhibiting ETV6 in the individual, and/or providing services
designed to address or remedy certain aspects associated with a LD,
such as RD or LI, such as providing intervention, including
services and materials, including but not limited to using special
teaching techniques; making classroom modifications, such as
providing extra time to complete tasks and taped tests to permit
the individual to hear, rather than read the tests; using books on
tape; using word-processing programs with spell-check features;
helping the individual learn through multisensory experiences;
teaching coping tools; and providing services to strengthen the
individual's ability to recognize and pronounce words.
[0008] Described herein is a method of determining if a sample
obtained from an individual comprises nucleic acid which comprises
a haplotype associated with susceptibility for developing a
learning disability (LD) in humans, comprising assaying a sample
that comprises nucleic acid from the individual for the presence in
the DYX2 locus of at least one of the following markers: (a) CGCGAG
in a DCDC2 gene; (b) CACGAG in a DCDC2 gene; (c) both CGCGAG and
CACGAG in a DCDC2 gene; (d) rs4504469 in a KIAA0319 gene; (e)
rs2038137 in a KIAA0319 gene; (f) rs2143340 in a KIAA0319 gene; (g)
any combination of two or three of rs4504469, rs2038137 and
rs2143340 in a KIAA0319 gene; and (h) any combination of CGCGAG in
a DCDC2 gene; CACGAG in a DCDC2 gene; both CGCGAG and CACGAG in a
DCDC2 gene; rs4504469 in a KIAA0319 gene; rs2038137 in a KIAA0319
gene; rs2143340 in a KIAA0319 gene; and any combination of two or
three of rs4504469, rs2038137 and rs2143340 in a KIAA0319 gene,
wherein if the sample comprises at least one of (a)-(h), the sample
comprises a haplotype associated with susceptibility for developing
a learning disability in humans. In one embodiment, the sample is
assayed for at least one marker of (a), (b) and (c) and at least
one marker of (d), (e), (f) and (g). In various embodiments, the
sample is assayed (analyzed) for two or more markers of (a), (b)
and (c) and two or more markers of (d), (e), (f) and (g); only
markers of (a), (b) and (c); only markers of (d), (e), (f) and (g);
or any combination of the markers of (a)-(h). The LD can be RD or
LI or both RD and LI. Any sample that contains nucleic acid (e.g.,
genomic DNA; RNA) that can be analyzed for a haplotype of interest
can be assayed; methods for analyzing nucleic acids are well known
in the art and also described herein. They include, but are not
limited to, hybridization-mediated methods, and sequencing. The
sample can be, for example, blood, cells, or tissue. Alternatively,
genomic DNA can be sequenced and the presence or absence of a
haplotype associated with susceptibility for developing a learning
disability (LD) in humans determined. In one embodiment, the method
further comprises assaying the sample for allele 5 of DCDC2 gene,
as presented herein (e.g., SEQ ID NO:35); allele 6 of DCDC2 gene,
as presented herein (e.g., SEQ ID NO:36) or both.
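The marker logic of this embodiment can be illustrated with a short Python sketch. It assumes that an upstream assay has already reported which markers were detected and supplies them as a collection of labels; the set names and function names are hypothetical, and the sketch treats each marker as an opaque label rather than modelling the assay itself.

```python
# Markers of (a)-(c): DCDC2 haplotypes named in this paragraph.
DCDC2_MARKERS = frozenset({"CGCGAG", "CACGAG"})
# Markers of (d)-(g): KIAA0319 SNPs named in this paragraph.
KIAA0319_MARKERS = frozenset({"rs4504469", "rs2038137", "rs2143340"})


def has_any_listed_marker(detected):
    """True if at least one marker of (a)-(h) was detected in the sample."""
    return bool(set(detected) & (DCDC2_MARKERS | KIAA0319_MARKERS))


def has_marker_in_both_genes(detected):
    """True if at least one DCDC2 marker and at least one KIAA0319 marker were detected."""
    detected = set(detected)
    return bool(detected & DCDC2_MARKERS) and bool(detected & KIAA0319_MARKERS)
```

The first function corresponds to the "at least one of (a)-(h)" condition, and the second to the embodiment in which the sample is assayed for at least one marker from each gene.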
[0009] Another embodiment is a method of assaying a sample for a
marker of a haplotype associated with susceptibility for developing
a learning disability (LD) in humans, comprising: (a) obtaining a
sample comprising nucleic acid from an individual; and (b)
determining if the sample comprises at least one of the following:
(i) CGCGAG in a DCDC2 gene in the DYX2 locus; (ii) CACGAG in a
DCDC2 gene in the DYX2 locus; (iii) both CGCGAG and CACGAG in a
DCDC2 gene in the DYX2 locus; (iv) rs4504469 in a KIAA0319 gene in
the DYX2 locus; (v) rs2038137 in a KIAA0319 gene in the DYX2 locus;
(vi) rs2143340 in a KIAA0319 gene in the DYX2 locus; (vii) any
combination of two or three of rs4504469, rs2038137 and rs2143340
in a KIAA0319 gene in the DYX2 locus; and (viii) any combination of
CGCGAG in a DCDC2 gene; CACGAG in a DCDC2 gene in the DYX2 locus;
both CGCGAG and CACGAG in a DCDC2 gene in the DYX2 locus; rs4504469
in a KIAA0319 gene in the DYX2 locus; rs2038137 in a KIAA0319 gene
in the DYX2 locus; rs2143340 in a KIAA0319 gene in the DYX2 locus;
and any combination of two or three of rs4504469, rs2038137 and
rs2143340 in a KIAA0319 gene in the DYX2 locus, wherein if the
sample comprises at least one marker of (i)-(viii), the sample
comprises a haplotype associated with susceptibility for developing
a learning disability in humans. In one embodiment, the sample is
assayed for at least one marker of (i), (ii) and (iii) and at least
one marker of (iv), (v), (vi) and (vii). The LD is reading
disability (RD) or language impairment (LI) or both RD and LI. Any
sample that contains nucleic acid (e.g., genomic DNA; RNA) that can
be analyzed for a haplotype of interest can be assayed; methods for
analyzing nucleic acids are well known in the art and also
described herein. They include, but are not limited to,
hybridization-mediated methods and sequencing. The sample can be,
for example, blood, cells, or tissue. Alternatively, genomic DNA
can be sequenced and the presence or absence of a haplotype
associated with susceptibility for developing a learning disability
(LD) in humans determined.
[0010] A further embodiment is a method of determining if a sample
obtained from an individual comprises nucleic acid which comprises
an allele associated with susceptibility for developing a learning
disability (LD) in humans, comprising: assaying a sample that
comprises nucleic acid from the individual for the presence of
allele 5 of DCDC2 gene in the DYX2 locus (e.g., SEQ ID NO:35),
allele 6 of DCDC2 gene in the DYX2 locus (e.g., SEQ ID NO:36), or
both allele 5 of DCDC2 gene and allele 6 of DCDC2 gene, wherein if
the sample comprises at least one of allele 5 and allele 6, the
sample comprises an allele associated with susceptibility for
developing a learning disability. The LD is reading disability (RD)
or language impairment (LI) or both RD and LI. Any sample that
contains nucleic acid (e.g., genomic DNA; RNA) that can be analyzed
for a haplotype of interest can be assayed; methods for analyzing
nucleic acids are well known in the art and also described herein.
They include, but are not limited to, hybridization-mediated
methods and sequencing. The sample can be, for example, blood,
cells, or tissue. Alternatively, genomic DNA can be sequenced and
the presence or absence of a haplotype associated with
susceptibility for developing a learning disability (LD) in humans
determined.
[0011] A further embodiment is a method of determining if a sample
obtained from an individual comprises at least one marker
associated with comorbid reading disability (RD) and language
impairment (LI) in humans, comprising: (a) obtaining a sample that
contains nucleic acid from the individual and (b) assaying the
sample for (i) at least one marker in DCDC2 gene in the DYX2 locus
for a haplotype associated with susceptibility for developing RD in
humans and (ii) at least one marker in KIAA0319 gene in the DYX2
locus for a haplotype associated with susceptibility for developing
LI in humans, wherein if the sample comprises a marker of (i) and a
marker of (ii), the sample comprises at least one marker associated
with comorbid RD and LI. The marker of (b)(i) and the marker of
(b)(ii) can be the same marker or two different markers. In one
embodiment, the at least one marker of (b)(i) is CGCGAG or GACGAG
and the at least one marker of (b)(ii) is rs4504469; rs2038137; or
rs2143340. In any of these embodiments, the method can further
comprise assaying the sample for allele 5 of DCDC2 gene in the DYX2
locus, allele 6 of DCDC2 gene in the DYX2 locus, or both allele 5 of
DCDC2 gene in the DYX2 locus and allele 6 of DCDC2 gene in the DYX2
locus. In yet further embodiments, the at least one marker of
(b)(i) and the at least one marker of (b)(ii) are selected from:
rs12636438; rs1679255; rs9521789; rs1983931; rs9814232; rs7995158;
rs6573225; rs4082518; rs442555; rs259521; rs16889556; rs1047782;
rs1530680; rs12667130; rs6965855; rs985080; rs4726782; rs1718101;
rs10487689; rs1918296; rs737533; rs4504469; rs2038137; rs2143340;
rs9295626; rs7763790; rs6935076; rs2817201; rs10456309; rs4576240;
rs17307478; rs9356939; rs7763790; rs6456621; rs6456624; rs6935076;
rs2038137; rs3756821; rs1883593; rs3212236; rs6456621; rs12193738;
rs2817198; rs793845; rs2799373; rs793862; rs793834; rs2792682;
rs807704; rs707864; and rs807694. Any sample that contains nucleic
acid (e.g., genomic DNA; RNA) that can be analyzed for a haplotype
of interest can be assayed; methods for analyzing nucleic acids are
well known in the art and also described herein. They include, but
are not limited to, hybridization-mediated methods, and sequencing.
The sample can be, for example, blood, cells, or tissue.
Alternatively, genomic DNA can be sequenced and the presence or
absence of a marker associated with susceptibility for developing a
learning disability (LD) in humans determined.
[0012] Another embodiment is a method of determining if a sample
obtained from an individual comprises a marker associated with
language impairment (LI) in humans, comprising assaying a sample
obtained from the individual for at least one of the following
markers: CACGAG in a DCDC2 gene in the DYX2 locus; rs793845;
rs2799373; rs793862; rs793834; rs2792682; rs807704; rs707864;
rs12193738; rs2817198; rs10456309; rs985080; rs1554690; rs2533096;
rs6951437; rs344470; rs344468; rs807694; rs482700; rs7695228;
rs1940309; rs505277; rs476739; rs867036; rs867035; rs2071674;
rs7694946; rs4823324; and a marker for at least one of the
following genes: NEK2; DLEC1; NARS; IL4I1; PKD2; ATF5; NUP62;
SIGLEC11; ACAN; and PGD. Any sample that contains nucleic acid
(e.g., genomic DNA; RNA) that can be analyzed for a haplotype of
interest can be assayed; methods for analyzing nucleic acids are
well known in the art and also described herein. They include, but
are not limited to, hybridization-mediated methods and sequencing.
The sample can be, for example, blood, cells, or tissue.
Alternatively, genomic DNA can be sequenced and the presence or
absence of a marker associated with susceptibility for developing
LI in humans determined.
[0013] A further embodiment is a method of determining if a sample
obtained from an individual comprises a marker associated with
reading disability (RD) in humans, comprising assaying a sample
obtained from the individual for at least one of the following
markers: CGCGAG in a DCDC2 gene in the DYX2 locus; rs180950;
rs2590673; rs892100; rs1792745; rs12546767; rs12634033; rs892270;
rs10887149; rs10041417; rs6792971; rs4725745; rs12444778;
rs1444186; rs2294691; rs10456309; rs1562422; and a marker for at
least one of the following genes: MAP4; OR2L8; CRYBA4; OR2T8;
KIAA1622; OR2AK2; DHX30; GEMIN6; C20orf10; and PPIF. Any sample
that contains nucleic acid (e.g., genomic DNA; RNA) that can be
analyzed for a haplotype of interest can be assayed; methods for
analyzing nucleic acids are well known in the art and also
described herein. They include, but are not limited to,
hybridization-mediated methods and sequencing. The sample can be,
for example, blood, cells, or tissue. Alternatively, genomic DNA
can be sequenced and the presence or absence of a marker associated
with susceptibility for developing RD in humans determined.
[0014] Another embodiment is a method of determining if nucleic
acids (DNA, RNA) in an individual comprise markers of haplotypes
that interact in a synergistic manner in resulting in a learning
disorder (LD) in humans comprising: (a) obtaining a sample that
comprises nucleic acids from an individual and (b) assaying the
sample for at least one DCDC2 risk haplotype or DCDC2 risk allele
and at least one KIAA0319 risk haplotype, wherein the at least one
DCDC2 risk haplotype is CGCGAG or GACGAG, the at least one DCDC2
risk allele is allele 5 of DCDC2 gene in the DYX2 locus (SEQ ID
NO:35) or allele 6 of DCDC2 gene in the DYX2 locus (SEQ ID NO:36)
and the at least one KIAA0319 risk haplotype is a variant KIAA0319
haplotype comprising a SNP which is rs4504469; rs2038137; or
rs2143340 and wherein if the sample comprises at least one DCDC2
risk haplotype or at least one DCDC2 risk allele and at least one
KIAA0319 risk haplotype, the nucleic acids comprise markers of
haplotypes that interact in a synergistic manner in resulting in a
LD in humans. Any sample that contains nucleic acid (e.g., genomic
DNA; RNA) that can be analyzed for a haplotype of interest can be
assayed; methods for analyzing nucleic acids are well known in the
art and also described herein. They include, but are not limited
to, hybridization-mediated methods and sequencing. The sample can
be, for example, blood, cells, or tissue. Alternatively, genomic
DNA can be sequenced and the presence or absence of markers that
interact in a synergistic manner in resulting in LD in humans
determined.
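The combination condition of this embodiment, namely a DCDC2 risk haplotype or DCDC2 risk allele together with a KIAA0319 risk haplotype, can be sketched in Python as follows. The `SampleCalls` container, its field names, and the use of integers for READ1 allele numbers are assumptions made for illustration; the sketch presumes the calls were produced beforehand by a suitable assay.

```python
from dataclasses import dataclass, field

DCDC2_RISK_HAPLOTYPES = {"CGCGAG", "GACGAG"}
READ1_RISK_ALLELES = {5, 6}  # allele 5 (SEQ ID NO:35), allele 6 (SEQ ID NO:36)
KIAA0319_RISK_SNPS = {"rs4504469", "rs2038137", "rs2143340"}


@dataclass
class SampleCalls:
    """Hypothetical container for results already produced by an assay."""
    dcdc2_haplotypes: set = field(default_factory=set)
    read1_alleles: set = field(default_factory=set)
    kiaa0319_snps: set = field(default_factory=set)


def carries_interacting_markers(calls):
    """True if a DCDC2 risk haplotype or allele and a KIAA0319 risk SNP are both present."""
    dcdc2_risk = bool(calls.dcdc2_haplotypes & DCDC2_RISK_HAPLOTYPES
                      or calls.read1_alleles & READ1_RISK_ALLELES)
    kiaa0319_risk = bool(calls.kiaa0319_snps & KIAA0319_RISK_SNPS)
    return dcdc2_risk and kiaa0319_risk
```

Keeping the DCDC2 haplotypes and the READ1 alleles in separate fields mirrors the wording of the paragraph, in which either one satisfies the DCDC2 side of the condition.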
[0015] A further embodiment is a method of identifying or aiding in
identifying an individual at risk for developing at least one
learning disability (LD), comprising assaying a sample obtained
from the individual for the presence in the DYX2 locus of at least
one haplotype that is associated with susceptibility for developing
a LD in humans, wherein the presence in the DYX2 locus of at least
one haplotype that is associated with susceptibility for developing
a LD in humans indicates that the individual is at risk for
developing a LD. At least one LD is a reading disability or
language impairment. The at least one haplotype is located in the
DCDC2 gene within the DYX2 locus or in the KIAA0319 gene within the
DYX2 locus. The at least one haplotype can comprise (a) CGCGAG or
CACGAG in a DCDC2 gene within the DYX2 locus; or (b) rs4504469,
rs2038137, rs2143340, or any combination thereof in a KIAA0319 gene
within the DYX2 locus; or (c) any combination of the haplotypes in
(a) and (b). Any sample that contains nucleic acid (e.g., genomic
DNA; RNA) that can be analyzed for a haplotype of interest can be
assayed; methods for analyzing nucleic acids are well known in the
art and also described herein. They include, but are not limited
to, hybridization-mediated methods and sequencing. The sample can
be, for example, blood, cells, or tissue. Alternatively, genomic
DNA can be sequenced and the presence or absence in the DYX2 locus
of at least one haplotype that is associated with susceptibility
for developing a LD in humans determined. The presence in the DYX2
locus of at least one haplotype that is associated with
susceptibility for developing a LD in humans indicates that the
individual is at risk for developing a LD.
[0016] Also described herein is a method of determining if a
sample obtained from an individual comprises a marker for
susceptibility for developing a learning disability (LD) that is
reading disability (RD) or language impairment (LI), comprising:
obtaining a sample that comprises nucleic acid from the individual
and determining if the sample comprises at least one marker
selected from the group consisting of: rs12636438; rs1679255;
rs9521789; rs1983931; rs9814232; rs7995158; rs6573225; rs4082518;
rs442555; rs259521; rs482700; rs7695228; rs1940309; rs505277;
rs476739; rs867036; rs867035; rs2071674; rs7694946; rs4823324;
rs180950; rs2590673; rs892100; rs1792745; rs12546767; rs12634033;
rs892270; rs10887149; rs10041417; rs6792971; rs12636438; rs1679255;
rs9521789; rs476739; rs505277; rs482700; rs7695228; rs867036;
rs867035; rs1940309; rs16889556; rs1047782; rs1530680; rs12667130;
rs6965855; rs985080; rs4726782; rs1718101; rs10487689; rs1918296;
rs737533; rs793845; rs2799373; rs793862; rs793834; rs2792682;
rs807704; rs707864; rs12193738; rs2817198; rs10456309; rs985080;
rs1554690; rs2533096; rs6951437; rs344470; rs344468; rs4725745;
rs12444778; rs1444186; rs2294691; rs10456309; rs1562422; rs807694;
rs3756814; rs3777663; rs9295626; rs7763790; rs6935076; rs9348646;
rs2328791; rs2328791; rs2817201; rs9295626; rs4576240; rs17307478;
rs9356939; rs7763790; rs6456621; rs6456624; rs6935076; rs2038137;
rs3756821; rs1883593; rs3212236; rs3777663; rs3756814; rs6931809;
rs6916186; rs6933328; rs17491647; rs2328791; rs33914824;
rs807694; rs707864; rs10456301; rs16889066; rs9379651;
rs2817201; rs9295626; rs10456309; rs4576240; rs17307478; rs9356939;
rs7763790; rs6456621; rs3756821; rs1883593; rs3212236; rs2294691;
rs3777663; rs3756814; rs6931809; rs6916186; rs6933328; rs17491647;
rs9348646; rs1562422 and a marker for each of the following
genes: OR5H2; OR5H6; RRAGA; OR6B3; UMOD; A26C1A; FAM29A; CHRNA1;
IFIT5; LOC643905; NEK2; DLEC1; NARS; IL4I1; PKD2; ATF5; NUP62;
SIGLEC11; ACAN; PGD; MAP4; OR2L8; CRYBA4; OR2T8; KIAA1622; OR2AK2;
DHX30; GEMIN6; C20orf10; and PPIF. The LD is reading disability
(RD) or language impairment (LI) or both RD and LI. Any sample that
contains nucleic acid (e.g., genomic DNA; RNA) that can be analyzed
for a marker for susceptibility for developing a learning
disability (LD) that is reading disability (RD) or language
impairment (LI) can be assayed; methods for analyzing nucleic acids
are well known in the art and also described herein. They include,
but are not limited to, hybridization-mediated methods and
sequencing. The sample can be, for example, blood, cells, or
tissue. Alternatively, genomic DNA can be sequenced and the
presence or absence of such a marker or markers determined.
[0017] Further described herein is a method of identifying or
aiding in identifying an individual at risk for developing at least
one learning disability (LD), comprising assaying a sample obtained
from the individual for the presence in the DYX2 locus of at least
one haplotype that is associated with susceptibility for developing
a LD in humans, wherein the presence in the DYX2 locus of at least
one haplotype that is associated with susceptibility for developing
a LD in humans indicates that the individual is at risk for
developing a LD. At least one LD is a reading disability (RD) or
language impairment (LI). The at least one haplotype is located in
the DCDC2 gene within the DYX2 locus or in the KIAA0319 gene within
the DYX2 locus and can comprise (a) CGCGAG, CACGAG, or both CGCGAG
and CACGAG in a DCDC2 gene within the DYX2 locus; or (b) rs4504469,
rs2038137, rs2143340, or any combination thereof in a KIAA0319 gene
within the DYX2 locus; or (c) any combination of the haplotypes in
(a) and (b). In the method, the assay comprises a
hybridization-mediated method, nucleic acid sequencing, or both a
hybridization-mediated method and nucleic acid sequencing. The
sample is blood, cells, or tissue.
[0018] Another embodiment is a method of identifying an individual
as having, or being susceptible to developing, a learning
disability (LD), comprising obtaining a sample comprising nucleic
acid from an individual; determining whether nucleic acid in the
sample comprises a DCDC2 gene haplotype in the DYX2 locus
associated with susceptibility for developing reading disability
(RD) and a KIAA0319 gene haplotype associated with susceptibility
for developing language impairment (LI), wherein the DCDC2 gene
haplotype and the KIAA0319 gene haplotype interact synergistically
in decreasing performance in phoneme deletion and in resulting in a
learning disorder (LD) in humans, wherein if the sample comprises
both haplotypes, the individual is identified as having or being
susceptible to developing a LD. In this method, the determining
comprises a hybridization-mediated method, nucleic acid sequencing,
or both a hybridization-mediated method and nucleic acid
sequencing. The sample is blood, cells, or tissue.
[0019] In some embodiments, a method by which an individual is
identified as having or being susceptible for developing a learning
disability (LD) comprises treating the individual so identified.
Treatment comprises providing interventions, including services and
materials, including but not limited to: using special teaching
techniques; making classroom modifications, such as providing extra
time to complete tasks and taped tests to permit the individual to
hear, rather than read the tests; using books on tape; using
word-processing programs with spell-check features; helping the
individual learn through multisensory experiences; teaching coping
tools; and providing services to strengthen the individual's
ability to recognize and pronounce words. See, for example,
"What are the treatments for learning disabilities?" at
nichd.nih.gov/health/topics/learning/conditioninfo.
[0020] Another embodiment is a method of treating an individual for
a learning disability (LD) comprising inhibiting ETV6 in the
individual. The individual has RD, LI, or both.
[0021] In some embodiments, a method by which an individual is
identified as at risk for developing a learning disorder further
comprises monitoring the individual identified as at risk for
developing a learning disability to assess whether development of a
learning disability occurs and, if development occurs, treating the
individual, wherein treating comprises providing interventions,
including services and materials, including but not limited to:
using special teaching techniques; making classroom modifications,
such as providing extra time to complete tasks and taped tests to
permit the individual to hear, rather than read the tests; using
books on tape; using word-processing programs with spell-check
features; helping the individual learn through multisensory
experiences; teaching coping tools; and providing services to
strengthen the individual's ability to recognize and pronounce
words.
[0022] See, for example, "What are the treatments for learning
disabilities?" at nichd.nih.gov/health/topics/learning/conditioninfo.
[0023] Also described herein are arrays, such as microarrays (DNA
arrays or microarrays). According to one embodiment, an array
(e.g., microarray) for identifying or aiding in identifying an
individual at risk for developing at least one learning disability
(LD) is provided. The array comprises a support having a plurality
of discrete regions (e.g., spots), each discrete region having
(having affixed thereto) one or more nucleic acid fragments (e.g.,
probes) spotted or otherwise attached or deposited thereon.
Typically, each discrete region bears a reagent, such as nucleic
acid (DNA, RNA) that detects a marker (e.g., SNP, haplotype marker,
allele, etc.) associated with susceptibility for developing a LD
(e.g., RD, LI) in humans. The nucleic acid fragments are
complementary to nucleic acids (e.g., DNA, such as genomic DNA, or
RNA, such as mRNA) that are markers for a variant gene, such as
variant DCDC2, KIAA0319 and others named herein, associated with
susceptibility for developing at least one LD (e.g., as provided
herein). The nucleic acid fragments on a particular discrete region
can be of any length and sequence (e.g., that complements the
nucleic acid comprising a marker) suitable for the detection of any
marker described herein. For example, in some embodiments, a
nucleic acid fragment (e.g., probe, SNP probe) is between 10 and
100 nucleotides in length. In some embodiments, a nucleic acid
fragment is between about 20 and 80, about 30 and 60, or about 40
and 50 nucleotides (nt) in length. In specific embodiments, the
probes are 25 nt, 30 nt, 35 nt, or 40 nt in length. See, for
example, LaFramboise, T., Nucl. Acids Res. (2009) 37 (13):
4181-4193. In some embodiments, a particular discrete region
comprises a plurality of nucleic acid fragments (e.g., probes, SNP
probes), each of which is capable of hybridizing to a particular
marker. In some embodiments, the plurality of nucleic acid
fragments are of varying lengths (e.g., as described herein) and
sequences. In some embodiments, the array detects two or more
markers associated with susceptibility for developing a learning
disability (LD) in humans, wherein the two or more markers comprise
one or more markers in a DCDC2 gene and one or more markers in a
KIAA0319 gene. In some embodiments, the one or more markers in a
DCDC2 gene are selected from CGCGAG, CACGAG, READ1 allele 5 (SEQ ID
NO:35), READ1 allele 6 (SEQ ID NO:36), or any combination of two,
three or four of CGCGAG, CACGAG, READ1 allele 5 (SEQ ID NO:35), and
READ1 allele 6 (SEQ ID NO:36). In some embodiments, the one or more
markers in a KIAA0319 gene are selected from rs4504469, rs2038137,
rs2143340, or any combination of two or three of rs4504469,
rs2038137 and rs2143340.
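By way of non-limiting illustration, once the array signals have been converted to genotype calls, the marker combinations recited above can be checked in software. The following minimal Python sketch assumes a hypothetical input format (phased six-base DCDC2 haplotype strings, READ1 allele numbers, and the identifiers of KIAA0319 markers reported as detected). The function names and the input format are assumptions made for illustration and are not part of the array described herein.

```python
# Illustrative sketch only. The input format and function names are
# hypothetical; the marker names are those recited in the text.
DCDC2_HAPLOTYPES = {"CGCGAG", "CACGAG"}
READ1_ALLELES = {5, 6}
KIAA0319_MARKERS = {"rs4504469", "rs2038137", "rs2143340"}


def detected_markers(dcdc2_haplotypes, read1_alleles, kiaa0319_hits):
    """Return (DCDC2 markers, KIAA0319 markers) found among the inputs.

    dcdc2_haplotypes: iterable of phased six-base haplotype strings
    read1_alleles:    iterable of READ1 (BV677278) allele numbers
    kiaa0319_hits:    iterable of marker IDs reported as detected
    """
    dcdc2 = {h for h in dcdc2_haplotypes if h in DCDC2_HAPLOTYPES}
    dcdc2 |= {f"READ1 allele {a}" for a in read1_alleles if a in READ1_ALLELES}
    kiaa = {m for m in kiaa0319_hits if m in KIAA0319_MARKERS}
    return dcdc2, kiaa


def has_markers_in_both_genes(dcdc2_haplotypes, read1_alleles, kiaa0319_hits):
    """True when at least one DCDC2 marker and one KIAA0319 marker are found."""
    dcdc2, kiaa = detected_markers(dcdc2_haplotypes, read1_alleles, kiaa0319_hits)
    return bool(dcdc2) and bool(kiaa)
```

The sketch only reports which of the listed markers are present in the input. It does not estimate risk and does not model how genotype calls are produced from hybridization signals.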
[0024] In some embodiments, the array (e.g., microarray) detects
markers associated with susceptibility for developing language
impairment (LI) in humans. In some embodiments, the array comprises
discrete regions (e.g., discrete regions comprising one or more
nucleic acid fragments) capable of detecting markers in a DCDC2
gene, such as CACGAG, READ1 allele 6 (SEQ ID NO:36), rs793845,
rs2799373, rs793862, rs793834, rs2792682, rs807704, rs707864,
rs807694, or any combination thereof. In some embodiments, the
array (further) detects one or more markers in a KIAA0319 gene,
such as rs12193738, rs2817198, rs10456309, or any combination
thereof. In some embodiments, the array further comprises one or
more discrete regions comprising nucleic acid fragments spotted on
the support that detect one or more markers selected from rs985080,
rs1554690, rs2533096, rs6951437, rs344470, rs344468, rs482700,
rs7695228, rs1940309, rs505277, rs476739, rs867036, rs867035,
rs2071674, rs7694946, rs4823324, and markers for the following
genes: NEK2; DLEC1; NARS; IL4I1; PKD2; ATFS; NUP62; SIGLEC11; ACAN;
and PGD.
[0025] In some embodiments, the array (e.g., microarray) detects
markers associated with susceptibility for developing a reading
disability (RD) in humans. In some embodiments, the array comprises
discrete regions (e.g., discrete regions comprising one or more
nucleic acid fragments) capable of detecting markers in a DCDC2
gene, such as CGCGAG, READ1 allele 5 (SEQ ID NO:35), or both. In some
embodiments, the array (further) detects one or more markers in a
KIAA0319 gene, such as rs10456309. In some embodiments, the array
further comprises one or more discrete regions comprising nucleic
acid fragments spotted on the support that detect one or more
markers selected from rs180950, rs2590673, rs892100, rs1792745,
rs12546767, rs12634033, rs892270, rs10887149, rs10041417,
rs6792971, rs4725745, rs12444778, rs1444186, rs2294691, rs10456309,
rs1562422, and markers for the following genes: MAP4; OR2L8;
CRYBA4; OR2T8; KIAA1622; OR2AK2; DHX30; GEMIN6; C20orf10; and
PPIF.
[0026] In some embodiments, an array (e.g., microarray) is provided
that detects one or more markers associated with susceptibility for
developing a LD in humans, wherein the one or more markers are
selected from rs12636438; rs1679255; rs9521789; rs1983931;
rs9814232; rs7995158; rs6573225; rs4082518; rs442555; rs259521;
rs482700; rs7695228; rs1940309; rs505277; rs476739; rs867036;
rs867035; rs2071674; rs7694946; rs4823324; rs180950; rs2590673;
rs892100; rs1792745; rs12546767; rs12634033; rs892270; rs10887149;
rs10041417; rs6792971; rs12636438; rs1679255; rs9521789; rs476739;
rs505277; rs482700; rs7695228; rs867036; rs867035; rs1940309;
rs16889556; rs1047782; rs1530680; rs12667130; rs6965855; rs985080;
rs4726782; rs1718101; rs10487689; rs1918296; rs737533; rs793845;
rs2799373; rs793862; rs793834; rs2792682; rs807704; rs707864;
rs12193738; rs2817198; rs10456309; rs985080; rs1554690; rs2533096;
rs6951437; rs344470; rs344468; rs4725745; rs12444778; rs1444186;
rs2294691; rs10456309; rs1562422; rs807694; rs3756814; rs3777663;
rs9295626; rs7763790; rs6935076; rs9348646; rs2328791; rs2328791;
rs2817201, rs9295626; rs4576240; rs17307478, rs9356939, rs7763790,
rs6456621; rs6456624, rs6935076, rs2038137, rs3756821, rs1883593,
rs3212236; rs3777663, rs3756814, rs6931809, rs6916186, rs6933328,
rs17491647; rs2328791; rs33914824a; rs807694a; rs707864a;
rs10456301a; rs16889066a; rs9379651a; rs2817201; rs9295626;
rs10456309; rs4576240; rs17307478; rs9356939; rs7763790; rs6456621;
rs3756821; rs1883593; rs3212236; rs2294691; rs3777663; rs3756814;
rs6931809; rs6916186; rs6933328; rs17491647; rs9348646; rs1562422
and markers of the following genes: R5H2; OR5H6; RRAGA; OR6B3; UMOD;
A26C1A; FAM29A; CHRNA1; IFIT5; LOC643905; NEK2; DLEC1; NARS; IL4I1;
PKD2; ATFS; NUP62; SIGLEC11; ACAN; PGD; MAP4; OR2L8; CRYBA4; OR2T8;
KIAA1622; OR2AK2; DHX30; GEMIN6; C20orf10; and PPIF.
BRIEF DESCRIPTION OF DRAWINGS
[0027] FIG. 1: (A) Structure of the BV677278 STR. (B) Location of
the risk haplotype block within the DCDC2 gene, relative to
BV677278. Exons are numbered.
[0028] FIG. 2: (A-B) SILAC results for Raji and HeLa cells;
two-dimensional interaction plot. (C) ChIP results. α-H3: antibody
to a histone H3 variant enriched in actively transcribing genes.
β-actin: control amplicon from the β-actin gene. Error bars
represent standard deviation among 3 replicates. Double asterisk
(**) represents a p-value below 0.01 (one-tailed T-test; see Table
S6).
[0029] FIG. 3: (A) Effect of risk haplotype carrier status on
various reading, language, and cognitive phenotypes (described in
Table S1 and Materials and Methods of Examples Section). Data
points represent the mean of each group, converted to a z-score
relative to the mean of the ALSPAC sample population. Units of the
y-axis are fractions of a standard deviation. PD: phoneme deletion
task; Reading7: single-word reading at age 7; NW Reading: non-word
reading at age 9; Spelling 7 and 9: spelling at ages 7 and 9; WOLD:
Wechsler Objective Learning Dimensions verbal comprehension task;
NWR: non-word repetition task. (B) Model of differential effects of
BV677278 alleles. ETV6 monomers must at least homodimerize through
their pointed (PNT) domains to bind DNA through their ETS domains,
and they may homopolymerize in vivo. Indels of BV677278 repeat
units could change the size of the ETV6 polymer, and thus affect
target gene expression.
[0030] FIG. 4: Phylogenetic tree based on multiple alignment of 22
BV677278 alleles. The Clustal W algorithm was used, with default
method parameters (IUB matrix, gap penalty=15, gap extension
penalty=6.66). Clade 1, which contains risk alleles 5 and 6, is the
topmost branched grouping, in light grey (not including the branch
for Allele 22).
[0031] FIG. 5: Epistasis of READ1 over KIAA0319 Risk Haplotype. (A)
This plot shows the effect of having at least one copy of the
denoted READ1 allele alone (e.g., allele 5), at least one copy of
KIAHap alone, and at least one copy of each (both), compared to all
members of the ALSPAC. (B) This plot shows the protective effects
of having a READ1 allele comprising a single copy of Repeat Unit 1
(RU1_1). Data points represent the z-transformed mean of each
group, compared to the mean of the entire ALSPAC (Mean_All), on the
indicated measures. Units of the y-axis are fractions of a standard
deviation. Verbal, Performance, and Total IQ were measured at age 8
by the WISC-III. PD: phoneme deletion task at age 7; Reading 7:
single-word reading task at age 7; Reading 9: single-word reading
task at age 9; NW Reading: single non-word reading task at age 9,
Spelling 7: spelling task at age 7; Spelling 9: spelling task at
age 9.
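For illustration, the z-transformed group means plotted in FIG. 3 and FIG. 5 can be computed as the difference between a group mean and the mean of the whole sample, divided by the standard deviation of the whole sample, so that the y-axis is in fractions of a standard deviation. The Python sketch below is a minimal example under the assumption that the population standard deviation of the whole sample is used; the text does not state which estimator was applied, and the function name is hypothetical.

```python
# Illustrative sketch only. Use of the population standard deviation
# (pstdev) is an assumption; the source does not specify the estimator.
from statistics import mean, pstdev


def group_mean_z(group_scores, all_scores):
    """Express a group's mean score in standard deviations of the full sample."""
    overall_mean = mean(all_scores)
    overall_sd = pstdev(all_scores)
    if overall_sd == 0:
        raise ValueError("all_scores has zero variance")
    return (mean(group_scores) - overall_mean) / overall_sd
```

A value of -0.5 from such a function would correspond to a group whose mean lies half a standard deviation below the mean of the entire sample on the measure in question.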
[0032] FIG. 6: (A) Schematic of the genes within the DYX2 locus on
chromosome 6p21.3. Genes in light grey, DCDC2 and KIAA0319, have
replicated associations with written and verbal language
phenotypes, namely RD and LI. Regions in dark grey within the genes
denote two functional variants, READ1 in DCDC2 and a risk haplotype
with markers in KIAA0319 and TDP2, which have been functionally
associated with RD and LI using animal models and molecular
techniques. (B) An updated schematic of genes with markers that
show replicated associations to RD, LI, and/or IQ. The genes (shown
in light grey) have expanded to seven (DCDC2, KIAA0319, TDP2,
ACOT13, C6orf62, FAM65B, and CMAHP), although linkage
disequilibrium may account for multiple associations (particularly
for KIAA0319, TDP2, ACOT13, and C6orf62).
DETAILED DESCRIPTION
[0033] Variant DCDC2 and Variant KIAA0319 Polynucleotide Probes and
Primers
Provided here are isolated, synthetic and recombinant
polynucleotides that detect an alteration in a DCDC2 gene (referred
to as a variant DCDC2 gene) that is associated with susceptibility
to developing a learning disability (LD), such as isolated and
recombinant polynucleotides that detect an alteration of DCDC2 in
the DYX2 locus. The variant is, for example, a DCDC2 risk haplotype
(e.g., CGCGAG, CACGAG), allele 5 of BV677278 (READ1); allele 6 of
BV677278 (READ1); or one, two or three or more of the variants
associated with susceptibility to developing a learning disability
(LD). Also provided are isolated, synthetic and recombinant
polynucleotides that detect an alteration in a KIAA0319 gene
(referred to as a variant KIAA0319 gene) that is associated with
susceptibility to developing a learning disability (LD), such as
isolated and recombinant polynucleotides that detect an alteration
of KIAA0319 in the DYX2 locus. The variant is, for example, a
KIAA0319 risk haplotype (e.g., rs4504469, rs2038137, rs2143340); or
one, two or three or more of the variants associated with
susceptibility to developing a learning disability (LD). The LD is
a reading disability (RD) or a language impairment (LI).
Polynucleotide probes typically have a sequence which is fully or
partially complementary to the sequence of the alteration and the
flanking region and hybridize to the alteration of interest, and
the flanking sequence in a specific manner. A variety of
alterations in a DCDC2 gene or in a KIAA0319 gene associated with
susceptibility for developing LD, such as RD and LI, may be
detected by the polynucleotides described herein. For example, a
single nucleotide polymorphism (SNP) of a coding region, exon,
exon-intron boundary, signal peptide, 5-prime untranslated region,
promoter region, enhancer sequence, 3-prime untranslated region or
intron that is associated with LD such as RD and LI can be
detected. These polymorphisms include, but are not limited to,
those that result in changes in the amino acid sequence of the
proteins encoded by the DCDC2 gene and changes in the amino acid
sequence of the proteins encoded by the KIAA0319 gene, produce
alternative splice products, create truncated products, introduce a
premature stop codon, introduce a cryptic exon, alter the degree of
expression to a greater or lesser extent, alter tissue specificity
of DCDC2 and/or KIAA0319 expression (e.g., at either the mRNA or
protein level), introduce changes in the tertiary structure of the
proteins encoded by DCDC2 and/or KIAA0319, introduce changes in the
binding affinity or specificity of the proteins expressed by DCDC2
and/or KIAA0319 or alter the function of the proteins encoded by
DCDC2 and/or KIAA0319. The subject polynucleotides include
polynucleotides that are variants of the polynucleotides described
herein, as long as the variant polynucleotides maintain their
ability to specifically detect a variation in the DCDC2 gene that
is associated with susceptibility for developing LD, such as RD
and/or LI or in the KIAA0319 gene that is associated with
susceptibility for developing LD, such as RD and/or LI. Variant
polynucleotides may include, for example, sequences that differ by
one or more nucleotide substitutions, additions or deletions.
[0034] In certain embodiments, the isolated or recombinant
polynucleotide is a probe that hybridizes, under stringent
conditions, such as highly stringent conditions, to an alteration
in the DCDC2 gene that is associated with susceptibility for
developing a LD, or to an alteration in the KIAA0319 gene that is
associated with susceptibility for developing a LD. A LD can be,
for example, a reading disability (RD) or language impairment (LI).
As used herein, the term "hybridization" is used in reference to
the pairing of complementary nucleic acids. The term "probe" refers
to a polynucleotide that is capable of hybridizing to another
nucleic acid of interest. The polynucleotide may be naturally
occurring, as in a purified restriction digest, or it may be
produced synthetically, recombinantly or by nucleic acid
amplification (e.g., PCR amplification).
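As a simple illustration of the pairing of complementary nucleic acids, a probe that is fully complementary to a target strand has a sequence equal to the reverse complement of that target. The minimal Python sketch below uses made-up sequences and hypothetical function names; it treats only the four unmodified DNA bases and does not model partial complementarity or hybridization conditions.

```python
# Illustrative sketch only. Sequences are written 5' to 3' and are
# restricted to the bases A, C, G and T.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_fully_complementary(probe, target):
    """True when the probe pairs with every base of the target strand."""
    return reverse_complement(probe) == target.upper()
```

For example, under this sketch a probe of sequence GGCCAT would be reported as fully complementary to a target of sequence ATGGCC.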
[0035] It is well known in the art how to perform hybridization
experiments with nucleic acid molecules. The skilled artisan is
familiar with hybridization conditions and that appropriate
stringency conditions which promote DNA hybridization can be
varied. Such hybridization conditions are referred to in standard
text books such as Molecular Cloning: A Laboratory Manual, Cold
Spring Harbor Laboratory (1989); and Current Protocols in Molecular
Biology, eds. Ausubel et al., John Wiley & Sons: 1992. In one
embodiment, the polynucleotides hybridize to a variation in the
DCDC2 gene, to a variation in a KIAA0319 gene, or to both a
variation in the DCDC2 gene and the KIAA0319 gene (e.g., use of
distinct probes that hybridize to each gene, respectively). Under
highly stringent conditions, essentially no hybridization to
unrelated polynucleotides occurs.
[0036] Nucleic acid hybridization is affected by such conditions as
salt concentration, temperature, organic solvents, base
composition, length of the complementary strands, and the number of
nucleotide base mismatches between the hybridizing nucleic acids,
as will readily be appreciated by those skilled in the art.
Stringent temperature conditions will generally include
temperatures in excess of 30° C., or may be in excess of
37° C. or 45° C. Stringent salt conditions will
ordinarily be less than 1000 mM, or may be less than 500 mM or 200
mM. For example, one could perform the hybridization at
6.0× sodium chloride/sodium citrate (SSC) at about 45°
C., followed by a wash of 2.0× SSC at 50° C. For
example, the salt concentration in the wash step can be selected
from a low stringency of about 2.0× SSC at 50° C. to a
high stringency of about 0.2× SSC at 50° C. In
addition, the temperature in the wash step can be increased from
low stringency conditions at room temperature, about 22° C.,
to high stringency conditions at about 65° C. Both
temperature and salt may be varied, or temperature or salt
concentration may be held constant while the other variable is
changed. In one embodiment, nucleic acids hybridize under low
stringency conditions of 6.0× SSC at room temperature followed
by a wash at 2.0× SSC at room temperature. The combination of
parameters, however, is much more important than the measure of any
single parameter. See, e.g., Wetmur and Davidson, J Mol Biol. 1968;
31(3):349-70. Probe sequences may also hybridize specifically to
duplex DNA under certain conditions to form triplex or higher order
DNA complexes. The preparation of such probes and suitable
hybridization conditions are well known in the art. One method for
obtaining DNA encoding the biosynthetic constructs disclosed herein
is by assembly of synthetic oligonucleotides produced in a
conventional, automated, oligonucleotide synthesizer.
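To relate the SSC strengths mentioned above to molar salt concentrations, the following minimal Python sketch converts an SSC strength into approximate sodium chloride and sodium citrate concentrations. It assumes the commonly used formulation in which 20× SSC is 3 M sodium chloride and 0.3 M sodium citrate, that is, 150 mM and 15 mM at 1×; this composition is not stated in the text above, and the function names are hypothetical.

```python
# Illustrative sketch only. Assumes 1x SSC = 150 mM NaCl + 15 mM sodium
# citrate (20x SSC = 3 M NaCl + 0.3 M sodium citrate).
NACL_MM_PER_X = 150.0
CITRATE_MM_PER_X = 15.0


def ssc_to_mM(strength_x):
    """Return (NaCl mM, sodium citrate mM) for a given SSC strength."""
    if strength_x < 0:
        raise ValueError("SSC strength cannot be negative")
    return strength_x * NACL_MM_PER_X, strength_x * CITRATE_MM_PER_X


def is_more_stringent_wash(strength_a_x, strength_b_x):
    """At equal temperature, the lower-salt wash is the more stringent one."""
    return strength_a_x < strength_b_x
```

Under this assumption, a 6.0× SSC hybridization corresponds to roughly 900 mM sodium chloride and a 0.2× SSC wash to roughly 30 mM, which illustrates why the 0.2× wash is the higher-stringency condition at a given temperature.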
[0037] A polynucleotide probe or primer used in a method described
herein may be labeled with a reporter molecule, so that it is
detectable in a detection system, including, but not limited to,
enzyme (e.g., ELISA, as well as enzyme-based histochemical assays),
fluorescent, radioactive, chemical, and luminescent systems. A
polynucleotide probe or primer used in a method described herein
may further include a quencher moiety that, when placed very close
to a label (e.g., a fluorescent label), causes there to be little
or no signal from the label. It is not intended that the present
invention be limited to any particular detection system or
label.
[0038] In another embodiment, the isolated polynucleotide is a
primer that hybridizes adjacent, upstream, or downstream to an
alteration in the DCDC2 gene or the KIAA0319 gene that is
associated with susceptibility for developing a LD in humans. For
example, a polynucleotide primer can hybridize adjacent, upstream,
or downstream to an alteration in the DCDC2 gene or adjacent,
upstream, or downstream to an alteration in the KIAA0319 gene that
is associated with susceptibility for developing a LD (e.g., RD,
LI). As used herein, the term "primer" refers to a polynucleotide
that acts as a point of initiation of nucleic acid synthesis when
placed under conditions in which synthesis of a primer extension
product that is complementary to a nucleic acid strand is induced
(e.g., in the presence of nucleotides, an inducing agent such as
DNA polymerase, and suitable temperature, pH, and electrolyte
concentration). Alternatively, the primer ligates to a proximal
nucleic acid when placed under conditions in which ligation of two
unlinked nucleic acids is induced (e.g., in the presence of a
proximal nucleic acid, an inducing agent such as DNA ligase, and
suitable temperature, pH, and electrolyte concentration). A
polynucleotide primer may be naturally occurring, as in a purified
restriction digest, or may be produced synthetically. The primer is
single stranded or double stranded. If double stranded, the primer
is treated to separate its strands before being used. The primer
can be an oligodeoxyribonucleotide. The exact lengths of the
primers will depend on many factors, including temperature, source
of primer and the method used.
[0039] One embodiment is a pair of primers that specifically detect
an alteration in a DCDC2 gene or an alteration in a KIAA0319 gene
that is associated with susceptibility for developing a LD. In such
a case, the first primer hybridizes upstream from the alteration
and a second primer hybridizes downstream from the alteration. One
of the primers hybridizes to one strand of a region of DNA that
comprises an alteration in the DCDC2 gene or in the KIAA0319 gene
that is associated with susceptibility for developing LD and the
second primer hybridizes to the complementary strand of a region of
DNA that comprises an alteration in the DCDC2 gene or in the
KIAA0319 gene that is associated with susceptibility for developing
a LD. As used herein, the term "region of DNA" refers to a
sub-chromosomal length of DNA. Other embodiments are pairs of
primers that specifically detect alterations in other genes,
described herein, that are associated with a susceptibility for
developing a learning disability. A further embodiment is a set of
three primers useful for distinguishing between two alleles of
DCDC2, wherein the first allele is a non-deleted DCDC2 gene (e.g.,
an allele that does not comprise a deletion of READ1) and the
second allele comprises a deletion in the DCDC2 gene (e.g.,
comprises allele 39/indicated Del. in Table S4, Example 1) that is
associated with susceptibility for LD. The first primer hybridizes
to a nucleotide sequence that is common to both alleles, such as a
non-allelic nucleotide sequence that is upstream or downstream of
the polymorphic sequence in the DCDC2 gene. The second primer
specifically hybridizes to a nucleotide sequence that is unique to
a first allele (e.g., a non-deleted DCDC2 gene). The third primer
specifically hybridizes to a nucleotide sequence that is unique to
the second allele (e.g., a deletion in the DCDC2 gene that is
associated with susceptibility for RD). Use of the set of three
primers results in amplification of a region of DNA that is
dependent on which DCDC2 allele is present in the sample.
Alternatively, two primers of the set hybridize to a nucleotide
sequence that is common to two alleles of the DCDC2 gene, such as
non-allelic nucleotide sequences that are upstream and downstream
of a polymorphic sequence in the DCDC2 gene, and the third primer
specifically hybridizes to one of the two alleles of the DCDC2
gene.
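The three-primer approach described above can be illustrated by a short sketch in which each allele gives an amplification product of a characteristic size, and the genotype is inferred from which products are observed. In the Python example below, the product sizes (300 bp and 200 bp), the 10 bp tolerance, and the function names are all hypothetical values chosen for illustration; actual sizes depend on primer placement.

```python
# Illustrative sketch only. Expected product sizes and the matching
# tolerance are hypothetical and depend on the primers actually used.
EXPECTED_BP = {"non-deleted": 300, "deleted": 200}


def alleles_from_bands(band_sizes_bp, expected=EXPECTED_BP, tolerance_bp=10):
    """Return the alleles whose expected product matches an observed band."""
    found = set()
    for allele, size in expected.items():
        if any(abs(band - size) <= tolerance_bp for band in band_sizes_bp):
            found.add(allele)
    return found


def call_genotype(band_sizes_bp):
    """Infer a genotype label from the observed product sizes."""
    found = alleles_from_bands(band_sizes_bp)
    if found == {"non-deleted", "deleted"}:
        return "heterozygous"
    if found == {"non-deleted"}:
        return "homozygous non-deleted"
    if found == {"deleted"}:
        return "homozygous deleted"
    return "no call"
```

A sample showing both products would be labeled heterozygous, while a sample showing neither product returns "no call" rather than a genotype, since failed amplification cannot be distinguished from absence of both alleles in this simple model.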
[0040] Variant DCDC2 Polynucleotide Probes and Primers
[0041] The polynucleotides may be used in any assay that permits
detection of a variation in the DCDC2 gene that is associated with
susceptibility for developing a LD (e.g., RD, LI). Such methods may
encompass, for example, hybridization-mediated, ligation-mediated,
or primer extension-mediated methods of detection. Furthermore, any
combination of these methods may be utilized in the invention.
[0042] In one embodiment, the polynucleotides detect an alteration
in the DCDC2 gene that is associated with susceptibility for
developing a LD by amplifying a region of DNA that comprises the
alteration in the gene. Any method of amplification may be used. In
a second embodiment, the polynucleotides detect an alteration in
the KIAA0319 gene that is associated with susceptibility for
developing a LD by amplifying a region of DNA that comprises the
alteration in the gene. In one specific embodiment, a region of DNA
comprising an alteration is amplified by using polymerase chain
reaction (PCR). See, e.g., Ann. Rev. Biochem., 61:131-156 (1992); Gilliland
et al, Proc. Natl. Acad. Sci., 87: 2725-2729 (1990); Bevan et al,
PCR Methods and Applications, 1: 222-228 (1992); Green et al, PCR
Methods and Applications, 1: 77-90 (1991); Blackwell et al,
Science, 250: 1104-1110 (1990). PCR refers, for example, to the
method of Mullis (see, e.g., U.S. Pat. Nos. 4,683,195, 4,683,202, and
4,965,188, herein incorporated by reference), which describes a
method for increasing the concentration of a region of DNA, in a
mixture of genomic DNA, without cloning or purification. For
example, the polynucleotide primers described herein are
combined with a DNA mixture (or any polynucleotide
sequence that can be amplified with the polynucleotide primers),
wherein the DNA comprises the DCDC2 gene and/or the KIAA0319 gene.
The mixture also includes the amplification reagents (e.g.,
deoxyribonucleotide triphosphates, buffer, etc.) necessary for the
thermal cycling reaction. According to standard PCR methods, the
mixture undergoes a series of denaturation, primer annealing, and
polymerase extension steps to amplify the region of DNA that
comprises a variation in the DCDC2 gene and/or a variation in the
KIAA0319 gene. The length of the amplified region of DNA is
determined by the relative positions of the primers with respect to
each other and, therefore, this length is a controllable parameter.
For example, hybridization of the primers may occur such that the
ends of the primers proximal to the mutation are separated by 1 to
10,000 base pairs (e.g., 10 base pairs (bp), 50 bp, 200 bp, 500 bp,
1,000 bp, 2,500 bp, 5,000 bp, or 10,000 bp).
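Because the length of the amplified region follows directly from where the two primers anneal, it can be computed from primer coordinates. The Python sketch below is a minimal example that assumes 1-based inclusive coordinates on a single reference strand; the coordinate convention, the function names, and the numbers used in the usage note are assumptions for illustration.

```python
# Illustrative sketch only. Coordinates are 1-based and inclusive on the
# reference strand; this convention is an assumption.


def amplicon_length(fwd_5prime, rev_5prime):
    """Length of the product spanning both primers.

    fwd_5prime: reference position of the forward primer's 5' base
    rev_5prime: reference position paired with the reverse primer's 5' base
    """
    if rev_5prime < fwd_5prime:
        raise ValueError("reverse primer must lie downstream of forward primer")
    return rev_5prime - fwd_5prime + 1


def inter_primer_gap(fwd_5prime, fwd_len, rev_5prime, rev_len):
    """Number of bases between the 3' ends of the two primers."""
    fwd_3prime = fwd_5prime + fwd_len - 1
    rev_3prime = rev_5prime - rev_len + 1
    return rev_3prime - fwd_3prime - 1
```

For instance, two 20-nucleotide primers whose 5' bases correspond to reference positions 101 and 400 give a 300 bp product with 260 bp between the primer ends proximal to the alteration.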
[0043] Standard instrumentation is used for amplification of DNA
and detection of amplified DNA. For example, a wide variety of
instrumentation has been developed for carrying out nucleic acid
amplifications, particularly PCR, e.g. Johnson et al, U.S. Pat. No.
5,038,852 (computer-controlled thermal cycler); Wittwer et al,
Nucleic Acids Research, 17: 4353-4357 (1989) (capillary tube PCR);
Hallsby, U.S. Pat. No. 5,187,084 (air-based temperature control);
Garner et al, Biotechniques, 14: 112-115 (1993) (high-throughput
PCR in 864-well plates); Wilding et al, International application
No. PCT/US93/04039 (PCR in micro-machined structures); Schnipelsky
et al, European patent application No. 90301061.9 (publ. No.
0381501 A2) (disposable, single use PCR device). In certain
embodiments, real-time PCR or other methods known in the art, such
as the TaqMan assay, are used.
[0044] Amplified DNA may be analyzed by several different methods.
Such methods for analyzing the amplified DNA include, but are not
limited to, (Sanger) sequencing of the DNA, determining the size of
the fragment by electrophoresis or chromatography, hybridization
with a labeled probe, hybridization to a DNA array or microarray,
incorporation of biotinylated primers followed by avidin-enzyme
conjugate detection, or incorporation of ³²P-labeled
deoxynucleotide triphosphates, such as dCTP or dATP, into the
amplified segment. In one embodiment, the amplified DNA is analyzed
by gel electrophoresis. Methods of gel electrophoresis are well
known in the art. See for example, Current Protocols in Molecular
Biology, eds. Ausubel et al., John Wiley & Sons: 1992.
Amplified DNA can be visualized, for example, by fluorescent or
radioactive means. The DNA may also be transferred to a solid
support such as a nitrocellulose membrane and subjected to Southern
Blotting following gel electrophoresis. In one aspect, the DNA is
analyzed by electrophoresis and exposed to ethidium bromide and
visualized under ultra-violet light.
[0045] In one aspect, the alteration in the DCDC2 gene that is
associated with susceptibility for developing RD is a deletion. The
deletion may be detected using polynucleotide primers described
herein. For example, a set of three primers may be used to
distinguish between an allele of the DCDC2 gene that comprises a
deletion and a wildtype DCDC2 gene. Use of the set of three primers
results in amplification of a region of DNA that is dependent on
which DCDC2 allele is present in the sample. In some instances, a
deletion is protective, such as allele 39/Del in Table S4 of
Example 1. In some embodiments, alterations or variants are
protective.
[0046] In another embodiment, amplified DNA is analyzed by DNA
sequencing. DNA sequence determination may be performed by standard
methods such as dideoxy chain termination technology (Sanger
sequencing) and gel-electrophoresis, or by other methods such as by
pyrosequencing (Biotage AB, Uppsala, Sweden). The nucleic acid
sequence of the amplified DNA can be compared to the nucleic acid
sequence of wild type DNA to identify whether a variation in the
DCDC2 and/or KIAA0319 gene that is associated with susceptibility
for developing LD is present.
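The comparison of an amplified sequence to a wild type sequence can be sketched in a few lines when the two sequences are already aligned and of equal length. The Python example below is a minimal illustration that reports single-base substitutions only; insertions and deletions would require an alignment step that is not shown, and the function name and example sequences are hypothetical.

```python
# Illustrative sketch only. Assumes the two sequences are pre-aligned and
# of equal length, so only substitutions can be reported.


def find_substitutions(sample, wild_type):
    """Return (position, wild-type base, sample base) for each mismatch.

    Positions are 1-based.
    """
    if len(sample) != len(wild_type):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, w, s)
        for i, (w, s) in enumerate(zip(wild_type.upper(), sample.upper()))
        if w != s
    ]
```

With a wild type sequence of ACGATA and a sample sequence of ACGTTA, the sketch reports a single difference at position 4, where A is replaced by T. Whether a reported difference is one of the variations associated with susceptibility for developing LD must be determined separately.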
[0047] In another embodiment, the polynucleotides of the disclosure
detect an alteration in the DCDC2 gene that is associated with
susceptibility for developing a LD by hybridization-mediated
methods. In a further embodiment, the polynucleotides detect an
alteration in the KIAA0319 gene that is associated with
susceptibility for developing a LD by hybridization-mediated
methods. In one embodiment, a polynucleotide probe hybridizes to an
alteration in the DCDC2 gene that is associated with susceptibility
for developing a LD (and to flanking nucleotides), but not to a
wild type DCDC2 gene. In another embodiment, a polynucleotide probe
hybridizes to an alteration in the KIAA0319 gene that is associated
with susceptibility for developing a LD (and flanking nucleotides),
but not to a wild type KIAA0319 gene. The polynucleotide probe may
comprise nucleotides that are fluorescently, radioactively, or
chemically labeled to facilitate detection of hybridization.
Hybridization may be performed and detected by standard methods
known in the art, such as by Northern blotting, Southern blotting,
fluorescent in situ hybridization (FISH), or hybridization to
polynucleotides on a solid support (e.g., DNA arrays, microarrays,
cDNA arrays, or Affymetrix chips). In one embodiment, the
polynucleotide probe is used to hybridize genomic DNA by FISH. FISH
can be used, for example, in metaphase cells, to detect a deletion
in genomic DNA. Using FISH, genomic DNA is denatured to separate
the complementary strands within the DNA double helix structure.
The polynucleotide probe is combined with the denatured genomic
DNA. If an alteration in the DCDC2 gene that is associated with
susceptibility for developing a LD or an alteration in the KIAA0319
gene that is associated with susceptibility for developing a LD is
present, the probe will hybridize to the genomic DNA. The probe
signal (e.g., fluorescence) can be detected through a fluorescent
microscope for the presence or absence of signal. The absence of
signal indicates the absence of an alteration in the DCDC2 gene
that is associated with susceptibility for developing a LD (e.g.,
RD, LI) or the absence of an alteration in the KIAA0319 gene that
is associated with susceptibility for developing a LD (e.g., RD,
LI). Alternatively, presence of signal can be used to determine the
absence of an alteration in the DCDC2 or KIAA0319 gene.
[0048] In another embodiment, the polynucleotides detect an
alteration in the DCDC2 gene that is associated with susceptibility
for developing a LD (e.g., RD, LI) or an alteration in the KIAA0319
gene that is associated with susceptibility for developing a LD
(e.g., RD, LI) by primer extension with DNA polymerase. In one
embodiment, a polynucleotide primer hybridizes immediately adjacent
to the alteration. A single base sequencing reaction using labeled
dideoxynucleotide terminators may be used to detect the alteration.
If an alteration is present, the labeled terminator will be
incorporated into extension product; if an alteration is not
present, the labeled terminator will not be incorporated. In
another aspect, a polynucleotide primer hybridizes to an alteration
in the DCDC2 gene that is associated with susceptibility for
developing a LD or an alteration in the KIAA0319 gene that is
associated with susceptibility for developing a LD. The primer, or
a portion thereof, will not hybridize to a wild type DCDC2 or wild
type KIAA0319 gene. If an alteration is present, primer extension
occurs; if an alteration is not present, primer extension does not
occur. The primers and/or nucleotides may further include
fluorescent, radioactive, or chemical probes. A primer labeled by
primer extension may be detected by measuring the intensity of the
extension product, such as by gel electrophoresis, mass
spectrometry, or any other method for detecting fluorescent,
radioactive, or chemical labels.
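The single-base extension embodiment can be illustrated with a simplified model in which the primer anneals immediately adjacent to the site of interest and the polymerase adds one terminator. In the Python sketch below, sequences are written on the strand that the extension product copies, so the primer matches that strand upstream of the site and the incorporated base equals the next base on it. This representation, the function names, and the example sequences are assumptions for illustration, and the model ignores mismatches and reaction conditions.

```python
# Illustrative sketch only. The primer is modeled as an exact match to the
# reference strand immediately upstream of the site of interest.


def extended_base(reference_strand, primer):
    """Return the base added after the primer, or None if no extension."""
    reference = reference_strand.upper()
    start = reference.find(primer.upper())
    if start == -1:
        return None  # primer does not anneal in this simple model
    site = start + len(primer)
    if site >= len(reference):
        return None  # nothing left to copy
    return reference[site]


def labeled_terminator_incorporated(reference_strand, primer, variant_base):
    """True when a terminator specific for the variant base would be added."""
    return extended_base(reference_strand, primer) == variant_base.upper()
```

In this model, supplying only the labeled terminator that corresponds to the variant base gives a labeled extension product when the alteration is present and no labeled product when it is absent.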
[0049] In another embodiment, the polynucleotides detect an
alteration in the DCDC2 gene that is associated with susceptibility
for developing a LD or an alteration in the KIAA0319 gene that is
associated with susceptibility for developing a LD by ligation. In
one aspect, a polynucleotide primer hybridizes to a variation in
the DCDC2 gene that is associated with susceptibility for
developing a LD or to a variation in the KIAA0319 gene that is
associated with susceptibility for developing a LD. The primer will
not hybridize to the wild type gene (e.g., wild type DCDC2 gene). A
second polynucleotide that hybridizes to a region of the DCDC2 gene
immediately adjacent to the first primer or to a region of the
KIAA0319 gene immediately adjacent to the first primer is also
provided. One, or both, of the polynucleotide primers may be
fluorescently, radioactively, or chemically labeled. Ligation of
the two polynucleotide primers occurs in the presence of DNA ligase
if an alteration in the gene (e.g., an alteration in the DCDC2 or KIAA0319
gene) that is associated with susceptibility for developing a LD is
present. Ligation may be detected by gel electrophoresis, mass
spectrometry, or by measuring the intensity of fluorescent,
radioactive, or chemical labels.
EXAMPLES
[0050] The following examples are for illustrative purposes and are
not intended to be limiting in any way.
Example 1
Materials and Methods
[0051] Subject Recruitment, Data and DNA Collection, and Data
Management.
[0052] Subject recruitment and collection of phenotype data and DNA
for the ALSPAC cohort was done by the ALSPAC team, under the
supervision of S. M. Ring and these data were managed for this
study by L. L. Miller.
[0053] The Avon Longitudinal Study of Parents and Children
(ALSPAC)
[0054] The ALSPAC is a prospective birth cohort based in the Avon
region of the United Kingdom. It consists of children mostly of
northern European descent, born in 1991 and 1992. Children were
recruited before birth; recruitment of their pregnant mothers
resulted in a total of 15,458 fetuses, of whom 14,701 were alive at
1 year of age. Details regarding the participants, recruitment, and
study methodologies are described elsewhere
(bristol.ac.uk/alspac). The children of the ALSPAC have been
extensively phenotyped from before birth to early adulthood. An
update on the status of the cohort was published in 2012 (S11).
The reading, language, and cognitive measures used for this study
were collected at ages 7, 8, and 9 years. DNA samples from 10,676
of these children were available for genotyping. Of this subset,
the number of children who completed the language and cognitive
measures varies by measure, but is generally 5200-5600
subjects.
[0055] ALSPAC Reading Measures
[0056] Reading measures in the ALSPAC include a phoneme deletion
task at age 7, single-word reading at ages 7 and 9, spelling at
ages 7 and 9, single non-word reading at age 9, and passage
comprehension, speed and accuracy at age 9. The phoneme deletion
task measures phoneme awareness (S12), which is widely considered
to be a core deficit in RD (S13). The child listens to a word
spoken aloud, and is then asked to remove a specific phoneme from
that word to make a new word (e.g., what word is created when
the /b/ sound is removed from the word `block`? `Lock`). This task is
also known as the Auditory Analysis Test, and was developed by
Rosner and Simon (S14). Single-word reading was assessed at age 7
using the reading subtest of the Wechsler Objective Reading
Dimensions (WORD) (S15). At ages 7 and 9, spelling was assessed; the
child was asked to spell a set of 15 age-adjusted words (S15). At
age 9, single-word reading was again assessed by asking the child
to read ten real words and ten non-words aloud. The words and
non-words used are a subset of a larger list of words and non-words
taken from research conducted by Terezinha Nunes and others at
Oxford (S16). Reading speed, accuracy, and comprehension scores
were ascertained at age 9, using the Neale Analysis of Reading
Ability (NARA-II) (S17). All three measures are standardized. The
child read passages from a booklet aloud and immediately afterward
was asked questions about what he/she read to assess reading
comprehension. Accuracy was measured by counting the number of
mistakes (mispronunciations, substitutions, etc.) the child made
and converting to a standardized score. Reading speed was measured as
the number of words read per minute.
[0057] ALSPAC Language Measures
[0058] The language measures focused on for this study were
ascertained at 8 years of age. The first of these is a non-word
repetition (NWR) task, wherein the child is asked to repeat
recorded non-words. This task measures short-term phonological
memory and processing (S18). The second is a subtest of the
Wechsler Objective Language Dimensions that measures language
comprehension (WOLD Comp) (S19). For this task, the child is asked
a series of questions about a paragraph describing a picture, which
was read aloud by an examiner. Children with LI consistently
perform poorly on these measures (S20, S21).
[0059] ALSPAC IQ Assessment
[0060] Verbal, performance, and total IQ were assessed at age 8,
using the Wechsler Intelligence Scale for Children (WISC-III).
[0061] Ethical Approval
[0062] Ethical approval for the study was obtained from the ALSPAC
Ethics and Law Committee, the Local UK Research Ethics Committee,
and the Yale Human Investigation Committee.
[0063] DYX2 TagSNP Panel Design and Genotyping. TagSNPs designed to
capture the common variation in the DYX2 locus were selected using
the association study design server of Han et al.
(design.cs.ucla.edu) (33). SNPs were genotyped on the Sequenom
platform, in collaboration with the Yale Center for Genome Analysis
(West Haven, CT), as per standard protocols. Call rate and
descriptive statistics for the SNPs described herein are listed in
Table S4. rs4504469, rs2038137, and rs2143340 were genotyped by
Scerri et al., as described (24). Minor allele frequency for all
tagSNPs was greater than or equal to 0.05. Average power to capture
known common variants (MAF>0.05) within DYX2 using this panel
was estimated at 83.44% a priori. A number of other SNPs were
included in addition to the tagSNP panel, including several that
had been previously associated with RD and coding SNPs in
DCDC2.
[0064] Haplotype-Based Association Analysis.
[0065] Linkage disequilibrium was assessed and haplotypes defined
using the Haploview software package, version 4.2 (34). Markers
that deviated substantially from Hardy-Weinberg equilibrium, or
that had a call rate <85%, were not used for haplotype analysis.
The four-gamete rule option was used to demarcate haplotype blocks,
which resulted in 44 haplotype blocks covering the DYX2 locus.
Association analysis was performed with individual haplotypes that
had frequencies of 0.01 or greater (208 total), using the Plink
software package, version 1.07 (S4). The association analyses were
performed using chi squared and logistic regression test statistics
(--hap-assoc and --hap-logistic options). Individuals who were not
identified as non-Hispanic white, who had a total IQ below 75, or
whose DNA sample returned an average call rate below 85% for SNPs
that passed quality control, were excluded from association
analysis. To correct for multiple testing, a Bonferroni correction
with the alpha level set at 0.05 was applied, treating each of the
208 haplotypes as an individual test; the threshold level is
therefore 0.05/208=2.4038.times.10.sup.-4. As each phenotype
(phonological awareness and language) constituted an independent
hypothesis, this threshold was not adjusted further to account for
there being two phenotypes.
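The threshold arithmetic above can be reproduced with a short sketch. The alpha level (0.05) and the number of tests (208) are taken from the text; the function name is illustrative only and is not part of Plink or of the analysis pipeline actually used.

```python
# Bonferroni-corrected significance threshold for the haplotype association tests.
ALPHA = 0.05          # family-wise alpha level stated in the text
N_HAPLOTYPES = 208    # haplotypes with frequency >= 0.01, each treated as one test


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Return the per-test p-value threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


threshold = bonferroni_threshold(ALPHA, N_HAPLOTYPES)
print(f"{threshold:.4e}")  # 2.4038e-04
```

A haplotype association would be regarded as surviving correction when its p-value falls below this value.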
[0066] BV677278 Genotyping.
[0067] Carriers of the DCDC2 haplotypes of interest that could be
phased unequivocally (using Plink's --hap-phase function) were
genotyped for the BV677278 STR. BV677278 is genotyped by PCR
amplification and subsequent Sanger sequencing. Alleles are called
by an in-house Perl script developed by Y. Kong. The Perl script is
available upon request.
[0068] Amplification Primers
TABLE-US-00001 (SEQ ID NO: 1) STR_F:
5'-TGTAAAACGACGGCCAGTTGTTGAATCCCAGACCACAA-3' (SEQ ID NO: 2) STR_R:
5'-ATCCCGATGAAATGAAAAGG-3'
[0069] M13F Sequencing Primer
TABLE-US-00002 (SEQ ID NO: 3) 5'-TGTAAAACGACGGCCAGT-3'
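Comparing the sequences listed above, the 5' end of STR_F matches the M13F sequencing primer, which is consistent with the amplicon being sequenced from an M13 tail. The following sketch merely checks that relationship between the two listed sequences; the variable names are illustrative, and no claim is made about how the primers were designed.

```python
# Check whether the forward amplification primer carries the M13F sequence
# at its 5' end, using the sequences listed in SEQ ID NO: 1 and SEQ ID NO: 3.
M13F = "TGTAAAACGACGGCCAGT"
STR_F = "TGTAAAACGACGGCCAGTTGTTGAATCCCAGACCACAA"

has_m13_tail = STR_F.startswith(M13F)

# The remainder of STR_F after the tail (the presumed locus-specific portion).
locus_specific = STR_F[len(M13F):] if has_m13_tail else STR_F

print(has_m13_tail)
print(locus_specific)
```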
[0070] Amplification Reaction Mixture (per 10 .mu.l reaction)
[0071] 10.times.PCR Buffer (Qiagen): 1 .mu.l
[0072] MgCl.sub.2 (25 mM) (Qiagen): 0.4 .mu.l
[0073] dNTPs (10 mM): 0.25 .mu.l
[0074] Primer STR_F (10 .mu.M): 0.25 .mu.l
[0075] Primer STR_R (10 .mu.M): 0.25 .mu.l
[0076] HotStarTaq.TM. (Qiagen, diluted to 0.5 units/.mu.l in Taq
dilution buffer): 0.20 .mu.l
[0077] Template DNA: 1 .mu.l (.about.10 ng/.mu.l)
[0078] Nuclease-free H.sub.2O: 6.65 .mu.l
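As a sanity check, the component volumes listed above sum to the stated 10 .mu.l reaction volume. The sketch below verifies the total and scales the recipe to a master mix. The 10% overage and the practice of adding template DNA to each well separately are assumptions made for illustration, not details taken from the protocol.

```python
from decimal import Decimal

# Per-reaction volumes in microliters, as listed for the BV677278 STR amplification.
PER_REACTION_UL = {
    "10x PCR buffer": Decimal("1"),
    "MgCl2 (25 mM)": Decimal("0.4"),
    "dNTPs (10 mM)": Decimal("0.25"),
    "Primer STR_F (10 uM)": Decimal("0.25"),
    "Primer STR_R (10 uM)": Decimal("0.25"),
    "HotStarTaq (0.5 U/ul)": Decimal("0.20"),
    "Template DNA": Decimal("1"),
    "Nuclease-free H2O": Decimal("6.65"),
}

total_ul = sum(PER_REACTION_UL.values())  # expected to equal the 10 ul reaction volume


def master_mix(n_reactions: int, overage: Decimal = Decimal("0.10")) -> dict:
    """Scale every component except template DNA (assumed to be added per well).

    The overage fraction is a hypothetical allowance for pipetting loss.
    """
    factor = Decimal(n_reactions) * (Decimal("1") + overage)
    return {
        name: volume * factor
        for name, volume in PER_REACTION_UL.items()
        if name != "Template DNA"
    }


print(total_ul)
for name, volume in master_mix(96).items():
    print(f"{name}: {volume} ul")
```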
[0079] Amplification Reaction
[0080] 1. 15 minutes, 95.degree. C.
[0081] 2. 30 seconds, 95.degree. C.
[0082] 3. 30 seconds, 65.degree. C. [0083] Decrease 1.degree.
C/cycle
[0084] 4. 60 seconds, 72.degree. C.
[0085] 5. GoTo step 2, 9 times
[0086] 6. 30 seconds, 95.degree. C.
[0087] 7. 30 seconds, 56.degree. C.
[0088] 8. 60 seconds, 72.degree. C.
[0089] 9. GoTo step 6, 34 times
[0090] 10. 5 minutes, 72.degree. C.
[0091] 11. Hold indefinitely, 4.degree. C.
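The program above is a touchdown PCR: the annealing temperature starts at 65.degree. C. and falls by 1.degree. C. per cycle before settling at 56.degree. C. The sketch below lists the annealing temperature for each cycle. It assumes the common thermocycler convention that "GoTo step N, k times" means k additional passes after the first, giving 10 touchdown cycles and 35 fixed-temperature cycles; that convention is an assumption rather than something stated in the text.

```python
def touchdown_annealing_temps(
    start_c: int = 65,
    step_c: int = 1,
    touchdown_cycles: int = 10,   # first pass + "GoTo step 2, 9 times" (assumed convention)
    final_c: int = 56,
    final_cycles: int = 35,       # first pass + "GoTo step 6, 34 times" (assumed convention)
) -> list:
    """Return the annealing temperature (degrees C) used in each cycle, in order."""
    temps = [start_c - step_c * i for i in range(touchdown_cycles)]
    temps += [final_c] * final_cycles
    return temps


temps = touchdown_annealing_temps()
print(temps[:10])   # touchdown phase
print(len(temps))   # total number of cycles under the assumed convention
```

Under this reading, the last touchdown cycle anneals at 56.degree. C., the same temperature used for the remaining cycles.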
[0092] PCR Purification and Sequencing
[0093] PCR products were purified using ExoSAP-IT.RTM. enzyme mix,
according to the manufacturer's protocol. Purified amplicons were
then mixed with M13F sequencing primer, and sequenced. Sanger
sequencing was performed at the Yale W.M. Keck DNA Sequencing
Facility, as per their standard sequencing protocol.
[0094] Genotype Calling
[0095] Alleles were called from the electropherograms, using an
in-house Perl script developed by Y. Kong for the purpose.
[0096] Microdeletion Genotyping.
[0097] Carriers of the DCDC2 haplotypes of interest were also
genotyped for the 2,445 bp DCDC2 microdeletion. This deletion
encompasses the entire BV677278 STR within its breakpoints, so it
must be genotyped in addition to BV677278 to get an accurate
genotype for apparent BV677278 homozygotes. The microdeletion is
genotyped by allele-specific PCR and agarose electrophoresis. The
three-primer reaction generates a .about.600 bp amplicon from
intact chromosomes (no deletion), and a .about.200 bp amplicon from
chromosomes with the deletion, allowing heterozygotes and both
homozygotes to be readily distinguishable from one another.
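The band-pattern logic described above can be sketched as follows. In the study, genotypes were called from the gels manually; the function below is only an illustration of the decision rule, and the size tolerance is a hypothetical parameter, not a value from the protocol.

```python
from typing import Iterable

INTACT_BP = 600    # approximate amplicon size from an intact chromosome (from the text)
DELETION_BP = 200  # approximate amplicon size from a chromosome with the deletion (from the text)


def call_microdeletion_genotype(
    band_sizes_bp: Iterable[float], tolerance_bp: float = 100.0
) -> str:
    """Classify a sample from the estimated sizes of the bands seen on the gel.

    tolerance_bp is a hypothetical allowance for sizing error on an agarose gel.
    """
    bands = list(band_sizes_bp)
    has_intact = any(abs(size - INTACT_BP) <= tolerance_bp for size in bands)
    has_deletion = any(abs(size - DELETION_BP) <= tolerance_bp for size in bands)
    if has_intact and has_deletion:
        return "heterozygous deletion"
    if has_intact:
        return "homozygous intact"
    if has_deletion:
        return "homozygous deletion"
    return "no call"


print(call_microdeletion_genotype([610, 195]))
print(call_microdeletion_genotype([590]))
print(call_microdeletion_genotype([205]))
```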
[0098] Amplification Primers
TABLE-US-00003 (SEQ ID NO: 4) Primer Del_F:
5'-AGCCTGCCTACCACAGAGAA-3' (SEQ ID NO: 5) Primer Del_RC:
5'-GGAACAACCTCACAGAAATGG-3' (SEQ ID NO: 6) Primer Del_RD:
5'-TGAAACCCCGTCTCTACTGAA-3'
[0099] Amplification Reaction Mixture (per 10 .mu.l reaction)
[0100] 10.times.PCR Buffer (Qiagen): 1 .mu.l
[0101] MgCl.sub.2 (25 mM) (Qiagen): 0.4 .mu.l
[0102] dNTPs (10 mM): 0.25 .mu.l
[0103] Primer Del_F (10 .mu.M): 0.30 .mu.l
[0104] Primer Del_RC (10 .mu.M): 0.20 .mu.l
[0105] Primer Del_RD: 0.20 .mu.l
[0106] HotStarTaq.TM. (Qiagen, diluted to 0.5 units/.mu.l in Taq
dilution buffer): 0.20 .mu.l
[0107] Template DNA: 1 .mu.l (.about.10 ng/.mu.l)
[0108] Nuclease-free H.sub.2O: 6.45 .mu.l
[0109] Amplification Reaction
[0110] The amplification reaction for the microdeletion is the same
as for the BV677278 STR (see above).
[0111] Agarose Electrophoresis. PCR products were electrophoresed on
1% agarose gels, using standard 1.times. TBE buffer with ethidium
bromide (0.2 .mu.g/mL), via standard methods, at 100-150V depending on
gel size. Gels were imaged on a UV transilluminator, and documented
with a Bio-Rad Gel Doc.TM. XR imaging system. Genotypes were called
from the gels manually.
[0112] Protein Identification by SILAC-Based Mass Spectrometry.
Raji and HeLa cells were SILAC-labeled with Lys-8 and Arg-10
(Eurisotop) or their naturally-occurring counterparts Lys-0 and Arg-0
(Sigma), as described (20). Heavy nuclear lysate prepared from
these cells was incubated with a biotinylated oligonucleotide probe
identical to a segment of BV677278 that had been previously shown
to bind a nuclear protein with high specificity (15). Light nuclear
lysate was incubated with a biotinylated scrambled probe previously
shown not to bind the nuclear protein of interest (15). The
resulting oligonucleotide-protein complexes were pulled down with
streptavidin-conjugated beads and subjected to quantitative mass
spectrometry, as described previously (36). The reverse experiment
was also done (binding probe with light lysate, scrambled probe
with heavy lysate), resulting in the two-dimensional interaction
plots in FIG. 3A-B. The above experiment is described in more
detail as follows:
[0113] SILAC labeling of HeLa and Raji cells
[0114] Raji cells were labeled for at least 8 generations in DMEM
(-Arg, -Lys) medium containing 10% dialyzed fetal bovine serum
(Gibco) supplemented with 58 mg/L .sup.13C.sub.6.sup.15N.sub.4
L-arginine and 34 mg/L .sup.13C.sub.6.sup.15N.sub.2 L-lysine
(Eurisotop) or the corresponding non-labeled
amino acids. For Raji, cell extracts were prepared as described in
Wu et al. (S5). HeLa S3 cells were SILAC-labeled in RPMI 1640
(-Arg, -Lys) medium containing 10% dialyzed fetal bovine serum
(Gibco) supplemented with 84 mg/L .sup.13C.sub.6.sup.15N.sub.4
L-arginine and 40 mg/L .sup.13C.sub.6.sup.15N.sub.2 L-lysine
(Eurisotop) or the corresponding non-labeled
amino acids, respectively. For HeLa S3, three consecutive batches
of cells were independently harvested and cell extracts prepared as
described by Dignam et al. (S6).
[0115] SILAC, DNA pulldown of proteins, and quantitative mass
spectrometry were performed as previously described (S7), using the
cell lines Raji and HeLa. The binding pulldown probe is a
concatamer of two copies of the EMSA3 probe used in the EMSA
experiments we reported in 2011, while the scrambled probe is a
concatamer of two copies of the EMSA3/Scram1 probe from the same
experiments (S8).
[0116] Oligonucleotides:
TABLE-US-00004 EMSA3_for: (SEQ ID NO: 7)
5'-TTGAGAGGAAGGAAAGGAAGGATCCCTGAGAGGAAGGAAAGGAAGGA-3' EMSA3_rev:
(SEQ ID NO: 8)
5'-AATCCTTCCTTTCCTTCCTCTCAGGGATCCTTCCTTTCCTTCCTCTC-3'
EMSA3/Scram1_for: (SEQ ID NO: 9)
5'-TTGAGAGAGAGAGAGAGAGAGATCCCTGAGAGAGAGAGAGAGAGAGA-3'
EMSA3/Scram1_rev: (SEQ ID NO: 10)
5'-AATCTCTCTCTCTCTCTCTCTCAGGGATCTCTCTCTCTCTCTCTCTC-3'
[0117] DNA-Pulldown
[0118] 25 .mu.g of annealed, concatenated and desthiobiotinylated
DNA probes was bound to 75 .mu.l of Dynabeads MyOne C1 (Life
Technologies). Excess oligonucleotides were removed and beads were
incubated with 400 .mu.g of SILAC-labelled nuclear extracts in
protein binding buffer (150 mM NaCl, 50 mM Tris-HCl (pH 8), 0.5%
NP-40, 10 mM MgCl.sub.2, protease inhibitor cocktail; Roche). After 1 h
of incubation on a rotation wheel at 4.degree. C., the beads were washed
three times, combined and DNA-protein complexes eluted in protein binding
buffer containing 16 mM biotin. The supernatant was precipitated
with 4 volumes (v/v) of ethanol overnight and the proteins were pelleted
by centrifugation at maximum speed in a tabletop microcentrifuge. The
pellet was resolubilized in 8M urea/50 mM Tris pH8.0, reduced with
1 mM DTT, alkylated with 3 mM iodoacetamide and subsequently
digested with trypsin (Promega) in 50 mM ammonium bicarbonate pH8
buffer at room temperature overnight. Samples were stored on stage
tips and eluted prior to use.
[0119] Mass spectrometry
[0120] Peptides were separated with a 140 min gradient from 5 to 60
percent acetonitrile (EasyHPLC, Thermo Fisher) using a 75 .mu.m, 15 cm
capillary packed with 3.0 .mu.m C18 beads (Dr. Maisch) directly
mounted to a LTQ-Orbitrap mass spectrometer (Thermo Fisher). The
instrument was operated in a data-dependent top10 acquisition
mode. The raw data were searched using the MaxQuant software
suite (version 1.2.0.18) against the complete IPI human database
(v3.68, 87061 entries). Enzyme search specificity was Trypsin/P
with 2 allowed miscleavages. Carbamidomethylation was set as a fixed
modification while methionine oxidation and protein N-acetylation
were considered as variable modifications. The search was performed
with an initial mass tolerance of 7 ppm mass accuracy for the
precursor ion and 0.5 Da for the MS/MS spectra.
[0121] ChIP-QPCR.
[0122] The AbCam ChIP kit (cat. # ab500) was used to perform the
ChIP assays described, according to the manufacturer's
instructions, but with several modifications described below. For
qPCR, the Qiagen QuantiTect.RTM. SYBR.RTM. Green qPCR kit for
ChIP-qPCR was used. Manufacturer's instructions were followed, using 25
.mu.g of template per reaction. All reactions were performed in triplicate.
Antibodies, primer sequences, and detailed methods for this
experiment are available in the Supporting Information. Quality
control data for qPCR is shown in FIG. S3.
[0123] Step 1: Cell Fixation and Collection [0124] For each set of
3 ChIP reactions, 9 million freshly harvested cells (Raji) were
used instead of 3 million, with volumes appropriate for 9 million
cells, as given in the protocol.
[0125] Step 2: Cell Lysis [0126] After the final PBS wash in the
cell fixation step, each aliquot of 9 million cells was carried
through the cell lysis step with volumes appropriate for 3 million
cells (9 million treated as if it were 3 million--resulting in 3
million cells per ChIP reaction instead of 1 million). [0127] 100
.mu.l Buffer D with protease inhibitors was replaced with
.about.150 .mu.l of the following:
[0128] Buffer D2/PI (S9)
[0129] 10 mM Tris
[0130] 1 mM EDTA
[0131] 0.5 mM EGTA
[0132] pH=7
[0133] 1.times. Protease inhibitor cocktail [0134] A Branson 450
probe sonicator was used for sonication. DNA was fragmented with 18
20-second pulses, with the amplitude set to 6 and the DC set to
continuous. The samples were kept in ice water during sonication
and were allowed to rest 2 minutes on ice between each pulse.
[0135] Step 3: Immunoprecipitation [0136] Followed manufacturer's
protocol. [0137] .alpha.-ETV6 antibody: sc-166835.times. (Santa
Cruz Biotech), 5 .mu.g/ChIP [0138] Control antibody: ab1791 (AbCam)
(.alpha.-variant histone H3, enriched in actively transcribing
genes, an aliquot of this comes with the kit), 2 .mu.g/ChIP [0139]
Instead of the beads provided in the kit, Magna-Bind.TM. Protein
A/G magnetic beads were used, and a magnet stand was used instead of
the centrifuge for immunoprecipitation, according to the
manufacturer's protocol.
[0140] Step 4: DNA Purification [0141] After reversing the
crosslinks and proteinase K treatment, DNA was purified using a
Qiagen QIAquick.RTM. PCR Purification kit instead of the
DNA-purifying slurry provided in the ChIP kit. [0142] After
purification with a QIAquick.RTM. column, each final product was
eluted with a total of 60 .mu.l Buffer EB, in two 30 .mu.l elution
steps. [0143] The end product was quantified by fluorescence
(Quant-iT.TM. PicoGreen.RTM. dsDNA assay kit, as per manufacturer's
protocol). [0144] qPCR [0145] The Qiagen QuantiTect.RTM. SYBR.RTM.
Green qPCR kit for ChIP-qPCR was used, following manufacturer's
instructions. [0146] 25 .mu.g template per reaction was used.
[0147] 1.25 .mu.l of each primer (5 .mu.M) was used per reaction.
[0148] A log-5 standard curve was done using input Raji DNA
(sonicated, but not subjected to ChIP). See FIG. S3A-B. [0149] To
avoid pipetting error, all samples were diluted to the same
concentration as the least-concentrated sample, so the same amount
could be added to every reaction. Dilutions were done serially.
[0150] All reactions were performed in triplicate. [0151] qPCR
Primers [0152] ChIP-STR (BV677278 amplicon)
TABLE-US-00005 Primer ChIPSTR_F: 5'-TCATGCAAAGTTCCAAAACC-3'
(SEQ ID NO: 11) Primer ChIPSTR_R: 5'-GATTTCCTCCCTCCCTTCC-3' (SEQ ID
NO: 12)
[0153] These primers capture the entire BV677278 repeat, and
generate a .about.200 bp amplicon.
[0154] .beta.-actin (+control)
TABLE-US-00006 Primer .beta.Act_F: 5'-GCCCTAGGCACCAGGGTGTGA-3' (SEQ
ID NO: 13) Primer .beta.Act_R: 5'-ACAGGGTGCTCCTCAGGGGC-3' (SEQ ID
NO: 14)
[0155] These primers amplify a .about.150 bp sequence from the
actively transcribing .beta.-actin gene.
qPCR Reaction
[0156] 1. 15 minutes, 95.degree. C.
[0157] 2. 30 seconds, 95.degree. C.
[0158] 3. 30 seconds, 65.degree. C. [0159] Decrease 1.degree. C.
per cycle
[0160] 4. 60 seconds, 72.degree. C.
[0161] 5. Plate read
[0162] 6. GoTo step 2, 9 times
[0163] 7. 30 seconds, 95.degree. C.
[0164] 8. 30 seconds, 56.degree. C.
[0165] 9. 60 seconds, 72.degree. C.
[0166] 10. Plate read
[0167] 11. GoTo step 7, 39 times
[0168] 12. 5 minutes, 72.degree. C.
[0169] 13. Melting Curve
[0170] 14. Hold indefinitely, 4.degree. C.
[0171] *Quality control data for the ChIP experiment reported here
are shown in FIG. S3.
[0172] Enrichment Calculations
[0173] Fold enrichment was calculated with respect to the no
antibody control (a complete ChIP reaction, but with no
antibody--only beads). Briefly, this was done by raising 2 to the
negative power of the difference between the C(t) of an
experimental condition and its respective no-antibody control: Fold
Enrichment=2.sup.-[C(t)Exp-C(t)NoAntibody]
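The fold-enrichment formula above can be expressed as a short sketch. Averaging the triplicate C(t) values before taking the difference is an assumption made here for illustration; the text does not state how replicates were combined. The C(t) values in the example are invented to show the arithmetic and are not data from the experiment.

```python
from statistics import mean
from typing import Sequence


def fold_enrichment(ct_exp: Sequence[float], ct_no_antibody: Sequence[float]) -> float:
    """Fold enrichment = 2 ** -(C(t)_Exp - C(t)_NoAntibody).

    Replicate C(t) values are averaged first (an assumption, see lead-in).
    """
    delta_ct = mean(ct_exp) - mean(ct_no_antibody)
    return 2.0 ** (-delta_ct)


# Illustrative values only: a C(t) three cycles lower than the no-antibody
# control corresponds to an 8-fold enrichment.
example = fold_enrichment([25.0, 25.0, 25.0], [28.0, 28.0, 28.0])
print(example)  # 8.0
```

A value of 1 indicates no enrichment over the bead-only control.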
[0174] Results
[0175] Two Six-Marker Haplotypes in DCDC2 are Associated with
Reduced Performance on Reading and Language Measures
[0176] To determine whether DCDC2, KIAA0319, both, or neither gene
is responsible for the DYX2 signal, a tagSNP panel was designed to
densely cover the DYX2 locus. Haplotype-based association analysis
of reading and language in a large, extensively phenotyped birth
cohort: the Avon Longitudinal Study of Parents and Children
(ALSPAC), was then performed (17). Analysis showed a six-marker
haplotype block within DCDC2, of which two haplotypes--CGCGAG and
GACGAG--associated with very poor performance on a phoneme deletion
task and a composite language measure, respectively (Table 1). For
this analysis, RD cases were defined as individuals scoring two or
more standard deviations below the mean on the phoneme deletion
task, and LI cases as individuals scoring two or more standard
deviations below the mean on either of two language measures
(WOLD/NWR). The phoneme deletion task measures phonological
awareness, which is widely considered to be the core deficit in RD
(2). The Wechsler Objective Language Dimensions (WOLD) verbal
comprehension and nonword repetition (NWR) tasks that comprise the
WOLD/NWR composite language measure are used to assess deficient
language skills; children with LI show consistently poor
performance on these measures (18, 19) (see Table S1 and the
Materials and Methods for more information on these phenotypic
measures). Cases were defined this way to examine association of
DYX2 variants with severe RD and LI. The two haplotypes show strong
association with their respective phenotypes; the CGCGAG-RD
association survived Bonferroni correction for multiple testing and
the GACGAG-LI p-value was just below the threshold. However, the
associations by themselves are not strong enough to rule out type I
error, partly due to the low frequencies of the haplotypes and the
small number of cases. Interestingly, however, the effect of these
haplotypes is strong enough to reduce mean performance on relevant
phenotypic measures in carriers versus non-carriers. Carriers of
the CGCGAG haplotype, on average, showed significantly poorer
performance on eight reading-related measures compared to
non-carriers. Likewise, carriers of the GACGAG haplotype showed
significantly lower average performance on the WOLD/NWR composite
language measure (Table 2). This quantitative effect indicated that
this finding is not a false positive and prompted further analysis.
Additionally, this haplotype block resides in close proximity to
BV677278, a putatively functional compound short tandem repeat
(STR) Applicants reported previously (11) (FIG. 1C). The
polymorphism of BV677278 derives from five discrete repeat units
that vary in number (FIG. 1A, Table S4). This STR evolves rapidly,
as indicated by its high degree of polymorphism among primate
species and within Homo sapiens (FIG. 1B, FIG. S2, Table S4).
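The case definitions used in the analysis above (two or more standard deviations below the mean on the phoneme deletion task for RD, or on either of the two language measures for LI) can be sketched as follows. The function names and the example scores are illustrative assumptions and do not reproduce the scoring software actually used.

```python
def z_score(score: float, cohort_mean: float, cohort_sd: float) -> float:
    """Standardize a raw score against the cohort mean and standard deviation."""
    return (score - cohort_mean) / cohort_sd


def is_rd_case(phoneme_deletion_z: float, cutoff: float = -2.0) -> bool:
    """RD case: two or more SD below the mean on the phoneme deletion task."""
    return phoneme_deletion_z <= cutoff


def is_li_case(wold_z: float, nwr_z: float, cutoff: float = -2.0) -> bool:
    """LI case: two or more SD below the mean on either WOLD or NWR."""
    return wold_z <= cutoff or nwr_z <= cutoff


# Illustrative values only (not cohort data).
print(is_rd_case(z_score(70, 100, 15)))   # exactly 2 SD below the mean
print(is_li_case(wold_z=-0.5, nwr_z=-2.3))
print(is_li_case(wold_z=-1.9, nwr_z=-1.9))
```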
[0177] The DCDC2 Risk Haplotypes are in Strong Linkage
Disequilibrium with Two Alleles of BV677278
[0178] Because the associated haplotype block is adjacent to BV677278
(FIG. 1C), whether the two risk haplotypes could be capturing
association arising from functional alleles of BV677278 via linkage
disequilibrium was assessed. To address this question, all carriers
of these haplotypes were subjected to BV677278 genotyping by Sanger
sequencing. Of the carriers of the CGCGAG haplotype, 92% also carry
BV677278 allele 5. Likewise, 78% of the carriers of GACGAG also
carry BV677278 allele 6 (Table 1). Alleles 5 and 6 are similar in
structure to each other, and cluster phylogenetically to the same
clade (Table S4, FIG. S1). Indeed, nearly all carriers of these two
haplotypes also carry an allele from this clade (Table 1). These
results further implicate BV677278 as a RD risk variant and expand
it as a possible LI risk variant (11), and together with its
apparent regulatory capacity, suggest that these BV677278 alleles
are responsible for the risk haplotypes' effects.
[0179] BV677278 Specifically Binds the Transcription Factor
ETV6
[0180] To gain mechanistic insight into the function of BV677278,
quantitative mass spectrometry was used to identify the protein(s)
that bind to this locus (20). To this end, a biotinylated
oligonucleotide probe carrying segments of the BV677278 repeat
previously shown to bind a nuclear protein, and a scrambled
non-binding control, were incubated with nuclear extracts that had
been SILAC-labeled (15). SILAC-labeling involves culturing two
parallel populations of cells--one with media containing amino
acids labeled with heavy isotopes of carbon and nitrogen, the other
with naturally-occurring isotopes. After the label is incorporated,
proteins from the two populations (`heavy` and `light`) can be
differentiated from each other by quantitative mass spectrometry.
The heavy nuclear extract was incubated with the BV677278 probe,
and the light nuclear extract with the control probe. The probes
were then pulled down with streptavidin-conjugated beads, and the
resulting protein mixture was subjected to quantitative mass
spectrometry to identify proteins that were significantly
enriched by pulldown with the BV677278 probe compared to the
control probe (high heavy:light ratio). The experiment was
conducted with nuclear extracts derived from either HeLa or Raji
cells, and repeated with a label-switch resulting in a
two-dimensional interaction plot. These experiments yielded a
single candidate, shared by both HeLa and Raji: the transcription
factor ETV6 (FIG. 2A-B). To confirm the BV677278-ETV6 interaction,
chromatin immunoprecipitation with quantitative PCR (ChIP-qPCR) was
performed using .alpha.-ETV6 antibody, in Raji cells.
Immunoprecipitation with the .alpha.-ETV6 antibody showed marked
enrichment for the BV677278 amplicon, but not for the control
amplicon derived from the gene encoding .beta.-actin (ACTB) (FIG. 2C).
These results demonstrated that ETV6 binds the BV677278 region.
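The label-switch logic described above can be illustrated with a short sketch. A specific binder is expected to show a high heavy:light ratio when the binding probe is incubated with heavy lysate and a low heavy:light ratio when the labels are switched. The protein names, ratios, and the two-fold cutoff below are hypothetical and chosen only to show the selection rule; they are not output from MaxQuant or from the reported experiment.

```python
import math
from typing import Dict, List, Tuple


def specific_binders(
    ratios: Dict[str, Tuple[float, float]], min_log2_fold: float = 1.0
) -> List[str]:
    """Return proteins enriched with the binding probe in both label orientations.

    ratios maps a protein name to (forward H/L, reverse H/L), where "forward"
    means binding probe with heavy lysate and "reverse" means the label switch.
    min_log2_fold is a hypothetical cutoff (1.0 = two-fold).
    """
    hits = []
    for protein, (forward_hl, reverse_hl) in ratios.items():
        enriched_forward = math.log2(forward_hl) >= min_log2_fold
        enriched_reverse = math.log2(reverse_hl) <= -min_log2_fold
        if enriched_forward and enriched_reverse:
            hits.append(protein)
    return hits


# Invented example values for three hypothetical proteins.
example_ratios = {
    "PROTEIN_X": (8.0, 0.125),     # behaves like a specific binder
    "BACKGROUND_1": (1.0, 1.0),    # binds both probes equally
    "CONTAMINANT_1": (4.0, 4.0),   # heavy-biased in both orientations
}
print(specific_binders(example_ratios))
```

Requiring agreement in both orientations is what produces the two-dimensional interaction plot: specific binders fall in one quadrant, while label-dependent artifacts do not.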
[0181] The DCDC2 Risk Haplotypes Show a Synergistic Genetic
Interaction with a Known RD Risk Haplotype in the gene KIAA0319
[0182] Together with Applicants' previous findings, these data
implicated BV677278 as a regulatory element. Luciferase assays
suggest that BV677278 is capable of modulating expression from the
DCDC2 promoter, but it may regulate other genes (15). A
three-marker risk haplotype encompassing the 5' half and upstream
sequence of KIAA0319 has been consistently associated with lowered
reading performance (21-24). Additionally, expression of KIAA0319
in human neural cell lines is reduced with this haplotype, relative
to non-risk haplotypes (25). Applicants therefore questioned
whether BV677278 might interact genetically with the KIAA0319 risk
haplotype, and examined the effect of carrying both a DCDC2 (CGCGAG
or GACGAG) and the KIAA0319 risk haplotype on several reading,
language, and cognitive measures. Strikingly, subjects carrying
risk haplotypes in both genes showed markedly worse mean
performance (up to 0.40 standard deviations) on nearly all measures
examined (FIG. 3A). This reduction in performance in carriers of
both risk haplotypes is, for most of the phenotypes examined,
greater than the sum of those of single carriers, indicating a
synergistic interaction between these two genes. This result
corroborates a previous report, which provided statistical evidence
that DCDC2 and KIAA0319 interact to influence RD risk (26).
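The notion of synergy used above (a deficit in double carriers that exceeds the sum of the deficits in single carriers) can be written as a simple comparison. The numbers in the example are hypothetical deficits in standard-deviation units, chosen only to illustrate the rule; they are not values from FIG. 3A, and the sketch is not the statistical test used in the study.

```python
def is_synergistic(deficit_a: float, deficit_b: float, deficit_both: float) -> bool:
    """Return True when the double-carrier deficit exceeds the additive expectation.

    Each deficit is the reduction in mean performance relative to non-carriers,
    expressed in standard deviations (larger = worse performance).
    """
    additive_expectation = deficit_a + deficit_b
    return deficit_both > additive_expectation


# Hypothetical deficits, in SD units.
print(is_synergistic(deficit_a=0.10, deficit_b=0.05, deficit_both=0.40))  # True
print(is_synergistic(deficit_a=0.20, deficit_b=0.20, deficit_both=0.30))  # False
```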
[0183] Discussion
[0184] Given the remarkable similarity of the human exome to those
of other higher primates, it has been hypothesized that rapidly
evolving regulatory elements are responsible for the large
phenotypic differences we observe. The recently published results
of the ENCODE Consortium, which showed most of the non-coding
genome to be active and much of the active proportion to be
regulatory, lend circumstantial support to this hypothesis (27).
Here, Applicants provide evidence of just such a regulatory element
affecting reading and language, two exclusively human phenotypes.
BV677278 expanded rapidly from gorilla to human, though the
sequence flanking it is quite conserved (FIG. 1B), and its
presence, length, and sequence vary widely among primate species
(FIG. S2). This element specifically binds ETV6, a transcription
factor and proto-oncogene also known as TEL (translocation ETS
leukemia). The ETV6 gene forms oncogenic fusions, often with the
AML1 proto-oncogene, that are frequently seen in leukemia (28).
ETV6's effect on transcription is generally repressive via
recruitment of a co-repressor complex (29). Monomeric ETV6 has
essentially no affinity for its binding sequence; it must at least
dimerize to bind DNA (30). There is evidence that ETV6 polymerizes
in vivo, with the length of the polymer dependent on the number and
spacing of binding sites (31). This property suggests that
different alleles of BV677278 bind ETV6 polymers of different
lengths, depending on the number of suitably spaced ETV6 binding
sites, and that these differences change the regulatory power of
the complex (FIG. 3B). Supporting this idea is the structural
similarity of alleles 5 and 6: both have the same GGAA insertion in
repeat unit 2, relative to the most common allele (Table S4). GGAA
is the core binding sequence of ETV6 (30), and this insertion could
recruit an additional ETV6 monomer to the complex.
[0185] However, whether ETV6 represses transcription in this
context, and what genes it targets, are uncertain. Applicants'
previously reported luciferase assays appear to indicate that some
BV677278 alleles activate transcription from the DCDC2 promoter,
and that alleles with very different structures (e.g. 3 and 5,
Table S4) activate transcription to a similar extent (15).
BV677278's genetic interaction with the KIAA0319 risk haplotype,
and its dramatic effect on phenotype, suggest KIAA0319 as a target
gene in vivo. The KIAA0319 risk haplotype is known to be associated
with reduced KIAA0319 expression, at least in human neural cell
lines, suggesting the possibility that it carries a promoter or
promoter-proximal variant that increases repression (or decreases
activation) by BV677278, resulting in reduced gene expression and
possible phenotypic consequences. That reduced IQ was also observed
with the DCDC2-KIAA0319 interaction (FIG. 3A) may reflect pathology
at the cellular level (e.g. disrupted neuronal migration), or it
may simply reflect the importance of language in measuring IQ.
BV677278 genotyping in all members of the ALSPAC and subsequent
combinatorial analysis, together with chromatin conformation
experiments, will further illuminate BV677278's target genes and
mechanism of action.
[0186] The DCDC2 and KIAA0319 risk haplotypes have a synergistic
effect on reading, language, and cognitive phenotypes. This lends
credence to the `phantom heritability` hypothesis, which explains
the so-called missing heritability of continuous traits as
resulting from non-additive interactions between risk variants
(32). Also supporting this idea is that although carriers of the
DCDC2 risk haplotypes show reduced average performance on
phenotypic measures, the standard deviations for these measures
were generally similar to those of non-carriers (Table 2). This
implies that the magnitude of effect of the risk haplotypes on
phenotype lies on a continuum, and is dependent on other,
interacting risk variants, as well as environmental and stochastic
factors. Additionally, these results may partially explain the
missing efficacy of GWAS studies. If rapidly evolving regulatory
elements are indeed substantially responsible for continuous
phenotypic variation, they would be expected, like BV677278, to
show a higher degree of polymorphism than the average SNP. This
would make them difficult to identify by standard single-marker
analyses in GWAS, reinforcing the importance of multi-marker,
pathway, and gene-gene interaction analyses in the study of complex
traits.
REFERENCES
[0187] 1. National Institute of Child Health and Human Development
(2010) Learning Disabilities
(nichd.nih.gov/health/topics/learning_disabilities.cfm).
[0188] 2. Peterson R L & Pennington B F (2012) Lancet
379(9830):1997-2007.
[0189] 3. Pennington B F & Bishop D V (2009) Annual review of
psychology 60:283-306.
[0190] 4. Anonymous (2007) National Assessment of Educational
Progress (NAEP): The Nation's Report Card, Reading 2007 (National
Center for Education Statistics, Institute of Education Sciences,
U.S. Department of Education)
(nces.ed.gov/nationsreportcard/pubs/main2007/2007496.asp).
[0191] 5. Scerri T S & Schulte-Körne G (2010) European child
& adolescent psychiatry 19(3):179-197.
[0192] 6. Meng H, et al. (2005) Human genetics 118(1):87-90.
[0193] 7. Peschansky V J, et al. (2010) Cereb Cortex
20(4):884-897.
[0194] 8. Poelmans G, et al. (2011) Molecular psychiatry
16(4):365-382.
[0195] 9. Liu J S (2011) Current neurology and neuroscience reports
11(2):171-178.
[0196] 10. Velayos-Baeza A, et al. (2010) The Journal of biological
chemistry 285(51):40148-40162.
[0197] 11. Meng H, et al. (2005) Proceedings of the National
Academy of Sciences of the United States of America
102(47):17053-17058.
[0198] 12. Ludwig K U, et al. (2008) Psychiatric genetics
18(6):310-312.
[0199] 13. Marino C, et al. (2012) Psychiatric genetics
22(1):25-30.
[0200] 14. Wilcke A, et al. (2009) Annals of dyslexia
59(1):1-11.
[0201] 15. Meng H, et al. (2011) Behavior genetics 41(1):58-66.
[0202] 16. Cope N, et al. (2012) NeuroImage 63(1):148-156.
[0203] 17. Boyd A, et al. (2012) International journal of
epidemiology.
[0204] 18. Bishop D V, et al. (1996) Journal of child psychology
and psychiatry, and allied disciplines 37(4):391-403.
[0205] 19. Newbury D F, et al. (2009) American journal of human
genetics 85(2):264-272.
[0206] 20. Mittler G, et al. (2009) Genome research
19(2):284-293.
[0207] 21. Francks C, et al. (2004) American journal of human
genetics 75(6):1046-1058.
[0208] 22. Luciano M, et al. (2007) Biological psychiatry
62(7):811-817.
[0209] 23. Paracchini S, et al. (2008) The American journal of
psychiatry 165(12):1576-1584.
[0210] 24. Scerri T S, et al. (2011) Biological psychiatry
70(3):237-245.
[0211] 25. Paracchini S, et al. (2006) Human molecular genetics
15(10):1659-1666.
[0212] 26. Harold D, et al. (2006) Molecular psychiatry
11(12):1085-1091, 1061.
[0213] 27. Djebali S, et al. (2012) Nature 489(7414):101-108.
[0214] 28. Fuka G, et al. (2011) PloS one 6(10):e26348.
[0215] 29. Wang L & Hiebert S W (2001) Oncogene
20(28):3716-3725.
[0216] 30. Green S M, et al. (2010) The Journal of biological
chemistry 285(24):18496-18504.
[0217] 31. Kim C A, et al. (2001) The EMBO journal
20(15):4173-4182.
[0218] 32. Zuk O, et al. (2012) Proceedings of the National Academy
of Sciences of the United States of America 109(4):1193-1198.
[0219] 33. Han B, et al. (2008) Annals of human genetics 72(Pt
6):834-847.
[0220] 34. Barrett J C, et al. (2005) Bioinformatics
21(2):263-265.
[0221] 35. Purcell S, et al. (2007) American journal of human
genetics 81(3):559-575.
[0222] 36. Butter F, et al. (2010) EMBO reports 11(4):305-311.
SUPPLEMENTAL REFERENCES
[0223] S1. Han B, et al. (2008) Annals of human genetics 72(Pt
6):834-847.
[0224] S2. Scerri T S, et al. (2011) Biological psychiatry
70(3):237-245.
[0225] S3. Barrett J C, et al. (2005) Bioinformatics
21(2):263-265.
[0226] S4. Purcell S, et al. (2007) American journal of human
genetics 81(3):559-575.
[0227] S5. Wu K K (2006) Methods Mol Biol 338:281-290.
[0228] S6. Dignam J D, et al. (1983) Nucleic acids research
11(5):1475-1489.
[0229] S7. Mittler G, et al. (2009) Genome research
19(2):284-293.
[0230] S8. Meng H, et al. (2011) Behavior genetics 41(1):58-66.
[0231] S9. Kolodziej K E, et al. (2009) BMC molecular biology
10:6.
[0232] S10. Meng H, et al. (2005) Proceedings of the National
Academy of Sciences of the United States of America
102(47):17053-17058.
[0233] S11. Boyd A, et al. (2012) International journal of
epidemiology.
[0234] S12. Hulme C, et al. (2007) Paired-associate learning,
phoneme awareness, and learning to read. Journal of experimental
child psychology 96(2):150-166.
[0235] S13. Peterson R L & Pennington B F (2012) Lancet
379(9830):1997-2007.
[0236] S14. Rosner J & Simon D P (1971) Journal of Learning
Disabilities 4(384):40-48.
[0237] S15. Rust J, et al. (1993) WORD: Wechsler Objective Reading
Dimensions Manual (Psychological Corporation, Sidcup, UK).
[0238] S16. Nunes T, et al. (2003) Scientific Studies of Reading
7(3):289-307.
[0239] S17. Neale M D (1997) Neale Analysis of Reading
Ability--Revised:--Manual for Schools (NFER-Nelson).
[0240] S18. Gathercole S E & Baddeley A D (1996) The Children's
Test of Nonword Repetition (Psychological Corporation, London,
UK).
[0241] S19. Wechsler D (1996) Wechsler Objective Language
Dimensions (WOLD) (Psychological Corporation, London, UK).
[0242] S20. Bishop D V, et al. (1996) Journal of child psychology
and psychiatry, and allied disciplines 37(4):391-403.
[0243] S21. Newbury D F, et al. (2009) American journal of human
genetics 85(2):264-272.
[0244] Tables of Example 1
TABLE-US-00007 TABLE 1 Association and linkage disequilibrium data
for DCDC2 risk haplotypes. Phenotypes are described in Table S1 and
in the Materials and Methods. Cases are defined by a score of less
than or equal to 2 SD below the mean. P-values that survived
Bonferroni correction for multiple testing (α = 0.05) are
bolded. `% Allele 5,` etc. means `percentage of haplotype carriers
with at least one copy of that allele or group of alleles.` Clade
1, the phylogenetic branch of alleles that includes 5 and 6, is
described in FIG. S1.
Association Data
| Haplotype | Phenotype-2SD | n Cases | n Controls | Haplotype Freq. | Odds Ratio | P-value |
| CGCGAG | Phoneme Del. (RD) | 89 | 5225 | 0.0236 | 3.20 | 6.068 × 10^-5 |
| GACGAG | WOLD/NWR (LI) | 270 | 5240 | 0.0364 | 1.91 | 2.84 × 10^-4 |
LD Data
| Haplotype | n Carriers | % Allele 5 | % Allele 6 | % Clade 1 |
| CGCGAG | 226 | 92.0 | 7.5 | 94.3 |
| GACGAG | 392 | 12.0 | 77.6 | 91.3 |
TABLE-US-00008 TABLE 2 Mean performance on reading and cognitive
measures in DCDC2 risk haplotype carriers vs. non-carriers. The
standard deviation is shown in parentheses next to each mean. The
number of subjects in each category is shown below that category.
P-values are from Student's T-tests comparing the means of carriers
and non-carriers of each haplotype; p-values less than 0.05 are
marked with an asterisk. NWR/WOLD refers to the average z-score of
performance on NWR and WOLD Verbal Comprehension tasks. Phenotypes
are described in Table S1 and in the Materials and Methods.
| Measure | CGCGAG (RD) Carriers | CGCGAG (RD) Non-carriers | P-value | GACGAG (LI) Carriers | GACGAG (LI) Non-carriers | P-value |
| Reading 7 | 27.34 (9.04) | 29.01 (8.77) | 0.005* | 29.09 (8.62) | 28.92 (8.80) | 0.728 |
| N | 232 | 4929 | | 358 | 4803 | |
| Spelling 7 | 24.38 (13.46) | 26.29 (12.33) | 0.023* | 25.56 (12.77) | 26.26 (12.36) | 0.305 |
| N | 229 | 4896 | | 355 | 4770 | |
| Phoneme | 19.30 (10.00) | 20.80 (9.17) | 0.016* | 20.61 (9.20) | 20.74 (9.21) | 0.796 |
| N | 230 | 4909 | | 357 | 4782 | |
| Reading 9 | 7.37 (2.71) | 7.73 (2.27) | 0.020* | 7.75 (2.33) | 7.72 (2.29) | 0.754 |
| N | 228 | 4914 | | 359 | 4783 | |
| NW Read 9 | 5.05 (2.58) | 5.38 (2.36) | 0.043* | 5.47 (2.36) | 5.36 (2.44) | 0.391 |
| N | 228 | 4911 | | 359 | 4780 | |
| Spelling 9 | 10.03 (2.58) | 10.50 (3.23) | 0.031* | 10.48 (3.25) | 10.48 (3.26) | 0.987 |
| N | 228 | 4904 | | 357 | 4775 | |
| Speed | 105.44 (11.76) | 106.34 (12.10) | 0.299 | 106.71 (11.77) | 106.27 (12.11) | 0.524 |
| N | 207 | 4430 | | 326 | 4311 | |
| Accuracy | 102.77 (14.00) | 105.22 (13.10) | 0.009* | 105.18 (13.24) | 105.11 (13.15) | 0.919 |
| N | 208 | 4438 | | 329 | 4317 | |
| Read Comp | 99.74 (11.67) | 101.54 (11.37) | 0.026* | 101.73 (11.82) | 101.44 (11.35) | 0.663 |
| N | 208 | 4438 | | 329 | 4317 | |
| Verbal IQ | 107.35 (15.70) | 108.97 (15.67) | 0.113 | 108.38 (15.90) | 108.94 (15.65) | 0.497 |
| N | 245 | 5334 | | 388 | 5191 | |
| Perf. IQ | 101.23 (14.96) | 100.28 (16.16) | 0.366 | 101.10 (15.72) | 101.19 (16.14) | 0.913 |
| N | 245 | 5334 | | 388 | 5191 | |
| Total IQ | 104.58 (14.22) | 106.05 (15.26) | 0.138 | 105.62 (14.95) | 106.01 (15.23) | 0.623 |
| N | 245 | 5334 | | 388 | 5191 | |
| NWR | 7.54 (1.94) | 7.58 (1.91) | 0.724 | 7.40 (1.91) | 7.55 (1.91) | 0.136 |
| N | 245 | 5276 | | 384 | 5137 | |
| WOLD | 7.11 (2.56) | 7.33 (2.44) | 0.178 | 7.12 (2.60) | 7.33 (2.43) | 0.104 |
| N | 245 | 5270 | | 383 | 5132 | |
| NWR/WOLD | -0.031 (0.82) | 0.00 (0.78) | 0.532 | -0.08 (0.77) | 0.01 (0.78) | 0.041* |
| N | 245 | 5281 | | 384 | 5142 | |
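The P-values in Table 2 can be sanity-checked from the reported means, standard deviations, and group sizes alone. The following is a minimal sketch, not Applicants' analysis code: it assumes a pooled-variance Student's t-test and, because the degrees of freedom exceed 5,000, approximates the t distribution with a standard normal. The function name is illustrative.

```python
import math


def pooled_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, two-sided p) for a pooled-variance t-test from summary statistics.

    The p-value uses a normal approximation, which is only reasonable for
    large samples such as those in Table 2.
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    t = (mean1 - mean2) / se
    # erfc(|t| / sqrt(2)) is the two-sided tail area of the standard normal
    p = math.erfc(abs(t) / math.sqrt(2.0))
    return t, p


# Reading 7, CGCGAG carriers vs. non-carriers (values taken from Table 2)
t, p = pooled_t_from_summary(27.34, 9.04, 232, 29.01, 8.77, 4929)
print(f"t = {t:.2f}, p = {p:.3f}")
```

For the Reading 7 row, this sketch gives t of about -2.83 and a P-value that rounds to the reported 0.005.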
TABLE-US-00009 TABLE S1 (A) List of phenotypes used. A detailed
description of each phenotype is given in the Materials and
Methods. (B) Case/control definitions used in association analysis.
A.
| Phenotype | Description |
| Reading at 7 | Wechsler Objective Reading Dimensions (WORD), single-word reading task, age 7 |
| Reading at 9 | Single-word reading task, age 9 |
| Phoneme Del | Auditory Analysis task, age 7 |
| Total IQ | Wechsler Intelligence Scale for Children (WISC), Total IQ, age 8 |
| Verbal IQ | WISC Verbal IQ component, age 8 |
| Performance IQ | WISC Performance IQ component, age 8 |
| WOLD | Wechsler Objective Language Dimensions (WOLD), verbal comprehension task, age 8 |
| NWR | Non-word repetition task, age 8 |
| NW Read at 9 | Non-word reading task, age 9 |
| Spelling at 7 | Single-word spelling task, age 7 |
| Spelling at 9 | Single-word spelling task, age 9 |
| Speed | Passage reading, speed, age 9 |
| Accuracy | Passage reading, accuracy of words read, age 9 |
| Reading Comp. | Passage reading, comprehension, age 9 |
B.
| Phenotype | Case definition |
| Phoneme Del | Cases defined as having a score less than or equal to 2 standard deviations below the mean on the phoneme deletion task |
| WOLD/NWR | Cases defined as having a score less than or equal to 2 standard deviations below the mean on either the WOLD verbal comprehension task or the non-word repetition task |
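The case definition in part B of Table S1 can be illustrated as follows. This is a minimal sketch under stated assumptions: the mean and standard deviation are taken from the same sample being classified, the sample standard deviation is used, and the scores are hypothetical values chosen only for illustration. The function name is not from the source.

```python
import statistics


def flag_cases(scores, n_sd=2.0):
    """Flag scores lying n_sd or more sample standard deviations below the mean."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    cutoff = mean - n_sd * sd
    return [s <= cutoff for s in scores]


# Hypothetical task scores for ten subjects (illustration only, not ALSPAC data)
scores = [20, 21, 19, 22, 20, 18, 21, 23, 20, 2]
flags = flag_cases(scores)
print(flags)
```

For the WOLD/NWR definition, a subject would be a case if flagged on either task, which corresponds to combining two such lists with a logical OR.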
TABLE-US-00010 TABLE S2 Bivariate Pearson correlations among
reading and language measures in ALSPAC. Phoneme = Phoneme deletion
task at age 7 years; NWR = Nonword Repetition at age 8 years; WOLD
= Wechsler Objective Language Dimensions Verbal Comprehension task
at age 8 years; Avg NWR WOLD = average of z-score performance on
NWR and WOLD tasks; Reading 7 = Single word reading at age 7 years;
Reading 9 = Single word reading at age 9 years. All IQ measures were
collected at age 8 years with the Wechsler Intelligence Scale for
Children version III.
| | Phoneme | NWR | WOLD Comp. | Avg NWR WOLD | Reading 7 | Reading 9 | Total IQ | Verbal IQ | Perf. IQ |
| Phoneme | 1 | | | | | | | | |
| NWR | 0.362 | 1 | | | | | | | |
| WOLD | 0.165 | 0.214 | 1 | | | | | | |
| Avg NWR WOLD | 0.338 | 0.779 | 0.780 | 1 | | | | | |
| Reading 7 | 0.688 | 0.403 | 0.259 | 0.425 | 1 | | | | |
| Reading 9 | 0.550 | 0.351 | 0.202 | 0.355 | 0.722 | 1 | | | |
| Total IQ | 0.406 | 0.324 | 0.386 | 0.455 | 0.500 | 0.387 | 1 | | |
| Verbal IQ | 0.426 | 0.346 | 0.424 | 0.494 | 0.536 | 0.421 | 0.871 | 1 | |
| Perf. IQ | 0.246 | 0.192 | 0.216 | 0.262 | 0.292 | 0.218 | 0.819 | 0.435 | 1 |
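The composite measure and the correlations in Table S2 rest on two simple operations: z-transforming each task and computing a Pearson coefficient. The sketch below shows both, assuming sample standard deviations and using hypothetical scores. It is not the software used to produce the table.

```python
import statistics


def z_scores(values):
    """Standardize values to mean 0 and sample standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def pearson_r(x, y):
    """Pearson correlation as the mean product of sample z-scores."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


# Hypothetical NWR and WOLD scores for six subjects (illustration only)
nwr = [7, 8, 6, 9, 5, 7]
wold = [7, 9, 6, 8, 6, 7]

# Composite analogous to "Avg NWR WOLD": the average of the two z-scores
avg_nwr_wold = [(a + b) / 2 for a, b in zip(z_scores(nwr), z_scores(wold))]
print(round(pearson_r(nwr, avg_nwr_wold), 3))
```

Because the composite contains each component, its correlation with NWR and with WOLD is high by construction, as seen in the 0.779 and 0.780 entries of the table.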
TABLE-US-00011 TABLE S3 Statistics for the SNPs reported here.
Statistics were calculated after exclusion of low-call-rate samples
(<85% average call rate) and individuals not of European
descent. SNPs in normal (not bold) font (SNPs 1-6 in table)
comprise the DCDC2 risk haplotype block; SNPs in bold font (SNPs
7-9 in table) comprise the KIAA0319 risk haplotype block. SNPs are
listed in the order of their respective haplotype (e.g. CGCGAG).
| SNP | Call Rate | Major Allele | Major Allele Freq. | Minor Allele | Minor Allele Freq. | HWE p-value |
| rs33914824 | 92.6% | C | 0.961 | G | 0.039 | 0.541 |
| rs807694 | 94.0% | G | 0.952 | A | 0.047 | 0.974 |
| rs707864 | 93.0% | T | 0.874 | C | 0.126 | 0.012 |
| rs10456301 | 93.5% | G | 0.929 | A | 0.071 | 0.814 |
| rs16889066 | 91.2% | A | 0.945 | G | 0.055 | 0.134 |
| rs9379651 | 86.7% | G | 0.877 | A | 0.123 | 0.720 |
| rs4504469 | 89.1% | C | 0.592 | T | 0.408 | 0.054 |
| rs2038137 | 90.0% | G | 0.630 | T | 0.370 | 0.611 |
| rs2143340 | 89.6% | A | 0.849 | G | 0.151 | 0.583 |
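The HWE p-value column of Table S3 reports a test of Hardy-Weinberg equilibrium for each SNP. The source does not state which test was used, so the sketch below assumes the common one-degree-of-freedom chi-square goodness-of-fit test. The genotype counts are hypothetical, since Table S3 gives only allele frequencies.

```python
import math


def hwe_chi_square(n_major_hom, n_het, n_minor_hom):
    """Chi-square test of Hardy-Weinberg equilibrium from genotype counts.

    Returns (chi2, p_value). Assumes both alleles are observed, so that
    every expected genotype count is positive.
    """
    n = n_major_hom + n_het + n_minor_hom
    p = (2 * n_major_hom + n_het) / (2 * n)  # major allele frequency
    q = 1.0 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_major_hom, n_het, n_minor_hom)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical genotype counts (illustration only)
chi2, p_value = hwe_chi_square(30, 40, 30)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

A small p-value, such as the 0.012 reported for rs707864, indicates a departure from the expected genotype proportions. An exact test is often preferred when the minor allele is rare.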
TABLE-US-00012 TABLE S4 Structures and population frequencies for
all BV677278 alleles described to date. Allele frequencies for
available alleles were calculated from a previous study (10).
For alleles 11-22, population allele frequencies are not available;
only frequencies in DCDC2 risk haplotype carriers are available (see
Table S5). `Del` signifies the 2,445 bp microdeletion encompassing
BV677278.
Repeat unit 1 Repeat Repeat Allele SEQ ID NOs: 15 and 16 unit 2
unit 3 Repeat unit 4 Const. Region 1 (GAGAGGAAGGAAA)2 (GGAA)7
(GAAA)1 (GGAA)2 GGAAAGAATGAA (SEQ ID NO: 15) (SEQ ID (SEQ ID NO:
28) NO: 19) 2 (GAGAGGAAGGAAA)1 (GGAA)9 (GAAA)0 (GGAA)0 GGAAAGAATGAA
(SEQ ID NO: 16) SEQ ID (SEQ ID NO: 28) NO: 20) 3 (GAGAGGAAGGAAA)1
(GGAA)6 (GAAA)1 (GGAA)2 GGAAAGAATGAA (SEQ ID NO: 16) (SEQ ID (SEQ
ID NO: 28) NO: 21) 4 (GAGAGGAAGGAAA)2 (GGAA)6 (GAAA)1 (GGAA)2
GGAAAGAATGAA (SEQ ID NO: 15) (SEQ ID (SEQ ID NO: 28) NO: 21) 5
(GAGAGGAAGGAAA)2 (GGAA)8 (GAAA)1 (GGAA)2 GGAAAGAATGAA (SEQ ID NO:
15) (SEQ ID (SEQ ID NO: 28) NO: 22) 6 (GAGAGGAAGGAAA)2 (GGAA)8
(GAAA)1 (GGAA)2 GGAAAGAATGAA (SEQ ID NO: 15) (SEQ ID (SEQ ID NO:
28) NO: 22) 7 (GAGAGGAAGGAAA)2 (GGAA)8 (GAAA)1 (GGAA)1 GGAAAGAATGAA
(SEQ ID NO: 15) (SEQ ID (SEQ ID NO: 28) NO: 22) 8 (GAGAGGAAGGAAA)2
(GGAA)7 (GAAA)0 (GGAA)0 GGAAAGAATGAA (SEQ ID NO: 15) (SEQ ID (SEQ
ID NO: 28) NO: 19) 9 (GAGAGGAAGGAAA)1 (GGAA)7 (GAAA)1 (GGAA)2
GGAAAGAATGAA (SEQ ID NO: 16) (SEQ ID (SEQ ID NO: 28) NO: 19) 10
(GAGAGGAAGGAAA)2 (GGAA)4 (GAAA)1 (GGAA)2 GGAAAGAATGAA (SEQ ID NO:
15) (SEQ ID (SEQ ID NO: 28) NO: 23) 11 (GAGAGGAAGGAAA)2 (GGAA)7
(GAAA)1 (GGAA)2 GGAAAGAATGAA (SEQ ID NO: 15) (SEQ ID (SEQ ID NO:
28) NO: 19) 12 (GAGAGGAAGGAAA)1 (GGAA)8 (GAAA)1 (GGAA)2
GGAAAGAATGAA (SEQ ID NO: 16) (SEQ ID (SEQ ID NO: 28) NO: 22) 13
(GAGAGGAAGGAAA)2 (GGAA)9 (GAAA)1 (GGAA)2 GGAAAGAATGAA (SEQ ID NO:
15) (SEQ ID (SEQ ID NO: 28) NO: 20) 14 (GAGAGGAAGGAAA)2 (GGAA)9
(GAAA)1 (GGAA)2 GGAAAGAATGAA (SEQ ID NO: 15) (SEQ ID (SEQ ID NO:
28) NO: 20) 15 (GAGAGGAAGGAAA)2 (GGAA)5 (GAAA)2 (GGAA)2
GGAAAGAATGAA (SEQ ID NO: 15) (SEQ ID (SEQ ID NO: 28) NO: 24) 16
(GAGAGGAAGGAAA)2 (GGAA)5 (GAAA)1 (GGAA)2 GGAAAGAATGAA (SEQ ID NO:
15) (SEQ ID (SEQ ID NO: 28) NO: 24) 17 (GAGAGGAAGGAAA)2 (GGAA)4
(GAAA)2 (GGAA)2 GGAAAGAATGAA (SEQ ID NO: 15) (SEQ ID (SEQ ID NO:
28) NO: 23) 18 (GAGAGGAAGGAAA)2 (GGAA)7 (GAAA)2 (GGAA)2
GGAAAGAATGAA (SEQ ID NO: 15) (SEQ ID (SEQ ID NO: 28) NO: 19) 19
(GAGAGGAAGGAAA)2 (GGAA)9 (GAAA)0 (GGAA)0 GGAAAGAATGAA (SEQ ID NO:
15) (SEQ ID (SEQ ID NO: 28) NO: 20) 20 (GAGAGGAAGGAAA)2 (GGAA)10
(GAAA)1 (GGAA)2 GGAAAGAATGAA (SEQ ID NO: 15) (SEQ ID (SEQ ID NO:
28) NO: 25) 21 (GAGAGGAAGGAAA)2 (GGAA)6 (GAAA)1 (GGAA)2
GGAAAGAATGAA (SEQ ID NO: 15) (SEQ ID (SEQ ID NO: 28) NO: 21) 22
(GAGAGGAAGGAAA)2 (GGAA)10 (GAAA)0 (GGAA)0 GGAAAGAATGAA (SEQ ID NO:
15) (SEQ ID (SEQ ID NO: 28) NO: 25) 23 (GAGAGGAAGGAAA)2 (GGAA)11
(GAAA)0 (GGAA)0 GGAAAGAATGAA (SEQ ID NO: 15) (SEQ ID (SEQ ID NO:
28) NO: 26) 24 (GAGAGGAAGGAAA)2 (GGAA)6 (GAAA)2 (GGAA)2
GGAAAGAATGAA (SEQ ID NO: 15) (SEQ ID (SEQ ID NO: 28) NO: 21) 25
(GAGAGGAAGGAAA)1 (GGAA)8 (GAAA)1 (GGAA)2 GGAAAGAATGAA (SEQ ID NO:
16) (SEQ ID (SEQ ID NO: 28) NO: 22) 26 (GAGAGGAAGGAAA)2 (GGAA)5
(GAAA)1 (GGAA)2 GGAAAGAATGAA (SEQ ID NO: 15) (SEQ ID (SEQ ID NO:
28) NO: 24) 27 (GAGAGGAAGGAAA)1 (GGAA)5 (GAAA)1 (GGAA)2
GGAAAGAATGAA (SEQ ID NO: 16) (SEQ ID (SEQ ID NO: 28) NO: 24) 28
(GAGAGGAAGGAAA)2 (GGAA)7 (GAAA)1 (GGAA)2 GGAAAGAATGAA (SEQ ID NO:
15) (SEQ ID (SEQ ID NO: 28) NO: 19) 29 (GAGAGGAAGGAAA)2 (GGAA)5
(GAAA)1 (GGAA)2 GGAAAGAATGAA (SEQ ID NO: 15) +(GGGA)1 (SEQ ID NO:
28) +(GGAA)1 (SEQ ID NO: 27) 30 (GAGAGGAAGGAAA)2 (GGAA)5 (GAAA)1
(GGAA)4 GGAAAGAATGAA (SEQ ID NO: 15) (SEQ ID (SEQ ID (SEQ ID NO:
28) NO: 24) NO: 23) 31 (GAGAGGAAGGAAA)2 (GGAA)7 (GAAA)1 (GGAA)1
GGAAAGAATGAA (SEQ ID NO: 15) (SEQ ID +(GGGA)1 (SEQ ID NO: 28) NO:
19) 32 (GAGAGGAAGGAAA)2 (GGAA)8 (GAAA)0 (GGAA)0 GGAAAGAATGAA (SEQ
ID NO: 15) (SEQ ID (SEQ ID NO: 28) NO: 22) 33 (GAGAGGAAGGAAA)2
(GGAA)6 (GAAA)0 (GGAA)0 GGAAAGAATGAA (SEQ ID NO: 15) (SEQ ID (SEQ
ID NO: 28) NO: 21) 34 (GAGAGGAAGGAAA)2 (GGAA)7 (GAAA)2 (GGAA)2
GGAAAGAATGAA (SEQ ID NO: 15) (SEQ ID (SEQ ID NO: 28) NO: 19) 35
(GAGAGGAAGGAAA)1 (GGAA)7 (GAAA)1 (GGAA)2 GGAAAGAATGAA
+(GAGAGGAAGAAAA)1 (SEQ ID (SEQ ID NO: 28) (SEQ ID NO: 17) NO: 19)
36 (GAGAGGAAGGAAA)1 (GGAA)9 (GAAA)1 (GGAA)2 GGAAAGAATGAA
+(GAGAGGAAGGAA)1 (SEQ ID (SEQ ID NO: 28) (SEQ ID NO: 18) NO: 20) 37
(GAGAGGAAGGAAA)2 (GGAA)6 (GAAA)1 (GGAA)2 GGAAAGAATGAA (SEQ ID NO:
15) (SEQ ID (SEQ ID NO: 28) NO: 21) 38 (GAGAGGAAGGAAA)1 (GGAA)10
(GAAA)0 (GGAA)0 GGAAAGAATGAA (SEQ ID NO: 16) (SEQ ID NO: 28) 39/Del
x x x x x Const. Allele Allele Allele Repeat unit 5 Region Freq*
Freq** Length 1 (GGAA)4 (GGGA)2 0.624 0.5536 102 (SEQ ID NO: 23) 2
(GGAA)4 (GGGA)2 0.003 0.0143 85 (SEQ ID NO: 23) 3 (GGAA)4 (GGGA)2
0.060 0.0464 85 (SEQ ID NO: 23) 4 (GGAA)4 (GGGA)2 0.106 0.1429 98
(SEQ ID NO: 23) 5 (GGAA)4 (GGGA)2 0.028 0.0143 106 (SEQ ID NO: 23)
6 (GGAA)3 (GGGA)2 0.039 0.0571 102 (SEQ ID NO: 29) 7 (GGAA)4
(GGGA)2 0.003 0 102 (SEQ ID NO: 23) 8 (GGAA)4 (GGGA)2 0.003 0 90
(SEQ ID NO: 23) 9 (GGAA)4 (GGGA)2 0.005 0.00179 89 (SEQ ID NO: 23)
10 (GGAA)4 (GGGA)2 0.044 0.0286 90 (SEQ ID NO: 23) 11 (GGAA)3
(GGGA)2 N/A 0 98 (SEQ ID NO: 29) 12 (GGAA)3 (GGGA)2 N/A 0.0036 89
(SEQ ID NO: 29) 13 (GGAA)3 (GGGA)2 N/A 0.0071 106 (SEQ ID NO: 29)
14 (GGAA)4 (GGGA)2 N/A N/A 110 (SEQ ID NO: 23) 15 (GGAA)4 (GGGA)2
N/A N/A 98 (SEQ ID NO: 23) 16 (GGAA)4 (GGGA)2 N/A N/A 94*Coriell
(SEQ ID NO: 23) AfA Plate Only 17 (GGAA)4 (GGGA)2 N/A N/A
94*Coriell (SEQ ID NO: 23) AfA Plate Only 18 (GGAA)4 (GGGA)2 N/A
N/A 106*Coriell (SEQ ID NO: 23) AfA Plate Only 19 (GGAA)4 (GGGA)2
N/A N/A 98 (SEQ ID NO: 23) 20 (GGAA)4 (GGGA)2 N/A N/A 114 (SEQ ID
NO: 23) 21 (GGAA)3 (GGGA)2 N/A N/A 94 (SEQ ID NO: 29) 22 (GGAA)4
(GGGA)2 N/A N/A 102 (SEQ ID NO: 23) 23 (GGAA)4 (GGGA)2 N/A N/A 106
(SEQ ID NO: 23) 24 (GGAA)4 (GGGA)2 N/A N/A 102 (SEQ ID NO: 23) 25
(GGAA)4 (GGGA)2 N/A N/A 93 (SEQ ID NO: 23) 26 (GGAA)3 (GGGA)2 N/A
N/A 90 (SEQ ID NO: 29) 27 (GGAA)4 (GGGA)2 N/A N/A 81 (SEQ ID NO:
23) 28 (GGAA)5 (GGGA)2 N/A N/A 106 (SEQ ID NO: 24) 29 (GGAA)4
(GGGA)2 N/A N/A 102 (SEQ ID NO: 23) 30 (GGAA)4 (GGGA)2 N/A N/A 102
(SEQ ID NO: 23) 31 (GGAA)4 (GGGA)2 N/A N/A 102 (SEQ ID NO: 23) 32
(GGAA)4 (GGGA)2 N/A N/A 94 (SEQ ID NO: 23) 33 (GGAA)3 (GGGA)2 N/A
N/A 82 (SEQ ID NO: 29) 34 (GGAA)3 (GGGA)2 N/A N/A 102 (SEQ ID NO:
29) 35 (GGAA)4 (GGGA)2 N/A N/A 102 (SEQ ID NO: 23) 36 (GGAA)4
(GGGA)2 N/A N/A 109 (SEQ ID NO: 23) 37 (GGAA)4 (GGGA)1 N/A N/A 98
(SEQ ID NO: 23) +(GAAA)1 +(GGAA)2 (SEQ ID NO: 30) 38 (GGAA)4
(GGGA)2 N/A N/A 89 (SEQ ID NO: 23) 39/Del x x 0.085 0.1143
*Frequency among parents of the Colorado Learning Disability
Research Center families.
TABLE-US-00013 Sequences of alleles 1-39 from Table S4 of Example 1
Allele 1 (SEQ ID NO: 31)
GAGAGGAAGGAAAGAGAGGAAGGAAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGAAAGGAA
GGAAGGAAAGAATGAAGGAAGGAAGGAAGGAAGGGAGGGA Allele 2 (SEQ ID NO: 32)
GAGAGGAAGGAAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAAGAATGAAG
GAAGGAAGGAAGGAAGGGAGGGA Allele 3 (SEQ ID NO: 33)
GAGAGGAAGGAAAGGAAGGAAGGAAGGAAGGAAGGAAGAAAGGAAGGAAGGAAAGAATGAAG
GAAGGAAGGAAGGAAGGGAGGGA Allele 4 (SEQ ID NO: 34)
GAGAGGAAGGAAAGAGAGGAAGGAAAGGAAGGAAGGAAGGAAGGAAGGAAGAAAGGAAGGAA
GGAAAGAATGAAGGAAGGAAGGAAGGAAGGGAGGGA Allele 5 (SEQ ID NO: 35)
GAGAGGAAGGAAAGAGAGGAAGGAAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGAAA
GGAAGGAAGGAAAGAATGAAGGAAGGAAGGAAGGAAGGGAGGGA Allele 6 (SEQ ID NO:
36) GAGAGGAAGGAAAGAGAGGAAGGAAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGAAA
GGAAGGAAGGAAAGAATGAAGGAAGGAAGGAAGGGAGGGA Allele 7 (SEQ ID NO: 37)
GAGAGGAAGGAAAGAGAGGAAGGAAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGAAA
GGAAGGAAAGAATGAAGGAAGGAAGGAAGGAAGGGAGGGA Allele 8 (SEQ ID NO: 38)
GAGAGGAAGGAAAGAGAGGAAGGAAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAAGAA
TGAAGGAAGGAAGGAAGGAAGGGAGGGA Allele 9 (SEQ ID NO: 39)
GAGAGGAAGGAAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGAAAGGAAGGAAGGAAAGAAT
GAAGGAAGGAAGGAAGGAAGGGAGGGA Allele 10 (SEQ ID NO: 40)
GAGAGGAAGGAAAGAGAGGAAGGAAAGGAAGGAAGGAAGGAAGAAAGGAAGGAAGGAAAGAA
TGAAGGAAGGAAGGAAGGAAGGGAGGGA Allele 11 (SEQ ID NO: 41)
GAGAGGAAGGAAAGAGAGGAAGGAAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGAAAGGAA
GGAAGGAAAGAATGAAGGAAGGAAGGAAGGGAGGGA Allele 12 (SEQ ID NO: 42)
GAGAGGAAGGAAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGAAAGGAAGGAAGGAAA
GAATGAAGGAAGGAAGGAAGGGAGGGA Allele 13 (SEQ ID NO: 43)
GAGAGGAAGGAAAGAGAGGAAGGAAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAA
GAAAGGAAGGAAGGAAAGAATGAAGGAAGGAAGGAAGGGAGGGA Allele 14 (SEQ ID NO:
44) GAGAGGAAGGAAAGAGAGGAAGGAAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAA
GAAAGGAAGGAAGGAAAGAATGAAGGAAGGAAGGAAGGAAGGGAGGGA Allele 15 (SEQ ID
NO: 45)
GAGAGGAAGGAAAGAGAGGAAGGAAAGGAAGGAAGGAAGGAAGGAAGAAAGAAAGGAAGGAA
GGAAAGAATGAAGGAAGGAAGGAAGGAAGGGAGGGA Allele 16 (SEQ ID NO: 46)
GAGAGGAAGGAAAGAGAGGAAGGAAAGGAAGGAAGGAAGGAAGGAAGAAAGGAAGGAAGGAA
AGAATGAAGGAAGGAAGGAAGGAAGGGAGGGA Allele 17 (SEQ ID NO: 47)
GAGAGGAAGGAAAGAGAGGAAGGAAAGGAAGGAAGGAAGGAAGAAAGAAAGGAAGGAAGGAA
AGAATGAAGGAAGGAAGGAAGGAAGGGAGGGA Allele 18 (SEQ ID NO: 48)
GAGAGGAAGGAAAGAGAGGAAGGAAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGAAAGAAA
GGAAGGAAGGAAAGAATGAAGGAAGGAAGGAAGGAAGGGAGGGA Allele 19 (SEQ ID NO:
49) GAGAGGAAGGAAAGAGAGGAAGGAAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAA
GGAAAGAATGAAGGAAGGAAGGAAGGAAGGGAGGGA Allele 20 (SEQ ID NO: 50)
GAGAGGAAGGAAAGAGAGGAAGGAAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAA
GGAAGAAAGGAAGGAAGGAAAGAATGAAGGAAGGAAGGAAGGAAGGGAGGGA Allele 21 (SEQ
ID NO: 51)
GAGAGGAAGGAAAGAGAGGAAGGAAAGGAAGGAAGGAAGGAAGGAAGGAAGAAAGGAAGGAA
GGAAAGAATGAAGGAAGGAAGGAAGGGAGGGA Allele 22 (SEQ ID NO: 52)
GAGAGGAAGGAAAGAGAGGAAGGAAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAA
GGAAGGAAAGAATGAAGGAAGGAAGGAAGGAAGGGAGGGA Allele 23 (SEQ ID NO: 53)
GAGAGGAAGGAAAGAGAGGAAGGAAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAA
GGAAGGAAGGAAAGAATGAAGGAAGGAAGGAAGGAAGGGAGGGA Allele 24 (SEQ ID NO:
54) GAGAGGAAGGAAAGAGAGGAAGGAAAGGAAGGAAGGAAGGAAGGAAGGAAGAAAGAAAGGAA
GGAAGGAAAGAATGAAGGAAGGAAGGAAGGAAGGGAGGGA Allele 25 (SEQ ID NO: 55)
GAGAGGAAGGAAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGAAAGGAAGGAAGGAAA
GAATGAAGGAAGGAAGGAAGGAAGGGAGGGA Allele 26 (SEQ ID NO: 56)
GAGAGGAAGGAAAGAGAGGAAGGAAAGGAAGGAAGGAAGGAAGGAAGAAAGGAAGGAAGGAA
AGAATGAAGGAAGGAAGGAAGGGAGGGA Allele 27 (SEQ ID NO: 57)
GAGAGGAAGGAAAGGAAGGAAGGAAGGAAGGAAGAAAGGAAGGAAGGAAAGAATGAAGGAAG
GAAGGAAGGAAGGGAGGGA Allele 28 (SEQ ID NO: 58)
GAGAGGAAGGAAAGAGAGGAAGGAAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGAAAGGAA
GGAAGGAAAGAATGAAGGAAGGAAGGAAGGAAGGAAGGGAGGGA Allele 29 (SEQ ID NO:
59) GAGAGGAAGGAAAGAGAGGAAGGAAAGGAAGGAAGGAAGGAAGGAAGGGAGGAAGAAAGGAA
GGAAGGAAAGAATGAAGGAAGGAAGGAAGGAAGGGAGGGA Allele 30 (SEQ ID NO: 60)
GAGAGGAAGGAAAGAGAGGAAGGAAAGGAAGGAAGGAAGGAAGGAAGAAAGGAAGGAAGGAA
GGAAGGAAAGAATGAAGGAAGGAAGGAAGGAAGGGAGGGA Allele 31 (SEQ ID NO: 61)
GAGAGGAAGGAAAGAGAGGAAGGAAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGAAAGGAA
GGGAGGAAAGAATGAAGGAAGGAAGGAAGGAAGGGAGGGA Allele 32 (SEQ ID NO: 62)
GAGAGGAAGGAAAGAGAGGAAGGAAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAA
AGAATGAAGGAAGGAAGGAAGGAAGGGAGGGA Allele 33 (SEQ ID NO: 63)
GAGAGGAAGGAAAGAGAGGAAGGAAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAAGAATGAA
GGAAGGAAGGAAGGGAGGGA Allele 34 (SEQ ID NO: 64)
GAGAGGAAGGAAAGAGAGGAAGGAAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGAAAGAAA
GGAAGGAAGGAAAGAATGAAGGAAGGAAGGAAGGGAGGGA Allele 35 (SEQ ID NO: 65)
GAGAGGAAGGAAAGAGAGGAAGAAAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGAAAGGAA
GGAAGGAAAGAATGAAGGAAGGAAGGAAGGAAGGGAGGGA Allele 36 (SEQ ID NO: 66)
GAGAGGAAGGAAAGAGAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAG
AAAGGAAGGAAGGAAAGAATGAAGGAAGGAAGGAAGGAAGGGAGGGA Allele 37 (SEQ ID
NO: 67)
GAGAGGAAGGAAAGAGAGGAAGGAAAGGAAGGAAGGAAGGAAGGAAGGAAGAAAGGAAGGAA
GGAAAGAATGAAGGAAGAAAGGAAGGAAGGGAGGGA Allele 38 (SEQ ID NO: 68)
GAGAGGAAGGAAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAAGAAT
GAAGGAAGGAAGGAAGGAAGGGAGGGA
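The repeat-unit notation of Table S4 maps directly onto the allele sequences listed above. The sketch below assembles an allele from its repeat counts and checks the resulting length against the Allele Length column. It is an illustration only: the function and constant names are not from the source, and it covers only alleles built from the regular units, not those with irregular insertions (for example alleles 29, 31, and 35-37).

```python
RU1 = "GAGAGGAAGGAAA"    # repeat unit 1
CONST1 = "GGAAAGAATGAA"  # first constant region (SEQ ID NO: 28)


def build_read1_allele(ru1, ru2, ru3, ru4, ru5, const2=2):
    """Assemble an allele from repeat counts, in the column order of Table S4."""
    return (
        RU1 * ru1          # repeat unit 1
        + "GGAA" * ru2     # repeat unit 2
        + "GAAA" * ru3     # repeat unit 3
        + "GGAA" * ru4     # repeat unit 4
        + CONST1
        + "GGAA" * ru5     # repeat unit 5
        + "GGGA" * const2  # second constant region
    )


allele_1 = build_read1_allele(2, 7, 1, 2, 4)
allele_2 = build_read1_allele(1, 9, 0, 0, 4)
allele_5 = build_read1_allele(2, 8, 1, 2, 4)
print(len(allele_1), len(allele_2), len(allele_5))  # 102 85 106
```

The three lengths agree with the values listed for alleles 1, 2, and 5 in Table S4.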
TABLE-US-00014 TABLE S5 BV677278 allele frequencies in carriers of
the CGCGAG and GACGAG haplotypes in the ALSPAC. Alleles belonging
to Clade 1 are demarcated with a "1" in parentheses.
| BV677278 Allele | Allele Frequency, CGCGAG | Allele Frequency, GACGAG |
| 1 | 0.323 | 0.367 |
| 2 | 0 | 0 |
| 3 | 0.024 | 0.029 |
| 4 | 0.055 | 0.048 |
| 5(1) | 0.473 | 0.061 |
| 6(1) | 0.038 | 0.393 |
| 7 | 0 | 0 |
| 8 | 0 | 0.001 |
| 9 | 0.002 | 0.004 |
| 10 | 0.027 | 0.022 |
| 11(1) | 0 | 0.008 |
| 12(1) | 0 | 0 |
| 13(1) | 0 | 0.009 |
| 14(1) | 0.007 | 0.009 |
| 15 | 0.002 | 0 |
| 16 | 0 | 0 |
| 17 | 0 | 0 |
| 18 | 0 | 0.001 |
| 19 | 0 | 0 |
| 20(1) | 0 | 0.005 |
| 21(1) | 0 | 0.003 |
| 22 | 0 | 0.001 |
| Del | 0.049 | 0.037 |
TABLE-US-00015 TABLE S6 P-values for ChIP-qPCR experiment (FIG.
2B). Values represent one-tailed paired T-tests for fold enrichment
between each pair of ChIP conditions specified (three replicates
each). STR: BV677278 STR amplicon; β-Actin: control amplicon
from the β-actin gene; α-ETV6: ChIP with anti-ETV6
antibody; α-H3: ChIP with anti-variant histone H3 control
antibody; NA: no-antibody control ChIP. P-values below 0.05 are
shown in bold with an asterisk, p-values below 0.01 have two
asterisks.
| | α-ETV6, STR | α-ETV6, β-Actin | NA, STR | α-H3, STR | α-H3, β-Actin | NA, β-Actin |
| α-ETV6, STR | X | 0.00728** | 0.01114* | 0.07659 | 0.06967 | 0.01005* |
| α-ETV6, β-Actin | | X | 0.19296 | 0.05375 | 0.05228 | 0.11426 |
| NA, STR | | | X | 0.02079* | 0.08653 | 0.42981 |
| α-H3, STR | | | | X | 0.46893 | 0.01133* |
| α-H3, β-Actin | | | | | X | 0.05509 |
| NA, β-Actin | | | | | | X |
TABLE-US-00016 TABLE S7 One-way ANOVA between groups listed in FIG.
3A: carriers of 1) no risk haplotype, 2) the KIAA0319 risk
haplotype, 3) either DCDC2 haplotype, and 4) a risk haplotype in
both genes.
| Phenotype (Z-transformed) | ANOVA P-Value |
| Phoneme Deletion | 0.088 |
| Total IQ | 0.003 |
| Verbal IQ | 0.006 |
| Performance IQ | 0.043 |
| Verbal Comprehension (WOLD) | 0.562 |
| Nonword Repetition | 0.053 |
| Avg WOLD and NWR | 0.179 |
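The one-way ANOVA summarized in Table S7 compares mean Z-transformed scores across the four carrier groups. The sketch below computes the F statistic and its degrees of freedom from raw group values. It is not Applicants' analysis code, and the group values are hypothetical. Converting F to a P-value requires the F distribution, which is left to a statistics library and not shown here.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over lists of values."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of individual values around their own group mean
    ss_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical z-scores for the four carrier groups (illustration only)
no_risk = [0.1, -0.2, 0.3, 0.0]
kiaa0319_only = [0.0, -0.1, 0.2, -0.3]
dcdc2_only = [-0.1, 0.1, -0.2, 0.0]
both = [-0.6, -0.4, -0.5, -0.3]
print(one_way_anova_f([no_risk, kiaa0319_only, dcdc2_only, both]))
```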
EXAMPLE 2
[0245] Having established the association between DCDC2 risk
haplotypes and KIAA0319 risk haplotypes, Applicants investigated the
relationship between READ1 (a functional RD/LI risk variant within
DCDC2) and the KIAA0319 RD risk variant KIAHap (a haplotype covering
the 5' half of the RD risk gene KIAA0319, some of its upstream
sequence, and the neighboring gene TDP2). As described herein, READ1
is composed of five discrete repeat units, each of which varies in
number, giving rise to considerable polymorphism. Including the
deletion, 39 READ1 alleles have been described thus far, 6 of which
are common, and 32 of which are rare (in individuals of European
ancestry). These alleles vary in length from 81 bp to 114 bp. READ1
does not appear to exist outside of the higher primates, and among
higher primate species (and within Homo sapiens) its length is
highly variable. READ1 appears to be a hypermutable, rapidly
evolving element that first appeared in primates and
reached its full size in Homo sapiens (Powers et al., 2013).
[0246] To determine the effects of individual READ1 alleles in
vivo, the association of READ1 with reading and language was
examined in the Avon Longitudinal Study of Parents and Children
(ALSPAC), a large longitudinal birth cohort. Allele 5 of READ1 was
strongly associated with Severe RD, while allele 6 of READ1 was
strongly associated with Severe LI (Table 1). Furthermore, those
individuals who carried at least one copy of allele 5 performed
worse on six different reading-related measures, on average,
compared to non-carriers of allele 5 (p<0.05), while carriers of
allele 6 performed worse on a composite language measure, on
average, than non-carriers of allele 6 (p<0.05). These two
relatively common alleles (allele frequencies of 3.6% and 5.0%,
respectively) are structurally similar to each other, and cluster
to the same clade phylogenetically. Compared to the most common
allele (allele 1) both alleles have a GGAA insertion in the same
position. By contrast, other common READ1 alleles appeared to have
a protective effect on reading and language. Carriers of at least
one copy of a shorter READ1 allele (an allele with only one
iteration of repeat unit 1 or RU1, as opposed to two iterations in
the majority of READ1 alleles) performed better on reading and
language tasks relative to non-carriers, although the protective
associations with Severe RD and LI were only suggestive (Table 1 of
Example 2, below).
TABLE-US-00017 TABLE 1 Association results for READ1. This table
shows the results of association of READ1 single and compound
alleles with Severe RD and Severe LI. Associations were computed
under an allelic model. Significant p-values are bolded, and
notable odds ratios for increased (#) and reduced (*) risk are
marked.
| Allele | Description | READ1 Alleles | MAF (ALSPAC) | P-Value, Severe RD | OR, Severe RD | P-Value, Severe LI | OR, Severe LI |
| 3 | 3 only, protective | 3 | 0.0463 | 0.179 | 0.575* | 0.255 | 0.77 |
| 4 | 4 only | 4 | 0.0924 | 0.239 | 1.28 | 0.141 | 0.78 |
| 5 | 5 only, deleterious | 5 | 0.0355 | 0.000058 | 2.37# | 0.487 | 0.84 |
| 6 | 6 only, deleterious | 6 | 0.0496 | 0.0995 | 1.53# | 0.00595 | 1.65 |
| 10 | 10 only | 7 | 0.0502 | 0.795 | 0.919 | 0.603 | 0.9 |
| 5 + 6 | Major deleterious alleles | 5, 6 | 0.0851 | 0.000037 | 1.96# | 0.000074 | 1.73 |
| Shorter Alleles | Only one copy of RU1; major protective alleles | 2, 3, 9, 12, 25, 27, 38 | 0.0521 | 0.0957 | 0.506* | 0.292 | 0.8 |
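Under an allelic model, each subject contributes two alleles, and the odds ratio compares the odds of carrying the allele of interest on case chromosomes with the odds on control chromosomes. The sketch below shows that calculation. The counts are hypothetical, because the table reports only frequencies, P-values, and odds ratios, and the function name is illustrative.

```python
def allelic_odds_ratio(case_risk, case_other, control_risk, control_other):
    """Odds ratio from a 2x2 table of allele counts (not subject counts)."""
    return (case_risk * control_other) / (case_other * control_risk)


# Hypothetical allele counts: 100 cases and 5,000 controls, two alleles each
odds_ratio = allelic_odds_ratio(
    case_risk=20, case_other=180, control_risk=500, control_other=9500
)
print(round(odds_ratio, 2))
```

An odds ratio above 1, as marked with # in the table, indicates that the allele is more common among cases. A value below 1, as marked with *, indicates that it is less common.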
[0247] As described herein, the READ1-binding protein was
identified as the potent transcriptional repressor ETV6, whose
binding sequence (GGAA) matches the insertion seen in risk alleles
5 and 6. This suggests that alleles 5 and 6 have more ETV6 binding
sites available than neutral or protective alleles (particularly
the protective shorter alleles). When the properties of ETV6 are
considered, the peculiar behavior of READ1 alleles begins to make
sense. In its monomeric state, ETV6 is incapable of binding DNA. It
must at least homodimerize to displace an autoinhibitory domain
blocking its DNA-binding domain in the monomeric state (Green et
al., 2010), and is known to be capable of homopolymerization
(Tognon et al., 2004). This property suggests an intriguing
possibility--that READ1 alleles of different lengths bind different
numbers of ETV6 monomers, and this alters the regulatory power of
the complex (FIG. 3B).
[0248] Likewise, KIAHap is thought to tag a functional promoter or
promoter-proximal variant that alters KIAA0319 gene regulation.
There is evidence that this functional variant is the SNP
rs9461045. This SNP is in linkage disequilibrium with KIAHap and is
associated with reduced reporter gene expression from the KIAA0319
promoter in both neuronal and non-neuronal cell lines (Dennis et
al., 2009; Paracchini et al., 2006). The regulatory nature of
KIAHap and the fact that READ1 appears to be an ETV6-binding
regulatory element led to the question of whether READ1 risk
alleles 5 and 6 interact with KIAHap to affect phenotype.
Strikingly, carriers of both a READ1 risk allele and KIAHap showed
markedly worse performance in reading, language, and IQ tasks (FIG.
5A shows allele 5). This interaction is generally synergistic; that
is, the effect of having two risk alleles is greater than the sum
of the alleles' individual effects (FIG. 5A). Additionally, the
protective, shorter alleles of READ1 (e.g., those comprising a
single repeat unit 1, "RU1_1") appear to interact epistatically
with KIAHap. In individuals with at least one copy of a shorter,
RU1_1 allele of READ1, the deleterious effect of KIAHap is
completely negated--mean performance of subjects with both an RU1_1
allele and KIAHap is generally slightly above the population
average and resembles that of subjects with an RU1_1 allele alone
(FIG. 5B). Therefore, it appears that KIAHap synergizes with
deleterious READ1 alleles to exacerbate their individual
deleterious effects on reading and language, but is epistatically
masked by protective READ1 alleles. This suggests that ETV6 and
READ1 form a regulatory complex that epistatically regulates
KIAA0319, and possibly DCDC2 and other target genes.
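The distinction drawn above between additive and synergistic effects can be made concrete with a small calculation. In the sketch below, each effect is the shift in mean z-score relative to carriers of neither variant. The values are hypothetical, the function name is illustrative, and the sketch is not the statistical test applied to FIG. 5A.

```python
def interaction_excess(effect_a, effect_b, effect_both):
    """Joint effect minus the sum of the individual effects.

    All effects are mean shifts relative to carriers of neither variant.
    Zero means purely additive; a negative value means the joint deficit
    is larger than additivity predicts.
    """
    return effect_both - (effect_a + effect_b)


# Hypothetical mean z-score shifts (illustration only)
excess = interaction_excess(effect_a=-0.10, effect_b=-0.05, effect_both=-0.40)
print(round(excess, 2))  # -0.25
```

Epistatic masking shows the opposite pattern: if carriers of both KIAHap and an RU1_1 allele perform like carriers of the RU1_1 allele alone, the joint effect matches one individual effect and does not approach the sum.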
EXAMPLE 3
[0249] Characterization of the DYX2 locus on chromosome 6p22 with
reading disability, language impairment, and overall cognition
INTRODUCTION
[0250] Described here is an assessment of the relationship of the DYX2
locus with RD, LI, and cognition. A marker panel densely covering
the 1.4 Mb DYX2 locus was developed and used to assess the
association with reading, language, and cognitive measures in
subjects from the Avon Longitudinal Study of Parents and Children
(ALSPAC). Associations were then replicated in three independent,
selected cohorts. Confirming the results of the other Examples
described herein, there were associations with known RD risk genes
KIAA0319 and DCDC2 (FIG. 6A). In addition, other markers were
identified in or near other DYX2 genes, including TDP2, ACOT13,
C6orf62, FAM65B, and CMAHP. The LD structure of the locus suggests
that association hits within TDP2, ACOT13, and C6orf62 are
capturing a previously reported risk variant in KIAA0319. These
results further substantiate KIAA0319 and DCDC2 as major effector
genes in DYX2 and identify FAM65B and CMAHP as new DYX2 risk genes.
Association of DYX2 with multiple neurobehavioral traits suggests
risk variants have functional consequences affecting multiple
neurological processes.
[0251] Methods
[0252] Subjects
[0253] The discovery cohort in this investigation was the Avon
Longitudinal Study of Parents and Children (ALSPAC). The ALSPAC is
a population-based birth cohort based in Avon, United Kingdom.
Subjects were recruited before birth--a total of 15,458 fetuses
were recruited, of whom 14,701 were alive at 1 year of age.
Recruitment, participants, and study methodologies are described in
detail elsewhere (bristol.ac.uk/alspac) (Boyd et al. 2012; Golding
et al. 2001). DNA samples for genetic analysis were available for
10,259 subjects. Reading, language, and cognitive measures were
performed at ages 7, 8, and 9 years. Subjects with IQ <75 on the
Wechsler Intelligence Scale for Children (WISC-III) Total IQ were
excluded to prevent confounding effects of Intellectual Disability
(Eicher et al. 2013a; Eicher et al. 2013b; Powers et al. 2013;
Wechsler et al. 2002). To prevent population stratification in
genetic analyses, subjects of non-European descent were also
excluded. Samples with overall genotype call rates <0.85 were
excluded from analyses, leaving a final sample size of 5579
individuals for LI analyses and 5525 individuals for RD analyses.
Ethical approval was obtained from the ALSPAC Ethics and Law
Committee, Local UK Research Ethics Committees, and the Yale Human
Investigation Committee.
[0254] Reading, Language, and Cognitive Measures
[0255] Reading measures in the ALSPAC used in this investigation
include a phoneme deletion task at age 7 years, single-word reading
at ages 7 and 9 years, and single non-word reading at age 9 years
(Table 1a). The phoneme deletion task, also known as the Auditory
Analysis Test, measures phoneme awareness, a core deficit in RD
(Rosner and Simon 1971). For the phoneme deletion task, the child
listens to a word spoken aloud and is then asked to remove a
specific phoneme from that word to make a new word. Single-word
reading was assessed at age 7 years using the reading subtest of
the Wechsler Objective Reading Dimensions (WORD) (Rust et al.
1993). At age 9 years, single-word reading was again assessed by
asking the child to read ten real words and ten non-words aloud, a
subset of a larger list of words and non-words (Nunes et al. 2003).
Severe RD cases were defined as scoring 2 or more standard
deviations below the mean on the phoneme deletion task (Table 1b).
Moderate RD cases were defined as scoring at least 1 standard
deviation below the mean on each of single-word reading at age 7
years, single-word reading at age 9 years, and single non-word
reading at age 9 years (Table 1b). The less stringent threshold of
1 standard deviation was chosen because requiring poor performance
on all three measures isolates individuals with persistently poor
decoding skills. Different severity levels were
examined because past studies in the DYX2 locus have shown
differences in genetic association patterns depending on case
severity, particularly with KIAA0319 associating with more moderate
RD case definitions and DCDC2 with more severe definitions
(Paracchini et al. 2008; Powers et al. 2013; Scerri et al.
2011).
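The case definitions above amount to thresholds on standardized scores. The sketch below illustrates them under stated assumptions: scores are standardized against the sample mean with the sample standard deviation, and the function names and example values are hypothetical rather than taken from the study.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize raw scores against the sample mean and standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def severe_rd(pd_z, cutoff=-2.0):
    """Severe RD: phoneme deletion score 2 or more SD below the sample mean."""
    return pd_z <= cutoff


def moderate_rd(swr7_z, swr9_z, snr_z, cutoff=-1.0):
    """Moderate RD: at least 1 SD below the mean on all three decoding measures."""
    return all(z <= cutoff for z in (swr7_z, swr9_z, snr_z))


# Hypothetical z-scores for one child, for illustration only.
child = {"pd": -2.3, "swr7": -1.2, "swr9": -1.0, "snr": -1.5}
is_severe = severe_rd(child["pd"])
is_moderate = moderate_rd(child["swr7"], child["swr9"], child["snr"])
```

Requiring all three measures to fall below the cutoff is what makes the 1 SD definition select persistently poor decoders rather than children with a single low score.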
[0256] Language measures were collected at age 8 years (Table 1a).
An adaptation of the Nonword Repetition Task (NWR), in which
subjects repeated recordings of nonwords, assessed short-term
phonological memory and processing abilities (Gathercole and
Baddeley 1996). Children also completed the Wechsler Objective
Language Dimensions (WOLD) verbal comprehension task at age 8 years
(Wechsler et al. 1996), where they answered questions about a
paragraph read aloud by an examiner describing a presented picture.
These measures were examined because individuals with LI are known
to perform consistently poorly on NWR and WOLD tasks (Bishop et al.
1996; Newbury et al. 2009). As with RD, the association of the DYX2
locus with severe and moderate case definitions of LI was also
examined. To assess the risk imparted for severe LI, severe LI
cases were defined as scoring 2 or more standard deviations below
the sample mean on either task (Severe LI) (Table 1b). To assess
more moderate deficits, cases were defined as scoring at least 1.5
standard deviations below the sample mean on each task (Moderate
NWR and Moderate WOLD) (Table 1b). Because fewer measures were used
to assess LI-related traits, the threshold for the moderate case
definitions was raised to 1.5 standard deviations. Verbal,
performance, and total IQ were assessed at age 8
years, using the Wechsler Intelligence Scale for Children
(WISC-III) (Table 1a). IQ measures were examined as quantitative
traits (Table 1b).
[0257] Genotyping and Genetic Analyses
[0258] A SNP marker panel was developed to capture the common
variation in the DYX2 locus. TagSNPs in the locus were selected
using the association study design server of Han et al. (Han et al.
2008). The final DYX2 panel contained 195 SNPs with an estimated
average power of 83% and 68% to capture known common and rare
variants, respectively, in the DYX2 locus spanning approximately
1.4 Mb. Markers were genotyped on the Sequenom platform (San Diego,
Calif.), following manufacturer's guidelines, at the Yale Center
for Genome Analysis (West Haven, Conn.).
[0259] Markers that deviated substantially from Hardy-Weinberg
equilibrium, or that had an overall call rate <85%, were not
used for haplotype-based analysis. In the discovery ALSPAC cohort,
single marker SNP analyses of case-control statuses and
quantitative traits were completed using SNP & Variation Suite
(SVS) v7.6.4 (Bozeman, MT). Linkage disequilibrium was assessed and
haplotype blocks were constructed using the four-gamete rule option
in HaploView v4.2. Haplotype association tests were performed with
haplotypes that had frequencies greater than or equal to 1% using
PLINK v1.07 (Barrett et al. 2005; Purcell et al. 2007).
Associations with p<0.001 are reported for the ALSPAC discovery
cohort to present suggestive results. However, to correct for
multiple testing, a Bonferroni threshold of 0.000256 (0.05 divided
by 195 markers) was used for discovery association tests in the
ALSPAC cohort.
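The two thresholds described above can be reproduced with a few lines of arithmetic. This is a minimal sketch; the constant and function names are hypothetical, and only the numbers 195, 0.05, and 0.001 come from the text.

```python
N_MARKERS = 195      # SNPs in the DYX2 panel, as stated in the text
ALPHA = 0.05         # family-wise error rate
SUGGESTIVE = 0.001   # reporting threshold used for the discovery cohort

bonferroni = ALPHA / N_MARKERS   # approximately 0.000256


def classify(p_value):
    """Label a discovery p-value using the two thresholds described above."""
    if p_value < bonferroni:
        return "significant"
    if p_value < SUGGESTIVE:
        return "suggestive"
    return "not reported"
```

For example, `classify(0.00018)` falls under the corrected threshold, while `classify(0.00050)` is only suggestive.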
[0260] Following discovery analyses in the ALSPAC, associated
variants were assessed in three cohorts specifically recruited for
either RD or LI: Iowa LI, Italian RD, and Colorado RD (Table 2).
The Iowa LI cohort comprises 219 LI cases and 209 sex- and
age-matched, unrelated controls collected at the University of
Iowa. Subjects completed various language measures, including the
Woodcock Johnson-III (WJ) and the Gray Oral Reading Test (GORT),
which were used to derive a composite language score, which was
then dichotomized into case-control status at -1.14 standard
deviations (Eicher et al. 2013a; Weismer 2000). The Colorado
Learning Disabilities Research Center (CLDRC) cohort consists of
1201 individuals in 293 nuclear families. Families were recruited
to the study if at least one child had a history of reading
problems. The Italian cohort consists of 878 individuals in 304
nuclear families; these families were recruited via a proband with
clinically diagnosed RD. Ethical approval for recruitment and study
methodologies was obtained from the Yale Human Investigation
Committee and Institutional Review Boards at the University of
Iowa, the University of Denver, and Italy. SNPs that had single
marker or within-haplotype associations with p<0.001 in the
ALSPAC were tested for replication in the Iowa LI, Italian RD, and
Colorado RD cohorts. Iowa LI was analyzed using SNP & Variation
Suite (SVS) v7.6.4 (Bozeman, MT), while the family-based Italian RD
and Colorado RD cohorts were examined using PLINK v1.07 (Purcell et
al. 2007). Suggestive ALSPAC results were moved forward for
replication analyses in order to emphasize replication of
associations over statistical corrections for multiple testing.
Replications with p<0.05 in the Iowa LI, Italian RD, and
Colorado RD cohorts are reported.
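Table 2 lists the transmission disequilibrium test (TDT) as the analysis for the family-based cohorts. The sketch below shows the textbook form of that statistic for a single allele. It is not a description of how PLINK implements the test, and the function name and counts are hypothetical.

```python
import math


def tdt(transmitted, untransmitted):
    """Textbook transmission disequilibrium test for one allele.

    transmitted:   heterozygous parents who passed the allele to an affected child
    untransmitted: heterozygous parents who did not
    Returns (chi-square statistic with 1 df, p-value).
    """
    b, c = transmitted, untransmitted
    if b + c == 0:
        raise ValueError("no informative transmissions")
    chi2 = (b - c) ** 2 / (b + c)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts, for illustration only.
chi2, p = tdt(60, 40)
```

Because only heterozygous parents contribute, the test is robust to population stratification, which is one reason it suits family-based designs.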
[0261] Results
[0262] Association with DYX2 markers was performed in three
separate domains: (1) RD, (2) LI, and (3) IQ. For the sake of
clarity, associations are presented domain-by-domain, with an
emphasis on replication for strength of associations as opposed to
correction for multiple testing.
[0263] RD
[0264] Associations with RD were performed using two different
severity definitions: (1) Severe RD and (2) Moderate RD (Table 1b).
For Severe RD, associations were found with DCDC2, KIAA0319, and
TDP2 (Table 3). The association of DCDC2 and Severe RD is explored
fully in Example 1 and in Powers et al. (2013). TDP2 marker rs2294691 did
not replicate its association in any of the three replication
cohorts (Table 5). KIAA0319 marker rs10456309 replicated in Iowa LI
and Colorado RD cohorts (Table 5). Moderate RD was associated with
rs1562422, which lies between the gene FAM65B and the pseudogene
CMAHP; this association was replicated in the Colorado RD cohort
(Table 3, Table 5).
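Table 3 reports each case-control association as an odds ratio with a 95% confidence interval. As an illustration of how such a figure is obtained from a 2x2 table, the sketch below uses the standard log-based (Woolf) interval. The source does not state which interval method was used, so this choice, the function name, and the counts are assumptions.

```python
import math


def odds_ratio_ci(case_exp, case_unexp, ctrl_exp, ctrl_unexp, z=1.96):
    """Odds ratio with a Woolf (log-based) confidence interval from a 2x2 table."""
    odds_ratio = (case_exp * ctrl_unexp) / (case_unexp * ctrl_exp)
    se_log_or = math.sqrt(
        1 / case_exp + 1 / case_unexp + 1 / ctrl_exp + 1 / ctrl_unexp
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical allele counts (risk allele vs. other allele), not study data.
or_, lo, hi = odds_ratio_ci(30, 170, 100, 1100)
```

An interval that excludes 1 corresponds to a nominally significant association at the chosen confidence level.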
[0265] LI
[0266] Association tests were performed on three LI phenotypes: (1)
Severe LI, (2) Moderate NWR, and (3) Moderate WOLD (Table 1b). As
with Severe RD, there were associations between DCDC2 and Severe
LI. The DCDC2 haplotype associated with Severe LI is discussed
in Example 1 and in Powers et al. (2013). A marker within this DCDC2
haplotype, rs807694, showed association with Severe LI and was
replicated in the Iowa LI cohort (Table 3, Table 5). With a more
moderate case definition, associations were observed with ACOT13
and C6orf62 (Table 3), genes neighboring KIAA0319 and TDP2. Both
rs3777663 in ACOT13 and rs3756814 in C6orf62 showed associations in
the Italian RD and Iowa LI cohorts (Table 5).
[0267] IQ
[0268] Association tests were also performed between DYX2 markers
and Verbal IQ, Performance IQ, and Total IQ (Table 1b). Verbal IQ
associations included single markers and haplotypes covering the 5'
half of KIAA0319, rs9348646 in FAM65B, and a haplotype spanning
ACOT13 and C6orf62, with evidence of replication (Table 4a, Table
4b, Table 5). There was substantial overlap of DYX2 associations
with Verbal IQ and associations with RD and LI. These similarities
of associations are not unexpected, as the traits are highly
correlated and known to capture similar domains (Table 6). The
associations of DYX2 with Performance and Total IQ were weaker;
there were no associations with Performance IQ and a single,
non-replicated association of Total IQ with rs2328791, which is
located in a large intergenic region telomeric to NRSN1 and DCDC2
(Table 4a, Table 4b, Table 5).
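Tables 4a and 4b report a slope for each IQ association. Under an additive model, that slope can be read as the expected change in the trait per additional copy of the tested allele. The sketch below computes it by ordinary least squares with no covariates; the exact model used by the analysis software is not stated in the source, and the function name and data are hypothetical.

```python
def additive_slope(allele_counts, trait_values):
    """Least-squares slope of a quantitative trait on allele count (0, 1, 2)."""
    n = len(allele_counts)
    if n != len(trait_values) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 subjects")
    mean_x = sum(allele_counts) / n
    mean_y = sum(trait_values) / n
    sxx = sum((x - mean_x) ** 2 for x in allele_counts)
    if sxx == 0:
        raise ValueError("marker is monomorphic in this sample")
    sxy = sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(allele_counts, trait_values)
    )
    return sxy / sxx


# Hypothetical genotypes and Verbal IQ scores, for illustration only.
genotypes = [0, 0, 1, 1, 2, 2]
verbal_iq = [104, 102, 101, 103, 100, 98]
slope = additive_slope(genotypes, verbal_iq)
```

A negative slope, as for several markers in Table 4a, indicates lower scores with each additional copy of the allele.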
[0269] Linkage Disequilibrium within DYX2
[0270] In the analyses, replicated associations were observed in
the following genes: DCDC2, KIAA0319, TDP2, ACOT13, C6orf62,
FAM65B, and the pseudogene CMAHP. However, as these SNPs are in
close proximity to each other, linkage disequilibrium (LD) was
assessed among the marker panel to determine whether the associated
SNPs were tagging the same variation in the locus. As described in
Examples 1 and 2, DCDC2 associations tagged READ1 alleles. Within
KIAA0319, there appear to be two clear LD blocks separating the
gene into a 5' half and a 3' half (data not shown). The 5' half of
KIAA0319 is in strong LD with TDP2, ACOT13, and C6orf62, indicating
that associations within these genes may be capturing that same
variation (data not shown). Associations in FAM65B and CMAHP appear
to be tagging independent associations (data not shown). Although
rs1562422 is located intergenic to FAM65B and CMAHP, this marker is
in strong LD with other markers within the CMAHP pseudogene.
Integration of the association analyses and LD structure indicates
four independent association signals centered on (1) DCDC2, (2) the
5' half of KIAA0319, (3) FAM65B, and (4) CMAHP.
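Whether two associated SNPs tag the same variation is usually judged from pairwise LD. The sketch below computes D and r-squared from phased two-SNP haplotype counts and applies a four-gamete check of the kind used for block construction. It is an illustration only: the function names, the 1% frequency cutoff used as a default, and the example counts are assumptions, not values taken from the study.

```python
def ld_stats(n_ab, n_aB, n_Ab, n_AB):
    """Return (D, r_squared) for two biallelic SNPs from phased haplotype counts."""
    total = n_ab + n_aB + n_Ab + n_AB
    p_ab, p_aB, p_Ab, p_AB = (n / total for n in (n_ab, n_aB, n_Ab, n_AB))
    p_a = p_ab + p_aB   # frequency of allele a at the first SNP
    p_b = p_ab + p_Ab   # frequency of allele b at the second SNP
    d = p_ab * p_AB - p_aB * p_Ab
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("at least one SNP is monomorphic")
    return d, d * d / denom


def four_gametes_observed(n_ab, n_aB, n_Ab, n_AB, min_freq=0.01):
    """True if all four two-SNP haplotypes occur at or above min_freq."""
    total = n_ab + n_aB + n_Ab + n_AB
    return all(n / total >= min_freq for n in (n_ab, n_aB, n_Ab, n_AB))


# Hypothetical counts: two SNPs in complete LD, then two unlinked SNPs.
d_linked, r2_linked = ld_stats(50, 0, 0, 50)
d_free, r2_free = ld_stats(25, 25, 25, 25)
```

When all four haplotypes are observed, a historical recombination between the two SNPs is inferred, so they would not be placed in the same block under the four-gamete rule.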
[0271] In this Example, the relationship of the DYX2 locus with RD,
LI, and IQ is characterized (FIG. 6B). The results confirm the
associations of the RD risk genes KIAA0319 and DCDC2 and extend them
to include LI.
Additionally, FAM65B and CMAHP were identified as risk genes for
linguistic traits. Markers within the DYX2 locus showed association
with numerous communication traits, including RD, LI, and Verbal
IQ. There was a marked absence of DYX2 associations with Total and
Performance IQ, indicating that the DYX2 locus influences
language-related processes to a much greater extent than overall
cognitive traits.
[0272] The genetic association of DYX2 with RD, LI, and Verbal IQ
is the latest example of various neurocognitive and communication
processes sharing genetic associations. Applicant and others have
shown that these neurobehavioral traits have common genetic
contributors, including variants in FOXP2, KIAA0319, CMIP, ZNF385D,
CNTNAP2, and DCDC2 (Eicher et al. 2013b; Newbury et al. 2009;
Newbury et al. 2011; Pennington and Bishop 2009; Peter et al. 2011;
Pinel et al. 2012; Powers et al. 2013; Scerri et al. 2011; Wilcke
et al. 2012). The expansion of DYX2's association from reading to
include other language-related processes suggests that the
causative variants may affect these traits in a pleiotropic manner,
as opposed to influencing written or verbal language exclusively.
The findings of this study are consistent with the `generalist
genes hypothesis,` which is also supported by a recent genome-wide
complex trait analysis (GCTA) of cognitive and learning abilities
(Trzaskowski et al. 2013). The strong correlations and relatedness
among these neurocognitive measures (Table 6) suggest that these
DYX2 genes affect central language processes, which in turn
manifest themselves phenotypically in various ways, including
reading, language, and cognition.
[0273] That multiple DYX2 genes showed association with the
phenotypes in this study is interesting, and at first glance
somewhat unexpected. One possibility is that, contrary to what is
largely believed, one or two genes are not solely responsible for
the consistent implication of this locus in reading, language, and
cognitive phenotypes. KIAA0319 and DCDC2 are currently known as the two major
risk genes in the DYX2 locus. Both have been implicated in both RD
and sub-clinical variation in reading performance, using both
classical neurobehavioral measures, and more recently, neuroimaging
techniques (Eicher and Gruen 2013; Graham and Fisher 2013). Other
genes in DYX2 have been associated with RD, but not nearly as often
as DCDC2 and KIAA0319. However, with the dense SNP panel described
herein, it was possible to observe associations with other DYX2
elements, including FAM65B and CMAHP.
[0274] Another possible explanation for the number of DYX2 genes
observed associating with the phenotypes in this study is LD within
the DYX2 locus. In fact, LD likely explains the cluster of
associations around KIAA0319, TDP2, ACOT13, and C6orf62. As
described herein, two major LD blocks span KIAA0319--one spans the
3' half of the gene, while the other spans the 5' region of
KIAA0319 as well as ACOT13, TDP2, and part of C6orf62. Nearly all
of the associations in this study localize to this 5' LD block,
which also contains the previously reported KIAA0319 RD risk
haplotype. Because of this LD structure, the association data alone
cannot determine whether the associations in this region are
independent or are capturing the same functional variant. The
latter possibility is considered the more likely; that is,
the associations in this region are likely tagging the same
causative variant captured by the KIAA0319 RD risk haplotype.
Functional study of this region--particularly of the less studied
genes TDP2, ACOT13, and C6orf62--will be useful to determine
whether these associations are independent or not.
[0275] By contrast, the markers within or near FAM65B and CMAHP
appear to be capturing distinct association signals from two
different LD blocks (data not shown). The SNP rs9348646, which
showed association with Verbal IQ, is located within an intron of
FAM65B in one LD block, while rs1562422, which showed association
with Moderate RD, localized to a separate LD block. While rs1562422
is an intergenic marker located physically between FAM65B and
CMAHP, it shows strong LD with markers in CMAHP (data not shown).
The LD patterns within the DYX2 locus suggest that associations in
KIAA0319, TDP2, ACOT13, and C6orf62 are tagging the same causative
variant, while rs9348646 in FAM65B and rs1562422 are
independent.
[0276] The other DYX2 genes, including FAM65B and CMAHP, have been
studied far less than the risk genes DCDC2 and KIAA0319. Little is
known about FAM65B in terms of biological function; however, there
is evidence that FAM65B may influence migration in T lymphocytes
(Rougerie et al. 2013). Animal models of DCDC2 and KIAA0319 have
implicated these genes in migratory processes, albeit in a neural
context. CMAHP, which encodes a key enzyme in the synthesis of the
sialic acids Neu5Ac and Neu5Gc in other mammals, was rendered a
pseudogene in humans by an inactivating microdeletion and
subsequent fixation of the inactive allele in early human
populations (Chou et al. 1998). Although ACOT13 appears to be
tagging variation within KIAA0319, the preliminary functional
studies of ACOT13 are intriguing. ACOT13 was recently associated
with lower asymmetric activation of the posterior superior temporal
sulcus during reading and phonology tasks (Pinel et al. 2012). The
protein product encoded by ACOT13 has been co-localized with
beta-tubulin on microtubules; microtubule binding is postulated to
be important to RD, as DCDC2 contains two doublecortin domains that
are thought to bind microtubules (Cheng et al. 2006).
[0277] Genes and regulatory elements within the DYX2 locus may
contribute interactively to reading and language domains, as seen
with the non-additive relationship between putative regulatory
variants in DCDC2 and KIAA0319 (Example 1; Powers et al. 2013;
Ludwig et al. 2008). These risk variants have been shown to
influence gene expression, and to interact with each other to
substantially influence performance on reading and language tasks.
A complex regulatory network, where regulatory elements interact
and co-regulate other DYX2 genes and elements, may contribute to
reading, language, and cognition. If so, it is likely that the
READ1 element in DCDC2 and the causative variant tagged by the
KIAA0319 risk haplotype have the strongest effects upon gene
expression and ultimate neurocognitive phenotype. Supporting this
idea is the fact that so many of the association hits described
herein--both single-marker and haplotype-based, and with all three
phenotypes--localize to the same LD block as the KIAA0319 risk
haplotype. This result, together with the KIAA0319 risk haplotype's
association with reduced KIAA0319 expression and its synergistic
interaction with a regulatory element in an intron of DCDC2,
indicates the presence of at least one regulatory variant in this
region that influences KIAA0319 expression. The locations of the
only other independent hits in the locus (aside from READ1 in
DCDC2)--an intron of FAM65B and downstream of a pseudogene--may
suggest additional regulatory regions that influence gene
expression. Based on work described herein, DCDC2 and KIAA0319 are
the major effector genes responsible for DYX2's influence on RD and
LI risk, and alteration of gene expression levels or patterns is the
mechanism by which this effect is exerted.
[0278] Replication of genetic associations in independent cohorts
was emphasized in the work described herein, rather than reliance
on statistical corrections for multiple testing, for validation of
associations in the ALSPAC discovery cohort. The replications of
genetic association with the neurocognitive traits of interest,
particularly in the varied cohorts in this investigation, provide
strong evidence that the results of this study are not due to Type
I error. Uncorrected p-values and a statistical threshold
correcting for 195 genetic markers (threshold of 0.000256) are also
reported in order to present the context of the findings in terms
of strength and reliability. Nonetheless, the three replication
cohorts were not identical and had inherent differences among
themselves and relative to the discovery cohort that may have
prevented replication. These differences included: (1) the disorder
for which each cohort was selected (RD vs. LI vs. unselected), (2)
severity of case definition and recruitment, (3) country of
recruitment (UK vs. US vs. Italy), and (4) language spoken (English vs.
Italian). For instance, the use of a regular language such as
Italian as opposed to an irregular language such as English may
have allowed for easier detection of true reading and language
deficiencies. This issue was likely avoided in the discovery ALSPAC
cohort due to the large sample size. The observation of multiple
replicated associations throughout the DYX2 locus described herein
increases confidence in these results.
[0279] In summary, the analyses indicate four association signals
for RD, LI, and Verbal IQ in the DYX2 locus: DCDC2, KIAA0319,
FAM65B, and the pseudogene CMAHP. The association results within
the DCDC2 and KIAA0319 (including TDP2, ACOT13, and C6orf62) areas
are in LD with two previously reported risk variants: the READ1
regulatory element in DCDC2 and the KIAA0319 risk haplotype in
KIAA0319 and TDP2. These results point strongly to variation in
KIAA0319 gene expression as a mediator of DYX2's effect on reading
and language phenotypes.
REFERENCES
[0280] Barrett J C, et al. (2005) Bioinformatics 21:263-265.
[0281] Beaver K M, et al. (2010) J Neural Transm 117(7):827-30.
[0282] Bishop D V, et al. (2008) Genes Brain Behav 7(3):365-72.
[0283] Bishop D V, et al. (1996) J Child Psychol Psychiatry
37:391-403.
[0284] Boyd A, et al. (2012) Int J Epidemiol 42(1):111-27.
[0285] Cardon L R, et al. (1994) Science 266(5183):276-9.
[0286] Catts H W, et al. (2005) J Speech Lang Hear Res
48(6):1378-96.
[0287] Cheng Z, et al. (2006) Biochem Biophys Res Commun
350(4):850-3.
[0288] Chou H H, et al. (1998) Proc Natl Acad Sci USA
95(20):11751-6.
[0289] Cope N, et al. (2005) Am J Hum Genet 76(4):581-91.
[0290] Couto J M, et al. (2010) Am J Med Genet B Neuropsychiatr
Genet 153B(2):447-62.
[0291] Deffenbacher K E, et al. (2004) Hum Genet 115(2):128-38.
[0292] Dennis M Y, et al. (2009) PLoS Genet 5(3):e1000436.
[0293] Eicher J D and Gruen J R. (2013) Mol Genet Metab, doi:
10.1016/j.ymgme.2013.07.001.
[0294] Eicher J D, et al. (2013a) PLoS One 8(5):e63762.
[0295] Eicher J D, et al. (2013b) Genes Brain Behav, in press.
[0296] Elbert A, et al. (2011) Behav Genet 41(1):77-89.
[0297] Francks C, et al. (2004) Am J Hum Genet 75(6):1046-1058.
[0298] Gathercole S, and Baddeley A D. (1990) Journal of Memory and
Language 29:336-360.
[0299] Gathercole S E, and Baddeley A D. (1993) Lawrence Erlbaum,
Mahwah, N.J.
[0300] Gathercole S E, and Baddeley A D. (1996) The Psychological
Corporation, London.
[0301] Gayan J, et al. (1999) Am J Hum Genet 64(1):157-64.
[0302] Golding J, et al. (2001) Paediatr Perinat Epidemiol
15(1):74-87.
[0303] Graham S A, and Fisher S E. (2013) Curr Opin Neurobiol
23(1):43-51.
[0304] Han B, et al. (2008) Ann Hum Genet 72(Pt 6):834-847.
[0305] Harold D, et al. (2006) Mol Psychiatry 11(12):1085-1091.
[0306] Kaminen N, et al. (2003) J Med Genet 40(5):340-5.
[0307] Kaplan D E, et al. (2002) Am J Hum Genet 70(5):1287-98.
[0308] Landi N, et al. (2013) Dev Sci 16(1):13-23.
[0309] Lind P A, et al. (2010) Eur J Hum Genet 18(6):668-73.
[0310] Luciano M, et al. (2007) Biol Psychiatry 62:811-817.
[0311] Ludwig K U, et al. (2008) J Neural Transm
115(11):1587-9.
[0312] Marino C, et al. (2012) Psychiatr Genet 22(1):25-30.
[0313] Meng H, et al. (2011) Behav Genet 41(1):58-66.
[0314] Meng H, et al. (2005) Proc Natl Acad Sci USA
102:17053-17058.
[0315] Newbury D F, et al. (2010) Genome Med 2(1):6.
[0316] Newbury D F, et al. (2011) Behav Genet 41(1):90-104.
[0317] Newbury D F, et al. (2009) Am J Hum Genet 85(2):264-72.
[0318] Nunes T, et al. (2003) Scientific Studies of Reading
7(3):289-307.
[0319] Paracchini S, et al. (2008) Am J Psychiatry
165(12):1576-1584.
[0320] Paracchini S, et al. (2006) Hum Mol Genet
15(10):1659-1666.
[0321] Pennington B F. (2006) Cognition 101(2):385-413.
[0322] Pennington B F, and Bishop D V (2009) Annual Review of
Psychology 60:283-306.
[0323] Peter B, et al. (2011) J Neurodev Disord 3(1):39-49.
[0324] Pinel P, et al. (2012) J Neurosci 32(3):817-25.
[0325] Plomin R, et al. (2004) Mol Psychiatry 9(6):582-6.
[0326] Powers N R, et al. (2013) Am J Hum Genet 93(1):19-28.
[0327] Purcell S, et al. (2007) Am J Hum Genet 81(3):559-575.
[0328] Rosner J, and Simon D P. (1971) Journal of Learning
Disabilities 4(384):40-48.
[0329] Rougerie P, et al. (2013) J Immunol 190(2):748-55.
[0330] Rust J, et al. (1993) Psychological Corporation, Sidcup,
UK.
[0331] Scerri T S, et al. (2011) Biol Psychiatry 70:237-245.
[0332] Schumacher J, et al. (2006) Am J Hum Genet 78(1):52-62.
[0333] Trzaskowski M, et al. (2013) Behav Genet 43(4):267-73.
[0334] Wechsler D. (1996) Psychological Corporation, London,
UK.
[0335] Wechsler D, et al. (1992) Psychological Corporation, Sidcup,
UK.
[0336] Weismer S E, et al. (2000) J Speech Lang Hear Res
43(4):865-78.
[0337] Wilcke A, et al. (2011) Eur J Hum Genet 20(2):224-9.
[0338] Wilcke A, et al. (2009) Ann Dyslexia 59(1):1-11.
[0339] Wise J C, et al. (2007) J Speech Lang Hear Res
50(4):1093-9.
[0340] Wong P C, et al. (2013) PLoS One 8(5):e64983.
[0341] Viding E, et al. (2004) J Child Psychol Psychiatry
45(2):315-25.
[0342] Zhong R, et al. (2013) Mol Neurobiol 47(1):435-42.
[0343] Zou L, et al. (2012) Am J Med Genet B Neuropsychiatr Genet
159B(8):970-6.
[0344] Figure Legends
[0345] FIG. 1: Schematic of the genes within the DYX2 locus on
chromosome 6p21.3. Genes in blue, DCDC2 and KIAA0319, have
replicated associations with written and verbal language
phenotypes, namely RD and LI. Regions in red mark two functional
variants, READ1 in DCDC2 and a risk haplotype with markers in
KIAA0319 and TDP2, which have been functionally associated with RD
and LI using animal models and molecular techniques.
[0346] FIG. 4: An updated schematic of genes in our study with
markers that show replicated associations to RD, LI, and/or IQ. The
list of these genes (shown in blue) has expanded to seven (DCDC2,
KIAA0319, TDP2, ACOT13, C6orf62, FAM65B, and CMAHP), although
linkage disequilibrium may account for multiple associations
(particularly for KIAA0319, TDP2, ACOT13, and C6orf62).
[0347] Tables of Example 3
TABLE-US-00018 TABLE 1a ALSPAC Phenotype Measures
Measure | Domain
Phoneme Deletion (PD) Age 7 Years | Reading (RD)
Single Word Reading (SWR7) Age 7 Years | Reading (RD)
Single Nonword Reading (SNR) Age 7 Years | Reading (RD)
Single Word Reading (SWR9) Age 9 Years | Reading (RD)
Wechsler Objective Language Dimensions (WOLD) Verbal Comprehension Age 8 | Language (LI)
Nonword Repetition Task (NWR) Age 8 Years | Language (LI)
Wechsler Intelligence Scale for Children (WISC) Total IQ (TIQ) Age 8 Years | Cognition (IQ)
Wechsler Intelligence Scale for Children (WISC) Verbal IQ (VIQ) Age 8 Years | Cognition (IQ)
Wechsler Intelligence Scale for Children (WISC) Performance IQ (PIQ) Age 8 Years | Cognition (IQ)
TABLE-US-00019 TABLE 1b Phenotype Definitions for ALSPAC Analyses
Phenotype | Definition
Reading (RD) |
Severe RD | 2 standard deviations below sample mean on the phoneme deletion task
Moderate RD | 1 standard deviation below sample mean on SWR7, SNR, and SWR9 tasks
Language (LI) |
Severe LI | 2 standard deviations below sample mean on either WOLD and/or NWR tasks
Moderate WOLD | 1.5 standard deviations below sample mean on the WOLD task
Moderate NWR | 1.5 standard deviations below sample mean on the NWR task
Cognition (IQ) |
Total IQ | Quantitative performance on WISC Total IQ task
Verbal IQ | Quantitative performance on WISC Verbal IQ task
Performance IQ | Quantitative performance on WISC Performance IQ task
TABLE-US-00020 TABLE 2 Replication Cohorts
 | Iowa LI | Colorado RD | Italy RD
Number of Subjects | 428 | 1201 | 878
Number of Families | N/A | 293 | 304
Disorder | LI | RD | RD
Cohort-type | Case-control | Family-based | Family-based
Analysis | SVS | TDT (PLINK) | TDT (PLINK)
Association Conditioned on: | Case-control Status | Case-control Status and Discriminant Score | Case-control Status
Case Status Determined on: | Composite score on language measures | | Speed or Accuracy on text- or single-word reading task
TABLE-US-00021 TABLE 3 Single marker genetic associations with
various RD and LI case-control definitions.
Phenotype | Marker | Gene | BP Location | Model | OR (95% CI) | P-value
Severe RD | rs2294691 | TDP2 | 24652843 | Allelic | 2.0 (1.3-2.9) | 0.00050
Severe RD | rs2294691 | TDP2 | 24652843 | Additive | 1.9 (1.3-2.8) | 0.00053
Severe RD | rs2294691 | TDP2 | 24652843 | Dominant | 2.3 (1.5-3.7) | 0.00018*
Severe RD | rs10456309 | KIAA0319 | 24589562 | Recessive | 10.5 (2.2-49.5) | 0.00020*
Moderate RD | rs1562422 | CMAHP | 25044577 | Dominant | 1.7 (1.2-2.2) | 0.00081
Severe LI | rs807694 | DCDC2 | 24303383 | Additive | 1.8 (1.3-2.5) | 0.00057
Severe LI | rs807694 | DCDC2 | 24303383 | Allelic | 1.8 (1.3-2.5) | 0.00050
Severe LI | rs807694 | DCDC2 | 24303383 | Dominant | 1.9 (1.3-2.7) | 0.00062
Moderate WOLD | rs3756814 | C6orf62 | 24705835 | Additive | 0.7 (0.6-0.9) | 0.00039
Moderate WOLD | rs3756814 | C6orf62 | 24705835 | Allelic | 0.7 (0.6-0.9) | 0.00047
Moderate WOLD | rs3777663 | ACOT13 | 24700235 | Additive | 0.6 (0.5-0.8) | 0.00039
Moderate WOLD | rs3777663 | ACOT13 | 24700235 | Allelic | 0.7 (0.5-0.8) | 0.00041
*Genetic association survives correction for multiple testing
TABLE-US-00022 TABLE 4a Single marker genetic associations with
cognition
Phenotype | Marker | Gene | BP Location | Model | Slope | P-value
Verbal IQ | rs9295626 | KIAA0319 | 24587339 | Allelic | 1.40 | 0.00041
Verbal IQ | rs9295626 | KIAA0319 | 24587339 | Additive | 1.39 | 0.00043
Verbal IQ | rs7763790 | KIAA0319 | 24615063 | Allelic | -1.40 | 0.00045
Verbal IQ | rs7763790 | KIAA0319 | 24615063 | Additive | -1.38 | 0.00048
Verbal IQ | rs6935076 | KIAA0319 | 24644322 | Allelic | 1.16 | 0.00049
Verbal IQ | rs6935076 | KIAA0319 | 24644322 | Additive | 1.15 | 0.00052
Verbal IQ | rs9348646 | FAM65B | 24052526 | Allelic | -1.14 | 0.00066
Verbal IQ | rs9348646 | FAM65B | 24052526 | Additive | -1.14 | 0.00066
Total IQ | rs2328791 | N/A | 23736848 | Allelic | -1.21 | 0.00066
Total IQ | rs2328791 | N/A | 23736848 | Additive | -1.18 | 0.00075
Total IQ | rs2328791 | N/A | 23736848 | Recessive | -3.36 | 0.00042
TABLE-US-00023 TABLE 4b Haplotype based genetic associations with
cognition
Markers | Haplotype | Gene | BP Location | Slope | P-value
rs2817201, rs9295626 | AT | KIAA0319 | 24585214, 24587339 | 1.42 | 0.000378
rs10456309, rs4576240, rs17307478, rs9356939, rs7763790, rs6456621 | GGTCAC | KIAA0319 | 24589562, 24596478, 24605024, 24613354, 24615063, 24618511 | -1.40 | 0.000569
rs6456624, rs6935076, rs2038137, rs3756821, rs1883593, rs3212236 | AGATA | KIAA0319 | 24639223, 24644322, 24645943, 24646821, 24647191, 24648455 | 1.81 | 0.0000145*
rs3777663, rs3756814, rs6931809, rs6916186, rs6933328, rs17491647 | TGTGGA | ACOT13/C6orf62 | 24700235, 24705835, 24706770, 24708523, 24710920, 24713723 | -1.56 | 0.000742
*Genetic association survives correction for multiple testing
TABLE-US-00024 TABLE 5 Replication of genetic associations in the
Iowa, Italian, and Colorado cohorts.
Marker | Gene | Iowa Case-Control OR | Iowa Case-Control p | Italy Case-Control OR | Italy Case-Control p | Colorado Case-Control OR | Colorado Case-Control p | Colorado Discriminant Score Slope | Colorado Discriminant Score p
rs2328791 | N/A | 1.0 | 0.813 | 1.0 | 1.000 | 0.9 | 0.646 | 0.087 | 0.447
rs33914824(a) | DCDC2 | 2.2 | 0.034 | 0.9 | 0.768 | 1.1 | 0.847 | 0.023 | 0.934
rs807694(a) | DCDC2 | 1.9 | 0.028 | 0.9 | 0.786 | 0.9 | 0.853 | -0.025 | 0.919
rs707864(a) | DCDC2 | 1.6 | 0.017 | 1.0 | 0.840 | 1.2 | 0.446 | -0.246 | 0.101
rs10456301(a) | DCDC2 | 0.9 | 0.553 | 1.1 | 0.811 | 1.5 | 0.289 | 0.221 | 0.162
rs16889066(a) | DCDC2 | 1.2 | 0.517 | 1.0 | 0.884 | 1.2 | 0.622 | -0.304 | 0.150
rs9379651(a) | DCDC2 | 1.1 | 0.602 | 1.3 | 0.225 | 0.6 | 0.059 | 0.205 | 0.141
rs2817201 | KIAA0319 | 1.1 | 0.733 | 1.2 | 0.129 | 1.0 | 1.000 | 0.034 | 0.787
rs9295626 | KIAA0319 | 1.1 | 0.579 | 0.6 | 0.0055 | 1.0 | 0.823 | -0.158 | 0.169
rs10456309 | KIAA0319 | 0.5 | 0.073 | 0.7 | 0.189 | 0.4 | 0.206 | 0.628 | 0.00845
rs4576240 | KIAA0319 | 1.1 | 0.825 | 1.9 | 0.0027 | 1.1 | 0.862 | -0.052 | 0.754
rs17307478 | KIAA0319 | 1.0 | 0.996 | 1.3 | 0.292 | 0.8 | 0.555 | 0.039 | 0.803
rs9356939 | KIAA0319 | 4.0 | 0.018 | 0.8 | 0.069 | 1.3 | 0.151 | -0.116 | 0.254
rs7763790 | KIAA0319 | 1.0 | 0.831 | 1.1 | 0.627 | 1.4 | 0.163 | 0.014 | 0.910
rs6456621 | KIAA0319 | 2.2 | 0.019 | 1.6 | 0.405 | 1.8 | 0.366 | -0.458 | 0.104
rs3756821 | KIAA0319 | 1.2 | 0.278 | 1.0 | 0.842 | 1.2 | 0.327 | -0.033 | 0.734
rs1883593 | KIAA0319 | 1.3 | 0.169 | 1.6 | 0.0052 | 1.3 | 0.239 | -0.108 | 0.395
rs3212236 | KIAA0319 | 1.0 | 0.883 | 1.1 | 0.496 | 0.9 | 0.745 | -0.124 | 0.319
rs2294691 | TDP2 | 1.1 | 0.779 | 1.9 | 0.0578 | 1.4 | 0.491 | -0.290 | 0.247
rs3777663 | ACOT13 | 0.7 | 0.016 | 0.6 | 0.0052 | 1.0 | 0.908 | 0.101 | 0.345
rs3756814 | C6orf62 | 0.7 | 0.005 | 0.7 | 0.023 | 0.9 | 0.600 | -0.003 | 0.980
rs6931809 | C6orf62 | 1.4 | 0.023 | 1.4 | 0.017 | 1.2 | 0.491 | -0.096 | 0.382
rs6916186 | C6orf62 | 0.9 | 0.757 | 1.2 | 0.413 | 1.2 | 0.547 | 0.112 | 0.490
rs6933328 | C6orf62 | 0.9 | 0.612 | 0.9 | 0.613 | 1.0 | 0.827 | 0.215 | 0.0437
rs17491647 | C6orf62 | 0.8 | 0.155 | 0.7 | 0.104 | 1.0 | 0.901 | 0.042 | 0.709
rs9348646 | FAM65B | 0.9 | 0.358 | 1.1 | 0.535 | 1.4 | 0.144 | -0.415 | 0.00021
rs1562422 | CMAHP | 1.0 | 0.793 | 1.0 | 0.796 | 0.6 | 0.093 | -0.030 | 0.840
(a) These markers are part of the six-marker risk haplotype in
DCDC2 fully discussed in Powers et al. 2013.
TABLE-US-00025 TABLE 6 Phenotype correlations in the ALSPAC cohort*
 | NWR | WOLD | SWR7 | SWR9 | SNR | PD | TIQ | VIQ | PIQ
NWR | 1 | | | | | | | |
WOLD | 0.214 | 1 | | | | | | |
SWR7 | 0.403 | 0.259 | 1 | | | | | |
SWR9 | 0.351 | 0.202 | 0.722 | 1 | | | | |
SNR | 0.306 | 0.149 | 0.660 | 0.708 | 1 | | | |
PD | 0.362 | 0.165 | 0.688 | 0.550 | 0.538 | 1 | | |
TIQ | 0.324 | 0.386 | 0.500 | 0.387 | 0.343 | 0.406 | 1 | |
VIQ | 0.346 | 0.424 | 0.536 | 0.421 | 0.421 | 0.426 | 0.871 | 1 |
PIQ | 0.192 | 0.216 | 0.292 | 0.218 | 0.218 | 0.246 | 0.819 | 0.435 | 1
*All correlations had p < 0.05
EXAMPLE 4
[0348] Genome-Wide Association Study of Shared Components of
Reading Disability and Language Impairment; ZNF385D influences
Reading and Language Disorders.
INTRODUCTION
[0349] Both RD and LI are complex traits that frequently co-occur,
leading to the hypothesis that these disorders share genetic
etiologies. To test this, a genome wide association study (GWAS)
was performed on individuals affected with both RD and LI in the
Avon Longitudinal Study of Parents and Children. The strongest
associations were seen with markers in ZNF385D (OR=1.81,
p=5.45.times.10.sup.-7) and COL4A2 (OR=1.71,
p=7.59.times.10.sup.-7). Markers within NDST4 showed the strongest
associations with LI individually (OR=1.827,
p=1.40.times.10.sup.-7). Association of ZNF385D was replicated
using receptive vocabulary measures in the Pediatric Imaging
Neurocognitive Genetics study (p=0.00245). Diffusion tensor imaging
fiber tract volume data on 16 fiber tracts was then used to examine
the implications of replicated markers. ZNF385D was a predictor of
overall fiber tract volumes in both hemispheres, as well as global
brain volume. In this Example, evidence is presented for ZNF385D as
a risk gene for RD and LI. The implication of transcription factor
ZNF385D in RD and LI underscores the importance of transcriptional
regulation in the development of higher order neurocognitive
traits.
[0350] Methods
[0351] ALSPAC.
[0352] Subject recruitment and collection of phenotype and genetic
data for the ALSPAC cohort was completed by the ALSPAC team. The
ALSPAC is a prospective population-based, birth cohort based in the
Avon region of the United Kingdom. It consists mainly of children
of northern European descent, born in 1991 and 1992. Children were
recruited before birth; recruitment of their pregnant mothers
resulted in a total of 15,458 fetuses, of whom 14,701 were alive at
1 year of age. The participants, recruitment, and study
methodologies are described in detail elsewhere
(http://www.bristol.ac.uk/alspac) (Boyd et al., 2012; Golding et
al., 2001). The children of the ALSPAC have been extensively
phenotyped from before birth to early adulthood. Ethical approval
was obtained from the ALSPAC Ethics and Law Committee, Local UK
Research Ethics Committees, and the Yale Human Investigation
Committee.
[0353] Reading and Language Measures.
[0354] The reading, language, and cognitive measures used for this
study were collected at ages 7, 8, and 9 years. Subjects with IQ
<75 on the Wechsler Intelligence Scale for Children (WISC-III)
Total IQ, completed at age 8 years, were excluded from the
presented analyses (Wechsler et al., 1992). Reading measures in the
ALSPAC include a phoneme deletion task at age 7, single-word
reading at ages 7 and 9 years, single non-word reading at age 9
years, and reading passage comprehension at age 9 years. The
phoneme deletion task measures phoneme awareness, widely considered
to be a core deficit in both RD and LI (Pennington 2006; Pennington
& Bishop, 2009).
[0355] For the phoneme deletion task, also known as the Auditory
Analysis Test, the child listens to a word spoken
aloud, and is then asked to remove a specific phoneme from that
word to make a new word (Rosner & Simon, 1971). Single-word
reading was assessed at age 7 using the reading subtest of the
Wechsler Objective Reading Dimensions (WORD). At age 9, single-word
and nonword reading were assessed by asking the child to read ten
real words and ten non-words aloud from a subset of a larger list
of words and non-words taken from research conducted by Terezinha
Nunes and colleagues (Rust et al., 1993). Reading comprehension
scores were ascertained at age 9, using the Neale Analysis of
Reading Ability (NARA-II) (Neale 1997). Two additional language
measures, nonword repetition and verbal comprehension tasks, were
collected during clinical interviews at age 8 years. An adaptation
of the Nonword Repetition Task (NWR), in which subjects repeated
recordings of nonwords, was used to assess short-term phonological
memory and processing (Gathercole & Baddeley, 1996). Children
also completed the Wechsler Objective Language Dimensions (WOLD)
verbal comprehension task, where they answered questions about a
paragraph read aloud by an examiner describing a presented picture
(Wechsler 1996). Z-scores were calculated for each subject on each
individual measure.
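As an illustration only, the z-score step can be sketched as follows. The function name and the use of the sample standard deviation are assumptions; the text does not state which reference distribution or standard-deviation estimator was used.

```python
from statistics import mean, stdev

def z_scores(raw_scores):
    """Standardize raw task scores as (score - mean) / standard deviation.

    Assumption: the mean and sample standard deviation are taken over the
    subjects being analyzed.
    """
    m = mean(raw_scores)
    s = stdev(raw_scores)
    return [(x - m) / s for x in raw_scores]

# Hypothetical raw scores for one task (not study data).
example = z_scores([1, 2, 3])
```

With this convention, a subject scoring one standard deviation below the mean on a task receives a z-score of -1 on that task.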
[0356] Case Definitions.
[0357] Applicant aimed to capture persistently poor performers in
various reading and verbal language domains as RD and LI cases in
the case definitions (Table 1). Therefore, RD cases were defined as
having a z-score less than or equal to -1 on at least 3 out of the
5 following tasks: single word reading at age 7 years, phoneme
deletion at age 7 years, single word reading at age 9 years,
nonword reading at age 9 years, and reading comprehension at age 9
years. There were 527 subjects defined as RD cases. LI cases were
defined as having a z-score less than or equal to -1 on at least 2
out of the 3 following tasks: phoneme deletion at age 7 years,
verbal comprehension at age 8 years, and nonword repetition at age
8 years. There were 337 subjects defined as LI cases. As phoneme
awareness is important in both RD and LI, it was included as part
of the case definition for both RD and LI to reflect clinical
presentation. There were 174 individuals affected with both RD and
LI, with a male to female ratio of 1.7:1. In the further
characterization of observed associations, subsets of cases were
created with no comorbidity. There were 163 LI cases excluding
those with comorbid RD, and 353 RD cases excluding those with
comorbid LI. For all analyses, controls were defined as ALSPAC
subjects of European ancestry who completed all the necessary
neurobehavioral assessments but did not meet the criteria for case
status.
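The case definitions above can be sketched in code as follows. This is a minimal illustration, assuming each subject is represented by a dictionary of z-scores keyed by hypothetical task names; the handling of missing scores and the ancestry and IQ filters are not modeled here.

```python
# Hypothetical task keys; thresholds follow the case definitions in the text.
RD_TASKS = ["swr_age7", "phoneme_deletion_age7", "swr_age9",
            "nonword_reading_age9", "reading_comprehension_age9"]
LI_TASKS = ["phoneme_deletion_age7", "verbal_comprehension_age8",
            "nonword_repetition_age8"]

def count_low(z, tasks, cutoff=-1.0):
    """Count tasks on which the z-score is less than or equal to the cutoff."""
    return sum(1 for t in tasks if z.get(t) is not None and z[t] <= cutoff)

def classify(z):
    """Return 'RD+LI', 'RD only', 'LI only', or 'control' for one subject."""
    is_rd = count_low(z, RD_TASKS) >= 3   # at least 3 of 5 reading tasks
    is_li = count_low(z, LI_TASKS) >= 2   # at least 2 of 3 language tasks
    if is_rd and is_li:
        return "RD+LI"
    if is_rd:
        return "RD only"
    if is_li:
        return "LI only"
    return "control"

# Hypothetical subject (not study data).
subject = {"swr_age7": -1.2, "phoneme_deletion_age7": -1.0, "swr_age9": -1.5,
           "nonword_reading_age9": 0.1, "reading_comprehension_age9": 0.3,
           "verbal_comprehension_age8": -1.1, "nonword_repetition_age8": 0.0}
label = classify(subject)
```

The reported counts are consistent with this partition: 527 RD cases minus 174 comorbid cases leaves 353, and 337 LI cases minus 174 comorbid cases leaves 163.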
[0358] Genotyping and Analysis.
[0359] Subjects were genotyped on Illumina HumanHap 550 bead arrays
(San Diego, Calif.). Subjects were excluded if the percentage of
missing genotypes was greater than 2% (n=6). To prevent possible
population stratification, only subjects of European ancestry were
included. In the primary analysis of RD and LI individuals, there
were 174 cases and 4117 controls. There were a total of 500,527
SNPs genotyped before quality assessment and quality control.
Markers were removed if Hardy-Weinberg equilibrium p<0.0001
(n=93) or if missingness was greater than 10% (n=19). All markers
had a minor allele frequency greater than 0.01. All genetic
analyses were performed using logistic regression in PLINK v1.07
(Purcell et al., 2007). To correct for multiple testing, a
Bonferroni-corrected threshold of .alpha.=1.00.times.10.sup.-7
(0.05/500,000 markers tested) was set.
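The subject and marker filters described above can be sketched as simple predicates. The function and field names are hypothetical, and the example records are placeholders rather than study data; the actual filtering was performed with the study's own tooling.

```python
def subject_passes_qc(missing_genotype_rate, max_missing=0.02):
    """Subjects are excluded if more than 2% of genotypes are missing."""
    return missing_genotype_rate <= max_missing

def marker_passes_qc(hwe_p, missing_rate, min_hwe_p=1e-4, max_missing=0.10):
    """Markers are removed if Hardy-Weinberg p < 0.0001 or missingness > 10%."""
    return hwe_p >= min_hwe_p and missing_rate <= max_missing

# Hypothetical marker records (not study data).
markers = [
    {"id": "snp_a", "hwe_p": 0.50, "missing": 0.01},
    {"id": "snp_b", "hwe_p": 0.00005, "missing": 0.01},  # fails Hardy-Weinberg
    {"id": "snp_c", "hwe_p": 0.50, "missing": 0.20},     # fails missingness
]
kept = [m["id"] for m in markers if marker_passes_qc(m["hwe_p"], m["missing"])]
```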
[0360] Following the initial analyses examining cases with both RD
and LI, RD and LI case definitions were further examined
individually (e.g., LI excluding those with comorbid RD, and RD
excluding those with comorbid LI). These analyses were completed to
determine whether a single disorder (RD or LI) was driving
association signals in the comorbid RD and LI analysis. The
associations of markers within several previously identified RD
and/or LI risk genes were also examined, including those recently
reported in Luciano et al. (2013), in order to present their results
with these phenotypic definitions. These genes included ABCC13,
ATP2C2, BC0307918, CMIP, CNTNAP2, DAZAP1, DCDC2, DYX1C1, FOXP2,
KIAA0319,
KIAA0319L, PRKCH, ROBO1, and TDP2.
[0361] Gene-based analyses were performed on each phenotype
(comorbid RD and LI, as well as RD and LI individually) using the
VEGAS program, similar to the Luciano et al. study (Liu et al.,
2010; Luciano et al., 2013). To correct for multiple testing, a
Bonferroni-corrected threshold of .alpha.=2.84.times.10.sup.-6
(0.05/17,610 genes tested) was set.
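The two Bonferroni thresholds stated in the methods follow from dividing the family-wise error rate by the number of tests. A short sketch of the arithmetic (the function name is illustrative):

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests

marker_threshold = bonferroni_threshold(0.05, 500_000)  # SNP-level analyses
gene_threshold = bonferroni_threshold(0.05, 17_610)     # gene-based analyses

print(f"{marker_threshold:.2e}")  # 1.00e-07
print(f"{gene_threshold:.2e}")    # 2.84e-06
```

A marker-level result is declared significant only if its p-value falls below the first threshold, and a gene-level result only if it falls below the second.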
[0362] PING Replication Analyses.
[0363] Replication analyses were completed in the PING study.
Details on the recruitment, ascertainment, neurobehavioral,
genetic, and neuroimaging methods and data acquisition in the PING
study are described in detail elsewhere, but are summarized briefly
below (Akshoomoff et al., 2013; Brown et al., 2012; Fjell et al.,
2012; Walhovd et al., 2012). The PING study is a cross-sectional
cohort of typically developing children between the ages of 3 and
20 years. Subjects were screened for history of major
developmental, psychiatric, and/or neurological disorders, brain
injury, or medical conditions that affect development. However,
subjects were not excluded due to learning disabilities such as RD
and LI. The human research protections programs and institutional
review boards at the 10 institutions (Weill Cornell Medical College,
University of California at Davis, University of Hawaii, Kennedy
Krieger Institute, Massachusetts General Hospital, University of
California at Los Angeles, University of California at San Diego,
University of Massachusetts Medical School, University of Southern
California, and Yale University) participating in the PING study
approved all experimental and consenting procedures. For
individuals under 18 years of age, parental informed consent and
child assent (for those 7 to 17 years of age) were obtained. All
participants age 18 years and older gave their written informed
consent.
[0364] Subjects completed the validated study version of the NIH
Toolbox Cognition Battery, in which two language- and
reading-related tasks were completed: the Oral Reading Recognition
Test and Picture Vocabulary Test (Akshoomoff et al., 2013;
Weintraub et al., 2013). In the Oral Reading Recognition Test, a
word or letter is presented on the computer screen and the
participant is asked to read it aloud. Responses are recorded as
correct or incorrect by the examiner, who views accepted
pronunciations on a separate computer screen. The Picture
Vocabulary Test is a measure of receptive vocabulary and
administered in a computerized adaptive format. The participant is
presented with an auditory recording of a word and four images on
the computer screen; the task is to touch the image that most
closely represents the meaning of the word.
[0365] Subjects were genotyped on the Illumina Human660W-Quad
BeadChip (San Diego, Calif.), with markers used for replication
analyses passing quality control filters (sample call rate >98%,
SNP call rate >95%, minor allele frequency >5%). A reference
panel was constructed as described elsewhere (Brown et al., 2012;
Fjell et al., 2012; Walhovd et al., 2012). To assess ancestry and
admixture proportions in the PING participants, a supervised
clustering approach implemented in the ADMIXTURE software
(Alexander et al., 2009) was used to group participant data
into six clusters corresponding to six major continental
populations: African, Central Asian, East Asian, European, Native
American, and Oceanic. Implementation of ancestry and admixture
proportions in the PING subjects is described in detail elsewhere
(Brown et al., 2012; Fjell et al., 2012; Walhovd et al., 2012). To
prevent possible population stratification, only subjects with a
European genetic ancestry factor (GAF) of 1 were included in
genetic analysis of behavior. These 440 individuals of European
ancestry (mean age of 11.5 [standard deviation=4.8] years, 53.0%
male) were analyzed using quantitative performance on the Oral
Reading Recognition and Picture Vocabulary scores with PLINK v1.07,
with age included as a covariate (Purcell et al., 2007). To correct
for multiple testing (20 total tests=10 SNPs.times.2 language
measures), statistical thresholds were set using the false discovery
rate with .alpha.=0.05 (Benjamini & Hochberg, 1995).
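For reference, the standard Benjamini-Hochberg step-up procedure can be sketched as below. This is a generic implementation, not necessarily the exact routine used in the study. In the example, the two smallest p-values are the replication p-values reported in this Example, and the remaining 18 values are placeholders (an assumption) standing in for the other tests.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the test is rejected at FDR alpha.

    Step-up rule: find the largest rank k such that p_(k) <= (k / m) * alpha,
    then reject the k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected

# 20 tests = 10 SNPs x 2 language measures; the 0.5 entries are placeholders.
p_values = [0.00245, 0.004173] + [0.5] * 18
decisions = benjamini_hochberg(p_values, alpha=0.05)
```

Under these placeholder values, the rank-1 and rank-2 thresholds are 0.0025 and 0.005, so only the first two tests are rejected.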
[0366] PING Imaging Analysis.
[0367] PING imaging techniques, data acquisition, and analyses are
discussed in depth elsewhere and briefly below (Brown et al., 2012;
Fjell et al., 2012; Walhovd et al., 2012). Across the ten sites and
12 scanners, a standardized multiple modality high-resolution
structural MRI protocol was implemented, involving 3D T1- and
T2-weighted volumes and a set of diffusion-weighted scans. At the
University of California at San Diego, data were obtained on a GE
3T Signa HDx scanner and a 3T Discovery 750x scanner (GE
Healthcare) using eight-channel phased array head coils. The
protocol included a conventional three-plane localizer, a sagittal
3D inversion recovery spoiled gradient echo T1-weighted volume
optimized for maximum gray/white matter contrast (echo time=3.5 ms,
repetition time=8.1 ms, inversion time=640 ms, flip
angle=8.degree., receiver bandwidth=.+-.31.25 kHz, FOV=24 cm,
frequency=256, phase=192, slice thickness=1.2 mm), and two axial 2D
diffusion tensor imaging (DTI) pepolar scans (30 directions,
b-value=1,000,
TE=83 ms, TR=13,600 ms, frequency=96, phase=96, slice thickness=2.5
mm). Acquisition protocols with pulse sequence parameters identical
or near identical to those protocols used at the University of
California at San Diego were installed on scanners at the other
nine sites. Data were acquired on all scanners to estimate
relaxation rates and measure and correct for scanner-specific
gradient coil nonlinear warping. Image files in DICOM format were
processed with an automated processing stream written in MATLAB
(Natick, MA) and C++ by the UCSD Multimodal Imaging Laboratory.
T1-weighted structural images were corrected for distortions caused
by gradient nonlinearities, coregistered, averaged, and rigidly
resampled into alignment with an atlas brain. Image postprocessing
and analysis were performed using a fully automated set of tools
available in the FreeSurfer software suite
(http://surfer.nmr.mgh.harvard.edu/) as well as an atlas-based
method for delineating and labeling WM fiber tracts (Fischl,
2012).
[0368] Diffusion Tensor Imaging.
[0369] Diffusion-weighted images were corrected for eddy current
distortion using a least squares inverse and iterative conjugate
gradient descent method to solve for the 12 scaling and translation
parameters describing eddy current distortions across the entire
diffusion MRI scan, explicitly taking into account the orientations
and amplitudes of the diffusion gradient (Zhuang et al., 2006).
Head motion was corrected by registering each diffusion-weighted
image to a corresponding image synthesized from a tensor fit to the
data (Hagler et al., 2009). Diffusion MRI data were corrected for
spatial and intensity distortions caused by B.sub.0 magnetic field
inhomogeneities using the reversing gradient method (Holland et
al., 2010). Distortions caused by gradient nonlinearities were
corrected by applying a predefined, scanner-specific, nonlinear
transformation (Jovicich et al., 2006). Diffusion-weighted images
were automatically registered to T1-weighted structural images
using mutual information (Wells et al., 1996) and rigidly resampled
into a standard orientation relative to the T1-weighted images with
isotropic 2-mm voxels. Cubic interpolation was used for all
resampling steps. Conventional DTI methods were used to calculate
diffusion measures (Basser et al., 1994; Pierpaoli et al., 1996).
Scanning duration for the DTI sequence was 4:24 min. White matter
fiber tracts were labeled using a probabilistic-atlas based
segmentation method (Hagler et al., 2009). Voxels containing
primarily gray matter or cerebrospinal fluid, identified using
FreeSurfer's automated brain segmentation, were excluded from
analysis (Fischl et al., 2002). Fiber tract volumes were calculated
as the number of voxels with probability greater than 0.08, the
value that provided optimal correspondence in volume between
atlas-derived regions of interest and manually traced fiber
tracts.
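The volume calculation described above amounts to counting voxels whose atlas probability exceeds 0.08 after gray-matter and cerebrospinal-fluid voxels are excluded. A minimal sketch follows, assuming the probability map and exclusion mask are flattened into equal-length lists; the names are hypothetical, and the conversion to cubic millimeters (based on the 2-mm isotropic voxels mentioned in the text) is an added convenience rather than a step stated in the source.

```python
def tract_volume_voxels(prob_map, excluded=None, threshold=0.08):
    """Count voxels with tract probability > threshold, skipping excluded voxels."""
    count = 0
    for i, p in enumerate(prob_map):
        if excluded is not None and excluded[i]:
            continue  # voxel flagged as primarily gray matter or CSF
        if p > threshold:
            count += 1
    return count

def voxels_to_mm3(n_voxels, voxel_size_mm=2.0):
    """Convert a voxel count to volume, assuming isotropic voxels."""
    return n_voxels * voxel_size_mm ** 3

# Hypothetical probabilities and exclusion mask (not study data).
probabilities = [0.05, 0.08, 0.09, 0.50, 0.90]
mask = [False, False, False, True, False]
n = tract_volume_voxels(probabilities, mask)
volume = voxels_to_mm3(n)
```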
[0370] Statistical Analyses.
[0371] Imaging-genetics analyses were performed in individuals of
European genetic ancestry. Scanner, age, handedness, socioeconomic
status, and sex were included as covariates in all analyses
(Akshoomoff et al., 2013; Brown et al., 2012; Fjell et al., 2012;
Walhovd et al., 2012). 332 subjects of European genetic ancestry
had completed imaging measures that passed PING quality control.
Fiber tract volumes in 16 tracts of interest were tested by
multiple regression analyses in R using the PING data portal
(https://mmil-dataportal.ucsd.edu).
[0372] Results
[0373] SNP and Gene-Based Associations
[0374] The ten strongest GWAS associations with comorbid RD and LI
in ALSPAC are presented in Table 2. The strongest associations were
observed with ZNF385D (OR=1.81, p=5.45.times.10.sup.-7) and COL4A2
(OR=1.71, p=7.59.times.10.sup.-7) (Table 2). Next, RD and LI were
examined individually--with no comorbid cases included--to determine
whether one disorder was driving these associations. The ten
strongest associations for RD cases and LI cases individually are
presented in Table 3 and Table 4, respectively. The strongest
associations with LI were with markers in NDST4 (OR=1.83,
p=1.40.times.10.sup.-7) (Table 3). Markers on chromosome 10
(OR=1.43, p=5.16.times.10.sup.-6), chromosome 8 (OR=1.70,
p=5.85.times.10.sup.-6), and the OPA3 gene (OR=1.53,
p=6.92.times.10.sup.-6) had the strongest associations with RD
(Table 4). Markers with p<0.01 within genes previously
implicated in RD and/or LI are presented in Supplemental Table 1
for each phenotype. The strongest associations with these markers
were seen for KIAA0319 with comorbid RD and LI (rs16889556,
p=0.0005177), FOXP2 with comorbid RD and LI (rs1530680,
p=0.0001702), CNTNAP2 with LI (rs6951437, p=0.0000462), and DCDC2
with LI (rs793834, p=0.0002679) (Supplemental Table 1a-1c). Gene-based
analyses were completed on each phenotype (comorbid RD and LI, RD
individually, and LI individually), and the ten strongest
gene-based associations are presented in Supplemental Table 2. None
of the gene-based associations survived correction for
multiple-testing; however, the strongest associations were seen
with: (1) OR5H2, OR5H6, and RRAGA with comorbid RD and LI, (2)
NEK2, DLEC1, and NARS with LI, and (3) MAP4, OR2L8, and CRYBA4 with
RD. Markers with the strongest p-values in discovery analyses in
ZNF385D, COL4A2, and NDST4 were carried forward for replication
analysis in PING. Two markers within ZNF385D replicated, showing
association with performance on the Picture Vocabulary Test
(p=0.00245 and 0.004173) (Table 5). However, these markers did not
replicate with the Oral Reading Recognition Test (p>0.05).
[0375] Imaging-Genetics of ZNF385D
[0376] To follow up on the replicated associations of ZNF385D, the
effects of these variants on the volumes of fiber tracts previously
implicated in written and verbal language were examined. Before
doing so, fiber tract volume was first shown to be a predictor of
performance on the Oral Reading Recognition and Picture Vocabulary
Tests (data not shown). Among subjects of European genetic ancestry
only, ZNF385D genotypes were predictors of overall fiber tract
volume, as well as of fiber tract volumes in the right and left
hemispheres (Table 6). ZNF385D SNPs were also predictors
bilaterally within the inferior longitudinal fasciculus (ILF),
inferior fronto-occipital fasciculus (IFO), and temporal superior
longitudinal fasciculus (tSLF) in this subset (Table 6). To discern
whether these associations between ZNF385D and fiber tract volumes
reflect global brain volume differences among genotypes, the
relationship of ZNF385D with both total brain segmentation and
total cortical volumes was examined. Associations for both measures
were found with rs1679255 (p=0.00072 and 0.00027, respectively) and
rs12636438 (p=0.000259 and 0.000069, respectively). The effects
appeared to be additive in nature, with heterozygous individuals
having intermediate phenotypes relative to those homozygous for the
major allele and to those homozygous for the minor allele. In fact,
when these total brain volume measures were inserted into the model
as a covariate, ZNF385D associations with DTI fiber tract volumes
were no longer present.
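The covariate result in the preceding paragraph can be illustrated with a toy regression. The sketch below is not the study's analysis, which used multiple regression in R with scanner, age, handedness, socioeconomic status, and sex as covariates. It assumes an additive genotype coding (0, 1, or 2 copies of the minor allele) and uses synthetic numbers constructed so that tract volume depends on genotype only through total brain volume; all names and values are hypothetical.

```python
def ols(X, y):
    """Least-squares fit with an intercept, solved via the normal equations.

    Returns [intercept, beta_1, ..., beta_k] for predictor rows X and outcome y.
    """
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for col in range(k):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        piv = A[col][col]
        A[col] = [v / piv for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]

# Synthetic data (not study data): genotype shifts brain volume additively,
# and tract volume is proportional to brain volume.
genotype = [0, 0, 1, 1, 2, 2]
brain_volume = [9.9, 10.1, 10.4, 10.6, 10.9, 11.1]
tract_volume = [2.0 * b for b in brain_volume]

unadjusted = ols([[g] for g in genotype], tract_volume)
adjusted = ols([[g, b] for g, b in zip(genotype, brain_volume)], tract_volume)
```

In this toy setup the genotype slope is about 1.0 without adjustment and about 0 once brain volume is included, which mirrors the pattern reported above: heterozygotes fall between the two homozygous groups, and the tract-level association is accounted for by global brain volume.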
[0377] As described herein, genes were identified that contribute
to the common co-occurrence of RD and LI. In the discovery
analyses, associations of ZNF385D and COL4A2 were found in comorbid
cases, and of NDST4 with LI. Next, associations of ZNF385D with
performance were observed on a vocabulary measure, but not on an
oral reading measure, in PING. Association with performance on a
vocabulary measure, although not exactly recapitulating the
comorbidity phenotype, does provide further evidence for the
contribution of ZNF385D to language. To gain functional
understanding, the effects of replicated ZNF385D markers on the
volumes of language-related fiber tracts were interrogated. ZNF385D
markers associated bilaterally with overall fiber tract volumes, as
well as with overall brain volume.
[0378] Studies have shown that RD and LI share genetic contributors
(Trzaskowski et al. 2013). However, specific genes that contribute
to both RD and LI have only recently begun to be examined. These
studies have only used a candidate gene approach to examine this
shared genetic etiology. Such an approach has been successful in
showing the shared contribution of DCDC2, KIAA0319, FOXP2, and
CNTNAP2, among others, to both RD and LI (Eicher & Gruen, 2013; Graham
& Fisher, 2013; Newbury et al., 2009; Newbury et al., 2010;
Pinel et al., 2012; Rice et al., 2009; Scerri et al., 2011). In
fact, markers within KIAA0319, FOXP2, and CNTNAP2 (along with
BC0307918) showed nominal association with comorbid RD and LI in
the analyses (p<0.01) described herein. RD/LI risk genes also
showed a tendency to associate with LI individually (DCDC2,
KIAA0319, and CNTNAP2) and with RD individually (CNTNAP2 and CMIP)
(p<0.01). The lack of replication for other RD/LI risk genes and
differences specifically between this study and those of Scerri et
al. (2011) and Luciano et al. (2013) are likely a result of
different case definitions and numbers, as the instant case
classifications were designed to capture as wide a range as
possible of reading- and language-impaired subjects as opposed to
using highly specific neurocognitive measures.
[0379] A glaring omission in the genetic investigations of RD and
LI is the lack of hypothesis-free methods. These methods allow for
discovery of new genes because they do not rely on pre-selected
candidates. Here, the GWAS analyses indicate that ZNF385D
contributes to comorbid RD and LI. This study is not the first to
perform a GWAS on reading- and language-related traits. Luciano et
al. (2013) recently reported a GWAS of quantitative measures of
written and verbal language in two population-based
cohorts, including ALSPAC. They found strong evidence that ABCC13,
BC0307918, and DAZAP1, among others, contribute to performance on these
measures, although the instant analyses did not provide strong
evidence for them. The analytical strategies differed in two ways:
(1) the use of dichotomous rather than quantitative measures to
condition genetic associations and (2) examining reading and
language together as opposed to individually. Past association
studies of RD and LI have shown differences in results depending on
whether genetic data were conditioned on dichotomous or
quantitative phenotypes. For instance, KIAA0319 tends to associate
more readily with quantitative measures, while DCDC2 associates
more often with dichotomized variables (Paracchini et al., 2008;
Powers et al., 2013; Scerri et al., 2011). The present study, which
examines comorbidity, and that of Luciano et al., which examined
performance on reading and language tasks individually, conditioned
genetic associations on different traits, which can lead to
different statistical associations. Both analytical strategies are
valid and have gleaned separate, yet related insight into the
genetic underpinnings of written and verbal language. They
demonstrate the importance of creative and careful examination of
phenotypes when examining neurocognitive and other complex
traits.
[0380] Following the primary analysis of comorbid RD and LI, RD and
LI were examined individually to determine whether a single
disorder was driving the association signals. ZNF385D did not
associate with either RD or LI individually, indicating that
ZNF385D contributes to processes related to both RD and LI, as
opposed to only one of these disorders. Within the PING cohort,
associations of ZNF385D markers were observed with performance on
the Picture Vocabulary Test and not the Oral Reading Recognition
Test. Measures of receptive vocabulary (e.g. the Picture Vocabulary
Test) are related to both written and verbal language tasks
(Scarborough 1990; Wise et al., 2007), while performance on
decoding measures (e.g. the Oral Reading Recognition Test) appears
to be specific to reading.
[0381] Therefore, the Picture Vocabulary Test may reflect the
comorbid RD and LI phenotype used for association in ALSPAC better
than the Oral Reading Recognition Test and may explain the association
pattern of ZNF385D in PING. In addition to ZNF385D, suggestive
associations of COL4A2 with comorbid RD/LI and NDST4 with LI were
observed. Neither of these associations replicated with the
measures in PING, but future studies should attempt to replicate
these associations, particularly due to the known involvement of
COL4A2 in porencephaly and white matter lesions (Verbeek et al.,
2012; Yoneda et al., 2011).
[0382] Gene-based analyses did not reveal any associations that
survived correction for multiple testing. Nonetheless, there were
intriguing gene associations that should be investigated in future
studies. For instance, with LI, there were suggestive associations
with several genes on chromosome 19--IL4I1, ATF5, NUP62, and
SIGLEC11--which may correspond to the SLI2 linkage peak (Monaco,
2007; SLI Consortium, 2002). Luciano et al. (2013) found a similar
accumulation of suggestively associated genes approximately 5 Mb
away from the genes identified herein. Additionally, MAP4, a
microtubule assembly gene, was the most strongly associated gene
with RD. There is evidence that microtubule function plays a key
role in reading development, as aberrant neuronal migration is
thought to contribute to the etiology of RD, and other RD candidate
genes are thought to interact with microtubules (e.g. DCDC2 and
ACOT13) (Cheng et al., 2006).
[0383] These findings can be validated in an independent cohort,
using methods described herein and known methods, to confirm that
these genes are involved in RD and LI.
[0384] The strongest observed associations in this study were with
markers within ZNF385D. ZNF385D has previously been implicated in
schizophrenia and attention deficit hyperactivity disorder (ADHD)
(Poelmans et al., 2011; Xu et al., 2013). Both schizophrenia and
ADHD are neurobehavioral disorders thought to have core impairments
in common with RD and LI, including comprehension and semantic
processing (Gilger et al., 1992; Li et al., 2009; Willcutt et al.,
2005). Additionally, the observed association of ZNF385D, as
described herein, with global brain volume may indicate that ZNF385D
influences various neurocognitive traits through its effect on the
entire brain.
[0385] There is little known regarding the function of ZNF385D,
although its zinc finger domain suggests it is a transcriptional
regulator. The importance of transcriptional regulation in written
and verbal language is not a new concept. The most widely studied
language gene, FOXP2, is a potent transcription factor that has
been shown to regulate another language gene, CNTNAP2 (Vernes et
al., 2007; Vernes et al., 2011). Additionally, in the DYX2 locus,
two risk variants, READ1 within DCDC2 and the KIAA0319 risk
haplotype, appear to have the capacity to regulate gene expression
(Couto et al., 2010; Dennis et al., 2009; Meng et al., 2011) and
possibly interact (Ludwig et al. 2008; Example 1; Powers et al.,
2013). ZNF385D variants now join this list of putative
transcriptional variants that influence written and verbal language
skills. The characterization of target genes of ZNF385D and of its
transcriptional effects on these targets will be an important next
step. Additionally, the identification of target genes may generate
therapeutic candidates for treatment and remediation of RD and LI.
To gain further insight into ZNF385D, imaging-genetics analyses of
ZNF385D and fiber tract volumes of language-related tracts were
performed. ZNF385D appears to modulate fiber tract and total brain
volumes, which may subsequently affect the connectivity and
functionality of brain regions important in the efficient, fluent
integration of written and verbal language. Thus, identification of
target genes and how the modulation of their expression during
neural development yields differences in fiber tract and total
brain volumes will be vital not only for dissecting the mechanism
of ZNF385D, but also for understanding the development of core
language skills in children.
[0386] Characteristics of the population. First, although the
overall sample size of the ALSPAC is formidable, the number of
cases for each definition is relatively small. This is expected in
a cross-sectional cohort of the general population as the
prevalence of these disorders ranges from 5% to 17% (Pennington
& Bishop, 2009). The ALSPAC cohort would not be expected to be
enriched for RD and/or LI cases. Small sample size could have
hindered the statistical power and ability to identify risk genes
with small effect size. Second, the reading and language measures
performed in the ALSPAC and PING studies were not identical.
Phenotypes in PING were treated as a quantitative trait rather than
a dichotomous variable as in ALSPAC. Therefore, attempts to
replicate associations observed in the ALSPAC cohort may have been
hampered as reading/language measures in PING may have captured
different skills than those in ALSPAC. However, the associations
observed in the PING indicate that ZNF385D plays a substantial,
consistent role in overall language processes. Third, atlas-derived
tract volume measures, like volumes derived from manually traced
fiber tracts, are likely underestimates of true fiber volume for
most tracts. However, fiber tract volumes were derived consistently
for all subjects and likely reflect inter-individual differences.
Nonetheless, the strength and independent replication of the
associations described herein and the relationship with brain
imaging phenotypes strongly implicate ZNF385D in core language
processes underlying RD and LI.
[0387] In conclusion, ZNF385D was identified as a novel gene
contributing to both RD and LI, as well as fiber tract and overall
brain volume. The implication of another transcription factor in
communication disorders underscores the importance of
transcriptional regulation in neural development of language
domains in the brain. Future studies should aim to further
characterize the molecular functionality of ZNF385D and replicate
this association, as well as our non-replicated associations--NDST4
and COL4A2--in RD, LI, and other related disorders.
REFERENCES
[0388] Akshoomoff N., et al. (2013) J Int Neuropsychol Soc Under
Review.
[0389] Alexander D. H., et al. (2009) Genome Res 19(9),
1655-64.
[0390] Basser P. J., et al. (1994) Biophys J 66(1), 259-267.
[0391] Benjamini Y., et al. (1995) J R Stat Soc B 57(1),
289-300.
[0392] Boyd A., et al. (2012) Int J Epidemiol 42(1), 111-27.
[0393] Brown T. T., et al. (2012) Curr Biol 22(18), 1693-8.
[0394] Catts H. W., et al. (2005) J Speech Lang Hear Res 48(6),
1378-96.
[0395] Cheng Z., et al. (2006) Biochem Biophys Res Commun 350(4),
850-3.
[0396] Cope N., et al. (2012) Neuroimage 63(1), 148-56.
[0397] Couto J. M., et al. (2010) Am J Med Genet B Neuropsychiatr
Genet 153B(2), 447-62.
[0398] Darki F., et al. (2012) Biol Psychiatry 72(8), 671-6.
[0399] Dennis, M. Y., et al. (2009). PLoS Genet 5, e1000436.
[0400] Eicher J. D. and Gruen J. R. (2013) Mol Genet Metab, doi:
10.1016/j.ymgme.2013.07.001.
[0401] Fischl B. (2012) FreeSurfer. Neuroimage 62(2), 774-81.
[0402] Fischl B., et al. (2002) Neuron 33(3), 341-55.
[0403] Fjell A. M., et al. (2012) Proc Natl Acad Sci USA 109(48),
19620-5.
[0404] Gathercole S., and Baddeley A. D. (1990) Journal of Memory
and Language 29, 336-360.
[0405] Gathercole S. E., and Baddeley A. D. (1996) The
Psychological Corporation, London.
[0406] Gilger J. W., et al. (1992) J Am Acad Child Adolesc
Psychiatry 31(2), 343-8.
[0407] Golding J., et al. (2001) I. Study methodology. Paediatr
Perinat Epidemiol 15(1), 74-87.
[0408] Graham S. A., and Fisher S. E. (2013) Curr Opin Neurobiol
23(1), 43-51.
[0409] Hagler D. J., Jr., et al. (2009) Hum Brain Mapp 30(5),
1535-1547.
[0410] Holland D., et al. (2010) Neuroimage 50(1), 175-183.
[0411] Jovicich J., et al. (2006) Neuroimage 30(2), 436-443.
[0412] Li X., et al. (2009) Curr Opin Psychiatry 22(2), 131-9.
[0413] Liegeois F., et al. (2003) Nat Neurosci 6(11), 1230-7.
[0414] Liu J. Z., et al. (2010) Am J Hum Genet 87(1), 139-45.
[0415] Luciano M., et al. (2013) Genes Brain Behav, doi:
10.1111/gbb.12053.
[0416] Ludwig K. U., et al. (2008) J Neural Transm 115(11),
1587-9.
[0417] Meng H., et al. (2011) Behav Genet 41(1), 58-66.
[0418] Monaco A. P. (2007) Ann Hum Genet 71(Pt5), 660-73.
[0419] Newbury D. F., et al. (2009) Am J Hum Genet 85(2),
264-72.
[0420] Newbury D. F., et al. (2010) Behav Genet 41(1), 90-104.
[0421] Neale M. D. (1997) Neale Analysis of Reading
Ability--Revised:--Manual for Schools, NFER-Nelson.
[0422] Paracchini S., et al. (2008) Am J Psychiatry 165(12),
1576-84.
[0423] Peterson R. L., and Pennington B. F. (2012) Lancet
379, 1997-2007.
[0424] Pennington B. F., and Bishop D. V. (2009) Annu Rev Psychol
60, 283-306.
[0425] Pennington B. F. (2006) Cognition 101(2), 385-413.
[0426] Pierpaoli C., et al. (1996) Radiology 201(3), 637-648.
[0427] Pinel P., et al. (2012) J Neurosci 32(3), 817-25.
[0428] Poelmans G., et al. (2011) Am J Psychiatry 168(4),
365-77.
[0429] Powers N. R., et al. (2013) Am J Hum Genet 93(1), 19-28.
[0430] Purcell S., et al. (2007) Am J Hum Genet 81(3),
559-575.
[0431] Rice M. L., et al. (2009) J Neurodev Disord 1(4),
264-82.
[0432] Rosner J., and Simon D. P. (1971) Journal of Learning
Disabilities 4(384), 40-48.
[0433] Rust J., et al. (1993) WORD: Wechsler Objective Reading
Dimensions Manual. Psychological Corporation, Sidcup, UK.
[0434] Scarborough H. S. (1990) Child Dev 61(6), 1728-43.
[0435] Scerri T. S., et al. (2012) PLoS One 7(11), e50312.
[0436] Scerri T. S., et al. (2011) Biol Psychiatry 70(3),
237-45.
[0437] Scerri T. S., and Schulte-Korne G. (2010) Eur Child Adolesc
Psychiatry 19(3), 179-97.
[0438] Scott-Van Zeeland A. A., et al. (2010) Sci Transl Med 2(56),
doi: 10.1126/scitranslmed.3001344.
[0440] Shaywitz S. E., and Shaywitz B. A. (2008) Dev Psychopathol
20(4), 1329-49.
[0441] SLI Consortium. (2002) Am J Hum Genet 70(2), 384-98.
[0442] Tan G. C., et al. (2010) Neuroimage 53(3), 1030-42.
[0443] Trzaskowski M., et al. (2013) Behav Genet 43(4), 267-73.
[0444] Vandermosten M., et al. (2012) Brain 135(Pt 3), 935-48.
[0445] Verbeek E., et al. (2012) Eur J Hum Genet 20(8), 844-51.
[0446] Vernes S. C., et al. (2011) PLoS Genet 7(7), e1002145.
[0447] Vernes S. C., et al. (2007) Am J Hum Genet 81(6),
1232-50.
[0448] Walhovd K. B., et al. (2012) Proc Natl Acad Sci USA 109(49),
20089-94.
[0449] Wechsler D. (1996) Wechsler objective language dimensions
(WOLD). The Psychological Corporation, London.
[0450] Wechsler D., et al. (1992) WISC-IIIUK:--Wechsler
Intelligence Scale for Children. Psychological Corporation, Sidcup,
UK.
[0451] Weintraub S., et al. (2013) Neurology 80(11 Suppl 3),
S54-64.
[0452] Wells W. M. 3rd, et al. (1996) Med Image Anal 1(1),
35-51.
[0453] Wilcke A., et al. (2011) Eur J Hum Genet 20(2), 224-9.
[0454] Willcutt E. G., et al. (2005) Dev Neuropsychol 27(1),
35-78.
[0455] Wise J. C., et al. (2007) J Speech Lang Hear Res 50(4),
1093-9.
[0456] Xu C., et al. (2013) PLoS One 8(1), e51674.
[0457] Yoneda Y., et al. (2012) Am J Hum Genet 90(1), 86-90.
[0458] Zhuang J., et al. (2006)
[0459] Tables of Example 4
TABLE-US-00026 TABLE 1 Reading and language measures used to define
Reading Disability (RD) and Language Impairment (LI) Cases
Reading Disability (RD) (n = 527)*    Language Impairment (LI) (n = 337)**
Phoneme Deletion Age 7 Years          Phoneme Deletion Age 7 Years
Single Word Reading Age 7 Years       Verbal Comprehension Age 8 Years
Single Word Reading Age 9 Years       Nonword Repetition Age 8 Years
Nonword Reading Age 9 Years
Reading Comprehension Age 9 Years
*RD Cases had a z-score of less than or equal to -1 on at least 3
out of the 5 reading measures
**LI Cases had a z-score of less than or equal to -1 on at least 2
out of the 3 language measures
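The case definitions in the footnotes to Table 1 amount to a simple counting rule. The following is a minimal Python sketch of that rule, assuming each child's scores have already been standardized to z-scores. The function name, the measure labels, and the example values are hypothetical illustrations and are not taken from the ALSPAC data.

```python
from typing import Mapping


def meets_case_definition(z_scores: Mapping[str, float],
                          min_low_measures: int,
                          cutoff: float = -1.0) -> bool:
    """Return True if enough measures fall at or below the z-score cutoff."""
    n_low = sum(1 for z in z_scores.values() if z <= cutoff)
    return n_low >= min_low_measures


# Hypothetical child; these z-scores are illustrative only.
reading = {
    "phoneme_deletion_7y": -1.2,
    "single_word_reading_7y": -1.0,
    "single_word_reading_9y": -0.4,
    "nonword_reading_9y": -1.5,
    "reading_comprehension_9y": 0.1,
}
language = {
    "phoneme_deletion_7y": -1.2,
    "verbal_comprehension_8y": -0.3,
    "nonword_repetition_8y": -0.8,
}

# RD: at least 3 of the 5 reading measures at or below -1.
is_rd = meets_case_definition(reading, min_low_measures=3)
# LI: at least 2 of the 3 language measures at or below -1.
is_li = meets_case_definition(language, min_low_measures=2)
```

In this illustration the child would count as an RD case (three reading measures at or below -1) but not as an LI case (only one language measure at or below -1).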
TABLE-US-00027 TABLE 2 Associations with comorbid RD and LI cases
in ALSPAC (n = 174)
Marker      Chr  Base Pair  Minor Allele  MAF Aff  MAF Unaff  Gene     Odds Ratio  P-value
rs12636438  3    22038281   G             0.3017   0.1927     ZNF385D  1.811       5.45 × 10^-7
rs1679255   3    22022938   C             0.3006   0.1923     ZNF385D  1.805       6.87 × 10^-7
rs9521789   13   109917621  C             0.5201   0.3879     COL4A2   1.71        7.59 × 10^-7
rs1983931   13   109916103  G             0.5201   0.3896     COL4A2   1.698       1.06 × 10^-6
rs9814232   3    21948179   A             0.2931   0.1886     ZNF385D  1.784       1.30 × 10^-6
rs7995158   13   109909718  A             0.5201   0.3911              1.687       1.44 × 10^-6
rs6573225   14   58354640   C             0.1965   0.1122              1.935       1.56 × 10^-6
rs4082518   10   17103032   T             0.3103   0.2049     CUBN     1.746       2.17 × 10^-6
rs442555    14   58365937   C             0.1983   0.1149              1.905       2.38 × 10^-6
rs259521    3    21942154   T             0.2902   0.1885     ZNF385D  1.761       2.42 × 10^-6
Chr, Chromosome; MAF Aff, Minor allele frequency in affected
subjects; MAF Unaff, Minor allele frequency in unaffected subjects
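The odds ratios in Table 2 can be approximately recovered from the two minor allele frequency columns. The following is a minimal Python sketch, assuming an allelic odds ratio (the odds of the minor allele in affected subjects divided by its odds in unaffected subjects). The function name is hypothetical, and the published values may have been computed from unrounded allele counts or a fitted model, so only approximate agreement should be expected.

```python
def allelic_odds_ratio(maf_affected: float, maf_unaffected: float) -> float:
    """Odds of carrying the minor allele in affected vs. unaffected subjects."""
    odds_affected = maf_affected / (1.0 - maf_affected)
    odds_unaffected = maf_unaffected / (1.0 - maf_unaffected)
    return odds_affected / odds_unaffected


# rs12636438 (ZNF385D) from Table 2: MAF Aff = 0.3017, MAF Unaff = 0.1927.
or_rs12636438 = allelic_odds_ratio(0.3017, 0.1927)

# rs9521789 (COL4A2) from Table 2: MAF Aff = 0.5201, MAF Unaff = 0.3879.
or_rs9521789 = allelic_odds_ratio(0.5201, 0.3879)
```

Both results land close to the tabulated odds ratios of 1.811 and 1.71; the small differences are consistent with rounding of the allele frequencies.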
TABLE-US-00028 TABLE 3 Associations with LI cases in ALSPAC,
excluding comorbid RD cases (n = 163)
Marker     Chr  Base Pair  Minor Allele  MAF Aff  MAF Unaff  Gene     Odds Ratio  P-value
rs482700   4    116286939  G             0.3896   0.2588     NDST4    1.827       1.40 × 10^-7
rs7695228  4    116309516  T             0.3920   0.2636     NDST4    1.801       2.94 × 10^-7
rs1940309  4    116306410  T             0.3865   0.2606     NDST4    1.788       4.14 × 10^-7
rs505277   4    116248257  T             0.3773   0.2528     NDST4    1.791       4.35 × 10^-7
rs476739   4    116248997  A             0.3773   0.2529     NDST4    1.79        4.41 × 10^-7
rs867036   4    116381578  C             0.3957   0.2696     NDST4    1.774       5.31 × 10^-7
rs867035   4    116381423  C             0.3957   0.2697     NDST4    1.773       5.45 × 10^-7
rs2071674  4    2366882    T             0.0920   0.0389     ZFYVE28  2.503       1.90 × 10^-6
rs7694946  4    116413588  C             0.3620   0.2526     NDST4    1.678       8.95 × 10^-6
rs4823324  22   44616787   C             0.2914   0.4143     ATXN10   0.581       9.30 × 10^-6
Chr, Chromosome; MAF Aff, Minor allele frequency in affected
subjects; MAF Unaff, Minor allele frequency in unaffected subjects
TABLE-US-00029 TABLE 4 Associations with RD cases in ALSPAC,
excluding comorbid LI cases (n = 353)
Marker      Chr  Base Pair  Minor Allele  MAF Aff  MAF Unaff  Gene      Odds Ratio  P-value
rs180950    10   115697957  G             0.456    0.369                1.431       5.16 × 10^-6
rs2590673   8    126037337  G             0.133    0.083                1.697       5.85 × 10^-6
rs892100    19   50772522   C             0.228    0.162      OPA3      1.526       6.92 × 10^-6
rs1792745   18   51955991   T             0.187    0.129                1.558       1.22 × 10^-5
rs12546767  8    126151747  C             0.152    0.099      KIAA0196  1.618       1.32 × 10^-5
rs12634033  3    146524529  C             0.135    0.087                1.646       1.80 × 10^-5
rs892270    12   105002956  G             0.534    0.451      NUAK1     1.395       2.16 × 10^-5
rs10887149  10   124156994  A             0.278    0.357      PLEKHA1   0.069       2.25 × 10^-5
rs10041417  5    33218502   T             0.226    0.164                1.489       2.58 × 10^-5
rs6792971   3    68468217   C             0.111    0.068      FAM19A1   1.703       2.59 × 10^-5
Chr, Chromosome; MAF Aff, Minor allele frequency in affected
subjects; MAF Unaff, Minor allele frequency in unaffected subjects
TABLE-US-00030 TABLE 5 Replication of associations in PING (n = 440)
                                          Oral Reading Test   Picture Vocabulary Test
Marker      Minor Allele  MAF     Gene     Beta     P-value    Beta      P-value
rs12636438  G             0.161   ZNF385D  -0.1867  0.9452     -2.88     0.004173*
rs1679255   G             0.292   ZNF385D  -1.84    0.5016     -3.048    0.002445**
rs9521789   G             0.4370  COL4A2   -0.3411  0.7332     0.8647    0.3877
rs476739    A             0.265   NDST4    0.5406   0.5891     0.5159    0.6062
rs505277    A             0.280   NDST4    0.5406   0.5891     -0.3452   0.7301
rs482700    G             0.278   NDST4    0.5498   0.5828     -0.05341  0.9574
rs7695228   A             0.295   NDST4    0.6258   0.5318     0.09991   0.9205
rs867036    G             0.378   NDST4    0.2605   0.7946     -0.1414   0.8876
rs867035    G             0.377   NDST4    0.2961   0.7673     -0.1565   0.8757
rs1940309   A             0.281   NDST4    0.6049   0.5456     0.1296    0.8969
*P-value less than FDR-adjusted statistical threshold (FDR-adjusted
threshold = 0.05 × (2/19) = 0.00526)
**P-value less than FDR-adjusted statistical threshold (FDR-adjusted
threshold = 0.05 × (1/20) = 0.00250)
MAF, Minor allele frequency in full PING sample
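The two FDR-adjusted thresholds in the footnotes to Table 5 follow the form alpha × (rank / number of tests). The following is a minimal Python sketch of that arithmetic only; the function name is hypothetical, and the rank and test-count values are taken directly from the footnotes without assuming how they were chosen.

```python
def fdr_threshold(alpha: float, rank: int, n_tests: int) -> float:
    """Threshold of the form alpha * rank / n_tests, as in the footnotes."""
    return alpha * rank / n_tests


# Footnote *: 0.05 x (2/19), applied to rs12636438 (P = 0.004173).
threshold_single = fdr_threshold(0.05, rank=2, n_tests=19)
rs12636438_passes = 0.004173 < threshold_single

# Footnote **: 0.05 x (1/20), applied to rs1679255 (P = 0.002445).
threshold_double = fdr_threshold(0.05, rank=1, n_tests=20)
rs1679255_passes = 0.002445 < threshold_double
```

Under these thresholds, both Picture Vocabulary Test P-values for the ZNF385D markers fall below their respective cutoffs, which agrees with the asterisks in the table.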
TABLE-US-00031 TABLE 6 ZNF385D Associations with DTI Fiber Tract
Volumes in subjects with 100% European Genetic Ancestry (n = 332)
              rs1679255            rs12636438
Fiber Tract   Slope     P-value    Slope     P-value
All           -3329.9   0.044*     -3717.9   0.023*
Right All     -1731.4   0.039*     -1965     0.017*
Left All      -1616.3   0.055      -1775.6   0.033*
Right ILF     -251.3    0.011*     -234.4    0.016*
Left ILF      -256.9    0.0088**   -254.6    0.009**
Right IFO     -200.8    0.032*     -190      0.041*
Left IFO      -221      0.012*     -226.3    0.009**
Right SLF     -168.1    0.06       -206      0.02*
Left SLF      -199.5    0.022*     -212.9    0.013*
Right tSLF    -170.8    0.011*     -180.7    0.0068**
Left tSLF     -163.1    0.023*     -169.9    0.016*
Right pSLF    -153.1    0.079      -182.4    0.034*
Left pSLF     -112.2    0.18       -125.3    0.131
Right SIFC    -148.8    0.052      -165.6    0.029*
Left SIFC     -34.54    0.66       -54.3     0.48
CC            -977.1    0.15       -1181.6   0.081
*p ≤ 0.05 **p ≤ 0.01
Abbreviations: All (All Fiber Tracts), ILF (Inferior Longitudinal
Fasciculus), IFO (Inferior Fronto-occipital Fasciculus), SLF
(Superior Longitudinal Fasciculus), tSLF (Temporal Superior
Longitudinal Fasciculus), pSLF (Parietal Superior Longitudinal
Fasciculus), SIFC (Striatal Inferior Frontal Cortex), CC (Corpus
Callosum)
[0460] Supplement Tables
TABLE-US-00032 SUPPLEMENTAL TABLE 1 Associations of markers within
genes previously implicated in RD and/or LI with (a) Comorbid RD
and LI, (b) LI individually, and (c) RD individually.
Marker      Gene       Chr.  Base Pair  P-value
a) Comorbid RD and LI
rs16889556  KIAA0319   6     24749584   0.0005177
rs1047782   TDP2       6     24758710   0.006515
rs1530680   FOXP2      7     114194632  0.0001702
rs12667130  FOXP2      7     114213035  0.003033
rs6965855   CNTNAP2    7     145348483  0.006804
rs985080    CNTNAP2    7     145359118  0.006157
rs4726782   CNTNAP2    7     145425012  0.005341
rs1718101   CNTNAP2    7     145753721  0.0008707
rs10487689  CNTNAP2    7     146835482  0.008787
rs1918296   CNTNAP2    7     147655135  0.00616
rs737533    BC0307918  10    3353137    0.001008
b) LI
rs793845    DCDC2      6     24296970   0.005511
rs2799373   DCDC2      6     24303738   0.0009664
rs793862    DCDC2      6     24315179   0.002443
rs793834    DCDC2      6     24342912   0.0002679
rs2792682   DCDC2      6     24380363   0.006634
rs807704    DCDC2      6     24408825   0.001988
rs707864    DCDC2      6     24413827   0.001266
rs12193738  KIAA0319   6     24676372   0.00974
rs2817198   KIAA0319         24683073   0.00559
rs10456309  KIAA0319   6     24697541   0.002258
rs985080    CNTNAP2    7     145359118  0.006735
rs1554690   CNTNAP2    7     145377266  0.006486
rs2533096   CNTNAP2    7     146037312  0.004782
rs6951437   CNTNAP2    7     146037340  0.0000462
rs344470    CNTNAP2    7     146044430  0.001697
rs344468    CNTNAP2    7     146050259  0.003965
c) RD
rs4725745   CNTNAP2    7     147032172  0.002407
rs12444778  CMIP       16    80330728   0.003148
rs1444186   CMIP       16    80330745   0.00482
TABLE-US-00033 SUPPLEMENTAL TABLE 2 Gene-based analyses of comorbid
RD and LI, LI individually, and RD individually. The top ten
gene-based associations for each are shown.
Gene       Ch  Start Position  Stop Position  No. SNPs in Gene  p-value
RD and LI
OR5H2      3   99484421        99485366       16                0.000072
OR5H6      3   99465818        99466796       19                0.000127
RRAGA      9   19039371        19041021       30                0.000276
OR6B3      2   240633166       240634162      36                0.000294
UMOD       16  20251873        20271538       29                0.000307
A26C1A     2   131692393       131738886      1                 0.000389
FAM29A     9   19043140        19092902       44                0.000406
CHRNA1     2   175320568       175337446      23                0.000420
IFIT5      10  91164418        91170733       27                0.000475
LOC643905  2   240629902       240631072      39                0.000562
LI
NEK2       1   209902744       209915590      28                0.000117
DLEC1      3   38055699        38139232       20                0.000171
NARS       18  53418891        53440175       36                0.000203
IL4I1      19  55084722        55124574       22                0.000305
PKD2       4   89147843        89217953       34                0.000313
ATF5       19  55123785        55129004       18                0.000344
NUP62      19  55101893        55124598       19                0.000402
SIGLEC11   19  55144061        551556241      49                0.000578
ACAN       15  87147677        87219589       43                0.000633
PGD        1   10381671        10402788       12                0.000668
RD
MAP4       3   47867188        48105715       18                0.000085
OR2L8      1   246178782       246179721      19                0.000139
CRYBA4     22  25347927        25356636       40                0.000219
OR2T8      1   246150942       246151881      24                0.000225
KIAA1622   14  93710401        93815825       42                0.000255
OR2AK2     1   246195256       246196264      15                0.000315
DHX30      3   47819654        47866687       11                0.000316
GEMIN6     2   38858830        38862610       8                 0.000351
C20orf10   20  43435933        43440371       23                0.000450
PPIF       10  80777225        38862610       22                0.000493
Sequence CWU 1
1
68138DNAArtificial SequenceSynthetic oligonucleotide 1tgtaaaacga
cggccagttg ttgaatccca gaccacaa 38220DNAArtificial SequenceSynthetic
oligonucleotide 2atcccgatga aatgaaaagg 20318DNAArtificial
SequenceSynthetic oligonucleotide 3tgtaaaacga cggccagt
18420DNAArtificial SequenceSynthetic oligonucleotide 4agcctgccta
ccacagagaa 20521DNAArtificial SequenceSynthetic oligonucleotide
5ggaacaacct cacagaaatg g 21621DNAArtificial SequenceSynthetic
oligonucleotide 6tgaaaccccg tctctactga a 21747DNAArtificial
SequenceSynthetic oligonucleotide 7ttgagaggaa ggaaaggaag gatccctgag
aggaaggaaa ggaagga 47847DNAArtificial SequenceSynthetic
oligonucleotide 8aatccttcct ttccttcctc tcagggatcc ttcctttcct
tcctctc 47947DNAArtificial SequenceSynthetic oligonucleotide
9ttgagagaga gagagagaga gatccctgag agagagagag agagaga
471047DNAArtificial SequenceSynthetic oligonucleotide 10aatctctctc
tctctctctc tcagggatct ctctctctct ctctctc 471120DNAArtificial
SequenceSynthetic oligonucleotide 11tcatgcaaag ttccaaaacc
201219DNAArtificial SequenceSynthetic oligonucleotide 12gatttcctcc
ctcccttcc 191321DNAArtificial SequenceSynthetic oligonucleotide
13gccctaggca ccagggtgtg a 211420DNAArtificial SequenceSynthetic
oligonucleotide 14acagggtgct cctcaggggc 201526DNAArtificial
SequenceSynthetic oligonucleotide 15gagaggaagg aaagagagga aggaaa
261613DNAArtificial SequenceSynthetic oligonucleotide 16gagaggaagg
aaa 131726DNAArtificial SequenceSynthetic oligonucleotide
17gagaggaagg aaagagagga agaaaa 261825DNAArtificial
SequenceSynthetic oligonucleotide 18gagaggaagg aaagagagga aggaa
251928DNAArtificial SequenceSynthetic oligonucleotide 19ggaaggaagg
aaggaaggaa ggaaggaa 282036DNAArtificial SequenceSynthetic
oligonucleotide 20ggaaggaagg aaggaaggaa ggaaggaagg aaggaa
362124DNAArtificial SequenceSynthetic oligonucleotide 21ggaaggaagg
aaggaaggaa ggaa 242232DNAArtificial SequenceSynthetic
oligonucleotide 22ggaaggaagg aaggaaggaa ggaaggaagg aa
322316DNAArtificial SequenceSynthetic oligonucleotide 23ggaaggaagg
aaggaa 162420DNAArtificial SequenceSynthetic oligonucleotide
24ggaaggaagg aaggaaggaa 202540DNAArtificial SequenceSynthetic
oligonucleotide 25ggaaggaagg aaggaaggaa ggaaggaagg aaggaaggaa
402644DNAArtificial SequenceSynthetic oligonucleotide 26ggaaggaagg
aaggaaggaa ggaaggaagg aaggaaggaa ggaa 442728DNAArtificial
SequenceSynthetic oligonucleotide 27ggaaggaagg aaggaaggaa gggaggaa
282812DNAArtificial SequenceSynthetic oligonucleotide 28ggaaagaatg
aa 122912DNAArtificial SequenceSynthetic oligonucleotide
29ggaaggaagg aa 123016DNAArtificial SequenceSynthetic
oligonucleotide 30ggaagaaagg aaggaa 1631102DNAArtificial
SequenceSynthetic oligonucleotide 31gagaggaagg aaagagagga
aggaaaggaa ggaaggaagg aaggaaggaa ggaagaaagg 60aaggaaggaa agaatgaagg
aaggaaggaa ggaagggagg ga 1023285DNAArtificial SequenceSynthetic
oligonucleotide 32gagaggaagg aaaggaagga aggaaggaag gaaggaagga
aggaaggaag gaaagaatga 60aggaaggaag gaaggaaggg aggga
853385DNAArtificial SequenceSynthetic oligonucleotide 33gagaggaagg
aaaggaagga aggaaggaag gaaggaagaa aggaaggaag gaaagaatga 60aggaaggaag
gaaggaaggg aggga 853498DNAArtificial SequenceSynthetic
oligonucleotide 34gagaggaagg aaagagagga aggaaaggaa ggaaggaagg
aaggaaggaa gaaaggaagg 60aaggaaagaa tgaaggaagg aaggaaggaa gggaggga
9835106DNAArtificial SequenceSynthetic oligonucleotide 35gagaggaagg
aaagagagga aggaaaggaa ggaaggaagg aaggaaggaa ggaaggaaga 60aaggaaggaa
ggaaagaatg aaggaaggaa ggaaggaagg gaggga 10636102DNAArtificial
SequenceSynthetic oligonucleotide 36gagaggaagg aaagagagga
aggaaaggaa ggaaggaagg aaggaaggaa ggaaggaaga 60aaggaaggaa ggaaagaatg
aaggaaggaa ggaagggagg ga 10237102DNAArtificial SequenceSynthetic
oligonucleotide 37gagaggaagg aaagagagga aggaaaggaa ggaaggaagg
aaggaaggaa ggaaggaaga 60aaggaaggaa agaatgaagg aaggaaggaa ggaagggagg
ga 1023890DNAArtificial SequenceSynthetic oligonucleotide
38gagaggaagg aaagagagga aggaaaggaa ggaaggaagg aaggaaggaa ggaaggaaag
60aatgaaggaa ggaaggaagg aagggaggga 903989DNAArtificial
SequenceSynthetic oligonucleotide 39gagaggaagg aaaggaagga
aggaaggaag gaaggaagga agaaaggaag gaaggaaaga 60atgaaggaag gaaggaagga
agggaggga 894090DNAArtificial SequenceSynthetic oligonucleotide
40gagaggaagg aaagagagga aggaaaggaa ggaaggaagg aagaaaggaa ggaaggaaag
60aatgaaggaa ggaaggaagg aagggaggga 904198DNAArtificial
SequenceSynthetic oligonucleotide 41gagaggaagg aaagagagga
aggaaaggaa ggaaggaagg aaggaaggaa ggaagaaagg 60aaggaaggaa agaatgaagg
aaggaaggaa gggaggga 984289DNAArtificial SequenceSynthetic
oligonucleotide 42gagaggaagg aaaggaagga aggaaggaag gaaggaagga
aggaagaaag gaaggaagga 60aagaatgaag gaaggaagga agggaggga
8943106DNAArtificial SequenceSynthetic oligonucleotide 43gagaggaagg
aaagagagga aggaaaggaa ggaaggaagg aaggaaggaa ggaaggaagg 60aagaaaggaa
ggaaggaaag aatgaaggaa ggaaggaagg gaggga 10644110DNAArtificial
SequenceSynthetic oligonucleotide 44gagaggaagg aaagagagga
aggaaaggaa ggaaggaagg aaggaaggaa ggaaggaagg 60aagaaaggaa ggaaggaaag
aatgaaggaa ggaaggaagg aagggaggga 1104598DNAArtificial
SequenceSynthetic oligonucleotide 45gagaggaagg aaagagagga
aggaaaggaa ggaaggaagg aaggaagaaa gaaaggaagg 60aaggaaagaa tgaaggaagg
aaggaaggaa gggaggga 984694DNAArtificial SequenceSynthetic
oligonucleotide 46gagaggaagg aaagagagga aggaaaggaa ggaaggaagg
aaggaagaaa ggaaggaagg 60aaagaatgaa ggaaggaagg aaggaaggga ggga
944794DNAArtificial SequenceSynthetic oligonucleotide 47gagaggaagg
aaagagagga aggaaaggaa ggaaggaagg aagaaagaaa ggaaggaagg 60aaagaatgaa
ggaaggaagg aaggaaggga ggga 9448106DNAArtificial SequenceSynthetic
oligonucleotide 48gagaggaagg aaagagagga aggaaaggaa ggaaggaagg
aaggaaggaa ggaagaaaga 60aaggaaggaa ggaaagaatg aaggaaggaa ggaaggaagg
gaggga 1064998DNAArtificial SequenceSynthetic oligonucleotide
49gagaggaagg aaagagagga aggaaaggaa ggaaggaagg aaggaaggaa ggaaggaagg
60aaggaaagaa tgaaggaagg aaggaaggaa gggaggga 9850114DNAArtificial
SequenceSynthetic oligonucleotide 50gagaggaagg aaagagagga
aggaaaggaa ggaaggaagg aaggaaggaa ggaaggaagg 60aaggaagaaa ggaaggaagg
aaagaatgaa ggaaggaagg aaggaaggga ggga 1145194DNAArtificial
SequenceSynthetic oligonucleotide 51gagaggaagg aaagagagga
aggaaaggaa ggaaggaagg aaggaaggaa gaaaggaagg 60aaggaaagaa tgaaggaagg
aaggaaggga ggga 9452102DNAArtificial SequenceSynthetic
oligonucleotide 52gagaggaagg aaagagagga aggaaaggaa ggaaggaagg
aaggaaggaa ggaaggaagg 60aaggaaggaa agaatgaagg aaggaaggaa ggaagggagg
ga 10253106DNAArtificial SequenceSynthetic oligonucleotide
53gagaggaagg aaagagagga aggaaaggaa ggaaggaagg aaggaaggaa ggaaggaagg
60aaggaaggaa ggaaagaatg aaggaaggaa ggaaggaagg gaggga
10654102DNAArtificial SequenceSynthetic oligonucleotide
54gagaggaagg aaagagagga aggaaaggaa ggaaggaagg aaggaaggaa gaaagaaagg
60aaggaaggaa agaatgaagg aaggaaggaa ggaagggagg ga
1025593DNAArtificial SequenceSynthetic oligonucleotide 55gagaggaagg
aaaggaagga aggaaggaag gaaggaagga aggaagaaag gaaggaagga 60aagaatgaag
gaaggaagga aggaagggag gga 935690DNAArtificial SequenceSynthetic
oligonucleotide 56gagaggaagg aaagagagga aggaaaggaa ggaaggaagg
aaggaagaaa ggaaggaagg 60aaagaatgaa ggaaggaagg aagggaggga
905781DNAArtificial SequenceSynthetic oligonucleotide 57gagaggaagg
aaaggaagga aggaaggaag gaagaaagga aggaaggaaa gaatgaagga 60aggaaggaag
gaagggaggg a 8158106DNAArtificial SequenceSynthetic oligonucleotide
58gagaggaagg aaagagagga aggaaaggaa ggaaggaagg aaggaaggaa ggaagaaagg
60aaggaaggaa agaatgaagg aaggaaggaa ggaaggaagg gaggga
10659102DNAArtificial SequenceSynthetic oligonucleotide
59gagaggaagg aaagagagga aggaaaggaa ggaaggaagg aaggaaggga ggaagaaagg
60aaggaaggaa agaatgaagg aaggaaggaa ggaagggagg ga
10260102DNAArtificial SequenceSynthetic oligonucleotide
60gagaggaagg aaagagagga aggaaaggaa ggaaggaagg aaggaagaaa ggaaggaagg
60aaggaaggaa agaatgaagg aaggaaggaa ggaagggagg ga
10261102DNAArtificial SequenceSynthetic oligonucleotide
61gagaggaagg aaagagagga aggaaaggaa ggaaggaagg aaggaaggaa ggaagaaagg
60aagggaggaa agaatgaagg aaggaaggaa ggaagggagg ga
1026294DNAArtificial SequenceSynthetic oligonucleotide 62gagaggaagg
aaagagagga aggaaaggaa ggaaggaagg aaggaaggaa ggaaggaagg 60aaagaatgaa
ggaaggaagg aaggaaggga ggga 946382DNAArtificial SequenceSynthetic
oligonucleotide 63gagaggaagg aaagagagga aggaaaggaa ggaaggaagg
aaggaaggaa ggaaagaatg 60aaggaaggaa ggaagggagg ga
8264102DNAArtificial SequenceSynthetic oligonucleotide 64gagaggaagg
aaagagagga aggaaaggaa ggaaggaagg aaggaaggaa ggaagaaaga 60aaggaaggaa
ggaaagaatg aaggaaggaa ggaagggagg ga 10265102DNAArtificial
SequenceSynthetic oligonucleotide 65gagaggaagg aaagagagga
agaaaaggaa ggaaggaagg aaggaaggaa ggaagaaagg 60aaggaaggaa agaatgaagg
aaggaaggaa ggaagggagg ga 10266109DNAArtificial SequenceSynthetic
oligonucleotide 66gagaggaagg aaagagagga aggaaggaag gaaggaagga
aggaaggaag gaaggaagga 60agaaaggaag gaaggaaaga atgaaggaag gaaggaagga
agggaggga 1096798DNAArtificial SequenceSynthetic oligonucleotide
67gagaggaagg aaagagagga aggaaaggaa ggaaggaagg aaggaaggaa gaaaggaagg
60aaggaaagaa tgaaggaaga aaggaaggaa gggaggga 986889DNAArtificial
SequenceSynthetic oligonucleotide 68gagaggaagg aaaggaagga
aggaaggaag gaaggaagga aggaaggaag gaaggaaaga 60atgaaggaag gaaggaagga
agggaggga 89
* * * * *
References