U.S. patent application number 14/089398 was filed with the patent office on November 25, 2013 and published on 2014-06-26 for methods using DNA methylation for identifying a cell or a mixture of cells for prognosis and diagnosis of diseases, and for cell remediation therapies.
This patent application is currently assigned to The Regents of the University of California. The applicant listed for this patent is Brown University, The Regents of the University of California. Invention is credited to William P. Accomando, JR., Eugene Andres Houseman, Karl Kelsey, Carmen Marsit, John Wiencke.
Application Number: 14/089398 (Publication No. 20140178348)
Family ID: 50974897
Publication Date: 2014-06-26
United States Patent Application 20140178348
Kind Code: A1
Kelsey; Karl; et al.
June 26, 2014
Methods using DNA methylation for identifying a cell or a mixture
of cells for prognosis and diagnosis of diseases, and for cell
remediation therapies
Abstract
Methods using DNA Methylation arrays are provided for
identifying a cell or mixture of cells and for quantification of
alterations in distribution of cells in blood or in tissues, and
for diagnosing, prognosing and treating disease conditions,
particularly cancer. The methods use fresh and archival
samples.
Inventors:
Kelsey; Karl; (Brookline, MA)
Houseman; Eugene Andres; (Albany, OR)
Wiencke; John; (San Francisco, CA)
Accomando, JR.; William P.; (Providence, RI)
Marsit; Carmen; (Enfield, NH)
Applicant:
Name | City | State | Country
The Regents of the University of California | Oakland | CA | US
Brown University | Providence | RI | US
Assignee:
The Regents of the University of California, Oakland, CA
Brown University, Providence, RI
Family ID: 50974897
Appl. No.: 14/089398
Filed: November 25, 2013
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number
PCT/US2012/039699 | May 25, 2012 |
14089398 | |
61489883 | May 25, 2011 |
61509644 | Jul 20, 2011 |
61585892 | Jan 12, 2012 |
61619663 | Apr 3, 2012 |
61865479 | Aug 13, 2013 |
Current U.S. Class: 424/93.71; 435/6.11; 506/16; 506/9; 702/19
Current CPC Class: C12Q 2600/154 20130101; C12Q 1/6886 20130101; C12Q 2600/112 20130101; C12Q 2600/16 20130101; C12Q 2600/118 20130101
Class at Publication: 424/93.71; 435/6.11; 506/9; 506/16; 702/19
International Class: C12Q 1/68 20060101 C12Q001/68; G06F 19/24 20060101 G06F019/24
Government Interests
GOVERNMENT SUPPORT
[0002] This invention was made with government support under grants
R01CA126831, R01CA52689, R01CA126939, R01CA121147, R01CA100679,
R01CA078609, R01ES06717, R01MH094609 and P50-CA97257 awarded by the
National Institutes of Health. The government has certain rights in
the invention.
Claims
1. A method for assessing a disease condition in a subject,
comprising: measuring a CD3Z positive T lymphocyte cell number in a
sample from the subject by analyzing methylation in the sample of
at least one CpG dinucleotide (CpG) in gene CD3Z or in an
orthologous or a paralogous gene thereof, wherein an amount of a
demethylated C of the at least one CpG in the sample is a measure
of CD3+ T lymphocyte cell number; and comparing the amount of the
demethylated C in the sample from the subject with that in positive
control samples from patients with the disease condition, and with
that in negative control samples from healthy subjects, wherein the
disease condition is selected from: an autoimmune disease, an
allergy, a transplant rejection, obesity, an inherited disease,
immunosuppression and a cancer.
2. The method according to claim 1, wherein assessing a disease
condition comprises at least one of: monitoring, diagnosing,
prognosing, and measuring response to therapy by comparing the
measured CD3+ T lymphocyte cell numbers in the subject after
therapy to that in the patients with the disease condition and in
the healthy subjects.
3-13. (canceled)
14. A kit for measuring CD3+ T lymphocyte and FOXP3+ T regulatory
cell numbers, by analyzing methylation of CpG positions in CD3Z and
FOXP3 genes, the kit comprising sequencing and PCR primers specific
for the CD3Z and the FOXP3 gene DMRs and instructions for analyzing
and comparing methylation of the CpG positions of a subject in need
of diagnosis of a disease with that of control subjects.
15. A method for assessing a disease condition by estimating an
alteration in proportions of types of leukocytes in a sample from a
subject, the method comprising: measuring a DNA methylation profile
for each type of leukocyte and for unfractionated cells, wherein
DNA methylation profiles are obtained for a plurality of CpG loci,
and obtaining the status of an individual CpG locus by amplifying
DNA from each of the types of leukocyte and from the unfractionated
cells, wherein amplifying comprises hybridizing methylation
sensitive locus-specific DNA oligomers corresponding to each CpG
locus; ordering CpG loci by ability to distinguish types of
leukocytes, wherein the ordering of the CpG loci determines
differentially methylated DNA regions (DMRs), wherein obtaining
DMRs comprises statistically minimizing introduction of bias in
amount of total methylation status of a large number of CpG loci
obtained from the unfractionated cells by employing a Bayesian
treatment utilizing prior probabilities of the methylation status
at each individual locus, thereby identifying a plurality of CpG
loci to include in the measurement, wherein an amount of CpG loci
distinguishes DMR signatures among the types of leukocytes and
minimizes bias; obtaining DNA methylation profiles comprising DMRs
from the types of leukocytes, wherein the DNA methylation profiles
comprise validating measures of relative amounts of the types of
leukocytes, and obtaining DNA methylation profiles of the
unfractionated cells as surrogate measures of relative amounts of
each type of leukocyte in the unfractionated cells; employing an
analog of a measurement error model wherein a DNA methylation
surrogate y is reverse formulated with respect to the disease
outcome z, as $y = f(z)$, wherein y denotes a multivariate random
variable representing a methylation profile, z denotes a disease
outcome or state, and f denotes a probability distribution; y, z,
and the leukocyte distribution $\omega$ are related by the estimator
equations $E(y \mid \omega) = g(\omega)$ and, under the assumption
$E(z \mid \omega, y) = E(z \mid \omega)$, wherein E denotes an expectation of a
random variable and $\omega$ denotes a subject-specific distribution
of leukocytes; and, comparing relative amounts of each type of
leukocyte in the sample from the subject with those in a control
sample, thereby providing an assessment of the disease
condition.
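Claim 15 relates the methylation profile y to the leukocyte distribution ω through E(y | ω) = g(ω). As a minimal sketch only, assume g is linear, g(ω) = Bω, where B is a hypothetical reference matrix whose entry B[i][k] is the mean methylation of CpG locus i in purified leukocyte type k. The proportions ω can then be estimated from the profile y of unfractionated cells by least squares constrained to ω >= 0. The solver below (projected gradient descent) is a swapped-in illustration and not necessarily the estimator used by the inventors.

```python
from typing import List, Sequence


def estimate_proportions(
    B: Sequence[Sequence[float]],
    y: Sequence[float],
    n_iter: int = 5000,
) -> List[float]:
    """Estimate leukocyte proportions w >= 0 minimising 0.5 * ||B w - y||^2.

    B: hypothetical reference matrix, one row per CpG locus and one column
       per leukocyte type (mean methylation of that locus in that type).
    y: methylation profile of the unfractionated sample, one value per locus.
    """
    n_loci, n_types = len(B), len(B[0])
    # ||B||_F^2 bounds the largest eigenvalue of B^T B, so 1 / ||B||_F^2
    # is a step size for which projected gradient descent converges.
    step = 1.0 / sum(b * b for row in B for b in row)
    w = [1.0 / n_types] * n_types  # start from equal proportions
    for _ in range(n_iter):
        # Residual r = B w - y for every locus.
        r = [sum(B[i][k] * w[k] for k in range(n_types)) - y[i]
             for i in range(n_loci)]
        # Gradient of the squared-error objective is B^T r.
        grad = [sum(B[i][k] * r[i] for i in range(n_loci))
                for k in range(n_types)]
        # Gradient step, then projection onto the constraint w >= 0.
        w = [max(0.0, w[k] - step * grad[k]) for k in range(n_types)]
    return w


# Made-up reference for two leukocyte types measured at three loci.
B = [[0.9, 0.1],
     [0.1, 0.9],
     [0.8, 0.2]]
y = [0.66, 0.34, 0.62]  # equals B times [0.7, 0.3]
print(estimate_proportions(B, y))  # approximately [0.7, 0.3]
```

The non-negativity projection keeps every estimated proportion physically meaningful; a sum-to-one constraint could be added in the same projection step if all cell types are represented in B.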
16. The method according to claim 15, wherein the locus-specific
DNA oligomers are linked to an array selected from the group of: a
glass slide array; a quartz slide array; a fiber optic bundle
array, a planar slide array, a micro-well array; a multi-well dish
array; a digital PCR array; and a bead array having beads located
at known addressable locations on the array.
17-26. (canceled)
27. A method of predicting a methylation class membership in a
bodily fluid sample of a subject for assessing disease status of
the subject, wherein the methylation class membership corresponds
to an epigenetic signature of a plurality of leukocyte types, the
method comprising: measuring amounts of DNA methylation in each of
a plurality of leukocyte type populations to determine
differentially methylated regions (DMRs); ranking leukocyte DMRs
for each leukocyte type according to statistical strength of
association of the DMR with each leukocyte type; randomly dividing
a data set of control subjects and subjects with a disease into
groups having substantially the same numbers of control subjects
and subjects with the disease to obtain a training set and a
testing set; clustering samples in the training set using a defined
number of highest ranked leukocyte DMRs to determine clustering
solutions, wherein a clustering solution corresponds to the
methylation class membership; and predicting the methylation class
membership for subjects within the testing set by applying the
clustering solutions obtained from the training set to the highest
ranked leukocyte DMRs in the testing set, wherein clinical utility
of the predicted methylation class membership is determined by
testing association of the predicted methylation class membership
with the disease status of the subject.
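The train/test procedure of claim 27 can be sketched as follows, under stated assumptions: DMR ranking scores are taken as given, k-means with deterministic farthest-first initialisation is swapped in as the clustering method (the claim does not name one), and each testing-set sample is assigned to the nearest training-cluster centroid. All function names and the toy data are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def top_dmrs(scores: Sequence[float], n: int) -> List[int]:
    """Indices of the n DMRs with the strongest association scores."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:n]


def stratified_split(status: Sequence[int], seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly halve each disease-status group into training and testing sets."""
    rng = random.Random(seed)
    train: List[int] = []
    test: List[int] = []
    for group in sorted(set(status)):
        idx = [i for i, s in enumerate(status) if s == group]
        rng.shuffle(idx)
        half = len(idx) // 2
        train += idx[:half]
        test += idx[half:]
    return train, test


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids: Sequence[Vector], p: Sequence[float]) -> int:
    """Index of the centroid closest to p, i.e. the predicted class."""
    return min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))


def kmeans(points: Sequence[Vector], k: int, n_iter: int = 50) -> List[Vector]:
    """Plain k-means with deterministic farthest-first initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(n_iter):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(centroids, p)].append(list(p))
        for j, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


# Toy data: rows are samples, columns are DMR methylation values.
status = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = control, 1 = disease
data = [[0.10, 0.5, 0.20], [0.12, 0.4, 0.18], [0.08, 0.6, 0.22], [0.11, 0.5, 0.21],
        [0.90, 0.5, 0.80], [0.88, 0.4, 0.82], [0.92, 0.6, 0.79], [0.91, 0.5, 0.81]]
cols = top_dmrs([5.0, 0.1, 3.0], n=2)  # keep the two highest-ranked DMRs
X = [[row[j] for j in cols] for row in data]
train, test = stratified_split(status, seed=1)
centroids = kmeans([X[i] for i in train], k=2)
predicted: Dict[int, int] = {i: nearest(centroids, X[i]) for i in test}
```

The association between the predicted classes and disease status in the testing set (for example, by a chi-square or Fisher exact test) would then serve as the check of clinical utility described in the claim.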
28. The method according to claim 27, wherein the highest ranked
leukocyte DMRs are shown in Table 21, wherein each DMR is identified
by chromosomal location and gene name, and the defined number of
highest ranked leukocyte DMRs is selected from: at least 10, at
least 20, at least 30, at least 40 and 50.
29-36. (canceled)
37. An array for estimating proportions of leukocyte types in a
sample from a mammal for assessing a disease condition of the
mammal by analyzing differential methylation of CpG dinucleotides
in a plurality of genes of the sample, the array comprising: a
plurality of DNA probes attached to a plurality of surfaces at
known addressable locations on the array, wherein the surface at
each location is attached to a DNA probe having a specific
nucleotide sequence, wherein the DNA probe having the specific
nucleotide sequence hybridizes to a DNA sequence of a methylated
form or an unmethylated form of a CpG dinucleotide in a sequence of
a gene of the plurality of genes in the sample, wherein the array
is selected from having: at least 16 probes, at least 64 probes, at
least 96 probes, and at least 384 probes.
38. The array according to claim 37, wherein the plurality of DNA
probes has nucleotide sequences that hybridize with a respective
plurality of 118 different nucleotide sequences occurring in the
plurality of genes.
39. The array according to claim 38, wherein the plurality of 118
nucleotide sequences comprises at least one gene or locus selected
from the group of: SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID
NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID
NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ
ID NO: 14, SEQ ID NO:15, SEQ ID NO:16, SEQ ID NO:17, SEQ ID NO:18,
SEQ ID NO:19, SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:22, SEQ ID
NO:23, SEQ ID NO:24, SEQ ID NO:25, SEQ ID NO:26, SEQ ID NO:27, SEQ
ID NO:28, SEQ ID NO:29, SEQ ID NO:30, SEQ ID NO:31, SEQ ID NO:32,
SEQ ID NO:33, SEQ ID NO: 34, SEQ ID NO:35, SEQ ID NO:36, SEQ ID
NO:37, SEQ ID NO:38, SEQ ID NO:39, SEQ ID NO:40, SEQ ID NO:41, SEQ
ID NO:42, SEQ ID NO:43, SEQ ID NO:44, SEQ ID NO:45, SEQ ID NO:46,
SEQ ID NO:47, SEQ ID NO:48, SEQ ID NO:49, SEQ ID NO:50, SEQ ID
NO:51, SEQ ID NO:52, SEQ ID NO:53, SEQ ID NO: 54, SEQ ID NO:55, SEQ
ID NO:56, SEQ ID NO:57, SEQ ID NO:58, SEQ ID NO:59, SEQ ID NO:60,
SEQ ID NO:61, SEQ ID NO:62, SEQ ID NO:63, SEQ ID NO:64, SEQ ID
NO:65, SEQ ID NO:66, SEQ ID NO:67, SEQ ID NO:68, SEQ ID NO:69, SEQ
ID NO:70, SEQ ID NO:71, SEQ ID NO:72, SEQ ID NO:73, SEQ ID NO: 74,
SEQ ID NO:75, SEQ ID NO:76, SEQ ID NO:77, SEQ ID NO:78, SEQ ID
NO:79, SEQ ID NO:80, SEQ ID NO:81, SEQ ID NO:82, SEQ ID NO:83, SEQ
ID NO:84, SEQ ID NO:85, SEQ ID NO:86, SEQ ID NO:87, SEQ ID NO:88,
SEQ ID NO:89, SEQ ID NO:90, SEQ ID NO:91, SEQ ID NO:92, SEQ ID
NO:93, SEQ ID NO: 94, SEQ ID NO:95, SEQ ID NO:96, SEQ ID NO:119,
SEQ ID NO:120, SEQ ID NO:121, SEQ ID NO:122, SEQ ID NO:123, SEQ ID
NO:124, SEQ ID NO:125, SEQ ID NO:126, SEQ ID NO:127, SEQ ID NO:128,
SEQ ID NO:129, SEQ ID NO:130, SEQ ID NO:131, SEQ ID NO:132, SEQ ID
NO:133, SEQ ID NO:134, SEQ ID NO:135, SEQ ID NO: 136, SEQ ID
NO:137, SEQ ID NO:138, SEQ ID NO:139, and SEQ ID NO:140.
40-46. (canceled)
47. A method for estimating proportions of types of leukocytes in a
sample from a subject for assessing a disease condition of the
subject by analyzing differential methylation of CpG dinucleotides
in a plurality of genes of the sample, the method comprising:
providing an array having a plurality of DNA probes attached to a
plurality of surfaces at known addressable locations on the array,
wherein the surface at each location is attached to a DNA probe
having a specific nucleotide sequence; reacting genomic DNA in the
sample with a bisulfite reagent to convert unmethylated cytosine
residues to uracil; hybridizing resulting bisulfite treated genomic
DNA with the array to obtain resulting hybridized probes on the
array, wherein the DNA probes hybridize to a DNA sequence of each
of a methylated form and an unmethylated form of a sequence having
a CpG dinucleotide in a gene for each of the plurality of genes;
and detecting the methylation status of each of the CpG
dinucleotides in each sequence, thereby estimating proportions of
types of leukocyte in the sample from the subject for assessing the
disease condition of the subject.
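To illustrate the bisulfite step of claim 47, the sketch below converts a top-strand sequence in silico: every unmethylated cytosine reads as T after conversion and PCR, while a methylated CpG cytosine is protected and still reads as C. It assumes, for simplicity, that methylation occurs only at CpG cytosines and that each CpG in the sequence is either fully methylated or fully unmethylated, which yields the two target forms for the methylated-form and unmethylated-form probes. The beta-value helper summarises the two probe signals as M / (M + U + offset); the offset of 100 follows a common convention for Illumina methylation arrays and is an assumption here, not a value stated in this application. Function names are hypothetical.

```python
def bisulfite_convert(seq: str, cpg_methylated: bool) -> str:
    """Return the post-bisulfite, post-PCR reading of a top-strand sequence.

    Cytosines outside CpG context are assumed unmethylated and become T.
    CpG cytosines stay C if cpg_methylated is True, otherwise become T.
    """
    seq = seq.upper()
    out = []
    for i, base in enumerate(seq):
        in_cpg = base == "C" and i + 1 < len(seq) and seq[i + 1] == "G"
        if base == "C" and not (in_cpg and cpg_methylated):
            out.append("T")  # unmethylated C -> U, read as T after PCR
        else:
            out.append(base)
    return "".join(out)


def beta_value(meth: float, unmeth: float, offset: float = 100.0) -> float:
    """Fraction methylated from methylated (M) and unmethylated (U) signals."""
    return meth / (meth + unmeth + offset)


# Targets for the methylated-form and unmethylated-form probes of one locus.
print(bisulfite_convert("ACGTCCGA", cpg_methylated=True))   # ACGTTCGA
print(bisulfite_convert("ACGTCCGA", cpg_methylated=False))  # ATGTTTGA
print(beta_value(300.0, 600.0))  # 0.3
```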
49. The method according to claim 48, wherein amplifying by PCR
further comprises: using primer pairs having a 5' primer specific
to each of the methylated or the unmethylated form of the CpG
dinucleotide containing gene, and a 3' primer specific to the gene
containing the CpG dinucleotide, thereby obtaining a first PCR
product; amplifying the first PCR product with differentially
labeled 5' primers specific for each of the methylated and the
unmethylated form of the CpG dinucleotide sequence containing gene,
and a common 3' primer, thereby obtaining a differentially labeled
second PCR product, and hybridizing the second PCR product to the
CpG dinucleotide containing gene for measuring amount of the second
PCR product, thereby detecting the methylation status of the CpG
dinucleotide sequence.
50-51. (canceled)
52. The method according to claim 47, wherein the plurality of
probes on the array hybridizes with a respective plurality of 118
different sequences occurring in the plurality of genes.
53. The method according to claim 52, wherein each probe on the
array is complementary to at least one nucleotide sequence selected
from the group of: SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID
NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID
NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ
ID NO: 14, SEQ ID NO:15, SEQ ID NO:16, SEQ ID NO:17, SEQ ID NO:18,
SEQ ID NO:19, SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:22, SEQ ID
NO:23, SEQ ID NO:24, SEQ ID NO:25, SEQ ID NO:26, SEQ ID NO:27, SEQ
ID NO:28, SEQ ID NO:29, SEQ ID NO:30, SEQ ID NO:31, SEQ ID NO:32,
SEQ ID NO:33, SEQ ID NO: 34, SEQ ID NO:35, SEQ ID NO:36, SEQ ID
NO:37, SEQ ID NO:38, SEQ ID NO:39, SEQ ID NO:40, SEQ ID NO:41, SEQ
ID NO:42, SEQ ID NO:43, SEQ ID NO:44, SEQ ID NO:45, SEQ ID NO:46,
SEQ ID NO:47, SEQ ID NO:48, SEQ ID NO:49, SEQ ID NO:50, SEQ ID
NO:51, SEQ ID NO:52, SEQ ID NO:53, SEQ ID NO: 54, SEQ ID NO:55, SEQ
ID NO:56, SEQ ID NO:57, SEQ ID NO:58, SEQ ID NO:59, SEQ ID NO:60,
SEQ ID NO:61, SEQ ID NO:62, SEQ ID NO:63, SEQ ID NO:64, SEQ ID
NO:65, SEQ ID NO:66, SEQ ID NO:67, SEQ ID NO:68, SEQ ID NO:69, SEQ
ID NO:70, SEQ ID NO:71, SEQ ID NO:72, SEQ ID NO:73, SEQ ID NO: 74,
SEQ ID NO:75, SEQ ID NO:76, SEQ ID NO:77, SEQ ID NO:78, SEQ ID
NO:79, SEQ ID NO:80, SEQ ID NO:81, SEQ ID NO:82, SEQ ID NO:83, SEQ
ID NO:84, SEQ ID NO:85, SEQ ID NO:86, SEQ ID NO:87, SEQ ID NO:88,
SEQ ID NO:89, SEQ ID NO:90, SEQ ID NO:91, SEQ ID NO:92, SEQ ID
NO:93, SEQ ID NO: 94, SEQ ID NO:95, SEQ ID NO:96, SEQ ID NO:119,
SEQ ID NO:120, SEQ ID NO:121, SEQ ID NO:122, SEQ ID NO:123, SEQ ID
NO:124, SEQ ID NO:125, SEQ ID NO:126, SEQ ID NO:127, SEQ ID NO:128,
SEQ ID NO:129, SEQ ID NO:130, SEQ ID NO:131, SEQ ID NO:132, SEQ ID
NO:133, SEQ ID NO:134, SEQ ID NO:135, SEQ ID NO: 136, SEQ ID
NO:137, SEQ ID NO:138, SEQ ID NO:139, and SEQ ID NO:140.
54. The method according to claim 47, wherein the disease condition
assessed is selected from: an autoimmune disease, an allergy, a
transplant rejection, obesity, an inherited disease, and a
cancer.
55-58. (canceled)
59. A kit for estimating proportions of leukocyte types in a sample
from a subject by analyzing differential methylation of CpG
dinucleotides in a plurality of genes of the sample, the kit
comprising: an array comprising: a plurality of DNA probes attached
to a plurality of surfaces at known addressable locations on the
array, wherein the surface at each location is attached to a DNA
probe having a specific nucleotide sequence, wherein the DNA probe
having the specific nucleotide sequence hybridizes to a DNA
sequence of a methylated form or an unmethylated form of a CpG
dinucleotide in a sequence of a gene of the plurality of genes in
the sample, wherein the array is selected from having: at least 16
probes, at least 64 probes, at least 96 probes, and at least 384
probes; primers and reagents for detecting the hybridized probes
and for detecting the reaction products derived from the hybridized
probes; and instructions for using the array with a bisulfite
reagent, thereby providing an estimation of proportions of
leukocyte types in the sample.
60. (canceled)
61. The kit according to claim 59, wherein the probes have
nucleotide sequences complementary to at least one selected from
the group of: SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4,
SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9,
SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:
14, SEQ ID NO:15, SEQ ID NO:16, SEQ ID NO:17, SEQ ID NO:18, SEQ ID
NO:19, SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:22, SEQ ID NO:23, SEQ
ID NO:24, SEQ ID NO:25, SEQ ID NO:26, SEQ ID NO:27, SEQ ID NO:28,
SEQ ID NO:29, SEQ ID NO:30, SEQ ID NO:31, SEQ ID NO:32, SEQ ID
NO:33, SEQ ID NO: 34, SEQ ID NO:35, SEQ ID NO:36, SEQ ID NO:37, SEQ
ID NO:38, SEQ ID NO:39, SEQ ID NO:40, SEQ ID NO:41, SEQ ID NO:42,
SEQ ID NO:43, SEQ ID NO:44, SEQ ID NO:45, SEQ ID NO:46, SEQ ID
NO:47, SEQ ID NO:48, SEQ ID NO:49, SEQ ID NO:50, SEQ ID NO:51, SEQ
ID NO:52, SEQ ID NO:53, SEQ ID NO: 54, SEQ ID NO:55, SEQ ID NO:56,
SEQ ID NO:57, SEQ ID NO:58, SEQ ID NO:59, SEQ ID NO:60, SEQ ID
NO:61, SEQ ID NO:62, SEQ ID NO:63, SEQ ID NO:64, SEQ ID NO:65, SEQ
ID NO:66, SEQ ID NO:67, SEQ ID NO:68, SEQ ID NO:69, SEQ ID NO:70,
SEQ ID NO:71, SEQ ID NO:72, SEQ ID NO:73, SEQ ID NO: 74, SEQ ID
NO:75, SEQ ID NO:76, SEQ ID NO:77, SEQ ID NO:78, SEQ ID NO:79, SEQ
ID NO:80, SEQ ID NO:81, SEQ ID NO:82, SEQ ID NO:83, SEQ ID NO:84,
SEQ ID NO:85, SEQ ID NO:86, SEQ ID NO:87, SEQ ID NO:88, SEQ ID
NO:89, SEQ ID NO:90, SEQ ID NO:91, SEQ ID NO:92, SEQ ID NO:93, SEQ
ID NO: 94, SEQ ID NO:95, SEQ ID NO:96, SEQ ID NO:119, SEQ ID
NO:120, SEQ ID NO:121, SEQ ID NO:122, SEQ ID NO:123, SEQ ID NO:124,
SEQ ID NO:125, SEQ ID NO:126, SEQ ID NO:127, SEQ ID NO:128, SEQ ID
NO:129, SEQ ID NO:130, SEQ ID NO:131, SEQ ID NO:132, SEQ ID NO:133,
SEQ ID NO:134, SEQ ID NO:135, SEQ ID NO: 136, SEQ ID NO:137, SEQ
ID NO:138, SEQ ID NO:139, and SEQ ID NO:140.
62-65. (canceled)
66. A method of treating a subject for a disease condition, wherein
the subject is a human patient and wherein the disease condition is
a cancer, the method comprising: obtaining signatures comprising
differentially methylated regions (DMRs) from types of leukocytes
in a blood sample of the patient, the types of leukocytes
comprising at least one selected from: CD19+ B lymphocyte, CD15+
granulocyte, CD14+ monocyte, CD56^dim Natural Killer cell,
CD56^bright Natural Killer cell, and CD3+ T lymphocyte; and
from a healthy control human subject not having the cancer;
comparing a signature for a specific type of leukocyte in the
patient with that in the healthy subject, wherein the signature for
the specific type of leukocyte is an indication of amount of cells
of the specific type of leukocyte circulating in blood, and wherein
a decreased amount of the cells of the specific type of leukocyte
circulating in the blood of the patient compared to the healthy
subject is an indicium of the cancer; and, administering a
composition comprising the cells of the type of leukocyte to the
patient, thereby increasing the amount of the cells of the type of
leukocyte in the patient and treating the cancer.
67. The method according to claim 66, wherein the leukocyte type
cell is the CD56^dim Natural Killer cell.
68-69. (canceled)
70. The method according to claim 67, wherein the DMR signature
specific for CD56^dim Natural Killer cells comprises a CpG
dinucleotide in a region near the promoter of the gene NKp46,
wherein the methylation status of the CpG dinucleotide is
quantified by methylation specific quantitative polymerase chain
reaction (MS-qPCR) using primers and probes having SEQ ID NOs:
116-118 and 97-99.
71. The method according to claim 67, wherein the DMR signature
specific for CD56^dim Natural Killer cells is a CpG
dinucleotide in a region near the promoter of the gene NKp46,
wherein the methylation status of the CpG dinucleotide is
quantified by digital PCR comprising emulsion and nanofluidic
partitioning using primers and probes having SEQ ID NOs: 116-118
and 97-99.
72-73. (canceled)
74. The method according to claim 66, wherein the signature
comprises at least one gene or locus selected from the group
consisting of: SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ
ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID
NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO: 14, SEQ
ID NO:15, SEQ ID NO:16, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:19,
SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:22, SEQ ID NO:23, SEQ ID
NO:24, SEQ ID NO:25, SEQ ID NO:26, SEQ ID NO:27, SEQ ID NO:28, SEQ
ID NO:29, SEQ ID NO:30, SEQ ID NO:31, SEQ ID NO:32, SEQ ID NO:33,
SEQ ID NO: 34, SEQ ID NO:35, SEQ ID NO:36, SEQ ID NO:37, SEQ ID
NO:38, SEQ ID NO:39, SEQ ID NO:40, SEQ ID NO:41, SEQ ID NO:42, SEQ
ID NO:43, SEQ ID NO:44, SEQ ID NO:45, SEQ ID NO:46, SEQ ID NO:47,
SEQ ID NO:48, SEQ ID NO:49, SEQ ID NO:50, SEQ ID NO:51, SEQ ID
NO:52, SEQ ID NO:53, SEQ ID NO: 54, SEQ ID NO:55, SEQ ID NO:56, SEQ
ID NO:57, SEQ ID NO:58, SEQ ID NO:59, SEQ ID NO:60, SEQ ID NO:61,
SEQ ID NO:62, SEQ ID NO:63, SEQ ID NO:64, SEQ ID NO:65, SEQ ID
NO:66, SEQ ID NO:67, SEQ ID NO:68, SEQ ID NO:69, SEQ ID NO:70, SEQ
ID NO:71, SEQ ID NO:72, SEQ ID NO:73, SEQ ID NO: 74, SEQ ID NO:75,
SEQ ID NO:76, SEQ ID NO:77, SEQ ID NO:78, SEQ ID NO:79, SEQ ID
NO:80, SEQ ID NO:81, SEQ ID NO:82, SEQ ID NO:83, SEQ ID NO:84, SEQ
ID NO:85, SEQ ID NO:86, SEQ ID NO:87, SEQ ID NO:88, SEQ ID NO:89,
SEQ ID NO:90, SEQ ID NO:91, SEQ ID NO:92, SEQ ID NO:93, SEQ ID NO:
94, SEQ ID NO:95, SEQ ID NO:96, SEQ ID NO:119, SEQ ID NO:120, SEQ
ID NO:121, SEQ ID NO:122, SEQ ID NO:123, SEQ ID NO:124, SEQ ID
NO:125, SEQ ID NO:126, SEQ ID NO:127, SEQ ID NO:128, SEQ ID NO:129,
SEQ ID NO:130, SEQ ID NO:131, SEQ ID NO:132, SEQ ID NO:133, SEQ ID
NO:134, SEQ ID NO:135, SEQ ID NO: 136, SEQ ID NO:137, SEQ ID
NO:138, SEQ ID NO:139, and SEQ ID NO:140.
75. The method according to claim 74, wherein the at least one gene
or locus is selected from the group consisting of: FGD2, HLA-DOB,
BLK, IGSF6, CLDN15, SFT2D3, ZNF22, CEL, HDC, GSG1, FCN1, OSBPL5,
LDB2, NCR1, EPS8L3, CD3D, PPP6C, CD3G, TXK, and FAIM.
76. The method according to claim 74, wherein the at least one gene
or locus is selected from the group consisting of: CLEC9A (2 loci),
INPP5D, INHBE, UNQ473, SLC7A11, ZNF22, XYLB, HDC, RGR, SLCO2B1,
C1orf54, TM4SF19, IGSF6, KRTHA6, CCL21, SLC11A1, FGD2, TCL1A, MGMT,
CD19, LILRB4, VPREB3, FLJ10379, HLA-DOB, EPS8L3, SHANK1, CD3D (2
loci), CHRNA3, CD3G (2 loci), RARA, and GRASP.
Description
RELATED APPLICATIONS
[0001] This application claims the benefit of U.S. provisional
application Ser. No. 61/865,479 filed Aug. 13, 2013, entitled,
"Methods using DNA methylation for identifying a cell or a mixture
of cells for prognosis and diagnosis of diseases, and for cell
remediation therapies", and is a continuation-in-part of
international application number PCT/US2012/39699 filed May 25,
2012, entitled, "Methods using DNA Methylation for identifying a
cell or a mixture of cells for prognosis and diagnosis of diseases,
and for cell remediation therapies" which claims the benefit of
provisional applications having Ser. Nos. 61/489,883 filed May 25,
2011 entitled, "Methods of Immunodiagnostics using DNA Methylation
arrays as surrogate measures of the identity of a cell or a mixture
of cells"; 61/509,644 filed Jul. 20, 2011 entitled, "Methods of
Immunodiagnostics using DNA Methylation arrays as surrogate
measures of the identity of a cell or a mixture of cells for
prognosis and diagnosis of diseases"; 61/585,892 filed Jan. 12,
2012 entitled, "Methods of Immunodiagnostics using DNA Methylation
arrays as surrogate measures of the identity of a cell or a mixture
of cells for prognosis and diagnosis of diseases"; and 61/619,663,
filed Apr. 3, 2012 entitled, "Methods using DNA Methylation arrays
for identifying a cell or a mixture of cells for prognosis and
diagnosis of diseases, and for cell remediation therapies",
inventors Karl Kelsey, Eugene Andres Houseman, John Wiencke,
William P. Accomando, Jr. and Carmen Marsit, of which each patent
application is hereby incorporated by reference herein in its
entirety.
TECHNICAL FIELD
[0003] Methods are provided for determining altered immune cell
distribution to diagnose or prognose a disease condition based on
determining DNA methylation signatures of a specific immune cell
type or of a mixture of immune cell types.
BACKGROUND
[0004] Leukocytes, commonly called white blood cells, are cells
that are primarily responsible for mounting an immune response by a
host to pathogens and to foreign antigens. Leukocyte distribution
is currently determined by simple histologic or flow cytometric
assessments. These methods have significant limitations. In
particular, flow cytometry is limited by: the availability of
fluorescent antibody tags; the laborious nature of the antibody
tagging process; the need to separate cells, which requires large
volumes of fresh cells; the need for expensive technology and
equipment to detect cells; and the need to maintain the integrity
of the outer cell membrane to preserve labile protein epitopes. A
further limitation of methods requiring fresh cells is that they
are not useful in situations in which prospective studies are
impractical, such as rare diseases, for which large numbers of
affected subjects are not available. In these cases, retrospective
studies are needed to correlate disease outcome with disease
parameters. However, retrospective studies can be performed only
if archival samples derived from archived cohort populations can
be used to analyze the disease parameters. Currently there are no known methods in
which archived samples from patients and normal subjects could be
used to provide a quantitative estimate of leukocyte distributions
in disease conditions.
[0005] Thus there is a need for methods that provide quantification
of alterations in distribution of leukocytes in blood or tissues in
disease conditions that do not rely upon fresh samples, that are
not labor intensive and that do not use expensive technology or
equipment.
SUMMARY
[0006] In diverse medical conditions such as in disease or in
instances of immune-toxic exposure, the leukocyte distribution in
blood or tissues contains information about the underlying
immune-biology of the medical condition which is useful for
diagnosis, prognosis or treatment of the medical condition, or for
monitoring response to therapy. Accordingly, an embodiment of the
invention provides a method for assessing a disease
condition in a subject, including: measuring a CD3Z positive T
lymphocyte cell number in a sample from the subject by analyzing
methylation in the sample of at least one CpG dinucleotide (CpG) in
gene CD3Z or in an orthologous or a paralogous gene thereof, such
that an amount of a demethylated C of the at least one CpG in the
sample is a measure of CD3+ T lymphocyte cell number; and comparing
the amount of the demethylated C in the sample from the subject
with that in positive control samples from patients with the
disease condition, and with that in negative control samples from
healthy subjects, such that the disease condition is selected from:
an autoimmune disease, an allergy, a transplant rejection, obesity,
an inherited disease, immunosuppression and a cancer. As used
herein, "subject" refers to any animal, for example a mammal, that
is healthy or that has a disease condition, such as a human, a
high-value agricultural animal, or a zoo animal. A "patient" is a
subject that either has a disease condition or is in need of a
diagnosis of a disease condition.
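Paragraph [0006] leaves the form of the comparison with positive and negative controls open. One hypothetical way to carry it out, shown only as a sketch, is to express the subject's CD3Z demethylated-C fraction as a z-score against each control group and report which group it lies closer to. The function name, the z-score rule, and the example values are illustrative assumptions, not data or a decision rule from this application.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def compare_to_controls(
    subject: float,
    positive_controls: Sequence[float],
    negative_controls: Sequence[float],
) -> Tuple[str, float, float]:
    """Z-score a subject's CD3Z demethylated fraction against both control groups.

    positive_controls: values from patients with the disease condition.
    negative_controls: values from healthy subjects.
    Each group needs at least two values for the sample standard deviation.
    Returns the closer group (by absolute z-score) and both z-scores.
    """
    z_pos = (subject - mean(positive_controls)) / stdev(positive_controls)
    z_neg = (subject - mean(negative_controls)) / stdev(negative_controls)
    label = "disease-like" if abs(z_pos) < abs(z_neg) else "healthy-like"
    return label, z_pos, z_neg


# Hypothetical demethylated fractions (made up, not measured data).
patients = [0.30, 0.32, 0.28]
healthy = [0.60, 0.62, 0.58]
print(compare_to_controls(0.31, patients, healthy)[0])  # disease-like
```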
[0007] A related embodiment of the method includes at least one of:
monitoring, diagnosing, prognosing, and measuring response to
therapy by comparing the measured CD3+ T lymphocyte cell numbers in
the subject after therapy to that in the patients with the disease
condition and in the healthy subjects.
[0008] An embodiment of the method provides that the inherited
disease is an aneuploidy. For example, the aneuploidy is selected from
trisomy 21, Turner's syndrome, and Klinefelter's syndrome.
[0009] In one embodiment, the sample used in the method is a fresh
sample. For example, the fresh sample is freshly drawn blood, a
tumor infiltrate or cells obtained from a lymph node puncture.
Alternatively, the sample is an archival sample. For example, the
archival sample is archival blood collected and stored on filter
paper cards such as a Guthrie card, frozen blood specimens or
frozen tissue. Because the methylation status of DNA (methylated or
demethylated C) is a stable chemical modification, archival samples
can be used to measure cell numbers. Flow cytometry, in contrast,
requires fresh cells, because detection of cells depends on the
availability of protein epitopes, which are labile and not well
preserved in archival samples.
[0010] In a related embodiment of the method the amount of the
demethylated C of the at least one CpG in the CD3Z gene in the
sample is at least about 80%, at least about 90%, or at least about
95% of the total amount of the CpG in CD3Z genes in the sample.
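If a sample is treated as a two-component mixture of CD3+ T cells and all other cells, the measured demethylated fraction d at a CD3Z CpG satisfies d = f * d_T + (1 - f) * d_other, where f is the CD3+ T-cell fraction; solving gives f = (d - d_other) / (d_T - d_other). The sketch below assumes reference values d_T (demethylated fraction in purified CD3+ T cells, 0.95 as an illustrative default) and d_other (other cells, 0.0 assumed); these defaults are assumptions, not calibrated values from this application.

```python
def t_cell_fraction(d_sample: float, d_tcell: float = 0.95, d_other: float = 0.0) -> float:
    """Estimate the CD3+ T-cell fraction from a CD3Z demethylated fraction.

    Two-component mixing model: d_sample = f * d_tcell + (1 - f) * d_other.
    The default reference values are illustrative assumptions only.
    """
    if d_tcell == d_other:
        raise ValueError("reference values must differ")
    f = (d_sample - d_other) / (d_tcell - d_other)
    return min(1.0, max(0.0, f))  # clip to the physically meaningful range


print(t_cell_fraction(0.38))                           # about 0.4
print(t_cell_fraction(0.5, d_tcell=0.9, d_other=0.1))  # about 0.5
```

Clipping guards against measurement noise pushing the estimate slightly outside the range 0 to 1.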
[0011] An embodiment of the method further involves analyzing the
methylation of the CD3Z gene by amplifying by Polymerase
Chain Reaction (PCR) using primer pairs specific for amplification
of specific demethylated CpG loci. For example, amplification by
PCR involves monitoring quantitative PCR in real time using a
MethyLight assay, or using digital PCR. In various embodiments, the
CpG loci are those listed herein. For example, at least one gene or locus is
selected from the group of: SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3,
SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8,
SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID
NO:13, SEQ ID NO: 14, SEQ ID NO:15, SEQ ID NO:16, SEQ ID NO:17, SEQ
ID NO:18, SEQ ID NO:19, SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:22,
SEQ ID NO:23, SEQ ID NO:24, SEQ ID NO:25, SEQ ID NO:26, SEQ ID
NO:27, SEQ ID NO:28, SEQ ID NO:29, SEQ ID NO:30, SEQ ID NO:31, SEQ
ID NO:32, SEQ ID NO:33, SEQ ID NO: 34, SEQ ID NO:35, SEQ ID NO:36,
SEQ ID NO:37, SEQ ID NO:38, SEQ ID NO:39, SEQ ID NO:40, SEQ ID
NO:41, SEQ ID NO:42, SEQ ID NO:43, SEQ ID NO:44, SEQ ID NO:45, SEQ
ID NO:46, SEQ ID NO:47, SEQ ID NO:48, SEQ ID NO:49, SEQ ID NO:50,
SEQ ID NO:51, SEQ ID NO:52, SEQ ID NO:53, SEQ ID NO: 54, SEQ ID
NO:55, SEQ ID NO:56, SEQ ID NO:57, SEQ ID NO:58, SEQ ID NO:59, SEQ
ID NO:60, SEQ ID NO:61, SEQ ID NO:62, SEQ ID NO:63, SEQ ID NO:64,
SEQ ID NO:65, SEQ ID NO:66, SEQ ID NO:67, SEQ ID NO:68, SEQ ID
NO:69, SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:72, SEQ ID NO:73, SEQ
ID NO: 74, SEQ ID NO:75, SEQ ID NO:76, SEQ ID NO:77, SEQ ID NO:78,
SEQ ID NO:79, SEQ ID NO:80, SEQ ID NO:81, SEQ ID NO:82, SEQ ID
NO:83, SEQ ID NO:84, SEQ ID NO:85, SEQ ID NO:86, SEQ ID NO:87, SEQ
ID NO:88, SEQ ID NO:89, SEQ ID NO:90, SEQ ID NO:91, SEQ ID NO:92,
SEQ ID NO:93, SEQ ID NO: 94, SEQ ID NO:95, SEQ ID NO:96, SEQ ID
NO:119, SEQ ID NO:120, SEQ ID NO:121, SEQ ID NO:122, SEQ ID NO:123,
SEQ ID NO:124, SEQ ID NO:125, SEQ ID NO:126, SEQ ID NO:127, SEQ ID
NO:128, SEQ ID NO:129, SEQ ID NO:130, SEQ ID NO:131, SEQ ID NO:132,
SEQ ID NO:133, SEQ ID NO:134, SEQ ID NO:135, SEQ ID NO: 136, SEQ ID
NO:137, SEQ ID NO:138, SEQ ID NO:139, and SEQ ID NO:140. In various
embodiments, at least one locus is selected from the group
consisting of: FGD2, HLA-DOB, BLK, IGSF6, CLDN15, SFT2D3, ZNF22,
CEL, HDC, GSG1, FCN1, OSBPL5, LDB2, NCR1, EPS8L3, CD3D, PPP6C,
CD3G, TXK, and FAIM. In various embodiments, at least one locus is
selected from the group consisting of: CLEC9A (2 loci), INPP5D,
INHBE, UNQ473, SLC7A11, ZNF22, XYLB, HDC, RGR, SLCO2B1, C1orf54,
TM4SF19, IGSF6, KRTHA6, CCL21, SLC11A1, FGD2, TCL1A, MGMT, CD19,
LILRB4, VPREB3, FLJ10379, HLA-DOB, EPS8L3, SHANK1, CD3D (2 loci),
CHRNA3, CD3G (2 loci), RARA, and GRASP. The nucleotide sequence and
corresponding amino acid sequence of each of the genes or loci
herein are listed and characterized in genome or protein databases
such as GenBank, European Nucleotide Archive, European
Bioinformatics Institute, GenomeNet, or The National Center for
Biotechnology Information (NCBI) Protein database. The nucleotide
sequences of the loci, in computer-readable form as an ASCII text
file (114 kilobytes) entitled "SEQ_ID_11252013", created Nov. 25,
2013 and containing sequence listing numbers 1-140, have been
electronically filed herewith and are incorporated by reference
herein in their entirety. In various embodiments, each
locus includes a portion of any of the sequences described
herein.
[0012] An embodiment of the method further involves analyzing the
methylation of the CD3Z gene by a method selected from the group
of: Pyrosequencing, Methylation-sensitive single-nucleotide primer
extension (Ms-SNuPE), Methylation-sensitive single-stranded
conformation analysis (MS-SSCA), High resolution melting
analysis (HRM), and digital PCR methods comprising emulsion and
nanofluidic partitioning. According to a related embodiment,
Methylation-sensitive single-nucleotide primer extension further
includes: chemically converting the lymphocyte-derived whole
genomic DNA with bisulfite; amplifying the chemically converted whole
genomic DNA; enzymatically fragmenting the resulting amplified DNA;
hybridizing the fragmented DNA to methylation-sensitive, CpG
locus-specific DNA oligomers; and labeling by single-base extension
using fluorescently labeled nucleotides.
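The chemistry behind the bisulfite step can be illustrated in a few lines. The sketch below is a simplification assuming complete conversion of a single strand: it simulates what a converted and amplified sequence reads as, in which every unmethylated cytosine reads as thymine while a methylated cytosine in a CpG is protected and still reads as cytosine. The sequence and the methylation pattern are hypothetical.

```python
def cpg_positions(seq: str) -> list[int]:
    """Return 0-based positions of the C in each CpG dinucleotide."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i:i + 2] == "CG"]


def bisulfite_convert(seq: str, methylated_positions: set[int]) -> str:
    """Simulate complete bisulfite conversion of one strand, as read after PCR.

    Unmethylated C is deaminated to U, which PCR copies as T; a C whose
    0-based position is in methylated_positions is protected and stays C.
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            out.append("T")  # unmethylated C -> U -> T after amplification
        else:
            out.append(base)
    return "".join(out)


if __name__ == "__main__":
    ref = "ACGTCCGATCGA"        # hypothetical locus with three CpGs
    cpgs = cpg_positions(ref)  # [1, 5, 9]
    meth = {cpgs[0], cpgs[2]}  # assume the 1st and 3rd CpG are methylated
    print(bisulfite_convert(ref, meth))  # ACGTTTGATCGA
```

The methylated CpGs at positions 1 and 9 keep their C, whereas the unmethylated CpG at position 5 and the non-CpG C at position 4 both read as T; a methylation-sensitive primer or oligomer can then discriminate the two forms by sequence.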
[0013] Another embodiment of the method further provides steps for
analyzing methylation of differentially methylated regions (DMRs)
of gene FOXP3, using primer pairs for amplification of specific
loci of demethylated CpG in the FOXP3 gene. As used herein, "loci"
within a gene refers to the locations of CpG dinucleotide-containing
sequences present in that gene, of which only one or a few may be
differentially demethylated in a specific cell.
[0014] A related embodiment of the method further includes:
determining a ratio of CpG demethylation of the FOXP3 gene DMR to
the CpG demethylation of the CD3Z gene DMR in a sample of tumor
infiltrate, such that the ratio is an index of T regulatory cell
number to total T cell number in the infiltrate; and diagnosing a
pathological grade of the cancer, such that the index of T
regulatory cell number to total T cell number in the tumor
infiltrate correlates with the grade of the cancer. In a related
embodiment, the cancer is selected from: a glioma, an ovarian
cancer, a head and neck squamous cell cancer (HNSCC), breast cancer,
lung cancer, prostate cancer, colon cancer, pancreatic cancer,
bladder cancer, cervical cancer, and liver cancer.
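The index in [0014] is a simple ratio. As a sketch, assuming the FOXP3 and CD3Z DMR demethylation have each already been quantified as a fraction of total DNA (for example by a MethyLight-type assay), the index could be computed as follows. The function name and input values are hypothetical, and no grading cutoff is implied.

```python
def treg_index(foxp3_demethylated: float, cd3z_demethylated: float) -> float:
    """Ratio of FOXP3 DMR demethylation to CD3Z DMR demethylation.

    Both inputs are fractions of total DNA in [0, 1]. FOXP3 DMR demethylation
    tracks T regulatory cells and CD3Z DMR demethylation tracks all CD3+ T
    cells, so the ratio serves as an index of Treg number to total T cells.
    """
    if not 0.0 <= foxp3_demethylated <= 1.0:
        raise ValueError("FOXP3 demethylation must be a fraction in [0, 1]")
    if not 0.0 < cd3z_demethylated <= 1.0:
        raise ValueError("CD3Z demethylation must be a fraction in (0, 1]")
    return foxp3_demethylated / cd3z_demethylated


if __name__ == "__main__":
    # Hypothetical infiltrate: 5% of DNA demethylated at FOXP3, 25% at CD3Z.
    print(treg_index(0.05, 0.25))  # 0.2
```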
[0015] In a related embodiment, the method further includes
prognosing survival of a patient having, or needing a diagnosis of,
glioma or HNSCC, in which an amount of CD3Z gene DMR demethylation
in the patient, expressed as a percent of total DNA, that is greater
than the median value in a sample population of subjects correlates
with a prognosis of poor survival.
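A minimal sketch of the median comparison in [0015], assuming the patient's CD3Z DMR demethylation and the reference population's values (each as a percent of total DNA) are already available. The function name and the numbers are hypothetical.

```python
from statistics import median


def poor_survival_flag(patient_pct: float, population_pcts: list[float]) -> bool:
    """Return True when the patient's CD3Z DMR demethylation exceeds the population median.

    A True result marks the group that, per the described embodiment,
    correlates with a prognosis of poor survival.
    """
    if not population_pcts:
        raise ValueError("population_pcts must not be empty")
    return patient_pct > median(population_pcts)


if __name__ == "__main__":
    population = [2.0, 3.3, 5.5, 8.1]  # hypothetical values; median is 4.4
    print(poor_survival_flag(6.0, population))  # True
    print(poor_survival_flag(4.0, population))  # False
```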
[0016] An embodiment of the invention provides a kit for measuring
CD3+ T lymphocyte and FOXP3+ T regulatory cell numbers by analyzing
methylation of CpG positions in CD3Z and FOXP3 genes, the kit
having sequencing and PCR primers specific for the CD3Z and the
FOXP3 gene DMRs and instructions for analyzing and comparing the
CpG methylation between healthy subjects and a patient.
[0017] An embodiment provides a method for assessing a disease
condition by estimating an alteration in proportions of types of
leukocytes in a sample from a subject, the method including the
steps of: measuring a DNA methylation profile for each type of
leukocyte and for unfractionated cells, such that DNA methylation
profiles are obtained for a plurality of CpG loci, and obtaining
the status of an individual CpG locus by amplifying DNA from each
of the types of leukocyte and from the unfractionated cells, such
that amplifying comprises hybridizing methylation sensitive
locus-specific DNA oligomers corresponding to each CpG locus;
ordering CpG loci by ability to distinguish types of leukocytes,
such that the ordering of the CpG loci determines differentially
methylated DNA regions (DMRs), such that obtaining DMRs comprises
statistically minimizing introduction of bias in amount of total
methylation status of a large number of CpG loci obtained from the
unfractionated cells by employing a Bayesian treatment of prior
probabilities of the methylation status at each individual locus,
thereby identifying a plurality of CpG loci to include in the
measurement, such that the number of CpG loci selected distinguishes
DMR signatures among the types of leukocytes and minimizes bias;
obtaining DNA methylation profiles comprising DMRs from the types
of leukocytes, such that the DNA methylation profiles comprise
validating measures of relative amounts of the types of leukocytes,
and obtaining DNA methylation profiles of the unfractionated cells
as surrogate measures of relative amounts of each leukocyte type in
the unfractionated cells; employing an analog of a measurement
error model wherein a DNA methylation surrogate $y$ is reverse
formulated with respect to the disease outcome $z$, as
$$y = f(z),$$
such that $y$ denotes a multivariate random variable representing a
methylation profile, $z$ denotes a disease outcome or state, and $f$
denotes a probability distribution; $y$, $z$, and the leukocyte
distribution $\omega$ are related by the estimator equation
$$E(y \mid \omega) = g(\omega),$$
under the assumption $E(z \mid \omega, y) = E(z \mid \omega)$, such that $E$
denotes the expectation of a random variable and $\omega$ denotes a
subject-specific distribution of leukocytes; and comparing
relative amounts of each type of leukocyte in the sample from the
subject with those in a control sample, thereby providing an
assessment of the disease condition. In related embodiments, the
locus-specific DNA oligomers are linked to an array selected from
the group of: a glass slide array; a quartz slide array; a fiber
optic bundle array; a planar slide array; a micro-well array; a
multi-well dish array; a digital PCR array; and a bead array having
beads located at known addressable locations on the array. A
related embodiment of the method further provides at least one of
steps of: monitoring, diagnosing, prognosing and measuring response
to therapy of the disease condition.
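The estimator in [0017] is defined through the measurement error formulation above. As a simpler, commonly used stand-in (not the claimed estimator), leukocyte proportions can be estimated by regressing an unfractionated-cell methylation profile on reference profiles of purified leukocyte types under a nonnegativity constraint. The sketch below does this with projected gradient descent in plain Python; the reference matrix, cell-type labels, and mixture are hypothetical, the step size uses the squared Frobenius norm as a safe bound on the Lipschitz constant, and a sum-to-one constraint is not enforced.

```python
def estimate_proportions(reference, mixture, iters=2000):
    """Nonnegative least-squares estimate of cell-type proportions.

    reference: list of rows, one per CpG locus; each row holds the mean
        methylation (0..1) of that locus in each purified cell type.
    mixture:   methylation (0..1) of the same loci in unfractionated cells.
    Returns one weight per cell type minimizing ||reference @ w - mixture||^2
    subject to w >= 0.
    """
    n_types = len(reference[0])
    # The squared Frobenius norm bounds the largest eigenvalue of R^T R,
    # so 1 / frob2 is a step size that guarantees descent.
    frob2 = sum(v * v for row in reference for v in row)
    step = 1.0 / frob2
    w = [1.0 / n_types] * n_types
    for _ in range(iters):
        # Residual r = R w - y, one entry per CpG locus.
        resid = [sum(row[k] * w[k] for k in range(n_types)) - y
                 for row, y in zip(reference, mixture)]
        # Gradient of 0.5 * ||R w - y||^2 is R^T r.
        grad = [sum(row[k] * r for row, r in zip(reference, resid))
                for k in range(n_types)]
        # Gradient step, then projection onto the nonnegative orthant.
        w = [max(0.0, w[k] - step * grad[k]) for k in range(n_types)]
    return w


if __name__ == "__main__":
    # Hypothetical reference: 6 CpGs x 3 cell types, two marker CpGs per type.
    ref = [
        [0.9, 0.1, 0.1],
        [0.9, 0.1, 0.1],
        [0.1, 0.9, 0.1],
        [0.1, 0.9, 0.1],
        [0.1, 0.1, 0.9],
        [0.1, 0.1, 0.9],
    ]
    true_w = [0.5, 0.3, 0.2]
    mix = [sum(r[k] * true_w[k] for k in range(3)) for r in ref]
    print([round(x, 3) for x in estimate_proportions(ref, mix)])  # [0.5, 0.3, 0.2]
```

Comparing the estimated proportions for a subject with those of a control sample then corresponds to the final comparison step of [0017].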
[0018] The method in a related embodiment further includes a
sensitivity analysis for correcting bias, in which the bias being
corrected is unrelated to measurement error and arises instead from
unprofiled cell types and from non-cell-mediated profile
differences. In related embodiments of the method, fractionated
leukocyte types include at least one selected from: CD19+ B
lymphocytes, CD15+ granulocytes, CD14+ monocytes, CD56+ Natural
Killer cells, and CD3+ T lymphocytes.
[0019] In an embodiment of the method the disease condition is Head
and Neck Squamous Cell Carcinoma (HNSCC).
[0020] An embodiment of the method provides that the inherited
disease is an aneuploidy. For example, the aneuploidy is selected from
trisomy 21, Turner's syndrome, and Klinefelter's syndrome.
[0021] According to another embodiment of the method the control
sample is taken from the subject at a different point in time for
prognosis of the course of the disease condition in the subject. In
another related embodiment, the method of assessing the disease
condition further includes, after employing the measurement error
model, comparing the distribution of leukocytes with the relative
amounts in the control sample as a normal standard, such that the normal
standard is a statistical measure obtained from a plurality of
disease-free subjects.
[0022] In a related embodiment the method provides a diagnosis of
immunosuppression due to smoking in a currently smoking subject by:
determining a ratio of CpG demethylation of FOXP3 gene DMR to the
CpG demethylation of CD3Z gene DMR in blood in the currently
smoking subject, such that the ratio is an index of T regulatory
cell number to the total T cell number; and providing a diagnosis
of immunosuppression in the currently smoking subject, such that
a value of the index of T regulatory cell number to total T cell
number in the currently smoking subject that is greater than the
average value in a sample population of currently non-smoking
subjects correlates with immunosuppression due to smoking. In a
related embodiment of the method the subject with the
currently-smoking or currently non-smoking status is a patient
having a cancer, an infection or in need of a transplant.
[0023] An embodiment provides a method of predicting a methylation
class membership in a bodily fluid sample of a subject for
assessing disease status of the subject, in which the methylation
class membership corresponds to an epigenetic signature of a
plurality of leukocyte types, the method including: measuring
amounts of DNA methylation in each of a plurality of leukocyte type
populations to determine differentially methylated regions
(DMRs); ranking leukocyte DMRs for each leukocyte type according to
statistical strength of association of the DMR with each leukocyte
type; randomly dividing a data set of control subjects and subjects
with a disease into groups having substantially the same numbers of
control subjects and subjects with the disease to obtain a training
set and a testing set; clustering samples in the training set using
a defined number of highest ranked leukocyte DMRs to determine
clustering solutions, in which a clustering solution corresponds to
the methylation class membership; and predicting methylation class
membership for subjects within the testing set by applying the
clustering solutions obtained from the training set to the highest
ranked leukocyte DMRs in the testing set, such that clinical
utility of the predicted methylation class membership is determined
by testing association of the predicted methylation class
membership with the disease status of the subject.
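The split, cluster, and predict steps of [0023] can be sketched compactly. This is an illustration under stated assumptions, not the patent's procedure: k-means with deterministic farthest-point initialization and nearest-centroid assignment are stand-ins chosen for brevity (the clustering algorithm is not named in this paragraph, and [0025] names a naive Bayes classifier for prediction). Each sample is a hypothetical list of methylation values already restricted to the highest ranked leukocyte DMRs; all function names are hypothetical.

```python
import random


def balanced_split(controls, cases, seed=0):
    """Shuffle controls and cases separately and halve each group.

    Splitting each group on its own keeps the numbers of controls and cases
    substantially the same in the training and testing sets.
    """
    rng = random.Random(seed)
    train, test = [], []
    for group in (controls, cases):
        shuffled = list(group)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        train += shuffled[:half]
        test += shuffled[half:]
    return train, test


def _dist2(a, b):
    """Squared Euclidean distance between two methylation profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50):
    """Lloyd's k-means with deterministic farthest-point initialization.

    Each returned centroid stands for one methylation class.
    """
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(_dist2(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: _dist2(p, centroids[j]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def predict_class(profile, centroids):
    """Assign a test-set profile to the class of its nearest training centroid."""
    return min(range(len(centroids)), key=lambda j: _dist2(profile, centroids[j]))


if __name__ == "__main__":
    # Hypothetical profiles over 3 top-ranked leukocyte DMRs.
    controls = [[0.1, 0.2, 0.1], [0.15, 0.25, 0.1], [0.1, 0.1, 0.2], [0.2, 0.2, 0.15]]
    cases = [[0.8, 0.7, 0.9], [0.85, 0.75, 0.8], [0.9, 0.8, 0.85], [0.7, 0.9, 0.8]]
    train, test = balanced_split(controls, cases)
    centroids = kmeans(train, k=2)
    print([predict_class(p, centroids) for p in test])
```

The predicted class labels for the testing set would then be tested for association with disease status, for example with the ROC analysis described in [0025].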
[0024] According to an embodiment of the method, the highest ranked
leukocyte DMRs are as shown in Table 21, in which each DMR is
identified by chromosomal location and gene name, and the defined
number of highest ranked leukocyte DMRs is selected from: at least
10, at least 20, at least 30, at least 40, and 50.
[0025] The methylation class membership of the subject in the
testing set is predicted for example using a naive Bayes
classifier. Testing the association of the predicted methylation
class with disease status includes, for example, using receiver
operating characteristic (ROC) curves and the corresponding area
under each curve.
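The paragraph above names ROC curves and the area under each curve without fixing how the area is computed. One standard way, shown here as a sketch, uses the equivalence between the area under the ROC curve and the Mann-Whitney statistic: the probability that a randomly chosen subject with the disease receives a higher score than a randomly chosen control, with ties counted as one half. The scores are hypothetical (for example, a predicted probability of membership in a disease-associated methylation class).

```python
def roc_auc(scores_cases, scores_controls):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    AUC equals the probability that a randomly chosen case scores higher
    than a randomly chosen control, counting ties as one half.
    """
    if not scores_cases or not scores_controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for a in scores_cases:
        for b in scores_controls:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(scores_cases) * len(scores_controls))


if __name__ == "__main__":
    cases = [0.9, 0.8, 0.4]     # hypothetical scores for subjects with disease
    controls = [0.3, 0.5, 0.2]  # hypothetical scores for controls
    print(round(roc_auc(cases, controls), 3))  # 0.889
```

The pairwise loop is quadratic in sample size, which is adequate for study-sized data sets; a rank-based formula gives the same value in O(n log n).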
[0026] The bodily fluid sample in some embodiments is a fresh
sample, for example freshly collected blood or a blood derivative.
Alternatively, the bodily fluid is an archival sample, for example
stored frozen blood or archival blood collected and stored on a
filter paper card such as a Guthrie card.
[0027] The method in a related embodiment includes at least one of:
diagnosing, monitoring, prognosing and measuring response to
therapy of the disease status.
[0028] In related embodiments the leukocyte types are selected from
the group of: natural killer cells, B cells, CD4+ T cells, CD8+ T
cells, granulocytes and monocytes. The disease according to an
embodiment of the method is exemplified by one of: head and neck
squamous cell carcinoma (HNSCC), ovarian cancer, and bladder
cancer.
[0029] An array is provided as another embodiment for estimating
proportions of leukocyte types in a sample from a mammal for
assessing a disease condition of the mammal by analyzing
differential methylation of CpG dinucleotides in a plurality of
genes of the sample, the array including: a plurality of DNA probes
attached to a plurality of surfaces at known addressable locations
on the array, such that the surface at each location is attached to
a DNA probe having a specific nucleotide sequence, such that the
DNA probe having the specific nucleotide sequence hybridizes to a
nucleotide sequence of a methylated form or an unmethylated form of
a CpG dinucleotide in a sequence of a gene of the plurality of
genes in the sample, such that the array is selected from having:
at least 16 probes, at least 64 probes, at least 96 probes, and at
least 384 probes.
[0030] The plurality of probes, in a related embodiment of the
array, have nucleotide sequences that hybridize with a respective
plurality of 118 different naturally occurring nucleotide sequences
in the plurality of genes. In another related
embodiment, the plurality of probes include at least one of SEQ ID
NO: 1 to SEQ ID NO: 96. In various embodiments of the array, the
plurality of probes have nucleotide sequences that hybridize with
at least one gene or locus described herein. For example, the at
least one gene or locus is any of SEQ ID NO: 1-140. In various
embodiments, the at least one gene or locus is selected from the
group of: SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ
ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID
NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO: 14, SEQ
ID NO:15, SEQ ID NO:16, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:19,
SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:22, SEQ ID NO:23, SEQ ID
NO:24, SEQ ID NO:25, SEQ ID NO:26, SEQ ID NO:27, SEQ ID NO:28, SEQ
ID NO:29, SEQ ID NO:30, SEQ ID NO:31, SEQ ID NO:32, SEQ ID NO:33,
SEQ ID NO: 34, SEQ ID NO:35, SEQ ID NO:36, SEQ ID NO:37, SEQ ID
NO:38, SEQ ID NO:39, SEQ ID NO:40, SEQ ID NO:41, SEQ ID NO:42, SEQ
ID NO:43, SEQ ID NO:44, SEQ ID NO:45, SEQ ID NO:46, SEQ ID NO:47,
SEQ ID NO:48, SEQ ID NO:49, SEQ ID NO:50, SEQ ID NO:51, SEQ ID
NO:52, SEQ ID NO:53, SEQ ID NO: 54, SEQ ID NO:55, SEQ ID NO:56, SEQ
ID NO:57, SEQ ID NO:58, SEQ ID NO:59, SEQ ID NO:60, SEQ ID NO:61,
SEQ ID NO:62, SEQ ID NO:63, SEQ ID NO:64, SEQ ID NO:65, SEQ ID
NO:66, SEQ ID NO:67, SEQ ID NO:68, SEQ ID NO:69, SEQ ID NO:70, SEQ
ID NO:71, SEQ ID NO:72, SEQ ID NO:73, SEQ ID NO: 74, SEQ ID NO:75,
SEQ ID NO:76, SEQ ID NO:77, SEQ ID NO:78, SEQ ID NO:79, SEQ ID
NO:80, SEQ ID NO:81, SEQ ID NO:82, SEQ ID NO:83, SEQ ID NO:84, SEQ
ID NO:85, SEQ ID NO:86, SEQ ID NO:87, SEQ ID NO:88, SEQ ID NO:89,
SEQ ID NO:90, SEQ ID NO:91, SEQ ID NO:92, SEQ ID NO:93, SEQ ID NO:
94, SEQ ID NO:95, SEQ ID NO:96, SEQ ID NO:119, SEQ ID NO:120, SEQ
ID NO:121, SEQ ID NO:122, SEQ ID NO:123, SEQ ID NO:124, SEQ ID
NO:125, SEQ ID NO:126, SEQ ID NO:127, SEQ ID NO:128, SEQ ID NO:129,
SEQ ID NO:130, SEQ ID NO:131, SEQ ID NO:132, SEQ ID NO:133, SEQ ID
NO:134, SEQ ID NO:135, SEQ ID NO: 136, SEQ ID NO:137, SEQ ID
NO:138, SEQ ID NO:139, and SEQ ID NO:140.
[0031] In a related embodiment of the array, the addressable
locations are wells of a substrate, such that the substrate is
selected from: a glass slide; a quartz slide; a fiber optic bundle;
and a planar silica slide. In another related embodiment the surfaces
included in the array are particles added to the wells.
[0032] In alternative embodiments the addressable locations of the
array are defined spots on a glass slide or are microbeads or
particles labeled with a code. For example, the particles are
microbeads in the form of glass cylinders identifiable with
inscribed holographic code.
[0033] In various embodiments the disease condition is selected
from: an autoimmune disease, an allergy, a transplant rejection,
obesity, an inherited disease, immunosuppression and a cancer.
[0034] Another embodiment provides a method for estimating
proportions of types of leukocytes in a sample from a subject for
assessing a disease condition of the subject by analyzing
differential methylation of CpG dinucleotides in a plurality of
genes of the sample, the method including: providing an array
having a plurality of DNA probes attached to a plurality of
surfaces at known addressable locations on the array, such that the
surface at each location is attached to a DNA probe having a
specific nucleotide sequence; reacting genomic DNA in the sample
with a bisulfite reagent to convert unmethylated cytosine residues
to uracil; hybridizing resulting bisulfite treated genomic DNA with
the array to obtain resulting hybridized probes on the array, such
that the DNA probes hybridize to a DNA sequence of each of a
methylated form and an unmethylated form of a sequence having a CpG
dinucleotide in a gene for each of the plurality of genes; and
detecting the methylation status of each of the CpG dinucleotides
in each sequence, thereby estimating proportions of types of
leukocyte in the sample from the subject for assessing the disease
condition of the subject.
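After bisulfite treatment and hybridization, the methylation status of each CpG is typically summarized from a methylated-signal intensity M and an unmethylated-signal intensity U. A minimal sketch, assuming the beta-value convention commonly used for Illumina Infinium data, $\beta = \max(M,0)/(\max(M,0)+\max(U,0)+\alpha)$ with offset $\alpha = 100$; the probe IDs and intensities are hypothetical. A profile of such values over many CpGs is the kind of input that a proportion-estimation step would consume.

```python
def beta_value(meth_intensity: float, unmeth_intensity: float,
               offset: float = 100.0) -> float:
    """Methylation level (0..1) of one CpG from its two signal intensities.

    Negative intensities (e.g., after background correction) are floored at
    zero; the offset (assumed 100 here) stabilizes the ratio at low signal.
    """
    m = max(meth_intensity, 0.0)
    u = max(unmeth_intensity, 0.0)
    return m / (m + u + offset)


def methylation_profile(intensities: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Map each probe ID to its beta value from (methylated, unmethylated) intensities."""
    return {probe: beta_value(m, u) for probe, (m, u) in intensities.items()}


if __name__ == "__main__":
    raw = {"cpg_a": (9000.0, 900.0), "cpg_b": (450.0, 9450.0)}  # hypothetical
    print(methylation_profile(raw))  # {'cpg_a': 0.9, 'cpg_b': 0.045}
```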
[0035] In a related embodiment, detecting the methylation status of
the CpG dinucleotide sequence includes: extending each hybridized
probe of the resulting hybridized probes on the array by primer
extension to obtain a resulting primer extension product; ligating
the resulting primer extension product to an oligonucleotide
complementary to the DNA sequence of a 3' region of the gene to
obtain a resulting template for PCR on the array; and amplifying by
PCR and measuring amount of resulting PCR product, thereby
detecting the methylation status of the CpG dinucleotide containing
nucleotide sequence.
[0036] In another related embodiment, amplifying by PCR further
includes: amplifying the resulting template on the array using
primer pairs including a 5' primer specific to each of the
methylated or the unmethylated form of the CpG dinucleotide
containing gene, and a 3' primer specific to the gene containing
the CpG dinucleotide, thereby resulting in a first PCR product;
amplifying the resulting first PCR product with differentially
labeled 5' primers that specifically amplify either the methylated
or the unmethylated form of the gene containing the CpG
dinucleotide-containing nucleotide sequence, and a common 3' primer,
resulting in a differentially labeled second PCR product; and
hybridizing the second PCR product to the CpG dinucleotide
containing gene for measuring the amount of the second PCR product,
thereby detecting the methylation status of the CpG dinucleotide
sequence.
[0037] Detecting the methylation status of the CpG dinucleotide
sequence, in another related embodiment of the method, includes
extending the resulting hybridized probes on the array by single
base primer extension with a labeled nucleotide.
[0038] The array used in the method, in a related embodiment,
includes at least 16 probes, at least 64 probes, at least 96 probes, or at
least 384 probes. In another related embodiment of the method the
plurality of probes on the array hybridizes with a plurality of 118
different nucleotide sequences occurring in the plurality of genes.
In yet another related embodiment of the method each probe on the
array is complementary to nucleotide sequences having SEQ ID NO: 1
to SEQ ID NO: 96.
[0039] In various embodiments of the method, at least one probe on
the array is complementary to a nucleotide sequence described
herein, for example the nucleotide sequence corresponds to a gene
or locus described herein. In various embodiments, the gene or the
locus is found herein in an example, a figure, or a table. In
various embodiments, the gene or locus is selected from the group
of: SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID
NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID
NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO: 14,
SEQ ID NO:15, SEQ ID NO:16, SEQ ID NO:17, SEQ ID NO:18, SEQ ID
NO:19, SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:22, SEQ ID NO:23, SEQ
ID NO:24, SEQ ID NO:25, SEQ ID NO:26, SEQ ID NO:27, SEQ ID NO:28,
SEQ ID NO:29, SEQ ID NO:30, SEQ ID NO:31, SEQ ID NO:32, SEQ ID
NO:33, SEQ ID NO: 34, SEQ ID NO:35, SEQ ID NO:36, SEQ ID NO:37, SEQ
ID NO:38, SEQ ID NO:39, SEQ ID NO:40, SEQ ID NO:41, SEQ ID NO:42,
SEQ ID NO:43, SEQ ID NO:44, SEQ ID NO:45, SEQ ID NO:46, SEQ ID
NO:47, SEQ ID NO:48, SEQ ID NO:49, SEQ ID NO:50, SEQ ID NO:51, SEQ
ID NO:52, SEQ ID NO:53, SEQ ID NO: 54, SEQ ID NO:55, SEQ ID NO:56,
SEQ ID NO:57, SEQ ID NO:58, SEQ ID NO:59, SEQ ID NO:60, SEQ ID
NO:61, SEQ ID NO:62, SEQ ID NO:63, SEQ ID NO:64, SEQ ID NO:65, SEQ
ID NO:66, SEQ ID NO:67, SEQ ID NO:68, SEQ ID NO:69, SEQ ID NO:70,
SEQ ID NO:71, SEQ ID NO:72, SEQ ID NO:73, SEQ ID NO: 74, SEQ ID
NO:75, SEQ ID NO:76, SEQ ID NO:77, SEQ ID NO:78, SEQ ID NO:79, SEQ
ID NO:80, SEQ ID NO:81, SEQ ID NO:82, SEQ ID NO:83, SEQ ID NO:84,
SEQ ID NO:85, SEQ ID NO:86, SEQ ID NO:87, SEQ ID NO:88, SEQ ID
NO:89, SEQ ID NO:90, SEQ ID NO:91, SEQ ID NO:92, SEQ ID NO:93, SEQ
ID NO: 94, SEQ ID NO:95, SEQ ID NO:96, SEQ ID NO:119, SEQ ID
NO:120, SEQ ID NO:121, SEQ ID NO:122, SEQ ID NO:123, SEQ ID NO:124,
SEQ ID NO:125, SEQ ID NO:126, SEQ ID NO:127, SEQ ID NO:128, SEQ ID
NO:129, SEQ ID NO:130, SEQ ID NO:131, SEQ ID NO:132, SEQ ID NO:133,
SEQ ID NO:134, SEQ ID NO:135, SEQ ID NO: 136, SEQ ID NO:137, SEQ ID
NO:138, SEQ ID NO:139, and SEQ ID NO:140.
[0040] In various embodiments of the method, the disease condition
assessed is selected from: an autoimmune disease, an allergy, a
transplant rejection, obesity, an inherited disease, and a cancer.
Assessing the disease condition using the array, in related
embodiments of the method, includes at least one of: monitoring,
diagnosing, prognosing, and measuring response to therapy by
comparing estimated proportions of types of leukocytes of the
subject after therapy to proportions of leukocytes from a healthy
subject.
[0041] In a related embodiment of the method the sample containing
the genomic DNA used to hybridize with the probes on the array is
fresh, i.e., obtained in real time prior to performing the method.
In another related embodiment of the method the sample is
archival.
[0042] In various embodiments of the method for estimating
proportions of leukocytes using the array, the leukocyte types
include at least one selected from: CD19+ B lymphocytes, CD15+
granulocytes, CD14+ monocytes, CD56+ natural killer cells, and CD3+
T lymphocytes.
[0043] Another related embodiment provides a kit for estimating
proportions of leukocyte types in a sample by analyzing
differential methylation of CpG dinucleotides in a plurality of
genes of the sample, the kit including: an array having: a
plurality of DNA probes attached to a plurality of surfaces at
known addressable locations on the array, such that the surface at
each location is attached to a DNA probe having a specific
nucleotide sequence, such that the DNA probe having the specific
nucleotide sequence hybridizes to a DNA sequence of a methylated
form or an unmethylated form of a CpG dinucleotide in a sequence of
a gene of the plurality of genes in the sample, such that the array
is selected from having: at least 16 probes, at least 64 probes, at
least 96 probes, and at least 384 probes; primers and reagents for
detecting the hybridized probes and for detecting the reaction
products derived from the hybridized probes; and instructions for
using the array with a bisulfite reagent, thereby providing an
estimation of proportions of leukocyte types in the sample.
[0044] In a related embodiment of the kit, the probes hybridize
with a respective plurality of 118 different DNA sequences
occurring in the plurality of genes. In yet another related
embodiment of the kit the probes have nucleotide sequences
complementary to 96 nucleotide sequences having SEQ ID NO: 1 to SEQ
ID NO: 96.
[0045] In various embodiments of the kit, at least one probe is
complementary to a nucleotide sequence described herein, for
example at least one nucleotide sequence corresponds to a gene or
locus described herein. For example, the gene or locus is shown or
listed in an example, a figure, or a table herein. In various
embodiments, the gene or locus is at least one selected from the
group of: SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ
ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID
NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO: 14, SEQ
ID NO:15, SEQ ID NO:16, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:19,
SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:22, SEQ ID NO:23, SEQ ID
NO:24, SEQ ID NO:25, SEQ ID NO:26, SEQ ID NO:27, SEQ ID NO:28, SEQ
ID NO:29, SEQ ID NO:30, SEQ ID NO:31, SEQ ID NO:32, SEQ ID NO:33,
SEQ ID NO: 34, SEQ ID NO:35, SEQ ID NO:36, SEQ ID NO:37, SEQ ID
NO:38, SEQ ID NO:39, SEQ ID NO:40, SEQ ID NO:41, SEQ ID NO:42, SEQ
ID NO:43, SEQ ID NO:44, SEQ ID NO:45, SEQ ID NO:46, SEQ ID NO:47,
SEQ ID NO:48, SEQ ID NO:49, SEQ ID NO:50, SEQ ID NO:51, SEQ ID
NO:52, SEQ ID NO:53, SEQ ID NO: 54, SEQ ID NO:55, SEQ ID NO:56, SEQ
ID NO:57, SEQ ID NO:58, SEQ ID NO:59, SEQ ID NO:60, SEQ ID NO:61,
SEQ ID NO:62, SEQ ID NO:63, SEQ ID NO:64, SEQ ID NO:65, SEQ ID
NO:66, SEQ ID NO:67, SEQ ID NO:68, SEQ ID NO:69, SEQ ID NO:70, SEQ
ID NO:71, SEQ ID NO:72, SEQ ID NO:73, SEQ ID NO: 74, SEQ ID NO:75,
SEQ ID NO:76, SEQ ID NO:77, SEQ ID NO:78, SEQ ID NO:79, SEQ ID
NO:80, SEQ ID NO:81, SEQ ID NO:82, SEQ ID NO:83, SEQ ID NO:84, SEQ
ID NO:85, SEQ ID NO:86, SEQ ID NO:87, SEQ ID NO:88, SEQ ID NO:89,
SEQ ID NO:90, SEQ ID NO:91, SEQ ID NO:92, SEQ ID NO:93, SEQ ID NO:
94, SEQ ID NO:95, SEQ ID NO:96, SEQ ID NO:119, SEQ ID NO:120, SEQ
ID NO:121, SEQ ID NO:122, SEQ ID NO:123, SEQ ID NO:124, SEQ ID
NO:125, SEQ ID NO:126, SEQ ID NO:127, SEQ ID NO:128, SEQ ID NO:129,
SEQ ID NO:130, SEQ ID NO:131, SEQ ID NO:132, SEQ ID NO:133, SEQ ID
NO:134, SEQ ID NO:135, SEQ ID NO: 136, SEQ ID NO:137, SEQ ID
NO:138, SEQ ID NO:139, and SEQ ID NO:140.
[0046] The instructions in a related embodiment of the kit include
methods for: reacting genomic DNA in the sample with the bisulfite
reagent to convert unmethylated cytosine residues to uracil;
hybridizing resulting bisulfite treated genomic DNA with probes
immobilized to the surfaces to obtain resulting hybridized probes
on the array, such that the DNA probes hybridize to a DNA sequence
of each of a methylated form and an unmethylated form of a CpG
dinucleotide sequence in a gene of the plurality of genes; and
detecting the methylation status of the CpG dinucleotide sequence,
thereby estimating proportions of leukocyte types in the sample
from the subject for assessing the disease condition of the
subject.
[0047] In a related embodiment of the kit the instructions for
detecting the methylation status of the CpG dinucleotide sequence
include methods for: extending each hybridized probe of the
resulting hybridized probes on the array by primer extension to
obtain a resulting primer extension product; ligating the resulting
primer extension product to an oligonucleotide complementary to the
DNA sequence of a 3' region of the gene to obtain a resulting
template for PCR on the array; and amplifying by PCR and measuring
amount of resulting PCR product, thereby detecting the methylation
status of the CpG dinucleotide sequence.
[0048] In another related embodiment of the kit, the instructions
for amplifying by PCR include methods for: amplifying the resulting
template on the array using primer pairs having a 5' primer
specific to each of the methylated or the unmethylated form of the
CpG dinucleotide containing gene, and a 3' primer specific to the
gene containing the CpG dinucleotide, thereby resulting in a first
PCR product; amplifying the resulting first PCR product with
differentially labeled 5' primers that specifically amplify each of
the methylated and unmethylated forms of the gene containing the
CpG dinucleotide sequence, and a common 3' primer, resulting in a
differentially labeled second PCR product; and hybridizing the
second PCR product to the CpG dinucleotide containing gene for
measuring the amount of the second PCR product, to detect the
methylation status of the CpG dinucleotide sequence.
[0049] Instructions for detecting the methylation status of the CpG
dinucleotide sequence, in another related embodiment of the kit,
include methods for extending the resulting hybridized probes on
the array by single base primer extension with a labeled
nucleotide.
[0050] Another embodiment of the invention is a method of treating
a subject for a disease condition, in which the subject is a human
patient and the disease condition is a cancer, the
method comprising: obtaining signatures comprising differentially
methylated regions (DMRs) from types of leukocytes in a blood
sample of the patient, the types of leukocytes comprising at least
one selected from: CD19+ B lymphocyte, CD15+ granulocyte, CD14+
monocyte, CD56^dim Natural Killer cell, CD56^bright Natural
Killer cell, and CD3+ T lymphocyte, and from a healthy control
human subject not having the cancer; comparing a signature specific
for the type of leukocyte in the patient with that in the healthy
subject, such that the type of leukocyte specific signature is an
indication of amount of cells of the type of leukocyte circulating
in blood, and such that a decreased amount of the cells of the type
of leukocyte circulating in the blood of the patient compared to
the healthy subject is an indicium of the cancer; and,
administering a composition comprising the cells of the type of
leukocyte to the patient, thereby increasing the amount of the
cells of the type of leukocyte in the patient and treating the
cancer.
[0051] In various embodiments of the method the leukocyte type
is the CD56^dim Natural Killer cell.
[0052] The cancer in related embodiments of the method is head and
neck squamous cell carcinoma (HNSCC). In embodiments of the method
the DMR signature specific for CD56^dim Natural Killer cells
includes at least one CpG dinucleotide in a region near the
promoter of gene NKp46. In other embodiments of the method the DMR
signature specific for CD56^dim Natural Killer cells is a CpG
dinucleotide in a region near the promoter of the gene NKp46, such
that the methylation status of the CpG dinucleotide is quantified by
methylation specific quantitative polymerase chain reaction
(MS-qPCR) using primers and probes having SEQ ID NOs: 116-118 and
97-99. According to other embodiments of the method, the DMR
signature specific for CD56^dim Natural Killer cells is a CpG
dinucleotide in a region near the promoter of the gene NKp46, such
that the methylation status of the CpG dinucleotide is quantified
by digital PCR involving emulsion and nanofluidic partitioning
using primers and probes having SEQ ID NOs: 116-118 and 97-99.
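In digital PCR with emulsion or nanofluidic partitioning, the fraction p of positive partitions is usually converted to mean target copies per partition with the Poisson correction $\lambda = -\ln(1 - p)$, because a partition may receive more than one template. A minimal sketch, assuming two assays read out in the same set of partitions (one detecting the demethylated and one the methylated form of the NKp46 CpG), that returns the demethylated fraction; the function names and partition counts are hypothetical, and the primer and probe chemistry is not modeled.

```python
import math


def copies_per_partition(positive: int, total: int) -> float:
    """Poisson-corrected mean target copies per partition: lambda = -ln(1 - p)."""
    if total <= 0 or not 0 <= positive < total:
        # A fully saturated run (every partition positive) cannot be quantified.
        raise ValueError("need total > 0 and 0 <= positive < total")
    return -math.log(1.0 - positive / total)


def demethylated_fraction(pos_demeth: int, pos_meth: int, total: int) -> float:
    """Fraction of target copies that are demethylated at the assayed CpG."""
    lam_d = copies_per_partition(pos_demeth, total)
    lam_m = copies_per_partition(pos_meth, total)
    if lam_d + lam_m == 0:
        raise ValueError("no target copies detected")
    return lam_d / (lam_d + lam_m)


if __name__ == "__main__":
    # Hypothetical run: 20,000 partitions, 3,000 demethylated-positive,
    # 1,000 methylated-positive.
    print(round(demethylated_fraction(3000, 1000, 20000), 3))
```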
[0053] In related embodiments of the method the blood sample is
archival. Alternatively the blood sample is fresh.
[0054] In various embodiments of the method, the signature
comprises at least one gene or locus described or shown in examples
herein, for example SEQ ID NO: 1-96 and 119-140. In various
embodiments of the method, the at least one gene or locus is
selected from the group consisting of: FGD2, HLA-DOB, BLK, IGSF6,
CLDN15, SFT2D3, ZNF22, CEL, HDC, GSG1, ECM1, OSBPL5, LDB2, NCR1,
EPS8L3, CD3D, PPP6C, CD3G, TXK, and FAIM. In various embodiments of
the method, the at least one gene or locus is selected from the
group consisting of: CLEC9A (2 loci), INPP5D, INHBE, UNQ473,
SLC7A11, ZNF22, XYLB, HDC, RGR, SLCO2B1, C1orf54, TM4SF19, IGSF6,
KRTHA6, CCL21, SLC11A1, FGD2, TCL1A, MGMT, CD19, LILRB4, VPREB3,
FLJ10379, HLA-DOB, EPS8L3, SHANK1, CD3D (2 loci), CHRNA3, CD3G (2
loci), RARA, and GRASP.
BRIEF DESCRIPTION OF THE DRAWINGS
[0055] FIG. 1 is a photograph of a clustering heatmap for External
Validation White Blood Cell Data (S.sub.0). The data were obtained
by applying the measurement error formulation described in Examples
1-3. The method distinguishes effects resulting from immune cell
distribution from those resulting from other "non-cell type"
alterations in DNA methylation. The methylation array assay was
carried out using Infinium HumanMethylation27 BeadChip microarrays
from Illumina, Inc. (San Diego, Calif.). The white blood cell data
were gathered from a set of 46 commercially obtained samples of
purified leukocyte subtypes. Light=unmethylated (Y.sub.hj=0),
black=partially methylated (Y.sub.hj=0.5), dark=methylated
(Y.sub.hj=1).
[0056] FIG. 2 is a chart of the results of cell mixture
reconstruction experiments validating prediction of individual
sample profiles. The reconstruction experiments involved six known
mixtures of monocytes and B cells and six known mixtures of
granulocytes and T cells. The known percentage of each cell type
(Expected) and the corresponding prediction from Infinium 27K
profiles (Observed) are shown by shade (dark=100%, white=0%).
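By way of illustration only, the reconstruction in FIG. 2 can be approximated for a two-component mixture with the following Python sketch. It assumes noise-free beta values and a single unknown fraction estimated by least squares and clipped to [0, 1]; it is a simplification, not the measurement error formulation of Examples 1-3, and the function name and reference profiles are hypothetical.

```python
def estimate_fraction(mixture, profile_a, profile_b):
    """Least-squares fraction of cell type A in a two-component mixture.

    Models mixture ~ w * profile_a + (1 - w) * profile_b across CpG sites
    and returns the w that minimizes squared error, clipped to [0, 1].
    """
    diff = [a - b for a, b in zip(profile_a, profile_b)]
    resid = [y - b for y, b in zip(mixture, profile_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("reference profiles are identical at all sites")
    w = sum(r * d for r, d in zip(resid, diff)) / denom
    return min(1.0, max(0.0, w))


# Hypothetical beta values at four CpG sites for two purified cell types.
monocyte = [0.9, 0.1, 0.8, 0.2]
b_cell = [0.1, 0.9, 0.2, 0.7]
# A noise-free 30% monocyte / 70% B cell mixture.
mix = [0.3 * m + 0.7 * b for m, b in zip(monocyte, b_cell)]
print(round(estimate_fraction(mix, monocyte, b_cell), 3))  # 0.3
```

With real arrays, measurement noise and more than two cell types would call for a constrained multi-component fit rather than this closed-form projection.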
[0057] FIG. 3 is a photograph of a clustering heatmap for Target
HNSCC data (S.sub.1). The target data set S.sub.1 consisted of
arrays applied to whole blood specimens collected in a random
subset of individuals involved in an ongoing population-based
case-control study (Peters et al., 2005) of head and neck cancer
(HNSCC): 92 cases and 92 age- and sex-matched controls. Blood was
drawn at enrollment (prior to treatment in 85% of the cases).
Yellow or light areas represent unmethylated (Y.sub.hj=0), black
areas represent partially methylated (Y.sub.hj=0.5), gray areas
represent methylated (Y.sub.hj=1). The annotation track above the
heatmap indicates case-control status.
[0058] FIG. 4 is a graphical representation of bias sensitivity
analysis for HNSCC Data. Bias was assessed by resampling the case
coefficients of B.sub.1, a procedure that assumes maximum bias. The
abscissa shows the number of assumed non-zero alterations. The
knob-shaped central portion of each thick vertical line (red)
indicates the median value, the thick vertical lines (blue)
indicate the interquartile range, the thin lines (blue) represent
95% probability ranges, and the upper dots (black) represent 99%
probability ranges.
[0059] FIG. 5A and FIG. 5B are graphs of the rate of convergence of
the Hessian matrix H.sub.m, which allows determination of the
optimal number of CpG sites whose combined methylation status
measurements most accurately reflect the distribution of
different cell types in a mixture. The x-axis represents increasing m,
the number of CpG sites (ordered by F-statistic) included in the
model space, on a logarithmic scale.
[0060] FIG. 5A shows convergence of the Hessian matrix as a function
of the number of CpG sites included in the measurement. The
dotted line shows the tangent at low values of m.
[0061] FIG. 5B shows the rate of convergence, which was calculated
by smoothing the first differences of log.sub.10(tr H.sub.m). The
dotted line (red) corresponds to linear convergence.
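The rate-of-convergence calculation described for FIG. 5B might be sketched as follows. The smoother is not specified above, so a centered moving average is assumed here; the function name and the example traces are hypothetical.

```python
import math


def convergence_rate(traces, window=3):
    """Smoothed first differences of log10(tr H_m) over successive values of m.

    traces: trace of the Hessian H_m for each m, in increasing order of m.
    window: odd width of a centered moving average (an assumed smoother).
    """
    logs = [math.log10(t) for t in traces]
    diffs = [b - a for a, b in zip(logs, logs[1:])]  # first differences
    half = window // 2
    smoothed = []
    for i in range(len(diffs)):
        lo, hi = max(0, i - half), min(len(diffs), i + half + 1)
        smoothed.append(sum(diffs[lo:hi]) / (hi - lo))  # window shrinks at the edges
    return smoothed


# Hypothetical traces that double at each step of the m grid.
traces = [5.0 * 2 ** k for k in range(6)]
print([round(r, 3) for r in convergence_rate(traces)])  # [0.301, 0.301, 0.301, 0.301, 0.301]
```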
[0062] FIG. 6 is a photograph of a clustering heatmap for Target
Ovarian Cancer data (S.sub.1) (Teschendorff et al., 2009, PLoS ONE
4, e8274). Only those cases were included in which blood was
collected pre-treatment. After removing four arrays with a
preponderance of missing values, the data set consisted of 272
controls and 129 cases having blood drawn prior to treatment.
Light=unmethylated (Y.sub.hj=0), black=partially methylated
(Y.sub.hj=0.5), dark=methylated (Y.sub.hj=1). The annotation track
above the heatmap indicates case-control status (cancer case or
control).
[0063] FIG. 7 is a photograph of a clustering heatmap for Target
Down Syndrome Data. The method herein was applied to a trisomy 21
(Down syndrome) data set (Kerkel et al., PLoS Genet. 2010,
6(11):e1001212) consisting of total peripheral blood leukocyte
samples from 29 Down syndrome cases and 21 controls, as well as six T
cell samples from cases and four T cell samples from controls (GEO
Accession number GSE25395). Light=unmethylated (Y.sub.hj=0),
black=partially methylated (Y.sub.hj=0.5), dark=methylated
(Y.sub.hj=1). The annotation track above the heatmap indicates
case-control and cell type status [Down syndrome case (whole
blood), control (whole blood), T cell (pooled cases and
controls)].
[0064] FIG. 8 is a photograph of a clustering heatmap for Target
Obesity Data obtained from applying the methods herein to an
obesity data set (Wang et al., BMC Med 2010, 8:87) having 7 lean
African-Americans and 7 obese African-Americans (GEO Accession
number GSE25301). Light areas represent unmethylated (Y.sub.hj=0),
black areas represent partially methylated (Y.sub.hj=0.5), grey
areas represent methylated (Y.sub.hj=1). The annotation track above
the heatmap indicates case-control status (obese and lean).
[0065] FIG. 9 is a photograph of the methylation profiles of white
blood cells obtained from a DNA methylation array analysis
described in Example 9. Methylation array assay was performed using
Infinium HumanMethylation27 Beadchip Microarrays obtained from
Illumina, Inc. (San Diego, Calif.). The number of individual
leukocyte samples in each methylation class is shown in the table
to the right. The DNA methylation profile distinguishes lymphocytes
from myeloid-derived leukocytes. The 5000 most variable CpG loci
are plotted on the left. Less methylated loci are shown in grey and
more methylated loci in black. A recursively partitioned mixture
model (RPMM) was fit to the Infinium beta values of autosomal CpG
loci from sorted human peripheral blood leukocytes using R version
2.11.1 and Illumina's software, which provides convenient mechanisms
for loading and analyzing methylation results and for quality
control and basic visualization tasks.
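Selecting the most variable CpG loci for display, as in FIG. 9 (where k = 5000), could look like the sketch below. It assumes a complete data set with no missing values, in which case population and sample variance give the same ranking; the CpG identifiers are placeholders, and fitting the RPMM itself is not shown.

```python
from statistics import pvariance


def most_variable_loci(betas, k):
    """Return the IDs of the k CpG loci with the largest variance across samples.

    betas: dict mapping a CpG ID to a list of beta values (one per sample).
    """
    ranked = sorted(betas, key=lambda cpg: pvariance(betas[cpg]), reverse=True)
    return ranked[:k]


example = {
    "cg_a": [0.1, 0.9, 0.5],  # highly variable
    "cg_b": [0.5, 0.5, 0.5],  # constant
    "cg_c": [0.2, 0.4, 0.3],  # moderately variable
}
print(most_variable_loci(example, 2))  # ['cg_a', 'cg_c']
```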
[0066] FIG. 10A and FIG. 10B are graphical representations of the
DNA methylation status of regions in CD3E and CD3Z genes.
[0067] FIG. 10A shows DNA methylation status of a region in CD3E
that was identified from the DNA methylation array analysis (the
results of which are shown in FIG. 9) as one of the two candidate
DMRs with specificity towards CD3+ T cells. The DNA methylation
status was measured by pyrosequencing of bisulfite-converted DNA
from different sorted human peripheral blood leukocytes.
[0068] FIG. 10B shows DNA methylation status of a region in the CD3Z
gene that was identified from the DNA methylation array analysis
(the results of which are shown in FIG. 9) as one of the two
candidate DMRs with specificity towards CD3+ T cells. The DNA
methylation status of this CD3Z region in different sorted human
peripheral blood leukocytes was measured by MethyLight.RTM.
qPCR.
[0069] FIG. 11 is a drawing of the genomic region containing CD3Z
gene, based on information available from the public databases
UniProt, RefSeq and GenBank. UniProt is a freely accessible
universal protein resource of protein sequence and functional
information. RefSeq is a collection that provides an integrated and
annotated set of sequences including genomic DNA, transcripts and
protein. GenBank.RTM. is the genetic sequence database of the
National Institutes of Health, which contains an annotated
collection of publicly available DNA sequences.
[0070] FIG. 12 is a list of genomic regions used for measuring
methylation of the CD3Z and FOXP3 genes and for quantitating genome
copy numbers, together with the corresponding primer and probe
sequences. Underlined letters are the "C" in CpG motifs.
[0071] FIG. 13 A, FIG. 13B and FIG. 13C are graphical
representations of standard calibration curves which show the
relationship between copy numbers of genomic DNA and the signal
obtained from quantitative real time methylation specific PCR. The
calibration curves are used for quantifying CD3+ T cells, Tregs
(FOXP3 demethylated) and ratios of Tregs/CD3+ T cells. DNA isolated
from purified cell types was bisulfite converted and serially
diluted into a background of fully methylated commercial DNA
standard (Qiagen). The total genomic copy numbers of each sample
within a dilution series remained constant. Log dilutions were
performed over the range of Ct values expected for test samples
(whole blood, tumor specimens). Genome copy numbers for each test
standard were measured using cytosine-less (C-less) primers to
ensure adequate input DNA and to normalize the CD3+ and Treg assay
values.
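The calibration described for FIG. 13A-C follows the usual log-linear qPCR standard-curve relationship, Ct = slope x log10(copies) + intercept. A minimal Python sketch, assuming an ordinary least-squares fit and normalization by simple division by the C-less copy estimate (both assumptions; the function names and values are hypothetical):

```python
def fit_standard_curve(log10_copies, ct_values):
    """Ordinary least-squares fit of Ct = slope * log10(copies) + intercept."""
    n = len(log10_copies)
    mx = sum(log10_copies) / n
    my = sum(ct_values) / n
    sxx = sum((x - mx) ** 2 for x in log10_copies)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log10_copies, ct_values))
    slope = sxy / sxx
    return slope, my - slope * mx


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate input copies from a Ct value."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope):
    """PCR efficiency implied by the slope (1.0 means perfect doubling per cycle)."""
    return 10 ** (-1 / slope) - 1


def normalized_fraction(target_copies, cless_copies):
    """Fraction of input genomes carrying the demethylated target (e.g. CD3Z)."""
    return target_copies / cless_copies


# Hypothetical dilution series: 10 to 100,000 copies.
slope, intercept = fit_standard_curve([1, 2, 3, 4, 5], [36.7, 33.4, 30.1, 26.8, 23.5])
cd3z = copies_from_ct(30.1, slope, intercept)  # about 1000 copies
```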
[0072] FIG. 13A shows the calibration curve for C-less total input
(N=eight replicates); error bars denote the standard error of the
mean Ct value.
[0073] FIG. 13B shows dilution of isolated normal PanT cells
(N=seven replicates).
[0074] FIG. 13C shows dilution and calibration curve for isolated
CD3+CD25+ T cells (N=eight replicates). Calibration curves (FIG.
13A-C) were used to estimate total input copies, CD3+ T cell and
Tregs copies, respectively.
[0075] FIG. 14A-D are a drawing and a set of graphical
representations showing detection of CD3+ T cell numbers by
measuring differential demethylation using MS-qPCR.
[0076] FIG. 14A is a schematic diagram showing methylation specific
primers and probe targeting six CpGs (lollipops) in a region of the
CD3Z gene identified herein as demethylated in CD3+ T cells.
[0077] FIG. 14B shows results of real-time PCR. Real-time PCR Ct
values decreased linearly with each ten-fold increase in the
concentration of bisulfite-converted CD3+ T cell DNA.
Bisulfite-converted universal methylated DNA was used to keep the
total amount of DNA in the samples constant. At least five
replicates of each sample were plotted.
[0078] FIG. 14C shows correlation between T cell levels determined
by flow cytometry and CD3Z MS-qPCR. Evaluation of CD3+ T cell level
by flow cytometry was observed to be highly correlated with T cell
quantification by CD3Z MS-qPCR in whole blood specimens from glioma
patients and healthy donors.
[0079] FIG. 14D shows the correlation between T cell counts obtained
by immunohistochemical staining and by CD3Z MS-qPCR. CD3+ T cell
count by immunohistochemical staining correlates with T cell
quantification by CD3Z MS-qPCR in excised tumors across
histological subtypes. Pearson correlations and F-test p-values are
shown in FIG. 14B-D.
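For FIG. 14B-D, the F-test on the slope of a simple linear regression is equivalent to testing Pearson r = 0, with F = r^2 (n - 2) / (1 - r^2) on (1, n - 2) degrees of freedom. The sketch below computes both quantities with the standard library only; a p-value could then be obtained from an F distribution (for example, scipy.stats.f.sf(F, 1, n - 2)). The data are invented.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def regression_f_statistic(r, n):
    """F statistic on (1, n - 2) df for a simple regression with correlation r."""
    return r * r * (n - 2) / (1 - r * r)


# Hypothetical paired measurements (e.g. flow cytometry vs. MS-qPCR).
flow = [1, 2, 3, 4]
msqpcr = [1, 3, 2, 4]
r = pearson_r(flow, msqpcr)
print(round(r, 2))  # 0.8
```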
[0080] FIG. 15 A, FIG. 15B and FIG. 15C (FIG. 15A-C) are graphical
representations showing T cells and Tregs in the peripheral blood
of glioblastoma multiforme (GBM) patients and healthy donors
determined by MS-qPCR for demethylation of specific CpG loci.
[0081] FIG. 15A shows comparison of T cell numbers in blood between
GBM patients and control subjects measured using CD3Z demethylation
assay.
[0082] FIG. 15B shows comparison of Tregs between GBM patients and
control subjects measured using FOXP3 demethylation assay.
[0083] FIG. 15C is a graph showing comparison of Treg percent of T
cells between GBM patients and control subjects determined by the
ratio of FOXP3/CD3Z demethylation. Wilcoxon rank sum p-values are
shown.
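Treg percent of T cells (FIG. 15C) is described as the ratio of FOXP3 to CD3Z demethylation, and group comparisons use the Wilcoxon rank sum test. A sketch of both, assuming the ratio is taken on demethylated copy numbers and using a large-sample normal approximation without continuity correction (statistical software typically uses an exact test for small samples, so p-values may differ); the function names are hypothetical.

```python
import math


def treg_percent(foxp3_copies, cd3z_copies):
    """Treg percent of T cells as the ratio of FOXP3 to CD3Z demethylated copies."""
    return 100.0 * foxp3_copies / cd3z_copies


def rank_sum_test(x, y):
    """Wilcoxon rank sum test, normal approximation. Returns (W, two-sided p).

    W is the rank sum of sample x; tied values receive their average rank.
    """
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average 1-based rank within a tie run
        i = j + 1
    w = sum(r for r, (_, g) in zip(ranks, combined) if g == 0)
    n1, n2 = len(x), len(y)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    return w, math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
```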
[0084] FIG. 16 A, FIG. 16B and FIG. 16 C (FIG. 16A-C) are graphical
representations showing association between cigarette smoking and
peripheral blood T cells and Tregs in glioma patients and healthy
donors determined by MS-qPCR for demethylation of specific CpG
loci.
[0085] FIG. 16A shows a comparison of peripheral blood T cell
levels, determined by CD3Z demethylation, among never, former and
current cigarette smokers stratified by glioma case status
(indicated "cases" on the abscissa).
[0086] FIG. 16B shows a comparison of peripheral blood Treg levels,
determined by FOXP3 demethylation, among never, former and current
cigarette smokers stratified by glioma case status.
[0087] FIG. 16C shows a comparison of peripheral blood Treg percent
of T cells, determined by ratio of FOXP3 to CD3Z demethylation,
among never, former and current cigarette smokers stratified by
glioma case status. Wilcoxon rank sum p-values are shown.
[0088] FIG. 17A, FIG. 17B and FIG. 17C (FIG. 17A-C) are graphical
representations showing levels of T cell and Treg infiltrates in
excised glioma tumors determined by MS-qPCR for demethylation of
specific CpG loci.
[0089] FIG. 17A shows T cell levels, determined by CD3Z
demethylation, in solid glioma samples stratified by tumor
grade.
[0090] FIG. 17B shows Treg levels, determined by FOXP3
demethylation, in solid glioma samples stratified by tumor
grade.
[0091] FIG. 17C shows Treg percent of T cells, determined by ratio
of FOXP3 to CD3Z demethylation, in solid glioma samples stratified
by tumor grade. Wilcoxon rank sum p-values are shown.
[0092] FIG. 18A, FIG. 18B and FIG. 18C (FIG. 18A-C) are graphical
representations of flow cytometry analysis of CD3+ T cells and
total leukocytes in whole blood from glioma cases and controls.
[0093] FIG. 18A shows a forward and side scatter plot of a
representative blood sample showing gating for lymphocytes and
counting beads.
[0094] FIG. 18B shows lymphocyte subpopulation observed using
gating for CD3 expression.
[0095] FIG. 18C shows CD45 gating on non-bead events. CD45-low and
CD45-high events were summed to obtain the total CD45+ cell count.
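FIG. 18A-C refer to counting beads and to summing CD45-low and CD45-high events. One common way to turn such gated events into absolute counts is the single-platform bead ratio shown below; the bead count per tube and the sample volume are assumptions, as the text does not state them, and the function names are hypothetical.

```python
def absolute_count(cell_events, bead_events, beads_per_tube, sample_volume_ul):
    """Cells per microliter from a counting-bead flow cytometry acquisition."""
    return cell_events / bead_events * beads_per_tube / sample_volume_ul


def total_cd45(cd45_low_events, cd45_high_events):
    """Total CD45+ events as the sum of the CD45-low and CD45-high gates."""
    return cd45_low_events + cd45_high_events


# Hypothetical acquisition: 2000 CD3+ events, 10000 bead events,
# 50000 beads added to 100 microliters of whole blood.
cd3_per_ul = absolute_count(2000, 10000, 50000, 100)  # about 100 cells/ul
```

A T cell fraction of leukocytes would then follow from dividing the CD3+ count by the total CD45+ count obtained the same way.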
[0096] FIG. 19A-C are photographs and a line graph that show
immunohistochemical (IHC) staining of a representative GBM
specimen.
[0097] FIG. 19A shows CD3 staining. The average number of cells
positive for staining was 418.
[0098] FIG. 19B shows CD8 staining. The average number of cells
positive for staining was 296.
[0099] FIG. 19C shows the correlation of CD3 and CD8 staining
(Pearson r=0.992).
[0100] FIG. 20 is a set of two heatmaps showing results of MS-qPCR
and bisulfite pyrosequencing of magnetic activated cell sorting
(MACS) sorted human leukocyte subsets. Abbreviations: B=B
lymphocytes, Gran=Granulocytes, Neut=Neutrophils, Mono=Monocytes,
NK=CD56+ Natural killer cells, Nkdim=CD16+CD56dim natural killer
cells, NKbr=CD16-CD56bright natural killer cells, NK8+=CD8+CD56+
natural killer cells, NK8-=CD8-CD56+ natural killer cells,
NKT=CD3+CD56+ natural killer T cells, T=CD3+ T lymphocytes,
CD8=CD3+CD8+ T lymphocytes (cytotoxic T cells), CD4=CD3+CD4+ T
lymphocytes (helper T cells), Treg=CD3+CD4+CD25+FOXP3+ regulatory T
cells.
[0101] FIG. 20A is a heatmap of DNA methylation in FOXP3 and CD3Z
gene regions assessed by MS-qPCR.
[0102] FIG. 20B is a heatmap of DNA methylation at three CpG loci
in the CD3Z gene assessed by bisulfite pyrosequencing.
[0103] FIG. 21A-C are graphical representations showing levels of T
cell and Treg infiltrates in glioma tissues stratified by
histological subtype determined by MS-qPCR for demethylation of
specific CpG loci. Abbreviations: PA=Pilocytic Astrocytoma,
EP=Ependymoma, OD=Oligodendroglioma, OA=Oligoastrocytoma,
AS=Astrocytoma, GBM=Glioblastoma multiforme. Kruskal-Wallis one-way
analysis of variance by ranks p-values are shown.
[0104] FIG. 21A shows T cell levels determined by CD3Z
demethylation in solid glioma samples stratified by tumor
histology.
[0105] FIG. 21B shows Treg levels determined by FOXP3 demethylation
in solid glioma samples stratified by tumor histology.
[0106] FIG. 21C shows Treg percent of T cells, determined by ratio
of FOXP3 to CD3Z demethylation in solid glioma samples stratified
by histology.
[0107] FIG. 22A-C are graphical representations showing Kaplan-Meier
analysis of survival time of glioma patients stratified
according to whether the level of T cells or Tregs in the tumor
infiltrates of the patients is above or below the median level of
T cells or Tregs, respectively. Log-rank p-values are shown.
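The stratified survival analysis of FIG. 22A-C can be sketched as a median split followed by a product-limit (Kaplan-Meier) estimate for each group. This sketch assumes that levels equal to the median are assigned to the low group and does not compute the log-rank test; the names and data are hypothetical.

```python
from statistics import median


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times: follow-up times; events: 1 if death observed, 0 if censored.
    Returns a list of (time, survival probability) at each death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # deaths and censored subjects leave the risk set
    return curve


def split_at_median(levels):
    """Label each patient 'high' (above median) or 'low' (at or below median)."""
    cut = median(levels)
    return ["high" if v > cut else "low" for v in levels]
```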
[0108] FIG. 22A shows survival (ordinate) of glioma patients as a
function of time (abscissa) in relation to T cell levels as
determined by CD3Z demethylation.
[0109] FIG. 22B shows survival of glioma patients in relation to
Treg levels as determined by FOXP3 demethylation.
[0110] FIG. 22C shows survival of glioma patients in relation to
Treg percent of T cells as determined by ratio of FOXP3 to CD3Z
demethylation.
[0111] FIG. 23A-B are representations of results obtained from
analysis of DMRs of leukocyte subtypes.
[0112] FIG. 23A shows a heat map of the methylation status for the
highest ranked 50 leukocyte DMRs by leukocyte subtype.
[0113] FIG. 23B shows a plot of the -log.sub.10(p-values) for
the highest ranked 50 leukocyte DMRs across three cancer data sets
(HNSCC, ovarian, bladder). The p-values (ordinate) test for methylation
differences between cancer cases and non-cancer controls and were
obtained from individual unconditional logistic regression models
fit to each of the 50 leukocyte DMRs. For the HNSCC data set,
logistic regression models were adjusted for patient age, gender,
smoking status (never, former, current), smoking pack years, weekly
alcohol consumption, and HPV serology status. The bladder cancer
data set was adjusted for patient age, gender, smoking status,
smoking pack years, and family history of bladder cancer. The
ovarian cancer data set was adjusted for patient age group (55-60,
60-65, 65-70, 70-75 and >75 years). The horizontal dashed line
represents -log.sub.10(0.05).
[0114] FIG. 24A-B show results obtained from the DMR profile
analysis of the HNSCC data set determining methylation class
membership.
[0115] The left column of FIG. 24A shows a heat map of the HNSCC testing
data set. Rows represent subjects, which are grouped by predicted
methylation class membership. Columns represent the highest ranked
50 leukocyte DMRs that were used to generate the methylation
classes for the HNSCC testing set. The right column of FIG. 24A is a
bar plot depicting the percentage of cancer cases and controls across
the predicted methylation classes in the HNSCC testing set.
[0116] FIG. 24B shows receiver operating characteristic (ROC)
curves in the HNSCC testing set based on the predicted methylation
classes alone and on the methylation classes plus patient age, gender,
smoking status (never, former, current), smoking pack years, weekly
alcohol consumption, and HPV serostatus.
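An ROC area such as those in FIG. 24B can be computed from any per-subject score via the Mann-Whitney formulation. How predicted methylation classes are converted into a score is not stated above; one assumption would be the fitted case probability from a logistic model containing the class indicators (and covariates, where included). A standard-library sketch with a hypothetical function name:

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney formulation.

    scores: higher means more likely to be a case; labels: 1 = case, 0 = control.
    Each case/control pair scores 1 if the case is ranked higher, 0.5 if tied.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))
```

The pairwise loop is quadratic in sample size, which is acceptable for data sets of a few hundred subjects; a rank-based computation would scale better.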
[0117] FIG. 25A-B show results obtained from the DMR profile
analysis of the Ovarian data set for determining methylation class
membership.
[0118] FIG. 25A is a heat map of the ovarian testing data set. Rows
represent subjects, which are grouped by predicted methylation class
membership. Columns represent the highest ranked ten leukocyte DMRs
that were used to generate the methylation classes for the ovarian
testing set. The right column of FIG. 25A is a bar plot depicting the
percentage of cancer cases and controls across the predicted
methylation classes in the ovarian testing set.
[0119] FIG. 25B shows ROC curves based on the predicted methylation
classes alone in the ovarian testing set and methylation classes
plus patient age group (55-60, 60-65, 65-70, 70-75 and >75
years).
[0120] FIG. 26A-B show results obtained from the DMR profile
analysis of the bladder data set for determining methylation class
membership.
[0121] FIG. 26A is a heat map of the bladder testing data set. Rows
represent subjects, which are grouped by predicted methylation
class membership. Columns represent the highest ranked 56 leukocyte
DMRs that were used to generate the methylation classes for the
bladder testing set. The right column of FIG. 26A is a bar plot
depicting the percentage of cancer cases and controls across the
predicted methylation classes in the bladder testing set.
[0122] FIG. 26B shows ROC curves based on the predicted methylation
classes alone in the bladder testing set and methylation classes
plus patient age, gender, smoking status (never, former, current),
smoking pack years, and family history of bladder cancer.
[0123] FIG. 27A-C are graphical representations showing image plots
of the pairwise Spearman correlation coefficients.
[0124] FIG. 27A shows the six CpG loci identified by HNSCC analysis
(Langevin S M et al., Epigenetics. 2012 March; 7(3):291-9) and the
highest ranked 50 leukocyte DMRs used in the present analysis.
[0125] FIG. 27B shows the seven CpG loci identified by the
alternative ovarian analysis and the highest ranked ten leukocyte
DMRs used in the present analysis.
[0126] FIG. 27C shows the nine CpG loci identified by the bladder
analysis reported in (Shen L et al., 2007 PLoS genetics
3:2023-2036) and the highest ranked 56 leukocyte DMRs used in the
present analysis.
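The pairwise Spearman coefficients in FIG. 27A-C are Pearson correlations computed on ranks. A minimal sketch, assuming tied beta values receive their average rank; the function names and any CpG identifiers passed in are hypothetical.

```python
def average_ranks(values):
    """1-based ranks, with tied values receiving their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5


def correlation_matrix(loci):
    """Pairwise Spearman coefficients for a dict of CpG ID -> beta values."""
    ids = list(loci)
    return {(a, b): spearman(loci[a], loci[b]) for a in ids for b in ids}


print(spearman([0.1, 0.4, 0.35, 0.8], [0.2, 0.5, 0.3, 0.9]))  # 1.0
```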
[0127] FIG. 28 is a schematic diagram showing hierarchy of
leukocyte subtypes and sample sizes for each of the leukocyte
subtypes used in the analysis for determination of methylation
class membership.
[0128] FIG. 29 is a diagram representing the analytic workflow for the
HNSCC data set (n=184; 92 HNSCC cases and 92 cancer-free controls).
The full HNSCC data set was first divided into equally sized
training and testing sets. The training sets were used in
development of a classifier based on leukocyte DMRs. The resulting
classifiers were then used to predict methylation class membership
for the observations in the respective independent testing sets.
The phenotypic importance of the predicted methylation classes in
the testing data was examined subsequently.
[0129] FIG. 30 is a diagram representing the analytic workflow for the
ovarian cancer data set (n=401; 128 ovarian cancer cases and 273
cancer-free controls). The full ovarian cancer data set was divided
into equally sized training and testing sets. The training sets
were used in the development of a classifier based on leukocyte
DMRs. The resulting classifiers were then used to predict
methylation class membership for the observations in the respective
independent testing sets. The phenotypic importance of the
predicted methylation classes in the testing data was then
examined.
[0130] FIG. 31 is a diagram representing the analytic workflow of
the bladder cancer data set (n=460; 223 bladder cancer cases and 237
cancer-free controls). The full bladder cancer data set was divided
into equally sized training and testing sets. The training sets
were used in the development of a classifier based on leukocyte
DMRs. The resulting classifiers were then used to predict
methylation class membership for the observations in the respective
independent testing sets. The phenotypic importance of the
predicted methylation classes in the testing data was then
examined.
[0131] FIG. 32 is a diagram illustrating Semi-Supervised
Recursively Partitioned Mixture Models (SS-RPMM) for predicting
methylation class membership. The full methylation dataset was
randomly divided into training and testing sets. Using the training
data only, univariate models (adjusted for potential confounders)
were used to identify CpG loci whose methylation is most strongly
associated with the clinical variable of interest (i.e.,
case/control status). RPMM was then fit to the training data using
the M CpGs most strongly associated with the clinical variable of
interest, where M was determined using a nested cross-validation
procedure. The resulting solution was then used in conjunction
with an empirical Bayes classifier to predict methylation class
membership for the observations in the testing data.
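The first steps of the SS-RPMM workflow (a random split, then ranking CpGs using the training data only) might be sketched as below. The univariate models described above are adjusted for potential confounders; this sketch swaps in an unadjusted Welch t statistic for brevity, and the nested cross-validation for M, the RPMM fit and the empirical Bayes classifier are not shown. All names are hypothetical.

```python
import random
from statistics import mean, variance


def split_train_test(ids, seed=0):
    """Randomly split subject IDs into two equally sized halves."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def welch_t(a, b):
    """Welch two-sample t statistic (an unadjusted stand-in for the models above)."""
    return (mean(a) - mean(b)) / (variance(a) / len(a) + variance(b) / len(b)) ** 0.5


def top_m_cpgs(betas, status, train_ids, m):
    """Rank CpGs by |t| between training cases and controls; keep the top m.

    betas: dict CpG ID -> dict subject ID -> beta value.
    status: dict subject ID -> 1 (case) or 0 (control).
    """
    scores = {}
    for cpg, values in betas.items():
        cases = [values[s] for s in train_ids if status[s] == 1]
        controls = [values[s] for s in train_ids if status[s] == 0]
        scores[cpg] = abs(welch_t(cases, controls))
    return sorted(scores, key=scores.get, reverse=True)[:m]
```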
[0132] FIG. 33A-D show results obtained from SS-RPMM analysis (see
FIG. 30) of the ovarian cancer data set for determination of
methylation class membership.
[0133] FIG. 33A is a heatmap of the testing set, ordered by
methylation class as predicted using the SS-RPMM procedure. Rows
represent subjects and columns represent the seven CpG loci
identified by this analysis.
[0134] FIG. 33B shows the percentage of cases and controls in each
predicted methylation class in the testing set.
[0135] FIG. 33C shows information regarding the seven CpG loci
identified by the SS-RPMM analysis.
[0136] FIG. 33D shows a ROC/AUC (area under the curve) analysis
based on the predicted methylation class memberships in the testing
set. Dark represents the ROC/AUC based on the predicted methylation
classes alone, and light represents the ROC/AUC using the predicted
methylation classes and patient age group.
[0137] FIG. 34 is a graphical representation showing loci in the
gene NKp46 chosen from candidate NK cell-specific differential DNA
methylation markers, selected by DNA methylation and mRNA
expression criteria.
[0138] Linear mixed effects modeling of DNA methylation microarray
data from MACS isolated human leukocytes generated a coefficient
estimating differential methylation in NK cells relative to other
cell subtypes, shown on the abscissa. Linear modeling of mRNA
microarray data from the same isolated cells determined the log-fold
change in expression between NK cells and each of the following
subtypes: T cells, B cells, granulocytes and monocytes. The average
of these four log-fold change values is shown on the ordinate.
Significance for a particular gene region was achieved when
q<0.1 in all four mRNA expression linear models as well as in the DNA
methylation mixed effects model. Candidates for NK cell-specific
DNA methylation biomarkers were limited to significant gene loci
exhibiting decreased methylation in NK cells (methylation
estimate<0) and within genes that exhibited increased RNA
expression (log fold change>1). The candidate loci are marked
with asterisks in the top left quadrant, and NKp46 loci are marked
with grey asterisks.
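The candidate-selection rule described for FIG. 34 can be written as a simple filter. The dictionary keys are hypothetical, and the sketch assumes the log fold change threshold is applied to the average of the four comparisons plotted on the ordinate.

```python
def nk_candidates(loci):
    """Select NK cell-specific candidate loci using the FIG. 34 criteria.

    Each value in loci is a dict with hypothetical keys:
      'meth_est' - mixed-model methylation coefficient, NK vs. other cells
      'meth_q'   - q-value for that coefficient
      'expr_lfc' - four log-fold changes (NK vs. T, B, granulocytes, monocytes)
      'expr_q'   - the four corresponding q-values
    """
    selected = []
    for name, d in loci.items():
        significant = d["meth_q"] < 0.1 and all(q < 0.1 for q in d["expr_q"])
        mean_lfc = sum(d["expr_lfc"]) / len(d["expr_lfc"])
        # Decreased methylation in NK cells and increased NK expression.
        if significant and d["meth_est"] < 0 and mean_lfc > 1:
            selected.append(name)
    return selected
```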
[0139] FIG. 35 is a heatmap showing demethylation status of NKp46
determined by methylation specific quantitative PCR (MS-qPCR) of
isolated human leukocyte populations. Individual samples of
MACS-purified white blood cell subtypes were subjected to an MS-qPCR
assay that detects demethylated copies of NKp46 DNA. The extent of
NKp46 methylation is illustrated in this heatmap, in which light
indicates that copies of DNA in a particular sample were demethylated
in the targeted region of NKp46, and dark indicates that copies
were methylated.
[0140] FIG. 36 is a line graph showing linearity of NKp46 MS-qPCR
calibration. Bisulfite-converted universal methylated DNA was used
to hold the total amount of DNA in each sample constant.
At least three replicates of each standard are plotted. Real time
PCR Ct values decrease linearly with ten-fold increase in bisulfite
converted NK cell DNA concentration.
[0141] FIG. 37 is a bar graph showing prevalence of HNSCC by normal
NKp46 demethylation tertile. Normal NKp46 demethylation tertile
cutoffs were determined from control blood samples only. Higher
tertiles indicate higher NK cell levels. HNSCC prevalence
(ordinate) refers to the percent of total cases in this example
whose NKp46 demethylation measurements fell within each control-derived
tertile range. The displayed p-value is from a chi-squared test
for trend in proportions.
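The tertile analysis of FIG. 37 could be sketched as follows: cutoffs are taken from control values only, and a chi-squared test for trend (Cochran-Armitage form, 1 degree of freedom) is applied to the proportion of cases in each tertile. The quantile method (the statistics.quantiles default) and the equally spaced scores 1, 2, 3 are assumptions, and the function names are hypothetical.

```python
import math
from statistics import quantiles


def control_tertile_cutoffs(control_values):
    """Two cutoffs splitting control measurements into thirds."""
    return quantiles(control_values, n=3)


def assign_tertile(value, cutoffs):
    """Return 1, 2 or 3 for the control-derived tertile containing value."""
    low, high = cutoffs
    if value <= low:
        return 1
    return 2 if value <= high else 3


def trend_test(cases, totals, scores=(1, 2, 3)):
    """Chi-squared test for trend in proportions (1 df). Returns (chi2, p).

    cases[i] / totals[i] is the proportion of cases in tertile i.
    """
    n = sum(totals)
    p_bar = sum(cases) / n
    s_bar = sum(s * m for s, m in zip(scores, totals)) / n
    t = sum(s * (c - m * p_bar) for s, c, m in zip(scores, cases, totals))
    var = p_bar * (1 - p_bar) * sum(m * (s - s_bar) ** 2 for s, m in zip(scores, totals))
    chi2 = t * t / var
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-squared, 1 df
```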
[0142] FIG. 38 is a heatmap showing methylation status of selected
NKp46 CpG loci measured by bisulfite pyrosequencing of isolated
human leukocytes. The methylation status of eight individual CpG
loci near the promoter region of NKp46 was interrogated by
pyrosequencing of bisulfite-converted DNA extracted from magnetic
activated cell sorting (MACS) isolated human leukocyte populations.
CpG numbers 2 through 7 represent the six loci targeted in the
MS-qPCR assay. This heatmap displays methylation levels at each
locus ranging from unmethylated (light) to methylated (dark).
[0143] FIG. 39 is a graph showing percent demethylation (ordinate)
of a DNA region in NKp46 in control and HNSCC patient blood samples
(abscissa) assessed by MS-qPCR. The NKp46 MS-qPCR assay measures
the extent of DNA demethylation. A higher level of demethylation
indicates a higher level of NK cells within a sample. Wilcoxon rank
sum p-value is displayed.
[0144] FIG. 40 is a listing of DNA sequences of regions in 96
different genes, each sequence having one CpG dinucleotide shown
within square brackets and used to determine methylation status of
the gene. The DNA sequence surrounding each CpG dinucleotide was
used to design the array probes and the primers for performing
the methods for analyzing differential methylation. Also included
are the names of the genes, the number of the chromosome on which
each gene is located, the source of the DNA
sequences, GenBank accession numbers, and the coordinate of the
CpG dinucleotide in respective genes.
[0145] FIG. 41A-B are schematic diagrams showing different ways of
representing effects on measured DNA methylation due to an exposure
or a specific phenotype.
[0146] FIG. 41A depicts the marginal effects (β) on measured
DNA methylation. The marginal effects are not adjusted for
white blood cell (WBC) distribution.
[0147] FIG. 41B depicts the effects on measured DNA methylation
adjusted for WBC distribution resulting from exposure or a specific
phenotype.
[0148] FIG. 42 is a set of graphical representations showing the
relationship between α̂ and β̂, the effects on measured DNA
methylation not adjusted or adjusted for WBC distribution, for each
covariate of interest (e.g., age, current smoker status, toenail
arsenic concentration and dye use) across autosomal CpGs. The shading
of each dot represents overall methylation as indicated by the first
component of the coefficient vector β̂, which corresponds to the
intercept (Example 38): light=low, black=moderate, dark=high. The
diagonal straight line represents identity (α̂=β̂). The curve
depicts a loess fit to the scatter plot.
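The distinction in FIG. 41 and FIG. 42 between effects not adjusted and adjusted for WBC distribution can be illustrated with a toy, noise-free simulation in which an exposure shifts the proportion of one cell type. Here "adjustment" simply means adding the known cell proportion as a regression covariate, which is a simplification of the approach in the Examples; the data and function names are hypothetical.

```python
def ols(columns, y):
    """Least-squares coefficients [intercept, b1, b2, ...] for y ~ columns."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    # Augmented normal equations (X'X | X'y).
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        for r in range(p):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [v - f * u for v, u in zip(a[r], a[c])]
    return [a[i][p] / a[i][i] for i in range(p)]


exposure = [0, 0, 0, 1, 1, 1]
# Hypothetical proportion of one cell type; exposure raises it by 0.3 on average.
cell_prop = [0.2, 0.3, 0.1, 0.55, 0.45, 0.5]
# Methylation depends on the cell proportion plus a small direct exposure effect.
meth = [0.1 + 0.5 * w + 0.05 * e for w, e in zip(cell_prop, exposure)]

unadjusted = ols([exposure], meth)[1]           # about 0.20
adjusted = ols([exposure, cell_prop], meth)[1]  # about 0.05
```

Most of the unadjusted exposure effect in this toy example is carried by the shift in cell composition, which is the kind of discrepancy the comparison in FIG. 42 is designed to reveal.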
[0149] FIG. 43A-B are a graphical representation showing
fluorescence intensities of CD3Z gene amplified by digital droplet
PCR, and a graphical representation showing concentration of CD3Z
gene in PCR samples.
[0150] FIG. 43A shows a fluorescence intensity dot plot for
amplification of CD3Z gene by detection of intensities of 6 FAM
(6-Carboxyfluorescein). Positive and negative droplets are
distinguished by a horizontal line.
[0151] FIG. 43B shows the correlation between the concentration of
CD3Z copies measured from 6 FAM fluorescence intensities and the
expected copy numbers of CD3Z obtained by dilution of a known amount
of DNA from CD3+ T cells.
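The copy concentrations reported for FIG. 43-46 rest on the Poisson correction used in droplet digital PCR: the mean number of copies per droplet is lambda = -ln(negative droplets / total droplets), which is then divided by the droplet volume. A sketch, in which the 0.85 nl droplet volume is an assumed placeholder that should be replaced with the instrument's value and the function name is hypothetical:

```python
import math


def ddpcr_concentration(positive, total, droplet_volume_nl=0.85):
    """Copies per microliter of reaction from droplet counts (Poisson correction).

    droplet_volume_nl is an assumed droplet volume in nanoliters.
    """
    if positive >= total:
        raise ValueError("all droplets positive; concentration out of range")
    lam = -math.log(1 - positive / total)  # mean copies per droplet
    return lam / (droplet_volume_nl * 1e-3)  # nanoliters -> microliters
```

The correction matters because a positive droplet may contain more than one template copy; simply counting positive droplets would underestimate concentration at high occupancy.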
[0152] FIG. 44A-B are a graphical representation showing
fluorescence intensities of FoxP3 gene amplified by digital droplet
PCR, and a graphical representation showing concentration of FoxP3
gene in PCR samples.
[0153] FIG. 44A shows a fluorescence intensity dot plot for
amplification of FoxP3 gene by detection of intensities of 6 FAM
(6-Carboxyfluorescein). Positive and negative droplets are
distinguished by a horizontal line.
[0154] FIG. 44B shows the correlation between the concentration of
FoxP3 copies measured from 6 FAM fluorescence intensities and the
expected copy numbers of FoxP3 obtained by dilution of a known amount
of DNA from CD3+ T cells.
[0155] FIG. 45A-B are a graphical representation showing
fluorescence intensities of NKp46 gene amplified by digital droplet
PCR, and a table showing concentration of NKp46 gene in the PCR
samples amplified under different conditions.
[0156] FIG. 45A shows a fluorescence intensity dot plot for
amplification of NKp46 gene under different conditions by detection
of intensities of 6 FAM (6-Carboxyfluorescein). Positive and
negative droplets are distinguished by a horizontal line.
[0157] FIG. 45B is a table showing concentration of NKp46 gene in
copies/.mu.l determined under different PCR conditions as fractions
of methylated control DNA.
[0158] FIG. 46A-B are a graphical representation showing
fluorescence intensities of NKp46 gene amplified by digital droplet
PCR, and a table showing concentration of NKp46 gene in the PCR
samples amplified under different conditions.
[0159] FIG. 46A shows a fluorescence intensity dot plot for
amplification of NKp46 gene by detection of intensities of 6 FAM
(6-Carboxyfluorescein). Amplification of the demethylated NKp46
locus was performed using C-less and NKp46 DMR-specific primers and
probes, and the results were compared. Positive and negative droplets are
distinguished by a horizontal line.
[0160] FIG. 46B is a table showing the concentration of the NKp46 gene
in copies/µl determined with whole blood DNA, neutrophil DNA,
CD16+CD56^dim NK cell DNA and CD16+CD56^bright NK cell DNA.
[0161] FIG. 47 is a drawing of the processing and workflow of the 85
venous whole blood samples analyzed in the Examples herein.
Eighty-five venous whole blood samples were collected from
disease-free human donors. Of these, 79 samples were used to isolate a
target cell type by magnetic activated cell separation (MACS), and six
samples were subjected to conventional immune profiling, in which
fresh aliquots were analyzed by protein-based methods. Purity of the
79 MACS-isolated samples was confirmed by fluorescence activated cell
sorting (FACS). The six samples analyzed by conventional immune
profiling were each placed in 12 different storage conditions that
differ by anticoagulant, temperature, and/or duration.
[0162] DNA was extracted from each of the 79 samples analyzed by
FACS and from the 72 samples held in the 12 storage conditions.
Aliquots of genomic DNA from five of the 79 FACS-purified,
DNA-extracted samples were combined in quantities mimicking human
blood, thereby artificially reconstituting peripheral blood. Aliquots
of each of seven of these cell DNA mixtures, of the 79 FACS-purified
DNA-extracted samples, and of the 72 samples stored under the 12
storage conditions were randomized. Aliquots of each of the resulting
158 samples were treated with sodium bisulfite for analysis of the
methylation status of cytosines in DNA. Aliquots of 58 of these
samples were analyzed using a high-density methylation microarray
(HDMA), and aliquots of all 158 samples were analyzed using a
low-density methylation microarray (LDMA).
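The sample counts in [0161] and [0162] can be cross-checked with simple arithmetic. The sketch below assumes that the 12 storage conditions are the anticoagulant and handling combinations listed for FIG. 58 (heparin, citrate, or EDTA, each fresh or stored overnight at room temperature, 4° C., or -80° C.). That pairing is an inference from the FIG. 58 description, not something [0161] states explicitly.

```python
from itertools import product

# Counts stated for the FIG. 47 workflow.
donors = 85
macs_purified = 79        # MACS-isolated samples, purity confirmed by FACS
profiled_donors = 6       # samples sent to conventional immune profiling
dna_mixtures = 7          # reconstructed blood-like DNA mixtures

# Assumed storage conditions, taken from the FIG. 58 description:
# three anticoagulants x (fresh or one of three overnight storage temps).
anticoagulants = ["heparin", "citrate", "EDTA"]
handling = ["fresh", "overnight RT", "overnight 4 C", "overnight -80 C"]
conditions = list(product(anticoagulants, handling))

assert macs_purified + profiled_donors == donors
stored = profiled_donors * len(conditions)
bisulfite_treated = macs_purified + stored + dna_mixtures
print(len(conditions), stored, bisulfite_treated)  # 12 72 158
```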
[0163] FIG. 48A-P are a set of graphs of representative FACS
results for purified WBC subsets used in examples herein. The lower
right quadrant of each panel indicates sample purity. The upper
right quadrant of each panel indicates the viability of the cells
in the sample.
[0164] FIG. 49 is a diagram representing MACS purified WBC subset
samples used to establish reference libraries of DNA methylation
signatures. Terminal nodes represent the final sample cell types,
which were each purified from a specimen of disease-free human
blood. The tree diagram indicates the hierarchical relationship of
sample cell lineages. Pan* samples were not subsequently selected
in the MACS separation process, and therefore contained a
biological mixture of subsets within the cell type immediately
above them in the tree.
[0165] FIG. 50 is a photograph of a clustering heatmap for WBC
lineage-specific DNA methylation. DNA methylation signatures
distinguishing normal human leukocyte subtypes were obtained using
a high-density DNA methylation microarray. Purified WBC subset
samples are displayed in FIG. 50 in columns with cell type
indicated at the bottom on the x-axis. Individual CpG loci are
displayed in rows with the gene containing each locus indicated to
the right on the y-axis. Methylation values from completely
unmethylated (represented by gray areas) to completely methylated
(represented by dark areas) are indicated in the key at the bottom
left. Samples and loci were organized according to unsupervised
hierarchical clustering.
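The unsupervised hierarchical clustering used to order the heatmap can be illustrated with a small agglomerative routine. The sketch below uses Euclidean distance with average linkage on per-sample methylation (beta) values. The document does not state which distance or linkage was used, so both choices are assumptions, and the function names are hypothetical. A heatmap normally clusters the loci (rows) as well, which can be done by applying the same routine to the transposed data.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length methylation profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(profiles):
    """Agglomerative clustering with average linkage.

    profiles: dict mapping sample name -> list of beta values, one per CpG
    locus, in the same locus order for every sample.
    Returns the merges as (cluster_a, cluster_b, distance); a merged
    cluster is named by joining its member names with '+'.
    """
    clusters = {name: (name,) for name in profiles}  # cluster id -> members
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            pairs = [(p, q) for p in clusters[a] for q in clusters[b]]
            # Average linkage: mean of all pairwise member distances.
            d = sum(euclidean(profiles[p], profiles[q]) for p, q in pairs) / len(pairs)
            if best is None or d < best[2]:
                best = (a, b, d)
        a, b, d = best
        merged = clusters.pop(a) + clusters.pop(b)
        clusters["+".join(merged)] = merged
        merges.append((a, b, d))
    return merges
```

With profiles from two clearly distinct lineages, samples of the same lineage merge first, and the final merge joins the two lineage clusters. This is the grouping that places samples of the same cell type next to each other in the heatmap.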
[0166] FIG. 51 is a photograph of DNA methylation signatures
distinguishing normal human leukocyte subtypes that were obtained
using a custom, low-density DNA methylation microarray. Purified WBC
subset samples are displayed in FIG. 51 in columns with cell type
indicated at the bottom on the x-axis. Individual CpG loci are
displayed in rows with the gene containing each locus indicated to
the right on the y-axis. Methylation values from completely
unmethylated (represented by gray areas) to completely methylated
(represented by dark areas) are indicated in the key at the bottom
left. Samples and loci were organized according to unsupervised
hierarchical clustering.
[0167] FIG. 52 is a photograph of a crosscheck of purified WBC
subset samples that was obtained using a high-density DNA
methylation microarray. The quantity of each of seven WBC subsets
(displayed on the abscissa) was predicted in the purified WBC
subset samples using DNA methylation. The true identity of each
purified WBC subset sample is shown on the ordinate, as indicated
to the right. Saturation of the interior bins indicates the
estimated proportions of WBC subsets, determined using DNA
methylation, in purified WBC subset samples, as shown in the key at
the bottom right.
[0168] FIG. 53 is a photograph of a crosscheck of purified WBC
subset samples that was obtained using a custom, low-density DNA
methylation microarray. The quantity of each of seven WBC subsets
(displayed on the abscissa) was predicted in the purified WBC
subset samples using DNA methylation. The true identity of each
purified WBC subset sample is shown on the ordinate, as indicated
to the right. Saturation of the interior bins indicates the
estimated proportions of WBC subsets, determined using DNA
methylation, in purified WBC subset samples, as shown in the key at
the bottom right.
[0169] FIG. 54A-D are graphs showing quantitative reconstructions
of leukocyte subsets that were obtained using a high-density DNA
methylation microarray. In FIG. 54A-D, the abscissa displays
quantities of specific WBC subsets determined using DNA
methylation. Cell type is indicated by color (light and dark grays)
and sample type is indicated by the shapes listed in the insets.
Lines are drawn from the origin with a slope of one, indicating ideal
correspondence between the displayed values in each panel. FIG. 54A
contains data for DNA from purified WBC subsets that were combined
in quantities mimicking human blood under clinical conditions; the
expected quantity of each cell type is plotted on the ordinate.
Whole blood samples from disease-free human donors were subjected
to WBC subset quantification by the described methods.
Granulocytes were observed to make up the highest percentage of
leukocytes (50-60%) compared with B cells, T cells, NK cells and
monocytes (less than about 40%). FIG. 54B-D are graphs of data for
whole blood samples from disease-free human donors subjected to WBC
subset quantification by established methods: manual 5-part
differential (FIG. 54B); automated 5-part differential (FIG. 54C);
and FACS (FIG. 54D). The five WBC quantities measured using DNA
methylation were observed to be very close to the values obtained by
the established methods. In FIG. 54B-D, neutrophils made up the
highest percentage of leukocytes (50-60%) compared with
lymphocytes, monocytes, and B cells. The methods herein detected
specific, clinically relevant modulations in peripheral blood immune
cell composition.
[0170] FIG. 55A-D are a set of graphs of quantitative
reconstructions of leukocyte subsets obtained using a custom,
low-density DNA methylation microarray. The abscissa indicates the
quantities of specific WBC subsets determined using DNA methylation.
Cell type is indicated by shading and sample type is indicated by the
shape of the datum point, as described in the inset legends. Lines
are drawn from the origin with a slope of one, indicating ideal
correspondence between the displayed values in each panel. The
expected quantity of each cell type is indicated by the ordinate.
FIG. 55A is a graph of DNA from purified WBC subsets that were
combined in quantities mimicking human blood under 19 clinical
conditions. In FIG. 55A, granulocytes made up the highest percentage
of leukocytes (50-60%) compared with B cells, T cells, NK cells and
monocytes (less than about 20%). FIG. 55B-D are graphs of data for
whole blood samples from disease-free human donors subjected to WBC
subset quantification by the following methods: manual 5-part
differential (FIG. 55B); automated 5-part differential (FIG. 55C);
and FACS (FIG. 55D). In FIG. 55B-D, neutrophils were observed to make
up the highest percentage of leukocytes (about 60%) compared with
other cell types including lymphocytes, monocytes, eosinophils,
basophils, T cells, NK cells, and B cells.
[0171] FIG. 56A-C are a set of graphs of comparisons of
conventional immune cell quantification methods. Cell type is
indicated by shading and disease-free human blood donor is
indicated by shape of the point, as described in the legends to the
right. Lines are drawn from the origin. A slope of one indicates
ideal correspondence between the displayed values in each panel.
The following methods were compared: manual 5-part differential and
CBC with automated 5-part differential (FIG. 56A); manual 5-part
differential and FACS (FIG. 56B); and CBC with automated 5-part
differential and FACS (FIG. 56C).
[0172] FIG. 57A-G are a set of graphs showing Bland-Altman
agreement of immune cell quantification methods/assays applied to
whole blood samples from disease-free human donors. Each data point
corresponds to one WBC subset in one blood sample. The abscissa
indicates the mean WBC subset quantity (percent) determined by the
two given methods, and the ordinate indicates the difference between
the WBC subset quantities (percent) determined by the two methods.
The root-mean-square error (RMSE) between the two given methods is
shown at the top left, in units of WBC subset quantity (percent). The
data in FIG. 57A show agreement between DNA methylation measurements
obtained from the low-density methylation microarray (LDMA) and the
known amounts of each cell type in laboratory-constructed DNA
mixtures. FIG. 57B-D contain data that indicate agreement between
immune cell quantification using DNA methylation (DNAm) from the
custom, low-density DNA methylation microarray and either manual
5-part differential (FIG. 57B), CBC with automated 5-part
differential (FIG. 57C), or FACS (FIG. 57D). FIG. 57E-G contain data
that indicate agreement among the following immune cell
quantification methods: CBC with automated 5-part differential and
FACS (FIG. 57E); manual 5-part differential and FACS (FIG. 57F); and
manual 5-part differential and CBC with automated 5-part
differential (FIG. 57G).
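The Bland-Altman quantities described for FIG. 57 (per-point mean on the abscissa, per-point difference on the ordinate, and an RMSE between methods) can be computed as in the sketch below. It assumes the RMSE is the root mean square of the paired differences and adds the conventional bias and 95% limits of agreement (bias ± 1.96 SD). Neither formula is spelled out in the document, and the function name is hypothetical.

```python
import math


def bland_altman(method_a, method_b):
    """Bland-Altman summary for paired WBC subset percentages.

    Each list holds one value per (blood sample, WBC subset) pair, in
    percent, in the same order for both methods.
    """
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need two paired lists with at least two points")
    means = [(a + b) / 2 for a, b in zip(method_a, method_b)]   # abscissa
    diffs = [a - b for a, b in zip(method_a, method_b)]         # ordinate
    n = len(diffs)
    bias = sum(diffs) / n
    # Sample standard deviation of the differences.
    sd = math.sqrt(sum((d - bias) ** 2 for d in diffs) / (n - 1))
    # Root mean square of the paired differences between the two methods.
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    return {
        "means": means,
        "diffs": diffs,
        "bias": bias,
        "limits": (bias - 1.96 * sd, bias + 1.96 * sd),
        "rmse": rmse,
    }
```

For example, `bland_altman(dnam_percent, facs_percent)["rmse"]` would give a single agreement figure of the kind printed at the top left of each panel, where both input lists are hypothetical paired measurements.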
[0173] FIG. 58 is a diagram showing details of the workflow followed
in the methods herein for whole blood samples from disease-free human
donors. The samples were subjected to the following methods of WBC
subset quantification for comparison with quantitative reconstruction
of WBC subsets using DNA methylation by the methods herein. Venous
whole blood was collected from a disease-free human donor, and
aliquots of the sample were contacted with heparin, citrate, or
EDTA. Each of the heparin, citrate, or EDTA samples was maintained
either as a fresh sample or as a sample stored overnight at room
temperature, at 4° C., or at -80° C. The fresh heparin sample was
analyzed for WBC subsets by flow cytometry, manual differential WBC
counting, automated differential WBC counting, a high-density
methylation microarray (HDMA), and a low-density methylation
microarray (LDMA). All other samples, namely the fresh citrate and
EDTA samples and the heparin, citrate, and EDTA samples stored
overnight at room temperature, 4° C., or -80° C., were each analyzed
for WBC subsets using the HDMA and LDMA.
[0174] FIG. 59A-D are a set of graphs showing comparisons of immune
cell quantification by DNA methylation for samples treated with
different blood anticoagulants and storage conditions. Blood
samples were from disease-free human donors. Lines are drawn from
the origin with a slope of one, indicating ideal correspondence
between the displayed values in each panel. Cell type is indicated
by the shading and shape of the datum point. FIG. 59A shows DNA
methylation data for blood samples treated with citrate (open
circle) or EDTA (open square) as an anticoagulant. FIG. 59B-D show
DNA methylation data for blood samples treated with heparin
(FIG. 59B), EDTA (FIG. 59C), or citrate (FIG. 59D) as an
anticoagulant and stored under different conditions. The samples
were stored at room temperature (open circle), at 4° C. (open
square), or at -80° C. (open triangle). WBC subset data for fresh
samples were observed to be comparable to those for samples treated
with different anticoagulants. Further, the WBC subset data for
samples stored at room temperature were observed to be comparable to
those for samples stored at 4° C. and -80° C.
DETAILED DESCRIPTION OF THE INVENTION
[0175] A model of hematopoiesis includes an early restriction point
at which multipotent progenitor cells become committed to either
lymphoid or myeloid lineages. The standard methods of
distinguishing immune cell lineages are inadequate for fully
distinguishing lineage commitment and the process of
hematopoiesis.
[0176] Epigenetics refers to heritable control of gene expression
that occurs without changing the sequence of DNA. Chromatin
packaging is a mechanism of epigenetic gene regulation which has
been implicated in cell lineage commitment and lineage-specific
gene expression. Transcriptionally inactive, or silenced,
heterochromatin is more tightly packaged around histone proteins
than transcriptionally active euchromatin due to differences in DNA
methylation patterns and post-translational histone modifications.
Because it is readily measured, DNA methylation serves as a
marker of chromatin packaging. DNA methylation is largely confined
to cytosine residues in CpG dinucleotides, which, though
underrepresented in the genome, are frequently found in dense
clusters called CpG islands. Less methylated CpG islands are
highly associated with transcriptional activity and subsequent gene
expression, and more methylated CpG islands are highly associated
with transcriptional inactivity and gene silencing. Methylation of
CpG dinucleotides causes chromatin to become more compact and
inaccessible to transcription machinery by moving histones and
altering the organization of chromatin and nucleosomes.
(Christensen, B. C., et al., 2009, PLoS Genet 5, e1000602; Schmidl,
C., et al., 2009, Genome Res 19, 1165-1174).
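The depletion of CpG dinucleotides in the genome, and their enrichment in CpG islands, is usually quantified as an observed/expected CpG ratio. The sketch below computes that ratio and GC content, then applies a widely used heuristic for calling a CpG island: length of at least 200 bp, GC content above 50%, and observed/expected CpG above 0.6, in the style of Gardiner-Garden and Frommer. These thresholds are general background, not criteria taken from this document, and the function names are hypothetical.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence.

    Observed/expected CpG = (#CG * length) / (#C * #G); values well below 1
    indicate CpG depletion, values near or above 1 indicate no depletion.
    """
    s = seq.upper()
    n = len(s)
    c, g = s.count("C"), s.count("G")
    cg = s.count("CG")  # "CG" cannot overlap itself, so a plain count is exact
    gc_fraction = (c + g) / n if n else 0.0
    obs_exp = (cg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Heuristic CpG-island call; thresholds are assumed, commonly used values."""
    gc, oe = cpg_stats(seq)
    return len(seq) >= min_len and gc > min_gc and oe > min_obs_exp
```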
[0177] In some instances, the overall balance of leukocyte
subclasses in circulation or in tissue most prominently influences
pathogenesis. For example, incipient cancer cells are recognized
and eliminated by cytotoxic T cells (CTLs) and natural killer (NK)
cells, and tumorigenesis is also promoted by certain other
inflammatory cells, including B-lymphocytes, mast cells,
neutrophils, regulatory T cells (Tregs), and others. These cells
have been shown to promote angiogenesis, tumor cell proliferation,
tissue invasion and metastasis (Hanahan and Weinberg 2011, Cell,
144, 646-74; Ostrand-Rosenberg, 2008, Curr Opin Genet Dev, 18,
11-18). Likewise, higher levels of NK cells and CTLs circulating in
the blood and residing in adipose tissues are associated with lower
incidence of metabolic diseases such as type II diabetes (Lynch et
al., 2009, Obesity, 17, 601-5), and higher levels of M1 macrophages
in adipose tissue can induce inflammation and insulin resistance
(Anderson et al., 2011, Curr Opin Lipidol. 21, 172-177). Methods of
quantifying the composition of lymphocyte populations can be
informative regarding the underlying immuno-biology of disease
states as well as the immune response to chronic medical
conditions (Chua et al., 2011, Br J Cancer 104, 1288-1295).
[0178] The methods described herein provide a measurement of
individual human or animal immune cell numbers or immune cell
ratios in diverse biologic media without requiring viable cells,
cell sorting, or the use of any antibodies or protein markers. The
methods are applicable to blood, including unsorted blood that is
fresh, or that is frozen or unfrozen anticoagulant-treated peripheral
whole blood, finger-stick blood, non-anticoagulant-treated whole
blood, blood clots, isolated mononuclear cells, buffy coat, and
archival Guthrie card neonatal blood; to a sample that is a spot, is
fresh or frozen, or is from a tumor, such as a formalin-fixed tumor
biopsy; and to urine sediment, CNS fluid, fat, or other tissue
biopsy.
[0179] In one embodiment the methods described herein are provided
as diagnostic kits for testing laboratories in the form of immune
cell-specific detection reagents, premixed and optimized
plate-formatted multiplex assays for immune profiling compatible with
specific instrument platforms, applications for in vitro
diagnostics of blood, CNS, urine or bronchoalveolar lavage and
point of care blood sampling kits for mail-in immune testing and
immune monitoring.
[0180] The simplified DNA-based immunodiagnostic approach provided
herein uses much smaller volumes of blood than earlier methods
required, and the samples require no processing. These samples can
simply be "spotted" onto a solid-phase carrier and transported by
mail or by courier.
[0181] In another embodiment, the methods described include
development of software that can process the output data of immune
specific methylation assays to create immune parameter reports by
comparison to different reference and control values.
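As a rough illustration of the report-generation step, the sketch below compares estimated cell-type fractions with reference intervals and flags each result as low, high, or normal. The function name, the dictionary layout, and the use of fixed low/high intervals are all assumptions: the document does not say how reference and control values are represented, and any intervals would have to come from real reference cohorts.

```python
def immune_report(estimates, reference_ranges):
    """Flag each cell-type estimate against a reference interval.

    estimates: dict cell type -> estimated fraction of leukocytes (0-1).
    reference_ranges: dict cell type -> (low, high) reference interval.
    Both inputs are hypothetical placeholders for assay output and for
    reference/control values.
    """
    report = []
    for cell_type in sorted(estimates):
        value = estimates[cell_type]
        low, high = reference_ranges.get(cell_type, (None, None))
        if low is None or high is None:
            flag = "no reference"
        elif value < low:
            flag = "LOW"
        elif value > high:
            flag = "HIGH"
        else:
            flag = "normal"
        report.append((cell_type, round(value, 3), flag))
    return report
```

Keeping the reference intervals as an input, rather than hard-coding them, lets the same routine compare a sample against different reference or control populations, as [0181] contemplates.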
[0182] In an alternate embodiment, the methods herein describe a
discovery platform that is a bioinformatic integration of
empirically derived genome-wide methylation analyses with
publicly available differential gene expression analyses. The
merged datasets are then sorted to produce candidates for further
examination. The discovery platform is useful for discovering
clinically useful gene biomarkers.
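The merge-and-sort step of the discovery platform can be sketched as an inner join of per-gene methylation and expression tables, followed by ranking. The selection rules in this sketch are assumptions made for illustration, not the platform's actual criteria. A candidate must be less methylated in the target cell type than in every other cell type by at least a chosen margin (`min_delta`), and it must be most highly expressed in the target cell type. Candidates are ranked by the size of that methylation contrast.

```python
def rank_candidates(methylation, expression, target, min_delta=0.5):
    """Merge per-gene methylation with expression and rank target markers.

    methylation: dict gene -> dict cell type -> mean beta value (0-1).
    expression:  dict gene -> dict cell type -> expression level.
    Returns (gene, methylation contrast) pairs, largest contrast first.
    The thresholds and rules are illustrative assumptions.
    """
    candidates = []
    for gene in methylation.keys() & expression.keys():  # inner join on gene
        beta, expr = methylation[gene], expression[gene]
        others = [ct for ct in beta if ct != target]
        if target not in beta or target not in expr or not others:
            continue
        # Contrast: least-methylated non-target type minus the target type.
        delta = min(beta[ct] for ct in others) - beta[target]
        top_expr = max(expr, key=expr.get) == target
        if delta >= min_delta and top_expr:
            candidates.append((gene, delta))
    return sorted(candidates, key=lambda item: item[1], reverse=True)
```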
[0183] The methods described herein include a proof-of-principle
test of the discovery platform. The goal set for the test was to
discover a gene or gene set that provides a marker of CD3+ T cells.
The method is applicable to finding a biomarker for any cell type.
Specifically, the platform identifies gene regions that are
"demethylated" within the target cell population (CD3+ T cells) and
completely methylated in non-target cells.
[0184] To accomplish this discovery phase, normal immune cells from
the peripheral blood of different individuals were isolated using
antibody-based flow cytometry cell sorting. Following purification,
each of the immune cell subtypes was subjected to methylation
discovery analysis using the Infinium genome-wide methylation
platform (Infinium® HumanMethylation27 BeadChip Microarray,
developed by Illumina®, Inc., San Diego, Calif.). The DNA
methylation data were then merged with existing gene expression
data. Candidates with high potential to discriminate CD3+ T cells
from non-T cells were then further analyzed with two different
methylation validation methods: pyrosequencing and quantitative
methylation-specific PCR (i.e., MethyLight). Finally, a quantitative
calibration curve was developed by diluting known and measured
numbers of CD3+ T cells into a background matrix of fully methylated
lymphocyte DNA. This procedure reconstructs the detection conditions
that are present when differentiating CD3+ T cells from a mixture of
cells in a complex biological sample.
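The calibration-curve step can be illustrated with an ordinary least-squares line fitted to a dilution series and then inverted to read an unknown sample. In this sketch, the known fraction of CD3+ T cell DNA in each dilution is paired with a measured demethylation signal (for example, from MethyLight or pyrosequencing). A straight-line relationship is assumed here for simplicity and may not hold for every assay. The function names are hypothetical.

```python
def fit_calibration(known_fractions, measured_signals):
    """Ordinary least-squares line: signal = intercept + slope * fraction."""
    n = len(known_fractions)
    if n < 2 or n != len(measured_signals):
        raise ValueError("need at least two paired calibration points")
    mx = sum(known_fractions) / n
    my = sum(measured_signals) / n
    sxx = sum((x - mx) ** 2 for x in known_fractions)
    if sxx == 0:
        raise ValueError("calibration fractions must not all be equal")
    sxy = sum((x - mx) * (y - my)
              for x, y in zip(known_fractions, measured_signals))
    slope = sxy / sxx
    return my - slope * mx, slope


def estimate_fraction(signal, intercept, slope):
    """Invert the calibration line to estimate a sample's target-cell fraction."""
    return (signal - intercept) / slope
```

Diluting into a fully methylated background fixes the zero point of the curve, so the intercept reflects the assay's background signal when no target cells are present.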
[0185] The methods described herein use individual samples of
sorted, normal, human peripheral blood leukocytes, shown in Table
15, Example 13, purchased from AllCells®, LLC (Emeryville,
Calif.). These leukocytes were sorted in a column containing
antibody-conjugated magnetic beads through a combination of
positive and negative selection. DNA was extracted from the
leukocytes using the DNeasy Blood & Tissue kit (Qiagen) according to
the manufacturer's protocol, and was subjected to bisulfite
conversion by treatment with sodium bisulfite using the EZ DNA
Methylation Kit (Zymo) following the manufacturer's protocol,
thereby converting unmethylated cytosine residues to uracil while
leaving methylated cytosine residues intact. DNA methylation was
measured using a DNA methylation microarray as described in Example
13.
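The effect of bisulfite conversion described above can be simulated in silico. Unmethylated cytosines become uracil and are read as thymine after amplification, while methylated cytosines remain cytosine. The sketch models only a single strand and assumes complete conversion. The set of methylated positions is a hypothetical input, and the function name is illustrative.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate sodium bisulfite conversion followed by PCR on one strand.

    Unmethylated cytosines are deaminated to uracil and read as T after
    amplification; cytosines whose 0-based index is in methylated_positions
    are protected and stay C. Complete conversion is assumed.
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)
```

For example, `bisulfite_convert("ACGTCCG", {1})` keeps the methylated cytosine at index 1 and converts the other two, giving `"ACGTTTG"`. Reading C versus T at each CpG after conversion is what allows a methylation assay to score methylation status.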
[0186] Huehn et al. (U.S. patent publication number 2007/0269823
A1) describes a method for identifying FoxP3-positive regulatory T
cells by analyzing the methylation status of CpG positions in the
FOXP3 gene, and further describes a method for diagnosing immune
status of a mammal by measuring amounts of regulatory T cells thus
identified. CpG methylation analysis of FoxP3 gene is also used to
determine the quality of in vitro generated T regulatory cells and
for identifying chemical or biological substances that modulate the
expression of the FOXP3 gene in T cells. Specific CpG positions in
the mouse FoxP3 gene are identified for analyzing methylation
status and primers for amplifying mouse and human CpG dense regions
in FOXP3 gene are described.
[0187] Olek (U.S. patent publication number 2007/0243161 A1)
describes a method for pan-cancer diagnostics involving
identification of an amount and/or proportion of stable regulatory
T cells in a patient suspected of having cancer by analyzing
methylation status of CpG positions in the FOXP3 and/or camta1
genes. Increased amount/proportion of stable regulatory T cells in
the patient is indicative of an unspecified cancerous disease. A
method of treating cancer by reducing the amount or proportion of
stable regulatory T cells and a method for diagnosing survival of a
cancer patient by measuring T regulatory cell amounts and/or
proportions in patients suspected of having cancer using CpG
methylation analysis of FoxP3 and/or camta1 genes are described.
Increased amounts and/or proportions of stable regulatory T cells
in the cancer patient are indicative of shorter survival.
[0188] Olek et al. (International publication number WO 2010/069499
A2) describes a method of identifying T-lymphocytes, in particular
CD3+CD4+ and/or CD3+CD8+ cells by analyzing the methylation status
of CpG positions in one or more of the genes for the CD3 multiprotein
complex subunits CD3γ, CD3δ and CD3ε, or in other genes.
Demethylation is indicative of a CD3+ cell. Olek further describes
methods for methylation analysis of CpG positions in CD4+ and/or
CD8+ genes, in particular CD8 beta gene, or in other genes, and for
determining immune status based on T-lymphocytes identified by
methylation analyses, and for monitoring amounts of T-lymphocytes
in response to chemical and/or biological substance exposure, in
particular CD4+ or CD8+ T lymphocytes.
[0189] Shen-Orr et al. 2010, Nature Methods Vol. 7:4, 287-289
describes a cell-type specific significance analysis of microarrays
for analyzing differential gene expression for each cell type in a
biological sample from microarray data and relative cell type
frequencies. In Shen-Orr's method, the relative abundance of each cell
type in a mixed tissue sample is first quantified, and this
information is used in combination with microarray gene expression
data to deconvolve and compare cell type-specific average
expression profiles for groups of mixed tissue samples.
[0190] Abbas et al. 2009, PLoS One Vol. 4:7 e6098 describes
deconvolution of microarray gene expression data to characterize
proportions of cells in a tissue, and further identifies cellular
activation patterns in Systemic Lupus Erythematosus.
[0191] A method similar to regression calibration is provided
herein for determining changes in the distribution of white blood
cells between different subpopulations (e.g. cases and controls)
using DNA methylation signatures or DNA methylation profiles, in
combination with an external validation set having methylation
signatures from purified leukocyte samples. The method is
demonstrated with Head and Neck Squamous Cell Carcinoma (HNSCC)
cases and matched controls, showing that DNA methylation signatures
register known changes in CD4+ and granulocyte populations.
[0192] DMRs are used herein as markers of immune cell identity,
together with a high-density methylation platform and a set of
analytical tools for estimating the proportions of immune cells in
unfractionated whole blood, to determine the DNA methylation
signature of each of the principal immune components of whole blood
(B cells, granulocytes, monocytes, NK cells, and T cell subsets).
A form of regression calibration was developed that treats a
methylation signature as a high-dimensional multivariate surrogate
for the distribution of white blood cells. This distribution was
used to predict or model disease states. As a surrogate, the DNA
methylation signature was assumed to be a highly correlated measure
of leukocyte distribution, and thus fits into the framework of
measurement error models, in which using a noisy surrogate marker to
investigate an association with a disease outcome of interest
yields biased estimates unless internal or external validation data
are obtained to "calibrate" the model and correct the bias (Carroll
et al., 2006, Measurement Error in Nonlinear Models, Chapman & Hall,
Boca Raton, Fla., 2nd edition).
[0193] In this case, the problem was complicated by the extremely
high dimension of the surrogate. Measurement error problems are
formulated as a set of relationships between $z$, the disease outcome
(e.g. case/control status), $\omega$, the gold standard (e.g.
leukocyte distribution), and $y$, the surrogate (e.g. DNA
methylation). The quantity $E(z \mid \omega)$ was difficult to
estimate because of the cost or logistical complications involved in
obtaining $\omega$ in a large number of samples. Sufficient data were
collected for modeling $E(z \mid y) = f(y)$, which provides
information about $E(z \mid \omega)$ through the (often imperfect)
association $E(y \mid \omega) = g(\omega)$, inferred from an external
validation sample (Thurston et al., 2003, J Stat Plan Inf, 113,
527-34; Carroll et al., 2006, Measurement Error in Nonlinear Models,
Chapman & Hall, Boca Raton, Fla., 2nd edition). An additional
assumption was that $E(z \mid \omega, y) = E(z \mid \omega)$, i.e.
the surrogate provides no information about disease above and beyond
the standard for which it serves as a surrogate. The high-dimensional
nature of $y$ renders $f(y)$ difficult to formulate. Although
multivariate methods of measurement error correction exist, even in
a high-dimensional context (e.g. Li and Yin, 2007, Ann Stat, 35,
2143-72), they require an explicit specification of $f(y)$, which
becomes unwieldy when each component of $y$ contributes a small
amount of information about $z$, and both dimension-reduction
strategies and constrained regression strategies entail substantial
loss of information. In the present context, specification of
$y = f(z)$ is natural and straightforward. Consequently, the modeling
equation is reversed here: $y = f(z)$ is formulated as part of the
modeling strategy, and the linear functions $f$ and $g$ are linked in
a manner that admits the estimation of $\omega$. In the methods
herein, several major sources of possible bias were identified, and
methods are provided for controlling these sources of bias and
subjecting them to sensitivity analysis.
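Linking the mixture data to the purified-cell validation set can be made concrete with a small linear-algebra sketch. This is an illustration, not the full estimator described here. The sketch assumes that the validation set $S_0$ is summarised as a matrix $B_0$ whose rows are CpG loci and whose columns give the mean methylation of each purified cell type. A mixture's methylation row vector $y$ is then projected onto $B_0$ by unconstrained least squares, $\omega = y B_0 (B_0^{\top} B_0)^{-1}$. The sketch applies no non-negativity or sum-to-one constraints, omits the bias corrections and sensitivity analysis, and uses hypothetical helper names.

```python
def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve a @ x = b for square a by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        if abs(p) < 1e-12:
            raise ValueError("B0'B0 is singular; choose more informative CpGs")
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def project_mixture(y, b0):
    """omega = y B0 (B0' B0)^-1 for a row vector y of CpG methylation values.

    b0: m x K matrix; row j holds the mean methylation of CpG j in each of
    the K purified cell types (a summary of the validation set S0).
    """
    gram = matmul(transpose(b0), b0)   # K x K matrix B0' B0
    rhs = transpose(matmul([y], b0))   # K x 1 column (y B0)'
    # B0'B0 is symmetric, so omega' = (B0'B0)^-1 (y B0)'.
    return [v[0] for v in solve(gram, rhs)]
```

If a mixture profile is built exactly as a weighted combination of the columns of $B_0$, this projection recovers the weights. With real data, noise and cell types missing from $B_0$ are among the reasons the bias analysis above is needed.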
[0194] The Examples herein include an estimation technique, a
theoretical treatment of bias, and a demonstration of the approach
through an application to whole blood specimens collected in an
example of head and neck squamous cell carcinoma (HNSCC). See FIG.
3. Also provided are methods for a sensitivity analysis
demonstrating the impact of possible biases. Simulation study
results based on the biology of the samples used are shown in the
Examples herein.
[0195] Examples 1-3 herein show a method for determining changes in
the distribution of white blood cells between different
subpopulations (e.g. cases and controls) from DNA methylation
signatures, assuming that an external validation set consisting of
methylation signatures from purified white blood cell (WBC) samples
exists. Examples 4, 10 and 11 herein demonstrate the methodology
using a data set of HNSCC cases and matched controls, inferring from
DNA methylation assays alone the known changes in CD4+ and
granulocyte populations between cases and controls and the change in
CD4+ populations due to aging. With previous methods, flow cytometry
would have been necessary to obtain the same results. A method for
assessing the sensitivity of the magnitude estimates to possible
biases is also provided. Example 12 validates the method through
simulation.
[0196] Methods are provided herein for determining changes in the
distribution of white blood cell types between different human
populations (e.g. cases and controls) using DNA methylation
signatures, by using an external validation set having methylation
profiles from purified white blood cell components. DNA methylation
in peripheral blood was accordingly shown to be a biomarker for
clinical and epidemiological investigation. Studies have attempted
to distinguish cancer cases from controls using whole peripheral
blood assayed with DNA methylation arrays, including ovarian
(Teschendorff et al., 2009, PLoS ONE 4, e8274), bladder (Marsit et
al., 2011, J Clin Oncol 29, 1133-1139), and pancreatic (Pedersen et
al., 2011, PLoS ONE 6, e18223) cancers. Although these studies have
demonstrated discrimination of cases from controls, sound evidence
for a biological mechanism has been elusive. Presumably,
disease-associated alterations in blood methylation have several
etiological components driven by endogenous genetic, environmental
and disease-specific factors. Given the known developmentally
associated differences in DNA methylation among specific blood cell
types, changes in the distributions of blood cell types alone could
account for disease-associated DNA methylation. The many diverse
types of immune cells in blood make this issue highly complex and
problematic to tackle using single cell type assays. Therefore, it
is important for the development of this new avenue of biomarker
research to delineate effects due to the immune cell distribution
itself from other "non cell type" alterations in DNA methylation.
The differences among human populations attributed to cell
distributions are termed "immunologically mediated".
[0197] Immunological explanations for differences in mRNA profiles
between cases and controls have been proposed, e.g. Showe et al.,
2009, Cancer Res 69: 9202-10 and Kossenkov et al., 2011, Clin
Cancer Res 17: 5867-77. The statistical principles described in the
method herein apply to mRNA expression profiles and an appropriate
validation set $S_0$ based on mRNA expression arrays. Little to
no modification of mathematical expressions and computer code is
necessary to apply the statistical principles described in the
method herein to analysis of mRNA expression profiles. Under the
assumption that the upstream epigenetic control mechanisms are more
biologically stable, less variability in measurement of DNA
methylation is expected compared with measurement of mRNA
expression.
[0198] In the methods herein, this component of variation in
methylation is partitioned from other determinants using
multivariate analytic tools, including regression coefficients,
associated inference, and coefficient-of-determination measures.
These tools were used to evaluate whether the observed DNA
methylation differences were due to an immunologically mediated
response. Prior measurement error formulations (Thurston et al.,
2003, J Stat Plan Inf, 113, 527-34; Li and Yin, 2007, Ann Stat, 35,
2143-2172) require specification of a logistic regression model for
case/control status conditional on the DNA methylation signature, a
computationally difficult task that is vulnerable to model
mis-specification. A reverse formulation was used herein that
naturally models DNA methylation conditional on known phenotypes.
This formulation respects the protocol, in which DNA methylation
assay data are collected after sampling from phenotype groups. Other
strategies for formulating the errors were found to be unsuccessful.
For example, the strategy of using the Expectation-Maximization (EM)
algorithm to integrate over the missing data $\omega$ (Little and
Rubin, 2002, Statistical Analysis with Missing Data, Wiley, Hoboken,
N.J., 2nd edition) lies outside the measurement error literature,
within the larger missing-data literature. However, by design, the
distribution of $\omega$ varied substantially between the data sets
$S_0$ and $S_1$, severely complicating the approach, with the side
effect of introducing feedback from $S_1$ to $S_0$ and contaminating
the gold-standard status of $S_0$. Another alternative that was found
to be unsuccessful was the simpler approach of an empirical Bayes
procedure, similar to existing mixture-model approaches (Koestler
et al., 2010, Bioinformatics, 26, 2578-2585). However, difficulty
in specifying the distribution of $\xi$ rendered this approach
untenable, and in a separate simulation, attempts to impute $\omega$
among $S_1$ samples using parameters obtained from $S_0$ samples
resulted in extremely biased estimates of $\omega$.
[0199] Examples herein show that group-level comparisons of blood
cell DNA methylation revealed significant immune alterations. The
methods herein also apply to individual-level immune cell profiling,
which is useful in clinical and detailed analytical epidemiologic
applications that examine individual risk factor information. When
$z_{1i}$ involves an orthogonal (e.g. one-way ANOVA)
parameterization and ordinary least squares (OLS) is used to obtain
$B_1$, equation 5 (Example 3) herein reduces to simple expressions
involving the projected quantities
$\omega_i = y_{1i}^{\top} B_0 \left(B_0^{\top} B_0\right)^{-1}$.
For exploratory purposes, the projections $\omega_i$ serve as
estimates of individual profiles. There is interest in minor immune
cell fractions and their role in disease, although cell types
comprising <5% of the total white cell compartment, such as the
regulatory T cell or NK cell fractions implicated in autoimmune and
malignant diseases, produce signals that are difficult to
quantitate. Optimizing platform sensitivity to minor subtypes,
combined with statistical optimization of signature recognition, is
needed to enhance the approach for testing highly targeted immune
hypotheses.
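The projected quantities $\omega_i$ can be sketched in NumPy as follows. This is a minimal illustration, not the full estimator of equation 5: it assumes each $y_{1i}$ is a length-m vector of beta values, that $B_0$ is the m x K matrix of cell-type-specific mean methylation estimated from $S_0$ with full column rank, and the function name `project_cell_profiles` is hypothetical. The transposes needed for the dimensions to agree are written out explicitly.

```python
import numpy as np


def project_cell_profiles(Y1, B0):
    """Unconstrained projection of subject methylation onto cell-type profiles.

    Y1 : (n_subjects, m_loci) array; row i is y_{1i}, the beta values for subject i.
    B0 : (m_loci, K) array of cell-type-specific mean methylation (from S_0).
    Returns an (n_subjects, K) array whose row i is
        omega_i = y_{1i}^T B0 (B0^T B0)^{-1}.
    """
    Y1 = np.asarray(Y1, dtype=float)
    B0 = np.asarray(B0, dtype=float)
    gram = B0.T @ B0  # K x K; symmetric and invertible if B0 has full column rank
    # Solve (B0^T B0) X = B0^T Y1^T rather than forming the inverse explicitly;
    # X is K x n, so its transpose holds one omega_i per row.
    return np.linalg.solve(gram, B0.T @ Y1.T).T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    B0 = rng.uniform(0.0, 1.0, size=(50, 3))   # simulated reference profiles
    omega = rng.dirichlet(np.ones(3), size=4)  # simulated mixing proportions
    Y1 = omega @ B0.T                          # noise-free mixtures
    print(np.allclose(project_cell_profiles(Y1, B0), omega))
```

With noise-free simulated mixtures the projection recovers the mixing proportions; with real data the projections are not constrained to be non-negative or to sum to one, which is one reason they serve only as exploratory estimates.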
[0200] In addition to group level comparisons of blood cell DNA
methylation, immune cell profiling at the individual level is
important for examining individual risk factors in clinical and
detailed analytical epidemiologic applications. As shown in
Examples herein, individual immune profiles are theoretically
achievable and require extensive validation with a wide array of
mixture combinations.
[0201] The methods herein have potentially far-reaching
implications for rapid, simple and complete assessment of the
composition of human white blood cell populations, i.e. the immune
profile. Currently, assessment of the cellular composition of
peripheral blood cannot be accomplished without freshly drawn
venous blood that is immediately prepared in a specially equipped
laboratory. A complete assessment of the entire immune profile
requires extensive flow cytometric measurements of protein epitopes
on leukocyte membranes to distinguish subtypes of immune cells that
are either too rare or too similar in appearance to be distinguished
using simple microscopic approaches. In particular, flow cytometry
is limited in the following ways: cells must be separated, requiring
large volumes of fresh cells; detection is possible only with the
available fluorescent antibody tags, which require expensive
technology to read; and the outer cell membrane must be intact,
which limits its utility in many instances.
[0202] In contrast, using the methods herein, labor-intensive or
expensive steps are required only to construct the validation set
$S_0$, which need only be developed once. Once $S_0$ is available,
subsequent interrogation is based on the chemically stable CpG
methylation of DNA. Thus the methods herein obviate the need for
fresh blood and for preservation of labile protein epitopes. The
methods herein can also simultaneously assess the individual
components of peripheral blood using a highly multiplexed molecular
platform, and are therefore logistically straightforward.
Furthermore, the statistical methodology used here is easily applied
to the instrumental output of the methylation arrays, which
simplifies interpretation of the immune profile data from the
operator's point of view. The methods herein can be deployed
immediately in a research framework to assess human immune profiles
cost-effectively (in fresh or archival samples), to explore the
potential of immune profiles to function as biomarkers, and to
address key questions regarding disease pathogenesis. The approach
used in the methods herein is also readily suited for rapid
translation to a broad base of clinical applications such as disease
monitoring, diagnosis, prognosis, and response to therapy.
[0203] The methods herein are applied to tumor biopsies for immune
characterization of cancer patients. Other notable applications
exist including the application of the test to urine sediments in
patients with autoimmune and diabetic kidney disease or in patients
undergoing kidney transplantation. Positive detection of T cells in
urine sediment is indicative of immune activation and potential
kidney disease progression or acute rejection in the context of
kidney transplantation.
[0204] Populations of blood lymphocytes can be distinguished
morphologically on the basis of size and the presence of a granular
cytoplasm.
[0205] Small lymphocytes, including subsets of T- and B cells, are
responsible for adaptive immune responses. Sublineages of small
lymphocytes are morphologically indistinguishable and are
distinguished by cell surface receptors and cellular function. B
cells are typically distinguished by expression of the surface
molecule CD19. They express immunoglobulins, which serve as surface
receptors for antigens. In addition, B cells are capable of
further differentiating into effector cells called plasma cells.
(Parham, P. The Immune System, Garland Science, New York, N.Y.,
2005). Differentiated T cells exhibit a complex of surface
molecules, referred to as the T cell receptor (TCR) complex, that
functions as an antigen receptor. This complex includes the TCR α
plus β, or γ plus δ, antigen recognition chains, which are
associated with the invariant chain subunits CD3γ, δ, ε, and ζ.
(Zhang, Z., et al. 2007, Blood 109, 4328-4335). In general, T cells
are distinguished from other cell lineages by expression of CD3
molecules on the cell surface. The genes that encode the CD3 γ, δ,
ε, and ζ subunits are CD3G, CD3D, CD3E and CD3Z, respectively. The
former three genes are tightly clustered on chromosome 11, whereas
CD3Z is located on chromosome 1. Differentiated T cells are further
divided into two lineages depending on their expression of either
CD4 or CD8. The main function of CD8+ T cells, also known as
cytotoxic T cells, is to kill infected and transformed cells. The
main function of CD4+ T cells is to help other immune cells respond
appropriately to sources of infection or malignancy. There are
several subsets of CD4+ T cells, including Th1, Th2, Th17 and
regulatory T cells.
(Parham, P. The Immune System, Garland Science, New York, N.Y.,
2005). Regulatory T cells suppress an immune response by
influencing the activity of other cell types. They act primarily in
the periphery on mature lymphocytes that have exited the main
lymphoid tissues and serve as a means of preventing autoimmunity
during protective immune responses. Exemplary regulatory T cells
are thymus-derived CD4+CD25+Foxp3+ T cells, commonly referred to as
Tregs. (Zou, W. 2006, Nat Rev Immunol 6, 295-307). These cells
primarily function to maintain peripheral self-tolerance. (Cesana,
G. C., et al., 2006, J Clin Oncol 24, 1169-1177). Forkhead Box P3
(FOXP3), a transcription factor expressed by Tregs, is an important
developmental and functional factor that regulates Treg
immunosuppressive functions. (Janson, P. C., Winerdal, M. E. &
Winqvist, O. 2009, Biochim Biophys Acta 1790, 906-919; Zou, W.
2006, Nat Rev Immunol 6, 295-307).
[0206] Natural killer (NK) cells are large CD56+ lymphocytes with a
granular cytoplasm. They enter infected or malignant tissue to kill
damaged cells and secrete cytokines aimed at preventing the spread
of disease to other cells or tissues. Thus, NK cells act as
effector cells of innate immunity. CD56+ lymphocytes that also
express CD3 surface molecules are termed NKT cells.
[0207] To determine if distinct methylation profiles are indeed
associated with leukocyte lineages, statistical clustering of
methylation patterns was performed using a modified model-based
form of unsupervised clustering known as recursively partitioned
mixture modeling (RPMM) (Houseman, E. A., et al., 2008, BMC
Bioinformatics 9, 365).
[0208] A locus-by-locus comparison was performed in which putative
leukocyte DMRs were identified from Infinium data in SAS version 9.1
using a macro for locus-by-locus linear modeling that adjusts for
control probe and beadchip plate. Infinium beta values for Group 1
leukocyte samples were compared with Infinium beta values for Group
2 leukocyte samples; group membership for each phase of the
comparison is shown in Table 1.
TABLE 1
Locus-by-locus comparison groups
| Phase | Group 1 leukocytes          | Group 2 leukocytes                          |
|-------|-----------------------------|---------------------------------------------|
| I     | CD3+, Pan-T, CD4, Treg, CD8 | NK, B, Mono, Gran, Neut                     |
| II    | NK                          | Pan-T, CD4, Treg, CD8, B, Mono, Gran, Neut  |
| III   | CD8                         | CD4, Treg, NK, B, Mono, Gran, Neut          |
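The locus-by-locus linear modeling was done with a SAS macro that is not reproduced here. As a rough NumPy sketch only, assuming an ordinary least squares model per CpG with an intercept, a Group 1 indicator, and adjustment covariates (for example a control-probe summary and dummy-coded beadchip plate; the encoding and the function name `locus_t_values` are hypothetical), the t-value for the group coefficient at every locus can be computed in one pass:

```python
import numpy as np


def locus_t_values(beta, group1, covariates):
    """OLS t-statistics for the Group 1 indicator, one per CpG locus.

    beta       : (n_samples, m_loci) Infinium beta values.
    group1     : (n_samples,) booleans, True for Group 1 leukocyte samples.
    covariates : (n_samples,) or (n_samples, p) adjustment variables, p >= 1.
    Returns (t, df): an (m_loci,) array of t-values and the residual df.
    A negative t means lower methylation in Group 1 than in Group 2.
    """
    beta = np.asarray(beta, dtype=float)
    n = beta.shape[0]
    X = np.column_stack([
        np.ones(n),                                      # intercept
        np.asarray(group1, dtype=float),                 # Group 1 vs Group 2
        np.asarray(covariates, dtype=float).reshape(n, -1),
    ])
    df = n - X.shape[1]
    xtx_inv = np.linalg.inv(X.T @ X)
    coef = xtx_inv @ X.T @ beta               # (k, m): same design for every locus
    resid = beta - X @ coef
    sigma2 = (resid ** 2).sum(axis=0) / df    # per-locus residual variance
    se_group = np.sqrt(sigma2 * xtx_inv[1, 1])
    return coef[1] / se_group, df
```

Because every locus shares the same design matrix, the fit is a single matrix product rather than a loop over hundreds of thousands of CpGs.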
[0209] Resultant t-values from each comparison were converted to
p-values in R version 2.11.1, using software that provides
convenient mechanisms for loading and analyzing Illumina methylation
results and for quality control and basic visualization tasks.
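In R this conversion is typically the two-sided tail probability `2 * pt(-abs(t), df)`. An equivalent Python sketch, assuming SciPy is available and that two-sided tests were intended (the text does not say), with a hypothetical function name:

```python
import numpy as np
from scipy import stats


def two_sided_p(t_values, df):
    """Two-sided p-values for t-statistics with `df` residual degrees of freedom."""
    t_values = np.asarray(t_values, dtype=float)
    # sf is the upper-tail probability P(T > |t|); doubling gives the two-sided value
    return 2.0 * stats.t.sf(np.abs(t_values), df)
```

Using the survival function rather than `1 - cdf` keeps precision for the very small p-values typical of strong DMRs.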
[0210] False discovery rates were estimated and Q-values computed
with the qvalue package in R to adjust for multiple comparisons;
significance was defined as Q ≤ 0.05.
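The qvalue package estimates the proportion of true null hypotheses, $\pi_0$, and converts p-values into q-values. The pure-Python sketch below uses Storey's single-$\lambda$ estimator of $\pi_0$ (λ = 0.5 by default), a simplification of the package's default smoother-based estimate; when $\pi_0 = 1$ it reduces to Benjamini-Hochberg adjusted p-values. The function names are hypothetical.

```python
def storey_qvalues(pvals, lam=0.5):
    """q-values from p-values using a single-lambda estimate of pi0.

    q_(i) = min over j >= i of pi0 * m * p_(j) / j, capped at 1,
    where p_(1) <= ... <= p_(m) are the sorted p-values.
    """
    m = len(pvals)
    if m == 0:
        return []
    # Fraction of p-values above lambda, rescaled, estimates the null proportion.
    pi0 = min(1.0, sum(p > lam for p in pvals) / (m * (1.0 - lam)))
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q is a running minimum.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvals[i] / rank)
        q[i] = running_min
    return q


def significant(qvals, alpha=0.05):
    """Indices of loci meeting the Q <= alpha criterion."""
    return [i for i, q in enumerate(qvals) if q <= alpha]
```

For example, `storey_qvalues([0.01, 0.04, 0.03, 0.9, 0.6, 0.7])` estimates $\pi_0 = 1$ and returns the Benjamini-Hochberg values for those six p-values.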
[0211] For significant CpG loci (Q ≤ 0.05), a negative t-value
indicates that the locus putatively represents a DMR that is
unmethylated in the group 1 leukocyte lineage(s) and methylated in
the group 2 leukocyte lineage(s). Conversely, a positive t-value
indicates that the locus putatively represents a DMR that is
methylated in the group 1 leukocyte lineage(s) and unmethylated in
the group 2 leukocyte lineage(s). A DMR that is unmethylated in the
leukocyte lineage(s) of interest and methylated in other leukocyte
lineages would make the best epigenetic biomarker, since lack of
methylation is associated with transcriptional activity whereas
methylation is associated with transcriptional silencing. Therefore,
significant CpG loci exhibiting negative t-values are preferred.
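A small sketch of this selection rule, assuming each locus carries a t-value from the Group 1 minus Group 2 comparison and a q-value; the record type `LocusResult` and the probe identifiers used with it are hypothetical:

```python
from typing import List, NamedTuple


class LocusResult(NamedTuple):
    cpg_id: str  # probe identifier (illustrative)
    t: float     # Group 1 vs Group 2 t-value
    q: float     # q-value


def dmr_direction(r: LocusResult, alpha: float = 0.05) -> str:
    """Interpret a locus using the sign convention for the t-value."""
    if r.q > alpha:
        return "not significant"
    if r.t < 0:
        return "unmethylated in group 1, methylated in group 2"
    return "methylated in group 1, unmethylated in group 2"


def preferred_loci(results: List[LocusResult], alpha: float = 0.05) -> List[LocusResult]:
    """Significant loci with negative t-values, most negative first."""
    hits = [r for r in results if r.q <= alpha and r.t < 0]
    return sorted(hits, key=lambda r: r.t)
```

Sorting by the most negative t-value is one simple way to rank candidates; the text does not specify a ranking.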
[0212] In the methods herein, results of the locus-by-locus
comparisons were merged with cell type specific gene expression data
(Palmer et al., 2006, BMC Genomics 7, 115; Du et al., 2006, Genomics
87, 693-703; and Hashimoto et al., 2003, Blood 101, 3509-3513) to
identify putative DMRs in genes whose expression differs between
Group 1 and Group 2 leukocyte lineages. An exemplary candidate
epigenetic biomarker of a specific leukocyte lineage is an
unmethylated region of a gene that is highly expressed by that
leukocyte lineage and not expressed by other cell types, such as a
gene encoding a lineage-specific surface molecule, an obligate
differentiation protein, or a secreted factor. A further candidate
is a methylated region of a gene that is not expressed by the
leukocyte lineage but is expressed by other cell types. Without
being limited by any theory or mechanism of action, both scenarios
are consistent with regulation by chromatin packaging, in which
differential DNA methylation plays a large role in regulating
leukocyte lineage specific expression of the gene. If no leukocyte
lineage specific difference in expression of the gene containing a
putative DMR is observed, other modes of gene regulation, such as
activators, repressors, and enhancers, may overshadow the role of
chromatin packaging in regulating expression of the gene.
Alternatively, such a gene may be expressed in a temporally or
environmentally specific manner that was not captured by the
candidate gene expression data. Such a putative DMR would not be an
ideal target to explore as an epigenetic biomarker of that leukocyte
lineage.
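The screening logic above can be sketched as a join of significant loci with a per-gene expression summary. This is an illustration under stated assumptions: expression is summarized as a log2 fold change of Group 1 over Group 2, the threshold of 1.0 is hypothetical, and the cited expression datasets themselves are not modeled.

```python
from typing import Dict, Iterable, List, Tuple


def screen_with_expression(
    loci: Iterable[Tuple[str, str, float]],
    log2fc_by_gene: Dict[str, float],
    min_abs_log2fc: float = 1.0,  # hypothetical expression-difference threshold
) -> List[Tuple[str, str, str]]:
    """Label significant loci by agreement between methylation and expression.

    loci           : (cpg_id, gene, t) for significant loci; t < 0 means
                     unmethylated in Group 1 relative to Group 2.
    log2fc_by_gene : log2 expression fold change, Group 1 over Group 2.
    Returns (cpg_id, gene, label) with label "candidate", "not ideal",
    or "no expression data".
    """
    labelled = []
    for cpg_id, gene, t in loci:
        fc = log2fc_by_gene.get(gene)
        if fc is None:
            label = "no expression data"
        elif t < 0 and fc >= min_abs_log2fc:
            label = "candidate"  # unmethylated and highly expressed in Group 1
        elif t > 0 and fc <= -min_abs_log2fc:
            label = "candidate"  # methylated and not expressed in Group 1
        else:
            label = "not ideal"  # no concordant lineage-specific expression
        labelled.append((cpg_id, gene, label))
    return labelled
```

Loci labelled "not ideal" correspond to genes whose expression may be governed by other regulatory modes, as discussed above.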
[0213] In the methods described herein, DMR validation is performed
for each putative DMR identified from array data using bisulfite
pyrosequencing and/or MethyLight quantitative real time PCR assays
that measure DNA methylation of the gene region in the sorted human
leukocyte samples shown in Table 15, Example 13. Bisulfite
pyrosequencing assays were designed using PyroMark Assay Design 2.0
(Qiagen) and carried out on a PyroMark MD pyrosequencer running
PyroMark qCpG software (Qiagen). Oligonucleotide primers were
obtained from Invitrogen™ by Life Technologies™. The gene region of
interest was PCR-amplified from bisulfite-converted DNA using a
biotinylated reverse primer and an unlabeled forward primer. The
biotinylated PCR product was complexed with sequencing primers that
anneal upstream of the target region and was then incubated with
enzymes and substrates. Next, dNTPs were dispensed in a specific
order, and the light emitted upon incorporation of each nucleotide
was measured with a CCD camera. Methylation was quantified by
calculating the ratio of cytosine (methylated) to thymine
(unmethylated) at each CpG locus.
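The cytosine-to-thymine comparison is commonly reported as the cytosine fraction of the combined C and T signal, expressed as a percentage. The sketch below assumes that convention rather than reproducing the PyroMark software, and averaging across the CpGs of a region is an additional assumption for when a single value per region is needed; the function names are hypothetical.

```python
from typing import Sequence


def percent_methylation(c_signal: float, t_signal: float) -> float:
    """Percent methylation at one CpG from C (methylated) and T (unmethylated) peaks."""
    total = c_signal + t_signal
    if total <= 0:
        raise ValueError("no C or T signal at this CpG")
    return 100.0 * c_signal / total


def region_methylation(c_signals: Sequence[float], t_signals: Sequence[float]) -> float:
    """Mean percent methylation across the CpGs of an assayed region."""
    values = [percent_methylation(c, t) for c, t in zip(c_signals, t_signals)]
    return sum(values) / len(values)
```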
[0214] In the methods described herein, methylation status of
specific gene regions was calculated using MethyLight according to
the protocol described by Campan et al., 2009, Methods Mol Biol 507,
325-337, with the following modifications: C-less primers and probe,
rather than ALU-C4 primers and probe, were used to determine total
DNA input for each sample and control reference; and, to measure the
absence of methylation, control unmethylated DNA was used as a
reference, generating a percent unmethylated reference value that is
subsequently converted into percent methylation. Real time PCR
primers and fluorescent minor groove binder (MGB) probes were
obtained from Applied Biosystems (Foster City, Calif.). TaqMan®
Universal PCR Master Mix, No AmpErase® UNG, was obtained from
Applied Biosystems, manufactured by Roche (Branchburg, N.J.).
Quantitative real time PCR reactions were performed with an Applied
Biosystems 7300 Real Time PCR System using Applied Biosystems 7300
System Sequence Detection Software version 1.4.0.25 (© 2001-2006).
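Campan et al. normalize each MethyLight reaction to an input-control reaction and to a reference DNA. By analogy, and only as an assumed sketch of the modification described above, the percent unmethylated reference (PUR) can be computed from the unmethylated-specific reaction normalized to the C-less input reaction, relative to the same ratio in control unmethylated DNA, and then converted to percent methylation as 100 - PUR. The conversion formula, the clipping to [0, 100], and the function names are assumptions; input quantities are taken to come from standard curves.

```python
def percent_unmethylated_reference(u_sample, cless_sample, u_ref, cless_ref):
    """PUR: unmethylated-reaction signal normalized to C-less input, relative to
    the same ratio in fully unmethylated control reference DNA (x 100)."""
    if min(cless_sample, u_ref, cless_ref) <= 0:
        raise ValueError("normalizer and reference quantities must be positive")
    return 100.0 * (u_sample / cless_sample) / (u_ref / cless_ref)


def percent_methylation_from_pur(pur):
    """Assumed conversion: the methylated fraction is what is not unmethylated.
    PUR is clipped to [0, 100] so the result stays within [0, 100]."""
    return 100.0 - max(0.0, min(pur, 100.0))
```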
[0215] In the methods herein, when a putative DMR identified as
unmethylated in group 1 leukocytes based on Infinium methylation
data was shown by bisulfite pyrosequencing or MethyLight® qPCR to be
unmethylated in group 1 leukocytes and methylated in group 2
leukocytes, the DMR was confirmed as an unmethylated epigenetic
biomarker specific to the group 1 leukocyte lineage(s). A putative
DMR shown by bisulfite pyrosequencing or MethyLight® qPCR to be
unmethylated in group 1 leukocytes and in some group 2 leukocytes
was not confirmed as an epigenetic biomarker specific to the group 1
leukocyte lineage(s); instead, that DMR represents an epigenetic
biomarker of several different human leukocyte lineages, including
the group 1 lineage(s). A DMR that is partially unmethylated by
bisulfite pyrosequencing or MethyLight® qPCR in group 1 leukocytes
and methylated in group 2 leukocytes is a weak epigenetic biomarker
of the group 1 leukocyte lineage(s): it is heterogeneously
unmethylated in group 1 leukocytes and homogeneously methylated in
group 2 leukocytes, and is therefore poorly suited to distinguishing
group 1 from group 2 leukocyte lineages.
[0216] If Infinium data suggested that a CpG locus represents a DMR
specific to group 1 leukocytes, and bisulfite pyrosequencing or
MethyLight qPCR did not find a difference in DNA methylation in
that region between group 1 and group 2 leukocyte samples, the
region was not considered a DMR that would serve as an epigenetic
biomarker of the group 1 leukocyte lineage(s).
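The confirmation rules above can be written as a small decision function over percent methylation values from sorted samples. The cut-offs for "unmethylated" (≤ 20%), "methylated" (≥ 80%), and "no difference" (mean difference under 10 points) are hypothetical, as are the function name and the "inconclusive" fallback for patterns the text does not cover.

```python
from statistics import mean
from typing import Sequence


def classify_validated_dmr(
    group1_pct: Sequence[float],
    group2_pct: Sequence[float],
    unmeth_max: float = 20.0,  # hypothetical cut-off for "unmethylated"
    meth_min: float = 80.0,    # hypothetical cut-off for "methylated"
    min_diff: float = 10.0,    # hypothetical minimum mean difference
) -> str:
    """Apply the confirmation rules to percent methylation from sorted samples."""
    if abs(mean(group1_pct) - mean(group2_pct)) < min_diff:
        return "not a DMR"  # no methylation difference between groups
    g1_unmeth = all(v <= unmeth_max for v in group1_pct)
    g1_partial = any(v < meth_min for v in group1_pct)
    g2_meth = all(v >= meth_min for v in group2_pct)
    g2_some_unmeth = any(v <= unmeth_max for v in group2_pct)
    if g1_unmeth and g2_meth:
        return "confirmed group 1 marker"
    if g1_unmeth and g2_some_unmeth:
        return "shared marker of several lineages"
    if g1_partial and g2_meth:
        return "weak marker"  # heterogeneously unmethylated in group 1
    return "inconclusive"
```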
[0217] These discovery platform criteria successfully identified a
unique, previously unknown sequence of genomic DNA that is
specifically marked by CpG demethylation in CD3-positive T cells but
not in other hematopoietic peripheral blood cells (FIG. 10B).
Examples herein further show that the DNA methylation status of this
region in the promoter of the CD3Z gene in sorted human peripheral
blood leukocytes, measured by MethyLight® qPCR, confirms that the
identified genomic sequence is an immune cell type specific
differentially methylated region and a useful marker to quantify
CD3+ T cells in biological specimens such as whole or separated
blood and other tissues.
[0218] Gliomas are a histologically diverse cancer with few
established risk factors and poor prognoses (Kleihues et al., 1993,
Brain Pathol 3(3): 255-68; Ohgaki and Kleihues, 2005, Acta
Neuropathol 109(1): 93-108; Louis et al., 2007, Acta Neuropathol
114(2): 97-109; Ohgaki and Kleihues, 2007, Am J Pathol 170(5):
1445-53). However, immune factors are associated with increased
glioma risk and are also thought to play a role in patient outcomes
(Wiemels et al., 2009, Int J Cancer 125(3): 680-7; Yang et al.,
2010, J Clin Neurosci 17(11): 1381-5). Patients with glioblastoma
multiforme (GBM) exhibit abnormalities of T cell response (McVicar
et al., 1992, J Neurosurg 76(2): 251-60; Ashkenazi et al., 1997,
Neuroimmunomodulation 4(1): 49-56) associated with pronounced
reductions in peripheral blood T cell numbers, including the
suppressive regulatory T cells (Tregs) (Fecci et al., 2006, Cancer
Res 66(6): 3294-302). Despite low T cell and Treg counts, the ratio
of Tregs to T cells is clinically relevant to immunosuppression, yet
there is currently no validated method to quantify this ratio.
Quantification of immunosuppression is also envisioned herein as an
aid in characterizing patient tumors. An immunosuppressive
environment in glioma is further suggested by the accumulation of
tumor infiltrating lymphocytes (TILs) displaying markers of Tregs
(i.e. cell membrane CD4 and CD25, and intracellular staining of the
FOXP3 protein).
[0219] Epigenetic markers involving demethylation of the FOXP3 gene
have been determined to be the most specific markers of stable Tregs
(Baron et al., 2007, Eur J Immunol 37(9): 2378-89; Floess et al.,
2007, PLoS Biol 5(2): e38; Polansky et al., 2008, Eur J Immunol
38(6): 1654-63). As described in examples herein, by combining
information about the FOXP3 differentially methylated region (DMR)
with methylation specific quantitative PCR (MS-qPCR), highly
sensitive and accurate counts of Tregs in blood and tissues were
obtained. Such DNA-based methods for interrogating specific
populations of T cell subsets are far less expensive than flow
cytometry and can be applied to archival specimens. Examples herein
show that the DMR marker for CD3+ T cells identified herein can be
used alone or in conjunction with the previously described Treg DMR
marker.
[0220] A quantitative assay for CD3+ T cells based on demethylation
of the promoter of a component of the T cell receptor complex, CD3Z
(CD247), is also described herein. Examples herein show the validity
of CD3Z demethylation as a CD3+ T cell marker and illustrate its
application in patients with glioma, demonstrating the high
discriminating value of CD3Z demethylation in glioma case-control
subject comparisons, histopathological characterization of tumors,
and patient prognosis.
[0221] An understanding of the role played by an altered immune
response in etiology facilitates development of more effective
therapies and prognostic indicators. Epidemiological studies
implicate atopic immune alterations in glioma risk (Wrensch et al.,
2005, Am J Epidemiol 161(10): 929-38; Schwartzbaum et al., 2010,
Carcinogenesis 31(10): 1770-7). Immune suppression and T cell
abnormalities in glioma patients may prevent antitumor immunity and
pose barriers to effective immunotherapeutic strategies (Grauer et
al., 2007, Int J Cancer 121(1): 95-105; Sonabend et al., 2008,
Anticancer Res 28(2B): 1143-50). Data obtained using the novel T
cell epigenetic assays described in examples herein demonstrate
dramatic decreases in CD3+ T cells and Tregs in peripheral blood
from GBM patients. The copy numbers of demethylated CD3Z and FOXP3,
as a percent of total leukocyte copies, were reduced about two-fold
in GBM patients, a highly statistically significant difference.
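The quantities above follow directly from MS-qPCR copy numbers. A minimal sketch, assuming copy numbers for demethylated CD3Z, demethylated FOXP3, and total leukocyte DNA copies are already available from standard curves; the function and key names are hypothetical, the example inputs are illustrative rather than study data, and any correction for the X-linked location of FOXP3 is not modeled.

```python
def t_cell_metrics(cd3z_demeth_copies, foxp3_demeth_copies, total_copies):
    """Percent CD3+ T cells, percent Tregs, and the Treg/T cell ratio."""
    if total_copies <= 0 or cd3z_demeth_copies <= 0:
        raise ValueError("copy numbers must be positive")
    return {
        # demethylated CD3Z copies as a percent of all leukocyte copies
        "pct_cd3_t_cells": 100.0 * cd3z_demeth_copies / total_copies,
        # demethylated FOXP3 copies as a percent of all leukocyte copies
        "pct_tregs": 100.0 * foxp3_demeth_copies / total_copies,
        # Tregs relative to total CD3+ T cells
        "treg_to_t_ratio": foxp3_demeth_copies / cd3z_demeth_copies,
    }
```

For example, 2,000 demethylated CD3Z copies and 100 demethylated FOXP3 copies out of 10,000 total copies give 20% T cells, 1% Tregs, and a Treg/T cell ratio of 0.05.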
[0222] Validation studies herein support the notion that the CD3Z
MS-qPCR assay using unprocessed archival whole blood is an accurate
reflection of T cells as measured by conventional flow cytometry.
Previous studies have validated the FOXP3 demethylation assay as a
measure of Tregs in blood and tissues (Baron et al., 2007, Eur J
Immunol 37(9): 2378-89). Current steroid use (dexamethasone),
temozolomide and radiation exposure were investigated as possible
factors in these effects among cases, but no significant
associations of any factor with these T cell alterations were found.
The methods described in examples herein, which delineate T cell
subsets from DNA, facilitate immune cell analyses using blood
specimens that have been archived in cohort populations with
long-term glioma follow-up data. As a result, nested case-control
studies within large epidemiologic cohorts are now feasible,
allowing investigators, for the first time, to test whether T cell
and Treg abnormalities precede the diagnosis of glioma.
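Agreement between the CD3Z MS-qPCR measure and flow cytometry can be summarized by an ordinary least squares line and a Pearson correlation. The sketch assumes paired percentages per subject; the text does not state which agreement statistics were used, so this choice and the function name are illustrative. Fitting it separately in glioma cases and in controls is one way to check whether the two relationships coincide.

```python
from math import sqrt
from typing import Sequence, Tuple


def agreement(flow_pct: Sequence[float], qpcr_pct: Sequence[float]) -> Tuple[float, float, float]:
    """Return (slope, intercept, pearson_r) for qpcr_pct regressed on flow_pct."""
    n = len(flow_pct)
    if n != len(qpcr_pct) or n < 2:
        raise ValueError("need at least two paired measurements")
    mx = sum(flow_pct) / n
    my = sum(qpcr_pct) / n
    sxx = sum((x - mx) ** 2 for x in flow_pct)
    syy = sum((y - my) ** 2 for y in qpcr_pct)
    sxy = sum((x - mx) * (y - my) for x, y in zip(flow_pct, qpcr_pct))
    slope = sxy / sxx              # OLS slope of MS-qPCR on flow cytometry
    intercept = my - slope * mx
    r = sxy / sqrt(sxx * syy)      # Pearson correlation coefficient
    return slope, intercept, r
```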
[0223] The balance of suppressive Tregs to total T cells in
peripheral blood has been reported to be shifted towards greater
suppression in GBM patients and other types of cancer (Beyer and
Schultze, 2006, Blood 108(3): 804-11). The association of the
Treg/T cell ratio with cigarette smoking was examined herein, and
current smoking was associated with higher Treg/T cell ratios.
There is strong evidence that cigarette smoke exposure leads to the
accumulation of Tregs in the respiratory airways of mice (Brandsma
et al., 2008, Respir Res 9: 17) and humans (Smyth et al., 2007,
Chest 132(1): 156-63), as well as in the gut epithelium of exposed
mice (Verschuere et al., 2011, Lab Invest 91(7): 1056-67). Treg/T
cell ratios were herein observed to be higher in current smokers
than in former smokers (FIG. 16). It was subsequently confirmed in
an independent population that current, but not former, cigarette
smokers exhibit higher Treg/T cell ratios. These results illustrate
the need to include cigarette smoking among the patient
characteristics examined in diseases that affect Treg levels. The
new epigenetic methods described herein are useful in promoting
these types of studies.
[0224] As in many types of cancer, CD4+ T helper cells and Tregs
have been shown to infiltrate the human glioma tumor
microenvironment (Nishikawa and Sakaguchi, 2010, Int J Cancer
127(4): 759-67). In glioma studies using IHC to quantify T cells in
FFPE preparations, CD4+ T cell numbers were reported to increase
with tumor grade, whereas CD8+ T cells appeared in equal frequencies
across glioma grades (Heimberger et al., 2008, Clin Cancer Res
14(16): 5166-72). Results herein indicate that CD3Z demethylated
cells increase with grade (FIG. 17). Immunohistochemical (IHC)
analysis herein showed that these cells were mostly CD8+ cells, with
very few CD4+ cells. Examples herein also show that ependymal tumors
and a significant fraction of grade II oligodendrogliomas (OD) and
astrocytomas (AS) contain significant numbers of T cells and Tregs
(FIG. 21). Progression of lower grade to higher grade brain tumors
is a common and serious clinical problem, and results herein show
that epigenetic analyses are useful for characterizing low grade OD
and AS tumors as well as ependymomas (EP). Compared with previous
reports (El Andaloussi and Lesniak, 2006, Neuro Oncol 8(3): 234-43;
El Andaloussi and Lesniak, 2007, J Neurooncol 83(2): 145-52;
Heimberger et al., 2008, Clin Cancer Res 14(16): 5166-72;
Heimberger et al., 2008, Neuro Oncol 10(1): 98-103), analysis herein
using MS-qPCR showed a significantly increased ratio of Tregs to
CD3+ T cells within glioma tumor tissues of different pathological
grades (FIG. 17). Results herein also showed that the ratio of Tregs
to CD3+ T cells increases with tumor grade in comparison to blood.
Thus, until the present results, there was no evidence of a specific
accumulation of Tregs in human brain tumors. The survival data in
examples herein show significant associations of immune parameters
with patient survival (FIG. 22).
[0225] Without being limited by any theory or mechanism of action,
the close linear relationship observed herein between flow
cytometric measurement of CD3+ T cells and CD3Z demethylation, which
was identical among glioma cases and controls, argues against a
cancer-related effect on CD3Z demethylation, such as downregulation
of CD3Z through a posttranslational effect on CD3Z proteins mediated
by upregulation of lysosomal or proteasomal degradation pathways.
Another issue concerning the validity of CD3Z demethylation as a
CD3+ T cell marker in cancer tissues is that DNA demethylation may
take place in transformed cells and thus "mimic" a lymphocyte
signal. To ascertain that the observed CD3Z demethylation was taking
place in CD3+ T cells and was not due to DNA demethylation in
transformed cells, CD3Z and FOXP3 demethylation was assessed in
brain tumor cell lines and in human GBM xenografts, which cannot
contain human T cells. These samples contained non-detectable levels
of CD3Z or FOXP3 demethylation. Normal brain tissue was also
uniformly devoid of T cell signals, consistent with the MS-qPCR
signal in tumor specifically reflecting infiltration of immune
cells. Some subtypes of NK cells (CD56^dim CD16^bright) use CD3Z in
NK receptor signaling (Lanier, 2006, Trends Cell Biol 16(8):
388-90). The contribution of CD3Z-expressing, demethylated NK cells
to the overall CD3Z demethylated signal in peripheral white blood
cells is estimated to be very small. Furthermore, NK cells have not
been observed in glioma tissues.
[0226] The fundamental innovation in the epigenetic analyses
described herein is a shift in immunodiagnostics away from
proteomic-based approaches to one that is based on quantifying cell
type specific DNA methylation events. This new approach produces
gains in versatility, sensitivity, feasibility and throughput
compared with conventional flow cytometry or IHC and does so at a
lower cost. The high chemical stability of cytosine methylation
marks within genomic DNA, and the fact that differentiation within
the immune system is tightly linked with gene specific DNA
methylation events, make quantification of immune cells through
epigenetic analyses a unique approach. The method combines the
intrinsic chemical stability of DNA with the high sensitivity of
qPCR methods. Automation and robotic liquid handling in processing
and analysis add further to the power of the methodology and open
avenues for investigations in the immunoepidemiology of glioma and
many other diseases.
[0227] Methods herein show that blood-based DNA methylation
signatures across a complex cellular mixture of WBCs are useful for
distinguishing controls from cases of solid tumor cancers in which
there are well-defined immune-mediated responses. Because
tumorigenesis elicits a distinct immune response (Camilleri-Brot S
et al., 2004, Ann Oncol 15:104-112; Wang Y et al., 2005, Am J Clin
Pathol 124:392-401; Rui L et al., 2011, Nat Immunol 12:933-940), the
result is a hematopoietic shift in WBC populations, which can be
precisely discerned by applying the unique epigenetic signatures of
differing lineages. The aggregate methylation signature in blood
that distinguishes cancer cases from controls corresponds to the
epigenetic signatures that define leukocyte subtypes.
[0228] To understand the role of immune-mediated responses to
tumorigenesis in defining distinct signatures of blood-based DNA
methylation between cancer cases and cancer-free controls in
examples herein, the epigenetic landscape of WBCs was obtained by
identifying DMRs among leukocyte subtypes. This analysis revealed
that the majority of the highest ranking 50 leukocyte DMRs (Example
25) were differentially methylated between disease cases and normal
controls for HNSCC and ovarian cancers, with a smaller fraction
differentially methylated between bladder cancer cases and
controls. Among the eight overlapping CpG loci that were found to
be significantly differentially methylated between cancer cases and
controls across the three data sets, the direction of the
relationships was similar for HNSCC and ovarian cancer cases
compared to controls. These findings show that HNSCC and ovarian
cancer elicit similar shifts in leukocyte compositions in the
hematopoietic system.
[0229] Of the seven overlapping DMRs (CD72, PACAP, FGD2, SLC22A18,
GSTP1, NFE2, ASGR2) several are located within genes with either
established or alleged involvement in immune differentiation or
function, viz., CD72, PACAP and FGD2 (Kumanogoh and Kikutani, 2001,
Trends Immunol 22:670-676; Parnes and Pan, 2000, Immunol Rev
176:75-85; Tan et al., 2009, Proc Natl Acad Sci 106:2012-2017;
Huber C et al., 2008, J Biol Chem 283:34002-34012). CD72, a member
of the C-type lectin superfamily, negatively regulates B cell
coreceptor signaling (Kumanogoh and Kikutani, 2001) and has been
shown to act as a unique inhibitory receptor on NK cells regulating
cytokine production (Alcon V L et al., 2009, Eur J Immunol
39:826-832). Moreover, PACAP has been implicated as an intrinsic
regulator of regulatory T cell abundance after inflammation (Tan et
al., 2009, Proc Natl Acad Sci 106:2012-2017), and
FGD2 has been shown to play a role in leukocyte signaling and
vesicle trafficking in cells specialized to present antigen in the
immune system (Huber C et al., 2008, J Biol Chem
283:34002-34012).
[0230] In the model described herein containing the DNA methylation
profile for the highest ranking 50 leukocyte DMRs, patient age,
gender, smoking status, smoking pack years, weekly alcohol
consumption, and HPV serological status (Table 19, Example 13),
HNSCC was predicted with a high degree of sensitivity and
specificity. Similarly high prediction performance was obtained for
ovarian cancer using the DNA methylation profile for the highest
ranking ten leukocyte DMRs and patient age group. Prediction
performance for bladder cancer, based on the methylation profile of
the highest ranking 56 DMRs, patient age, gender, smoking status,
smoking pack years, and family history of bladder cancer, was lower
than that observed for HNSCC and ovarian cancer. One explanation
for the differences in magnitude for discriminating cancer cases
and controls among cancer types is underlying differences in the
magnitude of shift in leukocyte subtypes. Cancers characterized by
a pronounced immunologic response such as HNSCC and ovarian cancer
(Alhamarneh O et al., 2008, Head Neck 30:251-261; Zhang L et al.,
2003, N Engl J Med 348:203-213; Tomsova M et al., 2008, Gynecol
Oncol 108:415-420; Sato E et al., 2005, Proc Natl Acad Sci
102:18538-18543; Curiel T J et al., 2004, Nat Med 10:942-949)
correspond to more discernible shifts in leukocyte sub-populations,
thus resulting in greater discrimination of blood-derived DNA
methylation using leukocyte DMRs for these cancers compared to
bladder cancer.
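The text reports prediction performance in terms of sensitivity and specificity but does not give the model-fitting code. As a hedged illustration only, given observed case/control labels and predicted case probabilities from any fitted model (for example a logistic regression on the DMR profile and covariates, which is an assumption here), sensitivity and specificity at a chosen threshold, and the area under the ROC curve computed from pairwise case-control comparisons (the Mann-Whitney form), can be obtained as follows. The 0.5 threshold and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Sensitivity and specificity when cases are called at prob >= threshold.

    y_true: 1 for a cancer case, 0 for a control.
    """
    tp = fn = tn = fp = 0
    for y, p in zip(y_true, y_prob):
        called_case = p >= threshold
        if y == 1:
            if called_case:
                tp += 1
            else:
                fn += 1
        else:
            if called_case:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Probability that a random case scores above a random control (ties = 1/2)."""
    cases = [p for y, p in zip(y_true, y_prob) if y == 1]
    controls = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in cases for b in controls)
    return wins / (len(cases) * len(controls))
```

The AUC summarizes discrimination across all thresholds, which makes it a convenient single number for comparing cancer types such as HNSCC, ovarian, and bladder cancer.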
[0231] Substantial correlation was also observed between the
methylation of the loci identified via the semi-supervised
recursively partitioned mixture model (SS-RPMM) analyses and that of
the leukocyte DMRs that defined the methylation classes discovered
for the HNSCC and ovarian data sets. A diagram illustrating the
analytic framework for SS-RPMM is provided in FIG. 32. The SS-RPMM
procedure (Koestler et al., 2010, Bioinformatics, 26, 2578-2585) is
specifically designed to construct methylation classes based on an
optimal number of informative features (loci whose methylation is
most strongly associated with cancer case/control status). The
results demonstrate that the methylation classes identified through
SS-RPMM for the HNSCC and ovarian data sets are in large part due to
systematic hematopoietic changes in WBC populations in response to
tumorigenesis. The 56 leukocyte DMRs used in the bladder profile
analysis were less correlated with the nine CpG loci identified via
the previously reported SS-RPMM analysis of this data set (Marsit C
J et al., 2011, J Clin Oncol 29:1133-1139). Alternative biological
epigenetic mechanisms may be operative in bladder cancer in addition
to the epigenetic signatures characteristic of leukocyte subtypes,
and may contribute independently to the blood-derived differences in
DNA methylation between bladder cancer cases and controls.
[0232] Examples herein provide evidence that observed differences
in blood-derived DNA methylation in cancer cases are largely
explained by systematic differences in the methylation signatures
of leukocyte sub-populations. These findings signify that different
cancers elicit a discernible, unique immune response evident in
peripheral blood. These results have important implications for
research into the immunology of cancer. Further, examining
differences in blood-derived DNA methylation provides a new tool for
studying the immune profiles of diseases in which only DNA can be
accessed; that is, the approach has utility not only in cancer
diagnostics and risk prediction, but can also be applied in future
research (including on stored specimens) for any disease in which
the immune profile holds medical information. The approach is simple
yet powerful, represents an important new tool for medical research,
and may serve as a catalyst for future non-invasive disease
diagnostics.
[0233] Natural killer (NK) cells are a key element of the innate
immune system implicated in human cancer. To examine NK cell levels
in archived blood samples from a study of human head and neck
squamous cell carcinoma (HNSCC), a DNA-based quantification method
described in methods herein was developed (Examples 27-36).
[0234] Head and neck squamous cell carcinoma (HNSCC) is strongly
associated with alterations in the immune system and it is
postulated that progression of HNSCC tumors is linked to immune
evasion or failure of the immune system to fight the cancer (Duray
A, et al., 2010, Clinical & developmental immunology,
2010:701657; Pries R, and Wollenberg B, 2006, Cytokine Growth
Factor Rev, 17:141-6; Wulff S et al., 2009, Anticancer research,
29:3053-7; Kuss I et al., 2004, Clin Cancer Res, 10:3755-62; Kuss I
et al., 2005, Adv Otorhinolaryngol, 62:161-72). Natural killer (NK)
cells are of particular interest in the context of HNSCC and other
cancers, since they are able to recognize and destroy pre-cancerous
and malignant cells (Kim R et al., 2007, Immunology, 121:1-14;
Ostrand-Rosenberg S, 2008, Curr Opin Genet Dev, 18:11-8; Whiteside
T L, 2006, Cancer Treat Res, 130:103-24; Parham P, 2005, The Immune
System, 2nd ed., Garland Science, New York, N.Y.). Natural
killer cell infiltration into solid tumor tissue has been
associated with improved survival in studies of many different
types of cancer (Ishigami S et al., 2000, Cancer, 88:577-83; Kondo E
et al., 2003, Dig Surg, 20:445-51; Villegas F R et al., 2002, Lung
Cancer, 35:23-8). Immune suppression is frequently seen in
patients with head and neck cancer (Duray A, et al., 2010, Clinical
& developmental immunology, 2010:701657; Pries R, and
Wollenberg B, 2006, Cytokine Growth Factor Rev, 17:141-6; Wulff S
et al., 2009, Anticancer research, 29:3053-7; Kuss I et al., 2004,
Clin Cancer Res, 10:3755-62; Kuss I et al., 2005, Adv
Otorhinolaryngol, 62:161-72). Diminished NK cell and natural killer
T (NKT) cell activity and number have been observed in the
peripheral blood of patients with HNSCC (Wulff S et al., 2009,
Anticancer research, 29:3053-7; Molling J W et al., 2007, J Clin
Oncol, 25:862-8).
[0235] A novel DMR is identified herein that distinguishes NK cells
from other leukocytes to facilitate the quantification of NK cells
in archived blood samples from a case control study of HNSCC. Many
chemical exposures, such as tobacco and alcohol, as well as viral
factors, such as human papilloma virus (HPV), are known or
suspected to be causal factors in HNSCC (Furniss C S et al., 2009,
Annals of oncology: official journal of the European Society for
Medical Oncology/ESMO, 20:534-41; Applebaum K M et al., 2007,
Journal of the National Cancer Institute, 99:1801-10) and may
independently affect immune profiles (Mehta H et al., 2008,
Inflammation research, 57:497-503; Wansom D et al., 2010, Archives
of otolaryngology-head & neck surgery, 136:1267-73; Gao B
et al., 2011, American journal of physiology Gastrointestinal and
liver physiology 300:G516-25). Unlike previous studies, the data
herein evaluate the effects of these factors on the depression of
the NK immune profile. Patient risk factors and disease
characteristics (e.g. tumor location) are evaluated herein in
relationship to NK cells to determine the independent associations
of HNSCC with innate immune parameters.
[0236] NK cell-specific DNA methylation was identified by analyzing
DNA methylation and mRNA array data from purified blood leukocyte
subtypes (NK, T, B, monocytes, granulocytes), and confirmed via
pyrosequencing and methylation specific quantitative PCR (MS-qPCR).
NK cell levels in archived whole blood DNA from 122 HNSCC patients
and 122 controls from a study population were assessed by MS-qPCR.
Details of this study population have been previously described
(Applebaum K M et al., 2007, Journal of the National Cancer
Institute, 99:1801-10). Briefly, peripheral blood from 122 control
donors and 122 HNSCC patients was collected between December 1999
and December 2003 in the greater Boston area. Population-based
control subjects with no prior history of cancer were from the same
region as cases, and were frequency matched on age and gender.
Study approval was obtained from the Brown University Institutional
Review Board. Subjects provided written informed consent for
participation in this study. Venous anticoagulated whole blood was
drawn into sodium citrate and stored at -20° C. prior to DNA
isolation.
[0237] Pyrosequencing and MS-qPCR (FIG. 39) confirmed that a
demethylated DNA region in NKp46 distinguishes NK cells from other
leukocytes, and serves as a quantitative NK cell marker.
Demethylation of NKp46 was significantly lower in HNSCC patient
blood samples compared with controls (p<0.001). Individuals in
the lowest NK tertile had more than 5-fold higher odds of being an HNSCC case,
controlling for age, gender, HPV16 status, cigarette smoking,
alcohol consumption, and BMI (OR=5.6, 95% CI: 2.0, 17.4) (FIG. 37).
Cases did not show differences in NKp46 demethylation based on
disease treatment or tumor site.
[0238] The results of this study indicate a significant depression
in NK cells in HNSCC patients that is unrelated to exposures
associated with the disease. DNA methylation biomarkers of NK cells
represent an alternative to conventional flow cytometry that can be
applied in a wide variety of clinical and epidemiologic settings
including archival blood specimens.
[0239] Understanding of immune cell level alterations associated
with cancer and other diseases has, until now, been restricted by
the limitations of immunodiagnostic methods. Described herein is a
new method for measuring NK cell levels in human blood and tissue
based on cell-lineage specific DNA methylation that can be applied
to samples regardless of handling and storage procedures. This is a
step forward in immune cell detection and quantification that is
applicable to many types of clinical samples. Applying the method
to a case-control study of HNSCC (Examples 27-36) revealed a
case-associated decrease in circulating NK cells that is
independent of known risk factors and treatments. This indicates
that it is important to monitor NK cell levels in patients with
HNSCC, and that future immune therapies aimed at restoring
circulating NK cells in these patients may be worth pursuing.
[0240] A variety of methods are available for analyzing CpG
methylation states. These methods can be divided roughly into two
types: gene-specific and global methylation analysis. A large number
of techniques have been developed for gene-specific CpG methylation
analysis. Early studies used methylation-sensitive restriction
enzymes to digest DNA, followed by Southern detection or PCR
amplification. Bisulfite-reaction-based methods, such as
methylation-specific PCR (MSP) and bisulfite genomic sequencing PCR,
are currently in common use. Global methylation analysis measures
the overall level of methylcytosine in the genome by methods such as
chromatography or the methyl-accepting capacity assay. Further,
methylation hot-spots or methylated CpG islands in the genome may
also be identified by several recently developed genome-wide
screening methods, such as Restriction Landmark Genomic Scanning for
Methylation (RLGS-M) and CpG island microarrays.
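As an illustration of how bisulfite-based measurements are commonly summarized (a sketch, not a method described herein), the fraction of molecules methylated at a CpG can be estimated from read counts: after bisulfite treatment, unmethylated cytosine is read as T, whereas methylated cytosine remains C. The function names and input format are hypothetical, and the sketch ignores incomplete conversion, sequencing error and C/T polymorphisms.

```python
def cpg_methylation_level(c_reads: int, t_reads: int) -> float:
    """Fraction methylated at one CpG from bisulfite read counts (hypothetical helper).

    After bisulfite conversion, unmethylated C is read as T, while
    methylated C is protected and is still read as C.
    """
    total = c_reads + t_reads
    if total == 0:
        raise ValueError("no informative reads at this CpG")
    return c_reads / total


def region_methylation_level(counts) -> float:
    """Pooled level over several CpGs: total C reads / total (C + T) reads.

    counts: iterable of (c_reads, t_reads) pairs, one per CpG (hypothetical format).
    """
    counts = list(counts)  # allow generators, since we iterate twice
    c_total = sum(c for c, _ in counts)
    t_total = sum(t for _, t in counts)
    return cpg_methylation_level(c_total, t_total)


print(cpg_methylation_level(30, 10))
print(region_methylation_level([(30, 10), (5, 15), (0, 20)]))
```

Pooling reads across CpGs weights each site by its coverage; averaging per-site levels instead is an equally reasonable choice when coverage is uneven.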
[0241] The gene-specific method MethyLight is a highly sensitive,
high-throughput, quantitative methylation assay that uses
fluorescence-based real-time PCR technology, requires little or no
manipulation after the PCR step, and is capable of detecting
methylated alleles in the presence of a 10,000-fold excess of
unmethylated alleles (Eads C A et al., Nucl. Acids Res. (2000) 28
(8): e32). For example, a MethyLight assay is commercially available
from QIAGEN, Inc., Valencia, Calif.
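MethyLight results are often reported as a percent of methylated reference (PMR), in which the methylation-specific signal is normalized both to a methylation-independent control reaction and to fully methylated (e.g., M.SssI-treated) reference DNA. The sketch below assumes that convention, and assumes the input quantities have already been read off standard curves; it illustrates the arithmetic only, and the function name is hypothetical.

```python
def percent_methylated_reference(gene_sample: float, ref_sample: float,
                                 gene_sssi: float, ref_sssi: float) -> float:
    """PMR (hypothetical helper): 100 * (gene/ref)_sample / (gene/ref)_fully-methylated.

    gene_*: quantity from the methylation-specific reaction
    ref_*:  quantity from a methylation-independent control reaction
    *_sssi: the same two reactions run on fully methylated reference DNA
    """
    if min(ref_sample, gene_sssi, ref_sssi) <= 0:
        raise ValueError("control and reference quantities must be positive")
    return 100.0 * (gene_sample / ref_sample) / (gene_sssi / ref_sssi)


print(percent_methylated_reference(2.0, 40.0, 50.0, 50.0))
```

Normalizing to the control reaction corrects for differences in DNA input, and normalizing to fully methylated DNA corrects for differences in reaction efficiency between assays.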
[0242] In another embodiment of the method, the methylation of any
gene, e.g., the CD3Z gene, is analyzed by amplification using
digital polymerase chain reaction (PCR). Digital PCR is a refinement
of PCR that overcomes difficulties associated with conventional PCR.
Conventional PCR assumes that amplification of nucleic acid is
exponential, and nucleic acids are quantified by comparing the
number of amplification cycles and amount of PCR end-product to
those of a reference sample. In practice, however, several factors,
such as deviations from the assumed amplification efficiency,
interfere with this calculation, making the measurements uncertain
and inaccurate and hence unsuitable for highly sensitive
applications.
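The effect of one such factor can be illustrated with a small calculation, under the idealized assumption that each cycle multiplies the template by (1 + E), where E is the amplification efficiency. If a reaction actually runs at E = 0.90 but is quantified as if E = 1.0, the back-calculated starting amount is substantially wrong. The helper names, threshold and copy numbers are hypothetical.

```python
import math


def cycles_to_threshold(n0: float, efficiency: float, threshold: float) -> float:
    """Cycles for n0 copies to reach `threshold` if each cycle multiplies by (1 + E)."""
    return math.log(threshold / n0) / math.log(1.0 + efficiency)


def estimate_from_ct(ct: float, threshold: float, assumed_efficiency: float) -> float:
    """Back-calculate starting copies from a threshold cycle under an assumed efficiency."""
    return threshold / (1.0 + assumed_efficiency) ** ct


true_n0 = 100.0      # hypothetical starting copies
threshold = 1e10     # hypothetical detection threshold (copies)
ct = cycles_to_threshold(true_n0, efficiency=0.90, threshold=threshold)
estimate = estimate_from_ct(ct, threshold, assumed_efficiency=1.0)
print(f"Ct = {ct:.2f}; true = {true_n0:.0f}; estimated = {estimate:.1f}")
```

With these values the estimate is roughly 23 copies instead of the true 100. Because the error grows with the number of cycles, small efficiency differences between sample and reference translate into large quantification errors.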
[0243] In digital PCR, a sample is partitioned so that individual
nucleic acid molecules within the sample are localized and
concentrated within many separate regions. Ideally, each partition
contains either "0" or "1" molecules, giving a negative or a
positive reaction, respectively; because some partitions may
receive more than one molecule, the number of molecules is
estimated from the fraction of positive partitions using a Poisson
distribution. After PCR amplification, nucleic acids are quantified
by counting the regions that contain PCR end-product, i.e., the
number of positive reactions. A system for digital PCR based on
integrated fluidic circuits (chips) having integrated chambers and
valves for partitioning samples is commercially available. For
example, digital PCR systems are available from Life Technologies
(Grand Island, N.Y. 14072, USA) and QuantaLife (Pleasanton, Calif.,
USA).
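The Poisson step can be sketched as follows, assuming that molecules are distributed randomly among partitions of equal volume: if λ is the mean number of copies per partition, the probability that a partition is negative is exp(-λ), so λ = -ln(1 - p), where p is the fraction of positive partitions. The helper names are hypothetical, and the methylated fraction assumes, as a simplification, that methylated and unmethylated targets are counted in separate reactions with the same number of partitions.

```python
import math


def dpcr_copies(positive: int, total: int) -> float:
    """Estimated template copies from positive-partition counts (hypothetical helper).

    Random (Poisson) loading gives P(negative) = exp(-lam), where lam is the
    mean copies per partition; total copies = lam * number of partitions.
    """
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need total > 0 and 0 <= positive < total (all-positive saturates)")
    lam = -math.log1p(-positive / total)
    return lam * total


def fraction_methylated(meth_positive: int, unmeth_positive: int, total: int) -> float:
    """Methylated copies / (methylated + unmethylated copies), separate reactions assumed."""
    m = dpcr_copies(meth_positive, total)
    u = dpcr_copies(unmeth_positive, total)
    if m + u == 0:
        raise ValueError("no template detected in either reaction")
    return m / (m + u)


print(dpcr_copies(10000, 20000))
print(fraction_methylated(3000, 9000, 20000))
```

Without the Poisson correction, simply counting positive partitions would undercount templates whenever partitions receive more than one molecule, which becomes significant as the positive fraction rises.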
[0244] This application relates to international application
PCT/US2012/039669 filed May 25, 2012 (published as international
publication number WO/2012/162660 published Nov. 29, 2012), which
claims the benefit of provisional applications having Ser. Nos.
61/489,883 filed May 25, 2011 entitled, "Methods of
Immunodiagnostics using DNA Methylation arrays as surrogate
measures of the identity of a cell or a mixture of cells";
61/509,644, filed Jul. 20, 2011 entitled "Methods of
Immunodiagnostics using DNA Methylation arrays as surrogate
measures of the identity of a cell or a mixture of cells for
prognosis and diagnosis of diseases"; 61/585,892 filed Jan. 12,
2012 entitled, "Methods of Immunodiagnostics using DNA Methylation
arrays as surrogate measures of the identity of a cell or a mixture
of cells for prognosis and diagnosis of diseases"; and 61/619,663,
filed Apr. 3, 2012 entitled "Methods using DNA Methylation arrays
for identifying a cell or a mixture of cells for prognosis and
diagnosis of diseases, and for cell remediation therapies", by
inventors Karl Kelsey, Eugene Andres Houseman, John Wiencke,
William P. Accomando, Jr. and Carmen Marsit, each of which
applications, including the sequence listings, is hereby
incorporated herein by reference in its entirety. A portion of the
examples and figures herein was submitted as an appendix to
provisional application Ser. No. 61/865,479 filed Aug. 13, 2013,
entitled, "Methods using DNA methylation for identifying a cell or a
mixture of cells for prognosis and diagnosis of diseases, and for
cell remediation therapies"; that appendix is an unpublished
manuscript submitted to the journal Genome Biology entitled,
"Quantitative reconstruction of leukocyte subsets using DNA
methylation" by William P. Accomando, Jr., John Wiencke, Eugene
Andres Houseman, Heather H. Nelson, and Karl Kelsey.
[0245] The invention having been fully described is further
illustrated by the following claims and examples. Data in the
Examples herein show that cell mixture distributions within
peripheral blood were assessed accurately and reliably using DNA
methylation. DNA methylation was measured and analyzed in leukocyte
subsets purified from whole blood, and a library of lineage-specific
DNA methylation signatures that distinguish human T-cells, B-cells,
NK cells, monocytes, eosinophils, basophils and neutrophils was
compiled. The library was used as a reference to quantify these cell
types simultaneously in DNA from adult human blood. The methods
described were successful in detecting clinically relevant shifts in
leukocyte populations. The methods, compositions and kits herein
analyzed human whole blood samples more accurately than established
methods of immune cell quantification. Data obtained by these
methods using DNA methylation were found to be unaffected by
duration of storage of blood. Data show that it was possible, using
only DNA rather than whole cells, to reconstruct precise immune cell
differential numbers. Methods in
various embodiments used a library including signatures comprising
differentially methylated regions (DMRs) from types of leukocytes
in a blood sample of the patient. In various embodiments, the
library includes at least one gene or locus selected from the group
consisting of: FGD2, HLA-DOB, BLK, IGSF6, CLDN15, SFT2D3, ZNF22,
CEL, HDC, GSG1, FCN1, OSBPL5, LDB2, NCR1, EPS8L3, CD3D, PPP6C,
CD3G, TXK, and FAIM. In various embodiments, the library includes
at least one selected from the group consisting of: CLEC9A (2
loci), INPP5D, INHBE, UNQ473, SLC7A11, ZNF22, XYLB, HDC, RGR,
SLCO2B1, C1orf54, TM4SF19, IGSF6, KRTHA6, CCL21, SLC11A1, FGD2,
TCL1A, MGMT, CD19, LILRB4, VPREB3, FLJ10379, HLA-DOB, EPS8L3,
SHANK1, CD3D (2 loci), CHRNA3, CD3G (2 loci), RARA, and GRASP. The
nucleotide sequence and corresponding amino acid sequence of each
of the genes or loci are listed in genome or protein databases such
as GenBank, European Nucleotide Archive, European Bioinformatics
Institute, GenomeNet, or The National Center for Biotechnology
Information (NCBI) Protein database.
[0246] Examples herein accurately assessed cell mixture
distributions within peripheral blood using DNA methylation. DNA
methylation was measured in leukocyte subsets purified from whole
blood and was used to establish a library of lineage-specific DNA
methylation signatures that distinguished human T-cells, B-cells,
NK cells, monocytes, eosinophils, basophils, and neutrophils. This
library was used as a reference to simultaneously quantify these
cell types in DNA from adult human blood. Methods, compositions and
kits described herein detected clinically relevant shifts in
leukocyte populations more effectively than established methods of
immune cell quantification performed on human whole blood samples.
Unlike established methods, methods described herein were not
affected by type and duration of storage of blood samples. Data show
that precise immune cell differential estimates were reconstructed
using only DNA rather than whole cells.
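The use of a signature library as a reference can be sketched as a regression of a mixed-sample methylation profile on the purified-cell profiles. The sketch below is a minimal illustration on synthetic data: it uses unconstrained least squares, then clips negative values and rescales to sum to one. This is a crude stand-in, not necessarily the constrained estimator used in the Examples, and the array shapes, function name and data are hypothetical.

```python
import numpy as np


def estimate_proportions(reference: np.ndarray, mixture: np.ndarray) -> np.ndarray:
    """Crude reference-based deconvolution (hypothetical helper).

    reference: (m CpGs x k cell types) mean methylation of purified cell types
    mixture:   (m,) methylation of one mixed sample, e.g. whole-blood DNA
    Returns k non-negative proportions that sum to one.
    """
    coef, *_ = np.linalg.lstsq(reference, mixture, rcond=None)
    coef = np.clip(coef, 0.0, None)        # proportions cannot be negative
    total = coef.sum()
    if total == 0:
        raise ValueError("no positive contribution from any reference profile")
    return coef / total                     # rescale so proportions sum to one


rng = np.random.default_rng(0)
ref = rng.uniform(0.05, 0.95, size=(50, 4))        # 50 synthetic DMRs, 4 cell types
true_p = np.array([0.6, 0.25, 0.1, 0.05])          # synthetic cell proportions
blood = ref @ true_p + rng.normal(0.0, 0.01, size=50)
print(np.round(estimate_proportions(ref, blood), 3))
```

Selecting DMRs whose methylation differs sharply between cell types keeps the reference matrix well conditioned, which is what makes the regression stable.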
[0247] Different human cell types, defined by function and
morphology, are typically distinguished within complex mixtures
using a variety of physical, optical and proteomic characteristics
(Pollard, T. D. et al. 2007 Cell Biology second edition Saunders
Elsevier publishing, Philadelphia, Pa.).
[0248] Lineage-specific DNA methylation has been investigated to
distinguish different types of cells (Baron, U. et al. 2006
Epigenetics 1: 55-60; Wieczorek, G. et al. 2009 Cancer Res 69:
599-608; Sehouli, J. et al. 2011 Epigenetics 6: 236-246; Wiencke,
J. K. et al. 2012 Epigenetics 7: 1391-1402; Accomando, W. P. et al.
2012 Clin Cancer Res 18: 6147-6154; Christensen, B. C. et al. 2009
PLoS Genet. 5, e1000602, doi:10.1371/journal.pgen.1000602).
Patterns of DNA methylation, occurring at cytosine residues in the
context of cytosine-guanine (CpG) dinucleotides, are tightly
associated with chromatin conformation, which coordinates gene
expression and reflects transcriptional programming (Bird, A. 2002
Genes & development 16: 6-21; and Zaidi, S. K. et al. 2011 The
Journal of biological chemistry 286: 18355-18361). During
differentiation, somatic cell lineages undergo de novo DNA
methylation followed by maintenance methylation (Jaenisch, R. 1997
Trends in genetics: TIG 13: 323-329), thereby establishing
mitotically heritable, cell lineage-specific methylation signatures
(Khavari, D. A., et al. 2010 Cell Cycle 9, 3880-3883; Bocker, M. T.
et al. 2011 Blood 117, e182-189; Meissner, A. 2010 Nature
biotechnology 28, 1079-1088; Hawkins, R. D. et al. 2010 Cell Stem
Cell 6: 479-491). Patterns of DNA methylation serve as reliable
indicators of cell lineage and have been used as sensitive
and specific biomarkers for diverse cell types (Baron, U. et al.
2006 Epigenetics 1: 55-60; Accomando, W. P. et al. 2012 Clin Cancer
Res 18: 6147-6154; Meissner, A. 2010 Nature biotechnology 28,
1079-1088; Davies, M. N. et al. 2012 Genome Biol 13: R43,
doi:10.1186/gb-2012-13-6-r43; and Varley, K. E. et al. 2013 Genome
Res 23: 555-567).
[0249] The immune system is a powerful model for investigating,
developing and implementing new approaches to human cell detection
and quantification. Blood is a complex mixture of many different
specialized cell types and the composition of white blood cell
(WBC, or leukocyte) populations reflects disease states and
toxicant exposures (Bui, J. D. et al. 2007 Curr Opin Immunol 19:
203-208; Kim, R. et al. 2007 Immunology 121: 1-14;
Ostrand-Rosenberg, S. 2008 Curr Opin Genet Dev 18: 11-18; Dunn, G.
P. et al. 2002 Nat Immunol 3: 991-998; Shimizu, J. et al. 1999 J
Immunol 163, 5211-5218; Zou, W. 2006 Nat Rev Immunol 6: 295-307).
Thus, the ability to detect an improper balance of immune cells is
valuable both in a clinical and research setting. However, research
aimed at further understanding immune cell level alterations is
restricted by the limitations of immunodiagnostic methods. Routine
blood leukocyte differential counting is achieved using physical cell
isolation and the electrical impedance or optical light-scattering
properties of the cells (Handin, R. I., Lux, S. E. & Stossel,
T. P. 2003 Blood: Principles and Practice of Hematology second
edition, 2304, Lippincott Williams & Wilkins). Fluorescently
labeled antibodies and flow cytometry are used to identify
specialized cell subtypes, e.g. CD4+ T-cells (Sehouli, J. et al.
2011 Epigenetics 6: 236-246; Dieye, T. N. et al. 2011 Journal of
immunological methods 372: 7-13). These methods rely upon intact
cells, and therefore require fresh samples and cannot be applied to
older, archived blood samples.
[0250] Human leukocytes derive from pluripotent hematopoietic stem
cells through a developmental process called hematopoiesis,
resulting in a hierarchy of leukocyte lineages each with unique
functions and gene expression patterns (Parham, P. 2005 The Immune
System second edition, Garland Science, New York, N.Y.). Epigenetic
regulation of gene expression is important to hematopoiesis;
cellular fates are largely determined by patterns of DNA packaging
into chromatin (Janson, P. C. et al. 2009 Biochim Biophys Acta
1790: 906-919).
[0251] Examples herein show that human leukocyte lineages were
distinguished with very high sensitivity and specificity by
epigenetic marks such as patterns of DNA methylation occurring in
differentially methylated regions (DMRs). The identification of DMRs
that are biomarkers of specific human leukocyte lineages resulted
in the development of sensitive assays for monitoring these
leukocytes in the peripheral blood by measuring DNA methylation.
While some immune cell lineage-specific DMRs have previously been
used in assays to detect and quantify a single type of leukocyte in
human blood and tissue (Wieczorek, G. et al. 2009 Cancer Res 69:
599-608; Sehouli, J. et al. 2011 Epigenetics 6: 236-246; Wiencke,
J. K. et al. 2012 Epigenetics 7: 1391-1402; Accomando, W. P. et al.
2012 Clin Cancer Res 18: 6147-6154), Examples herein elucidate a
different approach: simultaneous quantification of the entire
distribution of WBC types in human blood using methylation profiles
assessed in archived DNA.
[0252] The compositions, methods and kits herein are useful for
assessing immune modulation, allowing immune profiling to be
performed in a wide variety of archival blood samples from large
epidemiological studies of human disease and exposure and from
clinical trials of drug efficacy and biomonitoring. Examples herein
include a novel platform for expanding the nascent field of human
immunotoxicology. Compositions, methods and kits herein provide an
effective improvement for a large number of novel diagnostic and
therapeutic procedures by serving as a reliable alternative to the
accepted reference standard of the manual differential, as well as
to the automated differential and even FACS-based analysis. Thus,
compositions, methods and kits herein are useful in clinical
applications as well as population studies, aiding in diagnostic
follow-up, toxicologic assessment and numerous new approaches being
developed in translational medical research. Furthermore, Examples
herein provide new approaches to clinical profiling of immune
response to therapy for chronic diseases.
[0253] Without being limited by any particular theory or mechanism
of action, it is envisioned that the compositions, methods and kits
herein can be used to identify, characterize and enumerate any type
of lineage-stable human cell within complex mixtures.
This presents an unprecedented opportunity for the development of a
new generation of methods for cellular quantification that exploits
the human methylome; supporting the feasibility of "molecular"
histology. Using the immune system as a model, Examples herein
created a paradigm for the mapping of cell-specific DNA methylation
signatures in order to generate reference libraries of efficacious
biomarkers that distinguish different cell types. During mitosis,
patterns of DNA methylation are replicated at the time of DNA
synthesis such that daughter cells inherit both genetic material
and epigenetic information contained within the parental cell
(Khavari, D. A. et al. 2010 Cell Cycle 9, 3880-3883).
[0254] Examples herein establish powerful computational tools to
quantitatively reconstruct the precise makeup of cellular mixtures.
In the past, simultaneous quantification of normal or
disease-associated changes in cell population composition has been
accomplished using flow cytometry, electrical impedance, light
scatter and/or immunohistochemistry. These approaches require large
volumes of fresh blood or tissue and, for flow cytometry, can
involve laborious antibody tagging (Roussel, M., et al. 2010
Cytometry. Part A: the journal of the International Society for
Analytical Cytology 77: 552-563; Mittag, A. et al. 2011 Methods in
cell biology 103: 1-20). In contrast, Examples herein use
high-throughput techniques that entail simple, convenient DNA
analysis methods that can easily be automated to facilitate rapid
quantitative reconstruction of cell subsets. Moreover, the assays
and arrays employed (e.g., LDMA) use different chemistry than the
HDMA, highlighting the cross-platform applicability of the approach
described herein.
[0255] Further examples of the invention are found in a manuscript
(48 pages) submitted to the journal Genome Biology entitled,
"Quantitative reconstruction of leukocyte subsets using DNA
methylation" by William P. Accomando, Jr., John K. Wiencke, E.
Andres Houseman, Heather H. Nelson, and Karl T. Kelsey, which is
incorporated by reference herein in its entirety.
[0256] A skilled person will recognize that many suitable
variations of the methods may be substituted for or used in
addition to those described above and in the claims. It should be
understood that the implementation of other variations and
modifications of the embodiments of the invention and its various
aspects will be apparent to one skilled in the art, and that the
invention is not limited by the specific embodiments described
herein and in the claims. The present application mentions various
patents, scientific articles, and other publications, each of which
is hereby incorporated herein in its entirety by reference.
[0257] The invention having now been fully described, it is
exemplified by the following examples and claims which are for
illustrative purposes only and are not meant to be further
limiting.
EXAMPLES
Example 1
Statistical Methods for Using DNA Methylation Arrays as Surrogate
Measures of Cell Mixture Distribution
[0258] In the framework for measurement of methylation status of
CpG sites in cell mixtures, $Y_{0h}$ represents an $m \times 1$
vector of methylation assay values, e.g. average beta values from an
Infinium bead-array product, corresponding to a purified blood
sample consisting of a homogeneous cellular population (e.g.
monocytes or granulocytes), with the qualitative characterization of
the cell type indicated by a $d_0 \times 1$ covariate vector $w_h$.
Here, $h \in \{1, \ldots, n_0\}$, and the $m$ individual values
correspond to CpG sites on a DNA methylation microarray, possibly
pre-selected to correspond to putative DMRs for distinguishing
different cellular types. Correspondingly, $Y_{1i}$ represents an
$m \times 1$ vector of methylation assay values for the same CpG
sites (in the same order) as $Y_{0h}$, but corresponding to a
heterogeneous mixture of cells (e.g. peripheral whole blood) from a
human subject. Here, $i \in \{1, \ldots, n_1\}$, $n_1$ is the number
of target specimens, and $z_{1i}$ is a $d_1 \times 1$ covariate
vector representing an intercept as well as phenotypes or exposures
corresponding to the subject, e.g. $d_1 = 2$ for a simple
case/control study without confounders. The goal is to understand
the associations between $Y_{1i}$ and $z_{1i}$ in terms of
associations between $Y_{0h}$ and $w_h$, i.e. to infer changes in
mixtures of cell types associated with phenotypes or exposures,
using DNA methylation as a surrogate measure of cell mixture. Thus,
there are two data sets: $S_0 = \{(Y_{01}, w_1), \ldots, (Y_{0n_0},
w_{n_0})\}$, the set of data from "purified" cell samples,
effectively representing external validation or gold-standard data;
and $S_1 = \{(Y_{11}, z_{11}), \ldots, (Y_{1n_1}, z_{1n_1})\}$,
representing surrogate data collected from a target population. To
this end, the following linear models are provided:

$$Y_{0h} = B_0 w_h + e_{0h}, \qquad Y_{1i} = \mu_1 + B_1 z_{1i} + e_{1i}, \qquad (1)$$

where $B_0$ and $B_1$ are, respectively, $m \times d_0$ and
$m \times d_1$ matrices, and $e_{0h}$ and $e_{1i}$ are error
vectors. For simplicity, a one-way ANOVA parameterization for $w$ is
assumed. Slight generalizations to account for design complications
met in practice are described in Example 2.
[0259] A reasonable regression parameterization for $z$ is also
assumed, including an intercept, and for convenience, the first
column of $B_0$ is denoted as $\mu_1$, the $m \times 1$ intercept.
The error vectors $e_0$ and $e_1$ may reflect independence among
arrays $h$ and $i$, or else may have a more complex random-effects
structure accounting for technical effects or biological
replication; however, their substructures are incidental to this
analysis, with the exception of the fine details of the bootstrap
procedure proposed below.
[0260] To implement a surrogacy relation, the following linking
regression model is proposed:

$$B_1 = 1_m \gamma_0^T + B_0 \Gamma + U, \qquad (2)$$

where $\Gamma$ is a $d_0 \times d_1$ matrix that summarizes
associations between the rows of $B_0$ and $B_1$, and $U$ is a
matrix of errors. Substituting equation (2) into (1), writing
$B_0 = (b_{01}, \ldots, b_{0d_0})$ explicitly in terms of its
columns and writing $\Gamma^T = (\gamma_1, \ldots, \gamma_{d_0})$, it
follows that

$$Y_{1i} = \sum_{l=1}^{d_0} b_{0l} \left( \gamma_l^T z_{1i} \right) + \left( 1_m \gamma_0^T + U \right) z_{1i} + e_{1i}. \qquad (3)$$

To impart a biological interpretation, it is assumed that the DNA
assayed in $S_1$ arises as a mixture of DNA from the cell types
profiled in $S_0$, with mixture coefficients whose population
averages, conditional on $z$, are
$\{\omega_1^{(z)}, \ldots, \omega_{d_0}^{(z)}\}$, so that

$$E(Y_{1i} \mid z_{1i} = z) = \xi^{(z)} + \sum_{l=1}^{d_0} b_{0l}\, \omega_l^{(z)}, \qquad (4)$$

where the $m \times 1$ vector $\xi^{(z)}$ represents cell types
excluded from consideration among the purified samples in $S_0$, or
else non-cell-specific methylation, including alterations at the
molecular level in the maintenance of DNA methylation patterns
themselves (possibly exposure-, age-, or disease-related). It
follows from (3) and (4) that the mixture coefficients are
recoverable from $\Gamma$, $\omega_l^{(z)} = \gamma_l^T z$, provided
$\xi^{(z)}$ is orthogonal to the column space of $B_0$. As discussed
in detail in Example 3, bias can arise if differences in
$\xi^{(z)}$ between distinct values of $z$ have nonzero projection
onto the column space of $B_0$, although the magnitude of
anticipated biases can be assessed through sensitivity analysis as
shown in Example 11.
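The orthogonality condition attached to (4) can be checked numerically. In the sketch below (synthetic data; function name hypothetical), projecting $E(Y_{1i})$ onto the column space of $B_0$ returns the mixture coefficients exactly when $\xi^{(z)}$ is orthogonal to that column space, and returns biased coefficients otherwise. For simplicity it projects onto $B_0$ alone, without the intercept column.

```python
import numpy as np


def recover_mixture(B0: np.ndarray, expected_y: np.ndarray) -> np.ndarray:
    """Least-squares projection of E(Y_1i) onto the columns of B0 (hypothetical helper)."""
    omega, *_ = np.linalg.lstsq(B0, expected_y, rcond=None)
    return omega


rng = np.random.default_rng(1)
m, d0 = 200, 5
B0 = rng.uniform(0.0, 1.0, size=(m, d0))          # synthetic purified-cell profiles
omega = np.array([0.5, 0.2, 0.15, 0.1, 0.05])      # synthetic mixture coefficients

raw_xi = rng.normal(0.0, 0.05, size=m)             # arbitrary non-cell-specific signal
Q, _ = np.linalg.qr(B0)                            # orthonormal basis of col(B0)
xi_orth = raw_xi - Q @ (Q.T @ raw_xi)              # remove its projection onto col(B0)

# Orthogonal xi: differences from omega are at floating-point precision.
print(recover_mixture(B0, B0 @ omega + xi_orth) - omega)
# Non-orthogonal xi: the projected part of xi is absorbed into the estimates.
print(recover_mixture(B0, B0 @ omega + raw_xi) - omega)
```

The bias in the second case equals the least-squares coefficients of $\xi^{(z)}$ itself on $B_0$, which is why a sensitivity analysis can bound it by positing plausible forms of $\xi^{(z)}$.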
[0261] It is possible to assign interpretations to the components
of variation in (3). $SS_o$ represents the overall variability in
$Y_{1i}$, i.e. $SS_o = \sum_{i=1}^{n_1} \lVert Y_{1i} - \mu_1 \rVert^2$,
where $\mu_1 = E(Y_{1i})$. From multivariate probability theory it
is straightforward to show that $SS_o = SS_e + SS_v + SS_u$, where
$$SS_e = \sum_{i=1}^{n_1} \lVert e_{1i} \rVert^2,$$
$$SS_v = \sum_{i=1}^{n_1} (z_{1i} - \bar z_1)^{T} \Gamma^{T} B_0^{T} B_0 \Gamma (z_{1i} - \bar z_1),$$
$$SS_u = \sum_{i=1}^{n_1} \left\{ (z_{1i} - \bar z_1)^{T} U^{T} U (z_{1i} - \bar z_1) + m\,(z_{1i} - \bar z_1)^{T} \gamma_0 \gamma_0^{T} (z_{1i} - \bar z_1) \right\}.$$
$SS_e$ measures variation unexplained by the covariates $z_{1i}$,
presumed to represent a combination of technical noise and
unsystematic biological heterogeneity. $SS_v$ measures variability
explained by mixtures of profiles in the set $S_0$. $SS_u$ measures
systematic biological heterogeneity that nevertheless remains
unexplained by mixtures of profiles in $S_0$, presumably due to some
process other than differences in mixtures of cell types. Two
partial coefficients of determination are therefore proposed:
$R^2_{1,0} = SS_v/SS_o$, the proportion of total variation in $S_1$
explained by $S_0$, and $R^2_{1,1} = SS_v/(SS_o - SS_e)$, the
proportion of systematic variation in $S_1$ explained by $S_0$.
Note that $R^2_{1,1}$ is poorly defined when $SS_o \approx SS_e$.
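The following is a minimal NumPy sketch of the two partial coefficients of determination. It assumes that the residuals $e_{1i}$ and the matrices $B_0$ and $\Gamma$ are already available from a fitted model, and it uses the sample mean of $Y_{1i}$ as a plug-in for $\mu_1$. The function name `partial_r2` and the `tol` guard are illustrative choices, not part of the described method.

```python
import numpy as np


def partial_r2(Y1, Z1, B0, Gamma, resid, tol=1e-8):
    """Return (R^2_{1,0}, R^2_{1,1}) for a fitted model (illustrative sketch).

    Y1: n1 x m methylation values; Z1: n1 x p covariates;
    B0: m x d0 reference profiles; Gamma: d0 x p mixture effects;
    resid: n1 x m residuals e_1i.
    """
    Zc = Z1 - Z1.mean(axis=0)                    # z_1i - zbar_1
    ss_o = np.sum((Y1 - Y1.mean(axis=0)) ** 2)   # sample mean as plug-in for mu_1
    ss_e = np.sum(resid ** 2)
    explained = Zc @ Gamma.T @ B0.T              # row i: (B0 Gamma (z_1i - zbar_1))^T
    ss_v = np.sum(explained ** 2)
    r2_10 = ss_v / ss_o
    systematic = ss_o - ss_e
    # R^2_{1,1} is poorly defined when SS_o is close to SS_e; return NaN then
    r2_11 = ss_v / systematic if systematic > tol * ss_o else float("nan")
    return r2_10, r2_11
```

When the data are generated exactly by $B_0\Gamma(z_{1i} - \bar z_1)$ plus a constant, both ratios equal 1. When the residuals absorb all of the variation, $R^2_{1,1}$ is reported as undefined.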
[0262] Estimation proceeds by applying an appropriate linear model,
e.g. ordinary least squares, linear mixed effects models (Wang and
Petronis, 2008, DNA Methylation Microarrays: Experimental Design
and Statistical Analysis. Chapman & Hall, Boca Raton, Fla.),
limma (Smyth, 2004, Stat Appl Genet and Mol Biol, 3(1), 3), or
surrogate variable analysis (Teschendorff et al., 2011,
Bioinformatics, 27(11), 1496-505), to obtain estimates $\hat B_0$
and $\hat B_1$. Estimates of $\gamma_0$ and $\Gamma$ are then
obtained by projecting $\hat B_1$ onto the column space of
$\tilde B_0 = (1_m, B_0)$, as described in detail in Example 3.
Standard errors can be obtained in one of three ways. The simplest
estimator, $\mathrm{SE}_0$, is the "naive" estimator from simple
least squares theory. It ignores the fact that $\hat B_0$ and
$\hat B_1$ are estimates, i.e. potentially variable. To account for
variation in estimating $\hat B_1$, a simple alternative is a
nonparametric bootstrap procedure. For each bootstrap iteration
$t$, sampling is performed with replacement from $S_1$ (or errors
are resampled in a manner consistent with a hierarchical
experimental design) to obtain $S_1^{(t)}$. This produces bootstrap
estimates $\hat B_1^{(t)}$, from which "single-bootstrap" standard
errors $\mathrm{SE}_1$ are computed. Finally, variation in
estimating $B_0$ can also be accounted for by bootstrapping $S_0$.
Because sample sizes $n_0$ are potentially small, a parametric
bootstrap is proposed herein. A "double-bootstrap" standard error
estimator, $\mathrm{SE}_2$, is computed from these two sets of
bootstraps. The double bootstrap has an additional benefit over the
single bootstrap: it can be used to assess bias due to measurement
error (variability) in $\hat B_0$. Estimation details are provided
in Example 3.
[0263] Beyond bias due to measurement error, which is easily
corrected using the double-bootstrap procedure, there are
additional sources of potential bias. For example, consider a
univariate $z$ representing case/control status, where
$\delta \equiv \xi^{(1)} - \xi^{(0)} = B_0\alpha$ for some
$d_0 \times 1$ vector $\alpha \neq 0$. In such a situation, there
will be a bias equal to $\alpha$ in estimating the mixture
differences. Example 2 provides a detailed analysis of such biases
and proposes a sensitivity analysis procedure for assessing the
magnitude of possible bias in a given data set.
[0264] In the examples herein, the method for inferring changes in
the distribution of white blood cells between different
subpopulations is used for the analysis of population data. It is
also possible to use $S_0$ to predict the distribution of
leukocytes in a single sample having DNA methylation profile
$Y^*$. Equating the intercept term of $B_1$ in (1) with $Y^*$ and
applying (2) yields the mixing proportion estimates
$\Gamma^* = (\tilde B_0^{T}\tilde B_0)^{-1}\tilde B_0^{T} Y^*$.
These estimates can be further refined with quadratic programming
techniques (Goldfarb and Idnani, 1983, Math Prog, 27, 1-33), which
restrict the components of $\Gamma^*$ to $\gamma_l^* \geq 0$ while
minimizing $\lVert Y^* - B_0\Gamma^* \rVert^2$ with respect to
$\Gamma^*$. Such individual projections of methylation profiles onto
the column space spanned by $S_0$ allow the ideas proposed above to
be applied to individual, clinically based diagnostic procedures.
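The non-negativity-constrained projection can be sketched as follows. The hypothetical `nonneg_projection` helper uses cyclic coordinate descent for non-negative least squares. It stands in for the Goldfarb-Idnani quadratic programming solver cited above: both minimize the same convex objective under $\gamma_l^* \geq 0$, but this is not the solver used in the examples. As in the text, the sum of the components is left unconstrained.

```python
import numpy as np


def nonneg_projection(y_star, B0, max_sweeps=1000, tol=1e-12):
    """Minimize ||y* - B0 g||^2 subject to g >= 0 (illustrative sketch).

    y_star: length-m methylation profile of one sample; B0: m x d0 profiles.
    """
    G = B0.T @ B0                  # Gram matrix B0^T B0
    b = B0.T @ y_star
    g = np.zeros(B0.shape[1])
    for _ in range(max_sweeps):
        largest_step = 0.0
        for l in range(g.size):
            # exact minimizer along coordinate l, then clipped at zero
            new = max(0.0, g[l] - (G[l] @ g - b[l]) / G[l, l])
            largest_step = max(largest_step, abs(new - g[l]))
            g[l] = new
        if largest_step < tol:     # converged: no coordinate moved appreciably
            break
    return g
```

Each coordinate update is the exact one-dimensional minimizer, clipped at zero. This is why the iteration cannot increase the objective and converges for a full-rank $B_0$.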
[0265] Note that DNA methylation arrays typically compare
methylated to unmethylated CpG dinucleotides rather than
quantifying actual amounts of DNA. Information on cell mixtures
from DNA methylation is therefore limited to distributions, not the
actual counts one might obtain from flow cytometry. In addition,
it is possible to model $z_{1i}$ directly as a function of the
mixture coefficients $\Gamma^*$ obtained individually under the
constraint $\gamma_l^* \geq 0$.
Example 2
General Designs for the Treatment of Methylation Assay Data
Obtained from Purified Cells $S_0$
[0266] Because the cell types assembled in $S_0$ potentially
involve hierarchical relationships corresponding to cell lineage,
designs more general than a one-way ANOVA parameterization may be
necessary for $w$. Suppose cell-type interpretations can be
extracted from $S_0$ via a $d_0 \times d_0^*$ contrast matrix $L$,
i.e. $B_0 L$ identifies the mean methylation for $d_0^*$ cell types.
Interpretations can then be obtained by simply replacing $\hat B_0$
with $\hat B_0 L$ in the projection used to estimate $\gamma_0$ and
$\Gamma$ and their standard errors. As an example, consider CD4+
and CD8+ T cells, which are the primary components of the
T-lymphocyte group. Suppose one sample is purified CD4+ T cells,
another is purified CD8+ T cells, and a third is T-lymphocyte cells
that have not been purified to more specific lineages. Such was the
case for $S_0$ in the examples. The CD4+ sample may be identified as
$w_{0h} = (1,1,0)^T$, the CD8+ sample as $w_{0h} = (1,0,1)^T$, and
the less specific sample as $w_{0h} = (1,0,0)^T$. An appropriate
contrast $L$ for identifying the CD4+ and CD8+ samples is then the
$3 \times 2$ matrix with columns $(1,1,0)^T$ and $(1,0,1)^T$. This
approach was used in Examples 6-9 below and was also employed in
the simulations.
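To illustrate the contrast, the following sketch uses made-up coefficients for two CpG sites. The columns of $B_0$ are assumed to be a pan-T baseline, a CD4+ offset, and a CD8+ offset, so that $B_0 L$ returns the CD4+ and CD8+ mean methylation.

```python
import numpy as np

# Hypothetical B0 for two CpG sites (rows); columns: pan-T baseline,
# CD4+ offset, CD8+ offset, matching the design vectors w_0h in the text.
B0 = np.array([[0.20, 0.50, -0.10],
               [0.70, -0.30, 0.10]])

# Contrast L: its columns are the design vectors of the CD4+ and CD8+ samples.
L = np.array([[1, 1],
              [1, 0],
              [0, 1]])

cell_type_means = B0 @ L   # column 0: CD4+ mean, column 1: CD8+ mean
print(cell_type_means)
```
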
Example 3
Estimation Details and Bias
[0267] Estimation:
[0268] A two-stage estimation procedure is introduced here. The
first stage of analysis estimates $B_0$ and $B_1$ by appropriate
linear models. Options include the ordinary least squares (OLS)
regression estimator
$\hat B_0^{T} = \bigl[\sum_{h=1}^{n_0} z_{0h} z_{0h}^{T}\bigr]^{-1}\bigl[\sum_{h=1}^{n_0} z_{0h} Y_{0h}^{T}\bigr]$
with a similar estimator for $(\hat\mu_1, \hat B_1)^{T}$; a
procedure such as limma; or locus-by-locus linear mixed effects
models that adjust for technical (e.g. chip) effects. The second
stage of analysis, estimation of $\gamma_0$ and $\Gamma$, proceeds
as follows:
$$(\hat\gamma_0, \hat\Gamma^{T}) = \hat B_1^{T}\tilde B_0(\tilde B_0^{T}\tilde B_0)^{-1}, \qquad (5)$$
where $\tilde B_0 = (1_m, \hat B_0)$. Let
$\hat r_\gamma = \hat B_1 - 1_m\hat\gamma_0^{T} - \hat B_0\hat\Gamma$,
$\hat\Sigma \equiv (\hat\sigma_{rs}^{(\gamma)})_{rs} = (m - d_0 - 1)^{-1}\hat r_\gamma^{T}\hat r_\gamma$,
and $V_0 = m(\tilde B_0^{T}\tilde B_0)^{-1}$ with elements
$V_0 = (v_{rs}^{(0)})_{rs}$. Naive standard error estimates for the
$(r,s)$th element of $(\hat\gamma_0, \hat\Gamma^{T})$ can be
obtained by computing
$(m^{-1} v_{ss}^{(0)}\hat\sigma_{rr}^{(\gamma)})^{1/2}$. The naive
standard error estimates fail to account for the variability in
estimating $\hat B_0$ and $\hat B_1$. They are consequently biased,
as demonstrated in the simulations of Example 12.
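The following sketch covers the second-stage projection (5) and the naive standard errors. It assumes $\hat B_1$ is an $m \times d_1$ array (one column per covariate in $z$) and $\hat B_0$ is $m \times d_0$. The name `project_onto_reference` is hypothetical. For the point estimate, the sketch solves the normal equations rather than forming the inverse explicitly.

```python
import numpy as np


def project_onto_reference(B1_hat, B0_hat):
    """Second-stage estimator (5) plus naive standard errors (sketch).

    B1_hat: m x d1; B0_hat: m x d0.
    Returns (coef, se), both d1 x (d0 + 1): column 0 is gamma0_hat,
    columns 1..d0 are Gamma_hat^T.
    """
    m, d0 = B0_hat.shape
    B0_tilde = np.column_stack([np.ones(m), B0_hat])      # (1_m, B0_hat)
    H = B0_tilde.T @ B0_tilde
    coef = np.linalg.solve(H, B0_tilde.T @ B1_hat).T       # B1^T B~0 (B~0^T B~0)^{-1}
    resid = B1_hat - B0_tilde @ coef.T                     # r_gamma, m x d1
    Sigma = resid.T @ resid / (m - d0 - 1)                 # (sigma_rs^(gamma))
    V0 = m * np.linalg.inv(H)                              # (v_rs^(0))
    # (r, s) element: sqrt(m^{-1} * v_ss^(0) * sigma_rr^(gamma))
    se = np.sqrt(np.outer(np.diag(Sigma), np.diag(V0)) / m)
    return coef, se
```
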
[0269] A nonparametric bootstrap procedure is used as an
alternative. For each bootstrap iteration $t$, $S_1$ is sampled
with replacement (or errors are resampled in a manner consistent
with a hierarchical experimental design, e.g. taking chip effects
into account) to obtain $S_1^{(t)}$. From $S_1^{(t)}$ an estimate
$\hat B_1^{(t)}$ is obtained. Then $\hat\gamma_0^{(t)}$ and
$\hat\Gamma^{(t)}$ are computed by replacing $\hat B_1$ with
$\hat B_1^{(t)}$ in (5). After resampling a large number $T$ of
times, standard errors are obtained empirically from the bootstrap
sets $\{\hat\gamma_0^{(t)}\}_{t=1,\ldots,T}$ and
$\{\hat\Gamma^{(t)}\}_{t=1,\ldots,T}$. This method of estimation is
called the "single bootstrap" to distinguish it from an alternative
that also accounts for variability in the estimation of $\hat B_0$.
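The single bootstrap can be sketched as below under two simplifying assumptions. First, the first stage is plain OLS of each CpG on an intercept and $z_{1i}$, with no chip random effects. Second, subjects are resampled independently rather than according to a hierarchical design. The helper names are hypothetical, and the default of 250 iterations mirrors the number used in the examples.

```python
import numpy as np


def project(B1_hat, B0_hat):
    """(gamma0_hat, Gamma_hat^T) from (5); returns a d1 x (d0 + 1) array."""
    B0_tilde = np.column_stack([np.ones(B0_hat.shape[0]), B0_hat])
    return np.linalg.solve(B0_tilde.T @ B0_tilde, B0_tilde.T @ B1_hat).T


def fit_B1(Y1, Z1):
    """First stage by OLS per CpG: the m x d1 slope matrix (intercept dropped)."""
    X = np.column_stack([np.ones(Z1.shape[0]), Z1])
    coef, *_ = np.linalg.lstsq(X, Y1, rcond=None)          # (1 + d1) x m
    return coef[1:].T


def single_bootstrap_se(Y1, Z1, B0_hat, n_boot=250, seed=0):
    """Resample subjects of S1 with replacement; B0_hat is held fixed."""
    rng = np.random.default_rng(seed)
    n1 = Y1.shape[0]
    reps = []
    for _ in range(n_boot):
        idx = rng.integers(0, n1, size=n1)                  # bootstrap sample S1^(t)
        reps.append(project(fit_B1(Y1[idx], Z1[idx]), B0_hat))
    reps = np.stack(reps)                                   # n_boot x d1 x (d0 + 1)
    return reps.std(axis=0, ddof=1), reps
```

Keeping the replicates, not only their standard deviation, is deliberate: the same array can be reused for the bootstrap bias estimate.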
[0270] Because $S_0$ will typically consist of small sample sizes
per cell type, a nonparametric bootstrap procedure for estimating
variation in $\hat B_0$ may not perform well. A parametric bootstrap
is therefore used. Let $\Omega_j$ be the variance-covariance matrix
for the $j$th row of $\hat B_0$. A resampled matrix $\hat B_0^{(t)}$
is formed by adding, to each row $j$ of $\hat B_0$, a zero-mean
multivariate normal vector with variance-covariance $\Omega_j$, or a
corresponding multivariate $t$-distributed vector with $n_0 - d_0$
degrees of freedom. Then $\hat\gamma_0^{(t)}$ and $\hat\Gamma^{(t)}$
are computed from (5) by replacing $\hat B_0$ with $\hat B_0^{(t)}$,
in addition to the previously mentioned replacement of $\hat B_1$.
This method is referred to as the "double bootstrap". The double
bootstrap ignores correlation between CpG sites within a single
validation sample. Given the relative purity assumed for these
samples and adequate correction for technical effects, this is
reasonable to first order. As demonstrated in Examples 6-9 and in
the simulations (Example 10), there is negligible difference
between the single and double bootstrap. Incorporating additional
complexity to model cross-CpG correlations is therefore unlikely to
produce much benefit. However, the double bootstrap has an
additional benefit over the single bootstrap: it can be used to
assess bias due to measurement error (variability) in $\hat B_0$.
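The parametric perturbation of $\hat B_0$ used by the double bootstrap can be sketched as follows, assuming the per-row covariance matrices $\Omega_j$ are supplied as a stacked array. The multivariate $t$ option scales a normal draw by $\sqrt{\nu/W}$ with $W \sim \chi^2_\nu$, which is the standard construction. The name `perturb_B0` is hypothetical.

```python
import numpy as np


def perturb_B0(B0_hat, Omegas, rng, df=None):
    """Return B0_hat^(t): each row j plus zero-mean noise with covariance Omegas[j].

    df=None draws multivariate normal noise; an integer df (e.g. n0 - d0)
    draws multivariate t noise with that many degrees of freedom.
    """
    m, d0 = B0_hat.shape
    out = B0_hat.astype(float).copy()
    for j in range(m):
        chol = np.linalg.cholesky(Omegas[j])           # Omega_j = chol @ chol.T
        noise = chol @ rng.standard_normal(d0)         # N(0, Omega_j)
        if df is not None:
            noise *= np.sqrt(df / rng.chisquare(df))   # multivariate t scaling
        out[j] += noise
    return out
```

In a double-bootstrap loop, each iteration would call this function and pass the result, in place of $\hat B_0$, to the projection (5), together with the resampled $\hat B_1^{(t)}$.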
Bias:
[0271] There are several potential sources of bias in this
analysis. The first arises from measurement error in $B_0$, and the
others arise from biological non-orthogonality.
[0272] It can be shown that the first form of bias, from
measurement error, manifests as a multiple of $\Gamma$ on the order
of $V_0\Omega$, where $\Omega = m^{-1}\sum_{j=1}^{m}\Omega_j$. This
bias is easily assessed using the double-bootstrap procedure
described above: subtract $\hat\gamma_0$ from
$T^{-1}\sum_{t=1}^{T}\hat\gamma_0^{(t)}$, and subtract $\hat\Gamma$
from $T^{-1}\sum_{t=1}^{T}\hat\Gamma^{(t)}$. Bias correction can
then be implemented by subtracting this term from the estimate.
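The bias estimate and correction described above amount to the usual bootstrap bias correction, $\hat\theta - (\bar\theta^{*} - \hat\theta) = 2\hat\theta - \bar\theta^{*}$. Here is a minimal sketch, assuming the double-bootstrap replicates have already been stacked along the first axis. The function name is hypothetical.

```python
import numpy as np


def bootstrap_bias_correct(estimate, replicates):
    """Return (bias, corrected) for an estimate and its double-bootstrap replicates.

    replicates: array whose first axis indexes t = 1..T.
    bias = T^{-1} sum_t theta^(t) - theta_hat; corrected = theta_hat - bias.
    """
    estimate = np.asarray(estimate, dtype=float)
    bias = np.mean(replicates, axis=0) - estimate
    return bias, estimate - bias
```
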
[0273] Biases induced by biological non-orthogonality are more
insidious. For example, consider a univariate $z_{1i}$ representing
case/control status, where
$\delta = \xi^{(1)} - \xi^{(0)} = B_0\alpha$ for some $d_0 \times 1$
vector $\alpha \neq 0$. In such a situation, there will be a bias
equal to $\alpha$ in estimating the mixture differences.
Non-orthogonal $\delta$ may arise from two distinct sources. One
occurs when some cell types have not been profiled in $S_0$, so that
$\sum_{l=1}^{d_0}\omega_l^{(z)} < 1$. The other may arise when some
non-cell-mediated biological process (i.e. one distinct from a
change in cellular mixtures) nevertheless results in methylation
profiles that appear similar to those that distinguish the cell
types profiled in $S_0$. To this end, the model represented by
equation (4) is elaborated as follows:
$$E(Y_{1i} \mid z_{1i} = z) = \sum_{l=1}^{d_0}\bigl(B_{0l} + \lambda_l^{(z)}\bigr)\omega_l^{(z)} + \sum_{q=1}^{Q}\bigl(\tilde\mu_q + \tilde\lambda_q^{(z)}\bigr)\tilde\omega_q^{(z)}, \qquad (6)$$
where $q \in \{1,\ldots,Q\}$ indexes unprofiled cell types (or free
DNA), each with methylation profile $\tilde\mu_q$. The mixture
proportions $\omega_l^{(z)}$ and $\tilde\omega_q^{(z)}$ satisfy
$\sum_{l=1}^{d_0}\omega_l^{(z)} + \sum_{q=1}^{Q}\tilde\omega_q^{(z)} = 1$.
Here $\lambda^{(z)}$ denotes an "abnormal", or at least
non-functional, non-cell-mediated process that is specific to
disease status and may affect different cell types to different
degrees.
[0274] Let $P = (\tilde B_0^{T}\tilde B_0)^{-1}\tilde B_0^{T}$, and
denote the difference between case and control parameters using
$\Delta$, e.g. $\Delta\omega_l = \omega_l^{(1)} - \omega_l^{(0)}$
and
$\Delta E(Y_{1i}) = E(Y_{1i} \mid z_{1i} = 1) - E(Y_{1i} \mid z_{1i} = 0)$.
It follows from equation (6) that
$$P\,\Delta E(Y_{1i}) = \sum_{l=1}^{d_0}\mathbf{e}_l\,\Delta\omega_l + \sum_{q=1}^{Q}P\tilde\mu_q\,\Delta\tilde\omega_q + \sum_{l=1}^{d_0}P\,\Delta(\lambda_l\omega_l) + \sum_{q=1}^{Q}P\,\Delta(\tilde\lambda_q\tilde\omega_q), \qquad (7)$$
where $\mathbf{e}_l = P B_{0l}$ is the unit vector that selects the
coefficient of column $l$ of $B_0$, since $P\tilde B_0$ is the
identity. The values $\Delta\tilde\omega_q$ may need to shift to
accommodate any shifts in $\Delta\omega_l$, because the model
constrains
$\sum_{l=1}^{d_0}\Delta\omega_l + \sum_{q=1}^{Q}\Delta\tilde\omega_q = 0$.
The first term on the right-hand side of (7) is the target quantity,
identifying the desired mixture weights. The second term will be
negligible if the profiles $\tilde\mu_q$ are approximately
orthogonal to the columns of $B_0$, or else if the differences
$\Delta\tilde\omega_q$ are small. This condition will be satisfied
if $S_0$ is exhaustive in the sense that
$1 - \sum_{l=1}^{d_0}\omega_l^{(z)}$ is negligible.
[0275] Mathematically, it is difficult to characterize the latter
two terms further without specifying what kinds of
non-cell-mediated processes are likely. For example, even if
$\Delta\tilde\lambda_q = 0$ for a particular value of $q$, the
process may still produce a bias if $\Delta\tilde\omega_q \neq 0$.
Conversely, even if $\Delta\omega_l = 0$, bias can result from a
nonzero difference $\Delta\lambda_l$ (e.g. different methylation
intensities at island shores due to distinct risk profiles) if
$\Delta\lambda_l$ is not annihilated by $P$. Only processes that are
equal in intensity in both cases and controls, and across cell
types, will be differenced out of equation (7). A key
consideration, therefore, is whether $P$ annihilates the
methylation signature corresponding to a given non-cell-mediated
biological process. To examine this issue more carefully, a
Bayesian view is adopted to characterize the prior expectation of
bias as a function of prior probabilities for individual CpG sites.
The goal, in part, is to understand the potential for bias given
the number $m$ of CpG sites chosen to be measured in $S_0$, so that
$m$ can be selected in a manner consistent with minimizing bias.
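The role of $P$ can be checked numerically. In this sketch, simulated profiles stand in for $B_0$. A non-cell-mediated shift that lies in the column space of $B_0$ ($\delta = B_0\alpha$) is read by $P$ as a spurious mixture change $\alpha$. A shift orthogonal to the columns of $\tilde B_0$ is annihilated. All numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
m, d0 = 100, 3
B0 = rng.uniform(size=(m, d0))                      # simulated reference profiles
B0_tilde = np.column_stack([np.ones(m), B0])
P = np.linalg.solve(B0_tilde.T @ B0_tilde, B0_tilde.T)   # (B~0^T B~0)^{-1} B~0^T

alpha = np.array([0.02, -0.01, 0.0])
delta_aligned = B0 @ alpha           # shift that mimics a change in cell mixture
bias_aligned = P @ delta_aligned     # equals (0, alpha): a spurious mixture shift

v = rng.normal(size=m)
delta_orth = v - B0_tilde @ (P @ v)  # residual of v: orthogonal to every column of B~0
bias_orth = P @ delta_orth           # numerically zero: P annihilates this shift
```
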
[0276] Assume that the CpGs under consideration are ordered in
advance, e.g. randomly or by the F-statistic
$F_j = d_0^{-1}\hat B_{0j\cdot}\Omega_j^{-1}\hat B_{0j\cdot}^{T}$.
Write $H_m = \tilde B_0^{T}\tilde B_0$ to make its dependence on $m$
explicit. If the CpGs are randomly ordered, then
$\operatorname{tr}H_m = O(m)$. Otherwise it is possible that
$\operatorname{tr}H_m = O(m^{1-\zeta})$, $\zeta > 0$, reflecting a
diminishing rate of return from adding additional non-informative
CpG sites. Then
$\delta = \sum_{l=1}^{d_0}P\,\Delta(\lambda_l\omega_l) + \sum_{q=1}^{Q}P\,\Delta(\tilde\lambda_q\tilde\omega_q)$
is decomposed by the number $k$ of CpG sites affected by
alterations that distinguish cases from controls. Fix
$k \in J_m = \{1,\ldots,m\}$. Each of the $C(m,k) = m!/[k!(m-k)!]$
subsets $J_{kl} \subset J_m$ of $k$ indices corresponds to a vector
$\delta_{kl}$. This vector represents the mean methylation
difference between cases and controls over systematic biological
processes that result in changes at the $k$ specific CpG sites
represented by those indices, and only at those $k$ CpG sites.
Thus $\delta_{kl}$ has at most $k$ nonzero values. The bias
resulting from such processes is
$H_m^{-1}\tilde B_0^{T}\delta_{kl} = O(k\,m^{\zeta-1})$. Assign a
prior probability $\pi_{kl}$ that the subset $J_{kl}$ corresponds to
one or more biological processes that distinguish cases from
controls. It follows from this view that the prior expectation of
$\delta$ is
$$E\bigl[\delta \mid (\pi_{kl})_{kl}\bigr] = \sum_{k=1}^{m}\sum_{l=1}^{C(m,k)}\pi_{kl}\,P\delta_{kl} = O\Bigl(\sum_{k=1}^{m}\sum_{l=1}^{C(m,k)}\pi_{kl}\,k\,m^{\zeta-1}\Bigr). \qquad (8)$$
Suppose a prior probability over sets of CpG sites in the genome is
constructed so that CpG sites are independent and each CpG site is
assigned a uniform prior probability $\pi_0$. Then
$\pi_{kl} \equiv \pi_0^{k}(1-\pi_0)^{m-k}$ and, from (8) together
with $k\,C(m,k) = m\,C(m-1,k-1)$,
$$E(\delta \mid \pi_0) = O\Bigl(m^{\zeta}\sum_{k=1}^{m}C(m-1,k-1)\,\pi_0^{k}(1-\pi_0)^{m-k}\Bigr) = \pi_0\,O(m^{\zeta}), \qquad (9)$$
because the sum is $\pi_0$ times a complete binomial expansion over
$m-1$ trials, i.e. equals $\pi_0$. The bias does not depend on $m$
if $\operatorname{tr}H_m = O(m)$, i.e. under random ordering.
Although random ordering renders the size of $E(\delta \mid \pi_0)$
theoretically independent of $m$, it does so at a cost: many
potentially non-informative CpGs are included early on, at low
values of $m$. These may be sources of bias in practice, without
offering any modeling benefit in return. If the CpG sites are
ordered by level of informativeness, then potentially
$\operatorname{tr}H_m = O(m^{1-\zeta})$. In that case the prior
expectation of bias increases slowly with $m$, which motivates a
judicious choice of $m$. The key, then, is to order the CpGs by
their ability to distinguish the cell types profiled in $S_0$.
Choose $m$ large enough to distinguish the signatures from one
another, but small enough that $E(\delta \mid \pi_0)$ is reasonably
low in a relative sense. Naturally, different choices of prior
$\pi_{kl}$ in (8) will lead to different conclusions about the
magnitude of bias. Suppose the set $J_m$ of CpG sites used in $S_0$
and $S_1$ oversamples sites known to have less modifiable
methylation states, e.g. away from so-called shore regions (Doi A et
al., 2009, Nat Genet. 41: 1350-3). Then $\pi_0$ is effectively
lowered, and so is the corresponding expected prior bias. It is
worth emphasizing that this analysis concerns only a Bayesian prior,
not the actual biological truth. In choosing CpG sites among those
assayed in $S_0$ and $S_1$, one potentially negative outcome would
be to include a number of sites that also happen to represent
systematic, non-cell-mediated biological differences between cases
and controls in $S_1$. Biased estimates would then be inevitable.
In summary, bias in the proposed estimation procedure is controlled
by selecting a sufficiently exhaustive list of cell types to profile
in $S_0$ and by choosing $m$ judiciously.
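The step from (8) to (9) rests on two facts: the identity $k\,C(m,k) = m\,C(m-1,k-1)$, and the binomial sum $\sum_{k=1}^{m}C(m-1,k-1)\pi_0^k(1-\pi_0)^{m-k} = \pi_0$. The short Python check below, whose function names are illustrative, confirms that the prior-weighted bias order equals $\pi_0 m^{\zeta}$ when the constant hidden in $O(\cdot)$ is set to 1.

```python
from math import comb


def binomial_weight(m, pi0):
    """sum_{k=1}^m C(m-1, k-1) pi0^k (1 - pi0)^(m - k); equals pi0 for every m."""
    return sum(comb(m - 1, k - 1) * pi0**k * (1 - pi0) ** (m - k)
               for k in range(1, m + 1))


def prior_bias_order(m, pi0, zeta):
    """Right-hand side of (8) with pi_kl = pi0^k (1 - pi0)^(m - k) and constant 1.

    Each of the C(m, k) subsets of size k contributes k * m^(zeta - 1).
    """
    total = sum(comb(m, k) * k * pi0**k * (1 - pi0) ** (m - k)
                for k in range(1, m + 1))
    return total * m ** (zeta - 1)
```

With $\zeta = 0$ (random ordering), `prior_bias_order` returns $\pi_0$ for every $m$, which is the $m$-independence noted above.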
Example 4
Proof of Concept of Measurement Error Model for Determining Changes
in Distribution of White Blood Cells Between Different
Subpopulations
[0277] This example describes general features of the method
herein that can be used with existing methylation data sets as
benchmarks, both to validate the proposed method and to demonstrate
its clinical or epidemiological utility. Examples 6-9 that follow
show the application of the method to specific data sets. The data
analyses involve DNA methylation data obtained with Infinium
HumanMethylation27 BeadChip microarrays from Illumina, Inc. (San
Diego, Calif.). A subset of $m = 100$ CpG sites on the array was
used, selected as described below. In Examples 6-9, $S_0$ consisted
of 46 white blood cell samples. The sorted, normal, human peripheral
blood leukocyte subtypes were purchased from AllCells®, LLC
(Emeryville, Calif.). They were isolated from whole blood using a
combination of negative and positive selection with highly specific
cell surface antibodies conjugated to magnetic beads; materials and
protocols were obtained from Miltenyi Biotec, Inc. (Auburn, Calif.).
These 46 samples are summarized in Table 2 and depicted by the
clustering heatmap in FIG. 1. T lymphocytes that express CD4 or CD8
constitute over 95% of the T cell class. The pan-T cell type was
further refined into CD4+, CD8+, and "other" pan-T cell subtypes.
[0278] In summary, the covariate vector $w_h$ consisted of
indicators for five cell types and two additional indicators for
the CD4+ and CD8+ T cell subtypes. A generalization of the one-way
ANOVA parameterization assumed above for $w_h$ (Example 2) was
necessary to account for the ambiguous status of some pan-T cells.
For each CpG site, a linear mixed effects model with a random
intercept for bead chip was used to estimate the corresponding row
of $B_0$ and $B_1$. To assist in estimating chip effects, 27
additional whole blood control samples (replicates from the same
individual) were included. Without them, the data set would have
been sparse enough to risk confounding between cell type and chip.
These "array controls" were indicated with an additional term in
$w_{0h}$.
[0279] From $S_0$, F-statistics were computed and used to order
each of the 26,486 autosomal CpGs by decreasing level of
informativeness with respect to blood cell types. FIG. 5A plots
$\log_{10}\operatorname{tr}H_m$ against $\log_{10} m$ for increasing
array sizes. FIG. 5B plots
$\partial\log_{10}\operatorname{tr}(H_m)/\partial\log_{10}(m)$
against $\log_{10}(m)$ for increasing array sizes; this curve was
obtained by smoothing the first differences of the curve in FIG. 5A
with a loess smoother. FIG. 5A also shows the tangent (obtained from
the loess curve) at low values of $m$. Under $O(m)$ convergence,
FIG. 5A would show a linear association with slope equal to one,
and the curve in FIG. 5B would stay close to 1.0. Neither is the
case, i.e. convergence is sub-linear in $m$. Note that the rate of
convergence dropped precipitously after about 6,000 CpG sites, but
was already notably slower than $O(m)$ after $m = 10$. In the range
of 1-1000 CpG sites, the convergence rate appeared parabolic, with
a minimum of about 0.85, and started to stabilize in the
$m = 100$-$300$ range. Thus, maximum informativeness was provided by
the highest-ranking $m = 100$-$300$ CpG sites, with $m > 300$
reflecting diminishing returns from adding additional CpGs. A
moderately low value of $m$ in this range, $m = 100$, consistent
with the size of a small custom microarray chip, was therefore
chosen.
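The ordering and the trace curve behind FIG. 5 can be sketched as below, with two simplifying assumptions. First, the slopes are raw first differences of $\log_{10}\operatorname{tr}H_m$ against $\log_{10} m$, with no loess smoothing. Second, $\operatorname{tr}H_m$ is computed from the identity $\operatorname{tr}(\tilde B_0^T\tilde B_0) = m + \sum_{j\le m}\lVert\hat B_{0j\cdot}\rVert^2$. The function names are hypothetical.

```python
import numpy as np


def f_statistics(B0_hat, Omegas):
    """F_j = d0^{-1} B0j Omega_j^{-1} B0j^T for each CpG row j."""
    d0 = B0_hat.shape[1]
    return np.array([b @ np.linalg.solve(Om, b) / d0
                     for b, Om in zip(B0_hat, Omegas)])


def trace_curve(B0_ordered):
    """tr(H_m) for m = 1..M, using the first m rows of the ordered B0_hat.

    tr(B~0^T B~0) is the sum of squared entries of (1_m, B0): m + sum_j ||B0j||^2.
    """
    return np.cumsum(1.0 + np.sum(B0_ordered ** 2, axis=1))


def log_log_slopes(tr):
    """First differences of log10 tr(H_m) with respect to log10 m (no smoothing)."""
    m = np.arange(1, tr.size + 1)
    return np.diff(np.log10(tr)) / np.diff(np.log10(m))
```

Suppose rows are sorted by decreasing squared norm, which coincides with F-ordering when every $\Omega_j$ is the same multiple of the identity. Then every slope is at most 1, which is the sub-linear behaviour described above. Rows of equal informativeness give a slope of exactly 1.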
TABLE 2
Sorted white blood cells in $S_0$
Short name | Description | Number
B cells | CD19+ B-lymphocytes | 6
Granulocytes | CD15+ granulocytes | 8
Monocytes | CD14+ monocytes | 5
NK | CD56+ natural killer (NK) cells | 11
T cells (CD4+)$^{1,2}$ | CD3+CD4+ T-lymphocytes | 8
T cells (CD8+)$^{1,3}$ | CD3+CD8+ T-lymphocytes | 2
T cells (NKT)$^{1}$ | CD3+CD56+ natural killer T cells | 1
T cells (other)$^{1}$ | CD3+ T-lymphocytes | 5
$^{1}$Considered a member of the "pan-T cell" group.
$^{2}$Pan-T cell further refined as also belonging to the "CD4+" group.
$^{3}$Pan-T cell further refined as also belonging to the "CD8+" group.
Example 5
Cell Mixture Experiment for Validating the Method for Determining
Changes in Distribution of White Blood Cells Between Different
Subpopulations
[0280] This example describes a laboratory reconstruction
experiment. The experiment validates the concept on which the
method herein is based, namely that DNA methylation retains
substantial information about cell mixtures. The results of
applying the method herein to several different target data sets
$S_1$ are described in Examples 6-9.
[0281] For the HNSCC and ovarian cancer data sets, for which bead
chip data were available, a linear mixed effects model with a
random intercept for bead chip was used to estimate the
corresponding row of $B_1$. No bead chip data were available for
the remaining data sets, so ordinary least squares was used for
them. 250 bootstrap iterations were used for each example and for
each of the two bootstrap methods of standard error estimation.
[0282] An experiment was conducted involving six known mixtures of
monocytes and B cells and six known mixtures of granulocytes and T
cells. FIG. 2 presents both the known fractions ("Expected") and the
resulting predictions ("Observed") from Infinium 27K profiles,
obtained as described above. As FIG. 2 shows, predictions are
accurate to within 10 percentage points, and often to within 5.
The largest errors occur for granulocytes, as shown in Table 3.
Note that the sum of the observed predictions for each individual
profile ranged from 98.9% to 102.7%, even though the projection
does not explicitly constrain the sum to 100%. This provides
additional evidence that the DNA methylation profile captures
information about cell mixtures.
TABLE 3
Summary statistics for errors in cell mixture reconstruction
Results* | B cell | Granulocyte | Monocyte | NK | T cell
Minimum | 0.0 | 0.3 | 0.0 | 0.0 | 0.0
Median | 0.1 | 6.5 | 1.1 | 2.1 | 0.3
Maximum | 5.5 | 10.0 | 4.1 | 6.4 | 5.3
*|Observed % - Expected %|
Example 6
Application of the Methods Herein to the Subpopulations of Head and
Neck Cancer Patients and Controls
[0283] This example describes the application of the method herein
for determining changes in the distribution of white blood cells
between different subpopulations to patients with head and neck
squamous cell carcinoma (HNSCC). The target data set $S_1$ was
obtained from arrays applied to whole blood specimens. These were
collected from a random subset of individuals in an ongoing
population-based case-control study of HNSCC (Peters et al., 2005,
Cancer Epidemiol Biomarkers Prev, 14(2), 476-82): 92 cases and 92
age- and sex-matched controls. Blood was drawn at enrollment (prior
to treatment in 85% of the cases). The mean age of the arrayed
subjects was 60 years, and there were 56 females and 128 males,
consistent with the higher incidence of the disease in men. The
covariate vector $z$ therefore consisted of an indicator for
case/control status, an indicator for male sex, and age (in
decades) centered at the mean. The clustering heatmap in FIG. 3
depicts the raw DNA methylation data in $S_1$. Table 4 presents the
coefficient estimates for case status and the double-bootstrap bias
estimates (estimates of bias arising from measurement error). It
also presents the naive, single-bootstrap, and double-bootstrap
standard error estimates. Each of these quantities is measured in
percentage points (%). Estimates of bias arising from measurement
error (i.e. from substituting estimated quantities for known ones
in a two-stage statistical procedure) were almost always less than
half a percentage point. For significant coefficient estimates,
they were always towards the null.
[0284] The proportion of CD4+ T-lymphocytes decreased in cases
compared with controls, with a bias-corrected estimate of -10.4
percentage points and approximate 95% confidence interval (-13.1%,
-7.7%). The proportion of NK cells decreased, with a bias-corrected
estimate of -1.5 percentage points and 95% confidence interval
(-2.2%, -0.75%). The proportion of granulocytes increased, with a
bias-corrected estimate of 7.6 percentage points and 95% confidence
interval (4.2%, 10.9%). There was also some evidence of an increase
in CD8+ T-lymphocytes, with an estimate of 4.5 percentage points and
95% confidence interval (2.0%, 7.0%). As shown in Table 5, the
proportion of CD4+ T-lymphocytes decreased by 3.3 percentage points
(-4.4%, -2.2%) per decade of age, and the proportion of CD8+
T-lymphocytes increased by 2.0 percentage points (1.0%, 3.0%) per
decade. The other coefficients were not significant.
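Under a normal approximation, the bias-corrected estimates, 95% intervals, and P-values can be recomputed from the Table 4 entries. In this reading, the bias-corrected estimate is Est minus $\mathrm{Bias}_2$, and the interval is that value plus or minus 1.96 $\mathrm{SE}_2$. The two-sided P-value is computed from the uncorrected Est divided by $\mathrm{SE}_2$, which matches the P-values printed in Table 4. This sketch is a reading of the tables, not code from the original analysis.

```python
from math import erfc, sqrt

# (Est, Bias_2, SE_2) in percentage points, copied from Table 4 (case vs. control)
table4 = {
    "Granulocyte": (7.51, -0.07, 1.71),
    "NK": (-1.43, 0.06, 0.38),
    "T cell (CD4+)": (-9.08, 1.32, 1.39),
    "T cell (CD8+)": (3.06, -1.46, 1.27),
}


def summarize(est, bias, se, z=1.96):
    """Bias-corrected estimate, normal-approximation 95% CI, two-sided P-value."""
    corrected = est - bias                   # subtract the double-bootstrap bias
    ci = (corrected - z * se, corrected + z * se)
    p = erfc(abs(est / se) / sqrt(2))        # two-sided normal P-value on Est / SE_2
    return corrected, ci, p


for name, row in table4.items():
    corrected, (lo, hi), p = summarize(*row)
    print(f"{name}: {corrected:.1f} ({lo:.1f}, {hi:.1f}), p = {p:.2g}")
```

The same arithmetic applied to the CD4+ age row of Table 5 (-2.75, 0.56, 0.57) gives -3.3 (-4.4, -2.2).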
[0285] For this analysis, $R^2_{1,0}$ was estimated at 14.2% and
$R^2_{1,1}$ at 93.9%. Thus, a small but non-negligible proportion
of total variation (systematic variation + unexplained biological
heterogeneity + technical noise) appeared to be driven by changes in
cell population between cases and controls and as a result of
aging. $SS_e$ comprised 85% of total variation, so a substantial
portion of the variability in DNA methylation appeared to remain
unexplained, presumably due in large part to technical noise.
However, the systematic variation was almost entirely (93.9%)
explained by changes in cell population.
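Dividing the two partial coefficients of determination gives $R^2_{1,0}/R^2_{1,1} = (SS_o - SS_e)/SS_o$. The share of total variation left in $SS_e$ is therefore $1 - R^2_{1,0}/R^2_{1,1}$. With the HNSCC estimates, this reproduces the 85% figure. The helper name is illustrative.

```python
def unexplained_fraction(r2_10, r2_11):
    """SS_e / SS_o implied by the two partial R^2 values: 1 - R^2_{1,0} / R^2_{1,1}."""
    return 1.0 - r2_10 / r2_11


print(round(unexplained_fraction(0.142, 0.939), 3))
```
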
[0286] These results were consistent with previous studies. HNSCC
patients are known to display an absolute and relative increase in
myeloid-derived granulocytes (Trellakis et al., 2011, Int J Cancer,
Epub ahead of print, DOI: 10.1002/ijc.25892). They also display an
alteration in lymphoid T cell homeostasis that leads to decreases
in CD4+ T cells (Kuss et al., 2004, Clin Cancer Res, 10(11),
3755-62; Kuss et al., 2005, Adv Otorhinolaryngol, 62, 161-72). In
addition, the proportion of Treg cells (a subclass of CD4+ T cells)
is known to decrease from infancy to adulthood (Mold et al., 2010,
Science, 330(6011), 1695-9). The bias estimates obtained from the
double-bootstrap procedure allow the correction of bias arising
from measurement error. However, there is no statistical procedure
for correcting the other possible sources of bias: those arising
from changes in distribution among unprofiled cell types, and those
arising from non-immune-mediated methylation differences. Example 7
presents a detailed sensitivity analysis showing that the magnitude
of the resulting bias is likely to be small, less than a percentage
point.
TABLE 4
Estimates for HNSCC analysis (case vs. control)
Coefficient | Est | $\mathrm{Bias}_2$ | $\mathrm{SE}_0$ | $\mathrm{SE}_1$ | $\mathrm{SE}_2$ | P-value
(Intercept, $\gamma_0$) | -0.62 | -0.02 | 0.41 | 0.52 | 0.52 | 0.23
B Cell | -0.45 | 0.04 | 0.30 | 0.77 | 0.76 | 0.55
Granulocyte | 7.51 | -0.07 | 0.50 | 1.73 | 1.71 | <0.0001
Monocyte | 0.49 | 0.10 | 0.50 | 0.47 | 0.48 | 0.31
NK | -1.43 | 0.06 | 0.56 | 0.37 | 0.38 | 0.00017
T Cell (CD4+) | -9.08 | 1.32 | 1.95 | 1.15 | 1.39 | <0.0001
T Cell (CD8+) | 3.06 | -1.46 | 1.96 | 0.98 | 1.27 | 0.016
Est = regression coefficient estimate (×100%). $\mathrm{Bias}_2$ = double-bootstrap bias estimate (×100%). $\mathrm{SE}_0$ = naive standard error (×100%). $\mathrm{SE}_1$ = single-bootstrap standard error (×100%). $\mathrm{SE}_2$ = double-bootstrap standard error (×100%). P-values were computed using $\mathrm{SE}_2$.
TABLE 5
Estimated regression coefficients for sex and age in the HNSCC data set
Covariate | Coefficient | Est | $\mathrm{Bias}_2$ | $\mathrm{SE}_0$ | $\mathrm{SE}_1$ | $\mathrm{SE}_2$ | P-value
Sex | (Intercept, $\gamma_0$) | 0.12 | 0.00 | 0.24 | 0.57 | 0.57 | 0.83
Sex | B Cell | 0.38 | 0.01 | 0.17 | 0.85 | 0.84 | 0.65
Sex | Granulocyte | -0.29 | -0.08 | 0.28 | 1.82 | 1.81 | 0.87
Sex | Monocyte | 0.13 | 0.01 | 0.29 | 0.47 | 0.47 | 0.78
Sex | NK | 0.49 | 0.05 | 0.32 | 0.40 | 0.40 | 0.22
Sex | T Cell (CD4+) | -1.80 | 0.45 | 1.12 | 1.25 | 1.20 | 0.13
Sex | T Cell (CD8+) | 0.82 | -0.44 | 1.12 | 1.03 | 1.04 | 0.43
(Age - 60)/10 | (Intercept, $\gamma_0$) | -0.20 | -0.02 | 0.15 | 0.24 | 0.24 | 0.40
(Age - 60)/10 | B Cell | 0.24 | 0.01 | 0.11 | 0.34 | 0.33 | 0.47
(Age - 60)/10 | Granulocyte | 1.12 | -0.01 | 0.19 | 0.67 | 0.67 | 0.096
(Age - 60)/10 | Monocyte | 0.13 | 0.02 | 0.19 | 0.20 | 0.20 | 0.54
(Age - 60)/10 | NK | -0.22 | 0.02 | 0.21 | 0.15 | 0.15 | 0.14
(Age - 60)/10 | T Cell (CD4+) | -2.75 | 0.56 | 0.73 | 0.53 | 0.57 | <0.0001
(Age - 60)/10 | T Cell (CD8+) | 1.44 | -0.56 | 0.73 | 0.46 | 0.50 | 0.0038
Est = regression coefficient estimate (×100%). $\mathrm{Bias}_2$ = double-bootstrap bias estimate (×100%). $\mathrm{SE}_0$ = naive standard error (×100%). $\mathrm{SE}_1$ = single-bootstrap standard error (×100%). $\mathrm{SE}_2$ = double-bootstrap standard error (×100%). P-values were computed using $\mathrm{SE}_2$.
Example 7
Application of the Methods Herein to Subpopulations of Ovarian
Cancer Cases and Controls
[0287] In this example, the method herein for inferring changes in
the distribution of white blood cells between different
subpopulations (e.g. cases and controls) was applied to an ovarian
cancer data set (Teschendorff et al., 2009, PLoS ONE, 4(12),
e8274). DNA methylation data for blood samples were obtained from
Gene Expression Omnibus (accession number GSE19711). Only cases in
which blood was collected before treatment were used here. After
removing four arrays with a preponderance of missing values, the
data set consisted of 272 controls and 129 pre-treatment cases. A
clustering heatmap displaying the DNA methylation data is shown in
FIG. 6. In this analysis, $z$ consisted of case-control status, age
(categorized in five-year increments), and two bisulfite conversion
efficiency measures. Tables 6-8 present the results for case-control
status and the estimated regression coefficients for age in the
ovarian cancer data set. $R^2_{1,0}$ was estimated at 17.8%, and
$R^2_{1,1}$ at 86.1%.
TABLE 6 Estimates for Ovarian Cancer Analysis (Case vs. Control)
Term | Est | Bias₂ | SE₀ | SE₁ | SE₂ | P-value
Intercept (γ₀) | -0.05 | -0.05 | 0.41 | 0.19 | 0.20 | 0.81
B Cell | -1.36 | 0.02 | 0.29 | 0.22 | 0.23 | <0.0001
Granulocyte | 8.97 | -0.04 | 0.49 | 1.02 | 1.00 | <0.0001
Monocyte | 0.55 | 0.06 | 0.49 | 0.29 | 0.30 | 0.066
NK | -2.09 | 0.01 | 0.55 | 0.31 | 0.34 | <0.0001
T Cell (cd4+) | 5.64 | 0.18 | 1.93 | 1.06 | 1.34 | <0.0001
T Cell (cd8+) | -0.35 | -0.17 | 1.93 | 0.95 | 1.19 | 0.77
Est = Regression coefficient estimate (×100%). Bias₂ = Double-bootstrap bias estimate (×100%). SE₀ = Naive standard error (×100%). SE₁ = Single-bootstrap standard error (×100%). SE₂ = Double-bootstrap standard error (×100%). P-values were computed using SE₂.
TABLE 7 Estimated Regression Coefficients for Age in Ovarian Cancer Data Set
Age | Term | Est | Bias₂ | SE₀ | SE₁ | SE₂ | P-value
55-60 | Intercept (γ₀) | -1.24 | -0.05 | 0.37 | 0.41 | 0.40 | 0.0021
55-60 | B Cell | 0.40 | 0.04 | 0.27 | 0.50 | 0.49 | 0.42
55-60 | Granulocyte | 0.91 | 0.04 | 0.45 | 2.04 | 2.02 | 0.65
55-60 | Monocyte | 0.85 | 0.12 | 0.45 | 0.59 | 0.58 | 0.15
55-60 | NK | -0.25 | 0.10 | 0.50 | 0.55 | 0.55 | 0.65
55-60 | T Cell (cd4+) | -2.79 | 0.63 | 1.76 | 2.13 | 1.96 | 0.15
55-60 | T Cell (cd8+) | 2.22 | -0.84 | 1.77 | 1.81 | 1.59 | 0.16
60-65 | Intercept (γ₀) | -0.72 | -0.07 | 0.35 | 0.39 | 0.39 | 0.070
60-65 | B Cell | 0.54 | 0.07 | 0.25 | 0.49 | 0.49 | 0.27
60-65 | Granulocyte | 0.71 | 0.06 | 0.42 | 1.99 | 1.98 | 0.72
60-65 | Monocyte | 0.27 | 0.08 | 0.42 | 0.58 | 0.58 | 0.64
60-65 | NK | -0.24 | 0.06 | 0.47 | 0.55 | 0.55 | 0.65
60-65 | T Cell (cd4+) | -3.54 | 0.80 | 1.66 | 2.02 | 1.97 | 0.072
60-65 | T Cell (cd8+) | 2.84 | -0.97 | 1.66 | 1.85 | 1.64 | 0.084
65-70 | Intercept (γ₀) | -0.53 | -0.08 | 0.40 | 0.41 | 0.41 | 0.19
65-70 | B Cell | -0.03 | 0.07 | 0.29 | 0.51 | 0.51 | 0.96
65-70 | Granulocyte | 2.46 | 0.02 | 0.48 | 2.17 | 2.17 | 0.26
65-70 | Monocyte | 0.85 | 0.12 | 0.48 | 0.64 | 0.64 | 0.18
65-70 | NK | -0.89 | 0.07 | 0.54 | 0.59 | 0.60 | 0.14
65-70 | T Cell (cd4+) | -6.12 | 1.48 | 1.89 | 2.18 | 2.12 | 0.0038
65-70 | T Cell (cd8+) | 4.37 | -1.64 | 1.89 | 1.87 | 1.71 | 0.011
70-75 | Intercept (γ₀) | -1.20 | -0.07 | 0.40 | 0.41 | 0.41 | 0.0037
70-75 | B Cell | 0.29 | 0.07 | 0.29 | 0.48 | 0.48 | 0.55
70-75 | Granulocyte | 2.13 | -0.05 | 0.48 | 2.05 | 2.04 | 0.30
70-75 | Monocyte | 0.76 | 0.12 | 0.48 | 0.60 | 0.60 | 0.21
70-75 | NK | -0.51 | 0.19 | 0.54 | 0.56 | 0.55 | 0.36
70-75 | T Cell (cd4+) | -6.82 | 1.97 | 1.89 | 2.16 | 2.12 | 0.0013
70-75 | T Cell (cd8+) | 5.35 | -2.20 | 1.90 | 1.89 | 1.79 | 0.0028
75+ | Intercept (γ₀) | -0.31 | -0.09 | 0.49 | 0.46 | 0.45 | 0.49
75+ | B Cell | 0.13 | 0.08 | 0.35 | 0.54 | 0.53 | 0.81
75+ | Granulocyte | 1.10 | -0.15 | 0.58 | 2.12 | 2.11 | 0.60
75+ | Monocyte | 1.73 | 0.12 | 0.59 | 0.64 | 0.63 | 0.0065
75+ | NK | -0.30 | 0.13 | 0.66 | 0.60 | 0.59 | 0.61
75+ | T Cell (cd4+) | -6.54 | 1.31 | 2.30 | 2.29 | 2.18 | 0.0027
75+ | T Cell (cd8+) | 2.73 | -1.37 | 2.31 | 2.06 | 1.86 | 0.14
Est = Regression coefficient estimate (×100%). Bias₂ = Double-bootstrap bias estimate (×100%). SE₀ = Naive standard error (×100%). SE₁ = Single-bootstrap standard error (×100%). SE₂ = Double-bootstrap standard error (×100%). P-values were computed using SE₂.
TABLE 8 Estimated Regression Coefficients for Bisulfite Conversion in Ovarian Cancer Data Set
Covariate | Term | Est | Bias₂ | SE₀ | SE₁ | SE₂ | P-value
BSC1 (Green/1000) | Intercept (γ₀) | -0.08 | 0.00 | 0.14 | 0.09 | 0.10 | 0.39
BSC1 (Green/1000) | B Cell | -0.10 | 0.00 | 0.10 | 0.10 | 0.10 | 0.30
BSC1 (Green/1000) | Granulocyte | 0.13 | 0.04 | 0.17 | 0.40 | 0.40 | 0.74
BSC1 (Green/1000) | Monocyte | 0.13 | -0.01 | 0.17 | 0.12 | 0.12 | 0.26
BSC1 (Green/1000) | NK | -0.09 | 0.00 | 0.19 | 0.14 | 0.14 | 0.53
BSC1 (Green/1000) | T Cell (cd4+) | 0.51 | -0.14 | 0.65 | 0.48 | 0.51 | 0.32
BSC1 (Green/1000) | T Cell (cd8+) | -0.23 | 0.11 | 0.66 | 0.40 | 0.47 | 0.62
BSC2 (Green/1000) | Intercept (γ₀) | 0.25 | 0.00 | 0.14 | 0.08 | 0.08 | 0.0027
BSC2 (Green/1000) | B Cell | 0.07 | 0.00 | 0.10 | 0.08 | 0.08 | 0.40
BSC2 (Green/1000) | Granulocyte | 0.07 | 0.01 | 0.17 | 0.38 | 0.37 | 0.84
BSC2 (Green/1000) | Monocyte | -0.18 | 0.01 | 0.17 | 0.10 | 0.10 | 0.075
BSC2 (Green/1000) | NK | 0.10 | 0.00 | 0.19 | 0.12 | 0.12 | 0.41
BSC2 (Green/1000) | T Cell (cd4+) | -0.65 | 0.20 | 0.67 | 0.41 | 0.50 | 0.20
BSC2 (Green/1000) | T Cell (cd8+) | 0.63 | -0.21 | 0.68 | 0.34 | 0.45 | 0.16
Est = Regression coefficient estimate (×100%). Bias₂ = Double-bootstrap bias estimate (×100%). SE₀ = Naive standard error (×100%). SE₁ = Single-bootstrap standard error (×100%). SE₂ = Double-bootstrap standard error (×100%). P-values were computed using SE₂. Coefficients are given in % per 1000 units of fluorescence; the standard deviations of BSC1 and BSC2 were 1950 and 2169, respectively.
[0288] Compared with controls, cases showed significant increases
in granulocytes and significant decreases in B cells, NK cells, and
CD4+ T cells. Cases also showed marginally significant increases in
monocytes. These results are consistent with previous literature,
which has demonstrated that ovarian cancer patients experience
decreases in B and T lymphocytes (den Ouden et al., 1997, Eur J
Obstet Gynecol Reprod Biol, 72, 73-77; Bishara et al., 2008, Reprod
Biol, 138, 71-75; Cho et al., 2009, Cancer Immunol Immunother, 58,
1523), increases in monocytes (den Ouden et al., 1997, Eur J Obstet
Gynecol Reprod Biol, 72, 73-77; Bishara et al., 2008, Reprod Biol,
138, 71-75) and (somewhat equivocally) increases in eosinophil
granulocytes (Bishara et al., 2008, Reprod Biol, 138, 71-75).
Additionally, there were significant systematic decreases in CD4+
T cells with increasing age, with a gradient consistent in direction
and somewhat consistent in magnitude with the corresponding effect
found in the HNSCC data set. The CD8+ T cell coefficients for age
were positive, with a gradient consistent in direction and somewhat
consistent in magnitude with the corresponding effect found in the
HNSCC data set. No bisulfite conversion coefficient was significant,
and the coefficients were of small magnitude (Table 8; generally less
than 1 percentage point per standard deviation).
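The "per standard deviation" statement follows from rescaling the Table 8 coefficients, which are given in % per 1000 fluorescence units, by SD/1000 using the standard deviations in the table note (1950 for BSC1, 2169 for BSC2). A minimal sketch of that arithmetic; the dictionary and function names are hypothetical. Run as written, 10 of the 12 cell-type coefficients fall below 1 point per SD in magnitude, the exceptions being the BSC2 CD4+ and CD8+ terms (about -1.41 and +1.37).

```python
# Standard deviations of the bisulfite-conversion measures (fluorescence units),
# as stated in the note to Table 8.
SD = {"BSC1": 1950.0, "BSC2": 2169.0}

# Cell-type coefficients from Table 8, in % per 1000 fluorescence units.
TABLE8 = {
    "BSC1": {"B Cell": -0.10, "Granulocyte": 0.13, "Monocyte": 0.13,
             "NK": -0.09, "T Cell (cd4+)": 0.51, "T Cell (cd8+)": -0.23},
    "BSC2": {"B Cell": 0.07, "Granulocyte": 0.07, "Monocyte": -0.18,
             "NK": 0.10, "T Cell (cd4+)": -0.65, "T Cell (cd8+)": 0.63},
}


def per_sd(coef_pct_per_1000: float, sd_units: float) -> float:
    """Convert a coefficient in %/1000 fluorescence units to % per standard deviation."""
    return coef_pct_per_1000 * sd_units / 1000.0


for measure, coefs in TABLE8.items():
    for cell, coef in coefs.items():
        print(f"{measure} {cell}: {per_sd(coef, SD[measure]):+.2f} points per SD")
```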
Example 8
Application of the Methods Herein to Subpopulations of Down
Syndrome Patients and Controls
[0289] The method herein was applied to a trisomy 21 (Down syndrome)
data set (Kerkel et al., PLoS Genet. 2010, 6(11):e1001212)
consisting of total peripheral blood leukocyte samples from 29 Down
syndrome cases and 21 controls, as well as six T cell samples from
cases and four T cell samples from controls (GEO Accession number
GSE25395). Because of the potential for bias induced by copy number
amplification, four CpG sites on chromosome 21 were excluded,
resulting in m = 96 CpG sites used for analysis. A clustering
heatmap of the DNA methylation data is shown in FIG. 7. In one
analysis, cases and controls were compared using the total
leukocyte samples only; in another, total leukocytes were compared
with T cells, pooling cases and controls. Coefficient estimates are
provided in Table 9. The only significant difference between cases
and controls was in B cell distribution, with a bias-corrected
estimated decrease of 4.8%, 95% confidence interval (-6.2%; -3.5%).
This result is consistent with known immune characteristics of Down
syndrome, including deficiencies in both B and T cells (Verstegen et
al., 2010, Pediatr Res, 67, 563-9; Ram and Chinen, 2011, Clin Exp
Immunol, 164, 9-16). In the comparison between total leukocytes and
T cells, by contrast, the coefficients other than those for B cells
and NK cells were highly significant, in directions consistent with
a comparison of a sample of purified T cells to a generic whole
blood sample. In fact, an estimate of the cellular composition of
the T cell samples can be obtained by a simple linear
transformation of the Γ estimates (adding the intercept terms to the
T cell coefficients); this operation produces values that are not
significantly different from zero for all cell types except CD4+
and CD8+ T cells, whose bias-corrected estimates were, respectively,
75.9%, 95% confidence interval (67%; 85%), and 8.6%, 95% confidence
interval (0%; 17%), with cases and controls pooled, consistent with
the known distribution of these T cells. For the analysis of case
vs. control within total leukocytes, R₁,₀² was estimated at 4.5%
and R₁,₁² at 67.6%. For the analysis of total leukocytes vs. T cells
with pooled cases and controls, R₁,₀² was estimated at 81.4% and
R₁,₁² at 98.9%. The latter pair of coefficients of determination
indicates that a substantial portion of the variation is explained
by leukocyte composition, which is the expected result for such an
analysis.
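The bias-corrected intervals quoted above can be reproduced from the Table 9 columns. A minimal sketch, assuming BC-Est = Est - Bias₂ (the relation visible in Table 11) and a normal-theory interval BC-Est ± 1.96·SE₂; the function name is hypothetical, and the interval form is our assumption, chosen because it matches the reported values.

```python
def bias_corrected_ci(est: float, bias2: float, se2: float, z: float = 1.96):
    """Return (bias-corrected estimate, lower, upper), all in percent.

    Assumes BC-Est = Est - Bias2 and a Wald interval BC-Est +/- z * SE2.
    """
    bc = est - bias2
    half_width = z * se2
    return bc, bc - half_width, bc + half_width


# B Cell, case vs. control within total leukocytes (Table 9).
bc, lo, hi = bias_corrected_ci(-4.87, -0.03, 0.69)
print(f"{bc:.1f}% (95% CI {lo:.1f}%; {hi:.1f}%)")
# -4.8% (95% CI -6.2%; -3.5%)
```

Applied to the granulocyte row of Table 10 (Est 12.25, Bias₂ 0.51, SE₂ 4.27), the same arithmetic gives roughly (3.4%; 20.1%), in line with the interval quoted in Example 9.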
TABLE 9 Estimates for Down syndrome analysis (case vs. control, total leukocyte vs. T Cell)
Comparison | Term | Est | Bias₂ | SE₀ | SE₁ | SE₂ | P-value
Case status (total leukocytes) | Intercept (γ₀) | 2.02 | -0.10 | 0.86 | 1.17 | 1.17 | 0.084
Case status (total leukocytes) | B Cell | -4.87 | -0.03 | 0.62 | 0.70 | 0.69 | <0.0001
Case status (total leukocytes) | Granulocyte | 3.85 | 0.15 | 1.02 | 3.01 | 2.98 | 0.20
Case status (total leukocytes) | Monocyte | 0.12 | 0.11 | 1.03 | 0.97 | 0.96 | 0.90
Case status (total leukocytes) | NK | -0.63 | -0.06 | 1.16 | 0.83 | 0.82 | 0.44
Case status (total leukocytes) | T Cell (cd4+) | -0.30 | -0.37 | 4.02 | 2.49 | 2.66 | 0.91
Case status (total leukocytes) | T Cell (cd8+) | -1.89 | 0.35 | 4.03 | 2.47 | 2.42 | 0.43
T Cell (cases + controls) | Intercept (γ₀) | -0.97 | 0.07 | 1.7 | 1.4 | 1.6 | 0.54
T Cell (cases + controls) | B Cell | -0.51 | 0.02 | 1.2 | 1.2 | 1.2 | 0.67
T Cell (cases + controls) | Granulocyte | -56.21 | 0.49 | 2.1 | 3.4 | 3.4 | <0.0001
T Cell (cases + controls) | Monocyte | -5.13 | -0.37 | 2.1 | 1.1 | 1.3 | <0.0001
T Cell (cases + controls) | NK | 0.07 | 0.34 | 2.3 | 1.5 | 1.7 | 0.97
T Cell (cases + controls) | T Cell (cd4+) | 60.18 | -2.89 | 8.1 | 3.2 | 5.2 | <0.0001
T Cell (cases + controls) | T Cell (cd8+) | 3.00 | 2.34 | 8.2 | 3.3 | 5.4 | 0.58
Est = Regression coefficient estimate (×100%). Bias₂ = Double-bootstrap bias estimate (×100%). SE₀ = Naive standard error (×100%). SE₁ = Single-bootstrap standard error (×100%). SE₂ = Double-bootstrap standard error (×100%). P-values were computed using SE₂.
Example 9
Application of the Methods Herein to Obesity in an African American
Population
[0290] The method herein was also applied to an obesity data set
(Wang et al., 2010) consisting of seven lean African-Americans and
seven obese African-Americans (GEO Accession number GSE25301). FIG.
8 shows a clustering heatmap of the DNA methylation data. In this
analysis, z consisted of obesity status. Obese subjects had an
estimated increase of 12 percentage points in granulocytes,
bias-corrected 95% confidence interval (3.4%; 20%), and an estimated
decrease of 4 percentage points in NK cells, bias-corrected 95%
confidence interval (-7.7%; -0.9%) (Table 10). No significant
differences were found for other blood cell types. The specific
immunological differences estimated by the method herein are
consistent with known immunological perturbations associated with
type II diabetes (Lynch et al., 2009, Obesity, 17(3), 601-5;
Anderson et al., 2011, Curr Opin Lipidol, 21(3), 172-7).
TABLE 10 Estimated Regression Coefficients for Data Set concerning Obesity in African Americans
Covariate | Term | Est | Bias₂ | SE₀ | SE₁ | SE₂ | P-value
Obese | Intercept (γ₀) | 0.96 | -0.09 | 1.08 | 0.85 | 0.84 | 0.25
Obese | B Cell | 0.70 | -0.03 | 0.78 | 1.16 | 1.14 | 0.54
Obese | Granulocyte | 12.25 | 0.51 | 1.30 | 4.27 | 4.27 | 0.0041
Obese | Monocyte | -0.70 | -0.01 | 1.31 | 1.57 | 1.54 | 0.65
Obese | NK | -4.42 | -0.13 | 1.46 | 1.75 | 1.73 | 0.011
Obese | T Cell (cd4+) | -6.97 | -0.29 | 5.11 | 6.27 | 5.49 | 0.20
Obese | T Cell (cd8+) | -2.29 | 0.22 | 5.13 | 4.97 | 4.36 | 0.60
Est = Regression coefficient estimate (×100%). Bias₂ = Double-bootstrap bias estimate (×100%). SE₀ = Naive standard error (×100%). SE₁ = Single-bootstrap standard error (×100%). SE₂ = Double-bootstrap standard error (×100%). P-values were computed using SE₂.
Example 10
Additional Analyses
[0291] In this example a special case was considered in which the
subject population was such that z = 0 for every subject and the
population was sufficiently homogeneous with respect to blood cell
distribution to admit a sensible characterization of that
distribution. In such a case it is possible to recover estimates of
the distribution from Γ̂. The results of such an analysis applied
to the HNSCC case/control data set are shown in Table 11 below.
TABLE 11 White Blood Cell Distribution in HNSCC Controls
Cell type | Est | SE₂ | Bias₂ | BC-Est | 95% Conf. Int.
B Cell | 7.9 | 0.5 | 0.1 | 7.8 | (6.8, 8.9)
Granulocyte | 42.2 | 1.2 | -0.1 | 42.3 | (39.9, 44.6)
Monocyte | 9.9 | 0.7 | 0.3 | 9.6 | (8.3, 10.9)
NK | 7.9 | 0.7 | 0.2 | 7.7 | (6.3, 9.1)
T Cell (cd4+) | 15.2 | 3.0 | -0.1 | 15.3 | (9.5, 21.2)
T Cell (cd8+) | 7.6 | 3.0 | 0.4 | 7.2 | (1.4, 13.0)
Est = Regression coefficient estimate (×100%), normalized so that estimates sum to 90%. SE₂ = Double-bootstrap standard error (×100%). Bias₂ = Double-bootstrap bias estimate (×100%). BC-Est = bias-corrected estimate.
[0292] If the coefficients represented a complete profiling of
blood cell types, the estimates should sum approximately to one
(100%), even though the model does not explicitly constrain them to
do so. In this case, the original bias-corrected estimates (of
leukocyte distribution in HNSCC controls) summed to 133%. The table
shows the values renormalized to 90%, the anticipated proportion of
the profiled cell types. The resulting estimated distribution of
leukocytes is consistent with the literature (Alberts B et al.,
2008, Molecular Biology of the Cell, 5th edition, New York, N.Y.:
Taylor and Francis).
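The renormalization described above is a proportional rescaling: each estimate is multiplied by target/total, here 90/133 ≈ 0.677. A minimal sketch with a hypothetical function name; because the original (pre-normalization) estimates are not reported, the example applies it to the already-normalized BC-Est column of Table 11, which sums to 89.9% and so changes only slightly.

```python
def renormalize(estimates: dict[str, float], target: float = 90.0) -> dict[str, float]:
    """Rescale estimates proportionally so that they sum to `target` (percent)."""
    total = sum(estimates.values())
    if total <= 0:
        raise ValueError("estimates must have a positive sum")
    return {cell: value * target / total for cell, value in estimates.items()}


# Bias-corrected estimates (BC-Est) from Table 11, in percent.
TABLE11_BC = {"B Cell": 7.8, "Granulocyte": 42.3, "Monocyte": 9.6,
              "NK": 7.7, "T Cell (cd4+)": 15.3, "T Cell (cd8+)": 7.2}

print(round(sum(TABLE11_BC.values()), 1))                 # 89.9
print(round(sum(renormalize(TABLE11_BC).values()), 6))    # 90.0
print(round(90.0 / 133.0, 3))                             # 0.677, factor applied to 133%
```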
[0293] An additional analysis was also conducted in which S₀
consisted only of samples of pure CD4+ or CD8+ cells and S₁
consisted only of the samples of less purified T lymphocytes. For
such S₁ there were no covariates, so z consisted only of an
intercept. The unnormalized bias-corrected estimates were 69.0%
CD4+, 95% confidence interval (54%; 84%), and 32.5% CD8+, 95%
confidence interval (19%; 46%). These values are consistent with
the known proportions of these specific cell types among T
lymphocytes.
Example 11
Sensitivity Analysis
[0294] The bias estimates obtained from the double-bootstrap
procedure make it possible to correct the bias arising from
measurement error. There is no statistical procedure for correcting
the other possible sources of bias, namely those arising from
unprofiled cell types and from non-cell-mediated profile
differences, i.e. methylation difference signatures δ with nonzero
projection onto the space spanned by the WBC signatures. It is,
however, possible to conduct a sensitivity analysis using the theory
presented under "Bias" (equations 6-9). That analysis, described
below, shows that the magnitude of the bias is likely to be small,
less than a percentage point.
Detailed Analysis
[0295] A method of sensitivity analysis to estimate the magnitude
of bias arising from unprofiled cell types and non-cell-mediated
profile differences is described below for the HNSCC data set
presented in Example 6 and FIG. 4.
[0296] For each value of k ∈ {1, …, m}, a set 𝒦ₖ ⊂ {1, …, m} of k
row indices is sampled at random without replacement, and k rows of
B₁ are sampled without replacement; δ* is set equal to the m × d₁
zero matrix, and the rows indicated by 𝒦ₖ are replaced by the k rows
selected from B₁. The matrix δ* serves as a representative of the
sum of processes having systematic methylation changes at k
locations, of total magnitude consistent with the observed data
(under the conservative assumption that no systematic methylation
difference is cell mediated), and α* = (B₀ᵀB₀)⁻¹B₀ᵀδ* represents the
corresponding bias in Γ. If, as in this situation, the goal is to
assess the sensitivity to bias in one column of B₁ (i.e. case
status), the uninteresting columns of δ* or α* can simply be
deleted. Replicating this resampling procedure 100,000 times
generated an approximation to the distribution of possible biases
corresponding to processes involving exactly k CpG sites. FIG. 4
displays the results of such an analysis, showing the distribution
of (α*ᵀα*)^(1/2) for various values of k. The relationship of the
median values to k was consistent with the theory presented in
Example 12 under the subheading "Additional simulations": the median
values of α*ᵀα* had an almost perfect linear relationship with k.
The magnitude of the bias was small: for the more likely low values
of k, the bias was 0.1 to 0.25 of a percentage point. Moreover, this
analysis was conservative in that it assumed the effect in B₁ was
due to non-cell-mediated processes, a strongly conservative
assumption. In addition, for various choices of π₀ over a range of
small magnitudes, the expected bias over the uniform posterior
implied by π₀ was computed by iterated expectation, first computing
the mean bias for each choice of k and then forming the expectation
over the binomial distribution Bin(100, π₀). As noted in the details
described under "Bias" in Example 3, the result scaled linearly with
π₀. The constant of proportionality was estimated to be 2.08
percentage points. In summary, if the prior expectation that any one
CpG among the 100 selected for this application will show
systematic differentiation between cases and controls is of even
moderate size (~0.1), then the implied bias would be expected to be
less than a percentage point.
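The resampling scheme and the iterated expectation in [0296] can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' code: `B0` and `b1` are random, hypothetical stand-ins for the fitted cell-type profile matrix B₀ and the case-status column of B₁; the replicate count is far below 100,000 for speed; and the binomial size is taken to be m (100 in the text, 30 here). Function names are hypothetical.

```python
from math import comb

import numpy as np


def bias_norm_draws(B0, b1, k, n_rep, rng):
    """Monte Carlo draws of ||alpha*|| for a process touching exactly k CpG sites.

    B0: (m, K) matrix of cell-type methylation profiles.
    b1: (m,) column of B1 for the covariate of interest (e.g. case status).
    """
    if k == 0:
        return np.zeros(n_rep)                      # delta* = 0 gives alpha* = 0
    m = B0.shape[0]
    P = np.linalg.solve(B0.T @ B0, B0.T)            # P = (B0^T B0)^{-1} B0^T, shape (K, m)
    norms = np.empty(n_rep)
    for r in range(n_rep):
        delta = np.zeros(m)
        sites = rng.choice(m, size=k, replace=False)   # the index set K_k
        donors = rng.choice(m, size=k, replace=False)  # k rows sampled from B1
        delta[sites] = b1[donors]
        alpha = P @ delta
        norms[r] = np.sqrt(alpha @ alpha)              # (alpha*^T alpha*)^{1/2}
    return norms


def expected_bias(mean_bias_by_k, pi0):
    """Average the mean bias over k ~ Bin(n, pi0), where n = len(mean_bias_by_k) - 1."""
    n = len(mean_bias_by_k) - 1
    return sum(comb(n, k) * pi0**k * (1.0 - pi0) ** (n - k) * mean_bias_by_k[k]
               for k in range(n + 1))


rng = np.random.default_rng(0)
m, K = 30, 6
B0 = rng.uniform(size=(m, K))          # hypothetical cell-type profiles
b1 = rng.normal(scale=0.01, size=m)    # hypothetical case-status column of B1
mean_bias = [bias_norm_draws(B0, b1, k, 200, rng).mean() for k in range(m + 1)]
print(expected_bias(mean_bias, 0.1))
```

Restricting `b1` to a single column mirrors the text's remark that uninteresting columns of δ* or α* can simply be deleted.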
Example 12
Simulations
[0297] To verify the properties of the proposed methodology,
extensive simulation studies were conducted. Simulation parameters
were obtained from the HNSCC data set, and most simulations assumed
no sources of biological bias (DNA methylation changes arising from
processes not mediated by the profiled leukocytes, including shifts
in distribution within cell types not profiled). In every
simulation, S₀ was specified to consist of five B cell samples, ten
granulocyte samples, five monocyte samples, 15 NK samples, five
general T cell samples, eight specific CD4+ T cell samples, and two
specific CD8+ T cell samples. Estimates from the external validation
set S₀, described above, were used for the mean methylation profiles
of the WBC types, using the m = 100 most informative CpG sites.
[0298] A total of n₁ subjects, n₁/2 cases and n₁/2 controls, were
specified, with n₁ ∈ {100, 200, 500}. Among the controls,
methylation profiles were generated from a white blood cell
population of 7% B cells, 62% granulocytes, 6% monocytes, 2% NK
cells, and 13% T cells (of which 65% were CD4+ cells and 35% were
CD8+ cells), with the remaining 5% unspecified (and assumed to have
mean equal to that of the unsorted T lymphocytes). Among cases, one
of the following scenarios was specified: a 4% reduction in CD4+
cells, a 2% reduction in CD8+ cells, and an 8% increase in
granulocytes (alternative with changes in both CD4+ and CD8+,
"Strong Alternative I"); a 6% reduction in CD4+ cells and an 8%
increase in granulocytes (alternative with changes in CD4+ but not
CD8+, "Strong Alternative II"); a weaker alternative with half the
effects of Strong Alternative I ("Mixed Alternative", elaborated
upon below); and two null scenarios with no changes in cell
population, each with a different assumption about δ. These changes
are absolute changes in percentage points, not relative changes.
The values were used to generate Dirichlet-distributed mixture
weights for each simulated subject, with Dirichlet parameters equal
to a precision parameter (10 corresponding to "noisy" and 100
corresponding to "precise") times the mean weights described above.
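The mixture-weight step can be sketched with NumPy's `Generator.dirichlet`, using precision × mean weight as the Dirichlet parameter vector as described above. This covers only the weights, not the methylation profiles or residual terms, and all names are hypothetical. One caveat of the sketch: each Dirichlet draw sums to one across all seven components, so the expected weight of each component is its mean divided by the total of the mean vector; the weights on the six profiled types alone need not sum to one.

```python
import numpy as np

# Hypothetical labels for the simulated mixture components.
CELL_TYPES = ["B", "Gran", "Mono", "NK", "CD4", "CD8", "Unspecified"]

# Control mean weights from the text; the 13% T-cell share is split 65% CD4+ / 35% CD8+.
CONTROL_MEAN = np.array([0.07, 0.62, 0.06, 0.02, 0.13 * 0.65, 0.13 * 0.35, 0.05])

# Strong Alternative I: absolute shifts (percentage points) applied to cases.
STRONG_ALT_I = {"CD4": -4.0, "CD8": -2.0, "Gran": 8.0}


def case_mean(control_mean, shifts_pct):
    """Return control mean weights shifted by absolute percentage-point changes."""
    mean = np.array(control_mean, dtype=float)
    for cell, pct in shifts_pct.items():
        mean[CELL_TYPES.index(cell)] += pct / 100.0
    return mean


def dirichlet_weights(mean, precision, n_subjects, rng):
    """Per-subject mixture weights drawn with Dirichlet parameters precision * mean."""
    return rng.dirichlet(precision * np.asarray(mean, dtype=float), size=n_subjects)


rng = np.random.default_rng(0)
precise = dirichlet_weights(CONTROL_MEAN, 100, 500, rng)   # "precise" controls
noisy = dirichlet_weights(CONTROL_MEAN, 10, 500, rng)      # "noisy" controls
cases = dirichlet_weights(case_mean(CONTROL_MEAN, STRONG_ALT_I), 100, 500, rng)
print(precise.std(axis=0).round(3))
print(noisy.std(axis=0).round(3))
```

The lower precision spreads each subject's weights more widely around the mean, which is what produces the larger within-status heterogeneity of the "noisy" scenarios.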
[0299] Residual effects ξᵢ⁽⁰⁾ for controls were set equal to 0.1
times the estimated intercept μ̂₁, and residual effects ξᵢ⁽¹⁾ for
cases were set equal to 0.08 or 0.09 times μ̂₁ plus 10θ times the
column of U corresponding to case status. The constants of
proportionality 0.1, 0.08, and 0.09 were chosen to correspond to
the assumed contributions of ξ to an overall methylation signature
presumed to be dominated by profiled populations of white blood
cells in specified proportions, with 0.08 used for the strong
alternatives and 0.09 for the Mixed Alternative. The constant 10 was
used to amplify the scale of δ so that its effect could be detected
in simulation; U was orthogonal to the white blood cell profiles by
construction.
[0300] The individual, Dirichlet-generated subject weights did not
necessarily sum to one, and the difference from 1 was not applied
as a multiplier; thus the resulting ξ corresponded to the situation
Pμ_q = 0, where P = (B₀ᵀB₀)⁻¹B₀ᵀ, along with orthogonal
contributions from the λ terms of (6). The multiplier θ = 0 was used
for the strong alternatives and for the "Strong Null" case (i.e. no
methylation differences between cases and controls); θ = 0.5 was
used for the Mixed Alternative; and θ = 1 was used for the "Mixed
Null", with case/control differences not mediated by cellular
population differences.
[0301] A simple normal error structure for e₀ₕ and e₁ᵢ was
specified, with no chip effects and with variance equal to the sum
of the chip and residual variances estimated (individually for each
CpG) from the HNSCC data. For each simulation, 50 bootstrap samples
were used to estimate standard errors, and 1000 simulations were run
for each scenario. Table 12 presents results for n₁ = 200 with
precise mixture weights (small within-status heterogeneity in
distribution), and Table 13 presents results for n₁ = 200 with noisy
mixture weights (larger within-status heterogeneity). The tables
show the mean estimate, the simulation standard deviation, the
median estimates for the three types of proposed standard errors,
and the proportion of p-values (obtained from z-scores constructed
using the double-bootstrap standard error) falling below α = 0.05
and α = 0.01.
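The pow(α) columns are rejection rates: the fraction of simulations whose SE₂-based p-value falls below α. A minimal sketch, assuming a two-sided z-test; the function name is hypothetical, and the inputs are hypothetical draws from a null with a correctly specified standard error (not the simulation data above), so the two rates should land near their nominal values of 0.05 and 0.01.

```python
import math
import random


def rejection_rate(estimates, se2s, alpha):
    """Fraction of simulations whose two-sided z-test p-value falls below alpha."""
    hits = sum(
        math.erfc(abs(est / se) / math.sqrt(2.0)) < alpha
        for est, se in zip(estimates, se2s)
    )
    return hits / len(estimates)


# Hypothetical null check: estimates ~ N(0, 0.8) with SE2 fixed at 0.8.
rng = random.Random(2024)
ests = [rng.gauss(0.0, 0.8) for _ in range(10_000)]
ses = [0.8] * len(ests)
print(rejection_rate(ests, ses, 0.05), rejection_rate(ests, ses, 0.01))
```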
[0302] In these cases, bias in estimation was minimal. Both types
of bootstrap produced similar standard error estimates, which were
close to the simulation standard deviation and often quite
different from the naive standard error estimate. Under the null
scenarios, rejection probabilities were tolerably close to their
nominal values, and under the alternatives power could be quite
high, even with this modest design.
Results for Coefficients of Determination
[0303] Results for the coefficients of determination are provided
in Table 14. R₁,₀² decreased with decreasing strength of the
alternative, falling to zero under both null scenarios. For the
strong alternatives, R₁,₁² was frequently close to 1.0. For the
Mixed Alternative, R₁,₁² was lower but still high, ranging from
about 0.85 to 0.90. Under the Mixed Null, R₁,₁² typically had lower
values, from about 0.05 to 0.20. In the Strong Null case, R₁,₁²
covered a broader range of moderately low values; note, however,
that this scenario effectively represents 0/0, i.e. a poorly
defined value. Scenarios with n₁ ∈ {100, 500} produced similar
results, with simulation standard deviations and power adjusted
accordingly, and were still of practical utility.
Additional Simulations
[0304] Additional simulations were conducted that assumed bias
arising from processes not mediated by the profiled leukocytes. For
these scenarios, ξ⁰ was set to μ̂₁, and ξ¹ = ξ⁰ except at a set of CpG
sites randomly selected among the m dimensions of the array (once
and for all, before the 1000 simulations); for each such dimension
j, ξ¹ⱼ was set to 1 - μ̂₁ⱼ, reflecting a "reversal" of methylation
state. Estimates were biased towards the null, on the order of about
a percentage point.
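The "reversal" construction can be sketched as follows. The number of reversed sites is not stated in the text, so `n_sites` is a hypothetical parameter, as is the helper name; `mu_hat` stands in for the estimated intercept μ̂₁ and is a toy vector here.

```python
import numpy as np


def reversal_residuals(mu_hat, n_sites, rng):
    """Build (xi0, xi1, sites) for the 'reversal' bias scenario (hypothetical helper).

    xi0 equals mu_hat; xi1 equals xi0 except at n_sites randomly chosen CpGs,
    where the methylation level is reflected to 1 - mu_hat_j. The sites are drawn
    once and would be reused across all simulation replicates.
    """
    xi0 = np.asarray(mu_hat, dtype=float).copy()
    sites = np.sort(rng.choice(xi0.size, size=n_sites, replace=False))
    xi1 = xi0.copy()
    xi1[sites] = 1.0 - xi0[sites]
    return xi0, xi1, sites


rng = np.random.default_rng(0)
mu_hat = np.linspace(0.1, 0.9, 10)     # toy intercept vector, m = 10
xi0, xi1, sites = reversal_residuals(mu_hat, 3, rng)
print(sites, xi1[sites])
```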
TABLE 12 Simulation results (precise mixtures, n₁ = 200)
Scenario | Cell type | Truth | Est | SD | SE₀ | SE₁ | SE₂ | pow(0.05) | pow(0.01)
Strong Alternative I (θ = 0) | B Cell | 0.0 | 0.07 | 1.00 | 0.92 | 0.97 | 0.98 | 0.057 | 0.018
Strong Alternative I (θ = 0) | Granulocyte | 8.0 | 8.02 | 0.73 | 0.39 | 0.73 | 0.73 | 1.000 | 1.000
Strong Alternative I (θ = 0) | Monocyte | 0.0 | 0.01 | 0.48 | 0.43 | 0.47 | 0.47 | 0.055 | 0.013
Strong Alternative I (θ = 0) | NK | 0.0 | -0.09 | 1.08 | 1.02 | 1.02 | 1.05 | 0.066 | 0.015
Strong Alternative I (θ = 0) | T Cell (cd4+) | -4.0 | -4.06 | 0.81 | 0.80 | 0.78 | 0.81 | 0.999 | 0.989
Strong Alternative I (θ = 0) | T Cell (cd8+) | -2.0 | -1.93 | 0.83 | 0.81 | 0.78 | 0.81 | 0.653 | 0.419
Strong Alternative II (θ = 0) | B Cell | 0.0 | 0.00 | 0.97 | 0.92 | 0.97 | 0.99 | 0.048 | 0.016
Strong Alternative II (θ = 0) | Granulocyte | 8.0 | 8.00 | 0.71 | 0.39 | 0.72 | 0.72 | 1.000 | 1.000
Strong Alternative II (θ = 0) | Monocyte | 0.0 | 0.03 | 0.48 | 0.42 | 0.47 | 0.47 | 0.063 | 0.016
Strong Alternative II (θ = 0) | NK | 0.0 | 0.03 | 1.04 | 1.02 | 1.01 | 1.05 | 0.052 | 0.014
Strong Alternative II (θ = 0) | T Cell (cd4+) | -6.0 | -5.83 | 0.76 | 0.80 | 0.77 | 0.80 | 1.000 | 1.000
Strong Alternative II (θ = 0) | T Cell (cd8+) | 0.0 | -0.22 | 0.81 | 0.81 | 0.80 | 0.81 | 0.064 | 0.014
Mixed Alternative (θ = 0.5) | B Cell | 0.0 | -0.02 | 1.02 | 1.10 | 0.96 | 0.98 | 0.065 | 0.011
Mixed Alternative (θ = 0.5) | Granulocyte | 4.0 | 3.99 | 0.75 | 0.47 | 0.73 | 0.73 | 1.000 | 0.995
Mixed Alternative (θ = 0.5) | Monocyte | 0.0 | 0.02 | 0.49 | 0.51 | 0.47 | 0.47 | 0.060 | 0.015
Mixed Alternative (θ = 0.5) | NK | 0.0 | 0.04 | 1.05 | 1.22 | 1.01 | 1.04 | 0.054 | 0.009
Mixed Alternative (θ = 0.5) | T Cell (cd4+) | -2.0 | -2.07 | 0.82 | 0.96 | 0.79 | 0.83 | 0.695 | 0.471
Mixed Alternative (θ = 0.5) | T Cell (cd8+) | -1.0 | -0.95 | 0.82 | 0.96 | 0.78 | 0.82 | 0.203 | 0.082
Mixed Null (θ = 1) | B Cell | 0.0 | 0.00 | 1.04 | 1.58 | 0.96 | 1.02 | 0.066 | 0.017
Mixed Null (θ = 1) | Granulocyte | 0.0 | 0.03 | 0.73 | 0.67 | 0.74 | 0.74 | 0.055 | 0.014
Mixed Null (θ = 1) | Monocyte | 0.0 | -0.01 | 0.47 | 0.73 | 0.47 | 0.48 | 0.054 | 0.013
Mixed Null (θ = 1) | NK | 0.0 | -0.01 | 1.12 | 1.76 | 1.01 | 1.09 | 0.063 | 0.014
Mixed Null (θ = 1) | T Cell (cd4+) | 0.0 | 0.01 | 0.87 | 1.38 | 0.80 | 0.90 | 0.054 | 0.013
Mixed Null (θ = 1) | T Cell (cd8+) | 0.0 | -0.02 | 0.88 | 1.39 | 0.79 | 0.89 | 0.057 | 0.015
Strong Null (θ = 0) | B Cell | 0.0 | -0.01 | 0.99 | 0.90 | 0.96 | 0.96 | 0.068 | 0.014
Strong Null (θ = 0) | Granulocyte | 0.0 | 0.03 | 0.72 | 0.38 | 0.74 | 0.73 | 0.052 | 0.013
Strong Null (θ = 0) | Monocyte | 0.0 | -0.01 | 0.47 | 0.42 | 0.47 | 0.47 | 0.055 | 0.013
Strong Null (θ = 0) | NK | 0.0 | -0.01 | 1.06 | 1.00 | 1.01 | 1.02 | 0.059 | 0.020
Strong Null (θ = 0) | T Cell (cd4+) | 0.0 | 0.00 | 0.81 | 0.78 | 0.80 | 0.82 | 0.054 | 0.013
Strong Null (θ = 0) | T Cell (cd8+) | 0.0 | -0.01 | 0.81 | 0.79 | 0.79 | 0.80 | 0.054 | 0.015
Est = Mean regression coefficient estimate (×100%); SD = SD of regression coefficient estimate (×100%). SE₀ = Naive standard error (×100%); SE₁ = Single-bootstrap standard error (×100%). SE₂ = Double-bootstrap standard error (×100%). pow(α) = Pr{P₂ < α}, where P₂ is the p-value computed from SE₂.
TABLE 13 Simulation Results (Noisy Mixtures, n₁ = 200)
Scenario | Cell type | Truth | Est | SD | SE₀ | SE₁ | SE₂ | pow(0.05) | pow(0.01)
Strong Alternative I (θ = 0) | B Cell | 0.0 | -0.06 | 1.39 | 0.92 | 1.36 | 1.34 | 0.065 | 0.019
Strong Alternative I (θ = 0) | Granulocyte | 8.0 | 7.87 | 2.02 | 0.39 | 2.00 | 1.99 | 0.974 | 0.897
Strong Alternative I (θ = 0) | Monocyte | 0.0 | 0.05 | 1.03 | 0.42 | 1.04 | 1.02 | 0.049 | 0.012
Strong Alternative I (θ = 0) | NK | 0.0 | -0.02 | 1.21 | 1.02 | 1.16 | 1.18 | 0.061 | 0.010
Strong Alternative I (θ = 0) | T Cell (cd4+) | -4.0 | -4.00 | 1.23 | 0.79 | 1.21 | 1.22 | 0.903 | 0.739
Strong Alternative I (θ = 0) | T Cell (cd8+) | -2.0 | -1.97 | 1.05 | 0.80 | 1.02 | 0.98 | 0.517 | 0.298
Strong Alternative II (θ = 0) | B Cell | 0.0 | -0.08 | 1.38 | 0.92 | 1.36 | 1.34 | 0.063 | 0.017
Strong Alternative II (θ = 0) | Granulocyte | 8.0 | 7.90 | 2.03 | 0.39 | 1.99 | 1.98 | 0.973 | 0.905
Strong Alternative II (θ = 0) | Monocyte | 0.0 | 0.10 | 1.07 | 0.42 | 1.04 | 1.02 | 0.054 | 0.019
Strong Alternative II (θ = 0) | NK | 0.0 | 0.02 | 1.17 | 1.02 | 1.14 | 1.18 | 0.053 | 0.009
Strong Alternative II (θ = 0) | T Cell (cd4+) | -6.0 | -5.70 | 1.19 | 0.80 | 1.13 | 1.16 | 0.999 | 0.986
Strong Alternative II (θ = 0) | T Cell (cd8+) | 0.0 | -0.23 | 1.08 | 0.81 | 1.10 | 1.04 | 0.066 | 0.015
Mixed Alternative (θ = 0.5) | B Cell | 0.0 | 0.05 | 1.42 | 1.10 | 1.34 | 1.34 | 0.066 | 0.016
Mixed Alternative (θ = 0.5) | Granulocyte | 4.0 | 4.00 | 2.01 | 0.47 | 2.02 | 2.01 | 0.500 | 0.291
Mixed Alternative (θ = 0.5) | Monocyte | 0.0 | 0.01 | 1.06 | 0.51 | 1.03 | 1.02 | 0.072 | 0.020
Mixed Alternative (θ = 0.5) | NK | 0.0 | -0.02 | 1.24 | 1.22 | 1.13 | 1.16 | 0.064 | 0.013
Mixed Alternative (θ = 0.5) | T Cell (cd4+) | -2.0 | -2.11 | 1.30 | 0.95 | 1.26 | 1.28 | 0.391 | 0.191
Mixed Alternative (θ = 0.5) | T Cell (cd8+) | -1.0 | -0.94 | 1.08 | 0.96 | 1.05 | 1.02 | 0.163 | 0.052
Mixed Null (θ = 1) | B Cell | 0.0 | 0.06 | 1.41 | 1.59 | 1.36 | 1.37 | 0.062 | 0.016
Mixed Null (θ = 1) | Granulocyte | 0.0 | 0.04 | 2.08 | 0.67 | 2.06 | 2.05 | 0.056 | 0.008
Mixed Null (θ = 1) | Monocyte | 0.0 | -0.02 | 1.05 | 0.73 | 1.03 | 1.03 | 0.058 | 0.020
Mixed Null (θ = 1) | NK | 0.0 | 0.01 | 1.26 | 1.76 | 1.14 | 1.22 | 0.066 | 0.011
Mixed Null (θ = 1) | T Cell (cd4+) | 0.0 | -0.01 | 1.42 | 1.38 | 1.31 | 1.36 | 0.067 | 0.016
Mixed Null (θ = 1) | T Cell (cd8+) | 0.0 | 0.00 | 1.19 | 1.39 | 1.08 | 1.10 | 0.073 | 0.011
Strong Null (θ = 0) | B Cell | 0.0 | 0.06 | 1.37 | 0.91 | 1.36 | 1.32 | 0.065 | 0.017
Strong Null (θ = 0) | Granulocyte | 0.0 | 0.03 | 2.07 | 0.38 | 2.06 | 2.05 | 0.055 | 0.009
Strong Null (θ = 0) | Monocyte | 0.0 | -0.02 | 1.04 | 0.42 | 1.03 | 1.02 | 0.057 | 0.021
Strong Null (θ = 0) | NK | 0.0 | 0.01 | 1.19 | 1.01 | 1.14 | 1.16 | 0.053 | 0.018
Strong Null (θ = 0) | T Cell (cd4+) | 0.0 | -0.04 | 1.38 | 0.79 | 1.31 | 1.31 | 0.069 | 0.015
Strong Null (θ = 0) | T Cell (cd8+) | 0.0 | 0.01 | 1.11 | 0.79 | 1.08 | 1.03 | 0.065 | 0.016
Est = Mean regression coefficient estimate (×100%); SD = SD of regression coefficient estimate (×100%). SE₀ = Naive standard error (×100%); SE₁ = Single-bootstrap standard error (×100%). SE₂ = Double-bootstrap standard error (×100%). pow(α) = Pr{P₂ < α}, where P₂ is the p-value computed from SE₂.
TABLE 14 Results for coefficients of determination
Mixtures | Scenario | Median R₁,₀² (Interquartile Range) | Median R₁,₁² (Interquartile Range)
Precise, n₁ = 200 | Strong Alternative I (θ = 0) | 0.13 (0.12-0.15) | 0.98 (0.97-0.98)
Precise, n₁ = 200 | Strong Alternative II (θ = 0) | 0.13 (0.12-0.15) | 0.98 (0.97-0.98)
Precise, n₁ = 200 | Mixed Alternative (θ = 0.5) | 0.04 (0.03-0.05) | 0.88 (0.85-0.91)
Precise, n₁ = 200 | Mixed Null (θ = 1) | 0.00 (0.00-0.00) | 0.10 (0.05-0.17)
Precise, n₁ = 200 | Strong Null (θ = 0) | 0.00 (0.00-0.00) | 0.25 (0.15-0.38)
Noisy, n₁ = 200 | Strong Alternative I (θ = 0) | 0.05 (0.03-0.06) | 0.98 (0.97-0.98)
Noisy, n₁ = 200 | Strong Alternative II (θ = 0) | 0.05 (0.03-0.06) | 0.98 (0.97-0.98)
Noisy, n₁ = 200 | Mixed Alternative (θ = 0.5) | 0.01 (0.01-0.02) | 0.89 (0.81-0.94)
Noisy, n₁ = 200 | Mixed Null (θ = 1) | 0.00 (0.00-0.01) | 0.46 (0.28-0.64)
Noisy, n₁ = 200 | Strong Null (θ = 0) | 0.00 (0.00-0.01) | 0.72 (0.55-0.85)
Example 13
Identification of a Unique DMR in CD3Z Gene
[0305] Individual samples of sorted, normal, human peripheral blood
leukocytes, as shown in Table 15, were purchased from AllCells®, LLC
(Emeryville, CA). These leukocytes were sorted on a column with
antibody-conjugated magnetic beads using a combination of positive
and negative selection. Genomic DNA was extracted from the
leukocytes according to the manufacturer's protocol using either the
DNeasy Blood & Tissue kit (Qiagen) or the AllPrep DNA/RNA/Protein
Mini Kit (Cat. No. 8004, QIAGEN, Valencia, Calif.), then quantified
with a NanoDrop ND-1000 Spectrophotometer (NanoDrop Technologies,
Inc., Wilmington, Del.) and stored at -20 °C. The extracted genomic
DNA was subjected to bisulfite conversion by treatment with sodium
bisulfite using the EZ DNA Methylation Kit (Zymo) following the
manufacturer's protocol, thereby converting unmethylated cytosine
residues to uracil while leaving methylated cytosine residues
intact.
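The net effect of the bisulfite step on a sequence can be illustrated with a toy string transformation. This is a simplified, hypothetical model for illustration only: it treats a single strand, assumes complete conversion, and takes the methylated positions as given. After amplification, uracil is read as thymine, so methylation status becomes a C/T sequence difference.

```python
def bisulfite_convert(seq: str, methylated: set[int]) -> str:
    """Toy model of sodium bisulfite treatment of one DNA strand.

    Unmethylated cytosines become uracil (U); cytosines at the 0-based positions
    listed in `methylated` are protected and remain C. Conversion is assumed complete.
    """
    return "".join(
        "U" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq.upper())
    )


print(bisulfite_convert("ACGTCG", methylated={1}))  # ACGTUG
```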
TABLE 15 Sorted leukocytes from AllCells®, LLC
Cell Lineage | Abbreviation | N
CD3+ T Lymphocytes | Pan-T | 5
CD3+CD4+ T Lymphocytes | CD4 | 2
CD3+CD4+CD25+ Regulatory T Lymphocytes | Treg | 6
CD3+CD8+ T Lymphocytes | CD8 | 2
CD56+ Natural Killer Cells (Large Granular Lymphocytes) | NK | 3
CD19+ B Lymphocytes | B | 5
CD14+ Monocytes | Mono | 4
CD15+ Granulocytes | Gran | 5
CD16+ Neutrophils | Neut | 4
[0306] The methylation status of the bisulfite-converted DNA was
analyzed using a DNA methylation microarray, the Infinium®
HumanMethylation27 BeadChip (Illumina®, Inc., San Diego, Calif.).
This microarray quantifies the methylation status of 27,578 CpG loci
from 14,495 genes, with a redundancy of 15- to 18-fold. Bisulfite-
converted genomic DNA from sorted human peripheral blood leukocytes
was subjected to whole genome amplification. The purified, whole
genome amplified DNA was hybridized to locus-specific DNA oligomers
linked to individual bead types corresponding to each CpG locus,
unmethylated or methylated. Allele-specific primer annealing was
followed by specific single-base extension using labeled ddNTPs.
Extension occurs only if the bead type matches the methylation
status of the genomic DNA.
[0307] The array was fluorescently stained and scanned, and the
fluorescent intensities of the unmethylated and methylated bead
types were measured. The ratio of fluorescent signals was computed
from both alleles using the following equation:
β = max(M, 0)/(|U| + |M| + 100),
where M and U are the methylated and unmethylated signal
intensities. The β-value is a continuous variable ranging from 0
(unmethylated) to 1 (completely methylated) that represents the
methylation at each CpG site and is used in subsequent statistical
analyses. Data were assembled with BeadStudio methylation software
from Illumina, Inc. (San Diego, Calif.) (Bibikova, M., et al.,
Epigenomics 1, 177-200 (2009)).
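The β formula can be computed directly from the two intensities. A minimal sketch with a hypothetical function name: the constant 100 in the denominator keeps β stable when both intensities are low, and negative methylated intensities are floored at zero by the max(M, 0) term.

```python
def beta_value(methylated: float, unmethylated: float, offset: float = 100.0) -> float:
    """Beta value: max(M, 0) / (|U| + |M| + offset)."""
    return max(methylated, 0.0) / (abs(unmethylated) + abs(methylated) + offset)


print(round(beta_value(900.0, 100.0), 3))   # 0.818
print(beta_value(-5.0, 100.0))              # 0.0
```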
[0308] A comparison of methylation in sorted normal human immune
cells produced distinct profiles of methylation markers for further
consideration. As shown in FIG. 9, DNA methylation profiles
distinguished lymphocytes from myeloid-derived leukocytes. A
recursively partitioned mixture model (RPMM) of autosomal gene
Infinium β-values from sorted human peripheral blood leukocytes was
fitted in R (version 2.11.1), using software for Illumina data that
provides convenient mechanisms for loading and analyzing the
results of methylation status, and for quality control and basic
visualization tasks.
[0309] Candidate DNA regions with high potential to discriminate
CD3+ T cells from non-T cells were chosen based on the criteria of
being differentially demethylated and differentially overexpressed
in CD3+ T cells compared with other cell types (monocytes,
granulocytes, NK cells, and B cells). Two quantitative methylation
methods, bisulfite pyrosequencing and MS-qPCR, were used to confirm
array methylation.
[0310] The 5000 most variable CpG loci were plotted in the left panel of FIG. 9, with less methylated loci shown in grey and more methylated loci shown in black. The number of individual leukocyte samples in each methylation class is shown in the table to the right in FIG. 9. The algorithm described herein for prioritizing these candidates yielded CD3E and CD3Z as specific DMRs for identifying CD3+ T cells.
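The selection of the most variable loci for FIG. 9 can be sketched in Python. This sketch assumes that "most variable" means the largest variance of β across samples, which the text does not specify. The helper name and the toy matrix are hypothetical, and the RPMM clustering itself is not reproduced here.

```python
import numpy as np

def most_variable_loci(beta, n_top):
    """Indices of the n_top CpG loci (rows) with the largest variance across samples (columns)."""
    beta = np.asarray(beta, dtype=float)
    variances = beta.var(axis=1)
    # argsort is ascending, so reverse it to rank loci from most to least variable.
    return np.argsort(variances)[::-1][:n_top]

# Hypothetical 3-locus x 3-sample beta matrix.
beta_matrix = np.array([
    [0.10, 0.10, 0.10],  # invariant locus
    [0.00, 0.50, 1.00],  # highly variable locus
    [0.20, 0.30, 0.40],  # moderately variable locus
])
top_loci = most_variable_loci(beta_matrix, 2)
```

On the full array data, the equivalent call would be `most_variable_loci(beta, 5000)` over the autosomal loci.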
Example 14
Patient Characteristics and Biological Samples for Determining
CD3+ T Cell Distribution in Glioma Cases and Controls
[0311] Whole blood samples from glioma patients (N=71) and controls (N=94) were obtained from the UCSF San Francisco Adult Glioma Study (AGS) for these examples (Table 16). The patients included in this example were diagnosed between 1997 and 2011. Details of subject ascertainment through the rapid case ascertainment program of the San Francisco regional population-based registry or the UCSF Neuro-oncology Clinic have been described (Wrensch M et al., 2007, Clin Cancer Res 13(1): 197-205; Felini M J et al. 2009, Cancer Causes Control 20(1): 87-96; Wrensch M et al., 2009, Nat Genet. 41(8): 905-8; Christensen B C et al., 2011, J Natl Cancer Inst 103(2): 143-53). Pertinent data for this analysis included age at histological diagnosis, gender, vital status, and survival time (from diagnosis to death for deceased patients, or from diagnosis to last contact for living patients), as well as cigarette smoking history and exposure to steroids, chemotherapy and radiation therapy.
[0312] A panel of 120 fresh frozen glioma tumors from the UCSF Brain Tumor Research Center tissue bank, obtained under appropriate institutional review board approval and previously characterized for molecular features (Christensen B C et al., 2011, J Natl Cancer Inst 103(2): 143-53; Zheng S et al., 2011, Neuro Oncol 13(3): 280-9), was chosen for tumor MS-qPCR and IHC studies (Table 16). Tumor samples were defined as secondary GBM if the patients had a prior histological diagnosis of low-grade glioma. Ages are given at the time of surgery, which occurred at UCSF between 1990 and 2003. This tumor set contained the following histological subtypes: 2 pilocytic astrocytoma (PA), 15 ependymoma grade II (EPII), 20 oligodendroglioma grade II (ODII), 16 oligoastrocytoma grade II (OAII), 3 oligoastrocytoma grade III (OAIII), 23 astrocytoma grade II (ASII), 4 astrocytoma grade III (ASIII) and 37 astrocytoma grade IV, also called glioblastoma multiforme (GBM), of which ten were recurrent and five were secondary.
[0313] Sorted, normal human peripheral blood leukocyte subtypes were isolated from the whole blood of different non-diseased individuals by magnetic activated cell sorting (MACS), using a combination of negative and positive selection with highly specific cell surface antibodies conjugated to magnetic beads. Flow cytometry showed the purity of the separated cells to be >97%.
Example 15
Bisulfite Pyrosequencing and MS-qPCR Assays for Validating CD3Z,
CD3E and FOXP3 Specific DMRs
[0314] The demographic characteristics of the donors of the samples (N=285) used in MS-qPCR analysis are shown in Table 16. CpGenome Universal Methylated DNA (Cat. No. S7821, Millipore Corp., Temecula, Calif.) and purified T cell and Treg DNA were bisulfite converted at the same time. Bisulfite pyrosequencing assays were designed using PyroMark Assay Design 2.0 (QIAGEN) and carried out on a PyroMark MD pyrosequencer running PyroMark qCpG software (QIAGEN). Custom oligonucleotide primers for bisulfite pyrosequencing were obtained from Invitrogen (Life Technologies Co., Carlsbad, Calif.). For MS-qPCR reactions, primers and TaqMan minor groove binder (MGB) probes with a 5' 6FAM reporter and a 3' non-fluorescent quencher (NFQ), as well as the TaqMan 1000 RXN Gold with Buffer A Pack, were obtained from Applied Biosystems (Part Nos. 4304971, 4316034 and 4304441, Applied Biosystems, Foster City, Calif.). The primer and probe sequences are shown in Table 17 and FIG. 12. For MS-qPCR, 10× TaqMan Stabilizer containing 0.1% Tween-20 and 0.5% gelatin was prepared weekly. Each 20 μl reaction contained 5 μl DNA, 11.9 μl PreMix, 3 μl OligoMix, and 0.1 μl Taq DNA polymerase. Cycling was performed on a 7900HT Fast Real-Time PCR System (Applied Biosystems, Foster City, Calif.): a 10 min preheat at 95 °C followed by 50 cycles of 95 °C for 15 sec and 60 °C for 1 min. Samples were run in triplicate using the absolute quantification method. The copy number of the target locus in each sample was determined by reference to a four-point standard curve based on known copies of bisulfite-converted template.
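Absolute quantification against a four-point standard curve can be sketched in Python as follows. The sketch fits a line of Ct against log10(copies) and inverts it for an unknown sample. The Ct values, the slope of -3.32 cycles per decade, and the use of the triplicate mean are hypothetical illustrations. They are not data or procedures taken from the assays described here.

```python
import numpy as np

def fit_standard_curve(copies, ct):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    slope, intercept = np.polyfit(np.log10(copies), ct, deg=1)
    return slope, intercept

def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate template copies from an observed Ct."""
    return 10 ** ((np.asarray(ct, dtype=float) - intercept) / slope)

# Hypothetical four-point curve (10,000 to 10 copies), -3.32 cycles per tenfold dilution.
standard_copies = np.array([10_000, 1_000, 100, 10])
standard_ct = 38.0 - 3.32 * np.log10(standard_copies)
slope, intercept = fit_standard_curve(standard_copies, standard_ct)

# Mean Ct of a hypothetical triplicate for an unknown sample.
sample_ct = np.mean([28.0, 28.1, 27.9])
estimated_copies = copies_from_ct(sample_ct, slope, intercept)
```

A lower Ct maps to more template copies, because the slope of the curve is negative.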
TABLE 16 Demographic characteristics of donors for samples (N = 285) used in MS-qPCR analysis
Characteristic | Control Blood samples (n = 94) | Case Blood samples (n = 71) | Excised Tumors (n = 120)
Age, Median (range) | 57 (22-87) | 57 (20-86) | 41 (1-78)
Age, Mean (standard deviation) | 55 (16.5) | 56 (13) | 41 (15)
Gender, No (%)
  Female | 43 (46%) | 26 (36%) | 42 (35%)
  Male | 51 (54%) | 45 (64%) | 78 (65%)
Race, No (%)
  White, Non-Hispanic | 78 (83%) | 67 (95%) | 102 (85%)
  Hispanic | 3 (3%) | 3 (4%) | 7 (6%)
  Asian | 6 (7%) | 0 (0%) | 4 (3%)
  Black | 5 (6%) | 0 (0%) | 0 (0%)
  Other | 1 (1%) | 1 (1%) | 7 (6%)
[0315] Total bisulfite-converted DNA copies in standard and biological samples were quantified by reference to the C-less qPCR assay as described previously (Weisenberger D J et al., 2008, Nucleic Acids Res 36(14): 4689-98; Campan M et al., 2009, Methods Mol Biol 507: 325-37). In this procedure, the relative amount of a bisulfite-converted sample is determined with a TaqMan PCR reaction using primers and probes that recognize a DNA strand containing no cytosines. The assay therefore amplifies the total amount of DNA (bisulfite-converted or unconverted) in a PCR reaction well. The C-less reaction was calibrated against the absolute copy number of DNA Standard Solution (Cambio Ltd., Cambridge, UK), assuming 3.3 pg = 1 genome copy. Universal methylated DNA and purified CD3+ T cell and Treg DNA (bisulfite converted) were quantified at the same time. C-less primers hybridize to both strands of the (non-bisulfite-converted) standard DNA, whereas bisulfite-converted samples allow only single-strand hybridization during the first cycle. The resulting copy number in bisulfite-converted samples is therefore multiplied by two. After the C-less assay, the copy numbers of the different standards (universal methylated, CD3+ T cell and Treg DNA) were used to create standard curves for CD3Z and FOXP3. To create a calibration curve, known quantities of CD3+ T cell or Treg DNA were spiked into universal methylated DNA in ratios that kept the total copy number in each reaction constant across the dilution scheme. This procedure mimics the detection conditions that arise when distinguishing different relative numbers of CD3+ T cells and Tregs within a mixture of cells in a complex biological sample. For absolute quantification of CD3Z, the four-point standard curve used 10,000, 1,000, 100, and 10 bisulfite-converted CD3+ T cell DNA copies. Absolute quantification of FOXP3 used 5,000, 500, 50 and 5 bisulfite-converted Treg DNA copies.
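The copy-number arithmetic in this paragraph can be sketched in Python. The sketch converts DNA mass to genome copies at the stated 3.3 pg per copy, applies the factor-of-two correction for bisulfite-converted samples, and builds a spike-in series at constant total input. The fixed total of 10,000 copies per reaction is a hypothetical value, because the text does not state the total. The helper names are also illustrative.

```python
PG_PER_GENOME_COPY = 3.3  # stated in the text: 3.3 pg = 1 genome copy

def genome_copies_from_mass(dna_pg):
    """Convert a mass (pg) of unconverted standard DNA into genome copies."""
    return dna_pg / PG_PER_GENOME_COPY

def bisulfite_corrected_copies(c_less_copies):
    """Double the C-less estimate for bisulfite-converted (single-stranded) samples."""
    return 2 * c_less_copies

def spike_in_series(target_copies, total_copies):
    """(target-cell copies, universal methylated copies) pairs at constant total input."""
    series = []
    for target in target_copies:
        if target > total_copies:
            raise ValueError("target copies exceed the fixed total input")
        series.append((target, total_copies - target))
    return series

# 33,000 pg (33 ng) of standard DNA corresponds to about 10,000 genome copies.
standard_copies = genome_copies_from_mass(33_000)
# CD3Z standards from the text, with a hypothetical fixed total of 10,000 copies.
cd3z_series = spike_in_series([10_000, 1_000, 100, 10], total_copies=10_000)
```

Holding the total constant means each dilution point differs only in the fraction of demethylated template. This mirrors a real cell mixture.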
TABLE 17 Primer and probe sequences for MS-qPCR assays
Oligonucleotide Name | Sequence (5' to 3') | SEQ ID NO
C-less Fwd | TTGTATGTATGTGAGTGTGGGAGAGA | 97
C-less Rev | TTTCTTCCACCCCTTCTCTTCC | 98
C-less Probe | (6FAM)CTCCCCCTCTAACTCTAT(MGB, NFQ) | 99
CD3Z Fwd | GGATGGTTGTGGTGAAAAGTG | 100
CD3Z Rev | CAAAAACTCCTTTTCTCCTAACCA | 101
CD3Z Probe | (6FAM)CCAACCACCACTACCTCAA(MGB, NFQ) | 102
FOXP3 Fwd | GGGTTTTGTTGTTATAGTTTTTG | 103
FOXP3 Rev | TTCTCTTCCTCCATAATATCA | 104
FOXP3 Probe | (6FAM)CAACACATCCAACCACCAT(MGB, NFQ) | 105
MGB: minor groove binder; FAM: 6-carboxyfluorescein; NFQ: non-fluorescent quencher. C-less qPCR assay: Campan M et al., 2009, Methods Mol Biol 507: 325-37; Weisenberger D J et al., 2008, Nucleic Acids Res 36: 4689-98.
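All oligonucleotides in Table 17 are written 5' to 3'. A forward primer that matches a cytosine-free strand should therefore contain no C, and oligos that anneal to that strand should contain no G. The following Python sketch checks that property for every sequence in the table. The role labels and this interpretation are our own reading of the table. For the CD3Z and FOXP3 oligos, a C-free forward primer would match only template on which every cytosine in the primer-binding site, including any CpG cytosine, was converted by bisulfite. That is consistent with detecting demethylated template, but the check is only illustrative.

```python
# Sequences (5' to 3') from Table 17; role labels are an assumed reading of the table.
OLIGOS = {
    "C-less Fwd": ("TTGTATGTATGTGAGTGTGGGAGAGA", "forward"),
    "C-less Rev": ("TTTCTTCCACCCCTTCTCTTCC", "reverse"),
    "C-less Probe": ("CTCCCCCTCTAACTCTAT", "probe"),
    "CD3Z Fwd": ("GGATGGTTGTGGTGAAAAGTG", "forward"),
    "CD3Z Rev": ("CAAAAACTCCTTTTCTCCTAACCA", "reverse"),
    "CD3Z Probe": ("CCAACCACCACTACCTCAA", "probe"),
    "FOXP3 Fwd": ("GGGTTTTGTTGTTATAGTTTTTG", "forward"),
    "FOXP3 Rev": ("TTCTCTTCCTCCATAATATCA", "reverse"),
    "FOXP3 Probe": ("CAACACATCCAACCACCAT", "probe"),
}

def compatible_with_c_free_strand(seq, role):
    """Forward primers must contain no C; oligos annealing to the C-free strand must contain no G."""
    forbidden = "C" if role == "forward" else "G"
    return forbidden not in seq.upper()

results = {name: compatible_with_c_free_strand(seq, role)
           for name, (seq, role) in OLIGOS.items()}
```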
[0316] The DNA methylation status of the CD3E-specific DMR was measured by pyrosequencing bisulfite-converted DNA from sorted human peripheral blood leukocytes (FIG. 10A). The DNA methylation status of the CD3Z-specific DMR was measured by MethyLight® qPCR of bisulfite-converted DNA from sorted human peripheral blood leukocytes (FIG. 10B). The genomic region containing the CD3Z DMR is shown in FIG. 11.
[0317] Standard calibration curves were used to determine whether the newly identified CD3Z DMR could be used to quantify CD3+ T cells, Tregs (FOXP3 demethylated) and Treg/CD3+ T cell ratios in biological specimens such as whole or separated blood or other tissues. To obtain these curves, quantitative real-time methylation-specific PCR was performed. DNA isolated from purified cell types was bisulfite converted and serially diluted into a background of fully methylated commercial DNA standard (Qiagen). This method is referred to herein as the "CS-DM assay" or assays.
[0318] The total genomic copy number of each sample within a dilution series remained constant. Log dilutions were prepared to span the range of Ct values corresponding to test samples (whole blood, tumor specimens). Using cytosine-less (C-less) primers, genome copy numbers for each test standard were measured to ensure adequate input DNA and to normalize the CD3+ and Treg assay values. The calibration curve for C-less total input is shown in FIG. 13A (N=8 replicates); error bars denote the standard error of the mean Ct value. FIG. 13B shows the dilution of isolated normal Pan-T cells (N=7 replicates), and FIG. 13C shows the dilution and calibration curve for isolated CD3+CD25+ T cells (N=8 replicates). For test samples, these calibration curves (FIG. 13A-C) were used to estimate total input copies, CD3+ T cell copies, and Treg copies, respectively.
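Normalization against the C-less total input can be sketched in Python. The sketch expresses CD3Z and FOXP3 copies as percentages of total input and expresses Tregs as a percentage of CD3+ T cells. Reading the FOXP3/CD3Z column of Table 18 as this ratio multiplied by 100 is our assumption. The copy numbers in the example are hypothetical.

```python
def percent_demethylated(target_copies, total_input_copies):
    """Demethylated target copies (e.g., CD3Z or FOXP3) as a percentage of C-less total input."""
    if total_input_copies <= 0:
        raise ValueError("total input copies must be positive")
    return 100.0 * target_copies / total_input_copies

def treg_percent_of_t_cells(foxp3_copies, cd3z_copies):
    """Tregs as a percentage of CD3+ T cells (FOXP3/CD3Z ratio x 100)."""
    if cd3z_copies <= 0:
        raise ValueError("CD3Z copies must be positive")
    return 100.0 * foxp3_copies / cd3z_copies

# Hypothetical sample: 5,000 total copies, 1,000 CD3Z and 50 FOXP3 demethylated copies.
cd3 = percent_demethylated(1_000, 5_000)
treg = percent_demethylated(50, 5_000)
ratio = treg_percent_of_t_cells(50, 1_000)
```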
[0319] These results show that the DNA methylation status of the region identified herein in the CD3Z gene promoter, validated in sorted human peripheral blood leukocytes as an immune cell type-specific differentially methylated region (FIG. 10B), is useful for quantifying CD3+ T cells in biological specimens such as whole or separated blood, or other tissues.
Example 16
Flow Cytometry of Blood Lymphocytes in Whole Blood for
Quantification of CD3+ T Cells
[0320] Levels of CD3+ T cells in whole blood were quantified by flow cytometry for comparison with CD3+ T cell levels determined using the CD3Z MS-qPCR assay. Venous whole blood samples were collected in citrate EDTA and processed using a lyse/no-wash protocol (Invitrogen, Carlsbad, Calif. cat #GAS-010). Cells were labeled by direct staining with fluorochrome-conjugated antibodies (eBioscience Inc, San Diego, Calif.): anti-CD3-fluorescein isothiocyanate (FITC, cat #11-0038-41), anti-CD4-allophycocyanin (APC, cat #17-0048-41), anti-CD8-phycoerythrin (PE, cat #12-0086-41), and anti-CD45-PerCP-Cy5.5 (cat #45-0459-41). The cells were then incubated for 20 minutes in the dark at 4 °C. Isotype control mAbs were used as negative controls. AccuCheck counting beads (Invitrogen, Carlsbad, Calif. cat #PCB100) were used to quantify leukocyte numbers. Acquisition was performed within 48 hrs of blood draw on a FACSCalibur flow cytometer using CellQuest software (Becton Dickinson, Franklin Lakes, N.J.). For CD3+ cells, a minimum of 10,000 events were collected on a lymphocyte gate set on forward scatter vs. side scatter (FSC vs. SSC) and then gated on CD3+ cells. CD45+ counts were obtained by first gating on non-bead events using FSC vs. SSC, then creating a CD45 histogram plot of the non-bead events and gating the CD45+ cells. Examples are seen in FIG. 18. Absolute counts (cells per μl) were obtained by dividing the number of cells counted by the total number of beads counted and multiplying by the known bead concentration. FlowJo software (TreeStar Inc, Ashland, Oreg.) was used for data analysis.
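The bead-based absolute count described above, and the CD3+ fraction of CD45+ leukocytes that is compared with MS-qPCR, can be sketched in Python. The sketch follows the formula as stated and assumes that the bead concentration is expressed per μl of the stained sample. The event counts and bead concentration are hypothetical.

```python
def absolute_count_per_ul(cells_counted, beads_counted, bead_concentration_per_ul):
    """Cells per microliter = (cells counted / beads counted) x known bead concentration."""
    if beads_counted <= 0:
        raise ValueError("at least one bead event is required")
    return cells_counted / beads_counted * bead_concentration_per_ul

def percent_of_leukocytes(subset_per_ul, cd45_per_ul):
    """A gated subset (e.g., CD3+ cells) as a percentage of CD45+ leukocytes."""
    return 100.0 * subset_per_ul / cd45_per_ul

# Hypothetical acquisition: 5,000 bead events, beads at 1,000 per microliter.
cd3_abs = absolute_count_per_ul(10_000, 5_000, 1_000)
cd45_abs = absolute_count_per_ul(35_000, 5_000, 1_000)
cd3_pct = percent_of_leukocytes(cd3_abs, cd45_abs)
```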
Example 17
Tumor Immunohistochemistry (IHC) for Measuring Levels of Tumor
Infiltrating Lymphocytes (TIL) in Glioma Tumors
[0321] Slides were prepared from a 5 μm section of each FFPE tumor block and stained on a Benchmark XT instrument according to the manufacturer's instructions (Ventana, Tucson, Ariz.). CD3 antibody (Dako, Carpinteria, CA cat #A0452) was applied at a 1:600 dilution and incubated for 30 minutes. CD8 antibody (Dako, Carpinteria, CA cat #M7103) was applied at a 1:200 dilution and incubated for 60 minutes. CD4 antibody (Leica Microsystems, Buffalo Grove, Ill., cat #NCL-L-CD4-368) was applied at a 1:50 dilution and incubated for 2 hours. Slides were counterstained with hematoxylin. Each slide was scanned at 10× magnification to identify four suitable fields, which were then scored at 25× magnification. Examples are seen in FIG. 19A-C. The numbers of positively staining cells were recorded, and the average count over the four fields was calculated. For specimens with very high cell counts, photomicrographs were taken and scored to increase accuracy. Samples were also examined to determine whether they contained predominantly perivascular and/or parenchymal infiltrates. Two individuals scored the slides blindly and compared their observations to ensure uniform interpretation. Data from tumor IHC were analyzed together with CD3Z MS-qPCR data to determine the association between the two data sets (see Example 19).
Example 18
Statistical Analysis of Differential Methylation in CD3+ T Cells
for Identification of Cell-Specific DMRs
[0322] To identify putative cell-specific DMRs, un-normalized average β-values for the MACS-sorted leukocyte DNA methylation data from the Illumina HumanMethylation27 microarray were calculated from probe intensities using Illumina GenomeStudio. Locus-by-locus comparisons of DNA methylation between the sorted cell types were performed using a linear mixed-effects model (controlling for BeadChip) in SAS version 9.2. This generated estimates and p-values for differential methylation in CD3+ T cells compared to other cell types. The resulting p-values were adjusted for multiple comparisons using the qvalue package in R (R Project for Statistical Computing), version 2.13, available for download from the internet, and q-values less than 0.05 were considered significant. Correlations, F-tests, Wilcoxon rank sum tests and Kruskal-Wallis one-way analysis of variance by ranks were carried out in R version 2.11.1, and survival analysis was performed using the survival package in R version 2.11.1.
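The multiple-comparison adjustment can be illustrated with a Python sketch. Note that this swaps in a different technique: the R qvalue package estimates the proportion of true null hypotheses (π0) with Storey's method. The sketch below uses the Benjamini-Hochberg adjustment, which equals the q-value when π0 is fixed at 1 and is therefore somewhat more conservative. The p-values are hypothetical.

```python
import numpy as np

def bh_qvalues(pvals):
    """Benjamini-Hochberg adjusted p-values (Storey q-values with pi0 fixed at 1)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Step-up monotonicity: each value becomes the minimum over its rank and all higher ranks.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.minimum(scaled, 1.0)
    return q

p_values = [0.01, 0.04, 0.03, 0.005]  # hypothetical per-locus p-values
q_values = bh_qvalues(p_values)
significant = q_values < 0.05  # threshold used in the text
```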
Example 19
Discovery and Validation of CD3Z Demethylation as a Marker of CD3+
T Cells
[0323] The search for genes containing DMRs specific for CD3+ T cells, using the methods herein, revealed candidate CpG sites within the genes encoding several components of the T cell receptor (TCR) complex, namely CD3D, CD3E, CD3G, and CD3Z. Myeloid-derived blood cells (granulocytes, neutrophils, monocytes) and B lymphocytes showed methylated CpG sites within the CD3D, CD3E, CD3G and CD3Z loci, whereas these sites were demethylated in T cells. CD3Z was also unmethylated in CD16+ NK cells, but was methylated in CD16- NK cells. The promoter regions of the CD3D, CD3E and CD3G genes are CpG-sparse compared with CD3Z, which contains a CpG island that is optimally suited for designing MS-qPCR assays (FIG. 1A). For these reasons, the CD3Z locus was chosen for development of a CD3+ T cell epigenetic marker. CD3Z is significantly overexpressed (p=0.0001; Palmer, Diehn et al. 2006) and demethylated (q=0.00026) in CD3+ T cells compared with non-T cells. Pyrosequencing of CD3Z showed the extent of the differences in demethylation among immune cell lineages, ranging from nearly complete demethylation in CD3+ T cells to nearly complete methylation in other cell lineages (FIG. 20A-B).
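The contrast between CpG-sparse promoters and a CpG island can be made concrete with a Python sketch. The sketch uses the commonly cited Gardiner-Garden and Frommer criteria: length of at least 200 bp, GC content above 50%, and an observed/expected CpG ratio above 0.6. These thresholds are a standard convention and are not taken from this document. The test sequences are synthetic.

```python
def cpg_island_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    s = seq.upper()
    length = len(s)
    c, g = s.count("C"), s.count("G")
    cpg = s.count("CG")  # "CG" cannot overlap itself, so a plain count is exact
    gc_fraction = (c + g) / length if length else 0.0
    obs_exp = cpg * length / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp

def looks_like_cpg_island(seq, min_length=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply Gardiner-Garden and Frommer style thresholds (assumed convention)."""
    gc_fraction, obs_exp = cpg_island_stats(seq)
    return len(seq) >= min_length and gc_fraction > min_gc and obs_exp > min_obs_exp

gc, obs_exp = cpg_island_stats("CG" * 100)
```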
[0324] Bisulfite-converted universal methylated DNA and DNA from purified CD3+ T cells were used to prepare a four-point calibration curve to estimate CD3+ T cell numbers in mixtures of cells (FIG. 14B). The total amount of DNA was held constant at all four points. Log-linear PCR kinetics were demonstrated over a range of CD3+ T cell DNA inputs corresponding to 10 to 100,000 genomic copies. This indicates that the MS-qPCR assay was able to detect a few demethylated cells within a background of many thousands of methylated cells.
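Log-linearity of a calibration series is commonly summarized by the fit's R² and by the amplification efficiency derived from the slope, E = 10^(-1/slope) - 1. This is a standard qPCR convention rather than a procedure stated in the text. A Python sketch using a hypothetical ideal doubling series follows.

```python
import numpy as np

def standard_curve_quality(copies, ct):
    """Slope, R^2 and amplification efficiency of a Ct vs. log10(copies) fit."""
    x = np.log10(np.asarray(copies, dtype=float))
    y = np.asarray(ct, dtype=float)
    slope, _intercept = np.polyfit(x, y, deg=1)
    r_squared = np.corrcoef(x, y)[0, 1] ** 2
    efficiency = 10 ** (-1.0 / slope) - 1.0  # 1.0 means perfect doubling per cycle
    return slope, r_squared, efficiency

# Hypothetical series spanning 10 to 100,000 copies with perfect doubling each cycle.
copies = np.array([10, 100, 1_000, 10_000, 100_000])
ct = 40.0 - np.log2(copies)
slope, r_squared, efficiency = standard_curve_quality(copies, ct)
```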
[0325] Whole blood samples from 46 healthy controls and 20 patients with glioma were then used to compare flow cytometry quantification of CD3+ T cells with the CD3Z MS-qPCR assay (FIG. 14C). The MS-qPCR measurements correlated highly with conventional flow cytometric measurement of T cells as a fraction of total blood leukocytes (Pearson R=0.93; F test p<2.2×10^-16). The uniform regression and close correspondence of the two methods held for both glioma patients (labeled "cases") and healthy controls. These data show that the disease process itself and treatment exposures did not influence the demethylation assay.
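The Pearson correlation and the accompanying F statistic for the simple linear regression, F = r²(n - 2)/(1 - r²) with 1 and n - 2 degrees of freedom, can be sketched in Python. The paired values are hypothetical. If SciPy is available, a p-value could be obtained from `scipy.stats.f.sf(f_stat, 1, n - 2)`, but that step is left out here.

```python
import numpy as np

def pearson_with_f(x, y):
    """Pearson r and the regression F statistic (1 and n-2 df) for paired measurements."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    r = np.corrcoef(x, y)[0, 1]
    f_stat = r ** 2 * (n - 2) / (1 - r ** 2)
    return r, f_stat

# Hypothetical paired values: flow cytometry %CD3+ vs. MS-qPCR %CD3Z demethylation.
r, f_stat = pearson_with_f([1, 2, 3, 4], [2, 4, 5, 8])
```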
[0326] The correlation between CD3+ T cells detected by IHC and by MS-qPCR was assessed in a set of FFPE samples. The results indicated a significant association of IHC score with CD3Z demethylation (Pearson R=0.85; F test p=3.4×10^-11; FIG. 14D). Most CD3+ TILs were CD8+, and only a few stained positively for CD4 (FIG. 19). Glioma cell lines (A172, T98G) were also studied; both had FOXP3 copy numbers <0.06% of total input. Analysis of two autopsy brain specimens revealed FOXP3 copy numbers <0.04% of total input. These values indicate the assay's limits of detection, which were much lower than the values observed in patient blood or tumor samples. These results demonstrate the specificity of the CD3Z epigenetic assay for detecting CD3+ immune cells within a background of tumor cells.
Example 20
Determination of T Cells and Tregs Levels in Peripheral Blood by
CD3Z and FOXP3 MS-qPCR Assays in Glioma Cases and Controls
[0327] The utility of the epigenetic assays with archived frozen blood specimens was tested by a case-control analysis of CD3Z and FOXP3 demethylation in glioma patients and control subjects. This analysis measured CD3+ T cell and Treg levels, respectively, in stored peripheral blood specimens from the University of California, San Francisco Adult Glioma Study (AGS). Results of the MS-qPCR assays are summarized in Table 18. The total DNA inputs from whole blood of the 94 controls and 71 glioma cases did not differ significantly. Compared to healthy controls, patients with grade IV glioblastoma multiforme (GBM) had significantly lower peripheral blood CD3+ T cell levels (Wilcoxon p=1.7×10^-9; FIG. 15A). They also had significantly lower peripheral blood Treg levels (Wilcoxon p=5.2×10^-11; FIG. 15B) and moderately lower peripheral blood Treg/CD3+ T cell ratios (Wilcoxon p=0.024; FIG. 15C). In glioma patients and control subjects, levels of T cells and Tregs were positively correlated (Pearson R=0.61, F test p<2.2×10^-16). Use of dexamethasone or chemotherapy was not associated with T cell measures. The GBM case patients received steroid treatments prior to blood sampling. In healthy controls, but not in glioma patients, people who had smoked tended to have higher peripheral blood CD3+ T cell levels than those who had never smoked (Wilcoxon p=0.08, FIG. 16A). In healthy controls, current smokers also had significantly higher levels of peripheral blood Tregs than former smokers (Wilcoxon p=0.01) and never smokers (Wilcoxon p=0.002; FIG. 16B). Furthermore, among healthy controls, the Treg/CD3+ T cell ratio was significantly elevated in the peripheral blood of current smokers compared to former smokers (Wilcoxon p=0.01) and never smokers (Wilcoxon p=0.03). Among glioma cases, the ratio trended toward elevated levels in current smokers compared to former smokers (Wilcoxon p=0.17) and never smokers (Wilcoxon p=0.14; FIG. 16C).
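The two-group comparisons above use the Wilcoxon rank sum test. A minimal Python sketch of the test follows. It uses the large-sample normal approximation without continuity or tie corrections, so its p-values can differ from R's `wilcox.test`, which uses exact p-values for small samples without ties. The example groups are hypothetical.

```python
import math
import numpy as np

def average_ranks(values):
    """1-based ranks, with tied values receiving the average of their ranks."""
    a = np.asarray(values, dtype=float)
    order = np.argsort(a, kind="mergesort")
    sorted_a = a[order]
    ranks = np.empty(a.size)
    i = 0
    while i < a.size:
        j = i
        while j + 1 < a.size and sorted_a[j + 1] == sorted_a[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def wilcoxon_rank_sum(x, y):
    """Return (U statistic for x, two-sided normal-approximation p-value)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, n2 = x.size, y.size
    ranks = average_ranks(np.concatenate([x, y]))
    u1 = ranks[:n1].sum() - n1 * (n1 + 1) / 2.0
    z = (u1 - n1 * n2 / 2.0) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return u1, math.erfc(abs(z) / math.sqrt(2.0))

u, p = wilcoxon_rank_sum([1, 2, 3], [4, 5, 6])  # hypothetical, fully separated groups
```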
TABLE 18 Summary of MS-qPCR measurements for samples (N = 285)
Percent Demethylation, Median (Range)
Sample Description | CD3Z | FOXP3 | FOXP3/CD3Z
Blood samples (n = 165) | 17.6 (2.1-44.4) | 0.8 (0.06-3.2) | 4.5 (0.9-20.2)
Controls (n = 94) | 21.7 (4.7-44.4) | 1.0 (0.2-3.2) | 4.8 (1.0-20.2)
  Never Smokers (n = 44) | 19.3 (4.7-32.1) | 1.0 (0.2-2.5) | 4.8 (1.0-11.7)
  Former Smokers (n = 42) | 22.4 (8.8-43.4) | 1.1 (0.2-2.2) | 4.4 (1.8-10.5)
  Current Smokers (n = 8) | 23.4 (5.7-44.4) | 1.6 (0.8-3.2) | 7.4 (3.6-20.2)
Glioma Cases (n = 71) | 11.2 (2.1-37.7) | 0.5 (0.06-2.5) | 4.1 (0.9-14.8)
  Never Smokers (n = 31) | 11.3 (2.7-37.7) | 0.5 (0.06-2.5) | 3.8 (1.3-11.5)
  Former Smokers (n = 29) | 12.7 (3.3-32.8) | 0.5 (0.06-1.7) | 4.1 (0.9-12.8)
  Current Smokers (n = 11) | 9.6 (2.1-27.8) | 0.5 (0.1-1.2) | 5.1 (2.3-14.8)
  Non-GBM (n = 6) | 18.5 (3.5-26.6) | 0.9 (0.2-1.6) | 6.0 (3.8-7.1)
  GBM (n = 65) | 10.5 (2.1-37.7) | 0.5 (0.06-2.5) | 4.1 (0.9-14.8)
Excised Tumors (n = 120) | 0.5 (0.03-18.7) | 0.03 (0-1.5) | 5.1 (0-100)
Grades I, II & III (n = 83) | 0.3 (0.03-3.9) | 0.02 (0-0.5) | 3.4 (0-100)
  Pilocytic Astrocytoma (n = 2) | 1.4 (1.0-1.9) | 0 (0-0) | 0 (0-0)
  Ependymoma (n = 15) | 0.5 (0.09-3.0) | 0.03 (0-0.3) | 3.4 (0-29.4)
  Oligodendroglioma (n = 20) | 0.2 (0.04-1.6) | 0 (0-0.2) | 0 (0-57.3)
  Oligoastrocytoma (n = 19) | 0.25 (0.04-3.9) | 0.05 (0-0.4) | 10.5 (0-100)
  Astrocytoma (n = 27) | 0.3 (0.03-2.0) | 0 (0-0.5) | 0 (0-100)
Grade IV, GBM (n = 37) | 1.1 (0.17-18.7) | 0.08 (0-1.5) | 7.8 (0-47.4)
Example 21
Determination of T Cells and Tregs Levels in Tumor Infiltrates by
CD3Z and FOXP3 MS-qPCR Assays in Excised Glioma Tumors
[0328] The CD3Z and FOXP3 demethylation assays were used to measure levels of tumor-infiltrating CD3+ T cells and Tregs, respectively, in 120 fresh frozen glioma tumors from the UCSF Brain Tumor Research Center tissue bank. Results of the MS-qPCR assays are summarized in Table 18. Higher glioma tumor grade was significantly associated with higher levels of both CD3+ T cells (Wilcoxon p=5.7×10^-7; FIG. 17A) and Tregs (Wilcoxon p=0.00014; FIG. 17B) in tumor infiltrates. In grade IV glioma tumor tissues, the median Treg percentage of T cells was higher than that of control blood samples (Table 18) and higher than that of lower grade tumors (FIG. 17C). MS-qPCR data showed significant differences among glioma tumor histologies in levels of CD3+ T cells (Kruskal-Wallis p=8.6×10^-7; FIG. 21A), Tregs (Kruskal-Wallis p=0.00011; FIG. 21B) and Treg/CD3+ T cell ratios (Kruskal-Wallis p=0.018; FIG. 21C). Poorer patient survival was associated with higher levels of tumor-infiltrating CD3+ T cells (Log-Rank p-value=0.014; FIG. 22A) and Tregs (Log-Rank p-value=0.039; FIG. 22B) measured by MS-qPCR.
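The comparison across tumor histologies uses the Kruskal-Wallis test. A Python sketch of the H statistic follows. It omits the tie correction. The p-value would come from a chi-square distribution with k - 1 degrees of freedom, where k is the number of groups, and that step is not computed here. The example groups are hypothetical.

```python
import numpy as np

def average_ranks(values):
    """1-based ranks, with tied values receiving the average of their ranks."""
    a = np.asarray(values, dtype=float)
    order = np.argsort(a, kind="mergesort")
    sorted_a = a[order]
    ranks = np.empty(a.size)
    i = 0
    while i < a.size:
        j = i
        while j + 1 < a.size and sorted_a[j + 1] == sorted_a[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic (no tie correction) for two or more groups."""
    data = np.concatenate([np.asarray(g, dtype=float) for g in groups])
    ranks = average_ranks(data)
    n_total = data.size
    weighted = 0.0
    start = 0
    for g in groups:
        n_g = len(g)
        weighted += ranks[start:start + n_g].sum() ** 2 / n_g
        start += n_g
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3.0 * (n_total + 1)

# Hypothetical %CD3Z demethylation values for three histologies.
h = kruskal_wallis_h([1, 2], [3, 4], [5, 6])
```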
Example 22
Kaplan-Meier Survival Curves for Glioma Cases Show Association of
Lower Treg with Improved Survival
[0329] Survival of glioma patients was correlated with levels of CD3+ T cells and Tregs as measured by the CD3Z and FOXP3 demethylation assays (FIG. 22A-C). Both univariate and multivariate survival analyses were performed. Kaplan-Meier survival curves for glioma cases were stratified by the median values of the demethylation assays. For FIG. 22A-C, patients were divided into two groups. In each panel, the top trace represents survival of the patients for whom the measured variable (level of CD3+ T cells, level of Tregs, or the Treg/T cell ratio) was below the median observed for that variable. The bottom trace represents survival of the patients for whom the measured variable was above that median.
[0330] The results show that, after controlling for age, gender and grade, CD3+ T cell and Treg levels in glioma tumor tissue measured by the demethylation assays were significantly associated with poorer patient survival (FIG. 22A-C).
A CD3Z demethylation assay for CD3+ T cells showed that a lower CD3+ T cell/total input ratio in glioma tumor tissue was significantly associated with improved survival (Log-Rank p-value=0.0144; FIG. 22A). A Treg CS-DM (FOXP3 demethylation) assay showed that a lower Treg/total input ratio in glioma tumor tissue was significantly associated with improved survival (Log-Rank p-value=0.0385; FIG. 22B). The Treg percentage of CD3+ T cells in glioma tumor tissue, measured by the demethylation assays, was not significantly associated with survival (Log-Rank p-value=0.4558; FIG. 22C).
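The median split and two-group log-rank comparison behind FIG. 22A-C can be sketched in Python. The sketch assumes that survival times, death indicators and per-patient marker values are already available. The example data are hypothetical, and the patent's own analysis used the R survival package rather than this code.

```python
import math
import numpy as np

def median_split(marker):
    """True for patients whose marker value is above the cohort median (bottom trace)."""
    marker = np.asarray(marker, dtype=float)
    return marker > np.median(marker)

def logrank_test(time, event, group):
    """Two-group log-rank test; returns (chi-square statistic, p-value with 1 df)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    group = np.asarray(group, dtype=bool)
    observed_minus_expected = 0.0
    variance = 0.0
    for t in np.unique(time[event]):  # distinct death times
        at_risk = time >= t
        n = at_risk.sum()
        n1 = (at_risk & group).sum()
        deaths = (time == t) & event
        d = deaths.sum()
        d1 = (deaths & group).sum()
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return float("nan"), float("nan")
    chi2 = observed_minus_expected ** 2 / variance
    # For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))

# Hypothetical usage: group = median_split(treg_percent); logrank_test(months, died, group)
chi2, p = logrank_test([1, 2, 3, 4], [1, 1, 1, 1], [True, True, False, False])
```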
Example 23
Cells, and Cancer Patient and Control Datasets for Determining DNA
Methylation Based Epigenetic Signatures for Differentiating
Patients and Controls
[0331] Sorted, normal human peripheral blood leukocyte subtypes were isolated from whole blood by magnetic activated cell sorting (MACS) (AllCells LLC, Emeryville, CA). Flow cytometry confirmed the purity of the separated cells to be >97%. Genomic DNA was extracted and purified from cell pellets using a commercially available method (Qiagen, Valencia, Calif.), treated with sodium bisulfite (Zymo Research, Irvine, Calif.) and subjected to methylation profiling using the Infinium HumanMethylation27 BeadArray (Illumina, San Diego, Calif.). The same platform was used to analyze the samples from the case-control studies described below.
[0332] The head and neck squamous cell carcinoma (HNSCC) data set (Table 19) consists of 92 incident cases from the greater Boston area and 92 cancer-free, population-based control subjects from the same region (Applebaum K M et al., Int J Cancer 124:2690-2696, 2009). Table 19 lists the clinical characteristics of this study population. The ovarian cancer data set (Teschendorff A E et al., 2009, PLoS One 4:e8274) is publicly available from Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/, Accession number GSE19711). It consists of 266 postmenopausal women diagnosed with primary epithelial ovarian cancer (131 pre-treatment and 135 post-treatment cases) from the UK Ovarian Cancer Population Study (UKOPS). Controls (n=274) were cancer-free postmenopausal women for whom annual serum samples were available. To avoid potential biases due to therapy, only pre-treatment ovarian cancer cases were included in the analysis. The bladder cancer data set (Marsit C J et al., 2011, J Clin Oncol 29:1133-1139) consists of 223 incident bladder cancer cases identified from the New Hampshire state cancer registry and 237 population controls from the same region (Karagas M R et al., 1998, Environ Health Perspect 106:1047-1050; Wallace K et al., 2009, Cancer Prev Res 2:70-73). Table 20 summarizes the participant characteristics of the bladder cancer data set.
TABLE 19 Characteristics of the study population in the HNSCC data set
Characteristics | Cases (n = 92) | Controls (n = 92)
Age, median years (range) | 58 (31-84) | 59 (32-86)
Gender, n (%)
  Male | 64 (69.6%) | 64 (69.6%)
  Female | 28 (30.4%) | 28 (30.4%)
Smoking history, n (%)
  Never | 17 (18.5%) | 32 (34.8%)
  Former | 59 (64.1%) | 47 (51.1%)
  Current | 16 (17.4%) | 13 (14.1%)
Pack-years*, median (range) | 40.0 (0.8-135.0) | 24.5 (0.5-85.0)
Alcohol history, median drinks/week (range) | 15.7 (0-307.0) | 5.6 (0-140.6)
HPV16 (E6, E7 or L1 seropositivity), n (%)
  Negative | 66 (71.7%) | 83 (90.2%)
  Positive | 26 (28.3%) | 9 (9.8%)
Tumor Site, n (%)
  Oral cavity | 39 (42.4%) | --
  Pharynx | 35 (38.0%) | --
  Larynx | 18 (19.6%) | --
Stage, n (%)
  I | 9 (12.5%) | --
  II | 9 (12.5%) | --
  III | 14 (19.4%) | --
  IV | 40 (55.6%) | --
*Restricted to ever-smokers (current + former)
TABLE 20 Characteristics of the study population in the Bladder cancer data set
Characteristics | Controls, No. | Controls, % | Cases, No. | Cases, %
Total No. | 237 | | 223 |
Age, years
  Median | 65 | | 66 |
  Range | 28-74 | | 25-74 |
Sex
  Male | 158 | 48 | 171 | 52
  Female | 79 | 60 | 52 | 40
Family history of bladder cancer*
  No | 224 | 53 | 199 | 47
  Yes | 7 | 44 | 9 | 56
Smoking history
  Never | 72 | 64 | 40 | 36
  Former | 126 | 53 | 111 | 47
  Current | 39 | 35 | 72 | 66
Tumor stage/grade designation
  Carcinoma in situ | NA | | 6 | 3
  Noninvasive low grade (grade 1-2) | NA | | 140 | 63
  Noninvasive high grade (grade 3) | NA | | 17 | 7
  Invasive | NA | | 60 | 27
*Data on family history were not available for 13 subjects
Example 24
Statistical Analysis of Differences in Methylation Status in
Leukocyte Subsets for Determining Signatures Based on Leukocyte
DMRs
[0333] The analytic strategy was designed to examine the extent to which peripheral blood DNA methylation in non-hematopoietic cancers is driven by the epigenetic signatures that define leukocyte subtypes. Linear mixed-effects models were used to assess differences in methylation across the leukocyte subtypes, and false discovery rate (FDR) estimation was used to control for the large number of comparisons. Leukocyte DMRs were then ranked by strength of association, and the 50 highest ranking DMRs were compared between cancer cases and cancer-free controls in each of the three cancer data sets.
An additional analysis capitalized on the aggregate methylation signatures across a collection of leukocyte DMRs. Each of the full cancer data sets was split into equally sized training and testing sets. Samples in the training sets were then clustered using leukocyte DMRs. Clustering was performed with the Recursively Partitioned Mixture Model (RPMM), a hierarchical, model-based clustering method that has been used for array-based methylation data (Christensen B C et al., 2009, PLoS Genet. 5:e1000602; Christensen B C et al., 2011, J Natl Cancer Inst 103:143-153; Hinoue T et al., 2012, Genome Res. 22(2):271-82; Koestler D C et al., 2010, Bioinformatics 26:2578-2585). Based on the RPMM fit to the training sets, methylation class membership was predicted for the observations in the respective testing sets. The association between predicted methylation class and cancer case/control status was then assessed.
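The split-sample design can be sketched in Python. The sketch shows an equal-size random split and a cross-tabulation of predicted methylation class against case/control status. The text does not say how the split was randomized or which association test was used. The seed, the helper names, and the Pearson chi-square statistic are therefore assumptions, and the RPMM fit and prediction step is not reproduced.

```python
import numpy as np

def split_half(n_samples, seed=0):
    """Randomly split sample indices into equally sized training and testing halves."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)
    half = n_samples // 2
    return np.sort(shuffled[:half]), np.sort(shuffled[half:])

def class_by_status_table(predicted_class, is_case):
    """Rows = predicted methylation classes; columns = (cases, controls)."""
    predicted_class = np.asarray(predicted_class)
    is_case = np.asarray(is_case, dtype=bool)
    classes = np.unique(predicted_class)
    table = np.array([[np.sum((predicted_class == c) & is_case),
                       np.sum((predicted_class == c) & ~is_case)] for c in classes])
    return classes, table

def chi_square_statistic(table):
    """Pearson chi-square statistic for independence of class and case/control status."""
    observed = np.asarray(table, dtype=float)
    expected = (observed.sum(axis=1, keepdims=True)
                * observed.sum(axis=0, keepdims=True) / observed.sum())
    return float(((observed - expected) ** 2 / expected).sum())

train_idx, test_idx = split_half(184)  # e.g., 92 HNSCC cases + 92 controls
classes, table = class_by_status_table(["A", "A", "B", "B"], [True, True, False, False])
stat = chi_square_statistic(table)
```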
[0334] The detailed statistical methods used in the analysis are described in Examples 25-26. Analyses were carried out using R (R Project for Statistical Computing), version 2.13, available for download from the internet.
Example 25
Prediction of Methylation Class Membership Based on Epigenetic
Signatures from Leukocyte Derived DMRs
[0335] Genome-wide DNA methylation was profiled with the Infinium HumanMethylation27 BeadArray in 46 samples of magnetic antibody-sorted normal human peripheral blood leukocyte subtypes (including B cells, granulocytes, monocytes, NK cells, CD4+ T cells, CD8+ T cells, and Pan-T cells; FIG. 28). To identify leukocyte subtype DMRs, the association between methylation and leukocyte subtype was examined for each of 26,486 autosomal CpG loci. These data revealed 10,370 CpGs that were significantly differentially methylated among the leukocyte subtypes (FDR q-value<0.05), which were ranked by q-value (Table 22 and FIG. 24A). The 50 highest ranking DMRs (Table 21) from this ranked list were selected for the case-control analyses. Because the publicly available ovarian cancer data set included both pre- and post-treatment cases, only pre-treatment cases (n=131) were considered in subsequent analyses to avoid potential biases resulting from therapy. Unconditional logistic regression models were adjusted for available and relevant confounders (FIG. 24A). Using these models, a substantial proportion of the 50 selected leukocyte DMRs were found to be significantly differentially methylated between cancer cases and cancer-free controls at the α=0.05 threshold (48, 47, and 8 out of 50, with permutation p-values <0.001, <0.001, and 0.085 for HNSCC, ovarian cancer, and bladder cancer, respectively; FIG. 24B).
[0336] Eight of the leukocyte DMRs that were significantly
differentially methylated in cancer cases compared to controls were
common to all three cancer types (FIG. 24B). In HNSCC and ovarian
cancer, seven of these eight leukocyte DMRs were hypomethylated in
cases relative to controls, whereas all eight were hypermethylated
in bladder cancer cases relative to controls (Table 22).
[0337] To extend beyond aggregate methylation signatures across a
collection of leukocyte DMRs, classifiers based on the profiles of
leukocyte DMRs obtained from the subset analysis were developed and
tested, and their performance in discriminating cancer cases from
cancer-free controls was assessed. The workflow of the DMR
methylation profile analysis is shown in FIGS. 29-31. For each of the
three cancer data sets, a cross-validation procedure (Christensen B C
et al., 2011, J Natl Cancer Inst 103:143-153) was applied to the
training set only, to determine the number (M) of highest ranking
leukocyte DMRs for subsequent clustering of the training set. The 50,
10, and 56 highest ranking leukocyte DMRs, as selected by the
respective cross-validation procedures from the 10,370 putative DMRs
initially identified, were used to cluster the observations in the
HNSCC, ovarian cancer, and bladder cancer training sets,
respectively. The resulting clustering solutions were used to
predict methylation class membership for the subjects in the
respective independent testing sets. FIG. 24A, FIG. 25A and FIG. 26A
depict heat maps of the respective testing sets by predicted
methylation class for each cancer data set. Methylation classes
derived from leukocyte subtype DMRs were significantly associated
with cancer case status within each cancer type (permutation χ²
p-values < 0.0001, < 0.0001, and 0.03 for the HNSCC, ovarian cancer,
and bladder cancer data sets, respectively), supporting the
phenotypic relevance of methylation classes predicted from
leukocyte DMRs.
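The permutation χ² test named above can be illustrated with the following minimal Python sketch. It assumes a Pearson χ² statistic computed on the table of predicted methylation class versus case/control status, with case/control labels shuffled to build the null distribution; the function names, the number of permutations, and the add-one p-value correction are illustrative choices, not details taken from the original analysis.

```python
import random
from collections import Counter


def chi2_statistic(classes, status):
    """Pearson chi-square statistic for a (predicted class) x (case/control) table."""
    n = len(classes)
    counts = Counter(zip(classes, status))
    row_totals = Counter(classes)
    col_totals = Counter(status)
    stat = 0.0
    for r, r_tot in row_totals.items():
        for c, c_tot in col_totals.items():
            expected = r_tot * c_tot / n
            # Counter returns 0 for empty cells without inserting them.
            stat += (counts[(r, c)] - expected) ** 2 / expected
    return stat


def permutation_chi2_pvalue(classes, status, n_perm=1000, seed=0):
    """Return (observed statistic, permutation p-value) by shuffling case/control labels."""
    rng = random.Random(seed)
    observed = chi2_statistic(classes, status)
    labels = list(status)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        # Small tolerance so permuted statistics equal to the observed one count as extreme.
        if chi2_statistic(classes, labels) >= observed - 1e-12:
            exceed += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)
```

For example, `permutation_chi2_pvalue(classes, status)`, with `classes` a list of predicted class labels (such as "L1" or "R2") and `status` a parallel list of 0/1 case indicators, returns the observed statistic and its permutation p-value.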
[0338] For the HNSCC testing set, subjects predicted to be in the
rightmost classes of the dendrogram (classes beginning with R) had
approximately six-fold higher odds of being HNSCC cases than
subjects in the leftmost classes (classes beginning with L)
(OR=5.99; 95% CI [1.96, 18.36]), controlling for age, gender,
smoking, alcohol consumption, and HPV serostatus. Assessment of the
clinical utility of the predicted methylation classes in HNSCC
showed that methylation classes derived from the 50 highest ranking
leukocyte DMRs were highly predictive of HNSCC case/control status
(area under the curve (AUC)=0.82, 95% CI [0.74, 0.91]); the AUC
increased to 0.92 (95% CI [0.87, 0.98]) when age, gender, smoking,
alcohol consumption, and HPV serostatus were included in the model
(FIG. 24B).
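The AUC values reported here summarize how well a model's predicted probability of being a case separates cases from controls. As a hedged illustration, the sketch below computes the AUC through its equivalence with the Mann-Whitney statistic: the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. The scores are assumed to be predicted case probabilities from whichever model is being evaluated; confidence intervals such as those quoted above would require an additional method (for example, DeLong or bootstrap) that is not shown.

```python
def roc_auc(scores, labels):
    """AUC as P(case score > control score), ties counted as 1/2.

    scores: predicted case probabilities (or any monotone score).
    labels: 1 for a cancer case, 0 for a cancer-free control.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    # Normalizing the Mann-Whitney count by all case-control pairs gives the AUC.
    return wins / (len(cases) * len(controls))
```

The pairwise loop is quadratic in sample size, which is adequate for testing sets of a few hundred subjects.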
[0339] For ovarian cancer, subjects predicted to be in the rightmost
classes had approximately ten-fold higher odds of being ovarian
cancer cases than subjects in the leftmost classes (OR=9.87, 95% CI
[4.63, 21.10]), controlling for age. The predicted methylation
classes in the ovarian cancer data also discriminated well between
ovarian cancer cases and controls (AUC=0.83, 95% CI [0.77, 0.89]);
the AUC increased to 0.86 (95% CI [0.81, 0.92]) with age included in
the model (FIG. 25B).
[0340] In the bladder cancer data, subjects in the rightmost classes
had nearly twice the odds of being bladder cancer cases compared to
subjects in the leftmost classes (OR=1.94, 95% CI [0.95, 3.98],
adjusted for age, gender, smoking, and family history of bladder
cancer), although this confidence interval includes 1. The clinical
utility of the predicted methylation classes in the bladder cancer
data was lower than that observed for HNSCC and ovarian cancer
(AUC=0.67, 95% CI [0.60, 0.73]; adjusted AUC=0.77, 95% CI [0.71,
0.83] with age, gender, smoking, and family history in the model)
(FIG. 26B).
[0341] Using leukocyte-derived DMRs to differentiate cases from
controls yielded methylation profiles whose prediction performance
was consistent with, and for HNSCC and ovarian tumors considerably
better than, previously published results using the same data sets
(Teschendorff A E et al., 2009, PLoS One 4:e8274; Marsit C J et al.,
2011, J Clin Oncol 29:1133-1139; Langevin S M et al., Epigenetics.
2012 March; 7(3):291-9). For the HNSCC and ovarian data sets, the
methylation status of the leukocyte DMRs was highly correlated with
that of the CpG loci identified by previous analytic strategies
(Langevin S M et al., Epigenetics. 2012 March; 7(3):291-9; mean
absolute Spearman correlations = 0.68 and 0.75, respectively; FIG.
27A and FIG. 27B). In contrast, the 56 highest ranking DMRs in the
bladder data set were less correlated with the CpG loci used to form
the methylation classes in a previous study of the same data set
(mean absolute Spearman correlation = 0.11; FIG. 27C).
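A mean absolute Spearman correlation of the kind quoted above can be sketched as follows, assuming each locus is represented by its vector of β-values across the same samples. Spearman's ρ is computed as the Pearson correlation of average ranks (tied values share their mean rank), and the summary is the mean of |ρ| over all DMR-by-CpG pairs; the function names are hypothetical.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 0-based positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def mean_abs_spearman(dmr_profiles, cpg_profiles):
    """Mean |rho| over all (DMR, CpG) pairs; profiles share the same sample order."""
    rhos = [abs(spearman(d, c)) for d in dmr_profiles for c in cpg_profiles]
    return sum(rhos) / len(rhos)
```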
TABLE 21
The highest ranking 50 differentially methylated regions (DMRs) among the leukocyte subtypes (false discovery rate q-values < 0.001 for all)
CpG Name | Chromosome | Gene Name | F-statistic
cg03801286 | 21 | KCNE1 | 373.63
cg25634666 | 11 | FOLR3 | 369.50
cg24777950 | 14 | CTSG | 350.66
cg17356733 | 21 | IFNGR2 | 291.97
cg02497428 | 16 | IGSF6 | 291.35
cg24211388 | 6 | AIF1 | 285.92
cg03330678 | 17 | SEPT9 | 284.79
cg00546897 | 21 | LOC284837 | 279.64
cg24841244 | 11 | CD3D | 271.62
cg11283860 | 1 | SLC45A1 | 271.09
cg27485921 | 2 | ATP6V1E2 | 267.19
cg00974864 | 1 | FCGR3B | 260.62
cg07730301 | 11 | ALDH3B1 | 252.52
cg07728874 | 11 | CD3D | 250.67
cg17496921 | 19 | TSPAN16 | 246.58
cg26661623 | 17 | ASGR2 | 242.83
cg18920397 | 1 | LY9 | 238.64
cg27461196 | 19 | FXYD1 | 236.64
cg20720686 | 7 | POR | 232.23
cg09303642 | 12 | NFE2 | 231.34
cg23140706 | 12 | NFE2 | 224.95
cg08458487 | 10 | SFTPD | 217.67
cg20748065 | 7 | POR | 217.63
cg18589858 | 11 | SLCO2B1 | 217.14
cg10287137 | 11 | P2RY2 | 215.31
cg25587233 | 9 | PPP2R4 | 207.25
cg08044694 | 19 | BRD4 | 202.50
cg18084554 | 19 | ARID3A | 198.61
cg13650156 | 7 | PILRA | 197.87
cg18854666 | 2 | SLC11A1 | 197.42
cg17173423 | 11 | MS4A3 | 195.50
cg22242539 | 17 | SERPINF1 | 194.11
cg02780988 | 17 | KRTHA6 | 193.25
cg10266490 | 1 | ACOT11 | 192.62
cg27606341 | 5 | FYB | 191.23
cg15512851 | 6 | FGD2 | 185.34
cg20070090 | 1 | S100A8 | 183.43
cg11058932 | 7 | TSGA13 | 183.31
cg13500819 | 5 | PACAP | 182.82
cg15880738 | 11 | CD3G | 182.73
cg07285167 | 1 | CSF3R | 182.16
cg09868035 | 20 | C20orf135 | 179.56
cg01980222 | 6 | TREM2 | 178.94
cg21019522 | 11 | SLC22A18 | 176.20
cg16097772 | 12 | LYZ | 172.89
cg21969640 | 12 | GPR84 | 172.51
cg12971694 | 9 | CD72 | 172.43
cg22224704 | 11 | GSTP1 | 172.40
cg07239938 | 19 | ELA2 | 170.70
cg02240622 | 15 | PLCB2 | 169.99
TABLE 22
Methylation differences between cancer cases and controls for the eight overlapping differentially methylated leukocyte DMRs. Mean delta-beta refers to the difference in mean methylation between cancer cases and controls (i.e., β(cases) - β(controls)). Values are mean delta-beta (95% CI).
Gene Locus | HNSCC | Ovarian | Bladder
C20orf135 | -0.05 (-0.07, -0.03) | -0.06 (-0.08, -0.05) | 0.02 (0.0, 0.04)
PACAP | 0.02 (0.00, 0.04) | 0.04 (0.02, 0.05) | 0.02 (0.0, 0.04)
FGD2 | -0.05 (-0.07, -0.03) | -0.06 (-0.07, -0.04) | 0.02 (0.01, 0.04)
SLC22A18 | -0.05 (-0.07, -0.04) | -0.05 (-0.06, -0.04) | 0.02 (0.01, 0.04)
GSTP1 | -0.05 (-0.07, -0.04) | -0.06 (-0.07, -0.05) | 0.02 (0.01, 0.04)
NFE2 | -0.04 (-0.05, -0.03) | -0.04 (-0.05, -0.03) | 0.02 (0.0, 0.03)
ASGR2 | -0.06 (-0.08, -0.04) | -0.05 (-0.07, -0.04) | 0.02 (0.01, 0.04)
SLC11A1 | -0.05 (-0.07, -0.04) | -0.05 (-0.06, -0.04) | 0.02 (0.0, 0.04)
Example 26
Statistical Analysis of Methylation Differences in Leukocyte DMRs
Between Cancer Cases and Cancer-Free Controls for Determining
Epigenetic Signatures Specific to Each Group
[0342] Linear mixed-effects models were used to assess differences
in methylation across the leukocyte subtypes, modeling arcsine
square-root transformed methylation as the response, leukocyte
subtype as a fixed-effect covariate, and plate/BeadChip as a random
effect. False discovery rate (fdr) estimation was used to account
for the large number of comparisons, and putative leukocyte DMRs
were defined as those with fdr q-value < 0.05. Leukocyte DMRs were
then ranked by strength of association using the F-statistics from
the respective linear mixed-effects models.
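The transformation and multiple-testing steps above can be sketched in Python. The arcsine square-root transform is applied to each β-value before model fitting. For the false discovery rate step, the sketch substitutes the Benjamini-Hochberg procedure, which is an assumption: the original analysis reports fdr q-values and may have used a different estimator (Storey's q-value with an estimated null proportion is generally less conservative). The mixed-effects model fit itself is not shown.

```python
import math


def arcsine_sqrt(beta):
    """Variance-stabilizing transform of a methylation proportion in [0, 1]."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return math.asin(math.sqrt(beta))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Loci with adjusted values below 0.05 would then be retained and ranked by their F-statistics.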
[0343] Methylation differences in the 50 highest ranking leukocyte
DMRs were examined between cancer cases and cancer-free controls
using a series of unconditional logistic regression models adjusted
for available and relevant covariates. A leukocyte DMR was
considered differentially methylated if its nominal p-value from the
unconditional logistic regression model was less than 0.05.
Permutation tests were then applied to each of the three data sets
to determine whether the number of differentially methylated
leukocyte DMRs was significantly greater than expected by chance.
Specifically, case/control labels were randomly permuted across
samples (using the same permutation for all 50 DMRs), and an
unconditional logistic regression model was fit to the resampled
data. For each data set, 1000 permutations were used to generate a
null distribution of the number of differentially methylated
leukocyte DMRs, and permutation p-values were obtained by comparing
the observed number of differentially methylated leukocyte DMRs with
the respective null distribution.
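The permutation scheme above can be sketched as follows. The points it illustrates are that a single shuffle of the case/control labels is applied to all loci at once, which preserves the correlation between DMRs, and that the observed count of nominally significant loci is compared with the counts obtained under the shuffles. As a simplifying assumption, the per-locus test here is a normal-approximation two-sample test of means rather than the covariate-adjusted logistic regression described in the text; any per-locus p-value function could be substituted.

```python
import random
from statistics import NormalDist, mean, variance


def two_sample_pvalue(values, is_case):
    """Stand-in per-locus test: Welch-type z test of case vs control means.

    The analysis described above used covariate-adjusted logistic regression instead.
    """
    cases = [v for v, c in zip(values, is_case) if c]
    controls = [v for v, c in zip(values, is_case) if not c]
    se = (variance(cases) / len(cases) + variance(controls) / len(controls)) ** 0.5
    if se == 0:
        return 1.0
    z = (mean(cases) - mean(controls)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def count_significant(loci, is_case, alpha=0.05):
    """Number of loci whose nominal p-value falls below alpha."""
    return sum(two_sample_pvalue(v, is_case) < alpha for v in loci)


def permutation_count_test(loci, is_case, n_perm=1000, alpha=0.05, seed=0):
    """loci: per-locus value lists in a shared sample order. Returns (observed count, p-value)."""
    rng = random.Random(seed)
    observed = count_significant(loci, is_case, alpha)
    labels = list(is_case)
    null_counts = []
    for _ in range(n_perm):
        rng.shuffle(labels)  # one shuffle shared by every locus
        null_counts.append(count_significant(loci, labels, alpha))
    exceed = sum(c >= observed for c in null_counts)
    return observed, (exceed + 1) / (n_perm + 1)
```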
[0344] The leukocyte DMR profile analysis involved splitting each
full cancer data set into equally sized training and testing sets
(FIGS. 29-32). Samples in the training set were clustered using the
M highest ranking leukocyte DMRs, where M was determined from the
total pool of putative DMRs using the previously described
cross-validation procedure (Sincic N and Herceg Z, 2011, Curr Opin
Oncol 23:69-76). Clustering was performed using the Recursively
Partitioned Mixture Model (RPMM), a hierarchical model-based
clustering method that has been used extensively for array-based
methylation data (Cui H M, 2007, Dis Markers 23:105-112;
Wilhelm-Benartzi C S et al., 2010, Carcinogenesis 31:1972-1976;
Schwartzman J et al., 2011, Epigenetics 6:1248-1256). Based on the
RPMM fit to the training data, a naive Bayes classifier was used to
predict methylation class membership for the observations in the
independent testing set. Associations between predicted methylation
class and cancer case/control status were assessed using permutation
χ² tests and unconditional logistic regression models adjusted for
available and relevant confounders. The clinical utility of the
identified methylation classes was investigated using receiver
operating characteristic (ROC) curves and the corresponding area
under the curve (AUC).
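One way to realize the naive Bayes prediction step is sketched below. It assumes that each terminal RPMM class has been summarized by a prior weight and by independent beta distributions, one per selected CpG, with class-specific shape parameters (a, b); a test sample is assigned to the class with the largest log posterior score. The parameter values would come from the training-set fit and are not given in the text, so any values supplied to these hypothetical functions are illustrative.

```python
import math


def beta_logpdf(x, a, b):
    """Log density of a Beta(a, b) distribution at x in (0, 1)."""
    return ((a - 1) * math.log(x) + (b - 1) * math.log(1 - x)
            + math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))


def predict_class(sample, class_params, class_weights, eps=1e-6):
    """Naive Bayes assignment of one sample to a fitted methylation class.

    sample: beta values at the M selected CpGs.
    class_params: {class_name: [(a_j, b_j) for each CpG]}  (hypothetical fitted values)
    class_weights: {class_name: prior class proportion}
    CpGs are treated as conditionally independent given the class.
    """
    best, best_score = None, -math.inf
    for name, params in class_params.items():
        score = math.log(class_weights[name])
        for x, (a, b) in zip(sample, params):
            x = min(max(x, eps), 1 - eps)  # keep x strictly inside (0, 1)
            score += beta_logpdf(x, a, b)
        if score > best_score:
            best, best_score = name, score
    return best
```

For example, with two hypothetical classes whose beta parameters are (2, 8) and (8, 2) at every CpG and equal prior weights, samples with low β-values are assigned to the first class and samples with high β-values to the second.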
[0345] Pairwise Spearman correlation coefficients were computed
between the M highest ranking leukocyte DMRs and the CpG loci
identified by the corresponding semi-supervised RPMM (SS-RPMM)
analyses of the HNSCC, ovarian, and bladder cancer data sets. A
diagram illustrating the analytic framework for SS-RPMM is provided
in FIG. 32. Briefly, SS-RPMM is a statistical methodology for
identifying classes of methylation that are associated with a
phenotype of interest, and it has been applied successfully in
several settings (Christensen B C et al., 2009, Cancer Res
69:227-234; Marsit C J et al., 2006, Cancer Res 66:10621-10629).
[0346] To allow comparison with previously published results and to
provide additional insight into the findings of those studies, the
HNSCC and bladder cancer data sets were split into the same training
and testing sets as were used in Langevin S M et al., Epigenetics.
2012 March; 7(3):291-9 and Christensen B C et al., 2009, Cancer Res
69:227-234. The ovarian cancer data set was also analyzed using the
SS-RPMM strategy described in those references, and the results are
shown in FIG. 33. Following the same logic, the training sets used
for the SS-RPMM analysis were also used for the leukocyte DMR
profile analysis of the ovarian data.
[0347] Analyses were carried out using the R statistical package (R
Project for Statistical Computing), version 2.13, available for
download from the internet.
Example 27
Methylation Analysis by DNA Methylation Microarray for NK Cell
Specific DMR
[0348] Normal human peripheral blood leukocytes were isolated by
magnetic activated cell sorting (MACS; Miltenyi Biotec Inc.,
Auburn, Calif.), and purity was confirmed by fluorescence activated
cell sorting (FACS). The major cell types obtained included NK
cells (n=9), B cells (n=5), T cells (n=16), monocytes (n=5), and
granulocytes (n=8). DNA and RNA were co-extracted from MACS-sorted
leukocytes using the AllPrep DNA/RNA Mini Kit (Qiagen Inc.,
Valencia, Calif.). DNA from archived blood was extracted with the
DNeasy Blood & Tissue Kit (Qiagen Inc., Valencia, Calif.). DNA was
treated with sodium bisulfite using the EZ DNA Methylation Kit (Zymo
Research Corporation, Irvine, Calif.).
[0349] Methylation analysis was performed using the Infinium®
HumanMethylation27 BeadChip microarray (Illumina Inc., San Diego,
Calif.), which quantifies the methylation status of 27,578 CpG loci
from 14,495 genes with 15- to 18-fold redundancy. The methylation
level at each CpG site was computed from the fluorescent signals of
both alleles as β = max(M, 0) / (|U| + |M| + 100), where M and U are
the methylated and unmethylated signal intensities, respectively.
The resulting β-value is a continuous variable ranging from 0
(unmethylated) to 1 (completely methylated) that represents the
methylation at each CpG site and is used in subsequent statistical
analyses. Data were assembled with the methylation module of
GenomeStudio software (Illumina, Inc., San Diego, Calif.; Bibikova M
et al., 2009, Epigenomics 1:177-200).
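The β-value calculation can be written as a one-line function. This sketch assumes the standard Illumina form in which the offset of 100 sits inside the denominator, which keeps the ratio stable when both intensities are low; `meth` and `unmeth` stand for the methylated (M) and unmethylated (U) signal intensities.

```python
def illumina_beta(meth, unmeth, offset=100.0):
    """Beta value = max(M, 0) / (|U| + |M| + offset); ranges from 0 to just under 1."""
    return max(meth, 0.0) / (abs(unmeth) + abs(meth) + offset)
```

For example, M = 900 and U = 0 give β = 900/1000 = 0.9, and equal intensities of 4,950 give β = 4950/10000 = 0.495, slightly below 0.5 because of the offset.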
Example 28
Validation of DNA Methylation Microarray Results for Identifying NK
Cell-Specific DMRs by Pyrosequencing
[0350] Pyrosequencing assays to validate the microarray results were
designed using PyroMark Assay Design 2.0 (Qiagen Inc., Valencia,
Calif.) and carried out on a PyroMark MD pyrosequencer running
PyroMark qCpG 1.1.11 software (Qiagen Inc., Valencia, Calif.).
Oligonucleotide primers were obtained from Life Technologies™
(Grand Island, N.Y.).
Example 29
Protein Expression Analysis by mRNA Expression Array for
Identifying NK Cell-Specific DMRs
[0351] The Whole-Genome DASL HT Assay Kit (Illumina Inc., San
Diego, Calif.) was used to obtain simultaneous profiles of more
than 29,000 mRNA transcripts. Data were assembled with the
expression module of GenomeStudio software (Illumina Inc., San
Diego, Calif.). The mRNA expression array data were used in
combination with DNA methylation array data to identify NK
cell-specific DNA methylation.
Example 30
Methylation Specific Quantitative Polymerase Chain Reaction
(MS-qPCR) Analysis for Quantification of NKp46 Demethylation
[0352] Primers and TaqMan minor groove binder (MGB) probes (Table
23) with a 5' 6-FAM (6-carboxyfluorescein) reporter and a 3'
non-fluorescent quencher (NFQ), as well as the TaqMan® 1000 RXN Gold
with Buffer A Pack, were obtained from Life Technologies™ (Grand
Island, N.Y.). MS-qPCR was performed using solutions and conditions
according to Campan M et al., 2009, Methods Mol Biol, 507:325-37,
with the following modifications. A 10× TaqMan® Stabilizer solution
containing 0.1% Tween-20 and 0.5% gelatin was prepared weekly. Each
20 µl reaction contained 5 µl DNA, 11.9 µl preMix, 3 µl oligoMix,
and 0.1 µl Taq DNA polymerase. Cycling was performed on a 7900HT
Fast Real-Time PCR System (Applied Biosystems, Foster City, Calif.):
a 10 min preheat at 95° C., followed by 50 cycles of 95° C. for 15
sec and 60° C. for 1 min. Samples were run in triplicate using the
absolute quantification method.
TABLE 23
MS-qPCR oligonucleotide sequences
Oligonucleotide name | Sequence | SEQ ID NO
NKp46 forward primer | ATTAGGTTGGTAGAATTTGAGT | 116
NKp46 reverse primer | CCCATTCCCCTTCCACA | 117
NKp46 probe | (6FAM)CTCACCAACACAAAACAA(MGB, NFQ) | 118
C-less forward primer | TTGTATGTATGTGAGTGTGGGAGAGA | 97
C-less reverse primer | TTTCTTCCACCCCTTCTCTTCC | 98
C-less probe | (6FAM)CTCCCCCTCTAACTCTAT(MGB, NFQ) | 99
MGB: minor groove binder; FAM: 6-carboxyfluorescein; NFQ: non-fluorescent quencher.
C-less qPCR assay: Campan M et al., 2009, Methods Mol Biol, 507:325-37; Weisenberger D J et al., 2008, Nucleic Acids Res 36:4689-98.
[0353] Total bisulfite-converted DNA copies were quantified by
reference to the C-less qPCR assay (Campan M et al., 2009, Methods
Mol Biol, 507:325-37; Weisenberger D J et al., 2008, Nucleic Acids
Res 36:4689-98). C-less primers and probes recognize a DNA sequence
without cytosines; hence, the assay amplifies the total amount of
DNA in a PCR reaction regardless of bisulfite conversion or
methylation status. Copy number was calculated using a conversion
factor of 6.6 picograms (pg) of DNA per diploid human cell (3.3 pg
per haploid genome copy).
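The mass-to-copy conversion is simple arithmetic; for example, the 30,000-copy standard described in the next paragraph corresponds to 30,000 × 3.3 pg = 99,000 pg, or 99 ng, of genomic DNA. A minimal helper using the 3.3 pg per copy factor stated above (function names are hypothetical):

```python
PG_PER_HAPLOID_COPY = 3.3  # pg of DNA per haploid genome copy (6.6 pg per diploid cell)


def copies_from_mass(dna_ng):
    """Haploid genome copies contained in dna_ng nanograms of human genomic DNA."""
    return dna_ng * 1000.0 / PG_PER_HAPLOID_COPY


def cells_from_mass(dna_ng):
    """Equivalent number of diploid cells (two copies per cell)."""
    return copies_from_mass(dna_ng) / 2.0
```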
[0354] Normal human blood DNA quantified by UV absorbance
(NanoDrop, Inc.) was used to generate a four-point standard curve
with 30,000, 3,000, 300, and 30 copies of genomic DNA. This standard
curve was included on each sample plate to quantify DNA from Ct
values. Because the C-less primers hybridize to both strands of the
(non-bisulfite-converted) standard DNA, whereas in bisulfite-converted
samples they hybridize to only a single strand during the first
cycle, the copy number obtained for bisulfite-treated samples was
multiplied by two. Bisulfite-converted universal methylated DNA
standard (Zymo Research Corporation, Irvine, Calif.) and
bisulfite-converted DNA from isolated NK cells were quantified at the
same time using the C-less assay. The resulting copy number
measurements were used to prepare a calibration curve for the NKp46
demethylation assay: NK cell DNA of known copy number was spiked into
universal methylated DNA in ratios that kept the total number of DNA
copies constant (10,000 copies per reaction) across the dilution
series. This mimics the detection of different relative numbers of
NK cells within a complex mixture of cells in a biological sample.
For absolute quantification of NKp46 demethylation, the four-point
standard curve used 10,000, 1,000, 100, and 10 copies of
bisulfite-converted NK cell DNA.
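Absolute quantification from such a standard curve conventionally relies on a linear relation between Ct and log10(copy number). The sketch below is an assumption about how the calculation could be carried out, not a description of the instrument software: it fits that line by ordinary least squares, reports the implied amplification efficiency (1.0 corresponds to perfect doubling per cycle, i.e., a slope of about -3.32), and inverts the curve for unknowns, doubling C-less estimates for bisulfite-treated samples as described above.

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ct_values) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = my - slope * mx
    efficiency = 10 ** (-1 / slope) - 1  # 1.0 means perfect doubling per cycle
    return slope, intercept, efficiency


def copies_from_ct(ct, slope, intercept, bisulfite_treated=False):
    """Invert the standard curve; double C-less estimates for bisulfite-treated samples."""
    copies = 10 ** ((ct - intercept) / slope)
    return 2 * copies if bisulfite_treated else copies
```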
Example 31
Statistical Modeling of the DNA Methylation Microarray Data for
Estimation of Differential Methylation
[0355] A linear mixed-effects model was applied to the Illumina
Infinium® HumanMethylation27 data using SAS (SAS Institute Inc.,
Cary, N.C.). Cell type was designated as the fixed effect and
BeadChip plate as the random effect. For this example, the
fixed-effect groups were NK cells and non-NK cells, the latter
including pan-T lymphocytes, CD4+ T lymphocytes, Tregs, CD8+ T
lymphocytes, B lymphocytes, granulocytes, and monocytes. The model
generated coefficients estimating differential methylation such
that, for any particular locus, a negative coefficient indicated
less methylation in NK cells than in the other cell types. The
resulting p-values were adjusted for multiple comparisons using the
"qvalue" package in R (R Project for Statistical Computing),
available for download from the internet.
Example 32
Statistical Modeling of the RNA Expression Array for Estimation of
Differential RNA Expression
[0356] Linear models were applied to the Illumina Whole-Genome DASL
HT data using the "limma" package in R (R Project for Statistical
Computing). RNA expression in MACS-isolated NK cells was compared
with that in each of the following MACS-isolated leukocytes: pan-T
lymphocytes, CD4+ T lymphocytes, Tregs, CD8+ T lymphocytes, B
lymphocytes, granulocytes, and monocytes. Thus, log fold-changes in
RNA expression were estimated between NK cells and each of these
cell types, with a positive value indicating higher RNA expression
in NK cells than in the comparison cell type. The resulting p-values
were adjusted for multiple comparisons using the "qvalue" package in
R. NK cell-specific differential RNA expression was considered
significant only if all seven q-values were less than 0.1.
Example 33
Statistical Analysis of the MS-qPCR Data
[0357] Statistical analyses were carried out in R (R Project for
Statistical Computing). A generalized linear model analysis and an
F-test were performed to determine the log-linear PCR kinetics of the
NK cell standard curve. To test for univariate associations between
continuous NKp46 demethylation measurements and discrete variables,
Wilcoxon rank-sum tests (for dichotomous variables, such as case
status) and Kruskal-Wallis one-way analysis of variance tests (for
variables with more than two categories) were employed. To test for
univariate associations between continuous NKp46 demethylation and
other continuous variables, linear regression analysis, Pearson
product-moment correlations, and F-tests were used. A chi-squared
test for trend in proportions was applied to identify trends in
HNSCC prevalence across control-defined demethylation tertiles.
Multivariate logistic regression analyses were performed using the
"glm" function with the binomial family.
Example 34
NKp46 Demethylation is a Biomarker of NK Cells
[0358] DNA methylation and RNA expression microarray data from
MACS-isolated (FACS-validated) normal human leukocytes were
integrated to identify putative NK cell-specific DMRs that could
serve as reliable biomarkers of this cell type. The list of
candidate gene regions was narrowed to CpG loci that were
significantly demethylated in NK cells (q < 0.1, coefficient < 0)
and that were located within genes whose RNA expression was
significantly elevated in NK cells (q < 0.1, log fold-change > 1).
These candidates are marked as darkened asterisks in the top left
quadrant of FIG. 34. Pyrosequencing and MS-qPCR of bisulfite-converted
DNA from the MACS-isolated leukocytes confirmed that a region near
the promoter of NKp46 is demethylated in NK cells and methylated in
T cells, B cells, granulocytes, and monocytes (FIGS. 35 and 38).
Furthermore, the CD56dim subset of NK cells showed complete
demethylation in the NKp46 region, whereas CD56bright NK cells
exhibited only partial demethylation in the region, as measured by
MS-qPCR. The NKp46 MS-qPCR assay was optimized to fit a log-linear
relationship between lower Ct values (more demethylated copies of
NKp46) and increased NK cell DNA content (Pearson R = -0.996,
p < 2.2 × 10⁻¹⁶; FIG. 36).
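The two selection criteria above amount to a simple filter on merged methylation and expression statistics. In the sketch below, the record fields and identifiers are hypothetical, and expression statistics are assumed to have already been attached to each CpG through its gene; if the log fold-change is on the log2 scale, the threshold of 1 corresponds to more than a two-fold increase in NK cells.

```python
from dataclasses import dataclass


@dataclass
class CpGCandidate:
    cpg: str
    gene: str
    meth_coef: float   # NK vs non-NK methylation coefficient (negative = less methylated in NK)
    meth_q: float
    expr_logfc: float  # NK vs non-NK log fold-change in RNA expression
    expr_q: float


def nk_candidates(rows, q_cut=0.1, logfc_cut=1.0):
    """Keep CpGs demethylated in NK cells that lie in genes over-expressed in NK cells."""
    return [r for r in rows
            if r.meth_q < q_cut and r.meth_coef < 0
            and r.expr_q < q_cut and r.expr_logfc > logfc_cut]
```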
Example 35
Samples from HNSCC Patients have Diminished Circulating NK
Cells
[0359] The calibrated NKp46 MS-qPCR assay was used to measure the
level of circulating NK cells in the peripheral blood of patients
with HNSCC and cancer free controls. The demographics of the study
population are shown in Table 24.
[0360] Univariate analysis revealed that significantly fewer
demethylated copies of NKp46 were detected in HNSCC blood than in
control blood (p<0.0001, FIG. 39), indicative of a diminished NK
cell compartment in the peripheral blood of HNSCC patients. There
was no significant univariate association observed between the
measured number of demethylated NKp46 copies and age, gender, HPV16
(E6 and/or E7) serology, cigarette smoking, alcohol consumption, or
body mass index. There was no significant difference in the number
of demethylated NKp46 copies detected in patients with oral,
pharyngeal, and laryngeal tumors.
[0361] To determine whether the observed association between NK
cells and case status was attributable to systemic chemotherapy or
other treatments, the number of demethylated NKp46 copies detected
in case blood samples drawn within one month of diagnosis was
compared to those drawn more than one month after diagnosis, and no
significant difference was observed.
TABLE 24
Demographic characteristics
Characteristic | Total (N = 244) | Controls (n = 122) | HNSCC (n = 122) | Oral (n = 43) | Pharyngeal (n = 53) | Laryngeal (n = 26)
Age, Mean (SD) | 61 (12) | 62 (12) | 61 (12) | 60 (15) | 60 (10) | 64 (9.5)
Age, Median (Range) | 60 (29-87) | 60 (31-87) | 60 (29-86) | 59 (29-86) | 60 (41-86) | 64 (50-83)
Gender, Male, No. (%) | 178 (73%) | 89 (73%) | 89 (73%) | 27 (63%) | 41 (77%) | 21 (81%)
Gender, Female, No. (%) | 66 (27%) | 33 (27%) | 33 (27%) | 16 (37%) | 12 (23%) | 5 (19%)
HPV16 Serology, L1+, No. (%) | 33 (14%) | 4 (3%) | 29 (24%) | 6 (14%) | 22 (42%) | 1 (4%)
HPV16 Serology, E6+, No. (%) | 41 (17%) | 4 (3%) | 37 (30%) | 2 (5%) | 32 (60%) | 3 (12%)
HPV16 Serology, E7+, No. (%) | 28 (11%) | 2 (2%) | 26 (21%) | 1 (2%) | 23 (43%) | 2 (8%)
HPV16 Serology, E6+ and E7+, No. (%) | 25 (10%) | 0 (0%) | 25 (20%) | 0 (0%) | 23 (43%) | 2 (8%)
HPV16 Serology, E6+ or E7+, No. (%) | 44 (18%) | 6 (5%) | 38 (31%) | 3 (7%) | 32 (60%) | 3 (12%)
Cigarette Smoking Status, Never, No. (%) | 65 (27%) | 41 (34%) | 24 (20%) | 11 (26%) | 11 (21%) | 2 (8%)
Cigarette Smoking Status, Former, No. (%) | 149 (61%) | 66 (54%) | 83 (68%) | 29 (67%) | 35 (66%) | 19 (73%)
Cigarette Smoking Status, Current, No. (%) | 30 (12%) | 15 (12%) | 15 (12%) | 3 (7%) | 7 (13%) | 5 (19%)
Cigarette Pack-Years, Mean (SD) | 26 (29) | 17 (23) | 35 (32) | 26 (27) | 36 (35) | 45 (30)
Cigarette Pack-Years, Median (Range) | 16 (0-116) | 7 (0-114) | 31 (0-116) | 20 (0-105) | 33 (0-116) | 45 (0-96)
Alcohol Drinks per Week, Mean (SD) | 18 (26) | 15 (27) | 21 (24) | 18 (23) | 22 (25) | 23 (25)
Alcohol Drinks per Week, Median (Range) | 7 (0-199) | 6 (0-199) | 14 (0-155) | 7 (0-90) | 18 (0-155) | 19 (0-113)
[0362] The NKp46 MS-qPCR measurements from cancer-free control
blood samples were used to determine cutoffs for NKp46
demethylation tertiles. The proportion of HNSCC cases decreased
significantly with increasing demethylation tertile (p < 0.001,
FIG. 37), indicating that HNSCC patients are more likely to have
depressed levels of NK cells in their peripheral blood. The trend
held when cases were stratified by HPV16 (E6 and/or E7) serology or
by whether blood was drawn within one month of diagnosis or later.
Multivariate logistic regression controlling for age, gender,
cigarette smoking, alcohol consumption, BMI, and HPV16 (E6 and/or
E7) serology confirmed increased HNSCC risk for individuals in the
lower two NKp46 demethylation tertiles (as defined in controls)
(Table 25), indicating that lower levels of NK cells in the
peripheral blood are significantly associated with HNSCC.
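The chi-squared test for trend in proportions referred to in Example 33 can be sketched in Python using the Cochran-Armitage form of the statistic (1 degree of freedom, no continuity correction), which is the statistic computed by R's prop.trend.test. The default scores 1, 2, 3 for the three tertiles are an assumption; the p-value uses the exact relation between a 1-df chi-square variable and a squared standard normal.

```python
from statistics import NormalDist


def trend_test(cases, totals, scores=None):
    """Chi-squared test for trend in proportions (Cochran-Armitage form, 1 df).

    cases[i]: number of HNSCC cases in group i; totals[i]: all subjects in group i.
    Returns (chi-square statistic, two-sided p-value).
    """
    if scores is None:
        scores = list(range(1, len(cases) + 1))
    n_total = sum(totals)
    p = sum(cases) / n_total
    sum_ns = sum(n * s for n, s in zip(totals, scores))
    numerator = sum(x * s for x, s in zip(cases, scores)) - p * sum_ns
    ss = sum(n * s * s for n, s in zip(totals, scores)) - sum_ns ** 2 / n_total
    chi2 = numerator ** 2 / (p * (1 - p) * ss)
    # A 1-df chi-square p-value equals the two-sided normal p-value of sqrt(chi2).
    pvalue = 2 * (1 - NormalDist().cdf(chi2 ** 0.5))
    return chi2, pvalue
```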
TABLE 25
Logistic regression of HNSCC risk
NKp46 demethylation tertile | Crude OR (95% CI) | Crude p-value | Adjusted* OR (95% CI) | Adjusted* p-value
1st (lowest) | 4.3 (2.2, 9.0) | 5.0 × 10⁻⁵ | 5.6 (2.0, 17.4) | 0.002
2nd (middle) | 2.8 (1.4, 6.0) | 0.006 | 4.9 (1.8, 16.1) | 0.004
3rd (highest) | Reference | | Reference |
*Unconditional multivariate model controlling for age, gender, smoking, drinking, BMI, and HPV16 (E6 and/or E7) serology.
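The tertile construction and the crude odds ratios in Table 25 can be sketched as follows. Cut points are taken from the control distribution only, as described above; Python's `statistics.quantiles` (default "exclusive" method) is used here, and other quantile definitions would shift the cut points slightly. For a single categorical predictor, the crude logistic-regression odds ratio of a tertile versus the reference tertile and its Wald interval reduce to the 2×2 calculation shown (Woolf's method); the adjusted estimates require the full multivariate model and are not reproduced. Function names are hypothetical.

```python
import math
from statistics import quantiles


def control_tertile_cutoffs(control_values):
    """Two cut points splitting the control distribution into thirds."""
    return quantiles(control_values, n=3)


def assign_tertile(value, cutoffs):
    """1 = lowest demethylation tertile, 3 = highest."""
    low, high = cutoffs
    if value <= low:
        return 1
    if value <= high:
        return 2
    return 3


def crude_odds_ratio(cases_exposed, controls_exposed, cases_ref, controls_ref, z=1.96):
    """Odds ratio of one tertile versus the reference tertile, with a Woolf 95% CI."""
    or_ = (cases_exposed * controls_ref) / (controls_exposed * cases_ref)
    se = math.sqrt(1 / cases_exposed + 1 / controls_exposed + 1 / cases_ref + 1 / controls_ref)
    return or_, or_ * math.exp(-z * se), or_ * math.exp(z * se)
```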
Example 36
Application of the Methodology to mRNA Data
[0363] The statistical methods described herein for determining
changes in the distribution of white blood cells among different
subpopulations are applicable to mRNA expression profiles, subject
to the following considerations. A mathematical consideration is
that mRNA is typically analyzed on a logarithmic scale, whereas the
methods herein assume linearity on an arithmetic scale, because the
mixing coefficients are assumed to act linearly on absolute numbers
of nucleic acid molecules; thus, the methods would require analysis
of untransformed fluorescence intensities, whose skewed
distributions could cause numerical instability. A biological
consideration is the possible absence of a linear relationship
between cell number and mRNA copies, since proteins may be
translated following an initial burst of mRNA transcription during
cellular development, followed by substantial mRNA degradation. In
contrast, the average beta value provided by Illumina bead-array
products, as well as similarly constructed quantities from other
platforms, would be expected to scale in proportion to the actual
fraction of methylated nucleic acids, under the biologically
reasonable assumption of two DNA molecules per cell.
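The scale issue can be made concrete with a small numerical sketch. If mixing proportions act linearly on molecule counts, a mixture's signal is the weighted sum of the pure-cell signals; averaging log-transformed signals instead gives a weighted geometric mean, which by the AM-GM inequality never exceeds, and usually understates, the true mixture signal. The profiles and weights below are hypothetical.

```python
import math


def mix_linear(profiles, weights):
    """Mixture signal as a weighted sum of pure cell-type signals (arithmetic scale)."""
    n_loci = len(profiles[0])
    return [sum(w * p[j] for w, p in zip(weights, profiles)) for j in range(n_loci)]


def mix_on_log2_scale(profiles, weights):
    """Weighted average of log2 signals, back-transformed: a weighted geometric mean."""
    n_loci = len(profiles[0])
    return [2 ** sum(w * math.log2(p[j]) for w, p in zip(weights, profiles))
            for j in range(n_loci)]


# Hypothetical signals at two loci for two pure cell types, mixed 25% / 75%.
profiles = [[0.9, 0.1], [0.1, 0.7]]
weights = [0.25, 0.75]
linear = mix_linear(profiles, weights)          # approximately [0.3, 0.55]
log_scale = mix_on_log2_scale(profiles, weights)  # smaller at every locus
```

This is why the analysis below back-transforms log2 intensities (2 raised to the normalized log2 value) before applying the projection.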
[0364] An application of the methods herein to mRNA data is
described below. The validation data set S₀ was obtained from
Watkins N A et al., 2009, Blood 113: e1-e9, in which the Illumina
Human-6 v2 Expression BeadChip was used to characterize the mRNA
expression profiles of eight types of blood cells: B cells,
granulocytes, erythroblasts, megakaryocytes, monocytes, natural
killer cells, CD4+ T cells, and CD8+ T cells. For this analysis,
erythroblasts (nucleated progenitors of red blood cells) and
megakaryocytes (progenitors of platelets) were removed. The target
data set S₁ was obtained from Showe M K et al., 2009, Cancer Res 69:
9202-10, in which the same mRNA expression platform was used to
characterize expression differences in isolated mononuclear cells
between non-small cell lung cancer (NSCLC) cases and controls with
non-cancer lung disease, adjusting for age, sex, and smoking. That
study also presented data from 18 matched pre-operative and
post-operative case samples.
[0365] The same methodology was used as for the DNA methylation
data sets herein, ordering the 46,693 transcripts by F statistic
according to their ability to distinguish the six types of
leukocytes. Of the 100 transcripts with the largest F statistics, 86
overlapped with the transcripts in Showe M K et al., 2009, Cancer
Res 69: 9202-10; the remainder of the analysis was therefore carried
out using these 86 overlapping loci. Untransformed data (i.e.,
either the normalized fluorescence intensities or 2 raised to the
power of the normalized log₂ intensities) were used in the analyses.
Application of the constrained projection in Examples 1 and 5
yielded average percentage estimates consistent with mononuclear
cells (i.e., a subfraction from which most granulocytes had been
removed): 3.3% B cells, 3.4% granulocytes, 18.1% monocytes, 29.5% NK
cells, 11.6% CD4+ T cells, and 2.2% CD8+ T cells.
[0366] Table 26 presents results from 137 NSCLC cases and 91
controls, adjusted for age, sex, and smoking status. Table 27
presents results from 18 matched pre-operative and post-operative
samples from NSCLC cases, where the analyzed outcome was the
difference in untransformed expression (post-operative expression
minus pre-operative expression) and the displayed coefficients
correspond to the intercept of B₁ (analogous to a paired t-test).
Perturbations in T cell distribution were consistent with known
immunological changes resulting from NSCLC (Ginns L C et al., 1982,
Am Rev Respir Dis 23: 265-9; Mazzoccoli G et al., 1999, In Vivo 13:
205-9), as well as with changes associated with age and smoking. The
directions of the perturbations (coefficient signs) were reasonable,
although their magnitudes were potentially biased; for example, the
estimates corresponding to granulocyte distribution were much larger
than expected given the relatively small number of granulocytes
present in a mononuclear subfraction. Thus, the methods herein were
determined to be suitable for application to mRNA data sets.
TABLE 26
White blood cell distribution comparing cases to controls in NSCLC mRNA data set
Covariate | Cell type | Est | SE₂ | p-value
Case status | B cell | 0.8 | 4.15 | 0.8511
 | Granulocyte | -34.6 | 9.48 | 0.0003
 | Monocyte | 17.9 | 9.58 | 0.0613
 | NK | 1.3 | 5.18 | 0.8095
 | T cell (CD4+) | 24.9 | 9.01 | 0.0057
 | T cell (CD8+) | -15.2 | 9.03 | 0.0931
Age (decades) | B cell | -0.7 | 1.36 | 0.5824
 | Granulocyte | -7.9 | 3.45 | 0.0218
 | Monocyte | -6.5 | 2.76 | 0.0180
 | NK | -4.0 | 1.80 | 0.0255
 | T cell (CD4+) | 13.0 | 2.89 | <0.0001
 | T cell (CD8+) | 8.3 | 2.96 | 0.0052
Sex (male) | B cell | 0.1 | 2.66 | 0.9827
 | Granulocyte | -34.8 | 6.41 | <0.0001
 | Monocyte | 6.8 | 5.44 | 0.2091
 | NK | -7.8 | 3.32 | 0.0193
 | T cell (CD4+) | 21.1 | 5.39 | 0.0001
 | T cell (CD8+) | 13.2 | 5.76 | 0.0223
Former smoker | B cell | 1.6 | 3.97 | 0.6821
 | Granulocyte | 17.2 | 8.25 | 0.0375
 | Monocyte | 6.1 | 7.84 | 0.4368
 | NK | 2.7 | 5.19 | 0.6103
 | T cell (CD4+) | -11.3 | 8.02 | 0.1578
 | T cell (CD8+) | -20.3 | 8.28 | 0.0141
Current smoker | B cell | 3.4 | 5.21 | 0.5183
 | Granulocyte | 31.6 | 11.26 | 0.0049
 | Monocyte | 17.8 | 10.49 | 0.0907
 | NK | 5.4 | 6.93 | 0.4373
 | T cell (CD4+) | -21.8 | 10.25 | 0.0337
 | T cell (CD8+) | -41.2 | 11.10 | 0.0002
Est = regression coefficient estimate (×100%); SE₂ = double-bootstrap standard error (×100%).
TABLE 27 White blood cell distribution comparing matched pre-operative and post-operative cases in NSCLC mRNA data set
Cell type | Est | SE_2 | p-value
B cell | -10.7 | 5.55 | 0.0543
Granulocyte | -19.4 | 11.16 | 0.0826
Monocyte | -13.4 | 10.43 | 0.1987
NK | 6.3 | 7.15 | 0.3794
T cell (CD4+) | -11.3 | 10.57 | 0.2859
T cell (CD8+) | 48.8 | 11.33 | 0.0000
Est = regression coefficient estimate (×100%); SE_2 = double-bootstrap standard error (×100%).
Example 37
An Array for High-Throughput DNA Methylation Analysis
[0367] An array for performing DNA methylation analysis in a
high-throughput manner was made using VeraCode microbeads
(Illumina, San Diego, Calif. USA) and DNA sequences of regions in
96 different genes, each sequence having one CpG dinucleotide
(shown within square brackets in FIG. 40) that is used to determine
the methylation status of the gene. VeraCode beads are cylindrical
glass microbeads 240 microns in length by 28 microns in diameter,
with a surface suitable for attaching DNA, RNA, protein, antibody
and other ligands for performing bioassays. For DNA methylation
analysis, various CpG-specific DNA oligomers were attached to these
beads. Each microbead is inscribed with a high-density (24-bit)
holographic code, allowing a very large number of distinct bead
types (a 24-bit code can in principle distinguish 2^24, or about
16.8 million, codes). When a laser illuminates a bead, its code
emits a signal specific to that code, and the signal is detected
by a CCD camera. The fluorescence of the bead indicates whether the
particular CpG site carried by the bead is demethylated. The result
is compared with the fluorescence readout obtained from DNA from a
purified leukocyte sample. A VeraCode array is a collection of
beads, each carrying a DNA oligomer specific for either the
methylated or the unmethylated form of a particular CpG locus,
distributed into different wells of a microtiter plate. A user may
select all or a subset of the CpG-containing nucleotide sequences
in a gene or genes of interest for attachment to VeraCode beads,
yielding a custom-designed VeraCode array tailored to the user's
analysis.
[0368] To ascertain which 96 CpGs would give optimal precision for
the white blood cell (WBC) types, the following procedure was
followed. The Infinium HumanMethylation27 data corresponding to
the magnetic activated cell sorting (MACS)-sorted leukocyte DNA were
assembled in the methylation module of GenomeStudio, and the
quality of the data was assessed by calculating Mahalanobis
distances. Forty-seven samples yielded acceptable data. A matrix of
β-values was generated, with rows defined by microarray CpG locus
and columns defined by sample identification. A corresponding
matrix indicating cellular phenotypes was also generated, with rows
defined by sample identification (in precisely the same order as
the columns of the β-value matrix) and columns defining the cell
lineage(s) to which each sample belongs.
[0369] A linear mixed effects (LME) model was applied to the
Illumina Infinium HumanMethylation27 data, with WBC lineage as the
fixed effect and beadchip plate as the random effect. The fixed
effect groups were: Pan-T cell, CD4+ T cell, CD8+ T cell, Pan-NK
cell, CD56dim NK cell, CD56bright NK cell, B cell, granulocyte,
neutrophil, eosinophil, and monocyte. Across the gene loci, this
model generated coefficients for each fixed effect group, giving
relative estimates of DNA methylation for each of the different
cell types. Categories were collapsed to account for the
hierarchical relationships among cell lineages, and a linear
transformation was applied to convert the coefficient estimates to
an estimated mean value per cell type, resulting in a matrix
$\tilde{B}_0$ of mean values, with each row corresponding to a CpG
locus and each column corresponding to a cell type. The model also
generated an F-statistic for each locus indicating how significantly
DNA methylation differed between the cell types.
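As a simplified illustration of the per-locus quantities described above, the sketch below computes, for each CpG, the mean β-value per cell type (the columns of a matrix playing the role of $\tilde{B}_0$) and a one-way ANOVA F-statistic across cell types. It assumes a fixed-effects-only model: the beadchip-plate random effect and the hierarchical collapsing of lineages used in the LME model are deliberately omitted, so this is not the model of [0369]. Function names and the synthetic data are hypothetical.

```python
import numpy as np


def celltype_means_and_f(beta, labels):
    """beta: (M CpGs x n samples) beta-values; labels: n cell-type labels.

    Returns B0 (M x k matrix of per-type means), F (length-M one-way ANOVA
    F-statistics) and the cell-type order used for the columns of B0.
    """
    beta = np.asarray(beta, float)
    labels = list(labels)
    types = sorted(set(labels))
    n, k = len(labels), len(types)
    col = np.array([types.index(t) for t in labels])   # column index per sample
    B0 = np.column_stack([beta[:, col == j].mean(axis=1) for j in range(k)])
    counts = np.bincount(col, minlength=k)
    grand = beta.mean(axis=1, keepdims=True)
    ss_between = (counts * (B0 - grand) ** 2).sum(axis=1)
    ss_within = ((beta - B0[:, col]) ** 2).sum(axis=1)
    F = (ss_between / (k - 1)) / (ss_within / (n - k))
    return B0, F, types


rng = np.random.default_rng(0)
labels = ["Bcell"] * 4 + ["Gran"] * 4 + ["Tcell"] * 4
true_means = {"Bcell": 0.1, "Gran": 0.5, "Tcell": 0.9}
informative = np.array([true_means[t] for t in labels]) + rng.normal(0, 0.02, 12)
uninformative = 0.5 + rng.normal(0, 0.02, 12)
B0, F, types = celltype_means_and_f(np.vstack([informative, uninformative]), labels)
print(types, np.round(B0, 2), np.round(F, 1))
```

A locus whose methylation tracks cell type gets a large F, which is why the F-statistic is a natural ranking for the starting sets described below.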
[0370] A stochastic search algorithm was then employed to select
the differentially methylated regions (DMRs) that work best in
concert on a custom microarray to distinguish leukocyte lineages,
and would therefore be the most effective at quantifying immune
cell types in a biological sample. The objective was to ascertain
which 96 CpGs would give optimal precision for the WBC types.
[0371] The stochastic search algorithm was designed to maximize
precision of estimated cellular fractions, under the assumption
that the variance-covariance matrix of the fraction estimates is
proportional to $(\tilde{B}_0^T\tilde{B}_0)^{-1}$. To optimize
precision for a single cell type, the corresponding diagonal element
of $(\tilde{B}_0^T\tilde{B}_0)^{-1}$ was minimized; to optimize for a
set of cell types, the sum of the corresponding diagonal elements
was minimized.
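The precision criterion reduces to a few lines of linear algebra. The sketch below assumes $\tilde{B}_0$ is held as an M × K NumPy array whose rows are the selected CpGs and whose columns are cell types; `precision_criterion` is a hypothetical name. The worked example uses a matrix whose cross-product is 2I, so each diagonal element of the inverse is 0.5.

```python
import numpy as np


def precision_criterion(B0, cell_idx=None):
    """Sum of the selected diagonal elements of (B0^T B0)^{-1}.

    Smaller values mean more precise fraction estimates for those cell types.
    cell_idx=None uses all cell types; an int or list targets a subset.
    """
    B0 = np.asarray(B0, float)
    diag = np.diag(np.linalg.inv(B0.T @ B0))
    if cell_idx is None:
        return float(diag.sum())
    return float(diag[np.atleast_1d(cell_idx)].sum())


B0 = np.vstack([np.eye(3), np.eye(3)])         # 6 CpGs x 3 cell types; B0^T B0 = 2I
print(precision_criterion(B0))                  # 1.5
print(precision_criterion(B0, 0))               # 0.5
```

Adding a CpG (a row) can only lower or keep the criterion, which is why the search below works with a fixed-size set and compares sets of equal size.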
[0372] The general strategy was as follows. The engine is a
stochastic search algorithm that starts with an initial set of
CpGs, which is the beginning choice for the "current" set. On each
iteration a randomly chosen CpG from the current set is switched
out with a randomly chosen CpG from the remaining (unselected)
CpGs, and precision is compared between the current set and the
"candidate" set. If the candidate set gives better precision then
the switch is accepted. Otherwise it is rejected. Ideally, by the
end of the algorithm, the acceptance rate should be 0%.
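The swap strategy above can be sketched as a random one-for-one exchange search. This is a minimal sketch, assuming NumPy, a precision criterion of the form in [0371], and a synthetic uniform matrix standing in for $\tilde{B}_0$; all names are hypothetical. It reports the overall acceptance rate rather than the end-of-run rate.

```python
import numpy as np


def criterion(B_sub, cell_idx):
    """Sum of selected diagonal elements of (B^T B)^{-1}; inf if singular."""
    try:
        return float(np.diag(np.linalg.inv(B_sub.T @ B_sub))[cell_idx].sum())
    except np.linalg.LinAlgError:
        return np.inf


def stochastic_swap_search(B0, init, cell_idx, n_iter=5000, seed=0):
    """Random one-for-one swaps between the current CpG set and the unselected
    CpGs, keeping a swap only if it improves (lowers) the criterion."""
    rng = np.random.default_rng(seed)
    current = np.array(init)
    selected = np.zeros(B0.shape[0], dtype=bool)
    selected[current] = True
    best = criterion(B0[current], cell_idx)
    accepted = 0
    for _ in range(n_iter):
        pos = rng.integers(current.size)               # CpG to swap out
        pool = np.flatnonzero(~selected)
        new = pool[rng.integers(pool.size)]            # CpG to swap in
        candidate = current.copy()
        candidate[pos] = new
        value = criterion(B0[candidate], cell_idx)
        if value < best:                               # accept improvements only
            selected[current[pos]] = False
            selected[new] = True
            current, best = candidate, value
            accepted += 1
    return current, best, accepted / n_iter            # overall acceptance rate


rng = np.random.default_rng(42)
B0 = rng.uniform(0, 1, size=(300, 4))                  # 300 candidate CpGs, 4 cell types
init = np.arange(20)                                   # arbitrary 20-CpG starting set
start = criterion(B0[init], slice(None))
chosen, value, rate = stochastic_swap_search(B0, init, slice(None), n_iter=2000)
print(round(start, 4), round(value, 4), rate)
```

Passing a list such as `[0]` for `cell_idx` targets the precision of one cell type; `slice(None)` targets all of them.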
[0373] The algorithm was run for 50,000 iterations starting with
the 500 CpGs having the best F statistics. This was repeated ten
times with different random number seeds each time. Then, the
algorithm was run for 50,000 iterations starting with the CpGs
having the 500 largest absolute effect sizes (coefficients
generated by the LME model) for the WBC types. This was also
repeated ten times with different random number seeds each time.
Next, the 20 runs were compared, and the algorithm was run for
50,000 iterations starting with the 500 most frequently chosen CpGs
from the previous 20 runs. This was repeated five times with different
random number seeds each time. Finally, a run was performed for
750,000 iterations starting with the 96 most frequently chosen CpGs
from the previous five runs.
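The consolidation step between stages (pooling several runs and keeping the most frequently chosen CpGs) can be sketched with a frequency count. This is a minimal sketch, assuming each run's output is a list of CpG identifiers; the identifiers and the function name are hypothetical, and ties are broken arbitrarily.

```python
from collections import Counter


def most_frequent_cpgs(runs, k):
    """Pool the CpG sets chosen by several independent runs (different seeds or
    starting sets) and return the k CpGs selected most often, to seed the next stage."""
    counts = Counter(cpg for run in runs for cpg in set(run))
    return [cpg for cpg, _ in counts.most_common(k)]


runs = [
    ["cg01", "cg02", "cg03"],
    ["cg01", "cg02", "cg04"],
    ["cg02", "cg01", "cg05"],
]
print(sorted(most_frequent_cpgs(runs, 2)))             # ['cg01', 'cg02']
```

In the staged procedure above, this would be applied to the 20 first-stage runs with k = 500 and then to the five second-stage runs with k = 96.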
Example 38
Mediation Analysis for Estimating Effects of an Exposure or
Phenotype on Measured DNA Methylation
[0374] A method is described for conducting a mediation analysis to
estimate the effects of an exposure, or of a specific phenotype, on
measured DNA methylation along two paths: indirectly, through
changes in WBC distribution, and directly, unmediated by changes in
WBC distribution. Most epigenome-wide association scans (EWAS) have
attempted to estimate the marginal effect ($\beta$, depicted in FIG.
41A) on measured DNA methylation, i.e., an effect not adjusted for
WBC distribution. However, a significant portion of the effect on
DNA methylation is mediated through changes in WBC distribution, as
shown in FIG. 41B. Of interest in EWAS studies is $\alpha$, the
direct effect adjusted for WBC distribution. Estimating this effect
requires estimating two other quantities: $\Gamma$, the effect of
the exposure or phenotype on WBC distribution, and $\xi$, the effect
of WBC distribution on methylation. Let $y_i$ be the DNA methylation
measured for subject $i$ at a particular CpG site ($j$; subscript
suppressed for clarity), $z_i$ a $p \times 1$ vector of covariates
for subject $i$ (including the exposure or phenotype of interest),
and $\omega_i$ the subject-specific WBC distribution estimated using
constrained projection in the manner described in Example 1. Then

$$y_i = z_i^T\alpha + \omega_i^T\xi + e_i,$$

where $e_i$ is a zero-mean error. Additionally, the effect of
exposure/phenotype on WBC distribution can be modeled as

$$\omega_i = \Gamma z_i + u_i,$$

where $u_i$ is a zero-mean error vector. Here $\alpha$ is a
$p \times 1$ vector and $K$ cell types are assumed, so that
$\omega_i$ is a $K \times 1$ vector, $\Gamma$ is a $K \times p$
matrix, and $\xi$ is a $K \times 1$ vector. Substituting the second
equation into the first gives

$$y_i = z_i^T(\alpha + \Gamma^T\xi) + u_i^T\xi + e_i,$$

so that the marginal effect $\beta$ is the $p \times 1$ vector
$\alpha + \Gamma^T\xi$. Estimation proceeds by first computing

$$\hat{\Gamma} = \Big(\sum_{i=1}^{n} \omega_i z_i^T\Big)\Big(\sum_{i=1}^{n} z_i z_i^T\Big)^{-1},$$

then computing $\hat{u}_i = \omega_i - \hat{\Gamma} z_i$,
$r_i = (z_i^T, \hat{u}_i^T)^T$, and

$$\hat{\zeta} = \Big(\sum_{i=1}^{n} r_i r_i^T\Big)^{-1}\Big(\sum_{i=1}^{n} r_i y_i\Big),$$

extracting $\hat{\xi}$ as the last $K$ components of $\hat{\zeta}$,
and obtaining $\hat{\alpha}$ by subtracting $\hat{\Gamma}^T\hat{\xi}$
from the first $p$ components of $\hat{\zeta}$.
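The two-step estimator can be sketched in a few lines. This is a minimal sketch, assuming NumPy, that the subject-level WBC distributions $\omega_i$ have already been estimated, and that $z_i$ includes an intercept; the function name and the synthetic data are hypothetical. With noiseless synthetic data the estimator should return the generating $\alpha$ and $\xi$ exactly, and by construction $\hat\beta = \hat\alpha + \hat\Gamma^T\hat\xi$.

```python
import numpy as np


def mediation_fit(y, Z, W):
    """Two-step mediation estimates for one CpG.

    y: (n,) methylation; Z: (n, p) covariates incl. intercept and exposure;
    W: (n, K) estimated WBC distributions omega_i.
    Returns alpha (direct), beta (marginal), Gamma (K x p) and xi (K,).
    """
    y, Z, W = (np.asarray(a, float) for a in (y, Z, W))
    p = Z.shape[1]
    # Gamma_hat = (sum_i w_i z_i^T)(sum_i z_i z_i^T)^{-1}, computed via a solve
    Gamma = np.linalg.solve(Z.T @ Z, Z.T @ W).T
    U = W - Z @ Gamma.T                        # row i is u_hat_i^T
    R = np.hstack([Z, U])                      # row i is r_i^T = (z_i^T, u_hat_i^T)
    zeta = np.linalg.solve(R.T @ R, R.T @ y)   # (sum r_i r_i^T)^{-1} (sum r_i y_i)
    xi = zeta[p:]                              # last K components
    beta = zeta[:p]                            # coefficient on z: the marginal effect
    alpha = beta - Gamma.T @ xi                # direct effect
    return alpha, beta, Gamma, xi


rng = np.random.default_rng(7)
n = 200
Z = np.column_stack([np.ones(n), rng.normal(size=n)])          # intercept + exposure
Gamma_true = np.array([[0.6, 0.05], [0.3, -0.04], [0.1, 0.0]])  # K = 3 cell types
W = Z @ Gamma_true.T + rng.normal(0, 0.02, size=(n, 3))
alpha_true, xi_true = np.array([0.2, 0.01]), np.array([0.5, -0.3, 0.1])
y = Z @ alpha_true + W @ xi_true                               # noiseless, for checking
alpha, beta, Gamma, xi = mediation_fit(y, Z, W)
print(np.round(alpha, 4), np.round(beta, 4), np.round(xi, 4))
```

Because the residuals $\hat u_i$ are orthogonal to $z_i$, the first $p$ components of $\hat\zeta$ coincide with the unadjusted (marginal) regression of $y$ on $z$.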
Statistical inference is achieved by permutation. Specifically, the
null distributions of $\hat{\alpha}$ and $\hat{\Gamma}$ are obtained
by permuting the exposure or phenotype of interest within $z$ (only
the components representing the covariate to be tested), and the
null distribution of $\hat{\xi}$ is obtained by permuting the
subject assignments corresponding to $\omega_i$. Adjustments for
multiple comparisons are achieved by nesting within each
permutation a loop that estimates $\hat{\alpha}_j$,
$\hat{\Gamma}_j$, and $\hat{\xi}_j$ for each individual CpG $j$,
with adjusted p-values obtained by comparing the maximum absolute
values of $\hat{\alpha}_j$, $\hat{\Gamma}_j$, and $\hat{\xi}_j$
(over the CpGs) to the corresponding statistics computed from each
individual permutation. For comparison purposes, a similar
permutation test can be applied to the marginal coefficient $\beta$.
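The max-statistic adjustment can be sketched for the marginal coefficient β, the simplest of the four cases. This sketch assumes NumPy, ordinary least squares for each CpG, and a design matrix whose column `col` holds the exposure; only that column is permuted, as described above. It uses the common convention of comparing each CpG's observed |coefficient| with the permutation distribution of the maximum over CpGs, with a +1 correction. Names and synthetic data are hypothetical.

```python
import numpy as np


def exposure_coefs(Y, Z, col):
    """OLS coefficient of covariate `col` for every CpG (columns of Y)."""
    coefs, *_ = np.linalg.lstsq(Z, Y, rcond=None)     # shape (p, m)
    return coefs[col]


def maxstat_permutation(Y, Z, col, n_perm=1000, seed=0):
    """Family-wise adjusted p-values via the permutation max-|coefficient| statistic."""
    rng = np.random.default_rng(seed)
    observed = np.abs(exposure_coefs(Y, Z, col))
    null_max = np.empty(n_perm)
    for b in range(n_perm):
        Zp = Z.copy()
        Zp[:, col] = rng.permutation(Z[:, col])       # permute the exposure only
        null_max[b] = np.abs(exposure_coefs(Y, Zp, col)).max()
    exceed = (null_max[None, :] >= observed[:, None]).sum(axis=1)
    return observed, (exceed + 1) / (n_perm + 1)


rng = np.random.default_rng(3)
n, m = 100, 50
exposure = rng.normal(size=n)
Z = np.column_stack([np.ones(n), exposure])
Y = rng.normal(size=(n, m))                           # 50 null CpGs...
Y[:, 0] += 1.0 * exposure                             # ...except CpG 0
obs, adj_p = maxstat_permutation(Y, Z, col=1, n_perm=200)
print(adj_p[0], adj_p[1:].min())
```

The same loop structure applies to $\hat\alpha_j$, $\hat\Gamma_j$, and $\hat\xi_j$ by swapping in the corresponding estimator and permutation scheme.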
[0375] This method was applied to a data set consisting of n=205
control subjects in a bladder cancer case/control study (Karagas M R
et al., 1998, Environ Health Perspect 106: 1047-1050). Four separate
analyses were performed: (1) the phenotype of interest was age; (2)
the exposure of interest was current smoker status; (3) the
exposure of interest was toenail arsenic; and (4) the exposure of
interest was reported use of hair dye. Sex was included as a
covariate in all analyses, and age was included in (2)-(4).
[0376] The relationship between $\hat{\alpha}$ and $\hat{\beta}$
for the covariate of interest over autosomal CpGs is shown in FIG.
42. Dot shading represents overall methylation as indicated by the
first component of the coefficient vector $\hat{\beta}$,
corresponding to the intercept (light=low, black=moderate,
dark=high). The diagonal straight line represents the identity
($\hat{\alpha} = \hat{\beta}$). The curve depicts a loess fit to the
scatter plot. In each case there is an S-shaped relationship that
shows attenuation of effect ($\hat{\alpha}$ tends to be smaller in
magnitude than $\hat{\beta}$). Table 28 shows the
multiple-comparisons adjusted p-values for each coefficient
corresponding to the covariate of interest ($\beta$, $\alpha$,
$\gamma$) and for the overall effect of WBC distribution on DNA
methylation ($\xi$), obtained by permutation test using 5000
permutations. As shown in the table, the significance of $\alpha$
may be greater than, less than, or equal to the significance of
$\beta$. Remarkably, in every case the covariate of interest shows a
strongly significant association with WBC distribution. WBC
distribution also shows a significant overall association with DNA
methylation.
TABLE 28 Multiple-comparisons adjusted p-values
Exposure/Phenotype | $\beta$ | $\alpha$ | $\gamma$ | $\xi$
Age | 0.0358 | 0.0838 | <0.0002 | 0.0100
Current smoker | 0.0326 | 0.0200 | <0.0002 | 0.0134
Toenail arsenic | 0.1054 | 0.0512 | <0.0002 | 0.0148
Dye use | 0.2614 | 0.2570 | <0.0002 | 0.0102
Example 39
Comparison of Methods Herein for Estimating Fractions of Blood Cell
Types with Non-Negative Matrix Factorization (NNMF)
[0377] The methods herein are predicated on the relationship

$$E(Y_i) = \sum_{l=1}^{d_0} b_{0l}\,\omega_{il},$$

where $Y_i$ is a vector of DNA methylation measurements obtained
for subject $i$, $d_0$ is the number of blood cell types to be
assayed, $\omega_{il}$ are the fractions of each blood cell type
corresponding to subject $i$, and $b_{0l}$ is the vector of
methylation fractions corresponding to blood cell type $l$. The
methods herein provide techniques for estimating the fractions
$\omega_{il}$ assuming the values of $b_{0l}$ have been obtained
from an external validation data set. In contrast, non-negative
matrix factorization (NNMF) could be used to estimate $\omega_{il}$
and $b_{0l}$ simultaneously in the absence of an external
validation set. In the context of NNMF, the $d_0$ vectors
$\omega_{\cdot l}$ are considered "factors", the $d_0$ vectors
$b_{0l}$ (assumed to represent individual methylation profiles) are
considered "basis vectors", and the number of factors $d_0$ must
be provided to the NNMF algorithm.
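With the $b_{0l}$ known, estimating one subject's fractions is a constrained least-squares problem. The sketch below assumes the constraints $\omega_{il} \ge 0$ and $\sum_l \omega_{il} \le 1$ and solves the problem by projected gradient descent; this solver is an illustrative choice and not necessarily the one used in Examples 1-3. Names are hypothetical, and the synthetic reference matrix stands in for an external validation data set.

```python
import numpy as np


def project_capped_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) <= 1}."""
    w = np.clip(v, 0.0, None)
    if w.sum() <= 1.0:
        return w
    # Sum constraint is active: project onto the probability simplex.
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.clip(v - theta, 0.0, None)


def estimate_fractions(y, B0, n_iter=5000):
    """Minimise ||y - B0 w||^2 subject to w >= 0 and sum(w) <= 1."""
    K = B0.shape[1]
    w = np.full(K, 1.0 / K)
    step = 1.0 / np.linalg.norm(B0, 2) ** 2          # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = B0.T @ (B0 @ w - y)
        w = project_capped_simplex(w - step * grad)
    return w


rng = np.random.default_rng(5)
B0 = rng.uniform(0, 1, size=(100, 4))                # columns: reference profiles b_0l
w_true = np.array([0.2, 0.3, 0.1, 0.35])
y = B0 @ w_true                                       # noiseless mixture for one subject
print(np.round(estimate_fractions(y, B0), 4))
```

Only the $K$ fractions per subject are unknown here, whereas NNMF must also estimate every entry of the basis vectors.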
[0378] Using the 12 experimental samples described in Example 5,
NNMF was compared to the methods herein (Examples 1-3). The 100 and
500 highest-ranking pseudo-DMRs were selected on the basis of
informativeness as in Example 4; for each choice, the constrained
projection described in Examples 1 and 5 was used to impute
specific cell distributions, and NNMF was then performed assuming
four, five, and six factors (i.e., the values of each factor were
assumed to represent the fractions $\omega_{il}$ for one cell type
$l$). The nmf function in the R package NMF was used with default
settings. Since NNMF requires random starting values, NNMF was
applied 100 times, each with different randomly generated starting
values according to the default settings of the nmf function. Six
cases were considered, viz., 100 CpGs and 500 CpGs for each of
four, five and six factors. For each of the 100 runs in each of the
six cases, the fitted factors $\omega_{\cdot l}$ (whose values were
assumed to correspond to fractions $\omega_{il}$) were correlated
with the expected fractions of B cells, T cells, monocytes, and
granulocytes, and each cell type was assigned the factor with the
maximum correlation to that type. Then, for each cell type in each
case, the median correlation with the assigned factor was
tabulated. Table 29 below reports these median values, and Table 30
reports the correlation between the expected fraction and the
fraction observed using the methods herein. A comparison of these
tables demonstrates that, although NNMF can achieve high correlation
with expected cell fraction if the pseudo-DMRs are known in
advance, the methods described herein in Examples 1-4 still achieve
higher correlation. In addition, NNMF occasionally fails to match
known cell types to imputed cell types in a one-to-one manner.
Table 31 reports the percentage of runs for which at least two
different cell types were matched via NNMF to the same factor.
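The factor-to-cell-type matching step can be sketched as follows. It assumes the fitted factor values for one NNMF run are held as an n × F array and the expected fractions as an n × K array (both hypothetical inputs), computes Pearson correlations, assigns each cell type the factor with the maximum correlation, and flags runs in which two cell types share a factor, the event counted in Table 31. Medians over runs, as in Table 29, would then be taken over the returned correlations.

```python
import numpy as np


def match_factors(H, expected):
    """H: (n, F) fitted factor values; expected: (n, K) expected cell fractions.

    Returns the factor index assigned to each cell type, the correlation with
    that factor, and whether two or more cell types were matched to the same factor.
    """
    K = expected.shape[1]
    C = np.corrcoef(expected.T, H.T)[:K, K:]        # (K, F) cross-correlations
    assigned = C.argmax(axis=1)
    best_corr = C[np.arange(K), assigned]
    collision = len(set(assigned.tolist())) < K
    return assigned, best_corr, collision


rng = np.random.default_rng(2)
expected = rng.dirichlet(np.ones(4), size=12)       # 12 samples x 4 cell types
H = expected[:, [2, 0, 1, 3]] * 3.0                 # factors = rescaled, permuted columns
assigned, corr, collision = match_factors(H, expected)
print(assigned, np.round(corr, 3), collision)
```

Correlation rather than raw agreement is used because NNMF factors are identified only up to scale and order.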
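[0379] NNMF would be expected to perform less favorably than the
methods described herein (Examples 1-4), since NNMF requires the
estimation of $(n+M)F$ unknown parameters (where $n$ is the number
of target samples, $M$ the number of CpGs, and $F$ the number of
factors), whereas the methods herein require the estimation of only
$nK$ unknown parameters, where $K<F$ and $K$ is the number of known
cell types. For example, with $n=12$ samples, $M=100$ CpGs and $F=5$
factors, NNMF estimates $(12+100) \times 5 = 560$ parameters,
whereas the methods herein estimate $12 \times 4 = 48$ parameters
for $K=4$ cell types.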
TABLE 29 Median correlation for two different sets of CpG-containing sequences
CpGs | Cell type | Factors = 4 | Factors = 5 | Factors = 6
100 | B cells | 0.998 | 0.996 | 0.996
100 | T cells | 0.988 | 0.989 | 0.990
100 | Monocytes | 0.832 | 0.900 | 0.927
100 | Granulocytes | 0.967 | 0.954 | 0.963
500 | B cells | 0.998 | 0.996 | 0.996
500 | T cells | 0.985 | 0.993 | 0.990
500 | Monocytes | 0.798 | 0.896 | 0.879
500 | Granulocytes | 0.943 | 0.977 | 0.970
TABLE 30 Correlation between expected fraction and the fraction observed using methods herein
Cell type | 100 DMRs | 500 DMRs
B cells | 1.000 | 1.000
T cells | 0.998 | 0.997
Monocytes | 1.000 | 1.000
Granulocytes | 0.997 | 0.999
TABLE 31 Percentage of runs for which at least two different cell types were matched to the same factor
Factors | DMRs = 100 | DMRs = 500
4 | 4 | 2
5 | 0 | 1
6 | 0 | 0
Example 40
Quantitation of T Cell, Treg and CD16+CD56dim NK Cell Numbers
by CD3Z, FoxP3 and NKp46 Methylation Assays, Respectively, Using
Droplet Digital PCR
[0380] A droplet digital PCR technique was used to quantitate T
cell, Treg and CD16+CD56dim NK cell numbers using the CD3Z, FoxP3
and NKp46 methylation assays described in Examples 15 and 30.
Digital PCR (dPCR) is a refinement of conventional PCR methods that
is used to directly quantify and clonally amplify nucleic acids.
dPCR differs from traditional PCR in how nucleic acid amounts are
measured, and is more precise: in dPCR the sample is separated into
a large number of partitions, and the reaction in each partition is
carried out individually. This partitioning yields a more reliable
and sensitive measurement of nucleic acid amounts.
[0381] Isolated and purified T cells and Tregs were serially
diluted, and copies of each of the targets were quantified as
measures of cell numbers. Bisulfite-converted DNA from whole blood,
from isolated human T cells and Treg cells, and from NK cells was
quantified using the emulsion partitioning method of the Bio-Rad
QX100™ Droplet Digital™ PCR (ddPCR™) system. This system
partitions each PCR reaction into water-in-oil droplets for
high-throughput digital PCR. The QX100 droplet generator partitions
samples into 20,000 nanoliter-sized droplets. After PCR on a
thermal cycler, droplets from the samples were streamed in single
file through a reader (QX100 droplet reader). The PCR-positive and
PCR-negative droplets were counted to obtain quantification of
target DNA in digital form. Results are shown in FIGS. 43-46 as dot
plots of droplet fluorescence intensities, with each point on the
plot representing a single droplet. The horizontal lines are
cutoffs between "positive" and "negative" droplets for each sample.
The concentration of the target sequence (demethylated CD3Z, FoxP3
or NKp46) in copies per microliter was obtained as readout from the
system. Dividing the target sequence concentration by the total DNA
concentration obtained by C-less PCR yielded the percentage of total
DNA that was positive for the target DNA region (FIGS. 45-46).
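Droplet counts are commonly converted to concentration with a Poisson correction, because a positive droplet may contain more than one target copy. The sketch below assumes this standard correction and a droplet volume of 0.85 nL (an assumed value; the actual volume should be taken from the instrument documentation), and then forms the target-to-total ratio described above. Function names and droplet counts are hypothetical and do not reproduce the QX100 software.

```python
import math


def copies_per_ul(positive, total, droplet_nl=0.85):
    """Poisson-corrected concentration in copies per microlitre of reaction."""
    if positive >= total:
        raise ValueError("all droplets positive: concentration above dynamic range")
    lam = -math.log(1.0 - positive / total)       # mean copies per droplet
    return lam / (droplet_nl * 1e-3)              # droplet volume nL -> uL


def percent_of_total(target_conc, total_conc):
    """Percentage of total DNA (e.g. from C-less PCR) carrying the demethylated target."""
    return 100.0 * target_conc / total_conc


target = copies_per_ul(positive=1500, total=18000)     # e.g. demethylated CD3Z assay
reference = copies_per_ul(positive=9000, total=18000)  # e.g. C-less total-DNA assay
print(round(target, 1), round(reference, 1), round(percent_of_total(target, reference), 1))
```

The droplet volume cancels in the percentage as long as both assays use the same droplet size.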
[0382] The data in the figures show successful amplification and
detection of the CD3Z and FoxP3 DMRs, respectively. FIG. 43A and
FIG. 44A show dot plots distinguishing positive droplets from
negative droplets. FIG. 43B and FIG. 44B show the calculated
absolute numbers of positive PCR droplets. Results obtained from
dilution of standard purified T cells show that the measured
quantities of the CD3Z and FoxP3 targets correspond to the extent of
dilution, supporting the validity of dPCR as a detection method for
methylation-based assays of immune cell identity. Other partitioning
approaches that employ microfluidic manipulation have been
developed, and results similar to the data obtained herein are
expected from the use of such other partitioning methods. FIG. 45
shows quantitation of purified NK cells under different conditions,
and FIG. 46 shows quantitation of whole blood and of purified
leukocyte subsets by measuring the demethylated NKp46 DMR described
in Example 30.
Example 41
Sample Workflow
[0383] FIG. 47 summarizes the workflow carried out for samples
derived from human whole blood used in the following examples.
FIG. 51 describes 85 venous whole blood samples that were collected
from disease-free human donors. Of these, 79 samples were used for
isolation of a target cell type by magnetic activated cell
separation (MACS), and six samples were subjected to conventional
immune profiling, in which fresh aliquots are analyzed by
protein-based methods. Purity of the 79 MACS-isolated samples was
confirmed by fluorescence activated cell sorting (FACS). The six
conventionally profiled samples were stored under 12 specific
storage conditions, differing in anticoagulant, temperature, and
duration, which yielded 72 samples.
[0384] DNA was extracted from each of the 79 purified
(FACS-verified) samples and from the 72 samples stored under the 12
specific storage conditions. Aliquots of genomic DNA from five of
the 79 purified samples were combined in quantities that mimicked
human blood, artificially reconstituting peripheral blood. Aliquots
of each of the seven cell DNA mixtures, the 79 purified samples,
and the 72 samples from the 12 specific storage conditions were
then randomized. Aliquots of the resulting 158 samples were treated
with sodium bisulfite, which is used in the analysis of the
methylation status of cytosines in DNA, and the 158 bisulfite-treated
aliquots were analyzed using each of a high-density methylation
microarray (HDMA) and a low-density methylation microarray (LDMA).
[0385] Data for the DNA methylation microarrays are available at the
NCBI Gene Expression Omnibus (GEO) website, in accordance with the
Minimum Information About a Microarray Experiment (MIAME)
guidelines. The methods, materials and conditions described in this
example and FIG. 47 are described in full in the following
examples.
Example 42
Purified Leukocyte Subtypes
[0386] Venous whole blood samples were collected from 79
disease-free human donors whose demographic characteristics are
shown in Table 32. A homogeneous population of one specific type of
leukocyte was obtained from each sample by MACS purification, a
method of cell separation that uses antibody-conjugated magnetic
microbeads, with a combination of positive and negative selection
protocols (Miltenyi Biotec Inc., Auburn, Calif.). Purity of the 79
purified cell samples was determined by FACS. Representative FACS
results for 15 sample types are shown in FIG. 48. The hierarchical
relationship between the different populations of MACS-purified
leukocyte subtypes, and the number of replicate samples for each
cell type, are shown in FIG. 49.
TABLE 32 Demographic characteristics of blood donors for purified cells
Characteristic | Value
Total number | 79
Age, mean (SD) | 30 (9)
Weight (lbs), mean (SD) | 181 (38)
Height (inches), mean (SD) | 69 (3.7)
Gender: male, No. (%) | 62 (78%)
Gender: female, No. (%) | 15 (19%)
Gender: unknown, No. (%) | 2 (3%)
Race: White, No. (%) | 32 (41%)
Race: Hispanic, No. (%) | 12 (15%)
Race: Black, No. (%) | 13 (16%)
Race: Asian, No. (%) | 13 (16%)
Race: Native American, No. (%) | 3 (4%)
Race: Unknown/Other, No. (%) | 6 (8%)
Tobacco smoking: yes, No. (%) | 13 (16%)
Tobacco smoking: no, No. (%) | 33 (42%)
Tobacco smoking: unknown, No. (%) | 33 (42%)
Example 43
Conventionally Profiled Whole Bloods
[0387] Six additional venous whole blood samples were collected
from different disease-free human donors whose demographic
characteristics are summarized in Table 32. The workflow for these
samples is shown in FIG. 58. Each whole blood sample was divided
into three aliquots, each containing an anticoagulant: heparin,
citrate, or EDTA. A portion of the heparin aliquot was used to
perform conventional immune profiling methods, including flow
cytometry (described below), a manual 5-part white blood cell
differential, and a CBC with automated 5-part white blood cell
differential. Another portion of this aliquot for each sample was
analyzed for methylation using the high-density DNA methylation
microarray (HDMA; described in examples below). Another portion was
analyzed directly, without storage, for methylation using the
low-density methylation array (LDMA; described in examples below).
Aliquots of each of the six blood samples were stored overnight at
one of three temperatures (room temperature, 4°C, and -80°C) prior
to methylation assessment on the HDMA.
Example 44
Differential Leukocyte Counts
[0388] Manual white blood cell (WBC) counts were performed
according to established standards (Koepke J A. 1977 Differential
Leukocyte Counting. Skokie, Ill.: College of American Pathologists;
Houwen B. 2001 The differential cell count. Laboratory Hematology
89-100). Automated WBC counts were performed using the XE-5000™
Automated Hematology System (Sysmex America, Inc., Mundelein, Ill.)
according to the manufacturer's instructions. The following cell
types were enumerated: total WBC, lymphocytes, monocytes,
neutrophils, basophils and eosinophils.
Example 45
Fluorescence Activated Cell Sorting (FACS) of Leukocyte Subsets
[0389] Blood samples were stained directly for cell surface markers
and incubated for 20 minutes in the dark at 4°C. Antibodies were
purchased from eBioscience Inc. (San Diego, Calif.). Each blood
sample was divided into two aliquots. Cells in the first aliquot
were stained with: anti-human CD3e FITC (catalog number
11-0039-41), anti-human CD4 APC-eFluor 780 (catalog number
47-0049-41), anti-human CD8a 605NC (catalog number 93-0088-41),
anti-human CD16 PE-Cy7 (catalog number 25-0168-41), anti-human CD25
APC (catalog number 17-0259-41), anti-human CD45 PerCP-Cy5.5
(catalog number 45-9459-41), anti-human CD56 PE (catalog number
12-0567-41), and anti-human CD127 eFluor 127 (catalog number
48-1278-41) to analyze T-cells, NKT cells, and NK cells. Cells in
the second aliquot were stained with: anti-human CD14 FITC (catalog
number 11-0149-41), anti-human CD15 eFluor 450 (catalog number
48-0159-41), anti-human CD16 PE-Cy7, anti-human CD19 APC-eFluor
780 (catalog number 47-0199-41), anti-human CD45 PerCP-Cy5.5, and
anti-human CD123 PE (catalog number 12-1239-41) to analyze B-cells,
monocytes, and granulocytes (neutrophils, eosinophils, and
basophils).
[0390] Unstained, isotype, and fluorescence-minus-one (FMO)
controls were used to determine sample gating and background.
Individual compensation controls were used in each sample run.
CountBright counting beads (Invitrogen, catalog number C36950) were
added for quantification of total leukocytes and each subset.
Acquisition was performed within 12 hours of blood draw on the
FACSAria III flow cytometer (Becton Dickinson) using FACSDiva
software (Becton Dickinson). An acquisition limit of 10,000 events
was set on the monocyte gate, using an FSC versus SSC dot plot, for
each aliquot. Final data analysis and presentation of results were
performed using FlowJo software (TreeStar Inc.).
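Counting beads convert relative event counts into absolute concentrations. Below is a sketch of the usual single-platform calculation, assuming the number of beads added per test is known from the bead lot and that cells and beads are acquired from the same tube. The gate names and counts are illustrative, not data from this study, and the function names are hypothetical.

```python
def absolute_count(cell_events, bead_events, beads_added, sample_volume_ul):
    """Cells per microlitre = (cell events / bead events) x (beads added / sample volume)."""
    return (cell_events / bead_events) * (beads_added / sample_volume_ul)


def subset_counts(events, bead_events, beads_added, sample_volume_ul):
    """Apply the same bead ratio to every gated subset."""
    return {name: absolute_count(n, bead_events, beads_added, sample_volume_ul)
            for name, n in events.items()}


# Illustrative gate counts for one aliquot (hypothetical numbers).
events = {"CD3+ T cells": 4000, "CD19+ B cells": 600, "CD14+ monocytes": 1000}
print(subset_counts(events, bead_events=5000, beads_added=50000, sample_volume_ul=100))
```

Because every subset shares the same bead ratio, relative proportions between subsets are unaffected by the bead calibration.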
[0391] Cell types and detection parameters were set as follows:
Lymphocytes: low SSC (side scatter) and low FSC (forward scatter);
B-cells: CD45+ and CD19+; T-cells: CD45+ and CD3+; Helper T-cells
(Th): CD3+ and CD4+; Regulatory T-cells (Tregs): CD3+ and CD4+ and
CD25+ and FOXP3+; Cytotoxic T-cells (Tc): CD3+ and CD8+; Natural
Killer T-cells (NKT): CD3+ and CD56+; Natural Killer (NK) cells:
CD3- and CD56+; Effector NK cells: CD3- and CD16+ and CD56 dim
(i.e., lower level); Regulatory NK cells: CD3- and CD16- and CD56
bright (i.e., higher level); CD8+ NK cells: CD3- and CD8+ and CD56+;
CD8- NK cells: CD3- and CD8- and CD56+; Granulocytes: high SSC (side
scatter) and high FSC (forward scatter); Eosinophils: CD44+ and high
SSC and high FSC; Basophils: CD123+ and high SSC and high FSC;
Neutrophils: CD15+ and CD16+ and high SSC and high FSC; Monocytes:
low SSC (side scatter) and high FSC (forward scatter) and CD14+.
Example 46
DNA Extraction
[0392] Genomic DNA was extracted and purified from whole blood and
from MACS purified leukocyte samples using AllPrep DNA/RNA/Protein
Mini Kit (QIAGEN, catalog number 8004) or DNeasy blood and tissue
kit (QIAGEN, catalog number 69506) according to manufacturer's
instructions and protocol. DNA was quantified by NanoDrop ND-1000
Spectrophotometer (NanoDrop Technologies, Inc.). DNA samples for
some applications were further purified using the DNA Clean and
Concentrator according to manufacturer's protocol (ZYMO Research
Corporation, catalog number D4004). Samples were kept at 4°C for
short-term storage or at -20°C for long-term storage.
Example 47
Artificial Blood Samples
[0393] Genomic DNA from five of the purified leukocyte samples was
combined in quantities that mimicked human blood under seven
clinical conditions (Table 33). DNA was mixed thoroughly and stored
briefly at 4°C prior to analysis.
TABLE 33 Proportions of DNA from purified cells combined into mixtures that artificially reconstruct blood under clinical conditions
Clinical condition | T cells | B cells | NK cells | Granulocytes | Monocytes
Normal | 20% | 2.5% | 1.5% | 67% | 9%
T-cell lymphopenia-1 | 6% | 6% | 5% | 70.5% | 12.5%
T-cell lymphopenia-2 | 2% | 7% | 6% | 71.5% | 13.5%
Granulocytosis | 10% | 0% | 0% | 90% | 0%
Granulocytopenia | 34.5% | 17% | 16% | 9% | 23.5%
B-cell lymphoma | 20.5% | 0.5% | 2% | 67.5% | 9.5%
Monocytosis | 14% | 0% | 0% | 61% | 25%
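Reconstituting a mixture from Table 33 is a simple proportion calculation. The sketch below assumes that each purified-cell DNA stock has a known concentration (the values are hypothetical) and that a fixed total DNA mass is wanted. It returns the volume of each stock to combine. The function name is hypothetical.

```python
def mixing_volumes(proportions_pct, stock_ng_per_ul, total_ng):
    """Volume (uL) of each purified-cell DNA stock needed for the target mixture."""
    if abs(sum(proportions_pct.values()) - 100.0) > 1e-6:
        raise ValueError("proportions must sum to 100%")
    return {cell: total_ng * pct / 100.0 / stock_ng_per_ul[cell]
            for cell, pct in proportions_pct.items()}


# "Normal" row of Table 33
normal = {"T cells": 20.0, "B cells": 2.5, "NK cells": 1.5,
          "granulocytes": 67.0, "monocytes": 9.0}
# Hypothetical stock concentrations (ng/uL) and a hypothetical 500 ng total.
stocks = {"T cells": 40.0, "B cells": 10.0, "NK cells": 5.0,
          "granulocytes": 50.0, "monocytes": 20.0}
for cell, vol in mixing_volumes(normal, stocks, total_ng=500.0).items():
    print(f"{cell}: {vol:.2f} uL")
```

Checking that each row sums to 100% guards against transcription errors when entering the other six conditions.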
Example 48
Sodium Bisulfite Conversion
[0394] Genomic DNA from the six conventionally profiled whole blood
samples, genomic DNA from the 79 purified leukocyte samples, and the
DNA mixtures in the seven artificial blood samples were randomized
and treated with sodium bisulfite using the ZYMO EZ-96 DNA
Methylation Kit (ZYMO Research Corp., catalog number D5004), and
were stored at -80°C until used. Bisulfite treatment enables
assessment of DNA methylation by converting unmethylated cytosine
residues to uracil, while methylated cytosines remain unchanged.
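The logic of the conversion can be illustrated on a short sequence. The sketch below assumes complete conversion, considers only the top strand, and represents the post-PCR readout (uracil read as thymine). `methylated` is a hypothetical set of 0-based positions of methylated cytosines, and the function name is hypothetical.

```python
def bisulfite_convert(seq, methylated=frozenset()):
    """Top-strand readout after complete bisulfite conversion and PCR:
    unmethylated C -> T (via uracil); methylated C is protected and stays C."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq.upper())
    )


print(bisulfite_convert("ACGTCCG", methylated={1}))   # ACGTTTG
print(bisulfite_convert("ACGTCCG"))                   # ATGTTTG
```

The C/T difference at a CpG position after conversion is what the methylation-specific probes on the arrays below distinguish.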
Example 49
High-Density DNA Methylation Microarray (HDMA)
[0395] Methods were developed to analyze patterns of
cell-lineage-specific DNA methylation and to examine the viability
of the mathematical models herein. Forty-six of the purified
leukocyte DNA samples, six of the artificial blood reconstruction
samples (excluding T-cell lymphopenia-1), and the six
conventionally profiled whole blood samples were analyzed using the
Infinium® HumanMethylation27 BeadChip microarray (Illumina Inc.,
San Diego, Calif.). This platform was used to quantify the
methylation status of 27,578 CpG loci from 14,495 genes, with a
redundancy of 15-fold to 18-fold. The ratio of fluorescent signals
was computed from both alleles using the equation

$$\beta = \frac{\max(M, 0)}{|U| + |M| + 100},$$

where $M$ and $U$ are the methylated and unmethylated signal
intensities, respectively. The resultant β-value was a continuous
variable from 0 (unmethylated) to 1 (completely methylated) that
represents the methylation at each CpG site and was used in
subsequent statistical analyses. Data were assembled with the
methylation module of GenomeStudio software, a product of Illumina,
Inc. (Bibikova, M. et al. 2009 Epigenomics 1:177-200).
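The β-value formula for the HDMA can be expressed directly in code. The sketch below assumes M and U are the methylated and unmethylated signal intensities for one CpG (or arrays of them) and keeps the offset of 100 from the Illumina β-value definition. `beta_value` is a hypothetical helper, not GenomeStudio code.

```python
import numpy as np


def beta_value(M, U, offset=100.0):
    """beta = max(M, 0) / (|U| + |M| + offset); works on scalars or arrays."""
    M = np.asarray(M, float)
    U = np.asarray(U, float)
    return np.maximum(M, 0.0) / (np.abs(U) + np.abs(M) + offset)


print(beta_value(900, 0))                              # 0.9
print(beta_value([0, 450, 4950], [500, 450, 50]))
```

The offset keeps β stable when both intensities are low, at the cost of pulling such loci slightly toward 0.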
[0396] Following the crosscheck optimization procedure, a minimal
set of 34 CpG loci was selected to establish DNA methylation
signatures for the HDMA reference library. These loci were found in
the following genes: CLEC9A (2 loci) (SEQ ID NO:119), INPP5D (SEQ
ID NO:120), INHBE (SEQ ID NO:28), UNQ473 (SEQ ID NO:121), SLC7A11
(SEQ ID NO:122), ZNF22 (SEQ ID NO:11), XYLB (SEQ ID NO:123), HDC
(SEQ ID NO:26), RGR (SEQ ID NO:124), SLCO2B1 (SEQ ID NO:125),
C1orf54 (SEQ ID NO:126), TM4SF19 (SEQ ID NO:127), IGSF6 (SEQ ID
NO:28), KRTHA6 (SEQ ID NO:128), CCL21 (SEQ ID NO:129), SLC11A1 (SEQ
ID NO:130), FGD2 (SEQ ID NO:2), TCL1A (SEQ ID NO:131), MGMT (SEQ ID
NO:132), CD19 (SEQ ID NO:133), LILRB4 (SEQ ID NO:134), VPREB3 (SEQ
ID NO:135), FLJ10379 (SEQ ID NO:136), HLA-DOB (SEQ ID NO:43),
EPS8L3 (SEQ ID NO:4), SHANK1 (SEQ ID NO:137), CD3D (2 loci) (SEQ ID
NO:93), CHRNA3 (SEQ ID NO:138), CD3G (2 loci) (SEQ ID NO:92), RARA
(SEQ ID NO:139), GRASP (SEQ ID NO:140).
Example 50
Low-Density DNA Methylation Microarray (LDMA)
[0397] To thoroughly validate the DNA methylation-based approach to
immune profiling used herein, the 79 purified leukocyte samples,
the seven artificial blood reconstruction samples, and the 72
samples derived from the six conventionally profiled whole blood
samples (each stored under 12 different conditions) were analyzed
by the VeraCode® custom GoldenGate® Methylation assay (GGMA). The
assay used a four-probe design to differentiate between methylated
and unmethylated sequences for a custom panel of 96 different CpG
loci. The method generated DNA targets through allele-specific
amplification using universal primers, followed by hybridization to
a bead array at sites bearing complementary address sequences. The
hybridized targets contained a fluorescent label denoting a
methylated or unmethylated state for a given locus.
[0398] The methylation status of each interrogated CpG site was calculated as the ratio of the fluorescent signal from the methylated allele to the sum of the methylated and unmethylated signals, thereby generating a β-value ranging from 0 (unmethylated) to 1 (fully methylated). Several different control types were used to ensure data quality. Each bead type was represented with an average of 30-fold redundancy. Data were assembled with the methylation module of GenomeStudio software (Illumina, Inc.).
[0399] Following the crosscheck optimization procedure, a minimum of 20 CpG loci was selected to establish DNA methylation
signatures for the LDMA reference library. The selected loci were
found in the following genes: FGD2 (SEQ ID NO:2), HLA-DOB (SEQ ID
NO:43), BLK (SEQ ID NO:40), IGSF6 (SEQ ID NO:28), CLDN15 (SEQ ID
NO:29), SFT2D3 (SEQ ID NO:89), ZNF22 (SEQ ID NO:11), CEL (SEQ ID
NO:39), HDC (SEQ ID NO:26), GSG1 (SEQ ID NO:67), FCN1 (SEQ ID
NO:53), OSBPL5 (SEQ ID NO:64), LDB2 (SEQ ID NO:36), NCR1 (SEQ ID
NO:91), EPS8L3 (SEQ ID NO:4), CD3D (SEQ ID NO:93), PPP6C (SEQ ID
NO:7), CD3G (SEQ ID NO:92), TXK (SEQ ID NO:30), FAIM (SEQ ID
NO:32).
Example 51
Statistical Methods
[0400] Statistical analyses in the examples herein were performed using the R statistical platform (www.r-project.org).
Example 52
Identification of Cell Lineage-Specific Methylation
[0401] To identify DNA methylation signatures that represent biomarkers of leukocyte subtypes, a linear mixed effects (LME) model was applied to the purified leukocyte HDMA data, with cell type designated as the fixed effect and beadchip as the random effect (controlling for plate effects). The model generated an F-statistic for every CpG on the array, indicating how well differential methylation at that locus distinguishes seven different leukocyte lineages: T-cells, B-cells, NK cells, monocytes, eosinophils, basophils, and neutrophils. The model also generated seven coefficients for each CpG, indicating the direction and magnitude of differential methylation at that locus for each cell type.
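The per-CpG screening step can be sketched in Python as follows. This is a simplified sketch under a stated assumption: it drops the beadchip random effect, so it computes an ordinary one-way ANOVA F-statistic per CpG rather than fitting the LME model described above, and it reports per-cell-type mean β-values as a rough stand-in for the seven LME coefficients (a cell-means parameterization). The function name `per_cpg_f_and_means` and the toy data are hypothetical.

```python
import numpy as np

def per_cpg_f_and_means(betas, cell_types):
    """One-way ANOVA F-statistic and per-cell-type mean beta for every CpG.

    betas: (n_samples, n_cpgs) array of beta-values.
    cell_types: length-n_samples sequence of cell-type labels.
    Returns (f_stats, labels, means), where means has shape (k, n_cpgs).
    Simplification: the beadchip random effect of the LME model is ignored.
    """
    betas = np.asarray(betas, dtype=float)
    labels_arr = np.asarray(cell_types)
    labels = sorted(set(labels_arr.tolist()))
    n, k = betas.shape[0], len(labels)
    grand = betas.mean(axis=0)
    means = np.vstack([betas[labels_arr == c].mean(axis=0) for c in labels])
    counts = np.array([(labels_arr == c).sum() for c in labels], dtype=float)
    # Between-group and within-group sums of squares, computed per CpG.
    ss_between = (counts[:, None] * (means - grand) ** 2).sum(axis=0)
    ss_within = sum(((betas[labels_arr == c] - means[i]) ** 2).sum(axis=0)
                    for i, c in enumerate(labels))
    f_stats = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f_stats, labels, means
```

A large F-statistic marks a CpG whose methylation differs strongly between cell types relative to within-type variation; the sign of each mean's deviation from the grand mean gives the direction of differential methylation.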
Example 53
Selection of CpG Panel for Immune Profiling
[0402] Using the LME results, a stochastic search algorithm was implemented to determine the best combination of putative differentially methylated regions (DMRs) to use for the simultaneous assessment of T-cells, B-cells, NK cells, monocytes, and granulocytes in a human blood sample. The algorithm assessed the predictive ability of a selected panel of CpG loci by analyzing the variance in methylation across cell types as designated in a contrast matrix. At each step, a randomly selected locus was substituted for one of the loci in the panel; if the substitution improved the predictive ability, it was accepted and the new locus replaced the old one in the panel. This search was run for 50,000 iterations starting from ten different random number seeds in three stages: first starting with the top 500 F-statistics, then with the top 500 absolute effect sizes (based on the LME coefficients), and then with the top 500 loci from the first two stages. The stochastic search algorithm was then run one additional time, starting from the top 96 loci from the final stage above, until the acceptance rate for substitutions definitively dropped to zero.
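The swap-and-accept loop described above can be sketched in Python as follows. The source does not specify the exact predictive-ability criterion, so `panel_score` uses a hypothetical A-optimality proxy: minus the trace of C(XᵀX)⁻¹Cᵀ, where X holds the reference mean profiles of the panel's CpGs and C is a contrast matrix. This quantity is proportional to the sampling variance of least-squares estimates of the mixing-weight contrasts. A small ridge term (an added assumption) keeps rank-deficient panels finite but heavily penalized. The sketch shows a single run from one seed; the procedure above repeated such runs over ten seeds and three candidate pools. All names are hypothetical.

```python
import numpy as np

def panel_score(ref, panel, contrast, ridge=1e-8):
    """Hypothetical predictive-ability score for a CpG panel (higher is better).

    ref: (n_cpgs, k) mean beta-values per cell type.
    panel: row indices of the CpGs in the panel.
    contrast: (m, k) contrast matrix over cell types.
    """
    x = ref[list(panel)]
    # The ridge keeps X^T X invertible; panels that cannot separate all
    # cell types receive a very large (bad) variance instead of an error.
    info = x.T @ x + ridge * np.eye(x.shape[1])
    cov = np.linalg.inv(info)
    return -float(np.trace(contrast @ cov @ contrast.T))

def stochastic_search(ref, candidates, panel_size, contrast, n_iter, seed):
    """Random single-locus substitutions, kept only when the score improves."""
    rng = np.random.default_rng(seed)
    candidates = np.asarray(list(candidates))
    panel = [int(i) for i in rng.choice(candidates, size=panel_size, replace=False)]
    best = panel_score(ref, panel, contrast)
    accepted = 0
    for _ in range(n_iter):
        outside = np.setdiff1d(candidates, panel)
        if outside.size == 0:
            break
        trial = panel.copy()
        # Replace a randomly chosen panel slot with a random outside locus.
        trial[rng.integers(panel_size)] = int(rng.choice(outside))
        score = panel_score(ref, trial, contrast)
        if score > best:
            panel, best = trial, score
            accepted += 1
    return sorted(panel), best, accepted / max(n_iter, 1)
```

The returned acceptance rate corresponds to the stopping signal described above: once no substitution improves the score, the rate falls toward zero.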
Example 54
DNA Methylation-Based Cell Quantification
[0403] To estimate cell mixtures from DNA methylation marks, methods herein employed a constrained projection, in which the DNA methylation profile of a target sample is projected onto mean methylation profiles for isolated cell types, subject to the constraint that the projection values (estimated mixing weights) are greater than or equal to zero and sum to no more than one. The mean values were obtained from a reference library of DNA methylation signatures, and the projection was implemented via quadratic programming (Goldfarb, D. and Idnani, A. 1982 Dual and Primal-Dual Methods for Solving Strictly Convex Quadratic Programs. In: Hennart, J. P., ed. Numerical Analysis. Berlin: Springer-Verlag, pages 226-39; Goldfarb, D. et al. 1983 Mathematical Programming 27:1-33; Houseman, E. A. et al. 2012 BMC Bioinformatics 13:86).
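The constrained projection above is the quadratic program: minimize ||y − Rw||² over w, subject to w ≥ 0 and Σw ≤ 1, where y is the target profile and R holds the reference mean profiles (one column per cell type). The Python sketch below solves this same convex program, but it swaps in projected gradient descent for the Goldfarb-Idnani dual method cited above. Because the problem is convex, both approaches reach the same optimum when R has full column rank. The function names and the toy reference matrix are hypothetical.

```python
import numpy as np

def project_capped_simplex(v):
    """Euclidean projection onto {w : w >= 0, sum(w) <= 1}."""
    w = np.maximum(v, 0.0)
    if w.sum() <= 1.0:
        return w
    # Otherwise the sum constraint is active: project onto the simplex.
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    theta = css[rho] / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def estimate_mixing_weights(ref, target, n_iter=5000):
    """Minimise ||target - ref @ w||^2 subject to w >= 0 and sum(w) <= 1.

    ref: (n_cpgs, k) mean beta-values of the purified cell types.
    target: (n_cpgs,) beta-values of the sample to be deconvolved.
    """
    ref = np.asarray(ref, dtype=float)
    target = np.asarray(target, dtype=float)
    gram = ref.T @ ref
    # Step 1/L, where L is the Lipschitz constant of the gradient.
    step = 1.0 / np.linalg.eigvalsh(gram).max()
    w = np.full(ref.shape[1], 1.0 / ref.shape[1])
    for _ in range(n_iter):
        grad = gram @ w - ref.T @ target
        w = project_capped_simplex(w - step * grad)
    return w
```

Allowing Σw < 1, rather than forcing Σw = 1, leaves room for cell types that are absent from the reference library.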
Example 55
Cell Differential Quantification Using DNA Methylation
[0404] Methods herein used DNA methylation to detect and quantify
the proportions of each of T-cells, B-cells, NK cells, monocytes,
basophils, eosinophils, and neutrophils in any single human blood
sample. The first step in achieving this goal was to establish a
reference library of DNA methylation signatures that serve as
biomarkers for those cell types. A microarray was used to identify
and to assess DNA methylation in WBC subsets purified from normal
(disease-free) human blood, to generate a reference data set. To
generate a target data set, DNA methylation at the same CpG loci as
the reference data set was assessed in the target samples using the
same platform used to establish the reference library. The cell
types of interest were quantified in the target samples by
projecting their DNA methylation profiles onto the mean methylation
profiles for the purified WBC types of interest from the reference
data set using quadratic programming (Houseman, E. A. et al. 2012
BMC Bioinformatics 13 (86): pages 1-16, which is hereby incorporated herein by reference in its entirety). Sample workflows are
illustrated in FIG. 47.
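The reference-library step of this workflow can be sketched in Python as follows: average the purified samples of each cell type at every CpG, then reorder a target profile so that it covers exactly the same CpGs, in the same order, before projection. This is a sketch under assumptions: the function names and the CpG identifiers (`cg1`, `cg2`, ...) are hypothetical, and the resulting matrix is what a constrained projection would use as its reference.

```python
import numpy as np

def build_reference_library(betas, cpg_ids, cell_types):
    """Mean beta profile per purified cell type.

    betas: (n_samples, n_cpgs) beta-values of purified WBC samples.
    cpg_ids: length-n_cpgs list of CpG identifiers (columns of betas).
    cell_types: length-n_samples list of cell-type labels (rows of betas).
    Returns (ref, cpg_ids, labels) with ref of shape (n_cpgs, k).
    """
    betas = np.asarray(betas, dtype=float)
    types = np.asarray(cell_types)
    labels = sorted(set(types.tolist()))
    ref = np.column_stack([betas[types == c].mean(axis=0) for c in labels])
    return ref, list(cpg_ids), labels

def align_target(target_betas, target_ids, ref_ids):
    """Reorder a target profile to the reference CpG order; fail if a CpG is missing."""
    lookup = dict(zip(target_ids, target_betas))
    missing = [cid for cid in ref_ids if cid not in lookup]
    if missing:
        raise KeyError(f"target lacks reference CpGs: {missing}")
    return np.array([lookup[cid] for cid in ref_ids], dtype=float)
```

Failing loudly on missing CpGs reflects the requirement above that the target data set be assessed at the same CpG loci, on the same platform, as the reference library.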
Example 56
DNA Methylation Distinguishes WBC Subsets
[0405] Venous whole blood was collected from 79 disease-free human donors (Table 32), and homogeneous populations of the WBC types of interest were isolated from each blood sample.
[0406] Cells were isolated using magnetic activated cell separation (MACS), with purity confirmed by FACS (FIG. 48). To account for inter-individual variation, at least four samples of each cell type were purified from each donor (FIG. 49). A subset of these purified cell samples was analyzed with a high-density methylation microarray (HDMA), the Infinium HumanMethylation27 BeadChip (Illumina Inc., San Diego, Calif.), to identify patterns of WBC lineage-specific DNA methylation. See Houseman, E. A. et al. 2012 BMC Bioinformatics 13 (86): pages 1-16, which is hereby incorporated herein by reference in its entirety. The HDMA
assessed DNA methylation at 27,578 CpG loci in 14,495 genes
throughout the human genome. A linear mixed effects model applied
to these data (with cell type as the fixed effect and beadchip as
the random effect) revealed hundreds of CpG loci exhibiting
lineage-specific DNA methylation patterns that distinguished the
WBC types of interest.
[0407] A panel of 96 CpG loci that function in concert for DNA methylation-based immune profiling was selected for placement on a custom low-density DNA methylation microarray (LDMA), the VeraCode GoldenGate methylation array (Illumina Inc., San Diego, Calif.). The LDMA allowed independent confirmation of the HDMA results and enables more efficient use of resources for the quantification of WBC subsets in target samples. The loci were chosen with a bioinformatic search algorithm that works in a stochastic manner, substituting CpG loci and assessing the predictive ability of the selected loci by analyzing the variance in methylation across WBC types as designated in a contrast matrix.
[0408] DNA methylation at the 96 selected CpG loci clearly distinguished the WBC types of interest (B-cells, T-cells, NK cells, monocytes, neutrophils, basophils, and eosinophils), as indicated by unsupervised hierarchical clustering of HDMA data for the purified WBC subsets (FIG. 50). These 96 CpG loci were placed on the LDMA, which uses different chemistry than the HDMA and therefore represents an independent platform. Unsupervised hierarchical clustering of LDMA data for the purified WBC subsets showed that DNA methylation at these loci clearly and reliably distinguished the WBC types of interest (FIG. 51).
Example 57
Accurate Prediction of Purified WBC Subset Identities Using DNA
Methylation
[0409] To test the performance of the method, both the HDMA- and LDMA-derived DNA methylation data sets for the purified WBC subset samples were analyzed as if they were target data sets containing unknown samples. Projection was performed using quadratic programming to quantify seven different leukocyte subtypes in each of the purified WBC subset samples, using methylation signatures from the corresponding HDMA or LDMA reference library. This crosscheck procedure was used to improve efficiency by identifying any problematic purified WBC subset samples in the reference set, and to determine the minimum number of CpG loci required for accurate leukocyte subtype detection and quantification.
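A minimal sketch of the crosscheck follows, under assumptions that are not from the source. Each purified sample is projected onto the reference library built from the top-n ranked loci, its predicted identity is taken to be the cell type with the largest estimated weight, and the sketch returns the smallest n for which every sample is classified correctly. The ranking of loci (for example, by F-statistic) is supplied by the caller. To keep the block self-contained, it includes a compact projected-gradient solver for the same constrained projection. All names and toy data are hypothetical.

```python
import numpy as np

def _project(v):
    # Euclidean projection onto {w >= 0, sum(w) <= 1}.
    w = np.maximum(v, 0.0)
    if w.sum() <= 1.0:
        return w
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u - css / np.arange(1, v.size + 1) > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def mixing_weights(ref, y, n_iter=2000):
    # Projected gradient for min ||y - ref w||^2 with w >= 0, sum(w) <= 1.
    gram = ref.T @ ref
    step = 1.0 / np.linalg.eigvalsh(gram).max()
    w = np.full(ref.shape[1], 1.0 / ref.shape[1])
    for _ in range(n_iter):
        w = _project(w - step * (gram @ w - ref.T @ y))
    return w

def crosscheck_min_loci(betas, cell_types, ranked_loci):
    """Smallest n such that the top-n loci classify every purified sample correctly.

    betas: (n_samples, n_cpgs); cell_types: labels; ranked_loci: CpG column
    indices ordered from most to least informative. Returns None if no n works.
    """
    betas = np.asarray(betas, dtype=float)
    types = np.asarray(cell_types)
    labels = sorted(set(types.tolist()))
    for n in range(1, len(ranked_loci) + 1):
        cols = list(ranked_loci[:n])
        ref = np.column_stack([betas[types == c][:, cols].mean(axis=0) for c in labels])
        if np.linalg.matrix_rank(ref) < len(labels):
            continue  # too few loci to separate all cell types
        predicted = [labels[int(np.argmax(mixing_weights(ref, row[cols])))] for row in betas]
        if all(p == t for p, t in zip(predicted, types.tolist())):
            return n
    return None
```

A sample that is misclassified at every n corresponds to the "problematic" reference samples that the crosscheck is meant to flag.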
[0410] Only 34 and 20 CpG loci were required to accurately predict the leukocyte subtype identity of unknown purified WBC subset samples using the HDMA (FIG. 52) and the LDMA (FIG. 53), respectively. These loci are listed in Examples 49-50 herein. The disparity in the minimum number of loci required with the two platforms resulted from the fact that fewer purified WBC subset samples were analyzed using the HDMA (due to the higher costs associated with that platform), so more CpG loci were needed to compensate.
[0411] These analyses also revealed that CD16-CD56bright "regulatory" NK cells should be eliminated from subsequent reference data sets, because this cell type was frequently misclassified. These cells are not present in significant numbers in peripheral blood and are found primarily in lymphatic tissue. According to FACS analysis, the purities of the regulatory NK cell samples obtained from peripheral blood were low (FIG. 48I), providing one plausible explanation for the consistent misclassification.
Example 58
Clinically Relevant Shifts in the WBC Composition Detected Using
DNA Methylation
[0412] The efficacy of the methods and arrays herein was evaluated by detecting specific immune modulations that occur in the peripheral blood of human patients with particular clinical conditions: diminished T-cells (T-cell lymphopenia), increased granulocytes (granulocytosis), diminished granulocytes (granulocytopenia), diminished B-cells (B-cell lymphopenia), and increased monocytes (monocytosis). Genomic DNA extracted from five of the purified WBC subset samples was combined in precise quantities that mimicked the composition of blood from patients exhibiting each of these clinical conditions, and of normal blood (Table 33).
[0413] DNA methylation was assessed in these DNA mixtures using both the HDMA and LDMA platforms and methods. Five different WBC types in each mixture were quantified by performing projections by quadratic programming with the appropriate reference data set, using only the minimum number of CpG loci (34 for the HDMA, 20 for the LDMA) established by the crosscheck procedure described in examples herein.
[0414] The quantities of the five WBC types measured by DNA methylation using the HDMA and LDMA were comparable to the expected values (FIG. 54A and FIG. 55A). These data indicate that the methods and compositions herein effectively detected five specific, clinically relevant modulations in peripheral blood immune cell samples.
Example 59
DNA Methylation Analysis Provides Accurate WBC Quantification
Compared to Established Methods
[0415] The methods and arrays described herein were compared to gold standard methods of WBC quantification. Venous whole blood was collected from six different disease-free human donors (FIG. 47). Blood samples were analyzed using the methods described herein, and the results were compared to those of three different, well-established immune profiling methods: manual 5-part differential, complete blood count (CBC) with automated 5-part differential, and FACS.
[0416] Genomic DNA was extracted from the blood samples, and DNA methylation was assessed using both the HDMA and LDMA platforms and methods described in examples herein. WBC types were quantified by quadratic programming using the corresponding reference data set and only the minimum number of CpG loci (34 for the HDMA, 20 for the LDMA) identified by the crosscheck procedure described in examples herein. Quantities of WBC types measured by the DNA methylation methods were comparable to the results obtained using the gold standard methods (FIG. 54B-D, FIG. 55B-D, and FIG. 56).
[0417] Agreement between the methods herein and the gold standard methods was excellent, with little evidence of systematic bias: the mean difference between each pair of estimates was approximately zero. The standard deviation of the model predictions was estimated by calculating the root mean square error (RMSE) between WBC quantities measured using DNA methylation and WBC quantities measured by each of the gold standard methods (FIG. 57A-C). These standard deviations were low, and the levels of uncertainty were similar to those observed among the gold standard methods themselves (FIG. 57D-F).
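The two agreement statistics above, mean difference (bias) and RMSE, can be computed as in the following Python sketch. It relies on the identity RMSE² = bias² + (population) variance of the differences: when the mean difference is approximately zero, the RMSE approximately equals the standard deviation of the differences. The function name `agreement` and the example percentages are hypothetical and illustrative only; they are not data from these examples.

```python
import numpy as np

def agreement(est_a, est_b):
    """Mean difference (bias) and RMSE between two sets of WBC estimates.

    est_a, est_b: paired estimates of the same quantities (e.g. percent
    of each WBC type per sample) from two methods.
    """
    a = np.asarray(est_a, dtype=float)
    b = np.asarray(est_b, dtype=float)
    diff = a - b
    return float(diff.mean()), float(np.sqrt(np.mean(diff ** 2)))

# Illustrative percentages for three WBC types from two methods.
bias, rmse = agreement([60.0, 30.0, 10.0], [58.0, 31.0, 11.0])
```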
Example 60
Storage Conditions Do Not Affect WBC Estimates Obtained Using DNA
Methylation
[0418] These examples analyzed whether the stability of DNA allows the methods and arrays herein to overcome many limitations of previous WBC quantification methods. The DNA methylation methods and arrays used herein do not require fresh blood or intact cell membranes. Thus, these methods and materials are useful for analyzing samples that were previously precluded from immunological assessment, such as archived blood samples stored in hospitals and laboratories, or blood samples collected in an anticoagulant not compatible with a particular method.
[0419] These examples also analyzed whether the choice of blood anticoagulant and/or variations in storage temperature alter WBC quantification by the DNA methylation methods herein. Six venous whole blood samples were collected from disease-free human donors and were contacted with an anticoagulant (citrate, heparin, or EDTA). DNA was extracted from fresh samples and also from samples stored at room temperature, 4 °C, or -80 °C for at least 24 hours prior to DNA extraction (FIG. 58). DNA samples were analyzed using the LDMA platform to assess DNA methylation and to generate a target data set for evaluating the effects of blood storage conditions. Seven WBC types of interest were quantified in each of these target samples by performing a projection by quadratic programming using the LDMA reference set and the minimum of 20 CpG loci established by the crosscheck procedure described above.
[0420] It was observed that the storage conditions examined did not
alter WBC subset quantities measured in human blood by DNA
methylation (FIG. 59).
SEQUENCE LISTING
Number of SEQ ID NOs: 140
SEQ ID NO: 1 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
gccagcccca gcaaacggtt ttacttcttc tcagtcctgt agaggctgag gtgtcaagga 60
cgggaccctg ttgctgactg ctcaagagga ggcaagctgg atctctctta tagagtttcc 120
at 122
SEQ ID NO: 2 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
tctccactga acctggtctg tgtcctggag tgcagggcct gagcctcggt gctcttggta 60
cgtgaagtgc ctggaacagc ttcacgctcc agtaactgtg aacccctggg gcctcacatg 120
cc 122
SEQ ID NO: 3 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
gatcatgtgt ttgtggcaac ttcctctgtg ggcttttgcc caggtctgtc cccaagcata 60
cgatggccaa aacttctgca ccagagcagc atcctgtgta acacagtcag gtccagcagt 120
ta 122
SEQ ID NO: 4 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
agcttctctg ctagtggcca caggcagagc ctgcctttga tgaggttaca gaggcagcca 60
cgcctgtgct ctttggactc tggtgggtgg ggaggcttcc tggtcactaa ccgctcaaca 120
tc 122
SEQ ID NO: 5 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
cacccacggg gccaggctgg cacaacgccc caacgctgca atcctgggaa gagtcacacg 60
cgccctcccg ggacccacgt gaccatcaag ggagtgtgga ggacacatcc ctcgggggtg 120
ac 122
SEQ ID NO: 6 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
tggcttcctc tctgaggatg cagctgctgc ctccctgggc ctggctgctt gcatcctgtg 60
cgcctggctt ccacagctcc accgcagagt ctgagctcca aaagagaggc acaagggggt 120
ct 122
SEQ ID NO: 7 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
tttttgtagc tacagataga atagagatct ttgtctattt tgttcacgaa gtactgcaag 60
cgccttgagc tgtacctggc acatctttgt tgctcagcaa agagtggttg gataaacgaa 120
cg 122
SEQ ID NO: 8 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
accaaatgag agagccattt ggggatacaa atcattccca acccagagct acagaagaca 60
cgtgtccaca aacacaactg tgcacaaact cactgggcaa tcctgttcaa atatttagca 120
ag 122
SEQ ID NO: 9 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
tctctacgga cttcctcggt gatacccact cgtccatctt cgatgctaag gccggcattg 60
cgctcaatga caatttcgtg aagctcattt catggtaagg gggaaggagc tggagactta 120
ga 122
SEQ ID NO: 10 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
cgcgccagct gtcaggcggt ttctagcctc gcttcggtta ttttaagctg atgagcctga 60
cgcatctcat cactaatatc agcagtttca tttctcctgt tttccattcg ctgtaataaa 120
at 122
SEQ ID NO: 11 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
catttacagg aaaggaacca aggctcagag aagaaatgtg ctccgttcac cgtgtgtaag 60
cggacgaccc agaattggaa tggttctttg tggctccaaa gtctgatttc aacacacccc 120
tt 122
SEQ ID NO: 12 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
gggagacctg acctgaaagg acccccttca agtgataggg cagagcacag attgcaaaaa 60
cgcatattaa gaaatcactc ttggccgggc gcggtggctc atgcctgtaa tcccagcact 120
tt 122
SEQ ID NO: 13 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
tcccaggcag ccctgctggc ccctaaggac atagagtacc tgcttctgag agggctgcca 60
cggtggccac ctgtgaagcc tgtcacccag aactggatgg tacctgactt tcttcataga 120
cc 122
SEQ ID NO: 14 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
tttaacattc ctaacagact acattttgca aaagaataac aacgaagggg acttgtccta 60
cgagctagca cacatggtgt aaaccggagt aattacaagg gtgtagcagg ggtgtgcaga 120
aa 122
SEQ ID NO: 15 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
ggcctccctt gaccactcca cgctgtccga gagctcaaag gccctcacgg tatacactca 60
cgctgggcat ccagtccaca tgggacccac agccctgaat ggccccaacc acgtgagtgt 120
gg 122
SEQ ID NO: 16 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
cctcctcccc acggaggcct aggcatcagc cccctccctc atcctttcca gagtttggga 60
cgggatgtct tcagttgcca cggccacagt atggcttccc ctacagttag gctacagttg 120
gg 122
SEQ ID NO: 17 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
agggaagata cggctattat agaagtgact cctcccagga actgtgcttc cgggattgga 60
cgcagggcct caggcatttt gcgtgtccac agtcacaact gtgtgaatat aggtgtgtca 120
ta 122
SEQ ID NO: 18 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
gtcggcatcg gtgcgtgttg gtcaggggtc tgggcgggtg tctgatgcgg cctggcctct 60
cgcccgcagt tctctcggca ctggtgactg gcgagagcct ggagcggctt cggagagggc 120
ta 122
SEQ ID NO: 19 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
tggcttggtg gttatagcag tggaagtgtt gaaagtggct tgataatgaa tatattttaa 60
cgatgaagcc gacagaattt gtggataaat cacaggtgaa ttttggatga aaaaaagaag 120
ag 122
SEQ ID NO: 20 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
cccaaaggaa gattccactt ggcgcaggca tcaggagtta tccaatgtga cttccaaaga 60
cgccttgaaa aggttttctg ctaacgaaac tcttcttagt caaatgagga accaaaagca 120
ga 122
SEQ ID NO: 21 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
gcccggggag ctaggggaca tgtgtgagca tgaggcctcc attgacctct ccgcctacat 60
cgagtctggg gaagagcagc ttctctccga tctctttgcc gtgaagccag cgcctgaggc 120
ca 122
SEQ ID NO: 22 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
tgagtatgtc tgggaaacac aagagtccca gaagattgag tggcctgcag atacgcatta 60
cggggtgtac atttgtattg tggagaagaa aagattttgt gccactctct tcagcctcca 120
ct 122
SEQ ID NO: 23 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
cagctgcagt ggaggcggcg gtgggaaagc ctggcccaca cacgtggtct gtagcgacag 60
cggcttggaa gtgctctacc agagttgcgg taagcccttg cagtacaccc atgtgtgttt 120
at 122
SEQ ID NO: 24 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
cagccccttg cccaaaccag ttctgcagag agcccaggcc cggctgttgc aggaaccttg 60
cgccaacctc catttccagg gaaaagctcc gttccccgac aaggacgatc ctctccggct 120
tc 122
SEQ ID NO: 25 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
tttctgtgct gggaattccc ttagctccag cctccactgg gcagtttatt atcttaattc 60
cgcatgaaga gtgtcctccc tcaccctcca ccctgccctg gaccagacct ccagccgcga 120
ca 122
SEQ ID NO: 26 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
gagctaaggt caaagaaaga accctttaaa taaagggccc acactggctg ccagggagtg 60
cgcaggactg gcaagaggga agccgggctg ctccacgcct ttcacgcctt ccacctcctg 120
cg 122
SEQ ID NO: 27 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
ccgtcctcgt agtaaatgat gtcttggggc tgtggcccga gctgcctcag gtagatccca 60
cgcaggcccc cgctggtgga gcaggtgatg ttgacggagg ctcccacggg gacagtcgtg 120
ca 122
SEQ ID NO: 28 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
ggagagacac aaggcctggg agccgctttc ctggcctgcc gtgcagctga ggcactggca 60
cgcagcctaa gccaggcaca cttgcccatg ccctggaatg gagagccagt gacccagagt 120
ag 122
SEQ ID NO: 29 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
gctgatgctg ggggtgactc tgccaaacag ctactggcga gtgtccactg tgcacgggaa 60
cgtcatcacc accaacacca tcttcgagaa cctctggttt agctgtgcca ccgactccct 120
gg 122
SEQ ID NO: 30 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
gaggaaagga tcatggtagc cccttctgcg gggagcacac aacagtcttc agttcttctg 60
cggtgctcta ctcacaaaaa cacatctttc aactgaaatc atagttcgct caagatgttt 120
ct 122
SEQ ID NO: 31 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
tggcccggag ggcacccggg cagagacgga agaaattgca cgtgagcgtt tgtgtgcata 60
cgtgtgcctg tccatgtgtg cacacacttg tgcttgtgag tctctgtgtg cccatgcata 120
tg 122
SEQ ID NO: 32 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
aagtgtgggt aggagccggc cgctggcccc gctctgggct agacggtggg gacatactgg 60
cgggcaaaaa cagccctgtg cctgctctgc agctatgggg aaggaatatc tgttcatggc 120
ag 122
SEQ ID NO: 33 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
ttggtggcag cctcctaacc ttagccagaa ctattcctgc taagttcttg cacgagttga 60
cgctttgctg agcacagccg atacccagcc tttgcagcaa agatccttgg tcaaagggat 120
aa 122
SEQ ID NO: 34 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
ccagaaagtg atcctgcaga tggtgccgat tcatgagtgc ggctagtgag tccgatggag 60
cgcctcgcct agtgaatgct ccagcaagga ggtgtgtgtc tgtgtggtga acgtgtggtt 120
cc 122
SEQ ID NO: 35 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
gacaatgcta cttcagtttg gagcacaaac atatgatcag cacatggaaa tgtggtaatt 60
cggatgcatt cgtgattgca acagattgaa gaaattagac cagacaaaga gtgtttttag 120
ag 122
SEQ ID NO: 36 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
ctccagccag gacccttcac aacctgattg ctaagcttgt tagcatagag gtggtctaac 60
cgctacatga gccgctcacc cctgacaacc acactgttgt aatgtatcag aaatgttgat 120
ta 122
SEQ ID NO: 37 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
cacgcggaaa caggtaaaaa tcattttgct tttattttgc attcaacaag caagttatta 60
cggaacagca gttatgggcc aggcatacct cccagagctg ggaacacagt ggggacctcc 120
ct 122
SEQ ID NO: 38 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
accttgtgat ccgcccacgt cagcctccca aagtgctggg attacaggcg taagccacag 60
cgcccagcct cgctgttctt atcttggcag cagattccga atgtcggctg gtgcccctgt 120
ca 122
SEQ ID NO: 39 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
aggctggatg gtgacacttc cacacccttg agtgggactg ccttgtgctg ctctgggatt 60
cgcacccagc ttggactacc cgctccacgg gccccaggaa aagctcgtac agataaggtc 120
ag 122
SEQ ID NO: 40 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
cagagttagc aaacctccat gctgactcta caaggtaatt tgccctgccg tgtggacaaa 60
cgctgcagat ctcatggaga gggcttgggc tctgccatgt gccatctgtg tgcaccaggg 120
ca 122
SEQ ID NO: 41 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
gcccatggct gggcagagaa atgtcaactc ctgggcttgc ctgggcactg atgcagcatt 60
cgcctgaggg caggaaacat ctgcctcaga aagtcacttg gggtgggaga aaggaaatga 120
tg 122
SEQ ID NO: 42 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
attgcccagg tcctcagcta caaggaagct gtgcttcgtg ctatagatgg catcaaccag 60
cggtcctcgg atgctaacct ctaccgcctc ctggacctgg accccaggcc cacgatggtg 120
ag 122
SEQ ID NO: 43 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
gagaaacaac ctgcagtagg ctgggtcaca gaggcaatct gtgatttttt ggtcaggaca 60
cggaaacaaa tctcagttgg ggtatatgtg gacaaatgaa actggaaaca aaggttgctc 120
ct 122
SEQ ID NO: 44 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
acgcccgtcc tagtcccatc tcaggtgcgc acttgctgtg tgactttggg cccctctctg 60
cgctgcagtc agactccaaa gtcaggaacg tgagggctac catctctcaa gacatttcag 120
ct 122
SEQ ID NO: 45 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
gtgactcagg tggcaagtgc agtggggagc ccccagcttt cccttcttgg atgcttcatt 60
cgcttggggc caccaaatat cgactgagga ctttctgccc atgccaggct ctgctctcgg 120
tg 122
SEQ ID NO: 46 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
ctaggaaact tcttccatat atcataaaca gagaccagta ttacaatact tcacccactg 60
cgccaatttg gctttcatgt ctgtttcctg tgtcgatcac aacatcctag acagcccaaa 120
ca 122
SEQ ID NO: 47 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
gctctccagg gggctgcgag gggctcatgg gatccccatg ggccaaggcc aggtggttga 60
cgtgagtttt tgtcagtgcg aaaaccccag ccctcccttt atcaccctgc agacgtctag 120
gg 122
SEQ ID NO: 48 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
tgcccccaag caggccggga ctgccaggct ttacatcaga gaactgagtt tcagttacca 60
cggtgaaggc tgacagcaca gagcacagtt ccgtgcaaat caagacacat ttcccaagtc 120
cc 122
SEQ ID NO: 49 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
ccacccttaa agtcctcaga aggtgggaac tgaactggca caggatggga accggctgtg 60
cgctggccac ttgattttgc cagctgccct gtaattcagc tggtgaggaa actgaggcac 120
ag 122
SEQ ID NO: 50 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
tgatagccaa ttaggcttgg ggacctgcat gcccagcccc tgccttcctg gagcccatga 60
cgcaggggcc atccctgacc acagcagatt tcatcgagta cttgcttgtt gagtggtgga 120
gc 122
SEQ ID NO: 51 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
ctccagaggg gatggagagg cgcgactgtg ggagctggaa ggggcaccac ccggcaattg 60
cgggataaag caaatgctgc acacagagtg tgaaacttaa cctggttgag aattttcggc 120
ac 122
SEQ ID NO: 52 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
ctgacctcac cacccaccag ggaggtgggt cttattctgg gcatcgtgcc aagttcttag 60
cggggccctc tagaatctct aaagcaaatc aggctgaaga ggggaaaacc agcaggggga 120
gg 122
SEQ ID NO: 53 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
gggttgttac cagcttttag ggaccagaaa acccaggtct gtctcacctg gacatgtgtc 60
cgcagcctgg gcaggcaggt tcttgatatg caggaacaag actagcagga cagcgagccc 120
cc 122
SEQ ID NO: 54 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
tcccttgcca gcttccctgg tgaccagcca ggacccaaat cacctgggtc ccctccccta 60
cgccctcctg caaagaggaa gtgctcatga acttcggccc tgccagggcc ttatcagagc 120
cc 122
SEQ ID NO: 55 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
aggctgagaa ggagcagagc agggggcagc cacatggctc tgccttcccg gctcctcgtc 60
cgcctgatct gcaaccagtg gcaaatgcag atcccagatg cactctggaa gttctgcctg 120
ag 122
SEQ ID NO: 56 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
accaactggg aggaagctta aatagccttg tctcaattga ggtctggttt gatggccaaa 60
cgagtttgct acagaatgct cagaattgca agcaaggggt gtagagctgc ctctcttctg 120
tc 122
SEQ ID NO: 57 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
agtttgtttc tcaagcacac tgggagggtg agtggtgtag tccaggcctg aagatgaaat 60
cgctgataga catcaggtga caggaaatca gtagcttctg ctaccttggg cttcgctcca 120
at 122
SEQ ID NO: 58 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
aagcccagct gctggctgat aaatatttta tcactgctca cagagcagtc cccaggaagg 60
cgcctgcatc ctccaagccc acagagcacc ccttcctgcc cggacagaag gaactggcca 120
gg 122
SEQ ID NO: 59 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
aaggcccttt ggagtaactg cagcaatgag tgcccgggct gtgcttggag taccagtgct 60
cgcccggggc tatactgaat gagtaagcag ccccgtctgc ttttgctgtg caaaggtaag 120
gg 122
SEQ ID NO: 60 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
gaaatctgcc taatgagggt cgaggccagc acacacaggg acctatttgc agtaaaacaa 60
cgtggggtga cgcctaagaa atagacaaca ttaacacaaa gggagcctac tacgtagcac 120
at 122
SEQ ID NO: 61 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
tggccagcag cggctacaga gccgtcacta tggggaggga caggacttga ggggttgcct 60
cggtccacct cactggagaa tgggcagagt ttatggagtc tgaaccacct ggtctccagg 120
cc 122
SEQ ID NO: 62 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
tttctgaccc cgaaggctgt ggtgttcacc tggacagcag tagcttccca gtaaggcaca 60
cgccacgacg cgcaatatta tgcggccctt taggaggacg ttgccgaatg gtgtgtatcg 120
ac 122
SEQ ID NO: 63 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
agtgggccag cagtcgggcc agagtccagc tcagcaactc cgggttacag gcagcccagg 60
cgggcctagc caccggcagc tgcactcaga ggccactgtg tcctggctga gctcatctgc 120
ct 122
SEQ ID NO: 64 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
cggcaaagcc acgctcacct tcctgaaccg agccgaggat tacaccctta ccatgcccta 60
cgcccactgc aaaggtgaga ggctcagcca cacactccga gggcagagcc aggctctgtg 120
ag 122
SEQ ID NO: 65 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
gagctgcctc ggcgcacggc cactggcccg gctccaggcg gcgcagtctg gctgatgaca 60
cgagcgctgt tctcaccagc tgcctgagcc agtcagatgg aaaagtaatc ctatttgtgc 120
tt 122
SEQ ID NO: 66 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
taaagacaac aatttcacag ctctgatgat cagaaatgat gtaatggcca caggcggctc 60
cgcctgcgtc atccatgatt tcatcacaca cctcgggagg ctcagggtga cagacagtgc 120
at 122
SEQ ID NO: 67 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
aaaccaaagg gacttggagt gcagatggca tccttcggtt cttccagaca agctgcaaga 60
cgctgaccat ggccaaggta accggcttcc cctcctattg ctcaaaggat gcagtctaca 120
gc 122
SEQ ID NO: 68 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
agaagtggac acagcaggtc ttggctctta agatctgcag ttgtgagttc cttttgcaat 60
cgctgtaggt cattgtgcaa cctgctggtc tctggactcc tgatttctag acatctataa 120
aa 122
SEQ ID NO: 69 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
cccttcaaag cccgccttct tgccgtgtga tgctgcctgg gccagcaggg caggtcacca 60
cgctgtctct tcaaagcagc tcgctcatgc ccacagcgct gggcacaagg gcagccacga 120
gc 122
SEQ ID NO: 70 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
tcgggtacaa gagcatgaat ttgggcctcc ccaacatctg cagtgcaaaa tatttaacaa 60
cgggtgtggc acagcctctg accaacagcc agaacacaca caagccacac acagccatgc 120
ct 122
SEQ ID NO: 71 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
ttttccggtt tttgatcttt cttctgctta gtcggcgaac tggggtctgg ttccctctct 60
cgctctctcc tctggtccct cccttctccc acagcctctc ctccgtcccc gccccagtgc 120
cc 122
SEQ ID NO: 72 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
ccaggcccag acgagcgatt ggcggaggcc ggtcccgtga ccacgaattc cctgtaattt 60
cgctggagtc ctgggtttaa tagagagagt ccccatacgc ttgtatttat cagcaatata 120
ca 122
SEQ ID NO: 73 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
ctcacatgac ctggattgga acttggctca gccactgact agccagacaa actcaaataa 60
cgtacacagc ttctcagcac ctcaccccct cattcataaa aaacagggac aatggtaccc 120
ac 122
SEQ ID NO: 74 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
ctgtctgcat gcactggaat tgaggtctgt ggatgtgcct ttcctgacaa tatttcttca 60
cgcttgctgc ccactggtgc tgtgagggca caataacgaa tgtttacttt gcccttgcac 120
tc 122
SEQ ID NO: 75 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
tgagctaatt aatactagta atctacctgc aacagctgca gcgaggactc tgtgaggtca 60
cgtgggaagg agcttggcac agtgtcaagg acgcctcctt gaactgagct taggactctg 120
ga 122
SEQ ID NO: 76 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
ggaagtggac cctcgggtgc caggtttgca ggaatccact tccttgatgt cagtccttgg 60
cgccaagcct cagttgggta tcagaagcct tgctccatca gagatggggt cccagccatc 120
ag 122
SEQ ID NO: 77 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
ctgccagcaa cagcagtgac cttctggggc gggtcctgcc tggctggggt tcctctttct 60
cgctcctggg tcgagccccc actcccaggc tgcgcctccc tcttttctgg agaggtatct 120
tt 122
SEQ ID NO: 78 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
tgtgggtcct ggtctgcggt ctctcttgcc cctctgagtc cacgccctgc agggaggtta 60
cgctttgtga tgtaattcag cacctgtgtc ttgtcccagt gaggacatct cccacttgcc 120
ag 122
SEQ ID NO: 79 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
ctttggttac cgaaaacagc ccggctggga ctgctgggct gggaacttag ctaagcagtg 60
cggaggctga accccaccat ctctgggatc cgcagcaaat cagaagcccc cacccacgat 120
aa 122
SEQ ID NO: 80 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
tgggggtgcc tggagtttgg ctggggctgg gtgcccagtg ggcgggcaca ggccccttga 60
cgtggctgtg gcctagctgg cagcctcgtc cttcctctcc gctaggcggg cactggagct 120
tt 122
SEQ ID NO: 81 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
cttcccagct tcctctgcct ggattcttag aggcctgggg tcctagaacg agctggtgca 60
cgtggcttcc caaagatctc tcagataatg agaggaaatg cagtcatcag tttgcagaag 120
gc 122
SEQ ID NO: 82 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
aggggcggga caggggtagg gtggcgcggt ggctgggcgc aaaggtcccg cagtgggcca 60
cgcaggcacc gggctgacct ggcaaaactt tggcgtctct gaaaacctct ggtaaccagc 120
tc 122
SEQ ID NO: 83 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
tgcctgccag gactgataag gggccctcct agggctccca caaacggttt atcggtttat 60
cgctggggga cagcctgcag gcttcaggag gggacacaag catggagcgg ctttggggtc 120
ta 122
SEQ ID NO: 84 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
cctccgcgac tacctgagct ccttcccctt ccagatttga ccggcagcgc ccgccgtgca 60
cgcagcatta actgggatgc cgtgttattt tgttattact tgcctggaac catgtgggta 120
cc 122
SEQ ID NO: 85 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
gacacacact ataatgatcc tttctatact ccttagccat tgaacgagag atcaaataaa 60
cgcagtaaca tccctcagat gcatgatttg agcatggctt ggaaagtatt agcagttacc 120
tg 122
SEQ ID NO: 86 (122 nt; DNA; Artificial Sequence; the sequence has been designed and synthesized)
gttgaacagg ccagttactg ggatgcagtt ctgcgtttcc cttgggtctc accttaacat 60
cgctcgctga agtgtgccca gattacagag cgggcaaagg
gaagcagtgg ttttgctcac 120ag 12287122DNAArtificial SequenceThe
sequence has been designed and synthesized. 87ttggtttttt tcaagagatg
agaaaagaga tgtgccagtt gtgttgccaa atcacagtga 60cgggccctgg tccagaaaag
attttcatgt tacacaattg caggcttctg attttttttt 120ct
12288122DNAArtificial SequenceThe sequence has been designed and
synthesized. 88cctggggcaa ggccccttcc tgttcgggtg ttggctccgg
aacttggttc tggggctgac 60cgctgctggg gccccactta gtctgagtct gcagttaact
ccgtgacccc aaggcatcca 120ag 12289122DNAArtificial SequenceThe
sequence has been designed and synthesized. 89tgcttctcta ttctgttctc
agtttcggcc acaggcctgg caacatcctt gactccttcg 60cgccccttgt ccaagactcg
gtgctgctgt cccatgtgtt tggtgtcact ctcgtgctct 120gg
12290122DNAArtificial SequenceThe sequence has been designed and
synthesized. 90attggagccg gtggccacgg ccaaggagga tgctggcctg
gaaggggact tcagaagcta 60cggggcagca gaccactatg ggcctgaccc cactaaggcc
cggcctgcat cctcatttgc 120cc 12291122DNAArtificial SequenceThe
sequence has been designed and synthesized. 91tgaaggaagg actcacgctg
ctgggcgctg atcctctgac tcagacacag ccctggaaga 60cgggagtaat gagacctgtt
gcctcccagg cacaccgtga tcccattccc cttccacgcc 120ag
12292122DNAArtificial SequenceThe sequence has been designed and
synthesized. 92tggagccagt ctagctgctg cacaggctgg ctggctggct
ggctgctaag ggctgctcca 60cgcttttgcc ggaggacaga gactgacatg gaacagggga
agggcctggc tgtcctcatc 120ct 12293122DNAArtificial SequenceThe
sequence has been designed and synthesized. 93agggcagctc tcacccaggc
tgatagttcg gtgacctggc tttatctact ggatgagttc 60cgctgggaga tggaacatag
cacgtttctc tctggcctgg tactggctac ccttctctcg 120ca
12294122DNAArtificial SequenceThe sequence has been designed and
synthesized. 94ctgcctccca gcctctttct gagggaaagg acaagatgaa
gtggaaggcg cttttcaccg 60cggccatcct gcaggcacag ttgccgatta caggtagggc
cgacgtgtcg acggcaggga 120ac 12295122DNAArtificial SequenceThe
sequence has been designed and synthesized. 95ttggattatt agaagagaga
ggtctgcggc ttccacaccg tacagcgtgg tttttcttct 60cggtataaaa gcaaagttgt
ttttgatacg tgacagtttc ccacaagcca ggctgatcct 120tt
12296122DNAArtificial SequenceThe sequence has been designed and
synthesized. 96tcgatgaagc ccggcgcatc cggccgccat gacgtcaatg
gcggaaaaat ctgggcaagt 60cgggggctgt gacaacaggg cccagatgca gaccccgata
tgaaaacata atctgtgtcc 120ca 1229726DNAArtificial SequenceThe
sequence has been designed and synthesized. 97ttgtatgtat gtgagtgtgg
gagaga 269822DNAArtificial SequenceThe sequence has been designed
and synthesized. 98tttcttccac cccttctctt cc 229918DNAArtificial
SequenceThe sequence has been designed and synthesized.
99ctccccctct aactctat 1810021DNAArtificial SequenceThe sequence has
been designed and synthesized. 100ggatggttgt ggtgaaaagt g
2110124DNAArtificial SequenceThe sequence has been designed and
synthesized. 101caaaaactcc ttttctccta acca 2410219DNAArtificial
SequenceThe sequence has been designed and synthesized.
102ccaaccacca ctacctcaa 1910323DNAArtificial SequenceThe sequence
has been designed and synthesized. 103gggttttgtt gttatagttt ttg
2310421DNAArtificial SequenceThe sequence has been designed and
synthesized. 104ttctcttcct ccataatatc a 2110519DNAArtificial
SequenceThe sequence has been designed and synthesized.
105caacacatcc aaccaccat 1910626DNAArtificial SequenceThe sequence
has been designed and synthesized. 106ttgtatgtat gtgagtgtgg gagaga
2610722DNAArtificial SequenceThe sequence has been designed and
synthesized. 107tttcttccac cccttctctt cc 2210828DNAArtificial
SequenceThe sequence has been designed and synthesized.
108catctgggcc ctgttgtcac agcccccg 2810924DNAArtificial SequenceThe
sequence has been designed and synthesized. 109cgacaccacg
gaggaagaga agag 2411019DNAArtificial SequenceThe sequence has been
designed and synthesized. 110atggcggccg gatgcgccg
1911121DNAArtificial SequenceThe sequence has been designed and
synthesized. 111ggatggccgc ggtgaaaagc g 2111224DNAArtificial
SequenceThe sequence has been designed and synthesized.
112cggttaggag aaaaggagtc tctg 2411319DNAArtificial SequenceThe
sequence has been designed and synthesized. 113ctgaggcagc ggtggccgg
1911422DNAArtificial SequenceThe sequence has been designed and
synthesized. 114ggaagagaag gggtggaaga aa 2211518DNAArtificial
SequenceThe sequence has been designed and synthesized.
115atagagttag agggggag 1811622DNAArtificial SequenceThe sequence
has been designed and synthesized. 116attaggttgg tagaatttga gt
2211717DNAArtificial SequenceThe sequence has been designed and
synthesized. 117cccattcccc ttccaca 1711818DNAArtificial SequenceThe
sequence has been designed and synthesized. 118ctcaccaaca caaaacaa
18119726DNAArtificial SequenceThe sequence has been designed and
synthesized. 119atgcacgagg aagaaatata cacctctctt cagtgggata
gcccagcacc agacacttac 60cagaaatgtc tgtcttccaa caaatgttca ggagcatgct
gtcttgtgat ggtgatttca 120tgtgttttct gcatgggatt attaacagca
tccattttct tgggcgtcaa gttgttgcag 180gtgtccacca ttgcgatgca
gcagcaagaa aaactcatcc aacaagagag ggcactgcta 240aactttacag
aatggaagag aagctgtgcc cttcagatga aatattgcca agccttcatg
300caaaactcat taagttcagc ccataacagc agtccttgtc caaacaattg
gattcagaac 360agagaaagtt gttactatgt ctctgaaatt tggagcattt
ggcacaccag tcaagagaat 420tgtttaaagg aaggttccac gctgctacaa
atagagagca aagaagaaat ggattttatc 480actggcagct tgaggaagat
taaaggaagc tatgattact gggtggggtt gtctcaggat 540ggacacagcg
gacgctggct ttggcaagat ggctcctctc cttctcctgg cctgttgcca
600gcagagagat cccagtcagc taaccaagtc tgtggatacg tgaaaagcaa
ttcccttctt 660tcgtctaact gcagcacgtg gaagtatttt atctgtgaga
agtatgcgtt gagatcctct 720gtctga 7261204928DNAArtificial SequenceThe
sequence has been designed and synthesized. 120aaacaggaag
tcagtcagtt aagctggtgg cagcagccga ggccaccaag aggcaacggg 60cggcaggttg
cagtggaggg gcctccgctc ccctcggtgg tgtgtgggtc ctgggggtgc
120ctgccggccc ggccgaggag gcccacgccc accatggtcc cctgctggaa
ccatggcaac 180atcacccgct ccaaggcgga ggagctgctt tccaggacag
gcaaggacgg gagcttcctc 240gtgcgtgcca gcgagtccat ctcccgggca
tacgcgctct gcgtgctgta tcggaattgc 300gtttacactt acagaattct
gcccaatgaa gatgataaat tcactgttca ggcatccgaa 360ggcgtctcca
tgaggttctt caccaagctg gaccagctca tcgagtttta caagaaggaa
420aacatggggc tggtgaccca tctgcaatac cctgtgccgc tggaggaaga
ggacacaggc 480gacgaccctg aggaggacac agtagaaagt gtcgtgtctc
cacccgagct gcccccaaga 540aacatcccgc tgactgccag ctcctgtgag
gccaaggagg ttcctttttc aaacgagaat 600ccccgagcga ccgagaccag
ccggccgagc ctctccgaga cattgttcca gcgactgcaa 660agcatggaca
ccagtgggct tccagaagag catcttaagg ccatccaaga ttatttaagc
720actcagctcg cccaggactc tgaatttgtg aagacagggt ccagcagtct
tcctcacctg 780aagaaactga ccacactgct ctgcaaggag ctctatggag
aagtcatccg gaccctccca 840tccctggagt ctctgcagag gttatttgac
cagcagctct ccccgggcct ccgtccacgt 900cctcaggttc ctggtgaggc
caatcccatc aacatggtgt ccaagctcag ccaactgaca 960agcctgttgt
catccattga agacaaggtc aaggccttgc tgcacgaggg tcctgagtct
1020ccgcaccggc cctcccttat ccctccagtc acctttgagg tgaaggcaga
gtctctgggg 1080attcctcaga aaatgcagct caaagtcgac gttgagtctg
ggaaactgat cattaagaag 1140tccaaggatg gttctgagga caagttctac
agccacaaga aaatcctgca gctcattaag 1200tcacagaaat ttctgaataa
gttggtgatc ttggtggaaa cagagaagga gaagatcctg 1260cggaaggaat
atgtttttgc tgactccaaa aagagagaag gcttctgcca gctcctgcag
1320cagatgaaga acaagcactc agagcagccg gagcccgaca tgatcaccat
cttcatcggc 1380acctggaaca tgggtaacgc cccccctccc aagaagatca
cgtcctggtt tctctccaag 1440gggcagggaa agacgcggga cgactctgcg
gactacatcc cccatgacat ttacgtgatc 1500ggcacccaag aggaccccct
gagtgagaag gagtggctgg agatcctcaa acactccctg 1560caagaaatca
ccagtgtgac ttttaaaaca gtcgccatcc acacgctctg gaacatccgc
1620atcgtggtgc tggccaagcc tgagcacgag aaccggatca gccacatctg
tactgacaac 1680gtgaagacag gcattgcaaa cacactgggg aacaagggag
ccgtgggggt gtcgttcatg 1740ttcaatggaa cctccttagg gttcgtcaac
agccacttga cttcaggaag tgaaaagaaa 1800ctcaggcgaa accaaaacta
tatgaacatt ctccggttcc tggccctggg cgacaagaag 1860ctgagtccct
ttaacatcac tcaccgcttc acgcacctct tctggtttgg ggatcttaac
1920taccgtgtgg atctgcctac ctgggaggca gaaaccatca tccagaaaat
caagcagcag 1980cagtacgcag acctcctgtc ccacgaccag ctgctcacag
agaggaggga gcagaaggtc 2040ttcctacact tcgaggagga agaaatcacg
tttgccccaa cctaccgttt tgagagactg 2100actcgggaca aatacgccta
caccaagcag aaagcgacag ggatgaagta caacttgcct 2160tcctggtgtg
accgagtcct ctggaagtct tatcccctgg tgcacgtggt gtgtcagtct
2220tatggcagta ccagcgacat catgacgagt gaccacagcc ctgtctttgc
cacatttgag 2280gcaggagtca cttcccagtt tgtctccaag aacggtcccg
ggactgttga cagccaagga 2340cagattgagt ttctcaggtg ctatgccaca
ttgaagacca agtcccagac caaattctac 2400ctggagttcc actcgagctg
cttggagagt tttgtcaaga gtcaggaagg agaaaatgaa 2460gaaggaagtg
agggggagct ggtggtgaag tttggtgaga ctcttccaaa gctgaagccc
2520attatctctg accctgagta cctgctagac cagcacatcc tcatcagcat
caagtcctct 2580gacagcgacg aatcctatgg cgagggctgc attgcccttc
ggttagaggc cacagaaacg 2640cagctgccca tctacacgcc tctcacccac
catggggagt tgacaggcca cttccagggg 2700gagatcaagc tgcagacctc
tcagggcaag acgagggaga agctctatga ctttgtgaag 2760acggagcgtg
atgaatccag tgggccaaag accctgaaga gcctcaccag ccacgacccc
2820atgaagcagt gggaagtcac tagcagggcc cctccgtgca gtggctccag
catcactgaa 2880atcatcaacc ccaactacat gggagtgggg ccctttgggc
caccaatgcc cctgcacgtg 2940aagcagacct tgtcccctga ccagcagccc
acagcctgga gctacgacca gccgcccaag 3000gactccccgc tggggccctg
caggggagaa agtcctccga cacctcccgg ccagccgccc 3060atatcaccca
agaagttttt accctcaaca gcaaaccggg gtctccctcc caggacacag
3120gagtcaaggc ccagtgacct ggggaagaac gcaggggaca cgctgcctca
ggaggacctg 3180ccgctgacga agcccgagat gtttgagaac cccctgtatg
ggtccctgag ttccttccct 3240aagcctgctc ccaggaagga ccaggaatcc
cccaaaatgc cgcggaagga acccccgccc 3300tgcccggaac ccggcatctt
gtcgcccagc atcgtgctca
ccaaagccca ggaggctgat 3360cgcggcgagg ggcccggcaa gcaggtgccc
gcgccccggc tgcgctcctt cacgtgctca 3420tcctctgccg agggcagggc
ggccggcggg gacaagagcc aagggaagcc caagaccccg 3480gtcagctccc
aggccccggt gccggccaag aggcccatca agccttccag atcggaaatc
3540aaccagcaga ccccgcccac cccgacgccg cggccgccgc tgccagtcaa
gagcccggcg 3600gtgctgcacc tccagcactc caagggccgc gactaccgcg
acaacaccga gctcccgcat 3660cacggcaagc accggccgga ggaggggcca
ccagggcctc taggcaggac tgccatgcag 3720tgaagccctc agtgagctgc
cactgagtcg ggagcccaga ggaacggcgt gaagccactg 3780gaccctctcc
cgggacctcc tgctggctcc tcctgcccag cttcctatgc aaggctttgt
3840gttttcagga aagggcctag cttctgtgtg gcccacagag ttcactgcct
gtgagactta 3900gcaccaagtg ctgaggctgg aagaaaaacg cacaccagac
gggcaacaaa cagtctgggt 3960ccccagctcg ctcttggtac ttgggacccc
agtgcctcgt tgagggcgcc attctgaaga 4020aaggaactgc agcgccgatt
tgagggtgga gatatagata ataataatat taataataat 4080aatggccaca
tggatcgaac actcatgatg tgccaagtgc tgtgctaagt gctttacgaa
4140cattcgtcat atcaggatga cctcgagagc tgaggctcta gccacctaaa
accacgtgcc 4200caaacccacc agtttaaaac ggtgtgtgtt cggaggggtg
aaagcattaa gaagcccagt 4260gccctcctgg agtgagacaa gggctcggcc
ttaaggagct gaagagtctg ggtagcttgt 4320ttagggtaca agaagcctgt
tctgtccagc ttcagtgaca caagctgctt tagctaaagt 4380cccgcgggtt
ccggcatggc taggctgaga gcagggatct acctggcttc tcagttcttt
4440ggttggaagg agcaggaaat cagctcctat tctccagtgg agagatctgg
cctcagcttg 4500ggctagagat gccaaggcct gtgccaggtt ccctgtgccc
tcctcgaggt gggcagccat 4560caccagccac agttaagcca agccccccaa
catgtattcc atcgtgctgg tagaagagtc 4620tttgctgttg ctcccgaaag
ccgtgctctc cagcctggct gccagggagg gtgggcctct 4680tggttccagg
ctcttgaaat agtgcagcct tttcttccta tctctgtggc tttcagctct
4740gcttccttgg ttattaggag aatagatggg tgatgtcttt ccttatgttg
ctttttcaac 4800atagcagaat taatgtaggg agctaaatcc agtggtgtgt
gtgaatgcag aagggaatgc 4860accccacatt cccatgatgg aagtctgcgt
aaccaataaa ttgtgccttt ctcactcaaa 4920aaaaaaaa
4928121870DNAArtificial SequenceThe sequence has been designed and
synthesized. 121ctcgccctca aatgggaacg ctggcctggg actaaagcat
agaccaccag gctgagtatc 60ctgacctgag tcatccccag ggatcaggag cctccagcag
ggaaccttcc attatattct 120tcaagcaact tacagctgca ccgacagttg
cgatgaaagt tctaatctct tccctcctcc 180tgttgctgcc actaatgctg
atgtccatgg tctctagcag cctgaatcca ggggtcgcca 240gaggccacag
ggaccgaggc caggcttcta ggagatggct ccaggaaggc ggccaagaat
300gtgagtgcaa agattggttc ctgagagccc cgagaagaaa attcatgaca
gtgtctgggc 360tgccaaagaa gcagtgcccc tgtgatcatt tcaagggcaa
tgtgaagaaa acaagacacc 420aaaggcacca cagaaagcca aacaagcatt
ccagagcctg ccagcaattt ctcaaacaat 480gtcagctaag aagctttgct
ctgcctttgt aggagctctg agcgcccact cttccaatta 540aacattctca
gccaagaaga cagtgagcac acctaccaga cactcttctt ctcccacctc
600actctcccac tgtacccacc cctaaatcat tccagtgctc tcaaaaagca
tgtttttcaa 660gatcattttg tttgttgctc tctctagtgt cttcttctct
cgtcagtctt agcctgtgcc 720ctccccttac ccaggcttag gcttaattac
ctgaaagatt ccaggaaact gtagcttcct 780agctagtgtc atttaacctt
aaatgcaatc aggaaagtag caaacagaag tcaataaata 840tttttaaatg
tcaaaaaaaa aaaaaaaaaa 8701229648DNAArtificial SequenceThe sequence
has been designed and synthesized. 122ggtttgtaat gatagggcgg
cagcagcagc agcagcagca gtggtggaac gaggaggtgg 60agaattgaga gcacgatgca
tacacaggtg tttctgagta gtaattagat cgctgtgaag 120gaaaaagcac
acctttgagt tttcacctgt gaacactata gcgctgagag agacagtctg
180aaagcagagg aagacatcga tcagtaacac caagagacac caaagttgaa
agttttgttt 240tctttccctc tgttttattt ttcccccgtg tgtccctact
atggtcagaa agcctgttgt 300gtccaccatc tccaaaggag gttacctgca
gggaaatgtt aacgggaggc tgccttccct 360gggcaacaag gagccacctg
ggcaggagaa agtgcagctg aagaggaaag tcactttact 420gaggggagtc
tccattatca ttggcaccat cattggagca ggaatcttca tctctcctaa
480gggcgtgctc cagaacacgg gcagcgtggg catgtctctg accatctgga
cggtgtgtgg 540ggtcctgtca ctatttggag ctttgtctta tgctgaattg
ggaacaacta taaagaaatc 600tggaggtcat tacacatata ttttggaagt
ctttggtcca ttaccagctt ttgtacgagt 660ctgggtggaa ctcctcataa
tacgccctgc agctactgct gtgatatccc tggcatttgg 720acgctacatt
ctggaaccat tttttattca atgtgaaatc cctgaacttg cgatcaagct
780cattacagct gtgggcataa ctgtagtgat ggtcctaaat agcatgagtg
tcagctggag 840cgcccggatc cagattttct taaccttttg caagctcaca
gcaattctga taattatagt 900ccctggagtt atgcagctaa ttaaaggtca
aacgcagaac tttaaagacg ccttttcagg 960aagagattca agtattacgc
ggttgccact ggctttttat tatggaatgt atgcatatgc 1020tggctggttt
tacctcaact ttgttactga agaagtagaa aaccctgaaa aaaccattcc
1080ccttgcaata tgtatatcca tggccattgt caccattggc tatgtgctga
caaatgtggc 1140ctactttacg accattaatg ctgaggagct gctgctttca
aatgcagtgg cagtgacctt 1200ttctgagcgg ctactgggaa atttctcatt
agcagttccg atctttgttg ccctctcctg 1260ctttggctcc atgaacggtg
gtgtgtttgc tgtctccagg ttattctatg ttgcgtctcg 1320agagggtcac
cttccagaaa tcctctccat gattcatgtc cgcaagcaca ctcctctacc
1380agctgttatt gttttgcacc ctttgacaat gataatgctc ttctctggag
acctcgacag 1440tcttttgaat ttcctcagtt ttgccaggtg gctttttatt
gggctggcag ttgctgggct 1500gatttatctt cgatacaaat gcccagatat
gcatcgtcct ttcaaggtgc cactgttcat 1560cccagctttg ttttccttca
catgcctctt catggttgcc ctttccctct attcggaccc 1620atttagtaca
gggattggct tcgtcatcac tctgactgga gtccctgcgt attatctctt
1680tattatatgg gacaagaaac ccaggtggtt tagaataatg tcagagaaaa
taaccagaac 1740attacaaata atactggaag ttgtaccaga agaagataag
ttatgaacta atggacttga 1800gatcttggca atctgcccaa ggggagacac
aaaataggga tttttacttc attttctgaa 1860agtctagaga attacaactt
tggtgataaa caaaaggagt cagttatttt tattcatata 1920ttttagcata
ttcgaactaa tttctaagaa atttagttat aactctatgt agttatagaa
1980agtgaatatg cagttattct atgagtcgca caattcttga gtctctgata
cctacctatt 2040ggggttagga gaaaagacta gacaattact atgtggtcat
tctctacaac atatgttagc 2100acggcaaaga accttcaaat tgaagactga
gatttttctg tatatatggg ttttgtaaag 2160atggttttac acactataga
tgtctatact gtgaaaagtg ttttcaattc tgaaaaaaag 2220catacatcat
gattatggca aagaggagag aaagaaattt attttacatt gacattgcat
2280tgcttcccct tagataccaa tttagataac aaacactcat gctttaatgg
attataccca 2340gagcactttg aacaaaggtc agtggggatt gttgaataca
ttaaagaaga gtttctaggg 2400gctactgttt atgagacaca tccaggagtt
atgtttaagt aaaaatcctt gagaatttat 2460tatgtcagat gttttttcat
tcattatcag gaagttttag ttatctgtca tttttttttt 2520tcacatcagt
ttgatcagga aagtgtataa cacatcttag agcaagagtt agtttggtat
2580taaatcctca ttagaacaac cacctgtttc actaataact tacccctgat
gagtctatct 2640aaacatatgc attttaagcc ttcaaattac attatcaaca
tgagagaaat caccaacaaa 2700gaagatgttc aaaataatag tcccatatct
gtaatcatat ctacatgcaa tgttagtaat 2760tctgaagttt tttaaattta
tggctatttt tacacgatga tgaattttga cagtttgtgc 2820attttcttta
tacattttat attcttctgt taaaatatct cttcagatga aactgtccag
2880attaattagg aaaaggcata tattaacata aaaattgcaa aagaaatgtc
gctgtaaata 2940agatttacaa ctgatgtttc tagaaaattt ccacttctat
atctaggctt tgtcagtaat 3000ttccacacct taattatcat tcaacttgca
aaagagacaa ctgataagaa gaaaattgaa 3060atgagaatct gtggataagt
gtttgtgttc agaagatgtt gttttgccag tattagaaaa 3120tactgtgagc
cgggcatggt ggcttacatc tgtaatccca gcactttggg aggctgaggg
3180ggtggatcac ctgaggtcgg gagttctaga ccagcctgac caacatggag
aaaccccatc 3240tctactaaaa atacaaaatt agctgggcat ggtggcacat
gctggtaatc tcagctattg 3300aggaggctga ggcaggagaa ttgcttgaac
ccgggaggcg gaggttgcag tgagccaaga 3360ttgcaccact gtactccagc
ctgggtgaca aagtcagact ccatctccaa aaaaaaaaga 3420ttatatatat
atatatatgt gtgtgtatgt gtgtgtgtgt gtgtgtgtgt atatatatat
3480atatatatat acacacacac acacacactt tttatatata tatatatata
tatatagtgg 3540aacttacaaa tgagagtaat ataatgatga aattttgaac
tgttatttat aaacatctaa 3600ggtaaaatgg ttagtcatgg ccagagtatg
tttcatcctt taatttttgt ccatttgaaa 3660ataaggattt ttgaaagaat
tataccaatt aaaattatta aaggcaaaca tagaattcat 3720aaaaaattgt
ccaaagtaga aatgatgacc tataatttgg agcatttcca attcagtaat
3780ttcaattttg ctcttgaaaa catttaatat atatccaaga ctgacatttc
tttagctgaa 3840cctaacgttt gggtctctga gtgaatttat aataactcct
tccttcctta gcatagggtt 3900ttcaaaattt gatttataat tcctatttcc
agtaaatatt gttcatttgt ccacatctct 3960ccctatgata tgttgctgga
ggtaagaatt tctttcatat tcctattttt tttttcccca 4020tagactaggc
tcatagaatt taaacaagca aattttcctg agctttttct tgccaaatga
4080aagaagactg gtaaattctc atagagaggt ttgtgtagtt cttggctctt
cctggggtta 4140atgtgcttat attcacagtg gcaaattggt ctcagacttt
aatttattta tttttgattt 4200gaatttctct ttaaaagtat caatttaaaa
ggtaactaga attattcttt ctcattttca 4260aaagtgattt ttgcattatt
aaatttccct gccattgtaa tgccatttca cgcagaaaaa 4320aagtcagcca
gtaattaaga aaaaaagtga tggagattaa gtagtatttt ggcttatttt
4380taggactcat catgagaaga cacagttcct ttaatcagga aattaatatc
cataattttc 4440actcaaaatt gcagtatgta aagcagattc tcaaaaactc
tcctgaacac ttatttatat 4500atatgttttt atataagtaa aatttttctc
atatttttat acgatatgca cacacacaca 4560tacatgcaca tactacttac
tacatgttct gtacttgtac tttgtaccat gcatattcaa 4620atgtttatat
acataagttt attataacat aaacagtaaa agtaatgaat actgtttaaa
4680ataactaata tagtattttt taatttttgt ggggatggat tctcaaatac
ttgtgatttt 4740aaaagattct aaagctaaaa cacaacttga ttttaaaaag
aatgattctc cttacacaat 4800tataaatatt tgcagtaaat attttcctta
taatactgtt ttgaccccat ttaaaaagta 4860ttagattata ttcctttgat
ccaatgaaaa ctgaacctta taaatggtta gctgaaagta 4920gaccttattc
ttgtccttct ttagaagagt aaagatttgt cctagggaag atggctgact
4980tcggttccca acatgcgtat gcatttagac tgtagctcct cagccctgtg
gacacaaaat 5040ttggacagct tattaggtta cgttagcaat gcatgacggt
ttctccaaca ctaagatatt 5100cacgttgaaa cagatttcct gttcgtctta
tgtgtctggt aaaattgttt ccccaattac 5160aatttgacat atcaatagag
ggttaacaag agtataatta cataacagaa ttcctcatga 5220actgtaatca
gtctacagga aaatcattat tttatcttga tttgcagatg aatatactgc
5280taagaaaggg agcaactctg acctttgtta aagttgatct tttgtaattg
aggtataagg 5340tatgaaaaga taaaaaaccg aaggccagag aatcaggaaa
tgaaagatag tatggactga 5400aggtaacaat attttaatgt tatgcaatat
agtcagagaa atattaaaaa ttagttgttt 5460gctgtgcata ggtggatctc
gcaggaagct aatgaaacct aagcttcagt gcctctcact 5520tagacatgtt
ccattcgagg tcctgaacct aactttgtat taggaattct gtactaattt
5580tgttgaagaa gaccagcaaa gttgtgtaca cttctacccc cacaaaatct
gcattgtcca 5640tgtgagtaaa gtaaaataat tcctgttatt tttttctgtt
agaaataagt atggaggata 5700tgtttttaaa aatttatgag ttaattgaaa
tatccatata taacaagtga ctttctcaca 5760atatatatga tgtgatatat
agggagatag tttcactttc atcatatttt atacgttgat 5820tctgaactat
agaaaaataa taaatgggat tttaattata gctcttagtt gggaaagaaa
5880tatagagaga tgtgggattt gaatgcccat gaaagacatt ttattttact
tgaatatatt 5940cttgcttcac tttaccctcc ataatatgtt gtacattagt
gctgatcaag tttacagagt 6000tacattttgc tttcctaacc attcagtcag
gaattaaaat atggcattgt ataacaactg 6060ggaagaagct catagtggat
ataaattaga gtagataatg ggtcaccttg atagcctctg 6120tttacattac
ttgtatatgg gcaaaataat tattacctat acgtgtattt aagcttaatt
6180ttcatataaa cagtattttt aatctatgtt aaaatagata atatctaaaa
gtgtgatctc 6240taggtagtcc ttagtttatt agtactgtac ttcaaaaaga
tttttaaata ggtccggcac 6300ggtggctcat gcctgtaatc ccagcacttt
gggaggctga ggcgggcgaa tcacctgagg 6360tcaggagttc gagatcagcc
tggccaacat ggtgaaaccc tgtctcaact aaaaatataa 6420aaattagccg
ggcgtggtgg caggcgcctg taatcccagc tactcgggag gctgaggcag
6480gagaatcact tgaacccaag gggcagaagc tgcagttagc caagatcgca
tcattgcact 6540ccagcctagg ggacaagagc gcgagacttc atctcaaaaa
aaaaaaaaaa aaaaaaaaaa 6600gatttttaaa taatagctaa aggtatgctc
tctaggtcat ccttagttta ttagtactgt 6660acttaaaaat tattttttaa
tagtcaattt tgggagataa ttatttcttt ccttatattt 6720tccaattagt
tggtgtctaa aaataaatgt tttgtctaat tttagatcag gtatacattc
6780acaaaagcat aaatcatagt ctcacaggaa attcaccaat tttccatatg
tcgtgagata 6840actgtccttt ctacaacctc ataacaatga atttatataa
ttacctagat tttcttagtg 6900tgaatctacc cattagtttt attttcttgg
tagttatttt tttccctcct ctctgttact 6960attggcctta aaatacacag
aggacggtta cagtgtccta atagctgtta catgtgtgtg 7020tttcagcgta
cttgaatcaa gtgtacattt atagtaccaa taaccgcctt tacagcttta
7080cagttaacaa ttctctcaca aaactgtaga gcattaggca tctgagagcc
atagagggcc 7140aactttgttc cagagtgaac atgctttttt tcctcaacat
atacactact gatttttttt 7200aaaagtatga ctttcaagtg aattaatgta
ttggttagga gaactgcttg ctaagtcctt 7260attacctctt gttaaagcct
cagaaggccg tgctgaaagc cagaggggaa aaaaagagta 7320atgcacaggt
atctcttttg cagtggtgac tgtattttga gtaccttgtg tgacagggta
7380ttattacagc atcttgtggg aaaacctatt aggcctttgc atgttaaagc
tgtataattt 7440gttgggttgt gagtggtctg acttaaatgt gtattataaa
atttagacat caaattttcc 7500tactaactaa ctttattaga tgcatacttg
gaagcacagt catatcacac tgggaggcaa 7560tgcaatgtgg ttacctggtc
ctaggtttga actgtcttat ttcaaaagat ttctgaatta 7620atttttccct
agaatttctc cttcattcca aagtacaaac atactttgaa gaatgaaaca
7680gattgttccc atgaatgtat gctcatactc gactagaaac gatctatgtt
aaatgactgt 7740gtatatgaat tatttcaagt actaccccaa ataactttct
tattgctctg aaagaagaaa 7800agcaatgtaa atcactatga ttattgcaca
aacaaccaga attctccaac aattttaagt 7860aatctgatcc tcttcttgga
gaaaattgtt acctaatagt ttttccttat gaatgttatt 7920actactggta
taaatcaaat ttctataaat ttcctactta agtcttaaga actgggttct
7980tcctttgatg ttattcatgt tcagaaagga aacaacactt tactctttta
ggacaattcc 8040tagaatctat agtagtatca ggatatattt tgctttaaaa
tatattttgg ttattttgaa 8100tacagacatt ggctccaaat tttcatcttt
gcacaatagt atgacttttc actagaactt 8160ctcaacattt gggaactttg
caaatatgag catcatatgt gttaaggctg tatcatttaa 8220tgctatgaga
tacattgttt tctccctatg ccaaacaggt gaacaaacgt agttgttttt
8280tactgatact aaatgttggc tacctgtgat tttatagtat gcacatgtca
gaaaaaggca 8340agacaaatgg cctcttgtac tgaatacttc ggcaaactta
ttgggtcttc attttctgac 8400agacaggatt tgactcaata tttgtagagc
ttgcgtagaa tggattacat ggtagtgatg 8460cactggtaga aatggttttt
agttattgac tcagaattca tctcaggatg aatcttttat 8520gtctttttat
tgtaagcata tctgaattta ctttataaag atggttttag aaagctttgt
8580ctaaaaattt ggcctaggaa tggtaacttc attttcagtt gccaaggggt
agaaaaataa 8640tatgtgtgtt gttatgttta tgttaacata ttattaggta
ctatctatga atgtatttaa 8700atatttttca tattctgtga caagcattta
taatttgcaa caagtggagt ccatttagcc 8760cagtgggaaa gtcttggaac
tcaggttacc cttgaaggat atgctggcag ccatctcttt 8820gatctgtgct
taaactgtaa tttatagacc agctaaatcc ctaacttgga tctggaatgc
8880attagttatg accttgtacc attcccagaa tttcaggggc atcgtgggtt
tggtctagtg 8940attgaaaaca caagaacaga gagatccagc tgaaaaagag
tgatcctcaa tatcctaact 9000aactggtcct caactcaagc agagtttctt
cactctggca ctgtgatcat gaaacttagt 9060agaggggatt gtgtgtattt
tatacaaatt taatacaatg tcttacattg ataaaattct 9120taaagagcaa
aactgcattt tatttctgca tccacattcc aatcatatta gaactaagat
9180atttatctat gaagatataa atggtgcaga gagactttca tctgtggatt
gcgttgtttc 9240ttagggttcc tagcactgat gcctgcacaa gcatgtgata
tgtgaaataa aatggattct 9300tctatagcta aatgagttcc ctctggggag
agttctggta ctgcaatcac aatgccagat 9360ggtgtttatg ggctatttgt
gtaagtaagt ggtaagatgc tatgaagtaa gtgtgtttgt 9420tttcatctta
tggaaactct tgatgcatgt gcttttgtat ggaataaatt ttggtgcaat
9480atgatgtcat tcaactttgc attgaattga attttggttg tatttatatg
tattatacct 9540gtcacgcttc tagttgcttc aaccatttta taaccatttt
tgtacatatt ttacttgaaa 9600atattttaaa tggaaattta aataaacatt
tgatagttta cataataa 96481231963DNAArtificial SequenceThe sequence
has been designed and synthesized. 123cagtccggcc gggcggagct
aggggcgggc ccctgcgtct ctgggcgctg gagcgcggcg 60actatcacgc cgcgtggcgg
acggacggac tgacggacgc gcagccttac ccgaaaggcc 120atggcggagc
acgcccctcg ccgctgctgc ctgggctggg acttcagcac gcagcaggta
180aaggttgttg ctgttgatgc agagttgaat gtcttctatg aggaaagtgt
gcattttgac 240agagatcttc cagaatttgg gcatgtactt gatgtgcatg
gtgttcatgt gcacaaggat 300gggctgacgg tcacttctcc agtactaatg
tgggtccagg cactggatat catcttggag 360aagatgaagg cttcgggctt
cgaattctct caagtcctag ccttgtccgg ggcgggccag 420caacacggaa
gtatatactg gaaggctgga gcccagcagg cactgacaag cttatcacca
480gacctccggc tacaccagca gcttcaggac tgtttctcca tcagcgactg
cccggtgtgg 540atggactcca gcaccacagc ccagtgccgc cagctggagg
ctgctgtggg tggtgctcag 600gctctcagct gcctcacggg gtcccgtgcc
tatgagcgtt ttacagggaa ccaaattgca 660aaaatttacc agcagaaccc
cgaggcctac tcacatacag agagaatttc tttggtcagt 720agctttgctg
cttccctgtt ccttggctct tactccccta ttgactacag tgatggttct
780ggaatgaatt tgttgcagat acaggataaa gtctggtccc aggcttgcct
tggtgcctgt 840gcacctcatt tagaggagaa gcttagccca ccagtaccat
catgctcagt tgtgggagcc 900atttcttcct acaacgtcca gcgctacgga
tttcctccag gatgcaaagt ggtggccttc 960actggggaca acccagcgtc
gctggcaggc atgagactgg aggaaggtga cattgcggtc 1020agcctgggca
ccagtgacac cctgtttctc tggctccaag agcccatgcc tgccctggaa
1080ggccacatct tctgcaaccc ggttgactcc cagcactaca tggcactcct
gtgctttaaa 1140aatggctccc tcatgagaga gaagatccgc aacgagtctg
tatcccgttc ctggagcgat 1200ttctctaagg cactgcagtc cacagagatg
ggcaacggtg gaaacctggg tttttatttt 1260gatgtaatgg agatcacccc
tgaaattatt ggacgtcata ggtttaacac agaaaaccac 1320aaggttgcag
cattccctgg ggatgtggag gttcgagcac taattgaagg acaattcatg
1380gccaagagga ttcacgcaga aggcctgggc tatcgagtca tgtccaagac
aaagattttg 1440gccacaggag gagcatctca caatagagaa atcttacagg
tgcttgcaga tgtgtttgat 1500gccccggtgt atgttataga cactgccaac
tcggcctgtg tgggttctgc ataccgagct 1560tttcatggtc ttgcaggtgg
aacagatgtg cccttttcag aggttgtgaa gttagctcca 1620aatcccagac
tagctgctac cccaagcccg ggagcttctc aggtgagaga ccatcngaat
1680ttgtttgtag catttgcatt atgaaagccc gctagggttt tttcccccac
caaaaggtca 1740cctacattga acgtgatgtg ctcaactaaa ggagaaattc
tgctttattg aaattatcaa 1800gaaaatggag ctaaagggcc atgttgtcag
ctgcaagtca cagatactgc tgattttaca 1860gccagggtca gatggattgc
tgggcatatt tgtattgctt cttatgcctc acggtgggcc 1920cttccatgtc
actgggctat aaaagctact gaaaggatcc atc 19631241475DNAArtificial
SequenceThe sequence has been designed and synthesized.
124agagacagct gggccactgg cagtgaggga gagtgaggat ggcagagacc
agtgccctgc 60ccactggctt cggggagctc gaggtgctgg ctgtggggat ggtgctactg
gtggaagctc 120tctccggtct cagcctcaat accctgacca tcttctcttt
ctgcaagacc ccggagctgc 180ggactccctg ccacctactg gtgctgagct
tggctcttgc ggacagtggg atcagcctga 240atgccctcgt tgcagccaca
tccagccttc tccgtgtctc ccacaggcgc tggccctacg 300gctcggacgg
ctgccaggct cacggcttcc agggctttgt gacagcgttg gccagcatct
360gcagcagtgc agccatcgca tgggggcgtt atcaccacta ctgcacccgt
agccagctgg 420cctggaactc agccgtctct ctggtgctct tcgtgtggct
gtcttctgcc ttctgggcag 480ctctgcccct tctgggttgg ggtcactacg
actatgagcc actggggaca tgctgcaccc 540tggactactc caagggggac
agaaacttca ccagcttcct cttcaccatg
tccttcttca 600acttcgccat gcccctcttc atcacgatca cttcctacag
tctcatggag cagaaactgg 660ggaagagtgg ccatctccag gtaaacacca
ctctgccagc aaggacgctg ctgctcggct 720ggggccccta tgccatcctg
tatctatacg cagtcatcgc agacgtgact tccatctccc 780ccaaactgca
gatggtgccc gccctcattg ccaaaatggt gcccacgatc aatgccatca
840actatgccct gggcaatgag atggtctgca ggggaatctg gcagtgcctc
tcaccgcaga 900agagggagaa ggaccgaacc aagtgagcct gccaccctgg
agtgagcccc aggccaggag 960gctgttccag gagtcctgcc cagcagcctc
agtggccaag cccagacact cacccacctt 1020ccccagtggc cccgtggatc
ctggtcctag gctggacaca ggattcagaa agacaccagg 1080ctgcacagaa
agagccagat ggacctgagt gtcggtcaca gccccctaca ctcaaggctg
1140agaggcctca ggaaagtcat tcctttttaa aaataataat aaatgtaagg
gggtacagtg 1200cagttttgtt acatggatag attgcctagt ggtgaagtct
gggcttttag tgtaaccatc 1260accctaataa tatacgttgt acccattaag
ttatttctca tccctcaccc cctcccacct 1320tgtcaccctt ctgagtctcc
aatgtctatt attccacact ccatgtccac gtgtacacat 1380tatttagctc
ccacttacaa gtgagaacat gtggtatttg actttctgtt tttgagttat
1440ttcacttaaa ataatgacct ccagtttcat ccatg 14751252327DNAArtificial
SequenceThe sequence has been designed and synthesized.
125acatttcatt gtaaacgact gggagtatct gagcaaatta tttcttacgt
gactttagag 60aaaacggcta cctatctgac cccaaaacga cttgaggaaa ctgtttccac
ggtcctgctg 120cagaggggaa gcacagtcgt caagaagaga gtggggtcag
gatcaaaaca catttagtgt 180gacttaggga aagaaaacat tttccctctt
tgaacctctc tggatacagt cattttgcct 240ctacttgagg atcaactgtt
caacctcaat ggcctttcag gacctcctgg gtcacgctgg 300tgacctgtgg
agattccaga tccttcagac tgtttttctc tcaatctttg ctgttgctac
360ataccttcat tttatgctgg agaacttcac tgcattcata cctggccatc
gctgctgggt 420ccacatcctg gacaatgaca ctgtctctga caatgacact
ggggccctca gccaagatgc 480actcttgaga atctccatcc cactggactc
aaacatgagg ccagagaagt gtcgtcgctt 540tgttcatcct cagtggcagc
tccttcacct gaatgggacc ttccccaaca caagtgacgc 600agacatggag
ccctgtgtgg atggctgggt gtatgacaga atctccttct catccaccat
660cgtgactgag tgggatctgg tatgtgactc tcaatcactg acttcagtgg
ctaaatttgt 720attcatggct ggaatgatgg tgggaggcat cctaggcggt
catttatcag acaggtttgg 780gagaaggttc gtgctcagat ggtgttacct
ccaggttgcc attgttggca cctgtgcagc 840cttggctccc accttcctca
tttactgctc actacgcttc ttgtctggga ttgctgcaat 900gagcctcata
acaaatacta ttatgttaat agccgagtgg gcaacacaca gattccaggc
960catgggaatt acattgggaa tgtgcccttc tggtattgca tttatgaccc
tggcaggcct 1020ggcttttgcc attcgagact ggcatatcct ccagctggtg
gtgtctgtac catactttgt 1080gatctttctg acctcaagtt ggctgctaga
gtctgctcgg tggctcatta tcaacaataa 1140accagaggaa ggcttaaagg
aacttagaaa agctgcacac aggagtggaa tgaagaatgc 1200cagagacacc
ctaaccctgg agattttgaa atccaccatg aaaaaagaac tggaggcagc
1260acaaaaaaaa aaaccttctc tgtgtgaaat gctccacatg cccaacatat
gtaaaaggat 1320ctccctcctg tcctttacga gatttgcaaa ctttatggcc
tattttggcc ttaatctcca 1380tgtccagcat ctggggaaca atgttttcct
gttgcagact ctctttggtg cagtcatcct 1440cctggccaac tgtgttgcac
cttgggcact gaaatacatg aaccgtcgag caagccagat 1500gcttctcatg
ttcctactgg caatctgcct tctggccatc atatttgtgc cacaagaaat
1560gcagacgctg cgtgaggttt tggcaacact gggcttagga gcgtctgctc
ttgccaatac 1620ccttgctttt gcccatggaa atgaagtaat tcccaccata
atcagggcaa gagctatggg 1680gatcaatgca acctttgcta atatagcagg
agccctggct cccctcatga tgatcctaag 1740tgtgtattct ccacccctgc
cctggatcat ctatggagtc ttccccttca tctctggctt 1800tgctttcctc
ctccttcctg aaaccaggaa caagcctctg tttgacacca tccaggatga
1860gaaaaatgag agaaaagacc ccagagaacc aaagcaagag gatccgagag
tggaagtgac 1920gcagttttaa ggaattccag gagctgactg ccgatcaatg
agccagatga agggaacaat 1980caggactatt cctagacact agcaaaatct
agaaaataaa taacaaggct gggtgcggtg 2040gctcacgcct gtaatcccag
caccttggga ggctgaggcg ggcagatcat gaggtcagaa 2100gataaagacc
accctggcca acatggtgaa accctgtctc tactaaaaca aatacaaaac
2160ttcgctgggc acagtggcac aggcctttaa ttccagctac ttgggaggct
gaggcaggag 2220aattacttga acccaggagg tggaaattgc aatgagccaa
gattgggcca ctgcattcca 2280gcctggtgac agagcgagac tgtctcaaaa
aaaaaaaaaa aaaaaaa 2327
SEQ ID NO: 126 | Length: 638 | Type: DNA | Organism: Artificial Sequence
Feature: The sequence has been designed and synthesized.
ctccagacag acagaagaaa gggattcttt
tcagtctaga aaaatgctca ccccttcctc 60agaacatttc cactgtgacg aaaagagact
gatgaagcct cagagagaaa ggcaactctg 120ggtggtgatg caatagtgca
gaatccagaa tggatgtcct ctttgtagcc atctttgctg 180tgccacttat
cctgggacaa gaatatgagg atgaagaaag actgggagag gatgaatatt
240atcaggtggt ctattattat acagtcaccc ccagttatga tgactttagt
gcagatttca 300ccattgatta ctccatattt gagtcagagg acaggctgaa
caggttggat aaggacataa 360cagaagcaat agagactacc attagtcttg
aaacagcacg tgcagaccat ccgaagcctg 420taactgtgaa accagtaaca
acggaaccta gtccagatct gaacgatgcc gtgtccagtt 480tgcgaagtcc
tattcccctc ctcctgtcgt gtgcctttgt tcaggtgggg atgtatttca
540tgtagaaggt ggaagaaggc tgctatgact ctttggatgg gagtctggca
agaggaaatt 600ggaagataaa ataaataata agtgaaataa tctggtta
6381271077DNAArtificial SequenceThe sequence has been designed and
synthesized. 127acgtatatac agagcctccc tggccctcct ggaaagagtc
ctggaaagac aaccttcagg 60tccagccctg gagctggagg agtggagccc cactctgaag
acgcagcctt tctccaggtt 120ctgtctctcc cattctgatt cttgacacca
gatgcaggat ggtgtcctct ccctgcacgc 180aggcaagctc acggacttgc
tcccgtatcc tgggactgag ccttgggact gcagccctgt 240ttgctgctgg
ggccaacgtg gcactcctcc ttcctaactg ggatgtcacc tacctgttga
300ggggcctcct tggcaggcat gccatgctgg gaactgggct ctggggagga
ggcctcatgg 360tactcactgc agctatcctc atctccttga tgggctggag
atacggctgc ttcagtaaga 420gtgggctctg tcgaagcgtg cttactgctc
tgttgtcagg tggcctggct ttacttggag 480ccctgatttg ctttgtcact
tctggagttg ctctgaaaga tggtcctttt tgcatgtttg 540atgtttcatc
cttcaatcag acacaagctt ggaaatatgg ttacccattc aaagacctgc
600atagtaggaa ttatctgtat gaccgttcgc tctggaactc cgtctgcctg
gagccctctg 660cagctgttgt ctggcacgtg tccctcttct ccgcccttct
gtgcatcagc ctgctccagc 720ttctcctggt ggtcgttcat gtcatcaaca
gcctcctggg ccttttctgc agcctctgcg 780agaagtgaca ggcagaacct
tcacttgcaa gcatgggtgt tttcatcatc ggctgtcttg 840aatcctttct
acaaggagtg ggtacgaatt ataaacaaac ttccccttta ggtatccctg
900gagtaataat gacaacaaaa ttcactgcag gtcggtggaa tgatagaatg
cattttaaat 960cacattgtaa acttccaggt gatccatgga taggataaat
aactaagtta ttataattgt 1020ttaggaattt atagtccata aaatatcctc
cagccaggga aaaaaaaaaa aaaaaaa 1077
SEQ ID NO: 128 | Length: 4667 | Type: DNA | Organism: Artificial Sequence
Feature: The sequence has been designed and synthesized.
ccctagtttc
agacccaagc tctgccagtc acttcctgtt caccctgagc aagctattaa 60accattcgga
gccccagttt ccacactggt aaaaatgaag ataaaatcag ccacctccta
120gggttgttac gaggtgtaaa tgaaataaca atgagcctgg tacatcacag
gtgctccctg 180aattttagtt cctttgctcc cttttaccat cccagtagcc
caggcagggg tgagcagcta 240tgaggcgtgg cagcagtgga tggaagtgac
aagggacaga tgaggatccc caggggaagg 300gaggtggaga aagcaggaac
caaatgtcat tcgtgcagcc actcccgatt tatgggatct 360ggtgcagaga
ctctttgaag aggcaataaa actcatattc tgctaggaaa tgacaacagt
420ggagagtcct gacttagcaa accaaattca acgatttcat tttttttttt
tttttacaag 480gcatggggtt ccggatgggt agcaaaaatc acattctaat
attagctatc tggggtagga 540acaggctttc ccaagagcaa gagatgtgta
ccatgccaac acccaatggt gagtgtacat 600aaatgctcca tagaggcctt
gagagtccag aacctgagct ccttgctaag ctgcaacctt 660cctctgctag
caccatggcc acccagacct gcacccctac cttctccact gggtctatca
720agggcctctg tggcacagca ggcggcatct ctcgggtgtc ctccatccgt
tctgtgggct 780cctgcagggt ccccagtctc gccggtgctg cagggtacat
ctcttctgct aggtcgggcc 840tctctggcct tgggagctgc ttgcctggct
cctacctgtc ttctgagtgc cacacctctg 900gctttgtggg gagcgggggc
tggttctgcg agggctcctt caacggcagc gagaaggaga 960ctatgcagtt
cctgaacgac cgcctggcca actacctgga gaaggtgcgt cagctggagc
1020gggagaacgc ggagctggag agccgcatcc aggagtggta cgagtttcag
atcccataca 1080tctgcccaga ctaccagtcc tacttcaaga ccatcgaaga
tttccagcag aaggtgaggg 1140agacctggcc cctttccagc tgagcagccc
aactcttagg aggcgtctgt ctcattccag 1200ggcctttcca gatgggacgg
tagcttttag ggaattctcc tggtagggca aacttttctc 1260tctagggctt
tggtttcctg ggggacccag gcttctctgc tggtctcttg cttgtacctt
1320gttgccagcc gttcctcacc cccctcccct tgctgtagga agggaaggaa
catgtaccaa 1380gcccctaccg gatacagcac tgtgcctgac actttcctat
tccctggctt atttattctt 1440catcttggag agattggtat tactggaccc
atctgacaga tgtggaagtt gaattttgag 1500gggttcagca acttgctcaa
gagcacacag gctcagtgtt agatgcagaa cctgaactca 1560ggtctgcctc
accctggtgt atcttgctgt gccactctga aggctacagg cgccaggact
1620gatggtcact gacatcctcg ggatgcccct gtggggtgga tgctccccca
cacctggggt 1680gtggctcagg gctggaaatg gtgaaggggc tggtggtaca
cagtggcatg gcctctgtga 1740gctgagctat ccccctacct gcccccacac
tgctgggcct cctgtagctt atgggaagcc 1800tcttgttccc cagatcctgc
tgactaagtc tgagaatgcc aggctggtcc tgcagattga 1860taatgccaag
ctggctgctg acgacttccg gaccaagtga gtgggcctga tcagggaagg
1920tttgccatgc cactcccttg cctctgtccc ctgcctttgg gagctggtga
tgtgggaata 1980aggctggatg ggaaaatgtg gcatgagccc tggacctcag
gggaaaggag tgtcatgctc 2040ctgatcccgt cagccactgc agctggacag
tagggcaggg gtccctgctg gtgtgtggtt 2100tcctttgctg cagacttggc
agggttctgt gtgtgcaggt atgagacaga gctgtctctg 2160cggcagctag
tggaggccga catcaacggc ctgcgtagga tcctggatga gctgaccctg
2220tgcaaggctg acctggaggc tcaggtggag tccctgaagg aggagctgat
gtgcctcaag 2280aagaatcacg aggaggtgag gctggtgcca tgtgacttcc
cagtgtttcc catccagctt 2340aggaagccac tgctgggctt tcagttttct
gtgcggcagg aactatacaa aggccttgca 2400tttcattctc gtttcatttc
atccttacaa taatcccaag aattataaac tgttacaagc 2460tccattttac
agatgagaaa acttaggcac aaagaggtta agttgcttgc ctaaggtcat
2520agagggtcta cacttttgcc cataacacta catgtctatt tgggctctag
tgtcctgata 2580acagcaattt aatttgccta gggtttgtat catctcacaa
atatcccata gaaggaggta 2640ggtatatacg gagaaggaga ccaaggctca
gagatattta agtatcctgt ctaatgctat 2700gcagctggtg actgagggaa
gagggtttga attcaggtca ttgaaacctg caatccagca 2760tctttttcac
aacctcatgc cgtcttgcct cctctctgca ggaagtcagt gtactccgtt
2820gccaacttgg ggaccgactg aatgtggagg tggacgctgc tcccccagtg
gatctcaaca 2880agatcctgga ggatatgaga tgccagtacg aggccctggt
ggagaataac cgcagagatg 2940tggaggcctg gttcaacacc caggtggggc
tggggtgccc tgggaccaca ggctcctggg 3000ctggggttac ccttggaagt
agcttggttt gaccatgctc tgggccctgg catgtgtttc 3060agactgagga
gctgaaccag caggtggtgt ccagctcgga gcagctgcag tgctgccaga
3120cggagatcat cgagctgaga cgtacggtca acgcgctaga gattgagctg
caggctcagc 3180acagcatggt gagtggcccc tgcctgcgtc gctggccacg
gcctgtggca ggtccccgac 3240gcaccagcct cagcgtgcag gctctcatgg
ggtgtgatca caggtcgtag gcagatgccc 3300agggctgtgg gtttctgggg
tcaggggatt cctccccaat aggcagctct tcctctttcc 3360cattgcagcg
gaattccttg gaatccaccc tggccgaaac cgaggcccgc tacagctccc
3420agctggccca gatgcagtgc ctgatcagca acgtggaggc ccagctgtct
gagatccgct 3480gcgacctgga gcggcagaac caggagtacc aggtgttact
ggacgtcaag gcccggctgg 3540agggcgagat cgctacctac cgccacctgc
tggagggaga ggactgcaag tgagtggccc 3600ttgggctggg gtagggcttg
actgaaccct cagtgccatg tggagggcgt caagcccaga 3660agtggttgtc
gcccagatga agggaactaa accaaagccc cttgagattc tccatttagt
3720cccaggcttt ggtaatgcac agcgggagaa tccaacccaa cacacgccgc
gtgttttccg 3780ccatcttttc tgattggcag tttctgctct tcattcctgt
agctcagtcc tctcaccctt 3840ggggaattca gaggcactga gatgatccgg
ggccaccggt ctcgcttgat cctctagatc 3900tgtttaacac gaatctcagc
ccagtgctcc gatgccaaat gcaccctgca tgattttgtt 3960tcctcaggct
tcctccccaa ccttgtgcca cggcatgcaa gcctgttatt agagttcctt
4020ctgtcccccc ggtgccctgt gtcccctctg tgccctgcac cccggctccc
caggttggca 4080ctcagatccg caccatcacc gaggagatca gagatgggaa
agtcatctcc tccagggagc 4140acgtgcagtc ccgcccgctg tgacagccca
cttggtccac cagggcaggg ccctgaccac 4200aggaaggagg acacccctgt
ggctcctgga ggcttaacga ccctgccctt ctctagaggg 4260gtccccctac
gcttagcagg tttttctacc aaaacactcc ccgtattgtg tttccggact
4320taactgtgct tttacgccat gcaaaaccag gtttcctgga aatttaccca
ataaagtgtg 4380ttctcctggc atagcaaact caaccctggc tgactctgtt
gatgctcatg tgctcgtgtg 4440gttcatgggg gtgtctgaca ggggctgcag
tatagttgtg gggtttccat ttaggtctat 4500cttctcgggc tctgagtggg
aggcggtggc agggtgattt gaagtattta ataacaaaga 4560tgcctcagag
gctggccaat gagaacagac actgacccag tctggatggg tgcggccata
4620gctctcagct tggccctgcc tgtgagtaga caccttaggg cggcccc 4667
SEQ ID NO: 129 | Length: 402 | Type: DNA | Organism: Artificial Sequence
Feature: The sequence has been designed and synthesized.
atggctcagt cactggctct gagcctcctt atcctggttc
tggcctttgg catccccagg 60acccaaggca gtgatggagg ggctcaggac tgttgcctca
agtacagcca aaggaagatt 120cccgccaagg ttgtccgcag ctaccggaag
caggaaccaa gcttaggctg ctccatccca 180gctatcctgt tcttgccccg
caagcgctct caggcagagc tatgtgcaga cccaaaggag 240ctctgggtgc
agcagctgat gcagcatctg gacaagacac catccccaca gaaaccagcc
300cagggctgca ggaaggacag gggggcctcc aagactggca agaaaggaaa
gggctccaaa 360ggctgcaaga ggactgagcg gtcacagacc cctaaagggc ca 402
SEQ ID NO: 130 | Length: 3865 | Type: DNA | Organism: Artificial Sequence
Feature: The sequence has been designed and synthesized.
acagaacacg gggtgcctgg aaggggaaca gatgtgttgt
ggggcacagg gcaggctggg 60aggggaacaa aggtccactc catgggtaac cagacccttc
cgccagggct ggccacttct 120gcctttggaa aatgtttcac aacgccccat
gttgtgtgtg tgtgtgaatc ggccgatgtg 180aaccgaatgt tgatgtaaga
ggcagggcac tcggctgcgg atgggtaaca gggcgtgggc 240tggcacactt
acttgcacca gtgcccagag agggggtgca ggctgaggag ctgcccagag
300caccgctcac actcccagag tacctgaagt cggcatttca atgacaggtg
acaagggtcc 360ccaaaggcta agcgggtcca gctatggttc catctccagc
ccgaccagcc cgaccagccc 420agggccacag caagcacctc ccagagagac
ctacctgagt gagaagatcc ccatcccaga 480cacaaaaccg ggcaccttca
gcctgcggaa gctatgggcc ttcacggggc ctggcttcct 540catgagcatt
gctttcctgg acccaggaaa catcgagtca gatcttcagg ctggcgccgt
600ggcgggattc aaacttctct gggtgctgct ctgggccacc gtgttgggct
tgctctgcca 660gcgactggct gcacgtctgg gcgtggtgac aggcaaggac
ttgggcgagg tctgccatct 720ctactaccct aaggtgcccc gcaccgtcct
ctggctgacc atcgagctag ccattgtggg 780ctccgacatg caggaagtca
tcggcacggc cattgcattc aatctgctct cagctggacg 840aatcccactc
tggggtggcg tcctcatcac catcgtggac accttcttct tcctcttcct
900cgataactac gggctgcgga agctggaagc tttttttgga ctccttataa
ccattatggc 960cttgaccttt ggctatgagt atgtggtggc gcgtcctgag
cagggagcgc ttcttcgggg 1020cctgttcctg ccctcgtgcc cgggctgcgg
ccaccccgag ctgctgcagg cggtgggcat 1080tgttggcgcc atcatcatgc
cccacaacat ctacctgcac tcggccctgg tcaagtctcg 1140agagatagac
cgggcccgcc gagcagacat cagagaagcc aacatgtact tcctgattga
1200ggccaccatc gccctgtccg tctcctttat catcaacctc tttgtcatgg
ctgtctttgg 1260gcaggccttc taccagaaaa ccaaccaggc tgcgttcaac
atctgtgcca acagcagcct 1320ccacgactac gccaagatct tccccatgaa
caacgccacc gtggccgtgg acatttacca 1380ggggggcgtg atcctgggct
gcctgttcgg ccccgcggcc ctctacatct gggccatagg 1440tctcctggcg
gctgggcaga gctccaccat gacgggcacc tacgcgggac agttcgtgat
1500ggagggcttc ctgaggctgc ggtggtcacg cttcgcccgt gtcctcctca
cccgctcctg 1560cgccatcctg cccaccgtgc tcgtggctgt cttccgggac
ctgagggact tgtcgggcct 1620caatgatctg ctcaacgtgc tgcagagcct
gctgctcccg ttcgccgtgc tgcccatcct 1680cacgttcacc agcatgccca
ccctcatgca ggagtttgcc aatggcctgc tgaacaaggt 1740cgtcacctct
tccatcatgg tgctagtctg cgccatcaac ctctacttcg tggtcagcta
1800tctgcccagc ctgccccacc ctgcctactt cggccttgca gccttgctgg
ccgcagccta 1860cctgggcctc agcacctacc tggtctggac ctgttgcctt
gcccacggag ccacctttct 1920ggcccacagc tcccaccacc acttcctgta
tgggctcctt gaagaggacc agaaagggga 1980gacctctggc taggcccaca
ccagggcctg gctgggagtg gcatgtatga cgtgactggc 2040ctgctggatg
tggagggggc gcgtgcaggc agcaggatag agtgggacag ttcctgagac
2100cagccaacct gggggcttta gggacctgct gtttcctagc gcagccatgt
gattaccctc 2160tgggtctcag tgtcctcatc tgtaaaatgg agacaccacc
acccttgcca tggaggttaa 2220gcactttaac acagtgtctg gcacttggga
caaaaacaaa caaacgaaaa acatttcaaa 2280aggtatttat tgagcacctg
caggcgtgac ctgacagccc aagggtgggt ggggtgaggg 2340cttgaggact
tgggcgggac acaggctcca aactggagct tgaaatagtg tctgatgaat
2400gttaaattat ctatctatct atttatttat ttatttgaga cagggaaagg
gtctccctct 2460gttgccaagg ctggagtgca gtggcgcaat cttaactcat
tgcaacctcc accttctggg 2520ttcaagcgat tctctttatt cagccccggg
agtggcgcgc gccaccacgc ccagctaatt 2580tgtgtatttt cagcagagac
ggggtttgcc atgctggcca ggctggtctc gaactgctgg 2640attcaagtga
tccgcccatc tccgtctccc aaagtgctgg gaattacagg cgtgagccac
2700caaacccggc ctgattaaag ttaaataaat actagttccc ttctcgtcca
aaggagcagg 2760gaatgggaac cgggaaggca cgaagtctct aaagcatcca
gaagacccct acaccagggt 2820ctggtccgct cctattcgcc gcagcctttc
tgttccgcct gcaacccatt ttccagacag 2880taaaacggcg gcgcacttct
ttctccgtca ggcaccaggt cataaggaac ccaagagtct 2940gtgcctctga
ggcccaaatt atttgctgtt tcctcagggg agccggcggc cgcgactccc
3000acgccgcgcc gttaccgctc cctctctgct gactgctccc cctaggggca
gagacggtcc 3060cgacgcccgc catcccgccc cggcctcacc cctccccgcc
aggcggaacg acgcggggag 3120gcgggcgctc ggggctcgcg ccaggggccc
cagaatcctt cggggagagt gggtgggagg 3180aagctgtgtg ggcggggagc
cccctctgcc ttagggagcg gctgggcacc cattcgcccc 3240attcaggggc
tgcactttat agacgttccc taggctgttt ctaggctccc ccaagtccct
3300cctccagcct cgtcgggtcc ctcagacccc agcccaggac ctgcggaggg
ccgcagcgag 3360gagaggccaa caggcctttc cctagagttg aacctgggcc
gggtgttgca cctggaagaa 3420cccccgattt cctggggacc cagcagggca
ggcggcctgg ctccgcgctc aggtccggac 3480gcttgtttat gagaagaatt
tcctctttct taaaagggca acgatgcgag tgggtccctc 3540aaggagagaa
gagatgggac cggtctggtg cgacctgggc aagcgctgca gagggtacct
3600gggcaagagg gccgcccgcc tcctctgggt ttggcactgg agaagatggg
tccatgccag 3660ctgaaggagg agatggatgg gggacgttta gcgaagaaag
gcatctccca gatcctttag 3720cctcctggaa gtgcccccgt tgtaccccct
acacacccct cttggcattg agtgccagtc 3780ctctgccagg ctctgtgtta
caagttgggg agggcggcaa agtcccgaat taaagatgtc 3840agttctcaag
gaaaaaaaaa aaaaa 3865
SEQ ID NO: 131 | Length: 345 | Type: DNA | Organism: Artificial Sequence
Feature: The sequence has been designed and synthesized.
atggccgagt gcccgacact cggggaggca
gtcaccgacc acccggaccg cctgtgggcc 60tgggagaagt tcgtgtattt ggacgagaag
cagcacgcct ggctgccctt aaccatcgag 120ataaaggata ggttacagtt
acgggtgctc ttgcgtcggg aagacgtcgt cctggggagg 180cctatgaccc
ccacccagat aggcccaagc ctgctgccta tcatgtggca gctctaccct
240gatggacgat accgatcctc agactccagt ttctggcgct tagtgtacca
catcaagatt 300gacggcgtgg aggacgtgct tctcgagctg ctgccagatg attaa 345
SEQ ID NO: 132 | Length: 1265 | Type: DNA | Organism: Artificial Sequence
Feature: The sequence has been designed and synthesized.
ctcggccccg cccccgcgcc ccggatatgc tgggacagcc
cgcgccccta gaacgctttg 60cgtcccgacg cccgcaggtc ctcgcggtgc gcaccgtttg
cgacttggta cttggaaaaa 120tggacaagga ttgtgaaatg
aaacgcacca cactggacag ccctttgggg aagctggagc 180tgtctggttg
tgagcagggt ctgcacgaaa taaagctcct gggcaagggg acgtctgcag
240ctgatgccgt ggaggtccca gcccccgctg cggttctcgg aggtccggag
cccctgatgc 300agtgcacagc ctggctgaat gcctatttcc accagcccga
ggctatcgaa gagttccccg 360tgccggctct tcaccatccc gttttccagc
aagagtcgtt caccagacag gtgttatgga 420agctgctgaa ggttgtgaaa
ttcggagaag tgatttctta ccagcaatta gcagccctgg 480caggcaaccc
caaagccgcg cgagcagtgg gaggagcaat gagaggcaat cctgtcccca
540tcctcatccc gtgccacaga gtggtctgca gcagcggagc cgtgggcaac
tactccggag 600gactggccgt gaaggaatgg cttctggccc atgaaggcca
ccggttgggg aagccaggct 660tgggagggag ctcaggtctg gcaggggcct
ggctcaaggg agcgggagct acctcgggct 720ccccgcctgc tggccgaaac
tgagtatgtg cagtaggatg gatgtttgag cgacacacac 780gtgtaacact
gcatcggatg cggggcgtgg aggcaccgct gtattaaagg aagtggcagt
840gtcctgggaa caagcgtgtc tgccctttct gtttccatat tttacagcag
gatgagttca 900gacgcccgcg gtcctgcaca catttgtttc cttctctaac
gctgcccttg ctctattttt 960catgtccatt aaaacaggcc aagtgagtgt
ggaaggcctg gctcatgttg ggccacagcc 1020caggatgggg cagtctggca
ccctcaggcc acagacggct gccatagccg ctgtccaggg 1080ccagctaagg
cccatcccag gccgtccaca ctagaaagct ggccctgccc catccccacc
1140atgcctccct tcctggctgt gtccatggct gtgatggcat tctccactca
gcagttccta 1200gcatcccaca cccaggtctc actgaaagaa aggggaacag
gccatggcag tcagtgctta 1260cagag 1265
SEQ ID NO: 133 | Length: 1916 | Type: DNA | Organism: Artificial Sequence
Feature: The sequence has been designed and synthesized.
ctagagcact tcaattagtg gtgaacaaca cggtctctac tccaaggggc
tcacatcttg 60tgcagaaaac agaaatgaac aaataacaca caagatcatt tccgtggtag
tgagagctgg 120gatgaaaata aaacagcgtg gcagggagga ggcaagtgtt
gtgagtctgg agggttcctg 180gagaatgggg cctgaggcgt gaccaccgcc
ttcctctctg gggggactgc ctgccgcccc 240cgcagacacc catggttgag
tgccctccag gcccctgcct gccccagcat cccctgcgcg 300aagctgggtg
ccccggagag tctgaccacc atgccacctc ctcgcctcct cttcttcctc
360ctcttcctca cccccatgga agtcaggccc gaggaacctc tagtggtgaa
ggtggaaggt 420ggaaggtatg tccaaagggc agaaagggaa gggattgagg
ctggaaactg agttgtggct 480gggtgtcctt nnctgagtaa cttaccctct
ctaagcctcc attttcttat ttgtaaaatt 540catcaaaggg ttggaaggac
tctgccggct cctccactcc cagcttttgg agtcctcgct 600ctataacctg
gnntgtcagg agcacggggg gcttggaggt cccccccacc catggtctcc
660acagagggag ataacgctgt gctgcagtgc ctcaagggga cctcagatgg
ccccactcag 720cagctgacct ggtctcggga gtccccgctt aaacccttct
taaaactcag cctggggctg 780ccaggcctgg gaatccacat gaggcccctg
gcatcctggc ttttcatctt caacgtctct 840caacagatgg ggggcttcta
cctgtgccag ccggggcccc cctctgagaa ggcctggcag 900cctggctgga
cagtcaatgt ggagggcagc ggtgagggcc gggctggggc aggggcagga
960ggagagaagg gaggccacca tggacagaag aggtccgcgg ccacaatgga
gctggagaga 1020ggggctggag ggattgaggg cgaaactcgg agctaggtgg
gcagactcct ggggcttcgt 1080ggcttcagta tgagctgctt cctgtccctc
tacctctcac tgtcttctct ctctctgcgg 1140gtctttgtct ctatttatct
ctgtctttga gtctctatct ctctccctct cctgggtgtc 1200tctgcatttg
gttctgggtc tcttcccagg ggagctgttc cggtggaatg tttcggacct
1260aggtggcctg ggctgtggcc tgaagaacag gtcctcagag ggccccagct
ccccttccgg 1320gaagctcatg agccccaagc tgtatgtgtg ggccaaagac
cgccctgaga tctgggaggg 1380agagcctccg tgtgtcccac cgagggacag
cctgaaccag agcctcagcc agggtatggt 1440gatgactggg gagatgccgg
gaagctgggg tccagagaca gagaggggag ggaaactgaa 1500gaggtgaaac
cctgaggatc aggctttcct tgtcttatct ctccctgtcc cagacctcac
1560catggcccct ggctccacac tctggctgtc ctgtggggta ccccctgact
ctgtgtccag 1620gggccccctc tcctggaccc atgtgcaccc caaggggcct
aagtcattgc tgagcctaga 1680gctgaaggac gatcgcccgg ccagagatat
gtgggtaatg gagacgggtc tgttgttgcc 1740ccgggccaca gctcaagacg
ctggaaagta ttattgtcac cgtggcaacc tgaccatgtc 1800attccacctg
gagatcactg ctcggccagg tagagtttct ctcaactggg aggcatctgt
1860gtgggggtac tgggaagaag tggagccagt caatcttaga ttcccccaac ccgagg 1916
SEQ ID NO: 134 | Length: 1341 | Type: DNA | Organism: Artificial Sequence
Feature: The sequence has been designed and synthesized.
atgatcccca ccttcacggc tctgctctgc ctcgggctga
gtctgggccc caggacccac 60atgcaggcag ggcccctccc caaacccacc ctctgggctg
agccaggctc tgtgatcagc 120tgggggaact ctgtgaccat ctggtgtcag
gggaccctgg aggctcggga gtaccgtctg 180gataaagagg aaagcccagc
accctgggac agacagaacc cactggagcc caagaacaag 240gccagattct
ccatcccatc catgacagag gactatgcag ggagataccg ctgttactat
300cgcagccctg taggctggtc acagcccagt gaccccctgg agctggtgat
gacaggagcc 360tacagtaaac ccaccctttc agccctgccg agtcctcttg
tgacctcagg aaagagcgtg 420accctgctgt gtcagtcacg gagcccaatg
gacactttcc ttctgatcaa ggagcgggca 480gcccatcccc tactgcatct
gagatcagag cacggagctc agcagcacca ggctgaattc 540cccatgagtc
ctgtgacctc agtgcacggg gggacctaca ggtgcttcag ctcacacggc
600ttctcccact acctgctgtc acaccccagt gaccccctgg agctcatagt
ctcaggatcc 660ttggaggatc ccaggccctc acccacaagg tccgtctcaa
cagctgcagg ccctgaggac 720cagcccctca tgcctacagg gtcagtcccc
cacagtggtc tgagaaggca ctgggaggta 780ctgatcgggg tcttggtggt
ctccatcctg cttctctccc tcctcctctt cctcctcctc 840caacactggc
gtcagggaaa acacaggaca ttggcccaga gacaggctga tttccaacgt
900cctccagggg ctgccgagcc agagcccaag gacgggggcc tacagaggag
gtccagccca 960gctgctgacg tccagggaga aaacttctgt gctgccgtga
agaacacaca gcctgaggac 1020ggggtggaaa tggacactcg gagcccacac
gatgaagacc cccaggcagt gacgtatgcc 1080aaggtgaaac actccagacc
taggagagaa atggcctctc ctccctcccc actgtctggg 1140gaattcctgg
acacaaagga cagacaggca gaagaggaca gacagatgga cactgaggct
1200gctgcatctg aagcccccca ggatgtgacc tacgcccagc tgcacagctt
taccctcaga 1260cagaaggcaa ctgagcctcc tccatcccag gaaggggcct
ctccagctga gcccagtgtc 1320tatgccactc tggccatcca c
1341135458DNAArtificial SequenceThe sequence has been designed and
synthesized. 135tctacttgcc tgcctccctg cctctggcca tggcctgccg
gtgcctcagc ttccttctga 60tggggacctt cctgtcagtt tcccagacag tcctggccca
gctggatgca ctgctggtct 120tcccaggcca agtggctcaa ctctcctgca
cgctcagccc ccagcacgtc accatcaggg 180actacggtgt gtcctggtac
cagcagcggg caggcagtgc ccctcgatat ctcctctact 240accgctcgga
ggaggatcac caccggcctg ctgacatccc cgatcgattc tcggcagcca
300aggatgaggc ccacaatgcc tgtgtcctca ccattagtcc cgtgcagcct
gaagacgacg 360cggattacta ctgctctgtt ggctacggct ttagtcccta
ggggtggggt gtgagatggg 420tgcctcccct ctgcctccca tttctgcccc tgaccttg 458
SEQ ID NO: 136 | Length: 2692 | Type: DNA | Organism: Artificial Sequence
Feature: The sequence has been designed and synthesized.
ccacgcgtcc ggggactaaa gcccagagag caagacagtt
gggcttagaa ggagcagcca 60gggcactgct tgagaaacca ggggagctca gtctgctatc
gtacattagg cctgacgtta 120aagggctttc aacgcttcag gatattgaaa
taggagtgca gcatatttta gcagatatga 180ttgctaaaga caaagacacg
cttgacttca ttcggaactt gtgccagaag agacatgttt 240gtatccagtc
atctctggca aaagtatcct caaaaaaggt aaatgagaaa gatgttgata
300agtttctgct ctaccagcat ttttcctgca acataagaaa cattcaccat
catcagattc 360tggcaattaa ccgtggagaa aatttgaagg tactgacggt
taaggtcaat atttctgatg 420gagtgaagga tgaattctgt aggtggtgca
tccaaaacag gtggagacca cgtagctttg 480caaggccaga gttaatgaag
atcttatata attcactgaa tgattccttt aaacgcctta 540tttatcctct
tctctgtaga gaattcagag ccaaactaac atcagatgca gagaaggaat
600cagtaatgat gtttggacgg aaccttcgtc agctcctttt aacaagccct
gttccagggc 660gcaccttaat gggagtggat cctggttata aacatggttg
caaattagct ataatttctc 720ctactagtca gatacttcat actgatgtgg
tttacttgca ttgtggacaa ggcttccgag 780aggcggagaa aataaagaca
cttttgctga atttcaactg cagcacagta gtgattggaa 840atggaactgc
ctgcagggaa acagaagctt actttgctga cctgataatg aagaattatt
900ttgcaccact ggatgttgtt tactgtatcg tcagtgaagc aggagcatca
atctacagtg 960tcagccctga agctaacaaa gagatgccag ggctggaccc
taatttgaga agtgcagttt 1020ccatagcaag gcgtgtacaa gatccattag
ctgagctagt gaaaattgag ccaaagcaca 1080ttggagttgg aatgtatcag
catgacgtat cccagacttt actcaaggca acactggaca 1140gtgttgtaga
agaatgtgtc agctttgtgg gagtggatat taacatctgt tcagaagttt
1200tgttaaggca tattgcagga ctcaatgcca acagggccaa aaatattatt
gaatggcgag 1260agaaaaatgg accctttatc aaccgagaac agctgaagaa
agtgaaaggg ctgggcccaa 1320aatccttcca acagtgtgct ggcttcatca
gaatcaacca ggattatatc cgaacgtttt 1380gcagtcagca aactgaaact
tcaggccaaa ttcaaggagt tgctgtgaca tcttcagcag 1440acgttgaggt
cacaaatgag aagcagggca aaaagaagag caaaactgca gtgaatgttt
1500tactgaagcc aaatcctttg gaccaaactt gtattcatcc agaatcatat
gacatagcaa 1560tgaggttttt gtcatccatt ggagggacac tgtatgaggt
tggaaagcct gaaatgcaac 1620aaaaaataaa ttcattcctt gaaaaggaag
gaatggagaa aattgcagaa agattgcaaa 1680caacagtaca caccttacag
gtcatcatag atggtctcag ccagcctgaa agctttgact 1740ttcgaacaga
ttttgataaa cctgatttca agagaagcat agtatgcctg gaagatctgc
1800agattgggac agttcttaca ggcaaagttg agaatgccac tctctttgga
atttttgtgg 1860atataggagt ggggaaatct gggctgattc ccatacgaaa
tgtaacagaa gcaaaacttt 1920caaaaacaaa gaagagaaga agccttggac
tgggccccgg agaaagagtg gaagtccaag 1980tactcaacat tgacatcccc
cgatctagga ttactctgga cctcattcgg gtgttatgag 2040tatcccacga
aggccagacg ctgattttat tttctcattt ccacagattg acaaggataa
2100gtcagttgtt tgtaaactct aggtagcaga tgagaaataa ttcacttaat
atcagaaata 2160ttttccaaac actttccttt attttttctt ctgaataaat
agaaaaccaa cagtttgatt 2220tccttttccc ttaaaggaaa caactaatac
acattcttat atggctttat gtagtaatag 2280ttttctgact aaaattttgt
tttttatttt ttgtaattta tctttaactc cttttgcatt 2340ttgtataaca
gattgcttaa cttctacttg ccaacatctg ccttgctgga cttgtatggg
2400attgtcttct tgatttgaat tgtaccgtct ttgttgacac agtagggctg
ggcagtgttt 2460aatccttcca ttttatagat ttttttttaa tcaggccttt
tggacttcat tcataatttt 2520gcaataatct cttttccctt gtcatgcaag
ccaaaaatat accagtaaaa cagattctga 2580cgtgtttgta gttatcaaat
gaatggctcg aaacacttct caaaaggata tacgtattga 2640ccccaacaat
aaatgtttgt ggctagtgaa aaaaaaaaaa aaaaaaaaaa aa 2692
SEQ ID NO: 137 | Length: 6643 | Type: DNA | Organism: Artificial Sequence
Feature: The sequence has been designed and synthesized.
gtcgccccgt ggccccacaa tgacccacag ccccgcgaca
agcgaggacg aggaacgcca 60cagtgccagc gagtgtcccg aggggggctc agagtccgac
agctccccag acgggccagg 120tcgaggcccc cgggggaccc ggggccaggg
cagtggggca cctggtagcc tggcctctgt 180tagaggcctc cagggccgct
caatgtccgt cccagacgac gcccacttca gcatgatggt 240cttcaggatt
ggcatcccgg acctgcacca gacaaaatgc cttcgcttca accccgatgc
300caccatctgg acggccaagc agcaggtgct ctgtgccctg agcgagagcc
tgcaggatgt 360gctcaactat ggcctgttcc aaccggccac ctccggccgc
gatgccaact tcctggagga 420ggagaggctg ctgcgggagt acccccagtc
ctttgagaag ggggtcccct acctggagtt 480ccgatacaag acccgagttt
acaaacagac caacctggat gagaagcagc tggccaagtt 540gcacacgaag
acggggttga agaagttcct ggagtatgtg cagctcggga catctgacaa
600ggtggcgcgg ctgctggaca aggggctgga ccccaattac catgactcgg
attcgggaga 660gacccccttg acactggcgg cccagaccga aggctctgta
gaggtgattc gaaccctgtg 720cctgggcggg gcccacattg acttccgggc
ccgggatggc atgaccgcac tgcataaggc 780cgcatgcgcc cgacactgcc
tggcactcac ggcgctcctg gaccttgggg gttcccccaa 840ctacaaggac
cgtcgggggc tgacccctct gttccacacg gccatggtgg gtggtgaccc
900ccgatgctgc gagctgctcc tgttcaacag ggcccagctg ggcatagctg
atgagaacgg 960ctggcaggaa atccaccagg cctgccagcg gggtcactct
cagcacctgg agcatctgct 1020tttctacggg gctgagcctg gagcccagaa
cgcctcgggg aacacggctc tgcacatctg 1080cgccctctac aacaaggaga
cctgtgccag gatcctcctg tatcgaggtg ccgacaagga 1140tgtgaagaac
aacaacggac agaccccctt ccaggtggca gtgattgctg ggaattttga
1200gctgggggag ctgatccgaa accaccgaga acaggatgtg gtgcccttcc
aggagtcccc 1260caagtacgcg gcccggcgac gggggccccc aggcacaggg
ctgacggtgc ccccggcgct 1320gctgcgggcc aacagtgaca ccagcatggc
gctgcccgac tggatggtgt tctccgcccc 1380gggggccgcg tcctctgggg
cccctggccc tacctcaggg tcccagggcc agtcgcagcc 1440ctcggccccc
accaccaagc tcagcagcgg gaccctccga agtgccagca gcccccgggg
1500tgccagggcc cgctctccat cccgagggag gcaccctgag gacgccaaga
ggcagccccg 1560aggccggccc agctccagcg ggacaccccg ggaagggcca
gccgggggca cggggggctc 1620agggggcccc gggggctccc tgggcagccg
cgggaggcgg aggaagctct actcagcggt 1680acccggacgc tccttcatgg
ctgtgaagtc ctaccaggcc caagccgagg gggagatctc 1740cctgagcaag
ggcgagaaga tcaaagtact tagcatcggg gaaggaggct tctgggaagg
1800ccaggtcaaa ggtcgtgttg gctggttccc ctctgactgc ctggaagaag
tggcgaatcg 1860ctctcaggag agcaagcaag aaagccgcag tgacaaggca
aagagactct tccggcatta 1920taccgtgggc tcctacgaca gctttgatgc
cccaagctta atggatggga ttggcccagg 1980gagcgattac atcattaagg
agaagacagt cttgctgcag aagaaggaca gtgaggggtt 2040tgggttcgtg
ctccgggggg ccaaggcgca gacccccatc gaggagttca cccccacccc
2100ggccttcccg gcgctgcagt acctggagtc ggtggacgag ggtggcgtgg
catggcgagc 2160tggactgcga atgggagact tcctcatcga ggtgaacggg
cagaatgtgg tgaaggtcgg 2220ccaccgacag gtggtgaaca tgatccgcca
agggggcaac acgctgatgg tgaaggtggt 2280gatggtcacc aggcacccgg
acatggatga ggcagtgcac aagaaagcac cccagcaggc 2340caagcggctg
ccgcccccaa ccatctccct gcgttccaaa tctatgacct cagagctgga
2400ggagatggag tacgagcagc agccggcgcc ggtgcccagc atggagaaaa
agcggaccgt 2460gtatcagatg gctctcaaca aactggacga aatcctggcc
gcagctcaac agaccatcag 2520tgcaagcgaa agccctggtc ccggtggcct
cgcgtccctg ggcaaacacc gacccaaagg 2580tttctttgcc actgagtcga
gcttcgatcc ccaccaccgt gcccagccaa gttacgagcg 2640tccttctttc
ctgcctccag gacctgggtt gatgctccgg caaaaatcta tcggtgcggc
2700agaagatgac agaccttacc tagcaccccc agccatgaaa ttcagccgca
gcctgtctgt 2760gcctggttcg gaggacattc ccccgccacc caccacgtcc
ccaccggagc ctccctacag 2820cacacctcca gtcccctcct cctcagggcg
cctcaccccc tcccctcggg gagggccctt 2880caaccctggc tctggtggcc
ccctccccgc ctcctcccct gcatcctttg acgggccctc 2940ccctcccgac
actcgcgtgg ggagccgcga gaagagcctg taccacagtg ggcccctgcc
3000cccggcccac caccacccgc cccaccacca ccaccaccac gccccgcccc
ctcagcccca 3060ccaccaccac gcccaccccc ctcatcctcc cgagatggag
acaggcggct ctcccgacga 3120ccctccaccc cgcctggctc tggggcccca
gcccagcctg cgaggctgga ggggcggcgg 3180gcccagcccg accccggggg
ccccgtcccc atcgcaccac ggcagcgcgg gcgggggcgg 3240cggctcctcc
cagggcccgg ctctacgcta tttccagctg cccccgcggg cggccagcgc
3300agccatgtac gtgcccgccc gctcgggccg cggccgcaag ggcccgctgg
tcaagcagac 3360caaggtggaa ggcgagcccc agaagggcgg cggcctcccg
cccgcgccgt cgcccacgtc 3420cccggcctcc ccgcagccgc cgcccgccgt
ggccgcgccc tcggagaaga actccatccc 3480catccccacc atcatcatca
aggccccgtc caccagtagc agcggccgca gcagccaggg 3540cagcagcacc
gaggcggagc cccccaccca gccggagccc acgggaggcg gcggcggcgg
3600cggctcctcg cccagccccg ccccggccat gtcacccgtg cccccgtccc
cctcgcccgt 3660gcccaccccc gcctcgccca gcggcccggc cacgctggac
ttcacgagcc agttcggggc 3720cgccctggtg ggggcggccc ggagggaggg
gggctggcag aatgaggcgc gccggcgctc 3780cacgctgttc ctgtccaccg
acgcggggga cgaggacggc ggggacggcg ggctgggcac 3840aggggcggcc
ccgggcccgc ggctgcgcca ctccaaatcc atcgacgagg gcatgttctc
3900cgccgagccc tacctccgac tggagtctgc gggcagcggc gcgggctacg
gcggctacgg 3960ggccggtagc cgagcctacg ggggtggcgg gggcagcagc
gccttcacca gcttcctgcc 4020cccgcgaccc ctggtgcacc cgctgaccgg
caaggccctg gatcccgcct ccccgctggg 4080gctggccctg gccgcccgcg
agcgagcgct gaaggagtcc tcggagggcg gcggggcccc 4140ccagccgcct
cccaggcccc catcgccccg ctacgaggcc ccgccgccca ccccgcacca
4200ccactcgccc cacgcccacc acgagccagt gctgcgtctc tggggggcct
ccccgccgga 4260ccctgcgcgc cgggagctgg ggtacagggc cgggctgggc
agccaggaga agtcccttcc 4320cgccagcccg cccgccgccc ggcgttccct
gctacaccgc ctgccgccca ccgctcccgg 4380ggtggggccc ctcctgctgc
agctggggac ggagcccccg gccccgcacc ccggagtaag 4440caagccctgg
aggtccgcag cccccgaaga acccgagcgg ctgccgctgc acgtgcggtt
4500ccttgaaaac tgccagcccc gggcccctgt gacgagcgga aggggtcccc
cctcggagga 4560cgggccgggg gtcccgccgc ccagcccacg ccggtccgtg
cccccctccc cgacctcccc 4620gagggccagc gaagagaacg ggctgcccct
gctggtcctg ccgcctcccg ccccctcggt 4680ggatgtggaa gatggcgaat
tccttttcgt ggaaccgctg cctccgcctc tggaattctc 4740caacagcttc
gaaaagccag agtcgcccct cacgcctggg cctccccacc cgctgcccga
4800cacacctgcc cctgccaccc cgttaccccc tgtgccaccc ccggctgtgg
ccgcagcccc 4860tcccaccctg gactccaccg catccagcct gacatcctat
gacagcgagg tggccaccct 4920gacccagggg gcctccgccg ctcctgggga
cccccatcca ccaggcccgc ctgccccagc 4980agcaccggct cccgctgccc
cacagcctgg cccggaccct ccgcctggca cggattctgg 5040catcgaggag
gtggacagtc ggagcagcag tgaccaccca ctggagacca tcagcagcgc
5100ctccacgctg agcagcctat ctgccgaagg tggtggcagc gcagggggtg
ggggcggggc 5160tggggccggt gtggccagtg ggccggagct tctggacacc
tatgtggcct acctggacgg 5220ccaggccttt gggggcagca gtactcccgg
cccgccatac cctcctcagc tcatgactcc 5280ctctaagctc cggggccggg
cgctaggagc cagcggaggc ctgcggcctg gccccagcgg 5340gggactccga
gaccctgtta cccccaccag ccccaccgtc tcggtgacag gggctggaac
5400cgatgggctg ctggccctgc gtgcttgttc aggacccccc acggcaggcg
tggcgggggg 5460tccggtggct gtagagccag aagtcccacc ggtgcccttg
ccgacggcct cctctctgcc 5520ccggaagctg ctgccctggg aggagggccc
gggcccaccg ccaccacctc tgcccgggcc 5580cttggcccag cctcaggcct
cagccttggc cacagtaaaa gccagcatca tcagtgaact 5640cagctccaag
cttcagcagt ttgggggctc ctcggcagct ggcggcgctc tgccctgggc
5700ccgaggaggc agtgggggag gcggagacag ccaccacggg ggagccagct
atgtccccga 5760gaggacctcc tccctgcagc ggcagagact ctccgacgac
tcccagtcct cactcctctc 5820caagcctgtc agcagcctgt ttcagaactg
gcccaaacca cctctgccgc cactccccac 5880cggaacaggg gtctccccta
cagccgctgc ggccccaggg gccacctcac cctcagcctc 5940ctcctcctcc
acgtccaccc gccacctcca gggcgtggag ttcgagatgc ggccccctct
6000gctccgccgg gcccccagcc cctcgctgct gcccgcctcg gagcacaagg
tcagccctgc 6060gcccaggccc tcgtccctgc ccatcctgcc ttccggaccc
ctctacccag gcctctttga 6120catccgtggc tccccaactg gaggggcagg
aggctcggct gaccccttcg ccccagtctt 6180tgtgccgcca cacccgggga
tatccggggg gctcggggga gccttgtcag gggcctcgcg 6240ctccctctca
ccgacccgcc tgctctcgct gcccccggac aagccgtttg gcgctaaacc
6300tctggggttc tggaccaagt tcgacgtggc tgattggctg gagtggctgg
gtttggcgga 6360gcaccgagcc cagttcctgg accacgagat cgatggctcc
cacctgcccg ccttgaccaa 6420ggaggactac gtcgatctag gtgtgaccag
ggtgggccac cgcatgaaca tcgaccgggc 6480tctcaaattc ttcctggaga
ggtgatggct ggcctggacg gaccagcccc gtccacagaa 6540ctcttgagcc
tgctggcctc ttgacctctg acccctgact gtcattctct ccccgggcca
6600gggactctgt tcaaactgcg ccctgccctc atctcccaag gcc 6643
SEQ ID NO: 138 | Length: 3318 | Type: DNA | Organism: Artificial Sequence
Feature: The sequence has been designed and synthesized.
gtccttccca gggtgtgtta gcgtctctcc tacggcgccc
gcctagcccg gtggtcccaa 60ccccctgcgg gagctgcaca cgccccggaa acctgggaca
gaaactgagt ccctctcctt 120cctggtggtg gtgacagcac ctgctcagat
ctggtcggac gccgccggcc ggagcaccca 180gcccggcgga gaaggagctc
gcccggcgct ggggactggg acctggagcc ccttccccta 240ccgcacgtac
gccccgcccc gcgcacgccc gcccgcccgc ccgcgcctgg cgcagcttca
300ctccggatgg
ttcctgtcct cccgcgggtc cgagggcgct ggaaacccag cggcggcgaa
360gcggagagga gccccgcgcg tctccgcccg cacggctcca ggtctggggt
ctgcgctgga 420gccgcgcggg gagaggccgt ctctgcgacc gccgcgcccg
ctcccgaccg tccgggtccg 480cggccagccc ggccaccagc catgggctct
ggcccgctct cgctgcccct ggcgctgtcg 540ccgccgcggc tgctgctgct
gctgctgctg tctctgctgc cagtggccag ggcctcagag 600gctgagcacc
gtctatttga gcggctgttt gaagattaca atgagatcat ccggcctgta
660gccaacgtgt ctgacccagt catcatccat ttcgaggtgt ccatgtctca
gctggtgaag 720gtggatgaag taaaccagat catggagacc aacctgtggc
tcaagcaaat ctggaatgac 780tacaagctga aatggaaccc ctctgactat
ggtggggcag agttcatgcg tgtccctgca 840cagaagatct ggaagccaga
cattgtgctg tataacaatg ctgttgggga tttccaggtg 900gacgacaaga
ccaaagcctt actcaagtac actggggagg tgacttggat acctccggcc
960atctttaaga gctcctgtaa aatcgacgtg acctacttcc cgtttgatta
ccaaaactgt 1020accatgaagt tcggttcctg gtcctacgat aaggcgaaaa
tcgatctggt cctgatcggc 1080tcttccatga acctcaagga ctattgggag
agcggcgagt gggccatcat caaagcccca 1140ggctacaaac acgacatcaa
gtacaactgc tgcgaggaga tctaccccga catcacatac 1200tcgctgtaca
tccggcgcct gcccttgttc tacaccatca acctcatcat cccctgcctg
1260ctcatctcct tcctcactgt gctcgtcttc tacctgccct ccgactgcgg
tgagaaggtg 1320accctgtgca tttctgtcct cctctccctg acggtgtttc
tcctggtgat cactgagacc 1380atcccttcca cctcgctggt catccccctg
attggagagt acctcctgtt caccatgatt 1440tttgtaacct tgtccatcgt
catcaccgtc ttcgtgctca acgtgcacta cagaaccccg 1500acgacacaca
caatgccctc atgggtgaag actgtattct tgaacctgct ccccagggtc
1560atgttcatga ccaggccaac aagcaacgag ggcaacgctc agaagccgag
gcccctctac 1620ggtgccgagc tctcaaatct gaattgcttc agccgcgcag
agtccaaagg ctgcaaggag 1680ggctacccct gccaggacgg gatgtgtggt
tactgccacc accgcaggat aaaaatctcc 1740aatttcagtg ctaacctcac
gagaagctct agttctgaat ctgttgatgc tgtgctgtcc 1800ctctctgctt
tgtcaccaga aatcaaagaa gccatccaaa gtgtcaagta tattgctgaa
1860aatatgaaag cacaaaatga agccaaagag attcaagatg attggaagta
tgttgccatg 1920gtgattgatc gtatttttct gtgggttttc accctggtgt
gcattctagg gacagcagga 1980ttgtttctgc aacccctgat ggccagggaa
gatgcataag cactaagctg tgtgcctgcc 2040tgggagactt ccttgtgtca
gggcaggagg aggctgcttc ctagtaagaa cgtactttct 2100gttatcaagc
taccagcttt gtttttggca tttcgaggtt tacttatttt ccacttatct
2160tggaatcatg caaaaaaaaa aatgtcaaga gtatttatta ccgataaatg
aacatttaac 2220tagccttttt ggtatggtaa agagatgtca aaatgtgatt
ctatgtgatt agtatgctat 2280gctatggaat atacatgtaa aaatgtttcc
ttttagttgt tgaaacaaaa ctggatagaa 2340aaatgctgtt cagaaatatg
aaaagtcatt cagttatcac tacagatctc ccagtaattt 2400ttcttattta
gcccataatc tctttgaagg tttatactaa ttcagcaatc ccccatcgtt
2460acccatttct taccatgcat ttctcgttct ttactgggtc taaagggcta
tgcctccatt 2520tcagagagct tcaactactt ctcttgcata cttctaaatt
ataccatgag aaatcatgcc 2580tagttattca ttgttaatat aactgtctta
gtacaccata aactgggtgg attataaaca 2640acagaaactt ctcagttttg
gaggttggga ggtccaaggt caaggcacca gcaaatttgg 2700tgtctggtga
gggtcctctt cctcaaaggg tgccttctag ctgtgtcctc acatgactga
2760agggactagc tatctctgtg gggtctattt tataagggca ctaaccccat
tcatgagagc 2820agagccccca tggcctaatc acctttccaa ggccccacct
tctatctaag acaatcacgc 2880tgggaatagg tttcaacata tgaattgggg
gaggacacat ttggaccaca gcatgaacct 2940ttagaacagg gtttctcagc
cttagcacta tggacatttt gggctggata aatatgtgtt 3000ggtacagaat
gggggtatcc tgtgcattgt aggatcttta gcagtaccct agcctcaact
3060cactagatgc caatgacata ccttgcttct tcaccagtta tgataaccaa
gaatgtctcc 3120attgttaaat gttcccttag gagcaaaatt gcccctggtt
gagaaacatt gctttagaca 3180aattgttaag agtatcatgt actacacttc
tgaaacttaa cgtgatcatc accactgaca 3240gatgattcac agagactgtt
tgaatcttgt ctcactagtt tttcctgtgc aaaaataaaa 3300tggacagaat tgcagccc
33181391389DNAArtificial SequenceThe sequence has been designed and
synthesized. 139atggccagca acagcagctc ctgcccgaca cctgggggcg
ggcacctcaa tgggtacccg 60gtgcctccct acgccttctt cttcccccct atgctgggtg
gactctcccc gccaggcgct 120ctgaccactc tccagcacca gcttccagtt
agtggatata gcacaccatc cccagccacc 180attgagaccc agagcagcag
ttctgaagag atagtgccca gccctccctc gccaccccct 240ctaccccgca
tctacaagcc ttgctttgtc tgtcaggaca agtcctcagg ctaccactat
300ggggtcagcg cctgtgaggg ctgcaagggc ttcttccgcc gcagcatcca
gaagaacatg 360gtgtacacgt gtcaccggga caagaactgc atcatcaaca
aggtgacccg gaaccgctgc 420cagtactgcc gactgcagaa gtgctttgaa
gtgggcatgt ccaaggagtc tgtgagaaac 480gaccgaaaca agaagaagaa
ggaggtgccc aagcccgagt gctctgagag ctacacgctg 540acgccggagg
tgggggagct cattgagaag gtgcgcaaag cgcaccagga aaccttccct
600gccctctgcc agctgggcaa atacactacg aacaacagct cagaacaacg
tgtctctctg 660gacattgacc tctgggacaa gttcagtgaa ctctccacca
agtgcatcat taagactgtg 720gagttcgcca agcagctgcc cggcttcacc
accctcacca tcgccgacca gatcaccctc 780ctcaaggctg cctgcctgga
catcctgatc ctgcggatct gcactcggta cacgcccgag 840caggacacca
tgaccttctc ggacgggctg accctgaacc ggacccagat gcacaacgct
900ggcttcggcc ccctcaccga cctggtcttt gccttcgcca accagctgct
gcccctggag 960atggatgatg cggagacggg gctgctcagc gccatctgcc
tcatctgcgg agaccgccag 1020gacctggagc agccggaccg ggtggacatg
ctgcaggagc cgctgctgga ggcgctaaag 1080gtctacgtgc ggaagcggag
gcccagccgc ccccacatgt tccccaagat gctaatgaag 1140attactgacc
tgcgaagcat cagcgccaag ggggctgagc gggtgatcac gctgaagatg
1200gagatcccgg gctccatgcc gcctctcatc caggaaatgt tggagaactc
agagggcctg 1260gacactctga gcggacagcc ggggggtggg gggcgggacg
ggggtggcct ggcccccccg 1320ccaggcagct gtagccccag cctcagcccc
agctccaaca gaagcagccc ggccacccac 1380tccccttaa
13891401973DNAArtificial SequenceThe sequence has been designed and
synthesized. 140tggcatcccc cagccgccgc cagccccgcc gaggggagcc
agcgccgtct ctgaggggcg 60tccggcgccg gagccatgac cctccgccga ctcaggaagc
tgcagcagaa ggaggaggcg 120gcggccaccc cggaccccgc cgcccggact
cccgactcgg aagtcgcgcc cgccgctccg 180gtcccgaccc cgggaccccc
tgccgcagcc gccacccctg ggcccccagc ggacgagctg 240tacgcggcgc
tggaggacta tcaccctgcc gagctgtacc gcgcgctcgc cgtgtccggg
300ggcaccctgc cccgccgaaa gggctcagga ttccgctgga agaatctcag
ccagagtcct 360gaacagcagc ggaaagtgct gacgttggag aaggaggata
accagacctt cggctttgag 420atccagactt atggccttca ccaccgggag
gagcagcgtg tggaaatggt gacctttgtc 480tgccgagttc atgagtctag
ccctgcccag ctggctgggc tcacaccagg ggacaccatc 540gccagcgtca
atggcctgaa tgtggaaggc atccggcatc gagagattgt ggacatcatt
600aaggcgtcag gcaatgttct cagactggaa actctatatg ggacatcaat
tcggaaggca 660gaactggagg ctcgtctgca gtacctgaag caaaccctgt
atgagaagtg gggagagtac 720aggtccctaa tggtgcagga gcagcggctg
gtgcatggcc tggtggtgaa ggaccccagc 780atctacgaca cgctggagtc
ggtgcgctcc tgcctctacg gcgcgggcct gctcccgggc 840tcgctgccct
tcgggcctct gctcgccgtg cccgggcgtc cccgcggagg cgcccgacgg
900gccaggggcg acgccgacga cgccgtctac cacacgtgct tcttcgggga
ctccgagccg 960ccggcgctgc cgcccccgcc gcccccggcc cgcgccttcg
gcccgggccc cgccgagacc 1020cctgccgtgg ggccgggccc tgggccgcgg
gccgcgctga gccgcagcgc cagtgtgcgg 1080tgcgcgggcc ctggcggggg
cggaggcggg ggcgcgccgg gcgcgctctg gactgaggct 1140cgcgagcagg
ccctatgcgg ccccggcctg cgcaaaacca agtaccgcag cttccgccgg
1200cggctgctca agttcatccc cggactcaac cgctccctgg aggaggagga
gagccagctg 1260taggggcggg ggcgggcagg gaggtattta tttatttatt
cgcaacagcc agcgctaaaa 1320gagggggagg ccgagccaag aggaccccag
gagcccagag cagcgggaga gggtccttcc 1380tagcctcggc ccgccgggtc
ggttcctggc tggtgtctgc tgagggagtg gggggcccag 1440ccccttctct
tctcccccgc caaaccacag tgggagctgg ggcaggggga gagccaggca
1500atcgggggcc aaagatgggg gtgctcgcct acagtctgca tctgtagtgc
cttgtggggt 1560atccaggaac accctcccag caggggatgg gaaccctgtc
ccatgaagcc ctctcctcag 1620ctttacttgc tcccccgccc ttagccttgg
ggagaaatgg cccgtggtgg gctgaccccc 1680caccctccac acacacagtt
ccatgaccca gcgggccccc aggggcatca ggtgctggtc 1740ctcctccctc
ctggcctcga cccctaaggg cttcgcccct cccaggggcc tgtaactaag
1800tcgggtcctg ccaggcaggg ggcctgtgtt ctgtgcccct tgggagacag
gaactggcga 1860gttcaggtgg ggtggggaca gcacagactg ttccaccgtt
gtgcatattg ttgcttctga 1920accacaaact gtataaatgg atggtttttt
gcaaaaaaaa aaaaaaaaaa aaa 1973
* * * * *
References