Selective Oxidation Of 5-methylcytosine By Tet-family Proteins Rao; Anjana ; et al. [The Children's Medical Center Corporation]

Selective Oxidation Of 5-methylcytosine By Tet-family Proteins

Rao; Anjana ; et al.

Patent Application Summary

U.S. patent application number 16/172369 was filed with the patent office on 2019-02-14 for selective oxidation of 5-methylcytosine by tet-family proteins. This patent application is currently assigned to The Children's Medical Center Corporation. The applicant listed for this patent is The Children's Medical Center Corporation, The United States of America, As Represented by the Secretary, Department of Health & Human Servic. Invention is credited to Suneet Agarwal, Aravind Iyer, Kian Peng Koh, Anjana Rao, Mamta Tahiliani.

Application Number	20190048407 16/172369
Document ID	/
Family ID	42060425
Filed Date	2019-02-14

View All Diagrams

United States Patent Application	20190048407
Kind Code	A1
Rao; Anjana ; et al.	February 14, 2019

SELECTIVE OXIDATION OF 5-METHYLCYTOSINE BY TET-FAMILY PROTEINS

Abstract

The present invention provides for novel methods for regulating and detecting the cytosine methylation status of DNA. The invention is based upon identification of a novel and surprising catalytic activity for the family of TET proteins, namely TET1, TET2, TET3, and CXXC4. The novel activity is related to the enzymes being capable of converting the cytosine nucleotide 5-methylcytosine into 5-hydroxymethylcytosine by hydroxylation.

Inventors:

Rao; Anjana; (La Jolla, CA) ; Tahiliani; Mamta; (New York, NY) ; Koh; Kian Peng; (Jamaica Plain, MA) ; Agarwal; Suneet; (Belmont, MA) ; Iyer; Aravind; (Bethesda, MD)

Applicant:

Name	City	State	Country	Type
The Children's Medical Center Corporation The United States of America, As Represented by the Secretary, Department of Health & Human Servic	Boston Bethesda	MA MD	US US

Assignee:

The Children's Medical Center Corporation
Boston
MA

The United States of America, As Represented by the Secretary, Department of Health & Human Servic
Bethesda
MD

Family ID:

42060425

Appl. No.:

16/172369

Filed:

October 26, 2018

Related U.S. Patent Documents


Application Number	Filing Date	Patent Number
16169801	Oct 24, 2018
16172369
15341344	Nov 2, 2016
16169801
15193796	Jun 27, 2016
15341344
13795739	Mar 12, 2013	9447452
15193796
13120861	Jun 7, 2011	9115386
PCT/US2009/058562	Sep 28, 2009
13795739
61100503	Sep 26, 2008
61100995	Sep 29, 2008
61121844	Dec 11, 2008

Current U.S. Class:	1/1
Current CPC Class:	C12Q 2521/531 20130101; G01N 33/5308 20130101; G01N 2500/00 20130101; C12Q 1/6806 20130101; C12Q 2537/164 20130101; C12N 2501/71 20130101; C12Q 1/6869 20130101; C12Q 1/26 20130101; C12N 2501/70 20130101; G01N 33/5011 20130101; C12Q 1/6827 20130101; C12Q 2600/154 20130101; C12N 9/0071 20130101; C12Q 2522/10 20130101; C12Q 1/6827 20130101; C12Q 2537/164 20130101; C12Q 2522/10 20130101; C12Q 2521/531 20130101
International Class:	C12Q 1/6827 20060101 C12Q001/6827; C12Q 1/26 20060101 C12Q001/26; G01N 33/574 20060101 G01N033/574; G01N 33/50 20060101 G01N033/50; C12N 5/00 20060101 C12N005/00; C12N 9/02 20060101 C12N009/02; G01N 33/53 20060101 G01N033/53; C12Q 1/6806 20060101 C12Q001/6806; C12N 5/074 20060101 C12N005/074; C12N 15/873 20060101 C12N015/873; C12N 5/0783 20060101 C12N005/0783; C12Q 1/6869 20060101 C12Q001/6869; C12Q 1/6886 20060101 C12Q001/6886

Goverment Interests

GOVERNMENT SUPPORT

[0002] This invention was made with Government Support under Grant No: RO1 AI44432 and Grant No. KO8 HL089150 awarded by the National Institutes of Health (NIH). The Government has certain rights in the invention.

Claims

1. A method comprising: labeling a hydroxyl group on a hydroxymethylated residue in a nucleic acid to generate a labeled hydroxymethylated residue, wherein said nucleic acid is from an extracellular fluid sample; and sequencing said nucleic acid comprising said labeled hydroxymethylated residue.

2. The method of claim 1, wherein said extracellular fluid sample is from a mammal.

3. The method of claim 1, wherein said nucleic acid is a mammalian nucleic acid.

4. The method of claim 1, wherein said labeling is covalently labeling.

5. The method of claim 1, wherein said hydroxymethylated residue is a 5-hydroxymethylcytosine.

6. The method of claim 5, wherein said labeling comprises glycosylating said 5-hydroxymethylcytosine.

7. The method of claim 1, wherein said nucleic acid further comprises a methylated cytosine residue.

8. The method of claim 7, wherein said methylated cytosine residue is a 5-methylcytosine.

9. The method of claim 1, wherein said sequencing comprises high-throughput sequencing.

10. The method of claim 1, further comprising binding said labeled hydroxymethylated residue to a support.

11. The method of claim 10, wherein said binding occurs prior to said sequencing.

12. The method of claim 1, wherein said labelling comprises associating a label with said hydroxymethylated residue.

13. The method of claim 12, wherein said label comprises a sugar.

14. The method of claim 12, wherein said label comprises a bead.

15. A composition comprising a nucleic acid from an extracellular fluid sample, wherein said nucleic acid comprises a covalently labeled hydroxyl group on a hydroxymethylated residue.

16. The composition of claim 15, wherein said extracellular fluid sample is from a mammal.

17. The composition of claim 15, wherein said hydroxymethylated residue is a 5-hydroxymethylcytosine.

18. The composition of claim 15, wherein said extracellular fluid sample is isolated from a subject.

19. The composition of claim 15, wherein a label that is covalently associated with said hydroxyl group comprises a sugar.

20. The composition of claim 19, wherein said sugar comprises a modified glucose.

Description

CROSS REFERENCE TO RELATED APPLICATIONS

[0001] This application is a continuation application under 35 U.S.C. .sctn. 120 of co-pending U.S. application Ser. No. 16/169,801 filed Oct. 24, 2018, which is a continuation application under 35 U.S.C. .sctn. 120 of co-pending U.S. application Ser. No. 15/341,344 filed Nov. 2, 2016, which is a continuation application under 35 U.S.C. .sctn. 120 of co-pending U.S. application Ser. No. 15/193,796 filed Jun. 27, 2016, which is a continuation application under 35 U.S.C. .sctn. 120 of U.S. application Ser. No. 13/795,739 filed Mar. 12, 2013, now U.S. Pat. No. 9,447,452, issued Sep. 20, 2016, which is a continuation application under 35 U.S.C. .sctn. 120 of U.S. application Ser. No. 13/120,861 filed on Jun. 7, 2011, now U.S. Pat. No. 9,115,386, issued Aug. 25, 2015, which is a 35 U.S.C. .sctn. 371 National Phase Entry Application of International Application No. PCT/US2009/058562 filed Sep. 28, 2009, which designates the United States, and which claims benefit under 35 U.S.C. .sctn. 119(e) of U.S. Provisional Patent Application Ser. No. 61/100,503 filed Sep. 26, 2008, U.S. Provisional Patent Application Ser. No. 61/100,995 filed Sep. 29, 2008, and U.S. Provisional Patent Application Ser. No. 61/121,844 filed on Dec. 11, 2008, the contents of which are incorporated herein in their entirety by reference.

FIELD OF THE INVENTION

[0003] The present invention relates to enzymes with novel hydroxylase activity and methods for uses thereof, and methods of labeling and detecting methylated residues.

SEQUENCE LISTING

[0004] The instant application contains a Sequence Listing which has been submitted in ASCII format via EFS-Web and is hereby incorporated by reference in its entirety. Said ASCII copy, created on Mar. 24, 2011, is named 20110324_Seq_List_TXT_033393_063004_US.TXT and is 147,751 bytes in size.

BACKGROUND OF THE INVENTION

[0005] DNA methylation and demethylation play vital roles in various aspects of mammalian development, as well as in somatic cells during differentiation and aging. Importantly, these processes are known to become highly aberrant during tumorigenesis and cancer (A. Bird, Genes Dev 16: 6-21 (2002); W. Reik, Nature 447: 425-432 (2007); K. Hochedlinger, Nature 441: 1061-1067 (2006); M. A. Surani Cell 128: 747-762 (2007); J. B. Gurdon, Annu Rev Cell Dev Biol 22: 1-22 (2006)).

[0006] In mammals, DNA methylation occurs primarily on cytosine in the context of the dinucleotide CpG. DNA methylation is dynamic during early embryogenesis and plays crucial roles in parental imprinting, X-inactivation, and silencing of endogenous retroviruses. Embryonic development is accompanied by major changes in the methylation status of individual genes, whole chromosomes and, at certain times, the entire genome (A. Bird, Genes Dev 16: 6-21 (2002); W. Reik, Nature 447: 425-432 (2007); K. Hochedlinger, Nature 441: 1061-1067 (2006); M. A. Surani Cell 128: 747-762 (2007); J. B. Gurdon, Annu Rev Cell Dev Biol 22: 1-22 (2006)). For example, there is active genome-wide demethylation of the paternal genome shortly after fertilization (W. Mayer, Nature 403: 501-502 (2000); J. Oswald, Curr Biol 10: 475-478 (2000)). DNA demethylation is also an important mechanism by which germ cells are reprogrammed: the development of primordial germ cells (PGC) during early embryogenesis involves widespread DNA demethylation mediated by an active (i.e. replication-independent) mechanism (A. Bird, Genes Dev 16: 6-21 (2002); W. Reik, Nature 447: 425-432 (2007); K. Hochedlinger, Nature 441: 1061-1067 (2006); M. A. Surani Cell 128: 747-762 (2007); P. Hajkova, Nature 452: 877-881 (2008); N. Geijsen, Nature 427: 148-154 (2004)).

[0007] De novo DNA methylation and demethylation mechanisms are also prominent in somatic cells during differentiation and aging. Expression of differentiation-specific genes in somatic cells is often accompanied by progressive DNA demethylation (W. Reik, Nature 447: 425-432 (2007); K. Hochedlinger, Nature 441: 1061-1067 (2006); M. A. Surani Cell 128: 747-762 (2007)). Tight regulation of DNA demethylation is a feature of pluripotent stem cells and progenitor cells in cellular differentiation pathways, which could contribute to the ability of these cells to self-renew, as well as give rise to daughter differentiating cells (W. Reik, Nature 447: 425-432 (2007); K. Hochedlinger, Nature 441: 1061-1067 (2006); M. A. Surani Cell 128: 747-762 (2007); J. B. Gurdon, Annu Rev Cell Dev Biol 22: 1-22 (2006); S. Simonsson Nat Cell Biol 6: 984-990 (2004); R. Blelloch, Stem Cells 24: 2007-2013 (2006)).

[0008] It is believed that two important aspects of stem cell function, pluripotency and self-renewal ability, require proper DNA demethylation, and hence, the ability to manipulate these stem cell functions could be improved by controlled expression of enzymes in the DNA demethylation pathway. The epigenetic reprogramming of somatic nuclei during somatic cell nuclei transfer (SCNT) may also require proper control of DNA demethylation pathways (W. Reik, Nature 447: 425-432 (2007); K. Hochedlinger, Nature 441: 1061-1067 (2006); M. A. Surani Cell 128: 747-762 (2007); J. B. Gurdon, Annu Rev Cell Dev Biol 22: 1-22 (2006); S. Simonsson (2004); R. Blelloch (2006)). For optimal efficiency of cloning by SCNT, regulated DNA demethylation may be required for nuclear reprogramming in the transferred somatic cell nucleus. Moreover, correct regulation of DNA demethylation could improve the efficiency with which induced pluripotent stem cells (iPS cells) are generated from adult fibroblasts or other somatic cells using pluripotency factors (K. Takahashi, Cell 126: 663-676 (2006); K. Takahashi, Cell 131: 861-872 (2007); J.Yu, Science 318: 1917-1920 (2007)).

[0009] DNA methylation processes are known to be highly aberrant in cancer. Overall, the genomes of cancer cells show a global loss of methylation, but additionally tumor suppressor genes are often silenced through increased methylation (L. T. Smith, Trends Genet 23: 449-456 (2007); E. N. Gal-Yam, Annu Rev Med 59: 267-280 (2008); M. Esteller, Nature Rev Cancer 8: 286-298 (2007); M. Esteller, N Engl J Med 358: 1148-1159 (2008)). Thus, oncogenesis is associated with aberrant regulation of the DNA methylation/demethylation pathway. Moreover, the self-renewing population of cancer stem cells can be characterized by high levels of DNA demethylase activity. Furthermore, in cultured breast cancer cells, gene expression in response to oestrogen has been shown to be accompanied by waves of apparent DNA demethylation and remethylation not coupled to replication (R. Metivier, Nature 452: 45-50 (2008); S. Kangaspeska, Nature 452:112-115 (2008)). It is presently unknown whether this apparent demethylation is due to full conversion of 5-methylcytosine (5mC) to cytosine, or whether it reflects a partial modification of 5-methylcytosine to a base not recognized by methyl-binding proteins or antibodies to 5-methylcytosine.

[0010] DNA demethylation can proceed by two possible mechanisms--a "passive" replication-dependent demethylation, or a process of active demethylation for which the molecular basis is still unknown. The passive demethylation mechanism is fairly well understood and is typically observed during cell differentiation, where it accompanies the increased expression of lineage-specific genes (D. U. Lee, Immunity, 16: 649-660 (2002)). Ordinarily, hemimethylated CpG's are generated during cell division as a result of replication of symmetrically-methylated DNA. These hemimethylated CpGs are recognized by the DNA methyltransferase (Dnmt) 1, which then transfers a methyl group to the opposing unmethylated cytosine to restore the symmetrical pattern of DNA methylation (H. Leonhardt, Cell 71: 865-873 (1992); L. S. Chuang, Science 277: 1996-2000 (1997)). If Dnmt1 activity or localization is inhibited, remethylation of the CpG on the opposite strand does not occur and only one of the two daughter strands retains cytosine methylation.

[0011] In contrast, enzymes with the ability to demethylate DNA by an active mechanism have not been identified as molecular entities. There is evidence that active DNA demethylation occurs in certain carefully-controlled circumstances, such as shortly after fertilization, and during early development of primordial germ cells (PGC) (W. Reik, Nature 447: 425-432 (2007); K. Hochedlinger, Nature 441: 1061-1067 (2006); M. A. Surani Cell 128: 747-762 (2007); J. B. Gurdon, Annu Rev Cell Dev Biol 22: 1-22 (2006); P. Hajkova, Nature 452: 877-881 (2008); N. Geijsen, Nature 427: 148-154 (2004)). The mechanism of active demethylation is not known, though various disparate mechanisms have been postulated (reviewed in (H. Cedar, Nature 397: 568-569 (1999); S. K. Ooi, Cell 133:1145-1148 (2008)). However, no proteins with these postulated activities have been reliably identified to date.

[0012] Overall, identification of molecules that play a role in active demethylation and methods to screen for changes in the methylation status of DNA would be important for the development of novel therapeutic strategies that interfere with or induce demethylation and monitor changes in the methylation status of cellular DNA.

SUMMARY OF THE INVENTION

[0013] The present invention provides for novel methods for regulating and detecting the cytosine methylation status of DNA. The invention is based upon identification of a novel and surprising catalytic activity for the family of TET proteins, namely TET1, TET2, TET3, and CXXC4. The novel activity is related to the enzymes being capable of converting the cytosine nucleotide 5-methylcytosine into 5-hydroxymethylcytosine by hydroxylation.

[0014] The invention provides, in part, novel methods and reagents to promote the reprogramming of somatic cells into pluripotent cells, for example, by increasing the rate and/or efficiency by which induced pluripotent stem (iPS) cells are generated, and for modulating pluripotency and cellular differentiation status. The inventors have made the surprising discovery that members of the TET family of enzymes are highly expressed in ES cells and iPS cells, and that a gain in pluripotency is associated with induction of members of the TET family of enzymes and the presence of 5-hydroxymethylcytosine, while a loss of pluripotency suppresses TET family enzyme expression and results in a loss of 5-hydroxymethylcytosine. Thus, the TET family of enzymes provide a novel set of non-transcription factor targets that can be used to modulate and regulate the differentiation status of cells. Accordingly, the invention provides novel reagents, such as TET family enzymes, functional TET family derivatives, or TET catalytic fragments for the reprogramming of somatic cells into pluripotent stem cells. This novel and surprising activity of the TET family proteins, and derivatives thereof, could also provide a way of improving the function of stem cells generally--any kind of stem cell, not just iPS cells. Examples include, but are not limited to, neuronal stem cells used to create dopaminergic neurons administered to patients with Parkison's or other neurodegenerative diseases etc, muscle stem cells administered to patients with muscular dystrophies, skin stem cells useful for treating burn patients, and pancreastic islet stem cells administered to patients with type I diabetes.

[0015] The invention also provides novel methods of diagnosing and treating individuals at risk for or having a myeloid cancer, such as a myeloproliferative disorder (MPD), a myelodysplatic syndrome (MDS), an acute myeloid leukemia (AML), a systemic mastocytosis, and a chronic myelomonocytic leukemia (CMML). The inventors have made the surprising discovery that TET family mutations have significant and profound effects on the hydroxymethylation status of DNA in cells, and that such defects can be detected using the methods of the invention, such as bisulfate treatment of nucleic acids and antibody-based detection of cytosine methylene sulfonate.

[0016] One aspect of the present invention also provides a method for improving the generation of stable human regulatory Foxp3+ T cells, the method comprising contacting a human T cell with, or delivering to a human T cell, an effective 5-methylcytosine to 5-hydroxymethylcytosine converting amount of at least one catalytically active TET family enzyme, functional TET family derivative, TET catalytic fragment or combination thereof. In one embodiment, one uses the entire protein of TET1, TET2, TET3, and CXXC4, or a nucleic acid molecule encoding such protein.

[0017] In one embodiment, the method of generating human regulatory Foxp3+ T cells further comprises contacting the human T cell with a composition comprising cytokines, growth-factors, and activating reagents. In one embodiment, the composition comprising cytokines, growth factors, and activating reagents comprises TGF-.beta..

[0018] Accordingly, in one aspect, the invention provides a method for improving the efficiency or rate with which induced pluripotent stem (iPS) cells can be produced from adult somatic cells. In one embodiment of this aspect, the method comprises contacting a somatic cell with, or delivering to a somatic cell being treated to undergo reprogramming, an effective amount of at least one catalytically active TET family enzyme, functional TET family derivative, TET catalytic fragment, or combination thereof, in combination with one or more known pluripotency factors, in vitro or in vivo. In one embodiment, one uses the entire catalytically active TET1, TET2, TET3, or CXXC4 protein, or a nucleic acid encoding such protein. In one embodiment, only a functional TET1, TET2, TET3, or CXXC4 derivative is used. In one embodiment, only a TET1, TET2, TET3, or CXXC4 catalytic fragment is used.

[0019] In one embodiment of the aspect, reprogramming is achieved by delivery of a combination of one or more nucleic acid sequences encoding Oct-4, Sox2, c-Myc, and Klf4 to a somatic cell. In another embodiment, the nucleic acid sequences of Oct-4, Sox2, c-MYC, and Klf4 are delivered using a viral vector, such as an adenoviral vector, a lentiviral vector, or a retroviral vector.

[0020] Another object of the invention is to provide a method for improving the efficiency of cloning mammals by nuclear transfer or nuclear transplantation.

[0021] Accordingly, in one aspect, the invention provides a method for improving the efficiency of cloning mammals by nuclear transfer or nuclear transplantation, the method comprising contacting a nucleus isolated from a cell during a typical nuclear transfer protocol with an effective hydroxylation-inducing amount of a catalytically active TET family enzyme, a functional TET family derivative, or a TET catalytic fragment thereof.

[0022] The invention is based, in part, upon identification of a novel and surprising hydroxylase activity for the family of TET proteins, namely TET1, TET2, TET3, and CXXC4, wherein the hydroxylase activity converts the cytosine nucleotide 5-methylcytosine into 5-hydroxymethylcytosine. However, because 5-hydroxymethylcytosine is not recognized either by the 5-methylcytosine binding protein MeCP2 (V.Valinluck, Nucleic Acids Research 32: 4100-4108 (2004)), or specific monoclonal antibodies directed against 5-methylcytosine, novel and inventive methods to detect 5-hydroxymethylcytosine are required.

[0023] Accordingly, one object of the present invention is directed to methods for the detection of the 5-hydroxymethylcytosine nucleotide in a sample.

[0024] In one aspect of the invention, an assay based on thin-layer chromatography (TLC) is used to detect 5-hydroxymethyl cytosine in a sample. In other aspects, the methods described herein generally involve direct detection of 5-hydroxymethyl cytosine with agents that recognize and specifically bind to it. These methods can be used singly or in combination to determine the hydroxymethylation status of cellular DNA or sequence information. In one aspect, these methods can be used to detect 5-hydroxymethylcytosine in cell nuclei for the purposes of immunohistochemistry. In another aspect, these methods can be used to immunoprecipitate DNA fragments containing 5-hydroxymethylcytosine from crosslinked DNA by chromatin immunopreciptation (ChIP).

[0025] Accordingly, in one embodiment of the aspects described herein, an antibody or antigen-binding portion thereof that specifically binds to 5-hydroxymethylcytosine is provided. In one embodiment, a hydroxymethyl cytosine-specific antibody, or hydroxymethyl cytosine-specific binding fragment thereof is provided to detect a 5-hydroxymethylcytosine nucleotide. Levels of unmethylated cytosine, methylated cytosine and hydroxymethylcytosine can also be assessed by using proteins that bind CpG, hydroxymethyl-CpG, methyl-CpG, hemi-methylated CpG as probes. Examples of such proteins are known (Ohki et al., EMBO J 1999; 18: 6653-6661; Allen et al., EMBO J 2006; 25: 4503-4512; Arita et al., Nature 2008; doi:10.1038/nature07249; Avvakumov et al., Nature 2008; doi:10.1038/nature07273). In some embodiments of these aspects, it may be desirable to engineer the antibody or antigen-binding portion thereof to increase its binding affinity or selectivity for the 5-hydroxymethylcytosine target site. In one embodiment, an antibody or antigen-fragment thereof that specifically binds cytosine-5-methylsulfonate is used to detect a 5-hydroxymethylcytosine nucleotide in a sample.

[0026] In one aspect, the invention also provides methods for screening for signaling pathways that activate or inhibit TET family enzymes at the transcriptional, translational, or posttranslational levels.

[0027] In one aspect, one or more catalytically active TET family enzymes, functional TET family derivatives, or TET catalytic fragments thereof, or DNA encoding one or more catalytically active TET family enzymes, functional TET family derivatives, or TET catalytic fragments thereof, is used to generate nucleic acids containing hydroxymethylcytosine from nucleic acids containing 5-methylcytosine, or in an alternative embodiment other oxidized pyrimidines from appropriate free or nucleic acid precursors.

[0028] Yet another object of the present invention provides a kit comprising materials for performing methods according to the aspects of the invention as described herein.

[0029] In one embodiment, the kit comprises one or more catalytically active TET family enzymes, functional TET family derivatives, or TET catalytic fragments thereof, or DNA encoding one or more catalytically active TET family enzymes, functional TET family derivatives, or TET catalytic fragments thereof, to be contacted with or delivered to a cell, or plurality of cells.

[0030] In one embodiment, the kit comprises one or more catalytically active TET family enzymes, functional TET family derivatives, or TET catalytic fragments thereof, and one or more compositions comprising cytokines, growth factors, and activating reagents for the purposes of generating stable human regulatory T cells. In one preferred embodiment, the compositions comprising cytokines, growth factor, and activating reagents, comprises TGF-.beta.. In a preferred embodiment, the kit includes packaging materials and instructions therein to use said kits.

[0031] In one embodiment, the kit comprises one or more catalytically active TET family enzymes, functional TET family derivatives, or TET catalytic fragments, or DNA encoding one or more catalytically active TET family enzymes, functional TET family derivatives, or TET catalytic fragments, and a combination of the nucleic acid sequences for Oct-4, Sox2, c-MYC, and Klf4, for the purposes of improving the efficiency or rate of the generation of induced pluripotent stem cells. In one embodiment, the nucleic acid sequences for Oct-4, Sox2, c-MYC, and Klf4 are delivered in a viral vector, selected from the group consisting of an adenoviral vector, a lentiviral vector, or a retroviral vector. In a further embodiment, the kit includes packaging materials and instructions therein to use said kit.

[0032] In one embodiment, the kit comprises one or more catalytically active TET family enzymes, functional TET family derivatives, or TET catalytic fragments thereof, or DNA encoding one or more catalytically active TET family enzymes, functional TET family derivatives, or TET catalytic fragments thereof, to be contacted with or delivered to a cell, or plurality of cells for the purposes of improving the efficiency of cloning mammals by nuclear transfer. In a further embodiment, the kit includes packaging materials and instructions therein to use said.

[0033] In some embodiments, the kit also comprises reagents suitable for the detection of the activity of one or more catalytically active TET family enzymes, functional TET family derivatives, or TET catalytic fragments thereof, namely the production of 5-hydroxymethylcytosine from 5-methylcytosine. In one embodiment, the kit comprises an antibody or binding portion thereof or CxxC domain of a TET family protein or another DNA-binding protein that specifically binds to 5-hydroxymethylcytosine. In other embodiments, the kit includes packaging materials and instructions therein to use said kits. In other embodiments, recombinant TET proteins are provided in a kit to generate nucleic acids containing hydroxymethylcytosine from nucleic acids containing 5-methylcytosine or other oxidized pyrimidines from appropriate free or nucleic acid precursors.

[0034] The present invention, in part, relates to novel methods and compositions that enhance stem cell therapies. One aspect of the present invention includes compositions and methods of inducing stem cells to differentiate into a desired cell type by contacting with or delivering to, a stem cell one or more catalytically active TET family enzymes, functional TET family derivatives, or TET catalytic fragments thereof, or nucleic acid encoding one or more catalytically active TET family enzymes, functional TET family derivatives, or TET catalytic fragments thereof, or any combination thereof, to increase pluripotency of said cell being contacted. Such cells, upon contact with or delivery of one or more catalytically active TET family enzymes, functional TET family derivatives, or TET catalytic fragments thereof, or DNA encoding one or more catalytically active TET family enzymes, functional TET family derivatives, or TET catalytic fragments thereof, or any combination thereof, can then be utilized for stem cell therapy treatments, wherein said contacted cell can undergo further manipulations to differentiate into a desired cell type for use in treatment of a disorder requiring cell or tissue replacement.

[0035] The present invention also provides, in part, improved methods for the treatment of cancer by the administration of compositions modulating catalytically active TET family enzymes, functional TET family derivatives, or TET catalytic fragments thereof. Also encompassed in the methods of the present invention are methods for screening for the identification of TET family modulators.

[0036] Accordingly, in one aspect, the invention provides a method for treating an individual with, or at risk for, cancer using a modulator(s) of the activity of the TET family of proteins. In one embodiment, the method comprises selecting a treatment for a patient affected by, or at risk for developing, cancer by determining the presence or absence of hypermethylated CpG island promoters of tumor suppressor genes, wherein if hypermethylation of tumor suppressor genes is detected, one administers to the individual an effective amount of a tumor suppressor activity reactivating catalytically active TET family enzyme, a functional TET family derivative, a TET catalytic fragment therein, or an activating modulator of TET family activity.

[0037] In one embodiment of this aspect, the treatment involves the administration of a TET family inhibiting modulator. In particular, the TET family inhibiting modulator is specific for TET1, TET2, TET3, or CXXC4. In one embodiment of the invention, the cancer being treated is a leukemia. In one embodiment, the leukemia is acute myeloid leukemia caused by the t(10:11)(q22:q23) Mixed Lineage Leukemia translocation of TET1.

[0038] In one embodiment of the present aspect, and other aspects described herein, the TET family targeting modulator is a TET family inhibitor. In one embodiment, the TET targeting treatment is specific for the inhibition of TET1, TET2, TET3, or CXXC4. For example, a small molecule inhibitor, a competitive inhibitor, an antibody or antigen-binding fragment thereof, or a nucleic acid that inhibits TET1, TET2, TET3, or CXXC4.

[0039] In one embodiment of the present aspect, and other aspects described herein, the TET family targeting modulator is a TET family activator. Alternatively and preferably, the TET targeting treatment is specific for the activation of TET1, TET2, TET3, or CXXC4. For example, a small molecule activator, an agonist, an antibody or antigen-binding fragment thereof, or a nucleic acid that activates TET1, TET2, TET3, or CXXC4.

[0040] Also encompassed in the methods and assays of the present invention are methods to screen for the identification of a TET family modulator for use in anti-cancer therapies. The method comprises a) providing a cell comprising a TET family enzyme, recombinant TET family enzyme thereof, TET family functional derivative, or TET family fragment thereof; b) contacting said cell with a test molecule; c) comparing the relative levels of 5-hydroxymethylated cytosine in cells expressing the TET family enzyme, recombinant TET family enzyme thereof, TET family functional derivative, or TET family fragment thereof in the presence of the test molecule, with the level of 5-hydroxymethylated cytosine expressed in a control sample in the absence of the test molecule; and d) determining whether or not the test molecule increases or decreases the level of 5-hydroxymethylated cytosine, wherein a statistically significant decrease in the level of 5-hydroxymethylated cytosine indicates the molecule is an inhibitor, and a statistically significant increase in the level of 5-hydroxymethylated cytosine indicates the molecule is an activator.

[0041] In another embodiment of this aspect, a method for high-throughput screening for anti-cancer agents is provided. The method comprises screening for and identifying TET family modulators. For example, providing a combinatorial library containing a large number of potential therapeutic compounds (potential modulator compounds). Such "combinatorial chemical libraries" are then screened in one or more assays to identify those library members (particular chemical species or subclasses) that display a desired characteristic activity (e.g., inhibition of TET family mediated 5-methylcytosine to 5-hydroxymethylcytosine conversion, or activation of TET family mediated 5-methylcytosine to 5-hydroxymethylcytosine conversion).

BRIEF DESCRIPTION OF DRAWINGS

[0042] FIG. 1 depicts the chemical structures for cytosine, 5-methylcytosine, 5-hydroxymethylcytosine, and 5-methylenesulfonate.

[0043] FIG. 2 depicts the conversion of 5-methylcytosine to 5-hydroxymethylcytosine that can be mediated by a catalytically active TET family enzyme, functional TET family derivative, or TET catalytic fragment.

[0044] FIGS. 3A-3B shows the various conversions mediated by enzymes encoded by the "T even" family of bacteriophages. FIGS. 3A-3B show that alpha-glucosyltransferases add glucose in the alpha configuration, and beta-glucosyltransferases add glucose in the beta configuration. FIGS. 3A-3B also show that beta-glucosyl-HMC-alpha-glucosyl-transferases add another glucose molecule in the beta-configuration to glucosylated 5-hydroxymethylcytosine.

[0045] FIG. 4 depicts a method by which methylcytosine and 5-hydroxymethylcytosine can be detected in, and isolated from nucleic acids for use in downstream applications.

[0046] FIG. 5 identifies the TET subfamily as having structural features characteristic of enzymes that oxidize 5-methylpyrimidines. FIG. 5 is a schematic diagram of the domain structure of the TET subfamily proteins, which includes the CXXC domain, the "C" or Cys-rich domain, and the 2OG-Fe(II) oxygenase domain containing a large, low complexity insert.

[0047] FIG. 6 demonstrates that overexpression of catalytically active TET subfamily proteins leads to decreased staining with a monoclonal antibody directed against 5-methylcytosine. FIG. 6 shows the relation between 5-methylcytosine staining and high expression of HA on a per-cell basis using the Cell Profiler program. FIG. 6 depicts that the mean intensity of 5-methylcytosine staining decreases in the presence of catalytically active full-length TET1 (FL) or the C+D domains of TET1 (C+D), but not when the catalytic activity is abrogated (FL mut or C+D mut). FIG. 6 expresses the 5-methylcytosine staining data of FIG. 6B normalized to the levels of the mock transfected sample.

[0048] FIGS. 7A-E demonstrate that TET1 expression leads to the generation of a novel nucleotide. FIG. 7 depicts line scans of labeled spots on a TLC plate, obtained using phosphorimaging of the results of assays to detect a novel nucleotide in genomic DNA of cells transfected with various constructs. FIG. 7A shows the line scan from mock transfected cells. FIG. 7B shows the line scan from cells transfected with catalytically active full-length TET1 (FL). FIG. 7C shows the line scan from cells transfected with catalytically inactive TET1 (FL mut). FIG. 7D shows the line scan from cells transfected with TET1 catalytic fragment (C+D). FIG. 7E shows the line scan from cells transfected with mutant TET1 catalytic fragment (C+D mut).

[0049] FIGS. 8A-8C demonstrate that TET1 expression leads to the generation of a novel nucleotide. FIG. 8 depicts line scans of labeled spots on a TLC plate, obtained using a phosphorimager, and shows that a novel nucleotide is only observed in DNA from cells transfected with the catalytically-active (C+D) fragment of TET1, as in FIG. 8B, and not in DNA from cells transfected with empty vector, as in FIG. 8A, or the catalytically-inactive mutant version of (C+D), as in FIG. 8C.

[0050] FIG. 9 identifies the novel nucleotide as 5-hydroxymethylcytosine, by determining that the unknown nucleotide is identical to authentic 5-hydroxymethylcytosine obtained from T4 phage grown in GalU-deficient E. Coli hosts. FIG. 9 depicts the results of LC/MS/MS runs using mass spectroscopy analysis with a collision energy of 15V.

[0051] FIG. 10 shows that a recombinant protein comprising the catalytic domain (C+D) of human TET1, expressed in baculovirus expression vector in insect Sf9 cells, is active in converting 5-methylcytosine to 5-hydroxylmethylcytosine in vitro, and depicts the relative activity of the recombinant C+D fragment of TET1 in the presence of various combinations of Fe2+, ascorbic acid, .alpha.-KG and EDTA.

[0052] FIG. 11A-11I demonstrates the physiological importance of TET1 in gene regulation. FIG. 11A shows that TET1 mRNA is strongly upregulated after 8 h of stimulation of mouse dendritic cells (DC) with LPS. FIGS. 11B-11I show the changes in Tet1, Tet2 and Tet3 mRNA levels in mouse ES cells that have been induced to differentiate by withdrawal of leukemia inhibitory factor (LIF) and addition of retinoic acid, and shows that Tet1,Tet2, and the positive control pluripotency gene Oct4 are downregulated (FIGS. 11B-11E, and FIGS. 11H-11I), whereas Tet3 is upregulated, during RA-induced differentiation (FIGS. 11F-11G).

[0053] FIG. 12A-12F shows the effect of Tet RNAi on ES cell lineage gene marker expression, using cells treated with Tet1,Tet2 and Tet3 siRNAs. FIG. 12A shows that Tet siRNA inhibits Tea expression. FIG. 12B shows the effect of siRNA-mediated Tea inhibition on Oct4.

[0054] FIG. 12C shows the effect of siRNA-mediated Tea inhibition on Sox2. FIG. 12D shows the effect of siRNA-mediated Tet1 inhibition on Nanog. FIG. 12E shows the effect of siRNA-mediated Tet1 inhibition on Cdx2. FIG. 12F shows the effect of siRNA-mediated Tea inhibition on Gata6.

[0055] FIG. 13A-C shows the identification of 5-hydromethylcytosine as the catalytic product of conversion from 5-methylcytosine by TET1 and detection of 5-hydromethylcytosine in the genome of mouse ES cells. FIG. 13A shows a schematic diagram of predicted domain structure of TET1, comprising the CXXC domain [Allen, M. D., et al., Embo J, 2006. 25(19): p. 4503-12], cysteine-rich and double-stranded beta-helix (DSBH) regions. FIG. 13B depicts the TLC data of cells overexpressing full-length (FL) TET1 or the predicted catalytic domain (CD) that reveals the appearance of an additional nucleotide species identified by mass spectrometry as 5-hydromethylcytosine. H1671Y, D1673A mutations at the residues predicted to bind Fe(II) abrogate the ability of TET1 to generate 5-hydromethylcytosine. FIG. 13C shows that 5-hydromethylcytosine is detected in the genome of mouse ES cells.

[0056] FIG. 14A-B depicts the role of murine Tea and Tet2 in the catalytic generation of 5-hydromethylcytosine in ES cells. FIG. 14A depicts that the mouse genome expresses three family members--Tet1, Tet2 and Tet3--that share significant sequence homology with the human homologs (Lorsbach, R. B., et al., Leukemia, 2003. 17(3): p. 637-41). Tet1 and Tet3 encode within their first conserved coding exon the CXXC domain. FIG. 14B shows that mouse ES cells express high levels of Tea and Tet2, which can be specifically depleted with RNAi.

[0057] FIG. 15A-D shows the changes in Tet family gene expression that occur in mouse ES cells upon differentiation. FIG. 15A shows that the mRNA levels of Tea rapidly decline upon LIF withdrawal. FIG. 15B shows that the mRNA levels of Tet2 rapidly decline upon LIF withdrawal. FIG. 15C demonstrates that Tet3 levels remain low upon LIF withdrawal but increase 10-fold with addition of retinoic acid. FIG. 15D shows that the mRNA levels of Oct4 rapidly decline upon LIF withdrawal, as expected.

[0058] FIG. 16A-E shows that Tet1, Tet2 and 5-hydromethylcytosine are associated with pluripotency. FIGS. 16A-16C show the loss of pluripotency induced by RNAi-mediated depletion of Oct4 potently suppresses Tea (FIG. 16A) and Tet2 expression (FIG. 16B) and upregulates Tet3 (FIG. 16C). Sox2 RNAi was found to cause a similar, though weaker, effect as Oct4 RNAi, and Nanog RNAi had almost no effect. FIGS. 16D-16E show that the gain of pluripotency in iPS clones derived from mouse tail-tip fibroblasts (TTF) by viral transduction of Oct4, Sox2, Klf4 and c-Myc is associated with up-regulation of Tet1 (FIG. 16D) and Tet2 (FIG. 16E) and appearance of 5-hydromethylcytosine in the genome.

[0059] FIG. 17A-I shows the effect of Tet knockdown on ES cell pluripotency and differentiation genes. FIGS. 17A-17C show that RNAi-mediated knockdown of each Tet member does not affect expression of the pluripotency factors Oct4 (FIG. 17A), Sox2 (FIG. 17B) and Nanog (FIG. 17C). FIGS. 17D-17F demonstrate that RNAi-depletion of Tet1, but not of Tet2 or Tet3, increases the expression of the trophectodermal genes Cdx2 (FIG. 17D), Eomes (FIG. 17E) and Hand1 (FIG. 17F). FIGS. 17G-17I demonstrate that RNAi-depletion of Tet family members produces small insignificant changes in expression of extraembryonic endoderm, mesoderm and primitive ectoderm markers Gata6 (FIG. 17G), Brachyury (FIG. 17H), and Fgf5 (FIG. 17I).

[0060] FIG. 18 shows the theoretical vs. quantified by bisulfite sequencing amount of 5-hydromethylcytosine present in samples in the absence or presence of various TET family siRNA inhibitors.

[0061] FIG. 19 illustrates an assay to detect cytosine methylene sulfonate from bisulfite treated samples.

[0062] FIGS. 20A-20B compare the correlation between dot intensity and the amount of cytosine methylene sulfonate (FIG. 20A) or 5-hydromethylcytosine (FIG. 20B) present in a sample.

[0063] FIGS. 21A-21B show the result of analyses of 5-hydromethylcytosine present in samples obtained from patients diagnosed with cancer with or without mutations in TET2, by analysis of dot 3 (FIG. 21A) and dot 4 (FIG. 21B) from TLC plates.

[0064] FIG. 22A-B depicts real-time PCR analyses of various oligonucloetides in the presence or absence of bisulfite treatment. FIG. 22A shows the amplification plots under the various experimental conditions, and FIG. 22B summarizes that data expressed as change in the cycle threshold (Ct).

[0065] FIG. 23 depicts the reaction of sodium bisulfite with cytosine, 5-methylcytosine, and 5-hydroxymethylcytosine.

[0066] FIGS. 24A-24B shows the sequences (SEQ ID NO: 18 and SEQ ID NO: 19, respectively) and primers (SEQ ID NO: 8 and SEQ ID NO: 10, respectively) used to determine whether cytosine methylene sulfonate impedes PCR amplification of DNA.

[0067] FIG. 25 shows the results of real-time PCR analysis of various oligonucleotides before and after bisulfite treatment, expressed as a change in cycle threshold.

[0068] FIGS. 26A-26C shows the sequences (SEQ ID NOS 20-22, respectively) and primers (SEQ ID NOS 11-16, respectively) used to sequence bisulfite treated genomic DNA from HEK293T cells and the sequences and primers used to sequence the bisulfite treated MLH amplicon. FIG. 26A depicts the sequence of the no CG amplicon (SEQ ID NO:20); FIG. 26B shows the sequence of the MLH1 amplicon 1 (SEQ ID NO:21), and FIG. 26C (SEQ ID NO:22) shows the sequence of the MLH1 amplicon 2.

[0069] FIG. 27A-27C depicts the line traces of bisulfite treated genomic DNA in the absence or presence of a TET1 catalytic domain. FIG. 27A shows the line traces of MspI sites in the presence or absence of TET1. FIG. 27B shows the line traces of Tag.sup..alpha.I sites in the presence or absence of TET1. FIG. 27C compares the mean cycle threshold for various amplicons in the absence or presence of TET1 treatment.

[0070] FIG. 28A depicts the generation of abasic sites from 5-hydroxymethylcytosine by glycosylases. FIG. 28B shows the specific reaction of abasic sites with aldehyde reactive probes.

[0071] FIG. 29A shows the impact of TET1 expression on aldehyde density. FIG. 29B compares the impact of co-expression of MD4 on abasic sites and aldehyde density.

[0072] FIG. 30 shows the glucosylation of 5-hydroxymethylcytosine by .beta.-glucosyltransferase.

[0073] FIG. 31 shows a schematic diagram depicting how the glucosylation of 5-hydroxymethylcytosine can be labeled, using aldehye quantification.

[0074] FIG. 32 compares aldehye quantification of DNA under various conditions, including in the presence of sodium bisulfate treatment and sodium periodate treatment.

[0075] FIG. 33 quantifies the amount of 5-hydromethylcytosine present in samples obtained from patients diagnosed with cancer with or without mutations in TET2.

[0076] FIG. 34 shows a schematic depicting the sites of various mutations found in TET2.

[0077] FIGS. 35A-B shows the expression of Tet2 in various myeloid and lymphoid lineage populations isolated from bone marrow and thymus. FIG. 35A shows Tet2 expression in myeloid lineage subpopulations and FIG. 35B shows Tet2 expression in various lymphoid lineage subpopulations.

[0078] FIG. 36A-B shows the expression of Tet1 in various myeloid and lymphoid lineage populations isolated from bone marrow and thymus. FIG. 36A shows Tet1 expression in myeloid lineage subpopulations and FIG. 36B shows Tet1 expression in various lymphoid lineage subpopulations.

[0079] FIG. 37A-B shows the reduction of TET2 mRNA and protein expression in cells upon treatment with siRNA sequence directed against TET2. FIG. 37A shows the reduction in mRNA expression, and FIG. 37B shows the reduction in Myc-tagged Tet2 protein.

[0080] FIG. 38 illustrates a potential link between abnormalities in energy metabolism and tumor suppression mediated by the TET family of enzymes.

DETAILED DESCRIPTION OF THE INVENTION

[0081] The present invention provides novel and improved methods for modulating pluripotency and differentiation status of cells, novel methods for reprogramming somatic cells, novel research tools for use in the modulation of cellular gene transcription and methylation studies, novel methods for detecting and isolating 5-methylcytosine and 5-hydroxymethylcytosine in nucleic acids, and novel methods for cancer treatment and screening methods therein.

[0082] The invention is based upon identification of a novel and surprising enzymatic activity for the family of TET proteins, namely TET1, TET2, TET3, and CXXC4. This novel enzymatic activity relates to the conversion of the cytosine nucleotide 5-methylcytosine into 5-hydroxymethylcytosine via a process of hydroxylation by the TET family of proteins. Accordingly, the invention provides novel tools for regulating the DNA methylation status of mammalian cells. Specifically, these enzymatic activities can be harnessed in methods for use in human Foxp3+ regulatory T cell generation, in the reprogramming of somatic cells, in stem cell therapy, in cancer treatment, in the modulation of cellular transcription, and as research tools for DNA methylation studies.

[0083] DNA methylation is catalyzed by at least three DNA methyltransferases (DNMTs) that add methyl groups to the 5' portion of the cytosine ring to form 5' methyl-cytosine. During S-phase of the cell cycle, DNMTs, found at the replication fork, copy the methylation pattern of the parent strand onto the daughter strand, making methylation patterns heritable over many generations of cell divisions. In mammalian genomes, this modification occurs almost exclusively on cytosine residues that precede guanine--i.e., CpG dinucleotides. CpGs occur in the genome at a lower frequency than would be statistically predicted because methylated cytosines can spontaneously deaminate to form thymine. This substitution is not efficiently recognized by the DNA repair machinery, so C-T mutations accumulate during evolution. As a result, 99% of the genome is CpG depleted. The other 1% is composed of discrete regions that have a high (G+C) and CpG content, and are known as CpG islands.

[0084] CpG islands are mostly found at the 5' regulatory regions of genes, and 60% of human gene promoters are embedded in CpG islands. Although most of the CpG dinucleotides are methylated, the persistence of CpG islands suggests that they are not methylated in the germ line and thus did not undergo CpG depletion during evolution. Around 90% of CpG islands are estimated to be unmethylated in somatic tissues, and the expression of genes that contain CpG islands is not generally regulated by their methylation. However, under some circumstances CpG islands do get methylated, resulting in long-term gene silencing.

[0085] Regulated DNA methylation is essential for normal development, as mice lacking any one of the enzymes in these pathways die in the embryonic stages or shortly after birth. As a silencing mechanism, DNA methylation plays a role in the normal transcriptional repression of repetitive and centromeric regions, X chromosome inactivation in females, and genomic imprinting. The silencing mediated by DNA methylation occurs in conjunction with histone modifications and nucleosome remodeling, which together establish a repressive chromatin structure. In addition, it has been shown that many cancerous cells possess aberrant patterns of DNA methylation.

[0086] As 5-hydroxymethylcytosine is not recognized by the 5-methylcytosine-binding protein MeCP2 (V. Valinluck, Nucleic Acids Research 32: 4100-4108 (2004)), without wishing to be limited by a theory, conversion of 5-methylcytosine into 5-hydroxymethylcytosine could result in loss of binding of MeCP2 and other 5-methylcytosine-binding proteins (MBDs) to DNA, and interfere with chromatin condensation, and therefore result in loss of gene silencing dependent on MBDs.

[0087] Additionally, because 5-hydroxymethylcytosine is not recognized by DNA methyltransferase 1 (Dnmt1), which remethylates hemi-methylated regions of DNA, particularly during DNA replication (V. Valinluck and L. C. Sowers, Cancer Research 67: 946-950 (2007)), the oxidative conversion of 5-methylcytosine to 5-hydroxymethylcytosine would result in net loss of 5-methylcytosine in favor of unmethylated cytosine during successive cycles of DNA replication, therefore facilitating the "passive" demethylation of DNA.

[0088] Finally, conversion of 5-methylcytosine to 5-hydroxymethylcytosine could also lie in the pathway of "active" demethylation if one postulates, without wishing to be bound by a theory, that a specific DNA repair mechanism exists that recognizes 5-hydroxymethylcytosine and replaces it with cytosine. Without wishing to be limited by a theory, the DNA repair mechanisms that could be utilized for recognition of 5-hydroxymethylcytosine include, but are not limited to: direct repair (B. Sedgwick, DNA Repair (Amst).6(4):429-42 (2007)), base excision repair (M. L. Hedge, Cell Res.18(1):27-47 (2008)), nucleotide incision repair (L. Gros, Nucleic Acids Res.32(1):73-81 (2004)), nucleotide excision repair (S. C. Shuck, Cell Res.18(1):64-72 (2008)), mismatch repair (G. M. Li, Cell Res. 18(1):85-98 (2008)), homologous recombination, and non-homologous end-joining (M. Shrivastav, Cell Res.18(1):134-47 (2008)).

[0089] We identified a novel enzymatic activity for the TET family of proteins, namely that the TET family of proteins mediate the conversion of 5-methylcytosine in cellular DNA to yield 5-hydroxymethylcytosine by hydroxylation.

Methods of Improving the Reprogramming of Somatic Cells for the Production of Induced Pluripotent Stem Cells and for Use in Somatic Nuclear Cell Transfer

[0090] The present invention provides, in part, improved methods for the reprogramming of somatic cells into pluripotent stem cells by the administration of a composition containing at least one catalytically active TET family enzyme, functional TET family derivative, TET catalytically active fragment, or combination thereof.

[0091] The data demonstrate a novel catalytic activity for the TET family of enzymes, specifically the ability to hydroxylate 5-methylcytosine (5mC) to an intermediate, 5-hydroxymethylcytosine (HMC), and methods wherein to detect this modification.

[0092] Accordingly, in one aspect, the invention provides a method for improving the efficiency or rate with which induced pluripotent stem (iPS) cells can be produced from adult somatic cells, comprising contacting a somatic cell being treated to undergo reprogramming with or delivering to a somatic cell being treated to undergo reprogramming an effective amount of one or more catalytically active TET family enzyme, one or more functional TET family derivatives, one or more TET catalytic fragments therein, or a combination thereof, in combination with one or more known pluripotency factors, in vitro or in vivo. In one embodiment, one uses at least one entire catalytically active TET1, TET2, TET3, or CXXC4 protein, or a nucleic acid encoding such protein. In one embodiment, one uses at least one functional TET1, TET2, TET3, or CXXC4 derivative, or at least one nucleic acid encoding such functional derivatives. In one embodiment, one uses at least one TET1, TET2, TET3, or CXXC4 catalytically active fragment or a nucleic acid encoding at least one such catalytically active fragment.

[0093] In another aspect, the invention provides a method for improving the efficiency or rate with which induced pluripotent stem (iPS) cells can be produced from adult somatic cells, comprising contacting a somatic cell being treated to undergo reprogramming with, or delivering to, a somatic cell being treated to undergo reprogramming, an effective amount of one or more catalytically active TET family enzymes, one or more functional TET family derivatives, or one or more TET catalytic fragments, and an effective amount of one or more inhibitors of TET family catalytic activity, in combination with one or more known pluripotency factors, in vitro or in vivo. In one embodiment, the catalytically active TET family enzyme, functional TET family derivatives, or TET catalytic fragments, is a catalytically active TET1 and/or TET2 enzyme, and/or functional TET1 and/or TET2 derivative, and/or a TET1 and/or TET2 catalytic fragment, and the inhibitor of TET family catalytic activity is a TET3 inhibitor that is specific for only TET3. In one embodiment, the inhibitor of TET3 is an siRNA or shRNA sequence specific for inhibiting TET3.

[0094] The TET family of proteins as referred to in this aspect, and all aspects and embodiments described herein in this application, comprises the nucleotide sequences of TET1, TET2, TET3, and CXXC4 with GenBank nucleotide sequence IDs: GeneID: NM_030625.2 (TET1) (SEQ ID NO:23), GeneID: NM_001127208.1 (TET2) (SEQ ID NO:24), GeneID: NM_144993.1 (TET3) (SEQ ID NO:25), and GeneID: NM_025212.1 (CXXC4) (SEQ ID NO:26) and the protein sequences of TET1, TET2, and CXCC4 with GenBank peptide sequence IDs: NP_085128 (TET1) (SEQ ID NO:27), NP_001120680 (TET2) (SEQ ID NO:28), and NP_079488 (CXXC4) (SEQ ID NO:29).

[0095] As used herein, a "TET family protein" refers to the sequences of human TET1, TET2, TET3, and CXXC4, and to proteins having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or more, homology to human TET1, TET2, or TET3, and displaying a catalytic (hydroxylating) activity of the TET family of proteins. A "functional TET family derivative", as used herein, refers to a protein comprising a signature sequence, SEQ ID NO:1, from the catalytic site of the TET family proteins and having a catalytic activity of TET proteins.

SEQ ID NO: 1: GVAzAPxHGSzLIECAbxEzHATT

[0096] where x=any residue, z=aliphatic residue in the group (L, I, V) and b=basic residue in the group (R, K)

[0097] A "TET catalytically active fragment", as referred to herein, comprises a protein having a catalytic activity of TET family proteins and a sequence meeting one of the following criteria: (1) Identical to the sequence of SEQ ID NO: 2 or one of the empirically verified catalytic fragments; or having homology of at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or more, to such a sequence; or (2) incorporating a linear succession of the TET signature sequences of SEQ ID NO: 2, SEQ ID NO: 3, and SEQ ID NO: 4 in a defined order, that are predicted to form the core of the beta-stranded double helix catalytic domain; or having homology of at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or more, to such a linear succession of TET family signature sequences, and preserving the linear order thereof.

SEQ ID NO: 3: PFxGxTACxDFxAHxHxDxxN-[X].sub.5-TxVxTL-[X].sub.13-DEQxHVLPxY-[X].sub.0-78- 0-GVAxAPxHGSxLIECAxxExHATT-[X].sub.11-RxSLVxYQH, wherein X is any amino acid residue. SEQ ID NO: 4: PFxGxTACxDFxAHxHxDxxN-[X].sub.5-TxVxTL-[X].sub.12-DEQxHVLPxY-[X].sub.0-78- 0-GVAxAPxHGSxLIECAxxExHAT-[X].sub.11-RxSLVxYQH, wherein X is any amino acid residue. SEQ ID NO: 5: PFxGxTACxDFxxHxHxDxxN-[X].sub.2-11-TxVxTL-[X].sub.9-13-DEQxHVLPxY-[X].sub- .0-780-GVAxAPxHGSxLIECAxxExHATT-[X].sub.5-13-RxSLVxYQH, wherein X is any amino acid residue.

[0098] The human TET3 peptide sequence, as described herein, comprises: SEQ ID NO: 6, as well as that described by GenBank Peptide ID: NP_659430.

[0099] In connection with contacting a cell with, or delivering to, a catalytically active TET family enzyme, a functional TET family derivative, or a TET catalytically active fragment therein, the phrase "increasing the efficiency" of induced pluripotent stem (iPS) cell production indicates that the proportion of reprogrammed cells in a given population is at least 5% higher in populations treated with a catalytically active TET family enzyme, a functional TET family derivative, or a TET catalytically active fragment therein, than a comparable, control population, wherein no catalytically active TET family enzyme, a functional TET family derivative, or a TET catalytically active fragment thereof, is present. In one embodiment, the proportion of reprogrammed cells in a catalytically active TET family enzyme, a functional TET family derivative, or a TET catalytically active fragment therein treated cell population is at least 10% higher, at least 15% higher, at least 20% higher, at least 25% higher, at least 30% higher, at least 35% higher, at least 40% higher, at least 45% higher, at least 50% higher, at least 55% higher, at least 60% higher, at least 65% higher, at least 70% higher, at least 75% higher, at least 80% higher, at least 85% higher, at least 90% higher, at least 95% higher, at least 98% higher, at least 1-fold higher, at least 1.5-fold higher, at least 2-fold higher, at least 5-fold higher, at least 10 fold higher, at least 25 fold higher, at least 50 fold higher, at least 100 fold higher, at least 1000-fold higher, or more than a control treated cell population of comparable size and culture conditions. The phrase "control treated cell population of comparable size and culture conditions" is used herein to describe a population of cells that has been treated with identical media, viral induction, nucleic acid sequences, temperature, confluency, flask size, pH, etc., with the exception of the addition of the catalytically active TET family enzyme, a functional TET family derivative, or a TET catalytically active fragment therein.

[0100] By the phrase "increasing the rate" of iPS cell production is meant that the amount of time for the induction of iPS cells is at least 6 hours less, at least 12 hours less, at least 18 hours less, at least 1 day less, at least 2 days less, at least 3 days less, at least 4 days less, at least 5 days less, at least 6 days less, at least 1 week less, at least 2 weeks less, at least 3 weeks less, or more, in the presence of a catalytically active TET family enzyme, a functional TET family derivative, or a TET catalytically active fragment therein, than in a control treated population of comparable size and culture conditions.

[0101] The production of iPS cells, as practiced by those skilled in the art, is generally achieved by the introduction of nucleic acid sequences encoding stem cell-associated genes into an adult, somatic cell. In general, these nucleic acids are introduced using retroviral vectors and expression of the gene products results in cells that are morphologically and biochemically similar to pluripotent stem cells (e.g., embryonic stem cells). This process of altering a cell phenotype from a somatic cell phenotype to a stem cell-like phenotype is referred to herein as "reprogramming".

[0102] Reprogramming can be achieved by introducing a combination of stem cell-associated genes including, or pluripotency inducing factors, such as Oct3/4 (Pouf51), Sox1, Sox2, Sox3, Sox 15, Sox 18, NANOG, Klf1, Klf2, Klf4, Klf5, c-Myc, 1-Myc, n-Myc and LIN28. In general, successful reprogramming is accomplished by introducing Oct-3/4, a member of the Sox family, a member of the Klf family, and a member of the Myc family to a somatic cell (K. Takahashi, Cell 126: 663-676 (2006); K. Takahashi, Cell 131: 861-872 (2007); J.Yu, Science 318: 1917-1920 (2007)).

[0103] Oct-3/4 (Pou5f1):

[0104] Oct-3/4 is one of the family of octamer ("Oct") transcription factors, and plays a crucial role in maintaining pluripotency. The absence of Oct-3/4 in Oct-3/4+ cells, such as blastomeres and embryonic stem cells, leads to spontaneous trophoblast differentiation, and presence of Oct-3/4 thus gives rise to the pluripotency and differentiation potential of embryonic stem cells.

[0105] Sox Family:

[0106] The Sox family of genes is associated with maintaining pluripotency similar to Oct-3/4, although it is also associated with multipotent and unipotent stem cells in contrast with Oct-3/4, which is exclusively expressed in pluripotent stem cells. While Sox2 was the initial gene used for induction by Yamanaka et al., Jaenisch et al., and Thomson et al., other genes in the Sox family have been found to work as well in the induction process. Sox1 yields iPS cells with a similar efficiency as Sox2, and genes Sox3, Sox15, and Sox18 also generate iPS cells, although with decreased efficiency.

[0107] Klf Family:

[0108] Klf4 of the Klf family of genes was initially identified by Yamanaka et al. and confirmed by Jaenisch et al. as a factor for the generation of mouse iPS cells and was demonstrated by Yamanaka et al. as a factor for generation of human iPS cells. However, Thomson et al. reported that Klf4 was unnecessary for generation of human iPS cells and in fact failed to generate human iPS cells. Klf2 and Klf4 have been found to be factors capable of generating iPS cells, and related genes Klf1 and Klf5 did as well, although with reduced efficiency.

[0109] Myc Family:

[0110] The Myc family of genes are proto-oncogenes implicated in cancer. Yamanaka et al. and Jaenisch et al. demonstrated that c-myc is a factor implicated in the generation of mouse iPS cells and Yamanaka et al. demonstrated it was a factor implicated in the generation of human iPS cells. However, Thomson et al., Yamanaka et al., and unpublished work by Johns Hopkins University have reported that c-myc is unnecessary for generation of human iPS cells. N-myc and L-myc have been identified to induce instead of c-myc with similar efficiency.

[0111] Nanog:

[0112] In embryonic stem cells, Nanog, along with Oct-3/4 and Sox2, is necessary in promoting pluripotency. Yamanaka et al. has reported that Nanog is unnecessary for induction although Thomson et al. has reported it is possible to generate iPS cells with Nanog as one of the factors.

[0113] LIN28:

[0114] LIN28 is an mRNA binding protein expressed in embryonic stem cells and embryonic carcinoma cells associated with differentiation and proliferation. Thomson et al. demonstrated it is a factor in iPS generation, although it is unnecessary.

[0115] In one embodiment of the methods described herein, reprogramming is achieved by delivery of Oct-4, Sox2, c-Myc, Klf4, or any combination thereof, to a somatic cell (e.g., a fibroblast). In one embodiment of the methods described herein, reprogramming is achieved by delivery of at least one of Sox-2, Oct-4, Klf-4, c-Myc, Nanog, or Lin-28 to a somatic cell (e.g., a fibroblast). In one embodiment, reprogramming is achieved by delivery of the following four transcription factors, Sox-2, Oct-4, Klf-4, and c-Myc, to a somatic cell. In one embodiment, reprogramming is achieved by delivery of three of the following four transcription factors: Sox-2, Oct-4, Klf-4, and c-Myc, to a somatic cell. In one embodiment, reprogramming is achieved by delivery of two of the following four transcription factors: Sox-2, Oct-4, Klf-4, and c-Myc, to a somatic cell. In one embodiment, reprogramming is achieved by delivery of one of the following four transcription factors: Sox-2, Oct-4, Klf-4, and c-Myc to a somatic cell. In one embodiment, reprogramming of a somatic cell is achieved in the absence of the following four transcription factors: Sox-2, Oct-4, Klf-4, and c-Myc.

[0116] In one embodiment, reprogramming is achieved by delivery of the following four transcription factors, Sox-2, Oct-4, Nanog, and Lin-28, to a somatic cell. In one embodiment, reprogramming is achieved by delivery of any three of the following four transcription factors: Sox-2, Oct-4, Nanog, or Lin-28 to a somatic cell. In one embodiment, reprogramming is achieved by delivery of two of the following four transcription factors: Sox-2, Oct-4, Nanog, or Lin-28 to a somatic cell. In one embodiment, reprogramming is achieved by delivery of one of the following four transcription factors: Sox-2, Oct-4, Nanog, or Lin-28 to a somatic cell. In one embodiment, reprogramming is achieved in the absence of the following four transcription factors: Sox-2, Oct-4, Nanog, or Lin-28.

[0117] In one embodiment, the nucleic acid sequences of one or more of Oct-4, Sox2, c-MYC, Klf4, Nanog, or Lin-28 are delivered using a viral vector or a plasmid. The viral vector can be, for example, a retroviral vector, a lentiviral vector or an adenoviral vector. In some embodiments, the viral vector is a non-integrating viral vector. In one embodiment, reprogramming is achieved by introducing more than one non-integrating vector (e.g., 2, 3, 4, or more vectors) to a cell, wherein each vector comprises a nucleic acid sequence encoding a different reprogramming factor (e.g., Oct2, Sox2, c-Myc, Klf4, etc). In an alternate embodiment, more than one reprogramming factor is encoded by a non-integrating vector and expression of the reprogramming factors is controlled using a single promoter, polycistronic promoters, or multiple promoters.

[0118] Non-viral approaches to the introduction of nucleic acids known to those skilled in the art can also be used with the methods described herein. Alternatively, activation of the endogenous genes encoding such transcription factors can be used. In another embodiment, one or more proteins that reprogram the cell's differentiation state can be introduced to the cell. For example, proteins such as c-Myc, Oct4, Sox2 and/or Klf4 can be introduced to the cell through the use of HIV-TAT fusion. The TAT polypeptide has characteristics that permit it to penetrate the cell, and has been used to introduce exogenous factors to cells (see, e.g., Peitz et al., 2002, Proc. Natl. Acad. Sci. USA. 99:4489-94). This approach can be employed to introduce factors for reprogramming the cell's differentiation state. While it is understood that reprogramming is usually accomplished by viral delivery of stem-cell associated genes, it is also contemplated that reprogramming can be induced using other delivery methods, such as delivery of the native, purified proteins (K. Takahashi, Cell 126: 663-676 (2006); K. Takahashi, Cell 131: 861-872 (2007); J.Yu, Science 318: 1917-1920 (2007)). In some embodiments, the reprogramming can be induced using plasmid delivery methods, such as described in Okita K, et al., 2008 Nov. 7; 322(5903):949-53. In other embodiments, reprogramming is achieved by the use of recombinant proteins, such as via a repeated treatment of the cells with certain proteins channeled into the cells to be reprogrammed via poly-arginine anchors. Such cells are termed herein as "protein-induced pluripotent stem cells" or piPS cells, as described in H. Zhou et al., Cell Stem Cell, 4 (5), 8 May 2009, p. 381-384.

[0119] The efficiency of reprogramming (i.e., the number of reprogrammed cells) can be enhanced by the addition of various small molecules as shown by Shi, Y., et al (2008) Cell-Stem Cell 2:525-528, Huangfu, D., et al (2008) Nature Biotechnology 26(7):795-797, Marson, A., et al (2008) Cell-Stem Cell 3:132-135, which are incorporated herein by reference in their entirety. It is contemplated that the methods to increase efficiency or rate of iPS cell formation through the novel catalytic activity of one or more members of the TET family described herein can also be used in combination with a single small molecule (or a combination of small molecules) that enhances the efficiency of induced pluripotent stem cell production. Some non-limiting examples of agents that enhance reprogramming efficiency include soluble Wnt, Wnt conditioned media, BIX-01294 (a G9a histone methyltransferase), PD0325901 (a MEK inhibitor), DNA methyltransferase inhibitors, histone deacetylase (HDAC) inhibitors, valproic acid, 5'-azacytidine, dexamethasone, suberoylanilide, hydroxamic acid (SAHA), trichostatin (TSA), and inhibitors of the TGF-.beta. signaling pathway, among others.

[0120] It is thus contemplated that inhibitors can be used alone or in combination with other small molecule(s) to replace one or more of the reprogramming factors used in the methods to improve the efficiency or rate of iPS cell production by modulating TET family enzymatic activity as described. In some embodiments, one or more small molecules or other agents are used in the place of (i.e. to replace or substitute) exogenously supplied transcription factors, either supplied as a nucleic acid encoding the transcription factor or a protein or polypeptide of the exogenously supplied transcription factor, which are typically used in the production of iPS cells. As discussed herein, "exogenous" or "exogenous supplied" refer to addition of a nucleic acid encoding a reprogramming transcription factor (e.g. a nucleic acid encoding Sox2, Klf4, Oct4, c-Myc, Nanog, or Lin-28) or a polypeptide of a reprogramming factor (e.g. proteins of Sox2, Klf4, Oct4, c-Myc, Nanog, or Lin-28 or biologically active fragments thereof) which is normally used in production of iPS cells. In some embodiments, reprogramming of a cell is achieved by contacting a cell with one or more agents, such as small molecules, where the agent (i.e. small molecules) replaces the need to reprogram the differentiated cell with one or more of exogenous Sox2, Klf4, Oct4, c-Myc, Nanog, or Lin-28.

[0121] In one embodiment, replacement of exogenous transcription factor Sox2 is by an agent which is an inhibitor of the TGF.beta. signalling pathway, such as a TGFBR1 inhibitor. In other embodiments, a cell to be reprogrammed is contacted with small molecules or other agents which replace exogenous supplied Oct-4 and Klf-4.

[0122] Thus, the methods described herein include methods for producing reprogrammed cells from differentiated cells (i.e. from fibroblasts e.g., MEFs) without using exogenous oncogenes, for example c-Myc or oncogenes associated with introduction of nucleic acid sequences encoding one or more of the transcription factors selected from Sox-2, Oct-4 or Klf-4 into the differentiated cell to be reprogrammed (i.e. viral oncogenes). For example, chemically mediated reprogramming of differentiated cells makes it possible to create reprogrammed cells (i.e. iPS cells) from small numbers of differentiated cells, such as those obtained from hair follicle cells from patients, blood samples, adipose biopsy, fibroblasts, skin cells, etc). In some embodiments, the addition of small molecule compounds allows successful and safe generation of reprogrammed cells (i.e. iPS cells) from human differentiated cells, such as skin biopsies (fibroblasts or other nucleated cells) as well as from differentiated cells from all and any other cell type. In one embodiment, an agent which is an agonist of MEK or Erk cell signalling replaces exogenous transcription factor Klf-4. Examples of such agonists include prostaglandin J2, an inhibitor of Ca2+/calmodulin signaling, EGF receptor tyrosine kinase inhibitor, or HDBA. In one embodiment, exogenous transcription factor Oct-4 is replaced by an agent that is an inhibitor of Na2+ channels, an agonist of ATP-dependent potassium channels, or an agonist of MAPK signalling pathways.

[0123] In general, iPS cells are produced by viral or non-viral delivery of said stem cell-associated genes into adult somatic cells (e.g., fibroblasts). While fibroblasts are preferred, essentially any primary somatic cell type can be used. Some non-limiting examples of primary somatic cells include, but are not limited to, epithelial cells, endothelial cells, neuronal cells, adipose cells, cardiac cells, skeletal muscle cells, immune cells (T, B, NK, NKT, dendritic, monocytes, neutrophils, eosinophils), hepatic cells, splenic cells, lung cells, circulating blood cells, gastrointestinal cells, renal cells, bone marrow cells, and pancreatic cells. The cell can be a primary cell isolated from any somatic tissue including, but not limited to bone marrow, brain, pancreas, liver, lung, gut, stomach, intestine, fat, muscle, uterus, skin, spleen, thymus, kidney, endocrine organ, bone, etc. Where the cell is maintained under in vitro conditions, conventional tissue culture conditions and methods can be used, and are known to those of skill in the art. Isolation and culture methods for various cells are well within the abilities of one skilled in the art. Further, the parental cell can be from any mammalian species, with non-limiting examples including a murine, bovine, simian, porcine, equine, ovine, or human cell. The parental cell should not express embryonic stem cell (ES) markers, e.g., Nanog mRNA or other ES markers, thus the presence of Nanog mRNA or other ES markers indicates that a cell has been re-programmed. Where a fibroblast is used, the fibroblast is flattened and irregularly shaped prior to the re-programming, and does not express Nanog mRNA. The starting fibroblast will preferably not express other embryonic stem cell markers. The expression of ES-cell markers can be measured, for example, by RT-PCR. Alternatively, measurement can be by, for example, immunofluorescence or other immunological detection approaches that detect the presence of polypeptides or other features that are characteristic of the ES phenotype.

[0124] To confirm the induction of pluripotent stem cells, isolated clones can be tested for the expression of a stem cell marker. Such expression identifies the cells as induced pluripotent stem cells. Stem cell markers can be selected from the non-limiting group including SSEA1, CD9, Nanog, Fbx15, Ecat1, Esg1, Eras, Gdf3, Fgf4, Cripto, Dax1, Zpf296, Slc2a3, Rex1, Utf1, and Nat1. Methods for detecting the expression of such markers can include, for example, RT-PCR and immunological methods that detect the presence of the encoded polypeptides. The pluripotent stem cell character of the isolated cells can be confirmed by any of a number of tests evaluating the expression of ES markers and the ability to differentiate to cells of each of the three germ layers. As one non-limiting example, teratoma formation in nude mice can be used to evaluate the pluripotent character of the isolated clones. The cells are introduced to nude mice and histology is performed on a tumor arising from the cells. The growth of a tumor comprising cells from all three germ layers (endoderm, mesoderm and ectoderm) further indicates that the cells are pluripotent stem cells. The pluripotent stem cell character of the isolated cells can also be confirmed by the creation of chimeric mice. For example, the cells can be injected by micropipette into a trophoblast, and the blastocyst transferred to a recipient females, where resulting chimeric living mouse pups (with, for example, 10%-90% chimerism) are indicative of successful generation of iPS cells. Tetraploid complementation can also be used to determine the pluripotent stem cell character of the isolated cells, such that the cells are injected into tetraploid blastocysts, which themselves can only form extra-embryonic tissues, and the formation of whole, non-chimeric, fertile mice, is indicative of successful generation of iPS cells (X-y Zhao et al., 2009, Nature. doi:10.1038/nature08267; L. Kang, et al. 2009. Cell Stem Cell. doi:10.1016/j.stem.2009.07.001; and M. J. Boland et al. Nature. 2009 Aug. 2; 461(7260):91-94).

[0125] Another object of the invention is to provide a method for improving the efficiency of cloning mammals by nuclear transfer or nuclear transplantation.

[0126] Accordingly, in one aspect the invention provides a method for improving the efficiency of cloning mammals by nuclear transfer or nuclear transplantation, the method comprising contacting a nucleus isolated from a cell during a typical nuclear transfer protocol with an effective hydroxylating-inducing amount of one or more catalytically active TET family enzymes, one or more functional TET family derivatives, one or more TET catalytically active fragments thereof, or any combination thereof.

[0127] In another aspect, the invention provides a method for improving the efficiency of cloning mammals by nuclear transfer or nuclear transplantation, the method comprising contacting a nucleus isolated from a cell during a typical nuclear transfer protocol with an effective of one or more catalytically active TET family enzymes, one or more functional TET family derivatives, one or more TET catalytic fragments, or any combination thereof, and an effective amount of one or more inhibitors of TET family catalytic activity, in combination with at least one known factors that induces pluripotency, in vitro or in vivo. In one embodiment, the catalytically active TET family enzyme, functional TET family derivatives, or TET catalytic fragments, is a catalytically active TET1 and/or TET2 enzyme, and/or functional TET1 and/or TET2 derivative, and/or a TET1 and/or TET2 catalytic fragment, or any combination thereof, and the inhibitor of TET family catalytic activity is a TET3 inhibitor. In one embodiment, the inhibitor of TET3 is an siRNA or shRNA sequence specific for TET3.

[0128] In one embodiment, the method comprises a typical nuclear transfer protocol. In a non-limiting example, the method comprises the steps of: (a) enucleating an oocyte; (b) isolating and permeabilizing a nucleated cell, thereby generating a permeabilized cell having pores in its plasma membrane or a partial plasma membrane or no remaining plasma membrane; (c) dedifferentiating the permeabilized cell containing a nucleus of step (b), comprising contacting the nucleus with an effective hydroxylation inducing amount of one or more catalytically active TET family enzymes, one or more functional TET family derivatives, and/or one or more TET catalytically active fragments thereof, under dedifferentiating conditions utilized by ones skilled in the art; (d) transplanting the dedifferentiated nucleus formed in step (c) into a nucleated or enucleated egg such that the dedifferentiated nucleus is exposed to an activating egg cytoplasm, thereby forming a reconstituted oocyte, wherein the recipient egg is from the same species as the somatic reprogrammed cell nucleus; and (e) transferring the reconstituted oocyte or an embryo formed from the reconstituted oocyte into a host animal, thus allowing the egg to develop under direction of genetic information contained in the transplanted activated nucleus.

[0129] In connection with the administration of a catalytically active TET family enzyme, a functional TET family derivative, or a TET catalytically active fragment thereof, "improving the efficiency of cloning mammals by nuclear transfer or nuclear transplantation", indicates that the proportion of cloned mammals produced in the presence of exogenous catalytically active TET family enzymes, functional TET family derivatives, or TET catalytically active fragments therein, is at least 5% higher than a comparable, control treated population. In one embodiment, the proportion of viable cloned mammals in a catalytically active TET family enzyme, a functional TET family derivative, or a TET catalytically active fragment, treated population is at least 10% higher, at least 15% higher, at least 20% higher, at least 25% higher, at least 30% higher, at least 35% higher, at least 40% higher, at least 45% higher, at least 50% higher, at least 55% higher, at least 60% higher, at least 65% higher, at least 70% higher, at least 75% higher, at least 80% higher, at least 85% higher, at least 90% higher, at least 95% higher, at least 98% higher, at least 99% higher, or more than a control treated population under comparable conditions, wherein no catalytically active TET family enzyme, no functional TET family derivative, or no TET catalytically active fragment is present. The term "control treated population under comparable conditions" is used herein to describe a population of permeabilized, nucleated cells that have been treated with identical media, viral induction, nucleic acid sequence, temperature, confluency, flask size, pH, etc., with the exception of the addition of the catalytically active TET family enzymes, functional TET family derivatives, or TET catalytically active fragments therein, with all other steps in the protocol remaining identical.

[0130] In one embodiment, somatic cells are cultured for 5 or more passages (about 10 doublings in cell number), more preferably for 7 or more passages (about 14 doublings in cell number), more preferably for 10 (about 20 doublings in cell number) or more passages and yet more preferably for 15 (about 30 doublings in cell number) passages on a suitable growth medium. Cells are cultured until confluent, disaggregated by chemical and/or mechanical means, and allocated to new growth media upon each passage.

[0131] It is preferred that the donor cells of the invention be induced to quiescence prior to fusion or microinjection into the recipient cell. In accord with the teachings of PCT/GB96/02099 and WO 97/07668, both assigned to the Roslin Institute (Edinburgh), it is preferred that the donor nucleus be in either the G0 or G1 phase of the cell cycle at the time of transfer. Donors must be diploid at the time of transfer in order to maintain correct ploidy. It is particularly preferred that the donor cells be in the G0 phase of the cell cycle.

[0132] While it is preferred that the recipient of the donor cell nucleus be an oocyte at metaphase I to metaphase II, the present invention may be used with other recipients known to those of ordinary skill in the art, including zygotes and two-cell embryos. Activation of oocytes can be by fertilization with sperm or by parthenogenetic activation schemes known in the art. It is particularly preferred that the recipient be enucleate. A preferred oocyte is an enucleated metaphase II oocyte, non-activated or pre-activated. When a recipient is an enucleated metaphase II oocyte, activation may take place at the time of transfer.

[0133] It is preferred that the reconstituted oocyte be activated prior to implantation into the host using techniques known to those of ordinary skill in the art, such as electrical stimulation. As would be understood by one of ordinary skill in the art, activation techniques should be optimized for the particular cell type being used. Non-electrical means for activation known in the art include, but are not limited to, ethanol, protein kinase inhibitors (e.g., 6-dimethylpurine (DMAP), ionophores (e.g., ionomycin), temperature change, protein synthesis inhibitors (e.g. cyclohexamide), thapsigargin, phorbol esters (e.g. phorbol 12-myristate 13-acetate ("PMA")), and mechanical means (See, e.g., Susko-Parrish., U.S. Pat. No. 5,496,720, issued Mar. 5, 1996).

[0134] Cultured donor cells may be genetically altered by methods well-known to those of ordinary skill in the art (see, Molecular Cloning a Laboratory Manual, 2nd Ed., 1989, Sambrook, Fritsch and Maniatis, Cold Spring Harbor Laboratory Press; U.S. Pat. No. 5,612,205, Kay et al., issued Mar. 18, 1997; U.S. Pat. No. 5,633,067, to DeBoer et al., issued May 27, 1997). Any known method for inserting, deleting or modifying a desired gene from a mammalian cell may be used to alter the nuclear donor. Included is the technique of homologous recombination, which allows the insertion, deletion or modification of a gene or genes at specific site or sites in the cell genome. Examples for modifying a target DNA genome by deletion, insertion, and/or mutation are retroviral insertion, artificial chromosome techniques, gene insertion, random insertion with tissue specific promoters, gene targeting, transposable elements and/or any other method for introducing foreign DNA or producing modified DNA/modified nuclear DNA. Other modification techniques include deleting DNA sequences from a genome and/or altering nuclear DNA sequences. Nuclear DNA sequences, for example, may be altered by site-directed mutagenesis.

Human Regulatory T Cell Production Using TET Family Proteins

[0135] The mechanisms underlying the methylation and demethylation status of mammalian cells are areas of active research. Most gene regulation is transitory, depending on the current state of the cell and changes in external stimuli. Persistent regulation, on the other hand, is a primary role of epigenetic modifications, i.e., heritable regulatory patterns that do not alter the basic genetic coding of the DNA. DNA methylation is the archetypical form of epigenetic regulation, and performs a crucial role in maintaining the long-term identity of various cell types.

[0136] Tissue-specific methylation also serves in regulating adult cell types/stages, and in some cases a causal relationship between methylation and gene expression has been established. A much studied example for such a cell type and cell status specific modification of certain gene regions is found during the lineage commitment of naive T cells to differentiated helper T cells (Th1 or Th2). Naive (unstimulated) CD4.sup.+ T cells become activated upon encountering an antigen and become committed to alternative cell fates through further stimulation by interleukins. The two types of helper T cells show reciprocal patterns of gene expression: Th1 cells produce Interferon-gamma (IFN-gamma) and silence IL-4, while Th2 cells produce IL-4 and silence IFN-gamma (K. M. Ansel, Nature Immunology 4:616-623, (2003)). For both alternative cell fates, the expression of these genes is inversely correlated with methylation of proximal CpG sites. In Th2 and naive T cells the IFN-gamma promoter is methylated, but not in IFN-gamma expressing Th1 cells (J. T. Attwood, CMLS 59:241-257, (2002)). Conversely, the entire transcribed region of IL-4 becomes demethylated under Th2-inducing conditions, strongly correlating with efficient transcription of IL-4, whereas in Th1 cells, specific untranscribed regions gradually become heavily methylated and IL-4 is not expressed (D. U. Lee, Immunity 16:649-660, (2002)). Furthermore, it has been demonstrated that in naive T cells, the IL-2 promoter is heavily methylated and inactive, but after activation of the naive T cell, the IL-2 gene undergoes rapid and specific demethylation at six consecutive CpGs. This alteration in methylation patterns occurs concomitantly with cell differentiation and increased production of the IL-2 gene product (D. Bruniquel and R. H. Schwartz, Nat. Immunol. 4:235-40, (2003)). In developing immune cells, demethylation during cell fate decisions occurs either passively through exclusion of maintenance methylases from the replication fork, or actively as in the case of IL-2 where a yet not identified enzyme is able to actively demethylate the promoter region upon TCR stimulation.

[0137] Regulatory T cells or Treg cells play an important role for the maintenance of immunological tolerance by suppressing the action of autoreactive effector cells and are critically involved in preventing the development of autoimmune reactions, thus making them important and attractive targets for therapeutic applications (S. Sakaguchi, Nat Immunol 6:345-352, (2005)). While a number of cell surface molecules are used to characterize and define Treg cells, the most common being CD4+CD25hi, the transcription factor FOXP3 is specifically expressed in these cells and has been shown to be a critical factor for the development and function of Treg cells.

[0138] It has been demonstrated that a conserved 348 bp fragment upstream of the FOXP3 transcription start site contains a minimal promoter necessary for induction of FOXP3 expression (P. Y. Mantel, J. Immunol. 176(6):3593-602 (2006)). Analysis of the methylation status in a stretch of 8 tightly positioned CpG dinucleotides demonstrated that naturally occurring regulatory T cells display a completely demethylated promoter region. In contrast, induced CD4+CD25hi cells, as well unstimulated and restimulated CD4+CD2510 cells displayed a partially methylated promoter region (P. C. Janson, PLoS ONE. 3(2) (2008)). Various data demonstrate that activation of CD4+CD2510 cells results in partial demethylation of the human FOXP3 promoter, and that the speed of demethylation correlates with proliferation, thus indicating a mechanism of passive demethylation. Importantly, in contrast to the mouse system, the addition of TGF-.beta. during cell culture of human regulatory T cells does not result in a Treg-like demethylation at the human FOXP3 promoter, highlighting the need for alternative mechanisms of modulating the methylation status at the FOXP3 locus for the generation of stable human regulatory T cell lines.

[0139] The importance of demethylation at the FOXP3 locus was demonstrated by the fact that the addition of DNA methylation-inhibiting 5-azacytidine to in vitro derived human regulatory T cell cultures was sufficient to induce stable FOXP3 expression, and 5-azacytidine also stabilized TGF-.beta. induced FOXP3+ Treg cells in restimulation cultures. Similarly, blocking the maintenance of DNA methylation, by pharmacological inhibition of DNA methyltransferase-1, induced significant and stable activation-dependent FOXP3 expression in cycling conventional T cells, which was further amplified by co-treatment with TGF-.beta..

[0140] Taken together, the results thus far demonstrate that epigenetic modification, which results in imprinting of FOXP3 expression and stable Treg populations, is not restricted to naturally occurring Treg cells differentiating within the thymus, but can still be initiated in peripheral FOXP3-T cells. Furthermore, the data indicate that stable conversion of CD25-CD4+ T cells into FOXP3+ Treg can only occur under conditions that also induce epigenetic fixation of the Treg phenotype by modulating the methylation status of the DNA at the FOXP3 locus. However, the biological signals leading to this modulation of the methylation status at the FOXP3 locus remain elusive.

[0141] One object of the present invention to provide an improved method of generating stable regulatory T cells.

[0142] Accordingly, one aspect of the present invention provides a method for improving the generation of stable human regulatory FOXP3+ T cells, the method comprising contacting a human T cell with or delivering to a human T cell an effective 5-methylcytosine to 5-hydroxymethylcytosine converting amount of one or more catalytically active TET family enzymes, functional TET family derivatives, TET catalytic fragments, or any combination thereof. In one embodiment, one uses the entire protein of TET1, TET2, TET3, or CXXC4, or a nucleic acid encoding such a protein, or any combination thereof. In one embodiment, one uses only the active hydroxylation-inducing portion of TET1, TET2, TET3, or CXXC4, or a nucleic acid encoding such a fragment, or any combination thereof.

[0143] In connection with "contacting with" or "delivering to" a cell a TET family enzyme, functional TET family derivative, TET catalytic fragment thereof, or any combination thereof, the phrase "improving the generation of stable human regulatory FOXP3+ cells" indicates that the percentage of stable human regulatory FOXP3+ cells in a given population is at least 5% higher in populations treated with a catalytically active TET family enzyme, a functional TET family derivative, or a TET catalytic fragment thereof, relative to a comparable, control population, where no TET family enzyme, functional TET family derivative, or TET catalytic fragment is present. In one embodiment, the percentage of stable human regulatory FOXP3+ cells in a catalytically active TET family enzyme, a functional TET family derivative, or a TET catalytic fragment thereof, treated population is at least 10% higher, at least 15% higher, at least 20% higher, at least 25% higher, at least 30% higher, at least 35% higher, at least 40% higher, at least 45% higher, at least 50% higher, at least 55% higher, at least 60% higher, at least 65% higher, at least 70% higher, at least 75% higher, at least 80% higher, at least 85% higher, at least 90% higher, at least 95% higher, at least 1-fold higher, at least 1.5-fold higher, at least 2-fold higher, at least 5-fold higher, at least 10 fold higher, at least 25 fold higher, at least 50 fold higher, at least 100 fold higher, at least 1000-fold higher, or more than a control treated population of comparable size and culture conditions. The phrase "control treated population of comparable size and culture conditions" is used herein to describe a population of cells that has been treated with identical media, viral induction, nucleic acid sequences, temperature, confluency, flask size, pH, etc., with the exception of the addition of a catalytically active TET family enzyme, a functional TET family derivative, or a TET catalytic fragment thereof.

[0144] By the phrase "stable human regulatory FOXP3+ T cells" is meant a population of CD4 T cells that maintain expression of the transcription factor FOXP3 upon repeated T cell stimulation in the absence of exogenous regulatory T cell differentiation factors, such as, but not limited to, TGF-.beta.. Such "stable human regulatory FOXP3+ T cells" possess functions known to be characteristic of human regulatory T cells, for example, but not limited to, the ability to suppress the proliferation of naive CD4+CD25-cells in a dose-dependent manner, as assayed by techniques familiar to those in the art, including, but not limited to, tritiated-thymidine incorporation and CFSE assays.

[0145] The production of human regulatory FOXP3+ T cells, as practiced by those skilled in the art, is generally achieved by purifying CD4+ cells from a human source and culturing and expanding the CD4+ cells in the presence of agents that non-specifically activate the T cell receptor, and cytokines and/or growth factors known to promote survival, growth, function, differentiation, or a combination thereof, of the regulatory T cell lineage. It is to be understood that the CD4+ T cells may be obtained from in vivo sources, such as, for example, peripheral blood, leukopheresis blood product, apheresis blood product, peripheral lymph nodes, gut associated lymphoid tissue, spleen, thymus, cord blood, mesenteric lymph nodes, liver, sites of immunologic lesions, e.g. synovial fluid, pancreas, cerebrospinal fluid, tumor samples, granulomatous tissue, or any other source where such cells may be obtained. It is to be understood that any technique, which enables separation of the CD4 T cells for use in the methods and assays invention may be employed, such as flow cytometric sorting, or through the use of magnetic bead assays (negative or positive selection), or a combination of such methods, and is to be considered as part of this invention.

[0146] Cytokines and growth factors, it is to be understood, may include polypeptides and nonpolypeptide factors. As defined herein, a "cytokine" is any of a number of substances that are secreted by specific cells of the immune system which carry signals locally between cells, and thus have an effect on other cells, and include proteins, peptides, or glycoproteins. A cytokine, may include lymphokines, interleukins, and chemokines, and can be classified into: (1) the four a-helix bundle family, which is further divided into three sub-families (IL-2 subfamily, interferon (IFN) subfamily, and the IL-10 subfamily); (2) the IL-1 family, which primarily includes IL-1 and IL-18; and (3) the IL-17 family, which has yet to be completely characterized, though member cytokines have a specific effect in promoting proliferation of T-cells that cause cytotoxic effects.

[0147] A "growth factor", as the term is defined herein, refers to a naturally occurring substance capable of stimulating cellular growth, proliferation and cellular differentiation. A growth factor may be a protein or a steroid hormone. A cytokine may be a growth factor. Some non-limiting examples of growth factor families include: Bone morphogenetic proteins (BMPs), Epidermal growth factor (EGF), Erythropoietin (EPO), Fibroblast growth factor (FGF), Granulocyte-colony stimulating factor (G-CSF), Granulocyte-macrophage colony stimulating factor (GM-CSF), Growth differentiation factor-9 (GDF9), Hepatocyte growth factor (HGF), Hepatoma derived growth factor (HDGF), Insulin-like growth factor (IGF), Myostatin (GDF-8), Nerve growth factor (NGF) and other neurotrophins, Platelet-derived growth factor (PDGF), Thrombopoietin (TPO), Transforming growth factor alpha (TGF-.alpha.), Transforming growth factor beta (TGF-.beta.), and Vascular endothelial growth factor (VEGF).

[0148] In general, successful generation of human regulatory FOXP3+ T cells, as practiced by one of skill in the art, is accomplished by culturing purified CD4+ T cells in the presence of anti-CD3 and anti-CD28 antibodies as T cell receptor stimulating agents, and promoting the differentiation of human regulatory FOXP3+ T cells by the addition of TGF-.beta. to the culture medium. The isolated CD4+ cells cultured under such conditions can then be assessed for expression of cell-surface markers characteristic of the regulatory T cell lineage, such as, but not limited to, CD25, using techniques standard in the art. It is to be understood that the isolated culture-expanded human regulatory FOXP3+ T cells of this invention may express in addition to CD25 and CD4 any number or combination of cell surface markers, as described herein, and as is well known in the art, and are to be considered as part of this invention. The isolated CD4+ T cells cultured under such conditions can also be assessed for expression of the transcription factor defining the regulatory T cell lineage, FOXP3, using techniques known in the art, for example, but not limited to, intracellular flow cytometric analysis using a labeled FOXP3 specific monoclonal antibody that can be detected using a flow cytometer.

[0149] Accordingly, in one embodiment, the method of generating human regulatory FOXP3+ T cells further comprises contacting the human T cell with a composition comprising at least one cytokine, growth-factor, or activating reagents. In one embodiment, the composition comprises TGF-.beta..

Compositions and Methods for Detecting 5-Methylcytosine and 5-Hydroxmethylcytosine

[0150] The invention is based, in part, upon identification of a novel and surprising enzymatic activity for the family of TET proteins, namely TET1, TET2, TET3, and CXXC4. The novel activity is related to the hydroxylase activity of the TET family enzymes, wherein the hydroxylase activity converts the cytosine nucleotide 5-methylcytosine into 5-hydroxymethylcytosine. There are currently no techniques or reagents to detect or map 5-hydroxymethylcytosine residues in genomes, as it is not recognized either by the 5-methylcytosine binding protein MeCP2 (V. Valinluck, Nucleic Acids Research 32: 4100-4108 (2004)), or existing specific monoclonal antibodies directed against 5-methylcytosine. Hence, reagents and methods to detect 5-hydroxymethylcytosine are required.

[0151] Accordingly, one object of the present invention is directed towards compositions and methods for the detection of 5-methylcytosine and 5-hydroxymethylcytosine nucleotides in a nucleic acid, such as DNA, in a biological sample.

[0152] In one embodiment, an assay based on thin-layer chromatography (TLC) is used. Briefly, DNA is extracted from cells and digested with a methylation insensitive enzyme that cuts the DNA regardless of whether the internal cytosine in the CG dinucleotide is methylated. Preferably, the restriction enzyme cuts within CCGG sequences, and more preferably the enzyme is MspI. Alternatively, the enzyme cuts within TCGA, and the restriction enzyme used is Taq.alpha.1. The restricted DNA is then treated with an agent to remove the newly exposed 5' phosphate, such as calf intestinal phosphatase. The DNA is then treated to yield fragments that are almost exclusively labeled on the newly exposed 5' cytosine, regardless of methylation status, by, for example, end-labeling the DNA with T4 polynucleotide kinase and [.gamma.32P]ATP. The DNA fragments are then digested to liberate dNMPs (dinucleotide monophosphates), using agents such as, for example, snake venom phosphodiesterase and DNase I. The dNMPs can then be separated on cellulose TLC plates and excised for nucleotide identification. As a means of confirming the presence of 5-hydroxymethylcytosine nucleotide in a sample, a known biological source of the nucleotide may be used, such as T-even phages grown in E. coli lacking GalU (the enzyme that catalyses formation of the glucose donor UDP-Glucose) and the McrA and McrB1 components of McrBC, which results in the exclusive production of 5-hydroxymethylcytosine, and can be used to compare migration patterns with that of the nucleotides present in the sample.

[0153] In addition, the methods and compositions described herein generally involve direct detection of 5-methylcytosine and 5-hydroxymethylcytosine nucleotides, with agents that recognize and specifically bind to 5-methylcytosine and 5-hydroxymethylcytosine nucleotides in a nucleic acid sequence. These methods and compositions can be used singly or in combination to determine the hydroxymethylation status of cellular DNA or sequence information. In one embodiment, these methods and compositions can be used to detect 5-hydroxymethylcytosine in cell nuclei for the purposes of immunohistochemistry. In another embodiment, these methods and compositions can be used to immunoprecipitate DNA fragments containing 5-hydroxymethylcytosine from crosslinked DNA by chromatin immunopreciptation (ChIP). The identity of such fragments can then be determined by deep-sequencing (ChIPseq) or by hybridizing the fragments to genomic tiling arrays.

[0154] Accordingly, one embodiment comprises providing an antibody or antigen-binding fragment thereof that specifically binds to 5-hydroxymethylcytosine. The antibody or antigen-binding portion thereof can be contacted with a biological sample under conditions effective to yield a detectable signal if 5-hydroxymethylcytosine is present in the sample, and the antibody or antigen-binding portion thereof binds to the 5-hydroxymethylcytosine. A determination can then be made as to whether the sample yields a detectable signal, where the presence of the detectable signal indicates that the sample contains the 5-hydroxymethylcytosine. Such a determination can be made using any equipment that detects the signal, such as a microscope (fluorescent, electron) or flow cytometric device.

[0155] In one embodiment, the 5-hydroxymethylcytosine nucleotide is detected using a hydroxymethylation-specific antibody, hydroxymethylation-specific antigen-binding fragment thereof, or hydroxymethylation-specific protein.

[0156] The methylation of cytosine residues occurs in the DNA of many organisms from plants to mammals and is believed to play a critical role in gene regulation. There is considerable research into the mechanisms by which patterns of cytosine methylation change during the differentiation of cells and in states of disease. Furthermore, cytosine methylation patterns are believed to serve as a functional "fingerprint" of different normal and diseased cell types and of the same cell type at various stages of differentiation, and thus mapping the sites of cytosine methylation on a genome-wide scale is a subject of research.

[0157] Novel compositions and methods are provided herein that (1) enable covalent enzymatic tagging of methylcytosine in polynucleotides, and detection of the covalent tag; (2) enable covalent enzymatic tagging of 5-hydroxymethylcytosine in polynucleotides, and detection of the covalent tag; and (3) enable detection of 5-hydroxymethylcytosine through chemical modification, such as bisulfite treatment. The compositions and methods for tagging, modification, detection and isolation further provide, in part, numerous downstream applications for analysis of methylcytosine and 5-hydroxymethylcytosine in polynucleotides, including but not limited to, genome-wide analysis of methylcytosine and 5-hydroxymethylcytosine patterns in normal and diseased DNA. The compositions and methods of the invention significantly expand the current state of the art, and can be immediately applied to basic research, clinical diagnostics, and drug screening applications.

[0158] This invention describes, in part, a method to covalently tag and detect naturally occurring 5-hydroxymethylcytosine in nucleic acids, such as DNA, for multiple applications. As has been described herein, we have shown that 5-hydroxymethylcytosine is present in mammalian DNA, which, without wishing to be bound by a theory, may exist as an intermediate during changes in methylation status of the genome. As described herein, modification of methylcytosine to 5-hydroxymethylcytosine is catalyzed through the action of the novel TET family of enzymes. Without wishing to be bound by a theory, we believe that 5-hydroxymethylcytosine in DNA is subsequently converted into unmethylated cytosine. 5-hydroxymethylcytosine in DNA may also serve other functions.

[0159] As is described herein, in some aspects, methods are provided wherein a catalytically active TET family enzyme, a functional TET family derivative, or a TET catalytically active fragment thereof is contacted with a nucleic acid, such as DNA or RNA, to convert methylcytosine in nucleic acids to 5-hydroxymethylcytosine. In some embodiments, the nucleic acids are contacted in vitro. In some embodiments, the nucleic acids are contacted in a cell. In some embodiments, the nucleic acids are contacted in vivo, in a living animal, preferably a mammal, for example, a human.

[0160] Compositions and methods to detect and map methylated and hydroxymethylated cytosine residues in genomes have numerous applications. Several techniques are currently utilized to map methylated cytosine residues. One method involves a chemical reaction of nucleic acids with sodium hydrogen sulfite (bisulfite), which sulfonates unmethylated cytosine but does not efficiently sulfonate methylated cytosine. The sulfonated unmethylated cytosine is prone to spontaneous deamination, which yields sulfonated uracil. The sulfonated uracil can then be desulfonated to uracil at low pH. The base-pairing properties of the pyrimidines uracil and cytosine are fundamentally different: uracil in DNA is recognized as the equivalent of thymine and therefore is paired with adenine during hybridization or polymerization of DNA, whereas cytosine is paired with guanosine during hybridization or polymerization of DNA. Performance of genomic sequencing or PCR on bisulfite treated DNA can therefore be used to distinguish unmethylated cytosine in the genome, which has been converted to uracil by bisulfite/deamination/desulfonation, versus methylated cytosine, which has remained unconverted. This technique is amenable to large-scale screening approaches when combined with other technologies such as microarray hybridization and high-throughput sequencing.

[0161] As described, the invention provides, in one aspect, a method of detecting 5-hydroxymethylcytosine in complex genomes using bisulfite treatment of nucleic acids, such as DNA. The method comprises, in part, contacting a nucleic acid of interest, such as isolated genomic DNA or an oligonucleotide, with an effective amount of sodium bisulfite to convert any 5-hydroxymethylcytosine present in the nucleic acid to cytosine-5-methylenesulfonate. The bisulfite treated nucleic acid is then digested with an enzyme, such as a methyl sensitive enzyme, and the nucleic acid is end-labeled. In one embodiment, the enzyme is MseI. In one embodiment, the nucleic acid is end-labeled, for example, using .sup.32P. The digested and labeled nucleic acid is then contacted with an antiserum, antibody or antigen-fragment thereof specific for cytosine-5-methylenesulfonate. The contacted nucleic acid can then be immobilized using, for example, beads specific for the species and isotype of antiserum, antibody or antigen-fragment thereof. In one embodiment, the beads comprise anti-rabbit IgG beads. The amount of 5-hydroxymethylcytosine in the immobilized nucleic acid can then be determined by obtaining the radiation counts, by, for example, a scintillation counter. In other embodiments of the aspect, the antibody or antigen-binding fragment is directly labeled. In some embodiments, the label is a fluorescent label or an enzymatic substrate. In some embodiments, the nucleic acid is contacted in vitro. In some embodiments, the nucleic acid is contacted in a cell. In some embodiments, the nucleic acid is contacted in vivo.

[0162] In some embodiments, the ability of a test inhibitor to inhibit TET family enzymatic activity can be determined using the methods described herein. For example, genomic DNA is isolated from cells treated with one or more test inhibitors of TET family enzymatic activity, such as siRNAs, and undergoes bisulfite treatment as described herein. The presence of less cytosine-5-methylenesulfonate in a sample treated with the test inhibitor(s) of TET family enzymatic activity compared with a sample to which no test inhibitor(s) was added is indicative of the ability of the test inhibitor to inhibit TET family activity.

[0163] In other embodiments, the methods described herein to detect cytosine-5-methylenesulfonate in a sample can be used to test whether a patient having a mutation, single nucleotide polymorphism, or other genetic difference in a TET family member genomic sequence has decreased 5-hydroxymethylcytosine.

[0164] In other embodiments, the methods of the aspect can be used to isolate a nucleic acid having one or more 5-hydroxymethylcytosine residues, for use, for example, in chromatin immunopreciptation assays. Such isolated nucleic acids can then be sequenced or subjected to PCR amplification and subsequent sequencing to identify the genomic regions having 5-hydroxymethylcytosine residues.

[0165] As described herein, the invention provides, in one aspect, novel and significant improvements for detecting 5-methylcytosine and 5-hydroxymethylcytosine in complex genomes. In some embodiments, a catalytically active TET family enzyme, a functional TET family derivative, or a TET catalytically active fragment thereof is provided to efficiently convert methylcytosine in nucleic acids to 5-hydroxymethylcytosine. In some embodiments, compositions and methods are provided for using specific and efficient enzymes to convert methylcytosine residues in nucleic acids to glucosylated-5-hydroxymethylcytosine residues and gentibiose-containing-5-hydroxymethylcytosine residues. In some embodiments, the nucleic acids are contacted in vitro. In some embodiments, the nucleic acids are contacted in a cell. In some embodiments, the nucleic acids are contacted in vivo.

[0166] Another method currently used to distinguish methylated versus unmethylated cytosine in genomes is by use of methylation sensitive restriction enzymes (MSRE). Cytosine methylation in certain sequence contexts prevents cleavage by MSRE, whereas other enzymes are able to cleave the identical sequence regardless of cytosine methylation status. This differential sensitivity to cytosine methylation can be used to quantitatively determine the degree of methylation in particular stretches of sequence in the genome. Limitations of this method are that it is less amenable to large-scale approaches, and analysis is limited to methylation within recognition sites of the restriction enzymes.

[0167] As described herein, the invention provides, in one aspect, novel and significant improvements for detecting methylcytosine in complex genomes. The compositions and methods, as described herein, will allow tagging and analysis of all methylated cytosine residues in the genome, as opposed to the limited analysis obtained with MSRE.

[0168] A third method used to distinguish methylated versus unmethylated cytosine in genomes is via affinity purification of methylated cytosine using antibodies or protein domains (e.g. MBD2) that specifically bind to the methylated cytosine residue. Methylated cytosine containing DNA is bound by these affinity reagents and then enriched by binding of the affinity reagent to a solid support or other separation strategy. Further analysis such as microarray hybridization and high-throughput sequencing can be performed on either the bound fraction enriched for methylated cytosine-containing DNA, or the unbound fraction enriched for unmethylated cytosine. This technique has the advantage of enriching regions of interest for further analysis, such as high-throughput sequencing of methylated or unmethylated cytosine in genomes. One limitation of this method is that it depends heavily on the binding affinity and specificity of the given methylated cytosine binding protein, since the binding of these reagents is noncovalent. Another limitation of this method is that it measures density of methylation in a given genomic region, and will not be as sensitive to areas with sparse methylation target sites.

[0169] The compositions and methods of the invention provide, in one aspect, improved affinity purification of DNA containing methylated cytosine, by adding covalent tags and/or chemical modifications to methylated cytosine and 5-hydroxymethylated cytosine residues. This is because, as described herein, detection reagents against glucosylated 5-hydroxymethyl cytosine, gentibiose containing 5-hydroxymethylcytosine DNA and chemically modified 5-methylenesulfonate hydroxymethylcytosine are either covalently bound or non-covalently bound with a much higher affinity and specificity than that currently achievable by methylcytosine affinity reagents.

[0170] In addition, as described herein, novel compositions and methods are provided for detecting methylated and hydroxymethylated cytosine in complex genomes. Such compositions and methods utilize the properties of certain enzymes to efficiently and specifically add glucose residues to hydroxymethylcytosine in DNA. Enzymes encoded by bacteriophages of the "T even" family have these properties, and those enzymes that add glucose in the alpha configuration are called alpha-glucosyltransferases (AGT), while those enzymes that add glucose in the beta configuration are called beta-glucosyltransferases (BGT). T2, T4, and T6 bacteriophages encode AGTs, but only T4 bacteriophages encode BGT. Amino acids important for the activity of T4 alpha-glucosyltransferases are His-Asp-His (114-116) ((L. Lariviere, J Mol Biol (2005) 352, 139). Amino acids important for the activity of T4 beta-glucosyltransferases are Asp-Ile-Arg-Leu (amino acids 100-103) (SEQ ID NO: 17), Met (amino acid 231) and Glu (amino acid 311) (L. Lariviere, (2003) J Mol Biol 330, 1077). T2 and T6 bacteriophages possess an additional activity that further modifies glucosylated hydroxymethylcytosine by adding another glucose molecule in the beta-configuration. This enzyme is called beta-glucosyl-alpha-glucosyl-transferase (BGAGT). Addition of the second glucose results in the formation of a disaccharide containing two glucose molecules linked in a beta-1-6 configuration, which is known as gentibiose or gentiobiose. The glucose donor used by AGT, BGT, and BGAGT is called uridine diphosphate glucose (UDPG).

[0171] In some embodiments of this aspect, enzymes encoded by bacteriophages of the "T even" family are provided that add glucose molecules to 5-hydroxymethylcytosine residues in nucleic acids. In one embodiment, the 5-hydroxymethylcytosine is naturally occurring. In one embodiment, the 5-hydroxymethylcytosine occurs through contacting DNA with a catalytically active TET family enzyme, a functional TET family derivative, or a TET catalytically active fragment thereof, thereby converting methylcytosine to hydroxymethylcytosine. In one embodiment, the enzyme provided is an alpha-glucosyltransferase. In one embodiment, the alpha-glucosyltransferases provided are encoded by a bacteriophage selected from the group consisting of T2, T4, and T6 bacteriophages. In one embodiment, the enzyme is a beta-glucosyltransferase. In one embodiment, the beta-glucosyltransferase is encoded by a bacteriophage selected from T4 bacteriophages. In some embodiments, enzymes encoded by bacteriophages of the "T even" family add two glucose molecules linked in a beta-1-6 configuration to hydroxymethylcytosine to form gentibiose-containing-hydroxymethylcytosine. In one embodiment, the enzyme is a beta-glucosyl-alpha-glucosyl-transferase. In one embodiment, the beta-glucosyl-alpha-glucosyl-transferase is encoded by a bacteriophage selected from the group consisting of T2 and T6 bacteriophages. In some embodiments, the nucleic acids are in vitro. In some embodiments, the nucleic acids are in a cell. In some embodiments, the nucleic acids are in vivo.

[0172] As defined herein, a "naturally occurring" 5-hydroxymethylcytosine residue is one which is found in a sample in the absence of any external manipulation, or activity. For example, a "naturally occurring 5-hydroxymethylcytosine residue" is one found in an isolated nucleic acid that is present due to normal genomic activities, such as, for example, gene silencing mechanisms.

[0173] In some embodiments of this aspect, the addition of glucose or gentibiose molecules to 5-hydroxymethylcytosine residues provides a method to detect nucleic acids containing hydroxymethylated cytosines. In some embodiments, the method to detect the hydroxymethylated cytosine utilizes radiolabeled glucose and glucose derivative donor substrates. In one such embodiment, the nucleic acid is incubated with an alpha-glucosyltransferases, a beta-glucosyltransferase, or a beta-glucosyl-alpha-glucosyl-transferase in the presence of radiolabeled uridine diphosphate glucose (UDPG), and the DNA purified and analyzed by liquid scintillation counting, autoradiography or other means. In one such embodiment, the UDPG is radiolabeled with 14C. In one embodiment, the UDPG is radiolabeled with 3H.

[0174] In some embodiments of this aspect, proteins that recognize glucose residues are used as a method to detect 5-hydroxymethylated cytosine. In some embodiments, the proteins recognize only the glucose residue. In some embodiments, the proteins recognize the residue in the context of hydroxymethyl cytosine. In one embodiment, the protein that recognizes glucose residues is a lectin. In one embodiment, the protein that recognizes glucose residues is an antibody or antibody fragment thereof. In one embodiment, the antibody is modified with several tags and used for solid-phase purification of gentibiose-containing-hydroxymethylcytosine in DNA. In one embodiment, the tags are a biotin molecules or beads. In one embodiment, the antibody is modified with gold or fluorescent tags. In one embodiment, the protein that recognizes glucose residues is an enzyme. In one embodiment, the enzyme is a hexokinase or a beta-glucosyl-alpha-glucosyl-transferase.

[0175] In other embodiments of this aspect, the addition of glucose to the 5-hydroxymethylcytosine residues provides a method to detect nucleic acids containing hydroxymethylated cytosines. In such embodiments, naturally occurring 5-hydroxymethylcytosine, or 5-hydroxymethylcytosine occurring through contacting DNA with a catalytically active TET family enzyme, a functional TET family derivative, or a TET catalytically active fragment thereof, undergoes conversion to glucosylated 5-hydroxymethylcytosine using the methods described herein. The glucosylated 5-hydroxymethylcytosine is then contacted with sodium periodate to generate aldehyde residues, and the DNA isolated and precipitated by any method known to one of skill in the art, such as ethanol precipitation. The quantity of aldehyde residues, as determined by one of skill in the art, can then be used to determine the quantity of 5-hydroxymethylcytosine residues. For example, in one embodiment, aldehye residues can be detected using an aldehyde specific probe conjugated to a tag, such as an enzyme, non-fluorescent moiety, or fluorescent label. In one embodiment, the aldehyde specific probe is an aldehydye reactive biotin, and can be detected by streptavidin conjugated to an enzyme. In some embodiments, the enzyme is horseradish peroxidase. In some embodiments of the aspect, the aldehyde specific probe can be used to perform specific pulldown of the glucosylated DNA residues, which can be used, for example, to perform chromatin immunoprecipitation assays to determine in vivo sites of genomic 5-hydroxymethylation.

[0176] In some embodiments of this aspect, proteins that recognize gentibiosyl residues are used as a method to detect 5-hydroxymethylated cytosine. In some embodiments, enzymes encoded by bacteriophages of the "T even" family add two glucose molecules linked in a beta-1-6 configuration to hydroxymethylcytosine to form gentibiose-containing-hydroxymethylcytosine. In one embodiment, the enzyme is a beta-glucosyl-alpha-glucosyl-transferase. In one embodiment, the beta-glucosyl-alpha-glucosyl-transferase is encoded by a bacteriophage selected from the group consisting of T2 and T6 bacteriophages. In some embodiments, the gentibiosyl residue in gentibiose-containing-hydroxymethylcytosine is detected non-covalently. In some embodiments, the non-covalent detection methods utilizes proteins with an affinity for the gentibiosyl residue. In one embodiment, the protein is an antibody specific to gentibiose-containing-hydroxymethylcytosine. In one embodiment, the antibody is modified with several tags and used for solid-phase purification of gentibiose-containing-hydroxymethylcytosine in DNA. In one embodiment, the tags are a biotin molecules or beads. In one embodiment, the antibody is modified with gold or fluorescent tags. In one embodiment, the protein is a lectin with affinity to gentibiosyl residues. In one embodiment, the lectin is Musa acuminata lectin (BanLec). In one embodiment, the lectin is modified with gold or fluorescent tags. In some embodiments, the proteins with an affinity for the gentibiosyl residue are used to identify gentibiose-containing-hydroxymethylcytosine in DNA using electron microscopy or immunofluorescent detection.

[0177] In some embodiments of the aspect, glucose substrates that trap the covalent enzyme-DNA intermediates are used as a method to detect 5-hydroxymethylated cytosine. In some embodiments, enzymes encoded by bacteriophages of the "T even" family add glucose substrates that trap the covalent enzyme-DNA intermediates to 5-hydroxymethylcytosine in DNA. In some embodiments, the glucose substrate is a UDPG analog. In one embodiment, the UDPG analog is uridine-2-deoxy-2-fluoro-glucose. In some embodiments, the enzyme encoded by bacteriophages of the "T even" family is labeled with a tag to facilitate detection and isolation of the covalently linked enzyme-DNA intermediate. In one embodiment, the tag is a protein. In one embodiment, the tag is not a protein.

[0178] In some embodiments of this aspect, the method to detect the hydroxymethylated cytosine uses a chemical that recognizes sugar residues and catalyzes further reactions that enable additional tags to be placed on these sugar residues. In one embodiment, the sugar residue is a glucose or a glucose derivative. In one embodiment, the sugar residue is a gentibiose molecule.

[0179] In some embodiments of this aspect, the addition of glucose molecules to hydroxymethylcytosine serves to covalently tag hydroxymethylcytosine for downstream applications. In one such embodiment, the downstream application involves the detection and purification of DNA containing methylcytosine and hydroxymethylcytosine. In some embodiments the glucose and glucose derivative donor substrates are radiolabeled for detection.

[0180] In some embodiments of this aspect, the 5-hydroxymethyl residue of 5-hydroxymethylcytosine residues in nucleic acids is converted to a methylenesulfonate residue after treatment with sodium hydrogen sulfite. In some embodiments, the addition of sulfonate to 5-hydroxymethylcytosine provides a method to detect the hydroxymethylated cytosine residue. In one embodiment, antibodies specific for the 5-methylenesulfonate residue in nucleosides are used. In some embodiments, the nucleic acids are in vitro. In some embodiments, the nucleic acids are in a cell. In some embodiments, the nucleic acids are in vivo.

[0181] In some embodiments of this aspect, the addition of glucose, glucose analogs, or sulfonate molecules to methylcytosine and hydroxymethylcytosine serves to covalently or non-covalently tag methylcytosine and hydroxymethylcytosine for downstream applications. In one such embodiment, the downstream application involves the detection and purification of nucleic acids containing methylcytosine and hydroxymethylcytosine. In some embodiments the glucose and glucose derivative donor substrates are radiolabeled for detection. In some embodiments, the downstream application involves detection of methylcytosine and 5-hydroxymethylcytosine in cells or tissues directly by fluorescence or electron microscopy. In some embodiments, the downstream application involves detection of methylcytosine and 5-hydroxymethylcytosine by assays such as blotting or linked enzyme mediated substrate conversion with radioactive, colorimetric, luminescent or fluorescent detection. In some embodiments, the downstream application involves separation of the tagged nucleic acids away from untagged nucleic acids by enzymatic, chemical or mechanical treatments, and fractionation of either the tagged or untagged DNA by precipitation with beads, magnetic means, fluorescent sorting. In some embodiments, this is followed by application to whole genome analyses such as microarray hybridization and high-throughput sequencing.

[0182] Another object of the present invention is to provide methods and assays to screen for signaling pathways that activate or inhibit TET family enzymes at the transcriptional, translational, or posttranslational levels.

[0183] Accordingly, one aspect of the invention provides assays for detecting the activity of the TET family of proteins. In one embodiment, an assay for detecting increased hydroxymethylcytosine in vitro using an oligonucleotide containing 5-methylcytosine is provided. In one embodiment, an assay for detecting an increased cytosine-to-methylcytosine ratio in vitro in an oligonucleotide containing 5-methylcytosine is provided. In one embodiment, an assay for detecting increased hydroxymethylcytosine in cellular DNA is provided. In one embodiment, an assay for detecting an increased cytosine-to-methylcytosine ratio in cellular DNA is provided. In another embodiment, an assay for detecting increased hydroxymethylcytosine in transfected plasmid DNA is provided. In one embodiment, an assay for detecting an increased cytosine-to-methylcytosine ratio in transfected plasmid DNA is provided. In another embodiment, an assay for detecting increased activity of a reporter gene that is initially silenced by promoter methylation is provided. In one embodiment, an assay for the detection of other oxidative modifications of pyrimidines in RNA or DNA, in vitro, in cells or in plasmid DNA, is provided.

[0184] Another aspect provides a method for detecting factors involved in decreasing the amount of 5-hydroxymethylcytosine residues in a nucleic acid. In some embodiments, the decrease in the amount of 5-hydroxymethylcytosine residues is caused by conversion of 5-hydroxymethylcytosine to cytosine. In some embodiments, the decrease in 5-hydroxymethylcytosine residues is mediated by a DNA repair protein, such as, for example, a glycosylase. In some embodiments, the DNA repair protein is one or more proteins selected from MBD4, SMUG1, TDG. NTHL1, NEIL1, NEIL2, or APEX1. In some embodiments, the method comprises expressing a test factor in a mammalian cell and determining whether any 5-hydroxymethylcytosine residue decreasing activity is present in a cellular lysate by monitoring cleavage of a 5-hydroxymethylcytosine residue containing oligonucleotide. In one embodiment, the method comprises expressing a test glycosylase in a mammalian cell, such as, for example, a 293T cell. Oligonucleotides can then be generated and end-labeled, whereby at least one oligonucleotide comprises one or more 5-hydroxymethylcytosine residues, and at least one oligonucleotide has a known substrate for the test glycosylase. The test glycosylase expressing cells are then lysed, and the oligonucelotides are added to the lysate. In one embodiment, the oligonucelotides are exposed to alkaline conditions to generate abasic sites, and then run on a denaturing gel to detect breaks in the oligonucloetides. For example, if both the oligonucleotide comprising 5-hydroxymethylcytosine residue and the oligonucleotide having a known substrate for the test glycosylase are cut, it indicates that the test glycosylase recognizes 5-hydroxymethylcytosine.

A Kit for Enhancing Gene Transcription, Assessment of 5-methylcytosine to 5-Hydroxymethylcytosine Conversion, and Purification of Nucleotides

[0185] Other aspects of the present invention provide kits comprising materials for performing methods according to the invention as above. A kit can be in any configuration well known to those of ordinary skill in the art and is useful for performing one or more of the methods described herein for the conversion of 5-methylcytosine to 5-hydroxymethylcytosine in cells, and the detection of 5-methylcytosine and 5-hydroxymethylcytosine in a nucleic acid.

[0186] In one embodiment of this aspect, the kit comprises one or more catalytically active TET family enzymes, functional TET family derivatives, or TET catalytically active fragments thereof, or engineered nucleic acids encoding such catalytically active TET family enzymes, functional TET family derivatives, or TET catalytically active fragments thereof, to be contacted with a cell, or plurality of cells.

[0187] In one embodiment of this aspect, the kit comprises one or more catalytically active TET family enzymes, functional TET family derivatives, or TET catalytically active fragments thereof, and one or more compositions comprising cytokines, growth factors, and activating reagents for the purposes of generating stable human regulatory T cells. In one preferred embodiment, the compositions comprising cytokines, growth factor, and activating reagents, comprises TGF-.beta.. In one embodiment of this aspect, the kit includes packaging materials and instructions therein to use said kits.

[0188] In one embodiment of this aspect, the kit comprises one or more catalytically active TET family enzymes, functional TET family derivatives, or TET catalytically active fragments, or engineered nucleic acids encoding such catalytically active TET family enzymes, functional TET family derivatives, or TET catalytically active fragments thereof, and the nucleic acid sequences for one or more of Oct-4, Sox2, c-MYC, and Klf4, for the purposes of improving the efficiency or rate of the generation of induced pluripotent stem cells. In some embodiments, the nucleic acid sequences for one or more of Oct-4, Sox2, c-MYC, and Klf4 are delivered in a viral vector. In some embodiments, the vector is an adenoviral vector, a lentiviral vector, or a retroviral vector. In one embodiment of this aspect, the kit includes packaging materials and instructions therein to use said kits.

[0189] In one embodiment of this aspect, the kit comprises one or more catalytically active TET family enzymes, functional TET family derivatives, or TET catalytically active fragments thereof, to be contacted with a cell, or plurality of cells for the purposes of improving the efficiency of cloning mammals by nuclear transfer. In preferred embodiments, the kit includes packaging materials and instructions therein to use said kits.

[0190] In some embodiments, the kit also comprises reagents suitable for the detection of the activity of one or more catalytically active TET family enzymes, functional TET family derivatives, or TET catalytically active fragments thereof, namely the production of 5-hydroxymethylcytosine from 5-methylcytosine. In one preferred embodiment, the kit comprises an antibody, antigen-binding portion thereof, or protein that specifically binds to 5-hydroxymethylcytosine. In other embodiments, one or more catalytically active TET family enzymes, functional TET family derivatives, or TET catalytically active fragments thereof are provided in a kit to generate nucleic acids containing hydroxymethylcytosine from nucleic acids containing 5-methylcytosine or other oxidized pyrimidines from appropriate free or nucleic acid precursors. In all such embodiments of the aspect, the kit includes packaging materials and instructions therein to use said kits.

[0191] In some embodiments of this aspect, the kit also comprises, or consists essentially of, or consists of, reagents suitable for the detection and purification of methylcytosine for use in downstream applications. In one embodiment, the kit comprises, consists essentially of, or consists of, one or more catalytically active TET family enzymes, functional TET family derivatives, or TET catalytically active fragments thereof for the conversion of methylcytosine to 5-hydroxymethylcytosine; one or more enzymes encoded by bacteriophages of the "T even" family; one or more glucose or glucose derivative substrates; one or more proteins to detect glucose or glucose derivative modified nucleotides; and standard DNA purification columns, buffers, and substrate solutions, as known to one of skill in the art.

[0192] In some embodiments of this aspect, the enzymes encoded by bacteriophages of the "T even" family are selected from the group consisting of alpha-glucosyltransferases, beta-glucosyltransferases, and beta-glucosyl-alpha-glucosyl-transferases. In one embodiment, the alpha-glucosyltransferases are encoded by a bacteriophage selected from the group consisting of T2, T4, and T6 bacteriophages. In one embodiment, the beta-glucosyltransferase is encoded by a bacteriophage selected from T4 bacteriophages. In one embodiment, the beta-glucosyl-alpha-glucosyl-transferase is encoded by a bacteriophage selected from the group consisting of T2 and T6 bacteriophages.

[0193] In some embodiments, the glucose and glucose derivative donor substrates are radiolabeled. In one such embodiment, the radiolabeled glucose and glucose derivative donor substrate is uridine diphosphate glucose (UDPG). In one such embodiment, the UDPG is radiolabeled with 14C. In one embodiment, the UDPG is radiolabeled with 3H.

[0194] In some embodiments, the proteins that recognize glucose or glucose derivative modified nucleotides are selected from a group comprising a lectin, an antibody or antigen-binding fragment thereof, or an enzyme. In some embodiments, the proteins recognize only the glucose residue. In some embodiments, the proteins recognize the residue in the context of hydroxymethyl cytosine. In one embodiment, the antibody or antibody fragment thereof is modified with several tags. In one embodiment, the tags are biotin molecules or beads. In one embodiment, the antibody is modified with gold or fluorescent tags. In one embodiment, the enzyme is a hexokinase or a beta-glucosyl-alpha-glucosyl-transferase. In one embodiment, the lectin is Musa acuminata lectin (BanLec). In one embodiment, the lectin is modified with gold or fluorescent tags.

[0195] In all such embodiments of the aspect, the kit includes the necessary packaging materials and informational material therein to use said kits. The informational material can be descriptive, instructional, marketing or other material that relates to the methods described herein and/or the use of a compound(s) described herein for the methods described herein. In one embodiment, the informational material can include information about production of the compound, molecular weight of the compound, concentration, date of expiration, batch or production site information, and so forth. In one embodiment, the informational material relates to methods for culturing the compound. In one embodiment, the informational material can include instructions to culture a compound(s) (e.g., a TET family enzyme) described herein in a suitable manner to perform the methods described herein, e.g., in a suitable dose, dosage form, or mode of administration (e.g., a dose, dosage form, or mode of administration described herein) (e.g., to a cell in vitro or a cell in vivo). In another embodiment, the informational material can include instructions to administer a compound(s) described herein to a suitable subject, e.g., a human, e.g., a human having or at risk for a disorder described herein or to a cell in vitro.

[0196] The informational material of the kits is not limited in its form. In many cases, the informational material, e.g., instructions, is provided in printed matter, e.g., a printed text, drawing, and/or photograph, e.g., a label or printed sheet. However, the informational material can also be provided in other formats, such as Braille, computer readable material, video recording, or audio recording. In another embodiment, the informational material of the kit is contact information, e.g., a physical address, email address, website, or telephone number, where a user of the kit can obtain substantive information about a compound described herein and/or its use in the methods described herein. Of course, the informational material can also be provided in any combination of formats.

[0197] In all embodiments of the aspects described herein, the kit will typically be provided with its various elements included in one package, e.g., a fiber-based, e.g., a cardboard, or polymeric, e.g., a styrofoam box. The enclosure can be configured so as to maintain a temperature differential between the interior and the exterior, e.g., it can provide insulating properties to keep the reagents at a preselected temperature for a preselected time. The kit can include one or more containers for the composition containing a compound(s) described herein. In some embodiments, the kit contains separate containers (e.g., two separate containers for the two agents), dividers or compartments for the composition(s) and informational material. For example, the composition can be contained in a bottle, vial, or syringe, and the informational material can be contained in a plastic sleeve or packet. In other embodiments, the separate elements of the kit are contained within a single, undivided container. For example, the composition is contained in a bottle, vial or syringe that has attached thereto the informational material in the form of a label. In some embodiments, the kit includes a plurality (e.g., a pack) of individual containers, each containing one or more unit dosage forms (e.g., a dosage form described herein) of a compound described herein. For example, the kit includes a plurality of syringes, ampules, foil packets, or blister packs, each containing a single unit dose of a compound described herein. The containers of the kits can be air tight, waterproof (e.g., impermeable to changes in moisture or evaporation), and/or light-tight. The kit optionally includes a device suitable for administration of the composition, e.g., a syringe, inhalant, pipette, forceps, measured spoon, dropper (e.g., eye dropper), swab (e.g., a cotton swab or wooden swab), or any such delivery device. In a preferred embodiment, the device is a medical implant device, e.g., packaged for surgical insertion.

Methods of Improving Stem Cell Therapies Using TET Family Proteins

[0198] Stem cell bioengineering is an emerging technology that holds great promise for the therapeutic treatment of a wide range of disorders. A fundamental problem in the field relates to understanding mechanisms whereby stem cell differentiation and lineage commitment can be controlled in vitro so that the bioengineered stem cells may be used in vivo. A method that could easily be adapted to generate a wide range of stem cell types would allow a multitude of therapeutic applications to be developed. Human embryonic stem cell research and consequent therapeutic applications could provide treatments for a variety of conditions and disorders, including Alzheimer's disease, spinal cord injuries, amyotrophic lateral sclerosis, Parkinson's disease, type-1 diabetes, and cardiovascular diseases. Stem cells that could be readily differentiated into desired cell types could also be useful for a number of tissue engineering applications such as the production of complete organs, including livers, kidneys, eyes, hearts, or even parts of the brain. In addition, the ability to control stem cell proliferation and differentiation has applicability in developing targeted drug treatments.

[0199] The present invention relates, in part, to novel methods and compositions that enhance stem cell therapies. One aspect of the present invention includes compositions and methods of inducing stem cells to differentiate into a desired cell type by contacting a stem cell or a plurality of stem cells, with, or delivering to a stem cell or a plurality of stem cells, one or more catalytically active TET family enzymes, one or more functional TET family derivatives, or one or more TET catalytically active fragments thereof, or engineered nucleic acids encoding one or more of such catalytically active TET family enzymes, functional TET family derivatives, or TET catalytically active fragments thereof, to increase pluripotency of said cell being contacted or delivered to.

[0200] As defined herein, "stem cells" are primitive undifferentiated cells having the capacity to differentiate and mature into other cell types, for example, brain, muscle, liver and blood cells. Stem cells are typically classified as either embryonic stem cells, or adult tissue derived-stem cells, depending on the source of the tissue from which they are derived. "Pluripotent stem cells", as defined herein, are undifferentiated cells having the potential to differentiate to derivatives of all three embryonic germ layers (endoderm, mesoderm, and ectoderm). Adult progenitor cells are adult stem cells which can give rise to a limited number of particular types of cells, such as hematopoetic progenitor cells. Stem cells for use with the present invention may be obtained from any source. By way of example, pluripotent stem cells can be isolated from the primordial germinal ridge of the developing embryo, from teratocarcinomas, and from non-embryonic tissues, including but not limited to the bone marrow, brain, liver, pancreas, peripheral blood, fat tissue, placenta, skeletal muscle, chorionic villus, and umbilical cord blood. The methods and compositions of the present invention may be used with and include embryonic stem cells. Embryonic stem cells are typically derived from the inner cell mass of blastocyst-stage embryos (Odorico et al. 2001, Stem Cells 19:193-204; Thomson et al. 1995. Proc Natl Acad Sci USA. 92:7844-7848.; Thomson et al. 1998. Science 282:1145-1147). The distinguishing characteristics of stem cells are (i) their ability to be cultured in their non-differentiated state and (ii) their capacity to give rise to differentiated daughter cells representing all three germ layers of the embryo and the extra-embryonic cells that support development. Embryonic stem cells have been isolated from other sites in the embryo. Embryonic stem cells may be induced to undergo lineage specific differentiation in response to soluble factors.

[0201] According to certain embodiments, the stem cells are of human origin. According to one embodiment, the stem cells are selected from embryonic stem cells and adult stem cells. The adult stem cell can be a pluripotent cell or a partially committed progenitor cell.

[0202] According to certain embodiments, the composition comprises genetically modified stem cells. Typically, the cells are transformed with a suitable vector comprising a nucleic acid sequence for effecting the desired genetic alteration, as is known to a person skilled in the art.

[0203] According to certain embodiments, the stem cells may be partially committed progenitors isolated from several tissue sources. In some embodiments, the partially committed progenitors are hematopoietic cells, neural progenitor cells, oligodendrocyte cells, skin cells, hepatic cells, muscle cells, bone cells, mesenchymal cells, pancreatic cells, chondrocytes or marrow stromal cells.

[0204] Such stem cells, upon contact with, or delivery of, one or more catalytically active TET family enzymes, functional TET family derivatives, or TET catalytically active fragments thereof, can then be utilized for stem cell therapy treatments, wherein said contacted cell can undergo further manipulations to differentiate into a desired cell type for use in treatment of a disorder requiring cell or tissue replacement.

[0205] The differentiated stem cells of the present invention may be used as any other differentiated stem cell. By way of a non-limiting example, differentiated stem cells of the present invention can be used for tissue reconstitution or regeneration in a human patient in need thereof. The differentiated stem cells are administered in a manner that permits them to graft to the intended tissue site and reconstitute or regenerate the functionally deficient area. One method of administration is delivery through the peripheral blood vessel of the subject, given that stem cells are preferentially attracted to damaged areas. Another form of administration is by selective catheterization at or around the site of damage, which can lead to almost complete delivery of the stem cells into a damaged area.

Methods of Diagnosing and Treating Cancer

[0206] The present invention also provides, in part, improved methods for the diagnosis and treatment of cancer by the administration of compositions modulating catalytically active TET family enzymes, functional TET family derivatives, or TET catalytically active fragments thereof. Also encompassed in the methods of the present invention are methods for screening for the identification of TET family modulators. Such methods can be used to modify or determine, for example, treatments to be administered to an individual having or being predisposed to cancer.

[0207] Deregulation of gene expression is a hallmark of cancer. Although genetic lesions have been the focus of cancer research for many years, it has become increasingly recognized that aberrant epigenetic modifications also play major roles in the tumorigenic process. These modifications are imposed on chromatin, do not change the nucleotide sequence of DNA, and are manifested by specific patterns of gene expression that are heritable through many cell divisions. When a general role for DNA methylation in gene silencing was established more than 25 years ago, it was proposed that aberrant patterns of DNA methylation might play a role in tumorigenesis. Initial studies found evidence for a decrease in the total 5-methylcytosine content in tumor cells, and the occurrence of global hypomethylation in cancer was firmly established in subsequent studies. Hypomethylation occurs primarily at DNA repetitive elements and is believed to contribute to the genomic instability frequently seen in cancer. Hypomethylation can also contribute to overexpression of oncogenic proteins, as was shown to be associated with loss of imprinting of IGF2 (insulin growth factor 2), leading to aberrant activation of the normally silent maternally inherited allele. This was found to be associated with an increased risk for colon cancer. The mechanisms underlying global hypomethylation patterns are the focus of intensive research (E. N. Gal-Yam, Annu Rev Med 59: 267-280 (2008)).

[0208] Aberrant hypermethylation at normally unmethylated CpG islands occurs parallel to global hypomethylation. The CpG island promoter of the Rb (Retinoblastoma) gene, found to be hypermethylated in retinoblastoma, was the first tumor suppressor shown to harbor such a modification. This discovery was soon followed by studies showing promoter hypermethylation and silencing of other tumor suppressor genes, including, but not limited to VHL (von Hippel-Lindau) in renal cancer, the cell cycle regulator CDKN2 A/p16 in bladder cancer, and the mismatch repair gene hMLH1 in colon cancer. It is now established that aberrant hypermethylation at CpG island promoters is a hallmark of cancer. Notably, not only protein-coding genes undergo these modifications; CpG island promoters of noncoding microRNAs were shown to be hypermethylated in tumors, possibly contributing to their proposed roles in carcinogenesis (Id.).

[0209] The origin for the dysregulated methylation patterns in cancer are an active area of research. Initially it was suggested that like genetic mutations, de novo hypermethylation events are stochastically generated, and that the final patterns observed are a result of growth advantage and selection. However, several observations made in recent years should be noted: First, hypermethylation events are already apparent at precancerous stages, such as in benign tumors and in tumor-predisposing inflammatory lesions. Second, there seem to be defined sets of hypermethylated genes in certain tumors. These differential methylation signatures, or "methylomes," may even differentiate between tumors of the same type, as was recently shown for the CpG island methylator phenotype (CIMP) in colon cancer. Third, although many hypermethylated genes have tumor-suppressing functions, not all are involved in cell growth or tumorigenesis (Id.).

[0210] One object of the present invention relates to methods for treating an individual with, or at risk for, cancer by using an agent that modulates the hydroxylase activity of the catalytically active TET family enzymes, functional TET family derivatives, or TET catalytically active fragments.

[0211] Accordingly, in one aspect the invention provides a method for treating an individual with or at risk for cancer using an effective amount of one or more modulators of the activity of the TET family of proteins. In one embodiment of the aspect, the method includes selecting a treatment for a patient affected by or at risk for developing cancer by determining the presence or absence of hypermethylated CpG island promoters of tumor suppressor genes, wherein if hypermethylation of tumor suppressor genes is detected, one administers to the individual an effective amount of a tumor suppressor activity reactivating catalytically active TET family enzyme, a functional TET family derivative, a TET catalytically active fragment therein, an activating modulator of TET family activity, or any combination thereof.

[0212] In one embodiment, the treatment involves the administration of a TET family inhibiting modulator. In particular, the TET family inhibiting modulator is specific to TET1, TET2, TET3, or CXXC4. In one embodiment of the aspect, the cancer being treated is a leukemia. In one embodiment, the leukemia is acute myeloid leukemia caused by the t(10:11)(q22:q23) Mixed Lineage Leukemia translocation of TET1. In one embodiment, the TET family inhibiting modulator is specific to TET2.

[0213] The present invention also provides, in another aspect, improved methods for the diagnosis of disease conditions by creating methylome or hydroxymethylome signatures for stratifying subjects at risk for a disease condition, and for directing therapy and monitoring the response to the therapy in subjects. In some embodiments of the aspect, methods to detect methylcytosine and 5-hydroxymethylcytosine in DNA from a subject diagnosed with or at risk for a disease condition are provided, wherein enzymes encoded by bacteriophages of the "T even" family are contacted with the DNA and the global level of methylation and hydroxymethylation determined. In one embodiment, the DNA is obtained from a diseased tissue sample of the subject. In one embodiment, the enzyme provided is an alpha-glucosyltransferase. In one embodiment, the alpha-glucosyltransferase provided is encoded by a bacteriophage selected from the group consisting of T2, T4, and T6 bacteriophages. In one embodiment, the enzyme is a beta-glucosyltransferase. In one embodiment, the beta-glucosyltransferase is encoded by a bacteriophage selected from T4 bacteriophages. In some embodiments, enzymes encoded by bacteriophages of the "T even" family add two glucose molecules linked in a beta-1-6 configuration to hydroxymethylcytosine to form gentibiose-containing-hydroxymethylcytosine. In one embodiment, the enzyme is a beta-glucosyl-alpha-glucosyl-transferase. In one embodiment, the beta-glucosyl-alpha-glucosyl-transferase is encoded by a bacteriophage selected from the group consisting of T2 and T6 bacteriophages. In one embodiment, the disease condition is a myeloproliferative disorder, myelodysplatic disorders, acute myelogenous leukemia, or other malignant and pre-malignant conditions.

[0214] In some embodiments of the aspect, methods to detect global levels of methylcytosine and 5-hydroxymethylcytosine in DNA from a subject with familial predisposition for a disease condition are provided, wherein enzymes encoded by bacteriophages of the "T even" family are contacted with the DNA. In one embodiment, the enzyme provided is an alpha-glucosyltransferase. In one embodiment, the alpha-glucosyltransferase provided is encoded by a bacteriophage selected from the group consisting of T2, T4, and T6 bacteriophages. In one embodiment, the enzyme is a beta-glucosyltransferase. In one embodiment, the beta-glucosyltransferase is encoded by a bacteriophage selected from T4 bacteriophages. In some embodiments, enzymes encoded by bacteriophages of the "T even" family add two glucose molecules linked in a beta-1-6 configuration to hydroxymethylcytosine to form gentibiose-containing-hydroxymethylcytosine. In one embodiment, the enzyme is a beta-glucosyl-alpha-glucosyl-transferase. In one embodiment, the beta-glucosyl-alpha-glucosyl-transferase is encoded by a bacteriophage selected from the group consisting of T2 and T6 bacteriophages. In one embodiment, the disease condition is a myeloproliferative disorder, myelodysplatic disorders, acute myelogenous leukemia, or other malignant and pre-malignant conditions. In one embodiment, the DNA is isolated from the CD34+ hematopoietic cells of a family member of a subject with a disease condition, to determine if there is a familial predisposition.

[0215] Also encompassed in the methods of the present invention are methods for screening for and identifying drugs that cause alterations in the methylcytosine and 5-hydroxymethylcytosine residues in genomic DNA using the compositions and methods described herein.

[0216] As defined herein, the phrase "genetic predisposition" refers to the genetic makeup of a subject or cell, that makes or predetermines the subject's or cells' likelihood of being susceptible to a particular disease, disorder or malignancy, or likelihood of responding to a treatment for a disease disorder or malignancy. Accordingly, as defined herein, an individual having a "familial predisposition" refers to the subject or individual having one or more family members that have had, have, or have an increased likelihood of developing, a particular disease, disorder or malignancy, such as, cancer. The familial predisposition may be due to one or more underlying genetic mutations, or can be caused by shared environmental risk factors in the family members, or be a combination thereof.

[0217] As defined herein, a "cancer", "malignancy", or "malignant condition" refers to the presence of cells possessing characteristics typical of cancer-causing cells, such as uncontrolled proliferation, immortality, metastatic potential, rapid growth and proliferation rate, and certain characteristic morphological features. Often, cancer cells will be in the form of a tumor, but such cells may exist alone within a patient, or may be a non-tumorigenic cancer cell, such as a leukemia cell. In some circumstances, cancer cells will be in the form of a tumor; such cells may exist locally, or circulate in the blood stream as independent cells, for example, leukemic cells. Examples of cancers, wherein methylation status plays a role, include, but are not limited to, breast cancer, a melanoma, adrenal gland cancer, biliary tract cancer, bladder cancer, brain or central nervous system cancer, bronchus cancer, blastoma, carcinoma, a chondrosarcoma, cancer of the oral cavity or pharynx, cervical cancer, colon cancer, colorectal cancer, esophageal cancer, gastrointestinal cancer, glioblastoma, hepatic carcinoma, hepatoma, kidney cancer, leukemia, liver cancer, lung cancer, lymphoma, non-small cell lung cancer, osteosarcoma, ovarian cancer, pancreas cancer, peripheral nervous system cancer, prostate cancer, sarcoma, salivary gland cancer, small bowel or appendix cancer, small-cell lung cancer, squamous cell cancer, stomach cancer, testis cancer, thyroid cancer, urinary bladder cancer, uterine or endometrial cancer, and vulval cancer.

[0218] "Leukemia" is a cancer of the blood or bone marrow and is characterized by an abnormal proliferation of white blood cells i.e., leukocytes. There are four major classifications of leukemia comprising of Acute lymphoblastic leukemia (ALL), Chronic lymphocytic leukemia (CLL), Acute myelogenous leukemia or acute myeloid leukemia (AML), and Chronic myelogenous leukemia (CML).

[0219] "Acute myeloid leukemia" (AML), also known as acute myelogenous leukemia, is a cancer of the myeloid line of white blood cells, characterized by the rapid proliferation of abnormal myeloid cells that accumulate in the bone marrow and interfere with the production of normal blood cells. AML is the most common acute leukemia affecting adults, and its incidence increases with age. The World Health Organization (WHO) classification of subtypes of acute myeloid leukemia comprises of: a) AML with characteristic genetic abnormalities, including, but not limited to AML with translocations between chromosome 10 and 11 [t(10, 11)], chromosome 8 and 21 [t(8;21)], chromosome 15 and 17 [t(15;17)], and inversions in chromosome 16 [inv(16)]; b) AML with multilineage dysplasia, which includes patients who have had a prior myelodysplastic syndrome (MDS) or myeloproliferative disease that transforms into AML; c) AML and myelodysplastic syndrome (MDS), therapy-related, which category includes patients who have had prior chemotherapy and/or radiation and subsequently develop AML or MDS. These leukemias may also be characterized by specific chromosomal abnormalities; d) AML not otherwise categorized, which includes subtypes of AML that do not fall into the above categories; and e) Acute leukemias of ambiguous lineage, which occur when the leukemic cells can not be classified as either myeloid or lymphoid cells, or where both types of cells are present. Acute myeloid leukemias can further be classified or diagnosed as: minimally differentiated acute myeloblastic leukemia (M0), acute myeloblastic leukemia, without maturation (M1), acute myeloblastic leukemia, with granulocytic maturation (M2) (caused by t(8;21)(q22;q22), t(6;9)), promyelocytic, or acute promyelocytic leukemia (APL) (M3), (caused by t(15;17)), acute myelomonocytic leukemia (M4), (caused by inv(16)(p13q22), del(16q)), myelomonocytic together with bone marrow eosinophilia (M4eo), (caused by inv(16), t(16;16)), acute monoblastic leukemia (M5a) or acute monocytic leukemia (M5b) (caused by del (11q), t(9;11), t(11;19)), acute erythroid leukemias, including erythroleukemia (M6a) and very rare pure erythroid leukemia (M6b), acute megakaryoblastic leukemia (M7), (caused by t(1;22)), and acute basophilic leukemia (M8).

[0220] In connection with the administration of a TET family modulator, a drug which is "effective against" a cancer indicates that administration in a clinically appropriate manner results in a beneficial effect for at least a statistically significant fraction of patients, such as a improvement of symptoms, a cure, a reduction in disease load, reduction in tumor mass or cell numbers, extension of life, improvement in quality of life, or other effect generally recognized as positive by medical doctors familiar with treating the particular type of disease or condition.

[0221] In connection with determining or modifying a treatment to be administered to an individual having a cancer, or having familial predisposition to a cancer, such as a leukemia, the treatment can include, for example, imatinib (Gleevac), all-trans-retinoic acid, a monoclonal antibody treatment (gemtuzumab ozogamicin), chemotherapy (for example, chlorambucil, prednisone, prednisolone, vincristine, cytarabine, clofarabine, farnesyl transferase inhibitors, decitabine, inhibitors of MDR1, rituximab, interferon-.alpha., anthracycline drugs (such as daunorubicin or idarubicin), L-asparaginase, doxorubicin, cyclophosphamide, doxorubicin, bleomycin, fludarabine, etoposide, pentostatin, or cladribine), bone marrow transplant, stem cell transplant, radiation therapy, anti-metabolite drugs (methotrexate and 6-mercaptopurine), or any combination thereof. The modification of the treatment based upon, for example, determination of the hydroxymethylation status of a cell, or TET family activity, includes, but is not limited, to changing the dosage, frequency, duration, or type of treatment(s) being administered to a patient in need thereof.

[0222] A "TET family modulator" is a molecule that acts to either increase or reduce the production and/or accumulation of TET family gene product activity in a cell. The molecule can thus either enhance or prevent the accumulation at any step of the pathway leading from the TET family gene to TET family enzymatic activity, e.g. transcription, mRNA levels, translation, or the enzyme itself. As used interchangeably herein, an "inhibitor", "inhibiting modulator" or "inhibitory modulator" of the TET family is a molecule that acts to reduce the production and/or accumulation of TET family gene product activity in a cell. The inhibitor, inhibiting modulator or inhibitory modulator molecule can thus prevent the accumulation at any step of the pathway leading from the TET family gene to the TET family enzymatic activity e.g. preventing transcription, reducing mRNA levels, preventing translation, or inhibiting the enzyme itself. Similarly, as used interchangeably herein, an "activator" or "activating modulator" of the TET family is a molecule that acts to increase the production and/or accumulation of TET family gene product activity in a cell. The TET family activator or activating modulator molecule can thus enhance the accumulation at any step of the pathway leading from the TET family gene to TET family enzymatic activity e g enhancing transcription, increasing mRNA levels, enhancing translation, or activating the enzyme itself.

[0223] In one embodiment of the present aspect, the TET family targeting treatment is a TET family inhibitor. In a preferred embodiment, the TET targeting treatment is specific for the inhibition of TET1, TET2, TET3, or CXXC4. For example, a small molecule inhibitor, a competitive inhibitor, an antibody or antigen-binding fragment thereof, or a nucleic acid that inhibits TET1, TET2, TET3, or CXXC4, as encompassed under "Definitions".

[0224] In one embodiment of the present aspect, the TET family targeting treatment is a TET family activator. Alternatively and preferably, the TET targeting treatment is specific for the activation of TET1, TET2, TET3, or CXXC4. For example, a small molecule activator, an agonist, an antibody or antigen-binding fragment thereof, or a nucleic acid that activates TET1, TET2, TET3, or CXXC4, as defined under "Definitions".

[0225] Also encompassed in the methods of the present aspect are methods to screen for the identification of a TET family modulator for use in anti-cancer therapies. The method comprises a) providing a cell comprising a TET family enzyme or recombinant TET family enzyme thereof; b) contacting said cell with a test molecule; c) comparing the relative levels of 5-hydroxymethylated cytosine in cells expressing the TET family enzyme or recombinant TET family enzyme thereof in the presence of the test molecule with the level of 5-hydroxymethylated cytosine expressed in a control sample in the absence of the test molecule; and d) determining whether or not the test molecule increases or decreases the level of 5-hydroxymethylated cytosine, wherein a statistically significant decrease in the level of 5-hydroxymethylated cytosine indicates the molecule is an inhibitor and a statistically significant increase in the level of 5-hydroxymethylated cytosine indicates the molecule is an activator.

[0226] In another embodiment of the aspect, a method for high-throughput screening for anti-cancer agents is provided. The method comprises screening for and identifying TET family modulators. For example, providing a combinatorial library containing a large number of potential therapeutic compounds (potential modulator compounds). Such "combinatorial chemical libraries" are then screened in one or more assays to identify those library members (particular chemical species or subclasses) that display a desired characteristic activity (e.g., inhibition of TET family mediated 5-methylcytosine to 5-hydroxymethylcytosine conversion or activation of TET family mediated 5-methylcytosine to 5-hydroxymethylcytosine conversion). The compounds thus identified can serve as conventional "lead compounds" or "candidate therapeutic agents," and can be derivatized for further testing to identify additional TET family modulators.

[0227] Once identified, such compounds are administered to patients in need of TET family targeted treatment, for example, patients affected with, or at risk for, developing cancer or cancer metastasis. The route of administration may be intravenous (I.V.), intramuscular (I.M.), subcutaneous (S.C.), intradermal (I.D.), intraperitoneal (I.P.), intrathecal (I.T.), intrapleural, intrauterine, rectal, vaginal, topical, intratumor and the like. The compounds of the invention can be administered parenterally by injection or by gradual infusion over time and can be delivered by peristaltic means. Administration may be by transmucosal or transdermal means. For transmucosal or transdermal administration, penetrants appropriate to the barrier to be permeated are used in the formulation. Such penetrants are generally known in the art, and include, for example, for transmucosal administration bile salts and fusidic acid derivatives. In addition, detergents may be used to facilitate permeation. Transmucosal administration may be through nasal sprays, for example, or using suppositories. For oral administration, the compounds of the invention are formulated into conventional oral administration forms such as capsules, tablets and tonics. For topical administration, the pharmaceutical composition (e.g., inhibitor of TET family activity) is formulated into ointments, salves, gels, or creams, as is generally known in the art. The therapeutic compositions of this invention are conventionally administered intravenously, as by injection of a unit dose, for example. The term "unit dose" when used in reference to a therapeutic composition of the present invention refers to physically discrete units suitable as unitary dosage for the subject, each unit containing a predetermined quantity of active material calculated to produce the desired therapeutic effect in association with the required diluent; i.e., carrier, or vehicle. The compositions are administered in a manner compatible with the dosage formulation, and in a therapeutically effective amount. The quantity to be administered and timing depends on the subject to be treated, capacity of the subject's system to utilize the active ingredient, and degree of therapeutic effect desired.

[0228] Any formulation or drug delivery system containing the active ingredients required for TET family modulation, suitable for the intended use, as are generally known to those of skill in the art, can be used. Suitable pharmaceutically acceptable carriers for oral, rectal, topical or parenteral (including inhaled, subcutaneous, intraperitoneal, intramuscular and intravenous) administration are known to those of skill in the art. The carrier must be pharmaceutically acceptable in the sense of being compatible with the other ingredients of the formulation and not deleterious to the recipient thereof. As used herein, the terms "pharmaceutically acceptable", "physiologically tolerable" and grammatical variations thereof, as they refer to compositions, carriers, diluents and reagents, are used interchangeably and represent that the materials are capable of administration to or upon a mammal without the production of undesirable physiological effects.

Definitions

[0229] As used herein, the term "drug" or "compound" refers to a chemical entity or biological product, or combination of chemical entities or biological products, administered to a person to treat or prevent or control a disease or condition. The chemical entity or biological product is preferably, but not necessarily a low molecular weight compound, but may also be a larger compound, for example, an oligomer of nucleic acids, amino acids, or carbohydrates including, without limitation, proteins, oligonucleotides, ribozymes, DNAzymes, glycoproteins, siRNAs, lipoproteins, aptamers, and modifications and combinations thereof.

[0230] The terms "effective" and "effectiveness", as used herein, includes both pharmacological effectiveness and physiological safety. Pharmacological effectiveness refers to the ability of the treatment to result in a desired biological effect in the patient. Physiological safety refers to the level of toxicity, or other adverse physiological effects at the cellular, organ and/or organism level (often referred to as side-effects) resulting from administration of the treatment. "Less effective" means that the treatment results in a therapeutically significant lower level of pharmacological effectiveness and/or a therapeutically greater level of adverse physiological effects.

[0231] As used herein, the phrase "therapeutically effective amount" or "effective amount" are used interchangeably and refer to the amount of an agent that is effective, at dosages and for periods of time necessary to achieve the desired therapeutic result, e.g., for an increase in hydroxymethylation for a TET family activator, or a decrease or prevention of hydroxymethylation for a TET family inhibitor. An effective amount for treating such a disease related to defects in methylation is an amount sufficient to result in a reduction or amelioration of the symptoms of the disorder, disease, or medical condition. By way of example only, an effective amount of a TET family inhibitor for treatment of a disease characterized by an increase in hydroxymethylation will cause a decrease in hydroxymethylation. An effective amount for treating such an hydroxymethylation-related disease (i.e. one characterized by an increase in hydroxymethylation) is an amount sufficient to result in a reduction or amelioration of the symptoms of the disorder, disease, or medical condition. The effective amount of a given therapeutic agent (i.e. TET family inhibitor or TET family activator,) will vary with factors such as the nature of the agent, the route of administration, the size and species of the animal, such as a human, to receive the therapeutic agent, and the purpose of the administration.

[0232] A therapeutically effective amount of the agents, factors, or inhibitors described herein, or functional derivatives thereof, can vary according to factors such as disease state, age, sex, and weight of the subject, and the ability of the therapeutic compound to elicit a desired response in the individual or subject. A therapeutically effective amount is also one in which any toxic or detrimental effects of the therapeutic agent are outweighed by the therapeutically beneficial effects. The effective amount in each individual case can be determined empirically by a skilled artisan according to established methods in the art and without undue experimentation. Efficacy of treatment can be judged by an ordinarily skilled practitioner. Efficacy can be assessed in animal models of cancer and tumor, for example treatment of a rodent with an experimental cancer, and any treatment or administration of an TET family inhibitor in a composition or formulation that leads to a decrease of at least one symptom of the cancer, for example a reduction in the size of the tumor.

[0233] As used herein, the phrase "pharmaceutically acceptable", and grammatical variations thereof, as they refer to compositions, carriers, diluents and reagents, are used interchangeably and represent that the materials are capable of administration to or upon a mammal without the production of undesirable physiological effects such as nausea, dizziness, gastric upset and the like. Each carrier must also be "acceptable" in the sense of being compatible with the other ingredients of the formulation. A pharmaceutically acceptable carrier typically will not promote the raising of an immune response to an agent with which it is admixed, unless so desired. The preparation of a pharmacological composition that contains active ingredients dissolved or dispersed therein is well understood in the art and need not be limited based on formulation. The pharmaceutical formulation contains a compound of the invention in combination with one or more pharmaceutically acceptable ingredients. The carrier can be in the form of a solid, semi-solid or liquid diluent, cream or a capsule. Typically such compositions are prepared as injectable either as liquid solutions or suspensions, however, solid forms suitable for solution, or suspensions, in liquid prior to use can also be prepared. The preparation can also be emulsified or presented as a liposome composition. The active ingredient can be mixed with excipients which are pharmaceutically acceptable and compatible with the active ingredient and in amounts suitable for use in the therapeutic methods described herein. Suitable excipients are, for example, water, saline, dextrose, glycerol, ethanol or the like and combinations thereof. In addition, if desired, the composition can contain minor amounts of auxiliary substances such as wetting or emulsifying agents, pH buffering agents and the like which enhance the effectiveness of the active ingredient. The therapeutic composition of the present invention can include pharmaceutically acceptable salts of the components therein. Pharmaceutically acceptable salts include the acid addition salts (formed with the free amino groups of the polypeptide) that are formed with inorganic acids such as, for example, hydrochloric or phosphoric acids, or such organic acids as acetic, tartaric, mandelic and the like. Salts formed with the free carboxyl groups can also be derived from inorganic bases such as, for example, sodium, potassium, ammonium, calcium or ferric hydroxides, and such organic bases as isopropylamine, trimethylamine, 2-ethylamino ethanol, histidine, procaine and the like. Physiologically tolerable carriers are well known in the art. Exemplary liquid carriers are sterile aqueous solutions that contain no materials in addition to the active ingredients and water, or contain a buffer such as sodium phosphate at physiological pH value, physiological saline or both, such as phosphate-buffered saline. Still further, aqueous carriers can contain more than one buffer salt, as well as salts such as sodium and potassium chlorides, dextrose, polyethylene glycol and other solutes. Liquid compositions can also contain liquid phases in addition to and to the exclusion of water. Exemplary of such additional liquid phases are glycerin, vegetable oils such as cottonseed oil, and water-oil emulsions. The amount of an active agent used in the invention that will be effective in the treatment of a particular disorder or condition will depend on the nature of the disorder or condition, and can be determined by standard clinical techniques. The phrase "pharmaceutically acceptable carrier or diluent" means a pharmaceutically acceptable material, composition or vehicle, such as a liquid or solid filler, diluent, excipient, solvent or encapsulating material, involved in carrying or transporting the subject agents from one organ, or portion of the body, to another organ, or portion of the body.

[0234] The terms "subject" and "individual" are used interchangeably herein, and refer to an animal, for example, a human from whom cells can be obtained (i.e. differentiated cells can be obtained which are reprogrammed) and/or to whom treatment, including prophylactic treatment, with the reprogrammed cells (or their differentiated progeny) as described herein, is provided. For treatment of conditions or disease states which are specific for a specific animal such as a human subject, the term subject refers to that specific animal. The term "mammal" is intended to encompass a singular "mammal" and plural "mammals," and includes, but is not limited to humans; primates such as apes, monkeys, orangutans, and chimpanzees; canids such as dogs and wolves; felids such as cats, lions, and tigers; equids such as horses, donkeys, and zebras; food animals such as cows, pigs, and sheep; ungulates such as deer and giraffes; rodents such as mice, rats, hamsters and guinea pigs; and bears. In some preferred embodiments, a mammal is a human. The "non-human animals" and "non-human mammals" as used interchangeably herein, includes mammals such as rats, mice, rabbits, sheep, cats, dogs, cows, pigs, and non-human primates. The term "subject" also encompasses any vertebrate including but not limited to mammals, reptiles, amphibians and fish. However, advantageously, the subject is a mammal such as a human, or other mammals such as a domesticated mammal, e.g. dog, cat, horse, and the like, or production mammal, e.g. cow, sheep, pig, and the like are also encompassed in the term subject.

[0235] As used herein the terms "sample" or "biological sample" means any sample, including but not limited to cells, organisms, lysed cells, cellular extracts, nuclear extracts, or components of cells or organisms, extracellular fluid, and media in which cells are cultured.

[0236] The term "in vitro" as used herein refers to refers to the technique of performing a given procedure in a controlled environment outside of a living organism. The term "in vivo", as used herein refers to experimentation using a whole, living organism as opposed to a partial or dead organism, or in an in vitro controlled environment. "Ex vivo" as the term is used herein, means that which takes place outside an organism. The term ex vivo is often differentiated from the term in vitro in that the tissue or cells need not be in culture; these two terms are not necessarily synonymous.

[0237] The term "pluripotent" as used herein refers to a cell with the capacity, under different conditions, to differentiate to more than one differentiated cell type, and preferably to differentiate to cell types characteristic of all three germ cell layers. Pluripotent cells are characterized primarily by their ability to differentiate to more than one cell type, preferably to all three germ layers, using, for example, a nude mouse teratoma formation assay. Pluripotency is also evidenced by the expression of embryonic stem (ES) cell markers, although the preferred test for pluripotency is the demonstration of the capacity to differentiate into cells of each of the three germ layers. In some embodiments, a pluripotent cell is an undifferentiated cell.

[0238] The term "stem cell" as used herein, refers to an undifferentiated cell which is capable of proliferation and giving rise to more progenitor cells having the ability to generate a large number of mother cells that can in turn give rise to differentiated, or differentiable daughter cells. The daughter cells themselves can be induced to proliferate and produce progeny that subsequently differentiate into one or more mature cell types, while also retaining one or more cells with parental developmental potential. The term "stem cell" refers to a subset of progenitors that have the capacity or potential, under particular circumstances, to differentiate to a more specialized or differentiated phenotype, and which retains the capacity, under certain circumstances, to proliferate without substantially differentiating. In one embodiment, the term stem cell refers generally to a naturally occurring mother cell whose descendants (progeny) specialize, often in different directions, by differentiation, e.g., by acquiring completely individual characters, as occurs in progressive diversification of embryonic cells and tissues. Cellular differentiation is a complex process typically occurring through many cell divisions. A differentiated cell may derive from a multipotent cell which itself is derived from a multipotent cell, and so on. While each of these multipotent cells may be considered stem cells, the range of cell types each can give rise to may vary considerably. Some differentiated cells also have the capacity to give rise to cells of greater developmental potential. Such capacity may be natural or may be induced artificially upon treatment with various factors. In many biological instances, stem cells are also "multipotent" because they can produce progeny of more than one distinct cell type, but this is not required for "stem-ness." Self-renewal is the other classical part of the stem cell definition, and it is essential as used in this document. In theory, self-renewal can occur by either of two major mechanisms. Stem cells may divide asymmetrically, with one daughter retaining the stem state and the other daughter expressing some distinct other specific function and phenotype. Alternatively, some of the stem cells in a population can divide symmetrically into two stems, thus maintaining some stem cells in the population as a whole, while other cells in the population give rise to differentiated progeny only. Formally, it is possible that cells that begin as stem cells might proceed toward a differentiated phenotype, but then "reverse" and re-express the stem cell phenotype, a term often referred to as "dedifferentiation" or "reprogramming" or "retrodifferentiation" by persons of ordinary skill in the art. In the context of cell ontogeny, the adjective "differentiated", or "differentiating" is a relative term meaning a "differentiated cell" is a cell that has progressed further down the developmental pathway than the cell it is being compared with. Thus, a reprogrammed cell, as this term is defined herein can differentiate to lineage-restricted precursor cells (such as a mesodermal stem cell), which in turn can differentiate into other types of precursor cells further down the pathway (such as an tissue specific precursor, for example, a cardiomyocyte precursor), and then to an end-stage differentiated cell, which plays a characteristic role in a certain tissue type, and may or may not retain the capacity to proliferate further.

[0239] The term "embryonic stem cell" is used to refer to the pluripotent stem cells of the inner cell mass of the embryonic blastocyst (see U.S. Pat. Nos. 5,843,780, 6,200,806, which are incorporated herein by reference). Such cells can similarly be obtained from the inner cell mass of blastocysts derived from somatic cell nuclear transfer (see, for example, U.S. Pat. Nos. 5,945,577, 5,994,619, 6,235,970, which are incorporated herein by reference). The distinguishing characteristics of an embryonic stem cell define an embryonic stem cell phenotype. Accordingly, a cell has the phenotype of an embryonic stem cell if it possesses one or more of the unique characteristics of an embryonic stem cell such that that cell can be distinguished from other cells. Exemplary distinguishing embryonic stem cell characteristics include, without limitation, gene expression profile, proliferative capacity, differentiation capacity, karyotype, responsiveness to particular culture conditions, and the like. The term "adult stem cell" or "ASC" is used to refer to any multipotent stem cell derived from non-embryonic tissue, including fetal, juvenile, and adult tissue. Stem cells have been isolated from a wide variety of adult tissues including blood, bone marrow, brain, olfactory epithelium, skin, pancreas, skeletal muscle, and cardiac muscle. Each of these stem cells can be characterized based on gene expression, factor responsiveness, and morphology in culture. Exemplary adult stem cells include neural stem cells, neural crest stem cells, mesenchymal stem cells, hematopoietic stem cells, and pancreatic stem cells. As indicated above, stem cells have been found resident in virtually every tissue.

[0240] The term "progenitor cell" is used herein to refer to cells that have a cellular phenotype that is more primitive (i.e., is at an earlier step along a developmental pathway or progression than is a fully differentiated cell) relative to a cell which it can give rise to by differentiation. Typically, progenitor cells also have significant or very high proliferative potential. Progenitor cells can give rise to multiple distinct differentiated cell types or to a single differentiated cell type, depending on the developmental pathway and on the environment in which the cells develop and differentiate.

[0241] The term "differentiated cell" refers to a primary cell that is not pluripotent as that term is defined herein. It should be noted that placing many primary cells in culture can lead to some loss of fully differentiated characteristics. However, simply culturing such cells does not, on its own, render them pluripotent. The transition to pluripotency requires a reprogramming stimulus beyond the stimuli that lead to partial loss of differentiated character in culture. Reprogrammed pluripotent cells also have the characteristic of the capacity of extended passaging without loss of growth potential, relative to primary cell parents, which generally have capacity for only a limited number of divisions in culture. Stated another way, the term "differentiated cell" refers to a cell of a more specialized cell type derived from a cell of a less specialized cell type (e.g., a stem cell such as an induced pluripotent stem cell) in a cellular differentiation process.

[0242] As used herein, the term "somatic cell" refers to a cell forming the body of an organism, as opposed to germline cells. In mammals, germline cells (also known as "gametes") are the spermatozoa and ova which fuse during fertilization to produce a cell called a zygote, from which the entire mammalian embryo develops. Every other cell type in the mammalian body--apart from the sperm and ova, the cells from which they are made (gametocytes) and undifferentiated stem cells--is a somatic cell: internal organs, skin, bones, blood, and connective tissue are all made up of somatic cells. In some embodiments the somatic cell is a "non-embryonic somatic cell", by which is meant a somatic cell that is not present in or obtained from an embryo and does not result from proliferation of such a cell in vitro. In some embodiments the somatic cell is an "adult somatic cell", by which is meant a cell that is present in or obtained from an organism other than an embryo or a fetus or results from proliferation of such a cell in vitro. Unless otherwise indicated the methods for reprogramming a differentiated cell can be performed both in vivo and in vitro (where in vivo is practiced when an differentiated cell is present within a subject, and where in vitro is practiced using isolated differentiated cell maintained in culture). In some embodiments, where a differentiated cell or population of differentiated cells are cultured in vitro, the differentiated cell can be cultured in an organotypic slice culture, such as described in, e.g., meneghel-Rozzo et al., (2004), Cell Tissue Res, 316(3);295-303. As used herein, the term "adult cell" refers to a cell found throughout the body after embryonic development.

[0243] As used herein, the term "small molecule" refers to a chemical agent including, but not limited to, peptides, peptidomimetics, amino acids, amino acid analogs, polynucleotides, polynucleotide analogs, aptamers, nucleotides, nucleotide analogs, organic or inorganic compounds (i.e., including heteroorganic and organometallic compounds) having a molecular weight less than about 10,000 grams per mole, organic or inorganic compounds having a molecular weight less than about 5,000 grams per mole, organic or inorganic compounds having a molecular weight less than about 1,000 grams per mole, organic or inorganic compounds having a molecular weight less than about 500 grams per mole, and salts, esters, and other pharmaceutically acceptable forms of such compounds.

[0244] A "nucleic acid", as described herein, can be RNA or DNA, and can be single or double stranded, and can be, for example, a nucleic acid encoding a protein of interest, a polynucleotide, an oligonucleotide, a nucleic acid analogue, for example peptide-nucleic acid (PNA), pseudo-complementary PNA (pc-PNA), locked nucleic acid (LNA) etc. Such nucleic acid sequences include, for example, but are not limited to, nucleic acid sequence encoding proteins, for example that act as transcriptional repressors, antisense molecules, ribozymes, small inhibitory nucleic acid sequences, for example, but not limited to, RNAi, shRNAi, siRNA, micro RNAi (mRNAi), antisense oligonucleotides etc.

[0245] As used herein, the term "DNA" is defined as deoxyribonucleic acid. The term "polynucleotide" is used herein interchangeably with "nucleic acid" to indicate a polymer of nucleosides. Typically a polynucleotide of this invention is composed of nucleosides that are naturally found in DNA or RNA (e.g., adenosine, thymidine, guanosine, cytidine, uridine, deoxyadenosine, deoxythymidine, deoxyguanosine, and deoxycytidine) joined by phosphodiester bonds. However the term encompasses molecules comprising nucleosides or nucleoside analogs containing chemically or biologically modified bases, modified backbones, etc., whether or not found in naturally occurring nucleic acids, and such molecules may be preferred for certain applications. Where this application refers to a polynucleotide it is understood that both DNA, RNA, and in each case both single- and double-stranded forms (and complements of each single-stranded molecule) are provided. "Polynucleotide sequence" as used herein can refer to the polynucleotide material itself and/or to the sequence information (i.e. the succession of letters used as abbreviations for bases) that biochemically characterizes a specific nucleic acid. A polynucleotide sequence presented herein is presented in a 5' to 3' direction unless otherwise indicated.

[0246] The terms "polypeptide" as used herein refers to a polymer of amino acids. The terms "protein" and "polypeptide" are used interchangeably herein. A peptide is a relatively short polypeptide, typically between about 2 and 60 amino acids in length. Polypeptides used herein typically contain amino acids such as the 20 L-amino acids that are most commonly found in proteins. However, other amino acids and/or amino acid analogs known in the art can be used. One or more of the amino acids in a polypeptide may be modified, for example, by the addition of a chemical entity such as a carbohydrate group, a phosphate group, a fatty acid group, a linker for conjugation, functionalization, etc. A polypeptide that has a nonpolypeptide moiety covalently or noncovalently associated therewith is still considered a "polypeptide". Exemplary modifications include glycosylation and palmitoylation. Polypeptides may be purified from natural sources, produced using recombinant DNA technology, synthesized through chemical means such as conventional solid phase peptide synthesis, etc. The term "polypeptide sequence" or "amino acid sequence" as used herein can refer to the polypeptide material itself and/or to the sequence information (i.e., the succession of letters or three letter codes used as abbreviations for amino acid names) that biochemically characterizes a polypeptide. A polypeptide sequence presented herein is presented in an N-terminal to C-terminal direction unless otherwise indicated.

[0247] The term "variant" as used herein refers to a polypeptide or nucleic acid that is "substantially similar" to a wild-type polypeptide or polynucleic acid. A molecule is said to be "substantially similar" to another molecule if both molecules have substantially similar structures (i.e., they are at least 50% similar in amino acid sequence as determined by BLASTp alignment set at default parameters) and are substantially similar in at least one relevant function (e.g., effect on cell migration). A variant differs from the naturally occurring polypeptide or nucleic acid by one or more amino acid or nucleic acid deletions, additions, substitutions or side-chain modifications, yet retains one or more specific functions or biological activities of the naturally occurring molecule.

[0248] Amino acid substitutions include alterations in which an amino acid is replaced with a different naturally-occurring or a non-conventional amino acid residue. Some substitutions can be classified as "conservative," in which case an amino acid residue contained in a polypeptide is replaced with another naturally occurring amino acid of similar character either in relation to polarity, side chain functionality or size. Substitutions encompassed by variants as described herein can also be "non-conservative," in which an amino acid residue which is present in a peptide is substituted with an amino acid having different properties (e.g., substituting a charged or hydrophobic amino acid with an uncharged or hydrophilic amino acid), or alternatively, in which a naturally-occurring amino acid is substituted with a non-conventional amino acid. Also encompassed within the term "variant," when used with reference to a polynucleotide or polypeptide, are variations in primary, secondary, or tertiary structure, as compared to a reference polynucleotide or polypeptide, respectively (e.g., as compared to a wild-type polynucleotide or polypeptide). Polynucleotide changes can result in amino acid substitutions, additions, deletions, fusions and truncations in the polypeptide encoded by the reference sequence. Variants can also include insertions, deletions or substitutions of amino acids, including insertions and substitutions of amino acids and other molecules) that do not normally occur in the peptide sequence that is the basis of the variant, including but not limited to insertion of ornithine which does not normally occur in human proteins.

[0249] The term "derivative" as used herein refers to peptides which have been chemically modified, for example by ubiquitination, labeling, pegylation (derivatization with polyethylene glycol) or addition of other molecules. A molecule is also a "derivative" of another molecule when it contains additional chemical moieties not normally a part of the molecule. Such moieties can improve the molecule's solubility, absorption, biological half life, etc. The moieties can alternatively decrease the toxicity of the molecule, or eliminate or attenuate an undesirable side effect of the molecule, etc. Moieties capable of mediating such effects are disclosed in Remington's Pharmaceutical Sciences, 18th edition, A. R. Gennaro, Ed., MackPubl., Easton, Pa. (1990).

Recombinant Proteins

[0250] Typically, the proteins or polypeptides of the present invention are secreted into the growth medium of recombinant E. coli. To isolate the desired protein, the E. coli host cell carrying a recombinant plasmid is propagated, homogenized, and the homogenate is centrifuged to remove bacterial debris. The supernatant is then subjected to sequential ammonium sulfate precipitation. The fraction containing the desired protein of the present invention is subjected to gel filtration in an appropriately sized dextran or polyacrylamide column to separate the proteins. If necessary, the protein fraction may be further purified by HPLC. Alternative methods may be used as suitable. Mutations or variants of the above polypeptides or proteins are encompassed by the present invention. Variants may be modified by, for example, the deletion or addition of amino acids that have minimal influence on the properties, secondary structure, and hydropathic nature of the desired polypeptide. For example, a polypeptide may be conjugated to a signal (or leader) sequence at the N-terminal end of the protein which co-translationally or post-translationally directs transfer of the protein. The polypeptide may also be conjugated to a linker or other sequence for ease of synthesis, purification, or identification of the polypeptide.

[0251] Fragments of the above proteins are also encompassed by the present invention. Suitable fragments can be produced by several means. In the first, subclones of the gene encoding the desired protein of the present invention are produced by conventional molecular genetic manipulation by subcloning gene fragments. The subclones then are expressed in vitro or in vivo in bacterial cells to yield a smaller protein or peptide. In another approach, based on knowledge of the primary structure of the proteins of the present invention, fragments of the genes of the present invention may be synthesized by using the polymerase chain reaction ("PCR") technique together with specific sets of primers chosen to represent particular portions of the protein. These then would be cloned into an appropriate vector for increased expression of an accessory peptide or protein. Chemical synthesis can also be used to make suitable fragments. Such a synthesis is carried out using known amino acid sequences for the proteins of the present invention. These fragments can then be separated by conventional procedures (e.g., chromatography, SDS-PAGE) and used in the methods of the present invention.

[0252] The nucleic acid molecule encoding a catalytically active TET family enzyme, a functional TET family derivative, or a TET catalytically active fragment thereof of the present invention can be introduced into an expression system of choice using conventional recombinant technology. Generally, this involves inserting the nucleic acid molecule into an expression system to which the molecule is heterologous (i.e., not normally present). The introduction of a particular foreign or native gene into a mammalian host is facilitated by first introducing the gene sequence into a suitable nucleic acid vector. "Vector" is used herein to mean any genetic element, such as a plasmid, phage, transposon, cosmid, chromosome, virus, virion, etc., which is capable of replication when associated with the proper control elements and which is capable of transferring gene sequences between cells. Thus, the term includes cloning and expression vectors, as well as viral vectors. The heterologous nucleic acid molecule is inserted into the expression system or vector in proper sense (5' to 3') orientation and correct reading frame. Alternatively, the nucleic acid may be inserted in the "antisense" orientation, i.e, in a 3' to 5' prime direction. The vector contains the necessary elements for the transcription and translation of the inserted protein-coding sequences.

[0253] Recombinant genes may also be introduced into viruses, including vaccinia virus, adenovirus, and retroviruses, including lentivirus. Recombinant viruses can be generated by transfection of plasmids into cells infected with virus. Suitable vectors include, but are not limited to, the following viral vectors such as lambda vector system gt11, gt WES.tB, Charon 4, and plasmid vectors such as pBR322, pBR325, pACYC177, pACYC184, pUC8, pUC9, pUC18, pUC19, pLG339, pR290, pKC37, pKC101, SV 40, pBluescript II SK+/- or KS+/-(see "Stratagene Cloning Systems" Catalog (1993) from Stratagene, La Jolla, Calif., which is hereby incorporated by reference in its entirety), pQE, pIH821, pGEX, pET series (see F. W. Studier et. al., "Use of T7 RNA Polymerase to Direct Expression of Cloned Genes," Gene Expression Technology Vol. 185 (1990), and any derivatives thereof.

[0254] Recombinant molecules can be introduced into cells via transformation, particularly transduction, conjugation, mobilization, or electroporation. The DNA sequences are cloned into the vector using standard cloning procedures in the art, as described by Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Springs Laboratory, Cold Springs Harbor, N.Y. (1989), which is hereby incorporated by reference in its entirety. A variety of host-vector systems may be utilized to express the protein-encoding sequence of the present invention. Primarily, the vector system must be compatible with the host cell used. Host-vector systems include but are not limited to the following: bacteria transformed with bacteriophage DNA, plasmid DNA, or cosmid DNA; microorganisms such as yeast containing yeast vectors; mammalian cell systems infected with virus (e.g., vaccinia virus, adenovirus, etc.); insect cell systems infected with virus (e.g., baculovirus); and plant cells infected by bacteria.

[0255] The expression elements of these vectors vary in their strength and specificities. Depending upon the host-vector system utilized, any one of a number of suitable transcription and translation elements can be used. Different genetic signals and processing events control many levels of gene expression (e.g., DNA transcription and messenger RNA ("mRNA") translation).

[0256] Transcription of DNA is dependent upon the presence of a promoter which is a DNA sequence that directs the binding of RNA polymerase and thereby promotes mRNA synthesis. The DNA sequences of eukaryotic promoters differ from those of prokaryotic promoters. Furthermore, eukaryotic promoters and accompanying genetic signals may not be recognized in or may not function in a prokaryotic system, and, further, prokaryotic promoters are not recognized and do not function in eukaryotic cells. Similarly, translation of mRNA in prokaryotes depends upon the presence of the proper prokaryotic signals which differ from those of eukaryotes. Efficient translation of mRNA in prokaryotes requires a ribosome binding site called the Shine-Dalgarno ("SD") sequence on the mRNA. This sequence is a short nucleotide sequence of mRNA that is located before the start codon, usually AUG, which encodes the amino-terminal methionine of the protein. The SD sequences are complementary to the 3'-end of the 16S rRNA (ribosomal RNA) and probably promote binding of mRNA to ribosomes by duplexing with the rRNA to allow correct positioning of the ribosome. For a review on maximizing gene expression see Roberts and Lauer, Methods in Enzymology, 68:473 (1979), which is hereby incorporated by reference in its entirety. Promoters vary in their "strength" (i.e., their ability to promote transcription). For the purposes of expressing a cloned gene, it is desirable to use strong promoters in order to obtain a high level of transcription and, hence, expression of the gene.

[0257] Depending upon the host cell system utilized, any one of a number of suitable promoters may be used. For instance, when cloning in E. coli, its bacteriophages, or plasmids, promoters such as the T7 phage promoter, lac promoter, trp promoter, rec A promoter, ribosomal RNA promoter, the PR and PL promoters of coliphage lambda and others, including but not limited, to lac UV5, omp F, bla, 1pp, and the like, may be used to direct high levels of transcription of adjacent DNA segments. Additionally, a hybrid trp-lac UV5 (tac) promoter or other E. coli promoters produced by recombinant DNA or other synthetic DNA techniques may be used to provide for transcription of the inserted gene. Bacterial host cell strains and expression vectors may be chosen which inhibit the action of the promoter unless specifically induced. In certain operons, the addition of specific inducers is necessary for efficient transcription of the inserted DNA. For example, the lac operon is induced by the addition of lactose or IPTG (isopropylthio-beta-D-galactoside). A variety of other operons, such as trp, pro, etc., are under different controls.

[0258] Specific initiation signals are also required for efficient gene transcription and translation in prokaryotic cells. These transcription and translation initiation signals may vary in "strength" as measured by the quantity of gene specific messenger RNA and protein synthesized, respectively. The DNA expression vector, which contains a promoter, may also contain any combination of various "strong" transcription and/or translation initiation signals. For instance, efficient translation in E. coli requires a Shine-Dalgarno ("SD") sequence about 7-9 bases 5' to the initiation codon (ATG) to provide a ribosome binding site. Thus, any SD-ATG combination that can be utilized by host cell ribosomes may be employed. Such combinations include but are not limited to the SD-ATG combination from the cro gene or the N gene of coliphage lambda, or from the E. coli tryptophan E, D, C, B or A genes. Additionally, any SD-ATG combination produced by recombinant DNA or other techniques involving incorporation of synthetic nucleotides may be used. Depending on the vector system and host utilized, any number of suitable transcription and/or translation elements, including constitutive, inducible, and repressible promoters, as well as minimal 5' promoter elements may be used. The nucleic acid molecule(s) of the present invention, a promoter molecule of choice, a suitable 3' regulatory region, and if desired, a reporter gene, are incorporated into a vector-expression system of choice to prepare the nucleic acid construct of present invention using standard cloning procedures known in the art, such as described by Sambrook et al., Molecular Cloning: A Laboratory Manual, Third Edition, Cold Spring Harbor: Cold Spring Harbor Laboratory Press, New York (2001), which is hereby incorporated by reference in its entirety.

[0259] In one aspect of the present invention, a nucleic acid molecule encoding a protein of choice is inserted into a vector in the sense (i.e 5' to 3') direction, such that the open reading frame is properly oriented for the expression of the encoded protein under the control of a promoter of choice. Single or multiple nucleic acids may be ligated into an appropriate vector in this way, under the control of a suitable promoter, to prepare a nucleic acid construct of the present invention. Once the isolated nucleic acid molecule encoding, for example, the catalytically active TET family protein or polypeptide has been cloned into an expression system, it is ready to be incorporated into a host cell. Recombinant molecules can be introduced into cells via transformation, particularly transduction, conjugation, lipofection, protoplast fusion, mobilization, particle bombardment, or electroporation. The DNA sequences are cloned into the host cell using standard cloning procedures known in the art, as described by Sambrook et al., Molecular Cloning: A Laboratory Manual, Second Edition, Cold Springs Laboratory, Cold Springs Harbor, N.Y. (1989), which is hereby incorporated by reference in its entirety. Suitable hosts include, but are not limited to, bacteria, virus, yeast, fungi, mammalian cells, insect cells, plant cells, and the like.

[0260] Accordingly, another aspect of the present invention relates to a method of making a recombinant cell. Essentially, this method is carried out by transforming a host cell with a nucleic acid construct of the present invention under conditions effective to yield transcription of the DNA molecule in the host cell. In one embodiment, a nucleic acid construct containing the nucleic acid molecule(s) of the present invention is stably inserted into the genome of the recombinant host cell as a result of the transformation. Transient expression in protoplasts allows quantitative studies of gene expression since the population of cells is very high (on the order of 10.sup.6). To deliver DNA inside protoplasts, several methodologies have been proposed, but the most common are electroporation (Neumann et al., "Gene Transfer into Mouse Lyoma Cells by Electroporation in High Electric Fields," EMBO J. 1: 841-45 (1982); Wong et al., "Electric Field Mediated Gene Transfer," Biochem Biophys Res Commun 30; 107(2):584-7 (1982); Potter et al., "Enhancer-Dependent Expression of Human Kappa Immunoglobulin Genes Introduced into Mouse pre-B Lymphocytes by Electroporation," Proc. Natl. Acad. Sci. USA 81: 7161-65 (1984), and polyethylene glycol (PEG) mediated DNA uptake, Sambrook et al., Molecular Cloning: A Laboratory Manual, Chap. 16, Second Edition, Cold Springs Laboratory, Cold Springs Harbor, N.Y. (1989). During electroporation, the DNA is introduced into the cell by means of a reversible change in the permeability of the cell membrane due to exposure to an electric field. PEG transformation introduces the DNA by changing the elasticity of the membranes. Unlike electroporation, PEG transformation does not require any special equipment and transformation efficiencies can be equally high. Another appropriate method of introducing the gene construct of the present invention into a host cell is fusion of protoplasts with other entities, either minicells, cells, lysosomes, or other fusible lipid-surfaced bodies that contain the chimeric gene. Fraley, et al., Proc. Natl. Acad. Sci. USA, 79:1859-63 (1982).

[0261] Stable transformants are preferable for the methods of the present invention, using variations of the methods above as described in Sambrook et al., Molecular Cloning: A Laboratory Manual, Chap. 16, Second Edition, Cold Springs Laboratory, Cold Springs Harbor, N.Y. (1989). Typically, an antibiotic or other compound useful for selective growth of the transformed cells only is added as a supplement to the media. The compound to be used will be dictated by the selectable marker element present in the plasmid with which the host cell was transformed. Suitable selective marker genes are those which confer resistance to, e.g., gentamycin, G418, hygromycin, streptomycin, spectinomycin, tetracycline, chloramphenicol, and the like. Similarly, "reporter genes," which encode enzymes providing for production of an identifiable compound identifiable, or other markers which indicate relevant information regarding the outcome of gene delivery, are suitable. For example, various luminescent or phosphorescent reporter genes are also appropriate, such that the presence of the heterologous gene may be ascertained visually. An example of a marker suitable for the present invention is the green fluorescent protein (GFP) gene. The isolated nucleic acid molecule encoding a green fluorescent protein can be deoxyribonucleic acid (DNA) or ribonucleic acid (RNA, including messenger RNA or mRNA), genomic or recombinant, biologically isolated or synthetic. The DNA molecule can be a cDNA molecule, which is a DNA copy of a messenger RNA (mRNA) encoding the GFP. In one embodiment, the GFP can be from Aequorea victoria (Prasher et al., "Primary Structure of the Aequorea Victoria Green-Fluorescent Protein," Gene 111(2):229-233 (1992); U.S. Pat. No. 5,491,084 to Chalfie et al.). A plasmid encoding the GFP of Aequorea victoria is available from the ATCC as Accession No. 75547. Mutated forms of GFP that emit more strongly than the native protein, as well as forms of GFP amenable to stable translation in higher vertebrates, are commercially available from Clontech Laboratories, Inc. (Palo Alto, Calif.) and can be used for the same purpose. The plasmid designated pTa1-GFPh (ATCC Accession No. 98299) includes a humanized form of GFP. Indeed, any nucleic acid molecule encoding a fluorescent form of GFP can be used in accordance with the subject invention. Standard techniques are then used to place the nucleic acid molecule encoding GFP under the control of the chosen cell specific promoter. The selection marker employed will depend on the target species and/or host or packaging cell lines compatible with a chosen vector.

[0262] An "inhibitor" of a TET family enzyme, as the term is used herein, can function in a competitive or non-competitive manner, and can function, in one embodiment, by interfering with the expression of the TET family polypeptides. A TET family inhibitor includes any chemical or biological entity that, upon treatment of a cell, results in inhibition of the biological activity caused by activation of the TET family enzymes in response to cellular signals. Such an inhibitor can act by binding to the Cys-rich and double-stranded .beta.-helix domains of the enzymes and blockade of their enzymatic activity. Alternatively, such an inhibitor can act by causing conformationals shifts within or sterically hindering the enzymes, such that enyzmatic activity is abolished or reduced.

Inhibitors of TET Family Proteins and Activity

[0263] A "TET family inhibitor", as used herein, refers to a chemical entity or biological product, or combination of a chemical entity or a biological product. The chemical entity or biological product is preferably, but not necessarily a low molecular weight compound, but can also be a larger compound, for example, an oligomer of nucleic acids, amino acids, or carbohydrates including without limitation proteins, oligonucleotides, ribozymes, DNAzymes, glycoproteins, siRNAs, lipoproteins, aptamers, and modifications and combinations thereof. The term "inhibitor" refers to any entity selected from a group comprising; chemicals; small molecules; nucleic acid sequences; nucleic acid analogues; proteins; peptides; aptamers; antibodies; or fragments thereof.

[0264] A nucleic acid sequence can be RNA or DNA, and can be single or double stranded, and can be selected from a group comprising; nucleic acid encoding a protein of interest, oligonucleotides, nucleic acid analogues, for example peptide-nucleic acid (PNA), pseudo-complementary PNA (pc-PNA), locked nucleic acid (LNA), etc. Such nucleic acid sequences include, for example, but not limited to, nucleic acid sequence encoding proteins, for example that act as transcriptional repressors, antisense molecules, ribozymes, small inhibitory nucleic acid sequences, for example but not limited to RNAi, shRNAi, siRNA, micro RNAi (mRNAi), antisense oligonucleotides etc.

[0265] A protein and/or peptide agent can be any protein of interest, for example, but not limited to; mutated proteins; therapeutic proteins; truncated proteins, wherein the protein is normally absent or expressed at lower levels in the cell. Proteins can also be selected from a group comprising; mutated proteins, genetically engineered proteins, peptides, synthetic peptides, recombinant proteins, chimeric proteins, antibodies, midibodies, tribodies, humanized proteins, humanized antibodies, chimeric antibodies, modified proteins and fragments thereof. In some embodiments, the agent is any chemical, entity or moiety, including without limitation synthetic and naturally-occurring non-proteinaceous entities. In certain embodiments the agent is a small molecule having a chemical moiety. For example, chemical moieties included unsubstituted or substituted alkyl, aromatic, or heterocyclyl moieties including macrolides, leptomycins and related natural products or analogues thereof. Inhibitors can be known to have a desired activity and/or property, or can be selected from a library of diverse compounds.

[0266] Antibody Inhibitors of TET Family Enzymes:

[0267] Antibodies that specifically bind TET family enzymes can be used for inhibition in vivo, in vitro, or ex vivo. The TET family inhibitory activity of a given antibody, or, for that matter, any TET family inhibitor, can be assessed using methods known in the art or described herein. An antibody that inhibits TET family enzymes causes a decrease in the conversion of 5-methylcytosine to 5-hydroxymethylcytosine in the DNA of a cell. Specific binding is typically defined as binding that does not recognize other antigens, such as a protein, nucleotide, chemical residue, etc., at a detectable level in an assay used.

[0268] Antibody inhibitors of TET family enzymes can include polyclonal and monoclonal antibodies and antigen-binding derivatives or fragments thereof. Well known antigen binding fragments include, for example, single domain antibodies (dAbs; which consist essentially of single VL or VH antibody domains), Fv fragment, including single chain Fv fragment (scFv), Fab fragment, and F(ab')2 fragment. Methods for the construction of such antibody molecules are well known in the art. As used herein, the term "antibody" refers to an intact immunoglobulin or to a monoclonal or polyclonal antigen-binding fragment with the Fc (crystallizable fragment) region or FcRn binding fragment of the Fc region. Antigen-binding fragments may be produced by recombinant DNA techniques or by enzymatic or chemical cleavage of intact antibodies. "Antigen-binding fragments" include, inter alia, Fab, Fab', F(ab')2, Fv, dAb, and complementarity determining region (CDR) fragments, single-chain antibodies (scFv), single domain antibodies, chimeric antibodies, diabodies and polypeptides that contain at least a portion of an immunoglobulin that is sufficient to confer specific antigen binding to the polypeptide. The terms Fab, Fc, pFc', F(ab') 2 and Fv are employed with standard immunological meanings [Klein, Immunology (John Wiley, New York, N.Y., 1982); Clark, W. R. (1986) The Experimental Foundations of Modern Immunology (Wiley & Sons, Inc., New York); Roitt, I. (1991) Essential Immunology, 7th Ed., (Blackwell Scientific Publications, Oxford)].

[0269] Nucleic Acid Inhibitors of TET Family Enzymes:

[0270] A powerful approach for inhibiting the expression of selected target polypeptides is through the use of RNA interference agents. RNA interference (RNAi) uses small interfering RNA (siRNA) duplexes that target the messenger RNA encoding the target polypeptide for selective degradation. siRNA-dependent post-transcriptional silencing of gene expression involves cleaving the target messenger RNA molecule at a site guided by the siRNA. "RNA interference (RNAi)" is an evolutionally conserved process whereby the expression or introduction of RNA of a sequence that is identical or highly similar to a target gene results in the sequence specific degradation or specific post-transcriptional gene silencing (PTGS) of messenger RNA (mRNA) transcribed from that targeted gene (see Coburn, G. and Cullen, B. (2002) J. of Virology 76(18):9225), thereby inhibiting expression of the target gene. In one embodiment, the RNA is a double stranded RNA (dsRNA). In another embodiment, the RNA is a single stranded DNA. This process has been described in plants, invertebrates, and mammalian cells. In nature, RNAi is initiated by the dsRNA-specific endonuclease Dicer, which promotes processive cleavage of long dsRNA into double-stranded fragments termed siRNAs. siRNAs are incorporated into a protein complex (termed "RNA induced silencing complex," or "RISC") that recognizes and cleaves target mRNAs. RNAi can also be initiated by introducing nucleic acid molecules, e.g., synthetic siRNAs or RNA interfering agents, to inhibit or silence the expression of target genes. As used herein, "inhibition of target gene expression" includes any decrease in expression or protein activity or level of the target gene or protein encoded by the target gene as compared to a situation wherein no RNA interference has been induced. The decrease will be of at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95% or 99% or more as compared to the expression of a target gene or the activity or level of the protein encoded by a target gene which has not been targeted by an RNA interfering agent.

[0271] The terms "RNA interference agent" and "RNA interference" as they are used herein are intended to encompass those forms of gene silencing mediated by double-stranded RNA, regardless of whether the RNA interfering agent comprises an siRNA, miRNA, shRNA or other double-stranded RNA molecule. "Short interfering RNA" (siRNA), also referred to herein as "small interfering RNA" is defined as an RNA agent which functions to inhibit expression of a target gene, e.g., by RNAi. An siRNA may be chemically synthesized, may be produced by in vitro transcription, or may be produced within a host cell. In one embodiment, siRNA is a double stranded RNA (dsRNA) molecule of about 15 to about 40 nucleotides in length, preferably about 15 to about 28 nucleotides, more preferably about 19 to about 25 nucleotides in length, and more preferably about 19, 20, 21, 22, or 23 nucleotides in length, and may contain a 3' and/or 5' overhang on each strand having a length of about 0, 1, 2, 3, 4, or 5 nucleotides. The length of the overhang is independent between the two strands, i. e., the length of the overhang on one strand is not dependent on the length of the overhang on the second strand. Preferably the siRNA is capable of promoting RNA interference through degradation or specific post-transcriptional gene silencing (PTGS) of the target messenger RNA (mRNA).

[0272] siRNAs also include small hairpin (also called stem loop) RNAs (shRNAs). In one embodiment, these shRNAs are composed of a short (e.g., about 19 to about 25 nucleotide) antisense strand, followed by a nucleotide loop of about 5 to about 9 nucleotides, and the analogous sense strand. Alternatively, the sense strand may precede the nucleotide loop structure and the antisense strand may follow. These shRNAs may be contained in plasmids, retroviruses, and lentiviruses and expressed from, for example, the pol III U6 promoter, or another promoter (see, e.g., Stewart, et al. (2003) RNA April;9(4):493-501, incorporated by reference herein in its entirety). The target gene or sequence of the RNA interfering agent may be a cellular gene or genomic sequence, e.g. the TET1 sequence. An siRNA may be substantially homologous to the target gene or genomic sequence, or a fragment thereof. As used in this context, the term "homologous" is defined as being substantially identical, sufficiently complementary, or similar to the target mRNA, or a fragment thereof, to effect RNA interference of the target. In addition to native RNA molecules, RNA suitable for inhibiting or interfering with the expression of a target sequence include RNA derivatives and analogs. Preferably, the siRNA is identical to its target. The siRNA preferably targets only one sequence. Each of the RNA interfering agents, such as siRNAs, can be screened for potential off-target effects by, for example, expression profiling. Such methods are known to one skilled in the art and are described, for example, in Jackson et al. Nature Biotechnology 6:635-637, 2003.

[0273] In addition to expression profiling, one may also screen the target sequences for similar sequences in the sequence databases to identify sequences that may have off-target effects. For example, according to Jackson et al. (Id.) 15, or perhaps as few as 11 contiguous nucleotides, of sequence identity are sufficient to direct silencing of non-targeted transcripts. Therefore, one may initially screen the proposed siRNAs to avoid potential off-target silencing using the sequence identity analysis by any known sequence comparison methods, such as BLAST. siRNA sequences are chosen to maximize the uptake of the antisense (guide) strand of the siRNA into RISC and thereby maximize the ability of RISC to target human GGT mRNA for degradation. This can be accomplished by scanning for sequences that have the lowest free energy of binding at the 5'-terminus of the antisense strand. The lower free energy leads to an enhancement of the unwinding of the 5'-end of the antisense strand of the siRNA duplex, thereby ensuring that the antisense strand will be taken up by RISC and direct the sequence-specific cleavage of the, for example, TET1 mRNA.

[0274] siRNA molecules need not be limited to those molecules containing only RNA, but, for example, further encompasses chemically modified nucleotides and non-nucleotides, and also include molecules wherein a ribose sugar molecule is substituted for another sugar molecule or a molecule which performs a similar function. Moreover, a non-natural linkage between nucleotide residues can be used, such as a phosphorothioate linkage. The RNA strand can be derivatized with a reactive functional group of a reporter group, such as a fluorophore. Particularly useful derivatives are modified at a terminus or termini of an RNA strand, typically the 3' terminus of the sense strand. For example, the 2'-hydroxyl at the 3' terminus can be readily and selectively derivatizes with a variety of groups.

[0275] Other useful RNA derivatives incorporate nucleotides having modified carbohydrate moieties, such as 2'O-alkylated residues or 2'-O-methyl ribosyl derivatives and 2'-O-fluoro ribosyl derivatives. The RNA bases may also be modified. Any modified base useful for inhibiting or interfering with the expression of a target sequence may be used. For example, halogenated bases, such as 5-bromouracil and 5-iodouracil can be incorporated. The bases may also be alkylated, for example, 7-methylguanosine can be incorporated in place of a guanosine residue. Non-natural bases that yield successful inhibition can also be incorporated. The most preferred siRNA modifications include 2'-deoxy-2'-fluorouridine or locked nucleic acid (LAN) nucleotides and RNA duplexes containing either phosphodiester or varying numbers of phosphorothioate linkages. Such modifications are known to one skilled in the art and are described, for example, in Braasch et al., Biochemistry, 42: 7967-7975, 2003. Most of the useful modifications to the siRNA molecules can be introduced using chemistries established for antisense oligonucleotide technology. Preferably, the modifications involve minimal 2'-O-methyl modification, preferably excluding such modification. Modifications also preferably exclude modifications of the free 5'-hydroxyl groups of the siRNA. The Examples herein provide specific examples of RNA interfering agents, such as RNAi molecules that effectively target mRNA of a TET family enzyme. In some embodiments of the aspects described herein, examples of siRNA and shRNA sequences that can be used to inhibit TET family activity include, but are not limited to: SEQ ID NO: 36, SEQ ID NO: 37, SEQ ID NO: 40, SEQ ID NO: 41, SEQ ID NO: 48, SEQ ID NO: 49, SEQ ID NO: 52, SEQ ID NO: 53, SEQ ID NO: 70, SEQ ID NO: 74, SEQ ID NO: 75, SEQ ID NO: 78, SEQ ID NO: 79, SEQ ID NO: 82, SEQ ID NO: 83, SEQ ID NO: 86, SEQ ID NO: 98, and SEQ ID NO: 92.

[0276] siRNAs useful for targeting expression of a TET family enzyme can be readily designed and tested. Chalk et al. (Nucl. Acids Res. 33: D131-D134 (2005)) describe a database of siRNA sequences and a predictor of siRNA sequences. Linked to the sequences in the database is information such as siRNA thermodynamic properties and the potential for sequence-specific off-target effects. The database and associated predictive tools enable the user to evaluate an siRNA's potential for inhibition and non-specific effects. The database is available at on the world wide web at siRNA.cgb.ki.se. Synthetic siRNA molecules, including shRNA molecules, can be obtained using a number of techniques known to those of skill in the art. For example, the siRNA molecule can be chemically synthesized or recombinantly produced using methods known in the art, such as using appropriately protected ribonucleoside phosphoramidites and a conventional DNA/RNA synthesizer (see, e.g., Elbashir, S. M. et al., Nature 411:494-498 (2001); Elbashir, S. M., et al., Genes & Development 15:188-200 (2001); Harborth, J. et al., J. Cell Science 114:4557-4565 (2001); Masters, J. R. et al., Proc. Natl. Acad. Sci., USA 98:8012-8017 (2001); and Tuschl, T. et al., Genes & Development 13:3191-3197 (1999)).

[0277] Alternatively, several commercial RNA synthesis suppliers are available including, but not limited to, Proligo (Hamburg, Germany), Dharmacon Research (Lafayette, Colo., USA), Pierce Chemical (part of Perbio Science, Rockford, Ill., USA), Glen Research (Sterling, Va., USA), ChemGenes (Ashland, Mass., USA), and Cruachem (Glasgow, UK). As such, siRNA molecules are not overly difficult to synthesize and are readily provided in a quality suitable for RNAi. In addition, dsRNAs can be expressed as stem loop structures encoded by plasmid vectors, retroviruses and lentiviruses (Paddison, P. J. et al., Genes Dev. 16:948-958 (2002); McManus, M. T. et al., RNA 8:842-850 (2002); Paul, C. P. et al., Nat. Biotechnol. 20:505-508 (2002); Miyagishi, M. et al., Nat. Biotechnol. 20:497-500 (2002); Sui, G. et al., Proc. Natl. Acad. Sci., USA 99:5515-5520 (2002); Brummelkamp, T. et al., Cancer Cell 2:243 (2002); Lee, N. S., et al., Nat. Biotechnol. 20:500-505 (2002); Yu, J. Y., et al., Proc. Natl. Acad. Sci., USA 99:6047-6052 (2002); Zeng, Y., et al., Mol. Cell 9:1327-1333 (2002); Rubinson, D. A., et al., Nat. Genet. 33:401-406 (2003); Stewart, S. A., et al., RNA 9:493-501 (2003)).

[0278] In one embodiment, the RNA interference agent is delivered or administered in a pharmaceutically acceptable carrier. Additional carrier agents, such as liposomes, can be added to the pharmaceutically acceptable carrier. In another embodiment, the RNA interference agent is delivered by a vector encoding small hairpin RNA (shRNA) in a pharmaceutically acceptable carrier to the cells in an organ of an individual. The shRNA is converted by the cells after transcription into siRNA capable of targeting, for example, a TET family enzyme.

[0279] In one embodiment, the vector is a regulatable vector, such as tetracycline inducible vector. Methods described, for example, in Wang et al. Proc. Natl. Acad. Sci. 100: 5103-5106, using pTet-On vectors (BD Biosciences Clontech, Palo Alto, Calif.) can be used. In one embodiment, the RNA interference agents used in the methods described herein are taken up actively by cells in vivo following intravenous injection, e.g., hydrodynamic injection, without the use of a vector, illustrating efficient in vivo delivery of the RNA interfering agents. One method to deliver the siRNAs is catheterization of the blood supply vessel of the target organ. Other strategies for delivery of the RNA interference agents, e.g., the siRNAs or shRNAs used in the methods of the invention, may also be employed, such as, for example, delivery by a vector, e.g., a plasmid or viral vector, e.g., a lentiviral vector. Such vectors can be used as described, for example, in Xiao-Feng Qin et al. Proc. Natl. Acad. Sci. U.S.A., 100: 183-188. Other delivery methods include delivery of the RNA interfering agents, e.g., the siRNAs or shRNAs of the invention, using a basic peptide by conjugating or mixing the RNA interfering agent with a basic peptide, e.g., a fragment of a TAT peptide, mixing with cationic lipids or formulating into particles.

[0280] The RNA interference agents, e.g., the siRNAs targeting TET family enzyme mRNA, may be delivered singly, or in combination with other RNA interference agents, e.g., siRNAs, such as, for example siRNAs directed to other cellular genes. TET family enzyme siRNAs may also be administered in combination with other pharmaceutical agents which are used to treat or prevent diseases or disorders, as described herein.

[0281] Synthetic siRNA molecules, including shRNA molecules, can be obtained using a number of techniques known to those of skill in the art. For example, the siRNA molecule can be chemically synthesized or recombinantly produced using methods known in the art, such as using appropriately protected ribonucleoside phosphoramidites and a conventional DNA/RNA synthesizer (see, e.g., Elbashir, S. M. et al. (2001) Nature 411:494-498; Elbashir, S. M., W. Lendeckel and T. Tuschl (2001) Genes & Development 15:188-200; Harborth, J. et al. (2001) J. Cell Science 114:4557-4565; Masters, J. R. et al. (2001) Proc. Natl. Acad. Sci., USA 98:8012-8017; and Tuschl, T. et al. (1999) Genes & Development 13:3191-3197). Alternatively, several commercial RNA synthesis suppliers are available including, but not limited to, Proligo (Hamburg, Germany), Dharmacon Research (Lafayette, Colo., USA), Pierce Chemical (part of Perbio Science, Rockford, Ill., USA), Glen Research (Sterling, Va., USA), ChemGenes (Ashland, Mass., USA), and Cruachem (Glasgow, UK). As such, siRNA molecules are not overly difficult to synthesize and are readily provided in a quality suitable for RNAi. In addition, dsRNAs can be expressed as stem loop structures encoded by plasmid vectors, retroviruses and lentiviruses (Paddison, P. J. et al. (2002) Genes Dev. 16:948-958; McManus, M. T. et al. (2002) RNA 8:842-850; Paul, C. P. et al. (2002) Nat. Biotechnol. 20:505-508; Miyagishi, M. et al. (2002) Nat. Biotechnol. 20:497-500; Sui, G. et al. (2002) Proc. Natl. Acad. Sci., USA 99:5515-5520; Brummelkamp, T. et al. (2002) Cancer Cell 2:243; Lee, N. S., et al. (2002) Nat. Biotechnol. 20:500-505; Yu, J. Y., et al. (2002) Proc. Natl. Acad. Sci., USA 99:6047-6052; Zeng, Y., et al. (2002) Mol. Cell 9:1327-1333; Rubinson, D. A., et al. (2003) Nat. Genet. 33:401-406; Stewart, S. A., et al. (2003) RNA 9:493-501). These vectors generally have a polIII promoter upstream of the dsRNA and can express sense and antisense RNA strands separately and/or as a hairpin structures. Within cells, Dicer processes the short hairpin RNA (shRNA) into effective siRNA. The targeted region of the siRNA molecule of the present invention can be selected from a given target gene sequence, e.g., a TET family enzyme coding sequence, beginning from about 25 to 50 nucleotides, from about 50 to 75 nucleotides, or from about 75 to 100 nucleotides downstream of the start codon. Nucleotide sequences may contain 5' or 3' UTRs and regions nearby the start codon. One method of designing a siRNA molecule of the present invention involves identifying the 23 nucleotide sequence motif AA(N19)TT (SEQ ID NO: 102) (where N can be any nucleotide) and selecting hits with at least 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70% or 75% G/C content. The "TT" portion of the sequence is optional. Alternatively, if no such sequence is found, the search may be extended using the motif NA(N21), where N can be any nucleotide. In this situation, the 3' end of the sense siRNA may be converted to TT to allow for the generation of a symmetric duplex with respect to the sequence composition of the sense and antisense 3' overhangs. The antisense siRNA molecule may then be synthesized as the complement to nucleotide positions 1 to 21 of the 23 nucleotide sequence motif. The use of symmetric 3' TT overhangs may be advantageous to ensure that the small interfering ribonucleoprotein particles (siRNPs) are formed with approximately equal ratios of sense and antisense target RNA-cleaving siRNPs (Elbashir et al. (2001) supra and Elbashir et al. 2001 supra). Analysis of sequence databases, including but not limited to the NCBI, BLAST, Derwent and GenSeq as well as commercially available oligosynthesis companies such as Oligoengine.RTM., may also be used to select siRNA sequences against EST libraries to ensure that only one gene is targeted.

[0282] Delivery of RNA Interfering Agents:

[0283] In general, any method of delivering a nucleic acid molecule can be adapted for use with an RNAi interference molecule (see e.g., Akhtar S. and Julian R L. (1992) Trends Cell. Biol. 2(5):139-144; WO94/02595, which are incorporated herein by reference in their entirety). Methods of delivering RNA interference agents, e.g., an siRNA, or vectors containing an RNA interference agent, to the target cells, e.g., a cancer cell or other desired target cells, for uptake can include injection of a composition containing the RNA interference agent, e.g., an siRNA, or directly contacting the cell, e.g., a lymphocyte, with a composition comprising an RNA interference agent, e.g., an siRNA.

[0284] However, there are factors that are important to consider in order to successfully deliver an RNAi molecule in vivo. For example, one should consider: (1) biological stability of the RNAi molecule, (2) preventing non-specific effects, and (3) accumulation of the RNAi molecule in the target tissue. The non-specific effects of an RNAi molecule can be minimized by local administration by e.g., direct injection into a tumor, cell, target tissue, or topically. Local administration of an RNAi molecule to a treatment site limits the exposure of the e.g., siRNA to systemic tissues and permits a lower dose of the RNAi molecule to be administered. Several studies have shown successful knockdown of gene products when an RNAi molecule is administered locally. For example, intraocular delivery of a VEGF siRNA by intravitreal injection in cynomolgus monkeys (Tolentino, M J., et al (2004) Retina 24:132-138) and subretinal injections in mice (Reich, S J., et al (2003) Mol. Vis. 9:210-216) were both shown to prevent neovascularization in an experimental model of age-related macular degeneration. In addition, direct intratumoral injection of an siRNA in mice reduces tumor volume (Pille, J., et al (2005) Mol. Ther. 11:267-274) and can prolong survival of tumor-bearing mice (Kim, W J., et al (2006) Mol. Ther. 14:343-350; Li, S., et al (2007) Mol. Ther. 15:515-523). RNA interference has also shown success with local delivery to the CNS by direct injection (Dorn, G., et al. (2004) Nucleic Acids 32:e49; Tan, P H., et al (2005) Gene Ther. 12:59-66; Makimura, H., et al (2002) BMC Neurosci. 3:18; Shishkina, G T., et al (2004) Neuroscience 129:521-528; Thakker, E R., et al (2004) Proc. Natl. Acad. Sci. U.S.A. 101:17270-17275; Akaneya,Y., et al (2005) J. Neurophysiol. 93:594-602) and to the lungs by intranasal administration (Howard, K A., et al (2006) Mol. Ther. 14:476-484; Zhang, X., et al (2004) J. Biol. Chem. 279:10677-10684; Bitko, V., et al (2005) Nat. Med. 11:50-55).

[0285] For administering an RNAi molecule systemically for the treatment of a disease, the RNAi molecule can be either be modified or alternatively delivered using a drug delivery system-both methods act to prevent the rapid degradation of the RNAi molecule by endo- and exo-nucleases in vivo. Modification of the RNAi molecule or the pharmaceutical carrier can also permit targeting of the RNAi molecule to the target tissue and avoid undesirable off-target effects.

[0286] RNA interference molecules can be modified by chemical conjugation to lipophilic groups such as cholesterol to enhance cellular uptake and prevent degradation. For example, an siRNA directed against ApoB conjugated to a lipophilic cholesterol moiety was injected systemically into mice and resulted in knockdown of apoB mRNA in both the liver and jejunum (Soutschek, J., et al (2004) Nature 432:173-178). Conjugation of an RNAi molecule to an aptamer has been shown to inhibit tumor growth and mediate tumor regression in a mouse model of prostate cancer (McNamara, J O., et al (2006) Nat. Biotechnol. 24:1005-1015).

[0287] In an alternative embodiment, the RNAi molecules can be delivered using drug delivery systems such as e.g., a nanoparticle, a dendrimer, a polymer, liposomal, or a cationic delivery system. Positively charged cationic delivery systems facilitate binding of an RNA interference molecule (negatively charged) and also enhance interactions at the negatively charged cell membrane to permit efficient uptake of an siRNA by the cell. Cationic lipids, dendrimers, or polymers can either be bound to an RNA interference molecule, or induced to form a vesicle or micelle (see e.g., Kim S H., et al (2008) Journal of Controlled Release 129(2):107-116) that encases an RNAi molecule. The formation of vesicles or micelles further prevents degradation of the RNAi molecule when administered systemically. Methods for making and administering cationic-RNAi complexes are well within the abilities of one skilled in the art (see e.g., Sorensen, D R., et al (2003) J. Mol. Biol 327:761-766; Verma, U N., et al (2003) Clin. Cancer Res. 9:1291-1300; Arnold, A S et al (2007) J. Hypertens. 25:197-205).

[0288] Some non-limiting examples of drug delivery systems useful for systemic administration of RNAi include DOTAP (Sorensen, D R., et al (2003), supra; Verma, U N., et al (2003), supra), Oligofectamine, "solid nucleic acid lipid particles" (Zimmermann, T S., et al (2006) Nature 441:111-114), cardiolipin (Chien, P Y., et al (2005) Cancer Gene Ther. 12:321-328; Pal, A., et al (2005) Int J. Oncol. 26:1087-1091), polyethyleneimine (Bonnet M E., et al (2008) Pharm. Res. August 16 Epub ahead of print; Aigner, A. (2006) J. Biomed. Biotechnol. 71659), Arg-Gly-Asp (RGD) peptides (Liu, S. (2006) Mol. Pharm. 3:472-487), and polyamidoamines (Tomalia, D A., et al (2007) Biochem. Soc. Trans. 35:61-67; Yoo, H., et al (1999) Pharm. Res. 16:1799-1804). In some embodiments, an RNAi molecule forms a complex with cyclodextrin for systemic administration. Methods for administration and pharmaceutical compositions of RNAi molecules and cyclodextrins can be found in U.S. Pat. No. 7,427,605, which is herein incorporated by reference in its entirety. Specific methods for administering an RNAi molecule for the inhibition of angiogenesis can be found in e.g., U.S. Patent Application No. 20080152654.

[0289] In other embodiments, RNA interference agent, e.g., an siRNA may be injected directly into any blood vessel, such as vein, artery, venule or arteriole, via, e.g., hydrodynamic injection or catheterization. Administration may be by a single injection or by two or more injections. The RNA interference agent is delivered in a pharmaceutically acceptable carrier. One or more RNA interference agents may be used simultaneously. In one embodiment, only one siRNA that targets a human TET family enzyme is used. In one embodiment, specific cells are targeted with RNA interference, limiting potential side effects of RNA interference caused by non-specific targeting of RNA interference. The method can use, for example, a complex or a fusion molecule comprising a cell targeting moiety and an RNA interference binding moiety that is used to deliver RNA interference effectively into cells. For example, an antibody-protamine fusion protein when mixed with siRNA, binds siRNA and selectively delivers the siRNA into cells expressing an antigen recognized by the antibody, resulting in silencing of gene expression only in those cells that express the antigen. The siRNA or RNA interference-inducing molecule binding moiety is a protein or a nucleic acid binding domain or fragment of a protein, and the binding moiety is fused to a portion of the targeting moiety. The location of the targeting moiety can be either in the carboxyl-terminal or amino-terminal end of the construct or in the middle of the fusion protein. A viral-mediated delivery mechanism can also be employed to deliver siRNAs to cells in vitro and in vivo as described in Xia, H. et al. (2002) Nat Biotechnol 20(10):1006). Plasmid- or viral-mediated delivery mechanisms of shRNA may also be employed to deliver shRNAs to cells in vitro and in vivo as described in Rubinson, D. A., et al. ((2003) Nat. Genet. 33:401-406) and Stewart, S. A., et al. ((2003) RNA 9:493-501). The RNA interference agents, e.g., the siRNAs or shRNAs, can be introduced along with components that perform one or more of the following activities: enhance uptake of the RNA interfering agents, e.g., siRNA, by the cell, e.g., lymphocytes or other cells, inhibit annealing of single strands, stabilize single strands, or otherwise facilitate delivery to the target cell and increase inhibition of the target gene, e.g., TET1, TET2, TET3, or CXXC4. The dose of the particular RNA interfering agent will be in an amount necessary to effect RNA interference, e.g., post translational gene silencing (PTGS), of the particular target gene, thereby leading to inhibition of target gene expression or inhibition of activity or level of the protein encoded by the target gene.

[0290] Small Molecule Inhibitors and Activators:

[0291] As used herein, the term "small molecule" refers to a chemical agent including, but not limited to, peptides, peptidomimetics, amino acids, amino acid analogs, polynucleotides, polynucleotide analogs, aptamers, nucleotides, nucleotide analogs, organic or inorganic compounds (i.e., including heteroorganic and organometallic compounds) having a molecular weight less than about 10,000 grams per mole, organic or inorganic compounds having a molecular weight less than about 5,000 grams per mole, organic or inorganic compounds having a molecular weight less than about 1,000 grams per mole, organic or inorganic compounds having a molecular weight less than about 500 grams per mole, and salts, esters, and other pharmaceutically acceptable forms of such compounds.

Antibodies Specific for TET Family Enzymes and Detecting TET Family Activity

[0292] Antibodies that can be used according to the methods described herein, for example, for detecting TET family activity, such as hydroxymethylation of cytosine, include complete immunoglobulins, antigen binding fragments of immunoglobulins, as well as antigen binding proteins that comprise antigen binding domains of immunoglobulins. Antigen binding fragments of immunoglobulins include, for example, Fab, Fab', F(ab')2, scFv and dAbs. Modified antibody formats have been developed which retain binding specificity, but have other characteristics that may be desirable, including for example, bispecificity, multivalence (more than two binding sites), and compact size (e.g., binding domains alone). Single chain antibodies lack some or all of the constant domains of the whole antibodies from which they are derived. Therefore, they can overcome some of the problems associated with the use of whole antibodies. For example, single-chain antibodies tend to be free of certain undesired interactions between heavy-chain constant regions and other biological molecules. Additionally, single-chain antibodies are considerably smaller than whole antibodies and can have greater permeability than whole antibodies, allowing single-chain antibodies to localize and bind to target antigen-binding sites more efficiently. Furthermore, the relatively small size of single-chain antibodies makes them less likely to provoke an unwanted immune response in a recipient than whole antibodies.

[0293] Multiple single chain antibodies, each single chain having one VH and one VL domain covalently linked by a first peptide linker, can be covalently linked by at least one or more peptide linker to form multivalent single chain antibodies, which can be monospecific or multispecific. Each chain of a multivalent single chain antibody includes a variable light chain fragment and a variable heavy chain fragment, and is linked by a peptide linker to at least one other chain. The peptide linker is composed of at least fifteen amino acid residues. The maximum number of linker amino acid residues is approximately one hundred.

[0294] Two single chain antibodies can be combined to form a diabody, also known as a bivalent dimer. Diabodies have two chains and two binding sites, and can be monospecific or bispecific. Each chain of the diabody includes a VH domain connected to a VL domain. The domains are connected with linkers that are short enough to prevent pairing between domains on the same chain, thus driving the pairing between complementary domains on different chains to recreate the two antigen-binding sites.

[0295] Three single chain antibodies can be combined to form triabodies, also known as trivalent trimers. Triabodies are constructed with the amino acid terminus of a VL or VH domain directly fused to the carboxyl terminus of a VL or VH domain, i.e., without any linker sequence. The triabody has three Fv heads with the polypeptides arranged in a cyclic, head-to-tail fashion. A possible conformation of the triabody is planar with the three binding sites located in a plane at an angle of 120 degrees from one another. Triabodies can be monospecific, bispecific or trispecific.

[0296] Thus, antibodies useful in the methods described herein include, but are not limited to, naturally occurring antibodies, bivalent fragments such as (Fab')2, monovalent fragments such as Fab, single chain antibodies, single chain Fv (scFv), single domain antibodies, multivalent single chain antibodies, diabodies, triabodies, and the like that bind specifically with an antigen.

[0297] Antibodies can also be raised against a nucleotide, polypeptide or portion of a polypeptide by methods known to those skilled in the art. Antibodies are readily raised in animals such as rabbits or mice by immunization with the gene product, or a fragment thereof. Immunized mice are particularly useful for providing sources of B cells for the manufacture of hybridomas, which in turn are cultured to produce large quantities of monoclonal antibodies. Antibody manufacture methods are described in detail, for example, in Harlow et al., 1988. While both polyclonal and monoclonal antibodies can be used in the methods described herein, it is preferred that a monoclonal antibody is used where conditions require increased specificity for a particular protein.

[0298] The term "intrabodies" as used herein, refers to a method wherein to target intracellular endogenous proteins as described in U.S. Pat. No. 6,004,940. Briefly, the method comprises the intracellular expression of an antibody capable of binding to the target. A DNA sequence is delivered to a cell, the DNA sequence contains a sufficient number of nucleotides coding for the portion of an antibody capable of binding to the target operably linked to a promoter that will permit expression of the antibody in the cell(s) of interest. The antibody is then expressed intracellularly and binds to the target, thereby disrupting the target from its normal actions.

[0299] The terms "label" or "tag", as used herein, refer to a composition capable of producing a detectable signal indicative of the presence of the target, such as, for example, a 5-hydroxymethylcytosine, in an assay sample. Suitable labels include radioisotopes, nucleotide chromophores, enzymes, substrates, fluorescent molecules, chemiluminescent moieties, magnetic particles, bioluminescent moieties, and the like. As such, a label is any composition detectable by spectroscopic, photochemical, biochemical, immunochemical, electrical, optical or chemical means. The terms "labeled antibody" or "tagged antibody", as used herein, includes antibodies that are labeled by a detectable means and include, but are not limited to, antibodies that are enzymatically, radioactively, fluorescently, and chemiluminescently labeled. Antibodies can also be labeled with a detectable tag, such as c-Myc, HA, VSV-G, HSV, FLAG, V5, or HIS. The detection and quantification of, for example, 5-hydroxymethylcytosine residues present in a nucleic acid sample correlate to the intensity of the signal emitted from the detectably labeled antibody. In one embodiment, the label is a detectable marker, e.g., incorporation of a radiolabeled amino acid. Various methods of labeling polypeptides and glycoproteins are known in the art and may be used.

[0300] Examples of labels or tags for polypeptides include, but are not limited to, the following: radioisotopes or radionuclides (e.g., 3H, 14C, 15N, 35S, 43K, 52Fe, 57Co, 67Cu, 67Ga, 68 Ga, 90Y, 99Tc, 111In, 1231, 1251, 1311, or 1321), fluorescent labels (e.g., FITC, phycoerythrin, rhodamine, lanthanide phosphors), enzymatic labels (e.g., horseradish peroxidase, beta-galactosidase, luciferase, alkaline phosphatase), quantum dots, chemiluminescent markers, biotinyl groups, predetermined polypeptide epitopes recognized by a secondary reporter (e.g., leucine zipper pair sequences, binding sites for secondary antibodies, metal binding domains, epitope tags), magnetic agents, such as gadolinium chelates, toxins such as pertussis toxin, taxol, cytochalasin B, gramicidin D, ethidium bromide, emetine, mitomycin, etoposide, tenoposide, vincristine, vinblastine, colchicin, doxorubicin, daunorubicin, dihydroxy anthracin dione, mitoxantrone, mithramycin, actinomycin D, 1-dehydrotestosterone, glucocorticoids, procaine, tetracaine, lidocaine, propranolol, and puromycin and analogs or homologs thereof. In some embodiments, the label for the antibody is a fluorescent label.

[0301] A fluorescent label or tag for labeling the antibody may be Hydroxycoumarin, Succinimidyl ester, Aminocoumarin, Succinimidyl ester, Methoxycoumarin, Succinimidyl ester, Cascade Blue, Hydrazide, Pacific Blue, Maleimide, Pacific Orange, Lucifer yellow, NBD, NBD-X, R-Phycoerythrin (PE), a PE-Cy5 conjugate (Cychrome, R670, Tri-Color, Quantum Red), a PE-Cy7 conjugate, Red 613, PE-Texas Red, PerCP, Peridinin chlorphyll protein, TruRed (PerCP-Cy5.5 conjugate), FluorX, Fluoresceinisothyocyanate (FITC), BODIPY-FL, TRITC, X-Rhodamine (XRITC), Lissamine Rhodamine B, Texas Red, Allophycocyanin (APC), an APC-Cy7 conjugate, Alexa Fluor 350, Alexa Fluor 405, Alexa Fluor 430, Alexa Fluor 488, Alexa Fluor 500, Alexa Fluor 514, Alexa Fluor 532, Alexa Fluor 546, Alexa Fluor 555, Alexa Fluor 568, Alexa Fluor 594, Alexa Fluor 610, Alexa Fluor 633, Alexa Fluor 647, Alexa Fluor 660, Alexa Fluor 680, Alexa Fluor 700, Alexa Fluor 750, Alexa Fluor 790, Cy2, Cy3, Cy3B, Cy3.5, Cy5, Cy5.5 or Cy7.

[0302] As used herein, the term "vector" refers to a nucleic acid molecule capable of transporting another nucleic acid to which it has been linked. One type of vector is a "plasmid", which refers to a circular double stranded DNA loop into which additional nucleic acid segments can be ligated. Another type of vector is a viral vector, wherein additional nucleic acid segments can be ligated into the viral genome. Certain vectors are capable of autonomous replication in a host cell into which they are introduced (e.g., bacterial vectors having a bacterial origin of replication and episomal mammalian vectors). Other vectors (e.g., non-episomal mammalian vectors) are integrated into the genome of a host cell upon introduction into the host cell, and thereby are replicated along with the host genome. Moreover, certain vectors are capable of directing the expression of genes to which they are operatively linked. Such vectors are referred to herein as "recombinant expression vectors", or more simply "expression vectors." In general, expression vectors of utility in recombinant DNA techniques are often in the form of plasmids. In the present specification, "plasmid" and "vector" can be used interchangeably as the plasmid is the most commonly used form of vector. However, the invention is intended to include such other forms of expression vectors, such as viral vectors (e.g., non-integrating viral vectors or replication defective retroviruses, lentiviruses, adenoviruses and adeno-associated viruses), which serve equivalent functions. In one embodiment, lentiviruses are used to deliver one or more siRNA molecule of the present invention to a cell.

[0303] As used herein, the term "non-integrating viral vector" refers to a viral vector that does not integrate into the host genome; the expression of the gene delivered by the viral vector is temporary. Since there is little to no integration into the host genome, non-integrating viral vectors have the advantage of not producing DNA mutations by inserting at a random point in the genome. For example, a non-integrating viral vector remains extra-chromosomal and does not insert its genes into the host genome, potentially disrupting the expression of endogenous genes. Non-integrating viral vectors can include, but are not limited to, the following: adenovirus, alphavirus, picornavirus, and vaccinia virus. These viral vectors are "non-integrating" viral vectors as the term is used herein, despite the possibility that any of them may, in some rare circumstances, integrate viral nucleic acid into a host cell's genome. What is critical is that the viral vectors used in the methods described herein do not, as a rule or as a primary part of their life cycle under the conditions employed, integrate their nucleic acid into a host cell's genome. It goes without saying that an iPS cell generated by a non-integrating viral vector will not be administered to a subject unless it and its progeny are free from viral remnants.

[0304] As used herein, the term "viral remnants" refers to any viral protein or nucleic acid sequence introduced using a viral vector. Generally, integrating viral vectors will incorporate their sequence into the genome; such sequences are referred to herein as a "viral integration remnant". However, the temporary nature of a non-integrating virus means that the expression, and presence of, the virus is temporary and is not passed to daughter cells. Thus, upon passaging of a re-programmed cell the viral remnants of the non-integrating virus are essentially removed.

[0305] As used herein, the phrases "free of viral integration remnants" and "substantially free of viral integration remnants" refers to iPS cells that do not have detectable levels of an integrated adenoviral genome or an adenoviral specific protein product (i.e., a product other than the gene of interest), as assayed by PCR or immunoassay. Thus, the iPS cells that are free (or substantially free) of viral remnants have been cultured for a sufficient period of time that transient expression of the adenoviral vector leaves the cells substantially free of viral remnants.

[0306] Within an expression vector, "operably linked" is intended to mean that the nucleotide sequence of interest is linked to the regulatory sequence(s) in a manner which allows for expression of the nucleotide sequence (e.g., in an in vitro transcription/translation system or in a target cell when the vector is introduced into the target cell). The term "regulatory sequence" is intended to include promoters, enhancers and other expression control elements (e.g., polyadenylation signals). Such regulatory sequences are described, for example, in Goeddel; Gene Expression Technology: Methods in Enzymology 185, Academic Press, San Diego, Calif. (1990). Regulatory sequences include those which direct constitutive expression of a nucleotide sequence in many types of host cell and those which direct expression of the nucleotide sequence only in certain host cells (e.g., tissue-specific regulatory sequences). Furthermore, the RNA interfering agents may be delivered by way of a vector comprising a regulatory sequence to direct synthesis of the siRNAs of the invention at specific intervals, or over a specific time period. It will be appreciated by those skilled in the art that the design of the expression vector can depend on such factors as the choice of the target cell, the level of expression of siRNA desired, and the like.

[0307] The expression vectors of the invention can be introduced into target cells to thereby produce siRNA molecules of the present invention. In one embodiment, a DNA template, e.g., a DNA template encoding the siRNA molecule directed against the mutant allele, may be ligated into an expression vector under the control of RNA polymerase III (Pol III), and delivered to a target cell. Pol III directs the synthesis of small, noncoding transcripts which 3' ends are defined by termination within a stretch of 4-5 thymidines. Accordingly, DNA templates may be used to synthesize, in vivo, both sense and antisense strands of siRNAs which effect RNAi (Sui, et al. (2002) PNAS 99(8):5515).

[0308] As used in this specification and the appended claims, the singular forms "a," "an," and "the" include plural references unless the context clearly dictates otherwise. Thus for example, references to "the method" includes one or more methods, and/or steps of the type described herein and/or which will become apparent to those persons skilled in the art upon reading this disclosure and so forth. It is understood that the foregoing detailed description and the following examples are illustrative only and are not to be taken as limitations upon the scope of the invention. Various changes and modifications to the disclosed embodiments, which will be apparent to those of skill in the art, may be made without departing from the spirit and scope of the present invention.

[0309] As used herein, the term "comprising" or "comprises" is used in reference to compositions, methods, and respective component(s) thereof, that are essential to the invention, yet open to the inclusion of unspecified elements, whether essential or not.

[0310] As used herein, the term "consisting essentially of" refers to those elements required for a given embodiment. The term permits the presence of additional elements that do not materially affect the basic and novel or functional characteristic(s) of that embodiment of the invention.

[0311] As used herein, the term "consisting of" refers to compositions, methods, and respective components thereof as described herein, which are exclusive of any element not recited in that description of the embodiment.

[0312] All patents, patent applications, and publications identified are expressly incorporated herein by reference for the purpose of describing and disclosing, for example, the methodologies described in such publications that might be used in connection with the present invention. These publications are provided solely for their disclosure prior to the filing date of the present application. Nothing in this regard should be construed as an admission that the inventors are not entitled to antedate such disclosure by virtue of prior invention or for any other reason. All statements as to the date or representation as to the contents of these documents are based on the information available to the applicants and do not constitute any admission as to the correctness of the dates or contents of these documents.

Examples

DNA Methylation and Demethylation

[0313] DNA methylation and demethylation play a vital role in mammalian development. In mammals, DNA methylation occurs primarily on cytosine in the context of the dinucleotide CpG. DNA methylation is dynamic during early embryogenesis and has a crucial role in parental imprinting, X-inactivation and silencing of endogenous retroviruses. Embryonic development is accompanied by remarkable changes in the methylation status of individual genes, whole chromosomes and, at times, the entire genome (A. Bird, Genes Dev 16: 6-21 (2002); W. Reik, Nature 447: 425-432 (2007); K. Hochedlinger, Nature 441: 1061-1067 (2006); M. A. Surani Cell 128: 747-762 (2007); J. B. Gurdon, Annu Rev Cell Dev Biol 22: 1-22 (2006)). There is active genome-wide demethylation of the paternal genome shortly after fertilization (W. Mayer, Nature 403: 501-502 (2000); J. Oswald, Curr Biol 10: 475-478 (2000)). DNA demethylation is also an important mechanism by which germ cells are reprogrammed: the development of primordial germ cells (PGC) during early embryogenesis involves widespread DNA demethylation that may be mediated by an active (i.e. replication-independent) mechanism (A. Bird, Genes Dev 16: 6-21 (2002); W. Reik, Nature 447: 425-432 (2007); K. Hochedlinger, Nature 441: 1061-1067 (2006); M. A. Surani Cell 128: 747-762 (2007); J. B. Gurdon, Annu Rev Cell Dev Biol 22: 1-22 (2006); W. Mayer, Nature 403: 501-502 (2000); J. Oswald, Curr Biol 10: 475-478 (2000)).

[0314] De novo DNA methylation and demethylation are also prominent in somatic cells during differentiation, tumorigenesis and aging. Expression of differentiation-specific genes in somatic cells is often accompanied by progressive DNA demethylation (W. Reik, Nature 447: 425-432 (2007); K. Hochedlinger, Nature 441: 1061-1067 (2006); M. A. Surani Cell 128: 747-762 (2007)), but it is not clear whether this process reflects an "active" process (see below) or "passive" demethylation occurring as a result of exclusion of Dnmt1 during replication. In cultured breast cancer cells, gene expression in response to oestrogen has been shown to be accompanied by waves of apparent DNA demethylation and remethylation that are clearly not coupled to replication (H. Cedar, Nature 397: 568-569 (1999); S. K. Ooi, Cell 133:1145-1148 (2008)). Moreover, tight regulation of DNA demethylation is a likely feature of pluripotent stem cells and progenitor cells in cellular differentiation pathways, that could plausibly contribute to the ability of these cells to self-renew as well as to give rise to daughter differentiating cells. In fact, it has been proposed that pluripotency and the ability to self-renew, two important aspects of stem cell function, involve (or require) proper DNA demethylation (W. Reik, Nature 447: 425-432 (2007); K. Hochedlinger, Nature 441: 1061-1067 (2006); M. A. Surani Cell 128: 747-762 (2007); S. Simonsson, Nat. Cell. Biol. 6: 984-990 (2004); R.Blelloch, Stem Cells 24: 2007-2013 (2006)) and as such, could be improved by controlled expression of enzymes in the DNA demethylation pathway. Furthermore, DNA methylation is highly aberrant in cancer, with global loss of methylation as well as increased methylation leading to silencing of tumor suppressor genes (L. T. Smith, Trends Genet 23: 449-456 (2007); E. N. Gal-Yam, Annu Rev Med 59: 267-280 (2008); M. Esteller Nature Rev Cancer; 8: 286-298 (2007); M. Esteller, N Engl J Med, 358: 1148-1159 (2008)), thus it seems possible that cancer cells aberrantly turn on the DNA demethylation pathway, and that the self-renewing population of cancer stem cells is characterized by high levels of DNA demethylase activity. Overall, therefore, an understanding of the mechanism of active DNA demethylation has broad implications for our understanding of mammalian development, cell differentiation, cancer, stem cell function and aging.

[0315] DNA demethylation can proceed by two possible mechanisms--"passive" replication-dependent demethylation and a postulated process of active demethylation for which the molecular basis is still unknown (see below). The passive mechanism is fairly well understood. Normally, cytosine methylation in CpG dinucleotides is symmetric, i.e. occurs on both strands. Hemimethylated CpG's, which are generated during replication of symmetrically-methylated DNA, are recognized by DNA methyltransferase (Dnmt) 1 and are rapidly remethylated. This process is facilitated by interaction of Dnmt1 with proliferating cell nuclear antigen PCNA, which targets Dnmt1 to the replication fork and ensures rapid restoration of the symmetrical pattern of DNA methylation (H. Leonhardt, 1: 865-873, (1992), L. S. Chuang, Science, 277: 1996-2000 (1997).

[0316] If Dnmt1 activity is inhibited or Dnmt1 is excluded from the replication fork for any reason, remethylation of the CpG on the opposite strand does not occur and only one of the two daughter strands retains cytosine methylation. "Passive" demethylation is typically observed during cell differentiation, where it accompanies the increased expression of lineage-specific genes (D.U. Lee, Immunity, 16: 649-660 (2002)). Over a prolonged time period (3-7 cycles of DNA replication), cytosine methylation is progressively lost from genes whose expression increases as a result of cell differentiation.

[0317] So far, enzymes with the ability to demethylate DNA by an active mechanism have not been identified as molecular entities. There is evidence that active DNA demethylation occurs in certain carefully-controlled circumstances: for instance, the paternal genome is actively demethylated shortly after fertilization, well prior to DNA replication (J. B. Gurdon, Annu Rev Cell Dev Biol 22: 1-22 (2006); W. Mayer, Nature 403: 501-502 (2000)). Early development of primordial germ cells (PGC) also involves widespread demethylation that may be mediated by active DNA demethylation (W. Reik, Nature 447: 425-432 (2007); K. Hochedlinger, Nature 441: 1061-1067 (2006); M. A. Surani Cell 128: 747-762 (2007); P. Hajkova, Nature, 452: 877-881 (2008); N. Geijsen, Nature, 427: 148-154 (2004)).The mechanism of active demethylation is not known, and various disparate mechanisms have been postulated, including direct removal of the methyl group (i.e. direct conversion of 5-methylcytosine (5mC) into cytosine, a thermodynamically unfavourable process that involves cleavage of a carbon-carbon bond and results in release of the methyl moiety), and methylcytosine-specific DNA repair through the activity of methylcytosine-specific or T/G mismatch-specific DNA glycosylases, and methylcytosine-specific DNA deamination or other modification such as glycosylation or hydroxymethylation, also followed by DNA repair (reviewed in (H. Cedar, Nature, 397: 568-569 (1999), S. K. Ooi, Cell 133: 1145-1148 (2008)). However, no proteins (or set of proteins) with these postulated activities have been reliably identified to date.

Identification of a Novel Family of 2OG-Fe(II) Oxygenases with Predicted DNA Modification Activities

[0318] 5-methylcytosine (5mC) is a minor base in mammalian DNA: It constitutes .about.1% of all DNA bases and is found almost exclusively as symmetrical methylation of the dinucleotide CpG (M. Ehrlich and R. Y. Wang, Science 212, 1350 (1981)). The majority of methylated CpG is found in repetitive DNA elements, suggesting that cytosine methylation evolved as a defense against transposons and other parasitic elements (M. G. Goll, et al., Annu. Rev. Biochem. 74, 481 (2005)). Methylation patterns change dynamically in early embryogenesis, when CpG methylation is essential for X-inactivation and asymmetric expression of imprinted genes (W. Reik, Nature 447, 425 (2007)). In somatic cells, promoter methylation often shows a correlation with gene expression: CpG methylation may directly interfere with the binding of certain transcriptional regulators to their cognate DNA sequences or may enable recruitment of methyl-CpG binding proteins that create a repressed chromatin environment (A. Bird, Genes Dev. 16, 6 (2002)). DNA methylation patterns are highly dysregulated in cancer: Changes in methylation status have been postulated to inactivate tumor suppressors and activate oncogenes, thus contributing to tumorigenesis (E. N. Gal-Yam, et al., Annu. Rev. Med. 59, 267 (2008)).

[0319] Trypanosomes contain base J (b-D-glucosylhydroxymethyluracil), a modified thymine produced by sequential hydroxylation and glucosylation of the methyl group of thymine (P. Borst and R. Sabatini, Annu. Rev. Microbiol. 62, 235 (2008)). J biosynthesis requires JBP1 and JBP2, enzymes of the 20G- and Fe(II) dependent oxygenase superfamily predicted to catalyze the first step of J biosynthesis (Z. Yu et al., Nucleic Acids Res. 35, 2107 (2007); L. J. Cliffe et al., Nucleic Acids Res. 37, 1452 (2009)). Like 5-methylcytosine, base J has an association with gene silencing: It is present in silenced copies of the genes encoding the variable surface glycoprotein (VSG) responsible for antigenic variation in the host but is absent from the single expressed copy (P. Borst and R. Sabatini, Annu. Rev. Microbiol. 62, 235 (2008)).

[0320] We used bioinformatic analysis to predict that the putative mammalian oncogenes TET1, TET2 and TET3 belong to the class of enzymes containing 2OG-Fe(II) oxygenase domains. To identify homologs of the 2OG-Fe(II) oxygenase domain of JBP1 and JBP2, they were included in a profile of 2OG-Fe(II) oxygenases and a systematic search of the non-redundant database, as well as the protein sequence database of microbes from environmental samples, with their conserved catalytic domain using the PSI-BLAST program, was conducted. A further search of the non-redundant database, with proteins newly detected as a result of this search also included in the profile, and using iterative sequence profile searches, using the predicted oxygenase domains of JBP1 and JBP2, was used to recover homologous regions in three paralogous human proteins (oncogenes) TET1 (CXXC6), TET2, and TET3 (R. B. Lorsbach, Leukemia, 17(3):637-41 (2003)) and their orthologs found throughout metazoa (e<10-5), as well as homologous domains in fungi and algae. In PSI-BLAST searches of these groups of homologous domains consistently recovered each other prior to recovering any other member of the 2OG-Fe(II) oxygenase superfamily, indicating that they formed a distinctive family within it.

[0321] To confirm the relationship of the newly-identified proteins (hereinafter referred to as the JBP1/2 family) with classical 2OG-Fe(II) oxygenases, a multiple alignment of their shared conserved domains was prepared.

[0322] Secondary structure predictions pointed to a continuous series of .beta.-strands with an N-terminal a-helix, which is typical of the double-stranded .beta.-helix (DSBH) fold of the 2OG-Fe(II) oxygenases (L. Aravind and E. V. Koonin, Genome Biol. 2, RESEARCH 0007 (2001)). A multiple sequence alignment showed that the new TET/JBP family displayed all of the typical features of 2OG-Fe(II) oxygenases, including conservation of residues predicted to be important for coordination of the cofactors Fe(II) and 20G. The metazoan TET proteins contain a unique conserved cysteine-rich region, contiguous with the N terminus of the DSBH region. Vertebrate TET1 and TET3, and their orthologs from all other animals, also possess a CXXC domain, a binuclear Zn-chelating domain, found in several chromatin-associated proteins, that in certain cases has been shown to discriminate between methylated and unmethylated DNA (M. D. Allen et al., EMBO J. 25, 4503 (2006)).

[0323] Thus, we have identified the TET subfamily as having structural features characteristic of enzymes that oxidize 5-methylpyrimidines. We have shown that the domain structure of the TET subfamily proteins, includes the CXXC domain, the "C" or Cys-rich domain, and the 2OG-Fe(II) oxygenase domain containing a large, low complexity insert.

[0324] The conserved features of the TET family of proteins include: (i) the H.times.D sequence (where x is any amino acid) associated with the extended region after the first strand which chelates Fe(II); (ii) the GG sequence at the beginning of strand 4 which helps in positioning the active site arginine; (iii) the HXs sequence (where s is a small residue) in the penultimate conserved strand, in which the H chelates the Fe(II) and the small residue helps in binding the 2-oxo acid; (iv) the RX5a sequence (where a is an aromatic residue: F,Y,W) in the last conserved strand of the domain. The R in this motif forms a salt bridge with the 2 oxo acid and the aromatic residue helps in position the first metal-chelating histidine. The JBP1/2 family is unified by the presence of a distinctive proline in the N-terminal conserved helix (which might result in a characteristic kink in the first helix of this subfamily) and a conserved aromatic residue (typically part of a sX2F sequence; `s` being a small residue) in the first conserved strand. These observations indicated that TET1, TET2, and TET3, as well as the majority of JBP1/2 homologs from diverse phage, fungal, algal and animal sources, are catalytically-active 2OG-Fe(II) oxygenases. We have shown that when the conserved H.times.D motif is mutated to Y.times.A catalytic activity is eliminated.

[0325] We have shown that the vertebrate TET1 and TET3 and their orthologs (the TET subfamily) from all other animals show a fusion of the 2OG-Fe(II) oxygenase domain with a N-terminal CXXC domain, as depicted in FIG. 5. The CXXC domain is a binuclear Zn-chelating domain with 8 conserved cysteines and 1 histidine that is found in several chromatin-associated proteins, including the animal DNA methylase DNMT1 and the methylated DNA-binding MBD1. Different versions of this domain have been shown to bind specifically to DNA containing methylated cytosine, either on both strands or just a single strand. This feature, when seen in light of the relationship with JBP1/2 and the phage proteins, suggested to us that the TET subfamily operates on methylcytosine to catalyze oxidation or oxidative removal of the methyl group.

[0326] Additionally, the TET subfamily is characterized by a unique conserved domain (here termed the Cys-rich or "C" domain). This domain is contiguous with the N-terminus of the 2OG-Fe(II) oxygenase domain, and contains at least 8 conserved cysteines and 1 histidine that are likely to comprise a binuclear metal cluster. Based on the position of the N-terminal extensions of the AlkB protein, at least a part of the "C" domain could be similarly positioned and form an extended DNA recognition surface. The 2OG-Fe(II) oxygenase domain of the TET family contains a large, low complexity insert predicted to have a predominantly unstructured conformation. It occurs within the DSBH fold exactly in the same position as an unstructured insert seen in the prolyl hydroxylases. Based on the structure of the prolyl hydroxylases, this insert is likely to be located on the exterior surface of the protein, stacked against one face of the DSBH. Its persistence across the entire family despite lack of sequence conservation indicates that it might form a generalized protein-protein interaction surface.

[0327] Thus, the total weight of the contextual information available for the JBP1/2 family supports a conserved modification function for the entire family, namely oxidation of 5-methylpyrimidines in DNA or RNA. Without wishing to be limited or bound by a theory, we envision that the activity of this family of enzymes need not be restricted to hydroxymethylation of 5-methylcytosine; certain family members could act as dioxygenases for other pyrimidines, either free, in small nucleic acids such as microRNAs, in DNA or in RNA; or could mediate further oxidation steps beyond hydroxymethylation, for instance to an aldehyde or an acid.

Experimental Analysis of the TET Subfamily: Cells Expressing TET1 Show Decreased Staining for 5-Methylcytosine

[0328] To test the computational predictions for the human TET subfamily, all three human TET proteins were subcloned into mammalian expression vectors with tandem FLAG and HA tags. Importantly, TET1/CXXC6 is known to be associated with the development of acute myeloid leukemia in the context of t(10;11)(q22;23) translocations, which result in the expression of TET1:MLL fusion proteins that maintain the predicted catalytic domain of TET1 while losing the SET methyltransferase domain of MLL (R. B. Lorsbach, Leukemia, 17(3):637-41 (2003); R. Ono, Cancer Res 62: 4075-4080 (2002)).

[0329] To examine the effect of TET1 on overall DNA methylation levels, FLAG- and HA-tagged full-length TET1 or its C-terminal Cys-rich+DSBH domains (hereafter referred to as the C+D domain) was expressed in human embryonic kidney (HEK) 293 cells. Two days later, we stained the cells for 5-methylcytosine content using a 5-methylcytosine-specific antibody and for TET1 expression using an antibody to the HA epitope tag. We showed that mock-transfected cells showed substantial variation in 5-methylcytosine staining intensity (FIG. 6), either because 5-methylcytosine levels vary from cell to cell or because the accessibility of 5-methylcytosine to the antibody differs among cells because of technical considerations (e.g., incomplete denaturation of DNA).

[0330] We found that cells transfected with wild-type TET1 showed a strong correlation of HA positivity with decreased staining for 5-methylcytosine, both visually and by quantification (FIG. 6). Untransfected HA-low cells showed a spread of 5-methylcytosine staining intensity similar to that of mock-transfected cells, whereas productively transfected HA-high cells showed uniformly low 5-methylcytosine staining intensity (FIG. 6).

[0331] We have demonstrated that overexpression of catalytically active TET subfamily proteins leads to decreased staining with a monoclonal antibody directed against 5-methylcytosine. We have shown that catalytically active TET1 causes a substantial decrease in nuclear staining for 5-methylcytosine (5mC) in transfected HEK293 cells. We have also quantified the relation between 5-methylcytosine staining and HA/TET1 staining on a per-cell basis using the Cell Profiler program. We found that cells expressing full-length TET1 show a substantial decrease in 5-methylcytosine staining relative to mock-transfected cells (FIG. 6). The loss of 5-methylcytosine staining is even more striking in cells expressing only the C+D domain of TET1, but is far less apparent in cells expressing a mutant C+D domain in which two of the predicted catalytic residues of the predicted 2OG-Fe(II) oxygenase domain, His1672 and Asp1674, are mutated to tyrosine and alanine respectively (numbers refer to residues in full-length TET1).

[0332] We used the Cell Profiler program to quantify the relation between 5-methylcytosine staining and HA staining on a per-cell basis. We found that mock-transfected cells show a wide spread in 5-methylcytosine staining intensity, most likely because access of the anti-5-methylcytosine antibody to the methylated cytosine requires complete denaturation of the DNA. In the population of cells transfected with full-length TET1 or the C+D domain of TET1, we found that the 5-methylcytosine staining intensity of the untransfected (HA-low) subpopulation overlaps with that of the mock-transfected population, but the productively transfected (HA-high) population shows a clear decrease in the intensity of 5-methylcytosine staining (FIG. 6). In contrast, we found that HA-positive cells expressing the mutant H1672Y, D1674A C+D domain show a distribution of 5-methylcytosine staining intensity that is much more similar to that of the mock-transduced cells.

[0333] We also found that, notably, cells expressing the C+D domain display a distinct increase in nuclear size, which again is much less apparent in cells expressing the mutant protein, and we also quantified this effect.

A Novel Nucleotide in DNA from Cells Expressing TET1

[0334] The loss of 5-methylcytosine staining in TET1-expressing cells suggested to us that the 5-methylcytosine in these cells was being modified in some way. To detect the modified nucleotide, we developed an assay based on thin-layer chromatography (TLC) to detect the relative levels of cytosine and 5-methylcytosine in cells. Herein, we demonstrate that TET1 expression leads to the generation of a novel nucleotide. Briefly, DNA is subjected to cleavage with MspI, a methylation-insensitive enzyme that cuts at the sequence CCGG regardless of whether or not the internal CpG is methylated on cytosine. The resulting fragments, whose 5' ends derive from the dinucleotide CpG, contain either cytosine or 5-methylcytosine (H. Cedar et al., Nucleic Acids Res. 6, 2125 (1979)). The DNA is then treated with calf intestinal phosphatase (CIP), end-labeled with polynucleotide kinase (PNK), hydrolysed to dNMPs with snake venom phosphodiesterase (SVPD) and DNase I, and the nucleotides are separated by thin-layer chromatography.

[0335] We demonstrate that our TLC assay detected a novel nucleotide in genomic DNA of cells transfected with catalytically active full-length TET1 or its catalytic fragment (C+D)--the appearance of this novel nucleotide depended both on 5-methylcytosine and on the expression of catalytically active full-length TET1 or its catalytic fragment (C+D) in HEK293 cells. To determine if TET1 altered the relative levels of unmethylated and methylated cytosine in cells, HEK293cells were transfected with control vector or vector encoding full-length or C+D TET1 or their mutant versions, following which DNA was extracted from the entire transfected population and subjected to digestion, end-labeling and TLC. Compared to MspI-digested DNA from cells transfected with the control vector, MspI-digested DNA from cells expressing wildtype, but not mutant, full-length or C+D TET1 yielded a novel labeled spot migrating between dCMP and dTMP. We showed that catalytically active (wt) but not catalytically inactive (mut) TET1 alters the relative levels of unmethylated and methylated cytosine in transfected HEK293 cells and results in the appearance of the novel nucleotide, and this was particularly apparent with the catalytic C+D fragment. We show that the intensity of this spot correlated with a decrease in the intensity of the 5-methyl-dCMP (5m-dCMP) spot, suggesting strongly that the spot was derived from 5-methyl-dCMP and not from dCMP. We also demonstrate that neither the 5-methylcytosine spot nor the new spot were observed when the DNA was digested with HpaII, a methylation-sensitive isoschizomer of MspI which cuts DNA at the sequence CCGG but only if the internal CpG dinucleotide is unmethylated, again indicating that the spot was likely to be a derivative of 5-methyl-dCMP; this is because both 5-methylcytosine and cytosine are present at the 5' end of MspI fragments and are therefore labeled by polynucleotide kinase, but only cytosine is represented at the 5' end of DNA fragments produced by the methylation-sensitive isoschizomer HpaII.

[0336] To confirm that the spot was not an artefact of MspI digestion, we tested another methylation-insensitive enzyme, Taq.alpha.1, whose restriction site (TCGA) includes a central CG dinucleotide. As with MspI, both 5-methylcytosine and cytosine are present at the 5' end of DNA fragments produced by Taq.alpha.1, and are therefore labeled. We show that Taq.alpha.1, a methylation-insensitive enzyme which cuts at the sequence TCGA, gives the same results as MspI, a methylation-insensitive enzyme which cuts at the sequence CCGG. Once again, the novel spot was observed in Taq.alpha.1-digested DNA from cells expressing wildtype, but not mutant, full-length or C+D TET1, and again the intensity of the spot correlated with a decrease in the intensity of the 5-methyl-dCMP spot.

[0337] FIG. 7 shows these experiments represented using line scans of the phosphorimaging of the labeled spots on the TLC plate. These experiments confirmed the correlation between loss of 5-methylcytosine and appearance of the novel nucleotide in cells expressing full-length (FL) or C+D TET1, but not FL mut or C+D mut.

Identification of the Novel Nucleotide as 5-Hydroxymethyl-dCMP

[0338] We identified the novel nucleotide produced by TET1 expression as 5-hydroxymethyl-dCMP. We subcloned full-length and C+D TET1 and their mutant versions into a vector containing an cassette in which expression of human CD25 was driven by an internal ribosome entry site (IRES). This strategy allowed identification and sorting of transfected cells that co-expressed TET1 and CD25, and the acquisition of samples from a preparative TLC.

[0339] We showed the generation of expression plasmids based on pEF1 and used to express full-length TET1 or its C+D catalytic domain, either wildtype (wt) or mutant (mut), together with an IRES-human CD25 cassette, and we demonstrated that successfully-transfected cells were marked with CD25 expression. The cells were sorted for CD25 expression to enrich for the TET1-expressing cell population, genomic DNA was isolated and subjected to MspI cleavage, treatment with calf intestinal phosphatase (CIP) end-labeling with polynucleotide kinase (PNK), hydrolysis to dNMPs with snake venom phosphodiesterase (SVPD) and DNase I, and thin-layer chromatography. The results of the TLC assay showed that the novel nucleotide ("new spot") is only observed in DNA from cells transfected with the catalytically-active (C+D) fragment of TET1, and not in DNA from cells transfected with empty vector or the catalytically-inactive mutant version of (C+D). FIG. 8 depicts theses experiments as line scans of the labeled spots on the TLC plate, using phosphorimager analysis.

[0340] Experiments to determine the identity of the unknown nucleotide by mass spectrometry were performed. Ultra performance liquid chromatography was carried out using Acquity UPLC system (Waters Corp., Milford, Mass.). Waters HSS C18 column (1.0 mm i.d..times.50 mm, 1.8-um particles) was used. The mobile phases were 0.1% aqueous ammonium formate (A, pH6.0) and Methanol (B). After initial equilibration at 100% A, the methanol was increased linearly from 0% to 50% over 15 minutes and then to 100% within 10 minutes and stay at 100% MeOH for 2 minutes before getting back to 0% methanol in 10 min to flush the column. The column was then allowed to re-equilibrate by holding 100% A for 7 min prior to subsequent analyses. The flow rate was 0.05 ml min-1 and the eluant was directly injected into the mass spectrometer. Mass spectrometry analysis was carried out using a Q-tof Premier mass spectrometer (Waters Corp., Milford, Mass.) fitted with an electrospray interface. Data were acquired and processed with Masslynx 4.1 software. Instrument tuning and mass calibration were carried out using 1 mM sodium acetate solution (in 1:4 H2O: ACN). Mass spectra were recorded in the negative mode within m/z 300-500 for LC/MS runs, and within 50-350 for LC/MS/MS runs. The quad was set to allow all ions to pass through in the LC/MS runs, and was set to focus on the specific mass of the targeted parent ions for fragmentation in the LC/MS/MS runs. For all characterizations, Ultra pure water was obtained from a Milli-Q water purification system (Millipore). All solvents and modifiers used were mass spectrometry grade. Methanol was purchase from Fisher Scientific. Ammonium formate was obtained from Sigma. To determine the identity of the unknown nucleotide (336.06 Da signal in negative mode), LC/MS and LC/MS/MS experiments were performed in which the samples eluted from TLC plate were frozen, lyophilized, and re-suspended in water for on-line LC/MS and LC/MS/MS analysis.

[0341] The region containing the unknown spot was excised from preparative TLC plates, and XCMS was used to compare the ion intensities of the signals obtained by processing DNA from cells expressing the wild-type versus the mutant version of TET1 C+D (FIG. 9A). After background subtraction (of the values obtained from a control run of the solvent gradient with Milli-Q water injection), a single species of 336.0582 Da was the only one which showed a significant difference in intensity between the two samples. We found that the intensity of the signal from this species in the wildtype sample was .about.19-fold greater than that in the wild-type sample, whereas for all other species the signal intensity ratio was smaller than 2. Considering the large errors involved in the extraction of samples by scraping TLC plates, species with signal intensity ratios smaller than 2 can reasonably be ignored. The mass of 336.06 Da is consistent with a molecular formula of C.sub.10H.sub.15NO.sub.8P.sup.-, or 5-hydroxymethyl cytosine, an oxidation product which from our bioinformatic analysis could reasonably be produced by TET1.

[0342] LC/MS/MS runs were carried out at several collision energies: 15, 25, 35V (not shown) and 50V, in both positive and negative modes. 5-hydroxymethylcytosine from T4 phage was used as standard for comparison. For straight comparison, all the LC and MS/MS parameters were kept exactly the same for the unknown nucleotide and the 5-hydroxymethylcytsoine standard in each MS/MS run. After background subtraction (of the MS/MS of wild-type blank sample) by Matlab 7.1 (The MathWorks, Inc.) the MS/MS spectra of the unknown nucleotide looked exactly the same as those corresponding MS/MS spectra from the T4 5-hydroxymethylcytosine standard.

[0343] Since 5-hydroxymethylcytosine is not commercially available, a biological source of this nucleotide was sought. The genomes of T-even phages contain hydroxymethylcytosine, which is normally almost completely glucosylated by enzymes in their E. coli hosts. This modification protects them from bacterial restriction enzymes such as McrBC, which recognise and cleave DNA containing either 5-methylcytosine or 5-hydroxymethylcytosine. If these phages are grown in E. coli ER1656, a strain deficient in the glucose donor molecule UDP glucose, lacking GalU (the enzyme that catalyses formation of the glucose donor UDP-Glucose) and the McrA and McrB1 components of McrBC, they remain unglucosylated and their DNA can be used as a source of 5-hydroxymethylcytosine. Indeed, through TLC analysis we showed that DNA from T4 phage grown in galU, mcrA, mcrB1 E. coli hosts yields only 5-hydroxymethylcytosine and no cytosine or 5-methylcytosine. The 5-hydroxymethylcytosine migrates similarly to the novel nucleotide obtained from TET1-expressing cells. We showed that the novel nucleotide spot is present only in cells expressing the wild-type C+D domains, and migrates similarly by TLC analysis to authentic 5-hydroxymethylcytosine obtained from T4 phage grown in GalU-deficient E. Coli hosts. As we show in FIG. 9, the unknown nucleotide was determined to be identical to authentic 5-hydroxymethylcytosine obtained from T4 phage grown in GalU-deficient E. Coli hosts, by using LC/MS/MS runs carried out in negative mode with collision energies of 15V and 25V.

Physiological Importance of TET1 in Gene Regulation.

[0344] We have shown that a recombinant protein comprising the catalytic domain (C+D) of human TET1, expressed in baculovirus expression vector in insect Sf9 cells, is active in converting 5-methylcytosine to 5-hydroxylmethylcytosine in vitro. Further, the catalytically active TET1 fragments shows an absolute requirement for Fe(II) and 20G. Omission of ascorbate did not result in a significant decrease in catalytic activity, most likely because dithiothreitol was included in the reaction to counteract the strong tendency of TET1-CD to oxidize (L. Que Jr., et al., Chem. Rev. 96, 2607 (1996); C. Loenarz, and C. J. Schofield, Nat. Chem. Biol. 4, 152 (2008); L. E. Netto and E. R. Stadtman, Arch. Biochem. Biophys. 333, 233 (1996)). We showed that recombinant TET1-CD was specific for 5-methylcytosine, as conversion of thymine to hydromethyluracil (hmU) was not detected.

[0345] We used an SDS polyacrylamide gel stained with Coomassie Blue in which lane 1 had molecular weight markers, lanes 2-4 were loaded with the indicated amounts of bovine serum albumin (BSA) (2, 1 and 0.5 microgram), lanes 5-8 were loaded with eluted protein from the FLAG affinity column used to purify C+D and C+D mutant (mut). Lanes 5 and 6 had 1.6 micrograms of C+D and mut respectively, and lanes 7 and 8 had 5 micrograms of C+D and mut respectively. The band around 90 kDa represents the TET1 fragment and the bands of higher apparent molecular weight are oxidized versions of the same fragment. We used anti-FLAG western blots loaded with different fractions from the FLAG affinity columns used to purify C+D and C+D mut respectively (Lys=cell lysate; sol=soluble; ins=insoluble; FT=flowthrough; W1=wash 1; W2=wash 2; Fg E1=1.sup.st elution with FLAG peptide; Fg E2=2.sup.nd elution with FLAG peptide; low pH=final elution of column with low pH buffer). We showed that the recombinant C+D fragment of TET1 is catalytically active in vitro, and can produce hydroxymethyl-dCMP (Hm-dCMP) using either the fully-methylated oligo 1 or the hemimethylated oligo 3 as substrate, whereas the catalyticaly-inactive mutant C+D is not. We also showed the relative activity of the recombinant C+D fragment of TET1 in the presence of various combinations of Fe2+, ascorbic acid, a-KG and EDTA. Briefly, 10 mg of double-stranded DNA oligonucleotides containing a methylated Taq.alpha.1 site were incubated with 3 mg of GST-SMCX in a buffer containing 1 mM a-KG, 2 mM ascorbic acid, 75 mM Fe2+ for 3 hours at 37 C. The enzyme to substrate ratio is 1:10. Oligonucleotides were incubated under identical conditions with purified FlagHA-CD(DHD) as a negative control. Recovered oligonucleotides were digested with Taq.alpha.1, end-labeled with T4-PNK and g-32P-ATP and then hydrolyzed to dNMP's with DNaseI and snake venom phosphodiesterase. dNMP's were resolved using cellulose TLC plates and the relative amounts of dNMP's were quantitated using phosphorimager. Each condition was performed in triplicate. FIG. 10 shows the relative activity of the recombinant C+D fragment of TET1 in the presence of various combinations of Fe2+, ascorbic acid, a-KG and EDTA.

[0346] We demonstrated the physiological importance of TET1 in gene regulation. FIG. 11A demonstrates that Tea mRNA is strongly upregulated after 8 h of stimulation of mouse dendritic cells (DC) with LPS, a standard activating stimulus for DC. FIGS. 11B-11I shows the changes in Tet1, Tet2 and Tet3 mRNA levels in mouse ES cells that have been induced to differentiate by withdrawal of leukemia inhibitory factor (LIF) and addition of retinoic acid. We cultured v6.5 mouse ES cells on gelatin-coated wells in DMEM media supplemented with 15% FBS and 10.sup.3 units/ml of LIF. Twenty four hours after plating (DO time point), cells were either continually cultured in the presence of LIF or treated with 1 mM retinoic acid in the absence of LIF for up to 5 days. We showed phase contrast images of the cells, taken daily using a 20.times. objective. We collected cell samples daily for RNA extraction. We measured transcript levels of Tet1, Tet2,Tet3 and Oct4, normalized to b-actin levels, by quantitative RT-PCR and expressed relative to levels at DO. Error bars denote mean.+-.SD from 2 experiments. We showed that Tet1 and Tet2 and the positive control pluripotency gene Oct4 are downregulated, whereas Tet3 is upregulated, during RA-induced differentiation.

[0347] We asked whether 5-hydroxymethylcytosine was a physiological constituent of mammalian DNA. Using the TLC assay, we observed a clear spot corresponding to labeled 5-hydroxymethylcytosine in mouse embryonic stem (ES) cells. Quantification of multiple experiments indicated that 5-hydroxymethylcytosine and 5-methylcytosine constituted 4 to 6% and 55 to 60%, respectively, of all cytosine species in MspI cleavage sites (CACGG) in ES cells. We showed that Tea mRNA levels declined by 80% in response to leukemia inhibitory factor (LIF) withdrawal for 5 days, compared with the levels observed in undifferentiated ES cells; in parallel, 5-hydroxymethylcytosine levels diminished from 4.4 to 2.6% of total C species, a decline of .about.40% from control levels. The difference might be due to the compensatory activity of other Tet-family proteins. Similarly, RNA interference (RNAi)-mediated depletion of endogenous Tea resulted in an 87% decrease in Tet1 mRNA levels and a parallel .about.40% decrease in 5-hydroxymethylcytosine levels. Again, the difference is likely due to the presence of Tet2 and Tet3, which are both expressed in ES cells.

[0348] We show the effect of Tet RNAi on ES cell lineage gene marker expression. Twenty four hours after plating on gelatin-coated wells (DO time point), v6.5 ES cells were transfected with siGENOME SMARTpool (Dharmacon) siRNA targeting Tet1, Tet2 or Tet3, or a luciferase (luc)-targeting siRNA as a negative control, with Lipofectamine RNAiMAX (Invitrogen) in the presence of LIF. Cells were passaged and re-transfected pre-adherent at days 2 and 4 in the presence of LIF. Samples were collected at days 3 (D3) and 5 (D5) for RNA isolation. We took phase contrast images at day 5 (2 fields per transfection). Knockdown of Tet proteins causes appreciable spontaneous ES cell differentiation (especially apparent with Tet3 knockdown, right panels). FIG. 12 shows the degree of knockdown of Tet1, Tet2 and Tet3 RNA, measured by quantitative RT-PCR and normalized to Gapdh levels, in cells treated with Tet1, Tet2 and Tet3 siRNAs. FIG. 12 (middle and bottom rows) show expression of Tet1-Tet3, trophectoderm (Cdx2, Hand1, Psx1), primitive endoderm (Gata4), mesoderm (Brachyury) and primitive ectoderm (Fgf5) markers were measured by quantitative RT-PCR and normalized to Gapdh levels. The expression of D3 control siRNA treatment was set as reference.

[0349] Without wishing to be bound by a theory, our data indicate that Tet1, and other Tet family members, are responsible for 5-hydroxymethylcytosine generation in ES cells under physiological conditions. CpG dinucleotides are .about.0.8% of all dinucleotides in the mouse genome; thus, 5-hydroxymethylcytosine (which constitutes .about.4% of all cytosine species in CpG dinucleotides located in MspI cleavage sites) is .about.0.032% of all bases (.about.1 in every 3000 nucleotides, or .about.2.times.10.sup.6 bases per haploid genome). For comparison, 5-methylcytosine is 55 to 60% of all cytosines in CpG dinucleotides in MspI cleavage sites, about 14 times as high as 5-hydroxymethylcytosine (5-hydroxymethylcytosine may not be confined to CpG). An important question is whether 5-hydroxymethylcytosine and TET proteins are localized to specific regions of ES cell DNA--for instance, genes that are involved in maintaining pluripotency or that are poised to be expressed upon differentiation. A full appreciation of the biological importance of 5-hydroxymethylcytosine will require the development of tools that allow 5-hydroxymethylcytosine, 5-methylcytosine, and cytosine to be distinguished unequivocally.

[0350] As a potentially stable base, 5-hydroxymethylcytosine may influence chromatin structure and local transcriptional activity by recruiting selective 5-hydroxymethylcytosine binding proteins or excluding methyl-CpG-binding proteins (MBPs) that normally recognize 5-methylcytosine, thus displacing chromatin-modifying complexes recruited by MBPs. Indeed, it has already been demonstrated that the methylbinding protein MeCP2 does not recognize 5-hydroxymethylcytosine (V. Valinluck et al., Nucleic Acids Res. 32, 4100 (2004)). Alternatively, without wishing to be bound by a theory, conversion of 5-methylcytosine to 5-hydroxymethylcytosine may facilitate passive DNA demethylation by excluding the maintenance DNA methyltransferase DNMT1, which recognizes 5-hydroxymethylcytosine poorly (V. Valinluck and L. C. Sowers, Cancer Res. 67, 946 (2007)). Even a minor reduction in the fidelity of maintenance methylation would be expected to result in an exponential decrease in CpG methylation over the course of many cell cycles. Finally, 5-hydroxymethylcytosine may be an intermediate in a pathway of active DNA demethylation. 5-hydroxymethylcytosine has been shown to yield cytosine through loss of formaldehyde in photooxidation experiments (E. Privat and L. C. Sowers, Chem. Res. Toxicol. 9, 745 (1996)) and at high pH (J. G. Flaks, S. S. Cohen, J. Biol. Chem. 234, 1501 (1959); A. H. Alegria, Biochim. Biophys. Acta 149, 317 (1967)), leaving open the possibility that 5-hydroxymethylcytosine could convert to cytosine under certain conditions in cells. A related possibility is that specific DNA repair mechanisms replace 5-hydroxymethylcytosine or its derivatives with cytosine (S. K. Ooi, T. H. Bestor, Cell 133, 1145 (2008); J. Jiricny, M. Menigatti, Cell 135, 1167 (2008)). In support of this hypothesis, a glycosylase activity specific for 5-hydroxymethylcytosine was reported in bovine thymus extracts (24. S. V. Cannon, et al., Biochem. Biophys. Res. Commun. 151, 1173 (1988)). Moreover, several DNA glycosylases, including TDG and MBD4, have been implicated in DNA demethylation, although none of them has shown convincing activity on 5-methylcytosine in in vitro enzymatic assays (B. Zhu et al., Proc. Natl. Acad. Sci. U.S.A. 97, 5135 (2000);. R. Metivier et al., Nature 452, 45 (2008);S. Kangaspeska et al., Nature 452, 112 (2008)). Cytosine deamination has also been implicated in demethylation of DNA (R. Metivier et al., Nature 452, 45 (2008); S. Kangaspeska et al., Nature 452, 112 (2008); K. Rai et al., Cell 135, 1201 (2008)); in this context, deamination of 5-hydroxymethylcytosine yields hmU, and high levels of hmU:G glycosylase activity have been reported in fibroblast extracts (V. Rusmintratip and L. C. Sowers, Proc. Natl. Acad. Sci. U.S.A., 97, 14183 (2000)).

[0351] Our studies alter the perception of how cytosine methylation may be regulated in mammalian cells. Notably, disruptions of the TET1 and TET2 genetic loci have been reported in association with hematologic malignancies. A fusion of TET1 with the histone methyltransferase MLL has been identified in several cases of acute myeloid leukemia (AML) associated with t(10;11)(q22;q23) translocation (R. Ono et al., Cancer Res. 62, 4075 (2002); R. B. Lorsbach et al., Leukemia 17, 637 (2003)). Homozygous null mutations and chromosomal deletions involving the TET2 locus have been found in myeloproliferative disorders, suggesting a tumor suppressor function for TET2 (F. Viguie et al., Leukemia 19, 1411 (2005); F. Delhommeau et al., paper presented at the American Society of Hematology Annual Meeting and Exposition, San Francisco, Calif., Dec. 9, 2008.). It will be important to test the involvement of TET proteins and 5-hydroxymethylcytosine in oncogenic transformation and malignant progression.

The Role of Tet Oncogene Proteins in Mouse Embryonic Stem Cells

[0352] By computational analysis, we identified the TET proteins, TET1, TET2 and TET3, as mammalian homologs of the trypanosome J-binding proteins JBP1 and JBP2 that have been proposed to oxidize the 5-methyl group of thymine. We have found that TET1/CXXC6, previously characterized as a fusion partner of the MLL gene in acute myeloid leukemia, is an iron- and a-ketoglutarate-dependent dioxygenase that catalyzes the conversion of 5-methylcytosine (5mC) to 5-hydroxymethylcytosine (hmC), both as a recombinant protein in vitro and when overexpressed in cultured HEK293 cells (Tahiliani, M., et al., Science, 2009: 324(5929): p. 930-935). We find that 5-hydroxymethylcytosine can be detected in the genome of mouse embryonic stem (ES) cells but not in differentiated cell types. Tea and Tet2, but not Tet3, are highly expressed in mouse ES cells and RNAi-mediated depletion of both Tea and Tet2 causes loss of 5-hydroxymethylcytosine. Tea and Tet2 are repressed rapidly in parallel with Oct4 when ES cells are cultured in the absence of leukemia inhibitory factor (LIF), whereas additional treatment of retinoic acid leads to induction of Tet3 during differentiation. These changes correspond with a decrease in genomic5-hydroxymethylcytosine levels. Loss of pluripotency caused by Oct4 RNAi also downregulates Tea and Tet2 expression with loss of 5-hydroxymethylcytosine. On the other hand, gain of pluripotency in induced pluripotent stem (iPS) cell reprogrammed from mouse fibroblasts is associated with induction of both Tea and Tet2 and appearance of 5-hydroxymethylcytosine. RNAi-depletion of each Tet member does not decrease mRNA levels of the pluripotency-associated genes Oct4, Sox2 and Nanog, but Tet1 RNAi results in induction of genes that specify trophectodermal lineage. Our results suggest (i) that Tea and Tet2 catalyze conversion of 5-methylcytosine to 5-hydroxymethylcytosine in mouse ES cells; (ii) that Tet1, Tet2 and 5-hydroxymethylcytosine are associated with the pluripotent state; (iii) that Tet1 and Tet2 are downstream targets of the transcriptional network regulated by Oct4 and (iv) that Tea is a novel factor involved in repression of trophectoderm lineage development during the first cell-fate decision in mouse embryogenesis.

[0353] We used the following methods in our analyses. To perform immunofluorescence, we transfected cells with pEF1a expression constructs with HA-epitope N-terminal of full length (FL) TET1 or catalytic domain alone (TET1 CD) or empty vector (mock) for 2 days, as depicted in FIG. 13A. Fixed cells were treated with 2N HCl to denature DNA before co-staining with rabbit anti-HA (Santa Cruz Biotechnology) and mouse anti-5-methylcytosine (Calbiochem) antibodies which were detected using secondary antibodies coupled with Cy2 or Cy3 respectively. Nuclei were stained with DAPI before mounting for fluorescence imaging.

[0354] To perform thin-layer chromatography (TLC), genomic DNA was digested with the restriction endonuclease MspI, which cleaves at CACGG sites, to generate fragments whose 5' ends derive from the dinucleotide CpG and contain either 5-methylcytosine, C or 5-hydroxymethylcytosine. The digested DNA was then radiolabeled at the 5' ends and then hydrolysed from the 3' ends to single dNMPs which were resolved by TLC. Spot intensities were measured by phosphoimaging densitometry and 5-hydroxymethylcytosine levels are represented as percentages of total cytosine (5mC+C+hmC). Values were mean.+-.SD from triplicate samples (FIG. 13A).

[0355] To perform cell culture and RNA interference (RNAi), V6.5 mouse ES cells were maintained on feeder layers in standard ES medium but were replated on gelatin-coated wells for the experiments described. RNAi experiments were performed using Dharmacon siGENOME siRNA duplexes. Mouse ES cells were transfected with 50 nM siRNA using Lipofectamine RNAiMAX reagent (Invitrogen) in the presence of LIF. Retransfections were performed on pre-adherant cells every 2 days and cells were harvested at Day 5 for RNA and TLC analyses.

[0356] We performed RNA extraction, cDNA synthesis and quantitative real-time PCR analyses. Briefly, total RNA was isolated with an RNeasy kit (Qiagen) with on-column DNase treatment. cDNA was synthesized from 0.5 mg total RNA using SuperScriptIII reverse transcriptase (Invitrogen). Quantitative PCR was performed using FastStart Universal SYBR Green master mix (Roche) on a StepOnePlus real-time PCR system (Applied Biosystems). Gene expression was normalized to Gapdh and referenced to Day 0 samples. Data shown are mean.+-.SEM, n=3-4.

[0357] We indentified 5-hydromethylcytosine as the catalytic product of conversion from 5-methylcytosine by TET1 and detected 5-hydromethylcytosine in the genome of mouse ES cells (FIG. 13C). We showed that overexpression of HA-TET1 in HEK293 cells causes loss of staining with an antibody to 5-methylcytosine. We found that TLC of cells overexpressing full-length (FL) TET1 or the predicted catalytic domain (CD) reveals the appearance of an additional nucleotide species identified by mass spectrometry as 5-hydromethylcytosine. We found that H1671Y, D1673A mutations at the residues predicted to bind Fe(II) abrogate the ability of TET1 to generate 5-hydromethylcytosine, and that 5-hydromethylcytosine is detected in the genome of mouse ES cells (FIG. 13B).

[0358] We found a role for murine Tet1 and Tet2 in the catalytic generation of 5-hydromethylcytosine in ES cells. The mouse genome expresses three family members--Tet1, Tet2 and Tet3--that share significant sequence homology with the human homologs (FIG. 14A) (Lorsbach, R. B., et al., Leukemia, 2003. 17(3): p. 637-41). Tet1 and Tet3 encode within their first conserved coding exon the CXXC domain. We show that mouse ES cells express high levels of Tet1 and Tet2 (FIG. 15), but not Tet3, which can be depleted with RNAi (FIG. 14). We found that RNAi-depletion of Tet1 or Tet2 alone decreases 5-hydromethylcytosine levels partially but combined RNAi reduces 5-hydromethylcytosine levels further, suggesting that both Tet1 and Tet2 are enzymes responsible for the catalytic conversion of 5-methylcytosine to 5-hydroxymethylcytosine in mouse ES cells.

[0359] We showed changes in Tet family gene expression occur in mouse ES cells upon differentiation. We found that mRNA levels of Tet1, Tet2 and Oct4 rapidly decline upon LIF withdrawal (FIG. 15). Tet3 level remains low upon LIF withdrawal but increases 10-fold with addition of retinoic acid (FIG. 15C). We found that the decline of Tea and Tet2 expression is associated with loss of 5-hydromethylcytosine.

[0360] We found that Tet1, Tet2 and 5-hydromethylcytosine are associated with pluripotency. We show that the loss of pluripotency induced by RNAi-mediated depletion of Oct4 potently suppresses Tea and Tet2 expression and upregulates Tet3 (FIGS. 16A-16C). We show that Sox2 RNAi causes a similar, though weaker, effect as Oct4 RNAi and that Nanog RNAi has almost no effect (FIGS. 16A-16C). We found that RNAi-depletion of Oct4 in particular causes loss of 5-hydromethylcytosine in ES cells. We show that the gain of pluripotency in iPS clones derived from mouse tail-tip fibroblasts (TTF) by viral transduction of Oct4, Sox2, Klf4 and c-Myc is associated with up-regulation of Tea and Tet2 and appearance of 5-hydromethylcytosine in the genome (FIGS. 16D-16E).

[0361] We show that Tet family member knockdown impacts ES cell pluripotency and differentiation genes. We show that RNAi-mediated knockdown of each Tet family member does not affect expression of the pluripotency factors Oct4, Sox2 and Nanog (FIGS. 17A-17C). We show that RNAi-depletion of Tet1, but not of Tet2 or Tet3, increases the expression of the trophectodermal genes Cdx2, Eomes and Hand1 (FIGS. 17D-17F). We show that RNAi-depletion of Tet family members produces small insignificant changes in expression of extraembryonic endoderm, mesoderm and primitive ectoderm markers (FIGS. 17G-17I).

The Effect of 5-Hydroxymethylcytosine on Sodium Bisulfite-Based Analysis of DNA Methylation Status

[0362] Treatment of DNA with sodium bisulfite promotes the deamination of cytosine to uracil, while 5-methylcytosine is deaminated at a far slower rate, allowing the methylation state of a given cytosine to be ascertained. The reaction of sodium bisulfite with cytosine, 5 methylcytosine and 5-hydroxymethylcytosine differs, as depicted in FIG. 23. During bisulfite-mediated deamination of cytosine, HSO.sub.3.sup.- reversibly and quickly adds across the 5,6 double bond of cytosine, promoting deamination at position 4 and conversion to U--SO.sub.3.sup.-. U--SO.sub.3.sup.- is stable under neutral conditions, but is easily desulfonated to uracil at higher pH. 5-methylcytosine is deaminated to thymine by bisulfite conversion, but the rate is approximately two orders of magnitude slower than that of cytosine. Recently, we showed that 5-hydroxymethylcytosine is present in mammalian DNA (S. Kriaucionis and N. Heintz, Science 324, 929 (2009); M. Tahiliani et al., Science 324, 930 (2009)). Bisulfite reacts with 5-hydroxymethylcytosine to form cytosine 5-methylenesulfonate. This adduct does not readily undergo deamination (H. Hayatsu, et al., Biochemistry 9, 2858 (1970); R. Y. Wang, et al., Nucleic Acids Res 8, 4777 (1980); H. Hayatsu and M. Shiragami, Biochemistry 18, 632 (1979)).

[0363] Bisulfite sequencing usually entails PCR amplification of a region of bisulfite-treated genomic DNA containing the cytosines of interest, followed by sequencing of PCR clones. Cytosine to thymine transitions will be observed at all unmethylated cytosines (M. Frommer et al., Proc Natl Acad Sci USA 89, 1827 (1992)). To test whether the bulky cytosine 5-methylenesulfonate adduct impedes PCR amplification of the treated DNA, we generated DNA templates containing cytosine, 5-methylcytosine or 5-hydroxymethylcytosine as their sole cytosine species, as shown in FIG. 24. To do this, we PCR-amplified a 201 bp oligonucleotide using the nucleoside triphosphates dATP, dGTP, dTTP with dCTP or its 5-methylcytosine or 5-hydroxymethylcytosine derivatives. The PCR products were treated with bisulfite, exposed to conditions promoting deamination and desulfonation, and amplified with the primers: SEQ ID NO: 7: ATTGTCGTAGGTTAAGTGGATTGTAAGGAGGTAG and SEQ ID NO: 8: ATTCACTACCACTCTCCTTACTTCTCTTTCTCC (reverse primer used for primer extension).

[0364] Under these conditions, 5-hydroxymethylcytosine-containing DNA was very poorly amplified compared to cytosine- and 5-methylcytosine-containing DNA. Sequencing of the amplified DNA confirmed that bisulfite-treated 5-hydroxymethylcytosine did not undergo cytosine->thymine transitions, demonstrating, as expected, that 5-hydroxymethylcytosine and 5-methylcytosine cannot be distinguished by the bisulfite technique. Since 5-hydroxymethylcytosine is present in embryonic stem (ES) cells at a level .about.10% of 5-methylcytosine (M. Tahiliani et al., Science 324, 930 (2009)), it is likely that a proportion of the regions identified as methylated in the ES cell genome (C. R. Farthing et al., PLoS Genet 4, e1000116 (2008); B. H. Ramsahoye et al., Proc Natl Acad Sci USA 97, 5237 (2000)) are actually hydroxymethylated.

[0365] To determine if a block in PCR amplification occurred, we performed primer extension assays using two commercial sources of Taq polymerase. A ladder of incomplete extension products was seen only with bisulfite-treated, 5-hydroxymethylcytosine-containing DNA, in which the 5-hydroxymethylcytosine had been converted to the bulky cytosine 5-methylenesulfonate. The most significant stalling occurred at positions across from a CTC sequence close to the end of the reverse primer, and a CCGC sequence and several CC sequences further away. We also found that there were cytosine residues where stalling was weak or did not occur. Thus, cytosine 5-methylenesulfonate stalls but does not block Taq polymerase, and the stalling is particularly striking when two cytosine 5-methylenesulfonate residues are adjacent (FIG. 25).

[0366] In mammalian DNA, 5-methylcytosine (and therefore its hydroxylated derivative, 5-hydroxymethylcytosine) are found almost exclusively in the context of the dinucleotide CpG (B. H. Ramsahoye et al., Proc Natl Acad Sci USA 97, 5237 (2000); Y. Gruenbaum, et al., FEBS Lett 124, 67 (1981); M. Ehrlich, R. Y. Wang, Science 212, 1350 (1981)). To evaluate the degree to which CMS would stall Taq polymerase in this physiological context, we synthesized a set of 158 bp oligonucleotides in which the top strand contained one common CG dinucleotide (in the sequence TCGA, highlighted in FIG. 24B) and a second variable sequence that was one of the following: GGAT, CGAT, CCAT, CGCG, or CCGG (indicated by XXXX in FIG. 24B). After bisulfite treatment, the most significant stalling was observed at the tandem CC sequences in the CC and CCGG oligonucleotides. A minor amount of stalling was observed at the same position in the 2-CG (two non-continuous CGs) and CGCG oligonucleotides. Nevertheless, the 1-CG, 2-CG and CGCG oligonucleotides were efficiently amplified after bisulphite treatment, whereas oligonucleotides containing CC sequences showed a perceptible decrease in amplification efficiency (FIG. 25). The primers used for amplification were: SEQ ID NO: 9: GTGAAATATTGTGGTAGGTTAAGTGGATTGTAAGGAG and SEQ ID NO: 10: CATCTTAATTAACACTACCACTCTCCTTACTTCTCTTTCT.

[0367] We postulated that if cytosine 5-methylenesulfonate can stall DNA polymerase, genomic loci containing hydroxymethylated DNA might be underrepresented in quantitative methylation analyses. To evaluate this point, we examined the MLH1 locus, which is known to be heavily methylated in HEK293T cells (S. Fukushige, et al., Biochem Biophys Res Commun 377, 600 (2008)). We confirmed this point by bisulfite sequencing of genomic DNA purified from HEK293T cells (FIG. 26). The primers used to sequence were: SEQ ID NO: 11: GTGAATTAAGGATTTTTTTGTGTG and SEQ ID NO: 12: AAAAAACATTTCCCTACTTC. Two different amplicons in the MLH1 locus were shown to contain more than 10 highly methylated CpGs; methylated cytosines, which do not undergo C->T transitions, are shown in bold, whereas partially methylated C's which yielded a mixture of C and T after bisulfite sequencing, are highlighted and indicated by Y (FIG. 26). The primers we used to amplify the MLH1 locus amplicons were: SEQ ID NO: 13: GTTAGATTATTTTAGTAGAGGTATATAAGT and SEQ ID NO: 14: ACCAATCAAATTTCTCAACTCTAT; and SEQ ID NO: 15: TGAGAAATTTGATTGGTATTTAAGTTG and SEQ ID NO: 16: CAATCATCTCTTTAATAACATTAACTAACC. We then treated the genomic DNA with the recombinant catalytic domain of TET1 in vitro. Roughly 80% of 5-methylcytosine in MspI or Taq.alpha.1 sites was converted to 5-hydroxymethylcytosine (FIG. 27). Real-time PCR analysis showed that untreated and TET1-treated (hydroxymethylated) DNAs were amplified with almost identical efficiency (FIG. 26), even though each amplicon contained more than 10 highly methylated CpGs.

[0368] In summary, we have shown that the bisulfite technique for DNA methylation analysis does not distinguish between 5-hydroxymethylcytosine and 5-methylcytosine; that loci containing dense regions of hydroxymethylated DNA may be underrepresented in quantitative methylation analyses; and that primer extension reactions conducted with bisulfite-treated DNA would be predicted to terminate disproportionately at sites of hydroxymethylation. It should be possible to take advantage of our findings, combining ligation-mediated PCR with primer extension under suboptimal extension conditions to determine the location of 5-hydroxymethylcytosine in the genome. It is unclear how CMS inhibits PCR. Rein et al. proposed that CMS would block DNA polymerase by analogy to oxidative pyrimidine adducts such as thymine glycol (T. Rein, et al., Nucleic Acids Res 26, 2255 (1998)). However, CMS retains aromaticity, whereas it has since been demonstrated that polymerases are disrupted by thymine glycol's loss of aromaticity and consequent adoption of a chair geometry (P. Aller, et al., Proc Natl Acad Sci USA 104, 814 (2007)). Whatever the mechanism, the observation that 5-hydroxymethylcytosine can stall Taq polymerase after bisulfite reactions may have important ramifications for our interpretation of previous DNA methylation analyses as discussed above.

Materials and Methods

[0369] Minigenes were designed for generation of DNA templates containing cytosine, 5-methylcytosine or 5-hydroxymethylcytosine. Minigenes used as templates to amplify cytosine, 5-methylcytosine or 5-hydroxymethylcytosine containing oligonucleotides were synthesized by Integrated DNA Technologies. DNA containing cytosine 5-methylcytosine or 5-hydroxymethylcytosine was amplified by PCR using nucleoside triphosphates dATP, dGTP, dTTP with dCTP or its derivatives mdCTP (GE healthcare) or hmdCTP (Bioline). PCR products were run on a 2% agarose gel to confirm correct length and further purified by a gel extraction kit (Qiagen).

[0370] Bisulfite treatment and recovery of samples were carried out with the EpiTect Bisulfite kit (QIAGEN) by following manufacturer's instructions. In brief, 2 .mu.g DNA in 20 .mu.L volume was used for each reaction and mixed with 85 .mu.L bisulfite mix and 35 .mu.L DNA protect buffer. Bisulfite conversion was performed on a thermocycler as follows: 99.degree. C. for 5 min, 60.degree. C. for 25 min, 99.degree. C. for 5 min, 60.degree. C. for 85 min, 99.degree. C. for 5 min, 60.degree. C. for 175 min and 20.degree. C. indefinitely. The bisulfite treated DNA was recovered by EpiTect spin column and subsequently sequenced to confirm the efficiency of bisulfite conversion.

[0371] RealTime PCR of oligonucleotides was performed on the StepONE plus real-time PCR system (Applied Biosystems) by using the FastStart Universal SYBR Green Master kit (Roche). 0.1 .mu.g DNA template and 0.15 mM primers were used in each reaction. The amplification reaction program was set as: 95.degree. C. for 10 min, 40 cycles of 95.degree. C. for 15 sec, 60.degree. C. for 1 min, and a melt curve analysis step at the end. Data were analyzed by StepONE plus real-time PCR software.

[0372] To perform the primer extension assays, reverse primers (50 ng) were end labeled with T4 polynucleotide kinase (T4 PNK) (NEB) and 10 .mu.Ci of [.gamma.-32P]-ATP (PerkinElmer) for 1 hr at 37.degree. C., and then purified by Illustra MicroSpin G-25 column (GE Healthcare). For the primer extension, 2 ng template, 4 pmol .gamma.32-P-labeled primers were used. PCR reactions were set up according to manufacturer's instructions using two commercial sources of Taq DNA polymerase (Roche and Sigma). For Roche Taq DNA polymerase, the PCR condition was set as: 95.degree. C. for 10 min, 30 cycles of 95.degree. C. for 15 sec, 60.degree. C. for 1 min. For Sigma TagRED polymerase, the PCR condition was set as: 30 cycles of 94.degree. C. for 1 min, 55.degree. C. for 2 min and 72.degree. C. for 1 min. The primer extension products were mixed with 2.times. gel loading buffer II (Ambion), denatured at 95.degree. C. for 15 min and loaded to 12% polyacrylamide gel denaturing (7 M urea). Sanger sequencing were performed using Thermo Sequenase Dye Primer Manual Cycle Sequencing kit (USB). 2 ng template and 1 pmol [.gamma.32-P]-labeled primer were used for Sanger sequencing. The results were visualized by autoradiography.

[0373] Real Time PCR of bisulfite treated genomic DNA was performed by extracting genomic DNA from HEK293 cells (as described in (H. Hayatsu, et al., Biochemistry 9, 2858 (1970)), and shearing the DNA by vortexing to facilitate pipeting. Recombinant human TET1 catalytic domain (CD) was expressed in insect cells as in (H. Hayatsu, et al., Biochemistry 9, 2858 (1970)). 12 .mu.g of DNA was then reacted with 18 .mu.g of TET1-CD in 50 mM HEPES pH 8.0, 50 mM NaCl, 2 mM Ascorbic Acid, 1 mM alpha-ketoglutarate, 100 .mu.M FAS, and 1 mM DTT. The total reaction volume was 300 .mu.L and the reaction ran 90 minutes at 37.degree. C. The WT sample was subjected to the same reaction conditions without enzyme.

[0374] The DNA was then ethanol precipitated by the addition of 0.1 volume of 3 M sodium acetate pH 7.4, linear polyacrylimide, and 3 volumes of ethanol, followed by freezing and spinning at 16000 g for 30 minutes at 4.degree. C. The sample was then washed twice with 70% ethanol, air dried, and resuspended in 10 mM Tris 0.1 mM EDTA. Resuspension proceeded overnight with gentle shaking at 45.degree. C. About 500 ng of the DNA was digested with MspI or Taq.alpha. I, end labeled, digested to single nucleotides, and run on TLC as described. The data was analyzed on a phosphorimager. The strong cytosine peak seen in this work comes from the fact that we sheared the DNA beforehand, resulting in breaks not created by the enzyme which were end-labeled. This did not confound interpretation of methylation loss or the extent of hydroxymethylation.

[0375] The DNA was bisulfite treated as described above, and was quantified afterward using a Nanodrop (NanoDrop DN-1000 spectrophotometer, Thermo Scientific). Bisulfite treated DNA can no longer reanneal, so an absorbance constant typical of single stranded DNA (33 .mu.g DNA/(mL*0D260 units) was used. Bisulfite treatment changes the absorption properties of DNA so the estimated quantities could be off, but any error would be approximately consistent between the TET-CD treated and WT samples.

[0376] The primers used in the PCR of the CGless region in FIG. 26 and FIG. 27 were designed with the Bisearch Primer Design tool (R. Y. Wang, et al., Nucleic Acids Res 8, 4777 (1980)). A long stretch of DNA, arbitrarily chosen, lacking CpGs was used as input for the program, though a CpG had to be typed into the middle of the sequence to allow the input sequence to be processed. The primers used for the MLH promoter were taken from (Fukushige), with a couple bases added to raise their melting temperature.

[0377] The Real Time PCR was performed using the FastStart Universal SYBR Green Master kit (Roche), with each primer present at a final concentration of 0.15 mM. PCR was run on a StepOnePlus Real Time PCR System (Applied Biosystems), programmed to undergo an initial 10 minute 95.degree. C. step; fifty cycles of 95.degree. C. for 15 s, 50.degree. C. for 30 s, 60.degree. C. for 90 s; and a melt curve analysis step at the end. PCR products were run on an agarose gel to confirm that the correct sized product was formed as the dominant band.

[0378] Real Time PCR product was handled using different pipets than were used to set up PCRs, and also handled on different surfaces, to prevent cross-contamination.

The Effect of 5-Hydroxymethylcytosine on Sodium Bisulfite-Based Analysis of DNA Methylation Status

[0379] DNA methylation at the carbon-5 position of cytosine (5-methylcytosine, also regarded as the "fifth" base) is a stable epigenetic mark found in eukaryotes that imparts an additional layer of heritable information upon DNA. In normal cells, DNA methylation plays vital roles in embryogenesis and development, regulation of gene expression, silencing of transposable elements, and genomic imprinting. In cancer cells, DNA hypermethylation in CpG-island-promoters has been linked to aberrant silencing of tumor suppressor genes. Epigenomic profiling of DNA methylation could serve as marker of cancer cells and indicator for tumor prognosis, as well as useful predictor of response to chemotherapy.

[0380] We have shown that 5-hydroxymethylcytosine is present in mammalian DNA, and that a novel family of proteins, the TET proteins, is capable of converting 5-methylcytosine to 5-hydroxymethylcytosine both in vitro and in vivo.

[0381] Bisulfite sequencing has been one of the most widely-used techniques for global profiling of cytosine methylation patterns. Bisulfite sequencing relies on the fact that reaction with bisulfite promotes the deamination of unmethylated cytosine to yield uracil (read as thymine after PCR). Deamination occurs orders of magnitude more slowly with 5-methylcytosine and 5-hydroxymethylcytosine; 5-methylcytosine reacts poorly with bisulfite whereas 5-hydroxymethylcytosine forms a distinct adduct, cytosine 5-methylsulfonate. Thus, while unmethylated cytosine will be read as thymine, both 5-methylcytosine and 5-hydroxymethylcytosine will still be read as cytosine in subsequent PCR reactions. As a result, all cytosine methylation analyses to date run the risk of conflating 5-methylcytosine and 5-hydroxymethylcytosine. It is highly likely that genomic loci identified as methylated with traditional methods are actually hydroxymethylated.

[0382] To test whether this particular modification on 5-methylcytosine would affect bisulfite sequencing or not, we designed a set of experiments by using synthesized 5-hydroxymethylcytosine oligonucleotides and genomic DNA treated with TET protein.

[0383] The experimental design for primer extension assays that we used is outlined below. We showed primer extension assays for DNA containing different cytosine species, and compared it besides a Sanger sequencing ladder. We found that ladders of incomplete extension products were only observed in an 5-hydroxymethylcytosine-containing DNA after bisulfite treatment, at positions corresponding to G in Sanger sequencing ladder. We found that less full length product was observed in the extension reaction with 5-hydroxymethylcytosine-containing DNA treated with bisulfite.

[0384] We performed primer extension assays of DNA containing CpG combinations: 1CpG, 2CpG, CGCG, CC and CCGG. We showed that the bands corresponding to stalled PCR reaction were notably observed in the 5-hydroxymethylcytosine-containing CC or CCGG oligonucleotides after bisulfite treatment. The stalling effect, though less obvious, was also observed in bisulfite-treated, 5-hydroxymethylcytosine-containing oligonucleotides with CG or CGCG.

[0385] We performed Tet treatment of MLH1 promoter amplicons, both of which contained more than ten fully methylated residues as determined by sequencing of bulk PCR product delayed amplification by less than one cycle. Amplification of a region lacking CpGs, and thus 5-hydroxymethylcytosine, was similar in the WT and TET1 treated populations.

[0386] We designed a strategy of incorporating 5-methylcytosine and 5-hydroxymethylcytosine into designed oligonucleotides. We confirmed that the 5-hydroxymethylcytosine was successfully incorporated into the oligonucleotide using TLC. Analyzing sequencing traces of 5-hydroxymethylcytosine-containing oligonucleotides before and after bisulfite treatment indicated that bisulfite treated 5-hydroxymethylcytosine did not undergo cytosine to thymine transitions. The control cytosine-containing oligonucleotides completely underwent cytosine to thymine conversion. We performed real-time PCR amplification curve of an oligonucleotide containing cytosine, 5-methylcytosine or 5-hydroxymethylcytosine before and after bisulfite treatment. The small lag observed for the bisulfite-treated cytosine oligonucleotide is due, in part, to the fact that after conversion of cytosine to uracil, this oligonucleotide can only be amplified from one of the two strands. We quantified the ACt value from experiments performed.

[0387] In summary, we have shown that the bisulfite technique for DNA methylation analysis does not distinguish between 5-methylcytosine and 5-hydroxymethylcytosine; that loci containing dense regions of hydroxymethylated DNA may be under-represented in quantitative methylation analyses; and that primer extension reactions conducted with bisulfite-treated DNA would be predicted to terminate disproportionately at sites of hydroxymethylation.

[0388] It should be possible to take advantage of our findings, in some embodiments, by combining ligation-mediated PCR with primer extension under suboptimal extension conditions to determine the location of 5-hydroxymethylcytosine in the genome. It is unclear how cytosine-5-methylsulfonate inhibits PCR. Rein et al. proposed that cytosine-5-methylsulfonate would block DNA polymerase by analogy to oxidative pyrimidine adducts such as thymine glycol. However, cytosine-5-methylsulfonate retains aromaticity, whereas it has since been demonstrated that polymerases are disrupted by thymine glycol's loss of aromaticity and consequent adoption of a chair geometry. Whatever the mechanism, the observation that 5-hydroxymethylcytosine can stall Taq polymerase after bisulfate reactions may have important ramifications for our interpretation of previous DNA methylation analyses as discussed herein.

The Effect of 5-Hydroxymethylcytosine on Sodium Bisulfite-Based Analysis of DNA Methylation Status

[0389] Cytosine methylation, typically found in the context of CpG sequences, is critical in vertebrates and performs functions such as regulation of transcription and silencing of transposable elements (W. Reik, Nature 447, 425 (May 24, 2007)). Recently, we predicted that the TET family of proteins would oxidize 5-methylcytosine to 5-hydroxymethylcytosine (L. M. Iyer, et al., Cell Cycle 8, 1698 (2009)). Acting on this prediction, we found that expression of the catalytic domain (CD) of human TET1 in 293T cells caused formation of 5-hydroxymethylcytosine and a corresponding loss of 5-methylcytosine. Recombinant human TET1 CD efficiently oxidized 5-methylcytosine to 5-hydroxymethylcytosine in vitro. We also found that 5-hydroxymethylcytosine is present in mammalian DNA and is particularly abundant in Embryonic Stem Cells. In murine ES cells, siRNA knockdown of Tea and Tet2 causes a reduction in observed hydroxymethylcytosine levels (M. Tahiliani et al., Science 324, 930 (2009)). Independently, another group reported the presence of 5-hydroxymethylcytosine in Purkinje neurons (S. Kriaucionis, N. Heintz, Science 324, 929 (2009)).

[0390] TET proteins include three recognizable domains. A CXXC domain, which in other proteins is involved in binding of unmethylated CpG motifs, a double-stranded beta-helix (DSBH) which contains the catalytic residues, and a cysteine rich region. The function of this last domain is unclear, but based on its similarity to zinc finger domains and its position relative to the DSBH, it may be involved in DNA binding.

[0391] Very little is known about the physiological role of TET proteins or 5-hydroxymethylcytosine. The DSBH of TET1 is found in a fusion with the oncogene MLL in rare leukemias (R. B. Lorsbach et al., Leukemia 17, 637 (2003); R. Ono et al., Cancer Res 62, 4075 (2002)). Null mutations of TET2 are found in a significant fraction of patients with AML or precancerous myelodysplastic disorders, and TET2 is thus believed to be a tumor suppressor that is lost early in the development of myeloid tumors (S. M. Langemeijer et al., Nat Genet 41, 838 (2009); F. Delhommeau et al., N Engl J Med 360, 2289 (2009)). The mechanism of TET's role in cancer is undetermined. Tet2 deficient mice die shortly after birth, again for unknown reasons (H. Tang, et al., Transgenic Res 17, 599 (2008)).

[0392] While 5-hydroxymethylcytosine has no known function, without wishing to be limited by a theory, it is thought that it might facilitate demethylation either by "flagging" methylated cytosines for removal or blocking maintenance methylation. Without wishing to be limited by a theory, it may also have a role in blocking 5-methylcytosine binding proteins or recruiting as yet undiscovered 5-hydroxymethylcytosine binding proteins.

[0393] In one embodiment, we can determine whether hydroxymethylation leads to active and/or passive demethylation of 5-methylcytosine in DNA. As discussed, 5-hydroxymethylcytosine may lead, without wishing to be bound by a theory, to demethylation by an active or passive mechanism. An active mechanism might entail removal of 5-hydroxymethylcytosine by DNA repair machinery, which, without wishing to be limited or bound to a theory, is most likely base excision repair, which is typically used to remove lesions that do not disrupt the broad structure of DNA (V. Valinluck, et al., Nucleic Acids Res 33, 3057 (2005)). Most DNA glycosylases generate abasic sites or 3' phospho a, 13-unsaturated aldehydes, both of which react with an aldehyde specific molecule called ARP (FIG. 28). Removal of these repair intermediates is the rate-limiting step in DNA repair, and thus large scale glycosylase activity would be predicted, without wishing to be constrained by a theory, to generate many aldehydes in DNA which could be measured via ARP. We found that in 293T cells, expression of the TET1 catalytic domain (CD) did not cause a significant increase in aldehyde density (FIG. 29). We considered MBD4 to be a likely glycosylase to remove 5-hydroxymethylcytosine, as it is known to repair the somewhat analogous compound 5-bromocytosine (V. Valinluck, et al., Nucleic Acids Res 33, 3057 (2005)) and it binds to methylated DNA (B. L. Parsons, Proc Natl Acad Sci USA 100, 14601 (2003)). Also, an MBD4 homologue is fused to a distant TET homologue in some algae species (L. M. Iyer, et al., Cell Cycle 8, 1698 (2009)). However, coexpressing MBD4 with TET1 CD did not significantly increase abasic sites (FIG. 29), reduce 5-hydroxymethylcytosine levels, or increase cytosine levels.

[0394] Meanwhile, it has become clear that in 293T cells TET's main effect is to convert cytosine to 5-hydroxymethylcytosine. Only a modest rise in cytosine is observed upon TET expression, which could arise via blocking of maintenance methylation as opposed to repair (M. Tahiliani et al., Science 324, 930 (2009)). Also, the simple fact that cells can tolerate such high levels of 5-hydroxymethylcytosine would seem to indicate, without wishing to be bound by a theory, that at least in 293T cells, large-scale glycosylase activity is not occurring. We have cloned a number of DNA repair proteins (MBD4, SMUG1, TDG, NTHL1, NEIL1, NEIL2 and APEX1), and can test their involvement in resolution of hydroxymethylcytosine. We can do this by expressing the enzymes in mammalian cells, then determining whether any 5-hydroxymethylcytosine-glycosylase activity is present in lysate by monitoring cleavage of a hydroxymethylcytosine-containing oligo. For example, in one aspect we can express a test glycosylase of interest in 293T cells. We can generate and end-label oligonucleotides, where at least one oligonucleotide has 5-hydroxymethylcytosine residues and another oligonucleotide has a known substrate for the test glycosylase. The glycosylase expressing 293 cells are then lysed and the oligonucleotides are added to the lysate. The oligonucleotides are then exposed to alkaline conditions in order to generate abasic sites on the oligonucleotides. The oligonucleotides are then run on a denaturing gel to detect breaks as described herein. If both the hydroxymethylated and positive control oligonucleotides are cut, it indicates that the test glycosylase recognizes 5-hydroxymethylcytosine. If only positive control oligonucleotide is cut, it indicates that the test glycosylase does not recognize 5-hydroxymethylcytosine. If we observe no cutting of both the hydroxymethylated and positive control oligonucleotides, it indicates that the test glycosylase is not active in conditions used in assay.

[0395] In another aspect, we can also determine whether hydroxymethylation blocks maintenance methylation. Without wishing to be bound by a theory, DNMT1 might not efficiently methylate cytosines at CpGs opposite hydroxymethylated CpGs, an observation with some in vitro backing (V. Valinluck, and L. C. Sowers, Cancer Res 67, 946 (2007)). Also, it has been observed that methylation activates DNMT1 allosterically (R. Goyal, et al., Nucleic Acids Res 34, 1182 (2006); Z. M. Svedruzic, Curr Med Chem 15, 92 (2008)), and hydroxymethylation may not have this effect. Finally, DNMT1 requires the partner protein UHRF1, which selectively binds hemimethylated CpGs, for localization to newly replicated DNA (M. Bostick et al., Science 317, 1760 (2007); J. Sharif et al., Nature 450, 908 (2007)). Inhibition of UHRF1 binding could also block maintenance methylation.

[0396] We have expressed recombinant UHRF1 and showed that it has modestly impaired binding to hemihydroxymethylated, as opposed to hemimethylated, DNA, as determined by an Electromobility Shift Assay (EMSA). We saw some binding to unmethylated DNA, which was not observed in past work (M. Bostick et al., Science 317, 1760 (2007); C. Qian et al., J Biol Chem 283, 34490 (2008)) possibly because of the use of different blocking agents. We can also better replicate the conditions used in past work and determine the preference for hemimethylated over hemihydroxymethylated DNA under these conditions. We can also determine whether maintenance methylation of hydroxymethylated DNA is impaired. Episomal plasmids have been shown to maintain methylation faithfully through many cell divisions and are relatively easy to manipulate (C. L. Hsieh, Mol Cell Biol 14, 5487 (1994)), and we can compare the maintenance of methylated versus hydroxymethylated episomes.

[0397] We can also evaluate and discover methods for determining where hydroxymethylcytosine residues are located in DNA.

[0398] The discovery of 5-hydroxymethylcytosine in mammalian DNA forces a reassessment of old techniques used to differentiate methylated and unmethylated cytosine. Furthermore, determination of the physiological role of 5-hydroxymethylcytosine requires knowledge of where in the genome 5-hydroxymethylcytosine is located, and we have developed methods of tagging and precipitating 5-hydroxymethylcytosine for use in chromatin immunoprecipitation.

[0399] In T4 phage, all cytosines are hydroxymethylated and subsequently glucosylated by the enzymes a-glucosyltransferase (AGT) or .beta.-glucosyltransferase (BGT) (S. R. Kornberg, et al., J Biol Chem 236, 1487 (1961)) (FIG. 30). We have succeeded in producing recombinant BGT. Thus, we can glucosylate sites of hydroxymethylation, and label them via the mechanism described in FIG. 31. We treated bacterial plasmid and T4 phage DNA with periodate, and then used the same aldehyde quantification method described. Only periodate treated T4 phage DNA showed major aldehyde presence (FIG. 32).

[0400] In one embodiment, glucosylation conditions for hydroxymethylated DNA can be optimized, and the extent of glucosylation can be measured by TLC. Periodate treatment can be optimized and binding to beads with hydrazide moieties can be performed, in order to perform specific pulldown of hydroxymethylated and glucosylated DNA. Such methods can be used, for example, to perform chromatin immunoprecipitation (ChIP) to determine sites of in vivo genomic hydroxymethylation.

[0401] We can determine likely sites of hydroxymethylation by determining the binding specificities of TET1. We individually expressed domains from TET proteins and tested their DNA binding properties via EMSAs. Other CXXC domains have been found to bind unmethylated CpGs, so we expressed the CXXC domains of TET1 and TET3 to test this specificity. We found that the CXXC domains in TET proteins are very positively charged and seem to bind non-specifically to all DNA in vitro. In parallel, we expressed the CXXC domain of CXXC1, which has been demonstrated to bind to unmethylated CpGs. Under the same conditions used for the TET proteins, this domain bound specifically. We found that the catalytic domain as a whole and the DSBH domain of TET bind DNA, but again with no specificity, not even for methylated CpG, which is TET's substrate. Without wishing to be bound by a theory, this may be due to non-specific binding of DNA to a largely unconserved positively charged region of the DSBH, which is unlikely to actually interact with DNA in vivo because of its predicted position on the protein.

[0402] In one aspect, we can also generate mice in which one or more of the TET family genes is genetically ablated ("knock-out mice"), in a lineage specific or inducible manner ("conditional knock-out mice"). We have successfully generated Tea and Tet2 conditional knock-out mice. We have successfully generated Tet3 conditional KO mice possessing a high degree of chimerism, and are confirming germline transmission, after which we can breed mice fully deficient for Tet3 and analyze their phenotype. We have shown that Tet3 is expressed in many tissues, so subsequent experiments on the mice will be guided by phenotype.

Identifying 5-Hydroxymethylcytsoine Using Antibodies to Cytosine Methylene Sulfonate

[0403] The invention also provides, in part, the use of antibodies to cytosine methylene sulfonate to identify 5-hydroxymethylcytosine residues in genomic DNA and for the isolation of such 5-hydroxymethylcytosine residue comprising DNA by immunoprecipitation, for use, for example, in analyses of cancer cells.

[0404] We have produced a rabbit antiserum specific for cytosine methylene sulfonate, the product of bisulfite treatment of 5-hydroxymethylcytosine, and have shown that this antiserum is highly specific for, and can be used to quantify, the quantity of 5-hydroxymethylcytosine residues present in a sample, such as genomic DNA. We have shown that this rabbit antiserum can be used to demonstrate the inhibition of TET family activity, for example, when TET family activity is inhibited by the use of one or more siRNAs specific for TET family members, such as TET1 or a combination of TET1 and TET2. For example, a bisulfite treated sample, such as a genomic DNA sample, can be digested with an enzyme, such as Mse1, which cleaves at TTAA sequences. The digested DNA can then be end-labeled with .sup.32P. The digested and labeled DNA can then be incubated with an antibody or antiserum specific for cytosine methylene sulfonate, and immobilized, for example, with anti-rabbit IgG beads. Radiation counts can then be determined using scintillation counters, and the radiation count data used to ascertain the amount of 5-hydroxymethylcytosine present in the DNA. An example of such an assay is shown in FIG. 19.

[0405] In another such example, genomic DNA from ES cells, either transfected with siRNA sequences specific for one or more TET family members, such as TET1 or a combination of TET1 and TET2, is bisulfite treated, digested with an enzyme, and labeled and incubated with antiserum specific for cytosine methylene sulfonate, and the amount of cytosine methylene sulfonate residues can be quantified against a standard curve generated using a known oligo containing cytosine methylene sulfonate. The impact of TET family inhibition on the generation of 5-hydroxymethylcytosine can then be compared between the samples. The presence of less cytosine methylene sulfonate in a sample treated with a TET family inhibitor, such as an siRNA sequence, is indicative of the specificity of that siRNA for the TET family member.

[0406] In yet another example, the amount of 5-hydroxymethylcytosine in a patient having mutations in one or more TET family members and suffering from a malignant condition, can be ascertained using bisulfite treatment of DNA obtained from such a patient, where the DNA is then assayed for cytosine methylene sulfonate quantity using the antiserum described herein, as shown in FIG. 21 and FIG. 33. Genomic DNA was isolated from patients having the following mutations in TET2, and diagnosed with the cancerous conditions shown in parentheses:

CCF2032-S631stop-somatic (CD3 negative), heterozygous mutation, (MDS/MPD, MDS/MPD-U<5%) CCF2148-S509stop-somatic (CD3 negative), hemizygous mutation, pt with de14q24, (MDS, RARS) CCF2674-ins1310T-somatic (CD3 negative), homozygous mutation, pt with UPD4q, (MDS/MPD, CMML-1) CCF5936-ins318A-homozygous mutation, SNP-A results pending, (CML)

CCF852-WT TET2, (MDS/MPD, CMML-2)

CCF4018-WT TET2, (MDS/MPD, CMML-1)

[0407] The isolated DNA was then either bisulfite treat or left untreated, digested and labeled with .sup.32P. The bisulfite treated DNA was incubated with antiserum specific for cytosine methylene sulfonate, while the untreated DNA was incubated with antibodies specific for 5-hydroxymethylcytosine to immunoprecipitate the genomic regions having 5-hydroxymethylcytosine. The immunoprecipitated DNA was then run on gels as dot blots and analyzed using phosphoimaging, compared to serial dilutions of a standard control having a known quality of cytosine methylene sulfonate or 5-hydroxymethylcytosine, such as cytosine methylene sulfonate or 5-hydroxymethylcytosine oligonucleotides. In the examples shown in FIG. 21 and FIG. 33, we show that patients CCF2148 and CCF2674 have significantly less 5-hydroxymethylcytosine, when compared to patients CCF852 and CCF4018, having wild-type TET2. This demonstrated that the somatic mutations in TET2 in patients CCF2148 and CCF2674 directly are functional and directly impact TET2-mediated conversion of 5-methylcytosine to 5-hydroxymethylcytosine.

Role of TET Proteins in Leukemia

[0408] It has been observed that there are a high frequency of TET2, but not TET1 and TET3, mutations in various myeloid cancers, including MDS, MPD, AML, secondary AML, systemic mastocytosis, and CMML. It has been shown that TET2 is the most commonly mutated gene in MDS, and thus serves as a very useful prognostic marker.

[0409] TET2 mutations are present in both multipotent and committed progenitor cells from MPD patients. TET2 mutations have been found in patients with both JAK2 V617F-positive and -negative MPD, and these mutations have been proposed to be a pre-JAK2 event. It has been shown that there is an enrichment of TET2 missense mutations, without frame shift or nonsense mutations, or deletions, in two conserved regions that cover the catalytic core of TET proteins that contain C and D domains, as shown in FIG. 34. We postulate that these numerous heterozygous missense mutations have dominant negative roles to promote malignant transformation.

[0410] We have shown that TET1 and TET2 have differential expression patterns when both bone marrow and thymic hematopoietic progenitor cell subsets are examined. As shown in FIG. 35, TET2 is expressed most highly in the Gr-1.sup.-Mac-1.sup.+ myeloid lineage bone marrow cells; pre-B, immature B, and mature B lymphoid lineage bone marrow cells; and in DN1, DP, CD4+SP, and CD8+SP thymic lymphoid lineage cells. As shown in FIG. 36, TET1 is expressed most highly in DP, CD4+SP, and CD8+SP thymic lymphoid lineage cells.

[0411] In order to determine the role of TET2 in leukemia and malignant transformations, and the role of cooperation between TET2 and JAK2 mutations, Lin.sup.-c-kit.sup.+ cells bone marrow cells can be isolated and transduced with the various combinations of retroviral vectors: LMP-GFP and MSCV-IRES-hCD4; LMP-shTet2-GFP and MSCV-IRES-hCD4; LMP-GFP and MSCV-JAK2 V617F-IRES-hCD4; and LMP-shTet2-GFP and MSCV-JAK2 V617F-IRES-hCD4, where shTet2 is an shRNA specific for Tet2. Cells can then be sorted on the basis of GFP and hCD4 expression, using techniques known to one of skill in the art. The isolated cells can then be compared for their effects on growth kinetics, transforming activity, and in vivo tumorigenesis. For example, isolated cells can be transferred into lethally irradiated mice to investigate in vivo tumorigenesis capacities.

[0412] As shown in FIG. 37, expression of the shTet2#3 sequence results in decreased expression of Tet2 in c-kit.sup.+ bone marrow cells, as assessed by quantitative PCR analysis. Further, we show that expression of the shTet2#3 sequence results in decreased protein expression, using a Myc tagged Tet2 protein (FIG. 37).

[0413] Without wishing to be bound or limited by a theory, we postulate that the TET family of epigenetic modulators serve as potential linkers between energy metabolism and tumor suppression. Isocitratedehydrogenases (IDHs) are metabolic enzymes in the TCA cycle and catalyze the oxidative decarboxylation of isocitrate to .alpha.-ketoglutarate (.alpha.-KG). IDHs can be classified into two groups (depending on the types of e-acceptor): (1) NAD+-dependent isocitratedehydrogenases, such as IDH3A, IDH3B, IDH3G, which form heterotetramer .alpha.2.beta..gamma., play an irreversible step of TCA cycle, and are found in the mitochondrial matrix; and (2) NDAP+-dependent isocitratedehydrogenases, such as IDH1, IDH2, which form homodimers, are involved in NADPH regeneration for anabolic pathways, and can be found in the mitochondrial matrix (IDH2) or cytoplasm/peroxisome (IDH1). It is known that recurrent somatic, (dominant negative) mutations occur at R132 of IDH1 in glioblastoma multiform (GBM: .about.12%) and myeloid leukemia. Without wishing to be bound by a theory, we postulate that the R132 mutation impairs IDH1 homodimer formation, resulting in impaired .alpha.-KG generation, which results in TET family inactivation and consequent tumoriegenesis, as diagrammed in FIG. 38.

Detection of Radiolabled Glucose Added to 5-Hydroxymethylcytosine

[0414] DNA is incubated with alpha-glucosyltransferase or beta-glucosyltransferase in the presence of radiolabeled uridine diphosphate (UDP) glucose, either UDP-14C-glucose or UDP-3H-glucose, and the DNA is purified. If 5-hydroxymethylcytosine is present in the DNA, the radiolabel is isolated with the DNA and detected by liquid scintillation counting or autoradiography or other means. In some embodiments, the DNA is first contacted with one or more catalytically active TET family enzymes, functional TET family derivatives, or TET catalytic fragments to convert 5-methylcytosine to 5-hydroxymethylcytosine.

Detection of Non-Radiolabled Glucose Added to 5-Hydroxymethylcytosine

[0415] Non-radioactive UDP glucose is used as a substrate and the resulting alpha-glucose-5-hydroxymethylcytosine or beta-glucose-5-hydroxymethylcytosine is detected by further chemical reaction or protein binding. Examples of a protein include an antibody or lectin that recognizes alpha-glucose-5-hydroxymethylcytosine or beta-glucose-5-hydroxymethylcytosine or an enzyme, such as hexokinase or beta-glucosyl-alpha-glucosyl-transferase, that adds further modifications to the alpha-glucose-5-hydroxymethylcytosine or beta-glucose-5-hydroxymethylcytosine. In some embodiments, the DNA is first contacted with one or more catalytically active TET family enzymes, functional TET family derivatives, or TET catalytic fragments to convert 5-methylcytosine to 5-hydroxymethylcytosine.

Detection of Methylcytosine and 5-Hydroxymethylcytosine Using Covalent Trapping

[0416] A UDP glucose analog that fosters covalent trapping of the covalent enzyme-DNA intermediate is used as a substrate, such that when DNA is incubated with alpha-glucosyltransferase or beta-glucosyltransferase, any 5-hydroxymethylcytosine containing DNA is tagged with alpha-glucosyltransferase or beta-glucosyltransferase. The DNA either has naturally occurring 5-hydroxymethylcytosine residues or is contacted with one or more catalytically active TET family enzymes, functional TET family derivatives, or TET catalytic fragments to convert 5-methylcytosine to 5-hydroxymethylcytosine. Also, the alpha-glucosyltransferase or beta-glucosyltransferase are created with one or more protein or non-protein tags to facilitate detection or isolation of the covalently linked enzyme-DNA complexes.

Modification and Detection of Methylcytosine and 5-Hydroxymethylcytosine

[0417] Naturally-occurring 5-hydroxymethylcytosine or that created by conversion of 5-methylcytosine in nucleic acids, such as DNA, is converted to glucose-5-hydroxymethylcytosine with alpha-glucosyltransferase or beta-glucosyltransferase and is further glycosylated using beta-glucosyl-alpha-glucosyl-transferase. The beta-glucosyl-alpha-glucosyl-transferase adds radioactively labeled glucose in UDPG to glucose-5-hydroxymethylcytosine. Alternatively, beta-glucosyl-alpha-glucosyl-transferase is used with substrates other than UDPG, such as UDP-2-deoxy-2-fluoro-glucose, to covalently trap the enzyme with its substrates. This will allow tagging of methylcytosine or 5-hydroxymethylcytosine with a protein. Beta-glucosyl-alpha-glucosyl-transferase is also created with several protein or non-protein tags to facilitate detection or isolation of the covalently linked beta-glucosyl-alpha-glucosyl-transferase glucose-5-hydroxymethylcytosine DNA complex.

[0418] The gentibiosyl (gentiobiosyl) residue in gentibiose-containing 5-hydroxymethylcytosine, which results from addition of a second glucose to glucose-5-hydroxymethylcytosine DNA by beta-glucosyl-alpha-glucosyl-transferase is detected using non-covalent methods. Detection methods include exploiting the binding of gentibiosyl residues to proteins with an affinity for this residue, such as (1) antibodies specific to gentibiose-containing 5-hydroxymethylcytosine or (2) lectins with affinity to gentibiosyl, such as Musa acuminata lectin (BanLec).

[0419] Lectins and antibodies further modified with several tags such as biotin or beads are used for solid-phase purification of gentibiose-containing 5-hydroxymethylcytosine containing DNA. Lectins and antibodies modified with gold or fluorescent tags are used for electron microscopic or immunofluorescent detection, respectively, of gentibiose-containing 5-hydroxymethylcytosine containing DNA.

[0420] If desired, covalent linkages of glucose and gentibiosyl modifications to gentibiose-containing 5-hydroxymethylcytosine and glucose-containing 5-hydroxymethylcytosine are reversed by chemical means or by enzymes such as alpha- and beta-glucosidases, thus liberating the 5-hydroxymethylcytosine containing DNA for further downstream applications. One example of these methods is shown in FIG. 4.

[0421] To detect 5-hydroxymethylcytosine, the 5-hydroxymethyl residue of 5-hydroxymethylcytosine is converted to the 5-hydroxymethylenesulfonate residue by sodium hydrogen sulfite, and then detected with antibodies to the modified residue.

[0422] Downstream applications that utilize the covalently and non-covalently tagged methylcytosine and 5-hydroxymethylcytosine include: (i) detection of methylcytosine and 5-hydroxymethylcytosine in cells or tissues directly by fluorescence or electron microscopy; (ii) detection of methylcytosine and 5-hydroxymethylcytosine by assays including blotting or linked enzyme mediated substrate conversion with radioactive, colorimetric, luminescent or fluorescent detection and (iii) separation of the tagged DNA away from untagged DNA by enzymatic, chemical or mechanical treatments, and fractionation of either the tagged or untagged DNA by precipitation with beads, magnetic means, fluorescent sorting, or other means; followed by application to whole genome analyses such as microarray hybridization and high-throughput sequencing

Diagnostic Methods for Assessing Global Methylcytosine and 5-hydroxymethylcytosine Levels

[0423] Global level of methylcytosine and/or 5-hydroxymethylcytosine, i.e., the "methylome" or "hydroxymethylome" signatures in diseased tissue samples, such as bone marrow from patients with MDS, MPD, AML, are assessed to aid in disease diagnosis of disease to permits disease classifications, risk stratify patients, direct therapy, and monitor responses to therapy.

Genetic Tests for Methylcytosine and 5-Hydroxymethylcytosine Levels

[0424] Levels of methylcytosine and/or 5-hydroxymethylcytosine are determined in cells from family members of people affected with a disease, to determine whether they might harbor the disease. 5-hydroxymethylcytosine levels are determined, in a non-limiting example, in the CD34+ hematopoietic cells of a family member of someone with MDS, MPD, AML to determine whether there is a familial predisposition.

Kits and Methods for Detection of Methylcytosine and 5-Hydroxymethylcytosine in Genomes

[0425] Whole genomic DNA is mixed with control DNA, and sheared to a desired size (average around 200 bp). The DNA is subjected to one or more catalytically active TET family enzymes, functional TET family derivatives, or TET catalytic fragments mediated conversion of methylcytosine to 5-hydroxymethylcytosine in the appropriate buffer. DNA is purified on spin column. 5-hydroxymethylcytosine converted DNA is then treated simultaneously with alpha-glucosyltransferase or beta-glucosyltransferase and beta-glucosyl-alpha-glucosyl-transferase enzyme in a UDPG containing buffer. DNA is purified on spin column. Biotinylated BanLec is rocked with gentibiose-containing 5-hydroxymethylcytosine converted DNA. Streptavidin agarose beads will be added. Streptavidin-biotin-BanLec-gentibiose-containing 5-hydroxymethylcytosin-containing DNA complexes are precipitated and washed in buffer, and supernatant containing unmethylated cytosine containing DNA is saved for analysis. The beads are treated with methyl-alpha-mannoside to release the lectin, and glucosidases to cleave the gentiobiosyl residue, and solute is purified over DNA spin column. The purified DNA is subjected to further analysis, such as microarray, direct sequencing, or PCR based assays.

[0426] An internal standard of lambda DNA carrying cytosine methylation at BamHI residues is used to determine efficiency and specificity of 5-hydroxymethylcytosine detection using PCR primer pairs flanking and not flanking BamHI residues in the lambda genome.

[0427] The detection of naturally occurring 5-hydroxymethylcytosine in genomes is performed the same as above but without the conversion of methylcytosine to 5-hydroxymethylcytosine by one or more catalytically active TET family enzymes, functional TET family derivatives, or TET catalytic fragments.

[0428] The kit components comprise: one or more catalytically active TET family enzymes, functional TET family derivatives, or TET catalytic fragments; one or more alpha glucosyltransferases, beta-glucosyltransferases, or beta-glucosyl-alpha-glucosyl-transferases; biotinylated BanLec; streptavidin agarose beads; methyl-alpha-mannoside; alpha-glucosidase and beta-glucosidase; appropriate buffers, substrate solutions, and DNA purification spin columns and an internal standard further comprising lambda DNA cytosine methylated with BamHI methyltransferase and PCR primers.

[0429] The present invention can be defined in any of the following numbered paragraphs:

1. A method for improving the generation of stable human Foxp3+ T cells, the method comprising contacting with or delivering to a human T cell an effective 5-methylcytosine to 5-hydroxymethylcytosine converting amount of at least one catalytically active TET family enzyme, functional TET family derivative, TET catalytically active fragment, or combination thereof. 2. The method of paragraph 1, wherein the catalytically active TET family enzyme is selected from the group consisting of TET1, TET2, TET3, and CXXC4. 3. The method of paragraph 1, wherein the human T cell is a purified human CD4+ T cell. 4. The method of paragraph 1, further comprising generating stable human Foxp3+ T cells by contacting the human T cell with a composition at least one cytokine, growth factor, or activating reagent. 5. The method of paragraph 5, wherein said composition comprises TGF-0. 6. A method for improving efficiency or rate with which induced pluripotent stem (iPS) cells are produced from somatic cells, the method comprising contacting with, or delivering to, a somatic cell an effective 5-methylcytosine to 5-hydroxymethylcytosine converting amount of at least one catalytically active TET family enzyme, functional TET family derivative, TET catalytically active thereof, or combination thereof. 7. The method of paragraph 6, wherein the catalytically active TET family enzyme is selected from the group consisting of TET1, TET2, TET3, and CXXC4. 8. The method of paragraph 6, wherein the catalytically active TET family enzyme is TET1 or TET2. 9. The method of paragraph 6, further comprising contact with or delivering to the somatic cell an effective amount of a TET family inhibitor. 10. The method of paragraph 9, wherein the TET family inhibitor is a TET3 inhibitor. 11. The method of paragraph 6, further comprising inducing iPS cell production by contacting the adult somatic cell with or delivering to said adult somatic cell a combination of nucleic acid sequences encoding Oct-4, Sox2, c-MYC, and Klf4. 12. The method of paragraph 11, wherein the combination of nucleic acid sequences encoding Oct-4, Sox2, c-MYC, and Klf4 are delivered in a viral vector, selected from the group consisting of an adenoviral vector, a lentiviral vector, and a retroviral vector. 13. The method of paragraph 6, wherein the somatic cell is a fibroblast. 14. A method for improving efficiency of cloning mammals by nuclear transfer or nuclear transplantation, the method comprising contacting a nucleus extracted from a cell to be cloned with an effective 5-methylcytosine to 5-hydroxymethylcytosine hydroxylating amount of at least one catalytically active TET family enzyme, functional TET family derivative, TET catalytically active fragment, or combination thereof, during a nuclear transfer protocol. 15. The method of paragraph 14, wherein the catalytically active TET family enzyme is selected from the group consisting of TET1, TET2, TET3, and CXXC4. 16. The method of paragraph 14, wherein the catalytically active TET family enzyme is TET1 or TET2. 17. The method of paragraph 14, further comprising contact with or delivering to the somatic cell an effective amount of a TET family inhibitor. 18. The method of paragraph 17, wherein the TET family inhibitor is a TET3 inhibitor. 19. A method for detecting a 5-hydroxymethylcytosine nucleotide in a biological sample, the method comprising contacting a biological sample with a detectably labeled antibody or an antigen binding portion thereof, a labeled intrabody, or a labeled protein, that specifically binds to 5-hydroxymethylcytosine, and detecting the amount of bound label, wherein the presence of the bound label is indicative of the 5-methylcytosine being converted to 5-hydroxymethylcytosine. 20. A kit for modulating gene transcription via hydroxylation of 5-methylcytosine to 5-hydroxymethylcytosine, the kit comprising the following separate components: (a) at least one or more catalytically active TET family enzyme, functional TET family derivative, TET catalytically active fragment, or combination thereof, or nucleic acid molecule that comprises a sequence encoding at least one catalytically active TET family enzyme, functional TET family derivative, TET catalytically active fragment, or combination thereof, in an appropriate buffer or solution; and (b) packaging materials and instructions therein to use said kit to hydroxylate 5-methylcytosine to 5-hydroxymethylcytosine, for the purposes of modulating gene transcription. 21. The kit of paragraph 20, wherein the catalytically active TET family enzymes are selected from the group consisting of TET1, TET2, TET3, and CXXC4. 22. The kit of paragraph 20, further comprising at least one cytokine, growth factor, activating reagent, or combination thereof, for the purposes of generating stable human Foxp3+ regulatory T cells. 23. The kit of paragraph 22, wherein the composition comprises TGF-0. 24. The kit of paragraph 20, further comprising at least one nucleic acid sequence encoding Oct-4, Sox2, c-MYC, and Klf4, to be contacted with or delivered to a somatic cell for the purposes of improving the efficiency and rate of induced pluripotent stem cell production. 25. The kit of paragraph 24, wherein the nucleic acid sequences encoding Oct-4, Sox2, c-MYC, and Klf4 are delivered in a viral vector selected from the group consisting of an adenoviral vector, a lentiviral vector, and a retroviral vector. 26. The kit of paragraph 20, further comprising at least one reagent suitable for the detection of 5-hydroxymethylcytosine. 27. The kit of paragraph 26, wherein the reagent suitable for the detection of 5-hydroxymethylcytosine is an antibody or an antigen-binding portion thereof, an intrabody, or a protein, that specifically binds to 5-hydroxymethylcytosine. 28. The kit of paragraph 26, wherein said reagent suitable for the detection of 5-hydroxymethylcytosine is specific for cytosine-5-methylsulfonate. 29. A method for improving stem cell therapies, the method comprising contacting with, or delivering to, a stem cell an effective 5-methylcytosine to 5-hydroxymethylcytosine converting amount of at least one catalytically active TET family enzyme, functional TET family derivative, TET catalytically active fragment thereof, or combination thereof, or at least one nucleic acid molecule that comprises a sequence encoding at least one catalytically active TET family enzyme, functional TET family derivative, TET catalytically active fragment, or combination thereof. 30. The method of paragraph 29, wherein the catalytically active TET family enzyme is selected from the group consisting of TET1, TET2, TET3, and CXXC4. 31. A method for treating an individual with or at risk for cancer, the method comprising administering to an individual with or at risk for cancer an effective amount of an agent that specifically modulates hydroxylase activity of at least one catalytically active TET family enzyme, functional TET family derivative, TET catalytically active fragment, or combination thereof involved in transforming 5-methylcytosine into 5-hydroxymethylcytosine. 32. The method of paragraph 31, wherein the catalytically active TET family enzyme is selected from the group consisting of TET1, TET2, TET3, and CXXC4. 33. The method of paragraph 31, wherein the agent that specifically modulates hydroxylase activity is an inhibitor. 34. The method of paragraph 31, wherein the agent that specifically modulates hydroxylase activity is an activator. 35. The method of paragraph 31, wherein the cancer is a leukemia. 36. The method of paragraph 35, wherein the leukemia is an acute myeloid leukemia comprising the t(10:11)(q22:q23) Mixed Lineage Leukemia translocation of TET1. 37. A method for screening for an agent with TET family enzyme modulating activity, the method comprising the steps of: a) providing a cell comprising at least one TET family enzyme, functional TET family derivative, TET catalytically active fragment, recombinant TET family enzyme, or combination thereof; b) contacting said cell with a test agent, thereby creating a test sample; and c) comparing the relative levels of 5-hydroxymethylated cytosine in cells expressing the catalytically active TET family enzyme, functional TET family derivative, TET catalytically active fragment, recombinant TET family enzyme, or combination thereof, in the test sample with the level expressed in a control sample; and (d) determining whether or not the test agent increases or decreases the level of 5-hydroxymethylated cytosine, wherein a statistically significant decrease in the level of 5-hydroxymethylated cytosine indicates the agent is an inhibitor, and a statistically significant increase in the level of 5-hydroxymethylated cytosine indicates the agent is an activator. 38. The method of paragraph 37, wherein the catalytically active TET family enzyme is selected from the group consisting of TET1, TET2, TET3, and CXXC4. 39. The method of any of the preceding paragraphs, wherein the functional TET family derivative comprises SEQ ID NO: 1. 40. The method of any of the preceding paragraphs, wherein the TET family catalytically active fragment comprises SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, or SEQ ID NO: 5. 41. A method for covalent tagging 5-hydroxymethylcytosine in a nucleic acid, the method comprising contacting a nucleic acid molecule with an enzyme that adds one or more glucose molecules to a 5-hydroxymethylcytosine residue to generate glucosylated-5-hydroxymethylcytosine or gentibiose-containing-5-hydroxymethylcytosine, wherein the enzyme is an alpha-glucosyltransferase, a beta-glucosyltransferase, or a beta-glucosyl-alpha-glucosyl-transferase. 42. The method of paragraph 41, wherein the 5-hydroxymethylcytosine is naturally occurring. 43. The method of paragraph 41, further comprising the step of first contacting said nucleic acid with at least one catalytically active TET family enzyme, functional TET family derivative, TET catalytically active fragment thereof, or combination thereof, thereby converting 5-methylcytosine to hydroxymethylcytosine. 44. The method of paragraph 41, wherein the alpha-glucosyltransferase is encoded by a bacteriophage selected from the group consisting of T2, T4, and T6 bacteriophages. 45. The method of paragraph 41, wherein the beta-glucosyltransferase is encoded by a bacteriophage selected from T4 bacteriophages. 46. The method of paragraph 41, wherein the beta-glucosyl-alpha-glucosyl-transferase is encoded by a bacteriophage selected from the group consisting of T2 and T6 bacteriophages. 47. The method of paragraph 41, wherein the nucleic acid is contacted in vitro, in a cell, or in vivo. 48. A method for detecting 5-hydroxymethylcytosine in a nucleic acid, the method comprising contacting a nucleic acid with an enzyme that utilizes labeled glucose or glucose-derivative donor substrates to add at least one labeled glucose molecules or glucose-derivatives to a 5-hydroxymethylcytosine residue to generate glucosylated-5-hydroxymethylcytosine or gentibiose-containing-5-hydroxymethylcytosine, wherein the enzyme is an alpha-glucosyltransferase, a beta-glucosyltransferase, or a beta-glucosyl-alpha-glucosyl-transferase. 49. The method of paragraph 48, wherein the glucose or glucose-derivative donor substrate is a uridine diphosphate glucose. 50. The method of paragraph 48, wherein the labeled glucose or glucose-derivative donor substrates is radioactively labeled. 51. The method of paragraph 50, wherein the radioactive label is .sup.14C or .sup.3H. 52. The method of paragraph 48, wherein the 5-hydroxymethylcytosine is naturally occurring. 53. The method of paragraph 53, further comprising the step of first contacting said nucleic acid with at least one catalytically active TET family enzyme, functional TET family derivative, TET catalytically active fragment, or combination thereof, thereby converting 5-methylcytosine to 5-hydroxymethylcytosine. 54. The method of paragraph 48, wherein the alpha-glucosyltransferase is encoded by a bacteriophage selected from the group consisting of T2, T4, and T6 bacteriophages. 55. The method of paragraph 48, wherein the beta-glucosyltransferase is encoded by a bacteriophage selected from T4 bacteriophages. 56. The method of paragraph 48, wherein the beta-glucosyl-alpha-glucosyl-transferase is encoded by a bacteriophage selected from the group consisting of T2 and T6 bacteriophages. 57. The method of paragraph 48, wherein the nucleic acid is contacted in vitro, in a cell, or in vivo. 58. A method for detecting 5-hydroxymethylcytosine in a nucleic acid, the method comprising contacting the covalently tagged 5-hydroxymethylcytosine of claim 41 with a protein that recognizes a glucose molecule, glucose-derivative or gentibiosyl molecule. 59. The method of paragraph 58, wherein the protein recognizes only the glucose molecule, glucose-derivative, or gentibiosyl. 60. The method of paragraph 58, wherein the protein recognizes the glucose molecule, glucose-derivative, or gentibiosyl only in the context of 5-hydroxymethylcytosine. 61. The method of paragraph 58, wherein the protein is a lectin. 62. The method of paragraph 61, wherein the lectin is Musa acuminata lectin. 63. The method of paragraph 58, wherein the protein is a antibody or antigen-binding fragment thereof. 64. The method of paragraph 63, wherein the antibody or antigen-binding fragment thereof is modified with a tag. 65. The method of paragraph 64, wherein the tag is a biotin molecule, a bead, a gold particle, or a fluorescent molecule. 66. The method of paragraph 58, wherein the protein is an enzyme. 67. The method of paragraph 66, wherein the enzyme is hexokinase or beta-glucosyl-alpha-glucosyl-transferase. 68. A method for detecting 5-hydroxymethylcytosine in a nucleic acid, the method comprising contacting a nucleic acid with an enzyme and utilizing glucose or glucose-derivative donor substrates that trap covalent enzyme-DNA intermediates to detect 5-hydroxymethylcytosine residues, wherein the enzyme is an alpha-glucosyltransferase, a beta-glucosyltransferase, or a beta-glucosyl-alpha-glucosyl-transferase. 69. The method of paragraph 68, wherein the glucose donor substrate is a uridine diphosphate glucose analog. 70. The method of paragraph 69, wherein the uridine diphosphate glucose analog is undine-2-deoxy-2-fluoro-glucose. 71. The method of paragraph 68, wherein the 5-hydroxymethylcytosine is naturally occurring. 72. The method of paragraph 68, further comprising the step of first contacting said nucleic acid with at least one catalytically active TET family enzyme, functional TET family derivative, TET catalytically active fragment, or combination thereof, thereby converting 5-methylcytosine to 5-hydroxymethylcytosine. 73. The method of paragraph 68, wherein the enzyme is tagged. 74. The method of paragraph 68, wherein the alpha-glucosyltransferase is encoded by a bacteriophage selected from the group consisting of T2, T4, and T6 bacteriophages.

75. The method of paragraph 68, wherein the beta-glucosyltransferase is encoded by a bacteriophage selected from T4 bacteriophages. 76. The method of paragraph 68, wherein the beta-glucosyl-alpha-glucosyl-transferase is encoded by a bacteriophage selected from the group consisting of T2 and T6 bacteriophages. 77. The method of paragraph 68, wherein the nucleic acid is contacted in vitro, in a cell, or in vivo. 78. An method to detect 5-hydroxymethylcytosine in a nucleic acid, the method comprising contacting a nucleic acid with sodium hydrogen sulfite to convert a 5-hydroxymethylcytosine residue in a nucleic acid to a cytosine-5-methylsulfonate, and contacting the sodium hydrogen sulfite contacted nucleic acid with a protein specific for cytosine-5-methylsulfonate. 79. The method of paragraph 78, wherein the protein is an antibody or antigen-binding fragment thereof, an enzyme, or an intrabody. 80. The method of paragraph 79, wherein the antibody comprises an antiserum. 81. The method of paragraph 79, wherein the antibody or antigen-binding fragment thereof, enzyme, or intrabody is modified with a tag. 82. The method of paragraph 81, wherein the tag is a biotin molecule, a bead, a gold particle, or a fluorescent molecule. 83. The method of paragraph 78, further comprising isolating the 5-hydroxymethylcytosine residue containing nucleic acid with the protein specific for cytosine-5-methylsulfonate 84. The method of paragraph 78, wherein the nucleic acid is in vitro, in a cell, or in vivo. 85. A kit for the detection and purification of methylcytosine and 5-hydroxymethylcytosine, the kit comprising: (a) one or more catalytically active TET family enzymes, functional TET family derivatives, or TET catalytically active fragments thereof for the conversion of methylcytosine to 5-hydroxymethylcytosine; (b) one or more enzymes encoded by bacteriophages of the "T even" family; (c) one or more glucose or glucose-derivative donor substrates; (d) one or more proteins to detect glucose or glucose-derivative modified nucleotides; (e) standard DNA purification columns, buffers, and substrate solutions; and (f) packaging materials and instructions therein to use said kits. 86. The kit of paragraph 85, wherein the enzyme encoded by bacteriophages of the "T even" family is selected from the group consisting of alpha-glucosyltransferases, beta-glucosyltransferases, and beta-glucosyl-alpha-glucosyl-transferases. 87. The kit of paragraph 86, wherein the alpha-glucosyltransferase is encoded by a bacteriophage selected from the group consisting of T2, T4, and T6 bacteriophages. 88. The kit of paragraph 86, wherein the beta-glucosyltransferase is encoded by a bacteriophage selected from T4 bacteriophages. 89. The kit of paragraph 86, wherein the beta-glucosyl-alpha-glucosyl-transferase is encoded by a bacteriophage selected from the group consisting of T2 and T6 bacteriophages. 90. The kit of paragraph 85, wherein the glucose or glucose-derivative donor substrate is uridine diphosphate glucose (UDPG). 91. The kit of paragraph 90, wherein the glucose or glucose-derivative donor substrate is radiolabeled. 92. The kit of paragraph 91, wherein the uridine diphosphate glucose is radiolabeled with 14C or 3H. 93. The kit of paragraph 85, wherein the protein that detects glucose or glucose-derivative modified nucleotides is selected from a group comprising a lectin, an antibody or antigen-binding fragment thereof, or an enzyme. 94. The kit of paragraph 85, wherein the protein recognizes only the glucose or glucose-derivative. 95. The kit of paragraph 85, wherein the protein recognizes the glucose or glucose-derivative only in the context of 5-hydroxymethylcytosine. 96. The kit of paragraph 93, wherein the antibody or antigen-binding fragment thereof is modified with at least one tag. 97. The kit of paragraph 96, wherein the tag is a biotin molecule, a bead, a gold particle, or a fluorescent molecule. 98. The kit of paragraph 93, wherein the enzyme is a hexokinase or a beta-glucosyl-alpha-glucosyl-transferase. 99. The kit of paragraph 93, wherein the lectin is Musa acuminata lectin (BanLec). 100. The kit of paragraph 99, wherein the lectin is modified with a gold particle or fluorescent tag. 101. A method for diagnosing a myelodysplastic syndrome, a myeloproliferative disorder, acute myelogenous leukemia, systemic mastocytosis, or chronic myelomonocytic leukemia in an individual in need thereof, the method comprising the steps of (i) determining a level of 5-methylcytsosine, 5-hydroxymethylcytsosine, or a combination thereof, in a tissue or cell sample from an individual in need thereof, and (ii) comparing the level of 5-methylcytsosine, 5-hydroxymethylcytsosine, or a combination thereof in the tissue or cell sample from the individual with a level of 5-methylcytsosine, 5-hydroxymethylcytsosine, or a combination thereof, from a normal control sample, wherein a difference in the level of 5-methylcytsosine, 5-hydroxymethylcytsosine, or a combination thereof, between the sample from the individual in need and the normal control sample is indicative of the individual having a myelodysplastic syndrome, a myeloproliferative disorder, acute myelogenous leukemia, systemic mastocytosis, or chronic myelomonocytic leukemia. 102. The method of paragraph 101, further comprising a step of comparing the level of 5-methylcytsosine, 5-hydroxymethylcytsosine, or a combination thereof, in a tissue or cell sample of the individual to a level of 5-methylcytsosine, 5-hydroxymethylcytsosine, or a combination thereof, in at least one sample from a diseased tissue or a diseased cell, wherein if the level of 5-methylcytsosine, 5-hydroxymethylcytsosine, or a combination thereof, in the tissue or cell sample from the individual in need is similar to the level of 5-methylcytsosine, 5-hydroxymethylcytsosine, or a combination thereof, from at least one of the samples from the diseased tissue or diseased cell then the individual is diagnosed with a myelodysplastic syndrome, a myeloproliferative disorder, acute myelogenous leukemia, systemic mastocytosis, or chronic myelomonocytic leukemia. 103. A method for monitoring a disease progression or an effect of a therapy on a myelodysplastic syndrome, a myeloproliferative disorder, acute myelogenous leukemia, systemic mastocytosis, or chronic myelomonocytic leukemia, the method comprising the steps of (i) determining a level of 5-methylcytsosine, 5-hydroxymethylcytsosine, or a combination thereof, in a tissue or a cell sample from an individual in need thereof and establishing a baseline level of 5-methylcytsosine, 5-hydroxymethylcytsosine, or a combination thereof, in the tissue or cell sample from the individual; (ii) determining a level of 5-methylcytsosine, 5-hydroxymethylcytsosine, or a combination thereof, in a tissue or cell sample from the individual at least one time following the establishment of the baseline level of 5-methylcytsosine, 5-hydroxymethylcytsosine, or a combination thereof, thereby establishing at least one follow-up level of 5-methylcytsosine, 5-hydroxymethylcytsosine, or a combination thereof, wherein a difference in the follow-up level of 5-methylcytsosine, 5-hydroxymethylcytsosine, or a combination thereof, relative to the baseline level of 5-methylcytsosine, 5-hydroxymethylcytsosine, or a combination thereof, is indicative of the progression of, or effect of a therapy on, a myelodysplastic syndrome, a myeloproliferative disorder, acute myelogenous leukemia, systemic mastocytosis, or chronic myelomonocytic leukemia in the individual. 104. A method for determining familial predisposition to a myelodysplastic syndrome, a myeloproliferative disorder, acute myelogenous leukemia, systemic mastocytosis, or chronic myelomonocytic leukemia in an individual in need thereof, the method comprising (i) determining a level of 5-methylcytsosine, 5-hydroxymethylcytsosine, or a combination thereof in CD34+ cells from an individual in need thereof, (ii) determining a level of 5-methylcytsosine, 5-hydroxymethylcytsosine, or a combination thereof, in CD34+ cells from a family member of the individual, wherein the family member is affected with a myelodysplastic syndrome, a myeloproliferative disorder, acute myelogenous leukemia, systemic mastocytosis, or chronic myelomonocytic leukemia, and (iii) comparing the level of 5-methylcytsosine, 5-hydroxymethylcytsosine, or a combination thereof in the CD34+ cells from the individual in need thereof with the level of 5-methylcytsosine, 5-hydroxymethylcytsosine, or a combination thereof, in the CD34+ cells from the affected family member, wherein an increase in the level of 5-methylcytsosine, 5-hydroxymethylcytsosine, or a combination thereof, in the individual relative to the 5-methylcytsosine, 5-hydroxymethylcytsosine, or a combination thereof level in the affected family member is indicative of the individual being predisposed to a myelodysplastic syndrome, a myeloproliferative disorder, acute myelogenous leukemia, systemic mastocytosis, or chronic myelomonocytic leukemia. 105. A method for determining familial predisposition to a myelodysplastic syndrome, a myeloproliferative disorder, acute myelogenous leukemia, systemic mastocytosis, or chronic myelomonocytic leukemia in an individual in need thereof, the method comprising (i) determining a level of 5-methylcytsosine, 5-hydroxymethylcytsosine, or a combination thereof in CD34+ cells from an individual in need thereof, (ii) determining a level of 5-methylcytsosine, 5-hydroxymethylcytsosine, or a combination thereof, in CD34+ cells from a family member of the individual, wherein the family member is affected with a myelodysplastic syndrome, a myeloproliferative disorder, acute myelogenous leukemia, systemic mastocytosis, or chronic myelomonocytic leukemia, and (iii) comparing the level of 5-methylcytsosine, 5-hydroxymethylcytsosine, or a combination thereof in the CD34+ cells from the individual in need thereof with the level of 5-methylcytsosine, 5-hydroxymethylcytsosine, or a combination thereof, in the CD34+ cells from the affected family member, wherein a decrease in the level of 5-methylcytsosine, 5-hydroxymethylcytsosine, or a combination thereof, in the individual relative to the 5-methylcytsosine, 5-hydroxymethylcytsosine, or a combination thereof level in the affected family member is indicative of the individual being predisposed to a myelodysplastic syndrome, a myeloproliferative disorder, acute myelogenous leukemia, systemic mastocytosis, or chronic myelomonocytic leukemia. 106. The method as in any of paragraphs 101-105, wherein the 5-methylcytsosine, 5-hydroxymethylcytsosine, or a combination thereof, level is determined using an assay to detect cytosine-5-methylsulfonate 107. A kit for the detection and purification of 5-hydroxymethylcytosine, the kit comprising: (a) at least one catalytically active TET family enzyme, functional TET family derivative, TET catalytically active fragment, or combination thereof for the conversion of 5-methylcytosine to 5-hydroxymethylcytosine; (b) sodium bisulfite; (c) at least one protein to detect sodium bisulfite treated nucleotides; (e) standard DNA purification columns, buffers, and substrate solutions; and (f) packaging materials and instructions therein to use said kits. 108. The kit of paragraph 107, wherein the protein that recognizes sodium bisulfite treated nucleotide is specific for cytosine-5-methylsulfonate. 109. The kit of paragraph 107, wherein the protein that detects sodium bisulfite treated nucleotides is an antibody or antigen-binding fragment thereof, an intrabody, or an enzyme. 110. The kit of paragraph 107, wherein the antibody or antigen-binding fragment thereof, intrabody, or enzyme is modified with at least one tag. 111. The kit of paragraph 110, wherein the tag is a biotin molecule, a bead, a gold particle, or a fluorescent molecule. 112. The use of at least one catalytically active TET family enzyme, functional TET family derivative, TET catalytically active fragment, or combination thereof, in the manufacture of a medicament for improving the generation of stable human Foxp3+ T cells, wherein an effective amount of o at least one catalytically active TET family enzyme, functional TET family derivative, TET catalytically active fragment, or combination thereof, is contacted with, or delivered to, a human T cell to improve the generation of stable human Foxp3+ T cells. 113. The use of paragraph 112, wherein the human T cell is a purified human CD4+ T cell. 114. The use of paragraph 112, further comprising generating stable human Foxp3+ T cells by contacting the human T cell with a composition comprising at least one cytokine, growth factor, or activating reagent. 115. The use of paragraph 114, wherein said composition comprises TGF-0. 116. The use of at least one catalytically active TET family enzyme, functional TET family derivative, TET catalytically active fragment, or combination thereof, in the manufacture of a medicament for improving efficiency or rate with which an induced pluripotent stem (iPS) cell is produced from a somatic cell, wherein an effective amount of at least one catalytically active TET family enzyme, functional TET family derivative, TET catalytically active fragment, or combination thereof, is contacted with, or delivered to, a somatic cell to improve the efficiency or rate with which an induced pluripotent stem (iPS) cell is produced. 117. The use of paragraph 116, further comprising inducing iPS cell production by contacting with or delivering to the somatic cell at least one of a nucleic acid sequence encoding Oct-4, Sox2, c-MYC, or Klf4, or a combination thereof. 118. The use of paragraph 117, wherein the at least one nucleic acid sequence encoding Oct-4, Sox2, c-MYC, or Klf4 is delivered in a viral vector, selected from the group consisting of an adenoviral vector, a lentiviral vector, and a retroviral vector. 119. The use of paragraph 116, further comprising contacting with, or delivering to, a somatic cell an effective amount of a TET family inhibitor. 120. The use of paragraph 119, wherein the TET family inhibitor is a TET3 inhibitor. 121. The use of paragraph 138, wherein the adult somatic cell is a fibroblast. 122. The use of at least one catalytically active TET family enzyme, functional TET family derivative, TET catalytically active fragment, or combination thereof, in the manufacture of a medicament for improving efficiency of cloning mammals by nuclear transfer or nuclear transplantation, wherein an effective 5-methylcytosine to 5-hydroxymethylcytosine hydroxylating amount of at least one catalytically active TET family enzyme, functional TET family derivative, TET catalytically active fragment, or combination thereof, is contacted with a nucleus extracted from a cell to be cloned during a nuclear transfer protocol. 123. The use of paragraph 122, further comprising contacting a nucleus extracted from a cell to be cloned during a nuclear transfer protocol with an effective amount of a TET family inhibitor.

124. The use of paragraph 123, wherein the TET family inhibitor is a TET3 inhibitor. 125. The use of a detectably labeled antibody or a antigen-binding portion thereof, a labeled intrabody, or a labeled protein, that specifically binds to 5-hydroxymethylcytosine for detecting a 5-hydroxymethylcytosine nucleotide in a sample, wherein the presence of the bound label is indicative of the presence of 5-hydroxymethylcytosine in the sample. 126. The use of at least one catalytically active TET family enzyme, functional TET family derivative, TET catalytically active fragment, or combination thereof, or at least one nucleic acid molecule encoding at least one catalytically active TET family enzyme, functional TET family derivative, TET catalytically active fragment, or combination thereof, in the manufacture of a medicament for improving stem cell therapies, wherein an effective amount of at least one catalytically active TET family enzyme, functional TET family derivative, TET catalytically active fragment, or combination thereof, or at least one nucleic acid molecule encoding at least one catalytically active TET family enzyme, functional TET family derivative, TET catalytically active fragment, or combination thereof, is contacted with, or delivered to, a stem cell for improving stem cell therapies. 127. The use of an agent that specifically modulates hydroxylase activity of a at least one catalytically active TET family enzyme, functional TET family derivative, TET catalytically active fragment, or combination thereof, involved in transforming 5-methylcytosine into 5-hydroxymethylcytosine in the manufacture of a medicament for treating an individual with or at risk for cancer. 128. The use of paragraph 127, wherein the agent that specifically modulates hydroxylase activity is an inhibitor. 129. The use of paragraph 127, wherein the agent that specifically modulates hydroxylase activity is an activator. 130. The use of paragraph 127, wherein the cancer is a myelodysplastic syndrome, a myeloproliferative disorder, acute myelogenous leukemia, systemic mastocytosis, or chronic myelomonocytic leukemia 131. The use of paragraph 127, wherein the cancer is a leukemia. 132. The use of paragraph 131, wherein the leukemia is an acute myeloid leukemia comprising the t(10:11)(q22:q23) Mixed Lineage Leukemia translocation of TET1. 133. The use as in any one of paragraphs 112, 116, 122, 126, or 127, wherein the catalytically active TET family enzyme is selected from the group consisting of TET1, TET2, TET3, and CXXC4. 134. The use as in any one of paragraphs 112, 116, 122, 126, or 127, wherein the functional TET family derivative comprises SEQ ID NO: 1. 135. The use as in any one of paragraphs 112, 116, 122, 126, or 127, wherein the TET family catalytically active fragment comprises SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, or SEQ ID NO: 5. 136. The use of an enzyme that adds one or more glucose molecules to a 5-hydroxymethylcytosine residue in a nucleic acid for covalent tagging 5-hydroxymethylcytosine to generate glucosylated-5-hydroxymethylcytosine or gentibiose-containing-5-hydroxymethylcytosine, wherein the enzyme is an alpha-glucosyltransferase, a beta-glucosyltransferase, or a beta-glucosyl-alpha-glucosyl-transferase. 137. The use of paragraph 136, wherein the 5-hydroxymethylcytosine is naturally occurring. 138. The use of paragraph 136, further comprising the step of first contacting said nucleic acid with a at least one catalytically active TET family enzyme, functional TET family derivative, TET catalytically active fragment, or combination thereof, thereby converting 5-methylcytosine to hydroxymethylcytosine. 139. The use of an enzyme that utilizes labeled glucose or glucose-derivative donor substrates to add one or more labeled glucose molecules or glucose-derivatives to a 5-hydroxymethylcytosine residue in a nucleic acid to generate glucosylated-5-hydroxymethylcytosine or gentibiose-containing-5-hydroxymethylcytosine for detecting 5-hydroxymethylcytosine, wherein the enzyme is an alpha-glucosyltransferase, a beta-glucosyltransferase, or a beta-glucosyl-alpha-glucosyl-transferase. 140. The use of paragraph 139, wherein the glucose or glucose-derivative donor substrate is a uridine diphosphate glucose. 141. The use of paragraph 139, wherein the labeled glucose or glucose-derivative donor substrate is radioactively labeled. 142. The use of paragraph 141, wherein the radioactive label is .sup.14C or .sup.3H. 143. The use of paragraph 139, wherein the 5-hydroxymethylcytosine is naturally occurring. 144. The use of paragraph 139, further comprising the step of first contacting said nucleic acid with at least one catalytically active TET family enzyme, functional TET family derivative, TET catalytically active fragment, or combination thereof, thereby converting 5-methylcytosine to 5-hydroxymethylcytosine. 145. The use of a protein that recognizes a glucose molecule, glucose-derivative or gentibiosyl molecule for detecting the covalently tagged 5-hydroxymethylcytosine of paragraph 136. 146. The use of paragraph 145, wherein the protein recognizes only the glucose molecule, glucose-derivative, or gentibiosyl. 147. The use of paragraph 145, wherein the protein recognizes the glucose molecule, glucose-derivative, or gentibiosyl only in the context of 5-hydroxymethylcytosine. 148. The use of paragraph 145, wherein the protein is a lectin. 149. The use of paragraph 148, wherein the lectin is Musa acuminata lectin. 150. The use of paragraph 145, wherein the protein is a antibody or antibody fragment thereof. 151. The use of paragraph 150, wherein the antibody or antibody fragment thereof is modified with a tag. 152. The use of paragraph 170, wherein the tag is a biotin molecule, a bead, a gold particle, or a fluorescent molecule. 153. The use of paragraph 145, wherein the protein is an enzyme. 154. The use of paragraph 153, wherein the enzyme is a hexokinase or beta-glucosyl-alpha-glucosyl-transferase. 155. The use of an enzyme and a glucose or glucose-derivative donor substrate for trapping covalent enzyme-DNA intermediates to detect a 5-hydroxymethylcytosine residue in a nucleic acid, wherein the enzyme is an alpha-glucosyltransferase, a beta-glucosyltransferase, or a beta-glucosyl-alpha-glucosyl-transferase. 156. The use of paragraph 155, wherein the glucose donor substrate is a uridine diphosphate glucose analog. 157. The use of paragraph 156, wherein the uridine diphosphate glucose analog is uridine-2-deoxy-2-fluoro-glucose. 158. The use of paragraph 155, wherein the 5-hydroxymethylcytosine is naturally occurring. 159. The use of paragraph 155, further comprising the step of first contacting said nucleic acid with at least one catalytically active TET family enzyme, functional TET family derivative, TET catalytically active fragment, or combination thereof, thereby converting 5-methylcytosine to 5-hydroxymethylcytosine. 160. The use of paragraph 155, wherein the enzyme is tagged. 161. The use of an assay to detect 5-hydroxymethylcytosine in a nucleic acid, the assay comprising contacting a nucleic acid with sodium hydrogen sulfite to convert a 5-hydroxymethylcytosine residue in the nucleic acid to cytosine-5-methylsulfonate, and contacting the sodium hydrogen sulfite contacted nucleic acid with a protein specific for cytosine-5-methylsulfonate. 162. The use of paragraph 161, wherein the protein is an antibody or antigen-binding fragment thereof, an enzyme, or an intrabody. 163. The use of paragraph 162, wherein the antibody comprises an antiserum. 164. The use of paragraph 162, wherein the antibody or antigen-binding fragment thereof, enzyme, or intrabody is modified with a tag. 165. The use of paragraph 164, wherein the tag is a biotin molecule, a bead, a gold particle, or a fluorescent molecule. 166. The use as in any one of paragraphs 136, 139, or 155, wherein the alpha-glucosyltransferase is encoded by a bacteriophage selected from the group consisting of T2, T4, and T6 bacteriophages. 167. The use as in any one of paragraphs 136, 139, or 155, wherein the beta-glucosyltransferase is encoded by a bacteriophage selected from T4 bacteriophages. 168. The use as in any one of paragraphs 136, 139, or 155, wherein the beta-glucosyl-alpha-glucosyl-transferase is encoded by a bacteriophage selected from the group consisting of T2 and T6 bacteriophages. 169. The use as in any one of paragraphs 136, 139, 155, or 161, wherein the nucleic acid is contacted in vitro, in a cell, or in vivo.

REFERENCES

[0430] The references cited herein and throughout the specification and examples are herein incorporated by reference in their entirety. [0431] 1. R. B. Lorsbach et al., Leukemia 17, 637 (March, 2003). [0432] 2. R. Ono et al., Cancer Res 62, 4075 (Jul. 15, 2002). [0433] 3. F. Delhommeau et al., Blood 112, lba-3 (November, 2008). [0434] 4. F. Viguie et al., Leukemia 19, 1411 (August, 2005). [0435] 5. C. Bogani et al., Stem Cells 26, 1920 (August, 2008). [0436] 6. G. Leone, M. T. Voso, L. Teofili, M. Lubbert, Clin Immunol 109, 89 (October, 2003). [0437] 7. L. Teofili et al., Int J Cancer 123, 1586 (Oct. 1, 2008). [0438] 8. S. R. Kornberg, S. B. Zimmerman, A. Kornberg, J Biol Chem 236, 1487 (May, 1961). [0439] 9. M. Winkler, W. Ruger, Nucleic Acids Res 21, 1500 (Mar. 25, 1993). [0440] 10. S. Kuno, I. R. Lehman, J Biol Chem 237, 1266 (April, 1962). [0441] 11. H. Hayatsu, M. Shiragami, Biochemistry 18, 632 (Feb. 20, 1979). [0442] 12. D. Zilberman, S. Henikoff, Development 134, 3959 (November, 2007). [0443] 13. L. Lariviere, N. Sommer, S. Morera, J Mol Biol 352, 139 (Sep. 9, 2005). [0444] 14. L. Lariviere, V. Gueguen-Chaignon, S. Morera, J Mol Biol 330, 1077 (Jul. 25, 2003). [0445] 15. J. Wicki, D. R. Rose, S. G. Withers, Methods Enzymol 354, 84 (2002). [0446] 16. I. J. Goldstein et al., Eur J Biochem 268, 2616 (May, 2001).

Sequence CWU 1

1

102124PRTHomo sapiensMOD_RES(4)..(4)Leu, Ile or ValMOD_RES(7)..(7)Any amino acidMOD_RES(11)..(11)Leu, Ile or ValMOD_RES(17)..(17)Arg or LysMOD_RES(18)..(18)Any amino acidMOD_RES(20)..(20)Leu, Ile or Val 1Gly Val Ala Xaa Ala Pro Xaa His Gly Ser Xaa Leu Ile Glu Cys Ala 1 5 10 15 Xaa Xaa Glu Xaa His Ala Thr Thr 20 2719PRTHomo sapiens 2Glu Leu Pro Thr Cys Ser Cys Leu Asp Arg Val Ile Gln Lys Asp Lys 1 5 10 15 Gly Pro Tyr Tyr Thr His Leu Gly Ala Gly Pro Ser Val Ala Ala Val 20 25 30 Arg Glu Ile Met Glu Asn Arg Tyr Gly Gln Lys Gly Asn Ala Ile Arg 35 40 45 Ile Glu Ile Val Val Tyr Thr Gly Lys Glu Gly Lys Ser Ser His Gly 50 55 60 Cys Pro Ile Ala Lys Trp Val Leu Arg Arg Ser Ser Asp Glu Glu Lys 65 70 75 80 Val Leu Cys Leu Val Arg Gln Arg Thr Gly His His Cys Pro Thr Ala 85 90 95 Val Met Val Val Leu Ile Met Val Trp Asp Gly Ile Pro Leu Pro Met 100 105 110 Ala Asp Arg Leu Tyr Thr Glu Leu Thr Glu Asn Leu Lys Ser Tyr Asn 115 120 125 Gly His Pro Thr Asp Arg Arg Cys Thr Leu Asn Glu Asn Arg Thr Cys 130 135 140 Thr Cys Gln Gly Ile Asp Pro Glu Thr Cys Gly Ala Ser Phe Ser Phe 145 150 155 160 Gly Cys Ser Trp Ser Met Tyr Phe Asn Gly Cys Lys Phe Gly Arg Ser 165 170 175 Pro Ser Pro Arg Arg Phe Arg Ile Asp Pro Ser Ser Pro Leu His Glu 180 185 190 Lys Asn Leu Glu Asp Asn Leu Gln Ser Leu Ala Thr Arg Leu Ala Pro 195 200 205 Ile Tyr Lys Gln Tyr Ala Pro Val Ala Tyr Gln Asn Gln Val Glu Tyr 210 215 220 Glu Asn Val Ala Arg Glu Cys Arg Leu Gly Ser Lys Glu Gly Arg Pro 225 230 235 240 Phe Ser Gly Val Thr Ala Cys Leu Asp Phe Cys Ala His Pro His Arg 245 250 255 Asp Ile His Asn Met Asn Asn Gly Ser Thr Val Val Cys Thr Leu Thr 260 265 270 Arg Glu Asp Asn Arg Ser Leu Gly Val Ile Pro Gln Asp Glu Gln Leu 275 280 285 His Val Leu Pro Leu Tyr Lys Leu Ser Asp Thr Asp Glu Phe Gly Ser 290 295 300 Lys Glu Gly Met Glu Ala Lys Ile Lys Ser Gly Ala Ile Glu Val Leu 305 310 315 320 Ala Pro Arg Arg Lys Lys Arg Thr Cys Phe Thr Gln Pro Val Pro Arg 325 330 335 Ser Gly Lys Lys Arg Ala Ala Met Met Thr Glu Val Leu Ala His Lys 340 345 350 Ile Arg Ala Val Glu Lys Lys Pro Ile Pro Arg Ile Lys Arg Lys Asn 355 360 365 Asn Ser Thr Thr Thr Asn Asn Ser Lys Pro Ser Ser Leu Pro Thr Leu 370 375 380 Gly Ser Asn Thr Glu Thr Val Gln Pro Glu Val Lys Ser Glu Thr Glu 385 390 395 400 Pro His Phe Ile Leu Lys Ser Ser Asp Asn Thr Lys Thr Tyr Ser Leu 405 410 415 Met Pro Ser Ala Pro His Pro Val Lys Glu Ala Ser Pro Gly Phe Ser 420 425 430 Trp Ser Pro Lys Thr Ala Ser Ala Thr Pro Ala Pro Leu Lys Asn Asp 435 440 445 Ala Thr Ala Ser Cys Gly Phe Ser Glu Arg Ser Ser Thr Pro His Cys 450 455 460 Thr Met Pro Ser Gly Arg Leu Ser Gly Ala Asn Ala Ala Ala Ala Asp 465 470 475 480 Gly Pro Gly Ile Ser Gln Leu Gly Glu Val Ala Pro Leu Pro Thr Leu 485 490 495 Ser Ala Pro Val Met Glu Pro Leu Ile Asn Ser Glu Pro Ser Thr Gly 500 505 510 Val Thr Glu Pro Leu Thr Pro His Gln Pro Asn His Gln Pro Ser Phe 515 520 525 Leu Thr Ser Pro Gln Asp Leu Ala Ser Ser Pro Met Glu Glu Asp Glu 530 535 540 Gln His Ser Glu Ala Asp Glu Pro Pro Ser Asp Glu Pro Leu Ser Asp 545 550 555 560 Asp Pro Leu Ser Pro Ala Glu Glu Lys Leu Pro His Ile Asp Glu Tyr 565 570 575 Trp Ser Asp Ser Glu His Ile Phe Leu Asp Ala Asn Ile Gly Gly Val 580 585 590 Ala Ile Ala Pro Ala His Gly Ser Val Leu Ile Glu Cys Ala Arg Arg 595 600 605 Glu Leu His Ala Thr Thr Pro Val Glu His Pro Asn Arg Asn His Pro 610 615 620 Thr Arg Leu Ser Leu Val Phe Tyr Gln His Lys Asn Leu Asn Lys Pro 625 630 635 640 Gln His Gly Phe Glu Leu Asn Lys Ile Lys Phe Glu Ala Lys Glu Ala 645 650 655 Lys Asn Lys Lys Met Lys Ala Ser Glu Gln Lys Asp Gln Ala Ala Asn 660 665 670 Glu Gly Pro Glu Gln Ser Ser Glu Val Asn Glu Leu Asn Gln Ile Pro 675 680 685 Ser His Lys Ala Leu Thr Leu Thr His Asp Asn Val Val Thr Val Ser 690 695 700 Pro Tyr Ala Leu Thr His Val Ala Gly Pro Tyr Asn His Trp Val 705 710 715 3879PRTArtificial SequenceDescription of Artificial Sequence Synthetic polypeptideMOD_RES(3)..(3)Any amino acidMOD_RES(5)..(5)Any amino acidMOD_RES(9)..(9)Any amino acidMOD_RES(12)..(12)Any amino acidMOD_RES(15)..(15)Any amino acidMOD_RES(17)..(17)Any amino acidMOD_RES(19)..(20)Any amino acidMOD_RES(22)..(26)Any amino acidMOD_RES(28)..(28)Any amino acidMOD_RES(30)..(30)Any amino acidMOD_RES(33)..(45)Any amino acidMOD_RES(49)..(49)Any amino acidMOD_RES(54)..(54)Any amino acidMOD_RES(56)..(835)Any amino acid and this region may encompass 0 to 780 residuesMOD_RES(839)..(839)Any amino acidMOD_RES(842)..(842)Any amino acidMOD_RES(846)..(846)Any amino acidMOD_RES(852)..(853)Any amino acidMOD_RES(855)..(855)Any amino acidMOD_RES(860)..(870)Any amino acidMOD_RES(872)..(872)Any amino acidMOD_RES(876)..(876)Any amino acid 3Pro Phe Xaa Gly Xaa Thr Ala Cys Xaa Asp Phe Xaa Ala His Xaa His 1 5 10 15 Xaa Asp Xaa Xaa Asn Xaa Xaa Xaa Xaa Xaa Thr Xaa Val Xaa Thr Leu 20 25 30 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Asp Glu Gln 35 40 45 Xaa His Val Leu Pro Xaa Tyr Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 50 55 60 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 65 70 75 80 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 85 90 95 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 100 105 110 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 115 120 125 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 130 135 140 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 145 150 155 160 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 165 170 175 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 180 185 190 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 195 200 205 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 210 215 220 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 225 230 235 240 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 245 250 255 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 260 265 270 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 275 280 285 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 290 295 300 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 305 310 315 320 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 325 330 335 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 340 345 350 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 355 360 365 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 370 375 380 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 385 390 395 400 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 405 410 415 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 420 425 430 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 435 440 445 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 450 455 460 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 465 470 475 480 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 485 490 495 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 500 505 510 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 515 520 525 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 530 535 540 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 545 550 555 560 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 565 570 575 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 580 585 590 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 595 600 605 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 610 615 620 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 625 630 635 640 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 645 650 655 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 660 665 670 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 675 680 685 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 690 695 700 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 705 710 715 720 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 725 730 735 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 740 745 750 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 755 760 765 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 770 775 780 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 785 790 795 800 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 805 810 815 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 820 825 830 Xaa Xaa Xaa Gly Val Ala Xaa Ala Pro Xaa His Gly Ser Xaa Leu Ile 835 840 845 Glu Cys Ala Xaa Xaa Glu Xaa His Ala Thr Thr Xaa Xaa Xaa Xaa Xaa 850 855 860 Xaa Xaa Xaa Xaa Xaa Xaa Arg Xaa Ser Leu Val Xaa Tyr Gln His 865 870 875 4878PRTArtificial SequenceDescription of Artificial Sequence Synthetic polypeptideMOD_RES(3)..(3)Any amino acidMOD_RES(5)..(5)Any amino acidMOD_RES(9)..(9)Any amino acidMOD_RES(12)..(12)Any amino acidMOD_RES(15)..(15)Any amino acidMOD_RES(17)..(17)Any amino acidMOD_RES(19)..(20)Any amino acidMOD_RES(22)..(26)Any amino acidMOD_RES(28)..(28)Any amino acidMOD_RES(30)..(30)Any amino acidMOD_RES(33)..(44)Any amino acidMOD_RES(48)..(48)Any amino acidMOD_RES(53)..(53)Any amino acidMOD_RES(55)..(834)Any amino acid and this region may encompass 0 to 780 residuesMOD_RES(838)..(838)Any amino acidMOD_RES(841)..(841)Any amino acidMOD_RES(845)..(845)Any amino acidMOD_RES(851)..(852)Any amino acidMOD_RES(854)..(854)Any amino acidMOD_RES(859)..(869)Any amino acidMOD_RES(871)..(871)Any amino acidMOD_RES(875)..(875)Any amino acid 4Pro Phe Xaa Gly Xaa Thr Ala Cys Xaa Asp Phe Xaa Ala His Xaa His 1 5 10 15 Xaa Asp Xaa Xaa Asn Xaa Xaa Xaa Xaa Xaa Thr Xaa Val Xaa Thr Leu 20 25 30 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Asp Glu Gln Xaa 35 40 45 His Val Leu Pro Xaa Tyr Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 50 55 60 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 65 70 75 80 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 85 90 95 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 100 105 110 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 115 120 125 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 130 135 140 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 145 150 155 160 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 165 170 175 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 180 185 190 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 195 200 205 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 210 215 220 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 225 230 235 240 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 245 250 255 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 260 265 270 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 275 280 285 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 290 295 300 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 305 310 315 320 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 325 330 335 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 340 345 350 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 355 360 365 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 370 375 380 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 385 390 395 400 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 405 410 415 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa

Xaa 420 425 430 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 435 440 445 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 450 455 460 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 465 470 475 480 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 485 490 495 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 500 505 510 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 515 520 525 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 530 535 540 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 545 550 555 560 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 565 570 575 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 580 585 590 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 595 600 605 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 610 615 620 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 625 630 635 640 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 645 650 655 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 660 665 670 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 675 680 685 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 690 695 700 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 705 710 715 720 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 725 730 735 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 740 745 750 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 755 760 765 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 770 775 780 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 785 790 795 800 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 805 810 815 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 820 825 830 Xaa Xaa Gly Val Ala Xaa Ala Pro Xaa His Gly Ser Xaa Leu Ile Glu 835 840 845 Cys Ala Xaa Xaa Glu Xaa His Ala Thr Thr Xaa Xaa Xaa Xaa Xaa Xaa 850 855 860 Xaa Xaa Xaa Xaa Xaa Arg Xaa Ser Leu Val Xaa Tyr Gln His 865 870 875 5887PRTArtificial SequenceDescription of Artificial Sequence Synthetic polypeptideMOD_RES(3)..(3)Any amino acidMOD_RES(5)..(5)Any amino acidMOD_RES(9)..(9)Any amino acidMOD_RES(12)..(13)Any amino acidMOD_RES(15)..(15)Any amino acidMOD_RES(17)..(17)Any amino acidMOD_RES(19)..(20)Any amino acidMOD_RES(22)..(32)Any amino acid and this region may encopass 2 to 11 residuesMOD_RES(34)..(34)Any amino acidMOD_RES(36)..(36)Any amino acidMOD_RES(39)..(51)Any amino acid and this region may encompass 9 to 13 residuesMOD_RES(55)..(55)Any amino acidMOD_RES(60)..(60)Any amino acidMOD_RES(62)..(841)Any amino acid and this region may encompass 0 to 780 residuesMOD_RES(845)..(845)Any amino acidMOD_RES(848)..(848)Any amino acidMOD_RES(852)..(852)Any amino acidMOD_RES(858)..(859)Any amino acidMOD_RES(861)..(861)Any amino acidMOD_RES(866)..(878)Any amino acid and this region may encompass 5 to 13 residuesMOD_RES(880)..(880)Any amino acidMOD_RES(884)..(884)Any amino acid 5Pro Phe Xaa Gly Xaa Thr Ala Cys Xaa Asp Phe Xaa Xaa His Xaa His 1 5 10 15 Xaa Asp Xaa Xaa Asn Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 20 25 30 Thr Xaa Val Xaa Thr Leu Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 35 40 45 Xaa Xaa Xaa Asp Glu Gln Xaa His Val Leu Pro Xaa Tyr Xaa Xaa Xaa 50 55 60 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 65 70 75 80 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 85 90 95 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 100 105 110 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 115 120 125 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 130 135 140 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 145 150 155 160 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 165 170 175 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 180 185 190 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 195 200 205 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 210 215 220 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 225 230 235 240 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 245 250 255 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 260 265 270 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 275 280 285 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 290 295 300 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 305 310 315 320 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 325 330 335 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 340 345 350 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 355 360 365 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 370 375 380 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 385 390 395 400 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 405 410 415 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 420 425 430 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 435 440 445 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 450 455 460 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 465 470 475 480 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 485 490 495 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 500 505 510 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 515 520 525 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 530 535 540 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 545 550 555 560 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 565 570 575 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 580 585 590 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 595 600 605 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 610 615 620 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 625 630 635 640 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 645 650 655 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 660 665 670 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 675 680 685 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 690 695 700 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 705 710 715 720 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 725 730 735 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 740 745 750 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 755 760 765 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 770 775 780 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 785 790 795 800 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 805 810 815 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 820 825 830 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Gly Val Ala Xaa Ala Pro Xaa 835 840 845 His Gly Ser Xaa Leu Ile Glu Cys Ala Xaa Xaa Glu Xaa His Ala Thr 850 855 860 Thr Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Arg Xaa 865 870 875 880 Ser Leu Val Xaa Tyr Gln His 885 61776PRTHomo sapiens 6Met Ser Gln Phe Gln Val Pro Leu Ala Val Gln Pro Asp Leu Pro Gly 1 5 10 15 Leu Tyr Asp Phe Pro Gln Arg Gln Val Met Val Gly Ser Phe Pro Gly 20 25 30 Ser Gly Leu Ser Met Ala Gly Ser Glu Ser Gln Leu Arg Gly Gly Gly 35 40 45 Asp Gly Arg Lys Lys Arg Lys Arg Cys Gly Thr Cys Glu Pro Cys Arg 50 55 60 Arg Leu Glu Asn Cys Gly Ala Cys Thr Ser Cys Thr Asn Arg Arg Thr 65 70 75 80 His Gln Ile Cys Lys Leu Arg Lys Cys Glu Val Leu Lys Lys Lys Val 85 90 95 Gly Leu Leu Lys Glu Thr Gly Ser Glu Leu Ser Pro Val Asp Gly Pro 100 105 110 Val Pro Gly Gln Met Asp Ser Gly Pro Val Tyr His Gly Asp Ser Arg 115 120 125 Gln Leu Ser Ala Ser Gly Val Pro Val Asn Gly Ala Arg Glu Pro Ala 130 135 140 Gly Pro Ser Leu Leu Gly Thr Gly Gly Pro Trp Arg Val Asp Gln Lys 145 150 155 160 Pro Asp Trp Glu Ala Ala Pro Gly Pro Ala His Thr Ala Arg Leu Glu 165 170 175 Asp Ala His Asp Leu Val Ala Phe Ser Ala Val Ala Glu Ala Val Ser 180 185 190 Ser Tyr Gly Ala Leu Ser Thr Arg Leu Tyr Glu Thr Phe Asn Arg Glu 195 200 205 Met Ser Arg Glu Ala Gly Asn Asn Ser Arg Gly Pro Arg Pro Gly Pro 210 215 220 Glu Gly Cys Ser Ala Gly Ser Glu Asp Leu Asp Thr Leu Gln Thr Ala 225 230 235 240 Leu Ala Leu Ala Arg His Gly Met Lys Pro Pro Asn Cys Asn Cys Asp 245 250 255 Gly Pro Glu Cys Pro Asp Tyr Leu Glu Trp Leu Glu Gly Lys Ile Lys 260 265 270 Ser Val Val Met Glu Gly Gly Glu Glu Arg Pro Arg Leu Pro Gly Pro 275 280 285 Leu Pro Pro Gly Glu Ala Gly Leu Pro Ala Pro Ser Thr Arg Pro Leu 290 295 300 Leu Ser Ser Glu Val Pro Gln Ile Ser Pro Gln Glu Gly Leu Pro Leu 305 310 315 320 Ser Gln Ser Ala Leu Ser Ile Ala Lys Glu Lys Asn Ile Ser Leu Gln 325 330 335 Thr Ala Ile Ala Ile Glu Ala Leu Thr Gln Leu Ser Ser Ala Leu Pro 340 345 350 Gln Pro Ser His Ser Thr Pro Gln Ala Ser Cys Pro Leu Pro Glu Ala 355 360 365 Leu Ser Pro Pro Ala Pro Phe Arg Ser Pro Gln Ser Tyr Leu Arg Ala 370 375 380 Pro Ser Trp Pro Val Val Pro Pro Glu Glu His Ser Ser Phe Ala Pro 385 390 395 400 Asp Ser Ser Ala Phe Pro Pro Ala Thr Pro Arg Thr Glu Phe Pro Glu 405 410 415 Ala Trp Gly Thr Asp Thr Pro Pro Ala Thr Pro Arg Ser Ser Trp Pro 420 425 430 Met Pro Arg Pro Ser Pro Asp Pro Met Ala Glu Leu Glu Gln Leu Leu 435 440 445 Gly Ser Ala Ser Asp Tyr Ile Gln Ser Val Phe Lys Arg Pro Glu Ala 450 455 460 Leu Pro Thr Lys Pro Lys Val Lys Val Glu Ala Pro Ser Ser Ser Pro 465 470 475 480 Ala Pro Ala Pro Ser Pro Val Leu Gln Arg Glu Ala Pro Thr Pro Ser 485 490 495 Ser Glu Pro Asp Thr His Gln Lys Ala Gln Thr Ala Leu Gln Gln His 500 505 510 Leu His His Lys Arg Ser Leu Phe Leu Glu Gln Val His Asp Thr Ser 515 520 525 Phe Pro Ala Pro Ser Glu Pro Ser Ala Pro Gly Trp Trp Pro Pro Pro 530 535 540 Ser Ser Pro Val Pro Arg Leu Pro Asp Arg Pro Pro Lys Glu Lys Lys 545 550 555 560 Lys Lys Leu Pro Thr Pro Ala Gly Gly Pro Val Gly Thr Glu Lys Ala 565 570 575 Ala Pro Gly Ile Lys Pro Ser Val Arg Lys Pro Ile Gln Ile Lys Lys 580 585 590 Ser Arg Pro Arg Glu Ala Gln Pro Leu Phe Pro Pro Val Arg Gln Ile 595 600 605 Val Leu Glu Gly Leu Arg Ser Pro Ala Ser Gln Glu Val Gln Ala His 610 615 620 Pro Pro Ala Pro Leu Pro Ala Ser Gln Gly Ser Ala Val Pro Leu Pro 625 630 635 640 Pro Glu Pro Ser Leu Ala Leu Phe Ala Pro Ser Pro Ser Arg Asp Ser 645 650 655 Leu Leu Pro Pro Thr Gln Glu Met Arg Ser Pro Ser Pro Met Thr Ala 660 665 670 Leu Gln Pro Gly Ser Thr Gly Pro Leu Pro Pro Ala Asp Asp Lys Leu 675 680 685 Glu Glu Leu Ile Arg Gln Phe Glu Ala Glu Phe Gly Asp Ser Phe Gly 690 695 700 Leu Pro Gly Pro Pro Ser Val Pro Ile Gln Asp Pro Glu Asn Gln Gln 705 710 715 720 Thr Cys Leu Pro Ala Pro Glu Ser Pro Phe Ala Thr Arg Ser Pro Lys 725 730 735 Gln Ile Lys Ile Glu Ser Ser Gly Ala Val Thr Val Leu Ser Thr Thr 740 745 750 Cys Phe His Ser Glu Glu Gly Gly Gln Glu Ala Thr Pro Thr Lys Ala 755 760 765 Glu Asn Pro Leu Thr Pro Thr Leu Ser Gly Phe Leu Glu Ser Pro Leu 770 775 780 Lys Tyr Leu Asp Thr Pro Thr Lys Ser Leu Leu Asp Thr Pro Ala Lys 785 790 795 800 Arg Ala Gln Ala Glu Phe Pro Thr Cys Asp Cys Val Glu Gln Ile Val 805 810 815 Glu Lys Asp Glu Gly Pro Tyr Tyr Thr His Leu Gly Ser Gly Pro Thr 820

825 830 Val Ala Ser Ile Arg Glu Leu Met Glu Glu Arg Tyr Gly Glu Lys Gly 835 840 845 Lys Ala Ile Arg Ile Glu Lys Val Ile Tyr Thr Gly Lys Glu Gly Lys 850 855 860 Ser Ser Arg Gly Cys Pro Ile Ala Lys Trp Val Ile Arg Arg His Thr 865 870 875 880 Leu Glu Glu Lys Leu Leu Cys Leu Val Arg His Arg Ala Gly His His 885 890 895 Cys Gln Asn Ala Val Ile Val Ile Leu Ile Leu Ala Trp Glu Gly Ile 900 905 910 Pro Arg Ser Leu Gly Asp Thr Leu Tyr Gln Glu Leu Thr Asp Thr Leu 915 920 925 Arg Lys Tyr Gly Asn Pro Thr Ser Arg Arg Cys Gly Leu Asn Asp Asp 930 935 940 Arg Thr Cys Ala Cys Gln Gly Lys Asp Pro Asn Thr Cys Gly Ala Ser 945 950 955 960 Phe Ser Phe Gly Cys Ser Trp Ser Met Tyr Phe Asn Gly Cys Lys Tyr 965 970 975 Ala Arg Ser Lys Thr Pro Arg Lys Phe Arg Leu Ala Gly Asp Asn Pro 980 985 990 Lys Glu Glu Glu Val Leu Arg Lys Ser Phe Gln Asp Leu Ala Thr Glu 995 1000 1005 Val Ala Pro Leu Tyr Lys Arg Leu Ala Pro Gln Ala Tyr Gln Asn 1010 1015 1020 Gln Val Thr Asn Glu Glu Ile Ala Ile Asp Cys Arg Leu Gly Leu 1025 1030 1035 Lys Glu Gly Arg Pro Phe Ala Gly Val Thr Ala Cys Met Asp Phe 1040 1045 1050 Cys Ala His Ala His Lys Asp Gln His Asn Leu Tyr Asn Gly Cys 1055 1060 1065 Thr Val Val Cys Thr Leu Thr Lys Glu Asp Asn Arg Cys Val Gly 1070 1075 1080 Lys Ile Pro Glu Asp Glu Gln Leu His Val Leu Pro Leu Tyr Lys 1085 1090 1095 Met Ala Asn Thr Asp Glu Phe Gly Ser Glu Glu Asn Gln Asn Ala 1100 1105 1110 Lys Val Gly Ser Gly Ala Ile Gln Val Leu Thr Ala Phe Pro Arg 1115 1120 1125 Glu Val Arg Arg Leu Pro Glu Pro Ala Lys Ser Cys Arg Gln Arg 1130 1135 1140 Gln Leu Glu Ala Arg Lys Ala Ala Ala Glu Lys Lys Lys Ile Gln 1145 1150 1155 Lys Glu Lys Leu Ser Thr Pro Glu Lys Ile Lys Gln Glu Ala Leu 1160 1165 1170 Glu Leu Ala Gly Ile Thr Ser Asp Pro Gly Leu Ser Leu Lys Gly 1175 1180 1185 Gly Leu Ser Gln Gln Gly Leu Lys Pro Ser Leu Lys Val Glu Pro 1190 1195 1200 Gln Asn His Phe Ser Ser Phe Lys Tyr Ser Gly Asn Ala Val Val 1205 1210 1215 Glu Ser Tyr Ser Val Leu Gly Asn Cys Arg Pro Ser Asp Pro Tyr 1220 1225 1230 Ser Met Asn Ser Val Tyr Ser Tyr His Ser Tyr Tyr Ala Gln Pro 1235 1240 1245 Ser Leu Thr Ser Val Asn Gly Phe His Ser Lys Tyr Ala Leu Pro 1250 1255 1260 Ser Phe Ser Tyr Tyr Gly Phe Pro Ser Ser Asn Pro Val Phe Pro 1265 1270 1275 Ser Gln Phe Leu Gly Pro Gly Ala Trp Gly His Ser Gly Ser Ser 1280 1285 1290 Gly Ser Phe Glu Lys Lys Pro Asp Leu His Ala Leu His Asn Ser 1295 1300 1305 Leu Ser Pro Ala Tyr Gly Gly Ala Glu Phe Ala Glu Leu Pro Ser 1310 1315 1320 Gln Ala Val Pro Thr Asp Ala His His Pro Thr Pro His His Gln 1325 1330 1335 Gln Pro Ala Tyr Pro Gly Pro Lys Glu Tyr Leu Leu Pro Lys Ala 1340 1345 1350 Pro Leu Leu His Ser Val Ser Arg Asp Pro Ser Pro Phe Ala Gln 1355 1360 1365 Ser Ser Asn Cys Tyr Asn Arg Ser Ile Lys Gln Glu Pro Val Asp 1370 1375 1380 Pro Leu Thr Gln Ala Glu Pro Val Pro Arg Asp Ala Gly Lys Met 1385 1390 1395 Gly Lys Thr Pro Leu Ser Glu Val Ser Gln Asn Gly Gly Pro Ser 1400 1405 1410 His Leu Trp Gly Gln Tyr Ser Gly Gly Pro Ser Met Ser Pro Lys 1415 1420 1425 Arg Thr Asn Gly Val Gly Gly Ser Trp Gly Val Phe Ser Ser Gly 1430 1435 1440 Glu Ser Pro Ala Ile Val Pro Asp Lys Leu Ser Ser Phe Gly Ala 1445 1450 1455 Ser Cys Leu Ala Pro Ser His Phe Thr Asp Gly Gln Trp Gly Leu 1460 1465 1470 Phe Pro Gly Glu Gly Gln Gln Ala Ala Ser His Ser Gly Gly Arg 1475 1480 1485 Leu Arg Gly Lys Pro Trp Ser Pro Cys Lys Phe Gly Asn Ser Thr 1490 1495 1500 Ser Ala Leu Ala Gly Pro Ser Leu Thr Glu Lys Pro Trp Ala Leu 1505 1510 1515 Gly Ala Gly Asp Phe Asn Ser Ala Leu Lys Gly Ser Pro Gly Phe 1520 1525 1530 Gln Asp Lys Leu Trp Asn Pro Met Lys Gly Glu Glu Gly Arg Ile 1535 1540 1545 Pro Ala Ala Gly Ala Ser Gln Leu Asp Arg Ala Trp Gln Ser Phe 1550 1555 1560 Gly Leu Pro Leu Gly Ser Ser Glu Lys Leu Phe Gly Ala Leu Lys 1565 1570 1575 Ser Glu Glu Lys Leu Trp Asp Pro Phe Ser Leu Glu Glu Gly Pro 1580 1585 1590 Ala Glu Glu Pro Pro Ser Lys Gly Ala Val Lys Glu Glu Lys Gly 1595 1600 1605 Gly Gly Gly Ala Glu Glu Glu Glu Glu Glu Leu Trp Ser Asp Ser 1610 1615 1620 Glu His Asn Phe Leu Asp Glu Asn Ile Gly Gly Val Ala Val Ala 1625 1630 1635 Pro Ala His Gly Ser Ile Leu Ile Glu Cys Ala Arg Arg Glu Leu 1640 1645 1650 His Ala Thr Thr Pro Leu Lys Lys Pro Asn Arg Cys His Pro Thr 1655 1660 1665 Arg Ile Ser Leu Val Phe Tyr Gln His Lys Asn Leu Asn Gln Pro 1670 1675 1680 Asn His Gly Leu Ala Leu Trp Glu Ala Lys Met Lys Gln Leu Ala 1685 1690 1695 Glu Arg Ala Arg Ala Arg Gln Glu Glu Ala Ala Arg Leu Gly Leu 1700 1705 1710 Gly Gln Gln Glu Ala Lys Leu Tyr Gly Lys Lys Arg Lys Trp Gly 1715 1720 1725 Gly Thr Val Val Ala Glu Pro Gln Gln Lys Glu Lys Lys Gly Val 1730 1735 1740 Val Pro Thr Arg Gln Ala Leu Ala Val Pro Thr Asp Ser Ala Val 1745 1750 1755 Thr Val Ser Ser Tyr Ala Tyr Thr Lys Val Thr Gly Pro Tyr Ser 1760 1765 1770 Arg Trp Ile 1775 734DNAArtificial SequenceDescription of Artificial Sequence Synthetic primer 7attgtcgtag gttaagtgga ttgtaaggag gtag 34833DNAArtificial SequenceDescription of Artificial Sequence Synthetic primer 8attcactacc actctcctta cttctctttc tcc 33937DNAArtificial SequenceDescription of Artificial Sequence Synthetic primer 9gtgaaatatt gtggtaggtt aagtggattg taaggag 371040DNAArtificial SequenceDescription of Artificial Sequence Synthetic primer 10catcttaatt aacactacca ctctccttac ttctctttct 401124DNAArtificial SequenceDescription of Artificial Sequence Synthetic primer 11gtgaattaag gatttttttg tgtg 241220DNAArtificial SequenceDescription of Artificial Sequence Synthetic primer 12aaaaaacatt tccctacttc 201330DNAArtificial SequenceDescription of Artificial Sequence Synthetic primer 13gttagattat tttagtagag gtatataagt 301424DNAArtificial SequenceDescription of Artificial Sequence Synthetic primer 14accaatcaaa tttctcaact ctat 241527DNAArtificial SequenceDescription of Artificial Sequence Synthetic primer 15tgagaaattt gattggtatt taagttg 271630DNAArtificial SequenceDescription of Artificial Sequence Synthetic primer 16caatcatctc tttaataaca ttaactaacc 30174PRTArtificial SequenceDescription of Artificial Sequence Synthetic peptide 17Asp Ile Arg Leu 1 18201DNAArtificial SequenceDescription of Artificial Sequence Synthetic polynucleotide 18attgtggtag gttaagtgga ttgtaaggag gtaggtgtga tatctgtagc catcgaggaa 60gatttaaata ctggaattcc acaatcagaa ctttagggac caggctctcc gggaccttat 120aacttccaag ggtggtgacg actgtgaagt ggccgcgggg agctctgtgg agaaagagaa 180gtaaggagag tggtagtgaa t 20119158DNAArtificial SequenceDescription of Artificial Sequence Synthetic polynucleotidemodified_base(103)..(106)a, c, g, t, unknown or other 19gtgaaatatt gtggtaggtt aagtggattg taaggaggta ggtgttgtag agatcgagga 60agatttaaat agtggagaat gagaagttta gaagaggatg ttnnnnatgt gttataagag 120aaagagaagt aaggagagtg gtagtgttaa ttaagatg 15820286DNAArtificial SequenceDescription of Artificial Sequence Synthetic polynucleotide 20gtgaattaag gatttttttg tgtgtttttg gttttaggag agttttattt gtgtgattga 60tttgaggttt taaaagtttt tgagtaatat taagaatgtt ttattaggat ttttttttta 120aaaatatttt aaagattttt tttttgtttt tgttggtgaa gttttttagg gaattagaga 180tatgggaaga tgaattggag gtttaagaag tattagagag aggatttgta agaaaagttg 240gggttagatg tgtatttgag tggtatgaag tagggaaatg tttttt 28621221DNAArtificial SequenceDescription of Artificial Sequence Synthetic polynucleotide 21gttagattat tttagtagag gtatataagt tcggtttcgg tatttttgtt tttattggtt 60ggatatttcg tatttttcga gtttttaaaa aygaattaat aggaagagcg gatagcgatt 120tttaacgcgt aagcgtatat ttttttaggt agcgggtagt agtcgtttta gggagggacg 180aagagattta gtaatttata gagttgagaa atttgattgg t 22122282DNAArtificial SequenceDescription of Artificial Sequence Synthetic polynucleotide 22tgagaaattt gattggtatt taagttgttt aattaatagt tgtcgttgaa gggtggggtt 60ggatggcgta agttatagtt gaaggaagaa cgtgagtayg aggtattgag gtgattggtt 120gaaggtattt tcgttgagta tttagacgtt tttttggttt ttttggcgtt aaaatgtcgt 180tcgtggtagg ggttattcgg cggttggacg agatagtggt gaatcgtatc gcggcggggg 240aagttattta gyggttagtt aatgttatta aagagatgat tg 282239601DNAHomo sapiens 23agacactgct gctccggggg gctgacctgg cggggagtgg ccgcgcagtc tgctccggcg 60ccgctttgtg cgcgcagccg ctggcccctc tactcccggg tctgcccccc gggacacccc 120tctgcctcgc ccaagtcatg cagccctacc tgcctctcca ctgtggacct ttgggaaccg 180actcctcacc tcgggggctc gggccttgac tgtgctggga gccggtaggc gtcctccgcg 240acccgcccgc gcccctcgcg cccgccgggg ccccgggctc caaagttgtg gggaccggcg 300cgagttggaa agtttgcccg agggctggtg caggcttgga gctgggggcc gtgcgctgcc 360ctgggaatgt gacccggcca gcgaccaaaa ccttgtgtga ctgagctgaa gagcagtgca 420tccagattct cctcagaagt gagactttcc aaaggaccaa tgactctgtt tcctgcgccc 480tttcattttt tcctactctg tagctatgtc tcgatcccgc catgcaaggc cttccagatt 540agtcaggaag gaagatgtaa acaaaaaaaa gaaaaacagc caactacgaa agacaaccaa 600gggagccaac aaaaatgtgg catcagtcaa gactttaagc cctggaaaat taaagcaatt 660aattcaagaa agagatgtta agaaaaaaac agaacctaaa ccacccgtgc cagtcagaag 720ccttctgaca agagctggag cagcacgcat gaatttggat aggactgagg ttctttttca 780gaacccagag tccttaacct gcaatgggtt tacaatggcg ctacgaagca cctctcttag 840caggcgactc tcccaacccc cactggtcgt agccaaatcc aaaaaggttc cactttctaa 900gggtttagaa aagcaacatg attgtgatta taagatactc cctgctttgg gagtaaagca 960ctcagaaaat gattcggttc caatgcaaga cacccaagtc cttcctgata tagagactct 1020aattggtgta caaaatccct ctttacttaa aggtaagagc caagagacaa ctcagttttg 1080gtcccaaaga gttgaggatt ccaagatcaa tatccctacc cacagtggcc ctgcagctga 1140gatccttcct gggccactgg aagggacacg ctgtggtgaa ggactattct ctgaagagac 1200attgaatgat accagtggtt ccccaaaaat gtttgctcag gacacagtgt gtgctccttt 1260tccccaaaga gcaaccccca aagttacctc tcaaggaaac cccagcattc agttagaaga 1320gttgggttca cgagtagaat ctcttaagtt atctgattct tacctggatc ccattaaaag 1380tgaacatgat tgctacccca cctccagtct taataaggtt atacctgact tgaaccttag 1440aaactgcttg gctcttggtg ggtctacgtc tcctacctct gtaataaaat tcctcttggc 1500aggctcaaaa caagcgaccc ttggtgctaa accagatcat caagaggcct tcgaagctac 1560tgcaaatcaa caggaagttt ctgataccac ctctttccta ggacaggcct ttggtgctat 1620cccacatcaa tgggaacttc ctggtgctga cccagttcat ggtgaggccc tgggtgagac 1680cccagatcta ccagagattc ctggtgctat tccagtccaa ggagaggtct ttggtactat 1740tttagaccaa caagaaactc ttggtatgag tgggagtgtt gtcccagact tgcctgtctt 1800ccttcctgtt cctccaaatc caattgctac ctttaatgct ccttccaaat ggcctgagcc 1860ccaaagcact gtctcatatg gacttgcagt ccagggtgct atacagattt tgcctttggg 1920ctcaggacac actcctcaat catcatcaaa ctcagagaaa aattcattac ctccagtaat 1980ggctataagc aatgtagaaa atgagaagca ggttcatata agcttcctgc cagctaacac 2040tcaggggttc ccattagccc ctgagagagg actcttccat gcttcactgg gtatagccca 2100actctctcag gctggtccta gcaaatcaga cagagggagc tcccaggtca gtgtaaccag 2160cacagttcat gttgtcaaca ccacagtggt gactatgcca gtgccaatgg tcagtacctc 2220ctcttcttcc tataccactt tgctaccgac tttggaaaag aagaaaagaa agcgatgtgg 2280ggtctgtgaa ccctgccagc agaagaccaa ctgtggtgaa tgcacttact gcaagaacag 2340aaagaacagc catcagatct gtaagaaaag aaaatgtgag gagctgaaaa agaaaccatc 2400tgttgttgtg cctctggagg ttataaagga aaacaagagg ccccagaggg aaaagaagcc 2460caaagtttta aaggcagatt ttgacaacaa accagtaaat ggccccaagt cagaatccat 2520ggactacagt agatgtggtc atggggaaga acaaaaattg gaattgaacc cacatactgt 2580tgaaaatgta actaaaaatg aagacagcat gacaggcatc gaggtggaga agtggacaca 2640aaacaagaaa tcacagttaa ctgatcacgt gaaaggagat tttagtgcta atgtcccaga 2700agctgaaaaa tcgaaaaact ctgaagttga caagaaacga accaaatctc caaaattgtt 2760tgtacaaacc gtaagaaatg gcattaaaca tgtacactgt ttaccagctg aaacaaatgt 2820ttcatttaaa aaattcaata ttgaagaatt cggcaagaca ttggaaaaca attcttataa 2880attcctaaaa gacactgcaa accataaaaa cgctatgagc tctgttgcta ctgatatgag 2940ttgtgatcat ctcaagggga gaagtaacgt tttagtattc cagcagcctg gctttaactg 3000cagttccatt ccacattctt cacactccat cataaatcat catgctagta tacacaatga 3060aggtgatcaa ccaaaaactc ctgagaatat accaagtaaa gaaccaaaag atggatctcc 3120cgttcaacca agtctcttat cgttaatgaa agataggaga ttaacattgg agcaagtggt 3180agccatagag gccctgactc aactctcaga agccccatca gagaattcct ccccatcaaa 3240gtcagagaag gatgaggaat cagagcagag aacagccagt ttgcttaata gctgcaaagc 3300tatcctctac actgtaagaa aagacctcca agacccaaac ttacagggag agccaccaaa 3360acttaatcac tgtccatctt tggaaaaaca aagttcatgc aacacggtgg ttttcaatgg 3420gcaaactact accctttcca actcacatat caactcagct actaaccaag catccacaaa 3480gtcacatgaa tattcaaaag tcacaaattc attatctctt tttataccaa aatcaaattc 3540atccaagatt gacaccaata aaagtattgc tcaagggata attactcttg acaattgttc 3600caatgatttg catcagttgc caccaagaaa taatgaagtg gagtattgca accagttact 3660ggacagcagc aaaaaattgg actcagatga tctatcatgt caggatgcaa cccataccca 3720aattgaggaa gatgttgcaa cacagttgac acaacttgct tcgataatta agatcaatta 3780tataaaacca gaggacaaaa aagttgaaag tacaccaaca agccttgtca catgtaatgt 3840acagcaaaaa tacaatcagg agaagggcac aatacaacag aaaccacctt caagtgtaca 3900caataatcat ggttcatcat taacaaaaca aaagaaccca acccagaaaa agacaaaatc 3960caccccatca agagatcggc ggaaaaagaa gcccacagtt gtaagttatc aagaaaatga 4020tcggcagaag tgggaaaagt tgtcctatat gtatggcaca atatgcgaca tttggatagc 4080atcgaaattt caaaattttg ggcaattttg tccacatgat tttcctactg tatttgggaa 4140aatttcttcc tcgaccaaaa tatggaaacc actggctcaa acgaggtcca ttatgcaacc 4200caaaacagta tttccaccac tcactcagat aaaattacag agatatcctg aatcagcaga 4260ggaaaaggtg aaggttgaac cattggattc actcagctta tttcatctta aaacggaatc 4320caacgggaag gcattcactg ataaagctta taattctcag gtacagttaa cggtgaatgc 4380caatcagaaa gcccatcctt tgacccagcc ctcctctcca cctaaccagt gtgctaacgt 4440gatggcaggc gatgaccaaa tacggtttca gcaggttgtt aaggagcaac tcatgcatca 4500gagactgcca acattgcctg gtatctctca tgaaacaccc ttaccggagt cagcactaac 4560tctcaggaat gtaaatgtag tgtgttcagg tggaattaca gtggtttcta ccaaaagtga 4620agaggaagtc tgttcatcca gttttggaac atcagaattt tccacagtgg acagtgcaca 4680gaaaaatttt aatgattatg ccatgaactt ctttactaac cctacaaaaa acctagtgtc 4740tataactaaa gattctgaac tgcccacctg cagctgtctt gatcgagtta tacaaaaaga 4800caaaggccca tattatacac accttggggc aggaccaagt gttgctgctg tcagggaaat 4860catggagaat aggtatggtc aaaaaggaaa cgcaataagg atagaaatag tagtgtacac 4920cggtaaagaa gggaaaagct ctcatgggtg tccaattgct aagtgggttt taagaagaag 4980cagtgatgaa gaaaaagttc tttgtttggt ccggcagcgt acaggccacc actgtccaac 5040tgctgtgatg gtggtgctca tcatggtgtg ggatggcatc cctcttccaa tggccgaccg 5100gctatacaca gagctcacag agaatctaaa gtcatacaat gggcacccta ccgacagaag 5160atgcaccctc aatgaaaatc gtacctgtac atgtcaagga attgatccag agacttgtgg 5220agcttcattc tcttttggct gttcatggag tatgtacttt aatggctgta agtttggtag 5280aagcccaagc cccagaagat ttagaattga tccaagctct cccttacatg aaaaaaacct 5340tgaagataac

ttacagagtt tggctacacg attagctcca atttataagc agtatgctcc 5400agtagcttac caaaatcagg tggaatatga aaatgttgcc cgagaatgtc ggcttggcag 5460caaggaaggt cgtcccttct ctggggtcac tgcttgcctg gacttctgtg ctcatcccca 5520cagggacatt cacaacatga ataatggaag cactgtggtt tgtaccttaa ctcgagaaga 5580taaccgctct ttgggtgtta ttcctcaaga tgagcagctc catgtgctac ctctttataa 5640gctttcagac acagatgagt ttggctccaa ggaaggaatg gaagccaaga tcaaatctgg 5700ggccatcgag gtcctggcac cccgccgcaa aaaaagaacg tgtttcactc agcctgttcc 5760ccgttctgga aagaagaggg ctgcgatgat gacagaggtt cttgcacata agataagggc 5820agtggaaaag aaacctattc cccgaatcaa gcggaagaat aactcaacaa caacaaacaa 5880cagtaagcct tcgtcactgc caaccttagg gagtaacact gagaccgtgc aacctgaagt 5940aaaaagtgaa accgaacccc attttatctt aaaaagttca gacaacacta aaacttattc 6000gctgatgcca tccgctcctc acccagtgaa agaggcatct ccaggcttct cctggtcccc 6060gaagactgct tcagccacac cagctccact gaagaatgac gcaacagcct catgcgggtt 6120ttcagaaaga agcagcactc cccactgtac gatgccttcg ggaagactca gtggtgccaa 6180tgcagctgct gctgatggcc ctggcatttc acagcttggc gaagtggctc ctctccccac 6240cctgtctgct cctgtgatgg agcccctcat taattctgag ccttccactg gtgtgactga 6300gccgctaacg cctcatcagc caaaccacca gccctccttc ctcacctctc ctcaagacct 6360tgcctcttct ccaatggaag aagatgagca gcattctgaa gcagatgagc ctccatcaga 6420cgaaccccta tctgatgacc ccctgtcacc tgctgaggag aaattgcccc acattgatga 6480gtattggtca gacagtgagc acatcttttt ggatgcaaat attggtgggg tggccatcgc 6540acctgctcac ggctcggttt tgattgagtg tgcccggcga gagctgcacg ctaccactcc 6600tgttgagcac cccaaccgta atcatccaac ccgcctctcc cttgtctttt accagcacaa 6660aaacctaaat aagccccaac atggttttga actaaacaag attaagtttg aggctaaaga 6720agctaagaat aagaaaatga aggcctcaga gcaaaaagac caggcagcta atgaaggtcc 6780agaacagtcc tctgaagtaa atgaattgaa ccaaattcct tctcataaag cattaacatt 6840aacccatgac aatgttgtca ccgtgtcccc ttatgctctc acacacgttg cggggcccta 6900taaccattgg gtctgaaggc ttttctcccc ctcttaatgc ctttgctagt gcagtgtatt 6960ttttcaaggt gctgttaaaa gaaagtcatg ttgtcgttta ctatcttcat ctcacccatt 7020tcaagtctga ggtaaaaaaa taataatgat aacaaaacgg ggtgggtatt cttaactgtg 7080actatatttt gacaattggt agaaggtgca cattttaagc aaaaataaaa gttttatagt 7140tttaaataca taaagaaatg tttcagttag gcattaacct tgatagaatc actcagtttg 7200gtgctttaaa ttaagtctgt ttactatgaa acaagagtca tttttagagg attttaacag 7260gttcatgttc tatgatgtaa aatcaagaca cacagtgtta actctacaca gcttctggtg 7320cttaaccaca tccacacagt taaaaataag ctgaattatt atttcatggt gccattgttc 7380caacatcttc caatcattgc tagaaaattg gcatattcct ttgaaataaa cttatgaaat 7440gttttctctc ttaaaatatt tctcctgtgt aaaataaatc attgttgtta gtaatggttg 7500gaggctgttc ataaattgta aatatatatt ttaaaagcac tttctatttt taaaagtaac 7560ttgaaataat atagtataag aatcctattg tctattgttt gtgcatattt gcatacaaga 7620gaaatcattt atccttgctg tgtagagttc catcttgtta actgcagtat gtattctaat 7680catgtatatg gtttgtgttc ttttactgtg tcctctcaca ttcaagtatt agcaacttgc 7740agtatataaa atagttagat aatgagaagt tgttaattat ctctaaaatt ggaattagga 7800agcatatcac caatactgat taacattctc tttggaacta ggtaagagtg gtctcttctt 7860attgaacaac ctcaatttag tttcatccca cctttctcag tataatccat gagaggtgtt 7920tccaaaagga gatgagggaa caggataggt ttcagaagag tcaaatgctt ctaatgtctc 7980aaggtgataa aatacaaaaa ctaagtagac agatatttgt actgaagtct gatacagaat 8040tagaaaaaaa aaattcttgt tgaaatattt tgaaaacaaa ttccctacta tcatcacatg 8100cctccccaac cccaagtcaa aaacaagagg aatggtacta caaacatggc tttgtccatt 8160aagagctaat tcatttgttt atcttagcat actagatttg ggaaaatgat aactcatctt 8220ttctgataat tgcctatgtt ctaggtaaca ggaaaacagg cattaagttt attttagtct 8280tcccattttc ttcctattac tttattgact cattttattg caaaacaaaa aggattaccc 8340aaacaacatg tttcgaacaa ggagaatttt caatgaaata cttgattctg ttaaaatgca 8400gaggtgctat aacattcaaa gtgtcagatt ccttgggagt atggaaaacc taatggtgct 8460tctcccttgg aaatgccata ggaagcccac aaccgctaac acttacaatt ttggtgcaaa 8520agcaaacagt tccagcaggc tctctaaaga aaaactcatt gtaacttatt aaaataatat 8580ctggtgcaaa gtatctgttt tgagcttttg actaatccaa gtaaaggaat atgaagggat 8640tgtaaaaaac aaaatgtcca ttgatagacc atcgtgtaca agtagatttc tgcttgttga 8700atatgtaaaa tagggtaatt cattgacttg ttttagtatt ttgtgtgcct tagatttccg 8760ttttaagaca tgtatatttt tgtgagccta aggtttctta tatacatata agtatataaa 8820taagtgattg tttattgctt cagctgcttc aacaagatat ttactagtat tagactatca 8880ggaatacacc cttgcgagat tatgttttag attttaggcc ttagctccca ctagaaatta 8940tttcttcacc agatttaatg gataaagttt tatggctctt tatgcatcca ctcatctact 9000cattcttcga gtctacactt attgaatgcc tgcaaaatct aagtatcact tttatttttc 9060tttggatcac cacctatgac atagtaaact tgaagaataa aaactaccct cagaaatatt 9120tttaaaagaa gtagcaaatt atcttcagta taatccatgg taatgtatgc agtaattcaa 9180attgatctct ctctcaatag gtttcttaac aatctaaact tgaaacatca atgttaattt 9240ttggaactat tgggatttgt gacgcttgtt gcagtttacc aaaacaagta tttgaaaata 9300tatagtatca actgaaatgt ttccattccg ttgttgtagt taacatcatg aatggacttc 9360ttaagctgat taccccactg tgggaaccaa attggattcc tactttgttg gactctcttt 9420cctgatttta acaatttacc atcccattct ctgccctgtg atttttttta aaagcttatt 9480caatgttctg cagcattgtg attgtatgct ggctacactg cttttagaat gctctttctc 9540atgaagcaag gaaataaatt tgtttgaaat gacattttct ctcaaaaaaa aaaaaaaaaa 9600a 9601249677DNAHomo sapiens 24gcggccgccc cgagacgccg gccccgctga gtgatgagaa cagacgtcaa actgccttat 60gaatattgat gcggaggcta ggctgctttc gtagagaagc agaaggaagc aagatggctg 120ccctttagga tttgttagaa aggagacccg actgcaactg ctggattgct gcaaggctga 180gggacgagaa cgaggctggc aaacattcag cagcacaccc tctcaagatt gtttacttgc 240ctttgctcct gttgagttac aacgcttgga agcaggagat gggctcagca gcagccaata 300ggacatgatc caggaagagc agtaagggac tgagctgctg aattcaacta gagggcagcc 360ttgtggatgg ccccgaagca agcctgatgg aacaggatag aaccaaccat gttgagggca 420acagactaag tccattcctg ataccatcac ctcccatttg ccagacagaa cctctggcta 480caaagctcca gaatggaagc ccactgcctg agagagctca tccagaagta aatggagaca 540ccaagtggca ctctttcaaa agttattatg gaataccctg tatgaaggga agccagaata 600gtcgtgtgag tcctgacttt acacaagaaa gtagagggta ttccaagtgt ttgcaaaatg 660gaggaataaa acgcacagtt agtgaacctt ctctctctgg gctccttcag atcaagaaat 720tgaaacaaga ccaaaaggct aatggagaaa gacgtaactt cggggtaagc caagaaagaa 780atccaggtga aagcagtcaa ccaaatgtct ccgatttgag tgataagaaa gaatctgtga 840gttctgtagc ccaagaaaat gcagttaaag atttcaccag tttttcaaca cataactgca 900gtgggcctga aaatccagag cttcagattc tgaatgagca ggaggggaaa agtgctaatt 960accatgacaa gaacattgta ttacttaaaa acaaggcagt gctaatgcct aatggtgcta 1020cagtttctgc ctcttccgtg gaacacacac atggtgaact cctggaaaaa acactgtctc 1080aatattatcc agattgtgtt tccattgcgg tgcagaaaac cacatctcac ataaatgcca 1140ttaacagtca ggctactaat gagttgtcct gtgagatcac tcacccatcg catacctcag 1200ggcagatcaa ttccgcacag acctctaact ctgagctgcc tccaaagcca gctgcagtgg 1260tgagtgaggc ctgtgatgct gatgatgctg ataatgccag taaactagct gcaatgctaa 1320atacctgttc ctttcagaaa ccagaacaac tacaacaaca aaaatcagtt tttgagatat 1380gcccatctcc tgcagaaaat aacatccagg gaaccacaaa gctagcgtct ggtgaagaat 1440tctgttcagg ttccagcagc aatttgcaag ctcctggtgg cagctctgaa cggtatttaa 1500aacaaaatga aatgaatggt gcttacttca agcaaagctc agtgttcact aaggattcct 1560tttctgccac taccacacca ccaccaccat cacaattgct tctttctccc cctcctcctc 1620ttccacaggt tcctcagctt ccttcagaag gaaaaagcac tctgaatggt ggagttttag 1680aagaacacca ccactacccc aaccaaagta acacaacact tttaagggaa gtgaaaatag 1740agggtaaacc tgaggcacca ccttcccaga gtcctaatcc atctacacat gtatgcagcc 1800cttctccgat gctttctgaa aggcctcaga ataattgtgt gaacaggaat gacatacaga 1860ctgcagggac aatgactgtt ccattgtgtt ctgagaaaac aagaccaatg tcagaacacc 1920tcaagcataa cccaccaatt tttggtagca gtggagagct acaggacaac tgccagcagt 1980tgatgagaaa caaagagcaa gagattctga agggtcgaga caaggagcaa acacgagatc 2040ttgtgccccc aacacagcac tatctgaaac caggatggat tgaattgaag gcccctcgtt 2100ttcaccaagc ggaatcccat ctaaaacgta atgaggcatc actgccatca attcttcagt 2160atcaacccaa tctctccaat caaatgacct ccaaacaata cactggaaat tccaacatgc 2220ctggggggct cccaaggcaa gcttacaccc agaaaacaac acagctggag cacaagtcac 2280aaatgtacca agttgaaatg aatcaagggc agtcccaagg tacagtggac caacatctcc 2340agttccaaaa accctcacac caggtgcact tctccaaaac agaccattta ccaaaagctc 2400atgtgcagtc actgtgtggc actagatttc attttcaaca aagagcagat tcccaaactg 2460aaaaacttat gtccccagtg ttgaaacagc acttgaatca acaggcttca gagactgagc 2520cattttcaaa ctcacacctt ttgcaacata agcctcataa acaggcagca caaacacaac 2580catcccagag ttcacatctc cctcaaaacc agcaacagca gcaaaaatta caaataaaga 2640ataaagagga aatactccag acttttcctc acccccaaag caacaatgat cagcaaagag 2700aaggatcatt ctttggccag actaaagtgg aagaatgttt tcatggtgaa aatcagtatt 2760caaaatcaag cgagttcgag actcataatg tccaaatggg actggaggaa gtacagaata 2820taaatcgtag aaattcccct tatagtcaga ccatgaaatc aagtgcatgc aaaatacagg 2880tttcttgttc aaacaataca cacctagttt cagagaataa agaacagact acacatcctg 2940aactttttgc aggaaacaag acccaaaact tgcatcacat gcaatatttt ccaaataatg 3000tgatcccaaa gcaagatctt cttcacaggt gctttcaaga acaggagcag aagtcacaac 3060aagcttcagt tctacaggga tataaaaata gaaaccaaga tatgtctggt caacaagctg 3120cgcaacttgc tcagcaaagg tacttgatac ataaccatgc aaatgttttt cctgtgcctg 3180accagggagg aagtcacact cagacccctc cccagaagga cactcaaaag catgctgctc 3240taaggtggca tctcttacag aagcaagaac agcagcaaac acagcaaccc caaactgagt 3300cttgccatag tcagatgcac aggccaatta aggtggaacc tggatgcaag ccacatgcct 3360gtatgcacac agcaccacca gaaaacaaaa catggaaaaa ggtaactaag caagagaatc 3420cacctgcaag ctgtgataat gtgcagcaaa agagcatcat tgagaccatg gagcagcatc 3480tgaagcagtt tcacgccaag tcgttatttg accataaggc tcttactctc aaatcacaga 3540agcaagtaaa agttgaaatg tcagggccag tcacagtttt gactagacaa accactgctg 3600cagaacttga tagccacacc ccagctttag agcagcaaac aacttcttca gaaaagacac 3660caaccaaaag aacagctgct tctgttctca ataattttat agagtcacct tccaaattac 3720tagatactcc tataaaaaat ttattggata cacctgtcaa gactcaatat gatttcccat 3780cttgcagatg tgtagagcaa attattgaaa aagatgaagg tcctttttat acccatctag 3840gagcaggtcc taatgtggca gctattagag aaatcatgga agaaaggttt ggacagaagg 3900gtaaagctat taggattgaa agagtcatct atactggtaa agaaggcaaa agttctcagg 3960gatgtcctat tgctaagtgg gtggttcgca gaagcagcag tgaagagaag ctactgtgtt 4020tggtgcggga gcgagctggc cacacctgtg aggctgcagt gattgtgatt ctcatcctgg 4080tgtgggaagg aatcccgctg tctctggctg acaaactcta ctcggagctt accgagacgc 4140tgaggaaata cggcacgctc accaatcgcc ggtgtgcctt gaatgaagag agaacttgcg 4200cctgtcaggg gctggatcca gaaacctgtg gtgcctcctt ctcttttggt tgttcatgga 4260gcatgtacta caatggatgt aagtttgcca gaagcaagat cccaaggaag tttaagctgc 4320ttggggatga cccaaaagag gaagagaaac tggagtctca tttgcaaaac ctgtccactc 4380ttatggcacc aacatataag aaacttgcac ctgatgcata taataatcag attgaatatg 4440aacacagagc accagagtgc cgtctgggtc tgaaggaagg ccgtccattc tcaggggtca 4500ctgcatgttt ggacttctgt gctcatgccc acagagactt gcacaacatg cagaatggca 4560gcacattggt atgcactctc actagagaag acaatcgaga atttggagga aaacctgagg 4620atgagcagct tcacgttctg cctttataca aagtctctga cgtggatgag tttgggagtg 4680tggaagctca ggaggagaaa aaacggagtg gtgccattca ggtactgagt tcttttcggc 4740gaaaagtcag gatgttagca gagccagtca agacttgccg acaaaggaaa ctagaagcca 4800agaaagctgc agctgaaaag ctttcctccc tggagaacag ctcaaataaa aatgaaaagg 4860aaaagtcagc cccatcacgt acaaaacaaa ctgaaaacgc aagccaggct aaacagttgg 4920cagaactttt gcgactttca ggaccagtca tgcagcagtc ccagcagccc cagcctctac 4980agaagcagcc accacagccc cagcagcagc agagacccca gcagcagcag ccacatcacc 5040ctcagacaga gtctgtcaac tcttattctg cttctggatc caccaatcca tacatgagac 5100ggcccaatcc agttagtcct tatccaaact cttcacacac ttcagatatc tatggaagca 5160ccagccctat gaacttctat tccacctcat ctcaagctgc aggttcatat ttgaattctt 5220ctaatcccat gaacccttac cctgggcttt tgaatcagaa tacccaatat ccatcatatc 5280aatgcaatgg aaacctatca gtggacaact gctccccata tctgggttcc tattctcccc 5340agtctcagcc gatggatctg tataggtatc caagccaaga ccctctgtct aagctcagtc 5400taccacccat ccatacactt taccagccaa ggtttggaaa tagccagagt tttacatcta 5460aatacttagg ttatggaaac caaaatatgc agggagatgg tttcagcagt tgtaccatta 5520gaccaaatgt acatcatgta gggaaattgc ctccttatcc cactcatgag atggatggcc 5580acttcatggg agccacctct agattaccac ccaatctgag caatccaaac atggactata 5640aaaatggtga acatcattca ccttctcaca taatccataa ctacagtgca gctccgggca 5700tgttcaacag ctctcttcat gccctgcatc tccaaaacaa ggagaatgac atgctttccc 5760acacagctaa tgggttatca aagatgcttc cagctcttaa ccatgataga actgcttgtg 5820tccaaggagg cttacacaaa ttaagtgatg ctaatggtca ggaaaagcag ccattggcac 5880tagtccaggg tgtggcttct ggtgcagagg acaacgatga ggtctggtca gacagcgagc 5940agagctttct ggatcctgac attgggggag tggccgtggc tccaactcat gggtcaattc 6000tcattgagtg tgcaaagcgt gagctgcatg ccacaacccc tttaaagaat cccaatagga 6060atcaccccac caggatctcc ctcgtctttt accagcataa gagcatgaat gagccaaaac 6120atggcttggc tctttgggaa gccaaaatgg ctgaaaaagc ccgtgagaaa gaggaagagt 6180gtgaaaagta tggcccagac tatgtgcctc agaaatccca tggcaaaaaa gtgaaacggg 6240agcctgctga gccacatgaa acttcagagc ccacttacct gcgtttcatc aagtctcttg 6300ccgaaaggac catgtccgtg accacagact ccacagtaac tacatctcca tatgccttca 6360ctcgggtcac agggccttac aacagatata tatgatatca cccccttttg ttggttacct 6420cacttgaaaa gaccacaacc aacctgtcag tagtatagtt ctcatgacgt gggcagtggg 6480gaaaggtcac agtattcatg acaaatgtgg tgggaaaaac ctcagctcac cagcaacaaa 6540agaggttatc ttaccatagc acttaatttt cactggctcc caagtggtca cagatggcat 6600ctaggaaaag accaaagcat tctatgcaaa aagaaggtgg ggaagaaagt gttccgcaat 6660ttacattttt aaacactggt tctattattg gacgagatga tatgtaaatg tgatcccccc 6720cccccgctta caactctaca catctgtgac cacttttaat aatatcaagt ttgcatagtc 6780atggaacaca aatcaaacaa gtactgtagt attacagtga caggaatctt aaaataccat 6840ctggtgctga atatatgatg tactgaaata ctggaattat ggctttttga aatgcagttt 6900ttactgtaat cttaactttt atttatcaaa atagctacag gaaacatgaa tagcaggaaa 6960acactgaatt tgtttggatg ttctaagaaa tggtgctaag aaaatggtgt ctttaatagc 7020taaaaattta atgcctttat atcatcaaga tgctatcagt gtactccagt gcccttgaat 7080aataggggta ccttttcatt caagttttta tcataattac ctattcttac acaagcttag 7140tttttaaaat gtggacattt taaaggcctc tggattttgc tcatccagtg aagtccttgt 7200aggacaataa acgtatatat gtacatatat acacaaacat gtatatgtgc acacacatgt 7260atatgtataa atattttaaa tggtgtttta gaagcacttt gtctacctaa gctttgacaa 7320cttgaacaat gctaaggtac tgagatgttt aaaaaacaag tttactttca ttttagaatg 7380caaagttgat ttttttaagg aaacaaagaa agcttttaaa atatttttgc ttttagccat 7440gcatctgctg atgagcaatt gtgtccattt ttaacacagc cagttaaatc caccatgggg 7500cttactggat tcaagggaat acgttagtcc acaaaacatg ttttctggtg ctcatctcac 7560atgctatact gtaaaacagt tttatacaaa attgtatgac aagttcattg ctcaaaaatg 7620tacagtttta agaattttct attaactgca ggtaataatt agctgcatgc tgcagactca 7680acaaagctag ttcactgaag cctatgctat tttatggatc ataggctctt cagagaactg 7740aatggcagtc tgcctttgtg ttgataatta tgtacattgt gacgttgtca tttcttagct 7800taagtgtcct ctttaacaag aggattgagc agactgatgc ctgcataaga tgaataaaca 7860gggttagttc catgtgaatc tgtcagttaa aaagaaacaa aaacaggcag ctggtttgct 7920gtggtggttt taaatcatta atttgtataa agaagtgaaa gagttgtata gtaaattaaa 7980ttgtaaacaa aactttttta atgcaatgct ttagtatttt agtactgtaa aaaaattaaa 8040tatatacata tatatatata tatatatata tatatatatg agtttgaagc agaattcaca 8100tcatgatggt gctactcagc ctgctacaaa tatatcataa tgtgagctaa gaattcatta 8160aatgtttgag tgatgttcct acttgtcata tacctcaaca ctagtttggc aataggatat 8220tgaactgaga gtgaaagcat tgtgtaccat catttttttc caagtccttt tttttattgt 8280taaaaaaaaa agcatacctt ttttcaatac ttgatttctt agcaagtata acttgaactt 8340caaccttttt gttctaaaaa ttcagggata tttcagctca tgctctccct atgccaacat 8400gtcacctgtg tttatgtaaa attgttgtag gttaataaat atattctttg tcagggattt 8460aaccctttta ttttgaatcc cttctatttt acttgtacat gtgctgatgt aactaaaact 8520aattttgtaa atctgttggc tctttttatt gtaaagaaaa gcattttaaa agtttgagga 8580atcttttgac tgtttcaagc aggaaaaaaa aattacatga aaatagaatg cactgagttg 8640ataaagggaa aaattgtaag gcaggagttt ggcaagtggc tgttggccag agacttactt 8700gtaactctct aaatgaagtt tttttgatcc tgtaatcact gaaggtacat actccatgtg 8760gacttccctt aaacaggcaa acacctacag gtatggtgtg caacagattg tacaattaca 8820ttttggccta aatacatttt tgcttactag tatttaaaat aaattcttaa tcagaggagg 8880cctttgggtt ttattggtca aatctttgta agctggcttt tgtcttttta aaaaatttct 8940tgaatttgtg gttgtgtcca atttgcaaac atttccaaaa atgtttgctt tgcttacaaa 9000ccacatgatt ttaatgtttt ttgtatacca taatatctag ccccaaacat ttgattacta 9060catgtgcatt ggtgattttg atcatccatt cttaatattt gatttctgtg tcacctactg 9120tcatttgtta aactgctggc caacaagaac aggaagtata gtttgggggg ttggggagag 9180tttacataag gaagagaaga aattgagtgg catattgtaa atatcagatc tataattgta 9240aatataaaac ctgcctcagt tagaatgaat ggaaagcaga tctacaattt gctaatatag 9300gaatatcagg ttgactatat agccatactt gaaaatgctt ctgagtggtg tcaactttac 9360ttgaatgaat ttttcatctt gattgacgca cagtgatgta cagttcactt ctgaagctag 9420tggttaactt gtgtaggaaa cttttgcagt ttgacactaa gataacttct gtgtgcattt 9480ttctatgctt ttttaaaaac tagtttcatt tcattttcat gagatgtttg gtttataaga 9540tctgaggatg gttataaata ctgtaagtat tgtaatgtta tgaatgcagg ttatttgaaa 9600gctgtttatt attatatcat tcctgataat gctatgtgag tgtttttaat aaaatttata 9660tttatttaat gcactct 96772510983DNAHomo sapiens 25atggactcag ggccagtgta ccatggggac tcacggcagc taagcgcctc aggggtgccg 60gtcaatggtg ctagagagcc cgctggaccc agtctgctgg ggactggggg tccttggcgg 120gtagaccaaa agcccgactg ggaggctgcc ccaggcccag ctcatactgc tcgcctggaa 180gatgcccacg atctggtggc cttttcggct gtggccgaag ctgtgtcctc ttatggggcc 240cttagcaccc ggctctatga aaccttcaac cgtgagatga gtcgtgaggc tgggaacaac 300agcaggggac cccggccagg gcctgagggc tgctctgctg gcagcgaaga ccttgacaca 360ctgcagacgg ccctggccct cgcgcggcat ggtatgaaac cacccaactg caactgcgat 420ggcccagaat gccctgacta cctcgagtgg ctggagggga agatcaagtc tgtggtcatg 480gaaggagggg aggagcggcc caggctccca gggcctctgc ctcctggtga ggccggcctc 540ccagcaccaa gcaccaggcc actcctcagc tcagaggtgc cccagatctc tccccaagag 600ggcctgcccc tgtcccagag tgccctgagc attgccaagg aaaaaaacat cagcttgcag 660accgccattg ccattgaggc cctcacacag ctctcctctg ccctcccgca gccttctcat 720tccacccccc aggcttcttg cccccttcct gaggccttgt cacctcctgc ccctttcaga 780tctccccagt cttacctccg ggctccctca tggcctgtgg ttcctcctga agagcactca 840tcttttgctc ctgatagctc tgccttccct ccagcaactc ctagaactga gttccctgaa 900gcctggggca ctgacacccc tccagcaacg ccccggagct cctggcccat gcctcgccca 960agccccgatc ccatggctga actggagcag

ttgttgggca gcgccagtga ttacatccag 1020tcagtattca agcggcctga ggccctgcct accaagccca aggtcaaggt ggaggcaccc 1080tcttcctccc cggccccggc cccatcccct gtacttcaga gggaggctcc cacgccatcc 1140tcggagcccg acacccacca gaaggcccag accgccctgc agcagcacct ccaccacaag 1200cgcagcctct tcctagaaca ggtgcacgac acctccttcc ctgctccttc agagccttct 1260gctcctggct ggtggccccc accaagttca cctgtcccac ggcttccaga cagaccaccc 1320aaggagaaga agaagaagct cccaacacca gctggaggtc ccgtgggaac ggagaaagct 1380gcccctggga tcaagcccag tgtccgaaag cccattcaga tcaagaagtc caggccccgg 1440gaagcacagc ccctcttccc acctgtccga cagattgtcc tggaagggct taggtcccca 1500gcctcccagg aagtgcaggc tcatccaccg gcccctctgc ctgcctcaca gggctctgct 1560gtgcccctgc ccccagaacc ttctcttgcg ctatttgcac ctagtccctc cagggacagc 1620ctgctgcccc ctactcagga aatgaggtcc cccagcccca tgacagcctt gcagccaggc 1680tccactggcc ctcttccccc tgccgatgac aagctggaag agctcatccg gcagtttgag 1740gctgaatttg gagatagctt tgggcttccc ggcccccctt ctgtgcccat tcaggacccc 1800gagaaccagc aaacatgtct cccagcccct gagagcccct ttgctacccg ttcccccaag 1860caaatcaaga ttgagtcttc gggggctgtg actgtgctct caaccacctg cttccattca 1920gaggagggag gacaggaggc cacacccacc aaggctgaga acccactcac acccaccctc 1980agtggcttct tggagtcacc tcttaagtac ctggacacac ccaccaagag tctgctggac 2040acacctgcca agagagccca ggccgagttc cccacctgcg attgcgtcga acaaatagtg 2100gagaaagatg aaggtccata ttatactcac ttgggatctg gccccacggt cgcctctatc 2160cgggaactca tggaggagcg gtatggagag aaggggaaag ccatccggat cgagaaggtc 2220atctacacgg ggaaggaggg aaagagctcc cgcggttgcc ccattgcaaa gtgggtgatc 2280cgcaggcaca cgctggagga gaagctactc tgcctggtgc ggcaccgggc aggccaccac 2340tgccagaacg ctgtgatcgt catcctcatc ctggcctggg agggcattcc ccgtagcctc 2400ggagacaccc tctaccagga gctcaccgac accctccgga agtatgggaa ccccaccagc 2460cggagatgcg gcctcaacga tgaccggacc tgcgcttgcc aaggcaaaga ccccaacacc 2520tgtggtgcct ccttctcctt tggttgttcc tggagcatgt acttcaacgg ctgcaagtat 2580gctcggagca agacacctcg caagttccgc ctcgcagggg acaatcccaa agaggaagaa 2640gtgctccgga agagtttcca ggacctggcc accgaagtcg ctcccctgta caagcgactg 2700gcccctcagg cctatcagaa ccaggtgacc aacgaggaaa tagcgattga ctgccgtctg 2760gggctgaagg aaggacggcc cttcgcgggg gtcacggcct gcatggactt ctgtgcccac 2820gcccacaagg accagcataa cctctacaat gggtgcaccg tggtctgcac cctgaccaag 2880gaagacaatc gctgcgtggg caagattccc gaggatgagc agctgcatgt tctccccctg 2940tacaagatgg ccaacacgga tgagtttggt agcgaggaga accagaatgc aaaggtgggc 3000agcggagcca tccaggtgct caccgccttc ccccgcgagg tccgacgcct gcccgagcct 3060gccaagtcct gccgccagcg gcagctggaa gccagaaagg cagcagccga gaagaagaag 3120attcagaagg agaagctgag cactccggag aagatcaagc aggaggccct ggagctggcg 3180ggcattacgt cggacccagg cctgtctctg aagggtggat tgtcccagca aggcctgaag 3240ccctccctca aggtggagcc gcagaaccac ttcagctcct tcaagtacag cggcaacgcg 3300gtggtggaga gctactcggt gctgggcaac tgccggccct ccgaccctta cagcatgaac 3360agcgtgtact cctaccactc ctactatgca cagcccagcc tgacctccgt caatggcttc 3420cactccaagt acgctctccc gtcttttagc tactatggct ttccatccag caaccccgtc 3480ttcccctctc agttcctggg tcctggtgcc tgggggcaca gtggcagcag tggcagtttt 3540gagaagaagc cagacctcca cgctctgcac aacagcctga gcccggccta cggtggtgct 3600gagtttgccg agctgcccag ccaggctgtt cccacagacg cccaccaccc cactcctcac 3660caccagcagc ctgcgtaccc aggccccaag gagtatctgc ttcccaaggc ccccctactc 3720cactcagtgt ccagggaccc ctcccccttt gcccagagct ccaactgcta caacagatcc 3780atcaagcaag agccagtaga cccgctgacc caggctgagc ctgtgcccag agacgctggc 3840aagatgggca agacacctct gtccgaggtg tctcagaatg gaggacccag tcacctttgg 3900ggacagtact caggaggccc aagcatgtcc cccaagagga ctaacggtgt gggtggcagc 3960tggggtgtgt tctcgtctgg ggagagtcct gccatcgtcc ctgacaagct cagttccttt 4020ggggccagct gcctggcccc ttcccacttc acagatggcc agtgggggct gttccccggt 4080gaggggcagc aggcagcttc ccactctgga ggacggctgc gaggcaaacc gtggagcccc 4140tgcaagtttg ggaacagcac ctcggccttg gctgggccca gcctgactga gaagccgtgg 4200gcgctggggg caggggattt caactcggcc ctgaaaggta gtcctgggtt ccaagacaag 4260ctgtggaacc ccatgaaagg agaggagggc aggattccag ccgcaggggc cagccagctg 4320gacagggcct ggcagtcctt tggtctgccc ctgggatcca gcgagaagct gtttggggct 4380ctgaagtcag aggagaagct gtgggacccc ttcagcctgg aggaggggcc ggctgaggag 4440ccccccagca agggagcggt gaaggaggag aagggcggtg gtggtgcgga ggaggaagag 4500gaggagctgt ggtcggacag tgaacacaac ttcctggacg agaacatcgg cggcgtggcc 4560gtggccccag cccacggctc catcctcatc gagtgtgccc ggcgggagct gcacgccacc 4620acgccgctta agaagcccaa ccgctgccac cccacccgca tctcgctggt cttctaccag 4680cacaagaacc tcaaccagcc caaccacggg ctggccctct gggaagccaa gatgaagcag 4740ctggcggaga gggcacgggc acggcaggag gaggctgccc ggctgggcct gggccagcag 4800gaggccaagc tctacgggaa gaagcgcaag tgggggggca ctgtggttgc tgagccccag 4860cagaaagaga agaagggggt cgtccccacc cggcaggcac tggctgtgcc cacagactcg 4920gcggtcaccg tgtcctccta tgcctacacg aaggtcactg gcccctacag ccgctggatc 4980taggtgccag ggagccagcg tacctcagcg tcgggcctgg cccgagctgt ctctgtggtg 5040cttttgccct catacctggg ggcgggttgg gggtgcagaa gtctttttat ctctatatac 5100atatatagat gcgcatatca tatatatgta tttatggtcc aaacctcaga actgacccgc 5160ccctccctta cccccacttc cccagcactt tgaagaagaa actacggctg tcgggtgatt 5220tttccgtgat cttaatattt atatctccaa gttgtccccc ccccttgtct ggggggtttt 5280tatttttatt ttctctttgt ttttaaaact ctatccttgt atatcacaat aatggaaaga 5340aagtttatag tatcctttca caaaggagta gttttaaatt ccatttaaaa tgtgtattta 5400ttggattttt taaaagcgac aatagtaatg gtaaaggatg ggcaggaaag gccagtagtg 5460ctcccccgcc cagtctcgct gggtctggcg agccaagccc ctcgggcgct ggcgaggtcc 5520tcagccatct gcccctcgag agccaagcgc ggacggtagc cacccagttc atccctcccg 5580acatacaccc cttccctttg gggaagggag cctcaggaca gcttctgtcc tctctgatag 5640gatgggagag tctgcagaaa accatctggg gtcccttttc cagtccccgg cttggagtcg 5700aagggcagat gcaccccagg ccagccccac gagatgctgg catagctttc cccagaaacc 5760aggttggaag tagatggctt caagcttgct agtctccaca ctgaatcctc tgtccgttat 5820ttatggagtc acacgatgtc atggttcact aggcagcacc tcacgctgga gctggagtgc 5880gaggttctta ggggccgtgc ccaccatgtt gccaagccaa tgcatgctga gctgaaggaa 5940tttgtcttag tggcagtttt ttaaaaaatg cccccaaagt ctatgctgat actgaaaaag 6000ggctactgta tctttaaaaa caggaagttg aacccaagct gtgaaaagcc agtggtgctc 6060tgtgcatggt gctgtgcgga gcctggtgct gtagtgttgt gctgggactt tcttgactct 6120tgggcaggtc acatcctaca ggagctcagc agaccagtgt aacaacagtt aatgcatcta 6180tcctgatccc tgaatttcca cattggacaa tggtgcatgc ctcacacctg agcctgcttc 6240ctccatgctg tcattgggtt cgggggccta cacttaacaa ttttaaagtg caagagtcaa 6300acattttcaa caggttgcta taattttcct ccctaattgg tgccatttct ccatttgatc 6360attttctttt tttcctttct cccctcttca tccactttaa tatagctgtt ctgaaattct 6420ggtgcattca ttcggttctt tgaaatgaga atgtggtgct taatttttgt gacgttgtcg 6480agagaggttg ggcctgatgg gagcaacact catcatcacc aagtcaaact ttgttggagt 6540gttggttttt cttgtgatat tagcagaaat gatctcatgc tagccatgtg gatgtgtgtg 6600tggtgaatgg ggggcttcat caggacacac agaggggaat gtggccacac ggtggatgac 6660caccaagccc tgagatgaac aggtatttac tgagcagttg tattcagata tgggtcttca 6720tgaatcatgt ttaacaatca gatgaccgct ataggcaagt tcctgagctt ccgggtgcct 6780tgagtaagag ctgagaaccg gcctgctggg tgtttactgt atctgtttgg aagcactggc 6840ggagggtcgt tgtaagatgt cctgagcatt tatgtggtct ggttttaact gtaaatagtg 6900aaagattttt ttaagcactt ttgcctagat ttaaacagca acttgaaaaa aaaagtatgt 6960tttaacatgt aattgtggga gaaattgtaa atagtagccg aatatttaat gtgctttgtc 7020tatcctccac ttttaccata ttctgtaaag ttgcatttat tttacaggac aaaaaaatga 7080aatattattg cttttgaaat aaatacccaa gagcttatca ggacttagaa ttattcagaa 7140ctcagattta taggaaaacc tctgaccttc agtttgacaa gctaaaggaa gcagagtctt 7200taatgagcat gctaattttc tagttttgag gaaaaattgg gtcctttaaa tgctattttg 7260cttatcgcat cagtactttt atgcaggtct catttgactc cgtgcttagg tagatgcggg 7320ggtgccttga aaacttcatt ttaaatgatc ttaagcaaga aatacaatat tttacgaaac 7380atttggagaa tgtgaccgtc tgtatgaccc gtggaagccc caggttggct gttggtttgg 7440aaggtcccga gtgtaaccca ggtgattctg atacttggca tgtgtgaatc ttcctgatgt 7500atgttaaata aactcttccc ctcatcaccc tttggtagga aagccattag atgaaaggag 7560aaaccaatac aagctaaaag catgcgacgt ctgtccccca gcccaaacag ccttggttca 7620tcagtttctg cagtaggaga taggctgctg agaggtgagt caagaggcag tctccattgg 7680atgtccccac tccccgcaga atggcgtttc cagagttagg cggtgtggtt gccgtgctca 7740agcccatgct gatttgtaca ctacatgtct aacctacctc aaatctcagt cattaaaatt 7800agcatgcttt agacatatat ttaaaaagta actatgcaca gctctttatc ccccccttgc 7860tgctgaagct ttcttaaaga gaaaaatcaa atttttattt tttactggca ctatcatttt 7920ttaagtccta aagatgatta acagacattt ttatcatgag aagaaaaata aagccattgc 7980aactaaagaa cctaacagca tgaccaagtt cgaagagtca tattatagca acggaaatcg 8040atggcgtctt agtcatctcc ccagtgtgcc ctgtccacgg acaccatcca cgtgcagtgc 8100aaacatttgg ttccttttct gctctgtttt gttttccctg cctgttgcgt gcaagggaag 8160tgcttgtaaa gttctgtgct acgagatttt taaaataaaa atcgcttcgc agcaggttct 8220cacaaaataa ctggtgctag ctcaagaaat catcatctga ccatcagaaa tcttgactaa 8280aggtgttgca tggatttggg ggtctttcgg tttttggttt tgggtctggc ttttagcagg 8340gccaatgttt cccacacccc ggcttcatgg gtactgcttt gccttctcac caaggtgacg 8400atggtgtgcg tggaaagaga tgatacccca ccgccccctc ttggtccttc caccagcctc 8460ttttgggaac agtagtttgc agagcaaggg atttttaaag cgctaaagca aggaaagaag 8520tagcagagct taactgcttt gtaccacaca gcagtagatg tgcaaggacg gttgacaatg 8580agtcgatgat aacctaattt cattgagaga aacccagcca gacttgcttc tagaggttta 8640atcaccatga gatctcaaac caaggcaaag ctggtggaaa actatatgat atccctgacg 8700tgcctcaacc agtatctctt tccttttgtt actgaagtgt gttttatgga ctaggaagca 8760tttttatgaa ttgaaatagt ctaaataaaa tggtgctatg gtgttttaat gtgactgtcc 8820ctgatcctgt cttgctgagg tgctatcaac gttctgaaac cacaaccaac caaaaacaag 8880gtgggctcca gtctcttggc tttttttttt ccctcccctc ttttggtgct gtcttagacc 8940cgtttaccgt gctataatct gctctgagca gtgttgtgtt gtgttgtatt gttcttccct 9000tggtggccaa acaaagcaag tcgagaaggc agctatctcc ctttctgtga tcgggagtgg 9060gcctgcctgg cttggcaggt gctttttggt tccacacctg tcttctcagg cttgatgtga 9120aagaaagggc gaagggtttt ttgagttttt gtttttgagg aaggggagtt gggtacttct 9180gcctctccta gcatgatagg cattctcata gccagggaca gattttctcc tgcagcccag 9240ggtgctaagc agacatctct gggagtccca agggcacacc aagggagacc agatggatct 9300ccttcctccc ctggcactgg ctgggaccat ggtgggcagg ggcttcattc tctgacccag 9360cgttgcttct gcctctcatt ggtaacccct tatgttcgga ctaaaggaag gagctttctt 9420tgctcactcg atgccactga ggctgctttt tagttggtgc taacctaaat ttcttcttgg 9480gtccacagaa gttgatgttt taaaaactca ccaggaagct ccattttgtg tcatccactg 9540tcacaataat ttttttaaat acctcaaaaa caggacatca tgacaacttc agtaaagtag 9600attccatgag ggtctgatac ctgcaggttg tccgtctgat gacatacttg accttgaaaa 9660atctggggtc attttgtttt tcattcttca gcagttaaga tagcggaacg ccgaaaggaa 9720ggagcgtagt tggctgtatt tcatgtttaa gttttgcttt tgaataaaat gtgaatttcc 9780tatgcccatc tcattgagct ttctcagtca ttgttgctgt catttgaaat gactccctca 9840aaacctagtt ttattagcca gctgcctctg ctgtagtaca tggccaactt caacataccc 9900tggaccaaaa catttttgag gtgcataccc ccaacataag ttacacagtc ccacatccag 9960gtgcacagag tgcgagtgca ctccgcgagt gcggggggag gggcggcccc ctctggtgct 10020cccagccctt cctcctgcag agctgcaggc aagagcagag caataggctt ctcccctgag 10080cagagaccgc agcacagaaa tgcaaggtct aaagttgctt tttgcctaag aatcagcgag 10140cgatttggcc tacttcctca ttggcttcta ttctgatatc agggatgctt tttgtagtgg 10200tattgtttgc tccctcttcg cgttttgact acccgtcatt caggggtaac tcatcactct 10260tcacacgggg atttaaatta agaaactaat tggctcatgt gaacattcca aattttcttg 10320gtttcaatac cctttttttt cttttgaggg gaaaagaggg gagaaaaaca ggagtgatgt 10380catttctttt tcatgtattc caattaaaga aacaagggca ggtcgtataa tggcatatta 10440atacattaga cttaatctag aacccctgta gctttttgat gtgttttatt tcttatctct 10500ttgaattcct gtttggttac ttggcttcca atggaggtga acttaacaac catacttgaa 10560tattccgtct tgactttgta aactgtggct acttgaaatg aagtttatct ggggttgatg 10620gatgaatggt agatttttgc aatgtctcaa ggcaatagga tgtgtattaa actgtagata 10680ttcttagtac agtaaattta tgctgataat tttattttgt ataattttta cctttttgtt 10740aatatttttt ccttccactt tattggtttg cctcctgagc tacccctcct taccctccct 10800tctccctcag tgtttcagta aatttaattt agggtgccta gaaattgcaa gtatgtatcc 10860tttttgattt gtattttatt ataatttaca caaacaactg ggtttgtgaa ctgtattact 10920cctggtatct ttaaaatatt gtgggtgttt taataaattt tatatttatt ttttgcactc 10980aaa 1098326761DNAHomo sapiens 26ggcggcagga ccagcatgca ccaccgaaac gactcccaga ggctggggaa agctggctgc 60ccgccagagc cgtcgttgca aatggcaaat actaatttcc tctccacctt atcccctgaa 120cactgcagac ctttggcggg ggaatgcatg aacaagctca aatgcggcgc tgctgaagca 180gagataatga atctccccga gcgcgtgggg actttttccg ctatcccggc tttagggggc 240atctcattac ctccaggggt catcgtcatg acagcccttc actcccccgc agcagcctca 300gcagccgtca cagacagtgc gtttcaaatt gccaatctgg cagactgccc gcagaatcat 360tcctcctcct cctcgtcctc ctcaggggga gctggcggag ccaacccagc caagaagaag 420aggaaaaggt gtggggtctg cgtgccctgc aagaggctca tcaactgtgg cgtctgcagc 480agttgcagga accgcaaaac gggacaccag atctgcaaat ttagaaaatg tgaagagcta 540aagaaaaaac ctggcacttc actagagaga acacctgttc ccagcgctga agcattccga 600tggttctttt aaagcagtag tatatcttat tttcaaggca tttggaaatg aagggcaaac 660taatgtcttg ttttaagaaa ctgcttagtc caccactgaa gaaaatatcc agaaattatt 720ttcattttat gtatagggat ttcttcaaaa aaaaaaaaaa a 761272136PRTHomo sapiens 27Met Ser Arg Ser Arg His Ala Arg Pro Ser Arg Leu Val Arg Lys Glu 1 5 10 15 Asp Val Asn Lys Lys Lys Lys Asn Ser Gln Leu Arg Lys Thr Thr Lys 20 25 30 Gly Ala Asn Lys Asn Val Ala Ser Val Lys Thr Leu Ser Pro Gly Lys 35 40 45 Leu Lys Gln Leu Ile Gln Glu Arg Asp Val Lys Lys Lys Thr Glu Pro 50 55 60 Lys Pro Pro Val Pro Val Arg Ser Leu Leu Thr Arg Ala Gly Ala Ala 65 70 75 80 Arg Met Asn Leu Asp Arg Thr Glu Val Leu Phe Gln Asn Pro Glu Ser 85 90 95 Leu Thr Cys Asn Gly Phe Thr Met Ala Leu Arg Ser Thr Ser Leu Ser 100 105 110 Arg Arg Leu Ser Gln Pro Pro Leu Val Val Ala Lys Ser Lys Lys Val 115 120 125 Pro Leu Ser Lys Gly Leu Glu Lys Gln His Asp Cys Asp Tyr Lys Ile 130 135 140 Leu Pro Ala Leu Gly Val Lys His Ser Glu Asn Asp Ser Val Pro Met 145 150 155 160 Gln Asp Thr Gln Val Leu Pro Asp Ile Glu Thr Leu Ile Gly Val Gln 165 170 175 Asn Pro Ser Leu Leu Lys Gly Lys Ser Gln Glu Thr Thr Gln Phe Trp 180 185 190 Ser Gln Arg Val Glu Asp Ser Lys Ile Asn Ile Pro Thr His Ser Gly 195 200 205 Pro Ala Ala Glu Ile Leu Pro Gly Pro Leu Glu Gly Thr Arg Cys Gly 210 215 220 Glu Gly Leu Phe Ser Glu Glu Thr Leu Asn Asp Thr Ser Gly Ser Pro 225 230 235 240 Lys Met Phe Ala Gln Asp Thr Val Cys Ala Pro Phe Pro Gln Arg Ala 245 250 255 Thr Pro Lys Val Thr Ser Gln Gly Asn Pro Ser Ile Gln Leu Glu Glu 260 265 270 Leu Gly Ser Arg Val Glu Ser Leu Lys Leu Ser Asp Ser Tyr Leu Asp 275 280 285 Pro Ile Lys Ser Glu His Asp Cys Tyr Pro Thr Ser Ser Leu Asn Lys 290 295 300 Val Ile Pro Asp Leu Asn Leu Arg Asn Cys Leu Ala Leu Gly Gly Ser 305 310 315 320 Thr Ser Pro Thr Ser Val Ile Lys Phe Leu Leu Ala Gly Ser Lys Gln 325 330 335 Ala Thr Leu Gly Ala Lys Pro Asp His Gln Glu Ala Phe Glu Ala Thr 340 345 350 Ala Asn Gln Gln Glu Val Ser Asp Thr Thr Ser Phe Leu Gly Gln Ala 355 360 365 Phe Gly Ala Ile Pro His Gln Trp Glu Leu Pro Gly Ala Asp Pro Val 370 375 380 His Gly Glu Ala Leu Gly Glu Thr Pro Asp Leu Pro Glu Ile Pro Gly 385 390 395 400 Ala Ile Pro Val Gln Gly Glu Val Phe Gly Thr Ile Leu Asp Gln Gln 405 410 415 Glu Thr Leu Gly Met Ser Gly Ser Val Val Pro Asp Leu Pro Val Phe 420 425 430 Leu Pro Val Pro Pro Asn Pro Ile Ala Thr Phe Asn Ala Pro Ser Lys 435 440 445 Trp Pro Glu Pro Gln Ser Thr Val Ser Tyr Gly Leu Ala Val Gln Gly 450 455 460 Ala Ile Gln Ile Leu Pro Leu Gly Ser Gly His Thr Pro Gln Ser Ser 465 470 475 480 Ser Asn Ser Glu Lys Asn Ser Leu Pro Pro Val Met Ala Ile Ser Asn 485 490 495 Val Glu Asn Glu Lys Gln Val His Ile Ser Phe Leu Pro Ala Asn Thr 500 505 510 Gln Gly Phe Pro Leu Ala Pro Glu Arg Gly Leu Phe His Ala Ser Leu 515 520 525 Gly Ile Ala Gln Leu Ser Gln Ala Gly Pro Ser Lys Ser Asp Arg Gly 530 535 540 Ser Ser Gln Val Ser Val Thr Ser Thr Val His Val Val Asn Thr Thr 545 550 555 560 Val Val Thr Met Pro Val Pro Met Val Ser Thr Ser Ser Ser Ser Tyr 565 570 575 Thr Thr Leu Leu Pro Thr Leu Glu Lys Lys Lys Arg Lys Arg Cys Gly 580 585 590 Val Cys Glu Pro Cys Gln Gln Lys Thr Asn Cys Gly Glu Cys Thr Tyr 595 600 605 Cys Lys Asn Arg Lys Asn Ser His Gln Ile Cys Lys Lys Arg Lys Cys 610 615 620 Glu Glu Leu Lys Lys Lys Pro Ser Val Val Val Pro Leu Glu Val Ile 625 630 635

640 Lys Glu Asn Lys Arg Pro Gln Arg Glu Lys Lys Pro Lys Val Leu Lys 645 650 655 Ala Asp Phe Asp Asn Lys Pro Val Asn Gly Pro Lys Ser Glu Ser Met 660 665 670 Asp Tyr Ser Arg Cys Gly His Gly Glu Glu Gln Lys Leu Glu Leu Asn 675 680 685 Pro His Thr Val Glu Asn Val Thr Lys Asn Glu Asp Ser Met Thr Gly 690 695 700 Ile Glu Val Glu Lys Trp Thr Gln Asn Lys Lys Ser Gln Leu Thr Asp 705 710 715 720 His Val Lys Gly Asp Phe Ser Ala Asn Val Pro Glu Ala Glu Lys Ser 725 730 735 Lys Asn Ser Glu Val Asp Lys Lys Arg Thr Lys Ser Pro Lys Leu Phe 740 745 750 Val Gln Thr Val Arg Asn Gly Ile Lys His Val His Cys Leu Pro Ala 755 760 765 Glu Thr Asn Val Ser Phe Lys Lys Phe Asn Ile Glu Glu Phe Gly Lys 770 775 780 Thr Leu Glu Asn Asn Ser Tyr Lys Phe Leu Lys Asp Thr Ala Asn His 785 790 795 800 Lys Asn Ala Met Ser Ser Val Ala Thr Asp Met Ser Cys Asp His Leu 805 810 815 Lys Gly Arg Ser Asn Val Leu Val Phe Gln Gln Pro Gly Phe Asn Cys 820 825 830 Ser Ser Ile Pro His Ser Ser His Ser Ile Ile Asn His His Ala Ser 835 840 845 Ile His Asn Glu Gly Asp Gln Pro Lys Thr Pro Glu Asn Ile Pro Ser 850 855 860 Lys Glu Pro Lys Asp Gly Ser Pro Val Gln Pro Ser Leu Leu Ser Leu 865 870 875 880 Met Lys Asp Arg Arg Leu Thr Leu Glu Gln Val Val Ala Ile Glu Ala 885 890 895 Leu Thr Gln Leu Ser Glu Ala Pro Ser Glu Asn Ser Ser Pro Ser Lys 900 905 910 Ser Glu Lys Asp Glu Glu Ser Glu Gln Arg Thr Ala Ser Leu Leu Asn 915 920 925 Ser Cys Lys Ala Ile Leu Tyr Thr Val Arg Lys Asp Leu Gln Asp Pro 930 935 940 Asn Leu Gln Gly Glu Pro Pro Lys Leu Asn His Cys Pro Ser Leu Glu 945 950 955 960 Lys Gln Ser Ser Cys Asn Thr Val Val Phe Asn Gly Gln Thr Thr Thr 965 970 975 Leu Ser Asn Ser His Ile Asn Ser Ala Thr Asn Gln Ala Ser Thr Lys 980 985 990 Ser His Glu Tyr Ser Lys Val Thr Asn Ser Leu Ser Leu Phe Ile Pro 995 1000 1005 Lys Ser Asn Ser Ser Lys Ile Asp Thr Asn Lys Ser Ile Ala Gln 1010 1015 1020 Gly Ile Ile Thr Leu Asp Asn Cys Ser Asn Asp Leu His Gln Leu 1025 1030 1035 Pro Pro Arg Asn Asn Glu Val Glu Tyr Cys Asn Gln Leu Leu Asp 1040 1045 1050 Ser Ser Lys Lys Leu Asp Ser Asp Asp Leu Ser Cys Gln Asp Ala 1055 1060 1065 Thr His Thr Gln Ile Glu Glu Asp Val Ala Thr Gln Leu Thr Gln 1070 1075 1080 Leu Ala Ser Ile Ile Lys Ile Asn Tyr Ile Lys Pro Glu Asp Lys 1085 1090 1095 Lys Val Glu Ser Thr Pro Thr Ser Leu Val Thr Cys Asn Val Gln 1100 1105 1110 Gln Lys Tyr Asn Gln Glu Lys Gly Thr Ile Gln Gln Lys Pro Pro 1115 1120 1125 Ser Ser Val His Asn Asn His Gly Ser Ser Leu Thr Lys Gln Lys 1130 1135 1140 Asn Pro Thr Gln Lys Lys Thr Lys Ser Thr Pro Ser Arg Asp Arg 1145 1150 1155 Arg Lys Lys Lys Pro Thr Val Val Ser Tyr Gln Glu Asn Asp Arg 1160 1165 1170 Gln Lys Trp Glu Lys Leu Ser Tyr Met Tyr Gly Thr Ile Cys Asp 1175 1180 1185 Ile Trp Ile Ala Ser Lys Phe Gln Asn Phe Gly Gln Phe Cys Pro 1190 1195 1200 His Asp Phe Pro Thr Val Phe Gly Lys Ile Ser Ser Ser Thr Lys 1205 1210 1215 Ile Trp Lys Pro Leu Ala Gln Thr Arg Ser Ile Met Gln Pro Lys 1220 1225 1230 Thr Val Phe Pro Pro Leu Thr Gln Ile Lys Leu Gln Arg Tyr Pro 1235 1240 1245 Glu Ser Ala Glu Glu Lys Val Lys Val Glu Pro Leu Asp Ser Leu 1250 1255 1260 Ser Leu Phe His Leu Lys Thr Glu Ser Asn Gly Lys Ala Phe Thr 1265 1270 1275 Asp Lys Ala Tyr Asn Ser Gln Val Gln Leu Thr Val Asn Ala Asn 1280 1285 1290 Gln Lys Ala His Pro Leu Thr Gln Pro Ser Ser Pro Pro Asn Gln 1295 1300 1305 Cys Ala Asn Val Met Ala Gly Asp Asp Gln Ile Arg Phe Gln Gln 1310 1315 1320 Val Val Lys Glu Gln Leu Met His Gln Arg Leu Pro Thr Leu Pro 1325 1330 1335 Gly Ile Ser His Glu Thr Pro Leu Pro Glu Ser Ala Leu Thr Leu 1340 1345 1350 Arg Asn Val Asn Val Val Cys Ser Gly Gly Ile Thr Val Val Ser 1355 1360 1365 Thr Lys Ser Glu Glu Glu Val Cys Ser Ser Ser Phe Gly Thr Ser 1370 1375 1380 Glu Phe Ser Thr Val Asp Ser Ala Gln Lys Asn Phe Asn Asp Tyr 1385 1390 1395 Ala Met Asn Phe Phe Thr Asn Pro Thr Lys Asn Leu Val Ser Ile 1400 1405 1410 Thr Lys Asp Ser Glu Leu Pro Thr Cys Ser Cys Leu Asp Arg Val 1415 1420 1425 Ile Gln Lys Asp Lys Gly Pro Tyr Tyr Thr His Leu Gly Ala Gly 1430 1435 1440 Pro Ser Val Ala Ala Val Arg Glu Ile Met Glu Asn Arg Tyr Gly 1445 1450 1455 Gln Lys Gly Asn Ala Ile Arg Ile Glu Ile Val Val Tyr Thr Gly 1460 1465 1470 Lys Glu Gly Lys Ser Ser His Gly Cys Pro Ile Ala Lys Trp Val 1475 1480 1485 Leu Arg Arg Ser Ser Asp Glu Glu Lys Val Leu Cys Leu Val Arg 1490 1495 1500 Gln Arg Thr Gly His His Cys Pro Thr Ala Val Met Val Val Leu 1505 1510 1515 Ile Met Val Trp Asp Gly Ile Pro Leu Pro Met Ala Asp Arg Leu 1520 1525 1530 Tyr Thr Glu Leu Thr Glu Asn Leu Lys Ser Tyr Asn Gly His Pro 1535 1540 1545 Thr Asp Arg Arg Cys Thr Leu Asn Glu Asn Arg Thr Cys Thr Cys 1550 1555 1560 Gln Gly Ile Asp Pro Glu Thr Cys Gly Ala Ser Phe Ser Phe Gly 1565 1570 1575 Cys Ser Trp Ser Met Tyr Phe Asn Gly Cys Lys Phe Gly Arg Ser 1580 1585 1590 Pro Ser Pro Arg Arg Phe Arg Ile Asp Pro Ser Ser Pro Leu His 1595 1600 1605 Glu Lys Asn Leu Glu Asp Asn Leu Gln Ser Leu Ala Thr Arg Leu 1610 1615 1620 Ala Pro Ile Tyr Lys Gln Tyr Ala Pro Val Ala Tyr Gln Asn Gln 1625 1630 1635 Val Glu Tyr Glu Asn Val Ala Arg Glu Cys Arg Leu Gly Ser Lys 1640 1645 1650 Glu Gly Arg Pro Phe Ser Gly Val Thr Ala Cys Leu Asp Phe Cys 1655 1660 1665 Ala His Pro His Arg Asp Ile His Asn Met Asn Asn Gly Ser Thr 1670 1675 1680 Val Val Cys Thr Leu Thr Arg Glu Asp Asn Arg Ser Leu Gly Val 1685 1690 1695 Ile Pro Gln Asp Glu Gln Leu His Val Leu Pro Leu Tyr Lys Leu 1700 1705 1710 Ser Asp Thr Asp Glu Phe Gly Ser Lys Glu Gly Met Glu Ala Lys 1715 1720 1725 Ile Lys Ser Gly Ala Ile Glu Val Leu Ala Pro Arg Arg Lys Lys 1730 1735 1740 Arg Thr Cys Phe Thr Gln Pro Val Pro Arg Ser Gly Lys Lys Arg 1745 1750 1755 Ala Ala Met Met Thr Glu Val Leu Ala His Lys Ile Arg Ala Val 1760 1765 1770 Glu Lys Lys Pro Ile Pro Arg Ile Lys Arg Lys Asn Asn Ser Thr 1775 1780 1785 Thr Thr Asn Asn Ser Lys Pro Ser Ser Leu Pro Thr Leu Gly Ser 1790 1795 1800 Asn Thr Glu Thr Val Gln Pro Glu Val Lys Ser Glu Thr Glu Pro 1805 1810 1815 His Phe Ile Leu Lys Ser Ser Asp Asn Thr Lys Thr Tyr Ser Leu 1820 1825 1830 Met Pro Ser Ala Pro His Pro Val Lys Glu Ala Ser Pro Gly Phe 1835 1840 1845 Ser Trp Ser Pro Lys Thr Ala Ser Ala Thr Pro Ala Pro Leu Lys 1850 1855 1860 Asn Asp Ala Thr Ala Ser Cys Gly Phe Ser Glu Arg Ser Ser Thr 1865 1870 1875 Pro His Cys Thr Met Pro Ser Gly Arg Leu Ser Gly Ala Asn Ala 1880 1885 1890 Ala Ala Ala Asp Gly Pro Gly Ile Ser Gln Leu Gly Glu Val Ala 1895 1900 1905 Pro Leu Pro Thr Leu Ser Ala Pro Val Met Glu Pro Leu Ile Asn 1910 1915 1920 Ser Glu Pro Ser Thr Gly Val Thr Glu Pro Leu Thr Pro His Gln 1925 1930 1935 Pro Asn His Gln Pro Ser Phe Leu Thr Ser Pro Gln Asp Leu Ala 1940 1945 1950 Ser Ser Pro Met Glu Glu Asp Glu Gln His Ser Glu Ala Asp Glu 1955 1960 1965 Pro Pro Ser Asp Glu Pro Leu Ser Asp Asp Pro Leu Ser Pro Ala 1970 1975 1980 Glu Glu Lys Leu Pro His Ile Asp Glu Tyr Trp Ser Asp Ser Glu 1985 1990 1995 His Ile Phe Leu Asp Ala Asn Ile Gly Gly Val Ala Ile Ala Pro 2000 2005 2010 Ala His Gly Ser Val Leu Ile Glu Cys Ala Arg Arg Glu Leu His 2015 2020 2025 Ala Thr Thr Pro Val Glu His Pro Asn Arg Asn His Pro Thr Arg 2030 2035 2040 Leu Ser Leu Val Phe Tyr Gln His Lys Asn Leu Asn Lys Pro Gln 2045 2050 2055 His Gly Phe Glu Leu Asn Lys Ile Lys Phe Glu Ala Lys Glu Ala 2060 2065 2070 Lys Asn Lys Lys Met Lys Ala Ser Glu Gln Lys Asp Gln Ala Ala 2075 2080 2085 Asn Glu Gly Pro Glu Gln Ser Ser Glu Val Asn Glu Leu Asn Gln 2090 2095 2100 Ile Pro Ser His Lys Ala Leu Thr Leu Thr His Asp Asn Val Val 2105 2110 2115 Thr Val Ser Pro Tyr Ala Leu Thr His Val Ala Gly Pro Tyr Asn 2120 2125 2130 His Trp Val 2135 282002PRTHomo sapiens 28Met Glu Gln Asp Arg Thr Asn His Val Glu Gly Asn Arg Leu Ser Pro 1 5 10 15 Phe Leu Ile Pro Ser Pro Pro Ile Cys Gln Thr Glu Pro Leu Ala Thr 20 25 30 Lys Leu Gln Asn Gly Ser Pro Leu Pro Glu Arg Ala His Pro Glu Val 35 40 45 Asn Gly Asp Thr Lys Trp His Ser Phe Lys Ser Tyr Tyr Gly Ile Pro 50 55 60 Cys Met Lys Gly Ser Gln Asn Ser Arg Val Ser Pro Asp Phe Thr Gln 65 70 75 80 Glu Ser Arg Gly Tyr Ser Lys Cys Leu Gln Asn Gly Gly Ile Lys Arg 85 90 95 Thr Val Ser Glu Pro Ser Leu Ser Gly Leu Leu Gln Ile Lys Lys Leu 100 105 110 Lys Gln Asp Gln Lys Ala Asn Gly Glu Arg Arg Asn Phe Gly Val Ser 115 120 125 Gln Glu Arg Asn Pro Gly Glu Ser Ser Gln Pro Asn Val Ser Asp Leu 130 135 140 Ser Asp Lys Lys Glu Ser Val Ser Ser Val Ala Gln Glu Asn Ala Val 145 150 155 160 Lys Asp Phe Thr Ser Phe Ser Thr His Asn Cys Ser Gly Pro Glu Asn 165 170 175 Pro Glu Leu Gln Ile Leu Asn Glu Gln Glu Gly Lys Ser Ala Asn Tyr 180 185 190 His Asp Lys Asn Ile Val Leu Leu Lys Asn Lys Ala Val Leu Met Pro 195 200 205 Asn Gly Ala Thr Val Ser Ala Ser Ser Val Glu His Thr His Gly Glu 210 215 220 Leu Leu Glu Lys Thr Leu Ser Gln Tyr Tyr Pro Asp Cys Val Ser Ile 225 230 235 240 Ala Val Gln Lys Thr Thr Ser His Ile Asn Ala Ile Asn Ser Gln Ala 245 250 255 Thr Asn Glu Leu Ser Cys Glu Ile Thr His Pro Ser His Thr Ser Gly 260 265 270 Gln Ile Asn Ser Ala Gln Thr Ser Asn Ser Glu Leu Pro Pro Lys Pro 275 280 285 Ala Ala Val Val Ser Glu Ala Cys Asp Ala Asp Asp Ala Asp Asn Ala 290 295 300 Ser Lys Leu Ala Ala Met Leu Asn Thr Cys Ser Phe Gln Lys Pro Glu 305 310 315 320 Gln Leu Gln Gln Gln Lys Ser Val Phe Glu Ile Cys Pro Ser Pro Ala 325 330 335 Glu Asn Asn Ile Gln Gly Thr Thr Lys Leu Ala Ser Gly Glu Glu Phe 340 345 350 Cys Ser Gly Ser Ser Ser Asn Leu Gln Ala Pro Gly Gly Ser Ser Glu 355 360 365 Arg Tyr Leu Lys Gln Asn Glu Met Asn Gly Ala Tyr Phe Lys Gln Ser 370 375 380 Ser Val Phe Thr Lys Asp Ser Phe Ser Ala Thr Thr Thr Pro Pro Pro 385 390 395 400 Pro Ser Gln Leu Leu Leu Ser Pro Pro Pro Pro Leu Pro Gln Val Pro 405 410 415 Gln Leu Pro Ser Glu Gly Lys Ser Thr Leu Asn Gly Gly Val Leu Glu 420 425 430 Glu His His His Tyr Pro Asn Gln Ser Asn Thr Thr Leu Leu Arg Glu 435 440 445 Val Lys Ile Glu Gly Lys Pro Glu Ala Pro Pro Ser Gln Ser Pro Asn 450 455 460 Pro Ser Thr His Val Cys Ser Pro Ser Pro Met Leu Ser Glu Arg Pro 465 470 475 480 Gln Asn Asn Cys Val Asn Arg Asn Asp Ile Gln Thr Ala Gly Thr Met 485 490 495 Thr Val Pro Leu Cys Ser Glu Lys Thr Arg Pro Met Ser Glu His Leu 500 505 510 Lys His Asn Pro Pro Ile Phe Gly Ser Ser Gly Glu Leu Gln Asp Asn 515 520 525 Cys Gln Gln Leu Met Arg Asn Lys Glu Gln Glu Ile Leu Lys Gly Arg 530 535 540 Asp Lys Glu Gln Thr Arg Asp Leu Val Pro Pro Thr Gln His Tyr Leu 545 550 555 560 Lys Pro Gly Trp Ile Glu Leu Lys Ala Pro Arg Phe His Gln Ala Glu 565 570 575 Ser His Leu Lys Arg Asn Glu Ala Ser Leu Pro Ser Ile Leu Gln Tyr 580 585 590 Gln Pro Asn Leu Ser Asn Gln Met Thr Ser Lys Gln Tyr Thr Gly Asn 595 600 605 Ser Asn Met Pro Gly Gly Leu Pro Arg Gln Ala Tyr Thr Gln Lys Thr 610 615 620 Thr Gln Leu Glu His Lys Ser Gln Met Tyr Gln Val Glu Met Asn Gln 625 630 635 640 Gly Gln Ser Gln Gly Thr Val Asp Gln His Leu Gln Phe Gln Lys Pro 645 650 655 Ser His Gln Val His Phe Ser Lys Thr Asp His Leu Pro Lys Ala His 660 665 670 Val Gln Ser Leu Cys Gly Thr Arg Phe His Phe Gln Gln Arg Ala Asp 675 680 685 Ser Gln Thr Glu Lys Leu Met Ser Pro Val Leu Lys Gln His Leu Asn 690 695 700 Gln Gln Ala Ser Glu Thr Glu Pro Phe Ser Asn Ser His Leu Leu Gln 705 710 715 720 His Lys Pro His Lys Gln Ala Ala Gln Thr Gln Pro Ser Gln Ser Ser 725 730 735 His Leu Pro Gln Asn Gln Gln Gln Gln Gln Lys Leu Gln Ile Lys Asn 740 745

750 Lys Glu Glu Ile Leu Gln Thr Phe Pro His Pro Gln Ser Asn Asn Asp 755 760 765 Gln Gln Arg Glu Gly Ser Phe Phe Gly Gln Thr Lys Val Glu Glu Cys 770 775 780 Phe His Gly Glu Asn Gln Tyr Ser Lys Ser Ser Glu Phe Glu Thr His 785 790 795 800 Asn Val Gln Met Gly Leu Glu Glu Val Gln Asn Ile Asn Arg Arg Asn 805 810 815 Ser Pro Tyr Ser Gln Thr Met Lys Ser Ser Ala Cys Lys Ile Gln Val 820 825 830 Ser Cys Ser Asn Asn Thr His Leu Val Ser Glu Asn Lys Glu Gln Thr 835 840 845 Thr His Pro Glu Leu Phe Ala Gly Asn Lys Thr Gln Asn Leu His His 850 855 860 Met Gln Tyr Phe Pro Asn Asn Val Ile Pro Lys Gln Asp Leu Leu His 865 870 875 880 Arg Cys Phe Gln Glu Gln Glu Gln Lys Ser Gln Gln Ala Ser Val Leu 885 890 895 Gln Gly Tyr Lys Asn Arg Asn Gln Asp Met Ser Gly Gln Gln Ala Ala 900 905 910 Gln Leu Ala Gln Gln Arg Tyr Leu Ile His Asn His Ala Asn Val Phe 915 920 925 Pro Val Pro Asp Gln Gly Gly Ser His Thr Gln Thr Pro Pro Gln Lys 930 935 940 Asp Thr Gln Lys His Ala Ala Leu Arg Trp His Leu Leu Gln Lys Gln 945 950 955 960 Glu Gln Gln Gln Thr Gln Gln Pro Gln Thr Glu Ser Cys His Ser Gln 965 970 975 Met His Arg Pro Ile Lys Val Glu Pro Gly Cys Lys Pro His Ala Cys 980 985 990 Met His Thr Ala Pro Pro Glu Asn Lys Thr Trp Lys Lys Val Thr Lys 995 1000 1005 Gln Glu Asn Pro Pro Ala Ser Cys Asp Asn Val Gln Gln Lys Ser 1010 1015 1020 Ile Ile Glu Thr Met Glu Gln His Leu Lys Gln Phe His Ala Lys 1025 1030 1035 Ser Leu Phe Asp His Lys Ala Leu Thr Leu Lys Ser Gln Lys Gln 1040 1045 1050 Val Lys Val Glu Met Ser Gly Pro Val Thr Val Leu Thr Arg Gln 1055 1060 1065 Thr Thr Ala Ala Glu Leu Asp Ser His Thr Pro Ala Leu Glu Gln 1070 1075 1080 Gln Thr Thr Ser Ser Glu Lys Thr Pro Thr Lys Arg Thr Ala Ala 1085 1090 1095 Ser Val Leu Asn Asn Phe Ile Glu Ser Pro Ser Lys Leu Leu Asp 1100 1105 1110 Thr Pro Ile Lys Asn Leu Leu Asp Thr Pro Val Lys Thr Gln Tyr 1115 1120 1125 Asp Phe Pro Ser Cys Arg Cys Val Glu Gln Ile Ile Glu Lys Asp 1130 1135 1140 Glu Gly Pro Phe Tyr Thr His Leu Gly Ala Gly Pro Asn Val Ala 1145 1150 1155 Ala Ile Arg Glu Ile Met Glu Glu Arg Phe Gly Gln Lys Gly Lys 1160 1165 1170 Ala Ile Arg Ile Glu Arg Val Ile Tyr Thr Gly Lys Glu Gly Lys 1175 1180 1185 Ser Ser Gln Gly Cys Pro Ile Ala Lys Trp Val Val Arg Arg Ser 1190 1195 1200 Ser Ser Glu Glu Lys Leu Leu Cys Leu Val Arg Glu Arg Ala Gly 1205 1210 1215 His Thr Cys Glu Ala Ala Val Ile Val Ile Leu Ile Leu Val Trp 1220 1225 1230 Glu Gly Ile Pro Leu Ser Leu Ala Asp Lys Leu Tyr Ser Glu Leu 1235 1240 1245 Thr Glu Thr Leu Arg Lys Tyr Gly Thr Leu Thr Asn Arg Arg Cys 1250 1255 1260 Ala Leu Asn Glu Glu Arg Thr Cys Ala Cys Gln Gly Leu Asp Pro 1265 1270 1275 Glu Thr Cys Gly Ala Ser Phe Ser Phe Gly Cys Ser Trp Ser Met 1280 1285 1290 Tyr Tyr Asn Gly Cys Lys Phe Ala Arg Ser Lys Ile Pro Arg Lys 1295 1300 1305 Phe Lys Leu Leu Gly Asp Asp Pro Lys Glu Glu Glu Lys Leu Glu 1310 1315 1320 Ser His Leu Gln Asn Leu Ser Thr Leu Met Ala Pro Thr Tyr Lys 1325 1330 1335 Lys Leu Ala Pro Asp Ala Tyr Asn Asn Gln Ile Glu Tyr Glu His 1340 1345 1350 Arg Ala Pro Glu Cys Arg Leu Gly Leu Lys Glu Gly Arg Pro Phe 1355 1360 1365 Ser Gly Val Thr Ala Cys Leu Asp Phe Cys Ala His Ala His Arg 1370 1375 1380 Asp Leu His Asn Met Gln Asn Gly Ser Thr Leu Val Cys Thr Leu 1385 1390 1395 Thr Arg Glu Asp Asn Arg Glu Phe Gly Gly Lys Pro Glu Asp Glu 1400 1405 1410 Gln Leu His Val Leu Pro Leu Tyr Lys Val Ser Asp Val Asp Glu 1415 1420 1425 Phe Gly Ser Val Glu Ala Gln Glu Glu Lys Lys Arg Ser Gly Ala 1430 1435 1440 Ile Gln Val Leu Ser Ser Phe Arg Arg Lys Val Arg Met Leu Ala 1445 1450 1455 Glu Pro Val Lys Thr Cys Arg Gln Arg Lys Leu Glu Ala Lys Lys 1460 1465 1470 Ala Ala Ala Glu Lys Leu Ser Ser Leu Glu Asn Ser Ser Asn Lys 1475 1480 1485 Asn Glu Lys Glu Lys Ser Ala Pro Ser Arg Thr Lys Gln Thr Glu 1490 1495 1500 Asn Ala Ser Gln Ala Lys Gln Leu Ala Glu Leu Leu Arg Leu Ser 1505 1510 1515 Gly Pro Val Met Gln Gln Ser Gln Gln Pro Gln Pro Leu Gln Lys 1520 1525 1530 Gln Pro Pro Gln Pro Gln Gln Gln Gln Arg Pro Gln Gln Gln Gln 1535 1540 1545 Pro His His Pro Gln Thr Glu Ser Val Asn Ser Tyr Ser Ala Ser 1550 1555 1560 Gly Ser Thr Asn Pro Tyr Met Arg Arg Pro Asn Pro Val Ser Pro 1565 1570 1575 Tyr Pro Asn Ser Ser His Thr Ser Asp Ile Tyr Gly Ser Thr Ser 1580 1585 1590 Pro Met Asn Phe Tyr Ser Thr Ser Ser Gln Ala Ala Gly Ser Tyr 1595 1600 1605 Leu Asn Ser Ser Asn Pro Met Asn Pro Tyr Pro Gly Leu Leu Asn 1610 1615 1620 Gln Asn Thr Gln Tyr Pro Ser Tyr Gln Cys Asn Gly Asn Leu Ser 1625 1630 1635 Val Asp Asn Cys Ser Pro Tyr Leu Gly Ser Tyr Ser Pro Gln Ser 1640 1645 1650 Gln Pro Met Asp Leu Tyr Arg Tyr Pro Ser Gln Asp Pro Leu Ser 1655 1660 1665 Lys Leu Ser Leu Pro Pro Ile His Thr Leu Tyr Gln Pro Arg Phe 1670 1675 1680 Gly Asn Ser Gln Ser Phe Thr Ser Lys Tyr Leu Gly Tyr Gly Asn 1685 1690 1695 Gln Asn Met Gln Gly Asp Gly Phe Ser Ser Cys Thr Ile Arg Pro 1700 1705 1710 Asn Val His His Val Gly Lys Leu Pro Pro Tyr Pro Thr His Glu 1715 1720 1725 Met Asp Gly His Phe Met Gly Ala Thr Ser Arg Leu Pro Pro Asn 1730 1735 1740 Leu Ser Asn Pro Asn Met Asp Tyr Lys Asn Gly Glu His His Ser 1745 1750 1755 Pro Ser His Ile Ile His Asn Tyr Ser Ala Ala Pro Gly Met Phe 1760 1765 1770 Asn Ser Ser Leu His Ala Leu His Leu Gln Asn Lys Glu Asn Asp 1775 1780 1785 Met Leu Ser His Thr Ala Asn Gly Leu Ser Lys Met Leu Pro Ala 1790 1795 1800 Leu Asn His Asp Arg Thr Ala Cys Val Gln Gly Gly Leu His Lys 1805 1810 1815 Leu Ser Asp Ala Asn Gly Gln Glu Lys Gln Pro Leu Ala Leu Val 1820 1825 1830 Gln Gly Val Ala Ser Gly Ala Glu Asp Asn Asp Glu Val Trp Ser 1835 1840 1845 Asp Ser Glu Gln Ser Phe Leu Asp Pro Asp Ile Gly Gly Val Ala 1850 1855 1860 Val Ala Pro Thr His Gly Ser Ile Leu Ile Glu Cys Ala Lys Arg 1865 1870 1875 Glu Leu His Ala Thr Thr Pro Leu Lys Asn Pro Asn Arg Asn His 1880 1885 1890 Pro Thr Arg Ile Ser Leu Val Phe Tyr Gln His Lys Ser Met Asn 1895 1900 1905 Glu Pro Lys His Gly Leu Ala Leu Trp Glu Ala Lys Met Ala Glu 1910 1915 1920 Lys Ala Arg Glu Lys Glu Glu Glu Cys Glu Lys Tyr Gly Pro Asp 1925 1930 1935 Tyr Val Pro Gln Lys Ser His Gly Lys Lys Val Lys Arg Glu Pro 1940 1945 1950 Ala Glu Pro His Glu Thr Ser Glu Pro Thr Tyr Leu Arg Phe Ile 1955 1960 1965 Lys Ser Leu Ala Glu Arg Thr Met Ser Val Thr Thr Asp Ser Thr 1970 1975 1980 Val Thr Thr Ser Pro Tyr Ala Phe Thr Arg Val Thr Gly Pro Tyr 1985 1990 1995 Asn Arg Tyr Ile 2000 29198PRTHomo sapiens 29Met His His Arg Asn Asp Ser Gln Arg Leu Gly Lys Ala Gly Cys Pro 1 5 10 15 Pro Glu Pro Ser Leu Gln Met Ala Asn Thr Asn Phe Leu Ser Thr Leu 20 25 30 Ser Pro Glu His Cys Arg Pro Leu Ala Gly Glu Cys Met Asn Lys Leu 35 40 45 Lys Cys Gly Ala Ala Glu Ala Glu Ile Met Asn Leu Pro Glu Arg Val 50 55 60 Gly Thr Phe Ser Ala Ile Pro Ala Leu Gly Gly Ile Ser Leu Pro Pro 65 70 75 80 Gly Val Ile Val Met Thr Ala Leu His Ser Pro Ala Ala Ala Ser Ala 85 90 95 Ala Val Thr Asp Ser Ala Phe Gln Ile Ala Asn Leu Ala Asp Cys Pro 100 105 110 Gln Asn His Ser Ser Ser Ser Ser Ser Ser Ser Gly Gly Ala Gly Gly 115 120 125 Ala Asn Pro Ala Lys Lys Lys Arg Lys Arg Cys Gly Val Cys Val Pro 130 135 140 Cys Lys Arg Leu Ile Asn Cys Gly Val Cys Ser Ser Cys Arg Asn Arg 145 150 155 160 Lys Thr Gly His Gln Ile Cys Lys Phe Arg Lys Cys Glu Glu Leu Lys 165 170 175 Lys Lys Pro Gly Thr Ser Leu Glu Arg Thr Pro Val Pro Ser Ala Glu 180 185 190 Ala Phe Arg Trp Phe Phe 195 3019DNAMus musculus 30gaacggcatc aaggtgaac 193119DNAMus musculus 31gttcaccttg atgccgttc 193260DNAArtificial SequenceDescription of Artificial Sequence Synthetic oligonucleotide 32gatccccgaa cggcatcaag gtgaacttca agagagttca ccttgatgcc gttcttttta 603360DNAArtificial SequenceDescription of Artificial Sequence Synthetic oligonucleotide 33agcttaaaaa gaacggcatc aaggtgaact ctcttgaagt tcaccttgat gccgttcggg 603419DNAMus musculus 34caacttgcat ccacgatta 193519DNAMus musculus 35taatcgtgga tgcaagttg 193660DNAArtificial SequenceDescription of Artificial Sequence Synthetic oligonucleotide 36gatcccccaa cttgcatcca cgattattca agagataatc gtggatgcaa gttgttttta 603760DNAArtificial SequenceDescription of Artificial Sequence Synthetic oligonucleotide 37agcttaaaaa caacttgcat ccacgattat ctcttgaata atcgtggatg caagttgggg 603819DNAMus musculus 38gaattacagt tgttacgga 193919DNAMus musculus 39tccgtaacaa ctgtaattc 194060DNAArtificial SequenceDescription of Artificial Sequence Synthetic oligonucleotide 40gatccccgaa ttacagttgt tacggattca agagatccgt aacaactgta attcttttta 604160DNAArtificial SequenceDescription of Artificial Sequence Synthetic oligonucleotide 41agcttaaaaa gaattacagt tgttacggat ctcttgaatc cgtaacaact gtaattcggg 604219DNAMus musculus 42cgtagaatat gtacctggt 194319DNAMus musculus 43accaggtaca tattctacg 194460DNAArtificial SequenceDescription of Artificial Sequence Synthetic oligonucleotide 44gatcccccgt agaatatgta cctggtttca agagaaccag gtacatattc tacgttttta 604560DNAArtificial SequenceDescription of Artificial Sequence Synthetic oligonucleotide 45agcttaaaaa cgtagaatat gtacctggtt ctcttgaaac caggtacata ttctacgggg 604619DNAMus musculus 46gaaagcagct cgaaagcgt 194719DNAMus musculus 47acgctttcga gctgctttc 194860DNAArtificial SequenceDescription of Artificial Sequence Synthetic oligonucleotide 48gatccccgaa agcagctcga aagcgtttca agagaacgct ttcgagctgc tttcttttta 604960DNAArtificial SequenceDescription of Artificial Sequence Synthetic oligonucleotide 49agcttaaaaa gaaagcagct cgaaagcgtt ctcttgaaac gctttcgagc tgctttcggg 605019DNAMus musculus 50actactaact ccaccctaa 195119DNAMus musculus 51ttagggtgga gttagtagt 195260DNAArtificial SequenceDescription of Artificial Sequence Synthetic oligonucleotide 52gatccccact actaactcca ccctaattca agagattagg gtggagttag tagtttttta 605360DNAArtificial SequenceDescription of Artificial Sequence Synthetic oligonucleotide 53agcttaaaaa actactaact ccaccctaat ctcttgaatt agggtggagt tagtagtggg 605419DNAMus musculus 54gaaggatgtg gttcgagta 195519DNAMus musculus 55tactcgaacc acatccttc 195660DNAArtificial SequenceDescription of Artificial Sequence Synthetic oligonucleotide 56gatccccgaa ggatgtggtt cgagtattca agagatactc gaaccacatc cttcttttta 605760DNAArtificial SequenceDescription of Artificial Sequence Synthetic oligonucleotide 57agcttaaaaa gaaggatgtg gttcgagtat ctcttgaata ctcgaaccac atccttcggg 605819DNAMus musculus 58gaactattct tgcttacaa 195919DNAMus musculus 59ttgtaagcaa gaatagttc 196060DNAArtificial SequenceDescription of Artificial Sequence Synthetic oligonucleotide 60gatccccgaa ctattcttgc ttacaattca agagattgta agcaagaata gttcttttta 606160DNAArtificial SequenceDescription of Artificial Sequence Synthetic oligonucleotide 61agcttaaaaa gaactattct tgcttacaat ctcttgaatt gtaagcaaga atagttcggg 606219DNAMus musculus 62gaaggagcac ccggattat 196319DNAMus musculus 63ataatccggg tgctccttc 196460DNAArtificial SequenceDescription of Artificial Sequence Synthetic oligonucleotide 64gatccccgaa ggagcacccg gattatttca agagaataat ccgggtgctc cttcttttta 606560DNAArtificial SequenceDescription of Artificial Sequence Synthetic oligonucleotide 65agcttaaaaa gaaggagcac ccggattatt ctcttgaaat aatccgggtg ctccttcggg 606621DNAMus musculus 66gcgtagaata tgtaactggt a 216721DNAMus musculus 67taccagttac atattctacg c 216858DNAArtificial SequenceDescription of Artificial Sequence Synthetic oligonucleotide 68ccgggcgtag aatatgtaac tggtactcga gtaccagtta catattctac gctttttg 586958DNAArtificial SequenceDescription of Artificial Sequence Synthetic oligonucleotide 69aattcaaaaa gcgtagaata tgtaactggt actcgagtac cagttacata ttctacgc 587019DNAMus musculus 70gaaagcagct cgaaagcgt 197119DNAMus musculus 71actactaact ccaccctaa 197221DNAMus musculus 72agaaagcagc tcgaaagcgt t 217321DNAMus musculus 73aacgctttcg agctgctttc t 217458DNAArtificial SequenceDescription of Artificial Sequence Synthetic oligonucleotide 74ccggagaaag cagctcgaaa gcgttctcga gaacgctttc gagctgcttt cttttttg 587558DNAArtificial SequenceDescription of Artificial Sequence Synthetic oligonucleotide 75aattcaaaaa agaaagcagc tcgaaagcgt tctcgagaac gctttcgagc tgctttct 587621DNAMus musculus 76cactactaac tccaccctaa a 217721DNAMus musculus 77tttagggtgg agttagtagt g

217858DNAArtificial SequenceDescription of Artificial Sequence Synthetic oligonucleotide 78ccggcactac taactccacc ctaaactcga gtttagggtg gagttagtag tgtttttg 587958DNAArtificial SequenceDescription of Artificial Sequence Synthetic oligonucleotide 79aattcaaaaa cactactaac tccaccctaa actcgagttt agggtggagt tagtagtg 588021DNAMus musculus 80gcagctggtt tatggtgatt t 218121DNAMus musculus 81aaatcaccat aaaccagctg c 218258DNAArtificial SequenceDescription of Artificial Sequence Synthetic oligonucleotide 82ccgggcagct ggtttatggt gatttctcga gaaatcacca taaaccagct gctttttg 588358DNAArtificial SequenceDescription of Artificial Sequence Synthetic oligonucleotide 83aattcaaaaa gcagctggtt tatggtgatt tctcgagaaa tcaccataaa ccagctgc 588419DNAMus musculus 84caacttgcat ccacgatta 198522DNAMus musculus 85cccaacttgc atccacgatt aa 228697DNAArtificial SequenceDescription of Artificial Sequence Synthetic oligonucleotide 86tgctgttgac agtgagcgac caacttgcat ccacgattaa tagtgaagcc acagatgtat 60taatcgtgga tgcaagttgg gtgcctactg cctcgga 978719DNAMus musculus 87gaattacagt tgttacgga 198822DNAMus musculus 88tggaattaca gttgttacgg ag 228997DNAArtificial SequenceDescription of Artificial Sequence Synthetic oligonucleotide 89tgctgttgac agtgagcgcg gaattacagt tgttacggag tagtgaagcc acagatgtac 60tccgtaacaa ctgtaattcc atgcctactg cctcgga 979019DNAMus musculus 90gaaagcagct cgaaagcgt 199122DNAMus musculus 91aagaaagcag ctcgaaagcg tt 229297DNAArtificial SequenceDescription of Artificial Sequence Synthetic oligonucleotide 92tgctgttgac agtgagcgca gaaagcagct cgaaagcgtt tagtgaagcc acagatgtaa 60acgctttcga gctgctttct ttgcctactg cctcgga 979319DNAMus musculus 93actactaact ccaccctaa 199422DNAMus musculus 94tcactactaa ctccacccta aa 229597DNAArtificial SequenceDescription of Artificial Sequence Synthetic oligonucleotide 95tgctgttgac agtgagcgcc actactaact ccaccctaaa tagtgaagcc acagatgtat 60ttagggtgga gttagtagtg atgcctactg cctcgga 979621DNAMus musculus 96gcgtagaata tgtacctggt a 219721DNAMus musculus 97taccaggtac atattctacg c 219897DNAArtificial SequenceDescription of Artificial Sequence Synthetic oligonucleotide 98tgctgttgac agtgagcgag cgtagaatat gtacctggta tagtgaagcc acagatgtat 60accaggtaca tattctacgc gtgcctactg cctcgga 979921DNAMus musculus 99gcacgaagcg tatggataca a 2110021DNAMus musculus 100ttgtatccat acgcttcgtg c 2110197DNAArtificial SequenceDescription of Artificial Sequence Synthetic oligonucleotide 101tgctgttgac agtgagcgcg cacgaagcgt atggatacaa tagtgaagcc acagatgtat 60tgtatccata cgcttcgtgc ttgcctactg cctcgga 9710223DNAArtificial SequenceDescription of Artificial Sequence Synthetic oligonucleotidemodified_base(3)..(21)a, c, g, t, unknown or other 102aannnnnnnn nnnnnnnnnn ntt 23