Engineered E2 For Increasing The Content Of Free LYS11-Linked Ubiquitin Komander; David ; et al. [Bremm; Anja]

Patent Application Summary

U.S. patent application number 14/276325 was filed with the patent office on 2014-05-13 and published on 2014-10-09 for Engineered E2 For Increasing The Content Of Free LYS11-Linked Ubiquitin. This patent application is currently assigned to MEDICAL RESEARCH COUNCIL. The applicants listed for this patent are Anja Bremm and David Komander. Invention is credited to Anja Bremm and David Komander.

Application Number	20140302582 14/276325
Document ID	/
Family ID	42315017
Filed Date	2014-10-09

United States Patent Application	20140302582
Kind Code	A1
Komander; David ; et al.	October 9, 2014

Engineered E2 For Increasing The Content Of Free LYS11-Linked Ubiquitin

Abstract

The invention provides a chimeric E2 enzyme comprising a Ubc domain fused to a heterologous ubiquitin binding domain (UBD). The chimeric enzymes of the invention may be useful in producing elevated levels of free polyubiquitin.

Inventors:

Komander; David; (Cambridge, GB) ; Bremm; Anja; (Cambridge, GB)

Applicant:

Name	City	State	Country	Type
Komander; David	Cambridge		GB
Bremm; Anja	Cambridge		GB

Assignee:

MEDICAL RESEARCH COUNCIL
London
GB

Family ID:

42315017

Appl. No.:

14/276325

Filed:

May 13, 2014

Related U.S. Patent Documents


Application Number	Filing Date	Patent Number
13670594	Nov 7, 2012	8765406
14276325
PCT/GB2011/000704	May 6, 2011
13670594
61333145	May 10, 2010

Current U.S. Class:	435/188 ; 435/212
Current CPC Class:	C12N 9/96 20130101; C07K 2319/70 20130101; C12N 9/93 20130101; C12N 9/485 20130101; C12Y 304/19012 20130101; C12N 9/48 20130101; C07K 2319/95 20130101; C12Y 603/02019 20130101
Class at Publication:	435/188 ; 435/212
International Class:	C12N 9/96 20060101 C12N009/96; C12N 9/48 20060101 C12N009/48

Foreign Application Data

Date	Code	Application Number
May 7, 2010	GB	1007704.8

Claims

1. An E2 enzyme comprising a Ubc domain, from which an N-terminal tail or a C-terminal tail has been removed.

2. An E2 enzyme according to claim 1, which is a chimeric enzyme wherein the Ubc is fused to a heterologous ubiquitin-binding domain (UBD).

3. A chimeric E2 enzyme according to claim 2, wherein the UBD is C-terminal to the Ubc domain.

4. A chimeric E2 enzyme according to claim 2, wherein the UBD is an α-helical, zinc finger or pleckstrin homology domain.

5. A chimeric E2 enzyme according to claim 2, wherein the UBD is a domain selected from the group consisting of UIM, IUIM (MIU), DUIM, UBM, UBA, GAT, CUE, VHS, UBZ, NZF, ZnF A20, ZnF UBP (PAZ), PRU, GLUE, UEV, UBC, SH3, PFU and Jab1/MNP domains.

6. A chimeric E2 enzyme according to claim 4, wherein the UBD is derived from Isopeptidase T.

7. A chimeric E2 enzyme according to claim 6, wherein the UBD comprises the sequence from about position 163 to about position 291 of Isopeptidase T.

8. A chimeric E2 enzyme according to claim 4, wherein the UBD is a UBA, UIM, ZnF or NZF domain.

9. An E2 enzyme according to claim 1, wherein the Ubc domain is derived from an E2 enzyme selected from the group consisting of UBE2A, UBE2B, UBE2C, UBE2D1, UBE2D2, UBE2D3, UBE2D4, UBE2E1, UBE2E2, UBE2E3, UBE2F, UBE2G1, UBE2G2, UBE2H, UBE2I, UBE2J1, UBE2J2, UBE2K, UBE2L3, UBE2L6, UBE2M, UBE2N, UBE2NL, UBE2O, UBE2Q1, UBE2Q2, UBE2R1, UBE2R2, UBE2S, UBE2T, UBE2U, UBE2W, UBE2Z and BIRC6.

10. An E2 enzyme according to claim 9, wherein the E2 enzyme is a class II E2 enzyme.

11. An E2 enzyme according to claim 10, wherein an N-terminal or a C-terminal amino acid tail on the class II E2 enzyme is replaced by the UBD.

12. An E2 enzyme according to claim 10 or claim 11, wherein the Ubc domain is derived from UBE2S.

13. An E2 enzyme according to claim 12, wherein the Ubc domain comprises residues 1 to 156 of UBE2S.

14. A method for increasing the capacity of an E2 enzyme to produce free polyubiquitin chains in solution, comprising fusing the Ubc domain of said E2 enzyme to a UBD.

15. A method according to claim 14, wherein the E2 enzyme is selected from the group consisting of UBE2A, UBE2B, UBE2C, UBE2D1, UBE2D2, UBE2D3, UBE2D4, UBE2E1, UBE2E2, UBE2E3, UBE2F, UBE2G1, UBE2G2, UBE2H, UBE2I, UBE2J1, UBE2J2, UBE2K, UBE2L3, UBE2L6, UBE2M, UBE2N, UBE2NL, UBE2O, UBE2Q1, UBE2Q2, UBE2R1, UBE2R2, UBE2S, UBE2T, UBE2U, UBE2V1, UBE2V2, UBE2V3, UBE2W, UBE2Z, AKTIP and BIRC6 and the UBD is a domain selected from the group consisting of UIM, IUIM (MIU), DUIM, UBM, UBA, GAT, CUE, VHS, UBZ, NZF, A20-like ZnF, ZnF UBP (PAZ), PRU, GLUE, UEV, UBC, SH3, PFU and Jab1/MNP domains.

16. A method according to claim 15, wherein the E2 enzyme is UBE2S.

17. A method according to claim 14, wherein the UBD is a ZnF UBP domain.

18. A method for producing free polyubiquitin chains linked through a desired lysine residue, comprising the steps of: (a) selecting an E2 enzyme which possesses the desired lysine residue specificity; (b) fusing the Ubc catalytic domain of said E2 enzyme to a ubiquitin binding domain (UBD); and (c) incubating the resulting chimeric protein with an E1 ubiquitin activating enzyme and monomeric ubiquitin.

19. A method according to claim 18, wherein the incidence of undesired lysine linkages is reduced by including a linkage-specific deubiquitinase in the incubation.

Description

RELATED APPLICATIONS AND INCORPORATION BY REFERENCE

[0001] This application is a divisional of U.S. Ser. No. 13/670,594 filed Nov. 7, 2012, which is a continuation-in-part application of international patent application Serial No. PCT/GB2011/000704 filed 6 May 2011, which published as PCT Publication No. WO 2011/138593 on 10 Nov. 2011, which claims benefit of GB patent application Serial No. 1007704.8 filed 7 May 2010 and U.S. provisional patent application Ser. No. 61/333,145 filed 10 May 2010.

[0002] The foregoing applications, and all documents cited therein or during their prosecution ("appln cited documents") and all documents cited or referenced in the appln cited documents, and all documents cited or referenced herein ("herein cited documents"), and all documents cited or referenced in herein cited documents, together with any manufacturer's instructions, descriptions, product specifications, and product sheets for any products mentioned herein or in any document incorporated by reference herein, are hereby incorporated herein by reference, and may be employed in the practice of the invention. More specifically, all referenced documents are incorporated by reference to the same extent as if each individual document was specifically and individually indicated to be incorporated by reference.

FIELD OF THE INVENTION

[0003] The present invention relates to engineered E2 ubiquitin conjugating enzymes. In particular, the invention relates to chimeric E2 enzymes which are fused to a ubiquitin binding domain (UBD). The fusion is engineered by replacing the C-terminal tail of a class II E2 enzyme with a UBD, such that the ubiquitin conjugating (Ubc) catalytic domain is fused to the UBD. This modification increases the efficiency of ubiquitin polymerisation by E2 enzymes, and facilitates isolation of specific forms of polyubiquitin.

BACKGROUND OF THE INVENTION

[0004] Protein ubiquitination is a versatile posttranslational modification with roles in protein degradation, cell signaling, intracellular trafficking and the DNA damage response (Chen and Sun, 2009; Komander, 2009). Ubiquitin polymers are linked through one of seven internal lysine (K) residues or through the N-terminal amino group. Importantly, the type of ubiquitin linkage determines the functional outcome of the modification (Komander, 2009). The best-studied ubiquitin polymers, K48- and K63-linked chains, have degradative and non-degradative roles, respectively (Chen and Sun, 2009; Hershko and Ciechanover, 1998). However, recent data have revealed an unexpectedly high abundance of so-called atypical ubiquitin chains; for example, K11 linkages have been found to be as abundant as K48 linkages in S. cerevisiae (Peng et al., 2003; Xu et al., 2009).
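As an illustration of the linkage sites referred to above, the following minimal Python sketch lists the internal lysine positions of ubiquitin and builds the kinds of single-Lys, Lys-less (K0) and Lys-to-Arg mutants that are commonly used to probe linkage specificity. The 76-residue human ubiquitin sequence and all function names are assumptions supplied for illustration; they are not taken from this application.

```python
# Human ubiquitin, 76 residues (assumed reference sequence, not from this application).
UBIQUITIN = (
    "MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG"
)

def lysine_positions(seq):
    """Return the 1-based positions of all lysine (K) residues."""
    return [i for i, aa in enumerate(seq, start=1) if aa == "K"]

def mutate_lys_to_arg(seq, positions):
    """Replace the lysines at the given 1-based positions with arginine."""
    residues = list(seq)
    for pos in positions:
        if residues[pos - 1] != "K":
            raise ValueError(f"residue {pos} is not a lysine")
        residues[pos - 1] = "R"
    return "".join(residues)

def single_lys_mutant(seq, keep):
    """Keep one lysine and mutate every other lysine to arginine (e.g. K11-only)."""
    sites = lysine_positions(seq)
    if keep not in sites:
        raise ValueError(f"residue {keep} is not a lysine")
    return mutate_lys_to_arg(seq, [p for p in sites if p != keep])

LINKAGE_SITES = lysine_positions(UBIQUITIN)        # the seven internal lysines
K11_ONLY = single_lys_mutant(UBIQUITIN, keep=11)   # only K11 can form a linkage
K0 = mutate_lys_to_arg(UBIQUITIN, LINKAGE_SITES)   # Lys-less ubiquitin
K63R = mutate_lys_to_arg(UBIQUITIN, [63])          # blocks K63 linkages only
```

Under these assumptions, `LINKAGE_SITES` enumerates the seven positions through which chains can be linked, and the mutant strings differ from the wild-type sequence only at lysine positions.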

[0005] Polyubiquitin chains are assembled on substrates through the concerted action of a three-step enzymatic cascade, involving an E1 ubiquitin activating enzyme, an E2 ubiquitin conjugating enzyme, and E3 ubiquitin ligases (Dye and Schulman, 2007). While E3 ligases attach polyubiquitin chains to a target and thus confer substrate specificity, E2 enzymes are thought to determine the type of chain linkage in polyubiquitin chains. K48- and K63-specific E2 enzymes have been identified (Chen and Pickart, 1990; Hofmann and Pickart, 1999), which allowed structural analysis of these chain types as well as a detailed understanding of specificity of ubiquitin binding domains (UBDs) and deubiquitinases (DUBs) (reviewed in Komander, 2009). This information is currently lacking for atypical ubiquitin chains.

[0006] Several recent reports have implicated K11-linked ubiquitin chains in distinct biological processes. Early data indicated that K11-linked chains are proteasomal degradation signals (Baboshina and Haas, 1996). An E2 enzyme, UBE2S/E2-EPF, was identified that assembled K11 linkages in vitro (Baboshina and Haas, 1996). The human anaphase promoting complex (APC/C) was found to assemble K11 linkages using the E2 enzyme UBE2C/UbcH10, on proteins that need to be degraded for cell cycle progression (Jin et al., 2008). A yeast proteomics study, apart from having revealed the high abundance of K11 linkages, also implicated this chain type with endoplasmic reticulum-associated degradation (ERAD), and identified yeast Ubc6 as an E2 enzyme involved in synthesis of K11-linked chains (Xu et al., 2009). In mammalian cells, K11 linkages were found to be enriched in UBA/UBX protein complexes, which interact with the key ERAD regulator p97/cdc48 (Alexandru et al., 2008). Hence, K11-linked chains seem to regulate numerous important cellular processes, and may act as a distinct proteasomal degradation signal. However, cellular mechanisms of assembly and disassembly of K11 linkages, as well as structural determinants for K11 linkage recognition, are unknown.

[0007] The structure of E2 enzymes is well characterised. All E2 enzymes comprise a conserved domain of about 16 kD (the Ubc domain) which contains the Ubc motif, [FYWLS]-H-[PC]-[NH]-[LIV]-x(3,4)-G-x-[LIV]-C-[LIV]-x-[LIV]. The Ubc domain contains a conserved cysteine residue, which accepts ubiquitin from the ubiquitin-activating enzyme E1 to form a thiol ester. Substitution of the conserved cysteine abolishes E2 activity. A suggested motif rich in basic residues, which may be involved in E1 binding, is found at the N-terminus of the Ubc domain.
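By way of illustration only, the Ubc motif quoted above can be translated into a regular expression and used to scan a candidate sequence. The sketch below assumes the motif uses PROSITE-style notation (residue sets in square brackets, x as a wildcard, repeat counts in parentheses); the function names and the short test peptide mentioned afterwards are hypothetical and are not taken from this application.

```python
import re

# Motif as written in the text, in PROSITE-style notation.
UBC_MOTIF = "[FYWLS]-H-[PC]-[NH]-[LIV]-x(3,4)-G-x-[LIV]-C-[LIV]-x-[LIV]"

def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a Python regular expression.

    Handles residue sets ([ABC]), single residues, the wildcard x, and
    repeat counts such as x(3,4) or x(2). Other PROSITE syntax is not covered.
    """
    parts = []
    for element in pattern.split("-"):
        m = re.fullmatch(r"(\[[A-Z]+\]|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = m.groups()
        token = "." if residue == "x" else residue
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(token)
    return "".join(parts)

def find_ubc_motif(sequence):
    """Return (1-based start, matched text) for the first motif hit, or None."""
    hit = re.search(prosite_to_regex(UBC_MOTIF), sequence)
    return (hit.start() + 1, hit.group()) if hit else None
```

For the illustrative peptide `KIYHPNINSNGSICLDIL`, `find_ubc_motif` reports a hit starting at residue 3 that spans the conserved cysteine. A sequence database search with a curated profile would normally be preferred over a bare regular expression, which tolerates no deviation from the motif.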

[0008] E2 enzymes can be classified on the basis of their structure into three classes.

[0009] Class I: these proteins comprise simply the "Ubc" catalytic domain. In vitro these enzymes are very poor at transferring ubiquitin to proteins on their own, and probably require an E3 to aid this in vivo. UBC 4 and 5 of S. cerevisiae, UBC1 of Arabidopsis thaliana, and human UBE2D1, UBE2D2, UBE2D3 or UBE2D4 are examples of this class of E2, and are known to be important in the ubiquitination of many short-lived and abnormal proteins prior to degradation.

[0010] Class II: these enzymes contain a C-terminal tail attached to the Ubc domain. The tails are different in type but very acidic tails, as found in Ubc2 (also known as Rad6) of S. cerevisiae, appear to mediate interaction with protein substrates, in this case with the basic histones. Ubc2/Rad6 will ubiquitinate histones in vitro, which requires the C-terminal tail and is known to be involved in DNA repair. This may be a form of ubiquitination that results in protein modification but not degradation. Other C-terminal tails appear to be involved in E2 localisation. Ubc6 of S. cerevisiae is found anchored to the ER membrane with the active site facing the cytosol. The 95 residue C-terminal tail of Ubc6 includes a hydrophobic signal-anchor sequence.

[0011] Class III: N-terminal extensions are present in this class of E2s. Several enzymes of this class have been identified but the function of the extensions is unknown.

[0012] Ubiquitin binding domains are modular protein elements that bind non-covalently to ubiquitin. They are typically small, being 20 to 150 amino acids in length, and independently folded, making their isolation straightforward. They are based on a number of different ubiquitin binding motifs. The Ubc of E2 enzymes is one class of ubiquitin binding domain (UBD). Other classes include α-helical domains, zinc finger domains (ZnFs) and pleckstrin homology (PH) domains. See, for example, Dikic et al., 2009. Many UBDs are known in the art; for example, see Table 1 in Dikic et al., page 663.

[0013] Isopeptidase T (IsoT, or USP5) contains a ZnF-type UBD (known as ZnF UBP or PAZ domain) between amino acid positions 163 and 291 (see Reyes-Turcu et al., 2006). HDAC6 (Boyault et al., 2006) also contains a ZnF UBP domain. Other zinc finger ubiquitin binding domains include UBZ domains, as contained in polymerase-η and polymerase-κ; NZF and A20-like ZnF domains.

[0014] Alpha-helical types of domains include, for example, UBA domains, found in Rad23 and R23A proteins, or ubiquitin interacting motifs (UIM, MIU or dUIM); see Dikic et al., 2009.

[0015] The study of the ubiquitin system requires the ability to produce unattached polymeric ubiquitin in solution, for structural and functional analysis. As noted above, ubiquitin chains vary according to which of the 7 internal Lys residues is used for concatenation of the ubiquitin molecules. In the absence of an E3 ubiquitin ligase, most E2 enzymes fail to assemble polyubiquitin. Class II E2 enzymes can assemble polyubiquitin chains on their own C-terminal tails. Very few E2 enzymes, including UBE2R2/cdc34, UBE2K and UBE2S, produce free, i.e. unattached, polyubiquitin in solution. For instance, UBE2S, which assembles K11-linked polyubiquitin, is inefficient at producing free ubiquitin multimers in solution, producing only small amounts of free ubiquitin dimers. There is a need, therefore, for improved E2 enzymes that can be used to produce free polyubiquitin in solution.

[0016] Citation or identification of any document in this application is not an admission that such document is available as prior art to the present invention.

SUMMARY OF THE INVENTION

[0017] Applicants have analyzed the K11-specific E2 enzyme UBE2S that assembles K11-linked chains on its own C-terminal tail in vitro, and also generates limited amounts of free, i.e. unattached, K11-linked diubiquitin. By removing the C-terminal tail, Applicants have engineered an E2 enzyme that produces free K11-linked diubiquitin. Furthermore, by replacing the C-terminal tail of the E2 with a UBD, Applicants have engineered a UBE2S fusion protein that synthesizes free K11-linked polymers, including trimers and tetramers, with markedly increased efficiency, allowing high-level purification of K11-linked ubiquitin dimers, trimers and tetramers, and facilitating structural studies.

[0018] In a first aspect, therefore, there is provided an E2 enzyme comprising a Ubc domain, from which an N-terminal or a C-terminal tail has been removed.

[0019] In a preferred embodiment, the Ubc domain is fused to a heterologous ubiquitin binding domain (UBD).

[0020] Preferably, the UBD is C-terminal to the Ubc domain. In class II E2 enzymes, a C-terminal amino acid extension is present, which is partly replaced by the UBD. Some E2 enzymes, such as class III enzymes, have an N-terminal tail which may be removed and optionally at least partly replaced with a UBD.

[0021] UBDs are known in the art, and exemplary UBDs may be of the α-helical, zinc finger or pleckstrin homology domain classes.

[0022] For example, the UBD is a domain selected from the group consisting of UIM, IUIM (MIU), DUIM, UBM, UBA, GAT, CUE, VHS, UBZ, NZF, ZnF A20, ZnF UBP (PAZ), PRU, GLUE, UEV, UBC, SH3, PFU and Jab1/MNP domains.

[0023] Preferably, the UBD is a ZnF UBP domain, such as the UBD derived from Isopeptidase T. It advantageously may comprise the sequence from about position 163 to about position 291 of Isopeptidase T, which may comprise the UBD. For example, it may comprise residues 173-289 of Isopeptidase T.

[0024] Alternative UBDs include UBA, UIM and NZF domains.

[0025] ZnF and NZF domains are particularly preferred.

[0026] The Ubc will determine the specificity of linkages used in the polyubiquitin chains. Ubc domains may be derived from E2 enzymes. Referring to human E2 enzymes, the Ubc domain may be derived from an E2 enzyme selected from the group consisting of UBE2A, UBE2B, UBE2C, UBE2D1, UBE2D2, UBE2D3, UBE2D4, UBE2E1, UBE2E2, UBE2E3, UBE2F, UBE2G1, UBE2G2, UBE2H, UBE2I, UBE2J1, UBE2J2, UBE2K, UBE2L3, UBE2L6, UBE2M, UBE2N, UBE2NL, UBE2O, UBE2Q1, UBE2Q2, UBE2R1, UBE2R2, UBE2S, UBE2T, UBE2U, UBE2W, UBE2Z, and BIRC6. The foregoing are human E2 enzymes. Of course, mammalian, yeast or other E2 enzymes may be used, preferably those enzymes which are equivalent to the foregoing human enzymes.

[0027] Preferably, the Ubc domain is derived from UBE2S. The Ubc domain is comprised in residues 1 to 156 of UBE2S, and advantageously these residues are incorporated into the chimeric E2 enzyme. Residues 196-222 of UBE2S comprise the C-terminal extension; these residues are removed and/or replaced with a UBD.

[0028] The invention provides a method for increasing the capacity of an E2 enzyme to produce free polyubiquitin dimers, comprising removing a C-terminal tail from said E2 enzyme. In a further aspect, the invention provides a method for increasing the capacity of an E2 enzyme to produce free polyubiquitin chains containing more than two ubiquitin monomers in solution, comprising conjugating the Ubc domain of said E2 enzyme to a UBD.

[0029] Preferably, the polyubiquitin chains comprise trimers or tetramers of ubiquitin monomers.

[0030] In a further aspect, the invention provides a method for producing free polyubiquitin chains linked through a desired lysine residue, comprising the steps of: (a) selecting an E2 enzyme which possesses the desired specificity for ubiquitin lysine residues; (b) fusing the Ubc catalytic domain of said E2 enzyme to a UBD ubiquitin binding domain; and incubating the resulting chimeric protein with an E1 ubiquitin activating enzyme and monomeric ubiquitin.

[0031] In a preferred embodiment, the incidence of undesired lysine linkages is reduced by including a linkage-specific deubiquitinase in the incubation mixture. Such enzymes preferentially degrade polymers having a specific lysine linkage; thus, if the product of the chimeric E2 is contaminated with undesired linkage polymers, the contaminants may be specifically removed.

[0032] Accordingly, it is an object of the invention to not encompass within the invention any previously known product, process of making the product, or method of using the product such that Applicants reserve the right and hereby disclose a disclaimer of any previously known product, process, or method. It is further noted that the invention does not intend to encompass within the scope of the invention any product, process, or making of the product or method of using the product, which does not meet the written description and enablement requirements of the USPTO (35 U.S.C. § 112, first paragraph) or the EPO (Article 83 of the EPC), such that Applicants reserve the right and hereby disclose a disclaimer of any previously described product, process of making the product, or method of using the product.

[0033] It is noted that in this disclosure and particularly in the claims and/or paragraphs, terms such as "comprises", "comprised", "comprising" and the like can have the meaning attributed to them in U.S. patent law; e.g., they can mean "includes", "included", "including", and the like; and that terms such as "consisting essentially of" and "consists essentially of" have the meaning ascribed to them in U.S. patent law, e.g., they allow for elements not explicitly recited, but exclude elements that are found in the prior art or that affect a basic or novel characteristic of the invention.

[0034] These and other embodiments are disclosed or are obvious from and encompassed by, the following Detailed Description.

BRIEF DESCRIPTION OF THE DRAWINGS

[0035] The following detailed description, given by way of example, but not intended to limit the invention solely to the specific embodiments described, may best be understood in conjunction with the accompanying drawings.

[0036] FIG. 1: UBE2S is a K11-specific E2 enzyme. (a) UBE2S and (b) UBE2C were analyzed in autoubiquitination assays in the presence of E1, ubiquitin and Mg.ATP. The panel of single-Lys ubiquitin mutants reveals the intrinsic linkage specificity. Autoubiquitination is visualized with a polyclonal anti-ubiquitin antibody. UBE2S, but not UBE2C, autoubiquitinates and also assembles unattached K11-linked ubiquitin chains. (c) Time course assay for autoubiquitination by UBE2S. The reaction for wild-type (wt) and K11-only ubiquitin leads to similar high-molecular weight conjugates, while for the Lysless (K0) and K63-only ubiquitin an equivalent pattern of multimonoubiquitination is observed.

[0037] FIG. 2: Assembly of K11-linked diubiquitin. (a) Domain structure of UBE2S, and autoubiquitination reactions with UBE2S wild-type and catalytic mutants. (b) UBE2S autoubiquitination occurs in cis. Wild-type UBE2S was mixed with GST-tagged inactive UBE2S(C95A), and after precipitation of the GST-tagged protein, ubiquitination in supernatant (left) and precipitate (right) is analyzed. (c) Removal of the Lys-rich tail of UBE2S decreases autoubiquitination while preserving K11 specificity. (d) Purification of K11-linked diubiquitin by cation exchange chromatography. The integrated peak area (mAU*ml) is indicated. A gel showing protein-containing fractions is shown as an inset.

[0038] FIG. 3: Assembly of K11-linked tetraubiquitin. (a) UBE2S engineering to increase yields of free K11-linked ubiquitin chains. The C-terminal tail was replaced with the ZnF-UBP domain of USP5/IsoT. The fusion protein assembles free chains of up to five ubiquitin molecules, yet it is less specific and also incorporates K63-linkages with wild-type and K63-only ubiquitin (indicated by arrows). (b) Incorporation of K63-linkages may be counteracted by using a K63R ubiquitin mutant, or by including the K63-specific DUB AMSH in the reaction, as observed by disappearance of the faster migrating K63-linkage contamination. (c) A 5 µl aliquot of a 1 ml chain assembly reaction using 25 mg ubiquitin shows that di-, tri- and tetraubiquitin are generated in milligram quantities. (d) Cation exchange chromatography was used to purify K11-linked ubiquitin chains. The integrated peak area (mAU*ml) is specified. A gel showing protein-containing fractions is shown as an inset. (e) Purified ubiquitin tetramers of K11, K48, K63 and linear linkages have different electrophoretic mobility on 4-12% SDS-PAGE gels.

[0039] FIG. 4: Crystal structure of K11-linked diubiquitin. (a) The crystal structure of K11-linked diubiquitin in two orientations. The proximal (orange) and distal (yellow) molecules interact through the ubiquitin helix, and the isopeptide linkage (shown in ball-and-stick representation, with red oxygen and blue nitrogen atoms) is at the surface of the dimer. (b) A semitransparent surface coloured blue for residues Ile44, Leu8 and Val70 shows that the hydrophobic patch is not involved in the interface. (c) Residues at the interface are shown in stick representation, and polar interactions of <3.5 Å are shown with dotted lines. Water molecules are shown as purple spheres. (d) The hydrophobic surface in K11-linked chains is extended by Leu71 and Leu73, which are exposed as Arg72/Arg74 participate in the interface.

[0040] FIG. 5: NMR Solution studies of K11-linked diubiquitin.

[0041] (a) Overlay of 15N, 1H HSQC spectra of ubiquitin K63R (red) onto K11-linked diubiquitin K63R (blue). The expansion illustrates the doubling of peaks observed for Lys29, Ile30, Asp32 and Lys33. The signal for Asp52 is unperturbed. (b, c) Weighted chemical shift perturbation according to residue number for K11-linked diubiquitin with both molecules 13C, 15N-labeled (blue, K63R ubiquitin mutant) or only labeled distally (orange, K11R ubiquitin mutant). Shown are chemical shift perturbations observed for doubled peaks calculated as the weighted difference between the chemical shift position in the K11-linked diubiquitin mutants and their respective monoubiquitin counterparts at pH 7.4 (b) and pH 3.5 (c). Stars (*) indicate exchange-broadened residues, and arrows indicate K29 and K33. (d) Combined chemical shift perturbation differences for K48- and K63-linked diubiquitin (Tenno et al., 2004). (e) Comparison of the proximal K11-linked diubiquitin interface in a view indicated by the arrow (left). Surface map of interacting residues from NMR (middle, orange, with shifting residues in blue, and Pro residues in yellow) and from the crystal interaction (right, yellow with interface residues in marine, according to the PISA server, ebi.ac.uk/pdbe/prot_int/pistart.html). (f) Comparison of the distal K11 diubiquitin interface, coloured as in (c), viewed as indicated by the arrow in the left picture. The second image shows perturbed residues obtained from the distally labeled sample, and the third image from the fully labeled diubiquitin. The fourth image corresponds to the crystal structure interface. Asp39 and Glu52, which form part of the crystallographic interface but are not perturbed in solution, are circled. A white surface indicates exchange-broadened residues.

DETAILED DESCRIPTION OF THE INVENTION

[0042] Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by those of ordinary skill in the art, such as in the arts of peptide chemistry, cell culture and phage display, nucleic acid chemistry and biochemistry. Standard techniques are used for molecular biology, genetic and biochemical methods (see Sambrook et al., Molecular Cloning: A Laboratory Manual, 3rd ed., 2001, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.; Ausubel et al., Short Protocols in Molecular Biology (1999) 4th ed., John Wiley & Sons, Inc.). All publications cited herein are incorporated herein by reference in their entirety for the purpose of describing and disclosing the methodologies, reagents, and tools reported in the publications that might be used in connection with the invention.

[0043] E2 enzymes, as referred to herein, are variously known as ubiquitin carrier proteins, ubiquitin conjugating enzymes or Ubcs. In many instances, E2 enzymes are thought to determine linkage specificity in polyubiquitin. 38 E2 enzymes have been identified in humans, as described in Ye and Rape, 2009. As noted above, they may be subdivided into three classes, of which class II enzymes have a C-terminal extension or tail attached to the Ubc catalytic domain (also referred to as the UBCc catalytic domain). This domain is recognised as a conserved domain, and is identifiable in any E2 enzyme.

[0044] Ubiquitin binding domains, or UBDs, are modular protein domains which bind non-covalently to ubiquitin. As noted above, UBDs are divisible into a number of different categories, including α-helical, zinc finger and pleckstrin homology domains, which are structurally diverse. Preferably, a UBD is a UBD as described in Dikic et al., 2009. Other UBDs may become recognised, and it is anticipated that these too will be useful in the present invention. In one embodiment, a UBD is a ZnF UBD, for example UBZ, NZF, A20-like ZnF or ZnF UBP, as described in Dikic et al., 2009.

[0045] A chimeric protein may be constructed by fusing a Ubc domain to a UBD, according to techniques known in the art. For example, polypeptide fusions may be created by ligating nucleic acids encoding the respective domains in-frame, and expressing the coding sequence thus created. The domains may be fused directly to one another, or may be separated by one or several additional amino acids, referred to as a linker. Where a linker separates the domains, said linker advantageously does not negatively influence the three-dimensional alignment of the domains in such a way that their functional cooperation is sterically hindered. The UBD is preferably C-terminal to the Ubc domain, effectively replacing the C-terminal extension in a Class II E2.
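At the level of amino acid sequence, the fusion described above amounts to joining a residue range from the E2 enzyme to a residue range from the UBD donor, optionally through a linker. The following Python sketch shows one way this bookkeeping might be done. The residue ranges (1 to 156 for the Ubc domain of UBE2S, 173 to 289 for Isopeptidase T) are those given elsewhere in this description; the function names, the "SS" linker and the placeholder sequences are assumptions for illustration and do not represent real protein sequences.

```python
def residues(seq, start, end):
    """Return residues start..end (1-based, inclusive) of a protein sequence."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError(
            f"range {start}-{end} is outside a sequence of length {len(seq)}"
        )
    return seq[start - 1:end]

def build_chimera(e2_seq, ubc_range, ubd_source_seq, ubd_range, linker=""):
    """Join a Ubc domain to a C-terminal UBD, optionally separated by a linker."""
    ubc = residues(e2_seq, *ubc_range)
    ubd = residues(ubd_source_seq, *ubd_range)
    return ubc + linker + ubd

# Placeholder sequences only (hypothetical): real sequences would be retrieved
# from a sequence database before designing a construct.
E2_PLACEHOLDER = "A" * 222           # stands in for a 222-residue class II E2
UBD_SOURCE_PLACEHOLDER = "G" * 300   # stands in for the UBD donor protein

chimera = build_chimera(
    E2_PLACEHOLDER, (1, 156),          # Ubc domain, C-terminal tail omitted
    UBD_SOURCE_PLACEHOLDER, (173, 289),
    linker="SS",                       # hypothetical two-residue linker
)
```

The same ranges can be mapped onto the coding sequences so that the nucleic acids encoding the two domains are ligated in-frame. Whether a linker is needed, and of what length, would have to be determined empirically for each pair of domains.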

[0046] A chimeric enzyme is an enzyme that may comprise at least two heterologous domains. In this context, heterologous signifies that the domains are not found in the same position in a single polypeptide in vivo. Normally, this means that the domains are derived from two different proteins. The proteins themselves may be found in the same organism--for example, the proteins may both be human proteins.

[0047] The term "fusion protein" refers to a protein or polypeptide that has an amino acid sequence derived from two or more proteins, for example two heterologous domains as indicated above. The fusion protein may also include linking regions of amino acids between amino acid portions derived from separate proteins. Unrelated proteins or polypeptides may also be included in the fusion, for example immunoglobulin peptides, dimerising polypeptides, stabilizing polypeptides, amphiphilic peptides, or polypeptides which may comprise amino acid sequences that provide "tags" for targeting or purification of the protein.

[0048] In one embodiment, a chimeric enzyme may also be an enzyme in which the positioning, spacing or function of two endogenous domains has been changed, by manipulation, with respect to the wild-type enzyme. For example, a C-terminal extension in a class II E2 may be repositioned by adding or removing amino acids between it and the Ubc domain. Alternatively, the amino acid sequence of the C-terminal extension itself may be mutated, to introduce desired properties. Typically, such properties include the ability to bind ubiquitin.

[0049] A protein domain, as referred to herein, is a protein or fragment of a protein which is capable of independent folding to create a defined three-dimensional structure that imparts a property to the domain. Typically, the domain is identified by its amino acid sequence, usually by identifying certain limits in a protein structure which define the domain. Domains may be identified using domain databases such as, but not limited to, PFAM, PRODOM, PROSITE, BLOCKS, PRINTS, SBASE, ISREC PROFILES, SMART, and PROCLASS. It will be understood that the precise limits of the domain, as defined by the amino acid sequence, may vary. For example, including extra amino acids which are not normally considered to be part of the domain is unlikely to affect the function of the domain. The use of interdomain linkers is commonplace in the art to link protein domains, both in nature and in artificial protein constructs. Such linkers typically comprise sequences present upstream or downstream of the joined domains in their natural context. Moreover, removing one or more amino acids from one end of a domain may be permissible, as long as a substantial part of the domain remains which is still able to fold in the correct manner to mediate the desired function. In one embodiment, therefore, a domain is a minimal independently-folding segment of a protein which possesses the desired functional characteristic. In the case of the Ubc domain, this function is the polymerisation of ubiquitin using the desired lysine linkage. In the case of the UBD, the function is to promote the formation of free ubiquitin polymers.

[0050] In one embodiment, the entire sequence of a domain as defined by primary amino acid sequence is used. In another embodiment, a sequence shortened by 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20 or more amino acids at the N and/or C terminus may be used.

[0051] The present invention increases the amount of free polyubiquitin produced by E2 enzymes, making the polyubiquitin available for any desired purpose. In this context, the production of free polyubiquitin may be increased by 10%, 15%, 20%, 25%, 50%, 75%, 100% or more. Free polyubiquitin refers to polyubiquitin chains, for example dimers, trimers, tetramers or longer chains, released into solution by the E2 enzyme rather than attached to a target.

[0052] A "nucleic acid" is a polynucleotide such as deoxyribonucleic acid (DNA) or ribonucleic acid (RNA). The term is used to include single-stranded nucleic acids, double-stranded nucleic acids, and RNA and DNA made from nucleotide or nucleoside analogues.

[0053] Ubc (UBCc) domains of E2 proteins share a consensus sequence, 141 amino acids in length. Comparison of Ubc domains suggests the following consensus sequence:

TABLE-US-00001 [SEQ ID no 19] SKRLQKELKDLKKDPPSGIS AEPVEENLLEWHGTIR GPPDTPYEGGIFKLDIEFP EDYPFKPPKVRFVTKI YHPPNVDENG KICLSI LKTHGWSPAY TLRTVLLSLQSLLN EPNPSDPLNAEAAK LYKENREEFKKKAREWT.

The Ubc motif, [FYWLS]-H-[PC]-[NH]-[LIV]-x(3,4)-G-x-[LIV]-C-[LIV]-x-[LIV], is underlined [SEQ ID no 20]. Preferably, the Ubc domain used in the present invention conforms to the consensus sequence, allowing for conservative amino acid substitutions. Substitutions to the conserved sequence may also be made which reflect deviation from the consensus seen in naturally-occurring Ubc domains. Therefore, the Ubc domain used in the present invention may be naturally occurring or synthetic. Synthetic domains may be designed according to the above consensus and constraints.
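The motif above is written in PROSITE-style pattern syntax. As a minimal sketch, assuming that syntax, the pattern can be translated into a regular expression and used to scan a candidate sequence. The function name `prosite_to_regex` and the short test fragment are hypothetical and serve only as an illustration; the fragment is not one of the listed SEQ IDs.

```python
import re

# Ubc motif as given in the text (PROSITE-style syntax).
UBC_MOTIF = "[FYWLS]-H-[PC]-[NH]-[LIV]-x(3,4)-G-x-[LIV]-C-[LIV]-x-[LIV]"


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        # Separate an optional repeat suffix such as "(3,4)" from the element.
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = match.group(1), match.group(2)
        if core == "x":
            core = "."  # x means any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"  # {..} means any residue except these
        parts.append(core + ("{" + repeat + "}" if repeat else ""))
    return "".join(parts)


UBC_REGEX = prosite_to_regex(UBC_MOTIF)

# Hypothetical fragment used only to exercise the pattern.
example_fragment = "KIYHPNVDENGKICLSILKT"
hit = re.search(UBC_REGEX, example_fragment)
if hit:
    print(f"motif found at residues {hit.start() + 1}-{hit.end()}: {hit.group()}")
```

The same compiled expression could be applied to any candidate Ubc domain, natural or synthetic, to check whether it retains the motif.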

[0054] Naturally-occurring Ubc domains may be derived from proteins other than E2 enzymes.

[0055] Conservative amino acid substitutions generally follow the following scheme:

TABLE-US-00002
Side chain	Members
Hydrophobic	met, ala, val, leu, ile
Neutral hydrophilic	cys, ser, thr
Acidic	asp, glu
Basic	asn, gln, his, lys, arg
Residues that influence chain orientation	gly, pro
Aromatic	trp, tyr, phe

[0056] In the above table, amino acids identified in the same row are considered to have similar side-chains and may be substituted for each other with the least impact on protein structure and function.
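The substitution scheme can be expressed as a simple lookup. The sketch below encodes the groups exactly as tabulated above and reports whether two residues fall in the same row; the names `SIDE_CHAIN_GROUPS`, `side_chain_group` and `is_conservative` are hypothetical and chosen for illustration only.

```python
# Side-chain groups as listed in the table above (three-letter codes, lower case).
SIDE_CHAIN_GROUPS = {
    "hydrophobic": {"met", "ala", "val", "leu", "ile"},
    "neutral hydrophilic": {"cys", "ser", "thr"},
    "acidic": {"asp", "glu"},
    "basic": {"asn", "gln", "his", "lys", "arg"},
    "chain orientation": {"gly", "pro"},
    "aromatic": {"trp", "tyr", "phe"},
}


def side_chain_group(residue: str) -> str:
    """Return the name of the table row that contains the residue."""
    key = residue.lower()
    for group, members in SIDE_CHAIN_GROUPS.items():
        if key in members:
            return group
    raise ValueError(f"unknown residue: {residue!r}")


def is_conservative(original: str, replacement: str) -> bool:
    """True if both residues sit in the same row of the table."""
    return side_chain_group(original) == side_chain_group(replacement)


print(is_conservative("asp", "glu"))  # same row (acidic)
print(is_conservative("lys", "glu"))  # different rows
```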

[0057] A list of known E2 enzymes, identified by human gene names together with yeast homologue names, appears in Table S1 in the supplementary information supplied with Ye & Rape, 2009. In the context of the present invention, E2 enzymes may be selected from this list, and Ubc domains derived therefrom for use in constructing chimeric E2 enzymes.

[0058] For example, in order to improve the production of free Lys-11 conjugated polyubiquitin, UBE2C or UBE2S should be employed. In UBE2C the Ubc domain extends from residue 33 to residue 170 of the amino acid sequence.

[0059] In order to produce Lys-48 chains, UBE2G1, UBE2G2, UBE2K, UBE2R1 or UBE2R2 may be used. In UBE2G1, for example, the Ubc domain is located between residues 74 and 216 of the amino acid sequence.

[0060] Other E2 enzymes, and reported chain specificities, are set forth in Ye and Rape, 2009, as mentioned above.

[0061] Table 1 shows Seq IDs 1 to 13, which set forth exemplary nucleotide and amino acid sequences of human E2 enzymes, and identify the Ubc (UBCc) domains therein. Other sequences are available in databases, such as SWISSPROT, TrEMBL, NCBI, and the like.

TABLE-US-00003 TABLE 1
Name	SEQ ID	UBCc position
UBE2C	1	33-170
UBE2D1	2	4-142
UBE2D2	3	4-142
UBE2D3	4	4-142
UBE2E2	5	59-196
UBE2E3	6	65-202
UBE2F	7	35-180
UBE2J1	8	12-119
UBE2J2	9	14-127
UBE2M	10	33-166
UBE2N	11	5-144
UBE2O	12	958-1108
UBE2S	13	13-152
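The positions in Table 1 are 1-based, inclusive residue numbers. As an illustrative sketch, assuming a full-length E2 sequence is available as a plain string, the Ubc domain can be excised as follows. The helper names `extract_domain` and `domain_length` are hypothetical, and the ten-letter string in the usage line is a placeholder, not a real protein sequence.

```python
# UBCc positions copied from Table 1 (1-based, inclusive).
UBC_DOMAIN_POSITIONS = {
    "UBE2C": (33, 170), "UBE2D1": (4, 142), "UBE2D2": (4, 142),
    "UBE2D3": (4, 142), "UBE2E2": (59, 196), "UBE2E3": (65, 202),
    "UBE2F": (35, 180), "UBE2J1": (12, 119), "UBE2J2": (14, 127),
    "UBE2M": (33, 166), "UBE2N": (5, 144), "UBE2O": (958, 1108),
    "UBE2S": (13, 152),
}


def extract_domain(sequence: str, start: int, end: int) -> str:
    """Return residues start..end (1-based, inclusive) of a protein sequence."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("domain limits fall outside the sequence")
    return sequence[start - 1:end]


def domain_length(name: str) -> int:
    """Number of residues in the tabulated Ubc domain of the named E2."""
    start, end = UBC_DOMAIN_POSITIONS[name]
    return end - start + 1


# Placeholder sequence used only to show the indexing convention.
print(extract_domain("ABCDEFGHIJ", 3, 5))
print(domain_length("UBE2C"))
```

Shortened or extended variants of a domain, as contemplated above, correspond to shifting `start` or `end` by a few residues.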

[0062] Ubc domains may be obtained from the sequences set forth above, or other E2 sequences known in the art, and covalently linked to UBD domains to create a chimeric protein. Alternatively, nucleic acids encoding domains suitable for generating chimeric E2 enzymes may be produced, for example, by restriction enzyme digestion of nucleic acids encoding the desired E2 enzyme, or by PCR amplification of a desired nucleic acid sequence using primers that flank the Ubc domain. Nucleic acids encoding E2 enzymes are known in the art and sequences are therefore widely available in databases such as GENBANK. Restriction enzyme cutting sites and suitable primers may be identified using suitable software, or by eye.

[0063] The invention contemplates the use of natural Ubc domains that have been mutated. Mutation may be at the nucleic acid level, that is changes may be effected to the nucleic acid encoding a Ubc domain without changing the structure of the Ubc domain itself, as a result of redundancy in the genetic code. Such changes may, for example, confer improved expression in heterologous host cells by employing preferred codon usage patterns.

[0064] Other mutations will change the amino acid sequence of the Ubc domain. As noted above, this may take the form of additions to or deletions from the N and C termini of the domain. Moreover, changes may be made within the sequence of the Ubc domain, for example through substitution, addition or deletion of one or more amino acids. Conservative amino acid substitutions are preferred, as set forth above. For example, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20 or more amino acids are added, deleted and/or substituted by other amino acids.

[0065] In a preferred embodiment, the naturally occurring Ubc sequence is used.

[0066] Expression of nucleic acids encoding chimeric E2 enzymes may be carried out in any suitable expression system. Expression systems are known in the art and may be obtained commercially or according to instructions provided in laboratory manuals.

[0067] More than 20 families of UBD have been identified. The first UBD to be identified was from S5a, a proteasome subunit, and this sequence was used in bioinformatic analyses to identify further domains, which were then shown to be bona fide UBDs. A pattern was identified, known as the ubiquitin-interacting motif (UIM). A second motif, the Ubiquitin-Associated domain (UBA), was first identified as a domain common to proteins involved in ubiquitin metabolism. This domain too was shown to bind ubiquitin. Further domains have been discovered, including CUE domains, which are associated with Endoplasmic Reticulum targeting, and the zinc finger NZF or PAZ domains, VHS and GLUE domains.

[0068] The UEV domain is a pleckstrin homology UBD which resembles a Ubc domain, but lacks the catalytic cysteine residue. For a description of UBD domains, see Hicke et al., 2005, and Dikic et al., 2009, especially Table 1 in the latter document.

[0069] UBDs useful in the present invention may be obtained from naturally-occurring polypeptides, or may be mutated forms of domains present in such polypeptides. As noted above, mutant proteins may be created by inserting, deleting or substituting nucleic acid residues in a gene encoding the protein. The foregoing guidelines for mutation of Ubc domains may be applied to UBDs.

[0070] Zinc finger UBDs are known, for instance, in HDAC6, where the UBD is located between residues 1133 and 1204 (SEQ ID No. 14); in RABEX5, wherein A20-like ZnF and MIU UBDs are located between amino acids 1 and 74 (SEQ ID No. 15); in NPL4, where it is between positions 104 and 246 (SEQ ID No. 16); in TAB2, where it lies between residues 663 and 693 (SEQ ID No. 17); and in IsoT, where it lies between residues 173 and 289 (SEQ ID No. 18).

[0071] When selecting a UBD for fusing to a Ubc, it is preferred that the lysine specificity of the Ubc should be compatible with the binding of the UBD to ubiquitin. For example, if Lys-11 is the preferred linkage residue of the Ubc, the UBD preferably binds ubiquitin in such a manner as to leave Lys-11 accessible for chain extension with ubiquitin molecules.

[0072] A chimeric protein in accordance with the invention may comprise a Ubc domain fused to a UBD. The UBD is preferably C-terminal in the fusion, although N-terminal fusions are contemplated. Fusions may be created by covalent linkage of polypeptide domains, or ligation of nucleic acids encoding such domains in the form of restriction fragments, amplification fragments or both. Moreover, synthetic nucleic acids may be used to create synthetic or partially synthetic nucleic acids encoding a fusion protein in accordance with the invention.

[0073] Fusions useful in the present invention include UBE2S and UBE2C fusions, for the production of Lys-11 linked polyubiquitin. The Ubc domains of UBE2S and UBE2C may be ligated to UBDs from a variety of proteins. For example, zinc finger UBDs may be used, such as the domains derived from polymerase-h or polymerase-k, Tax1BP1, NPL4, Vps36, TAB2, TAB3, RABEX5, A20, IsopeptidaseT (IsoT) and HDAC6.

[0074] Preferred combinations include the Ubc of UBE2S and the ZnF UBP domain of IsopeptidaseT, as well as the Ubc of UBE2S and the NZF of TAB2.

[0075] For example, the engineered UBE2S-UBD fusion protein is constructed making use of a naturally occurring NcoI restriction site in the human UBE2S sequence just before the Lys-rich tail (residue 196), and cloned into a vector such as pGEX6P1 (Amersham). The IsoT (USP5) ZnF UBP domain (residues 173-289) is amplified from cDNA with primers

TABLE-US-00004 UBP-FW [SEQ ID No 21] 5'-CCAAGGTTCCATGGTACGGCAGGTGTCTAAGCATGCC-3' and UBP-RV [SEQ ID No 22] 5'-GCCTAGCGGCCGCTTATGTCTTCTGCATCTTCAGCATGTCGATG-3'.

The amplified fragment is ligated into the NcoI/NotI restriction sites present in the pGEX6P1-UBE2S expression plasmid. The protein is expressed in E. coli and purified.

[0076] The TAB2 NZF domain (Amino Acids 663-693; Nucleotides 1988-2079+STOP) is amplified using primers NZFfus663FW: CCAAGGTTCCATGGATGAGGGAGCTCAGTGGAATTG [SEQ ID No 23] and NZFfus693RV: GCCTAGCGGCCGCTTATCAGAAATGCCTTGGCATCTC [SEQ ID No 24]. As with the ZnF UBP domain, the amplified fragment is restriction digested and ligated into the NcoI/NotI restriction sites present in the pGEX6P1-UBE2S expression plasmid.
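Before ordering primers of this kind, it can be useful to confirm that each one carries the intended cloning site. The sketch below checks the four primers given above for the NcoI (CCATGG) and NotI (GCGGCCGC) recognition sequences. The primer sequences are written without internal hyphens, on the assumption that any such hyphens in the printed text are line-wrap artifacts; the function name `sites_in` is hypothetical.

```python
# Recognition sequences of the two enzymes used for cloning.
RESTRICTION_SITES = {"NcoI": "CCATGG", "NotI": "GCGGCCGC"}

# Primers as given in the text (5' to 3').
PRIMERS = {
    "UBP-FW": "CCAAGGTTCCATGGTACGGCAGGTGTCTAAGCATGCC",
    "UBP-RV": "GCCTAGCGGCCGCTTATGTCTTCTGCATCTTCAGCATGTCGATG",
    "NZFfus663FW": "CCAAGGTTCCATGGATGAGGGAGCTCAGTGGAATTG",
    "NZFfus693RV": "GCCTAGCGGCCGCTTATCAGAAATGCCTTGGCATCTC",
}


def sites_in(primer: str) -> list:
    """Return the names of the restriction sites found in a primer."""
    sequence = primer.upper()
    return [name for name, site in RESTRICTION_SITES.items() if site in sequence]


for name, sequence in PRIMERS.items():
    print(name, sites_in(sequence))
```

In this check both forward primers carry only the NcoI site and both reverse primers carry only the NotI site, consistent with directional cloning into the NcoI/NotI sites of the expression plasmid.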

[0077] A similar technique may be employed for making Ubc-UBD fusions of choice.

[0078] A wide variety of expression systems are available for the production of chimeric polypeptides. For example, expression systems of both prokaryotic and eukaryotic origin may be used for the production of E2 fusion proteins.

[0079] Nucleic acid vectors are commonly used for protein expression. The term "vector" refers to a nucleic acid molecule that may be used to transport a second nucleic acid molecule into a cell, and/or express it therein. In one embodiment, the vector allows for replication of DNA sequences inserted into the vector. The vector may comprise a promoter to enhance expression of the nucleic acid molecule in at least some host cells. Vectors may replicate autonomously (extrachromosomal) or may be integrated into a host cell chromosome. In one embodiment, the vector may comprise an expression vector capable of producing a fusion protein derived from at least part of a nucleic acid sequence inserted into the vector.

[0080] A cloning vector may be a nucleic acid molecule, such as a plasmid, cosmid, or bacteriophage, that has the capability of replicating autonomously in a host cell. Cloning vectors typically contain one or a small number of restriction endonuclease recognition sites that allow insertion of a nucleic acid molecule in a determinable fashion without loss of an essential biological function of the vector, as well as nucleotide sequences encoding a marker gene that is suitable for use in the identification and selection of cells transformed with the cloning vector. Marker genes typically include genes that provide tetracycline resistance or ampicillin resistance.

[0081] An expression vector typically may comprise a transcription promoter, a gene, and a transcription terminator. Expression vectors may be autonomously replicating, or integrated into the host genome. Gene expression is usually placed under the control of a promoter, and such a gene is said to be operably linked to the promoter. Similarly, a regulatory element and a core promoter are operably linked if the regulatory element modulates the activity of the core promoter. The nucleic acid encoding the chimeric enzyme according to the invention is typically expressed under the control of a promoter in an expression vector.

[0082] To express a gene, a nucleic acid molecule encoding the protein must be operably linked to regulatory sequences that control transcriptional expression and then introduced into a host cell. In addition to transcriptional regulatory sequences, such as promoters and enhancers, expression vectors may include translational regulatory sequences. The sequences used will be appropriate to the host, which may be prokaryotic or eukaryotic. The transcriptional and translational regulatory signals suitable for a mammalian host may be derived from viral sources, such as adenovirus, bovine papilloma virus, simian virus, or the like, in which the regulatory signals are associated with a particular gene that has a high level of expression. Suitable transcriptional and translational regulatory sequences also may be obtained from mammalian genes, such as actin, collagen, myosin, and metallothionein genes. Prokaryotic regulatory sequences may similarly be derived from viral genes, and are known in the art.

[0083] The inclusion of an affinity tag is useful for the identification or selection of cells expressing the fusion protein. Examples of affinity tags include polyHistidine tags (which have an affinity for nickel-chelating resin), c-myc tags, which are detected with anti-myc antibodies, calmodulin binding protein (isolated with calmodulin affinity chromatography), substance P, the RYIRS tag (which binds with anti-RYIRS antibodies), a hemagglutinin A epitope tag, which is detected with an antibody, the Glu-Glu tag, and the FLAG tag (which binds with anti-FLAG antibodies). Nucleic acid molecules encoding such peptide tags are available, for example, from Sigma-Aldrich Corporation (St. Louis, Mo., USA).

[0084] The gram-negative bacterium E. coli is widely used as a host for heterologous gene expression. Although large amounts of heterologous protein may accumulate inside the cell, this expression system is effective in the context of the present invention. Suitable strains of E. coli include BL21(DE3), BL21(DE3)pLysS, BL21(DE3)pLysE, DH1, DH4I, DH5, DH5I, DH5IF', DH5IMCR, DH10B, DH10B/p3, DH11S, C600, HB101, JM101, JM105, JM109, JM110, K38, RR1, Y1088, Y1089, CSH18, ER1451, and ER1647.

[0085] Bacteria from the genus Bacillus are also suitable as heterologous hosts, and have the capability to secrete proteins into the culture medium. Other bacteria suitable as hosts are those from the genera Streptomyces and Pseudomonas. Suitable strains of Bacillus subtilis include BR151, YB886, MI119, MI120, and B170 (see, for example, Hardy, "Bacillus Cloning Methods," in DNA Cloning: A Practical Approach, Glover (ed.) (IRL Press 1985)). Standard techniques for propagating vectors in prokaryotic hosts are well-known to those of skill in the art (see, for example, Ausubel 1995; Wu et al., Methods in Gene Biotechnology (CRC Press, Inc. 1997)).

[0086] Eukaryotic hosts such as yeasts or other fungi may be used. In general, yeast cells are preferred over fungal cells because they are easier to manipulate. However, some proteins are either poorly secreted from the yeast cell, or in some cases are not processed properly (e.g. hyperglycosylation in yeast). In these instances, a different fungal host organism should be selected.

[0087] The use of suitable eukaryotic host cells--such as yeast, fungal and plant host cells--may provide for post-translational modifications (e.g. myristoylation, glycosylation, truncation, lipidation and tyrosine, serine or threonine phosphorylation) as may be needed to confer optimal biological activity on recombinant expression products.

[0088] In some embodiments, the fusion proteins may be expressed as GST fusions. For example, the pGEX vector system employs a GST fusion. Use of GST as a fusion partner provides an inducible expression system which facilitates the production of proteins in the E. coli system. Proteins expressed using this system may be isolated using a glutathione capture resin.

[0089] For example, recombinant GST-UBE2S constructs are expressed in Rosetta 2 (DE3) pLacI cells (Novagen). 1 L cultures of cells are induced at OD.sub.600 of 0.6 with 250 .mu.M IPTG and proteins are expressed at 20.degree. C. overnight. Cells are harvested and flash-frozen. 30 ml lysis buffer containing 270 mM sucrose, 50 mM Tris (pH 8.0), 50 mM NaF, 1 protease inhibitor cocktail tablet (Roche), 0.1% v/v .beta.-mercaptoethanol, 1 mg/ml lysozyme and 0.1 mg/ml DNase is added per liter of culture. After sonication, cell lysates are cleared using a Sorvall SS-34 rotor (18,000 rpm, 30 min, 4.degree. C.) and supernatants are incubated with Glutathione Sepharose 4B (GE Healthcare) for 1 h to immobilize soluble GST fusion proteins. Subsequently, the sepharose beads are washed with 500 ml high salt buffer [500 mM NaCl, 25 mM Tris (pH 8.5), 5 mM DTT] and 300 ml low salt buffer [150 mM NaCl, 25 mM Tris (pH 8.5), 5 mM DTT]. For site-specific cleavage of the GST tag, immobilized fusion proteins are incubated with 30 mM PreScission protease (GE Healthcare) overnight. Cleaved proteins are eluted with low salt buffer and flash-frozen in liquid nitrogen. All samples are >95% pure after purification.

[0090] The chimeric E2 enzymes of the invention produce enhanced levels of free polyubiquitin, compared to naturally occurring E2. Assays for ubiquitination are known in the art; a description of such assays, and relevant background, is set forth, for example, in WO2009134897, US2006088901 and WO2004020674. Ubiquitination assay kits are available commercially, for instance from Cisbio, Bedford, Mass., USA; Invitrogen, Carlsbad, Calif., USA; and Enzo Lifesciences, Plymouth Meeting, Pa., USA.

[0091] In general, an assay for the production of free ubiquitin requires the incubation of E1 enzyme, the chimeric E2 according to the invention and monomeric ubiquitin in the presence of ATP in a buffer solution.

[0092] E1 enzymes are available commercially, for instance from Enzo Lifesciences. A list of E1 enzymes is set forth in Table 1 of WO2004020674.

[0093] In one embodiment, ubiquitin may be labelled, to facilitate its subsequent detection or isolation.

[0094] In one embodiment, 30 .mu.l reactions may be carried out at 37.degree. C. containing 25 ng ubiquitin-activating enzyme (E1), 2 .mu.g ubiquitin conjugating enzyme (E2), 5 .mu.g ubiquitin, 10 mM ATP, 40 mM Tris (pH 7.5), 10 mM MgCl.sub.2 and 0.6 mM DTT. After 1 h the reaction is stopped by addition of 10 .mu.l 4.times.LDS sample buffer (Invitrogen), resolved by SDS-PAGE on 4-12% precast gels and subjected to Western analysis using rabbit polyclonal anti-ubiquitin antibody (Upstate).
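The reaction components above are given by mass, whereas Example 1 quotes molar concentrations for a reaction of the same volume. A minimal sketch of the conversion follows. The molecular mass used for ubiquitin (about 8565 Da) is an assumption not stated in this document, and the function name `micromolar` is hypothetical.

```python
# Assumption: approximate molecular mass of ubiquitin in daltons (g/mol).
UBIQUITIN_MW_DA = 8565.0


def micromolar(mass_ug: float, mw_da: float, volume_ul: float) -> float:
    """Convert a mass in micrograms, dissolved in a volume in microliters,
    to a concentration in micromolar."""
    moles = mass_ug * 1e-6 / mw_da   # grams divided by g/mol
    litres = volume_ul * 1e-6
    return moles / litres * 1e6      # mol/L converted to micromol/L


# 5 ug ubiquitin in a 30 ul reaction.
print(f"{micromolar(5.0, UBIQUITIN_MW_DA, 30.0):.1f} uM ubiquitin")
```

Under this assumption, 5 .mu.g ubiquitin in 30 .mu.l corresponds to roughly 19.5 .mu.M, in line with the ubiquitin concentration quoted in Example 1.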

[0095] The scale of the reactions may be increased, if necessary. Performing the reaction with naturally occurring E2S does not result in the formation of significant amounts of polyubiquitin. However, using a chimeric E2 according to the invention, polyubiquitin chains may be isolated and purified.

[0096] In one embodiment, ubiquitin dimers are synthesized by incubating 16 .mu.g E1 enzyme, 100 .mu.g UBE2S.DELTA.C, 12.5 mg ubiquitin, 10 mM ATP, 40 mM Tris (pH 7.5), 10 mM MgCl.sub.2 and 0.6 mM DTT at 37.degree. C. overnight. Subsequently, 50 mM DTT is added to the reaction before further dilution with 14 ml of 50 mM ammonium acetate (pH 4.5).

[0097] K11-linked diubiquitin may be purified by cation exchange using a MonoS column (GE Healthcare) and concentrated to 5 mg/ml. Crystals are formed after 1 day from 3 M NaCl and 0.1 M citric acid (pH 3.5). Crystals may be soaked in mother liquor containing 15% ethylene glycol before freezing in liquid nitrogen.

[0098] Although the present invention and its advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined in the appended claims.

[0099] The present invention will be further illustrated in the following Examples which are given for illustration purposes only and are not intended to limit the invention in any way.

EXAMPLES

Example 1

Analysis of E2 Enzymes Involved in K11 Chain Formation

[0100] For the assembly of free K48- and K63-linked ubiquitin chains, specific E2 enzymes have been described, and the biology of these posttranslational modifications is now known in great detail. In order to study the elusive K11 linkage, Applicants analyzed the in vitro properties of two human E2 conjugating enzymes that have been associated with this chain type, namely UBE2C/UbcH10 (Jin et al., 2008) and UBE2S/E2-EPF (Baboshina & Haas, 1996). Applicants tested whether UBE2S and UBE2C would assemble unattached polyubiquitin chains in vitro in the absence of an E3 ligase. Analytical assays were carried out in 30 .mu.l reactions at 37.degree. C. containing 250 nM ubiquitin-activating enzyme (E1), 2.8 .mu.M (UBE2S) or 3.4 .mu.M (UBE2C) ubiquitin conjugating enzyme (E2), 19.5 .mu.M ubiquitin, 10 mM ATP, 40 mM Tris (pH 7.5), 10 mM MgCl.sub.2 and 0.6 mM DTT. After 1 h the reaction was stopped by addition of 10 .mu.l 4.times.LDS sample buffer (Invitrogen), resolved by SDS-PAGE on 4-12% precast gels and subjected to Western analysis using rabbit polyclonal anti-ubiquitin antibody (Upstate). Applicants found that UBE2S generated small amounts of free diubiquitin, as judged by the appearance of ubiquitin dimers on reducing SDS PAGE gels (FIG. 1a), while UBE2C did not assemble unattached ubiquitin chains (FIG. 1b). UBE2S, but not UBE2C, also underwent autoubiquitination, resulting in the appearance of high molecular weight species of UBE2S (FIG. 1a). Linkage type analysis using single-Lys ubiquitin mutants (K6-only, K11-only etc.) revealed that UBE2S assembled K11 linkages specifically (FIG. 1a), since ubiquitin dimers as well as high molecular weight forms of UBE2S were only observed with the K11-only ubiquitin mutant (FIG. 1a). UBE2S autoubiquitinated several of its 17 Lys residues; however, with ubiquitin mutants lacking K11, these monoubiquitin modifications were not extended (FIG. 1a, c), and autoubiquitination with Lysless (K0) and K63-only ubiquitin followed similar kinetics, resulting in 6-7 distinct multi-monoubiquitinated bands of UBE2S (FIG. 1c). To verify that UBE2S was K11-specific also with wild-type ubiquitin, Applicants performed LC-MS/MS analysis of trypsinized diubiquitin.

[0101] LC-MS/MS was carried out by nanoflow reverse phase liquid chromatography (using a U3000 from Dionex) coupled online to a Linear Ion Trap (LTQ)-Orbitrap XL mass spectrometer (Thermo Scientific). Briefly, the LC separation was performed using a C18 PepMap capillary column (75 .mu.m ID.times.150 mm; Dionex) and the peptides were eluted using a linear gradient from 5% B to 50% B over 40 minutes at a flow rate of 200 nL/min (solvent A: 98% H.sub.2O; 2% acetonitrile in 0.1% formic acid; solvent B: 90% acetonitrile in 0.1% formic acid). The eluted peptides were electrosprayed into the mass spectrometer via a nanoelectrospray source fitted with a PicoTip emitter (New Objective). A cycle of one full FT scan mass spectrum (350-2000 m/z, resolution of 60 000 at m/z 400) was followed by 6 data-dependent MS/MS acquired in the linear ion trap with normalized collision energy (setting of 35%). Target ions already selected for MS/MS were dynamically excluded for 60 s. Peptides were identified from MS/MS spectra by searching against a Swissprot database using the Mascot search algorithm (matrixscience.com) and Proteome Discoverer (Thermo Fisher Scientific). Oxidation of methionine, GlyGly and LeuArgGlyGly addition on Lysine residues were used as variable modifications. Initial mass tolerance was set to 10 ppm for peptide parent mass, 0.8 Da for fragment masses and enzyme restriction was set to trypsin specificity with 2 missed cleavages.
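The 10 ppm parent mass tolerance quoted above scales with the mass being searched. As an illustrative sketch, the acceptance window for a given m/z can be computed as follows; the function names `ppm_window` and `within_tolerance` are hypothetical and are not part of Mascot or Proteome Discoverer.

```python
def ppm_window(mz: float, tolerance_ppm: float) -> tuple:
    """Return the (lower, upper) m/z bounds for a tolerance given in ppm."""
    delta = mz * tolerance_ppm / 1e6
    return mz - delta, mz + delta


def within_tolerance(observed: float, theoretical: float,
                     tolerance_ppm: float = 10.0) -> bool:
    """True if the observed m/z lies within the ppm window of the theoretical m/z."""
    lower, upper = ppm_window(theoretical, tolerance_ppm)
    return lower <= observed <= upper


# At m/z 1000 a 10 ppm tolerance corresponds to +/- 0.01.
lower, upper = ppm_window(1000.0, 10.0)
print(f"{lower:.3f} to {upper:.3f}")
```

By contrast, the 0.8 Da fragment tolerance is an absolute window and does not scale with m/z.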

[0102] Applicants detected peptides derived from K11-linked diubiquitin, and with significantly less intensity also from K48- and K63-linked diubiquitin. Other linkages were not detected. Applicants focused on UBE2S and set out to harness its capability to produce free K11-linked ubiquitin chains.

[0103] Human UBE2S may comprise 222 residues with an N-terminal conserved catalytic Ubc domain spanning residues 1-156. The C-terminal 25 residues of UBE2S encompass nine Lys residues that are conserved in UBE2S homologs (ensembl.org), while the remaining 40 residues form a non-conserved Lys-free linker (FIG. 2a). Mutation of the catalytic Cys residue in the Ubc domain to Ala (UBE2S.sup.C95A) rendered UBE2S inactive, while mutation to Ser (UBE2S.sup.C95S) acted as a ubiquitin-trapping mutant, in which the Ser residue was still charged with ubiquitin by the E1 enzyme, but failed to discharge efficiently, similar to what has been reported for UBE2N/Ubc13 (FIG. 2a).

[0104] Autoubiquitination of UBE2S occurred in cis, as wild-type UBE2S was unable to ubiquitinate GST-tagged UBE2S.sup.C95A in trans, despite being able to autoubiquitinate itself (FIG. 2b). The autoubiquitination of UBE2S appeared to be favored compared to formation of free K11-linked chains, and free chain production is inefficient. The Lys-rich tail of UBE2S is a likely target for autoubiquitination. Removal of the last 25 residues (UBE2S.DELTA.C) reduced autoubiquitination (FIG. 2c), increased formation of free diubiquitin (data not shown), and the enzyme remained specific for K11 linkages (FIG. 2c). From 25 mg input ubiquitin, .about.1 mg K11-linked diubiquitin could be purified by cation exchange (FIG. 2d).

Example 2

Generation of K11-Linked Ubiquitin Tetramers

[0105] In order to increase the yields of K11-linked dimers and to obtain longer polymers, Applicants turned to protein engineering to create a UBE2S variant with increased capability to form free ubiquitin chains. Having established that the Lys-rich tail of UBE2S is polyubiquitinated by UBE2S in a cis reaction, Applicants replaced this tail (residues 196-222) with the ubiquitin binding ZnF-UBP domain of human USP5/IsoT (residues 173-289) (Reyes-Turcu et al., 2006; FIG. 3a). This UBD has two advantageous features: it binds ubiquitin with nanomolar affinity, and it interacts with the free C-terminal tail of ubiquitin, leaving the Lys11 side chain accessible for chain elongation. The UBE2S-UBD fusion protein was significantly more efficient in producing ubiquitin dimers, trimers, and tetramers.

[0106] Ubiquitin tetramers were synthesized by incubating 250 nM E1 enzyme, 4.8 .mu.M UBE2S-UBD, 2.9 mM ubiquitin, 400 nM AMSH, 10 mM ATP, 40 mM Tris (pH 7.5), 10 mM MgCl.sub.2 and 0.6 mM DTT in a 1 ml reaction at 37.degree. C. After 1.5 hours 400 nM AMSH was added again to counteract the formation of K63-linked ubiquitin chains. After 3 hours, 50 mM DTT was added to the reaction before further dilution with 14 ml of 50 mM ammonium acetate (pH 4.5). K11-linked di-, tri- and tetraubiquitin were purified by cation exchange using a MonoS column (GE Healthcare) (FIG. 3). It was also possible to use K11-linked diubiquitin as input material to obtain tetraubiquitin.

[0107] Specificity analysis showed that UBE2S-UBD also incorporated K63-linkages in these oligomers (see K63-only mutant in FIG. 3a). Two distinct trimer bands were observed in reactions using wild-type ubiquitin, but not with the K11-only ubiquitin mutant, indicating alternating or branched linkages with wild-type ubiquitin, since differently linked ubiquitin chains have distinct electrophoretic mobility (FIG. 3a, b). Two linkage types (K11 and K63) in the wild-type ubiquitin reaction were further confirmed by LC-MS/MS analysis.

[0108] Formation of K63-linkages could be counteracted by either using the ubiquitin K63R mutant, or by incubation with the K63-specific deubiquitinase AMSH (McCullough et al., 2004; FIG. 3b). Indeed, AMSH removed only the faster migrating of the two triubiquitin bands, showing that a chain with alternate linkages had been created by UBE2S-UBD (FIG. 3b). When Applicants included AMSH directly in the assembly reactions, Applicants were able to remove the contaminating K63 linkages in situ (FIG. 3b). This protocol allowed large scale generation and purification of K11-linked di-, tri- and tetraubiquitin (FIG. 3c, d, e) with improved yields. Almost 50% of the input ubiquitin was converted into K11-linked oligomers using UBE2S-UBD, while UBE2S.DELTA.C only assembled 15% of input ubiquitin into K11-linked dimers (FIG. 3b, compare integrated peak area in FIGS. 2d and 3d).

Example 3

Structure of K11-Linked Polyubiquitin

[0109] Generation of K11-linked ubiquitin chains in large quantities allowed detailed structural analysis of this chain type.

[0110] Large-scale ubiquitin chain assembly was carried out in 1 ml reactions. Ubiquitin dimers were synthesized by incubating 250 nM E1 enzyme, 4.8 .mu.M UBE2S.DELTA.C, 1.5 mM ubiquitin, 10 mM ATP, 40 mM Tris (pH 7.5), 10 mM MgCl.sub.2 and 0.6 mM DTT at 37.degree. C. overnight. Subsequently, 50 mM DTT was added to the reaction before further dilution with 14 ml of 50 mM ammonium acetate (pH 4.5) to precipitate enzymes. The solution was filtered through a 0.2 .mu.m syringe filter and K11-linked diubiquitin was purified by cation exchange using a MonoS column (GE Healthcare) and concentrated to 5 mg/ml. Crystals formed after 1 day from 3 M NaCl and 0.1 M citric acid (pH 3.5). Before freezing in liquid nitrogen, crystals were soaked in mother liquor containing 15% ethylene glycol.

[0111] Diffraction data on crystals of K11-linked diubiquitin were collected on ESRF beamline ID14-EH2 (Grenoble). The crystals diffracted to a maximum resolution of 2.2 .ANG. and displayed an orthorhombic space group that Pointless (Evans et al., 2006) identified to be most likely P222.sub.1. The structure was solved by molecular replacement in MolRep (Vagin & Teplyakov, 2000), which identified 12 ubiquitin molecules using monoubiquitin as a search model. The 12 molecules were related by translational symmetry, and formed two equivalent tetraubiquitin complexes with linkage ambiguity (FIG. 5), and another two diubiquitin molecules in which a two-fold axis generated the remaining dimers to form similar tetrameric assemblies. The structure was built in Coot (Emsley & Cowtan, 2004) from the molecular replacement model, and refined in Phenix (Adams et al., 2002) using NCS, simulated annealing (initially) and TLS B-factor refinement at later stages of the refinement. Restraints for the isopeptide linkage were generated using phenix.elbow. Data collection and refinement statistics may be found in Table 2.

TABLE-US-00005 TABLE 2 Data collection and refinement statistics
	K11-linked diubiquitin
Data collection
Space group	P222.sub.1
Cell dimensions
a, b, c (.ANG.)	79.23, 79.96, 221.23
.alpha., .beta., .gamma. (.degree.)	90, 90, 90
Resolution (.ANG.)	24.92-2.20 (2.32-2.20)*
R.sub.sym or R.sub.merge	0.106 (0.489)
I/.sigma.I	6.0 (2.0)
Completeness (%)	98.3 (99.7)
Redundancy	3.1 (3.0)
Refinement
Resolution (.ANG.)	24.92-2.20
No. reflections	65986
R.sub.work/R.sub.free	0.205/0.252
No. atoms
Protein	7255 (12 ubiquitin molecules)
Ligand/ion	111
Water	654
B-factors
Protein	30.1
Ligand/ion	41.8
Water	34.4
R.m.s. deviations
Bond lengths (.ANG.)	0.005
Bond angles (.degree.)	0.978
*Values in parentheses are for highest-resolution shell.

[0112] K11-linked diubiquitin adopts a compact conformation distinct from any other ubiquitin chain structure observed to date (FIG. 4a). Contacts between ubiquitin moieties are entirely polar and do not involve the hydrophobic ubiquitin surface patch (Ile44, Leu8, Val70), which is the most common ubiquitin interaction site (FIG. 4b). The interface instead forms between a surface centered on Glu24 of the distal ubiquitin, and a surface around Lys29 and Lys33 of the proximal ubiquitin. Several direct and water-mediated interactions are formed, including Arg74.sup.dist-Glu34.sup.prox, backbone (bb), Arg72.sup.dist-Glu34.sup.prox, bb, Asp39.sup.dist bb-Asp32.sup.prox, bb (FIG. 4c). The crystal structure was obtained at pH 3.5 in presence of 3 M sodium chloride. These conditions may mask additional charged interactions, for example Lys33.sup.prox-Asp52.sup.dist, which are in close proximity but do not seem to interact in any of the dimer interfaces.

[0113] A striking feature of the crystal structure is the exposed location of the ubiquitin hydrophobic patch (FIG. 4b). In the crystal, eight of the twelve hydrophobic patches are not involved in crystal contacts but point towards solvent channels. Furthermore, the hydrophobic patch is extended by Leu71 and Leu73 from the C-terminal tail of ubiquitin (FIG. 4d). Since Arg72 and Arg74 are integral residues of the polar K11-diubiquitin interface, Leu71/Leu73 point outwards and are restrained unlike in monoubiquitin where the C-terminal tail is more mobile. Leu71/Leu73 therefore effectively increase the hydrophobic Ile44-surface (FIG. 4c, d). With this larger hydrophobic patch, interaction of K11-linked chains with UBDs is likely to result in new interaction modes. In particular, proteins with tandem UBDs may be well suited to interact with adjacent hydrophobic patches in K11-linked polyubiquitin. Alternatively, novel classes of UBDs may recognize the unique structural features of K11-linked chains.

[0114] Ubiquitin chains are dynamic entities and may adopt multiple conformations in solution. The solution properties of K11-linked ubiquitin chains were studied with Nuclear Magnetic Resonance (NMR) spectroscopy.

[0115] .sup.13C, .sup.15N-labeled ubiquitin K63R or K11R mutant was expressed from a pET17b plasmid in Rosetta2 (DE3) pLacI cells. A 100 ml overnight culture grown in LB medium was pelleted and resuspended in modified K-MOPS minimal media (Neidhardt et al., 1974), lacking nitrogen and carbon sources. This was used to inoculate 3 L modified K-MOPS media supplemented with .sup.13C glucose/.sup.15N ammonium chloride. Protein expression was induced after 16 hrs growth at 30.degree. C. with 0.4 mM IPTG, and cells were harvested after a further 4 hrs. Mutant ubiquitin was purified according to Pickart & Raasi, 2005. To obtain only distally labeled K11-linked diubiquitin, wild-type ubiquitin was mixed with .sup.13C, .sup.15N-labeled ubiquitin K11R mutant in a 1:2 ratio in a chain assembly reaction. Prior to data acquisition, samples were dialyzed either against phosphate buffered saline (150 mM NaCl, 18 mM Na.sub.2HPO.sub.4, 7 mM NaH.sub.2PO.sub.4.times.2H.sub.2O, pH 7.4) or against 150 mM NaCl, 50 mM NH.sub.4Ac (pH 3.5) in 3 kDa cut-off Slide-A-Lyzer dialysis cassettes (Thermo Scientific).

[0116] NMR experiments were acquired on Bruker DRX600 MHz and AV2+ 700 MHz spectrometers equipped with cryogenic triple resonance TCI probes and at a temperature of 298K; all data were processed in Topspin 2.1 (Bruker, Karlsruhe) and analyzed in Sparky (Goddard & Kneller, UCSF). Weighted chemical shift perturbations were measured in .sup.15N fast HSQC experiments (Mori et al., 1995) and defined as ((.DELTA..sup.1H).sup.2).sup.0.5+((.DELTA..sup.15N/5).sup.2).sup.0.5 [ppm] (Hajduk et al., 1997). Standard triple resonance experiments (HNCACB, CBCA(CO)NH and HNCA) were used to assign all mono- and di-ubiquitin K63R or K11R species and confirm the identity of shifted and doubled resonances.
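The weighted chemical shift perturbation defined in the paragraph above is the sum of the absolute .sup.1H shift difference and one fifth of the absolute .sup.15N shift difference. A minimal sketch of that calculation follows. The function names, the dictionary layout, and the peak positions are hypothetical illustrations and are not data from this application.

```python
# Minimal sketch of the weighted chemical shift perturbation described above:
# ((d1H)^2)^0.5 + ((d15N / 5)^2)^0.5, i.e. |d1H| + |d15N / 5|, in ppm.
# All peak positions below are made-up illustrative values.
from typing import Dict, Tuple

Shift = Tuple[float, float]  # (1H position in ppm, 15N position in ppm)


def weighted_csp(delta_h_ppm: float, delta_n_ppm: float, n_scale: float = 5.0) -> float:
    """Weighted perturbation in ppm for one backbone amide resonance."""
    return abs(delta_h_ppm) + abs(delta_n_ppm / n_scale)


def perturbation_map(reference: Dict[str, Shift], sample: Dict[str, Shift]) -> Dict[str, float]:
    """Per-residue perturbation of `sample` relative to `reference`."""
    result = {}
    for residue, (ref_h, ref_n) in reference.items():
        if residue not in sample:
            continue  # skip peaks missing from the sample, e.g. exchange broadened
        h, n = sample[residue]
        result[residue] = weighted_csp(h - ref_h, n - ref_n)
    return result


# Hypothetical monoubiquitin and diubiquitin peak lists.
mono = {"Lys11": (8.10, 121.50), "Ile44": (9.05, 122.30)}
di = {"Lys11": (8.32, 123.00), "Ile44": (9.06, 122.35)}

csp = perturbation_map(mono, di)
for residue, value in csp.items():
    print(f"{residue}: {value:.2f} ppm")
```

Residues absent from the second peak list are skipped rather than assigned a value of zero, so that missing or broadened peaks are not mistaken for unperturbed ones.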

[0117] .sup.1H, .sup.15N-heteronuclear correlation spectra (HSQC) provide a fingerprint of the local environment of individual residues. These so-called chemical shifts report on the resonance frequencies of all backbone amide protons and nitrogens, and chemical shift perturbations as a consequence of e.g. the formation of an interface are highly specific.

[0118] Applicants assembled uniformly labeled K11 diubiquitin from .sup.13C, .sup.15N-labeled K63R mutant ubiquitin. To subsequently deconvolute the contribution from both parts of the interface, in a second species only the distal moiety of K11-linked diubiquitin was .sup.13C, .sup.15N-labeled. To achieve this, assembly reactions with .sup.13C, .sup.15N-labeled K11R and unlabeled wild-type ubiquitin were performed, in which the K11R mutant serves as a distal chain terminator. To minimize buffer effects, the two labeled diubiquitin species, as well as labeled K63R and K11R monoubiquitin (all at 100 .mu.M), were dialyzed simultaneously against neutral (pH 7.4) or acidic (pH 3.5) buffer also containing 150 mM NaCl to mask nonspecific interactions. Relaxation experiments and measurements at different concentrations confirmed monodispersity, and allowed aggregation effects to be excluded for all species at the chosen experimental conditions. Applicants assigned and confirmed the chemical shift positions in all species with standard triple resonance experiments (Supp. FIG. 4a). To generate chemical shift perturbation maps, Applicants compared uniformly labeled K11-linked diubiquitin to K63R monoubiquitin, and distally labeled K11-linked diubiquitin to K11R monoubiquitin. To assess the effect of the K63R and K11R mutations, Applicants compared the labeled monoubiquitin species and found perturbation differences of <0.1 ppm, with the exception of the flexible loop region in ubiquitin near K11, which is slightly more perturbed.

[0119] Immediately apparent was the doubling of a defined subset of resonances in the spectra of uniformly labeled K11-linked diubiquitin, associated with the formation of a non-symmetric interface (FIG. 5a). As expected, the resonances for Lys11 and Gly76 involved in the K11-linkage were significantly shifted compared to monoubiquitin (FIG. 5b). The chemical shift perturbation map of this species contains contributions of both sides of the interface (FIG. 5b). To deconvolute the individual contributions, Applicants analyzed chemical shift perturbations of distally labeled K11-linked diubiquitin in comparison to K11R monoubiquitin (FIG. 5b). This revealed the set of perturbed resonances that correspond to the interface on the distal moiety. Importantly, all resonances that were found to be perturbed in the distally labelled K11-linked diubiquitin have equivalent or similar perturbations in the uniformly labelled K11-linked diubiquitin. However, Applicants cannot exclude or quantify contributions to these perturbations from the proximal moiety in this case.

[0120] This analysis shows that K11-linked dimers have a defined pattern of perturbed resonances in solution, which is distinct from the pattern observed for K48-, or K63-linked diubiquitin (Varadan et al., 2004; Varadan et al., 2002; Tenno et al., 2004; FIG. 5d), reflecting (a) unique conformation(s) of K11-linked diubiquitin. Consistent with the crystallographic analysis, the backbone resonances corresponding to residues 41-51 of ubiquitin (including Ile44) are not perturbed, suggesting that this region, which is involved in the K48 diubiquitin interface (Varadan et al., 2002; Tenno et al., 2004; FIG. 5d) and in most ubiquitin-UBD interactions (Zhang et al., 2009; Varadan et al., 2005; Raasi et al., 2005), is not involved in the dimer interface in K11-linkages. Instead, the chemical shift perturbations indicate three regions of the ubiquitin surface that contribute to the interface and/or are affected by the K11 isopeptide bond: the flexible .beta.-hairpin loop spanning residues 5-15, possibly as a consequence of the isopeptide bond at K11; residues 29-36 that include the C-terminal part of the ubiquitin helix; and the C-terminal residues from 69-76 (FIG. 5b). Mapping of these residues onto the surface of ubiquitin reveals that the perturbed resonances correspond to a surface that is almost identical to the proximal interaction interface observed in the crystal structure (FIG. 5e). A corresponding distal interface however appears to be more distinct when compared to the crystal structure (FIG. 5f). At this interface, two residues, Gly53 and Glu24, remain exchange broadened as in monomeric ubiquitin (white in FIG. 5f), indicating that this region of the interface is dynamic and may adopt multiple conformations. Similar observations of exchange broadening in interface residues have been made in the case of K48-linked diubiquitin molecules (Varadan et al., 2002). However, two further residues that reside on the distal interface of the crystal structure, Asp39 and Asp52, are also unperturbed (circled in FIG. 5f), indicating that in solution, the distal ubiquitin may rotate or move slightly, adjusting the interface.
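The three perturbed regions named in the paragraph above can be written down as residue ranges, which makes it straightforward to sort a list of perturbed residues by region. The following is only a bookkeeping sketch. The region labels and the function name are hypothetical, and the residue numbers in the example are chosen for illustration rather than taken from the measured data.

```python
# Bookkeeping sketch: assign ubiquitin residue numbers to the three regions
# named in the text (residues 5-15, 29-36 and 69-76). Labels are illustrative.
from typing import Optional

REGIONS = {
    "beta-hairpin loop (5-15)": range(5, 16),
    "helix C-terminal part (29-36)": range(29, 37),
    "C-terminal residues (69-76)": range(69, 77),
}


def region_of(residue_number: int) -> Optional[str]:
    """Return the label of the region containing the residue, or None."""
    for label, residues in REGIONS.items():
        if residue_number in residues:
            return label
    return None


# Example residue numbers, chosen for illustration only.
for number in (11, 33, 44, 76):
    print(number, region_of(number))
```

Residues outside all three ranges, such as Ile44 in the unperturbed 41-51 stretch, return None.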

[0121] To further distinguish between interface residues and residues perturbed as a result of forming the isopeptide linkage, Applicants also analyzed chemical shift perturbations at low pH. It has previously been shown for K48-linked diubiquitin that low pH `opens` the compact conformation of this chain type, resulting in a more transient interface.sup.21. If a similar interface `opening` also occurred for K11-linkages, this may allow interface residues to be defined more confidently. Although at pH 3.5 Applicants observe fewer perturbations compared to pH 7.4, several residues remain perturbed (FIG. 5c). On the other hand, K29 and K33 are perturbed only at pH 7.4 but do not show significant perturbation at pH 3.5 (indicated by arrows in FIG. 5b,c). This suggests that these residues are located at an interface at pH 7.4.

[0122] In summary, the crystal structure most likely represents a more compact conformation compared to the conformation(s) of K11-linked diubiquitin in solution. However, solution studies also reveal significant perturbations indicative of an interface and hence a compact conformation of K11-linked diubiquitin. The distinct perturbation pattern suggests that K29 and K33 reside at the diubiquitin interface, which would result in a unique conformation compared to K48- and K63-linked diubiquitin. The data also highlight the dynamic nature of K11-linked ubiquitin chains. Further analysis will be required to determine preferred domain orientations in K11-linked ubiquitin chains in solution. Taken together, the unique structural features of K11-linked diubiquitin highlight the conformational variability of differently linked ubiquitin chains (FIG. 5b,d).

REFERENCES

[0123] Komander, D. The emerging complexity of protein ubiquitination. Biochem Soc Trans 37, 937-53 (2009).
[0124] Chen, Z. J. & Sun, L. J. Nonproteolytic functions of ubiquitin in cell signaling. Mol Cell 33, 275-86 (2009).
[0125] Hershko, A. & Ciechanover, A. The ubiquitin system. Annu Rev Biochem 67, 425-79 (1998).
[0126] Xu, P. et al. Quantitative proteomics reveals the function of unconventional ubiquitin chains in proteasomal degradation. Cell 137, 133-45 (2009).
[0127] Peng, J. et al. A proteomics approach to understanding protein ubiquitination. Nat Biotechnol 21, 921-6 (2003).
[0128] Dye, B. T. & Schulman, B. A. Structural mechanisms underlying posttranslational modification by ubiquitin-like proteins. Annu Rev Biophys Biomol Struct 36, 131-50 (2007).
[0129] Ye, Y. & Rape, M. Building ubiquitin chains: E2 enzymes at work. Nat Rev Mol Cell Biol 10, 755-64 (2009).
[0130] Hofmann, R. M. & Pickart, C. M. Noncanonical MMS2-encoded ubiquitin-conjugating enzyme functions in assembly of novel polyubiquitin chains for DNA repair. Cell 96, 645-53 (1999).
[0131] Chen, Z. & Pickart, C. M. A 25-kilodalton ubiquitin carrier protein (E2) catalyzes multi-ubiquitin chain synthesis via lysine 48 of ubiquitin. J Biol Chem 265, 21835-42 (1990).
[0132] Baboshina, O. V. & Haas, A. L. Novel multiubiquitin chain linkages catalyzed by the conjugating enzymes E2EPF and RAD6 are recognized by 26 S proteasome subunit 5. J Biol Chem 271, 2823-31 (1996).
[0133] Jin, L., Williamson, A., Banerjee, S., Philipp, I. & Rape, M. Mechanism of ubiquitin-chain formation by the human anaphase-promoting complex. Cell 133, 653-65 (2008).
[0134] Alexandru, G. et al. UBXD7 binds multiple ubiquitin ligases and implicates p97 in HIF1 alpha turnover. Cell 134, 804-16 (2008).
[0135] Reyes-Turcu, F. E. et al. The ubiquitin binding domain ZnF UBP recognizes the C-terminal diglycine motif of unanchored ubiquitin. Cell 124, 1197-208 (2006).
[0136] McCullough, J., Clague, M. J. & Urbe, S. AMSH is an endosome-associated ubiquitin isopeptidase. J Cell Biol 166, 487-92 (2004).
[0137] Varadan, R. et al. Solution conformation of Lys63-linked di-ubiquitin chain provides clues to functional diversity of polyubiquitin signaling. J Biol Chem 279, 7055-63 (2004).
[0138] Varadan, R., Walker, O., Pickart, C. & Fushman, D. Structural properties of polyubiquitin chains in solution. J Mol Biol 324, 637-47 (2002).
[0139] Tenno, T. et al. Structural basis for distinct roles of Lys63- and Lys48-linked polyubiquitin chains. Genes Cells 9, 865-75 (2004).
[0140] Zhang, N. et al. Structure of the s5a:k48-linked diubiquitin complex and its interactions with rpn13. Mol Cell 35, 280-90 (2009).
[0141] Varadan, R., Assfalg, M., Raasi, S., Pickart, C. & Fushman, D. Structural determinants for selective recognition of a Lys48-linked polyubiquitin chain by a UBA domain. Mol Cell 18, 687-98 (2005).
[0142] Raasi, S., Varadan, R., Fushman, D. & Pickart, C. M. Diverse polyubiquitin interaction properties of ubiquitin-associated domains. Nat Struct Mol Biol 12, 708-14 (2005).
[0143] Evans, P. Scaling and assessment of data quality. Acta Crystallogr D Biol Crystallogr 62, 72-82 (2006).
[0144] Vagin, A. & Teplyakov, A. An approach to multi-copy search in molecular replacement. Acta Crystallogr D Biol Crystallogr 56, 1622-4 (2000).
[0145] Emsley, P. & Cowtan, K. Coot: model-building tools for molecular graphics. Acta Crystallogr D Biol Crystallogr 60, 2126-32 (2004).
[0146] Adams, P. D. et al. PHENIX: building new software for automated crystallographic structure determination. Acta Crystallogr D Biol Crystallogr 58, 1948-54 (2002).
[0147] Neidhardt, F. C., Bloch, P. L. & Smith, D. F. Culture medium for enterobacteria. J Bacteriol 119, 736-47 (1974).
[0148] Pickart, C. M. & Raasi, S. Controlled synthesis of polyubiquitin chains. Methods Enzymol 399, 21-36 (2005).
[0149] Mori, S., Abeygunawardana, C., Johnson, M. O. & van Zijl, P. C. Improved sensitivity of HSQC spectra of exchanging protons at short interscan delays using a new fast HSQC (FHSQC) detection scheme that avoids water saturation. J Magn Reson B 108, 94-8 (1995).
[0150] Hajduk, P. J. et al. NMR-based discovery of lead inhibitors that block DNA binding of the human papillomavirus E2 protein. J Med Chem 40, 3144-50 (1997).
[0151] Dikic et al. Ubiquitin-binding domains--from structures to functions. Nat Rev Mol Cell Biol 10, 659-71 (2009).
[0152] Hicke et al. Nat Rev Mol Cell Biol 6, 610-21 (2005).

[0153] The invention is further described by the following numbered paragraphs:

[0154] 1. An E2 enzyme comprising a Ubc domain, from which an N-terminal tail or a C-terminal tail has been removed.

[0155] 2. An E2 enzyme according to paragraph 1, which is a chimeric enzyme wherein the Ubc is fused to a heterologous ubiquitin-binding domain (UBD).

[0156] 3. A chimeric E2 enzyme according to paragraph 2, wherein the UBD is C-terminal to the Ubc domain.

[0157] 4. A chimeric E2 enzyme according to paragraph 2 or paragraph 3, wherein the UBD is an .alpha.-helical, zinc finger or pleckstrin homology domain.

[0158] 5. A chimeric E2 enzyme according to paragraph 2 or paragraph 3, wherein the UBD is a domain selected from the group consisting of UIM, IUIM (MIU), DUIM, UBM, UBA, GAT, CUE, VHS, UBZ, NZF, ZnF A20, ZnF UBP (PAZ), PRU, GLUE, UEV, UBC, SH3, PFU and Jab1/MPN domains.

[0159] 6. A chimeric E2 enzyme according to paragraph 4 or paragraph 5, wherein the UBD is derived from Isopeptidase T.

[0160] 7. A chimeric E2 enzyme according to paragraph 6, wherein the UBD comprises the sequence from about position 163 to about position 291 of Isopeptidase T.

[0161] 8. A chimeric E2 enzyme according to paragraph 4 or paragraph 5, wherein the UBD is a UBA, UIM, ZnF or NZF domain.

[0162] 9. An E2 enzyme according to any preceding paragraph, wherein the Ubc domain is derived from an E2 enzyme selected from the group consisting of UBE2A, UBE2B, UBE2C, UBE2D1, UBE2D2, UBE2D3, UBE2D4, UBE2E1, UBE2E2, UBE2E3, UBE2F, UBE2G1, UBE2G2, UBE2H, UBE2I, UBE2J1, UBE2J2, UBE2K, UBE2L3, UBE2L6, UBE2M, UBE2N, UBE2NL, UBE2O, UBE2Q1, UBE2Q2, UBE2R1, UBE2R2, UBE2S, UBE2T, UBE2U, UBE2W, UBE2Z and BIRC6.

[0163] 10. An E2 enzyme according to paragraph 9, wherein the E2 enzyme is a class II E2 enzyme.

[0164] 11. An E2 enzyme according to paragraph 10, wherein an N-terminal or a C-terminal amino acid tail on the class II E2 enzyme is replaced by the UBD.

[0165] 12. An E2 enzyme according to paragraph 10 or paragraph 11, wherein the Ubc domain is derived from UBE2S.

[0166] 13. An E2 enzyme according to paragraph 12, wherein the Ubc domain comprises residues 1 to 156 of UBE2S.

[0167] 14. A method for increasing the capacity of an E2 enzyme to produce free polyubiquitin chains in solution, comprising fusing the Ubc domain of said E2 enzyme to a UBD.

[0168] 15. A method according to paragraph 14, wherein the E2 enzyme is selected from the group consisting of UBE2A, UBE2B, UBE2C, UBE2D1, UBE2D2, UBE2D3, UBE2D4, UBE2E1, UBE2E2, UBE2E3, UBE2F, UBE2G1, UBE2G2, UBE2H, UBE2I, UBE2J1, UBE2J2, UBE2K, UBE2L3, UBE2L6, UBE2M, UBE2N, UBE2NL, UBE2O, UBE2Q1, UBE2Q2, UBE2R1, UBE2R2, UBE2S, UBE2T, UBE2U, UBE2V1, UBE2V2, UBE2V3, UBE2W, UBE2Z, AKTIP and BIRC6 and the UBD is a domain selected from the group consisting of UIM, IUIM (MIU), DUIM, UBM, UBA, GAT, CUE, VHS, UBZ, NZF, A20-like ZnF, ZnF UBP (PAZ), PRU, GLUE, UEV, UBC, SH3, PFU and Jab1/MPN domains.

[0169] 16. A method according to paragraph 15, wherein the E2 enzyme is UBE2S.

[0170] 17. A method according to any one of paragraphs 14 to 16, wherein the UBD is a ZnF UBP domain.

[0171] 18. A method for producing free polyubiquitin chains linked through a desired lysine residue, comprising the steps of: (a) selecting an E2 enzyme which possesses the desired lysine residue specificity; (b) fusing the Ubc catalytic domain of said E2 enzyme to a ubiquitin binding domain (UBD); and (c) incubating the resulting chimeric protein with an E1 ubiquitin activating enzyme and monomeric ubiquitin.

[0172] 19. A method according to paragraph 18, wherein the incidence of undesired lysine linkages is reduced by including a linkage-specific deubiquitinase in the incubation.

[0173] Having thus described in detail preferred embodiments of the present invention, it is to be understood that the invention defined by the above paragraphs is not to be limited to particular details set forth in the above description as many apparent variations thereof are possible without departing from the spirit or scope of the present invention.

Sequence CWU 1

1

381179PRTHomo sapiens 1Met Ala Ser Gln Asn Arg Asp Pro Ala Ala Thr Ser Val Ala Ala Ala 1 5 10 15 Arg Lys Gly Ala Glu Pro Ser Gly Gly Ala Ala Arg Gly Pro Val Gly 20 25 30 Lys Arg Leu Gln Gln Glu Leu Met Thr Leu Met Met Ser Gly Asp Lys 35 40 45 Gly Ile Ser Ala Phe Pro Glu Ser Asp Asn Leu Phe Lys Trp Val Gly 50 55 60 Thr Ile His Gly Ala Ala Gly Thr Val Tyr Glu Asp Leu Arg Tyr Lys 65 70 75 80 Leu Ser Leu Glu Phe Pro Ser Gly Tyr Pro Tyr Asn Ala Pro Thr Val 85 90 95 Lys Phe Leu Thr Pro Cys Tyr His Pro Asn Val Asp Thr Gln Gly Asn 100 105 110 Ile Cys Leu Asp Ile Leu Lys Glu Lys Trp Ser Ala Leu Tyr Asp Val 115 120 125 Arg Thr Ile Leu Leu Ser Ile Gln Ser Leu Leu Gly Glu Pro Asn Ile 130 135 140 Asp Ser Pro Leu Asn Thr His Ala Ala Glu Leu Trp Lys Asn Pro Thr 145 150 155 160 Ala Phe Lys Lys Tyr Leu Gln Glu Thr Tyr Ser Lys Gln Val Thr Ser 165 170 175 Gln Glu Pro 2147PRTHomo sapiens 2Met Ala Leu Lys Arg Ile Gln Lys Glu Leu Ser Asp Leu Gln Arg Asp 1 5 10 15 Pro Pro Ala His Cys Ser Ala Gly Pro Val Gly Asp Asp Leu Phe His 20 25 30 Trp Gln Ala Thr Ile Met Gly Pro Pro Asp Ser Ala Tyr Gln Gly Gly 35 40 45 Val Phe Phe Leu Thr Val His Phe Pro Thr Asp Tyr Pro Phe Lys Pro 50 55 60 Pro Lys Ile Ala Phe Thr Thr Lys Ile Tyr His Pro Asn Ile Asn Ser 65 70 75 80 Asn Gly Ser Ile Cys Leu Asp Ile Leu Arg Ser Gln Trp Ser Pro Ala 85 90 95 Leu Thr Val Ser Lys Val Leu Leu Ser Ile Cys Ser Leu Leu Cys Asp 100 105 110 Pro Asn Pro Asp Asp Pro Leu Val Pro Asp Ile Ala Gln Ile Tyr Lys 115 120 125 Ser Asp Lys Glu Lys Tyr Asn Arg His Ala Arg Glu Trp Thr Gln Lys 130 135 140 Tyr Ala Met 145 3118PRTHomo sapiens 3Met Phe His Trp Gln Ala Thr Ile Met Gly Pro Asn Asp Ser Pro Tyr 1 5 10 15 Gln Gly Gly Val Phe Phe Leu Thr Ile His Phe Pro Thr Asp Tyr Pro 20 25 30 Phe Lys Pro Pro Lys Val Ala Phe Thr Thr Arg Ile Tyr His Pro Asn 35 40 45 Ile Asn Ser Asn Gly Ser Ile Cys Leu Asp Ile Leu Arg Ser Gln Trp 50 55 60 Ser Pro Ala Leu Thr Ile Ser Lys Val Leu Leu Ser Ile Cys Ser Leu 65 70 75 80 Leu Cys Asp Pro Asn Pro Asp Asp Pro Leu 
Val Pro Glu Ile Ala Arg 85 90 95 Ile Tyr Lys Thr Asp Arg Glu Lys Tyr Asn Arg Ile Ala Arg Glu Trp 100 105 110 Thr Gln Lys Tyr Ala Met 115 4149PRTHomo sapiens 4 Met Leu Ser Asn Arg Lys Cys Leu Ser Lys Glu Leu Ser Asp Leu Ala 1 5 10 15 Arg Asp Pro Pro Ala Gln Cys Ser Ala Gly Pro Val Gly Asp Asp Met 20 25 30 Phe His Trp Gln Ala Thr Ile Met Gly Pro Asn Asp Ser Pro Tyr Gln 35 40 45 Gly Gly Val Phe Phe Leu Thr Ile His Phe Pro Thr Asp Tyr Pro Phe 50 55 60 Lys Pro Pro Lys Val Ala Phe Thr Thr Arg Ile Tyr His Pro Asn Ile 65 70 75 80 Asn Ser Asn Gly Ser Ile Cys Leu Asp Ile Leu Arg Ser Gln Trp Ser 85 90 95 Pro Ala Leu Thr Ile Ser Lys Val Leu Leu Ser Ile Cys Ser Leu Leu 100 105 110 Cys Asp Pro Asn Pro Asp Asp Pro Leu Val Pro Glu Ile Ala Arg Ile 115 120 125 Tyr Lys Thr Asp Arg Asp Lys Tyr Asn Arg Ile Ser Arg Glu Trp Thr 130 135 140 Gln Lys Tyr Ala Met 145 5201PRTHomo sapiens 5Met Ser Thr Glu Ala Gln Arg Val Asp Asp Ser Pro Ser Thr Ser Gly 1 5 10 15 Gly Ser Ser Asp Gly Asp Gln Arg Glu Ser Val Gln Gln Glu Pro Glu 20 25 30 Arg Glu Gln Val Gln Pro Lys Lys Lys Glu Gly Lys Ile Ser Ser Lys 35 40 45 Thr Ala Ala Lys Leu Ser Thr Ser Ala Lys Arg Ile Gln Lys Glu Leu 50 55 60 Ala Glu Ile Thr Leu Asp Pro Pro Pro Asn Cys Ser Ala Gly Pro Lys 65 70 75 80 Gly Asp Asn Ile Tyr Glu Trp Arg Ser Thr Ile Leu Gly Pro Pro Gly 85 90 95 Ser Val Tyr Glu Gly Gly Val Phe Phe Leu Asp Ile Thr Phe Ser Pro 100 105 110 Asp Tyr Pro Phe Lys Pro Pro Lys Val Thr Phe Arg Thr Arg Ile Tyr 115 120 125 His Cys Asn Ile Asn Ser Gln Gly Val Ile Cys Leu Asp Ile Leu Lys 130 135 140 Asp Asn Trp Ser Pro Ala Leu Thr Ile Ser Lys Val Leu Leu Ser Ile 145 150 155 160 Cys Ser Leu Leu Thr Asp Cys Asn Pro Ala Asp Pro Leu Val Gly Ser 165 170 175 Ile Ala Thr Gln Tyr Met Thr Asn Arg Ala Glu His Asp Arg Met Ala 180 185 190 Arg Gln Trp Thr Lys Arg Tyr Ala Thr 195 200 6207PRTHomo sapiens 6Met Ser Ser Asp Arg Gln Arg Ser Asp Asp Glu Ser Pro Ser Thr Ser 1 5 10 15 Ser Gly Ser Ser Asp Ala Asp Gln Arg Asp Pro Ala Ala Pro Glu Pro 20 25 30 Glu Glu Gln 
Glu Glu Arg Lys Pro Ser Ala Thr Gln Gln Lys Lys Asn 35 40 45 Thr Lys Leu Ser Ser Lys Thr Thr Ala Lys Leu Ser Thr Ser Ala Lys 50 55 60 Arg Ile Gln Lys Glu Leu Ala Glu Ile Thr Leu Asp Pro Pro Pro Asn 65 70 75 80 Cys Ser Ala Gly Pro Lys Gly Asp Asn Ile Tyr Glu Trp Arg Ser Thr 85 90 95 Ile Leu Gly Pro Pro Gly Ser Val Tyr Glu Gly Gly Val Phe Phe Leu 100 105 110 Asp Ile Thr Phe Ser Ser Asp Tyr Pro Phe Lys Pro Pro Lys Val Thr 115 120 125 Phe Arg Thr Arg Ile Tyr His Cys Asn Ile Asn Ser Gln Gly Val Ile 130 135 140 Cys Leu Asp Ile Leu Lys Asp Asn Trp Ser Pro Ala Leu Thr Ile Ser 145 150 155 160 Lys Val Leu Leu Ser Ile Cys Ser Leu Leu Thr Asp Cys Asn Pro Ala 165 170 175 Asp Pro Leu Val Gly Ser Ile Ala Thr Gln Tyr Leu Thr Asn Arg Ala 180 185 190 Glu His Asp Arg Ile Ala Arg Gln Trp Thr Lys Arg Tyr Ala Thr 195 200 205 7185PRTHomo sapiens 7Met Leu Thr Leu Ala Ser Lys Leu Lys Arg Asp Asp Gly Leu Lys Gly 1 5 10 15 Ser Arg Thr Ala Ala Thr Ala Ser Asp Ser Thr Arg Arg Val Ser Val 20 25 30 Arg Asp Lys Leu Leu Val Lys Glu Val Ala Glu Leu Glu Ala Asn Leu 35 40 45 Pro Cys Thr Cys Lys Val His Phe Pro Asp Pro Asn Lys Leu His Cys 50 55 60 Phe Gln Leu Thr Val Thr Pro Asp Glu Gly Tyr Tyr Gln Gly Gly Lys 65 70 75 80 Phe Gln Phe Glu Thr Glu Val Pro Asp Ala Tyr Asn Met Val Pro Pro 85 90 95 Lys Val Lys Cys Leu Thr Lys Ile Trp His Pro Asn Ile Thr Glu Thr 100 105 110 Gly Glu Ile Cys Leu Ser Leu Leu Arg Glu His Ser Ile Asp Gly Thr 115 120 125 Gly Trp Ala Pro Thr Arg Thr Leu Lys Asp Val Val Trp Gly Leu Asn 130 135 140 Ser Leu Phe Thr Asp Leu Leu Asn Phe Asp Asp Pro Leu Asn Ile Glu 145 150 155 160 Ala Ala Glu His His Leu Arg Asp Lys Glu Asp Phe Arg Asn Lys Val 165 170 175 Asp Asp Tyr Ile Lys Arg Tyr Ala Arg 180 185 8318PRTHomo sapiens 8Met Glu Thr Arg Tyr Asn Leu Lys Ser Pro Ala Val Lys Arg Leu Met 1 5 10 15 Lys Glu Ala Ala Glu Leu Lys Asp Pro Thr Asp His Tyr His Ala Gln 20 25 30 Pro Leu Glu Asp Asn Leu Phe Glu Trp His Phe Thr Val Arg Gly Pro 35 40 45 Pro Asp Ser Asp Phe Asp Gly Gly Val Tyr His Gly Arg 
Ile Val Leu 50 55 60 Pro Pro Glu Tyr Pro Met Lys Pro Pro Ser Ile Ile Leu Leu Thr Ala 65 70 75 80 Asn Gly Arg Phe Glu Val Gly Lys Lys Ile Cys Leu Ser Ile Ser Gly 85 90 95 His His Pro Glu Thr Trp Gln Pro Ser Trp Ser Ile Arg Thr Ala Leu 100 105 110 Leu Ala Ile Ile Gly Phe Met Pro Thr Lys Gly Glu Gly Ala Ile Gly 115 120 125 Ser Leu Asp Tyr Thr Pro Glu Glu Arg Arg Ala Leu Ala Lys Lys Ser 130 135 140 Gln Asp Phe Cys Cys Glu Gly Cys Gly Ser Ala Met Lys Asp Val Leu 145 150 155 160 Leu Pro Leu Lys Ser Gly Ser Asp Ser Ser Gln Ala Asp Gln Glu Ala 165 170 175 Lys Glu Leu Ala Arg Gln Ile Ser Phe Lys Ala Glu Val Asn Ser Ser 180 185 190 Gly Lys Thr Ile Ser Glu Ser Asp Leu Asn His Ser Phe Ser Leu Thr 195 200 205 Asp Leu Gln Asp Asp Ile Pro Thr Thr Phe Gln Gly Ala Thr Ala Ser 210 215 220 Thr Ser Tyr Gly Leu Gln Asn Ser Ser Ala Ala Ser Phe His Gln Pro 225 230 235 240 Thr Gln Pro Val Ala Lys Asn Thr Ser Met Ser Pro Arg Gln Arg Arg 245 250 255 Ala Gln Gln Gln Ser Gln Arg Arg Leu Ser Thr Ser Pro Asp Val Ile 260 265 270 Gln Gly His Gln Pro Arg Asp Asn His Thr Asp His Gly Gly Ser Ala 275 280 285 Val Leu Ile Val Ile Leu Thr Leu Ala Leu Ala Ala Leu Ile Phe Arg 290 295 300 Arg Ile Tyr Leu Ala Asn Glu Tyr Ile Phe Asp Phe Glu Leu 305 310 315 9259PRTHomo sapiens 9Met Ser Ser Thr Ser Ser Lys Arg Ala Pro Thr Thr Ala Thr Gln Arg 1 5 10 15 Leu Lys Gln Asp Tyr Leu Arg Ile Lys Lys Asp Pro Val Pro Tyr Ile 20 25 30 Cys Ala Glu Pro Leu Pro Ser Asn Ile Leu Glu Trp His Tyr Val Val 35 40 45 Arg Gly Pro Glu Met Thr Pro Tyr Glu Gly Gly Tyr Tyr His Gly Lys 50 55 60 Leu Ile Phe Pro Arg Glu Phe Pro Phe Lys Pro Pro Ser Ile Tyr Met 65 70 75 80 Ile Thr Pro Asn Gly Arg Phe Lys Cys Asn Thr Arg Leu Cys Leu Ser 85 90 95 Ile Thr Asp Phe His Pro Asp Thr Trp Asn Pro Ala Trp Ser Val Ser 100 105 110 Thr Ile Leu Thr Gly Leu Leu Ser Phe Met Val Glu Lys Gly Pro Thr 115 120 125 Leu Gly Ser Ile Glu Thr Ser Asp Phe Thr Lys Arg Gln Leu Ala Val 130 135 140 Gln Ser Leu Ala Phe Asn Leu Lys Asp Lys Val Phe Cys Glu Leu Phe 145 150 
155 160 Pro Glu Val Val Glu Glu Ile Lys Gln Lys Gln Lys Ala Gln Asp Glu 165 170 175 Leu Ser Ser Arg Pro Gln Thr Leu Pro Leu Pro Asp Val Val Pro Asp 180 185 190 Gly Glu Thr His Leu Val Gln Asn Gly Ile Gln Leu Leu Asn Gly His 195 200 205 Ala Pro Gly Ala Val Pro Asn Leu Ala Gly Leu Gln Gln Ala Asn Arg 210 215 220 His His Gly Leu Leu Gly Gly Ala Leu Ala Asn Leu Phe Val Ile Val 225 230 235 240 Gly Phe Ala Ala Phe Ala Tyr Thr Val Lys Tyr Val Leu Arg Ser Ile 245 250 255 Ala Gln Glu 10183PRTHomo sapiens 10Met Ile Lys Leu Phe Ser Leu Lys Gln Gln Lys Lys Glu Glu Glu Ser 1 5 10 15 Ala Gly Gly Thr Lys Gly Ser Ser Lys Lys Ala Ser Ala Ala Gln Leu 20 25 30 Arg Ile Gln Lys Asp Ile Asn Glu Leu Asn Leu Pro Lys Thr Cys Asp 35 40 45 Ile Ser Phe Ser Asp Pro Asp Asp Leu Leu Asn Phe Lys Leu Val Ile 50 55 60 Cys Pro Asp Glu Gly Phe Tyr Lys Ser Gly Lys Phe Val Phe Ser Phe 65 70 75 80 Lys Val Gly Gln Gly Tyr Pro His Asp Pro Pro Lys Val Lys Cys Glu 85 90 95 Thr Met Val Tyr His Pro Asn Ile Asp Leu Glu Gly Asn Val Cys Leu 100 105 110 Asn Ile Leu Arg Glu Asp Trp Lys Pro Val Leu Thr Ile Asn Ser Ile 115 120 125 Ile Tyr Gly Leu Gln Tyr Leu Phe Leu Glu Pro Asn Pro Glu Asp Pro 130 135 140 Leu Asn Lys Glu Ala Ala Glu Val Leu Gln Asn Asn Arg Arg Leu Phe 145 150 155 160 Glu Gln Asn Val Gln Arg Ser Met Arg Gly Gly Tyr Ile Gly Ser Thr 165 170 175 Tyr Phe Glu Arg Cys Leu Lys 180 11152PRTHomo sapiens 11Met Ala Gly Leu Pro Arg Arg Ile Ile Lys Glu Thr Gln Arg Leu Leu 1 5 10 15 Ala Glu Pro Val Pro Gly Ile Lys Ala Glu Pro Asp Glu Ser Asn Ala 20 25 30 Arg Tyr Phe His Val Val Ile Ala Gly Pro Gln Asp Ser Pro Phe Glu 35 40 45 Gly Gly Thr Phe Lys Leu Glu Leu Phe Leu Pro Glu Glu Tyr Pro Met 50 55 60 Ala Ala Pro Lys Val Arg Phe Met Thr Lys Ile Tyr His Pro Asn Val 65 70 75 80 Asp Lys Leu Gly Arg Ile Cys Leu Asp Ile Leu Lys Asp Lys Trp Ser 85 90 95 Pro Ala Leu Gln Ile Arg Thr Val Leu Leu Ser Ile Gln Ala Leu Leu 100 105 110 Ser Ala Pro Asn Pro Asp Asp Pro Leu Ala Asn Asp Val Ala Glu Gln 115 120 125 Trp Lys Thr Asn Glu 
Ala Gln Ala Ile Glu Thr Ala Arg Ala Trp Thr 130 135 140 Arg Leu Tyr Ala Met Asn Asn Ile 145 150 121292PRTHomo sapiens 12Met Ala Asp Pro Ala Ala Pro Thr Pro Ala Ala Pro Ala Pro Ala Gln 1 5 10 15 Ala Pro Ala Pro Ala Pro Glu Ala Val Pro Ala Pro Ala Ala Ala Pro 20 25 30 Val Pro Ala Pro Ala Pro Ala Ser Asp Ser Ala Ser Gly Pro Ser Ser 35 40 45 Asp Ser Gly Pro Glu Ala Gly Ser Gln Arg Leu Leu Phe Ser His Asp 50 55 60 Leu Val Ser Gly Arg Tyr Arg Gly Ser Val His Phe Gly Leu Val Arg 65 70 75 80 Leu Ile His Gly Glu Asp Ser Asp Ser Glu Gly Glu Glu Glu Gly Arg 85 90 95 Gly Ser Ser Gly Cys Ser Glu Ala Gly Gly Ala Gly His Glu Glu Gly 100 105 110 Arg Ala Ser Pro Leu Arg Arg Gly Tyr Val Arg Val Gln Trp Tyr Pro 115 120 125 Glu Gly Val Lys Gln His Val Lys Glu Thr Lys Leu Lys Leu Glu Asp 130 135 140 Arg Ser Val Val Pro Arg Asp Val Val Arg His Met Arg Ser Thr Asp 145 150 155 160 Ser Gln Cys Gly Thr Val Ile Asp Val Asn Ile Asp Cys Ala Val Lys 165

170 175 Leu Ile Gly Thr Asn Cys Ile Ile Tyr Pro Val Asn Ser Lys Asp Leu 180 185 190 Gln His Ile Trp Pro Phe Met Tyr Gly Asp Tyr Ile Ala Tyr Asp Cys 195 200 205 Trp Leu Gly Lys Val Tyr Asp Leu Lys Asn Gln Ile Ile Leu Lys Leu 210 215 220 Ser Asn Gly Ala Arg Cys Ser Met Asn Thr Glu Asp Gly Ala Lys Leu 225 230 235 240 Tyr Asp Val Cys Pro His Val Ser Asp Ser Gly Leu Phe Phe Asp Asp 245 250 255 Ser Tyr Gly Phe Tyr Pro Gly Gln Val Leu Ile Gly Pro Ala Lys Ile 260 265 270 Phe Ser Ser Val Gln Trp Leu Ser Gly Val Lys Pro Val Leu Ser Thr 275 280 285 Lys Ser Lys Phe Arg Val Val Val Glu Glu Val Gln Val Val Glu Leu 290 295 300 Lys Val Thr Trp Ile Thr Lys Ser Phe Cys Pro Gly Gly Thr Asp Ser 305 310 315 320 Val Ser Pro Pro Pro Ser Val Ile Thr Gln Glu Asn Leu Gly Arg Val 325 330 335 Lys Arg Leu Gly Cys Phe Asp His Ala Gln Arg Gln Leu Gly Glu Arg 340 345 350 Cys Leu Tyr Val Phe Pro Ala Lys Val Glu Pro Ala Lys Ile Ala Trp 355 360 365 Glu Cys Pro Glu Lys Asn Cys Ala Gln Gly Glu Gly Ser Met Ala Lys 370 375 380 Lys Val Lys Arg Leu Leu Lys Lys Gln Val Val Arg Ile Met Ser Cys 385 390 395 400 Ser Pro Asp Thr Gln Cys Ser Arg Asp His Ser Met Glu Asp Pro Asp 405 410 415 Lys Lys Gly Glu Ser Lys Thr Lys Ser Glu Ala Glu Ser Ala Ser Pro 420 425 430 Glu Glu Thr Pro Asp Gly Ser Ala Ser Pro Val Glu Met Gln Asp Glu 435 440 445 Gly Ala Glu Glu Pro His Glu Ala Gly Glu Gln Leu Pro Pro Phe Leu 450 455 460 Leu Lys Glu Gly Arg Asp Asp Arg Leu His Ser Ala Glu Gln Asp Ala 465 470 475 480 Asp Asp Glu Ala Ala Asp Asp Thr Asp Asp Thr Ser Ser Val Thr Ser 485 490 495 Ser Ala Ser Ser Thr Thr Ser Ser Gln Ser Gly Ser Gly Thr Ser Arg 500 505 510 Lys Lys Ser Ile Pro Leu Ser Ile Lys Asn Leu Lys Arg Lys His Lys 515 520 525 Arg Lys Lys Asn Lys Ile Thr Arg Asp Phe Lys Pro Gly Asp Arg Val 530 535 540 Ala Val Glu Val Val Thr Thr Met Thr Ser Ala Asp Val Met Trp Gln 545 550 555 560 Asp Gly Ser Val Glu Cys Asn Ile Arg Ser Asn Asp Leu Phe Pro Val 565 570 575 His His Leu Asp Asn Asn Glu Phe Cys Pro Gly Asp Phe Val Val Asp 580 585 
590 Lys Arg Val Gln Ser Cys Pro Asp Pro Ala Val Tyr Gly Val Val Gln 595 600 605 Ser Gly Asp His Ile Gly Arg Thr Cys Met Val Lys Trp Phe Lys Leu 610 615 620 Arg Pro Ser Gly Asp Asp Val Glu Leu Ile Gly Glu Glu Glu Asp Val 625 630 635 640 Ser Val Tyr Asp Ile Ala Asp His Pro Asp Phe Arg Phe Arg Thr Thr 645 650 655 Asp Ile Val Ile Arg Ile Gly Asn Thr Glu Asp Gly Ala Pro His Lys 660 665 670 Glu Asp Glu Pro Ser Val Gly Gln Val Ala Arg Val Asp Val Ser Ser 675 680 685 Lys Val Glu Val Val Trp Ala Asp Asn Ser Lys Thr Ile Ile Leu Pro 690 695 700 Gln His Leu Tyr Asn Ile Glu Ser Glu Ile Glu Glu Ser Asp Tyr Asp 705 710 715 720 Ser Val Glu Gly Ser Thr Ser Gly Ala Ser Ser Asp Glu Trp Glu Asp 725 730 735 Asp Ser Asp Ser Trp Glu Thr Asp Asn Gly Leu Val Glu Asp Glu His 740 745 750 Pro Lys Ile Glu Glu Pro Pro Ile Pro Pro Leu Glu Gln Pro Val Ala 755 760 765 Pro Glu Asp Lys Gly Val Val Ile Ser Glu Glu Ala Ala Thr Ala Ala 770 775 780 Val Gln Gly Ala Val Ala Met Ala Ala Pro Met Ala Gly Leu Met Glu 785 790 795 800 Lys Ala Gly Lys Asp Gly Pro Pro Lys Ser Phe Arg Glu Leu Lys Glu 805 810 815 Ala Ile Lys Ile Leu Glu Ser Leu Lys Asn Met Thr Val Glu Gln Leu 820 825 830 Leu Thr Gly Ser Pro Thr Ser Pro Thr Val Glu Pro Glu Lys Pro Thr 835 840 845 Arg Glu Lys Lys Phe Leu Asp Asp Ile Lys Lys Leu Gln Glu Asn Leu 850 855 860 Lys Lys Thr Leu Asp Asn Val Ala Ile Val Glu Glu Glu Lys Met Glu 865 870 875 880 Ala Val Pro Asp Val Glu Arg Lys Glu Asp Lys Pro Glu Gly Gln Ser 885 890 895 Pro Val Lys Ala Glu Trp Pro Ser Glu Thr Pro Val Leu Cys Gln Gln 900 905 910 Cys Gly Gly Lys Pro Gly Val Thr Phe Thr Ser Ala Lys Gly Glu Val 915 920 925 Phe Ser Val Leu Glu Phe Ala Pro Ser Asn His Ser Phe Lys Lys Ile 930 935 940 Glu Phe Gln Pro Pro Glu Ala Lys Lys Phe Phe Ser Thr Val Arg Lys 945 950 955 960 Glu Met Ala Leu Leu Ala Thr Ser Leu Pro Glu Gly Ile Met Val Lys 965 970 975 Thr Phe Glu Asp Arg Met Asp Leu Phe Ser Ala Leu Ile Lys Gly Pro 980 985 990 Thr Arg Thr Pro Tyr Glu Asp Gly Leu Tyr Leu Phe Asp Ile Gln Leu 995 1000 
1005 Pro Asn Ile Tyr Pro Ala Val Pro Pro His Phe Cys Tyr Leu Ser 1010 1015 1020 Gln Cys Ser Gly Arg Leu Asn Pro Asn Leu Tyr Asp Asn Gly Lys 1025 1030 1035 Val Cys Val Ser Leu Leu Gly Thr Trp Ile Gly Lys Gly Thr Glu 1040 1045 1050 Arg Trp Thr Ser Lys Ser Ser Leu Leu Gln Val Leu Ile Ser Ile 1055 1060 1065 Gln Gly Leu Ile Leu Val Asn Glu Pro Tyr Tyr Asn Glu Ala Gly 1070 1075 1080 Phe Asp Ser Asp Arg Gly Leu Gln Glu Gly Tyr Glu Asn Ser Arg 1085 1090 1095 Cys Tyr Asn Glu Met Ala Leu Ile Arg Val Val Gln Ser Met Thr 1100 1105 1110 Gln Leu Val Arg Arg Pro Pro Glu Val Phe Glu Gln Glu Ile Arg 1115 1120 1125 Gln His Phe Ser Thr Gly Gly Trp Arg Leu Val Asn Arg Ile Glu 1130 1135 1140 Ser Trp Leu Glu Thr His Ala Leu Leu Glu Lys Ala Gln Ala Leu 1145 1150 1155 Pro Asn Gly Val Pro Lys Ala Ser Ser Ser Pro Glu Pro Pro Ala 1160 1165 1170 Val Ala Glu Leu Ser Asp Ser Gly Gln Gln Glu Pro Glu Asp Gly 1175 1180 1185 Gly Pro Ala Pro Gly Glu Ala Ser Gln Gly Ser Asp Ser Glu Gly 1190 1195 1200 Gly Ala Gln Gly Leu Ala Ser Ala Ser Arg Asp His Thr Asp Gln 1205 1210 1215 Thr Ser Glu Thr Ala Pro Asp Ala Ser Val Pro Pro Ser Val Lys 1220 1225 1230 Pro Lys Lys Arg Arg Lys Ser Tyr Arg Ser Phe Leu Pro Glu Lys 1235 1240 1245 Ser Gly Tyr Pro Asp Ile Gly Phe Pro Leu Phe Pro Leu Ser Lys 1250 1255 1260 Gly Phe Ile Lys Ser Ile Arg Gly Val Leu Thr Gln Phe Arg Ala 1265 1270 1275 Ala Leu Leu Glu Ala Gly Met Pro Glu Cys Thr Glu Asp Lys 1280 1285 1290 13222PRTHomo sapiens 13Met Asn Ser Asn Val Glu Asn Leu Pro Pro His Ile Ile Arg Leu Val 1 5 10 15 Tyr Lys Glu Val Thr Thr Leu Thr Ala Asp Pro Pro Asp Gly Ile Lys 20 25 30 Val Phe Pro Asn Glu Glu Asp Leu Thr Asp Leu Gln Val Thr Ile Glu 35 40 45 Gly Pro Glu Gly Thr Pro Tyr Ala Gly Gly Leu Phe Arg Met Lys Leu 50 55 60 Leu Leu Gly Lys Asp Phe Pro Ala Ser Pro Pro Lys Gly Tyr Phe Leu 65 70 75 80 Thr Lys Ile Phe His Pro Asn Val Gly Ala Asn Gly Glu Ile Cys Val 85 90 95 Asn Val Leu Lys Arg Asp Trp Thr Ala Glu Leu Gly Ile Arg His Val 100 105 110 Leu Leu Thr Ile Lys Cys Leu Leu 
Ile His Pro Asn Pro Glu Ser Ala 115 120 125 Leu Asn Glu Glu Ala Gly Arg Leu Leu Leu Glu Asn Tyr Glu Glu Tyr 130 135 140 Ala Ala Arg Ala Arg Leu Leu Thr Glu Ile His Gly Gly Ala Gly Gly 145 150 155 160 Pro Ser Gly Arg Ala Glu Ala Gly Arg Ala Leu Ala Ser Gly Thr Glu 165 170 175 Ala Ser Ser Thr Asp Pro Gly Ala Pro Gly Gly Pro Gly Gly Ala Glu 180 185 190 Gly Pro Met Ala Lys Lys His Ala Gly Glu Arg Asp Lys Lys Leu Ala 195 200 205 Ala Lys Lys Lys Thr Asp Lys Lys Arg Ala Leu Arg Arg Leu 210 215 220 141215PRTUnknownHDAC6 14Met Thr Ser Thr Gly Gln Asp Ser Thr Thr Thr Arg Gln Arg Arg Ser 1 5 10 15 Arg Gln Asn Pro Gln Ser Pro Pro Gln Asp Ser Ser Val Thr Ser Lys 20 25 30 Arg Asn Ile Lys Lys Gly Ala Val Pro Arg Ser Ile Pro Asn Leu Ala 35 40 45 Glu Val Lys Lys Lys Gly Lys Met Lys Lys Leu Gly Gln Ala Met Glu 50 55 60 Glu Asp Leu Ile Val Gly Leu Gln Gly Met Asp Leu Asn Leu Glu Ala 65 70 75 80 Glu Ala Leu Ala Gly Thr Gly Leu Val Leu Asp Glu Gln Leu Asn Glu 85 90 95 Phe His Cys Leu Trp Asp Asp Ser Phe Pro Glu Gly Pro Glu Arg Leu 100 105 110 His Ala Ile Lys Glu Gln Leu Ile Gln Glu Gly Leu Leu Asp Arg Cys 115 120 125 Val Ser Phe Gln Ala Arg Phe Ala Glu Lys Glu Glu Leu Met Leu Val 130 135 140 His Ser Leu Glu Tyr Ile Asp Leu Met Glu Thr Thr Gln Tyr Met Asn 145 150 155 160 Glu Gly Glu Leu Arg Val Leu Ala Asp Thr Tyr Asp Ser Val Tyr Leu 165 170 175 His Pro Asn Ser Tyr Ser Cys Ala Cys Leu Ala Ser Gly Ser Val Leu 180 185 190 Arg Leu Val Asp Ala Val Leu Gly Ala Glu Ile Arg Asn Gly Met Ala 195 200 205 Ile Ile Arg Pro Pro Gly His His Ala Gln His Ser Leu Met Asp Gly 210 215 220 Tyr Cys Met Phe Asn His Val Ala Val Ala Ala Arg Tyr Ala Gln Gln 225 230 235 240 Lys His Arg Ile Arg Arg Val Leu Ile Val Asp Trp Asp Val His His 245 250 255 Gly Gln Gly Thr Gln Phe Thr Phe Asp Gln Asp Pro Ser Val Leu Tyr 260 265 270 Phe Ser Ile His Arg Tyr Glu Gln Gly Arg Phe Trp Pro His Leu Lys 275 280 285 Ala Ser Asn Trp Ser Thr Thr Gly Phe Gly Gln Gly Gln Gly Tyr Thr 290 295 300 Ile Asn Val Pro Trp Asn Gln Val Gly Met 
Arg Asp Ala Asp Tyr Ile 305 310 315 320 Ala Ala Phe Leu His Val Leu Leu Pro Val Ala Leu Glu Phe Gln Pro 325 330 335 Gln Leu Val Leu Val Ala Ala Gly Phe Asp Ala Leu Gln Gly Asp Pro 340 345 350 Lys Gly Glu Met Ala Ala Thr Pro Ala Gly Phe Ala Gln Leu Thr His 355 360 365 Leu Leu Met Gly Leu Ala Gly Gly Lys Leu Ile Leu Ser Leu Glu Gly 370 375 380 Gly Tyr Asn Leu Arg Ala Leu Ala Glu Gly Val Ser Ala Ser Leu His 385 390 395 400 Thr Leu Leu Gly Asp Pro Cys Pro Met Leu Glu Ser Pro Gly Ala Pro 405 410 415 Cys Arg Ser Ala Gln Ala Ser Val Ser Cys Ala Leu Glu Ala Leu Glu 420 425 430 Pro Phe Trp Glu Val Leu Val Arg Ser Thr Glu Thr Val Glu Arg Asp 435 440 445 Asn Met Glu Glu Asp Asn Val Glu Glu Ser Glu Glu Glu Gly Pro Trp 450 455 460 Glu Pro Pro Val Leu Pro Ile Leu Thr Trp Pro Val Leu Gln Ser Arg 465 470 475 480 Thr Gly Leu Val Tyr Asp Gln Asn Met Met Asn His Cys Asn Leu Trp 485 490 495 Asp Ser His His Pro Glu Val Pro Gln Arg Ile Leu Arg Ile Met Cys 500 505 510 Arg Leu Glu Glu Leu Gly Leu Ala Gly Arg Cys Leu Thr Leu Thr Pro 515 520 525 Arg Pro Ala Thr Glu Ala Glu Leu Leu Thr Cys His Ser Ala Glu Tyr 530 535 540 Val Gly His Leu Arg Ala Thr Glu Lys Met Lys Thr Arg Glu Leu His 545 550 555 560 Arg Glu Ser Ser Asn Phe Asp Ser Ile Tyr Ile Cys Pro Ser Thr Phe 565 570 575 Ala Cys Ala Gln Leu Ala Thr Gly Ala Ala Cys Arg Leu Val Glu Ala 580 585 590 Val Leu Ser Gly Glu Val Leu Asn Gly Ala Ala Val Val Arg Pro Pro 595 600 605 Gly His His Ala Glu Gln Asp Ala Ala Cys Gly Phe Cys Phe Phe Asn 610 615 620 Ser Val Ala Val Ala Ala Arg His Ala Gln Thr Ile Ser Gly His Ala 625 630 635 640 Leu Arg Ile Leu Ile Val Asp Trp Asp Val His His Gly Asn Gly Thr 645 650 655 Gln His Met Phe Glu Asp Asp Pro Ser Val Leu Tyr Val Ser Leu His 660 665 670 Arg Tyr Asp His Gly Thr Phe Phe Pro Met Gly Asp Glu Gly Ala Ser 675 680 685 Ser Gln Ile Gly Arg Ala Ala Gly Thr Gly Phe Thr Val Asn Val Ala 690 695 700 Trp Asn Gly Pro Arg Met Gly Asp Ala Asp Tyr Leu Ala Ala Trp His 705 710 715 720 Arg Leu Val Leu Pro Ile Ala Tyr Glu Phe 
Asn Pro Glu Leu Val Leu 725 730 735 Val Ser Ala Gly Phe Asp Ala Ala Arg Gly Asp Pro Leu Gly Gly Cys 740 745 750 Gln Val Ser Pro Glu Gly Tyr Ala His Leu Thr His Leu Leu Met Gly 755 760 765 Leu Ala Ser Gly Arg Ile Ile Leu Ile Leu Glu Gly Gly Tyr Asn Leu 770 775 780 Thr Ser Ile Ser Glu Ser Met Ala Ala Cys Thr Arg Ser Leu Leu Gly 785 790 795 800 Asp Pro Pro Pro Leu Leu Thr Leu Pro Arg Pro Pro Leu Ser Gly Ala 805 810 815 Leu Ala Ser Ile Thr Glu Thr Ile Gln Val His Arg Arg Tyr Trp Arg 820 825 830 Ser Leu Arg Val Met Lys Val Glu Asp Arg Glu Gly Pro Ser Ser Ser 835 840 845 Lys Leu Val Thr Lys Lys Ala Pro Gln Pro Ala Lys Pro Arg Leu Ala 850 855 860 Glu Arg Met Thr Thr Arg Glu Lys Lys Val Leu Glu Ala Gly Met Gly 865 870 875 880 Lys Val Thr Ser Ala Ser Phe Gly Glu Glu Ser Thr Pro Gly Gln Thr 885 890 895 Asn Ser Glu Thr Ala Val Val Ala Leu Thr Gln Asp Gln Pro Ser Glu 900 905 910 Ala Ala Thr Gly Gly Ala Thr Leu Ala Gln Thr Ile Ser Glu Ala Ala 915 920 925 Ile Gly Gly Ala Met Leu Gly Gln Thr Thr Ser Glu Glu Ala Val Gly 930 935 940
Gly Ala Thr Pro Asp Gln Thr Thr Ser Glu Glu Thr Val Gly Gly Ala 945 950 955 960 Ile Leu Asp Gln Thr Thr Ser Glu Asp Ala Val Gly Gly Ala Thr Leu 965 970 975 Gly Gln Thr Thr Ser Glu Glu Ala Val Gly Gly Ala Thr Leu Ala Gln 980 985 990 Thr Thr Ser Glu Ala Ala Met Glu Gly Ala Thr Leu Asp Gln Thr Thr 995 1000 1005 Ser Glu Glu Ala Pro Gly Gly Thr Glu Leu Ile Gln Thr Pro Leu 1010 1015 1020 Ala Ser Ser Thr Asp His Gln Thr Pro Pro Thr Ser Pro Val Gln 1025 1030 1035 Gly Thr Thr Pro Gln Ile Ser Pro Ser Thr Leu Ile Gly Ser Leu 1040 1045 1050 Arg Thr Leu Glu Leu Gly Ser Glu Ser Gln Gly Ala Ser Glu Ser 1055 1060 1065 Gln Ala Pro Gly Glu Glu Asn Leu Leu Gly Glu Ala Ala Gly Gly 1070 1075 1080 Gln Asp Met Ala Asp Ser Met Leu Met Gln Gly Ser Arg Gly Leu 1085 1090 1095 Thr Asp Gln Ala Ile Phe Tyr Ala Val Thr Pro Leu Pro Trp Cys 1100 1105 1110 Pro His Leu Val Ala Val Cys Pro Ile Pro Ala Ala Gly Leu Asp 1115 1120 1125 Val Thr Gln Pro Cys Gly Asp Cys Gly Thr Ile Gln Glu Asn Trp 1130 1135 1140 Val Cys Leu Ser Cys Tyr Gln Val Tyr Cys Gly Arg Tyr Ile Asn 1145 1150 1155 Gly His Met Leu Gln His His Gly Asn Ser Gly His Pro Leu Val 1160 1165 1170 Leu Ser Tyr Ile Asp Leu Ser Ala Trp Cys Tyr Tyr Cys Gln Ala 1175 1180 1185 Tyr Val His His Gln Ala Leu Leu Asp Val Lys Asn Ile Ala His 1190 1195 1200 Gln Asn Lys Phe Gly Glu Asp Met Pro His Pro His 1205 1210 1215 15491PRTUnknownRABEX5 15Met Ser Leu Lys Ser Glu Arg Arg Gly Ile His Val Asp Gln Ser Asp 1 5 10 15 Leu Leu Cys Lys Lys Gly Cys Gly Tyr Tyr Gly Asn Pro Ala Trp Gln 20 25 30 Gly Phe Cys Ser Lys Cys Trp Arg Glu Glu Tyr His Lys Ala Arg Gln 35 40 45 Lys Gln Ile Gln Glu Asp Trp Glu Leu Ala Glu Arg Leu Gln Arg Glu 50 55 60 Glu Glu Glu Ala Phe Ala Ser Ser Gln Ser Ser Gln Gly Ala Gln Ser 65 70 75 80 Leu Thr Phe Ser Lys Phe Glu Glu Lys Lys Thr Asn Glu Lys Thr Arg 85 90 95 Lys Val Thr Thr Val Lys Lys Phe Phe Ser Ala Ser Ser Arg Val Gly 100 105 110 Ser Lys Lys Glu Ile Gln Glu Ala Lys Ala Pro Ser Pro Ser Ile Asn 115 120 125 Arg Gln Thr Ser Ile Glu Thr Asp 
Arg Val Ser Lys Glu Phe Ile Glu 130 135 140 Phe Leu Lys Thr Phe His Lys Thr Gly Gln Glu Ile Tyr Lys Gln Thr 145 150 155 160 Lys Leu Phe Leu Glu Gly Met His Tyr Lys Arg Asp Leu Ser Ile Glu 165 170 175 Glu Gln Ser Glu Cys Ala Gln Asp Phe Tyr His Asn Val Ala Glu Arg 180 185 190 Met Gln Thr Arg Gly Lys Val Pro Pro Glu Arg Val Glu Lys Ile Met 195 200 205 Asp Gln Ile Glu Lys Tyr Ile Met Thr Arg Leu Tyr Lys Tyr Val Phe 210 215 220 Cys Pro Glu Thr Thr Asp Asp Glu Lys Lys Asp Leu Ala Ile Gln Lys 225 230 235 240 Arg Ile Arg Ala Leu Arg Trp Val Thr Pro Gln Met Leu Cys Val Pro 245 250 255 Val Asn Glu Asp Ile Pro Glu Val Ser Asp Met Val Val Lys Ala Ile 260 265 270 Thr Asp Ile Ile Glu Met Asp Ser Lys Arg Val Pro Arg Asp Lys Leu 275 280 285 Ala Cys Ile Thr Lys Cys Ser Lys His Ile Phe Asn Ala Ile Lys Ile 290 295 300 Thr Lys Asn Glu Pro Ala Ser Ala Asp Asp Phe Leu Pro Thr Leu Ile 305 310 315 320 Tyr Ile Val Leu Lys Gly Asn Pro Pro Arg Leu Gln Ser Asn Ile Gln 325 330 335 Tyr Ile Thr Arg Phe Cys Asn Pro Ser Arg Leu Met Thr Gly Glu Asp 340 345 350 Gly Tyr Tyr Phe Thr Asn Leu Cys Cys Ala Val Ala Phe Ile Glu Lys 355 360 365 Leu Asp Ala Gln Ser Leu Asn Leu Ser Gln Glu Asp Phe Asp Arg Tyr 370 375 380 Met Ser Gly Gln Thr Ser Pro Arg Lys Gln Glu Ala Glu Ser Trp Ser 385 390 395 400 Pro Asp Ala Cys Leu Gly Val Lys Gln Met Tyr Lys Asn Leu Asp Leu 405 410 415 Leu Ser Gln Leu Asn Glu Arg Gln Glu Arg Ile Met Asn Glu Ala Lys 420 425 430 Lys Leu Glu Lys Asp Leu Ile Asp Trp Thr Asp Gly Ile Ala Arg Glu 435 440 445 Val Gln Asp Ile Val Glu Lys Tyr Pro Leu Glu Ile Lys Pro Pro Asn 450 455 460 Gln Pro Leu Ala Ala Ile Asp Ser Glu Asn Val Glu Asn Asp Lys Leu 465 470 475 480 Pro Pro Pro Leu Gln Pro Gln Val Tyr Ala Gly 485 490 16660PRTUnknownNPL4 16Leu Glu Arg Arg Trp Arg Arg Arg Arg Glu Ala Gly Ala Gly Ala Glu 1 5 10 15 Ala Ala Ala Gly Ser Ala Arg Pro Leu Gly Arg Gln Ala Ala Ala Ala 20 25 30 Arg Gly Ser Ser Pro Glu Ala Gly Ala Ala Ala Met Ala Glu Ser Ile 35 40 45 Ile Ile Arg Val Gln Ser Pro Asp Gly Val Lys 
Arg Ile Thr Ala Thr 50 55 60 Lys Arg Glu Thr Ala Ala Thr Phe Leu Lys Lys Val Ala Lys Glu Phe 65 70 75 80 Gly Phe Gln Asn Asn Gly Phe Ser Val Tyr Ile Asn Arg Asn Lys Thr 85 90 95 Gly Glu Ile Thr Ala Ser Ser Asn Lys Ser Leu Asn Leu Leu Lys Ile 100 105 110 Lys His Gly Asp Leu Leu Phe Leu Phe Pro Ser Ser Leu Ala Gly Pro 115 120 125 Ser Ser Glu Met Glu Thr Ser Val Pro Pro Gly Phe Lys Val Phe Gly 130 135 140 Ala Pro Asn Val Val Glu Asp Glu Ile Asp Gln Tyr Leu Ser Lys Gln 145 150 155 160 Asp Gly Lys Ile Tyr Arg Ser Arg Asp Pro Gln Leu Cys Arg His Gly 165 170 175 Pro Leu Gly Lys Cys Val His Cys Val Pro Leu Glu Pro Phe Asp Glu 180 185 190 Asp Tyr Leu Asn His Leu Glu Pro Pro Val Lys His Met Ser Phe His 195 200 205 Ala Tyr Ile Arg Lys Leu Thr Gly Gly Ala Asp Lys Gly Lys Phe Val 210 215 220 Ala Leu Glu Asn Ile Ser Cys Lys Ile Lys Ser Gly Cys Glu Gly His 225 230 235 240 Leu Pro Trp Pro Asn Gly Ile Cys Thr Lys Cys Gln Pro Ser Ala Ile 245 250 255 Thr Leu Asn Arg Gln Lys Tyr Arg His Val Asp Asn Ile Met Phe Glu 260 265 270 Asn His Thr Val Ala Asp Arg Phe Leu Asp Phe Trp Arg Lys Thr Gly 275 280 285 Asn Gln His Phe Gly Tyr Leu Tyr Gly Arg Tyr Thr Glu His Lys Asp 290 295 300 Ile Pro Leu Gly Ile Arg Ala Glu Val Ala Ala Ile Tyr Glu Pro Pro 305 310 315 320 Gln Ile Gly Thr Gln Asn Ser Leu Glu Leu Leu Glu Asp Pro Lys Ala 325 330 335 Glu Val Val Asp Glu Ile Ala Ala Lys Leu Gly Leu Arg Lys Val Gly 340 345 350 Trp Ile Phe Thr Asp Leu Val Ser Glu Asp Thr Arg Lys Gly Thr Val 355 360 365 Arg Tyr Ser Arg Asn Lys Asp Thr Tyr Phe Leu Ser Ser Glu Glu Cys 370 375 380 Ile Thr Ala Gly Asp Phe Gln Asn Lys His Pro Asn Met Cys Arg Leu 385 390 395 400 Ser Pro Asp Gly His Phe Gly Ser Lys Phe Val Thr Ala Val Ala Thr 405 410 415 Gly Gly Pro Asp Asn Gln Val His Phe Glu Gly Tyr Gln Val Ser Asn 420 425 430 Gln Cys Met Ala Leu Val Arg Asp Glu Cys Leu Leu Pro Cys Lys Asp 435 440 445 Ala Pro Glu Leu Gly Tyr Ala Lys Glu Ser Ser Ser Glu Gln Tyr Val 450 455 460 Pro Asp Val Phe Tyr Lys Asp Val Asp Lys Phe Gly Asn Glu 
Ile Thr 465 470 475 480 Gln Leu Ala Arg Pro Leu Pro Val Glu Tyr Leu Ile Ile Asp Ile Thr 485 490 495 Thr Thr Phe Pro Lys Asp Pro Val Tyr Thr Phe Ser Ile Ser Gln Asn 500 505 510 Pro Phe Pro Ile Glu Asn Arg Asp Val Leu Gly Glu Thr Gln Asp Phe 515 520 525 His Ser Leu Ala Thr Tyr Leu Ser Gln Asn Thr Ser Ser Val Phe Leu 530 535 540 Asp Thr Ile Ser Asp Phe His Leu Leu Leu Phe Leu Val Thr Asn Glu 545 550 555 560 Val Met Pro Leu Gln Asp Ser Ile Ser Leu Leu Leu Glu Ala Val Arg 565 570 575 Thr Arg Asn Glu Glu Leu Ala Gln Thr Trp Lys Arg Ser Glu Gln Trp 580 585 590 Ala Thr Ile Glu Gln Leu Cys Ser Glu Tyr Pro His Pro Leu Pro Arg 595 600 605 His Pro Val Ala Gly Ala Gly Glu Gln Pro Thr Leu His Ser Ser Pro 610 615 620 Leu Pro Val Val Pro Trp Ile Pro His Pro Ala Ala Ser Trp Gln Val 625 630 635 640 Pro Ser Ala Met Gln Arg Val Glu Thr Arg Pro Pro Cys Gln Ala Arg 645 650 655 Gly Arg Leu Arg 660 17693PRTUnknownTAB2 17Met Ala Gln Gly Ser His Gln Ile Asp Phe Gln Val Leu His Asp Leu 1 5 10 15 Arg Gln Lys Phe Pro Glu Val Pro Glu Val Val Val Ser Arg Cys Met 20 25 30 Leu Gln Asn Asn Asn Asn Leu Asp Ala Cys Cys Ala Val Leu Ser Gln 35 40 45 Glu Ser Thr Arg Tyr Leu Tyr Gly Glu Gly Asp Leu Asn Phe Ser Asp 50 55 60 Asp Ser Gly Ile Ser Gly Leu Arg Asn His Met Thr Ser Leu Asn Leu 65 70 75 80 Asp Leu Gln Ser Gln Asn Ile Tyr His His Gly Arg Glu Gly Ser Arg 85 90 95 Met Asn Gly Ser Arg Thr Leu Thr His Ser Ile Ser Asp Gly Gln Leu 100 105 110 Gln Gly Gly Gln Ser Asn Ser Glu Leu Phe Gln Gln Glu Pro Gln Thr 115 120 125 Ala Pro Ala Gln Val Pro Gln Gly Phe Asn Val Phe Gly Met Ser Ser 130 135 140 Ser Ser Gly Ala Ser Asn Ser Ala Pro His Leu Gly Phe His Leu Gly 145 150 155 160 Ser Lys Gly Thr Ser Ser Leu Ser Gln Gln Thr Pro Arg Phe Asn Pro 165 170 175 Ile Met Val Thr Leu Ala Pro Asn Ile Gln Thr Gly Arg Asn Thr Pro 180 185 190 Thr Ser Leu His Ile His Gly Val Pro Pro Pro Val Leu Asn Ser Pro 195 200 205 Gln Gly Asn Ser Ile Tyr Ile Arg Pro Tyr Ile Thr Thr Pro Gly Gly 210 215 220 Thr Thr Arg Gln Thr Gln Gln His Ser 
Gly Trp Val Ser Gln Phe Asn 225 230 235 240 Pro Met Asn Pro Gln Gln Val Tyr Gln Pro Ser Gln Pro Gly Pro Trp 245 250 255 Thr Thr Cys Pro Ala Ser Asn Pro Leu Ser His Thr Ser Ser Gln Gln 260 265 270 Pro Asn Gln Gln Gly His Gln Thr Ser His Val Tyr Met Pro Ile Ser 275 280 285 Ser Pro Thr Thr Ser Gln Pro Pro Thr Ile His Ser Ser Gly Ser Ser 290 295 300 Gln Ser Ser Ala His Ser Gln Tyr Asn Ile Gln Asn Ile Ser Thr Gly 305 310 315 320 Pro Arg Lys Asn Gln Ile Glu Ile Lys Leu Glu Pro Pro Gln Arg Asn 325 330 335 Asn Ser Ser Lys Leu Arg Ser Ser Gly Pro Arg Thr Ser Ser Thr Ser 340 345 350 Ser Ser Val Asn Ser Gln Thr Leu Asn Arg Asn Gln Pro Thr Val Tyr 355 360 365 Ile Ala Ala Ser Pro Pro Asn Thr Asp Glu Leu Met Ser Arg Ser Gln 370 375 380 Pro Lys Val Tyr Ile Ser Ala Asn Ala Ala Thr Gly Asp Glu Gln Val 385 390 395 400 Met Arg Asn Gln Pro Thr Leu Phe Ile Ser Thr Asn Ser Gly Ala Ser 405 410 415 Ala Ala Ser Arg Asn Met Ser Gly Gln Val Ser Met Gly Pro Ala Phe 420 425 430 Ile His His His Pro Pro Lys Ser Arg Ala Ile Gly Asn Asn Ser Ala 435 440 445 Thr Ser Pro Arg Val Val Val Thr Gln Pro Asn Thr Lys Tyr Thr Phe 450 455 460 Lys Ile Thr Val Ser Pro Asn Lys Pro Pro Ala Val Ser Pro Gly Val 465 470 475 480 Val Ser Pro Thr Phe Glu Leu Thr Asn Leu Leu Asn His Pro Asp His 485 490 495 Tyr Val Glu Thr Glu Asn Ile Gln His Leu Thr Asp Pro Thr Leu Ala 500 505 510 His Val Asp Arg Ile Ser Glu Thr Arg Lys Leu Ser Met Gly Ser Asp 515 520 525 Asp Ala Ala Tyr Thr Gln Ala Leu Leu Val His Gln Lys Ala Arg Met 530 535 540 Glu Arg Leu Gln Arg Glu Leu Glu Ile Gln Lys Lys Lys Leu Asp Lys 545 550 555 560 Leu Lys Ser Glu Val Asn Glu Met Glu Asn Asn Leu Thr Arg Arg Arg 565 570 575 Leu Lys Arg Ser Asn Ser Ile Ser Gln Ile Pro Ser Leu Glu Glu Met 580 585 590 Gln Gln Leu Arg Ser Cys Asn Arg Gln Leu Gln Ile Asp Ile Asp Cys 595 600 605 Leu Thr Lys Glu Ile Asp Leu Phe Gln Ala Arg Gly Pro His Phe Asn 610 615 620 Pro Ser Ala Ile His Asn Phe Tyr Asp Asn Ile Gly Phe Val Gly Pro 625 630 635 640 Val Pro Pro Lys Pro Lys Asp Gln Arg 
Ser Ile Ile Lys Thr Pro Lys 645 650 655 Thr Gln Asp Thr Glu Asp Asp Glu Gly Ala Gln Trp Asn Cys Thr Ala 660 665 670 Cys Thr Phe Leu Asn His Pro Ala Leu Ile Arg Cys Glu Gln Cys Glu 675 680 685 Met Pro Arg His Phe 690 18835PRTUnknownIso T 18Met Ala Glu Leu Ser Glu Glu Ala Leu Leu Ser Val Leu Pro Thr Ile 1 5 10 15 Arg Val Pro Lys Ala Gly Asp Arg Val His Lys Asp Glu Cys Ala Phe 20 25 30 Ser Phe Asp Thr Pro Glu Ser Glu Gly Gly Leu Tyr Ile Cys Met Asn 35 40 45 Thr Phe Leu Gly Phe Gly Lys Gln Tyr Val Glu Arg His Phe Asn Lys 50 55 60 Thr Gly Gln Arg Val Tyr Leu His Leu Arg Arg Thr Arg Arg Pro Lys 65 70 75 80 Glu Glu Asp Pro Ala Thr Gly Thr Gly Asp Pro Pro Arg Lys Lys Pro 85 90 95 Thr Arg Leu Ala Ile Gly Val Glu Gly Gly Phe Asp Leu Ser Glu Glu 100 105 110 Lys Phe Glu Leu Asp Glu Asp Val Lys Ile Val Ile Leu Pro Asp Tyr 115 120 125 Leu Glu Ile Ala Arg Asp Gly Leu Gly Gly Leu Pro Asp Ile Val Arg 130 135 140 Asp Arg Val Thr Ser Ala Val Glu Ala Leu Leu Ser Ala Asp Ser Ala 145 150 155 160 Ser Arg Lys Gln Glu Val Gln Ala Trp Asp Gly Glu Val Arg Gln Val
165 170 175 Ser Lys His Ala Phe Ser Leu Lys Gln Leu Asp Asn Pro Ala Arg Ile 180 185 190 Pro Pro Cys Gly Trp Lys Cys Ser Lys Cys Asp Met Arg Glu Asn Leu 195 200 205 Trp Leu Asn Leu Thr Asp Gly Ser Ile Leu Cys Gly Arg Arg Tyr Phe 210 215 220 Asp Gly Ser Gly Gly Asn Asn His Ala Val Glu His Tyr Arg Glu Thr 225 230 235 240 Gly Tyr Pro Leu Ala Val Lys Leu Gly Thr Ile Thr Pro Asp Gly Ala 245 250 255 Asp Val Tyr Ser Tyr Asp Glu Asp Asp Met Val Leu Asp Pro Ser Leu 260 265 270 Ala Glu His Leu Ser His Phe Gly Ile Asp Met Leu Lys Met Gln Lys 275 280 285 Thr Asp Lys Thr Met Thr Glu Leu Glu Ile Asp Met Asn Gln Arg Ile 290 295 300 Gly Glu Trp Glu Leu Ile Gln Glu Ser Gly Val Pro Leu Lys Pro Leu 305 310 315 320 Phe Gly Pro Gly Tyr Thr Gly Ile Arg Asn Leu Gly Asn Ser Cys Tyr 325 330 335 Leu Asn Ser Val Val Gln Val Leu Phe Ser Ile Pro Asp Phe Gln Arg 340 345 350 Lys Tyr Val Asp Lys Leu Glu Lys Ile Phe Gln Asn Ala Pro Thr Asp 355 360 365 Pro Thr Gln Asp Phe Ser Thr Gln Val Ala Lys Leu Gly His Gly Leu 370 375 380 Leu Ser Gly Glu Tyr Ser Lys Pro Val Pro Glu Ser Gly Asp Gly Glu 385 390 395 400 Arg Val Pro Glu Gln Lys Glu Val Gln Asp Gly Ile Ala Pro Arg Met 405 410 415 Phe Lys Ala Leu Ile Gly Lys Gly His Pro Glu Phe Ser Thr Asn Arg 420 425 430 Gln Gln Asp Ala Gln Glu Phe Phe Leu His Leu Ile Asn Met Val Glu 435 440 445 Arg Asn Cys Arg Ser Ser Glu Asn Pro Asn Glu Val Phe Arg Phe Leu 450 455 460 Val Glu Glu Arg Ile Lys Cys Leu Ala Thr Glu Lys Val Lys Tyr Thr 465 470 475 480 Gln Arg Val Asp Tyr Ile Met Gln Leu Pro Val Pro Met Asp Ala Ala 485 490 495 Leu Asn Lys Glu Glu Leu Leu Glu Tyr Glu Glu Lys Lys Arg Gln Ala 500 505 510 Glu Glu Glu Lys Met Ala Leu Pro Glu Leu Val Arg Ala Gln Val Pro 515 520 525 Phe Ser Ser Cys Leu Glu Ala Tyr Gly Ala Pro Glu Gln Val Asp Asp 530 535 540 Phe Trp Ser Thr Ala Leu Gln Ala Lys Ser Val Ala Val Lys Thr Thr 545 550 555 560 Arg Phe Ala Ser Phe Pro Asp Tyr Leu Val Ile Gln Ile Lys Lys Phe 565 570 575 Thr Phe Gly Leu Asp Trp Val Pro Lys Lys Leu Asp Val Ser Ile Glu 580 
585 590 Met Pro Glu Glu Leu Asp Ile Ser Gln Leu Arg Gly Thr Gly Leu Gln 595 600 605 Pro Gly Glu Glu Glu Leu Pro Asp Ile Ala Pro Pro Leu Val Thr Pro 610 615 620 Asp Glu Pro Lys Ala Pro Met Leu Asp Glu Ser Val Ile Ile Gln Leu 625 630 635 640 Val Glu Met Gly Phe Pro Met Asp Ala Cys Arg Lys Ala Val Tyr Tyr 645 650 655 Thr Asp Asn Ser Gly Ala Glu Ala Ala Met Asn Trp Val Met Ser His 660 665 670 Met Asp Asp Pro Asp Phe Ala Asn Pro Leu Ile Leu Pro Gly Ser Ser 675 680 685 Gly Pro Gly Ser Thr Ser Ala Ala Ala Asp Pro Pro Pro Glu Asp Cys 690 695 700 Val Thr Thr Ile Val Ser Met Gly Phe Ser Arg Asp Gln Ala Leu Lys 705 710 715 720 Ala Leu Arg Ala Thr Asn Asn Ser Leu Glu Arg Ala Val Asp Trp Ile 725 730 735 Phe Ser His Ile Asp Asp Leu Asp Ala Glu Ala Ala Met Asp Ile Ser 740 745 750 Glu Gly Arg Ser Ala Ala Asp Ser Ile Ser Glu Ser Val Pro Val Gly 755 760 765 Pro Lys Val Arg Asp Gly Pro Gly Lys Tyr Gln Leu Phe Ala Phe Ile 770 775 780 Ser His Met Gly Thr Ser Thr Met Cys Gly His Tyr Val Cys His Ile 785 790 795 800 Lys Lys Glu Gly Arg Trp Val Ile Tyr Asn Asp Gln Lys Val Cys Ala 805 810 815 Ser Glu Lys Pro Pro Lys Asp Leu Gly Tyr Ile Tyr Phe Tyr Gln Arg 820 825 830 Val Ala Ser 835 19142PRTArtificial sequenceSynthetic sequence Ubc consensus 19Ser Lys Arg Leu Gln Lys Glu Leu Lys Asp Leu Lys Lys Asp Pro Pro 1 5 10 15 Ser Gly Ile Ser Ala Glu Pro Val Glu Glu Asn Leu Leu Glu Trp His 20 25 30 Gly Thr Ile Arg Gly Pro Pro Asp Thr Pro Tyr Glu Gly Gly Ile Phe 35 40 45 Lys Leu Asp Ile Glu Phe Pro Glu Asp Tyr Pro Phe Lys Pro Pro Lys 50 55 60 Val Arg Phe Val Thr Lys Ile Tyr His Pro Pro Asn Val Asp Glu Asn 65 70 75 80 Gly Lys Ile Cys Leu Ser Ile Leu Lys Thr His Gly Trp Ser Pro Ala 85 90 95 Tyr Thr Leu Arg Thr Val Leu Leu Ser Leu Gln Ser Leu Leu Asn Glu 100 105 110 Pro Asn Pro Ser Asp Pro Leu Asn Ala Glu Ala Ala Lys Leu Tyr Lys 115 120 125 Glu Asn Arg Glu Glu Phe Lys Lys Lys Ala Arg Glu Trp Thr 130 135 140 2016PRTArtificial sequenceSynthetic sequence Ubc Motif Consensus 20Xaa His Xaa Xaa Xaa Xaa Xaa Xaa 
Xaa Gly Xaa Xaa Cys Xaa Xaa Xaa 1 5 10 15 2137DNAArtificial sequenceSynthetic sequence UBP-FW Primer 21ccaaggttcc atggtacggc aggtgtctaa gcatgcc 372244DNAArtificial sequenceSynthetic sequence UBP-RV Primer 22gcctagcggc cgcttatgtc ttctgcatct tcagcatgtc gatg 442336DNAArtificial sequenceSynthetic sequence NZFfus663FW Primer 23ccaaggttcc atggatgagg gagctcagtg gaattg 362437DNAArtificial sequenceSynthetic sequence NZFfus693RV Primer 24gcctagcggc cgcttatcag aaatgccttg gcatctc 37254PRTArtificial sequenceSynthetic sequence Lysine residue addition 25Leu Arg Gly Gly 1 26783DNAHomo sapiens 26ggcacgagcg agttcctgtc tctctgccaa cgccgcccgg atggcttccc aaaaccgcga 60cccagccgcc actagcgtcg ccgccgcccg taaaggagct gagccgagcg ggggcgccgc 120ccggggtccg gtgggcaaaa ggctacagca ggagctgatg accctcatga tgtctggcga 180taaagggatt tctgccttcc ctgaatcaga caaccttttc aaatgggtag ggaccatcca 240tggagcagct ggaacagtat atgaagacct gaggtataag ctctcgctag agttccccag 300tggctaccct tacaatgcgc ccacagtgaa gttcctcacg ccctgctatc accccaacgt 360ggacacccag ggtaacatat gcctggacat cctgaaggaa aagtggtctg ccctgtatga 420tgtcaggacc attctgctct ccatccagag ccttctagga gaacccaaca ttgatagtcc 480cttgaacaca catgctgccg agctctggaa aaaccccaca gcttttaaga agtacctgca 540agaaacctac tcaaagcagg tcaccagcca ggagccctga cccaggctgc ccagcctgtc 600cttgtgtcgt ctttttaatt tttccttaga tggtctgtcc tttttgtgat ttctgtatag 660gactctttat cttgagctgt ggtatttttg ttttgttttt gtcttttaaa ttaagcctcg 720gttgagccct tgtatattaa ataaatgcat ttttgtcctt ttttaaaaaa aaaaaaaaaa 780aaa 783272669DNAHomo sapiens 27aaaagagtct cgccggcgtc cccgcccgca cactcgcgca cactcgcgct cgggcgcaca 60cggagcaggg accggcgccc ggagcgagcc agggagcggc taaccgggga cccaccgcgc 120ggagccagcc tagctgccag cgagcccaac ccgcgacgac ccacgcccct gagccccgca 180gccgacccct gccggccggt gtccccaccg ccatccctga cccatggcgc tgaagaggat 240tcagaaagaa ttgagtgatc tacagcgcga tccacctgct cactgttcag ctggacctgt 300gggagatgac ttgttccact ggcaagccac tattatgggg cctcctgata gcgcatatca 360aggtggagtc ttctttctca ctgtacattt tccgacagat tatcctttta aaccaccaaa 
420gattgctttc acaacaaaaa tttaccatcc aaacataaac agtaatggaa gtatttgtct 480cgatattctg aggtcacaat ggtcaccagc tctgactgta tcaaaagttt tattgtccat 540atgttctcta ctttgtgatc ctaatccaga tgacccctta gtaccagata ttgcacaaat 600ctataaatca gacaaagaaa aatacaacag acatgcaaga gaatggactc agaaatatgc 660aatgtaaaaa tcaaaaacat tttcatatat accagagtac tgtaaaatct aggttttttt 720caacattagc agtaaattga gcactgttta ctgtttcatt gtaccatgaa accatttgat 780ttttacccat tttaaatgtg tttctgaagc aagacaaaac aaacttccaa aaataccctt 840aagactgtga tgagagcatt tatcattttg tatgcattga gaaagacatt tattatggtt 900tttaagatac ttggacatct gcatcttcag cttacaagat ctacaatgca gctgaaaagc 960aaccaaatta ttttttgctg aaactagatg tttttacatg agaaatactg tatgtgttgt 1020ctaagatgtc agttttataa atctgtattc agatttcatt ctttgttagc tcactttata 1080atttgtattt ttttactgta tagactaaat atattctatt tacatgtatg tcaactcatt 1140acttttttcc tgtgaacagt attgaaaaac cccaacggct gataattaag tgaattaact 1200gtgtctccct tgtcttagga tattctgtag attgattgca gatttcttaa atctgaaatg 1260atctttacac tgtaattctc agcatactga ttatggagaa acacttgttt tgattttgtt 1320atacttgact taactttatt gcaatgtgaa ttaattgcac tgctaagtag gaagatgtgt 1380aacttttatt tgttgctatt cacatttgaa ttttttcctg tataggcaat attatattga 1440caccttttac agatcttact gtagcttttt ccatataaat aaaatgcttt ttctactatt 1500tgtcttgatt acttaaaaaa ataaaaatat aagtaaggat caaaactcta aaattttgca 1560tgaaaattac atccaaattg tgaaaatcag atctattttg tttgccatta gtcaccatta 1620gttatataaa ttttattgtt ttaggttagt atctctttac taaattgtca gtctataaga 1680taatatatgt tgatcccttg ctgtagagga gaatttagag taatttgggg tttgtcttgg 1740attatatcta aatggattat ttgttaaaag tactgaaatg agtataaggc agtatcaccc 1800atccaaaaga aaggtcttta tagacctgca cagtcactag attaattcat taaaatgccc 1860ccaccctgat gtaattgaca ttacatttct taacatttta aaatctagaa tttctaaaat 1920ggaatttaat gccatcacaa tttgaaaaac tttttttttt tttttactat agaagttaca 1980aaggaagttc taaaattatg cctccctctg tttttataag ttgccatcga aaagtgattt 2040aaataagcag gttatcttta tagattttaa agaaaactag aaagttttaa tgttttaact 2100tggggaaaaa tacatctctt taatgtttag catgcttgtc 
aaccttgagt gagtgtcatt 2160tttaagaaca gttgtagccc ttctgattat tgcagtagct gtagaagtat gtaagaatat 2220gtgatgggtg tagtcattag caaagcattt aaatcacttg agtattttgt catggttcat 2280tattattaaa gcacaaaata acctattgtt agaaaatatg tgtttttata aatgaatgta 2340aaataattaa atgaattgtg aaatggatgt ttaagaaaat ataggcttaa aaagtaaatc 2400tataaaatga tgtcttaaaa cagccatatc atgaaaaatt ctacttagct atattattat 2460aagctacatt tgccctgaat ttgaacactc aacatcacta gatttaaata tttagtatat 2520tttgatagta aagggttttg tttcttgaat atcttcactt taaacaaaaa aaaaaaacaa 2580ctttcatttg tgtggcattt atttttggaa gtgtcttctt ttttttcttt attaaagttt 2640ttgaaacttg caaaaaaaaa aaaaaaaaa 2669282879DNAHomo sapiens 28gcttcgcagc gtcacgccct ccggggccgt ggcggcgacg gcggtgcgta gcttactcac 60aggggcggcc cgtatccctc cgccgccggc gcggctcggc cctccctccc ctggcccgcc 120aatccccgcg cctcccgacc tgcccctcgg tcgggcccac cccgtgctcc gacggcccca 180ccccggcggc gcagcccgcc cgcccgcgcg tccctcggtc cacctgcagc agggaggaag 240acaggcaatc cctccggctg tccgaccaag agaggccggc cgagcccgag gcttgggctt 300ttgctttctg gcggagggat ctgcggcggt ttaggaggcg gcgctgatcc tgggaggaag 360aggcagctac ggcggcggcg gcggtggcgg ctagggcggc ggcgaataaa ggggccgccg 420ccgggtgatg cggtgaccgc tgcggcaggc ccaggagctg agtgggcccc ggccctcagc 480ccgtcccgcc ggacccgctt tcctcaactc tccatcttct cctgccgacc gagatcgccg 540aggcggcctc aggctcccta gccccttccc cgtcccttcc ccgcccccgt ccccgccccg 600ggggccgccg ccacccgcct cccaccatgg ctctgaagag aatccacaag ctccctccac 660aaaaccgcct gagctcgggc tgacagagga agccgttttg cccgatccac aagtatatcc 720tgagttcact tacctcttgg gtggcagcac acatcggtcc accctgcttg tccagaaact 780gttaagagtt ggaagttcag aagaaaaaaa aaaggaattg aatgatctgg cacgggaccc 840tccagcacag tgttcagcag gtcctgttgg agatgatatg ttccattggc aagctacaat 900aatggggcca aatgacagtc cctatcaggg tggagtattt ttcttgacaa ttcatttccc 960aacagattac cccttcaaac cacctaaggt tgcatttaca acaagaattt atcatccaaa 1020tattaacagt aatggcagca tttgtcttga tattctacga tcacagtggt ctccagcact 1080aactatttca aaagtactct tgtccatctg ttctctgttg tgtgatccca atccagatga 1140tcctttagtg cctgagattg ctcggatcta 
caaaacagat agagaaaagt acaacagaat 1200agctcgggaa tggactcaga agtatgcgat gtaattaaag aaattattgg ataacctcta 1260caaataaaga taggggaact ctgaaagaga aagtcctttt gatttccatt tgactgcttt 1320ctatgagccc acgcctcatc ttcccctgtg cacatgttta cctgatacag cagtgctgcg 1380tgttgtacat acttggaaca acaaactaga aatactgtac ttctgtacca acattgcctc 1440ctagcagaga agtgtgtgtg tgacaagcca gttctacagg cattacctag gtgtgagact 1500aaaagctttt cttattgact taaatttgga taacagcaag gtgtgagggg ggtggtgggt 1560atggtgtgtg cttggatggg aaagaaaagg ctccactcac ctataggaga ttatttttaa 1620gtggaatcca tttaaactca aaacagttat gaaaagcaag gtgaagaaca tgaagctgtg 1680tctgtattca ttttattccg aaggagctac gtcttaggtg aaagttatga ccaaccagat 1740taaactctac ccacatcctg cattttaagg tctaagttta actggtcaac atttaaatgg 1800attggagcta ttagtacatc aagtgtgatg ggctttgttc ccaactcttt tacatctccc 1860taccccttca acctttggcc tttcagccct tctttctctc ttccatattc tttggtttgt 1920atgtggtttc tcagttaata catagctaat agctcttatt tttcttatgt ttttaaccgc 1980ttaggtctat ttggatgtaa gggtgaaaat tcatttgatg gaaatacttg tgtatattta 2040aagacccaat tgctcctctg gagcttgtac tttcaagaat gattaatctg tgtaataaac 2100tggttactac agtcattaca tataattttg tgtgaatagg ctttttcatt tttaagaagt 2160ttgtctagct gagattagtg gtggattttc tcccacttct gaaatgttca tttatactgg 2220ttgcatttta agatcatgaa acaattccag ttacattgta aaaaggatat cttacgagta 2280attttattga acaagttaga ggcataagct taagagcatt tccatgaaac aacacatgca 2340gcattccagg aacttgattg ttaaattcaa taagaaattt gctttattaa tgaaactaag 2400ctgcatttca tcaaaacctt gtgacattcc cttggtacat aggacataaa acacagaggc 2460attgctattt ggtaagttaa gcttctgtga ttgtaattat aaaagagcaa cattgaccaa 2520acctgggaaa caagagcaca gtcttgtttg gagagtctac ataattactt tgcactaaca 2580tttgcaggat gttcacacaa ttttaaattg tactgtatgt ggctttttga agtcttccct 2640tgaccctagt aaaatatagc ttgaaacttg taaacaactg tgtttgccag aaacatcatt 2700catgtgaact aggcaagtta ccttttttcc ccccttcttt tcctaattgt aaactaggcc 2760aacctgaaag ccatggctga tgctctagcc atcaggttct ttcaaatgca tctttacact 2820cttgcacaaa agttaaggaa taaatgtcca ctgcttttgg ttttaaaaaa aaaaaaaaa 
2879292006DNAHomo sapiens 29ggaatctcgt gtgaaggtgg ccctcctctt gggcctttaa cgtctgtaga tgctggagac 60cagcagaaag gatactgtgt gcgatgagat aagcatgtga gaatgctttc taaccgaaag 120tgcctttcaa aagaacttag tgatttggcc cgtgaccctc cagcacaatg ttctgcaggt 180ccagttgggg atgatatgtt tcattggcaa gccacaatta tgggacctaa tgacagccca 240tatcaaggcg gtgtattctt tttgacaatt cattttccta cagactaccc cttcaaacca 300cctaaggttg catttacaac aagaatttat catccaaata ttaacagtaa tggcagcatt 360tgtctcgata ttctaagatc acagtggtcg cctgctttaa caatttctaa agttctttta 420tccatttgtt cactgctatg tgatccaaac ccagatgacc ccctagtgcc agagattgca 480cggatctata aaacagacag agataagtac aacagaatat ctcgggaatg gactcagaag 540tatgccatgt gatgctacct taaagtcaga ataacctgca ttatagctgg aataaacttt 600aaattactgt tccttttttg attttcttat ccggctgctc ccctatcaga cctcatcttt 660tttaatttta ttttttgttt acctccctcc attcattcac atgctcatct gagaagactt 720aagttcttcc agctttggac aataactgct tttagaaact gtaaagtagt tacaagagaa 780cagttgccca agactcagaa tttttaaaaa aaaaaatgga gcatgtgtat tatgtggcca 840atgtcttcac tctaacttgg ttatgagact aaaaccattc ctcactgctc taacatgctg 900aagaaatcat ctgaggggga gggagatgga tgctcagttg tcacatcaaa ggatacagca 960ttattctagc agcatccatt cttgtttaag ccttccactg ttagagattt gaggttacat 1020gatatgcttt atgctcataa ctgatgtggc tggagaattg gtattgaatt tatagcatca 1080gcagaacaga aaatgtgatg tattttatgc atgtcaataa aggaatgacc tgttcttgtt 1140ctacagagaa tggaaattgg aagtcaaaca ccctttgtat tccaaaatag ggtctcaaac 1200attttgtaat tttcatttaa attgttagga ggcttggagc tattagttaa tctatcttcc 1260aatacactgt ttaatatagc actgaataaa tgatgcaagt tgtcaatgga tgagtgatca 1320actaatagct ctgctagtaa ttgatttatt tttcttcaat aaagttgcat aaaccaatga 1380gttagctgcc tggattaatc agtatgggaa acaatctttt gtaaatgcaa agctgttttt 1440tgtatatact gttgggattt gcttcattgt ttgacatcaa atgatgatgt aaagttcgaa 1500agagtgaata ttttgccatg ttcagttaaa gtgcacagtc tgttacaggt tgacacattg 1560cttgacctga tttatgcaga attaataagc tatttggata gtgtagcttt aatgtgctgc 1620acatgatact ggcagcccta gagttcatag atggactttt gggacccagc agttttgaaa 1680tgtgtttatg gagtttaaga 
aatttatttt ccaggtgcag cccctgtcta actgaaattt 1740ctcttcacct tgtacacttg acagctgaaa aaaaacaaca tgggagtaat aatgggtcaa 1800aatttgcaaa ataaagtact gttttggtgt gggagttgtc atgaggctgt gttgaagtga 1860cttatctatg tgggatattg agtatccatt gaaatggatt tgttcagcca tttacattaa 1920tgagcattta aatgcaacag atatcatttc aggtgactta acatgaatga ataaaagtca 1980atgctattgg aaaaaaaaaa aaaaaa 2006301760DNAHomo sapiens 30gacaggcgtg gtcgggtgcg tggtgcgtgg gtccggcttt cggtgactag acggtccgca 60ggggacatcc cgtccctggg gcctccccag tctccctccc cctcgcgcct gggcagctct 120ctcccagggc ttcggctcga gcctgcgacc tgcacggaca cccccccctc aggatctaaa 180atgtccactg aggcacaaag agttgatgac agtccaagca ctagtggagg aagttccgat 240ggagatcaac gtgaaagtgt tcagcaagaa ccagaaagag aacaagttca gcccaagaaa 300aaggagggaa aaatatccag caaaaccgct gctaaattgt caactagtgc taaaagaatt 360cagaaggaac ttgcagaaat cacattggac cctcctccca actgtagtgc tggacccaaa 420ggagacaaca tttatgaatg gaggtcaact atattgggac

ccccaggatc tgtctatgaa 480ggaggggtgt tctttcttga cattaccttt tcaccagact atccgtttaa accccctaag 540gttaccttcc gaacaagaat ctatcactgt aatattaaca gccaaggtgt gatctgtctg 600gacatcttaa aggacaactg gagtccggct ttaactattt ctaaagttct cctctccatc 660tgctcacttc ttacagattg caaccctgct gaccctctgg tgggcagcat cgccacacag 720tacatgacca acagagcaga gcatgaccgg atggccagac agtggaccaa gcggtacgcc 780acataggggc ctgctgcctg ccgccccgcg ggacctgtgc aagcacattc accaagtgca 840tcggtagccc tgcccacccc tccagacctc ggttcttatt ttcctatttt tattaaattt 900ggaaccattt tgtgatggta tgttgtccat cttcccatcc cagttcttcc tgcccccctt 960cctctctccc acgctctctt ttatctctca ttttattccc ttgttgattt ctgttaactt 1020gaaagatttg ggattttttc ccacctcatc atagatggga acttttgttt tcagtgcaaa 1080caatgttgga gctgtaatag taagagcttt cttacaaagc tttgtattac tgtgtggttt 1140tgtttttttt gttgttgttt atttgatttt gatttttttt tcttttatgt gatctttggg 1200aaaacacatt cagaattata tctcgtttct acttaaatgt agtgcttagg gttaattttt 1260tgtactgaag tctttattgg tgggtgcatg ctactgggaa caagtttttg tacaaaagct 1320tcaatcagaa tcactgtgca ttactgagac tctgtttatc actagccttc tgtccctccc 1380gcagaagact gttggattga acaaaataat atgtattttg atttacttaa agtgcttgta 1440aatttcttag ggacctgcca cttttgactg tggatcagtt gatgtacact tgtattatta 1500aagcactcaa taaatcactg tggctgataa ctgcacttct ggtaacccga catttgcttt 1560gtgtcctggt gaccgctgta gccctacgtg cagtgaggct tgtctaattc aattacaggt 1620tcaagtgtat ttttcatctc aaacctctaa tatttctttg gagttgagtt gcttagcatg 1680tggaatttct ccagctgtca gtagcctgat gattttatgg ttgttatagt aaattgctat 1740cattttacat attgactggg 1760311559DNAHomo sapiens 31gactgcgcgg ccgggaggag ccgagccggg cggcggcggc gggaggctac agcgcgcggg 60ggtctcccgc gtcccctccg cctcgccggg agctcgcgcc ctcgcccagc cgagctccca 120cccccgcttt tttccgaagg cgctgggcgg cgccaccctc cggccggagc ccggcactgc 180acaaccccct ccgactttca atgttccaca ctccccggcc agagcctcct cggcttcttt 240ttttccctcc ccccccttcc cccccccaca gctgcctcca tttccttaag gaagggtttt 300tttctctctc cctcccccac accgtagcgg cgcgcgagcg ggccgggcgg gcggccgagt 360tttccaagag ataacttcac caagatgtcc agtgataggc 
aaaggtccga tgatgagagc 420cccagcacca gcagtggcag ttcagatgcg gaccagcgag acccagccgc tccagagcct 480gaagaacaag aggaaagaaa accttctgcc acccagcaga agaaaaacac caaactctct 540agcaaaacca ctgctaagtt atccactagt gctaaaagaa ttcagaagga gctagctgaa 600ataacccttg atcctcctcc taattgcagt gctgggccta aaggagataa catttatgaa 660tggagatcaa ctatacttgg tccaccgggt tctgtatatg aaggtggtgt gttttttctg 720gatatcacat tttcatcaga ttatccattt aagccaccaa aggttacttt ccgcaccaga 780atctatcact gcaacatcaa cagtcaggga gtcatctgtc tggacatcct taaagacaac 840tggagtcccg ctttgactat ttcaaaggtt ttgctgtcta tttgttccct tttgacagac 900tgcaaccctg cggatcctct ggttggaagc atagccactc agtatttgac caacagagca 960gaacacgaca ggatagccag acagtggacc aagagatacg caacataatt cacataattt 1020gtatgcagtg tgaaggagca gaaggcatct tctcactgtg ctgcaaatct ttatagcctt 1080tacaatacgg acttctgtgt atatgttata ctgattctac tctgctttta tcctttggag 1140cctgggagac tccccaaaaa ggtaaatgct atcaagagta gaactttgta gctgtagatt 1200agttatgttt aaaacgccta cttgcaagtc ttgcttcttt gggatatcaa aatgtatttt 1260gtgatgtact aaggatactg gtcctgaagt ctaccaaata ttatagtgca ttttagccta 1320attcattatc tgtatgaagt tataaaagta gctgtagatg gctaggaatt atgtcatttg 1380tattaaaccc agatctattt ctgagtatgt ggttcatgct gttgtgaaaa atgttttacc 1440ttttaccttt gtcagtttgt aatgagagga tttcctttta ccctttgtag ctcagagagc 1500acctgatgta tcatctcaaa cacaataaac atgctcctga aggaaaaaaa aaaaaaaaa 1559321366DNAHomo sapiens 32gcgtctcgca gcagccgccc ggaccgggca tggtgttggg cgccgggccc gcctcgcctg 60tctcggggag cccagggtaa aggcagcagt aatgctaacg ctagcaagta aactgaagcg 120tgacgatggt ctcaaagggt cccggacggc agccacagcg tccgactcga ctcggagggt 180ttctgtgaga gacaaattgc ttgttaaaga ggttgcagaa cttgaagcta atttaccttg 240tacatgtaaa gtgcattttc ctgatccaaa caagcttcat tgttttcagc taacagtaac 300cccagatgag ggttactacc agggtggaaa atttcagttt gaaactgaag ttcccgatgc 360gtacaacatg gtgcctccca aagtgaaatg cctgaccaag atctggcacc ccaacatcac 420agagacaggg gaaatatgtc tgagtttatt gagagaacat tcaattgatg gcactggctg 480ggctcccaca agaacattaa aggatgtcgt ttggggatta aactctttgt ttactgatct 540tttgaatttt 
gatgatccac tgaatattga agctgcagaa catcatttgc gggacaagga 600ggacttccgg aataaagtgg atgactacat caaacgttat gccagatgat aaaaggggac 660gattgcaggc ccatggactg tgttacagtt tgtctctaac atgaaacagc aagaggtagc 720cccctctccc gtcctcatgc tccctctcag tcccctggat tgccccagtc ctgtgaccat 780gttgccctga agaagaccat cttcatgact gctcattgta gatggagaat tcaacataaa 840tacagcaaga aaatgtgttt gggcttctga agagttgtct gcttacctta acatgtttac 900ttttttgaac ttgtactgta taggctgttg gtgaaattct taagaagttg taatgaactc 960aaaattgagg ccagagcttg ctttcccttt tcccaaacaa aattggtttt ctgcacaagc 1020gatgctaatg atgtgttcag tgtaactcgc agattggcaa taagataccc gctacaaact 1080gtgattggat gcaaaatctc ttagcttctt tcacgaatgt tggccctgcc tagatgttgt 1140gaagcctccc agaatgcata gagtcattca ctgtagatct cttattgaaa tgcgtatttt 1200atttaatgta agtatatttt ggaacagatt tgtaatttgt acaattcaat gctttaatta 1260ttttttctat tctcatttag tttgtatttt cattgtatag agcagacaga aagatgttgg 1320gtcaagcaac tattgaagag aaatacaaag aaaaaaaaaa aaaaaa 1366334360DNAHomo sapiens 33gcggccgcgg cagggctggg cctgcgacta cccgaggagg ctgacctcca gcccgggcgc 60ccggttcagc gccgccccgg ccggcgccgg tgcctgccag gcactcaggg aggcgggggc 120gcagtggagg aggcggcgcc atcgcgaagc gagcgcctcg cccgcactca gccttgccac 180cccgcccgca gtccaggctg gactgggcgg catttgccga ggctcctcgg ccaggccccg 240tccgcccgag ccgcgctgag acccgggcag cggccgcgtg gagaggaggt ggcagcggcc 300cgggaggccg gagccaagcc agcgacccac catggagacc cgctacaacc tgaagagtcc 360ggctgttaaa cgtttaatga aagaagcggc agaattgaaa gatccaacag atcattacca 420tgcgcagcct ttagaggata acctttttga atggcacttc acggttagag ggcccccaga 480ctccgatttt gatggaggag tttatcacgg gcggatagta ctgccaccag agtatcccat 540gaaaccacca agcattattc tcctaacggc taatggtcga tttgaagtgg gcaagaaaat 600ctgtttgagc atctcaggcc atcatcctga aacttggcag ccttcgtgga gtataaggac 660agcattatta gccatcattg ggtttatgcc aacaaaagga gagggagcca taggttctct 720agattacact cctgaggaaa gaagagcact tgccaaaaaa tcacaagatt tctgttgtga 780aggatgtggc tctgccatga aggatgtcct gttgccttta aaatctggaa gcgattcaag 840ccaagctgac caagaagcca aagaactggc taggcaaata agctttaagg 
cagaagtcaa 900ttcatctgga aagactatct ctgagtcaga cttaaaccac tctttttcac taactgattt 960acaagatgat atacctacaa cattccaggg tgctacggcc agtacatcgt acggactcca 1020gaattcctca gcagcatcct ttcatcaacc tacccaacct gtagctaaga atacctccat 1080gagccctcga cagcgccggg cccagcagca gagtcagaga aggttgtcta cttcaccaga 1140tgtaatccag ggccaccagc caagagacaa ccacactgat catggtgggt cagctgtact 1200gattgtcatc ctgactttgg cattggcagc tcttatattc cgacgaatat atctggcaaa 1260cgaatacata tttgactttg agttataata tggttttgtg acttatgagc tgtgactcaa 1320ctgcttcatt aaacattctg cattgggtat aatctaagaa ttgtttacaa aaagattatt 1380ttgtatttac ccttcattcc tttttttgat ccttgtaagt ttagtataaa tatatctaga 1440cattcagact gtgtctagca gttacgtcct gcttaaaggg actagaagtc aaagttcctt 1500gtctcactat ttgatctgct ttgcagggaa ataacttgtt ttttctcatg tttcatcttc 1560tttttatgta aatttgtaat actttcctat attgcccttt gaaatttttg gataaaagat 1620gatgttttaa gttccaatga gtattactag ttactcaata ccacttattg agtactctgt 1680ttctacgtat gtagaatgta tagggataga agagttgaaa agggaaagca aaactcctca 1740agtagcttcc ttaaaatgtc attcatagga gatgtactgg aattgctcat tctgtgactt 1800tatttgtgtc ctaaacattc ttcagtgaaa ataattttat ttcagtcaaa catttatgag 1860gaaatgagat cacatctttg tcactggatg ctacttgaag agggagtact ttgtaaccac 1920tttgatatgc tgttatcacc accccctgcc ctctgctgcc ataatcacac aaatttaaaa 1980agaaagaaaa cagtcttcca tagattttta aggaagaaag ggcccaagcc aggagatcgc 2040ttggttttct tccagaagtt aaatgggggg atctgaagat ttgaatgttt ggtctgcttt 2100gaaatgtatg tcttttggga tgtattatat gcctagcttt ataatcagta taaattttaa 2160ttattccagg aatatgcata atattgaaat atttcatgtc ctattttaat agaaaacctc 2220agggcccaag taacagtgat agaagttaga aaaaccttta cttagaattg tccacctagt 2280cagagcccaa gaaagaattt tcagtggaaa aatcaatata taacttagtg ctagctagcg 2340ccacagactc tagtagataa tattatcatc ataatggctg gtgaaaccat ataatcacag 2400aaaaacattg ccttcagcat gttcagttcg cagcactgag ggcactcttg agggtgttgt 2460taatgaagat ttaattttta aatacaggtg gttccaagct ttcaaatagg ttatgctcca 2520aaagtgttat ttgtaagtta atttttttac aagtcaaaca atgttggaag tggtatttag 2580gttctagatc ggtccacgaa 
agttagccca tatgtatatc ttgaatagta taggggaggg 2640tattcataaa gtccttatgt ggttttaact aagtgaaatt atggacaaga gaaataattg 2700taaaatcgtc ttaaaggaaa atttaatttt tactcctgtt tatgggacat tcgttctatt 2760aactgtcaga cacaatttct gttttcatct gagagccagt tttcctttat ttctacatct 2820aaaataagaa catattgtac actattatat aatacagaat tgtcttaaac tttaataaat 2880tcgcatttta aaggtgttta cagattattt tttatatctg tagctgaatt tgttaaagtc 2940taaaaagctc aaggacttta tgaagatctc attatatgag gaaaatcata ggttaccatt 3000ttataactct attgccataa gaaaatacac tctaaaatct tgatttgaaa catattagaa 3060accttgattc agtgctcagt ggtctcctag taagaagtca ccgacggtag cgtcatatga 3120gaagaaagaa atccccacca cctcaacctc tgctgagatt gtgtgctagg aacagccttc 3180cctccgtttc ccctcagtca aacttgagcc agcctctgga tcgatgtgat cttattgcat 3240gtttccatgg ggtgtaccta tactttaagc caatcctgct gcattcactg ctaagttaaa 3300taaaaagcca agaagatttt gcactgtgca gatcctttgc tatctgactt gcatctcttc 3360ccccacctgt cagctagcca cctgcttgtt tgtgttggga tattttttag cacctgaagc 3420accatctgaa aggggcacca ttttcttctt ccctttgatc tcacatatgc tccctaaaaa 3480tccttaagtt gtcaatctga tccccagtgt gaggttaatg agcaaaattg gtctttgggg 3540ccctttttgt ccaagcccca ctgaaaggcc tcttcagaaa actattatct ttaaagccct 3600actttaactc cttaattcca gcatacagct aaaactggat gtatattctg gcaagtaaag 3660gctgaggact cctctttaat cctcagatct agataactca tgacatttta tttgaccaac 3720atagcacatg atgagatatc aaggtaatta aaatagcatg cttgaaaaaa aaatacgtaa 3780tctgtttcac ctgtaactgt ttaagccaat aaacttttca aaatttatgt aatgtggggc 3840ttttatgtag cactttacgt tttcatgctg cttattgttt tattctactg aaaaaaatga 3900atttcaagat tctcaacttt tttaatttca aaaattgttt attgttttga ctataggaat 3960acaaaatttc ctattttggg agaataagaa ctctttttgt catttttggc tatgaataaa 4020ctttctggtc ttttgagacc acccattttt atagatcaga atcagaaaac aggtaaacct 4080cactcacaca tttggactca tttgaacaaa aatctaggcc aaaatactga aaagcctatg 4140tgttttttta attggaagta tatgtaaggt taatgcattt agtgaacgtg actaacaaag 4200actaatgtgc acattaacag atgtactttt taaggtttta tgggaggctg tgcattgctc 4260aaaagctgtt gggaacgcct tctgaacagt tgccttcaga actagtttga 
gctgctcaat 4320aaaaccagtg actttactca taaaaaaaaa aaaaaaaaaa 4360342267DNAHomo sapiens 34ggttccgccc cgcgagcggc catcttggag gctgaggcgg cggcggcggc gctgcggcgg 60gttcggtggg cccaatcccg gggcggtgcg gctgtttcgg gcgcgggccc cgcttttccg 120caccctgctc cggcctcgac tacggcgagc ctgagcgcgg cggcggccca cgcgcagcga 180cagggagaga tgagcagcac cagcagtaag agggctccga ccacggcaac ccagaggctg 240aagcaggact accttcgcat taagaaagac ccggtgcctt acatctgtgc cgagcccctc 300ccttcgaata ttctcgagtg gcactatgtc gtccgaggcc cagagatgac cccttatgaa 360ggtggctatt atcatggaaa actaattttt cccagagaat ttcctttcaa acctcccagt 420atctatatga tcactcccaa cgggaggttt aagtgcaaca ccaggctgtg tctttctatc 480acggatttcc acccggacac gtggaacccg gcctggtctg tctccaccat cctgactggg 540ctcctgagct tcatggtgga gaagggcccc accctgggca gtatagagac gtcggacttc 600acgaaaagac aactggcagt gcagagttta gcatttaatt tgaaagataa agtcttttgt 660gaattatttc ctgaagtcgt ggaggagatt aaacaaaaac agaaagcaca agacgaactc 720agtagcagac cccagactct ccccttgcca gacgtggttc cagacgggga gacgcacctc 780gtccagaacg ggattcagct gctcaacggg catgcgccgg gggccgtccc aaacctcgca 840gggctccagc aggccaaccg gcaccacgga ctcctgggtg gcgccctggc gaacttgttt 900gtgatagttg ggtttgcagc ctttgcttac acggtcaagt acgtgctgag gagcatcgcg 960caggagtgag gcccaggcgc cgagacccaa ggcgccactg agggcaccgc gcaccagagc 1020gtgacctcgg caggctggac acactgccca gcacaggcag acccaccagg ctcctaggtt 1080tagcttttaa aaacctgaaa ggggaagcaa aaaccaaaat gtgtgactgg gctttggagg 1140agactggagc ctcagccctg tcctggccac gggccgctgg ggctggtgtg ggtgggcctt 1200gtgtgctgga tttgtagctt atcttccgtg ttgtctttgg acctgtttta gtaaacccgt 1260ttttcatttt attagatgtg gtcacttaga aatgcaaact tgctgccgac cgcgggctgc 1320tcctgcgttc ttggagctcc tggcgcgttt ctcggagctc ccggctcctc agcgggtggg 1380aacctcgggg cccaggggtg gagctggcgt ccgcgggtgc tggtctggcc tggccgtgtg 1440gtgatgaggc ttagcggggc cagtgacggc cgtggctcag gatccataag tcggggtttg 1500gtctcagcat ttacaaatgt gtttacagtc agaatgaaac acattccttc tagaaagtgc 1560ttgggggttt ttgctgccct ggaagccagg agcctgctca ctccaaccac aagtcgccct 1620tgactgcggc ggccgcgagc ggggcggggg 
ctgccggtgc cctccgcagg ccgggcctcc 1680tgggcgcccc tcggtgctgc aggctggggg gccttgggta cctgcagagc cttttctctg 1740aattccttat gtccggtggg ccagaagccc gtcctcctat gctggtggaa ggcggaggac 1800cggagtccct gcagaaggcc ccgtgcactc gggggcctcc ctcacatccc gtgccccctg 1860cgctggcctt cacagtaggt aatggctccg gcccgggtgt tcgctgtcca cggaacatgg 1920cagaggggca ccccggcccg gaaagacgcc agagccagca ggggctgttt cgggccgcgt 1980ggctccccgg gtctcggccg tctcccctct tctgcgtctg ttccgtgact tcgcctgggt 2040gggatgtacc gcaggtgcat cgcgtcgagg tggggcacgg ccgccggcaa gaaacccacc 2100ctgtccggag gcgggcgtga gacaagccca gcccgcacgc gctcatcttt cttcgttttt 2160tgatcagttt attcagaatt gctctataat ttaccaattg tatgtattta acctattctt 2220gtggaaaaaa aaggtctttc attatatctt tatttctgaa aaaaaaa 2267351540DNAHomo sapiens 35aggcgcacaa cgcaggccgg gcgggaagag ccaaagcggg caggcggcgg aaatatccga 60agcggcgggg cgcccgaggc cgttgccgac ctccgcgcta aagccgctgc tgccgcggaa 120gacgatcctc cagtacccgc ccgccgtcac cgcagctgcc gtgtcctcct cccaccccta 180gccgcacccc ctcgcggagg gatcagctga gcggccaaac ggcacggtcg ggggagcccc 240gagtccgcag ctgcagcggg gcctgagacc agagttggcg agggcaagga aggagcggcc 300ccgggcagtg ggggcggggc cgggcgggcc cgagaacagc cgaatttggc cgagcgctgc 360cgagcgagtc cgaggcgctg ggccaggccg gagccggact acgggagccg aggcgggccg 420cgcggtgggc gcggagagga gcggagcggc gcggcaggcc gggcgggtgg cggcagcagc 480ggaggaggcc gcagctgcgg gtccgaggag cggaggcgac gcgggcggcg gcggggggcc 540gggtggccgg ggtcccgggc cccgcggcgg cggcagcggc ggcggcggcg gcaggatgat 600caagctgttc tcgctgaagc agcagaagaa ggaggaggag tcggcgggcg gcaccaaggg 660cagcagcaag aaggcgtcgg cggcgcagct gcggatccag aaggacataa acgagctgaa 720cctgcccaag acgtgtgata tcagcttctc agatccagac gacctcctca acttcaagct 780ggtcatctgt cctgatgagg gcttctacaa gagtgggaag tttgtgttca gttttaaggt 840gggccagggt tacccgcatg atccccccaa ggtgaagtgt gagacaatgg tctatcaccc 900caacattgac ctcgagggca acgtctgcct caacatcctc agagaggact ggaagccagt 960ccttacgata aactccataa tttatggcct gcagtatctc ttcttggagc ccaaccccga 1020ggacccactg aacaaggagg ccgcagaggt cctgcagaac aaccggcggc tgtttgagca 
1080gaacgtgcag cgctccatgc ggggtggcta catcggctcc acctactttg agcgctgcct 1140gaaatagggt tggcgcatac ccacccccgc cacggccaca agccctggca tcccctgcaa 1200atatttattg ggggccatgg gtaggggttt ggggggcggc cggtggggga atcccctgcc 1260ttggccttgc ctccccttcc tgccacgtgc ccctagttat tttttttttt ttaacaccat 1320gtgattaagg tcggcgctgc ctcccccgac ccactcagcg atgggaaatg aattggcttg 1380tctagccccc ctgctgggtg cttgttcagc ccccactctg ggctgtggag tgggtgggca 1440acgggcctgg gtagctgggc ccaggcaacc cacccctcca cctctggagg tcccaccagg 1500ctattaaagg ggaatgttac tgcaaaaaaa aaaaaaaaaa 1540362568DNAHomo sapiens 36cgcgcgcgca gtcgcgcgcg ggtcgtgccg taccaccgtc gcgggcaggc tcggccacga 60gcgccagagc cccgcgcctc ccctcgcggc ctgtcccaag tccctgcccc gcaacagagc 120gtcacttccg ccatccccgg cagcggttgg ggcggggcgc acgggggagg gggccaggtc 180ggagggaagc ccgcccgtgc ccgagcccgc gcccgagcag ggactacatt tcccgagggg 240cctcggcggc ggctgcggcg acgggcgcgg caacgtcccc cggaagtgga gcccgggact 300tccactcgtg cgtgaggcga gaggagccgg agacgagacc agaggccgaa ctcgggttct 360gacaagatgg ccgggctgcc ccgcaggatc atcaaggaaa cccagcgttt gctggcagaa 420ccagttcctg gcatcaaagc cgaaccagat gagagcaacg cccgttattt tcatgtggtc 480attgctggcc ctcaggattc cccctttgag ggagggactt ttaaacttga actattcctt 540ccagaagaat acccaatggc agcccctaaa gtacgtttca tgaccaaaat ttatcatcct 600aatgtagaca agttgggaag aatatgttta gatattttga aagataagtg gtccccagca 660ctgcagatcc gcacagttct gctatcgatc caggccttgt taagtgctcc caatccagat 720gatccattag caaatgatgt agcggagcag tggaagacca acgaagccca agccatagaa 780acagctagag catggactag gctatatgcc atgaataata tttaaattga tacgatcatc 840aagtgtgcat cacttctcct gttctgccaa gacttcctcc tctttgtttg catttaatgg 900acacagtctt agaaacatta cagaataaaa aagcccagac atcttcagtc ctttggtgat 960taaatgcaca ttagcaaatc tatgtcttgt cctgattcac tgtcataaag catgagcaga 1020ggctagaagt atcatctgga ttgttgtgaa acgtttaaaa gcagtggccc ctccctgctt 1080ttattcattt cccccatcct ggtttaagta taaagcactg tgaatgaagg tagttgtcag 1140gttagctgca ggggtgtggg tgtttttatt ttattttatt ttattttatt tttgaggggg 1200gaggtagttt aattttatgg gctcctttcc cccttttttg 
gtgatctaat tgcattggtt 1260aaaagcagct aaccaggtct ttagaatatg ctctagccaa gtctaacttt atttagacgc 1320tgtagatgga caagcttgat tgttggaacc aaaatgggaa cattaaacaa acatcacagc 1380cctcactaat aacattgctg tcaagtgtag attcccccct tcaaaaaaag cttgtgacca 1440ttttgtatgg cttgtctgga aacttctgta aatcttatgt tttagtaaaa tattttttgt 1500tattctactt tgcctttgta cagtttattt tactgtgttt atttcatttt cccaatttga 1560caatcgtatt ttaaaattga aactgatgga acattctttc ttggtcttca ccatctgaca 1620aattgaatgg caagaggtgg attttgccag tttcttttca ctgatgcaga tttgtgttaa 1680gatagtactg aatggagtat ttataaactg gccctgagca tgcataaagc atcagtatct 1740gacctttttt taaccttcta ggaatttgaa ataaatgtgt ttgtgttgtc tgattagatg 1800atcattggtg tcttgccaca atgtttaaaa attactgtac aggaaagtca cagcaaagat 1860agcagttgtg actgacatgt aggactttca cagttgtgcc acatttttgc ctaaaatttg 1920ggttatgaca tttttcttgg ttcttatctg aaaatttcat ctgtaacctt tcatgtgtgt 1980taagaaacac tgatctgatc atttgggatt tgctgaggca tttgtgagtc ttccttataa 2040acctgatgag cagatctcaa ctatctagct tgtgtgtcat cagaaaggtt tatccctttg 2100agagtatcaa gtcctcagtt aatgattctt gctttcatcc ctccagtatt tgctgtggga 2160gctcgtttta ttctttaatt tggaattcag taatttttct tctttattga cgaattcctc 2220ccctcacaaa actgttcttt cccacctctc tccatatcta attcctgatt cttgttattt 2280ttaagtcata aatgtagcca gtcataaata cataaatgtt aaccttcggg ttgcaacctt 2340gtctcttgca gtttaaggta atggatattg tagcccattt gaattttctt cactcttatt 2400ctcgtaattc tggagtttct tcagattgtg gtgtatttta

ttgtgctcct atgtaagatg 2460aagaattaac tattaaaatt acattttcaa catacaaaag cttttgatga ctggtaactg 2520gtatccttcc aaataaatgc attgcttggt aaaaaaaaaa aaaaaaaa 2568375395DNAHomo sapiens 37cgcctccccg cgcctcgttc gccgccgctg tcgccgccgc cgcccgagac tcgcgcagag 60cagttatggc ggatcccgca gcccccacgc ccgcagctcc cgctccagcc caggccccgg 120ctccagcccc ggaggcagtc ccggccccag ccgcagcccc cgtcccggcg ccggcgcccg 180cctcggactc ggcctccggg ccgtcctcgg actccggccc agaagccggc tcgcagcgcc 240tgctgttttc tcacgacctg gtgtcgggcc gttaccgtgg ctccgtgcac ttcgggctgg 300tgcgcctcat ccacggcgag gactcggact cggagggcga ggaggagggc cgcgggagct 360cggggtgctc cgaggccggg ggcgcgggcc acgaggaggg ccgggccagc cccctgcgcc 420gcggctacgt gcgcgtccag tggtacccgg agggcgtcaa gcagcatgtg aaggagacca 480agctgaaact agaggaccgt tctgtggtgc cccgagatgt ggtccggcac atgcgatcca 540ccgacagtca gtgtggcacg gtgatcgacg tcaacatcga ctgtgccgtc aagctcatcg 600gcaccaactg catcatctat cccgtcaaca gcaaggacct gcagcacatc tggcccttca 660tgtatgggga ctacattgcc tatgactgct ggctggggaa ggtctacgac ttgaagaacc 720agatcatcct gaagctatcc aacggcgcca ggtgctccat gaacacggaa gatggcgcca 780agctctacga cgtctgcccg cacgtcagcg actcgggtct cttcttcgat gattcctatg 840gcttctaccc aggccaggtg ctcattggcc ctgccaagat cttctccagc gtccagtggc 900tgtcaggtgt caagcccgtg ctcagcacca agagcaagtt ccgagtggtg gtggaagagg 960tgcaggttgt agagttgaaa gttacatgga ttaccaagag tttctgtcca gggggcacgg 1020acagcgtcag ccccccaccc tctgtcatca cccaggaaaa cctaggcagg gtgaagcgtc 1080tcggatgctt tgaccatgct cagcggcagc ttggggagcg ctgtctgtat gtcttcccag 1140ccaaggtaga gccagccaag attgcctggg aatgtccaga aaaaaactgc gcccaggggg 1200agggctctat ggccaagaag gtgaagcgcc tgttgaagaa gcaggttgtg cggatcatgt 1260catgctcccc agacacccag tgttcccggg accattccat ggaagaccca gacaagaagg 1320gggaatccaa aaccaagagc gaagcggagt ctgccagccc tgaggagacg cccgatggct 1380ctgccagtcc agtggagatg caggacgagg gtgcagagga gccccacgag gcaggagagc 1440agctgccccc attcctgcta aaagaaggca gagatgacag gctgcactcg gcagagcagg 1500acgcagatga tgaggctgct gatgacacgg acgacaccag ttcggtgacc tcctctgcca 1560gctccaccac 
ttcctcccag agcggcagcg gcacgagtcg caaaaagagc atccccttgt 1620ccatcaagaa cttaaagcgc aaacacaaga ggaagaagaa taaaatcact cgagacttca 1680agccagggga cagggtggca gtggaggtgg tgaccacgat gacctcagcc gacgtgatgt 1740ggcaggatgg ctccgtggaa tgcaacatcc gctccaacga cctcttccct gtgcaccacc 1800tggacaacaa cgagttctgc cctggagact tcgtggtaga taagcgagtc cagagctgtc 1860cagaccctgc tgtctacggt gtggtacagt ctggggacca catcggccgt acctgcatgg 1920tgaagtggtt caagctgagg ccgagtgggg acgacgtgga gctgattgga gaagaggaag 1980atgtgagtgt ttacgacatt gctgaccacc ctgactttag gttccgtaca actgacatcg 2040tcatccgcat cggcaatact gaggatgggg ctcctcacaa ggaggatgag ccatcggtgg 2100gccaggtggc ccgtgtggac gtcagcagca aggtggaggt ggtgtgggct gacaactcaa 2160agaccatcat cctgccccag cacttgtaca acatagagtc tgagattgag gagtcagact 2220acgattcggt agaaggcagc accagcgggg catcctcgga tgaatgggaa gatgatagtg 2280acagctggga gacggacaat gggctggtgg aggacgagca ccccaagata gaggagcccc 2340ccatcccacc cctggagcag ccggtggccc ctgaggacaa gggagtggtg atcagtgaag 2400aggcagccac agctgccgtc cagggggctg tggccatggc tgcccccatg gccgggctga 2460tggagaaggc tggcaaggac gggccaccca agagcttccg ggagttgaaa gaggccatca 2520agatcctgga gagcctcaag aacatgactg tggagcagct gctgacgggc tcgcccacct 2580ctccgactgt ggagcctgag aagccaactc gggagaagaa gtttctggat gacatcaaga 2640agctacagga aaacctcaag aagaccctgg acaatgtggc cattgtagag gaggagaaga 2700tggaagcagt gcccgacgta gagcgcaagg aggacaagcc cgaggggcag tcacctgtga 2760aggctgagtg gcccagcgaa accccggtgc tgtgccagca gtgtggcggc aagcctggcg 2820tcaccttcac cagcgccaag ggcgaggtct tctccgtact ggagtttgca ccctcaaatc 2880attcttttaa gaaaattgag ttccagcctc cagaagccaa gaagttcttc agcacagtgc 2940ggaaggagat ggcgctgctg gctacctcac tgcctgaggg catcatggtc aagacttttg 3000aagatagaat ggacctcttc tcagctctca tcaagggccc cactcgaacc ccctacgagg 3060atggcctcta cttgtttgac atccagctcc ccaacatcta cccagccgtg cccccccact 3120tctgctacct ctcccaatgc agtggccgcc tgaaccccaa cctgtatgac aatgggaagg 3180tgtgtgtcag cctcctgggc acctggattg gaaaggggac agagaggtgg acaagcaagt 3240ccagccttct ccaggtgctc atctccatcc aaggtctgat 
cctggtaaat gaaccatact 3300acaacgaagc cggcttcgac agtgaccgag gcctgcagga aggctatgaa aacagtcgct 3360gttacaatga gatggcgctg atccgcgtgg tgcagtccat gacccagctg gtgcggcggc 3420cccccgaggt ctttgagcag gagatcaggc aacactttag cactggtggc tggcggctgg 3480tgaaccgtat cgagtcctgg ctggaaaccc atgccctgct ggagaaggcc caggcactgc 3540ccaacggggt gcccaaggcc agcagctcgc cagagccccc agctgtagcc gagctgtcag 3600actccggcca acaagaacct gaggatggag ggccagcccc aggagaggcc tcccagggct 3660cagactcaga gggcggtgcc cagggcctgg cctcagctag cagggaccac acagaccaga 3720cttcggagac cgcaccagac gcatcggtgc cacccagtgt gaaaccaaag aagcggagaa 3780agagctaccg gagcttctta cctgagaaga gtggctaccc tgacatcggc ttccccctct 3840tcccactttc caagggtttc atcaagagca tccggggtgt cctgacgcag ttccgggctg 3900ccctgctaga ggcaggcatg ccggagtgca cagaggacaa gtagctgcca ggcacagagg 3960aaagagcatc accgtgggag aggccagccg ccgcctgctc actccccccc ggaatcaccc 4020ctcttcccat gcccctctgt ccccactgca aacccactgc cctcttctcc ccaaggtgag 4080tttgatgctg aagtgcaaga agtgtgttga gatgctgccg tttctatttt gaagcgagct 4140ttcaacaggc gggtcccctg tggcaaagaa aatcggaacc ctgttgccga ttttccattt 4200gtcaccccag cagaatgtcc ggcacttgct cccttgctgc cccttctcag gtcagaggcg 4260ggtgttccag ggcctgccgc ggggctctct gggccggttc cctgcagacc cgcaggagag 4320cacatgtgcc ttgcatgaag tgtgggttgc gccaacaatt cccctggtcc ctttcaacct 4380gtttagttca actcaagcct ccctgtgtcc cagaccctcc tgctgccacc accacccagg 4440tcctccctag tcctccagcg tcaacactat cccttgggag ttgtagctgc tgtcactgac 4500tcccggctat acatggcctg tcgaccacgt tatagccctc aggcctgttg aacttgctct 4560ctaagagagg ttgggaccag gctaggttcc gggtgacgcc caggagaggt ggtggccttc 4620acacatgcac atggagttga ggaccaggga gctgcaggga aagcaacagc tataggtgcc 4680ttgctcttct gtcggaggct gctgggggca agagcagctg cacaaggcca gggcaagtgc 4740tagggcccct cccccatcac atggtcacac tgggacaggc gtgcagctca ctgaactcca 4800agcgagccag ccctctcttg gactagaagg cctactgtca gcccttcgct tacaaactgc 4860aggctcaatc cgaaggggac ggccggcggg ggctctccta gtgcccagag acaggcccag 4920aggtttacaa gttttctaag cttttgataa tgtgaagctc caggccgaga ggatgctgtt 4980gagcacattg 
cagctatgta atttttggtg tatgtatgta atatttaagg ttggaaaaaa 5040aactcaaaag caaagatatt aactcttatt agaaaaaaag acaaaaaaaa agccaaagca 5100tgatgcgtct tgtcagcctt aagtgggctc cacacctgtg ctgtgctgtg accgcccagc 5160cagcagagct gcgggaggat ggagccggac cacacaccgt ggcatttgga accgagtcgg 5220tatcttgttt gagaaacacc cggagtgact ggtggggctg tgcttcccag tgcattgtac 5280atgtggagat gtgaatgcct actgcttacg atatctgtat aaagtgctgt gtgattaaac 5340ttttttttac ttgcaaaaaa aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa aaaaa 5395381207DNAHomo sapiens 38cctctccgcc acttccctcg cttctgacca tagtttgcgg ggaagggagc gagcgcgtcg 60aaaaccaagg aacgtgcgcg ctgacgtcac ggttgaggct cggagctgag gggccgcgga 120gggcgtggcc tgcgggcggt tataaagagg cagtggtgcg cgcgcggccg gctcagtgct 180gccgggcacc ggggcggcgg gttggtctac gctgtgcgcg gcggacgtcg gaggcagcgg 240ggagcggagc ggggccgccg gggcctctcc agggccgcag cggcagcagt tgggcccccc 300gccccggccg gcggaccgaa gaacgcagga agggggccgg ggggacccgc ccccggccgg 360ccgcagccat gaactccaac gtggagaacc tacccccgca catcatccgc ctggtgtaca 420aggaggtgac gacactgacc gcagacccac ccgatggcat caaggtcttt cccaacgagg 480aggacctcac cgacctccag gtcaccatcg agggccctga ggggacccca tatgctggag 540gtctgttccg catgaaactc ctgctgggga aggacttccc tgcctcccca cccaagggct 600acttcctgac caagatcttc cacccgaacg tgggcgccaa tggcgagatc tgcgtcaacg 660tgctcaagag ggactggacg gctgagctgg gcatccgaca cgtactgctg accatcaagt 720gcctgctgat ccaccctaac cccgagtctg cactcaacga ggaggcgggc cgcctgctct 780tggagaacta cgaggagtat gcggctcggg cccgtctgct cacagagatc cacgggggcg 840ccggcgggcc cagcggcagg gccgaagccg gtcgggccct ggccagtggc actgaagctt 900cctccaccga ccctggggcc ccagggggcc cgggaggggc tgagggtccc atggccaaga 960agcatgctgg cgagcgcgat aagaagctgg cggccaagaa aaagacggac aagaagcggg 1020cgctgcggcg gctgtagtgg gctctcttcc tccttccacc gtgaccccaa cctctcctgt 1080cccctccctc caactctgtc tctaagttat ttaaattatg gctggggtcg gggagggtac 1140agggggcact gggacctgga tttgtttttc taaataaagt tggaaaagca gaaaaaaaaa 1200aaaaaaa 1207
