Ghrelin O-acyltransferase (goat) Biochemical Assay Brown; Michael S. ; et al. [Brown; Michael S.]

Patent Application Summary

U.S. patent application number 11/967171, for a ghrelin O-acyltransferase (GOAT) biochemical assay, was filed with the patent office on December 29, 2007 and published on 2009-07-02. The invention is credited to Michael S. Brown, Joseph L. Goldstein, Nick V. Grishin, and Jing Yang.

Publication Number	20090170141
Application Number	11/967171
Family ID	40688674
Publication Date	2009-07-02

United States Patent Application	20090170141
Kind Code	A1
Brown; Michael S. ; et al.	July 2, 2009

GHRELIN O-ACYLTRANSFERASE (GOAT) BIOCHEMICAL ASSAY

Abstract

Ghrelin is acylated by ghrelin O-acyltransferase. Ghrelin O-acyltransferase assays comprise contacting a mixture of ghrelin and recombinant ghrelin O-acyltransferase with an agent, and detecting a resultant decrease in acylation of the ghrelin by the acyltransferase.

Inventors:	Brown; Michael S.; (Dallas, TX) ; Goldstein; Joseph L.; (Dallas, TX) ; Grishin; Nick V.; (Dallas, TX) ; Yang; Jing; (Dallas, TX)
Correspondence Address:	RICHARD ARON OSMAN 4070 CALLE ISABELLA SAN CLEMENTE CA 92672 US
Family ID:	40688674
Appl. No.:	11/967171
Filed:	December 29, 2007

Current U.S. Class:	435/15 ; 435/193; 435/320.1; 435/325
Current CPC Class:	G01N 2333/91051 20130101; C12Q 1/48 20130101
Class at Publication:	435/15 ; 435/193; 435/320.1; 435/325
International Class:	C12Q 1/48 20060101 C12Q001/48; C12N 9/10 20060101 C12N009/10; C12N 15/00 20060101 C12N015/00; C12N 5/00 20060101 C12N005/00

Government Interests

[0001] This work was supported by grants from the National Institutes of Health (HL20948); the Government has certain rights in this invention.

Claims

1-8. (canceled)

9. A method for assaying ghrelin O-acyltransferase (GOAT) activity in an in vitro, cell-free format comprising: combining in vitro recombinant mammalian ghrelin O-acyltransferase, a ghrelin substrate of the acyltransferase, octanoyl-CoA, and a small molecule candidate agent, wherein the ghrelin substrate or the octanoyl comprises a label, whereby the acyltransferase catalyses the covalent transfer of the octanoyl of the octanoyl-CoA to the ghrelin substrate to form labeled octanoyl-ghrelin substrate; and isolating and quantifying the labeled octanoyl-ghrelin substrate to specifically determine the amount of acylation of the ghrelin substrate by the acyltransferase in the presence of the agent.

10. The method of claim 9 wherein the ghrelin substrate comprises the label.

11. The method of claim 9 wherein the octanoyl comprises the label.

12. The method of claim 9 wherein the labeled octanoyl-ghrelin substrate is isolated by specifically immobilizing its octanoyl moiety.

13. The method of claim 9 wherein the labeled octanoyl-ghrelin substrate is isolated by specifically immobilizing its ghrelin substrate moiety.

14. The method of claim 9 wherein the label is a radiolabel.

15. The method of claim 9 wherein the label is a fluorescent label.

16. The method of claim 9 wherein the ghrelin substrate is ghrelin.

17. The method of claim 9 wherein the ghrelin substrate is pro-ghrelin.

18. The method of claim 9, wherein the acyltransferase is in membrane-bound form.

19. The method of claim 9, wherein the acyltransferase is in detergent-solubilized form.

20. The method of claim 9 wherein the amount of acylation of the ghrelin substrate by the acyltransferase in the presence of the agent indicates that the agent specifically inhibits the acyltransferase.

21. The method of claim 9, wherein the octanoyl comprises the label, the labeled octanoyl-ghrelin substrate is isolated by specifically immobilizing its ghrelin substrate moiety, the label is a radiolabel, the ghrelin substrate is pro-ghrelin, the acyltransferase is in membrane-bound form, and the amount of acylation of the ghrelin substrate by the acyltransferase in the presence of the agent indicates that the agent specifically inhibits the acyltransferase.

22. The method of claim 9 wherein the acyltransferase is mouse, rat, human, chimpanzee, bovine, or horse ghrelin O-acyltransferase (GOAT).

23. The method of claim 9 wherein the acyltransferase is human ghrelin O-acyltransferase (GOAT).

24. The method of claim 9 wherein the acyltransferase is mouse ghrelin O-acyltransferase (GOAT).

Description

FIELD OF THE INVENTION

[0002] The field of the invention is ghrelin O-acyltransferase assays.

BACKGROUND OF THE INVENTION

[0003] The appetite-stimulating peptide hormone, ghrelin, is the only protein in animals that is known to be modified by O-acylation with octanoate, an eight-carbon fatty acid. Octanoylation is required for the endocrine actions of ghrelin, but no enzyme that catalyzes this novel modification has yet been identified (Kojima and Kangawa, 2005; van der Lely et al., 2004).

[0004] The discovery of ghrelin was reported in 1999 by Kojima et al. (Kojima et al., 1999), who were searching for a ligand for an orphan G-protein coupled receptor (GHS-R) that stimulates the secretion of growth hormone in the pituitary gland. The ligand was purified from rat stomach, and it was shown to stimulate the release of growth hormone from cultured pituitary cells. Kojima et al. (1999) determined that the 28-amino acid ghrelin is derived proteolytically from a precursor of 117 amino acids. Analysis by mass spectrometry revealed that serine-3 of ghrelin is modified by O-acylation with an octanoyl residue, which is required for growth hormone releasing activity. Serine-3 is conserved in mammals, birds, and fish. In the bullfrog, serine-3 is replaced by threonine, but this residue is also octanoylated (Kaiya et al., 2001; Kojima and Kangawa, 2005). Thus, O-octanoylation of ghrelin has been conserved in vertebrates over millions of years of evolution.

[0005] Interest in ghrelin rose dramatically when it was demonstrated that ghrelin concentrations in human plasma rise immediately before mealtimes (Cummings, 2006; Small and Bloom, 2004). Moreover, infusion of ghrelin into the cerebral ventricles of rats markedly enhances food intake, apparently through actions on the hypothalamus (Kamegai et al., 2001). Elimination of ghrelin or its receptor in mice through knockout technology caused a modest but significant reduction in obesity when the mice were fed high-fat diets (Wortley et al., 2005; Zigman et al., 2005). These findings aroused interest in ghrelin inhibitors as potential preventatives for obesity in humans.

[0006] One way to inhibit the action of ghrelin would be to block the putative enzyme that attaches octanoate. An inhibitor should be quite specific, since no other protein is known to be octanoylated. Thus far, however, a ghrelin-octanoylating enzyme has escaped identification. In the current studies, we have identified the ghrelin-acylating enzyme.

[0007] The initial insight came from studies on the Drosophila wingless gene and its mammalian homolog, Wnt. Genetic studies in Drosophila had earlier demonstrated that Wingless activity required the action of another gene, porcupine (Kadowaki et al., 1996). The amino acid sequence of Porcupine contains a conserved region that is found in a family of membrane-bound hydrophobic enzymes that transfer long-chain fatty acids to membrane-associated hydroxyl acceptors, called "MBOATs" for Membrane-Bound O-Acyltransferases (Hofmann, 2000). Examples include acyl-CoA:cholesterol acyltransferases (ACATs), which attach fatty acids to the hydroxyl group of cholesterol, and diacylglycerol acyltransferases (DGATs), which acylate the hydroxyl group of diacylglycerol. Subsequent studies indeed showed that Porcupine is required for the attachment of a monounsaturated long-chain fatty acid to a serine residue in Wnt (Takada et al., 2006).

[0008] Here, we show that the mammalian genome encodes 16 MBOATs produced by 11 genes, and we show that one of these MBOATs catalyzes the octanoylation of ghrelin when it is expressed together with prepro-ghrelin in cultured mammalian endocrine cell lines. We name this enzyme GOAT (Ghrelin O-Acyltransferase).

[0009] Cited Literature
[0010] Altschul, et al. (1997). Nucleic Acids Res. 25, 3389-3402.
[0011] Asfari, et al. (1992). Endocrinology 130, 167-178.
[0012] Bizzozero, O. A. (1995). Meth. Enzymol. 250, 361-379.
[0013] Chen, et al. (2004). Genes Dev. 18, 641-659.
[0014] Cummings, D. E. (2006). Physiol. Behav. 89, 71-84.
[0015] Date, et al. (2000). Endocrinology 141, 4255-4261.
[0016] Hannah, et al. (2001). J. Biol. Chem. 276, 4365-4372.
[0017] Hofmann, K. (2000). TIBS 25, 111-112.
[0018] Kadowaki, et al. (1996). Genes Dev. 10, 3116-3128.
[0019] Kaiya, et al. (2001). J. Biol. Chem. 276, 40441-40448.
[0020] Kaiya, et al. (2004). Gen. Comparative Endocrin. 138, 50-57.
[0021] Kamegai, et al. (2001). Diabetes 50, 2438-2443.
[0022] Kapust, et al. (2001). Protein Eng. 14, 993-1000.
[0023] Karreman, C. (1998). BioTechniques 24, 736-742.
[0024] Kojima, et al. (1999). Nature 402, 656-660.
[0025] Kojima, M. and Kangawa, K. (2005). Physiol. Rev. 85, 495-522.
[0026] Miyazaki, et al. (1990). Endocrinology 127, 126-132.
[0027] Nishi, et al. (2005). Endocrinology 146, 2255-2264.
[0028] Nohturfft, et al. (2000). Cell 102, 315-323.
[0029] Small, C. J. and Bloom, S. R. (2004). Trends Endocrin. Metabolism 15, 259-263.
[0030] Takada, et al. (2006). Dev. Cell 11, 791-801.
[0031] van der Lely, et al. (2004). Endocrine Rev. 25, 426-457.
[0032] Walker, D. and Koonin, E. (1997). Intell. Sys. Mol. Biol. 5, 333-339.
[0033] Willert, et al. (2003). Nature 423, 448-452.
[0034] Wortley, et al. (2005). J. Clin. Invest. 115, 3573-3578.
[0035] Zhu, X., Cao, Y., Voogd, K., and Steiner, D. F. (2006). J. Biol. Chem. 281, 38867-38870.
[0036] Zigman, J. M. and Elmquist, J. K. (2006). Proc. Natl. Acad. Sci. USA 103, 12961-12962.
[0037] Zigman, et al. (2005). J. Clin. Invest. 115, 3564-3572.
[0038] Zorrilla, et al. (2006). Proc. Natl. Acad. Sci. USA 103, 13226-13231.

SUMMARY OF THE INVENTION

[0039] The invention provides methods and compositions for acylating ghrelin. In one embodiment, the invention provides a method of inhibiting acylation of ghrelin, comprising (a) combining recombinant ghrelin O-acyltransferase, ghrelin and octanoyl with an agent; and (b) detecting a resultant decrease in octanoylation of the ghrelin by the acyltransferase.

[0040] In a particular embodiment, the invention is practiced in an in vitro format, wherein the acyltransferase and ghrelin are in vitro, the octanoyl is provided in the form of labeled octanoyl-CoA, the agent is a small molecule candidate, and the detecting step detects a resultant decrease in covalent transfer of the labeled octanoyl to the ghrelin by the acyltransferase to identify the candidate as a ghrelin O-acyltransferase inhibitor.

[0041] In a particular embodiment, the method is practiced in a cell-based format, wherein the acyltransferase and ghrelin are expressed in a cell in a culture medium, the octanoyl is provided by delivering labeled octanoate to the medium, which the cell converts to labeled octanoyl-CoA, the agent is a small molecule candidate, and the detecting step detects a resultant decrease in covalent transfer of the labeled octanoyl to the ghrelin by the acyltransferase to identify the candidate as a ghrelin O-acyltransferase inhibitor.

[0042] In a more particular embodiment of the cell-based format, the acyltransferase is inducibly expressed in the cell, and the method further comprises the step of inducing expression of the acyltransferase.

[0043] The invention also provides compositions including (a) mixtures of isolated or recombinant ghrelin and isolated or recombinant ghrelin O-acyltransferase; (b) mixtures of defined amounts or concentrations of ghrelin and ghrelin O-acyltransferase; (c) mixtures of recombinant ghrelin and recombinant ghrelin O-acyltransferase; and (d) recombinant mammalian, particularly human, ghrelin O-acyltransferase.

[0044] The invention also provides recombinant expression constructs for the disclosed mammalian, particularly human, ghrelin O-acyltransferases, which typically encode the acyltransferase operably linked to a heterologous promoter, and cells comprising such constructs.

DETAILED DESCRIPTION OF SPECIFIC EMBODIMENTS OF THE INVENTION

[0045] In one embodiment, the invention provides a method of modulating acylation of ghrelin, which may be implemented as a drug screening or validation assay in cell-free (in vitro) or cell-based assay formats. In preferred embodiments, the assay is practiced with multiple candidate agents in parallel, preferably massively parallel, for high-throughput screening.

[0046] Generally these methods comprise the steps of: (a) combining recombinant ghrelin O-acyltransferase, ghrelin, and an octanoyl group with an agent; and (b) detecting a resultant decrease in octanoylation of the ghrelin by the acyltransferase. The forms of the acyltransferase, ghrelin, and octanoyl are selected to be compatible with the selected assay format, as described further below. For example, ghrelin encompasses alternative forms of ghrelin that provide operable substrates for the acyltransferase in the assay, including mature, processed ghrelin (residues 1-28), pro-ghrelin (including the C-terminal propeptide, residues 29-94), and prepro-ghrelin (including the 23-residue N-terminal signal sequence).

[0047] The combination of step (a) is incubated under conditions under which, but for the presence of the agent, the ghrelin O-acyltransferase catalyzes the specific transfer of a reference or control amount of octanoyl to the ghrelin. The detecting step then detects an agent-biased amount of octanoylation of the ghrelin, wherein a reduced agent-biased octanoylation of the ghrelin relative to the control or reference amount indicates that the agent is an inhibitor of ghrelin acylation. The detecting step is typically preceded by a wash step, which, depending on the assay format, may be facilitated with a bead column, filter, etc., whereby unreacted (not ghrelin-attached) labeled octanoyl is removed.
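The comparison of agent-biased octanoylation against the control amount can be expressed as a percent-inhibition value. The sketch below is a minimal illustration that assumes each reaction yields one scalar signal (for example, scintillation counts), that a hypothetical no-enzyme blank supplies the background, and that an agent-free reaction supplies the reference amount. The function names and the 50% cutoff are illustrative assumptions, not values from the source.

```python
def percent_inhibition(signal: float, control: float, background: float) -> float:
    """Percent reduction of specific octanoylation relative to the control.

    signal     -- counts measured with the candidate agent present
    control    -- counts measured without agent (reference amount)
    background -- counts from a hypothetical no-enzyme blank
    """
    specific_control = control - background
    if specific_control <= 0:
        raise ValueError("control signal must exceed background")
    specific_signal = signal - background
    return 100.0 * (1.0 - specific_signal / specific_control)


def is_candidate_inhibitor(signal: float, control: float, background: float,
                           cutoff: float = 50.0) -> bool:
    """Flag an agent whose inhibition meets an arbitrary, user-chosen cutoff."""
    return percent_inhibition(signal, control, background) >= cutoff
```

Subtracting the blank before taking the ratio keeps nonspecific label (for example, label not removed by the wash step) from diluting the apparent effect of an agent.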

[0048] In the in vitro format, the acyltransferase is recombinant and presented in membrane-bound or detergent-solubilized, active form, and often in a determined or quantified amount. Alternative protocols for isolating membrane-bound or detergent-solubilized active forms of the enzyme are readily practiced; see, e.g., Radhakrishnan et al., Mol. Cell 15: 259-268, 2004; Radhakrishnan et al., PNAS USA 104: 6511-6518, 2007. The ghrelin is recombinant or synthetic pro-ghrelin, and often in a determined or quantified amount. The method may optionally comprise the antecedent step of recombinantly expressing and/or isolating, and/or solubilizing the acyltransferase, and may optionally comprise the antecedent step of recombinantly expressing or synthesizing, and/or isolating the ghrelin.

[0049] The octanoyl group is typically labeled (e.g. radio- or fluorescent-labeled) and presented in a transferable, high-energy form (e.g. octanoyl-CoA) to facilitate catalytic octanoylation. In an alternative embodiment, the ghrelin is labeled. The agent is typically a small-molecule, assay-compatible candidate, and is typically part of a library or panel of compounds screened in parallel. The detecting step generally detects a resultant decrease in covalent transfer of the labeled octanoyl to the ghrelin by the acyltransferase to identify the candidate as a ghrelin O-acyltransferase inhibitor.

[0050] In a particular embodiment, the method is practiced in scintillation proximity bead assay format, wherein the ghrelin is immobilized on a bead, and radiolabeled octanoylation of the ghrelin is detected by scintillation counts. In an alternative embodiment, the octanoyl moiety is immobilized, and the ghrelin is radiolabeled.

[0051] In the cell-based format, the acyltransferase and ghrelin are expressed in a cell in a culture medium. The cell type is discretionary, so long as it is compatible with the acylation assay. Both the acyltransferase and ghrelin (the prepro-ghrelin form) are expressed by the cell, and in a preferred embodiment, the acyltransferase is inducibly expressed in the cell, and the method further comprises the step of inducing expression of the acyltransferase with a corresponding inducer (e.g. tetracycline).

[0052] The octanoyl is provided by delivering labeled octanoate to the medium, which the cell converts to labeled octanoyl-CoA. The agent is typically a small-molecule, assay-compatible candidate, and is typically part of a library or panel of compounds screened in parallel. The detecting step generally detects a resultant decrease in covalent transfer of the labeled octanoyl to the ghrelin by the acyltransferase to identify the candidate as a ghrelin O-acyltransferase inhibitor.

[0053] The invention also provides compositions including (a) mixtures of isolated or recombinant ghrelin and isolated or recombinant ghrelin O-acyltransferase; (b) mixtures of defined amounts or concentrations of ghrelin and ghrelin O-acyltransferase; (c) mixtures of recombinant ghrelin and recombinant ghrelin O-acyltransferase; and (d) recombinant mammalian, particularly human, ghrelin O-acyltransferase.

[0054] The invention also provides recombinant expression constructs for the disclosed mammalian, particularly human, ghrelin O-acyltransferases, which typically encode the acyltransferase operably linked to a heterologous promoter, and cells comprising such constructs. Methods for making recombinant ghrelin O-acyltransferase comprise culturing such cells under conditions whereby the enzyme is expressed, and optionally, isolating the enzyme.

[0055] Bioinformatic Identification and cDNA Cloning of Mouse MBOATs.

[0056] We identified sixteen members of the MBOAT family in the mouse genome, using reported MBOAT sequences (Hofmann, 2000) as queries and PSI-BLAST searches (E-value cutoff 0.005, default parameters) (Altschul et al., 1997) against the non-redundant mouse protein sequence database.

[0057] Full-length cDNAs for 15 of the 16 MBOATs were cloned by RT-PCR of total RNA isolated from the stomach of C57BL/6J mice that had been fasted for 16 hr. The cloned sequences with or without addition of sequences encoding a C-terminal Flag-tag or HA-tag were inserted into pcDNA3 or pcDNA3.1 vectors (Invitrogen) driven by the cytomegalovirus (CMV) promoter-enhancer. Primers for RT-PCR were designed according to the coding sequences available in the NCBI database. For each MBOAT without isoforms, 10 to 20 cDNA clones were sequenced in their entirety; for the three MBOATs with multiple isoforms (MBOAT1, MBOAT2, and porcupine), 60 to 80 cDNA clones were sequenced.

[0058] For one of the 16 MBOATs, we initially failed to clone a full-length cDNA. This MBOAT was designated in the NCBI database (May 2007) as "similar to O-acyltransferase (membrane bound) domain containing 1" (XM_134120). Efforts to clone its cDNA failed because the NCBI annotation at the 5' end was incorrect. As a result, the 5' primers failed to prime PCR amplification. We therefore synthesized an artificial cDNA according to the sequence of XM_134120. After obtaining four segments of DNA corresponding to nucleotides 1-391, 398-885, 907-1254, and 1261-1581 of XM_134120, we pieced them together by fusion-PCR (Karreman, 1998). On Jun. 20, 2007, the incorrect NCBI annotation of XM_134120 was replaced by two new annotations that were renamed MBOAT4, XM_001476434 and XM_001472220. These two versions of MBOAT4 differed from each other by 376 nucleotides at the 5'-end, and they differed from XM_134120 at the 5'-end in the following ways: XM_001476434 was 211 bp shorter than XM_134120, and XM_001472220 was 165 bp longer than XM_134120. To determine the correct 5'-end of the MBOAT4 mRNA, we carried out 5' rapid amplification of cDNA ends (5'-RACE) using total RNA from mouse stomach, 3' nested primers designed according to the sequence of the longer putative MBOAT4 transcript XM_001472220, and the FirstChoice RLM-RACE Kit (Ambion). The results showed that the correct annotation was XM_001476434. The current NCBI database (Nov. 27, 2007) contains partial DNA sequence information on 11 ESTs corresponding to XM_001476434. Of the 11 ESTs, only one (IMAGE 5655946) extends to the 5'-end. This sequence corresponds to the cDNA that we subsequently showed to encode ghrelin O-acyltransferase (GOAT).
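The coordinates in the preceding paragraph can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated there: it confirms that the two new MBOAT4 annotations differ from each other by 211 + 165 = 376 nucleotides at the 5'-end, and it lists the stretches of the reference sequence that lie between the four fusion-PCR segments. The source does not say how those stretches were supplied during assembly, so the sketch only reports them; the helper name is hypothetical.

```python
# Four segments of the reference sequence assembled by fusion-PCR,
# as listed in the text (1-based, inclusive nucleotide coordinates).
SEGMENTS = [(1, 391), (398, 885), (907, 1254), (1261, 1581)]

# 5'-end length differences of the two new annotations relative to the
# original reference (negative = shorter, positive = longer), from the text.
OFFSETS = {"XM_001476434": -211, "XM_001472220": +165}


def uncovered_intervals(segments):
    """Return 1-based inclusive intervals lying between consecutive segments."""
    gaps = []
    for (_, prev_end), (next_start, _) in zip(segments, segments[1:]):
        if next_start > prev_end + 1:
            gaps.append((prev_end + 1, next_start - 1))
    return gaps


# Difference between the two annotations at the 5'-end.
annotation_difference = OFFSETS["XM_001472220"] - OFFSETS["XM_001476434"]
```

Here `uncovered_intervals(SEGMENTS)` yields the intervals 392-397, 886-906, and 1255-1260, and `annotation_difference` equals the 376 nucleotides reported in the text.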

[0059] A full-length cDNA for mouse GOAT was generated by RT-PCR of total stomach RNA as described above. The chimpanzee ortholog (XP_519692) of mouse GOAT was identified by a "blastp" analysis of the non-redundant protein database. Orthologs of GOAT in other species were found by clustering identified genomic sequences with the SEALS command grouper (with criterion -1 scut=0.6) (Walker and Koonin, 1997). In genomic DNA from several species, the annotation of exons did not permit determination of the amino acid sequence at the N-terminus of the proteins. In these cases we used the N-terminal amino acid sequence translated from mouse cDNA as a query, which allowed us to identify complete GOAT ortholog amino acid sequences through tblastn searches. The reference numbers for the corresponding genomic DNA sequences were as follows: rat (NW_047474.1), human (NT_007995.14), bovine (NW_001494415.1), horse (NW_001799700.1), and zebrafish (NW_001513480.1). Alignments were carried out with ClustalW. cDNA sequences and their translations for representative animal GOAT species are appended hereto.

[0060] Cell Culture and Transient Transfection.

[0061] All cells were grown in monolayer at 37 °C in an atmosphere of 8.8% CO2. Mouse AtT-20 cells were cultured in medium A (Dulbecco's modified Eagle's medium (4.5 g/L glucose) supplemented with 2 mM glutamine, 10% (v/v) fetal calf serum (FCS), 100 U/ml penicillin, and 100 µg/ml streptomycin). INS-1 cells (Asfari et al., 1992) were cultured in medium B (RPMI 1640 medium supplemented with 10% FCS, 10 mM Hepes, 50 µM β-mercaptoethanol, 100 U/ml penicillin, and 100 µg/ml streptomycin). MIN-6 cells (Miyazaki et al., 1990) were cultured in medium C (Dulbecco's modified Eagle's medium (4.5 g/L glucose) supplemented with 10% FCS, 10 mM Hepes, 50 µM β-mercaptoethanol, 100 U/ml penicillin, and 100 µg/ml streptomycin).

[0062] For transient transfections, AtT-20 cells were set up on day 0 at 1 × 10⁶ cells per 100-mm dish; INS-1 cells and MIN-6 cells were set up at 1.5 × 10⁶ cells per 100-mm dish. On day 2, cells were transfected with plasmids using FuGENE HD Transfection Reagent (Roche) at a ratio of FuGENE HD to plasmids of 3:1. On day 3 or 4, cells were subjected to various treatments described herein. On day 4 or 5, cells were harvested for experiments. The total amount of transfected DNA in each experiment was constant and adjusted to 5 or 6 µg per 100-mm dish by addition of pcDNA3.1 mock vector.
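Keeping total DNA constant means each dish is topped up with mock vector, and the reagent volume follows from the fixed total. The sketch below assumes that the 3:1 FuGENE HD to plasmid ratio means µl of reagent per µg of DNA; the text does not state the units. The plasmid names and amounts passed in are hypothetical.

```python
def transfection_mix(plasmid_ug, total_ug, reagent_ratio=3.0):
    """Pad one dish's transfection with mock vector and size the reagent.

    plasmid_ug    -- dict of experimental plasmid name -> µg for one 100-mm dish
    total_ug      -- constant total DNA per dish (5 or 6 µg in the text)
    reagent_ratio -- FuGENE HD : DNA ratio, assumed here to be µl per µg
    """
    experimental = sum(plasmid_ug.values())
    if experimental > total_ug:
        raise ValueError("experimental plasmids exceed the fixed total")
    mix = dict(plasmid_ug)
    # Mock vector makes up the difference so every dish receives the same DNA mass.
    mix["pcDNA3.1 mock"] = total_ug - experimental
    reagent_ul = reagent_ratio * total_ug
    return mix, reagent_ul
```

Because the reagent volume depends only on the fixed total, dishes receiving different amounts of experimental plasmid still receive identical amounts of transfection reagent.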

[0063] Generation of Anti-Ghrelin Antibody

[0064] DNA segments encoding mouse pro-ghrelin and ghrelin were cloned into pGEX-4T1 (GE Healthcare) to generate glutathione S-transferase (GST)-fusion proteins. For the GST-pro-ghrelin construct, the thrombin cleavage site within the vector sequence (LVPRGS) between GST and pro-ghrelin was changed to the Tobacco Etch Virus (TEV) protease site (ENLYFQG) (Kapust et al., 2001), and a His8-tag was added to the C-terminus of pro-ghrelin. GST-pro-ghrelin-His8 and GST-ghrelin were expressed in E. coli and purified using glutathione-agarose beads. GST-pro-ghrelin-His8 was cleaved by recombinant TEV protease (produced in E. coli as a GST fusion protein) to release pro-ghrelin-His8, which was further purified by nickel-affinity chromatography (Qiagen). For immunization, each rabbit was injected subcutaneously with 500 µg GST-ghrelin in incomplete Freund's adjuvant, followed by sequential booster injections of 250 µg GST-ghrelin and 250 µg pro-ghrelin-His8, both given subcutaneously in incomplete Freund's adjuvant. The resulting rabbit anti-ghrelin antiserum recognized pro-ghrelin and ghrelin in both the desacylated and acylated forms.
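TEV protease cuts its recognition site between the glutamine and the following glycine (ENLYFQ/G), which is why the swap from the thrombin site determines where the released pro-ghrelin-His8 begins. The sketch below models that cut on a one-letter protein sequence. The flanking sequences in any call are placeholders; the source does not give the full fusion sequence, nor whether the site's glycine is also the first residue of pro-ghrelin.

```python
TEV_SITE = "ENLYFQG"  # TEV protease recognition site named in the text


def tev_cleave(fusion: str):
    """Split a fusion at the first ENLYFQG site, between Gln (P1) and Gly (P1').

    Returns (upstream fragment ending in ...ENLYFQ, downstream fragment starting with G).
    """
    idx = fusion.find(TEV_SITE)
    if idx < 0:
        raise ValueError("no TEV site found")
    cut = idx + len(TEV_SITE) - 1  # index of the Gly that starts the released fragment
    return fusion[:cut], fusion[cut:]
```

With a placeholder fusion such as `"AAAA" + "ENLYFQG" + "SSS"`, the released fragment begins `"GSSS"`, so the glycine of the site is carried over to the liberated protein.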

[0065] Peptide Extraction from Cultured Cells.

[0066] Peptides were extracted from cultured cells using the protocol described by Kojima et al. (Kojima et al., 1999). After harvesting, the cell pellet was boiled in 1-2 ml of H2O for 10 min to inactivate proteases and then cooled on ice, after which acetic acid and HCl were added directly to achieve final concentrations of 1 M and 20 mM, respectively. The cell lysate was further disrupted by passage through a 22-gauge needle 10 times, followed by centrifugation at 20,000 g for 10 min at 4 °C. The resulting supernatant was concentrated under vacuum to ~20% of the original volume, subjected to 67% (v/v) acetone precipitation, and centrifuged at 20,000 g for 10 min at 4 °C to remove the precipitate. The supernatant was evaporated under vacuum, and the residue was solubilized for SDS-PAGE and immunoblot analysis, or for reverse-phase chromatography followed by SDS-PAGE and immunoblot analysis, as described below.
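The protocol specifies only final acid concentrations, so the volumes to add depend on the stocks used. Because both acids are added to the same boiled lysate, each addition dilutes the other, and the two volumes solve a small linear system. The sketch below assumes glacial acetic acid at about 17.4 M, concentrated HCl at about 12 M, and additive volumes (an approximation); none of these stock values come from the source.

```python
def acid_volumes(v0_ml, acetic_stock_m=17.4, hcl_stock_m=12.0,
                 acetic_final_m=1.0, hcl_final_m=0.020):
    """Volumes (ml) of acetic acid and HCl stocks to add to v0_ml of lysate.

    Solves  Ca*a = Fa*(V0 + a + h)  and  Ch*h = Fh*(V0 + a + h)
    for a (acetic acid) and h (HCl), assuming volumes are additive.
    Stock concentrations are assumptions, not values from the protocol.
    """
    det = (acetic_stock_m * hcl_stock_m
           - acetic_stock_m * hcl_final_m
           - acetic_final_m * hcl_stock_m)
    acetic_ml = acetic_final_m * v0_ml * hcl_stock_m / det
    hcl_ml = hcl_final_m * v0_ml * acetic_stock_m / det
    return acetic_ml, hcl_ml
```

For a 1-ml boiled pellet, this gives roughly 61 µl of glacial acetic acid and under 2 µl of concentrated HCl under the stated assumptions.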

[0067] Immunoblot Analysis of Pro-Ghrelin and Ghrelin

[0068] The pellet containing the extracted peptides was dissolved in SDS-PAGE loading buffer (0.1 M Tris-chloride at pH 6.8, 5% (w/v) SDS, 0.1 M dithiothreitol, and 5% (v/v) glycerol), subjected to 16% Tricine SDS-PAGE, and then transferred to Immobilon-P PVDF membranes (Millipore) for immunoblot analysis. To prevent the diffusion of ghrelin during the blotting procedure, we washed each membrane three times with Phosphate-Buffered Saline (PBS) containing 0.05% Tween-20 (Sigma), after which the membrane was fixed at room temperature for 15 min in 50 mM Hepes-NaOH (pH 7.4) containing 2.5% (v/v) glutaraldehyde. The membrane was washed three times with the PBS/Tween-20 solution and then immunoblotted with either a 1:1000 dilution of anti-ghrelin antiserum or 0.5 µg/ml of anti-Flag M2 monoclonal antibody. Bound antibodies were visualized by chemiluminescence using a 1:10,000 dilution of either donkey anti-rabbit IgG or donkey anti-mouse IgG conjugated to horseradish peroxidase. All membranes were exposed to Phoenix Blue X-ray film for 5 sec to 2 min at room temperature.

[0069] Separation of Desacyl-Ghrelin and Acyl-Ghrelin by Reverse-phase Chromatography

[0070] The residue remaining after evaporation of the acetone was dissolved in 3 ml of 2% (v/v) CH3CN in 0.1% (v/v) trifluoroacetic acid (TFA) and loaded onto a 360-mg Sep-Pak C18 cartridge (Waters). The cartridge was washed with 3 ml of 2% CH3CN in 0.1% TFA and eluted with a step gradient consisting of 6 ml each of 20%, 40%, and 80% CH3CN in 0.1% TFA. The first 3 ml of each 6-ml elution was collected and evaporated under vacuum; the residue was dissolved in 80 µl of SDS-PAGE loading buffer, and 20-µl aliquots were subjected to SDS-PAGE and immunoblot analysis as described above.

[0071] Hydroxylamine Treatment

[0072] After evaporation of the 40%-CH3CN fraction from reverse-phase chromatography, the residue was suspended in 0.4 ml of solution containing 20 mM Tris-chloride (pH 8.0), 100 mM NaCl, 1 mM sodium EDTA, and Protease Inhibitor Cocktail (Roche). An aliquot of each sample (0.2 ml) was mixed with 0.2 ml of either 2 M Tris-chloride (pH 8.0) or 2 M hydroxylamine (pH 8.0) and then rotated at room temperature for 2 hr, after which the reaction was stopped by adding 0.5 ml of 1 M acetic acid. The sample was further diluted in 10 ml of 2% CH3CN in 0.1% TFA and then subjected to reverse-phase chromatography as described above.

[0073] N-Terminal Sequencing of Pro-Ghrelin and Its C-Terminal Peptide

[0074] INS-1 cells transfected with a cDNA encoding prepro-ghrelin containing a C-terminal Flag-tag were harvested by scraping on day 4 and washed once with PBS. Cells from thirty 100-mm dishes were solubilized in PBS containing 0.1% (v/v) Triton X-100, 1 mM sodium EDTA, and Protease Inhibitor Cocktail. After centrifugation at 100,000 g for 30 min at 4 °C, a small aliquot of the supernatant (~1%) was subjected to SDS-PAGE and immunoblotted with anti-Flag M2 monoclonal antibody. The remainder of the supernatant was treated with 100 µl of anti-Flag M2 Affinity Gel. After overnight incubation at 4 °C, the bound proteins were eluted by heating the gel at 95 °C for 5 min in 25 mM Tris-chloride (pH 6.8) containing 1% SDS. After centrifugation at 20,000 g for 5 min, an aliquot of the supernatant (25% of total) was loaded onto a 16% Tricine SDS-PAGE gel. After electrophoresis, proteins were transferred to an Immobilon-PSQ PVDF membrane (Millipore) and stained with 0.1% (w/v) amido black in 5% (v/v) acetic acid. After destaining with 5% acetic acid, appropriate bands were excised from the membrane and subjected to Edman degradation using the Procise 494 Protein Sequencing System (Perkin-Elmer).

[0075] [³H]Octanoate Autoradiography and Identification of [³H]Fatty Acid

[0076] [³H]Octanoate-labeled INS-1 cells were processed as described herein and then subjected to autoradiography with a Kodak Transcreen LE Intensifying Screen and Biomax MS Film at -80 °C for 5 days. Radioactivity in the PVDF membrane was quantified by cutting each lane into 9 consecutive pieces from top to bottom, followed by liquid scintillation counting in 10 ml of counting cocktail (3a70B™, Research Products International Corp.).

[0077] To confirm the identity of the ³H-labeled fatty acid linked to pro-ghrelin and ghrelin, fatty acid methyl ester (FAME) analysis was carried out. Two dishes of transfected cells were radiolabeled with [³H]octanoate. After reverse-phase chromatography, proteins in the 40%-CH3CN fraction were subjected to SDS-PAGE and transferred to a PVDF membrane. The pieces of membrane containing ³H-labeled pro-ghrelin and ghrelin were cut out, pooled, and treated with 0.5 ml of 0.1 M KOH in 100% methanol at room temperature for 2 hr to form FAME. After the sample was acidified with 0.5 ml of 1.0 M HCl, the aqueous phase was extracted twice with 0.1 ml hexane. An aliquot of the pooled organic phase (50 µl) was mixed with 50 µg of each FAME standard (methyl hexanoate, methyl octanoate, methyl decanoate, methyl dodecanoate, methyl myristate, and methyl palmitate) and loaded onto a C18 reverse-phase thin-layer chromatography (TLC) plate (150 µm, 10 × 10 cm, Analtech). The TLC plate was developed in a solvent system of acetone/methanol/water (80:20:10, v/v/v), and FAME standards were revealed by iodine vapor counter-staining. Each TLC lane was divided into strips numbered 1 to 14 from the origin to the front, with strips 6 to 11 containing the FAME standards. The resin on each strip was then scraped off and subjected to liquid scintillation counting as described above.
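Reading the TLC result amounts to finding which of the 14 strips carries the tritium and whether it falls in the region occupied by the standards. The sketch below takes per-strip scintillation counts (strip 1 at the origin) and reports the peak strip and the fraction of counts in strips 6 to 11. The text does not say which standard ran on which strip, so matching the peak to a particular FAME would require recording the iodine-stained positions for each plate; the function name is hypothetical.

```python
def summarize_tlc(counts, standards=range(6, 12)):
    """Summarize per-strip scintillation counts from one TLC lane.

    counts    -- 14 values; counts[i] belongs to strip i + 1 (1 = origin, 14 = front)
    standards -- strip numbers occupied by the FAME standards (6 to 11 in the text)
    Returns (peak strip number, fraction of total counts within the standards region).
    """
    if len(counts) != 14:
        raise ValueError("expected 14 strips")
    total = sum(counts)
    if total <= 0:
        raise ValueError("lane has no counts")
    peak_strip = max(range(1, 15), key=lambda s: counts[s - 1])
    in_standards = sum(counts[s - 1] for s in standards)
    return peak_strip, in_standards / total
```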

[0078] GOAT mRNA Expression in Mouse Tissues

[0079] Six-month-old male C57BL/6J mice were fed a chow diet ad libitum prior to study. At the end of the dark phase, mice were anesthetized and exsanguinated. Various tissues were collected, snap-frozen in liquid nitrogen, and stored at -80 °C. The stomach, small intestine, and colon were flushed with cold PBS, after which the small intestine was divided into three equal lengths, designated duodenum (proximal), jejunum (medial), and ileum (distal). Each flushed segment of the gastrointestinal tract was cut open with small scissors, and the mucosa was carefully scraped off and placed in a tube for RNA preparation. Total RNA was prepared from mouse tissues using an RNA STAT-60 kit from Tel-Test Inc. (Friendswood, Tex., USA). Equal amounts of RNA from four mice were pooled and analyzed for mRNA expression of GOAT, ghrelin, and β-actin using the TITANIUM™ One-Step RT-PCR Kit (Clontech). Each reaction contained 1 µg of pooled total RNA isolated from different mouse tissues as described above, together with primers. The cycling parameters were 94 °C, 30 sec; 60 °C, 30 sec; and 68 °C, 30 sec. The numbers of cycles for GOAT, ghrelin, and β-actin were 35, 30, and 25, respectively. Aliquots (20 µl) of the 50-µl RT-PCR samples were loaded onto a 1.5% agarose gel.
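Pooling equal amounts of RNA from four mice means drawing a different volume from each preparation according to its concentration. The sketch below computes those volumes and optionally checks them against the volume remaining in each tube; the concentrations and amounts used with it are hypothetical inputs, since the text reports neither.

```python
def pooling_volumes(conc_ug_per_ul, ug_each, available_ul=None):
    """Return the µl of each RNA prep that supplies ug_each µg to the pool.

    conc_ug_per_ul -- measured RNA concentration of each mouse's prep
    ug_each        -- equal mass contributed by every mouse
    available_ul   -- optional remaining volume of each prep, for a sanity check
    """
    volumes = [ug_each / c for c in conc_ug_per_ul]
    if available_ul is not None:
        for v, avail in zip(volumes, available_ul):
            if v > avail:
                raise ValueError("not enough RNA in one prep for the requested amount")
    return volumes
```

Since each RT-PCR reaction uses 1 µg of pooled RNA, contributing `ug_each` µg from each of four mice yields enough pooled RNA for `4 * ug_each` reactions per tissue.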

[0080] Exemplary Results

[0081] We determined the conserved sequences in the putative catalytic domains of mammalian proteins that belong to the MBOAT family. These 11 catalytic domains are found in 16 MBOAT proteins, because two of the encoding genes give rise to 2 isoforms each and one gives rise to 4 isoforms as a result of alternative splicing. We identified these sequences through a search of genomic databases (herein). These enzymes are postulated to transfer fatty acyl groups to hydroxyl or sulfhydryl groups, forming ester or thioester bonds. Among the known substrates are lipids such as cholesterol and diacylglycerol. At least one protein, Wnt, is thought to be a substrate by virtue of an acylated serine (Takada et al., 2006). As described below, MBOAT4 mediates the octanoylation of ghrelin, and hence it is designated GOAT. The substrates for seven of the putative MBOATs (MBOAT1-a/b, MBOAT2-a/b, MBOAT5, LRC4, and GUP1) remain unknown.
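The count of 11 catalytic domains in 16 proteins follows from the splicing pattern described above: eight genes each encode one protein, two genes encode two isoforms each, and one gene encodes four. A minimal bookkeeping check follows. It assumes, as the paragraph implies, that isoforms of one gene share a catalytic domain. No gene names are assigned to the spliced groups, because the text does not say which gene yields four isoforms.

```python
# Isoforms per gene, as implied by the text: 8 genes x 1, 2 genes x 2, 1 gene x 4.
isoforms_per_gene = [1] * 8 + [2] * 2 + [4]

n_genes = len(isoforms_per_gene)      # one catalytic domain per gene (assumed)
n_proteins = sum(isoforms_per_gene)   # alternative splicing multiplies proteins
print(n_genes, n_proteins)

# The seven MBOATs whose substrates remain unknown, as listed in the text.
unknown = ["MBOAT1-a", "MBOAT1-b", "MBOAT2-a", "MBOAT2-b", "MBOAT5", "LRC4", "GUP1"]
print(len(unknown))
```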

[0082] We prepared a hydropathy plot of mouse GOAT. The plot predicts eight transmembrane segments, a finding in keeping with the other MBOATs, all of which have multiple membrane-spanning helices. The GOAT sequence is highly conserved in mammalian and avian species, and a close relative is found in zebrafish. The putative catalytic asparagine and histidine residues are conserved in all of these sequences.
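The paragraph does not say which hydropathy scale or window size was used. The sketch below assumes the Kyte-Doolittle scale with a 19-residue window, a common choice for flagging candidate transmembrane helices, where windows averaging above about 1.6 are treated as candidates. The function names are hypothetical. Whether this particular setting reproduces the eight segments reported above depends on the scale and window, so treat the output as illustrative.

```python
from typing import List, Tuple

# Kyte-Doolittle hydropathy values (Kyte and Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> List[float]:
    """Mean hydropathy of each full window; entry i covers seq[i:i + window]."""
    values = [KD[aa] for aa in seq.upper()]
    if len(values) < window:
        return []
    total = sum(values[:window])
    profile = [total / window]
    for i in range(window, len(values)):
        # Slide the window one residue: add the new residue, drop the oldest.
        total += values[i] - values[i - window]
        profile.append(total / window)
    return profile


def candidate_segments(seq: str, window: int = 19,
                       threshold: float = 1.6) -> List[Tuple[int, int]]:
    """1-based (start, end) residue ranges covered by windows above threshold."""
    segments: List[Tuple[int, int]] = []
    for i, score in enumerate(hydropathy_profile(seq, window)):
        start, end = i + 1, i + window
        if score > threshold:
            if segments and start <= segments[-1][1] + 1:
                segments[-1] = (segments[-1][0], end)  # merge overlapping windows
            else:
                segments.append((start, end))
    return segments


# First 100 residues of mouse GOAT (SEQ ID NO:02).
MOUSE_GOAT_N100 = ("MDWLQLFFLHPLSFYQGAAFPFALLFNYLCILDTFSTRARYLFLLAGGGV"
                   "LAFAAMGPYSLLIFIPALCAVALVSFLSPQEVHRLTFFFQMGWQTLCHLG")
print(candidate_segments(MOUSE_GOAT_N100))
```

Reported ranges are window coverage, so they slightly overstate each helix; dedicated topology predictors refine the boundaries.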

[0083] As a first step in identifying the enzyme that octanoylates ghrelin, we sought to identify cultured cells that process pro-ghrelin to ghrelin. For this purpose we expressed prepro-ghrelin in a variety of cultured cell lines by cDNA transfection. Prepro-ghrelin contains 117 amino acids (Kojima and Kangawa, 2005). Cleavage of the 23-amino-acid signal sequence yields pro-ghrelin, which has glycine as its N-terminal residue, hereafter designated residue 1. The C-terminus of mature ghrelin is generated by prohormone convertase 1/3, which cleaves after arginine-28 of pro-ghrelin to release the mature 28-amino-acid peptide (Zhu et al., 2006).
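The residue numbering above (residue 1 = the first residue after the signal sequence) can be made explicit with a few lines of arithmetic. The 28-residue sequence below is an assumption: it is the mature rodent ghrelin sequence as we understand it from the literature, not a sequence given in this document. It agrees with the Edman read GSSFL, the serines at positions 2, 3, 6 and 18, and arginine-28 described here.

```python
# Numbering convention from the text: residue 1 is the N-terminal glycine of
# pro-ghrelin, i.e. the first residue after the 23-residue signal sequence.
PREPRO_LENGTH = 117   # prepro-ghrelin (Kojima and Kangawa, 2005)
SIGNAL_LENGTH = 23    # signal sequence
MATURE_LENGTH = 28    # prohormone convertase 1/3 cleaves after arginine-28

# Assumed mature rodent ghrelin sequence (not listed in this document).
GHRELIN = "GSSFLSPEHQKAQQRKESKKPPAKLQPR"


def prepro_position(pro_residue: int) -> int:
    """Convert a pro-ghrelin residue number into prepro-ghrelin numbering."""
    if not 1 <= pro_residue <= PREPRO_LENGTH - SIGNAL_LENGTH:
        raise ValueError("residue outside pro-ghrelin")
    return pro_residue + SIGNAL_LENGTH


pro_length = PREPRO_LENGTH - SIGNAL_LENGTH
c_segment_length = pro_length - MATURE_LENGTH  # residues after arginine-28

print(pro_length, c_segment_length)
print(GHRELIN[0], GHRELIN[MATURE_LENGTH - 1])    # Gly-1 and Arg-28
print(prepro_position(3), prepro_position(28))   # Ser-3 and Arg-28, prepro numbering
```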

[0084] After transfection, cell extracts were subjected to SDS-PAGE and immunoblotted with a polyclonal antibody that we raised against mouse ghrelin. All of the transfected cells produced an immunoreactive peptide with an apparent molecular mass of 12 kDa that corresponds to pro-ghrelin with the signal sequence removed. Three endocrine cell lines--mouse pituitary AtT-20 cells, rat insulinoma INS-1 cells, and mouse insulinoma MIN-6 cells--all produced a smaller peptide with an apparent molecular mass of 3 kDa that corresponds to ghrelin. Two non-endocrine cell lines--human kidney HEK-293 cells and Chinese hamster ovary (CHO-7) cells--failed to produce mature ghrelin.

[0085] To confirm that the mature ghrelin band resulted from cleavage at arginine-28 of pro-ghrelin, we prepared cDNAs encoding mutant forms of prepro-ghrelin with amino acid substitutions at or near arginine-28. The cDNAs were transfected into INS-1 cells, and mature ghrelin was identified by SDS-PAGE and immunoblotting. Replacement of arginine-28 with either lysine or leucine abolished cleavage, whereas replacement of residue 26 or 27 with an arginine reduced cleavage, but did not abolish it.
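The cleavage-site mutants above use standard substitution notation (wild-type residue, position, new residue). The helper below is hypothetical, and it reuses the assumed rodent ghrelin sequence. With that sequence, "residue 26 or 27 to arginine" would be written Q26R and P27R. The helper checks the wild-type residue so that a mis-numbered mutation fails loudly instead of silently mutating the wrong position.

```python
import re

# Assumed mature rodent ghrelin sequence (not listed in this document).
GHRELIN = "GSSFLSPEHQKAQQRKESKKPPAKLQPR"


def apply_mutation(seq: str, mutation: str) -> str:
    """Apply a substitution written as <wt><position><new>, e.g. 'R28K'.

    Positions are 1-based in the numbering of `seq`; the wild-type residue
    is verified before substitution.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation: {mutation!r}")
    wt, pos, new = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= pos <= len(seq):
        raise ValueError(f"position {pos} outside sequence")
    if seq[pos - 1] != wt:
        raise ValueError(f"expected {wt} at {pos}, found {seq[pos - 1]}")
    return seq[:pos - 1] + new + seq[pos:]


# Substitutions at or near the PC1/3 site, as tested in INS-1 cells.
for m in ["R28K", "R28L", "Q26R", "P27R"]:
    print(m, apply_mutation(GHRELIN, m)[-4:])
```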

[0086] To further confirm the sites of cleavage that generate ghrelin, we prepared a cDNA encoding prepro-ghrelin with a Flag-tag at the C-terminus. We introduced this cDNA into INS-1 cells and isolated the Flag-tagged peptides by adherence to an immunoaffinity gel. SDS-PAGE was used to separate the Flag-tagged pro-ghrelin and the Flag-tagged C-terminal peptide that was generated after cleavage at arginine-28 of ghrelin. The separated peptides were then transferred to PVDF membranes and processed for Edman degradation. The N-terminal sequence of pro-ghrelin was GSSFL, which is consistent with cleavage of the signal sequence at the position determined herein. The N-terminal sequence of the smaller fragment, ALEG, is consistent with cleavage after arginine-28 of ghrelin. Considered together, these data indicate that the INS-1 cells process prepro-ghrelin at the correct sites to produce authentic mature ghrelin.

[0087] We next developed a reverse-phase chromatographic procedure to separate octanoylated ghrelin from desacyl-ghrelin. For use as standards, we purchased synthetic octanoylated and desacyl-ghrelin (herein). The peptides were applied to a C18 reverse-phase cartridge and eluted with a step gradient of 20%, 40%, and 80% CH₃CN in 0.1% TFA. The eluted peptides were subjected to SDS-PAGE and immunoblotted with anti-ghrelin. Desacyl-ghrelin was eluted in the 20%-CH₃CN fraction, and octanoyl-ghrelin was eluted in the 40%-CH₃CN fraction. To determine whether any of the endocrine cell lines could produce octanoylated ghrelin, we transfected the cells with a cDNA encoding prepro-ghrelin and subjected the extracted peptides to reverse-phase chromatography. All of the ghrelin peptides were eluted in the 20%-CH₃CN fraction, indicating that none of them was octanoylated.

[0088] We performed a series of experiments designed to determine whether any of the 16 MBOATs could produce octanoylated ghrelin when expressed with prepro-ghrelin in INS-1 cells. We first prepared cDNAs encoding each of the MBOATs with a C-terminal Flag-tag. When transfected into INS-1 cells, all of these cDNAs produced MBOAT protein that could be detected by SDS-PAGE and immunoblotting with anti-Flag. These cDNAs were then transfected into INS-1 cells together with a cDNA encoding prepro-ghrelin. The ghrelin peptides were extracted and subjected to reverse-phase chromatography. GOAT was the only MBOAT that produced acylated ghrelin, which was detected as a 3-kDa band that emerged in the 40%-CH₃CN fraction. To confirm the acylating activity of GOAT, we repeated the co-transfection experiment. When the prepro-ghrelin cDNA was transfected together with a control cDNA (pcDNA3.1), ghrelin emerged in the 20%-CH₃CN fraction, indicating a lack of acylation. We noted that pro-ghrelin emerged in the 40%- and 80%-CH₃CN fractions even though it was presumably not acylated; we attribute this to the known tendency of longer peptides to adhere to reverse-phase resins. When the GOAT cDNA was co-transfected, approximately half of the ghrelin emerged in the 40%-CH₃CN fraction, indicating acylation. The elution pattern of pro-ghrelin was the same as in the control cells transfected with pcDNA3.1.
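"Approximately half of the ghrelin emerged in the 40% fraction" is a ratio of band intensities across the step-gradient fractions. The text does not describe any densitometry, so the helper and the intensity values below are hypothetical. The sketch counts only the 3-kDa ghrelin band: pro-ghrelin is excluded because it is retained in the 40% and 80% fractions even when unacylated.

```python
from typing import Mapping


def fraction_acylated(intensity: Mapping[str, float]) -> float:
    """Fraction of the ghrelin signal found in the 40%-CH3CN (acylated) fraction.

    `intensity` maps fraction labels ('20%', '40%', '80%') to background-
    subtracted intensities of the 3-kDa ghrelin band only.
    """
    total = sum(intensity.values())
    if total <= 0:
        raise ValueError("no ghrelin signal")
    return intensity.get("40%", 0.0) / total


# Invented densitometry values for illustration.
control = {"20%": 1000.0, "40%": 20.0, "80%": 0.0}   # pcDNA3.1 co-transfection
goat = {"20%": 520.0, "40%": 480.0, "80%": 0.0}      # GOAT co-transfection
print(f"pcDNA3.1: {fraction_acylated(control):.0%}")
print(f"GOAT:     {fraction_acylated(goat):.0%}")
```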

[0089] The activity of GOAT was not restricted to INS-1 cells. Expression of GOAT led to acylation of ghrelin in each of the three endocrine cell lines that were capable of processing pro-ghrelin to ghrelin. Our data confirm that the GOAT protein was expressed in the three transfected cell lines.

[0090] To confirm that ghrelin was acylated by GOAT, we tested the lability of the modification to hydroxylamine treatment, which is known to release ester-bound fatty acids from proteins (Bizzozero, 1995). When synthetic octanoylated ghrelin was treated with 1 M hydroxylamine (pH 8), the peptide no longer eluted from the reverse-phase cartridge in the 40%-CH₃CN fraction. Treatment with 1 M Tris-chloride (pH 8) had no such effect. We then examined the effect of hydroxylamine on peptide extracts obtained from INS-1 cells transfected with cDNAs encoding prepro-ghrelin and GOAT. When treated with 1 M Tris-chloride, ghrelin eluted from the reverse-phase cartridge in the 40%-CH₃CN fraction, but when treated with 1 M hydroxylamine it reverted to the 20%-CH₃CN fraction, indicating that it had been deacylated.

[0091] Octanoylation of ghrelin in vivo is known to occur at serine-3 of the peptide. Mutation of serine-3 to alanine prevented acylation by GOAT, indicating that GOAT acylates the physiologic serine residue. Replacement of serine-3 with threonine preserved acylation, a finding consistent with the observation that this position is occupied by an octanoylated threonine in bullfrog ghrelin (Kaiya et al., 2001). Substitution of alanine for other serines in ghrelin (residues 2, 6, and 18) did not affect acylation.
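The serine mutants above form an alanine scan of every serine in ghrelin. The sketch below is hypothetical and assumes the same rodent ghrelin sequence used elsewhere, which is not listed in this document. It locates the serines and builds the corresponding S→A mutants, plus the S3T variant that mimics the octanoylated threonine of bullfrog ghrelin.

```python
# Assumed mature rodent ghrelin sequence (not listed in this document).
GHRELIN = "GSSFLSPEHQKAQQRKESKKPPAKLQPR"


def residue_positions(seq: str, residue: str) -> list:
    """1-based positions of every occurrence of `residue`."""
    return [i + 1 for i, aa in enumerate(seq) if aa == residue]


def alanine_scan(seq: str, residue: str = "S") -> dict:
    """Map mutation names such as 'S3A' to the corresponding mutant sequence."""
    return {
        f"{residue}{pos}A": seq[:pos - 1] + "A" + seq[pos:]
        for pos in residue_positions(seq, residue)
    }


print(residue_positions(GHRELIN, "S"))   # serines tested in the text
mutants = alanine_scan(GHRELIN)
print(sorted(mutants))

# Threonine at position 3, the substitution that preserved acylation.
s3t = GHRELIN[:2] + "T" + GHRELIN[3:]
print(s3t[:5])
```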

[0092] Bioinformatic analysis (supra) proposed that the catalytic residues in mouse GOAT would be asparagine-307 and histidine-338. Our data demonstrate that both of these residues are required in order for GOAT to modify ghrelin. Substitution of either of these residues with alanine abolished GOAT's ability to acylate ghrelin. Another mutation (cysteine-181 to alanine) had no effect. We determined that all of the GOAT cDNAs were expressed at similar levels in the transfected cells.

[0093] To confirm that GOAT modifies ghrelin with octanoate, we transfected INS-1 cells with cDNAs encoding wild-type or mutant versions of prepro-ghrelin and GOAT. The cells were incubated with [³H]octanoate, and the extracted peptides were subjected to reverse-phase chromatography. Each 40%-CH₃CN fraction was subjected to SDS-PAGE, after which the radiolabeled peptides were transferred to duplicate PVDF membranes. One membrane was subjected to immunoblot analysis with anti-ghrelin, demonstrating that pro-ghrelin was present in all lanes while ghrelin was detected only in lane 2. The other membrane was subjected to autoradiography to visualize the labeled proteins. For quantification, each lane of the membrane was cut into 9 slices, which were then subjected to scintillation counting. When the cells were transfected with the GOAT cDNA, labeled peptides were observed at the positions of pro-ghrelin and ghrelin. As expected, no radioactivity was incorporated into the S3A mutant of ghrelin. Lane 4 shows the result when prepro-ghrelin contained leucine in place of arginine at the residue corresponding to position 28 of ghrelin. This substitution prevents the cleavage of pro-ghrelin to ghrelin; in this case, we observed radiolabeling of the pro-ghrelin band, but there was no ghrelin band. We observed no labeled band when the cells were transfected with a cDNA encoding a catalytically inactive mutant of GOAT (H338A). As a further control, we found that transfection of a cDNA encoding another MBOAT (MBOAT1-a) failed to produce a radiolabeled band.

[0094] To confirm that the cells had incorporated [³H]octanoate without changing its length, we removed the labeled fatty acid from the protein by methanolysis and subjected the methyl ester to thin-layer chromatography (TLC) in a system that separates fatty acid methyl esters according to chain length. Scintillation counting of the TLC plate confirmed that the material attached to pro-ghrelin and ghrelin was the eight-carbon [³H]octanoate.

[0095] Finally, we used semi-quantitative RT-PCR to compare the levels of GOAT and prepro-ghrelin mRNAs in various tissues of the mouse. As previously reported (Kojima et al., 1999), prepro-ghrelin mRNA was expressed most highly in the stomach, followed by the intestine, with very little expression in other tissues. Likewise, GOAT mRNA was highest in stomach and detectable in the small intestine and colon, but not in other tissues. In stomach, the amount of GOAT mRNA appeared to be much lower than the amount of prepro-ghrelin mRNA: even after 35 cycles of PCR, the amplified GOAT product was less intense than the prepro-ghrelin product after only 30 cycles. Quantitative RT-PCR confirmed this difference and placed it at ~200-fold.
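The band comparison gives only a lower bound. Assume ideal doubling every cycle and equal efficiency for both amplicons, both still in the exponential phase. Then a GOAT product that remains weaker after five extra cycles implies at least 2⁵ = 32-fold less starting template. The ~200-fold figure comes from quantitative RT-PCR. The sketch below converts between threshold-cycle (Ct) differences and fold differences using the idealized (1 + E)^ΔCt relation; the helper names are hypothetical and no Ct values from this study are used.

```python
import math


def fold_from_delta_ct(delta_ct: float, efficiency: float = 1.0) -> float:
    """Fold difference in starting template implied by a Ct difference.

    efficiency = 1.0 means perfect doubling each cycle.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return (1.0 + efficiency) ** delta_ct


def delta_ct_for_fold(fold: float, efficiency: float = 1.0) -> float:
    """Ct difference expected for a given fold difference."""
    if fold <= 0:
        raise ValueError("fold must be positive")
    return math.log(fold) / math.log(1.0 + efficiency)


# Band comparison: GOAT at 35 cycles still weaker than prepro-ghrelin at 30.
print(fold_from_delta_ct(35 - 30))         # lower bound under ideal doubling
# A ~200-fold difference corresponds to roughly 7.6 cycles.
print(round(delta_ct_for_fold(200.0), 2))
```

Real reactions run below 100% efficiency and plateau late, which is one reason end-point band intensities only bracket the true ratio.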

[0096] GOAT-ghrelin Acylation Assays

[0097] To facilitate screening for inhibitors of GOAT-mediated ghrelin acylation, we developed specific acylation assays. In one embodiment, membranes enriched in GOAT stimulate the octanoylation of recombinant pro-ghrelin when incubated with [³H]octanoyl-CoA as the source of the [³H]octanoyl group. When the assay contained membranes from INS-1 cells that had been transfected with GOAT cDNA, the amount of ³H radioactivity covalently linked to pro-ghrelin increased 5-fold above the background observed in assays containing membranes from mock-transfected INS-1 cells. No such increase was seen when the S3A mutant version of pro-ghrelin was incubated with wild-type GOAT-containing membranes, or when wild-type pro-ghrelin was incubated with membranes enriched in the catalytically impaired H338A mutant version of GOAT.
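For inhibitor screening, the raw counts are usually expressed as signal over background and as percent inhibition of the GOAT-dependent window. The sketch below is a hypothetical analysis helper and not part of the described protocol. It takes mock-membrane assays as background and uses invented duplicate counts for a vehicle control and one candidate agent.

```python
from statistics import mean
from typing import Sequence


def fold_over_background(sample_dpm: Sequence[float], mock_dpm: Sequence[float]) -> float:
    """Mean [3H] incorporation with GOAT membranes relative to mock membranes."""
    background = mean(mock_dpm)
    if background <= 0:
        raise ValueError("mock background must be positive")
    return mean(sample_dpm) / background


def percent_inhibition(agent_dpm: Sequence[float], vehicle_dpm: Sequence[float],
                       mock_dpm: Sequence[float]) -> float:
    """Inhibition of the GOAT-dependent (background-subtracted) signal."""
    window = mean(vehicle_dpm) - mean(mock_dpm)
    if window <= 0:
        raise ValueError("no GOAT-dependent signal")
    return 100.0 * (1.0 - (mean(agent_dpm) - mean(mock_dpm)) / window)


# Invented duplicate assays (dpm recovered on pro-ghrelin).
mock = [1000.0, 1000.0]
vehicle = [5100.0, 4900.0]           # wild-type pro-ghrelin + GOAT membranes
candidate_agent = [3000.0, 3000.0]   # same, plus a hypothetical test compound
print(fold_over_background(vehicle, mock))
print(percent_inhibition(candidate_agent, vehicle, mock))
```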

[0098] The acylating activity of GOAT could also be reconstituted in vitro using membranes from Sf9 insect cells that had been infected with a baculovirus encoding GOAT. When wild-type pro-ghrelin was used as the substrate, the amount of [³H]octanoyl pro-ghrelin formed was more than 5-fold higher than when the S3A mutant pro-ghrelin was used. The acylating activity of GOAT in Sf9 insect cell membranes was ~5-fold higher than that in INS-1 cell membranes.

[0099] GOAT Acylation Assay Protocols

[0100] Each assay tube, in a final volume of 50 μl, contained 50 mM Tris-chloride at pH 7.0, 2 mM Na-ATP, 5 mM MgCl₂, 1 mM Na-EDTA, 160 μg of membrane proteins from either INS-1 cells or Sf9 cells (see below), 5 μg of recombinant wild-type or mutant pro-ghrelin-His₈ (see below), and [³H-2,2',3,3']octanoyl-CoA (132 dpm/fmol, American Radiolabeled Chemicals). The tubes were sonicated in a water-bath sonicator at 4°C for 1 min, followed by incubation at 30°C for 30 min. Reactions were stopped by addition of 1 ml of buffer A (50 mM Tris-chloride at pH 7.5, 150 mM NaCl, and 0.1% (w/v) Fos-choline 13). After centrifugation at 20,000 × g for 5 min at 4°C, each supernatant was loaded onto a 0.2-ml nickel affinity column to retrieve the [³H]octanoyl-labeled pro-ghrelin. The column was washed three times with 1 ml of buffer A containing 50 mM imidazole, followed by elution with 1 ml of buffer A containing 250 mM imidazole. Radioactivity present in the eluate was counted by liquid scintillation as described above under "[³H]Octanoate Autoradiography and Identification of [³H]Fatty Acid."
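Because the specific radioactivity of the [³H]octanoyl-CoA is given (132 dpm/fmol), eluate counts can be converted into moles of octanoyl group transferred. This is a minimal sketch under stated assumptions, and the helper names are hypothetical:

- the counts have already been converted to dpm (counting efficiency corrected);
- recovery of pro-ghrelin-His₈ from the nickel column is complete;
- the blank comes from a suitable control such as mock membranes.

The count values are invented.

```python
SPECIFIC_ACTIVITY_DPM_PER_FMOL = 132.0  # [3H]octanoyl-CoA, from the protocol
MEMBRANE_PROTEIN_MG = 0.160             # 160 ug membrane protein per tube
INCUBATION_MIN = 30.0                   # 30 min at 30 degrees C


def octanoyl_transferred_fmol(eluate_dpm: float, blank_dpm: float = 0.0) -> float:
    """fmol of [3H]octanoyl recovered on pro-ghrelin-His8 in the nickel eluate."""
    net = max(eluate_dpm - blank_dpm, 0.0)
    return net / SPECIFIC_ACTIVITY_DPM_PER_FMOL


def specific_activity(eluate_dpm: float, blank_dpm: float = 0.0) -> float:
    """fmol octanoyl transferred per mg membrane protein per minute."""
    fmol = octanoyl_transferred_fmol(eluate_dpm, blank_dpm)
    return fmol / (MEMBRANE_PROTEIN_MG * INCUBATION_MIN)


# Invented counts: GOAT membranes versus a mock-membrane blank.
print(octanoyl_transferred_fmol(13200.0, blank_dpm=2640.0))
print(round(specific_activity(13200.0, blank_dpm=2640.0), 2))
```

Expressing results per mg membrane protein per minute makes INS-1 and Sf9 membrane preparations directly comparable, as in the ~5-fold comparison reported above.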

[0101] Recombinant wild-type and S3A mutant versions of pro-ghrelin-His₈ were produced as GST-fusion proteins as described above under "Generation of Anti-Ghrelin Antibody." After removal of the GST by cleavage with TEV protease, the His₈-tagged wild-type and mutant pro-ghrelins were purified by nickel-affinity chromatography and stored at -80°C at a stock concentration of 1 mg/ml in 10 mM Tris-chloride at pH 8.5, 50 mM NaCl, 10% (v/v) glycerol, and 0.01% (w/v) CHAPS.
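With a 1 mg/ml stock, the 5 μg of substrate per assay corresponds to 5 μl in the 50-μl reaction, so the storage buffer is carried over at a 1:10 dilution. The sketch below is a hypothetical helper. It assumes the stock is added undiluted, and it computes the carried-over concentration of each storage-buffer component. The Tris carried over this way adds to the 50 mM Tris already in the assay buffer.

```python
# Storage buffer of the pro-ghrelin-His8 stock (1 mg/ml), as listed in the text.
STOCK_MG_PER_ML = 1.0
STOCK_BUFFER = {
    "Tris-chloride (mM)": 10.0,
    "NaCl (mM)": 50.0,
    "glycerol (% v/v)": 10.0,
    "CHAPS (% w/v)": 0.01,
}


def stock_volume_ul(mass_ug: float, stock_mg_per_ml: float = STOCK_MG_PER_ML) -> float:
    """Microliters of stock that deliver mass_ug (1 mg/ml equals 1 ug/ul)."""
    if stock_mg_per_ml <= 0:
        raise ValueError("stock concentration must be positive")
    return mass_ug / stock_mg_per_ml


def carry_over(stock_ul: float, final_ul: float = 50.0) -> dict:
    """Final concentration of each storage-buffer component in the reaction."""
    dilution = stock_ul / final_ul
    return {name: conc * dilution for name, conc in STOCK_BUFFER.items()}


volume = stock_volume_ul(5.0)   # 5 ug substrate per 50-ul assay
print(volume)
for name, conc in carry_over(volume).items():
    print(f"{name}: {conc:g}")
```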

[0102] Two sources of GOAT-containing membrane proteins were used in the above in vitro assay: one prepared from INS-1 cells transfected with GOAT cDNA and the other from Sf9 insect cells infected with baculovirus containing GOAT cDNA. INS-1 cells were set up for experiments on day 0 as described above under "Cell Culture and Transient Transfection." On day 2, cells were transfected with 5 μg of pcDNA3.1 or 5 μg of a cDNA encoding the wild-type or H338A mutant version of mouse GOAT. On day 5, cells were harvested, and after washing once with PBS, the cell pellets were frozen at -80°C. Sf9 insect cells were infected at a density of 1 × 10⁶ cells/ml with baculovirus containing GOAT cDNA. Cells were harvested 48 hr post-infection, and after washing once with PBS, the cell pellets were frozen at -80°C. Insertion of GOAT cDNA into pFastBac HT-A (His₁₀-tag), generation of baculovirus, and culture of Sf9 cells were carried out by standard methods (Radhakrishnan et al., 2004, Mol. Cell 15, 259-268).

[0103] Each pellet of INS-1 cells or Sf9 cells was homogenized on ice in 50 mM Tris-chloride at pH 7.0, 1 mM Na-EDTA, and 40 μg/ml phenylmethanesulfonyl fluoride (PMSF) by passing it 30 times through a 22-gauge needle. After an initial centrifugation at 1,000 × g for 5 min at 4°C, the supernatant was centrifuged at 20,000 × g for 10 min at 4°C. The resulting membrane fraction (20,000 × g pellet) from five 100-mm dishes of INS-1 cells or 20 ml of Sf9 cell culture was resuspended in 0.2 ml of homogenization buffer.
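The centrifugation steps are specified as relative centrifugal force, not rotor speed. Converting between the two requires the rotor radius, via the standard relation RCF = 1.118 × 10⁻⁵ × r(cm) × rpm². The sketch below uses a hypothetical 8-cm rotor radius; the actual radius depends on the centrifuge and rotor used, which the text does not specify.

```python
import math


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) at the given rotor radius."""
    return 1.118e-5 * radius_cm * rpm ** 2


def rpm_for_rcf(rcf: float, radius_cm: float) -> float:
    """Rotor speed (rpm) needed to reach `rcf` x g at `radius_cm`."""
    if rcf <= 0 or radius_cm <= 0:
        raise ValueError("rcf and radius must be positive")
    return math.sqrt(rcf / (1.118e-5 * radius_cm))


# Hypothetical microcentrifuge rotor with an 8-cm radius.
RADIUS_CM = 8.0
for step, rcf in [("low-speed spin", 1_000), ("membrane spin", 20_000)]:
    print(f"{step}: {rcf} x g ~ {rpm_for_rcf(rcf, RADIUS_CM):,.0f} rpm")
```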

[0104] The foregoing description and examples are offered by way of illustration and not by way of limitation. All publications and patent applications cited in this specification are herein incorporated by reference as if each individual publication or patent application were specifically and individually indicated to be incorporated by reference. Although the foregoing invention has been described in some detail by way of illustration and example for purposes of clarity of understanding, it will be readily apparent to those of ordinary skill in the art in light of the teachings of this invention that certain changes and modifications may be made thereto without departing from the spirit or scope of the appended claims.

[0105] Appendix: cDNA and Protein Sequences of GOATs from 6 Mammals and Zebrafish.

[0106] Sequences were deduced by tblastn searches of NCBI genomic databases, using as the query the experimentally determined mouse GOAT protein sequence shown below.

[0107] Of the GOAT protein sequences for the 7 species shown below, only 2 of the RefSeq NCBI entries (mouse and chimpanzee) matched the N-terminus of our cloned and experimentally active mouse GOAT. The other 5 sequences (from rat, human, bovine, horse, and zebrafish) had N-termini inconsistent with the mouse start: they lacked the N-terminal segments containing the first ~50 to 100 amino acids. Apparently, the software for prediction of coding regions missed the first one or two coding exons in these 5 species. However, tblastn searches of the genomic assemblies of each of these 5 species revealed the missing N-terminal segments, each of which exhibited high sequence similarity to the mouse GOAT sequence.

[0108] Here, we list the complete protein sequences for mouse, rat, human, chimpanzee, bovine, horse, and zebrafish, and we provide DNA sequences for the coding exons of the 5 species whose N-terminal regions in the NCBI RefSeq protein database are apparently incorrect.
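The predicted N-terminal segments can be checked by concatenating the coding exons in order and translating them with the standard genetic code. The sketch below is illustrative, not the authors' pipeline (which used tblastn). The helper `missing_n_terminal` is hypothetical: it reports how many residues of a full-length translation precede a truncated database sequence. Whitespace is stripped so that wrapped sequence listings can be pasted directly.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); bases ordered T, C, A, G,
# first codon position varying slowest.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str, to_stop: bool = True) -> str:
    """Translate an in-frame coding sequence; '*' marks a stop codon."""
    seq = "".join(dna.split()).upper()   # drop whitespace from wrapped listings
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # X for ambiguous bases
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


def missing_n_terminal(full_protein: str, refseq_protein: str) -> int:
    """Residues of full_protein preceding the RefSeq sequence (-1 if absent)."""
    return full_protein.find(refseq_protein)


# Rat coding exon 1 (SEQ ID NO: 03).
RAT_EXON1 = ("ATGGATTGGCTCCAGTTCTTCTTTCTCCATCCTGTATCACTTTATCAAGGGGCTGCTTTCCCCTTCGCGC"
             "TTCTGTTTAATTATCTCTGCATCACGGAATCCTTTCCCACCCGGGCCAGG")
print(translate(RAT_EXON1))

# A full-length protein would be translate("".join(exons)) with exons in order.
```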

TABLE-US-00001 Mouse Experimentally determined mouse cDNA (method for obtaining correct cDNA described in patent) sequence after the stop codon is not included, start codon is shown in bold letters (SEQ ID NO:01) GACTTCCCTTTTACAAGGGCACCGCTTAGGGACTCTAGGAAGGACAGTGG GCCTCACATTCAGGATGGATTGGCTCCAGCTCTTTTTTCTGCATCCTT TATCATTTTATCAAGGGGCTGCATTCCCCTTTGCGCTTCTGTTTAATTAT CTCTGCATCTTGGACACCTTTTCCACCCGGGCCAGGTACCTCTTTCTCCT GGCTGGAGGAGGTGTCCTGGCTTTTGCTGCCATGGGTCCCTACTCTCTGC TCATCTTCATCCCTGCGCTCTGCGCTGTGGCTCTGGTCTCCTTCCTCAGT CCACAGGAAGTCCATAGGCTGACCTTCTTCTTTCAGATGGGCTGGCAGAC CCTGTGCCATCTGGGTCTTCACTACACCGAATACTACCTGGGTGAGCCTC CACCCGTGAGGTTCTACATCACTCTTTCTTCCCTCATGCTCTTGACGCAG AGAGTCACATCCCTCTCACTGGACATTTGTGAAGGGAAGGTGGAGGCCCC GAGGCGGGGCATCAGGAGCAAGAGTTCTTTCTCTGAGCACCTGTGGGATG CTCTACCTCATTTCAGCTACTTGCTCTTTTTCCCTGCTCTCCTGGGAGGC TCCCTGTGTTCCTTCCGGAGGTTTCAGGCTTGCGTTCAAAGATCAAGCTC TTTGTATCCGAGTATCTCTTTTCGGGCTCTGACCTGGAGGGGTCTGCAGA TTCTCGGGCTGGAGTGCCTCAAGGTGGCGCTGAGGAGCGCGGTGAGTGCT GGAGCTGGACTGGATGACTGCCAGCGGCTGGAGTGCATCTACCTCATGTG GTCCACAGCCTGGCTCTTTAAACTCACCTATTACTCCCATTGGATCCTGG ACGACTCTCTCCTCCACGCGGCGGGCTTTGGCGCTGAGGCTGGCCAGGGG CCTGGAGAGGAGGGATACGTCCCCGACGTGGACATTTGGACCCTGGAAAC TACCCACAGGATCTCCCTGTTCGCCAGGCAGTGGAACCGAAGCACAGCTC TGTGGCTCAGGAGGCTCGTCTTCCGGAAGAGCCGGCGCTGGCCCCTGCTG CAGACATTTGCCTTCTCTGCCTGGTGGCACGGGCTCCACCCAGGTCAGGT GTTCGGCTTCCTGTGCTGGTCTGTAATGGTGAAAGCCGATTATCTGATTC ACACTTTTGCCAACGTATGTATCAGATCCTGGCCCCTGCGGCTGCTTTAT AGAGCCCTCACTTGGGCTCATACCCAACTCATCATTGCCTACATCATGCT GGCGGTGGAGGGCCGGAGCCTTTCCTCTCTCTGCCAACTGTGCTGTTCTT ACAACAGTCTCTTCCCTGTGATGTACGGTCTTTTGCTTTTTCTGTTAGCG GAGAGAAAAGACAAACGTAACTGA protein sequence >gi|149258535|ref|XP_001476484.1| PREDICTED: similar to FKSGS9 [Mus musculus] (SEQ ID NO:02) MDWLQLFFLHPLSFYQGAAFPFALLFNYLCILDTFSTRARYLFLLAGGGV LAFAAMGPYSLLIFIPALCAVALVSFLSPQEVHRLTFFFQMGWQTLCHLG LHYTEYYLGEPPPVRFYITLSSLMLLTQRVTSLSLDICEGKVEAPRRGIR SKSSFSEHLWDALPHFSYLLFFPALLGGSLCSFRRFQACVQRSSSLYPSI SFRALTWRGLQILGLECLKVALRSAVSAGAGLDDCQRLECIYLMWSTAWL 
FKLTYYSHWILDDSLLHAAGFGAEAGQGPGEEGYVPDVDIWTLETTHRIS LFARQWNRSTALWLRRLVFRKSRRWPLLQTFAFSAWWHGLHPGQVFGFLC WSVMVKADYLIHTFANVCIRSWPLRLLYRALTWAHTQLIIAYIMLAVEGR SLSSLCQLCCSYNSLFPVMYGLLLFLLAERKDKRN

TABLE-US-00002 Rat coding DNA region in 3 exons >ref|NW_047474.1|Rn16_WGA1996_4:C1695518-1695399 Rattus norvegious chromosome 16 genomic contig, reference assembly (based on RGSC v3.4) ATGGATTGGCTCCAGTTCTTCTTTCTCCATCCTGTATCACTTTATCAAGGGGCTGCTTTCCCCTTCGCGC TTCTGTTTAATTATCTCTGCATCACGGAATCCTTTCCCACCCGGGCCAGG (SEQ ID NO: 03) >ref|NW_047474.1|Rn16_WGA1996_4:c1690790-1690565 Rattus norvegious chromosome 16 genomic contig, reference assembly (based on RGSC v3.4) TACCTCTTTCTCCTGGCTGGAGGAGGTGTCCTGGCTTTGGCCGCCATGGGTCCCTACGCTCTGCTCATTT TCATCCCTGCTCTCTGTGCCGTGGCTATGATCTCCTCCCTCAGTCCACAGGAAGTCCATGGGCTGACTTT CTTCTTTCAGATGGGTTGGCAAACCCTGTGCCACCTGGGTCTTCACTACAAGGAGTACTACCTGTGTGAG CCTCCCCCTGTGAGG (SEQ ID NO: 04) >ref|NW_047474.1|Rn16_WGA1996_4:c1688186-1687224 Rattus norvegious chromosome 16 genomic contig, reference assembly (based on RGSC v3.4) TTCTACATCACTCTTTCTTCCCTCATGCTCTTGACGCAGAGAGTCACGTCTCTCTCCCTGGACATTTCTG AAGGGAAGGTGGAGGCAGCGTGGAGGGGCACCAGGAGCAGGAGTTCTTTGTGTGAGCACCTGTGGGATGC TCTACCCTATATCAGCTATTTGCTCTTTTTCCCTGCACTCCTGGGAGGCTCCCTGTGTTCCTTTCAGAGA TTTCAGGCTTGCGTTCAAAGACCAAGGTCTTTGTATCCCAGTATCTCTTTCTGGGCTCTGACCTGGAGGG GTCTGCAGATCCTTGGGCTGGAGTGCCTCAAGGTGGCGCTGAGGAGGGTGGTGAGTGCTGGCGCTGGACT GGATGATTGCCAGCGACTGGAGTGCATCTACATCATGTGGTCCACCGCTGGGCTCTTTAAACTCACCTAC TACTCCCACTGGATCCTGGACGACTCTCTCCTTCACGCGGCGGGCTTTGGATCTGAGGCTGGCCAGAGGC CTGGAGAGGAGAGATACGTCCCGGATGTGGACATTTGGACATTGGAAACTACCCACAGGATCTCCCTGTT CGCGAGGCAGTGGAACCGAAGCACAGCTCAGTGGCTCAAGAGGCTTGTCTTCCAGAGGAGCCGGCGCTGG CCCGTGCTGCAGACTTTTGCCTTCTCTGCCTGGTGGCACGGACTCCACCCAGGACAGGTGTTTGGCTTCC TGTGCTGGTCTGTGATGGTGAAAGCCGACTATCTGATCCACACTTTTGCCAATGGATGTATCAGATCCTG GCCCCTGCGGCTGCTTTATAGATCCCTCACTTGGGCCCACACTCAGATCATCATTGTTACGTTAATGCTG GCCGTGGAGGGCCGGAGCTTTTCCTCTCTCTGCCGGCTGTGCTGTTCTTACAACAGTATCTTCCCTGTAA CGTACTGCCTTTTGCTTTTTCTATTAGCGAGGAGAAAACACAAGTGTAACTGA (SEQ ID NO: 05) protein sequence region that we predict on the basis of genomic DNA (corresponding to the first two coding exons in mouse sequence), but absent from the NCBI 
protein sequence is highlighted with underline; ##STR00001## atggattggctccagttcttctttctccatcctgtatcactttatcaaggggctgctttc M D W L Q F F F L H P V S L Y Q G A A F cccttcgcgcttctgtttaattatctctgcatcacggaatcctttcccacccgggccagg P F A L L F N Y L C I T E S F P T R A R tacctctttctcctggctggaggaggtgtcctggctttggccgccatgggtccctacgct Y L F L L A G G G V L A L A A M G P Y A ctgctcattttcatccctgctctctgtgccgtggctatgatctcctccctcagtccacag L L I F I P A L C A V A M I S S L S P Q gaagtccatgggctgactttcttctttcagatgggttggcaaaccctgtgccacctgggt E V H G L T F F F Q M G W Q T L C H L G cttcactacaaggagtactacctgtgtgagcctccccctgtgaggttctacatcactctt L H Y K E Y Y L C E P P P V R F Y I T L tcttccctcatgctcttgacgcagagagtcacgtctctctccctggacatttctgaaggg S S L M L L T Q R V T S L S L D I S E G aaggtggaggcagcgtggaggggcaccaggagcaggagttctttgtgtgagcacctgtgg K V E A A W R G T R S R S S L C E H L W gatgctctaccctatatcagctatttgctctttttccctgcactcctgggaggctccctg D A L P Y I S Y L L F F P A L L G G S L tgttcctttcagagatttcaggcttgtcgctcaaagaccaaggtctttgtatcccagtatc C S F Q R F Q A C V Q R P R S L Y P S I tctttctgggctctgacctggaggggtctgcagatccttgggctggagtgcctcaaggtg S F W A L T W R G L Q I L G L E C L K V gcgctgaggagggtggtgagtgctggcgctggactggatgattgccagcgactggagtgc A L R R V V S A G A G L D D C Q R L E C atctacatcatgtggtccaccgctgggctctttaaactcacctactactcccactggatc I Y I M W S T A G L F K L T Y Y S H W I ctggacgactctctccttcacgcggcgggctttggatctgaggctggccagaggcctgga L D D S L L H A A G F G S E A G Q R P G gaggagagatacgtcccggatgtggacatttggacattggaaactacccacaggatctcc E E R Y V P D V D I W T L E T T H R I S ctgttcgcgaggcagtggaaccgaagcacagctcagtggctcaagaggcttgtcttccag L F A R Q W N R S T A Q W L K R L V F Q aggagccggcgctggcccgtgctgcagacttttgccttctctgcctggtggcacggactc R S R R W P V L Q T F A F S A W W H G L cacccaggacaggtgtttggcttcctgtgctggtctgtgatggtgaaagccgactatctg H P G Q V F G F L C W S V M V K A D Y L atccacacttttgccaatggatgtatcagatcctggcccctgcggctgctttatagatcc I H T F A N G C I R S W P L R L L Y R S 
ctcacttgggcccacactcagatcatcattgcttacgtaatgctggccgtggagggccgg L T W A H T Q I I I A Y V M L A V E G R agcttttcctctctctgccggctgtgctgttcttacaacagtatcttccctgtaacgtac S F S S L C R L C C S Y N S I F P V T Y Tgccttttgctttttctattagcgaggagaaaacacaagtgtaactga (SEQ ID NO: 07) C L L F L L A R R K H K C N - (SEQ ID NO: 06)

TABLE-US-00003 Human [The predicted cDNA sequence for human GOAT, shown below, was verified experimentally by reverse transcription/polymerase chain reaction (RT PCR) of human stomach RNA (obtained from Clontech), followed by cDNA cloning in E. coli of the RT PCR product (inserted into pcDNA3 vector) and DNA sequencing of the cloned cDNA. This sequence verification was performed on Dec. 20, 2007.] coding DNA region in 3 exons >ref|NR_007995.14|Hs8_8152:c322891-322772 Homo sapiens chromosome 8 genomic contig, reference assembly ATGGAGTGGCTTTGGCTGTTCTTTCTCCATCCTATATCGTTTTACCAGGGGGCTGCATTTCCCTTTGCAC TTCTCTTCAATTATCTCTGCATCATGGATTCATTCTCCACTCGTGCCAGG (SEQ ID NO: 08) >ref|NT_007995.14|Hs8_8152:c317045-316821 Homo sapiens chromosome 8 genomic contig, reference assembly TACCTCTTTCTCCTGACTGGAGGAGGTGCCCTGGCCGTGGCTGCCATGGGTTCCTACGCCGTGCTCGTCT TCACCCCTGCTGTCTGCGCTGTGGCTCTCCTCTGTTCCCTGGCTCCTCAGCAAGTCCACAGGTGGACCTT CTGCTTTCAGATGAGCTGGCAGACCTTGTGTCACCTAGGTCTGCACTACACTGAGTATTATCTGCATGAG CCTCCTTCTGTGAGG (SEQ ID NO: 09) >ref|INT_007995.14|Hs8_8:52:c311195-310233 Homo sapiens chromosome 8 genomic contig, reference assembly TTCTGCATCACTCTTTCTTCTCTCATGCTCTTGACCCAGAGGGTCACGTCCCTCTCTCTGGACATTTGTG AGGGGAAAGTGAAGGCAGCATCTGGAGGCTTCAGGAGCAGGAGCTCTTTGTCTGAGCATGTGTGTAAGGC ACTGCCCTATTTCAGCTACTTGCTCTTTTTCCCTGCTCTCCTGGGAGGCTCTCTGTGCTCCTTCCAGCGA TTTCAGGCTCGTGTTCAAGGGTCCAGTGCTTTGCATCCCAGACACTCTTTCTGGGCTCTGAGCTGGAGGG GTCTGCAGATTCTTGGACTAGAATGCCTAAACGTGGCAGTGAGCAGGGTGGTGGATGCAGGAGCGGGACT GACTGATTGCCAGCAATTCGAGTGCATCTATGTCGTGTGGACCACAGCTGGGCTTTTCAAGCTCACCTAC TACTCCCACTGGATCCTGGACGACTCCCTCCTCCACGCAGCGGGCTTTGGGCCTGAGCTTGGTCAGAGCC CTGGAGAGGAGGGATATGTCCCCGATGCAGACATCTGGACCCTGGAAAGAACCCACAGGATATCTGTGTT CTCAAGAAAGTGGAACCAAAGCACAGCTCGATGGCTCCGACGGCTTGTATTCCAGCACAGCAGGGCTTGG CCGTTGTTGCAGACATTTGCCTTCTCTGCCTGGTGGCATGGACTCCATCCAGGACAGGTGTTTGGTTTCG TTTGCTGGGCCGTGATGGTGGAAGCTGACTACCTGATTCACTCCTTTGCCAATGAGTTTATCAGATCCTG GCCGATGAGGCTGTTCTATAGAACCCTCACCTGGGCCCACACCCAGTTGATCATTGCCTACATCATGCTG 
GCTGTGGAGGTCAGGAGTCTCTCCTCTCTCTGGTTGCTCTGTAATTCGTACAACAGTGTCTTTCCCATGG TGTACTGTATTCTGCTTTTGCTATTGGCGAAGAGAAAGCACAAATGTAACTGA (SEQ ID NO: 010) protein sequence region that we predict on the basis of genomic DNA (corresponding to the first two coding exons in mouse sequence), but absent from the NCBI protein sequence is highlighted in underline; ##STR00002## atggagtggctttggctgttctttctccatcctatatcgttttaccagggggctgcattt M E W L W L F F L H P I S F Y Q G A A F ccctttgcacttctcttcaattatctctgcatcatggattcattctccactcgtgccagg P F A L L F N Y L C I M D S F S T R A R tacctctttctcctgactggaggaggtgccctggccgtggctgccatgggttcctacgcc Y L F L L T G G G A L A V A A M G S Y A gtgctcgtcttcacccctgctgtctgcgctgtggctctcctctgttccctggctcctcag V L V F T P A V C A V A L L C S L A P Q caagtccacaggtggaccttctgctttcagatgagctggcagaccttgtgtcacctaggt Q V H R W T F C F Q M S W Q T L C H L G ctgcactacactgagtattatctgcatgagcctccttctgtgaggttctgcatcactctt L H Y T E Y Y L H E P P S V R F C I T L tcttctctcatgctcttgacccagagggtcacgtccctctctctggacatttgtgagggg S S L M L L T Q R V T S L S L D I C E G aaagtgaaggcagcatctggaggcttcaggagcaggagctctttgtctgagcatgtgtgt K V K A A S G G F R S R S S L S E H V C aaggcactgccctatttcagctacttgctctttttccctgctctcctgggaggctctctg K A L P Y F S Y L L F F P A L L G G S L tgctccttccagcgatttcaggctcgtgttcaagggtccagtgctttgcatcccagacac C S F Q R F Q A R V Q G S S A L H P R H tctttctgggctctgagctggaggggtctgcagattcttggactagaatgcctaaacgtg S F W A L S W R G L Q I L G L E C L N V gcagtgagcagggtggtggatgcaggagcgggactgactgattgccagcaattcgagtgc A V S R V V D A G A G L T D C Q Q F E C atctatgtcgtgtggaccacagctgggcttttcaagctcacctactactcccactggatc I Y V V W T T A G L F K L T Y Y S H W I ctggacgactccctcctccacgcagcgggctttgggcctgagcttggtcagagccctgga L D D S L L H A A G F G P E L G Q S P G gaggagggatatgtccccgatgcagacatctggaccctggaaagaacccacaggatatct E E G Y V P D A D I W T L E R T H R I S gtgttctcaagaaagtggaaccaaagcacagctcgatggctccgacggcttgtattccag V F S R K W N Q S T A R W L R R L V F Q 
cacagcagggcttggccgttgttgcagacatttgccttctctgcctggtggcatggactc H S R A W P L L Q T F A F S A W W H G L catccaggacaggtgtttggtttcgtttgctgggccgtgatggtggaagctgactacctg H P G Q V F G F V C W A V M V E A D Y L attcactcctttgccaatgagtttatcagatcctggccgatgaggctgttctatagaacc I H S F A N E F I R S W P M R L F Y R T ctcacctgggcccacacccagttgatcattgcctacatcatgctggctgtggaggtcagg L T W A H T Q L I I A Y I M L A V E V R agtctctcctctctctggttgctctgtaattcgtacaacagtgtctttcccatggtgtac S L S S L W L L C N S Y N S V F P M V Y Tgtattctgcttttgctattggcgaagagaaagcacaaatgtaactga (SEQ ID NO: 12) C I L L L L L A K R K H K C N - (SEQ ID NO: 11)

TABLE-US-00004 Chimpanzee Correct protein sequence is present in the database >gi|114619777|ref|XP_519692.2| PREDICTED: hypothetical protein LOC464094 [Pan troglodytes] (SEQ ID NO:13) MEWLRLFFLHPVSFYQGAAFPFALLFNYLCIMDSFSTRARYLFLLAGGGA LAVAAMGSYAVLVFTPAVCAVALLCSLAPQQVHRWTFCFQMSWQTLCHLG LHYTEYYLHEPPSVRFCITLSSLMLLTQRVTSLSLDICEGKVEAASGGFR SRSSLSEHVCKALPYFSYLLFFPALLGGSLCSFQRFQARVQGSSALHPRH SFWALSWRCLQILGLECLNVAVSRVVDAGAGLTDCQQFECIYVVWTTAGL FKLTYYSHWILDDSLLHAAGFGPELGQSPGEEGYVPDADIWTLERTHRIS VFARKWNQSTARWLRRLVFQHSRAWPLLQTFAFSAWWHGLHPGQVFGFVC WAVMVEADYLIHSFANEFIRSWPMRLFYRTLTWAHTQLIIAYIMLAVEVR SLSSLWLLCNSYNSVFPMVYCILLLLLVKRKHKCN

TABLE-US-00005 Bovine coding DNA region in 3 exons >ref|NW_001494415.1|Bt27_WGA2723_3:c220739-220620 Bos taurus chromosome 27 genomic contig, reference assembly (based on Btau_3.1), whole genome shotgun sequence ATGGATTGGCTCCAGCTGTTCTTCCTTGATCCTGTATCACTTTATCAAGGAGCTGCTTTTCCTTTTGCAC TTCTGTTTAATCATCTCTGTGTTATGGATTCATTTTCCACTCAGGCCAGG (SEQ ID NO: 14) >ref|NW_001494415.1|Bt27_WGA2723_3:c216688-216464 Bos taurus chromosome 27 genomic contig, reference assembly (based on Btau_3.1), whole genome shotgun sequence TACCTGTTCCTCCTGGCGGGAGGCGGTGCCCTGGCCGTGGCTGCTATGGGTGCCTTCGCTGTGCTGGTCT TCATCCCCGCCCTGTGCACGGTGGTCCTCATCCACTCGCTTGGCCCCCAGGATGTCCACAGGCCGACCTT CCTCTTTCAGATGACCTGGCAGACGCTGTGCCACCTGGGTCTGCACTATACGGAGTATTATCTGCAAGAA GCTCCTTCTACAAGG (SEQ ID NO: 15) >ref|NW_001494415.1|Bt27_WGA2723_3:c212687-212725 Bos taurus chromosome 27 genomic contig, reference assembly (based on Btau_3.1), whole genome shotgun sequence TTCTGCATCACTCTCTCTTCGCTCATGCTCTTGACCCAGAAGATCACATCTCTGTCTCTGGATATTCGTG AGGGGAAGGTGGTAGCACCATCAGGACGCATCCCTAACAAGAATTCTTTGTCTGAGCATCTGCATGCGGC TCTTCCCTATCTCAGCTACTTGCTCTTCTTCCCTGCCCTCCTAGGAGGCCCGCTGTGTTCCTTCCAGAGG TTTCAGGCTCGAGTTGAAGGGTCCAGCAGTTTGTGGTCCAGGCACTCTTTCTGGGCTCTGACCTGGAGGG CGCTGCAGATCCTGGGACTGGAGAGTCTGAAGGTGATCGTCAGCGGGGTGGTGGGCGTGGGGGCAGGACT TGGAGGCTGCAGGCAGCTGCAGTGCGTCTTCGTCCTGTGGTCCACGGCCGGGCTCTTCAAACTCACCTAC TACTCCCACTGGCTCCTGGATGACGCCCTCCTCCGCGCGGCCGGCTTTGGATCTGAGTTAGGTCGCAGCC CGGGTGAGGAGGGACTCCTCCCCGATGCGGACATTTGGACGCTGGAAACGACCCACAGGATAGCCCTGTT CGCCAGGAAGTGGAACCAGAGCACGGCTCGGTGGCTCCGACGCCTGGTTTTCCAGCAGCGCAGGACCTGG CCCTTGTTGCAGACATTCCTCTTCTCGGCCTGGTGGCACGGTCTCCACCCGGGACAGGTGTTTGGTTTCC TCTGCTGGGCTGTCATGGTGGAAGCCGACTACCTGATTCACGCCTTCGCCAGCGTGTTCATCAGCTCCTG GCCCATGCGGCTGCTCTACAGAGCCCTGGCCTGGGCCCACACCCAGCTCATCATCGCCTACATAATGCTG GCCGTGGAGGCCCGGAGCCTCTCCTCTCTCTGGCTGCTGTGGAATTCTTACAGCAGTGTCTTTCCCACGG TGTACTGTATTTTGCTTCTCCTGTTAGCAAAGAGAAAGCATAAATGCAACTGA (SEQ ID NO: 16) protein sequence region that we predict on the basis of genomic DNA (corresponding to the 
first two coding exons in mouse sequence), but absent from the NCBI protein sequence is highlighted in underline; ##STR00003## atggattggctccagctgttcttccttgatcctgtatcactttatcaaggagctgctttt M D W L Q L F F L D P V S L Y Q G A A F ccttttgcacttctgtttaatcatctctgtgttatggattcattttccactcaggccagg P F A L L F N H L C V M D S F S T Q A R tacctgttcctcctggcgggaggcggtgccctggccgtggctgctatgggtgccttcgct Y L F L L A G G G A L A V A A M G A F A gtgctggtcttcatccccgccctgtgcacggtggtcctcatccactcgcttggcccccag V L V F I P A L C T V V L I H S L G P Q gatgtccacaggccgaccttcctctttcagatgacctggcagacgctgtgccacctgggt D V H R P T F L F Q M T W Q T L C H L G ctgcactatacggagtattatctgcaagaagctccttctacaaggttctgcatcactctc L H Y T E Y Y L Q E A P S T R F C I T L tcttcgctcatgctcttgacccagaagatcacatctctgtctctggatattcgtgagggg S S L M L L T Q K I T S L S L D I R E G aaggtggtagcaccatcaggacgcatccctaacaagaattctttgtctgagcatctgcat K V V A P S G R I P N K N S L S E H L H gcggctcttccctatctcagctacttgctcttcttccctgccctcctaggaggcccgctg A A L P Y L S Y L L F F P A L L G G P L tgttccttccagaggtttcaggctcgagttgaagggtccagcagtttgtggtccaggcac C S F Q R F Q A R V E G S S S L W S R H tctttctgggctctgacctggagggcgctgcagatcctgggactggagagtctgaaggtg S F W A L T W R A L Q I L G L E S L K V atcgtcagcggggtggtgggcgtgggggcaggacttggaggctgcaggcagctgcagtgc I V S G V V G V G A G L G G C R Q L Q C gtcttcgtcctgtggtccacggccgggctcttcaaactcacctactactcccactggctc V F V L W S T A G L F K L T Y Y S H W L ctggatgacgccctcctccgcgcggccggctttggatctgagttaggtcgcagcccgggt L D D A L L R A A G F F S E L G R S P G gaggagggactcctccccgatgcggacatttggacgctggaaacgacccacaggatagcc E E G L L P D A D I W T L E T T H R I A ctgttcgccaggaagtggaaccagagcacggctcggtggctccgacgcctggttttccag L F A R K W N Q S T A R W L R R L V F Q cagcgcaggacctggcccttgttgcagacattcctcttctcggcctggtggcacggtctc Q R R T W P L L Q T F L F S A W W H G L cacccgggacaggtgtttggtttcctctgctgggctgtcatggtggaagccgactacctg H P G Q V F G F L C W A V M V E A D Y L 
attcacgccttcgccagcgtgttcatcagctcctggcccatgcggctgctctacagagcc I H A F A S V F I S S W P M R L L Y R A ctggcctgggcccacacccagctcatcatcgcctacataatgctggccgtggaggcccgg L A W A H T Q L I I A Y I M L A V E A R agcctctcctctctctggctgctgtggaattcttacagcagtgtctttcccacggtgtac S L S S L W L L W N S Y S S V F P T V Y tgtattttgcttctcctgttagcaaagagaaagcataaatgcaactga (SEQ ID NO: 18) C I L L L L L A K R K H K C N - (SEQ ID NO: 17)
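Each coding region above is given as three genomic exons that are joined in coding order and read codon by codon to give the aligned protein sequence. The sketch below illustrates that splice-and-translate step under stated assumptions: it is not the inventors' own procedure, it assumes the standard genetic code (NCBI translation table 1) and exons supplied in 5' to 3' coding order with no partial codons, and the helper names `splice` and `translate` are hypothetical. The example input is bovine exon 1 (SEQ ID NO: 14) copied from the table, including the space that separates the 70-base blocks.

```python
# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def splice(exons: list[str]) -> str:
    """Join exon sequences, given in 5'->3' coding order, into one coding sequence."""
    return "".join(exon.upper().replace(" ", "") for exon in exons)


def translate(cds: str, stop_at_stop: bool = True) -> str:
    """Translate a coding sequence from its first base; '*' marks a stop codon."""
    if len(cds) % 3:
        raise ValueError("coding sequence length is not a multiple of 3")
    protein = []
    for pos in range(0, len(cds), 3):
        residue = CODON_TABLE[cds[pos:pos + 3]]  # KeyError on non-ACGT bases
        if residue == "*" and stop_at_stop:
            break
        protein.append(residue)
    return "".join(protein)


# Bovine exon 1 (SEQ ID NO: 14), as printed in the table above.
bovine_exon1 = (
    "ATGGATTGGCTCCAGCTGTTCTTCCTTGATCCTGTATCACTTTATCAAGGAGCTGCTTTTCCTTTTGCAC"
    " TTCTGTTTAATCATCTCTGTGTTATGGATTCATTTTCCACTCAGGCCAGG"
)
print(translate(splice([bovine_exon1])))
# MDWLQLFFLDPVSLYQGAAFPFALLFNHLCVMDSFSTQAR
```

The 40 residues printed match the first two rows of the bovine alignment. Passing all three exons (SEQ ID NOs: 14, 15 and 16) to `splice` in order would, under the same assumptions, reproduce the full 435-residue translation.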

TABLE-US-00006 Horse coding DNA region in 3 exons >ref|NW_001799700.1|Eca27_WGA83_1:7589091-7589210 Equus caballus chromosome 27 genomic contig, reference assembly (based on EquCab1 scaffold_68), whole genome shotgun sequence ATGGGTTGGCTTCAGCTGTTCCTTCTCCATCCTGTATCACTTTATCAAGGGGCCGCTTTTCCTTTTGCAC TTCTATTTAATTACCTTTGCACTATGGATTCATTTTCCACTCATGCCAGG (SEQ ID NO: 19) >ref|NW_001799700.1|Eca27_WGA83_1:7591734-7591958 Equus caballus chromosome 27 genomic contig, reference assembly (based on EquCab1 scaffold_68), whole genome shotgun sequence TACCTCTTTCTGCTGGCAGGAGGAGGCGCCCTGGCCTTGGCCGCTATGGGTCCCTTTGCTGTGCTTGTCT TCATCCCTGCGATATGTGCTGTGTTTCTGATCTGCTTGCTCAGCCCACAGGAAGTCCACAGGCAGACTTT CTGCTTTCAGATGAGCTGGCAGACGCTGTGTCACCTGGGTCTGCACTATACTGAGTATTATCTGCAAGAA CTTCCTTCCACGAGG (SEQ ID NO: 20) >ref|NW_001799700.1|Eca27_WGA83_1:7594135-7595097 Equus caballus chromosome 27 genomic contig, reference assembly (based on EquCab1 scaffold_68), whole genome shotgun sequence TTCTGCCTCGCTCTTTCTTCCCTCATGCTCTTGACCCAGAGGGTCACATCCCTCTCTCTGGACATTTGTG AAGGGAAACTGGCAGCAGCATCAGGAGGCACCAGGAGCAGAAGCTCTTTGTCTGAGCATCTGTGTAAGGC ACTGCCCTATTTCAGCTACTTGCTTTTTTTTCCTGCTCTCCTAGGAGGCCCTCTGTGTTCCTTCCAGAGA TTTCAGGCCCGTGTTCAAGGGCCCAGCAACTTGTGTCCCAGGCACCCTTTCAGGGCTCTGACCTGGAGGG GTCTGCAGATTCTGGGACTAGAGTGCCTAAAGGTCGTCATGAGGGCAGTGGTGAGAGCAGGAGCAGGACT GACCGACTGCCGGCAACTCCAGTGCATCTATGTCATGTGGTCCACAGCCGGGCTCTTCAAACTCACCTAC TACTCCCACTGGATCCTGGATGACTCCCTCCTGTGTGCAGCGGGCTTTGGATCTGAGTTTGGGCAGAGCC CTGGTGAGGACGGATACATCCCTGATGCAGACATTTGGACACTGGAAACAACCCACAGGATATCCCTGTT TGCGAGAAAGTGGAACCAAAGCACAGCTCGGTGGCTCAGACGCCTCGTATTTCAGCACAGCAGGGTCTGG CCGTTGTTGCAGACATTTGCATTCTCTGCCTGGTGGCATGGGCTCCATCCAGGACAGGTGTTTGGTTTCC TCTGCTGGGCTGTGATGGTGGAAGCTGACTACCTGATTCACACCTTTGCCAAATTGTTTATCAGATCCTG GCCGATGAAGCTGCTCTATAGAACTCTGACCTGGGCCCACACCCAGCTCATCATTGCCTACATAATGCTG GCCGTGGAGGTCAGGAGCCTCTCCTCTCTCTGGCTGCTGTGTAATTCTTACAACAGTGTCTTTCCCATGG TGTATTGTATTTTGCTTTTGCTATTAGCAAAGAGAAAGCACACATTTAACTGA (SEQ ID NO: 21) protein sequence region that we predict on
the basis of genomic DNA (corresponding to the first two coding exons in mouse sequence), but absent from the NCBI protein sequence is highlighted in underline; ##STR00004## atgggttggcttcagctgttccttctccatcctgtatcactttatcaaggggccgctttt M G W L Q L F L L H P V S L Y Q G A A F ccttttgcacttctatttaattacctttgcactatggattcattttccactcatgccagg P F A L L F N Y L C T M D S F S T H A R tacctctttctgctggcaggaggaggcgccctggccttggccgctatgggtccctttgct Y L F L L A G G G A L A L A A M G P F A gtgcttgtcttcatccctgcgatatgtgctgtgtttctgatctgcttgctcagcccacag V L V F I P A I C A V F L I C L L S P Q gaagtccacaggcagactttctgctttcagatgagctggcagacgctgtgtcacctgggt E V H R Q T F C F Q M S W Q T L C H L G ctgcactatactgagtattatctgcaagaacttccttccacgaggttctgcctcgctctt L H Y T E Y Y L Q E L P S T R F C L A L tcttccctcatgctcttgacccagagggtcacatccctctctctggacatttgtgaaggg S S L M L L T Q R V T S L S L D I C E G aaactggcagcagcatcaggaggcaccaggagcagaagctctttgtctgagcatctgtgt K L A A A S G G T R S R S S L S E H L C aaggcactgccctatttcagctacttgcttttttttcctgctctcctaggaggccctctg K A L P Y F S Y L L F F P A L L G G P L tgttccttccagagatttcaggcccgtgttcaagggcccagcaacttgtgtcccaggcac C S F Q R F Q A R V Q G P S N L C P R H cctttcagggctctgacctggaggggtctgcagattctgggactagagtgcctaaaggtc P F R A L T W R G L Q I L G L E C L K V gtcatgagggcagtggtgagagcaggagcaggactgaccgactgccggcaactccagtgc V M R A V V R A G A G L T D C R Q L Q C atctatgtcatgtggtccacagccgggctcttcaaactcacctactactcccactggatc I Y V M W S T A G L F K L T Y Y S H W I ctggatgactccctcctgtgtgcagcgggctttggatctgagtttgggcagagccctggt L D D S L L C A A G F G S E F G Q S P G gaggacggatacatccctgatgcagacatttggacactggaaacaacccacaggatatcc E D G Y I P D A D I W T L E T T H R I S ctgtttgcgagaaagtggaaccaaagcacagctcggtggctcagacgcctcgtatttcag L F A R K W N Q S T A R W L R R L V F Q cacagcagggtctggccgttgttgcagacatttgcattctctgcctggtggcatgggctc H S R V W P L L Q T F A F S A W W H G L catccaggacaggtgtttggtttcctctgctgggctgtgatggtggaagctgactacctg H P G Q V F G F L C W A V M V E A D Y L 
attcacacctttgccaaattgtttatcagatcctggccgatgaagctgctctatagaact I H T F A K L F I R S W P M K L L Y R T ctgacctgggcccacacccagctcatcattgcctacataatgctggccgtggaggtcagg L T W A H T Q L I I A Y I M L A V E V R agcctctcctctctctggctgctgtgtaattcttacaacagtgtctttcccatggtgtat S L S S L W L L C N S Y N S V F P M V Y tgtattttgcttttgctattagcaaagagaaagcacacatttaactga (SEQ ID NO: 23) C I L L L L L A K R K H T F N - (SEQ ID NO: 22)

TABLE-US-00007 Zebrafish coding DNA region in 3 exons >ref|NW_001513480.1|Dr5_WGA761_2:794788-794913 Danio rerio chromosome 5 genomic contig, reference assembly (based on Zv6_scaffold761:1-1770220) ATGATAGATCTCCTTTGGATTTCTTCTGATGGACACCCTCAGCTGTTTTACCAGTTTATCAACATACCAT TTGCATTTCTGTTTCATTGCTTATCCAGTCAAGGACATCTCTCGATAATCAACAGG (SEQ ID NO: 24) >ref|NW_001513480.1|Dr5_WGA761_2:794996-795220 Danio rerio chromosome 5 genomic contig, reference assembly (based on Zv6_scaffold761:1-1770220) TACGTCTATTTGGCGATGGGAGGATTCATGCTGGCTATTGCAACAATGGGTCCATATAGCTCACTGCTGT TCCTGAGTGCTATTAAACTGCTGTTACTGATCCACTATATACATCCAATGCATCTTCATCGGTGGATTCT GGGACTGCAGATGTGTTGGCAAACCTGCTGGCATTTGTACGTCCAGTACCAGATATACTGGCTTCAAGAG GCACCAGACTCAAGG (SEQ ID NO: 25) >ref|NW_001513480.1|Dr5_WGA761_2:797189-798085 Danio rerio chromosome 5 genomic contig, reference assembly (based on Zv6_scaffold761:1-1770220) CTTTTACTGGCCATATCTGCACTCATGTTGATGACCCAGAGGATTTCCTCTCTATCACTCGATTTCCAAG AGGGGACGATCTCCAATCAGTCAATCCTTATTCCATTCCTAACCTACTCGCTTTATTTCCCTGCCCTTCT TGGAGGTCCACTTTGCAGTTTCAATGCTTTTGTTCAGTCTGTCGAGCGTCAACACACCAGCATGACTTCA TATTTAGGAAATCTCACTTCAAAGATATCACAAGTTATAGTTTTGGTGTGGATTAAACAGCTTTTCAGTG AGCTTTTGAAATCTGCCACGTTTAACATCGACAGTGTTTGTCTTGATGTATTGTGGATTTGGATCTTTTC GCTGACACTTAGGCTTAATTACTATGCACACTGGAAGATGAGCGAGTGTGTTAATAATGCTGCAGGATTT GGTGTCTATTTACACAAACACAGTGGACAAACATCATGGGACGGTCTTTCTGATGGGAGTGTACTGGTGA CTGAAGCATCCAGTCGTCCTTCGGTTTTTGCGCGAAAGTGGAACCAAACCACGGTGGATTGGCTTCGAAA AATAGTCTTCAACAGGACCAGCAGATCTCCACTGTTCATGACTTTTGGGTTTTCTGCACTGTGGCACGGT CTTCACCCTGGGCAGATTCTGGGTTTCCTCATTTGGGCCGTCACTGTGCAGGCGGACTACAAACTGCATC GCTTCTTGCACCCGAAGCTTAACTCCCTGTGGAGAAAACGGCTGTATGTGTGTGTAAACTGGGCCTTTAC TCAGCTGACCGTCGCATGTGTTGTGGTCTGTGTGGAGCTTCAGAGTTTGGCATCAGTTAAGCTGCTCTGG TCTTCGTGTATTGCTGTGTTTCCACTGCTGAGTGCTCTGATCTTAATAATCCTCTGA (SEQ ID NO: 26) protein sequence region that we predict on the basis of genomic DNA (corresponding to the first coding exons in mouse sequence), but absent from the NCBI protein sequence is highlighted in
underline; ##STR00005## atgatagatctcctttggatttcttctgatggacaccctcagctgttttaccagtttatc M I D L L W I S S D G H P Q L F Y Q F I aacataccatttgcatttctgtttcattgcttatccagtcaaggacatctctcgataatc N I P F A F L F H C L S S Q G H L S I I aacaggtacgtctatttggcgatgggaggattcatgctggctattgcaacaatgggtcca N R Y V Y L A M G G F M L A I A T M G P tatagctcactgctgttcctgagtgctattaaactgctgttactgatccactatatacat Y S S L L F L S A I K L L L L I H Y I H ccaatgcatcttcatcggtggattctgggactgcagatgtgttggcaaacctgctggcat P M H L H R W I L G L Q M C W Q T C W H ttgtacgtccagtaccagatatactggcttcaagaggcaccagactcaaggcttttactg L Y V Q Y Q I Y W L Q E A P D S R L L L gccatatctgcactcatgttgatgacccagaggatttcctctctatcactcgatttccaa A I S A L M L M T Q R I S S L S L D F Q gaggggacgatctccaatcagtcaatccttattccattcctaacctactcgctttatttc E G T I S N Q S I L I P F L T Y S L Y F cctgcccttcttggaggtccactttgcagtttcaatgcttttgttcagtctgtcgagcgt P A L L G G P L C S F N A F V Q S V E R caacacaccagcatgacttcatatttaggaaatctcacttcaaagatatcacaagttata Q H T S M T S Y L G N L T S K I S Q V I gttttggtgtggattaaacagcttttcagtgagcttttgaaatctgccacgtttaacatc V L V W I K Q L F S E L L K S A T F N I gacagtgtttgtcttgatgtattgtggatttggatcttttcgctgacacttaggcttaat D S V C L D V L W I W I F S L T L R L N tactatgcacactggaagatgagcgagtgtgttaataatgctgcaggatttggtgtctat Y Y A H W K M S E C V N N A A G F G V Y ttacacaaacacagtggacaaacatcatgggacggtctttctgatgggagtgtactggtg L H K H S G Q T S W D G L S D G S V L V actgaagcatccagtcgtccttcggtttttgcgcgaaagtggaaccaaaccacggtggat T E A S S R P S V F A R K W N Q T T V D tggcttcgaaaaatagtcttcaacaggaccagcagatctccactgttcatgacttttggg W L R K I V F N R T S R S P L F M T F G ttttctgcactgtggcacggtcttcaccctgggcagattctgggtttcctcatttgggcc F S A L W H G L H P G Q I L G F L I W A gtcactgtgcaggcggactacaaactgcatcgcttcttgcacccgaagcttaactccctg V T V Q A D Y K L H R F L H P K L N S L tggagaaaacggctgtatgtgtgtgtaaactgggcctttactcagctgaccgtcgcatgt W R K R L Y V C V N W A F T Q L T V A C
gttgtggtctgtgtggagcttcagagtttggcatcagttaagctgctctggtcttcgtgt V V V C V E L Q S L A S V K L L W S S C attgctgtgtttccactgctgagtgctctgatcttaataatcctctga (SEQ ID NO: 28) I A V F P L L S A L I L I I L - (SEQ ID NO: 27)
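For all three species the exon lengths in the tables above account exactly for the reading frame. The short sketch below checks that bookkeeping. Its assumptions are that the exon lengths are the ones given in the sequence listing, and that the last three bases of the final exon are the stop codon, which is TGA in each species here. The helper names `intron_phases` and `predicted_protein_length` are hypothetical and are not taken from the source.

```python
from itertools import accumulate

# Coding-exon lengths (nt) from the sequence listing:
# bovine SEQ ID NOs: 14-16, horse 19-21, zebrafish 24-26.
EXON_LENGTHS = {
    "bovine": (120, 225, 963),
    "horse": (120, 225, 963),
    "zebrafish": (126, 225, 897),
}


def intron_phases(exon_lengths):
    """Coding bases preceding each intron, modulo 3 (0 = intron lies between codons)."""
    return [total % 3 for total in accumulate(exon_lengths[:-1])]


def predicted_protein_length(exon_lengths):
    """Residues encoded by the spliced exons, excluding the terminal stop codon."""
    total = sum(exon_lengths)
    if total % 3:
        raise ValueError(f"spliced length {total} nt is not a whole number of codons")
    return total // 3 - 1


for species, lengths in EXON_LENGTHS.items():
    print(species, sum(lengths), intron_phases(lengths), predicted_protein_length(lengths))
# bovine 1308 [0, 0] 435
# horse 1308 [0, 0] 435
# zebrafish 1248 [0, 0] 415
```

Both introns are phase 0 in every species, so each exon contains a whole number of codons. The predicted lengths agree with the 435-residue bovine and horse entries (SEQ ID NOs: 17 and 22) and the 415-residue zebrafish entry (SEQ ID NO: 27) in the sequence listing.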

Sequence CWU 1

1

2811372DNAmouse 1gacttccctt ttacaagggc accgcttagg gactctagga aggacagtgg gcctcacatt 60caggatggat tggctccagc tcttttttct gcatccttta tcattttatc aaggggctgc 120attccccttt gcgcttctgt ttaattatct ctgcatcttg gacacctttt ccacccgggc 180caggtacctc tttctcctgg ctggaggagg tgtcctggct tttgctgcca tgggtcccta 240ctctctgctc atcttcatcc ctgcgctctg cgctgtggct ctggtctcct tcctcagtcc 300acaggaagtc cataggctga ccttcttctt tcagatgggc tggcagaccc tgtgccatct 360gggtcttcac tacaccgaat actacctggg tgagcctcca cccgtgaggt tctacatcac 420tctttcttcc ctcatgctct tgacgcagag agtcacatcc ctctcactgg acatttgtga 480agggaaggtg gaggccccga ggcggggcat caggagcaag agttctttct ctgagcacct 540gtgggatgct ctacctcatt tcagctactt gctctttttc cctgctctcc tgggaggctc 600cctgtgttcc ttccggaggt ttcaggcttg cgttcaaaga tcaagctctt tgtatccgag 660tatctctttt cgggctctga cctggagggg tctgcagatt ctcgggctgg agtgcctcaa 720ggtggcgctg aggagcgcgg tgagtgctgg agctggactg gatgactgcc agcggctgga 780gtgcatctac ctcatgtggt ccacagcctg gctctttaaa ctcacctatt actcccattg 840gatcctggac gactctctcc tccacgcggc gggctttggc gctgaggctg gccaggggcc 900tggagaggag ggatacgtcc ccgacgtgga catttggacc ctggaaacta cccacaggat 960ctccctgttc gccaggcagt ggaaccgaag cacagctctg tggctcagga ggctcgtctt 1020ccggaagagc cggcgctggc ccctgctgca gacatttgcc ttctctgcct ggtggcacgg 1080gctccaccca ggtcaggtgt tcggcttcct gtgctggtct gtaatggtga aagccgatta 1140tctgattcac acttttgcca acgtatgtat cagatcctgg cccctgcggc tgctttatag 1200agccctcact tgggctcata cccaactcat cattgcctac atcatgctgg cggtggaggg 1260ccggagcctt tcctctctct gccaactgtg ctgttcttac aacagtctct tccctgtgat 1320gtacggtctt ttgctttttc tgttagcgga gagaaaagac aaacgtaact ga 13722435PRTmouse 2Met Asp Trp Leu Gln Leu Phe Phe Leu His Pro Leu Ser Phe Tyr Gln1 5 10 15Gly Ala Ala Phe Pro Phe Ala Leu Leu Phe Asn Tyr Leu Cys Ile Leu 20 25 30Asp Thr Phe Ser Thr Arg Ala Arg Tyr Leu Phe Leu Leu Ala Gly Gly 35 40 45Gly Val Leu Ala Phe Ala Ala Met Gly Pro Tyr Ser Leu Leu Ile Phe 50 55 60Ile Pro Ala Leu Cys Ala Val Ala Leu Val Ser Phe Leu Ser Pro Gln65 70 75 80Glu Val His Arg Leu 
Thr Phe Phe Phe Gln Met Gly Trp Gln Thr Leu 85 90 95Cys His Leu Gly Leu His Tyr Thr Glu Tyr Tyr Leu Gly Glu Pro Pro 100 105 110Pro Val Arg Phe Tyr Ile Thr Leu Ser Ser Leu Met Leu Leu Thr Gln 115 120 125Arg Val Thr Ser Leu Ser Leu Asp Ile Cys Glu Gly Lys Val Glu Ala 130 135 140Pro Arg Arg Gly Ile Arg Ser Lys Ser Ser Phe Ser Glu His Leu Trp145 150 155 160Asp Ala Leu Pro His Phe Ser Tyr Leu Leu Phe Phe Pro Ala Leu Leu 165 170 175Gly Gly Ser Leu Cys Ser Phe Arg Arg Phe Gln Ala Cys Val Gln Arg 180 185 190Ser Ser Ser Leu Tyr Pro Ser Ile Ser Phe Arg Ala Leu Thr Trp Arg 195 200 205Gly Leu Gln Ile Leu Gly Leu Glu Cys Leu Lys Val Ala Leu Arg Ser 210 215 220Ala Val Ser Ala Gly Ala Gly Leu Asp Asp Cys Gln Arg Leu Glu Cys225 230 235 240Ile Tyr Leu Met Trp Ser Thr Ala Trp Leu Phe Lys Leu Thr Tyr Tyr 245 250 255Ser His Trp Ile Leu Asp Asp Ser Leu Leu His Ala Ala Gly Phe Gly 260 265 270Ala Glu Ala Gly Gln Gly Pro Gly Glu Glu Gly Tyr Val Pro Asp Val 275 280 285Asp Ile Trp Thr Leu Glu Thr Thr His Arg Ile Ser Leu Phe Ala Arg 290 295 300Gln Trp Asn Arg Ser Thr Ala Leu Trp Leu Arg Arg Leu Val Phe Arg305 310 315 320Lys Ser Arg Arg Trp Pro Leu Leu Gln Thr Phe Ala Phe Ser Ala Trp 325 330 335Trp His Gly Leu His Pro Gly Gln Val Phe Gly Phe Leu Cys Trp Ser 340 345 350Val Met Val Lys Ala Asp Tyr Leu Ile His Thr Phe Ala Asn Val Cys 355 360 365Ile Arg Ser Trp Pro Leu Arg Leu Leu Tyr Arg Ala Leu Thr Trp Ala 370 375 380His Thr Gln Leu Ile Ile Ala Tyr Ile Met Leu Ala Val Glu Gly Arg385 390 395 400Ser Leu Ser Ser Leu Cys Gln Leu Cys Cys Ser Tyr Asn Ser Leu Phe 405 410 415Pro Val Met Tyr Gly Leu Leu Leu Phe Leu Leu Ala Glu Arg Lys Asp 420 425 430Lys Arg Asn 4353120DNArat 3atggattggc tccagttctt ctttctccat cctgtatcac tttatcaagg ggctgctttc 60cccttcgcgc ttctgtttaa ttatctctgc atcacggaat cctttcccac ccgggccagg 1204225DNArat 4tacctctttc tcctggctgg aggaggtgtc ctggctttgg ccgccatggg tccctacgct 60ctgctcattt tcatccctgc tctctgtgcc gtggctatga tctcctccct cagtccacag 120gaagtccatg ggctgacttt cttctttcag atgggttggc 
aaaccctgtg ccacctgggt 180cttcactaca aggagtacta cctgtgtgag cctccccctg tgagg 2255963DNArat 5ttctacatca ctctttcttc cctcatgctc ttgacgcaga gagtcacgtc tctctccctg 60gacatttctg aagggaaggt ggaggcagcg tggaggggca ccaggagcag gagttctttg 120tgtgagcacc tgtgggatgc tctaccctat atcagctatt tgctcttttt ccctgcactc 180ctgggaggct ccctgtgttc ctttcagaga tttcaggctt gcgttcaaag accaaggtct 240ttgtatccca gtatctcttt ctgggctctg acctggaggg gtctgcagat ccttgggctg 300gagtgcctca aggtggcgct gaggagggtg gtgagtgctg gcgctggact ggatgattgc 360cagcgactgg agtgcatcta catcatgtgg tccaccgctg ggctctttaa actcacctac 420tactcccact ggatcctgga cgactctctc cttcacgcgg cgggctttgg atctgaggct 480ggccagaggc ctggagagga gagatacgtc ccggatgtgg acatttggac attggaaact 540acccacagga tctccctgtt cgcgaggcag tggaaccgaa gcacagctca gtggctcaag 600aggcttgtct tccagaggag ccggcgctgg cccgtgctgc agacttttgc cttctctgcc 660tggtggcacg gactccaccc aggacaggtg tttggcttcc tgtgctggtc tgtgatggtg 720aaagccgact atctgatcca cacttttgcc aatggatgta tcagatcctg gcccctgcgg 780ctgctttata gatccctcac ttgggcccac actcagatca tcattgctta cgtaatgctg 840gccgtggagg gccggagctt ttcctctctc tgccggctgt gctgttctta caacagtatc 900ttccctgtaa cgtactgcct tttgcttttt ctattagcga ggagaaaaca caagtgtaac 960tga 9636435PRTrat 6Met Asp Trp Leu Gln Phe Phe Phe Leu His Pro Val Ser Leu Tyr Gln1 5 10 15Gly Ala Ala Phe Pro Phe Ala Leu Leu Phe Asn Tyr Leu Cys Ile Thr 20 25 30Glu Ser Phe Pro Thr Arg Ala Arg Tyr Leu Phe Leu Leu Ala Gly Gly 35 40 45Gly Val Leu Ala Leu Ala Ala Met Gly Pro Tyr Ala Leu Leu Ile Phe 50 55 60Ile Pro Ala Leu Cys Ala Val Ala Met Ile Ser Ser Leu Ser Pro Gln65 70 75 80Glu Val His Gly Leu Thr Phe Phe Phe Gln Met Gly Trp Gln Thr Leu 85 90 95Cys His Leu Gly Leu His Tyr Lys Glu Tyr Tyr Leu Cys Glu Pro Pro 100 105 110Pro Val Arg Phe Tyr Ile Thr Leu Ser Ser Leu Met Leu Leu Thr Gln 115 120 125Arg Val Thr Ser Leu Ser Leu Asp Ile Ser Glu Gly Lys Val Glu Ala 130 135 140Ala Trp Arg Gly Thr Arg Ser Arg Ser Ser Leu Cys Glu His Leu Trp145 150 155 160Asp Ala Leu Pro Tyr Ile Ser Tyr Leu Leu Phe Phe 
Pro Ala Leu Leu 165 170 175Gly Gly Ser Leu Cys Ser Phe Gln Arg Phe Gln Ala Cys Val Gln Arg 180 185 190Pro Arg Ser Leu Tyr Pro Ser Ile Ser Phe Trp Ala Leu Thr Trp Arg 195 200 205Gly Leu Gln Ile Leu Gly Leu Glu Cys Leu Lys Val Ala Leu Arg Arg 210 215 220Val Val Ser Ala Gly Ala Gly Leu Asp Asp Cys Gln Arg Leu Glu Cys225 230 235 240Ile Tyr Ile Met Trp Ser Thr Ala Gly Leu Phe Lys Leu Thr Tyr Tyr 245 250 255Ser His Trp Ile Leu Asp Asp Ser Leu Leu His Ala Ala Gly Phe Gly 260 265 270Ser Glu Ala Gly Gln Arg Pro Gly Glu Glu Arg Tyr Val Pro Asp Val 275 280 285Asp Ile Trp Thr Leu Glu Thr Thr His Arg Ile Ser Leu Phe Ala Arg 290 295 300Gln Trp Asn Arg Ser Thr Ala Gln Trp Leu Lys Arg Leu Val Phe Gln305 310 315 320Arg Ser Arg Arg Trp Pro Val Leu Gln Thr Phe Ala Phe Ser Ala Trp 325 330 335Trp His Gly Leu His Pro Gly Gln Val Phe Gly Phe Leu Cys Trp Ser 340 345 350Val Met Val Lys Ala Asp Tyr Leu Ile His Thr Phe Ala Asn Gly Cys 355 360 365Ile Arg Ser Trp Pro Leu Arg Leu Leu Tyr Arg Ser Leu Thr Trp Ala 370 375 380His Thr Gln Ile Ile Ile Ala Tyr Val Met Leu Ala Val Glu Gly Arg385 390 395 400Ser Phe Ser Ser Leu Cys Arg Leu Cys Cys Ser Tyr Asn Ser Ile Phe 405 410 415Pro Val Thr Tyr Cys Leu Leu Leu Phe Leu Leu Ala Arg Arg Lys His 420 425 430Lys Cys Asn 43571308DNArat 7atggattggc tccagttctt ctttctccat cctgtatcac tttatcaagg ggctgctttc 60cccttcgcgc ttctgtttaa ttatctctgc atcacggaat cctttcccac ccgggccagg 120tacctctttc tcctggctgg aggaggtgtc ctggctttgg ccgccatggg tccctacgct 180ctgctcattt tcatccctgc tctctgtgcc gtggctatga tctcctccct cagtccacag 240gaagtccatg ggctgacttt cttctttcag atgggttggc aaaccctgtg ccacctgggt 300cttcactaca aggagtacta cctgtgtgag cctccccctg tgaggttcta catcactctt 360tcttccctca tgctcttgac gcagagagtc acgtctctct ccctggacat ttctgaaggg 420aaggtggagg cagcgtggag gggcaccagg agcaggagtt ctttgtgtga gcacctgtgg 480gatgctctac cctatatcag ctatttgctc tttttccctg cactcctggg aggctccctg 540tgttcctttc agagatttca ggcttgcgtt caaagaccaa ggtctttgta tcccagtatc 600tctttctggg ctctgacctg gaggggtctg cagatccttg 
ggctggagtg cctcaaggtg 660gcgctgagga gggtggtgag tgctggcgct ggactggatg attgccagcg actggagtgc 720atctacatca tgtggtccac cgctgggctc tttaaactca cctactactc ccactggatc 780ctggacgact ctctccttca cgcggcgggc tttggatctg aggctggcca gaggcctgga 840gaggagagat acgtcccgga tgtggacatt tggacattgg aaactaccca caggatctcc 900ctgttcgcga ggcagtggaa ccgaagcaca gctcagtggc tcaagaggct tgtcttccag 960aggagccggc gctggcccgt gctgcagact tttgccttct ctgcctggtg gcacggactc 1020cacccaggac aggtgtttgg cttcctgtgc tggtctgtga tggtgaaagc cgactatctg 1080atccacactt ttgccaatgg atgtatcaga tcctggcccc tgcggctgct ttatagatcc 1140ctcacttggg cccacactca gatcatcatt gcttacgtaa tgctggccgt ggagggccgg 1200agcttttcct ctctctgccg gctgtgctgt tcttacaaca gtatcttccc tgtaacgtac 1260tgccttttgc tttttctatt agcgaggaga aaacacaagt gtaactga 13088120DNAhuman 8atggagtggc tttggctgtt ctttctccat cctatatcgt tttaccaggg ggctgcattt 60ccctttgcac ttctcttcaa ttatctctgc atcatggatt cattctccac tcgtgccagg 1209225DNAhuman 9tacctctttc tcctgactgg aggaggtgcc ctggccgtgg ctgccatggg ttcctacgcc 60gtgctcgtct tcacccctgc tgtctgcgct gtggctctcc tctgttccct ggctcctcag 120caagtccaca ggtggacctt ctgctttcag atgagctggc agaccttgtg tcacctaggt 180ctgcactaca ctgagtatta tctgcatgag cctccttctg tgagg 22510963DNAhuman 10ttctgcatca ctctttcttc tctcatgctc ttgacccaga gggtcacgtc cctctctctg 60gacatttgtg aggggaaagt gaaggcagca tctggaggct tcaggagcag gagctctttg 120tctgagcatg tgtgtaaggc actgccctat ttcagctact tgctcttttt ccctgctctc 180ctgggaggct ctctgtgctc cttccagcga tttcaggctc gtgttcaagg gtccagtgct 240ttgcatccca gacactcttt ctgggctctg agctggaggg gtctgcagat tcttggacta 300gaatgcctaa acgtggcagt gagcagggtg gtggatgcag gagcgggact gactgattgc 360cagcaattcg agtgcatcta tgtcgtgtgg accacagctg ggcttttcaa gctcacctac 420tactcccact ggatcctgga cgactccctc ctccacgcag cgggctttgg gcctgagctt 480ggtcagagcc ctggagagga gggatatgtc cccgatgcag acatctggac cctggaaaga 540acccacagga tatctgtgtt ctcaagaaag tggaaccaaa gcacagctcg atggctccga 600cggcttgtat tccagcacag cagggcttgg ccgttgttgc agacatttgc cttctctgcc 660tggtggcatg gactccatcc 
aggacaggtg tttggtttcg tttgctgggc cgtgatggtg 720gaagctgact acctgattca ctcctttgcc aatgagttta tcagatcctg gccgatgagg 780ctgttctata gaaccctcac ctgggcccac acccagttga tcattgccta catcatgctg 840gctgtggagg tcaggagtct ctcctctctc tggttgctct gtaattcgta caacagtgtc 900tttcccatgg tgtactgtat tctgcttttg ctattggcga agagaaagca caaatgtaac 960tga 96311435PRThuman 11Met Glu Trp Leu Trp Leu Phe Phe Leu His Pro Ile Ser Phe Tyr Gln1 5 10 15Gly Ala Ala Phe Pro Phe Ala Leu Leu Phe Asn Tyr Leu Cys Ile Met 20 25 30Asp Ser Phe Ser Thr Arg Ala Arg Tyr Leu Phe Leu Leu Thr Gly Gly 35 40 45Gly Ala Leu Ala Val Ala Ala Met Gly Ser Tyr Ala Val Leu Val Phe 50 55 60Thr Pro Ala Val Cys Ala Val Ala Leu Leu Cys Ser Leu Ala Pro Gln65 70 75 80Gln Val His Arg Trp Thr Phe Cys Phe Gln Met Ser Trp Gln Thr Leu 85 90 95Cys His Leu Gly Leu His Tyr Thr Glu Tyr Tyr Leu His Glu Pro Pro 100 105 110Ser Val Arg Phe Cys Ile Thr Leu Ser Ser Leu Met Leu Leu Thr Gln 115 120 125Arg Val Thr Ser Leu Ser Leu Asp Ile Cys Glu Gly Lys Val Lys Ala 130 135 140Ala Ser Gly Gly Phe Arg Ser Arg Ser Ser Leu Ser Glu His Val Cys145 150 155 160Lys Ala Leu Pro Tyr Phe Ser Tyr Leu Leu Phe Phe Pro Ala Leu Leu 165 170 175Gly Gly Ser Leu Cys Ser Phe Gln Arg Phe Gln Ala Arg Val Gln Gly 180 185 190Ser Ser Ala Leu His Pro Arg His Ser Phe Trp Ala Leu Ser Trp Arg 195 200 205Gly Leu Gln Ile Leu Gly Leu Glu Cys Leu Asn Val Ala Val Ser Arg 210 215 220Val Val Asp Ala Gly Ala Gly Leu Thr Asp Cys Gln Gln Phe Glu Cys225 230 235 240Ile Tyr Val Val Trp Thr Thr Ala Gly Leu Phe Lys Leu Thr Tyr Tyr 245 250 255Ser His Trp Ile Leu Asp Asp Ser Leu Leu His Ala Ala Gly Phe Gly 260 265 270Pro Glu Leu Gly Gln Ser Pro Gly Glu Glu Gly Tyr Val Pro Asp Ala 275 280 285Asp Ile Trp Thr Leu Glu Arg Thr His Arg Ile Ser Val Phe Ser Arg 290 295 300Lys Trp Asn Gln Ser Thr Ala Arg Trp Leu Arg Arg Leu Val Phe Gln305 310 315 320His Ser Arg Ala Trp Pro Leu Leu Gln Thr Phe Ala Phe Ser Ala Trp 325 330 335Trp His Gly Leu His Pro Gly Gln Val Phe Gly Phe Val Cys Trp Ala 340 345 350Val Met 
Val Glu Ala Asp Tyr Leu Ile His Ser Phe Ala Asn Glu Phe 355 360 365Ile Arg Ser Trp Pro Met Arg Leu Phe Tyr Arg Thr Leu Thr Trp Ala 370 375 380His Thr Gln Leu Ile Ile Ala Tyr Ile Met Leu Ala Val Glu Val Arg385 390 395 400Ser Leu Ser Ser Leu Trp Leu Leu Cys Asn Ser Tyr Asn Ser Val Phe 405 410 415Pro Met Val Tyr Cys Ile Leu Leu Leu Leu Leu Ala Lys Arg Lys His 420 425 430Lys Cys Asn 435121308DNAhuman 12atggagtggc tttggctgtt ctttctccat cctatatcgt tttaccaggg ggctgcattt 60ccctttgcac ttctcttcaa ttatctctgc atcatggatt cattctccac tcgtgccagg 120tacctctttc tcctgactgg aggaggtgcc ctggccgtgg ctgccatggg ttcctacgcc 180gtgctcgtct tcacccctgc tgtctgcgct gtggctctcc tctgttccct ggctcctcag 240caagtccaca ggtggacctt ctgctttcag atgagctggc agaccttgtg tcacctaggt 300ctgcactaca ctgagtatta tctgcatgag cctccttctg tgaggttctg catcactctt 360tcttctctca tgctcttgac ccagagggtc acgtccctct ctctggacat ttgtgagggg 420aaagtgaagg cagcatctgg aggcttcagg agcaggagct ctttgtctga gcatgtgtgt 480aaggcactgc cctatttcag ctacttgctc tttttccctg ctctcctggg aggctctctg 540tgctccttcc agcgatttca ggctcgtgtt caagggtcca gtgctttgca tcccagacac 600tctttctggg ctctgagctg gaggggtctg cagattcttg gactagaatg cctaaacgtg 660gcagtgagca gggtggtgga tgcaggagcg ggactgactg attgccagca attcgagtgc 720atctatgtcg tgtggaccac agctgggctt ttcaagctca cctactactc ccactggatc 780ctggacgact ccctcctcca cgcagcgggc tttgggcctg agcttggtca gagccctgga 840gaggagggat atgtccccga tgcagacatc tggaccctgg aaagaaccca caggatatct 900gtgttctcaa gaaagtggaa ccaaagcaca gctcgatggc tccgacggct tgtattccag 960cacagcaggg cttggccgtt gttgcagaca tttgccttct ctgcctggtg gcatggactc 1020catccaggac aggtgtttgg tttcgtttgc tgggccgtga tggtggaagc tgactacctg 1080attcactcct ttgccaatga gtttatcaga tcctggccga tgaggctgtt ctatagaacc 1140ctcacctggg cccacaccca gttgatcatt gcctacatca tgctggctgt ggaggtcagg 1200agtctctcct ctctctggtt gctctgtaat tcgtacaaca gtgtctttcc catggtgtac 1260tgtattctgc ttttgctatt ggcgaagaga aagcacaaat gtaactga 130813435PRTchimpanzee 13Met Glu Trp Leu Arg Leu Phe Phe Leu His Pro Val Ser Phe Tyr Gln1 5 
10 15Gly Ala Ala Phe Pro Phe Ala Leu Leu Phe Asn Tyr Leu Cys Ile Met

20 25 30Asp Ser Phe Ser Thr Arg Ala Arg Tyr Leu Phe Leu Leu Ala Gly Gly 35 40 45Gly Ala Leu Ala Val Ala Ala Met Gly Ser Tyr Ala Val Leu Val Phe 50 55 60Thr Pro Ala Val Cys Ala Val Ala Leu Leu Cys Ser Leu Ala Pro Gln65 70 75 80Gln Val His Arg Trp Thr Phe Cys Phe Gln Met Ser Trp Gln Thr Leu 85 90 95Cys His Leu Gly Leu His Tyr Thr Glu Tyr Tyr Leu His Glu Pro Pro 100 105 110Ser Val Arg Phe Cys Ile Thr Leu Ser Ser Leu Met Leu Leu Thr Gln 115 120 125Arg Val Thr Ser Leu Ser Leu Asp Ile Cys Glu Gly Lys Val Glu Ala 130 135 140Ala Ser Gly Gly Phe Arg Ser Arg Ser Ser Leu Ser Glu His Val Cys145 150 155 160Lys Ala Leu Pro Tyr Phe Ser Tyr Leu Leu Phe Phe Pro Ala Leu Leu 165 170 175Gly Gly Ser Leu Cys Ser Phe Gln Arg Phe Gln Ala Arg Val Gln Gly 180 185 190Ser Ser Ala Leu His Pro Arg His Ser Phe Trp Ala Leu Ser Trp Arg 195 200 205Cys Leu Gln Ile Leu Gly Leu Glu Cys Leu Asn Val Ala Val Ser Arg 210 215 220Val Val Asp Ala Gly Ala Gly Leu Thr Asp Cys Gln Gln Phe Glu Cys225 230 235 240Ile Tyr Val Val Trp Thr Thr Ala Gly Leu Phe Lys Leu Thr Tyr Tyr 245 250 255Ser His Trp Ile Leu Asp Asp Ser Leu Leu His Ala Ala Gly Phe Gly 260 265 270Pro Glu Leu Gly Gln Ser Pro Gly Glu Glu Gly Tyr Val Pro Asp Ala 275 280 285Asp Ile Trp Thr Leu Glu Arg Thr His Arg Ile Ser Val Phe Ala Arg 290 295 300Lys Trp Asn Gln Ser Thr Ala Arg Trp Leu Arg Arg Leu Val Phe Gln305 310 315 320His Ser Arg Ala Trp Pro Leu Leu Gln Thr Phe Ala Phe Ser Ala Trp 325 330 335Trp His Gly Leu His Pro Gly Gln Val Phe Gly Phe Val Cys Trp Ala 340 345 350Val Met Val Glu Ala Asp Tyr Leu Ile His Ser Phe Ala Asn Glu Phe 355 360 365Ile Arg Ser Trp Pro Met Arg Leu Phe Tyr Arg Thr Leu Thr Trp Ala 370 375 380His Thr Gln Leu Ile Ile Ala Tyr Ile Met Leu Ala Val Glu Val Arg385 390 395 400Ser Leu Ser Ser Leu Trp Leu Leu Cys Asn Ser Tyr Asn Ser Val Phe 405 410 415Pro Met Val Tyr Cys Ile Leu Leu Leu Leu Leu Val Lys Arg Lys His 420 425 430Lys Cys Asn 43514120DNAbovine 14atggattggc tccagctgtt cttccttgat cctgtatcac tttatcaagg agctgctttt 60ccttttgcac 
ttctgtttaa tcatctctgt gttatggatt cattttccac tcaggccagg 12015225DNAbovine 15tacctgttcc tcctggcggg aggcggtgcc ctggccgtgg ctgctatggg tgccttcgct 60gtgctggtct tcatccccgc cctgtgcacg gtggtcctca tccactcgct tggcccccag 120gatgtccaca ggccgacctt cctctttcag atgacctggc agacgctgtg ccacctgggt 180ctgcactata cggagtatta tctgcaagaa gctccttcta caagg 22516963DNAbovine 16ttctgcatca ctctctcttc gctcatgctc ttgacccaga agatcacatc tctgtctctg 60gatattcgtg aggggaaggt ggtagcacca tcaggacgca tccctaacaa gaattctttg 120tctgagcatc tgcatgcggc tcttccctat ctcagctact tgctcttctt ccctgccctc 180ctaggaggcc cgctgtgttc cttccagagg tttcaggctc gagttgaagg gtccagcagt 240ttgtggtcca ggcactcttt ctgggctctg acctggaggg cgctgcagat cctgggactg 300gagagtctga aggtgatcgt cagcggggtg gtgggcgtgg gggcaggact tggaggctgc 360aggcagctgc agtgcgtctt cgtcctgtgg tccacggccg ggctcttcaa actcacctac 420tactcccact ggctcctgga tgacgccctc ctccgcgcgg ccggctttgg atctgagtta 480ggtcgcagcc cgggtgagga gggactcctc cccgatgcgg acatttggac gctggaaacg 540acccacagga tagccctgtt cgccaggaag tggaaccaga gcacggctcg gtggctccga 600cgcctggttt tccagcagcg caggacctgg cccttgttgc agacattcct cttctcggcc 660tggtggcacg gtctccaccc gggacaggtg tttggtttcc tctgctgggc tgtcatggtg 720gaagccgact acctgattca cgccttcgcc agcgtgttca tcagctcctg gcccatgcgg 780ctgctctaca gagccctggc ctgggcccac acccagctca tcatcgccta cataatgctg 840gccgtggagg cccggagcct ctcctctctc tggctgctgt ggaattctta cagcagtgtc 900tttcccacgg tgtactgtat tttgcttctc ctgttagcaa agagaaagca taaatgcaac 960tga 96317435PRTbovine 17Met Asp Trp Leu Gln Leu Phe Phe Leu Asp Pro Val Ser Leu Tyr Gln1 5 10 15Gly Ala Ala Phe Pro Phe Ala Leu Leu Phe Asn His Leu Cys Val Met 20 25 30Asp Ser Phe Ser Thr Gln Ala Arg Tyr Leu Phe Leu Leu Ala Gly Gly 35 40 45Gly Ala Leu Ala Val Ala Ala Met Gly Ala Phe Ala Val Leu Val Phe 50 55 60Ile Pro Ala Leu Cys Thr Val Val Leu Ile His Ser Leu Gly Pro Gln65 70 75 80Asp Val His Arg Pro Thr Phe Leu Phe Gln Met Thr Trp Gln Thr Leu 85 90 95Cys His Leu Gly Leu His Tyr Thr Glu Tyr Tyr Leu Gln Glu Ala Pro 100 105 110Ser Thr Arg 
Phe Cys Ile Thr Leu Ser Ser Leu Met Leu Leu Thr Gln 115 120 125Lys Ile Thr Ser Leu Ser Leu Asp Ile Arg Glu Gly Lys Val Val Ala 130 135 140Pro Ser Gly Arg Ile Pro Asn Lys Asn Ser Leu Ser Glu His Leu His145 150 155 160Ala Ala Leu Pro Tyr Leu Ser Tyr Leu Leu Phe Phe Pro Ala Leu Leu 165 170 175Gly Gly Pro Leu Cys Ser Phe Gln Arg Phe Gln Ala Arg Val Glu Gly 180 185 190Ser Ser Ser Leu Trp Ser Arg His Ser Phe Trp Ala Leu Thr Trp Arg 195 200 205Ala Leu Gln Ile Leu Gly Leu Glu Ser Leu Lys Val Ile Val Ser Gly 210 215 220Val Val Gly Val Gly Ala Gly Leu Gly Gly Cys Arg Gln Leu Gln Cys225 230 235 240Val Phe Val Leu Trp Ser Thr Ala Gly Leu Phe Lys Leu Thr Tyr Tyr 245 250 255Ser His Trp Leu Leu Asp Asp Ala Leu Leu Arg Ala Ala Gly Phe Gly 260 265 270Ser Glu Leu Gly Arg Ser Pro Gly Glu Glu Gly Leu Leu Pro Asp Ala 275 280 285Asp Ile Trp Thr Leu Glu Thr Thr His Arg Ile Ala Leu Phe Ala Arg 290 295 300Lys Trp Asn Gln Ser Thr Ala Arg Trp Leu Arg Arg Leu Val Phe Gln305 310 315 320Gln Arg Arg Thr Trp Pro Leu Leu Gln Thr Phe Leu Phe Ser Ala Trp 325 330 335Trp His Gly Leu His Pro Gly Gln Val Phe Gly Phe Leu Cys Trp Ala 340 345 350Val Met Val Glu Ala Asp Tyr Leu Ile His Ala Phe Ala Ser Val Phe 355 360 365Ile Ser Ser Trp Pro Met Arg Leu Leu Tyr Arg Ala Leu Ala Trp Ala 370 375 380His Thr Gln Leu Ile Ile Ala Tyr Ile Met Leu Ala Val Glu Ala Arg385 390 395 400Ser Leu Ser Ser Leu Trp Leu Leu Trp Asn Ser Tyr Ser Ser Val Phe 405 410 415Pro Thr Val Tyr Cys Ile Leu Leu Leu Leu Leu Ala Lys Arg Lys His 420 425 430Lys Cys Asn 435181308DNAbovine 18atggattggc tccagctgtt cttccttgat cctgtatcac tttatcaagg agctgctttt 60ccttttgcac ttctgtttaa tcatctctgt gttatggatt cattttccac tcaggccagg 120tacctgttcc tcctggcggg aggcggtgcc ctggccgtgg ctgctatggg tgccttcgct 180gtgctggtct tcatccccgc cctgtgcacg gtggtcctca tccactcgct tggcccccag 240gatgtccaca ggccgacctt cctctttcag atgacctggc agacgctgtg ccacctgggt 300ctgcactata cggagtatta tctgcaagaa gctccttcta caaggttctg catcactctc 360tcttcgctca tgctcttgac ccagaagatc acatctctgt 
ctctggatat tcgtgagggg 420aaggtggtag caccatcagg acgcatccct aacaagaatt ctttgtctga gcatctgcat 480gcggctcttc cctatctcag ctacttgctc ttcttccctg ccctcctagg aggcccgctg 540tgttccttcc agaggtttca ggctcgagtt gaagggtcca gcagtttgtg gtccaggcac 600tctttctggg ctctgacctg gagggcgctg cagatcctgg gactggagag tctgaaggtg 660atcgtcagcg gggtggtggg cgtgggggca ggacttggag gctgcaggca gctgcagtgc 720gtcttcgtcc tgtggtccac ggccgggctc ttcaaactca cctactactc ccactggctc 780ctggatgacg ccctcctccg cgcggccggc tttggatctg agttaggtcg cagcccgggt 840gaggagggac tcctccccga tgcggacatt tggacgctgg aaacgaccca caggatagcc 900ctgttcgcca ggaagtggaa ccagagcacg gctcggtggc tccgacgcct ggttttccag 960cagcgcagga cctggccctt gttgcagaca ttcctcttct cggcctggtg gcacggtctc 1020cacccgggac aggtgtttgg tttcctctgc tgggctgtca tggtggaagc cgactacctg 1080attcacgcct tcgccagcgt gttcatcagc tcctggccca tgcggctgct ctacagagcc 1140ctggcctggg cccacaccca gctcatcatc gcctacataa tgctggccgt ggaggcccgg 1200agcctctcct ctctctggct gctgtggaat tcttacagca gtgtctttcc cacggtgtac 1260tgtattttgc ttctcctgtt agcaaagaga aagcataaat gcaactga 130819120DNAhorse 19atgggttggc ttcagctgtt ccttctccat cctgtatcac tttatcaagg ggccgctttt 60ccttttgcac ttctatttaa ttacctttgc actatggatt cattttccac tcatgccagg 12020225DNAhorse 20tacctctttc tgctggcagg aggaggcgcc ctggccttgg ccgctatggg tccctttgct 60gtgcttgtct tcatccctgc gatatgtgct gtgtttctga tctgcttgct cagcccacag 120gaagtccaca ggcagacttt ctgctttcag atgagctggc agacgctgtg tcacctgggt 180ctgcactata ctgagtatta tctgcaagaa cttccttcca cgagg 22521963DNAhorse 21ttctgcctcg ctctttcttc cctcatgctc ttgacccaga gggtcacatc cctctctctg 60gacatttgtg aagggaaact ggcagcagca tcaggaggca ccaggagcag aagctctttg 120tctgagcatc tgtgtaaggc actgccctat ttcagctact tgcttttttt tcctgctctc 180ctaggaggcc ctctgtgttc cttccagaga tttcaggccc gtgttcaagg gcccagcaac 240ttgtgtccca ggcacccttt cagggctctg acctggaggg gtctgcagat tctgggacta 300gagtgcctaa aggtcgtcat gagggcagtg gtgagagcag gagcaggact gaccgactgc 360cggcaactcc agtgcatcta tgtcatgtgg tccacagccg ggctcttcaa actcacctac 420tactcccact 
ggatcctgga tgactccctc ctgtgtgcag cgggctttgg atctgagttt 480gggcagagcc ctggtgagga cggatacatc cctgatgcag acatttggac actggaaaca 540acccacagga tatccctgtt tgcgagaaag tggaaccaaa gcacagctcg gtggctcaga 600cgcctcgtat ttcagcacag cagggtctgg ccgttgttgc agacatttgc attctctgcc 660tggtggcatg ggctccatcc aggacaggtg tttggtttcc tctgctgggc tgtgatggtg 720gaagctgact acctgattca cacctttgcc aaattgttta tcagatcctg gccgatgaag 780ctgctctata gaactctgac ctgggcccac acccagctca tcattgccta cataatgctg 840gccgtggagg tcaggagcct ctcctctctc tggctgctgt gtaattctta caacagtgtc 900tttcccatgg tgtattgtat tttgcttttg ctattagcaa agagaaagca cacatttaac 960tga 96322435PRThorse 22Met Gly Trp Leu Gln Leu Phe Leu Leu His Pro Val Ser Leu Tyr Gln1 5 10 15Gly Ala Ala Phe Pro Phe Ala Leu Leu Phe Asn Tyr Leu Cys Thr Met 20 25 30Asp Ser Phe Ser Thr His Ala Arg Tyr Leu Phe Leu Leu Ala Gly Gly 35 40 45Gly Ala Leu Ala Leu Ala Ala Met Gly Pro Phe Ala Val Leu Val Phe 50 55 60Ile Pro Ala Ile Cys Ala Val Phe Leu Ile Cys Leu Leu Ser Pro Gln65 70 75 80Glu Val His Arg Gln Thr Phe Cys Phe Gln Met Ser Trp Gln Thr Leu 85 90 95Cys His Leu Gly Leu His Tyr Thr Glu Tyr Tyr Leu Gln Glu Leu Pro 100 105 110Ser Thr Arg Phe Cys Leu Ala Leu Ser Ser Leu Met Leu Leu Thr Gln 115 120 125Arg Val Thr Ser Leu Ser Leu Asp Ile Cys Glu Gly Lys Leu Ala Ala 130 135 140Ala Ser Gly Gly Thr Arg Ser Arg Ser Ser Leu Ser Glu His Leu Cys145 150 155 160Lys Ala Leu Pro Tyr Phe Ser Tyr Leu Leu Phe Phe Pro Ala Leu Leu 165 170 175Gly Gly Pro Leu Cys Ser Phe Gln Arg Phe Gln Ala Arg Val Gln Gly 180 185 190Pro Ser Asn Leu Cys Pro Arg His Pro Phe Arg Ala Leu Thr Trp Arg 195 200 205Gly Leu Gln Ile Leu Gly Leu Glu Cys Leu Lys Val Val Met Arg Ala 210 215 220Val Val Arg Ala Gly Ala Gly Leu Thr Asp Cys Arg Gln Leu Gln Cys225 230 235 240Ile Tyr Val Met Trp Ser Thr Ala Gly Leu Phe Lys Leu Thr Tyr Tyr 245 250 255Ser His Trp Ile Leu Asp Asp Ser Leu Leu Cys Ala Ala Gly Phe Gly 260 265 270Ser Glu Phe Gly Gln Ser Pro Gly Glu Asp Gly Tyr Ile Pro Asp Ala 275 280 285Asp Ile Trp Thr Leu Glu 
Thr Thr His Arg Ile Ser Leu Phe Ala Arg 290 295 300Lys Trp Asn Gln Ser Thr Ala Arg Trp Leu Arg Arg Leu Val Phe Gln305 310 315 320His Ser Arg Val Trp Pro Leu Leu Gln Thr Phe Ala Phe Ser Ala Trp 325 330 335Trp His Gly Leu His Pro Gly Gln Val Phe Gly Phe Leu Cys Trp Ala 340 345 350Val Met Val Glu Ala Asp Tyr Leu Ile His Thr Phe Ala Lys Leu Phe 355 360 365Ile Arg Ser Trp Pro Met Lys Leu Leu Tyr Arg Thr Leu Thr Trp Ala 370 375 380His Thr Gln Leu Ile Ile Ala Tyr Ile Met Leu Ala Val Glu Val Arg385 390 395 400Ser Leu Ser Ser Leu Trp Leu Leu Cys Asn Ser Tyr Asn Ser Val Phe 405 410 415Pro Met Val Tyr Cys Ile Leu Leu Leu Leu Leu Ala Lys Arg Lys His 420 425 430Thr Phe Asn 435231308DNAhorse 23atgggttggc ttcagctgtt ccttctccat cctgtatcac tttatcaagg ggccgctttt 60ccttttgcac ttctatttaa ttacctttgc actatggatt cattttccac tcatgccagg 120tacctctttc tgctggcagg aggaggcgcc ctggccttgg ccgctatggg tccctttgct 180gtgcttgtct tcatccctgc gatatgtgct gtgtttctga tctgcttgct cagcccacag 240gaagtccaca ggcagacttt ctgctttcag atgagctggc agacgctgtg tcacctgggt 300ctgcactata ctgagtatta tctgcaagaa cttccttcca cgaggttctg cctcgctctt 360tcttccctca tgctcttgac ccagagggtc acatccctct ctctggacat ttgtgaaggg 420aaactggcag cagcatcagg aggcaccagg agcagaagct ctttgtctga gcatctgtgt 480aaggcactgc cctatttcag ctacttgctt ttttttcctg ctctcctagg aggccctctg 540tgttccttcc agagatttca ggcccgtgtt caagggccca gcaacttgtg tcccaggcac 600cctttcaggg ctctgacctg gaggggtctg cagattctgg gactagagtg cctaaaggtc 660gtcatgaggg cagtggtgag agcaggagca ggactgaccg actgccggca actccagtgc 720atctatgtca tgtggtccac agccgggctc ttcaaactca cctactactc ccactggatc 780ctggatgact ccctcctgtg tgcagcgggc tttggatctg agtttgggca gagccctggt 840gaggacggat acatccctga tgcagacatt tggacactgg aaacaaccca caggatatcc 900ctgtttgcga gaaagtggaa ccaaagcaca gctcggtggc tcagacgcct cgtatttcag 960cacagcaggg tctggccgtt gttgcagaca tttgcattct ctgcctggtg gcatgggctc 1020catccaggac aggtgtttgg tttcctctgc tgggctgtga tggtggaagc tgactacctg 1080attcacacct ttgccaaatt gtttatcaga tcctggccga tgaagctgct ctatagaact 
1140ctgacctggg cccacaccca gctcatcatt gcctacataa tgctggccgt ggaggtcagg 1200agcctctcct ctctctggct gctgtgtaat tcttacaaca gtgtctttcc catggtgtat 1260tgtattttgc ttttgctatt agcaaagaga aagcacacat ttaactga 130824126DNAzebrafish 24atgatagatc tcctttggat ttcttctgat ggacaccctc agctgtttta ccagtttatc 60aacataccat ttgcatttct gtttcattgc ttatccagtc aaggacatct ctcgataatc 120aacagg 12625225DNAzebrafish 25tacgtctatt tggcgatggg aggattcatg ctggctattg caacaatggg tccatatagc 60tcactgctgt tcctgagtgc tattaaactg ctgttactga tccactatat acatccaatg 120catcttcatc ggtggattct gggactgcag atgtgttggc aaacctgctg gcatttgtac 180gtccagtacc agatatactg gcttcaagag gcaccagact caagg 22526897DNAzebrafish 26cttttactgg ccatatctgc actcatgttg atgacccaga ggatttcctc tctatcactc 60gatttccaag aggggacgat ctccaatcag tcaatcctta ttccattcct aacctactcg 120ctttatttcc ctgcccttct tggaggtcca ctttgcagtt tcaatgcttt tgttcagtct 180gtcgagcgtc aacacaccag catgacttca tatttaggaa atctcacttc aaagatatca 240caagttatag ttttggtgtg gattaaacag cttttcagtg agcttttgaa atctgccacg 300tttaacatcg acagtgtttg tcttgatgta ttgtggattt ggatcttttc gctgacactt 360aggcttaatt actatgcaca ctggaagatg agcgagtgtg ttaataatgc tgcaggattt 420ggtgtctatt tacacaaaca cagtggacaa acatcatggg acggtctttc tgatgggagt 480gtactggtga ctgaagcatc cagtcgtcct tcggtttttg cgcgaaagtg gaaccaaacc 540acggtggatt ggcttcgaaa aatagtcttc aacaggacca gcagatctcc actgttcatg 600acttttgggt tttctgcact gtggcacggt cttcaccctg ggcagattct gggtttcctc 660atttgggccg tcactgtgca ggcggactac aaactgcatc gcttcttgca cccgaagctt 720aactccctgt ggagaaaacg gctgtatgtg tgtgtaaact gggcctttac tcagctgacc 780gtcgcatgtg ttgtggtctg tgtggagctt cagagtttgg catcagttaa gctgctctgg 840tcttcgtgta ttgctgtgtt tccactgctg agtgctctga tcttaataat cctctga 89727415PRTzebrafish 27Met Ile Asp Leu Leu Trp Ile Ser Ser Asp Gly His Pro Gln Leu Phe1 5 10 15Tyr Gln Phe Ile Asn Ile Pro Phe Ala Phe Leu Phe His Cys Leu Ser 20 25 30Ser Gln Gly His Leu Ser Ile Ile Asn Arg Tyr Val Tyr Leu Ala Met 35 40
45Gly Gly Phe Met Leu Ala Ile Ala Thr Met Gly Pro Tyr Ser Ser Leu 50 55 60Leu Phe Leu Ser Ala Ile Lys Leu Leu Leu Leu Ile His Tyr Ile His65 70 75 80Pro Met His Leu His Arg Trp Ile Leu Gly Leu Gln Met Cys Trp Gln 85 90 95Thr Cys Trp His Leu Tyr Val Gln Tyr Gln Ile Tyr Trp Leu Gln Glu 100 105 110Ala Pro Asp Ser Arg Leu Leu Leu Ala Ile Ser Ala Leu Met Leu Met 115 120 125Thr Gln Arg Ile Ser Ser Leu Ser Leu Asp Phe Gln Glu Gly Thr Ile 130 135 140Ser Asn Gln Ser Ile Leu Ile Pro Phe Leu Thr Tyr Ser Leu Tyr Phe145 150 155 160Pro Ala Leu Leu Gly Gly Pro Leu Cys Ser Phe Asn Ala Phe Val Gln 165 170 175Ser Val Glu Arg Gln His Thr Ser Met Thr Ser Tyr Leu Gly Asn Leu 180 185 190Thr Ser Lys Ile Ser Gln Val Ile Val Leu Val Trp Ile Lys Gln Leu 195 200 205Phe Ser Glu Leu Leu Lys Ser Ala Thr Phe Asn Ile Asp Ser Val Cys 210 215 220Leu Asp Val Leu Trp Ile Trp Ile Phe Ser Leu Thr Leu Arg Leu Asn225 230 235 240Tyr Tyr Ala His Trp Lys Met Ser Glu Cys Val Asn Asn Ala Ala Gly 245 250 255Phe Gly Val Tyr Leu His Lys His Ser Gly Gln Thr Ser Trp Asp Gly 260 265 270Leu Ser Asp Gly Ser Val Leu Val Thr Glu Ala Ser Ser Arg Pro Ser 275 280 285Val Phe Ala Arg Lys Trp Asn Gln Thr Thr Val Asp Trp Leu Arg Lys 290 295 300Ile Val Phe Asn Arg Thr Ser Arg Ser Pro Leu Phe Met Thr Phe Gly305 310 315 320Phe Ser Ala Leu Trp His Gly Leu His Pro Gly Gln Ile Leu Gly Phe 325 330 335Leu Ile Trp Ala Val Thr Val Gln Ala Asp Tyr Lys Leu His Arg Phe 340 345 350Leu His Pro Lys Leu Asn Ser Leu Trp Arg Lys Arg Leu Tyr Val Cys 355 360 365Val Asn Trp Ala Phe Thr Gln Leu Thr Val Ala Cys Val Val Val Cys 370 375 380Val Glu Leu Gln Ser Leu Ala Ser Val Lys Leu Leu Trp Ser Ser Cys385 390 395 400Ile Ala Val Phe Pro Leu Leu Ser Ala Leu Ile Leu Ile Ile Leu 405 410 415281248DNAzebrafish 28atgatagatc tcctttggat ttcttctgat ggacaccctc agctgtttta ccagtttatc 60aacataccat ttgcatttct gtttcattgc ttatccagtc aaggacatct ctcgataatc 120aacaggtacg tctatttggc gatgggagga ttcatgctgg ctattgcaac aatgggtcca 180tatagctcac tgctgttcct gagtgctatt aaactgctgt 
tactgatcca ctatatacat 240ccaatgcatc ttcatcggtg gattctggga ctgcagatgt gttggcaaac ctgctggcat 300ttgtacgtcc agtaccagat atactggctt caagaggcac cagactcaag gcttttactg 360gccatatctg cactcatgtt gatgacccag aggatttcct ctctatcact cgatttccaa 420gaggggacga tctccaatca gtcaatcctt attccattcc taacctactc gctttatttc 480cctgcccttc ttggaggtcc actttgcagt ttcaatgctt ttgttcagtc tgtcgagcgt 540caacacacca gcatgacttc atatttagga aatctcactt caaagatatc acaagttata 600gttttggtgt ggattaaaca gcttttcagt gagcttttga aatctgccac gtttaacatc 660gacagtgttt gtcttgatgt attgtggatt tggatctttt cgctgacact taggcttaat 720tactatgcac actggaagat gagcgagtgt gttaataatg ctgcaggatt tggtgtctat 780ttacacaaac acagtggaca aacatcatgg gacggtcttt ctgatgggag tgtactggtg 840actgaagcat ccagtcgtcc ttcggttttt gcgcgaaagt ggaaccaaac cacggtggat 900tggcttcgaa aaatagtctt caacaggacc agcagatctc cactgttcat gacttttggg 960ttttctgcac tgtggcacgg tcttcaccct gggcagattc tgggtttcct catttgggcc 1020gtcactgtgc aggcggacta caaactgcat cgcttcttgc acccgaagct taactccctg 1080tggagaaaac ggctgtatgt gtgtgtaaac tgggccttta ctcagctgac cgtcgcatgt 1140gttgtggtct gtgtggagct tcagagtttg gcatcagtta agctgctctg gtcttcgtgt 1200attgctgtgt ttccactgct gagtgctctg atcttaataa tcctctga 1248
