Cancer-associated genes Yoshikawa, Yoshie ; et al. [Takara Shuzo Co., Ltd.]

Cancer-associated genes

Yoshikawa, Yoshie ; et al.

Patent Application Summary

U.S. patent application number 10/325881 was filed with the patent office on 2003-06-26 for cancer-associated genes. This patent application is currently assigned to Takara Shuzo Co., Ltd. Invention is credited to Asada, Kiyozo, Hino, Fumitsugu, Kato, Ikunoshin, Mukai, Hiroyuki, Yoshikawa, Yoshie.

Application Number	20030119047 10/325881
Family ID	12916683
Filed Date	2003-06-26

United States Patent Application	20030119047
Kind Code	A1
Yoshikawa, Yoshie ; et al.	June 26, 2003

Cancer-associated genes

Abstract

To provide a method for detecting a cancer cell in a resected specimen by determining a change in an expression level of at least one of cancer-associated genes selected from genes of which cDNA is a DNA comprising a nucleotide sequence as shown in any one of SEQ ID NOs: 1 to 16 and 66 to 68 in Sequence Listing, or a DNA capable of hybridizing with a nucleic acid consisting of a nucleotide sequence as shown in any one of SEQ ID NOs: 1 to 16 and 66 to 68 in Sequence Listing under stringent conditions; as well as a kit for detecting cancer by the above method, and the like.

Inventors:	Yoshikawa, Yoshie; (Kyoto-shi, JP) ; Mukai, Hiroyuki; (Moriyama-shi, JP) ; Asada, Kiyozo; (Koga-gun, JP) ; Hino, Fumitsugu; (Kusatsu-shi, JP) ; Kato, Ikunoshin; (Uji-shi, JP)
Correspondence Address:	BIRCH STEWART KOLASCH & BIRCH PO BOX 747 FALLS CHURCH VA 22040-0747 US
Assignee:	Takara Shuzo Co., Ltd.
Family ID:	12916683
Appl. No.:	10/325881
Filed:	December 23, 2002

Related U.S. Patent Documents


Application Number	Filing Date	Patent Number
10325881	Dec 23, 2002
09377497	Aug 20, 1999
PCT/JP98/00667	Feb 18, 1998

Current U.S. Class:	435/6.12 ; 435/91.2
Current CPC Class:	C07K 14/47 20130101; C12Q 1/6886 20130101; C07K 14/82 20130101; C12Q 2600/158 20130101
Class at Publication:	435/6 ; 435/91.2
International Class:	C12Q 001/68; C12P 019/34

Foreign Application Data

Date	Code	Application Number
Feb 21, 1997	JP	9-52508

Claims

What is claimed is:

1. A method for detecting a cancer cell in a resected specimen, characterized by determining a change in an expression level of at least one of cancer-associated genes selected from genes of which cDNA is a DNA comprising a nucleotide sequence as shown in any one of SEQ ID NOs: 1 to 16 and 66 to 68 in Sequence Listing, or a DNA capable of hybridizing with a nucleic acid consisting of a nucleotide sequence as shown in any one of SEQ ID NOs: 1 to 16 and 66 to 68 in Sequence Listing under stringent conditions.

2. The detection method according to claim 1, characterized in that the change in an expression level of a cancer-associated gene is determined by the change in an expression level of mRNA corresponding to said gene.

3. The detection method according to claim 2, characterized in that the change in an expression level of mRNA is detected by utilizing a nucleic acid amplification method based on said mRNA or a partial portion thereof.

4. The detection method according to claim 3, characterized in that said nucleic acid amplification is polymerase chain reaction.

5. The detection method according to claim 2, characterized in that the change in an expression level of mRNA is detected by Northern hybridization method.

6. The detection method according to claim 2, characterized in that the change in an expression level of mRNA is detected by RNase protection assay.

7. The detection method according to claim 1, characterized in that the change in an expression level of a cancer-associated gene is determined by a change in an expression level of a protein encoded by said gene.

8. The detection method according to claim 7, characterized in that the change in expression of the protein is detected by utilizing an antibody capable of recognizing said protein.

9. A kit for detecting cancer by the method of claim 3, wherein the kit comprises primers for amplifying mRNA of which change in an expression level is to be determined or a partial portion thereof.

10. A kit for detecting a cancer cell by the method of claim 8, wherein the kit comprises an antibody recognizing a protein of which change in an expression level is to be determined.

11. A method for controlling proliferation of a cancer cell using a substance specifically binding to a gene or an expression product of said gene, characterized in that cDNA of the gene is at least one of DNAs selected from a DNA comprising a nucleotide sequence as shown in any one of SEQ ID NOs: 1 to 16 and 66 to 68 in Sequence Listing, or a DNA capable of hybridizing with a nucleic acid consisting of any one of these nucleotide sequences under stringent conditions, wherein the DNA is usable for detection of a cancer cell by a change in an expression level thereof.

12. A peptide usable for detection of a cancer cell, characterized in that the peptide is shown in an amino acid sequence comprising an entire portion of any one of amino acid sequences as shown in SEQ ID NOs: 17 to 19, 69 and 70 in Sequence Listing or a partial portion thereof.

13. A peptide usable for detection of a cancer cell, characterized in that the peptide has an amino acid sequence comprising an amino acid sequence resulting from at least one of deletion, substitution or addition of one or more amino acid residues in an amino acid sequences as shown in any one of SEQ ID NOs: 17 to 19, 69 and 70 in Sequence Listing.

14. An antibody usable for detection of a cancer cell, wherein the antibody recognizes the peptide of claim 12 or 13.

15. A kit for detecting a gastric cancer cell, wherein the kit comprises primer pairs which are capable of amplifying a mRNA for a cancer-associated gene to be detected of which change in an expression level is to be determined, wherein said cancer-associated gene consists of the nucleotide sequence selected from the group consisting of SEQ ID NOs: 1 and 66, and wherein the primer pairs consist of at least two primers, each comprising 10 to 30 nucleotides.

Description

[0001] This application is a continuation-in-part application of PCT/JP98/00667, the entire contents of which are incorporated herein by reference.

BACKGROUND OF THE INVENTION

[0002] 1. Field of the Invention

[0003] The present invention relates to a method for detecting a cancer cell, characterized by detecting an expression product of a gene whose expression level changes owing to canceration. The present invention also relates to a gene whose expression level changes owing to canceration, and to a product of the gene.

[0004] 2. Discussion of the Related Art

[0005] Cancer has been the leading cause of death in Japan since 1981, and gastric cancer in particular occurs at the highest frequency. It has recently become known that a normal cell develops into a cancer cell through a multi-stage carcinogenic mechanism [Fearon, E. R. et al., Cell, 61, 759-767 (1990); Sugimura, T., Science, 258, 603-607 (1992)], for which the accumulation of abnormalities in a plurality of genes, including DNA repair genes, tumor suppressor genes and oncogenes, is essential. Generally, the instability of a gene and the inactivation of a tumor suppressor gene are involved in the development of a cancer, and the activation of an oncogene and/or the overexpression of a growth factor are involved in the advancement and malignancy of a cancer.

[0006] The instability of a gene includes instability associated with an abnormality in a DNA mismatch repair system and instability at a chromosomal level. An example of the former is a difference in the chain length of a simple repeated sequence present in the genome between a cancer site and a non-cancer site in the same individual (microsatellite instability) [Thibodeau, S. N. et al., Science, 260, 816-819 (1993)], and an example of the latter is an interchromosomal translocation. An interchromosomal translocation may cause the expression of a protein which is not found in normal cells, or may affect the expression level of a protein which is also expressed in normal cells. In fact, in human chronic myelocytic leukemia, the bcr gene is fused with the c-abl gene by an interchromosomal translocation, and the expression of a hybrid mRNA transcribed from the bcr-abl fusion gene, which is absent in normal cells, has been confirmed. Further, it has been confirmed that introduction of the bcr-abl fusion gene into an animal results in the onset of leukemia [Watson, J. D. et al., Molecular Biology of Recombinant DNAS, 2nd Ed., Maruzen K. K., 309 (1992)].

[0007] The inactivation of a tumor suppressor gene includes, for example, an inactivation of p53 gene. The inactivation is considered to be caused by a deletion within the gene, or a point mutation occurring in a particular portion of an encoding region [Nigro, J. M. et al., Nature, 342, 705-708 (1989); Malkin, D. et al., Science, 250, 1233-1238 (1990)]. In addition, since the deletion and the point mutation of the p53 gene are observed in various cancers, and are as frequent as 60% or higher especially in cases of a gastric cancer at an early stage [Yokozaki, H. et al., Journal of Cancer Research and Clinical Oncology, 119, 67-70 (1992)], the detection of these mutations is considered to be useful for detecting a cancer at an early stage.

[0008] On the other hand, the p16/MTS1 gene is known to be inactivated by homozygous deletion, and high-frequency homozygous deletions have been observed in cases of glioma, pancreatic cancer and urinary bladder cancer [Cairns, P. et al., Nature Genetics, 11, 210-212 (1995)]. The p16 protein regulates the cell cycle, and abnormality in p16 expression has been suggested to be involved in the canceration of a cell [Okamoto, A. et al., Proceedings of the National Academy of Sciences of the United States of America, 91, 11045-11049 (1994)].

[0009] Causes of the activation of an oncogene include, for example, a viral insertion mutation in the proximity of an oncogene, and an interchromosomal translocation. For example, a viral insertion mutation has been confirmed in chicken lymphoma caused by an avian leukosis virus (ALV). In this case, it has been found that the DNA of the ALV is inserted in the proximity of the c-myc gene, so that normal c-myc is overexpressed under the influence of the potent viral enhancer and promoter, and a new sequence which differs partially from the normal gene is expressed. In addition, in a certain kind of human B cell tumor, it has been confirmed that c-myc, which is one of the oncogenes, is placed near a potent transcription signal of immunoglobulin by an interchromosomal translocation, thereby increasing the expression level of its mRNA. In this case, no difference has been found between the c-myc protein in a cancer cell and the c-myc protein expressed in a normal cell, and the canceration is considered to be caused by the increase in the expression level of the c-myc mRNA [Watson, J. D. et al., Molecular Biology of Recombinant DNAS, 2nd Ed., Maruzen K. K., 305-308 (1992)].

[0010] Overexpression of a growth factor includes, for example, overexpression of c-Met, which encodes a hepatocyte growth factor receptor. It has been confirmed that abnormal expression of c-Met is observed, at an early stage of gastric cancer and at a high frequency, as the expression of a 6.0 kb mRNA which is not found in a normal mucous membrane [Kuniyasu, H. et al., International Journal of Cancer, 55, 72-75 (1993)], and that a correlation is observed between the gene amplification and the degree of cancer malignancy [Kuniyasu, H. et al., Biochemical and Biophysical Research Communications, 189, 227-232 (1992)].

[0011] As further examples in which a correlation between a gene abnormality and the degree of cancer malignancy has been confirmed, in addition to that of c-Met mentioned above, it has been confirmed that an amplification and/or an overexpression of the oncogene c-erbB2 is found in mammary cancers, ovarian cancers, gastric cancers and uterine cancers [Wright, C. et al., Cancer Research, 49, 2087-2090 (1989); Saffari, B. et al., Cancer Research, 55, 5693-5698 (1995)]; and that an amplification and/or an overexpression of the oncogene K-sam is found in poorly-differentiated adenocarcinoma, which is one tissue type of gastric cancer [Tahara, E. et al., Gastric Cancer, Tokyo, Springer-Verlag, Published in 1993, 209-217].

[0012] As described above, information concerning the genes involved in the development and the advancement of a cancer, as well as the abnormalities of such genes, has been increasing, and the genetic diagnosis of a biopsy material may serve for early diagnosis and for assessment of the degree of malignancy of a cancer. However, since the carcinogenic mechanism comprises multiple steps and requires an accumulation of a plurality of mutations, a large part of the genes associated with canceration are still unknown, and further study is necessary. Recently, a gene therapy in which a normal p53 gene is introduced into a cancer cell, thereby suppressing the proliferation of the cancer cell, has reached the stage of clinical trial. Therefore, the elucidation of a cancer-suppressing gene can shed light not only on diagnosis but also on gene therapy.

SUMMARY OF THE INVENTION

[0013] Accordingly, a first object of the present invention is to provide a method for detecting a cancerated cell and a method for determining a degree of malignancy, on the basis of finding a gene usable as an index for carcinogenesis, particularly a gene whose expression conditions change with canceration of a cell, and measuring an expression level of the gene in a resected specimen. A second object of the present invention is to provide a kit used for the above method for detecting a cancer cell and/or the above method for determining a degree of malignancy of the cell. A third object of the present invention is to provide a method for controlling proliferation of a cancer cell by using a substance specifically binding to a gene capable of serving as an index for carcinogenesis or to an expression product of the gene. Furthermore, a fourth object of the present invention is to provide a novel peptide associated with canceration, and a nucleic acid encoding the peptide. These and other objects of the present invention will be apparent from the following description.

[0014] To summarize the present invention, a first invention of the present invention is an invention pertaining to a method for detecting a cancer cell in a resected specimen, characterized by determining a change in an expression level of a gene selected from genes of which cDNA corresponds to a DNA comprising a nucleotide sequence as shown in any one of SEQ ID NOs: 1 to 16 and 66 to 68 in Sequence Listing, or a DNA capable of hybridizing with a nucleic acid as shown in any one of SEQ ID NOs: 1 to 16 and 66 to 68 in Sequence Listing under stringent conditions by, for example, determining a change of an expression level of mRNA or a change of a protein expression level.

[0015] A second invention of the present invention is an invention pertaining to a kit for detecting cancer by the method for detecting of the present invention, characterized in that the kit comprises as an essential constituent any one of primers for amplifying mRNA as an index for a change in an expression level, a probe capable of hybridizing with the above mRNA, or an antibody recognizing a protein as an index for the change in expression level.

[0016] A third invention of the present invention is a method for controlling proliferation of a cancer cell by using a substance specifically binding to the gene or an expression product thereof, characterized in that cDNA of the gene corresponds to a DNA comprising a nucleotide sequence as shown in any one of SEQ ID NOs: 1 to 16 and 66 to 68 in Sequence Listing, or a DNA capable of hybridizing with DNA as shown in any one of SEQ ID NOs: 1 to 16 and 66 to 68 in Sequence Listing, wherein the method gives transcriptional control of the gene and/or functional control of an expression product thereof, and the like.

[0017] A fourth invention of the present invention is an invention pertaining to a peptide usable for detecting cancer and a nucleic acid encoding the peptide, characterized in that the peptide consists of an amino acid sequence comprising an entire portion of an amino acid sequence as shown in any one of SEQ ID NOs: 17 to 19, 69 and 70 in Sequence Listing or a partial portion thereof and a nucleic acid encoding the peptide.

[0018] A fifth invention of the present invention pertains to an antibody usable for detecting cancer, the antibody recognizing the above peptide of the fourth invention.

[0019] Incidentally, the term "resected specimen" used in the present specification refers to blood, urine, feces, tissue resected by surgery, and the like. On the other hand, the term "cancer-associated gene" refers to a gene in which the expression conditions thereof change with canceration of a cell.

[0020] In order to achieve the objects mentioned above, the present inventors have found cancer-associated genes by comparing the intracellular expression levels of genes between a cancer tissue and a control normal tissue of a cancer patient, and they have found that cancer cells can be detected by comparing the expression levels of these genes. In addition, they have found novel genes among these cancer-associated genes, thereby completing the present invention.

[0021] The terms "cancer tissue" and "control normal tissue" used in the present specification mean, respectively, a tissue constituting a region of cancerous lesion in a multicellular individual, and a tissue constituting a region which is spatially identical to that of the cancer tissue in the same individual but which functions normally.

BRIEF DESCRIPTION OF THE DRAWINGS

[0022] FIG. 1 is an autoradiogram showing electrophoretic patterns of the resulting DNA fragment in a case of detecting cancer-associated genes by DD method.

[0023] FIG. 2 is an autoradiogram obtained by electrophoresing RNA and then hybridizing a labeled probe with a desired mRNA, in a case of detecting a change in an expression level of mRNA of cancer-associated genes by Northern hybridization method.

[0024] FIG. 3 is a picture showing electrophoretic patterns of the resulting DNA fragment in a case of detecting a change of expression of a cancer-associated gene by RT-PCR method.

DETAILED DESCRIPTION OF THE INVENTION

[0025] The present invention will be explained concretely below.

[0026] The first invention of the present invention provides a method for detecting a cancer cell using an expression level of the cancer-associated gene as an index.

[0027] A gene which can serve as an index for canceration is a gene whose expression conditions change with canceration of a cell, namely, a gene of which expression is significantly induced or suppressed. Such a gene can be detected by, for instance, analyzing the copy number of the gene on the genome or a pattern of translocation in chromosomes, and comparing the expression level of a gene product in a normal cell with that in a cancerated cell to identify a gene which differs between the two cells. The gene product includes, for example, mRNA transcribed from the above gene or a protein which is its translational product. In the detection of a cancer-associated gene in the present invention, it is efficient to use as an index the expression level of mRNA, for the analysis of which various methods have been developed with the progress in gene manipulation techniques. Procedures for confirming a change in an expression level of a gene using the expression level of mRNA as an index include the subtractive hybridization method [Zimmermann, C. R. et al., Cell, 21, 709-715 (1989)], the Representational Difference Analysis (RDA) method [Lisitsyn, N. et al., Science, 259, 946-951, (1993)], the molecular index method (Japanese Patent Laid-Open No. Hei 8-322598), the Differential Display (DD) method [Liang, P. and Pardee, A. B., Science, 257, 967-971 (1992)], and the like. Among them, since the procedures of the DD method are simple, the DD method is suitable for screening a gene in the present invention. The method for screening a cancer-associated gene by the DD method utilized in the present invention will be described in detail below.

[0028] First, RNA is extracted individually from a cancer tissue and from a control normal tissue to be compared, and each RNA is treated with DNase to give a crude RNA sample from which genomic DNA has been removed. The mRNA in each crude RNA sample is converted to cDNA by carrying out a reverse transcription reaction with an oligo(dT) anchor primer and a reverse transcriptase (RTase). Thereafter, nucleic acid amplification is carried out by polymerase chain reaction (PCR) with the oligo(dT) anchor primer in combination with various random primers.

[0029] Subsequently, the PCR-amplified products obtained separately from the tissues to be compared are subjected to polyacrylamide gel electrophoresis, with the products amplified by an identical primer pair being run side by side. The band patterns are compared with each other to find a band exhibiting a difference between the normal cell and the cancer cell. This band is cut out from the gel, and the nucleic acid contained in the band is extracted, whereby a DNA fragment which is considered to be complementary to a partial portion of the mRNA for the cancer-associated gene can be obtained.
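The band comparison described above can be sketched as follows. This is a minimal illustration in Python rather than the procedure actually used: bands are represented as hypothetical fragment sizes in base pairs, and the function name `differential_bands` and the matching tolerance are assumptions introduced only for this example.

```python
def differential_bands(normal_bands, cancer_bands, tolerance=2):
    """Return bands seen in only one lane for a single primer pair.

    Bands are approximate fragment sizes in bp. Two bands are treated as
    the same band when their sizes differ by at most `tolerance` bp
    (an illustrative assumption, not a value from the specification).
    """
    def unmatched(bands, others):
        return sorted(
            b for b in bands
            if all(abs(b - o) > tolerance for o in others)
        )

    return {
        "normal_only": unmatched(normal_bands, cancer_bands),
        "cancer_only": unmatched(cancer_bands, normal_bands),
    }


# Hypothetical lanes amplified with the same primer pair.
normal = [150, 212, 340, 498]
cancer = [151, 340, 420, 499]
result = differential_bands(normal, cancer)
```

In this sketch, a band listed under `normal_only` would point to a candidate gene whose expression is reduced by canceration, and a band under `cancer_only` to a candidate whose expression is increased. Either kind of candidate would still need to be confirmed by the methods described in the following paragraphs.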

[0030] Thereafter, it is examined whether a change in the expression level of mRNA for the cancer-associated gene can truly be confirmed with the DNA fragment obtained by the DD method described above. When the expression level of the mRNA in the normal tissue is confirmed to be higher than that in the cancer tissue, it is determined that the cancer-associated gene is a gene of which expression level is reduced owing to canceration. On the other hand, when the expression level of the mRNA in the cancer tissue is confirmed to be higher than that in the normal tissue, it is determined that the cancer-associated gene is a gene of which expression level is increased owing to canceration.

[0031] The expression level of mRNA can be confirmed, for example, by labeling the DNA fragment obtained, subjecting crude RNA samples extracted from each of the cancer tissue and the control normal tissue to Northern hybridization using the labeled DNA fragment as a detection probe, and comparing the observed signal intensities with a densitometer. In other words, the stronger the signal intensity, the higher the expression level of the mRNA is determined to be. For example, a signal intensity can be expressed as a value for the volume of a band [IOD (Integrated Optical Density)] obtained from an autoradiogram, or the like. Here, the higher the IOD value, the higher the expression level of the mRNA corresponding to the band is determined to be.
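As a rough illustration of comparing IOD values between the two tissues, the following Python sketch classifies the direction of change from a pair of band volumes. The function name `classify_expression_change` and the two-fold threshold `min_fold` are hypothetical choices made for the example; the specification does not state a numerical cut-off at this point.

```python
def classify_expression_change(iod_normal, iod_cancer, min_fold=2.0):
    """Compare band volumes (IOD) from normal and cancer tissue lanes.

    `min_fold` is an illustrative threshold, not a value from the text.
    Returns "increased", "reduced" or "unchanged" for the cancer tissue
    relative to the normal tissue.
    """
    if iod_normal <= 0 or iod_cancer <= 0:
        raise ValueError("IOD values must be positive")
    ratio = iod_cancer / iod_normal
    if ratio >= min_fold:
        return "increased"
    if ratio <= 1.0 / min_fold:
        return "reduced"
    return "unchanged"
```

For instance, with hypothetical readings of 1000 for the normal lane and 250 for the cancer lane, the ratio is 0.25 and the sketch reports the expression as reduced in the cancer tissue.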

[0032] When the expression level of mRNA is too low for a change in the expression level to be confirmed by Northern hybridization analysis, the change can also be confirmed by the more sensitive RNase protection assay [Krieg, P. A. and Melton, D. A., Methods in Enzymology, 155, 397-415 (1987)], using as a probe RNA prepared from the amplified DNA fragment obtained by the DD method described above, which is derived from mRNA deduced to be expressed from a cancer-associated gene as a template. This method utilizes an RNase having a substrate specificity such that it cleaves single-stranded RNA but does not cleave double-stranded RNA. Specifically, an excess amount of the probe is added to a crude RNA sample extracted from a normal tissue and to a crude RNA sample derived from a cancer tissue, so that the mRNA to be detected forms a hybrid with the added probe, and each sample is then treated with the RNase having the above substrate specificity. The expression level of the mRNA can be confirmed by determining the amount of the double-stranded RNA remaining after the digestion with the RNase. In other words, the larger the amount of the remaining double-stranded RNA, the higher the expression level of the mRNA is determined to be.

[0033] Alternatively, the nucleotide sequence of the amplified DNA fragment obtained by the DD method described above, which is derived from mRNA deduced to be expressed from a cancer-associated gene as a template, is determined by PCR direct sequencing [Erlich, H. A., PCR Technology, Stockton Press, Published in 1989, 45-60], or by a combination of TA cloning [Mead, D. A. et al., Bio/Technology, 9, 657-663 (1991)] with a usual nucleotide sequencing method. RT-PCR is then carried out with amplification primers designed on the basis of this nucleotide sequence information, and the amounts of the resulting amplified products are compared, whereby the mRNA expression level can be confirmed. In other words, the larger the amount of the resulting amplified product, the higher the expression level of the mRNA is determined to be.

[0034] Incidentally, the amplified DNA fragment obtained by the DD method described above, which is derived from mRNA deduced to be expressed from a cancer-associated gene as a template, is not necessarily cDNA complementary to the entire length of the mRNA for the cancer-associated gene. In order to obtain cDNA for a cancer-associated gene, for example, a cDNA library derived from the tissue used in screening is prepared; the amplified DNA fragment obtained by the DD method described above is labeled; and plaque hybridization is carried out on the library using the labeled DNA fragment as a detection probe, whereby a cDNA clone for the cancer-associated gene can be isolated.

[0035] The present inventors have succeeded in isolating 14 kinds of DNA fragments, each comprising a nucleotide sequence of a partial portion of cDNA for a cancer-associated gene. The genes expressing the mRNAs which correspond to the cDNAs comprising the nucleotide sequences of the DNA fragments thus obtained are named CA11, CA13, CC24, GG24, AG26, GC31, GC32, GC33, GG33, CC34, GC35, GC36, CA42 and CC62, respectively. Table 1 shows the correspondence between the names given to the genes by the present inventors and the SEQ ID NOs in Sequence Listing which show the regions determined to date in the nucleotide sequence of the cDNA for each of the 14 kinds of cancer-associated genes.

TABLE 1

SEQ ID NOs in Sequence Listing
Nucleotide Sequence	Amino Acid Sequence	Name of Gene
1, 66	17, 69	CA11
2	18	CA13
3	-	CC24
4	-	GG24
5	-	AG26
6	-	GC31
7	-	GC32
8	-	GC33
9	-	GG33
10	-	CC34
11, 67	-	GC35
12, 15, 16, 68	70	GC36
13	19	CA42
14	-	CC62

[0036] Here, in Table 1, the nucleotide sequence as shown in SEQ ID NO: 68 comprises the sequences as shown in SEQ ID NOs: 12, 15 and 16. In addition, the amino acid sequence as shown in SEQ ID NO: 70 is a deduced sequence based on the nucleotide sequence as shown in SEQ ID NO: 68.

[0037] The above cancer-associated genes are roughly classified into genes of which expression level is decreased by canceration and genes of which expression level is increased by canceration. The former genes include CA11, AG26, GC35, GC36 and CC62; and the latter genes include CA13, CC24, GG24, GC31, GC32, GC33, GG33, CC34 and CA42.
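For readers who want to work with this classification programmatically, the gene names, the nucleotide-sequence SEQ ID NOs of Table 1 and the direction of change can be held in a small lookup structure. The following Python sketch transcribes only what is stated in the text; the variable and function names are hypothetical.

```python
# Direction of expression change on canceration, as listed in the text.
REDUCED = {"CA11", "AG26", "GC35", "GC36", "CC62"}
INCREASED = {"CA13", "CC24", "GG24", "GC31", "GC32",
             "GC33", "GG33", "CC34", "CA42"}

# Nucleotide-sequence SEQ ID NOs per gene, transcribed from Table 1.
NUCLEOTIDE_SEQ_IDS = {
    "CA11": [1, 66], "CA13": [2], "CC24": [3], "GG24": [4],
    "AG26": [5], "GC31": [6], "GC32": [7], "GC33": [8],
    "GG33": [9], "CC34": [10], "GC35": [11, 67],
    "GC36": [12, 15, 16, 68], "CA42": [13], "CC62": [14],
}


def expected_change(gene):
    """Return "reduced" or "increased" for one of the 14 named genes."""
    if gene in REDUCED:
        return "reduced"
    if gene in INCREASED:
        return "increased"
    raise KeyError(f"unknown gene name: {gene}")
```

A call such as `expected_change("GC36")` returns `"reduced"`, matching the grouping given above.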

[0038] By comparing the expression level of each of the genes obtained as above, cancer cells can be detected. In this case, the cancer-associated gene serving as an index may be appropriately selected from the genes listed above, and it may be used as a single kind, or in combination of several kinds of genes. In addition, the cancer-associated gene serving as an index for detection of a cancer cell is not particularly restricted to the 14 kinds of genes listed above, and the cancer-associated gene may be any gene of which cDNA is DNA capable of hybridizing under stringent conditions with the DNA as shown in any one of SEQ ID NOs: 1 to 16 and 66 to 68 in Sequence Listing, as long as the expression level of the gene is changed owing to canceration of a cell.

[0039] Conditions capable of hybridizing used in the present specification refer to, for example, those capable of hybridizing by a process comprising incubating DNA immobilized on a nylon membrane with a probe at 65°C for 20 hours in a solution containing 6×SSC (wherein 1×SSC is a solution prepared by dissolving 8.76 g of sodium chloride and 4.41 g of sodium citrate in 1 L of water), 1% SDS, 100 µg/ml herring sperm DNA, 0.1% bovine serum albumin, 0.1% polyvinyl pyrrolidone and 0.1% Ficoll.
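The SSC recipe above can be converted to approximate molar concentrations with a short calculation. The Python sketch below assumes that "sodium citrate" means trisodium citrate dihydrate (about 294.10 g/mol), which the text does not state, and uses 58.44 g/mol for sodium chloride; the function name `ssc_molarity` is hypothetical.

```python
# Molar masses in g/mol. The text does not say which citrate salt is
# meant; trisodium citrate dihydrate is assumed here.
NACL_G_PER_MOL = 58.44
TRISODIUM_CITRATE_DIHYDRATE_G_PER_MOL = 294.10


def ssc_molarity(fold):
    """Return (NaCl, citrate) molarity for an n-fold SSC solution.

    1x SSC is taken as 8.76 g NaCl and 4.41 g sodium citrate per litre,
    as given in the specification.
    """
    nacl = fold * 8.76 / NACL_G_PER_MOL
    citrate = fold * 4.41 / TRISODIUM_CITRATE_DIHYDRATE_G_PER_MOL
    return nacl, citrate


nacl_1x, citrate_1x = ssc_molarity(1)
nacl_6x, citrate_6x = ssc_molarity(6)
```

Under these assumptions, 1×SSC works out to roughly 0.15 M sodium chloride and 0.015 M sodium citrate, so the 6×SSC hybridization solution contains roughly 0.9 M sodium chloride and 0.09 M sodium citrate.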

[0040] In fact, there has also been confirmed in the present invention the presence of a gene having the characteristics described above. The nucleotide sequence as shown in SEQ ID NO: 10 in Sequence Listing is present in the nucleotide sequence of cDNA for CC34 gene. DNA as shown in this nucleotide sequence wherein T at base number 935 of the sequence is substituted with A, and 6 bases consisting of the sequence of GTTAAG at a 3'-terminal are deleted has been obtained as a DNA fragment with different amplification levels in the DD method using RNA prepared from a normal tissue and RNA prepared from a cancer tissue. This amplified DNA fragment is capable of hybridizing with DNA as shown in SEQ ID NO: 10 in Sequence Listing. Therefore, a gene expressing mRNA which yields this DNA fragment obtained by the DD method in the present invention is also encompassed in the cancer-associated gene for detecting a cancer cell in the present invention.

[0041] In addition, Northern hybridization using highly purified mRNA shows that there are plural gene transcriptional products capable of hybridizing with GC36 under stringent conditions, and signals corresponding to a band of about 2 kb and a band of about 2.4 to about 2.6 kb are detected in a gastric tissue. In the case of GC35, Northern hybridization carried out in the same manner as for GC36 shows that there are plural gene transcriptional products capable of hybridizing with GC35 under stringent conditions, and signals corresponding to about 1.6 kb; about 3.6 to about 4.0 kb; about 4.5 kb; and about 5.6 to about 6.0 kb are detected in a gastric tissue. It is considered that these mRNAs result from alternative splicing, in which mRNAs with different sizes are produced by splicing different combinations of plural exons of the primary transcript (mRNA precursor) from the same gene. For instance, the nucleotide sequence of cDNA for nCL-4, which encodes a digestive tract-specific calpain, has high homology with the nucleotide sequence of cDNA for the GC36 gene; the nucleotide sequence of cDNA for nCL-4 was clarified after the priority date of the present application [Lee, H.-J. et al., Biological Chemistry, 379, 175-183, 1998]. In addition, since the GC36 gene translation product is identical to nCL-4 except for a substitution of one amino acid and a deletion of the following 26 amino acids in its amino acid sequence, it is suggested that the mRNA deduced to be expressed from the nCL-4 gene and the mRNA deduced to be expressed from the GC36 gene are produced by alternative splicing. Further, in the present invention, it is confirmed that the expression level of the mRNA deduced to be expressed from the nCL-4 gene is reduced by canceration, as is that of the mRNA deduced to be expressed from the GC36 gene. Therefore, the cancer-associated gene usable for detection of cancer cells in the present invention also encompasses mRNAs resulting from alternative splicing, such as the mRNA deduced to be expressed from the nCL-4 gene.

[0042] The determination of whether or not a cell is a cancer cell is carried out by first using a plurality of normal tissues to establish, by a suitable detection method, the normal level of expression of the cancer-associated gene used as an index for canceration; subsequently determining the expression level of the cancer-associated gene in a resected specimen; and comparing it with the normal level. Specifically, in a case where the expression of the cancer-associated gene used as an index is suppressed by canceration, the specimen is determined to be cancer-positive when the expression of this cancer-associated gene cannot be confirmed, or can be confirmed only at a level lower than the normal level, in the resected specimen. Conversely, in a case where the expression of the cancer-associated gene used as an index is increased by canceration, the specimen is determined to be cancer-positive when the expression of this cancer-associated gene is at a level higher than the normal level. In the comparison of the expression level of the cancer-associated gene, there may be employed either the amount of mRNA or the amount of a protein expressed from this gene. Incidentally, the normal level referred to in the present specification can be shown by the following equation based on the expression level of the cancer-associated gene in a plurality of normal tissues obtained by an appropriate detection method.

[Normal Level Value] = [Mean Expression Level of Cancer-Associated Gene in Normal Tissue] ± 2 × [Standard Deviation] (Equation 1)

[0043] This normal level value as calculated encompasses 95% of the normal tissues for which the expression level of the cancer-associated gene is determined.
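As a purely illustrative sketch of Equation 1 and of the comparison described in paragraph [0042], the normal range and the cancer-positive determination can be expressed as follows. The function names, the use of the sample standard deviation, and the numerical values are assumptions made for illustration only and do not appear in the present specification; the coverage of about 95% additionally presupposes that the expression levels in normal tissues are approximately normally distributed.

```python
from statistics import mean, stdev


def normal_level_range(normal_levels):
    """Return (lower, upper) = mean -/+ 2 x standard deviation (Equation 1).

    The sample standard deviation is used here; this choice is an assumption.
    """
    m = mean(normal_levels)
    sd = stdev(normal_levels)
    return m - 2 * sd, m + 2 * sd


def is_cancer_positive(specimen_level, normal_levels, suppressed_by_canceration):
    """Compare the level in a resected specimen with the normal range.

    suppressed_by_canceration=True  -> positive when below the normal range
    suppressed_by_canceration=False -> positive when above the normal range
    """
    lower, upper = normal_level_range(normal_levels)
    if suppressed_by_canceration:
        return specimen_level < lower
    return specimen_level > upper


# Hypothetical expression levels (arbitrary units) measured in normal tissues.
normal = [10.0, 12.0, 11.0, 9.0, 13.0]
lower, upper = normal_level_range(normal)
print(f"normal range: {lower:.2f} to {upper:.2f}")
print(is_cancer_positive(5.0, normal, suppressed_by_canceration=True))
```

In this sketch a specimen whose level lies inside the normal range is never reported as positive, whichever direction of change is used as the index.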

[0044] The detection method utilizing mRNA includes, for example, RT-PCR method, RNase protection assay or Northern hybridization.

[0045] RT-PCR (reverse transcription-polymerase chain reaction) method refers to a method comprising synthesizing cDNA by reverse transcription using mRNA as a template, and thereafter performing nucleic acid amplification by PCR [Kawasaki, E. S. et al., Amplification of RNA. In PCR Protocols, A Guide to Methods and Applications, Academic Press, Inc., San Diego, 21-27 (1991)]. In the present invention, the nucleic acid amplification reaction is not particularly limited, and may be Strand Displacement Amplification (SDA) method [Walker, G. T., Nucleic Acids Res., 20, 1691-1696 (1992)], Nucleic Acid Sequence-Based Amplification (NASBA) method [Compton, J., Nature, 350, 91-92 (1991)], and the like, and their reaction conditions are also not particularly limited. In addition, the amplified region of cDNA for the cancer-associated gene is not necessarily the entire length of the cDNA, and may be a partial region of the cDNA, as long as the confirmation of the amplified products is not hindered. It is preferable that a primer pair used in the nucleic acid amplification reaction is designed so as to specifically amplify only the cDNA. However, as long as the confirmation of amplified products for the region is not hindered, cDNA which is not subject to detection may also be amplified. Incidentally, the term "primer" in the present specification refers to an oligonucleotide capable of acting as an initiation site for DNA synthesis when hybridized with a template nucleic acid at a suitable temperature under conditions allowing initiation of synthesis of a primer extension product by DNA polymerase, namely, in the presence of 4 different kinds of nucleoside triphosphates and DNA polymerase in a suitable buffer (the buffer being determined by pH, ionic strength, cofactors, and the like). Typically, the primer comprises 10 to 30 nucleotides.
For instance, in the case of CA11 gene in the present specification, a combination of the DNAs as shown in SEQ ID NOs: 20 and 21 in Sequence Listing can be exemplified as a primer pair of the former kind, which specifically amplifies only the cDNA to be detected. Hindrance in the confirmation of the amplified products, as used in the present specification, refers, for instance, to a case where, when the confirmation is carried out by subjecting the amplified DNA fragment to agarose gel electrophoresis and thereafter staining the gel with ethidium bromide (EtBr), the amount of the amplified DNA fragment corresponding to mRNA for the cancer-associated gene to be detected cannot be confirmed because a large number of amplified DNA fragments having about the same number of bases are produced by the nucleic acid amplification reaction and the separation of the amplified DNA fragments from each other is incomplete.

[0046] The amount of the amplified DNA can be confirmed by subjecting the nucleic acid amplification reaction mixture to agarose gel electrophoresis, and examining the position and the signal intensity of a band detected with a labeled probe capable of specifically hybridizing with the desired amplified fragment. Accordingly, the higher the signal intensity obtained from a given amount of a crude RNA sample extracted from a resected specimen, the higher the expression level of the cancer-associated gene to be detected is determined to be. The label on the probe is not particularly limited. For example, there can be used a radioactive substance typically exemplified by ³²P, or a fluorescent substance typically exemplified by fluorescein. The signal intensity can, for example, be indicated by the IOD of a band on an autoradiogram or a fluorescent image obtained by the method described above.

[0047] On the other hand, when an amplified product can be obtained in a sufficient amount, the amplified product can be confirmed by subjecting it to agarose gel electrophoresis, staining the gel with EtBr, and examining the position of the amplified DNA fragment and its fluorescent intensity. Accordingly, the higher the fluorescent intensity, the higher the expression level of the cancer-associated gene to be detected is determined to be. It is also possible to determine the expression level of the cancer-associated gene from the IOD of a band on a fluorescent image instead of the fluorescent intensity.

[0048] In order to carry out a more accurate determination, the degree of amplification needs to be expressed numerically. For example, a quantitative PCR method (Japanese Unexamined Patent Publication No. Hei 5-504886) may be applied in the step of nucleic acid amplification reaction, thereby achieving the purpose mentioned above. A typical method includes adding, as an internal standard, a known amount of a nucleic acid which has at both of its terminals the primer nucleotide sequences used in amplification of the desired gene and which differs from the desired gene in internal sequence and size; amplifying by PCR; and deducing the level of the desired gene by comparing the final amplified level of the desired product with the final amplified level of the internal standard. In the present invention, the internal standard is not limited to an externally added standard, and there may also be used cDNA obtained by using as a template mRNA of a gene expressed at the same level in a normal tissue and a cancer tissue. As such cDNA, for example, there can be included cDNA for β-actin gene, which encodes a constituent of the cytoskeleton.
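The comparison with an internal standard described above may be illustrated by the following minimal sketch. It assumes, as a simplification not stated in the present specification, that the desired product and the internal standard are amplified with the same efficiency, so that the ratio of their final signals reflects the ratio of their initial amounts; the function names and the numerical values are hypothetical.

```python
def estimate_target_amount(target_signal, standard_signal, standard_amount):
    """Estimate the initial amount of the desired template.

    target_signal / standard_signal : final band intensities (e.g. IOD values)
    standard_amount                 : known amount of internal standard added
    Assumes equal amplification efficiency for both templates (simplification).
    """
    if standard_signal <= 0:
        raise ValueError("internal standard signal must be positive")
    return standard_amount * target_signal / standard_signal


def normalized_expression(target_signal, reference_signal):
    """Express a target signal relative to a reference gene such as beta-actin."""
    if reference_signal <= 0:
        raise ValueError("reference signal must be positive")
    return target_signal / reference_signal


# Hypothetical band intensities and a hypothetical 1.0e4 copies of standard.
print(estimate_target_amount(300.0, 150.0, 1.0e4))   # externally added standard
print(normalized_expression(80.0, 160.0))            # endogenous reference
```

The second function corresponds to the case in which cDNA for a gene expressed at the same level in normal and cancer tissues serves as the internal standard; the resulting ratio is a relative rather than an absolute quantity.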

[0049] For example, in RT-PCR method using a crude RNA sample extracted from gastric cancer tissue cells, when the synthetic oligonucleotides having the nucleotide sequences of SEQ ID NOs: 20 and 21 in Sequence Listing are used as a primer pair for the nucleic acid amplification reaction, it is possible to amplify only the nucleotide sequence region as shown in base numbers 122 to 487 in SEQ ID NO: 66 in Sequence Listing, within the cDNA nucleotide sequence of CA11 gene in the present specification, as shown in FIG. 3(a).
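As a simple consistency check, the length of the region bounded by base numbers 122 and 487 can be computed and compared with the predicted size of 366 bp given for the CA11 primer pair in Table 2. The helper name below is hypothetical, and the base numbers are assumed to be inclusive at both ends.

```python
def amplicon_length(first_base, last_base):
    """Length in bases of a region given inclusive, 1-based base numbers."""
    if last_base < first_base:
        raise ValueError("last_base must not precede first_base")
    return last_base - first_base + 1


# Region of SEQ ID NO: 66 amplified with the primers of SEQ ID NOs: 20 and 21.
print(amplicon_length(122, 487))
```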

[0050] The expression level of the cancer-associated gene can be determined by RNase protection assay by adding, to a given amount of a crude RNA sample extracted from a resected specimen, an excess amount of an RNA probe capable of specifically hybridizing with mRNA for the cancer-associated gene to be detected or a part thereof, and quantifying the RNA remaining after digestion with RNase. In other words, the larger the amount of the remaining RNA, the higher the expression level of the cancer-associated gene is determined to be.

[0051] Incidentally, a probe used in this method is not particularly limited, as long as it is RNA capable of hybridizing in hybridization buffer, for example, comprising 80% formamide, 40 mM Pipes (pH 6.4), 400 mM NaCl and 1 mM EDTA at 45°C for 20 hours, and having a nucleotide sequence complementary to a nucleotide sequence specific to mRNA for the cancer-associated gene to be detected. In addition, the label on this probe is not particularly limited, and there may, for example, be used a radioactive substance typically exemplified by ³²P, or a fluorescent substance typically exemplified by fluorescein.

[0052] The expression level of the cancer-associated gene can be determined by Northern hybridization by fractionating a given amount of a crude RNA sample extracted from a sample tissue based on the molecular weight; immobilizing the fractionated RNA on a nylon filter or the like; bringing the mRNA for the cancer-associated gene to be detected into contact with an excess amount of a probe for detecting this gene; and determining the signal intensity obtained from the probe hybridizing with the immobilized RNA. In other words, the higher the signal intensity, the higher the expression level of the cancer-associated gene is determined to be.

[0053] Incidentally, the term "hybridizing" used in the method refers, for example, to hybridizing by a process comprising incubating at 42°C for 20 hours in hybridization buffer containing 50% formamide, 0.65 M NaCl, 0.1 M sodium-Pipes, 5× Denhardt's reagent, 0.1% SDS and 5 mM EDTA. The detection probe is preferably a nucleic acid having a nucleotide sequence complementary to a nucleotide sequence which is specific to the mRNA of the cancer-associated gene to be detected. The nucleic acid is not particularly limited, even if its nucleotide sequence is such that signals are obtained at several positions in the detection of RNA, as long as the mRNA to be detected can be identified by the location of the signals. Labelling of the above probe is not particularly limited, and there can be used, for example, radioactive substances typically exemplified by ³²P, as well as fluorescent substances typically exemplified by fluorescein.

[0054] FIG. 2 shows one example of the change in the expression level of mRNA for a cancer-associated gene detected by Northern hybridization method. This figure is a photograph of an autoradiogram obtained by subjecting each of the RNAs obtained from a cancer tissue and a control normal tissue to electrophoresis individually, and hybridizing with a labeled probe for detecting mRNA for CA11 gene in the present specification.

[0055] In addition, when the change in the expression level of a cancer-associated gene is confirmed using a protein as an index, the confirmation may be made based on the biological activity of the protein, and the detection using an antibody against the protein is preferred for its simplicity in the present invention.

[0056] The antibody in the present invention is an antibody capable of specifically binding to a protein encoded by the cancer-associated gene. Therefore, the larger the amount of the antibody bound to a given amount of a crude protein extracted from a resected specimen, the higher the expression level of the cancer-associated gene is determined to be.

[0057] The protein used as an antigen for obtaining the antibody described above can be obtained by purification from cancer cells expressing the gene, or it can also be obtained by a gene engineering technique. For example, a nucleic acid encoding the protein can be obtained by the method described above, in which the DD method is combined with the screening of cDNA libraries prepared from cells expressing the desired protein. The desired protein can be obtained by incorporating the cDNA obtained into an appropriate expression vector, and expressing it in an appropriate host. Further, this protein may be expressed as a fusion protein. For example, an appropriate peptide chain derived from another protein is added to the N-terminal or C-terminal of the desired protein in order to increase its expression level, and the resulting fusion protein is expressed; by using a carrier having an affinity with this peptide chain, the desired protein can be purified readily.

[0058] In addition, the antigen for obtaining an antibody need not necessarily be the entire molecule of the protein, and the antigen may be a peptide having an amino acid sequence region which is specific to the protein and which can be recognized by the antibody.

[0059] As for the method for obtaining an antibody, the antibody can, for example, be obtained as an anti-serum by immunizing an animal with a peptide together with an adjuvant by a usual method. Alternatively, it can be obtained as a monoclonal antibody according to the method of Galfre, G. et al. [Galfre, G. et al., Nature, 266, 550-552 (1977)].

[0060] An example of a method for detecting a protein using an antibody includes Western blotting method.

[0061] In this method, detection with a specific antibody can be carried out by treating cells with a detergent to dissolve intracellular proteins; separating the proteins by SDS-polyacrylamide gel electrophoresis; transferring the resulting proteins onto a nitrocellulose membrane, and the like. The antibody bound to a protein can secondarily be detected with, for instance, a ¹²⁵I-labeled protein A, a peroxidase-linked anti-IgG antibody, and the like.

[0062] The second invention of the present invention provides a kit for detecting a cancer cell. In other words, there can be provided a kit for detecting a cancer cell by utilizing the method for detecting a cancer cell, which is the first invention of the present invention. Concretely, there can be exemplified a kit for detecting the change in the expression level of a cancer-associated gene within the cells using as an index an amount of mRNA or an amount of a protein which is expressed by this gene.

[0063] In the case of a kit for detecting a cancer cell using as an index an expression level of mRNA by using the detection method with the nucleic acid amplification described above in connection with the method for detecting a cancer cell, a primer pair is an essential constituent, where the primer pair has the characteristics described above in connection with the method for detecting a cancer cell wherein the primer pair is capable of detecting mRNA of which cDNA is DNA as shown in any one of SEQ ID NOs: 1 to 16 and 66 to 68 in Sequence Listing, or DNA capable of hybridizing under stringent conditions with DNA as shown in a nucleotide sequence comprising the nucleotide sequence as shown in any one of SEQ ID NOs: 1 to 16 and 66 to 68 in Sequence Listing. For example, the kit in the present invention utilizing RT-PCR as a detection method may comprise in addition to the primer pair described above reverse transcriptase, dNTPs and a thermostable DNA polymerase. Incidentally, the kinds and the number of the cancer-associated genes to be detected by this kit are not particularly limited. Therefore, the primer pair constituting this kit is not particularly limited, and it may be selected appropriately depending upon the kinds and the number of the cancer-associated genes to be detected.

[0064] One example of primer pairs, each of which uses cDNA for a cancer-associated gene of the present invention as a template and specifically amplifies only a part of its region, is shown in Table 2. For each primer pair in the table, a symbol consisting of a combination of a letter and numerals indicates the name of the primer in the present invention, and the number within the parentheses attached to each symbol indicates the SEQ ID NO: in Sequence Listing showing the nucleotide sequence of the primer. Incidentally, β-actin shown in Table 2 is a gene selected as an internal standard for the purpose of quantifying mRNA for the cancer-associated gene in a crude RNA sample extracted from a resected specimen.

TABLE 2

Target Gene	Primer Pair	Predicted Size of Amplified DNA
CA11	F1 (20), R1 (21)	366 bp
CA13	F2 (22), R2 (23)	168 bp
CC24	F3 (24), R3 (25)	259 bp
GG24	F4 (26), R4 (27)	384 bp
AG26	F5 (28), R5 (29)	389 bp
GC31	F6 (30), R6 (31)	213 bp
GC32	F7 (32), R7 (33)	251 bp
GC33	F8 (34), R8 (35)	563 bp
GG33	F9 (36), R9 (37)	218 bp
CC34	F10 (38), R10 (39)	241 bp
GC35	F11 (40), R11 (41)	157 bp
GC36	F12 (42), R12 (43)	95 bp
CA42	F13 (44), R13 (45)	245 bp
CC62	F14 (46), R14 (47)	134 bp
β-Actin	F15 (48), R15 (49)	264 bp

[0065] On the other hand, in the case of a kit for detecting a cancer cell using as an index mRNA by using a detection method employing RNase protection assay or Northern hybridization method, it is an essential requirement for a constituent to have a probe which has the characteristics described above in connection with the method for detecting a cancer and is capable of detecting mRNA of a cancer-associated gene, of which cDNA is DNA comprising the nucleotide sequence as shown in any one of SEQ ID NOs: 1 to 16 and 66 to 68 in Sequence Listing, or DNA capable of hybridizing under stringent conditions with DNA as shown in any one of SEQ ID NOs: 1 to 16 and 66 to 68. For example, in the case of a kit utilizing RNase protection assay, the kit may comprise, in addition to the probe described above, RNase, a concentrated reaction mixture for RNase, and the like. The kinds and the number of the cancer-associated genes to be detected by this kit are not particularly limited. Therefore, a probe constituting this kit is not particularly limited, and it may be selected appropriately depending on the kinds and the number of the cancer-associated genes to be detected.

[0066] On the other hand, in the case of a kit for detecting a cancer cell using a protein as an index by using the detection method with an antibody, it is an essential constituent to have an antibody which has the characteristics described above in connection with the method for detecting a cancer cell and is capable of binding individually and specifically to a peptide encoded by DNA as shown in any one of SEQ ID NOs: 1 to 16 and 66 to 68 in Sequence Listing, or DNA as shown in a nucleotide sequence comprising a nucleotide sequence of DNA capable of hybridizing under stringent conditions with DNA as shown in a nucleotide sequence comprising the nucleotide sequence as shown in any one of SEQ ID NOs: 1 to 16 and 66 to 68. The kinds and the number of the cancer-associated genes to be detected by this kit are not particularly limited. Therefore, the antibody constituting this kit is not particularly limited, and it may be selected appropriately depending upon the kinds and the number of the cancer-associated genes to be detected.

[0067] By using such a kit, a cancer cell can be detected more simply. Therefore, it is possible to diagnose a cancer based on the expression level of a cancer-associated gene determined by using such a kit. In other words, humans in whom the presence of cancer cells is confirmed by the method for detecting a cancer cell using this kit can be determined to be cancer-positive.

[0068] The third invention of the present invention is a method for controlling proliferation of a cancer cell using a substance specifically binding to a cancer-associated gene or an expression product thereof. The specific binding substance referred to in the present specification can, for example, include nucleic acids, antibodies, cytotoxic T lymphocytes (CTL), and the like.

[0069] For example, bcr-abl chimeric protein detected frequently in chronic myelocytic leukemia has a high tyrosine kinase activity and plays an important role in the onset and the proliferation of the leukemia. An antisense oligonucleotide against a gene encoding this chimeric protein can serve to suppress in vivo the proliferation of a tumor expressing this gene (Skorski, T., Proc. Natl. Acad. Sci. USA 91, 4504, 1994). On the other hand, a cancer-specific peptide of a protein expressed specifically in a cancer cell has been conventionally known to be a target of the T cell immune response to a cancer cell, and T cells reactive with this fusion protein are obtained by immunization with a peptide from the vicinity of the fusion site of the fusion protein (Chen, W., Proc. Natl. Acad. Sci. USA 89, 1468, 1992). This can, for example, be carried out utilizing the techniques described in the following reports. Concretely, CD4+ T cells which react specifically with a ras peptide in which the glycine at the 12th amino acid is substituted with another amino acid, and which are HLA-DR restricted, are isolated from human T cells (Jung, S., J. Exp. Med., 173, 273, 1991); and a CTL against a peptide consisting of 8 amino acids including the mutation site can be induced from a mouse immunized with a recombinant vaccinia virus capable of producing a ras protein having a mutation at the 61st amino acid (Skipper, J., J. Exp. Med., 177, 1493, 1993). Further, in a mouse immunized with a solubilized mutated ras protein prepared by gene recombination, the in vivo proliferation of cancer cells having the same mutation is suppressed (Fenton, R. G., J. Natl. Cancer Inst., 85, 1294, 1993); and from spleen cells sensitized with a mutated ras peptide, a CTL exhibiting a cytotoxic activity on cancer cells expressing the same mutated ras is obtained (Peace, D. J., J. Exp. Med., 179, 473, 1994).

[0070] Therefore, as to a gene found to be associated with canceration of cells in the present invention, it is possible to control the cell proliferation by using an antisense oligonucleotide in the same manner. In addition, if there can be obtained T cells reactive with a protein encoded by a gene whose expression level is considered to be increased owing to canceration, it is possible to suppress the proliferation of cells in which the protein is expressed at a high level.

[0071] The fourth invention of the present invention provides a novel peptide usable for the detection of cancer, and a nucleic acid encoding the above peptide. Among the cancer-associated genes elucidated by the present inventors, the genes other than CA11, CA13, GG33, GC35, GC36 and CA42 have been found, by homology search of a database in which nucleotide sequence information is recorded, to be genes which have already been isolated and identified. Specifically, CC24 corresponds to cytochrome c oxidase subunit I gene [Horai, S. et al., Proc. Natl. Acad. Sci. USA 92, 532-536 (1995)]; AG26 corresponds to p190-B gene [Burbelo, P. D. et al., J. Biol. Chem. 270, 30919-30926 (1995)]; GC31 corresponds to cytochrome c oxidase subunit II gene [Power, M. D. et al., Nucleic Acids Res. 17, 6734 (1989)]; GC32 corresponds to cytochrome b gene [Anderson, S. et al., Nature 290, 457-465 (1981)]; GC33 corresponds to integrin α6 subunit gene [Tamura, R. N. et al., Journal of Cell Biology, 111, 1593-1604 (1990)]; GG24 corresponds to F1-ATPase β subunit gene [Ohta, S. et al., The Journal of Biochemistry, 99, 135-141 (1986)]; and CC62 corresponds to lactoferrin gene [Rey, M. W. et al., Nucleic Acids Res. 18, 5288 (1990)]. On the other hand, CC34 cDNA clone differs by 7 bases from a partial region of the cDNA nucleotide sequence encoding 16S rRNA [Horai, S. et al., Proc. Natl. Acad. Sci. USA 92, 532-536 (1995)]. Incidentally, the association of these genes with carcinogenesis has not been known.

[0072] On the other hand, as to each of the genes of CA11, CA13, GG33, GC35 and CA42, no reports have been yet made with regard to the nucleotide sequence, the sequence identical to the amino acid sequence encoded therein or the sequence having a homology therewith in the region analyzed herein in each of cDNAs for the genes. As a result of additional analysis, it is clarified that a nucleotide sequence of cDNA for GC36 gene has homology with a nucleotide sequence of cDNA for nCL-4 as mentioned above. Here, the cDNA for nCL-4 has a nucleotide sequence, wherein 78 bp of bases are inserted between base numbers 956 and 957 of SEQ ID NO: 68 in Sequence Listing, and 241 bp at 3'-terminal of bases are deleted. Namely, GC36 cDNA sequence is different from nCL-4 cDNA sequence. In other words, in the nucleotide sequence of each of cDNAs for the genes of CA11, CA13, GG33, GC35, GC36 and CA42, a nucleic acid having the nucleotide sequence clarified in the present invention is a novel nucleic acid isolated for the first time by the present inventors.

[0073] As shown in Table 1, the peptides encoded by the novel nucleic acids of the present invention comprising the nucleotide sequences as shown in SEQ ID NOs: 66, 2, 13 and 68 in Sequence Listing are deduced, based on these nucleotide sequences, to comprise the amino acid sequences as shown in SEQ ID NOs: 69, 18, 19 and 70 in Sequence Listing, respectively, without being limited thereto. Specifically, there also are encompassed [1] a peptide comprising an entire portion of the amino acid sequence as shown in any one of SEQ ID NOs: 17 to 19, 69 and 70 in Sequence Listing, or a partial portion thereof; and [2] a peptide resulting from addition, deletion or substitution of one or more amino acids in the amino acid sequence as shown in any one of SEQ ID NOs: 17 to 19, 69 and 70 in Sequence Listing, and having a change in the expression level owing to canceration of cells, because of the reasons described below.

[0074] In a naturally-occurring protein, mutations such as deletion, insertion, addition and substitution of amino acids can take place in its amino acid sequence as a result of a polymorphism or a mutation in the gene encoding it, as well as of a modification in vivo or in a purification step after its production. Nevertheless, when such a mutation is present in a region which is not important for preserving the activities and the structure of the protein, such a protein has been known to exhibit physiological and biological activities of substantially the same level as those of the protein without the mutation.

[0075] In addition, the same can be said for the case where the mutations described above are artificially introduced into the amino acid sequence of the protein, in which case a still wider variety of mutants can be prepared. For instance, it has also been known that a polypeptide resulting from substitution of a particular cysteine residue with serine in the amino acid sequence of human interleukin 2 (IL-2) retains IL-2 activity [Wang, A. et al., Science, 224, 1431-1433 (1984)]. Therefore, proteins are encompassed within the scope of the present invention, as long as no difference in the change in an expression level owing to canceration is found, even if the protein has an amino acid sequence which results from deletion, insertion, addition or substitution of one or several amino acid residues in an amino acid sequence disclosed by the present invention.

[0076] Further, certain kinds of proteins have been known to have a peptide region which is unessential for its activity. Examples are signal peptide present in a protein secreted extracellularly, and a pro-sequence found in a precursor of a protease, or the like, and almost all of these regions are removed after translation or when converted into an active protein. Such proteins are present in the form of different primary structures, but the proteins exhibit equivalent functions eventually.

[0077] When a protein is produced by a gene engineering technique, a peptide chain irrelevant to the activity of a desired protein may be added to the amino terminal or carboxyl terminal of the protein. For example, in order to increase the expression level of a desired protein, a fusion protein may be prepared by adding, to the amino terminal of the desired protein, a part of the amino terminal region of a protein highly expressed in the host used. Alternatively, in order to facilitate the purification of the protein expressed, a peptide having an affinity with a particular substance may be added to the amino terminal or carboxyl terminal of a desired protein. These added peptides may remain in an added state when there is no adverse effect on the activity of the desired protein, or the added peptides may be removed from the desired protein, if necessary, by means of an appropriate treatment such as a limited degradation with a protease.

[0078] Even a protein which has, or to which has been added, a peptide unessential for its function is also encompassed within the scope of the protein of the present invention, as long as it can exhibit an equivalent function. The term "peptide" in the present specification refers to two or more amino acids bound to each other via peptide bonds, and is intended to encompass those referred to as "protein."

[0079] A partial portion of the novel nucleic acid in the present invention consists of a nucleic acid encoding a peptide having the amino acid sequence as shown in any one of SEQ ID NOs: 17 to 19, 69 and 70 in Sequence Listing, wherein its nucleotide sequence includes those as shown in Table 1, for instance, the nucleotide sequence as shown in any one of SEQ ID NOs: 1, 2, 13, 66 and 68 in Sequence Listing. In other words, the peptide having the amino acid sequence as shown in SEQ ID NO: 17 in Sequence Listing is encoded by the base numbers 2 to 598 of the nucleotide sequence as shown in SEQ ID NO: 1 in Sequence Listing; the peptide having the amino acid sequence as shown in SEQ ID NO: 69 in Sequence Listing is encoded by the base numbers 64 to 660 of the nucleotide sequence as shown in SEQ ID NO: 66 in Sequence Listing; the peptide having the amino acid sequence as shown in SEQ ID NO: 18 in Sequence Listing is encoded by the base numbers 1698 to 1850 of the nucleotide sequence as shown in SEQ ID NO: 2 in Sequence Listing; the peptide having the amino acid sequence as shown in SEQ ID NO: 70 in Sequence Listing is encoded by the base numbers 83 to 2074 of the nucleotide sequence as shown in SEQ ID NO: 68 in Sequence Listing; and the peptide having the amino acid sequence as shown in SEQ ID NO: 19 in Sequence Listing is encoded by the base numbers 8 to 196 of the nucleotide sequence as shown in SEQ ID NO: 13 in Sequence Listing, but the nucleic acids encoding the novel peptide in the present invention are not limited thereto.
Specifically, there are also encompassed within the present invention 1) a nucleic acid encoding a peptide usable for detection of a cancer cell, wherein the peptide comprises an entire sequence of the amino acid sequence as shown in any one of SEQ ID NOs: 17 to 19, 69 and 70 in Sequence Listing, or a partial sequence thereof; 2) a nucleic acid encoding a peptide capable of changing its expression level owing to canceration of a cell, wherein the nucleic acid is capable of hybridizing with the novel nucleic acid of the present invention under stringent conditions; 3) a nucleic acid encoding a peptide usable for detection of a cancer cell by the change in its expression level, wherein one or more amino acids are added, deleted or substituted in the amino acid sequence as shown in any one of SEQ ID NOs: 17 to 19, 69 and 70 in Sequence Listing, and the like.
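The coding regions listed in paragraph [0079] can be checked arithmetically, as in the following sketch, which confirms that each region spans a whole number of codons. The dictionary layout and function name are hypothetical, the base numbers are taken as inclusive, and whether a termination codon falls within each stated region is not specified in the text, so the counts below are codon counts of the stated regions rather than confirmed peptide lengths.

```python
# (peptide SEQ ID NO) -> (nucleotide SEQ ID NO, first base, last base),
# as stated in paragraph [0079].
CODING_REGIONS = {
    17: (1, 2, 598),
    69: (66, 64, 660),
    18: (2, 1698, 1850),
    70: (68, 83, 2074),
    19: (13, 8, 196),
}


def codon_count(first_base, last_base):
    """Number of codons in a region given inclusive, 1-based base numbers."""
    length = last_base - first_base + 1
    if length % 3 != 0:
        raise ValueError(f"region of {length} bases is not a multiple of 3")
    return length // 3


for peptide_id, (nucleotide_id, first, last) in CODING_REGIONS.items():
    print(f"SEQ ID NO: {peptide_id} <- SEQ ID NO: {nucleotide_id} "
          f"bases {first}-{last}: {codon_count(first, last)} codons")
```

The regions in SEQ ID NOs: 1 and 66 give the same codon count, which is consistent with the statement in paragraph [0082] that the nucleotide sequence of SEQ ID NO: 66 comprises that of SEQ ID NO: 1.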

[0080] The term "nucleic acid encoding an amino acid sequence" as used in the present specification will now be described. It is known that 1 to 6 kinds of codons (triplet base combinations) exist for each amino acid designated on a gene. Therefore, a large number of nucleic acids can encode a given amino acid sequence, the number depending on the amino acid sequence. In nature, a gene does not exist in a stable form, and mutations of its nucleotide sequence are not rare. A mutation on the gene may not affect the amino acid sequence to be encoded (a so-called "silent mutation"), in which case it can be said that a different nucleic acid encoding the same amino acid sequence has been produced. Therefore, the possibility cannot be denied that, even when a nucleic acid encoding a particular amino acid sequence is isolated, a variety of nucleic acids encoding the same amino acid sequence are produced as the organism containing them passes through generations. Moreover, it is not difficult to artificially produce a variety of nucleic acids encoding the same amino acid sequence by means of various genetic engineering techniques. For example, when a codon used in a natural nucleic acid encoding the desired protein is rarely used in the host employed for producing the protein by genetic engineering, the expression level of the protein is sometimes insufficient. In such a case, high expression of the desired protein is achieved by artificially converting the codon into another one commonly used in the host without changing the amino acid sequence encoded (for example, Japanese Examined Patent Publication No. Hei 7-102146). It is thus of course possible to artificially produce a variety of nucleic acids encoding a particular amino acid sequence, and such nucleic acids can also be produced in nature. Therefore, the present invention includes a nucleic acid as long as the nucleic acid encodes an amino acid sequence disclosed in the present specification, even if its nucleotide sequence is not the same as a nucleotide sequence disclosed in the present specification.
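The degeneracy described above can be illustrated with a minimal Python sketch. It assumes the standard genetic code (NCBI translation table 1); the function names `codon_counts` and `count_encodings` are hypothetical and are not part of the specification.

```python
from itertools import product
from math import prod

BASES = "TCAG"
# Standard genetic code (assumed: NCBI translation table 1), with codons
# enumerated in TCAG order: TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_counts():
    """Return the number of codons designating each amino acid (stops excluded)."""
    counts = {}
    for aa in CODON_TABLE.values():
        if aa != "*":
            counts[aa] = counts.get(aa, 0) + 1
    return counts


def count_encodings(peptide):
    """Return how many distinct nucleotide sequences encode the given peptide."""
    counts = codon_counts()
    return prod(counts[aa] for aa in peptide)
```

For instance, `count_encodings("LR")` multiplies the six codons for leucine by the six codons for arginine, so even a dipeptide can be encoded by many different nucleic acids.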

[0081] In fact, in the present invention, nucleic acids of which the nucleotide sequences are slightly different but the encoded amino acid sequence is identical are obtained. With regard to the nucleotide sequence as shown in SEQ ID NO: 2 in Sequence Listing, which is contained in the nucleotide sequence of the cDNA for CA13 gene, there are obtained, in addition to a nucleic acid in which R at base number 1784 is A and K at base number 1985 is T, a cDNA in which R at base number 1784 is G and K at base number 1985 is T, and a nucleic acid in which R at base number 1784 is A and K at base number 1985 is G. However, the differences of the nucleotide sequence at these two sites do not affect the amino acid sequence encoded in base numbers 1698 to 1850 in the nucleotide sequence as shown in SEQ ID NO: 2 in Sequence Listing, and each peptide encoded by the above three kinds of nucleic acids has the amino acid sequence as shown in SEQ ID NO: 18 in Sequence Listing.
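The silent substitution at base number 1784 can be checked with a short Python sketch. It assumes the standard genetic code, a reading frame beginning at base number 1698 as stated above, and that bases 1782 to 1784 read "agr" in SEQ ID NO: 2 as printed in the Sequence Listing; the helper names `expand` and `codon_start` are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (assumed: NCBI translation table 1), codons in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
# IUPAC ambiguity codes used in SEQ ID NO: 2 (R = A or G, K = G or T).
IUPAC = {"R": "AG", "K": "GT"}


def expand(codon):
    """Return every concrete codon represented by an IUPAC-ambiguous codon."""
    choices = [IUPAC.get(base, base) for base in codon.upper()]
    return ["".join(bases) for bases in product(*choices)]


def codon_start(position, orf_start):
    """Return the 1-based first base of the codon containing `position`."""
    return position - (position - orf_start) % 3


start = codon_start(1784, 1698)      # base 1784 is the third base of its codon
variants = expand("agr")             # bases 1782-1784 as read from SEQ ID NO: 2
encoded = {CODON_TABLE[c] for c in variants}  # one-letter amino acid codes
```

Under these assumptions both variants, AGA and AGG, designate arginine, and base number 1985 lies outside the region of base numbers 1698 to 1850, which is consistent with the statement that neither site changes the encoded amino acid sequence.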

[0082] Among the cDNAs for novel genes of the present invention, cDNA for CA11 gene has the nucleotide sequences as shown in SEQ ID NOs: 1 and 66; cDNA for CA13 gene has the nucleotide sequence as shown in SEQ ID NO: 2; cDNA for GG33 gene has the nucleotide sequence as shown in SEQ ID NO: 9; cDNA for GC35 gene has the nucleotide sequences as shown in SEQ ID NOs: 11 and 67; cDNA for GC36 gene has the nucleotide sequences as shown in SEQ ID NOs: 12, 15, 16 and 68; and cDNA for CA42 gene has the nucleotide sequence as shown in SEQ ID NO: 13. Here, the nucleotide sequence as shown in SEQ ID NO: 66 comprises the nucleotide sequence as shown in SEQ ID NO: 1; the nucleotide sequence as shown in SEQ ID NO: 67 comprises the nucleotide sequence as shown in SEQ ID NO: 11; and the nucleotide sequence as shown in SEQ ID NO: 68 comprises the nucleotide sequences as shown in SEQ ID NOs: 12, 15 and 16.

[0083] Moreover, the novel nucleic acids of the present invention include a nucleic acid capable of hybridizing with the nucleic acid having the nucleotide sequences as shown in any one of SEQ ID NOs: 66, 2, 9, 67, 13 as well as 68 in Sequence Listing under stringent conditions, wherein the nucleic acid is complementary to a nucleotide sequence for mRNA capable of changing an expression level by canceration. In fact, the nucleic acid having the above properties is obtained in the present invention. For instance, there are obtained the above nucleic acid of which nucleotide sequence is slightly different but an encoded amino acid sequence is identical.

[0084] In addition, the fifth invention of the present invention provides an antibody against the peptide encoded by the novel nucleic acid in the present invention. The above antibody can be utilized for detection of the cancer cell described above.

EXAMPLES

[0085] The present invention will be described more concretely hereinbelow by means of the working examples, without intending to restrict the scope of the present invention thereto.

Example 1

Analysis of Cancer-Associated Gene

[0086] 1) Confirmation of mRNA Which can Serve as Index for Detecting Cancer

[0087] Whether or not an mRNA of which the expression level was changed by canceration was present was confirmed by the DD method, which comprises comparing the expression of mRNA in a cancerated lesion tissue with that in a control normal tissue of a stomach, as detailed below.

[0088] First, from each of a cancer tissue and a control normal tissue of a stomach excised from a patient with an advanced, poorly-differentiated adenocarcinoma, RNA was extracted with TRIzol™ reagent (manufactured by Gibco BRL) to obtain a crude RNA sample. A 50 μg portion of the crude RNA sample thus obtained was reacted with 10 units of DNaseI (manufactured by Takara Shuzo Co., Ltd.) at 37°C for 30 minutes in the presence of 5 mM MgCl₂ as a final concentration and 20 units of RNase inhibitor (manufactured by Takara Shuzo Co., Ltd.) to remove genomic DNA. Using this RNA, RT-PCR was carried out with Differential Display™ Kit (manufactured by Display Systems) and Enzyme Set-DD (manufactured by Takara Shuzo Co., Ltd.) in accordance with the procedures described in the instructions attached to the kit.

[0089] Specifically, each reverse transcription reaction was carried out by mixing 200 ng of the DNase-treated crude RNA sample described above with any one kind of the oligonucleotides having the nucleotide sequences as shown in SEQ ID NOs: 56 to 64 in Sequence Listing as a primer, heat-treating the mixture at 70°C for 10 minutes, rapidly cooling it, and subsequently reacting it with AMV reverse transcriptase at 55°C for 30 minutes. The other downstream primers were individually reacted in the same manner to prepare 9 kinds of single-stranded cDNA samples in total.

[0090] In the subsequent nucleic acid amplification reaction by PCR, a nucleic acid amplification was carried out by PCR using each of the 9 kinds of single-stranded cDNAs described above as a template, an oligo(dT) primer identical to that used in the reverse transcription as a downstream primer, and any one kind of the 10mer-oligonucleotides in the kit which had the nucleotide sequences as shown in SEQ ID NOs: 50 to 55 in Sequence Listing as an upstream primer, to prepare 54 kinds of amplified DNA samples in total.
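The count of 54 amplified DNA samples follows from pairing each of the 9 downstream primers with each of the 6 upstream primers. A minimal Python sketch of that enumeration, identifying each primer only by its SEQ ID NO (the variable names are illustrative assumptions), is:

```python
from itertools import product

# Downstream oligo(dT) primers: SEQ ID NOs: 56 to 64 (9 kinds).
downstream = list(range(56, 65))
# Upstream 10mer primers: SEQ ID NOs: 50 to 55 (6 kinds).
upstream = list(range(50, 56))

# One PCR per (downstream, upstream) pair; each single-stranded cDNA is
# amplified with the same downstream primer used for its reverse transcription.
reactions = list(product(downstream, upstream))
```

Here `len(reactions)` is 54, and the pair `(59, 50)` corresponds to the combination of SEQ ID NO: 59 and SEQ ID NO: 50.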

[0091] The PCR was carried out by adding 3 mM MgCl₂, 15 μM each of dATP, dGTP, dCTP and dTTP as substrates, and 1.85 kBq/ml [α-³³P]-dATP (manufactured by Amersham) as a labelling compound, and reacting for 40 cycles, wherein one cycle consists of 94°C for 30 seconds, 40°C for 60 seconds and 72°C for 60 seconds. After termination of the reaction, an equivolume of 95% formamide was added, and the mixture was subjected to thermal denaturation at 90°C for 2 minutes to obtain a sample for electrophoresis. The electrophoresis was carried out on a 7 M urea-denatured 5% polyacrylamide gel, and autoradiography yielded a fingerprint comprising a large number of bands, wherein there were found to be bands having different signal intensities between the autoradiogram of the cancer tissue and that of the control normal tissue.

[0092] As one example, the results where D4 having the nucleotide sequence as shown in SEQ ID NO: 59 in Sequence Listing was used as a downstream primer, and U1 having the nucleotide sequence as shown in SEQ ID NO: 50 was used as an upstream primer are shown in FIG. 1. Specifically, FIG. 1 is a reproduced photograph of an autoradiogram showing electrophoretic patterns of the DNA fragments obtained when a cancer-associated gene was detected by the DD method. Here, in FIG. 1, 1N is a lane in which an amplified DNA fragment, obtained by using as a template a crude RNA sample from a normal tissue of a patient with a poorly-differentiated adenocarcinoma-type gastric cancer, was electrophoresed on an acrylamide gel; and 1T is a lane in which an amplified DNA fragment, obtained by using as a template a crude RNA sample from a cancer tissue of the same patient, was electrophoresed on an acrylamide gel. A band having a stronger signal intensity in the autoradiogram obtained from the control normal tissue than in the autoradiogram of the cancer tissue sample was found at the position corresponding to about 750 bp as indicated with "→" in FIG. 1. The present inventors named the gene expressing the mRNA which causes the band to show this difference in intensity CA11.

[0093] Table 3 shows, for each of the genes which were detected by the present inventors with the DD method and named, the combination of the upstream and downstream primers for detecting the difference in the expression level of its mRNA by means of the DD method, the approximate size of the amplified DNA fragment, and the difference in the level of the amplified DNA obtained by RT-PCR from the cancer tissue and the control normal tissue. In the column of the primers in Table 3, a symbol combining a letter and a numeral indicates the name of a primer, and the number within parentheses attached to each symbol indicates the SEQ ID NO showing the nucleotide sequence of the primer in Sequence Listing.

TABLE 3

Name of   Primer Pair              Approximate Size of      Difference in Amount
Gene      Upstream    Downstream   Amplified DNA Fragment   of DNA Fragment
CA11      U1 (50)     D4 (59)      750 bp                   Cancer Tissue < Normal Tissue
CA13      U1 (50)     D4 (59)      620 bp                   Cancer Tissue > Normal Tissue
CC24      U2 (51)     D5 (60)      800 bp                   Cancer Tissue > Normal Tissue
GG24      U2 (51)     D9 (61)      480 bp                   Cancer Tissue > Normal Tissue
AG26      U2 (51)     D3 (58)      550 bp                   Cancer Tissue < Normal Tissue
GC31      U3 (52)     D8 (63)      440 bp                   Cancer Tissue > Normal Tissue
GC32      U3 (52)     D8 (63)      310 bp                   Cancer Tissue > Normal Tissue
GC33      U3 (52)     D8 (63)      300 bp                   Cancer Tissue > Normal Tissue
GG33      U3 (52)     D9 (64)      410 bp                   Cancer Tissue > Normal Tissue
CC34      U3 (52)     D5 (60)      290 bp                   Cancer Tissue > Normal Tissue
GC35      U3 (52)     D8 (63)      210 bp                   Cancer Tissue < Normal Tissue
GC36      U3 (52)     D8 (63)      190 bp                   Cancer Tissue < Normal Tissue
CA42      U4 (53)     D4 (59)      660 bp                   Cancer Tissue > Normal Tissue
CC62      U6 (55)     D5 (60)      380 bp                   Cancer Tissue < Normal Tissue

[0094] 2) Identification of mRNA Serving as Index for Detecting Cancer

[0095] It was investigated whether a change in the expression level of the mRNA used as a template for the amplified DNA fragment derived from each of the genes shown in Table 3, as confirmed by the DD method in Section 1) described above, was truly associated with canceration.

[0096] First, the studies were made by means of Northern hybridization. Specifically, it was studied whether the difference between the expression level of the mRNA of a cancer-associated gene in a cancer tissue and that in a control normal tissue could be detected by using each amplified DNA fragment obtained by the method in Section 1) described above as a probe.

[0097] The probe for the detection was prepared as follows. Specifically, from the acrylamide gel on which the amplified DNA fragment obtained by the DD method in Section 1) described above was electrophoresed, the region containing each amplified DNA fragment shown in Table 3 was cut out, 100 μl of water was added thereto, and the mixture was subjected to a heat extraction to collect individually each DNA fragment contained. Re-amplification by PCR was carried out by using each DNA fragment individually as a template, with the combination of the upstream and downstream primers used to obtain each DNA fragment shown in Table 3. Further, about 100 ng of each amplified DNA fragment was labeled with ³²P using Random Primer DNA Labeling Kit (manufactured by Takara Shuzo Co., Ltd.) to prepare 14 kinds of probes for detection. Separately from the above, mRNA for β-actin gene was selected as a positive control of a crude RNA extracted from each tissue, and the synthetic oligonucleotide having the nucleotide sequence as shown in SEQ ID NO: 65 in Sequence Listing was labeled in the same manner with ³²P to obtain a probe for detecting mRNA for β-actin gene. Thereafter, each probe for detection described above was mixed together with herring sperm DNA so as to have a concentration of 100 μg/ml, and then heat-denatured. To the resulting mixture was added hybridization buffer (50% formamide, 0.65 M NaCl, 0.1 M Na-Pipes, 5× Denhardt's reagent, 0.1% SDS, 5 mM EDTA) to prepare 15 kinds of probe solutions for detection in Northern hybridization.

[0098] Northern hybridization was carried out as follows. First, 20 μg per well of a crude RNA sample extracted from each of a cancer tissue and a control normal tissue from the patient with a gastric cancer prepared as described above was subjected to electrophoresis on a formalin-denatured 1% agarose gel and blotted on a Hybond N+ membrane (manufactured by Amersham). Subsequently, the blotted membrane and hybridization buffer, to which heat-denatured herring sperm DNA had been added so as to have a final concentration of 100 μg/ml, were placed in a Hybri Bag (manufactured by COSMO BIO). The resulting composition was allowed to stand at 42°C for 2 hours, and then the buffer was discarded to prepare a membrane with pre-hybridization treatment. After preparing 15 such membranes as above, to each membrane was added one of the 15 kinds of detection probe solutions for Northern hybridization described above, and hybridization was carried out at 42°C for 16 hours. Thereafter, each blotted membrane was taken from the Hybri Bag, and rinsed with washing solution I (2×SSC, 0.2% sodium pyrophosphate, 0.1% SDS) at 42°C for 20 minutes, and then with washing solution II (0.5×SSC, 0.2% sodium pyrophosphate, 0.1% SDS) at 42°C for 20 minutes. Incidentally, rinsing with washing solution II was repeated twice, replacing the washing solution each time. The membrane after rinsing was wrapped with a plastic film and exposed for one day and night to a high-sensitivity X-ray film (manufactured by Kodak). From the signal intensity in the resultant autoradiogram, the expression level in the cancer tissue was compared with that of the control normal tissue.

[0099] As one example, the results of the detection of mRNA for CA11 gene are shown in FIG. 2. In FIG. 2, 1N is a lane in which a crude RNA sample obtained from a normal tissue of a patient with a poorly-differentiated adenocarcinoma-type gastric cancer was electrophoresed on an agarose gel; and 1T is a lane in which a crude RNA sample obtained from a cancer tissue of the same patient was electrophoresed on an agarose gel. (a) shows results obtained with a probe for detecting CA11, and (b) shows results obtained with a probe for detecting β-actin. Since both 1N and 1T exhibited the signals obtained with the probe for detecting β-actin as shown in (b), it is clear that in both samples the RNA was extracted without undergoing excessive degradation. On the other hand, a clear signal as indicated by "→" at a position near 1.1 kb was present only in lane 1N, and no signals were present in lane 1T, as shown in (a). Therefore, it was found that CA11 was a gene of which the expression level was reduced owing to canceration. Similarly, CC62 exhibited a band at about 2.6 kb only on the autoradiogram derived from the control normal stomach tissue. GC31, GC32 and CC34 showed bands at about 1.0 kb, about 1.6 kb and about 1.7 kb, respectively, and for each of these genes a more intense signal was obtained for the crude RNA samples prepared from the gastric cancer tissues than for the crude RNA samples prepared from the control normal stomach tissues. Incidentally, the signal intensity was determined by measuring each band of an autoradiogram with a densitometer. Subsequently, the IOD of each band obtained on the autoradiogram was calculated with FMBIO-100 (manufactured by Hitachi Soft Engineering), and an index was calculated by the equation shown below to determine whether or not a gene was a cancer-associated gene.

Equation 2: [Index Value] = (X × βY) / (Y × βX)

[0100] In the above equation, each symbol expresses the following value:

[0101] X: IOD of a band derived from mRNA for the gene shown in Table 3 obtained from a gastric cancer tissue;

[0102] Y: IOD of a band derived from mRNA for the gene shown in Table 3 obtained from a control normal stomach tissue;

[0103] βX: IOD of a band derived from mRNA for β-actin gene obtained from a gastric cancer tissue; and

[0104] βY: IOD of a band derived from mRNA for β-actin gene obtained from a control normal stomach tissue.
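Equation 2 amounts to dividing the β-actin-normalized signal of the cancer tissue by the β-actin-normalized signal of the normal tissue. A minimal Python sketch follows; the function name `index_value` and the IOD values in the usage note are hypothetical and are given only to illustrate the arithmetic.

```python
def index_value(x, y, beta_x, beta_y):
    """Compute the index of Equation 2: (X * betaY) / (Y * betaX).

    x, y            -- IOD of the gene band from cancer and normal tissue
    beta_x, beta_y  -- IOD of the beta-actin band from cancer and normal tissue
    """
    if y == 0 or beta_x == 0:
        raise ValueError("Y and betaX must be non-zero")
    # Equivalent to (x / beta_x) / (y / beta_y), i.e. a ratio of normalized signals.
    return (x * beta_y) / (y * beta_x)
```

With hypothetical IODs, `index_value(200, 100, 50, 50)` gives 2.0, whereas `index_value(100, 100, 50, 25)` gives 0.5 because the cancer sample carried twice as much β-actin signal for the same gene signal.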

[0105] The comparison of the expression level was made by carrying out RT-PCR with respect to each of the genes CA13, CC24, GG24, AG26, GC33, GG33, GC35, GC36 and CA42, for which no signals were obtained by Northern hybridization. In order to design primers for the nucleic acid amplification reaction in the RT-PCR, each DNA fragment used as a probe in Northern hybridization was subjected to direct sequencing by PCR, or was cloned by a TA cloning procedure and then sequenced by a dideoxy method, thereby determining its nucleotide sequence. The nucleotide sequences of the primers designed based on the resulting nucleotide sequence information and used in the RT-PCR with mRNA derived from each of the genes as a template are as shown in SEQ ID NOs: 22 to 29, 34 to 37 and 40 to 45 in Sequence Listing. Table 2 shows the genes together with the corresponding primers used to confirm their expression.

[0106] To confirm a change in the expression level of mRNA by RT-PCR, a crude RNA sample obtained from each of the cancer tissue and the control normal tissue of a patient with a gastric cancer, prepared by the method in Section 1) described above, was first treated with DNaseI. Thereafter, RT-PCR was carried out in a 100 μl reaction system with 40 ng of each treated sample using TaKaRa RNA PCR Kit Ver. 2.1 according to the procedures described in the instructions attached to the kit. Specifically, 40 ng of a crude RNA sample as a template and an oligo(dT) primer (final concentration: 2.5 μM) as a downstream primer were used to prepare a reverse transcription reaction mixture (10 mM Tris-HCl, pH 8.3, 50 mM KCl, 5 mM MgCl₂, 1 mM each of dNTPs, 100 units of RNase inhibitor, 25 units of AMV reverse transcriptase), and the reverse transcription reaction was carried out at 30°C for 10 minutes, at 55°C for 20 minutes and then at 95°C for 5 minutes. A 10 μl portion of the reverse transcription reaction mixture was added to 40 μl of each of 10 kinds of PCR reaction mixtures (final concentration: 10 mM Tris-HCl, pH 8.3, 50 mM KCl, 2.5 mM MgCl₂, 1.25 units of TaKaRa Taq DNA polymerase) individually containing the primer pairs (0.2 μM) for detecting each of the mRNAs for the genes of CA13, CC24, GG24, AG26, GC33, GG33, GC35, GC36, CA42 and β-actin, to make up a volume of 50 μl. In the PCR, after pre-incubation at 94°C for 2 minutes, one cycle consisted of incubation at 94°C for 30 seconds, at 55°C for 60 seconds, and then at 72°C for 60 seconds. The amount of an amplified DNA product was quantified by subjecting the amplified DNA product to agarose gel electrophoresis, staining the gel with ethidium bromide, and calculating the IOD of each band on the fluorescent image with FMBIO-100, and an index for determining whether or not a gene is a cancer-associated gene was obtained from Equation 2 shown above.

[0107] The results of the Northern hybridization method and RT-PCR described above, and the patterns of the changes in expression owing to canceration of each of the genes evident from these results, are shown in Table 4. In the column of the patterns of the changes in expression, a gene of which the expression was increased owing to canceration is indicated with "↑" and a gene of which the expression was suppressed owing to canceration is indicated with "↓". Specifically, it was determined in Table 4 that a gene having an index value greater than 1 is a gene of which the expression level was increased owing to canceration, and a gene having an index value less than 1 is a gene of which the expression level was reduced owing to canceration. As a result, it was clarified that the genes CA13, CC24, GG24, GC31, GC32, GC33, GG33, CC34 and CA42 were those of which the expression levels were increased owing to canceration, and the genes CA11, AG26, GC35, GC36 and CC62 were those of which the expression levels were reduced owing to canceration.

TABLE 4

Name of                  Method of Determining   Pattern of Change
Gene       Index Value   Index Value             in Expression
CA11       0.036         A                       ↓
CA13       6.3           B                       ↑
CC24       2.0           B                       ↑
GG24       2.8           B                       ↑
AG26       0.52          B                       ↓
GC31       3.1           A                       ↑
GC32       3.6           A                       ↑
GC33       2.3           B                       ↑
GG33       2.2           B                       ↑
CC34       15            A                       ↑
GC35       0.0046        B                       ↓
GC36       0.048         B                       ↓
CA42       1.9           B                       ↑
CC62       0.56          A                       ↓

[0108] (note) In the table, "A" represents a determination from the autoradiogram in Northern hybridization, and "B" represents a determination based on the electrophoretic gel image of the amplified product by RT-PCR.
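The determination rule applied to Table 4 (index value greater than 1: increased; less than 1: reduced) can be sketched in Python as follows. The index values are copied from Table 4; the function name `classify` and the "unchanged" label for an index of exactly 1 are assumptions of this sketch, since the specification does not address that case.

```python
# Index values as listed in Table 4.
TABLE_4 = {
    "CA11": 0.036, "CA13": 6.3, "CC24": 2.0, "GG24": 2.8, "AG26": 0.52,
    "GC31": 3.1, "GC32": 3.6, "GC33": 2.3, "GG33": 2.2, "CC34": 15,
    "GC35": 0.0046, "GC36": 0.048, "CA42": 1.9, "CC62": 0.56,
}


def classify(index):
    """Return the pattern of change in expression for a given index value."""
    if index > 1:
        return "increased"
    if index < 1:
        return "reduced"
    return "unchanged"  # assumption: not defined in the specification


increased = [gene for gene, value in TABLE_4.items() if classify(value) == "increased"]
reduced = [gene for gene, value in TABLE_4.items() if classify(value) == "reduced"]
```

Applied to the values of Table 4, the sketch places 9 genes in `increased` and 5 genes in `reduced`, the same grouping as stated in the text.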

[0109] 3) Acquisition of Cancer-Associated Gene cDNA

[0110] A cDNA fragment of each of these cancer-associated genes was then cloned. First, mRNA was fractionated on an oligo(dT) column with mRNA Purification Kit (manufactured by Pharmacia) from a crude RNA sample derived from a cancer tissue or a normal tissue, which was prepared by the method described in Section 1). A cDNA library was then prepared using a ZAP-cDNA synthesis kit (manufactured by Stratagene) according to the protocols attached to the kit, and a phage and a host cell XL1-Blue MRF' were plated at a density of about 40,000 plaques per rectangular plate of 10 cm × 14 cm. Thereafter, phage particles were transferred onto a Hybond N+ membrane, and screening was carried out by means of plaque hybridization using a probe identical to that used in Northern hybridization described in Section 2), thereby finding a Uni-ZAP XR clone containing a desired cDNA. This recombinant Uni-ZAP XR clone was converted into a pBluescript phagemid by means of an in vitro excision method. The nucleotide sequence of the DNA fragment incorporated into this recombinant phagemid was determined with a fluorescent DNA sequencer (manufactured by ABI). The nucleotide sequences obtained by connecting the nucleotide sequences of the cDNA fragments contained in the cDNA library, by means of walking based on the nucleotide sequence of the DNA fragment incorporated into the phagemid, are shown in SEQ ID NOs: 2 to 10, 13, 14 and 68 in Sequence Listing. Since the cDNAs for CA11 and GC35 obtained above are smaller than the sizes of the mRNAs deduced from the results of Northern hybridization, it is highly possible that the 5'-terminal portion of each of the above cDNAs is deleted. Therefore, in order to obtain nearly the whole length of each cDNA, cDNA clones were isolated by screening again, using a commercially available human gastric cDNA library (manufactured by Takara Shuzo Co., Ltd.) and a probe which was newly prepared based on the region proximal to the 5'-terminal of the sequence obtained above. By means of the above screening, there were obtained a cDNA clone in which base numbers 1 to 76 of SEQ ID NO: 66 in Sequence Listing were added to the 5'-terminal of the nucleotide sequence of SEQ ID NO: 1 in the case of CA11; and a cDNA clone in which base numbers 1 to 2530 of SEQ ID NO: 67 in Sequence Listing were added to the 5'-terminal of the nucleotide sequence of SEQ ID NO: 11 in Sequence Listing in the case of GC35.

[0111] Each of the nucleotide sequences thus obtained was subjected to a homology search against known gene cDNA nucleotide sequences recorded in GenBank by using the BLAST program [Altschul, S. F., Journal of Molecular Biology, 215, 403-410, (1990)]. As a result, no sequences corresponding to the cDNA of each of CA11, CA13, GG33, GC35, GC36 and CA42 have been reported, so that these genes were determined to be novel genes. Further, as a result of searching for an open reading frame for a gene product based on the nucleotide sequence contained in each of the gene cDNAs of CA11, CA13, GC36 and CA42, it was deduced that CA11 cDNA encodes the amino acid sequence as shown in SEQ ID NO: 69 in Sequence Listing, CA13 cDNA encodes the amino acid sequence as shown in SEQ ID NO: 18 in Sequence Listing, GC36 cDNA encodes the amino acid sequence as shown in SEQ ID NO: 70 in Sequence Listing, and CA42 cDNA encodes the amino acid sequence as shown in SEQ ID NO: 19 in Sequence Listing. On the other hand, CC24 corresponded to cytochrome c oxidase subunit I gene, AG26 to p190-B gene, GC31 to cytochrome c oxidase subunit II gene, GC32 to cytochrome b gene, GC33 to integrin α6 subunit gene, GG24 to F1-ATPase β subunit gene, and CC62 to lactoferrin gene. Moreover, the nucleotide sequence region as shown in SEQ ID NO: 10 in Sequence Listing for the CC34 cDNA was found to differ from a partial region of the cDNA encoding a mitochondrial 16S rRNA by 7 bases.

[0112] Incidentally, in the screening of the cDNA library using as a probe an amplified DNA fragment derived from CC34, in addition to the cDNA clone having the nucleotide sequence as shown in SEQ ID NO: 10 in Sequence Listing, an additional, different kind of positive cDNA clone was obtained. It was clarified that this cDNA had a nucleotide sequence in which T at base number 935 in the nucleotide sequence as shown in SEQ ID NO: 10 in Sequence Listing was substituted with A, and 6 bases consisting of GTTAAG at the 3'-terminal were deleted, and that 1540 bases out of the entire 1546 bases of its nucleotide sequence were identical to a partial region of the cDNA encoding a mitochondrial 16S rRNA.

Example 2

Confirmation of Change in Gene Expression in Cancer Tissue

[0113] With respect to each cancer-associated gene confirmed in Example 1, the association of the expression of this gene with the canceration of cells was evaluated by using a cancer tissue different from that used in Example 1.

[0114] 1) Confirmation of Change in Gene Expression in Cancer Tissue of Patient with Signet Ring Cell Gastric Cancer

[0115] Using a crude RNA sample prepared in the same manner as in Section 1) of Example 1 from each of a cancer tissue and a control normal tissue excised from a patient with a signet ring cell gastric cancer, who was different from the patient who provided the tissues used in Sections 1) and 2) of Example 1, the expression levels in the cancer tissue and the normal tissue were compared with respect to each of the 14 kinds of cancer-associated genes clarified in Section 3) of Example 1, using the expression level of the mRNA as an index, by carrying out Northern hybridization or RT-PCR as described in Section 2) of Example 1. As one example, the results of the detection of mRNA for CA11 gene by the RT-PCR method are shown in FIG. 3. Specifically, FIG. 3 is a photograph of a fluorescent image of the electrophoresis of the DNA fragments obtained when a change in the expression level of a cancer-associated gene is detected by the RT-PCR method. The reaction conditions of the RT-PCR were according to the method described in Section 2) of Example 1, with the number of PCR cycles set to two patterns, i.e., 25 and 30. In FIG. 3, (a) shows the results of the detection of the expression of the cancer-associated gene CA11, and (b) shows the results of the confirmation of the expression of β-actin as a positive control. In FIG. 3, 2T is an amplified DNA fragment obtained by using as a template a crude RNA sample extracted from a gastric cancer tissue of the patient with a signet ring cell gastric cancer, and 2N is an amplified DNA fragment obtained by using as a template a crude RNA sample extracted from a normal gastric tissue of the same patient. Also, the numerals "25" and "30" in FIG. 3 are the numbers of cycles of the nucleic acid amplification in the RT-PCR method. Table 5 shows the calculated IODs of the bands on the fluorescent image shown in FIG. 3. Incidentally, each index shown in Table 5 was calculated from Equation 2 described in Section 2) of Example 1.

TABLE 5

Number of Cycles   25                  30
Sample Name        2T       2N         2T       2N
CA11               365      31118      6345     61742
β-Actin            710      562        25115    20425
Index Value        0.0093              0.083

[0116] In Table 5, since the IOD values of the bands derived from β-actin obtained on the fluorescent image of 2T and 2N were of a similar level at both 25 and 30 PCR cycles, it was clarified that RNA could be similarly extracted from all samples. On the other hand, since the index was less than 1 at both 25 and 30 PCR cycles, it was clarified that CA11 was a gene of which the expression level was reduced owing to canceration also in a patient with a signet ring cell gastric cancer. With respect to the 13 kinds of cancer-associated genes other than CA11, a change in the expression level was found in the same manner as in Section 2) of Example 1, so that it was clarified that the change in the expression level of each of the 14 kinds of genes clarified in Section 3) of Example 1 was not a change peculiar to the tissue of the patient tested in Section 1) of Example 1.
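The index values in Table 5 can be recomputed from the listed IODs with a short Python sketch of Equation 2. The dictionary layout and the names `TABLE_5` and `index_value` are illustrative assumptions; the IOD values are those given in Table 5.

```python
# IODs from Table 5, keyed by number of PCR cycles: (2T = cancer, 2N = normal).
TABLE_5 = {
    25: {"CA11": (365, 31118), "beta_actin": (710, 562)},
    30: {"CA11": (6345, 61742), "beta_actin": (25115, 20425)},
}


def index_value(x, y, beta_x, beta_y):
    """Equation 2: (X * betaY) / (Y * betaX)."""
    return (x * beta_y) / (y * beta_x)


indices = {}
for cycles, iod in TABLE_5.items():
    x, y = iod["CA11"]                 # CA11 band: cancer (2T), normal (2N)
    beta_x, beta_y = iod["beta_actin"] # beta-actin band: cancer, normal
    indices[cycles] = index_value(x, y, beta_x, beta_y)
```

Under this calculation the 25-cycle index rounds to 0.0093 and the 30-cycle index is about 0.0836, which corresponds to the listed 0.083 if the value was truncated rather than rounded; both are well below 1.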

Example 3

[0117] Construction of Kit for Detecting Cancer

[0118] A kit for detecting a cancer utilizing RT-PCR method comprising the following components was constructed.

[0119] Specifically, the kit comprises DNaseI, AMV reverse transcriptase, RNase inhibitor, 10× RT-PCR buffer (100 mM Tris-HCl, pH 8.3, 500 mM KCl), 25 mM MgCl₂, a mixture of 2.5 mM each of dATP, dGTP, dCTP and dTTP, an oligo(dT) primer, Taq DNA polymerase, and a primer pair specific to each of the genes and a primer pair for amplifying β-actin gene as a positive control, as shown in Table 2. In the column of the primer pair in Table 2, a symbol combining a letter and a numeral indicates the name of a primer, and the number within parentheses following each symbol indicates the SEQ ID NO showing the nucleotide sequence of the primer in Sequence Listing.

[0120] According to the present invention, it is made possible to simply and rapidly detect cancer. In addition, the presence of a novel nucleic acid associated with the cancer is elucidated.

Equivalent

[0121] Those skilled in the art will recognize, or be able to ascertain using simple routine experimentation, many equivalents to the specific embodiments of the invention described in the present specification. Such equivalents are intended to be encompassed in the scope of the following claims.

Sequence Listing

70 1 738 DNA Homo sapiens any n or Xaa = unknown 1 cctctgtcca ctgctttcgt gaagacaaga tgaagttcac aattgtcttt gctggacttc 60 ttggagtctt tctagctcct gcccttgcta actataatat caacgtcaat gatgacaaca 120 acaatgctgg aagtgggcag cagtcagtga gtgtcaacaa tgaacacaat gtggccaatg 180 ttgacaataa caacggatgg gactcctgga attccatctg ggattatgga aatggctttg 240 ctgcaaccag actctttcaa aagaagacat gcattgtgca caaaatgaac aaggaagtca 300 tgccctccat tcaatccctt gatgcactgg tcaaggaaaa gaagcttcag ggtaagggac 360 caggaggacc acctcccaag ggcctgatgt actcagtcaa cccaaacaaa gtcgatgacc 420 tgagcaagtt cggaaaaaac attgcaaaca tgtgtcgtgg gattccaaca tacatggctg 480 aggagatgca agaggcaagc ctgttttttt actcaggaac gtgctacacg accagtgtac 540 tatggattgt ggacatttcc ttctgtggag acacggtgga gaactaaaca attttttaaa 600 gccactatgg atttagtcgt ctgaatatgc tgtgcagaaa aaatatgggc tccagtggtt 660 tttaccatgt cattctgaaa tttttctcta ctagttatgt ttgatttctt taagtttcaa 720 taaaatcatt tagcattg 738 2 2042 DNA Homo sapiens any n or Xaa = unknown 2 ccgtgacaac actcctgtca tattggagtc caaaacttga attctgggtt gaatttttta 60 aaaatcaggt accacttgat ttcatatggg aaattgaagc aggaaatatt gagggcttct 120 tgatcacaga aaactcagaa gagatagtaa tgctcaggac aggagcggca gccccagaac 180 aggccactca tttagaattc tagtgtttca aaacactttt gtgtgttgta tggtcaataa 240 catttttcat tactgatggt gtcattcacc cattaggtaa acattccctt ttaaatgttt 300 gtttgttttt tgagacagga tctcactctg ttgccagggc tgtagtgcag tggtgtgatc 360 atagctcact gcaacctcca cctcccaggc tcaagcctcc cgaatagctg ggactacagg 420 cgcacaccac catccccggc taatttttgt attttttgta gagacggggt tttgccatgt 480 tgccaaggct ggtttcaaac tcctggactc aagaaatcca cccacctcag cctcccaaag 540 tgctaggatt acaggcatga gccactgcgc ccagccctta taaatttttg tatagacatt 600 cctttggttg gaagaatatt tataggcaat acagtcaaag tttcaaaata gcatcacaca 660 aaacatgttt ataaatgaac aggatgtaat gtacatagat gacattaaga aaatttgtat 720 gaaataattt agtcatcatg aaatatttag ttgtcatata aaaacccact gtttgagaat 780 gatgctactc tgatctaatg aatgtgaacg tgtagatgtt ttgtgtgtat ttttttaaat 840 gaaaactcaa aataagacaa gtaatttgtt gataaatatt tttaaagata 
actcagcatg 900 tttgtaaagc aggatacatt ttactaaaag gttcattggt tccaatcaca gctcataggt 960 agagcaaaga aagggtggat ggattgaaaa gattagcntn tgtntcggtg gcaggttccc 1020 acntcgcaag caattggaaa caaaantttn ggggagtttt attttgcatt ngggtgtgtt 1080 ttatgttaag caaaacatan tttagaanca aatgaaaaag gcaattgaaa atcccagnta 1140 tttcacctag atggnatagc caccntgagc agaacttngt gatgnttcat tctgnggaat 1200 tttgtgcttn ctactgtata gtgcatgtgg tgtaggttac tctaactggt tttgtngacg 1260 taaacattta aagtgttata ttttttataa aaatgtttat ttttaatgat atgagaaaaa 1320 ttttgttagg ccacaaaaac actgcactgt gaacatttta gaaaaggtat gtcagactgg 1380 gattaatgac agcatgattt tcaatgactg taaattgcga taaggaaatg tactgattgc 1440 caatacaccc caccctcatt acatcatcag gacttgaagc caagggttaa cccagcaagc 1500 tacaaagagg gtgtgtcaca ctgaaactca atagttgagt ttggctgttg ttgcaggaaa 1560 atgattataa ctaaaagctc tctgatagtg cagagactta ccagaagaca caaggaattg 1620 tactgaagag ctattacaat ccaaatattg ccgtttcata aatgtaataa gtaatactaa 1680 ttcacagagt attgtaaatg gtggatgaca aaagaaaatc tgctctgtgg aaagaaagaa 1740 ctgtctctac cagggtcaag agcatgaacg catcaataga aagractcgg ggaaacatcc 1800 catcaacagg actacacact tgtatataca ttcttgagaa cactgcaatg tgaaaatcac 1860 gtttgctatt tataaacttg tccttagatt aatgtgtctg gacagattgt gggagtaagt 1920 gattcttcta agaattagat acttgtcact gcctatacct gcagctgaac tgaatggtac 1980 ttcgkatgtt aatagttgtt ctgataaatc atgcaattaa aataaagtga tgcaacatct 2040 tg 2042 3 1539 DNA Homo sapiens any n or Xaa = unknown 3 atgttcgccg accgttgact attctctaca aaccacaaag acattggaac actataccta 60 ttattcggcg catgagctgg agtcctaggc acagctctaa gcctccttat tcgagccgag 120 ctgggccagc caggcaacct tctaggtaac gaccacatct acaacgttat cgtcacagcc 180 catgcatttg taataatctt cttcatagta atacccatca taatcggagg ctttggcaac 240 tgactagttc ccctaataat cggtgccccc gatatggcgt tcccccgcat aaacaacata 300 agcttctgac tcttacctcc ctctctccta ctcctgctcg catctgctat agtagaggcc 360 ggagcaggaa caggttgaac agtctaccct cccttagcag ggaactactc ccaccctgga 420 gcctccgtag acctaaccat cttctcctta cacctagcag gtgtctcctc tatcttaggg 480 gccatcaatt 
tcatcacaac aattatcaat ataaaacccc ctgccataac ccaataccaa 540 acgcccctct tcgtctgatc cgtcctaatc acagcagtcc tacttctcct atctctccca 600 gtcctagctg ctggcatcac tatactacta acagaccgca acctcaacac caccttcttc 660 gaccccgccg gaggaggaga ccccattcta taccaacacc tatcctgatt tttcggtcac 720 cctgaagttt atattcttat cctaccaggc ttcggaataa tctcccatat tgtaacttac 780 tactccggaa aaaaagaacc atttggatac ataggtatgg tctgagctat gatatcaatt 840 ggcttcctag ggtttatcgt gtgagcacac catatattta cagtaggaat agacgtagac 900 acacgagcat atttcacctc cgctaccata atcatcgcta tccccaccgg cgtcaaagta 960 tttagctgac tcgccacact ccacggaagc aatatgaaat gatctgctgc agtgctctga 1020 gccctaggat tcatctttct tttcaccgta ggtggcctga ctggcattgt attagcaaac 1080 tcatcactag acatcgtact acacgacacg tactacgttg tagctcactt ccactatgtc 1140 ctatcaatag gagctgtatt tgccatcata ggaggcttca ttcactgatt tcccctattc 1200 tcaggctaca ccctagacca aacctacgcc aaaatccatt tcgctatcat attcatcggc 1260 gtaaatctaa ctttcttccc acaacacttt ctcggcctat ccggaatgcc ccgacgttac 1320 tcggactacc ccgatgcata caccacatga aatatcctat catctgtagg ctcattcatt 1380 tctctaacag cagtaatatt aataattttc atgatttgag aagccttcgc ttcgaagcga 1440 aaagtcctaa tagtagaaga accctccata aacctggagt gactatatgg atgcccccca 1500 ccctaccaca cattcgaaga acccgtatac ataaaatct 1539 4 1807 DNA Homo sapiens any n or Xaa = unknown 4 gaattctttc ttcagcccat gtaaacatga aaataagggt taaaaatgac ttcattatgg 60 ggaaaaggga caggatgcaa attgttcaaa ttccgggtgg ccgctgctcc ggcctccggg 120 gccttgcgga gactcacccc ttcagcgtcg ctgcccccag ctcagctctt actgcgggcc 180 gtccgacggc ggtcccatcc tgtcagggac tatgcggcgc aaacatctcc ttcgccaaaa 240 gcaggcgccg ccaccgggcg catcgtggcg gtcattggcg cagtggtgga cgtccagttt 300 gatgagggac taccaccaat tctaaatgcc ctggaagtgc aaggcaggga gaccagactg 360 gttttggagg tggcccagca tttgggtgag agcacagtaa ggactattgc tatggatggt 420 acagaaggct tggttagagg ccagaaagta ctggattctg gtgcaccaat caaaattcct 480 gttggtcctg agactttggg cagaatcatg aatgtcattg gagaacctat tgatgaaaga 540 ggtcccatca aaaccaaaca atttgctccc attcatgctg aggctccaga gttcatggaa 600 atgagtgttg 
agcaggaaat tctggtgact ggtatcaagg ttgtcgatct gctagctccc 660 tatgccaagg gtggcaaaat tgggcttttt ggtggtgctg gagttggcaa gactgtactg 720 atcatggagt taatcaacaa tgtcgccaaa gcccatggtg gttactctgt gtttgctggt 780 gttggtgaga ggacccgtga aggcaatgat ttataccatg aaatgattga atctggtgtt 840 atcaacttaa aagatgccac ctctaaggta gcgctggtat atggtcaaat gaatcaacca 900 cctggtgctc gtgcccgggt agctctgact gggctgactg tggctgaata cttcagagac 960 caagaaggtc aagatgtact gctatttatt gataacatct ttcgcttcac ccaggctggt 1020 tcagaggtgt ctgcattatt gggccgaatc ccttctgctg tgggctatca gcctaccctg 1080 gccactgaca tgggcactat gcaggaaaga attaccacta ccaagaaggg atctatcacc 1140 tctgtacagg ctatctatgt gcctgctgat gacttgactg accctgcccc tgctactacg 1200 tttgcccatt tggatgctac cactgtactg tcgcgtgcca ttgctgagct gggcatctat 1260 ccagctgtgg atcctctaga ctccacctct cgtatcatgg atcccaacat tgttggcagt 1320 gagcattacg atgttgcccg tggggtgcaa aagatcctgc aggactacaa atccctccag 1380 gatatcattg ccatcctggg tatggatgaa ctttctgagg aagacaagtt gaccgtgtcc 1440 cgtgcacgga aaatacagcg tttcttgtct cagccattcc aggttgctga ggtcttcaca 1500 ggtcatatgg ggaagctggt acccctgaag gagaccatca aaggattcca gcagattttg 1560 gcaggtgaat atgaccatct cccagaacag gccttctata tggtgggacc cattgaagaa 1620 gctgtggcaa aagctgataa gctggctgaa gagcattcat cgtgaggggt ctttgtcctc 1680 tgtacttgtc tctctccttg cccctaaccc aaaaagcttc atttttctat ataggctgca 1740 caagagcctt gattgaagat atattctttc tgaacagtat ttaaggtttc caataaaatc 1800 ggaattc 1807 5 4992 DNA Homo sapiens any n or Xaa = unknown 5 ccgcggtgag ccgcgaggaa gagaggcgag cgagagtgga ggaggaggcg gcggctgcgg 60 gacggtcccc aggaatgtcg ctgccccccc cccccctgcc gttgaggagg agacggagga 120 gaccgacgtt gttagggaag atgatcccta tgatctgccg ctgtttctgc acagaaatga 180 gggaaataca aagaaccaaa tacagttcta aatttgggat ctgtattttg agatgatttt 240 attttcagaa tgagaagcat atctggttac ctttatgaat gtagagacat gagaagagag 300 ttatgatggc aaaaaacaaa gagcctcgtc ccccatccta taccatcagt atagttggac 360 tctctgggac tgaaaaagac aaaggtaact gtggagttgg aaagtcttgt ttgtgcaata 420 gatttgtacg ctcaaaagca gatgaatatt atccagagca 
tacttctgtg cttagcacca 480 ttgactttgg aggacgagta gtaaacaatg atcacttttt gtactggggt gacataatac 540 aaaatagtga agatggagta gaatgcaaaa ttcatgtcat tgaacaaaca gagttcattg 600 atgaccagac tttcttgcct catcggagta cgaatttgca accatatata aaacgtgcag 660 ctgcatctaa attgcagtca gcagaaaaac taatgtacat ttgcactgat cagctaggct 720 tagaacaaga ctttgaacag aagcaaatgc ctgaagggaa gctcaacgta gatggatttt 780 tattatgcat tgatgtaagt caaggatgca ataggaagtt tgatgatcaa cttaaatttg 840 tgaataacct ttttgtccag ttatcaaaat caaaaaaacc tgtaataata gcagcaacta 900 aatgtgatga atgcgtgggt cattatctta gagaagttca ggcatttgct tcaaataaaa 960 agaaccttct tgtagtggaa acactcagcg caataaaagt caacattgaa acatgtttta 1020 ctgcactggt acaaatgttg gataaaactc gtagcaagcc taaaattatt ccctatttgg 1080 atgcttataa aacacagaga caacttgttg tcacagcaac agataagttt gaaaaacttg 1140 tgcagactgt gagagattat catgcaactt ggaaaactgt tagtaataaa ttaaaaaatc 1200 atcctgatta tgaagaatac atcaacttag agggaacaag aaaggccaga aatacattct 1260 caaaacatat agaacaactt aaacaggaac atataagaaa aaggagagaa gagtatataa 1320 atactttacc aagagctttt aacactcttt tgccaaatct agaagagatt gaacatttga 1380 attggtcaga agctttgaag ttaatggaaa agagagcaga tttccagtta tgttttgtgg 1440 tgctagaaaa aactccttgg gatgaaactg accatataga caaaattaat gataggcgga 1500 ttccatttga cctcctgagc actttagaag ctgaaaaagt ctatcagaac catgtacagc 1560 atctgatatc cgagaagagg agggtggaaa tgaaggaaaa attcaaaaag actttggaaa 1620 aaattcaatt catttcacca gggcagccat gggaggaagt tatgtgcttt gttatggagg 1680 atgaagccta caaatatatc actgaggctg atagcaaaga ggtatatggt aggcatcagc 1740 gagaaatagt tgaaaaagcc aaagaagagt ttcaagaaat gctttttgag cattctgaac 1800 ttttttatga tttagatctt aatgcaacac ctagttcaga taaaatgagt gaaattcata 1860 cagttctgag tgaagaacct agatataaag ctttacagaa acttgcacct gatagggaat 1920 cccttctact taagcatata ggatttgttt atcatcccac taaagaaaca tgtcttagtg 1980 gccaaaattg tacagacatt aaagtggagc agttacttgc tagtagtctt ttacagttgg 2040 atcatggccg cttaagatta tatcacgata gtaccaatat agataaagtt aaccttttta 2100 ttttagggaa ggatggcctt gcccaagaac tagcaaatga gataaggaca 
caatccactg 2160 atgatgagta tgccttagat ggaaaaattt atgaacttga tcttcggccg gttgatgcca 2220 aatcgcctta ctttttgagt cagttatgga ctgccgcctt taaaccacat gggtgcttct 2280 gtgtatttaa ttccattgag tcattgagtt ttattgggga atttattggg aaaataagaa 2340 ctgaagcttc tcagatcaga aaagataaat acatggctaa tcttccattt acattaattc 2400 tggctaatca gagagattcc attagtaaga atctaccaat tctcaggcac caagggcagc 2460 agttggcaaa caagttgcaa tgtccttttg tagatgtacc tgctggtaca tatcctcgta 2520 aatttaatga aacccaaata aagcaagctc tcagaggagt attggaatca gttaaacaca 2580 atttggatgt ggtgagccca attcctgcca ataaggactt atcagaagct gacttgagaa 2640 ttgtcatgtg cgccatgtgt ggagatccat ttagtgtgga tcttattctt tcacccttcc 2700 ttgattctca ttcttgcagt gctgctcaag ctggacagaa taattcccta atgcttgata 2760 aaatcattgg tgaaaaaagg aggcgaatac agatcacaat attatcatac cactcttcaa 2820 ttggagtaag aaaagatgaa ctagttcatg ggtatatatt agtttactct gcaaaacgga 2880 aagcttcgat gggaatgctt cgagcatttc tatcagaagt tcaagacacc attcctgtac 2940 agctggtggc agttactgac agccaagcag atttttttga aaatgaggct atcaaagagt 3000 taatgactga aggagaacac attgcaactg agatcactgc taaatttaca gcactgtatt 3060 ctttatctca gtatcatcgg caaactgagg tctttactct gttttttagt gatgttctag 3120 agaaaaaaaa tatgatagaa aattcttatt tgtctgataa tacaagggaa tcaacccatc 3180 aaagtgaaga tgtttttcta ccatctccca gagactgttt tccctataat aactaccctg 3240 attcagatga tgacacagaa gcaccacctc cttatagtcc aattggggat gatgtacagt 3300 tgcttccaac acctagtgac cgttccagat atagattaga tttggaagga aatgagtatc 3360 ctattcatag taccccaaac tgtcatgacc atgaacgcaa ccataaagtg cctccaccta 3420 ttaaacctaa accagttgta cctaagacaa atgtgaaagc gctcgttcca aaccttttaa 3480 gggcaattga agctggtatt ggtaaaaatc caagaaagca gacttcccgg gtgcctttcg 3540 gtcctgaaga tatggatcct tcagataact atgcggaacc cattgataca attttcaaac 3600 agaagggcta ttctgatgag atttatgttg tcccagatga tagtcaaaat cgtattaaaa 3660 ttcgaaactc atttgtaaat aacacccaag gagatgaaga aaatgggttt tctgatagac 3720 ctcaaaaagt catggggaac ggaggccttc aaaatacaaa tataaatcta aaaccttgtt 3780 tagtaaagcc aagtcatact atagaagaac acattcagat gccagtgatg atgaggcttt 
3840 caccacttct aaaaccaaaa agaaaaggaa gacatcgtgg aagtgaagaa gatccacttc 3900 tttctcctgt tgaaacttgg aaaggtggta ttgataatcc tgcaatcact tctgaccagg 3960 agttagatga taagaagatg aagaagaaaa cccacaaagt gaaagaagat aaaaaaaaga 4020 aaactaagaa cttcaatcca ccaacacgta gaaattggga aagtaattac tttgggatgc 4080 ccctccagga tctggttaca gctgagaagc ccataccact atttgttgag aaatgtgtgg 4140 aatttattga agatacaggg ttatgtaccg agagactcta ccgtgtcagc gggaataaaa 4200 ctgaccaaga aaatattcaa aagcagtttg ttcaagatca taatatcaat ctagtgtcaa 4260 tggaagtaac agtaaatgct gtagctggag cccttaaagc tttctttgca gatctgccag 4320 atcctttaat tccatattct cttcatccag aactattgga agcagcaaaa atcccggata 4380 aaacagaacg tcttcatgcc ttgaaagaaa ttgttaagaa atttcatcct gtaaactatg 4440 atgtattcag atacgtgata acacatctaa acagggttag tcagcaacat aaaatcaacc 4500 taatgacagc agacaactta tccatctgtt ttggccaacc cttgatgaga cctgatttga 4560 aatcgatgga gtttctgtct actactaaga ttcatcaatc tgttgttgaa acattcattc 4620 agcagtgtca gtttttcttt tacaatggag aaattgtaga aacgacaaac attgtggctc 4680 ctccaccacc ttcaaaccca ggacagttgg tggaaccaat ggtgccactt cagttgccgc 4740 caccattgca acctcagctg atacaaccac aattacaaac ggatcctctt ggtattatat 4800 gagtaggaag tgattgcaaa caggctggat ttggacaaaa agcaaatcta gacatgcatg 4860 tttcagggtt cagtagtata cttcatgttt catacagata attcacattc aaaattacat 4920 tttctctttg aactagatgg tattccttat tcacttacat tacaaatcta agaccatgtg 4980 ataagcatga ct 4992 6 708 DNA Homo sapiens any n or Xaa = unknown 6 tatggcacat gcagcgcaag taggtctaca agacgctact tcccctatca tagaagagct 60 tatcaccttt catgatcacg ccctcataat cattttcctt atctgcttcc tagtcctgta 120 tgcccttttc ctaacactca caacaaaact aactaatact aacatctcag acgctcagga 180 aatagaaacc gtctgaacta tcctgcccgc catcatccta gtcctcatcg ccctcccatc 240 cctacgcatc ctttacataa cagacgaggt caacgatccc tcccttacca tcaaatcaat 300 tggccaccaa tggtactgaa cctacgagta caccgactac ggcggactaa tcttcaactc 360 ctacatactt cccccattat tcctagaacc aggcgacctg cgactccttg acgttgacaa 420 tcgagtagta ctcccgattg aagcccccat tcgtataata attacatcac aagacgtctt 480 gcactcatga 
gctgtcccca cattaggctt aaaaacagat gcaattcccg gacgtctaaa 540 ccaaaccact ttcaccgcta cacgaccggg ggtatactac ggtcaatgct ctgaaatctg 600 tggagcaaac cacagtttca tgcccatcgt cctagaatta attcccctaa aaatctttga 660 aatagggccc gtatttaccc tatagcaccc cctctacccc ctctagag 708 7 1140 DNA Homo sapiens any n or Xaa = unknown 7 atgaccccaa tacgcaaaat taacccccta ataaaattaa ttaaccactc attcatcgac 60 ctccccaccc catccaacat ctccgcatga tgaaacttcg gctcactcct tggcgcctgc 120 ctgatcctcc aaatcaccac aggactattc ctagccatgc actactcacc agacgcctca 180 accgcctttt catcaatcgc ccacatcact cgagacgtaa attatggctg aatcatccgc 240 taccttcacg ccaatggcgc ctcaatattc tttatctgcc tcttcctaca catcgggcga 300 ggcctatatt acggatcatt tctctactca gaaacctgaa acatcggcat tatcctcctg 360 cttgcaacta tagcaacagc cttcataggt tatgtcctcc cgtgaggcca aatatcattc 420 tgaggggcca cagtaattac aaacttacta tccgccatcc catacattgg gacagaccta 480 gttcaatgaa tctgaggagg ctactcagta gacagtccca ccctcacacg attctttacc 540 tttcacttca tcttgccctt cattattgca accctagcag cactccacct cctattcttg 600 cacgaaacgg gatcaaacaa ccccctagga atcacctccc attccgataa aatcaccttc 660 cacccttact acacaatcaa agacaccctc ggcttacttc tcttccttct ctccttaatg 720 acattaacac tattctcacc agacctccta ggcgacccag acaattatac cctagccaac 780 cccttaaaca cccctcccca catcaagccc gaatgatatt tcctattcgc ctacacaatt 840 ctccgatccg tccctaacaa actaggaggc gtccttgccc tattactatc catcctcatc 900 ctagcaataa tccccatcct ccatatatcc aaacaacaaa gcataatatt tcgcccacta 960 agccaatcac tttattgact cctagccgca gacctcctca ttctaacctg aatcggagga 1020 caaccagtaa gctacccttt taccatcatt ggacaagtag catccgtact atacttcaca 1080 acaatcctaa tcctaatacc aactatctcc ctaattgaaa acaaaatact caaatgggcc 1140 8 5629 DNA Homo sapiens any n or Xaa = unknown 8 gcgcgaccgt cccgggggtg gggccgggcg cagcggcgag aggaggcgaa ggtggctgcg 60 gtagcagcag cgcggcagcc tcggacccag cccggagcgc agggcggccg ctgcaggtcc 120 ccgctcccct ccccgtgcgt ccgcccatgg ccgccgccgg gcagctgtgc ttgctctacc 180 tgtcggcggg gctcctgtcc cggctcggcg cagccttcaa cttggacact cgggaggaca 240 acgtgatccg gaaatatgga gaccccggga 
gcctcttcgg cttctcgctg gccatgcact 300 ggcaactgca gcccgaggac aagcggctgt tgctcgtggg ggccccgcgc ggagaagcgc 360 ttccactgca gagagccaac agaacgggag ggctgtacag ctgcgacatc accgcccggg 420 ggccatgcac gcggatcgag tttgataacg atgctgaccc cacgtcagaa agcaaggaag 480 atcagtggat gggggtcacc gtccagagcc aaggtccagg gggcaaggtc gtgacatgtg 540 ctcaccgata tgaaaaaagg cagcatgtta atacgaagca ggaatcccga gacatctttg 600 ggcggtgtta tgtcctgagt cagaatctca ggattgaaga cgatatggat gggggagatt 660 ggagcttttg tgatgggcga ttgagaggcc atgagaaatt tggctcttgc cagcaaggtg 720 tagcagctac ttttactaaa gactttcatt acattgtatt tggagccccg ggtacttata 780 actggaaagg gattgttcgt gtagagcaaa agaataacac tttttttgac atgaacatct 840 ttgaagatgg gccttatgaa gttggtggag agactgagca tgatgaaagt ctcgttcctg 900 ttcctgctaa cagttactta ggtttttctt tggactcagg gaaaggtatt gtttctaaag 960 atgagatcac ttttgtatct ggtgctccca gagccaatca cagtggagcc gtggttttgc 1020 tgaagagaga catgaagtct gcacatctcc tccctgagca catattcgat ggagaaggtc 1080 tggcctcttc atttggctat gatgtggcgg tggtggacct caacaaggat gggtggcaag 1140 atatagttat tggagcccca cagtattttg atagagatgg agaagttgga ggtgcagtgt 1200 atgtctacat gaaccagcaa ggcagatgga ataatgtgaa gccaattcgt cttaatggaa 1260 ccaaagattc tatgtttggc attgcagtaa aaaatattgg agatattaat caagatggct 1320 acccagatat tgcagttgga gctccgtatg atgacttggg aaaggttttt atctatcatg 1380 gatctgcaaa tggaataaat accaaaccaa cacaggttct caagggtata tcaccttatt 1440 ttggatattc aattgctgga aacatggacc ttgatcgaaa ttcctaccct gatgttgctg 1500
ttggttccct ctcagattca gtaactattt tcagatcccg gcctgtgatt aatattcaga 1560 aaaccatcac agtaactcct aacagaattg acctccgcca gaaaacagcg tgtggggcgc 1620 ctagtgggat atgcctccag gttaaatcct gttttgaata tactgctaac cccgctggtt 1680 ataatccttc aatatcaatt gtgggcacac ttgaagctga aaaagaaaga agaaaatctg 1740 ggctatcctc aagagttcag tttcgaaacc aaggttctga gcccaaatat actcaagaac 1800 taactctgaa gaggcagaaa cagaaagtgt gcatggagga aaccctgtgg ctacaggata 1860 atatcagaga taaactgcgt cccattccca taactgcctc agtggagatc caagagccaa 1920 gctctcgtag gcgagtgaat tcacttccag aagttcttcc aattctgaat tcagatgaac 1980 ccaagacagc tcatattgat gttcacttct taaaagaggg atgtggagac gacaatgtat 2040 gtaacagcaa ccttaaacta gaatataaat tttgcacccg agaaggaaat caagacaaat 2100 tttcttattt accaattcaa aaaggtgtac cagaactagt tctaaaagat cagaaggata 2160 ttgctttaga aataacagtg acaaacagcc cttccaaccc aaggaatccc acaaaagatg 2220 gcgatgacgc ccatgaggct aaactgattg caacgtttcc agacacttta acctattctg 2280 catatagaga actgagggct ttccctgaga aacagttgag ttgtgttgcc aaccagaatg 2340 gctcgcaagc tgactgtgag ctcggaaatc cttttaaaag aaattcaaat gtcacttttt 2400 atttggtttt aagtacaact gaagtcacct ttgacacccc atatctggat attaatctga 2460 agttagaaac aacaagcaat caagataatt tggctccaat tacagctaaa gcaaaagtgg 2520 ttattgaact gcttttatcg gtctcgggag ttgctaaacc ttcccaggtg tattttggag 2580 gtacagttgt tggcgagcaa gctatgaaat ctgaagatga agtgggaagt ttaatagagt 2640 atgaattcag ggtaataaac ttaggtaaac ctcttacaaa cctcggcaca gcaaccttga 2700 acattcagtg gccaaaagaa attagcaatg ggaaatggtt gctttatttg gtgaaagtag 2760 aatccaaagg attggaaaag gtaacttgtg agccacaaaa ggagataaac tccctgaacc 2820 taacggagtc tcacaactca agaaagaaac gggaaattac tgaaaaacag atagatgata 2880 acagaaaatt ttctttattt gctgaaagaa aataccagac tcttaactgt agcgtgaacg 2940 tgaactgtgt gaacatcaga tgcccgctgc gggggctgga cagcaaggcg tctcttattt 3000 tgcgctcgag gttatggaac agcacatttc tagaggaata ttccaaactg aactacttgg 3060 acattctcat gcgagccttc attgatgtga ctgctgctgc cgaaaatatc aggctgccaa 3120 atgcaggcac tcaggttcga gtgactgtgt ttccctcaaa gactgtagct cagtattcgg 3180 gagtaccttg 
gtggatcatc ctagtggcta ttctcgctgg gatcttgatg cttgctttat 3240 tagtgtttat actatggaag tgtggtttct tcaagagaaa taagaaagat cattatgatg 3300 ccacatatca caaggctgag atccatgctc agccatctga taaagagagg cttacttctg 3360 atgcatagta ttgatctact tctgtaattg tgtggattct ttaaacgctc taggtacgat 3420 gacagtgttc cccgatacca tgctgtaagg atccggaaag aagagcgaga gatcaaagat 3480 gaaaagtata ttgataacct tgaaaaaaaa cagtggatca caaagtggaa cagaaatgaa 3540 agctactcat agcgggggcc taaaaaaaaa aaagcttcac agtacccaaa ctgctttttc 3600 caactcagaa attcaatttg gatttaaaag cctgctcaat ccctgaggac tgatttcaga 3660 gtgactacac acagtacgaa cctacagttt taactgtgga tattgttacg tagcctaagg 3720 ctcctgtttt gcacagccaa atttaaaact gttggaatgg atttttcttt aactgccgta 3780 atttaacttt ctgggttgcc tttgtttttg gcgtggctga cttacatcat gtgttgggga 3840 agggcctgcc cagttgcact caggtgacat cctccagata gtgtagctga ggaggcacct 3900 acactcacct gcactaacag agtggccgtc ctaacctcgg gcctgctgcg cagacgtcca 3960 tcacgttagc tgtcccacat cacaagacta tgccattggg gtagttgtgt ttcaacggaa 4020 agtgctgtct taaactaaat gtgcaataga aggtgatgtt gccatcctac cgtcttttcc 4080 tgtttcctag ctgtgtgaat acctgctcac gtcaaatgca tacaagtttc attctccctt 4140 tcactaaaaa cacacaggtg caacagactt gaatgctagt tatacttatt tgtatatggt 4200 atttattttt tcttttcttt acaaaccatt ttgttattga ctaacaggcc aaagagtctc 4260 cagtttaccc ttcaggttgg tttaatcaat cagaattaga attagagcat gggagggtca 4320 tcactatgac ctaaattatt tactgcaaaa agaaaatctt tataaatgta ccagagagag 4380 ttgttttaat aacttatcta taaactataa cctctccttc atgacagcct ccaccccaca 4440 acccaaaagg tttaagaaat agaattataa ctgtaaagat gtttatttca ggcattggat 4500 attttttact ttagaagcct gcataatgtt tctggattta catactgtaa cattcaggaa 4560 ttcttggaga agatgggttt attcactgaa ctctagtgcg gtttactcac tgctgcaaat 4620 actgtatatt caggacttga aagaaatggt gaatgcctat ggaactagtg gatccaaact 4680 gatccagtat aagactactg aatctgctac caaaacagtt aatcagtgag tcgagtgttc 4740 tattttttgt tttgtttcct cccctatctg tattcccaaa aattactttg gggctaattt 4800 aacaagaact ttaaattgtg ttttaattgt aaaaatggca gggggtggaa ttattactct 4860 atacattcaa cagagactga 
atagatatga aagctgattt tttttaatta ccatgcttca 4920 caatgttaag ttatatgggg agcaacagca aacaggtgct aatttgtttt ggatatagta 4980 taagcagtgt ctgtgttttg aaagaataga acacagtttg tagtgccact gttgttttgg 5040 ggggggcttt ttttcttttt ccggaaaatc cttaaacctt aagatactaa ggacgttgtt 5100 ttggttgtac ttggaattct tagtcacaaa atatattttg tttacaaaaa tttctgtaaa 5160 acaggttata acagtgttta aagtctcagt ttcttgcttg gggaacttgt gtccctaatg 5220 tgttagattg ctagattgct aaggagctga tacttgacag ttttttagac ctgtgttact 5280 aaaaaaaaga tgaatgtcgg aaaagggtgt tgggagggtg gtcaacaaag aaacaaagat 5340 gttatggtgt ttagacttat ggttgttaaa aatgtcatct caagtcaagt cactggtctg 5400 tttgcatttg atacattttt gtactaacta gcattgtaaa attatttcat gattagaaat 5460 tacctgtgga tatttgtata aaagtgtgaa ataaattttt tataaaagtg ttcattgttt 5520 cgtaacacag cattgtatat gtgaagcaaa ctctaaaatt ataaatgaca acctgaatta 5580 tctatttcat caaaaaaaaa aaaaaaaaaa actttatggg cacaactgg 5629 9 580 DNA Homo sapiens any n or Xaa = unknown 9 ccatccaatg aggccacctc tttctaaact cagactcttc atttagggag gtgagttcca 60 ttaaggaact tgagattttc agataaatgg aaaatactag ataaagaggt atctcataga 120 tagcaaaggt aaactctcat acaatcattg agctaggaca ttaatggttc agtggttccc 180 aattctagat atacattaaa ataaattgaa aagcctttta aaaatacatg attactggac 240 ctactgaatt atatcctttg gggagcccaa gaacttatta aattctctgg gctattttta 300 tgatttctct gagctgttac tgggaactac tgattgaatc catyttttat agtaatgttt 360 ccaacagaag gctgtttscc tttgcttaac attatttcca gtgaagtatt attttccatt 420 ctggagacag ttcaaaagtt tttttaagta acagctttat tgagacaatt tatatsccgt 480 acaattcacc taaagtgtgt aattcagttg tttttagtat gttcacagaa ttgtgcagct 540 tgcatctatc accacaaatt tagaaccttg tcataatccc 580 10 1552 DNA Homo sapiens any n or Xaa = unknown 10 cccaaaccca ctccacctta ctaccagaca accttagcca aaccatttac ccaaataaag 60 tataggcgat agaaattgaa acctggcgca atagatatag taccgcaagg gaaagatgaa 120 aaattataac caagcataat atagcaagga ctaaccccta taccttctgc ataatgaatt 180 aactagaaat aactttgcaa ggagagccaa agctaagacc cccgaaacca gacgagctac 240 ctaagaacag ctaaaagagc acacccgtct atgtagcaaa atagtgggaa 
gatttatagg 300 tagaggcgac aaacctaccg agcctggtga tagctggttg tccaagatag aatcttagtt 360 caactttaaa tttgcccaca gaaccctcta aatccccttg taaatttaac tgttagtcca 420 aagaggaaca gctctttgga cactaggaaa aaaccttgta gagagagtaa aaaatttaac 480 acccatagta ggcctaaaag cagccaccaa ttaagaaagc gttcaagctc aacacccact 540 acctaaaaaa tcccaaacat ataactgaac tcctcacacc caattggacc aatctatcac 600 cctatagaag aactaatgtt agtataagta acatgaaaac attctcctcc gcataagcct 660 gcgtcagatt aaaacactga actgacaatt aacagcccaa tatctacaat caaccaacaa 720 gtcattatta ccctcactgt caacccaaca caggcatgct cataaggaaa ggttaaaaaa 780 agtaaaagga actcggcaaa tcttaccccg cctgtttacc aaaaacatca cctctagcat 840 caccagtatt agaggcaccg cctgcccagt gacacatgtt taacggccgc ggtaccctaa 900 ccgtgcaaag gtagcataat cacttgttcc ttaattaggg acccgtatga atggctccac 960 gagggttcag ctgtctctta cttttaacca gtgaaattga cctgcccgtg aagaggcggg 1020 catgacacag caagacgaga agaccctatg gagctttaat ttattaatgc aaacagtacc 1080 taacaaacct acaggtccta aactaccaaa cctgcattaa aaatttcggt tggggcgacc 1140 tcggagcaga acccaacctc cgagcagtac atgctaagac ttcaccagtc aaagcgaact 1200 actatactca attgatccaa taacttgacc aacggaacaa gttaccctag ggataacagc 1260 gcaatcctat tctagagtcc atatcaacaa tagggtttac gacctcgatg ttggatcagg 1320 acatcccgat ggtgcagccg ctattaaagg ttcgtttgtt caacgattaa agtcctacgt 1380 gatctgagtt cagaccggag taatccaggt cggtttctat ctacttcaaa ttcctccctg 1440 tacgaaagga caagagaaat aaggcctact tcacaaagcg ccttcccccg taaatgatat 1500 catctcaact tagtattata cccacaccca cccaagaaca gggtttgtta ag 1552 11 2116 DNA Homo sapiens any n or Xaa = unknown 11 gggtggcaga atattagtct agctatctcc cattgctctc acgcgccatc tactggattt 60 catcccaaac tacaacacga aaaactgcta attttcctgc ctgccaggcc gaggactgga 120 attcaacaga ctgtttagag cctttgccct ctgaaaactt ccagaaatga agccaactga 180 ctatattcag tttacaccag agttaaagga acgccaaccc tcccagatga gaaagaatca 240 gtgcaagaac tgtagcaatt taaaaaacca gagcgtcccc ttacctccaa atgagcccac 300 tagctccaca gcaattgttc ttaaccaatc tgaaatgatg agcatggaat tcagaatctg 360 aatggcaatg aagcttatag atatccagga 
gaaagttgaa atgcaatcca aggaaaccaa 420 gcaatccagt gaaatggttt aagagctgaa agataaaata ncaattttac aaaagaccca 480 aactgagctt attgagttca aaaaagaatt tcataataca atcagaagta ttaatagcag 540 aataggccaa gctgaggaaa gaatctcaga gcttgacccc tggttctttg aatcaactta 600 gacaaaaata aagaaaaaag agttttaaga aatgaacaca atctcccaga aatatgagat 660 tatgtwaaga gacaaaatct atgactcatt gccatccctg agagagaagg agagagaata 720 agcaacttgg aaaatatatt tggggacata gcccacaaaa atttccctaa tctctctaga 780 gaggttgaca tgtaaattca agaaatacag aagaccttgg ccagataata tacaagatga 840 ccatccccaa ggcacatagt catcagattc accatggtca atgcaaaaga aaaaaatctt 900 aaagacagct agggagaagg gtcaagtcac atgcagaagg actctcatta ggctggcagt 960 ggacctctca gcagaaacct gacaagccag aagagatgga gggagagggg tctatttttg 1020 tcatccttaa agaaaaaaaa ttccaaccaa gagtctcata cactgccaaa ctaagcttcc 1080 taagtgaagg agaaataaaa accttctcag acaagcaaat gctgaaggaa ttcaactaga 1140 ccagcctaac aagaggtcct aagggagtgc tgaatatgga ctcaaaagaa taacacctgc 1200 taccacaaac actcacttaa gcacacagcc caacgacact ataggcaatt acacagtaag 1260 tctacataac aacacaatga caggatcaac atctcacaca tcaatactaa ccccgagtgt 1320 aaaggggcta aatgccccac ttaaaagaca tagagtgtca agcttgataa aaagacaaga 1380 tccaatcatc cactattttc aagagctcta tgttatgtgt aatgacaccc acagactcaa 1440 agacttggag aaagatttat catgcaaaat cagaaaacaa aaaagagcag gagtcactag 1500 ttttatatca gacaaaacag actttaaacc cttaataatt aagaaagaca aagaagggta 1560 tttcctggac cacagaaggc ttattggaaa aaaggacata atgacaaagg gtacaatcca 1620 acaagaagtt ttaactattc taaatatata cacacccaac attggagcac ccagatttat 1680 aaaacaagta cttctcgatc tacaagaaga cttagacagc cacacaataa tagtgggaga 1740 ctttcacatc ctacttacag atcattgaga cagaaaacta ataaaagaac tctggactta 1800 aacttgttac ttgaccaatt ggacctaata gatatccaca gaaaacttca cccaacaaag 1860 acagaatata cattcttctt atctgcacat ggaacacatt ccaagatcaa tcacatgcta 1920 ggtaagaaag caagtctcaa taaattaaaa aaaattgaaa tcatacgaac cttaatatca 1980 gaccacaatg taattaaaaa taaatcaata tcaagaagat ctcatacata aatacatgaa 2040 aattaaacaa cttactcctg aataactctt gtgtgaacat caaaattcag 
gaagaaataa 2100 aaaattattt gaaatt 2116 12 173 DNA Homo sapiens any n or Xaa = unknown 12 gcgatccaca aatgggaggt gacggtccat cagggaagct gggttcgcgg ctccacggct 60 gggggctgcc gcaatttcct ggataccttt tggaccaatc cacaaataaa attgtctctg 120 actgagaaag atgaggggca ggaggagtgt agtttccttg tagccctgat gca 173 13 655 DNA Homo sapiens any n or Xaa = unknown 13 ctgatccatg ggccagcagc atcaatatta cctgggagct tacagaaatg cagaatttca 60 ggcccactgc agatctaccg aatcaaaatc ttcctttagc aaaatttctc aaacgattag 120 cactggccta catccatttt atccttcctt agctattagg gatgtgaggt ccgagggctt 180 caaaaggtcc ccggaatagc ttgttccttc atccactgtg tcctattcat tcttcagcta 240 actccagcaa tgagctgaaa ctcattcatc acccttgctg agttttcttc tcaatcctta 300 ttcctaattc tggttctaga tgagccctac ctacccagtg gttgtatttt tgtagccagt 360 gtgggacaca ggagattggc agaccaacac agctagcctc tctctagccc tccctccacc 420 tctaagtcac taacaatcca tgtttgttca gtttgttgac atgtggcatg ttcatttgtt 480 cacaacttaa tcacggggga catttcagaa aaatgtgtac taagttaaaa ccatgtttag 540 tctcctacaa cttgtacatt ttcattttct cttatcagta gattgtcctt gttgacatag 600 ctcatgcatg aggacacata gcagtacaca cacattgaat gaattgttag tcatg 655 14 2619 DNA Homo sapiens any n or Xaa = unknown 14 gactcctagg ggcttgcaga cctagtggga gagaaagaac atcgcagcag ccaggcagaa 60 ccaggacagg tgaggtgcag gctggctttc ctctcgcagc gcggtgtgga gtcctgtcct 120 gcctcagggc ttttcggagc ctggatcctc aaggaacaag tagacctggc cgcggggagt 180 ggggagggaa ggggtgtcta ttgggcaaca gggcggcaaa gccctgaata aaggggcgca 240 gggcaggcgc aagtgcagag ccttcgtttg ccaagtcgcc tccagaccgc agacatgaaa 300 cttgtcttcc tcgtcctgct gttcctcggg gccctcggac tgtgtctggc tggccgtagg 360 agaaggagtg ttcagtggtg cgccgtatcc caacccgagg ccacaaaatg cttccaatgg 420 caaaggaata tgagaaaagt gcgtggccct cctgtcagct gcataaagag agactccccc 480 atccagtgta tccaggccat tgcggaaaac agggccgatg ctgtgaccct tgatggtggt 540 ttcatatacg aggcaggcct ggccccctac aaactgcgac ctgtagcggc ggaagtctac 600 gggaccgaaa gacagccacg aactcactat tatgccgtgg ctgtggtgaa gaagggcggc 660 agctttcagc tgaacgaact gcaaggtctg aagtcctgcc acacaggcct tcgcaggacc 720 
gctggatgga atgtccctac agggacactt cgtccattct tgaattggac gggtccacct 780 gagcccattg aggcagctgt ggccaggttc ttctcagcca gctgtgttcc cggtgcagat 840 aaaggacagt tccccaacct gtgtcgcctg tgtgcgggga caggggaaaa caaatgtgcc 900 ttctcctccc aggaaccgta cttcagctac tctggtgcct tcaagtgtct gagagacggg 960 gctggagacg tggcttttat cagagagagc acagtgtttg aggacctgtc agacgaggct 1020 gaaagggacg agtatgagtt actctgccca gacaacactc ggaagccagt ggacaagttc 1080 aaagactgcc atctggcccg ggtcccttct catgccgttg tggcacgaag tgtgaatggc 1140 aaggaggatg ccatctggaa tcttctccgc caggcacagg aaaagtttgg aaaggacaag 1200 tcaccgaaat tccagctctt tggctcccct agtgggcaga aagatctgct gttcaaggac 1260 tctgccattg ggttttcgag ggtgcccccg aggatagatt ctgggctgta ccttggctcc 1320 ggctacttca ctgccatcca gaacttgagg aaaagtgagg aggaagtggc tgcccggcgt 1380 gcgcgggtcg tgtggtgtgc ggtgggcgag caggagctgc gcaagtgtaa ccagtggagt 1440 ggcttgagcg aaggcagcgt gacctgctcc tcggcctcca ccacagagga ctgcatcgcc 1500 ctggtgctga aaggagaagc tgatgccatg agtttggatg gaggatatgt gtacactgca 1560 tgcaaatgtg gtttggtgcc tgtcctggca gagaactaca aatcccaaca aagcagtgac 1620 cctgatccta actgtgtgga tagacctgtg gaaggatatc ttgctgtggc ggtggttagg 1680 agatcagaca ctagccttac ctggaactct gtgaaaggca agaagtcctg ccacaccgcc 1740 gtggacagga ctgcaggctg gaatatcccc atgggcctgc tcttcaacca gacgggctcc 1800 tgcaaatttg atgaatattt cagtcaaagc tgtgcccctg ggtctgaccc gagatctaat 1860 ctctgtgctc tgtgtattgg cgacgagcag ggtgagaata agtgcgtgcc caacagcaac 1920 gagagatact acggctacac tggggctttc cggtgcctgg ctgagaatgc tggagacgtt 1980 gcatttgtga aagatgtcac tgtcttgcag aacactgatg gaaataacaa tgaggcatgg 2040 gctaaggatt tgaagctggc agactttgcg ctgctgtgcc tcgatggcaa acggaagcct 2100 gtgactgagg ctagaagctg ccatcttgcc atggccccga atcatgccgt ggtgtctcgg 2160 atggataagg tggaacgcct gaaacaggtg ctgctccacc aacaggctaa atttgggaga 2220 aatggatctg actgcccgga caagttttgc ttattccagt ctgaaaccaa aaaccttctg 2280 ttcaatgaca acactgagtg tctggccaga ctccatggca aaacaacata tgaaaaatat 2340 ttgggaccac agtatgtcgc aggcattact aatctgaaaa agtgctcaac ctcccccctc 2400 ctggaagcct 
gtgaattcct caggaagtaa aaccgaagaa gatggcccag ctccccaaga 2460 aagcctcagc cattcactgc ccccagctct tctccccagg tgtgttgggg ccttggctcc 2520 cctgctgaag gtggggattg cccatccatc tgcttacaat tccctgctgt cgtcttagca 2580 agaagtaaaa tgagaaattt tgttgatatt caaaaaaaa 2619 15 892 DNA Homo sapiens any n or Xaa = unknown 15 tcttgaccgg cacacacagc tcgcttcttc actttctttt ccatccactg ccggacccaa 60 gccagccttc cagggagcag ccatgcctta cctctaccgg gccccagggc ctcaggcaca 120 cccggttccc aaggacgccc ggatcaccca ctcctcaggc cagarctttg arcaaatgaa 180 gcaggartgc ctgcagarar gcaccctgtt tgaggatgca gacttcccag ccagcaattc 240 ctccctgttc tacagtgaga ggccgcagat cccctttgtg tggaaacgac cargggaaat 300 cgtgaaaaac ccaraattca ttcttggagg ggccaccagg actgatatct gccagggaga 360 gctgggagac tgctggctat tagccgccat cgcctccctt acgcttaatc aaaaagcact 420 ggccagagtc atcccccagg accaaagctt tggccctggt tatgccggga tattccattt 480 ccagttctgg cagcacagtg agtggctgga cgtggtgatc gatgaccgcc tgcccacctt 540 cagggaccgc ttggttttcc tccactctgc cgaccacaac garttctgga rcgccttgct 600 ggaaaaagcc tacgccaagc taaatgggag ctatgaagct ctgaagggag gcagcgccat 660 cgaggccatg gaagacttca ctgggggtgt ggcagagacc ttccaaacta aagaggcccc 720 cgagaacttc tatgagattc tagagaaggc tttgaagana ngctccctgc tgggctgctt 780 cattgatacc agaagtgctg cagaatctga ggcccggacg ccgtttggtc ttattaaggg 840 tcatgcctac agtgtaacgg gaattgacca ggtaagcttc cgaggccaga ga 892 16 508 DNA Homo sapiens any n or Xaa = unknown 16 tggagaatgc gagccgggtg ttccaggctc tcagtacaaa gaacanggag ttcattcatn 60 tcaatataaa ngagttcatc cattngacaa tgaacatctg aggctgcntt gtagagatgc 120 agcctgccca gntgaatctg ggnttctgga cctngacctt cagaanttct cttggtgtgg 180 aaccattacg cccagggttc actcccctct catcgtccgg ccttctccct tcatcttgat 240 ctgggaagaa tgaaatgaac tcagctacac tctctgattt tgtgctactc ctttgtaaag 300 tcactgcctt aagggggctg atggcgccac ctgtgcctta catccaggtt caggcatcac 360 tagctttccc acactctact ttccttattt ccttccatta agaattactc agagttctaa 420 cgcacagaat cctgacttcc atgtagctcc agtcattgtg atcagacatc ctttataaaa 480 catgttttta taaatgtgta tgtggaat 508 17 194 PRT 
Homo sapiens any n or Xaa = unknown 17 Ser Val His Cys Phe Arg Glu Asp Lys Met Lys Phe Thr Ile Val Phe 1 5 10 15 Ala Gly Leu Leu Gly Val Phe Leu Ala Pro Ala Leu Ala Asn Tyr Asn 20 25 30 Ile Asn Val Asn Asp Asp Asn Asn Asn Ala Gly Ser Gly Gln Gln Ser 35 40 45 Val Ser Val Asn Asn Glu His Asn Val Ala Asn Val Asp Asn Asn Asn 50 55 60 Gly Trp Asp Ser Trp Asn Ser Ile Trp Asp Tyr Gly Asn Gly Phe Ala 65 70 75 80 Ala Thr Arg Leu Phe Gln Lys Lys Thr Cys Ile Val His Lys Met Asn 85 90 95 Lys Glu Val Met Pro Ser Ile Gln Ser Leu Asp Ala Leu Val Lys Glu 100 105 110 Lys Lys Leu Gln Gly Lys Gly Pro Gly Gly Pro Pro Pro Lys Gly Leu 115 120 125 Met Tyr Ser Val Asn Pro Asn Lys Val Asp Asp Leu Ser Lys Phe Gly 130 135 140 Lys Asn Ile Ala Asn Met Cys Arg Gly Ile Pro Thr Tyr Met Ala Glu 145 150 155 160 Glu Met Gln Glu Ala Ser Leu Phe Phe Tyr Ser Gly Thr Cys Tyr Thr 165 170 175 Thr Ser Val Leu Trp Ile Val Asp Ile Ser Phe Cys Gly Asp Thr Val 180 185 190 Glu Asn 18 51 PRT Homo sapiens any n or Xaa = unknown 18 Met Val Asp Asp Lys Arg Lys Ser Ala Leu Trp Lys Glu Arg Thr
Val 1 5 10 15 Ser Thr Arg Val Lys Ser Met Asn Ala Ser Ile Glu Arg Thr Arg Gly 20 25 30 Asn Ile Pro Ser Thr Gly Leu His Thr Cys Ile Tyr Ile Leu Glu Asn 35 40 45 Thr Ala Met 50 19 63 PRT Homo sapiens any n or Xaa = unknown 19 Met Gly Gln Gln His Gln Tyr Tyr Leu Gly Ala Tyr Arg Asn Ala Glu 1 5 10 15 Phe Gln Ala His Cys Arg Ser Thr Glu Ser Lys Ser Ser Phe Ser Lys 20 25 30 Ile Ser Gln Thr Ile Ser Thr Gly Leu His Pro Phe Tyr Pro Ser Leu 35 40 45 Ala Ile Arg Asp Val Arg Ser Glu Gly Phe Lys Arg Ser Pro Glu 50 55 60 20 20 DNA Artificial Sequence any n or Xaa = unknown 20 tctttgctgg acttcttgga 20 21 20 DNA Artificial Sequence any n or Xaa = unknown 21 ctttgtttgg gttgactgag 20 22 20 DNA Artificial Sequence any n or Xaa = unknown 22 caccctcatt acatcatcag 20 23 20 DNA Artificial Sequence any n or Xaa = unknown 23 attccttgtg tcttctggta 20 24 21 DNA Artificial Sequence any n or Xaa = unknown 24 cagtcctact tctcctatct c 21 25 21 DNA Artificial Sequence any n or Xaa = unknown 25 atcatagctc agaccatacc t 21 26 21 DNA Artificial Sequence any n or Xaa = unknown 26 gatcctgcag gactacaaat c 21 27 20 DNA Artificial Sequence any n or Xaa = unknown 27 gcctatatag aaaaatgaag 20 28 21 DNA Artificial Sequence any n or Xaa = unknown 28 cacctagtga ccgttccaga t 21 29 21 DNA Artificial Sequence any n or Xaa = unknown 29 ttcatctcct tgggtgttat t 21 30 21 DNA Artificial Sequence any n or Xaa = unknown 30 ctcagacgct caggaaatag a 21 31 21 DNA Artificial Sequence any n or Xaa = unknown 31 aatgggggaa gtatgtagga g 21 32 21 DNA Artificial Sequence any n or Xaa = unknown 32 ttacggatca tttctctact c 21 33 21 DNA Artificial Sequence any n or Xaa = unknown 33 agggcaagat gaagtgaaag g 21 34 21 DNA Artificial Sequence any n or Xaa = unknown 34 tccggaaaga agagcgagag a 21 35 21 DNA Artificial Sequence any n or Xaa = unknown 35 tgaaacacaa ctaccccaat g 21 36 20 DNA Artificial Sequence any n or Xaa = unknown 36 atagcaaagg taaactctca 20 37 20 DNA Artificial Sequence any n or Xaa = unknown 37 tcaatcagta gttcccagta 20 38 20 
DNA Artificial Sequence any n or Xaa = unknown 38 ttaacagccc aatatctaca 20 39 20 DNA Artificial Sequence any n or Xaa = unknown 39 gaacaagtga ttatgctacc 20 40 20 DNA Artificial Sequence any n or Xaa = unknown 40 agaataagca acttggaaaa 20 41 20 DNA Artificial Sequence any n or Xaa = unknown 41 tgaatctgat gactatgtgc 20 42 20 DNA Artificial Sequence any n or Xaa = unknown 42 tcctggatac cttttggacc 20 43 19 DNA Artificial Sequence any n or Xaa = unknown 43 catcagggct acaaggaaa 19 44 21 DNA Artificial Sequence any n or Xaa = unknown 44 cagatctacc gaatcaaaat c 21 45 21 DNA Artificial Sequence any n or Xaa = unknown 45 accagaatta ggaataagga t 21 46 20 DNA Artificial Sequence any n or Xaa = unknown 46 gactccatgg caaaacaaca 20 47 20 DNA Artificial Sequence any n or Xaa = unknown 47 tcttcttcgg ttttacttcc 20 48 20 DNA Artificial Sequence any n or Xaa = unknown 48 aggcaccagg gcgtgatggt 20 49 20 DNA Artificial Sequence any n or Xaa = unknown 49 ggtctcaaac atgatctggg 20 50 10 DNA Artificial Sequence any n or Xaa = unknown 50 cttgattgcc 10 51 10 DNA Artificial Sequence any n or Xaa = unknown 51 aggtgaccgt 10 52 10 DNA Artificial Sequence any n or Xaa = unknown 52 gttgcgatcc 10 53 10 DNA Artificial Sequence any n or Xaa = unknown 53 ctgatccatg 10 54 10 DNA Artificial Sequence any n or Xaa = unknown 54 ctgcttgatg 10 55 10 DNA Artificial Sequence any n or Xaa = unknown 55 gatctgactg 10 56 13 DNA Artificial Sequence any n or Xaa = unknown 56 tttttttttt taa 13 57 13 DNA Artificial Sequence any n or Xaa = unknown 57 tttttttttt tac 13 58 13 DNA Artificial Sequence any n or Xaa = unknown 58 tttttttttt tag 13 59 13 DNA Artificial Sequence any n or Xaa = unknown 59 tttttttttt tca 13 60 13 DNA Artificial Sequence any n or Xaa = unknown 60 tttttttttt tcc 13 61 13 DNA Artificial Sequence any n or Xaa = unknown 61 tttttttttt tcg 13 62 13 DNA Artificial Sequence any n or Xaa = unknown 62 tttttttttt tga 13 63 13 DNA Artificial Sequence any n or Xaa = unknown 63 tttttttttt tgc 13 64 13 DNA 
Artificial Sequence any n or Xaa = unknown 64 tttttttttt tgg 13 65 264 DNA Artificial Sequence any n or Xaa = unknown 65 aggcaccagg gcgtgatggt gggcatgggt cagaaggatt cctatgtggg cgacgaggcc 60 cagagcaaga gaggcatcct caccctgaag taccccatcg agcacggcat cgtcaccaac 120 tgggacgaca tggagaaaat ctggcaccac accttctaca atgagctgcg tgtggctccc 180 gaggagcacc ccgtgctgct gaccgaggcc cccctgaacc ccaaggccaa ccgcgagaag 240 atgacccaga tcatgtttga gacc 264 66 814 DNA Homo sapiens any n or Xaa = unknown 66 ataacaccta gtttgagtca acctggttaa gtacaaatat gagaaggctt ctcattcagg 60 tccatgcttg cctactcctc tgtccactgc tttcgtgaag acaagatgaa gttcacaatt 120 gtctttgctg gacttcttgg agtctttcta gctcctgccc ttgctaacta taatatcaac 180 gtcaatgatg acaacaacaa tgctggaagt gggcagcagt cagtgagtgt caacaatgaa 240 cacaatgtgg ccaatgttga caataacaac ggatgggact cctggaattc catctgggat 300 tatggaaatg gctttgctgc aaccagactc tttcaaaaga agacatgcat tgtgcacaaa 360 atgaacaagg aagtcatgcc ctccattcaa tcccttgatg cactggtcaa ggaaaagaag 420 cttcagggta agggaccagg aggaccacct cccaagggcc tgatgtactc agtcaaccca 480 aacaaagtcg atgacctgag caagttcgga aaaaacattg caaacatgtg tcgtgggatt 540 ccaacataca tggctgagga gatgcaagag gcaagcctgt ttttttactc aggaacgtgc 600 tacacgacca gtgtactatg gattgtggac atttccttct gtggagacac ggtggagaac 660 taaacaattt tttaaagcca ctatggattt agtcgtctga atatgctgtg cagaaaaaat 720 atgggctcca gtggttttta ccatgtcatt ctgaaatttt tctctactag ttatgtttga 780 tttctttaag tttcaataaa atcatttagc attg 814 67 4646 DNA Homo sapiens any n or Xaa = unknown 67 tatgtgccag gtgctctgtt gggtgccaag tgaaatgcaa ataaatggga acagtactca 60 gttcagtttg ctttgggaat taattacatg ccatgtgtgt aaattgtgct aaattttagg 120 aatacagaaa tgaattaaac gtctccaggg aacacatagt ctagtgaaga agctgacaag 180 tgaaaagaga ggatggagta aaggatttct ggatgccaat gaaaaactac tcgattcttg 240 tatactttca tatgtaagaa tttcaagtag caaaaagtca tctgggccct tagaatagca 300 tattttgaag ataataagaa ggaagtcact aagaaatgct ctcaggatct agaatagaat 360 tggtatagga aagaggaggc caagcggact tacagacagg gagtaaaaac cctgattcat 420 ctgggtaaca tatgccactg 
cagatattac tgtcattttt atacaaagtt tctaaatgtg 480 gcagagcaac cagagtgaaa gaggtcgggc caactgatga tgaacacaac aaaggaaatt 540 tctcagagta ctggaaggta gataaagaag agtttatgtt tattatatat ctactgccca 600 gaaaaaaatt ttaagtactc attcataaag taaataaagg cacataggta tgccattgac 660 acagaatggc ataatatcac tgggattgag ccaaccagca cttccaaaag ttgtcagttt 720 tatttaagct aatgtattat tattctaata attccaataa tatatttttt aatgctcttt 780 ctctgaaaaa ttttcccttt tccagataat gtcggtgctg gaggctgtgc aaaggctggg 840 ctcctgggca tcttgggaat ttcaatctgt gcagacattc atgtttagga tgattagccc 900 tcttgtttta tcttttcaaa gaaatacatc cttggtttac actcaaaagt caaattaaat 960 tctttcccaa tgccccaact aattttgaga ttcagtcaga aaatataaat gctgtattta 1020 tagatttttt ggtgtntgtt gttttttgta agcagcaaag ggaatccaag caatgtcttt 1080 gtcactatat agaataaaaa aaattgccag aattttaaat aaggtgcata atgtgtgaaa 1140 attcccagat aataccactg ggtcacatgt ggactagtca gctggggtcg aatttccatt 1200 tcttcgtntg ccctctggac cagcttccca tctaaccatc caaatatatg ggagcaacct 1260 gggtagagaa gaggctcaca cggtggtggc cttgacctgg ccaggggagg gacatagcgt 1320 atgcttatca aacaagttga atgctcaggt gaaggctttt agggccattc atatgagtta 1380 aaatgtcctt taactcacca aagcagtaga ctcaacctga ataaacttta taataatatg 1440 tgttgccctg gagtgagaag ggagaaaggg agagaggaag gagcacctaa catccaggaa 1500 aagatgcacc atactgaaga tcataacagg agtgaaagac tagaaatgcc aagtcaatac 1560 atagcagaaa agcaacttcc aatatttcaa ataaattgca cattgtgtac aaatctcaga 1620 tcgtgaagct gggtcacacg tgaacgttcg gctgaatgca aattcagagc aaagaggaat 1680 tactttaata acaatttatt ctcttgccgt agacctctgg gatcctagct gcagaggacc 1740 cccggcctcc gcgtttgagc tgacatgaga ctctcactag agattagatg gagaaagggc 1800 tccagcaggc acggagctgg aagctttgtc tgtgagacag ctccgcggga gcactcatcc 1860 cccagggctc tctgtctccc tctgagaggc tctggcccca tntaaccacc agaatgggag 1920 aagaagtgct tccccgtggg attagggcac atctgtcccg caggcccacc tgcctgccag 1980 tccctcccag gattcctgcc tggccacccc acaggagtgt gtacacagtg cagcctcagc 2040 tgctcagcat gggtgctttg ctccacttga gtgcattccg gcagcgtggg agctgtttga 2100 atcccccagt gcacacagat cccaacccca 
agggtccagg ggagggagct gtgagcagat 2160 ccggacgtcc cagggctgtg gctccggagt gcggaactgg gcccagtgct tcagcagaag 2220 aggagcccat actctcagaa aactctcaga gaggggtgag tngnacaggt tcctgggctg 2280 gtgtggaacc tangcgtgcc tncctncaca gagctggtcc agtaagtgtg gggcctgtct 2340 ccctgctgga cctctgcctg aaggagccca acgacctgga acacctaaca acaacagaaa 2400 gtcncggcca cagtgccagt gatcaggggt ccctcccctc aagaccgagg aggagacctg 2460 gtgaggggtc acccctctcc cccttgcacc acagagcacg gcttcaaagg cccggataca 2520 caaaggagcc gggtggcaga atattagtct agctatctcc cattgctctc acgcgccatc 2580 tactggattt catcccaaac tacaacacga aaaactgcta attttcctgc ctgccaggcc 2640 gaggactgga attcaacaga ctgtttagag cctttgccct ctgaaaactt ccagaaatga 2700 agccaactga ctatattcag tttacaccag agttaaagga acgccaaccc tcccagatga 2760 gaaagaatca gtgcaagaac tgtagcaatt taaaaaacca gagcgtcccc ttacctccaa 2820 atgagcccac tagctccaca gcaattgttc ttaaccaatc tgaaatgatg agcatggaat 2880 tcagaatctg aatggcaatg aagcttatag atatccagga gaaagttgaa atgcaatcca 2940 aggaaaccaa gcaatccagt gaaatggttt aagagctgaa agataaaata ncaattttac 3000 aaaagaccca aactgagctt attgagttca aaaaagaatt tcataataca atcagaagta 3060 ttaatagcag aataggccaa gctgaggaaa gaatctcaga gcttgacccc tggttctttg 3120 aatcaactta gacaaaaata aagaaaaaag agttttaaga aatgaacaca atctcccaga 3180 aatatgagat tatgtwaaga gacaaaatct atgactcatt gccatccctg agagagaagg 3240 agagagaata agcaacttgg aaaatatatt tggggacata gcccacaaaa atttccctaa 3300 tctctctaga gaggttgaca tgtaaattca agaaatacag aagaccttgg ccagataata 3360 tacaagatga ccatccccaa ggcacatagt catcagattc accatggtca atgcaaaaga 3420 aaaaaatctt aaagacagct agggagaagg gtcaagtcac atgcagaagg actctcatta 3480 ggctggcagt ggacctctca gcagaaacct gacaagccag aagagatgga gggagagggg 3540 tctatttttg tcatccttaa agaaaaaaaa ttccaaccaa gagtctcata cactgccaaa 3600 ctaagcttcc taagtgaagg agaaataaaa accttctcag acaagcaaat gctgaaggaa 3660 ttcaactaga ccagcctaac aagaggtcct aagggagtgc tgaatatgga ctcaaaagaa 3720 taacacctgc taccacaaac actcacttaa gcacacagcc caacgacact ataggcaatt 3780 acacagtaag tctacataac aacacaatga caggatcaac 
atctcacaca tcaatactaa 3840 ccccgagtgt aaaggggcta aatgccccac ttaaaagaca tagagtgtca agcttgataa 3900 aaagacaaga tccaatcatc cactattttc aagagctcta tgttatgtgt aatgacaccc 3960 acagactcaa agacttggag aaagatttat catgcaaaat cagaaaacaa aaaagagcag 4020 gagtcactag ttttatatca gacaaaacag actttaaacc cttaataatt aagaaagaca 4080 aagaagggta tttcctggac cacagaaggc ttattggaaa aaaggacata atgacaaagg 4140 gtacaatcca acaagaagtt ttaactattc taaatatata cacacccaac attggagcac 4200 ccagatttat aaaacaagta cttctcgatc tacaagaaga cttagacagc cacacaataa 4260 tagtgggaga ctttcacatc ctacttacag atcattgaga cagaaaacta ataaaagaac 4320 tctggactta aacttgttac ttgaccaatt ggacctaata gatatccaca gaaaacttca 4380 cccaacaaag acagaatata cattcttctt atctgcacat ggaacacatt ccaagatcaa 4440 tcacatgcta ggtaagaaag caagtctcaa taaattaaaa aaaattgaaa tcatacgaac 4500 cttaatatca gaccacaatg taattaaaaa taaatcaata tcaagaagat ctcatacata 4560 aatacatgaa aattaaacaa cttactcctg aataactctt gtgtgaacat caaaattcag 4620 gaagaaataa aaaattattt gaaatt 4646 68 2484 DNA Homo sapiens any n or Xaa = unknown 68 tcttgaccgg cacacacagc tcgcttcttc actttctttt ccatccactg ccggacccaa 60 gccagccttc cagggagcag ccatgcctta cctctaccgg gccccagggc ctcaggcaca 120 cccggttccc aaggacgccc ggatcaccca ctcctcaggc cagagctttg agcaaatgag 180 gcaggagtgc ctgcagagag gcaccctgtt tgaggatgca gacttcccag ccagcaattc 240 ctccctgttc tacagtgaga ggccgcagat cccctttgtg tggaaacgac caggggaaat 300 cgtgaaaaac ccagaattca ttcttggagg ggccaccagg actgatatct gccagggaga 360 gctgggagac tgctggctat tagccgccat cgcctccctt acgcttaatc aaaaagcact 420 ggccagagtc atcccccagg accaaagctt tggccctggt tatgccggga tattccattt 480 ccagttctgg cagcacagtg agtggctgga cgtggtgatc gatgaccgcc tgcccacctt 540 cagggaccgc ttggttttcc tccactctgc cgaccacaac gagttctgga gcgccttgct 600 ggaaaaagcc tacgccaagc taaatgggag ctatgaagct ctgaagggag gcagcgccat 660 cgaggccatg gaagacttca ctgggggtgt ggcagagacc ttccaaacta aagaggcccc 720 cgagaacttc tatgagattc tagagaaggc tttgaagaga ggctccctgc tgggctgctt 780 cattgatacc agaagtgctg cagaatctga ggcccggacg ccgtttggtc 
ttattaaggg 840 tcatgcctac agtgtaacgg gaattgacca ggtaagcttc cgaggccaga gaatcgagct 900 catccgaatc cggaaccctt ggggccaggt tgagtggaac gggtcgtgga gcgacaggat 960 ggcatttaag gacttcaagg cccactttga taaagtggag atctgcaacc tcactcccga 1020 tgccctggag gaagacgcga tccacaaatg ggaggtgacg gtccatcagg gaagctgggt 1080 tcgcggctcc acggctgggg gctgccgcaa tttcctggat accttttgga ccaatccaca 1140 aataaaattg tctctgactg agaaagatga ggggcaggag gagtgtagtt tccttgtagc 1200 cctgatgcag aaagatagaa ggaaactcaa gagatttggt gccaatgtgc tgacaatcgg 1260 ctatgccatt tatgagtgcc ctgacaaaga cgaacacctg aacaaagact tcttcagata 1320 ccacgcttct cgggccagaa gcaagacgtt catcaacctg agagaagtct ccgaccggtt 1380 caagctgccc cctggggagt acatcctgat tcccagcact tttgagcccc accaggaagc 1440 tgatttctgt ctgagaatct tttcagagaa aaaagccatt acccgggata tggatggaaa 1500 tgtagacatt gaccttcctg agcctccaaa gccaactcca cctgaccagg agacagagga 1560 ggagcagcgg tttcgggctc tgtttgaaca agtcgctggt gaggacatgg aggtgacagc 1620 agaggaactt gagtatgttt taaatgctgt gctgcaaaag aaaaaggaca tcaaattcaa 1680 gaagctaagc ctgatctcct gtaaaaacat catttccctg atggacacca gcggcaatgg 1740 gaagctggag tttgatgaat tcaaagtgtt ctgggacaag ctgaagcagt ggattaacct 1800 tttccttcgg tttgatgctg acaagtccgg caccatgtct acctatgaac tacggactgc 1860 actgaaagct gcaggctttc agctgagcag ccacctcctg cagctgattg tgctcaggta 1920 tgcggatgag gagctccagc tggacttcga tgacttcctc aactgcctgg tccggctgga 1980 gaatgcgagc cgggtgttcc aggctctcag tacaaagaac aaggagttca ttcatctcaa 2040 tataaatgag ttcatccatt tgacaatgaa catctgaggc tgccttgtag agatgcagcc 2100 tgcccagctg aatcttggct tctggacctt gaccttcaga acttctcttg gtgtggaacc 2160 attacgccca gggttcactc ccctctcatc gtccggcctt ctcccttcat cttgatctgg 2220 gaagaatgaa atgaactcag ctacactctc tgattttgtg ctactccttt gtaaagtcac 2280 tgccttaagg gggctgatgg cgccacctgt gccttacatc caggttcagg catcactagc 2340 tttcccacac tctactttcc ttatttcctt ccattaagaa ttactcagag ttctaacgca 2400 cagaatcctg acttccatgt agctccagtc attgtgatca gacatccttt ataaaacatg 2460 tttttataaa tgtgtatgtg gaat 2484 69 199 PRT Homo sapiens any n or Xaa 
= unknown 69 Met Leu Ala Tyr Ser Ser Val His Cys Phe Arg Glu Asp Lys Met Lys 1 5 10 15 Phe Thr Ile Val Phe Ala Gly Leu Leu Gly Val Phe Leu Ala Pro Ala 20 25 30 Leu Ala Asn Tyr Asn Ile Asn Val Asn Asp Asp Asn Asn Asn Ala Gly 35 40 45 Ser Gly Gln Gln Ser Val Ser Val Asn Asn Glu His Asn Val Ala Asn 50 55 60 Val Asp Asn Asn Asn Gly Trp Asp Ser Trp Asn Ser Ile Trp Asp Tyr 65 70 75 80 Gly Asn Gly Phe Ala Ala Thr Arg Leu Phe Gln Lys Lys Thr Cys Ile 85 90 95 Val His Lys Met Asn Lys Glu Val Met Pro Ser Ile Gln Ser Leu Asp 100 105 110 Ala Leu Val Lys Glu Lys Lys Leu Gln Gly Lys Gly Pro Gly Gly Pro 115 120 125 Pro Pro Lys Gly Leu Met Tyr Ser Val Asn Pro Asn Lys Val Asp Asp 130 135 140 Leu Ser Lys Phe Gly Lys Asn Ile Ala Asn Met Cys Arg Gly Ile Pro
145 150 155 160 Thr Tyr Met Ala Glu Glu Met Gln Glu Ala Ser Leu Phe Phe Tyr Ser 165 170 175 Gly Thr Cys Tyr Thr Thr Ser Val Leu Trp Ile Val Asp Ile Ser Phe 180 185 190 Cys Gly Asp Thr Val Glu Asn 195 70 664 PRT Homo sapiens any n or Xaa = unknown 70 Met Pro Tyr Leu Tyr Arg Ala Pro Gly Pro Gln Ala His Pro Val Pro 1 5 10 15 Lys Asp Ala Arg Ile Thr His Ser Ser Gly Gln Ser Phe Glu Gln Met 20 25 30 Arg Gln Glu Cys Leu Gln Arg Gly Thr Leu Phe Glu Asp Ala Asp Phe 35 40 45 Pro Ala Ser Asn Ser Ser Leu Phe Tyr Ser Glu Arg Pro Gln Ile Pro 50 55 60 Phe Val Trp Lys Arg Pro Gly Glu Ile Val Lys Asn Pro Glu Phe Ile 65 70 75 80 Leu Gly Gly Ala Thr Arg Thr Asp Ile Cys Gln Gly Glu Leu Gly Asp 85 90 95 Cys Trp Leu Leu Ala Ala Ile Ala Ser Leu Thr Leu Asn Gln Lys Ala 100 105 110 Leu Ala Arg Val Ile Pro Gln Asp Gln Ser Phe Gly Pro Gly Tyr Ala 115 120 125 Gly Ile Phe His Phe Gln Phe Trp Gln His Ser Glu Trp Leu Asp Val 130 135 140 Val Ile Asp Asp Arg Leu Pro Thr Phe Arg Asp Arg Leu Val Phe Leu 145 150 155 160 His Ser Ala Asp His Asn Glu Phe Trp Ser Ala Leu Leu Glu Lys Ala 165 170 175 Tyr Ala Lys Leu Asn Gly Ser Tyr Glu Ala Leu Lys Gly Gly Ser Ala 180 185 190 Ile Glu Ala Met Glu Asp Phe Thr Gly Gly Val Ala Glu Thr Phe Gln 195 200 205 Thr Lys Glu Ala Pro Glu Asn Phe Tyr Glu Ile Leu Glu Lys Ala Leu 210 215 220 Lys Arg Gly Ser Leu Leu Gly Cys Phe Ile Asp Thr Arg Ser Ala Ala 225 230 235 240 Glu Ser Glu Ala Arg Thr Pro Phe Gly Leu Ile Lys Gly His Ala Tyr 245 250 255 Ser Val Thr Gly Ile Asp Gln Val Ser Phe Arg Gly Gln Arg Ile Glu 260 265 270 Leu Ile Arg Ile Arg Asn Pro Trp Gly Gln Val Glu Trp Asn Gly Ser 275 280 285 Trp Ser Asp Arg Met Ala Phe Lys Asp Phe Lys Ala His Phe Asp Lys 290 295 300 Val Glu Ile Cys Asn Leu Thr Pro Asp Ala Leu Glu Glu Asp Ala Ile 305 310 315 320 His Lys Trp Glu Val Thr Val His Gln Gly Ser Trp Val Arg Gly Ser 325 330 335 Thr Ala Gly Gly Cys Arg Asn Phe Leu Asp Thr Phe Trp Thr Asn Pro 340 345 350 Gln Ile Lys Leu Ser Leu Thr Glu Lys Asp Glu Gly Gln Glu Glu Cys 355 360 365 Ser 
Phe Leu Val Ala Leu Met Gln Lys Asp Arg Arg Lys Leu Lys Arg 370 375 380 Phe Gly Ala Asn Val Leu Thr Ile Gly Tyr Ala Ile Tyr Glu Cys Pro 385 390 395 400 Asp Lys Asp Glu His Leu Asn Lys Asp Phe Phe Arg Tyr His Ala Ser 405 410 415 Arg Ala Arg Ser Lys Thr Phe Ile Asn Leu Arg Glu Val Ser Asp Arg 420 425 430 Phe Lys Leu Pro Pro Gly Glu Tyr Ile Leu Ile Pro Ser Thr Phe Glu 435 440 445 Pro His Gln Glu Ala Asp Phe Cys Leu Arg Ile Phe Ser Glu Lys Lys 450 455 460 Ala Ile Thr Arg Asp Met Asp Gly Asn Val Asp Ile Asp Leu Pro Glu 465 470 475 480 Pro Pro Lys Pro Thr Pro Pro Asp Gln Glu Thr Glu Glu Glu Gln Arg 485 490 495 Phe Arg Ala Leu Phe Glu Gln Val Ala Gly Glu Asp Met Glu Val Thr 500 505 510 Ala Glu Glu Leu Glu Tyr Val Leu Asn Ala Val Leu Gln Lys Lys Lys 515 520 525 Asp Ile Lys Phe Lys Lys Leu Ser Leu Ile Ser Cys Lys Asn Ile Ile 530 535 540 Ser Leu Met Asp Thr Ser Gly Asn Gly Lys Leu Glu Phe Asp Glu Phe 545 550 555 560 Lys Val Phe Trp Asp Lys Leu Lys Gln Trp Ile Asn Leu Phe Leu Arg 565 570 575 Phe Asp Ala Asp Lys Ser Gly Thr Met Ser Thr Tyr Glu Leu Arg Thr 580 585 590 Ala Leu Lys Ala Ala Gly Phe Gln Leu Ser Ser His Leu Leu Gln Leu 595 600 605 Ile Val Leu Arg Tyr Ala Asp Glu Glu Leu Gln Leu Asp Phe Asp Asp 610 615 620 Phe Leu Asn Cys Leu Val Arg Leu Glu Asn Ala Ser Arg Val Phe Gln 625 630 635 640 Ala Leu Ser Thr Lys Asn Lys Glu Phe Ile His Leu Asn Ile Asn Glu 645 650 655 Phe Ile His Leu Thr Met Asn Ile 660
