Composition for detection of genes encoding membrane-associated proteins Reddy, Roopa ; et al. [Incyte Genomics, Inc.]

Patent Application Summary

U.S. patent application number 10/313542 was filed with the patent office on 2002-12-05 and published on 2003-06-26 for composition for detection of genes encoding membrane-associated proteins. This patent application is currently assigned to Incyte Genomics, Inc. Invention is credited to Janice Au-Young, Karl J. Guegler, and Roopa Reddy.

Application Number	20030120057 10/313542
Family ID	26816220
Filed Date	2002-12-05

United States Patent Application	20030120057
Kind Code	A1
Reddy, Roopa ; et al.	June 26, 2003

Composition for detection of genes encoding membrane-associated proteins

Abstract

The present invention relates to a composition comprising a plurality of polynucleotide sequences. The composition can be used as probes or array elements.

Inventors:	Reddy, Roopa; (Fremont, CA) ; Guegler, Karl J.; (Menlo Park, CA) ; Au-Young, Janice; (Brisbane, CA)
Correspondence Address:	INCYTE GENOMICS, INC. 3160 PORTER DRIVE PALO ALTO CA 94304 US
Assignee:	Incyte Genomics, Inc. Palo Alto CA
Family ID:	26816220
Appl. No.:	10/313542
Filed:	December 5, 2002

Related U.S. Patent Documents


Application Number	Filing Date	Patent Number
10313542	Dec 5, 2002
09495050	Jan 31, 2000	6492505
60118318	Feb 1, 1999

Current U.S. Class:	536/23.5 ; 536/24.3
Current CPC Class:	Y02A 50/30 20180101; C12Q 2600/158 20130101; C12Q 1/6876 20130101; C12Q 1/6834 20130101
Class at Publication:	536/23.5 ; 536/24.3
International Class:	C07H 021/04

Claims

What is claimed is:

1. A composition comprising a plurality of polynucleotide sequences comprising at least a fragment of a sequence selected from the group consisting of SEQ ID NOs:1-305.

2. The composition of claim 1, wherein each of said polynucleotide sequences comprises at least a fragment of a sequence encoding a membrane-associated protein.

3. The composition of claim 1, wherein each of said polynucleotide sequences comprises at least a fragment of a sequence encoding a receptor.

4. The composition of claim 1, wherein each of said polynucleotide sequences comprises at least a fragment of a sequence encoding an ion channel.

5. The composition of claim 1, wherein each of said polynucleotide sequences comprises at least a fragment of a sequence selected from SEQ ID NOs:1-288.

6. The composition of claim 1, wherein said polynucleotide sequences comprise at least a fragment of a sequence selected from SEQ ID NOs:289-294.

7. The composition of claim 1, wherein said polynucleotide sequences comprise at least a fragment of a sequence selected from SEQ ID NOs:295-305.

8. The composition of claim 7, wherein the fragment is selected from the group consisting of: (a) SEQ ID NOs:295-297; or (b) SEQ ID NOs:298-305.

9. The composition of claim 1, wherein the polynucleotide sequence is a probe.

10. The composition of claim 9, wherein the polynucleotide sequence is immobilized on a substrate.

11. The composition of claim 1, wherein the polynucleotide sequence is an array element.

Description

[0001] This application claims the benefit of U.S. Provisional Application No. 60/118,318 (our Docket No. PA-0013 P), filed on Feb. 1, 1999.

FIELD OF THE INVENTION

[0002] The present invention relates to a composition comprising a plurality of polynucleotide sequences for use in research and diagnostic applications.

BACKGROUND OF THE INVENTION

[0003] DNA-based arrays can provide a simple way to explore the expression of a single polymorphic gene or a large number of genes. When the expression of a single gene is explored, DNA-based arrays are employed to detect the expression of specific gene variants. For example, a p53 tumor suppressor gene array may be used to determine whether individuals are carrying mutations that predispose them to cancer. The array has over 50,000 DNA targets to analyze more than 400 distinct mutations of p53. A cytochrome p450 gene array is useful to determine whether individuals have one of a number of specific mutations that could result in increased drug metabolism, drug resistance, or drug toxicity.

[0004] DNA-based array technology is especially suited to rapidly screening the expression of a large number of genes. There is a growing awareness that gene expression is affected in a global fashion and that genetic predisposition, disease, or therapeutic treatment may affect, directly or indirectly, the expression of a large number of genes. In some cases the interactions may be expected, such as where the genes are part of the same signaling pathway. In other cases, such as when some of the genes participate in separate signaling pathways, the interactions may be totally unexpected. Therefore, DNA-based arrays can be used to investigate how genetic predisposition, disease, or therapeutic treatment affects the coregulation and expression of a large number of genes.

[0005] It would be advantageous to prepare DNA-based arrays that can be used for monitoring the expression of a large number of membrane-associated proteins. Proteins which span or are associated with cell membranes include receptors, ion channels and symporters, cytokines and their suppressors, monomeric or heterotrimeric G- and ras-related proteins, lectins such as selectin, oncogenes and their suppressors, and the like. Receptors include G protein coupled, four transmembrane, and tyrosine kinase receptors. Some of these proteins may span a cellular membrane and some may be secreted. The secreted proteins typically include signal sequences that direct them to their final cellular or extracellular destination.

[0006] The present invention provides for a composition comprising a plurality of polynucleotide sequences for use in detecting changes in expression of a large number of genes encoding proteins which are membrane-associated proteins, receptors and ion channels. Such a composition can be employed for the diagnosis or treatment of any disease--a pancreatic disease, a cancer, an immunopathology, a neuropathology and the like--where a defect in the expression of a gene encoding membrane-associated proteins is involved.

SUMMARY OF THE INVENTION

[0007] In one aspect, the present invention provides a composition comprising a plurality of polynucleotide sequences, wherein each of said polynucleotide sequences comprises at least a fragment of a gene encoding membrane-associated proteins, receptors and ion channels.

[0008] In one preferred embodiment, the plurality of polynucleotide sequences comprises at least a fragment of one or more of the sequences, SEQ ID NOs:1-305, presented in the Sequence Listing. In a second preferred embodiment, the composition comprises a plurality of polynucleotide sequences comprising at least a fragment of a gene encoding a membrane-associated protein. In a third preferred embodiment, the composition comprises a plurality of polynucleotide sequences comprising at least a fragment of a gene encoding a receptor. In a fourth preferred embodiment, the composition comprises a plurality of polynucleotide sequences comprising at least a fragment of a gene encoding ion channels. In a fifth preferred embodiment, the composition comprises a plurality of polynucleotide sequences comprising at least a fragment of at least one or more of the sequences of SEQ ID NOs:1-288. In a sixth preferred embodiment, the composition comprises a plurality of polynucleotide sequences comprising at least a fragment of at least one or more of the sequences of SEQ ID NOs:289-294. In a seventh preferred embodiment, the composition comprises a plurality of polynucleotide sequences comprising at least a fragment of at least one or more of the sequences of SEQ ID NOs:295-305. In one aspect, the fragment is selected from the group consisting of SEQ ID NOs:295-297, or SEQ ID NOs:298-305. In an eighth preferred embodiment, the composition is a polynucleotide probe. In one aspect, the composition is immobilized on a substrate. In a ninth preferred embodiment, the composition is a hybridizable array element.

[0009] The composition, a hybridizable array element, is useful to monitor the expression of a plurality of expressed polynucleotides. The microarray is used in the diagnosis and treatment of a pancreatic disease, a cancer, an immunopathology, a neuropathology, and the like.

[0010] In another aspect, the present invention provides an expression profile that can reflect the expression levels of a plurality of polynucleotide sequences in a sample. The expression profile comprises a microarray and a plurality of detectable complexes. Each detectable complex is formed by hybridization of at least one probe polynucleotide sequence to at least one target polynucleotide sequence and further comprises a labeling moiety for detection.

DESCRIPTION OF THE SEQUENCE LISTING, FIGURES, AND TABLES

[0011] A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.

[0012] The Sequence Listing is a compilation of nucleotide sequences obtained by sequencing and assembling clone inserts (isolates) from various cDNA libraries. Each sequence is identified by a sequence identification number (SEQ ID NO:) and by clone number.

[0013] FIGS. 1A and 1B are an alignment of SEQ ID NOs:298-302 produced using GELVIEW Fragment Assembly System software (Genetics Computer Group (GCG), Madison Wis.).

[0014] FIGS. 2A and 2B are an alignment of SEQ ID NOs:303-305 produced using GELVIEW Fragment Assembly System software (GCG).

[0015] Table 1 is a list of the sequences disclosed herein. By column, the table contains: 1) SEQ ID NO: as shown in the Sequence Listing; 2) Incyte Clone NO; 3) PRINT ID, designation of the relevant PROSITE group; 4) PRINT DESCRIPTION; 5) PRINT STRENGTH, the degree of correlation to the PROSITE group, >1300 is strong and 1000 to 1300 is weak; 6) PRINT SCORE, where >1300 is strong and 1000 to 1300 is suggestive; 7) TM, the presence of at least one transmembrane domain; and 8) SIGNAL PEPTIDE, the presence of a signal peptide. The table is arranged so that SEQ ID NOs:1-305 contain at least a fragment of a gene encoding a membrane-associated protein, some of which are receptors, and some, ion channels.
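The PRINT SCORE thresholds given for Table 1 can be illustrated with a minimal sketch. The function name and the "below threshold" label for scores under 1000 are assumptions made for illustration; the text above only defines the "strong" and "suggestive" ranges.

```python
def classify_print_score(score):
    """Bucket a PRINT SCORE using the thresholds stated for Table 1.

    >1300 is "strong"; 1000 to 1300 is "suggestive". The label returned for
    scores under 1000 is a hypothetical placeholder, not taken from the text.
    """
    if score > 1300:
        return "strong"
    if score >= 1000:
        return "suggestive"
    return "below threshold"


if __name__ == "__main__":
    for value in (1450, 1300, 1000, 950):
        print(value, classify_print_score(value))
```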

DESCRIPTION OF THE INVENTION

[0016] Definitions

[0017] The term "microarray" refers to an ordered arrangement of hybridizable array elements. The elements are arranged so that there are preferably at least one or more different elements, more preferably at least 100 elements, even more preferably at least 1,000 elements, and most preferably at least 10,000 elements on a 1 cm² substrate surface. The maximum number of array elements is unlimited, but is at least 100,000. Furthermore, the hybridization signal from each array element is individually distinguishable. In a preferred embodiment, the array elements comprise polynucleotide sequences.

[0018] A "polynucleotide" refers to a chain of nucleotides. Preferably, the chain has from about five to 10,000 nucleotides, more preferably from about 50 to 3,500 nucleotides. The term "probe" refers to a polynucleotide sequence capable of hybridizing with a target sequence to form a polynucleotide probe/target complex under hybridization conditions. A "target polynucleotide" refers to a chain of nucleotides to which a polynucleotide probe can hybridize by base pairing. In some instances, the sequences will be completely complementary (no mismatches) when aligned; in others, there may be up to a 10% mismatch.
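The mismatch tolerance described in the preceding paragraph can be sketched as a simple check. This is an illustration only: it assumes equal-length, ungapped DNA sequences written 5' to 3' and paired antiparallel, and the function names are hypothetical. Real hybridization behavior also depends on temperature, salt, and sequence context, none of which is modeled here.

```python
# Watson-Crick complements for DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def mismatch_fraction(probe, target):
    """Fraction of positions where the probe fails to base-pair with the target.

    Assumes both sequences have the same length and align without gaps.
    """
    if not probe or len(probe) != len(target):
        raise ValueError("probe and target must be non-empty and of equal length")
    expected = reverse_complement(target)
    mismatches = sum(p != e for p, e in zip(probe.upper(), expected))
    return mismatches / len(probe)


def within_tolerance(probe, target, max_mismatch=0.10):
    """True if the mismatch fraction does not exceed the 10% figure in the text."""
    return mismatch_fraction(probe, target) <= max_mismatch


if __name__ == "__main__":
    target = "ACGTACGTAC"
    print(mismatch_fraction("GTACGTACGT", target))  # fully complementary probe
    print(within_tolerance("GTACGTACAA", target))   # two mismatches in ten bases
```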

[0019] A "plurality" refers preferably to a group of at least one or more members, more preferably to a group of at least about 100, even more preferably to a group of at least about 1,000 members, and most preferably to a group of at least about 10,000 members. The maximum number of members is unlimited, but is at least about 100,000 members.

[0020] A "fragment" means a stretch of at least about 100 consecutive nucleotides. A "fragment" can also mean a stretch of at least about 100 consecutive nucleotides that contains one or more deletions, insertions or substitutions. A "fragment" can also include the entire open reading frame of a gene. Preferred fragments are those that lack secondary structure as identified by using computer software programs such as OLIGO 4.06 Primer Analysis software (National Biosciences, Plymouth Minn.), LASERGENE software (DNASTAR, Madison Wis.), MACDNASIS (Hitachi Software Engineering Co., Ltd., San Bruno Calif.) and the like.

[0021] The term "gene" or "genes" refers to a polynucleotide sequence which may be partial or complete and may comprise regulatory, untranslated, or coding regions. The phrase "genes encoding membrane-associated proteins, receptors, or ion channels" refers to genes comprising sequences that contain conserved protein motifs or domains that were identified by BLAST (Basic Local Alignment Search Tool; Altschul (1993) J Mol Evol 36:290-300; and Altschul et al. (1990) J Mol Biol 215:403-410), PRINTS, or other analytical tools. Additionally, "genes encoding membrane-associated proteins, receptors, or ion channels" refers to genes which may produce proteins which span the cell membrane or have signal sequences which direct them to their final cellular or extracellular destination.

[0022] The Invention

[0023] The present invention provides a composition comprising a plurality of polynucleotide sequences comprising at least a fragment of a gene encoding a protein which is a receptor, an ion channel, or associated with a cell membrane. Preferably, the plurality of polynucleotide sequences comprises at least a fragment of one or more of the sequences (SEQ ID NOs:1-305) presented in the Sequence Listing. In one preferred embodiment, the composition comprises a plurality of polynucleotide sequences, wherein each sequence comprises at least a fragment of a sequence selected from the group consisting of SEQ ID NOs:1-294. In a second preferred embodiment, the composition comprises a plurality of polynucleotide sequences, wherein each sequence comprises at least a fragment of a sequence selected from the group consisting of SEQ ID NOs:295-305.

[0024] A microarray can be used for large scale genetic or gene expression analysis of a large number of polynucleotide sequences. Such an analysis can be used in the diagnosis of diseases and in the monitoring of treatments where altered expression of genes encoding receptors, ion channels, or membrane-associated proteins causes disease, such as pancreatic disease, cancer, an immunopathology, a neuropathology, and the like. Further, the microarray can be employed to investigate an individual's predisposition to a disease, such as pancreatic disease, cancer, an immunopathology, or a neuropathology. Furthermore, the microarray can be employed to investigate cellular responses to infection, drug treatment, and the like.

[0025] When the composition of the invention is employed as hybridizable array elements in a microarray, the array elements are organized in an ordered fashion so that each element is present at a specified location on the substrate. Because the array elements are at specified locations on the substrate, the hybridization patterns and intensities (which together create a unique expression profile) can be interpreted in terms of expression levels of particular genes and can be correlated with a particular disease or condition or treatment.

[0026] The composition comprising a plurality of polynucleotide sequences can also be used to purify a subpopulation of mRNAs, cDNAs, genomic fragments, and the like, in a sample. Typically, samples will include polynucleotides of interest and additional nucleic acids which may contribute to background signal in a hybridization. Therefore, it may be advantageous to remove these additional nucleic acids before hybridization. One method for removing the additional nucleic acids is to hybridize the sample containing probe polynucleotides with immobilized polynucleotide targets. Those nucleic acids which do not hybridize to the polynucleotide targets are washed away. At a later point, the immobilized target polynucleotides can be released in the form of purified target polynucleotides.

[0027] Method for Selecting Polynucleotide Sequences

[0028] This section describes the selection of the plurality of polynucleotide sequences. In one embodiment, the sequences are selected based on the presence of shared signal sequence motifs. For example, signal sequences generally contain 15 to 60 amino acids and are located at the N-terminal end of the protein. The signal sequence consists of three regions: 1) the n-region, which is located adjacent to the N-terminus, is composed of one to five amino acids, and usually carries a positive charge; 2) the h-region, which is composed of 7 to 15 hydrophobic amino acids and creates a hydrophobic core; and 3) the c-region, which is located between the h-region and the cleavage site and is composed of three to seven polar, but mostly uncharged, amino acids. The signal sequence is removed from the protein during posttranslational processing by cleavage at the cleavage site.

[0029] A transmembrane protein is characterized by a polypeptide chain which is exposed on both sides of a membrane. The cytoplasmic and extracellular domains are separated by at least one membrane-spanning segment which traverses the hydrophobic environment of the lipid bilayer. The membrane-spanning segment is composed of amino acid residues with nonpolar side chains, usually in the form of an α helix. Segments which contain about 20-30 hydrophobic residues are long enough to span a membrane as an α helix, and they can often be identified by means of a hydropathy plot.
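A hydropathy plot of the kind mentioned above can be sketched as a sliding-window average over a per-residue hydrophobicity scale. The sketch below uses the Kyte-Doolittle scale, a 19-residue window, and a cutoff of 1.6; these are common conventions chosen here as assumptions, not parameters stated in this document, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Mean hydropathy of every window of the given length along the protein.

    Entry i covers residues i .. i + window - 1. Raises KeyError for
    residues outside the 20 standard amino acids.
    """
    seq = protein.upper()
    if window < 1 or window > len(seq):
        return []
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    total = sum(scores[:window])
    profile = [total / window]
    for i in range(window, len(scores)):
        # Slide the window one residue: add the new score, drop the oldest.
        total += scores[i] - scores[i - window]
        profile.append(total / window)
    return profile


def candidate_tm_windows(protein, window=19, cutoff=1.6):
    """Start positions of windows whose mean hydropathy exceeds the cutoff."""
    profile = hydropathy_profile(protein, window)
    return [i for i, value in enumerate(profile) if value > cutoff]


if __name__ == "__main__":
    # Toy sequence: a 20-residue leucine stretch flanked by charged residues.
    toy = "D" * 10 + "L" * 20 + "K" * 10
    print(candidate_tm_windows(toy))
```

Windows flagged this way are only candidates; a hydrophobic stretch can also be a signal sequence core or a buried region of a soluble protein.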

[0030] Receptor sequences are recognized by one or more hydrophobic transmembrane regions, cysteine disulfide bridges between extracellular loops, an extracellular N-terminus, and a cytoplasmic C-terminus. For example, in G protein-coupled receptors (GPCRs), the N-terminus interacts with ligands, the disulfide bridge interacts with agonists and antagonists, the second cytoplasmic loop has a conserved, acidic-Arg-aromatic triplet which may interact with the G proteins, and the large third intracellular loop interacts with G proteins to activate second messengers such as cyclic AMP, phospholipase C, inositol triphosphate, or ion channel proteins (Watson and Arkinstall (1994) The G-protein Linked Receptor Facts Book, Academic Press, San Diego Calif.). Other exemplary classes of receptors such as the tetraspanins (Maecker et al. (1997) FASEB J 11:428-442), calcium dependent receptors (Speiss (1990) Biochem 29:10009-18) and the single transmembrane receptors may be similarly characterized relative to their intracellular and extracellular domains, known motifs, and interactions with other molecules.

[0031] An ion channel is a transmembrane protein that forms a hydrophilic pore through which ions can cross the lipid bilayer of the membrane. An ion channel usually shows some degree of ion specificity, and up to a million ions per second may flow down their electrochemical gradients through the open pore. Ion channels are gated and allow ions to pass only under defined circumstances. Gated channels may be either voltage-gated, such as the sodium channel of neurons, or ligand-gated, such as the acetylcholine receptor of cholinergic synapses.

[0032] Membrane-associated proteins, receptors or ion channels may act directly as inhibitors or as stimulators of cell proliferation, growth, attachment, angiogenesis, and apoptosis, or indirectly by modulating the effects of transcription factors, matrix and adhesion molecules, cell cycle regulators, and other molecules in cell signaling pathways. In addition, cell signaling molecules may act as ligands or ligand cofactors for receptors which modulate cell growth, proliferation, and differentiation. These molecules may be identified by sequence homology to molecules whose function has been characterized, and by the identification of their conserved domains. Membrane-associated proteins, receptors or ion channels may be characterized using programs such as BLAST, PRINTS, or Hidden Markov Models (HMM). Fragments which include characterized, conserved regions of membrane-associated proteins, receptors, or ion channels may be used in hybridization technologies to identify similar proteins.

[0033] A large number of clones from a variety of cDNA libraries can be screened using software well known in the art to discover sequences with conserved protein domains or motifs. Such sequences may be screened using the BLOCK 2 Bioanalysis program (Incyte Pharmaceuticals, Palo Alto Calif.), a motif analysis program based on sequence information contained in the SWISS-PROT and PROSITE databases, which is useful for determining the function of uncharacterized proteins translated from genomic or cDNA sequences (Bairoch et al. (1997) Nucleic Acids Res 25:217-221; Attwood et al. (1997) J Chem Inf Comput Sci 37:417-424). PROSITE is particularly useful to identify functional or structural domains that cannot be detected using common motifs because of extreme sequence divergence. The method, which is based on weight matrices, calibrates the motifs against the SWISS-PROT database to obtain a measure of the chance distribution of the matches. Similarly, databases such as PRINTS store conserved motifs useful in the characterization of proteins (Attwood et al. (1998) Nucl Acids Res 26:304-308). These conserved motifs are used in the selection and design of probes. The PRINTS database can be searched using the BLIMPS search program. The PRINTS database of protein family "fingerprints" complements the PROSITE database and utilizes groups of conserved motifs within sequence alignments to build characteristic signatures of different polypeptide families. Alternatively, HMMs can be used to find shared motifs, specifically consensus sequences (Pearson and Lipman (1988) Proc Natl Acad Sci 85:2444-2448; Smith and Waterman (1981) J Mol Biol 147:195-197). Although HMMs were initially developed to examine speech recognition patterns, they have been used in biology to analyze protein and DNA sequences and to model protein structure. HMMs have a formal probabilistic basis and use position-specific scores for amino acids or nucleotides. The algorithms are flexible in that they incorporate information from newly identified sequences to build even more successful patterns. HMMs are useful to identify the transmembrane regions and signal peptides.

[0034] In another embodiment, the sequences disclosed in the Sequence Listing can be searched against GenBank and SWISSPROT databases using BLAST. Then, the descriptions of those sequences with homology to the disclosed sequences may be scanned using keywords such as receptor, transmembrane, channel, oncogene, inhibitor, and the like.
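The keyword scan of hit descriptions might look like the following minimal sketch. It assumes the BLAST search has already been run and its hit descriptions collected into a dictionary keyed by query identifier; that data layout, the identifiers, and the function names are hypothetical. Matching is plain case-insensitive substring search.

```python
# Keywords named in the text; the list can be extended as needed.
KEYWORDS = ("receptor", "transmembrane", "channel", "oncogene", "inhibitor")


def matching_keywords(description, keywords=KEYWORDS):
    """Keywords that occur (case-insensitively) in one hit description."""
    text = description.lower()
    return [kw for kw in keywords if kw in text]


def filter_hits(hits, keywords=KEYWORDS):
    """Keep queries with at least one hit description containing a keyword.

    hits maps a query identifier to a list of hit description strings.
    Returns a dict of query identifier -> sorted list of keywords found.
    """
    selected = {}
    for query_id, descriptions in hits.items():
        found = sorted(
            {kw for d in descriptions for kw in matching_keywords(d, keywords)}
        )
        if found:
            selected[query_id] = found
    return selected


if __name__ == "__main__":
    example_hits = {
        "query_1": ["Homo sapiens potassium channel subunit",
                    "G protein-coupled receptor 5"],
        "query_2": ["ribosomal protein L3"],
    }
    print(filter_hits(example_hits))
```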

[0035] Sequences identified by the methods described above are provided in SEQ ID NOs:1-305 in the Sequence Listing. Table 1 provides the annotation to the referenced PRINTS sequences and specifies whether they possess transmembrane and signal peptide motifs. The resulting composition can comprise polynucleotide sequences that are not redundant, i.e., there is no more than one polynucleotide sequence to represent a particular gene. Alternatively, the composition can contain polynucleotide probes or microarray elements that are redundant, i.e., a gene is represented by more than one polynucleotide sequence.

[0036] The selected polynucleotide sequences may be manipulated further to optimize their performance as hybridization probes. To optimize probe selection, the sequences are examined using computer algorithms, which are well known in the art, to identify fragments of genes without potential secondary structure. Such computer algorithms are found in OLIGO 4.06 Primer Analysis software (National Biosciences) or LASERGENE software (DNASTAR). These programs can search nucleotide sequences to identify stem loop structures and tandem repeats and to analyze G+C content of the sequence (those sequences with a G+C content greater than 60% are excluded). Alternatively, the probes can be optimized by trial and error. Experiments can be performed to determine whether the probes hybridize optimally to target sequences under experimental conditions.
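The G+C content exclusion described above can be sketched in a few lines. This does not reproduce the OLIGO or LASERGENE programs; it only illustrates the 60% cutoff, and the function names are hypothetical. Stem loop and tandem repeat detection are not covered.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    s = seq.upper()
    if not s:
        raise ValueError("sequence must not be empty")
    return (s.count("G") + s.count("C")) / len(s)


def passes_gc_filter(seq, max_gc=0.60):
    """True unless the G+C content is greater than the cutoff (60% in the text)."""
    return gc_fraction(seq) <= max_gc


if __name__ == "__main__":
    for candidate in ("ATGCAT", "GGGCCCATAT", "GGCCAT"):
        print(candidate, round(gc_fraction(candidate), 3), passes_gc_filter(candidate))
```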

[0037] Where the greatest numbers of different polynucleotide sequences are desired, the sequences are extended to assure that different polynucleotide sequences are not derived from the same gene, i.e., the polynucleotide sequences are not redundant. The probe sequences may be extended utilizing the partial nucleotide sequences derived from clone isolates by employing methods well known in the art. For example, one method which may be employed, "restriction-site" PCR, uses universal primers to retrieve unknown sequence adjacent to a known locus (Sarkar (1993) PCR Methods Applic 2:318-322).

[0038] Polynucleotide Sequences

[0039] This section describes the polynucleotide sequences. The polynucleotide sequences can be genomic DNA, cDNA, mRNA, or any RNA-like or DNA-like material, such as peptide nucleic acids, branched DNAs, and the like. The polynucleotide sequences can be sense or antisense, complementary sequences. Where target polynucleotides are double stranded, the probes may be either sense or antisense strands. Where the target polynucleotides are single stranded, the probes are complementary single strands.

[0040] In one embodiment, the polynucleotide sequences are cDNAs, the size of which may vary, and are preferably from 1000 to 10,000 nucleotides, more preferably from 150 to 5000 nucleotides. In a second embodiment, the polynucleotide sequences are contained within plasmids. In this case, the size of the inserted cDNA sequence, excluding the vector DNA and its regulatory sequences, may vary from about 50 to 12,000 nucleotides, more preferably from about 150 to 5000 nucleotides.

[0041] The polynucleotide can be prepared by a variety of synthetic or enzymatic schemes which are well known in the art. Sequences can be synthesized, in whole or in part, using chemical or enzymatic methods well known in the art (Caruthers et al. (1980) Nucl Acids Symp Ser (7) 215-233; Ausubel et al. (1997) Short Protocols in Molecular Biology, John Wiley & Sons, New York N.Y.).

[0042] Nucleotide analogues, which can base pair with the target nucleotide sequences, can be incorporated into the probe sequences by methods well known in the art. For example, certain guanine nucleotides can be substituted with hypoxanthine which hydrogen bonds with cytosine, but these bonds are less stable than those formed between guanine and cytosine. Alternatively, adenine nucleotides can be substituted with 2,6-diaminopurine which forms stronger bonds with thymidine than those between adenine and thymidine. Additionally, the polynucleotide sequences can include nucleotides that have been derivatized chemically or enzymatically. Typical chemical modifications include derivatization with acyl, alkyl, aryl or amino groups.

[0043] The polynucleotide sequences can be immobilized on a substrate. Preferred substrates are any suitable rigid or semi-rigid support including membranes, filters, chips, slides, wafers, fibers, magnetic or nonmagnetic beads, gels, tubing, plates, polymers, microparticles and capillaries. The substrate can have a variety of surface forms, such as wells, trenches, pins, channels and pores, to which the polynucleotide sequences are bound. Preferably, the substrates are optically transparent.

[0044] Sequences can be synthesized, in whole or in part, on the surface of a substrate using a chemical coupling procedure and a piezoelectric printing apparatus, such as that described in PCT publication WO95/25116 (Baldeschwieler et al.). Alternatively, the target can be synthesized on a substrate surface using a self-addressable electronic device that controls when reagents are added (Heller et al. U.S. Pat. No. 5,605,662).

[0045] Complementary DNA (cDNA) can be arranged and immobilized on a substrate. The sequences can be immobilized by covalent means such as by chemical bonding procedures or UV. In one such method, a cDNA is bound to a glass surface which has been modified to contain epoxide or aldehyde groups. In another case, a cDNA target is placed on a polylysine coated surface and UV cross-linked (Shalon et al. WO95/35505). In yet another method, a DNA is actively transported from a solution to a given position on a substrate by electrical means (U.S. Pat. No. 5,605,662). Alternatively, individual DNA clones can be gridded on a filter. Cells are lysed, proteins and cellular components degraded, and the DNA coupled to the filter by UV cross-linking.

[0046] Furthermore, the sequences do not have to be directly bound to the substrate, but rather can be bound to the substrate through a linker group. The linker groups are typically about 6 to 50 atoms long, and they provide exposure to the attached polynucleotide sequence. Preferred linker groups include ethylene glycol oligomers, diamines, diacids and the like. Reactive groups on the substrate surface react with one of the terminal portions of the linker to bind the linker to the substrate. The other terminal portion of the linker is adapted to bind the polynucleotide sequence.

[0047] The polynucleotide sequences can be attached to a substrate by dispensing reagents for target synthesis on the substrate surface or by dispensing preformed DNA fragments or clones on the substrate surface. Typical dispensers include a micropipette delivering solution to the substrate with a robotic system to control the position of the micropipette with respect to the substrate. There can be a multiplicity of dispensers so that reagents can be delivered to the reaction regions simultaneously.

[0048] Sample Preparation

[0049] In order to conduct sample analysis, a sample containing nucleic acids is provided. The samples can be obtained from any bodily fluid (blood, urine, saliva, phlegm, gastric juices, etc.), cultured cells, biopsies, or other tissue preparations. DNA or RNA can be isolated from the sample according to any of a number of methods well known to those of skill in the art. For example, methods of purification of nucleic acids are described in Tijssen (1993; Laboratory Techniques in Biochemistry and Molecular Biology: Hybridization With Nucleic Acid Probes, Part I. Theory and Nucleic Acid Preparation, Elsevier Science, New York N.Y.). In one case, total RNA is isolated using the TRIZOL reagent (Life Technologies, Gaithersburg Md.), and mRNA is isolated using oligo d(T) column chromatography or glass beads. Alternatively, when probe polynucleotides are derived from an mRNA, the probe polynucleotides can be a cDNA reverse transcribed from the mRNA, an RNA transcribed from that cDNA, a DNA amplified from that cDNA, an RNA transcribed from the amplified DNA, and the like. When the target polynucleotide is derived from cDNA, the target polynucleotide can be DNA amplified from DNA or DNA reverse transcribed from RNA. In yet another alternative, the polynucleotide sequences are prepared by more than one method.

[0050] When polynucleotide sequences are amplified, it is desirable to amplify the nucleic acid sample and maintain the relative abundances represented in the original sample, including low abundance transcripts. Total mRNA can be amplified by reverse transcription using a reverse transcriptase and a primer consisting of oligo d(T) and a sequence encoding the phage T7 promoter to provide a single stranded DNA template. The second DNA strand is polymerized using a DNA polymerase and an RNase which assists in breaking up the DNA/RNA hybrid. After synthesis of the double stranded DNA, T7 RNA polymerase can be added, and RNA transcribed from the second DNA strand template (Van Gelder et al. U.S. Pat. No. 5,545,522). RNA can be amplified in vitro, in situ or in vivo (Eberwine U.S. Pat. No. 5,514,545).

[0051] It is also advantageous to include quantitation controls within the sample to assure that amplification and labeling procedures do not change the true distribution of probe polynucleotides in a sample. For this purpose, a sample is spiked with a known amount of a control probe polynucleotide and the composition of target polynucleotide sequences includes reference target sequences which specifically hybridize with the control probe polynucleotides. After hybridization and processing, the hybridization signals obtained should reflect accurately the amount of control probe polynucleotides added to the sample.
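As a rough illustration of the quantitation check described above, the following Python sketch compares the known amounts of spiked control polynucleotides with the signals measured at their reference elements. The function name, the choice of a log-log least-squares slope as the criterion, and the example numbers are assumptions made for illustration; the specification does not prescribe a particular statistic.

```python
import math

def log_log_slope(amounts, signals):
    """Least-squares slope of log10(signal) against log10(amount).

    A slope near 1.0 suggests that signal stayed proportional to the
    amount of control added (an illustrative criterion, not a standard).
    """
    if len(amounts) != len(signals) or len(amounts) < 2:
        raise ValueError("need two or more paired measurements")
    x = [math.log10(a) for a in amounts]
    y = [math.log10(s) for s in signals]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    if var_x == 0:
        raise ValueError("spiked amounts must differ")
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return cov / var_x

# Hypothetical spike-in amounts and measured signals (arbitrary units)
spiked = [1.0, 10.0, 100.0, 1000.0]
measured = [52.0, 490.0, 5100.0, 49500.0]
slope = log_log_slope(spiked, measured)
```

A slope well below 1.0 in such a check would hint that abundant controls are being compressed by amplification or labeling, while a slope above 1.0 would hint at the opposite.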

[0052] Prior to hybridization, it may be desirable to fragment the probe polynucleotides. Fragmentation improves hybridization by minimizing secondary structure and cross-hybridization to polynucleotides in the sample with low or no complementarity. Fragmentation can be performed by mechanical or chemical means.

[0053] The probe polynucleotides may be labeled with one or more labeling moieties to allow for detection of hybridized probe/target polynucleotide complexes. The labeling moieties can include compositions that can be detected by spectroscopic, photochemical, biochemical, bioelectronic, immunochemical, electrical, optical or chemical means. The labeling moieties include radioisotopes, such as ³²P, ³³P or ³⁵S, chemiluminescent compounds, labeled binding proteins, heavy metal atoms, spectroscopic markers, such as fluorescent markers and dyes, magnetic labels, linked enzymes, mass spectrometry tags, spin labels, electron transfer donors and acceptors, and the like.

[0054] Exemplary dyes include quinoline dyes, triarylmethane dyes, phthaleins, azo dyes, cyanine dyes and the like. Preferably, fluorescent markers absorb light above about 300 nm, preferably above 400 nm, and usually emit light at wavelengths at least 10 nm greater than the wavelength of the light absorbed. Preferred fluorescent markers include fluorescein, phycoerythrin, rhodamine, lissamine, and Cy3 and Cy5 (Amersham Pharmacia Biotech, Piscataway N.J.).

[0055] Labeling can be carried out during an amplification reaction, such as a polymerase chain reaction or an in vitro transcription reaction, or by nick translation or 5'- or 3'-end-labeling reactions. In one case, labeled nucleotides are used in an in vitro transcription reaction. When the label is incorporated after or without an amplification step, the label is incorporated by using terminal transferase or by kinasing the 5' end of the polynucleotide sequence and then incubating overnight with a labeled oligonucleotide in the presence of T4 RNA ligase.

[0056] Alternatively, the labeling moiety can be incorporated after hybridization once a probe/target complex has formed. In one case, biotin is first incorporated during an amplification step as described above. After the hybridization reaction, unbound nucleic acids are rinsed away so that the only biotin present is attached to probe polynucleotides complexed with the target polynucleotides. An avidin-conjugated fluorophore, such as avidin-phycoerythrin, that binds with high affinity to biotin is added. In another case, the labeling moiety is incorporated by intercalation into bound probe/target complexes. In this case, an intercalating dye such as a psoralen-linked dye can be employed.

[0057] Under some circumstances it may be advantageous to immobilize the probe polynucleotides on a substrate and have the polynucleotide targets bind to the immobilized probe polynucleotides. In such cases the probe polynucleotides can be attached to a substrate as described above.

[0058] Hybridization and Detection

[0059] Hybridization causes a denatured polynucleotide probe and a denatured complementary target to form a stable duplex through base pairing. Hybridization methods are well known to those skilled in the art. (See, e.g., Ausubel, supra, units 2.8-2.11, 3.18-3.19 and 4.6-4.9.) Conditions can be selected for hybridization where only completely complementary probe and target can hybridize, i.e., each base must pair with its complementary base. Alternatively, conditions can be selected where probe and target have mismatches but are still able to hybridize. Suitable conditions can be selected, for example, by varying the concentrations of salt in the prehybridization, hybridization, and wash solutions or by varying the hybridization and wash temperatures. With some membranes, the temperature can be decreased by adding formamide to the prehybridization and hybridization solutions.

[0060] Hybridization can be performed at low stringency with buffers, such as 5× SSC with 1% sodium dodecyl sulfate (SDS) at 60°C, which permits hybridization between probe and target sequences that contain some mismatches to form probe/target complexes. Subsequent washes are performed at higher stringency with buffers such as 0.2× SSC with 0.1% SDS at either 45°C (medium stringency) or 68°C (high stringency), to maintain hybridization of only those probe/target complexes that contain completely complementary sequences. Background signals can be reduced by the use of detergents such as SDS, Sarcosyl, or Triton X-100, or a blocking agent, such as salmon sperm DNA.
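To make the difference in salt concentration concrete, the sketch below estimates the sodium ion concentration of the hybridization and wash buffers named above. It assumes the standard composition of 1× SSC (0.15 M NaCl and 0.015 M trisodium citrate) and ignores the small sodium contribution from SDS; the constant and function names are illustrative.

```python
NACL_M_PER_X = 0.15      # mol/L NaCl in 1x SSC (standard recipe, assumed)
CITRATE_M_PER_X = 0.015  # mol/L trisodium citrate in 1x SSC (assumed)
NA_PER_CITRATE = 3       # each trisodium citrate contributes three Na+

def ssc_sodium_molarity(fold):
    """Approximate Na+ concentration (mol/L) of an SSC buffer at `fold` x."""
    return fold * (NACL_M_PER_X + NA_PER_CITRATE * CITRATE_M_PER_X)

hybridization_na = ssc_sodium_molarity(5.0)  # 5x SSC hybridization buffer
wash_na = ssc_sodium_molarity(0.2)           # 0.2x SSC wash buffer
```

Under these assumptions the wash buffer carries 25-fold less sodium than the hybridization buffer; lower salt destabilizes mismatched duplexes, which is what makes the wash more stringent.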

[0061] Hybridization specificity can be evaluated by comparing the hybridization of control probe sequences to control target sequences that are added to a sample in a known amount. The control probe may have one or more sequence mismatches compared with the corresponding control target. In this manner, it is possible to evaluate whether only complementary probes are hybridizing to the targets or whether mismatched hybrid duplexes are forming.

[0062] Hybridization reactions can be performed in absolute or differential hybridization formats. In the absolute hybridization format, probe polynucleotides from one sample are hybridized to microarray elements, and signals detected after hybridization complexes form. Signal strength correlates with probe polynucleotide levels in a sample. In the differential hybridization format, differential expression of a set of genes in two biological samples is analyzed. Probe polynucleotides from the two samples are prepared and labeled with different labeling moieties. A mixture of the two labeled probe polynucleotides is hybridized to the microarray elements, and signals are examined under conditions in which the emissions from the two different labels are individually detectable. Targets in the microarray that are hybridized to substantially equal numbers of probes derived from both biological samples give a distinct combined fluorescence (Shalon WO95/35505). In a preferred embodiment, the labels are fluorescent labels with distinguishable emission spectra, such as a lissamine conjugated nucleotide analog and a fluorescein conjugated nucleotide analog. In another embodiment Cy3/Cy5 fluorophores (Amersham Pharmacia Biotech) are employed.
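The differential format can be sketched in Python as a per-element comparison of the two label signals. The two-fold threshold, the signal floor, and all names below are assumptions chosen for illustration; the specification does not state a threshold.

```python
import math

def log2_ratio(signal_a, signal_b, floor=1.0):
    """log2 of the signal ratio for one array element.

    Signals are clamped to `floor` (an assumed value) to avoid division
    by zero on elements with no detectable signal.
    """
    a = max(signal_a, floor)
    b = max(signal_b, floor)
    return math.log2(a / b)

def classify_element(signal_a, signal_b, fold=2.0):
    """Label an element by comparing the two samples at an assumed fold cutoff."""
    ratio = log2_ratio(signal_a, signal_b)
    cutoff = math.log2(fold)
    if ratio >= cutoff:
        return "higher in sample A"
    if ratio <= -cutoff:
        return "higher in sample B"
    return "similar"

# Hypothetical element: label A signal 4000, label B signal 1000
call = classify_element(4000.0, 1000.0)
```

Elements called "similar" correspond to the case described above, in which substantially equal numbers of probes from both samples hybridize and a combined signal is observed.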

[0063] After hybridization, the microarray is washed to remove nonhybridized nucleic acids, and complex formation between the hybridizable array elements and the probe polynucleotides is examined. Methods for detecting complex formation are well known to those skilled in the art. In a preferred embodiment, the probe polynucleotides are labeled with a fluorescent label, and measurement of levels and patterns of fluorescence indicative of complex formation is accomplished by fluorescence microscopy, preferably confocal fluorescence microscopy. An argon ion laser excites the fluorescent label, emissions are directed to a photomultiplier, and the amount of emitted light is detected and quantitated. The detected signal should be proportional to the amount of probe/target polynucleotide complexes at each position of the microarray. The fluorescence microscope can be associated with a computer-driven scanner device to generate a quantitative two-dimensional image of hybridization intensity. The scanned image is examined to determine the abundance/expression level of each hybridized probe polynucleotide.

[0064] Typically, microarray fluorescence intensities can be normalized to take into account variations in hybridization intensities when more than one microarray is used under similar test conditions. In a preferred embodiment, individual polynucleotide probe/target complex hybridization intensities are normalized using the intensities derived from internal normalization controls contained on each microarray.
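One simple way to picture this normalization is to rescale every element on an array so that the mean of its internal controls matches a common target value. The sketch below assumes that approach; the function name, the control identifiers, and the target value of 1000 are hypothetical.

```python
def normalize_array(intensities, control_ids, target_mean=1000.0):
    """Rescale one array so its internal controls average `target_mean`.

    intensities: dict mapping element id -> raw intensity
    control_ids: ids of the internal normalization controls on this array
    """
    controls = [intensities[c] for c in control_ids]
    control_mean = sum(controls) / len(controls)
    if control_mean <= 0:
        raise ValueError("control intensities must be positive")
    scale = target_mean / control_mean
    return {element: value * scale for element, value in intensities.items()}

# Two hypothetical arrays; the second was scanned at twice the overall brightness
array_1 = {"ctrl_1": 500.0, "ctrl_2": 1500.0, "gene_A": 2000.0}
array_2 = {"ctrl_1": 1000.0, "ctrl_2": 3000.0, "gene_A": 4000.0}
norm_1 = normalize_array(array_1, ["ctrl_1", "ctrl_2"])
norm_2 = normalize_array(array_2, ["ctrl_1", "ctrl_2"])
```

After rescaling, the gene_A values from the two arrays can be compared directly, since the array-wide brightness difference has been divided out.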

[0065] Expression Profiles

[0066] Expression profiles using the composition of this invention may be used to detect changes in the expression of genes implicated in disease. These genes include genes whose altered expression is correlated with pancreatic disease, cancer, immunopathology, neuropathology, and the like.

[0067] The expression profile comprises the polynucleotide sequences of the Sequence Listing. The expression profile also includes a plurality of detectable complexes. Each complex is formed by hybridization of one or more polynucleotide sequences or array elements to one or more complementary probe polynucleotides. At least one of the polynucleotide sequences, preferably a plurality of polynucleotide sequences, is hybridized to a complementary target polynucleotide forming at least one, and preferably a plurality, of complexes. A complex is detected by the incorporation of at least one labeling moiety, described above, in the complex. Expression profiles provide "snapshots" that reflect unique expression patterns that are characteristic of a disease or condition.

[0068] After performing hybridization experiments and interpreting the signals produced by complexes on a microarray, particular polynucleotide sequences can be identified based on their expression patterns. Such polynucleotide sequences can be used to clone a full length sequence for the gene, to produce a polypeptide, to develop a diagnostic panel for a particular disease, to choose a gene for potential therapeutic use, and the like.

[0069] Additional Utility of the Invention

[0070] Microarrays containing the sequences of the Sequence Listing can be employed in several applications including diagnostics, prognostics and treatment regimens, drug discovery and development, toxicological and carcinogenicity studies, forensics, pharmacogenomics and the like. In one situation, the microarray is used to monitor the progression of disease. Researchers can assess and catalog the differences in gene expression between healthy and diseased tissues or cells. By analyzing changes in patterns of gene expression, disease can be diagnosed at earlier stages before the patient is symptomatic. The invention can also be used to monitor the efficacy of treatment. For some treatments with known side effects, the microarray is employed to "fine tune" the treatment regimen. A dosage is established that causes a change in genetic expression patterns indicative of successful treatment. Expression patterns associated with undesirable side effects are avoided. This approach may be more sensitive and rapid than waiting for the patient to show inadequate improvement, or to manifest side effects, before altering the course of treatment.

[0071] Alternatively, animal models which mimic a disease can be used, rather than patients, to characterize expression profiles associated with a particular disease or condition. This gene expression data may be useful in diagnosing and monitoring the course of disease in the model, in determining genes that are candidates for intervention, and in testing novel treatment regimens. Subsequently, protocols and treatments that prove successful in the model system may be used in human patients, and the resulting expression profiles monitored.

[0072] The expression of genes encoding membrane-associated proteins, receptors, and ion channels was highly associated with pancreatic tissue; ~45% of the sequences of the Sequence Listing were expressed in pancreatic tissues. In particular, the microarray and expression profile are useful to diagnose conditions of the pancreas such as diabetes, pancreatitis, pancreatic cholera, hyperlipidemia, fibrocystic disease, and cancers and tumors of the pancreas.

[0073] The expression of genes encoding membrane-associated proteins, receptors, and ion channels is closely associated with immune conditions, disorders and diseases; ~20% of the sequences of the Sequence Listing were expressed in tissues from patients with immunological conditions such as AIDS, Addison's disease, ARDS, allergies, ankylosing spondylitis, amyloidosis, anemia, asthma, atherosclerosis, autoimmune hemolytic anemia, autoimmune thyroiditis, bronchitis, cholecystitis, contact dermatitis, Crohn's disease, atopic dermatitis, dermatomyositis, diabetes mellitus, emphysema, erythroblastosis fetalis, erythema nodosum, atrophic gastritis, glomerulonephritis, Goodpasture's syndrome, gout, Graves' disease, Hashimoto's thyroiditis, hypereosinophilia, irritable bowel syndrome, multiple sclerosis, myasthenia gravis, myocardial or pericardial inflammation, osteoarthritis, osteoporosis, pancreatitis, polymyositis, psoriasis, Reiter's syndrome, rheumatoid arthritis, scleroderma, Sjogren's syndrome, systemic anaphylaxis, systemic lupus erythematosus, systemic sclerosis, thrombocytopenic purpura, ulcerative colitis, uveitis, Werner syndrome, complications of cancer, hemodialysis, and extracorporeal circulation, viral, bacterial, fungal, parasitic, protozoal, and helminthic infections, and trauma.

[0074] The expression of genes encoding membrane-associated proteins, receptors, and ion channels is closely associated with cancers; ~10% of the sequences of the Sequence Listing were expressed in cancerous tissues. In particular, the microarray and expression profile are useful to diagnose a cancer such as adenocarcinoma, leukemia, lymphoma, melanoma, myeloma, sarcoma and teratocarcinoma. Such cancers include, but are not limited to, cancers of the adrenal gland, bladder, bone, bone marrow, brain, breast, cervix, colon, gall bladder, ganglia, gastrointestinal tract, heart, kidney, liver, lung, muscle, ovary, pancreas, parathyroid, penis, prostate, salivary glands, skin, spleen, testis, thymus, thyroid and uterus.

[0075] The expression of genes encoding membrane-associated proteins, receptors, and ion channels is also closely associated with the immune response. Therefore, the microarray can be used to diagnose immunopathologies including, but not limited to, AIDS, Addison's disease, adult respiratory distress syndrome, allergies, anemia, asthma, atherosclerosis, bronchitis, cholecystitis, Crohn's disease, ulcerative colitis, atopic dermatitis, dermatomyositis, diabetes mellitus, emphysema, atrophic gastritis, glomerulonephritis, gout, Graves' disease, hypereosinophilia, irritable bowel syndrome, lupus erythematosus, multiple sclerosis, myasthenia gravis, myocardial or pericardial inflammation, osteoarthritis, osteoporosis, pancreatitis, polymyositis, rheumatoid arthritis, scleroderma, Sjogren's syndrome, and autoimmune thyroiditis; complications of cancer, hemodialysis, extracorporeal circulation; viral, bacterial, fungal, parasitic, and protozoal infections; and trauma.

[0076] Neuropathologies are also affected by the expression of genes encoding membrane-associated proteins, receptors, and ion channels; in fact, ~1% of the sequences of the Sequence Listing were expressed in neuronal tissues. Thus, the microarray can be used to diagnose neuropathologies including, but not limited to, akathisia, Alzheimer's disease, amnesia, amyotrophic lateral sclerosis, bipolar disorder, catatonia, cerebral neoplasms, dementia, depression, Down's syndrome, tardive dyskinesia, dystonias, epilepsy, Huntington's disease, multiple sclerosis, neurofibromatosis, Parkinson's disease, paranoid psychoses, schizophrenia, and Tourette's disorder.

[0077] Also, researchers can use the microarray to rapidly screen large numbers of candidate drug molecules, looking for ones that produce an expression profile similar to those of known therapeutic drugs, with the expectation that molecules with the same expression profile will likely have similar therapeutic effects. Thus, the invention provides the means to determine the molecular mode of action of a drug.
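The screening idea above, finding candidates whose expression profile resembles that of a known drug, can be sketched as a similarity ranking. The use of Pearson correlation as the similarity measure, the function names, and the example profiles (log ratios for four array elements) are all assumptions for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (two or more)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("profiles must not be constant")
    return cov / math.sqrt(var_x * var_y)

def rank_candidates(reference, candidates):
    """Return (name, correlation) pairs, most similar to the reference first."""
    scored = [(name, pearson(reference, profile))
              for name, profile in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical profiles: one known drug and three candidate compounds
known_drug = [2.0, -1.0, 0.5, 1.5]
candidates = {
    "compound_1": [1.8, -0.9, 0.4, 1.6],
    "compound_2": [-2.0, 1.0, -0.5, -1.5],
    "compound_3": [0.1, 0.2, -0.1, 0.0],
}
ranking = rank_candidates(known_drug, candidates)
```

In this toy example compound_1 ranks first because its profile tracks the known drug closely, while compound_2, whose profile is the exact opposite, ranks last.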

[0078] It is understood that this invention is not limited to the particular devices, machines, materials and methods described. Although preferred embodiments are described, devices, machines, materials and methods similar or equivalent to these embodiments may be used to practice the invention. The preferred embodiments are not intended to limit the scope of the invention, which is limited only by the appended claims.

[0079] The singular forms "a", "an", and "the" include plural reference unless the context clearly dictates otherwise. All technical and scientific terms have the meanings commonly understood by one of ordinary skill in the art. All patents mentioned herein are incorporated by reference for the purpose of describing and disclosing the devices, machines, materials and methods which are presented and which might be used in connection with the invention. Nothing in the specification is to be construed as an admission that the invention is not entitled to antedate such disclosure by virtue of prior invention.

EXAMPLES

[0080] For purposes of example, the preparation and sequencing of the PANCNOT07 cDNA library is described. Preparation and sequencing of cDNAs in libraries in the LIFESEQ database (Incyte Pharmaceuticals) have varied over time, and the gradual changes involved use of kits, plasmids, and machinery available at the particular time the library was made and analyzed.

[0081] I cDNA Library Construction

[0082] The PANCNOT07 cDNA library was constructed from pancreas tissue obtained from a 25-week-old Caucasian male fetus. The frozen tissue was homogenized and lysed using a POLYTRON homogenizer (PT-3000; Brinkmann Instruments, Westbury N.Y.) in guanidinium isothiocyanate solution. The lysate was centrifuged over a 5.7 M CsCl cushion using an SW28 rotor in an L8-70M ultracentrifuge (Beckman Coulter, Fullerton Calif.) for 18 hours at 25,000 rpm at ambient temperature. The RNA was extracted with acid phenol (pH 4.7), precipitated using 0.3 M sodium acetate and 2.5 volumes of ethanol, resuspended in RNase-free water, and treated with DNase at 37°C. Extraction and precipitation were repeated as before. The mRNA was isolated with the OLIGOTEX kit (Qiagen, Chatsworth Calif.) and used to construct the cDNA library.

[0083] The mRNA was handled according to the recommended protocols in the SUPERSCRIPT Plasmid system (Life Technologies). The cDNAs were fractionated on a SEPHAROSE CL4B column (Amersham Pharmacia Biotech), and those cDNAs exceeding 400 bp were ligated into pINCY1 plasmid (Incyte Pharmaceuticals). The plasmid was then transformed into DH5α competent cells (Life Technologies).

[0084] II Isolation and Sequencing of cDNA Clones

[0085] Plasmid DNA was released from the cells and purified using the REAL Prep 96 plasmid kit (Qiagen). This kit enabled the simultaneous purification of 96 samples in a 96-well block using multi-channel reagent dispensers. The recommended protocol was employed except for the following changes: 1) the bacteria were cultured in 1 ml of sterile Terrific Broth (Life Technologies) with carbenicillin at 25 mg/L and glycerol at 0.4%; 2) after inoculation, the cultures were incubated for 19 hours, and at the end of incubation, the cells were lysed with 0.3 ml of lysis buffer; and 3) following isopropanol precipitation, the DNA pellet was resuspended in 0.1 ml of distilled water. After the last step in the protocol, samples were transferred to a 96-well block for storage at 4°C.

[0086] The cDNAs were sequenced by the method of Sanger and Coulson (1975, J Mol Biol 94:441-448). A MICROLAB 2200 (Hamilton, Reno Nev.) in combination with DNA Engine thermal cyclers (PTC200; MJ Research, Watertown Mass.) was used to prepare the DNA. After thermal cycling, the A, C, G, and T reactions with each DNA template were combined. Then, 50 μl 100% ethanol was added, and the solution was spun at 4°C for 30 min. The supernatant was decanted, and the pellet was rinsed with 100 μl 70% ethanol. After being spun for 15 min, the supernatant was discarded and the pellet was dried for 15 min under vacuum. The DNA sample was dissolved in 3 μl of formaldehyde/50 mM EDTA and loaded into wells in volumes of 2 μl per well for sequencing on ABI 377 DNA Sequencing systems (PE Biosystems, Foster City Calif.).

[0087] Most of the sequences were sequenced using standard ABI protocols and kits (Cat. Nos. 79345, 79339, 79340, 79357, 79355; PE Biosystems) at solution volumes of 0.25×-1.0× concentrations. Some of the sequences were sequenced using solutions and dyes from Amersham Pharmacia Biotech.

[0088] III Characterization of cDNA Clones

[0089] The nucleotide sequences of the Sequence Listing, as well as the amino acid sequences deduced from them, were used as query sequences against GenBank, SwissProt, BLOCKS, and PRINTS databases. The sequences in these databases, which contain previously identified and annotated sequences, were searched for regions of similarity using BLAST or FASTA (Pearson, W. R. (1990) Methods Enzymol 183:63-98; and Smith and Waterman (1981) Adv Appl Math 2:482-489).

[0090] VII Extension of cDNA Sequences

[0091] The original nucleic acid sequence was extended using the Incyte cDNA clone and oligonucleotide primers. One primer was synthesized to initiate 5' extension of the known fragment, and the other, to initiate 3' extension of the known fragment. The initial primers were designed using OLIGO 4.06 software (National Biosciences), or another appropriate program, to be about 22 to 30 nucleotides in length, to have a GC content of about 50% or more, and to anneal to the target sequence at temperatures of about 68°C to about 72°C. Any stretch of nucleotides which would result in hairpin structures and primer-primer dimerizations was avoided.
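The length, GC-content, and hairpin criteria listed above can be sketched as a simple screening function. This is not the OLIGO 4.06 algorithm; the hairpin test is a crude assumed heuristic (a 5-nt stem with a loop of 3 nt or more), annealing temperature and primer-primer dimers are not modelled, and the example sequences are hypothetical ones chosen so the result is easy to verify by eye.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def gc_fraction(seq):
    """Fraction of G and C bases in the primer."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def has_hairpin(seq, stem=5, min_loop=3):
    """Crude screen: True if any `stem`-nt stretch could pair with a
    downstream stretch separated from it by `min_loop` or more bases."""
    seq = seq.upper()
    last_start = len(seq) - 2 * stem - min_loop
    for i in range(last_start + 1):
        partner = reverse_complement(seq[i:i + stem])
        if seq.find(partner, i + stem + min_loop) != -1:
            return True
    return False

def primer_ok(seq, min_len=22, max_len=30, min_gc=0.50):
    """Apply the length, GC-content, and hairpin criteria to one primer."""
    return (min_len <= len(seq) <= max_len
            and gc_fraction(seq) >= min_gc
            and not has_hairpin(seq))

# Hypothetical candidates: the second contains GACCTG ... CAGGTC, a 6-nt inverted repeat
passes = primer_ok("CACCACACCCACACCACCCACACC")
fails = primer_ok("GACCTGAAAACAGGTCCCACCACC")
```

A production design tool would also estimate the annealing temperature and check each primer against its partner for dimer formation, which this sketch leaves out.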

[0092] Selected cDNA libraries, such as a pancreas library, were used to extend the sequence. If more than one extension was necessary, additional or nested sets of primers were designed. Preferred libraries are ones that have been size-selected to include larger cDNAs. Also, random primed libraries are preferred because they will contain more sequences with the 5' and upstream regions of genes. A randomly primed library is particularly useful if an oligo d(T) library does not yield a full-length cDNA. Genomic libraries are useful for extension 5' of the promoter binding region to obtain regulatory elements.

[0093] High fidelity amplification was obtained by PCR using methods well known in the art. PCR was performed in 96-well plates using the DNA ENGINE thermal cycler (MJ Research). The reaction mix contained DNA template, 200 nmol of each primer, reaction buffer containing Mg²⁺, (NH₄)₂SO₄, and β-mercaptoethanol, TAQ DNA polymerase (Amersham Pharmacia Biotech), ELONGASE enzyme (Life Technologies), and Pfu DNA polymerase (Stratagene, San Diego Calif.), with the following parameters for primer pair PCI A and PCI B: Step 1: 94°C, 3 min; Step 2: 94°C, 15 sec; Step 3: 60°C, 1 min; Step 4: 68°C, 2 min; Step 5: Steps 2, 3, and 4 repeated 20 times; Step 6: 68°C, 5 min; Step 7: storage at 4°C. In the alternative, the parameters for primer pair T7 and SK+ were as follows: Step 1: 94°C, 3 min; Step 2: 94°C, 15 sec; Step 3: 57°C, 1 min; Step 4: 68°C, 2 min; Step 5: Steps 2, 3, and 4 repeated 20 times; Step 6: 68°C, 5 min; Step 7: storage at 4°C.
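The two cycling programs above can be written down as data and their programmed hold time totalled, as in the following sketch. It assumes that "repeated 20 times" means 20 passes in addition to the first pass through Steps 2 to 4, and it ignores ramp times between temperatures and the final 4°C storage hold; the class and variable names are illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hold:
    temp_c: float  # block temperature in degrees Celsius
    seconds: int   # programmed hold time

def program_seconds(initial, cycle, repeats, final):
    """Total programmed hold time, excluding ramping and 4 C storage.

    Assumes the cycle runs once and is then repeated `repeats` times.
    """
    passes = 1 + repeats
    cycle_seconds = sum(hold.seconds for hold in cycle)
    return initial.seconds + passes * cycle_seconds + final.seconds

# Primer pair PCI A / PCI B (annealing at 60 C)
PCI_PROGRAM = dict(
    initial=Hold(94, 180),
    cycle=[Hold(94, 15), Hold(60, 60), Hold(68, 120)],
    repeats=20,
    final=Hold(68, 300),
)

# Primer pair T7 / SK+ differs only in the annealing temperature (57 C)
T7_SK_PROGRAM = dict(
    initial=Hold(94, 180),
    cycle=[Hold(94, 15), Hold(57, 60), Hold(68, 120)],
    repeats=20,
    final=Hold(68, 300),
)

pci_total = program_seconds(**PCI_PROGRAM)
t7_total = program_seconds(**T7_SK_PROGRAM)
```

Under that reading each program holds for 4575 seconds, a little over 76 minutes; if the 20 repeats are instead taken to include the first pass, the total drops by one 195-second cycle.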

[0094] The concentration of DNA in each well was determined by dispensing 100 μl PICOGREEN quantitation reagent (0.25% v/v; Molecular Probes, Eugene Oreg.) dissolved in 1× TE and 0.5 μl of undiluted PCR product into each well of an opaque fluorimeter plate (Corning Costar, Acton Mass.) and allowing the DNA to bind to the reagent. The plate was scanned in a Fluoroskan II (Labsystems Oy, Helsinki, Finland) to measure the fluorescence of the sample and to quantify the concentration of DNA. A 5 μl to 10 μl aliquot of the reaction mixture was analyzed by electrophoresis on a 1% agarose mini-gel to determine which reactions were successful in extending the sequence.

[0095] The extended nucleotides were desalted and concentrated, transferred to 384-well plates, digested with CviJI chlorella virus endonuclease (Molecular Biology Research, Madison Wis.), and sonicated or sheared prior to religation into pUC18 vector (Amersham Pharmacia Biotech). For shotgun sequencing, the digested nucleotides were separated on low concentration (0.6 to 0.8%) agarose gels, fragments were excised, and agar digested with AGARACE enzyme (Promega, Madison Wis.). Extended clones were religated using T4 ligase (New England Biolabs, Beverly Mass.) into pUC18 vector (Amersham Pharmacia Biotech), treated with Pfu DNA polymerase (Stratagene) to fill in restriction site overhangs, and transfected into competent E. coli cells. Transformed cells were selected on antibiotic-containing media, and individual colonies were picked and cultured overnight at 37°C in 384-well plates in LB/2× carbenicillin liquid media.

[0096] The cells were lysed, and DNA was amplified by PCR using TAQ DNA polymerase (Amersham Pharmacia Biotech) and Pfu DNA polymerase (Stratagene) with the following parameters: Step 1: 94°C, 3 min; Step 2: 94°C, 15 sec; Step 3: 60°C, 1 min; Step 4: 72°C, 2 min; Step 5: steps 2, 3, and 4 repeated 29 times; Step 6: 72°C, 5 min; Step 7: storage at 4°C. DNA was quantified using PICOGREEN reagent (Molecular Probes) as described above. Samples with low DNA recoveries were reamplified using the same conditions described above. Samples were diluted with 20% dimethylsulphoxide (1:2, v/v), and sequenced using DYENAMIC energy transfer sequencing primers and the DYENAMIC DIRECT cycle sequencing kit (Amersham Pharmacia Biotech) or the ABI PRISM BIGDYE Terminator cycle sequencing kit (PE Biosystems). The extended sequences were assembled with the original clone using CONSED, PHRAP or GELVIEW software (GCG) and reanalyzed using BLAST, FASTA, or similar sequence analysis programs well known in the art. (See, e.g., Ausubel, supra, unit 7.7, pp. 7.65-69.)

[0097] IV Selection of Sequences

[0098] The sequences found in the Sequence Listing were selected because they possessed annotation, motifs, domains, regions or other patterns consistent with genes encoding proteins associated with membranes, receptors, or ion channels. The PRINTS database was searched using the BLIMPS search program to obtain protein family "fingerprints". The PRINTS database complements the PROSITE database and contains groups of conserved motifs within sequence alignments which are used to build characteristic signatures of different polypeptide families. For PRINTS analyses, the cutoff scores for local similarity were >1300=strong, 1000-1300=suggestive; for global similarity, p<exp-3; and for strength (degree of correlation), >1300=strong, 1000-1300=weak.
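The PRINTS cutoffs quoted above translate directly into a small classifier, sketched below. The "below cutoff" label for values under 1000 is an assumption, as is treating a value of exactly 1300 as belonging to the 1000-1300 band; the example values are the strength and score listed for SEQ ID NO:1 and SEQ ID NO:8 in Table I.

```python
def classify_score(score):
    """Local similarity score: >1300 strong, 1000-1300 suggestive."""
    if score > 1300:
        return "strong"
    if score >= 1000:
        return "suggestive"
    return "below cutoff"  # assumed label; the text gives no category here

def classify_strength(strength):
    """Strength (degree of correlation): >1300 strong, 1000-1300 weak."""
    if strength > 1300:
        return "strong"
    if strength >= 1000:
        return "weak"
    return "below cutoff"  # assumed label; the text gives no category here

# SEQ ID NO:1 in Table I: strength 1189, score 1319
seq1 = (classify_strength(1189), classify_score(1319))
# SEQ ID NO:8 in Table I: strength 1332, score 1341
seq8 = (classify_strength(1332), classify_score(1341))
```

The global similarity criterion (the p-value cutoff) is not modelled here because Table I does not list p-values.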

[0099] PRINTS screening was carried out electronically to identify those sequences shown in the Sequence Listing with similarity to membrane-associated proteins, receptors, and ion channels. The protein groupings screened included extracellular messengers (including cytokines, growth factors, hormones, neuropeptides, oncogenes, and vasomediators), receptors (including GPCRs, tetraspanins, receptor kinases and nuclear receptors), ion channels, and proteins associated with signaling cascades (including kinases, phosphatases, G proteins, and second messengers such as cyclic AMP, phospholipase C, inositol triphosphate, and the like).

[0100] VIII Labeling of Probes and Hybridization Analyses

[0101] Blotting

[0102] Polynucleotide sequences are isolated from a biological source and applied to a solid matrix (a blot) suitable for standard nucleic acid hybridization protocols by one of the following methods. A mixture of target nucleic acids, a restriction digest of genomic DNA, is fractionated by electrophoresis through a 0.7% agarose gel in 1× TAE [Tris-acetate-ethylenediamine tetraacetic acid (EDTA)] running buffer and transferred to a nylon membrane by capillary transfer using 20× saline sodium citrate (SSC). Alternatively, the target nucleic acids are individually ligated to a vector and inserted into bacterial host cells to form a library. Target nucleic acids are arranged on a blot by one of the following methods. In the first method, bacterial cells containing individual clones are robotically picked and arranged on a nylon membrane. The membrane is placed on bacterial growth medium, LB agar containing carbenicillin, and incubated at 37°C for 16 hours. Bacterial colonies are denatured, neutralized, and digested with proteinase K. Nylon membranes are exposed to UV irradiation in a STRATALINKER UV-crosslinker (Stratagene) to cross-link DNA to the membrane.

[0103] In the second method, target nucleic acids are amplified from bacterial vectors by thirty cycles of PCR using primers complementary to vector sequences flanking the insert. Amplified target nucleic acids are purified using SEPHACRYL-400 (Amersham Pharmacia Biotech). Purified target nucleic acids are robotically arrayed onto a glass microscope slide (Corning Science Products, Acton Mass.). The slide was previously coated with 0.05% aminopropyl silane (Sigma-Aldrich, St. Louis Mo.) and cured at 110°C. The arrayed glass slide (microarray) is exposed to UV irradiation in a STRATALINKER UV-crosslinker (Stratagene).

[0104] Probe Preparation

[0105] cDNA probe sequences are made from mRNA templates. Five micrograms of mRNA is mixed with 1 μg random primer (Life Technologies), incubated at 70°C for 10 minutes, and lyophilized. The lyophilized sample is resuspended in 50 μl of 1× first strand buffer (cDNA synthesis system; Life Technologies) containing a dNTP mix, [α-³²P]dCTP, dithiothreitol, and MMLV reverse transcriptase (Stratagene), and incubated at 42°C for 1-2 hours. After incubation, the probe is diluted with 42 μl dH₂O, heated to 95°C for 3 minutes, and cooled on ice. mRNA in the probe is removed by alkaline degradation. The probe is neutralized, and degraded mRNA and unincorporated nucleotides are removed using a PROBEQUANT G-50 microcolumn (Amersham Pharmacia Biotech). Probes can be labeled with fluorescent markers, Cy3-dCTP or Cy5-dCTP (Amersham Pharmacia Biotech), in place of the radionuclide, [³²P]dCTP.

[0106] Hybridization

[0107] Hybridization is carried out at 65°C in a hybridization buffer containing 0.5 M sodium phosphate (pH 7.2), 7% SDS, and 1 mM EDTA. After the blot is incubated in hybridization buffer at 65°C for at least 2 hours, the buffer is replaced with 10 ml of fresh buffer containing the probe sequences. After incubation at 65°C for 18 hours, the hybridization buffer is removed, and the blot is washed sequentially under increasingly stringent conditions, up to 40 mM sodium phosphate, 1% SDS, 1 mM EDTA at 65°C. To detect signal produced by a radiolabeled probe hybridized on a membrane, the blot is exposed to a PHOSPHORIMAGER cassette (Amersham Pharmacia Biotech), and the image is analyzed using IMAGEQUANT data analysis software (Molecular Dynamics). To detect signals produced by a fluorescent probe hybridized on a microarray, the blot is examined by confocal laser microscopy, and images are collected and analyzed using GEMTOOLS gene expression analysis software (Incyte Pharmaceuticals).

TABLE I

SEQ ID NO	INCYTE CLONE NO	PRINT ID	PRINT DESCRIPTION	PRINT STRENGTH	PRINT SCORE	TM, SIGNAL PEPTIDE
SEQ ID NO:1	8915	PR00554C	ADENOSINE A2B RECEPTOR SIGNATURE	1189	1319	yes yes
SEQ ID NO:2	68454	PR00572B	INTERLEUKIN 8A RECEPTOR SIGNATURE	1120	1184
SEQ ID NO:3	98991	PR00897E	VASOPRESSIN V1B RECEPTOR SIGNATURE	1159	1135
SEQ ID NO:4	121140	PR00247B	CAMP-TYPE GPCR SIGNATURE	1230	1244	yes
SEQ ID NO:5	129059	PR00535A	MELANOCORTIN RECEPTOR SIGNATURE	1169	1120
SEQ ID NO:6	222732	PR00580	PROSTANOID EP1 RECEPTOR SIGNATURE	1278	1114
SEQ ID NO:7	222748	PR00635A	AT1 ANGIOTENSIN II RECEPTOR SIGNATURE	1280	1356	yes yes
SEQ ID NO:8	224587	PR00555C	ADENOSINE A3 RECEPTOR SIGNATURE	1332	1341	yes
SEQ ID NO:9	225146	PR00522G	CANNABINOID RECEPTOR TYPE 1 SIGNATURE	1341	1282
SEQ ID NO:10	225640	PR00592B	EXTRACELLULAR CALCIUM-SENSING RECEPTOR SIGNAT	1421	1279
SEQ ID NO:11	225650	PR00648D	GPR3 ORPHAN RECEPTOR SIGNATURE	1146	1302	yes
SEQ ID NO:12	226179	PR00641H	EBI1 ORPHAN RECEPTOR SIGNATURE	1259	1215
SEQ ID NO:13	226815	PR00644E	GPR ORPHAN RECEPTOR SIGNATURE	1453	1250	yes
SEQ ID NO:14	227559	PR00554G	ADENOSINE A2B RECEPTOR SIGNATURE	1259	1234
SEQ ID NO:15	227799	PR00366A	ENDOTHELIN RECEPTOR SIGNATURE	1337	1279	yes
SEQ ID NO:16	227892	PR00531A	HISTAMINE H2 RECEPTOR SIGNATURE	1183	1197	yes
SEQ ID NO:17	228282	PR00565C	DOPAMINE 1A RECEPTOR SIGNATURE	1221	1213
SEQ ID NO:18	229665	PR00648D	GPR3 ORPHAN RECEPTOR SIGNATURE	1146	1233	yes
SEQ ID NO:19	229779	PR00537C	MU OPIOID RECEPTOR SIGNATURE	1348	1216	yes
SEQ ID NO:20	240829	PR00856I	PROSTACYCLIN (PROSTANOID IP) RECEPTOR SIGNATU	1131	1273	yes
SEQ ID NO:21	341490	PR00531A	HISTAMINE H2 RECEPTOR SIGNATURE	1183	1256	yes yes
SEQ ID NO:22	402456	PR00555F	ADENOSINE A3 RECEPTOR SIGNATURE	1259	1121
SEQ ID NO:23	420765	PR00536E	MELANOCYTE STIMULATING HORMONE RECEPTOR SIGNA	1313	1170
SEQ ID NO:24	481770	PR00558B	ALPHA-2A ADRENERGIC RECEPTOR SIGNATURE	1519	1108
SEQ ID NO:25	548654	PR00531A	HISTAMINE H2 RECEPTOR SIGNATURE	1183	1240	yes
SEQ ID NO:26	632097	PR00715E	CATION-DEPENDENT MANNOSE-6-PHOSPHATE RECEPTOR	1440	1300	yes yes
SEQ ID NO:27	647580	PR00586B	PROSTANOID EP4 RECEPTOR SIGNATURE	1452	1317	yes
SEQ ID NO:28	647628	PR00514D	5-HYDROXYTRYPTAMINE 1D RECEPTOR SIGNATURE	1252	1263	yes yes
SEQ ID NO:29	647931	PR00596D	URIDINE NUCLEOTIDE RECEPTOR SIGNATURE	1255	1221
SEQ ID NO:30	648153	PR00647I	SENR ORPHAN RECEPTOR SIGNATURE	1291	1213
SEQ ID NO:31	648838	PR00751E	THYROTROPHIN-RELEASING HORMONE RECEPTOR SIGNA	1433	1221
SEQ ID NO:32	649152	PR00554G	ADENOSINE A2B RECEPTOR SIGNATURE	1259	1078
SEQ ID NO:33	649682	PR00255B	NATRIURETIC PEPTIDE RECEPTOR SIGNATURE	1264	1213
SEQ ID NO:34	649917	PR00596D	URIDINE NUCLEOTIDE RECEPTOR SIGNATURE	1255	1154
SEQ ID NO:35	650726	PR00490C	SECRETIN RECEPTOR SIGNATURE	1238	1195
SEQ ID NO:36	652013	PR00491C	VASOACTIVE INTESTINAL PEPTIDE RECEPTOR SIGNAT	1121	1155	yes
SEQ ID NO:37	738964	PR00542F	MUSCARINIC M5 RECEPTOR SIGNATURE	1218	1185
SEQ ID NO:38	743323	PR00899G	FUNGAL PHEROMONE STE3 GPCR SIGNATURE	1132	1218
SEQ ID NO:39	753592	PR00587A	SOMATOSTATIN RECEPTOR TYPE 1 SIGNATURE	1312	1121
SEQ ID NO:40	797777	PR00512F	5-HYDROXYTRYPTAMINE 1A RECEPTOR SIGNATURE	1388	1257	yes
SEQ ID NO:41	885098	PR00642C	EDG1 ORPHAN RECEPTOR SIGNATURE	1193	1213
SEQ ID NO:42	947812	PR00636C	AT2 ANGIOTENSIN II RECEPTOR SIGNATURE	1317	1277
SEQ ID NO:43	948051	PR00554A	ADENOSINE A2B RECEPTOR SIGNATURE	1109	1255
SEQ ID NO:44	948581	PR00539E	MUSCARINIC M2 RECEPTOR SIGNATURE	1322	1244	yes
SEQ ID NO:45	948700	PR00554C	ADENOSINE A2B RECEPTOR SIGNATURE	1189	1316
SEQ ID NO:46	948883	PR00531E	HISTAMINE H2 RECEPTOR SIGNATURE	1324	1241	yes
SEQ ID NO:47	948935	PR00663D	GALANIN RECEPTOR SIGNATURE	1168	1282
SEQ ID NO:48	949387	PR00639D	NEUROMEDIN B RECEPTOR SIGNATURE	1198	1216	yes
SEQ ID NO:49	951797	PR00900G	PHEROMONE A RECEPTOR SIGNATURE	996	1211	yes
SEQ ID NO:50	997947	PR00928E	GRAVES DISEASE CARRIER PROTEIN SIGNATURE	1410	1283	yes
SEQ ID NO:51	1212964	PR00564D	BURKITT'S LYMPHOMA RECEPTOR SIGNATURE	1295	1291	yes
SEQ ID NO:52	1214535	PR00527A	GASTRIN RECEPTOR SIGNATURE	1327	1185
SEQ ID NO:53	1219856	PR00587A	SOMATOSTATIN RECEPTOR TYPE 1 SIGNATURE	1312	1181
SEQ ID NO:54	1288503	PR00663G	GALANIN RECEPTOR SIGNATURE	1160	1452	yes yes
SEQ ID NO:55	1298179	PR00666C	PINEAL OPSIN SIGNATURE	1257	1296	yes
SEQ ID NO:56	1305513	PR00639D	NEUROMEDIN B RECEPTOR SIGNATURE	1198	1238	yes
SEQ ID NO:57	1318926	PR00639D	NEUROMEDIN B RECEPTOR SIGNATURE	1198	1299	yes
SEQ ID NO:58	1328744	PR00641A	EBI1 ORPHAN RECEPTOR SIGNATURE	1325	1267	yes yes
SEQ ID NO:59	1328845	PR00554D	ADENOSINE A2B RECEPTOR SIGNATURE	1208	1221	yes yes
SEQ ID NO:60	1329044	PR00900G	PHEROMONE A RECEPTOR SIGNATURE	996	1235	yes yes
SEQ ID NO:61	1329081	PR00522G	CANNABINOID RECEPTOR TYPE 1 SIGNATURE	1341	1249	yes
SEQ ID NO:62	1329095	PR00639D	NEUROMEDIN B RECEPTOR SIGNATURE	1198	1137
SEQ ID NO:63	1329404	PR00637D	TYPE 3 BOMBESIN RECEPTOR SIGNATURE	1131	1246
SEQ ID NO:64	1329477	PR00517F	5-HYDROXYTRYPTAMINE 2C RECEPTOR SIGNATURE	1259	1310	yes
SEQ ID NO:65	1329584	PR00856I	PROSTACYCLIN (PROSTANOID IP) RECEPTOR SIGNATU	1131	1176
SEQ ID NO:66	1329652	PR00646B	RDC1 ORPHAN RECEPTOR SIGNATURE	1307	1226	yes yes
SEQ ID NO:67	1329778	PR00562F	BETA-2 ADRENERGIC RECEPTOR SIGNATURE	1360	1292	yes
SEQ ID NO:68	1329830	PR00642B	EDG1 ORPHAN RECEPTOR SIGNATURE	1218	1325	yes
SEQ ID NO:69	1329851	PR00582B	PROSTANOID EP3 RECEPTOR SIGNATURE	1750	1276	yes yes
SEQ ID NO:70	1329862	PR00580C	PROSTANOID EP1 RECEPTOR SIGNATURE	1278	1253	yes
SEQ ID NO:71	1329971	PR00547A	X OPIOID RECEPTOR SIGNATURE	1342	1233
SEQ ID NO:72	1329994	PR00564D	BURKITT'S LYMPHOMA RECEPTOR SIGNATURE	1295	1315	yes yes
SEQ ID NO:73	1329995	PR00490F	SECRETIN RECEPTOR SIGNATURE	1239	1275	yes
SEQ ID NO:74	1330007	PR00899K	FUNGAL PHEROMONE STE3 GPCR SIGNATURE	1057	1244	yes
SEQ ID NO:75	1330016	PR00514D	5-HYDROXYTRYPTAMINE 1D RECEPTOR SIGNATURE	1252	1255	yes
SEQ ID NO:76	1330023	PR00647I	SENR ORPHAN RECEPTOR SIGNATURE	1291	1258	yes
SEQ ID NO:77	1330061	PR00586H	PROSTANOID EP4 RECEPTOR SIGNATURE	1526	1219
SEQ ID NO:78	1330108	PR00424B	ADENOSINE RECEPTOR SIGNATURE	1339	1240	yes yes
SEQ ID NO:79	1330215	PR00642B	EDG1 ORPHAN RECEPTOR SIGNATURE	1218	1237	yes
SEQ ID NO:80	1330424	PR00248F	METABOTROPIC GLUTAMATE GPCR SIGNATURE	1498	1262	yes
SEQ ID NO:81	1330429	PR00568D	DOPAMINE D3 RECEPTOR SIGNATURE	1445	1226	yes yes
SEQ ID NO:82	1330478	PR00571G	ENDOTHELIN-B RECEPTOR SIGNATURE	1420	1253	yes
SEQ ID NO:83	1330641	PR00424F	ADENOSINE RECEPTOR SIGNATURE	1205	1241
SEQ ID NO:84	1330656	PR00554B	ADENOSINE A2B RECEPTOR SIGNATURE	1090	1227	yes
SEQ ID NO:85	1330683	PR00699F	C.ELEGANS INTEGRAL MEMBRANE PROTEIN SRG SIGNA	1214	1220	yes
SEQ ID NO:86	1330740	PR00639D	NEUROMEDIN B RECEPTOR SIGNATURE	1198	1209	yes
SEQ ID NO:87	1330847	PR00641F	EBI1 ORPHAN RECEPTOR SIGNATURE	1290	1260	yes yes
SEQ ID NO:88	1330861	PR00424B	ADENOSINE RECEPTOR SIGNATURE	1339	1310	yes
SEQ ID NO:89	1330882	PR00559C	ALPHA-2B ADRENERGIC RECEPTOR SIGNATURE	1284	1208
SEQ ID NO:90	1330907	PR00568A	DOPAMINE D3 RECEPTOR SIGNATURE	1427	1248	yes
SEQ ID NO:91	1330918	PR00908H	THROMBIN RECEPTOR SIGNATURE	1409	1300	yes yes
SEQ ID NO:92	1330930	PR00645I	LCR1 ORPHAN RECEPTOR SIGNATURE	1511	1272	yes yes
SEQ ID NO:93	1330957	PR00261E	LOW DENSITY LIPOPROTEIN (LDL) RECEPTOR SIGNAT	1459	1236	yes
SEQ ID NO:94	1330969	PR00515C	5-HYDROXYTRYPTAMINE 1F RECEPTOR SIGNATURE	1351	1264
SEQ ID NO:95	1331030	PR00715E	CATION-DEPENDENT MANNOSE-6-PHOSPHATE RECEPTOR	1440	1300	yes
SEQ ID NO:96	1331172	PR00667B	RETINAL PIGMENT EPITHELIUM-RETINAL GPCR SIGNA	1190	1237	yes
SEQ ID NO:97	1331278	PR00854B	PROSTAGLANDIN D RECEPTOR SIGNATURE	1257	1288	yes
SEQ ID NO:98	1331316	PR00596D	URIDINE NUCLEOTIDE RECEPTOR SIGNATURE	1255	1294
SEQ ID NO:99	1331330	PR00699E	C.ELEGANS INTEGRAL MEMBRANE PROTEIN SRG SIGNA	1137	1196	yes
SEQ ID NO:100	1331371	PR00240D	ALPHA-1A ADRENERGIC RECEPTOR SIGNATURE	1470	1238	yes
SEQ ID NO:101	1331411	PR00542F	MUSCARINIC M5 RECEPTOR SIGNATURE	1218	1284	yes
SEQ ID NO:102	1331481	PR00641B	EBI1 ORPHAN RECEPTOR SIGNATURE	1354	1244
SEQ ID NO:103	1331917	PR00572B	INTERLEUKIN 8A RECEPTOR SIGNATURE	1120	1229	yes yes
SEQ ID NO:104	1332023	PR00240D	ALPHA-1A ADRENERGIC RECEPTOR SIGNATURE	1470	1307	yes yes
SEQ ID NO:105	1332138	PR00752F	VASOPRESSIN V1A RECEPTOR SIGNATURE	1304	1226	yes
SEQ ID NO:106	1332171	PR00715I	CATION-DEPENDENT MANNOSE-6-PHOSPHATE RECEPTOR	1392	1205
SEQ ID NO:107	1332391	PR00522G	CANNABINOID RECEPTOR TYPE 1 SIGNATURE	1341	1285	yes
SEQ ID NO:108	1332480	PR00643D	G1D ORPHAN RECEPTOR SIGNATURE	1317	1210
SEQ ID NO:109	1332803	PR00258D	SPERACT RECEPTOR SIGNATURE	1254	1230	yes
SEQ ID NO:110	1332830	PR00652F	5-HYDROXYTRYPTAMINE 7 RECEPTOR SIGNATURE	1488	1194
SEQ ID NO:111	1332955	PR00639D	NEUROMEDIN B RECEPTOR SIGNATURE	1198	1295	yes yes
SEQ ID NO:112	1332966	PR00574D	BLUE-SENSITIVE OPSIN SIGNATURE	1263	1300	yes
SEQ ID NO:113	1332981	PR00589E	SOMATOSTATIN RECEPTOR TYPE 3 SIGNATURE	1340	1253	yes
SEQ ID NO:114	1333006	PR00643H	G10D RECEPTOR SIGNATURE	1453	1285	yes
SEQ ID NO:115	1333107	PR00539E	MUSCARINIC M2 RECEPTOR SIGNATURE	1322	1371
SEQ ID NO:116	1333116	PR00568D	DOPAMINE D3 RECEPTOR SIGNATURE	1445	1206
SEQ ID NO:117	1333133	PR00643H	G10D ORPHAN RECEPTOR SIGNATURE	1453	1246
SEQ ID NO:118	1352448	PR00527I	GASTRIN RECEPTOR SIGNATURE	1633	1234	yes yes
SEQ ID NO:119	1385827	PR00539E	MUSCARINIC M2 RECEPTOR SIGNATURE	1322	1368	yes yes
SEQ ID NO:120	1385922	PR00571G	ENDOTHELIN-B RECEPTOR SIGNATURE	1420	1193
SEQ ID NO:121	1386485	PR00574D	BLUE-SENSITIVE OPSIN SIGNATURE	1263	1317
SEQ ID NO:122	1386553	PR00857C	MELATONIN RECEPTOR SIGNATURE	1472	1238	yes
SEQ ID NO:123	1386660	PR00647I	SENR ORPHAN RECEPTOR SIGNATURE	1291	1251
SEQ ID NO:124	1386859	PR00645G	LCR1 ORPHAN RECEPTOR SIGNATURE	1454	1230
SEQ ID NO:125	1387302	PR00592B	EXTRACELLULAR CALCIUM-SENSING RECEPTOR SIGNAT	1421	1203	yes
SEQ ID NO:126	1388063	PR00665G	OXYTOCIN RECEPTOR SIGNATURE	1246	1335	yes
SEQ ID NO:127	1422814	PR00647I	SENR ORPHAN RECEPTOR SIGNATURE	1291	1292
SEQ ID NO:128	1423820	PR00574D	BLUE-SENSITIVE OPSIN SIGNATURE	1263	1177	yes
SEQ ID NO:129	1429651	PR00554G	ADENOSINE A2B RECEPTOR SIGNATURE	1259	1275	yes yes
SEQ ID NO:130	1436525	PR00568D	DOPAMINE D3 RECEPTOR SIGNATURE	1445	1216	yes
SEQ ID NO:131	1453124	PR00649G	GPR6 ORPHAN RECEPTOR SIGNATURE	1103	1292	yes
SEQ ID NO:132	1460891	PR00854B	PROSTAGLANDIN D RECEPTOR SIGNATURE	1257	1245
SEQ ID NO:133	1465590	PR00539E	MUSCARINIC M2 RECEPTOR SIGNATURE	1322	1229	yes yes
SEQ ID NO:134	1466523	PR00560D	ALPHA-2C ADRENERGIC RECEPTOR SIGNATURE	1642	1319
SEQ ID NO:135	1466902	PR00240D	ALPHA-1A ADRENERGIC RECEPTOR SIGNATURE	1470	1271
SEQ ID NO:136	1468040	PR00537C	MU OPIOID RECEPTOR SIGNATURE	1348	1316	yes
SEQ ID NO:137	1480833	PR00571G	ENDOTHELIN-B RECEPTOR SIGNATURE	1420	1325
SEQ ID NO:138	1516908	PR00530I	HISTAMINE H1 RECEPTOR SIGNATURE	1295	1276	yes
SEQ ID NO:139	1518320	PR00343A	SELECTIN SUPERFAMILY COMPLEMENT-BINDING REPEA	1245	1412	yes
SEQ ID NO:140	1529624	PR00522G	CANNABINOID RECEPTOR TYPE 1 SIGNATURE	1341	1350
SEQ ID NO:141	1590311	PR00639D	NEUROMEDIN B RECEPTOR SIGNATURE	1198	1168
SEQ ID NO:142	1590335	PR00666B	PINEAL OPSIN SIGNATURE	1253	1224
SEQ ID NO:143	1590422	PR00240F	ALPHA-1A ADRENERGIC RECEPTOR SIGNATURE	1323	1226
SEQ ID NO:144	1590455	PR00637D	TYPE 3 BOMBESIN RECEPTOR SIGNATURE	1131	1182
SEQ ID NO:145	1590464	PR00637D	TYPE 3 BOMBESIN RECEPTOR SIGNATURE	1131	1268	yes
SEQ ID NO:146	1590496	PR00908H	THROMBIN RECEPTOR SIGNATURE	1409	1226	yes
SEQ ID NO:147	1590713	PR00641F	EBI1 ORPHAN RECEPTOR SIGNATURE	1290	1211
SEQ ID NO:148	1590769	PR00572B	INTERLEUKIN 8A RECEPTOR SIGNATURE	1120	1122
SEQ ID NO:149	1590779	PR00642C	EDG1 ORPHAN RECEPTOR SIGNATURE	1193	1246
SEQ ID NO:150	1590931	PR00646F	RDC1 ORPHAN RECEPTOR SIGNATURE	1188	1165
SEQ ID NO:151	1590958	PR00261C	LOW DENSITY LIPOPROTEIN (LDL) RECEPTOR SIGNAT	1576	1234
SEQ ID NO:152	1590973	PR00350C	VITAMIN D RECEPTOR SIGNATURE	1416	1283	yes yes
SEQ ID NO:153	1591090	PR00643G	G10D ORPHAN RECEPTOR SIGNATURE	1383	1292
SEQ ID NO:154	1591713	PR00553B	ADENOSINE A2A RECEPTOR SIGNATURE	1258	1292	yes yes
SEQ ID NO:155	1642794	PR00527B	GASTRIN RECEPTOR SIGNATURE	1431	1353
SEQ ID NO:156	1687080	PR00855A	PROSTAGLANDIN F RECEPTOR SIGNATURE	1361	1248	yes yes
SEQ ID NO:157	1722845	PR00248C	METABOTROPIC GLUTAMATE GPCR SIGNATURE	1402	1328	yes yes
SEQ ID NO:158	1732911	PR00587A	SOMATOSTATIN RECEPTOR TYPE 1 SIGNATURE	1312	1221
SEQ ID NO:159	1785913	PR00554C	ADENOSINE A2B RECEPTOR SIGNATURE	1189	1201
SEQ ID NO:160	1809069	PR00643G	G10D ORPHAN RECEPTOR SIGNATURE	1383	1367	yes yes
SEQ ID NO:161	1867626	PR00665F	OXYTOCIN RECEPTOR SIGNATURE	1290	1353	yes
SEQ ID NO:162	1880501	PR00715I	CATION-DEPENDENT MANNOSE-6-PHOSPHATE RECEPTOR	1392	1257	yes yes
SEQ ID NO:163	1881009	PR00531A	HISTAMINE H2 RECEPTOR SIGNATURE	1183	1269	yes
SEQ ID NO:164	1909132	PR00637D	TYPE 3 BOMBESIN RECEPTOR SIGNATURE	1131	1221	yes
SEQ ID NO:165	1955094	PR00580	PROSTANOID EP1 RECEPTOR SIGNATURE	1160	1197	yes
SEQ ID NO:166	1955688	PR00647I	SENR ORPHAN RECEPTOR SIGNATURE	1291	1193	yes
SEQ ID NO:167	1956694	PR00517F	5-HYDROXYTRYPTAMINE 2C RECEPTOR SIGNATURE	1259	1255
SEQ ID NO:168	1957189	PR00366A	ENDOTHELIN RECEPTOR SIGNATURE	1337	1232
SEQ ID NO:169	1957920	PR00350B	VITAMIN D RECEPTOR SIGNATURE	1494	1267
SEQ ID NO:170	1957977	PR00559C	ALPHA-2B ADRENERGIC RECEPTOR SIGNATURE	1284	1157	yes
SEQ ID NO:171	1958505	PR00571G	ENDOTHELIN-B RECEPTOR SIGNATURE	1420	1188
SEQ ID NO:172	1972687	PR00553B	ADENOSINE A2A RECEPTOR SIGNATURE	1258	1275	yes yes
SEQ ID NO:173	1975013	PR00554D	ADENOSINE A2B RECEPTOR SIGNATURE	1208	1312	yes
SEQ ID NO:174	2010369	PR00645I	LCR1 ORPHAN RECEPTOR SIGNATURE	1511	1277	yes yes
SEQ ID NO:175	2019581	PR00641A	EBI1 ORPHAN RECEPTOR SIGNATURE	1325	1178
SEQ ID NO:176	2022460	PR00574D	BLUE-SENSITIVE OPSIN SIGNATURE	1263	1279	yes
SEQ ID NO:177	2022624	PR00857C	MELATONIN RECEPTOR SIGNATURE	1472	1232	yes yes
SEQ ID NO:178	2022628	PR00244A	NEUROKININ RECEPTOR SIGNATURE	1316	1174
SEQ ID NO:179	2022630	PR00525E	DELTA OPIOID RECEPTOR SIGNATURE	1139	1261	yes
SEQ ID NO:180	2022631	PR00645I	LCR1 ORPHAN RECEPTOR SIGNATURE	1511	1299	yes yes
SEQ ID NO:181	2023275	PR00667B	RETINAL PIGMENT EPITHELIUM-RETINAL GPCR SIG	1190	1217
SEQ ID NO:182	2023747	PR00568D	DOPAMINE D3 RECEPTOR SIGNATURE	1445	1244
SEQ ID NO:183	2044305	PR00751B	THYROTROPHIN-RELEASING HORMONE RECEPTOR SIGNA	1443	1209	yes yes
SEQ ID NO:184	2069971	PR00644E	GPR ORPHAN RECEPTOR SIGNATURE	1453	1307	yes
SEQ ID NO:185	2070872	PR00245C	OLFACTORY RECEPTOR SIGNATURE	1364	1286	yes yes
SEQ ID NO:186	2072228	PR00553A	ADENOSINE A2A RECEPTOR SIGNATURE	1377	1218	yes
SEQ ID NO:187	2085633	PR00752E	VASOPRESSIN V1A RECEPTOR SIGNATURE	1193	1250	yes
SEQ ID NO:188	2088104	PR00641A	EBI1 ORPHAN RECEPTOR SIGNATURE	1325	1305	yes yes
SEQ ID NO:189	2091133	PR00562C	BETA-2 ADRENERGIC RECEPTOR SIGNATURE	1457	1302
SEQ ID NO:190	2123514	PR00587A	SOMATOSTATIN RECEPTOR TYPE 1 SIGNATURE	1312	1252	yes
SEQ ID NO:191	2150261	PR00176C	SODIUM/NEUROTRANSMITTER SYMPORTER SIGNATURE	1414	1576	yes yes
SEQ ID NO:192	2170670	PR00896H	VASOPRESSIN RECEPTOR SIGNATURE	1331	1271	yes yes
SEQ ID NO:193	2199484	PR00559B	ALPHA-2B ADRENERGIC RECEPTOR SIGNATURE	1285	1195	yes
SEQ ID NO:194	2204242	PR00896H	VASOPRESSIN RECEPTOR SIGNATURE	1331	1305
SEQ ID NO:195	2236316	PR00554B	ADENOSINE A2B RECEPTOR SIGNATURE	1090	1192	yes
SEQ ID NO:196	2237722	PR00855H	PROSTAGLANDIN F RECEPTOR SIGNATURE	1467	1218
SEQ ID NO:197	2238625	PR00585A	PROSTANOID EP3 RECEPTOR TYPE 3 SIGNATURE	1230	1237	yes
SEQ ID NO:198	2242277	PR00255B	NATRIURETIC PEPTIDE RECEPTOR SIGNATURE	1264	1180	yes
SEQ ID NO:199	2244782	PR00590A	SOMATOSTATIN RECEPTOR TYPE 4 SIGNATURE	1253	1260
SEQ ID NO:200	2272244	PR00568D	DOPAMINE D3 RECEPTOR SIGNATURE	1445	1365	yes yes
SEQ ID NO:201	2284108	PR00652F	5-HYDROXYTRYPTAMINE 7 RECEPTOR SIGNATURE	1488	1317	yes yes
SEQ ID NO:202	2287109	PR00641H	EBI1 ORPHAN RECEPTOR SIGNATURE	1259	1305
SEQ ID NO:203	2289873	PR00596D	URIDINE NUCLEOTIDE RECEPTOR SIGNATURE	1255	1254	yes yes
SEQ ID NO:204	2375491	PR00592B	EXTRACELLULAR CALCIUM-SENSING RECEPTOR SIGNAT	1421	1286
SEQ ID NO:205	2376547	PR00539E	MUSCARINIC M2 RECEPTOR SIGNATURE	1322	1222
SEQ ID NO:206	2377774	PR00240F	ALPHA-1A ADRENERGIC RECEPTOR SIGNATURE	1323	1215	yes yes
SEQ ID NO:207	2378093	PR00854A	PROSTAGLANDIN D RECEPTOR SIGNATURE	1169	1350	yes yes
SEQ ID NO:208	2378367	PR00571A	ENDOTHELIN-B RECEPTOR SIGNATURE	1357	1278	yes
SEQ ID NO:209	2378405	PR00647I	SENR ORPHAN RECEPTOR SIGNATURE	1291	1195
SEQ ID NO:210	2378406	PR00555C	ADENOSINE A3 RECEPTOR SIGNATURE	1332	1369
SEQ ID NO:211	2381364	PR00666B	PINEAL OPSIN SIGNATURE	1253	1249	yes
SEQ ID NO:212	2381732	PR00663D	GALANIN RECEPTOR SIGNATURE	1168	1242
SEQ ID NO:213	2383045	PR00522C	CANNABINOID RECEPTOR TYPE 1 SIGNATURE	1317	1223
SEQ ID NO:214	2470285	PR00373D	GLYCOPROTEIN HORMONE RECEPTOR SIGNATURE	1458	1537	yes yes
SEQ ID NO:215	2488060	PR00558G	ALPHA-2A ADRENERGIC RECEPTOR SIGNATURE	1396	1282	yes yes
SEQ ID NO:216	2503084	PR00752E	VASOPRESSIN V1A RECEPTOR SIGNATURE	1193	1235	yes
SEQ ID NO:217	2511221	PR00565B	DOPAMINE 1A RECEPTOR SIGNATURE	1289	1266	yes yes
SEQ ID NO:218	2512109	PR00636B	AT2 ANGIOTENSIN II RECEPTOR SIGNATURE	1305	1267
SEQ ID NO:219	2553280	PR00530I	HISTAMINE H1 RECEPTOR SIGNATURE	1295	1275	yes
SEQ ID NO:220	2603450	PR00373D	GLYCOPROTEIN HORMONE RECEPTOR SIGNATURE	1458	1419	yes yes
SEQ ID NO:221	2605934	PR00539C	MUSCARINIC M2 RECEPTOR SIGNATURE	1365	1286	yes
SEQ ID NO:222	2674641	PR00752E	VASOPRESSIN V1A RECEPTOR SIGNATURE	1193	1228	yes yes
SEQ ID NO:223	2681738	PR00258C	SPERACT RECEPTOR SIGNATURE	1220	1307	yes yes
SEQ ID NO:224	2723293	PR00554B	ADENOSINE A2B RECEPTOR SIGNATURE	1090	1240	yes
SEQ ID NO:225	2762348	PR00560D	ALPHA-2C ADRENERGIC RECEPTOR SIGNATURE	1642	1284	yes yes
SEQ ID NO:226	2773609	PR00854B	PROSTAGLANDIN D RECEPTOR SIGNATURE	1257	1269
SEQ ID NO:227	2776266	PR00854B	PROSTAGLANDIN D RECEPTOR SIGNATURE	1257	1269
SEQ ID NO:228	2777115	PR00527A	GASTRIN RECEPTOR SIGNATURE	1327	1286
SEQ ID NO:229	2812882	PR00566B	DOPAMINE 1B RECEPTOR SIGNATURE	1121	1215	yes
SEQ ID NO:230	2821121	PR00560D	ALPHA-2C ADRENERGIC RECEPTOR SIGNATURE	1642	1251	yes yes
SEQ ID NO:231	2848989	PR00561F	BETA-1 ADRENERGIC RECEPTOR SIGNATURE	1445	1293
SEQ ID NO:232	2854471	PR00643C	G1D ORPHAN RECEPTOR SIGNATURE	1286	1051
SEQ ID NO:233	2854670	PR00571A	ENDOTHELIN-B RECEPTOR SIGNATURE	1357	1222	yes
SEQ ID NO:234	2855520	PR00590C	SOMATOSTATIN RECEPTOR TYPE 4 SIGNATURE	1325	1255
SEQ ID NO:235	2855815	PR00558G	ALPHA-2A ADRENERGIC RECEPTOR SIGNATURE	1396	1184	yes
SEQ ID NO:236	2857653	PR00854B	PROSTAGLANDIN D RECEPTOR SIGNATURE	1257	1302	yes yes
SEQ ID NO:237	2866122	PR00350C	VITAMIN D RECEPTOR SIGNATURE	1230	1214	yes yes
SEQ ID NO:238	2925464	PR00585A	PROSTANOID EP3 RECEPTOR TYPE 3 SIGNATURE	1230	1214	yes yes
SEQ ID NO:239	2954714	PR00241C	ANGIOTENSIN II RECEPTOR SIGNATURE	1246	1213
SEQ ID NO:240	2986560	PR00855A	PROSTAGLANDIN F RECEPTOR SIGNATURE	1361	1330
SEQ ID NO:241	3068234	PR00642D	EDG1 ORPHAN RECEPTOR SIGNATURE	1208	1319	yes yes
SEQ ID NO:242	3077943	PR00647I	SENR ORPHAN RECEPTOR SIGNATURE	1291	1268
SEQ ID NO:243	3144006	PR00715G	CATION DEPENDENT MANNOSE-6-PHOSPHATE RECEPTOR	1366	1247
SEQ ID NO:244	3226980	PR00527A	GASTRIN RECEPTOR SIGNATURE	1327	1222	yes
SEQ ID NO:245	3290614	PR00522G	CANNABINOID RECEPTOR TYPE 1 SIGNATURE	1341	1263
SEQ ID NO:246	3291235	PR00240D	ALPHA-1A ADRENERGIC RECEPTOR SIGNATURE	1470	1269	yes
SEQ ID NO:247	3324775	PR00490F	SECRETIN RECEPTOR SIGNATURE	1239	1217	yes
SEQ ID NO:248	3333159	PR00581A	PROSTANOID EP2 RECEPTOR SIGNATURE	1113	1243	yes
SEQ ID NO:249	3393404	PR00343A	SELECTIN SUPERFAMILY COMPLEMENT-BINDING REPEA	1245	1606	yes yes
SEQ ID NO:250	3429408	PR00643G	G10D ORPHAN RECEPTOR SIGNATURE	1383	1204
SEQ ID NO:251	3447545	PR00642E	EDG1 ORPHAN RECEPTOR SIGNATURE	1216	1341
SEQ ID NO:252	3486012	PR00343C	SELECTIN SUPERFAMILY COMPLEMENT-BINDING REPEA	1254	1343
SEQ ID NO:253	3513925	PR00261C	LOW DENSITY LIPOPROTEIN (LDL) RECEPTOR SIGNAT	1576	1256
SEQ ID NO:254	3542591	PR00573C	INTERLEUKIN 8B RECEPTOR SIGNATURE	1042	1231	yes
SEQ ID NO:255	3556218	PR00373F	GLYCOPROTEIN HORMONE RECEPTOR SIGNATURE	1570	1293	yes
SEQ ID NO:256	3665056	PR00366A	ENDOTHELIN RECEPTOR SIGNATURE	1337	1251	yes
SEQ ID NO:257	4043152	PR00514D	5-HYDROXYTRYPTAMINE 1D RECEPTOR SIGNATURE	1252	1304	yes
SEQ ID NO:258	4080842	PR00424B	ADENOSINE RECEPTOR SIGNATURE	1339	1117
SEQ ID NO:259	4081268	PR00564G	BURKITT'S LYMPHOMA RECEPTOR SIGNATURE	1305	1216	yes
SEQ ID NO:260	4082591	PR00640F	GASTRIN-RELEASING PEPTIDE RECEPTOR SIGNATURE	1243	1311
SEQ ID NO:261	4084463	PR00539C	MUSCARINIC M2 RECEPTOR SIGNATURE	1365	1258	yes
SEQ ID NO:262	4085069	PR00663E	GALANIN RECEPTOR SIGNATURE	1216	1210
SEQ ID NO:263	4129226	PR00250F	FUNGAL PHEROMONE MATING FACTOR STE2 GPCR SIGN	1166	1178
SEQ ID NO:264	4129407	PR00564D	BURKITT'S LYMPHOMA RECEPTOR SIGNATURE	1295	1222
SEQ ID NO:265	4130289	PR00539E	MUSCARINIC M2 RECEPTOR SIGNATURE	1322	1236	yes yes
SEQ ID NO:266	4130748	PR00751E	THYROTROPHIN-RELEASING HORMONE RECEPTOR SIGNA	1433	1245
SEQ ID NO:267	4131249	PR00240F	ALPHA-1A ADRENERGIC RECEPTOR SIGNATURE	1323	1305	yes
SEQ ID NO:268	4132125	PR00581E	PROSTANOID EP2 RECEPTOR SIGNATURE	1195	1157
SEQ ID NO:269	4132371	PR00350C	VITAMIN D RECEPTOR SIGNATURE	1416	1321	yes
SEQ ID NO:270	4132403	PR00248E	METABOTROPIC GLUTAMATE GPCR SIGNATURE	1306	1167
SEQ ID NO:271	4132547	PR00559E	ALPHA-2B ADRENERGIC RECEPTOR SIGNATURE	1546	1214
SEQ ID NO:272	4133631	PR00928F	GRAVES DISEASE CARRIER PROTEIN SIGNATURE	1555	1238
SEQ ID NO:273	4166159	PR00855H	PROSTAGLANDIN F RECEPTOR SIGNATURE	1467	1236
SEQ ID NO:274	4167883	PR00537C	MU OPIOID RECEPTOR SIGNATURE	1348	1185
SEQ ID NO:275	4220523	PR00537C	MU OPIOID RECEPTOR SIGNATURE	1348	1227	yes
SEQ ID NO:276	4220713	PR00366G	ENDOTHELIN RECEPTOR SIGNATURE	1420	1276
SEQ ID NO:277	4220819	PR00855H	PROSTAGLANDIN F RECEPTOR SIGNATURE	1467	1263
SEQ ID NO:278	4220939	PR00571A	ENDOTHELIN-B RECEPTOR SIGNATURE	1357	1235	yes
SEQ ID NO:279	4221286	PR00574D	BLUE-SENSITIVE OPSIN SIGNATURE	1263	1254	yes
SEQ ID NO:280	4221314	PR00530I	HISTAMINE H1 RECEPTOR SIGNATURE	1295	1263	yes yes
SEQ ID NO:281	4222520	PR00350C	VITAMIN D RECEPTOR SIGNATURE	1416	1202
SEQ ID NO:282	4223468	PR00592A	EXTRACELLULAR CALCIUM-SENSING RECEPTOR SIGNAT	1379	1206
SEQ ID NO:283	4223734	PR00652F	5-HYDROXYTRYPTAMINE 7 RECEPTOR SIGNATURE	1488	1213
SEQ ID NO:284	4224867	PR00244E	NEUROKININ RECEPTOR SIGNATURE	1282	1266
SEQ ID NO:285	4256014	PR00665F	OXYTOCIN RECEPTOR SIGNATURE	1290	1353	yes
SEQ ID NO:286	4352201	PR00643H	G10D ORPHAN RECEPTOR SIGNATURE	1453	1297
SEQ ID NO:287	4355247	PR00536E	MELANOCYTE STIMULATING HORMONE RECEPTOR SIGNA	1313	1353	yes
SEQ ID NO:288	4608111	PR00596A	URIDINE NUCLEOTIDE RECEPTOR SIGNATURE	1217	1273	yes yes
SEQ ID NO:289	319589	PR00635F	AT1 ANGIOTENSIN II RECEPTOR SIGNATURE	1424	1398	yes yes
SEQ ID NO:290	884692	PR00255B	NATRIURETIC PEPTIDE RECEPTOR SIGNATURE	1264	1239	yes
SEQ ID NO:291	1262948	PR00248C	METABOTROPIC GLUTAMATE GPCR SIGNATURE	1402	1369	yes yes
SEQ ID NO:292	1876370	PR00343A	SELECTIN SUPERFAMILY COMPLEMENT-BINDING REPEA	1245	1616	yes yes
SEQ ID NO:293	2088868	PR00522G	CANNABINOID RECEPTOR TYPE 1 SIGNATURE	1341	1329	yes yes
SEQ ID NO:294	3550808	PR00594B	P2U PURINOCEPTOR SIGNATURE	1452	1255	yes yes
SEQ ID NO:295	1328883	PR00169H	POTASSIUM CHANNEL SIGNATURE	1749	1204
SEQ ID NO:296	3458089	PR00169G	POTASSIUM CHANNEL SIGNATURE	1540	1390	yes no
SEQ ID NO:297	1329138	PR00168F	SLOW VOLTAGE-GATED POTASSIUM CHANNEL SIGNATUR	1307	1176
SEQ ID NO:298	1514470	PR00944B	COPPER ION BINDING PROTEIN SIGNATURE	1319	1255	no yes
SEQ ID NO:299	1513293	PR00944B	COPPER ION BINDING PROTEIN SIGNATURE
SEQ ID NO:300	1514470	PR00944B	COPPER ION BINDING PROTEIN SIGNATURE
SEQ ID NO:301	1514470H1	PR00944B	COPPER ION BINDING PROTEIN SIGNATURE
SEQ ID NO:302	3372628	PR00944B	COPPER ION BINDING PROTEIN SIGNATURE
SEQ ID NO:303	4970006	PR00944B	COPPER ION BINDING PROTEIN SIGNATURE	1319	1255	no no
SEQ ID NO:304	4970006H1	PR00944B	COPPER ION BINDING PROTEIN SIGNATURE
SEQ ID NO:305	4970006FG	PR00944B	COPPER ION BINDING PROTEIN SIGNATURE
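
For readers who want to work with Table I programmatically, the following is a minimal sketch of how a single row might be split into its fields. It is not part of the described method. The names `TableIRow`, `ROW`, and `parse_row` are hypothetical, and the sketch assumes each row has the form: SEQ ID NO, Incyte clone number, PRINT ID, description, then optionally a strength, a score, and up to two yes/no flags. Because a single flag cannot be assigned to the TM or signal peptide column from the row text alone, the flags are kept as an unlabelled tuple.

```python
import re
from typing import NamedTuple, Optional, Tuple


class TableIRow(NamedTuple):
    """One row of Table I (hypothetical container, for illustration only)."""
    seq_id: int
    clone: str
    print_id: str
    description: str
    strength: Optional[int]
    score: Optional[int]
    flags: Tuple[str, ...]  # TM / signal peptide flags, in the order given


# Assumed row layout; whitespace may be spaces or tabs.
ROW = re.compile(
    r"SEQ ID NO:(?P<seq>\d+)\s+"
    r"(?P<clone>\S+)\s+"
    r"(?P<print_id>PR\d+[A-Z]?)\s+"
    r"(?P<desc>.+?)"  # lazy, so trailing numbers go to strength/score
    r"(?:\s+(?P<strength>\d+)\s+(?P<score>\d+)(?P<flags>(?:\s+(?:yes|no))*))?"
    r"\s*$"
)


def parse_row(text: str) -> Optional[TableIRow]:
    """Parse one Table I row; return None if the text does not fit the layout."""
    m = ROW.match(text.strip())
    if m is None:
        return None
    has_scores = m["strength"] is not None
    return TableIRow(
        seq_id=int(m["seq"]),
        clone=m["clone"],
        print_id=m["print_id"],
        description=m["desc"],
        strength=int(m["strength"]) if has_scores else None,
        score=int(m["score"]) if has_scores else None,
        flags=tuple(m["flags"].split()) if has_scores else (),
    )


if __name__ == "__main__":
    row = parse_row(
        "SEQ ID NO:7 222748 PR00635A AT1 ANGIOTENSIN II RECEPTOR SIGNATURE 1280 1356 yes yes"
    )
    print(row)
```

The description is matched lazily so that digits inside a description (for example "TYPE 3 BOMBESIN RECEPTOR SIGNATURE") are not mistaken for the strength or score, and rows that list no strength or score (such as SEQ ID NO:299) parse with those fields set to `None`.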


* * * * *