U.S. patent application number 10/313542, for a composition for detection of genes encoding membrane-associated proteins, was filed with the patent office on 2002-12-05 and published on 2003-06-26.
This patent application is currently assigned to Incyte Genomics, Inc. Invention is credited to Janice Au-Young, Karl J. Guegler, and Roopa Reddy.
Application Number | 20030120057 10/313542 |
Document ID | / |
Family ID | 26816220 |
Filed Date | 2002-12-05 |
United States Patent Application | 20030120057 |
Kind Code | A1 |
Reddy, Roopa ; et al. | June 26, 2003 |
Composition for detection of genes encoding membrane-associated proteins
Abstract
The present invention relates to a composition comprising a
plurality of polynucleotide sequences. The composition can be used
as probes or array elements.
Inventors: |
Reddy, Roopa; (Fremont,
CA) ; Guegler, Karl J.; (Menlo Park, CA) ;
Au-Young, Janice; (Brisbane, CA) |
Correspondence
Address: |
INCYTE GENOMICS, INC.
3160 PORTER DRIVE
PALO ALTO
CA
94304
US
|
Assignee: |
Incyte Genomics, Inc.
Palo Alto
CA
|
Family ID: |
26816220 |
Appl. No.: |
10/313542 |
Filed: |
December 5, 2002 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
10313542 | Dec 5, 2002 | |
09495050 | Jan 31, 2000 | 6492505 |
60118318 | Feb 1, 1999 | |
Current U.S. Class: | 536/23.5 ; 536/24.3 |
Current CPC Class: | Y02A 50/30 20180101; C12Q 2600/158 20130101; C12Q 1/6876 20130101; C12Q 1/6834 20130101 |
Class at Publication: | 536/23.5 ; 536/24.3 |
International Class: | C07H 021/04 |
Claims
What is claimed is:
1. A composition comprising a plurality of polynucleotide sequences
comprising at least a fragment of a sequence selected from the
group consisting of SEQ ID NOs:1-305.
2. The composition of claim 1, wherein each of said polynucleotide
sequences comprises at least a fragment of a sequence encoding a
membrane-associated protein.
3. The composition of claim 1, wherein each of said polynucleotide
sequences comprises at least a fragment of a sequence encoding a
receptor.
4. The composition of claim 1, wherein each of said polynucleotide
sequences comprises at least a fragment of a sequence encoding an
ion channel.
5. The composition of claim 1, wherein each of said polynucleotide
sequences comprises at least a fragment of a sequence selected from
SEQ ID NOs:1-288.
6. The composition of claim 1, wherein said polynucleotide
sequences comprise at least a fragment of a sequence selected from
SEQ ID NOs:289-294.
7. The composition of claim 1, wherein said polynucleotide
sequences comprise at least a fragment of a sequence selected from
SEQ ID NOs:295-305.
8. The composition of claim 7, wherein the fragment is selected
from the group consisting of: (a) SEQ ID NOs:295-297; or (b) SEQ ID
NOs:298-305.
9. The composition of claim 1, wherein the polynucleotide sequence
is a probe.
10. The composition of claim 9, wherein the polynucleotide sequence
is immobilized on a substrate.
11. The composition of claim 1, wherein the polynucleotide sequence
is an array element.
Description
[0001] This application claims the benefit of U.S. Provisional
Application No. 60/118,318 (our Docket No. PA-0013 P), filed on Feb.
1, 1999.
FIELD OF THE INVENTION
[0002] The present invention relates to a composition comprising a
plurality of polynucleotide sequences for use in research and
diagnostic applications.
BACKGROUND OF THE INVENTION
[0003] DNA-based arrays can provide a simple way to explore the
expression of a single polymorphic gene or a large number of genes.
When the expression of a single gene is explored, DNA-based arrays
are employed to detect the expression of specific gene variants.
For example, a p53 tumor suppressor gene array may be used to
determine whether individuals are carrying mutations that
predispose them to cancer. The array has over 50,000 DNA targets to
analyze more than 400 distinct mutations of p53. A cytochrome p450
gene array is useful to determine whether individuals have one of a
number of specific mutations that could result in increased drug
metabolism, drug resistance, or drug toxicity.
[0004] DNA-based array technology is especially relevant to screen
expression of a large number of genes rapidly. There is a growing
awareness that gene expression is affected in a global fashion and
that genetic predisposition, disease, or therapeutic treatment may
affect, directly or indirectly, the expression of a large number of
genes. In some cases the interactions may be expected, such as
where the genes are part of the same signaling pathway. In other
cases, such as when some of the genes participate in separate
signaling pathways, the interactions may be totally unexpected.
Therefore, DNA-based arrays can be used to investigate how genetic
predisposition, disease, or therapeutic treatment affect the
coregulation and expression of a large number of genes.
[0005] It would be advantageous to prepare DNA-based arrays that
can be used for monitoring the expression of a large number of
membrane-associated proteins. Proteins which span or are associated
with cell membranes include receptors, ion channels and symporters,
cytokines and their suppressors, monomeric or heterotrimeric G- and
ras-related proteins, lectins such as selectin, oncogenes and their
suppressors, and the like. Receptors include G protein coupled,
four transmembrane, and tyrosine kinase receptors. Some of these
proteins may span a cellular membrane and some may be secreted. The
secreted proteins typically include signal sequences that direct
them to their final cellular or extracellular destination.
[0006] The present invention provides for a composition comprising
a plurality of polynucleotide sequences for use in detecting
changes in expression of a large number of genes encoding proteins
which are membrane-associated proteins, receptors and ion channels.
Such a composition can be employed for the diagnosis or treatment
of any disease--a pancreatic disease, a cancer, an immunopathology,
a neuropathology and the like--where a defect in the expression of
a gene encoding membrane-associated proteins is involved.
SUMMARY OF THE INVENTION
[0007] In one aspect, the present invention provides a composition
comprising a plurality of polynucleotide sequences, wherein each of
said polynucleotide sequences comprises at least a fragment of a
gene encoding membrane-associated proteins, receptors and ion
channels.
[0008] In one preferred embodiment, the plurality of polynucleotide
sequences comprises at least a fragment of one or more of the
sequences, SEQ ID NOs:1-305, presented in the Sequence Listing. In
a second preferred embodiment, the composition comprises a
plurality of polynucleotide sequences comprising at least a
fragment of a gene encoding a membrane-associated protein. In a
third preferred embodiment, the composition comprises a plurality
of polynucleotide sequences comprising at least a fragment of a
gene encoding a receptor. In a fourth preferred embodiment, the
composition comprises a plurality of polynucleotide sequences
comprising at least a fragment of a gene encoding ion channels. In
a fifth preferred embodiment, the composition comprises a plurality
of polynucleotide sequences comprising at least a fragment of at
least one or more of the sequences of SEQ ID NOs:1-288. In a sixth
preferred embodiment, the composition comprises a plurality of
polynucleotide sequences comprising at least a fragment of at least
one or more of the sequences of SEQ ID NOs:289-294. In a seventh
preferred embodiment, the composition comprises a plurality of
polynucleotide sequences comprising at least a fragment of at least
one or more of the sequences of SEQ ID NOs:295-305. In one aspect,
the fragment is selected from the group consisting of SEQ ID
NOs:295-297, or SEQ ID NOs:298-305. In an eighth preferred
embodiment, the composition is a polynucleotide probe. In one
aspect, the composition is immobilized on a substrate. In a ninth
preferred embodiment, the composition is a hybridizable array
element.
[0009] The composition, a hybridizable array element, is useful to
monitor the expression of a plurality of expressed polynucleotides.
The microarray is used in the diagnosis and treatment of a
pancreatic disease, a cancer, an immunopathology, a neuropathology,
and the like.
[0010] In another aspect, the present invention provides an
expression profile that can reflect the expression levels of a
plurality of polynucleotide sequences in a sample. The expression
profile comprises a microarray and a plurality of detectable
complexes. Each detectable complex is formed by hybridization of at
least one probe polynucleotide sequence to at least one target
polynucleotide sequence and further comprises a labeling moiety for
detection.
DESCRIPTION OF THE SEQUENCE LISTING, FIGURES, AND TABLES
[0011] A portion of the disclosure of this patent document contains
material which is subject to copyright protection. The copyright
owner has no objection to the facsimile reproduction by anyone of
the patent document or the patent disclosure, as it appears in the
Patent and Trademark Office patent file or records, but otherwise
reserves all copyright rights whatsoever.
[0012] The Sequence Listing is a compilation of nucleotide
sequences obtained by sequencing and assembling clone inserts
(isolates) from various cDNA libraries. Each sequence is identified
by a sequence identification number (SEQ ID NO:) and by clone
number.
[0013] FIGS. 1A and 1B are an alignment of SEQ ID NOs:298-302
produced using GELVIEW Fragment Assembly System software (Genetics
Computer Group (GCG), Madison Wis.).
[0014] FIGS. 2A and 2B are an alignment of SEQ ID NOs:303-305
produced using GELVIEW Fragment Assembly System software (GCG).
[0015] Table 1 is a list of the sequences disclosed herein. By
column, the table contains: 1) SEQ ID NO: as shown in the Sequence
Listing; 2) Incyte Clone NO; 3) PRINT ID, designation of the
relevant PROSITE group; 4) PRINT DESCRIPTION; 5) PRINT STRENGTH,
the degree of correlation to the PROSITE group, >1300 is strong
and 1000 to 1300 is weak; 6) PRINT SCORE, where >1300 is strong
and 1000 to 1300 is suggestive; 7) TM, the presence of at least one
transmembrane domain; and 8) SIGNAL PEPTIDE, the presence of a
signal peptide. The table is arranged so that SEQ ID NOs:1-305
contain at least a fragment of a gene encoding a
membrane-associated protein, some of which are receptors, and some,
ion channels.
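The score thresholds described for Table 1 can be illustrated with a
short sketch. This is only an illustration of the stated cutoffs; the
function name and the "unclassified" label for scores below 1000 are
assumptions, since the text does not say how such scores are treated.

```python
def classify_print_value(value, middle_label="weak"):
    """Label a PRINT STRENGTH or PRINT SCORE value using the Table 1 cutoffs.

    The text calls the 1000 to 1300 band "weak" for PRINT STRENGTH and
    "suggestive" for PRINT SCORE, so the middle label is a parameter.
    """
    if value > 1300:
        return "strong"
    if 1000 <= value <= 1300:
        return middle_label
    # Assumption: values below 1000 receive no label in the text.
    return "unclassified"
```

For a PRINT SCORE column, the same function could be called with
middle_label="suggestive".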
DESCRIPTION OF THE INVENTION
[0016] Definitions
[0017] The term "microarray" refers to an ordered arrangement of
hybridizable array elements. The elements are arranged so that
there are preferably at least one or more different elements, more
preferably at least 100 elements, even more preferably at least
1,000 elements, and most preferably at least 10,000 elements on a
one cm² substrate surface. The maximum number of array
elements is unlimited, but is at least 100,000. Furthermore, the
hybridization signal from each array element is individually
distinguishable. In a preferred embodiment, the array elements
comprise polynucleotide sequences.
[0018] A "polynucleotide" refers to a chain of nucleotides.
Preferably, the chain has from about five to 10,000 nucleotides,
more preferably from about 50 to 3,500 nucleotides. The term
"probe" refers to a polynucleotide sequence capable of hybridizing
with a target sequence to form a polynucleotide probe/target
complex under hybridization conditions. A "target polynucleotide"
refers to a chain of nucleotides to which a polynucleotide probe
can hybridize by base pairing. In some instances, the sequences
will be completely complementary (no mismatches) when aligned; in
others, there may be up to a 10% mismatch.
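As a rough illustration of the mismatch tolerance described above, the
sketch below compares a probe with the reverse complement of a target
of equal length and reports the fraction of mismatched positions. It
assumes an ungapped, full-length DNA alignment; the function names are
hypothetical and real hybridization behavior depends on far more than
a simple mismatch count.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def mismatch_fraction(probe, target):
    """Fraction of positions where the probe fails to base pair with the target.

    Assumes an ungapped alignment of two sequences of equal length.
    """
    if len(probe) != len(target):
        raise ValueError("probe and target must have equal length")
    expected = reverse_complement(target)
    mismatches = sum(1 for p, e in zip(probe.upper(), expected) if p != e)
    return mismatches / len(probe)


def within_tolerance(probe, target, max_mismatch=0.10):
    """True when the mismatch fraction does not exceed the 10% figure in the text."""
    return mismatch_fraction(probe, target) <= max_mismatch
```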
[0019] A "plurality" refers preferably to a group of at least one
or more members, more preferably to a group of at least about 100,
even more preferably to a group of at least about 1,000 members,
and most preferably to a group of at least about 10,000 members.
The maximum number of members is unlimited, but is at least about
100,000 members.
[0020] A "fragment" means a stretch of at least about 100
consecutive nucleotides. A "fragment" can also mean a stretch of at
least about 100 consecutive nucleotides that contains one or more
deletions, insertions or substitutions. A "fragment" can also
include the entire open reading frame of a gene. Preferred
fragments are those that lack secondary structure as identified by
using computer software programs such as OLIGO 4.06 Primer Analysis
software (National Biosciences, Plymouth Minn.), LASERGENE software
(DNASTAR, Madison Wis.), MACDNASIS (Hitachi Software Engineering
Co., Ltd., San Bruno Calif.) and the like.
[0021] The term "gene" or "genes" refers to a polynucleotide sequence
which may be partial or complete and may comprise regulatory,
untranslated, or coding regions. The phrase "genes encoding
membrane-associated proteins, receptors, or ion channels" refers to
genes comprising sequences that contain conserved protein motifs or
domains that were identified by BLAST (Basic Local Alignment Search
Tool; Altschul (1993) J Mol Evol 36:290-300; and Altschul et al.
(1990) J Mol Biol 215:403-410), PRINTS, or other analytical tools.
Additionally, "genes encoding membrane-associated proteins,
receptors, or ion channels" refers to genes which may produce
proteins which span the cell membrane or have signal sequences
which direct them to their final cellular or extracellular
destination.
[0022] The Invention
[0023] The present invention provides a composition comprising a
plurality of polynucleotide sequences comprising at least a
fragment of a gene encoding a protein which is a receptor, ion
channel, or associated with a cell membrane. Preferably, the
plurality of polynucleotide sequences comprise at least a fragment
of one or more of the sequences (SEQ ID NOs:1-305) presented in the
Sequence Listing. In one preferred embodiment, the composition
comprises a plurality of polynucleotide sequences, wherein each
sequence comprises at least a fragment of a sequence selected from
the group consisting of SEQ ID NOs:1-294. In a second preferred
embodiment, the composition comprises a plurality of polynucleotide
sequences, wherein each sequence comprises at least a fragment of a
sequence selected from the group consisting of SEQ ID
NOs:295-305.
[0024] A microarray can be used for large scale genetic or gene
expression analysis of a large number of polynucleotide sequences.
Such an analysis can be used in the diagnosis of diseases and in
the monitoring of treatments where altered expression of genes
encoding receptors, ion channels, or membrane-associated proteins
causes disease, such as pancreatic disease, cancer, an
immunopathology, neuropathology, and the like. Further, the
microarray can be employed to investigate an individual's
predisposition to a disease, such as pancreatic disease, cancer, an
immunopathology, or a neuropathology. Furthermore, the microarray
can be employed to investigate cellular responses to infection,
drug treatment, and the like.
[0025] When the composition of the invention is employed as
hybridizable array elements in a microarray, the array elements are
organized in an ordered fashion so that each element is present at
a specified location on the substrate. Because the array elements
are at specified locations on the substrate, the hybridization
patterns and intensities (which together create a unique expression
profile) can be interpreted in terms of expression levels of
particular genes and can be correlated with a particular disease or
condition or treatment.
[0026] The composition comprising a plurality of polynucleotide
sequences can also be used to purify a subpopulation of mRNAs,
cDNAs, genomic fragments, and the like, in a sample. Typically,
samples will include polynucleotides of interest and additional
nucleic acids which may contribute to background signal in a
hybridization. Therefore, it may be advantageous to remove these
additional nucleic acids before hybridization. One method for
removing the additional nucleic acids is to hybridize the sample
containing probe polynucleotides with immobilized polynucleotide
targets. Those nucleic acids which do not hybridize to the
polynucleotide targets are washed away. At a later point, the
immobilized target polynucleotides can be released in the form of
purified target polynucleotides.
[0027] Method for Selecting Polynucleotide Sequences
[0028] This section describes the selection of the plurality of
polynucleotide sequences. In one embodiment, the sequences are
selected based on the presence of shared signal sequence motifs.
For example, signal sequences generally contain 15 to 60 amino
acids and are located at the N-terminal end of the protein. The
signal sequence consists of three regions: 1) an n-region located
adjacent to the N-terminus which is composed of one to five amino
acids and usually carries a positive charge; 2) the h-region which
is composed of 7 to 15 hydrophobic amino acids and creates a
hydrophobic core; and 3) the c-region which is located between the
h-region and the cleavage site and is composed of three to seven
polar, but mostly uncharged, amino acids. The signal sequence is
removed from the protein during posttranslational processing by
cleavage at the cleavage site.
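The three-region description above can be expressed as a simple
checklist. The sketch below takes a proposed split of a signal
sequence into n-, h- and c-regions and tests each against the stated
length and composition ranges. The residue sets and the function name
are assumptions made for illustration, and the sketch does not predict
where the regions or the cleavage site fall.

```python
HYDROPHOBIC = set("AILMFVW")  # assumed set of hydrophobic residues
POSITIVE = set("KR")          # positively charged residues
NEGATIVE = set("DE")          # negatively charged residues


def check_signal_regions(n_region, h_region, c_region):
    """Check a proposed n/h/c split against the ranges given in the text."""
    n, h, c = n_region.upper(), h_region.upper(), c_region.upper()
    n_charge = sum(aa in POSITIVE for aa in n) - sum(aa in NEGATIVE for aa in n)
    c_charged = sum(aa in POSITIVE or aa in NEGATIVE for aa in c)
    return {
        "n_length": 1 <= len(n) <= 5,        # one to five amino acids
        "n_positive": n_charge > 0,          # usually carries a positive charge
        "h_length": 7 <= len(h) <= 15,       # 7 to 15 amino acids
        "h_hydrophobic": all(aa in HYDROPHOBIC for aa in h),
        "c_length": 3 <= len(c) <= 7,        # three to seven amino acids
        "c_mostly_uncharged": c_charged * 2 < len(c),
    }
```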
[0029] A transmembrane protein is characterized by a polypeptide
chain which is exposed on both sides of a membrane. The cytoplasmic
and extracellular domains are separated by at least one
membrane-spanning segment which traverses the hydrophobic
environment of the lipid bilayer. The membrane-spanning segment is
composed of amino acid residues with nonpolar side chains, usually
in the form of an α helix. Segments which contain about 20-30
hydrophobic residues are long enough to span a membrane as an
α helix, and they can often be identified by means of a
hydropathy plot.
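A hydropathy plot of the kind mentioned above can be sketched as a
sliding-window average over per-residue hydropathy values. The
Kyte-Doolittle scale, the 19-residue window and the 1.6 threshold used
here are common conventions and are assumptions of this sketch; the
text does not specify which scale or parameters were used.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not named in the text).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Mean hydropathy of every full window along the protein."""
    scores = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def has_candidate_membrane_segment(protein, window=19, threshold=1.6):
    """True when any window average exceeds the threshold."""
    return any(v > threshold for v in hydropathy_profile(protein, window))
```

Plotting the values returned by hydropathy_profile against window
position gives the hydropathy plot; peaks above the threshold mark
candidate membrane-spanning segments.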
[0030] Receptor sequences are recognized by one or more hydrophobic
transmembrane regions, cysteine disulfide bridges between
extracellular loops, an extracellular N-terminus, and a cytoplasmic
C-terminus. For example, in G protein-coupled receptors (GPCRs),
the N-terminus interacts with ligands, the disulfide bridge
interacts with agonists and antagonists, the second cytoplasmic
loop has a conserved, acidic-Arg-aromatic triplet which may
interact with the G proteins, and the large third intracellular
loop interacts with G proteins to activate second messengers such
as cyclic AMP, phospholipase C, inositol triphosphate, or ion
channel proteins (Watson and Arkinstall (1994) The G-protein Linked
Receptor Facts Book, Academic Press, San Diego Calif.). Other
exemplary classes of receptors such as the tetraspanins (Maecker et
al. (1997) FASEB J 11:428-442), calcium dependent receptors (Speiss
(1990) Biochem 29:10009-18) and the single transmembrane receptors
may be similarly characterized relative to their intracellular and
extracellular domains, known motifs, and interactions with other
molecules.
[0031] An ion channel is a transmembrane protein that forms a
hydrophilic pore through which ions can cross the lipid bilayer of
the membrane. An ion channel usually shows some degree of ion
specificity, and up to a million ions per second may flow down
their electrochemical gradients through the open pore. Ion channels
are gated and allow ions to pass only under defined circumstances.
Gated channels may be either voltage-gated, such as the sodium
channel of neurons, or ligand-gated, such as the acetylcholine
receptor of cholinergic synapses.
[0032] Membrane-associated proteins, receptors or ion channels may
act directly as inhibitors or as stimulators of cell proliferation,
growth, attachment, angiogenesis, and apoptosis, or indirectly by
modulating the effects of transcription factors, matrix and
adhesion molecules, cell cycle regulators, and other molecules in
cell signaling pathways. In addition, cell signaling molecules may
act as ligands or ligand cofactors for receptors which modulate
cell growth, proliferation, and differentiation. These molecules
may be identified by sequence homology to molecules whose function
has been characterized, and by the identification of their
conserved domains. Membrane-associated proteins, receptors or ion
channels may be characterized using programs such as BLAST, PRINTS,
or Hidden Markov Models (HMM). Fragments which include
characterized, conserved regions of membrane-associated proteins,
receptors, or ion channels may be used in hybridization
technologies to identify similar proteins.
[0033] A large number of clones from a variety of cDNA libraries
can be screened using software well known in the art to discover
sequences with conserved protein domains or motifs. Such sequences
may be screened using the BLOCK 2 Bioanalysis program (Incyte
Pharmaceuticals, Palo Alto Calif.), a motif analysis program based
on sequence information contained in the SWISSPROT and PROSITE
databases, which is useful for determining the function of
uncharacterized proteins translated from genomic or cDNA sequences
(Bairoch et al. (1997) Nucleic Acids Res 25:217-221; Attwood et al.
(1997) J Chem Inf Comput Sci 37:417-424). PROSITE is particularly
useful to identify functional or structural domains that cannot be
detected using common motifs because of extreme sequence
divergence. The method, which is based on weight matrices,
calibrates the motifs against the SWISS-PROT database to obtain a
measure of the chance distribution of the matches. Similarly,
databases such as PRINTS store conserved motifs useful in the
characterization of proteins (Attwood et al. (1998) Nucl Acids Res
26:304-308). These conserved motifs are used in the selection and
design of probes. The PRINTS database can be searched using the
BLIMPS search program. The PRINTS database of protein family
"fingerprints" complements the PROSITE database and utilizes groups
of conserved motifs within sequence alignments to build
characteristic signatures of different polypeptide families.
Alternatively, HMMs can be used to find shared motifs, specifically
consensus sequences (Pearson and Lipman (1988) Proc Natl Acad Sci
85:2444-2448; Smith and Waterman (1981) J Mol Biol 147:195-197).
Although HMMs were initially developed to examine speech
recognition patterns, they have been used in biology to analyze
protein and DNA sequences and to model protein structure. HMMs have
a formal probabilistic basis and use position-specific scores for
amino acids or nucleotides. The algorithms are flexible in that
they incorporate information from newly identified sequences to
build even more successful patterns. HMMs are useful to identify
the transmembrane regions and signal peptides.
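The idea of position-specific scores can be illustrated with a small
weight matrix built from aligned motif instances. This is a
deliberately simplified sketch: it is a log-odds weight matrix with a
uniform background and a pseudocount, not a hidden Markov model, and
it is not the method used by BLOCK 2, BLIMPS, PROSITE or PRINTS. All
names and parameters are assumptions.

```python
import math
from collections import Counter


def build_weight_matrix(aligned, alphabet, pseudocount=1.0):
    """Build per-position log2 odds scores from equal-length aligned motifs."""
    background = 1.0 / len(alphabet)  # assumed uniform background
    total = len(aligned) + pseudocount * len(alphabet)
    matrix = []
    for pos in range(len(aligned[0])):
        counts = Counter(seq[pos] for seq in aligned)
        matrix.append({
            symbol: math.log2(((counts[symbol] + pseudocount) / total) / background)
            for symbol in alphabet
        })
    return matrix


def best_match(matrix, sequence):
    """Return (start, score) of the highest-scoring window, or None if too short."""
    width = len(matrix)
    best = None
    for start in range(len(sequence) - width + 1):
        score = sum(matrix[i][sequence[start + i]] for i in range(width))
        if best is None or score > best[1]:
            best = (start, score)
    return best
```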
[0034] In another embodiment, the sequences disclosed in the
Sequence Listing can be searched against GenBank and SWISSPROT
databases using BLAST. Then, the descriptions of those sequences
with homology to the disclosed sequences may be scanned using
keywords such as receptor, transmembrane, channel,
oncogene, inhibitor, and the like.
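The keyword scan of hit descriptions might be sketched as follows. The
keyword list is taken from the paragraph above; the hit identifiers
and descriptions in the usage note are hypothetical, and running the
BLAST search itself is outside the scope of this sketch.

```python
KEYWORDS = ("receptor", "transmembrane", "channel", "oncogene", "inhibitor")


def matching_keywords(description, keywords=KEYWORDS):
    """Return the keywords found in a description, ignoring case."""
    text = description.lower()
    return [keyword for keyword in keywords if keyword in text]


def select_hits(descriptions):
    """Keep hits whose description mentions at least one keyword.

    descriptions maps a hit identifier to its free-text description.
    """
    selected = {}
    for hit_id, description in descriptions.items():
        found = matching_keywords(description)
        if found:
            selected[hit_id] = found
    return selected
```

For example, a description such as "Voltage-gated sodium channel alpha
subunit" would be kept because it contains "channel", while
"ribosomal protein L5" would be dropped.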
[0035] Sequences identified by the methods described above are
provided in SEQ ID NOs:1-305 in the Sequence Listing. Table 1
provides the annotation to the referenced PRINTS sequences and
specifies whether they possess transmembrane and signal peptide
motifs. The resulting composition can comprise polynucleotide
sequences that are not redundant, i.e., there is no more than one
polynucleotide sequence to represent a particular gene.
Alternatively, the composition can contain polynucleotide probes or
microarray elements that are redundant, i.e., a gene is represented
by more than one polynucleotide sequence.
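The distinction between redundant and non-redundant compositions can
be made concrete with a small sketch. It assumes that each sequence
has already been assigned to a gene by some upstream analysis; the
identifiers and gene labels are hypothetical.

```python
def one_per_gene(assignments):
    """Keep the first sequence seen for each gene (non-redundant composition).

    assignments is an iterable of (sequence_id, gene_label) pairs.
    """
    representative = {}
    for sequence_id, gene in assignments:
        # setdefault keeps the first sequence recorded for a gene.
        representative.setdefault(gene, sequence_id)
    return list(representative.values())
```

A redundant composition would simply retain every pair instead of
selecting one representative per gene.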
[0036] The selected polynucleotide sequences may be manipulated
further to optimize their performance as hybridization probes. To
optimize probe selection, the sequences are examined using
computer algorithms, which are well known in the art, to identify
fragments of genes without potential secondary structure. Such
computer algorithms are found in OLIGO 4.06 Primer Analysis
software (National Biosciences) or LASERGENE software (DNASTAR).
These programs can search nucleotide sequences to identify stem
loop structures and tandem repeats and to analyze G+C content of
the sequence (those sequences with a G+C content greater than 60%
are excluded). Alternatively, the probes can be optimized by trial
and error. Experiments can be performed to determine whether the
probes hybridize optimally to target sequences under experimental
conditions.
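Two of the screening criteria named above, G+C content and tandem
repeats, can be sketched in a few lines. This is not the algorithm
used by OLIGO or LASERGENE; the repeat unit length and copy number are
assumed parameters, and stem-loop detection is omitted.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def has_tandem_repeat(seq, unit_len=2, min_copies=4):
    """True if any unit of unit_len bases occurs min_copies times in a row."""
    seq = seq.upper()
    span = unit_len * min_copies
    for start in range(len(seq) - span + 1):
        unit = seq[start:start + unit_len]
        if seq[start:start + span] == unit * min_copies:
            return True
    return False


def passes_filters(seq, max_gc=0.60):
    """Exclude sequences with G+C content above 60% or with a tandem repeat."""
    return gc_fraction(seq) <= max_gc and not has_tandem_repeat(seq)
```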
[0037] Where the greatest numbers of different polynucleotide
sequences are desired, the sequences are extended to assure that
different polynucleotide sequences are not derived from the same
gene, i.e., the polynucleotide sequences are not redundant. The
probe sequences may be extended utilizing the partial nucleotide
sequences derived from clone isolates by employing methods well
known in the art. For example, one method which may be employed,
"restriction-site" PCR, uses universal primers to retrieve unknown
sequence adjacent to a known locus (Sarkar (1993) PCR Methods
Applic 2: 318-322).
[0038] Polynucleotide Sequences
[0039] This section describes the polynucleotide sequences. The
polynucleotide sequences can be genomic DNA, cDNA, mRNA, or any
RNA-like or DNA-like material, such as peptide nucleic acids,
branched DNAs, and the like. The polynucleotide sequences can be
sense or antisense, complementary sequences. Where target
polynucleotides are double stranded, the probes may be either sense
or antisense strands. Where the target polynucleotides are single
stranded, the probes are complementary single strands.
[0040] In one embodiment, the polynucleotide sequences are cDNAs,
the size of which may vary, and are preferably from 1000 to 10,000
nucleotides, more preferably from 150 to 5000 nucleotides. In a
second embodiment, the polynucleotide sequences are contained
within plasmids. In this case, the size of the inserted cDNA
sequence, excluding the vector DNA and its regulatory sequences,
may vary from about 50 to 12,000 nucleotides, more preferably from
about 150 to 5000 nucleotides.
[0041] The polynucleotide can be prepared by a variety of synthetic
or enzymatic schemes which are well known in the art. Sequences can
be synthesized, in whole or in part, using chemical or enzymatic
methods well known in the art (Caruthers et al. (1980) Nucl Acids
Symp Ser (7) 215-233; Ausubel et al. (1997) Short Protocols in
Molecular Biology, John Wiley & Sons, New York N.Y.).
[0042] Nucleotide analogues, which can base pair with the target
nucleotide sequences, can be incorporated into the probe sequences
by methods well known in the art. For example, certain guanine
nucleotides can be substituted with hypoxanthine which hydrogen
bonds with cytosine, but these bonds are less stable than those
formed between guanine and cytosine. Alternatively, adenine
nucleotides can be substituted with 2,6-diaminopurine which forms
stronger bonds with thymidine than those between adenine and
thymidine. Additionally, the polynucleotide sequences can include
nucleotides that have been derivatized chemically or enzymatically.
Typical chemical modifications include derivatization with acyl,
alkyl, aryl or amino groups.
[0043] The polynucleotide sequences can be immobilized on a
substrate. Preferred substrates are any suitable rigid or
semi-rigid support including membranes, filters, chips, slides,
wafers, fibers, magnetic or nonmagnetic beads, gels, tubing,
plates, polymers, microparticles and capillaries. The substrate can
have a variety of surface forms, such as wells, trenches, pins,
channels and pores, to which the polynucleotide sequences are
bound. Preferably, the substrates are optically transparent.
[0044] Sequences can be synthesized, in whole or in part, on the
surface of a substrate using a chemical coupling procedure and a
piezoelectric printing apparatus, such as that described in PCT
publication WO95/25116 (Baldeschwieler et al.). Alternatively, the
target can be synthesized on a substrate surface using a
self-addressable electronic device that controls when reagents are
added (Heller et al. U.S. Pat. No. 5,605,662).
[0045] Complementary DNA (cDNA) can be arranged and immobilized on
a substrate. The sequences can be immobilized by covalent means
such as by chemical bonding procedures or UV. In one such method, a
cDNA is bound to a glass surface which has been modified to contain
epoxide or aldehyde groups. In another case, a cDNA target is
placed on a polylysine coated surface and UV cross-linked (Shalon
et al. WO95/35505). In yet another method, a DNA is actively
transported from a solution to a given position on a substrate by
electrical means (U.S. Pat. No. 5,605,662). Alternatively,
individual DNA clones can be gridded on a filter. Cells are lysed,
proteins and cellular components degraded, and the DNA coupled to
the filter by UV cross-linking.
[0046] Furthermore, the sequences do not have to be directly bound
to the substrate, but rather can be bound to the substrate through
a linker group. The linker groups are typically about 6 to 50 atoms
long, and they provide exposure to the attached polynucleotide
sequence. Preferred linker groups include ethylene glycol
oligomers, diamines, diacids and the like. Reactive groups on the
substrate surface react with one of the terminal portions of the
linker to bind the linker to the substrate. The other terminal
portion of the linker is adapted to bind the polynucleotide
sequence.
[0047] The polynucleotide sequences can be attached to a substrate
by dispensing reagents for target synthesis on the substrate
surface or by dispensing preformed DNA fragments or clones on the
substrate surface. Typical dispensers include a micropipette
delivering solution to the substrate with a robotic system to
control the position of the micropipette with respect to the
substrate. There can be a multiplicity of dispensers so that
reagents can be delivered to the reaction regions
simultaneously.
[0048] Sample Preparation
[0049] In order to conduct sample analysis, a sample containing
nucleic acids is provided. The samples can be obtained from any
bodily fluid (blood, urine, saliva, phlegm, gastric juices, etc.),
cultured cells, biopsies, or other tissue preparations. DNA or RNA
can be isolated from the sample according to any of a number of
methods well known to those of skill in the art. For example,
methods of purification of nucleic acids are described in Tijssen
(1993; Laboratory Techniques in Biochemistry and Molecular Biology:
Hybridization With Nucleic Acid Probes, Part I. Theory and Nucleic
Acid Preparation, Elsevier Science, New York N.Y.). In one case,
total RNA is isolated using the TRIZOL reagent (Life Technologies,
Gaithersburg Md.), and mRNA is isolated using oligo d(T) column
chromatography or glass beads. Alternatively, when probe
polynucleotides are derived from an mRNA, the probe polynucleotides
can be DNA reverse transcribed from the mRNA, an RNA transcribed
from that cDNA, a DNA amplified from that DNA, an RNA transcribed
from the amplified DNA, and the like. When the target
polynucleotide is derived from cDNA, the target polynucleotide can
be DNA amplified from DNA or DNA reverse transcribed from RNA. In
yet another alternative, the polynucleotide sequences are prepared
by more than one method.
[0050] When polynucleotide sequences are amplified, it is desirable
to amplify the nucleic acid sample and maintain the relative
abundances represented in the original sample including low
abundance transcripts. Total mRNA can be amplified by reverse
transcription using a reverse transcriptase and a primer consisting
of oligo d(T) and a sequence encoding the phage T7 promoter to
provide a single stranded DNA template. The second DNA strand is
polymerized using a DNA polymerase and an RNase which assists in
breaking up the DNA/RNA hybrid. After synthesis of the double
stranded DNA, T7 RNA polymerase can be added, and RNA transcribed
from the second DNA strand template (Van Gelder et al. U.S. Pat.
No. 5,545,522). RNA can be amplified in vitro, in situ or in vivo
(Eberwine U.S. Pat. No. 5,514,545).
[0051] It is also advantageous to include quantitation controls
within the sample to assure that amplification and labeling
procedures do not change the true distribution of probe
polynucleotides in a sample. For this purpose, a sample is spiked
with a known amount of a control probe polynucleotide and the
composition of target polynucleotide sequences includes reference
target sequences which specifically hybridize with the control
probe polynucleotides. After hybridization and processing, the
hybridization signals obtained should reflect accurately the amount
of control probe polynucleotides added to the sample.
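By way of illustration only, the control check described above can be sketched as a least-squares fit, through the origin, of hybridization signal against the known spiked amount of each control probe. The function name and the example amounts and signals are hypothetical and are not taken from the specification.

```python
def control_linearity(spiked_amounts, signals):
    """Fit signal = slope * amount through the origin and return the slope
    together with the largest relative deviation of any control from the fit."""
    if len(spiked_amounts) != len(signals) or not spiked_amounts:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a <= 0 for a in spiked_amounts):
        raise ValueError("spiked amounts must be positive")
    sxx = sum(a * a for a in spiked_amounts)
    sxy = sum(a * s for a, s in zip(spiked_amounts, signals))
    slope = sxy / sxx
    if slope <= 0:
        raise ValueError("signals do not increase with spiked amount")
    worst = max(
        abs(s - slope * a) / (slope * a)
        for a, s in zip(spiked_amounts, signals)
    )
    return slope, worst


# Hypothetical control series: amounts spiked into the sample and signals read back.
amounts = [1.0, 2.0, 4.0, 8.0]
signals = [100.0, 200.0, 400.0, 800.0]
slope, worst = control_linearity(amounts, signals)
```

A small worst-case deviation suggests that amplification and labeling preserved the relative amounts of the control probes; the acceptable threshold is left to the practitioner.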
[0052] Prior to hybridization, it may be desirable to fragment the
probe polynucleotides. Fragmentation improves hybridization by
minimizing secondary structure and cross-hybridization to
polynucleotides in the sample with low or no complementarity.
Fragmentation can be performed by mechanical or chemical means.
[0053] The probe polynucleotides may be labeled with one or more
labeling moieties to allow for detection of hybridized probe/target
polynucleotide complexes. The labeling moieties can include
compositions that can be detected by spectroscopic, photochemical,
biochemical, bioelectronic, immunochemical, electrical, optical or
chemical means. The labeling moieties include radioisotopes, such
as .sup.32P, .sup.33P or .sup.35S, chemiluminescent compounds,
labeled binding proteins, heavy metal atoms, spectroscopic markers,
such as fluorescent markers and dyes, magnetic labels, linked
enzymes, mass spectrometry tags, spin labels, electron transfer
donors and acceptors, and the like.
[0054] Exemplary dyes include quinoline dyes, triarylmethane dyes,
phthaleins, azo dyes, cyanine dyes and the like. Preferably,
fluorescent markers absorb light above about 300 nm, preferably
above 400 nm, and usually emit light at wavelengths at least
10 nm greater than the wavelength of the light absorbed.
Preferred fluorescent markers include fluorescein, phycoerythrin,
rhodamine, lissamine, and Cy3 and Cy5 (Amersham Pharmacia Biotech,
Piscataway N.J.).
[0055] Labeling can be carried out during an amplification
reaction, such as polymerase chain and in vitro transcription
reactions, or by nick translation or 5' or 3'-end-labeling
reactions. In one case, labeled nucleotides are used in an in vitro
transcription reaction. When the label is incorporated after or
without an amplification step, the label is incorporated by using
terminal transferase or by kinasing the 5' end of the
polynucleotide sequence and then incubating overnight with a
labeled oligonucleotide in the presence of T4 RNA ligase.
[0056] Alternatively, the labeling moiety can be incorporated after
hybridization once a probe/target complex has formed. In one case,
biotin is first incorporated during an amplification step as
described above. After the hybridization reaction, unbound nucleic
acids are rinsed away so that the only biotin present is attached
to probe polynucleotides complexed with the target polynucleotides.
An avidin-conjugated fluorophore, such as avidin-phycoerythrin,
that binds with high affinity to biotin is added. In another case,
the labeling moiety is incorporated by intercalation into bound
probe/target complexes. In this case, an intercalating dye such as
a psoralen-linked dye can be employed.
[0057] Under some circumstances it may be advantageous to
immobilize the probe polynucleotides on a substrate and have the
polynucleotide targets bind to the immobilized probe
polynucleotides. In such cases the probe polynucleotides can be
attached to a substrate as described above.
[0058] Hybridization and Detection
[0059] Hybridization causes a denatured polynucleotide probe and a
denatured complementary target to form a stable duplex through base
pairing. Hybridization methods are well known to those skilled in
the art. (See, e.g., Ausubel, supra, units 2.8-2.11, 3.18-3.19 and
4.6-4.9.) Conditions can be selected for hybridization where
completely complementary probe and target can hybridize, i.e., each
base must pair with its complementary base.
Alternatively, conditions can be selected where probe and target
have mismatches but are still able to hybridize. Suitable
conditions can be selected, for example, by varying the
concentrations of salt in the prehybridization, hybridization, and
wash solutions or by varying the hybridization and wash
temperatures. With some membranes, the temperature can be decreased
by adding formamide to the prehybridization and hybridization
solutions.
[0060] Hybridization can be performed at low stringency with
buffers, such as 5.times.SSC with 1% sodium dodecyl sulfate (SDS)
at 60.degree. C., which permits hybridization between probe and
target sequences that contain some mismatches to form probe/target
complexes. Subsequent washes are performed at higher stringency
with buffers such as 0.2.times.SSC with 0.1% SDS at either
45.degree. C. (medium stringency) or 68.degree. C. (high
stringency), to maintain hybridization of only those probe/target
complexes that contain completely complementary sequences.
Background signals can be reduced by the use of detergents such as
SDS, Sarcosyl, or Triton X-100, or a blocking agent, such as salmon
sperm DNA.
[0061] Hybridization specificity can be evaluated by comparing the
hybridization of control probe sequences to control target
sequences that are added to a sample in a known amount. The control
probe may have one or more sequence mismatches compared with the
corresponding control target. In this manner, it is possible to
evaluate whether only complementary probes are hybridizing to the
targets or whether mismatched hybrid duplexes are forming.
[0062] Hybridization reactions can be performed in absolute or
differential hybridization formats. In the absolute hybridization
format, probe polynucleotides from one sample are hybridized to
microarray elements, and signals detected after hybridization
complexes form. Signal strength correlates with probe
polynucleotide levels in a sample. In the differential
hybridization format, differential expression of a set of genes in
two biological samples is analyzed. Probe polynucleotides from the
two samples are prepared and labeled with different labeling
moieties. A mixture of the two labeled probe polynucleotides is
hybridized to the microarray elements, and signals are examined
under conditions in which the emissions from the two different
labels are individually detectable. Targets in the microarray that
are hybridized to substantially equal numbers of probes derived
from both biological samples give a distinct combined fluorescence
(Shalon WO95/35505). In a preferred embodiment, the labels are
fluorescent labels with distinguishable emission spectra, such as a
lissamine conjugated nucleotide analog and a fluorescein conjugated
nucleotide analog. In another embodiment Cy3/Cy5 fluorophores
(Amersham Pharmacia Biotech) are employed.
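As a minimal sketch of how signals in the differential format might be compared, the two label intensities measured at one array element can be reduced to a log2 ratio. The two-fold threshold, the intensity floor, and the function names are assumptions made for illustration and do not appear in the specification.

```python
import math


def log2_ratio(signal_a, signal_b, floor=1.0):
    """log2 of the sample A signal over the sample B signal at one element.
    The floor (an assumed value) avoids taking the log of zero."""
    return math.log2(max(signal_a, floor) / max(signal_b, floor))


def classify_element(signal_a, signal_b, fold=2.0):
    """Label an element by which sample contributed more hybridized probe."""
    ratio = log2_ratio(signal_a, signal_b)
    threshold = math.log2(fold)
    if ratio >= threshold:
        return "higher in A"
    if ratio <= -threshold:
        return "higher in B"
    return "approximately equal"


# Hypothetical intensities for the two labels at three array elements.
calls = [classify_element(a, b) for a, b in [(800, 200), (500, 500), (100, 400)]]
```

Elements called "approximately equal" correspond to targets hybridized to substantially equal numbers of probes from both samples.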
[0063] After hybridization, the microarray is washed to remove
nonhybridized nucleic acids, and complex formation between the
hybridizable array elements and the probe polynucleotides is
examined. Methods for detecting complex formation are well known to
those skilled in the art. In a preferred embodiment, the probe
polynucleotides are labeled with a fluorescent label, and
measurement of levels and patterns of fluorescence indicative of
complex formation is accomplished by fluorescence microscopy,
preferably confocal fluorescence microscopy. An argon ion laser
excites the fluorescent label, emissions are directed to a
photomultiplier, and the amount of emitted light is detected and
quantitated. The detected signal should be proportional to the
amount of probe/target polynucleotide complexes at each position of
the microarray. The fluorescence microscope can be associated with
a computer-driven scanner device to generate a quantitative
two-dimensional image of hybridization intensity. The scanned image
is examined to determine the abundance/expression level of each
hybridized probe polynucleotide.
[0064] Typically, microarray fluorescence intensities can be
normalized to take into account variations in hybridization
intensities when more than one microarray is used under similar
test conditions. In a preferred embodiment, individual
polynucleotide probe/target complex hybridization intensities are
normalized using the intensities derived from internal
normalization controls contained on each microarray.
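One simple way to carry out such a normalization, offered here only as a sketch, is to divide every element intensity by the mean intensity of the internal controls on the same microarray. The element names, control names, and intensities below are hypothetical.

```python
def normalize_array(intensities, control_ids):
    """Scale each element by the mean intensity of the internal
    normalization controls present on the same microarray."""
    controls = [intensities[c] for c in control_ids]
    if not controls:
        raise ValueError("at least one control element is required")
    mean_control = sum(controls) / len(controls)
    if mean_control <= 0:
        raise ValueError("control intensities must be positive")
    return {name: value / mean_control for name, value in intensities.items()}


# Two hypothetical arrays run under similar conditions; the second is
# uniformly dimmer, which normalization is meant to correct.
array_1 = {"ctrl1": 100.0, "ctrl2": 300.0, "geneX": 400.0}
array_2 = {"ctrl1": 50.0, "ctrl2": 150.0, "geneX": 200.0}
norm_1 = normalize_array(array_1, ["ctrl1", "ctrl2"])
norm_2 = normalize_array(array_2, ["ctrl1", "ctrl2"])
```

After scaling, the same element can be compared across arrays on a common footing.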
[0065] Expression Profiles
[0066] Expression profiles using the composition of this invention
may be used to detect changes in the expression of genes implicated
in disease. These genes include genes whose altered expression is
correlated with pancreatic disease, cancer, immunopathology,
neuropathology, and the like.
[0067] The expression profile comprises the polynucleotide
sequences of the Sequence Listing. The expression profile also
includes a plurality of detectable complexes. Each complex is
formed by hybridization of one or more polynucleotide sequences or
array elements to one or more complementary probe polynucleotides.
At least one of the polynucleotide sequences, preferably a
plurality of polynucleotide sequences, is hybridized to a
complementary target polynucleotide forming at least one, and
preferably a plurality, of complexes. A complex is detected by the
incorporation of at least one labeling moiety, described above, in
the complex. Expression profiles provide "snapshots" that reflect
unique expression patterns that are characteristic of a disease or
condition.
[0068] After performing hybridization experiments and interpreting
the signals produced by complexes on a microarray, particular
polynucleotide sequences can be identified based on their
expression patterns. Such polynucleotide sequences can be used to
clone a full length sequence for the gene, to produce a
polypeptide, to develop a diagnostic panel for a particular
disease, to choose a gene for potential therapeutic use, and the
like.
[0069] Additional Utility of the Invention
[0070] Microarrays containing the sequences of the Sequence Listing
can be employed in several applications including diagnostics,
prognostics and treatment regimens, drug discovery and development,
toxicological and carcinogenicity studies, forensics,
pharmacogenomics and the like. In one situation, the microarray is
used to monitor the progression of disease. Researchers can assess
and catalog the differences in gene expression between healthy and
diseased tissues or cells. By analyzing changes in patterns of gene
expression, disease can be diagnosed at earlier stages before the
patient is symptomatic. The invention can also be used to monitor
the efficacy of treatment. For some treatments with known side
effects, the microarray is employed to "fine tune" the treatment
regimen. A dosage is established that causes a change in genetic
expression patterns indicative of successful treatment. Expression
patterns associated with undesirable side effects are avoided. This
approach may be more sensitive and rapid than waiting for the
patient to show inadequate improvement, or to manifest side
effects, before altering the course of treatment.
[0071] Alternatively, animal models which mimic a disease can be
used, rather than patients, to characterize expression profiles
associated with a particular disease or condition. This gene
expression data may be useful in diagnosing and monitoring the
course of disease in the model, in determining genes that are
candidates for intervention, and in testing novel treatment
regimens. Subsequently, the expression profile following protocols
and treatments successful in the model system may be used on and
monitored in human patients.
[0072] The expression of genes encoding membrane-associated
proteins, receptors, and ion channels was highly associated with
pancreatic tissue; .about.45% of the sequences of the Sequence
Listing were expressed in pancreatic tissues. In particular, the
microarray and expression profile are useful to diagnose
conditions of the pancreas such as diabetes, pancreatitis,
pancreatic cholera, hyperlipidemia, fibrocystic disease, and
cancers and tumors of the pancreas.
[0073] The expression of genes encoding membrane-associated
proteins, receptors, and ion channels is closely associated with
immune conditions, disorders and diseases; .about.20% of the
sequences of the Sequence Listing were expressed in tissues from
patients with immunological conditions such as AIDS, Addison's
disease, ARDS, allergies, ankylosing spondylitis, amyloidosis,
anemia, asthma, atherosclerosis, autoimmune hemolytic anemia,
autoimmune thyroiditis, bronchitis, cholecystitis, contact
dermatitis, Crohn's disease, atopic dermatitis, dermatomyositis,
diabetes mellitus, emphysema, erythroblastosis fetalis, erythema
nodosum, atrophic gastritis, glomerulonephritis, Goodpasture's
syndrome, gout, Graves' disease, Hashimoto's thyroiditis,
hypereosinophilia, irritable bowel syndrome, multiple sclerosis,
myasthenia gravis, myocardial or pericardial inflammation,
osteoarthritis, osteoporosis, pancreatitis, polymyositis,
psoriasis, Reiter's syndrome, rheumatoid arthritis, scleroderma,
Sjogren's syndrome, systemic anaphylaxis, systemic lupus
erythematosus, systemic sclerosis, thrombocytopenic purpura,
ulcerative colitis, uveitis, Werner syndrome, complications of
cancer, hemodialysis, and extracorporeal circulation, viral,
bacterial, fungal, parasitic, protozoal, and helminthic infections,
and trauma.
[0074] The expression of genes encoding membrane-associated
proteins, receptors, and ion channels is closely associated with
cancers; .about.10% of the sequences of the Sequence Listing were
expressed in cancerous tissues. In particular, the microarray and
expression profile are useful to diagnose a cancer such as
adenocarcinoma, leukemia, lymphoma, melanoma, myeloma, sarcoma and
teratocarcinoma. Such cancers include, but are not limited to,
cancers of the adrenal gland, bladder, bone, bone marrow, brain,
breast, cervix, colon, gall bladder, ganglia, gastrointestinal
tract, heart, kidney, liver, lung, muscle, ovary, pancreas,
parathyroid, penis, prostate, salivary glands, skin, spleen,
testis, thymus, thyroid and uterus.
[0075] The expression of genes encoding membrane-associated
proteins, receptors, and ion channels is also closely associated
with the immune response. Therefore, the microarray can be used to
diagnose immunopathologies including, but not limited to, AIDS,
Addison's disease, adult respiratory distress syndrome, allergies,
anemia, asthma, atherosclerosis, bronchitis, cholecystitis, Crohn's
disease, ulcerative colitis, atopic dermatitis, dermatomyositis,
diabetes mellitus, emphysema, atrophic gastritis,
glomerulonephritis, gout, Graves' disease, hypereosinophilia,
irritable bowel syndrome, lupus erythematosus, multiple sclerosis,
myasthenia gravis, myocardial or pericardial inflammation,
osteoarthritis, osteoporosis, pancreatitis, polymyositis,
rheumatoid arthritis, scleroderma, Sjogren's syndrome, and
autoimmune thyroiditis; complications of cancer, hemodialysis,
extracorporeal circulation; viral, bacterial, fungal, parasitic,
and protozoal infections; and trauma.
[0076] Neuropathologies are also affected by the expression of
genes encoding membrane-associated proteins, receptors, and ion
channels; in fact, .about.1% of the sequences of the Sequence
Listing were expressed in neuronal tissues. Thus, the microarray
can be used to diagnose neuropathologies including, but not limited
to, akathisia, Alzheimer's disease, amnesia, amyotrophic lateral
sclerosis, bipolar disorder, catatonia, cerebral neoplasms,
dementia, depression, Down's syndrome, tardive dyskinesia,
dystonias, epilepsy, Huntington's disease, multiple sclerosis,
neurofibromatosis, Parkinson's disease, paranoid psychoses,
schizophrenia, and Tourette's disorder.
[0077] Also, researchers can use the microarray to rapidly screen
large numbers of candidate drug molecules, looking for ones that
produce an expression profile similar to those of known therapeutic
drugs, with the expectation that molecules with the same expression
profile will likely have similar therapeutic effects. Thus, the
invention provides the means to determine the molecular mode of
action of a drug.
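The comparison of expression profiles described above can be sketched, under stated assumptions, as ranking candidate molecules by the Pearson correlation between each candidate profile and the profile of a known therapeutic drug. The choice of Pearson correlation as the similarity measure, the candidate names, and the profile values are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def rank_candidates(reference, candidates):
    """Return (name, correlation) pairs, most similar to the reference first."""
    scored = [(name, pearson(reference, prof)) for name, prof in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical log-ratio profiles over the same four array elements.
known_drug = [1.0, -2.0, 0.5, 3.0]
candidates = {
    "candidate_A": [2.0, -4.0, 1.0, 6.0],
    "candidate_B": [-1.0, 2.0, -0.5, -3.0],
}
ranked = rank_candidates(known_drug, candidates)
```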
[0078] It is understood that this invention is not limited to the
particular devices, machines, materials and methods described.
Although preferred embodiments are described, devices, machines,
materials and methods similar or equivalent to these embodiments
may be used to practice the invention. The preferred embodiments
are not intended to limit the scope of the invention which is
limited only by the appended claims.
[0079] The singular forms "a", "an", and "the" include plural
reference unless the context clearly dictates otherwise. All
technical and scientific terms have the meanings commonly
understood by one of ordinary skill in the art. All patents
mentioned herein are incorporated by reference for the purpose of
describing and disclosing the devices, machines, materials and
methods which are presented and which might be used in connection
with the invention. Nothing in the specification is to be construed
as an admission that the invention is not entitled to antedate such
disclosure by virtue of prior invention.
EXAMPLES
[0080] For purposes of example, the preparation and sequencing of
the PANCNOT07 cDNA library is described. Preparation and sequencing
of cDNAs in libraries in the LIFESEQ database (Incyte
Pharmaceuticals) have varied over time, and the gradual changes
involved use of kits, plasmids, and machinery available at the
particular time the library was made and analyzed.
[0081] I cDNA Library Construction
[0082] The PANCNOT07 cDNA library was constructed from pancreas
tissue obtained from a 25-week-old Caucasian male fetus. The frozen
tissue was homogenized and lysed using a POLYTRON homogenizer
(PT-3000; Brinkmann Instruments, Westbury N.J.) in guanidinium
isothiocyanate solution. The lysate was centrifuged over a 5.7 M
CsCl cushion using an SW28 rotor in a L8-70M ultracentrifuge
(Beckman Coulter, Fullerton Calif.) for 18 hours at 25,000 rpm at
ambient temperature. The RNA was extracted with acid phenol (pH
4.7), precipitated using 0.3 M sodium acetate and 2.5 volumes of
ethanol, resuspended in RNAse-free water, and treated with DNase at
37.degree. C. Extraction and precipitation were repeated as before.
The mRNA was isolated with the OLIGOTEX kit (Qiagen, Chatsworth
Calif.) and used to construct the cDNA library.
[0083] The mRNA was handled according to the recommended protocols
in the SUPERSCRIPT Plasmid system (Life Technologies). The cDNAs
were fractionated on a SEPHAROSE CL4B column (Amersham Pharmacia
Biotech), and those cDNAs exceeding 400 bp were ligated into pINCY1
plasmid (Incyte Pharmaceuticals). The plasmid was then transformed
into DH5.alpha. competent cells (Life Technologies).
[0084] II Isolation and Sequencing of cDNA Clones
[0085] Plasmid DNA was released from the cells and purified using
the REAL Prep 96 plasmid kit (Qiagen). This kit enabled the
simultaneous purification of 96 samples in a 96-well block using
multi-channel reagent dispensers. The recommended protocol was
employed except for the following changes: 1) the bacteria were
cultured in 1 ml of sterile Terrific Broth (Life Technologies) with
carbenicillin at 25 mg/L and glycerol at 0.4%; 2) after
inoculation, the cultures were incubated for 19 hours; and at the
end of incubation, the cells were lysed with 0.3 ml of lysis
buffer; and 3) following isopropanol precipitation, the DNA pellet
was resuspended in 0.1 ml of distilled water. After the last step
in the protocol, samples were transferred to a 96-well block for
storage at 4.degree. C.
[0086] The cDNAs were sequenced by the method of Sanger and Coulson
(1975, J Mol Biol 94:441-448). A MICROLAB 2200 (Hamilton, Reno
Nev.) in combination with DNA Engine thermal cyclers (PTC200; MJ
Research, Watertown Mass.) was used to prepare the DNA. After
thermal cycling, the A, C, G, and T reactions with each DNA
template were combined. Then, 50 .mu.l 100% ethanol was added, and
the solution was spun at 4.degree. C. for 30 min. The supernatant
was decanted, and the pellet was rinsed with 100 .mu.l 70% ethanol.
After being spun for 15 min, the supernatant was discarded and the
pellet was dried for 15 min under vacuum. The DNA sample was
dissolved in 3 .mu.l of formaldehyde/50 mM EDTA and loaded into
wells in volumes of 2 .mu.l per well for sequencing on ABI 377 DNA
Sequencing systems (PE Biosystems, Foster City Calif.).
[0087] Most of the sequences were sequenced using standard ABI
protocols and kits (Cat. Nos. 79345, 79339, 79340, 79357, 79355; PE
Biosystems) at solution volumes of 0.25.times.-1.0.times.
concentrations. Some of the sequences were sequenced using
solutions and dyes from Amersham Pharmacia Biotech.
[0088] III Characterization of cDNA Clones
[0089] The nucleotide sequences of the Sequence Listing, as well as
the amino acid sequences deduced from them, were used as query
sequences against GenBank, SwissProt, BLOCKS, and PRINTS databases.
The sequences in these databases, which contain previously
identified and annotated sequences, were searched for regions of
similarity using BLAST or FASTA (Pearson, W. R. (1990) Methods
Enzymol 183:63-98; and Smith and Waterman (1981) Adv Appl Math
2:482-489).
[0090] VII. Extension of cDNA Sequences
[0091] The original nucleic acid sequence was extended using the
Incyte cDNA clone and oligonucleotide primers. One primer was
synthesized to initiate 5' extension of the known fragment, and the
other, to initiate 3' extension of the known fragment. The initial
primers were designed using OLIGO 4.06 software (National
Biosciences), or another appropriate program, to be about 22 to 30
nucleotides in length, to have a GC content of about 50% or more,
and to anneal to the target sequence at temperatures of about
68.degree. C. to about 72.degree. C. Any stretch of nucleotides
which would result in hairpin structures and primer-primer
dimerizations was avoided.
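For illustration, the length and GC-content criteria stated above can be checked with a few lines of code. This sketch does not reproduce the OLIGO 4.06 software; the annealing-temperature, hairpin, and primer-dimer screens are omitted because the specification does not state how they were computed. The example sequences are hypothetical.

```python
def gc_fraction(primer):
    """Fraction of G and C bases in a primer given as an A/C/G/T string."""
    seq = primer.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("primer must be a non-empty A/C/G/T string")
    return (seq.count("G") + seq.count("C")) / len(seq)


def passes_basic_criteria(primer, min_len=22, max_len=30, min_gc=0.50):
    """Check only the stated length (about 22 to 30 nucleotides) and
    GC-content (about 50% or more) criteria."""
    return min_len <= len(primer) <= max_len and gc_fraction(primer) >= min_gc


# Hypothetical primers: one meeting both criteria, one too AT-rich.
good_primer = "GCGTACCGGATCCATGGCAGTC"
at_rich_primer = "AT" * 10 + "GC"
results = (passes_basic_criteria(good_primer), passes_basic_criteria(at_rich_primer))
```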
[0092] Selected cDNA libraries, such as a pancreas library, were
used to extend the sequence. If more than one extension was
necessary, additional or nested sets of primers were designed.
Preferred libraries are ones that have been size-selected to
include larger cDNAs. Also, random primed libraries are preferred
because they will contain more sequences with the 5' and upstream
regions of genes. A randomly primed library is particularly useful
if an oligo d(T) library does not yield a full-length cDNA. Genomic
libraries are useful for extension 5' of the promoter binding
region to obtain regulatory elements.
[0093] High fidelity amplification was obtained by PCR using
methods well known in the art. PCR was performed in 96-well plates
using the DNA ENGINE thermal cycler (MJ Research). The reaction mix
contained DNA template, 200 nmol of each primer, reaction buffer
containing Mg.sup.2+, (NH.sub.4).sub.2SO.sub.4, and
.beta.-mercaptoethanol, TAQ DNA polymerase (Amersham Pharmacia
Biotech), ELONGASE enzyme (Life Technologies), and Pfu DNA
polymerase (Stratagene, San Diego Calif.), with the following
parameters for primer pair PCI A and PCI B: Step 1: 94.degree. C.,
3 min; Step 2: 94.degree. C., 15 sec; Step 3: 60.degree. C., 1 min;
Step 4: 68.degree. C., 2 min; Step 5: Steps 2, 3, and 4 repeated 20
times; Step 6: 68.degree. C., 5 min; Step 7: storage at 4.degree.
C. In the alternative, the parameters for primer pair T7 and SK+
were as follows: Step 1: 94.degree. C., 3 min; Step 2: 94.degree.
C., 15 sec; Step 3: 57.degree. C., 1 min; Step 4: 68.degree. C., 2
min; Step 5: Steps 2, 3, and 4 repeated 20 times; Step 6:
68.degree. C., 5 min; Step 7: storage at 4.degree. C.
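The two cycling programs above differ only in the Step 3 annealing temperature, which can be made explicit by writing them as data. The sketch below assumes that "repeated 20 times" means 20 passes in addition to the first pass through Steps 2 to 4; that reading, the data layout, and the function names are assumptions. Ramp times and the 4.degree. C. storage step are not counted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float
    seconds: int


def extension_program(anneal_temp_c):
    """Cycling program for the extension PCR with a chosen Step 3 temperature."""
    return {
        "initial": Step(94, 180),              # Step 1
        "cycle": [
            Step(94, 15),                      # Step 2
            Step(anneal_temp_c, 60),           # Step 3
            Step(68, 120),                     # Step 4
        ],
        "repeats": 20,                         # Step 5
        "final": Step(68, 300),                # Step 6
    }


def hold_seconds(program):
    """Total programmed hold time, excluding ramps and 4 C storage."""
    passes = 1 + program["repeats"]  # assumption: first pass plus 20 repeats
    per_cycle = sum(step.seconds for step in program["cycle"])
    return program["initial"].seconds + passes * per_cycle + program["final"].seconds


pci_program = extension_program(60)    # primer pair PCI A and PCI B
t7_sk_program = extension_program(57)  # primer pair T7 and SK+
total = hold_seconds(pci_program)
```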
[0094] The concentration of DNA in each well was determined by
dispensing 100 .mu.l PICOGREEN quantitation reagent (0.25% v/v;
Molecular Probes, Eugene Oreg.) dissolved in 1.times.TE and 0.5
.mu.l of undiluted PCR product into each well of an opaque
fluorimeter plate (Corning Costar, Acton Mass.) and allowing the
DNA to bind to the reagent. The plate was scanned in a Fluoroskan II
(Labsystems Oy, Helsinki, Finland) to measure the fluorescence of the
sample and to quantify the concentration of DNA. A 5 .mu.l to 10
.mu.l aliquot of the reaction mixture was analyzed by
electrophoresis on a 1% agarose mini-gel to determine which
reactions were successful in extending the sequence.
[0095] The extended nucleotides were desalted and concentrated,
transferred to 384-well plates, digested with CviJI chlorella virus
endonuclease (Molecular Biology Research, Madison Wis.), and
sonicated or sheared prior to religation into pUC18 vector
(Amersham Pharmacia Biotech). For shotgun sequencing, the digested
nucleotides were separated on low concentration (0.6 to 0.8%)
agarose gels, fragments were excised, and agar digested with
AGARACE enzyme (Promega, Madison Wis.). Extended clones were
religated using T4 ligase (New England Biolabs, Beverly Mass.) into
pUC18 vector (Amersham Pharmacia Biotech), treated with Pfu DNA
polymerase (Stratagene) to fill-in restriction site overhangs, and
transfected into competent E. coli cells. Transformed cells were
selected on antibiotic-containing media, and individual colonies
were picked and cultured overnight at 37.degree. C. in 384-well
plates in LB/2.times.carbenicillin liquid media.
[0096] The cells were lysed, and DNA was amplified by PCR using TAQ
DNA polymerase (Amersham Pharmacia Biotech) and Pfu DNA polymerase
(Stratagene) with the following parameters: Step 1: 94.degree. C.,
3 min; Step 2: 94.degree. C., 15 sec; Step 3: 60.degree. C., 1 min;
Step 4: 72.degree. C., 2 min; Step 5: steps 2, 3, and 4 repeated 29
times; Step 6: 72.degree. C., 5 min; Step 7: storage at 4.degree.
C. DNA was quantified using PICOGREEN reagent (Molecular Probes) as
described above. Samples with low DNA recoveries were reamplified
using the same conditions described above. Samples were diluted
with 20% dimethylsulphoxide (1:2, v/v), and sequenced using DYENAMIC
energy transfer sequencing primers and the DYENAMIC DIRECT cycle
sequencing kit (Amersham Pharmacia Biotech) or the ABI PRISM BIGDYE
Terminator cycle sequencing kit (PE Biosystems). The extended
sequences were assembled with the original clone using CONSED,
PHRAP or GELVIEW software (GCG) and reanalyzed using BLAST, FASTA,
or similar sequence analysis programs well known in the art. (See,
e.g., Ausubel, supra, unit 7.7, pp. 7.65-69.)
[0097] IV Selection of Sequences
[0098] The sequences found in the Sequence Listing were selected
because they possessed annotation, motifs, domains, regions or
other patterns consistent with genes encoding proteins associated
with membranes, receptors, or ion channels. The PRINTS database was
searched using the BLIMPS search program to obtain protein family
"fingerprints". The PRINTS database complements the PROSITE
database and contains groups of conserved motifs within sequence
alignments which are used to build characteristic signatures of
different polypeptide families. For PRINTS analyses, the cutoff
scores for local similarity were >1300=strong,
1000-1300=suggestive; for global similarity, p<exp-3; and for
strength (degree of correlation), >1300=strong,
1000-1300=weak.
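The score and strength cutoffs above can be expressed as a short classification sketch. Treating the boundary values 1000 and 1300 as inclusive in the lower band, and the label "below cutoff" for values under 1000, are assumptions; the example values are those of the first row of TABLE I, read as strength 1189 and score 1319.

```python
def classify_score(score):
    """Local-similarity score: >1300 strong, 1000-1300 suggestive."""
    if score > 1300:
        return "strong"
    if score >= 1000:
        return "suggestive"
    return "below cutoff"  # assumed label; not named in the text


def classify_strength(strength):
    """Strength (degree of correlation): >1300 strong, 1000-1300 weak."""
    if strength > 1300:
        return "strong"
    if strength >= 1000:
        return "weak"
    return "below cutoff"  # assumed label; not named in the text


# First row of TABLE I (SEQ ID NO:1): strength 1189, score 1319.
row_1 = (classify_strength(1189), classify_score(1319))
```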
[0099] PRINTS screening was carried out electronically to identify
those sequences shown in the Sequence Listing with similarity to
membrane-associated proteins, receptors, and ion channels. The
protein groupings screened included extracellular messengers
(including cytokines, growth factors, hormones, neuropeptides,
oncogenes, and vasomediators), receptors (including GPCRs,
tetraspanins, receptor kinases and nuclear receptors), ion
channels, and proteins associated with signaling cascades
(including kinases, phosphatases, G proteins, and second messengers
such as cyclic AMP, phospholipase C, inositol triphosphate, and the
like).
[0100] VIII Labeling of Probes and Hybridization Analyses
[0101] Blotting
[0102] Polynucleotide sequences are isolated from a biological
source and applied to a solid matrix (a blot) suitable for standard
nucleic acid hybridization protocols by one of the following
methods. A mixture of target nucleic acids, a restriction digest of
genomic DNA, is fractionated by electrophoresis through a 0.7%
agarose gel in 1.times.TAE [Tris-acetate-ethylenediamine
tetraacetic acid (EDTA)] running buffer and transferred to a nylon
membrane by capillary transfer using 20.times.saline sodium citrate
(SSC). Alternatively, the target nucleic acids are individually
ligated to a vector and inserted into bacterial host cells to form
a library. Target nucleic acids are arranged on a blot by one of
the following methods. In the first method, bacterial cells
containing individual clones are robotically picked and arranged on
a nylon membrane. The membrane is placed on bacterial growth
medium, LB agar containing carbenicillin, and incubated at
37.degree. C. for 16 hours. Bacterial colonies are denatured,
neutralized, and digested with proteinase K. Nylon membranes are
exposed to UV irradiation in a STRATALINKER UV-crosslinker
(Stratagene) to cross-link DNA to the membrane.
[0103] In the second method, target nucleic acids are amplified
from bacterial vectors by thirty cycles of PCR using primers
complementary to vector sequences flanking the insert. Amplified
target nucleic acids are purified using SEPHACRYL-400 (Amersham
Pharmacia Biotech). Purified target nucleic acids are robotically
arrayed onto a glass microscope slide (Corning Science Products,
Acton Mass.). The slide was previously coated with 0.05%
aminopropyl silane (Sigma-Aldrich, St. Louis Mo.) and cured at
110.degree. C. The arrayed glass slide (microarray) is exposed to
UV irradiation in a STRATALINKER UV-crosslinker (Stratagene).
[0104] Probe Preparation
[0105] cDNA probe sequences are made from mRNA templates. Five
micrograms of mRNA is mixed with 1 .mu.g random primer (Life
Technologies), incubated at 70.degree. C. for 10 minutes, and
lyophilized. The lyophilized sample is resuspended in 50 .mu.l of
1.times.first strand buffer (cDNA synthesis system; Life
Technologies) containing a dNTP mix, [.alpha.-.sup.32P]dCTP,
dithiothreitol, and MMLV reverse transcriptase (Stratagene), and
incubated at 42.degree. C. for 1-2 hours. After incubation, the
probe is diluted with 42 .mu.l dH.sub.2O, heated to 95.degree. C.
for 3 minutes, and cooled on ice. mRNA in the probe is removed by
alkaline degradation. The probe is neutralized, and degraded mRNA
and unincorporated nucleotides are removed using a PROBEQUANT G-50
microcolumn (Amersham Pharmacia Biotech). Probes can be labeled
with fluorescent markers, Cy3-dCTP or Cy5-dCTP (Amersham Pharmacia
Biotech), in place of the radionuclide, [.sup.32P]dCTP.
[0106] Hybridization
[0107] Hybridization is carried out at 65.degree. C. in a
hybridization buffer containing 0.5 M sodium phosphate (pH 7.2), 7%
SDS, and 1 mM EDTA. After the blot is incubated in hybridization
buffer at 65.degree. C. for at least 2 hours, the buffer is
replaced with 10 ml of fresh buffer containing the probe sequences.
After incubation at 65.degree. C. for 18 hours, the hybridization
buffer is removed, and the blot is washed sequentially under
increasingly stringent conditions, up to 40 mM sodium phosphate, 1%
SDS, 1 mM EDTA at 65.degree. C. To detect signal produced by a
radiolabeled probe hybridized on a membrane, the blot is exposed to
a PHOSPHORIMAGER cassette (Amersham Pharmacia Biotech), and the
image is analyzed using IMAGEQUANT data analysis software
(Molecular Dynamics). To detect signals produced by a fluorescent
probe hybridized on a microarray, the microarray is examined by confocal
laser microscopy, and images are collected and analyzed using
GEMTOOLS gene expression analysis software (Incyte
Pharmaceuticals).
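The buffer compositions given above can be turned into pipetting volumes with the usual dilution relation C1 x V1 = C2 x V2. The following is a minimal sketch only. The stock concentrations (1 M sodium phosphate, 20% SDS, 0.5 M EDTA), the 500 ml batch size, and the function name stock_volumes_ml are illustrative assumptions and are not taken from this disclosure.

```python
def stock_volumes_ml(final_ml, targets, stocks):
    """Return the volume (ml) of each stock, plus water, for one buffer batch.

    targets and stocks map a component name to a concentration; each
    component must use the same unit in both mappings (M or percent).
    """
    volumes = {}
    for name, target in targets.items():
        # C1 * V1 = C2 * V2  ->  V1 = V2 * C2 / C1
        volumes[name] = final_ml * target / stocks[name]
    water = final_ml - sum(volumes.values())
    if water < 0:
        raise ValueError("stocks are too dilute for the requested buffer")
    volumes["water"] = water
    return volumes


# Hypothetical stock solutions (assumed, not specified in the text).
STOCKS = {"sodium_phosphate": 1.0, "sds": 20.0, "edta": 0.5}

# Most stringent wash named in the text: 40 mM sodium phosphate, 1% SDS, 1 mM EDTA.
WASH = {"sodium_phosphate": 0.040, "sds": 1.0, "edta": 0.001}

# Hybridization buffer named in the text: 0.5 M sodium phosphate, 7% SDS, 1 mM EDTA.
HYB = {"sodium_phosphate": 0.5, "sds": 7.0, "edta": 0.001}

wash = stock_volumes_ml(500.0, WASH, STOCKS)  # assumed 500 ml wash batch
hyb = stock_volumes_ml(10.0, HYB, STOCKS)     # the 10 ml of fresh buffer in the text

for label, recipe in (("wash", wash), ("hybridization", hyb)):
    print(label, {k: round(v, 3) for k, v in recipe.items()})
```

With these assumed stocks, the step from hybridization buffer to the final wash is a 12.5-fold drop in sodium phosphate (0.5 M to 40 mM) and a 7-fold drop in SDS, while EDTA stays at 1 mM.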
TABLE I
SEQ ID NO | INCYTE CLONE NO | PRINT ID | PRINT DESCRIPTION | PRINT STRENGTH | PRINT SCORE | TM | SIGNAL PEPTIDE
SEQ ID NO:1 8915
PR00554C ADENOSINE A2B RECEPTOR SIGNATURE 1189 1319 yes yes SEQ ID
NO:2 68454 PR00572B INTERLEUKIN 8A RECEPTOR SIGNATURE 1120 1184 SEQ
ID NO:3 98991 PR00897E VASOPRESSIN V1B RECEPTOR SIGNATURE 1159 1135
SEQ ID NO:4 121140 PR00247B CAMP-TYPE GPCR SIGNATURE 1230 1244 yes
SEQ ID NO:5 129059 PR00535A MELANOCORTIN RECEPTOR SIGNATURE 1169
1120 SEQ ID NO:6 222732 PR00580 PROSTANOID EP1 RECEPTOR SIGNATURE
1278 1114 SEQ ID NO:7 222748 PR00635A AT1 ANGIOTENSIN II RECEPTOR
SIGNATURE 1280 1356 yes yes SEQ ID NO:8 224587 PR00555C ADENOSINE
A3 RECEPTOR SIGNATURE 1332 1341 yes SEQ ID NO:9 225146 PR00522G
CANNABINOID RECEPTOR TYPE 1 SIGNATURE 1341 1282 SEQ ID NO:10 225640
PR00592B EXTRACELLULAR CALCIUM-SENSING RECEPTOR 1421 1279 SIGNAT
SEQ ID NO:11 225650 PR00648D GPR3 ORPHAN RECEPTOR SIGNATURE 1146
1302 yes SEQ ID NO:12 226179 PR00641H EBI1 ORPHAN RECEPTOR
SIGNATURE 1259 1215 SEQ ID NO:13 226815 PR00644E GPR ORPHAN
RECEPTOR SIGNATURE 1453 1250 yes SEQ ID NO:14 227559 PR00554G
ADENOSINE A2B RECEPTOR SIGNATURE 1259 1234 SEQ ID NO:15 227799
PR00366A ENDOTHELIN RECEPTOR SIGNATURE 1337 1279 yes SEQ ID NO:16
227892 PR00531A HISTAMINE H2 RECEPTOR SIGNATURE 1183 1197 yes SEQ
ID NO:17 228282 PR00565C DOPAMINE 1A RECEPTOR SIGNATURE 1221 1213
SEQ ID NO:18 229665 PR00648D GPR3 ORPHAN RECEPTOR SIGNATURE 1146
1233 yes SEQ ID NO:19 229779 PR00537C MU OPIOID RECEPTOR SIGNATURE
1348 1216 yes SEQ ID NO:20 240829 PR00856I PROSTACYCLIN (PROSTANOID
IP) RECEPTOR 1131 1273 yes SIGNATU SEQ ID NO:21 341490 PR00531A
HISTAMINE H2 RECEPTOR SIGNATURE 1183 1256 yes yes SEQ ID NO:22
402456 PR00555F ADENOSINE A3 RECEPTOR SIGNATURE 1259 1121 SEQ ID
NO:23 420765 PR00536E MELANOCYTE STIMULATING HORMONE RECEP- 1313
1170 TOR SIGNA SEQ ID NO:24 481770 PR00558B ALPHA-2A ADRENERGIC
RECEPTOR SIGNATURE 1519 1108 SEQ ID NO:25 548654 PR00531A HISTAMINE
H2 RECEPTOR SIGNATURE 1183 1240 yes SEQ ID NO:26 632097 PR00715E
CATION-DEPENDENT MANNOSE-6-PHOSPHATE 1440 1300 yes yes RECEPTOR SEQ
ID NO:27 647580 PR00586B PROSTANOID EP4 RECEPTOR SIGNATURE 1452
1317 yes SEQ ID NO:28 647628 PR00514D 5-HYDROXYTRYPTAMINE 1D
RECEPTOR 1252 1263 yes yes SIGNATURE SEQ ID NO:29 647931 PR00596D
URIDINE NUCLEOTIDE RECEPTOR SIGNATURE 1255 1221 SEQ ID NO:30 648153
PR00647I SENR ORPHAN RECEPTOR SIGNATURE 1291 1213 SEQ ID NO:31
648838 PR00751E THYROTROPHIN-RELEASING HORMONE RECEPTOR 1433 1221
SIGNA SEQ ID NO:32 649152 PR00554G ADENOSINE A2B RECEPTOR SIGNATURE
1259 1078 SEQ ID NO:33 649682 PR00255B NATRIURETIC PEPTIDE RECEPTOR
SIGNATURE 1264 1213 SEQ ID NO:34 649917 PR00596D URIDINE NUCLEOTIDE
RECEPTOR SIGNATURE 1255 1154 SEQ ID NO:35 650726 PR00490C SECRETIN
RECEPTOR SIGNATURE 1238 1195 SEQ ID NO:36 652013 PR00491C
VASOACTIVE INTESTINAL PEPTIDE RECEPTOR 1121 1155 yes SIGNAT SEQ ID
NO:37 738964 PR00542F MUSCARINIC M5 RECEPTOR SIGNATURE 1218 1185
SEQ ID NO:38 743323 PR00899G FUNGAL PHEROMONE STE3 GPCR SIGNATURE
1132 1218 SEQ ID NO:39 753592 PR00587A SOMATOSTATIN RECEPTOR TYPE 1
SIGNATURE 1312 1121 SEQ ID NO:40 797777 PR00512F
5-HYDROXYTRYPTAMINE 1A RECEPTOR 1388 1257 yes SIGNATURE SEQ ID
NO:41 885098 PR00642C EDG1 ORPHAN RECEPTOR SIGNATURE 1193 1213 SEQ
ID NO:42 947812 PR00636C AT2 ANGIOTENSIN II RECEPTOR SIGNATURE 1317
1277 SEQ ID NO:43 948051 PR00554A ADENOSINE A2B RECEPTOR SIGNATURE
1109 1255 SEQ ID NO:44 948581 PR00539E MUSCARINIC M2 RECEPTOR
SIGNATURE 1322 1244 yes SEQ ID NO:45 948700 PR00554C ADENOSINE A2B
RECEPTOR SIGNATURE 1189 1316 SEQ ID NO:46 948883 PR00531E HISTAMINE
H2 RECEPTOR SIGNATURE 1324 1241 yes SEQ ID NO:47 948935 PR00663D
GALANIN RECEPTOR SIGNATURE 1168 1282 SEQ ID NO:48 949387 PR00639D
NEUROMEDIN B RECEPTOR SIGNATURE 1198 1216 yes SEQ ID NO:49 951797
PR00900G PHEROMONE A RECEPTOR SIGNATURE 996 1211 yes SEQ ID NO:50
997947 PR00928E GRAVES DISEASE CARRIER PROTEIN SIGNATURE 1410 1283
yes SEQ ID NO:51 1212964 PR00564D BURKITT'S LYMPHOMA RECEPTOR
SIGNATURE 1295 1291 yes SEQ ID NO:52 1214535 PR00527A GASTRIN
RECEPTOR SIGNATURE 1327 1185 SEQ ID NO:53 1219856 PR00587A
SOMATOSTATIN RECEPTOR TYPE 1 SIGNATURE 1312 1181 SEQ ID NO:54
1288503 PR00663G GALANIN RECEPTOR SIGNATURE 1160 1452 yes yes SEQ
ID NO:55 1298179 PR00666C PINEAL OPSIN SIGNATURE 1257 1296 yes SEQ
ID NO:56 1305513 PR00639D NEUROMEDIN B RECEPTOR SIGNATURE 1198 1238
yes SEQ ID NO:57 1318926 PR00639D NEUROMEDIN B RECEPTOR SIGNATURE
1198 1299 yes SEQ ID NO:58 1328744 PR00641A EBI1 ORPHAN RECEPTOR
SIGNATURE 1325 1267 yes yes SEQ ID NO:59 1328845 PR00554D ADENOSINE
A2B RECEPTOR SIGNATURE 1208 1221 yes yes SEQ ID NO:60 1329044
PR00900G PHEROMONE A RECEPTOR SIGNATURE 996 1235 yes yes SEQ ID
NO:61 1329081 PR00522G CANNABINOID RECEPTOR TYPE 1 SIGNATURE 1341
1249 yes SEQ ID NO:62 1329095 PR00639D NEUROMEDIN B RECEPTOR
SIGNATURE 1198 1137 SEQ ID NO:63 1329404 PR00637D TYPE 3 BOMBESIN
RECEPTOR SIGNATURE 1131 1246 SEQ ID NO:64 1329477 PR00517F
5-HYDROXYTRYPTAMINE 2C RECEPTOR 1259 1310 yes SIGNATURE SEQ ID
NO:65 1329584 PR00856I PROSTACYCLIN (PROSTANOID IP) RECEPTOR 1131
1176 SIGNATU SEQ ID NO:66 1329652 PR00646B RDC1 ORPHAN RECEPTOR
SIGNATURE 1307 1226 yes yes SEQ ID NO:67 1329778 PR00562F BETA-2
ADRENERGIC RECEPTOR SIGNATURE 1360 1292 yes SEQ ID NO:68 1329830
PR00642B EDG1 ORPHAN RECEPTOR SIGNATURE 1218 1325 yes SEQ ID NO:69
1329851 PR00582B PROSTANOID EP3 RECEPTOR SIGNATURE 1750 1276 yes
yes SEQ ID NO:70 1329862 PR00580C PROSTANOID EP1 RECEPTOR SIGNATURE
1278 1253 yes SEQ ID NO:71 1329971 PR00547A X OPIOID RECEPTOR
SIGNATURE 1342 1233 SEQ ID NO:72 1329994 PR00564D BURKITT'S
LYMPHOMA RECEPTOR SIGNATURE 1295 1315 yes yes SEQ ID NO:73 1329995
PR00490F SECRETIN RECEPTOR SIGNATURE 1239 1275 yes SEQ ID NO:74
1330007 PR00899K FUNGAL PHEROMONE STE3 GPCR SIGNATURE 1057 1244 yes
SEQ ID NO:75 1330016 PR00514D 5-HYDROXYTRYPTAMINE 1D RECEPTOR 1252
1255 yes SIGNATURE SEQ ID NO:76 1330023 PR00647I SENR ORPHAN
RECEPTOR SIGNATURE 1291 1258 yes SEQ ID NO:77 1330061 PR00586H
PROSTANOID EP4 RECEPTOR SIGNATURE 1526 1219 SEQ ID NO:78 1330108
PR00424B ADENOSINE RECEPTOR SIGNATURE 1339 1240 yes yes SEQ ID
NO:79 1330215 PR00642B EDG1 ORPHAN RECEPTOR SIGNATURE 1218 1237 yes
SEQ ID NO:80 1330424 PR00248F METABOTROPIC GLUTAMATE GPCR SIGNATURE
1498 1262 yes SEQ ID NO:81 1330429 PR00568D DOPAMINE D3 RECEPTOR
SIGNATURE 1445 1226 yes yes SEQ ID NO:82 1330478 PR00571G
ENDOTHELIN-B RECEPTOR SIGNATURE 1420 1253 yes SEQ ID NO:83 1330641
PR00424F ADENOSINE RECEPTOR SIGNATURE 1205 1241 SEQ ID NO:84
1330656 PR00554B ADENOSINE A2B RECEPTOR SIGNATURE 1090 1227 yes SEQ
ID NO:85 1330683 PR00699F C.ELEGANS INTEGRAL MEMBRANE PROTEIN 1214
1220 yes SRG SIGNA SEQ ID NO:86 1330740 PR00639D NEUROMEDIN B
RECEPTOR SIGNATURE 1198 1209 yes SEQ ID NO:87 1330847 PR00641F EBI1
ORPHAN RECEPTOR SIGNATURE 1290 1260 yes yes SEQ ID NO:88 1330861
PR00424B ADENOSINE RECEPTOR SIGNATURE 1339 1310 yes SEQ ID NO:89
1330882 PR00559C ALPHA-2B ADRENERGIC RECEPTOR SIGNATURE 1284 1208
SEQ ID NO:90 1330907 PR00568A DOPAMINE D3 RECEPTOR SIGNATURE 1427
1248 yes SEQ ID NO:91 1330918 PR00908H THROMBIN RECEPTOR SIGNATURE
1409 1300 yes yes SEQ ID NO:92 1330930 PR00645I LCR1 ORPHAN
RECEPTOR SIGNATURE 1511 1272 yes yes SEQ ID NO:93 1330957 PR00261E
LOW DENSITY LIPOPROTEIN (LDL) RECEPTOR 1459 1236 yes SIGNAT SEQ ID
NO:94 1330969 PR00515C 5-HYDROXYTRYPTAMINE 1F RECEPTOR 1351 1264
SIGNATURE SEQ ID NO:95 1331030 PR00715E CATION-DEPENDENT
MANNOSE-6-PHOSPHATE 1440 1300 yes RECEPTOR SEQ ID NO:96 1331172
PR00667B RETINAL PIGMENT EPITHELIUM-RETINAL GPCR 1190 1237 yes
SIGNA SEQ ID NO:97 1331278 PR00854B PROSTAGLANDIN D RECEPTOR
SIGNATURE 1257 1288 yes SEQ ID NO:98 1331316 PR00596D URIDINE
NUCLEOTIDE RECEPTOR SIGNATURE 1255 1294 SEQ ID NO:99 1331330
PR00699E C.ELEGANS INTEGRAL MEMBRANE PROTEIN SRG 1137 1196 yes
SIGNA SEQ ID NO:100 1331371 PR00240D ALPHA-1A ADRENERGIC RECEPTOR
SIGNATURE 1470 1238 yes SEQ ID NO:101 1331411 PR00542F MUSCARINIC
M5 RECEPTOR SIGNATURE 1218 1284 yes SEQ ID NO:102 1331481 PR00641B
EBI1 ORPHAN RECEPTOR SIGNATURE 1354 1244 SEQ ID NO:103 1331917
PR00572B INTERLEUKIN 8A RECEPTOR SIGNATURE 1120 1229 yes yes SEQ ID
NO:104 1332023 PR00240D ALPHA-1A ADRENERGIC RECEPTOR SIGNATURE 1470
1307 yes yes SEQ ID NO:105 1332138 PR00752F VASOPRESSIN V1A
RECEPTOR SIGNATURE 1304 1226 yes SEQ ID NO:106 1332171 PR00715I
CATION-DEPENDENT MANNOSE-6-PHOSPHATE 1392 1205 RECEPTOR SEQ ID
NO:107 1332391 PR00522G CANNABINOID RECEPTOR TYPE 1 SIGNATURE 1341
1285 yes SEQ ID NO:108 1332480 PR00643D G10D ORPHAN RECEPTOR
SIGNATURE 1317 1210 SEQ ID NO:109 1332803 PR00258D SPERACT RECEPTOR
SIGNATURE 1254 1230 yes SEQ ID NO:110 1332830 PR00652F
5-HYDROXYTRYPTAMINE 7 RECEPTOR SIGNATURE 1488 1194 SEQ ID NO:111
1332955 PR00639D NEUROMEDIN B RECEPTOR SIGNATURE 1198 1295 yes yes
SEQ ID NO:112 1332966 PR00574D BLUE-SENSITIVE OPSIN SIGNATURE 1263
1300 yes SEQ ID NO:113 1332981 PR00589E SOMATOSTATIN RECEPTOR TYPE
3 SIGNATURE 1340 1253 yes SEQ ID NO:114 1333006 PR00643H G10D
ORPHAN RECEPTOR SIGNATURE 1453 1285 yes SEQ ID NO:115 1333107 PR00539E
MUSCARINIC M2 RECEPTOR SIGNATURE 1322 1371 SEQ ID NO:116 1333116
PR00568D DOPAMINE D3 RECEPTOR SIGNATURE 1445 1206 SEQ ID NO:117
1333133 PR00643H G10D ORPHAN RECEPTOR SIGNATURE 1453 1246 SEQ ID
NO:118 1352448 PR00527I GASTRIN RECEPTOR SIGNATURE 1633 1234 yes
yes SEQ ID NO:119 1385827 PR00539E MUSCARINIC M2 RECEPTOR SIGNATURE
1322 1368 yes yes SEQ ID NO:120 1385922 PR00571G ENDOTHELIN-B
RECEPTOR SIGNATURE 1420 1193 SEQ ID NO:121 1386485 PR00574D
BLUE-SENSITIVE OPSIN SIGNATURE 1263 1317 SEQ ID NO:122 1386553
PR00857C MELATONIN RECEPTOR SIGNATURE 1472 1238 yes SEQ ID NO:123
1386660 PR00647I SENR ORPHAN RECEPTOR SIGNATURE 1291 1251 SEQ ID
NO:124 1386859 PR00645G LCR1 ORPHAN RECEPTOR SIGNATURE 1454 1230
SEQ ID NO:125 1387302 PR00592B EXTRACELLULAR CALCIUM-SENSING
RECEPTOR 1421 1203 yes SIGNAT SEQ ID NO:126 1388063 PR00665G
OXYTOCIN RECEPTOR SIGNATURE 1246 1335 yes SEQ ID NO:127 1422814
PR00647I SENR ORPHAN RECEPTOR SIGNATURE 1291 1292 SEQ ID NO:128
1423820 PR00574D BLUE-SENSITIVE OPSIN SIGNATURE 1263 1177 yes SEQ
ID NO:129 1429651 PR00554G ADENOSINE A2B RECEPTOR SIGNATURE 1259
1275 yes yes SEQ ID NO:130 1436525 PR00568D DOPAMINE D3 RECEPTOR
SIGNATURE 1445 1216 yes SEQ ID NO:131 1453124 PR00649G GPR6 ORPHAN
RECEPTOR SIGNATURE 1103 1292 yes SEQ ID NO:132 1460891 PR00854B
PROSTAGLANDIN D RECEPTOR SIGNATURE 1257 1245 SEQ ID NO:133 1465590
PR00539E MUSCARINIC M2 RECEPTOR SIGNATURE 1322 1229 yes yes SEQ ID
NO:134 1466523 PR00560D ALPHA-2C ADRENERGIC RECEPTOR SIGNATURE 1642
1319 SEQ ID NO:135 1466902 PR00240D ALPHA-1A ADRENERGIC RECEPTOR
SIGNATURE 1470 1271 SEQ ID NO:136 1468040 PR00537C MU OPIOID
RECEPTOR SIGNATURE 1348 1316 yes SEQ ID NO:137 1480833 PR00571G
ENDOTHELIN-B RECEPTOR SIGNATURE 1420 1325 SEQ ID NO:138 1516908
PR00530I HISTAMINE H1 RECEPTOR SIGNATURE 1295 1276 yes SEQ ID
NO:139 1518320 PR00343A SELECTIN SUPERFAMILY COMPLEMENT-BINDING
1245 1412 yes REPEA SEQ ID NO:140 1529624 PR00522G CANNABINOID
RECEPTOR TYPE 1 SIGNATURE 1341 1350 SEQ ID NO:141 1590311 PR00639D
NEUROMEDIN B RECEPTOR SIGNATURE 1198 1168 SEQ ID NO:142 1590335
PR00666B PINEAL OPSIN SIGNATURE 1253 1224 SEQ ID NO:143 1590422
PR00240F ALPHA-1A ADRENERGIC RECEPTOR SIGNATURE 1323 1226 SEQ ID
NO:144 1590455 PR00637D TYPE 3 BOMBESIN RECEPTOR SIGNATURE 1131
1182 SEQ ID NO:145 1590464 PR00637D TYPE 3 BOMBESIN RECEPTOR
SIGNATURE 1131 1268 yes SEQ ID NO:146 1590496 PR00908H THROMBIN
RECEPTOR SIGNATURE 1409 1226 yes SEQ ID NO:147 1590713 PR00641F
EBI1 ORPHAN RECEPTOR SIGNATURE 1290 1211 SEQ ID NO:148 1590769
PR00572B INTERLEUKIN 8A RECEPTOR SIGNATURE 1120 1122 SEQ ID NO:149
1590779 PR00642C EDG1 ORPHAN RECEPTOR SIGNATURE 1193 1246 SEQ ID
NO:150 1590931 PR00646F RDC1 ORPHAN RECEPTOR SIGNATURE 1188 1165
SEQ ID NO:151 1590958 PR00261C LOW DENSITY LIPOPROTEIN (LDL)
RECEPTOR 1576 1234 SIGNAT SEQ ID NO:152 1590973 PR00350C VITAMIN D
RECEPTOR SIGNATURE 1416 1283 yes yes SEQ ID NO:153 1591090 PR00643G
G10D ORPHAN RECEPTOR SIGNATURE 1383 1292 SEQ ID NO:154 1591713
PR00553B ADENOSINE A2A RECEPTOR SIGNATURE 1258 1292 yes yes SEQ ID
NO:155 1642794 PR00527B GASTRIN RECEPTOR SIGNATURE 1431 1353 SEQ ID
NO:156 1687080 PR00855A PROSTAGLANDIN F RECEPTOR SIGNATURE 1361
1248 yes yes SEQ ID NO:157 1722845 PR00248C METABOTROPIC GLUTAMATE
GPCR SIGNATURE 1402 1328 yes yes SEQ ID NO:158 1732911 PR00587A
SOMATOSTATIN RECEPTOR TYPE 1 SIGNATURE 1312 1221 SEQ ID NO:159
1785913 PR00554C ADENOSINE A2B RECEPTOR SIGNATURE 1189 1201 SEQ ID
NO:160 1809069 PR00643G G10D ORPHAN RECEPTOR SIGNATURE 1383 1367
yes yes SEQ ID NO:161 1867626 PR00665F OXYTOCIN RECEPTOR SIGNATURE
1290 1353 yes SEQ ID NO:162 1880501 PR00715I CATION-DEPENDENT
MANNOSE-6-PHOSPHATE 1392 1257 yes yes RECEPTOR SEQ ID NO:163
1881009 PR00531A HISTAMINE H2 RECEPTOR SIGNATURE 1183 1269 yes SEQ
ID NO:164 1909132 PR00637D TYPE 3 BOMBESIN RECEPTOR SIGNATURE 1131
1221 yes SEQ ID NO:165 1955094 PR00580 PROSTANOID EP1 RECEPTOR
SIGNATURE 1160 1197 yes SEQ ID NO:166 1955688 PR00647I SENR ORPHAN
RECEPTOR SIGNATURE 1291 1193 yes SEQ ID NO:167 1956694 PR00517F
5-HYDROXYTRYPTAMINE 2C RECEPTOR 1259 1255 SIGNATURE SEQ ID NO:168
1957189 PR00366A ENDOTHELIN RECEPTOR SIGNATURE 1337 1232 SEQ ID
NO:169 1957920 PR00350B VITAMIN D RECEPTOR SIGNATURE 1494 1267 SEQ
ID NO:170 1957977 PR00559C ALPHA-2B ADRENERGIC RECEPTOR SIGNATURE
1284 1157 yes SEQ ID NO:171 1958505 PR00571G ENDOTHELIN-B RECEPTOR
SIGNATURE 1420 1188 SEQ ID NO:172 1972687 PR00553B ADENOSINE A2A
RECEPTOR SIGNATURE 1258 1275 yes yes SEQ ID NO:173 1975013 PR00554D
ADENOSINE A2B RECEPTOR SIGNATURE 1208 1312 yes SEQ ID NO:174
2010369 PR00645I LCR1 ORPHAN RECEPTOR SIGNATURE 1511 1277 yes yes
SEQ ID NO:175 2019581 PR00641A EBI1 ORPHAN RECEPTOR SIGNATURE 1325
1178 SEQ ID NO:176 2022460 PR00574D BLUE-SENSITIVE OPSIN SIGNATURE
1263 1279 yes SEQ ID NO:177 2022624 PR00857C MELATONIN RECEPTOR
SIGNATURE 1472 1232 yes yes SEQ ID NO:178 2022628 PR00244A
NEUROKININ RECEPTOR SIGNATURE 1316 1174 SEQ ID NO:179 2022630
PR00525E DELTA OPIOID RECEPTOR SIGNATURE 1139 1261 yes SEQ ID
NO:180 2022631 PR00645I LCR1 ORPHAN RECEPTOR SIGNATURE 1511 1299
yes yes SEQ ID NO:181 2023275 PR00667B RETINAL PIGMENT
EPITHELIUM-RETINAL 1190 1217 GPCR SIG SEQ ID NO:182 2023747
PR00568D DOPAMINE D3 RECEPTOR SIGNATURE 1445 1244 SEQ ID NO:183
2044305 PR00751B THYROTROPHIN-RELEASING HORMONE RECEPTOR 1443 1209
yes yes SIGNA SEQ ID NO:184 2069971 PR00644E GPR ORPHAN RECEPTOR
SIGNATURE 1453 1307 yes SEQ ID NO:185 2070872 PR00245C OLFACTORY
RECEPTOR SIGNATURE 1364 1286 yes yes SEQ ID NO:186 2072228 PR00553A
ADENOSINE A2A RECEPTOR SIGNATURE 1377 1218 yes SEQ ID NO:187
2085633 PR00752E VASOPRESSIN V1A RECEPTOR SIGNATURE 1193 1250 yes
SEQ ID NO:188 2088104 PR00641A EBI1 ORPHAN RECEPTOR SIGNATURE 1325
1305 yes yes SEQ ID NO:189 2091133 PR00562C BETA-2 ADRENERGIC
RECEPTOR SIGNATURE 1457 1302 SEQ ID NO:190 2123514 PR00587A
SOMATOSTATIN RECEPTOR TYPE 1 SIGNATURE 1312 1252 yes SEQ ID NO:191
2150261 PR00176C SODIUM/NEUROTRANSMITTER SYMPORTER 1414 1576 yes
yes SIGNATURE SEQ ID NO:192 2170670 PR00896H VASOPRESSIN RECEPTOR
SIGNATURE 1331 1271 yes yes SEQ ID NO:193 2199484 PR00559B ALPHA-2B
ADRENERGIC RECEPTOR SIGNATURE 1285 1195 yes SEQ ID NO:194 2204242
PR00896H VASOPRESSIN RECEPTOR SIGNATURE 1331 1305 SEQ ID NO:195
2236316 PR00554B ADENOSINE A2B RECEPTOR SIGNATURE 1090 1192 yes SEQ
ID NO:196 2237722 PR00855H PROSTAGLANDIN F RECEPTOR SIGNATURE 1467
1218 SEQ ID NO:197 2238625 PR00585A PROSTANOID EP3 RECEPTOR TYPE 3
SIGNATURE 1230 1237 yes SEQ ID NO:198 2242277 PR00255B NATRIURETIC
PEPTIDE RECEPTOR SIGNATURE 1264 1180 yes SEQ ID NO:199 2244782
PR00590A SOMATOSTATIN RECEPTOR TYPE 4 SIGNATURE 1253 1260 SEQ ID
NO:200 2272244 PR00568D DOPAMINE D3 RECEPTOR SIGNATURE 1445 1365
yes yes SEQ ID NO:201 2284108 PR00652F 5-HYDROXYTRYPTAMINE 7
RECEPTOR SIGNATURE 1488 1317 yes yes SEQ ID NO:202 2287109 PR00641H
EBI1 ORPHAN RECEPTOR SIGNATURE 1259 1305 SEQ ID NO:203 2289873
PR00596D URIDINE NUCLEOTIDE RECEPTOR SIGNATURE 1255 1254 yes yes
SEQ ID NO:204 2375491 PR00592B EXTRACELLULAR CALCIUM-SENSING
RECEPTOR 1421 1286 SIGNAT
SEQ ID NO:205 2376547 PR00539E MUSCARINIC M2 RECEPTOR SIGNATURE
1322 1222 SEQ ID NO:206 2377774 PR00240F ALPHA-1A ADRENERGIC
RECEPTOR SIGNATURE 1323 1215 yes yes SEQ ID NO:207 2378093 PR00854A
PROSTAGLANDIN D RECEPTOR SIGNATURE 1169 1350 yes yes SEQ ID NO:208
2378367 PR00571A ENDOTHELIN-B RECEPTOR SIGNATURE 1357 1278 yes SEQ
ID NO:209 2378405 PR00647I SENR ORPHAN RECEPTOR SIGNATURE 1291 1195
SEQ ID NO:210 2378406 PR00555C ADENOSINE A3 RECEPTOR SIGNATURE 1332
1369 SEQ ID NO:211 2381364 PR00666B PINEAL OPSIN SIGNATURE 1253
1249 yes SEQ ID NO:212 2381732 PR00663D GALANIN RECEPTOR SIGNATURE
1168 1242 SEQ ID NO:213 2383045 PR00522C CANNABINOID RECEPTOR TYPE
1 SIGNATURE 1317 1223 SEQ ID NO:214 2470285 PR00373D GLYCOPROTEIN
HORMONE RECEPTOR SIGNATURE 1458 1537 yes yes SEQ ID NO:215 2488060
PR00558G ALPHA-2A ADRENERGIC RECEPTOR SIGNATURE 1396 1282 yes yes
SEQ ID NO:216 2503084 PR00752E VASOPRESSIN V1A RECEPTOR SIGNATURE
1193 1235 yes SEQ ID NO:217 2511221 PR00565B DOPAMINE 1A RECEPTOR
SIGNATURE 1289 1266 yes yes SEQ ID NO:218 2512109 PR00636B AT2
ANGIOTENSIN II RECEPTOR SIGNATURE 1305 1267 SEQ ID NO:219 2553280
PR00530I HISTAMINE H1 RECEPTOR SIGNATURE 1295 1275 yes SEQ ID
NO:220 2603450 PR00373D GLYCOPROTEIN HORMONE RECEPTOR SIGNATURE
1458 1419 yes yes SEQ ID NO:221 2605934 PR00539C MUSCARINIC M2
RECEPTOR SIGNATURE 1365 1286 yes SEQ ID NO:222 2674641 PR00752E
VASOPRESSIN V1A RECEPTOR SIGNATURE 1193 1228 yes yes SEQ ID NO:223
2681738 PR00258C SPERACT RECEPTOR SIGNATURE 1220 1307 yes yes SEQ
ID NO:224 2723293 PR00554B ADENOSINE A2B RECEPTOR SIGNATURE 1090
1240 yes SEQ ID NO:225 2762348 PR00560D ALPHA-2C ADRENERGIC
RECEPTOR SIGNATURE 1642 1284 yes yes SEQ ID NO:226 2773609 PR00854B
PROSTAGLANDIN D RECEPTOR SIGNATURE 1257 1269 SEQ ID NO:227 2776266
PR00854B PROSTAGLANDIN D RECEPTOR SIGNATURE 1257 1269 SEQ ID NO:228
2777115 PR00527A GASTRIN RECEPTOR SIGNATURE 1327 1286 SEQ ID NO:229
2812882 PR00566B DOPAMINE 1B RECEPTOR SIGNATURE 1121 1215 yes SEQ
ID NO:230 2821121 PR00560D ALPHA-2C ADRENERGIC RECEPTOR SIGNATURE
1642 1251 yes yes SEQ ID NO:231 2848989 PR00561F BETA-1 ADRENERGIC
RECEPTOR SIGNATURE 1445 1293 SEQ ID NO:232 2854471 PR00643C G10D
ORPHAN RECEPTOR SIGNATURE 1286 1051 SEQ ID NO:233 2854670 PR00571A
ENDOTHELIN-B RECEPTOR SIGNATURE 1357 1222 yes SEQ ID NO:234 2855520
PR00590C SOMATOSTATIN RECEPTOR TYPE 4 SIGNATURE 1325 1255 SEQ ID
NO:235 2855815 PR00558G ALPHA-2A ADRENERGIC RECEPTOR SIGNATURE 1396
1184 yes SEQ ID NO:236 2857653 PR00854B PROSTAGLANDIN D RECEPTOR
SIGNATURE 1257 1302 yes yes SEQ ID NO:237 2866122 PR00350C VITAMIN
D RECEPTOR SIGNATURE 1230 1214 yes yes SEQ ID NO:238 2925464
PR00585A PROSTANOID EP3 RECEPTOR TYPE 3 SIGNATURE 1230 1214 yes yes
SEQ ID NO:239 2954714 PR00241C ANGIOTENSIN II RECEPTOR SIGNATURE
1246 1213 SEQ ID NO:240 2986560 PR00855A PROSTAGLANDIN F RECEPTOR
SIGNATURE 1361 1330 SEQ ID NO:241 3068234 PR00642D EDG1 ORPHAN
RECEPTOR SIGNATURE 1208 1319 yes yes SEQ ID NO:242 3077943 PR00647I
SENR ORPHAN RECEPTOR SIGNATURE 1291 1268 SEQ ID NO:243 3144006
PR00715G CATION-DEPENDENT MANNOSE-6-PHOSPHATE 1366 1247 RECEPTOR
SEQ ID NO:244 3226980 PR00527A GASTRIN RECEPTOR SIGNATURE 1327 1222
yes SEQ ID NO:245 3290614 PR00522G CANNABINOID RECEPTOR TYPE 1
SIGNATURE 1341 1263 SEQ ID NO:246 3291235 PR00240D ALPHA-1A
ADRENERGIC RECEPTOR SIGNATURE 1470 1269 yes SEQ ID NO:247 3324775
PR00490F SECRETIN RECEPTOR SIGNATURE 1239 1217 yes SEQ ID NO:248
3333159 PR00581A PROSTANOID EP2 RECEPTOR SIGNATURE 1113 1243 yes
SEQ ID NO:249 3393404 PR00343A SELECTIN SUPERFAMILY
COMPLEMENT-BINDING 1245 1606 yes yes REPEA SEQ ID NO:250 3429408
PR00643G G10D ORPHAN RECEPTOR SIGNATURE 1383 1204 SEQ ID NO:251
3447545 PR00642E EDG1 ORPHAN RECEPTOR SIGNATURE 1216 1341 SEQ ID
NO:252 3486012 PR00343C SELECTIN SUPERFAMILY COMPLEMENT-BINDING
1254 1343 REPEA SEQ ID NO:253 3513925 PR00261C LOW DENSITY
LIPOPROTEIN (LDL) RECEPTOR 1576 1256 SIGNAT SEQ ID NO:254 3542591
PR00573C INTERLEUKIN 8B RECEPTOR SIGNATURE 1042 1231 yes SEQ ID
NO:255 3556218 PR00373F GLYCOPROTEIN HORMONE RECEPTOR SIGNATURE
1570 1293 yes SEQ ID NO:256 3665056 PR00366A ENDOTHELIN RECEPTOR
SIGNATURE 1337 1251 yes SEQ ID NO:257 4043152 PR00514D
5-HYDROXYTRYPTAMINE 1D RECEPTOR 1252 1304 yes SIGNATURE SEQ ID
NO:258 4080842 PR00424B ADENOSINE RECEPTOR SIGNATURE 1339 1117 SEQ
ID NO:259 4081268 PR00564G BURKITT'S LYMPHOMA RECEPTOR SIGNATURE
1305 1216 yes SEQ ID NO:260 4082591 PR00640F GASTRIN-RELEASING
PEPTIDE RECEPTOR 1243 1311 SIGNATURE SEQ ID NO:261 4084463 PR00539C
MUSCARINIC M2 RECEPTOR SIGNATURE 1365 1258 yes SEQ ID NO:262
4085069 PR00663E GALANIN RECEPTOR SIGNATURE 1216 1210 SEQ ID NO:263
4129226 PR00250F FUNGAL PHEROMONE MATING FACTOR STE2 1166 1178 GPCR
SIGN SEQ ID NO:264 4129407 PR00564D BURKITT'S LYMPHOMA RECEPTOR
SIGNATURE 1295 1222 SEQ ID NO:265 4130289 PR00539E MUSCARINIC M2
RECEPTOR SIGNATURE 1322 1236 yes yes SEQ ID NO:266 4130748 PR00751E
THYROTROPHIN-RELEASING HORMONE RECEPTOR 1433 1245 SIGNA SEQ ID
NO:267 4131249 PR00240F ALPHA-1A ADRENERGIC RECEPTOR SIGNATURE 1323
1305 yes SEQ ID NO:268 4132125 PR00581E PROSTANOID EP2 RECEPTOR
SIGNATURE 1195 1157 SEQ ID NO:269 4132371 PR00350C VITAMIN D
RECEPTOR SIGNATURE 1416 1321 yes SEQ ID NO:270 4132403 PR00248E
METABOTROPIC GLUTAMATE GPCR SIGNATURE 1306 1167 SEQ ID NO:271
4132547 PR00559E ALPHA-2B ADRENERGIC RECEPTOR SIGNATURE 1546 1214
SEQ ID NO:272 4133631 PR00928F GRAVES DISEASE CARRIER PROTEIN
SIGNATURE 1555 1238 SEQ ID NO:273 4166159 PR00855H PROSTAGLANDIN F
RECEPTOR SIGNATURE 1467 1236 SEQ ID NO:274 4167883 PR00537C MU
OPIOID RECEPTOR SIGNATURE 1348 1185 SEQ ID NO:275 4220523
PR00537C MU OPIOID RECEPTOR SIGNATURE 1348 1227 yes SEQ ID NO:276
4220713 PR00366G ENDOTHELIN RECEPTOR SIGNATURE 1420 1276 SEQ ID
NO:277 4220819 PR00855H PROSTAGLANDIN F RECEPTOR SIGNATURE 1467
1263 SEQ ID NO:278 4220939 PR00571A ENDOTHELIN-B RECEPTOR SIGNATURE
1357 1235 yes SEQ ID NO:279 4221286 PR00574D BLUE-SENSITIVE OPSIN
SIGNATURE 1263 1254 yes SEQ ID NO:280 4221314 PR00530I HISTAMINE H1
RECEPTOR SIGNATURE 1295 1263 yes yes SEQ ID NO:281 4222520 PR00350C
VITAMIN D RECEPTOR SIGNATURE 1416 1202 SEQ ID NO:282 4223468
PR00592A EXTRACELLULAR CALCIUM-SENSING RECEPTOR 1379 1206 SIGNAT
SEQ ID NO:283 4223734 PR00652F 5-HYDROXYTRYPTAMINE 7 RECEPTOR
SIGNATURE 1488 1213 SEQ ID NO:284 4224867 PR00244E NEUROKININ
RECEPTOR SIGNATURE 1282 1266 SEQ ID NO:285 4256014 PR00665F
OXYTOCIN RECEPTOR SIGNATURE 1290 1353 yes SEQ ID NO:286 4352201
PR00643H G10D ORPHAN RECEPTOR SIGNATURE 1453 1297 SEQ ID NO:287
4355247 PR00536E MELANOCYTE STIMULATING HORMONE RECEP- 1313 1353
yes TOR SIGNA SEQ ID NO:288 4608111 PR00596A URIDINE NUCLEOTIDE
RECEPTOR SIGNATURE 1217 1273 yes yes SEQ ID NO:289 319589 PR00635F
AT1 ANGIOTENSIN II RECEPTOR SIGNATURE 1424 1398 yes yes SEQ ID
NO:290 884692 PR00255B NATRIURETIC PEPTIDE RECEPTOR SIGNATURE 1264
1239 yes SEQ ID NO:291 1262948 PR00248C METABOTROPIC GLUTAMATE GPCR
SIGNATURE 1402 1369 yes yes SEQ ID NO:292 1876370 PR00343A SELECTIN
SUPERFAMILY COMPLEMENT-BINDING 1245 1616 yes yes REPEA SEQ ID
NO:293 2088868 PR00522G CANNABINOID RECEPTOR TYPE 1 SIGNATURE 1341
1329 yes yes SEQ ID NO:294 3550808 PR00594B P2U PURINOCEPTOR
SIGNATURE 1452 1255 yes yes SEQ ID NO:295 1328883 PR00169H
POTASSIUM CHANNEL SIGNATURE 1749 1204 SEQ ID NO:296 3458089
PR00169G POTASSIUM CHANNEL SIGNATURE 1540 1390 yes no SEQ ID NO:297
1329138 PR00168F SLOW VOLTAGE-GATED POTASSIUM CHANNEL 1307 1176
SIGNATUR SEQ ID NO:298 1514470 PR00944B COPPER ION BINDING PROTEIN
SIGNATURE 1319 1255 no yes SEQ ID NO:299 1513293 PR00944B COPPER
ION BINDING PROTEIN SIGNATURE SEQ ID NO:300 1514470 PR00944B COPPER
ION BINDING PROTEIN SIGNATURE SEQ ID NO:301 1514470H1 PR00944B
COPPER ION BINDING PROTEIN SIGNATURE SEQ ID NO:302 3372628 PR00944B
COPPER ION BINDING PROTEIN SIGNATURE SEQ ID NO:303 4970006 PR00944B
COPPER ION BINDING PROTEIN SIGNATURE 1319 1255 no no SEQ ID NO:304
4970006H1 PR00944B COPPER ION BINDING PROTEIN SIGNATURE SEQ ID
NO:305 4970006FG PR00944B COPPER ION BINDING PROTEIN SIGNATURE
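Because the rows of Table I run together in this text, the record layout can be illustrated with a short parsing sketch. It assumes that each record begins with "SEQ ID NO:" followed by the Incyte clone number and a PRINT ID beginning with "PR", that the PRINT strength and score are the only free-standing 3- or 4-digit numbers in a row, and that any remaining "yes" or "no" tokens are the TM and signal peptide flags. When a row carries a single flag, the flattened text does not show which of the two columns it belongs to, so the flags are kept as an ordered list. The names ROW and parse_rows are hypothetical.

```python
import re

# One record: "SEQ ID NO:<n> <clone> <PRINT ID> <rest of row>". The row ends
# where the next "SEQ ID NO:<n>" begins, or at the end of the text.
ROW = re.compile(
    r"SEQ ID NO[:.](\d+) (\S+) (PR\w+)(.*?)(?= SEQ ID NO[:.]\d+|$)"
)


def parse_rows(text):
    """Split flattened Table I text into one dictionary per SEQ ID NO."""
    flat = " ".join(text.split())  # undo line wrapping
    rows = []
    for match in ROW.finditer(flat):
        seq_id, clone, print_id, rest = match.groups()
        numbers, flags, words = [], [], []
        for token in rest.split():
            if re.fullmatch(r"\d{3,4}", token):
                numbers.append(int(token))   # strength, then score
            elif token in ("yes", "no"):
                flags.append(token)          # TM / signal peptide, column unknown
            else:
                words.append(token)          # description, possibly truncated
        rows.append({
            "seq_id": int(seq_id),
            "clone": clone,
            "print_id": print_id,
            "description": " ".join(words),
            "strength": numbers[0] if numbers else None,
            "score": numbers[1] if len(numbers) > 1 else None,
            "flags": flags,
        })
    return rows


# Three rows copied from Table I, wrapped as they might appear in the text.
SAMPLE = """SEQ ID NO:1 8915
PR00554C ADENOSINE A2B RECEPTOR SIGNATURE 1189 1319 yes yes SEQ ID
NO:20 240829 PR00856I PROSTACYCLIN (PROSTANOID
IP) RECEPTOR 1131 1273 yes SIGNATU SEQ ID NO:299 1513293 PR00944B COPPER
ION BINDING PROTEIN SIGNATURE"""

rows = parse_rows(SAMPLE)
for row in rows:
    print(row)
```

Rows that list no strength or score, such as SEQ ID NO:299, are returned with both values set to None rather than being dropped.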