U.S. patent application number 10/998518 was filed with the patent office on 2004-11-23 and published on 2005-07-21 for expression monitoring by hybridization to high density oligonucleotide arrays.
This patent application is currently assigned to Affymetrix Inc. Invention is credited to Brown, Eugene L.; Chee, Mark; Gingeras, Thomas R.; Lockhart, David J.; and Wong, Gordon G.
Application Number | 10/998518 |
Publication Number | 20050158746 |
Family ID | 24108589 |
Filed Date | 2004-11-23 |
Publication Date | 2005-07-21 |
United States Patent Application | 20050158746 |
Kind Code | A1 |
Lockhart, David J.; et al. | July 21, 2005 |
Expression monitoring by hybridization to high density oligonucleotide arrays
Abstract
This invention provides methods of monitoring the expression
levels of a multiplicity of genes. The methods involve hybridizing
a nucleic acid sample to a high density array of oligonucleotide
probes where the high density array contains oligonucleotide probes
complementary to subsequences of target nucleic acids in the
nucleic acid sample. In one embodiment, the method involves
providing a pool of target nucleic acids comprising RNA transcripts
of one or more target genes, or nucleic acids derived from the RNA
transcripts, hybridizing said pool of nucleic acids to an array of
oligonucleotide probes immobilized on a surface, where the array
comprises more than 100 different oligonucleotides and each
different oligonucleotide is localized in a predetermined region of
the surface, the density of the different oligonucleotides is
greater than about 60 different oligonucleotides per 1 cm²,
and the oligonucleotide probes are complementary to the RNA
transcripts or nucleic acids derived from the RNA transcripts; and
quantifying the hybridized nucleic acids in the array.
Inventors: | Lockhart, David J. (Santa Clara, CA); Brown, Eugene L. (Newton Highlands, MA); Wong, Gordon G. (Brookline, MA); Chee, Mark (Palo Alto, CA); Gingeras, Thomas R. (Encinitas, CA) |
Correspondence Address: | TOWNSEND AND TOWNSEND AND CREW LLP, TWO EMBARCADERO CENTER, 8TH FLOOR, SAN FRANCISCO, CA 94111-3834, US |
Assignee: | Affymetrix Inc., Santa Clara, CA |
Family ID: | 24108589 |
Appl. No.: | 10/998518 |
Filed: | November 23, 2004 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
10998518 | Nov 23, 2004 | |
10353792 | Jan 28, 2003 | |
09935365 | Aug 22, 2001 | 6548257 |
09212004 | Dec 14, 1998 | 6410229 |
08529115 | Sep 15, 1995 | 6040138 |
Current U.S. Class: | 435/6.11; 435/6.14 |
Current CPC Class: | C12Q 1/6809 20130101; C12Q 2565/507 20130101; C12Q 1/6809 20130101; G01N 15/1475 20130101; C12Q 1/6837 20130101 |
Class at Publication: | 435/006 |
International Class: | C12Q 001/68 |
Claims
1. A method of simultaneously monitoring the expression of a
multiplicity of genes, said method comprising: (a) providing a pool
of target nucleic acids comprising RNA transcripts of one or more
of said genes, or nucleic acids derived from said RNA transcripts;
(b) hybridizing said pool of nucleic acids to an array of
oligonucleotide probes immobilized on a surface, said array
comprising more than 100 different oligonucleotides wherein each
different oligonucleotide is localized in a predetermined region of
said surface, the density of said different oligonucleotides is
greater than about 60 different oligonucleotides per 1 cm²,
and said oligonucleotide probes are complementary to said RNA
transcripts or said nucleic acids derived from said RNA
transcripts; and (c) quantifying the hybridization of said nucleic
acids to said array.
2. The method of claim 1, wherein the concentration of said RNA
transcripts in said pool, or nucleic acids derived from said RNA
transcripts, is proportional to the expression levels of said
genes.
3. The method of claim 1, wherein said array of oligonucleotides
further comprises mismatch control probes.
4. The method of claim 3, wherein said quantifying comprises
calculating the difference in hybridization signal intensity
between each of said oligonucleotide probes and its corresponding
mismatch control probe.
5. The method of claim 4, wherein said quantifying comprises
calculating the average difference in hybridization signal
intensity between each of said oligonucleotide probes and its
corresponding mismatch control probe for each gene.
6. The method of claim 3, wherein said oligonucleotide probes
present in said array are oligonucleotide probes selected according
to the method of claim 29.
7. The method of claim 3, wherein said oligonucleotide probes
present in said array are oligonucleotide probes selected according
to the method of claim 29.
8. The method of claim 3, wherein said oligonucleotide probes
analyzed in said quantifying step are selected according to the
method of claim 29.
9. The method of claim 3, wherein said oligonucleotide probes
analyzed in said quantifying step are selected according to the
method of claim 29.
10. The method of claim 1, wherein hybridization and quantification
is accomplished in under 48 hours.
11. The method of claim 1, wherein said multiplicity of genes is
100 genes or more.
12. The method of claim 1, wherein for each gene, said array
comprises at least 10 different oligonucleotide probes
complementary to subsequences of that gene.
13. The method of claim 1, wherein said hybridization is performed
with a fluid volume of about 250 µl or less.
14. The method of claim 1, wherein said quantifying comprises
detecting a hybridization signal that is proportional to the
concentration of said RNA in said nucleic acid sample.
15. The method of claim 1, wherein said oligonucleotides are from 5
to about 50 nucleotides in length.
16. The method of claim 1, wherein said oligonucleotides are
synthesized by light-directed polymer synthesis.
17. The method of claim 1, wherein said array comprises
oligonucleotide sequences from constitutively expressed control
genes.
18. The method of claim 17, wherein said control genes are selected
from the group consisting of β-actin, GAPDH, and the
transferrin receptor.
19. The method of claim 1, wherein said hybridization comprises a
hybridization at low stringency of 30° C. to 50° C.
and 6×SSPE-T or lower and a wash at higher stringency.
20. The method of claim 1, wherein said pool of nucleic acids is a
pool of mRNAs.
21-54. (canceled)
Description
BACKGROUND OF THE INVENTION
[0001] Many disease states are characterized by differences in the
expression levels of various genes either through changes in the
copy number of the genetic DNA or through changes in levels of
transcription (e.g. through control of initiation, provision of RNA
precursors, RNA processing, etc.) of particular genes. For example,
losses and gains of genetic material play an important role in
malignant transformation and progression. These gains and losses
are thought to be "driven" by at least two kinds of genes.
Oncogenes are positive regulators of tumorigenesis, while tumor
suppressor genes are negative regulators of tumorigenesis (Marshall,
Cell, 64: 313-326 (1991); Weinberg, Science, 254: 1138-1146
(1991)). Therefore, one mechanism of activating unregulated growth
is to increase the number of genes coding for oncogene proteins or
to increase the level of expression of these oncogenes (e.g. in
response to cellular or environmental changes), and another is to
lose genetic material or to decrease the level of expression of
genes that code for tumor suppressors. This model is supported by
the losses and gains of genetic material associated with glioma
progression (Mikkelson et al., J. Cellular Biochem. 46: 3-8 (1991)).
Thus, changes in the expression (transcription) levels of
particular genes (e.g. oncogenes or tumor suppressors) serve as
signposts for the presence and progression of various cancers.
[0002] Similarly, control of the cell cycle and cell development,
as well as diseases, are characterized by the variations in the
transcription levels of particular genes. Thus, for example, a
viral infection is often characterized by the elevated expression
of genes of the particular virus. For example, outbreaks of Herpes
simplex, Epstein-Barr virus infections (e.g. infectious
mononucleosis), cytomegalovirus, Varicella-zoster virus infections,
parvovirus infections, human papillomavirus infections, etc. are
all characterized by elevated expression of various genes present
in the respective virus. Detection of elevated expression levels of
characteristic viral genes provides an effective diagnostic of the
disease state. In particular, viruses such as herpes simplex enter
quiescent states for periods of time only to erupt in brief periods
of rapid replication. Detection of expression levels of
characteristic viral genes allows detection of such active
proliferative (and presumably infective) states.
[0003] Oligonucleotide probes have long been used to detect
complementary nucleic acid sequences in a nucleic acid of interest
(the "target" nucleic acid) and have been used to detect expression
of particular genes (e.g., a Northern Blot). In some assay formats,
the oligonucleotide probe is tethered, i.e., by covalent
attachment, to a solid support, and arrays of oligonucleotide
probes immobilized on solid supports have been used to detect
specific nucleic acid sequences in a target nucleic acid. See,
e.g., PCT patent publication Nos. WO 89/10977 and 89/11548. Others
have proposed the use of large numbers of oligonucleotide probes to
provide the complete nucleic acid sequence of a target nucleic acid
but failed to provide an enabling method for using arrays of
immobilized probes for this purpose. See U.S. Pat. Nos. 5,202,231
and 5,002,867 and PCT patent publication No. WO 93/17126.
[0004] The use of "traditional" hybridization protocols for
monitoring or quantifying gene expression is problematic. For
example, two or more gene products of approximately the same
molecular weight will prove difficult or impossible to distinguish
in a Northern blot because they are not readily separated by
electrophoretic methods. Similarly, as hybridization efficiency and
cross-reactivity vary with the particular subsequence (region) of
a gene being probed it is difficult to obtain an accurate and
reliable measure of gene expression with one, or even a few, probes
to the target gene.
[0005] The development of VLSIPS™ technology provided methods
for synthesizing arrays of many different oligonucleotide probes
that occupy a very small surface area. See U.S. Pat. No. 5,143,854
and PCT patent publication No. WO 90/15070. U.S. patent application
Ser. No. 082,937, filed Jun. 25, 1993, describes methods for making
arrays of oligonucleotide probes that can be used to provide the
complete sequence of a target nucleic acid and to detect the
presence of a nucleic acid containing a specific nucleotide
sequence.
[0006] Prior to the present invention, however, it was unknown that
high density oligonucleotide arrays could be used to reliably
monitor message levels of a multiplicity of preselected genes in
the presence of a large abundance of other (non-target) nucleic
acids (e.g., in a cDNA library, DNA reverse transcribed from an
mRNA, mRNA used directly or amplified, or polymerized from a DNA
template). In addition, the prior art provided no rapid and
effective method for identifying a set of oligonucleotide probes
that maximize specific hybridization efficacy while minimizing
cross-reactivity, nor any method of using hybridization patterns (in particular
hybridization patterns of a multiplicity of oligonucleotide probes
in which multiple oligonucleotide probes are directed to each
target nucleic acid) for quantification of target nucleic acid
concentrations.
SUMMARY OF THE INVENTION
[0007] The present invention is premised, in part, on the discovery
that microfabricated arrays of large numbers of different
oligonucleotide probes (DNA chips) may effectively be used to not
only detect the presence or absence of target nucleic acid
sequences, but to quantify the relative abundance of the target
sequences in a complex nucleic acid pool. In particular, prior to
this invention it was unknown that hybridization to high density
probe arrays would permit small variations in expression levels of
a particular gene to be identified and quantified in a complex
population of nucleic acids that outnumber the target nucleic
acids by 1,000-fold to 1,000,000-fold or more.
[0008] Thus, this invention provides for a method of simultaneously
monitoring the expression (e.g. detecting and/or quantifying the
expression) of a multiplicity of genes. The levels of transcription
for virtually any number of genes may be determined simultaneously.
Typically, at least about 10 genes, preferably at least about 100,
more preferably at least about 1000 and most preferably at least
about 10,000 different genes are assayed at one time.
[0009] The method involves providing a pool of target nucleic acids
comprising mRNA transcripts of one or more of said genes, or
nucleic acids derived from the mRNA transcripts; hybridizing the
pool of nucleic acids to an array of oligonucleotide probes
immobilized on a surface, where the array comprises more than 100
different oligonucleotides, each different oligonucleotide is
localized in a predetermined region of said surface, the density of
the different oligonucleotides is greater than about 60 different
oligonucleotides per 1 cm², and the oligonucleotide probes are
complementary to the mRNA transcripts or nucleic acids derived from
the mRNA transcripts; and quantifying the hybridized nucleic acids
in the array. In a preferred embodiment, the pool of target nucleic
acids is one in which the concentration of the target nucleic acids
(mRNA transcripts or nucleic acids derived from the mRNA
transcripts) is proportional to the expression levels of genes
encoding those target nucleic acids.
[0010] In a preferred embodiment, the array of oligonucleotide
probes is a high density array comprising greater than about 100,
preferably greater than about 1,000, more preferably greater than
about 16,000 and most preferably greater than about 65,000 or
250,000 or even 1,000,000 different oligonucleotide probes. Such
high density arrays comprise a probe density of generally greater
than about 60, more generally greater than about 100, most
generally greater than about 600, often greater than about
1000, more often greater than about 5,000, most often greater than
about 10,000, preferably greater than about 40,000, more preferably
greater than about 100,000, and most preferably greater than about
400,000 different oligonucleotide probes per cm². The
oligonucleotide probes range from about 5 to about 50 nucleotides,
more preferably from about 10 to about 40 nucleotides and most
preferably from about 15 to about 40 nucleotides in length. The
array may comprise more than 10, preferably more than 50, more
preferably more than 100, and most preferably more than 1000
oligonucleotide probes specific for each target gene. Although a
planar array surface is preferred, the array may be fabricated on a
surface of virtually any shape or even a multiplicity of
surfaces.
[0011] The array may further comprise mismatch control probes.
Where such mismatch controls are present, the quantifying step may
comprise calculating the difference in hybridization signal
intensity between each of the oligonucleotide probes and its
corresponding mismatch control probe. The quantifying may further
comprise calculating the average difference in hybridization signal
intensity between each of the oligonucleotide probes and its
corresponding mismatch control probe for each gene.
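The average difference calculation can be illustrated with a minimal Python sketch. The function name, the dictionary layout, and the intensity values are hypothetical and are chosen only for illustration; the sketch assumes each gene is represented by a list of (perfect match, mismatch) intensity pairs.

```python
from statistics import mean


def average_difference(probe_pairs):
    """Return the mean of (PM - MM) over the probe pairs of one gene.

    probe_pairs: list of (perfect_match_intensity, mismatch_intensity).
    """
    if not probe_pairs:
        raise ValueError("at least one probe pair is required")
    return mean(pm - mm for pm, mm in probe_pairs)


# Hypothetical intensities (arbitrary units) for two genes.
intensities = {
    "gene_a": [(520.0, 120.0), (480.0, 80.0), (610.0, 210.0)],
    "gene_b": [(95.0, 90.0), (110.0, 100.0)],
}

# One average difference value per gene.
avg_diff = {gene: average_difference(pairs) for gene, pairs in intensities.items()}
```

In this illustration gene_a shows a large average difference and gene_b a small one, which is the kind of contrast the quantifying step relies on.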
[0012] The probes present in the high density array can be
oligonucleotide probes selected according to the optimization
methods described below. Alternatively, non-optimal probes may be
included in the array, but the probes used for quantification
(analysis) can be selected according to the optimization methods
described below.
[0013] Oligonucleotide arrays for the practice of this invention
are preferably synthesized by light-directed very large scale
immobilized polymer synthesis (VLSIPS) as described herein. The
array includes test probes which are oligonucleotide probes each of
which has a sequence that is complementary to a subsequence of one
of the genes (or the mRNA or the corresponding antisense cRNA)
whose expression is to be detected. In addition, the array can
contain normalization controls, mismatch controls and expression
level controls as described herein.
[0014] The pool of nucleic acids may be labeled before, during, or
after hybridization, although in a preferred embodiment, the
nucleic acids are labeled before hybridization. Fluorescence labels
are particularly preferred and, where used, quantification of the
hybridized nucleic acids is by quantification of fluorescence from
the hybridized fluorescently labeled nucleic acid. Such
quantification is facilitated by the use of a fluorescence
microscope which can be equipped with an automated stage to permit
automatic scanning of the array, and which can be equipped with a
data acquisition system for the automated measurement, recording, and
subsequent processing of the fluorescence intensity
information.
[0015] In a preferred embodiment, hybridization is at low
stringency (e.g. about 20° C. to about 50° C., more
preferably about 30° C. to about 40° C., and most
preferably about 37° C. and 6×SSPE-T or lower) with at
least one wash at higher stringency. Hybridization may include
subsequent washes at progressively increasing stringency until a
desired level of hybridization specificity is reached.
[0016] The pool of target nucleic acids can be the total
polyA⁺ mRNA isolated from a biological sample, or cDNA made by
reverse transcription of the RNA or second strand cDNA or RNA
transcribed from the double stranded cDNA intermediate.
Alternatively, the pool of target nucleic acids can be treated to
reduce the complexity of the sample and thereby reduce the
background signal obtained in hybridization. In one approach, a
pool of mRNAs, derived from a biological sample, is hybridized with
a pool of oligonucleotides comprising the oligonucleotide probes
present in the high density array. The pool of hybridized nucleic
acids is then treated with RNase A which digests the single
stranded regions. The remaining double stranded hybridization
complexes are then denatured and the oligonucleotide probes are
removed, leaving a pool of mRNAs enhanced for those mRNAs
complementary to the oligonucleotide probes in the high density
array.
[0017] In another approach to background reduction, a pool of mRNAs
derived from a biological sample is hybridized with paired target
specific oligonucleotides where the paired target specific
oligonucleotides are complementary to regions flanking subsequences
of the mRNAs complementary to the oligonucleotide probes in the
high density array. The pool of hybridized nucleic acids is treated
with RNase H which digests the hybridized (double stranded) nucleic
acid sequences. The remaining single stranded nucleic acid
sequences which have a length about equivalent to the region
flanked by the paired target specific oligonucleotides are then
isolated (e.g. by electrophoresis) and used as the pool of nucleic
acids for monitoring gene expression.
[0018] Finally, a third approach to background reduction involves
eliminating or reducing the representation in the pool of
particular preselected target mRNA messages (e.g., messages that
are characteristically overexpressed in the sample). This method
involves hybridizing an oligonucleotide probe that is complementary
to the preselected target mRNA message to the pool of polyA⁺
mRNAs derived from a biological sample. The oligonucleotide probe
hybridizes with the particular preselected polyA⁺ mRNA
(message) to which it is complementary. The pool of hybridized
nucleic acids is treated with RNase H which digests the double
stranded (hybridized) region thereby separating the message from
its polyA⁺ tail. Isolating or amplifying (e.g., using an oligo
dT column) the polyA⁺ mRNA in the pool then provides a pool
having a reduced or no representation of the preselected target
mRNA message.
[0019] It will be appreciated that the methods of this invention
can be used to monitor (detect and/or quantify) the expression of
any desired gene of known sequence or subsequence. Moreover, these
methods permit monitoring expression of a large number of genes
simultaneously and effect significant advantages in reduced labor,
cost and time. The simultaneous monitoring of the expression levels
of a multiplicity of genes permits effective comparison of relative
expression levels and identification of biological conditions
characterized by alterations of relative expression levels of
various genes. Genes of particular interest for expression
monitoring include genes involved in the pathways associated with
various pathological conditions (e.g., cancer) and whose expression
is thus indicative of the pathological condition. Such genes
include, but are not limited to the HER2 (c-erbB-2/neu)
proto-oncogene in the case of breast cancer, receptor tyrosine
kinases (RTKs) associated with the etiology of a number of tumors
including carcinomas of the breast, liver, bladder, pancreas, as
well as glioblastomas, sarcomas and squamous carcinomas, and tumor
suppressor genes such as the P53 gene and other "marker" genes such
as RAS, MSH2, MLH1 and BRCA1. Other genes of particular interest
for expression monitoring are genes involved in the immune response
(e.g., interleukin genes), as well as genes involved in cell
adhesion (e.g., the integrins or selectins) and signal transduction
(e.g., tyrosine kinases), etc.
[0020] In another embodiment, this invention provides for a method
of selecting a set of oligonucleotide probes that specifically
bind to a target nucleic acid (e.g., a gene or genes whose
expression is to be monitored or nucleic acids derived from the
gene or its transcribed mRNA). The method involves providing a high
density array of oligonucleotide probes where the array comprises a
multiplicity of probes wherein each probe is complementary to a
subsequence of the target nucleic acid. The target nucleic acid is
then hybridized to the array of oligonucleotide probes to identify
and select those probes where the difference in hybridization
signal intensity between each probe and its mismatch control is
detectable (preferably greater than about 10% of the background
signal intensity, more preferably greater than about 20% of the
background signal intensity and most preferably greater than about
50% of the background signal intensity). The method can further
comprise hybridizing the array to a second pool of nucleic acids
comprising nucleic acids other than the target nucleic acids; and
identifying and selecting probes having the lowest hybridization
signal and where both the probe and its mismatch control have a
hybridization intensity equal to or less than about 5 times the
background signal intensity, preferably equal to or less than about
2 times the background signal intensity, more preferably equal to
or less than about 1 times the background signal intensity, and
most preferably equal to or less than about half the background signal
intensity.
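The two selection criteria can be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive implementation: the function name, parameter names, data layout, and intensity values are hypothetical, and the default thresholds (10% of background for the target pool, 5 times background for the non-target pool) are simply the broadest values recited above.

```python
def select_probes(target_hyb, nontarget_hyb, background,
                  min_diff_fraction=0.10, max_cross_multiple=5.0):
    """Return probe ids passing both criteria, lowest cross-signal first.

    target_hyb, nontarget_hyb: dict of probe_id -> (pm, mm) intensities
    measured with the target pool and with the non-target pool.
    """
    limit = max_cross_multiple * background
    candidates = []
    for probe_id, (pm, mm) in target_hyb.items():
        # Criterion 1: the PM - MM difference must exceed a fraction
        # of the background signal intensity.
        if pm - mm <= min_diff_fraction * background:
            continue
        # Criterion 2: in the non-target pool, both the probe and its
        # mismatch control must stay at or below a multiple of background.
        cross_pm, cross_mm = nontarget_hyb[probe_id]
        if cross_pm <= limit and cross_mm <= limit:
            candidates.append((cross_pm, probe_id))
    # Probes with the lowest cross-hybridization signal come first.
    return [probe_id for _, probe_id in sorted(candidates)]


# Hypothetical data: background of 100 intensity units, four probes.
background = 100.0
target_hyb = {"p1": (900.0, 200.0), "p2": (105.0, 100.0),
              "p3": (800.0, 300.0), "p4": (700.0, 100.0)}
nontarget_hyb = {"p1": (150.0, 120.0), "p2": (90.0, 85.0),
                 "p3": (900.0, 200.0), "p4": (80.0, 60.0)}
selected = select_probes(target_hyb, nontarget_hyb, background)
```

Here p2 is rejected for lack of a detectable difference with the target pool, and p3 is rejected for cross-hybridization with the non-target pool.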
[0021] In a preferred embodiment, the multiplicity of probes can
include every different probe of length n that is complementary to
a subsequence of the target nucleic acid. The probes can range from
about 10 to about 50 nucleotides in length. The array is preferably
a high density array as described above. Similarly, the
hybridization methods, conditions, times, fluid volumes, and detection
methods are as described above and herein below.
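Enumerating every different probe of length n that is complementary to a subsequence of the target can be sketched as below. The sketch assumes the target is supplied as a DNA string over A, C, G and T read 5' to 3', and that each probe is written 5' to 3' as the reverse complement of its window; the function names and the example sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def tiling_probes(target, n):
    """Return each distinct length-n probe complementary to the target.

    Probes are listed in order of the window position on the target,
    and a probe sequence that occurs more than once is kept only once.
    """
    if n < 1 or n > len(target):
        raise ValueError("n must be between 1 and the target length")
    seen = set()
    probes = []
    for start in range(len(target) - n + 1):
        probe = reverse_complement(target[start:start + n])
        if probe not in seen:
            seen.add(probe)
            probes.append(probe)
    return probes


# Hypothetical 7-base target tiled with 4-mers.
probes = tiling_probes("ATGGCCA", 4)
```

A target of length L yields at most L - n + 1 such probes, which is why a high density array is suited to this exhaustive approach.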
[0022] In addition, this invention provides for a composition
comprising an array of oligonucleotide probes immobilized on a
substrate, where the array comprises more than 100 different
oligonucleotides and each different oligonucleotide is localized in
a predetermined region of the solid support and the density of the
array is greater than about 60 different oligonucleotides per 1
cm² of substrate. The oligonucleotide probes are specifically
hybridized to one or more fluorescently labeled nucleic acids such
that the fluorescence in each region of the array is indicative of
the level of expression of each of a multiplicity of preselected
genes. The array is preferably a high density array as described
above and may further comprise expression level controls, mismatch
controls and normalization controls as described herein.
[0023] Finally, this invention provides for kits for simultaneously
monitoring expression levels of a multiplicity of genes. The kits
include an array of immobilized oligonucleotide probes
complementary to subsequences of the multiplicity of target genes,
as described above. In one embodiment, the array comprises at least
100 different oligonucleotide probes and the density of the array
is greater than about 60 different oligonucleotides per 1 cm²
of surface. The kit may also include instructions describing the
use of the array for detection and/or quantification of expression
levels of the multiplicity of genes. The kit may additionally
include one or more of the following: buffers, hybridization mix,
wash and read solutions, labels, labeling reagents (enzymes etc.),
"control" nucleic acids, software for probe selection, array
reading or data analysis and any of the other materials or reagents
described herein for the practice of the claimed methods.
[0024] Definitions.
[0025] The phrase "massively parallel screening" refers to the
simultaneous screening of at least about 100, preferably about
1000, more preferably about 10,000 and most preferably about
1,000,000 different nucleic acid hybridizations.
[0026] The terms "nucleic acid" or "nucleic acid molecule" refer to
a deoxyribonucleotide or ribonucleotide polymer in either single- or
double-stranded form, and unless otherwise limited, would encompass
known analogs of natural nucleotides that can function in a similar
manner as naturally occurring nucleotides.
[0027] An oligonucleotide is a single-stranded nucleic acid ranging
in length from 2 to about 500 bases.
[0028] As used herein a "probe" is defined as an oligonucleotide
capable of binding to a target nucleic acid of complementary
sequence through one or more types of chemical bonds, usually
through complementary base pairing, usually through hydrogen bond
formation. As used herein, an oligonucleotide probe may include
natural (i.e., A, G, C, or T) or modified bases (7-deazaguanosine,
inosine, etc.). In addition, the bases in an oligonucleotide probe may
be joined by a linkage other than a phosphodiester bond, so long as
it does not interfere with hybridization. Thus, oligonucleotide
probes may be peptide nucleic acids in which the constituent bases
are joined by peptide bonds rather than phosphodiester
linkages.
[0029] The term "target nucleic acid" refers to a nucleic acid
(often derived from a biological sample), to which the
oligonucleotide probe is designed to specifically hybridize. It is
either the presence or absence of the target nucleic acid that is
to be detected, or the amount of the target nucleic acid that is to
be quantified. The target nucleic acid has a sequence that is
complementary to the nucleic acid sequence of the corresponding
probe directed to the target. The term target nucleic acid may
refer to the specific subsequence of a larger nucleic acid to which
the probe is directed or to the overall sequence (e.g., gene or
mRNA) whose expression level it is desired to detect. The
difference in usage will be apparent from context.
[0030] "Subsequence" refers to a sequence of nucleic acids that
comprise a part of a longer sequence of nucleic acids.
[0031] The term "complexity" is used here according to the standard
meaning of this term as established by Britten et al., Methods in
Enzymol. 29:363 (1974). See also Cantor and Schimmel, Biophysical
Chemistry: Part III at 1228-1230 for further explanation of nucleic
acid complexity.
[0032] "Bind(s) substantially" refers to complementary
hybridization between a probe nucleic acid and a target nucleic
acid and embraces minor mismatches that can be accommodated by
reducing the stringency of the hybridization media to achieve the
desired detection of the target polynucleotide sequence.
[0033] The phrase "hybridizing specifically to" refers to the
binding, duplexing, or hybridizing of a molecule only to a
particular nucleotide sequence under stringent conditions when that
sequence is present in a complex mixture (e.g., total cellular DNA
or RNA). The term "stringent conditions" refers to conditions under
which a probe will hybridize to its target subsequence, but to no
other sequences. Stringent conditions are sequence-dependent and
will be different in different circumstances. Longer sequences
hybridize specifically at higher temperatures. Generally, stringent
conditions are selected to be about 5° C. lower than the
thermal melting point (Tm) for the specific sequence at a defined
ionic strength and pH. The Tm is the temperature (under defined
ionic strength, pH, and nucleic acid concentration) at which 50% of
the probes complementary to the target sequence hybridize to the
target sequence at equilibrium. (As the target sequences are
generally present in excess, at Tm, 50% of the probes are occupied
at equilibrium). Typically, stringent conditions will be those in
which the salt concentration is at least about 0.01 to 1.0 M Na ion
concentration (or other salts) at pH 7.0 to 8.3 and the temperature
is at least about 30° C. for short probes (e.g., 10 to 50
nucleotides). Stringent conditions may also be achieved with the
addition of destabilizing agents such as formamide.
[0034] The term "mismatch control" refers to a probe that has a
sequence deliberately selected not to be perfectly complementary to
a particular target sequence. The mismatch control typically has a
corresponding test probe that is perfectly complementary to the
same particular target sequence. The mismatch may comprise one or
more bases. While the mismatch(es) may be located anywhere in the
mismatch probe, terminal mismatches are less desirable as a
terminal mismatch is less likely to prevent hybridization of the
target sequence. In a particularly preferred embodiment, the
mismatch is located at or near the center of the probe such that
the mismatch is most likely to destabilize the duplex with the
target sequence under the test hybridization conditions.
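A single central mismatch can be generated from a test probe as in the following sketch. The choice of replacement base is an assumption made here for illustration (the central base is swapped for its complement); the text above specifies only the position of the mismatch, not which base is substituted. The function name and the example probe are hypothetical.

```python
# Assumed substitution rule: replace the central base with its complement.
SWAP = {"A": "T", "T": "A", "C": "G", "G": "C"}


def mismatch_control(probe):
    """Return a copy of the probe with one mismatch at or near its center."""
    if not probe:
        raise ValueError("probe must not be empty")
    center = len(probe) // 2
    return probe[:center] + SWAP[probe[center]] + probe[center + 1:]


# Hypothetical 9-mer test probe and its mismatch control.
test_probe = "ACGTACGTA"
control_probe = mismatch_control(test_probe)
```

The control differs from the test probe at exactly one position, so the pair can be compared position for position on the array.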
[0035] The terms "background" or "background signal intensity"
refer to hybridization signals resulting from non-specific binding,
or other interactions, between the labeled target nucleic acids and
components of the oligonucleotide array (e.g., the oligonucleotide
probes, control probes, the array substrate, etc.). Background
signals may also be produced by intrinsic fluorescence of the array
components themselves. A single background signal can be calculated
for the entire array, or a different background signal may be
calculated for each target nucleic acid. In a preferred embodiment,
background is calculated as the average hybridization signal
intensity for the lowest 5% to 10% of the probes in the array, or,
where a different background signal is calculated for each target
gene, for the lowest 5% to 10% of the probes for each gene. Of
course, one of skill in the art will appreciate that where the
probes to a particular gene hybridize well and thus appear to be
specifically binding to a target sequence, they should not be used
in a background signal calculation. Alternatively, background may
be calculated as the average hybridization signal intensity
produced by hybridization to probes that are not complementary to
any sequence found in the sample (e.g. probes directed to nucleic
acids of the opposite sense or to genes not found in the sample
such as bacterial genes where the sample is mammalian nucleic
acids). Background can also be calculated as the average signal
intensity produced by regions of the array that lack any probes at
all.
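As a minimal sketch of the first background calculation described
above (the average intensity of the lowest 5% to 10% of the probes),
the following Python function may be considered. The function name,
the default fraction, and the rounding of the probe count are
assumptions of this sketch. Probes that appear to be binding
specifically are assumed to have been excluded from the input
beforehand.

```python
# Illustrative sketch only; names and defaults are assumptions.
def background_signal(intensities, fraction=0.05):
    """Mean intensity of the lowest `fraction` of probes (at least one probe)."""
    if not intensities:
        raise ValueError("no probe intensities supplied")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(intensities)
    n = max(1, round(len(ranked) * fraction))  # number of lowest probes to average
    return sum(ranked[:n]) / n


if __name__ == "__main__":
    signals = [float(v) for v in range(1, 101)]  # hypothetical intensities
    print(background_signal(signals, fraction=0.05))
    print(background_signal(signals, fraction=0.10))
```

The same function may be applied to the whole array or, where a
separate background is wanted for each gene, to the probes for that
gene alone.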
[0036] The term "quantifying" when used in the context of
quantifying transcription levels of a gene can refer to absolute or
to relative quantification. Absolute quantification may be
accomplished by inclusion of known concentration(s) of one or more
target nucleic acids (e.g. control nucleic acids such as Bio B or
with known amounts of the target nucleic acids themselves) and
referencing the hybridization intensity of unknowns with the known
target nucleic acids (e.g. through generation of a standard curve).
Alternatively, relative quantification can be accomplished by
comparison of hybridization signals between two or more genes, or
between two or more treatments to quantify the changes in
hybridization intensity and, by implication, transcription
level.
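The two modes of quantification can be illustrated with a short
Python sketch. It assumes, purely for illustration, that the standard
curve is a straight line fitted by least squares to the control
intensities; the text above does not prescribe the form of the curve,
and all function names are hypothetical.

```python
# Illustrative sketch only; a linear standard curve is an assumption.
def fit_standard_curve(concentrations, intensities):
    """Least-squares line: intensity = slope * concentration + intercept."""
    n = len(concentrations)
    mean_x = sum(concentrations) / n
    mean_y = sum(intensities) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(concentrations, intensities))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def absolute_concentration(intensity, slope, intercept):
    """Read an unknown concentration off the fitted standard curve."""
    return (intensity - intercept) / slope


def relative_change(intensity_a, intensity_b):
    """Fold change in hybridization intensity between two treatments."""
    return intensity_b / intensity_a


if __name__ == "__main__":
    # Hypothetical spiked control concentrations and measured intensities.
    slope, intercept = fit_standard_curve([1.0, 2.0, 4.0], [110.0, 210.0, 410.0])
    print(absolute_concentration(310.0, slope, intercept))
    print(relative_change(100.0, 250.0))
```

Absolute quantification requires the controls of known concentration,
whereas the relative measure needs only the two intensities being
compared.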
BRIEF DESCRIPTION OF THE DRAWINGS
[0037] FIG. 1 shows hybridization intensity plotted as a
function of concentration of target mRNA. Graphs A and B show the
hybridization intensity of IL-4 RNA hybridized to the high density
array of Example 1. Graph B expands the ordinate of graph A to show
the low concentration values. Graphs C and D show hybridization
intensity plotted as a function of target RNA for a collection of
different target RNAs. The graphs show the average values of the
1000 highest intensity probes. Graph D expands the ordinate of
graph C to show the low concentration values.
[0038] FIG. 2 shows a plot of hybridization intensity for mouse
library RNA, mouse library RNA spiked with mCTLA8, IL-6, IL-3,
IFN-.gamma., and IL-12p40 at 10 pM or 50 pM. The data presented is
based upon approximately the best (optimal) 10% of the probes to
each gene, where the optimal probes are selected according to the
method disclosed herein.
[0039] FIG. 3 shows a plot of the data from Example 1 (FIG. 2) with
the ordinate condensed to show the constitutively expressed GAPDH
and Actin genes and the intrinsically expressed IL-10 gene.
DETAILED DESCRIPTION
[0040] This invention provides methods of monitoring (detecting
and/or quantifying) the expression levels of one or more genes. The
methods involve hybridization of a nucleic acid target sample to a
high density array of nucleic acid probes and then quantifying the
amount of target nucleic acids hybridized to each probe in the
array.
[0041] While nucleic acid hybridization has been used for some time
to determine the expression levels of various genes (e.g., Northern
Blot), it was a surprising discovery of this invention that high
density arrays are suitable for the quantification of the small
variations in expression (transcription) levels of a gene in the
presence of a large population of heterogenous nucleic acids. The
signal may be present at a concentration of less than about 1 in
1,000, and is often present at a concentration less than 1 in
10,000, more preferably less than about 1 in 50,000 and most
preferably less than about 1 in 100,000 or even 1 in 1,000,000.
[0042] Prior to this invention, it was expected that hybridization
of such a complex mixture to a high density array might overwhelm
the available probes and make it impossible to detect the presence
of low-level target nucleic acids. It was thus unclear that a low
level signal could be isolated and detected in the presence of
misleading signals due to cross-hybridization and non-specific
binding both to substrate and probe.
[0043] It was a surprising discovery that, to the contrary, high
density arrays are particularly well suited for monitoring
expression of a multiplicity of genes and provide a level of
sensitivity and discrimination hitherto unexpected.
[0044] Preferred high density arrays of this invention comprise
greater than about 100, preferably greater than about 1000, more
preferably greater than about 16,000 and most preferably greater
than about 65,000 or 250,000 or even greater than about 1,000,000
different oligonucleotide probes. The oligonucleotide probes range
from about 5 to about 50 nucleotides, more preferably from about 10
to about 40 nucleotides and most preferably from about 15 to about
40 nucleotides in length.
[0045] The location and sequence of each different oligonucleotide
probe sequence in the array is known. Moreover, the large number of
different probes occupies a relatively small area providing a high
density array having a probe density of generally greater than
about 60, more generally greater than about 100, most generally
greater than about 600, often greater than about 1000, more
often greater than about 5,000, most often greater than about
10,000, preferably greater than about 40,000, more preferably
greater than about 100,000, and most preferably greater than about
400,000 different oligonucleotide probes per cm.sup.2. The
small surface area of the array (often less than about 10 cm.sup.2,
preferably less than about 5 cm.sup.2 more preferably less than
about 2 cm.sup.2, and most preferably less than about 1.6 cm.sup.2)
permits extremely uniform hybridization conditions (temperature
regulation, salt content, etc.) while the extremely large number of
probes allows massively parallel processing of hybridizations.
[0046] It was a discovery of this invention that the use of high
density arrays for expression monitoring provides a number of
advantages not found with other methods. For example, the use of
large numbers of different probes that specifically bind to the
transcription product of a particular target gene provides a high
degree of redundancy and internal control that permits optimization
of probe sets for effective detection of particular target genes
and minimizes the possibility of errors due to cross-reactivity
with other nucleic acid species.
[0047] Apparently suitable probes often prove ineffective for
expression monitoring by hybridization. For example, certain
subsequences of a particular target gene may be found in other
regions of the genome and probes directed to these subsequences
will cross-hybridize with the other regions and not provide a
signal that is a meaningful measure of the expression level of the
target gene. Even probes that show little cross reactivity may be
unsuitable because they generally show poor hybridization due to
the formation of structures that prevent effective hybridization.
Finally, in sets with large numbers of probes, it is difficult to
identify hybridization conditions that are optimal for all the
probes in a set. Because of the high degree of redundancy provided
by the large number of probes for each target gene, it is possible
to eliminate those probes that function poorly under a given set of
hybridization conditions and still retain enough probes to a
particular target gene to provide an extremely sensitive and
reliable measure of the expression level (transcription level) of
that gene.
[0048] In addition, the use of large numbers of different probes to
each target gene makes it possible to monitor expression of
families of closely-related nucleic acids. The probes may be
selected to hybridize both with subsequences that are conserved
across the family and with subsequences that differ in the
different nucleic acids in the family. Thus, hybridization with
such arrays permits simultaneous monitoring of the various members
of a gene family even where the various genes are approximately the
same size and have high levels of homology. Such measurements are
difficult or impossible with traditional hybridization methods.
[0049] Because the high density arrays contain such a large number
of probes it is possible to provide numerous controls including,
for example, controls for variations or mutations in a particular
gene, controls for overall hybridization conditions, controls for
sample preparation conditions, controls for metabolic activity of
the cell from which the nucleic acids are derived and mismatch
controls for non-specific binding or cross hybridization.
[0050] Finally, because of the small area occupied by the high
density arrays, hybridization may be carried out in extremely small
fluid volumes (e.g., 250 .mu.l or less, more preferably 100 .mu.l
or less, and most preferably 10 .mu.l or less). In small volumes,
hybridization may proceed very rapidly. In addition, hybridization
conditions are extremely uniform throughout the sample, and the
hybridization format is amenable to automated processing.
[0051] This invention demonstrates that hybridization with high
density oligonucleotide probe arrays provides an effective means of
monitoring expression of a multiplicity of genes. In addition this
invention provides for methods of sample treatment and array
designs and methods of probe selection that optimize signal
detection at extremely low concentrations in complex nucleic acid
mixtures.
[0052] The expression monitoring methods of this invention may be
used in a wide variety of circumstances including detection of
disease, identification of differential gene expression between two
samples (e.g., a pathological as compared to a healthy sample),
screening for compositions that upregulate or downregulate the
expression of particular genes, and so forth.
[0053] In one preferred embodiment, the methods of this invention
are used to monitor the expression (transcription) levels of
nucleic acids whose expression is altered in a disease state. For
example, a cancer may be characterized by the overexpression of a
particular marker such as the HER2 (c-erbB-2/neu) proto-oncogene in
the case of breast cancer. Similarly, overexpression of receptor
tyrosine kinases (RTKs) is associated with the etiology of a number
of tumors including carcinomas of the breast, liver, bladder,
pancreas, as well as glioblastomas, sarcomas and squamous
carcinomas (see Carpenter, Ann. Rev. Biochem., 56: 881-914 (1987)).
Conversely, a cancer (e.g., colorectal, lung and breast) may be
characterized by the mutation of or underexpression of a tumor
suppressor gene such as P53 (see, e.g., Tominaga et al. Critical
Rev. in Oncogenesis, 3: 257-282 (1992)).
[0054] The materials and methods of this invention are typically
used to monitor the expression of a multiplicity of different genes
simultaneously. Thus, in one embodiment, the invention provides for
simultaneous monitoring of at least about 10, preferably at least
about 100, more preferably at least about 1000 and most preferably
at least about 10,000 different genes.
[0055] I. Methods of Monitoring Gene Expression.
[0056] Generally the methods of monitoring gene expression of this
invention involve (1) providing a pool of target nucleic acids
comprising RNA transcript(s) of one or more target gene(s), or
nucleic acids derived from the RNA transcript(s); (2) hybridizing
the nucleic acid sample to a high density array of probes
(including control probes); and (3) detecting the hybridized
nucleic acids and calculating a relative expression (transcription)
level.
[0057] A) Providing a Nucleic Acid Sample.
[0058] One of skill in the art will appreciate that in order to
measure the transcription level (and thereby the expression level)
of a gene or genes, it is desirable to provide a nucleic acid
sample comprising mRNA transcript(s) of the gene or genes, or
nucleic acids derived from the mRNA transcript(s). As used herein,
a nucleic acid derived from an mRNA transcript refers to a nucleic
acid for whose synthesis the mRNA transcript or a subsequence
thereof has ultimately served as a template. Thus, a cDNA reverse
transcribed from an mRNA, an RNA transcribed from that cDNA, a DNA
amplified from the cDNA, an RNA transcribed from the amplified DNA,
etc., are all derived from the mRNA transcript and detection of
such derived products is indicative of the presence and/or
abundance of the original transcript in a sample. Thus, suitable
samples include, but are not limited to, mRNA transcripts of the
gene or genes, cDNA reverse transcribed from the mRNA, cRNA
transcribed from the cDNA, DNA amplified from the genes, RNA
transcribed from amplified DNA, and the like.
[0059] In a particularly preferred embodiment, where it is desired
to quantify the transcription level (and thereby expression) of
one or more genes in a sample, the nucleic acid sample is one in
which the concentration of the mRNA transcript(s) of the gene or
genes, or the concentration of the nucleic acids derived from the
mRNA transcript(s), is proportional to the transcription level (and
therefore expression level) of that gene. Similarly, it is
preferred that the hybridization signal intensity be proportional
to the amount of hybridized nucleic acid. While it is preferred
that the proportionality be relatively strict (e.g., a doubling in
transcription rate results in a doubling in mRNA transcript in the
sample nucleic acid pool and a doubling in hybridization signal),
one of skill will appreciate that the proportionality can be more
relaxed and even non-linear. Thus, for example, an assay where a 5
fold difference in concentration of the target mRNA results in a 3
to 6 fold difference in hybridization intensity is sufficient for
most purposes. Where more precise quantification is required
appropriate controls can be run to correct for variations
introduced in sample preparation and hybridization as described
herein. In addition, serial dilutions of "standard" target mRNAs
can be used to prepare calibration curves according to methods well
known to those of skill in the art. Of course, where simple
detection of the presence or absence of a transcript is desired, no
elaborate control or calibration is required.
[0060] In the simplest embodiment, such a nucleic acid sample is
the total mRNA isolated from a biological sample. The term
"biological sample", as used herein, refers to a sample obtained
from an organism or from components (e.g., cells) of an organism.
The sample may be of any biological tissue or fluid. Frequently the
sample will be a "clinical sample" which is a sample derived from a
patient. Such samples include, but are not limited to, sputum,
blood, blood cells (e.g., white cells), tissue or fine needle
biopsy samples, urine, peritoneal fluid, and pleural fluid, or
cells therefrom. Biological samples may also include sections of
tissues such as frozen sections taken for histological
purposes.
[0061] The nucleic acid (either genomic DNA or mRNA) may be
isolated from the sample according to any of a number of methods
well known to those of skill in the art. One of skill will
appreciate that where alterations in the copy number of a gene are
to be detected genomic DNA is preferably isolated. Conversely,
where expression levels of a gene or genes are to be detected,
preferably RNA (mRNA) is isolated.
[0062] Methods of isolating total mRNA are well known to those of
skill in the art. For example, methods of isolation and
purification of nucleic acids are described in detail in Chapter 3
of Laboratory Techniques in Biochemistry and Molecular Biology:
Hybridization With Nucleic Acid Probes, Part I. Theory and Nucleic
Acid Preparation, P. Tijssen, ed. Elsevier, N.Y. (1993).
[0063] In a preferred embodiment, the total nucleic acid is
isolated from a given sample using, for example, an acid
guanidinium-phenol-chloroform extraction method and polyA.sup.+
mRNA is isolated by oligo dT column chromatography or by using
(dT)n magnetic beads (see, e.g., Sambrook et al., Molecular
Cloning: A Laboratory Manual (2nd ed.), Vols. 1-3, Cold Spring
Harbor Laboratory, (1989), or Current Protocols in Molecular
Biology, F. Ausubel et al., ed. Greene Publishing and
Wiley-Interscience, New York (1987)).
[0064] Frequently, it is desirable to amplify the nucleic acid
sample prior to hybridization. One of skill in the art will
appreciate that whatever amplification method is used, if a
quantitative result is desired, care must be taken to use a method
that maintains or controls for the relative frequencies of the
amplified nucleic acids.
[0065] Methods of "quantitative" amplification are well known to
those of skill in the art. For example, quantitative PCR involves
simultaneously co-amplifying a known quantity of a control sequence
using the same primers. This provides an internal standard that may
be used to calibrate the PCR reaction. The high density array may
then include probes specific to the internal standard for
quantification of the amplified nucleic acid.
[0066] One preferred internal standard is a synthetic AW106 cRNA.
The AW106 cRNA is combined with RNA isolated from the sample
according to standard techniques known to those of skill in the
art. The RNA is then reverse transcribed using a reverse
transcriptase to provide copy DNA. The cDNA sequences are then
amplified (e.g., by PCR) using labeled primers. The amplification
products are separated, typically by electrophoresis, and the
amount of radioactivity (proportional to the amount of amplified
product) is determined. The amount of mRNA in the sample is then
calculated by comparison with the signal produced by the known
AW106 RNA standard. Detailed protocols for quantitative PCR are
provided in PCR Protocols, A Guide to Methods and Applications,
Innis et al., Academic Press, Inc. N.Y., (1990).
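The comparison with the internal standard amounts to a simple ratio,
sketched below in Python. The sketch assumes that the target and the
standard amplify with equal efficiency and that signal is proportional
to the amount of product; the function and parameter names are
hypothetical.

```python
# Illustrative sketch only; equal amplification efficiency of target and
# internal standard is an assumption of this calculation.
def estimate_target_amount(target_signal, standard_signal, standard_amount):
    """Scale the known standard amount by the target/standard signal ratio."""
    if standard_signal <= 0:
        raise ValueError("standard signal must be positive")
    return standard_amount * target_signal / standard_signal


if __name__ == "__main__":
    # Hypothetical signals; 2.0 units of standard were added to the sample.
    print(estimate_target_amount(3000.0, 1500.0, 2.0))
```

The result is expressed in whatever units were used for the amount of
standard added to the sample.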
[0067] Other suitable amplification methods include, but are not
limited to polymerase chain reaction (PCR) (Innis, et al., PCR
Protocols. A guide to Methods and Application. Academic Press, Inc.
San Diego, (1990)), ligase chain reaction (LCR) (see Wu and
Wallace, Genomics, 4: 560 (1989), Landegren, et al., Science, 241:
1077 (1988) and Barringer, et al., Gene, 89: 117 (1990)),
transcription amplification (Kwoh, et al., Proc. Natl. Acad. Sci.
USA, 86: 1173 (1989)), and self-sustained sequence replication
(Guatelli, et al., Proc. Nat. Acad. Sci. USA, 87: 1874 (1990)).
[0068] In a particularly preferred embodiment, the sample mRNA is
reverse transcribed with a reverse transcriptase and a primer
consisting of oligo dT and a sequence encoding the phage T7
promoter to provide single stranded DNA template. The second DNA
strand is polymerized using a DNA polymerase. After synthesis of
double-stranded cDNA, T7 RNA polymerase is added and RNA is
transcribed from the cDNA template. Successive rounds of
transcription from each single cDNA template result in amplified
RNA. Methods of in vitro polymerization are well known to those of
skill in the art (see, e.g., Sambrook, supra.) and this particular
method is described in detail by Van Gelder, et al., Proc. Natl.
Acad. Sci. USA, 87: 1663-1667 (1990) who demonstrate that in vitro
amplification according to this method preserves the relative
frequencies of the various RNA transcripts. Moreover, Eberwine et
al. Proc. Natl. Acad. Sci. USA, 89: 3010-3014 (1992) provide a protocol
that uses two rounds of amplification via in vitro transcription to
achieve greater than 10.sup.6 fold amplification of the original
starting material thereby permitting expression monitoring even
where biological samples are limited.
[0069] It will be appreciated by one of skill in the art that the
direct transcription method described above provides an antisense
(aRNA) pool. Where antisense RNA is used as the target nucleic
acid, the oligonucleotide probes provided in the array are chosen
to be complementary to subsequences of the antisense nucleic acids.
Conversely, where the target nucleic acid pool is a pool of sense
nucleic acids, the oligonucleotide probes are selected to be
complementary to subsequences of the sense nucleic acids. Finally,
where the nucleic acid pool is double stranded, the probes may be
of either sense as the target nucleic acids include both sense and
antisense strands.
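The relationship between the sense of the target pool and the probe
sequences can be made concrete with a short Python sketch. It assumes
RNA targets and DNA probes, both written 5' to 3'; the function names
are hypothetical and chosen only for this illustration.

```python
# Illustrative sketch only; sequences are written 5' -> 3' throughout.
_DNA_FOR_RNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
_RNA_FOR_RNA = {"A": "U", "U": "A", "G": "C", "C": "G"}


def probe_for_target(target_rna: str) -> str:
    """DNA probe complementary to an RNA target subsequence."""
    return "".join(_DNA_FOR_RNA[base] for base in reversed(target_rna.upper()))


def antisense_of(sense_rna: str) -> str:
    """Antisense RNA (reverse complement) of a sense RNA subsequence."""
    return "".join(_RNA_FOR_RNA[base] for base in reversed(sense_rna.upper()))


if __name__ == "__main__":
    sense = "AUGC"  # hypothetical sense subsequence
    print(probe_for_target(sense))                # probe for a sense pool
    print(probe_for_target(antisense_of(sense)))  # probe for an antisense pool
```

A probe directed to the antisense pool therefore has the same base
order as the sense transcript, with T in place of U.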
[0070] The protocols cited above include methods of generating
pools of either sense or antisense nucleic acids. Indeed, one
approach can be used to generate either sense or antisense nucleic
acids as desired. For example, the cDNA can be directionally cloned
into a vector (e.g., Stratagene's pBluescript II KS (+) phagemid)
such that it is flanked by the T3 and T7 promoters. In vitro
transcription with the T3 polymerase will produce RNA of one sense
(the sense depending on the orientation of the insert), while in
vitro transcription with the T7 polymerase will produce RNA having
the opposite sense. Other suitable cloning systems include phage
lambda vectors designed for Cre-loxP plasmid subcloning (see e.g.,
Palazzolo et al., Gene, 88: 25-36 (1990)).
[0071] In a particularly preferred embodiment, a high activity RNA
polymerase (e.g. about 2500 units/.mu.L for T7, available from
Epicentre Technologies) is used.
[0072] B) Labeling Nucleic Acids.
[0073] In a preferred embodiment, the hybridized nucleic acids are
detected by detecting one or more labels attached to the sample
nucleic acids. The labels may be incorporated by any of a number of
means well known to those of skill in the art. However, in a
preferred embodiment, the label is simultaneously incorporated
during the amplification step in the preparation of the sample
nucleic acids. Thus, for example, polymerase chain reaction (PCR)
with labeled primers or labeled nucleotides will provide a labeled
amplification product. In a preferred embodiment, transcription
amplification, as described above, using a labeled nucleotide (e.g.
fluorescein-labeled UTP and/or CTP) incorporates a label into the
transcribed nucleic acids.
[0074] Alternatively, a label may be added directly to the original
nucleic acid sample (e.g., mRNA, polyA mRNA, cDNA, etc.) or to the
amplification product after the amplification is completed. Means
of attaching labels to nucleic acids are well known to those of
skill in the art and include, for example nick translation or
end-labeling (e.g. with a labeled RNA) by kinasing of the nucleic
acid and subsequent attachment (ligation) of a nucleic acid linker
joining the sample nucleic acid to a label (e.g., a
fluorophore).
[0075] Detectable labels suitable for use in the present invention
include any composition detectable by spectroscopic, photochemical,
biochemical, immunochemical, electrical, optical or chemical means.
Useful labels in the present invention include biotin for staining
with labeled streptavidin conjugate, magnetic beads (e.g.,
Dynabeads.TM.), fluorescent dyes (e.g., fluorescein, Texas red,
rhodamine, green fluorescent protein, and the like), radiolabels
(e.g., .sup.3H, .sup.125I, .sup.35S, .sup.14C, or .sup.32P), enzymes
(e.g., horseradish peroxidase, alkaline phosphatase and others
commonly used in an ELISA), and colorimetric labels such as
colloidal gold or colored glass or plastic (e.g., polystyrene,
polypropylene, latex, etc.) beads. Patents teaching the use of such
labels include U.S. Pat. Nos. 3,817,837; 3,850,752; 3,939,350;
3,996,345; 4,277,437; 4,275,149; and 4,366,241.
[0076] Means of detecting such labels are well known to those of
skill in the art. Thus, for example, radiolabels may be detected
using photographic film or scintillation counters, and fluorescent
markers may be detected using a photodetector to detect emitted
light. Enzymatic labels are typically detected by providing the
enzyme with a substrate and detecting the reaction product produced
by the action of the enzyme on the substrate, and colorimetric
labels are detected by simply visualizing the colored label.
[0077] The label may be added to the target (sample) nucleic
acid(s) prior to, or after the hybridization. So called "direct
labels" are detectable labels that are directly attached to or
incorporated into the target (sample) nucleic acid prior to
hybridization. In contrast, so called "indirect labels" are joined
to the hybrid duplex after hybridization. Often, the indirect label
is attached to a binding moiety that has been attached to the
target nucleic acid prior to the hybridization. Thus, for example,
the target nucleic acid may be biotinylated before the
hybridization. After hybridization, an avidin-conjugated
fluorophore will bind the biotin bearing hybrid duplexes providing
a label that is easily detected. For a detailed review of methods
of labeling nucleic acids and detecting labeled hybridized nucleic
acids see Laboratory Techniques in Biochemistry and Molecular
Biology, Vol. 24: Hybridization With Nucleic Acid Probes, P.
Tijssen, ed. Elsevier, N.Y., (1993).
[0078] Fluorescent labels are preferred and easily added during an
in vitro transcription reaction. In a preferred embodiment,
fluorescein labeled UTP and CTP are incorporated into the RNA
produced in an in vitro transcription reaction as described
above.
[0079] C) Modifying Sample to Improve Signal/Noise Ratio.
[0080] The nucleic acid sample may be modified prior to
hybridization to the high density probe array in order to reduce
sample complexity thereby decreasing background signal and
improving sensitivity of the measurement. In one embodiment,
complexity reduction is achieved by selective degradation of
background mRNA. This is accomplished by hybridizing the sample
mRNA (e.g., polyA.sup.+ RNA) with a pool of DNA oligonucleotides
that hybridize specifically with the regions to which the probes in
the array specifically hybridize. In a preferred embodiment, the
pool of oligonucleotides consists of the same probe
oligonucleotides as found on the high density array.
[0081] The pool of oligonucleotides hybridizes to the sample mRNA
forming a number of double stranded (hybrid duplex) nucleic acids.
The hybridized sample is then treated with RNase A, a nuclease that
specifically digests single stranded RNA. The RNase A is then
inhibited, using a protease and/or commercially available RNase
inhibitors, and the double stranded nucleic acids are then
separated from the digested single stranded RNA. This separation
may be accomplished in a number of ways well known to those of
skill in the art including, but not limited to, electrophoresis,
and gradient centrifugation. However, in a preferred embodiment,
the pool of DNA oligonucleotides is provided attached to beads
forming thereby a nucleic acid affinity column. After digestion
with the RNase A, the hybridized DNA is removed simply by
denaturing (e.g., by adding heat or increasing salt) the hybrid
duplexes and washing the previously hybridized mRNA off in an
elution buffer.
[0082] The undigested mRNA fragments which will be hybridized to
the probes in the high density array are then preferably
end-labeled with a fluorophore attached to an RNA linker using an
RNA ligase. This procedure produces a labeled sample RNA pool in
which the nucleic acids that do not correspond to probes in the
array are eliminated and thus unavailable to contribute to a
background signal.
[0083] Another method of reducing sample complexity involves
hybridizing the mRNA with deoxyoligonucleotides that hybridize to
regions that border on either side of the regions to which the high
density array probes are directed. Treatment with RNase H
selectively digests the double stranded (hybrid duplexes) leaving a
pool of single-stranded mRNA corresponding to the short regions
(e.g., 20 mer) that were formerly bounded by the
deoxyoligonucleotide probes and which correspond to the targets of
the high density array probes and longer mRNA sequences that
correspond to regions between the targets of the probes of the high
density array. The short RNA fragments are then separated from the
long fragments (e.g., by electrophoresis), labeled if necessary as
described above, and then are ready for hybridization with the high
density probe array.
[0084] In a third approach, sample complexity reduction involves
the selective removal of particular (preselected) mRNA messages. In
particular, highly expressed mRNA messages that are not
specifically probed by the probes in the high density array are
preferably removed. This approach involves hybridizing the
polyA.sup.+ mRNA with an oligonucleotide probe that specifically
hybridizes to the preselected message close to the 3' (poly A) end.
The probe may be selected to provide high specificity and low cross
reactivity. Treatment of the hybridized message/probe complex with
RNase H digests the double stranded region effectively removing the
polyA.sup.+ tail from the rest of the message. The sample is then
treated with methods that specifically retain or amplify
polyA.sup.+ RNA (e.g., an oligo dT column or (dT)n magnetic beads).
Such methods will not retain or amplify the selected message(s) as
they are no longer associated with a polyA.sup.+ tail. These highly
expressed messages are effectively removed from the sample
providing a sample that has reduced background mRNA.
[0085] II. Hybridization Array Design.
[0086] A) Probe Composition.
[0087] One of skill in the art will appreciate that an enormous
number of array designs are suitable for the practice of this
invention. The high density array will typically include a number
of probes that specifically hybridize to the nucleic acid
expression of which is to be detected. In addition, in a preferred
embodiment, the array will include one or more control probes.
[0088] 1) Test Probes.
[0089] In its simplest embodiment, the high density array includes
"test probes". These are oligonucleotides that range from about 5
to about 50 nucleotides, more preferably from about 10 to about 40
nucleotides and most preferably from about 15 to about 40
nucleotides in length. These oligonucleotide probes have sequences
complementary to particular subsequences of the genes whose
expression they are designed to detect. Thus, the test probes are
capable of specifically hybridizing to the target nucleic acid they
are to detect.
[0090] In addition to test probes that bind the target nucleic
acid(s) of interest, the high density array can contain a number of
control probes. The control probes fall into three categories
referred to herein as 1) Normalization controls; 2) Expression
level controls; and 3) Mismatch controls.
[0091] 2) Normalization Controls.
[0092] Normalization controls are oligonucleotide probes that are
perfectly complementary to labeled reference oligonucleotides that
are added to the nucleic acid sample. The signals obtained from the
normalization controls after hybridization provide a control for
variations in hybridization conditions, label intensity, "reading"
efficiency and other factors that may cause the signal of a perfect
hybridization to vary between arrays. In a preferred embodiment,
signals (e.g., fluorescence intensity) read from all other probes
in the array are divided by the signal (e.g., fluorescence
intensity) from the control probes thereby normalizing the
measurements.
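The normalization step described above can be sketched in Python as
follows. Where more than one normalization control is present, the
sketch divides by the mean of the control signals; that choice, like
the function and variable names, is an assumption made for this
illustration.

```python
# Illustrative sketch only; averaging multiple controls is an assumption.
def normalize_signals(probe_signals, control_signals):
    """Divide each probe signal by the mean normalization-control signal."""
    if not control_signals:
        raise ValueError("at least one normalization control is required")
    control_mean = sum(control_signals) / len(control_signals)
    if control_mean <= 0:
        raise ValueError("normalization control signal must be positive")
    return {probe: signal / control_mean
            for probe, signal in probe_signals.items()}


if __name__ == "__main__":
    # Hypothetical fluorescence intensities keyed by probe identifier.
    raw = {"probe_1": 200.0, "probe_2": 50.0}
    print(normalize_signals(raw, [100.0, 100.0]))
```

Normalized values from different arrays can then be compared on a
common scale.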
[0093] Virtually any probe may serve as a normalization control.
However, it is recognized that hybridization efficiency varies with
base composition and probe length. Preferred normalization probes
are selected to reflect the average length of the other probes
present in the array, however, they can be selected to cover a
range of lengths. The normalization control(s) can also be selected
to reflect the (average) base composition of the other probes in
the array, however in a preferred embodiment, only one or a few
normalization probes are used and they are selected such that they
hybridize well (i.e. no secondary structure) and do not match any
target-specific probes.
[0094] Normalization probes can be localized at any position in the
array or at multiple positions throughout the array to control for
spatial variation in hybridization efficiency. In a preferred
embodiment, the normalization controls are located at the corners
or edges of the array as well as in the middle.
[0095] 3) Expression Level Controls.
[0096] Expression level controls are probes that hybridize
specifically with constitutively expressed genes in the biological
sample. Expression level controls are designed to control for the
overall health and metabolic activity of a cell. Examination of the
covariance of an expression level control with the expression level
of the target nucleic acid indicates whether measured changes or
variations in expression level of a gene are due to changes in
transcription rate of that gene or to general variations in health
of the cell. Thus, for example, when a cell is in poor health or
lacking a critical metabolite the expression levels of both an
active target gene and a constitutively expressed gene are expected
to decrease. The converse is also true. Thus where the expression
levels of both an expression level control and the target gene
appear to both decrease or to both increase, the change may be
attributed to changes in the metabolic activity of the cell as a
whole, not to differential expression of the target gene in
question. Conversely, where the expression levels of the target
gene and the expression level control do not covary, the variation
in the expression level of the target gene is attributed to
differences in regulation of that gene and not to overall
variations in the metabolic activity of the cell.
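The covariance reasoning above can be sketched in code. The
following is a minimal illustration only, not a procedure given in
this disclosure: the function names, the use of a fold change, and
the 20% tolerance band are all assumptions chosen for the example.

```python
def direction(before: float, after: float, tolerance: float) -> int:
    """Return +1 (increase), -1 (decrease) or 0 (no clear change)."""
    fold = after / before
    if fold > 1 + tolerance:
        return 1
    if fold < 1 - tolerance:
        return -1
    return 0


def attribute_change(target_before, target_after,
                     control_before, control_after,
                     tolerance=0.2):
    """Attribute a change in target signal (hypothetical helper).

    tolerance is an assumed band around a fold change of 1 inside
    which a signal is treated as unchanged.
    """
    t = direction(target_before, target_after, tolerance)
    c = direction(control_before, control_after, tolerance)
    if t == 0:
        return "no change"
    if t == c:
        # Target and expression level control covary.
        return "general cell state"
    # Target changed but the control did not move with it.
    return "gene-specific regulation"
```

For instance, a target that halves while the housekeeping control
also halves would be attributed to the general state of the cell,
whereas a target that triples while the control stays flat would be
attributed to regulation of that gene.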
[0097] Virtually any constitutively expressed gene provides a
suitable target for expression level controls. Typically expression
level control probes have sequences complementary to subsequences
of constitutively expressed "housekeeping genes" including, but not
limited to the .beta.-actin gene, the transferrin receptor gene,
the GAPDH gene, and the like.
[0098] 4) Mismatch Controls.
[0099] Mismatch controls may also be provided for the probes to the
target genes, for expression level controls or for normalization
controls. Mismatch controls are oligonucleotide probes identical to
their corresponding test or control probes except for the presence
of one or more mismatched bases. A mismatched base is a base
selected so that it is not complementary to the corresponding base
in the target sequence to which the probe would otherwise
specifically hybridize. One or more mismatches are selected such
that under appropriate hybridization conditions (e.g., stringent
conditions) the test or control probe would be expected to
hybridize with its target sequence, but the mismatch probe would
not hybridize (or would hybridize to a significantly lesser
extent). Preferred mismatch probes contain a central mismatch.
Thus, for example, where a probe is a 20 mer, a corresponding
mismatch probe will have the identical sequence except for a single
base mismatch (e.g., substituting a G, a C or a T for an A) at any
of positions 6 through 14 (the central mismatch).
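As an illustration of how a central mismatch probe might be derived
from a perfect match probe, consider the sketch below. The helper
name and the substitution rule (A<->T, G<->C) are assumptions made
for the example; the text above permits any non-complementary
substitution at any central position.

```python
from typing import Optional

# Assumed substitution rule; any non-matching base would serve.
SWAP = {"A": "T", "T": "A", "G": "C", "C": "G"}


def central_mismatch(probe: str, position: Optional[int] = None) -> str:
    """Return probe with a single base substituted.

    position is 1-based; by default the middle base is changed
    (position 10 of a 20 mer, within the range 6 through 14).
    """
    probe = probe.upper()
    if position is None:
        position = (len(probe) + 1) // 2
    if not 1 <= position <= len(probe):
        raise ValueError("position outside probe")
    i = position - 1
    return probe[:i] + SWAP[probe[i]] + probe[i + 1:]
```

The perfect match and mismatch probes then differ at exactly one
base, so any difference in signal between them reflects the
contribution of that single base pair to hybridization.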
[0100] Mismatch probes thus provide a control for non-specific
binding or cross-hybridization to a nucleic acid in the sample
other than the target to which the probe is directed. Mismatch
probes thus indicate whether a hybridization is specific or not.
For example, if the target is present the perfect match probes
should be consistently brighter than the mismatch probes. In
addition, if all central mismatches are present, the mismatch
probes can be used to detect a mutation. Finally, it was also a
discovery of the present invention that the difference in intensity
between the perfect match and the mismatch probe (I(PM)-I(MM))
provides a good measure of the concentration of the hybridized
material.
[0101] 5) Sample Preparation/Amplification Controls.
[0102] The high density array may also include sample
preparation/amplification control probes. These are probes that are
complementary to subsequences of control genes selected because
they do not normally occur in the nucleic acids of the particular
biological sample being assayed. Suitable sample
preparation/amplification control probes include, for example,
probes to bacterial genes (e.g., Bio B) where the sample in
question is a biological sample from a eukaryote.
[0103] The RNA sample is then spiked with a known amount of the
nucleic acid to which the sample preparation/amplification control
probe is directed before processing. Quantification of the
hybridization of the sample preparation/amplification control probe
then provides a measure of alteration in the abundance of the
nucleic acids caused by processing steps (e.g. PCR, reverse
transcription, in vitro transcription, etc.).
[0104] B) "Test Probe" Selection and Optimization.
[0105] In a preferred embodiment, oligonucleotide probes in the
high density array are selected to bind specifically to the nucleic
acid target to which they are directed with minimal non-specific
binding or cross-hybridization under the particular hybridization
conditions utilized. Because the high density arrays of this
invention can contain in excess of 1,000,000 different probes, it
is possible to provide every probe of a characteristic length that
binds to a particular nucleic acid sequence. Thus, for example, the
high density array can contain every possible 20 mer sequence
complementary to an IL-2 mRNA.
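Enumerating every probe of a given length complementary to a target
is a simple sliding-window operation. The sketch below is
illustrative only; the function names are hypothetical, and the
probes are written as DNA reverse complements of each window of the
target.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A", "T": "A"}


def reverse_complement(seq: str) -> str:
    """DNA reverse complement of an RNA or DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def tile_probes(target: str, length: int = 20) -> list:
    """Every probe of the given length complementary to target."""
    return [reverse_complement(target[i:i + length])
            for i in range(len(target) - length + 1)]
```

A target of N bases yields N - length + 1 such probes, so a
transcript of about 1,000 bases requires roughly 980 different 20
mers, well within the capacity of a high density array.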
[0106] There may, however, exist 20 mer subsequences that are not
unique to the IL-2 mRNA. Probes directed to these subsequences are
expected to cross hybridize with occurrences of their complementary
sequence in other regions of the sample genome. Similarly, other
probes simply may not hybridize effectively under the hybridization
conditions (e.g., due to secondary structure, or interactions with
the substrate or other probes). Thus, in a preferred embodiment,
the probes that show such poor specificity or hybridization
efficiency are identified and may not be included either in the
high density array itself (e.g., during fabrication of the array)
or in the post-hybridization data analysis.
[0107] Thus, in one embodiment, this invention provides for a
method of optimizing a probe set for detection of a particular
gene. Generally, this method involves providing a high density
array containing a multiplicity of probes of one or more particular
length(s) that are complementary to subsequences of the mRNA
transcribed by the target gene. In one embodiment the high density
array may contain every probe of a particular length that is
complementary to a particular mRNA. The probes of the high density
array are then hybridized with their target nucleic acid alone and
then hybridized with a high complexity, high concentration nucleic
acid sample that does not contain the targets complementary to the
probes. Thus, for example, where the target nucleic acid is an RNA,
the probes are first hybridized with their target nucleic acid
alone and then hybridized with RNA made from a cDNA library (e.g.,
reverse transcribed polyA.sup.+ mRNA) where the sense of the
hybridized RNA is opposite that of the target nucleic acid (to
ensure that the high complexity sample does not contain targets for
the probes). Those probes that show a strong hybridization signal
with their target and little or no cross-hybridization with the
high complexity sample are preferred probes for use in the high
density arrays of this invention.
[0108] The high density array may additionally contain mismatch
controls for each of the probes to be tested. In a preferred
embodiment, the mismatch controls contain a central mismatch. Where
both the mismatch control and the target probe show high levels of
hybridization (e.g., the hybridization to the mismatch is nearly
equal to or greater than the hybridization to the corresponding
test probe), the test probe is preferably not used in the high
density array.
[0109] In a particularly preferred embodiment, optimal probes are
selected according to the following method: First, as indicated
above, an array is provided containing a multiplicity of
oligonucleotide probes complementary to subsequences of the target
nucleic acid. The oligonucleotide probes may be of a single length
or may span a variety of lengths ranging from 5 to 50 nucleotides.
The high density array may contain every probe of a particular
length that is complementary to a particular mRNA or may contain
probes selected from various regions of particular mRNAs. For each
target-specific probe the array also contains a mismatch control
probe; preferably a central mismatch control probe.
[0110] The oligonucleotide array is hybridized to a sample
containing target nucleic acids having subsequences complementary
to the oligonucleotide probes and the difference in hybridization
intensity between each probe and its mismatch control is
determined. Only those probes where the difference between the
probe and its mismatch control exceeds a threshold hybridization
intensity (e.g. preferably greater than 10% of the background
signal intensity, more preferably greater than 20% of the
background signal intensity and most preferably greater than 50% of
the background signal intensity) are selected. Thus, only probes
that show a strong signal compared to their mismatch control are
selected.
[0111] The probe optimization procedure can optionally include a
second round of selection. In this selection, the oligonucleotide
probe array is hybridized with a nucleic acid sample that is not
expected to contain sequences complementary to the probes. Thus,
for example, where the probes are complementary to the RNA sense
strand a sample of antisense RNA is provided. Of course, other
samples could be provided such as samples from organisms or cell
lines known to be lacking a particular gene, or known for not
expressing a particular gene.
[0112] Only those probes where both the probe and its mismatch
control show hybridization intensities below a threshold value
(e.g. less than about 5 times the background signal intensity,
preferably equal to or less than about 2 times the background
signal intensity, more preferably equal to or less than about 1
times the background signal intensity, and most preferably equal to
or less than about half the background signal intensity) are selected. In
this way probes that show minimal non-specific binding are
selected. Finally, in a preferred embodiment, the n probes (where n
is the number of probes desired for each target gene) that pass
both selection criteria and have the highest hybridization
intensity for each target gene are selected for incorporation into
the array, or where already present in the array, for subsequent
data analysis. Of course, one of skill in the art will appreciate
that either selection criterion could be used alone for selection
of probes.
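The two-round selection just described can be summarized in a short
sketch. This is an illustration under stated assumptions, not a
definitive implementation: the record layout, the function name, and
the default thresholds (a difference of 10% of background in the
first round, 2 times background in the second) are chosen for the
example from the ranges given above.

```python
from typing import NamedTuple


class ProbeReading(NamedTuple):
    name: str
    pm_target: float   # probe signal, sample containing the target
    mm_target: float   # mismatch control signal, same sample
    pm_blank: float    # probe signal, sample lacking the target
    mm_blank: float    # mismatch control signal, same sample


def select_probes(readings, background, n,
                  diff_fraction=0.10, blank_multiple=2.0):
    """Names of up to n probes passing both selection rounds."""
    passed = [
        r for r in readings
        # Round 1: probe must exceed its mismatch control.
        if r.pm_target - r.mm_target > diff_fraction * background
        # Round 2: probe and mismatch stay low without target.
        and r.pm_blank <= blank_multiple * background
        and r.mm_blank <= blank_multiple * background
    ]
    # Keep the n probes with the highest hybridization intensity.
    passed.sort(key=lambda r: r.pm_target, reverse=True)
    return [r.name for r in passed[:n]]
```

Either criterion could be applied alone by dropping the
corresponding condition from the filter.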
[0113] III. Synthesis of High Density Arrays
[0114] Methods of forming high density arrays of oligonucleotides,
peptides and other polymer sequences with a minimal number of
synthetic steps are known. The oligonucleotide analogue array can
be synthesized on a solid substrate by a variety of methods,
including, but not limited to, light-directed chemical coupling,
and mechanically directed coupling. See Pirrung et al., U.S. Pat.
No. 5,143,854 (see also PCT Application No. WO 90/15070) and Fodor
et al., PCT Publication Nos. WO 92/10092 and WO 93/09668 which
disclose methods of forming vast arrays of peptides,
oligonucleotides and other molecules using, for example,
light-directed synthesis techniques. See also, Fodor et al.,
Science, 251, 767-77 (1991). These procedures for synthesis of
polymer arrays are now referred to as VLSIPS.TM. procedures. Using
the VLSIPS.TM. approach, one heterogeneous array of polymers is
converted, through simultaneous coupling at a number of reaction
sites, into a different heterogeneous array. See, U.S. application
Ser. Nos. 07/796,243 and 07/980,523.
[0115] The development of VLSIPS.TM. technology as described in the
above-noted U.S. Pat. No. 5,143,854 and PCT patent publication Nos.
WO 90/15070 and 92/10092, is considered pioneering technology in
the fields of combinatorial synthesis and screening of
combinatorial libraries. More recently, patent application Ser. No.
08/082,937, filed Jun. 25, 1993 describes methods for making arrays
of oligonucleotide probes that can be used to check or determine a
partial or complete sequence of a target nucleic acid and to detect
the presence of a nucleic acid containing a specific
oligonucleotide sequence.
[0116] In brief, the light-directed combinatorial synthesis of
oligonucleotide arrays on a glass surface proceeds using automated
phosphoramidite chemistry and chip masking techniques. In one
specific implementation, a glass surface is derivatized with a
silane reagent containing a functional group, e.g., a hydroxyl or
amine group blocked by a photolabile protecting group. Photolysis
through a photolithographic mask is used selectively to expose
functional groups which are then ready to react with incoming
5'-photoprotected nucleoside phosphoramidites. The phosphoramidites
react only with those sites which are illuminated (and thus exposed
by removal of the photolabile blocking group). Thus, the
phosphoramidites only add to those areas selectively exposed from
the preceding step. These steps are repeated until the desired
array of sequences has been synthesized on the solid surface.
Combinatorial synthesis of different oligonucleotide analogues at
different locations on the array is determined by the pattern of
illumination during synthesis and the order of addition of coupling
reagents.
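The way in which the illumination pattern and the order of reagent
addition together determine the sequence at each location can be
shown with a toy simulation. The function name and the
representation of a mask as a set of site indices are assumptions
made purely for illustration.

```python
def synthesize(num_sites, steps):
    """Simulate masked coupling cycles (illustrative only).

    steps is a list of (base, illuminated_sites) pairs, where
    illuminated_sites is the set of site indices exposed by the
    mask in that cycle. Only exposed sites receive the base.
    Returns the bases at each site in order of addition.
    """
    sites = [""] * num_sites
    for base, illuminated in steps:
        for i in illuminated:
            sites[i] += base
    return sites


# Three cycles with three different masks on a three-site array.
cycles = [("A", {0, 1}), ("C", {1, 2}), ("G", {0, 2})]
array = synthesize(3, cycles)
```

In this example the three sites end with the additions AG, AC and
CG respectively: three different sequences from only three coupling
cycles.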
[0117] In the event that an oligonucleotide analogue with a
polyamide backbone is used in the VLSIPS.TM. procedure, it is
generally inappropriate to use phosphoramidite chemistry to perform
the synthetic steps, since the monomers do not attach to one
another via a phosphate linkage. Instead, peptide synthetic methods
are substituted. See, e.g., Pirrung et al. U.S. Pat. No.
5,143,854.
[0118] Peptide nucleic acids, which comprise a polyamide backbone
and the bases found in naturally occurring nucleosides, are
commercially available from, e.g., Biosearch, Inc. (Bedford, Mass.).
Peptide nucleic acids are capable of binding to nucleic acids with
high specificity, and are considered "oligonucleotide analogues"
for purposes of this disclosure.
[0119] In addition to the foregoing, additional methods which can
be used to generate an array of oligonucleotides on a single
substrate are described in co-pending Applications Ser. No.
07/980,523, filed Nov. 20, 1992, and 07/796,243, filed Nov. 22,
1991 and in PCT Publication No. WO 93/09668. In the methods
disclosed in these applications, reagents are delivered to the
substrate by either (1) flowing within a channel defined on
predefined regions or (2) "spotting" on predefined regions.
However, other approaches, as well as combinations of spotting and
flowing, may be employed. In each instance, certain activated
regions of the substrate are mechanically separated from other
regions when the monomer solutions are delivered to the various
reaction sites.
[0120] A typical "flow channel" method applied to the compounds and
libraries of the present invention can generally be described as
follows. Diverse polymer sequences are synthesized at selected
regions of a substrate or solid support by forming flow channels on
a surface of the substrate through which appropriate reagents flow
or in which appropriate reagents are placed. For example, assume a
monomer "A" is to be bound to the substrate in a first group of
selected regions. If necessary, all or part of the surface of the
substrate in all or a part of the selected regions is activated for
binding by, for example, flowing appropriate reagents through all
or some of the channels, or by washing the entire substrate with
appropriate reagents. After placement of a channel block on the
surface of the substrate, a reagent having the monomer A flows
through or is placed in all or some of the channel(s). The channels
provide fluid contact to the first selected regions, thereby
binding the monomer A on the substrate directly or indirectly (via
a spacer) in the first selected regions.
[0121] Thereafter, a monomer B is coupled to second selected
regions, some of which may be included among the first selected
regions. The second selected regions will be in fluid contact with
a second flow channel(s) through translation, rotation, or
replacement of the channel block on the surface of the substrate;
through opening or closing a selected valve; or through deposition
of a layer of chemical or photoresist. If necessary, a step is
performed for activating at least the second regions. Thereafter,
the monomer B is flowed through or placed in the second flow
channel(s), binding monomer B at the second selected locations. In
this particular example, the resulting sequences bound to the
substrate at this stage of processing will be, for example, A, B,
and AB. The process is repeated to form a vast array of sequences
of desired length at known locations on the substrate.
[0122] After the substrate is activated, monomer A can be flowed
through some of the channels, monomer B can be flowed through other
channels, a monomer C can be flowed through still other channels,
etc. In this manner, many or all of the reaction regions are
reacted with a monomer before the channel block must be moved or
the substrate must be washed and/or reactivated. By making use of
many or all of the available reaction regions simultaneously, the
number of washing and activation steps can be minimized.
[0123] One of skill in the art will recognize that there are
alternative methods of forming channels or otherwise protecting a
portion of the surface of the substrate. For example, according to
some embodiments, a protective coating such as a hydrophilic or
hydrophobic coating (depending upon the nature of the solvent) is
utilized over portions of the substrate to be protected, sometimes
in combination with materials that facilitate wetting by the
reactant solution in other regions. In this manner, the flowing
solutions are further prevented from passing outside of their
designated flow paths.
[0124] The "spotting" methods of preparing compounds and libraries
of the present invention can be implemented in much the same manner
as the flow channel methods. For example, a monomer A can be
delivered to and coupled with a first group of reaction regions
which have been appropriately activated. Thereafter, a monomer B
can be delivered to and reacted with a second group of activated
reaction regions. Unlike the flow channel embodiments described
above, reactants are delivered by directly depositing (rather than
flowing) relatively small quantities of them in selected regions.
In some steps, of course, the entire substrate surface can be
sprayed or otherwise coated with a solution. In preferred
embodiments, a dispenser moves from region to region, depositing
only as much monomer as necessary at each stop. Typical dispensers
include a micropipette to deliver the monomer solution to the
substrate and a robotic system to control the position of the
micropipette with respect to the substrate. In other embodiments,
the dispenser includes a series of tubes, a manifold, an array of
pipettes, or the like so that various reagents can be delivered to
the reaction regions simultaneously.
[0125] IV. Hybridization.
[0126] Nucleic acid hybridization simply involves providing a
denatured probe and target nucleic acid under conditions where the
probe and its complementary target can form stable hybrid duplexes
through complementary base pairing. The nucleic acids that do not
form hybrid duplexes are then washed away leaving the hybridized
nucleic acids to be detected, typically through detection of an
attached detectable label. It is generally recognized that nucleic
acids are denatured by increasing the temperature or decreasing the
salt concentration of the buffer containing the nucleic acids.
Under low stringency conditions (e.g., low temperature and/or high
salt) hybrid duplexes (e.g., DNA:DNA, RNA:RNA, or RNA:DNA) will
form even where the annealed sequences are not perfectly
complementary. Thus specificity of hybridization is reduced at
lower stringency. Conversely, at higher stringency (e.g., higher
temperature or lower salt) successful hybridization requires fewer
mismatches.
[0127] One of skill in the art will appreciate that hybridization
conditions may be selected to provide any degree of stringency. In
a preferred embodiment, hybridization is performed at low
stringency, in this case in 6.times.SSPE-T (0.005% Triton X-100) at
37.degree. C., to ensure hybridization and then subsequent washes
are performed at higher stringency (e.g., 1.times.SSPE-T at
37.degree. C.) to eliminate mismatched hybrid duplexes. Successive
washes may be performed at increasingly higher stringency (e.g.,
down to as low as 0.25.times.SSPE-T at 37.degree. C. to 50.degree.
C.) until a desired level of hybridization specificity is obtained.
Stringency can also be increased by addition of agents such as
formamide. Hybridization specificity may be evaluated by comparison
of hybridization to the test probes with hybridization to the
various controls that can be present (e.g., expression level
control, normalization control, mismatch controls, etc.).
[0128] In general, there is a tradeoff between hybridization
specificity (stringency) and signal intensity. Thus, in a preferred
embodiment, the wash is performed at the highest stringency that
produces consistent results and that provides a signal intensity
greater than approximately 10% of the background intensity. Thus,
in a preferred embodiment, the hybridized array may be washed at
successively higher stringency solutions and read between each
wash. Analysis of the data sets thus produced will reveal a wash
stringency above which the hybridization pattern is not appreciably
altered and which provides adequate signal for the particular
oligonucleotide probes of interest.
[0129] In a preferred embodiment, background signal is reduced by
the use of a detergent (e.g., C-TAB) or a blocking reagent (e.g.,
sperm DNA, cot-1 DNA, etc.) during the hybridization to reduce
non-specific binding. In a particularly preferred embodiment, the
hybridization is performed in the presence of about 0.5 mg/ml DNA
(e.g., herring sperm DNA). The use of blocking agents in
hybridization is well known to those of skill in the art (see,
e.g., Chapter 8 in P. Tijssen, supra).
[0130] The stability of duplexes formed between RNAs or DNAs is
generally in the order of RNA:RNA>RNA:DNA>DNA:DNA, in
solution. Long probes have better duplex stability with a target,
but poorer mismatch discrimination than shorter probes (mismatch
discrimination refers to the measured hybridization signal ratio
between a perfect match probe and a single base mismatch probe).
Shorter probes (e.g., 8-mers) discriminate mismatches very well,
but the overall duplex stability is low.
[0131] Altering the thermal stability (T.sub.m) of the duplex
formed between the target and the probe using, e.g., known
oligonucleotide analogues allows for optimization of duplex
stability and mismatch discrimination. One useful aspect of
altering the T.sub.m arises from the fact that adenine-thymine
(A-T) duplexes have a lower T.sub.m than guanine-cytosine (G-C)
duplexes due in part to the fact that the A-T duplexes have 2
hydrogen bonds per base-pair, while the G-C duplexes have 3
hydrogen bonds per base pair. In heterogeneous oligonucleotide
arrays in which there is a non-uniform distribution of bases, it is
not generally possible to optimize hybridization for each
oligonucleotide probe simultaneously. Thus, in some embodiments, it
is desirable to selectively destabilize G-C duplexes and/or to
increase the stability of A-T duplexes. This can be accomplished,
e.g., by substituting guanine residues in the probes of an array
which form G-C duplexes with hypoxanthine, or by substituting
adenine residues in probes which form A-T duplexes with 2,6
diaminopurine or by using the salt tetramethyl ammonium chloride
(TMACl) in place of NaCl.
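The effect of base composition on duplex stability can be
illustrated with the commonly cited Wallace rule of thumb for short
oligonucleotides, which estimates T.sub.m as 2.degree. C. per A or T
and 4.degree. C. per G or C. This rule is not part of the present
disclosure and is only a rough approximation; it is used here solely
to show why probes of equal length can differ widely in T.sub.m.

```python
def wallace_tm(probe: str) -> int:
    """Rough Tm estimate in degrees C (Wallace rule of thumb)."""
    probe = probe.upper()
    at = probe.count("A") + probe.count("T")
    gc = probe.count("G") + probe.count("C")
    return 2 * at + 4 * gc


at_rich = wallace_tm("AATTAATT")   # 8 mer, all A/T
gc_rich = wallace_tm("GGCCGGCC")   # 8 mer, all G/C
```

Under this approximation the A-T rich 8 mer is estimated at
16.degree. C. and the G-C rich 8 mer at 32.degree. C., so a single
hybridization temperature cannot be optimal for both unless the
stabilities are equalized as described above.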
[0132] Altered duplex stability conferred by using oligonucleotide
analogue probes can be ascertained by following, e.g., fluorescence
signal intensity of oligonucleotide analogue arrays hybridized with
a target oligonucleotide over time. The data allow optimization of
specific hybridization conditions at, e.g., room temperature (for
simplified diagnostic applications in the future).
[0133] Another way of verifying altered duplex stability is by
following the signal intensity generated upon hybridization with
time. Previous experiments using DNA targets and DNA chips have
shown that signal intensity increases with time, and that the more
stable duplexes generate higher signal intensities faster than less
stable duplexes. The signals reach a plateau or "saturate" after a
certain amount of time due to all of the binding sites becoming
occupied. These data allow for optimization of hybridization, and
determination of the best conditions at a specified
temperature.
[0134] Methods of optimizing hybridization conditions are well
known to those of skill in the art (see, e.g., Laboratory
Techniques in Biochemistry and Molecular Biology, Vol 24:
Hybridization With Nucleic Acid Probes, P. Tijssen, ed. Elsevier,
N.Y., (1993)).
[0135] V. Signal Detection.
[0136] Means of detecting labeled target (sample) nucleic acids
hybridized to the probes of the high density array are known to
those of skill in the art. Thus, for example, where a colorimetric
label is used, simple visualization of the label is sufficient.
Where a radioactive labeled probe is used, detection of the
radiation (e.g., with photographic film or a solid state detector) is
sufficient.
[0137] In a preferred embodiment, however, the target nucleic acids
are labeled with a fluorescent label and the localization of the
label on the probe array is accomplished with fluorescence
microscopy. The hybridized array is excited with a light source at
the excitation wavelength of the particular fluorescent label and
the resulting fluorescence at the emission wavelength is detected.
In a particularly preferred embodiment, the excitation light source
is a laser appropriate for the excitation of the fluorescent
label.
[0138] The confocal microscope may be automated with a
computer-controlled stage to automatically scan the entire high
density array. Similarly, the microscope may be equipped with a
phototransducer (e.g., a photomultiplier, a solid state array, a
ccd camera, etc.) attached to an automated data acquisition system
to automatically record the fluorescence signal produced by
hybridization to each oligonucleotide probe on the array. Such
automated systems are described at length in U.S. Pat. No.
5,143,854, PCT Application WO 92/10092, and copending U.S. Ser. No.
08/195,889 filed on Feb. 10, 1994. Use of laser illumination in
conjunction with automated confocal microscopy for signal detection
permits detection at a resolution of better than about 100 .mu.m,
more preferably better than about 50 .mu.m, and most preferably
better than about 25 .mu.m.
[0139] VI. Signal Evaluation.
[0140] One of skill in the art will appreciate that methods for
evaluating the hybridization results vary with the nature of the
specific probe nucleic acids used as well as the controls provided.
In the simplest embodiment, the fluorescence intensity for each
probe is quantified. This is
accomplished simply by measuring probe signal strength at each
location (representing a different probe) on the high density array
(e.g., where the label is a fluorescent label, detection of the
amount of fluorescence (intensity) produced by a fixed excitation
illumination at each location on the array). Comparison of the
absolute intensities of an array hybridized to nucleic acids from a
"test" sample with intensities produced by a "control" sample
provides a measure of the relative expression of the nucleic acids
that hybridize to each of the probes.
[0141] One of skill in the art, however, will appreciate that
hybridization signals will vary in strength with efficiency of
hybridization, the amount of label on the sample nucleic acid and
the amount of the particular nucleic acid in the sample. Typically
nucleic acids present at very low levels (e.g., <1 pM) will show
a very weak signal. At some low level of concentration, the signal
becomes virtually indistinguishable from background. In evaluating
the hybridization data, a threshold intensity value may be selected
below which a signal is not counted, being regarded as essentially
indistinguishable from background.
[0142] Where it is desirable to detect nucleic acids expressed at
lower levels, a lower threshold is chosen. Conversely, where only
high expression levels are to be evaluated a higher threshold level
is selected. In a preferred embodiment, a suitable threshold is
about 10% above that of the average background signal.
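A threshold of this kind might be applied as in the following
sketch. The function names and the representation of signals as a
dictionary keyed by probe are assumptions; the 10% margin is the
preferred value stated above.

```python
from statistics import mean


def detection_threshold(background, margin=0.10):
    """Threshold set a given margin above the mean background."""
    return (1.0 + margin) * mean(background)


def counted_signals(signals, background, margin=0.10):
    """Keep only signals that exceed the detection threshold."""
    threshold = detection_threshold(background, margin)
    return {probe: s for probe, s in signals.items() if s > threshold}
```

Lowering margin admits weaker signals, which is appropriate where
nucleic acids expressed at low levels are to be detected; raising it
restricts the analysis to highly expressed sequences.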
[0143] In addition, the provision of appropriate controls permits a
more detailed analysis that controls for variations in
hybridization conditions, cell health, non-specific binding and the
like. Thus, for example, in a preferred embodiment, the
hybridization array is provided with normalization controls as
described above in Section II.A.2. These normalization controls are
probes complementary to control sequences added in a known
concentration to the sample. Where the overall hybridization
conditions are poor, the normalization controls will show a smaller
signal reflecting reduced hybridization. Conversely, where
hybridization conditions are good, the normalization controls will
provide a higher signal reflecting the improved hybridization.
Normalization of the signal derived from other probes in the array
to the normalization controls thus provides a control for
variations in hybridization conditions. Typically, normalization is
accomplished by dividing the measured signal from the other probes
in the array by the average signal produced by the normalization
controls. Normalization may also include correction for variations
due to sample preparation and amplification. Such normalization may
be accomplished by dividing the measured signal by the average
signal from the sample preparation/amplification control probes
(e.g., the Bio B probes). The resulting values may be multiplied by
a constant value to scale the results.
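The normalization described in this paragraph amounts to a division
by the average control signal followed by an optional scaling. A
minimal sketch follows; the function name and data layout are
assumptions for illustration.

```python
from statistics import mean


def normalize(signals, control_signals, scale=1.0):
    """Divide each signal by the mean control signal, then scale.

    signals maps probe name to measured intensity;
    control_signals lists the intensities of the normalization
    (or sample preparation/amplification) control probes.
    """
    reference = mean(control_signals)
    if reference <= 0:
        raise ValueError("control signal must be positive")
    return {probe: scale * s / reference
            for probe, s in signals.items()}
```

Because every probe on an array is divided by that array's own
control signal, an array hybridized under poor conditions and one
hybridized under good conditions yield comparable normalized values
for the same sample.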
[0144] As indicated above, the high density array can include
mismatch controls. In a preferred embodiment, there is a mismatch
control having a central mismatch for every probe (except the
normalization controls) in the array. It is expected that after
washing in stringent conditions, where a perfect match would be
expected to hybridize to the probe, but not to the mismatch, the
signal from the mismatch controls should only reflect non-specific
binding or the presence in the sample of a nucleic acid that
hybridizes with the mismatch. Where the probe in question and
its corresponding mismatch control both show high signals, or the
mismatch shows a higher signal than its corresponding test probe,
there is a problem with the hybridization and the signal from those
probes is ignored. The difference in hybridization signal intensity
between the target specific probe and its corresponding mismatch
control is a measure of the discrimination of the target-specific
probe. Thus, in a preferred embodiment, the signal of the mismatch
probe is subtracted from the signal from its corresponding test
probe to provide a measure of the signal due to specific binding of
the test probe.
[0145] The concentration of a particular sequence can then be
determined by measuring the signal intensity of each of the probes
that bind specifically to that gene and normalizing to the
normalization controls. Where the signal from the probes is greater
than the mismatch, the mismatch is subtracted. Where the mismatch
intensity is equal to or greater than its corresponding test probe,
the signal is ignored. The expression level of a particular gene
can then be scored by the number of positive signals (either
absolute or above a threshold value), the intensity of the positive
signals (either absolute or above a selected threshold value), or a
combination of both metrics (e.g., a weighted average).
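One way to picture the scoring rule is the sketch below. It assumes each gene is represented by a list of (perfect match, mismatch) intensity pairs; the function name, the optional threshold argument, and the sample values are hypothetical.

```python
def score_gene(probe_pairs, threshold=0.0):
    """Score one gene from (perfect_match, mismatch) intensity pairs.

    A pair is ignored when the mismatch is equal to or greater than the
    perfect match, or when the difference does not exceed `threshold`.
    Returns (number of positive probes, mean difference of those probes).
    """
    diffs = [pm - mm for pm, mm in probe_pairs
             if pm > mm and (pm - mm) > threshold]
    count = len(diffs)
    average = sum(diffs) / count if count else 0.0
    return count, average

# Hypothetical intensities for four probe pairs of one gene.
pairs = [(300.0, 100.0), (120.0, 150.0), (80.0, 80.0), (500.0, 200.0)]
positives, avg_difference = score_gene(pairs)
```

Here the second and third pairs are discarded because the mismatch is not below the perfect match, leaving two positive probes with a mean difference of 250 counts. A weighted combination of the two returned values would be one possible combined metric.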
[0146] It is a surprising discovery of this invention, that
normalization controls are often unnecessary for useful
quantification of a hybridization signal. Thus, where optimal
probes have been identified in the two step selection process as
described above, in Section II.B., the average hybridization signal
produced by the selected optimal probes provides a good quantified
measure of the concentration of hybridized nucleic acid.
[0147] VII. Monitoring Expression Levels
[0148] As indicated above, the methods of this invention may be
used to monitor expression levels of a gene in a wide variety of
contexts. For example, where the effect of a drug on gene
expression is to be determined, the drug will be administered to an
organism, a tissue sample, or a cell. Nucleic acids from the tissue
sample, cell, or a biological sample from the organism and from an
untreated organism, tissue sample, or cell are isolated as described
above, hybridized to a high density probe array containing probes
directed to the gene of interest and the expression levels of that
gene are determined as described above.
[0149] Similarly, where the expression levels of a disease marker
(e.g., P53, RTK, or HER2) are to be detected (e.g., for the
diagnosis of a pathological condition in a patient), comparison of
the expression levels of the disease marker in the sample to
disease markers from a healthy organism will reveal any deviations
in the expression levels of the marker in the test sample as
compared to the healthy sample. Correlation of such deviations with
a pathological condition provides a diagnostic assay for that
condition.
EXAMPLES
[0150] The following examples are offered to illustrate, but not to
limit the present invention.
Example 1
Detection of the Expression Levels of Target Genes
[0151] Experiments were designed to evaluate the specificity of
hybridization, the relationship between hybridization signal and
concentration of target nucleic acid, and the quantifiability of
RNA detection at low concentration levels. These experiments
involved hybridizing labeled RNA from a number of preselected genes
(IL-2, IL-3, IL-4, IL-6, IL-10, IL-12p40, GM-CSF, IFN-.gamma.,
TNF-.alpha., mCTLA8, .beta.-actin, GAPDH, IL-11 receptor, and Bio
B) to a high density oligonucleotide probe array comprising a large
number of probes complementary to subsequences of these genes (see,
Section B, below for a description of the array) in the presence or
absence of an RNA sample transcribed from a cDNA library. The
target genes were hybridized to the high density probe array either
individually, together, or individually or together in the presence
of labeled RNA transcribed from a murine cDNA library as described
below.
[0152] A) Preparation of Labeled RNA.
[0153] 1) From Each of the Preselected Genes.
[0154] Fourteen genes (IL-2, IL-3, IL-4, IL-6, IL-10, IL-12p40,
GM-CSF, IFN-.gamma., TNF-.alpha., mCTLA8, .beta.-actin, GAPDH, IL-11
receptor, and Bio B) were each cloned into the pBluescript II KS
(+) phagemid (Stratagene, La Jolla, Calif., USA). The orientation
of the insert was such that T3 RNA polymerase gave sense
transcripts and T7 polymerase gave antisense RNA.
[0155] In vitro transcription was done with cut templates in a
manner like that described by Melton et al., Nucleic Acids
Research, 12: 7035-7056 (1984). A typical in vitro transcription
reaction used 5 .mu.g DNA template, a buffer such as that included
in Ambion's Maxiscript in vitro Transcription Kit (Ambion Inc.,
Huston, Tex., USA) and GTP (3 mM), ATP (1.5 mM), UTP and
fluoresceinated UTP (3 mM total, UTP:Fl-UTP, 1:1) and CTP and
fluoresceinated CTP (2 mM total, CTP:Fl-CTP, 3:1). Reactions done
in the Ambion buffer had 20 mM DTT and RNase inhibitor. The T7
polymerase was a high concentration polymerase (activity about 2500
units/.mu.L) available from Epicentre Technologies, Madison, Wis.,
USA. The reaction was run from 1.5 to about 8 hours.
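The stated totals and ratios fix the concentration of each unlabeled and fluoresceinated nucleotide. The short sketch below works through that arithmetic; the helper name is hypothetical, and the ratios are read as unlabeled:labeled.

```python
def split_total(total_mM, unlabeled_parts, labeled_parts):
    """Split a total nucleotide concentration (mM) into unlabeled and
    fluoresceinated fractions according to an unlabeled:labeled ratio."""
    parts = unlabeled_parts + labeled_parts
    unlabeled = total_mM * unlabeled_parts / parts
    labeled = total_mM * labeled_parts / parts
    return unlabeled, labeled

# 3 mM total UTP at a 1:1 ratio of UTP to fluoresceinated UTP.
utp_mM, fl_utp_mM = split_total(3.0, 1, 1)
# 2 mM total CTP at a 3:1 ratio of CTP to fluoresceinated CTP.
ctp_mM, fl_ctp_mM = split_total(2.0, 3, 1)
```

Under this reading the reaction contains 1.5 mM each of UTP and fluoresceinated UTP, 1.5 mM CTP, and 0.5 mM fluoresceinated CTP.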
[0156] The nucleotide triphosphates were removed using a
microcon-100 or Pharmacia microspin S-200 column. The labeled RNA
was then fragmented in a pH 8.1 Tris-HCl buffer containing 30 mM
Mg(OAc).sub.2 at 94.degree. C. for 30 to 40 minutes depending on
the length of the RNA transcript.
[0157] 2) From cDNA Libraries.
[0158] Labeled RNA was produced from one of two murine cell lines:
T10, a B cell plasmacytoma which was known not to express the genes
(except IL-10, actin and GAPDH) used as target genes in this study,
and 2D6, an IL-12 growth dependent T cell line (TH.sub.1 subtype)
that is known to express most of the genes used as target genes in
this study. Thus, RNA derived from the T10 cell line provided a
good total RNA baseline mixture suitable for spiking with known
quantities of RNA from the particular target genes. In contrast,
mRNA derived from the 2D6 cell line provided a good positive
control providing typical endogenously transcribed amounts of the
RNA from the target genes.
[0159] To produce the T10 cDNA library, cDNA was directionally
cloned into .lambda.SHlox-1 (GibcoBRL, Gaithersburg, Md., USA) at
EcoRI/HindIII to give a phage library. The phage library was
converted to a plasmid library using "automatic Cre-loxP plasmid
subcloning" according to the method of Palazzolo, et al., Gene, 88:
25-36 (1990). After this the DNA was linearized with Not I and T7
polymerase was used to generate labeled T10 RNA in an in vitro
transcription reaction as described above.
[0160] Labeled 2D6 mRNA was produced by directionally cloning the
2D6 cDNA with .lambda.ZipLox, NotI-SalI arms available from GibcoBRL
in a manner similar to T10. The linearized pZL1 library was
transcribed with T7 to generate sense RNA as described above.
[0161] B) High Density Array Preparation
[0162] A high density array of 20 mer oligonucleotide probes was
produced using VLSIPS technology. The high density array included
the oligonucleotide probes as listed in Table 1. A central mismatch
control probe was provided for each gene-specific probe resulting
in a high density array containing over 16,000 different
oligonucleotide probes.
TABLE 1. High density array design. For every probe there was also
a mismatch control having a central 1 base mismatch.

Probe Type             Target Nucleic Acid    Number of Probes
Test Probes:           IL-2                   691
                       IL-3                   751
                       IL-4                   361
                       IL-6                   691
                       IL-10                  481
                       IL-12p40               911
                       GM-CSF                 661
                       IFN-.gamma.            991
                       TNF-.alpha.            641
                       mCTLA8                 391
                       IL-11 receptor         158
House Keeping Genes:   GAPDH                  388
                       .beta.-actin           669
Bacterial gene (sample preparation/amplification control):
                       Bio B                  286

The high density array was synthesized on a planar glass slide.
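As a consistency check, the per-gene probe counts of Table 1 can be tallied and doubled to account for the mismatch control paired with every probe. The dictionary below simply transcribes the table; the variable names are illustrative.

```python
# Probe counts transcribed from Table 1 (perfect match probes only).
probe_counts = {
    "IL-2": 691, "IL-3": 751, "IL-4": 361, "IL-6": 691, "IL-10": 481,
    "IL-12p40": 911, "GM-CSF": 661, "IFN-gamma": 991, "TNF-alpha": 641,
    "mCTLA8": 391, "IL-11 receptor": 158,
    "GAPDH": 388, "beta-actin": 669,
    "Bio B": 286,
}

perfect_match_total = sum(probe_counts.values())
# Each probe has one central-mismatch control, so the array holds twice as many.
array_total = 2 * perfect_match_total
```

The tally gives 8,071 probes and 16,142 features in all, which agrees with the statement that the array contained over 16,000 different oligonucleotide probes.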
[0163] C) Hybridization Conditions.
[0164] The RNA transcribed from cDNA was then hybridized to the
high density oligonucleotide probe array at low stringency (e.g.,
in 6.times.SSPE-T with 0.5 mg/ml unlabeled, degraded herring sperm
DNA as a blocking agent, at 37.degree. C. for 18 hours). The
hybridized arrays were washed under progressively more stringent
conditions, (e.g., in 1.times.SSPE-T at 37.degree. C. for 7 minutes
down to 0.25.times.SSPE-T overnight) with the hybridized array
being read by a laser-illuminated scanning confocal fluorescence
microscope between washes.
[0165] It was discovered that the excess RNA in the sample
frequently bound up the high density array probes and/or targets
and apparently prevented the probes from specifically binding with
their intended target. This problem was obviated by hybridizing at
temperatures over 30.degree. C. and/or adding CTAB
(cetyltrimethylammonium bromide), a detergent.
[0166] D) Optimization of Probe Selection
[0167] In order to optimize probe selection for each of the target
genes, the high density array of oligonucleotide probes was
hybridized with the mixture of labeled RNAs transcribed from each
of the target genes. Fluorescence intensity at each location on the
high density array was determined by scanning the high density
array with a laser illuminated scanning confocal fluorescence
microscope connected to a data acquisition system.
[0168] Probes were then selected for further data analysis in a
two-step procedure. First, in order to be counted, the difference
in intensity between a probe and its corresponding mismatch probe
had to exceed a threshold limit (50 counts, or about half
background, in this case). This eliminated from consideration
probes that did not hybridize well and probes for which the
mismatch control hybridized at an intensity comparable to the
perfect match.
[0169] Second, the high density array was hybridized to a labeled
RNA sample which, in principle, contains none of the sequences on the
high density array. In this case, the oligonucleotide probes were
chosen to be complementary to the sense RNA. Thus, an anti-sense
RNA population should have been incapable of hybridizing to any of
the probes on the array. Where either a probe or its mismatch
showed a signal above a threshold value (100 counts above
background) it was not included in subsequent analysis.
[0170] Then, the signal for a particular gene was counted as the
average difference (perfect match minus mismatch control) for the
selected probes for each gene.
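The two-step selection and the average-difference signal can be sketched as follows. The thresholds (50 counts for the difference, 100 counts above background for the anti-sense hybridization) come from the text; the data layout, the names, and the sample intensities are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Probe:
    pm_sense: float       # perfect match signal, target (sense) RNA
    mm_sense: float       # mismatch signal, target (sense) RNA
    pm_antisense: float   # perfect match signal, anti-sense RNA
    mm_antisense: float   # mismatch signal, anti-sense RNA

def select_probes(probes, background,
                  diff_threshold=50.0, cross_threshold=100.0):
    """Keep probes that pass both selection steps."""
    limit = background + cross_threshold
    selected = []
    for p in probes:
        # Step 1: perfect match must exceed mismatch by the threshold.
        step1 = (p.pm_sense - p.mm_sense) > diff_threshold
        # Step 2: neither feature may light up with anti-sense RNA.
        step2 = p.pm_antisense <= limit and p.mm_antisense <= limit
        if step1 and step2:
            selected.append(p)
    return selected

def average_difference(probes):
    """Mean of (perfect match - mismatch) over the given probes."""
    if not probes:
        return 0.0
    return sum(p.pm_sense - p.mm_sense for p in probes) / len(probes)

# Hypothetical intensities, with a background of 100 counts.
probes = [
    Probe(400.0, 100.0, 120.0, 110.0),  # passes both steps
    Probe(130.0, 100.0, 100.0, 100.0),  # fails step 1 (difference 30)
    Probe(500.0, 200.0, 350.0, 120.0),  # fails step 2 (350 > 200)
    Probe(260.0, 160.0, 150.0, 190.0),  # passes both steps
]
kept = select_probes(probes, background=100.0)
gene_signal = average_difference(kept)
```

With these made-up values two probes survive and the gene signal is 200 counts.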
[0171] E) Interpretation of Results.
[0172] 1) Specificity of Hybridization
[0173] In order to evaluate the specificity of hybridization, the
high density array described above was hybridized with 50 pM of the
RNA sense strand of IL-2, IL-3, IL-4, IL-6, Actin, GAPDH and Bio B
or IL-10, IL-12p40, GM-CSF, IFN-.gamma., TNF-.alpha., mCTLA8 and
Bio B. The hybridized array showed strong specific signals for each
of the test target nucleic acids with minimal cross
hybridization.
[0174] 2) Relationship between Target Concentration and
Hybridization Signal
[0175] In order to evaluate the relationship between hybridization
signal and target concentration, hybridization intensity was
measured as a function of concentration of the RNAs for one or more
of the target genes. FIG. 1 shows the results of this experiment.
Graphs A and B are plots of the hybridization intensity of high
concentrations (50 pM to 10 nM) of IL-4 hybridized to the array for
90 minutes at 22.degree. C. Plot B merely expands the ordinate of
plot A to show the low concentration values. In both plots, the
hybridization signal increases with target concentration and the
signal level is proportional to the RNA concentration between 50 pM
and 1 nM.
[0176] Graphs C and D are plots of the average hybridization
intensity differences of the 1000 most intense probes when the
array is hybridized, for 15 hours at 37.degree. C., to a mixture of
0.5 pM to 20 pM each of labeled RNA from IL-2, IL-3, IL-4, IL-6,
IL-10, GM-CSF, IFN-.gamma., TNF-.alpha., mCTLA8, .beta.-actin, GAPDH,
and Bio B. Even a signal, in effect, averaged across 13 different
target RNAs, shows an intensity proportional to target RNA
concentration. Again, Graph D expands the ordinate of plot C to
show the low concentration signal.
[0177] At high target nucleic acid concentration, the hybridization
time could be decreased, while at lower target nucleic acid
concentration, the hybridization time should be increased. By
varying hybridization time, it is possible to obtain a
substantially linear relationship between target RNA concentration
and hybridization intensity for a wide range of target RNA
concentrations.
[0178] 3) Detection of Gene Expression Levels in a Complex Target
Sample.
[0179] In order to evaluate the ability of the high density array
described above to measure variations in expression levels of the
target genes, hybridization was performed with the T10 murine
library RNA alone, with the library spiked with 10 pM each of mCTLA8,
IL-6, IL-3, IFN-.gamma., and IL-12, and with the library spiked with
50 pM of each of these RNA transcripts, prepared as described above.
[0180] Because simply spiking the RNA mixture with the selected
target genes and then immediately hybridizing might provide an
artificially elevated reading relative to the rest of the mixture,
the spiked sample was treated to a series of procedures to mitigate
differences between the library RNA and the added RNA. Thus the
"spike" was added to the sample which was then heated to 37.degree.
C. and annealed. The sample was then frozen, thawed, boiled for 5
minutes, cooled on ice and allowed to return to room temperature
before performing the hybridization.
[0181] The sample was then hybridized at low stringency and washed
at progressively higher stringency as described above. The best
probes for each target gene were selected as described above, in
Section D, and the average intensity of the difference (perfect
match minus mismatch) of the probes for each target gene is plotted in
FIGS. 2 and 3.
[0182] A 50 pM spike represents a target mRNA concentration of
about 1 in 24,000, while a 10 pM spike represents a target mRNA
concentration of about 1 in 120,000. As illustrated in FIG. 2, the
high density array easily resolves and quantifies the relative
expression levels of each of the target genes in one simultaneous
hybridization. Moreover, the relative expression level is
quantifiable with a 5 fold difference in concentration of the
target mRNA resulting in a 3 to 6 fold difference in hybridization
intensity for the five spiked targets.
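The two abundance figures are related by simple proportion: a five-fold lower spike is five times rarer in the same RNA background. The sketch below takes the 50 pM figure (1 in 24,000) as the reference; the function name is hypothetical.

```python
def one_in_n_for_spike(spike_pM, ref_spike_pM=50.0, ref_one_in_n=24_000):
    """Relative abundance (as '1 in N') of a spiked transcript, scaled
    from a reference spike in the same total RNA background."""
    return ref_one_in_n * ref_spike_pM / spike_pM

n_for_10_pM = one_in_n_for_spike(10.0)
fold_change = 50.0 / 10.0
```

Scaling from 50 pM to 10 pM gives 1 in 120,000, matching the stated value, and the five-fold concentration step is the one reported to produce a 3 to 6 fold change in hybridization intensity.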
[0183] FIG. 3 replots FIG. 2 on a condensed scale so that the
expression levels of constitutively expressed GAPDH and Actin and
the level of IL-10, which is endogenously expressed by the cell
line, are visible. It is notable that the single hybridization to
the array resolved expression levels varying from 1 in 1000 for
GAPDH to 1 in 120,000 for the spiked mRNAs without the high
concentration RNA (the RNA library) overwhelming the signal from
the genes expressed at low levels (e.g., IL-10).
[0184] It is also worthy of note that the endogenous (intrinsic)
IL-10 was transcribed at a level comparable to or lower than the
spiked RNAs (see FIG. 2) and the method thus is capable of
quantifying the levels of transcription of genes that are
transcribed at physiologically realistic levels.
[0185] The method described herein thus easily quantifies changes
in RNA concentrations of 5 to 10 fold. Detection is highly specific
and quantitative at levels as low as 1 in 120,000. The sensitivity
and specificity are sufficient to detect low concentration RNAs
(comparable to about 20 to 30 per cell) in the presence of total
mammalian cell message populations. Other experiments have detected
concentrations as low as 1 in 300,000, comparable to about 10 RNAs
per cell, and the method clearly provides a means for simultaneously
screening transcription levels of literally hundreds of genes
in a complex RNA pool.
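The per-cell figures can be cross-checked against each other. The sketch below assumes a total message count per cell inferred from the statement that 1 in 300,000 corresponds to about 10 RNAs per cell; that inferred total (about 3 million) is an assumption of this illustration, not a value given in the text.

```python
def copies_per_cell(one_in_n, total_transcripts_per_cell):
    """Copies per cell of a transcript present at a frequency of 1 in N."""
    return total_transcripts_per_cell / one_in_n

# Assumption: total transcripts per cell implied by "1 in 300,000 is
# comparable to about 10 RNAs per cell".
implied_total = 10 * 300_000

copies_at_1_in_120k = copies_per_cell(120_000, implied_total)
```

Under that assumption a frequency of 1 in 120,000 corresponds to 25 copies per cell, which falls within the stated range of about 20 to 30.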
[0186] It is understood that the examples and embodiments described
herein are for illustrative purposes only and that various
modifications or changes in light thereof will be suggested to
persons skilled in the art and are to be included within the spirit
and purview of this application and scope of the appended claims.
All publications, patents, and patent applications cited herein are
hereby incorporated by reference for all purposes.
* * * * *