U.S. patent application number 11/174042 was filed with the patent office on 2005-07-01 and published on 2005-11-03 for array based methods for synthesizing nucleic acid mixtures.
Invention is credited to Douglas A. Amorese, Andrew S. Atwell, Diane D. Ilsley, Robert H. Kincaid, and Paul K. Wolber.
Application Number | 20050244885 11/174042 |
Family ID | 24519019 |
Filed Date | 2005-07-01 |
United States Patent Application | 20050244885 |
Kind Code | A1 |
Wolber, Paul K. ; et al. | November 3, 2005 |
Array based methods for synthesizing nucleic acid mixtures
Abstract
Methods for generating mixtures of nucleic acids, e.g.,
oligonucleotide primers, are provided. In the subject methods, an
array is employed as template to generate mixtures of nucleic acids
via a template driven primer extension reaction. In preferred
embodiments, each probe on the array employed in the subject
methods comprises a constant domain and a variable domain, where
the constant domain is further characterized by having at least a
recognition domain. Also provided are the arrays employed in the
subject methods and kits for practicing the subject methods. The
subject methods find use in a variety of applications, including
the generation of target nucleic acids from an mRNA sample for use
in hybridization assays, e.g., differential gene expression
analyses.
Inventors: |
Wolber, Paul K. (Los Altos, CA); Kincaid, Robert H. (Half Moon Bay, CA);
Amorese, Douglas A. (Los Altos, CA); Ilsley, Diane D. (San Jose, CA);
Atwell, Andrew S. (Sunnyvale, CA) |
Correspondence
Address: |
AGILENT TECHNOLOGIES, INC.
INTELLECTUAL PROPERTY ADMINISTRATION, LEGAL DEPT.
P.O. BOX 7599
M/S DL429
LOVELAND
CO
80537-0599
US
|
Family ID: |
24519019 |
Appl. No.: |
11/174042 |
Filed: |
July 1, 2005 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
11174042 | Jul 1, 2005 | |
09628472 | Jul 31, 2000 | |
Current U.S.
Class: |
435/6.12 ;
435/6.1; 435/91.2 |
Current CPC
Class: |
B01J 2219/00608
20130101; B01J 2219/00572 20130101; C12Q 2525/131 20130101; C12Q
2565/525 20130101; C12Q 2525/161 20130101; C12Q 2521/301 20130101;
C12Q 2565/537 20130101; C12Q 2521/301 20130101; C12Q 2525/161
20130101; C40B 40/06 20130101; C12Q 2565/537 20130101; B01J
2219/00659 20130101; C40B 70/00 20130101; B01J 2219/00576 20130101;
B01J 2219/00675 20130101; B01J 2219/00596 20130101; C12Q 1/6837
20130101; B01J 2219/00722 20130101; C12Q 1/6883 20130101; B82Y
30/00 20130101; B01J 2219/00626 20130101; C40B 50/14 20130101; C12Q
1/6837 20130101; C12Q 1/6876 20130101; C12Q 1/6837 20130101; C12Q
1/6806 20130101; B01J 2219/00605 20130101; C12Q 1/6806 20130101;
B01J 2219/00497 20130101; C12Q 2600/158 20130101; B01J 2219/00574
20130101; B01J 2219/00617 20130101; B01J 2219/00585 20130101 |
Class at
Publication: |
435/006 ;
435/091.2 |
International
Class: |
C12Q 001/68; C12P
019/34 |
Claims
1-21. (canceled)
22. A method for producing a mixture of nucleic acids, said method
comprising: (a) providing an array of distinct single-stranded
probe nucleic acids of differing sequence immobilized on a surface
of a planar substrate where each distinct probe present on said
array comprises a constant domain and a complement variable domain;
wherein said complement variable domain is at the 5' end of said
each distinct probe; (b) hybridizing nucleic acids complementary to
said constant domain with said array of single-stranded probe
nucleic acids to produce a template array of overhang comprising
duplex nucleic acids, wherein each overhang comprising duplex
nucleic acid of said array comprises a double-stranded constant
region and a single-stranded variable region overhang; (c)
subjecting said template array of overhang comprising duplex
nucleic acids to a cyclic reaction or an in vitro transcription
protocol to produce a mixture of single stranded nucleic acids of
differing sequence; and (d) separating said mixture of nucleic
acids from said template array.
23. The method according to claim 22, wherein said mixture of
nucleic acids is a mixture of deoxyribo-oligonucleotides.
24. The method according to claim 22, wherein said step (c)
comprises a cyclic reaction.
25. The method according to claim 24, wherein said cyclic reaction
comprises a protocol selected from the group consisting of: linear
PCR and strand displacement amplification.
26. The method according to claim 22, wherein said constant domain
comprises at least one domain selected from the group consisting
of: a linker domain; a functional domain and a recognition
domain.
27. The method according to claim 22, wherein said step (c)
comprises an in vitro transcription protocol.
28. The method according to claim 27, wherein said constant domain
comprises at least one domain selected from the group consisting
of: a linker domain; a functional domain and a recognition
domain.
29. The method according to claim 28, wherein said functional
domain is an RNA polymerase promoter domain.
30. The method according to claim 22, wherein said array is
described by the formula: surface-L-R--F-cV-5' wherein: L is an
optional linking domain; R is a recognition domain; F is a
functional domain; and cV is said complement domain.
31. The method according to claim 30, wherein said hybridizing step
(b) comprises contacting said array with a population of nucleic
acids of the formula: 5'-cR-cF-3' wherein: cR is the complement of
R; and cF is the complement of F.
32. The method according to claim 31, wherein said template array
of overhang comprising duplex nucleic acids is described by the
formula: 1
33. The method according to claim 32, wherein each distinct
constituent member of said mixture produced by said method
comprises a different variable domain V.
34. The method according to claim 30, wherein said recognition
domain is recognized by a restriction endonuclease.
35. The method according to claim 22, wherein said array comprises
at least about 50 different single-stranded probe nucleic acids of
differing sequence.
36. The method according to claim 35, wherein said mixture of
nucleic acids produced by said method comprises at least about 50
nucleic acids of differing sequence.
37. The method according to claim 36, wherein each constituent
member of said mixture ranges in length from about 20 to 60 nt.
38. A method according to claim 22, wherein said method further
comprises employing said mixture of nucleic acids as primers in a
target generation step in which target nucleic acids are produced
from an mRNA sample to produce a population of target nucleic
acids.
39. The method according to claim 38, wherein said target
generation step (b) comprises a template driven primer extension
reaction.
40. The method according to claim 38, wherein said target
generation step (b) produces labeled target nucleic acids.
41. The method according to claim 38, wherein said method further
comprises contacting said set of target nucleic acids with an array
of probe nucleic acids under hybridization conditions and detecting
the presence of target nucleic acids hybridized to probe nucleic
acids of said array.
42. The method according to claim 41, wherein said target nucleic
acids are labeled.
43. The method according to claim 41, wherein said method further
comprises washing unbound target away from the surface of said
array.
Description
FIELD OF THE INVENTION
[0001] The field of this invention is molecular biology, and
particularly gene expression analysis.
BACKGROUND OF THE INVENTION
[0002] The characterization of cellular gene expression (i.e., gene
expression analysis) finds application in a variety of disciplines,
such as in the analysis of differential expression between
different tissue types, different stages of cellular growth or
between normal and diseased states.
[0003] Fundamental to differential expression analysis is the
detection of different mRNA species in a test sample, and often the
quantitative determination of different mRNA levels in that test
sample. In order to detect different mRNA levels in a given test
population, a population of labeled target nucleic acids that, at
least partially, reflects or mirrors the mRNA profile of the test
sample is produced. In other words, a population of labeled target
nucleic acids is generated where at least a portion of the mRNA
species in the test sample are represented, in terms of presence
and often in terms of amount. Following target generation, the
target population is contacted with one or more probe sequences,
e.g., as found on an array, whereby the presence and often amount
of specific targets in the target population is detected. From the
resultant data, information about the mRNAs present in the sample,
i.e., the mRNA profile and gene expression profile, can be readily
deduced.
[0004] A fundamental step in gene expression analysis assays is,
therefore, the step of labeled target generation. Target generation
protocols typically include a primer extension reaction, in which a
primer is contacted with an initial mRNA sample to produce a
labeled target population, as described above. In certain
protocols, polyA primers and variants thereof are employed.
Disadvantages of such protocols include the inability to produce
target from prokaryotic mRNA species that lack a polyA tail and the
propensity of such protocols to produce target that lacks 5' mRNA
information. While the use of random primers overcomes some of
these disadvantages, random primer protocols suffer from their own
disadvantages, e.g., lack of specificity resulting from increased
complexity in the primer mixture produced by the process, where not
only mRNA is represented, but also rRNA, tRNA and snRNA. In yet
other protocols, custom primer mixes are employed in target
generation. While such protocols overcome the above-described
disadvantages with polyA and random primer based protocols, custom
primer mix or gene specific primer based protocols can be
prohibitively expensive, particularly in array-based hybridization
protocols in which custom arrays are employed.
[0005] As such, there is continued interest in the development of
new primer generation protocols. Of particular interest would be
the development of a protocol that realizes the advantages of gene
specific primer based protocols while at the same time is
economical to perform and is therefore suitable for use in custom
array-based hybridization assays.
[0006] Relevant Literature
[0007] See U.S. Pat. No. 5,795,714 and the references cited
therein.
SUMMARY OF THE INVENTION
[0008] Methods for generating mixtures of nucleic acids, e.g.,
oligonucleotide primers, are provided. In the subject methods, an
array of probe nucleic acids is employed as template to generate
mixtures of nucleic acids via a template driven primer extension
reaction. In preferred embodiments, each probe on the array
employed in the subject methods comprises a constant domain and a
variable domain, where the constant domain is further characterized
by having at least a recognition domain, and optionally a
functional domain and/or linker domain. Also provided are the
arrays employed in the subject methods and kits for practicing the
subject methods. The subject methods find use in a variety of
applications, including the generation of target nucleic acids from
an mRNA sample for use in hybridization assays, e.g., differential
gene expression analysis.
BRIEF DESCRIPTION OF THE FIGURES
[0009] FIG. 1 provides a view of the stained gel produced in
Example 1 of the Experimental section, infra.
DEFINITIONS
[0010] The term "nucleic acid" as used herein means a polymer
composed of nucleotides, e.g., deoxyribonucleotides or
ribonucleotides, or compounds produced synthetically (e.g. PNA as
described in U.S. Pat. No. 5,948,902 and the references cited
therein) which can hybridize with naturally occurring nucleic acids
in a sequence specific manner analogous to that of two naturally
occurring nucleic acids.
[0011] The terms "ribonucleic acid" and "RNA" as used herein mean a
polymer composed of ribonucleotides.
[0012] The terms "deoxyribonucleic acid" and "DNA" as used herein
mean a polymer composed of deoxyribonucleotides.
[0013] The term "oligonucleotide" as used herein denotes single
stranded nucleotide multimers of from about 10 to 100 nucleotides
and up to 200 nucleotides in length.
[0014] The term "polynucleotide" as used herein refers to single or
double stranded polymer composed of nucleotide monomers of
generally greater than 100 nucleotides in length.
[0015] The term "mRNA" means messenger RNA.
[0016] The term "array" means a substrate having at least one
planar surface on which is immobilized a plurality of different
probe nucleic acids.
DESCRIPTION OF THE SPECIFIC EMBODIMENTS
[0017] Methods for generating mixtures of nucleic acids, e.g.,
oligonucleotide primers, are provided. In the subject methods, an
array is employed as template to generate mixtures of nucleic acids
via a template driven primer extension reaction. In preferred
embodiments, each probe on the array employed in the subject
methods comprises a constant domain and a variable domain, where
the constant domain is further characterized by having at least a
recognition domain, and optionally a functional and/or linker
domain. Also provided are the arrays employed in the subject
methods and kits for practicing the subject methods. The subject
methods find use in a variety of applications, including the
generation of target nucleic acids from an mRNA sample for use in
hybridization assays, e.g., differential gene expression analysis.
In further describing the subject invention, the subject methods
will be described first, followed by a review of representative
protocols in which the nucleic acid mixtures produced by the
subject methods find use as well as a description of kits that find
use in practicing the subject methods.
[0018] Before the subject invention is described further, it is to
be understood that the invention is not limited to the particular
embodiments of the invention described below, as variations of the
particular embodiments may be made and still fall within the scope
of the appended claims. It is also to be understood that the
terminology employed is for the purpose of describing particular
embodiments, and is not intended to be limiting. Instead, the scope
of the present invention will be established by the appended
claims.
[0019] In this specification and the appended claims, the singular
forms "a," "an" and "the" include plural reference unless the
context clearly dictates otherwise. Unless defined otherwise, all
technical and scientific terms used herein have the same meaning as
commonly understood by one of ordinary skill in the art to which
this invention belongs.
[0020] Methods
[0021] As summarized above, the subject invention provides methods
for generating mixtures of nucleic acids by a template driven
primer extension protocol in which an array is employed as
template. The mixture of nucleic acids produced by the subject
methods is characterized by having a known composition. As such, at
least the sequence of each individual or distinct nucleic acid in
the mixture of differing sequence is known. In many embodiments,
the relative amount or copy number of each distinct nucleic acid of
differing sequence is known. Each nucleic acid present in the
mixture at least includes a variable domain that serves to
distinguish it from any other nucleic acid in the mixture, i.e.,
any other nucleic acid that does not have the identical
sequence--any nucleic acid that is not its copy. The variable
domain, S.sub.ij, is a nucleic acid that hybridizes under stringent
conditions to gene i at location j and is capable of serving as a
primer in reverse transcription beginning at base j. The number of
different variable domains, S.sub.ij, present in the mixture may
vary, but is generally at least about 10, usually at least about 20
and more usually at least about 50, where the number may be as
great as 25,000 or greater. In many embodiments, the number of
different variable domains present in the mixture ranges from about
1,978 to 25,000, usually from about 4,200 to 8,400. In addition to
the distinguishing variable domain, the constituent members of the
mixture may all share one or more domains of common sequence,
depending on the particular protocol employed to generate the
mixture, as described in greater detail below.
[0022] In the subject methods, the first step is generally to
provide an array, i.e., a substrate having a planar surface on
which is immobilized a plurality of distinct nucleic acid probes,
in which each probe sequence on the array includes a constant
domain and a complement variable domain. This providing step may
include either generating the array de novo or obtaining a pre-made
array from a commercial source, where in either case the array will
have the characteristics described below. Arrays of nucleic acids
are known in the art, where representative arrays that may be
modified to become arrays of the subject invention as described
below, include those described in: U.S. Pat. Nos. 5,242,974;
5,384,261; 5,405,783; 5,412,087; 5,424,186; 5,429,807; 5,436,327;
5,445,934; 5,472,672; 5,527,681; 5,529,756; 5,545,531; 5,554,501;
5,556,752; 5,561,071; 5,599,695; 5,624,711; 5,639,603; 5,658,734;
5,795,714; WO 93/17126; WO 95/11995; WO 95/35505; EP 742 287; and
EP 799 897; the disclosures of which are herein incorporated by
reference.
[0023] As mentioned above, each distinct probe nucleic acid on the
array includes a constant domain and a complement variable domain.
The complement variable domain of each distinct probe has a
sequence that is the complement of a variable or distinguishing
domain found in a constituent member of the mixture of nucleic
acids that is produced by the subject methods as described above,
where by complement is meant that the variable and complement
variable sequences hybridize under stringent conditions, e.g., at
50°C or higher and 0.1×SSC (15 mM sodium
chloride/1.5 mM sodium citrate) or thermodynamically equivalent
conditions. Thus, the array includes a plurality of distinct probes
that differ from each other by complement variable domain, where
the number of distinct probes on an array employed in the subject
methods is typically at least 10, usually at least 20 and more
usually at least 50, where the number may be as high as 25,000 or
higher. In many embodiments, the number of distinct probes ranges
from about 1,978 to 25,000, usually from about 4,200 to 8,400.
[0024] Because of the nature of the subject methods, as described
below, each distinct complement variable domain will be represented
in the nucleic acid mixture produced using the array, i.e., the
complement of each distinct complement variable domain sequence
will be found in the mixture of nucleic acids produced by the
subject methods. For example, where an array has 10 different
probes that differ by complement variable domain such that it has
10 different complement variable domains, i.e., cV.sub.1-10, the
nucleic acid mixture produced by the subject methods as described
below will have 10 different or distinct nucleic acids, where each
different nucleic acid sequence in the mixture includes a sequence
that is the complement of one of cV.sub.1-10, i.e., V.sub.1-10.
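The relationship between the complement variable domains on the array and the variable domains in the product mixture can be sketched in a few lines of Python. This is only an illustration: the function names and the three placeholder sequences are hypothetical and do not come from the specification, and "complement" is modeled here as an exact Watson-Crick reverse complement, which is stricter than the hybridization-based definition given above.

```python
# Illustrative sketch only: names and sequences are hypothetical placeholders.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the exact reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def expected_mixture(array_cv):
    """Map each complement variable domain (cV_n) to its variable domain (V_n).

    One distinct product is expected per distinct probe, so the result has
    the same number of entries as the input.
    """
    mixture = {}
    for name, cv in array_cv.items():
        # "cV3" -> "V3"; any other label is kept unchanged.
        product_name = name[1:] if name.startswith("cV") else name
        mixture[product_name] = reverse_complement(cv)
    return mixture


# Three arbitrary 20-nt placeholders standing in for cV.sub.1 to cV.sub.3.
array_cv = {
    "cV1": "AAGCTTGCATGCCTGCAGGT",
    "cV2": "GGATCCTCTAGAGTCGACCT",
    "cV3": "CCCGGGTACCGAGCTCGAAT",
}

for name, seq in expected_mixture(array_cv).items():
    print(name, seq)
```

An array with N distinct complement variable domains yields N distinct entries, matching the one-to-one correspondence described in the paragraph above.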
[0025] The relative copy number of each probe on the array may or
may not be selected to "normalize" the nucleic acid mixture made
with the array with respect to the mRNA sample with which it is to
be used. For example, if the array is to be used to make a nucleic
acid mixture that has a 10-fold increase in the copy number of
target that hybridizes to a rare mRNA, the copy number of the
corresponding (e.g. identical or complementary) probe on the array
can be appropriately increased relative to other probes that
correspond to less rare mRNA species in the mRNA sample. In many
embodiments, the complement variable domain is a domain that has a
sequence that is chosen to hybridize under stringent conditions to
a sequence of interest found in a particular mRNA. In many
embodiments, the complement variable sequence has a sequence that
is denoted as cS.sub.ij, where c stands for complement and S.sub.ij
is a nucleic acid that primes reverse transcription of a gene i
beginning at base j. Thus, in many embodiments of the invention,
the complement variable domain of each probe is the complement of a
nucleic acid that is capable of hybridizing to a different gene of
interest i at location or base j and acting as a primer under
reverse transcription conditions. For example, where 10 different
genes, i.e., genes 1 to 10 are represented on the array and the
sequence of interest for each gene begins at base number 50, 60,
70, 80, 90, 100, 110, 120, 130 and 140, respectively (counting from
the 5' end of the mRNA molecule), and each complement variable
domain is 20 bases long, the complement variable domains of each
distinct probe on the array, i.e., cV.sub.1 to cV.sub.10, will be as
follows:
TABLE 1
Variable Domain | Sequence
cV.sub.1 | Sequence that hybridizes under stringent conditions to bases 50 to 30 of gene 1
cV.sub.2 | Sequence that hybridizes under stringent conditions to bases 60 to 40 of gene 2
cV.sub.3 | Sequence that hybridizes under stringent conditions to bases 70 to 50 of gene 3
cV.sub.4 | Sequence that hybridizes under stringent conditions to bases 80 to 60 of gene 4
cV.sub.5 | Sequence that hybridizes under stringent conditions to bases 90 to 70 of gene 5
cV.sub.6 | Sequence that hybridizes under stringent conditions to bases 100 to 80 of gene 6
cV.sub.7 | Sequence that hybridizes under stringent conditions to bases 110 to 90 of gene 7
cV.sub.8 | Sequence that hybridizes under stringent conditions to bases 120 to 100 of gene 8
cV.sub.9 | Sequence that hybridizes under stringent conditions to bases 130 to 110 of gene 9
cV.sub.10 | Sequence that hybridizes under stringent conditions to bases 140 to 120 of gene 10
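The base ranges in the table follow directly from the starting base of each gene and the 20-base domain length. A minimal Python sketch of that arithmetic is given here; the function name is hypothetical, and the endpoints are computed as j and j - 20 simply because that reproduces the figures printed in the table.

```python
# Illustrative sketch only: reproduces the endpoints listed in the table above.
def variable_domain_table(start_bases, length=20):
    """Return (label, gene, first_base, last_base) for each gene.

    Genes are numbered from 1 in the order given; each range runs from the
    starting base j down to j - length, as printed in the table.
    """
    rows = []
    for gene, start in enumerate(start_bases, start=1):
        rows.append((f"cV{gene}", gene, start, start - length))
    return rows


# Starting bases 50, 60, ..., 140 for genes 1 to 10, as in the example.
starts = list(range(50, 150, 10))

for label, gene, first, last in variable_domain_table(starts):
    print(f"{label}: bases {first} to {last} of gene {gene}")
```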
[0026] While the length of the complement variable domain in the
specific example provided above is 20 bases or residues, i.e., 20
nt, the length may vary considerably and will be chosen based on
the desired length of the resultant nucleic acids in the mixture to be
produced, within the synthesis constraints of the subject
method. Generally, the length of the complement variable domain
will range from about 15 to 40, usually from about 15 to 30 and
more usually from about 20 to 25 nt.
[0027] As mentioned above, in addition to the unique complement
variable domain, each probe nucleic acid present on the array
includes a common or shared constant domain 3' of the complement
variable domain. This constant domain typically ranges in length
from about 20 to 50, usually from about 20 to 45 and more usually
from about 25 to 40 nt. The constant domain typically comprises at
least one of the following constant sub-domains: a functional
domain; a recognition domain and a linker domain. In many
embodiments, each probe contains at least a recognition sub-domain,
and optionally a functional domain and/or a linker domain. These
constant sub-domains may be grouped together on the probe or
separated so as to flank the variable domain of the probe. As such,
in certain embodiments these sub-domains are generally arranged in
the order of functional domain, recognition domain and linker
domain going from the 5' to the 3' end of the probe sequence, such
that the linker domain is at the 3' probe terminus and is attached,
either directly or indirectly, to the substrate surface of the
array. In yet other embodiments, one or more of the domains, e.g.,
the functional sub-domain, may be present on the 5' end of the
variable domain.
[0028] The optional functional sub-domain is generally a sequence
that imparts or contributes some function to a duplex nucleic acid
in which it is present. Functional domains of interest include:
polymerase promoter sites, e.g., T3 or T7 RNA polymerase promoter
sites, sequences unique with respect to the intended target
organism for the array experiment (i.e. unique priming sites) and
the like. The length of this functional domain typically ranges
from about 10 nt to 40 nt, usually from about 20 nt to 30 nt.
[0029] The recognition sequence of the constant domain is typically
a sequence that, when present in duplex format, is recognized and
cleaved by a restriction endonuclease. A large number of
restriction endonucleases are known to those of skill in the art.
Specific restriction endonuclease recognized sites of interest that
may make up the subject recognition sequence include, but are not
limited to: Hinc II and the like. Generally, the length of the
recognition domain ranges from about 4 nt to 8 nt, usually from
about 5 nt to 6 nt.
[0030] The linker sub-domain of the subject constant domains is
optional. The linker domain may be any convenient sequence,
including random sequence or a non-polynucleotide chemical linker
(e.g. an ethylene glycol-based polyether oligomer), where the sole
purpose of the linker domain is to project the other domains of the
probe away from the substrate surface. Generally, the linker domain
if present, has a length ranging from about 1 to 20, usually from
about 1 to 15 and more usually from about 1 to 10, including 5 to
10 nt.
[0031] In many, though not all, embodiments, each surface bound
probe on the array employed in the subject methods is described by
the following formula:
surface-3'-L-R--F-cV-5'
[0032] wherein:
[0033] L is the optional linking domain;
[0034] R is the recognition domain;
[0035] F is the functional domain; and
[0036] cV is the complement variable domain, i.e., the complement
of the variable domain, cS.sub.ij, of the nucleic acid produced by
the subject methods to which it hybridizes under stringent
conditions;
[0037] where each of these elements is as described above.
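The domain order in the formula above can be made concrete with a short Python sketch. Everything named here is hypothetical: the `Probe` class, the `length_warnings` helper and the placeholder sequences are illustrations, not part of the specification. The sketch assumes each domain is written 5' to 3' and that the linker, if present, is a nucleotide sequence (the text also allows a non-polynucleotide linker, which this sketch does not model). The length limits are the broad "about" ranges stated above for each domain.

```python
# Illustrative sketch only: class, helper and sequences are hypothetical.
from dataclasses import dataclass

# Broad length ranges (nt) stated in the text for each domain.
RANGES_NT = {
    "complement_variable": (15, 40),
    "functional": (10, 40),
    "recognition": (4, 8),
}
LINKER_RANGE_NT = (1, 20)


@dataclass(frozen=True)
class Probe:
    complement_variable: str  # cV, at the 5' end
    functional: str           # F
    recognition: str          # R
    linker: str = ""          # L, optional, at the surface-bound 3' end

    def five_to_three(self):
        """Probe sequence read from the free 5' end toward the surface."""
        return (self.complement_variable + self.functional
                + self.recognition + self.linker)


def length_warnings(probe):
    """List the domains whose length falls outside the stated broad ranges."""
    warnings = []
    for field, (lo, hi) in RANGES_NT.items():
        n = len(getattr(probe, field))
        if not lo <= n <= hi:
            warnings.append(f"{field}: {n} nt outside about {lo}-{hi} nt")
    if probe.linker:
        lo, hi = LINKER_RANGE_NT
        if not lo <= len(probe.linker) <= hi:
            warnings.append(f"linker: {len(probe.linker)} nt outside about {lo}-{hi} nt")
    return warnings


# Placeholder domains; GTTGAC matches the Hinc II consensus GTYRAC.
probe = Probe(
    complement_variable="AAGCTTGCATGCCTGCAGGT",
    functional="CCATGGCAATTCGAGCTCGG",
    recognition="GTTGAC",
    linker="TTTTT",
)
print(probe.five_to_three())
print(length_warnings(probe))
```

Because only `complement_variable` differs from probe to probe, a full array can be described by one shared set of constant domains plus a list of cV sequences.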
[0038] As mentioned above, the subject arrays are provided by any
convenient means, including obtaining them from a commercial source
or by synthesizing them de novo. To synthesize the arrays employed
in the subject methods, the first step is generally to determine
the nature of the mixture of nucleic acids that is to be produced
using the subject array according to the subject methods. In those
embodiments where the nucleic acid mixture is to be employed as
gene specific primer in the generation of target nucleic acid, as
described in greater detail below, the first step is to identify
those genes that are to be represented by a primer in the primer
mixture, i.e., those specific mRNAs potentially present in the
experimental samples which are to have primers in the mixture that
are capable of hybridizing to them under stringent conditions.
Following identification of these genes, the specific region, i.e.
stretch or domain, of each mRNA to which the primer is to hybridize
is then identified. These specific domains or regions may be
identified using any convenient protocol and set of selection
criteria, where of interest in many embodiments is the use of the
algorithm and selection methods based thereon described in U.S.
patent application Ser. No. 09/021,701, the disclosure of which is
herein incorporated by reference. As such, a plurality of different
sequences of interest will be identified, wherein each sequence is
described by the formula S.sub.ij, where i is the gene of interest
and j is the specific base at which the sequence starts, as
described above. Following identification of each variable or
S.sub.ij sequence as described above, a probe sequence for each
different variable or S.sub.ij sequence is identified, where the
probe sequence has the following sequence in many embodiments:
3'-L-R--F-cV-5'
[0039] wherein:
[0040] L is the linking domain;
[0041] R is the recognition domain;
[0042] F is the functional domain; and
[0043] cV is the complement of the variable domain, i.e.,
cS.sub.ij;
[0044] where each of these elements is as defined above and each
of the probes varies only in terms of its cV domain.
[0045] Following identification of the probe sequences as defined
above, an array is produced in which each of the probe sequences of
the identified set is present. The array may be produced using any
convenient protocol, where suitable protocols include both
synthesis of the complement probe followed by deposition onto a
substrate surface, as well as synthesis of the probe directly on
the substrate surface. Representative protocols for array synthesis
are described in: U.S. Pat. Nos. 5,242,974; 5,384,261; 5,405,783;
5,412,087; 5,424,186; 5,429,807; 5,436,327; 5,445,934; 5,472,672;
5,527,681; 5,529,756; 5,545,531; 5,554,501; 5,556,752; 5,561,071;
5,599,695; 5,624,711; 5,639,603; 5,658,734; 5,795,714; WO 93/17126;
WO 95/11995; WO 95/35505; EP 742 287; and EP 799 897; the
disclosures of which are herein incorporated by reference.
[0046] Following provision of the array employed in the subject
methods, as described above, the next step is to contact the array
with universal primer under hybridization conditions sufficient to
produce a template array that includes a plurality of overhang
comprising duplex nucleic acids on its surface, where the overhang
is made up of the complement variable domain of each probe of the
array. The universal primer is capable of hybridizing to the
constant domain, or at least a portion thereof (e.g., at least that
portion immediately 3' of the complement variable domain). The
universal primer has a length that is sufficient to prime template
driven primer extension, where the length of the universal primer
generally ranges from about 10 to 45 nt, usually from about 15 to
35 nt and more usually from about 20 to 30 nt. In many embodiments,
the universal primer is the complement of the recognition and/or
functional sub-domains of the constant domain of each probe on the
array. As such, in many embodiments the universal primer employed
has a sequence described by the formula:
5'-cR-cF-3'
[0047] wherein:
[0048] cR is the complement of the recognition domain; and
[0049] cF is the complement of the functional domain.
[0050] As mentioned above, the template array produced by this
method is an array of duplex probe molecules made up of a first
nucleic acid having a constant and complement variable domain and a
second nucleic acid which is the universal primer and is hybridized
to the constant domain (or at least that portion of the constant
domain that is 3' of the variable domain complement). As such, the
array produced by this step is an array of overhang comprising
duplex nucleic acid, typically DNA, molecules, where the overhang
is made up of the complement variable domain of each probe on the
array.
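The geometry of the overhang comprising duplex can be sketched in Python as follows. The function names and sequences are hypothetical placeholders, all strings are written 5' to 3', and complements are modeled as exact reverse complements. Under those assumptions the universal primer 5'-cR-cF-3' pairs with the R and F domains of the probe, leaving cV single stranded, and extending the primer across cV gives a product of the form 5'-cR-cF-V-3'.

```python
# Illustrative sketch only: names and sequences are hypothetical placeholders.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the exact reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def universal_primer(recognition, functional):
    """Build 5'-cR-cF-3', the complement of the R and F constant sub-domains."""
    return reverse_complement(recognition) + reverse_complement(functional)


def overhang_duplex(complement_variable, functional, recognition):
    """Return (double_stranded_region, single_stranded_overhang) of the probe.

    The probe reads 5'-cV-F-R-3' (linker omitted); the primer covers F and R,
    so cV remains as the single-stranded overhang at the probe's 5' end.
    """
    return functional + recognition, complement_variable


def extension_product(primer, complement_variable):
    """Extend the primer's 3' end across the cV overhang: 5'-cR-cF-V-3'."""
    return primer + reverse_complement(complement_variable)


R = "GTTGAC"                   # matches the Hinc II consensus GTYRAC
F = "CCATGGCAATTCGAGCTCGG"     # arbitrary placeholder functional domain
cV = "AAGCTTGCATGCCTGCAGGT"    # arbitrary placeholder complement variable domain

primer = universal_primer(R, F)
product = extension_product(primer, cV)

# The full product is the reverse complement of the primed part of the probe.
assert product == reverse_complement(cV + F + R)
print(primer)
print(product)
```

The same primer serves every feature on the array because R and F are shared; only the appended V portion differs between products.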
[0051] This template array of overhang comprising duplex probes is
then subjected to primer extension reaction conditions sufficient
to produce the desired mixture of nucleic acids. The specific
primer extension reaction conditions to which the template array of
overhang comprising duplex nucleic acids is subjected may vary
depending on the particular protocol used and/or the specific
nature of the nucleic acid mixture to be produced therefrom.
Specific primer extension reaction conditions of interest include,
but are not limited to: linear PCR (Polymerase Chain Reaction);
strand displacement amplification; and in vitro transcription. Each
of these specific primer extension reaction conditions is now
reviewed in greater detail.
[0052] Where the template array is subjected to linear PCR
conditions, the array is contacted in an aqueous reaction mixture
with a source of DNA polymerase, dNTPs and any other desired or
requisite primer extension reagents under conditions sufficient to
produce linearly amplified amounts of nucleic acids, e.g., under
thermal cycling conditions. As such, the polymerase employed in the
subject methods is generally, though not necessarily (e.g., where
new polymerase is added after each cycle), a thermostable
polymerase. A variety of thermostable polymerases are known to
those of skill in the art, where representative polymerases
include, but are not limited to: Taq polymerase, Vent®
polymerase, Pfu polymerase and the like. The amount of polymerase
present in the reaction mixture may vary but is sufficient to
provide for the requisite amount of polymerase activity, where the
specific amount employed may be readily determined by those of
skill in the art. Also present in the reaction mixture is a
collection of the four dNTPs, i.e., dATP, dCTP, dGTP and dTTP. The
dNTPs may be present in varying or equimolar amounts, where the
amount of each dNTP typically ranges from about 10 µM to 10 mM,
usually from about 100 µM to 300 µM. Other reagents that may
be present in the reaction mixture include: monovalent cations
(e.g. Na+), divalent cations (e.g. Mg++), buffers (e.g.
Tris), surfactants (e.g. Triton X-100) and the like. In this linear
PCR embodiment of the subject methods, the reaction mixture is
subjected to thermal cycling conditions in which the temperature of
the reaction mixture is cycled through annealing, primer
extension and dissociation temperatures in a manner that results in
the production of linearly amplified amounts of nucleic acid for
each different sequence probe on the template array. The annealing
temperature typically ranges from about 50° C. to 80°
C., usually from about 60° C. to 75° C. and is
maintained for a period of time ranging from about 10 sec. to 10
min., usually from about 30 sec. to 2 min. The primer extension
temperature typically ranges from about 55° C. to 75°
C., usually from about 60° C. to 70° C. and is
maintained for a period of time ranging from about 30 sec. to 10
min., usually from about 1 min. to 5 min. The dissociation
temperature typically ranges from about 80° C. to 99°
C., usually from about 90° C. to 95° C. and is
maintained for a period of time ranging from about 1 sec. to 2 min.,
usually from about 30 sec. to 1 min.
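As a hedged illustration of the cycling parameters above, the following Python sketch encodes the "usual" ranges quoted in this paragraph and checks a candidate cycling profile against them. The names USUAL_RANGES, check_profile and total_minutes, and the example profile values, are assumptions introduced for illustration only and are not part of the disclosure.

```python
# "Usual" ranges quoted in the paragraph above, per step:
# (min temp in deg C, max temp in deg C, min hold in sec, max hold in sec)
USUAL_RANGES = {
    "annealing": (60, 75, 30, 120),
    "extension": (60, 70, 60, 300),
    "dissociation": (90, 95, 30, 60),
}

def check_profile(profile):
    """Return the steps whose temperature or hold time lies outside the usual ranges."""
    problems = []
    for step, (temp_c, seconds) in profile.items():
        t_min, t_max, s_min, s_max = USUAL_RANGES[step]
        if not (t_min <= temp_c <= t_max and s_min <= seconds <= s_max):
            problems.append(step)
    return problems

def total_minutes(profile, cycles):
    """Total hold time in minutes for the given number of cycles (ramp times ignored)."""
    return cycles * sum(seconds for _, seconds in profile.values()) / 60

# Hypothetical profile: step -> (temperature in deg C, hold time in sec)
example_profile = {
    "annealing": (65, 60),
    "extension": (68, 120),
    "dissociation": (94, 30),
}
```

With these assumed values, check_profile(example_profile) reports no out-of-range steps, and 30 cycles correspond to 105 minutes of hold time.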
[0053] In strand displacement amplification, the array of overhang
comprising duplex nucleic acids is employed as primed template in
linear amplification variations of the exponential amplification
protocols described in Walker et al., Nucleic Acids Res. (1992)
20:1691-1696 and Walker et al., Proc. Nat'l Acad. Sci. USA (1992)
89:392-396; as well as in U.S. Pat. No. 5,648,211; the disclosures
of which are herein incorporated by reference. Briefly, isothermal
linear amplification is achieved as follows. Following production
of the array of overhang comprising duplex nucleic acids, the
template array is subjected to a cycle of strand nicking of the
universal primer after sequence cR, typically by using a
restriction endonuclease. Generally, the template strand or probe
sequence is protected via an appropriately placed phosphorothioate
linkage in the surface-bound template strand. Extension of the 3'
end exposed by the nick is then allowed to proceed by using a DNA
polymerase that lacks a 5'→3' exonuclease activity but
possesses a strand displacement activity, e.g., Klenow fragment.
Each cycle in this protocol releases a nucleic acid molecule which
has the formula: 5'-cF-Sij-3'. In certain variants of this method,
nicking may be achieved by making R a half-site for a restriction
endonuclease that exhibits single-strand cleavage activity, or by
employing a nicking endonuclease, such as N.BstNBI, and the
like.
[0054] In yet other embodiments, the subject template array of
duplex nucleic acids is employed in an in vitro transcription
method. In this embodiment, the template array is modified from
that described above to be of the following formula:
(surface)-L-R-(C)Sij-F-5'
[0055] wherein:
[0056] L and R are as defined above;
[0057] F is an RNA polymerase promoter, e.g., T3 or T7 promoter;
and
[0058] (C)Sij is Sij modified to end in a C residue.
[0059] The universal primer employed with this array has the
formula 5'-cR-3'. When the template array is contacted with NTPs,
T3 or T7 polymerase and the appropriate transcription buffer,
ribonucleic acids of the formula 5'-(rG)rcSij-rcF-3' are produced,
where r stands for ribonucleotide. By contacting this resultant
mixture of ribonucleic acids with the DNA primer 5'-F-3' and a
reverse transcriptase, a mixture of deoxyribonucleic acids suitable
for use as primer in target generation protocols is produced.
[0060] The subject template arrays may also be used in other
nucleic acid primer extension generation protocols--the above being
merely representative of the protocols in which the subject
template arrays find use.
[0061] The above described array template based primer extension
generation methods result in the production of a mixture of nucleic
acids, typically a mixture of deoxyribonucleic acids, where each of
the different complement variable domains of the template array is
represented in the mixture, i.e., there is at least one nucleic
acid in the mixture that has a variable domain that hybridizes
under stringent conditions to each different complement variable
domain present on the array. The length of each of the nucleic
acids present in the resultant mixture typically ranges from about
20 to 60 nt, usually from about 25 to 55 nt and more usually from
about 30 to 50 nt. Because of the manner in which the subject
mixtures of nucleic acids are produced, the resultant mixtures of
nucleic acids may be viewed as mixtures of gene specific primers,
where the gene specific primers are specific for each of the
different genes represented on the template array employed in the
production of the nucleic acid mixture. In certain embodiments, the
mixture may be "normalized" with respect to a given mRNA
population, as described above.
[0062] Utility
[0063] The nucleic acid mixtures produced by the subject methods
find use in a variety of different applications, and are
particularly suited for use as primers in the generation of target
nucleic acids, e.g., for array based differential gene expression
analysis applications. Where the subject nucleic acid mixtures are
used as primers for target generation in gene expression analyses,
the first step is to generate a population of target nucleic acids
from an initial mRNA source or sample. By target nucleic acid is
meant a nucleic acid that has a sequence, e.g., Sij, which is
either the same as, or complementary to, the sequence of an mRNA
found in an initial sample, where the target may be DNA or RNA and
be present in amplified amounts as compared to the initial amount
of mRNA, depending on the particular target generation protocol
that is employed.
[0064] In the subject methods, the target or image nucleic acids
are produced from the subject nucleic acid mixtures generally
through enzymatic generation protocols. Specifically, the target
nucleic acids are typically produced using template dependent
polymerization protocols and an initial mRNA source. The initial
mRNA source may be present in a variety of different samples, where
the sample will typically be derived from a physiological source.
The physiological source may be derived from a variety of
eukaryotic or prokaryotic sources, with physiological sources of
interest including sources derived from single-celled organisms
such as yeast and multicellular organisms, including plants and
animals, particularly mammals, where the physiological sources from
multicellular organisms may be derived from particular organs or
tissues of the multicellular organism, or from isolated cells
derived therefrom. In obtaining the sample of RNA to be analyzed
from the physiological source from which it is derived, the
physiological source may be subjected to a number of different
processing steps, where such processing steps might include tissue
homogenization, cell isolation and cytoplasm extraction, nucleic
acid extraction and the like, where such processing steps are known
to those of skill in the art. Methods of isolating RNA from cells,
tissues, organs or whole organisms are known to those of skill in
the art and are described in Maniatis et al. (1989), Molecular
Cloning: A Laboratory Manual 2d Ed. (Cold Spring Harbor Press).
[0065] A number of different enzymatic protocols for generating
image or target nucleic acids from an initial mRNA sample are known
and continue to be developed. Any convenient protocol may be
employed, where the particular protocol employed depends, at least
in part, on a number of factors, including: whether one wants to
generate amplified amounts of target or image nucleic acid; whether
one wants to generate geometrically or linearly amplified amounts
of target nucleic acid; whether bias in the amount of target can be
tolerated, etc. A common feature of the protocols that find use in
preparing the image or target nucleic acids of the subject
invention is the use of the subject nucleic acid mixtures produced
using array-based template protocols described above as primer.
[0066] A number of nucleic acid amplification methods can be
employed to generate the target nucleic acid from an initial mRNA
source, where these methods can employ the subject nucleic acid
mixtures as primer. Such methods include the "polymerase chain
reaction" (PCR) as described in U.S. Pat. No. 4,683,195, the
disclosure of which is herein incorporated by reference, and a
number of transcription-based exponential amplification methods,
such as those described in U.S. Pat. Nos. 5,130,238; 5,399,491; and
5,437,990; the disclosures of which are herein incorporated by
reference. Each of these methods uses primer-dependent nucleic acid
synthesis to generate a DNA or RNA product, which serves as a
template for subsequent rounds of primer-dependent nucleic acid
synthesis. Each process uses (at least) two primer sequences
complementary to different strands of a desired nucleic acid
sequence and results in an exponential increase in the number of
copies of the target sequence.
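The contrast between exponential amplification, as in the two-primer methods just described, and linear amplification can be sketched with simple arithmetic. The Python functions below are an idealized illustration that assumes every cycle is 100% efficient; the function names and the example numbers are hypothetical and are not taken from the cited patents.

```python
def linear_copies(templates, cycles):
    """Idealized linear amplification: each template yields one new copy per cycle."""
    return templates * cycles

def exponential_copies(templates, cycles):
    """Idealized exponential amplification: the copy number doubles every cycle."""
    return templates * 2 ** cycles

# Hypothetical starting point: 1000 template molecules, 20 cycles.
linear_result = linear_copies(1000, 20)
exponential_result = exponential_copies(1000, 20)
```

Under these assumptions, 20 cycles give 20,000 copies in the linear case and roughly one billion in the exponential case, which is one reason the choice between the two depends on how much amplification is wanted and how much bias can be tolerated.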
[0067] Alternatively, amplification methods that utilize a single
primer may be employed to generate target or image nucleic acids
from an initial mRNA sample, where the subject nucleic acid
mixtures are employed as primer. See e.g. U.S. Pat. Nos. 5,554,516;
and 5,716,785; the disclosures of which are herein incorporated by
reference. The methods reported in these patents utilize a single
primer containing an RNA polymerase promoter sequence and a
sequence complementary to the 3'-end of the desired nucleic acid
target sequence(s) ("promoter-primer"). In both methods, the
promoter-primer is added under conditions where it hybridizes to
the target sequence(s) and is converted to a substrate for RNA
polymerase. In both methods, the substrate intermediate is
recognized by RNA polymerase, which produces multiple copies of RNA
complementary to the target sequence(s) ("cRNA").
[0068] Whatever process is employed to generate the target nucleic
acid, where representative protocols have been provided immediately
above, the process may be modified to include the use of chemical
analogs of nucleotides that have been modified to include a label
moiety, e.g., an organic fluorophore, an isotopic label, a capture
ligand, e.g., biotin, etc. As a result, the target nucleic acids
produced using the subject nucleic acid mixtures as primers often
are labeled, either directly or indirectly, for use in subsequent
hybridization assays.
[0069] The above target generation protocols are merely
representative and by no means inclusive of all of the different
types of protocols in which the subject nucleic acid mixtures find
use as primers.
[0070] The resultant populations of target nucleic acids find use
as, inter alia, target in hybridization assays, such as gene
expression analysis applications. Gene expression analysis
protocols are well known to those of skill in the art, and the
populations of target nucleic acids produced by the subject methods
find use in many, if not all, of these protocols. In gene
expression analysis protocols using the subject populations of
labeled target, the population of labeled target is typically
contacted with a population of probe nucleic acids, e.g., on an
array, under hybridization conditions, usually stringent
hybridization conditions. The array may be the same array that is
used as the template array or a different array. Following
hybridization, non-bound target is removed or separated from the
probe, e.g., by washing. Washing results in a pattern of hybridized
target, which may be read using any convenient protocol, e.g., with
a fluorescent scanner device. From this pattern, information
regarding the mRNA expression profile in the initial mRNA sample
from which the target population was produced may be readily
derived or deduced.
[0071] In certain embodiments, the subject methods include a step
of transmitting data from at least one of the detecting and
deriving steps, as described above, to a remote location. By
"remote location" is meant a location other than the location at
the which the array is present and hybridization occur. For
example, a remote location could be another location (e.g. office,
lab, etc.) in the same city, another location in a different city,
another location in a different state, another location in a
different country, etc. The data may be transmitted to the remote
location for further evaluation and/or use. Any convenient
telecommunications means may be employed for transmitting the data,
e.g., facsimile, modem, internet, etc.
[0072] Kits
[0073] Also provided by the subject invention are kits for use in
preparing the subject target populations of nucleic acids. The kits
may comprise containers, each with one or more of the various
reagents (typically in concentrated form) utilized in the methods,
including, for example, buffers, dNTPs, reverse transcriptase,
etc., where the kits will at least include a sufficient amount of
universal primer, e.g., an amount ranging from about 25 pmol to 25
µmol. In addition, the subject kits may include an array of
single stranded probe nucleic acids (or a means for producing the
same) wherein each probe has a constant region and complement
variable region, as described above. Where the kit has a means for
producing the template array, the kit typically includes a
substrate having a planar surface, and one or more reagents
necessary for synthesis of the probes, which may vary depending on
the nature of the protocol to be used to generate the array. The
kits may further include reagents necessary for producing labeled
target nucleic acids, where such reagents may include reverse
transcriptase, labeled dNTPs, etc. A set of instructions will also
typically be included, where the instructions may be associated
with a package insert and/or the packaging of the kit or the
components thereof.
[0074] The following examples are offered by way of illustration
and not by way of limitation.
Experimental
EXAMPLE
[0075] In order to demonstrate the feasibility of using an
oligonucleotide array as a template for enzymatic polynucleotide
synthesis, the following experiment was performed:
[0076] 1. An in situ oligonucleotide array was manufactured; the
array contained 8455 (89×95) features (~100 µm
diameter) with the following sequence:
5'-CTTTCTTGGATCAACCCGCTCAATGCTCCCTATAGTGAGTCGTATTACAATTCATTTTTT-surface (SEQ ID NO:01)
[0077] In the above sequence, the large dash underlines indicate
the unique sequence cSij, the small dashes indicate the
recognition/functional sequence F-R (in this case, a T7 RNA
polymerase promoter) and the continuous underline indicates a
linker sequence Q.
[0078] 2. The array was hybridized for 1 hour at 60° C. to
the following oligonucleotide (PT7, 250 nM):
3'-GATATCACTCAGCATAATGTTAAGTA-5' (SEQ ID NO:02)
[0079] i.e. the complementary strand of the T7 promoter portion of
the oligonucleotide on the surface. The purpose of this treatment
was to produce a double-stranded T7 promoter, which is necessary
for T7 RNA polymerase activity (note that a double-stranded
template strand is not necessary; a 5'-overhanging single-stranded
template is known to be sufficient).
[0080] 3. The array was washed briefly with ice-cold water (to
remove salts from the hybridization buffer) and blown dry with
nitrogen. The hybridization chamber was reassembled and filled with
a transcription mixture (250 µl) containing T7 transcription
buffer (including NTPs), T7 RNA polymerase, 1% Triton X-100 and
the oligonucleotide of step 2 (250 nM). The assembly was incubated
overnight at 40° C. An identical positive control array was
also incubated in contact with the same transcription mixture, with
a soluble version of the array-bound oligonucleotide of step 1
added (HCV185; 250 nM). Finally, a second positive control mixture
was incubated in a PCR tube.
[0081] 4. The transcription mixtures were removed from the
experimental and positive control arrays. Half of each array
mixture was concentrated >10× using a Microcon-3
ultrafiltration concentrator.
[0082] 5. The various samples were analyzed on a 15%
polyacrylamide/4M urea gel, stained with ethidium bromide and
visualized by fluorescence. The results are provided in FIG. 1.
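The base pairing relied on in step 2 can be checked with a short string comparison. The Python sketch below takes the surface-bound probe (SEQ ID NO:01, written 5' to 3') and the PT7 oligonucleotide (SEQ ID NO:02, written 3' to 5' as in step 2) and locates where PT7 pairs with the probe. The helper name binding_site is hypothetical, and the sketch only matches Watson-Crick complements; it does not model hybridization thermodynamics.

```python
# Surface-bound probe, written 5'->3' (3' end attached to the surface).
SEQ1 = "CTTTCTTGGATCAACCCGCTCAATGCTCCCTATAGTGAGTCGTATTACAATTCATTTTTT"
# PT7 oligonucleotide, written 3'->5' exactly as in step 2.
PT7_3TO5 = "GATATCACTCAGCATAATGTTAAGTA"

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def binding_site(probe_5to3, oligo_3to5):
    """Return the 0-based index in the probe where the oligo pairs, or -1 if none.

    Because the oligo is written 3'->5', its base-by-base complement reads
    5'->3' and can be searched for directly in the probe.
    """
    target = oligo_3to5.translate(COMPLEMENT)
    return probe_5to3.find(target)

start = binding_site(SEQ1, PT7_3TO5)
end = start + len(PT7_3TO5)
overhang = SEQ1[:start]   # single-stranded region 5' of the duplex
tail = SEQ1[end:]         # remaining bases between the duplex and the surface
```

The match leaves a 29-nucleotide single-stranded 5' overhang on the probe and a five-base TTTTT stretch between the duplex and the surface, consistent with the 5'-overhanging template described in step 2.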
[0083] The results provided in FIG. 1 clearly show visible
transcript in the concentrated experimental array sample (lane 2).
Separate negative control experiments demonstrated that reactions
which omitted the complementary oligonucleotide PT7 or the T7 RNA
polymerase did not produce visible bands on a similar gel (data not
shown). Microcon concentration of ~80 µl of 250 nM PT7
oligo also failed to yield a visible band on a similar gel (data
not shown). Thus, the observed gel pattern is dependent upon the
presence of T7 RNA polymerase and a double-stranded T7 promoter,
and is not due to the added oligonucleotide PT7. Furthermore, the
chief product of transcription from an array-bound template
displays the same gel migration rate as the chief product of
positive-control transcription reactions. The most likely
explanation for the observed data is that we have reduced to
practice the T7 RNA polymerase version of enzymatic oligonucleotide
production from an array template.
[0084] It is evident that the subject invention provides a number
of advantages over current target nucleic acid generation
protocols. These advantages include the provision of an economical
and rapid synthesis method for custom primer mixtures that are
particularly suited for use in target generation for use with the
nucleic acid arrays. Using the subject methods leads to increased
specificity in microarray based assays. Using the subject methods,
one can develop microarray based assays in which the microarray is
customized to be sensitive or insensitive to various splicing
variants of different genes of interest, even where the splicing
variant is present proximal to the 5' end of the coding sequence.
Allele specific mRNA profiling is possible with the subject methods
by picking the variable region so that the 3'-end of the primer
produced hybridizes at a base where the two alleles differ. In
addition, the subject methods can be employed to easily produce
normalized target nucleic acid mixtures. Accordingly, the invention
represents a significant contribution to the art.
[0085] All publications and patent applications cited in this
specification are herein incorporated by reference as if each
individual publication or patent application were specifically and
individually indicated to be incorporated by reference. The
citation of any publication is for its disclosure prior to the
filing date and should not be construed as an admission that the
present invention is not entitled to antedate such publication by
virtue of prior invention.
[0086] Although the foregoing invention has been described in some
detail by way of illustration and example for purposes of clarity
of understanding, it is readily apparent to those of ordinary skill
in the art in light of the teachings of this invention that certain
changes and modifications may be made thereto without departing
from the scope of the appended claims.
Sequence Listing (2 sequences)
SEQ ID NO: 1; length 60; DNA; Artificial Sequence; synthetic probe
ctttcttgga tcaacccgct caatgctccc tatagtgagt cgtattacaa ttcatttttt 60
SEQ ID NO: 2; length 26; DNA; Artificial Sequence; synthetic probe
gatatcactc agcataatgt taagta 26
* * * * *