U.S. patent application number 17/522708 was published by the patent office on 2022-05-26 for DNA Array.
The applicant listed for this patent is Complete Genomics, Inc. Invention is credited to Matthew J. Callow, Radoje Drmanac, Snezana Drmanac, Brian K. Hauser, George Yeung.
Application Number | 17/522708 |
Publication Number | 20220162694 |
Publication Date | 2022-05-26 |
United States Patent Application 20220162694, Kind Code A1
Drmanac; Radoje; et al.
May 26, 2022
DNA ARRAY
Abstract
Random arrays of single molecules are provided for carrying out
large scale analyses, particularly of biomolecules, such as genomic
DNA, cDNAs, proteins, and the like. In one aspect, arrays of the
invention comprise concatemers of DNA fragments that are randomly
disposed on a regular array of discrete spaced apart regions, such
that substantially all such regions contain no more than a single
concatemer.
Inventors: Drmanac; Radoje (Los Altos Hills, CA); Callow; Matthew J. (Mountain View, CA); Drmanac; Snezana (Los Altos Hills, CA); Hauser; Brian K. (Campbell, CA); Yeung; George (Mountain View, CA)
Applicant:
Name | City | State | Country | Type |
Complete Genomics, Inc | San Jose | CA | US | |
Appl. No.: 17/522708
Filed: November 9, 2021
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number | Child Application |
16994343 | Aug 14, 2020 | | 17522708 |
16425846 | May 29, 2019 | | 16994343 |
15442659 | Feb 25, 2017 | 10351909 | 16425846 |
14714133 | May 15, 2015 | 9650673 | 15442659 |
12882880 | Sep 15, 2010 | | 14714133 |
11451691 | Jun 13, 2006 | 8445194 | 12882880 |
60776415 | Feb 24, 2006 | | |
60725116 | Oct 7, 2005 | | |
60690771 | Jun 15, 2005 | | |
International Class: C12Q 1/6874 (20060101); C12Q 1/682 (20060101); C12Q 1/6869 (20060101); C12Q 1/6806 (20060101); C12Q 1/6837 (20060101); C07H 21/04 (20060101); C07K 1/04 (20060101); G01N 15/14 (20060101)
GOVERNMENT INTERESTS
[0002] This invention was made with Government support under grant
No. 1 U01A1057315-01 awarded by the National Institutes of Health.
The Government has certain rights in the invention.
Claims
1. An array of polymer molecules comprising: a support having a
surface; and a plurality of polymer molecules attached to the
surface, wherein each polymer molecule has a random coil state and
comprises a branched or linear structure of multiple copies of one
or more linear polymeric units, such that the polymer molecule is
attached to the surface within a region substantially equivalent to
a projection of the random coil on the surface and randomly
disposed at a density such that at least thirty percent of the
polymer molecules are separately detectable.
2. The array of polymer molecules of claim 1 wherein said one or
more linear polymeric units are each single stranded
polynucleotides and wherein said surface has reactive
functionalities or capture oligonucleotides attached thereto and
wherein said polymer molecules are each attached to said surface by
one or more linkages formed by one or more reactive functionalities
reacting with complementary functionalities of said polymer
molecules or by one or more complexes formed between the capture
oligonucleotides and complementary sequences of the polymer
molecules.
3. The array of claim 2 wherein said surface is a planar surface
having an array of discrete spaced apart regions, wherein each
discrete spaced apart region has a size substantially equivalent to
said projection of said random coil of said polymer molecule and
contains said reactive functionalities or said capture
oligonucleotides attached thereto.
4. The array of claim 3 wherein each of said discrete spaced apart
regions has an area of less than 1 0.1 to 20 μm².
5. The array of claim 4 wherein said discrete spaced apart regions
form a regular array with a nearest neighbor distance in the range
of from 0.1 to 20 μm and wherein a majority of said discrete
spaced apart regions contain no more than one of said polymer
molecules.
6. The array of claim 5 wherein said polymer molecules are randomly
distributed on said discrete spaced apart regions and wherein said
nearest neighbor distance is in the range of from 0.3 to 3
μm.
7. The array of claim 6 wherein each of said polymer molecules is a
polynucleotide molecule comprising a concatemer of multiple copies
of a target sequence and an adaptor oligonucleotide.
8. The array of claim 4 wherein said discrete spaced apart regions
are wells in said support, the wells each having an opening with an
area equal to or less than that of said discrete spaced apart
regions.
9. An array of polynucleotide molecules comprising: a support
having a surface; and a plurality of polynucleotide molecules
attached to the surface, wherein each polynucleotide molecule has a
random coil state and comprises a concatemer of multiple copies of
a target sequence such that the polynucleotide molecule is attached
to the surface within a region substantially equivalent to a
projection of the random coil on the surface and randomly disposed
at a density such that at least thirty percent of the
polynucleotide molecules have a nearest neighbor distance of at
least fifty nm.
10. The array of claim 9 wherein said surface has reactive
functionalities attached thereto and wherein said polynucleotide
molecules are each attached to said surface by one or more linkages
formed by one or more reactive functionalities reacting with
complementary functionalities of said polynucleotide molecules.
11. The array of claim 10 wherein said surface is a planar surface
having an array of discrete spaced apart regions, wherein each
discrete spaced apart region has a size substantially equivalent to
said projection of said random coil of said polynucleotide molecule
and contains said reactive functionalities attached thereto and
wherein such regions have at most one of said polynucleotides
attached.
12. The array of claim 11 wherein said reactive functionalities are
hydrophobic functionalities.
13. The array of claim 11 wherein said discrete spaced apart region
has an area of less than 1 μm².
14. The array of claim 13 wherein said discrete spaced apart
regions are wells in said support, the wells each having an opening
with an area equal to or less than that of said discrete spaced
apart regions.
15. The array of claim 13 wherein said polynucleotides are randomly
distributed on said discrete spaced apart regions and wherein said
nearest neighbor distance is in the range of from 0.3 to 3
μm.
16. The array of claim 15 wherein substantially every said discrete
spaced apart region has a polynucleotide attached.
17. The array of claim 15 wherein said concatemer comprises
alternating copies of said target sequence and said adaptor
oligonucleotide.
18. The array of claim 17 wherein each of said concatemers contains
at least 10 copies of its respective target sequence.
19. The array of claim 17 wherein said target sequences each have a
length in the range of from 50 to 500 nucleotides and wherein said
adaptor oligonucleotide has a length in the range of from 6 to 60
nucleotides.
20-69. (canceled)
70. A method of determining a nucleotide sequence of a target
polynucleotide, the method comprising the steps of: (a) generating
a plurality of target concatemers from the target polynucleotide,
each target concatemer comprising multiple copies of a fragment of
the target polynucleotide and the plurality of target concatemers
including a number of fragments that substantially covers the
target polynucleotide; (b) forming a random array of target
concatemers fixed to a surface at a density such that at least a
majority of the target concatemers are optically resolvable; (c)
hybridizing one or more probes from a first set of probes to the
random array under conditions that permit the formation of
perfectly matched duplexes between the one or more probes and
complementary sequences on target concatemers; (d) hybridizing one
or more probes from a second set of probes to the random array
under conditions that permit the formation of perfectly matched
duplexes between the one or more probes and complementary sequences
on target concatemers; (e) ligating probes from the first and
second sets hybridized to a target concatemer at contiguous sites;
(f) identifying the sequences of the ligated first and second
probes; and (g) repeating steps (c) through (f) until the sequence
of the target polynucleotide can be determined from the identities
of the sequences of the ligated probes.
71-73. (canceled)
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application is a continuation of U.S. application Ser.
No. 16/994,343, filed Aug. 14, 2020; which is a continuation of
U.S. application Ser. No. 16/425,846, filed May 29, 2019; which is
a continuation of U.S. application Ser. No. 15/442,659, filed Feb.
25, 2017, now U.S. Pat. No. 10,351,909; which is a continuation of
U.S. application Ser. No. 14/714,133, filed May 15, 2015, now U.S.
Pat. No. 9,650,673; which is a continuation of U.S. application
Ser. No. 12/882,880, filed Sep. 15, 2010; which is a continuation of
U.S. application Ser. No. 11/451,691, filed Jun. 13, 2006, now U.S.
Pat. No. 8,445,194; which claims priority from U.S. provisional
application Nos. 60/776,415, filed Feb. 24, 2006, 60/725,116, filed
Oct. 7, 2005, and 60/690,771 filed Jun. 15, 2005, each of which is
hereby incorporated by reference in its entirety.
REFERENCE TO SUBMISSION OF A SEQUENCE LISTING
[0003] The instant application contains a Sequence Listing which
has been submitted via EFS-Web and is hereby incorporated by
reference in its entirety. Said ASCII copy, created on Aug. 14,
2020, is named 092171-1282770-5004-US13_SL.txt, and is 13 KB in
size.
FIELD OF THE INVENTION
[0004] The present invention relates to methods and compositions
for high-throughput analysis of populations of individual
molecules, and more particularly, to methods and compositions
related to fabrication of single molecule arrays and applications
thereof, especially in high-throughput nucleic acid sequencing and
genetic analysis.
BACKGROUND
[0005] Large-scale molecular analysis is central to understanding a
wide range of biological phenomena related to states of health and
disease both in humans and in a host of economically important
plants and animals, e.g. Collins et al (2003), Nature, 422:
835-847; Hirschhorn et al (2005), Nature Reviews Genetics, 6:
95-108; National Cancer Institute, Report of Working Group on
Biomedical Technology, "Recommendation for a Human Cancer Genome
Project," (February, 2005). Miniaturization has proved to be
extremely important for increasing the scale and reducing the costs
of such analyses, and an important route to miniaturization has
been the use of microarrays of probes or analytes. Such arrays play
a key role in most currently available, or emerging, large-scale
genetic analysis and proteomic techniques, including those for
single nucleotide polymorphism detection, copy number assessment,
nucleic acid sequencing, and the like, e.g. Kennedy et al (2003),
Nature Biotechnology, 21: 1233-1237; Gunderson et al (2005), Nature
Genetics, 37: 549-554; Pinkel and Albertson (2005), Nature Genetics
Supplement, 37: S11-S17; Leamon et al (2003), Electrophoresis, 24:
3769-3777; Shendure et al (2005), Science, 309: 1728-1732; Cowie et
al (2004), Human Mutation, 24: 261-271; and the like. However, the
scale of microarrays currently used in such techniques still falls
short of that required to meet the goals of truly low cost analyses
that would make practical such operations as personal genome
sequencing, environmental sequencing to use changes in complex
microbial communities as an indicator of states of health, either
personal or environmental, studies that associate genomic features
with complex traits, such as susceptibilities to cancer, diabetes,
cardiovascular disease, and the like, e.g. Collins et al (cited
above); Hirschhorn et al (cited above); Tringe et al (2005), Nature
Reviews Genetics, 6: 805-814; Service (2006), Science, 311:
1544-1546.
[0006] Increasing the scale of analysis in array-based schemes for
DNA sequencing is particularly challenging as the feature size of
the array is decreased to molecular levels, since most schemes
require not only a procedure for forming high density arrays, but
also repeated cycles of complex biochemical steps that complicate
the problems of array integrity, signal generation, signal
detection, and the like, e.g. Metzker (2005), Genome Research, 15:
1767-1776; Shendure et al (2004), Nature Reviews Genetics, 5:
335-344; Weiss (1999), Science, 283: 1676-1683. Some approaches
have employed high density arrays of unamplified target sequences,
which present serious signal-to-noise challenges, when "sequencing
by synthesis" chemistries have been used, e.g. Balasubramanian et
al, U.S. Pat. No. 6,787,308. Other approaches have employed in situ
amplification of randomly disposed target sequences, followed by
application of "sequencing by synthesis" chemistries. Such
approaches also have given rise to various difficulties, including
(i) significant variability in the size of target sequence
clusters, (ii) gradual loss of phase in extension steps carried out
by polymerases, (iii) lack of sequencing cycle efficiency that
inhibits read lengths, and the like, e.g. Kartalov et al, Nucleic
Acids Research, 32: 2873-2879 (2004); Mitra et al, Anal. Biochem.
320: 55-65 (2003); Metzker (cited above).
[0007] In view of the above, it would be advantageous for the
medical, life science, and agricultural fields if there were
available molecular arrays and arraying techniques that permitted
efficient and convenient analysis of large numbers of individual
molecules, such as DNA fragments covering substantially an entire
mammalian-sized genome, in parallel in a single analytical
operation.
SUMMARY OF THE INVENTION
[0008] In one aspect, the invention provides high density single
molecule arrays, methods of making and using such compositions, and
kits for implementing such methods. Compositions of the invention
in one form include random arrays of a plurality of different
single molecules disposed on a surface, where the single molecules
each comprise a macromolecular structure and at least one analyte,
such that each macromolecular structure comprises a plurality of
attachment functionalities that are capable of forming bonds with
one or more functionalities on the surface. In one aspect, the
analyte is a component of the macromolecular structure, and in
another aspect, the analyte is attached to the macromolecular
structure by a linkage between a unique functionality on such
structure and a reactive group or attachment moiety on the analyte.
In another aspect, compositions of the invention include random
arrays of single molecules disposed on a surface, where the single
molecules each comprise a concatemer of at least one target
polynucleotide and each is attached to the surface by linkages
formed between one or more functionalities on the surface and
complementary functionalities on the concatemer. In another form,
compositions of the invention include random arrays of single
molecules disposed on a surface, where the single molecules each
comprise a concatemer of at least one target polynucleotide and at
least one adaptor oligonucleotide and each is attached to such
surface by the formation of duplexes between capture
oligonucleotides on the surface and the attachment oligonucleotides
in the concatemer. In still another form, compositions of the
invention include random arrays of single molecules disposed on a
surface, where each single molecule comprises a bifunctional
macromolecular structure having a unique functionality and a
plurality of complementary functionalities, and where each single
molecule is attached to the surface by linkages between one or more
functionalities on the surface and complementary functionalities on
the bifunctional macromolecular structure, the unique functionality
having an orthogonal chemical reactivity with respect to the
complementary functionalities and being capable of forming a
covalent linkage with an analyte. In regard to the above
compositions, in another aspect, such single molecules are disposed
in a planar array randomly distributed onto discrete spaced apart
regions having defined positions. Preferably, in this aspect, the
discrete spaced apart regions each have an area that permits the
capture of no more than a single molecule and each is surrounded by
an inter-regional space that is substantially free of other single
molecules.
[0009] In one aspect, the invention includes an array of polymer
molecules comprising: (a) a support having a surface; and (b) a
plurality of polymer molecules attached to the surface, wherein
each polymer molecule has a random coil state and comprises a
branched or linear structure of multiple copies of one or more
linear polymeric units, such that the polymer molecule is attached
to the surface within a region substantially equivalent to a
projection of the random coil on the surface and randomly disposed
at a density such that at least thirty percent of the polymer
molecules are separately detectable. As discussed more fully below,
whenever the polymer molecules are linear, in one embodiment,
"substantially equivalent" in reference to the above projection
means a substantially circular region with a diameter equal to the
root mean square of the end-to-end distance of such linear polymer.
In another embodiment, for linear or branched polymers,
"substantially equivalent" means a substantially circular region
having a diameter that is one half or less of the total length of
the polymer; or in another embodiment one tenth or less; or in
another embodiment, one hundredth or less.
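The "substantially equivalent" footprint described above can be estimated numerically. The following is a minimal Python sketch, assuming the linear polymer behaves as an ideal worm-like chain (a model not named in this application) with a hypothetical rise per nucleotide and persistence length. Excluded-volume effects and intramolecular base pairing are ignored, so the results are order-of-magnitude estimates only, and the function names are illustrative.

```python
import math


def rms_end_to_end_wlc(contour_nm: float, persistence_nm: float) -> float:
    """RMS end-to-end distance of an ideal worm-like chain (assumed model).

    <R^2> = 2*P*L - 2*P^2*(1 - exp(-L/P)); approaches 2*P*L for a long,
    flexible chain (random coil) and L^2 for a short, stiff one.
    """
    L, P = contour_nm, persistence_nm
    mean_sq = 2 * P * L - 2 * P ** 2 * (1 - math.exp(-L / P))
    return math.sqrt(mean_sq)


def footprint_diameters_nm(n_nucleotides: int, nm_per_nt: float, persistence_nm: float) -> dict:
    """Candidate diameters (nm) for the 'substantially equivalent' region.

    nm_per_nt and persistence_nm are hypothetical inputs, not values from the text.
    """
    contour = n_nucleotides * nm_per_nt  # total (contour) length of the polymer
    return {
        "rms_end_to_end": rms_end_to_end_wlc(contour, persistence_nm),
        "half_length": contour / 2,
        "tenth_length": contour / 10,
        "hundredth_length": contour / 100,
    }
```

For example, footprint_diameters_nm(20000, 0.6, 3.0), that is, a hypothetical 20,000-nt concatemer with an assumed 0.6 nm per nucleotide and an assumed 3 nm persistence length, gives an RMS end-to-end distance of about 268 nm, compared with 1,200 nm for one tenth and 120 nm for one hundredth of the 12,000 nm total length.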
[0010] In another aspect, the invention includes an array of
polynucleotide molecules comprising: (a) a support having a
surface; and (b) a plurality of polynucleotide molecules attached
to the surface, wherein each polynucleotide molecule has a random
coil state and comprises a concatemer of multiple copies of a
target sequence such that the polynucleotide molecule is attached
to the surface within a region substantially equivalent to a
projection of the random coil on the surface and randomly disposed
at a density such that at least thirty percent of the
polynucleotide molecules have a nearest neighbor distance of at
least fifty nm.
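The density condition above can be related to a nearest neighbor distribution. The following minimal sketch assumes, for illustration only, that molecules land as a completely random two-dimensional Poisson process. Under that assumption, the fraction of molecules whose nearest neighbor is at least r away is exp(-λπr²) for a density of λ molecules per unit area. The helper names are hypothetical, and a small Monte Carlo simulation is included only as a cross-check of the closed form.

```python
import math
import random


def fraction_nn_at_least(density_per_um2: float, r_um: float) -> float:
    """Fraction of molecules whose nearest neighbour is >= r_um away.

    For a 2-D Poisson (completely random) layout with density lambda,
    P(NN >= r) = exp(-lambda * pi * r^2).
    """
    return math.exp(-density_per_um2 * math.pi * r_um ** 2)


def max_density_for_fraction(min_fraction: float, r_um: float) -> float:
    """Highest density (molecules per um^2) that still meets min_fraction."""
    return -math.log(min_fraction) / (math.pi * r_um ** 2)


def simulate_fraction(n: int, density_per_um2: float, r_um: float, seed: int = 0) -> float:
    """Monte Carlo cross-check: n random points on a square with wrap-around edges."""
    rng = random.Random(seed)
    side = math.sqrt(n / density_per_um2)  # square side giving the requested density
    pts = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n)]
    isolated = 0
    for i, (xi, yi) in enumerate(pts):
        nearest = math.inf
        for j, (xj, yj) in enumerate(pts):
            if i == j:
                continue
            # minimum-image distance on a torus avoids edge effects
            dx = abs(xi - xj)
            dy = abs(yi - yj)
            dx = min(dx, side - dx)
            dy = min(dy, side - dy)
            nearest = min(nearest, math.hypot(dx, dy))
        if nearest >= r_um:
            isolated += 1
    return isolated / n
```

With r = 50 nm (0.05 μm), max_density_for_fraction(0.3, 0.05) is about 153 molecules per μm². Under this idealized random-placement model, the thirty-percent criterion is therefore met at any lower density.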
[0011] The invention also provides a method of making arrays of
polymer molecules, wherein each polymer molecule has a random coil
or another three-dimensional state and comprises a branched or
linear structure of multiple copies of one or more linear polymeric
units, such that each polymer molecule is attached to the surface
within a region substantially equivalent to a projection of the
random coil on the surface, or within a region whose size is one
half or less, one tenth or less, or one hundredth or less of the
total length of the polymer, and the polymer molecules are randomly
disposed at a density such that at least twenty or at least thirty
percent of them are separately detectable.
[0012] In still another aspect, the invention provides an array of
single molecules comprising: (a) a support having a planar surface
having a regular array of discrete spaced apart regions, wherein
each discrete spaced apart region has an area of less than 1
μm² and contains reactive functionalities attached thereto;
and (b) a plurality of single molecules attached to the surface,
wherein each single molecule comprises a macromolecular structure
and at least one analyte having an attachment moiety, such that
each macromolecular structure comprises a unique functionality and
a plurality of attachment functionalities that are capable of
forming linkages with the reactive functionalities of the discrete
spaced apart regions, and such that the analyte is attached to the
macromolecular structure by a linkage between the unique
functionality and the attachment moiety of the analyte, wherein the
plurality of single molecules are randomly disposed on the discrete
spaced apart regions such that at least a majority of the discrete
spaced apart regions contain only one single molecule.
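Region size matters for the regular array described above. To show why, the following sketch assumes, purely for illustration, that molecules land on regions independently (Poisson loading) with no size exclusion, and computes the resulting occupancy fractions. The function name and the Poisson assumption are illustrative and do not come from the application.

```python
import math


def occupancy(mean_per_region: float) -> dict:
    """Occupancy fractions under assumed Poisson loading (no size exclusion)."""
    lam = mean_per_region
    p0 = math.exp(-lam)            # region left empty
    p1 = lam * math.exp(-lam)      # region holds exactly one molecule
    return {
        "empty": p0,
        "single": p1,
        "multiple": 1 - p0 - p1,   # two or more molecules (co-localized)
        "at_most_one": p0 + p1,
        "single_among_occupied": p1 / (1 - p0) if lam > 0 else 0.0,
    }
```

At a mean of one molecule per region, occupancy(1.0) gives about 36.8% empty, 36.8% singly occupied, and 26.4% multiply occupied regions. Under this model, 1/e is the largest possible single-occupancy fraction. If "only one" is read as exactly one, a majority therefore depends on more than random loading, for example on regions sized to capture no more than a single molecule, as described in [0008]. The weaker condition of at most one molecule per region is met under this model for any mean below about 1.68.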
[0013] In another aspect, the invention provides an array of
polynucleotide molecules comprising: (a) a support having a surface
with capture oligonucleotides attached thereto; and (b) a plurality
of polynucleotide molecules attached to the surface, wherein each
polynucleotide molecule comprises a concatemer of multiple copies
of a target sequence and an adaptor oligonucleotide such that the
polynucleotide molecule is attached to the surface by one or more
complexes formed between capture oligonucleotides and adaptor
oligonucleotides, the polynucleotide molecules being randomly
disposed on the surface at a density such that at least a majority
of the polynucleotide molecules have a nearest neighbor distance of
at least fifty nm. In one embodiment of this aspect, the surface is
a planar surface having an array of discrete spaced apart regions,
wherein each discrete spaced apart region has a size equivalent to
that of the polynucleotide molecule and contains the capture
oligonucleotides attached thereto and wherein substantially all
such regions have at most one of the polynucleotide molecules
attached.
[0014] The invention further includes a method of making an array
of polynucleotide molecules comprising the following steps: (a)
generating a plurality of polynucleotide molecules each comprising
a concatemer of a DNA fragment from a source DNA and an adaptor
oligonucleotide; and (b) disposing the plurality of polynucleotide
molecules onto a support having a surface with capture
oligonucleotides attached thereto so that the polynucleotide
molecules are fixed to the surface by one or more complexes formed
between capture oligonucleotides and adaptor oligonucleotides and
so that the polynucleotide molecules are randomly distributed on
the surface at a density such that a majority of the polynucleotide
molecules have a nearest neighbor distance of at least fifty nm,
thereby forming the array of polynucleotide molecules.
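Steps (a) and (b) above rely on each concatemer carrying many copies of the adaptor, each of which is a potential capture site. The toy sketch below assumes that the concatemer is made by rolling circle amplification (RCA, as in FIG. 12) of a circle formed from one fragment plus one adaptor. The product is then tandem repeats of the circle's reverse complement. The sequences are hypothetical. For illustration only, the capture oligonucleotide is also assumed to have the adaptor's own sequence, so that it pairs with each adaptor complement in the product.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def rca_concatemer(fragment: str, adaptor: str, copies: int) -> str:
    """Toy RCA product of the circle fragment+adaptor (hypothetical model).

    The polymerase copies the circle, so the product is tandem repeats of the
    circle's reverse complement; the start point is fixed for simplicity.
    """
    circle = fragment + adaptor
    return revcomp(circle) * copies


def capture_sites(concatemer: str, capture_oligo: str) -> list[int]:
    """Start positions where capture_oligo could form a perfect duplex."""
    site = revcomp(capture_oligo)  # target sequence the capture oligo pairs with
    k = len(site)
    return [i for i in range(len(concatemer) - k + 1) if concatemer[i:i + k] == site]
```

For example, rca_concatemer("GATTACAGATTACA", "ACCTGAGT", 3) is 66 nt long, and capture_sites on it with the capture oligonucleotide "ACCTGAGT" returns [0, 22, 44], one site per repeat. Each site could form one of the complexes between capture and adaptor oligonucleotides recited in step (b).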
[0015] In another aspect, the invention provides a method of
determining a nucleotide sequence of a target polynucleotide, the
method comprising the steps of: (a) generating a plurality of
target concatemers from the target polynucleotide, each target
concatemer comprising multiple copies of a fragment of the target
polynucleotide and the plurality of target concatemers including a
number of fragments that substantially covers the target
polynucleotide; (b) forming a random array of target concatemers
fixed to a surface at a density such that at least a majority of
the target concatemers are optically resolvable; (c) identifying a
sequence of at least a portion of each fragment in each target
concatemer; and (d) reconstructing the nucleotide sequence of the
target polynucleotide from the identities of the sequences of the
portions of fragments of the concatemers. In one embodiment of this
aspect, the step of identifying includes the steps of (a)
hybridizing one or more probes from a first set of probes to the
random array under conditions that permit the formation of
perfectly matched duplexes between the one or more probes and
complementary sequences on target concatemers; (b) hybridizing one
or more probes from a second set of probes to the random array
under conditions that permit the formation of perfectly matched
duplexes between the one or more probes and complementary sequences
on target concatemers; (c) ligating probes from the first and
second sets hybridized to a target concatemer at contiguous sites;
(d) identifying the sequences of the ligated first and second
probes; and (e) repeating steps (a) through (d) until the sequence
of the target polynucleotide can be determined from the identities
of the sequences of the ligated probes.
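The ligation-based identification steps above can be illustrated with a toy model. The sketch below assumes, for illustration, that the first probe set contains an anchor probe complementary to the adaptor. It also assumes that the second set contains degenerate query probes with a single defined base; compare the A-, C-, G- and T-specific ligation probe pairs of FIG. 11. A base is called when exactly one query probe forms a perfect duplex at the site contiguous with the anchor. Hybridization stringency, labels, and imaging are not modeled, and all names and sequences are hypothetical.

```python
COMP = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def revcomp(seq: str) -> str:
    return "".join(COMP[b] for b in reversed(seq))


def duplex_ok(site: str, probe: str) -> bool:
    """Perfect-match test; N in the probe is degenerate and pairs with any base."""
    needed = revcomp(probe)  # target-strand sequence the probe would pair with
    return len(site) == len(needed) and all(n in ("N", s) for n, s in zip(needed, site))


def ligated_bases(target: str, anchor: str, anchor_start: int, offset: int,
                  query_len: int = 5) -> list[str]:
    """Bases whose query probe would ligate next to a perfectly matched anchor."""
    a_end = anchor_start + len(anchor)
    if not duplex_ok(target[anchor_start:a_end], anchor):
        return []  # no anchor duplex, so no contiguous ligation
    site = target[a_end:a_end + query_len]  # site contiguous with the anchor
    hits = []
    for base in "ACGT":
        pattern = "N" * offset + base + "N" * (query_len - offset - 1)
        probe = revcomp(pattern)  # query probe complementary to that pattern
        if duplex_ok(site, probe):
            hits.append(base)
    return hits


def read_positions(target: str, anchor: str, anchor_start: int, query_len: int = 5) -> str:
    """Repeat the probing cycle once per offset; N marks an ambiguous call."""
    calls = []
    for offset in range(query_len):
        hits = ligated_bases(target, anchor, anchor_start, offset, query_len)
        calls.append(hits[0] if len(hits) == 1 else "N")
    return "".join(calls)
```

For the hypothetical target "ACTCAGGTTGTAATCTGTAATC" and anchor "ACCTGAGT", read_positions(target, anchor, 0) returns "TGTAA", the five bases immediately following the anchor site.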
[0016] In another aspect, the invention includes kits for making
random arrays of the invention and for implementing applications of
the random arrays of the invention, particularly high-throughput
analysis of one or more target polynucleotides.
[0017] Among other advantages, the methods of the invention provide
flexibility in making and using an array of structured random
arrays for more efficient haplotype and splice variant
determination, analysis of multiple samples in parallel, staggered
sequencing reactions that eliminate the idle time of CCD detectors,
and parallel probing cycles that shorten the sequencing completion
time of longer DNA fragments.
[0018] The present invention provides a significant advance in the
microarray field by providing arrays of single molecules comprising
linear and/or branched polymer structures that may incorporate or
have attached target analyte molecules. In one form, such single
molecules are concatemers of target polynucleotides arrayed at
densities that permit efficient high resolution analysis of
mammalian-sized genomes, including sequence determination of all or
substantial parts of such genomes, sequence determination of tagged
fragments from selected regions of multiple genomes, digital
readouts of gene expression, and genome-wide assessments of copy
number patterns, methylation patterns, chromosomal stability,
individual genetic variation, and the like.
BRIEF DESCRIPTION OF THE DRAWINGS
[0019] FIGS. 1A-1I illustrate various embodiments of the methods
and compositions of the invention.
[0020] FIGS. 2A-2B illustrate methods of circularizing genomic DNA
fragments for generating concatemers of polynucleotide
analytes.
[0021] FIG. 3 is an image of a glass surface containing a
disposition of concatemers of E. coli fragments.
[0022] FIG. 4 is an image of concatemers derived from two different
organisms that are selectively labeled using oligonucleotide
probes.
[0023] FIG. 5 is an image of concatemers of DNA fragments that
contain a degenerate base, each of which is identified by a
specific ligation probe.
[0024] FIG. 6 is an image of concatemers of DNA fragments that
contain a segment of degenerate bases, pairs of which are
identified by specific probes.
[0025] FIG. 7 is a scheme for identifying sequence differences
between reference sequences and test sequences using enzymatic
mismatch detection and for constructing DNA circles therefrom.
[0026] FIG. 8 is another scheme for identifying sequence differences
between a reference sequence and a test sequence using enzymatic
mismatch detection and for constructing DNA circles therefrom.
[0027] FIG. 9 shows general elements of the universal nano-ball
probe template single stranded DNA circle.
[0028] FIG. 10 shows three images overlaid with slight shifts using
the MetaMorph software. The blue image corresponds to the result of
hybridization of the BrPrb3 probe (the adaptor probe) to the array.
The red image, shifted slightly above the blue image, corresponds to
the result of hybridization of the Ba3 probe to the array. The green
image, shifted slightly below the blue image, corresponds to the
result of hybridization of the Yp3 probe to the array. The circle
denoted `A` indicates the position of one of the spots that
co-hybridize with both the adaptor probe and the Ba3 probe, while the
circle denoted `B` indicates the position of one of the spots that
co-hybridize with both the adaptor probe and the Yp3 probe. Note:
these arrays are produced by attaching DNA nano-balls, without any
size selection, to a glass surface covered with a carpet of capture
oligonucleotides. We are working on applying nano-printing or
surface patterning by photochemistry technologies to produce a glass
substrate containing a grid of DNA nano-ball binding sites, where
each site is about 0.25-0.50 micrometer in size and surrounded by
0.75 micron or 0.50 micron of surface that does not bind DNA. Only
one DNA nano-ball will be able to attach to such a binding site.
This will produce a regular grid of individual submicron DNA spots
of similar
size.
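The planned grid described above implies a binding-site density that can be checked with simple arithmetic. This sketch assumes a square grid whose pitch (center-to-center spacing) equals the site width plus the non-binding gap. If each site instead has its own border, the gap between neighbors would be twice the stated width. The function name is illustrative.

```python
def sites_per_mm2(site_um: float, gap_um: float) -> float:
    """Binding sites per mm^2 on a square grid, assuming pitch = site + gap."""
    pitch_um = site_um + gap_um          # center-to-center spacing
    return 1e6 / pitch_um ** 2           # 1 mm^2 = 1e6 um^2, one site per pitch^2
```

Under this assumption, 0.25 μm sites with 0.50 μm gaps (0.75 μm pitch) give about 1.78 million sites per mm², and 0.50 μm sites with 0.75 μm gaps (1.25 μm pitch) give 640,000 sites per mm². Both pitches fall within the 0.3 to 3 μm nearest neighbor range of claims 6 and 15.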
[0029] FIG. 11 shows five images overlaid with slight shifts using
the MetaMorph software. The blue image corresponds to the result of
hybridization of the BrPrb3 probe (the adaptor probe) to the array.
The red image corresponds to hybridization with the A-specific
ligation probe pair (T1Aa9 and T1Ab9), the green image to
hybridization with the C-specific ligation probe pair (T1Aa10 and
T1Ab9), the yellow image to hybridization with the G-specific
ligation probe pair (T1Aa11 and T1Ab9), and the cyan image to
hybridization with the T-specific ligation probe pair (T1Aa12 and
T1Ab9). The circle denoted `A` indicates the position of one of the
spots that co-hybridize with both the adaptor probe and the
A-specific ligation probe pair; the circles denoted `C`, `G` and `T`
are analogous. Note: these arrays are produced by attaching DNA
nano-balls, without any size selection, to a glass surface covered
with a carpet of capture oligonucleotides. We are working on
applying nano-printing or surface patterning by photochemistry
technologies to produce a glass substrate containing a grid of DNA
nano-ball binding sites, where each site is about 0.25-0.50
micrometer in size and surrounded by 0.75 micron or 0.50 micron of
surface that does not bind DNA. Only one DNA nano-ball will be able
to attach to such a binding site. This will produce a regular grid
of individual submicron DNA spots of similar size.
[0030] FIG. 12 shows attachment of single stranded concatemers to
a glass surface. An RCA-generated concatemer of a 94mer was incubated on
a capture-probe coated glass in the presence of TAMRA labeled
probe. a, initially attached single molecule concatemers with
partial attachment and extension of one molecule; b, final
attachment and condensation of the molecule due to hybridization to
capture probes. c, image shows concatemer threads on glass with no
capture probes. Random arrays were imaged on our rSBH
instrument.
[0031] FIG. 13 shows an image of randomly distributed concatemers
hybridized to capture oligonucleotides. Sequences were detected
with a TAMRA labeled probe to adapter sequences.
[0032] FIG. 14 shows a circle formation schema: A. Ligation of an
adapter to 5' end of genomic fragment via universal template. B.
Closing of the adapter-modified fragment having 3'-polyA tail using
a bridging template. Gel tests: A. Preservation of DNA circles (top
band) with Exonuclease V digestion. B. In the presence of Phi29 DNA
polymerase high molecular weight DNA molecules are observed,
indicating the success of the rolling circle amplification.
[0033] FIG. 15 shows that PCR amplification with tailed primers (1) is
followed by strand removal or strand separation and the addition of
a bridging oligonucleotide (2). Circle formation proceeds utilizing
the bridge and DNA ligase (3).
[0034] FIG. 16 shows a comparison of structured and standard random
DNA arrays made by attaching RCR products. On the left is a
standard random array with capture oligonucleotides spread over the
entire glass slide (black bars shown in the side view at the
top left). RCR concatemer products are randomly attached at a low
spot density (bottom panel) in order to prevent co-localization of
multiple DNA products per spot (blue and green concatemer chains).
The spots vary in size (up to 1 μm) due to [...]
FIG. 17 shows that single stranded DNA is amplified and captured on
a solid support through biotinylated reverse primers.
[0035] FIG. 18 shows that mismatches are formed along the 10 kb
heteroduplex from test and reference DNA (panel (i)). After
cleavage each fragment can bind two adapter A molecules that will
release a large proportion of the genomic DNA and capture the
mutated regions (panel (ii)).
[0036] FIG. 19 shows that biotinylated (b) test DNA is mixed with a
reference DNA and after heat denaturation and annealing produces a
population of biotinylated heteroduplexes and non-biotinylated
homoduplexes. Heteroduplexes containing polymorphisms are attached
to the surface with streptavidin (S) for isolation from reference
DNA. (Panel (i)). Mismatches are formed along the 10 kb
heteroduplex from test and reference DNA (Panel (ii)). After
cleavage each fragment can bind two adapter molecules that will
release a large proportion of the genomic DNA and capture the
mutated regions (Panel (iii)).
[0037] FIG. 20 shows production, capture and amplification of DNA
mismatches.
[0038] FIG. 21 shows Method 11 for production, capture and
amplification of DNA mismatches.
DETAILED DESCRIPTION OF THE INVENTION
[0039] The practice of the present invention may employ, unless
otherwise indicated, conventional techniques and descriptions of
organic chemistry, polymer technology, molecular biology (including
recombinant techniques), cell biology, biochemistry, and
immunology, which are within the skill of the art. Such
conventional techniques include polymer array synthesis,
hybridization, ligation, and detection of hybridization using a
label. Specific illustrations of suitable techniques can be had by
reference to the example herein below. However, other equivalent
conventional procedures can, of course, also be used. Such
conventional techniques and descriptions can be found in standard
laboratory manuals such as Genome Analysis: A Laboratory Manual
Series (Vols. I-IV), Using Antibodies: A Laboratory Manual, Cells:
A Laboratory Manual, PCR Primer: A Laboratory Manual, and Molecular
Cloning: A Laboratory Manual (all from Cold Spring Harbor
Laboratory Press), Stryer, L. (1995) Biochemistry (4th Ed.)
Freeman, New York, Gait, "Oligonucleotide Synthesis: A Practical
Approach" 1984, IRL Press, London, Nelson and Cox (2000),
Lehninger, Principles of Biochemistry 3rd Ed., W. H. Freeman
Pub., New York, N.Y. and Berg et al. (2002) Biochemistry, 5th
Ed., W. H. Freeman Pub., New York, N.Y., all of which are herein
incorporated in their entirety by reference for all purposes.
[0040] The invention provides random single molecule arrays for
large-scale parallel analysis of populations of molecules,
particularly DNA fragments, such as genomic DNA fragments.
Generally, single molecules of the invention comprise an attachment
portion and an analyte portion. The attachment portion comprises a
macromolecular structure that provides for multivalent attachment
to a surface, particularly a compact or restricted area on a
surface so that signals generated from it or an attached analyte
are concentrated. That is, the macromolecular structure occupies a
compact and limited region of the surface. Macromolecular
structures of the invention may be bound to a surface in a variety
of ways. Multi-valent bonds may be covalent or non-covalent.
Non-covalent bonds include formation of duplexes between capture
oligonucleotides on the surface and complementary sequences in the
macromolecular structure, and adsorption to a surface by attractive
noncovalent interactions, such as van der Waals forces, hydrogen
bonding, ionic and hydrophobic interactions, and the like.
Multi-valent covalent bonding may be accomplished, as described
more fully below, by providing reactive functionalities on the
surface that can react with a plurality of complementary
functionalities in the macromolecular structures. An analyte
portion may be attached to a macromolecular structure by way of a
unique linkage or it may form a part of, and be integral with, the
macromolecular structure. Single molecules of the invention are
disposed randomly on a surface of a support material, usually from
a solution; thus, in one aspect, single molecules are uniformly
distributed on a surface in close approximation to a Poisson
distribution. In another aspect, single molecules are disposed on a
surface that contains discrete spaced apart regions in which single
molecules are attached. Preferably, macromolecular structures,
preparation methods, and areas of such discrete spaced apart
regions are selected so that substantially all such regions contain
at most one single molecule. Preferably, single molecules of
the invention, particularly concatemers, are roughly in a random
coil configuration on a surface and are confined to the area of a
discrete spaced apart region. In one aspect, the discrete spaced
apart regions have defined locations in a regular array, which may
correspond to a rectilinear pattern, hexagonal pattern, or the
like. A regular array of such regions is advantageous for detection
and data analysis of signals collected from the arrays during an
analysis. Also, single molecules confined to the restricted area of
a discrete spaced apart region provide a more concentrated or
intense signal, particularly when fluorescent probes are used in
analytical operations, thereby providing higher signal-to-noise
values. Single molecules of the invention are randomly distributed
on the discrete spaced apart regions so that a given region usually
is equally likely to receive any of the different single molecules.
In other words, the resulting arrays are not spatially addressable
immediately upon fabrication, but may be made so by carrying out an
identification or decoding operation. That is, the identities of
the single molecules are discernible, but not initially known. As described
more fully below, in some embodiments, there are subsets of
discrete spaced apart regions that receive single molecules only
from corresponding subsets, for example, as defined by
complementary sequences of capture oligonucleotides and adaptor
oligonucleotides.
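The paragraph above notes that single molecules deposited from solution are distributed in close approximation to a Poisson distribution. As a rough illustration, assuming that each discrete spaced apart region independently receives a Poisson-distributed number of molecules with mean λ (and ignoring any exclusion effect once a region is occupied), the expected fractions of empty, singly occupied, and multiply occupied regions can be computed as below. The function name is hypothetical.

```python
import math


def poisson_occupancy(mean_per_region: float) -> dict:
    """Expected fractions of regions with 0, exactly 1, or 2+ molecules,
    assuming each region independently receives a Poisson-distributed
    number of molecules with the given mean (an idealized model)."""
    lam = mean_per_region
    p_empty = math.exp(-lam)
    p_single = lam * math.exp(-lam)
    return {
        "empty": p_empty,
        "single": p_single,
        "multiple": 1.0 - p_empty - p_single,
    }


# Tabulate occupancy for a few illustrative mean loadings.
for lam in (0.1, 0.5, 1.0, 2.0):
    occ = poisson_occupancy(lam)
    print(f"mean={lam:.1f}  empty={occ['empty']:.3f}  "
          f"single={occ['single']:.3f}  multiple={occ['multiple']:.3f}")
```

Under this idealized model the single-occupancy fraction λe^(-λ) never exceeds 1/e (about 36.8%, reached at λ = 1), while keeping multiple occupancy low (below 0.5% at λ = 0.1) leaves most regions empty. This suggests why dilution alone is a blunt tool, and why the description also relies on region size and on limiting the number of capture moieties per region.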
[0041] In one aspect the invention provides products and processes
for making them. For example, in one approach, preparation of DNA
and detection and quantification arrays includes providing a
mixture of DNA fragments of 10, 20, 50, 100 or more bases and shorter
than 25, 50, 100, 500, 1000, 2000, 5000, or 10,000 bases from a
source DNA. In embodiments, DNA arrays are formed by
attaching concatemers of the same fragment or by in-situ
amplification of a single DNA molecule. In embodiments, the DNA in
each spot is identified by hybridization signature or partial or
complete sequence determination. Some embodiments comprise RCR-based
formation of DNA concatemers with or without sequence
complementary to the support-bound capture oligonucleotide. Some
embodiments utilize a support with a grid of regions with DNA
capture chemistry separated by surface without DNA capture
chemistry, each region being 0.1-10 μm in size with a
center-to-center distance of about 0.2 to 20 μm. In some
embodiments, the source DNA comprises all sequence variants of a
given length of 8 to 20 bases. In some embodiments, the methods
comprise identifying the sequence of a nano-ball by ligation of two
adapter-dependent or adapter-independent oligonucleotides, using
individual probes or pools of probes with 0 to about 8 informative
bases. In some embodiments,
the invention comprises highly multiplexed DNA detection and
quantification methods consisting of providing a DNA array
containing more than 100,000, more than one million, or more than
ten million DNA spots identified by hybridization signature or
partial or complete sequence; hybridizing a target sample comprising
labeled or tagged DNA fragments (or DNA fragments able to be labeled
or tagged) under conditions allowing the formation of complementary
DNA hybrids; detecting bound labels/tags or bound DNA in array
spots; and analyzing data to detect and quantify DNA molecules in the
sample substantially complementary to one or more DNAs on the
array. In some embodiments, DNA arrays are prepared using RCR-based
formation of DNA concatemers with or without sequence complementary
to the support-bound capture oligonucleotide. Some embodiments include
a washing step before a detecting step to remove non-hybridized
DNA. Some embodiments include a stringent washing step before a
detecting step to remove non-hybridized DNA and DNA hybridized to
targets with a larger number of mismatches. Some embodiments include
performing multiple detection steps during washes of increasing
stringency (for example, higher temperature or higher pH). Some
embodiments include determining gene expression and/or alternative
splicing; gene deletion or duplication; pathogen detection,
quantification and characterization; SNP detection; mutation
discovery; microbe detection and quantification in natural sources;
DNA sequencing; and industrial uses in agriculture, food pathogens,
medical diagnostics, and cancer samples. In some embodiments, labeling
or tagging of sample molecules is done after binding them to the
detector molecules in the array. In one aspect the invention
provides a support with spots of DNA/RNA having natural or analog
bases, arranged in a grid or random spot array, with informative
single stranded DNA longer than 15, or 25, or 50, or 75, or 100, or
125, or 150, or 200, or 250, or 300, or 400, or 500, or 750, or 1000
bases, and more than 10,000, or 100,000, or 1 million spots per mm²
containing multiple copies of the same DNA per spot, wherein more
than 1000, or 10,000, or 100,000 different DNAs are present in the
array and which DNA is at which spot is determined after DNA
attachment. In some embodiments, more than 50, 60, 70, 80, 90 or 95%
of spots in the grid have a single informative DNA species, excluding
errors produced by amplification. In some embodiments the invention
provides a plate with 2, 4, 6, 8, 10, 12, 16, 24, 32, 48, 64, 96,
192, 384 or more such DNA arrays, where in most cases the same DNA is
in different spots in the individual arrays. In some embodiments, an
array containing DNA fragments from multiple species (2-2000,
10-2000, 20-2000, 50-2000, 100-2000, 100-10,000, or 500-10,000
species) is provided. In some embodiments, an array contains DNA
fragments that have SNP or other differences between individuals or
species. In some embodiments, DNA copies per spot are produced by RCR
before attachment. In some embodiments, the DNA is isolated from
natural sources. In some embodiments, the identity or sequence of
DNA/RNA or other detector molecule in usable spots is inferred by
matching hybridization or other binding signature or partial or
complete polymer sequence to a reference database of signatures or
sequences.
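Where the source DNA is all sequence variants of a given length k (8 to 20 bases), the number of distinct species grows as 4^k. The short sketch below, whose function names are illustrative and not taken from the source, enumerates the variants lazily and counts them, so the count can be compared with the array sizes of more than 100,000, one million, or ten million spots mentioned above.

```python
from itertools import islice, product


def all_variants(k: int):
    """Yield every DNA sequence of length k (4**k in total), lexicographically."""
    for bases in product("ACGT", repeat=k):
        yield "".join(bases)


def variant_count(k: int) -> int:
    """Number of distinct sequences of length k over the alphabet A, C, G, T."""
    return 4 ** k


for k in (8, 12, 16, 20):
    print(f"k={k:2d}: {variant_count(k):,} variants")

# The generator is lazy, so only the requested variants are materialized.
print(list(islice(all_variants(8), 3)))
```

For k = 8 there are 65,536 variants, fewer than the 100,000 spots of the smallest array size mentioned, whereas 4^20 (about 1.1 trillion) far exceeds ten million spots. This is arithmetic only and says nothing about how many spots per variant a given analysis would need.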
[0042] Described herein are DNA/RNA and their derivatives or
peptides or proteins, and other array products, including processes
for their preparation and uses, that are based on applying mixtures
of detecting molecules of partially or fully known primary
structure or polymer sequence, preferably as concatemers of the
same molecule, on substrates with a pattern of high density small
binding sites separated by non-binding surface, followed by
determining which detecting molecule from the mixture is attached
at which binding site.
[0043] Macromolecular structures of the invention comprise
polymers, either branched or linear, and may be synthetic, e.g.
branched DNA, or may be derived from natural sources, e.g. linear
DNA fragments from a patient's genomic DNA. Usually, macromolecular
structures comprise concatemers of linear single stranded DNA
fragments that can be synthetic, derived from natural sources, or
can be a combination of both. As used herein, the term "target
sequence" refers to either a synthetic nucleic acid or a nucleic
acid derived from a natural source, such as a patient specimen, or
the like. Usually, target sequences are part of a concatemer
generated by methods of the invention, e.g. by RCR, but may also be
part of other structures, such as dendrimers, and other branched
structures. When target sequences are synthetic or derived from
natural sources, they are usually replicated by various methods in
the process of forming macromolecular structures or single
molecules of the invention. It is understood that such methods can
introduce errors into copies, which nonetheless are encompassed by
the term "target sequence."
[0044] Particular features or components of macromolecular
structures may be selected to satisfy a variety of design
objectives in particular embodiments. For example, in some
embodiments, it may be advantageous to maintain an analyte molecule
as far from the surface as possible, e.g. by providing an
inflexible molecular spacer as part of a unique linkage. As another
example, reactive functionalities may be selected as having a size
that effectively prevents attachment of multiple macromolecular
structures to one discrete spaced apart region. As still another
example, macromolecular structures may be provided with other
functionalities for a variety of other purposes, e.g. enhancing
solubility, promoting formation of secondary structures via
hydrogen bonding, and the like.
[0045] In one aspect, macromolecular structures are sufficiently
large that their size, e.g. a linear dimension (such as a diameter)
of a volume occupied in a conventional physiological saline
solution, is approximately equivalent to that of a discrete spaced
apart region. For macromolecular structures that are linear
polynucleotides, in one aspect, sizes may range from a few thousand
nucleotides, e.g. 10,000, to several hundred thousand nucleotides,
e.g. 100-200 thousand. As explained more fully below, in several
embodiments, such macromolecular structures are made by generating
circular DNAs and then replicating them in a rolling circle
replication reaction to form concatemers of complements of the
circular DNAs.
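To relate the stated concatemer lengths (about 10,000 to 200,000 nucleotides) to the size of a discrete spaced apart region, one common approximation is the worm-like chain (Kratky-Porod) model, in which the mean-square end-to-end distance is <R^2> = 2PL[1 - (P/L)(1 - exp(-L/P))] for contour length L and persistence length P. The sketch below assumes a rise of 0.6 nm per nucleotide and P = 2 nm for single stranded DNA. Both values are illustrative assumptions (real values depend on salt and sequence), excluded-volume swelling is ignored, and the function name is hypothetical.

```python
import math


def wlc_rms_end_to_end_nm(n_nt: int, rise_nm: float = 0.6,
                          persistence_nm: float = 2.0) -> float:
    """RMS end-to-end distance (nm) of an ideal worm-like chain.

    rise_nm and persistence_nm are assumed, illustrative ssDNA values;
    excluded-volume effects are ignored."""
    contour = n_nt * rise_nm          # contour length L
    p = persistence_nm                # persistence length P
    r2 = 2.0 * p * contour * (1.0 - (p / contour) * (1.0 - math.exp(-contour / p)))
    return math.sqrt(r2)


for n in (10_000, 50_000, 100_000, 200_000):
    print(f"{n:>7,} nt -> ~{wlc_rms_end_to_end_nm(n):.0f} nm")
```

With these assumptions, 10,000 nt gives roughly 155 nm and 100,000 nt roughly 490 nm, the same order as the 100 to 300 nm coil diameters and the 200 to 500 nm region sizes given elsewhere in this description. The numbers shift with the assumed parameters.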
[0046] The above concepts are illustrated more fully in the
embodiments shown schematically in FIGS. 1A-1G. After describing
these Figures, elements of the invention are disclosed in
additional detail and examples are given. As mentioned above, in
one aspect, macromolecular structures of the invention are single
stranded polynucleotides comprising concatemers of a target
sequence or fragment. In particular, such polynucleotides may be
concatemers of a target sequence and an adaptor oligonucleotide.
For example, source nucleic acid (1000) is treated (1001) to form
single stranded fragments (1006), preferably in the range of from
50 to 600 nucleotides, and more preferably in the range of from 300
to 600 nucleotides, which are then ligated to adaptor
oligonucleotides (1004) to form a population of adaptor-fragment
conjugates (1002). Source nucleic acid (1000) may be genomic DNA
extracted from a sample using conventional techniques, or a cDNA or
genomic library produced by conventional techniques, or synthetic
DNA, or the like. Treatment (1001) usually entails fragmentation by
a conventional technique, such as chemical fragmentation, enzymatic
fragmentation, or mechanical fragmentation, followed by
denaturation to produce single stranded DNA fragments. Adaptor
oligonucleotides (1004), in this example, are used to form (1008) a
population (1010) of DNA circles by the method illustrated in FIG.
2A. In one aspect, each member of population (1010) has an adaptor
with an identical primer binding site and a DNA fragment from
source nucleic acid (1000). The adapter also may have other
functional elements including, but not limited to, tagging
sequences, attachment sequences, palindromic sequences, restriction
sites, functionalization sequences, and the like. In other
embodiments, classes of DNA circles may be created by providing
adaptors having different primer binding sites. After DNA circles
(1010) are formed, a primer and rolling circle replication (RCR)
reagents may be added to generate (1011) in a conventional RCR
reaction a population (1012) of concatemers (1015) of the
complements of the adaptor oligonucleotide and DNA fragments, which
population can then be isolated using conventional separation
techniques. Alternatively, RCR may be implemented by successive
ligation of short oligonucleotides, e.g. 6-mers, from a mixture
containing all possible sequences, or if circles are synthetic, a
limited mixture of oligonucleotides having selected sequences for
circle replication. Concatemers may also be generated by ligation
of target DNA in the presence of a bridging template DNA
complementary to both beginning and end of the target molecule. A
population of different target DNAs may be converted into concatemers
by a mixture of corresponding bridging templates. Isolated
concatemers (1014) are then disposed (1016) onto support surface
(1018) to form a random array of single molecules. Attachment may
also include wash steps of varying stringencies to remove
incompletely attached single molecules or other reagents present
from earlier preparation steps whose presence is undesirable or
that are nonspecifically bound to surface (1018). Concatemers
(1020) can be fixed to surface (1018) by a variety of techniques,
including covalent attachment and non-covalent attachment. In one
embodiment, surface (1018) may have attached capture
oligonucleotides that form complexes, e.g. double stranded
duplexes, with a segment of the adaptor oligonucleotide, such as
the primer binding site or other elements. In other embodiments,
capture oligonucleotides may comprise oligonucleotide clamps, or
like structures, that form triplexes with adaptor oligonucleotides,
e.g. Gryaznov et al, U.S. Pat. No. 5,473,060. In another
embodiment, surface (1018) may have reactive functionalities that
react with complementary functionalities on the concatemers to form
a covalent linkage, e.g. by way of the same techniques used to
attach cDNAs to microarrays, e.g. Smirnov et al (2004), Genes,
Chromosomes & Cancer, 40: 72-77; Beaucage (2001), Current
Medicinal Chemistry, 8: 1213-1244, which are incorporated herein by
reference. Long DNA molecules, e.g. several hundred nucleotides or
larger, may also be efficiently attached to hydrophobic surfaces,
such as a clean glass surface that has a low concentration of
various reactive functionalities, such as -OH groups. Concatemers
of DNA fragments may be further amplified in situ after disposition
on a surface. For example, after disposition, concatemers may be
cleaved by reconstituting a restriction site in adaptor sequences
by hybridization of an oligonucleotide, after which the fragments
are circularized as described below and amplified in situ by a RCR
reaction.
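The RCR step described above yields a single strand made of tandem copies of the complement of each adaptor-fragment circle. A minimal, idealized sketch of that product, written 5' to 3', follows. It assumes error-free synthesis and exact base pairing, and all sequences and names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def rcr_concatemer(circle: str, n_copies: int, start: int = 0) -> str:
    """Idealized RCR product (5'->3') from a circular template.

    circle   -- the circle written 5'->3' as a linear string
    n_copies -- number of tandem copies to emit
    start    -- rotation offset; changes only the phase of the repeats
    Assumes error-free synthesis."""
    rotated = circle[start:] + circle[:start]
    return reverse_complement(rotated) * n_copies


# Hypothetical adaptor and fragment sequences, for illustration only.
adaptor = "AAGGCCTTAGCT"
fragment = "TTGACCATGCAAGT"
circle = adaptor + fragment
product = rcr_concatemer(circle, n_copies=3)
print(len(product))  # 3 copies x 26 nt = 78
```

Because a circle has no ends, the priming position only shifts the phase of the repeats, which is what `start` models. The copy number needed for a concatemer of a given length is roughly the target length divided by the circle length, e.g. about 25 copies of a 400-nucleotide circle for a 10,000-nucleotide concatemer.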
[0047] FIG. 1B illustrates a section (1102) of a surface of a
random array of single molecules, such as single stranded
polynucleotides. Such molecules under conventional conditions (a
conventional DNA buffer, e.g. TE, SSC, SSPE, or the like, at room
temperature) form random coils that roughly fill a spherical volume
in solution having a diameter of from about 100 to 300 nm, which
depends on the size of the DNA and buffer conditions, in a manner
well known in the art, e.g. Edvinsson, "On the size and shape of
polymers and polymer complexes," Dissertation 696 (University of
Uppsala, 2002). One measure of the size of a random coil polymer,
such as single stranded DNA, is a root mean square of the
end-to-end distance, which is roughly a measure of the diameter of
the randomly coiled structure. Such diameter, referred to herein as
a "random coil diameter," can be measured by light scatter, using
instruments, such as a Zetasizer Nano System (Malvern Instruments,
UK), or like instrument. Additional size measures of macromolecular
structures of the invention include molecular weight, e.g. in
Daltons, and total polymer length, which in the case of a branched
polymer is the sum of the lengths of all its branches. Upon
attachment to a surface, depending on the attachment chemistry,
density of linkages, the nature of the surface, and the like,
single stranded polynucleotides fill a flattened spheroidal volume
that on average is bounded by a region (1107) defined by dashed
circles (1108) having a diameter (1110), which is approximately
equivalent to the diameter of a concatemer in random coil
configuration. Stated another way, in one aspect, macromolecular
structures, e.g. concatemers, and the like, are attached to surface
(1102) within a region that is substantially equivalent to a
projection of its random coil state onto surface (1102), for
example, as illustrated by dashed circles (1108). An area occupied
by a macromolecular structure can vary, so that in some
embodiments, an expected area may be within the range of from 2-3
times the area of projection (1108) to some fraction of such area,
e.g. 25-50 percent. As mentioned elsewhere, preserving the compact
form of the macromolecular structure on the surface allows a more
intense signal to be produced by probes, e.g. fluorescently labeled
oligonucleotides, specifically directed to components of a
macromolecular structure or concatemer. The size of diameter (1110)
of regions (1107) and distance (1106) to the nearest neighbor
region containing a single molecule are two quantities of interest
in the fabrication of arrays. A variety of distance metrics may be
employed for measuring the closeness of single molecules on a
surface, including center-to-center distance of regions (1107),
edge-to-edge distance of regions (1107), and the like. Usually,
center-to-center distances are employed herein. The selection of
these parameters in fabricating arrays of the invention depends in
part on the signal generation and detection systems used in the
analytical processes. Generally, densities of single molecules are
selected that permit at least twenty percent, or at least thirty
percent, or at least forty percent, or at least a majority of the
molecules to be resolved individually by the signal generation and
detection systems used. In one aspect, a density is selected that
permits at least seventy percent of the single molecules to be
individually resolved. In one aspect, whenever scanning electron
microscopy is employed, for example, with molecule-specific probes
having gold nanoparticle labels, e.g. Nie et al (2006), Anal.
Chem., 78: 1528-1534, which is incorporated by reference, a density
is selected such that at least a majority of single molecules have
a nearest neighbor distance of 50 nm or greater; and in another
aspect, such density is selected to ensure that at least seventy
percent of single molecules have a nearest neighbor distance of 100
nm or greater. In another aspect, whenever optical microscopy is
employed, for example with molecule-specific probes having
fluorescent labels, a density is selected such that at least a
majority of single molecules have a nearest neighbor distance of
200 nm or greater; and in another aspect, such density is selected
to ensure that at least seventy percent of single molecules have a
nearest neighbor distance of 200 nm or greater. In still another
aspect, whenever optical microscopy is employed, for example with
molecule-specific probes having fluorescent labels, a density is
selected such that at least a majority of single molecules have a
nearest neighbor distance of 300 nm or greater; and in another
aspect, such density is selected to ensure that at least seventy
percent of single molecules have a nearest neighbor distance of 300
nm or greater, or 400 nm or greater, or 500 nm or greater, or 600
nm or greater, or 700 nm or greater, or 800 nm or greater. In still
another embodiment, whenever optical microscopy is used, a density
is selected such that at least a majority of single molecules have
a nearest neighbor distance of at least twice the minimal feature
resolution power of the microscope. In another aspect, polymer
molecules of the invention are disposed on a surface so that the
density of separately detectable polymer molecules is at least 1000
per μm², or at least 10,000 per μm², or at least
100,000 per μm².
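For randomly disposed molecules, the density needed to meet a nearest neighbor criterion can be estimated by treating placements as a homogeneous two-dimensional Poisson process. This is a simplifying assumption: molecules are treated as points, with no size and no edge effects. In that model the probability that a molecule's nearest neighbor lies at distance r or greater is exp(-ρπr²) for density ρ, so the largest density at which a fraction f of molecules keeps that spacing is ρ = -ln(f)/(πr²). Function names are illustrative.

```python
import math


def fraction_with_nn_at_least(density_per_um2: float, r_um: float) -> float:
    """P(nearest-neighbor distance >= r) for a 2-D Poisson point pattern."""
    return math.exp(-density_per_um2 * math.pi * r_um ** 2)


def max_density_per_um2(r_um: float, fraction: float) -> float:
    """Largest density at which `fraction` of molecules keep NN distance >= r."""
    return -math.log(fraction) / (math.pi * r_um ** 2)


for r_nm in (200, 300, 500):
    r_um = r_nm / 1000
    print(f"r={r_nm} nm: majority -> {max_density_per_um2(r_um, 0.5):.2f}/um^2, "
          f"70% -> {max_density_per_um2(r_um, 0.7):.2f}/um^2")
```

For the 200 nm criterion this gives about 5.5 molecules per μm² when a majority must be resolved and about 2.8 per μm² for seventy percent.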
[0048] In another aspect of the invention, illustrated for a
particular embodiment in FIG. 1C, the requirement of selecting
densities of randomly disposed single molecules to ensure desired
nearest neighbor distances is obviated by providing on a surface
discrete spaced apart regions that are substantially the sole sites
for attaching single molecules. That is, in such embodiments the
regions on the surface between the discrete spaced apart regions,
referred to herein as "inter-regional areas," are inert in the
sense that concatemers, or other macromolecular structures, do not
bind to such regions. In some embodiments, such inter-regional
areas may be treated with blocking agents, e.g. DNAs unrelated to
concatemer DNA, other polymers, and the like. As in FIG. 1A, source
nucleic acids (1000) are fragmented and adaptored (1002) for
circularization (1010), after which concatemers are formed by RCR
(1012). Isolated concatemers (1014) are then applied to surface
(1120) that has a regular array of discrete spaced apart regions
(1122) that each have a nearest neighbor distance (1124) that is
determined by the design and fabrication of surface (1120). As
described more fully below, arrays of discrete spaced apart regions
(1122) having micron and submicron dimensions for derivatizing with
capture oligonucleotides or reactive functionalities can be
fabricated using conventional semiconductor fabrication techniques,
including electron beam lithography, nano imprint technology,
photolithography, and the like. Generally, the area of discrete
spaced apart regions (1122) is selected, along with attachment
chemistries, macromolecular structures employed, and the like, to
correspond to the size of single molecules of the invention so that
when single molecules are applied to surface (1120) substantially
every region (1122) is occupied by no more than one single
molecule. The likelihood of having only one single molecule per
discrete spaced apart region may be increased by selecting a
density of reactive functionalities or capture oligonucleotides
that results in fewer such moieties than their respective
complements on single molecules. Thus, a single molecule will
"occupy" all linkages to the surface at a particular discrete
spaced apart region, thereby reducing the chance that a second
single molecule will also bind to the same region. In particular,
in one embodiment, substantially all the capture oligonucleotides
in a discrete spaced apart region hybridize to adaptor
oligonucleotides of a single macromolecular structure. In one aspect,
a discrete spaced apart region contains a number of reactive
functionalities or capture oligonucleotides that is from about ten
percent to about fifty percent of the number of complementary
functionalities or adaptor oligonucleotides of a single molecule.
The length and sequence(s) of capture oligonucleotides may vary
widely, and may be selected in accordance with well-known
principles, e.g. Wetmur, Critical Reviews in Biochemistry and
Molecular Biology, 26: 227-259 (1991); Britten and Davidson,
chapter 1 in Hames et al, editors, Nucleic Acid Hybridization: A
Practical Approach (IRL Press, Oxford, 1985). In one aspect, the
lengths of capture oligonucleotides are in a range of from 6 to 30
nucleotides, and in another aspect, within a range of from 8 to 30
nucleotides, or from 10 to 24 nucleotides. Lengths and sequences of
capture oligonucleotides are selected (i) to provide effective
binding of macromolecular structures to a surface, so that losses
of macromolecular structures are minimized during steps of
analytical operations, such as washing, etc., and (ii) to avoid
interference with analytical operations on analyte molecules,
particularly when analyte molecules are DNA fragments in a
concatemer. In regard to (i), in one aspect, sequences and lengths
are selected to provide duplexes between capture oligonucleotides
and their complements that are sufficiently stable so that they do
not dissociate in a stringent wash. In regard to (ii), if DNA
fragments are from a particular species of organism, then
databases, when available, may be used to screen potential capture
sequences that may form spurious or undesired hybrids with DNA
fragments. Other factors in selecting sequences for capture
oligonucleotides are similar to those considered in selecting
primers, hybridization probes, oligonucleotide tags, and the like,
for which there is ample guidance, as evidenced by the references
cited below in the Definitions section. In some embodiments, a
discrete spaced apart region may contain more than one kind of
capture oligonucleotide, and each different capture oligonucleotide
may have a different length and sequence. In one aspect of
embodiments employing regular arrays of discrete spaced apart
regions, sequences of capture oligonucleotides are selected so that
capture oligonucleotides at nearest neighbor regions
have different sequences. In a rectilinear array, such
configurations are achieved by rows of alternating sequence types.
In other embodiments, a surface may have a plurality of subarrays
of discrete spaced apart regions wherein each different subarray
has capture oligonucleotides with distinct nucleotide sequences
different from those of the other subarrays. A plurality of
subarrays may include 2 subarrays, or 4 or fewer subarrays, or 8 or
fewer subarrays, or 16 or fewer subarrays, or 32 or fewer
subarrays, or 64 or fewer subarrays. In still other embodiments, a
surface may include 5000 or fewer subarrays. In one aspect, capture
oligonucleotides are attached to the surface of an array by a
spacer molecule, e.g. polyethylene glycol, or like inert chain, as
is done with microarrays, in order to minimize undesired effects of
surface groups or interactions with the capture oligonucleotides or
other reagents.
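One way to read "rows of alternating sequence types" is a checkerboard. In that layout each row alternates between two capture sequences and adjacent rows are offset by one position, so every region's four nearest neighbors in a rectilinear array carry a different capture sequence. The sketch below assigns types by (row + column) mod n and checks that property. The type labels and function names are hypothetical.

```python
def capture_type(row: int, col: int, n_types: int = 2) -> int:
    """Checkerboard-style assignment of capture sequence types (hypothetical)."""
    return (row + col) % n_types


def nearest_neighbors_differ(n_rows: int, n_cols: int, n_types: int = 2) -> bool:
    """Check that no region shares a type with its right or lower neighbor,
    which covers every horizontally or vertically adjacent pair."""
    for r in range(n_rows):
        for c in range(n_cols):
            t = capture_type(r, c, n_types)
            if c + 1 < n_cols and capture_type(r, c + 1, n_types) == t:
                return False
            if r + 1 < n_rows and capture_type(r + 1, c, n_types) == t:
                return False
    return True


# Print a small 4 x 8 patch, labelling the two capture sequence types A and B.
for r in range(4):
    print(" ".join("AB"[capture_type(r, c)] for c in range(8)))
```

Diagonal neighbors share a type in this arrangement. In a rectilinear array they lie a factor of √2 farther away than the nearest neighbors.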
[0049] In one aspect, the area of discrete spaced apart regions
(1122) is less than 1 μm²; and in another aspect, the area
of discrete spaced apart regions (1122) is in the range of from
0.04 μm² to 1 μm²; and in still another aspect, the
area of discrete spaced apart regions (1122) is in the range of
from 0.2 μm² to 1 μm². In another aspect, when
discrete spaced apart regions are approximately circular or square
in shape so that their sizes can be indicated by a single linear
dimension, the size of such regions is in the range of from 125 nm
to 250 nm, or in the range of from 200 nm to 500 nm. In one aspect,
center-to-center distances of nearest neighbors of regions (1122)
are in the range of from 0.25 μm to 20 μm; and in another
aspect, such distances are in the range of from 1 μm to 10 μm,
or in the range from 50 to 1000 nm. In one aspect, regions (1122)
may be arranged on surface (1120) in virtually any pattern in which
regions (1122) have defined locations, i.e. in any regular array,
which makes signal collection and data analysis functions more
efficient. Such patterns include, but are not limited to,
concentric circles of regions (1122), spiral patterns, rectilinear
patterns, hexagonal patterns, and the like. Preferably, regions
(1122) are arranged in a rectilinear or hexagonal pattern.
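For the regular patterns mentioned above, the number of regions per unit area follows from the center-to-center pitch a: 1/a² for a rectilinear (square) lattice and 2/(√3·a²) for a hexagonal lattice. The sketch below converts a pitch in micrometers into regions per mm² and generates hexagonal center coordinates. It is a geometric illustration only, and the names are hypothetical.

```python
import math


def regions_per_mm2(pitch_um: float, pattern: str = "hexagonal") -> float:
    """Regions per mm^2 for a given center-to-center pitch (in um)."""
    if pitch_um <= 0:
        raise ValueError("pitch must be positive")
    if pattern == "rectilinear":
        cell_area_um2 = pitch_um ** 2
    elif pattern == "hexagonal":
        cell_area_um2 = (math.sqrt(3) / 2) * pitch_um ** 2
    else:
        raise ValueError(f"unknown pattern: {pattern}")
    return 1e6 / cell_area_um2   # 1 mm^2 = 1e6 um^2


def hexagonal_centers(n_rows: int, n_cols: int, pitch_um: float):
    """Center coordinates (um) of a hexagonal lattice; odd rows offset by half a pitch."""
    dy = pitch_um * math.sqrt(3) / 2
    return [((c + 0.5 * (r % 2)) * pitch_um, r * dy)
            for r in range(n_rows) for c in range(n_cols)]


for pitch in (0.25, 1.0, 10.0):
    print(f"pitch {pitch} um: rectilinear {regions_per_mm2(pitch, 'rectilinear'):,.0f}, "
          f"hexagonal {regions_per_mm2(pitch):,.0f} per mm^2")
```

For example, a 1 μm pitch gives 1,000,000 regions per mm² in a rectilinear layout and about 1,154,700 in a hexagonal one, consistent with the spot densities of more than 1 million per mm² recited in [0041].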
[0050] As illustrated in FIG. 1D, in certain embodiments, DNA
circles prepared from source nucleic acid (1200) need not include
an adaptor oligonucleotide. As before, source nucleic acid (1200)
is fragmented and denatured (1202) to form a population of single
strand fragments (1204), preferably in the size range of from about
50 to 600 nucleotides, and more preferably in the size range of
from about 300 to 600 nucleotides, after which they are
circularized in a non-template-driven reaction with a circularizing
ligase, such as CircLigase (Epicentre Biotechnologies, Madison,
Wis.), or the like. After formation of DNA circles (1206),
concatemers are generated by providing a mixture of primers that
bind to selected sequences. The mixture of primers may be selected
so that only a subset of the total number of DNA circles (1206)
generate concatemers. After concatemers are generated (1208), they
are isolated and applied to surface (1210) to form a random array
of the invention.
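Selecting a subset of circles with a primer mixture can be illustrated with a simple exact-match model. A circle is treated as primed if the reverse complement of some primer occurs anywhere on it, including across the ligation junction. The sketch ignores mismatches, melting temperature, and secondary structure, and its names and sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def is_primed(circle: str, primer: str) -> bool:
    """True if the primer's binding site (its reverse complement) occurs on
    the circular template, including across the junction. Exact match only."""
    site = reverse_complement(primer)
    if not site or len(site) > len(circle):
        return False
    wrapped = circle + circle[: len(site) - 1]   # lets a site span the junction
    return site in wrapped


def primed_subset(circles: dict, primers: list) -> set:
    """Names of circles that at least one primer in the mixture can prime."""
    return {name for name, seq in circles.items()
            if any(is_primed(seq, p) for p in primers)}


# Hypothetical circles and primer mixture, for illustration only.
circles = {"c1": "AACCGGTTACGT", "c2": "TTTTAAAACCCC", "c3": "CGGTTTTAAC"}
primers = ["ACCGGTT"]
print(sorted(primed_subset(circles, primers)))
```

In this example, circle c3 is primed only because the binding site spans the junction between its last and first bases. That is why the sketch searches the circle with its first bases appended.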
[0051] As mentioned above, single molecules of the invention
comprise an attachment portion and an analyte portion such that the
attachment portion comprises a macromolecular structure that
provides multivalent attachment of the single molecule to a
surface. As illustrated in FIG. 1E, macromolecular structures may
be concatemers made by an RCR reaction in which the DNA circles in
the reaction are synthetic. An analyte portion of a single molecule
is then attached by way of a unique functionality on the
concatemer. Synthetic DNA circles of virtually any sequence can be
produced using well-known techniques, conveniently, in sizes up to
several hundred nucleotides, e.g. 200, and with more difficulty, in
sizes of many hundreds of nucleotides, e.g. up to 500, e.g. Kool,
U.S. Pat. No. 5,426,180; Dolinnaya et al (1993), Nucleic Acids
Research, 21: 5403-5407; Rubin et al (1995), Nucleic Acids
Research, 23: 3547-3553; and the like, which are incorporated
herein by reference. Synthetic DNA circles (1300) that comprise
primer binding sites (1301) are combined with primer (1302) in an
RCR reaction (1306) to produce concatemers (1308). Usually, in this
embodiment, all circles have the same sequence, although different
sequences can be employed, for example, for directing subsets of
concatemers to preselected regions of an array via complementary
attachment moieties, such as adaptor sequences and capture
oligonucleotides. Primer (1302) is synthesized with a functionality
(1304, designated as "R") at its 5' end that is capable of reacting
with a complementary functionality on an analyte to form a covalent
linkage. Exemplary functionalities include amino groups, sulfhydryl
groups, and the like, that can be attached with commercially
available chemistries (e.g. Glen Research). Concatemers (1308) are
applied to surface (1310) to form an array (1314), after which
analytes (1312) having an attachment moiety are applied to array
(1310) where a linkage is formed with a concatemer by reaction of
unique functionalities, R (1311) and attachment moiety (1312).
Alternatively, prior to application to array (1310), concatemers
(1308) may be combined with analytes (1312) so that attachment
moieties and unique functionalities can react to form a linkage,
after which the resulting conjugate is applied to array (1310).
There is abundant guidance in the literature in selecting
appropriate attachment moieties and unique functionalities for
linking concatemers (1308) and many classes of analyte. In one
aspect, for linking protein or peptide analytes to concatemers,
many homo- and heterobifunctional reagents are available
commercially (e.g. Pierce) and are disclosed in references such as
Hermanson, Bioconjugate Techniques (Academic Press, New York,
1996), which is incorporated by reference. For example, whenever
the unique functionality is an amino group, then concatemers (1308)
can be linked to a sulfhydryl group on an analyte using
N-succinimidyl 3-(2-pyridyldithio)propionate (SPDP),
succinimidyloxycarbonyl-α-methyl-α-(2-pyridyldithio)toluene (SMPT),
succinimidyl-4-(N-maleimidomethyl)cyclohexane-1-carboxylate (SMCC),
m-maleimidobenzoyl-N-hydroxysuccinimide ester (MBS),
N-succinimidyl(4-iodoacetyl)aminobenzoate (SIAB), succinimidyl
6-((iodoacetyl)amino)hexanoate (SIAX), and like reagents. Suitable
complementary functionalities on analytes include amino groups,
sulfhydryl groups, and carbonyl groups, which may occur naturally on
analytes or may be added by reaction with a suitable homo- or
heterobifunctional reagent. Analyte molecules may also be attached
to macromolecular structures by way of non-covalent linkages, such
as biotin-streptavidin linkages, the formation of complexes, e.g.
duplexes, between a first oligonucleotide attached to a concatemer
and a complementary oligonucleotide attached to, or forming part
of, an analyte, or like linkages. Analytes include biomolecules,
such as nucleic acids, for example, DNA or RNA fragments,
polysaccharides, proteins, and the like.
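Each heterobifunctional reagent listed above pairs an N-hydroxysuccinimide (NHS) ester, which reacts with primary amines, with a sulfhydryl-reactive group (pyridyldithiol, maleimide, or iodoacetyl). The lookup below encodes that pairing so a reagent can be shortlisted from the two functionalities to be joined. It is a simplified, hypothetical helper that ignores spacer length, cleavability, and solubility, which also matter in practice.

```python
# Reactive end groups of the reagents named in the text, and what each targets.
TARGET = {
    "NHS ester": "amine",
    "pyridyldithiol": "sulfhydryl",
    "maleimide": "sulfhydryl",
    "iodoacetyl": "sulfhydryl",
}

REAGENTS = {
    "SPDP": ("NHS ester", "pyridyldithiol"),
    "SMPT": ("NHS ester", "pyridyldithiol"),
    "SMCC": ("NHS ester", "maleimide"),
    "MBS": ("NHS ester", "maleimide"),
    "SIAB": ("NHS ester", "iodoacetyl"),
    "SIAX": ("NHS ester", "iodoacetyl"),
}

def compatible_reagents(on_concatemer: str, on_analyte: str) -> list[str]:
    """Reagents whose two ends can bridge the given pair of functionalities."""
    hits = []
    for name, (end_a, end_b) in REAGENTS.items():
        targets = (TARGET[end_a], TARGET[end_b])
        # Either orientation works: the reagent can be applied to either partner first.
        if targets in ((on_concatemer, on_analyte), (on_analyte, on_concatemer)):
            hits.append(name)
    return sorted(hits)

# Amino group as the unique functionality R, sulfhydryl on the analyte.
print(compatible_reagents("amine", "sulfhydryl"))
```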
[0052] As mentioned above, macromolecular structures of the
invention may comprise branched polymers as well as linear
polymers, such as concatemers of DNA fragments. Exemplary branched
polymer structures are illustrated in FIGS. 1F and 1G. In FIG. 1F,
a branched DNA structure is illustrated that comprises a backbone
polynucleotide (1400) and multiple branch polynucleotides (1402)
each connected to backbone polynucleotide (1400) by their 5' ends
to form a comb-like structure that has all 3' ends, except for a
single 5' end (1404) on backbone polynucleotide (1400), which is
derivatized to have a unique functionality. As mentioned below,
such unique functionality may be a reactive chemical group, e.g. a
protected or unprotected amine, sulfhydryl, or the like, or it may
be an oligonucleotide having a unique sequence for capturing an
analyte having an oligonucleotide with a complementary sequence
thereto. Likewise, such unique functionality may be a capture
moiety, such as biotin, or the like. Such branched DNA structures
are synthesized using known techniques, e.g. Gryaznov, U.S. Pat.
No. 5,571,677; Urdea et al, U.S. Pat. No. 5,124,246; Seeman et al,
U.S. Pat. No. 6,255,469; and the like, which are incorporated
herein by reference. Whenever such macromolecular structures are
polynucleotides, the sequences of components thereof may be
selected for facile self-assembly, or they may be linked by way of
specialized linking chemistries, e.g. as disclosed below, in which
case sequences are selected based on other factors, including, in
some embodiments, avoidance of self-annealing, facile binding to
capture oligonucleotides on a surface, and the like. In FIG. 1G, a
dendrimer structure is illustrated that comprises oligonucleotide
(1406), which is derivatized with multiple trivalent linking
groups (1408) that each have two functionalities (1410, designated
by "R") by which additional polymers (1407), e.g. polynucleotides,
can be attached to form a linkage to oligonucleotide (1406) thereby
forming macromolecular structure (1409), which, in turn, if
likewise derivatized with multivalent linkers, can form a nucleic
acid dendrimer. Trivalent linkers (1408) for use with
oligonucleotides are disclosed in Iyer et al, U.S. Pat. No.
5,916,750, which is incorporated herein by reference. As
illustrated in FIG. 1H, once such dendrimeric or branched
structures (1411) are constructed, they can be attached to array
(1420) as described above for linear polynucleotides, after which
analytes (1430) can be attached via unique functionalities (1410).
Optionally, unreacted unique functionalities (1422) may be capped
using conventional techniques. Alternatively, dendrimeric or
branched structures (1411) may be combined with analytes (1430)
first, e.g. in solution, so that conjugates are formed, and then
the conjugates are disposed on array (1420). When the analyte is a
polynucleotide (1440) with a free 3' end, as shown in FIG. 1I, such
end may be extended in an in situ RCR reaction to form either
concatemers of target sequences or other sequences for further
additions. Likewise, polynucleotide analytes may be extended by
ligation using conventional techniques.
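The growth of such a nucleic acid dendrimer can be estimated by simple counting. The sketch assumes, hypothetically, that every polymer in each layer carries the same number of trivalent linkers (1408), that each linker contributes two functionalities R, and that every R is coupled to a new polymer. Real couplings are incomplete, so these counts are upper bounds.

```python
def dendrimer_layers(linkers_per_polymer: int, generations: int) -> list[int]:
    """Polymers added in each generation of an idealized, fully coupled dendrimer.

    Hypothetical model: every polymer carries `linkers_per_polymer` trivalent
    linkers, each linker offers two functionalities (R), and every R is
    coupled to one new polymer.
    """
    branching = 2 * linkers_per_polymer
    return [branching ** g for g in range(1, generations + 1)]

layers = dendrimer_layers(linkers_per_polymer=3, generations=3)
print(layers, "total polymers:", 1 + sum(layers))  # 1 counts the core oligonucleotide
```

With three linkers per polymer, three generations add 6, 36, and 216 polymers, which shows how quickly the number of attachment points for a surface grows.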
Source Nucleic Acids and Circularization of Target Sequences
[0053] In one aspect of the invention, macromolecular structures
comprise concatemers of polynucleotide analytes, i.e. target
sequences, which are extracted or derived from a sample, such as
genomic DNA or cDNAs from a patient, an organism of economic
interest, or the like. Random arrays of the invention comprising
such single molecules are useful in providing genome-wide analyses,
including sequence determination, SNP measurement, allele
quantitation, copy number measurements, and the like. For
mammalian-sized genomes, preferably fragmentation is carried out in
at least two stages, a first stage to generate a population of
fragments in a size range of from about 100 kilobases (Kb) to about
250 kilobases, and a second stage, applied separately to each
100-250 Kb fragment, to generate fragments in the size range of
from about 50 to 600 nucleotides, and more preferably in the range
of from about 300 to 600 nucleotides, for generating concatemers
for a random array. In some aspects of the invention, the first
stage of fragmentation may also be employed to select a
predetermined subset of such fragments, e.g. fragments containing
genes that encode proteins of a signal transduction pathway, or the
like. The amount of genomic DNA required for constructing arrays of
the invention can vary widely. In one aspect, for mammalian-sized
genomes, fragments are generated from at least 10
genome-equivalents of DNA; and in another aspect, fragments are
generated from at least 30 genome-equivalents of DNA; and in
another aspect, fragments are generated from at least 60
genome-equivalents of DNA.
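The scale of the two-stage fragmentation can be checked with simple arithmetic. The sketch assumes a haploid human genome of about 3.1 × 10⁹ bp as the mammalian-sized example, which the text does not state, and idealized, evenly sized fragments; real fragment-size distributions are broad.

```python
GENOME_BP = 3.1e9  # assumed haploid human genome size (not stated in the text)

def fragments(total_bp: float, fragment_bp: float) -> float:
    """Expected number of fragments when total_bp is cut into pieces of fragment_bp."""
    return total_bp / fragment_bp

for first_stage_bp in (100_000, 250_000):
    per_genome = fragments(GENOME_BP, first_stage_bp)
    # Second stage: each first-stage fragment is cut to 300-600 nucleotides.
    low = fragments(first_stage_bp, 600)
    high = fragments(first_stage_bp, 300)
    print(f"{first_stage_bp // 1000} Kb stage: ~{per_genome:,.0f} per genome-equivalent; "
          f"~{low:,.0f}-{high:,.0f} second-stage fragments each")

# Total second-stage fragments from 10 genome-equivalents at ~450 nt average length.
print(f"{fragments(10 * GENOME_BP, 450):.2e}")
```

Under these assumptions, the first stage yields roughly 12,000 to 31,000 fragments per genome-equivalent, each of which is then processed separately into a few hundred short fragments.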
[0054] Genomic DNA is obtained using conventional techniques, for
example, as disclosed in Sambrook et al., supra, 1999; Current
Protocols in Molecular Biology, Ausubel et al., eds. (John Wiley
and Sons, Inc., NY, 1999), or the like. Important factors for
isolating genomic DNA include the following: 1) the DNA is free of
DNA processing enzymes and contaminating salts; 2) the entire
genome is equally represented; and 3) the DNA fragments are between
about 5,000 and 100,000 bp in length. In many cases, no digestion
of the extracted DNA is required because shear forces created
during lysis and extraction will generate fragments in the desired
range. In another embodiment, shorter fragments (1-5 kb) can be
generated by enzymatic fragmentation using restriction
endonucleases. In one embodiment, 10-100 genome-equivalents of DNA
ensure that the population of fragments covers the entire genome.
In some cases, it is advantageous to provide carrier DNA, e.g.
unrelated circular synthetic double-stranded DNA, to be mixed and
used with the sample DNA whenever only small amounts of sample DNA
are available and there is danger of losses through nonspecific
binding, e.g. to container walls and the like.
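The amount of DNA that corresponds to a given number of genome-equivalents can be estimated from genome size. The sketch assumes an average mass of about 650 g/mol per base pair and a haploid human genome of about 3.1 × 10⁹ bp; neither figure is given in the text.

```python
AVOGADRO = 6.02214076e23  # per mole
G_PER_MOL_PER_BP = 650.0  # assumed average mass of one base pair
GENOME_BP = 3.1e9         # assumed haploid human genome size

def genome_equivalent_mass_pg(genome_bp: float = GENOME_BP) -> float:
    """Mass, in picograms, of one double-stranded copy of a genome."""
    grams = genome_bp * G_PER_MOL_PER_BP / AVOGADRO
    return grams * 1e12

one_copy_pg = genome_equivalent_mass_pg()
print(f"one genome-equivalent ~ {one_copy_pg:.2f} pg")
for copies in (10, 30, 60, 100):
    print(f"{copies} genome-equivalents ~ {copies * one_copy_pg:.0f} pg")
```

With these assumptions, one haploid genome-equivalent is about 3.3 pg, so 10-100 genome-equivalents amount to only tens to a few hundred picograms of DNA. That is the regime in which nonspecific losses, and the use of carrier DNA, become relevant.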
[0055] In generating fragments in either stage, fragments may be
derived either from an entire genome or from a
selected subset of a genome. Many techniques are available for
isolating or enriching fragments from a subset of a genome, as
exemplified by the following references that are incorporated by
reference: Kandpal et al (1990), Nucleic Acids Research, 18:
1789-1795; Callow et al, U.S. patent publication 2005/0019776;
Zabeau et al, U.S. Pat. No. 6,045,994; Deugau et al, U.S. Pat. No.
5,508,169; Sibson, U.S. Pat. No. 5,728,524; Guilfoyle et al, U.S.
Pat. No. 5,994,068; Jones et al, U.S. patent publication
2005/0142577; Gullberg et al, U.S. patent publication 2005/0037356;
Matsuzaki et al, U.S. patent publication 2004/0067493; and the
like.
[0056] For mammalian-sized genomes, an initial fragmentation of
genomic DNA can be achieved by digestion with one or more "rare"
cutting restriction endonucleases, such as Not I, Asc I, Bae I,
CspC I, Pac I, Fse I, Sap I, Sfi I, Psr I, or the like. The
resulting fragments can be used directly, or for genomes that have
been sequenced, specific fragments may be isolated from such
digested DNA for subsequent processing as illustrated in FIG. 2B.
Genomic DNA (230) is digested (232) with a rare cutting restriction
endonuclease to generate fragments (234), after which the fragments
(234) are further digested for a short period (i.e. the reaction is
not allowed to run to completion) with a 5' single stranded
exonuclease, such as λ exonuclease, to expose sequences (237)
adjacent to restriction site sequences at the end of the fragments.
Such exposed sequences will be unique for each fragment.
Accordingly, biotinylated primers (241) specific for the ends of
desired fragments can be annealed to a capture oligonucleotide for
isolation; or alternatively, such fragments can be annealed to a
primer having a capture moiety, such as biotin, and extended with a
DNA polymerase that does not have strand displacement activity,
such as Taq polymerase Stoffel fragment. After such extension, the
3' ends of primers (241) abut the top strand of fragments (242) such
that they can be ligated to form a continuous strand. The latter
approach may also be implemented with a DNA polymerase that does
have strand displacement activity and replaces the top strand (242)
by synthesis. In either approach, the biotinylated fragments may
then be isolated (240) using a solid support (239) derivatized with
streptavidin.
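For a genome that has already been sequenced, the fragments a rare-cutting enzyme will produce can be predicted in silico before choosing which ones to isolate. The sketch below assumes complete digestion and uses the recognition sequences and top-strand cut positions of three of the enzymes named above (NotI, AscI, PacI). It ignores site methylation, which can block cutting in practice, and the `digest` helper is hypothetical.

```python
import re

# Recognition sequence (top strand, 5'->3') and top-strand cut offset.
ENZYMES = {
    "NotI": ("GCGGCCGC", 2),  # GC^GGCCGC
    "AscI": ("GGCGCGCC", 2),  # GG^CGCGCC
    "PacI": ("TTAATTAA", 5),  # TTAAT^TAA
}

def digest(seq: str, enzymes: list[str]) -> list[int]:
    """Top-strand fragment lengths of a linear sequence after complete digestion.

    All three sites are palindromic, so scanning one strand finds every site.
    """
    seq = seq.upper()
    cuts = set()
    for name in enzymes:
        site, offset = ENZYMES[name]
        # Zero-width lookahead so overlapping occurrences are also reported.
        for match in re.finditer(f"(?={site})", seq):
            cuts.add(match.start() + offset)
    bounds = [0] + sorted(cuts) + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]

# Toy sequence with one NotI site and one PacI site.
toy = "A" * 10 + "GCGGCCGC" + "T" * 20 + "TTAATTAA" + "C" * 5
print(digest(toy, ["NotI", "PacI"]))
```

Applied to a reference genome sequence, the same function would list the expected rare-cutter fragments. Those carrying regions of interest could then be selected for end-specific capture.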
[0057] In another aspect, primer extension from a genomic DNA
template is used to generate a linear amplification of selected
sequences greater than 10 kilobases surrounding genomic regions of
interest. For example, to create a population of defined-sized
targets, 20 cycles of linear amplification are performed with a
forward primer, followed by 20 cycles with a reverse primer. Before
applying the second primer, the first primer is removed with a
standard column for long DNA purification, or it is degraded if a few
uracil bases have been incorporated into it. A greater number of reverse strands
are generated relative to forward strands, resulting in a population
of double-stranded molecules and single-stranded reverse strands.
The reverse primer may be biotinylated for capture on streptavidin
beads, which can be heated to melt any double-stranded homoduplexes
so that the forward strands are not captured. All attached molecules will then be single stranded
and represent one strand of the original genomic DNA.
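The excess of reverse over forward strands follows from simple counting. The sketch below is idealized and rests on assumptions that are not stated in the text. It assumes that every cycle copies each available template with the same efficiency and that the forward products are long enough to include the reverse-primer site. It also leaves the original genomic strands out of the duplex count.

```python
def linear_amplification(forward_cycles: int, reverse_cycles: int,
                         efficiency: float = 1.0) -> tuple[float, float, float]:
    """Idealized strand counts per genomic target for sequential linear amplification.

    Returns (forward strands, reverse strands, reverse strands left single stranded).
    """
    # Forward phase: only one genomic strand serves as template.
    forward = forward_cycles * efficiency
    # Reverse phase: the complementary genomic strand plus every forward copy are templates.
    reverse_templates = 1 + forward
    reverse = reverse_cycles * efficiency * reverse_templates
    # Each forward copy can pair with at most one reverse strand.
    duplexed = min(forward, reverse)
    return forward, reverse, reverse - duplexed

print(linear_amplification(20, 20))
```

Under these assumptions, 20 forward and 20 reverse cycles give 20 forward and 420 reverse strands per target. At least 400 reverse strands therefore remain single stranded, which matches the qualitative outcome described above.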
[0058] The products produced can be fragmented to 0.2-2 kb in size,
or more preferably, 0.3-0.6 kb in size (effectively releasing them
from the solid support) and circularized for an RCR reaction. In
one method of circularization, illustrated in FIG. 2A, after
genomic DNA (200) is fragmented and denatured (202), single
stranded DNA fragments (204) are first treated with a terminal
transferase (206) to attach poly(dA) tails (208) to their 3' ends.
This is then followed by ligation (212) of the free ends
intra-molecularly with the aid of bridging oligonucleotide (210),
which is complementary to the poly(dA) tail at one end and
complementary to any sequence at the other end by virtue of a
segment of degenerate nucleotides. Duplex region (214) of bridging
oligonucleotide (210) contains at least a primer binding site for
RCR and, in some embodiments, sequences that provide complements to
a capture oligonucleotide, which may be the same or different from
the primer binding site sequence, or which may overlap the primer
binding site sequence. The length of capture oligonucleotides may
vary widely. In one aspect, capture oligonucleotides and their
complements in a bridging oligonucleotide have lengths in the range
of from 10 to 100 nucleotides; and more preferably, in the range of
from 10 to 40 nucleotides. In some embodiments, duplex region (214)
may contain additional elements, such as an oligonucleotide tag,
for example, for identifying the source nucleic acid from which
its associated DNA fragment came. That is, in some embodiments,
circles, adaptor-ligated fragments, or concatemers from different source
nucleic acids may be prepared separately, each using a bridging
adaptor containing a unique tag, after which they are mixed
for concatemer preparation or application to a surface to produce a
random array. The associated fragments may be identified on such a
random array by hybridizing a labeled tag complement to its
corresponding tag sequences in the concatemers, or by sequencing
the entire adaptor or the tag region of the adaptor. Circular
products (218) may be conveniently isolated by a conventional
purification column, digestion of non-circular DNA by one or more
appropriate exonucleases, or both.
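When tags are read by sequencing, a read's tag may contain errors, so one common way to assign it to a source nucleic acid is nearest-match with a mismatch limit. The sketch below illustrates that approach with hypothetical 6-nucleotide tags and a Hamming-distance rule. The tag sequences, the `assign_source` helper, and the one-mismatch threshold are assumptions, not details from the text.

```python
from typing import Optional

def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def assign_source(read_tag: str, tags: dict[str, str],
                  max_mismatches: int = 1) -> Optional[str]:
    """Return the source whose tag is uniquely closest to read_tag, or None."""
    scored = sorted((hamming(read_tag, tag), source) for source, tag in tags.items())
    best_distance, best_source = scored[0]
    if best_distance > max_mismatches:
        return None  # too many errors to trust
    if len(scored) > 1 and scored[1][0] == best_distance:
        return None  # tie between two sources: ambiguous
    return best_source

# Hypothetical 6-nt tags; each pair differs at all six positions.
TAGS = {"sample_A": "ACGTAC", "sample_B": "TGCATG", "sample_C": "GATCGA"}
for observed in ("ACGTAC", "ACGTTC", "AAAAAA"):
    print(observed, "->", assign_source(observed, TAGS))
```

Choosing tags that differ at many positions, as here, is what allows a read with a single error to be assigned unambiguously.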
[0059] As mentioned above, DNA fragments of the desired size
range, e.g. 50-600 nucleotides, can also be circularized using
circularizing enzymes, such as CircLigase, a single-stranded DNA
ligase that circularizes single-stranded DNA without the need of a
template. CircLigase is used in accordance with the manufacturer's
instructions (Epicentre, Madison, Wis.). A preferred protocol for
forming single-stranded DNA circles comprising a DNA fragment and
one or more adapters is to use a standard ligase, such as T4 ligase,
for ligating an adapter to one end of a DNA fragment and then to use
CircLigase to close the circle, as described more fully below.
[0060] An exemplary protocol for generating a DNA circle comprising
an adaptor oligonucleotide and a target sequence using T4 ligase
is as follows. The target sequence is a synthetic oligo T1N (sequence:
5'-NNNNNNNNGCATANCACGANGTCATNATCGTNCAAACGTCAGTCCANGAATCNAGATCCACTTAGANTGNCGNNNNNNNN-3')
(SEQ ID NO: 1). The adaptor is made up of 2
separate oligos. The adaptor oligo that joins to the 5' end of T1N
is BR2-ad (sequence: 5'-TATCATCTGGATGTTAGGAAGACAAAAGGAAGCTGAGGACATTAACGGAC-3')
(SEQ ID NO: 2), and the adaptor oligo that
joins to the 3' end of T1N is UR3-ext (sequence:
5'-ACCTTCAGACCAGAT-3') (SEQ ID NO: 3). UR3-ext contains a type IIS
restriction enzyme site (Acu I: CTTCAG) to provide a way to
linearize the DNA circle for insertion of a second adaptor.
BR2-ad is annealed to BR2-temp (sequence
5'-NNNNNNNNGTCCGTTAATGTCCTCAG-3') (SEQ ID NO: 4) to form the
double-stranded BR2 adaptor. UR3-ext is annealed to
biotinylated UR3-temp (sequence
5'-[BIOTIN]-ATCTGGTCTGAAGGTNNNNNNNNN-3') (SEQ ID NO: 5) to form the
double-stranded UR3 adaptor. 1 pmol of target T1N is
ligated to 25 pmol of BR2 adaptor and 10 pmol of UR3 adaptor in a
single ligation reaction containing 50 mM Tris-HCl, pH 7.8, 10% PEG,
1 mM ATP, 50 mg/L BSA, 10 mM MgCl₂, 0.3 unit/µl T4 DNA ligase
(Epicentre Biotechnologies, WI), and 10 mM DTT in a final volume of
10 µl. The ligation reaction is incubated in a temperature cycling
program of 15 °C for 11 min and 37 °C for 1 min,
repeated 18 times. The reaction is terminated by heating at
70 °C for 10 min. Excess BR2 adaptors are removed by
capturing the ligated products with streptavidin magnetic beads
(New England Biolabs, MA). 3.3 µl of 4× binding buffer (2 M
NaCl, 80 mM Tris-HCl, pH 7.5) is added to the ligation reaction, which
is then combined with 15 µg of streptavidin magnetic beads in
1× binding buffer (0.5 M NaCl, 20 mM Tris-HCl, pH 7.5). After 15
min incubation at room temperature, the beads are washed twice with
4 volumes of low-salt buffer (0.15 M NaCl, 20 mM Tris-HCl, pH 7.5).
Elution buffer (10 mM Tris-HCl, pH 7.5) is pre-warmed to 70 °C, and 10
µl of it is added to the beads at 70 °C for 5 min. After
magnetic separation, the supernatant is retained as the primary
purified sample. This sample is further purified by removing the
excess UR3 adaptors with magnetic beads pre-bound with a
biotinylated oligo BR-rc-bio (sequence:
5'-[BIOTIN]CTTTTGTCTTCCTAACATCC-3') (SEQ ID NO: 6), which is reverse
complementary to BR2-ad, in the same manner as described above. The
concentration of the adaptor-target ligated product in the final
purified sample is estimated by urea polyacrylamide gel
electrophoresis analysis. Circularization is carried out by
phosphorylating the ligation products using 0.2 unit/µl T4
polynucleotide kinase (Epicentre Biotechnologies) in 1 mM ATP and
the standard buffer provided by the supplier, and then circularizing them with a
ten-fold molar excess of a splint oligo, UR3 closing-88 (sequence
5'-AGATGATAATCTGGTC-3') (SEQ ID NO: 7), using 0.3 unit/µl of T4
DNA ligase (Epicentre Biotechnologies) and 1 mM ATP. The
circularized product is validated by performing RCR reactions as
described below.
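The oligonucleotides in this protocol can be checked against each other in silico. The sketch uses only the sequences given above (SEQ ID NOs: 2, 3, 6 and 7) and confirms three relationships stated or implied in the text. First, the UR3 closing-88 splint is the reverse complement of the junction formed when the 3' end of UR3-ext meets the 5' end of BR2-ad. Second, BR-rc-bio pairs with a stretch of BR2-ad. Third, UR3-ext carries the AcuI site. The sketch then applies C1V1 = C2V2 to show that adding 3.3 µl of 4× binding buffer to the 10 µl ligation gives approximately 1× binding buffer. The helper names are illustrative only.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def revcomp(seq: str) -> str:
    """Reverse complement of a 5'->3' DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

# Sequences as given in the protocol, all written 5'->3'.
BR2_AD = "TATCATCTGGATGTTAGGAAGACAAAAGGAAGCTGAGGACATTAACGGAC"  # SEQ ID NO: 2
UR3_EXT = "ACCTTCAGACCAGAT"                                     # SEQ ID NO: 3
BR_RC_BIO = "CTTTTGTCTTCCTAACATCC"                              # SEQ ID NO: 6 (biotin omitted)
SPLINT = "AGATGATAATCTGGTC"                                     # SEQ ID NO: 7, UR3 closing-88

# 1) Circularization joins the 3' end of UR3-ext to the 5' end of BR2-ad;
#    the splint should be the reverse complement of that junction.
junction = UR3_EXT[-8:] + BR2_AD[:8]
assert revcomp(SPLINT) == junction

# 2) The capture oligo should pair with a stretch of BR2-ad.
assert revcomp(BR_RC_BIO) in BR2_AD

# 3) UR3-ext should carry the AcuI site as written in the text.
assert "CTTCAG" in UR3_EXT

def diluted(stock: float, stock_ul: float, total_ul: float) -> float:
    """C1 * V1 = C2 * V2; returns the final concentration in the stock's units."""
    return stock * stock_ul / total_ul

# 3.3 ul of 4x binding buffer (2 M NaCl, 80 mM Tris-HCl) added to the 10 ul ligation.
total_ul = 10 + 3.3
nacl_m = diluted(2.0, 3.3, total_ul)
tris_mm = diluted(80.0, 3.3, total_ul)
print(f"NaCl {nacl_m:.2f} M, Tris-HCl {tris_mm:.1f} mM")  # compare with 1x: 0.5 M, 20 mM

# Reactant concentrations in the 10 ul ligation (pmol/ul is numerically uM).
for name, pmol in (("T1N", 1), ("BR2 adaptor", 25), ("UR3 adaptor", 10)):
    print(f"{name}: {pmol / 10:.1f} uM")
```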
Generating Polynucleotide Concatemers by Rolling Circle
Replication
[0061] In one aspect of the invention, single molecules comprise
concatemers of polynucleotides, usually polynucleotide analytes,
i.e. target sequences, that have been produced in a conventional
rolling circle replication (RCR) reaction. Guidance for selecting
conditions and reagents for RCR reactions is found in many
references available to those of ordinary skill, as evidenced by the
following that are incorporated by reference: Kool, U.S. Pat. No.
5,426,180; Lizardi, U.S. Pat. Nos. 5,854,033 and 6,143,495;
Landegren, U.S. Pat. No. 5,871,921; and the like. Generally, RCR
reaction components comprise single stranded DNA circles, one or
more primers that anneal to DNA circles, a DNA polymerase having
strand displacement activity to extend the 3' ends of primers
annealed to DNA circles, nucleoside triphosphates, and a
conventional polymerase reaction buffer. Such components are
combined under conditions that permit primers to anneal to DNA
circles and be extended by the DNA polymerase to form concatemers
of DNA circle complements. An exemplary RCR reaction protocol is as
follows: In a 50 µL reaction mixture, the following ingredients
are assembled: 2-50 pmol circular DNA, 0.5 units/µL phage φ29
DNA polymerase, 0.2 µg/µL BSA, 3 mM dNTP, and 1× φ29
DNA polymerase reaction buffer (Amersham). The RCR reaction is
carried out at 30 °C for 12 hours. In some embodiments, the
concentration of circular DNA in the polymerase reaction may be
selected to be low (approximately 10-100 billion circles per ml, or
10-100 circles per picoliter) to avoid entanglement and other
intermolecular interactions.
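The two ways of stating circle concentration in this paragraph, picomoles per reaction and circles per picoliter, can be related by a short unit conversion. The sketch below is plain arithmetic on the figures given above; the helper names are hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole

def circles_per_pl(pmol: float, volume_ul: float) -> float:
    """Number of circles per picoliter for `pmol` picomoles in `volume_ul` microliters."""
    molecules = pmol * 1e-12 * AVOGADRO
    return molecules / (volume_ul * 1e6)  # 1 ul = 1e6 pl

def molar_for_circles_per_pl(per_pl: float) -> float:
    """Molar concentration corresponding to `per_pl` circles per picoliter."""
    return per_pl * 1e12 / AVOGADRO  # 1 L = 1e12 pl

# Exemplary protocol: 2-50 pmol of circles in a 50 ul reaction.
for pmol in (2, 50):
    print(f"{pmol} pmol in 50 ul: {circles_per_pl(pmol, 50):,.0f} circles/pl")

# Low-concentration regime from the text: 10-100 circles/pl (10-100 billion/ml).
for per_pl in (10, 100):
    print(f"{per_pl} circles/pl = {molar_for_circles_per_pl(per_pl) * 1e12:.1f} pM")
```

On these figures, the exemplary 2-50 pmol loading corresponds to roughly 2.4 × 10⁴ to 6 × 10⁵ circles per picoliter. The low-concentration embodiment therefore corresponds to diluting the circles to about 17-166 pM.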
[0062] Preferably, concatemers produced by RCR are approximately
uniform in size; accordingly, in some embodiments, methods of
making arrays of the invention may include a step of size selecting
concatemers. For example, in one aspect, concatemers are selected
that as a population have a coefficient of variation in molecular
weight of less than about 30%; and in another embodiment, less than
about 20%. In one aspect, size uniformity is further improved by
adding low concentrations of chain terminators, such as ddNTPs, to the
RCR reaction mixture to reduce the presence of very large
concatemers, e.g. produced by DNA circles that are synthesized at a
higher rate by polymerases. In one embodiment, concentrations of
ddNTPs are used that result in an expected concatemer size in the
range of from 50-250 Kb, or in the range of from 50-100 Kb. In
another aspect, concatemers may be enriched for a particular size
range using conventional separation techniques, e.g.
size-exclusion chromatography, membrane filtration, or the
like.
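The coefficient of variation (CV) criterion can be made concrete with a short calculation. The sizes below are invented for illustration, and the size window stands in for whatever separation step (e.g. size-exclusion chromatography) is actually used. The CV here is taken as the population standard deviation divided by the mean.

```python
import statistics

def coefficient_of_variation(sizes: list[float]) -> float:
    """Population standard deviation divided by the mean."""
    return statistics.pstdev(sizes) / statistics.fmean(sizes)

def size_window(sizes: list[float], low: float, high: float) -> list[float]:
    """Keep only sizes within [low, high], standing in for a separation step."""
    return [s for s in sizes if low <= s <= high]

# Hypothetical concatemer sizes in Kb, for illustration only.
sizes_kb = [12, 45, 60, 70, 75, 80, 85, 90, 110, 300]
selected = size_window(sizes_kb, 50, 100)

print(f"all: CV = {coefficient_of_variation(sizes_kb):.0%}")
print(f"50-100 Kb window: CV = {coefficient_of_variation(selected):.0%}")
```

A single very long concatemer (here 300 Kb) inflates the CV far above 30%, which illustrates why suppressing very large products with chain terminators, or removing them by size selection, improves uniformity.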
Generation of Macromolecular Structures Comprising Branched
Polymers and DNA Assemblies
[0063] In one aspect of the invention, macromolecular structures
comprise polymers having at least one unique functionality, which
for polynucleotides is usually a functionality at a 5' or 3' end,
and a plurality of complementary functionalities that are capable
of specifically reacting with reactive functionalities of the
surface of a solid support. Macromolecular structures comprising
branched polymers, especially branched polynucleotides, may be
synthesized in a variety of ways, as disclosed by Gryaznov (cited
above), Urdea (cited above), and like references. In one aspect,
branched polymers of the invention include comb-type branched
polymers, which comprise a linear polymeric unit with one or more
branch points located at interior monomers and/or linkage moieties.
Branched polymers of the invention also include fork-type branched
polymers, which comprise a linear polymeric unit with one or two
branch points located at terminal monomers and/or linkage moieties.
Macromolecular structures of the invention also include assemblies
of linear and/or branched polynucleotides bound together by one or
more duplexes or triplexes. Such assemblies may be self-assembled
from component linear polynucleotides, e.g. as disclosed by Goodman
et al, Science, 310: 1661-1665 (2005); Birac et al, J. Mol. Graph
Model, (Apr. 18, 2006); Seeman et al, U.S. Pat. No. 6,255,469; and
the like, which are incorporated herein by reference. In one
aspect, linear polymeric units of the invention have the form:
"-(M-L)ₙ-", wherein L is a linker moiety and M is a monomer
that may be selected from a wide range of chemical structures to
provide a range of functions from serving as an inert
non-sterically hindering spacer moiety to providing a reactive
functionality which can serve as a branching point to attach other
components; a site for attaching labels; a site for attaching
oligonucleotides or other binding polymers for hybridizing or
binding to amplifier strands or structures, e.g. as described by
Urdea et al, U.S. Pat. No. 5,124,246 or Wang et al, U.S. Pat. No.
4,925,785; a site for attaching "hooks", e.g. as described in
Whiteley et al, U.S. Pat. No. 4,883,750; or as a site for attaching
other groups for affecting solubility, promotion of duplex and/or
triplex formation, such as intercalators, alkylating agents, and
the like. The following references disclose several phosphoramidite
and/or hydrogen phosphonate monomers suitable for use in the
present invention and provide guidance for their synthesis and
inclusion into oligonucleotides: Newton et al, Nucleic Acids
Research, 21:1155-1162 (1993); Griffin et al, J. Am. Chem. Soc.,
114:7976-7982 (1992); Jaschke et al, Tetrahedron Letters,
34:301-304 (1992); Ma et al, International application
PCT/CA92/00423; Zon et al, International application
PCT/US90/06630; Durand et al, Nucleic Acids Research, 18:6353-6359
(1990); Salunkhe et al, J. Am. Chem. Soc., 114:8768-8772 (1992);
Urdea et al, U.S. Pat. No. 5,093,232; Ruth, U.S. Pat. No.
4,948,882; Cruickshank, U.S. Pat. No. 5,091,519; Haralambidis et
al, Nucleic Acids Research, 15:4857-4876 (1987); and the like. More
particularly, M is a straight chain, cyclic, or branched organic
molecular structure containing from 1 to 20 carbon atoms and from 0
to 10 heteroatoms selected from the group consisting of oxygen,
nitrogen, and sulfur. Preferably, M is alkyl, alkoxy, alkenyl, or
aryl containing from 1 to 16 carbon atoms; heterocyclic having from
3 to 8 carbon atoms and from 1 to 3 heteroatoms selected from the
group consisting of oxygen, nitrogen, and sulfur; glycosyl; or
nucleosidyl. More preferably, M is alkyl, alkoxy, alkenyl, or aryl
containing from 1 to 8 carbon atoms; glycosyl; or nucleosidyl.
Preferably, L is a phosphorus(V) linking group which may be
phosphodiester, phosphotriester, methyl or ethyl phosphonate,
phosphorothioate, phosphorodithioate, phosphoramidate, or the like.
Generally, linkages derived from phosphoramidite or hydrogen
phosphonate precursors are preferred so that the linear polymeric
units of the invention can be conveniently synthesized with
commercial automated DNA synthesizers, e.g. Applied Biosystems,
Inc. (Foster City, Calif.) model 394, or the like. The value of n may vary
significantly depending on the nature of M and L; usually, n varies
from about 3 to about 100. When M is a nucleoside or analog thereof
or a nucleoside-sized monomer and L is a phosphorus(V) linkage,
then n varies from about 12 to about 100. Preferably, when M is a
nucleoside or analog thereof or a nucleoside-sized monomer and L is
a phosphorus(V) linkage, then n varies from about 12 to about 40.
Polymeric units are assembled by forming one or more covalent
bridges among them. In one aspect, bridges are formed by reacting
thiol, phosphorothioate, or phosphorodithioate groups on one or
more components with haloacyl- or haloalkylamino groups on one or
more other components to form one or more thio- or
dithiophosphorylacyl or thio- or dithiophosphorylalkyl bridges.
Generally, such bridges have one of the following forms:
--NHRSP(=Z)(O)--OR--NHRS--, wherein R is alkyl or acyl and Z is
sulfur or oxygen. The assembly reaction may involve from 2 to 20
components depending on the particular embodiment; but preferably,
it involves from 2 to 8 components; and more preferably, it
involves from 2 to 4 components. Preferably, the haloacyl- or
haloalkylamino groups are haloacetylamino groups; and more
preferably, the haloacetylamino groups are bromoacetylamino groups.
The acyl or alkyl moieties of the haloacyl- or haloalkylamino
groups contain from 1 to 12 carbon atoms; and more preferably, such
moieties contain from 1 to 8 carbon atoms. The reaction may take
place in a wide range of solvent systems; but generally, the
assembly reaction takes place under liquid aqueous conditions or in
a frozen state in ice, e.g. obtained by lowering the temperature of
a liquid aqueous reaction mixture. Alternatively, formation of
thiophosphorylacetylamino bridges in DMSO/H₂O has been reported by
Thuong et al, Tetrahedron Letters, 28:4157-4160 (1987); and
Francois et al, Proc. Natl. Acad. Sci., 86:9702-9706 (1989).
Typical aqueous conditions include 4 µM of reactants in 25 mM
NaCl and 15 mM phosphate buffer (pH 7.0). The thio- or
dithiophosphorylacyl- or thio- or dithiophosphorylalkylamino
bridges are preferred because they can be readily and selectively
cleaved by oxidizing agents, such as silver nitrate, potassium
iodide, and the like. Preferably, the bridges are cleaved with
potassium triiodide (KI₃) at a concentration equivalent to about
a hundred molar excess of the bridges. Usually, KI₃ is
employed at a concentration of about 0.1 M. The facile cleavage of
these bridges is a great advantage in synthesis of complex
macromolecular structures, as it provides a convenient method for
analyzing final products and for confirming that the structure of
the final product is correct. A 3'-haloacyl- or haloalkylamino (in
this example, haloacetylamino) derivatized oligonucleotide 1 is
reacted with a 5'-phosphorothioate derivatized oligonucleotide 2
according to the following scheme:
5'-BBB...B-NHC(=O)CH₂X  (1)
  + ⁻S-P(=O)(O⁻)-O-BBB...B-3'  (2)
  → 5'-BBB...B-NHC(=O)CH₂-S-P(=O)(O⁻)-O-BBB...B-3'
[0064] wherein X is halo and B is a nucleotide. It is understood
that the nucleotides are merely exemplary of the more general
polymeric units, (M-L)ₙ, described above. Compound 1 can be
prepared by reacting N-succinimidyl haloacetate in
N,N-dimethylformamide (DMF) with a 3'-aminodeoxyribonucleotide
precursor in a sodium borate buffer at room temperature. After
about 35 minutes the mixture is diluted (e.g. with H₂O),
desalted, and purified, e.g. by reverse phase HPLC. The
3'-aminodeoxyribonucleotide precursor can be prepared as described
in Gryaznov and Letsinger, Nucleic Acids Research, 20:3403-3409
(1992). Briefly, after deprotection, the 5' hydroxyl of a
deoxythymidine linked to a support via a standard succinyl linkage
is phosphitylated by reaction with
chloro-(diisopropylethylamino)-methoxyphosphine in an appropriate
solvent, such as dichloromethane/diisopropylethylamine. After
activation with tetrazole, the 5' phosphitylated thymidine is
reacted with a 5'-trityl-O-3'-amino-3'-deoxynucleoside to form a
nucleoside-thymidine dimer wherein the nucleoside moieties are
covalently joined by a phosphoramidate linkage. The remainder of
the oligonucleotide is synthesized by standard phosphoramidite
chemistry. After cleaving the succinyl linkage, the oligonucleotide
with a 3' terminal amino group is generated by cleaving the
phosphoramidate link by acid treatment, e.g. 80% aqueous acetic
acid for 18-20 hours at room temperature. 5'-monophosphorothioate
oligonucleotide 2 is formed as follows: A 5' monophosphate is
attached to the 5' end of an oligonucleotide either chemically or
enzymatically with a kinase, e.g. Sambrook et al, Molecular
Cloning: A Laboratory Manual, 2nd Edition (Cold Spring Harbor
Laboratory, New York, 1989). Preferably, as a final step in
oligonucleotide synthesis, a monophosphate is added by chemical
phosphorylation as described by Thuong and Asseline, Chapter 12 in
Eckstein, editor, Oligonucleotides and Analogues (IRL Press,
Oxford, 1991), or by Horn and Urdea, Tetrahedron Lett., 27:4705
(1986) (e.g. using commercially available reagents such as
5'-Phosphate-ON™ from Clontech Laboratories (Palo Alto, Calif.)).
The 5'-monophosphate is then sulfurized using conventional
sulfurizing agents, e.g. treatment with a 5% solution of S₈ in
pyridine/CS₂ (1:1, v/v, 45 minutes at room temperature); or
treatment with sulfurizing agent described in U.S. Pat. Nos.
5,003,097; 5,151,510; or 5,166,387. Monophosphorodithioates are
prepared by analogous procedures, e.g. Froehler et al, European
patent publication 0 360 609 A2; Caruthers et al, International
application PCT/US89/02293; and the like. Similarly, a
5' haloacetylamino derivatized oligonucleotide 3 is reacted with a
3'-monophosphorothioate oligonucleotide 4 according to the
following scheme:
3'-BBB...B-NHC(=O)CH₂X (3) + S-P(=O)(O⁻)O-BBB...B-5' (4)
→ 3'-BBB...B-NHC(=O)CH₂S-P(=O)(O⁻)O-BBB...B-5'
wherein the symbols are as defined above, except that the
nucleotide monomers of the j- and k-mers are in opposite
orientations. In this case, Compound 3 can be prepared by reacting
N-succinimidyl haloacetate in N,N-dimethylformamide (DMF) with a
5'-aminodeoxyribonucleotide precursor in a sodium borate buffer at
room temperature, as described above for the 3'-amino
oligonucleotide. 5'-aminodeoxynucleosides are prepared in
accordance with Glinski et al, J. Chem. Soc. Chem. Comm., 915-916
(1970); Miller et al, J. Org. Chem. 29:1772 (1964); Ozols et al,
Synthesis, 7:557-559 (1980); and Azhayev et al, Nucleic Acids
Research, 6:625-643 (1979); which are incorporated by reference.
The 3'-monophosphorothioate oligonucleotide 4 can be prepared as
described by Thuong and Asseline (cited above). Oligonucleotides 1
and 4, and oligonucleotides 2 and 3, may be reacted to form polymeric
units having two 5' termini or two 3' termini, respectively.
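Which termini remain free after coupling can be tracked with a small bookkeeping model. The Python sketch below is not a chemistry simulation; it assumes, as the preceding text implies, that oligonucleotide 1 carries a 3' haloacetylamino group, 2 a 5' phosphorothioate, 3 a 5' haloacetylamino group, and 4 a 3' phosphorothioate. The `Oligo` class and `couple` function are hypothetical names.

```python
from dataclasses import dataclass

# Hypothetical bookkeeping model: a haloacetylamino group couples only with a
# phosphorothioate group, forming the linkage shown in the schemes above.
PARTNER = {"haloacetylamino": "phosphorothioate",
           "phosphorothioate": "haloacetylamino"}


@dataclass(frozen=True)
class Oligo:
    label: int
    reactive_end: str  # "3'" or "5'": the terminus carrying the reactive group
    group: str         # "haloacetylamino" or "phosphorothioate"

    @property
    def free_end(self) -> str:
        # After coupling, only the opposite terminus remains free.
        return "5'" if self.reactive_end == "3'" else "3'"


def couple(a: Oligo, b: Oligo) -> tuple:
    """Return the free termini (a's, b's) of the coupled polymeric unit."""
    if PARTNER[a.group] != b.group:
        raise ValueError(f"oligonucleotides {a.label} and {b.label} cannot couple")
    return (a.free_end, b.free_end)


# Assumed end chemistries for compounds 1-4 (see the lead-in).
OLIGOS = {
    1: Oligo(1, "3'", "haloacetylamino"),
    2: Oligo(2, "5'", "phosphorothioate"),
    3: Oligo(3, "5'", "haloacetylamino"),
    4: Oligo(4, "3'", "phosphorothioate"),
}

for x, y in [(1, 2), (3, 4), (1, 4), (2, 3)]:
    print(x, y, couple(OLIGOS[x], OLIGOS[y]))
```

Pairs 1+2 and 3+4 give ordinary head-to-tail units with one 3' and one 5' terminus, while 1+4 leaves two 5' termini and 2+3 leaves two 3' termini.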
[0065] Reactive functionalities for the attachment of branches may
be introduced at a variety of sites. Preferably, amino
functionalities are introduced on a polymeric unit or loop at
selected monomers or linking moieties which are then converted to
haloacetylamino groups as described above. Amino-derivatized bases
of nucleoside monomers may be introduced as taught by Urdea et al,
U.S. Pat. No. 5,093,232; Ruth, U.S. Pat. No. 4,948,882; Haralambidis
et al, Nucleic Acids Research, 15:4857-4876 (1987); or the like.
Amino functionalities may also be introduced by a protected
hydroxyamine phosphoramidite commercially available from Clontech
Laboratories (Palo Alto, Calif.) as Aminomodifier II™.
Preferably, amino functionalities are introduced by generating a
derivatized phosphoramidate linkage by oxidation of a phosphite
linkage with iodine (I₂) and an alkyldiamine, e.g. as taught by Agrawal et
al, Nucleic Acids Research, 18:5419-5423 (1990); and Jager et al,
Biochemistry, 27:7237-7246 (1988). Generally, for the above
procedures, it is preferable that the haloacyl- or haloalkylamino
derivatized polymeric units be prepared separately from the
phosphorothioate derivatized polymeric units; otherwise, the
phosphorothioate moieties would require protective groups.
Solid Phase Surfaces for Constructing Random Arrays
[0066] A wide variety of supports may be used with the invention.
In one aspect, supports are rigid solids that have a surface,
preferably a substantially planar surface so that single molecules
to be interrogated are in the same plane. The latter feature
permits efficient signal collection by detection optics, for
example. In another aspect, solid supports of the invention are
nonporous, particularly when random arrays of single molecules are
analyzed by hybridization reactions requiring small volumes.
Suitable solid support materials include materials such as glass,
polyacrylamide-coated glass, ceramics, silica, silicon, quartz,
various plastics, and the like. In one aspect, the area of a planar
surface may be in the range of 0.5 to 4 cm². In one
aspect, the solid support is glass or quartz, such as a microscope
slide, having a surface that is uniformly silanized. This may be
accomplished using conventional protocols, e.g. acid treatment
followed by immersion in a solution of 3-glycidoxypropyl
trimethoxysilane, N,N-diisopropylethylamine, and anhydrous xylene
(8:1:24 v/v) at 80 °C, which forms an epoxysilanized
surface, e.g. Beattie et al (1995), Molecular Biotechnology, 4: 213.
Such a surface is readily treated to permit end-attachment of
capture oligonucleotides, e.g. by providing capture
oligonucleotides with a 3' or 5' triethylene glycol phosphoryl
spacer (see Beattie et al, cited above) prior to application to the
surface. Many other protocols may be used for adding reactive
functionalities to glass and other surfaces, as evidenced by the
disclosure in Beaucage (cited above).
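The 8:1:24 (v/v) silanization mixture described above scales linearly to any working volume. The Python sketch below only assumes that all three components are measured by volume; the 66 mL total is an arbitrary example, not a value from the text.

```python
# Component order and 8:1:24 (v/v) ratio as given in the text.
COMPONENTS = ("3-glycidoxypropyl trimethoxysilane",
              "N,N-diisopropylethylamine",
              "anhydrous xylene")
RATIO = (8, 1, 24)


def silanization_volumes(total_ml: float) -> dict:
    """Split a total volume into the 8:1:24 (v/v) silanization mixture."""
    parts = sum(RATIO)
    return {name: total_ml * r / parts for name, r in zip(COMPONENTS, RATIO)}


for name, ml in silanization_volumes(66.0).items():
    print(f"{name}: {ml:.1f} mL")
```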
[0067] Whenever enzymatic processing is not required, capture
oligonucleotides may comprise non-natural nucleosidic units and/or
linkages that confer favorable properties, such as increased duplex
stability; such compounds include, but are not limited to, peptide
nucleic acids (PNAs), locked nucleic acids (LNA), oligonucleotide
N3'→P5' phosphoramidates, oligo-2'-O-alkylribonucleotides,
and the like.
[0068] In embodiments of the invention in which patterns of
discrete spaced apart regions are required, photolithography,
electron beam lithography, nano imprint lithography, and nano
printing may be used to generate such patterns on a wide variety of
surfaces, e.g. Pirrung et al, U.S. Pat. No. 5,143,854; Fodor et al,
U.S. Pat. No. 5,774,305; Guo, (2004) Journal of Physics D: Applied
Physics, 37: R123-141; which are incorporated herein by
reference.
[0069] In one aspect, surfaces containing a plurality of discrete
spaced apart regions are fabricated by photolithography. A
commercially available, optically flat, quartz substrate is spin
coated with a 100-500 nm thick layer of photo-resist. The
photo-resist is then baked onto the quartz substrate. An image of
a reticle with a pattern of regions to be activated is projected
onto the surface of the photo-resist, using a stepper. After
exposure, the photo-resist is developed, removing the areas of the
projected pattern which were exposed to the UV source. This is
accomplished by plasma etching, a dry developing technique capable
of producing very fine detail. The substrate is then baked to
strengthen the remaining photo-resist. After baking, the quartz
wafer is ready for functionalization. The wafer is then subjected
to vapor-deposition of 3-aminopropyldimethylethoxysilane. The
density of the amino functionalized monomer can be tightly
controlled by varying the concentration of the monomer and the time
of exposure of the substrate. Only areas of quartz exposed by the
plasma etching process may react with and capture the monomer. The
substrate is then baked again to cure the monolayer of
amino-functionalized monomer to the exposed quartz. After baking,
the remaining photo-resist may be removed using acetone. Because of
the difference in attachment chemistry between the resist and
silane, aminosilane-functionalized areas on the substrate may
remain intact through the acetone rinse. These areas can be further
functionalized by reacting them with p-phenylenediisothiocyanate in
a solution of pyridine and N,N-dimethylformamide. The substrate is
then capable of reacting with amine-modified oligonucleotides.
Alternatively, oligonucleotides can be prepared with a
5'-carboxy-modifier-c10 linker (Glen Research). This technique
allows the oligonucleotide to be attached directly to the amine
modified support, thereby avoiding additional functionalization
steps.
[0070] In another aspect, surfaces containing a plurality of
discrete spaced apart regions are fabricated by nano-imprint
lithography (NIL). For DNA array production, a quartz substrate is
spin coated with a layer of resist, commonly called the transfer
layer. A second type of resist is then applied over the transfer
layer, commonly called the imprint layer. The master imprint tool
then makes an impression on the imprint layer. The overall
thickness of the imprint layer is then reduced by plasma etching
until the low areas of the imprint reach the transfer layer.
Because the transfer layer is harder to remove than the imprint
layer, it remains largely untouched. The imprint and transfer
layers are then hardened by heating. The substrate is then put into
a plasma etcher until the low areas of the imprint reach the
quartz. The substrate is then derivatized by vapor deposition as
described above.
[0071] In another aspect, surfaces containing a plurality of
discrete spaced apart regions are fabricated by nano printing. This
process uses photo, imprint, or e-beam lithography to create a
master mold, which is a negative image of the features required on
the print head. Print heads are usually made of a soft, flexible
polymer such as polydimethylsiloxane (PDMS). This material, or
layers of materials having different properties, are spin coated
onto a quartz substrate. The mold is then used to emboss the
features onto the top layer of resist material under controlled
temperature and pressure conditions. The print head is then
subjected to a plasma based etching process to improve the aspect
ratio of the print head, and eliminate distortion of the print head
due to relaxation over time of the embossed material. Random array
substrates are manufactured using nano-printing by depositing a
pattern of amine-modified oligonucleotides onto a homogeneously
derivatized surface. These oligonucleotides serve as capture
probes for the RCR products. One potential advantage of
nano-printing is the ability to print interleaved patterns of
different capture probes onto the random array support. This would
be accomplished by successive printing with multiple print heads,
each head having a differing pattern, and all patterns fitting
together to form the final structured support pattern. Such methods
allow for some positional encoding of DNA elements within the
random array. For example, control concatemers containing a
specific sequence can be bound at regular intervals throughout a
random array.
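The interleaving of several print heads can be pictured as a repeating tile in which each head owns one cell. The Python sketch below assumes a square grid of spots and a 2 × 2 tile (four heads); the tile shape and the functions `head_for_site` and `print_plan` are hypothetical, since the text does not specify a geometry.

```python
from collections import Counter


def head_for_site(row: int, col: int, tile_rows: int, tile_cols: int) -> int:
    """Print head that deposits the capture probe at grid site (row, col).

    Each head owns one cell of a repeating tile_rows x tile_cols tile, so the
    heads' patterns interleave and together cover every site exactly once.
    """
    return (row % tile_rows) * tile_cols + (col % tile_cols)


def print_plan(n_rows: int, n_cols: int, tile_rows: int, tile_cols: int):
    """Head index for every site of an n_rows x n_cols grid."""
    return [[head_for_site(r, c, tile_rows, tile_cols) for c in range(n_cols)]
            for r in range(n_rows)]


plan = print_plan(6, 6, 2, 2)  # four interleaved heads, numbered 0-3
for row in plan:
    print(" ".join(str(h) for h in row))

# Every head prints the same number of sites and no site is printed twice.
counts = Counter(h for row in plan for h in row)
print(counts)
```

If, for instance, head 0 carried a control capture probe, control concatemers would occupy every second spot in both directions, which is one way to realize the regular-interval controls mentioned above.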
[0072] In still another aspect, a high density array of capture
oligonucleotide spots of sub micron size is prepared using a
printing head or imprint-master prepared from a bundle, or bundle
of bundles, of about 10,000 to 100 million optical fibers with a
core and cladding material. By pulling and fusing fibers a unique
material is produced that has about 50-1000 nm cores separated by a
similar or 2-5 fold smaller or larger size cladding material. By
differential etching (dissolving) of cladding material a
nano-printing head is obtained having a very large number of
nano-sized posts. This printing head may be used for depositing
oligonucleotides or other biological (proteins, oligopeptides, DNA,
aptamers) or chemical compounds such as silane with various active
groups. In one embodiment the glass fiber tool is used as a
patterned support to deposit oligonucleotides or other biological
or chemical compounds. In this case only posts created by etching
may be contacted with material to be deposited. Also, a flat cut of
the fused fiber bundle may be used to guide light through cores and
allow light-induced chemistry to occur only at the tip surface of
the cores, thus eliminating the need for etching. In both cases,
the same support may then be used as a light guiding/collection
device for imaging fluorescence labels used to tag oligonucleotides
or other reactants. This device provides a large field of view with
a large numerical aperture (potentially >1). Stamping or
printing tools that perform active material or oligonucleotide
deposition may be used to print 2 to 100 different oligonucleotides
in an interleaved pattern. This process requires precise
positioning of the print head to about 50-500 nm. This type of
oligonucleotide array may be used for attaching 2 to 100 different
DNA populations such as different source DNA. They also may be used
for parallel reading from sub-light resolution spots by using DNA
specific anchors or tags. Information can be accessed through
DNA-specific tags, e.g. by using 16 specific anchors for 16 DNAs and
reading 2 bases with a combination of 5-6 colors, using either 16
ligation cycles or one ligation cycle and 16 decoding cycles. This way of making
arrays is efficient if limited information (e.g. a small number of
cycles) is required per fragment, thus providing more information
per cycle or more cycles per surface.
[0073] In one embodiment "inert" concatemers are used to prepare a
surface for attachment of test concatemers. The surface is first
covered by capture oligonucleotides complementary to the binding
site present on two types of synthetic concatemers; one is a
capture concatemer, the other is a spacer concatemer. The spacer
concatemers do not have DNA segments complementary to the adapter
used in preparation of test concatemers, and they are used in about
5- to 50-fold, preferably 10-fold, excess over capture concatemers. The
surface with capture oligonucleotide is "saturated" with a mix of
synthetic concatemers (prepared by chain ligation or by RCR) in
which the spacer concatemers are in about 10-fold (or 5- to
50-fold) excess over capture concatemers. Because of the ~10:1 ratio
between spacer and capture concatemers, the capture concatemers are
mostly individual islands in a sea of spacer concatemers. The 10:1
ratio provides that two capture concatemers are on average
separated by two spacer concatemers. If concatemers are about 200
nm in diameter, then two capture concatemers are at about 600 nm
center-to-center spacing. This surface is then used to attach test
concatemers or other molecular structures that have a binding site
complementary to a region of the capture concatemers but not
present on the spacer concatemers. Capture concatemers may be
prepared to have fewer copies than the number of binding sites in
test concatemers to assure single test concatemer attachment per
capture concatemer spot. Because the test DNA can bind only to
capture concatemers, an array of test concatemers may be prepared
that have high site occupancy without congregation. Due to random
attachment, some areas on the surface may not have any concatemers
attached, but these areas with free capture oligonucleotide may not
be able to bind test concatemers, since test concatemers are designed not to
have binding sites for the capture oligonucleotide. An array of
individual test concatemers as described would not be arranged in a
grid pattern. An ordered grid pattern should simplify data
collection because fewer pixels and less sophisticated
image analysis systems are needed.
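The 600 nm figure follows from a one-dimensional picture (capture, spacer, spacer, capture, i.e. three 200 nm diameters). On a two-dimensional surface the mean spacing can be estimated by assuming that each concatemer occupies a square one diameter on a side, so each capture concatemer "owns" (ratio + 1) squares on average. The Python sketch below is that rough estimate under this square-packing assumption, not a packing model taken from the text.

```python
import math


def capture_spacing_nm(spacer_to_capture: float, diameter_nm: float) -> float:
    """Approximate mean center-to-center distance between capture concatemers.

    Assumes a saturated surface where each concatemer occupies a
    diameter x diameter square, so each capture concatemer "owns"
    (spacer_to_capture + 1) such squares on average.
    """
    return math.sqrt(spacer_to_capture + 1) * diameter_nm


def spacers_between(spacer_to_capture: float) -> float:
    """Average linear number of spacer concatemers between two capture concatemers."""
    return math.sqrt(spacer_to_capture + 1) - 1


for ratio in (5, 10, 50):
    print(ratio, round(capture_spacing_nm(ratio, 200)), round(spacers_between(ratio), 1))
```

For a 10:1 ratio and 200 nm concatemers this gives about 660 nm, with a little over two spacer concatemers between neighbouring capture concatemers, consistent in magnitude with the one-dimensional figure.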
[0074] In one aspect, multiple arrays of the invention may be placed
on a single surface. For example, patterned array substrates may be
produced to match the standard 96- or 384-well plate format. A
production format can be an 8×12 pattern of 6 mm × 6 mm
arrays at 9 mm pitch, or a 16×24 pattern of 3.33 mm × 3.33 mm arrays
at 4.5 mm pitch, on a single piece of glass, plastic, or other
optically compatible material. In one example, each 6 mm × 6 mm
array consists of 36 million 250-500 nm square regions at 1
micrometer pitch. Hydrophobic or other surface or physical barriers
may be used to prevent mixing of different reactions between unit
arrays.
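The region count in the 6 mm × 6 mm example follows from the pitch: 6 mm at 1 µm pitch gives 6000 regions per side, or 36 million per array. The Python sketch below repeats that arithmetic for both plate formats. Applying the same 1 µm pitch to the 3.33 mm × 3.33 mm arrays of the 384 format is an assumption, since the text gives a region count only for the 6 mm arrays.

```python
def regions_per_array(side_mm: float, pitch_um: float) -> int:
    """Binding regions in a square array laid out on a square grid."""
    per_side = int(round(side_mm * 1000 / pitch_um))
    return per_side ** 2


# format name: (rows, columns, array side in mm, array pitch in mm)
FORMATS = {
    "96-well": (8, 12, 6.0, 9.0),
    "384-well": (16, 24, 3.33, 4.5),
}

for name, (rows, cols, side_mm, pitch_mm) in FORMATS.items():
    per_array = regions_per_array(side_mm, pitch_um=1.0)  # 1 um pitch assumed for both
    print(f"{name}: {per_array:,} regions/array, "
          f"{rows * cols * per_array:,} per substrate, "
          f"{pitch_mm - side_mm:.2f} mm gap between arrays, "
          f"array grid spans {cols * pitch_mm:.0f} x {rows * pitch_mm:.0f} mm")
```

Both formats span the same 108 mm × 72 mm grid, so they can share one substrate footprint; the 384 format trades a narrower gap between arrays for more, smaller arrays.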
[0075] By way of example, binding sites (i.e. discrete spaced apart
regions) for DNA samples are prepared by silanization of
lithographically defined sites on silicon dioxide on silicon,
quartz, or glass surfaces with 3-aminopropyldimethylethoxysilane or
similar silanization agent followed by derivatization with
p-phenylenediisothiocyanate or similar derivatization agent. For
example, the binding sites may be square, circular or
regular/irregular polygons produced by photolithography,
direct-write electron beam, or nano-imprint lithography.
Minimization of Non-Specific Binding in Regions Between Binding Sites
The wettability (hydrophobic vs. hydrophilic) and reactivity of
the field surrounding the binding sites can be controlled to
prevent DNA samples from binding in the field; that is, in places
other than the binding sites. For example, the field may be
prepared with hexamethyldisilazane (HMDS), or a similar agent
covalently bonded to the surface, to be hydrophobic and hence
unsuitable for hydrophilic bonding of the DNA samples. Similarly,
the field may be coated with a chemical agent such as a
fluorine-based carbon compound that renders it unreactive to DNA
samples.
[0076] For the three surface fabrication processes listed above
(photolithography, direct-write electron beam, and nano-imprint
lithography), the following exemplary steps are used. For
photolithography:
1) Clean glass wafer
2) Prime surface with HMDS
3) Pattern binding sites in photoresist
4) Reactive ion etch binding site surface with oxygen to remove HMDS
5) Silanize with 0.3% 3-aminopropyldimethylethoxysilane
6) Coat with photoresist to protect wafer during sawing
7) Saw wafer into chips
8) Strip photoresist
9) Derivatize binding sites with a solution of 10% pyridine and 90% N,N-dimethylformamide (DMF) using 2.25 mg p-phenylenediisothiocyanate (PDC) per ml of solution for 2 h, followed by methanol, acetone, and water rinses.
[0077] For direct write electron beam surface fabrication:
1) Clean glass wafer
2) Prime surface with HMDS
3) Pattern binding sites in PMMA with electron beam
4) Reactive ion etch binding site surface with oxygen to remove HMDS
5) Silanize with 0.3% 3-aminopropyldimethylethoxysilane
6) Coat with photoresist to protect wafer during sawing
7) Saw wafer into chips
8) Strip photoresist
9) Derivatize binding sites with a solution of 10% pyridine and 90% N,N-dimethylformamide (DMF) using 2.25 mg p-phenylenediisothiocyanate (PDC) per ml of solution for 2 h, followed by methanol, acetone, and water rinses.
[0078] For nano imprint lithography surface fabrication:
1) Clean glass wafer
2) Prime surface with HMDS
3) Coat wafer with transfer layer
4) Contact print pattern with nano imprint template and photopolymer on top of transfer layer
5) Dry etch pattern into transfer layer
6) Reactive ion etch binding site surface with oxygen to remove HMDS
7) Silanize with 0.3% 3-aminopropyldimethylethoxysilane
8) Coat with photoresist to protect wafer during sawing
9) Saw wafer into chips
10) Strip photoresist
11) Derivatize binding sites with a solution of 10% pyridine and 90% N,N-dimethylformamide (DMF) using 2.25 mg p-phenylenediisothiocyanate (PDC) per ml of solution for 2 h, followed by methanol, acetone, and water rinses.
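The three fabrication flows above share the same opening and closing steps and differ only in how the binding-site pattern is formed. The Python sketch below encodes that structure as plain lists; the data layout and the `process_flow` helper are hypothetical, and the step texts are abbreviated from the lists above.

```python
# Steps shared by all three flows (abbreviated from the lists above).
FRONT_END = ["Clean glass wafer", "Prime surface with HMDS"]
BACK_END = [
    "Reactive ion etch binding site surface with oxygen to remove HMDS",
    "Silanize with 0.3% 3-aminopropyldimethylethoxysilane",
    "Coat with photoresist to protect wafer during sawing",
    "Saw wafer into chips",
    "Strip photoresist",
    "Derivatize binding sites with PDC (2.25 mg/ml in 10% pyridine/90% DMF, 2 h), "
    "then rinse with methanol, acetone, and water",
]
# Only the patterning steps differ between the three methods.
PATTERNING = {
    "photolithography": ["Pattern binding sites in photoresist"],
    "direct-write electron beam": ["Pattern binding sites in PMMA with electron beam"],
    "nano imprint lithography": [
        "Coat wafer with transfer layer",
        "Contact print pattern with nano imprint template and photopolymer "
        "on top of transfer layer",
        "Dry etch pattern into transfer layer",
    ],
}


def process_flow(method: str) -> list:
    """Assemble the full step list for one fabrication method."""
    return FRONT_END + PATTERNING[method] + BACK_END


for method in PATTERNING:
    steps = process_flow(method)
    print(f"{method}: {len(steps)} steps")
    for i, step in enumerate(steps, start=1):
        print(f"  {i}) {step}")
```

Keeping the shared steps in one place means a change to, say, the silanization concentration applies to all three flows at once.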
[0079] As mentioned above, a glass surface may also be used for
constructing random arrays of the invention. For example, a
suitable glass surface may be constructed from microscope cover
slips. Microscope cover slips (22 mm square, 170 µm thick) are placed in
Teflon racks. They are soaked in 3 molar KOH in 95% ethanol/water
for 2 minutes. They are then rinsed in water, followed by an
acetone rinse. This removes surface contamination and prepares the
glass for silanization. Plasma cleaning is an alternative to KOH
cleaning. Fused silica or quartz may also be substituted for glass.
The clean, dry cover slips are immersed in 0.3%
3-aminopropyldimethylethoxysilane, 0.3% water, in acetone. They are
left to react for 45 minutes. They are then rinsed in acetone and
cured at 100 °C for 1 hour.
3-Aminopropyldimethylethoxysilane may be used as a replacement for
3-aminopropyltriethoxysilane because, having a single hydrolyzable
ethoxy group, it forms a monolayer on the glass surface, whereas
3-aminopropyltriethoxysilane tends to form more of a polymeric
surface when deposited in solution phase. The monolayer surface
provides a lower background. The silanization agent may also be
applied using vapor deposition. The amino-modified silane
is then terminated with an isothiocyanate group. This is done in a
solution of 10% pyridine and 90% N,N-dimethylformamide (DMF) using
2.25 mg p-phenylenediisothiocyanate (PDC) per ml of solution. The
reaction is run for 2 hours, followed by methanol, acetone, and water
rinses. The cover slips are then
dried and ready to bind probe. There are additional chemistries
that can be used to modify the amino group at the end of the
silanization agent. For example, glutaraldehyde can be used to
modify the amino group at the end of the silanization agent to an
aldehyde group which can be coupled to an amino modified
oligonucleotide. Capture oligonucleotides are bound to the surface
of the cover slip by applying a solution of 10-50 micromolar
capture oligonucleotide in 100 millimolar sodium bicarbonate in
water to the surface. The solution is allowed to dry, and is then
washed in water.
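The reagent quantities in this protocol scale linearly with volume. The Python sketch below computes the PDC derivatization solution and a capture oligonucleotide solution by simple C1·V1 = C2·V2 dilution. The 1 mM oligonucleotide stock and 1 M sodium bicarbonate stock are assumed working stocks, not values from the text, and 25 µM is just one point in the stated 10-50 µM range.

```python
def pdc_solution(total_ml: float) -> dict:
    """PDC derivatization solution: 2.25 mg/ml PDC in 10% pyridine / 90% DMF (v/v)."""
    return {"PDC_mg": 2.25 * total_ml,
            "pyridine_ml": 0.10 * total_ml,
            "DMF_ml": 0.90 * total_ml}


def capture_oligo_solution(total_ul: float, oligo_uM: float,
                           oligo_stock_uM: float = 1000.0,
                           bicarbonate_stock_mM: float = 1000.0) -> dict:
    """Capture oligo in 100 mM sodium bicarbonate, made by C1*V1 = C2*V2 dilution.

    The 1 mM oligo stock and 1 M bicarbonate stock are assumed defaults.
    """
    if not 10.0 <= oligo_uM <= 50.0:
        raise ValueError("text specifies 10-50 micromolar capture oligonucleotide")
    oligo_ul = total_ul * oligo_uM / oligo_stock_uM
    bicarb_ul = total_ul * 100.0 / bicarbonate_stock_mM
    return {"oligo_stock_ul": oligo_ul,
            "bicarbonate_stock_ul": bicarb_ul,
            "water_ul": total_ul - oligo_ul - bicarb_ul}


print(pdc_solution(40.0))
print(capture_oligo_solution(100.0, 25.0))
```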
[0080] It may be beneficial to skip the PDC termination step and instead
conjugate the terminal amino group of the 3-aminopropylsilane directly to
a capture oligonucleotide that has been modified with either a
carboxyl group or an aldehyde group at its 5' end. In the case of
the carboxyl group, the oligonucleotide is applied in a solution
that contains EDC (1-Ethyl-3-(3-dimethylaminopropyl)-carbodiimide).
In the case of the aldehyde group, the oligonucleotide is kept wet for 5-10
minutes, and the surface is then treated with a 1% solution of sodium
borohydride, which reduces the resulting imine (Schiff base) to a stable secondary amine.
[0081] In another aspect of the invention, random arrays are
prepared using nanometer-sized beads. Sub-micron glass or other
types of beads (e.g. in the 20-50 nm range) are used which are
derivatized with a short oligonucleotide, e.g. 6-30 nucleotides,
complementary to an adaptor oligonucleotide in the circles used to
generate concatemers. The number of oligonucleotides on the bead
and the length of the sequence can be controlled to weakly bind the
concatemers in solution. Beads in suspension should bind concatemers much
faster than the solid support alone does. After binding
concatemers, the beads are then allowed to settle on the surface of
an array substrate. The array substrate has longer, more stable,
more numerous oligonucleotides, such that conditions may be
selected to permit preferential binding to the surface, thereby
forming a spaced array of concatemers. If the beads are magnetic, a
magnetic field can be used to pull them to the surface; it may also
be used to move them around the surface. Alternatively, a
centrifuge may be used to concentrate the beads on the surface. An
exemplary protocol is as follows:
1. A preparation of 20 µl of concatemer solution with one million concatemers per µl is mixed with 20 million nano-beads, each carrying about 500 capture oligonucleotides about 8 bases in length (6-16 bases may be used under different conditions). A 100 nm nano-bead has approximately 31,000 nm² of surface and can hold up to about 4000 short oligonucleotides. One way to control the density of capture probes is to mix in, in this case, about 8 times more 2-4 base oligonucleotides having the same attachment chemistry as the capture probe. Much smaller nano-beads (20-50 nm) may also be used.
2. Reaction conditions (temperature, pH, salt concentration) are adjusted so that concatemers with over 300 copies attach to nano-beads in significant numbers.
3. The reaction is applied under the same stringent conditions to a support with a 4 mm × 4 mm patterned surface carrying 16 million active sites about 200 nm in size, and the nano-beads are allowed or forced to settle on the substrate surface, bringing large concatemers with them. The largest distance that a nano-bead-concatemer has to travel is about 1 mm. The vertical movement of beads minimizes the number of potential concatemer-concatemer encounters. The reaction solution may be applied in aliquots, e.g. 4 applications of 5 µl each. In this case the thickness of the applied solution (i.e. the nano-bead maximal travel distance) is only about 250 microns.
4. The stringency of the reaction is further increased to release concatemers from the nano-beads and attach them to active sites on the support, which carry ~300 capture oligonucleotides 20-50 bases in length.
5. Concatemers attached to nano-beads will predominantly settle initially between active sites on the support, because there is 25 times more inactive than active surface. A slight horizontal force (e.g. substrate tilting or other forces) may be applied to move the nano-bead-concatemers around by about one to a few microns.
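The quantities in this protocol can be cross-checked with simple arithmetic. The Python sketch below assumes spherical beads, reads "100 nm" as the bead diameter, assumes the applied solution spreads evenly over exactly the 4 mm × 4 mm patterned area, and treats each 200 nm active site as a 200 nm × 200 nm square; the function names are illustrative only.

```python
import math


def bead_surface_nm2(diameter_nm: float) -> float:
    """Surface area of a spherical bead, 4*pi*r**2."""
    return 4 * math.pi * (diameter_nm / 2) ** 2


def capture_probes_per_bead(capacity: int, filler_to_capture: float) -> float:
    """Capture probes per bead when short filler oligos share the attachment sites."""
    return capacity / (1 + filler_to_capture)


def liquid_height_mm(volume_ul: float, area_mm2: float) -> float:
    """Depth of a liquid layer; 1 microliter equals 1 cubic millimeter."""
    return volume_ul / area_mm2


def active_fraction(n_sites: float, site_nm: float, area_mm2: float) -> float:
    """Fraction of the patterned area covered by square active sites."""
    site_mm = site_nm * 1e-6
    return n_sites * site_mm ** 2 / area_mm2


concatemers = 20 * 1_000_000   # 20 ul at one million concatemers per ul
beads = 20_000_000
print("beads per concatemer:", beads / concatemers)
print("100 nm bead surface (nm^2):", round(bead_surface_nm2(100)))
print("capture probes per bead:", round(capture_probes_per_bead(4000, 8)))
print("layer depth, 20 ul (mm):", liquid_height_mm(20, 16))
print("layer depth, 5 ul (mm):", liquid_height_mm(5, 16))
frac = active_fraction(16e6, 200, 16)
print("active fraction:", round(frac, 3), "total/active:", round(1 / frac, 1))
```

Under these assumptions the full 20 µl forms a layer about 1.25 mm deep and each 5 µl aliquot about 0.31 mm, the same order as the rounded figures in the protocol, and active sites cover about 4% of the patterned area, a total-to-active ratio of about 25.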
Detection Instrumentation
[0082] As mentioned above, signals from single molecules on random
arrays made in accordance with the invention are generated and
detected by a number of detection systems, including, but not
limited to, scanning electron microscopy, near field scanning
optical microscopy (NSOM), total internal reflection fluorescence
microscopy (TIRFM), and the like. Abundant guidance is found in the
literature for applying such techniques for analyzing and detecting
nanoscale structures on surfaces, as evidenced by the following
references that are incorporated by reference: Reimer et al,
editors, Scanning Electron Microscopy: Physics of Image Formation
and Microanalysis, 2nd Edition (Springer, 1998); Nie et al,
Anal. Chem., 78: 1528-1534 (2006); Hecht et al, Journal of Chemical
Physics, 112: 7761-7774 (2000); Zhu et al, editors, Near-Field
Optics: Principles and Applications (World Scientific Publishing,
Singapore, 1999); Drmanac, International patent publication WO
2004/076683; Lehr et al, Anal. Chem., 75: 2414-2420 (2003);
Neuschafer et al, Biosensors & Bioelectronics, 18: 489-497
(2003); Neuschafer et al, U.S. Pat. No. 6,289,144; and the like. Of
particular interest is TIRFM, for example, as disclosed by
Neuschafer et al, U.S. Pat. No. 6,289,144; Lehr et al (cited
above); and Drmanac, International patent publication WO
2004/076683. In one aspect, instruments for use with arrays of the
invention comprise three basic components: (i) a fluidics system
for storing and transferring detection and processing reagents,
e.g. probes, wash solutions, and the like, to an array; (ii) a
reaction chamber, or flow cell, holding or comprising an array and
having flow-through and temperature control capability; and (iii)
an illumination and detection system. In one embodiment, a flow
cell has a temperature control subsystem with ability to maintain
temperature in the range from about 5-95 °C, or more
specifically 10-85 °C, and can change temperature at a
rate of about 0.5-2 °C per second.
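With the ramp rates given above, the time a flow cell spends changing temperature can be estimated directly. The Python sketch below assumes a constant, linear ramp and ignores overshoot and settling; the 10 °C and 85 °C set points are simply the ends of the narrower range quoted above, and `ramp_time_s` is a hypothetical helper.

```python
CONTROL_RANGE_C = (5.0, 95.0)  # flow-cell temperature range stated in the text


def ramp_time_s(start_c: float, end_c: float, rate_c_per_s: float) -> float:
    """Time to move between two set points at a constant, linear ramp rate."""
    low, high = CONTROL_RANGE_C
    for t in (start_c, end_c):
        if not low <= t <= high:
            raise ValueError(f"{t} C is outside the {low}-{high} C control range")
    if rate_c_per_s <= 0:
        raise ValueError("ramp rate must be positive")
    return abs(end_c - start_c) / rate_c_per_s


for rate in (0.5, 2.0):  # ends of the stated 0.5-2 C per second range
    print(f"10 -> 85 C at {rate} C/s: {ramp_time_s(10.0, 85.0, rate):.1f} s")
```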
[0083] In some cases, the system hardware may be described as
consisting of five major components: the robotic fluid handling
system, the reaction chamber, the temperature control system, the
illumination system and the detection system.
[0084] Reaction flow cell. Each DNA array segment may be housed in
a separate flow cell, allowing cycles to be run asynchronously.
Each flow cell provides temperature control, physically indexes the
substrate, and creates a fluid path over the active area of the
substrate. The active area of a flow cell may be determined by how
many unit sub-arrays each flow cell contains. For an eight
flow-cell system, each flow cell may contain an active area of
192 square millimeters (48 unit sub-arrays of 4 square millimeters
each) in a 6×8 arrangement. Similarly, in a 16
flow-cell system, each flow cell may have a 1 cm × 1.5 cm
substrate with 4×6 unit sub-arrays. A side port is connected
to a dedicated syringe pump, which "pulls" or "pushes" fluid from
the flow cell. A thin optical window may be installed in
the flow cell. This window may allow the top surface of the
substrate to be imaged. The DNB array substrate cannot be imaged
through the bottom due to the required substrate thickness. The
placement of the optical window over the DNB array creates thermal
regulation difficulties. The thin cross section of fluid between
the array substrate and optical window may cool or heat to room
temperature relatively quickly. Creating a pocket above the optical
window may allow filling the area directly above the window with
optical oil. This oil may act as a thermal transfer medium
connecting the top of the thin optical window to the temperature
controlled flow cell body. The use of optical oil as the thermal
transfer medium may also allow designing a lens system with a
numerical aperture (NA) better than 1.0. Various other solutions
are possible and may be explored.
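The flow-cell geometries above can be checked by treating each unit sub-array as a square. In the Python sketch below, the eight-flow-cell case uses 48 sub-arrays of 4 mm² each (2 mm × 2 mm) in a 6×8 grid, which reproduces the 192 square millimeter figure, and the sixteen-flow-cell case divides the 1 cm × 1.5 cm substrate into a 4×6 grid. The square sub-array shape and the `flow_cell_layout` helper are assumptions for illustration.

```python
import math


def flow_cell_layout(n_cols: int, n_rows: int, subarray_area_mm2: float) -> dict:
    """Active-area bookkeeping for a grid of square unit sub-arrays."""
    side = math.sqrt(subarray_area_mm2)  # assumes square sub-arrays
    return {
        "subarrays": n_cols * n_rows,
        "subarray_side_mm": side,
        "active_area_mm2": n_cols * n_rows * subarray_area_mm2,
        "footprint_mm": (n_cols * side, n_rows * side),
    }


eight = flow_cell_layout(6, 8, 4.0)                  # eight-flow-cell system
sixteen = flow_cell_layout(4, 6, 10.0 * 15.0 / 24)   # sixteen-flow-cell system
print(eight)
print(sixteen)
print("active area per system (mm^2):",
      8 * eight["active_area_mm2"], 16 * sixteen["active_area_mm2"])
```

In the sixteen-flow-cell case each unit sub-array works out to 2.5 mm × 2.5 mm, slightly larger than the 2 mm × 2 mm sub-arrays of the eight-flow-cell case.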
[0085] A second port is connected to
a funnel like mixing chamber that is equipped with a liquid level
sensor. The solutions are dispensed into the mixing chamber, mixed
if needed, then drawn into the flow cell. When the level sensor
detects air in the funnel's connection to the flow cell, the pump
is reversed a known amount to back the fluid up to the funnel. This
prevents air from entering the flow cell. This design has worked
well for the small test substrates and may be scaled up for the
random array substrates. Solexa (5) is attempting sequencing
by synthesis on random array substrates with non-amplified or
in-situ amplified DNA. Cycles of fluorescent nucleotide addition
result in read lengths of about 25 bases that are then used to
assemble and align the final sequence to a reference sequence.
Researchers (6, 7) and companies such as Helicos Biosciences are
also attempting sequencing by synthesis from non-amplified
templates. The main limitations of these methods are short read
lengths, leading to incomplete sequence determination. Furthermore,
reading only one base per DNA per cycle, combined with random
attachment of DNA, requires larger array surfaces and larger
numbers of CCD pixels per DNA sample, which leads to higher genome
sequencing costs.
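The funnel and level-sensor arrangement described for the flow cell amounts to a simple control rule: pull fluid until the sensor sees air at the funnel's connection to the flow cell, then reverse the pump by a fixed volume. The Python sketch below simulates that rule with a hypothetical `FunnelModel` in place of real pump and sensor hardware; the step and back-up volumes are arbitrary example values.

```python
from dataclasses import dataclass


@dataclass
class FunnelModel:
    """Toy stand-in for the funnel, level sensor, and syringe pump.

    drawn_ul is the net volume the pump has pulled out of the funnel. The
    level sensor sits on the funnel-to-flow-cell connection, so it reports
    air once more than funnel_ul has been drawn.
    """
    funnel_ul: float
    drawn_ul: float = 0.0

    def sensor_reads_air(self) -> bool:
        return self.drawn_ul > self.funnel_ul

    def pump(self, ul: float) -> None:
        # Positive values pull toward the flow cell; negative values push back.
        self.drawn_ul += ul


def draw_into_flow_cell(f: FunnelModel, step_ul: float = 1.0,
                        backup_ul: float = 2.0, max_steps: int = 100_000) -> float:
    """Pull until air is sensed, then reverse a known amount."""
    for _ in range(max_steps):
        if f.sensor_reads_air():
            f.pump(-backup_ul)  # back the fluid up to the funnel
            return f.drawn_ul
        f.pump(step_ul)
    raise RuntimeError("level sensor never detected air")


funnel = FunnelModel(funnel_ul=50.0)
print(draw_into_flow_cell(funnel), funnel.sensor_reads_air())
```

Because the pump can overshoot by at most one step before air is sensed, choosing a back-up volume larger than the step volume keeps air out of the flow cell in this model.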
[0086] In one aspect, a flow cell for 1-inch square, 170 micrometer
thick cover slips can be used that has been derivatized to bind
macromolecular structures of the invention. The cell encloses the
"array" by sandwiching the glass and a gasket between two planes.
One plane has an opening of sufficient size to permit imaging, and
an indexing pocket for the cover slip. The other plane has an
indexing pocket for the gasket, fluid ports, and a temperature
control system. One fluid port is connected to a syringe pump, which
"pulls" or "pushes" fluid from the flow cell; the other port is
connected to a funnel-like mixing chamber. The chamber, in turn, is
equipped with a liquid level sensor. The solutions are dispensed
into the funnel, mixed if needed, then drawn into the flow cell.
When the level sensor reads air in the funnel's connection to the
flow cell the pump is reversed a known amount to back the fluid up
to the funnel. This prevents air from entering the flow cell. The
cover slip surface may be sectioned off and divided into strips to
accommodate fluid flow/capillary effects caused by sandwiching.
Such substrate may be housed in an "open air"/"open face" chamber
to promote even flow of the buffers over the substrate by
eliminating capillary flow effects. Imaging may be accomplished
with a 100× objective using TIRF or epi-illumination and a
1.3-megapixel Hamamatsu ORCA-ER-AG camera on a Zeiss Axiovert 200, or
like system. This configuration images RCR concatemers bound
randomly to a substrate (non-ordered array). Imaging speed may be
improved by decreasing the objective magnification power, using
grid patterned arrays and increasing the number of pixels of data
collected in each image. For example, up to four or more cameras
may be used, preferably in the 10-16 megapixel range. Multiple band
pass filters and dichroic mirrors may also be used to collect pixel
data across up to four or more emission spectra. To compensate for
the lower light collecting power of the decreased magnification
objective, the power of the excitation light source can be
increased. Throughput can be increased by using one or more flow
chambers with each camera, so that the imaging system is not idle
while the samples are being hybridized/reacted. Because the probing
of arrays can be non-sequential, more than one imaging system can
be used to collect data from a set of arrays, further decreasing
assay time.
[0087] During the imaging process, the substrate must remain in
focus. Some key factors in maintaining focus are the flatness of
the substrate, orthogonality of the substrate to the focus plane,
and mechanical forces on the substrate that may deform it.
Substrate flatness can be well controlled; glass plates with better
than 1/4-wave flatness are readily obtained. Uneven
mechanical forces on the substrate can be minimized through proper
design of the hybridization chamber. Orthogonality to the focus
plane can be achieved by a well-adjusted, high precision stage.
Autofocus routines generally take additional time to run, so it is
desirable to run them only when necessary. After each image is
acquired, it will be analyzed using a fast algorithm to determine
whether the image is in focus. If the image is out of focus, the
autofocus routine will run. It will then store the objective's Z
position information to be used upon return to that section of that
array during the next imaging cycle. By mapping the objective's Z
position at various locations on the substrate, we will reduce the
time required for substrate image acquisition.
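The "autofocus only when needed, then remember Z per location" logic can be sketched as a small cache. This is a minimal illustration under stated assumptions: the fast focus check and the slow autofocus routine are passed in as callables (hypothetical stand-ins for instrument-specific code), and a location is identified by a hypothetical (array, section) pair.

```python
from typing import Callable, Dict, Tuple

# Hypothetical location key: (array index, section index) on the substrate.
Location = Tuple[int, int]


class FocusMap:
    """Per-location cache of the objective Z position (illustrative only)."""

    def __init__(self,
                 in_focus: Callable[[Location, float], bool],
                 autofocus: Callable[[Location], float],
                 default_z: float = 0.0) -> None:
        self.in_focus = in_focus      # fast check on the image acquired at (loc, z)
        self.autofocus = autofocus    # slow routine returning the best Z for loc
        self.default_z = default_z
        self.z_map: Dict[Location, float] = {}
        self.autofocus_runs = 0

    def z_for(self, loc: Location) -> float:
        """Return the Z to use at loc, running autofocus only if out of focus."""
        z = self.z_map.get(loc, self.default_z)
        if not self.in_focus(loc, z):
            z = self.autofocus(loc)
            self.autofocus_runs += 1
        self.z_map[loc] = z  # reused on return to this location next cycle
        return z


# Usage: a slightly tilted substrate whose true focus varies with position.
def true_z(loc: Location) -> float:
    return 0.1 * loc[0] + 0.05 * loc[1]


def within_depth_of_field(loc: Location, z: float) -> bool:
    return abs(z - true_z(loc)) < 0.01


fmap = FocusMap(within_depth_of_field, true_z)
locations = [(a, s) for a in range(3) for s in range(4)]
for imaging_cycle in range(3):
    for loc in locations:
        fmap.z_for(loc)
```

In this toy run autofocus fires only during the first imaging cycle (once per location that starts out of focus); later cycles reuse the stored Z map.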
[0088] A suitable illumination and detection system for
fluorescence-based signal is a Zeiss Axiovert 200 equipped with a
TIRF slider coupled to an 80-milliwatt, 532 nm solid-state laser.
The slider illuminates the substrate through the objective at the
correct TIRF illumination angle. TIRF can also be accomplished
without the use of the objective by illuminating the substrate
through a prism optically coupled to the substrate. Planar
waveguides can also be used to implement TIRF on the substrate. Epi
illumination can also be employed. The light source can be
rastered, spread beam, coherent, incoherent, and originate from a
single or multi-spectrum source.
[0089] One embodiment of the imaging system contains a 20× lens
with a 1.25 mm field of view, with detection accomplished by a
10-megapixel camera. Such a system images approximately 1.5 million
concatemers attached to the patterned array at 1 micron pitch.
Under this configuration there are approximately 6.4 pixels per
concatemer. The number of pixels per concatemer can be adjusted by
increasing or decreasing the field of view of the objective. For
example, a 1 mm field of view would yield 10 pixels per concatemer,
and a 2 mm field of view would yield 2.5 pixels per concatemer. The
field of view may be
adjusted relative to the magnification and NA of the objective to
yield the lowest pixel count per concatemer that is still capable
of being resolved by the optics, and image analysis software.
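The pixel-per-concatemer figures follow from dividing camera pixels by the number of grid positions in the field. The sketch below assumes the quoted field of view is the side of a square field, that concatemers sit on a square grid at the stated pitch, and that all camera pixels map onto that field.

```python
def concatemers_per_field(fov_mm: float, pitch_um: float) -> float:
    """Grid positions in a square field of side fov_mm at pitch pitch_um."""
    side_um = fov_mm * 1000.0
    return (side_um / pitch_um) ** 2


def pixels_per_concatemer(camera_pixels: float, fov_mm: float, pitch_um: float) -> float:
    """Camera pixels available to each concatemer in the field."""
    return camera_pixels / concatemers_per_field(fov_mm, pitch_um)


for fov in (1.0, 1.25, 2.0):
    n = concatemers_per_field(fov, 1.0)
    px = pixels_per_concatemer(10e6, fov, 1.0)
    print(f"{fov} mm field: {n / 1e6:.2f} million concatemers, {px:.1f} px each")
```

For the 1.25 mm case this gives about 1.56 million concatemers and 6.4 pixels each, matching the approximate values above.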
[0090] Both TIRF and epi illumination allow almost any light source
to be used. One illumination schema is to share a common set of
monochromatic illumination sources (about 4 lasers for 6-8 colors)
among imagers. Each imager collects data at a different wavelength
at any given time, and the light sources are switched to the
imagers via an optical switching system. In such an embodiment, the
illumination source preferably produces at least 6, but more
preferably 8, different wavelengths. Such sources include gas
lasers, multiple diode-pumped solid-state lasers combined through a
fiber coupler, filtered xenon arc lamps, tunable lasers, or the
more novel Spectralum Light Engine, soon to be offered by Tidal
Photonics. The Spectralum Light Engine uses a prism to
spectrally separate light. The spectrum is projected onto a Texas
Instruments Digital Light Processor, which can selectively reflect
any portion of the spectrum into a fiber or optical connector. This
system is capable of monitoring and calibrating the power output
across individual wavelengths to keep them constant so as to
automatically compensate for intensity differences as bulbs age or
between bulb changes.
[0091] The following table represents examples of possible lasers,
dyes, and filters (filter values are center wavelength/bandwidth in
nm; dye values are excitation/emission maxima in nm).
TABLE-US-00001
| Laser | Excitation filter | Emission filter | Dye (ex/em) |
| 407 nm | 405/12 | 436/12 | Alexa-405 (401/421) |
| 407 nm | 405/12 | 546/10 | Cascade Yellow (409/558) |
| 488 nm | 488/10 | 514/11 | Alexa-488 (492/517) |
| 543 nm | 546/10 | 540/565 | Tamra (540/565) |
| 543 nm | 546/10 | 620/12 | Bodipy 577/618 (577/618) |
|  | 546/10 | 620/12 | Alexa-594 (594/613) |
| 635 nm | 635/11 | 650/11 | Alexa-635 (632/647) |
| 635 nm | 635/11 |  | Alexa700 (702/723) |
[0092] Successfully scoring 6 billion concatemers through ~350
(~60 per color) images per region over 24 hours may require a
combination of parallel image acquisition, increased image
acquisition speed, and increased field of view for each imager.
Additionally, the imager may support between six and eight colors.
Commercially available microscopes commonly image a ~1 mm field of
view at 20× magnification with an
NA of 0.8. At the proposed concatemer pitch of 0.5 micron, this
translates into roughly 4 million concatemers per image. This
yields approximately 1,500 images for 6 billion spots per
hybridization cycle, or 0.5 million images for 350 imaging cycles.
In a large-scale sequencing operation, each imager preferably
acquires 200,000 images per day, based on a 300-millisecond
exposure time with a 16-megapixel CCD. Thus, a preferred instrument
design is 4 imager modules, each serving 4 flow cells (16 flow cells
total). The above-described imaging schema assumes that each imager
has a CCD detector with 10 million pixels and is used with an
exposure time of roughly 300 milliseconds. This should be an
acceptable method for collecting data for 6 fluorophore labels. One
possible drawback to this imaging technique is that certain
fluorophores may be unintentionally photobleached by the light
source while other fluorophores are being imaged. Keeping the
illumination power low and exposure times to a minimum would
greatly reduce photobleaching. By using intensified CCDs (ICCDs),
data of roughly the same quality could be collected with
illumination intensities and exposure times that are orders of
magnitude lower than with standard CCDs. ICCDs are generally
available in the 1-1.4 megapixel range. Because they require much
shorter exposure times, a one-megapixel ICCD can acquire ten or more
images in the time a standard CCD acquires a single image. Used in
conjunction with fast filter wheels and a high-speed flow cell
stage, a one-megapixel ICCD should be able to collect the same
amount of data as a 10-megapixel standard CCD.
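The image budget in this paragraph can be checked with a few lines of arithmetic. This sketch uses only the figures stated above; the 24-hour exposure ceiling deliberately ignores stage, filter, and focus overhead, so it is an upper bound rather than a throughput estimate.

```python
import math

CONCATEMERS = 6e9               # spots scored per hybridization cycle
PITCH_UM = 0.5                  # proposed concatemer pitch
FOV_MM = 1.0                    # ~1 mm field at 20x, NA 0.8
IMAGING_CYCLES = 350            # ~60 per color
IMAGES_PER_IMAGER_DAY = 200_000

spots_per_image = (FOV_MM * 1000 / PITCH_UM) ** 2           # 4 million
images_per_cycle = CONCATEMERS / spots_per_image             # 1,500
total_images = images_per_cycle * IMAGING_CYCLES             # 525,000 (~0.5 million)

# Ceiling on back-to-back 300 ms exposures in 24 h, with no other overhead.
max_exposures_per_day = 24 * 3600 / 0.300                    # 288,000
imagers_needed = math.ceil(total_images / IMAGES_PER_IMAGER_DAY)

print(f"{images_per_cycle:,.0f} images/cycle, {total_images:,.0f} total, "
      f"{imagers_needed} imagers at {IMAGES_PER_IMAGER_DAY:,} images/day")
```

At 200,000 images per imager per day, three imagers would just cover the ~0.5 million images, so the four-imager design described above leaves some headroom.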
[0093] Optics capable of imaging larger fields of view with high
numerical apertures can be manufactured as custom lens assemblies.
Indications are that 20× optics capable of imaging a 3 mm field of
view with an NA >0.9 can be fabricated. Two such imaging systems, in
combination with high-pixel-count CCDs or CCD mosaic arrays, should
be able to image the complete eight-flow-cell assay in roughly 14
hours. As described, further gains can be realized by using 16 flow
cells. Doubling the number of flow cells would reduce imaging time
to 9 hours by reducing the number of images per field of view.
[0094] The reaction efficiency on the concatemer and other random
DNA arrays may depend on the efficient use of probes, anchors or
primers and enzymes. This may be achieved by mixing liquids (such
as moving liquid back and forth in the flow-through chamber),
applying agitation, or using horizontal or vertical electric fields
to bring DNA from different parts of the reaction volume into
proximity with the surface. One approach for an efficient, low-cost
assay reaction is to apply reaction mixes in a thin layer, such as
droplets or layers about one to a few microns, but preferably less
than 10 microns, in size/thickness. In a 1×1×1 micron volume
designated for a 1×1 micron spot area, at 1 pmol/1 ul (1 uM
concentration) there would be roughly 600 molecules of probe (on
the order of 1,000) in close proximity to 1-1000 copies of DNA.
Using up to 100-300 molecules of probe would not significantly
reduce the probe concentration, and it would provide enough reacted
probes to get significant signal. This approach may be used in an open
reaction chamber that may stay open or closed for removal and
washing of the probes and enzyme.
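The per-spot probe count is a unit conversion: concentration times volume times Avogadro's number. This sketch assumes a layer of uniform thickness over a square spot; the 300-probe consumption figure is the upper end of the range quoted above, and the 10 micron case is the stated upper bound on layer thickness.

```python
AVOGADRO = 6.02214076e23   # molecules per mole (exact SI value)


def molecules_per_spot(conc_molar: float, spot_um: float = 1.0, layer_um: float = 1.0) -> float:
    """Probe molecules in a layer of thickness layer_um over a spot_um x spot_um area."""
    volume_l = spot_um * spot_um * layer_um * 1e-15   # 1 um^3 = 1 fL = 1e-15 L
    return conc_molar * volume_l * AVOGADRO


for layer in (1.0, 10.0):
    n = molecules_per_spot(1e-6, layer_um=layer)      # 1 uM = 1 pmol/uL
    consumed = 300 / n                                 # fraction used if 300 probes react
    print(f"{layer:>4} um layer: ~{n:.0f} probes per spot, 300 reacted = {consumed:.0%}")
```

A 1 micron layer at 1 uM holds about 600 probe molecules per 1×1 micron spot; thicker layers within the stated 10 micron limit scale this linearly.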
[0095] The physical makeup of the machine will include a number of
additions to the standard microscope. A large area automated plate
stage may be added to the microscope. This stage will accommodate
the two substrates needed for each decoding assay. Another
possibility is to use two smaller substrates that can fit in the
standard plate stage. Each substrate will be fitted into a cassette
and those cassettes will be fitted on to the stage. The cassette
will index the substrate to the stage and provide a method to
contain fluids over the assay substrate. Cassettes will have ports
to facilitate the addition and removal of large volumes of buffer.
They will also provide a means to control the temperature of the
substrate through a connection with a temperature control subsystem
able to maintain temperature in the range of about 5-95 °C (or more
specifically 10-85 °C) and to change temperature during a cycle at
about 0.5-2 °C per second. Another key component is the 3-axis
robotic gantry, which will be equipped with a syringe-pump-actuated
pipetting head. This robotic pipetter will be used to add the probe
pools to each cassette. Syringe pumps will be used to pump buffers
into and out of each cassette. In another embodiment, the robotic
pipetting may be replaced with pump- and valve-based automation of
decoding probe pool delivery. In yet another embodiment, all
reagents and substrates may be contained on a microfluidic chip.
[0096] As mentioned above, higher throughput can be achieved by
using multiple cameras and multiple flow cells. A single robotic
liquid handling gantry may service, for example, 16 flow cells. In
addition, all components of the system may share a common
temperature control system, and set of reagents. For combinatorial
SBH sequencing operations, the robot may prepare probe pools and
ligation buffers to be dispensed into the flow cell funnels.
Dedicated syringe pumps may dispense wash and hybridization buffers
directly into the funnel ports for each flow cell. Each imager may
service a group of 2-4 flow cells. Each group of flow cells may be
positioned on an XY motion platform, similar to the automated plate
stages commonly found on research microscopes. System control and
coordination between all system components may be performed via
software running on a master computer. The control software may run
assay cycles asynchronously, allowing each imager to run
continuously throughout the assay. Flow cells are connected to a
temperature control system with one heater and one chiller,
allowing each flow cell, or blocks of 2-4 cells, to be heated or
cooled independently on demand. Each flow cell temperature may be
monitored: if a flow cell temperature drops below a lower set
threshold, a valve may open to a hot water recirculation; likewise,
if a flow cell temperature rises above an upper set threshold, a
valve may open to a cold water recirculation. If a flow cell is
within the set temperature range, neither valve opens. The hot and
cold recirculation water runs through the aluminum flow cell body
but remains separate and isolated from the assay buffers and
reagents.
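The per-flow-cell valve decision amounts to a dead-band (bang-bang) controller: hot below the band, cold above it, neither inside it. This sketch assumes a symmetric band around a setpoint; the flow cell names and the 0.5 °C half-width are hypothetical, and actual valve actuation is left out.

```python
from enum import Enum
from typing import Dict


class Valve(Enum):
    NONE = "none"
    HOT = "hot water recirculation"
    COLD = "cold water recirculation"


def select_valve(temp_c: float, low_c: float, high_c: float) -> Valve:
    """Open the hot valve below the band, the cold valve above it, neither inside it."""
    if temp_c < low_c:
        return Valve.HOT
    if temp_c > high_c:
        return Valve.COLD
    return Valve.NONE


def control_step(temps: Dict[str, float], setpoints: Dict[str, float],
                 band_c: float = 0.5) -> Dict[str, Valve]:
    """One polling pass over all flow cells; band_c is a hypothetical half-width."""
    return {cell: select_valve(t, setpoints[cell] - band_c, setpoints[cell] + band_c)
            for cell, t in temps.items()}


valves = control_step({"fc1": 20.0, "fc2": 25.2, "fc3": 31.0},
                      {"fc1": 25.0, "fc2": 25.0, "fc3": 30.0})
```

Because each flow cell is evaluated independently against its own setpoint, the same loop serves single cells or 2-4 cell blocks sharing one heater and one chiller.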
Sequence Analysis of Random Arrays of Target Sequence
Concatemers
[0097] As mentioned above, random arrays of biomolecules, such as
genomic DNA fragments or cDNA fragments, provide a platform for
large scale sequence determination and for genome-wide measurements
based on counting sequence tags, in a manner similar to
measurements made by serial analysis of gene expression (SAGE) or
massively parallel signature sequencing, e.g. Velculescu, et al,
(1995), Science 270, 484-487; and Brenner et al (2000), Nature
Biotechnology, 18: 630-634. Such genome-wide measurements include,
but are not limited to, determination of polymorphisms, including
nucleotide substitutions, deletions, and insertions, inversions,
and the like, determination of methylation patterns, copy number
patterns, and the like, such as could be carried out by a wide
range of assays known to those with ordinary skill in the art, e.g.
Syvanen (2005), Nature Genetics Supplement, 37: S5-S10; Gunderson
et al (2005), Nature Genetics, 37: 549-554; Fan et al (2003), Cold
Spring Harbor Symposia on Quantitative Biology, LXVIII: 69-78; and
U.S. Pat. Nos. 4,883,750; 6,858,412; 5,871,921; 6,355,431; and the
like, which are incorporated herein by reference.
[0098] A variety of sequencing methodologies can be used with
random arrays of the invention, including, but not limited to,
hybridization-based methods, such as disclosed in Drmanac, U.S.
Pat. Nos. 6,864,052; 6,309,824; and 6,401,267; and Drmanac et al,
U.S. patent publication 2005/0191656, which are incorporated by
reference, sequencing by synthesis methods, e.g. Nyren et al, U.S.
Pat. No. 6,210,891; Ronaghi, U.S. Pat. No. 6,828,100; Ronaghi et al
(1998), Science, 281: 363-365; Balasubramanian, U.S. Pat. No.
6,833,246; Quake, U.S. Pat. No. 6,911,345; Li et al, Proc. Natl.
Acad. Sci., 100: 414-419 (2003), which are incorporated by
reference, and ligation-based methods, e.g. Shendure et al (2005),
Science, 309: 1728-1739, which is incorporated by reference.
[0099] Combination of probe hybridization or probe-probe ligation
with other DNA array based short read sequencing methods. There are
many approaches to determine about 10-100 bases of sequence per DNA
sample on an array of DNA samples. There are various sequencing by
synthesis (SBS) methods (Solexa, 454), including primer extension
methods, ligation-based methods (4), and degradation/ligation-based
methods (Lynx). All of these methods may be combined with probe
hybridization or probe-probe ligation data to provide longer read
lengths with small numbers of cycles, giving higher accuracy. DNA
arrays may be prepared from DNA samples about 100-1000 bases in
length, in which the sequences of one or two segments close to the
adapters/primers/anchors are determined by positional methods (SBS
and others). The same DNA array is then subjected to probe
hybridization or probe-probe combinatorial ligation on the entire
DNA or on a part that is still in the form of ssDNA.
[0100] In one aspect, a method of determining a nucleotide sequence
of a target polynucleotide in accordance with the invention
comprises the following steps: (a) generating a plurality of target
concatemers from the target polynucleotide, each target concatemer
comprising multiple copies of a fragment of the target
polynucleotide and the plurality of target concatemers including a
number of fragments that substantially covers the target
polynucleotide; (b) forming a random array of target concatemers
fixed to a surface at a density such that at least a majority of
the target concatemers are optically resolvable; (c) identifying a
sequence of at least a portion of each fragment in each target
concatemer; and (d) reconstructing the nucleotide sequence of the
target polynucleotide from the identities of the sequences of the
portions of fragments of the concatemers. Usually, "substantially
covers" means that the amount of DNA analyzed contains an
equivalent of at least two copies of the target polynucleotide, or
in another aspect, at least ten copies, or in another aspect, at
least twenty copies, or in another aspect, at least 100 copies.
Target polynucleotides may include DNA fragments, including genomic
DNA fragments and cDNA fragments, and RNA fragments. Guidance for
the step of reconstructing target polynucleotide sequences can be
found in the following references, which are incorporated by
reference: Lander et al, Genomics, 2: 231-239 (1988); Vingron et
al, J. Mol. Biol., 235: 1-12 (1994); and like references.
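Step (d), reconstructing the target from fragment sequences, can be illustrated with a greedy overlap merge. This is a minimal sketch and is not the method of the cited references: it assumes error-free reads from one strand, exact suffix-prefix overlaps, and no repeats, and the function names and the `min_len` parameter are hypothetical.

```python
from typing import List


def overlap(a: str, b: str, min_len: int) -> int:
    """Longest suffix of a equal to a prefix of b (at least min_len), else 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Repeatedly merge the pair of contigs with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads contained inside another read.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while len(contigs) > 1:
        best = (0, -1, -1)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            break  # no overlaps left: return the separate contigs
        merged = contigs[i] + contigs[j][k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (i, j)] + [merged]
    return contigs


reads = ["ACGTTGCA", "TGCAATGC", "ATGCCGTA"]
print(greedy_assemble(reads))
```

Greedy merging is easy to follow but can mis-join repeated regions; the redundant coverage required by "substantially covers" is what makes real reconstruction robust.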
[0101] In one aspect, a sequencing method for use with the
invention for determining sequences in a plurality of DNA or RNA
fragments comprises the following steps: (a) generating a plurality
of polynucleotide molecules each comprising a concatemer of a DNA
or RNA fragment; (b) forming a random array of polynucleotide
molecules fixed to a surface at a density such that at least a
majority of the target concatemers are optically resolvable; and
(c) identifying a sequence of at least a portion of each DNA or RNA
fragment in resolvable polynucleotides using at least one chemical
reaction of an optically detectable reactant. In one embodiment,
such optically detectable reactant is an oligonucleotide. In
another embodiment, such optically detectable reactant is a
nucleoside triphosphate, e.g. a fluorescently labeled nucleoside
triphosphate that may be used to extend an oligonucleotide
hybridized to a concatemer. In another embodiment, such optically
detectable reactant is an oligonucleotide formed by ligating first
and second oligonucleotides that form adjacent duplexes on a
concatemer. In another embodiment, such chemical reaction is
synthesis of DNA or RNA, e.g. by extending a primer hybridized to a
concatemer. In yet another embodiment, the above optically
detectable reactant is a nucleic acid binding oligopeptide or
polypeptide or protein.
[0102] In one aspect, parallel sequencing of polynucleotide
analytes of concatemers on a random array is accomplished by
combinatorial SBH (cSBH), as disclosed by Drmanac in the
above-cited patents. In one aspect, first and second sets of
oligonucleotide probes are provided, wherein each set has member
probes that comprise oligonucleotides having every possible
sequence for the defined length of probes in the set. For example,
if a set contains probes of length six, then it contains 4096
(=4^6) probes. In another aspect, first and second sets of
oligonucleotide probes comprise probes having selected nucleotide
sequences designed to detect selected sets of target
polynucleotides. Sequences are determined by hybridizing one probe
or pool of probes, hybridizing a second probe or a second pool of
probes, ligating probes that form perfectly matched duplexes on
their target sequences, identifying those probes that are ligated
to obtain sequence information about the target sequence, repeating
the steps until all the probes or pools of probes have been
hybridized, and determining the nucleotide sequence of the target
from the sequence information accumulated during the hybridization
and identification steps.
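The cSBH logic (complete probe sets, ligation of adjacent perfectly matched pairs, then assembly of the ligated products) can be sketched in a simplified form. Assumptions: probes are written in target-strand sense, so a pair counts as ligated when `p1 + p2` occurs in the target (real probes are complementary); detection is error-free; no (2k-1)-mer repeats occur; and 3-mer sets are used instead of the 6-mer sets above to keep the pair search small. All function names are hypothetical.

```python
from itertools import product
from typing import Iterable, List, Set, Tuple

BASES = "ACGT"


def all_probes(k: int) -> List[str]:
    """Every possible k-mer, i.e. a complete probe set of 4**k members."""
    return ["".join(p) for p in product(BASES, repeat=k)]


def positive_ligations(target: str, set1: List[str], set2: List[str]) -> Set[Tuple[str, str]]:
    """Probe pairs that form adjacent, perfectly matched duplexes and ligate."""
    n = len(set1[0]) + len(set2[0])
    words = {target[i:i + n] for i in range(len(target) - n + 1)}
    return {(p1, p2) for p1, p2 in product(set1, set2) if p1 + p2 in words}


def reconstruct(products: Iterable[str]) -> str:
    """Chain ligation products that overlap by all but one base."""
    word_set = set(products)
    by_prefix = {w[:-1]: w for w in word_set}
    suffixes = {w[1:] for w in word_set}
    seq = current = next(w for w in word_set if w[:-1] not in suffixes)
    for _ in range(len(word_set) - 1):
        current = by_prefix.get(current[1:])
        if current is None:
            break
        seq += current[-1]
    return seq


probes = all_probes(3)          # 64 probes; a 6-mer set would have 4096
target = "GATTACAGGCTTCA"
pairs = positive_ligations(target, probes, probes)
print(len(pairs), reconstruct(p1 + p2 for p1, p2 in pairs))
```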
[0103] For sequencing operation, in some embodiments, the sets may
be divided into subsets that are used together in pools, as
disclosed in U.S. Pat. No. 6,864,052. Probes from the first and
second sets may be hybridized to target sequences either together
or in sequence, either as entire sets or as subsets, or pools. In
one aspect, lengths of the probes in the first or second sets are
in the range of from 5 to 10 nucleotides, and in another aspect, in
the range of from 5 to 7 nucleotides, so that when ligated they
form ligation products with a length in the range of from 10 to 20,
and from 10 to 14, respectively.
[0104] A useful feature of the probe-probe assay is that only a
small subset of all 6-mers needs to be scored for each 300-600 base
fragment to allow efficient mapping of DNA fragments. In addition,
redundant base reading with all 6-mers is achieved by combining
data from overlapping DNA fragments. This is especially efficient
when combined with 40-base reads obtained by probe-anchor ligation
(see algorithm description in D.2.7). It is possible to score 1/16
of all 6-mers for each DNB by creating 16 subsets of 256 6-mers
(i.e. a total of 4096 6-mers) and 16 array sections from a 3-6
billion whole genome DNB array. These 16 array sections, comprising
24 2×2 mm unit arrays (see array preparation above), may be
analyzed in parallel in 16 reaction chambers, each with a different
subset of 256 6-mers. The 16 6-mer subsets may be scored by a
combinatorial ligation of 16 N5B3 and 16 B3N5-tail probes each. For
500-base fragments, out of 256 6-mers scored, about 32 may be
positive on average, i.e. ~200 bases may be read for each fragment.
This is more than enough for mapping 500-base fragments, especially
in combination with 40-base end sequence and the hierarchical
fragmentation schema (see algorithm description in C.2.7). In
addition, because there are 250-500 overlapping 500-base fragments
covering each base in the genome, all 6-mers may be scored ~20 times
in these fragments, providing ~120 6-mer reads for each base with 6
overlapping 6-mers. The required 16 chambers are also good for
optics and imaging, since multiple chambers with reactions
staggered in time allow simple continuous use of multiple CCD
cameras.
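The "about 32 positive 6-mers" and "~20 times" figures can be reproduced under a simple model. The sketch assumes a uniformly random fragment read on one strand, with every 6-base window counted as a distinct 6-mer; it is a back-of-envelope check, not a model of real genomic sequence.

```python
K = 6
ALL_KMERS = 4 ** K                   # 4096
CHAMBERS = 16
PER_CHAMBER = ALL_KMERS // CHAMBERS  # 256 6-mers, from 16 N5B3 x 16 B3N5 probes


def expected_positive_kmers(fragment_len: int, k: int = K, scored: int = PER_CHAMBER) -> float:
    """Expected number of the scored k-mers present in a random fragment."""
    return (fragment_len - k + 1) * scored / 4 ** k


positives = expected_positive_kmers(500)            # ~31, the text's "about 32"
bases_read = positives * K                          # ~186, the text's "~200 bases"
times_scored = [n / CHAMBERS for n in (250, 500)]   # ~16-31, the text's "~20 times"
reads_per_base = [t * K for t in times_scored]      # x6 overlapping 6-mers per base
print(positives, bases_read, times_scored, reads_per_base)
```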
[0105] In another aspect, using such techniques, the sequence
identity of each attached DNA concatemer may be determined by a
"signature" approach. About 50 to 100 or possibly 200 probes are
used such that about 25-50% or in some applications 10-30% of
attached concatemers will have a full match sequence for each
probe. This type of data allows each amplified DNA fragment within
a concatemer to be mapped to the reference sequence. For example,
by such a process one can score 64 4-mers (i.e. 25% of all possible
256 4-mers) using 16 hybridization/strip-off cycles in a four-color
labeling schema. On a 60-70 base fragment amplified in a concatemer,
about 16 of the 64 probes will be positive, since a 64-base sequence
contains about 64 4-mers (i.e. roughly one quarter of all possible
4-mers). Unrelated 60-70 base fragments will have a very different
set of about 16 positive decoding probes. A combination of 16
probes out of 64 probes has a chance of random occurrence of about 1
in a billion fragments, which in practice provides a unique
signature for that concatemer. Scoring 80 probes in 20 cycles and
generating 20 positive probes creates a signature even more likely
to be unique: occurrence by chance is 1 in a billion billion.
Previously, a "signature" approach was used to select
novel genes from cDNA libraries. An implementation of a signature
approach is to sort obtained intensities of all tested probes and
select up to a predefined (expected) number of probes that satisfy
the positive probe threshold. These probes will be mapped to
sequences of all DNA fragments (sliding window of a longer
reference sequence may be used) expected to be present in the
array. The sequence that has all or a statistically sufficient
number of the selected positive probes is assigned as the sequence
of the DNA fragment in the given concatemer. In another approach,
an expected signal can be defined for all probes used, based on
their pre-measured full-match and mismatch hybridization/ligation
efficiencies. In this case, a measure similar to a correlation
coefficient can be calculated between observed and expected signals.
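Both signature assignments described above can be sketched in a few functions: (1) rank intensities, keep up to the expected number of probes above threshold, and pick the candidate fragment containing them; (2) correlate observed intensities with expected ones. Assumptions: probes are matched as target-sense substrings, candidate fragments are given as a dictionary, and uniform `full`/`mismatch` values stand in for the pre-measured per-probe efficiencies. Names and test data are hypothetical.

```python
from typing import Dict, List, Optional


def select_positive(intensities: Dict[str, float], expected: int, threshold: float) -> List[str]:
    """Rank probes by intensity and keep up to `expected` that pass the threshold."""
    ranked = sorted(intensities, key=intensities.get, reverse=True)
    return [p for p in ranked[:expected] if intensities[p] >= threshold]


def assign_by_hits(positive: List[str], candidates: Dict[str, str],
                   min_fraction: float = 1.0) -> Optional[str]:
    """Candidate containing all (or min_fraction of) the positive probes."""
    best, best_hits = None, -1
    for name, seq in candidates.items():
        hits = sum(p in seq for p in positive)
        if hits >= min_fraction * len(positive) and hits > best_hits:
            best, best_hits = name, hits
    return best


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx * sy else 0.0


def assign_by_correlation(intensities: Dict[str, float], candidates: Dict[str, str],
                          full: float = 1.0, mismatch: float = 0.1) -> str:
    """Candidate whose expected signal pattern best correlates with observation."""
    probes = list(intensities)
    observed = [intensities[p] for p in probes]

    def score(seq: str) -> float:
        expected = [full if p in seq else mismatch for p in probes]
        return pearson(observed, expected)

    return max(candidates, key=lambda name: score(candidates[name]))


signals = {"ACGT": 0.9, "GGTT": 0.8, "CATG": 0.1, "TTAA": 0.7, "GCGC": 0.05, "AGTC": 0.2}
fragments = {"f1": "CCACGTGGTTAACC", "f2": "CATGGCGCAGTCCA", "f3": "ACGTCCCCCCCCCC"}
positive = select_positive(signals, expected=3, threshold=0.5)
print(positive, assign_by_hits(positive, fragments), assign_by_correlation(signals, fragments))
```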
[0106] A preferred way to score 4-mers is to ligate pairs of
probes, for example N(5-7)BBB with BN(7-9), where B is a defined
base and N is a degenerate base. For generating signatures on
longer DNA concatemers, probes with more defined bases will be
used. For example, a 25% positive rate in a fragment 1000 bases in
length would be achieved by N(4-6)BBBB and BBN(6-8). Note that
longer fragments need the same number of about 60-80 probes (15-20
ligation cycles using 4 colors).
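The "about 25%" positive rate follows from how many k-base windows a fragment has relative to the 4^k possible defined-base combinations. This sketch assumes a uniformly random fragment on one strand; it compares the naive ratio (windows / 4^k) with a version that allows for repeated k-mers.

```python
def naive_fraction(fragment_len: int, defined_bases: int) -> float:
    """Windows divided by probe-set size, ignoring repeated k-mers."""
    return (fragment_len - defined_bases + 1) / 4 ** defined_bases


def positive_fraction(fragment_len: int, defined_bases: int) -> float:
    """Probability that a given k-mer occurs at least once in a random fragment,
    treating windows as independent (an approximation)."""
    windows = fragment_len - defined_bases + 1
    return 1.0 - (1.0 - 4.0 ** -defined_bases) ** windows


for length, k in ((64, 4), (1000, 6)):
    print(length, k, round(naive_fraction(length, k), 3), round(positive_fraction(length, k), 3))
```

Both cases land at roughly 21-24%, i.e. close to the quarter of probes stated above, whether the fragment is 64 bases with 4 defined bases or 1000 bases with 6.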
[0107] In one embodiment, all probes of a given length (e.g. 4096
N(2-4)BBBBBBN(2-4)) or all ligation pairs may be used to determine
the complete sequence of the DNA in a concatemer. For example, 1024
combinations of N(5-7)B3 and BBN(6-8) may be scored (256 cycles if
4 colors are used) to determine the sequence of DNA fragments of up
to about 250 bases, preferably up to about 100 bases.
[0108] Decoding or sequencing probes with large numbers of Ns may
be prepared from multiple syntheses of subsets of sequences at the
degenerate bases to minimize differences in efficiency. Each subset
is added to the mix at a proper concentration. Also, some subsets
may have more degenerate positions than others. For example, each
of the 64 probes from the set N(5-7)BBB may be prepared in 4
different syntheses. The first is the regular synthesis, with all
5-7 N bases fully degenerate; the second is N(0-3)(A,T)5BBB (five
consecutive A/T positions); the third is
N(0-2)(A,T)(G,C)(A,T)(G,C)(A,T)BBB; and the fourth is
N(0-2)(G,C)(A,T)(G,C)(A,T)(G,C)BBB.
[0109] Oligonucleotide preparations from the three specific
syntheses are added to the regular synthesis in experimentally
determined amounts to increase hybrid generation with target
sequences that have, in front of the BBB sequence, an AT-rich
sequence (e.g. AATAT) or an alternating (A or T)/(G or C) sequence
(e.g. ACAGT or GAGAC). These sequences are expected to be less
efficient in forming a hybrid. All 1024 target sequences can be
tested for their efficiency in forming hybrids with
N(0-3)NNNNNBBB probes, and those types that give the weakest
binding may be prepared in about 1-10 additional syntheses and
added to the basic probe preparation.
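The three supplementary syntheses each cover a defined slice of the 1024 possible five-base contexts in front of BBB. The sketch below enumerates them; it swaps the text's (A,T) and (G,C) notation for the standard IUPAC codes W (A/T) and S (G/C), and the helper name `expand` is hypothetical.

```python
from itertools import product

CODES = {"N": "ACGT", "W": "AT", "S": "GC"}   # IUPAC: W = A/T, S = G/C


def expand(pattern: str) -> set:
    """All concrete sequences matched by a degenerate pattern."""
    return {"".join(p) for p in product(*(CODES.get(c, c) for c in pattern))}


contexts = expand("NNNNN")     # the 1024 five-base contexts in front of BBB
at_rich = expand("WWWWW")      # (A,T)5
alt_ws = expand("WSWSW")       # (A,T)(G,C)(A,T)(G,C)(A,T)
alt_sw = expand("SWSWS")       # (G,C)(A,T)(G,C)(A,T)(G,C)

print(len(contexts), len(at_rich), len(alt_ws), len(alt_sw))
```

Each supplementary synthesis covers 32 of the 1024 contexts, and the three sets do not overlap, so together they enrich 96 hard-to-hybridize contexts.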
[0110] Decoding by signatures: a smaller number of probes for a
small number of distinct samples. Having 5-7 positive probes out of
20 probes (5 cycles using 4 colors) provides the capacity to
distinguish about 10-100 thousand distinct fragments.
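The 10-100 thousand capacity corresponds to the number of ways to choose the positive probes out of 20. This counts distinct positive/negative patterns only, so it is an upper bound: real fragments do not spread evenly over all patterns.

```python
from math import comb

for positives in (5, 6, 7):
    print(positives, comb(20, positives))   # 15504, 38760, 77520
```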
[0111] Decoding of 8-20-mer RCR products. In this application,
arrays are formed as random distributions of unique 8- to 20-base
recognition sequences in the form of DNA concatemers. The probes
need to be decoded to determine the sequence of the 8-20 base probe
region. At least two options are available to do this, and the
following example describes the process for a 12-mer. In the first,
one half of the sequence is determined by utilizing the
hybridization specificity of short probes and the ligation
specificity of fully matched hybrids. Six to ten bases adjacent to
the 12-mer are predefined and act as a support for a 6-mer to
10-mer oligonucleotide. This short 6-mer will ligate at its 3-prime
end to one of 4 labeled 6-mers to 10-mers. These decoding probes
consist of a pool of 4 oligonucleotides, in which each
oligonucleotide consists of 4-9 degenerate bases and 1 defined
base. This oligonucleotide will also be labeled with one of four
fluorescent labels. Each of the 4 possible bases A, C, G, or T will
therefore be represented by a fluorescent dye. For example, these 5
groups of 4 oligonucleotides and one universal oligonucleotide (Us)
can be used in the ligation assays to sequence the first 5 bases of
12-mers, where B = each of the 4 bases, associated with a specific
dye or tag at the end:
UUUUUUUU.BNNNNNNN*
UUUUUUUU.NBNNNNNN
UUUUUUUU.NNBNNNNN
UUUUUUUU.NNNBNNNN
UUUUUUUU.NNNNBNNN
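The five probe pools above can be generated and "read out" with a short sketch. Assumptions: probes are written in target-strand sense, so complementarity is ignored; a probe lights when its single defined base matches the target at that position; the dye names are hypothetical placeholders; and the real position-dependent ligation fidelity (the reason for shifting the support, described next) is not modeled.

```python
from typing import Dict

DYES = {"A": "dye1", "C": "dye2", "G": "dye3", "T": "dye4"}   # hypothetical labels


def probe_pool(position: int, read_len: int = 8, support: str = "U" * 8) -> Dict[str, str]:
    """The 4 labeled probes interrogating `position` (1-based) of the unknown region."""
    pool = {}
    for base, dye in DYES.items():
        probe = "N" * (position - 1) + base + "N" * (read_len - position)
        pool[dye] = support + "." + probe
    return pool


def ligates(probe_region: str, target: str) -> bool:
    """A probe matches when every defined (non-N) base equals the target base."""
    return all(p == "N" or p == t for p, t in zip(probe_region, target))


def decode(target: str, n_positions: int = 5) -> str:
    """Read one base per cycle from the single dye that lights in each pool."""
    inverse = {dye: base for base, dye in DYES.items()}
    read = ""
    for pos in range(1, n_positions + 1):
        pool = probe_pool(pos)
        lit = [dye for dye, probe in pool.items() if ligates(probe.split(".")[1], target)]
        if len(lit) != 1:
            raise ValueError(f"ambiguous signal at position {pos}: {lit}")
        read += inverse[lit[0]]
    return read


print(probe_pool(1)["dye1"], decode("GATCCAGTTACG"))
```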
[0112] Six or more bases can be sequenced with additional probe
pools. To improve discrimination at positions near the center of
the 12-mer, the 6-mer oligonucleotide can be positioned further into
the 12-mer sequence. This will necessitate the incorporation of
degenerate bases into the 3-prime end of the non-labeled
oligonucleotide to accommodate the shift. The following are example
decoding probes for positions 6 and 7 in the 12-mer.
UUUUUUNN.NNNBNNNN
UUUUUUNN.NNNNBNNN
[0113] In a similar way, the 6 bases from the right side of the
12-mer can be decoded by using a fixed oligonucleotide and 5-prime
labeled probes. In the above-described system, 6 cycles are
required to define 6 bases of one side of the 12-mer. With
redundant cycle analysis of bases distant from the ligation site,
this may increase to 7 or 8 cycles. In total, then, complete
sequencing of the 12-mer could be accomplished with 12-16 cycles of
ligation. Partial or complete sequencing of arrayed DNA can also be
achieved by combining two distinct types of libraries of detector
probes. In this approach, one set has probes of the general type
N(3-8)B(4-6) (anchors) that are ligated with the first 2, 3, or 4
probes/probe pools from the set BN(6-8), NBN(5-7), N2BN(4-6), and
N3BN(3-5). The main requirement is to test, in a few cycles, a
probe from the first set with 2-4 or even more probes from the
second set to read a longer continuous sequence, such as
(5-6) + (3-4) = 8-10 bases, in just 3-4 cycles.
In one example, the process is:
[0114] 1) Hybridize 1-4 4-mer anchors, or more 5-mer anchors, so
that about 70-80% of DNAs have 1 or 2 anchors each. One way to
determine which anchor from the pool is positive is to mix specific
probes with distinct hybrid stabilities (possibly also with
different numbers of Ns). Anchors may also be tagged to determine
which anchor from the pool is hybridized to a spot. Tags, as
additional DNA segments, may be used for adjustable displacement as
a detection method.
[0115] For example, after hybridization, or hybridization and
ligation, certain probes can be differentially removed with two
corresponding displacers. Separate cycles may be used just to
determine which anchor is positive. For this purpose, anchors
labeled or tagged with multiple colors may be ligated to unlabeled
N7-N10 supporter oligonucleotides.
[0116] 2) Hybridize BNNNNNNNN probes with 4 colors corresponding to
the 4 bases; wash discriminatively (or displace with the complement
of the tag) to read which of the two scored bases is associated
with which anchor if two anchors are positive in one DNA. Thus, two
7-10 base sequences can be scored at the same time.
[0117] In 2-4 cycles, extend the 4-6 base anchor read by an
additional 2-4 bases. Running 16 different anchors per array (32-64
physical cycles if 4 colors are used) determines about 16 possible
8-mers (~100 bases total) per fragment. This is more than enough to
map each fragment to the reference: the probability that a 100-mer
will contain a given set of 10 8-mers by chance is less than 1 in a
trillion trillion (about 10^-28). By combining data from different
anchors scored in parallel on the same fragment in another array,
the complete sequence of that fragment, and by extension of entire
genomes, may be generated from overlapping 7-10-mers.
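The chance-match bound can be estimated by assuming a uniformly random 100-base sequence and treating the occurrences of the 10 specified 8-mers as independent. Both assumptions simplify real genomic sequence, which has repeats and biased composition.

```python
def p_random_match(ref_len: int = 100, k: int = 8, n_kmers: int = 10) -> float:
    """Chance that a random ref_len-mer contains all n_kmers specified k-mers."""
    windows = ref_len - k + 1
    p_one = 1.0 - (1.0 - 4.0 ** -k) ** windows   # a given k-mer appears at least once
    return p_one ** n_kmers


print(f"{p_random_match():.1e}")   # on the order of 1e-29
```

Under this model the probability falls below the 10^-28 bound quoted above.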
[0118] Tagging probes with DNA tags for larger multiplexes of
decoding or sequence-determination probes. Instead of directly
labeling probes, they can be tagged with different oligonucleotide
sequences made of natural bases or new synthetic bases (such as
isoG and isoC). Tags can be designed to have very precise binding
efficiency with their anti-tags by using different oligonucleotide
lengths (about 6-24 bases) and/or sequences, including GC content.
For example, 4 different tags may be designed that can be
recognized with specific anti-tags in 4 consecutive cycles, or in
one hybridization cycle followed by a discriminative wash. In the
discriminative wash, the initial signal is reduced to 95-99%,
30-40%, 10-20%, and 0-5% for each tag, respectively. In this case,
by obtaining two images, 4 measurements are obtained, assuming that
probes with different tags will rarely hybridize to the same dot.
Another benefit of having many different tags, even if they are
consecutively decoded (or 2-16 at a time, labeled with 2-16
distinct colors), is the ability to use a large number of
individually recognizable probes in one assay reaction. In this
way, a 4-64 times longer assay time (which may provide a more
specific or stronger signal) may be affordable if the probes are
decoded in short incubation and removal reactions.
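The two-image discriminative-wash readout can be sketched as a ratio classifier. It assumes one tag per dot (as the text does), uses the retention ranges stated above, and assigns each dot to the tag with the nearest range midpoint; the `min_signal` detection floor and the tag names are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Fraction of the initial signal retained after the discriminative wash.
RETENTION: Dict[str, Tuple[float, float]] = {
    "tag1": (0.95, 0.99),
    "tag2": (0.30, 0.40),
    "tag3": (0.10, 0.20),
    "tag4": (0.00, 0.05),
}


def classify_tag(before: float, after: float, min_signal: float = 100.0) -> Optional[str]:
    """Tag whose retention-range midpoint is nearest the after/before ratio."""
    if before < min_signal:
        return None  # no probe hybridized to this dot
    ratio = after / before
    return min(RETENTION, key=lambda tag: abs(ratio - sum(RETENTION[tag]) / 2))


for before, after in ((1000, 970), (1000, 360), (1000, 120), (1000, 20), (10, 5)):
    print(before, after, classify_tag(before, after))
```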
[0119] The decoding process requires the use of 48-96 or more
decoding probes. These probes will be further combined into 12-24
or more pools by encoding them with four fluorophores, each having
a different emission spectrum. Using a 20× objective, each
6 mm×6 mm array may require roughly 30 images for full coverage
when using a 10-megapixel camera. Each 1-micrometer array area is
read by about 8 pixels. Each image is acquired in 250 milliseconds:
150 ms for exposure and 100 ms to move the stage. Using this fast
acquisition, it will take ~7.5 seconds to image each array, or 12
minutes to image the complete set of 96
arrays on each substrate. In one embodiment of an imaging system,
this high image acquisition rate is achieved by using four
ten-megapixel cameras, each imaging the emission spectra of a
different fluorophore. The cameras are coupled to the microscope
through a series of dichroic beam splitters. The autofocus routine,
which takes extra time, runs only if an acquired image is out of
focus. It will then store the Z axis position information to be
used upon return to that section of that array during the next
imaging cycle. By mapping the autofocus position for each location
on the substrate we will drastically reduce the time required for
image acquisition.
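The imaging figures in [0119] are mutually consistent, which a short Python calculation makes explicit. It assumes the 30 tiles cover the 6 mm × 6 mm array with no overlap and that every camera pixel maps onto the array; the function and parameter names are illustrative.

```python
def imaging_budget(images_per_array=30, exposure_ms=150.0, stage_move_ms=100.0,
                   arrays_per_substrate=96, array_side_mm=6.0,
                   camera_megapixels=10.0):
    """Return (seconds per array, minutes per substrate, pixels per um^2)."""
    per_image_s = (exposure_ms + stage_move_ms) / 1000.0      # 0.25 s per tile
    per_array_s = images_per_array * per_image_s               # all tiles of one array
    per_substrate_min = per_array_s * arrays_per_substrate / 60.0
    array_area_um2 = (array_side_mm * 1000.0) ** 2             # 6 mm -> 6000 um per side
    total_pixels = images_per_array * camera_megapixels * 1e6  # assumes no tile overlap
    return per_array_s, per_substrate_min, total_pixels / array_area_um2


if __name__ == "__main__":
    seconds, minutes, density = imaging_budget()
    print(f"{seconds:.1f} s/array, {minutes:.0f} min/substrate, "
          f"{density:.1f} pixels per um^2")
```

With the default values this reports 7.5 s per array, 12 min per substrate and about 8.3 pixels per µm², in line with the ~8 pixels quoted above.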
[0120] Each array requires about 12-24 cycles to decode. Each cycle
consists of a hybridization, wash, array imaging, and strip-off
step. In the above example, these steps may take 5, 2, 12, and 5
minutes, respectively, for a total of 24 minutes per cycle, or
roughly 5-10 hours for each array if the operations were performed
sequentially. The time to decode each array
can be reduced by a factor of two by allowing the system to image
constantly. To accomplish this, the imaging of two separate
substrates on each microscope is staggered. While one substrate is
being reacted, the other substrate is imaged.
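The timing in [0120] and the gain from staggering two substrates can be checked with a short Python calculation. It models the staggered case as a two-stage pipeline in steady state (one substrate on the fluidics while the other is imaged) and ignores transfer overhead; the function name and parameters are illustrative.

```python
def decode_hours(cycles, hyb_min=5.0, wash_min=2.0, image_min=12.0,
                 strip_min=5.0, staggered=False):
    """Instrument time (hours) attributable to one decoding run.

    Unstaggered: every cycle runs hybridization, wash, imaging and strip-off
    back to back. Staggered (two substrates per microscope): while one
    substrate is reacted the other is imaged, so in steady state one
    substrate-cycle completes every max(chemistry, imaging) minutes.
    """
    chemistry_min = hyb_min + wash_min + strip_min
    if staggered:
        per_cycle_min = max(chemistry_min, image_min)
    else:
        per_cycle_min = chemistry_min + image_min
    return cycles * per_cycle_min / 60.0


if __name__ == "__main__":
    for n in (12, 24):
        print(n, "cycles:", decode_hours(n), "h sequential,",
              decode_hours(n, staggered=True), "h staggered")
```

With the step times above, 12-24 cycles take 4.8-9.6 h sequentially and 2.4-4.8 h when staggered. The factor of two holds here because chemistry (5 + 2 + 5 = 12 min) and imaging (12 min) happen to be balanced; if one stage were longer, it would set the pace.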
[0121] An exemplary decoding cycle using cSBH includes the
following steps: (i) set the array temperature to the hybridization
temperature (usually in the range 5-25°C); (ii) use a robotic
pipetter to premix a small amount of decoding probe with the
appropriate amount of hybridization buffer; (iii) pipette the mixed
reagents into the hybridization chamber; (iv) hybridize for a
predetermined time; (v) drain the reagents from the chamber using a
pump (syringe or other); (vi) add a buffer to wash away mismatched
hybrids and non-hybridized probes; (vii) adjust the chamber
temperature to the appropriate wash temperature (about 10-40°C);
(viii) drain the chamber; (ix) add more wash buffer if needed to
improve imaging; (x) image each array, preferably with a mid-power
(20×) microscope objective optically coupled to one or more
high-pixel-count, high-sensitivity CCD cameras; either the plate
stage moves the chambers (or perhaps flow cells with input funnels)
over the objective, or the objective-optics assembly moves under the
chamber; certain optical arrangements using dichroic
mirrors/beam splitters can be employed to collect multi-spectral
images simultaneously, thus decreasing image acquisition time;
arrays can be imaged in sections or whole, depending on array size,
image size and pixel density; sections can be assembled by aligning
images using statistically significant empty regions pre-coded onto
the substrate (during active site creation), or made using a
multi-step nano-printing technique; for example, sites (a grid of
activated sites) can be printed using a specific capture probe,
leaving empty regions in the grid, and a different pattern or
capture probe is then printed in those regions using a separate
print head; (xi) drain the chamber and replace with probe strip
buffer (or use the buffer already loaded), then heat the chamber to
the probe strip-off temperature (60-90°C); a high-pH buffer may be
used in the strip-off step to reduce the strip-off temperature; wait
for the specified time; (xii) remove the buffer; (xiii) start the
next cycle with the next decoding probe pool in the set.
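The cycle in [0121] is essentially a fixed sequence of fluidic, thermal and imaging steps repeated for each probe pool. A minimal Python sketch of that schedule follows. The `Step` type, function names and default temperatures (chosen inside the ranges quoted above) are illustrative assumptions, and the sketch only builds the step list rather than driving any real instrument; rejecting temperatures outside the quoted "usual" ranges is a deliberately strict choice for the sketch.

```python
from dataclasses import dataclass
from typing import List, Optional

# Typical temperature ranges quoted in the text (degrees C)
RANGES_C = {"hybridization": (5, 25), "wash": (10, 40), "strip": (60, 90)}


@dataclass
class Step:
    action: str
    temp_c: Optional[float] = None  # None: leave chamber temperature unchanged


def decoding_cycle(pool: str, hyb_temp_c: float = 20.0,
                   wash_temp_c: float = 30.0,
                   strip_temp_c: float = 75.0) -> List[Step]:
    """Steps (i)-(xii) for one decoding probe pool."""
    for name, temp in (("hybridization", hyb_temp_c), ("wash", wash_temp_c),
                       ("strip", strip_temp_c)):
        lo, hi = RANGES_C[name]
        if not lo <= temp <= hi:
            raise ValueError(f"{name} temperature {temp} C outside {lo}-{hi} C")
    return [
        Step("set array to hybridization temperature", hyb_temp_c),   # (i)
        Step(f"premix probe pool {pool} with hybridization buffer"),  # (ii)
        Step("pipette mixed reagents into hybridization chamber"),    # (iii)
        Step("hybridize for predetermined time"),                     # (iv)
        Step("drain reagents from chamber"),                          # (v)
        Step("add wash buffer"),                                      # (vi)
        Step("adjust chamber to wash temperature", wash_temp_c),      # (vii)
        Step("drain chamber"),                                        # (viii)
        Step("add more wash buffer if needed for imaging"),           # (ix)
        Step("image each array"),                                     # (x)
        Step("load strip buffer and heat to strip-off temperature",
             strip_temp_c),                                            # (xi)
        Step("remove strip buffer"),                                   # (xii)
    ]


def decoding_run(pools: List[str], **temps: float) -> List[Step]:
    """Step (xiii): repeat the cycle for each pool in the set."""
    schedule: List[Step] = []
    for pool in pools:
        schedule.extend(decoding_cycle(pool, **temps))
    return schedule
```

Keeping the schedule as plain data makes it easy to log, to estimate run time per step, or to hand off to whatever controller actually operates the fluidics and stage.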
Labels and Signal Generation by Probes Directed to Polynucleotides
on Arrays of the Invention
[0122] The oligonucleotide probes of the invention can be labeled
in a variety of ways, including the direct or indirect attachment
of radioactive moieties, fluorescent moieties, colorimetric
moieties, chemiluminescent moieties, and the like. Many
comprehensive reviews of methodologies for labeling DNA and
constructing DNA adaptors provide guidance applicable to
constructing oligonucleotide probes of the present invention. Such
reviews include Kricka, Ann. Clin. Biochem., 39: 114-129 (2002);
Schaferling et al, Anal. Bioanal. Chem., (Apr. 12, 2006); Matthews
et al, Anal. Biochem., 169: 1-25 (1988); Haugland,
Handbook of Fluorescent Probes and Research Chemicals, Tenth
Edition (Invitrogen/Molecular Probes, Inc., Eugene, 2006); Keller
and Manak, DNA Probes, 2nd Edition (Stockton Press, New York,
1993); Eckstein, editor, Oligonucleotides and Analogues: A
Practical Approach (IRL Press, Oxford, 1991); Wetmur, Critical
Reviews in Biochemistry and Molecular Biology, 26: 227-259 (1991);
Hermanson, Bioconjugate Techniques (Academic Press, New York,
1996); and the like. Many more particular methodologies applicable
to the invention are disclosed in the following sample of
references: Fung et al, U.S. Pat. No. 4,757,141; Hobbs, Jr., et al
U.S. Pat. No. 5,151,507; Cruickshank, U.S. Pat. No. 5,091,519
(synthesis of functionalized oligonucleotides for attachment of
reporter groups); Jablonski et al, Nucleic Acids Research, 14:
6115-6128 (1986) (enzyme-oligonucleotide conjugates); Ju et al,
Nature Medicine, 2: 246-249 (1996); Bawendi et al, U.S. Pat. No.
6,326,144 (derivatized fluorescent nanocrystals); Bruchez et al,
U.S. Pat. No. 6,274,323 (derivatized fluorescent nanocrystals); and
the like.
[0123] In one aspect, one or more fluorescent dyes are used as
labels for the oligonucleotide probes, e.g. as disclosed by Menchen
et al, U.S. Pat. No. 5,188,934 (4,7-dichlorofluorescein dyes); Bergot
et al, U.S. Pat. No. 5,366,860 (spectrally resolvable rhodamine
dyes); Lee et al, U.S. Pat. No. 5,847,162 (4,7-dichlororhodamine
dyes); Khanna et al, U.S. Pat. No. 4,318,846 (ether-substituted
fluorescein dyes); Lee et al, U.S. Pat. No. 5,800,996 (energy
transfer dyes); Lee et al, U.S. Pat. No. 5,066,580 (xanthene dyes);
Mathies et al, U.S. Pat. No. 5,688,648 (energy transfer dyes); and
the like. Labeling can also be carried out with quantum dots, as
disclosed in the following patents and patent publications,
incorporated herein by reference: U.S. Pat. Nos. 6,322,901;
6,576,291; 6,423,551; 6,251,303; 6,319,426; 6,426,513; 6,444,143;
5,990,479; 6,207,392; 2002/0045045; 2003/0017264; and the like. As
used herein, the term "fluorescent signal generating moiety" means
a signaling means which conveys information through the fluorescent
absorption and/or emission properties of one or more molecules.
Such fluorescent properties include fluorescence intensity,
fluorescence lifetime, emission spectrum characteristics, energy
transfer, and the like.
[0124] Commercially available fluorescent nucleotide analogues
readily incorporated into the labeling oligonucleotides include,
for example, Cy3-dCTP, Cy3-dUTP, Cy5-dCTP, Cy5-dUTP (Amersham
Biosciences, Piscataway, N.J., USA), fluorescein-12-dUTP,
tetramethylrhodamine-6-dUTP, Texas Red®-5-dUTP, Cascade
Blue®-7-dUTP, BODIPY® FL-14-dUTP, BODIPY® R-14-dUTP,
BODIPY® TR-14-dUTP, Rhodamine Green™-5-dUTP, Oregon
Green® 488-5-dUTP, Texas Red®-12-dUTP, BODIPY®
630/650-14-dUTP, BODIPY® 650/665-14-dUTP, Alexa Fluor®
488-5-dUTP, Alexa Fluor® 532-5-dUTP, Alexa Fluor®
568-5-dUTP, Alexa Fluor® 594-5-dUTP, Alexa Fluor®
546-14-dUTP, fluorescein-12-UTP, tetramethylrhodamine-6-UTP, Texas
Red®-5-UTP, Cascade Blue®-7-UTP, BODIPY® FL-14-UTP,
BODIPY® TMR-14-UTP, BODIPY® TR-14-UTP, Rhodamine
Green™-5-UTP, Alexa Fluor® 488-5-UTP, Alexa Fluor®
546-14-UTP (Molecular Probes, Inc. Eugene, Oreg., USA). Other
fluorophores available for post-synthetic attachment include, inter
alia, Alexa Fluor® 350, Alexa Fluor® 532, Alexa Fluor®
546, Alexa Fluor® 568, Alexa Fluor® 594, Alexa Fluor®
647, BODIPY 493/503, BODIPY FL, BODIPY R6G, BODIPY 530/550, BODIPY
TMR, BODIPY 558/568, BODIPY 564/570, BODIPY
576/589, BODIPY 581/591, BODIPY 630/650, BODIPY 650/665, Cascade
Blue, Cascade Yellow, Dansyl, lissamine rhodamine B, Marina Blue,
Oregon Green 488, Oregon Green 514, Pacific Blue, rhodamine 6G,
rhodamine green, rhodamine red, tetramethylrhodamine, Texas Red
(available from Molecular Probes, Inc., Eugene, Oreg., USA), and
Cy2, Cy3.5, Cy5.5, and Cy7 (Amersham Biosciences, Piscataway, N.J.
USA, and others). FRET tandem fluorophores may also be used, such
as PerCP-Cy5.5, PE-Cy5, PE-Cy5.5, PE-Cy7, PE-Texas Red, and
APC-Cy7; also, PE-Alexa dyes (610, 647, 680) and APC-Alexa dyes.
Biotin, or a derivative thereof, may also be used as a label on a
detection oligonucleotide, and subsequently bound by a detectably
labeled avidin/streptavidin derivative (e.g.
phycoerythrin-conjugated streptavidin), or a detectably labeled
anti-biotin antibody. Digoxigenin may be incorporated as a label
and subsequently bound by a detectably labeled anti-digoxigenin
antibody (e.g. fluoresceinated anti-digoxigenin). An
aminoallyl-dUTP residue may be incorporated into a detection
oligonucleotide and subsequently coupled to an N-hydroxy
succinimide (NHS)-derivatized fluorescent dye, such as those listed
supra. In general, any member of a conjugate pair may be
incorporated into a detection oligonucleotide provided that a
detectably labeled conjugate partner can be bound to permit
detection. As used herein, the term antibody refers to an antibody
molecule of any class, or any subfragment thereof, such as an Fab.
Other suitable labels for detection oligonucleotides may include
fluorescein (FAM), digoxigenin, dinitrophenol (DNP), dansyl,
biotin, bromodeoxyuridine (BrdU), hexahistidine (6×His),
phospho-amino acids (e.g. P-tyr, P-ser, P-thr), or any other
suitable label. In one embodiment the following hapten/antibody
pairs are used for detection, in which each of the antibodies is
derivatized with a detectable label: biotin/α-biotin,
digoxigenin/α-digoxigenin, dinitrophenol (DNP)/α-DNP,
5-carboxyfluorescein (FAM)/α-FAM. As described in schemes
below, probes may also be indirectly labeled, especially with a
hapten that is then bound by a capture agent, e.g. as disclosed in
Holtke et al, U.S. Pat. Nos. 5,344,757; 5,702,888; and 5,354,657;
Huber et al, U.S. Pat. No. 5,198,537; Miyoshi, U.S. Pat. No.
4,849,336; Misiura and Gait, PCT publication WO 91/17160; and the
like. Many different hapten-capture agent pairs are available for
use with the invention. Exemplary haptens include biotin,
des-biotin and other derivatives, dinitrophenol, dansyl,
fluorescein, CY5, and other dyes, digoxigenin, and the like. For
biotin, a capture agent may be avidin, streptavidin, or antibodies.
Antibodies may be used as capture agents for the other haptens
(many dye-antibody pairs being commercially available, e.g.
Molecular Probes).
Kits of the Invention
[0125] In the commercialization of the methods described herein,
certain kits for construction of random arrays of the invention and
for using the same for various applications are particularly
useful. Kits for applications of random arrays of the invention
include, but are not limited to, kits for determining the
nucleotide sequence of a target polynucleotide, kits for
large-scale identification of differences between reference DNA
sequences and test DNA sequences, kits for profiling exons, and the
like. A kit typically comprises at least one support having a
surface and one or more reagents necessary or useful for
constructing a random array of the invention or for carrying out an
application therewith. Such reagents include, without limitation,
nucleic acid primers, probes, adaptors, enzymes, and the like, and
are each packaged in a container, such as, without limitation, a
vial, tube or bottle, in a package suitable for commercial
distribution, such as, without limitation, a box, a sealed pouch, a
blister pack and a carton. The package typically contains a label
or packaging insert indicating the uses of the packaged materials.
As used herein, "packaging materials" includes any article used in
the packaging for distribution of reagents in a kit, including
without limitation containers, vials, tubes, bottles, pouches,
blister packaging, labels, tags, instruction sheets and package
inserts.
[0126] In one aspect, the invention provides a kit for making a
random array of concatemers of DNA fragments from a source nucleic
acid comprising the following components: (i) a support having a
surface; and (ii) at least one adaptor oligonucleotide for ligating
to each DNA fragment and forming a DNA circle therewith, each DNA
circle capable of being replicated by a rolling circle replication
reaction to form a concatemer that is capable of being randomly
disposed on the surface. In such kits, the surface may be a planar
surface having an array of discrete spaced apart regions, wherein
each discrete spaced apart region has a size equivalent to that of
said concatemers. The discrete spaced apart regions may form a
regular array with a nearest neighbor distance in the range of from
0.1 to 20 µm. The concatemers on the discrete spaced apart
regions may have a nearest neighbor distance such that they are
optically resolvable. The discrete spaced apart regions may have
capture oligonucleotides attached and the adaptor oligonucleotides
may each have a region complementary to the capture
oligonucleotides such that the concatemers are capable of being
attached to the discrete spaced apart regions by formation of
complexes between the capture oligonucleotides and the
complementary regions of the adaptor oligonucleotides. In some
embodiments, the concatemers are randomly distributed on said
discrete spaced apart regions and the nearest neighbor distance is
in the range of from 0.3 to 3 µm. Such kits may further comprise
(a) a terminal transferase for attaching a homopolymer tail to said
DNA fragments to provide a binding site for a first end of said
adaptor oligonucleotide, (b) a ligase for ligating a strand of said
adaptor oligonucleotide to ends of said DNA fragment to form said
DNA circle, (c) a primer for annealing to a region of the strand of
said adaptor oligonucleotide, and (d) a DNA polymerase for
extending the primer annealed to the strand in a rolling circle
replication reaction. The above adaptor oligonucleotide may have a
second end having a number of degenerate bases in the range of from
4 to 12.
[0127] In another aspect the invention provides kits for sequencing
a target polynucleotide comprising the following components: (i) a
support having a planar surface having an array of optically
resolvable discrete spaced apart regions, wherein each discrete
spaced apart region has an area of less than 1 µm²; (ii) a
first set of probes for hybridizing to a plurality of concatemers
randomly disposed on the discrete spaced apart regions, the
concatemers each containing multiple copies of a DNA fragment of
the target polynucleotide; and (iii) a second set of probes for
hybridizing to the plurality of concatemers such that whenever a
probe from the first set hybridizes contiguously to a probe from
the second set, the probes are ligated. Such kits may further
include a ligase, a ligase buffer, and a hybridization buffer. In
some embodiments, the discrete spaced apart regions may have
capture oligonucleotides attached and the concatemers may each have
a region complementary to the capture oligonucleotides such that
said concatemers are capable of being attached to the discrete
spaced apart regions by formation of complexes between the capture
oligonucleotides and the complementary regions of said
concatemers.
[0128] In still another aspect, the invention provides kits for
constructing a single molecule array comprising the following
components: (i) a support having a surface having reactive
functionalities; and (ii) a plurality of macromolecular structures
each having a unique functionality and multiple complementary
functionalities, the macromolecular structures being capable of
being attached randomly on the surface wherein the attachment is
formed by one or more linkages formed by reaction of one or more
reactive functionalities with one or more complementary
functionalities; and wherein the unique functionality is capable of
selectively reacting with a functionality on an analyte molecule to
form the single molecule array. In some embodiments of such kits,
the surface is a planar surface having an array of discrete spaced
apart regions containing said reactive functionalities and wherein
each discrete spaced apart region has an area less than 1
µm². In further embodiments, the discrete spaced apart
regions form a regular array with a nearest neighbor distance in
the range of from 0.1 to 20 µm. In further embodiments, the
concatemers on the discrete spaced apart regions have a nearest
neighbor distance such that they are optically resolvable. In still
further embodiments, the macromolecular structures may be
concatemers of one or more DNA fragments and wherein the unique
functionalities are at a 3' end or a 5' end of the concatemers.
[0129] In another aspect, the invention includes kits for
circularizing DNA fragments comprising the components: (a) at least
one adaptor oligonucleotide for ligating to one or more DNA
fragments and forming DNA circles therewith, (b) a terminal
transferase for attaching a homopolymer tail to said DNA fragments
to provide a binding site for a first end of said adaptor
oligonucleotide, (c) a ligase for ligating a strand of said adaptor
oligonucleotide to ends of said DNA fragment to form said DNA
circle, (d) a primer for annealing to a region of the strand of
said adaptor oligonucleotide, and (e) a DNA polymerase for
extending the primer annealed to the strand in a rolling circle
replication reaction. In an embodiment of such kit, the above
adaptor oligonucleotide may have a second end having a number of
degenerate bases in the range of from 4 to 12. The above kit may
further include reaction buffers for the terminal transferase,
ligase, and DNA polymerase. In still another aspect, the invention
includes a kit for circularizing DNA fragments using a Circligase
enzyme (Epicentre Biotechnologies, Madison, Wis.), which kit
comprises a volume exclusion polymer. In another aspect, such kit
further includes the following components: (a) reaction buffer for
controlling pH and providing an optimized salt composition for
Circligase, and (b) Circligase cofactors. In another aspect, a
reaction buffer for such kit comprises 0.5 M MOPS (pH 7.5), 0.1 M
KCl, 50 mM MgCl₂, and 10 mM DTT. In another aspect, such a kit
includes Circligase, e.g. 10-100 µL Circligase solution (at 100
units/µL). Exemplary volume exclusion polymers are disclosed in
U.S. Pat. No. 4,886,741, which is incorporated by reference, and
include polyethylene glycol, polyvinylpyrrolidone, dextran sulfate,
and like polymers. In one aspect, polyethylene glycol (PEG) is 50%
PEG4000. In one aspect, a kit for circle formation includes the
following:
TABLE-US-00002
Amount | Component | Final Conc.
2 µL | Circligase 10× reaction buffer | 1×
0.5 µL | 1 mM ATP | 25 µM
0.5 µL | 50 mM MnCl₂ | 1.25 mM
4 µL | 50% PEG4000 | 10%
2 µL | Circligase ssDNA ligase (100 units/µL) | 10 units/µL
 | single-stranded DNA template | 0.5-10 pmol/µL
 | sterile water |
[0130] Final reaction volume: 20 µL. The above components are
used in the following protocol: [0131] Heat the DNA (ssDNA templates
having a 5'-phosphate and a 3'-hydroxyl group) at 60-96°C,
depending on the length of the DNA. [0132] Preheat the 2.2×
reaction mix at 60°C for about 5-10 min. [0133] If the DNA was
preheated to 96°C, cool it down to 60°C. [0134] Mix the
DNA and buffer at 60°C without cooling down, and
incubate for 2-3 h. [0135] Heat-inactivate the enzyme to stop the
ligation reaction.
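The final concentrations in TABLE-US-00002 follow from the dilution relation C1·V1 = C2·V2 for a 20 µL reaction, which the Python sketch below reproduces. The component list mirrors the table; the function names are illustrative. The listed components total 9 µL, leaving 11 µL for template and water, and 20/9 ≈ 2.2, which plausibly corresponds to the "2.2×" reaction mix in [0132]; that link is an inference, not something the text states.

```python
# (component, volume added in uL, stock concentration); each final
# concentration comes out in the same units as its stock.
MIX = [
    ("10x reaction buffer (x)", 2.0, 10.0),
    ("ATP (uM)", 0.5, 1000.0),            # 1 mM stock expressed in uM
    ("MnCl2 (mM)", 0.5, 50.0),
    ("PEG4000 (%)", 4.0, 50.0),
    ("Circligase (units/uL)", 2.0, 100.0),
]


def final_concentrations(components, final_volume_ul=20.0):
    """Apply C1*V1 = C2*V2 to each component; also return leftover volume."""
    finals = {name: stock * vol / final_volume_ul
              for name, vol, stock in components}
    used_ul = sum(vol for _, vol, _ in components)
    return finals, final_volume_ul - used_ul


def premix_fold(components, final_volume_ul=20.0):
    """Concentration factor of a premix holding only the listed components."""
    return final_volume_ul / sum(vol for _, vol, _ in components)


if __name__ == "__main__":
    finals, remaining = final_concentrations(MIX)
    for name, conc in finals.items():
        print(f"{name}: {conc:g}")
    print(f"left for template + water: {remaining:g} uL")
    print(f"premix fold: {premix_fold(MIX):.2f}x")
```

Running it recovers every "Final Conc." entry in the table (1×, 25 µM, 1.25 mM, 10%, 10 units/µL), which is a quick way to catch transcription errors when scaling the recipe to a different reaction volume.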
Large-Scale Mutation Discovery by Mismatch Enzyme Cleavage
[0136] Arrays and sequencing methods of the invention may be
used for large-scale identification of polymorphisms using mismatch
cleavage techniques. Several approaches to mutation detection
employ a heteroduplex in which the mismatch itself is utilized for
cleavage recognition. Chemical cleavage with piperidine at
mismatches modified with hydroxylamine or osmium tetroxide provides
one approach to release a cleaved fragment. In a similar way the
enzymes T7 endonuclease I or T4 endonuclease VII have been used in
the enzyme mismatch cleavage (EMC) techniques, e.g. Youil et al,
Proc. Natl. Acad. Sci., 92: 87-91 (1995); Mashal et al, Nature
Genetics, 9: 177-183 (1995); Babon et al, Molecular Biotechnology,
23: 73-81 (2003); Ellis et al, Nucleic Acids Research, 22:
2710-2711 (1994); and the like, which are incorporated herein by
reference. Cleavase is used in the Cleavase fragment length
polymorphism (CFLP) technique, which has been commercialized by
Third Wave Technologies. When single stranded DNA is allowed to
fold and adopt a secondary structure the DNA will form internal
hairpin loops at locations dependent upon the base sequence of the
strand. Cleavase will cut single stranded DNA five-prime of the
loop and the fragments can then be separated by PAGE or similar
size-resolving techniques. Mismatch binding proteins such as MutS
and MutY also rely upon the formation of heteroduplexes for their
ability to identify mutation sites. Mismatches are usually repaired
but the binding action of the enzymes can be used for the selection
of fragments through a mobility shift in gel electrophoresis or by
protection from exonucleases, e.g. Ellis et al (cited above).
[0137] Templates for heteroduplex formation are prepared by primer
extension from genomic DNA. For the same genomic region of the
reference DNA, an excess of the opposite strand is prepared in the
same way as the test DNA but in a separate reaction. The test DNA
strand produced is biotinylated and is attached to a streptavidin
support. Homoduplex formation is prevented by heating and removal
of the complementary strand. The reference preparation is now
combined with the single stranded test preparation and annealed to
produce heteroduplexes. This heteroduplex is likely to contain a
number of mismatches. Residual DNA is washed away before the
addition of the mismatch endonuclease, which, if there is a
mismatch every 1 kb, would be expected to produce about 10 fragments
for a 10 kb primer extension. After cleavage, each fragment can
bind an adapter at each end and enter the mismatch-fragment circle
selection process. Capture of mismatch-cleaved DNA from large
genomic fragments: the 5-10 kb genomic fragments prepared from
large genomic fragments as described above are biotinylated by the
addition of a biotinylated dideoxynucleotide at the 3-prime end
with terminal transferase, and excess biotinylated nucleotide is
removed by filtration. A reference BAC clone that covers the same
region of sequence is digested with the same six-base cutter to
match the fragments generated from the test DNA. The biotinylated
genomic fragments are heat denatured in the presence of the BAC
reference DNA and slowly annealed to generate biotinylated
heteroduplexes. The reference BAC DNA is in large excess to the
genomic DNA so the majority of biotinylated products will be
heteroduplexes. The biotinylated DNA can then be attached to the
surface for removal of the reference DNA. Residual DNA is washed
away before the addition of the mismatch endonuclease. After
cleavage, each fragment can bind an adapter at each end and enter
the mismatch circle selection process as follows. (a) DNA is
cleaved on both sides of the mismatch. (b) 5-prime overhangs are
generated that can be ligated. (3' overhangs are also created by
digesting with an appropriate restriction endonuclease having a
four base recognition site.) (c) An adapter is introduced that
contains an active overhang at one side. (d) An adapter is ligated
to each of the two generated fragments (only ligation to the right
from the 5' phosphate after addition of sequences to the 3' end of
the top strand). (e) The molecule is phosphorylated and a bridging
oligonucleotide is used to ligate the two ends of the single
stranded molecule. (f) After circularization, a concatemer is
generated by extending a primer in a rolling circle replication (RCR) reaction.
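The fragment estimate in [0137] can be checked by treating mismatches as randomly placed along the primer-extension product. The Python sketch below assumes mismatches follow a Poisson process at 1 per kb and that each one yields a single double-strand cut. Under these assumptions a 10 kb product receives about 10 cuts and therefore about 11 fragments, in line with the "about 10 fragments" estimate. The function name and defaults are illustrative.

```python
import random


def simulate_fragment_count(length_bp=10_000, mismatches_per_kb=1.0,
                            trials=2_000, seed=0):
    """Return (analytic expectation, simulated mean) of fragments per product.

    Mismatch positions are drawn as a Poisson process (exponential gaps);
    k cuts in a linear molecule give k + 1 fragments.
    """
    rate_per_bp = mismatches_per_kb / 1000.0
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        cuts = 0
        pos = rng.expovariate(rate_per_bp)  # distance to the first mismatch
        while pos < length_bp:
            cuts += 1
            pos += rng.expovariate(rate_per_bp)
        total += cuts + 1
    expected = length_bp * rate_per_bp + 1
    return expected, total / trials


if __name__ == "__main__":
    expected, simulated = simulate_fragment_count()
    print(f"expected {expected:.1f}, simulated {simulated:.2f} fragments")
```

The simulation also shows the spread around the mean: with 10 expected cuts, the Poisson standard deviation is about 3, so individual products can yield noticeably more or fewer fragments.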
Circle Formation from Mismatch Cleavage Products
[0138] Method I. The heteroduplexes generated above can be used for
selection of small DNA circles, as illustrated in FIGS. 7 and 8. As
shown in FIG. 7, in this process, heteroduplex (700) of a sample is
treated with the mismatch enzyme to create products cleaved on both
strands (704 and 706) surrounding the mutation site (702) to
produce fragments (707) and (705). T7 endonuclease I or similar
enzyme cleaves 5-prime of the mutation site to reveal a 5-prime
overhang of varying length on both strands surrounding the
mutation. The next phase is to capture the cleaved products in a
form suitable for amplification and sequencing. Adapter (710) is
ligated to the overhang produced by the mismatch cutting (only
fragment (705) shown), but because the nature of the overhang is
unknown, at least three adapters are needed and each adapter is
synthesized with degenerate bases to accommodate all possible ends.
The adapter can be prepared with an internal biotin (708) on the
non-circularizing strand to allow capture for buffer exchange and
sample cleanup, and also for direct amplification on the surface if
desired.
[0139] Because the intervening sequence between mutations does not
need to be sequenced and would reduce the sequencing capacity of the
system, it is removed when studying genomic-derived samples.
Reduction of sequence complexity is accomplished by a type IIs
enzyme that cuts the DNA at a point away from the enzyme
recognition sequence. As a result, the cut sites and resulting
overhangs will comprise all possible base combinations. Enzymes that
can be used include MmeI (cuts 20 bases away, leaving a 2-base 3'
overhang) and EcoP15I (cuts 25 bases away, leaving a 2-base 5'
overhang). The adapter is about 50 bp in length to provide sequences
for initiation of rolling circle amplification and a stiffer
sequence for circle formation, as well as a recognition site (715)
for a type IIs restriction endonuclease. Once the adapter has been
ligated to the fragment, the DNA is digested (720) with the type IIs
restriction enzyme to release all but 20-25 bases of sequence
containing the mutation site, which remain attached to the adapter.
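The complexity-reduction step in [0139] relies on a type IIs enzyme cutting a fixed distance from a recognition site carried in the adapter. The Python sketch below locates top-strand cut positions for MmeI, assuming its commonly listed recognition sequence TCCRAC (R = A or G) with cleavage 20 nt (top strand) and 18 nt (bottom strand) downstream; these enzyme parameters should be verified against supplier data before use. The sketch scans one strand and orientation only, and the adapter and insert sequences are made up for illustration.

```python
import re

# Assumed MmeI site TCCRAC (R = A/G); top strand is cut 20 nt 3' of the
# site and bottom strand 18 nt, which leaves a 2-nt 3' overhang.
MMEI_SITE = re.compile(r"(?=TCC[AG]AC)")  # zero-width lookahead also finds overlapping sites
SITE_LEN = 6


def mmei_top_strand_cuts(seq, top_offset=20):
    """Return top-strand cut positions (index of the first base after the cut).

    Only forward-orientation sites on the given strand are considered; sites
    whose cut would fall beyond the end of the sequence are skipped.
    """
    seq = seq.upper()
    cuts = []
    for match in MMEI_SITE.finditer(seq):
        cut = match.start() + SITE_LEN + top_offset
        if cut <= len(seq):
            cuts.append(cut)
    return cuts


if __name__ == "__main__":
    adapter = "GGGGTCCAAC"   # hypothetical adapter ending in an MmeI site
    insert = "ACGT" * 10     # hypothetical 40-nt genomic insert
    cut = mmei_top_strand_cuts(adapter + insert)[0]
    print(f"insert bases left on the adapter: {cut - len(adapter)}")
```

Placing the recognition site at the very end of the adapter is what makes the retained length fixed: whatever the genomic sequence, exactly 20 of its bases stay attached to the adapter after digestion.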
[0140] The adaptered DNA fragment is now attached to a streptavidin
support for removal of excess fragment DNA. Excess adapter that did
not ligate to mismatch cleaved ends will also bind to the
streptavidin solid support. The new degenerate end created by the
type IIs enzyme can now be ligated to a second adapter through the
phosphorylation of one strand of the second adapter. The other
strand is non-phosphorylated and blocked at the 3-prime end with a
dideoxy nucleotide. The structure formed is essentially the genomic
fragment of interest captured between two different adapters. To
create a circle from this structure would simply require both ends
of the molecule coming together and ligating, e.g. via formation of
staggered ends by digesting at restriction sites (722) and (724),
followed by intra-molecular ligation. Although this event should
happen efficiently, there is also the possibility that the end of
an alternative molecule could ligate at the other end of the
molecule creating a dimer molecule, or greater multiples of each
unit molecule. One way to minimize this is to perform the ligation
under dilute conditions so only intra-molecular ligation is
favored, then re-concentrating the sample for future steps. An
alternative strategy to maximize the efficiency of circle formation
without inter-molecular ligation is to block excess adapters on the
surface. This can be achieved by using lambda exonuclease to digest
the lower strand. If the second adapter has been attached, it will
be protected from digestion because there is no 5-prime phosphate
available. If only the first adapter is attached to the surface
then the 5-prime phosphate is exposed for degradation of the lower
strand of the adapter. This will lead to loss of excess first
adapter from the surface.
[0141] After lambda exonuclease treatment, the 5-prime end of the
top strand of the first adapter is prepared for ligation to the
3-prime end of the second adapter. This can be achieved by
introducing a restriction enzyme site into the adapters so that
re-circularization of the molecule can occur with ligation.
Amplification of DNA captured into the circular molecules proceeds
by a rolling circle amplification to form long linear concatemer
copies of the circle. If extension initiates 5-prime of the biotin,
the circle and the newly synthesized strand are released into solution.
Complementary oligonucleotides on the surface are responsible for
condensation and provide sufficient attachment for downstream
applications. One strand is a closed circle and acts as the
template. The other strand, with an exposed 3-prime end, acts as an
initiating primer and is extended.
[0142] Method II. This method, illustrated in FIG. 8, is similar to
the procedure above with the following modifications. 1) The
adapter can be prepared with a 3-prime biotin (808) on the
non-circularized strand to allow capture for buffer exchange and
sample cleanup. 2) Reduction of sequence complexity of the 10 kb
heteroduplex fragments described above occurs through the use of
4-base cutting restriction enzymes, e.g. with restriction sites
(810), (812), and (814). Use of 2 or 3 enzymes in one reaction
could reduce the genomic fragment size down to about 100 bases. The
adapter-DNA fragment can be attached to a streptavidin support for
removal of excess fragment DNA. Excess adapter that did not ligate
to mismatch cleaved ends will also bind to the streptavidin solid
support. The biotinylated and phosphorylated strand can now be
removed by lambda exonuclease which will degrade from the 5-prime
end but leave the non-phosphorylated strand intact. To create a
circle from this structure now requires both ends of the molecule
coming together and ligating to form the circle. Several approaches
are available to form the circle using a bridging oligonucleotide,
as described above. A polynucleotide can be added to the 3-prime
end with terminal transferase to create a sequence for one half of
a bridge oligonucleotide (818) to hybridize to, shown as polyA tail
(816). The other half will bind to sequences in the adapter.
Alternatively, before addition of the exonuclease, an adapter can
be added to the end generated by the 4-base cutter which will
provide sequence for the bridge to hybridize to after removal of
one strand by exonuclease. A key aspect of this selection procedure
is the ability to select the strand for circularization and
amplification. This ensures that only the strand with the original
mutation (from the 5-prime overhang) and not the strand from the
adapter is amplified. If the 3-prime recessed strand was amplified
then a mismatch from the adapter could create a false base call at
the site of or near to the mutation. Amplification of DNA captured
into the circular molecules proceeds by a rolling circle
amplification to form linear concatemer copies of the circle.
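The statement in [0142] that 2 or 3 four-base cutters reduce fragments to about 100 bases matches a simple expectation: a given 4-base site occurs about once every 4^4 = 256 bp in random sequence, so k independent enzymes give a mean fragment of roughly 256/k bp (about 128 bp for two, about 85 bp for three). The Python sketch below checks this by in-silico digestion of random sequence. The site choices (GATC, GGCC, CCGG) are illustrative, cuts are placed at the start of each site for simplicity, and real genomic DNA deviates from uniform composition (in human DNA, CpG-containing sites such as CCGG are notably rarer).

```python
import random
import re


def mean_fragment_length(seq, sites):
    """Mean fragment length after cutting at every occurrence of any site.

    Cuts are placed at the first base of each site (a simplification; the
    exact cut offset within a 4-base site barely changes the mean).
    """
    pattern = re.compile("(?=(?:" + "|".join(sites) + "))")
    cuts = sorted({m.start() for m in pattern.finditer(seq)})
    bounds = [0] + cuts + [len(seq)]
    lengths = [b - a for a, b in zip(bounds, bounds[1:]) if b > a]
    return sum(lengths) / len(lengths)


if __name__ == "__main__":
    rng = random.Random(1)
    genome = "".join(rng.choice("ACGT") for _ in range(200_000))
    for sites in (["GATC"], ["GATC", "GGCC"], ["GATC", "GGCC", "CCGG"]):
        print(len(sites), "enzyme(s):",
              round(mean_fragment_length(genome, sites)), "bp")
```

On uniform random sequence the printed means fall close to 256, 128 and 85 bp, consistent with the ~100-base target for a 2-3 enzyme digest.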
[0143] Alternative applications of mis-match derived circles. The
mis-match derived small circular DNA molecules may be amplified by
other means such as PCR. Common primer binding sites can be
incorporated into the adapter sequences. The amplified material can
be used for mutation detection by methods such as Sanger sequencing
or array based sequencing.
[0144] Cell-free clonal selection of cDNAs. Traditional methods of
cloning have several drawbacks including the propensity of bacteria
to exclude sequences from plasmid replication and the
time-consuming and reagent-intensive protocols required to generate
clones of individual cDNA molecules. Linear single-stranded DNA can be
made by amplification of DNA molecules that have been closed
into a circular form. These large concatemeric, linear forms arise
from a single molecule and can act as efficient, isolated targets
for PCR when separated into a single reaction chamber, in much the
same way a bacterial colony is picked to retrieve the cDNA
containing plasmid. We plan to develop this approach as a means to
select cDNA clones without having to pass through a cell-based
clonal selection step. The first step of this procedure will
involve ligating a gene specific oligonucleotide directed to the
5-prime end with a poly dA sequence for binding to the poly dT
sequence of the 3-prime end of the cDNA. This oligonucleotide acts
as a bridge to allow T4 DNA ligase to ligate the two ends and form
a circle.
[0145] The second step of the reaction is to use a primer, or the
bridging oligonucleotide, for a strand displacing polymerase such
as Phi 29 polymerase to create a concatemer of the circle. The long
linear molecules will then be diluted and arrayed in 1536-well
plates such that wells with single molecules can be selected. To
ensure that about 10% of the wells contain 1 molecule, approximately
90% would have to be sacrificed as having no molecules. To detect
the positive wells, a dendrimer that recognizes a universal
sequence in the target is hybridized to generate 10K-100K dye
molecules per molecule of target. Excess dendrimer is removed
through hybridization to biotinylated capture oligos. The wells are
analyzed with a fluorescent plate reader and the presence of DNA
scored. Positive wells are then re-arrayed to consolidate the
clones into plates with complete wells for further
amplification.
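The 10%/90% figures above follow from limiting-dilution statistics. A minimal sketch, assuming (this is not stated in the text) that molecules are distributed among wells according to a Poisson distribution with a mean of 0.1 molecules per well, computes the expected fractions of empty, single-molecule and multi-molecule wells and the expected counts for a 1536-well plate.

```python
import math


def well_occupancy(mean_per_well: float) -> dict:
    """Poisson fractions of wells holding 0, exactly 1, and 2 or more molecules."""
    p0 = math.exp(-mean_per_well)
    p1 = mean_per_well * p0
    return {"empty": p0, "single": p1, "multiple": 1.0 - p0 - p1}


def expected_wells(n_wells: int, mean_per_well: float) -> dict:
    """Expected number of wells in each occupancy class."""
    return {k: n_wells * v for k, v in well_occupancy(mean_per_well).items()}


if __name__ == "__main__":
    frac = well_occupancy(0.1)
    # With a mean of 0.1: ~90.5% empty, ~9.0% single, ~0.5% multiple
    for name, value in frac.items():
        print(f"{name}: {value:.3%}")
    positive = frac["single"] + frac["multiple"]
    print(f"single-molecule share of positive wells: {frac['single'] / positive:.1%}")
    print(expected_wells(1536, 0.1))
```

Under this assumption roughly 95% of positive wells carry a single molecule; lowering the mean further raises that purity at the cost of even more empty wells.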
Splice Variant Detection and Exon Profiling
[0146] The process described is based on random DNA arrays and
"smart" probe pools for the identification and quantification of
expression levels of thousands of genes and their splice variants.
In eukaryotes, as the primary transcript emerges from the
transcription complex, spliceosomes interact with splice sites on
the primary transcript to excise the introns, e.g. Maniatis et
al, Nature, 418: 236-243 (2002). However, because of either
mutations that alter the splice site sequences, or external factors
that affect spliceosome interaction with splice sites, alternative
splice sites, or cryptic splice sites, could be selected, resulting
in expression of protein variants encoded by mRNA with different
sets of exons. Surveys of cDNA sequences from large-scale EST
sequencing projects indicated that over 50% of the genes have known
splice variants. In a recent study using a microarray-based
approach, it was estimated that as many as 75% of genes are
alternatively spliced, e.g. Johnson et al, Science, 302: 2141-2144
(2003).
[0147] The diversity of proteins generated through alternative
splicing could partially contribute to the complexity of biological
processes in higher eukaryotes. This also leads to the implication
that the aberrant expression of variant protein forms could be
responsible for the pathogenesis of diseases. Indeed, alternative
splicing has been found to be associated with various diseases like
growth hormone deficiency, Parkinson's disease, cystic fibrosis and
myotonic dystrophy, e.g. Garcia-Blanco et al, Nature Biotechnology,
22: 535-546 (2004). Because of the difficulty in isolating and
characterizing novel splice variants, the evidence implicating
roles of splice variants in cancer could represent the tip of the
iceberg. Tools that could rapidly and reliably characterize the
splicing patterns of mRNA would help to elucidate the role of
alternative splicing in cancer and in disease development in
general.
[0148] In one aspect, methods of the invention permit large-scale
measurement of splice variants with the following steps: (a)
Prepare full-length first-strand cDNA for targeted or all mRNAs.
(b) Circularize the generated full-length (or all) first-strand
cDNA molecules by incorporating an adapter sequence. (c) Using a
primer complementary to the adapter sequence, perform rolling circle
replication (RCR) of the cDNA circles to form concatemers with over
100 copies of the initial cDNA. (d) Prepare random arrays by attaching
RCR-produced "cDNA balls" to a glass surface coated with a capture
oligonucleotide complementary to a portion of the adapter sequence;
with an advanced submicron patterned surface, 1 mm² can hold
1-10 million cDNA spots; note that the attachment is a
molecular process and does not require robotic spotting of
individual "cDNA balls" or concatemers. (e) Starting from pre-made
universal libraries of 4096 6-mers and 1024 labeled 5-mers, use a
sophisticated computer program and a simple robotic pipettor to
create 40-80 pools of about 200 6-mers and 20 5-mers for testing
all 10,000 or more exons in 1000 or more targeted genes, up to all
known genes in the sample organism/tissue. (f) In a 4-8 hour process,
hybridize/ligate all probe pools in 40-80 cycles on the same random
array using an automated microscope-like instrument with a
sensitive 10-megapixel CCD detector, generating an array image
for each cycle. (g) Use a computer program to perform spot signal
intensity analysis to identify which cDNA is on which spot and
whether any of the expected exons is missing in any of the analyzed
genes. Obtain exact expression levels for each splice variant by
counting its occurrences in the array.
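Step (g) amounts to digital counting: each spot carries one cDNA molecule, so the expression level of a splice variant is the number of spots that show its exon pattern. A minimal sketch, assuming spot signals have already been converted into a gene identity and a set of detected exons per spot (the data structures, gene and exon names below are hypothetical):

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical per-spot record: (gene identified for the spot, exons detected on it)
Spot = Tuple[str, Set[str]]


def splice_pattern(gene_exons: List[str], detected: Set[str]) -> Tuple[str, ...]:
    """Expected exons (in gene order) NOT detected on a spot; () means all were seen."""
    return tuple(exon for exon in gene_exons if exon not in detected)


def count_variants(spots: Iterable[Spot], exon_map: Dict[str, List[str]]) -> Counter:
    """Count spots per (gene, missing-exon pattern); one spot = one cDNA molecule."""
    counts: Counter = Counter()
    for gene, detected in spots:
        counts[(gene, splice_pattern(exon_map[gene], detected))] += 1
    return counts


if __name__ == "__main__":
    exon_map = {"GENE_A": ["e1", "e2", "e3", "e4"]}  # hypothetical gene model
    spots = [
        ("GENE_A", {"e1", "e2", "e3", "e4"}),
        ("GENE_A", {"e1", "e2", "e3", "e4"}),
        ("GENE_A", {"e1", "e2", "e4"}),  # exon e3 not detected
    ]
    for (gene, missing), n in count_variants(spots, exon_map).items():
        label = "all exons" if not missing else "missing " + ",".join(missing)
        print(gene, label, n)
```

In practice a missing-exon call would need corroboration from more than one probe pair before being counted as a genuine splice variant rather than a failed probe.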
[0149] This system provides a complete analysis of the exon pattern
on a single transcript, instead of merely providing information on
the ratios of exon usage or quantification of splicing events over
the entire population of transcribed genes using the current
expression arrays hybridized with labeled mRNA/cDNA. At the limit
of its sensitivity, it allows detailed analysis down to a single
molecule of an mRNA type present in only one cell among hundreds of
other cells; this would provide unique potential for early
diagnosis of cancer cells. The combination of selective cDNA
preparation with an "array of random arrays" in a standard 384-well
format and with "smart" pools of universal short probes provides
great flexibility in designing assays; for example, deep analysis
of a small number of genes in selected samples, more general
analysis in a larger number of samples, or analysis of a large
number of genes in a smaller number of samples. The analysis
simultaneously provides 1) detection of each specific splice variant
and 2) quantification of expression of wild type and alternatively
spliced mRNAs. It can also be used to monitor gross chromosomal
alterations based on the detection of gene deletions and gene
translocations by loss of heterozygosity and by the presence of two
sub-sets of exons from two genes in the same transcript on a single
spot on the random array. The exceptional capacity and
informativeness of this assay are coupled with simple sample
preparation from very small quantities of mRNA and a fully automated
assay based entirely on pre-made, validated reagents, including
libraries of universal labeled and unlabeled probes and
primers/adapters that will ultimately be developed for all human and
model organism genes. The proposed splice variant profiling process
is equivalent to high-throughput sequencing of individual
full-length cDNA clones; rSBH throughput can reach one billion cDNA
molecules profiled in a 4-8 hour assay. This system will provide a
powerful tool to monitor changes in expression levels of various
splice variants during disease emergence and progression. It can
enable discovery of novel splice variants or validate known splice
variants to serve as biomarkers to monitor cancer progression. It
can also provide a means to further the understanding of the roles
of alternative splice variants and their possible uses as
therapeutic targets. The universal nature and flexibility of this
low-cost, high-throughput assay provide great commercial
opportunities for cancer research and diagnostics and in all other
biomedical areas. This high-capacity system is ideal for
service-providing labs or companies.
[0150] Preparation of templates for in vitro transcription. Exon
sequences are cloned into the multiple cloning sites (MCS) of
plasmid pBluescript, or a similar vector. For the purposes of
demonstrating the usefulness of the probe pools, it is not
necessary to clone the contiguous full-length sequence, nor to
maintain the proper protein coding frame. For genes that are
shorter than 1 kb, PCR products are generated from cDNA using
gene-specific oligos for the full-length sequence. For longer genes,
PCR products of about 500 bp corresponding to contiguous blocks of
exons are generated, and the fragments are ordered by cloning them
into appropriate cloning sites in the MCS of pBluescript. This is
also the approach for cloning the alternatively spliced versions,
since the desired variant might not be present in the cDNA source
used for PCR.
[0151] The last site of the MCS is used to insert a string of 40
A's to simulate the polyA tails of cellular mRNA. This is to
control for the possibility that the polyA tail might interfere
with the sample preparation step described below, although it is
not expected to be a problem since a poly-dA tail is incorporated
in sample preparation of genomic fragments as described. T7 RNA
polymerase will be used to generate the run-off transcripts, and the
RNA generated will be purified by standard methods.
[0152] Preparation of samples for arraying. Because the probe pools
are designed for specific genes, cDNA is prepared for those
specific genes only. For priming the reverse transcription
reactions, gene-specific primers are used; therefore, for 1000
genes, 1000 primers are used. The location of the priming site for
the reverse transcription is selected with care, since it is not
reasonable to expect the synthesis of cDNA >2 kb to be highly
efficient. It is quite common for the last exon to consist of
the end of the coding sequence and a long 3' untranslated region.
In the case of CD44, for example, although the full-length mRNA is
about 5.7 kb, the 3' UTR comprises 3 kb, while the coding region
is only 2.2 kb. Therefore, the logical location of the reverse
transcription primer site is usually immediately downstream of the
end of the coding sequence. For some splice variants, the
alternative exons are often clustered together as a block to create
a region of variability. In the case of Tenascin C variants (8.5
kb), the most common isoform has a block of 8 extra exons, and
there is evidence to suggest variability in exon usage in that
region. So for Tenascin C, the primer will be located just
downstream of that region. Because of concerns about synthesizing
cDNA longer than 2 kb, for long genes it might be necessary to
divide the exons into blocks of 2 kb with multiple
primers.
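Dividing a long transcript into roughly 2 kb reverse-transcription blocks can be sketched as a greedy pass from the 3' end, since reverse transcription proceeds from each primer toward the 5' end of the mRNA. This is an illustrative sketch only: the function name, the use of 2000 bp as a hard cutoff, and the exon lengths are assumptions, and in practice primer sites would also respect known clusters of alternative exons, as in the Tenascin C case.

```python
from typing import List

MAX_BLOCK_BP = 2000  # assumed practical limit for efficient first-strand cDNA synthesis


def partition_exons(exon_lengths: List[int], max_block: int = MAX_BLOCK_BP) -> List[List[int]]:
    """Group exons (indexed 5'->3') into contiguous blocks of at most max_block bp.

    Walks from the 3' end, because each block's gene-specific RT primer would sit
    just downstream of the block's last exon. An exon longer than max_block ends
    up alone in its own block. Blocks are returned 3'-most first; indices inside
    each block are in 5'->3' order.
    """
    blocks: List[List[int]] = []
    current: List[int] = []
    current_len = 0
    for idx in reversed(range(len(exon_lengths))):
        length = exon_lengths[idx]
        if current and current_len + length > max_block:
            blocks.append(sorted(current))
            current, current_len = [], 0
        current.append(idx)
        current_len += length
    if current:
        blocks.append(sorted(current))
    return blocks


if __name__ == "__main__":
    # Hypothetical gene with six exons (lengths in bp, listed 5'->3')
    print(partition_exons([300, 900, 800, 700, 600, 400]))  # -> [[3, 4, 5], [0, 1, 2]]
```

Each returned block would receive one gene-specific primer placed just 3' of its last exon.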
[0153] Reverse transcription reactions may be carried out with
commercial systems, e.g., the SuperScript III system from Invitrogen
(Carlsbad, Calif.) or the StrataScript system from Stratagene (La
Jolla, Calif.). Once single-stranded cDNA molecules are produced,
the remaining procedures, namely attaching the adaptor sequence,
circularizing the molecule and RCR, are as described above. The 5'
ends of the cDNAs are essentially the incorporated gene-specific
primers used for initiating the reverse transcription. By
incorporating a 7-base universal tag on the 5' end of the
reverse-transcription priming oligos, all the cDNA generated will
carry the same 7-base sequence at the 5' end. Thus a single template
oligonucleotide that is complementary to both the adaptor sequence
and the universal tag can be used to ligate the adaptor to all the
target molecules, without the need for a template oligonucleotide
with degenerate bases. As for the 3' end of the cDNA (the 5' end of
the mRNA), which is usually ill-defined, it may be treated like a
random sequence end of a genomic fragment. Similar methods of adding
a polyA tail will be applied; thus the same circle-closing reaction
may also be used.
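The single template oligonucleotide described above is the reverse complement of the junction it must bridge: the 3'-terminal bases of the adaptor followed by the 7-base universal tag at the 5' end of the cDNA. A minimal sketch, assuming a 15-base adaptor overlap (the overlap used by the degenerate splint in Example 2) and a hypothetical tag sequence; with a fully degenerate tag of seven Ns it reproduces the sequence of the splint SEQ ID NO: 9 (its 3' ddC block is not modeled).

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3' (N pairs with N)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def ligation_template(adaptor_3p_end: str, tag: str) -> str:
    """5'->3' template oligo spanning the junction adaptor(3' end) | cDNA(5' tag).

    After ligation the product reads adaptor...tag... (5'->3'), so the oligo
    that base-pairs across the junction is the reverse complement of that span.
    """
    return reverse_complement(adaptor_3p_end + tag)


if __name__ == "__main__":
    ADAPTOR_3P = "GTCACATTAACGGAC"  # last 15 bases of the 65-base adapter, SEQ ID NO: 8
    print(ligation_template(ADAPTOR_3P, "NNNNNNN"))  # -> NNNNNNNGTCCGTTAATGTGAC
    print(ligation_template(ADAPTOR_3P, "ACGTACG"))  # hypothetical tag -> CGTACGTGTCCGTTAATGTGAC
```

Replacing the seven degenerate positions with the fixed complement of the tag is what allows one non-degenerate template to serve every tagged cDNA.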
[0154] Reverse transcriptases are prone to terminate prematurely to
create truncated cDNAs. Severely truncated cDNAs probably will not
have enough probe binding sites to be identified with a gene
assignment, and thus would not be analyzed. cDNA molecules that are
close to, but not quite, full-length may show up as splice variants
with missing 5' exons. If there is no corroborating evidence from
a sequence database to support such variants, they may be
discounted. One way to avoid this problem is to make only the
full-length cDNAs (or those with the desired 3' end) compatible
with the circle-closing reaction, so that truncated molecules are
neither circularized nor replicated. First, a dideoxy-cytosine
residue can be added to the 3' end of all the cDNA to block
ligation; then, by using a mismatch oligo targeting the desired
sequence, a new 3' end can be generated by enzymatic mismatch
cleavage using T4 endonuclease VII. With the new 3' end, the cDNA
can proceed to the addition of a poly-dA tail and to the standard
protocols of circularization and replication.
[0155] Replicated and arrayed concatemers of the exon fragments may
be analyzed using combinatorial SBH, as described above. The
following algorithm may be used to select 5-mer and 6-mer probes
for use in the technique:
[0156] Step 1: Select the 1000-2000 shortest exons (total about 20-50
kb), and identify the matching sites for each of the 1024 available
labeled 5-mers. On average, each 5-mer will occur about 20 times in
20 kb (20,000/1024 ≈ 20), but some may occur over 50 or over 100
times. By selecting the most frequent 5-mer, the largest number of
short exons will be detected with a single labeled probe. A goal
would be to detect about 50-100 short exons (10%-20% of 500 exons)
per cycle. Thus fewer than 10 labeled probes and 50-100 unlabeled
6-mers would be sufficient. A small number of labeled probes is
favorable because it minimizes overall fluorescent background.
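Step 1 can be sketched as ranking every 5-mer by how many short exons contain it and keeping the top few. Assumptions in this minimal sketch: sites are matched directly in the exon sequence as written (a real design would match the probe's complement on the correct strand), frequency is counted as the number of exons hit rather than raw occurrences, ties are broken alphabetically, and the toy exons are hypothetical.

```python
from typing import List, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """Distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def expected_occurrences(total_bases: int, k: int) -> float:
    """Expected hits of one k-mer in random sequence of the given length (one strand)."""
    return total_bases / 4 ** k


def rank_5mers(exons: List[str]) -> List[Tuple[str, int]]:
    """5-mers sorted by the number of distinct exons containing them."""
    counts = {}
    for exon in exons:
        for kmer in kmers(exon, 5):
            counts[kmer] = counts.get(kmer, 0) + 1
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def select_labeled_5mers(exons: List[str], n_probes: int) -> List[str]:
    """The n_probes 5-mers that hit the most exons."""
    return [kmer for kmer, _ in rank_5mers(exons)[:n_probes]]


def exons_detected(exons: List[str], probes: List[str]) -> Set[int]:
    """Indices of exons containing at least one selected 5-mer."""
    chosen = set(probes)
    return {i for i, exon in enumerate(exons) if kmers(exon, 5) & chosen}


if __name__ == "__main__":
    print(expected_occurrences(20_000, 5))  # 19.53125, i.e. "about 20 times in 20 kb"
    toy_exons = ["ACGTACGTAA", "TTACGTACCC", "GGGGGCCCCC"]  # hypothetical short exons
    probes = select_labeled_5mers(toy_exons, 1)
    print(probes, exons_detected(toy_exons, probes))  # ['ACGTA'] {0, 1}
```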
[0157] Step 2: Find all 6-mers that are contiguous with all
sites in all 1000 genes that are complementary to the 10 selected
5-mers. On average, 20 such sites will exist in each 2 kb gene.
The total number of sites would be about 20,000, i.e., each 6-mer
will occur about 5 times on average (20,000/4096 ≈ 5). Sort the
6-mers by hit frequency. The most frequent may have over 20 hits,
i.e., such a 6-mer will detect 20 genes through combinations with
the 10 labeled probes. Thus, to get a single probe pair for each of
the 500 genes, a minimum of 25 6-mer probes would be required.
Realistically, 100 to 200 6-mers may be required.
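The text sorts 6-mers by hit frequency; one natural way to turn that ranking into a small probe set is greedy set cover, which is the technique this minimal sketch uses (a named choice, not necessarily the authors' procedure). It assumes a hypothetical adjacency convention, the 6-mer lying immediately 5' of the 5-mer site on the same strand, and uses toy gene fragments.

```python
from typing import Dict, List, Set


def contiguous_6mers(gene_seq: str, labeled_5mers: Set[str]) -> Set[str]:
    """6-mers lying immediately 5' of any site matching a selected labeled 5-mer
    (assumed convention: 6-mer then 5-mer, read on the same strand)."""
    hits: Set[str] = set()
    for i in range(6, len(gene_seq) - 4):
        if gene_seq[i:i + 5] in labeled_5mers:
            hits.add(gene_seq[i - 6:i])
    return hits


def greedy_6mer_cover(genes: Dict[str, str], labeled_5mers: Set[str]) -> List[str]:
    """Pick 6-mers until every gene with at least one 5-mer site has a probe pair.
    Each round takes the 6-mer hitting the most still-uncovered genes."""
    hits_by_gene = {g: contiguous_6mers(s, labeled_5mers) for g, s in genes.items()}
    uncovered = {g for g, hits in hits_by_gene.items() if hits}
    chosen: List[str] = []
    while uncovered:
        gain: Dict[str, int] = {}
        for g in uncovered:
            for m in hits_by_gene[g]:
                gain[m] = gain.get(m, 0) + 1
        best = min(gain, key=lambda m: (-gain[m], m))  # most genes, then alphabetical
        chosen.append(best)
        uncovered = {g for g in uncovered if best not in hits_by_gene[g]}
    return chosen


if __name__ == "__main__":
    toy_genes = {  # hypothetical gene fragments
        "g1": "AAAAAACCCCC",
        "g2": "AAAAAACCCCCTTTTTTCCCCC",
        "g3": "GGGGGGCCCCC",
    }
    print(greedy_6mer_cover(toy_genes, {"CCCCC"}))  # ['AAAAAA', 'GGGGGG']
```

If the best 6-mer covers at most 20 genes, 500 genes need at least 500/20 = 25 such 6-mers, the lower bound quoted above; a greedy cover will typically need more.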
[0158] Because combinatorial SBH uses pre-made libraries of 6-mer
and 5-mer probes, 40 probe pools with about 200 probes per pool are
readily prepared using established pipetting robotics. The
information generated is equivalent to having over 3 probes per
exon; therefore, the use of 8000 5-mers and 6-mers effectively
replaces the 30,000 longer exon-specific probes that would be
required for a single set of 1000 genes.
[0159] Exon profiling. The profiling of exons can be performed in
two phases: the gene identification phase and the exon
identification phase. In the gene identification phase, each
concatemer on the array can be uniquely identified with a
particular gene. In theory, 10 probe pools or hybridization cycles
will be enough to identify 1000 genes using the following scheme.
Each gene is assigned a unique binary code. The number of binary
digits thus depends on the total number of genes: 3 digits for 8
genes, 10 digits for 1024 genes. Each probe pool is designed to
correspond to a digit of the binary code and would contain probes
that hit a unique combination of half of the genes, with only one
hit per gene. Thus, for each hybridization cycle, a unique half of
the genes will score 1 for that digit and the other half will score
0. Ten hybridization cycles with 10 probe pools will generate 1024
unique binary codes, enough to assign 1000 unique genes to all the
concatemers on the array. To provide redundancy in the
identification data, 15-20 cycles would be used. If 20 cycles are
used, they would provide about 1 million (2^20 = 1,048,576) unique
binary codes, and there should be enough information to account for
loss of signals due to missing exons or gene deletions. This is also
equivalent to having 10 data points per gene (20 cycles of 500 data
points each give 10,000 data points total), or one positive probe
pair per exon, on average. Thus, after 20 cycles, this system is
capable of assigning 1 million unique gene identities to the
ampliots. Therefore, by counting the gene identities of the
ampliots, one can quantitatively determine the expression level of
all the genes (but not the sub-typing of splice variants) in any
given sample.
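The binary scheme can be sketched directly: gene i receives code i, pool b carries probes for the genes whose code has bit b set, and a spot's on/off pattern across the cycles spells out its gene's code. This minimal sketch assumes ideal, error-free signals and uses hypothetical gene names; with codes 0-999 each pool holds roughly, not exactly, half of the genes. One way to exploit the redundant 15-20 cycle design described above would be to add extra code bits and decode to the nearest valid code instead of requiring an exact match; that is not shown here.

```python
from typing import Dict, List, Optional, Sequence, Set


def assign_codes(genes: Sequence[str], n_bits: int) -> Dict[str, int]:
    """Give each gene a distinct n_bits-long binary code (here simply its index)."""
    if len(genes) > 2 ** n_bits:
        raise ValueError(f"{len(genes)} genes need more than {n_bits} bits")
    return {gene: index for index, gene in enumerate(genes)}


def design_pools(codes: Dict[str, int], n_bits: int) -> List[Set[str]]:
    """Pool b targets exactly the genes whose code has bit b set."""
    return [{g for g, c in codes.items() if (c >> b) & 1} for b in range(n_bits)]


def observe_spot(gene: str, pools: List[Set[str]]) -> List[int]:
    """Idealized on/off signal of a spot carrying `gene`, one entry per cycle."""
    return [1 if gene in pool else 0 for pool in pools]


def decode_spot(signals: List[int], codes: Dict[str, int]) -> Optional[str]:
    """Read the cycle signals back as a binary number and look up the gene."""
    value = sum(bit << b for b, bit in enumerate(signals))
    by_code = {c: g for g, c in codes.items()}
    return by_code.get(value)  # None for a code that was never assigned


if __name__ == "__main__":
    genes = [f"gene{i:04d}" for i in range(1000)]  # hypothetical gene names
    codes = assign_codes(genes, 10)                 # 2**10 = 1024 >= 1000
    pools = design_pools(codes, 10)
    print([len(p) for p in pools])                  # roughly half the genes per pool
    print(decode_spot(observe_spot("gene0421", pools), codes))  # gene0421
```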
[0160] After identifying each ampliot with a gene assignment, its
exon pattern will be profiled in the exon identification phase. For
the exon identification phase, one exon per gene in all or most of
the genes is tested per hybridization cycle. In most cases 10-20
exon identification cycles should be sufficient. Thus, in the case
of using 20 exon identification cycles, we will obtain information
from 2 probes for each of 10 exons in each gene. For genes with more
than 20 exons, methods can be developed so that 2 exons per gene
can be probed in the same cycle. One possibility is using multiple
fluorophores of different colors, and another possibility is to
exploit differential hybrid stabilities of different ligation probe
pairs.
[0161] In conclusion, a total of about 40 assay cycles will provide
sufficient information to obtain gene identity at each spot and to
provide three matching probe-pairs for each of 10,000 exons with
enough informational redundancy to provide accurate identification
of missing exons due to alternative splicing or chromosomal
deletions.
Example 1: Glass Cover Slip as Random Array Support: Derivatization
Protocol
[0162] The following materials are used:
Millipore DI water
2.5 ml of 3-aminopropyldimethylethoxysilane (Gelest)
1.6 grams p-phenylenediisothiocyanate (Acros Organics/Fisher)
210 grams KOH (VWR)
Ethanol (VWR)
Methanol (VWR)
Pyridine (VWR)
N,N-dimethylformamide (VWR)
Acetone (VWR)
Equipment
[0163] 100 °C oven
magnetic stir plate
1 2'' × 0.5'' magnetic stir bar
2 4-liter Nunc beakers
7 4'' × 8'' × 4'' glass containers
1-liter graduated cylinder
1 100 ml graduated cylinder
1 lab scale
1 Mettler scale
1 large weigh boat
1 small weigh boat
1 pair thick nitrile gloves
1 large funnel
1 ml Pipetman with filter tips
1 Nalgene stir bar
1 airtight container (Tupperware)
[0164] Using the large graduated cylinder, measure 950 ml of
ethanol and add it to the 4 liter Nunc beaker. Measure 50 ml of DI
water in the small graduated cylinder and add it to the same Nunc
beaker. Measure out 210 grams of KOH pellets in a weigh boat on the
lab scale. Add the stir bar and KOH pellets to the beaker. Place the
beaker on the stir plate and stir at low speed until the KOH is
completely dissolved. While the KOH is dissolving, lay out 6
pre-washed glass containers; fill containers 2-5 with DI water until
1/2 inch from the top (800 ml). Fill container 6 with acetone until
1/2'' from the top. Carefully pour the dissolved KOH solution into
container 1 until 1/2'' from the top. Add the racked cover slips to
container 1, wait 3 minutes, remove the racks from container 1 and
wash them in containers 2-5, leaving the racks in each container for
a minimum of 15 seconds. Submerge the racks briefly in container 6.
Set the racks aside, dispose of the solutions from containers 1 and
2 in the basic waste container using the large funnel and thick
nitrile gloves, and clean and dry the labware. Lay out 7 clean and
dry glass containers. Add 775 ml of acetone and 2.5 ml of DI water
to container 1. Stir container 1 with a pipette tip for 20 seconds.
With a new pipette tip, add 2.5 ml of
3-aminopropyldimethylethoxysilane to container 1. Stir with the
pipette tip for 10 seconds. Immerse all 5 racks of cover slips into
container 1. Cover container 1 with a polypropylene box top. Wait 45
minutes. 15 minutes prior to the completion of the reaction, fill
containers 2-4 with acetone until 1/2'' from the top, and fill
container 5 with water until 1/2'' from the top. Fill container 6
with acetone until 1/2'' from the top. Upon reaction completion (45
minutes), transfer cover slip racks 1-5 from container 1 to
container 2 and wait 15 seconds. Repeat this through container 6.
Place the racks into empty container 7 and put them in the 100 °C
oven. Wait one hour.
[0165] Lay out 7 glass containers. After the racks come out of the
oven, use the Mettler scale to weigh out 1.6 grams of
p-phenylenediisothiocyanate (PDC) in the small weigh boat. Pour 720
ml of dimethylformamide into the cleaned 1 liter graduated cylinder
and fill to 800 ml with pyridine. Pour 50% of this solution into a
clean glass container, then pour it back into the cylinder to mix
(repeat once). Fill container 1 with this solution until 1/2'' from
the top. Add the PDC from the weigh boat to container 1. Use the
stir bar to mix the solution. Crush PDC clumps that refuse to
dissolve, then stir again. The cover slip racks should be cool by
now. Place all 5 racks into container 1. Cover with a polypropylene
box top. Wait 2 hours. 10 minutes prior to reaction completion,
fill containers 2 and 3 with methanol until 1/2'' from the top.
Fill containers 4 and 5 with acetone until 1/2'' from the top. Fill
container 6 with 65% acetone/35% water until 1/2'' from the top.
Fill container 7 with acetone. Successively transfer the racks
through all containers, waiting 15 seconds between each transfer.
[0166] Remove the racks from container 7 and dump the contents of
containers 1-7 into the organic waste drum. Return the racks to
container 7 and dry them in the oven for 15 minutes. Place the dry
racks into the airtight container; they are now ready for attachment.
Example 2: Preparation of RCR Products from E. coli Genomic DNA
& Deposition onto a Glass Cover Slip
[0167] E. coli genomic DNA (32 ug) (Sigma Chemical Co) was
fragmented with 0.16 U of DNase I (Epicentre) at 37 °C for
10 min and then heat-inactivated at 95 °C for 10 min.
Reaction products were distributed with an average size of 200 bp
as determined by agarose gel electrophoresis. If reaction products
did not meet the required size distribution they were further
digested with the addition of fresh enzyme. The final concentration
was 200 ng/ul of genomic DNA.
[0168] The DNase-digested DNA (26 ng/ul) was reacted with terminal
deoxynucleotidyl transferase (0.66 U/ul) from New England Biolabs
(NEB) in reaction buffer supplied by NEB. The reaction contained
dATP (2 mM) and was performed at 37 °C for 30 min and then
heat-inactivated at 70 °C for 10 min. The DNA sample was then
heated to 95 °C for 5 min before rapid cooling on ice.
[0169] A synthetic DNA adapter was then ligated to the 5' end of
the genomic DNA by first forming a hybrid of a 65-base
oligonucleotide (TATCATCTACTGCACTGACCGGATGTTAGGAAGACAAAAGGAAGCTGAGGGTCACATTAACGGAC)
(SEQ ID NO: 8) with a second oligonucleotide
(NNNNNNNGTCCGTTAATGTGAC 3' 2'3'ddC) (SEQ ID NO: 9) at the 3' end
of the 65mer, in which the 7 "Ns" form an overhang. The shorter
oligo acts as a splint for ligation of the 65mer to the 5' end
of the genomic fragments. The splint molecule consists of 7
degenerate bases at its 5' end to hybridize to variable bases at
the 5' end of the genomic DNA. The adapter hybrid was formed by
slowly hybridizing 1200 pmol of adapter with 1200 pmol of splint in
52 ul from 95 °C to room temperature over 1 hr.
[0170] T4 DNA ligase (0.3 U/ul) was combined with genomic DNA (17
ng/ul) and adapter-splint (0.5 uM) in 1× ligase reaction
buffer supplied by NEB. The ligation proceeded at 15 °C for 30 min
and at 20 °C for 30 min and was then inactivated at 70 °C for 10
min. A second splint molecule (AGATGATATTTTTTTT 3' 2'3'ddC) (SEQ ID
NO: 10) (0.6 uM) was then added to the reaction, and the mix was
supplemented with more ligase buffer and T4 DNA ligase (0.3 U/ul).
The reaction proceeded at 15 °C for 30 min and then at 20 °C for 30
min before inactivation for 10 min at 70 °C.
[0171] The ligation mix was then treated with exonuclease I (NEB)
(1 U/ul) at 37 °C for 60 min, followed by inactivation at 80 °C for
20 min.
[0172] Rolling circle replication was performed in reaction buffer
supplied by NEB with BSA (0.1 ug/ul), 0.2 mM each dNTP, an
initiating primer (TCAGCTTCCTTTTGTCTTCCTAAC) (SEQ ID NO: 11) at 2
fmol/ul, exonuclease treated ligation of genomic DNA at 24 pg/ul,
and Phi 29 polymerase (0.2 U/ul). The reaction was performed for 1
hr at 30 °C and then heat-inactivated at 70 °C for 10 min.
[0173] RCR reaction products were attached to the surface of cover
slips by first attaching amine-modified oligonucleotides to the
surface of the cover slips. A capture probe
([AMINOC6][SP-C18][SP-C18]GGATGTTAGGAAGACAAAAGGAAGCTGAGG) (SEQ ID
NO: 12) (50 uM) was added to the DITC-derivatized cover slips in
0.1 uM NaHCO3 and allowed to dry at 40 °C for about 30 min. The
cover slips were rinsed in DDI water for 15 min and dried. RCR
reaction products (4.5 ul) were then combined with 0.5 ul of 20×
SSPE and added to the center of the slide. The sample was allowed
to air dry, and non-attached material was washed off for 10 min in
3× SSPE and then briefly in DDI water. The slide was then dried
before assembly on the microscope. Attached RCR products were
visualized by hybridizing an 11-mer TAMRA-labeled probe that is
complementary to a region of the adapter.
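The oligonucleotides in this example are designed to fit together, and that can be checked mechanically. A minimal sketch (helper names are hypothetical; sequences are copied from SEQ ID NOs: 8-12 with the 3' ddC blocks and the amine/C18 modifications omitted) confirms, on our reading of the strand logic, that the first splint's non-degenerate part pairs with the adapter's 3' end, the second splint pairs with the adapter's 5' end followed by a poly-T stretch for the poly-A tail, the RCR primer anneals to the adapter carried in the circle, and the capture probe repeats an adapter segment, so it can hybridize to the RCR product, which is complementary to the circle.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a 5'->3' DNA sequence (N pairs with N)."""
    return seq.translate(_COMPLEMENT)[::-1]


ADAPTER = "TATCATCTACTGCACTGACCGGATGTTAGGAAGACAAAAGGAAGCTGAGGGTCACATTAACGGAC"  # SEQ ID NO: 8
SPLINT_1 = "NNNNNNNGTCCGTTAATGTGAC"       # SEQ ID NO: 9 (3' ddC omitted)
SPLINT_2 = "AGATGATATTTTTTTT"             # SEQ ID NO: 10 (3' ddC omitted)
RCR_PRIMER = "TCAGCTTCCTTTTGTCTTCCTAAC"    # SEQ ID NO: 11
CAPTURE = "GGATGTTAGGAAGACAAAAGGAAGCTGAGG"  # SEQ ID NO: 12 (amine/C18 linkers omitted)


def check_example_2_oligos() -> dict:
    core_1 = SPLINT_1.lstrip("N")  # part of splint 1 that pairs with the adapter
    return {
        "adapter_is_65_nt": len(ADAPTER) == 65,
        # splint 1: non-degenerate 3' part pairs with the adapter's 3' end
        "splint1_pairs_adapter_3p": reverse_complement(core_1) == ADAPTER[-len(core_1):],
        # splint 2: 5' part pairs with the adapter's 5' end; the rest is poly-T for the poly-A tail
        "splint2_pairs_adapter_5p": SPLINT_2[:8] == reverse_complement(ADAPTER[:8])
        and set(SPLINT_2[8:]) == {"T"},
        # RCR primer anneals to the adapter sequence carried in the circle
        "primer_anneals_to_adapter": reverse_complement(RCR_PRIMER) in ADAPTER,
        # capture probe has the adapter's own sequence, so it pairs with the RCR product
        "capture_matches_adapter": CAPTURE in ADAPTER,
    }


if __name__ == "__main__":
    for name, ok in check_example_2_oligos().items():
        print(f"{name}: {ok}")
```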
[0174] RCR reaction products were formed from a single-stranded
80mer synthetic DNA target
(NNNGCATANCACGANGTCATNATCGTNCAAACGTCAGTCCANGAATCNAGATCCACTTAGANTAAAAAAAAAAAA)
(SEQ ID NO: 13) as above but without poly A addition with TdT. The
RCR reaction contained target molecules at an estimated 12.6
fmol/ul. Reaction products (5 ul) were combined with SSPE (2×) and
SDS (0.3%) in a total reaction volume of 20 ul. The sample was
applied to a cover slip onto which lines of capture probe
([AMINOC6][SP-C18][SP-C18]GGATGTTAGGAAGACAAAAGGAAGCTGAGG),
deposited from a 50 uM solution with 0.1 uM NaHCO3, had been dried,
and was left in a humid chamber for 30 min. The solution was then
washed off in 3× SSPE for 10 min and then briefly in water. Various
reaction components were tested for their effect upon RCR product
formation. Adding Phi 29 to the RCR reaction at a final
concentration of 0.1 U/ul rather than 0.2 U/ul was found to create a
greater proportion of RCR products of higher intensity after
detection probe hybridization. Adding the initiating primer at a
10- to 100-fold molar ratio relative to the estimated target
concentration was also found to be optimal. Increased extension
times produced more intense fluorescent signals but tended to
produce more diffuse concatemers. With the current attachment
protocols, a 2 hr extension time produced enhanced signals relative
to a 1 hr incubation with minimal detrimental impact upon RCR
product morphology.
[0175] Further optimization of RCR products has been achieved by
reducing the estimated concentration of synthetic and genomic
targets to 0.1 to 0.25 fmol/ul in the RCR reaction. This typically
results in distinct and unique RCR products on the surface of the
microscope slide using method 1 for attachment. For synthetic
targets in which a higher concentration of targets in the RCR
reaction may be present (e.g. >5 fmol/ul), RCR products may be
attached by method 2.
[0176] Attachment method 1. RCR reaction products (4.5 ul) were
combined with 0.5 ul of 20× SSPE and added to the center of
the slide. The sample was allowed to air dry, and non-attached
material was washed off for 10 min in 3× SSPE and then briefly
in DDI water. The slide was then dried before assembly on the
microscope. Attached RCR products were visualized by hybridizing an
11-mer TAMRA-labeled probe that is complementary to a region of the
adapter. Attachment method 2. RCR reaction products (1 ul) were
combined with 50 ul of 3× SSPE and added to the center of the
cover slip with capture probe attached. Addition of SDS (0.3%) was
found to promote specific attachment to the capture probes rather
than to the derivatized surface. The sample was incubated at room
temperature for 30 min, and non-attached material was washed off for
10 min in 3× SSPE and then briefly in DDI water. The slide was
then dried before assembly on the microscope. Attached RCR products
were visualized by hybridizing an 11-mer TAMRA-labeled probe that is
complementary to a region of the adapter. The above protocols
provide RCR product densities of about 1 RCR product per 2-4 micron
square. An exemplary image of a resulting cover slip is shown in FIG.
3.
Example 3: Distinguishing RCR Products on Random Arrays Using
Fluorescently Labeled Probes
[0177] PCR products from diagnostic regions of Bacillus anthracis
and Yersinia pestis were converted into single stranded DNA and
attached to a universal adaptor. These two samples were then mixed
and replicated together using RCR and deposited onto a glass
surface as a random array. Successive hybridization with amplicon
specific probes showed that each spot on the array corresponded
uniquely to either one of the two sequences and that they can be
identified specifically with the probes, as illustrated in FIG. 4.
This result demonstrates the sensitivity and specificity of
identifying DNA present in submicron-sized DNA concatemers having
about 100-1000 copies of a DNA fragment generated by the RCR
reaction. A 155 bp amplicon sequence from B. anthracis and a 275 bp
amplicon sequence from Y. pestis were amplified using standard PCR
techniques with PCR primers in which one primer of the pair was
phosphorylated. A single-stranded form of the PCR products was
generated by degradation of the phosphorylated strand using lambda
exonuclease. The 5' end of the remaining strand was then
phosphorylated using T4 polynucleotide kinase to allow ligation
of the single-stranded product to the universal adaptor. The
universal adaptor was ligated using T4 DNA ligase to the 5' end of
the target molecule, assisted by a template oligonucleotide
complementary to the 5' end of the targets and 3' end of the
universal adaptor. The adaptor ligated targets were then
circularized using bridging oligonucleotides with bases
complementary to the adaptor and to the 3' end of the targets.
Linear DNA molecules were removed by treating with exonuclease I.
RCR products (DNA concatemers) were generated by mixing the
single-stranded samples and using Phi29 polymerase to replicate
around the circularized adaptor-target molecules with the bridging
oligonucleotides as the initiating primers.
[0178] To prepare the cover slips for attaching amine-modified
oligonucleotides, the cover slips were first cleaned in a
potassium hydroxide/ethanol solution, followed by rinsing and
drying. They were then treated with a solution of
3-aminopropyldimethylethoxysilane, acetone, and water for 45
minutes and cured in an oven at 100 °C for 1 hour. As a final
step, the cover slips were treated with a solution of
p-phenylenediisothiocyanate (PDC), pyridine, and dimethylformamide
for 2 hours. The capture oligonucleotide (sequence
5'-GGATGTTAGGAAGACAAAAGGAAGCTGAGG-3') (SEQ ID NO: 14) is
complementary to the universal adaptor sequence and is modified at
the 5' end with an amine group and 2 C-18 linkers. For attachment,
10 µl of the capture oligo at 10 µM in 0.1 M NaHCO3 was spotted
onto the center of the derivatized cover slip, dried for 10
minutes in a 70 °C oven and rinsed with water. To create an array
of DNA concatemers, the RCR reaction containing the DNA concatemers
was diluted 10-fold with 3× SSPE, and 20 µl of this dilution was
deposited over the immobilized capture oligonucleotides on the
cover slip surface for 30 minutes in a moisture-saturated chamber.
The cover slip with the DNA concatemers was then assembled into a
reaction chamber and rinsed with 2 ml of 3× SSPE. Arrayed target
concatemer molecules derived from B. anthracis and Y. pestis PCR
amplicons were probed sequentially with TAMRA-labeled oligomers:
probe BrPrb3 (sequence 5'-CATTAACGGAC-3' (SEQ ID NO: 15),
specifically complementary to the universal adaptor sequence),
probe Ba3 (sequence 5'-TGAGCGATTCG-3' (SEQ ID NO: 16),
specifically complementary to the Ba3 amplicon sequence), and probe
Yp3 (sequence 5'-GGTGTCATGGA-3', specifically complementary to the
Yp3 amplicon sequence). The probes were hybridized to the array at a
concentration of 0.1 µM for 20 min in 3× SSPE at room temperature.
Excess probes were washed off with 2 ml of 3× SSPE. Images were
taken with the TIRF microscope. The probes were then stripped off
with 1 ml of 3× SSPE at 80 °C for 5 minutes to prepare the arrayed
target molecules for the next round of hybridization.
[0179] By overlaying the images obtained from successive
hybridization of the 3 probes, as shown in FIG. 4, it can be seen that
most of the arrayed molecules that hybridized with the adaptor
probe hybridized to only one of the amplicon 1 probe (e.g. "A"
in FIG. 4) or the amplicon 2 probe (e.g. "B" in FIG. 4), with very
few hybridizing to both. This specific hybridization
pattern demonstrates that each spot on the array contains only one
type of sequence, either the B. anthracis amplicon or the Y. pestis
amplicon.
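The overlay logic described above can be sketched as a per-spot tally. This is a minimal illustration, not the authors' image-analysis procedure. It assumes the three images are already registered so that each spot has one background-corrected intensity per probe. The single `threshold` used to call a probe positive is a hypothetical parameter.

```python
from typing import Dict, Sequence


def classify_spots(adaptor: Sequence[float], ba3: Sequence[float],
                   yp3: Sequence[float], threshold: float) -> Dict[str, int]:
    """Tally adaptor-positive spots by which amplicon probe(s) they bind.

    Each sequence holds one background-corrected intensity per spot, in the
    same spot order for all three hybridization rounds.
    """
    counts = {"Ba3 only": 0, "Yp3 only": 0, "both": 0, "neither": 0}
    for a, b, y in zip(adaptor, ba3, yp3):
        if a < threshold:
            continue  # adaptor probe negative: no concatemer at this spot
        hit_b, hit_y = b >= threshold, y >= threshold
        if hit_b and hit_y:
            counts["both"] += 1  # would indicate a mixed spot
        elif hit_b:
            counts["Ba3 only"] += 1
        elif hit_y:
            counts["Yp3 only"] += 1
        else:
            counts["neither"] += 1
    return counts


# Hypothetical intensities for five spots (not data from FIG. 4).
print(classify_spots([900, 850, 20, 700, 800],
                     [600, 30, 10, 25, 500],
                     [40, 700, 5, 650, 450],
                     threshold=200))
```

FIG. 4 shows qualitatively that the "both" category is small compared with the single-probe categories.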
Example 4: Decoding a Base Position in Arrayed Concatemers Created
from a Synthetic 80-Mer Oligonucleotide Containing a Degenerate
Base
[0180] Individual molecules of a synthetic oligonucleotide
containing a degenerate base can be divided into 4 sub-populations,
each having an A, C, G or T base at that particular
position. An array of concatemers created from this synthetic DNA
is expected to have about 25% of its spots carrying each of the bases. Successful
identification of these sub-populations of concatemers was
demonstrated by four successive rounds of hybridization and ligation of pairs
of probes, each pair specific to one of the 4 bases, as shown in FIG. 5. A
5' phosphorylated, 3' TAMRA-labeled pentamer oligonucleotide was
paired with one of the four hexamer oligonucleotides. Each of these
4 ligation probe pairs should hybridize to either an A, C, G or T
containing version of the target. Discrimination scores of greater
than 3 were obtained for most targets, demonstrating the ability to
identify single base differences between the nanoball targets. The
discrimination score is the highest spot score divided by the
average of the other 3 base-specific signals of the same spot. By
adjusting the assay conditions (buffer composition, concentrations
of all components, time and temperature of each step in the cycle),
higher signal-to-background and full-match-to-mismatch ratios are
expected. This was demonstrated with a similar ligation assay
performed on spotted arrays of 6-mer probes. In this case the
full-match/background ratio was about 50 and the average
full-match/mismatch ratio was 30. The results further demonstrate the
ability to determine partial or complete sequences of DNA present
in concatemers by increasing the number of consecutive probe cycles
or by using 4 or more probes labeled with different dyes in each
cycle. Synthetic oligonucleotide T1A
(5'-GCATANCACGANGTCATNATCGTNCAAACGTCAGTCCANGAATCNAGATCCACTTAGANTAAAAAAAAAAAA-3')
(SEQ ID NO: 13) contains a degenerate base at position 32. A universal
adaptor was ligated to this oligonucleotide and the
adaptor-T1A DNA was circularized as described before. DNA
concatemers made using the rolling circle replication (RCR)
reaction on this target were arrayed onto the random array. Because
each spot on this random array corresponds to tandemly replicated
copies originating from a single molecule of T1A, the DNA in a
particular arrayed spot would contain either an A, a C, a G,
or a T at positions corresponding to position 32 of T1A. To
identify these sub-populations, a set of 4 ligation probe pairs, each specific
to one of the 4 bases, was used. A 5' phosphorylated, 3'
TAMRA-labeled pentamer oligonucleotide corresponding to positions
33-37 of T1A with sequence CAAAC (probe T1A9b) was paired with one
of the following hexamer oligonucleotides corresponding to positions
27-32: ACTGTA (probe T1A9a), ACTGTC (probe T1A10a), ACTGTG (probe
T1A11a), ACTGTT (probe T1A12a). Each of these 4 ligation probe
pairs should hybridize to either an A, C, G or T containing version
of T1A. For each hybridization cycle, the probes were incubated
with the array in a ligation/hybridization buffer containing T4 DNA
ligase at 20 °C for 5 minutes. Excess probes were washed
off at 20 °C and images were taken with a TIRF microscope.
Bound probes were stripped to prepare for the next round of
hybridization.
[0181] An adaptor specific probe (BrPrb3) was hybridized to the
array to establish the positions of all the spots. The 4 ligation
probe pairs, at 0.4 µM, were then hybridized successively to the
array with the base identifications as illustrated for four spots
in FIG. 5. It is clear that most of the spots are associated with
only one of the 4 ligation probe pairs, and thus the nature of the
base at position 32 of T1A can be determined specifically.
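The base call and discrimination score defined in [0180] can be written out per spot. In this sketch, the `signals` dictionary (one intensity per base-specific ligation probe pair) and the cut-off constant `MIN_DISCRIMINATION` are illustrative assumptions. Intensities are assumed to be background-corrected.

```python
from statistics import mean
from typing import Dict, Tuple

MIN_DISCRIMINATION = 3.0  # hypothetical cut-off, echoing the "greater than 3" scores


def call_base(signals: Dict[str, float]) -> Tuple[str, float]:
    """Call the base at position 32 for one spot.

    `signals` maps A, C, G and T to the spot's intensity in the round that
    used the ligation probe pair specific for that base. Returns the called
    base and its discrimination score: the highest signal divided by the
    average of the other three signals of the same spot.
    """
    best = max(signals, key=signals.get)
    rest = mean(v for b, v in signals.items() if b != best)
    score = signals[best] / rest if rest > 0 else float("inf")
    return best, score


base, score = call_base({"A": 100, "C": 900, "G": 120, "T": 80})
confident = score > MIN_DISCRIMINATION
print(base, score, confident)
```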
Example 5: Decoding Two Degenerate Bases at the End of a Synthetic
80-Mer Oligonucleotide
[0182] The same synthetic oligonucleotide described above contains
8 degenerate bases at the 5' end to simulate random genomic DNA
ends. The concatemers created from this oligonucleotide may have
these 8 degenerate bases placed directly next to the adaptor
sequence. To demonstrate the feasibility of sequencing the two
unknown bases adjacent to the known adaptor sequence, a 12-mer
oligonucleotide (UK0-12 sequence 5'-ACATTAACGGAC-3') (SEQ ID NO:
17) with a specific sequence to hybridize to the 3' end of the
adaptor sequence was used as the anchor, and a set of 16
TAMRA-labeled oligonucleotides in the form of BBNNNNNN were used as
the sequence-reading probes. For each hybridization cycle, 0.2 µM
of UK0-12 anchor probe and 0.4 µM of the BBNNNNNN probe were
incubated with the array in a ligation/hybridization buffer
containing T4 DNA ligase at 20 °C for 10 minutes. Excess
probes were washed off at 20 °C and images were taken with
a TIRF microscope. Bound probes were stripped to prepare for the
next round of hybridization.
[0183] Using a subset of the BBNNNNNN probe set (namely GA, GC, GG
and GT in the place of BB), spots on the concatemer array created
from targets that specifically bind to one of these 4 probes could be
identified, with an average full match/mismatch ratio
of over 20, as shown in FIG. 6.
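The 16 BBNNNNNN reading probes and the full match/mismatch ratio quoted above can be illustrated as follows. This is a sketch under an assumption: the full match at each spot is taken as the strongest of the subset probes, and the mismatch as the mean of the others. That mirrors the discrimination score of Example 4, but the text does not say it is the exact calculation behind FIG. 6.

```python
from itertools import product
from statistics import mean
from typing import Dict, List


def bb_probes() -> List[str]:
    """The 16 reading probes BBNNNNNN (B = defined base, N = degenerate)."""
    return ["".join(bb) + "N" * 6 for bb in product("ACGT", repeat=2)]


def average_fm_mm(spots: List[Dict[str, float]]) -> float:
    """Mean full-match/mismatch ratio over spots scored with a probe subset.

    Per spot, the full match is the strongest probe signal and the mismatch
    is the mean of the remaining probes in the subset.
    """
    ratios = []
    for signals in spots:
        ranked = sorted(signals.values(), reverse=True)
        ratios.append(ranked[0] / mean(ranked[1:]))
    return mean(ratios)


subset = [p for p in bb_probes() if p.startswith("G")]  # GA, GC, GG, GT
print(len(bb_probes()), subset)
```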
1. COMPREHENSIVE DNA/RNA ANALYSIS USING ULTRA-HIGH CAPACITY
SELF-ASSEMBLED DNA NANO-ARRAY (saDNA) CHIPS PRODUCED FROM MIXTURES
OF NATURAL OR SYNTHETIC DNA FRAGMENTS
[0184] The nucleic acid hybridization process is used widely for
characterization of a DNA/RNA sample. Antibodies or other proteins
or compounds are used in various binding assays for
characterization of protein samples. For an efficient extensive
analysis of sample with many hybridization assays arrays of
gene/genomic fragments or synthetic oligonucleotides are prepared
in various ways. For preparing arrays of gene/genome fragments,
individual fragments are usually prepared in separate tubes/wells
and than deposited on the substrate. This process is too laborious
for preparing large number of samples (e.g. close or more than one
million) and/or does not allow to prepare an array of small, high
density spots, especially below 10 micrometer dot size. For
preparing high density arrays of about 100,000 or more
oligonucleotides in situ chemical synthesis of DNA is usually
performed. We describe here DNA/RNA and their derivatives or
peptides or protein and other array products, including processes
for their preparation and uses, that are based on applying mixtures
of detecting molecules of partially of fully known primary
structure or polymer sequence, preferably as concatemers of the
same molecule, on substrates with a pattern of high density small
binding sites separated by non-binding surface, followed by
determining which detecting molecule from the mixture is attached
at which binding site.
[0185] 1.1 saDNA Chip Preparation
[0186] In one embodiment, the saDNA platform utilizes attached
nano-balls of concatenated DNA/RNA as detecting molecules (DMs) for
hybridization to a solution-phase, labeled DNA or RNA target. Since
no specific DMs are attached to predetermined binding sites on the
substrate, the DM at each site must first be identified by full or partial
sequencing, re-sequencing or signature identification.
[0187] 1.1.1. Preparation of DNA Fragments for Probe Generation
[0188] High density DNA nano-ball probe arrays are prepared from
source nucleic acids (NA) that can be derived from
[0189] 1. A library of gene clones
[0190] 2. PCR or otherwise derived amplicons
[0191] 3. Selected fragments of genomic DNA
[0192] 4. cDNA or mRNA, siRNA or other RNA mixture
[0193] 5. The entire genomic DNA of one or a mixture of
individuals.
[0194] The source NA may originate from one species or from
multiple species.
[0195] It is preferable to have all of the DNA probe segments of
the sample in a similar number of copies to avoid overrepresentation
of individual sequences in the array. DNA from
multiple individuals of one species may be mixed to get the best
representation of every part of the genome. Some important or
control DNA probe segments may be intentionally added in higher or
lower amounts than other fragments. Too many DNA probes having the
same, or significantly overlapping, sequences may reduce the sensitivity
of detection by competing for target DNA in solution.
[0196] DNA for probe generation may be fragmented to the preferred
length of 30-100 bases, although sizes of about 10-2000 bases in
length may also be used, and longer DNA may provide better
sensitivity. For example, twenty labeled target DNA fragments of
100 bases in length can hybridize to one 2000 base attached DNA
probe template thereby increasing the label density per probe site.
The preferred DNA length may be selected by various separation
methods including size exclusion matrices or gel
electrophoresis.
DNA for probe arrays may also be generated from synthetic DNA
that has all sequence variants within a given length of eight to
twenty bases. The short probes will create a universal chip for DNA
sequencing by representing all possible sequences of 8 to 20 bases
within the array.
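The sequence space such a universal chip must represent grows as 4^k. The sketch below computes it. It also estimates the fraction of all k-mers expected to appear at least once among a given number of spots, assuming each nano-ball carries a uniformly random k-mer. That Poisson approximation is introduced here and is not taken from the text.

```python
import math


def library_size(k: int) -> int:
    """Number of distinct k-base sequences (4 choices at each position)."""
    return 4 ** k


def fraction_represented(n_spots: int, k: int) -> float:
    """Expected fraction of all k-mers present on at least one of n_spots,
    assuming each nano-ball carries a uniformly random k-mer (Poisson)."""
    return 1.0 - math.exp(-n_spots / library_size(k))


for k in (8, 12, 20):
    print(f"{k}-mers: {library_size(k):,} possible sequences")
```

For 8-mers the library has 65,536 members, while for 20-mers it exceeds 10^12. Complete representation is therefore realistic only toward the short end of the 8-20 base range.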
[0197] Elements of the RCR generated universal saDNA probes as
shown in FIG. 9 include
[0198] 1. An adapter sequence (BBBBB) and an N8-20 degenerate
detector oligonucleotide sequence (e.g. mixture of all
oligonucleotides of given length); mixtures of oligonucleotides of
variable length may also be used and none or some of the lengths
may not have all possible sequences.
[0199] 2. A capture sequence of 20-100, more frequently 25-50 bases
in length allows for the attachment of the concatenated RCR product
targets to the array via hybrid formation to attached
oligonucleotides on the surface.
[0200] 3. Primer binding site
[0201] 4. A probe binding sequence for QC of attachment efficiency
and relative quantitation of copy number in the concatenated RCR
product
[0202] 5. Sequences at the 5-prime and 3-prime ends of the adapter
allow for ligation of the N8-20 degenerate oligonucleotide via two
bridging oligonucleotides about 12-20 bases in length. Bridging
oligonucleotides may have several degenerated bases that bind to
the ends of detector oligonucleotides.
[0203] Selection of 10,000 to 1 million or more specific genomic
DNA fragments 20-2000 (preferably 100-1000) bases in length may be
performed for preparing sequence-specific DNA nano-ball probe
arrays. A large number of specific primers could be synthesized and
used individually or in pools for selecting subsets of genomic DNA
by primer extension or PCR. Another option is to make a universal
library of all 6-mers or 7-mers with and without 5 to 10 degenerate
bases at the 5' end and a universal tail further 5-prime of the
degenerate bases. For example, BBBBBBB and U20N5-10BBBBBBB (where B
represents defined bases in the synthesis, U represents a sequence
present in all primers and N represents degenerate bases in the
synthesis). These primers can be used directly to amplify selected
DNA segments in viral or bacterial genomes, in one to three
consecutive amplification steps, and with the possibility of using
nested pairs of primers. Ligation of two 6-mers or two 7-mers (or
6-mer+7-mer) may generate a more specific primer that can also
be used for genomes of higher complexity, including human. Several
pairs of primers could be created in one reaction tube using
selected 7-mer templates from a library of all 7-mers. Because
there is no need to produce a large quantity of DNA, 14-mers with a
universal primer tail may be sufficient. Nested 14-mer primers may
also be used to assure amplification of the region of interest.
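The expected number of chance matches of a k-mer shows why longer, ligated 13- or 14-mer primers suit more complex genomes. For a genome of G bases, counting both strands, that number is roughly 2G/4^k. The estimate assumes a random, uniform base composition, which real genomes do not have. The genome sizes below are rough illustrative values, not figures from the text.

```python
def expected_random_sites(k: int, genome_size: float) -> float:
    """Expected chance matches of one k-mer in a random genome, both strands."""
    return 2 * genome_size / 4 ** k


# Rough illustrative genome sizes in bases (assumptions, not from the text).
GENOMES = {"virus": 3e4, "bacterium": 5e6, "human": 3e9}

for name, size in GENOMES.items():
    row = ", ".join(f"k={k}: {expected_random_sites(k, size):.3g}"
                    for k in (6, 7, 13, 14))
    print(f"{name}: {row}")
```

A 7-mer is expected to occur a few times by chance in a 30 kb viral genome. In a 3 Gb genome it occurs roughly 3.7 × 10^5 times, while a 14-mer drops to about 22 chance sites.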
[0204] 1.1.2. DNA Nano-Ball Preparation
[0205] Preparation of concatenated detector molecules (DNA
nano-balls) requires the formation of circular DNA molecules. DNA
is initially heat denatured and one end is ligated to an adapter.
In a second ligation reaction the second end of the probe template
is ligated to the free end of the adapter to complete the circle
formation. The adapter may include short palindromic sequences (e.g.
ATCGATCGAT) to induce intra-molecular hybrid formation between
adapter replicas, e.g. ----ATCGATCGAT-------TAGCTAGCTA---, and
compaction of the concatemer.
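The example adapter motif ATCGATCGAT is self-complementary: its reverse complement is the same sequence. That property lets two adapter copies in the concatemer pair with each other. A short check follows; the helper names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def is_self_complementary(seq: str) -> bool:
    """True if the sequence can pair, antiparallel, with another copy of itself."""
    return seq == reverse_complement(seq)


print(is_self_complementary("ATCGATCGAT"))  # the adapter motif from the text
print(is_self_complementary("ACGTTT"))      # an arbitrary non-palindrome
```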
[0206] Rolling circle replication (RCR) then occurs with a primer
that is complementary to a portion of the adapter and phi29 strand
displacing polymerase. The concentration of circular DNA in the
polymerase reaction may be low (approximately 10-100 billion
circles per ml, or 10-100 circles per picoliter) to avoid
entanglement, and incorporation of palindromes in the adapter may
also minimize intermolecular interactions.
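The concentration figures above convert directly, since 1 ml is 10^9 picoliters. The sketch also estimates the typical spacing between circles as the side of the cube of solution available to each circle, using 1 pl = 1000 µm^3. This spacing estimate is an added illustration of why dilution reduces entanglement and is not a figure from the text.

```python
PL_PER_ML = 1e9     # 1 ml = 10^9 pl
UM3_PER_PL = 1e3    # 1 pl = 1000 cubic micrometres


def circles_per_pl(circles_per_ml: float) -> float:
    """Convert a circle concentration from per-ml to per-picoliter."""
    return circles_per_ml / PL_PER_ML


def mean_spacing_um(per_pl: float) -> float:
    """Side of the cube of solution available to each circle, in micrometres."""
    return (UM3_PER_PL / per_pl) ** (1 / 3)


for per_ml in (10e9, 100e9):
    c = circles_per_pl(per_ml)
    print(f"{per_ml:.0e}/ml = {c:.0f}/pl, spacing ~{mean_spacing_um(c):.1f} um")
```

At 10 and 100 circles per picoliter the circles are, on average, about 4.6 µm and 2.2 µm apart.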
[0207] The RCR reaction may result in products of varying length so
removal of small nano-balls may be important for good
quantification of target in the hybridization assay. Small DNA
nano-balls could be removed by size-exclusion methods or blocked by
complementary concatenated blocker molecules that hybridize
to all adapter copies in short molecules. Longer molecules will
have excess adapter molecules that can be hybridized to a capture
molecule on a solid support, whereas shorter molecules will be
blocked from binding to the support.
[0208] For making a large number of unit arrays a continuous
amplification of selected DNA fragments may be performed by cutting
concatemers by hybridizing an oligonucleotide to the adapter region
that generates a restriction enzyme site. New circles are formed by
ligation of the free ends and a second RCR reaction is performed.
One-billion-fold amplification of the original DNA can be
achieved in three to four rounds of RCR (1) and would provide enough
DNA for making millions of arrays.
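The one-billion-fold figure implies how much each RCR round must amplify. Assuming equal fold per round (an assumption for illustration), that is about 1000-fold over three rounds or about 180-fold over four.

```python
def fold_per_round(total_fold: float, rounds: int) -> float:
    """Equal per-round amplification needed to reach total_fold in `rounds`."""
    return total_fold ** (1 / rounds)


for rounds in (3, 4):
    print(rounds, round(fold_per_round(1e9, rounds)))
```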
[0209] 1.1.3. Arraying DNA Nano-Balls
[0210] DNA detector nano-balls will be arrayed on a glass or other
support with a grid of capture oligonucleotide sites. The capture
oligonucleotide may be 20 to 100 bases in length and could be
prepared using modified DNA such as LNA and PNA to increase hybrid
stability. All attached oligonucleotide sites may have the same
capture oligonucleotide and the surface between these sites may be
hydrophobic to prevent binding of hydrophilic molecules. The array
of capture oligonucleotides may be produced by nano-printing
techniques or by creating active sites for oligonucleotide
attachment using photochemistry (2). Another of the many DNA
nano-ball attachment options is to create a positively charged
spot surface that binds negatively charged DNA. The attached
oligonucleotide region size may vary for different applications but
could range from about 0.2 microns to 2 microns in diameter. Large
oligonucleotide attachment sites may be suitable for longer DNA
molecules.
[0211] Binding of saDNA nano-ball probes may proceed at specific
temperatures, with or without mixing, until about 80%-99% of spots
are occupied. Empty sites that do not bind a DNA nano-ball may be
used as barcodes for aligning the grid between different CCD camera
images of the same array used for decoding and a sample assay.
Another option is to stamp or spray the nano-ball solution in the
form of a 10 micron solution layer or picoliter or sub-picoliter
drops containing about 10 nano-balls. About 10 such drops on the
same surface will be sufficient to occupy all 100 binding
sites with 1 micrometer pitch on a 10×10 micrometer
substrate surface. We have created DNA nano-balls from E. coli
genomic DNA at an estimated concentration of less than 150
nano-balls/picoliter of RCR reaction, which assumes maximal
efficiency of circle DNA formation and polymerase extension. This
approach of attachment will help to minimize the binding and
association of two nano-balls with complementary DNA, because
millions of free nano-balls are not mixed over the surface across
already attached nano-balls.
[0212] It may also be desirable to perform additional in situ DNA
amplification that requires cutting the attached concatemerized
DNA, recircularization (preferably by using a different adapter
DNA) and RCR. This could be achieved with two different capture
probes present at the oligonucleotide attachment site such that DNA
concatenated with both adapters can be captured at the site.
Another method for in situ amplification is to use capture
oligonucleotides as primers for a strand displacing polymerase.
These methods could achieve 10,000 to 100,000 or more copies per
attachment site. Since 100,000 copies of a 1 kb DNA molecule that
is 500 nm in length will occupy about 10% of the 500 nm × 500
nm × 500 nm spot volume, there would be ample space to maintain
a concatemerized molecule of this size. RCR products may be
fragmented after attachment using a complementary oligonucleotide
to create a double-stranded DNA cutting site.
[0213] In one embodiment, DNA fragments can be attached to a
preexisting concatemer of oligonucleotide complementary to capture
oligonucleotides present in the binding sites. This attachment can
be done by hybridization using an end-adapter or by ligation on one
end or both ends to form a circle. In this case individual sample
DNA molecules will be arrayed without RCR in solution and may be
used as such or amplified in situ by using various methods
including RCR. Saturation of a spot by in situ amplification, or
cutting off the excess DNA not bound to the capture oligonucleotide,
may be used to obtain almost identical copy numbers per spot.
[0214] It is estimated that a single well of a 384-well plate could
accommodate on the order of 10^6 attached DNA nano-balls, and a
single well of a 96-well plate 5×10^6 DNA nano-balls. The
analysis of 10^7 bases of DNA with a 90%-99% overlap of
fragments (i.e. one fragment starting every 1-10 bases on average)
would require about 10^6-10^7 DNA nano-balls. This amount
of sequence equates to 100-300 different viruses of 10-30 kb
genome size, so potentially a 1536-well plate will work for high
throughput viral screening with 10-30 viruses represented on each
unit array. Alternatively, 200 to 300 bacterial species will have
about 10^9 bases of DNA, so 10^7 fragments 100 bases long
will cover more than 50% of bases, with occasional rare
gaps longer than 1 kb appearing. In this case, forming arrays for 10
meaningful groups of 20-30 bacterial species and all different
isolates is more than sufficient for screening 10 specific human or
other samples (e.g. blood, urine, saliva, skin), each on a specific
array. About 108 bases of human coding DNA may be represented by 10
DNA nano-balls having 50-200 base fragments. All long exons and
almost every short exon will be represented with at least one
fragment, and every gene will be represented by about 30-3000
fragments.
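Two of the estimates in [0214] can be reproduced with simple arithmetic. The fragment count is the target length divided by the average spacing between fragment starts. The bacterial coverage estimate uses the Lander-Waterman model, named here as an assumed model in which fragments are placed uniformly at random. Under that model the covered fraction is 1 - e^(-c) for coverage c.

```python
import math


def nanoballs_needed(target_bases: float, start_spacing: float) -> float:
    """Fragments needed if one fragment starts every `start_spacing` bases."""
    return target_bases / start_spacing


def fraction_covered(n_fragments: float, fragment_len: float,
                     target_bases: float) -> float:
    """Lander-Waterman estimate of the fraction of bases covered at least once,
    assuming fragments start at uniformly random positions."""
    coverage = n_fragments * fragment_len / target_bases
    return 1.0 - math.exp(-coverage)


print(nanoballs_needed(1e7, 10), nanoballs_needed(1e7, 1))
print(round(fraction_covered(1e7, 100, 1e9), 3))
```

With 10^7 fragments of 100 bases over 10^9 bases, c = 1 and about 63% of bases are covered. That agrees with the "more than 50%" stated above.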
[0215] 1.2. Identification of DNA Nano-Balls
[0216] Spot DNA identification or sequencing involves
characterization with multiple decoding and sequencing probes. The
identification process may also provide quantification of DNA in
each spot. This information may be used in the interpretation
(normalization) of the hybridization results obtained with
the test DNA or RNA.
[0217] 1.2.1. Identification of Long DNA
[0218] The sequence identity of each attached DNA nano-ball may be
determined by a "signature" approach. About 50 to 100 or possibly
200 probes will be used such that about 25-50% or in some
application 10-30% of attached nano-balls will have a full match
sequence for each probe. This type of data will allow each
amplified DNA fragment within the nano-ball to be mapped to the
reference sequence. One example of this process would be to score
64 4-mers (i.e. 25% of all possible 256 4-mers) using 16
hybridization/strip-off cycles in a 4-color labeling scheme. On a
60-70 base fragment amplified in the DNA nano-ball, about 16 of the 64
probes will be positive, since a 64-base sequence contains about 64 4-mers
(i.e. one quarter of all possible 4-mers).
Unrelated 60-70 base fragments will have a very different set of
about 16 positive decoding probes. A combination of 16 probes out
of 64 probes has a random chance of occurrence in 1 of every one
billion fragments, which practically provides a unique signature for
that nano-ball. Scoring 80 probes in 20 cycles and generating 20
positive probes would create an even more specific signature: occurrence
by chance is 1 in billions. Previously, a "signature" approach was
used to select novel genes from cDNA libraries (3). One
implementation of a signature approach is to sort the obtained
intensities of all tested probes and select up to a predefined
(expected) number of probes that satisfy the positive probe
threshold. These probes will be mapped to sequences of all DNA
fragments (a sliding window of a longer reference sequence may be
used) expected to be present in the array. The sequence that has
all or a statistically sufficient number of the selected positive
probes is assigned as the sequence of the DNA fragment in the given
nano-ball. In another approach, an expected signal can be defined
for all used probes using their pre-measured full-match and
mismatch hybridization/ligation efficiency. In this case a measure
similar to a correlation coefficient can be calculated.
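The sort-threshold-map procedure described above can be sketched in a few lines. For simplicity, the sketch treats each probe as the k-mer it detects in the fragment and ignores strand and complementarity. The reference fragments, intensities, `threshold`, `expected` count and `min_hits` value are all hypothetical.

```python
from typing import Dict, List, Optional, Set


def positive_probes(intensities: Dict[str, float], threshold: float,
                    expected: int) -> List[str]:
    """Sort probes by intensity and keep up to `expected` that pass threshold."""
    ranked = sorted(intensities, key=intensities.get, reverse=True)
    return [p for p in ranked[:expected] if intensities[p] >= threshold]


def kmers(seq: str, k: int) -> Set[str]:
    """All k-mers present in `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def assign_fragment(positives: List[str], references: Dict[str, str],
                    min_hits: int) -> Optional[str]:
    """Return the reference fragment containing the most positive probes,
    or None if even the best match has fewer than `min_hits` of them."""
    if not positives:
        return None
    k = len(positives[0])
    best_name, best_hits = None, -1
    for name, seq in references.items():
        present = kmers(seq, k)
        hits = sum(p in present for p in positives)
        if hits > best_hits:
            best_name, best_hits = name, hits
    return best_name if best_hits >= min_hits else None


# Hypothetical reference fragments and 4-mer probe intensities for one spot.
refs = {"frag1": "ACGTACGGTCA", "frag2": "TTTTGGGGCCCC"}
signal = {"ACGT": 900, "CGGT": 800, "GTCA": 700, "GGGG": 50, "TTTT": 40}
pos = positive_probes(signal, threshold=300, expected=3)
print(pos, assign_fragment(pos, refs, min_hits=3))
```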
[0219] A preferred way to score 4-mers is to ligate pairs of
probes, for example N(5-7)BBB with BN(7-9), where B is a defined
base and N is a degenerate base. For generating signatures on
longer DNA nano-ball probes, more defined bases will be used. For
example, a 25% positive rate in a fragment 1000 bases in length
would be achieved by N(4-6)BBBB and BBN(6-8). Note that longer
fragments need about the same number of 60-80 probes (15-20
ligation cycles using 4 colors).
[0220] In one embodiment, all probes of a given length (e.g. the 4096
probes N2-4BBBBBBN2-4) or all ligation pairs may be used to determine the
complete sequence of the DNA in the nano-ball. For example, 1024
combinations of N(5-7)BBB and BBN(6-8) may be scored (256 cycles if 4
colors are used) to determine the sequence of DNA fragments of up to
about 250 bases, preferably up to about 100 bases.
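The probe and cycle counts in [0219] and [0220] follow from 4^n combinations for n defined bases, divided by 4 colors per cycle. Examples are 1024 ligation pairs with five defined bases in total, or 4096 probes with six. The positive-rate helper is a rough approximation added here: the number of k-mer positions in a fragment divided by the number of possible k-mers.

```python
import math


def n_probes(defined_bases: int) -> int:
    """Distinct probes (or ligation pairs) with this many defined bases."""
    return 4 ** defined_bases


def cycles(probe_count: int, colors: int = 4) -> int:
    """Hybridization/ligation cycles if `colors` probes are scored per cycle."""
    return math.ceil(probe_count / colors)


def approx_positive_rate(fragment_len: int, defined_bases: int) -> float:
    """k-mer positions in the fragment divided by the number of possible
    k-mers; slightly overestimates because repeated k-mers count twice."""
    k = defined_bases
    return (fragment_len - k + 1) / 4 ** k


print(n_probes(5), cycles(n_probes(5)))         # pairs with 3 + 2 defined bases
print(round(approx_positive_rate(64, 4), 3))    # 4-mers on a 64-base fragment
print(round(approx_positive_rate(1000, 6), 3))  # 6 defined bases, 1000 bases
```

The approximation gives about 0.24 both for 4-mers on a 64-base fragment and for 6 defined bases on a 1000-base fragment. Both values are close to the roughly 25% positive rate targeted above.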
[0221] Decoding or sequencing probes with large numbers of Ns
may be prepared from multiple syntheses of subsets of sequences at
the degenerate bases to minimize differences in efficiency. Each
subset is added to the mix at a proper concentration. Also, some
subsets may have more degenerate positions than others. For
example, each of the 64 probes from the set
[0222] N(5-7)BBB may be prepared in 4 different syntheses.
[0223] Oligonucleotides from the three specific
syntheses would be added to the regular synthesis in experimentally
determined amounts to increase hybrid formation with target
sequences that have, in front of the BBB sequence, an AT-rich sequence
(e.g. AATAT) or an alternating (A or T) and (G or C) sequence (e.g. ACAGT or
GAGAC). These sequences are expected to be less efficient in
forming a hybrid. All 1024 target sequences can be tested for their
efficiency to form hybrids with N(0-3)NNNNNBBB probes, and those types
that give the weakest binding may be prepared in about 1-10
additional syntheses and added to the basic probe preparation.
[0224] Decoding by signatures with a smaller number of probes for a small
number of distinct samples: a signature of 5-7 positive probes out of 20
probes (5 cycles using 4 colors) has the capacity to distinguish about
10-100 thousand distinct fragments.
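The capacity figure can be checked by counting combinations, that is, the number of distinct patterns with 5, 6 or 7 positive probes out of 20. This raw count is an upper bound. In practice, signatures must differ by more than a single probe to tolerate scoring errors, so the usable number is lower.

```python
from math import comb


def signature_capacity(n_probes: int, min_pos: int, max_pos: int) -> int:
    """Number of distinct positive-probe patterns with min_pos..max_pos hits."""
    return sum(comb(n_probes, k) for k in range(min_pos, max_pos + 1))


for k in (5, 6, 7):
    print(k, comb(20, k))
print("total", signature_capacity(20, 5, 7))
```

The counts are 15,504, 38,760 and 77,520 (131,784 in total), the same order of magnitude as the 10-100 thousand range given above.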
[0225] 1.2.2. Decoding of 8-20Mer RCR Products
[0226] In this application arrays are formed as random
distributions of unique 8 to 20 base recognition sequences in the
form of DNA nano-balls. The probes need to be decoded to determine
the sequence of the 8-20 base probe region. At least two options
are available to do this and the following example describes the
process for a 12 mer. In the first, one half of the sequence is
determined by utilizing the hybridization specificity of short
probes and the ligation specificity of fully matched hybrids. Six
to ten bases adjacent to the 12 mer are predefined and act as a
support for a 6-mer to 10-mer oligonucleotide. This short oligomer will
ligate at its 3-prime end to one of 4 labeled 6-mers to 10-mers.
These decoding probes consist of a pool of 4 oligonucleotides in
which each oligonucleotide consists of 4-9 degenerate bases and 1
defined base. This oligonucleotide will also be labeled with one of
four fluorescent labels. Each of the 4 possible bases A, C, G, or T
will therefore be represented by a fluorescent dye. For example,
these 5 groups of 4 oligonucleotides and one universal
oligonucleotide (Us) can be used in the ligation assays to sequence
the first 5 bases of 12-mers (B = each of the 4 bases, associated with a
specific dye or tag at the end):
UUUUUUUU.BNNNNNNN
UUUUUUUU.NBNNNNNN
UUUUUUUU.NNBNNNNN
UUUUUUUU.NNNBNNNN
UUUUUUUU.NNNNBNNN
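The five pools listed above follow a simple pattern that can be generated programmatically. The function below is a hypothetical helper for writing out the pool notation, not a synthesis protocol. Its `anchor` argument simply reproduces the U and N notation used in the text.

```python
from typing import List

BASES = "ACGT"  # one dye (or tag) per base in each pool


def decoding_pool(position: int, probe_len: int = 8,
                  anchor: str = "UUUUUUUU") -> List[str]:
    """Write out the 4-probe pool that reads `position` (1-based) after the anchor.

    The anchor is written first, then '.', then the labeled probe with the
    defined base at `position` and degenerate N everywhere else.
    """
    if not 1 <= position <= probe_len:
        raise ValueError("position must fall inside the labeled probe")
    return [
        f"{anchor}.{'N' * (position - 1)}{base}{'N' * (probe_len - position)}"
        for base in BASES
    ]


for pos in range(1, 6):  # the five pools listed above (A-labeled member shown)
    print(decoding_pool(pos)[0])
# Shifting the anchor by two degenerate bases moves the read to position 6.
print(decoding_pool(4, anchor="UUUUUUNN"))
```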
[0227] Six or more bases can be sequenced with additional probe
pools. To improve discrimination at positions near the center of
the 12-mer, the 6-mer oligonucleotide can be positioned further into
the 12-mer sequence. This will necessitate the incorporation of
degenerate bases into the 3-prime end of the non-labeled
oligonucleotide to accommodate the shift. The following is an example of
decoding probes for positions 6 and 7 in the 12-mer.
[0228] UUUUUUNN.NNNBNNNN
UUUUUUNN.NNNNBNNN
[0229] In a similar way the 6 bases from the right side of the
12mer can be decoded by using a fixed oligonucleotide and 5-prime
labeled probes. In the above described system 6 cycles are required
to define 6 bases of one side of the 12-mer. With redundant cycle
analysis of bases distant from the ligation site, this may increase to
7 or 8 cycles. In total, then, complete sequencing of the 12-mer
could be accomplished with 12-16 cycles of ligation.
[0230] 1.2.3. Partial or Complete Sequencing of Arrayed DNA by
Combining Two Distinct Types of Libraries of Detector Probes
[0231] In this approach one set has probes of the general type
N3$B4-6 (anchors) that are ligated with the first 2 or 3 or 4
probes/probe pools from the set BN6-8, NBN5-7, N2BN4-6, and N3BN3-5.
The main requirement is to test in a few cycles a probe from the
first set with 2-4 or even more probes from the second set to read
longer continuous sequence such as 5-6+3-4=8-10 in just 3-4 cycles.
In one example, the process is:
[0232] 1) Hybridize 1-4 4-mer anchors (or more 5-mer anchors) so that
70-80% of DNAs have 1 or 2 anchors bound.
[0233] One way to discriminate which anchor from the pool is positive
is to mix specific probes with distinct hybrid stability
(possibly also with different numbers of Ns). Anchors may also be
tagged to determine which anchor from the pool is hybridized to a
spot. Tags, as additional DNA segments, may be used for adjustable
displacement as a detection method. For example, EEEEEEEENNNAAAAA
and FFFFFFFFNNNCCCCC probes can be differentially removed, after
hybridization or after hybridization and ligation, with two
corresponding displacers, EEEEEEEENNNNN and FFFFFFFFNNNNNNNN, where
the second is more efficient. Separate cycles may be used just to
determine which anchor is positive. For this purpose, anchors
labeled or tagged with multiple colors may be ligated to unlabeled
N7-N10 supporter oligonucleotides.
[0234] 2) Hybridize BNNNNNNNN probes with 4 colors corresponding to
4 bases; wash discriminatively (or displace by a complement to the
tag) to read which of the two scored bases is associated with which
anchor if two anchors are positive in one DNA. Thus, two 7-10 base
sequences can be scored at the same time.
[0235] In 2-4 cycles, extend from the 4-6 base anchor for an additional 2-4 bases.
[0236] Run 16 different anchors per array (32-64 physical cycles if 4 colors
are used) to determine about 16 8-mers (~100 bases total)
per fragment. This is more than enough to map the fragment to the reference:
the probability that a random 100-mer contains a given set of 10 8-mers is less
than 1 in a trillion trillion (about 10^-28). By combining data from
different anchors scored in parallel on the same fragment in
another array, the complete sequence of that fragment, and by extension
of entire genomes, may be generated from overlapping 7-10-mers.
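The order of magnitude quoted for mapping can be checked with a rough calculation. It assumes a random sequence with equal base frequencies and treats the 10 8-mers as independent; both simplifications are introduced here. Under those assumptions, each given 8-mer appears in a 100-mer with probability of about 93/4^8, and all ten appear with that probability raised to the tenth power.

```python
def prob_random_contains_all(seq_len: int, k: int, n_kmers: int) -> float:
    """Approximate chance that a random sequence contains every one of
    n_kmers specified k-mers, treating the k-mers as independent."""
    positions = seq_len - k + 1
    p_one = 1.0 - (1.0 - 4.0 ** -k) ** positions  # one given k-mer present
    return p_one ** n_kmers


p = prob_random_contains_all(100, 8, 10)
print(f"{p:.1e}")
```

This evaluates to roughly 3 × 10^-29, below 10^-24 and consistent with the figure given above.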
[0237] 1.2.4. Tagging Probes with DNA Tags for Larger Multiplex of
Decoding or Sequence Determination Probes
[0238] Instead of directly labeling probes they can be tagged with
different oligonucleotide sequences made of natural bases or new
synthetic bases (such as isoG and isoC). Tags can be designed to
have very precise binding efficiency with their anti-tags using
different oligonucleotide lengths (about 6-24 bases) and/or
sequence including GC content. For example 4 different tags may be
designed that can be recognized with specific anti-tags in 4
consecutive cycles or in one hybridization cycle followed by a
discriminative wash. In the discriminative wash initial signal is
reduced to 95-99%, 30-40%, 10-20% and 0-5% for each tag,
respectively. In this case by obtaining two images 4 measurements
are obtained assuming that probes with different tags will rarely
hybridize to the same dot. Another benefit of having many different
tags even if they are consecutively decoded (or 2-16 at a time
labeled with 2-16 distinct colors) is the ability to use a large
number of individually recognizable probes in one assay reaction.
This way a 4-64 times longer assay time (that may provide more
specific or stronger signal) may be affordable if the probes are
decoded in short incubation and removal reactions.
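The single discriminative wash can be decoded by comparing each dot's signal before and after the wash. The retention windows below are the ones given in the text. The tag names and the nearest-window rule are assumptions for illustration.

```python
from typing import Dict, Tuple

# Fraction of the initial signal left after the discriminative wash, per tag
# (windows from the text; tag names are hypothetical).
RETENTION: Dict[str, Tuple[float, float]] = {
    "tag1": (0.95, 0.99),
    "tag2": (0.30, 0.40),
    "tag3": (0.10, 0.20),
    "tag4": (0.00, 0.05),
}


def call_tag(before: float, after: float) -> str:
    """Assign a dot to the tag whose retention window is nearest its ratio."""
    if before <= 0:
        raise ValueError("dot has no signal before the wash")
    ratio = after / before

    def distance(window: Tuple[float, float]) -> float:
        low, high = window
        if low <= ratio <= high:
            return 0.0
        return min(abs(ratio - low), abs(ratio - high))

    return min(RETENTION, key=lambda tag: distance(RETENTION[tag]))


for before, after in [(1000, 970), (1000, 350), (1000, 150), (1000, 20)]:
    print(before, after, call_tag(before, after))
```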
[0239] 1.2.5. System for Decoding saDNA Chip Machine
Introduction:
[0240] A key component of successful array production is having a
cost-effective methodology for decoding each array. Decoding arrays
during production simplifies assays for the end user. Our decoding
methodology includes a fast, automated imaging and assay platform
designed specifically to optimize this task. Under the currently
described schema, patterned array substrates are produced to match
the standard 96 or 384 well plate format. Our production format
will be an 8.times.12 pattern of 6 mm.times.6 mm arrays at 9 mm
pitch or 16.times.24 of 3.33 mm.times.
[0241] 3.33 mm array at 4.5 mm pitch, on a single piece of glass or
plastic and other optically compatible material. In one example
each 6 mm.times.6 mm array consists of 36 million 250-500 nm square
activated regions at 1 micrometer pitch. Throughout the production
process, our arrays will be manipulated in this array of arrays
format.
[0242] The rate limiting step for the production process may be
array decoding. While arrays can be printed and hybridized at an
astonishing rate through the use of processes derived from the
semiconductor industry, they must be decoded at the rate of image
acquisition. The decoding process, described in other sections of
this document, will require the use of 48-96 or more decoding
probes. These probes will be further combined into 12-24 or more
pools by encoding them with four fluorophores, each having a
different emission spectrum. Additional tagging may be used as
described in the biochemistry of decoding.
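The pooling arithmetic can be sketched as follows, assuming probes are simply grouped four at a time with one probe per fluorophore; how real pools would be composed (for example, to balance signal or avoid cross-talk) is not specified in the text, and the dye names are hypothetical. Forty-eight probes give 12 pools and 96 give 24, consistent with the 12-24 pools stated above, each pool being read in one decoding cycle.

```python
from typing import Dict, List, Sequence

# Hypothetical fluorophore labels; the text only says there are four dyes
# with different emission spectra.
DYES = ("dye_1", "dye_2", "dye_3", "dye_4")


def build_pools(probes: Sequence[str], dyes: Sequence[str] = DYES) -> List[Dict[str, str]]:
    """Group decoding probes into pools, one probe per dye in each pool.

    Each pool is hybridized in one decoding cycle; the dye detected at a
    spot tells which probe of that pool bound there.
    """
    n = len(dyes)
    return [dict(zip(dyes, probes[i:i + n])) for i in range(0, len(probes), n)]


probes = [f"probe_{i:02d}" for i in range(48)]
pools = build_pools(probes)
print(len(pools))  # 12
print(pools[0])
```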
[0243] Using a 20× objective and a 10-megapixel camera, each
6 mm × 6 mm array may require roughly 30 images for full coverage.
Each 1 micrometer array area will be read by about 8 pixels. Our
prior experience suggests that each image could be acquired in 250
milliseconds: 150 ms for exposure and 100 ms to move the stage.
Using this fast acquisition, it will take ~7.5 seconds to image each
array, or 12 minutes to image the complete set of 96 arrays on each
substrate. In one embodiment of an imaging
system, we will achieve this high image acquisition rate by using
four ten-megapixel cameras, each imaging the emission spectra of a
different fluorophore. The cameras will be coupled to the
microscope through a series of dichroic beam splitters. The
autofocus routine, which takes extra time, will run only if an
acquired image is out of focus. It will then store the Z axis
position information to be used upon return to that section of that
array during the next imaging cycle. By mapping the autofocus
position for each location on the substrate we will drastically
reduce the time required for image acquisition.
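The timing figures above follow from simple arithmetic, sketched below. The 1.25 mm field of view is taken from the preferred 20× configuration described later in this section; reading the text's "roughly 30 images" as the 25-field minimum plus some overlap between neighbouring fields is our assumption.

```python
import math


def tiles_needed(array_mm: float, fov_mm: float) -> int:
    """Minimum number of non-overlapping square fields covering a square array."""
    per_side = math.ceil(array_mm / fov_mm)
    return per_side * per_side


def imaging_time(images_per_array=30, exposure_ms=150, move_ms=100, n_arrays=96):
    """Return (seconds per array, minutes per substrate)."""
    per_image_s = (exposure_ms + move_ms) / 1000.0
    per_array_s = images_per_array * per_image_s
    return per_array_s, per_array_s * n_arrays / 60.0


print(tiles_needed(6.0, 1.25))  # 25
print(imaging_time())           # (7.5, 12.0)
```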
[0244] Each array will require about 12-24 cycles to decode. Each
cycle consists of a hybridization, wash, array imaging, and
strip-off step. In the above example these steps may take 5, 2, 12,
and 5 minutes, respectively, for a total of 24 minutes per cycle,
or roughly 5-10 hours for each array if the operations were
performed linearly. The time to decode each array
can be reduced by a factor of two by allowing the system to image
constantly. To accomplish this, we will stagger the imaging of two
separate substrates on each microscope. While one substrate is
being reacted, the other substrate will be imaged.
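The factor-of-two saving from staggering can be sketched as below. The model assumes one microscope alternating between exactly two substrates and reports the effective time per substrate (total time divided by two); the step durations are the example values from the text.

```python
def decode_hours(cycles, hyb_min=5, wash_min=2, image_min=12, strip_min=5):
    """Return (linear hours per substrate, effective staggered hours per substrate)."""
    wet_min = hyb_min + wash_min + strip_min
    # Linear: one substrate; the microscope idles during the wet steps.
    linear_h = cycles * (wet_min + image_min) / 60
    # Staggered: two substrates alternate on one microscope. A substrate can
    # be imaged again only after its own wet steps AND the other substrate's
    # imaging are done, so each substrate cycles once per period_min.
    period_min = image_min + max(wet_min, image_min)
    # Two substrates complete a cycle every period, so the effective time
    # per substrate is half of cycles * period.
    staggered_h = cycles * period_min / 2 / 60
    return linear_h, staggered_h


for n_cycles in (12, 24):
    print(n_cycles, decode_hours(n_cycles))
```

With the example step times, the linear schedule gives about 4.8 hours for 12 cycles and 9.6 hours for 24 (the "roughly 5-10 hours" above). Staggering halves this exactly here because the 12 minutes of wet steps (5 + 2 + 5) match the 12 minutes spent imaging the other substrate; if the wet steps took longer, the saving would shrink.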
[0245] The physical makeup of the machine will include a number of
additions to the standard microscope. A large area automated plate
stage may be added to the microscope. This stage will accommodate
the two substrates needed for each decoding assay. Another
possibility is to use two smaller substrates that can fit in the
standard plate stage. Each substrate will be fitted into a cassette
and those cassettes will be fitted on to the stage. The cassette
will index the substrate to the stage and provide a method to
contain fluids over the assay substrate. Cassettes will have ports
to facilitate the addition and removal of large volumes of buffer.
They will also provide a means to control the temperature of the
substrate through a connection with a temperature control subsystem
able to maintain temperatures in the range of about 5-95 °C (more
specifically 10-85 °C) and to change temperature during the cycle
at about 0.5-2 °C per second. Another key component is the 3-axis
robot gantry, which will be equipped with a syringe-pump-actuated
pipetting head. This robotic pipetter will be used to add the probe
pools to each cassette. Syringe pumps will be used to pump buffers
into and out of each cassette. In another embodiment, the robotic
pipetting may be replaced with pump- and valve-based automation of
decoding probe pool delivery. In yet another embodiment, all
reagents and substrates may be contained on a microfluidic chip.
[0246] Example Cycle:
Set the temperature of the array to the hybridization temperature (usually in the range 5-25 °C).
Use the robot pipetter to pre-mix a small amount of decoding probe with the appropriate amount of hybridization buffer.
Pipette the mixed reagents into the hybridization chamber.
Hybridize for a predetermined time.
Drain reagents from the chamber using a pump (syringe or other).
Add a buffer to wash away mismatched and non-hybridized probes.
Adjust the chamber temperature to the appropriate wash temperature (about 10-40 °C).
Drain the chamber.
Add more wash buffer if needed to improve imaging.
[0247] Image each array, preferably with a mid-power (20×)
microscope objective optically coupled to one or more high pixel
count, high-sensitivity CCD cameras. Either the plate stage moves
the chambers (or perhaps flow cells with input funnels) over the
objective, or the objective-optics assembly moves under the chamber.
Certain optical arrangements using dichroic mirrors/beam splitters
can be employed to collect multi-spectral images simultaneously,
thus decreasing image acquisition time. Arrays can be imaged in
sections or whole, depending on array size, image size, and pixel
density. Sections can be assembled by aligning images using
statistically significant empty regions pre-coded onto the
substrate, either during active-site creation or using a multi-step
nano-printing technique: for example, the grid of activated sites
can be printed using a specific capture probe, leaving empty regions
in the grid, and a different pattern or capture probe can then be
printed in those regions using a separate print head.
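One way to use the pre-coded empty regions for stitching is sketched below: treat each imaged section as a small occupancy grid and slide it over the known substrate design until the empty sites line up. The brute-force search, the boolean-grid representation, and the 5% empty fraction are illustrative assumptions, not the method of the text; a production system would more likely use FFT-based correlation and would also have to tolerate sites that are printed but happen to be unoccupied.

```python
import random


def locate_tile(design, tile):
    """Return the (row, col) offset of `tile` within `design`.

    design: list of rows of bools over the whole substrate grid; True where a
        capture site was printed, False at the deliberately empty regions.
    tile:   list of rows of bools for one imaged section; True where the
        image shows an occupied site.
    The offset with the most agreeing sites wins, so a few misread sites
    do not prevent a match.
    """
    rows, cols = len(design), len(design[0])
    h, w = len(tile), len(tile[0])
    best_offset, best_score = None, -1
    for r in range(rows - h + 1):
        for c in range(cols - w + 1):
            score = sum(
                design[r + i][c + j] == tile[i][j]
                for i in range(h)
                for j in range(w)
            )
            if score > best_score:
                best_offset, best_score = (r, c), score
    return best_offset


# Illustrative substrate: ~5% of sites left empty as pre-coded fiducials.
rng = random.Random(0)
design = [[rng.random() > 0.05 for _ in range(60)] for _ in range(60)]
tile = [row[22:42] for row in design[17:37]]
tile[0] = [not v for v in tile[0][:5]] + tile[0][5:]  # a few misread sites
print(locate_tile(design, tile))
```

Because the empty sites form an irregular pattern, only the true offset lines all of them up, so the example should recover (17, 22) even with five misread sites.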
[0248] Drain the chamber and replace with probe strip buffer (or use
the buffer already loaded), then heat the chamber to the probe
strip-off temperature (60-90 °C). A high-pH buffer may be used in
the strip-off step to reduce the strip-off temperature. Wait for the
specified time.
[0249] Remove Buffer
[0250] Start next cycle with next decoding probe pool in set
Specific solutions:
Hybridization Chamber
[0251] Currently we use a flow cell for 1'' square, 170 micrometer
thick coverslips that have been derivatized and activated to bind
nano-balls. The cell encloses the "array" by sandwiching the glass
and a gasket between two planes. One plane has an opening of
sufficient size to permit imaging, and an indexing pocket for the
coverslip. The other plane has an indexing pocket for the gasket,
fluid ports, and a temperature control system. One fluid port is
connected to a syringe pump, which "pulls" or "pushes" fluid from
the flow cell; the other port is connected to a funnel-like mixing
chamber. The chamber, in turn, is equipped with a liquid level
sensor. The solutions are dispensed into the funnel, mixed if
needed, then drawn into the flow cell. When the level sensor reads
air in the funnel's connection to the flow cell, the pump is
reversed a known amount to back the fluid up to the funnel. This
prevents air from entering the flow cell. This system has worked
well for the coverslip-sized samples and may be used in modified
form for the larger substrates.
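The level-sensor interlock can be reasoned about with the toy model below. It assumes (our assumption, not the text's) that the pump moves in fixed steps, that the sensor is checked after each step, and that at most one partial step of air is drawn before the pump stops. Under those assumptions, a reverse stroke at least as large as one pump step always returns all of the air to the funnel.

```python
def draw_into_flow_cell(funnel_ul, step_ul, backoff_ul):
    """Simulate the funnel-to-flow-cell transfer with level-sensor backoff.

    The syringe pump pulls in fixed steps and the level sensor on the funnel
    outlet is read after every step. If the funnel runs dry part-way through
    a step, the remainder of that step pulls air into the connecting line.
    Once the sensor reads air, the pump reverses by backoff_ul, pushing the
    air (and then some liquid) back up toward the funnel.

    Returns (liquid volume left in the line/flow cell, air left in the line).
    """
    if step_ul <= 0:
        raise ValueError("pump step must be positive")
    liquid = 0.0
    air = 0.0
    while True:
        take = min(step_ul, funnel_ul)
        funnel_ul -= take
        liquid += take
        air += step_ul - take        # air drawn after the funnel empties
        if funnel_ul <= 0:           # sensor now reads air
            break
    air_returned = min(backoff_ul, air)
    air -= air_returned
    liquid -= backoff_ul - air_returned  # any extra backoff returns liquid
    return liquid, air


print(draw_into_flow_cell(100, 30, 30))  # (90.0, 0.0)
print(draw_into_flow_cell(100, 30, 10))  # (100.0, 10.0)
```

The second call shows the failure mode the backoff is meant to prevent: a reverse stroke smaller than the air drawn in the last step leaves a bubble in the line.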
[0252] The substrate may be sectioned off and divided into strips
to accommodate fluid flow/capillary effects caused by sandwiching.
The substrate may be made of thicker glass to resist flexing in the
chamber, reducing reliance on autofocus. The substrate may be
housed in an "open air"/"open face" chamber to promote even flow of
the buffers over the substrate by eliminating capillary flow
effects.
Imaging/Imaging Speed
[0253] Currently imaging is accomplished with a 100× objective
using TIRF or epi-illumination and a 1.3-megapixel Hamamatsu
ORCA-ER-AG camera on a Zeiss Axiovert 200. This configuration
currently images nano-balls bound randomly to a substrate
(non-ordered array). Imaging speed will be improved by decreasing
the objective magnification power, using grid patterned arrays and
increasing the number of pixels of data collected in each
image.
[0254] We propose using up to four or more cameras, preferably in
the 10-16 megapixel range. Larger pixel count cameras, such as the
81-megapixel CCD595 from Fairchild Imaging, may be used if and when
they are cost effective. We may also use multiple band-pass filters
and dichroic mirrors to collect pixel data across up to four or
more emission spectra. To compensate for the lower light-collecting
power of the decreased-magnification objective, we will increase
the power of the excitation light source. Currently, the imaging
system is idle while the samples are being hybridized/reacted. To
increase throughput, one or more chambers will be assayed while one
or more chambers are being imaged. Because
the probing of arrays can be non-sequential, more than one imaging
system can be used to collect data from a set of arrays, further
decreasing assay time.
[0255] During the imaging process, the substrate must remain in
focus. Some key factors in maintaining focus are the flatness of
the substrate, orthogonality of the substrate to the focus plane,
and mechanical forces on the substrate that may deform it.
Substrate flatness can be well controlled; glass plates with better
than 1/4-wave flatness are readily obtained. Uneven mechanical
forces on the substrate can be minimized through proper design of
the hybridization chamber. Orthogonality to the focus plane can be
achieved by a well-adjusted, high-precision stage. Even when all
these issues are addressed, it is likely that some autofocus
methodology will have to be used during substrate imaging. Autofocus
routines generally take additional time to run, so it is desirable
to run them only if necessary. After each image is acquired, it
will be analyzed using a fast algorithm to determine if the image
is in focus. If the image is out of focus, the autofocus routine
will run. It will then store the objective's Z position information
to be used upon return to that section of that array during the
next imaging cycle. By mapping the objective's Z position at various
locations on the substrate, we will reduce the time required for
substrate image acquisition.
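A minimal sketch of the "check focus, autofocus only on failure, remember Z" loop follows. The text does not name its fast focus test; the variance-of-Laplacian metric used here is a common choice that we substitute as an assumption, the threshold would have to be calibrated on real images, and `autofocus` is a hypothetical callable standing in for the microscope's own routine.

```python
def sharpness(image):
    """Focus metric: variance of a 4-neighbour Laplacian (higher = sharper)."""
    rows, cols = len(image), len(image[0])
    lap = [
        image[r - 1][c] + image[r + 1][c] + image[r][c - 1] + image[r][c + 1]
        - 4 * image[r][c]
        for r in range(1, rows - 1)
        for c in range(1, cols - 1)
    ]
    mean = sum(lap) / len(lap)
    return sum((v - mean) ** 2 for v in lap) / len(lap)


class FocusMap:
    """Remembers the objective Z found by autofocus for each array section."""

    def __init__(self, autofocus, threshold):
        self.autofocus = autofocus  # hypothetical callable: (array_id, section) -> z
        self.threshold = threshold
        self.z = {}                 # (array_id, section) -> stored objective Z

    def z_for(self, array_id, section, default_z):
        """Z to move to before imaging this section in the next cycle."""
        return self.z.get((array_id, section), default_z)

    def after_image(self, array_id, section, image):
        """Run autofocus only if the image fails the focus test."""
        if sharpness(image) >= self.threshold:
            return False
        self.z[(array_id, section)] = self.autofocus(array_id, section)
        return True
```

In use, the stage would move to `z_for(...)` before each image and call `after_image(...)` once the image is acquired, so autofocus runs only for sections whose stored Z has drifted.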
[0256] Illumination:
[0257] The current system uses a Zeiss TIRF slider coupled to an
80 milliwatt 532 nm solid-state laser. The slider illuminates the
substrate through the objective at the correct TIRF illumination
angle. TIRF can also be accomplished without the use of the
objective by illuminating the substrate through a prism optically
coupled to the substrate. Planar waveguides can also be used to
implement TIRF on the substrate. Epi-illumination can also be
employed. The light source can be rastered, spread beam, coherent,
or incoherent, and can originate from a single- or multi-spectrum
source.
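The "correct TIRF illumination angle" is any angle above the critical angle at the glass/buffer interface, and the resulting evanescent field decays over roughly a hundred nanometres. The sketch below uses the standard formulas θc = arcsin(n2/n1) and d = λ / (4π·sqrt(n1² sin²θ − n2²)). The refractive indices (1.518 for coverslip glass, 1.33 for aqueous buffer) and the 65° example angle are typical values we assume, not figures from the text. For through-objective TIRF the same condition implies an objective NA above about 1.33.

```python
import math

N_GLASS = 1.518   # assumed coverslip refractive index
N_BUFFER = 1.33   # assumed aqueous buffer refractive index


def critical_angle_deg(n1=N_GLASS, n2=N_BUFFER):
    """Incidence angle (in the glass) above which light is totally reflected."""
    return math.degrees(math.asin(n2 / n1))


def penetration_depth_nm(wavelength_nm, theta_deg, n1=N_GLASS, n2=N_BUFFER):
    """1/e depth of the evanescent intensity for incidence angle theta_deg."""
    s = n1 * math.sin(math.radians(theta_deg))
    if s <= n2:
        raise ValueError("angle is at or below the critical angle: no TIR")
    return wavelength_nm / (4 * math.pi * math.sqrt(s * s - n2 * n2))


print(round(critical_angle_deg(), 1))          # 61.2
print(round(penetration_depth_nm(532, 65.0)))  # 120
```

Steeper angles give a shallower field, which concentrates excitation on nano-balls bound at the surface and reduces background from labeled probe still in solution.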
[0258] Our current microscope can do standard epi illumination on
the entire plate substrate. Our current system successfully detects
hybridization on DNA nano-balls with both TIRF and epi
fluorescence.
[0259] A preferred embodiment for the imaging system will contain a
20× lens with a 1.25 mm field of view, with detection being
accomplished with a 10 megapixel camera. Such a system would image
approximately 1.5 million nano-balls attached to the patterned array at 1
micron pitch. Under this configuration there are approximately 6.4
pixels per nano-ball. The number of pixels per nano-ball can be
adjusted by increasing or decreasing the field of view of the
objective. For example a 1 mm field of view would yield a value of
10 pixels per nano-ball and a 2 mm field of view would yield a
value of 2.5 pixels per nanoball. The field of view will be
adjusted relative to the magnification and NA of the objective to
yield the lowest pixel count per nano-ball that is still capable of
being resolved by the optics, and image analysis software.
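The pixels-per-nano-ball figures above can be reproduced as follows, assuming (as those figures imply) a square field of view fully covered by a 10-megapixel sensor and one nano-ball per 1 micrometer grid site.

```python
def pixels_per_site(camera_pixels, fov_mm, site_pitch_um=1.0):
    """Pixels available per grid site for a square field of view.

    Assumes the camera's pixels are spread over the square field of view and
    that every grid site within it holds one nano-ball.
    """
    sites = (fov_mm * 1000.0 / site_pitch_um) ** 2
    return sites, camera_pixels / sites


for fov in (1.0, 1.25, 2.0):
    sites, px = pixels_per_site(10_000_000, fov)
    print(f"{fov} mm field: {sites:,.0f} sites, {px:.1f} pixels per nano-ball")
```

The 1.25 mm case gives 1,562,500 sites (the "approximately 1.5 million" above) and 6.4 pixels per nano-ball; the 1 mm and 2 mm cases give 10 and 2.5 pixels, as stated.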
[0260] Robot Gantry:
[0261] Our current 3 axis robotic gantry pipetting system can be
scaled up to serve more than one microscope. Currently the system
has one pipette head. If the number of chambers becomes too great
for a single pipetter to service efficiently, multiple pipetting
channels can be added to the pipetter head, each channel individually
accessible via a simple linear extension system, increasing robot
efficiency by increasing the service potential for each robot move.
It may be more efficient or cost effective to implement a
non-gantry style robot, such as a SCARA-style robot, to perform
certain operations.
[0262] Plate Stage:
[0263] A larger than standard plate stage may be needed to image
more than one plate sized substrate per microscope. The plate stage
should be designed for rigidity, positional accuracy, and
repeatability.
[0264] 1.3. Preparation of Sample Targets for saDNA Probe
Arrays
[0265] The DNA nano-ball probe arrays can be used for sequence
identification (for example, of genes, exons, promoters, diagnostic
sites, SNPs, and mutations) in amplified or possibly non-amplified target
samples. For the detection and characterization of viral and
bacterial DNA collected from clinical or pre-symptomatic isolates
there may be the requirement to minimize contaminating human
genomic DNA. The reduction of human DNA contamination may be
achieved by using affinity columns or beads directed to Alu or LINE
repeats in the human genome. Sample DNA of 1-10 kb length could be
hybridized to these affinity columns and the un-bound fraction
collected and fragmented to the final preferred length before
amplification or direct hybridization to the nano-ball probe
arrays. It may be important to quantify the amount of isolated
DNA.
[0266] Under conditions in which the DNA sample is relatively pure,
"whole genome" methods of amplification could be employed. One
approach could be to form single stranded DNA circles (50-500 bases
in length) using a 20-100 base adapter and amplify by RCR 100-1000
fold in a linear amplification from the original copy. Concatemers
can then be fragmented by a restriction enzyme after hybridizing a
complementary oligonucleotide to the adapter such that a
double-stranded cutting site is formed.
[0267] It may be beneficial to randomly fragment the target DNA to
about 50-200 bases using DNAse at a pre-tested enzyme dilution for
a specific incubation time and depending upon the amount of DNA in
the sample. Fragmentation has the benefit of improving
hybridization kinetics and reducing repulsive forces between the
negatively charged molecules. It is also a more efficient use of
sample DNA and less likely to form chains of two DNA fragments from
solution that may cause false signals. It may also be beneficial to
develop an internal control target that reports the degree of
fragmentation, for example through the separation of quenching dyes.
[0268] Each target DNA that is prepared by RCR will be a single
stranded concatemer of sample target and adapter. The adapter
sequence portion of the RCR concatemers allows for the
hybridization of labeled dendrimers in which a single arm of the
dendrimer is complementary to a portion of the adapter used to form
DNA circles. Non-amplified DNA may be labeled by poly-C or poly-A
tailing using terminal transferase and then hybridized to
dendrimers with a single complementary arm. An alternative may be
to ligate to each end of the single-stranded DNA a sequence
complementary to a different dendrimer arm. Longer DNA with multiple
attached dendrimers may be ligated to each end, or other standard
labeling procedures may be used. The excess label may be hybridized
or ligated to a biotinylated oligonucleotide and removed with
streptavidin-coated beads.
[0269] A target DNA sample may also be prepared by utilizing the
detection DNA nano-ball array itself for sample isolation. In this
procedure the target DNA would be collected in a small volume, then
fragmented and denatured. The target DNA is then hybridized to an
array of nano-ball sequences complementary to those desired from
the sample and any excess un-hybridized DNA would then be washed
away. Captured target DNA could be amplified on the surface of the
array by covalently ligating fragmented concatemers with the
capture oligonucleotide as a bridging support, followed by RCR.
Alternatively, an adapter with a tag or label can be ligated to the
hybridized DNA and detected.
[0270] RNA may also serve as the target, with or without conversion
to DNA. Sample DNA or RNA amplification may not be required due to:
1) extensive miniaturization, low reaction volume and effective
reaction mixing to allow DNA or RNA fragments to find complementary
nano-ball probes; 2) longer DNA detector molecules in the array
enable efficient and specific hybridization in complex mixtures.
This also allows the use of bulky signal amplification molecules;
3) the use of multiple different DNA fragments for each DNA
region of each sample reduces experimental noise (it also allows
finding specific DNA fragments for detecting a given gene or
genomic region with no cross-talk to other DNA in the sample); 4)
signal amplification for example with the application of dendrimers
or concatenated detector DNA as labeling methods for the
target.
[0271] 1.3.1. Hybridization and Data Analysis
[0272] The longer lengths of the DNA nano-ball probes allow for
stringent washing conditions, which can improve the specificity of
probe-target hybridization. Temperature gradients and obtaining
several measurements per spot in the wash steps could also be
employed to increase specificity, and this may be important for
shorter probes to detect single or a few base changes. Detection of
hybrid formation without washing excess target from the reaction
chamber (homogeneous assay) is also an option by focusing the CCD
on the surface. This could be especially applicable if a longer
hybridization is performed to deplete labeled DNA from
solution.
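One way to turn several measurements per spot during a temperature-gradient wash into a specificity call is to estimate, for each spot, the temperature at which its signal has dropped by half; hybrids with mismatches would be expected to melt at a lower temperature than perfect matches. The sketch below assumes monotonically increasing wash temperatures, and the readings are invented for illustration, not data from the text.

```python
def melt_midpoint(temps_c, signals):
    """Temperature at which a spot's signal falls to half its first reading.

    temps_c and signals are the wash temperatures and the spot intensities
    measured at each step of an increasing-stringency wash series.
    Uses linear interpolation between the two bracketing measurements.
    """
    half = signals[0] / 2.0
    pairs = list(zip(temps_c, signals))
    for (t0, s0), (t1, s1) in zip(pairs, pairs[1:]):
        if s0 >= half > s1:
            return t0 + (s0 - half) * (t1 - t0) / (s0 - s1)
    return None  # never dropped below half within the measured range


temps = [30, 40, 50, 60, 70]
perfect = [1000, 980, 900, 600, 100]   # made-up illustrative readings
mismatch = [1000, 800, 300, 60, 10]
print(melt_midpoint(temps, perfect), melt_midpoint(temps, mismatch))  # 62.0 46.0
```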
[0273] One or more images of an array will be generated preferably
using a CCD camera. Raw signals will be determined by image
analysis and assigned to each spot and associated with provided
information about the identity of the detector molecule at each spot. Empty
or other control dots may be used to assure proper assignments of
spot signals to detector molecules.
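A minimal sketch of that bookkeeping follows, assuming the spot grid's origin and pitch in pixels are already known from image registration. Signals are integrated in a small box at each expected site, designed-empty control dots are checked to be dark (a grid shifted by one row or column would usually light them up), and each signal is keyed to the detector molecule decoded at that spot. The box size, the darkness threshold, and the dictionary-based identity map are illustrative assumptions.

```python
def spot_signals(image, origin, pitch_px, n_rows, n_cols, radius=1):
    """Sum pixel intensities in a (2*radius+1)^2 box at each expected site."""
    r0, c0 = origin
    signals = []
    for i in range(n_rows):
        row = []
        for j in range(n_cols):
            r = round(r0 + i * pitch_px)
            c = round(c0 + j * pitch_px)
            row.append(sum(
                image[y][x]
                for y in range(r - radius, r + radius + 1)
                for x in range(c - radius, c + radius + 1)
            ))
        signals.append(row)
    return signals


def grid_is_aligned(signals, empty_spots, max_empty_signal):
    """Control check: every designed-empty spot must read (near) dark."""
    return all(signals[i][j] <= max_empty_signal for i, j in empty_spots)


def assign(signals, identity_map):
    """Map each decoded detector molecule to the raw signal at its spot."""
    return {
        identity_map[(i, j)]: value
        for i, row in enumerate(signals)
        for j, value in enumerate(row)
        if (i, j) in identity_map
    }


# Toy 40 x 40 image: 9 x 9 grid, 4-pixel pitch, two designed-empty spots.
empty = {(0, 0), (4, 4)}
image = [[0] * 40 for _ in range(40)]
for i in range(9):
    for j in range(9):
        if (i, j) not in empty:
            image[2 + 4 * i][2 + 4 * j] = 100

good = spot_signals(image, (2, 2), 4, 9, 9)
shifted = spot_signals(image, (6, 2), 4, 9, 9)  # grid off by one row
print(grid_is_aligned(good, empty, 10), grid_is_aligned(shifted, empty, 10))
print(assign(good, {(1, 1): "probe_A", (0, 0): "empty_control"}))
```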
[0274] 1.4. Applications
[0275] Some applications of the long DNA nano-ball probes
include:
[0276] 1. Detection of gene duplications and deletions
[0277] 2. Detection of horizontal transfer of DNA (tumor samples,
individual variation, similar species)
[0278] 3. Gene expression analysis
[0279] 4. Alternative splicing characterization
[0280] 5. Pathogen detection and quantification in which there is
no need for sequence specific primers.
[0281] 6. Enrichment or elimination columns, for example Alu-binding
columns for human DNA; alternatively, a sample can be split in two,
one half labeled with biotin, and the halves rehybridized and
collected. More abundant sequences will bind rapidly to the
biotinylated strands and will be removed. Strain-specific genes
could be selected by removing common sequences through
hybridization to one or more initially selected strains.
[0282] 7. Environmental screening/quantification for microbes
[0283] 8. Protein binding assays
[0284] 9. A SNP-detection chip in which 20-mers with mismatches are
selected between multiple pairs of individual genomes. This could
be performed in a 96-well plate with 10 million SNPs assayed.
Genome-specific tiling chips could be created through the use of
random mutagenesis. Low-accuracy polymerase enzymes could be used
to incorporate one mutation every 10-20 bases, and arrays of
10-25-mers can be prepared with the total sequence covered by the
nano-ball probes being 40 to 400 times the length of the genome.
[0285] 10. DNA assembly by ligation or multiple site specific
mutagenesis
[0286] 11. Programmable nano-wiring support
[0287] In this application the array with different identified DNA
per spot is used to create programmable connections and nano-wires
between neighboring spots by providing a bridging oligonucleotide.
..PPPPPPPPPPPPPPPPPPPP SSSSSSSSSSSSSSS.... PPPPPPPPPPPBBBBB ....
BBBBSSS BBBBSSS.
[0288] By designing different lengths for the P and S segments in
the above example, controllable switches can be generated using
temperature as the trigger. The connector may be designed to stay
in one of the two connected spots for reconnecting.
[0289] Advantages of nano-ball probe arrays vs. in-situ prepared
probe arrays include:
--Longer probe lengths (50-5000 bases) allow for increased specificity and sensitivity.
--Ultra-high density of probe nano-balls allows for higher sensitivity and lower assay cost.
--Low production cost compared with existing array technologies.
[0290] Very high probe accuracy, even for long probes.
[0291] 10-100× higher density of probes per surface than existing array technologies.
--The three-dimensional nature of the nano-ball improves hybridization access.
[0292] Full flexibility in changing and upgrading content, similar to mask-less in-situ synthesis.
The use of non-identical arrays may result in some targets not being
represented in the array, and many spots with the same DNA may be
needed: this loses some of the advantage of high densities, but the
accuracy of each measurement increases. Also, there may be a need
for specific priming to make arrays representing only selected DNA
regions of a genome.
[0293] We have described two platforms for sequence quantification.
In the rSBH platform target (e.g. test sample) nano-balls are
arrayed on the surface and can be quantitated by counting the
occurrence of specific nano-balls. In the saDNA platform, nano-ball
probes attached to the surface are used to quantitate solution-phase
labeled target through relative intensity levels of the label at
the nano-ball. The advantages of saDNA-based quantitation versus
rSBH-based quantitation include: 1) a duplicated gene will produce
on average a two-fold stronger signal on 10-100 representative DNA
nano-ball probes. In contrast, for rSBH we would need to count a
sufficient number of nano-balls to determine whether there is truly
overrepresentation of one sequence over other sequences.
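The averaging advantage described above can be made concrete: with 10-100 probes per gene, per-probe noise in the intensity ratio shrinks roughly as 1/sqrt(n). The sketch assumes a matched reference sample hybridized to an identical array, which the text does not specify; a mean ratio near 2 with a small standard error would correspond to the two-fold signal of a duplicated gene.

```python
import math
from statistics import mean, stdev


def copy_ratio(test, reference):
    """Average test/reference intensity ratio over the probes for one region.

    Works in log2 space so that gains and losses are symmetric, and returns
    (ratio, standard error of the mean log2 ratio).
    """
    logs = [math.log2(t / r) for t, r in zip(test, reference)]
    se = stdev(logs) / math.sqrt(len(logs)) if len(logs) > 1 else float("inf")
    return 2 ** mean(logs), se


reference = [120.0, 95.0, 210.0, 150.0, 80.0, 130.0]  # illustrative values
duplicated = [2.1 * x for x in reference[:3]] + [1.9 * x for x in reference[3:]]
print(copy_ratio(duplicated, reference))
```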
[0294] A further advantage of the saDNA platform over arrays of DNA
from the test sample is that only 10-100 million nano-balls need be
scored instead of 1-10 billion for quantitative representation of
all informative fragments. One limitation of saDNA however is that
it may be difficult to identify low frequency targets such as gene
duplications or deletions in one of every 10-10,000 tumor
cells.
[0295] Example: Duplications and Deletions in Tumor Samples:
100 million probes (perhaps only 30 million after removing repeats)
in one 96-well plate × 100-1000 bases each: on average one probe
every 30 bases; 1000-base deletions can be detected. For full
sequencing (covering every part with sufficient redundancy), 10×
more DNA and 10× more probes are needed.
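The spacing figure in this example is the genome size divided by the number of probes. Assuming a human genome of about 3 billion bases (our assumption; the text gives no genome size), 100 million probes land on average every 30 bases, so a 1000-base deletion removes the signal from dozens of probes. Even after repeat removal (30 million probes), about ten probes still fall inside such a deletion.

```python
GENOME_BP = 3.0e9  # approximate human genome size; an assumption, not from the text


def probe_spacing_bp(n_probes, genome_bp=GENOME_BP):
    """Average distance between probe positions if probes tile the genome evenly."""
    return genome_bp / n_probes


def probes_hit_by_deletion(deletion_bp, n_probes, genome_bp=GENOME_BP):
    """Expected number of probes falling inside a deletion of the given size."""
    return deletion_bp / probe_spacing_bp(n_probes, genome_bp)


for n in (100e6, 30e6):
    print(f"{n / 1e6:.0f}M probes: one every {probe_spacing_bp(n):.0f} bp, "
          f"~{probes_hit_by_deletion(1000, n):.0f} probes per 1 kb deletion")
```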
[0296] 1.4.1. Self-Assembled Arrays of Peptides, Proteins or Other
Polymers
[0297] RCR products of synthetic or natural DNA fragments of about
30-3000 bases, initiated with a primer that has an RNA polymerase
promoter extension, are used to produce long RNA and, by in vitro
translation, a protein with multiple copies of the same peptide
separated by a spacer peptide encoded by the adapter (used for
forming DNA circles). The resulting protein of 100 to 10,000 amino
acids may fold, possibly with folding initiated by the spacer
peptide, to form several to hundreds of almost independently folded
unit peptides. Each peptide may form several domains for binding
different molecules, such as antibodies, oligopeptides, single- or
double-stranded oligonucleotides, or other chemical compounds, that
can be used to identify a given peptide.
[0298] These protein balls may be attached to binding sites of a
support having a peptide or other molecule that binds to the spacer
peptide, or by using other general protein-binding chemistry. The
small size of active binding sites surrounded by non-binding support
allows only one (the first to bind) protein nano-ball to attach,
either by saturating all available binding molecules in the binding
site or by physically preventing other protein nano-balls from
interacting with the same binding site. To minimize double or
multiple occupancy, protein nano-balls smaller than a given size may
be removed by size separation or saturation of the spacer protein.
2. EXAMPLES OF SPECIFIC PRODUCTS AND PROCESSES AND PROCEDURES FOR
MAKING THEM
[0299] Preparation of DNA detection and quantification arrays
comprising:
[0300] providing a mixture of DNA fragments of 10, 20, 50, 100 or more
bases and shorter than 25, or 50, or 100, or 500, or 1000, or 2000,
or 5000, or 10,000 bases from a source DNA;
[0301] forming DNA arrays by attaching concatemers of the same
fragment or by in-situ amplification of a single DNA molecule;
[0302] identifying the DNA in each spot by hybridization signature
or partial or complete sequence determination.
[0303] Dependent claim: RCR-based formation of DNA concatemers with
or without sequence complementary to the support-bound capture
oligonucleotide.
[0304] Dependent claim: Utilize a support with a grid of regions
with DNA capture chemistry separated by surface without DNA capture
chemistry, each region being 0.1-10 micrometers with a
center-to-center distance of about 0.2 to 20 micrometers.
[0305] Dependent claim: Source DNA is all sequence variants of a
given length of 8 to 20 bases.
[0306] Dependent claim: Identity of nano-ball sequence is determined
by ligation of two adapter-dependent or adapter-independent
oligonucleotides, using individual probes or pools of probes with 0
to about 8 informative bases.
[0307] Highly multiplexed DNA detection and quantification method
consisting of:
[0308] providing a DNA array containing >100K, >1M, or >10M DNA
spots identified by hybridization signature or partial or complete
sequence;
[0309] hybridizing a target sample comprising labeled or tagged DNA
fragments (or target able to be labeled or tagged) under conditions
allowing the formation of complementary DNA hybrids;
*detecting bound labels/tags or bound DNA in array spots;
[0310] analyzing data to detect and quantify DNA molecules in the
sample substantially complementary to one or more DNAs on the
array.
[0311] Dependent claim: DNA arrays prepared using RCR-based
formation of DNA concatemers with or without sequence complementary
to the support-bound capture oligo.
[0312] Dependent claim: Add a washing step before the detecting step
to remove non-hybridized DNA.
Dependent claim: Add a stringent washing step before the detecting
step to remove non-hybridized DNA and DNA hybridized to targets with
a larger number of mismatches.
[0313] Dependent claim: Perform multiple detection steps during the
increased-stringency (for example, higher temperature or higher pH)
washes.
[0314] Dependent claims: determining gene expression and/or
alternative splicing; gene deletion or duplication; pathogen
detection, quantification and characterization; SNP detection;
mutation discovery; microbe detection and quantification in natural
sources; DNA sequencing; industrial use in agriculture, food
pathogens, medical diagnostics, cancer samples.
[0315] Labeling or tagging of sample molecules is done after binding
them to the detector molecules in the array.
[0316] A support with spots of DNA/RNA with natural or analog bases
in a grid or random spot array, with informative single-stranded DNA
longer than 15, or 25, or 50, or 75, or 100, or 125, or 150, or 200,
or 250, or 300, or 400, or 500, or 750, or 1000 bases and more than
10,000 or 100,000 or 1 million spots per mm² containing multiple
copies of the same DNA per spot, wherein more than 1000 or 10,000 or
100,000 different DNAs are present in the array and which DNA is at
which spot is determined after DNA attachment.
[0317] Dependent claim: more than 50, 60, 70, 80, 90 or 95% of spots
in the grid have a single informative DNA species, excluding errors
produced by amplification.
[0318] Dependent claim: a plate with 2, 4, 6, 8, 10, 12, 16, 24, 32,
48, 64, 96, 192, 384 or more such DNA arrays, wherein in most cases
the same DNA is in different spots in the individual arrays.
[0319] Dependent claim: array containing DNA fragments from multiple
(2-2000, 10-2000, 20-2000, 50-2000, 100-2000, 100-10,000,
500-10,000) species.
[0320] Dependent claims: array containing DNA fragments that have
SNP or other differences between individuals or species.
[0321] Dependent claim for all above product claims: DNA copies per
spot produced by RCR before attachment.
[0322] Dependent claim for all above product claims: DNA isolated
from natural sources.
*Identity or sequence of DNA/RNA or other detector molecule in
usable spots is inferred by matching hybridization or other binding
signature or partial or complete polymer sequence to a reference
database of signatures or sequences.
[0323] A support with spots of protein, peptide or other polymer
detector molecules in a grid or random spot array, with informative
peptide or other polymer longer than 15, or 25, or 50, or 75, or
100, or 125, or 150, or 200, or 250, or 300, or 400, or 500, or 750,
or 1000 and more amino acids or other monomers, and more than 10,000
or 100,000 or 1 million spots per mm² containing multiple copies of
the same peptide or other polymer per spot, wherein more than 1000
or 10,000 or 100,000 different peptides or other polymers are
present in the array and which peptide or other polymer is at which
spot is determined after peptide or other polymer attachment to the
support.
*Identification of which peptide or other polymer is present in a
spot by generating a binding signature using antibodies,
oligopeptides, oligonucleotides, or sets of compounds.
*Binding signatures developed by experimental testing of known
peptides or other polymers in tubes, wells or spotted arrays with a
predefined spot for each tested peptide or other polymer.
*Expected binding signatures developed by computing binding
properties of each expected peptide (or other polymer) with each
binder molecule.
[0324] 2.1. Examples of DNA Nano-Ball Array Preparation and DNA
Identification
[0325] 2.1.1. DNA Targets on the Random Array Derived from
Different Sequences can be Specifically Identified by
Sequence-Specific Probes
[0326] PCR products from diagnostic regions of Bacillus anthracis
and Yersinia pestis were converted into single-stranded DNA and
ligated to a universal adaptor. These two samples were then mixed
and replicated together using the rolling circle replication
method and deposited onto the random array. Successive
hybridization with amplicon specific probes showed that each spot
on the array corresponded uniquely to either one of the 2 amplicon
sequences and they can be identified specifically with the probes.
This result demonstrates sensitivity and specificity of identifying
DNA present in submicron size spots created by attaching DNA
nano-balls having about 100-1000 copies of a DNA fragment generated
by RCR reaction.
[0327] Amplicons Ba3 (a 155 bp amplicon, sequence
5'-TCCCAATACATATGAGCGATTCGCCTTTATAAACGACGTATTCCTTTGAACTCGTTATGACACTCATTACTCAACTCCCCTTTTCTACTAAAATAGCGTTTTTGTTTGGTTTTTTTCTTCACATAATCCGTCCTATTTGATTTTTACATACCACC-3',
from B. anthracis) and Yp3 (a 275 bp amplicon, sequence
5'-TGTAGCCGCTAAGCACTACCATCCCCTCAAGGTTATTGACGGTATCGAGTAGGGTTAGGTGGGCATCATTGTCCATTTCATGGCGGTAATATCGGGATGAGATAACGCGGGTGTCATGGACGTATGGCGGGTCAACAAAATGAAGCGTTGAAACTGTGTCATGGTCTAACATGCATTGGACGGCATCACGATTCTCTACCAAAACGCCCTCGAATCGCTGGCCAACTGCTGCCAAGTTTTCAGGCATCCTTGCCCAAAGGTGTTGAGCTGTTGCC-3',
from Y. pestis) were amplified using standard PCR techniques with
PCR primers Ba3F (5'-TCCCAATACATATGAGCGATTCGCC-3') and Ba3R
(5'-GGTGGTATGTAAAAATCAAATAGGA-3') for Ba3, and Yp3F
(5'-TGTAGCCGCTAAGCACTACCATCC-3') and Yp3R
(5'-GGCAACAGCTCAACACCTTTGG-3') for Yp3. Ba3R and Yp3R were
phosphorylated, therefore the complementary strands of the PCR
products were phosphorylated at the 5' end. Single stranded form of
the PCR products was generated by degradation of the phosphorylated
strand using lambda exonuclease (Epicenter). The 5' end of the
remaining strand was phosphorylated using T4 DNA polynucleotide
kinase (Epicenter) to allow ligation to the universal adaptor. The
universal adaptor (sequence 5'-TATCATCTACTGCACTGACCGGATGTTAGGAAGAC
AAAAGGAAGCTGAGGGTCACATTAACG GAC-3') was ligated using T4 DNA ligase
(Epicenter) to the 5' end of the target molecule assisted by a
template oligonucleotide (Ba3-5end 5'-ATTGGGAGTCCGTTAATGTGAC-3' for
amplicon Ba3, Yp3-5end 5'-GGCTACAGTCCGTTAATGTGAC-3' for amplicon
Yp3) specifically complementary to the 5 end of the targets and 3'
end of the universal adaptor. The adaptor ligated targets were then
circularized using bridging oligonucleotides (Ba3-3end
5'-AGATGATAGGTGGTAT-3' for amplicon Ba3, Yp3-3end
5'-AGATGATAGGCAACAG-3' for amplicon Yp3) with bases complementary
to the adaptor and to the 3' end-of the targets. Linear DNA
molecules were removed by treating with exonuclease I (New England
Biolabs) at 37.degree. C. for 4 hours under standard reaction
conditions. Rolling circle replication (RCR) products (DNA
nano-balls) were generated by mixing the single-stranded samples of
Ba3 and Yp3 together, and using Phi29 polymerase (New England
Biolabs) to replicate around the circularized adaptor-target
molecules with the bridging oligos as the initiating primers.
Specifically, 0.1 to 0.5 pmol of the circularized DNA was incubated
with 5 units of Phi29 DNA polymerase, 2 pmol of the bridging
oligos, 0.4 mM dNTP, 0.2 mg/ml BSA and 1.times. standard Phi29 DNA
polymerase buffer at 30.degree. C. for 2 hours, followed by 1 hour
incubation at 55.degree. C. with 1 ug/ul proteinase K. The RCR
products were captured on the glass slide via the capture oligo
(sequence 5'-GGATGTTAGGAAGACAAAAGGAAGCTGAGG-3') attached to
derivatized glass coverslips (Corning) that is complementary to the
universal adaptor sequence.
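The oligonucleotides in [0327] can be cross-checked against the amplicon and adaptor sequences. The following is a minimal Python sketch, not a procedure from the source: it assumes that each template or bridging oligo is simply the reverse complement of the two ends it joins, and the 15+7 nt and 8+8 nt splits are inferred from the oligo lengths rather than stated in the text. The names `revcomp` and `junction_oligo` are illustrative.

```python
"""Consistency checks for the Ba3/Yp3 oligonucleotide design in [0327].

Sequences are copied from the text. The junction rule (oligo = reverse
complement of the two ends it bridges) is an illustrative assumption.
"""

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written with A/C/G/T only."""
    return seq.translate(COMPLEMENT)[::-1]


def junction_oligo(left: str, right: str, n_left: int, n_right: int) -> str:
    """Oligo that base-pairs across the ligation junction left|right:
    the reverse complement of left[-n_left:] + right[:n_right]."""
    return revcomp(left[-n_left:] + right[:n_right])


ADAPTOR = "TATCATCTACTGCACTGACCGGATGTTAGGAAGACAAAAGGAAGCTGAGGGTCACATTAACGGAC"
CAPTURE = "GGATGTTAGGAAGACAAAAGGAAGCTGAGG"

BA3 = (
    "TCCCAATACATATGAGCGATTCGCCTTTAT"
    "AAACGACGTATTCCTTTGAACTCGTTATGACACTCATTACTCAACTCCCCTTTTCTACTAAAATAGCGTTTTT"
    "GTTT"
    "GGTTTTTTTCTTCACATAATCCGTCCTATTTGATTTTTACATACCACC"
)
YP3 = (
    "TGTAGCCGCTAAGCACTACCATCCCCTCAAGGTTATTGACGGTATCGAGTAG"
    "GGTTAGGTGGGCATCATTGTCCATTTCATGGCGGTAATATCGGGATGAGATAACGCGGGTGTCATGGACGTAT"
    "GG"
    "CGGGTCAACAAAATGAAGCGTTGAAACTGTGTCATGGTCTAACATGCATTGGACGGCATCACGATTCTCTA"
    "CCAAA"
    "ACGCCCTCGAATCGCTGGCCAACTGCTGCCAAGTTTTCAGGCATCCTTGCCCAAAGGTGTTGAGCTGT"
    "TGCC"
)

checks = {
    "Ba3 is 155 bp": len(BA3) == 155,
    "Yp3 is 275 bp": len(YP3) == 275,
    "Ba3F matches the Ba3 5' end": BA3.startswith("TCCCAATACATATGAGCGATTCGCC"),
    "Ba3R anneals to the Ba3 3' end": revcomp(BA3[-25:]) == "GGTGGTATGTAAAAATCAAATAGGA",
    "Yp3F matches the Yp3 5' end": YP3.startswith("TGTAGCCGCTAAGCACTACCATCC"),
    "Yp3R anneals to the Yp3 3' end": revcomp(YP3[-22:]) == "GGCAACAGCTCAACACCTTTGG",
    # Template oligos span adaptor 3' end | target 5' end (15 + 7 nt).
    "Ba3-5end": junction_oligo(ADAPTOR, BA3, 15, 7) == "ATTGGGAGTCCGTTAATGTGAC",
    "Yp3-5end": junction_oligo(ADAPTOR, YP3, 15, 7) == "GGCTACAGTCCGTTAATGTGAC",
    # Bridging oligos span target 3' end | adaptor 5' end (8 + 8 nt).
    "Ba3-3end": junction_oligo(BA3, ADAPTOR, 8, 8) == "AGATGATAGGTGGTAT",
    "Yp3-3end": junction_oligo(YP3, ADAPTOR, 8, 8) == "AGATGATAGGCAACAG",
    # Capture oligo and probes share the circle's sense, so they pair with
    # the RCR product (the complementary strand).
    "capture oligo lies in the adaptor": CAPTURE in ADAPTOR,
    "BrPrb3 lies in the adaptor": "CATTAACGGAC" in ADAPTOR,
    "probe Ba3 lies in Ba3": "TGAGCGATTCG" in BA3,
    "probe Yp3 lies in Yp3": "GGTGTCATGGA" in YP3,
}

for name, ok in checks.items():
    print(f"{'OK  ' if ok else 'FAIL'} {name}")
```

Under this junction rule every listed oligo is reproduced exactly, which supports the reading that each bridging oligo both closes the circle and, with its free 3' end, primes RCR.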
[0328] To prepare the coverslips for attaching amine-modified
oligonucleotides, the coverslips were first cleaned in a
potassium hydroxide/ethanol solution, followed by rinsing and drying. They
were then treated with a solution of
3-aminopropyldimethylethoxysilane, acetone, and water for 45
minutes and cured in an oven at 100°C for 1 hour. As a
final step, the coverslips were treated with a solution of
p-phenylenediisothiocyanate (PDC), pyridine, and dimethylformamide
for 2 hours. The capture oligo is modified at the 5' end with an
amine group and 2 C-18 linkers. For attachment, 10 µl of the
capture oligo at 10 µM in 0.1 M NaHCO3 was spotted onto the
center of the derivatized coverslip, dried for 10 minutes in a
70°C oven and rinsed with water. To create an array of DNA
nano-balls, the RCR reaction containing the DNA nano-balls was
diluted 10-fold with 3×SSPE, and 20 µl of the dilution was then
deposited over the immobilized capture oligos on the coverslip
surface for 30 minutes in a moisture-saturated chamber. The
coverslip with the DNA nano-balls was then assembled into the
reaction chamber of the rSBH instrument and rinsed with 2 ml of
3×SSPE.
[0329] The arrayed target molecules were probed sequentially with
TAMRA-labeled oligomers: probe BrPrb3 (sequence: 5'-CATTAACGGAC-3',
specifically complementary to the universal adaptor sequence),
probe Ba3 (sequence: 5'-TGAGCGATTCG-3', specifically complementary
to the Ba3 amplicon sequence) and probe Yp3 (sequence:
5'-GGTGTCATGGA-3', specifically complementary to the Yp3 amplicon
sequence). The probes were hybridized to the array at a
concentration of 0.1 µM for 20 min in 3×SSPE at room
temperature. Excess probes were washed off with 2 ml of
3×SSPE. Images were taken with the TIRF microscope. The
probes were then stripped off with 1 ml of 3×SSPE at
80°C for 5 minutes to prepare the arrayed target molecules
for the next round of hybridization.
[0330] FIG. 10 overlays the images obtained from successive
hybridization of these 3 probes and shows that most of the
arrayed molecules that hybridized with the adaptor probe (blue
spots) hybridized to only one of the other two probes, either the
Ba3 probe (red spots) or the Yp3 probe (green spots), with very few
hybridizing to both. This specific hybridization pattern demonstrates that each
spot on the array contains only one type of sequence, either the
Ba3 amplicon or the Yp3 amplicon. It also demonstrates that the
rSBH process is able to distinguish target molecules of different
sequences deposited onto the array by using sequence-specific
probes.
[0331] 2.1.2. Decoding a Base Position in Arrayed DNA Nano-Balls
Created from a Synthetic 80-Mer Oligo with a Degenerate Base
[0332] Individual molecules of a synthetic oligo containing a
degenerate base can be divided into 4 sub-populations, each
having an A, C, G or T base at that particular position. An
array of DNA nano-balls created from this synthetic DNA will have
about 25% of spots with each of the bases. We demonstrated
successful identification of these sub-populations of DNA
nano-balls by four successive cycles of hybridization and ligation
of probe pairs, each pair specific to one of the 4 bases. The results
demonstrate the ability to determine the partial or complete sequence
of DNA present in DNA nano-balls by increasing the number of
consecutive probe cycles or by using 4 or more probes labeled with
different dyes in each cycle.
[0333] A synthetic oligo (T1A:
5'-NNNNNNNNGCATANCACGANGTCATNATCGTNCAAACGTCAGTCCAN GAATCNAGATCCAC
TTAGANT
[0334] -3') contains a degenerate base at position 32. The universal
adaptor was ligated to this oligo and the adaptor-T1A DNA was
circularized as described before. DNA nano-balls made using the
rolling circle replication (RCR) reaction on this target were
arrayed onto the random array. Because each spot on this random
array corresponds to tandemly replicated copies originating from a
single molecule of T1A, the DNA in a particular arrayed spot
contains either an A, a C, a G, or a T at the position
corresponding to position 32 of T1A. To identify these
sub-populations, a set of 4 ligation probe pairs, one specific to
each of the 4 bases, was used. A 5'-phosphorylated, 3'-TAMRA-labeled
pentamer oligo corresponding to positions 33-37 of T1A with sequence CAAAC
(probe T1A9b) was paired with one of the following hexamer oligos
corresponding to positions 27-32: ACTGTA (probe T1A9a), ACTGTC
(probe T1A10a), ACTGTG (probe T1A11a), ACTGTT (probe T1A12a). Each
of these 4 ligation probe pairs should hybridize to the A-,
C-, G- or T-containing version of T1A, respectively.
[0335] For each hybridization cycle, the probes were incubated with
the array in ligation/hybridization buffer (50 mM Tris-Cl, pH 7.8,
10% PEG, 1 mM ATP, 50 mg/L BSA, 10 mM MgCl2, 0.05 unit/µl T4 DNA
ligase (Epicenter) and 10 mM DTT) at 20°C for 5 minutes.
Excess probes were washed off with 2 ml of wash buffer (50 mM
Tris-Cl, pH 7.5, 10 mM MgCl2) at 20°C and images were taken
with the TIRF microscope. Bound probes were stripped with 10 mM
Tris-Cl, pH 8.0, to prepare for the next round of hybridization.
[0336] The adaptor-specific probe BrPrb3 at 0.1 µM was hybridized
to the array to establish the positions of all the spots (shown as
blue in FIG. 11). The 4 ligation probe pairs, at 0.4 µM, were then
hybridized successively to the array: the spots hybridized to the
A-specific ligation probe pair are shown as red in FIG. 11, the
C-specific spots are green, G-specific spots are yellow and the
T-specific spots are cyan. In FIG. 11, circle A indicates the
position of one of the spots that hybridized to both the adaptor probe
and the A-specific ligation probe pair, suggesting that the DNA
arrayed at this spot is derived from a molecule of T1A that
contains an A at position 32. Most of the spots are clearly
associated with only one of the 4 ligation probe pairs, and
thus the identity of the base at position 32 of T1A can be determined
specifically.
[0337] Using an in-house image analysis program, spots were
identified in the images taken for the hybridization cycle with
the adaptor probe. The same spots were also identified, and their
fluorescent signals were quantified, in subsequent cycles with the
base-specific ligation probes. An instrument background of 205 was
subtracted from each signal. A discrimination score was
calculated for each signal: each base-specific signal of a spot
was divided by the average of the other 3 base-specific
signals of the same spot. For each spot, the highest of the 4
base-specific discrimination scores was compared with the second
highest score, and if the ratio of the two was above 1.8, the
base corresponding to the maximum discrimination score was selected
as the base call. In this analysis over 500 spots were
successfully base-called, and the average discrimination score was
3.34. The average full match signal was 272, while the average
single mismatch signal (signals from the unselected bases) was
83.2; thus the full match/mismatch ratio is 3.27. The image
background noise was calculated by quantifying signals from
randomly selected empty spots; the average signal of these empty
spots was 82.9, so the full match/background noise ratio is 3.28.
In these experiments the mismatch discrimination is limited by the
low full match signal relative to the background.
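The base-calling rule above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the in-house program: the background of 205 and the 1.8 threshold come from the text, while clipping background-subtracted signals to a small positive floor (so that ratios stay defined) is our own assumption, and the function names are hypothetical.

```python
"""Sketch of discrimination-score base calling as described in [0337]."""
from typing import Dict, Optional

BASES = "ACGT"
BACKGROUND = 205.0       # instrument background, from the text
RATIO_THRESHOLD = 1.8    # best/second-best score needed for a call
FLOOR = 1e-6             # assumed floor so ratios stay defined


def discrimination_scores(raw: Dict[str, float]) -> Dict[str, float]:
    """Each background-subtracted base signal divided by the mean of the other three."""
    sig = {b: max(raw[b] - BACKGROUND, FLOOR) for b in BASES}
    scores = {}
    for b in BASES:
        others = [sig[o] for o in BASES if o != b]
        scores[b] = sig[b] / (sum(others) / len(others))
    return scores


def call_base(raw: Dict[str, float]) -> Optional[str]:
    """Return the called base, or None when the spot is ambiguous."""
    scores = discrimination_scores(raw)
    ranked = sorted(scores, key=scores.get, reverse=True)
    best, second = scores[ranked[0]], scores[ranked[1]]
    return ranked[0] if best / second > RATIO_THRESHOLD else None


# Hypothetical spot whose background-subtracted signals resemble the
# averages reported above (272 full match, ~83 mismatch).
spot = {"A": 477.0, "C": 288.0, "G": 290.0, "T": 286.0}
print(call_base(spot), discrimination_scores(spot))
```

A spot with two equally bright channels yields a best/second ratio of 1 and is left uncalled, which is the purpose of comparing the top two scores rather than thresholding the top score alone.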
[0338] By adjusting assay conditions (buffer composition, including
addition of NaCl; concentrations of all components; and time and
temperature of each step in the cycle), higher signal-to-background
and full match/mismatch ratios are expected, as demonstrated with a
similar ligation assay performed on our spotted arrays of 6-mer
probes. In that case the full match/background ratio is about 50 and
the average full match/mismatch ratio is 30.
[0339] 2.1.3. Preparation of Glass Slides for Attaching Capture
Oligonucleotides
[0340] The cover slips are prepared by cleaning them in a solution
of potassium hydroxide and ethanol. They are then rinsed and dried.
After drying, they are immersed in a solution of
3-aminopropyldimethylethoxysilane, acetone, and water for 45
minutes. After rinsing to remove excess reagents, they are cured in
an oven at 100°C for 1 hour. The cover slips are allowed to cool to
room temperature and immersed in a solution of
p-phenylenediisothiocyanate (PDC), pyridine, and dimethylformamide
for 2 hours. After rinsing and drying, the slides are ready to bind
amine-modified oligonucleotides.
[0341] In another embodiment, a solution of
3-aminopropyldimethylethoxysilane, trimethylethoxysilane, acetone and
water is used in the second step. The trimethylethoxysilane is used
in various ratios to the 3-aminopropyldimethylethoxysilane to
control the density of 3-aminopropyldimethylethoxysilane on the
surface. Using a non-amino-functionalized silane produces fewer
amino-functionalized sites on the surface, ultimately
reducing the number of oligonucleotides that can bind to the surface
during capture probe attachment or hybridization assays of the DNA
array.
[0342] Under certain conditions it may be advantageous to use
silanes that have longer or shorter alkoxy groups on the silane
molecule. For example, trimethylmethoxysilane could be used in place
of trimethylethoxysilane to control the activation rate of the
silane molecule in solution. Mixtures of "ethoxy" and "methoxy"
silanes could be used to provide better control over silane
activation rates.
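How a silane mixing ratio might translate into amine density can be illustrated with a deliberately simple model. The sketch below is hypothetical and not from the source: it assumes the amino silane and the inert trimethyl silane compete for surface sites with rates proportional to mole fraction times a relative reactivity, which is one way to picture why different alkoxy groups (different activation rates) would shift the outcome.

```python
"""Hypothetical competitive-silanization model (illustration only)."""


def amine_site_fraction(x_amino: float, rel_reactivity: float = 1.0) -> float:
    """Fraction of reacted surface sites that carry an amine.

    x_amino: mole fraction of the amino silane in the silane mixture (0-1).
    rel_reactivity: assumed ratio of the amino silane's surface reaction
        rate to that of the inert trimethyl silane (1.0 = equal rates).
    """
    weighted_amino = rel_reactivity * x_amino
    return weighted_amino / (weighted_amino + (1.0 - x_amino))


for x in (1.0, 0.5, 0.2, 0.05):
    equal = amine_site_fraction(x)
    slower_inert = amine_site_fraction(x, rel_reactivity=2.0)
    print(f"x_amino={x:.2f}: equal rates {equal:.2f}, amino 2x faster {slower_inert:.2f}")
```

With equal rates the amine fraction simply tracks the mole fraction; any difference in activation rate skews it, which is consistent with the text's suggestion of mixing "ethoxy" and "methoxy" silanes to fine-tune activation.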
3. ADDITIONAL EXAMPLES OF METHODS AND PROCESSES USED IN PRODUCING
OR APPLICATIONS OF saDNA CHIPS OR OTHER PRODUCTS
[0343] 3.1. The Ordered Random Array Process
[0344] The core of this new approach involves the creation and
efficient analysis of high-density random arrays containing
millions of DNA molecules. Such random arrays eliminate the costly,
time-consuming steps of arraying probes on the substrate surface
and the need for individual preparation of thousands of sequencing
templates. Instead they provide a fast and cost effective way to
analyze complex DNA mixtures.
[0345] DNA molecules are arrayed at a density of about one molecule
per square micron of substrate. A 3×3 mm array has the
capacity to hold 1-10 million fragments, or approximately 1-10
billion DNA bases, the upper limit being the equivalent of three
human genomes.
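The capacity figures above follow from simple arithmetic. This sketch assumes about 1,000 bases per fragment and a haploid human genome of about 3.1 billion bases; neither number is stated in this paragraph.

```python
# Back-of-envelope capacity of a random array, as in [0345].
DENSITY_PER_UM2 = 1.0        # about one molecule per square micron (text)
SIDE_MM = 3.0                # 3 x 3 mm array (text)
BASES_PER_FRAGMENT = 1_000   # assumed fragment length
HUMAN_GENOME_BASES = 3.1e9   # assumed haploid genome size

area_um2 = (SIDE_MM * 1_000) ** 2          # 1 mm = 1,000 um
fragments = area_um2 * DENSITY_PER_UM2
bases = fragments * BASES_PER_FRAGMENT
genomes = bases / HUMAN_GENOME_BASES
print(f"{fragments:.1e} fragments, {bases:.1e} bases, {genomes:.1f} genome equivalents")
```

About 9 million fragments fit at full density, matching the upper end of the 1-10 million range and the "three human genomes" limit.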
[0346] We describe here two broad platforms for nanoscale, ordered
arrays of DNA.
[0347] In the first platform, random sequencing by hybridization
(rSBH) utilizes attached DNA nano-ball targets and multiple cycles
of probe pool solutions over the attached targets. The rSBH process
preserves all the advantages of combinatorial SBH demonstrated on
our HyChip product, including the high specificity of the ligation
process. At the same time it adds several important benefits that
result from the attachment of DNA fragments instead of probes. DNA
attachment creates the possibility of using random DNA arrays with
much greater capacity than regular probe arrays, and allows
detection by ligation of two labeled probes in solution. In
addition, having both probe modules in solution allows us to expand
our informative probe pool (IPP) strategy to both probe sets, which
was not possible in the HyChip™ format. The rSBH process allows
for the identification of unknown-sequence targets bound to the
substrate surface either through full sequencing or by partial
sequence signatures. In this regard the targets are "random"
because they are distributed randomly at attachment sites on the
array surface. Once identification of the surface targets is
enabled through either full or partial sequence acquisition, the
targets can take on the role of probes in a subsequent
hybridization assay of solution-phase targets. This latter process
is termed self-assembled DNA nano-arrays (saDNA) and is the basis
for the second sequencing platform described here.
[0348] 3.2. The Instrument
[0349] The system hardware consists of three major components; the
illumination system, the reaction chamber, and the detector system.
These components work together to provide single fluorophore
detection sensitivity. TIRFM creates a 100-500 nm thick evanescent
field at the interface of two optically different materials (4).
The evanescent field is an extension of the beam energy that
reaches beyond the glass/water interface by a few hundred
nanometers (generally between 100-500 nm). This field can be used
to excite fluorophores close to the glass-water boundary and
virtually eliminates background from the excitation source.
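How deep the evanescent field reaches depends on the incidence angle. This sketch uses the standard expression for the 1/e intensity penetration depth, d = λ / (4π √(n₁² sin²θ − n₂²)). The specific values are assumptions not given in the text: 532 nm excitation (a common line for TAMRA), glass with n₁ = 1.518 and aqueous buffer with n₂ = 1.33.

```python
import math


def penetration_depth_nm(wavelength_nm: float, n_glass: float,
                         n_sample: float, angle_deg: float) -> float:
    """1/e intensity decay depth of the evanescent field under total internal reflection."""
    s = n_glass * math.sin(math.radians(angle_deg))
    if s <= n_sample:
        raise ValueError("angle is at or below the critical angle; no TIR")
    return wavelength_nm / (4 * math.pi * math.sqrt(s * s - n_sample ** 2))


critical_deg = math.degrees(math.asin(1.33 / 1.518))
print(f"critical angle: {critical_deg:.1f} deg")
for angle in (62, 65, 70, 75):
    print(angle, round(penetration_depth_nm(532, 1.518, 1.33, angle), 1))
```

Because intensity decays exponentially, fluorophores a few penetration depths away (a few hundred nm) are still weakly excited, while the bulk solution is not, which is the basis of the low background.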
[0350] The substrate, once attached to the reaction chamber, forms
the bottom section of a hybridization chamber. This chamber
controls the hybridization temperature, provides ports for the
addition of probe pools or targets to the chamber, removal of the
probe pools or targets and substrate washing. A fluorescently
labeled solution is introduced into the chamber and is given time
to hybridize with the attached DNA. A high sensitivity CCD camera
capable of single photon detection is used to detect fluorescent
hybridization/ligation events. For sequence determination or
signature identification of attached targets, multiple solution-phase
probe pools may be cycled through the chamber. After image
acquisition the chamber may be flushed to remove all probes and the
next probe pool is introduced. This process is repeated 256-512
times until all probe pools have been assayed.
[0351] The detection instrument has been fully assembled with
features including: adjustable laser power, electronic shutter,
autofocus, and operating software. The system was optimized and
tested using arrays of individual TAMRA dye molecules and arrays of
dendrimers with 350 and 50 dye molecules. Detection of a single
dye molecule is achieved in about 10% of cases, demonstrating the
projected sensitivity of a TIRFM-based strong illumination/low
background process coupled with a sensitive CCD camera. This result
also demonstrates that detection of single molecules is
statistically inefficient due to various physical and chemical
factors and that a target DNA amplification scheme is required.
Dendrimers with only 50 dye molecules produce a signal that is 50-fold
stronger than background, indicating that about 100-fold target
amplification would be sufficient. We also developed and optimized
a reaction chamber with automated temperature control and liquid
handling including efficient washing of the chamber. The rSBH
instrument is currently fully operational for individual
hybridization cycles. For handling 256-1024 hybridization cycles we
have designed a robotic station that is fully integrated with both
reaction chamber and detection system.
[0352] 3.3. Coverslip Chemistry and Design
[0353] Effective glass activation chemistry has been developed that
creates a monolayer of isothiocyanate reactive groups for attaching
amine-modified primers or DNA capture oligonucleotides. This
monolayer chemistry reduces trapping of labeled probe and thus
dramatically reduces the assay's background.
[0354] 3.4. Linear Rolling Circle Replication (RCR) from ss Circles
and Surface Attachment
[0355] Our approach to create random arrays of single molecule
sequences has been to linearly amplify the target or saDNA-probe to
be sequenced or identified in the form of concatemers. This
strategy creates an attached DNA molecule with many rSBH-probe
binding sites resulting in higher and more sustained signal
intensities than obtained with single fluorophores. We describe
these condensed concatemers of linear DNA as DNA nano-balls. The
long, single-stranded concatemers are generated by a rolling circle
replication (RCR) process that relies upon the desired target
molecule first being formed into a circular substrate.
[0356] Amplification of the DNA in this form has several
advantages: 1) The amplification is linear, so all copies are made
from the same template, which prevents mutated copies from
over-representing the original template sequence. 2) All of the
copies are contained in one single molecule and so are ideal
for microscopic analysis with fluorescent probes. Amplification
proceeds at 30°C in the presence of phi29 polymerase and
dNTPs. One strand is a closed circle and acts as the template; the
other strand, with an exposed 3' end, acts as an initiating
primer and is extended. The strand-displacing activity of the
enzyme creates a long single-stranded molecule of hundreds or
thousands of copies of the circle. Regions of the single-stranded
molecule (in the adapter sequence) are utilized to form stable
hybrids with complementary oligonucleotides attached to the surface
of the coverslip.
[0357] For the preparation of random arrays of amplified
DNA, single-stranded, hybridization-ready concatemers of DNA
fragments were generated by RCR. The continuous strand extension
creates a long, single-stranded DNA consisting of hundreds of
tandem copies complementary to the circle. We found that, if arrayed,
these concatemers form long threads on the regular glass surface
(FIG. 12, panel c). To achieve compact, dense bundles of the DNA in
the form of sub-micron spots or nano-balls, we utilized a region of
the amplified molecule for hybridization to a capture probe
attached to the glass surface (FIG. 4, panels a and b). Hundreds
of capture probe molecules (spaced about 10 nm apart) keep
hundreds of concatenated copies of a target molecule tightly bound
to a glass surface area of less than 300 nm in diameter.
[0358] In one study, two synthetic targets were co-amplified and
about one million molecules captured on the glass surface, and then
probed for one of the targets. After imaging and photo-bleaching
the first probe, the second target was probed. There was no
evidence of co-localization of targets under these conditions. We
then demonstrated that a fluorescent 11-mer probe could hybridize to
bound DNA and produce a strong signal equivalent to that of a 30-mer probe.
We also confirmed that the probe could be removed by heating
at 70°C and then re-hybridized to produce equally strong
signals.
[0359] Uniform RCR Amplicon Length
[0360] One observed feature of RCR-generated concatemers has been a
range of feature sizes produced from a homogeneous target
population (see FIG. 13). This may result from extension
initiating at different times on different circular templates or
from different rates of extension by individual
polymerase molecules. It is believed that one polymerase molecule
is responsible for the continuous extension of a primer (5),
although we are not aware of any studies describing an upper limit
to the size of product produced by a single polymerase molecule. To
create more uniform sizes of the amplified targets, we will
incorporate dideoxynucleotides as a very small proportion of the
total dNTP concentration (e.g., 1 in 50,000). This may have the
effect of terminating those molecules that extend at a more rapid
rate than other molecules that either initiate later or extend at
slower rates. In another approach to create more uniform sizes of
the amplified targets, we may block short concatemers by consuming
all potential binding sites with a predefined number of
concatemer-complementary sequences introduced before surface attachment. This
could be achieved by creating ligation concatemers of the 30-40
base capture oligonucleotide attachment site.
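Why a trace of dideoxynucleotide should compress the size distribution can be shown with a simple model. This is a sketch under assumptions not in the source: each incorporation terminates the chain with probability equal to the ddNTP fraction (ignoring phi29's discrimination between ddNTPs and dNTPs), and each polymerase would otherwise reach a fixed length set by its rate and the reaction time. The "slow" and "fast" lengths are hypothetical.

```python
def expected_length(max_len: int, p_term: float) -> float:
    """Expected nucleotides incorporated by one polymerase.

    max_len: length reached with no terminator (rate x reaction time).
    p_term:  per-incorporation termination probability (assumed equal to
             the ddNTP fraction of total nucleotides).
    E[N] = sum_{k=1..max_len} (1 - p)^(k-1) = (1 - (1 - p)^max_len) / p
    """
    if p_term == 0:
        return float(max_len)
    return (1 - (1 - p_term) ** max_len) / p_term


p = 1 / 50_000                # 1 ddNTP in 50,000 nucleotides, as in the text
slow, fast = 50_000, 200_000  # hypothetical slow and fast polymerases

print("length ratio without ddNTP:", fast / slow)
print("length ratio with ddNTP:   ", expected_length(fast, p) / expected_length(slow, p))
```

Under these assumptions a 4-fold difference in unrestricted length shrinks to roughly 1.5-fold, because fast molecules run into a terminator much more often than slow ones.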
[0361] 3.5. Methods for Circle Formation from Double Stranded DNA
3.5.1. Method I
[0362] A universal adapter that also serves as the binding site for
capture probes and RCR primer is ligated to the 5' end of the
target molecule using a universal template DNA containing
degenerate bases for binding to all genomic sequences. The 3' end
of the target molecule is modified by addition of a poly-dA tail
using terminal transferase. The modified target is then
circularized using a bridging template complementary to the adapter
and to the oligo-dA tail (FIG. 14).
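A practical consequence of Method I is that a single bridging template serves every fragment, because both sides of the junction (the oligo-dA tail and the adaptor 5' end) are universal. The sketch below illustrates this; the tail length, the adaptor arm length and the example target sequences are hypothetical, and the universal adaptor is taken from [0327].

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written with A/C/G/T only."""
    return seq.translate(COMPLEMENT)[::-1]


def tail_and_circularize(adaptor: str, target: str, tail_len: int) -> str:
    """Linear order of the Method I circle, 5'->3' from the adaptor start:
    adaptor + target + oligo-dA tail; ligation joins the last base to the first."""
    return adaptor + target + "A" * tail_len


def method1_bridge(adaptor: str, tail_len: int, adaptor_arm: int) -> str:
    """Bridging template complementary to the dA tail and the adaptor 5' end."""
    return revcomp("A" * tail_len + adaptor[:adaptor_arm])


ADAPTOR = "TATCATCTACTGCACTGACCGGATGTTAGGAAGACAAAAGGAAGCTGAGGGTCACATTAACGGAC"
TAIL, ARM = 12, 10  # hypothetical lengths

bridge = method1_bridge(ADAPTOR, TAIL, ARM)
for target in ("GATTACAGATTACA", "CCGGTTAACCGGTT"):  # hypothetical fragments
    circle = tail_and_circularize(ADAPTOR, target, TAIL)
    junction = circle[-TAIL:] + circle[:ARM]
    assert revcomp(bridge) == junction  # same bridge pairs across every junction
print(bridge)
```

The bridge ends in a run of T residues that pairs with the dA tail, so its sequence never depends on the genomic insert.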
[0363] 3.5.2. Method II
[0364] Single-stranded PCR products can be prepared by exonuclease
digestion of one of the strands or by strand separation with high
temperature and rapid cooling. Sequences will be
incorporated into the 5' ends of the primers to allow for the
hybridization of a bridge oligonucleotide for circularization (FIG.
15). This approach can be utilized for genomic fragment capture
with adapters ligated to restriction-enzyme-fragmented genomic DNA.
With two adapters, approximately 50% of fragments will possess two
different adapters, one at each end, which can then be used for strand
removal and circle formation.
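The "approximately 50%" figure follows if each fragment end picks one of the two adapters independently and with equal probability. This sketch makes that assumption explicit and also shows, for a hypothetical unequal 70:30 mixture, that an equimolar mix maximizes the useful fraction.

```python
from fractions import Fraction
from itertools import product
from typing import Dict


def fraction_heterologous(adapter_probs: Dict[str, Fraction]) -> Fraction:
    """Probability that a fragment's two ends carry different adapters,
    assuming each end independently ligates adapter X with probability
    adapter_probs[X]."""
    return sum(
        (pa * pb
         for (a, pa), (b, pb) in product(adapter_probs.items(), repeat=2)
         if a != b),
        Fraction(0),
    )


print(fraction_heterologous({"A": Fraction(1, 2), "B": Fraction(1, 2)}))   # 1/2
print(fraction_heterologous({"A": Fraction(7, 10), "B": Fraction(3, 10)}))  # 21/50
```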
[0365] Capture sequences in the bridge will be the same for each
molecule but probe binding sequences for sequence identification
will vary. Circularization of the molecule proceeds with a bridging
oligonucleotide of about 20-30 bases in length that will bring the
two ends into juxtaposition for ligation by T4 DNA ligase. The
bridging molecule can now act as the primer for extension.
Amplification of DNA captured into the circular molecules proceeds
by a rolling circle replication to form long linear concatemer
copies of the circle.
[0366] 3.6. rSBH-Probe Cycling
[0367] The novelty of the method proposed here is that millions of
single DNA molecules, randomly arrayed on an optically clear
surface, serve as templates for hybridization and ligation of
fluorescent-tagged probe pairs. Pairs of probe pools, at least one
of which is labeled with a fluorophore, are mixed with DNA ligase
and presented to the random array. When probes hybridize to
adjacent sites on a target fragment they are ligated together,
forming a stable hybrid. A sensitive megapixel CCD camera with
advanced optics is used to simultaneously detect millions of these
individual hybridization/ligation events on the entire array. Once
signals from the first pool are detected, the probes are removed
and successive ligation cycles are used to test different probe
combinations. The fixed position of the CCD camera relative to the
array ensures accurate tracking of consecutive hybridizations to
individual target molecules. The entire sequence of each DNA
fragment is compiled based on fluorescent signals generated by
hundreds of independent hybridization/ligation events.
[0368] Detection of the attached concatemers can also occur with
sequence specific probes. To identify specific mutations of the
concatemer, TAMRA-labeled 5-mer probes can be used in conjunction
with a pair of 6-mers to identify the base sequence at the mutated
site.
[0369] We have demonstrated the ability for probe ligation to occur
with the condensed concatemers. Reactions were carried out at 20°C
for 10 min using our ligation kit, followed by a brief wash of the
chamber to remove excess probe.
[0370] 3.7. High Density Structured Random Arrays
[0371] The proposal is to structure random DNA arrays into a high
density grid, such that each DNA binding site is only 100-300 nm in
size and each binding site contains only a single DNA fragment.
This approach should minimize cross hybridization between DNA
targets, while at the same time substantially decreasing the size
of each binding site and thus increasing the density of binding
sites per array. The significance of being able to efficiently and
inexpensively make such "perfect" random DNA arrays is tremendous.
Maximizing the number of DNA segments per surface area will enable
scientists to analyze a complex genome on one small glass chip,
about 1 cm² in size or less. A CCD chip can be perfectly aligned
with the DNA array to provide a one to one correspondence between
each CCD pixel and DNA binding site, maximizing reading
efficiency.
[0372] Development of DNA random arrays in the form of "perfect"
high density grids with sub-micron spots will provide the basis for
daily sequencing of multiple human genomes using affordable 10-megapixel
CCD detectors. These whole genome DNA arrays have over 1000
times more DNA spots than the current high density probe arrays.
Because a 1 cm² chip can hold over one billion DNA fragments
(>100 billion bases, or over 30 human genomes), an automated
process can be developed such that the total sequencing reaction
volume for 100 interrogation cycles would be only 1 ml, reducing
sequencing cost to less than $1000 per genome.
[0373] The proposed high density structured random DNA array chip
will have capture oligonucleotides concentrated in small,
segregated capture cells aligned into a rectangular grid formation
(FIG. 16). Most importantly, each capture cell or binding site will
be surrounded by an inert surface and will have a sufficient but
limited number of capture molecules (100-400). Each capture
molecule will bind one copy of the matching adaptor sequence on the
RCR produced DNA concatemer. Since each concatemer contains over
1000 copies of the adapter sequence, it will quickly saturate the
binding site upon contact and prevent other concatemers from
binding, resulting in exclusive attachment of one RCR product per
binding site or spot. The proper concentration of RCR products and
sufficient reaction time will ensure that almost every spot on the
array contains one and only one unique DNA target.
[0374] RCR "molecular cloning" allows the application of the
saturation/exclusion (single occupancy) principle in making random
arrays. The exclusion process is not feasible in making single-molecule
arrays if in situ amplification is applied instead. RCR concatemers
provide an optimal size to form small, non-mixed DNA spots. Each
concatemer of about 100 kb is expected to occupy a space of about
0.1×0.1×0.1 µm. This indicates that RCR
products can fit into the 100 nm capture cells. Another advantage
of RCR products is that the single-stranded DNA is ready for
hybridization and is very flexible for forming a randomly coiled
ball of DNA. It is important to note that 1000 copies of a DNA target
produced by RCR provide much higher specificity than analysis of a
single molecule. Thus, RCR products provide several important advantages
without any serious penalties.
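The value of the saturation/exclusion principle can be seen by comparing two idealized loading models. Under independent (Poisson) loading, the fraction of sites holding exactly one molecule is λe^(-λ), which can never exceed 1/e (about 37%). If instead the first concatemer to arrive saturates a site and excludes all later ones, every site that is hit at all is singly occupied, giving 1 - e^(-λ). Both models are textbook idealizations chosen for illustration, not measurements from the source.

```python
import math


def poisson_single(mean_per_site: float) -> float:
    """Fraction of sites with exactly one molecule under independent loading."""
    return mean_per_site * math.exp(-mean_per_site)


def exclusion_single(mean_per_site: float) -> float:
    """Fraction of singly occupied sites if the first arrival excludes all others."""
    return 1.0 - math.exp(-mean_per_site)


best_single, best_mean = max((poisson_single(m / 100), m / 100) for m in range(1, 501))
print(f"Poisson optimum: {best_single:.3f} at {best_mean:.2f} molecules/site")
for m in (1.0, 3.0, 5.0):
    print(m, round(poisson_single(m), 3), round(exclusion_single(m), 3))
```

With exclusion, raising the RCR product concentration (larger λ) drives single occupancy toward 100%, whereas under Poisson loading it only increases the number of mixed spots.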
[0375] Having 125-250 nm DNA sites in a regular grid with 250-500
nm center-to-center spacing will provide 20-80 times more DNA
samples per surface than arrays with randomly attached DNA with spots
of about 1000 nm in size and 20% usable occupancy. This will result
in 20-80 fold lower reagent consumption and 20-80 fold faster
readout. Furthermore, attaching RCR products onto this dense grid
of capture probe spots ensures that each DNA ball is concentrated
on a much smaller surface, increasing the signal and the speed of
biochemical assays. Overall, the reduction of DNA attachment spots
from 500 nm to 125 nm in size will result in up to 16 fold higher
signal intensities. In short, the proposed DNA arrays will provide
an order of magnitude lower cost, higher throughput and higher
sensitivity than standard random DNA arrays.
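The 20-80-fold and 16-fold figures in [0375] can be reproduced with a few lines of arithmetic. Two assumptions are ours: the random array carries about one ~1 µm spot per square micron before the 20% occupancy loss, and signal density scales inversely with spot area when the same amount of DNA is packed into a smaller spot.

```python
def sites_per_um2(pitch_nm: float) -> float:
    """Grid sites per square micron for a square grid with the given pitch."""
    return (1000.0 / pitch_nm) ** 2


# Random attachment: ~1 spot per um^2 (assumed), 20% usable occupancy (text).
random_usable_per_um2 = 1.0 * 0.20

for pitch in (500, 250):
    gain = sites_per_um2(pitch) / random_usable_per_um2
    print(f"{pitch} nm grid: {gain:.0f}x more usable DNA spots")

# Same DNA in a smaller spot: signal per unit area ~ 1 / spot area (assumed).
signal_gain = (500 / 125) ** 2
print(f"500 nm -> 125 nm spots: {signal_gain:.0f}x signal density")
```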
[0376] A long-term goal is to develop a structured array of
384 unit arrays, each 3.33×3.33 mm in size (about 10 mm²),
spaced at the standard 384-well plate dimension of 4.5 mm well-to-well
distance. This composite array can have 384×100 million DNA
spots spaced at 333 nm center to center, a density that provides 10
million spots per mm², 1 billion spots per cm², or a total
of 38.4 billion DNA spots. Analyzing these arrays at a speed of
100 million spots per second (one unit array per second) will
require a 30-100 megapixel CCD detector and will take 6.5
minutes per cycle. The goal for the first usable system based on
the composite structured arrays would be to produce DNA features
that are spaced at 1 micrometer center to center, for a total of up to
3.8 billion spots (10 million per unit array) that can be read in
about 5 minutes with a 10-megapixel CCD detector. One billion
binding sites with 100-base-long DNA fragments can hold the
equivalent of 30 human genomes at 1× coverage.
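The composite-array figures can be checked as follows. The sketch takes the read rate and unit-array area from the text and assumes a haploid genome of about 3.1 billion bases. Note that a strict 333 nm pitch gives about 9 million spots per mm², which the text rounds to 10 million; the totals below use the rounded value, as the text does.

```python
UNITS = 384
UNIT_AREA_MM2 = 10.0     # each unit array ~3.33 x 3.33 mm (text)
READ_RATE = 100e6        # spots read per second (text)
HUMAN_GENOME = 3.1e9     # assumed haploid genome size in bases


def spots_per_mm2(pitch_nm: float) -> float:
    """Spots per mm^2 on a square grid with the given center-to-center pitch."""
    return (1e6 / pitch_nm) ** 2


print(f"333 nm pitch: {spots_per_mm2(333):.2e} spots/mm^2")

goal_total = 1e7 * UNIT_AREA_MM2 * UNITS      # text's rounded 10 million/mm^2
goal_minutes = goal_total / READ_RATE / 60
print(f"goal: {goal_total:.3g} spots, {goal_minutes:.1f} min per cycle")

first_total = spots_per_mm2(1000) * UNIT_AREA_MM2 * UNITS
print(f"first system (1 um pitch): {first_total:.3g} spots")

print(f"1e9 sites x 100 bases: {1e9 * 100 / HUMAN_GENOME:.0f} genome equivalents")
```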
[0377] Composite arrays of hundreds of smaller unit arrays have
many advantages over a single large array. For example, a subset of
genes instead of entire genomes can be selectively amplified in a
multiplex reaction and sequenced in hundreds of individuals at the
same time on one composite array. Another very important
application of array of arrays is to determine whole chromosome
sequence and haplotypes using our novel two-level fragmentation
method. This method represents an enabling technology that provides
mapping information for assembling chromosomal haplotypes and
alternatively spliced mRNAs for any analysis based on random DNA
fragmentation.
[0378] In this method, genomic sample DNA is first prepared in the
form of fragments about 5, 10, 100, or 200 kb in length. By proper
dilution, a small subset of these fragments is placed at random in
discrete wells of multi-well plates or similar accessories. For
example, a plate with 96, 384 or 1536 wells can be used for these
fragment subsets. An optimal way to create these DNA aliquots is to
take only 10-30 cells, isolate the DNA, fragment it into long segments
and then split the entire preparation into 384 wells. This will
ensure that all chromosomal regions are represented with the same
coverage. The DNA aliquots will contain a few to 10, 20 or more
fragments. The complexity of each fragment subset is determined by the
capacity of the unit arrays and by statistical requirements. The goal
is to minimize cases where any two overlapping fragments from the
same region of a chromosome, or any two mRNA molecules transcribed
from the same gene, are placed in the same subset, e.g. the same
plate well. For diploid genomes represented with 10× coverage
there are on average 20 overlapping fragments to separate into
distinct wells. By forming 384 fractions in a standard 384-well
plate, there is only about a 1/400 chance that any given pair of overlapping
fragments will end up in the same well. Even if some matching
fragments are placed in the same well, the other overlapping
fragments from each chromosomal region will provide the necessary
unique mapping information.
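The 1/400 figure is the chance for one specific pair of overlapping fragments (1/384). Across all 20 fragments covering a region, the birthday-problem calculation below shows that at least one shared well is fairly common, which is why the fallback in the last sentence (other overlapping fragments still supply mapping information) matters. The sketch assumes fragments are distributed uniformly and independently over the wells.

```python
from math import comb

WELLS = 384
OVERLAPPING = 20  # fragments covering one region: diploid genome at 10x (text)

p_given_pair = 1 / WELLS
expected_colliding_pairs = comb(OVERLAPPING, 2) / WELLS

# Probability that all 20 fragments land in different wells (birthday problem).
p_all_distinct = 1.0
for i in range(OVERLAPPING):
    p_all_distinct *= (WELLS - i) / WELLS
p_some_collision = 1 - p_all_distinct

print(f"given pair: 1/{1 / p_given_pair:.0f}")
print(f"expected colliding pairs per region: {expected_colliding_pairs:.2f}")
print(f"P(at least one collision in a region): {p_some_collision:.2f}")
```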
[0379] The prepared groups of long fragments are further cut to the
final fragment size of about 200 to 2000 bases. To obtain 10×
coverage of each fragment in a group, the DNA in each well may be
amplified before final cutting using well-developed whole genome
amplification methods. All short fragments from one well will then
be arrayed and sequenced on one separate unit array or in one
section of a larger continuous matrix. The composite array of 384
unit arrays described above is ideal for parallel analysis
of these groups of fragments. In the assembly of long sequences
representing parental chromosomes, the algorithm will use the
critical information that short fragments detected in one unit
array belong to a limited number of longer continuous segments, each
representing a discrete portion of one chromosome, or of one mRNA
molecule in the case of analyzing expressed sequences. In almost
all cases the homologous chromosomal segments will be analyzed on
different unit arrays. Long continuous initial segments form a
tiling pattern and provide sufficient mapping information to
assemble each parental chromosome separately, relying on about 100
polymorphic sites per 100 kb of DNA, as depicted below. Dots
represent 100-1000 consecutive bases that are identical in
corresponding segments.
Example:
TABLE-US-00003
[0380]
Well 3:   T C C . . . G A
Well 20:  C T T . . . A G C . . .
Well 157: T . . . A G C A . . . C . . .
Well 258: . . . C C . . . G A T G . . . T . . .
[0381] Wells 3 and 258 assemble mother's chromosome:
TABLE-US-00004
T C C . . . G A T G . . . T . . .
Wells 20 and 157 assemble father's chromosome:
C T T . . . A G C A . . . C . . .
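This grouping step can be pictured as merging wells that agree at their shared polymorphic sites. The following is a minimal greedy sketch, not the assembly algorithm the text refers to. It assumes that each well reports one allele per indexed polymorphic site, and that the table's bases map onto site indices 0 to 7 at the offsets shown; that alignment is inferred for illustration only. All function names are hypothetical. A greedy single pass can miss merges when a bridging well arrives after the wells it connects, and a real assembler would also have to tolerate sequencing errors.

```python
from typing import Dict, List, Tuple

# Alleles one well reports at indexed polymorphic sites: {site_index: base}.
WellCalls = Dict[int, str]

def compatible(haplotype: WellCalls, calls: WellCalls) -> bool:
    """True if the well overlaps the haplotype and agrees at every shared site."""
    shared = haplotype.keys() & calls.keys()
    return bool(shared) and all(haplotype[s] == calls[s] for s in shared)

def assemble_haplotypes(wells: Dict[str, WellCalls]) -> List[Tuple[List[str], WellCalls]]:
    """Greedily merge wells into haplotypes; a well that fits none starts a new one."""
    haplotypes: List[Tuple[List[str], WellCalls]] = []
    for name, calls in wells.items():
        for members, haplotype in haplotypes:
            if compatible(haplotype, calls):
                members.append(name)
                haplotype.update(calls)  # extend the haplotype with the new sites
                break
        else:
            haplotypes.append(([name], dict(calls)))  # copy so inputs stay unchanged
    return haplotypes

def as_string(haplotype: WellCalls) -> str:
    return " ".join(haplotype[s] for s in sorted(haplotype))

def from_offset(offset: int, bases: str) -> WellCalls:
    """Hypothetical helper: place consecutive site calls starting at a site index."""
    return {offset + i: b for i, b in enumerate(bases)}

# Wells from the table, with an assumed alignment to site indices 0-7.
wells = {
    "3": from_offset(0, "TCCGA"),
    "20": from_offset(0, "CTTAGC"),
    "157": from_offset(2, "TAGCAC"),
    "258": from_offset(1, "CCGATGT"),
}
for members, hap in assemble_haplotypes(wells):
    print(members, as_string(hap))
```

With the assumed alignment, this groups wells 3 and 258 (T C C G A T G T) and wells 20 and 157 (C T T A G C A C), which mirrors the two parental assemblies in the table.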
[0382] Random arrays prepared by two-level DNA fragmenting combine
the advantages of both BAC sequencing and shotgun sequencing in a
simple and efficient way. In addition to haplotype determination,
this innovation will extend the use of random DNA arrays to de novo
sequencing of complex genomes or mixtures of genomes, e.g. all
bacterial and protozoan genomes in a drop of sea water.
[0383] Overall, the high density structured random DNA arrays and
array of arrays will provide 20-80 times more DNA binding sites per
surface area than random attachment, resulting in several
advantages:
[0384] 1) A 20-50 fold overall increase in sequencing efficiency
per array
[0385] 2) A 20-50 fold decrease in reagent use and sequencing time
and thus an equally large cut in cost.
[0386] 3) Increased array reading efficiency, since each pixel of
the CCD camera can be aligned to one spot on the ordered array,
resulting in the largest possible density of spots per image.
[0387] 4) No overlaps between DNA targets, since targets will be
spaced 250-500 nm center to center with 100-300 nm of inert surface
space between binding sites.
[0388] 5) A 16-fold higher signal, because DNA targets will be
concentrated in a much tighter ball over dense 125 nm spots of
probes.
[0389] 6) A very stable DNA array, since there are over a hundred
attachment points for each RCR product.
[0390] 7) A probe- and enzyme-friendly array, since most of the DNA is
not directly attached to the glass surface and is therefore
accessible to ligase, polymerase or other DNA processing enzymes
and hybridization probes.
[0391] 8) Flexibility in making and using an array of structured
random arrays for more efficient haplotype and splice variant
determination, analysis of multiple samples in parallel, staggered
sequencing reactions to eliminate the idle time of CCD detectors, and
parallel probing cycles to shorten the sequencing completion time
of longer DNA fragments.
[0392] Structured, high-density random arrays with submicron
patterned support surfaces and RCR concatemers also have many
advantages over probe arrays and DNA-on-bead arrays:
[0393] 1) Light or focused particle induced patterning of the
surface is much easier because only one universal "mask" is
required. Making an array of 20-mers by in situ synthesis requires
80 steps and 80 masks. The ease of one-step patterning allows for
smaller, higher density grid cells or binding sites. Thus, in
addition to containing a much higher concentration of grid cells
(billions) compared to bead arrays with large grid cells or wells,
this patterned surface is far simpler and cheaper to prepare.
[0394] 2) Amplification of DNA fragments by RCR provides ideal
"molecular cloning" in solution without any segregation of
individual molecules by physical barriers. The only requirement is
the proper concentration of target molecules. A single reaction
tube with 1000 µl of RCR solution can amplify one billion
fragments, each of which is allocated a 10 × 10 × 10
µm volume on average. Each concatemer is expected to occupy
a space of about 0.1 × 0.1 × 0.1 µm. Thus, the average
distance between concatemers in RCR solution is 100 times larger
than their size. This distance minimizes DNA chain entanglements
between concatemers. RCR combined with a patterned surface is an
inexpensive solution to make billions of DNA spots in comparison
with arrays of long gene specific probes prepared by in situ
synthesis of oligonucleotides.
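The volume and spacing figures above follow from unit conversion. This minimal sketch treats each fragment's share of a well-mixed solution as a cube, which is an idealization. It uses 1 µl = 1 mm³ = 10⁹ µm³. The function name `rcr_spacing` is hypothetical.

```python
from typing import Tuple

def rcr_spacing(volume_ul: float, n_fragments: float, concatemer_um: float) -> Tuple[float, float, float]:
    """Average volume and spacing per concatemer in a well-mixed RCR solution."""
    um3_per_ul = 1e9  # 1 µl = 1 mm^3 = (1000 µm)^3
    volume_per_fragment = volume_ul * um3_per_ul / n_fragments  # µm^3
    edge_um = volume_per_fragment ** (1.0 / 3.0)  # edge of an equivalent cube, µm
    return volume_per_fragment, edge_um, edge_um / concatemer_um

volume, edge, ratio = rcr_spacing(volume_ul=1000, n_fragments=1e9, concatemer_um=0.1)
print(f"{volume:.0f} µm^3 per fragment, ~{edge:.1f} µm apart, ~{ratio:.0f}x the concatemer size")
```

The result reproduces the 1000 µm³ (10 × 10 × 10 µm) allocation and the roughly 100-fold ratio between spacing and concatemer size stated above.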
[0395] 3) In comparison to probe arrays, random DNA arrays provide
a better solution for sequencing complex genomes because complex
genomes are broken into millions of parallel low complexity
sequencing reactions. Structured arrays are especially efficient in
providing over 10 billion DNA spots. The DNA array format allows
accurate determination (by counting) of low frequency mRNAs or SNPs
in complex sample pools. For testing 1000 SNPs in such a pool with
that frequency, a unit random array with 10 million DNA fragments
would be sufficient.
[0396] There are also several advantages of rSBH over sequencing by
synthesis even if the latter is done on the same structured
arrays:
[0397] 1) It is based on an efficient and proven probe ligation
biochemistry that is easy to perform in cycles
[0398] 2) Can analyze multiple fragments per grid cell with proper
total sequence complexity
[0399] 3) Has a longer, adjustable read length of 100-1000 bases
[0400] 4) Allows data combination of different probes tested on
different unit arrays prepared from the same DNA sample; this
parallel data acquisition cuts the assay time 4-8 fold
[0401] 5) Allows use of a large number of dyes per cycle (many more
than the maximum of 4 dyes allowed in sequencing by synthesis)
to reduce the number of cycles, i.e. the total assay time
[0402] 6) 11 reads per base by 11 overlapping 11-mers provide
higher accuracy for each DNA strand;
[0403] 7) There is no signal degradation with each consecutive
cycle or in bases following "reading stops"
[0404] 8) Provides partial sequence signature analysis for long DNA
including entire mRNA (2-5 kb) per spot using special probe pools;
an important advantage for efficient gene expression and splice
variant analyses
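The "11 reads per base" figure in item 6 follows from how overlapping k-mers tile a fragment. This minimal sketch assumes that every overlapping k-mer of the fragment is read, so each base's coverage is the number of k-mer windows containing it. Interior bases are then covered k times, and bases within k-1 positions of either end are covered fewer times. The function name `kmer_coverage` is hypothetical.

```python
from typing import List

def kmer_coverage(read_length: int, k: int) -> List[int]:
    """Count how many overlapping k-mers include each position of a read."""
    coverage = [0] * read_length
    for start in range(read_length - k + 1):  # every k-mer start position
        for pos in range(start, start + k):
            coverage[pos] += 1
    return coverage

cov = kmer_coverage(read_length=100, k=11)
print(cov[:12], max(cov))
```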
[0405] The main limitation of rSBH, and of SBH in general, is the
difficulty in determining the exact length of long simple repeats
(ACACACACACAC . . . ), usually those longer than about 10 bases. Special
probes can be used to extend the read length of such sequences.
For example, for reading (A)n repeats, probes
(C,G,T)₃A₆₋₁₀ and A₆₋₁₀(C,G,T)₃, alone or in
combination with (A)₇₋₂₀ spacers, can be used in 10-15
additional ligation cycles to extend the read length of simple
repeats to about 30 bases.
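The probe families for (A)n repeats can be enumerated directly. This sketch reads (C,G,T)₃ as three positions, each drawn from C, G or T, and A₆₋₁₀ as runs of 6 to 10 A's. That interpretation is our assumption. The (A)₇₋₂₀ spacers, probe labeling and ligation chemistry are not modeled, and the function name is hypothetical.

```python
from itertools import product
from typing import Iterable, List

def homopolymer_probes(base: str = "A", flank_length: int = 3,
                       run_lengths: Iterable[int] = range(6, 11)) -> List[str]:
    """Enumerate flank+run and run+flank probes for reading a homopolymer repeat."""
    other_bases = [b for b in "ACGT" if b != base]
    probes: List[str] = []
    for run in run_lengths:
        for flank_bases in product(other_bases, repeat=flank_length):
            flank = "".join(flank_bases)
            probes.append(flank + base * run)  # (C,G,T)3 A6-10: spans the junction into the repeat
            probes.append(base * run + flank)  # A6-10 (C,G,T)3: spans the junction out of the repeat
    return probes

probes = homopolymer_probes()
print(len(probes), probes[:3])
```

Under this reading, each family contains 27 flanks × 5 run lengths = 135 probes, giving 270 in total.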
[0406] Even though SBH does not provide direct positional
information (which base is at which position), sequencing of short
fragments (100-200 bp) in random DNA arrays removes the limitation
of branching points for de novo sequence assembly in rSBH, because
the mathematically proven de novo read length of 11-mers is over 1000
bases. We have used combinatorial probe ligation on HyChip
universal arrays for successful de novo sequencing of DNA samples
100-700 bases in length.
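The role of branching points can be illustrated with a toy reconstruction from a k-mer spectrum. This is a generic SBH walk, not the authors' assembly method. It extends the sequence one base at a time through (k-1)-base overlaps and gives up as soon as more than one extension is possible. It succeeds whenever no (k-1)-mer occurs twice in the fragment. Shorter fragments make such repeats less likely, so for a given k they are less likely to contain a branching point. Repeated k-mers collapse in a spectrum, so this sketch cannot size repeats either. All function names are hypothetical.

```python
from typing import Dict, List, Optional, Set

def kmer_spectrum(seq: str, k: int) -> Set[str]:
    """All overlapping k-mers of seq (the SBH 'spectrum'); k must be >= 2."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def assemble_from_spectrum(spectrum: Set[str]) -> Optional[str]:
    """Walk (k-1)-base overlaps; return None at a branching point or ambiguity."""
    k = len(next(iter(spectrum)))
    by_prefix: Dict[str, List[str]] = {}
    for kmer in spectrum:
        by_prefix.setdefault(kmer[:-1], []).append(kmer)
    suffixes = {kmer[1:] for kmer in spectrum}
    # The start k-mer is the one whose prefix is not any k-mer's suffix.
    starts = [kmer for kmer in spectrum if kmer[:-1] not in suffixes]
    if len(starts) != 1:
        return None  # no unique starting point
    seq, used = starts[0], {starts[0]}
    while True:
        candidates = by_prefix.get(seq[-(k - 1):], [])
        if not candidates:
            break  # reached the end of the fragment
        if len(candidates) > 1:
            return None  # branching point: more than one possible extension
        kmer = candidates[0]
        if kmer in used:
            return None  # cycle caused by a repeat
        used.add(kmer)
        seq += kmer[-1]
    return seq if used == spectrum else None

print(assemble_from_spectrum(kmer_spectrum("ACGTTGCA", 3)))
print(assemble_from_spectrum(kmer_spectrum("ACGTACGA", 3)))
```

The first call recovers its input. The second returns None, because the repeated 2-mer "CG" creates two possible extensions.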
4. METHODS AND APPLICATIONS FOR RSBH AND RCR UTILIZING
TECHNOLOGIES
[0407] 4.1. Genomic Region Isolation
[0408] 4.1.1. Method I. Primer Extension
[0409] Primer extension from a genomic DNA template may be used to
generate a linear amplification of greater than 10 kilobases of
sequence surrounding the genomic region of interest. To create a
population of defined size targets, 20 cycles of linear
amplification will be performed with the forward primer followed by
20 cycles with the reverse primer (FIG. 17). Before applying the
second primer, the first primer can be removed with a standard
column for long DNA purification or degraded if a few uracil bases
are incorporated. A greater number of reverse strands are generated
relative to forward strands resulting in a population of double
stranded molecules and single stranded reverse strands. The reverse
primer for the test DNA is biotinylated for capture on streptavidin
beads, which can be heated to melt any double-stranded homoduplexes
and prevent them from being captured. All attached molecules will be single stranded
and represent one strand of the original genomic DNA. Although
full long-range PCR is an option here, the chance of introducing
base changes by polymerase mis-incorporation and selective
amplification of deleted products is minimized by avoiding
exponential amplification. In addition, the amount of sample
required for downstream random DNA array applications will still be
ample with a linear amplification approach.
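The excess of reverse strands can be illustrated with an idealized count per genomic template. The following assumptions are ours and do not come from the text: each cycle copies every available template with a fixed efficiency, the forward primer is fully removed before the reverse phase, and reverse strands never serve as templates. The function name is hypothetical.

```python
from typing import Tuple

def linear_strand_counts(forward_cycles: int, reverse_cycles: int,
                         efficiency: float = 1.0) -> Tuple[float, float]:
    """Idealized strand counts per genomic template for two-phase linear extension."""
    # Forward phase: only the original genomic strand serves as template.
    forward = forward_cycles * efficiency
    # Reverse phase: the opposite genomic strand plus every forward product are templates.
    reverse_templates = 1 + forward
    reverse = reverse_cycles * efficiency * reverse_templates
    return forward, reverse

forward, reverse = linear_strand_counts(20, 20)
print(forward, reverse, reverse / forward)
```

Under these assumptions, 20 + 20 cycles give about 20 forward and 420 reverse strands per template, a 21-fold excess. After annealing, most reverse strands therefore remain single stranded, as described above.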
[0410] The 10 kb products produced can be fragmented to 0.2-2 kb in
size (effectively releasing them from the solid support) and used
for RCR and random array production or RCR and solution phase
target production for saDNA. In this procedure single stranded DNA
fragments are first treated with terminal transferase to attach a
poly dA tail to the 3-prime end. This is then followed by ligation
of the free end intra-molecularly with the aid of another bridging
oligonucleotide (see section 3.5.1. for a description of the
procedure).
[0411] Once single stranded circles have been formed, a primer for
a strand displacing polymerase such as Phi 29 polymerase can be
used to create a long, linear concatemer of the circle. The
concatemers may then be attached to the surface of a glass slide
for detection with fluorescent probes or labeled and used as
targets on an array. Sequence specific probes may be designed for
specific regions of the attached concatemer, spaced about 1 kb
apart, within the 10 kb of original sequence. In effect this
process will "count" the number of individual molecules that were
produced through a positive or negative hybrid formation to probes
for targeted regions or non-targeted regions.
[0412] 4.1.2. Method II: Large Fragment Capture
[0413] Rare-cutting restriction enzymes may be mapped to the
genomic regions of interest to predict fragment size and the
sequences at the ends of the molecules. One possible enzyme is
NotI, which cleaves the human genome on average every 130 kb and so
would be suitable. Although methylation could affect the
cutting efficiency, if the genomic DNA is from a homogeneous source
then digestion patterns should be complete and not partial.
[0414] To isolate specific fragments from genomic DNA, all released
fragments will first be treated with lambda exonuclease. This
enzyme degrades bases from the 5-prime end of double stranded
templates possessing a 5-prime phosphate. The strand shortening
will be controlled to degrade approximately 50 to 100 bases of one
strand from each end. The single stranded sequence that is revealed
can act as a region of hybridization for a tagged primer for
selection. The primer will be extended with the Stoffel fragment of
Taq polymerase which will extend the strand until it is adjacent to
the 5-prime end of the degraded strand. The newly synthesized
strand can then be ligated to the undigested portion to complete
the strand with a thermostable ligase. One half of the primer
(3-prime) is used for sequence specific extension of the primer and
reconstruction of the strand. The other half (5-prime tagged end)
of the primer contains at least 20 bases of sequence for
hybridization to a complementary sequence attached to the surface
of a microplate well. To remove excess primers the sample will
first be filtered to remove small DNA fragments. The sample is then
hybridized via one end to the surface and non-attached sequences
are removed by gentle washing. Capture of the molecule via one end
allows one level of selection, but release of the captured molecule
and re-capture via the other end provides a second and higher level
of purification and selection.
[0415] After the final release of the 100 kb DNA fragment from the
surface the sample will be digested with a six-base restriction
enzyme. These fragments of 5 to 10 kb can be used for subsequent
DNase fragmentation and circle formation.
[0416] 4.2. Mutation Discovery by Mismatch Enzyme Cleavage
[0417] Several approaches to mutation detection employ a
heteroduplex in which the mismatch itself is utilized for cleavage
recognition. Chemical cleavage with piperidine at mismatches
modified with hydroxylamine or osmium tetroxide provides one
approach to release a cleaved fragment. In a similar way, the
enzymes T7 endonuclease I and T4 endonuclease VII have been used in
the enzyme mismatch cleavage (EMC) technique (6-8).
[0418] Cleavase is used in the cleavage fragment length
polymorphism (CFLP) technique (9) which has been commercialized by
Third Wave Technologies. When single stranded DNA is allowed to
fold and adopt a secondary structure the DNA will form internal
hairpin loops at locations dependent upon the base sequence of the
strand. Cleavase will cut single stranded DNA five-prime of the
loop and the fragments can then be separated by PAGE or similar
size resolving techniques.
[0419] Mismatch binding proteins such as MutS and MutY also rely
upon the formation of heteroduplexes for their ability to identify
mutation sites. Mismatches are usually repaired, but the binding
action of these proteins can be used for the selection of fragments
through a mobility shift in gel electrophoresis or by protection
from exonucleases (10).
[0420] Various factors may affect the specificity of cutting with
the mismatch enzymes such as temperature, pH, salt and possibly
sequence context (11). To demonstrate the ability of the mismatch
enzymes to detect mutations under specific laboratory conditions,
we will use a set of synthetic targets to test for optimal
conditions. Two synthetic targets will be mixed, one of them
containing either a single base change or a deletion of several
bases. The mixture will be heat denatured and re-annealed. The
re-annealed products will be treated with the mismatch detection
enzymes T7 endonuclease I or T4 endonuclease VII to determine the
most effective enzyme. In the case of T7 endonuclease I, a
population of molecules with 5-prime phosphorylated overhangs
surrounding the site of the mutation will be created while T4
endonuclease VII cuts 3-prime of the mismatch. A range of overhang
types will therefore be generated depending on the position of the
cut sites. Gel analysis will display the efficiency of cutting and
re-ligation will display the nature of the overhang.
[0421] 4.2.1. Method I: Capture of Mismatch Cleaved DNA from Primer
Extended Products
[0422] Templates for heteroduplex formation will be prepared by
primer extension from genomic DNA. For the same genomic region of
the reference DNA, an excess of the opposite strand is prepared in
the same way from the test DNA but in a separate reaction. The test
DNA strand produced is biotinylated and will be attached to a
streptavidin support. Homoduplex formation is prevented by heating
and removal of the complementary strand. The reference preparation
is now combined with the single stranded test preparation and
annealed to produce heteroduplexes (FIG. 18). This heteroduplex is
likely to contain a number of mismatches. Residual DNA is washed
away before the addition of the mismatch endonuclease, which, if
there is a mismatch every 1 kb, would produce around 10 fragments
for a 10 kb primer extension. After cleavage, each fragment can
bind an adapter at each end and enter the mismatch-fragment circle
selection process (FIG. 20).
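The "around 10 fragments" estimate is simple bookkeeping of cut positions. This sketch idealizes each mismatch as a clean double-strand cut at that exact position. Real mismatch endonucleases cut near the mismatch and leave overhangs. The mismatch positions are hypothetical, and so is the function name.

```python
from typing import List

def fragment_lengths(total_length: int, cut_positions: List[int]) -> List[int]:
    """Lengths of the pieces left after cutting a duplex at each position."""
    bounds = [0] + sorted(cut_positions) + [total_length]
    return [end - start for start, end in zip(bounds, bounds[1:])]

# Hypothetical: a 10 kb primer-extension product with one mismatch per kb.
mismatch_sites = list(range(1000, 10000, 1000))
pieces = fragment_lengths(10000, mismatch_sites)
print(len(pieces), pieces[:3])
```

Nine evenly spaced mismatches in 10 kb yield ten 1 kb fragments, each of which can then receive an adapter at both ends.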
[0423] 4.2.2. Method II: Capture of Mismatch Cleaved DNA from Large
Genomic Fragments
[0424] The 5-10 kb genomic fragments prepared from large genomic
fragments in section 4.1.2 will be biotinylated by the addition of
a biotinylated dideoxy nucleotide at the 3-prime end with terminal
transferase and excess biotinylated nucleotide will be removed by
filtration. A reference BAC clone that covers the same region of
sequence will be digested with the same six-base cutter to match
the fragments generated from the test DNA. The biotinylated genomic
fragments will be heat denatured in the presence of the BAC
reference DNA and slowly annealed to generate biotinylated
heterohybrids (FIG. 11). The reference BAC DNA is in large excess
to the genomic DNA so the majority of biotinylated products will be
heteroduplexes. The biotinylated DNA can then be attached to the
surface for removal of the reference DNA. Residual DNA is washed
away before the addition of the mismatch endonuclease. After
cleavage, each fragment can bind an adapter at each end and enter
the mismatch circle selection process as outlined in FIG. 13 and
section 4.3.2.
[0425] In addition, it may be possible to use mismatch cleavage of
DNA nanoball probes and hybridized target to identify single base
mutations. Cleaved mismatch hybrids could be identified through
detection of the newly formed DNA ends at the cleavage site by end
specific labeling.
[0426] 4.3. Circle Formation from Mismatch Cleavage Products
4.3.1. Method I
[0427] The heteroduplexes generated in section 4.2.1. can be used
for selection of small DNA circles. In this process the sample is
treated with the mismatch enzyme to create products cleaved on both
strands surrounding the mutation site (FIG. 12). T7 endonuclease I
or similar enzyme will cleave 5-prime of the mutation site to
reveal a 5-prime overhang of varying length on both strands
surrounding the mutation. The next phase is to capture the cleaved
products into a form suitable for amplification and sequencing. An
adapter (A) is ligated to the overhang produced by the mismatch
cutting, but because the nature of the overhang is unknown, at
least three adapters will be needed and each adapter will be
synthesized with degenerate bases to accommodate all possible ends.
The adapter can be prepared with an internal biotin on the
non-circularizing strand to allow capture for buffer exchange and
sample cleanup, and also for direct amplification on the surface if
desired.
[0428] Because the intervening sequence between mutations does not
need to be sequenced and reduces the sequencing capacity of the
system it will be removed when studying genomic derived samples.
Reduction of sequence complexity will utilize a type IIS enzyme
that cuts the DNA at a point away from the enzyme recognition
sequence. In doing so, the cut site and resultant overhangs will be
a combination of all base variants. A possible enzyme to use in
this case is MmeI (20 bases with a 2-base 3' overhang) or EcoP15I
(25 bases with a 2-base 5' overhang). The adapter will be about
50 bp in length to provide sequences for initiation of rolling
circle amplification and also to provide stuffer sequence for circle
formation. Once the adapter has been ligated to the fragment, the
DNA is digested with the type IIS restriction enzyme to release all
but 20-25 bases of sequence containing the mutation site that
remains attached to the adapter.
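The following sketch shows how a type IIS site at the end of the adapter defines a short tag of fragment sequence. It assumes that MmeI recognizes 5'-TCCRAC-3' and cuts the top strand 20 nt 3' of the site. Verify these parameters against the supplier's documentation. The bottom-strand cut that creates the 2-base 3' overhang is not modeled, and sites on the reverse strand are not searched. The adapter and fragment sequences are hypothetical, as are the function names.

```python
import re
from typing import List

# IUPAC codes needed for the example site (subset; extend as needed).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}

def top_strand_cuts(seq: str, site: str, offset: int) -> List[int]:
    """Top-strand cut positions `offset` bases 3' of each site found on this strand."""
    pattern = re.compile("(?=" + "".join(IUPAC[b] for b in site) + ")")  # lookahead allows overlaps
    cuts = []
    for match in pattern.finditer(seq):
        cut = match.start() + len(site) + offset
        if cut <= len(seq):  # the cut must fall within the molecule
            cuts.append(cut)
    return cuts

def retained_tag(adapter: str, fragment: str, site: str, offset: int) -> str:
    """Fragment bases left on the adapter, assuming the first site is the adapter's own."""
    construct = adapter + fragment
    cuts = top_strand_cuts(construct, site, offset)
    if not cuts:
        return ""
    return construct[len(adapter):cuts[0]]

adapter = "GGTTGGTTTCCGAC"  # hypothetical adapter ending in a TCCRAC site (R = G)
fragment = "ACGT" * 10      # hypothetical genomic sequence next to the mismatch cut
print(retained_tag(adapter, fragment, site="TCCRAC", offset=20))
```

Only the first 20 bases of the fragment stay attached to the adapter. This corresponds to the 20-25 base mutation-containing tag described above.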
[0429] The adapter (A)-DNA fragment can now be attached to a
streptavidin support for removal of excess fragment DNA. Excess
adapter that did not ligate to mismatch cleaved ends will also bind
to the streptavidin solid support. The new degenerate end created
by the type IIS enzyme can now be ligated to adapter B through the
phosphorylation of one strand of adapter B. The other strand is
non-phosphorylated and blocked at the 3-prime end with a dideoxy
nucleotide. The structure formed is essentially the genomic
fragment of interest captured between two different adapters. To
create a circle from this structure would simply require both ends
of the molecule coming together and ligating. Although this event
should happen efficiently, there is also the possibility that the
end of an alternative molecule could ligate at the other end of the
molecule creating a dimer molecule, or greater multiples of each
unit molecule. One way to minimize this is to perform the ligation
under dilute conditions so that intra-molecular ligation is
favored, and then to re-concentrate the sample for future steps. An
alternative and preferred strategy to maximize the efficiency of
circle formation without inter-molecular ligation events occurring
is to block excess (A) adapters on the surface. This can be
achieved by using Lambda exonuclease to digest the lower strand. If
adapter B has been attached then it will be protected from
digestion because there is no 5-prime phosphate available. If only
adapter A is attached to the surface then the 5-prime phosphate is
exposed for degradation of the lower strand of adapter A. This will
lead to loss of excess adapter A from the surface.
[0430] After lambda exonuclease treatment the 5 prime end of the
top strand of adapter A is prepared for ligation to the 3-prime end
of adapter B. This can be achieved by introducing a restriction
enzyme site into the adapters so that re-circularization of the
molecule can occur with ligation.
[0431] Amplification of DNA captured into the circular molecules
proceeds by a rolling circle amplification to form long linear
concatemer copies of the circle. If extension initiates 5-prime of
the biotin, the circle and newly synthesized strand are released
into solution. Complementary oligonucleotides on the surface are
responsible for condensation and provide sufficient attachment for
downstream applications. One strand is a closed circle and acts as
the template. The other strand, with an exposed 3-prime end, acts
as an initiating primer and is extended.
[0432] 4.3.2. Method II
[0433] This is similar to the procedure above with the following
modifications as shown in FIG. 21.
[0434] 1) The adapter can be prepared with a 3-prime biotin on the
non-circularized strand to allow capture for buffer exchange and
sample cleanup.
[0435] 2) Reduction of sequence complexity of the 10 kb
heteroduplex fragments described in section 4.2.2 occurs through
the use of 4-base cutting restriction enzymes. Use of 2 or 3
enzymes in a single reaction could reduce the genomic fragment size
down to about 100 bases.
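The "about 100 bases" figure in item 2 follows from expected cut frequencies. This minimal sketch assumes random sequence with equal base frequencies, independent sites and no site overlap between enzymes. Real genomic DNA departs from these assumptions. A 4-base site then occurs on average once every 4⁴ = 256 bases, and the cut rates of several enzymes add. The function name is hypothetical.

```python
from typing import List

def expected_fragment_size(site_lengths: List[int]) -> float:
    """Mean spacing between cuts for enzymes with the given site lengths (random sequence)."""
    cut_rate = sum(0.25 ** n for n in site_lengths)  # expected cuts per base
    return 1.0 / cut_rate

for enzymes in ([4], [4, 4], [4, 4, 4]):
    print(len(enzymes), "enzyme(s):", round(expected_fragment_size(enzymes), 1), "bases")
```

One, two and three 4-base cutters give average fragments of about 256, 128 and 85 bases, so two or three enzymes bring the size to roughly 100 bases.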
[0436] The adapter-DNA fragment can be attached to a streptavidin
support for removal of excess fragment DNA. Excess adapter that did
not ligate to mismatch cleaved ends will also bind to the
streptavidin solid support. The biotinylated and phosphorylated
strand can now be removed by lambda exonuclease which will degrade
from the 5-prime end but leave the non-phosphorylated strand
intact. To create a circle from this structure now requires both
ends of the molecule coming together and ligating to form the
circle.
[0437] Several approaches are available to form the circle using a
bridging oligonucleotide. A polynucleotide can be added to the
3-prime end with terminal transferase to create a sequence for one
half of the bridge to hybridize to. The other half will bind to
sequences in the adapter. Alternatively, before addition of the
exonuclease, an adapter can be added to the end generated by the
4-base cutter which will provide sequence for the bridge to
hybridize to after removal of one strand by exonuclease. A key
aspect of this selection procedure is the ability to select the
strand for circularization and amplification. This ensures that
only the strand with the original mutation (from the 5-prime
overhang) and not the strand from the adapter is amplified. If the
3-prime recessed strand was amplified then a mismatch from the
adapter could create a false base call at the site of or near to
the mutation.
[0438] Amplification of DNA captured into the circular molecules
proceeds by a rolling circle amplification to form linear
concatemer copies of the circle.
[0439] 4.3.3. Alternative Applications of Mismatch-Derived
Circles
[0440] The mismatch-derived small circular DNA molecules may be
amplified by other means such as PCR. Common primer binding sites
can be incorporated into the adapter sequences. The amplified
material can be used for mutation detection by methods such as
Sanger sequencing or array based sequencing.
[0441] 4.4. Cell-Free Clonal Selection of cDNAs
[0442] Traditional methods of cloning have several drawbacks
including the propensity of bacteria to exclude sequences from
plasmid replication and the time consuming and reagent-intensive
protocols required to generate clones of individual cDNA molecules.
We have previously demonstrated the ability to create linear
single-stranded amplifications of DNA molecules that have been
closed into a circular form. These large concatemeric, linear forms
arise from a single molecule and can act as efficient, isolated
targets for PCR when separated into a single reaction chamber, in
much the same way a bacterial colony is picked to retrieve the cDNA
containing plasmid. We plan to develop this approach as a means to
select cDNA clones without having to pass through a cell-based
clonal selection step.
[0443] The first step of this procedure will involve ligating a
gene specific oligonucleotide directed to the 5-prime end with a
poly dA sequence for binding to the poly dT sequence of the 3-prime
end of the cDNA. This oligonucleotide acts as a bridge to allow T4
DNA ligase to ligate the two ends and form a circle.
[0444] The second step of the reaction is to use a primer, or the
bridging oligonucleotide, for a strand displacing polymerase such
as Phi 29 polymerase to create a concatemer of the circle. The long
linear molecules will then be diluted and arrayed in 1536 well
plates such that wells with single molecules can be selected. To
ensure that about 10% of the wells contain 1 molecule, approximately 90%
would have to be sacrificed as having no molecules. To detect the
wells that are positive we plan to hybridize a dendrimer that
recognizes a universal sequence in the target to generate 10K-100K
dye molecules per molecule of target. Excess dendrimer could be
removed through hybridization to biotinylated capture oligos. The
wells will be analyzed with a fluorescent plate reader and the
presence of DNA scored. Positive wells will then be re-arrayed to
consolidate the clones into plates with complete wells for further
amplification.
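The 10%/90% trade-off matches standard limiting-dilution statistics. This sketch assumes that molecules distribute into wells according to a Poisson distribution, a common assumption that the text does not state. It uses a mean of 0.1 molecules per well and a 1536-well plate. The function name is hypothetical.

```python
from math import exp
from typing import Dict

def well_occupancy(mean_molecules_per_well: float) -> Dict[str, float]:
    """Poisson fractions of empty, single-molecule and multi-molecule wells."""
    lam = mean_molecules_per_well
    p_empty = exp(-lam)
    p_single = lam * exp(-lam)
    return {"empty": p_empty, "single": p_single, "multiple": 1.0 - p_empty - p_single}

occupancy = well_occupancy(0.1)
wells = 1536
print({state: round(fraction * wells) for state, fraction in occupancy.items()})
print("fraction of positive wells that are clonal:", round(occupancy["single"] / (1 - occupancy["empty"]), 3))
```

At this dilution, about 90% of wells are empty, about 9% hold one molecule and under 0.5% hold more than one. Roughly 95% of the positive wells are therefore clonal.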
[0445] 4.5. Exon Profiling Using Probe Pools
4.5.1. Process Overview
[0446] The challenge in splice variant profiling remains finding
technologies that are able to probe the presence of exons on
separated cDNA molecules efficiently and rapidly. The system
proposed in this project allows millions of individual cDNA
molecules to be arrayed and probed in parallel. Together with a
carefully designed pooling scheme of short probes from a universal
set, high-throughput and low cost characterization of the splicing
pattern of the whole transcript should be achieved. Main steps in
the proposed process are:
[0447] Prepare full-length first-strand cDNA for targeted or all
mRNAs;
[0448] Circularize the generated full-length (or all) first-strand
cDNA molecules by incorporating an adapter sequence;
[0449] Using a primer complementary to the adapter sequence,
perform rolling circle replication (RCR) of cDNA circles to form
concatemers with over 100 copies of the initial cDNA;
[0450] Prepare random arrays by attaching RCR-produced "cDNA balls" to a glass
surface coated with a capture oligonucleotide complementary to a
portion of the adapter sequence; with an advanced submicron
patterned surface, one mm² can hold between 1 and 10 million cDNA
spots; note that the attachment is a molecular process and does not
require robotic spotting of individual "cDNA balls";
[0451] Starting from pre-made universal libraries of 4096 6-mers and 1024
labeled 5-mers, use a sophisticated computer program and a simple
robotic pipettor to create 40-80 pools of about 200 6-mers and 20
5-mers for testing all 10,000 or more exons in the targeted 1000 or
more (up to all known) genes in the sample organism/tissue;
[0452] In a 4-8 hour process, hybridize/ligate all probe pools in 40-80
cycles on the same random array using an automated microscope-like
instrument with a sensitive 10-megapixel CCD detector that
generates an array image for each cycle;
[0453] Use a sophisticated computer program to perform spot signal intensity
analysis to identify which cDNA is on which spot, and whether any of the
expected exons is missing in any of the analyzed genes. Obtain
exact expression levels for each splice variant by counting
occurrences in the array.
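The final counting step lends itself to a short sketch. It assumes that upstream signal analysis has already decoded each spot into a gene and the set of exons detected on it. Each spot is one cDNA molecule, so tallying exon patterns per gene gives counts per splice variant. The gene name, exon numbers and function names are hypothetical.

```python
from collections import Counter
from typing import FrozenSet, Iterable, List, Tuple

# One decoded spot = one cDNA molecule: (gene, exons detected on that spot).
SpotCall = Tuple[str, FrozenSet[int]]

def count_splice_variants(spots: Iterable[SpotCall]) -> Counter:
    """Tally molecules per (gene, exon pattern); each pattern stands for one splice variant."""
    return Counter((gene, tuple(sorted(exons))) for gene, exons in spots)

def missing_exons(expected: Iterable[int], detected: FrozenSet[int]) -> List[int]:
    """Expected exons that were not detected on a spot."""
    return sorted(set(expected) - detected)

# Hypothetical gene with exons 1-5; two molecules skip exon 3.
spots = [("GENE1", frozenset({1, 2, 3, 4, 5}))] * 3 + [("GENE1", frozenset({1, 2, 4, 5}))] * 2
counts = count_splice_variants(spots)
for (gene, pattern), n in counts.most_common():
    print(gene, pattern, n, "missing:", missing_exons(range(1, 6), frozenset(pattern)))
```

Because each molecule occupies its own spot, the tallies are direct counts rather than ratios inferred from pooled hybridization signal.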
[0454] 4.5.2. Advantages of Studying Alternative Splicing Using
Random Arrays
[0455] This system provides a complete analysis of the exon pattern
on a single transcript, instead of merely providing information on
the ratios of exon usage or quantification of splicing events over
the entire population of transcribed genes using the current
expression arrays hybridized with labeled mRNA/cDNA. At the maximum
limit of its sensitivity, it should be able to allow a detailed
analysis down to a single molecule of an mRNA type present in only
one cell among hundreds of other cells; this would provide unique
potential for early diagnosis of cancer cells.
[0456] The combination of selective cDNA preparation with an "array
of random arrays" in a standard 384-well format and with "smart"
pools of universal short probes provides great flexibility in
designing assays; for example, deep analysis of a small number of
genes in selected samples, or more general analysis in a larger
number of samples, or analysis of a large number of genes in a
smaller number of samples.
[0457] The analysis provides simultaneously 1) detection of each
specific splice variant, 2) quantification of expression of wild
type and alternatively spliced mRNAs. It can also be used to
monitor gross chromosomal alterations based on the detection of
gene deletions and gene translocations by loss of heterozygosity
and presence of two sub-sets of exons from two genes in the same
transcript on a single spot on the random array.
[0458] The exceptional capacity and informativeness of this assay
is coupled with simple sample preparation from very small
quantities of mRNA, fully-automated assay based on all pre-made,
validated reagents including libraries of universal labeled and
unlabeled probes and primers/adapters that will be ultimately
developed for all human and model organism genes.
[0459] The proposed splice variant profiling process is equivalent
to high throughput sequencing of individual full length cDNA
clones; rSBH throughput can reach one billion cDNA molecules
profiled in a 4-8 hour assay.
[0460] This system will provide a powerful tool to monitor changes
in expression levels of various splice variants during disease
emergence and progression. It can enable discovery of novel splice
variants or validate known splice variants to serve as biomarkers
to monitor cancer progression. It can also provide a means to further
understand the roles of alternative splice variants and their
possible uses as therapeutic targets. The universal nature and
flexibility of this low-cost and high-throughput assay provide
great commercial opportunities for cancer research and diagnostics
and in all other biomedical areas. This high capacity system is
ideal for service providing labs or companies.
[0461] 4.5.3. Preparation of Templates for In Vitro
Transcription
[0462] Exon sequences will be cloned into the multiple cloning
sites (MCS) of plasmid pBluescript. For the purposes of
demonstrating the usefulness of the probe pools, it is not
necessary to clone the contiguous full-length sequence, nor to
maintain the proper protein coding frame. For genes that are
shorter than 1 kb, it should not be difficult to generate PCR
products from cDNA using gene specific oligos for the full length
sequence. For longer genes, the easiest approach would be to
generate PCR products of about 500 bp corresponding to contiguous
blocks of exons and order the fragments by cloning into
appropriate cloning sites in the MCS of pBluescript. This will also
be the approach for cloning the alternative spliced versions, since
the desired variant might not be present in the cDNA source used
for PCR.
[0463] The last site of the MCS will be used to insert a string of
40 A's to simulate the polyA tails of cellular mRNA. This is to
control for the possibility that the polyA tail might interfere
with the sample preparation step described below, although it is
not expected to be a problem since a poly-dA tail is actually
incorporated into our standard methods for the sample preparation
of genomic fragments as described in section C.
[0464] Generation of in vitro transcripts will be straightforward.
The plasmid will be linearized, T7 RNA polymerase will be used to
generate the run-off transcripts, and the RNA generated will be
purified by standard methods.
[0465] 4.5.4. Preparation of Samples for Arraying
[0466] Because the probe pools are designed for specific genes,
cDNA will be prepared for those specific genes only. Gene-specific
primers will be used to prime the reverse transcription reactions;
therefore, for 1000 genes, 1000 primers will be used.
[0467] The location of the priming site for the reverse
transcription will be selected with care, since the synthesis of
cDNA longer than 2 kb cannot reasonably be expected to be highly
efficient. It is quite common for the last exon to consist of
the end of the coding sequence and a long 3' untranslated region.
In the case of CD44, for example, although the full-length mRNA is
about 5.7 kb, the 3' UTR comprises 3 kb, while the coding region
is only 2.2 kb. Therefore the logical location of the reverse
transcription primer site would be immediately downstream of the
end of the coding sequence. For some splice variants, the
alternative exons are often clustered together as a block to create
a region of variability. In the case of Tenascin C variants (8.5
kb), the most common isoform has a block of 8 extra exons, and
there is evidence to suggest that exon usage in that region is
variable (12). So for Tenascin C, the primer will be
located just downstream of that region. Because of the concern about
synthesizing cDNA longer than 2 kb, for long genes it might be
necessary to divide the exons into blocks of 2 kb, each with its own
primer. Even though we will lose information on correlations between
splice events that are far apart on the same transcript, this is still
better than introducing a bias that over-represents 3' exons.
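The block-splitting idea can be sketched as follows. This is a minimal illustration, assuming exons are supplied as lengths in transcript order (5' to 3') and that one reverse-transcription primer sits just downstream of each block; the 2 kb limit comes from the text, while the greedy grouping rule and the function name `split_into_rt_blocks` are hypothetical.

```python
from typing import List

MAX_BLOCK_BP = 2000  # upper limit for efficient cDNA synthesis, per the text


def split_into_rt_blocks(exon_lengths: List[int], max_block: int = MAX_BLOCK_BP) -> List[List[int]]:
    """Group consecutive exons into blocks of at most max_block bases.

    exon_lengths lists exon sizes in transcript order (5' -> 3').
    Grouping starts at the 3' end because each reverse-transcription primer
    sits just downstream (3') of its block and synthesis runs toward the 5' end.
    An exon longer than max_block ends up alone in its own block.
    Returns blocks of exon indices, ordered 5' -> 3'.
    """
    blocks: List[List[int]] = []
    current: List[int] = []
    current_len = 0
    for idx in reversed(range(len(exon_lengths))):
        length = exon_lengths[idx]
        # Close the current block if adding this exon would exceed the limit.
        if current and current_len + length > max_block:
            blocks.append(current)
            current, current_len = [], 0
        current.insert(0, idx)  # keep indices in 5' -> 3' order within a block
        current_len += length
    if current:
        blocks.append(current)
    blocks.reverse()  # blocks were collected 3' -> 5'
    return blocks


# Five exons totalling 3.1 kb split into two primer blocks.
print(split_into_rt_blocks([300, 800, 700, 900, 400]))  # [[0, 1], [2, 3, 4]]
```

Under these assumptions, the primer for each block would be placed immediately 3' of the block's last exon.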
[0468] There are many off-the-shelf reagents for the reverse
transcription reactions; the SuperScript III system from Invitrogen
(Carlsbad, Calif.) and the StrataScript system from Stratagene (La
Jolla, Calif.) are two of them. Once single-stranded cDNA
molecules are produced, the remaining procedures involve attaching
the adaptor sequence, circularizing the molecule, and RCR.
All of these have been extensively tested in previous development for
rSBH processes and will follow the protocols developed and described in
earlier sections. The 5' ends of the cDNAs are essentially the
incorporated gene-specific primers used for initiating the reverse
transcription. By incorporating a 7-base universal tag on the 5'
end of the reverse-transcription priming oligos, all the cDNA
generated will carry the same 7-base sequence at the 5' end. Thus a
single template oligo that is complementary to both the adaptor
sequence and the universal tag can be used to ligate the adaptor to
all the target molecules, without using a template oligo with
degenerate bases. The 3' end of the cDNA (5' end of the
mRNA), which is usually ill-defined, will be treated like the
random-sequence end of a genomic fragment. Similar methods of
adding a polyA tail will be applied, so the same circle-closing
reaction will also be used.
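The universal-tag trick can be illustrated with a short sketch. It assumes the adaptor's 3' end is ligated to the cDNA's 5' end (which is the tag), so a single splint spanning that junction serves every cDNA; the tag, adaptor, and primer sequences and the function names are hypothetical.

```python
from typing import Dict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' -> 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def tagged_rt_primers(gene_primers: Dict[str, str], tag: str) -> Dict[str, str]:
    """Prepend the same universal tag to every gene-specific RT primer (5' -> 3')."""
    return {gene: tag + primer for gene, primer in gene_primers.items()}


def splint_oligo(adaptor: str, tag: str, adaptor_overlap: int = 10) -> str:
    """Single template (splint) oligo for ligating the adaptor to every cDNA.

    Assumes the adaptor's 3' end is joined to the cDNA's 5' end, which is the tag,
    so the splint is the reverse complement of the adaptor's last
    `adaptor_overlap` bases followed by the tag.
    """
    junction = adaptor[-adaptor_overlap:] + tag
    return reverse_complement(junction)


TAG = "ACGTTGC"  # hypothetical 7-base universal tag
primers = tagged_rt_primers({"CD44": "GGTTCCAA", "TNC": "CCAAGGTT"}, TAG)  # hypothetical primers
splint = splint_oligo("AAAACCCCGGGGTTTT", TAG, adaptor_overlap=4)  # hypothetical adaptor
print(splint)  # GCAACGTAAAA
```

Because every cDNA begins with the same tag, the splint does not depend on the gene, which is why no degenerate template oligo is needed.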
[0469] Reverse transcriptases are prone to premature termination,
which creates truncated cDNAs. Severely truncated cDNAs probably will not
have enough probe binding sites to be assigned to a gene,
and thus would not be analyzed. cDNA molecules
that are close to, but not quite, full-length will show up as splice
variants with missing 5' exons; if there is no corroborating
evidence from sequence databases to support such variants, they will
be discounted. One way to avoid this problem is to make only
the full-length cDNAs (or those with the desired 3' end)
compatible with the circle-closing reaction, so that any truncated
molecules will be neither circularized nor replicated. First, a
dideoxycytidine residue can be added to the 3' end of all the cDNAs
to block ligation; then, by using a mismatch oligo targeting the
desired sequence, a new 3' end can be generated by enzyme mismatch
cleavage using T4 endonuclease VII (13, 14). With the new 3' end,
the cDNA can proceed to the addition of a poly-dA tail and to the
standard protocols of circularization and replication.
[0470] The rolling circle replication will initiate from an oligo
priming at the adaptor sequence, and the replication around the
circular cDNA molecule will be carried out by Phi29 polymerase,
whose high processivity allows many tandem copies to be made from
circular templates.
[0471] 4.5.5. Smart Pooling Scheme for Exon Probes
[0472] Theoretically, probing for 10,000 exons from 1000 genes on a
single array would require 10,000 specific probes and 10,000 cycles
of hybridization. However, through a combination of the
combinatorial probe ligation techniques developed for the HyChip
platform and a judicious pooling scheme, the number of
oligo probes actually required is significantly smaller, while the
number of hybridization cycles required would be fewer than 40.
[0473] The exon probe will actually consist of a pair of oligos
ligated together upon hybridization to the target (see description
on combinatorial probe ligation chemistry). One of the pair will be
selected from a library of 4096 6 mer oligos and the other will be
from a library of 1024 TAMRA-labeled 5 mer oligos.
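The library sizes are the complete sets of 4^6 = 4096 six-mers and 4^5 = 1024 five-mers, so a ligated pair spans 11 bases and the two libraries together cover all 4^11 (about 4.2 million) 11-base sites. The sketch below, which assumes the unlabeled 6-mer lies on the 5' side of the ligation junction in the probe strand (an orientation the text does not specify), shows how a target site maps to one probe pair; `probe_pair_for_site` is a hypothetical name.

```python
from itertools import product
from typing import Tuple

BASES = "ACGT"
SIX_MERS = ["".join(p) for p in product(BASES, repeat=6)]   # unlabeled library
FIVE_MERS = ["".join(p) for p in product(BASES, repeat=5)]  # labeled (e.g. TAMRA) library
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def probe_pair_for_site(site: str) -> Tuple[str, str]:
    """Split the probe complementary to an 11-base target site into (6-mer, 5-mer).

    Assumes the unlabeled 6-mer lies on the 5' side of the ligation junction
    in the probe strand and the labeled 5-mer on the 3' side.
    """
    if len(site) != 11:
        raise ValueError("a 6-mer + 5-mer pair spans exactly 11 bases")
    probe = site.translate(_COMPLEMENT)[::-1]  # reverse complement, 5' -> 3'
    return probe[:6], probe[6:]


print(len(SIX_MERS), len(FIVE_MERS), len(SIX_MERS) * len(FIVE_MERS))  # 4096 1024 4194304
print(probe_pair_for_site("AAAAACCCCCC"))  # ('GGGGGG', 'TTTTT')
```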
[0474] A software program can be developed for preparing optimized
pools of 6-mer and 5-mer probes for a given set of 1000 genes and
about 10,000 exons. The goal is to keep the number of individual
probes in a pool that detects one exon in each of 500 genes
below 200. The algorithm will consist of two main
steps:
[0475] Step 1: Select the 1000-2000 shortest exons (total about 20-50
kb), and find the matching sequences for each of the 1024 available
labeled 5-mers. On average each 5-mer will occur about 20 times over 20
kb, but some may occur over 50 or over 100 times. By selecting the
most frequent 5-mer, the largest number of short exons will be
detected with a single labeled probe. A goal would be to detect
about 50-100 short exons (10%-20% of 500 exons) per cycle. Thus
fewer than 10 labeled probes and 50-100 unlabeled 6-mers would be
sufficient. A small number of labeled probes is favorable because it
minimizes overall fluorescent background.
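Step 1 can be sketched as a greedy selection over 5-base target sites. This is a minimal illustration, assuming exon sequences are given as strings and that each pick should add the most exons not already detected (a standard greedy set-cover heuristic, substituted here for "select the most frequent 5-mer"); the one-exon-per-gene constraint is not enforced, and all names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def kmer_sites(exons: Dict[str, str], k: int = 5) -> Dict[str, Set[str]]:
    """Map each k-base target site to the IDs of the exons containing it."""
    sites: Dict[str, Set[str]] = defaultdict(set)
    for exon_id, seq in exons.items():
        for i in range(len(seq) - k + 1):
            sites[seq[i:i + k]].add(exon_id)
    return sites


def select_labeled_probes(exons: Dict[str, str], max_probes: int = 10,
                          target_exons: int = 100, k: int = 5) -> Tuple[List[str], Set[str]]:
    """Greedily pick the sites that detect the most not-yet-detected exons.

    Returns the chosen target sites (the labeled probes would be their
    reverse complements) and the set of exons they detect.
    """
    sites = kmer_sites(exons, k)
    chosen: List[str] = []
    covered: Set[str] = set()
    while len(chosen) < max_probes and len(covered) < target_exons:
        # sorted() makes tie-breaking deterministic
        best = max(sorted(sites), key=lambda s: len(sites[s] - covered), default=None)
        if best is None or not (sites[best] - covered):
            break  # no remaining site adds a new exon
        chosen.append(best)
        covered |= sites[best]
    return chosen, covered


exons = {"e1": "AAAAACC", "e2": "CCAAAAAG", "e3": "GGGGG"}  # toy sequences
chosen, covered = select_labeled_probes(exons, target_exons=3)
print(chosen)  # ['AAAAA', 'GGGGG']
```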
[0476] Step 2: Find all 6-mers that are contiguous with the
sites in all 1000 genes that are complementary to the 10 selected
5-mers. On average, 20 such sites will exist in each 2 kb gene.
The total number of sites would be about 20,000, i.e., each 6-mer on
average will occur 5 times. Sort the 6-mers by hit frequency. The
most frequent may have over 20 hits, i.e., such a 6-mer will detect 20
genes through combinations with the 10 labeled probes. Thus, to get a
single probe pair for each of the 500 genes, a minimum of 25 6-mer
probes would be required. Realistically, 100 to 200 6-mers may be
required.
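Step 2 can be sketched in the same way. The numbers in the text follow from simple arithmetic (10 selected 5-mers × about 2 occurrences per 2 kb gene ≈ 20 sites per gene; 20 × 1000 genes ≈ 20,000 sites; 20,000 / 4096 ≈ 5 hits per 6-mer; 500 genes / 20 genes per best 6-mer = 25 6-mers). The sketch below assumes the 6-mer's target site lies immediately 5' of the 5-mer's site on the target strand and again uses a greedy set-cover pick; the names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def six_mer_gene_hits(genes: Dict[str, str], labeled_sites: Iterable[str]) -> Dict[str, Set[str]]:
    """For every occurrence of a selected 5-base site, record the adjacent 6-base site.

    Assumption: the 6-mer's target site lies immediately 5' of the 5-mer's
    site on the target strand. Returns 6-base site -> genes it would detect.
    """
    labeled = set(labeled_sites)
    hits: Dict[str, Set[str]] = defaultdict(set)
    for gene_id, seq in genes.items():
        for i in range(6, len(seq) - 4):  # need 6 bases before and 5 bases from i
            if seq[i:i + 5] in labeled:
                hits[seq[i - 6:i]].add(gene_id)
    return hits


def select_six_mers(hits: Dict[str, Set[str]], max_probes: int = 200) -> Tuple[List[str], Set[str]]:
    """Greedily pick 6-mers, most new genes first, until no gene can be added."""
    chosen: List[str] = []
    covered: Set[str] = set()
    while len(chosen) < max_probes:
        best = max(sorted(hits), key=lambda s: len(hits[s] - covered), default=None)
        if best is None or not (hits[best] - covered):
            break
        chosen.append(best)
        covered |= hits[best]
    return chosen, covered


genes = {"g1": "TTTTTTAAAAA", "g2": "TTTTTTAAAAAC", "g3": "CCCCCCGGGGG"}  # toy sequences
hits = six_mer_gene_hits(genes, ["AAAAA", "GGGGG"])
chosen, covered = select_six_mers(hits)
print(chosen)  # ['TTTTTT', 'CCCCCC']
```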
[0477] Because combinatorial ligation uses pre-made
libraries of 6-mer and 5-mer probes, we can quickly prepare 40 probe
pools with about 200 probes per pool using established pipetting
robotics. Because the information generated is equivalent to having
over 3 probes per exon (see below), the use of 8000 5-mers
and 6-mers (40 pools of 200 probes) effectively replaces the 30,000
longer exon-specific probes required for a single set of 1000 genes.
Universal short probe libraries would be sufficient to prepare pool sets for
hundreds of projects, each examining a different selection of
1000 genes in thousands of samples.
[0478] 4.5.6. Exon Profiling
[0479] The profiling of exons can be performed in two phases: the
gene identification phase and the exon identification phase. In the
gene identification phase, each concatemer on the array can be
uniquely identified with a particular gene. In theory, 10 probe
pools or hybridization cycles will be enough to identify 1000 genes
using the following scheme. Each gene is assigned a unique binary
code. The number of binary digits thus depends on the total number
of genes: 3 digits for 8 genes, 10 digits for 1024 genes. Each
probe pool is designed to correspond to one digit of the binary code
and would contain probes that hit a unique combination of
half of the genes, with only one hit per gene. Thus, in each
hybridization cycle, a unique half of the genes will score 1 for
that digit and the other half will score 0. Ten hybridization
cycles with 10 probe pools will generate 1024 unique binary codes,
enough to assign 1000 unique genes to all the concatemers on the
array. To provide redundancy in the identification data, 15-20
cycles would be used. If 20 cycles are used, they would provide about 1
million (2^20) unique binary codes, and there should be enough information
to account for loss of signals due to missing exons or gene
deletions. This will also be equivalent to having 10 data points per
gene (20 cycles of 500 data points each give 10,000 data points
total), or one positive probe pair per exon, on average. At this
point, after 20 cycles, the code space is large enough to assign
about 1 million unique gene identities to the ampliots. Therefore, by
counting the gene identities of the ampliots, one can quantitatively
determine the expression level of all the genes (but not the
sub-typing of splice variants) in any given sample.
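The binary coding scheme can be sketched as follows, assuming each gene's code is simply its index in a list and that an ampliot's signal in cycle d is "on" exactly when its gene is in pool d. All function names are hypothetical, and the 15-20-cycle redundancy described in the text is not modeled. One caveat of this naive assignment is that the gene with code 0 gives no signal in any cycle and would be indistinguishable from an empty site, which is one reason extra cycles help.

```python
from typing import Dict, List, Optional, Sequence, Set


def n_digits(n_genes: int) -> int:
    """Binary digits (= probe pools / cycles) needed for n_genes distinct codes."""
    return max(1, (n_genes - 1).bit_length())


def assign_codes(genes: Sequence[str]) -> Dict[str, int]:
    """Give each gene a unique integer; its binary digits are the gene's code."""
    return {gene: i for i, gene in enumerate(genes)}


def build_pools(codes: Dict[str, int], digits: int) -> List[Set[str]]:
    """Pool d contains one probe pair for each gene whose code has bit d set."""
    return [{g for g, c in codes.items() if (c >> d) & 1} for d in range(digits)]


def decode(signals: Sequence[bool], codes: Dict[str, int]) -> Optional[str]:
    """Convert an ampliot's per-cycle on/off signals back into a gene assignment."""
    value = sum(1 << d for d, on in enumerate(signals) if on)
    by_code = {c: g for g, c in codes.items()}
    return by_code.get(value)


genes = [f"gene{i}" for i in range(1000)]
codes = assign_codes(genes)
pools = build_pools(codes, n_digits(len(genes)))  # 10 pools
# Simulate the 10 hybridization cycles for an ampliot derived from gene37.
signals = [("gene37" in pool) for pool in pools]
print(n_digits(8), len(pools), decode(signals, codes))  # 3 10 gene37
```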
[0480] After identifying each ampliot with a gene assignment, its
exon pattern will be profiled in the exon identification phase. For
the exon identification phase, one exon per gene in all or most of
the genes is tested per hybridization cycle. In most cases 10-20
exon identification cycles should be sufficient. Thus, in the case
of using 20 exon identification cycles, we will obtain information
from 2 probes for each of 10 exons in each gene. For genes with more
than 20 exons, methods can be developed so that 2 exons per gene
can be probed in the same cycle. One possibility is using multiple
fluorophores of different colors, and another possibility is to
exploit differential hybrid stabilities of different ligation probe
pairs.
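Once an ampliot has a gene assignment, per-exon calls can be made by pooling all probe-pair results for each exon. The sketch below is a minimal illustration, assuming a simple majority rule (an exon is called present when at least half of its probe pairs gave a signal); the threshold and the name `call_exons` are hypothetical, not the document's method. With one probe pair per exon from the gene identification phase and two from the exon identification phase, a 2-of-3 result passes and a 1-of-3 result fails.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def call_exons(observations: Iterable[Tuple[str, bool]],
               min_fraction: float = 0.5) -> Dict[str, bool]:
    """Call each exon present (True) or absent (False) for one ampliot.

    observations: (exon_id, signal_detected) pairs, one per probe pair tested
    on that exon across all cycles. min_fraction is a hypothetical threshold.
    """
    tested: Dict[str, int] = defaultdict(int)
    positive: Dict[str, int] = defaultdict(int)
    for exon_id, detected in observations:
        tested[exon_id] += 1
        positive[exon_id] += int(detected)
    return {e: positive[e] / tested[e] >= min_fraction for e in tested}


observations = [("e1", True), ("e1", True), ("e1", False),
                ("e2", False), ("e2", False), ("e2", True)]
print(call_exons(observations))  # {'e1': True, 'e2': False}
```

An exon called absent would then be a candidate for alternative splicing or a chromosomal deletion.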
[0481] In conclusion, a total of about 40 assay cycles will provide
sufficient information to obtain gene identity at each spot and to
provide three matching probe-pairs for each of 10,000 exons with
enough informational redundancy to provide accurate identification
of missing exons due to alternative splicing or chromosomal
deletions.
5. LITERATURE CITED
[0482] 1. Dahl, F., Baner, J., Gullberg, M., Mendel-Hartvig, M., Landegren, U., and Nilsson, M. 2004. Circle-to-circle amplification for precise and sensitive DNA analysis. Proc Natl Acad Sci USA 101:4548-4553.
[0483] 2. Kwak, S. K., Lee, G. S., Ahn, D. J., and Choi, J. W. 2004. Pattern formation of cytochrome c by microcontact printing and dip-pen nanolithography. Materials Science and Engineering C 24:151-155.
[0484] 3. Drmanac, S., Stavropoulos, N. A., Labat, I., Vonau, J., Hauser, B., Soares, M. B., and Drmanac, R. 1996. Gene-representing cDNA clusters defined by hybridization of 57,419 clones from infant brain libraries with short oligonucleotide probes. Genomics 37:29-40.
[0485] 4. Tokunaga, M., Kitamura, K., Saito, K., Iwane, A. H., and Yanagida, T. 1997. Single molecule imaging of fluorophores and enzymatic reactions achieved by objective-type total internal reflection fluorescence microscopy. Biochem Biophys Res Commun 235:47-53.
[0486] 5. Blanco, L., Bernad, A., Lazaro, J. M., Martin, G., Garmendia, C., and Salas, M. 1989. Highly efficient DNA synthesis by the phage phi 29 DNA polymerase. Symmetrical mode of DNA replication. J Biol Chem 264:8935-8940.
[0487] 6. Youil, R., Kemper, B. W., and Cotton, R. G. 1995. Screening for mutations by enzyme mismatch cleavage with T4 endonuclease VII. Proc Natl Acad Sci USA 92:87-91.
[0488] 7. Mashal, R. D., Koontz, J., and Sklar, J. 1995. Detection of mutations by cleavage of DNA heteroduplexes with bacteriophage resolvases. Nat Genet 9:177-183.
[0489] 8. Babon, J. J., McKenzie, M., and Cotton, R. G. 2003. The use of resolvases T4 endonuclease VII and T7 endonuclease I in mutation detection. Mol Biotechnol 23:73-81.
[0490] 9. Rossetti, S., Englisch, S., Bresin, E., Pignatti, P. F., and Turco, A. E. 1997. Detection of mutations in human genes by a new rapid method: cleavage fragment length polymorphism analysis (CFLPA). Mol Cell Probes 11:155-160.
[0491] 10. Ellis, L. A., Taylor, G. R., Banks, R., and Baumberg, S. 1994. MutS binding protects heteroduplex DNA from exonuclease digestion in vitro: a simple method for detecting mutations. Nucleic Acids Res 22:2710-2711.
[0492] 11. Golz, S., Greger, B., and Kemper, B. 1998. Enzymatic mutation detection. Phosphate ions increase incision efficiency of endonuclease VII at a variety of damage sites in DNA. Mutat Res 382:85-92.
[0493] 12. Dueck, M., Riedl, S., Hinz, U., Tandara, A., Moller, P., Herfarth, C., and Faissner, A. 1999. Detection of tenascin-C isoforms in colorectal mucosa, ulcerative colitis, carcinomas and liver metastases. Int J Cancer, pp. 477-483.
[0494] 13. Youil, R., Kemper, B. W., and Cotton, R. G. 1995. Screening for mutations by enzyme mismatch cleavage with T4 endonuclease VII. Proc Natl Acad Sci USA 92:87-91.
[0495] 14. Mashal, R. D., Koontz, J., and Sklar, J. 1995. Detection of mutations by cleavage of DNA heteroduplexes with bacteriophage resolvases. Nat Genet 9:177-183.
Definitions
[0496] Terms and symbols of nucleic acid chemistry, biochemistry,
genetics, and molecular biology used herein follow those of
standard treatises and texts in the field, e.g. Kornberg and Baker,
DNA Replication, Second Edition (W.H. Freeman, New York, 1992);
Lehninger, Biochemistry, Second Edition (Worth Publishers, New
York, 1975); Strachan and Read, Human Molecular Genetics, Second
Edition (Wiley-Liss, New York, 1999); Eckstein, editor,
Oligonucleotides and Analogs: A Practical Approach (Oxford
University Press, New York, 1991); Gait, editor, Oligonucleotide
Synthesis: A Practical Approach (IRL Press, Oxford, 1984); and the
like.
[0497] "Amplicon" means the product of a polynucleotide
amplification reaction. That is, it is a population of
polynucleotides, usually double stranded, that are replicated from
one or more starting sequences. The one or more starting sequences
may be one or more copies of the same sequence, or they may be a
mixture of different sequences. Amplicons may be produced by a
variety of amplification reactions whose products are multiple
replicates of one or more target nucleic acids. Generally,
amplification reactions producing amplicons are "template-driven"
in that the reactants, either nucleotides or
oligonucleotides, base pair with complements in a template polynucleotide
that are required for the creation of reaction products. In one
aspect, template-driven reactions are primer extensions with a
nucleic acid polymerase or oligonucleotide ligations with a nucleic
acid ligase. Such reactions include, but are not limited to,
polymerase chain reactions (PCRs), linear polymerase reactions,
nucleic acid sequence-based amplification (NASBAs), rolling circle
amplifications, and the like, disclosed in the following references
that are incorporated herein by reference: Mullis et al, U.S. Pat.
Nos. 4,683,195; 4,965,188; 4,683,202; 4,800,159 (PCR); Gelfand et
al, U.S. Pat. No. 5,210,015 (real-time PCR with "taqman" probes);
Wittwer et al, U.S. Pat. No. 6,174,670; Kacian et al, U.S. Pat. No.
5,399,491 ("NASBA"); Lizardi, U.S. Pat. No. 5,854,033; Aono et al,
Japanese patent publ. JP 4-262799 (rolling circle amplification);
and the like. In one aspect, amplicons of the invention are
produced by PCRs. An amplification reaction may be a "real-time"
amplification if a detection chemistry is available that permits a
reaction product to be measured as the amplification reaction
progresses, e.g. "real-time PCR" described below, or "real-time
NASBA" as described in Leone et al, Nucleic Acids Research, 26:
2150-2155 (1998), and like references. As used herein, the term
"amplifying" means performing an amplification reaction. A
"reaction mixture" means a solution containing all the necessary
reactants for performing a reaction, which may include, but not be
limited to, buffering agents to maintain pH at a selected level
during a reaction, salts, co-factors, scavengers, and the like.
[0498] "Complementary or substantially complementary" refers to the
hybridization or base pairing or the formation of a duplex between
nucleotides or nucleic acids, such as, for instance, between the
two strands of a double stranded DNA molecule or between an
oligonucleotide primer and a primer binding site on a single
stranded nucleic acid. Complementary nucleotides are, generally, A
and T (or A and U), or C and G. Two single stranded RNA or DNA
molecules are said to be substantially complementary when the
nucleotides of one strand, optimally aligned and compared and with
appropriate nucleotide insertions or deletions, pair with at least
about 80% of the nucleotides of the other strand, usually at least
about 90% to 95%, and more preferably from about 98 to 100%.
Alternatively, substantial complementarity exists when an RNA or
DNA strand will hybridize under selective hybridization conditions
to its complement. Typically, selective hybridization will occur
when there is at least about 65% complementarity over a stretch of at
least 14 to 25 nucleotides, preferably at least about 75%, more
preferably at least about 90% complementary. See, M. Kanehisa
Nucleic Acids Res. 12:203 (1984), incorporated herein by
reference.
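The percentage criterion above can be illustrated with a short sketch. It assumes the two strands are already optimally aligned end to end with no insertions or deletions (the definition allows gaps, which this sketch does not handle) and counts only A-T, A-U, and G-C pairs; `percent_complementary` is a hypothetical name.

```python
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}


def percent_complementary(strand1: str, strand2: str) -> float:
    """Percent of positions where two aligned, antiparallel strands form Watson-Crick pairs.

    Both strands are written 5' -> 3' and assumed already optimally aligned
    end to end with no insertions or deletions.
    """
    if not strand1 or len(strand1) != len(strand2):
        raise ValueError("strands must be non-empty and of equal length")
    partner = strand2.upper()[::-1]  # read strand2 3' -> 5' to face strand1
    pairs = sum((a, b) in PAIRS for a, b in zip(strand1.upper(), partner))
    return 100.0 * pairs / len(strand1)


print(percent_complementary("ACGTACGTAC", "GTACGTACGT"))  # 100.0
print(percent_complementary("ACGTACGTAC", "GTACGTACGA"))  # 90.0
```

By the first criterion in the definition, both example pairs would count as substantially complementary (at least about 80%).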
[0499] "Duplex" means at least two oligonucleotides and/or
polynucleotides that are fully or partially complementary and that undergo
Watson-Crick type base pairing among all or most of their
nucleotides so that a stable complex is formed. The terms
"annealing" and "hybridization" are used interchangeably to mean
the formation of a stable duplex. "Perfectly matched" in reference
to a duplex means that the poly- or oligonucleotide strands making
up the duplex form a double stranded structure with one another
such that every nucleotide in each strand undergoes Watson-Crick
base pairing with a nucleotide in the other strand. The term
"duplex" comprehends the pairing of nucleoside analogs, such as
deoxyinosine, nucleosides with 2-aminopurine bases, PNAs, and the
like, that may be employed. A "mismatch" in a duplex between two
oligonucleotides or polynucleotides means that a pair of
nucleotides in the duplex fails to undergo Watson-Crick
bonding.
[0500] "Genetic locus," or "locus" in reference to a genome or
target polynucleotide, means a contiguous subregion or segment of
the genome or target polynucleotide. As used herein, genetic locus,
or locus, may refer to the position of a nucleotide, a gene, or a
portion of a gene in a genome, including mitochondrial DNA, or it
may refer to any contiguous portion of genomic sequence whether or
not it is within, or associated with, a gene. In one aspect, a
genetic locus refers to any portion of genomic sequence, including
mitochondrial DNA, from a single nucleotide to a segment of a few
hundred nucleotides, e.g. 100-300, in length.
[0501] "Genetic variant" means a substitution, inversion,
insertion, or deletion of one or more nucleotides at a genetic locus,
or a translocation of DNA from one genetic locus to another genetic
locus. In one aspect, genetic variant means an alternative
nucleotide sequence at a genetic locus that may be present in a
population of individuals and that includes nucleotide
substitutions, insertions, and deletions with respect to other
members of the population. In another aspect, insertions or
deletions at a genetic locus comprise the addition or the absence
of from 1 to 10 nucleotides at such locus, in comparison with the
same locus in another individual of a population.
[0502] "Hybridization" refers to the process in which two
single-stranded polynucleotides bind non-covalently to form a
stable double-stranded polynucleotide. The term "hybridization" may
also refer to triple-stranded hybridization. The resulting
(usually) double-stranded polynucleotide is a "hybrid" or "duplex."
"Hybridization conditions" will typically include salt
concentrations of less than about 1M, more usually less than about
500 mM and less than about 200 mM. A "hybridization buffer" is a
buffered salt solution such as 5×SSPE, or the like.
Hybridization temperatures can be as low as 5 °C, but are
typically greater than 22 °C, more typically greater than
about 30 °C, and preferably in excess of about 37 °C.
Hybridizations are usually performed under stringent conditions,
i.e. conditions under which a probe will hybridize to its target
subsequence. Stringent conditions are sequence-dependent and are
different in different circumstances. Longer fragments may require
higher hybridization temperatures for specific hybridization. As
other factors may affect the stringency of hybridization, including
base composition and length of the complementary strands, presence
of organic solvents and extent of base mismatching, the combination
of parameters is more important than the absolute measure of any
one alone. Generally, stringent conditions are selected to be about
5 °C lower than the Tm for the specific sequence at a
defined ionic strength and pH. Exemplary stringent conditions
include a Na ion concentration (or other salts) of at least 0.01 M
and no more than 1 M at pH 7.0 to 8.3 and a
temperature of at least 25 °C. For example, conditions of
5×SSPE (750 mM NaCl, 50 mM sodium phosphate, 5 mM EDTA, pH 7.4)
and a temperature of 25-30 °C are suitable for
allele-specific probe hybridizations. For stringent conditions, see,
for example, Sambrook, Fritsch and Maniatis, "Molecular Cloning: A
Laboratory Manual," 2nd Ed., Cold Spring Harbor Press (1989) and
Anderson, "Nucleic Acid Hybridization," 1st Ed., BIOS Scientific
Publishers Limited (1999), which are hereby incorporated by
reference in their entirety for all purposes above. "Hybridizing
specifically to" or "specifically hybridizing to" or like
expressions refer to the binding, duplexing, or hybridizing of a
molecule substantially to or only to a particular nucleotide
sequence or sequences under stringent conditions when that sequence
is present in a complex mixture (e.g., total cellular) DNA or
RNA.
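As a rough illustration of choosing a hybridization temperature about 5 °C below Tm, the sketch below estimates Tm with the Wallace rule (2 °C per A or T plus 4 °C per G or C). This is a swapped-in approximation, not a method stated in this document: it is only a crude guide for short oligonucleotides in high-salt buffer and ignores nearest-neighbor stacking, salt concentration, and mismatches. The function names and probe sequence are hypothetical.

```python
def wallace_tm(oligo: str) -> float:
    """Wallace-rule Tm estimate in °C: 2 °C per A or T plus 4 °C per G or C."""
    seq = oligo.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2.0 * at + 4.0 * gc


def stringent_temperature(oligo: str, offset: float = 5.0) -> float:
    """Hybridization temperature about `offset` °C below the estimated Tm."""
    return wallace_tm(oligo) - offset


probe = "ACGTACGTACGTACGT"  # hypothetical 16-mer
print(wallace_tm(probe), stringent_temperature(probe))  # 48.0 43.0
```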
[0503] "Ligation" means to form a covalent bond or linkage between
the termini of two or more nucleic acids, e.g. oligonucleotides
and/or polynucleotides, in a template-driven reaction. The nature
of the bond or linkage may vary widely and the ligation may be
carried out enzymatically or chemically. As used herein, ligations
are usually carried out enzymatically to form a phosphodiester
linkage between a 5' carbon of a terminal nucleotide of one
oligonucleotide and a 3' carbon of another oligonucleotide. A
variety of template-driven ligation reactions are described in the
following references, which are incorporated by reference: Whitely
et al, U.S. Pat. No. 4,883,750; Letsinger et al, U.S. Pat. No.
5,476,930; Fung et al, U.S. Pat. No. 5,593,826; Kool, U.S. Pat. No.
5,426,180; Landegren et al, U.S. Pat. No. 5,871,921; Xu and Kool,
Nucleic Acids Research, 27: 875-881 (1999); Higgins et al, Methods
in Enzymology, 68: 50-71 (1979); Engler et al, The Enzymes, 15:
3-29 (1982); and Namsaraev, U.S. patent publication 2004/0110213.
Enzymatic ligation usually takes place in a ligase buffer, which is
a buffered salt solution containing any required divalent cations,
cofactors, and the like, for the particular ligase employed.
[0504] "Microarray" or "array" refers to a solid phase support
having a surface, usually planar or substantially planar, which
carries an array of sites containing nucleic acids, such that each
member site of the array comprises identical copies of immobilized
oligonucleotides or polynucleotides and is spatially defined and
not overlapping with other member sites of the array; that is, the
sites are spatially discrete. In some cases, sites of a microarray
may also be spaced apart as well as discrete; that is, different
sites do not share boundaries, but are separated by inter-site
regions, usually free of bound nucleic acids. Spatially defined
hybridization sites may additionally be "addressable" in that their
locations and the identities of their immobilized oligonucleotides are
known or predetermined, for example, prior to its use. In some
aspects, the oligonucleotides or polynucleotides are single
stranded and are covalently attached to the solid phase support,
usually by a 5'-end or a 3'-end. In other aspects, oligonucleotides
or polynucleotides are attached to the solid phase support
non-covalently, e.g. by a biotin-streptavidin linkage,
hybridization to a capture oligonucleotide that is covalently
bound, and the like. Conventional microarray technology is reviewed
in the following references: Schena, Editor, Microarrays: A
Practical Approach (IRL Press, Oxford, 2000); Southern, Current
Opin. Chem. Biol., 2: 404-410 (1998); Nature Genetics Supplement,
21: 1-60 (1999). As used herein, "random array" or "random
microarray" refers to a microarray whose spatially discrete regions
of oligonucleotides or polynucleotides are not spatially addressed.
That is, the identity of the attached oligonucleotides or
polynucleotides is not discernible, at least initially, from its
location, but may be determined by a particular operation on the
array, e.g. sequencing, hybridizing decoding probes, or the like.
Random microarrays are frequently formed from a planar array of
microbeads, e.g. Brenner et al, Nature Biotechnology, 18: 630-634
(2000); Tulley et al, U.S. Pat. No. 6,133,043; Stuelpnagel et al,
U.S. Pat. No. 6,396,995; Chee et al, U.S. Pat. No. 6,544,732; and
the like.
[0505] "Mismatch" means a base pair between any two of the bases A,
T (or U for RNA), G, and C other than the Watson-Crick base pairs
G-C and A-T. The eight possible mismatches are A-A, T-T, G-G, C-C,
T-G, C-A, T-C, and A-G.
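The count of eight follows from simple combinatorics: four bases give 10 unordered pairs (including identical pairs), and removing the two Watson-Crick pairs leaves 8. A small sketch, with hypothetical names, enumerates them:

```python
from itertools import combinations_with_replacement
from typing import List

BASES = "ATGC"
WATSON_CRICK = {frozenset("AT"), frozenset("GC")}


def mismatches() -> List[str]:
    """All unordered base pairs that are not Watson-Crick pairs."""
    return [f"{a}-{b}" for a, b in combinations_with_replacement(BASES, 2)
            if frozenset((a, b)) not in WATSON_CRICK]


print(len(mismatches()), mismatches())
# 8 ['A-A', 'A-G', 'A-C', 'T-T', 'T-G', 'T-C', 'G-G', 'C-C']
```

Pairs such as A-C and C-A are the same unordered mismatch, so this list matches the eight named in the definition.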
[0506] "Mutation" and "polymorphism" are usually used somewhat
interchangeably to mean a DNA molecule, such as a gene, that
differs in nucleotide sequence from a reference DNA sequence, or
wild type sequence, or normal tissue sequence, by one or more
bases, insertions, and/or deletions. In some contexts, the usage of
Cotton (Mutation Detection, Oxford University Press, Oxford, 1997)
is followed in that a mutation is understood to be any base change
whether pathological to an organism or not, whereas a polymorphism
is usually understood to be a base change with no direct
pathological consequences.
[0507] "Nucleoside" as used herein includes the natural
nucleosides, including 2'-deoxy and 2'-hydroxyl forms, e.g. as
described in Kornberg and Baker, DNA Replication, 2nd Ed. (Freeman,
San Francisco, 1992). "Analogs" in reference to nucleosides
includes synthetic nucleosides having modified base moieties and/or
modified sugar moieties, e.g. described by Scheit, Nucleotide
Analogs (John Wiley, New York, 1980); Uhlman and Peyman, Chemical
Reviews, 90: 543-584 (1990), or the like, with the proviso that
they are capable of specific hybridization. Such analogs include
synthetic nucleosides designed to enhance binding properties,
reduce complexity, increase specificity, and the like.
Polynucleotides comprising analogs with enhanced hybridization or
nuclease resistance properties are described in Uhlman and Peyman
(cited above); Crooke et al, Exp. Opin. Ther. Patents, 6: 855-870
(1996); Mesmaeker et al, Current Opinion in Structural Biology, 5:
343-355 (1995); and the like. Exemplary types of polynucleotides
that are capable of enhancing duplex stability include
oligonucleotide N3'→P5' phosphoramidates (referred to herein
as "amidates"), peptide nucleic acids (referred to herein as
"PNAs"), oligo-2'-O-alkylribonucleotides, polynucleotides
containing C-5 propynylpyrimidines, locked nucleic acids (LNAs),
and like compounds. Such oligonucleotides are either available
commercially or may be synthesized using methods described in the
literature.
[0508] "Polymerase chain reaction," or "PCR," means a reaction for
the in vitro amplification of specific DNA sequences by the
simultaneous primer extension of complementary strands of DNA. In
other words, PCR is a reaction for making multiple copies or
replicates of a target nucleic acid flanked by primer binding
sites, such reaction comprising one or more repetitions of the
following steps: (i) denaturing the target nucleic acid, (ii)
annealing primers to the primer binding sites, and (iii) extending
the primers by a nucleic acid polymerase in the presence of
nucleoside triphosphates. Usually, the reaction is cycled through
different temperatures optimized for each step in a thermal cycler
instrument. Particular temperatures, durations at each step, and
rates of change between steps depend on many factors well-known to
those of ordinary skill in the art, e.g. exemplified by the
references: McPherson et al, editors, PCR: A Practical Approach and
PCR2: A Practical Approach (IRL Press, Oxford, 1991 and 1995,
respectively). For example, in a conventional PCR using Taq DNA
polymerase, a double stranded target nucleic acid may be denatured
at a temperature >90 °C, primers annealed at a
temperature in the range 50-75 °C, and primers extended at
a temperature in the range 72-78 °C. The term "PCR"
encompasses derivative forms of the reaction, including but not
limited to, RT-PCR, real-time PCR, nested PCR, quantitative PCR,
multiplexed PCR, and the like. Reaction volumes range from a few
hundred nanoliters, e.g. 200 nL, to a few hundred microliters, e.g. 200
µL. "Reverse transcription PCR," or "RT-PCR," means a PCR that is
preceded by a reverse transcription reaction that converts a target
RNA to a complementary single stranded DNA, which is then
amplified, e.g. Tecott et al, U.S. Pat. No. 5,168,038, which patent
is incorporated herein by reference. "Real-time PCR" means a PCR
for which the amount of reaction product, i.e. amplicon, is
monitored as the reaction proceeds. There are many forms of
real-time PCR that differ mainly in the detection chemistries used
for monitoring the reaction product, e.g. Gelfand et al, U.S. Pat.
No. 5,210,015 ("taqman"); Wittwer et al, U.S. Pat. Nos. 6,174,670
and 6,569,627 (intercalating dyes); Tyagi et al, U.S. Pat. No.
5,925,517 (molecular beacons); which patents are incorporated
herein by reference. Detection chemistries for real-time PCR are
reviewed in Mackay et al, Nucleic Acids Research, 30: 1292-1305
(2002), which is also incorporated herein by reference. "Nested
PCR" means a two-stage PCR wherein the amplicon of a first PCR
becomes the sample for a second PCR using a new set of primers, at
least one of which binds to an interior location of the first
amplicon. As used herein, "initial primers" in reference to a
nested amplification reaction mean the primers used to generate a
first amplicon, and "secondary primers" mean the one or more
primers used to generate a second, or nested, amplicon.
"Multiplexed PCR" means a PCR wherein multiple target sequences (or
a single target sequence and one or more reference sequences) are
simultaneously amplified in the same reaction mixture, e.g.
Bernard et al, Anal. Biochem., 273: 221-228 (1999) (two-color
real-time PCR). Usually, distinct sets of primers are employed for
each sequence being amplified.
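The "multiple copies" produced by repeated cycles follow an exponential model. The sketch below assumes an idealized constant per-cycle efficiency E (1.0 meaning perfect doubling), so the copy number after n cycles is N0·(1 + E)^n; it ignores the plateau phase reached in real reactions, and `pcr_copies` is a hypothetical name.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Expected copies after `cycles` of PCR: N = N0 * (1 + E) ** n (idealized, no plateau)."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


print(pcr_copies(1, 10), pcr_copies(100, 3, efficiency=0.5))  # 1024.0 337.5
```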
[0509] "Quantitative PCR" means a PCR designed to measure the
abundance of one or more specific target sequences in a sample or
specimen. Quantitative PCR includes both absolute quantitation and
relative quantitation of such target sequences. Quantitative
measurements are made using one or more reference sequences that
may be assayed separately or together with a target sequence. The
reference sequence may be endogenous or exogenous to a sample or
specimen, and in the latter case, may comprise one or more
competitor templates. Typical endogenous reference sequences
include segments of transcripts of the following genes: β-actin,
GAPDH, β2-microglobulin, ribosomal RNA, and the like. Techniques
for quantitative PCR are well-known to those of ordinary skill in
the art, as exemplified in the following references that are
incorporated by reference: Freeman et al, Biotechniques, 26:
112-126 (1999); Becker-Andre et al, Nucleic Acids Research, 17:
9437-9446 (1989); Zimmerman et al, Biotechniques, 21: 268-279
(1996); Diviacco et al, Gene, 122: 313-320 (1992); and the like.
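Relative quantitation against an endogenous reference sequence is often computed with the comparative threshold-cycle (2^-ΔΔCt) method. The text does not prescribe any particular calculation, so the Python sketch below is only one illustrative option. It assumes that the target and reference amplify with the same efficiency, where 1.0 means perfect doubling per cycle. The function name and Ct values are hypothetical.

```python
def relative_quantity(ct_target_test: float, ct_ref_test: float,
                      ct_target_cal: float, ct_ref_cal: float,
                      efficiency: float = 1.0) -> float:
    """Fold change of a target in a test sample versus a calibrator sample.

    Each sample's target Ct is first normalised to its endogenous reference
    (delta Ct); the two delta Ct values are then compared (delta-delta Ct).
    Assumes target and reference share the same amplification efficiency.
    """
    base = 1.0 + efficiency  # 2.0 when every cycle exactly doubles the product
    delta_test = ct_target_test - ct_ref_test
    delta_cal = ct_target_cal - ct_ref_cal
    return base ** -(delta_test - delta_cal)


# Hypothetical Ct values: the reference crosses threshold at the same cycle in
# both samples, while the target crosses 3 cycles earlier in the test sample.
print(relative_quantity(22.0, 18.0, 25.0, 18.0))  # 8.0
```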
[0510] "Polynucleotide" or "oligonucleotide" are used
interchangeably and each mean a linear polymer of nucleotide
monomers. As used herein, the terms may also refer to double
stranded forms. Monomers making up polynucleotides and
oligonucleotides are capable of specifically binding to a natural
polynucleotide by way of a regular pattern of monomer-to-monomer
interactions, such as Watson-Crick type of base pairing, base
stacking, Hoogsteen or reverse Hoogsteen types of base pairing, or
the like, to form duplex or triplex forms. Such monomers and their
internucleosidic linkages may be naturally occurring or may be
analogs thereof, e.g. naturally occurring or non-naturally
occurring analogs. Non-naturally occurring analogs may include
PNAs, phosphorothioate internucleosidic linkages, bases containing
linking groups permitting the attachment of labels, such as
fluorophores, or haptens, and the like. Whenever the use of an
oligonucleotide or polynucleotide requires enzymatic processing,
such as extension by a polymerase, ligation by a ligase, or the
like, one of ordinary skill would understand that oligonucleotides
or polynucleotides in those instances would not contain certain
analogs of internucleosidic linkages, sugar moieties, or bases at
any or some positions, when such analogs are incompatible with
enzymatic reactions. Polynucleotides typically range in size from a
few monomeric units, e.g. 5-40, when they are usually referred to
as "oligonucleotides," to several thousand monomeric units.
Whenever a polynucleotide or oligonucleotide is represented by a
sequence of letters (upper or lower case), such as "ATGCCTG," it
will be understood that the nucleotides are in 5'→3' order
from left to right and that "A" denotes deoxyadenosine, "C" denotes
deoxycytidine, "G" denotes deoxyguanosine, "T" denotes
thymidine, "I" denotes deoxyinosine, and "U" denotes uridine, unless
otherwise indicated or obvious from context. Unless otherwise noted,
the terminology and atom numbering conventions will follow those
disclosed in Strachan and Read, Human Molecular Genetics 2
(Wiley-Liss, New York, 1999). Usually polynucleotides comprise the
four natural nucleosides (e.g. deoxyadenosine, deoxycytidine,
deoxyguanosine, deoxythymidine for DNA or their ribose counterparts
for RNA) linked by phosphodiester linkages; however, they may also
comprise non-natural nucleotide analogs, e.g. including modified
bases, sugars, or internucleosidic linkages. It is clear to those
skilled in the art that where an enzyme has specific
oligonucleotide or polynucleotide substrate requirements for
activity, e.g. single stranded DNA, RNA/DNA duplex, or the like,
then selection of appropriate composition for the oligonucleotide
or polynucleotide substrates is well within the knowledge of one of
ordinary skill, especially with guidance from treatises, such as
Sambrook et al, Molecular Cloning, Second Edition (Cold Spring
Harbor Laboratory, New York, 1989), and like references.
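Sequences are written 5'→3' from left to right. As a result, the complementary strand, also read 5'→3', is the reverse of the base-by-base complement. The Python sketch below illustrates that convention. The function name is hypothetical. Deoxyinosine ("I") is deliberately left out of the table because it can pair with more than one base.

```python
# Watson-Crick partners for the one-letter codes above; "U" (uridine) pairs with A,
# and "N" (any base) maps to itself.
DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "U": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the complementary DNA strand of `seq`, written 5'->3'.

    Lower-case input is accepted; an unsupported letter such as "I" raises KeyError.
    """
    return "".join(DNA_COMPLEMENT[base] for base in reversed(seq.upper()))


print(reverse_complement("ATGCCTG"))  # CAGGCAT
```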
[0511] "Primer" means an oligonucleotide, either natural or
synthetic, that is capable, upon forming a duplex with a
polynucleotide template, of acting as a point of initiation of
nucleic acid synthesis and being extended from its 3' end along the
template so that an extended duplex is formed. The sequence of
nucleotides added during the extension process is determined by
the sequence of the template polynucleotide. Usually primers are
extended by a DNA polymerase. Primers usually have a length in the
range of from 9 to 40 nucleotides, or in some embodiments, from 14
to 36 nucleotides.
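The template-directed extension described above can be sketched in Python as a run-off reaction. The primer anneals antiparallel where the template carries its reverse complement. Each added nucleotide is then the complement of the next template base toward the template's 5' end. This hypothetical helper assumes a perfectly matched primer and ignores strand displacement and premature termination. The 16-nt template and 9-nt primer are made up for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complementary strand of an upper-case A/C/G/T sequence, written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def extend_primer(primer: str, template: str) -> str:
    """Run-off extension product of `primer` on `template` (both written 5'->3')."""
    # The primer pairs with the template segment equal to its reverse complement;
    # the primer's 3' end sits opposite the 5'-most base of that segment.
    site = template.find(reverse_complement(primer))
    if site < 0:
        raise ValueError("no perfectly complementary primer site on the template")
    # Added bases are dictated by the template: the complement of every template
    # base 5' of the site, read in the direction of synthesis.
    return primer + reverse_complement(template[:site])


template = "GATTACAGGCTCAACG"  # hypothetical template
primer = "CGTTGAGCC"           # hypothetical 9-nt primer
print(extend_primer(primer, template))  # CGTTGAGCCTGTAATC
```

Because this primer binds at the template's 3' end, the product is the full complementary strand of the template.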
[0512] "Readout" means a parameter, or parameters, which are
measured and/or detected that can be converted to a number or
value. In some contexts, readout may refer to an actual numerical
representation of such collected or recorded data. For example, a
readout of fluorescent intensity signals from a microarray is the
position and fluorescence intensity of a signal being generated at
each hybridization site of the microarray; thus, such a readout may
be registered or stored in various ways, for example, as an image
of the microarray, as a table of numbers, or the like.
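The Python sketch below makes the two storage forms concrete. It converts an image-style readout (a 2-D grid of intensities) into a table of site position and intensity. The sketch is hypothetical and assumes each hybridization site maps to a single known pixel. Real image analysis usually integrates signal over several pixels and subtracts background.

```python
from typing import List, Sequence, Tuple

Site = Tuple[int, int]  # (row, column) of a hybridization site in the image


def readout_table(image: Sequence[Sequence[float]],
                  sites: Sequence[Site]) -> List[Tuple[int, int, float]]:
    """Return one (row, column, intensity) record per hybridization site."""
    return [(row, col, float(image[row][col])) for row, col in sites]


# Hypothetical 3 x 3 image with signal at three sites.
image = [
    [0, 120, 0],
    [0, 0, 950],
    [45, 0, 0],
]
for record in readout_table(image, [(0, 1), (1, 2), (2, 0)]):
    print(record)
```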
[0513] "Solid support", "support", and "solid phase support" are
used interchangeably and refer to a material or group of materials
having a rigid or semi-rigid surface or surfaces. In many
embodiments, at least one surface of the solid support will be
substantially flat, although in some embodiments it may be
desirable to physically separate synthesis regions for different
compounds with, for example, wells, raised regions, pins, etched
trenches, or the like. According to other embodiments, the solid
support(s) will take the form of beads, resins, gels, microspheres,
or other geometric configurations. Microarrays usually comprise at
least one planar solid phase support, such as a glass microscope
slide.
[0514] "Reference sequence" or "reference population" of DNA refers
to individual DNA sequences or a collection of DNAs (or RNAs
derived from it) which is compared to a test population of DNA or
RNA (or "test DNA sequence," or "test DNA population") by the
formation of heteroduplexes between the complementary strands of
the reference DNA population and test DNA population. If perfectly
matched heteroduplexes form, then the respective members of the
reference and test populations are identical; otherwise, they are
variants of one another. Typically, the nucleotide sequences of
members of the reference population are known and the sequences
typically are listed in sequence databases, such as GenBank, EMBL,
or the like. In one aspect, a reference population of DNA may
comprise a cDNA library or genomic library from a known cell type
or tissue source. For example, a reference population of DNA may
comprise a cDNA library or a genomic library derived from the
tissue of a healthy individual and a test population of DNA may
comprise a cDNA library or genomic library derived from the same
tissue of a diseased individual. Reference populations of DNA may
also comprise an assembled collection of individual
polynucleotides, cDNAs, genes, or exons thereof, e.g. genes or
exons encoding all or a subset of known p53 variants, genes of a
signal transduction pathway, or the like.
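One way to approximate whether a test member would form a perfectly matched heteroduplex with its reference is to compare the two sequences on the same strand. Every differing position would become a mismatched base pair when one strand anneals to the complement of the other. The Python sketch below assumes equal-length, gap-free sequences, and its helper name is hypothetical. Insertions and deletions would need an alignment step that the sketch does not perform.

```python
from typing import List


def mismatch_positions(reference: str, test: str) -> List[int]:
    """1-based positions where `test` differs from `reference` (same strand, 5'->3').

    An empty list means a perfectly matched heteroduplex would form, i.e. the two
    members are identical; otherwise they are variants of one another.
    """
    if len(reference) != len(test):
        raise ValueError("this sketch handles only equal-length, ungapped sequences")
    return [i + 1 for i, (r, t) in enumerate(zip(reference.upper(), test.upper())) if r != t]


print(mismatch_positions("ggatgttagg", "ggatgctagg"))  # [6]
```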
[0515] "Specific" or "specificity" in reference to the binding of
one molecule to another molecule, such as a labeled target sequence
for a probe, means the recognition, contact, and formation of a
stable complex between the two molecules, together with
substantially less recognition, contact, or complex formation of
that molecule with other molecules. In one aspect, "specific" in
reference to the binding of a first molecule to a second molecule
means that to the extent the first molecule recognizes and forms a
complex with other molecules in a reaction or sample, it forms
the largest number of the complexes with the second molecule.
Preferably, this largest number is at least fifty percent.
Generally, molecules involved in a specific binding event have
areas on their surfaces or in cavities giving rise to specific
recognition between the molecules binding to each other. Examples
of specific binding include antibody-antigen interactions,
enzyme-substrate interactions, formation of duplexes or triplexes
among polynucleotides and/or oligonucleotides, receptor-ligand
interactions, and the like. As used herein, "contact" in reference
to specificity or specific binding means two molecules are close
enough that weak noncovalent chemical interactions, such as van der
Waals forces, hydrogen bonding, base-stacking interactions, ionic
and hydrophobic interactions, and the like, dominate the
interaction of the molecules.
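The counting criterion above can be written as a small Python check: the second molecule must receive the largest share of the complexes, preferably at least fifty percent. The function name and the complex counts are hypothetical. How such counts would be measured is outside the scope of this sketch.

```python
from typing import Mapping


def is_specific(complex_counts: Mapping[str, int], partner: str,
                min_fraction: float = 0.5) -> bool:
    """True if `partner` forms the most complexes with the first molecule and at
    least `min_fraction` of all complexes that molecule forms."""
    total = sum(complex_counts.values())
    if total == 0:
        return False
    count = complex_counts.get(partner, 0)
    # Ties for the largest count are accepted as "largest" in this sketch.
    return count == max(complex_counts.values()) and count / total >= min_fraction


print(is_specific({"target": 820, "off_target_a": 120, "off_target_b": 60}, "target"))  # True
print(is_specific({"target": 40, "off_target_a": 35, "off_target_b": 25}, "target"))  # False
```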
[0516] As used herein, the term "Tm" is used in reference to the
"melting temperature." The melting temperature is the temperature
at which a population of double-stranded nucleic acid molecules
becomes half dissociated into single strands. Several equations for
calculating the Tm of nucleic acids are well known in the art. As
indicated by standard references, a simple estimate of the Tm value
may be calculated by the equation Tm = 81.5 + 0.41(% G+C), when a
nucleic acid is in aqueous solution at 1 M NaCl (see, e.g., Anderson
and Young, Quantitative Filter Hybridization, in Nucleic Acid
Hybridization (1985)). Other references (e.g., Allawi, H. T. &
SantaLucia, J., Jr., Biochemistry 36, 10581-94 (1997)) include
alternative methods of computation which take structural and
environmental, as well as sequence characteristics into account for
the calculation of Tm.
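The Python sketch below evaluates the simple composition-based estimate. It implements only the Tm = 81.5 + 0.41(% G+C) expression quoted above for 1 M NaCl. It does not account for duplex length or nearest-neighbour effects, which the Allawi and SantaLucia approach does, so it should not be read as a primer-design tool. The function name is hypothetical. SEQ ID NO: 20 from the sequence listing is used only to show the arithmetic.

```python
def tm_gc_estimate(seq: str) -> float:
    """Simple Tm estimate in deg C: 81.5 + 0.41 * (%G+C), for 1 M NaCl."""
    s = seq.upper()
    if not s:
        raise ValueError("sequence must not be empty")
    gc_percent = 100.0 * sum(base in "GC" for base in s) / len(s)
    return 81.5 + 0.41 * gc_percent


# SEQ ID NO: 20 contains 4 G/C out of 10 bases, i.e. 40% G+C.
print(round(tm_gc_estimate("atcgatcgat"), 1))  # 97.9
```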
[0517] "Sample" usually means a quantity of material from a
biological, environmental, medical, or patient source in which
detection, measurement, or labeling of target nucleic acids is
sought. On the one hand it is meant to include a specimen or
culture (e.g., microbiological cultures). On the other hand, it is
meant to include both biological and environmental samples. A
sample may include a specimen of synthetic origin. Biological
samples may be animal, including human, fluid, solid (e.g., stool)
or tissue, as well as liquid and solid food and feed products and
ingredients such as dairy items, vegetables, meat and meat
by-products, and waste. Biological samples may include materials
taken from a patient including, but not limited to, cultures, blood,
saliva, cerebrospinal fluid, pleural fluid, milk, lymph, sputum,
semen, needle aspirates, and the like. Biological samples may be
obtained from all of the various families of domestic animals, as
well as feral or wild animals, including, but not limited to, such
animals as ungulates, bear, fish, rodents, etc. Environmental
samples include environmental material such as surface matter,
soil, water and industrial samples, as well as samples obtained
from food and dairy processing instruments, apparatus, equipment,
utensils, disposable and non-disposable items. These examples are
not to be construed as limiting the sample types applicable to the
present invention.
[0518] The above teachings are intended to illustrate the invention
and do not by their details limit the scope of the claims of the
invention. While preferred illustrative embodiments of the present
invention are described, it will be apparent to one skilled in the
art that various changes and modifications may be made therein
without departing from the invention, and it is intended in the
appended claims to cover all such changes and modifications that
fall within the true spirit and scope of the invention.
SEQUENCE LISTING (38 sequences)
SEQ ID NO: 1; length 80; DNA; Artificial Sequence (synthetic oligonucleotide)
modified_base at 1-8, 14, 20, 26, 32, 47, 53, 67, 70, 73-80: a, c, g, or t
nnnnnnnngc atancacgan gtcatnatcg tncaaacgtc agtccangaa tcnagatcca 60
cttagantgn cgnnnnnnnn 80
SEQ ID NO: 2; length 50; DNA; Artificial Sequence (synthetic oligonucleotide)
tatcatctgg atgttaggaa gacaaaagga agctgaggac attaacggac 50
SEQ ID NO: 3; length 15; DNA; Artificial Sequence (synthetic oligonucleotide)
accttcagac cagat 15
SEQ ID NO: 4; length 26; DNA; Artificial Sequence (synthetic oligonucleotide)
modified_base at 1-8: a, c, g, or t
nnnnnnnngt ccgttaatgt cctcag 26
SEQ ID NO: 5; length 24; DNA; Artificial Sequence (synthetic oligonucleotide)
5' biotin; modified_base at 16-24: a, c, g, or t
atctggtctg aaggtnnnnn nnnn 24
SEQ ID NO: 6; length 20; DNA; Artificial Sequence (synthetic oligonucleotide)
5' biotin
cttttgtctt cctaacatcc 20
SEQ ID NO: 7; length 16; DNA; Artificial Sequence (synthetic oligonucleotide)
agatgataat ctggtc 16
SEQ ID NO: 8; length 65; DNA; Artificial Sequence (synthetic oligonucleotide)
tatcatctac tgcactgacc ggatgttagg aagacaaaag gaagctgagg gtcacattaa 60
cggac 65
SEQ ID NO: 9; length 23; DNA; Artificial Sequence (synthetic oligonucleotide)
modified_base at 1-7: a, c, g, or t; modified_base at 23: dideoxynucleotide
nnnnnnngtc cgttaatgtg acc 23
SEQ ID NO: 10; length 17; DNA; Artificial Sequence (synthetic oligonucleotide)
modified_base at 17: dideoxynucleotide
agatgatatt ttttttc 17
SEQ ID NO: 11; length 24; DNA; Artificial Sequence (synthetic primer)
tcagcttcct tttgtcttcc taac 24
SEQ ID NO: 12; length 30; DNA; Artificial Sequence (synthetic probe)
ggatgttagg aagacaaaag gaagctgagg 30
SEQ ID NO: 13; length 75; DNA; Artificial Sequence (synthetic oligonucleotide)
modified_base at 1-3, 9, 15, 21, 27, 42, 48, 62: a, c, g, or t
nnngcatanc acgangtcat natcgtncaa acgtcagtcc angaatcnag atccacttag 60
antaaaaaaa aaaaa 75
SEQ ID NO: 14; length 30; DNA; Artificial Sequence (synthetic probe)
ggatgttagg aagacaaaag gaagctgagg 30
SEQ ID NO: 15; length 11; DNA; Artificial Sequence (synthetic probe)
cattaacgga c 11
SEQ ID NO: 16; length 11; DNA; Artificial Sequence (synthetic probe)
tgagcgattc g 11
SEQ ID NO: 17; length 12; DNA; Artificial Sequence (synthetic oligonucleotide)
acattaacgg ac 12
SEQ ID NO: 18; length 11; DNA; Artificial Sequence (synthetic probe)
ggtgtcatgg a 11
SEQ ID NO: 19; length 6; PRT; Artificial Sequence (synthetic 6xHis tag)
His His His His His His
SEQ ID NO: 20; length 10; DNA; Artificial Sequence (synthetic oligonucleotide)
atcgatcgat 10
SEQ ID NO: 21; length 10; DNA; Artificial Sequence (synthetic oligonucleotide)
tagctagcta 10
SEQ ID NO: 22; length 155; DNA; Bacillus anthracis
tcccaataca tatgagcgat tcgcctttat aaacgacgta ttcctttgaa ctcgttatga 60
cactcattac tcaactcccc ttttctacta aaatagcgtt tttgtttggt ttttttcttc 120
acataatccg tcctatttga tttttacata ccacc 155
SEQ ID NO: 23; length 275; DNA; Yersinia pestis
tgtagccgct aagcactacc atcccctcaa ggttattgac ggtatcgagt agggttaggt 60
gggcatcatt gtccatttca tggcggtaat atcgggatga gataacgcgg gtgtcatgga 120
cgtatggcgg gtcaacaaaa tgaagcgttg aaactgtgtc atggtctaac atgcattgga 180
cggcatcacg attctctacc aaaacgccct cgaatcgctg gccaactgct gccaagtttt 240
caggcatcct tgcccaaagg tgttgagctg ttgcc 275
SEQ ID NO: 24; length 25; DNA; Artificial Sequence (synthetic primer)
tcccaataca tatgagcgat tcgcc 25
SEQ ID NO: 25; length 25; DNA; Artificial Sequence (synthetic primer)
ggtggtatgt aaaaatcaaa tagga 25
SEQ ID NO: 26; length 24; DNA; Artificial Sequence (synthetic primer)
tgtagccgct aagcactacc atcc 24
SEQ ID NO: 27; length 22; DNA; Artificial Sequence (synthetic primer)
ggcaacagct caacaccttt gg 22
SEQ ID NO: 28; length 22; DNA; Artificial Sequence (synthetic oligonucleotide)
attgggagtc cgttaatgtg ac 22
SEQ ID NO: 29; length 22; DNA; Artificial Sequence (synthetic oligonucleotide)
ggctacagtc cgttaatgtg ac 22
SEQ ID NO: 30; length 16; DNA; Artificial Sequence (synthetic oligonucleotide)
agatgatagg tggtat 16
SEQ ID NO: 31; length 16; DNA; Artificial Sequence (synthetic oligonucleotide)
agatgatagg caacag 16
SEQ ID NO: 32; length 68; DNA; Artificial Sequence (synthetic oligonucleotide)
modified_base at 1-8, 14, 20, 26, 32, 47, 53, 67: a, c, t, g, unknown or other
nnnnnnnngc atancacgan gtcatnatcg tncaaacgtc agtccangaa tcnagatcca 60
cttagant 68
SEQ ID NO: 33; length 12; DNA; Artificial Sequence (synthetic oligonucleotide)
acacacacac ac 12
SEQ ID NO: 34; length 13; DNA; Artificial Sequence (synthetic probe)
misc_feature at 4-13: this region may encompass 6-10 bases
bbbaaaaaaa aaa 13
SEQ ID NO: 35; length 13; DNA; Artificial Sequence (synthetic probe)
misc_feature at 1-10: this region may encompass 6-10 bases
aaaaaaaaaa bbb 13
SEQ ID NO: 36; length 20; DNA; Artificial Sequence (synthetic oligonucleotide)
misc_feature at 1-20: this sequence may encompass 7-20 bases
aaaaaaaaaa aaaaaaaaaa 20
SEQ ID NO: 37; length 40; DNA; Artificial Sequence (synthetic oligonucleotide)
aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa 40
SEQ ID NO: 38; length 72; DNA; Artificial Sequence (synthetic oligonucleotide)
modified_base at 6, 12, 18, 24, 39, 45, 59: a, c, t, g, unknown or other
gcatancacg angtcatnat cgtncaaacg tcagtccang aatcnagatc cacttagant 60
aaaaaaaaaa aa 72