Methods And Compositions For Nanostructure-based Nucleic Acid Sequencing Kotseroglou; Theofilos ; et al. [Eve Biomedical, Inc.]


Patent Application Summary

U.S. patent application number 15/666671 was filed with the patent office on 2018-01-11 for methods and compositions for nanostructure-based nucleic acid sequencing. The applicant listed for this patent is Eve Biomedical, Inc. Invention is credited to Theofilos Kotseroglou, Stephanos Papademetriou.

Application Number	20180010181 15/666671
Family ID	51351616
Filed Date	2018-01-11

United States Patent Application	20180010181
Kind Code	A1
Kotseroglou; Theofilos ; et al.	January 11, 2018

METHODS AND COMPOSITIONS FOR NANOSTRUCTURE-BASED NUCLEIC ACID SEQUENCING

Abstract

Provided herein are nanostructure-based sequencing methods and systems.

Inventors:

Kotseroglou; Theofilos; (Redwood City, CA) ; Papademetriou; Stephanos; (Sunnyvale, CA)

Applicant:

Name	City	State	Country	Type
Eve Biomedical, Inc.	Redwood City	CA	US

Family ID:

51351616

Appl. No.:

15/666671

Filed:

August 2, 2017

Related U.S. Patent Documents


Application Number	Filing Date	Patent Number
14185469	Feb 20, 2014	9725763
15666671
61766925	Feb 20, 2013

Current U.S. Class:	1/1
Current CPC Class:	C12Q 1/6869 20130101; C12Q 1/6869 20130101; C12Q 1/6869 20130101; C12Q 2521/101 20130101; C12Q 2521/543 20130101; C12Q 2563/143 20130101; C12Q 2565/631 20130101; C12Q 2521/543 20130101; C12Q 2521/119 20130101; C12Q 2565/631 20130101; C12Q 2563/143 20130101
International Class:	C12Q 1/68 20060101 C12Q001/68

Claims

1. (canceled)

2. A method of determining the sequence of a target nucleic acid molecule, comprising: providing a nanopore having a polymerase immobilized on or in the vicinity of said nanopore; contacting a polymerase with a double-stranded naturally occurring target nucleic acid molecule under first sequencing conditions, wherein the first sequencing conditions comprise the presence of nucleoside triphosphates consisting of four nucleoside triphosphates that each lack a detectable label, wherein a first nucleoside triphosphate of the four nucleoside triphosphates is present in a rate-limiting amount; detecting a pause in the movement of the target nucleic acid molecule and/or one or more nascent strand(s) through, on, or over the nanopore due to a pause in translocation of the target nucleic acid molecule and/or one or more nascent strand(s) by the polymerase; repeating the contacting and detecting steps under second sequencing conditions or third sequencing conditions, both comprising the presence of nucleoside triphosphates consisting of four nucleoside triphosphates that each lack a detectable label, wherein a second nucleoside triphosphate of the four nucleoside triphosphates is present in a rate-limiting amount in the second sequencing conditions and wherein a third nucleoside triphosphate of the four nucleoside triphosphates is present in a rate-limiting amount in the third sequencing conditions; and determining the sequence of the target nucleic acid molecule based on the pause(s) in the movement of the target nucleic acid molecule and/or one or more nascent strand(s) through, on, or over the nanopore detected under the first, second, and third sequencing conditions, wherein the pause(s) in movement indicate the presence of the rate-limiting nucleotide at that position.

3. The method of claim 2, wherein the solid substrate is glass.

4. The method of claim 2, wherein the polymerase is a RNA polymerase.

5. The method of claim 4, wherein the RNA polymerase is selected from the group consisting of a bacteriophage RNA polymerase and a bacterial RNA polymerase.

6. The method of claim 5, wherein the bacteriophage RNA polymerase is selected from the group consisting of a T7 RNA polymerase and a T3 RNA polymerase.

7. The method of claim 5, wherein the bacterial RNA polymerase is an E. coli RNA polymerase.

8. The method of claim 2, wherein the polymerase is a DNA polymerase.

9. The method of claim 8, wherein the DNA polymerase is selected from the group consisting of phi29, T7 DNA polymerase, Bacillus subtilis DNA polymerase, and Taq DNA polymerase.

10. The method of claim 2, wherein the target nucleic acid molecule further comprises a magnetic tag.

11. The method of claim 2, wherein the detecting step comprises measuring a change in electric current of the nanopore.

12. The method of claim 2, wherein the detecting step comprises measuring a change in ionic conduction of the nanopore.

13. The method of claim 2, wherein the detecting step further comprises capturing movement on a CMOS based manufactured nanopore and electronics.

14. The method of claim 2, further comprising: repeating the contacting and detecting steps under fourth sequencing conditions comprising the presence of nucleoside triphosphates consisting of four nucleoside triphosphates that each lack a detectable label, wherein a fourth nucleoside triphosphate of the four nucleoside triphosphates is present in a rate-limiting amount a plurality of times; and determining the sequence of the target nucleic acid molecule based on the pause(s) in the movement of the target nucleic acid molecule and/or one or more nascent strand(s) through, on, or over the nanopore detected in the first, second, third, and fourth sequencing conditions.

15. The method of claim 2, wherein the position in the target nucleic acid molecule is determined by detecting the cumulative amount of movement.

16. A method of determining the sequence of a target nucleic acid molecule, comprising: providing a nanopore, wherein a polymerase is immobilized on or near the nanopore; contacting the polymerase with the target nucleic acid molecule under first sequencing conditions comprising the presence of nucleoside triphosphates consisting of four nucleoside triphosphates, where a first nucleoside triphosphate of the four nucleoside triphosphates is present in a rate-limiting amount; detecting a pause in the movement of the target nucleic acid molecule and/or one or more nascent strand(s) through, on, or over the nanopore due to a pause in translocation of the target nucleic acid molecule and/or one or more nascent strand(s) by the polymerase; contacting the polymerase with the target nucleic acid molecule under second sequencing conditions comprising the presence of nucleoside triphosphates consisting of four nucleoside triphosphates, where a second nucleoside triphosphate of the four nucleoside triphosphates is present in a rate-limiting amount; detecting a pause in the movement of the target nucleic acid molecule and/or one or more nascent strand(s) through, on, or over the nanopore due to a pause in translocation of the target nucleic acid molecule and/or one or more nascent strand(s) by the polymerase; contacting the polymerase with the target nucleic acid molecule under third sequencing conditions comprising the presence of nucleoside triphosphates consisting of four nucleoside triphosphates, where a third nucleoside triphosphate of the four nucleoside triphosphates is present in a rate-limiting amount; detecting a pause in the movement of the target nucleic acid molecule and/or one or more nascent strand(s) through, on, or over the nanopore due to a pause in translocation of the target nucleic acid molecule and/or one or more nascent strand(s) by the polymerase; determining positional information of the first, second, and third nucleoside triphosphates along the target nucleic acid molecule based on the pause(s) in the movement of the target nucleic acid molecule and/or one or more nascent strand(s) through, on, or over the nanopore, wherein the pause(s) in movement indicate the presence of the rate-limiting nucleotide at that position.

17. The method of claim 16, further comprising: contacting the polymerase with the target nucleic acid molecule under fourth sequencing conditions comprising the presence of nucleoside triphosphates consisting of four nucleoside triphosphates, where a fourth nucleoside triphosphate of the four nucleoside triphosphates is present in a rate-limiting amount; detecting a pause in the movement of the target nucleic acid molecule and/or one or more nascent strand(s) through, on, or over the nanopore due to a pause in translocation of the target nucleic acid molecule and/or one or more nascent strand(s) by the polymerase; determining positional information of the first, second, third, and fourth nucleoside triphosphates along the target nucleic acid molecule based on the pause(s) in the movement of the target nucleic acid molecule and/or one or more nascent strand(s) through, on, or over the nanopore.

Description

CROSS REFERENCE TO RELATED APPLICATIONS

[0001] This application is a continuation application of U.S. application Ser. No. 14/185,469, filed on Feb. 20, 2014, which claims benefit under 35 U.S.C. § 119(e) to U.S. Application No. 61/766,925, filed on Feb. 20, 2013.

TECHNICAL FIELD

[0002] This disclosure generally relates to nucleic acid sequencing systems and methods and compositions that can be used in such systems and methods.

BACKGROUND

[0003] Nanostructure DNA sequencing is one method of DNA sequencing that can lead to cost-effective, long read and accurate whole human genome sequencing and efficient bacterial genome sequencing and other sequencing applications. The present disclosure provides numerous improvements over existing nanostructure sequencing technology and addresses many of the limitations that have restricted the use of nanostructure-based sequencing methods in, for example, clinical applications and high-throughput environments.

SUMMARY

[0004] Nanostructure-based sequencing relies upon the polymerase being immobilized relative to a solid surface in the vicinity of a nanostructure. As a consequence of base incorporation and elongation by the polymerase, the nucleic acid translocates within the polymerase enzyme and, as a consequence, through, on, or over the nanostructure. A change in the electronic signal across the nanostructure is observed as a result of the enzyme-dependent translocation. The methods of sequencing described herein encompass two approaches. The first approach is a base-by-base sequencing, where a known base addition leads to single base polymerization and translocation (i.e., movement) through, on, or over the nanostructure. In a second approach, all four nucleotides are present with one of the nucleotides present in a rate-limiting amount. During incorporation of three of the four nucleotides and subsequent elongation by the polymerase, movement of the nucleic acid through, on, or over the nanostructure occurs at the normal rate of the enzyme. However, at the positions within the nucleic acid that correspond to the rate-limiting nucleotide, elongation/translocation and, hence, movement through, on, or over the nanostructure, slows down or pauses. Iterative reactions, each with a different nucleotide at a rate-limiting concentration, allow the complete sequence to be assembled bioinformatically.

[0005] In one aspect, a method of determining the sequence of a target nucleic acid molecule is provided. Such a method typically includes contacting a polymerase with a target nucleic acid molecule under sequencing conditions, wherein sequencing conditions comprise the presence of at least one nucleoside triphosphate, wherein the polymerase is immobilized on a solid substrate; detecting the movement of the target nucleic acid molecule and/or one or more nascent strand(s) through, on, or over a nanostructure; repeating the contacting and detecting steps a plurality of times; and determining the sequence of the target nucleic acid molecule based, sequentially, on the presence or absence of a change in the movement in the presence of the at least one nucleoside triphosphate. In some embodiments, the sequencing conditions comprise the presence of a single nucleoside triphosphate. In some embodiments, the sequencing conditions comprise the presence of four nucleoside triphosphates, where a first nucleoside triphosphate of the four nucleoside triphosphates is present in a rate-limiting amount.

[0006] A representative solid substrate is glass. In one embodiment, the polymerase is an RNA polymerase. Representative RNA polymerases include, for example, bacteriophage RNA polymerases (e.g., a T7 RNA polymerase and a T3 RNA polymerase) and bacterial RNA polymerases (e.g., an E. coli RNA polymerase). In one embodiment, the polymerase is a DNA polymerase. Representative DNA polymerases include, for example, phi29 DNA polymerase, T7 DNA polymerase, Bacillus subtilis DNA polymerase, and Taq DNA polymerase. In some embodiments, the polymerase is immobilized on the solid surface via a His-tag or via one or more biotin-streptavidin bonds.

[0007] In some embodiments, the target nucleic acid molecule is eukaryotic. The target nucleic acid molecule can be double-stranded or single-stranded. In some embodiments, the target nucleic acid molecule is included within or as a part of a biological sample. In some embodiments, the target nucleic acid molecule includes a polymerase promoter sequence. In some embodiments, the target nucleic acid molecule further includes a magnetic tag.

[0008] Representative nanostructures include, for example, biological nanostructures, solid state nanostructures, or combinations thereof. In some embodiments, the detecting step includes measuring a change in electric current through, on, or over the nanostructure and/or measuring a change in ionic conduction of the nanostructure. The detecting step can further include capturing movement on a CMOS based manufactured nanostructure and electronics. In some embodiments, the method further includes applying a directional force on the target nucleic acid molecules. In some embodiments, the directional force is produced with a magnet. In some embodiments, the directional force is produced with flow or pressure.

[0009] In another aspect, a method of determining the sequence of a target nucleic acid molecule is provided. Such a method typically includes providing a solid substrate onto which polymerase is immobilized; contacting the polymerase with the target nucleic acid molecule under first sequencing conditions, wherein the first sequencing conditions comprise the presence of four nucleoside triphosphates, where a first nucleoside triphosphate of the four nucleoside triphosphates is present in a rate-limiting amount; detecting the movement of the target nucleic acid molecule and/or one or more nascent strand(s) through, on, or over a nanostructure under the first sequencing conditions; and determining positional information of the first nucleoside triphosphate along the target nucleic acid molecule based on a change in the movement. Such a method can further include providing a solid substrate onto which polymerase is immobilized; contacting the polymerase with the target nucleic acid molecule under second sequencing conditions, wherein the second sequencing conditions comprise the presence of four nucleoside triphosphates, where a second nucleoside triphosphate of the four nucleoside triphosphates is present in a rate-limiting amount; detecting the movement of the target nucleic acid molecule and/or one or more nascent strand(s) through, on, or over a nanostructure under the second sequencing conditions; and determining positional information of the second nucleoside triphosphate along the target nucleic acid molecule based on a change in the movement. In some embodiments, the contacting and detecting steps under the second sequencing conditions are performed simultaneously with the contacting and detecting steps under the first sequencing conditions. In some embodiments, the contacting and detecting steps under the second sequencing conditions are performed sequentially before or after the contacting and detecting steps under the first sequencing conditions. 
Such a method can further include providing a solid substrate onto which polymerase is immobilized; contacting the polymerase with the target nucleic acid molecule under third sequencing conditions, wherein the third sequencing conditions comprise the presence of four nucleoside triphosphates, where a third nucleoside triphosphate of the four nucleoside triphosphates is present in a rate-limiting amount; detecting the movement of the target nucleic acid molecule and/or one or more nascent strand(s) through, on, or over a nanostructure under the third sequencing conditions; and determining positional information of the third nucleoside triphosphate along the target nucleic acid molecule based on a change in the movement. Such a method typically includes determining the sequence of the target nucleic acid molecule from the positional information for the first, second and third nucleoside triphosphates within the target nucleic acid molecule. Such a method can further include providing a solid substrate onto which polymerase is immobilized; contacting the polymerase with the target nucleic acid molecule under fourth sequencing conditions, wherein the fourth sequencing conditions comprise the presence of four nucleoside triphosphates, where a fourth nucleoside triphosphate of the four nucleoside triphosphates is present in a rate-limiting amount; detecting the movement of the target nucleic acid molecule and/or one or more nascent strand(s) under the fourth sequencing conditions; and determining positional information of the fourth nucleoside triphosphate along the target nucleic acid molecule based on a change in the movement.

[0010] In still another aspect, a method of determining the sequence of a target nucleic acid molecule is provided. Such a method typically includes providing a solid substrate onto which one or more polymerases are immobilized; contacting the one or more polymerases with the target nucleic acid molecule under first sequencing conditions, wherein the first sequencing conditions comprise the presence of a first of four nucleoside triphosphates; and detecting, under the first sequencing conditions, whether a change in the movement of the target nucleic acid molecule and/or one or more nascent strand(s) through, on, or over a nanostructure occurs. If a change in the movement occurs, the method further includes repeating the contacting step and subsequent steps under the first sequencing conditions, but if a change in the movement does not occur, the method further includes repeating the contacting step and subsequent steps under second sequencing conditions, wherein the second sequencing conditions comprise the presence of a second of four nucleoside triphosphates. If a change in the movement occurs, the method further includes repeating the contacting step and subsequent steps under the first sequencing conditions, but if a change in the movement does not occur, the method further includes repeating the contacting step and subsequent steps under third sequencing conditions, wherein the third sequencing conditions comprise the presence of a third of four nucleoside triphosphates. Lastly, the method includes determining the sequence of the target nucleic acid molecule based, sequentially, on the occurrence of a change in the movement under the first, second, or third sequencing conditions.
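
The decision loop in the preceding paragraph can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the applicants' implementation: `ToyPolymerase` is a hypothetical stand-in for the immobilized enzyme and nanostructure readout, its `contact` method is assumed to report at most one incorporation per call, and the loop is assumed to stop once every nucleotide has been offered in turn without any movement being detected.

```python
from typing import Callable, List, Sequence


class ToyPolymerase:
    """Hypothetical simulator: advances one base when the offered nucleotide
    matches the next base of the nascent strand it is expected to produce."""

    def __init__(self, expected_nascent: str) -> None:
        self.expected = expected_nascent
        self.pos = 0

    def contact(self, nucleotide: str) -> bool:
        """Offer one nucleotide; return True if movement was detected."""
        if self.pos < len(self.expected) and self.expected[self.pos] == nucleotide:
            self.pos += 1
            return True
        return False


def base_by_base(contact: Callable[[str], bool],
                 nucleotides: Sequence[str] = "ACGT",
                 max_bases: int = 1000) -> str:
    """Record the nascent strand one base at a time.

    If movement is detected, the same sequencing conditions are repeated;
    if not, the next nucleotide is tried. The loop ends when all nucleotides
    fail in a row (assumed end of template) or max_bases is reached.
    """
    nascent: List[str] = []
    idx = 0
    misses = 0
    while len(nascent) < max_bases and misses < len(nucleotides):
        nt = nucleotides[idx]
        if contact(nt):
            nascent.append(nt)
            misses = 0  # movement seen: repeat under the same conditions
        else:
            misses += 1
            idx = (idx + 1) % len(nucleotides)  # no movement: next conditions
    return "".join(nascent)


if __name__ == "__main__":
    enzyme = ToyPolymerase("GATTACA")
    print(base_by_base(enzyme.contact))
```

The recorded string is the nascent strand; the template strand would be its complement.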

[0011] In yet another aspect, an article of manufacture is provided. Such an article of manufacture generally includes a solid substrate onto which a plurality of polymerases are immobilized, wherein the solid substrate comprises a plurality of nanostructures. In some embodiments, the solid substrate is coated with copper and PEG. In some embodiments, the solid substrate is coated with nickel and PEG. In some embodiments, the solid substrate is coated with Ni-NTA. In some embodiments, the solid substrate is a CMOS or CCD. In some embodiments, the plurality of polymerases includes RNA polymerases, DNA polymerases, or a combination thereof. Such an article of manufacture further can include polymerase promoter sequences, biotinylated nucleic acid tether sequences, and/or one or more nucleoside triphosphates. In some embodiments, such an article of manufacture can further include instructions for identifying movement of the target nucleic acid molecule and/or one or more nascent strand(s) through, on, or over a nanostructure; compiling a sequence of a target nucleic acid molecule based on the movement and the presence of a nucleoside triphosphate; and/or applying a directional force. In some embodiments, the instructions are provided in electronic form.

[0012] In another aspect, an apparatus for single-base sequencing of target nucleic acid molecules is provided. Such an apparatus typically includes a Sequencing Module. The Sequencing Module generally includes a receptacle for receiving a solid substrate, wherein the solid substrate comprises a plurality of polymerases immobilized thereon and a plurality of nanostructures; a source for providing directional force, wherein the directional force is sufficient and in a direction such that tension is applied to target nucleic acid molecules being polymerized by the plurality of polymerases immobilized on the solid surface; and means for determining changes in an electric current and/or an ionic conduction of the nanostructures. In some embodiments, the apparatus further can include a computer processor. In some embodiments, the apparatus can further include microfluidics for containing and transporting reagents and buffers involved in sequencing nucleic acids. Representative reagents can include nucleoside triphosphates. Representative buffers can include a wash buffer, an enzyme-binding buffer, and/or a sequencing buffer. In some embodiments, the source for providing directional force includes a magnet and/or flow of liquid.

[0013] Such an apparatus also can include a Sample Preparation Module, which can include a receptacle for receiving a biological sample; and fluidics for containing and transporting reagents and buffers involved in isolating and preparing nucleic acids for sequencing. Representative reagents include cell lysis reagents and cleavage enzymes. Representative buffers include lysis buffer and wash buffer.

[0014] Such an apparatus also can include a Template Finishing Module, which can include fluidics for containing and transporting reagents and buffers involved in attaching polymerase promoter sequences to nucleic acid molecules. Representative reagents include a ligase enzyme, a molecular motor-binding sequence, and a tether. Representative buffers include ligase buffer, magnetic tag-binding buffer, and enzyme-binding buffer.

[0015] In another aspect, a method of determining the sequence of a target nucleic acid molecule based upon data obtained during polymerization of the target nucleic acid molecule is provided. Such a method includes receiving a first datum for a first position of the target nucleic acid molecule, wherein the first datum indicates the presence or absence of movement of a target nucleic acid molecule and/or one or more nascent strand(s) through, on, or over a nanostructure and/or the rate of movement of the strand(s) through, on, or over the nanostructure; receiving a second datum for the first position of the target nucleic acid molecule, wherein the second datum indicates the presence and/or amount of one or more nucleoside triphosphates available during polymerization; receiving another first datum and another second datum for a second position of the target nucleic acid molecule; receiving yet another first datum and yet another second datum for a third position of the target nucleic acid molecule; repeating the receiving steps of the first datum and the second datum for a fourth and subsequent positions of the target nucleic acid molecule; and determining a sequence of the target nucleic acid molecule based on the first datum and second datum received for each position. In some embodiments, the first datum and the second datum are recorded as a nucleotide at an indicated position.
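
As a rough illustration of how the first and second data for each position might be combined, the following Python sketch pairs a movement flag with the nucleotide that was available and emits one call per position. The `Observation` record and `call_sequence` function are hypothetical names introduced here for illustration; the sketch assumes the simplest case, in which the first datum is a yes/no movement flag and the second datum names a single available nucleotide.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class Observation:
    position: int     # position along the target nucleic acid (0-based)
    moved: bool       # first datum: was movement detected?
    nucleotide: str   # second datum: nucleotide available at the time


def call_sequence(observations: Iterable[Observation]) -> str:
    """Record a nucleotide at each position where movement was observed.

    Positions with no positive observation, or with conflicting positive
    observations, are reported as "N".
    """
    calls: Dict[int, Set[str]] = {}
    for obs in observations:
        if obs.moved:
            calls.setdefault(obs.position, set()).add(obs.nucleotide)
    if not calls:
        return ""
    out: List[str] = []
    for pos in range(max(calls) + 1):
        candidates = calls.get(pos, set())
        out.append(next(iter(candidates)) if len(candidates) == 1 else "N")
    return "".join(out)


if __name__ == "__main__":
    data = [
        Observation(position=0, moved=False, nucleotide="A"),
        Observation(position=0, moved=True, nucleotide="G"),
        Observation(position=1, moved=True, nucleotide="A"),
        Observation(position=3, moved=True, nucleotide="T"),
    ]
    print(call_sequence(data))
```

Reporting "N" for unresolved positions is a design choice of this sketch; the disclosure itself does not specify how missing or conflicting data are handled.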

[0016] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the systems, methods and compositions of matter belong. Although systems, methods and materials similar or equivalent to those described herein can be used in the practice or testing of the systems, methods and compositions of matter, suitable systems, methods and materials are described below. In addition, the systems, materials, methods, and examples are illustrative only and not intended to be limiting. Any publications, patent applications, patents, and other references mentioned below are incorporated by reference in their entirety.

DESCRIPTION OF DRAWINGS

[0017] FIG. 1 shows an embodiment of a single-molecule nanostructure-based sequencing complex. The enzyme, in this embodiment, T7 RNA polymerase, is attached to a functionalized surface on one side of, in this embodiment, a nanopore via His-tag or other methods, and the nucleic acid is threaded through the nanostructure. Sequencing as described herein is performed, which translocates the nucleic acid through the enzyme and through, in this embodiment, the nanopore.

[0018] FIG. 2 shows an embodiment of a single-molecule nanostructure-based sequencing complex that utilizes, in this embodiment, a DNA polymerase. The enzyme is attached to a functionalized solid surface on one side of, in this embodiment, a nanopore. The nucleic acid is threaded and stretched through the nanostructure. Sequencing is performed as described herein and the nucleic acid is translocated through, in this embodiment, the nanopore.

[0019] FIG. 3 shows an embodiment of a single-molecule nanostructure-based sequencing complex in which a magnetic bead and a magnetic force are used to stretch and apply tension to the nucleic acid. The enzyme, in this embodiment, T7 RNA polymerase, is attached to a functionalized solid surface near, in this embodiment, a nanopore. A magnetic bead is attached at or near the end of the nucleic acid and, using magnetic force, tension is applied and the nucleic acid is stretched. Sequencing is performed as described herein and the nucleic acid is translocated through, in this embodiment, the nanopore.

[0020] FIG. 4 is a flow diagram illustrating an example process for determining the sequence of a target nucleic acid molecule.

[0021] FIG. 5 shows an embodiment of a single-molecule nanostructure-based sequencing complex that can utilize either a DNA polymerase or an RNA polymerase. The enzyme is attached to a functionalized solid surface of, in this embodiment, a nanotube (e.g., a carbon nanotube). Sequencing is performed as described herein and the nucleic acid is translocated through the nanostructure. Electrical signals that result from changes in the ionic concentration around the enzyme and near the nanostructure (e.g., in the Debye region) are measured. Since the polymerase enzyme adopts various conformations as it interacts with the template and incorporates bases into the nascent strand, the electronic signal through the nanotube can be used to correlate the motion, location and/or shape of the enzyme. Thus, when the enzyme pauses in the presence of one nucleotide in a rate-limiting amount, the electronic signal shows characteristics of pausing.

DETAILED DESCRIPTION

[0022] The present disclosure describes a single molecule nanostructure-based sequencing system in which many of the constraints of existing single molecule sequencing systems, including complexity, cost and scalability, are relaxed, ultimately providing longer read lengths, higher throughput and enhanced accuracy. The real time, single molecule nanostructure-based sequencing method and system described herein can sequence thousands of nucleotides in a very short time with high accuracy due to the use of highly processive enzymes and nanostructure technology.

[0023] The advantages of the present nanostructure-based sequencing systems are numerous. For example, double-stranded nucleic acid or single-stranded nucleic acid can be used as the template, which reduces the requirements for sample preparation. In addition, labeled nucleotides are not required, since detection is performed using translocation through, on, or over nanostructures, which also significantly reduces the cost. Also, wild type polymerase enzymes can be used; no special modifications to the enzyme are necessary, and the surface chemistry and enzyme immobilization technologies also are routine. The present nanostructure-based sequencing systems and methods are suitable for homopolymeric sequences, since translocation through, on, or over the nanostructure is detectable for each nucleotide. Thus, the movement is cumulative over multiple nucleotides, even when the nucleotides are the same. The present nanostructure-based sequencing systems and methods also are readily adaptable for high throughput sequencing since multiple nanostructures can be used on a single solid surface. Notably, the polymerase enzymes regulate the rate of translocation through, on, or over the nanostructure. Controlling that rate is a significant problem for current nanostructure-based sequencing systems and methods, whereas in the present systems and methods it can ultimately lead to even higher throughput.

Overview of Nanostructure-Based Sequencing

[0024] Nanostructure-based sequencing relies upon elongation and translocation of the target nucleic acid molecules by polymerase enzymes, which also causes translocation of the target nucleic acid molecules through, on, or over the nanostructures. In one embodiment, a polymerase is immobilized on a solid surface, and a target nucleic acid is attached at one end to the polymerase while the other end is threaded through, on, or over a nanostructure. Solid state nanostructures such as nanopores or nanotubes typically have a larger opening than biological nanostructures and, thus, can accommodate double-stranded nucleic acids. The nanostructure can detect asymmetric ionic responses during movement of the nucleic acid through, on, or over the nanostructure, which signals elongation and translocation of a nucleotide base.

[0025] In one embodiment, a base-by-base (or synchronous) sequencing reaction can be performed, in which a single nucleotide is present. Reactions can then be performed that iterate between the other nucleotides. In another embodiment, an asynchronous sequencing reaction can be performed, in which all four nucleotides are present but one of the four nucleotides is provided in a rate-limiting amount. This results in a pause by the polymerase when trying to incorporate the rate-limiting nucleotide, and the change in the translocation (i.e., movement) of the nucleic acid through, on, or over the nanostructure indicates the presence of the rate-limiting nucleotide at that position. The entire sequence then can be compiled bioinformatically using, for example, four different reactions in which one of the four bases is provided in a rate-limiting amount. The different types of sequencing reactions are discussed in more detail below.
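
The bioinformatic compilation step for the asynchronous approach can be illustrated with a short Python sketch. It is a minimal example under assumptions made here rather than the disclosed procedure: pause positions from each run are assumed to be already aligned to a common 0-based coordinate, `assemble_sequence` is a hypothetical name, and when exactly three of the four nucleotides have been run, positions with no pause are assumed to belong to the fourth.

```python
from typing import Dict, Iterable, List, Set


def assemble_sequence(pauses: Dict[str, Iterable[int]],
                      length: int,
                      alphabet: str = "ACGT",
                      infer_missing: bool = True) -> str:
    """Merge per-run pause positions into a single sequence.

    pauses maps the rate-limiting nucleotide of each run to the positions
    at which a pause was detected in that run. Conflicting calls become "N".
    Uncalled positions become the one nucleotide that was never run (if
    infer_missing is set and exactly one is missing), otherwise "N".
    """
    calls: List[Set[str]] = [set() for _ in range(length)]
    for base, positions in pauses.items():
        for pos in positions:
            if 0 <= pos < length:
                calls[pos].add(base)

    missing = [b for b in alphabet if b not in pauses]
    fill = missing[0] if infer_missing and len(missing) == 1 else "N"

    out: List[str] = []
    for candidates in calls:
        if len(candidates) == 1:
            out.append(next(iter(candidates)))
        elif not candidates:
            out.append(fill)
        else:
            out.append("N")  # more than one run paused here
    return "".join(out)


if __name__ == "__main__":
    three_runs = {"A": [1, 4, 6], "C": [5], "G": [0]}
    print(assemble_sequence(three_runs, length=7))
```

Running a fourth reaction, as the paragraph above suggests, removes the need for inference and gives each position an independent call.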

[0026] FIGS. 1 and 2 show a single-molecule nanostructure-based sequencing complex as described herein. FIG. 1 is an embodiment of a nanostructure-based sequencing complex that includes a T7 RNA polymerase (e.g., T7 RNAP), while FIG. 2 is an embodiment of a nanostructure-based sequencing complex that includes a DNA polymerase (e.g., Phi29). As described in more detail below, the polymerase enzyme can be immobilized on a functionalized surface in the vicinity of a nanostructure via a His-tag or other method. The target nucleic acid molecule can be complexed with the enzyme prior to the enzyme being immobilized on the solid substrate, or the target nucleic acid molecule can be complexed with the enzyme after the enzyme has been immobilized on the solid surface. The target nucleic acid molecule is threaded or fed through, on, or over the nanostructure, and sequencing is initiated in either a base-by-base fashion or an asynchronous fashion as described herein. During each step of base incorporation by the polymerase enzyme, the nucleic acid is translocated through, on, or over the nanostructure, which is detected. In the nanostructure-based sequencing methods described herein, the nanostructure detects movement by the nucleic acid due to base incorporation by the polymerase; the nanostructure is not used to distinguish the nucleotide base.
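
Because the nanostructure reports movement rather than base identity, a pause can be located simply by counting translocation steps and timing the wait before each one. The Python sketch below illustrates that idea under assumptions introduced here: each detected step is assumed to arrive with a timestamp, the position is taken as the cumulative step count, and a dwell longer than a chosen multiple of the median dwell is treated as a pause. The function name `find_pauses` and the default factor of 3 are hypothetical.

```python
from statistics import median
from typing import List, Sequence


def find_pauses(step_times: Sequence[float], factor: float = 3.0) -> List[int]:
    """Return 0-based positions whose dwell time exceeds factor x median dwell.

    step_times[i] is the time at which the i-th translocation step was
    detected. The dwell for position i is the wait before that step, with
    the first dwell measured from t = 0.
    """
    if not step_times:
        return []
    dwells = [step_times[0]] + [
        later - earlier for earlier, later in zip(step_times, step_times[1:])
    ]
    cutoff = factor * median(dwells)
    return [i for i, dwell in enumerate(dwells) if dwell > cutoff]


if __name__ == "__main__":
    # Steps arrive every 0.1 time units except before positions 2 and 5.
    times = [0.1, 0.2, 1.2, 1.3, 1.4, 2.4, 2.5]
    print(find_pauses(times))
```

A median-based cutoff is used here only because it is insensitive to a few long dwells; a real readout would likely need a threshold calibrated to the enzyme and the nucleotide concentrations.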

[0027] Each of the features of a nanostructure-based sequencing reaction is discussed in more detail below.

Solid Surface

[0028] For the nanostructure-based sequencing methods described herein, an enzyme (RNA polymerase or DNA polymerase) is immobilized on a solid surface. In some embodiments described herein, a solid surface is made from a silica-based glass (e.g., borosilicate glass, fused silica, or quartz). In other embodiments, a solid surface is made from aluminum oxide, silicon, graphene, or other materials used in the semiconductor art as substrates or layers on substrates. However, other materials (e.g., polypropylene, polystyrene, silicon, silicon nitride, and other polymers or composites thereof) also can be used provided they are suitable for use in the sequencing described herein.

[0029] Before immobilizing one or more polymerases onto a solid surface, the solid surface generally is modified (e.g., functionalized) to receive and bind the polymerase. Methods of functionalizing solid surfaces for immobilizing enzymes are known in the art. In some embodiments, the solid surface can be functionalized with copper or nickel, while in some embodiments, the solid surface can be functionalized with Ni-NTA (see, for example, Paik et al., 2005, Chem. Commun. (Camb), 15:1956-8) or Cu-NTA. Alternatively, metals such as cobalt or the like can be used to modify a solid surface for immobilization.

[0030] Prior to modifying a solid surface, the solid surface can be treated with, for example, PEG moieties. Such strategies can be used to regulate the density of polymerases on a solid surface, and also can be used to generate a pattern of polymerases on the solid surface, such as a uniform, a semi-ordered or a random array of polymerases. The PEG environment results in minimal interactions between the enzyme and the surface (except for the binding tag on the N- or C-terminus), and ultimately results in minimal disturbance to the native conformation of the immobilized enzyme. In addition, surface passivation methods are known in the art and can include, for example, treating the solid surface with bovine serum albumin (BSA).

[0031] The solid surface can be functionalized in an array format so that a preferred location of the enzyme attachment with respect to the nanostructure can be achieved. This location, in some embodiments, can be close, or right next to, or surrounding the nanostructure. In some instances, the enzyme may partially overlap the nanostructure or it may be attached in a channel that allows for fluid communication between the nanostructure and one or more reagents or buffers. Methods for arranging enzymes in particular locations are known in the art. Positioning the enzymes with respect to the nanostructures also is feasible using methods known in the art (e.g., TEM, SEM, AFM). For coarse location readout, high resolution optical imaging can be adequate, particularly when the functional area can be tagged with fluorescence moieties that then can either be cleaved to make room for the enzymes or left in place while enzymes are positioned nearby.

Polymerase Enzymes

[0032] The nanostructure-based sequencing methods described herein can utilize any type of polymerase enzyme. Polymerases (EC 2.7.7.6; EC 2.7.7.7; EC 2.7.7.19; EC 2.7.7.48; or EC 2.7.7.49) synthesize one or two new strands of DNA or RNA from single-stranded or double-stranded template DNA or RNA. Suitable polymerases include, for example, DNA polymerases and RNA polymerases.

[0033] A representative DNA polymerase is phi29. Other DNA polymerases are well known in the art, and many that have been used in single molecule sequencing platforms that rely upon fluorescence also would be suitable for use in the present nanostructure-based sequencing methods. Representative DNA polymerases include, without limitation, T7 DNA polymerase, Bacillus subtilis DNA polymerase, and Taq DNA polymerase.

[0034] Any number of RNA polymerase enzymes can be used in the present methods. For example, multi-subunit RNA polymerases (e.g., E. coli or other prokaryotic RNA polymerase or one of the eukaryotic RNA polymerases) can be used in the sequencing methods described herein. However, it would be understood that the small, single-subunit RNA polymerases such as those from bacteriophage are particularly suitable. Single subunit RNA polymerases or the genes encoding such enzymes can be obtained from the T3, T7, SP6, or K11 bacteriophages.

[0035] The bacteriophage RNA polymerases are very processive and accurate compared to many of the multi-subunit RNA polymerases, and often produce fewer deletion-insertion errors. Additionally, RNA polymerases from bacteriophage are significantly less prone to back-tracking compared to multi-subunit counterparts such as the RNA polymerase from E. coli. RNA polymerase from several different bacteriophages has been described. Simply by way of example, the T7 RNA polymerase is made up of a single polypeptide having a molecular weight of 99 kDa, and the cloning and expression of the gene encoding T7 RNA polymerase is described in U.S. Pat. No. 5,693,489. The structure of T7 RNA polymerase has been resolved to a level of 3.3 Angstroms, with four different crystal structures having been solved: T7 RNA polymerase alone (uncomplexed), T7 RNA polymerase bound to a nucleic acid promoter, the entire initiation complex (T7 RNA polymerase bound to a nucleic acid promoter and one or more transcription factors), and T7 RNA polymerase bound by an inhibitor.

[0036] The density and/or distribution of polymerases on a solid surface can be controlled or manipulated, for example, to optimize the particular sequencing reactions being performed. As is known in the art, an array of biological molecules can be generated in a pattern. For example, an array of biological molecules can be randomly distributed on the solid surface, uniformly distributed or distributed in an ordered or semi-ordered fashion using, for example, the functionalization described herein. In some embodiments, a solid surface can have greater than 100 polymerases, or greater than 1000 polymerases (e.g., greater than 10,000 polymerases, greater than 100,000 polymerases, or greater than 1,000,000 polymerases) immobilized thereon. In some embodiments, a solid surface can have at least one polymerase immobilized per ~5 µm² (e.g., at least one polymerase immobilized per ~2.5 µm², ~1 µm², ~0.5 µm², or ~0.1 µm²). It would be understood that the density of polymerases on a solid surface may depend, at least in part, upon the size of the target nucleic acid molecules being sequenced as well as the number, location and size of the nanostructures. As indicated herein, the polymerase enzymes can be positioned close to, right next to, overlapping with, or surrounding the nanostructure.
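
Simply by way of illustration, the densities recited above can be converted into an approximate number of immobilized polymerases for a given active area. The following is a minimal sketch only; the function names and the 1 mm x 1 mm active area are hypothetical and are not taken from the present disclosure, and the square-grid spacing is an idealization of a uniform array.

```python
import math


def polymerase_capacity(surface_area_um2: float, area_per_polymerase_um2: float) -> int:
    """Approximate polymerase count at a density of one enzyme per given area (um^2)."""
    if surface_area_um2 < 0 or area_per_polymerase_um2 <= 0:
        raise ValueError("areas must be positive")
    return round(surface_area_um2 / area_per_polymerase_um2)


def mean_spacing_um(area_per_polymerase_um2: float) -> float:
    """Centre-to-centre spacing if the enzymes sat on an idealized square grid."""
    return math.sqrt(area_per_polymerase_um2)


# Hypothetical 1 mm x 1 mm active area, i.e. 1,000,000 um^2.
ACTIVE_AREA_UM2 = 1_000_000
for area in (5.0, 2.5, 1.0, 0.5, 0.1):
    count = polymerase_capacity(ACTIVE_AREA_UM2, area)
    print(f"1 per {area} um^2: about {count} polymerases, "
          f"about {mean_spacing_um(area):.2f} um apart")
```

Under these assumptions, one polymerase per ~5 µm² corresponds to roughly 200,000 polymerases per square millimeter, which is consistent with the ranges recited above.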

[0037] Polymerase enzymes can be immobilized on a solid surface using any number of known means. For example, in some embodiments, the polymerase contains a His-tag (e.g., His tags having 4 His residues, 6 His residues, or 10 His residues). In some embodiments, the polymerase is immobilized on the solid surface via one or more biotin-streptavidin bonds. A His-tag, a biotin-streptavidin binding pair or other suitable means can be used provided it is compatible with the surface chemistry (e.g., functionalization) discussed above. A polymerase can be immobilized to a solid surface in close proximity to a nanostructure or a polymerase can be immobilized to a solid surface at the same position as a nanostructure.

Target Nucleic Acid Molecules

[0038] Nucleic acid molecules for nanostructure-based sequencing can be obtained from virtually any source including eukaryotes, bacteria and archaea. Eukaryotic nucleic acids can be from humans or other mammals (e.g., primates, horses, cattle, dogs, cats, and rodents) or non-mammals (e.g., birds, reptiles (e.g., snakes, turtles, alligators, etc.) and fish), while prokaryotic nucleic acids can be from bacteria (e.g., pathogenic bacteria such as, without limitation, Streptococcus, E. coli, Pseudomonas, and Salmonella) or Archaea (e.g., Crenarchaeota, and Euryarchaeota).

[0039] Nucleic acid molecules for nanostructure-based sequencing can be contained within any number of biological samples. Representative biological samples include, without limitation, fluids (e.g., blood, urine, semen) and tissues (e.g., organ, skin, mucous membrane, and tumor).

[0040] As discussed herein, one of the advantages of the nanostructure-based sequencing methods described herein is that double-stranded or single-stranded nucleic acid can be used as the template. This reduces the need to manipulate the sample and the nucleic acid, which is a significant advantage, particularly when sequencing nucleic acids greater than 1 Kilobase (Kb; e.g., greater than 2 Kb, greater than 5 Kb, greater than 10 Kb, greater than 20 Kb, or greater than 50 Kb, or greater than 75 Kb, or greater than 100 Kb, or greater than 150 Kb) in length, since many methods used to obtain nucleic acids from biological samples result in undesired cleavage, shearing or breakage of the nucleic acids. Single-stranded nucleic acids (or samples containing single-stranded nucleic acids) can be used directly in the present methods or can be converted into a double-stranded nucleic acid. Methods of making double-stranded nucleic acids are well known in the art and will depend upon the nature of the single-stranded nucleic acid (e.g., DNA or RNA). Such methods typically include the use of well known DNA polymerases and/or Reverse Transcriptase enzymes. It would be understood that different enzymes utilize different templates (e.g., DNA or RNA, single-stranded or double-stranded), and that the choice of polymerases to be immobilized on the solid surface will depend, at least in part, upon the target nucleic acid being sequenced.

[0041] Sample preparation will be dependent upon the source, but typically will include nucleic acid isolation followed by promoter ligation. Nucleic acid templates used in the sequencing methods described herein do not require any special preparation and, thus, standard DNA isolation methods can be used. Also, a promoter sequence that is recognized by the particular polymerase must be ligated to the target nucleic acid molecules. Promoter sequences recognized by a number of polymerases, both DNA and RNA polymerases, are known in the art and are widely used. In addition, methods of ligating one nucleic acid molecule (e.g., a promoter sequence) to another nucleic acid molecule (e.g., a target nucleic acid molecule having an unknown sequence) are well known in the art and a number of ligase enzymes are commercially available.

[0042] In addition, isolated nucleic acids optionally can be fragmented and, if desired, particular sizes can be selected or fractionated. For example, isolated nucleic acids can be fragmented using ultrasonication and, if desired, size-selected using routine gel electrophoresis methodology. In addition, the target nucleic acids optionally can be circularized into, for example, a plasmid, so that sequencing can be performed on a circular target in a repetitive or recursive fashion.

[0043] Other moieties (e.g., tags) can be attached to target nucleic acid molecules using tethers. These moieties can be attached after the target nucleic acid molecules are threaded through, on, or over the nanostructures. Such moieties can be used, for example, to exert force on the target nucleic acid molecule (as discussed in more detail below), to fluoresce, to rotate with transcription, to indicate the location of the enzyme/target nucleic acid, or other functionalities that assist in deducing the location or movement of the target nucleic acid molecule through, on, or over the nanostructure or of the segments of target nucleic acid molecules that are outside or have exited the nanostructure area.

[0044] Tethers to attach moieties (e.g., tags) to target nucleic acid molecules are known in the art and include, without limitation, a chemical linkage (e.g., crosslinking, van der Waals or hydrogen bond) or a protein linkage (e.g., biotin-streptavidin binding pairs, digoxigenin and a recognizing antibody, hydrazine bonding or His-tagging). For example, in some embodiments, a moiety can be coated, at least partially, with streptavidin, while a biotinylated nucleic acid tether can be ligated to the target nucleic acid molecules. In some embodiments, a biotin-labeled nucleic acid (e.g., about 500 base pairs (bp)) can be ligated to one end of the target nucleic acid molecules. The target nucleic acid molecules having the biotin-labeled tether then can be combined with streptavidin-coated moieties. In one embodiment, a moiety as used herein can refer to a bead. There are a number of commercially available beads, including magnetic beads, that are coated or partially coated with various chemistries that can be used to tether the target nucleic acid molecules and/or bind a second moiety (e.g., Dynal, Invitrogen, Spherotech, Kisker Inc., Bangs Laboratories Inc.).

Tension on the Nucleic Acid Molecules

[0045] Tension on the target nucleic acid molecules becomes important with longer target nucleic acid molecules, as longer nucleic acid molecules can fold-up or collapse on themselves. Any type of abnormal helical structure of the target nucleic acid molecules could dampen or mask the movement through, on, or over the nanostructure and, therefore, the sequencing signal.

[0046] A directional force applied to the target nucleic acid molecules needs to be sufficient to avoid the folding or collapse of the target nucleic acid molecule discussed above, particularly when the end of the target nucleic acid molecule is thousands or hundreds of thousands of nucleotides away from the polymerase. However, the directional force applied to the target nucleic acid molecules cannot be so strong (i.e., apply so much tension) that elongation/translocation is impeded in any way or the backbone of the target nucleic acid molecule breaks. Such tension on the target nucleic acid molecules also can reduce the Brownian motion that can occur at the free end of a long target nucleic acid molecule or other noise effects (e.g., thermofluidic noise effects), thereby increasing the accuracy of detecting translocation (i.e., movement) through, on, or over the structure.

[0047] In some embodiments, the tension source (or the source of the directional force) can be a magnet. In such cases, the target nucleic acid molecule can be labeled with a moiety that is magnetic (e.g., a magnetic tag). See, for example, FIG. 3. Magnetic tags (e.g., beads, rods, etc.) are well known in the art. For example, a magnetic force can be applied that provides a uniform spatial force in the direction of the z-axis at a magnitude of, for example, about 1 pN, to adequately stretch the target nucleic acid molecules and avoid any looping. At the same time, such magnets generate only a minuscule force in the direction of the x-axis. These features do not impede movement (i.e., elongation and translocation of the target nucleic acid molecule through the polymerase enzyme and through, on, or over the nanostructure), while stabilizing any Brownian motion of the free end(s) of the target nucleic acid molecule. In some embodiments, the tension source can be a result of a directional flow of, for example, liquid (e.g., water or buffer) or air.
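
Simply by way of illustration, the degree to which a force of about 1 pN stretches a double-stranded target can be estimated with a worm-like chain model. The following is a minimal sketch only. The present disclosure does not name a polymer model; the Marko-Siggia interpolation formula, the persistence length of about 50 nm for double-stranded DNA, and the thermal energy of about 4.11 pN nm near room temperature are assumptions introduced here, as are the function names.

```python
K_B_T_PN_NM = 4.11  # assumed thermal energy near 25 degrees C, in pN*nm


def wlc_force_pn(rel_extension: float, persistence_nm: float = 50.0) -> float:
    """Marko-Siggia interpolation: force (pN) holding a worm-like chain
    at fractional extension x/L (0 <= x/L < 1)."""
    x = rel_extension
    return (K_B_T_PN_NM / persistence_nm) * (0.25 / (1.0 - x) ** 2 - 0.25 + x)


def relative_extension(force_pn: float, persistence_nm: float = 50.0) -> float:
    """Invert wlc_force_pn by bisection; force rises monotonically with extension."""
    lo, hi = 0.0, 1.0 - 1e-9
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if wlc_force_pn(mid, persistence_nm) < force_pn:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Fractional extension of double-stranded DNA under the ~1 pN force mentioned above.
print(f"{relative_extension(1.0):.2f}")
```

Under these assumptions, a force of about 1 pN extends the molecule to roughly 85% of its contour length, which is consistent with a force of that magnitude being enough to avoid looping.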

[0048] The amount of tension applied to the target nucleic acid molecules can be calibrated using standard fluidic methodology and incorporated in data acquisition and analysis process or base calling algorithms. For example, such a calibration can include monitoring the Brownian motion of a nucleic acid molecule being read by a polymerase, which is immobilized on the surface, at various locations above the surface, at various angles relative to the plane of the surface, and/or in different flows or magnetic fields and on various ionic concentrations of the buffer around the enzyme.

[0049] In certain embodiments and using the same technology as described above, tension can be applied to one or both of the nascent strands.

Threading the Nanostructure

[0050] As discussed herein, a polymerase enzyme, before or after being complexed with the template nucleic acid, can be immobilized on a solid surface directly on or in close proximity to a nanostructure. Once the template nucleic acid and the nanostructure are near one another, the nucleic acid can be introduced or threaded into the nanostructure using any number of methods including, for example, diffusion or electrical currents. It would be understood by those skilled in the art that entropic forces can affect the ability of the sample to enter the nanostructure, and that the interrelationship between diffusion and entropy depends on parameters such as the length of the nucleic acid and the size of the nanostructure. See, for example, He et al. (2013, ACS Nano, 7:538-46) for guidance.

[0051] It is known in the art that different types of nanostructures (e.g., nanotubes, nanopores) have different sizes of openings. Simply by way of example, biological nanostructures can have an opening of about 1 nm, graphene nanostructures can have an opening of about 0.5 nm, and silicon nitride nanostructures have been made with openings as small as about 2 nm. Therefore, it would be appreciated that the type of nucleic acid and the type of polymerase can determine the particular nanostructure used in the nanostructure-based sequencing methods described herein. For example, double stranded nucleic acids are usually too large to fit within nanostructures having, for example, a 1 nm opening (e.g., a biological nanostructure); therefore, those nanostructures can be used to detect the translocation of a single-stranded nucleic acid (e.g., single-stranded DNA or single-stranded RNA). In addition, a nanostructure can detect translocation of any number of different nucleic acids within the complex. For example, in some instances, a nanostructure can detect translocation of the template strand (e.g., single- or double-stranded RNA or DNA) as it is advanced by the enzyme; in some instances, a nanostructure can detect translocation of the nascent strand(s) (e.g., single- or double-stranded RNA or DNA) as it is being produced by the enzyme. Further, it would be understood that translocation of the template strand can be detected by the nanostructure in front of the enzyme or after leaving the enzyme.

[0052] The nanostructure-based sequencing methods described herein are designed to efficiently bring together a nucleic acid and a nanostructure such that the likelihood that the nanostructure will capture the nucleic acid is increased.

Nanostructures and Nanostructure-Based Sequencing

[0053] Nanostructures are well known in the art and include, without limitation, nanopores, nanotubes, and nanowires. Nanostructures can be produced using biological materials (e.g., proteins, e.g., a pore-forming protein), synthetic or solid-state materials (e.g., silicon, graphene, silicon nitride, aluminum oxide), or combinations thereof. The principle behind nanostructures is based on monitoring the ionic current passing through, on, or over the nanostructure as a voltage is applied. The passage of molecules or, in the present case, the translocation movement of the nucleic acid molecule, causes interruptions of, or changes in, the current level. Those skilled in the art would appreciate that the ionic concentration of the buffer in which the nanostructure resides can determine whether increases or decreases in the current are observed (see, for example, Smeets et al., 2006, Nano Lett., 6:89-95). Thus, in some embodiments, a low ionic concentration can be used; in some embodiments, a high ionic concentration can be used.

[0054] In the nanostructure-based sequencing methods described herein, the nanostructure can detect the movement of one or more of the nucleic acids involved in the reaction. For example, the nanostructure can detect the translocation (i.e., movement) of the template nucleic acid molecule, prior to entering the polymerase enzyme, after exiting the polymerase enzyme, or both. In addition, the nanostructure can detect the translocation (i.e., movement) of one or more of the nascent strand(s) produced by the polymerase. The particular configuration will depend, at least in part, on the particular polymerase (e.g., the preferred strandedness of the template, the direction of synthesis, the strandedness of the newly-produced nucleic acid).

[0055] The basis of existing nanostructure-based sequencing methods is translocation of the nucleic acid through, on or over a nanostructure (e.g., biologic or solid state or hybrid), which is sensitive to differences between each of the four bases in a specific fashion, e.g. a specific calibration for each base. One significant hurdle to existing nanostructure-based sequencing methods is the differential sensitivity of the structure to each base. Currently, only biological pores have been shown to have adequate sensitivity and discrimination for distinguishing among the bases. Even with biological pores, however, software algorithms are used since the data is often ambiguous (e.g., identifying more than one base in the nanostructure at a single position). Therefore, existing nanostructure-based sequencing methods lack sufficient discrimination ability between the different bases.

[0056] Another limitation of existing nanostructure-based sequencing methods that contributes to low accuracy is that translocation occurs too fast. In these instances, the base does not remain in the vicinity of the nanostructure long enough to be discriminated based on its averaged signal signature with respect to the other three bases. In some cases, to counteract this, a molecular motor has been introduced in order to slow down translocation and allow the accurate detection of the electronic signal induced by each base within the nanostructure. However, even in instances in which the molecular motor is a polymerase (see, for example, Manrao et al., 2012, Nat. Biotech., 30:349-53), the base discrimination still occurs within the nanostructure.

[0057] Another limitation of existing nanostructure-based sequencing technology is with the sample preparation. Nanostructure-based sequencing techniques can produce very long read lengths (e.g., 50 Kb or greater), but prefer single-stranded nucleic acids to achieve the greatest sensitivity. However, long single-stranded nucleic acids can be difficult to produce. Double-stranded nucleic acids are more stable and more easily prepared. However, because biological nanostructures are small, double-stranded nucleic acids must be converted to single-stranded nucleic acids using additional methods and enzymes before being sequenced in nanostructure-based sequencing systems that utilize biological nanostructures. On the other hand, while solid-state nanostructures are larger and can accommodate double-stranded nucleic acids, the accuracy of reading two nucleotides (i.e., one on each strand) across a larger structure is significantly reduced.

[0058] The present nanostructure-based sequencing methods remove the requirement for the nanostructure to identify each specific base. The polymerase in the current nanostructure-based sequencing methods functions precisely with respect to base identification, and does not simply slow down the movement of the nucleic acid through, on, or over the nanostructure. Instead, the nanostructure-based sequencing methods described herein depend on the bases provided to the polymerase, and use the translocation of the nucleic acids through, on, or over the nanostructure (e.g., the presence or absence of translocation, or a change in the rate or pattern of translocation) to determine the sequence.

Sequencing Conditions

[0059] It would be understood by those skilled in the art that a nanostructure-based sequencing complex can be generated in any of a number of different fashions. In one embodiment, promoter-bound target nucleic acid molecules (also referred to as templates or template nucleic acids) can be provided to a solid surface having polymerases immobilized thereon. In this embodiment, the target nucleic acid molecules can be fed through, on, or over the nanostructures before or after the target nucleic acid molecules are complexed with the immobilized polymerases. In another embodiment, the polymerases and the promoter-bound target nucleic acid molecules can be combined and then the polymerases immobilized on the solid surface. Similar to the previous embodiment, the target nucleic acid molecules can be fed through, on, or over the nanostructures before or after the polymerases are provided and subsequently immobilized. The order of complex formation will depend on several factors, including, for example, without limitation, whether or not a further moiety is attached to the end of the target nucleic acid molecule opposite the promoter-bound end.

[0060] The nanostructure-based sequencing described herein can be performed in an asynchronous (i.e., rate-limiting) mode or a synchronous (i.e., base-by-base) mode, or any combination thereof to determine the sequence of a target nucleic acid molecule. At a minimum, "sequencing conditions," as used herein, refers to the presence of at least one nucleoside triphosphate, which can be used as described below to determine the sequence of a target nucleic acid molecule. In addition to the presence of at least one nucleoside triphosphate as discussed in more detail herein, conditions under which sequencing reactions are performed are well known in the art. For example, appropriate buffer components (e.g., KCl, Tris-HCl, MgCl₂, DTT, Tween-20, BSA) can be used to provide a suitable environment for the enzyme. As used herein, nucleoside triphosphate refers to either the ribose-containing NTPs or the deoxyribose-containing dNTPs. Those skilled in the art would understand that the nucleoside triphosphates used in a particular sequencing reaction will be dictated by the particular polymerase(s).

a) Asynchronous Sequencing

[0061] The nanostructure-based sequencing method described herein can be used to sequence target nucleic acids based on an asynchronous incorporation of nucleotides. For asynchronous embodiments, the sequencing conditions under which the initial reaction occurs (i.e., first sequencing conditions) include the presence of four nucleoside triphosphates, where the nucleoside triphosphates are present in different amounts, at least one of which is rate-limiting and at least one of which is not rate-limiting. For example, one of the four nucleoside triphosphates is provided in a rate-limiting amount (e.g., in an amount that is less than the amount of the other three nucleoside triphosphates). In such a reaction, the polymerase will effectively pause each time it tries to incorporate the nucleoside triphosphate provided in the rate-limiting amount into the transcript, and such a pause can be observed in the pattern of movement as described herein.
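
Simply by way of illustration, pauses of the kind described above can be identified from the times at which single-base translocation steps are detected. The following is a minimal sketch only. The function name, the input format (one time stamp per detected step), and the rule that a pause is a dwell longer than a chosen multiple of the median dwell are assumptions introduced here and are not taken from the present disclosure; a median-based cutoff would not be appropriate if most positions in the target called for the rate-limiting nucleotide.

```python
from statistics import median


def find_pauses(step_times, pause_factor=5.0):
    """Return 0-based indices of translocation steps preceded by a long dwell.

    step_times: time (in seconds) at which each single-base step was detected.
    The dwell before the first step is measured from time zero.
    A dwell longer than pause_factor times the median dwell is called a pause.
    """
    if not step_times:
        return []
    previous = [0.0] + list(step_times[:-1])
    dwells = [t - p for p, t in zip(previous, step_times)]
    cutoff = pause_factor * median(dwells)
    return [i for i, dwell in enumerate(dwells) if dwell > cutoff]


# Hypothetical trace: steps every 10 ms, with two ~200 ms pauses.
trace = [0.01, 0.02, 0.03, 0.23, 0.24, 0.25, 0.45, 0.46]
print(find_pauses(trace))  # [3, 6]
```

In this hypothetical trace, the steps at indices 3 and 6 are each preceded by a long dwell and would therefore be assigned to the nucleoside triphosphate provided in the rate-limiting amount.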

[0062] Significantly, the number of bases between each pause can be precisely determined by detecting the cumulative amount of movement between pauses. Thus, the precise position of, for example, each guanine (G) nucleotide along the sequence of the target nucleic acid molecule can be determined from the changes in the movement when the G nucleoside triphosphate is provided in rate-limiting amounts. Similar reactions can be performed under second, third and, if desired, fourth, sequencing conditions in which, respectively, the second, third, and fourth nucleoside triphosphate of the four nucleoside triphosphates is present in a rate-limiting amount. The combined information from the four reactions, whether they are performed simultaneously with one another or sequentially following one another, provides the complete sequence of the target nucleic acid molecule.
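
Simply by way of illustration, the positional information from the separate rate-limiting reactions can be combined as follows. This is a minimal sketch only; the function name and the input format (a mapping from each rate-limiting base to the 0-based positions at which pauses were detected) are assumptions introduced here. The compiled string corresponds to the nucleotides incorporated by the polymerase, i.e., the nascent strand; positions with no call are reported as "N".

```python
def compile_sequence(length, pause_positions):
    """Merge per-base pause positions into a single sequence string.

    pause_positions: dict mapping a base (e.g., "G") to the 0-based positions
    at which a pause was seen when that base was rate-limiting.
    """
    calls = [None] * length
    for base, positions in pause_positions.items():
        for pos in positions:
            if not 0 <= pos < length:
                raise ValueError(f"position {pos} is outside the read")
            if calls[pos] is not None and calls[pos] != base:
                raise ValueError(f"conflicting calls at position {pos}")
            calls[pos] = base
    return "".join(base if base is not None else "N" for base in calls)


# Hypothetical pause positions from four reactions on the same target.
reactions = {"G": [0, 5], "A": [1, 4], "T": [2, 7], "C": [3, 6]}
print(compile_sequence(8, reactions))  # GATCAGCT
```

Raising an error on conflicting calls, rather than silently overwriting one call with another, is a design choice of this sketch; an actual base-calling process might instead weigh repeated reads of the same target.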

[0063] The pattern, even from a single reaction resulting in the positional sequence of one of four nucleotides, can be compared to nucleic acid databases and used to identify the nucleic acid molecule with a high level of confidence. In addition, it would be understood by those skilled in the art that the sequence of a target nucleic acid molecule could be compiled using the positional information produced from three of the four nucleoside triphosphates, as the positional information of the fourth nucleotide in the sequence can be inferred once the other three nucleotides are known.
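
Simply by way of illustration, the inference of the fourth nucleotide from the other three can be sketched as follows. The function name and input format are assumptions introduced here; the sketch simply assigns every position not claimed by one of the three measured bases to the base that was not measured.

```python
def infer_fourth_base(length, positions_by_base, all_bases="ACGT"):
    """Return (missing_base, positions) given pause positions for three bases.

    positions_by_base: dict mapping each measured base to the 0-based
    positions at which it was detected. Exactly one base of all_bases must
    be absent from the dict; its positions are the unclaimed ones.
    """
    missing = [base for base in all_bases if base not in positions_by_base]
    if len(missing) != 1:
        raise ValueError("exactly one base must be left unmeasured")
    occupied = set()
    for positions in positions_by_base.values():
        occupied.update(positions)
    return missing[0], sorted(set(range(length)) - occupied)


# Hypothetical results from reactions rate-limited for G, A and T only.
measured = {"G": [0, 5], "A": [1, 4], "T": [2, 7]}
print(infer_fourth_base(8, measured))  # ('C', [3, 6])
```

For an RNA polymerase, the `all_bases` argument could be set to "ACGU" to reflect the ribonucleoside triphosphates.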

b) Synchronous or Base-by-Base Sequencing

[0064] The nanostructure-based sequencing method described herein can be used to sequence nucleic acids in a synchronous pattern, which otherwise might be known as base-by-base sequencing. For synchronous or base-by-base embodiments, the sequencing conditions under which the initial reaction occurs (i.e., first sequencing conditions) include the presence of a single nucleoside triphosphate. In such a reaction, transcription by the polymerase will only proceed if the target nucleic acid contains the complementary base at that position, which can be observed as a change in the movement of the nucleic acid as described herein. Such reaction conditions are continued until the movement does not change. It would be understood that the cumulative change in the movement can be used to precisely determine the number of times the first nucleoside triphosphate was sequentially incorporated into the nascent strand (e.g., in a homopolymeric region of the target nucleic acid molecule).

[0065] When a change is no longer observed in the movement of the nucleic acid under the first sequencing conditions (i.e., the presence of a first nucleoside triphosphate of the four nucleoside triphosphates), or if no changes in the movement are observed under the first sequencing conditions, a reaction is performed under second sequencing conditions. Second sequencing conditions include the presence of a second nucleoside triphosphate of the four nucleoside triphosphates. Changes in the movement of the nucleic acid through, on, or over the nanostructure are indicative of base incorporation into the nascent strand by the polymerase, while the absence of a change in the movement of the nucleic acid indicates that no base incorporation took place.

[0066] Such reactions, under first sequencing conditions, second sequencing conditions, third sequencing conditions (i.e., the presence of a third nucleoside triphosphate of the four nucleoside triphosphates) or fourth sequencing conditions (i.e., the presence of a fourth nucleoside triphosphate of the four nucleoside triphosphates), can be carried out in such a manner that the sequence of the target nucleic acid molecule is sequentially determined based on the changes in the movement of the nucleic acid under each of the respective sequencing conditions. It would be understood by those skilled in the art that steps can be taken to remove the residual nucleoside triphosphates under one sequencing condition before introducing a different sequencing condition. For example, the surface on which the polymerase is immobilized can be washed or flushed before introducing a different nucleoside triphosphate. While such washing steps are not required, it would be understood that such steps would increase the accuracy of the resulting sequence information.
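
Simply by way of illustration, the synchronous cycle described above can be simulated as a series of single-nucleotide flows, each of which yields a count of translocation steps (zero when the base is not incorporated, more than one in a homopolymeric region). This is a minimal sketch only; the function names, the fixed flow order, and the use of a known nascent-strand sequence to stand in for the polymerase are assumptions introduced here for the purpose of simulation.

```python
def run_flows(nascent_sequence, flow_order="ACGT", max_cycles=100):
    """Simulate base-by-base flows; return a list of (base, steps) per flow.

    nascent_sequence stands in for what the polymerase would incorporate;
    in an actual reaction only the step counts would be observed.
    """
    flows = []
    pos = 0
    for _ in range(max_cycles):
        for base in flow_order:
            steps = 0
            # The polymerase advances only while the supplied base is the next one needed.
            while pos < len(nascent_sequence) and nascent_sequence[pos] == base:
                pos += 1
                steps += 1
            flows.append((base, steps))
            if pos == len(nascent_sequence):
                return flows
    raise ValueError("sequence not completed within max_cycles")


def read_flows(flows):
    """Rebuild the sequence from the observed (base, steps) pairs."""
    return "".join(base * steps for base, steps in flows)


flows = run_flows("AAGTC")
print(flows)
print(read_flows(flows))  # AAGTC
```

In this hypothetical run, the first flow of A produces two steps (a homopolymeric region), the first flow of C produces none, and the sequence is recovered by concatenating each base as many times as steps were observed.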

c) Additional Sequencing Methodologies

[0067] The nanostructure-based sequencing methods described herein are amenable to a number of different variations and routine modifications, which can be utilized, for example, and without limitation, to further increase the accuracy of the sequencing information and further increase the amount of information obtained in a sequencing reaction.

[0068] For example, certain polymerases, usually RNA polymerases, possess a "strand-switching" or "turn-around" ability. This feature can be advantageously used in the methods described herein to increase the accuracy of the resulting sequence information. For example, when a polymerase reaches the end of a target nucleic acid, the polymerase can "jump" to the opposite strand and continue transcription. See, for example, McAllister et al. (US 2007/0077575) and Rong et al. (1998, J. Biol. Chem., 273(17):10253-60). In addition, certain RNA polymerases can "jump" from the double-stranded DNA template to the hybrid DNA-RNA transcript and resume transcription of the DNA strand. In addition, this type of recursive sequencing of a target nucleic acid molecule can be genetically engineered by introducing (e.g., ligating) a polymerase promoter onto each end of the target nucleic acid molecule, such that the polymerase binds and transcribes both strands.

[0069] In addition, one or more different polymerases (e.g., polymerases from different organisms or different polymerases from the same organism) can be immobilized onto a solid surface. As is known in the art, different polymerases recognize and bind to different promoter sequences. Therefore, one or more different polymerase promoters can be ligated to different populations of target nucleic acid molecules and a combined population of target nucleic acid molecules can be sequenced using the nanostructure-based sequencing methods described herein with the one or more different polymerases immobilized on the solid surface. By differentially-labeling, for example, the different polymerases or the different populations of target nucleic acid molecules (using, for example, beads emitting different wavelengths, fluorescent tags, or fluorescently-labeled antibodies), the sequence of one population of target nucleic acid molecules can be distinguished from the sequence of another population of target nucleic acid molecules. Using such methods, sequencing reactions on different populations of target nucleic acid molecules can take place simultaneously.

[0070] In some embodiments, both the polymerases and the populations of target nucleic acid molecules can be differentially labeled. It would be understood that labeling the target nucleic acid molecules can occur directly via the nucleic acid or, for example, via an additional moiety bound to the target nucleic acid molecule. This ability to differentially label at multiple levels of the sequencing reaction can be used, for example, to compare the processivity of different polymerases on target nucleic acid molecules having the same sequence, which may identify, for example, homopolymeric regions or regions of methylation, or to compare the polymerization of target nucleic acid molecules having different sequences by more than one polymerase.

[0071] Simply by way of example, any combination of polymerase enzymes (e.g., from one or more of the bacteriophages, one or more prokaryotes, or one or more eukaryotes), in conjunction with the appropriate nucleic acid promoter sequences, can be used in the nanostructure-based sequencing methods described herein. As discussed herein, this feature allows for a multiplexing of the sequencing reactions. Other variations that utilize different polymerases in conjunction with their specific promoter sequences as well as differential-labeling techniques are contemplated herein.

[0072] In some embodiments, two asynchronous nanostructure-based sequencing reactions can be performed under the same sequencing conditions (e.g., first sequencing conditions). Once sequencing has progressed for a sufficient number of nucleotides (e.g., at least 100 nt, 500 nt, 1,000 nt, 5,000 nt, 10,000 nt, 20,000 nt, 50,000 nt, 100,000 nt, or 1,500,000 nt), the sequencing conditions of one of the reactions can be changed (e.g., to second sequencing conditions), and the nanostructure-based sequencing continued. The resulting sequence information obtained under the first sequencing conditions can be used to align a particular target nucleic acid molecule in the first reaction with the same particular target nucleic acid molecule in the second reaction, which, when the sequencing conditions are changed, allows positional sequence information to be obtained for two nucleotides within a particular target nucleic acid molecule.

[0073] Those skilled in the art would understand that the size of the nanostructures and/or the ionic content of the buffers around the nanostructures can affect the efficiency and accuracy of the sequencing reaction, particularly since polymerase enzymes place torsion on the nucleic acid molecules during elongation and translocation. In some instances, there may be polymerases and/or sequencing conditions in which loading of the polymerases and/or the nanostructures can be used to advantageously affect the rate of sequencing, although in most cases, those skilled in the art would prefer to minimize these effects.

Articles of Manufacture/Kits

[0074] Articles of manufacture (e.g., kits) are provided herein. An article of manufacture can include a solid substrate, as discussed herein, onto which a plurality of polymerase enzymes is immobilized. A plurality of polymerase enzymes refers to at least 10 polymerases (e.g., at least 20, 50, 75, or 100 enzymes), at least 100 polymerases (e.g., at least 200, 500, or 1,000 enzymes), or at least 1,000 polymerases (e.g., at least about 2,500, 5,000, 10,000, 50,000 enzymes or more).

[0075] Articles of manufacture are well known in the art and can include packaging material (e.g., blister packs, bottles, tubes, vials, or containers) and, in addition to the solid surface having polymerases immobilized thereon, can include one or more additional components.

[0076] In some embodiments, an article of manufacture can include nucleic acid sequences corresponding to a polymerase promoter. As discussed herein, promoters that direct transcription by polymerases are well known and used routinely in the art.

[0077] In some embodiments, an article of manufacture can include a tether. As discussed herein, a tether can be used to attach target nucleic acid molecules to a moiety (e.g., a tag). In some embodiments, a tether includes nucleic acid sequences, which, for example, can be biotinylated, such that they bind to, for example, streptavidin-labeled tags.

[0078] In some embodiments, an article of manufacture can include one or more nucleoside triphosphates. When more than one nucleoside triphosphate is provided, they can be provided in combination (e.g., in a single container) or separately (e.g., in separate containers).

[0079] In some embodiments, an article of manufacture further includes instructions. The instructions can be provided in paper form or in any number of electronic forms (e.g., an electronic file on, for example, a CD or a flash drive, or directions to a site on the internet (e.g., a link)). Such instructions can be used to identify movement of the nucleic acid through, on, or over the nanostructure; compile the sequence of a target nucleic acid molecule based on the movement and the presence of a nucleoside triphosphate; and/or apply an appropriate tension on the nucleic acid.

Nanostructure-Based Sequencing Systems

[0080] A nanostructure-based sequencing system as described herein includes at least a Sequencing Module. A Sequencing Module for sequencing target nucleic acid molecules typically includes a receptacle for receiving a solid substrate, a tension source for providing directional force, and means for determining changes in an electric current across the nanostructures. The solid substrate and the tension source are discussed above, and means for determining or detecting a change in an electric current are well-known in the art. Such means can include, for example, using ionic current measurement (using, e.g., a voltage clamp amplifier (e.g., Axopatch)) or using transverse electric fields (e.g., dragging, tunneling) (e.g., Tsutsui et al., 2012, Sci. Rep., 2:394). A receptacle for receiving a solid substrate can be configured, for example, as a recessed chamber. A Sequencing Module also can include a computer processor or means to interface with a computer processor. Further, primary analysis software can be provided as part of a Sequencing Module.

[0081] In addition, a Sequencing Module further can include a heating and cooling element and a temperature control system for changing and regulating the temperature of the sequencing reactions. In addition, a Sequencing Module further can include fluidics (e.g., one or more reagent or buffer reservoirs and tubing for delivering the one or more reagents or buffers to the reaction chamber). Fluidics for delivering one or more reagents or buffers also can include, without limitation, at least one pump. Without limitation, exemplary reagents that can be used in a sequencing reaction can include, for example, nucleoside triphosphates and/or enzymes (polymerase). Also without limitation, exemplary buffers that can be used in a sequencing reaction can include, for example, a wash buffer, an enzyme-binding buffer and a sequencing buffer.

[0082] The nanostructure-based sequencing systems described herein can significantly advance point-of-care diagnostics and genomics based on massively parallel single-molecule analysis with single-nucleotide resolution. The system is intrinsically suited for highly multiplexed target identification and can be flexibly reconfigured to interrogate different nucleic acid targets (e.g., pathogens and human biomarkers) simultaneously or sequentially. Current PCR- and microarray-based methods of sequencing nucleic acids are limited to detecting only known sequences or infectious agent(s) because of the specific set of reagents (primers and probes) required for positive identification.

[0083] For a system designed, for example, for high-throughput clinical diagnostics or for point-of-care diagnostics, a nanostructure-based sequencing system as described herein can be coupled with a Sample Preparation Module and a Template Finishing Module.

[0084] A Sample Preparation Module can be configured to lyse cells, thereby releasing the nucleic acids, and a Sample Preparation Module also can have the capability of shearing/fragmenting the nucleic acid. A Sample Preparation Module typically includes a receptacle for receiving a biological sample, and fluidics for delivering one or more reagents or buffers to the biological sample. A Sample Preparation Module can be configured to receive a variety of different biological samples or a Sample Preparation Module can be configured to receive a specific type of biological sample (e.g., a swab, a tissue sample, a blood or plasma sample, saliva, or a portion of a culture) or a biological sample provided in a specific form (e.g., in a vial or tube or on blotting paper). A Sample Preparation Module also can be configured to capture certain molecules from the biological sample (e.g., bacterial cells, viruses, etc.) using, for example, filters, columns, magnets, immunological methods, or combinations thereof (e.g., Pathogen Capture System, NanoMR Inc.).

[0085] A Sample Preparation Module can include reagents or buffers involved in obtaining the nucleic acids from a biological sample and preparing the nucleic acids for sequencing. For example, reagents involved in obtaining nucleic acids for sequencing include cell lysis reagents, nucleic acid cleavage enzymes, DNA polymerases, oligonucleotides, and/or DNA binding agents (e.g., beads or solid matrices to bind and wash the target nucleic acid molecules), while buffers involved in obtaining nucleic acids for sequencing include lysis buffer, wash buffer, elution buffer, or binding buffer. Many of the functional components of a Sample Preparation Module are commercially available (e.g. Silica gel membrane (Qiagen or Ambion kits) or as an integrated part of Palladium System (Integrated Nano Technologies Inc.)). In addition, as an alternative to enzymatic cleavage of nucleic acid templates, instruments that fragment nucleic acids are commercially available (e.g., Covaris).

[0086] A Template Finishing Module can be configured to attach polymerase promoter sequences to target nucleic acid molecules. A Template Finishing Module typically includes fluidics for delivering one or more reagents or buffers to the target nucleic acid molecules. For example, a Template Finishing Module can include reagents and buffers for the purpose of ligating polymerase promoter sequences to the target nucleic acid molecules. For example, reagents involved in ligating promoter sequences to target nucleic acid molecules include, obviously, the promoter sequences, but also can include, for example, ligase enzymes, a tether or PCR reagents, while buffers involved in ligating promoter sequences to target nucleic acid molecules include ligation buffer, enzyme-binding buffer, washing buffer and sequencing buffer.

[0087] Depending upon the configuration of the nanostructure-based sequencing system as described herein, the plurality of polymerases can be immobilized on the solid surface prior to introducing the promoter-bound target nucleic acid molecules. Alternatively, a plurality of polymerases can be combined with the promoter-bound target nucleic acid molecules and the entire complex deposited on the solid surface. The latter procedure is feasible because the binding kinetics for polymerases and their corresponding promoter sequences is very fast, efficient and specific.

Sequence Determination Following Nanostructure-Based Sequencing

[0088] FIG. 4 is a flow diagram illustrating an example process 1100 for determining the sequence of a target nucleic acid molecule. In some examples, the process 1100 can be implemented using one or more computer program applications executed using one or more computing devices. For purposes of illustration, a non-limiting example context is provided that is directed to determining the sequence of a target nucleic acid molecule based upon data obtained during elongation of the target nucleic acid molecule by the polymerase.

[0089] The process 1100 starts by setting an identified position to the current nucleic position in a target nucleic acid molecule (1110) being sequenced using the nanostructure-based sequencing described herein. An identified position can be, for example, the first nucleotide incorporated/elongated within the promoter sequence, the first nucleotide incorporated/elongated from the target nucleic acid molecule (i.e., after the promoter sequences), or any nucleotide position along a target nucleic acid molecule. First datum (i.e., first information) at the identified position in the target nucleic acid molecule is received (1120) from the nanostructure-based sequencing system or provided based upon information from the operation of the nanostructure-based sequencing, and second information (i.e., second datum) at the identified position in the target nucleic acid molecule is provided or received (1120). For example, the first datum can be information regarding translocation (i.e., movement) of the nucleic acid through, on, or over a nanostructure. For example, first datum can be a rate of translocation, a determination of the presence or absence of translocation, or a change in an established pattern of translocation. For example, the second datum can be information regarding the presence and/or availability (e.g., concentration) of one or more nucleoside triphosphates in the sequencing reaction.

[0090] The nucleotide at an identified position then can be determined based upon the first and second data. For example, if the first datum indicates a change in the rate of translocation and the second datum indicates the presence of guanine nucleoside triphosphate in the reaction, then the nucleotide at the identified position in the target nucleic acid molecule is determined to be cytosine. Similarly, if the first datum indicates an absence of change in the rate of translocation and the second datum indicates the presence of guanine nucleoside triphosphate in the reaction, the nucleotide at the identified position in the target nucleic acid molecule is determined to be non-cytosine (i.e., adenine, guanine, or thymine).

[0091] If it is determined that the identified position can be advanced to a next position (1140), the identified position is set equal to the next nucleic position in the target nucleic acid molecule (1150) and the process 1100 continues (1120). If it is determined that the identified position cannot be advanced to a next position (1140), the sequence of the target nucleic acid molecule based on the first information and second information received at each identified position is compiled (1160) and the process 1100 ends. The identified position cannot be advanced to a next position when elongation can no longer occur due, for example, to completion of polymerization of the target nucleic acid molecule or expiration of polymerase activity (e.g., due to decay of enzyme activity).
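The loop of process 1100 can be sketched as follows. This is a hedged illustration, not the flow diagram's actual implementation: it assumes the first datum has been reduced to a boolean (change in translocation detected or not) and the second datum to the single-letter base of the nucleoside triphosphate in question. The names `call_template_base` and `process_1100` are hypothetical, and the end of the input stands in for the decision at 1140 that the position cannot be advanced.

```python
from typing import Iterable, List, Tuple

# Watson-Crick partner of each nucleoside triphosphate base.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A", "T": "A"}


def call_template_base(change_detected: bool, ntp_base: str) -> str:
    """Determine the template nucleotide from the first and second data."""
    partner = COMPLEMENT[ntp_base]
    # A change in translocation implicates the triphosphate, so the template
    # carries its partner; otherwise only that one base is excluded.
    return partner if change_detected else f"not-{partner}"


def process_1100(observations: Iterable[Tuple[bool, str]]) -> List[str]:
    """Walk the identified positions (1110-1150) and compile the calls (1160)."""
    compiled = []
    for change_detected, ntp_base in observations:  # data received at 1120
        compiled.append(call_template_base(change_detected, ntp_base))
    return compiled  # reached when no further position is available (1140)


print(process_1100([(True, "G"), (False, "G"), (True, "G")]))
# ['C', 'not-C', 'C']
```

A single pass leaves most positions only partially determined; combining passes under different sequencing conditions narrows each `not-X` call to a single base.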

[0092] Embodiments of the subject matter and the operations described in this specification can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, or in combinations of one or more of them. Embodiments of the subject matter described herein can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions, encoded on computer storage medium for execution by, or to control the operation of, data processing apparatus. Alternatively or in addition, the program instructions can be encoded on an artificially generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus. A computer storage medium can be, or be included in, a computer-readable storage device, a computer-readable storage substrate, a random or serial access memory array or device, a mobile communication device, or a combination of one or more of them. Moreover, while a computer storage medium is not a propagated signal, a computer storage medium can be a source or destination of computer program instructions encoded in an artificially generated propagated signal. The computer storage medium can also be, or be included in, one or more separate physical components or media (e.g., multiple CDs, disks, or other storage devices).

[0093] The operations described herein can be implemented as operations performed by a data processing apparatus on data stored on one or more computer-readable storage devices or received from other sources. The term "data processing apparatus" encompasses all kinds of apparatus, devices, and machines for processing data including, by way of example, a programmable processor, a mobile communications device, a computer, a system on a chip, or multiple ones, or combinations, of the foregoing. The apparatus can include special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit). The apparatus can also include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, a cross-platform runtime environment, a virtual machine, or a combination of one or more of them. The apparatus and execution environment can realize various different computing model infrastructures, such as web services, distributed computing and grid computing infrastructures.

[0094] A computer program (also known as a program, software, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, declarative or procedural languages, and it can be deployed in any form, including as a standalone program or as a module, component, subroutine, object, or other unit suitable for use in a computing environment. A computer program may, but need not, correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.

[0095] The processes and logic flows described herein can be performed by one or more programmable processors executing one or more computer programs to perform actions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA or an ASIC.

[0096] Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read only memory or a random access memory or both. The essential elements of a computer are a processor for performing actions in accordance with instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer need not have such devices. Moreover, a computer can be embedded in another device, e.g., a mobile communications device, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device (e.g., a universal serial bus (USB) flash drive), to name just a few. Devices suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.

[0097] To provide for interaction with a user, embodiments of the subject matter described in this specification can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer. In addition, a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user; for example, by sending web pages to a web browser on a user's client device in response to requests received from the web browser.

[0098] Embodiments of the subject matter described in this specification can be implemented in a computing system that includes a back end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front end component, e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back end, middleware, or front end components. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network ("LAN") and a wide area network ("WAN"), an inter-network (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks).

[0099] The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. In some embodiments, a server transmits data (e.g., an HTML page) to a client device (e.g., for purposes of displaying data to and receiving user input from a user interacting with the client device). Data generated at the client device (e.g., a result of the user interaction) can be received from the client device at the server.

[0100] In accordance with the present invention, there may be employed conventional molecular biology, microbiology, biochemical, and recombinant DNA techniques within the skill of the art. Such techniques are explained fully in the literature. The invention will be further described in the following examples, which do not limit the scope of the methods and compositions of matter described in the claims.

EXAMPLES

Example 1

Solid Surface Preparation

[0101] An NTA monolayer was prepared as described (see Paik et al., 2005, Chem. Commun., 15:1956-58). Ni-NTA surfaces were obtained by immersing the NTA-functionalized substrates into 10 mM Tris-HCl buffer (pH 8.0) containing 0.1 M NiCl2 for 30 min. The substrates were then rinsed several times with Milli-Q water and dried under a nitrogen stream.

[0102] The freshly cleaned substrates were immersed into a distilled toluene solution containing 1% (v/v) 3-glycidyloxypropyl trimethoxysilane under argon for 2 days. After the substrates were removed from the solution, they were rinsed with distilled toluene and dried under a nitrogen stream. The substrates functionalized with epoxy-terminated SAM were incubated in 10 mM Tris-HCl buffer (pH 8.0) containing 2.5 mM N,N-bis(carboxymethyl)-L-lysine (NTA) at 60°C for 4 h. The substrates were rinsed with Milli-Q water and dried in preparation for microcontact printing.

[0103] A limited nonspecific binding effect of His-tagged protein to the NTA SAM was observed, demonstrating the NTA SAM to be a suitable surface for fabricating Ni(II) ion patterns with microcontact printing and dip-pen nanolithography techniques.

Example 2

Cloning and Purification of His-Tagged RNA Polymerase

[0104] A DNA fragment that encodes the 38 amino acid SBP-tag was synthesized by PCR using pTAGk19 as a template and synthetic DNA oligomers RP46 and RP47 (see below) as primers. The fragment was digested with NcoI and ligated into pBH16117, resulting in pRP6.

[0105] SBP-His-RNA polymerase and His-RNA polymerase were expressed and purified as previously described (He et al., 1997, J. Protein Expression Purif., 9:142-51; and Keefe et al., 2001, J. Protein Expression Purif., 23:440-46).

Example 3

Immobilization of Polymerase

[0106] The following reaction scheme was followed for the immobilization of RNA polymerase molecules on Si(111): (a) 40% NH4F, 10 min, 25°C; (b) Cl2 gas, 20 min, 100°C; (c) mPEG, overnight, vacuum, 150°C; (d) DSC, DEIDA, DMAP, DMF, overnight, 25°C; (f) BBTO, diethyl ether, 6 h, 25°C; (g) CuSO4, ethanol, 20 min, 25°C; (h) 6x His-tagged protein incubation.

Example 4

Microcontact Printing (µCP) and Complex Formation

[0107] A 10:1 (v/v) mixture of poly(dimethylsiloxane) (PDMS) and curing agent (Sylgard 184, Dow Corning) was cast against a patterned silicon master to prepare PDMS stamps with 5 micron line features at a spacing of 3 micron and 10 micron line features at a spacing of 5 micron. The non-oxidized PDMS stamps were incubated in 10 mM Tris-HCl buffer (pH 8.0) containing 0.1 M NiCl2 for about 1 h and then dried with a nitrogen stream. The stamps were brought into contact with an NTA-terminated substrate for 3 min. After peeling off the stamp, the Ni(II)-printed substrates were incubated in about 200 µL of 25 mM Tris-HCl buffer (pH 7.5) containing 100 nM of His-T7 RNAP with ds-DNA, promoter and magnetic tags attached via streptavidin-biotin bonds for 30 min and then rinsed with 10 mM Tris-HCl buffer (pH 8.0) and Milli-Q water to remove excess protein.

Example 5

Tethering

[0108] 2.8 micron SA-conjugated beads (Dynal) and 1.0 micron biotinylated beads were diluted (1:20 and 1:200, respectively) in PBS, and mixed at room temperature for 15 min. Coverslips were coated with Ni2+-NTA HRP conjugate (Qiagen) and flow chambers were assembled by aligning together slightly separated coverslips as previously described (see, Noji et al., 1997, Nature, 386:299-302).

Example 6

Template Preparation

[0109] DNA template for sequencing by transcription was prepared by joining together a 4.6 kb phage T7 DNA fragment bearing the T7 promoter and a 0.5 kb biotinylated fragment of Lambda DNA. A 4.6 kb fragment was generated by PCR using #T7pPK13 forward primer and #T7phi17REV primer containing an XbaI recognition site at the 3' end. A 0.5 kb PCR fragment was generated by PCR using #F3 and #R3 primers in the presence of Biotin-16-dUTP (Roche). After PCR was completed, the purified PCR product was digested with NheI and cleaned up with QIAquick PCR Purification Kit (Qiagen).

[0110] After digestion of the PCR product with XbaI, the 4.6 kb piece was joined by overnight ligation at 15°C with a 0.5 kb biotinylated PCR fragment digested with NheI. The resulting ligation product of 5.1 kb was resolved using 0.7% agarose gel electrophoresis and extracted from the gel using QIAquick Gel Extraction Kit (Qiagen). This DNA was used in the transcription and sequencing experiments.

[0111] The following primers were used for PCR:

TABLE-US-00001
#T7pPK13: GCA GTA ATA CGA CTC ACT ATA GGG AGA GGG AGG GAT GGA GCC TTT AAG GAG GTC AAA TGG CTA ACG (SEQ ID NO:1; the T7 promoter sequence is underlined, the bold G is +1 and the bold C is a pause site at position +20);
#T7phi17REV: GGC A-T CTA GA-TGC ATC CCT ATG CAG TCC TAA TGC (SEQ ID NO:2; contains XbaI site);
#F3: GGC AGC TAG CTA AAC ATG GCG CTG TAC GTT TCG C (SEQ ID NO:3; contains NheI restriction site at 5' end); and
#R3: AGC CTT TCG GAT CGA ACA CGA TGA (SEQ ID NO:4).
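As a quick consistency check on the primers listed above, the restriction sites can be located programmatically. The sketch below is illustrative only; the helper `site_positions` is a hypothetical name, and the recognition sequences used (XbaI, TCTAGA; NheI, GCTAGC) are the standard ones for those enzymes. Both enzymes leave a 5'-CTAG overhang, which is what allows the XbaI-digested 4.6 kb fragment to be ligated to the NheI-digested 0.5 kb fragment.

```python
# Primer sequences as printed in the text; spacing and hyphens are stripped.
PRIMERS = {
    "T7phi17REV": "GGC A-T CTA GA-TGC ATC CCT ATG CAG TCC TAA TGC",
    "F3": "GGC AGC TAG CTA AAC ATG GCG CTG TAC GTT TCG C",
}
PRIMERS = {name: s.replace(" ", "").replace("-", "") for name, s in PRIMERS.items()}

# Standard recognition sequences for the two enzymes.
SITES = {"XbaI": "TCTAGA", "NheI": "GCTAGC"}


def site_positions(seq: str, site: str) -> list:
    """Return every 0-based index at which `site` occurs in `seq`."""
    return [i for i in range(len(seq) - len(site) + 1) if seq.startswith(site, i)]


for primer, seq in PRIMERS.items():
    for enzyme, site in SITES.items():
        print(primer, enzyme, site_positions(seq, site))
```

Each primer carries exactly one site, four bases in from its 5' end: XbaI in #T7phi17REV and NheI in #F3.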

[0112] The following table shows the reaction mixture used to prepare a 4.6 kb fragment from T7 phage containing the T7 promoter. PCR amplification was performed under the following cycling conditions: 94°C for 30'', 32 cycles at 94°C for 10'', 55°C for 30'', 65°C for 4'10'', 65°C for 10', followed by a 4°C hold.

TABLE-US-00002
Component	Volume
5x LongAmp Buffer with Mg (New England Biolabs)	60 µl
25 mM NTPs (each)	3.6 µl
10 mM #T7pPK13 (0.4 mM final)	12 µl
10 mM #T7phi17REV (0.4 mM final)	12 µl
(50 ng/µl)	6 µl
H2O	194.4 µl
LongAmp Polymerase (NEB)	12 µl
Total Reaction Volume	300 µl

[0113] The following table shows the reaction mixture used to prepare a 0.5 kb lambda fragment containing multiple biotins. PCR amplification was performed under the following cycling conditions: 94°C for 10', 32 cycles at 94°C for 10'', 55°C for 30'', 72°C for 1', 72°C for 7', followed by a hold at 4°C.

TABLE-US-00003
Component	Volume
10x TaqGold buffer w/o Mg (Applied Biosystems)	10 µl
10 µM F3	6 µl
10 µM R3	6 µl
25 mM MgCl2	10 µl
Lambda DNA (50 ng/µl)	2 µl
1 mM dGTP	10 µl
1 mM dCTP	10 µl
1 mM dATP	10 µl
1 mM dTTP	6.5 µl
1 mM Bio-16-dUTP	3.5 µl
H2O	21 µl
TaqGold Pol	5 µl
Total Reaction Volume	100 µl

Example 7

Complex Formation and Sequencing Reaction

[0114] A PEG-Cu++ functionalized glass slide (MicroSurfaces, Inc) was passivated with Buffer B+1% BSA.

[0115] The following reaction was set up at room temperature and incubated for 3 min at 37°C.

TABLE-US-00004
Component	Volume
10x Buffer A	0.5 µl
Template (5.1 kb PT7pK13-Bio DNA) 6 ng/µl, 1.93 fmoles/µl, or 2 nM (final 0.8 nM)	2 µl
10x mix of three NTP (0.3 mM ATP + 0.3 mM GTP + 0.1 mM UTP)	1 µl
4 µM His-T7RNAP (final 0.8 µM; prepared from stock by diluting in Buffer A)	1 µl
H2O	0.5 µl
Total Reaction Volume	5 µl
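The final concentrations quoted in the table follow from simple dilution arithmetic (final = stock x volume added / total volume). The short check below is only a worked example; the function name `final_concentration` and the dictionary keys are hypothetical, and the numbers are taken from the table.

```python
import math


def final_concentration(stock: float, volume_added_ul: float, total_volume_ul: float) -> float:
    """C_final = C_stock * V_added / V_total (the unit of `stock` is preserved)."""
    return stock * volume_added_ul / total_volume_ul


# Volumes in microliters, as listed in the complex-formation table.
volumes_ul = {
    "10x Buffer A": 0.5,
    "Template": 2.0,
    "10x NTP mix": 1.0,
    "His-T7RNAP": 1.0,
    "H2O": 0.5,
}
total = sum(volumes_ul.values())

print(total)                                                       # total volume, µl
print(final_concentration(2.0, volumes_ul["Template"], total))     # template, nM
print(final_concentration(4.0, volumes_ul["His-T7RNAP"], total))   # polymerase, µM
print(final_concentration(10.0, volumes_ul["10x Buffer A"], total))  # Buffer A, fold
```

The component volumes sum to the stated 5 µl, and the template and polymerase both come out at 0.8 (nM and µM respectively), matching the table; Buffer A ends up at 1x.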

[0116] 45 .mu.l of Buffer B was added to the reaction mix with T7 RNAP-DNA elongation complexes halted at position +20 of the template, and the mixture was infused into the flow cell over a period of 5 min.
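The halt at position +20 can be reproduced from the #T7pPK13 primer sequence and the three-NTP mix (ATP, GTP and UTP, with no CTP). The sketch below rests on two assumptions that are labeled in the code: that transcription starts at the base immediately following the promoter core `TAATACGACTCACTATA`, and that elongation stops at the first position requiring the missing nucleotide. The function name `first_stall_position` is hypothetical.

```python
# Forward primer #T7pPK13 (SEQ ID NO:1), which reads in the same sense as the transcript.
PRIMER_T7PPK13 = (
    "GCA GTA ATA CGA CTC ACT ATA GGG AGA GGG AGG GAT GGA GCC "
    "TTT AAG GAG GTC AAA TGG CTA ACG"
).replace(" ", "")

# Assumption: the base immediately after this promoter core is position +1.
T7_PROMOTER_CORE = "TAATACGACTCACTATA"


def first_stall_position(non_template: str, promoter_core: str, missing_base: str) -> int:
    """Return the 1-based transcript position of the first base that needs `missing_base`."""
    start = non_template.index(promoter_core) + len(promoter_core)  # index of +1
    transcript_sense = non_template[start:]
    # Assumption: the polymerase halts at the first position it cannot fill.
    return transcript_sense.index(missing_base) + 1


print(first_stall_position(PRIMER_T7PPK13, T7_PROMOTER_CORE, "C"))  # 20
```

The first cytosine downstream of +1 falls at +20, consistent with the pause site annotated in the primer sequence.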

[0117] The flow cell was washed with Buffer B, and 1 .mu.m SA magnetic beads (46 .mu.l Buffer B+0.1% BSA mixed with 6 .mu.l washed beads in Buffer B+0.1% BSA) was infused over a period of 12 min. The flow cell was washed with Buffer B+0.1% BSA.

[0118] Biotinylated 0.8 .mu.m polystyrene beads (2 .mu.l of washed beads+48 .mu.l 1x Buffer B+0.1% BSA) were infused into the flow cell and incubated for 15 min to form bi-particles with the surface-tethered magnetic SA beads. The flow cell was washed with Buffer B to remove unbound 0.8 .mu.m polystyrene beads.

[0119] Transcription/sequencing was started by infusing Buffer B+250 .mu.M NTPs+10 mM DTT into the flow cell. Four different NTP mixes, each containing a reduced amount of a different one of the four nucleotides, were used in four different flow cells.
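The four-flow-cell scheme can be sketched in Python as follows. This is a minimal illustration and not the applicant's analysis software. The reduced concentration of the limiting nucleotide is not given in the source, so LIMITING_UM is a hypothetical placeholder; the pause positions passed to merge_pauses are likewise invented example inputs, and converting measured pause events into transcript positions is outside the scope of the sketch.

```python
BASES = ("A", "C", "G", "U")
FULL_UM = 250.0     # from the text: 250 uM NTPs
LIMITING_UM = 5.0   # hypothetical; the source does not state this value


def ntp_mixes(full=FULL_UM, limiting=LIMITING_UM):
    """Build four mixes, each with a different nucleotide at the low level."""
    return {
        lim: {base: (limiting if base == lim else full) for base in BASES}
        for lim in BASES
    }


def merge_pauses(pauses_by_limiting_base, length):
    """Combine pause positions from the four flow cells into one read.

    A pause at position p in the flow cell limited for base X is taken to
    mean base X at p. Positions with no pause stay 'N'; positions claimed
    by two different bases are marked '?'.
    """
    calls = ["N"] * length
    for base, positions in pauses_by_limiting_base.items():
        for pos in positions:
            if not 1 <= pos <= length:
                raise ValueError(f"position {pos} outside 1..{length}")
            current = calls[pos - 1]
            if current == "N":
                calls[pos - 1] = base
            elif current != base:
                calls[pos - 1] = "?"
    return "".join(calls)


mixes = ntp_mixes()

# Hypothetical pause positions for a six-base stretch.
example_pauses = {"G": [1, 2, 3, 5], "A": [4, 6], "C": [], "U": []}
read = merge_pauses(example_pauses, 6)
print(read)
```

Marking conflicting positions rather than silently overwriting them keeps disagreements between flow cells visible, which seems a reasonable choice for a scheme in which each base call depends on a single flow cell.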

TABLE-US-00005
1x Buffer A	1x Buffer B
20 mM Tris pH 8.0	20 mM Tris pH 8.0
14 mM MgCl.sub.2	4 mM MgCl.sub.2
10 mM DTT	0.1 mM DTT
0.1 mM EDTA	0.1 mM EDTA
20 mM NaCl	20 mM NaCl
1.5% glycerol	
20 .mu.g/ml BSA	20 .mu.g/ml BSA
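The effect of adding 45 .mu.l of Buffer B to the 5 .mu.l complex-formation reaction (paragraph [0116]) can be estimated from the buffer compositions in TABLE-US-00005. The Python sketch below is a rough illustration that assumes the 5 .mu.l reaction is at 1x Buffer A and ignores contributions from the template, NTP and enzyme stocks; only components with unambiguous millimolar values in both buffers are included, and the function name is hypothetical.

```python
# Millimolar components of the 1x buffers, copied from TABLE-US-00005.
BUFFER_A_MM = {"Tris pH 8.0": 20.0, "MgCl2": 14.0, "DTT": 10.0, "EDTA": 0.1, "NaCl": 20.0}
BUFFER_B_MM = {"Tris pH 8.0": 20.0, "MgCl2": 4.0, "DTT": 0.1, "EDTA": 0.1, "NaCl": 20.0}


def mix(conc_a, vol_a_ul, conc_b, vol_b_ul):
    """Volume-weighted average concentration of each shared component."""
    total = vol_a_ul + vol_b_ul
    return {
        name: (conc_a[name] * vol_a_ul + conc_b[name] * vol_b_ul) / total
        for name in conc_a
    }


# 5 ul reaction (assumed 1x Buffer A) plus 45 ul Buffer B.
mixed = mix(BUFFER_A_MM, 5.0, BUFFER_B_MM, 45.0)
for name, value in mixed.items():
    print(f"{name}: {value:.2f} mM")
```

Under these assumptions, the mixture infused into the flow cell contains about 5 mM MgCl.sub.2 and about 1.1 mM DTT, while Tris, EDTA and NaCl are unchanged because both buffers contain them at the same concentration.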

[0120] It is to be understood that, while the systems, methods and compositions of matter have been described herein in conjunction with a number of different aspects, the foregoing description of the various aspects is intended to illustrate and not limit the scope of the systems, methods and compositions of matter. Other aspects, advantages, and modifications are within the scope of the following claims.

[0121] Disclosed are systems, methods and compositions that can be used for, can be used in conjunction with, can be used in preparation for, or are products of the disclosed systems, methods and compositions. These and other materials are disclosed herein, and it is understood that combinations, subsets, interactions, groups, etc. of these systems, methods and compositions are disclosed. That is, while specific reference to each of the various individual and collective combinations and permutations of these compositions and methods may not be explicitly disclosed, each is specifically contemplated and described herein. For example, if a particular system part, composition of matter or method is disclosed and discussed and a number of system parts, compositions or methods are discussed, each and every combination and permutation of the system parts, compositions and methods is specifically contemplated unless specifically indicated to the contrary. Likewise, any subset or combination of these is also specifically contemplated and disclosed.

Sequence CWU 1

Number of sequences: 4

SEQ ID NO: 1 (66 nt, DNA, Artificial Sequence, oligonucleotide)
gcagtaatac gactcactat agggagaggg agggatggag cctttaagga ggtcaaatgg ctaacg

SEQ ID NO: 2 (34 nt, DNA, Artificial Sequence, oligonucleotide)
ggcatctaga tgcatcccta tgcagtccta atgc

SEQ ID NO: 3 (34 nt, DNA, Artificial Sequence, oligonucleotide)
ggcagctagc taaacatggc gctgtacgtt tcgc

SEQ ID NO: 4 (24 nt, DNA, Artificial Sequence, oligonucleotide)
agcctttcgg atcgaacacg atga
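The declared lengths in the sequence listing can be verified against the sequences themselves, for example with the short Python check below. It is an illustrative consistency check only; the function names are hypothetical, and the sequences and lengths are copied from the listing.

```python
# (declared length, sequence as printed in blocks of ten) per SEQ ID NO.
SEQUENCES = {
    1: (66, "gcagtaatac gactcactat agggagaggg agggatggag cctttaagga ggtcaaatgg ctaacg"),
    2: (34, "ggcatctaga tgcatcccta tgcagtccta atgc"),
    3: (34, "ggcagctagc taaacatggc gctgtacgtt tcgc"),
    4: (24, "agcctttcgg atcgaacacg atga"),
}


def clean(text):
    """Remove the spacing between blocks and normalise to lower case."""
    return text.replace(" ", "").lower()


def check_listing(sequences):
    """Return a list of human-readable problems; empty if all entries agree."""
    problems = []
    for seq_id, (declared, text) in sequences.items():
        bases = clean(text)
        if len(bases) != declared:
            problems.append(
                f"SEQ ID NO: {seq_id}: declared {declared}, found {len(bases)}"
            )
        unexpected = set(bases) - set("acgt")
        if unexpected:
            problems.append(
                f"SEQ ID NO: {seq_id}: unexpected symbols {sorted(unexpected)}"
            )
    return problems


problems = check_listing(SEQUENCES)
print(problems if problems else "all declared lengths match")
```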

* * * * *