Parallel primer extension approach to nucleic acid sequence analysis Caskey, C. Thomas ; et al. [Baylor College of Medicine]

Patent Application Summary

U.S. patent application number 10/254828 was filed with the patent office on 2002-09-25 and published on 2003-05-01 for parallel primer extension approach to nucleic acid sequence analysis. This patent application is currently assigned to Baylor College of Medicine. Invention is credited to C. Thomas Caskey, Andres Metspalu, and John Shumaker.

Application Number	20030082613 10/254828
Family ID	35810584
Filed Date	2002-09-25

United States Patent Application	20030082613
Kind Code	A1
Caskey, C. Thomas ; et al.	May 1, 2003

Parallel primer extension approach to nucleic acid sequence analysis

Abstract

A method of analyzing a polynucleotide of interest, comprising providing one or more sets of consecutive oligonucleotide primers differing within each set by one base at the growing end thereof, annealing a single strand of the polynucleotide or a fragment of the polynucleotide to the oligonucleotide primers under hybridization conditions; subjecting the primers to single base extension reactions with a polymerase and terminating nucleotides, the terminating nucleotides being mutually distinguishable; and observing the location and identity of each terminating nucleotide to thereby analyze the sequence or a part of the nucleotide sequence of the polynucleotide of interest, is disclosed. An apparatus comprising a solid support to which is attached at defined locations thereon one or more sets of consecutive oligonucleotide primers differing within each set by one base at the growing end thereof is also described.

Inventors:	Caskey, C. Thomas; (Houston, TX) ; Shumaker, John; (Houston, TX) ; Metspalu, Andres; (Tartu, EE)
Correspondence Address:	HAMILTON, BROOK, SMITH & REYNOLDS, P.C. 530 VIRGINIA ROAD P.O. BOX 9133 CONCORD MA 01742-9133 US
Assignee:	Baylor College of Medicine
Family ID:	35810584
Appl. No.:	10/254828
Filed:	September 25, 2002

Related U.S. Patent Documents


Application Number	Filing Date	Patent Number
10254828	Sep 25, 2002
09711476	Nov 13, 2000
09711476	Nov 13, 2000
08564100	Mar 6, 1996	6153379
08564100	Mar 6, 1996
PCT/US94/07086	Jun 22, 1994

Current U.S. Class:	435/6.14 ; 435/91.2
Current CPC Class:	C12Q 1/6837 20130101; C12Q 1/6858 20130101; C12Q 1/6837 20130101; C12Q 1/6858 20130101; C12Q 2535/125 20130101; C12Q 2565/537 20130101; C12Q 2535/125 20130101; C12Q 1/6874 20130101; C12Q 2525/186 20130101; C12Q 2525/186 20130101; C12Q 2563/107 20130101; C12Q 2523/107 20130101; C12Q 2525/186 20130101; C12Q 2525/204 20130101; C12Q 2521/319 20130101; C12Q 1/6874 20130101; C12Q 2535/125 20130101; C12Q 1/6837 20130101
Class at Publication:	435/6 ; 435/91.2
International Class:	C12Q 001/68; C12P 019/34

Government Interests

[0001] This invention was made with Government Support under grant number 5-R01-DK31428-11 awarded by the National Institutes of Health. The United States Government has certain rights in the invention.

Foreign Application Data

Date	Code	Application Number
Jun 22, 1993	SE	SE9302152-5

Claims

What is claimed is:

1. A method of analyzing the sequence of a polynucleotide of interest, comprising the steps of: a) annealing a polynucleotide of interest to free oligonucleotide primers having known sequences of N nucleotides in length to generate annealed primers; b) subjecting the annealed primers to a single base extension reaction to extend the annealed primers by the addition of a terminating nucleotide; c) observing the identity of each terminating nucleotide that has been added to the annealed primers.

2. A method of analyzing the sequence of a polynucleotide of interest, comprising the steps of: a) annealing a polynucleotide of interest to oligonucleotide primers having known sequences of N nucleotides in length under hybridization conditions, to generate annealed primers; b) subjecting the annealed primers to a single base extension reaction which comprises providing to the annealed primers nucleotides corresponding to each of the four bases, to extend the annealed primers by the addition of a terminating nucleotide; c) observing the identity and location of each terminating nucleotide that has been added to the annealed primers.

3. A method of analyzing the sequence of a polynucleotide of interest, comprising the steps of: a) attaching an array of oligonucleotide primers having known sequences of N nucleotides in length to a solid support at known locations; b) annealing the polynucleotide of interest to the array of oligonucleotide primers to generate annealed primers; c) subjecting the annealed primers to a single base extension reaction to extend the annealed primers by the addition of a terminating nucleotide; d) observing the identity and location of each terminating nucleotide within the array on the solid support.

4. A method of analyzing the sequence of a polynucleotide of interest, comprising the steps of: a) attaching an array of oligonucleotide primers having known sequences of N nucleotides in length to a solid support at known locations; b) annealing the polynucleotide of interest to the array of oligonucleotide primers to generate annealed primers; c) subjecting the annealed primers to a single base extension reaction to extend the annealed primers by the addition of a terminating nucleotide; d) selecting a starting annealed primer; e) observing the identity and location of the terminating nucleotide which has been added to the starting annealed primer, to determine the next nucleotide in sequence; f) selecting a second annealed primer which has the same nucleotide sequence as nucleotides 2 through N of the starting annealed primer nucleotide plus the next nucleotide in sequence as determined in step (e), and g) repeating steps (e) and (f), using the second annealed primer as the starting annealed primer for each repetition, to determine the sequence of the polynucleotide of interest.

5. A method of analyzing the sequence of a polynucleotide of interest, comprising the steps of: a) attaching an array of oligonucleotide primers, having known sequences of N nucleotides in length to a solid support at defined locations; b) annealing the polynucleotide of interest to the array of oligonucleotide primers under hybridization conditions, to generate annealed primers; c) subjecting the annealed primers to a single base extension reaction which comprises providing to the annealed primers nucleotides corresponding to each of the four bases, to extend the annealed primers by the addition of a terminating nucleotide; d) observing the identity and location of each terminating nucleotide within the array on the solid support.

6. A method of analyzing the sequence of a polynucleotide of interest, comprising the steps of: a) attaching an array of oligonucleotide primers, having known sequences of N nucleotides in length to a solid support at defined locations; b) annealing the polynucleotide of interest to the array of oligonucleotide primers under hybridization conditions, to generate annealed primers; c) subjecting the annealed primers to a single base extension reaction which comprises providing to the annealed primers nucleotides corresponding to each of the four bases, to extend the annealed primers by the addition of a terminating nucleotide; d) selecting a starting annealed primer; e) observing the identity and location of the terminating nucleotide which has been added to the starting annealed primer, to determine the next nucleotide in sequence; f) selecting a second annealed primer which has the same nucleotide sequence as nucleotides 2 through N of the starting annealed primer nucleotide plus the next nucleotide in sequence as determined in step (e), and g) repeating steps (e) and (f), using the second annealed primer as the starting annealed primer for each repetition, to determine the sequence of the polynucleotide of interest.

7. The method of any one of claims 1 to 6, wherein the single base extension reaction comprises subjecting the annealed primers to a reaction mixture comprising a polymerase and nucleotides corresponding to each of the four bases.

8. The method of any one of claims 5 to 7, wherein the nucleotides corresponding to each of the four bases are mutually distinguishable.

9. The method of claim 8, wherein three of the four nucleotides are differently labelled.

10. The method of claim 9, wherein the three differently labelled nucleotides are fluorescently labelled.

11. The method of any one of claims 1 to 10, further comprising analyzing the sequence of the complementary polynucleotide of interest.

12. The method of any one of claims 1 to 11, wherein the terminating nucleotides are dideoxynucleotides.

13. The method of any one of claims 1 to 12, wherein the length N of the oligonucleotide primers is between 7 and 30 inclusive.

14. The method of any one of claims 1 to 13, wherein the length N of the oligonucleotide primers is between 20 and 24 inclusive.

15. The method of any one of claims 1, 2, 13 or 14, wherein the oligonucleotide primers comprise oligonucleotide primers of different lengths.

16. The method of any one of claims 1 to 15, wherein observing the identity and location of a terminating nucleotide comprises the use of a charge coupled device or a photomultiplier tube.

17. The method of any one of claims 3 to 14 or 16, wherein the terminating nucleotides are removed from the annealed primers after completed analysis to prepare the solid support for reuse.

18. The method of any one of claims 1 to 17, wherein the terminating nucleotides are dinucleotides.

19. An apparatus for analyzing the sequence of a polynucleotide of interest, comprising a solid support having attached thereon at defined locations an array of oligonucleotide primers having known sequences.

20. The apparatus of claim 19, wherein the oligonucleotide primers are attached to the solid support by a specific binding pair.

21. The apparatus of claim 20, wherein the specific binding pair is biotin and a molecule selected from the group consisting of: avidin and streptavidin.

Description

RELATED APPLICATIONS

[0002] This application is a Continuation of U.S. patent application Ser. No. 08/564,100, which is the U.S. National Phase Application of PCT/US94/07086, filed Jun. 22, 1994, which is a Continuation-in-Part Application of Sweden Application No. SE 9302152-5, filed on Jun. 22, 1993. The entire teachings of the above applications are incorporated herein by reference.

BACKGROUND OF THE INVENTION

[0003] Today, there are two predominant methods for DNA sequence determination: the chemical degradation method (Maxam and Gilbert, Proc. Natl. Acad. Sci., 74:560-564 (1977)), and the dideoxy chain termination method (Sanger et al., Proc. Natl. Acad. Sci., 74:5463-5467 (1977)). Most automated sequencers are based on the chain termination method utilizing fluorescent detection of product formation. There are two common variations of these systems: (1) dye-labeled primers to which deoxynucleotides and dideoxynucleotides are added, and (2) primers to which deoxynucleotides and fluorescently labeled dideoxynucleotides are added. In addition, the labeled deoxynucleotides can be used in conjunction with unlabeled dideoxynucleotides. This method is based upon the ability of an enzyme to add specific nucleotides onto the 3' hydroxyl end of a primer annealed to a template. The base pairing property of nucleic acids determines the specificity of nucleotide addition. The extension products are separated electrophoretically on a polyacrylamide gel and detected by an optical system utilizing laser excitation.

[0004] Although both the chemical degradation method and the dideoxy chain termination method are in widespread use, there are many associated disadvantages: for example, the methods require gel-electrophoretic separation. Typically, only 400-800 base pairs can be sequenced from a single clone. As a result, the systems are both time- and labor-intensive. Methods avoiding gel separation have been developed in attempts to increase the sequencing throughput.

[0005] Methods have been proposed by Crkvenjakov (Drmanac et al., Genomics, 4:114 (1989); Strezoska et al., Proc. Natl. Acad. Sci. USA, 88:10089 (1991); Drmanac et al., Science, 260:1649 (1991)) and Bains and Smith (Bains and Smith, J. Theoretical Biol., 135:303 (1988)). These sequencing by hybridization (SBH) methods potentially can increase the sequence throughput because multiple hybridization reactions are performed simultaneously. This type of system utilizes the information obtained from multiple hybridizations of the polynucleotide of interest, using short oligonucleotides to determine the nucleic acid sequence (Drmanac, U.S. Pat. No. 5,202,231). Reconstructing the sequence requires an extensive computer search algorithm to determine the optimal order of all fragments obtained from the multiple hybridizations.

[0006] These methods are problematic in several respects. For example, the hybridization is dependent upon the sequence composition of the duplex of the oligonucleotide and the polynucleotide of interest, so that GC-rich regions are more stable than AT-rich regions. As a result, false positives and false negatives during hybridization detection are frequently present and complicate sequence determination. Furthermore, the sequence of the polynucleotide is not determined directly, but is inferred from the sequence of the known probe, which increases the possibility for error. A great need remains to develop efficient and accurate methods for nucleic acid sequence determination.

SUMMARY OF THE INVENTION

[0007] The current invention pertains to methods for analyzing, and particularly for sequencing, a polynucleotide of interest, and an apparatus useful in analyzing a polynucleotide of interest. In one embodiment of the current invention, the nucleotide sequence of a polynucleotide of interest is analyzed for the presence of mutations or alterations. In a second embodiment of the current invention, the nucleotide sequence of a polynucleotide of interest, for which the nucleotide sequence was not known previously, is determined. The method comprises detecting single base extension events of a set of specific oligonucleotide primers, such that the label and position of each separate extension event defines a base in a polynucleotide of interest.

[0008] In one method of the current invention, a solid support is provided. An array of a set or several sets of consecutive oligonucleotide primers of a specified size having known sequences is attached at defined locations to the solid support. The oligonucleotide primers differ within each set by one base pair. The oligonucleotide primers either correspond to at least a part of the nucleotide sequence of one strand of the polynucleotide of interest, if the sequence is known, or represent a set of all possible nucleotide sequences for oligonucleotide primers of the specified size, if the sequence is not known. A polynucleotide of interest, which may be DNA or RNA, or a fragment of the polynucleotide of interest, is annealed to the array of oligonucleotide primers under hybridization conditions, thereby generating "annealed primers." The annealed primers are subjected to single base extension reaction conditions, under which a nucleic acid polymerase and terminating nucleotides, such as dideoxynucleotides (ddNTPs) corresponding to the four known bases (A, G, T and C), are provided to the annealed primers. The terminating nucleotides can also comprise a terminating string of known polynucleotides, such as dinucleotides. As a result of the single base extension reaction, extended primers are generated, in which a terminating nucleotide is added to each of the annealed primers. The terminating nucleotides can be provided to the annealed primers either simultaneously or sequentially. The terminating nucleotides are mutually distinguishable; i.e., at least one of the nucleotides is labeled to facilitate detection. After addition of the terminating nucleotides, the sequence of the polynucleotide of interest is analyzed by "reading" the oligonucleotide array: the identity and location of each terminating nucleotide within the array on the solid support is observed. The label and position of each terminating nucleotide on the solid support directly defines the sequence of the polynucleotide of interest that is being analyzed.
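As an illustration of how reading the array can yield a sequence, the stepwise selection recited in claims 4 and 6 (take a starting primer, read its terminating nucleotide, then move to the primer made of nucleotides 2 through N plus that nucleotide) can be sketched in Python. This is a minimal sketch under stated assumptions, not the applicants' implementation: the `readout` dictionary, the function name `walk_sequence`, and the toy target sequence are hypothetical, and the sketch assumes every N-mer occurs only once in the target, which a real polynucleotide need not satisfy.

```python
def walk_sequence(readout, start_primer):
    """Follow extended primers one base at a time.

    readout maps each primer sequence on the array (written 5'->3') to the
    terminating nucleotide observed at that primer's location.
    Hypothetical data structure, not defined in the source.
    """
    sequence = start_primer
    primer = start_primer
    visited = set()  # guards against looping on a repeated N-mer
    while primer in readout and primer not in visited:
        visited.add(primer)
        base = readout[primer]       # step (e): observed terminator
        sequence += base
        primer = primer[1:] + base   # step (f): nucleotides 2..N plus new base
    return sequence


# Toy readout built from an invented 11-base strand with N = 4.
target = "ACGTTGATCCA"
N = 4
readout = {target[i:i + N]: target[i + N] for i in range(len(target) - N)}
reconstructed = walk_sequence(readout, "ACGT")
```

In this toy case the walk starts at `ACGT` and stops when it reaches a primer with no observed extension, returning the full invented strand.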

[0009] In a second method of the current invention, the polynucleotide of interest is analyzed for the presence of specific mutations through the use of oligonucleotide primers that are not attached to a solid support. The oligonucleotide primers are tailored to anneal to the polynucleotide of interest at a point immediately preceding the mutation site(s). If more than one mutation site is examined, the oligonucleotide primers are designed to be mutually distinguishable: in a preferred embodiment, the oligonucleotide primers have different mobilities during gel electrophoresis. For example, oligonucleotides of different lengths are used. After the oligonucleotide primers are annealed to the polynucleotide of interest, the annealed primers are subjected to single base extension reaction conditions, resulting in extended primers in which terminating nucleotides are added to each of the annealed primers. As in the first method of the current invention, the terminating nucleotides are mutually distinguishable. After addition of the terminating nucleotides, the sequence of the polynucleotide of interest is analyzed by eluting the extended primers, performing gel electrophoresis, and "reading" the gel: the identity and location of each terminating nucleotide on the gel is observed using standard methods, such as with an automated DNA sequencer. The label and position of each terminating nucleotide on the gel directly defines the sequence of the polynucleotide of interest that is being analyzed, and indicates whether a mutation is present.

[0010] The apparatus of the current invention comprises a solid support having an array of one or more sets of consecutive oligonucleotide primers with known sequences attached to it at defined locations, each oligonucleotide primer differing within each set by one base pair. The set of oligonucleotide primers either corresponds to at least a part of the nucleotide sequence of one strand of the polynucleotide of interest, if the sequence is known, or represents all possible nucleotide sequences for oligonucleotide primers of the specified size, if the sequence is not known.

[0011] The current invention provides both direct information, due to the detection of a specific nucleotide addition, and indirect information, due to the known sequence of the annealed primer to which the specific base addition occurred, for the polynucleotide of interest. The ability to determine nucleic acid sequences is a critical element of understanding gene expression and regulation. In addition, as advances in molecular medicine continue, sequence determination will become a more important element in the diagnosis and treatment of disease.

BRIEF DESCRIPTION OF THE DRAWINGS

[0012] FIG. 1 depicts an example of a set of oligonucleotide primers (sense primers, SEQ ID NOS: 2-11, antisense primers, SEQ ID NOS: 12-21), comprising consecutive primers differing by one base pair at the growing end and capable of hybridizing successively along the relevant part(s) or the whole of the polynucleotide of interest (SEQ ID NOS: 1 and 22).

[0013] FIG. 2 is a schematic illustration of a single strand template bound to a primer which is in turn attached to a solid support.

[0014] FIG. 3 illustrates a set of consecutive oligonucleotide primers for a part of the polynucleotide of interest following immediately after the primer illustrated in FIG. 2.

[0015] FIG. 4 illustrates the single base pair additions to all the primers illustrated in FIG. 3, as well as the corresponding additions for the corresponding primers related to the complementary strand of the polynucleotide of interest.

[0016] FIG. 5 is a graphic depiction of the length of extended primers formed utilizing free oligonucleotide primers annealed to a polynucleotide of interest.

[0017] FIGS. 6A, 6B and 6C are graphic depictions of electrophoretograms demonstrating the detection of the presence of a mutation in a polynucleotide of interest.

[0018] FIGS. 7A, 7B and 7C depict the results of a DNA chip-based analysis for a five-base region within the third exon of the HPRT gene.

DETAILED DESCRIPTION OF THE INVENTION

[0019] The current invention pertains to methods for analyzing the nucleotide sequence of a polynucleotide of interest. The method comprises hybridizing all or a fragment of a polynucleotide of interest to oligonucleotide primers, conducting single base extension reactions, and detecting the single base extension events. The method can be used to analyze the sequence of a polynucleotide of interest by examining the sequence for the presence of mutations or alterations in the nucleotide sequence, or by determining the nucleotide sequence of a polynucleotide of interest.

[0020] As used herein, the term "polynucleotide of interest" refers to the particular polynucleotide for which sequence information is wanted. Representative polynucleotides of interest include oligonucleotides, DNA or DNA fragments, RNA or RNA fragments, as well as genes or portions of genes. The polynucleotide of interest can be single- or double-stranded. The term "template polynucleotide of interest" is used herein to refer to the strand which is analyzed, if only one strand of a double-stranded polynucleotide is analyzed, or to the strand which is identified as the first strand, if both strands of a double-stranded polynucleotide are analyzed. The term, "complementary polynucleotide of interest" is used herein to refer to the strand which is not analyzed, if only one strand of a double-stranded polynucleotide is analyzed, or to the strand which is identified as the second strand (i.e., the strand that is complementary to the first (template) strand), if both strands of a double-stranded polynucleotide are analyzed. Either one of the two strands can be analyzed. In a preferred embodiment, both strands of a double-stranded polynucleotide of interest are analyzed in order to verify sequence information obtained from the template (first) strand by comparison with the complementary (second) strand. Nevertheless, it is not always necessary to analyze both strands. For example, if the polynucleotide of interest is being analyzed for the presence of a single base mutation, and not for the complete base sequence in the mutation region, it is sufficient to analyze a single strand of the polynucleotide of interest.

[0021] The methods of the current invention can be used to identify the presence of mutations or alterations in the nucleotide sequence of a polynucleotide of interest. To identify mutations or alterations, the sequence of the polynucleotide of interest is compared with the sequence of the native or normal polynucleotide. An "alteration" in the polynucleotide of interest, as used herein, refers to a deviation from the expected sequence (the sequence of the native or normal polynucleotide), including deletions, insertions, point mutations, frame-shifts, expanded oligonucleotide repeats, or other changes. The portion of the polynucleotide of interest that contains the alteration is known as the "altered" region. The methods can also be used to determine the sequence of a polynucleotide of interest having a previously unknown nucleotide sequence.

[0022] In one embodiment of the current invention, the polynucleotide of interest is analyzed by annealing the polynucleotide to an array comprising sets of oligonucleotide primers. The oligonucleotide primers in the array have a length N, where N is from about 7 to about 30 nucleotides, inclusive, and is preferably from 20 to 24 nucleotides, inclusive. Each oligonucleotide primer within each set differs by one base pair. The oligonucleotide primers can be prepared by conventional methods (see Sambrook et al., Molecular Cloning: A Laboratory Manual (2nd Ed, 1989)). The sets of oligonucleotide primers are arranged into an array, such that the position and nucleotide content of each oligonucleotide primer on the array is known.

[0023] The size and nucleotide content of the oligonucleotide primers in the array depend on the polynucleotide of interest and the region of the polynucleotide of interest for which sequence information is desired. To analyze a polynucleotide of interest for the presence of alterations, consecutive primers differing by one base pair at the growing end and capable of hybridizing successively along the relevant part(s) or the whole of the polynucleotide are used. An example of such a primer set is shown in FIG. 1. If only one or a few specific positions of the polynucleotide sequence are examined for alterations, the necessary array of oligonucleotide primers covers only the mutation regions, and is therefore small. If the whole or a major part of the polynucleotide of interest is to be analyzed for possible mutations at varying positions, the necessary array is larger. For example, the whole hypoxanthine-guanine phosphoribosyl-transferase (HPRT) gene can be covered by 900 primers, arranged in a 30×30 array; the whole p53 gene requires 700 primers. If both strands of a double-stranded polynucleotide of interest are analyzed for the presence of alterations, the array comprises consecutive oligonucleotide primers for the suspected mutation region of both the template polynucleotide of interest and the complementary polynucleotide of interest. If the polynucleotide of interest has not been sequenced previously, the array includes oligonucleotide primers comprising all possible N-mers.
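The array sizes discussed here can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the function names, the row-by-row placement of primers on a square grid, and the example strand are hypothetical and are not taken from the application. The strand passed in is the primer-sense sequence written 5' to 3', so each primer is followed by the base that its extension would report.

```python
def consecutive_primers(strand, n):
    """N-mers offset by one base along a known strand (5'->3').

    The last window is omitted so every primer has a following base
    that single base extension can interrogate.
    """
    return [strand[i:i + n] for i in range(len(strand) - n)]


def grid_position(index, side):
    """Row and column of the primer with a given index, filling a
    side x side array row by row (an assumed layout)."""
    return divmod(index, side)


def count_all_nmers(n):
    """Number of distinct primers needed to cover every possible N-mer."""
    return 4 ** n


primers = consecutive_primers("ACGTTGATCCA", 4)   # invented example strand
last_spot = grid_position(899, 30)                # last of 900 primers on a 30 x 30 array
smallest_full_array = count_all_nmers(7)          # N = 7, the shortest primer length named
```

Under these assumptions, an array of all possible 7-mers already needs 16,384 primers, and the count grows fourfold with each added base, which is why an array tailored to a known gene can be far smaller than one built for an unknown sequence.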

[0024] The array of sets of oligonucleotide primers is immobilized to a solid support at defined locations (i.e., known positions). The immobilized array is referred to as a "DNA chip," which is the apparatus of the current invention. The solid support can be a plate or chip of glass, silicon, or other material. The solid support can also be coated, such as with gold or silver. Coating may facilitate attachment of the oligonucleotide primers to the surface of the solid support. The oligonucleotide primers can be bound to the solid support by a specific binding pair, such as biotin and avidin or biotin and streptavidin. For example, the primers can be provided with biotin handles in connection with their preparation, and then the biotin-labeled primers can be attached to a streptavidin-coated support. Alternatively, the primers can be bound by a linker arm, such as a covalently bonded hydrocarbon chain, such as a C10-20 chain. The primers can also be bound directly to the solid support, such as by epoxide/amine coupling chemistry (see Eggers, M. D. et al., Advances in DNA Sequencing Technology, SPIE conference proceedings, Jan. 21, 1993). The solid support can be reused, as described in greater detail below.

[0025] In another embodiment of the invention, the polynucleotide of interest is analyzed by annealing the polynucleotide to one or more specific oligonucleotide primers that are not attached to a solid support; such oligonucleotide primers are referred to herein as "free oligonucleotide primers." If free oligonucleotide primers are used, the polynucleotide of interest can be attached to a solid support, such as magnetic beads. The free oligonucleotide primers have a length N, as described above, and are prepared by conventional methods (see Sambrook et al., Molecular Cloning: A Laboratory Manual (2nd Ed, 1989)). The size and nucleotide content of the free oligonucleotide primers depend on the polynucleotide of interest and the region of the polynucleotide of interest for which sequence information is desired. To analyze a polynucleotide of interest for the presence of alterations, primers capable of hybridizing immediately adjacent to the relevant part(s) of the polynucleotide are used. If more than one position of the polynucleotide sequence is examined for alterations, the free oligonucleotide primers are mutually distinguishable: i.e., the oligonucleotide primers have different mobilities during gel electrophoresis. In a preferred embodiment, oligonucleotides of different lengths are used. For example, an oligonucleotide primer of 10 nucleotides in length is designed to hybridize immediately adjacent to one putative mutation, and an oligonucleotide primer of 12 nucleotides in length is designed to hybridize immediately adjacent to a second putative mutation. Because the oligonucleotide primers are of different lengths, they will migrate to different positions on the gel. Thus, in this manner, the nucleotide content of each oligonucleotide primer can be identified by the position of the oligonucleotide primer on the gel.
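The length-based bookkeeping in the 10-mer and 12-mer example can be sketched as follows. This is a simplified illustration, assuming that gel position can be reduced to product length in nucleotides and that each extended primer is exactly one nucleotide longer than its primer; the function name `call_sites`, the site names, and the observed bases are hypothetical.

```python
def call_sites(primer_lengths, observations):
    """Assign each observed extension product to a mutation site.

    primer_lengths: {site name: primer length in nucleotides}
    observations:   list of (product length, terminating base) pairs,
                    as might be read from a gel (hypothetical format).
    """
    # A single base extension makes each product one nucleotide longer
    # than the primer it came from.
    site_by_product_length = {
        length + 1: site for site, length in primer_lengths.items()
    }
    if len(site_by_product_length) != len(primer_lengths):
        raise ValueError("primers must have distinct lengths to be distinguishable")
    return {
        site_by_product_length[length]: base
        for length, base in observations
        if length in site_by_product_length
    }


calls = call_sites(
    {"site_1": 10, "site_2": 12},   # primer lengths from the example above
    [(11, "A"), (13, "G")],         # invented gel readout
)
```

The check for distinct lengths mirrors the requirement that the free primers be mutually distinguishable: two primers of the same length would give products at the same gel position.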

[0026] The polynucleotide of interest is hybridized to the array of oligonucleotide primers, or to the free oligonucleotide primers, under high stringency conditions, so that an exact match between the polynucleotide of interest and the oligonucleotide primers is obtained, without any base-pair mismatches (see Sambrook et al., Molecular Cloning: A Laboratory Manual (2nd Ed, 1989)). For example, a hypothetical polynucleotide of interest annealed to an oligonucleotide primer that is attached to a solid support is shown schematically in FIG. 2. In FIG. 2, a part of the sequence of the polynucleotide of interest that follows immediately after the portion of the polynucleotide that is bound to the oligonucleotide primer on the array is shown as TGCAACTA. Six corresponding consecutive primers are shown in FIG. 3, i.e., primers ending with the pairing bases A, AC, ACG, etc. If the polynucleotide of interest is double-stranded, it can be separated into two single strands either before or after the binding of the polynucleotide of interest to the array oligonucleotide primers. Both the template and the complementary polynucleotide of interest can be analyzed utilizing a single array. Thus, while not shown in FIG. 2, appropriate primers corresponding to the complementary polynucleotide of interest are also attached to the solid support in known positions.

[0027] When the polynucleotide of interest is hybridized to the array of sets of oligonucleotide primers, or to the free oligonucleotide primers, under hybridization conditions, annealed primers are formed. The term, "annealed primer," as used herein, refers to an oligonucleotide primer (either free or attached to a solid support) to which a polynucleotide of interest is hybridized. The annealed primers are subjected to a single base extension reaction. The "single base extension reaction," as used herein, refers to a reaction in which the annealed primers are provided with a reaction mixture comprising a DNA polymerase, such as T7 polymerase, and terminating nucleotides under conditions such that single terminating nucleotides are added to each of the annealed primers. The term "terminating nucleotides," as used herein, refers to either single terminating nucleotides, or units of nucleotides, the units preferably being dinucleotides. In a preferred embodiment, the terminating nucleotides are single dideoxynucleotides. The terminating nucleotides can comprise standard nucleotides, and/or nucleotide analogues. The terminating nucleotide added to each annealed primer is thus a base pairing with the template base on the polynucleotide of interest, and is added immediately adjacent to the growing end of the respective primer. An oligonucleotide primer to which a terminating nucleotide has been added through the single base extension reaction is termed an "extended primer." Thus, as schematically shown for both strands of the hypothetical polynucleotide of interest in FIG. 4, a single nucleotide is added to each primer in the array; the primer set related to the strand illustrated in FIG. 2 is shown to the left in FIG. 4, and the other (complementary) strand is shown to the right. The nucleotides added are shown in extra bold type.
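The base-pairing rule behind the single base extension reaction can be illustrated with the TGCAACTA region mentioned for FIG. 2. The following Python sketch is an idealized model, not a description of the enzymology: it assumes perfect high-stringency annealing, treats the template region as written in the orientation that pairs position by position with the primer ends, and uses hypothetical names throughout. Only the primer ends A, AC and ACG are stated in the text; the remaining three are inferred here from the same pairing rule, since the figures are not reproduced.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq):
    """Base-by-base complement, without reversing the sequence."""
    return "".join(COMPLEMENT[base] for base in seq)


def single_base_extension(primer_end, template_region):
    """Return the terminating nucleotide added to a primer, or None.

    primer_end:      bases at the growing end of the primer that pair with
                     the start of template_region.
    template_region: template bases immediately after the portion bound by
                     the anchoring primer, in pairing orientation.
    """
    k = len(primer_end)
    if k >= len(template_region):
        return None  # no template base left to direct an addition
    if complement(template_region[:k]) != primer_end:
        return None  # mismatch: assumed not to anneal under high stringency
    return COMPLEMENT[template_region[k]]


template_region = "TGCAACTA"
primer_ends = ["A", "AC", "ACG", "ACGT", "ACGTT", "ACGTTG"]
added = [single_base_extension(end, template_region) for end in primer_ends]
```

In this model each added nucleotide equals the extra base carried by the next consecutive primer, which is what lets the label at one array location be checked against the known sequence of its neighbor.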

[0028] The terminating nucleotides preferably comprise dNTPs, and particularly comprise dideoxynucleotides (ddNTPs), but other terminating nucleotides apparent to the skilled person can also be used. If the terminating nucleotides are single nucleotides, then nucleotides corresponding to each of the four bases (A, T, G and C) are utilized in the single base extension reaction. If the terminating nucleotides are dinucleotide units, for example, then nucleotides corresponding to each of the sixteen possible dinucleotides are utilized.
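
The counts of four single nucleotides and sixteen dinucleotide units follow from enumerating every combination of the four bases. A minimal sketch of that enumeration is given here for illustration; the function name terminating_units is hypothetical and is not part of the disclosure.

```python
from itertools import product

BASES = "ATGC"  # the four bases named in the text


def terminating_units(unit_length):
    """Return every possible terminating unit of the given length."""
    return ["".join(p) for p in product(BASES, repeat=unit_length)]


single_nucleotides = terminating_units(1)
dinucleotides = terminating_units(2)
print(len(single_nucleotides), len(dinucleotides))
```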

[0029] The nucleotides are mutually distinguishable. For example, if the solid support is coated with a free electron metal, such as with gold or silver, surface plasmon resonance (SPR) microscopy allows identification of each nucleotide, by the change of the refractive index at the surface caused by each base extension. Alternatively, at least one of the terminating nucleotides is labeled by standard methods to facilitate detection. Suitable labels include fluorescent dyes, chemiluminescence, and radionuclides. The number of nucleotides that are labeled can be varied. It is sufficient to use three labeled terminating nucleotides, the fourth terminating nucleotide being identified by its "non-label," if single nucleotides are added in the base extension reaction. For example, if one is examining the polynucleotide of interest for the presence of a particular alteration, and not for the complete base sequence in the altered region, three labeled terminating nucleotides are sufficient. Fewer than three labels can also be utilized under appropriate circumstances. An exemplification of the use of two labeled and two unlabelled dNTPs is described below. If a specific alteration is to be investigated, such as a point mutation, only the native or normal nucleotide need be labeled, as a mutation would be indicated by the presence of the "non-label." Alternatively, the expected mutant nucleotide can also be labeled.

[0030] After the single base extension reaction has been performed, the identity and location of each terminating nucleotide is observed. If free oligonucleotide primers are used, the extended primers are eluted and separated by gel electrophoresis, and the gel is then analyzed. If oligonucleotide primers attached to an array are used, the array itself is analyzed. The gel or array is analyzed by detecting the labeled, terminating nucleotides bound to the oligonucleotide primers. The labeled, terminating nucleotides are detected by conventional methods, such as by an optical system. For example, a laser excitation source can be used in conjunction with a filter set to isolate the fluorescence emission of a particular type of terminating nucleotide. A photomultiplier tube, a charge-coupled device (CCD), or another suitable fluorescence detection method can be used to detect the emitted light from fluorescent terminating nucleotides.

[0031] The sequence of the polynucleotide of interest can be analyzed from the label pattern observed on the array or on the gel, since the position of each different primer on the array or on the gel is known, and since the identity of each terminating nucleotide can be determined by its specific label. The label and position of each terminating nucleotide either within the array or on the gel will directly define the sequence of the polynucleotide of interest that is being analyzed. Mutations or alterations in the sequence of the polynucleotide of interest are indicated by alterations in the expected label pattern. For example, assume that the nucleotide sequence shown in FIG. 2 contains a mutation: the third base C from the left is replaced by a G in the polynucleotide of interest. The top primer in FIG. 3 will still be extended by a C as shown in FIG. 4, whereas the next primer will be extended by a C rather than a G. Since this new, unexpected base C can be identified by its specific label and the respective primer location is known, the corresponding base mutation is identified as G.
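
The mutation example above can be traced with a short sketch. It is an idealized illustration only: it assumes that primer i pairs with the first i bases of the downstream template region TGCAACTA and is extended by the base pairing with the next template base, it writes all sequences in the left-to-right order of the figures without modeling strand polarity, and it ignores any effect of the mismatch on hybridization of later primers. The function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def extension_bases(downstream_template, n_primers):
    """For primer i (pairing with the first i template bases), return the
    terminating base that pairs with the next template base."""
    return [COMPLEMENT[downstream_template[i]] for i in range(1, n_primers + 1)]


def find_changes(expected, observed):
    """Return (primer_number, expected_base, observed_base) for each mismatch."""
    return [
        (i, e, o)
        for i, (e, o) in enumerate(zip(expected, observed), start=1)
        if e != o
    ]


normal = "TGCAACTA"   # downstream template region from FIG. 2
mutant = "TGGAACTA"   # third base C replaced by G

expected = extension_bases(normal, 6)
observed = extension_bases(mutant, 6)
print(expected)                          # ['C', 'G', 'T', 'T', 'G', 'A']
print(find_changes(expected, observed))  # [(2, 'G', 'C')]
```

The single reported change, at the second primer, is an added C where a G was expected, which corresponds to a G in the template at the third position.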

[0032] The following simple example illustrates the ability to obtain complete sequence information and to identify a mutation in a representative polynucleotide of interest. The example utilizes two labeled terminating nucleotides, which give complete sequence information.

[0033] Assume a normal polynucleotide of the following base pair composition:

+ACTGCTTAG
-TGACGAATC

[0034] and a corresponding mutant polynucleotide having the following base pair composition, which has a single base mutation in the third base pair:

+ACCGCTTAG
-TGGCGAATC.

[0035] Using fluorescent labeling, for example, with a red label ("R") for terminating A and a green label ("G") for terminating G, and no label, i.e., null ("N") for the remaining bases T and C, the following "binary" codes allowing sequence interpretation would be obtained for the normal, mutant and heterozygote sequences, respectively:

+N-G-R-N-G-R-R-N-N      Normal
-R-N-N-G-N-N-N-R-G

+N-G-G-N-G-R-R-N-N      Mutant (Affected)
-R-N-N-G-N-N-N-R-G

     R                  Heterozygote (Carrier)
+N-G-G-N-G-R-R-N-N
-R-N-N-G-N-N-N-R-G.
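
The codes listed above can be reproduced with a short sketch. It rests on one assumption that is not stated explicitly in the text: each code position reports the label of the terminating base that pairs with the corresponding base of the listed strand. Under that assumption the sketch yields the same codes as listed. The function names, and the "R/G" notation for a position carrying both labels, are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
# Red for terminating A, green for terminating G, null for T and C.
LABEL = {"A": "R", "G": "G", "T": "N", "C": "N"}


def label_code(template):
    """Label code of the terminating bases added against each template base."""
    return "-".join(LABEL[COMPLEMENT[base]] for base in template)


def heterozygote_code(template_a, template_b):
    """Combined code when both alleles are present; a position that carries
    two different labels is written as 'X/Y'."""
    positions = []
    for a, b in zip(template_a, template_b):
        seen = []
        for base in (a, b):
            label = LABEL[COMPLEMENT[base]]
            if label != "N" and label not in seen:
                seen.append(label)
        positions.append("/".join(seen) if seen else "N")
    return "-".join(positions)


print(label_code("ACTGCTTAG"))                      # N-G-R-N-G-R-R-N-N
print(label_code("ACCGCTTAG"))                      # N-G-G-N-G-R-R-N-N
print(label_code("TGACGAATC"))                      # R-N-N-G-N-N-N-R-G
print(heterozygote_code("ACTGCTTAG", "ACCGCTTAG"))  # N-G-R/G-N-G-R-R-N-N
```

Because T and C are both unlabeled, the minus-strand code is identical for the normal and mutant sequences; in this example the mutation is visible only in the plus-strand code.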

[0036] The presence of such a point mutation will affect the base pairing of the next few oligonucleotide primers to the polynucleotide of interest, and thereby the primer extensions obtained, such that the bases in the vicinity of the mutation (i.e., in the altered region) may not be accurately identified. To optimize identification of bases in the altered region, it is preferred to analyze both strands of such a double-stranded polynucleotide of interest. The few bases that may be difficult to identify on the template polynucleotide of interest, as well as the changed base, will be identified by the base extensions of the primers for the complementary polynucleotide of interest, as the analysis of the complementary polynucleotide of interest approaches the mutation site from the opposite direction. In the nearest regions on either side of the alteration, the sequence determination is thereby provided by the oligonucleotide primers for one of the two strands.

[0037] The sequence of a polynucleotide of interest for which the sequence is previously unknown can be determined using methods similar to those described above in reference to identification of mutations utilizing an array of oligonucleotide primers. As before, the positions of the terminating nucleotides within the array will directly define the sequence position of each nucleotide in the polynucleotide of interest.

[0038] To determine the sequence of the polynucleotide of interest, one annealed primer is selected to be the "starting" annealed primer; it is supposed for purposes of analysis that the sequence of the polynucleotide of interest "starts" with this primer. The nucleotide which has been added to the starting annealed primer is detected using standard methods. A second annealed primer, which has the same nucleotide sequence as the starting annealed primer, minus the 5' nucleotide and with the addition of the added nucleotide, is then selected. The terminating nucleotide which has been added to the second annealed primer is detected. These steps are then repeated, using the second annealed primer as the "starting" annealed primer in each repetition, until the sequence of the polynucleotide of interest is determined. For example, if the oligonucleotide primers are 10 nucleotides in length (N=10), the starting annealed primer is chosen to correspond to the first ten bases of the sequence. The terminating nucleotide of the starting annealed primer is then determined. Next, bases 2-11 (i.e., bases 2-10 of the starting annealed primer plus the terminating nucleotide extension) are matched to another annealed primer. This primer is the second annealed primer. The terminating nucleotide of the second annealed primer is then determined. These steps are repeated to determine the complete sequence. In this manner, the single base extension reaction automatically links together the set of annealed primers.
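
The chaining procedure can be sketched as follows. This is an illustrative outline under stated assumptions, not the disclosed implementation: the observed results are assumed to be available as a mapping from each annealed primer sequence to the terminating nucleotide added to it, the 16-base target sequence is hypothetical, and the sequence is read in the sense of the primers rather than of the template.

```python
def assemble_sequence(extensions, start_primer):
    """Chain annealed primers: drop the 5' base of the current primer, append
    the base that was added to it, and look up the resulting primer."""
    sequence = start_primer
    current = start_primer
    visited = set()  # guards against looping on repeated primer sequences
    while current in extensions and current not in visited:
        visited.add(current)
        added = extensions[current]
        sequence += added
        current = current[1:] + added
    return sequence


# Hypothetical data: primers of length N = 10 tiling a 16-base sequence.
N = 10
target = "ACTGCTTAGGCATCGA"
extensions = {target[i:i + N]: target[i + N] for i in range(len(target) - N)}

print(assemble_sequence(extensions, target[:N]))  # ACTGCTTAGGCATCGA
```

The walk stops when the next primer has no recorded extension. A primer sequence that recurs within the target would make the chain ambiguous, which is why the sketch halts on a revisited primer.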

[0039] After analysis of the nucleotide sequence of a polynucleotide of interest, the polynucleotide of interest and the terminating nucleotides can be removed from the DNA chip, so that the chip can be reused. In a preferred embodiment, the added terminating nucleotides are capable of being removed from the solid support after analysis of the polynucleotide of interest has been completed. Once the nucleotides are removed, the solid support with the immobilized oligonucleotide primers can be used for a new analysis. The nucleotides can be removed using standard methods, such as enzymatic cleavage or chemical degradation. Enzymatic cleavage, for example, would use a terminating nucleotide which can be removed by an enzyme. The single base extension reaction could result in addition to the oligonucleotide primers of RNA dideoxyTTP or RNA dideoxyCTP by reverse transcriptase or other polymerase. A C/T cleavage enzyme, such as RNase A, can then be used to "strip" off the RNA dideoxynucleotides. Alternatively, sulfur-containing dideoxy-A or dideoxy-G can be used during the single base extension reaction; a sulfur-specific esterase, which does not cleave phosphates, can then be used to cleave off the dideoxynucleotides. For chemical degradation, a chemically degradable terminating nucleotide can be used. For example, a modified ribonucleotide having its 2'- and 3'-hydroxyl groups esterified, such as by acetyl groups, can be used. After binding of the terminating nucleotide to the annealed primer, the acetyl groups are removed by treatment with a base to expose the 2'- and 3'-hydroxyl groups. The ribose residue can then be degraded by periodate oxidation, and the residual phosphate group removed from the annealed primer by treatment with a base and alkaline phosphatase.

[0040] The method and apparatus of the current invention have uses in detecting mutations, deletions, expanded oligonucleotide repeats, and other genetic abnormalities. For example, the current invention can be used to identify frame shifting mutations caused by insertions or deletions. Furthermore, carrier status of heritable diseases, such as cystic fibrosis, β-thalassemia, α-1, Gaucher's disease, Tay-Sachs disease, or Lesch-Nyhan syndrome, can be easily determined using the current invention, because both the normal and the altered signals would be detected. Furthermore, mixtures of DNA molecules, such as occur in HIV-infected patients with drug resistance, can be determined. HIV may develop resistance against drugs like AZT by point mutations in the nucleotide sequence of a reverse transcriptase (RT) gene. When mutated viruses start to appear in the virus population, both the mutated gene and the normal (wild type) gene can be detected. The greater the proportion of the mutant, the greater the signal from the corresponding mutant terminating nucleotide.

[0041] The current invention is further exemplified by the following Examples.

EXAMPLE 1

Analyzing the Sequence of a Polynucleotide of Interest Utilizing Free Oligonucleotide Primers

[0042] An analysis of the hypoxanthine-guanine phosphoribosyl-transferase (HPRT) gene (the polynucleotide of interest) was conducted for three individuals (Patients A, B, and C).

[0043] A. Obtaining the Polynucleotide of Interest

[0044] The polymerase chain reaction (see Sambrook et al., Molecular Cloning: A Laboratory Manual (2nd Ed, 1989), especially Chapter 14) was utilized to amplify the polynucleotide of interest. During the reaction, one of the two PCR primers was tagged with a biotin group. Following amplification, the single strand template was captured with streptavidin coated magnetic beads. For a 50 µl PCR reaction, 25 µl of Dynal M-280 paramagnetic beads (Dynal A/S, Oslo, Norway) was used. The supernatant of the beads was removed and replaced with 50 µl of a binding and washing buffer (10 mM Tris-HCl (pH 7.5); 1 mM EDTA; 2 M NaCl). The PCR product was added to the beads and incubated at room temperature for 30 minutes for bead capture of the products. The single stranded polynucleotide of interest was isolated by the addition of 150 µl of 0.15 M NaOH for five minutes. The beads were captured, the supernatant was removed, and 150 µl of 0.15 M NaOH was again added for five minutes. Following denaturation, the beads were washed once with 150 µl of 0.15 M NaOH and twice with 1× T7 annealing buffer (40 mM Tris-HCl (pH 7.5); 20 mM MgCl₂; 50 mM NaCl). The beads were finally suspended in 70 µl of water. This process both isolates the single-stranded polynucleotide of interest and removes any unincorporated dNTPs remaining after PCR.

[0045] B. Analyzing the Sequence of the Polynucleotide of Interest

[0046] After single strand isolation, the oligonucleotide primers were annealed to the polynucleotide of interest by heating to 65°C for approximately two minutes and cooling to room temperature over approximately 20 minutes. The 10 µl reaction volume consisted of 7 µl of the polynucleotide of interest (0.5-1 pmol), 2 µl of 5× T7 annealing buffer, and 1 µl of extension primer (3-9 pmol). The extension reaction was then performed. For the reaction, 1 µl of DTT, 2 µl of T7 polymerase (diluted 1:8) and 1 µl of ddNTPs (final concentration of 0.5 µM) were added. The reaction proceeded at 37°C for two minutes, and then was stopped by the addition of 100 µl of washing buffer (1× SSPE, 0.1% SDS, 30% ethanol). The beads were washed twice with 150 µl of the washing buffer. The extension products were eluted by the addition of 5 µl of formamide and heated to 70°C for two minutes. The beads were captured by the magnet and the supernatant containing the extension products was collected and analyzed on an ABI 373 (Applied Biosystems, Inc.). Oligonucleotide primers of lengths varying from 10 to 17 were used. As shown in FIG. 5, extension products were formed efficiently.

1. Deoxynucleotide Labeling--Four Fluorophores

[0047] Each ddNTP was labeled by a different fluorophore. ABI Dye Terminator dyes designed for Taq polymerase were used: ddG is blue, ddA is green, ddT is yellow, and ddC is red. Four fluorescent ddNTPs were added to each reaction tube. The extension products were purified, gel separated, and analyzed on an ABI 373. Two different bases of exon 3 of the HPRT gene were analyzed: base 16534 (wild type is A) and base 16620 (wild type is C).

[0048] The results of the four-fluor, single-lane assay indicated that the presence of mutations could be identified easily. All three patients are wild type for A at base 16534 (data not shown). Electrophoretograms shown in FIGS. 6A, 6B and 6C indicate that Patient A is wild type (C) at base 16620 (FIG. 6A), Patient B is a mutated individual (C→T) at base 16620 (FIG. 6B), and Patient C is a carrier at base 16620 (both C and T) (FIG. 6C).
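
The interpretation of the four-fluor results can be outlined in a short sketch. It is a simplified illustration only: it assumes that peak detection has already reduced each electrophoretogram to the set of dye colors present at the position examined, and that each detected ddNTP is reported directly as the base called at that position. The function name call_genotype and the three outcome strings are hypothetical.

```python
# Dye assignments stated for the ABI Dye Terminator ddNTPs.
DYE_TO_BASE = {"blue": "G", "green": "A", "yellow": "T", "red": "C"}


def call_genotype(detected_colors, wild_type_base):
    """Classify one position from the set of dye colors detected there."""
    bases = {DYE_TO_BASE[color] for color in detected_colors}
    if not bases:
        return "no call"
    if bases == {wild_type_base}:
        return "wild type"
    if wild_type_base in bases:
        return "carrier"
    return "mutant"


# Base 16620 (wild type is C), using colors consistent with FIGS. 6A-6C.
print(call_genotype({"red"}, "C"))            # wild type
print(call_genotype({"yellow"}, "C"))         # mutant
print(call_genotype({"red", "yellow"}, "C"))  # carrier
```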

2. Deoxynucleotide Labeling--Single Fluorophore

[0049] Each ddNTP was labeled by the same fluorophore. DuPont NEN fluorescein dyes (NEL 400-404) were used. Each ddNTP appears blue in the ABI 373. Only one fluorescent ddNTP is added to each reaction tube. The extension products were purified, gel separated, and analyzed on an ABI 373. Four lanes on the gel must be used to analyze each base. Two different bases of exon 3 of the HPRT gene were analyzed: base 16534 (wild type is A) and base 16620 (wild type is C).

[0050] The results of the single-fluor, four-lane assay were identical to those obtained using the four-fluor, single-lane method described in (1), above. This type of assay minimizes the effect of the fluorophore differences during extension product formation and gel separation.

3. Deoxynucleotide Labeling--Biotinylated Dideoxynucleotides

[0051] The ddNTPs are labeled with a biotin group. Four separate reactions are performed, whereby only one of the four ddNTPs is biotinylated. Following the extension reaction, a streptavidin (or avidin) coupled fluorescent group is attached to the biotinylated ddNTPs. Because the biotin group is small, uniform incorporation of the ddNTPs is expected and base-specific differences in extension are minimized. Furthermore, the fluorescent signal can be amplified because the biotin group can bind a streptavidin moiety coupled to multiple fluors.

EXAMPLE 2

Analyzing the Sequence of a Polynucleotide of Interest Utilizing Labeled Deoxynucleotides

[0052] An analysis of the hypoxanthine-guanine phosphoribosyl-transferase (HPRT) gene (the polynucleotide of interest) was conducted for three individuals (Patients A, B, and C). The third exon of the HPRT gene was examined.

[0053] Microscope glass slides were epoxysilanated at 80°C for eight hours using 25% 3' glycidoxy propyltriethoxysilane (Aldrich Chemical) in dry xylene (Aldrich Chemical) with a catalytic amount of diisopropylethylamine (Aldrich Chemical), according to Southern (Nucl. Acids Res. 20:1679 (1992), and Genomics 13:1008 (1992)). The DNA chips were made by placing 0.5 µl drops of 5'-amino-linked oligonucleotides (50 µM, 0.1 M NaOH) at 37°C for six hours in a humid environment. The chips were washed in 50°C water for 15 minutes, dried and used. The annealing reaction consisted of adding 2.2 µl of single-stranded DNA (0.1 µM in T7 reaction buffer) to each grid position, heating the chip in a humid environment to 70°C and then cooling slowly to room temperature. A 1 µl drop of 0.1 M DTT, 3 units of Sequenase Version 2.0 (USB), 5 µCi α-³²P dNTP (3000 Ci/mmol) (DuPont NEN) and noncompeting unlabeled 18.5 µM ddNTPs (Pharmacia) were added to each grid position for three minutes. The reaction was stopped by washing in 75°C water, and analyzed on a PhosphorImager (Molecular Dynamics).

[0054] FIGS. 7A, 7B and 7C depict the results of a DNA chip-based analysis for a five-base region within the third exon of the HPRT gene. The rows correspond to a particular base under investigation, and the columns correspond to the labeled base. FIG. 7A demonstrates the wild type sequence (TCGAG), FIG. 7B demonstrates a C→T mutation, and FIG. 7C demonstrates a C→T mutation.
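
Reading a sequence from such a grid can be sketched as follows. The sketch is illustrative and rests on several assumptions that are not taken from the figures: the column order (A, C, G, T), the signal intensities, and the rule of calling the column with the strongest signal in each row are all hypothetical, and the labeled base is assumed to be reported directly as the base at that position.

```python
COLUMNS = ("A", "C", "G", "T")  # assumed column order, one column per labeled base


def read_chip(grid):
    """Call one base per row by choosing the column with the strongest signal."""
    calls = []
    for row in grid:
        strongest = max(range(len(COLUMNS)), key=lambda j: row[j])
        calls.append(COLUMNS[strongest])
    return "".join(calls)


# Hypothetical intensities for a five-base region, one row per position.
wild_type_grid = [
    [2, 3, 1, 95],
    [1, 90, 4, 2],
    [3, 2, 88, 1],
    [92, 1, 2, 3],
    [2, 1, 91, 4],
]
print(read_chip(wild_type_grid))  # TCGAG
```

A heterozygous position would show two strong columns in one row, which this strongest-column rule does not capture; handling it would need a threshold on the second-highest signal.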

[0055] Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments of the invention described specifically herein. Such equivalents are intended to be encompassed in the scope of the following claims.

* * * * *