U.S. patent application number 12/605628 was filed with the patent office on 2009-10-26 and published on 2010-06-10 for systems for rapid forensic analysis of mitochondrial DNA and characterization of mitochondrial DNA heteroplasmy.
Invention is credited to Lawrence B. Blyn, Stanley T. Crooke, Jared J. Drader, David J. Ecker, Mark W. Eshoo, Richard H. Griffey, Thomas A. Hall, James C. Hannis, Steven A. Hofstadler, Yun Jiang, John McNeil, Vivek Samant, Rangarajan Sampath, Neill White.
Application Number | 20100145626 12/605628 |
Document ID | / |
Family ID | 32512247 |
Filed Date | 2009-10-26 |
United States Patent Application | 20100145626 |
Kind Code | A1 |
Ecker; David J. ; et al. | June 10, 2010 |
Systems for rapid forensic analysis of mitochondrial DNA and
characterization of mitochondrial DNA heteroplasmy
Abstract
The present invention provides methods for rapid forensic
analysis of mitochondrial DNA and methods for characterizing
heteroplasmy of mitochondrial DNA, which can be used to assess the
progression of mitochondrial diseases.
Inventors: |
Ecker; David J.; (Encinitas, CA) ;
Griffey; Richard H.; (Vista, CA) ;
Sampath; Rangarajan; (San Diego, CA) ;
Hofstadler; Steven A.; (Vista, CA) ;
McNeil; John; (La Jolla, CA) ;
Crooke; Stanley T.; (Carlsbad, CA) ;
Blyn; Lawrence B.; (Mission Viejo, CA) ;
Hall; Thomas A.; (Oceanside, CA) ;
Jiang; Yun; (Carlsbad, CA) ;
Hannis; James C.; (Vista, CA) ;
White; Neill; (Vista, CA) ;
Samant; Vivek; (Encinitas, CA) ;
Eshoo; Mark W.; (Solana Beach, CA) ;
Drader; Jared J.; (Carlsbad, CA) |
Correspondence
Address: |
Casimir Jones, S.C.
2275 Deming Way, Suite 310
Madison
WI
53562
US
|
Family ID: |
32512247 |
Appl. No.: |
12/605628 |
Filed: |
October 26, 2009 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
10660998 | Sep 12, 2003 | 7666588 |
12605628 | | |
10323438 | Dec 18, 2002 | |
10660998 | | |
09798007 | Mar 2, 2001 | |
10323438 | | |
60431319 | Dec 6, 2002 | |
Current U.S.
Class: |
702/19 ;
250/281 |
Current CPC
Class: |
G01N 2333/32 20130101;
G01N 2333/35 20130101; C12Q 1/6858 20130101; G01N 2333/28 20130101;
C12Q 1/6872 20130101; C12Q 1/686 20130101; C12Q 1/6844 20130101;
G01N 2333/36 20130101; C12Q 1/689 20130101; C12Q 1/6872 20130101;
C12Q 2600/156 20130101; C12Q 2545/113 20130101; C12Q 1/686
20130101; C12Q 2565/627 20130101; C12Q 2565/627 20130101; C12Q
2545/101 20130101; C12Q 2563/167 20130101; C12Q 1/6806 20130101;
C12Q 2531/113 20130101; C12Q 1/6888 20130101; C12Q 1/6858 20130101;
C12Q 1/686 20130101 |
Class at
Publication: |
702/19 ;
250/281 |
International
Class: |
G06F 19/00 20060101
G06F019/00; B01D 59/44 20060101 B01D059/44; G01N 33/48 20060101
G01N033/48 |
Claims
1-28. (canceled)
29. A system, comprising: at least one mass spectrometer configured
to determine molecular masses of two or more amplification
products, and/or fragments thereof, of two or more segments of
mitochondrial DNA from a forensic evidence sample; at least one
database comprising a plurality of known molecular masses from said
two or more segments of mitochondrial DNA, and/or fragments
thereof, from a plurality of subjects; and, at least one computer
configured to compare said molecular masses of said two or more
amplification products, and/or fragments thereof, with said
plurality of known molecular masses to reach a forensic
conclusion.
30. The system of claim 29, wherein said mass spectrometer is an
electrospray ionization mass spectrometer.
31. The system of claim 29, wherein said database is a Federal
Bureau of Investigation mitochondrial DNA database.
32. The system of claim 29, wherein said computer is configured to
determine relative amounts of said two or more amplification
products, and/or fragments thereof, from abundance of mass spectral
peaks corresponding to said two or more amplification products,
and/or fragments thereof.
33. The system of claim 29, comprising at least one nucleic acid
isolation robot configured to isolate nucleic acids from forensic
evidence samples.
34. The system of claim 29, wherein said subjects are non-human
eukaryotic organisms, fungi, parasites, or protozoa.
35. The system of claim 29, wherein said two or more segments of
mitochondrial DNA comprise a portion of a hypervariable region
(HVR) of said mitochondrial DNA.
36. The system of claim 29, wherein said hypervariable region
comprises at least one of HVR1 or HVR2.
37. The system of claim 29, wherein said forensic conclusion
comprises identification of a missing person, detection and
identification of a known bioagent, detection and identification of
an unknown bioagent, elimination of an individual as a crime
suspect, identification of an individual as a crime suspect,
identification of a location as a crime scene, identification of a
location as an accident scene, identification of evidence useful in
a court of law, identification of evidence useful in a criminal
investigation, or identification of one or more biological samples
from a crime scene.
38. The system of claim 29, wherein said forensic conclusion
comprises an identity of a subject, wherein said computer matches
said molecular masses of said amplification products with two or
more of said known molecular masses to reach said forensic
conclusion.
39. The system of claim 38, wherein said forensic conclusion is the
identification of a criminal.
40. The system of claim 38, wherein said forensic conclusion is the
identification of a crime victim.
41. The system of claim 29, comprising at least one autosampler
operably connected to said computer, which computer is configured
to control said autosampler.
42. The system of claim 41, comprising at least one microtiter
plate comprising said two or more amplification products, wherein
said autosampler is configured to transfer said two or more
amplification products to said mass spectrometer.
43. The system of claim 29, wherein said subjects are animals.
44. The system of claim 43, wherein said animals are humans.
45. A system, comprising: at least one mass spectrometer configured
to determine molecular masses of two or more amplification
products, and/or fragments thereof, of two or more segments of
mitochondrial DNA from a forensic evidence sample; at least one
database comprising a plurality of known base compositions from
said two or more segments of mitochondrial DNA, and/or fragments
thereof, from a plurality of subjects; and, at least one computer
configured to calculate base compositions of said two or more
amplification products, and/or fragments thereof, from said
molecular masses and to compare said base compositions of said two
or more amplification products, and/or fragments thereof, with said
plurality of known base compositions to reach a forensic
conclusion.
46. The system of claim 45, wherein said mass spectrometer is an
electrospray ionization mass spectrometer.
47. The system of claim 45, wherein said database is a Federal
Bureau of Investigation mitochondrial DNA database.
48. The system of claim 45, wherein said computer is configured to
determine relative amounts of said two or more amplification
products, and/or fragments thereof, from abundance of mass spectral
peaks corresponding to said two or more amplification products,
and/or fragments thereof.
49. The system of claim 45, comprising at least one nucleic acid
isolation robot configured to isolate nucleic acids from forensic
evidence samples.
50. The system of claim 45, wherein said subjects are non-human
eukaryotic organisms, fungi, parasites, or protozoa.
51. The system of claim 45, wherein said two or more segments of
mitochondrial DNA comprise a portion of a hypervariable region
(HVR) of said mitochondrial DNA.
52. The system of claim 45, wherein said hypervariable region
comprises at least one of HVR1 or HVR2.
53. The system of claim 45, wherein said forensic conclusion
comprises identification of a missing person, detection and
identification of a known bioagent, detection and identification of
an unknown bioagent, elimination of an individual as a crime
suspect, identification of an individual as a crime suspect,
identification of a location as a crime scene, identification of a
location as an accident scene, identification of evidence useful in
a court of law, identification of evidence useful in a criminal
investigation, or identification of one or more biological samples
from a crime scene.
54. The system of claim 45, wherein said forensic conclusion
comprises an identity of a subject, wherein said computer matches
said base compositions of said amplification products with two or
more of said known base compositions to reach said forensic
conclusion.
55. The system of claim 54, wherein said forensic conclusion is the
identification of a crime victim.
56. The system of claim 54, wherein said forensic conclusion is the
identification of a criminal.
57. The system of claim 45, comprising at least one autosampler
operably connected to said computer, which computer is configured
to control said autosampler.
58. The system of claim 57, comprising at least one microtiter
plate comprising said two or more amplification products, wherein
said autosampler is configured to transfer said two or more
amplification products to said mass spectrometer.
59. The system of claim 45, wherein said subjects are animals.
60. The system of claim 59, wherein said animals are humans.
61. A system, comprising: at least one mass spectrometer configured
to determine molecular masses of two or more amplification products
of two or more segments of mitochondrial DNA from a subject; and,
at least one computer configured to determine base compositions of
said two or more amplification products from said molecular masses
and to characterize heteroplasmy of said two or more segments of
mitochondrial DNA from said base compositions.
62. The system of claim 61, wherein said mass spectrometer is an
electrospray ionization mass spectrometer.
63. The system of claim 61, wherein said computer is configured to
characterize length heteroplasmy, nucleotide polymorphism
heteroplasmy, or both length heteroplasmy and nucleotide
polymorphism heteroplasmy.
64. The system of claim 61, wherein said computer is configured to
characterize a rate of naturally occurring mutations in
mitochondrial DNA.
65. The system of claim 61, comprising at least one nucleic acid
isolation robot configured to isolate nucleic acids from forensic
evidence samples.
66. The system of claim 61, comprising at least one database
comprising a plurality of base compositions from said two or more
segments of mitochondrial DNA from a plurality of subjects with one
or more mitochondrial diseases.
67. The system of claim 66, wherein said computer is configured to
compare said base compositions of said two or more amplification
products with one or more base compositions of said database to
correlate said heteroplasmy with an onset of said one or more
mitochondrial diseases in said subject.
68. The system of claim 66, wherein said one or more mitochondrial
diseases are selected from the group consisting of: Alpers Disease,
Barth syndrome, Beta-oxidation Defects, Carnitine-Acyl-Carnitine
Deficiency, Carnitine Deficiency, Co-Enzyme Q10 Deficiency, Complex
I Deficiency, Complex II Deficiency, Complex III Deficiency,
Complex IV Deficiency, Complex V Deficiency, COX Deficiency, CPEO,
CPT I Deficiency, CPT II Deficiency, Glutaric Aciduria Type II,
KSS, Lactic Acidosis, LCAD, LCHAD, Leigh Disease or Syndrome, LHON,
Lethal Infantile Cardiomyopathy, Luft Disease, MAD, MCA, MELAS,
MERRF, Mitochondrial Cytopathy, Mitochondrial DNA Depletion,
Mitochondrial Encephalopathy, Mitochondrial Myopathy, MNGIE, NARP,
Pearson Syndrome, Pyruvate Carboxylase Deficiency, Pyruvate
Dehydrogenase Deficiency, Respiratory Chain, SCAD, SCHAD, and
VLCAD.
69. The system of claim 61, comprising at least one autosampler
operably connected to said computer, which computer is configured
to control said autosampler.
70. The system of claim 69, comprising at least one microtiter
plate comprising said two or more amplification products, wherein
said autosampler is configured to transfer said two or more
amplification products to said mass spectrometer.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application is a continuation of U.S. application Ser.
No. 10/660,998 filed Sep. 12, 2003, which is a continuation-in-part
of U.S. application Ser. No. 10/323,438 filed Dec. 18, 2002, each
of which is incorporated herein by reference in its entirety. U.S.
application Ser. No. 10/660,998 is also a continuation-in-part of
U.S. application Ser. No. 09/798,007 filed Mar. 2, 2001, which is
incorporated herein by reference in its entirety. U.S. application
Ser. No. 10/660,998 also claims priority to U.S. provisional
application Ser. No. 60/431,319 filed Dec. 6, 2002, which is
incorporated herein by reference in its entirety.
SEQUENCE LISTING
[0002] The present application is being filed along with a Sequence
Listing in electronic format. The Sequence Listing is provided as a
file entitled DIBIS0002USC18SEQ.txt, created on Oct. 23, 2009, which
is 68 Kb in size. The information in the electronic format of the
sequence listing is incorporated herein by reference in its
entirety.
FIELD OF THE INVENTION
[0003] This invention relates to the field of mitochondrial DNA
analysis. The invention enables the rapid and accurate
identification of individuals and eukaryotic organisms by forensics
methods as well as characterization of mitochondrial DNA
heteroplasmy and prediction of onset of mitochondrial diseases.
BACKGROUND OF THE INVENTION
[0004] Mitochondrial DNA (mtDNA) is found in eukaryotes and differs
from nuclear DNA in its location, its sequence, its quantity in the
cell, and its mode of inheritance. The nucleus of the cell contains
two sets of 23 chromosomes, one paternal set and one maternal set.
However, cells may contain hundreds to thousands of mitochondria,
each of which may contain several copies of mtDNA. Nuclear DNA has
many more bases than mtDNA, but mtDNA is present in many more
copies than nuclear DNA. This characteristic of mtDNA is useful in
situations where the amount of DNA in a sample is very limited.
Typical sources of DNA recovered from crime scenes include hair,
bones, teeth, and body fluids such as saliva, semen, and blood.
[0005] In humans, mitochondrial DNA is inherited strictly from the
mother (Case J. T. and Wallace, D. C., Somatic Cell Genetics, 1981,
7, 103-108; Giles, R. E. et al. Proc. Natl. Acad. Sci. 1980, 77,
6715-6719; Hutchison, C. A. et al. Nature, 1974, 251, 536-538).
Thus, the mtDNA sequences obtained from maternally related
individuals, such as a brother and a sister or a mother and a
daughter, will exactly match each other in the absence of a
mutation. This characteristic of mtDNA is advantageous in missing
persons cases as reference mtDNA samples can be supplied by any
maternal relative of the missing individual (Ginther, C. et al.
Nature Genetics, 1992, 2, 135-138; Holland, M. M. et al. Journal of
Forensic Sciences, 1993, 38, 542-553; Stoneking, M. et al. American
Journal of Human Genetics, 1991, 48, 370-382).
[0006] The human mtDNA genome is approximately 16,569 bases in
length and has two general regions: the coding region and the
control region. The coding region is responsible for the production
of various biological molecules involved in the process of energy
production in the cell. The control region is responsible for
regulation of the mtDNA molecule. Two regions of mtDNA within the
control region have been found to be highly polymorphic, or
variable, within the human population (Greenberg, B. D. et al.
Gene, 1983, 21, 33-49). These two regions are termed "hypervariable
Region I" (HVR1), which has an approximate length of 342 base pairs
(bp), and "hypervariable Region II" (HVR2), which has an
approximate length of 268 bp. Forensic mtDNA examinations are
performed using these two regions because of the high degree of
variability found among individuals.
[0007] Approximately 610 bp of mtDNA are currently sequenced in
forensic mtDNA analysis. Recording and comparing mtDNA sequences
would be difficult and potentially confusing if all of the bases
were listed. Thus, mtDNA sequence information is recorded by
listing only the differences with respect to a reference DNA
sequence. By convention, human mtDNA sequences are described using
the first complete published mtDNA sequence as a reference
(Anderson, S. et al., Nature, 1981, 290, 457-465). This sequence is
commonly referred to as the Anderson sequence. It is also called
the Cambridge reference sequence or the Oxford sequence. Each base
pair in this sequence is assigned a number. Deviations from this
reference sequence are recorded as the number of the position
demonstrating a difference and a letter designation of the
different base. For example, a transition from A to G at Position
263 would be recorded as 263 G. If deletions or insertions of bases
are present in the mtDNA, these differences are denoted as
well.
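The difference-only notation described above can be sketched in Python. This is a minimal illustration, not part of the original disclosure: the function name `record_differences`, the toy sequences, and the assumption that both sequences are already aligned and of equal length (so that insertions and deletions are not handled) are all hypothetical.

```python
def record_differences(reference: str, sample: str, start: int = 1) -> list[str]:
    """List substitutions as '<position> <base>' relative to the reference.

    Assumes both sequences are already aligned and of equal length, so
    insertions and deletions are out of scope for this sketch.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    differences = []
    for offset, (ref_base, sample_base) in enumerate(zip(reference, sample)):
        if ref_base != sample_base:
            # Record the numbered position and the base found in the sample.
            differences.append(f"{start + offset} {sample_base}")
    return differences


# Toy 5-base window whose first base is numbered 261 (hypothetical data,
# not taken from the actual reference sequence).
reference_window = "CCACA"
sample_window = "CCGCA"
print(record_differences(reference_window, sample_window, start=261))  # ['263 G']
```

Recording only the deviations keeps each profile short and makes two profiles directly comparable as lists of position and base pairs.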
[0008] In the United States, there are seven laboratories currently
conducting forensic mtDNA examinations: the FBI Laboratory;
Laboratory Corporation of America (LabCorp) in Research Triangle
Park, N.C.; Mitotyping Technologies in State College, Pa.; the Bode
Technology Group (BTG) in Springfield, Va.; the Armed Forces DNA
Identification Laboratory (AFDIL) in Rockville, Md.; BioSynthesis,
Inc. in Lewisville, Tex.; and Reliagene in New Orleans, La.
[0009] Mitochondrial DNA analyses have been admitted in criminal
proceedings from these laboratories in the following states as of
April 1999: Alabama, Arkansas, Florida, Indiana, Illinois,
Maryland, Michigan, New Mexico, North Carolina, Pennsylvania, South
Carolina, Tennessee, Texas, and Washington. Mitochondrial DNA has
also been admitted and used in criminal trials in Australia, the
United Kingdom, and several other European countries.
[0010] Since 1996, the number of individuals performing
mitochondrial DNA analysis at the FBI Laboratory has grown from 4
to 12, with more personnel expected in the near future. Over 150
mitochondrial DNA cases have been completed by the FBI Laboratory
as of March 1999, and dozens more await analysis. Forensic courses
are being taught by the FBI Laboratory personnel and other groups
to educate forensic scientists in the procedures and interpretation
of mtDNA sequencing. More and more individuals are learning about
the value of mtDNA sequencing for obtaining useful information from
evidentiary samples that are small, degraded, or both.
Mitochondrial DNA sequencing is becoming known not only as an
exclusionary tool but also as a complementary technique for use
with other human identification procedures. Mitochondrial DNA
analysis will continue to be a powerful tool for law enforcement
officials in the years to come as other applications are developed,
validated, and applied to forensic evidence.
[0011] Presently, the forensic analysis of mtDNA is rigorous and
labor-intensive. Currently, only 1-2 cases per month per analyst
can be performed. Several molecular biological techniques are
combined to obtain a mtDNA sequence from a sample. The steps of the
mtDNA analysis process include primary visual analysis, sample
preparation, DNA extraction, polymerase chain reaction (PCR)
amplification, postamplification quantification of the DNA,
automated DNA sequencing, and data analysis. Another complicating
factor in the forensic analysis of mtDNA is the occurrence of
heteroplasmy wherein the pool of mtDNAs in a given cell is
heterogeneous due to mutations in individual mtDNAs. There are two
forms of heteroplasmy found in mtDNA. Sequence heteroplasmy (also
known as point heteroplasmy) is the occurrence of more than one
base at a particular position or positions in the mtDNA sequence.
Length heteroplasmy is the occurrence of more than one length of a
stretch of the same base in a mtDNA sequence as a result of
insertion of nucleotide residues.
Heteroplasmy is a problem for forensic investigators since a sample
from a crime scene can differ from a sample from a suspect by one
base pair and this difference may be interpreted as sufficient
evidence to eliminate that individual as the suspect. Hair samples
from a single individual can contain heteroplasmic mutations at
vastly different concentrations and even the root and shaft of a
single hair can differ. The detection methods currently available
to molecular biologists cannot detect low levels of heteroplasmy.
Furthermore, if present, length heteroplasmy will adversely affect
sequencing runs by resulting in an out-of-frame sequence that
cannot be interpreted.
[0012] Mass spectrometry provides detailed information about the
molecules being analyzed, including high mass accuracy. It is also
a process that can be easily automated. Low-resolution MS may be
unreliable when used to detect some known agents, if their spectral
lines are sufficiently weak or sufficiently close to those from
other living organisms in the sample. DNA chips with specific
probes can only determine the presence or absence of specifically
anticipated organisms. Because there are hundreds of thousands of
species of benign bacteria, some very similar in sequence to threat
organisms, even arrays with 10,000 probes lack the breadth needed
to detect a particular organism.
[0013] Antibodies face more severe diversity limitations than
arrays. If antibodies are designed against highly conserved targets
to increase diversity, the false alarm problem will dominate, again
because threat organisms are very similar to benign ones.
Antibodies are only capable of detecting known agents in relatively
uncluttered environments.
[0014] Several groups have described detection of PCR products
using high resolution electrospray ionization-Fourier transform-ion
cyclotron resonance mass spectrometry (ESI-FT-ICR MS). Accurate
measurement of exact mass combined with knowledge of the number of
at least one nucleotide allowed calculation of the total base
composition for PCR duplex products of approximately 100 base
pairs. (Aaserud et al., J. Am. Soc. Mass Spec., 1996, 7, 1266-1269;
Muddiman et al., Anal. Chem., 1997, 69, 1543-1549; Wunschel et al.,
Anal. Chem., 1998, 70, 1203-1207; Muddiman et al., Rev. Anal.
Chem., 1998, 17, 1-68). Electrospray ionization-Fourier
transform-ion cyclotron resonance (ESI-FT-ICR) MS may be used to
determine the mass of double-stranded, 500 base-pair PCR products
via the average molecular mass (Hurst et al., Rapid Commun. Mass
Spec. 1996, 10, 377-382). The use of matrix-assisted laser
desorption ionization-time of flight (MALDI-TOF) mass spectrometry
for characterization of PCR products has been described. (Muddiman
et al., Rapid Commun. Mass Spec., 1999, 13, 1201-1204). However,
the degradation of DNAs over about 75 nucleotides observed with
MALDI limited the utility of this method.
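As a rough illustration of how an accurate mass, combined with a known count of one nucleotide, can narrow the base composition of a single strand, the following Python sketch enumerates candidate compositions. The residue masses, the end correction, the tolerance, and the function names are illustrative assumptions and are not taken from the cited references.

```python
# Approximate average masses (Da) of the four deoxynucleotide residues within
# a DNA strand; commonly quoted values, used here as assumptions.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}
# Correction for a linear strand with 5'-OH and 3'-OH ends (assumed chemistry).
END_CORRECTION = -61.96


def strand_mass(composition):
    """Approximate average mass of a single strand with the given base counts."""
    return sum(RESIDUE_MASS[base] * count
               for base, count in composition.items()) + END_CORRECTION


def candidate_compositions(measured_mass, length, tolerance, known=None):
    """Enumerate compositions of a fixed length whose mass is within tolerance.

    `known` optionally fixes the count of one or more bases, mirroring the
    idea of knowing the number of at least one nucleotide.
    """
    known = known or {}
    hits = []
    for a in range(length + 1):
        for c in range(length + 1 - a):
            for g in range(length + 1 - a - c):
                composition = {"A": a, "C": c, "G": g, "T": length - a - c - g}
                if any(composition[base] != count for base, count in known.items()):
                    continue
                if abs(strand_mass(composition) - measured_mass) <= tolerance:
                    hits.append(composition)
    return hits


# Hypothetical 12-mer used only to exercise the functions.
true_composition = {"A": 3, "C": 3, "G": 3, "T": 3}
measured = strand_mass(true_composition)
unconstrained = candidate_compositions(measured, length=12, tolerance=0.3)
constrained = candidate_compositions(measured, length=12, tolerance=0.3,
                                     known={"A": 3})
print(len(unconstrained), len(constrained))
```

With these assumed masses, more than one composition can fall within the tolerance of the same measured mass, and fixing the count of a single base removes the ambiguity in this toy case.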
[0015] U.S. Pat. No. 5,849,492 describes a method for retrieval of
phylogenetically informative DNA sequences which comprises searching
for a highly divergent segment of genomic DNA surrounded by two
highly conserved segments, designing universal primers for PCR
amplification of the highly divergent region, amplifying the
genomic DNA by PCR using the universal primers, and then
sequencing the gene to determine the identity of the organism.
[0016] U.S. Pat. No. 5,965,363 discloses methods for screening
nucleic acids for polymorphisms by analyzing amplified target
nucleic acids using mass spectrometric techniques, and procedures
for improving the mass resolution and mass accuracy of these
methods.
[0017] WO 99/14375 describes methods, PCR primers and kits for use
in analyzing preselected DNA tandem nucleotide repeat alleles by
mass spectrometry.
[0018] WO 98/12355 discloses methods of determining the mass of a
target nucleic acid by mass spectrometric analysis, by cleaving the
target nucleic acid to reduce its length, making the target
single-stranded and using MS to determine the mass of the
single-stranded shortened target. Also disclosed are methods of
preparing a double-stranded target nucleic acid for MS analysis
comprising amplification of the target nucleic acid, binding one of
the strands to a solid support, releasing the second strand and
then releasing the first strand which is then analyzed by MS. Kits
for target nucleic acid preparation are also provided.
[0019] PCT WO97/33000 discloses methods for detecting mutations in
a target nucleic acid by nonrandomly fragmenting the target into a
set of single-stranded nonrandom length fragments and determining
their masses by MS.
[0020] U.S. Pat. No. 5,605,798 describes a fast and highly accurate
mass spectrometer-based process for detecting the presence of a
particular nucleic acid in a biological sample for diagnostic
purposes.
[0021] WO 98/21066 describes processes for determining the sequence
of a particular target nucleic acid by mass spectrometry. Processes
for detecting a target nucleic acid present in a biological sample
by PCR amplification and mass spectrometry detection are disclosed,
as are methods for detecting a target nucleic acid in a sample by
amplifying the target with primers that contain restriction sites
and tags, extending and cleaving the amplified nucleic acid, and
detecting the presence of extended product, wherein the presence of
a DNA fragment of a mass different from wild-type is indicative of
a mutation. Methods of sequencing a nucleic acid via mass
spectrometry methods are also described.
[0022] WO 97/37041, WO 99/31278 and U.S. Pat. No. 5,547,835
describe methods of sequencing nucleic acids using mass
spectrometry. U.S. Pat. Nos. 5,622,824, 5,872,003 and 5,691,141
describe methods, systems and kits for exonuclease-mediated mass
spectrometric sequencing.
[0023] Thus, there is a need for a method for bioagent detection
and identification which is both specific and rapid, and in which
no nucleic acid sequencing is required. The present invention
addresses this need.
SUMMARY OF THE INVENTION
[0024] The present invention is directed to methods of identifying
an individual by obtaining mitochondrial DNA from the individual,
amplifying the mitochondrial DNA with intelligent primers to obtain
at least one amplification product, determining the molecular mass
of the amplification product and comparing the molecular mass with
a database of molecular masses calculated from known sequences of
mitochondrial DNAs indexed to known individuals, wherein a match
between said molecular mass of the amplification product and the
calculated molecular mass of a known sequence in the database
identifies the individual.
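The comparison step summarized above can be pictured with a short Python sketch. It assumes a hypothetical in-memory database that maps each known individual to a calculated amplicon mass, and an arbitrary matching tolerance; neither the data nor the function name `match_by_mass` comes from the original disclosure.

```python
def match_by_mass(measured_mass, database, tolerance):
    """Return identifiers whose calculated mass lies within the tolerance."""
    return sorted(
        name for name, calculated_mass in database.items()
        if abs(calculated_mass - measured_mass) <= tolerance
    )


# Hypothetical calculated amplicon masses (Da), indexed by individual.
database = {
    "individual_A": 30512.4,
    "individual_B": 30528.4,
    "individual_C": 30513.4,
}
print(match_by_mass(30512.6, database, tolerance=0.5))  # ['individual_A']
```

A single amplicon may match several database entries, so in practice the same lookup would presumably be repeated for each amplification product and the results combined.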
[0025] Furthermore, the present invention is directed to methods
of determining the identity of protists or fungi by a process
analogous to the process described above, and determining the
geographic spread of fungi and protists by analysis of samples
obtained from a plurality of geographic locations.
[0026] The present invention is also directed to methods of
characterizing the heteroplasmy of a sample of mitochondrial DNA by
amplifying the mitochondrial DNA with intelligent primers to obtain
a plurality of amplification products, determining the molecular
masses and relative abundances of the plurality of amplification
products, thereby characterizing said heteroplasmy. Furthermore,
the present invention is directed to using these methods to
characterize the heteroplasmy of a plurality of samples of
mitochondrial DNA taken from an individual at different points of
the lifetime of said individual to investigate the rate of
naturally occurring mutations in mitochondrial DNA. These methods
can also be used to initiate a prediction of the rate of onset of
mitochondrial disease.
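One simple way to express the relative abundances mentioned above is to normalize the peak abundance of each variant by the total. The Python sketch below assumes hypothetical variant labels and peak abundances and is not the scoring used in the original disclosure.

```python
def relative_abundances(peak_abundances):
    """Convert raw peak abundances into fractions that sum to 1."""
    total = sum(peak_abundances.values())
    if total <= 0:
        raise ValueError("total abundance must be positive")
    return {variant: abundance / total
            for variant, abundance in peak_abundances.items()}


# Hypothetical peak abundances for two length variants of one amplicon.
peaks = {"variant_1": 300.0, "variant_2": 100.0}
print(relative_abundances(peaks))  # {'variant_1': 0.75, 'variant_2': 0.25}
```

Repeating the calculation on samples taken at different time points would give a series of fractions per variant, which is one way to follow a shift in heteroplasmy over time.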
BRIEF DESCRIPTION OF THE DRAWINGS
[0027] FIGS. 1A-1H and FIG. 2 are consensus diagrams that show
examples of conserved regions from 16S rRNA (FIG. 1A-1, 1A-2, 1A-3,
1A-4, and 1A-5), 23S rRNA (3'-half, FIGS. 1B, 1C, and 1D; 5'-half,
FIGS. 1E-F), 23S rRNA Domain I (FIG. 1G), 23S rRNA Domain IV (FIG.
1H) and 16S rRNA Domain III (FIG. 2) which are suitable for use in
the present invention. Lines with arrows are examples of regions to
which intelligent primer pairs for PCR are designed. The label for
each primer pair represents the starting and ending base number of
the amplified region on the consensus diagram. Bases in capital
letters are greater than 95% conserved; bases in lower case letters
are 90-95% conserved; filled circles are 80-90% conserved; and open
circles are less than 80% conserved. The nucleotide sequence of the 16S
rRNA consensus sequence is SEQ ID NO:3 and the nucleotide sequence
of the 23S rRNA consensus sequence is SEQ ID NO:4.
[0028] FIG. 2 shows a typical primer amplified region from the 16S
rRNA Domain III shown in FIG. 1A-1.
[0029] FIG. 3 is a schematic diagram showing conserved regions in
RNase P. Bases in capital letters are greater than 90% conserved;
bases in lower case letters are 80-90% conserved; filled circles
designate bases which are 70-80% conserved; and open circles
designate bases that are less than 70% conserved.
[0030] FIG. 4 is a schematic diagram of base composition signature
determination using nucleotide analog "tags" to determine base
composition signatures.
[0031] FIG. 5 shows the deconvoluted mass spectra of a Bacillus
anthracis region with and without the mass tag phosphorothioate A
(A*). The two spectra differ in that the measured molecular weight
of the mass tag-containing sequence is greater than the unmodified
sequence.
[0032] FIG. 6 shows base composition signature (BCS) spectra from
PCR products from Staphylococcus aureus (S. aureus 16S_1337F)
and Bacillus anthracis (B. anthr. 16S_1337F), amplified using
the same primers. The two strands differ by only two (AT-->CG)
substitutions and are clearly distinguished on the basis of their
BCS.
[0033] FIG. 7 shows that a single difference between two sequences
(A14 in B. anthracis vs. A15 in B. cereus) can be easily detected
using ESI-TOF mass spectrometry.
[0034] FIG. 8 is an ESI-TOF of Bacillus anthracis spore coat
protein sspE 56mer plus calibrant. The signals unambiguously
identify B. anthracis versus other Bacillus species.
[0035] FIG. 9 is an ESI-TOF of a B. anthracis synthetic
16S_1228 duplex (reverse and forward strands). The technique
easily distinguishes between the forward and reverse strands.
[0036] FIG. 10 is an ESI-FTICR-MS of a synthetic B. anthracis
16S_1337 46 base pair duplex.
[0037] FIG. 11 is an ESI-TOF-MS of a 56mer oligonucleotide (3
scans) from the B. anthracis saspB gene with an internal mass
standard. The internal mass standards are designated by
asterisks.
[0038] FIG. 12 is an ESI-TOF-MS of an internal standard with 5 mM
TBA-TFA buffer showing that charge stripping with tributylammonium
trifluoroacetate reduces the most abundant charge state from
[M-8H+]8- to [M-3H+]3-.
[0039] FIG. 13 is a portion of a secondary structure defining
database according to one embodiment of the present invention,
where two examples of selected sequences are displayed graphically
thereunder.
[0040] FIG. 14 is a three dimensional graph demonstrating the
grouping of sample molecular weight according to species.
[0041] FIG. 15 is a three dimensional graph demonstrating the
grouping of sample molecular weights according to species of virus
and mammal infected.
[0042] FIG. 16 is a three dimensional graph demonstrating the
grouping of sample molecular weights according to species of virus,
and animal-origin of infectious agent.
[0043] FIG. 17 is a figure depicting how the triangulation method
of the present invention provides for the identification of an
unknown bioagent without prior knowledge of the unknown agent. The
use of different primer sets to distinguish and identify the
unknown is also depicted as primer sets I, II and III within this
figure. A three dimensional graph depicts all of bioagent space
(170), including the unknown bioagent. Use of primer set I (171)
according to a method of the present invention differentiates and
classifies bioagents according to major classifications (176);
further analysis using primer set II (172) differentiates the
unknown agent (177) from other, known agents (173); and finally,
the use of a third primer set (175) further specifies subgroups
within the family of the unknown (174).
[0044] FIG. 18 shows: a) a representative ESI-FTICR mass spectrum
of a restriction digest of a 986 bp region of the 16S ribosomal
gene from E. coli K12 digested with a mixture of BstNI, BsmFI,
BfaI, and NcoI; b) a deconvoluted representation (neutral mass) of
the above spectrum showing the base compositions derived from
accurate mass measurements of each fragment; and c) a
representative reconstructed restriction map showing complete base
composition coverage for nucleotides 1-856. NcoI did not cut.
[0045] FIG. 19 indicates the process of mtDNA analysis. After
amplification by PCR (210), the PCR products were subjected to
restriction digests (220) with RsaI for HVR1 and a combination of
HpaII, HpyCH4IV, PacI and EaeI for HVR2 in order to obtain amplicon
segments suitable for analysis by FTICR-MS (240). The data were
processed to obtain mass data for each amplicon segment (250) which
were then compared to the masses calculated for theoretical digests
from the FBI mtDNA database by a scoring scheme (260).
[0046] FIG. 20A shows predicted and actual mass data with scoring
parameters for length heteroplasmy (HVR1-1-outer-variants 1 and 2)
in the digest segment from position 94 to 145 (variant 1) (SEQ ID
NO: 44)/146 (variant 2) (SEQ ID NO: 45).
[0047] FIG. 20B indicates that, whereas sequencing fails to resolve
the variants due to the length heteroplasmy, mass determination
detects multiple species simultaneously and also indicates
abundance ratios. In this case, the ratio of variant 1 (SEQ ID NO:
46, top sequence) to variant 2 (SEQ ID NO: 47)(short to long
alleles) is 1:3.
DESCRIPTION OF EMBODIMENTS
A. Introduction
[0048] The present invention provides, inter alia, methods for
detection and identification of bioagents in an unbiased manner
using "bioagent identifying amplicons." "Intelligent primers" are
selected to hybridize to conserved sequence regions of nucleic
acids derived from a bioagent and which bracket variable sequence
regions to yield a bioagent identifying amplicon which can be
amplified and which is amenable to molecular mass determination.
The molecular mass then provides a means to uniquely identify the
bioagent without a requirement for prior knowledge of the possible
identity of the bioagent. The molecular mass or corresponding "base
composition signature" (BCS) of the amplification product is then
matched against a database of molecular masses or base composition
signatures. Furthermore, the method can be applied to rapid
parallel "multiplex" analyses, the results of which can be employed
in a triangulation identification strategy. The present method
provides rapid throughput and does not require nucleic acid
sequencing of the amplified target sequence for bioagent detection
and identification.
B. Bioagents
[0049] In the context of this invention, a "bioagent" is any
organism, cell, or virus, living or dead, or a nucleic acid derived
from such an organism, cell or virus. Examples of bioagents
include, but are not limited to, cells (including but not limited
to human clinical samples, bacterial cells and other pathogens),
viruses, fungi, protists, parasites, and pathogenicity markers
(including but not limited to: pathogenicity islands, antibiotic
resistance genes, virulence factors, toxin genes and other
bioregulating compounds). Samples
may be alive or dead or in a vegetative state (for example,
vegetative bacteria or spores) and may be encapsulated or
bioengineered. In the context of this invention, a "pathogen" is a
bioagent which causes a disease or disorder.
[0050] Despite enormous biological diversity, all forms of life on
earth share sets of essential, common features in their genomes.
Bacteria, for example, have highly conserved sequences in a variety
of locations on their genomes. Most notable is the universally
conserved region of the ribosome, but there are also conserved
elements in other non-coding RNAs, including RNase P and the signal
recognition particle (SRP) among others. Bacteria have a common set
of absolutely required genes. About 250 genes are present in all
bacterial species (Proc. Natl. Acad. Sci. U.S.A., 1996, 93, 10268;
Science, 1995, 270, 397), including tiny genomes like Mycoplasma,
Ureaplasma and Rickettsia. These genes encode proteins involved in
translation, replication, recombination and repair, transcription,
nucleotide metabolism, amino acid metabolism, lipid metabolism,
energy generation, uptake, secretion and the like. Examples of
these proteins are DNA polymerase III beta, elongation factor TU,
heat shock protein groEL, RNA polymerase beta, phosphoglycerate
kinase, NADH dehydrogenase, DNA ligase, DNA topoisomerase and
elongation factor G. Operons can also be targeted using the present
method. One example of an operon is the bfp operon from
enteropathogenic E. coli. Multiple core chromosomal genes can be
used to classify bacteria at a genus or genus species level to
determine if an organism has threat potential. The methods can also
be used to detect pathogenicity markers (plasmid or chromosomal)
and antibiotic resistance genes to confirm the threat potential of
an organism and to direct countermeasures.
C. Selection of "Bioagent Identifying Amplicons"
[0051] Since genetic data provide the underlying basis for
identification of bioagents by the methods of the present
invention, it is necessary to select segments of nucleic acids
which ideally provide enough variability to distinguish each
individual bioagent and whose molecular mass is amenable to
molecular mass determination. In one embodiment of the present
invention, at least one polynucleotide segment is amplified to
facilitate detection and analysis in the process of identifying the
bioagent. Thus, the nucleic acid segments which provide enough
variability to distinguish each individual bioagent and whose
molecular masses are amenable to molecular mass determination are
herein described as "bioagent identifying amplicons." The term
"amplicon" as used herein, refers to a segment of a polynucleotide
which is amplified in an amplification reaction.
[0052] As used herein, "intelligent primers" are primers that are
designed to bind to highly conserved sequence regions of a bioagent
identifying amplicon that flank an intervening variable region and
yield amplification products which ideally provide enough
variability to distinguish each individual bioagent, and which are
amenable to molecular mass analysis. By the term "highly
conserved," it is meant that the sequence regions exhibit between
about 80-100%, or between about 90-100%, or between about 95-100%
identity. The molecular mass of a given amplification product
provides a means of identifying the bioagent from which it was
obtained, due to the variability of the variable region. Thus
design of intelligent primers requires selection of a variable
region with appropriate variability to resolve the identity of a
given bioagent. Bioagent identifying amplicons are ideally specific
to the identity of the bioagent. A plurality of bioagent
identifying amplicons selected in parallel for distinct bioagents
which contain the same conserved sequences for hybridization of the
same pair of intelligent primers are herein defined as "correlative
bioagent identifying amplicons."
[0053] In one embodiment, the bioagent identifying amplicon is a
portion of a ribosomal RNA (rRNA) gene sequence. With the complete
sequences of many of the smallest microbial genomes now available,
it is possible to identify a set of genes that defines "minimal
life" and identify composition signatures that uniquely identify
each gene and organism. Genes that encode core life functions such
as DNA replication, transcription, ribosome structure, translation,
and transport are distributed broadly in the bacterial genome and
are suitable regions for selection of bioagent identifying
amplicons. Ribosomal RNA (rRNA) genes comprise regions that provide
useful base composition signatures. Like many genes involved in
core life functions, rRNA genes contain sequences that are
extraordinarily conserved across bacterial domains interspersed
with regions of high variability that are more specific to each
species. The variable regions can be utilized to build a database
of base composition signatures. The strategy involves creating a
structure-based alignment of sequences of the small (16S) and the
large (23S) subunits of the rRNA genes. For example, there are
currently over 13,000 sequences in the ribosomal RNA database that
has been created and maintained by Robin Gutell, University of
Texas at Austin, and is publicly available on the Institute for
Cellular and Molecular Biology web page on the world wide web of
the Internet at, for example, "rna.icmb.utexas.edu/." There is also
a publicly available rRNA database created and maintained by the
University of Antwerp, Belgium on the world wide web of the
Internet at, for example, "rrna.uia.ac.be."
[0054] These databases have been analyzed to determine regions that
are useful as bioagent identifying amplicons. The characteristics
of such regions include: a) between about 80 and 100%, or greater
than about 95% identity among species of the particular bioagent of
interest, of upstream and downstream nucleotide sequences which
serve as sequence amplification primer sites; b) an intervening
variable region which exhibits no greater than about 5% identity
among species; and c) a separation of between about 30 and 1000
nucleotides, or no more than about 50-250 nucleotides, or no more
than about 60-100 nucleotides, between the conserved regions.
[0055] As a non-limiting example, for identification of Bacillus
species, the conserved sequence regions of the chosen bioagent
identifying amplicon must be highly conserved among all Bacillus
species while the variable region of the bioagent identifying
amplicon is sufficiently variable such that the molecular masses of
the amplification products of all species of Bacillus are
distinguishable.
[0056] Bioagent identifying amplicons amenable to molecular mass
determination are either of a length, size or mass compatible with
the particular mode of molecular mass determination or compatible
with a means of providing a predictable fragmentation pattern in
order to obtain predictable fragments of a length compatible with
the particular mode of molecular mass determination. Such means of
providing a predictable fragmentation pattern of an amplification
product include, but are not limited to, cleavage with restriction
enzymes or cleavage primers, for example.
[0057] Identification of bioagents can be accomplished at different
levels using intelligent primers suited to resolution of each
individual level of identification. "Broad range survey"
intelligent primers are designed with the objective of identifying
a bioagent as a member of a particular division of bioagents. A
"bioagent division" is defined as a group of bioagents above the
species level and includes but is not limited to: orders, families,
classes, clades, genera or other such groupings of bioagents above
the species level. As a non-limiting example, members of the
Bacillus/Clostridia group or gamma-proteobacteria group may be
identified as such by employing broad range survey intelligent
primers such as primers which target 16S or 23S ribosomal RNA.
[0058] In some embodiments, broad range survey intelligent primers
are capable of identification of bioagents at the species level.
One main advantage of the detection methods of the present
invention is that the broad range survey intelligent primers need
not be specific for a particular bacterial species, or even genus,
such as Bacillus or Streptomyces. Instead, the primers recognize
highly conserved regions across hundreds of bacterial species
including, but not limited to, the species described herein. Thus,
the same broad range survey intelligent primer pair can be used to
identify any desired bacterium because it will bind to the
conserved regions that flank a variable region specific to a single
species, or common to several bacterial species, allowing unbiased
nucleic acid amplification of the intervening sequence and
determination of its molecular weight and base composition. For
example, the 16S.sub.--971-1062, 16S.sub.--1228-1310 and
16S.sub.--1100-1188 regions are 98-99% conserved in about 900
species of bacteria (16S=16S rRNA, numbers indicate nucleotide
position). In one embodiment of the present invention, primers used
in the present method bind to one or more of these regions or
portions thereof.
[0059] Due to their overall conservation, the flanking rRNA primer
sequences serve as good intelligent primer binding sites to amplify
the nucleic acid region of interest for most, if not all, bacterial
species. The intervening region between the sets of primers varies
in length and/or composition, and thus provides a unique base
composition signature. Examples of intelligent primers that amplify
regions of the 16S and 23S rRNA are shown in FIGS. 1A-1H. A typical
primer amplified region in 16S rRNA is shown in FIG. 2. The arrows
represent primers that bind to highly conserved regions which flank
a variable region in 16S rRNA domain III. The amplified region is
the stem-loop structure under "1100-1188." It is advantageous to
design the broad range survey intelligent primers to minimize the
number of primers required for the analysis, and to allow detection
of multiple members of a bioagent division using a single pair of
primers. The advantage of using broad range survey intelligent
primers is that once a bioagent is broadly identified, the process
of further identification at species and sub-species levels is
facilitated by directing the choice of additional intelligent
primers.
[0060] "Division-wide" intelligent primers are designed with an
objective of identifying a bioagent at the species level. As a
non-limiting example, Bacillus anthracis, Bacillus cereus and
Bacillus thuringiensis can be distinguished from each other using
division-wide intelligent primers. Division-wide intelligent
primers are not always required for identification at the species
level because broad range survey intelligent primers may provide
sufficient identification resolution to accomplish this
identification objective.
[0061] "Drill-down" intelligent primers are designed with an
objective of identifying a sub-species characteristic of a
bioagent. A "sub-species characteristic" is defined as a property
imparted to a bioagent at the sub-species level of identification
as a result of the presence or absence of a particular segment of
nucleic acid. Such sub-species characteristics include, but are not
limited to, strains, sub-types, pathogenicity markers such as
antibiotic resistance genes, pathogenicity islands, toxin genes and
virulence factors. Identification of such sub-species
characteristics is often critical for determining proper clinical
treatment of pathogen infections.
Chemical Modifications of Intelligent Primers
[0062] Ideally, intelligent primer hybridization sites are highly
conserved in order to facilitate the hybridization of the primer.
In cases where primer hybridization is less efficient due to lower
levels of conservation of sequence, intelligent primers can be
chemically modified to improve the efficiency of hybridization.
[0063] For example, because any variation (due to codon wobble in
the 3.sup.rd position) in these conserved regions among species is
likely to occur in the third position of a DNA triplet,
oligonucleotide primers can be designed such that the nucleotide
corresponding to this position is a base which can bind to more
than one nucleotide, referred to herein as a "universal base." For
example, under this "wobble" pairing, inosine (I) binds to U, C or
A; guanine (G) binds to U or C, and uridine (U) binds to A or G.
Other examples of universal bases include nitroindoles such as
5-nitroindole or 3-nitropyrrole (Loakes et al., Nucleosides and
Nucleotides, 1995, 14, 1001-1003), the degenerate nucleotides dP or
dK (Hill et al.), an acyclic nucleoside analog containing
5-nitroindazole (Van Aerschot et al., Nucleosides and Nucleotides,
1995, 14, 1053-1056) or the purine analog
1-(2-deoxy-.beta.-D-ribofuranosyl)-imidazole-4-carboxamide (Sala et
al., Nucl. Acids Res., 1996, 24, 3302-3306).
[0064] In another embodiment of the invention, to compensate for
the somewhat weaker binding by the "wobble" base, the
oligonucleotide primers are designed such that the first and second
positions of each triplet are occupied by nucleotide analogs which
bind with greater affinity than the unmodified nucleotide. Examples
of these analogs include, but are not limited to, 2,6-diaminopurine,
which binds to thymine; propyne T, which binds to adenine; and
propyne C and phenoxazines, including G-clamp, which bind to G.
Propynylated pyrimidines are described in U.S. Pat. Nos. 5,645,985,
5,830,653 and 5,484,908, each of which is commonly owned and
incorporated herein by reference in its entirety. Propynylated
primers are claimed in U.S. Ser. No. 10/294,203 which is also
commonly owned and incorporated herein by reference in its entirety.
Phenoxazines are described in U.S. Pat. Nos. 5,502,177, 5,763,588,
and 6,005,096, each of which is incorporated herein by reference in
its entirety. G-clamps are described in U.S. Pat. Nos. 6,007,992
and 6,028,183, each of which is incorporated herein by reference in
its entirety.
D. Characterization of Bioagent Identifying Amplicons
[0065] A theoretically ideal bioagent detector would identify,
quantify, and report the complete nucleic acid sequence of every
bioagent that reached the sensor. The complete sequence of the
nucleic acid component of a pathogen would provide all relevant
information about the threat, including its identity and the
presence of drug-resistance or pathogenicity markers. This ideal
has not yet been achieved. However, the present invention provides
a straightforward strategy for obtaining information with the same
practical value based on analysis of bioagent identifying amplicons
by molecular mass determination.
[0066] In some cases, a molecular mass of a given bioagent
identifying amplicon alone does not provide enough resolution to
unambiguously identify a given bioagent. For example, the molecular
mass of the bioagent identifying amplicon obtained using the
intelligent primer pair "16S.sub.--971" would be 55622 Da for both
E. coli and Salmonella typhimurium. However, if additional
intelligent primers are employed to analyze additional bioagent
identifying amplicons, a "triangulation identification" process is
enabled. For example, the "16S.sub.--1100" intelligent primer pair
yields molecular masses of 55009 and 55005 Da for E. coli and
Salmonella typhimurium, respectively. Furthermore, the
"23S.sub.--855" intelligent primer pair yields molecular masses of
42656 and 42698 Da for E. coli and Salmonella typhimurium,
respectively. In this basic example, the second and third
intelligent primer pairs provided the additional "fingerprinting"
capability or resolution to distinguish between the two
bioagents.
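The triangulation step described above can be sketched in a few lines of Python. This is an illustrative sketch only: the reference table holds just the six masses quoted in the preceding paragraph, and the names REFERENCE_MASSES, candidates and triangulate, as well as the 1 Da matching tolerance, are assumptions introduced for the example rather than part of the described method.

```python
# Hypothetical reference table; masses (Da) are those quoted in the text above.
REFERENCE_MASSES = {
    "16S_971": {"E. coli": 55622, "Salmonella typhimurium": 55622},
    "16S_1100": {"E. coli": 55009, "Salmonella typhimurium": 55005},
    "23S_855": {"E. coli": 42656, "Salmonella typhimurium": 42698},
}


def candidates(primer_pair, measured_mass, tolerance_da=1.0):
    """Bioagents whose reference mass for this primer pair is within tolerance."""
    return {
        name
        for name, mass in REFERENCE_MASSES[primer_pair].items()
        if abs(mass - measured_mass) <= tolerance_da
    }


def triangulate(measurements, tolerance_da=1.0):
    """Intersect the candidate sets obtained from every primer pair measured.

    `measurements` maps a primer pair name to the measured amplicon mass.
    """
    result = None
    for primer_pair, mass in measurements.items():
        hits = candidates(primer_pair, mass, tolerance_da)
        result = hits if result is None else result & hits
    return result if result is not None else set()
```

With only the 16S_971 measurement (55622 Da) both organisms remain candidates; adding a 16S_1100 measurement of 55005 Da narrows the set to Salmonella typhimurium alone.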
[0067] In another embodiment, the triangulation identification
process is pursued by measuring signals from a plurality of
bioagent identifying amplicons selected within multiple core genes.
This process is used to reduce false negative and false positive
signals, and enable reconstruction of the origin of hybrid or
otherwise engineered bioagents. In this process, after
identification of multiple core genes, alignments are created from
nucleic acid sequence databases. The alignments are then analyzed
for regions of conservation and variation, and bioagent identifying
amplicons are selected to distinguish bioagents based on specific
genomic differences. For example, identification of the three part
toxin genes typical of B. anthracis (Bowen et al., J. Appl.
Microbiol., 1999, 87, 270-278) in the absence of the expected
signatures from the B. anthracis genome would suggest a genetic
engineering event.
[0068] The triangulation identification process can be pursued by
characterization of bioagent identifying amplicons in a massively
parallel fashion using the polymerase chain reaction (PCR), such as
multiplex PCR, and mass spectrometric (MS) methods. Sufficient
quantities of nucleic acids should be present for detection of
bioagents by MS. A wide variety of techniques for preparing large
amounts of purified nucleic acids or fragments thereof are well
known to those of skill in the art. PCR requires one or more pairs
of oligonucleotide primers that bind to regions which flank the
target sequence(s) to be amplified. Each primer primes synthesis
of a different strand of DNA, with synthesis occurring in the
direction of one primer towards the other primer. The primers, DNA
to be amplified, a thermostable DNA polymerase (e.g. Taq
polymerase), the four deoxynucleotide triphosphates, and a buffer
are combined to initiate DNA synthesis. The solution is denatured
by heating, then cooled to allow annealing of newly added primer,
followed by another round of DNA synthesis. This process is
typically repeated for about 30 cycles, resulting in amplification
of the target sequence.
[0069] Although the use of PCR is suitable, other nucleic acid
amplification techniques may also be used, including ligase chain
reaction (LCR) and strand displacement amplification (SDA). The
high-resolution MS technique allows separation of bioagent spectral
lines from background spectral lines in highly cluttered
environments.
[0070] In another embodiment, the detection scheme for the PCR
products generated from the bioagent(s) incorporates at least three
features. First, the technique simultaneously detects and
differentiates multiple (generally about 6-10) PCR products.
Second, the technique provides a molecular mass that uniquely
identifies the bioagent from the possible primer sites. Finally,
the detection technique is rapid, allowing multiple PCR reactions
to be run in parallel.
E. Mass Spectrometric Characterization of Bioagent Identifying
Amplicons
[0071] Mass spectrometry (MS)-based detection of PCR products
provides a means for determination of BCS which has several
advantages. MS is intrinsically a parallel detection scheme without
the need for radioactive or fluorescent labels, since every
amplification product is identified by its molecular mass. The
current state of the art in mass spectrometry is such that less
than femtomole quantities of material can be readily analyzed to
afford information about the molecular contents of the sample. An
accurate assessment of the molecular mass of the material can be
quickly obtained, irrespective of whether the molecular weight of
the sample is several hundred, or in excess of one hundred thousand
atomic mass units (amu) or Daltons. Intact molecular ions can be
generated from amplification products using one of a variety of
ionization techniques to convert the sample to gas phase. These
ionization methods include, but are not limited to, electrospray
ionization (ESI), matrix-assisted laser desorption ionization
(MALDI) and fast atom bombardment (FAB). For example, MALDI of
nucleic acids, along with examples of matrices for use in MALDI of
nucleic acids, are described in WO 98/54751 (Genetrace, Inc.).
[0072] In some embodiments, large DNAs and RNAs, or large
amplification products therefrom, can be digested with restriction
endonucleases prior to ionization. Thus, for example, an
amplification product that was 10 kDa could be digested with a
series of restriction endonucleases to produce a panel of, for
example, 100 Da fragments. Restriction endonucleases and their
sites of action are well known to the skilled artisan. In this
manner, mass spectrometry can be performed for the purposes of
restriction mapping.
[0073] Upon ionization, several peaks are observed from one sample
due to the formation of ions with different charges. Averaging the
multiple readings of molecular mass obtained from a single mass
spectrum affords an estimate of molecular mass of the bioagent.
Electrospray ionization mass spectrometry (ESI-MS) is particularly
useful for very high molecular weight polymers such as proteins and
nucleic acids having molecular weights greater than 10 kDa, since
it yields a distribution of multiply-charged molecules of the
sample without causing a significant amount of fragmentation.
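The averaging of multiple charge-state readings mentioned above can be illustrated as follows. The sketch assumes negative-mode ions of the form [M-zH].sup.z-, for which m/z = (M - z x 1.007276)/z, and assumes the charge state of each peak has already been assigned; the function names are hypothetical.

```python
PROTON_MASS = 1.007276  # Da


def neutral_mass(mz, charge):
    """Neutral mass M from one [M - zH]^z- peak: M = z * (m/z + proton mass)."""
    return charge * (mz + PROTON_MASS)


def average_neutral_mass(peaks):
    """Average the neutral-mass readings from several (m/z, charge) peaks."""
    masses = [neutral_mass(mz, z) for mz, z in peaks]
    return sum(masses) / len(masses)
```

Each charge state yields an independent reading of the same neutral mass, so averaging them reduces the effect of random error in any single peak position.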
[0074] The mass detectors used in the methods of the present
invention include, but are not limited to, Fourier transform ion
cyclotron resonance mass spectrometry (FT-ICR-MS), ion trap,
quadrupole, magnetic sector, time of flight (TOF), Q-TOF, and
triple quadrupole.
[0075] In general, the mass spectrometric techniques which can be
used in the present invention include, but are not limited to,
tandem mass spectrometry, infrared multiphoton dissociation and
pyrolytic gas chromatography mass spectrometry (PGC-MS). In one
embodiment of the invention, the bioagent detection system operates
continually in bioagent detection mode using pyrolytic GC-MS
without PCR for rapid detection of increases in biomass (for
example, increases in fecal contamination of drinking water or of
germ warfare agents). To achieve minimal latency, a continuous
sample stream flows directly into the PGC-MS combustion chamber.
When an increase in biomass is detected, a PCR process is
automatically initiated. Bioagent presence produces elevated levels
of large molecular fragments from, for example, about 100-7,000 Da
which are observed in the PGC-MS spectrum. The observed mass
spectrum is compared to a threshold level and when levels of
biomass are determined to exceed a predetermined threshold, the
bioagent classification process described hereinabove (combining
PCR and MS, such as FT-ICR MS) is initiated. Optionally, alarms or
other processes (halting ventilation flow, physical isolation) are
also initiated by this detected biomass level.
[0076] The accurate measurement of molecular mass for large DNAs is
limited by the adduction of cations from the PCR reaction to each
strand, resolution of the isotopic peaks from natural abundance
.sup.13C and .sup.15N isotopes, and assignment of the charge state
for any ion. The cations are removed by in-line dialysis using a
flow-through chip that brings the solution containing the PCR
products into contact with a solution containing ammonium acetate
in the presence of an electric field gradient orthogonal to the
flow. The latter two problems are addressed by operating with a
resolving power of >100,000 and by incorporating isotopically
depleted nucleotide triphosphates into the DNA. The resolving power
of the instrument is also a consideration. At a resolving power of
10,000, the modeled signal from the [M-14H+].sup.14- charge state
of an 84mer PCR product is poorly characterized and assignment of
the charge state or exact mass is impossible. At a resolving power
of 33,000, the peaks from the individual isotopic components are
visible. At a resolving power of 100,000, the isotopic peaks are
resolved to the baseline and assignment of the charge state for the
ion is straightforward. The [.sup.13C,.sup.15N]-depleted
triphosphates are obtained, for example, by growing microorganisms
on depleted media and harvesting the nucleotides (Batey et al.,
Nucl. Acids Res., 1992, 20, 4515-4523).
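The dependence on resolving power can be made concrete with a rough estimate. Adjacent isotopic peaks of an ion of charge z are separated by about 1.003355/z in m/z (the .sup.13C-.sup.12C mass difference), so the peak width must be no larger than this spacing for the isotopic peaks to become visible. The sketch below assumes an average nucleotide residue mass of about 308.95 Da for the 84mer, which is an illustrative figure and not a value taken from the text.

```python
C13_C12_SPACING = 1.003355  # Da, mass difference between 13C and 12C
PROTON_MASS = 1.007276      # Da


def isotope_spacing(charge):
    """Spacing in m/z between adjacent isotopic peaks at a given charge."""
    return C13_C12_SPACING / charge


def min_resolving_power(neutral_mass, charge):
    """Resolving power (m/z divided by peak width) at which the peak width
    equals the isotopic spacing for an [M - zH]^z- ion."""
    mz = neutral_mass / charge - PROTON_MASS
    return mz / isotope_spacing(charge)


# Assumed mass of an 84mer strand: 84 residues at ~308.95 Da each.
ASSUMED_84MER_MASS = 84 * 308.95
required = min_resolving_power(ASSUMED_84MER_MASS, charge=14)
```

Under these assumptions the estimate is on the order of 26,000, which is consistent with the isotopic peaks being unresolved at a resolving power of 10,000 and visible at 33,000; baseline resolution requires a several-fold higher value.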
[0077] While mass measurements of intact nucleic acid regions are
believed to be adequate to determine most bioagents, tandem mass
spectrometry (MS.sup.n) techniques may provide more definitive
information pertaining to molecular identity or sequence. Tandem MS
involves the coupled use of two or more stages of mass analysis
where both the separation and detection steps are based on mass
spectrometry. The first stage is used to select an ion or component
of a sample from which further structural information is to be
obtained. The selected ion is then fragmented using, e.g.,
blackbody irradiation, infrared multiphoton dissociation, or
collisional activation. For example, ions generated by electrospray
ionization (ESI) can be fragmented using IR multiphoton
dissociation. This activation leads to dissociation of glycosidic
bonds and the phosphate backbone, producing two series of fragment
ions, called the w-series (having an intact 3' terminus and a 5'
phosphate following internal cleavage) and the a-Base series
(having an intact 5' terminus and a 3' furan).
[0078] The second stage of mass analysis is then used to detect and
measure the mass of these resulting fragments of product ions. Such
ion selection followed by fragmentation routines can be performed
multiple times so as to essentially completely dissect the
molecular sequence of a sample.
[0079] If there are two or more targets of similar molecular mass,
or if a single amplification reaction results in a product which
has the same mass as two or more bioagent reference standards, they
can be distinguished by using mass-modifying "tags." In this
embodiment of the invention, a nucleotide analog or "tag" is
incorporated during amplification (e.g., a 5-(trifluoromethyl)
deoxythymidine triphosphate) which has a different molecular weight
than the unmodified base so as to improve distinction of masses.
Such tags are described in, for example, PCT WO97/33000, which is
incorporated herein by reference in its entirety. This further
limits the number of possible base compositions consistent with any
mass. For example, 5-(trifluoromethyl)deoxythymidine triphosphate
can be used in place of dTTP in a separate nucleic acid
amplification reaction. Measurement of the mass shift between a
conventional amplification product and the tagged product is used
to quantitate the number of thymidine nucleotides in each of the
single strands. Because the strands are complementary, the number
of adenosine nucleotides in each strand is also determined.
[0080] In another amplification reaction, the number of G and C
residues in each strand is determined using, for example, the
cytidine analog 5-methylcytosine (5-meC) or propyne C. The
combination of the A/T reaction and G/C reaction, followed by
molecular weight determination, provides a unique base composition.
This method is summarized in FIG. 4 and Table 1.
TABLE-US-00001 TABLE 1
Mass tag     Double strand   Single strand   Total mass    Base info     Base info      Total base comp.   Total base comp.
             sequence        sequence        this strand   this strand   other strand   top strand         bottom strand
T*mass       T*ACGT*ACGT*    T*ACGT*ACGT*    3x            3T            3A             3T 2A 2C 2G        3A 2T 2G 2C
(T*-T) = x   AT*GCAT*GCA     AT*GCAT*GCA     2x            2T            2A
C*mass       TAC*GTAC*GT     TAC*GTAC*GT     2x            2C            2G
(C*-C) = y   ATGC*ATGC*A     ATGC*ATGC*A     2x            2C            2G
[0081] The mass tag phosphorothioate A (A*) was used to distinguish
a Bacillus anthracis cluster. The untagged B. anthracis product
(A.sub.14G.sub.9C.sub.14T.sub.9) had an average molecular weight of
14072.26, and the tagged B. anthracis product
(A.sub.1A*.sub.13G.sub.9C.sub.14T.sub.9) had an average molecular
weight of 14281.11, so that each phosphorothioate A contributed an
average mass shift of +16.06, as determined by ESI-TOF MS. The
deconvoluted spectra are shown in FIG. 5.
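The arithmetic behind the mass-tag measurement can be checked with a short sketch using the two molecular weights reported above. The function names are hypothetical; the per-tag shift of about 16.06 Da corresponds to replacing one oxygen atom with one sulfur atom.

```python
def mean_shift_per_tag(untagged_mass, tagged_mass, n_tagged):
    """Average mass shift contributed by each tagged residue."""
    return (tagged_mass - untagged_mass) / n_tagged


def count_tagged_residues(untagged_mass, tagged_mass, tag_shift):
    """Number of tagged residues implied by the total mass shift."""
    return round((tagged_mass - untagged_mass) / tag_shift)


# Values reported above for the untagged and A*-tagged B. anthracis products.
shift = mean_shift_per_tag(14072.26, 14281.11, n_tagged=13)
n_tags = count_tagged_residues(14072.26, 14281.11, tag_shift=16.06)
```

Run in the other direction, the same calculation gives the number of tagged residues (here 13) from the observed mass shift, which is how the tag constrains the base composition.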
[0082] In another example, assume the measured molecular masses of
each strand are 30,000.115 Da and 31,000.115 Da respectively, and
the measured number of dT and dA residues are (30,28) and (28,30).
If the molecular mass is accurate to 100 ppm, there are 7 possible
combinations of dG+dC possible for each strand. However, if the
measured molecular mass is accurate to 10 ppm, there are only 2
combinations of dG+dC, and at 1 ppm accuracy there is only one
possible base composition for each strand.
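The effect of mass accuracy on the number of candidate compositions can be sketched as a brute-force search. The example below is illustrative only: the residue masses are approximate average values, the end correction assumes a strand with 5'- and 3'-hydroxyl termini, and the function names are hypothetical. It does not attempt to reproduce the specific counts quoted above, which depend on the mass model used.

```python
# Approximate average masses (Da) of nucleotide residues within a DNA strand.
RESIDUE_MASS = {"A": 313.21, "G": 329.21, "C": 289.18, "T": 304.20}
END_CORRECTION = -61.96  # assumption: 5'-OH and 3'-OH termini


def strand_mass(n_a, n_g, n_c, n_t):
    """Average mass of a single strand with the given base counts."""
    return (n_a * RESIDUE_MASS["A"] + n_g * RESIDUE_MASS["G"]
            + n_c * RESIDUE_MASS["C"] + n_t * RESIDUE_MASS["T"]
            + END_CORRECTION)


def gc_candidates(measured_mass, n_a, n_t, ppm, max_count=200):
    """List (G, C) counts consistent with the measured mass.

    The A and T counts are taken as known (for example, from the mass
    tag reactions); a candidate is kept if its mass lies within the
    stated ppm tolerance of the measured mass.
    """
    tolerance = measured_mass * ppm * 1e-6
    hits = []
    for n_g in range(max_count + 1):
        for n_c in range(max_count + 1):
            if abs(strand_mass(n_a, n_g, n_c, n_t) - measured_mass) <= tolerance:
                hits.append((n_g, n_c))
    return hits


# Synthetic strand: 28 A, 20 G, 21 C, 30 T.
mass = strand_mass(28, 20, 21, 30)
loose = gc_candidates(mass, 28, 30, ppm=1000)
tight = gc_candidates(mass, 28, 30, ppm=1)
print(len(loose), len(tight))
```

Tightening the tolerance can only shrink the candidate list, which is the behavior the paragraph above describes.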
[0083] Signals from the mass spectrometer may be input to a
maximum-likelihood detection and classification algorithm such as
is widely used in radar signal processing. The detection processing
uses matched filtering of BCS observed in mass-basecount space and
allows for detection and subtraction of signatures from known,
harmless organisms, and for detection of unknown bioagent threats.
Comparison of newly observed bioagents to known bioagents is also
possible, for estimation of threat level, by comparing their BCS to
those of known organisms and to known forms of pathogenicity
enhancement, such as insertion of antibiotic resistance genes or
toxin genes.
[0084] Processing may end with a Bayesian classifier using log
likelihood ratios developed from the observed signals and average
background levels. The program emphasizes performance predictions
culminating in probability-of-detection versus
probability-of-false-alarm plots for conditions involving complex
backgrounds of naturally occurring organisms and environmental
contaminants. Matched filters consist of a priori expectations of
signal values given the set of primers used for each of the
bioagents. A genomic sequence database (e.g. GenBank) is used to
define the mass basecount matched filters. The database contains
known threat agents and benign background organisms. The latter is
used to estimate and subtract the signature produced by the
background organisms. A maximum likelihood detection of known
background organisms is implemented using matched filters and a
running-sum estimate of the noise covariance. Background signal
strengths are estimated and used along with the matched filters to
form signatures which are then subtracted. The maximum likelihood
process is applied to this "cleaned up" data in a similar manner
employing matched filters for the organisms and a running-sum
estimate of the noise-covariance for the cleaned up data.
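One simplified way to picture the subtract-then-detect processing described above is sketched below. It is an illustration under stated assumptions, not the processing actually used: noise is treated as white with a known variance (rather than estimated by a running sum of the covariance), background amplitudes are fitted by least squares one signature at a time, and all names and numbers are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def subtract_background(observed, background_signatures):
    """Fit each known background signature and subtract its contribution."""
    residual = list(observed)
    for signature in background_signatures:
        # Least-squares amplitude of this signature in the residual,
        # clipped at zero because signal strengths cannot be negative.
        amplitude = max(0.0, dot(signature, residual) / dot(signature, signature))
        residual = [r - amplitude * s for r, s in zip(residual, signature)]
    return residual


def log_likelihood_ratio(signature, data, noise_variance):
    """Log likelihood ratio of 'signature present' versus 'absent'.

    Assumes additive white Gaussian noise of known variance; a positive
    value favors the presence of the signature.
    """
    return (dot(signature, data) - 0.5 * dot(signature, signature)) / noise_variance


# Hypothetical matched filters over four mass-basecount bins.
background = [1.0, 0.0, 1.0, 0.0]
threat = [0.0, 1.0, 0.0, 1.0]

observed = [2.0, 1.0, 2.0, 1.0]  # two units of background plus one of threat
cleaned = subtract_background(observed, [background])
score = log_likelihood_ratio(threat, cleaned, noise_variance=1.0)
print(cleaned, score)
```

In this toy case the two signatures do not overlap, so subtraction leaves the threat signal untouched; overlapping signatures would call for a joint fit.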
F. Base Composition Signatures as Indices of Bioagent Identifying
Amplicons
[0085] Although the molecular mass of amplification products
obtained using intelligent primers provides a means for
identification of bioagents, conversion of molecular mass data to a
base composition signature is useful for certain analyses. As used
herein, a "base composition signature" (BCS) is the exact base
composition determined from the molecular mass of a bioagent
identifying amplicon. In one embodiment, a BCS provides an index of
a specific gene in a specific organism.
[0086] Base compositions, like sequences, vary slightly from
isolate to isolate within species. It is possible to manage this
diversity by building "base composition probability clouds" around
the composition constraints for each species. This permits
identification of organisms in a fashion similar to sequence
analysis. A "pseudo four-dimensional plot" can be used to visualize
the concept of base composition probability clouds (FIG. 18).
Optimal primer design requires optimal choice of bioagent
identifying amplicons and maximizes the separation between the base
composition signatures of individual bioagents. Areas where clouds
overlap indicate regions that may result in a misclassification, a
problem which is overcome by selecting primers that provide
information from different bioagent identifying amplicons, ideally
maximizing the separation of base compositions. Thus, one aspect of
the utility of an analysis of base composition probability clouds
is that it provides a means for screening primer sets in order to
avoid potential misclassifications of BCS and bioagent identity.
Another aspect of the utility of base composition probability
clouds is that they provide a means for predicting the identity of
a bioagent whose exact measured BCS was not previously observed
and/or indexed in a BCS database due to evolutionary transitions in
its nucleic acid sequence.
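As a rough illustration of how a probability cloud might be used for lookup, the sketch below treats the cloud as a fixed radius around each indexed composition and returns every organism whose composition falls within that radius of the observed one. The radius measure (sum of absolute count differences) and the function name are assumptions made for the example; the compositions are a few of the 16S_1100-1188 entries listed in Table 6.

```python
# Base compositions as (A, G, C, T) counts, taken from Table 6.
BCS_DATABASE = {
    "Aeromonas hydrophila": (23, 29, 21, 16),
    "Escherichia coli": (23, 29, 21, 16),
    "Pseudomonas putida": (23, 29, 21, 17),
    "Vibrio cholerae": (23, 30, 21, 16),
    "Bacillus subtilis": (22, 28, 21, 17),
}


def organisms_within(observed, database, radius):
    """Return names whose composition lies within `radius` of `observed`.

    Distance is the sum of absolute differences in the four base
    counts, a crude stand-in for a probability cloud.
    """
    matches = []
    for name, composition in database.items():
        distance = sum(abs(o - c) for o, c in zip(observed, composition))
        if distance <= radius:
            matches.append(name)
    return sorted(matches)


exact = organisms_within((23, 29, 21, 16), BCS_DATABASE, radius=0)
nearby = organisms_within((23, 29, 21, 16), BCS_DATABASE, radius=1)
print(exact)
print(nearby)
```

Two organisms share the observed composition exactly, which is the kind of overlap that a second, differently placed amplicon is meant to resolve.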
[0087] It is important to note that, in contrast to probe-based
techniques, mass spectrometry determination of base composition
does not require prior knowledge of the composition in order to
make the measurement, only to interpret the results. In this
regard, the present invention provides bioagent classifying
information similar to DNA sequencing and phylogenetic analysis at
a level sufficient to detect and identify a given bioagent.
Furthermore, the process of determination of a previously unknown
BCS for a given bioagent (for example, in a case where sequence
information is unavailable) has downstream utility by providing
additional bioagent indexing information with which to populate BCS
databases. The process of future bioagent identification is thus
greatly improved as more BCS indexes become available in the BCS
databases.
[0088] Another embodiment of the present invention is a method of
surveying bioagent samples that enables detection and
identification of all bacteria for which sequence information is
available using a set of twelve broad-range intelligent PCR
primers. Six of the twelve primers are "broad range survey primers"
herein defined as primers targeted to broad divisions of bacteria
(for example, the Bacillus/Clostridia group or
gamma-proteobacteria). The other six primers of the group of twelve
primers are "division-wide" primers herein defined as primers which
provide more focused coverage and higher resolution. This method
enables identification of nearly 100% of known bacteria at the
species level. A further example of this embodiment of the present
invention is a method herein designated "survey/drill-down" wherein
a subspecies characteristic for detected bioagents is obtained
using additional primers. Examples of such a subspecies
characteristic include but are not limited to: antibiotic
resistance, pathogenicity island, virulence factor, strain type,
sub-species type, and clade group. Using the survey/drill-down
method, bioagent detection, confirmation and a subspecies
characteristic can be provided within hours. Moreover, the
survey/drill-down method can be focused to identify bioengineering
events such as the insertion of a toxin gene into a bacterial
species that does not normally make the toxin.
G. Fields of Application of the Present Invention
[0089] The present methods allow extremely rapid and accurate
detection and identification of bioagents compared to existing
methods. Furthermore, this rapid detection and identification is
possible even when sample material is impure. The methods leverage
ongoing biomedical research in virulence, pathogenicity, drug
resistance and genome sequencing into a method which provides
greatly improved sensitivity, specificity and reliability compared
to existing methods, with lower rates of false positives. Thus, the
methods are useful in a wide variety of fields, including, but not
limited to, those fields discussed below.
[0090] 1. Forensics Methods
[0091] In other embodiments of the invention, the methods disclosed
herein can be used for forensics. As used herein, "forensics" is
the study of evidence discovered at a crime or accident scene and
used in a court of law. "Forensic science" is any science used for
the purposes of the law, in particular the criminal justice system,
and therefore provides impartial scientific evidence for use in the
courts of law, and in a criminal investigation and trial. Forensic
science is a multidisciplinary subject, drawing principally from
chemistry and biology, but also from physics, geology, psychology
and social science, for example.
[0092] The process of human identification is a common objective of
forensics investigations. For example, there exists a need for
rapid identification of humans wherein human remains and/or
biological samples are analyzed. Such remains or samples may be
associated with war-related casualties, aircraft crashes, and acts
of terrorism, for example. Analysis of mtDNA enables a
rule-in/rule-out identification process for persons for whom DNA
profiles from a maternal relative are available. Human
identification by analysis of mtDNA can also be applied to human
remains and/or biological samples obtained from crime scenes.
[0093] Nucleic acid segments which provide enough variability to
distinguish each individual bioagent and whose molecular masses are
amenable to molecular mass determination are herein described as
"bioagent identifying amplicons." The bioagent identifying
amplicons used in the present invention for analysis of
mitochondrial DNA are defined as "mitochondrial DNA identifying
amplicons."
[0094] Forensic scientists generally use two highly variable
regions of human mtDNA for analysis. These regions are designated
"hypervariable regions 1 and 2" (HVR1 and HVR2, which contain 341
and 267 base pairs, respectively). These hypervariable regions, or
portions thereof, provide one non-limiting example of mitochondrial
DNA identifying amplicons. A mtDNA analysis begins when total
genomic DNA is extracted from biological material, such as a tooth,
blood sample, or hair. The polymerase chain reaction (PCR) is then
used to amplify, or create many copies of, the two hypervariable
portions of the non-coding region of the mtDNA molecule, using
flanking primers. Care is taken to eliminate the introduction of
exogenous DNA during both the extraction and amplification steps
via methods such as the use of pre-packaged sterile equipment and
reagents, aerosol-resistant barrier pipette tips, gloves, masks,
and lab coats, separation of pre- and post-amplification areas in
the lab using dedicated reagents for each, ultraviolet irradiation
of equipment, and autoclaving of tubes and reagent stocks. In
casework, questioned samples are always processed before known
samples and they are processed in different laboratory rooms. When
adequate amounts of PCR product are amplified to provide all the
necessary information about the two hypervariable regions,
sequencing reactions are performed. These chemical reactions use
each PCR product as a template to create a new complementary strand
of DNA in which some of the nucleotide residues that make up the
DNA sequence are labeled with dye. The strands created in this
stage are then separated according to size by an automated
sequencing machine that uses a laser to "read" the sequence, or
order, of the nucleotide bases. Where possible, the sequences of
both hypervariable regions are determined on both strands of the
double-stranded DNA molecule, with sufficient redundancy to confirm
the nucleotide substitutions that characterize that particular
sample. At least two forensic analysts independently assemble the
sequence and then compare it to a standard, commonly used,
reference sequence. The entire process is then repeated with a
known sample, such as blood or saliva collected from a known
individual. The sequences from both samples, about 780 bases long
each, are compared to determine if they match. The analysts assess
the results of the analysis and determine if any portions of it
need to be repeated. Finally, in the event of an inclusion or
match, the SWGDAM mtDNA database, which is maintained by the FBI,
is searched for the mitochondrial sequence that has been observed
for the samples. The analysts can then report the number of
observations of this type based on the nucleotide positions that
have been read. A written report can be provided to the submitting
agency.
[0095] 2. Determination and Quantitation of Mitochondrial DNA
Heteroplasmy
[0096] In one embodiment of the present invention, the methods
disclosed herein for rapid identification of bioagents using base
composition signatures are employed for analysis of human mtDNA.
The advantages provided by this embodiment of the present invention
include, but are not limited to, efficiency of mass determination
of amplicons over sequence determination, and the ability to
resolve mixtures of mtDNA amplicons arising from heteroplasmy. Such
mixtures invariably cause sequencing failures.
[0097] In another embodiment of the present invention, the methods
disclosed herein for mtDNA analysis can be used to identify the
presence of heteroplasmic variants and to determine their relative
abundances. As used herein, "mitochondrial diseases" are defined as
diseases arising from defects in mitochondrial function which often
arise as a result of mutations and heteroplasmy. If the defect is
in the mitochondrial rather than the nuclear genome, unusual
patterns of inheritance can be observed. This embodiment can be
used to determine rates of naturally occurring mutations
contributing to heteroplasmy and to predict the onset of
mitochondrial diseases arising from heteroplasmy. Examples of
mitochondrial diseases include, but are not limited to: Alpers
Disease, Barth syndrome, Beta-oxidation Defects,
Carnitine-Acyl-Carnitine Deficiency, Carnitine Deficiency,
Co-Enzyme Q10 Deficiency, Complex I Deficiency, Complex II
Deficiency, Complex III Deficiency, Complex IV Deficiency, Complex
V Deficiency, COX Deficiency, CPEO, CPT I Deficiency, CPT II
Deficiency, Glutaric Aciduria Type II, KSS, Lactic Acidosis, LCAD,
LCHAD, Leigh Disease or Syndrome, LHON, Lethal Infantile
Cardiomyopathy, Luft Disease, MAD, MCA, MELAS, MERRF, Mitochondrial
Cytopathy, Mitochondrial DNA Depletion, Mitochondrial
Encephalopathy, Mitochondrial Myopathy, MNGIE, NARP, Pearson
Syndrome, Pyruvate Carboxylase Deficiency, Pyruvate Dehydrogenase
Deficiency, Respiratory Chain, SCAD, SCHAD, VLCAD, and the like
found at the united mitochondrial disease foundation website.
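As a minimal sketch of how relative abundances of heteroplasmic variants might be estimated, the example below normalizes the peak intensities observed for each variant amplicon. It assumes that the variants ionize and are detected with equal efficiency, so that intensity is proportional to abundance; the variant labels and intensities are hypothetical.

```python
def relative_abundance(peak_intensities):
    """Convert per-variant peak intensities into fractional abundances."""
    total = sum(peak_intensities.values())
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return {name: value / total for name, value in peak_intensities.items()}


# Hypothetical intensities for two variants of one mtDNA amplicon.
fractions = relative_abundance({"variant_1": 750.0, "variant_2": 250.0})
print(fractions)
```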
[0098] In another embodiment of the present invention, the methods
disclosed herein can be used to rapidly determine the identity of a
fungus or a protist by analysis of its mtDNA.
[0099] In addition, epidemiologists, for example, can use the
present methods to determine the geographic origin of a particular
strain of a protist or fungus. For example, a particular strain of
bacteria or virus may have a sequence difference that is associated
with a particular area of a country or the world and identification
of such a sequence difference can lead to the identification of the
geographic origin and epidemiological tracking of the spread of the
particular disease, disorder or condition associated with the
detected protist or fungus. In addition, carriers of particular DNA
or diseases, such as mammals, non-mammals, birds, insects, and
plants, can be tracked by screening their mtDNA. Diseases, such as
malaria, can be tracked by screening the mtDNA of commensals such
as mosquitoes.
[0100] The present method can also be used to detect single
nucleotide polymorphisms (SNPs), or multiple nucleotide
polymorphisms, rapidly and accurately. A SNP is defined as a single
base pair site in the genome that is different from one individual
to another. The difference can be expressed either as a deletion,
an insertion or a substitution, and is frequently linked to a
disease state. Because they occur every 100-1000 base pairs, SNPs
are the most frequently found type of genetic marker in the human
genome.
[0101] For example, sickle cell anemia results from an A-to-T
transversion, which encodes a valine rather than a glutamic acid
residue. Oligonucleotide primers may be designed such that they
bind to sequences that flank a SNP site, followed by nucleotide
amplification and mass determination of the amplified product.
Because the molecular mass of the resulting product from an
individual who does not have sickle cell anemia is different from
that of the product from an individual who has the disease, the
method can be used to distinguish the two individuals. Thus, the
method can be used to detect any known SNP in an individual and
thus diagnose or determine increased susceptibility to a disease or
condition.
[0102] In one embodiment, blood is drawn from an individual and
peripheral blood mononuclear cells (PBMC) are isolated and
simultaneously tested, preferably in a high-throughput screening
method, for one or more SNPs using appropriate primers based on the
known sequences which flank the SNP region. The National Center for
Biotechnology Information maintains a publicly available database
of SNPs on the world wide web of the Internet at, for example,
"ncbi.nlm.nih.gov/SNP/."
[0103] The method of the present invention can also be used for
blood typing. The gene encoding A, B or O blood type can differ by
four single nucleotide polymorphisms. If the gene contains the
sequence CGTGGTGACCCTT (SEQ ID NO:5), antigen A results. If the
gene contains the sequence CGTCGTCACCGCTA (SEQ ID NO:6) antigen B
results. If the gene contains the sequence CGTGGT-ACCCCTT (SEQ ID
NO:7), blood group O results ("-" indicates a deletion). These
sequences can be distinguished by designing a single primer pair
which flanks these regions, followed by amplification and mass
determination.
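To illustrate why these alleles are separable by mass, the sketch below counts the bases in each of the three sequence fragments given above. The fragments stand in for full amplicons, which would also include the primer and flanking sequence, and the function name is hypothetical.

```python
from collections import Counter

# Sequence fragments quoted in the text; "-" marks the deletion.
ALLELE_FRAGMENTS = {
    "A": "CGTGGTGACCCTT",
    "B": "CGTCGTCACCGCTA",
    "O": "CGTGGT-ACCCCTT",
}


def base_composition(sequence):
    """Return (A, C, G, T) counts, ignoring deletion markers."""
    counts = Counter(sequence.replace("-", ""))
    return tuple(counts.get(base, 0) for base in "ACGT")


compositions = {name: base_composition(seq)
                for name, seq in ALLELE_FRAGMENTS.items()}
for name, composition in compositions.items():
    print(name, composition)
```

The three fragments yield three different base compositions, so the corresponding amplification products would be expected to differ in mass.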
[0104] While the present invention has been described with
specificity in accordance with certain of its embodiments, the
following examples serve only to illustrate the invention and are
not intended to limit the same.
EXAMPLES
Example 1
Nucleic Acid Isolation and PCR
[0105] In one embodiment, nucleic acid is isolated from the
organisms and amplified by PCR using standard methods prior to BCS
determination by mass spectrometry. Nucleic acid is isolated, for
example, by detergent lysis of bacterial cells, centrifugation and
ethanol precipitation. Nucleic acid isolation methods are described
in, for example, Current Protocols in Molecular Biology (Ausubel et
al.) and Molecular Cloning: A Laboratory Manual (Sambrook et al.).
The nucleic acid is then amplified using standard methodology, such
as PCR, with primers which bind to conserved regions of the nucleic
acid which contain an intervening variable sequence as described
below.
[0106] General Genomic DNA Sample Prep Protocol: Raw samples are
filtered using Supor-200 0.2 .mu.m membrane syringe filters (VWR
International). Samples are transferred to 1.5 ml Eppendorf tubes
pre-filled with 0.45 g of 0.7 mm Zirconia beads followed by the
addition of 350 .mu.l of ATL buffer (Qiagen, Valencia, Calif.). The
samples are subjected to bead beating for 10 minutes at a frequency
of 19 1/s in a Retsch Vibration Mill (Retsch). After centrifugation,
samples are transferred to an S-block plate (Qiagen) and DNA
isolation is completed with a BioRobot 8000 nucleic acid isolation
robot (Qiagen).
[0107] Swab Sample Protocol: Allegiance S/P brand culture swabs and
collection/transport system are used to collect samples. After
drying, swabs are placed in 17.times.100 mm culture tubes (VWR
International) and the genomic nucleic acid isolation is carried
out automatically with a Qiagen Mdx robot and the Qiagen QIAamp DNA
Blood BioRobot Mdx genomic preparation kit (Qiagen, Valencia,
Calif.).
Example 2
Mass Spectrometry
[0108] FTICR Instrumentation: The FTICR instrument is based on a 7
tesla actively shielded superconducting magnet and modified Bruker
Daltonics Apex II 70e ion optics and vacuum chamber. The
spectrometer is interfaced to a LEAP PAL autosampler and a custom
fluidics control system for high throughput screening applications.
Samples are analyzed directly from 96-well or 384-well microtiter
plates at a rate of about 1 sample/minute. The Bruker
data-acquisition platform is supplemented with a lab-built
ancillary NT datastation which controls the autosampler and
contains an arbitrary waveform generator capable of generating
complex rf-excite waveforms (frequency sweeps, filtered noise,
stored waveform inverse Fourier transform (SWIFT), etc.) for
sophisticated tandem MS experiments. For oligonucleotides in the
20-30-mer regime, typical performance characteristics include mass
resolving power in excess of 100,000 (FWHM), low ppm mass
measurement errors, and an operable m/z range between 50 and
5000.
[0109] Modified ESI Source: In sample-limited analyses, analyte
solutions are delivered at 150 nL/minute to a 30 .mu.m i.d.
fused-silica ESI emitter mounted on a 3-D micromanipulator. The ESI
ion optics consists of a heated metal capillary, an rf-only
hexapole, a skimmer cone, and an auxiliary gate electrode. The 6.2
cm rf-only hexapole is comprised of 1 mm diameter rods and is
operated at a voltage of 380 Vpp at a frequency of 5 MHz. A
lab-built electro-mechanical shutter can be employed to prevent the
electrospray plume from entering the inlet capillary unless
triggered to the "open" position via a TTL pulse from the data
station. When in the "closed" position, a stable electrospray plume
is maintained between the ESI emitter and the face of the shutter.
The back face of the shutter arm contains an elastomeric seal that
can be positioned to form a vacuum seal with the inlet capillary.
When the seal is removed, a 1 mm gap between the shutter blade and
the capillary inlet allows constant pressure in the external ion
reservoir regardless of whether the shutter is in the open or
closed position. When the shutter is triggered, a "time slice" of
ions is allowed to enter the inlet capillary and is subsequently
accumulated in the external ion reservoir. The rapid response time
of the ion shutter (<25 ms) provides reproducible, user defined
intervals during which ions can be injected into and accumulated in
the external ion reservoir.
[0110] Apparatus for Infrared Multiphoton Dissociation: A 25 watt
CW CO.sub.2 laser operating at 10.6 .mu.m has been interfaced to
the spectrometer to enable infrared multiphoton dissociation
(IRMPD) for oligonucleotide sequencing and other tandem MS
applications. An aluminum optical bench is positioned approximately
1.5 m from the actively shielded superconducting magnet such that
the laser beam is aligned with the central axis of the magnet.
Using standard IR-compatible mirrors and kinematic mirror mounts,
the unfocused 3 mm laser beam is aligned to traverse directly
through the 3.5 mm holes in the trapping electrodes of the FTICR
trapped ion cell and longitudinally traverse the hexapole region of
the external ion guide finally impinging on the skimmer cone. This
scheme allows IRMPD to be conducted in an m/z selective manner in
the trapped ion cell (e.g. following a SWIFT isolation of the
species of interest), or in a broadband mode in the high pressure
region of the external ion reservoir where collisions with neutral
molecules stabilize IRMPD-generated metastable fragment ions
resulting in increased fragment ion yield and sequence
coverage.
Example 3
Identification of Bioagents
[0111] Table 2 shows a small cross section of a database of
calculated molecular masses for over 9 primer sets and
approximately 30 organisms. The primer sets were derived from rRNA
alignment. Examples of regions from rRNA consensus alignments are
shown in FIGS. 1A-1C. Lines with arrows are examples of regions to
which intelligent primer pairs for PCR are designed. The primer
pairs are >95% conserved in the bacterial sequence database
(currently over 10,000 organisms). The intervening regions are
variable in length and/or composition, thus providing the base
composition "signature" (BCS) for each organism. Primer pairs were
chosen so the total length of the amplified region is less than
about 80-90 nucleotides. The label for each primer pair represents
the starting and ending base number of the amplified region on the
consensus diagram.
[0112] Included in the short bacterial database cross-section in
Table 2 are many well known pathogens/biowarfare agents (shown in
bold/red typeface) such as Bacillus anthracis or Yersinia pestis as
well as some of the bacterial organisms found commonly in the
natural environment such as Streptomyces. Even closely related
organisms can be distinguished from each other by the appropriate
choice of primers. For instance, two low G+C organisms, Bacillus
anthracis and Staphylococcus aureus, can be distinguished from each
other by using the primer pair defined by 16S_1337 or 23S_855
(.DELTA.M of 4 Da).
TABLE-US-00002 TABLE 2 Cross Section Of A Database Of Calculated
Molecular Masses.sup.1 Primer Regions Bug Name 16S_971 16S_1100
16S_1337 16S_1294 16S_1228 23S_1021 23S_855 23S_193 23S_115
Acinetobacter calcoaceticus 55619.1 55004 28446.7 35854.9 51295.4
30299 42654 39557.5 54999 55005 54388 28448 35238 51296 30295 42651
39560 56850 Bacillus cereus 55622.1 54387.9 28447.6 35854.9 51296.4
30295 42651 39560.5 56850.3 Bordetella bronchiseptica 56857.3
51300.4 28446.7 35857.9 51307.4 30299 42653 39559.5 51920.5
Borrelia burgdorferi 56231.2 55621.1 28440.7 35852.9 51295.4 30297
42029.9 38941.4 52524.6 58098 55011 28448 35854 50683 Campylobacter
jejuni 58088.5 54386.9 29061.8 35856.9 50674.3 30294 42032.9
39558.5 45732.5 55000 55007 29063 35855 50676 30295 42036 38941
56230 55006 53767 28445 35855 51291 30300 42656 39562 54999
Clostridium difficile 56855.3 54386.9 28444.7 35853.9 51296.4 30294
41417.8 39556.5 55612.2 Enterococcus faecalis 55620.1 54387.9
28447.6 35858.9 51296.4 30297 42652 39559.5 56849.3 55622 55009
28445 35857 51301 30301 42656 39562 54999 53769 54385 28445 35856
51298 Haemophilus influenzae 55620.1 55006 28444.7 35855.9 51298.4
30298 42656 39560.5 55613.1 Klebsiella pneumoniae 55622.1 55008
28442.7 35856.9 51297.4 30300 42655 39562.5 55000 55618 55626 28446
35857 51303 Mycobacterium avium 54390.9 55631.1 29064.8 35858.9
51915.5 30298 42656 38942.4 56241.2 Mycobacterium leprae 54389.9
55629.1 29064.8 35860.9 51917.5 30298 42656 39559.5 56240.2
Mycobacterium tuberculosis 54390.9 55629.1 29064.8 35860.9 51301.4
30299 42656 39560.5 56243.2 Mycoplasma genitalium 53143.7 45115.4
29061.8 35854.9 50671.3 30294 43264.1 39558.5 56842.4 Mycoplasma
pneumoniae 53143.7 45118.4 29061.8 35854.9 50673.3 30294 43264.1
39559.5 56843.4 Neisseria gonorrhoeae 55627.1 54389.9 28445.7
35855.9 51302.4 30300 42649 39561.5 55000 55623 55010 28443 35858
51301 30298 43272 39558 55619 58093 55621 28448 35853 50677 30293
42650 39559 53139 58094 55623 28448 35853 50679 30293 42648 39559
53755 55622 55005 28445 35857 51301 30301 42658 55623 55009 28444
35857 51301 Staphylococcus aureus 56854.3 54386.9 28443.7 35852.9
51294.4 30298 42655 39559.5 57466.4 Streptomyces 54389.9 59341.6
29063.8 35858.9 51300.4 39563.5 56864.3 Treponema pallidum 56245.2
55631.1 28445.7 35851.9 51297.4 30299 42034.9 38939.4 57473.4 55625
55626 28443 35857 52536 29063 30303 35241 50675 Vibrio
parahaemolyticus 54384.9 55626.1 28444.7 34620.7 50064.2 55620
55626 28443 35857 51299 .sup.1Molecular mass distribution of PCR
amplified regions for a selection of organisms (rows) across
various primer pairs (columns). Pathogens are shown in bold. Empty
cells indicate presently incomplete or missing data.
[0113] FIG. 6 shows the use of ESI-FT-ICR MS for measurement of
exact mass. The spectra from 46mer PCR products originating at
position 1337 of the 16S rRNA from S. aureus (upper) and B.
anthracis (lower) are shown. These data are from the region of the
spectrum containing signals from the [M-8H+].sup.8- charge states
of the respective 5'-3' strands. The two strands differ by two
(AT.fwdarw.CG) substitutions, and have measured masses of 14206.396
and 14208.373 ± 0.010 Da, respectively. The possible base
compositions derived from the masses of the forward and reverse
strands for the B. anthracis products are listed in Table 3.
TABLE-US-00003 TABLE 3 Possible base composition for B. anthracis
products

Calc. Mass    Error       Base Comp.
14208.2935    0.079520    A1 G17 C10 T18
14208.3160    0.056980    A1 G20 C15 T10
14208.3386    0.034440    A1 G23 C20 T2
14208.3074    0.065560    A6 G11 C3 T26
14208.3300    0.043020    A6 G14 C8 T18
14208.3525    0.020480    A6 G17 C13 T10
14208.3751    0.002060    A6 G20 C18 T2
14208.3439    0.029060    A11 G8 C1 T26
14208.3665    0.006520    A11 G11 C6 T18
14208.3890    0.016020    A11 G14 C11 T10
14208.4116    0.038560    A11 G17 C16 T2
14208.4030    0.029980    A16 G8 C4 T18
14208.4255    0.052520    A16 G11 C9 T10
14208.4481    0.075060    A16 G14 C14 T2
14208.4395    0.066480    A21 G5 C2 T18
14208.4620    0.089020    A21 G8 C7 T10

14079.2624    0.080600    A0 G14 C13 T19
14079.2849    0.058060    A0 G17 C18 T11
14079.3075    0.035520    A0 G20 C23 T3
14079.2538    0.089180    A5 G5 C1 T35
14079.2764    0.066640    A5 G8 C6 T27
14079.2989    0.044100    A5 G11 C11 T19
14079.3214    0.021560    A5 G14 C16 T11
14079.3440    0.000980    A5 G17 C21 T3
14079.3129    0.030140    A10 G5 C4 T27
14079.3354    0.007600    A10 G8 C9 T19
14079.3579    0.014940    A10 G11 C14 T11
14079.3805    0.037480    A10 G14 C19 T3
14079.3494    0.006360    A15 G2 C2 T27
14079.3719    0.028900    A15 G5 C7 T19
14079.3944    0.051440    A15 G8 C12 T11
14079.4170    0.073980    A15 G11 C17 T3
14079.4084    0.065400    A20 G2 C5 T19
14079.4309    0.087940    A20 G5 C10 T13
Among the 16 compositions for the forward strand and the 18
compositions for the reverse strand that were calculated, only one
pair (shown in bold) is complementary, corresponding to the actual
base compositions of the B. anthracis PCR products.
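The complementarity check described above can be sketched as a simple search over the candidate compositions listed in Table 3. A forward composition and a reverse composition are complementary when the A and T counts are exchanged and the G and C counts are exchanged. The function name is hypothetical; the tuples are (A, G, C, T) counts transcribed from Table 3.

```python
# Candidate compositions from Table 3 as (A, G, C, T) counts.
FORWARD = [
    (1, 17, 10, 18), (1, 20, 15, 10), (1, 23, 20, 2), (6, 11, 3, 26),
    (6, 14, 8, 18), (6, 17, 13, 10), (6, 20, 18, 2), (11, 8, 1, 26),
    (11, 11, 6, 18), (11, 14, 11, 10), (11, 17, 16, 2), (16, 8, 4, 18),
    (16, 11, 9, 10), (16, 14, 14, 2), (21, 5, 2, 18), (21, 8, 7, 10),
]
REVERSE = [
    (0, 14, 13, 19), (0, 17, 18, 11), (0, 20, 23, 3), (5, 5, 1, 35),
    (5, 8, 6, 27), (5, 11, 11, 19), (5, 14, 16, 11), (5, 17, 21, 3),
    (10, 5, 4, 27), (10, 8, 9, 19), (10, 11, 14, 11), (10, 14, 19, 3),
    (15, 2, 2, 27), (15, 5, 7, 19), (15, 8, 12, 11), (15, 11, 17, 3),
    (20, 2, 5, 19), (20, 5, 10, 13),
]


def complementary_pairs(forward, reverse):
    """Return (forward, reverse) pairs whose base counts are complementary."""
    reverse_set = set(reverse)
    pairs = []
    for a, g, c, t in forward:
        # The complement has as many A as the forward strand has T,
        # as many G as it has C, and so on.
        complement = (t, c, g, a)
        if complement in reverse_set:
            pairs.append(((a, g, c, t), complement))
    return pairs


pairs = complementary_pairs(FORWARD, REVERSE)
print(pairs)
```

On the Table 3 values the search returns a single pair, A11 G14 C11 T10 for the forward strand with A10 G11 C14 T11 for the reverse strand.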
Example 4
BCS of Region from Bacillus anthracis and Bacillus cereus
[0114] A conserved Bacillus region from B. anthracis
(A.sub.14G.sub.9C.sub.14T.sub.9) and B. cereus
(A.sub.15G.sub.9C.sub.13T.sub.9) having a C to A base change was
synthesized and subjected to ESI-TOF MS. The results are shown in
FIG. 7 in which the two regions are clearly distinguished using the
method of the present invention (MW=14072.26 vs. 14096.29).
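The two molecular weights can be approximated directly from the base compositions. The sketch below is illustrative: it uses approximate average residue masses and assumes strands with 5'- and 3'-hydroxyl termini, so the absolute values may differ slightly from the measured ones, and the function name is hypothetical.

```python
# Approximate average masses (Da) of nucleotide residues within a DNA strand.
RESIDUE_MASS = {"A": 313.21, "G": 329.21, "C": 289.18, "T": 304.20}
END_CORRECTION = -61.96  # assumption: 5'-OH and 3'-OH termini


def average_mass(composition):
    """Average molecular weight for a dict of base counts."""
    return sum(RESIDUE_MASS[base] * count
               for base, count in composition.items()) + END_CORRECTION


b_anthracis = {"A": 14, "G": 9, "C": 14, "T": 9}
b_cereus = {"A": 15, "G": 9, "C": 13, "T": 9}

mass_anthracis = average_mass(b_anthracis)
mass_cereus = average_mass(b_cereus)
print(round(mass_anthracis, 2), round(mass_cereus, 2),
      round(mass_cereus - mass_anthracis, 2))
```

The single C to A change accounts for a difference of about 24 Da, consistent with the two molecular weights reported above.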
Example 5
Identification of Additional Bioagents
[0115] In other examples of the present invention, the pathogen
Vibrio cholerae can be distinguished from Vibrio parahaemolyticus
with .DELTA.M>600 Da using one of three 16S primer sets shown in
Table 2 (16S_971, 16S_1228 or 16S_1294) as shown
in Table 4b. The two mycoplasma species in the list (M. genitalium
and M. pneumoniae) can also be distinguished from each other, as
can the three mycobacteria. While the direct mass measurements of
amplified products can identify and distinguish a large number of
organisms, measurement of the base composition signature provides
dramatically enhanced resolving power for closely related
organisms. In cases such as Bacillus anthracis and Bacillus cereus
that are virtually indistinguishable from each other based solely
on mass differences, compositional analysis or fragmentation
patterns are used to resolve the differences. The single base
difference between the two organisms yields different fragmentation
patterns, and despite the presence of the ambiguous/unidentified
base N at position 20 in B. anthracis, the two organisms can be
identified.
[0116] Tables 4a-b show examples of primer pairs from Table 2 which
distinguish pathogens from background.
TABLE-US-00004 TABLE 4a
Organism name              23S_855     16S_1337    23S_1021
Bacillus anthracis         42650.98    28447.65    30294.98
Staphylococcus aureus      42654.97    28443.67    30297.96

TABLE-US-00005 TABLE 4b
Organism name              16S_971     16S_1294    16S_1228
Vibrio cholerae            55625.09    35856.87    52535.59
Vibrio parahaemolyticus    54384.91    34620.67    50064.19
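The mass differences implied by Tables 4a and 4b can be computed directly, as in the following sketch. The masses are those listed in the two tables; the function name is hypothetical.

```python
# Calculated masses (Da) from Tables 4a and 4b, keyed by primer pair.
TABLE_4A = {
    "Bacillus anthracis": {"23S_855": 42650.98, "16S_1337": 28447.65, "23S_1021": 30294.98},
    "Staphylococcus aureus": {"23S_855": 42654.97, "16S_1337": 28443.67, "23S_1021": 30297.96},
}
TABLE_4B = {
    "Vibrio cholerae": {"16S_971": 55625.09, "16S_1294": 35856.87, "16S_1228": 52535.59},
    "Vibrio parahaemolyticus": {"16S_971": 54384.91, "16S_1294": 34620.67, "16S_1228": 50064.19},
}


def mass_differences(table, organism_a, organism_b):
    """Mass of organism_b minus organism_a for each primer pair, in Da."""
    masses_a = table[organism_a]
    masses_b = table[organism_b]
    return {primer: round(masses_b[primer] - masses_a[primer], 2)
            for primer in masses_a}


bacillus_vs_staph = mass_differences(
    TABLE_4A, "Bacillus anthracis", "Staphylococcus aureus")
vibrio = mass_differences(
    TABLE_4B, "Vibrio parahaemolyticus", "Vibrio cholerae")
print(bacillus_vs_staph)
print(vibrio)
```

The B. anthracis and S. aureus products differ by only about 3 to 4 Da for each primer pair, whereas the two Vibrio species differ by more than 600 Da for all three primer pairs.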
[0117] Table 5 shows the expected molecular weight and base
composition of region 16S 1100-1188 in Mycobacterium avium and
Streptomyces sp.
TABLE-US-00006 TABLE 5
Region           Organism name          Length    Molecular weight    Base comp.
16S_1100-1188    Mycobacterium avium    82        25624.1728          A.sub.16G.sub.32C.sub.18T.sub.16
16S_1100-1188    Streptomyces sp.       96        29904.871           A.sub.17G.sub.38C.sub.27T.sub.14
[0118] Table 6 shows base composition (single strand) results for
16S_1100-1188 primer amplification reactions for different
species of bacteria. Species which are repeated in the table (e.g.,
Clostridium botulinum) are different strains which have different
base compositions in the 16S_1100-1188 region.
TABLE-US-00007 TABLE 6
Organism name                Base comp.
Mycobacterium avium          A.sub.16G.sub.32C.sub.18T.sub.16
Streptomyces sp.             A.sub.17G.sub.38C.sub.27T.sub.14
Ureaplasma urealyticum       A.sub.18G.sub.30C.sub.17T.sub.17
Streptomyces sp.             A.sub.19G.sub.36C.sub.24T.sub.18
Mycobacterium leprae         A.sub.20G.sub.32C.sub.22T.sub.16
Fusobacterium necrophorum    A.sub.21G.sub.26C.sub.22T.sub.18
Listeria monocytogenes       A.sub.21G.sub.27C.sub.19T.sub.19
Clostridium botulinum        A.sub.21G.sub.27C.sub.19T.sub.21
Neisseria gonorrhoeae        A.sub.21G.sub.28C.sub.21T.sub.18
Bartonella quintana          A.sub.21G.sub.30C.sub.22T.sub.16
Enterococcus faecalis        A.sub.22G.sub.27C.sub.20T.sub.19
Bacillus megaterium          A.sub.22G.sub.28C.sub.20T.sub.18
Bacillus subtilis            A.sub.22G.sub.28C.sub.21T.sub.17
Pseudomonas aeruginosa       A.sub.22G.sub.29C.sub.23T.sub.15
Legionella pneumophila       A.sub.22G.sub.32C.sub.20T.sub.16
Mycoplasma pneumoniae        A.sub.23G.sub.20C.sub.14T.sub.16
Clostridium botulinum        A.sub.23G.sub.26C.sub.20T.sub.19
Enterococcus faecium         A.sub.23G.sub.26C.sub.21T.sub.18
Acinetobacter calcoaceti     A.sub.23G.sub.26C.sub.21T.sub.19
Clostridium perfringens      A.sub.23G.sub.27C.sub.19T.sub.19
Aeromonas hydrophila         A.sub.23G.sub.29C.sub.21T.sub.16
Escherichia coli             A.sub.23G.sub.29C.sub.21T.sub.16
Pseudomonas putida           A.sub.23G.sub.29C.sub.21T.sub.17
Vibrio cholerae              A.sub.23G.sub.30C.sub.21T.sub.16
Mycoplasma genitalium        A.sub.24G.sub.19C.sub.12T.sub.18
Clostridium botulinum        A.sub.24G.sub.25C.sub.18T.sub.20
Bordetella bronchiseptica    A.sub.24G.sub.26C.sub.19T.sub.14
Francisella tularensis       A.sub.24G.sub.26C.sub.19T.sub.19
Helicobacter pylori          A.sub.24G.sub.26C.sub.20T.sub.19
Helicobacter pylori          A.sub.24G.sub.26C.sub.21T.sub.18
Moraxella catarrhalis        A.sub.24G.sub.26C.sub.23T.sub.16
Haemophilus influenzae Rd    A.sub.24G.sub.28C.sub.20T.sub.17
Pseudomonas putida           A.sub.24G.sub.29C.sub.21T.sub.16
Clostridium botulinum        A.sub.25G.sub.24C.sub.18T.sub.21
Clostridium tetani           A.sub.25G.sub.25C.sub.18T.sub.20
Francisella tularensis       A.sub.25G.sub.25C.sub.19T.sub.19
Acinetobacter calcoacetic    A.sub.25G.sub.26C.sub.20T.sub.19
Bacteroides fragilis         A.sub.25G.sub.27C.sub.16T.sub.22
Chlamydophila psittaci       A.sub.25G.sub.27C.sub.21T.sub.16
Borrelia burgdorferi         A.sub.25G.sub.29C.sub.17T.sub.19
Streptobacillus monilifor    A.sub.26G.sub.26C.sub.20T.sub.16
Rickettsia prowazekii        A.sub.26G.sub.28C.sub.18T.sub.18
Rickettsia rickettsii        A.sub.26G.sub.28C.sub.20T.sub.16
Mycoplasma mycoides          A.sub.28G.sub.23C.sub.16T.sub.20
[0119] Entries for the same organism with different base compositions are
different strains. Groups of organisms which are highlighted or in
italics have the same base compositions in the amplified region.
Some of these organisms can be distinguished using multiple
primers. For example, Bacillus anthracis can be distinguished from
Bacillus cereus and Bacillus thuringiensis using the primer
16S_971-1062 (Table 7). Other primer pairs which produce
unique base composition signatures are shown in Table 6 (bold).
Clusters containing very similar threat and ubiquitous non-threat
organisms (e.g. anthracis cluster) are distinguished at high
resolution with focused sets of primer pairs. The known biowarfare
agents in Table 6 are Bacillus anthracis, Yersinia pestis,
Francisella tularensis and Rickettsia prowazekii.
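The use of multiple primers to separate organisms that share a base composition in one region can be sketched as follows. This is an illustrative sketch only: the function names parse_composition and distinguishable are assumptions introduced here, and the data are limited to the Bacillus cereus and Bacillus thuringiensis rows of Table 7, for which all three primer columns are populated.

```python
import re

# Base compositions copied from Table 7, in the document's ".sub." notation.
TABLE_7 = {
    "Bacillus cereus": {
        "16S_971-1062": "A.sub.22G.sub.27C.sub.22T.sub.22",
        "16S_1228-1310": "A.sub.24G.sub.22C.sub.19T.sub.18",
        "16S_1100-1188": "A.sub.23G.sub.27C.sub.20T.sub.18",
    },
    "Bacillus thuringiensis": {
        "16S_971-1062": "A.sub.22G.sub.27C.sub.21T.sub.22",
        "16S_1228-1310": "A.sub.24G.sub.22C.sub.19T.sub.18",
        "16S_1100-1188": "A.sub.23G.sub.27C.sub.20T.sub.18",
    },
}


def parse_composition(text):
    """Turn 'A.sub.22G.sub.27...' into a tuple of counts ordered (A, G, C, T)."""
    counts = {base: int(n) for base, n in re.findall(r"([AGCT])\.sub\.(\d+)", text)}
    return tuple(counts[base] for base in "AGCT")


def distinguishable(org_a, org_b, primers):
    """True if at least one listed primer gives different base compositions."""
    return any(
        parse_composition(TABLE_7[org_a][p]) != parse_composition(TABLE_7[org_b][p])
        for p in primers
    )


two_primers = ["16S_1228-1310", "16S_1100-1188"]
print(distinguishable("Bacillus cereus", "Bacillus thuringiensis", two_primers))
print(distinguishable("Bacillus cereus", "Bacillus thuringiensis",
                      two_primers + ["16S_971-1062"]))
```

In this sketch the two organisms are identical for the 16S_1228-1310 and 16S_1100-1188 regions and are separated only when the 16S_971-1062 region is added.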
TABLE-US-00008 TABLE 7 Organism 16S_971-1062 16S_1228-1310
16S_1100-1188 Aeromonas hydrophila A.sub.21G.sub.29C.sub.22T.sub.20
A.sub.22G.sub.27C.sub.21T.sub.13 A.sub.23G.sub.31C.sub.21T.sub.15
Aeromonas salmonicida A.sub.21G.sub.29C.sub.22T.sub.20
A.sub.22G.sub.27C.sub.21T.sub.13 A.sub.23G.sub.31C.sub.21T.sub.15
Bacillus anthracis A.sub.24G.sub.22C.sub.19T.sub.18
A.sub.23G.sub.27C.sub.20T.sub.18 Bacillus cereus
A.sub.22G.sub.27C.sub.22T.sub.22 A.sub.24G.sub.22C.sub.19T.sub.18
A.sub.23G.sub.27C.sub.20T.sub.18 Bacillus thuringiensis
A.sub.22G.sub.27C.sub.21T.sub.22 A.sub.24G.sub.22C.sub.19T.sub.18
A.sub.23G.sub.27C.sub.20T.sub.18 Chlamydia trachomatis
A.sub.24G.sub.28C.sub.21T.sub.16 Chlamydia pneumoniae AR39
A.sub.26G.sub.23C.sub.20T.sub.22 A.sub.26G.sub.22C.sub.16T.sub.18
A.sub.24G.sub.28C.sub.21T.sub.16 Leptospira borgpetersenii
A.sub.22G.sub.26C.sub.20T.sub.21 A.sub.22G.sub.25C.sub.21T.sub.15
A.sub.23G.sub.26C.sub.24T.sub.15 Leptospira interrogans
A.sub.22G.sub.26C.sub.20T.sub.21 A.sub.22G.sub.25C.sub.21T.sub.15
A.sub.23G.sub.26C.sub.24T.sub.15 Mycoplasma genitalium
A.sub.28G.sub.23C.sub.15T.sub.22 Mycoplasma pneumoniae
A.sub.28G.sub.23C.sub.15T.sub.22 Escherichia coli
A.sub.24G.sub.25C.sub.21T.sub.13 A.sub.23G.sub.29C.sub.22T.sub.15
Shigella dysenteriae A.sub.24G.sub.25C.sub.21T.sub.13
A.sub.23G.sub.29C.sub.22T.sub.15 Proteus vulgaris
A.sub.24G.sub.30C.sub.21T.sub.15 Yersinia pestis
A.sub.24G.sub.25C.sub.21T.sub.22 A.sub.25G.sub.24C.sub.20T.sub.14
A.sub.24G.sub.30C.sub.21T.sub.15 Yersinia pseudotuberculosis
A.sub.24G.sub.25C.sub.21T.sub.22 A.sub.25G.sub.24C.sub.20T.sub.14
A.sub.24G.sub.30C.sub.21T.sub.15 Francisella tularensis Rickettsia
prowazekii Rickettsia rickettsii
[0120] The sequences of B. anthracis and B. cereus in region
16S_971 are shown below. Shown in bold is the single base
difference between the two species which can be detected using the
methods of the present invention. B. anthracis has an ambiguous
base at position 20.
TABLE-US-00009
B.anthracis_16S_971 (SEQ ID NO: 1)
GCGAAGAACCUUACCAGGUNUUGACAUCCUCUGACAACCCUAGAGAUAGG
GCUUCUCCUUCGGGAGCAGAGUGACAGGUGGUGCAUGGUU
B.cereus_16S_971 (SEQ ID NO: 2)
GCGAAGAACCUUACCAGGUCUUGACAUCCUCUGAAAACCCUAGAGAUAGG
GCUUCUCCUUCGGGAGCAGAGUGACAGGUGGUGCAUGGUU
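Because the bold formatting of the differing base is not preserved in plain text, the position can be recovered by a direct comparison of SEQ ID NO: 1 and SEQ ID NO: 2. The following is a minimal sketch; the function name compare and the treatment of N as "ambiguous, not a difference" are assumptions made for illustration.

```python
# Sequences copied from SEQ ID NO: 1 and SEQ ID NO: 2.
B_ANTHRACIS = (
    "GCGAAGAACCUUACCAGGUNUUGACAUCCUCUGACAACCCUAGAGAUAGG"
    "GCUUCUCCUUCGGGAGCAGAGUGACAGGUGGUGCAUGGUU"
)
B_CEREUS = (
    "GCGAAGAACCUUACCAGGUCUUGACAUCCUCUGAAAACCCUAGAGAUAGG"
    "GCUUCUCCUUCGGGAGCAGAGUGACAGGUGGUGCAUGGUU"
)


def compare(seq_a, seq_b):
    """Return (differences, ambiguous) using 1-based positions.

    differences: list of (position, base_in_a, base_in_b)
    ambiguous:   positions where either sequence has the unidentified base N
    """
    differences, ambiguous = [], []
    for pos, (a, b) in enumerate(zip(seq_a, seq_b), start=1):
        if "N" in (a, b):
            ambiguous.append(pos)
        elif a != b:
            differences.append((pos, a, b))
    return differences, ambiguous


differences, ambiguous = compare(B_ANTHRACIS, B_CEREUS)
print(differences)  # [(35, 'C', 'A')]
print(ambiguous)    # [20]
```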
Example 6
ESI-TOF MS of sspE 56-mer Plus Calibrant
[0121] The mass measurement accuracy that can be obtained using an
internal mass standard in the ESI-MS study of PCR products is shown
in FIG. 8. The mass standard was a 20-mer phosphorothioate
oligonucleotide added to a solution containing a 56-mer PCR product
from the B. anthracis spore coat protein sspE. The mass of the
expected PCR product distinguishes B. anthracis from other species
of Bacillus such as B. thuringiensis and B. cereus.
Example 7
B. anthracis ESI-TOF Synthetic 16S_1228 Duplex
[0122] An ESI-TOF MS spectrum was obtained from an aqueous solution
containing 5 .mu.M each of synthetic analogs of the expected
forward and reverse PCR products from the nucleotide 1228 region of
the B. anthracis 16S rRNA gene. The results (FIG. 9) show that the
molecular weights of the forward and reverse strands can be
accurately determined and easily distinguish the two strands. The
[M-21H.sup.+].sup.21- and [M-20H.sup.+].sup.20- charge states are
shown.
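The relationship between a neutral molecular mass M and the m/z value observed for a negative-ion charge state [M-nH.sup.+].sup.n- can be sketched as below. This is an illustration under stated assumptions: the function name mz_negative is introduced here, the proton mass is the standard value of about 1.007276 Da, and the neutral mass used in the loop is a hypothetical round number, not a measured value from FIG. 9.

```python
PROTON_MASS = 1.007276  # Da, standard value


def mz_negative(neutral_mass, n):
    """m/z of the [M - nH]^n- ion for a species of the given neutral mass."""
    return (neutral_mass - n * PROTON_MASS) / n


# Hypothetical neutral mass, chosen only to illustrate the arithmetic.
hypothetical_mass = 21000.0
for n in (21, 20):
    print(n, round(mz_negative(hypothetical_mass, n), 3))
```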
Example 8
ESI-FTICR-MS of Synthetic B. anthracis 16S_1337 46 Base Pair
Duplex
[0123] An ESI-FTICR-MS spectrum was obtained from an aqueous
solution containing 5 .mu.M each of synthetic analogs of the
expected forward and reverse PCR products from the nucleotide 1337
region of the B. anthracis 16S rRNA gene. The results (FIG. 10)
show that the molecular weights of the strands can be distinguished
by this method. The [M-16H.sup.+].sup.16- through
[M-10H.sup.+].sup.10- charge states are shown. The insert
highlights the resolution that can be realized on the FTICR-MS
instrument, which allows the charge state of the ion to be
determined from the mass difference between peaks differing by a
single 13C substitution.
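The charge state determination described above rests on simple arithmetic: adjacent isotope peaks differ by one 13C-for-12C substitution (about 1.003355 Da), so on the m/z axis they are spaced by that value divided by the charge. A minimal sketch follows; the function name charge_from_spacing is an assumption introduced for illustration.

```python
C13_C12_DIFFERENCE = 1.003355  # Da, mass difference between 13C and 12C


def charge_from_spacing(mz_spacing):
    """Estimate the charge state from the m/z spacing of adjacent isotope peaks."""
    return round(C13_C12_DIFFERENCE / mz_spacing)


# Expected isotope spacings for the charge states mentioned in the text.
for z in range(10, 17):
    print(z, round(C13_C12_DIFFERENCE / z, 5))
```

Resolving spacings of roughly 0.06 to 0.10 m/z units is what allows the charge state to be read directly from the spectrum.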
Example 9
ESI-TOF MS of 56-mer Oligonucleotide from saspB Gene of B.
anthracis with Internal Mass Standard
[0124] ESI-TOF MS spectra were obtained on a synthetic 56-mer
oligonucleotide (5 .mu.M) from the saspB gene of B. anthracis
containing an internal mass standard at an ESI flow rate of 1.7 .mu.L/min as
a function of sample consumption. The results (FIG. 11) show that
the signal to noise is improved as more scans are summed, and that
the standard and the product are visible after only 100 scans.
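The improvement of signal to noise with summed scans is commonly approximated by a square-root law. The sketch below assumes uncorrelated noise and a constant signal from scan to scan, which is an idealization and is not stated in the example above; the function name snr_gain is introduced here for illustration.

```python
import math


def snr_gain(n_scans):
    """Idealized signal-to-noise gain from summing n_scans scans.

    Assumes the signal adds linearly while uncorrelated noise adds in quadrature.
    """
    return math.sqrt(n_scans)


for n in (1, 25, 100, 400):
    print(n, snr_gain(n))
```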
Example 10
ESI-TOF MS of an Internal Standard with Tributylammonium
(TBA)-trifluoroacetate (TFA) Buffer
[0125] An ESI-TOF-MS spectrum of a 20-mer phosphorothioate mass
standard was obtained following addition of 5 mM TBA-TFA buffer to
the solution. This buffer strips charge from the oligonucleotide
and shifts the most abundant charge state from [M-8H.sup.+].sup.8-
to [M-3H.sup.+].sup.3- (FIG. 12).
Example 11
Master Database Comparison
[0126] The molecular masses obtained through Examples 1-10 are
compared to molecular masses of known bioagents stored in a master
database to obtain a high probability matching molecular mass.
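The comparison against a master database can be sketched as a nearest-mass lookup within a tolerance. This is a minimal sketch, not the implementation of the described system: the function name best_match, the 10 ppm tolerance, and the measured values are assumptions; the database entries are the 16S_971 masses from Table 4b.

```python
# Toy "master database": 16S_971 masses copied from Table 4b.
MASTER_DB = {
    "Vibrio cholerae": 55625.09,
    "Vibrio parahaemolyticus": 54384.91,
}


def best_match(measured_mass, database, ppm_tolerance):
    """Return the closest database entry within the tolerance, or None."""
    best_name, best_error = None, None
    for name, mass in database.items():
        error_ppm = abs(measured_mass - mass) / mass * 1e6
        if error_ppm <= ppm_tolerance and (best_error is None or error_ppm < best_error):
            best_name, best_error = name, error_ppm
    return best_name


print(best_match(55625.30, MASTER_DB, ppm_tolerance=10.0))
print(best_match(55000.00, MASTER_DB, ppm_tolerance=10.0))
```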
Example 12
Master Database Interrogation over the Internet
[0127] The same procedure as in Example 11 is followed except that
the local computer does not store the Master database. The Master
database is interrogated over an internet connection, searching for
a molecular mass match.
Example 13
Master Database Updating
[0128] The same procedure as in Example 11 is followed except the
local computer is connected to the internet and has the ability to
store a master database locally. The local computer system
periodically, or at the user's discretion, interrogates the Master
database, synchronizing the local master database with the global
Master database. This provides the current molecular mass
information to both the local database and the global
Master database, and thereby provides a more globalized
knowledge base.
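One way to picture the two-way synchronization is a merge in which the newer record for each key is copied in both directions. The sketch below is hypothetical throughout: the record layout, the "revision" counter used to decide which record is newer, and the function name synchronize are assumptions for illustration and are not described in the example.

```python
def synchronize(local, master):
    """Copy the newest record for every key into both databases (in place)."""
    for key in set(local) | set(master):
        records = [r for r in (local.get(key), master.get(key)) if r is not None]
        newest = max(records, key=lambda r: r["revision"])
        local[key] = dict(newest)
        master[key] = dict(newest)


# Hypothetical records keyed by (organism, primer); revision numbers are made up.
local_db = {("Vibrio cholerae", "16S_971"): {"mass": 55625.09, "revision": 2}}
master_db = {
    ("Vibrio cholerae", "16S_971"): {"mass": 55625.00, "revision": 1},
    ("Vibrio parahaemolyticus", "16S_971"): {"mass": 54384.91, "revision": 1},
}

synchronize(local_db, master_db)
print(local_db == master_db)
```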
Example 14
Global Database Updating
[0129] The same procedure as in Example 13 is followed except there
are numerous such local stations throughout the world. The
synchronization of each database adds to the diversity of
information and diversity of the molecular masses of known
bioagents.
Example 15
Biochemical Processing of Large Amplification Products for Analysis
by Mass Spectrometry
[0130] In the example illustrated in FIG. 18, the product of a primer
pair which amplifies a 986 bp region of the 16S ribosomal gene in
E. coli (K12) was digested with a mixture of 4 restriction enzymes:
BstNI, BsmFI, BfaI, and NcoI. FIG. 18(a) illustrates the complexity
of the resulting ESI-FTICR mass spectrum which contains multiple
charge states of multiple restriction fragments. Upon mass
deconvolution to neutral mass, the spectrum is significantly
simplified and discrete oligonucleotide pairs are evident (FIG.
18(b)). When base compositions are derived from the masses of the
restriction fragments, perfect agreement is observed for the known
sequence of nucleotides 1-856 (FIG. 18(c)). The batch of NcoI enzyme
used in this experiment was inactive, which resulted in a missed
cleavage site, and a 197-mer fragment went undetected because it is
outside the mass range of the mass spectrometer under the conditions
employed. Interestingly, however, both a forward and a reverse strand
were detected for each fragment measured (solid and dotted lines,
respectively) within 2 ppm of the predicted molecular weights,
resulting in unambiguous determination of the base composition of
788 nucleotides of the 985 nucleotides in the amplicon. The
coverage map offers redundant coverage as both 5' to 3' and 3' to
5' fragments are detected for fragments covering the first 856
nucleotides of the amplicon.
[0131] This approach is in many ways analogous to those widely used
in MS-based proteomics studies in which large intact proteins are
digested with trypsin, or other proteolytic enzyme(s), and the
identity of the protein is derived by comparing the measured masses
of the tryptic peptides with theoretical digests. A unique feature
of this approach is that the precise mass measurements of the
complementary strands of each digest product allow one to derive a
de novo base composition for each fragment, which can in turn be
"stitched together" to derive a complete base composition for the
larger amplicon. An important distinction between this approach and
a gel-based restriction mapping strategy is that, in addition to
determination of the length of each fragment, an unambiguous base
composition of each restriction fragment is derived. Thus, a single
base substitution within a fragment (which would not be resolved on
a gel) is readily observed using this approach. Because this study
was performed on a 7 Tesla ESI-FTICR mass spectrometer, better than
2 ppm mass measurement accuracy was obtained for all fragments.
Interestingly, calculation of the mass measurement accuracy
required to derive unambiguous base compositions from the
complementary fragments indicates that the highest mass measurement
accuracy actually required is only 15 ppm for the 139 bp fragment
(nucleotides 525-663). Most of the fragments were in the 50-70 bp
size-range which would require mass accuracy of only .about.50 ppm
for unambiguous base composition determination. This level of
performance is achievable on other more compact, less expensive MS
platforms such as the ESI-TOF suggesting that the methods developed
here could be widely deployed in a variety of diagnostic and human
forensic arenas.
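The way complementary-strand masses constrain the base composition can be sketched by brute-force enumeration. This is an illustrative sketch under stated assumptions and not the procedure actually used: the residue masses are approximate monoisotopic values for DNA nucleotide residues, the end-group correction assumes strands with 5'-OH and 3'-OH termini, and the function names, the 25 ppm tolerance, and the example composition are all hypothetical.

```python
# Approximate monoisotopic residue masses (Da) for DNA nucleotides in a chain.
MONO = {"A": 313.05761, "G": 329.05252, "C": 289.04637, "T": 304.04604}
# Assumed end-group correction: sum of residues minus HPO3 plus H2O.
END = -61.95577
BASES = "AGCT"


def mass(comp):
    """Mass of a strand with composition comp = (A, G, C, T) counts."""
    return sum(n * MONO[b] for n, b in zip(comp, BASES)) + END


def complement(comp):
    """Composition of the complementary strand (A<->T, G<->C)."""
    a, g, c, t = comp
    return (t, c, g, a)


def candidates(measured, ppm):
    """All compositions whose calculated mass is within ppm of measured."""
    tol = measured * ppm * 1e-6
    n_max = int(measured / min(MONO.values())) + 2
    found = []
    for a in range(n_max):
        for g in range(n_max):
            for c in range(n_max):
                rest = measured - END - a * MONO["A"] - g * MONO["G"] - c * MONO["C"]
                if rest < -tol:
                    break  # adding more C only makes the remainder smaller
                t = round(rest / MONO["T"])
                if t >= 0 and abs(mass((a, g, c, t)) - measured) <= tol:
                    found.append((a, g, c, t))
    return found


def paired_candidates(m_fwd, m_rev, ppm):
    """Forward-strand compositions also consistent with the reverse-strand mass."""
    tol = m_rev * ppm * 1e-6
    return [c for c in candidates(m_fwd, ppm)
            if abs(mass(complement(c)) - m_rev) <= tol]


# Hypothetical 55-mer used only to exercise the functions.
true_comp = (14, 16, 12, 13)
m_fwd, m_rev = mass(true_comp), mass(complement(true_comp))
single = candidates(m_fwd, 25.0)
both = paired_candidates(m_fwd, m_rev, 25.0)
print(len(single), len(both))
```

The second count can never exceed the first, which reflects why measuring both strands relaxes the mass accuracy needed for an unambiguous assignment.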
[0132] This example illustrates an alternative approach to derive
base compositions from larger PCR products. Because the amplicons
of interest cover many strain variants, for some of which complete
sequences are not known, each amplicon can be digested under
several different enzymatic conditions to ensure that a
diagnostically informative region of the amplicon is not obscured
by a "blind spot" which arises from a mutation in a restriction
site. The extent of redundancy required to confidently map the base
composition of amplicons from different markers, and determine
which set of restriction enzymes should be employed and how they
are most effectively used as mixtures can be determined. These
parameters will be dictated by the extent to which the area of
interest is conserved across the amplified region, the
compatibility of the various restriction enzymes with respect to
digestion protocol (buffer, temperature, time) and the degree of
coverage required to discriminate one amplicon from another.
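The "blind spot" that arises from a mutation in a restriction site can be illustrated with an in silico digest. The sketch below is a simplification that handles a single strand and exact recognition sites only; the function name digest and the toy sequences are assumptions, and RsaI is taken to recognize GTAC and cut between T and A.

```python
def digest(sequence, site, cut_offset):
    """Split sequence at every occurrence of site, cutting cut_offset bases in."""
    cuts = []
    i = sequence.find(site)
    while i != -1:
        cuts.append(i + cut_offset)
        i = sequence.find(site, i + 1)
    bounds = [0] + cuts + [len(sequence)]
    return [sequence[start:end] for start, end in zip(bounds, bounds[1:])]


# Toy sequences (hypothetical): the second carries a mutation in the second site.
reference = "AAGTACCCGTACTT"
variant = "AAGTACCCGTTCTT"

print(digest(reference, "GTAC", 2))  # ['AAGT', 'ACCCGT', 'ACTT']
print(digest(variant, "GTAC", 2))    # ['AAGT', 'ACCCGTTCTT']
```

The variant yields one longer fragment in place of two, which is why digesting under several different enzymatic conditions helps keep informative regions within the measurable mass range.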
Example 16
Analysis of 10 Human Blood Mitochondrial DNA Samples Provided by
the FBI
[0133] Ten different samples of human DNA provided by the FBI were
subjected to rapid mtDNA analysis by the method of the present
invention. Intelligent primers (SEQ ID NOs: 8-17 in Table 8) were
selected to amplify portions of HVR1 and HVR2. Additional
intelligent primers were designed to mtDNA regions other than HVR1
and HVR2 (SEQ ID NOs: 18-43). The primers described below are
generally 10-50 nucleotides in length, 15-35 nucleotides in length,
or 18-30 nucleotides in length.
TABLE-US-00010 TABLE 8 Intelligent Primer Pairs for Analysis of
mtDNA Primer Forward Reverse Pair Forward Primer SEQ ID Reverse
Primer SEQ ID Name Sequence NO: Sequence NO: HMTHV2_AND
TCACGCGATAGCATTGCG 8 TGGTTTGGCAGAGATGTGTTTA 9 RSN_76_353 AGT TMOD
HMTHV2_AND TCTCACGGGAGCTCTCCATGC 10 TCTGTTAAAAGTGCATACCGCC 11
RSN_29_429 A TMOD HMTHV1_AND TGACTCACCCATCAACAACCGC 12
TGAGGATGGTGGTCAAGGGAC 13 RSN_16065 16410_TMOD HMTHV1_AND
TGACTCACCCATCAACAACCGC 14 TGGATTTGACTGTAATGTGCTA 15 RSN_16065
16354_TMOD HMTHV1_AND TGACTCACCCATCAACAACCGC 16
TGAAGGGATTTGACTGTAATGT 17 RSN_16064_ GCTATG 16359 HMT_ASN_16
GAAGCAGATTTGGGTACCACC 18 GTGTGTGTGCTGGGTAGGATG 19 036_522
HMT_ASN_81 TACGGTCAATGCTCTGAAATCT 20 TGGTAAGAAGTGGGCTAGGGCA 21
62_8916 GTGG TT HMT_ASN_12 TTATGTAAAATCCATTGTCGCA 22
TGGTGATAGCGCCTAAGCATAG 23 438_13189 TCCACC TG HMT_ASN_14
TCCCATTACTAAACCCACACTC 24 TTTCGTGCAAGAATAGGAGGTG 25 629_15353 AACAG
GAG HMT_ASN_94 TAAGGCCTTCGATACGGGATAA 26 TAGGGTCGAAGCCGCACTCG 27
35_10188 TCCTA HMT_ASN_10 TACTCCAATGCTAAAACTAATC 28
TGTGAGGCGTATTATACCATAG 29 753_11500 GTCCCAAC CCG HMT_ASN_15
TCCTAGGAATCACCTCCCATTC 30 TAGAATCTTAGCTTTGGGTGCT 31 369_16006 CGA
AATGGTG HMT_ASN_13 TGGCAGCCTAGCATTAGCAGGA 32 TGGCTGAACATTGTTTGTTGGT
33 461_14206 ATA GT HMT_ASN_34 TCGCTGACGCCATAAAACTCTT 34
TAAGTAATGCTAGGGTGAGTGG 35 52_4210 CAC TAGGAAG HMT_ASN_77
TAACTAATACTAACATCTCAGA 36 TTTATGGGCTTTGGTGAGGGAG 37 34_8493
CGCTCAGGA GTA HMT_ASN_63 TACTCCCACCCTGGAGCCTC 38
TGCTCCTATTGATAGGACATAG 39 09_7058 TGGAAGTG HMT_ASN_76
TTATCACCTTTCATGATCACGC 40 TGGCATTTCACTGTAAAGAGGT 41 44_8371 CCT
GTTGG HMT_ASN_26 TGTATGAATGGCTCCACGAGGG 42 TCGGTAAGCATTAGGAATGCCA
43 26_3377 T TTGC
The process of the analysis is shown in FIG. 19. After
amplification by PCR (210), the PCR products were subjected to
restriction digests (220) with RsaI for HVR1 and a combination of
HpaII, HpyCH4IV, PacI and EaeI for HVR2 in order to obtain amplicon
segments suitable for analysis by FTICR-MS (230). The data were
processed to obtain mass data for each amplicon segment (240) which
were then compared to the masses calculated for theoretical digests
from the FBI mtDNA database by a scoring scheme (250). Digestion
pattern matches were scored by the sum of (i) the percentage of
expected complete digest fragments observed, (ii) the percentage of
fragments with a "floating" percentage of potential incomplete
digest fragments (to increase sensitivity for incomplete digestion
these are assigned lower weight), (iii) the percentage of the
sequence covered by matched masses, (iv) the number of mass peaks
accounted for in the theoretical database digest, and (v) the
weighted score for matched peaks, weighted by their observed
abundance. HVR1 and HVR2 scores were combined and all database
entries were sorted by high score. Even in the absence of an exact
match in the database, the majority of entries can be ruled out by
observing a much lower match score than the maximum score. One with
relevant skill in the art will recognize that development of such
scoring procedures can be accomplished without undue
experimentation.
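A much reduced version of such a scoring scheme can be sketched as follows. It implements only two of the five terms listed above, namely (i) the percentage of expected complete digest fragments observed and (iii) the percentage of the sequence covered by matched masses; the function names, the 20 ppm matching tolerance, and the toy database entries are assumptions for illustration and do not reproduce the actual scoring used.

```python
def match_score(observed_masses, expected_fragments, ppm=20.0):
    """Score one database entry against observed masses.

    expected_fragments: list of (mass, length) for the theoretical digest.
    Returns percent of fragments observed plus percent of sequence covered.
    """
    matched = [
        (m, n) for m, n in expected_fragments
        if any(abs(o - m) <= m * ppm * 1e-6 for o in observed_masses)
    ]
    pct_fragments = 100.0 * len(matched) / len(expected_fragments)
    pct_coverage = 100.0 * sum(n for _, n in matched) / sum(
        n for _, n in expected_fragments
    )
    return pct_fragments + pct_coverage


def rank(observed_masses, database, ppm=20.0):
    """Return (entry, score) pairs sorted by descending score."""
    scores = [(name, match_score(observed_masses, frags, ppm))
              for name, frags in database.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical theoretical digests: entry -> [(fragment mass, fragment length)].
toy_database = {
    "entry A": [(10000.0, 32), (15000.0, 48)],
    "entry B": [(10000.0, 32), (15300.0, 49)],
}
observed = [10000.1, 15000.1]
print(rank(observed, toy_database))
```

Even in this reduced form, an entry that explains only some of the observed fragments receives a markedly lower score than the best entry.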
[0134] The results of analysis of sample 1 are shown in FIGS. 20A
and 20B. In this example, the utility of mass determination of
amplicon digest segments is indicated. In FIG. 20A, predicted and
actual mass data with scoring parameters for length heteroplasmy
(HV1-1-outer-variants 1 and 2) in the digest segment from position
94 to 145 (variant 1)/146 (variant 2) are shown. FIG. 20B indicates
that, whereas sequencing fails to resolve the variants due to the
length heteroplasmy, mass determination detects multiple species
simultaneously and indicates abundance ratios. In this case, the
ratio of variant 1 to variant 2 (short to long alleles) is 1:3.
Thus, in addition to efficiency of characterization of individual
digested amplicon fragments, the relative abundances of
heteroplasmic variants can be determined.
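The conversion of an abundance ratio into variant fractions is simple arithmetic, sketched below. The sketch assumes that peak intensity is proportional to abundance for both length variants, which is an assumption made here for illustration; the function name variant_fractions and the intensity values are likewise hypothetical and merely reproduce the 1:3 ratio reported above.

```python
def variant_fractions(peak_intensities):
    """Convert per-variant peak intensities into fractional abundances."""
    total = sum(peak_intensities.values())
    return {name: value / total for name, value in peak_intensities.items()}


# Intensities in arbitrary units, chosen to match the reported 1:3 ratio.
fractions = variant_fractions({"variant 1 (short)": 1.0, "variant 2 (long)": 3.0})
print(fractions)  # {'variant 1 (short)': 0.25, 'variant 2 (long)': 0.75}
```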
[0135] Of the 10 samples analyzed by the present methods, 9 samples
were verified as being consistent with members of the FBI database.
The remaining sample could not be analyzed due to a failure of PCR
to produce an amplification product.
[0136] Various modifications of the invention, in addition to those
described herein, will be apparent to those skilled in the art from
the foregoing description. Such modifications are also intended to
fall within the scope of the appended claims. Each reference cited
in the present application is incorporated herein by reference in
its entirety.
Sequence CWU 1
1
47190RNABacillus anthracismisc_feature(20)..(20)N = A, U, G or C
1gcgaagaacc uuaccaggun uugacauccu cugacaaccc uagagauagg gcuucuccuu
60cgggagcaga gugacaggug gugcaugguu 90290RNABacillus cereus
2gcgaagaacc uuaccagguc uugacauccu cugaaaaccc uagagauagg gcuucuccuu
60cgggagcaga gugacaggug gugcaugguu 9031542RNAArtificial
SequenceSynthetic 3nnnnnnnaga guuugaucnu ggcucagnnn gaacgcuggc
ggnnngcnun anacaugcaa 60gucgancgnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn
agnggcnnac gggugaguaa 120nncnunnnna nnunccnnnn nnnnnggnan
annnnnnnga aannnnnnnu aauaccnnau 180nnnnnnnnnn nnnnaaagnn
nnnnnnnnnn nnnnnnnnnn nnnnnngann nnnnnnngnn 240nnaunagnun
guuggunngg uaanggcnna ccaagncnnn gannnnuagc ngnncugaga
300ggnngnncng ccacanuggn acugaganac ggnccanacu ccuacgggag
gcagcagunn 360ggaaunuunn ncaauggnng naanncugan nnagcnannc
cgcgugnnng anganggnnu 420nnngnungua aannncunun nnnnnngang
annnnnnnnn nnnnnnnnnn nnnnnnnnnu 480gacnnuannn nnnnannaag
nnncggcnaa cuncgugcca gcagccgcgg uaauacgnag 540gnngcnagcg
uunnncggan unanugggcg uaaagngnnn gnaggnggnn nnnnnngunn
600nnngunaaan nnnnnngcun aacnnnnnnn nnncnnnnnn nacnnnnnnn
cungagnnnn 660nnagnggnnn nnngaauunn nnguguagng gugnaauncg
naganaunng nangaanacc 720nnungcgaag gcnnnnnncu ggnnnnnnac
ugacncunan nnncgaaagc nugggnagcn 780aacaggauua gauacccugg
uaguccangc nnuaaacgnu gnnnnnunnn ngnnngnnnn 840nnnnnnnnnn
nnnnnnnnna nnnaacgnnn uaannnnncc gccuggggag uacgnncgca
900agnnunaaac ucaaangaau ugacggggnc cngcacaagc ngnggagnau
guggnuuaau 960ucgangnnac gcgnanaacc uuaccnnnnn uugacaunnn
nnnnnnnnnn nnganannnn 1020nnnnnnnnnn nnnnnnnnnn nnnacaggug
nugcauggnu gucgucagcu cgugnnguga 1080gnuguugggu uaagucccgn
aacgagcgca acccnnnnnn nnnguuncna ncnnnnnnnn 1140ngngnacucn
nnnnnnacug ccnnngnnaa nnnggaggaa ggnggggang acgucaanuc
1200nucaugnccc uuangnnnng ggcuncacac nuncuacaau ggnnnnnaca
nngngnngcn 1260annnngnnan nnnnagcnaa ncnnnnaaan nnnnucnnag
uncggaungn nnncugcaac 1320ucgnnnncnu gaagnnggan ucgcuaguaa
ucgnnnauca gnangnnncg gugaauacgu 1380ucncgggncu uguacacacc
gcccgucann ncangnnagn nnnnnnnncc nnaagnnnnn 1440nnnnnnncnn
nnnngnnnnn nnnnncnang gnnnnnnnnn nganugggnn naagucguaa
1500caagguancc nuannngaan nugnggnugg aucaccuccu un
154242904RNAArtificial SequenceSynthetic 4nnnnaagnnn nnaagngnnn
nngguggaug ccunggcnnn nnnagncgan gaaggangnn 60nnnnncnncn nnanncnnng
gnnagnngnn nnnnnncnnn nnanccnnng nunuccgaau 120ggggnaaccc
nnnnnnnnnn nnnnnnnnan nnnnnnnnnn nnnnnnnnnn nnnnnnngnn
180nacnnnnnga anugaaacau cunaguannn nnaggaanag aaannaannn
ngauuncnnn 240nguagnggcg agcgaannng nannagncnn nnnnnnnnnn
nnnnnnnnnn nnnannngaa 300nnnnnuggna agnnnnnnnn nannngguna
nannccngua nnnnaaannn nnnnnnnnnn 360nnnnnnnnnn aguannncnn
nncncgngnn annnngunng aannngnnnn gaccannnnn 420naagncuaaa
uacunnnnnn ngaccnauag ngnannagua cngugangga aaggngaaaa
480gnacccnnnn nangggagug aaanagnncc ugaaaccnnn nncnuanaan
nngunnnagn 540nnnnnnnnnn nnnuganngc gunccuuuug nannaugnnn
cngnganuun nnnunnnnng 600cnagnuuaan nnnnnnnngn agncgnagng
aaancgagun nnaanngngc gnnnagunnn 660nngnnnnaga cncgaancnn
ngugancuan nnaugnncag gnugaagnnn nnguaanann 720nnnuggaggn
ccgaacnnnn nnnnguugaa aannnnnngg augannugug nnungnggng
780aaanncnaan cnaacnnngn nauagcuggu ucucnncgaa annnnuuuag
gnnnngcnun 840nnnnnnnnnn nnnnggnggu agagcacugn nnnnnnnnng
gnnnnnnnnn nnnnuacnna 900nnnnnnnnaa acuncgaaun ccnnnnnnnn
nnnnnnnngn agnnanncnn ngngngnuaa 960nnuncnnngu nnanagggna
acancccaga ncnncnnnua aggncccnaa nnnnnnnnua 1020aguggnaaan
gangugnnnn nncnnanaca nnnaggangu uggcuuagaa gcagccancn
1080uunaaagann gcguaanagc ucacunnucn agnnnnnnng cgcngannau
nuancgggnc 1140uaannnnnnn nccgaannnn nngnnnnnnn nnnnnnnnnn
nnnnngguag nngagcgunn 1200nnnnnnnnnn ngaagnnnnn nngnnannnn
nnnuggannn nnnnnnagug ngnaugnngn 1260naunaguanc gannnnnnnn
gugananncn nnnncnccgn annncnaagg nuuccnnnnn 1320nangnunnuc
nnnnnngggu nagucgnnnc cuaagnngag ncnganangn nuagnngaug
1380gnnannnggu nnauauuccn nnacnnnnnn nnnnnnnnnn nnnnngacgn
nnnnngnnnn 1440nnnnnnnnnn nnnnggnnnn nnnnnnnnnn nnnnnnnnnn
nnnnnnnnnn nnnnnnnnnn 1500nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn
nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 1560nnnncnngaa aannnnnnnn
nnnnnnnnnn nnnnnnnnnc guaccnnaaa ccgacacagg 1620ungnnnngnn
gagnanncnn aggngnnngn nnnaannnnn nnnaaggaac unngcaaanu
1680nnnnccguan cuucggnana aggnnnncnn nnnnnnnnnn nnnnnnnnnn
nnnnnnnnnn 1740nnnnnnnnng nnnnannnan nngnnnnnnn cnacuguuua
nnaaaaacac agnncnnugc 1800naanncgnaa gnnganguau anggnnugac
nccugcccng ugcnngaagg uuaanngnnn 1860nnnnnngnnn nngnnnnnnn
nnnnannnaa gcccnnguna acggcggnng uaacuauaac 1920nnuccuaagg
uagcgaaauu ccuugucggg uaaguuccga ccngcacgaa nggngnaang
1980annnnnnnnc ugucucnnnn nnnnncncng ngaanuunna nunnnnguna
agaugcnnnn 2040uncncgcnnn nngacggaaa gaccccnngn ancuuuacun
nannnunnna nugnnnnnnn 2100nnnnnnnnug unnagnauag gunggagncn
nngannnnnn nncgnnagnn nnnnnggagn 2160cnnnnnugnn auacnacncu
nnnnnnnnnn nnnnucuaac nnnnnnnnnn nancnnnnnn 2220nnngacanug
nnngnngggn aguuunacug gggcggunnc cuccnaaann guaacggagg
2280ngnncnaagg unnncunann nnggnnggnn aucnnnnnnn nagunnaann
gnanaagnnn 2340gcnunacugn nagnnnnacn nnncgagcag nnncgaaagn
nggnnnuagu gauccggngg 2400unnnnnnugg aagngccnuc gcucaacgga
uaaaagnuac ncnggggaua acaggcunau 2460nnnncccaag aguncanauc
gacggnnnng uuuggcaccu cgaugucggc ucnucncauc 2520cuggggcugn
agnngguccc aagggunngg cuguucgccn nuuaaagngg nacgngagcu
2580ggguunanaa cgucgugaga caguungguc ccuaucngnn gngngngnnn
gannnuugan 2640nngnnnugnn cnuaguacga gaggaccggn nngnacnnan
cncuggugnn ncnguugunn 2700ngccannngc anngcngnnu agcuannunn
ggnnnngaua anngcugaan gcaucuaagn 2760nngaancnnn cnnnnagann
agnnnucncn nnnnnnnnnn nnnnnnnnna gnnncnnnnn 2820agannannnn
gungauaggn nngnnnugna agnnnngnna nnnnunnagn nnacnnnuac
2880uaaunnnncn nnnnncuunn nnnn 2904513DNAArtificial
SequenceSynthetic 5cgtggtgacc ctt 13614DNAArtificial
SequenceSynthetic 6cgtcgtcacc gcta 14713DNAArtificial
SequenceSynthetic 7cgtggtaccc ctt 13818DNAArtificial
SequenceSynthetic 8tcacgcgata gcattgcg 18921DNAArtificial
SequenceSynthetic 9tctcacggga gctctccatg c 211022DNAArtificial
SequenceSynthetic 10tgactcaccc atcaacaacc gc 221122DNAArtificial
SequenceSynthetic 11tgactcaccc atcaacaacc gc 221222DNAArtificial
SequenceSynthetic 12tgactcaccc atcaacaacc gc 221321DNAArtificial
SequenceSynthetic 13gaagcagatt tgggtaccac c 211426DNAArtificial
SequenceSynthetic 14tacggtcaat gctctgaaat ctgtgg
261528DNAArtificial SequenceSynthetic 15ttatgtaaaa tccattgtcg
catccacc 281627DNAArtificial SequenceSynthetic 16tcccattact
aaacccacac tcaacag 271727DNAArtificial SequenceSynthetic
17taaggccttc gatacgggat aatccta 271830DNAArtificial
SequenceSynthetic 18tactccaatg ctaaaactaa tcgtcccaac
301925DNAArtificial SequenceSynthetic 19tcctaggaat cacctcccat tccga
252025DNAArtificial SequenceSynthetic 20tggcagccta gcattagcag gaata
252125DNAArtificial SequenceSynthetic 21tcgctgacgc cataaaactc ttcac
252231DNAArtificial SequenceSynthetic 22taactaatac taacatctca
gacgctcagg a 312320DNAArtificial SequenceSynthetic 23tactcccacc
ctggagcctc 202425DNAArtificial SequenceSynthetic 24ttatcacctt
tcatgatcac gccct 252523DNAArtificial SequenceSynthetic 25tgtatgaatg
gctccacgag ggt 232625DNAArtificial SequenceSynthetic 26tggtttggca
gagatgtgtt taagt 252723DNAArtificial SequenceSynthetic 27tctgttaaaa
gtgcataccg cca 232821DNAArtificial SequenceSynthetic 28tgaggatggt
ggtcaaggga c 212922DNAArtificial SequenceSynthetic 29tggatttgac
tgtaatgtgc ta 223028DNAArtificial SequenceSynthetic 30tgaagggatt
tgactgtaat gtgctatg 283121DNAArtificial SequenceSynthetic
31gtgtgtgtgc tgggtaggat g 213224DNAArtificial SequenceSynthetic
32tggtaagaag tgggctaggg catt 243324DNAArtificial SequenceSynthetic
33tggtgatagc gcctaagcat agtg 243425DNAArtificial SequenceSynthetic
34tttcgtgcaa gaataggagg tggag 253520DNAArtificial SequenceSynthetic
35tagggtcgaa gccgcactcg 203625DNAArtificial SequenceSynthetic
36tgtgaggcgt attataccat agccg 253729DNAArtificial SequenceSynthetic
37tagaatctta gctttgggtg ctaatggtg 293824DNAArtificial
SequenceSynthetic 38tggctgaaca ttgtttgttg gtgt 243929DNAArtificial
SequenceSynthetic 39taagtaatgc tagggtgagt ggtaggaag
294025DNAArtificial SequenceSynthetic 40tttatgggct ttggtgaggg aggta
254130DNAArtificial SequenceSynthetic 41tgctcctatt gataggacat
agtggaagtg 304227DNAArtificial SequenceSynthetic 42tggcatttca
ctgtaaagag gtgttgg 274326DNAArtificial SequenceSynthetic
43tcggtaagca ttaggaatgc cattgc 264452DNAArtificial
SequenceSynthetic 44acataaaaac ccaatccaca tcaaaccccc ccccccatgc
ttacaagcaa gt 524553DNAArtificial SequenceSynthetic 45acataaaaac
ccaatccaca tcaaaccccc cccccccatg cttacaagca agt 534679DNAArtificial
SequenceSynthetic 46accacctgta gtacataaaa acccaatcca catcaaaccc
ccccccccct nggtttanaa 60gnangtnngg nantnancc 794780DNAArtificial
SequenceSynthetic 47gagggttgat tgctgtactt gcttgtaagc atgggggggg
gggntttaat ngnanttgnt 60ttttnnnncc cccccggggn 80
* * * * *