U.S. patent application number 14/047414 was published by the patent office on 2014-08-14 for methods for rapid identification and quantitation of nucleic acid variants.
This patent application is currently assigned to IBIS BIOSCIENCES, INC. The applicant listed for this patent is Nina M. Hofstadler. Invention is credited to David J. Ecker, Thomas A. Hall, Steven A. Hofstadler, Kristin Sannes-Lowery.
Application Number | 20140227704 14/047414 |
Document ID | / |
Family ID | 37667125 |
Publication Date | 2014-08-14 |
United States Patent Application | 20140227704
Kind Code | A1
Ecker; David J.; et al. | August 14, 2014
METHODS FOR RAPID IDENTIFICATION AND QUANTITATION OF NUCLEIC ACID VARIANTS
Abstract
There is a need for nucleic acid analysis which is both specific
and rapid, and in which no nucleic acid sequencing is required. The
present invention addresses this need, among others, by providing a
method of nucleic acid amplification of overlapping sub-segments of
a nucleic acid followed by molecular mass measurement of resulting
amplification products by mass spectrometry, and determination of
the base compositions of the amplification products.
Inventors: | Ecker; David J. (Encinitas, CA); Hofstadler; Steven A. (Vista, CA); Hall; Thomas A. (Oceanside, CA); Sannes-Lowery; Kristin (Irvine, CA)
Applicant:
Name | City | State | Country | Type
Hofstadler; Nina M. | Vista | CA | US |
Assignee: | IBIS BIOSCIENCES, INC. (Carlsbad, CA)
Family ID: | 37667125
Appl. No.: | 14/047414
Filed: | October 7, 2013
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number
12616422 | Nov 11, 2009 | 8551738
14047414 | |
Current U.S. Class: | 435/6.12
Current CPC Class: | C12Q 2600/156 20130101;
C12Q 2600/16 20130101; C12Q 1/6881 20130101; C12Q 1/6858 20130101;
C12Q 1/6858 20130101; C12Q 1/6827 20130101; C12Q 1/6872 20130101;
C12Q 1/6844 20130101; C12Q 1/6844 20130101; C12Q 2565/627 20130101;
C12Q 2531/113 20130101; C12Q 2545/10 20130101; C12Q 2537/143
20130101; C12Q 2565/627 20130101; C12Q 2537/143 20130101; C12Q
1/6827 20130101; C12Q 2531/113 20130101; C12Q 2565/627
20130101
Class at Publication: | 435/6.12
International Class: | C12Q 1/68 20060101 C12Q001/68
Claims
1. A method for analyzing a nucleic acid comprising the steps of:
(a) obtaining a sample comprising nucleic acid for base composition
analysis; (b) selecting at least two primer pairs that will
generate overlapping amplification products of at least two
sub-segments of the nucleic acid; (c) amplifying at least two
nucleic acid sequences of a region of the nucleic acid designated
as a target for base composition analysis using the at least two
primer pairs, thereby generating at least two overlapping
amplification products; (d) determining base compositions of the
amplification products by: (i) measuring molecular masses of one or
more of the amplification products generated in step (c) using a
mass spectrometer; and (ii) converting one or more of the measured
molecular masses to base compositions; (e) comparing one or more of
the base compositions with a source of reference base composition
data for the nucleic acid sequence; and (f) identifying the
presence of a particular nucleic acid sequence or variant
thereof.
2-36. (canceled)
Description
RELATED APPLICATIONS
[0001] The present application is a continuation of U.S.
application Ser. No. 11/491,376, filed Jul. 21, 2006, which claims
the benefit of priority to U.S. Provisional Application Ser. No.
60/701,404, filed Jul. 21, 2005; to U.S. Provisional Application
Ser. No. 60/771,101, filed Feb. 6, 2006; and to U.S. Provisional
Application Ser. No. 60/747,607 filed May 18, 2006. Each of the
above listed applications is incorporated herein by reference in
its entirety. Methods disclosed in U.S. application Ser. Nos.
10/156,608, 09/891,793, 10/418,514, 10/660,997, 10/660,122,
10/660,996, 10/660,998, 10/728,486, 10/405,756, 10/853,660,
11/060,135, 11/073,362 and 11/209,439, are commonly owned and
incorporated herein by reference in their entirety for any
purpose.
SEQUENCE LISTING
[0002] A paper copy of the sequence listing and a computer-readable
form of the sequence listing, on diskette, containing the file
named DIBIS0075US.C1SEQ.txt, which was created on Nov. 11, 2009,
are herein incorporated by reference.
FIELD OF THE INVENTION
[0003] The present invention relates generally to the field of
nucleic acid analysis and provides methods, compositions and kits
useful for this purpose when combined with mass spectrometry.
BACKGROUND OF THE INVENTION
[0004] Characterization of nucleic acid variants is a problem of
great importance in various fields of molecular biology such as,
for example, genotyping and identification of strains of bacteria
and viruses which are subject to evolutionary pressures via
mechanisms including mutation, natural selection, genetic drift and
recombination. Nucleic acid heterogeneity is a common feature of
RNA viruses, for example. Populations of RNA viruses often exhibit
high levels of heterogeneity due to mutations which enhance the
ability of the viruses to adapt to growth conditions. Mixed
populations of RNA virus quasispecies are known to exist in viral
vaccines. It would be advantageous to have a method for monitoring
the heterogeneity of viral vaccines. Likewise, new strains of
bacterial species are also known to evolve rapidly.
[0005] Characterization and quantitation of newly-evolving
bacteria and viruses such as the SARS coronavirus, for example, is
typically the first step in containment of an epidemic or
infectious disease outbreak. In addition to characterization of
naturally occurring variants of bacteria and viruses, there is a
need for characterization of genetically engineered bacterial or
viral bio-weapons in forensic or bio-warfare investigations.
Unfortunately, the process of sequencing entire bacterial or viral
genomes or vaccine vector sequences is time consuming and is not
effective at resolving mixtures of nucleic acid variants.
[0006] Mitochondrial DNA is found in eukaryotes and differs from
nuclear DNA in its location, its sequence, its quantity in the
cell, and its mode of inheritance. The nucleus of the human cell
contains two sets of 23 chromosomes, one paternal set and one
maternal set. However, cells may contain hundreds to thousands of
mitochondria, each of which may contain several copies of
mitochondrial DNA. Nuclear DNA has many more bases than
mitochondrial DNA, but mitochondrial DNA is present in many more
copies than nuclear DNA. This characteristic of mitochondrial DNA
is useful in situations where the amount of DNA in a sample is very
limited. Typical sources of DNA recovered from crime scenes include
hair, bones, teeth, and body fluids such as saliva, semen, and
blood.
[0007] In humans, mitochondrial DNA is inherited strictly from the
mother (Case, J. T. and Wallace, D. C., Somatic Cell Genetics, 1981,
7, 103-108; Giles, R. E. et al. Proc. Natl. Acad. Sci. 1980, 77,
6715-6719; Hutchison, C. A. et al. Nature, 1974, 251, 536-538).
Thus, the mitochondrial DNA sequences obtained from maternally
related individuals, such as a brother and a sister or a mother and
a daughter, will exactly match each other in the absence of a
mutation. This characteristic of mitochondrial DNA is advantageous
in missing persons cases as reference mitochondrial DNA samples can
be supplied by any maternal relative of the missing individual
(Ginther, C. et al. Nature Genetics, 1992, 2, 135-138; Holland, M.
M. et al. Journal of Forensic Sciences, 1993, 38, 542-553;
Stoneking, M. et al. American Journal of Human Genetics, 1991, 48,
370-382).
[0008] The human mitochondrial DNA genome is approximately 16,569
bases in length and has two general regions: the coding region and
the control region. The coding region is responsible for the
production of various biological molecules involved in the process
of energy production in the cell and includes about 37 genes (22
transfer RNAs, 2 ribosomal RNAs, and 13 peptides), with very little
intergenic sequence and no introns. The control region is
responsible for regulation of the mitochondrial DNA molecule. Two
regions of mitochondrial DNA within the control region have been
found to be highly polymorphic, or variable, within the human
population (Greenberg, B. D. et al. Gene, 1983, 21, 33-49). These
two regions are termed "hypervariable Region I" (HV1), which has
an approximate length of 342 base pairs (bp), and "hypervariable
Region II" (HV2), which has an approximate length of 268 bp.
Forensic mitochondrial DNA examinations are performed using these
two hypervariable regions because of the high degree of variability
found among individuals.
[0009] There exists a need for rapid identification of humans
wherein human remains and/or biological samples are analyzed. Such
remains or samples may be associated with war-related casualties,
aircraft crashes, and acts of terrorism, for example. Analysis of
mitochondrial DNA enables a rule-in/rule-out identification process
for persons for whom DNA profiles from a maternal relative are
available. Human identification by analysis of mitochondrial DNA
can also be applied to human remains and/or biological samples
obtained from crime scenes.
[0010] The process of human identification is a common objective of
forensics investigations. As used herein, "forensics" is the study
of evidence discovered at a crime or accident scene and used in a
court of law. "Forensic science" is any science used for the
purposes of the law, in particular the criminal justice system, and
therefore provides impartial scientific evidence for use in the
courts of law, and in a criminal investigation and trial. Forensic
science is a multidisciplinary subject, drawing principally from
chemistry and biology, but also from physics, geology, psychology
and social science, for example.
[0011] Forensic scientists generally use the two hypervariable
regions of human mitochondrial DNA for analysis. These
hypervariable regions, or portions thereof, provide only one
non-limiting example of a region of mitochondrial DNA useful for
identification analysis.
[0012] A typical mitochondrial DNA analysis begins when total
genomic and mitochondrial DNA is extracted from biological
material, such as a tooth, blood sample, or hair. The polymerase
chain reaction (PCR) is then used to amplify, or create many copies
of, the two hypervariable portions of the non-coding region of the
mitochondrial DNA molecule, using flanking primers. When adequate
amounts of PCR product are amplified to provide all the necessary
information about the two hypervariable regions, sequencing
reactions are performed. Where possible, the sequences of both
hypervariable regions are determined on both strands of the
double-stranded DNA molecule, with sufficient redundancy to confirm
the nucleotide substitutions that characterize that particular
sample. The entire process is then repeated with a known sample,
such as blood or saliva collected from a known individual. The
sequences from both samples are compared to determine if they
match. Finally, in the event of an inclusion or match, the
Scientific Working Group on DNA Analysis Methods (SWGDAM)
mitochondrial DNA database, which is maintained by the FBI, is
searched for the mitochondrial sequence that has been observed for
the samples. The analysts can then report the number of
observations of this type based on the nucleotide positions that
have been read. A written report can be provided to the submitting
agency. This process is described in more detail in M. M. Holland
and T. J. Parsons 1999, Forensic Science Review, volume 11, pages
25-51.
[0013] Approximately 610 bp of mitochondrial DNA are currently
sequenced in forensic mitochondrial DNA analysis. Recording and
comparing mitochondrial DNA sequences would be difficult and
potentially confusing if all of the bases were listed. Thus,
mitochondrial DNA sequence information is recorded by listing only
the differences with respect to a reference DNA sequence. By
convention, human mitochondrial DNA sequences are described using
the first complete published mitochondrial DNA sequence as a
reference (Anderson, S. et al., Nature, 1981, 290, 457-465). This
sequence is commonly referred to as the Anderson sequence. It is
also called the Cambridge reference sequence or the Oxford
sequence. Each base pair in this sequence is assigned a number.
Deviations from this reference sequence are recorded as the number
of the position demonstrating a difference and a letter designation
of the different base. For example, a transition from A to G at
position 263 would be recorded as 263 G. If deletions or insertions
of bases are present in the mitochondrial DNA, these differences
are denoted as well.
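By way of non-limiting illustration, the difference-based notation described above can be sketched in Python. The function name list_differences, the toy five-base sequences, and the numbering offset are assumptions introduced here for illustration only. The sketch assumes that the sample and reference are already aligned and of equal length, so insertions and deletions are not handled.

```python
def list_differences(reference: str, sample: str, start: int = 1) -> list[str]:
    """Record substitutions as '<position> <base>' relative to a reference.

    `start` is the number assigned to the first base of `reference`.
    Both sequences are assumed to be pre-aligned and of equal length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{start + i} {s}"
        for i, (r, s) in enumerate(zip(reference.upper(), sample.upper()))
        if r != s
    ]


# Hypothetical fragment whose first base is numbered 261; the third base
# changes from A to G, so the difference is recorded at position 263.
print(list_differences("CCACG", "CCGCG", start=261))  # ['263 G']
```

A sample identical to the reference yields an empty list, which corresponds to recording no differences.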
[0014] In the United States, there are seven laboratories currently
conducting forensic mitochondrial DNA examinations: the FBI
Laboratory; Laboratory Corporation of America (LabCorp) in Research
Triangle Park, N.C.; Mitotyping Technologies in State College, Pa.;
the Bode Technology Group (BTG) in Springfield, Va.; the Armed
Forces DNA Identification Laboratory (AFDIL) in Rockville, Md.;
BioSynthesis, Inc. in Lewisville, Tex.; and Reliagene in New
Orleans, Louisiana.
[0015] Mitochondrial DNA analyses have been admitted in criminal
proceedings from these laboratories in the following states as of
April 1999: Alabama, Arkansas, Florida, Indiana, Illinois,
Maryland, Michigan, New Mexico, North Carolina, Pennsylvania, South
Carolina, Tennessee, Texas, and Washington. Mitochondrial DNA has
also been admitted and used in criminal trials in Australia, the
United Kingdom, and several other European countries.
[0016] Since 1996, the number of individuals performing
mitochondrial DNA analysis at the FBI Laboratory has grown from 4
to 12, with more personnel expected in the near future. Over 150
mitochondrial DNA cases have been completed by the FBI Laboratory
as of March 1999, and dozens more await analysis. Forensic courses
are being taught by the FBI Laboratory personnel and other groups
to educate forensic scientists in the procedures and interpretation
of mitochondrial DNA sequencing. More and more individuals are
learning about the value of mitochondrial DNA sequencing for
obtaining useful information from evidentiary samples that are
small, degraded, or both. Mitochondrial DNA sequencing is becoming
known not only as an exclusionary tool but also as a complementary
technique for use with other human identification procedures.
Mitochondrial DNA analysis will continue to be a powerful tool for
law enforcement officials in the years to come as other
applications are developed, validated, and applied to forensic
evidence.
[0017] Presently, the forensic analysis of mitochondrial DNA is
rigorous and labor-intensive. Currently, only 1-2 cases per month
per analyst can be performed. Several molecular biological
techniques are combined to obtain a mitochondrial DNA sequence from
a sample. The steps of the mitochondrial DNA analysis process
include primary visual analysis, sample preparation, DNA
extraction, polymerase chain reaction (PCR) amplification,
post-amplification quantification of the DNA, automated DNA
sequencing, and data analysis. Another complicating factor in the
forensic analysis of mitochondrial DNA is the occurrence of
heteroplasmy wherein the pool of mitochondrial DNAs in a given cell
is heterogeneous due to mutations in individual mitochondrial DNAs.
There are different forms of heteroplasmy found in mitochondrial
DNA. For example, sequence heteroplasmy (also known as point
heteroplasmy) is the occurrence of more than one base at a
particular position or positions in the mitochondrial DNA sequence.
Length heteroplasmy is the occurrence of more than one length of a
stretch of the same base in a mitochondrial DNA sequence as a
result of insertion of nucleotide residues.
[0018] Heteroplasmy is a problem for forensic investigators since a
sample from a crime scene can differ from a sample from a suspect
by one base pair and this difference may be interpreted as
sufficient evidence to eliminate that individual as the suspect.
Hair samples from a single individual can contain heteroplasmic
mutations at vastly different concentrations and even the root and
shaft of a single hair can differ. The detection methods currently
available to molecular biologists cannot detect low levels of
heteroplasmy. Furthermore, if present, length heteroplasmy will
adversely affect sequencing runs by resulting in an out-of-frame
sequence that cannot be interpreted.
[0019] Mass spectrometry provides detailed information about the
molecules being analyzed, including high mass accuracy. It is also
a process that can be easily automated.
[0020] There is a need for a mitochondrial DNA forensic analysis
which is both specific and rapid, and in which no nucleic acid
sequencing is required. There is also a need for a method of rapid
characterization and quantitation of nucleic acids which have
variant positions relative to a reference sequence. These needs, as
well as others, are addressed herein below.
SUMMARY OF THE INVENTION
[0021] Described herein are compositions and methods for analyzing
a nucleic acid by performing the steps of obtaining a sample of
nucleic acid for base composition analysis; selecting at least two
primer pairs that will generate overlapping amplification products
of at least two sub-segments of the nucleic acid; amplifying at
least two nucleic acid sequences of a region of the nucleic acid
designated as a target for base composition analysis using the
primer pairs, thereby generating at least two overlapping
amplification products; obtaining base compositions of the
amplification products by measuring molecular masses of one or more
of the amplification products using a mass spectrometer; and
converting one or more of the measured molecular masses to base
compositions; comparing one or more of the base compositions with
one or more base compositions of reference sub-segments of a
reference sequence; and identifying the presence of a particular
nucleic acid sequence or variant thereof.
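As a hedged illustration of the step of converting a measured molecular mass to a base composition, the following Python sketch enumerates the counts of A, G, C and T whose calculated single-strand mass agrees with a measured mass within a tolerance. The residue masses are approximate average values, and the function names, the end_correction parameter (standing in for terminal-group mass), and the tolerance value are assumptions made for this sketch rather than values taken from the present disclosure.

```python
# Approximate average masses (Da) of DNA nucleotide residues within a strand.
RESIDUE_MASS = {"A": 313.21, "G": 329.21, "C": 289.18, "T": 304.20}


def strand_mass(composition, end_correction=0.0):
    """Calculated mass of a strand with the given base counts.

    `end_correction` is a placeholder for terminal-group mass (assumption).
    """
    return sum(RESIDUE_MASS[b] * n for b, n in composition.items()) + end_correction


def candidate_compositions(measured_mass, tolerance, end_correction=0.0):
    """List every {A, G, C, T} count set matching the mass within tolerance."""
    # The nearest-integer step for T below is only exhaustive when the
    # tolerance is well under half a residue mass.
    if not 0 <= tolerance < min(RESIDUE_MASS.values()) / 2:
        raise ValueError("tolerance must be small relative to a residue mass")
    m_a, m_g, m_c, m_t = (RESIDUE_MASS[b] for b in "AGCT")
    target = measured_mass - end_correction
    hits = []
    for a in range(int((target + tolerance) // m_a) + 1):
        rem_a = target - a * m_a
        for g in range(int((rem_a + tolerance) // m_g) + 1):
            rem_g = rem_a - g * m_g
            for c in range(int((rem_g + tolerance) // m_c) + 1):
                rem_c = rem_g - c * m_c
                t = round(rem_c / m_t)  # only the nearest T count can match
                if t >= 0 and abs(rem_c - t * m_t) <= tolerance:
                    hits.append({"A": a, "G": g, "C": c, "T": t})
    return hits


# Round trip on a made-up 18-base strand: its own composition is recovered.
known = {"A": 5, "G": 4, "C": 3, "T": 6}
hits = candidate_compositions(strand_mass(known), tolerance=0.05)
print(known in hits)  # True
```

A single mass may be consistent with more than one composition, so the returned list can contain several candidates; how such candidates are narrowed is outside the scope of this sketch.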
[0022] The nucleic acid analyzed is obtained from a human,
bacterium, virus, fungus, synthetic nucleic acid source,
recombinant nucleic acid source, or encodes a biological product
such as a vaccine, antibody or other biological product.
[0023] Further described herein are compositions and methods for
identifying a human by obtaining a sample comprising mitochondrial
DNA of the human for base composition analysis; selecting at least
two primer pairs that will generate overlapping amplification
products representing overlapping sub-segments of the mitochondrial
DNA; amplifying at least two nucleic acid sequences of a region of
the mitochondrial DNA designated as a target for base composition
analysis using the at least two primer pairs, thereby generating at
least two overlapping amplification products; obtaining base
compositions of the amplification products by measuring molecular
masses of one or more of the amplification products generated using
a mass spectrometer and converting one or more of the measured
molecular masses to base compositions; and comparing one or more of
the base compositions with one or more base compositions of
reference sub-segments of a reference sequence thereby identifying
the human.
[0024] Also described herein are compositions and methods for
characterizing heteroplasmy of mitochondrial DNA comprising the
steps of obtaining a sample comprising mitochondrial DNA for base
composition analysis; selecting at least two primer pairs that will
generate overlapping amplification products representing
sub-segments of the mitochondrial DNA; amplifying at least two
nucleic acid sequences of a region of the mitochondrial DNA
designated as a target for base composition analysis using the at
least two primer pairs, thereby generating at least two overlapping
amplification products; obtaining base compositions of the
amplification products by measuring molecular masses of one or more
of the amplification products using a mass spectrometer; and
converting one or more of the measured molecular masses to base
compositions; comparing one or more of the base compositions with
one or more base compositions of reference sub-segments of a
reference sequence; and identifying at least two distinct
amplification products with distinct base compositions obtained by
the same pair of primers, thereby characterizing the
heteroplasmy.
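The final step above, identifying at least two distinct base compositions obtained with the same pair of primers, can be sketched as a simple grouping operation. In this hedged Python example the function name and the base compositions are invented for illustration; only the primer pair numbers are borrowed from the description of FIG. 3. Each observation is assumed to report the composition of one designated strand as an (A, G, C, T) tuple.

```python
from collections import defaultdict


def heteroplasmic_primer_pairs(observations):
    """Return primer pairs that yielded two or more distinct base compositions.

    `observations` is an iterable of (primer_pair_number, (A, G, C, T)).
    """
    by_pair = defaultdict(set)
    for pair_number, composition in observations:
        by_pair[pair_number].add(tuple(composition))
    return {
        pair: sorted(comps)
        for pair, comps in by_pair.items()
        if len(comps) >= 2
    }


# Made-up compositions: primer pair 2896 yields three distinct products.
observations = [
    (2904, (30, 25, 28, 32)),
    (2896, (31, 22, 27, 35)),
    (2896, (31, 23, 27, 34)),
    (2896, (30, 23, 28, 34)),
    (2913, (29, 26, 30, 31)),
]
print(sorted(heteroplasmic_primer_pairs(observations)))  # [2896]
```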
[0025] Also disclosed are primer pair compositions and kits
comprising the same which are useful for obtaining amplification
products used in genotyping organisms.
BRIEF DESCRIPTION OF THE DRAWINGS
[0026] FIG. 1 is a schematic diagram of the definition of
sub-segments of a reference sequence for amplification. Arrows
indicate the position of primer hybridization for obtaining an
amplification product corresponding to a sub-segment. For example,
FWD-A indicates the hybridization position of the forward primer
for obtaining an amplification product corresponding to Sub-segment
A, while REV-A indicates the hybridization position of the reverse
primer for obtaining an amplification product corresponding to
sub-segment A. Overlap of sub-segment A, which has a length of
120 nucleobases (bp), with sub-segment B is shown on the left
side.
[0027] FIG. 2 is a mass spectrum of three amplification products of a
sample of mitochondrial DNA displaying six peaks corresponding to
the individual strands of each of the three amplification products,
each corresponding to sub-segments of the target mitochondrial DNA.
Peaks labeled A and B are from a single amplification product of
the HV1 region obtained with primer pair number 2892 (SEQ ID NOs:
4:29). Peaks labeled C and D are from a single amplification
product of the HV1 region obtained with primer pair number 2901
(SEQ ID NOs: 12:37). Peaks labeled E and F are from a single
amplification product of the HV2 region obtained with primer pair
number 2906 (SEQ ID NOs: 17:42).
[0028] FIG. 3 represents a refinement of peaks from a mass spectrum
of a sample mitochondrial DNA displaying six peak lines
corresponding to the individual strands of each of the three
amplification products. Detection of heteroplasmy in one of the
amplified regions is indicated. Peaks labeled A and B are from a
single amplification product of the HV1 region obtained with
primer pair number 2904 (SEQ ID NOs: 15:40). Peaks labeled C and D
are from a single amplification product of the HV1 region obtained
with primer pair number 2896 (SEQ ID NO: 8:33). Peaks labeled C'
and D' are from a single amplification product of the HV1 region
obtained with primer pair number 2896 which represents one
heteroplasmic variant of the amplification product represented by
peaks C and D. Peaks labeled C'' and D'' are from a single
amplification product of the HV1 region obtained with primer pair
number 2896 which represents another heteroplasmic variant of the
amplification product represented by peaks C and D. Peaks labeled E
and F are from a single amplification product of the HV2 region
obtained with primer pair number 2913 (SEQ ID NO: 22:47).
[0029] FIG. 4 is an illustration of the names and chromosome
locations for the CODIS 13 markers, as well as for the AMEL markers
on the X and Y chromosomes. The CODIS 13 short tandem repeats are
commonly used by law enforcement for determining the source
identity for a given nucleic acid.
DEFINITIONS
[0030] A number of terms and phrases are defined below:
[0031] As described herein, nucleic acids are analyzed to generate
a base composition profile. Nucleic acids include, but are not
limited to, human mitochondrial DNA, human chromosomal DNA,
bacterial genomic DNA, fungal DNA, viral DNA, viral RNA, and
commercially available plasmids, vectors, or vaccines. The nucleic
acids are referred to as having regions, which are defined as
portions of the nucleic acid that are known or suspected to comprise
genetic sequence differences that allow for the characterization of
the nucleic acid. By use of the term "characterization" it is meant
that the source of the nucleic acid can be identified (e.g.,
genetic identification of a human, identification of a
recombination event in a plasmid, diagnosis of a human genetic
disposition towards a disease or trait, HN typing of influenza
virus strains). Part or all of a region may form the target for
analysis using the disclosed material and methods. Alternatively,
an entire nucleic acid can be analyzed, which is typically more
useful when there are not defined regions for characterization.
Thus, the whole nucleic acid will be referred to herein as a region
and a target. Within a target there are sub-segments. Sub-segments
are the portions of nucleic acid that are flanked by primers to
generate individual amplified products or amplicons. These
sub-segments preferably overlap.
[0032] As used herein, "Mitochondrial DNA" refers to a circular
ring of DNA which is separate from chromosomal DNA and contained as
multiple copies within mitochondria. Mitochondrial DNA is often
abbreviated as "mtDNA" and will be recognized as such by one with
ordinary skill in the arts of mitochondrial DNA analysis. In a
preferred embodiment, the objective is to identify a human. Nucleic
acid is obtained from a human cell, such as a blood cell, hair
cell, skin cell or any other human cell appropriate for obtaining
nucleic acid. In some embodiments, the nucleic acid is
mitochondrial DNA. In some embodiments, certain portions of
mitochondrial DNA are appropriate for base composition analysis
such as, for example, HV1 and HV2.
[0033] As used herein, the term "HV1" refers to a region within
mitochondrial DNA known as "hypervariable region 1." With respect
to the reference Anderson/Cambridge mitochondrial DNA sequence, the
HV1 region is represented by coordinates 15924 . . . 16428. This
region is useful for identification of humans because it has a high
degree of variability among different human individuals. In some
embodiments, a defined portion of the HV1 region is analyzed by
base composition analysis of "sub-segments" of the defined portion.
In this embodiment, the defined portion of HV1 represents the
"target." In preferred embodiments, the entire HV1 region
(coordinates 15924 . . . 16428) is divided into overlapping
sub-segments. In this embodiment, the entire HV1 region represents
the "target."
[0034] As used herein, the term "HV2" refers to a region within
mitochondrial DNA known as "hypervariable region 2." With respect
to the reference Anderson/Cambridge mitochondrial DNA sequence, the
HV2 region is represented by coordinates 31 . . . 576. As for HV1,
the HV2 region is useful for identification of humans because it
also has a high degree of variability among different human
individuals. In some embodiments, a defined portion of the HV2
region is analyzed by base composition analysis of "sub-segments"
of the defined portion. In this embodiment, the defined portion of
HV2 represents the "target." In preferred embodiments, the entire
HV2 region (coordinates 31 . . . 576) is divided into overlapping
sub-segments. In this embodiment, the entire HV2 region represents
the "target."
[0035] In other embodiments, additional target regions within the
mitochondrial DNA may be chosen for base composition analysis.
[0036] As used herein, the term "target" generally refers to a
nucleic acid sequence to be detected or characterized. Thus, the
"target" is sought to be sorted out from other nucleic acid
sequences.
[0037] As used herein, "sub-segments" are portions of a given
target which are of useful size for base composition analysis. In
some embodiments, the sizes of sub-segments range between about 45
to about 150 nucleobases in length. In preferred embodiments, the
"sub-segments" overlap with each other and cover the entire target
as shown in FIG. 1. Amplification products representing the
sub-segments are obtained by amplification methods, such as PCR,
that are well known to those with ordinary skill in molecular
biology techniques. The amplification products representing the
sub-segments are analyzed by mass spectrometry to determine their
molecular masses and base compositions of the amplification
products are calculated from the molecular masses. The
experimentally-determined base compositions are then compared with
base compositions of "reference sub-segments" of a "reference
nucleic acid" whose sequence and/or base composition is known. In
preferred embodiments a database containing base compositions of
reference nucleic acids and sub-segments thereof is used for
comparison with the experimentally-determined base compositions. A
match of one or more experimentally-determined base compositions of
one or more sub-segments with one or more base compositions of
reference sub-segments will provide the identity of the human.
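The division of a target into overlapping sub-segments that cover the entire target can be illustrated with a short Python sketch. The function name tile_subsegments and the 20-nucleobase overlap are assumptions chosen for illustration; the 120-nucleobase length follows the example of FIG. 1 and the coordinates are those given above for HV1. Primer design and primer binding sites are not modeled.

```python
def tile_subsegments(start, end, length=120, overlap=20):
    """Return inclusive (start, end) coordinates of overlapping sub-segments.

    Consecutive sub-segments share `overlap` positions and together cover
    every position from `start` through `end`. The last sub-segment is
    truncated at `end` and may therefore be shorter than `length`.
    """
    if start > end:
        raise ValueError("start must not exceed end")
    if not 0 <= overlap < length:
        raise ValueError("overlap must be non-negative and less than length")
    step = length - overlap
    segments = []
    pos = start
    while True:
        seg_end = min(pos + length - 1, end)
        segments.append((pos, seg_end))
        if seg_end == end:
            break
        pos += step
    return segments


# Tile the HV1 coordinates 15924..16428 into 120-base sub-segments.
segments = tile_subsegments(15924, 16428)
print(segments[0], segments[-1])  # (15924, 16043) (16324, 16428)
```

With these assumed parameters the 505-position target is covered by five sub-segments, the last of which is 105 nucleobases long. A practical design would also check that a truncated final sub-segment still falls within the useful size range of about 45 to about 150 nucleobases.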
[0038] The same definitions of the terms "target," "sub-segment,"
"reference sub-segment" and "reference nucleic acid" are applicable
to other preferred embodiments where base composition analysis is
used to identify a human by analysis of specific human chromosomal
target regions such as CODIS markers for example. FIG. 4 is an
illustration of the names and chromosome locations for the CODIS 13
markers, as well as for the AMEL markers on the X and Y
chromosomes.
[0039] The same definitions of the terms "target," "sub-segment,"
"reference sub-segment" and "reference nucleic acid" are applicable
to other preferred embodiments where base composition analysis is
used to identify or characterize a genotype of a microorganism such
as a bacterium, virus, or fungus for example. Characterization of
genotypes of microorganisms is useful in infectious disease
diagnostics for example. In these embodiments, a given target may
represent the entire genome of a microorganism or a portion
thereof. The target is analyzed by characterization of
amplification products representing sub-segments of the target.
[0040] The same definitions of the terms "target," "sub-segment,"
"reference sub-segment" and "reference nucleic acid" are applicable
to other preferred embodiments where base composition analysis is
used to validate a "test nucleic acid" with respect to a reference
nucleic acid. Validation of test nucleic acids is desirable in
quality control of pharmaceutical production such as in production
of vectors carrying genes encoding therapeutic proteins such as
vaccines for example. In this embodiment, the "test nucleic acid"
is expected to be identical in sequence and base composition to the
reference nucleic acid. Comparison of experimentally determined
base compositions of amplification products representing
sub-segments of the target with base compositions of reference
sub-segments may either indicate that the base compositions are
identical, thereby validating the test nucleic acid, or identify a
variant of the reference nucleic acid.
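The comparison described above can be sketched, under stated assumptions, as follows. The function names, the sub-segment labels, and the toy sequences are hypothetical. The sketch assumes that the reference sub-segment sequences are known and that each experimentally determined base composition is available as an (A, G, C, T) tuple.

```python
from collections import Counter


def base_composition(sequence):
    """Count A, G, C and T in a sequence and return them as a tuple."""
    counts = Counter(sequence.upper())
    return tuple(counts[base] for base in "AGCT")


def find_variant_subsegments(reference_subsegments, measured):
    """Compare measured compositions with reference sub-segments.

    Returns {name: (expected, observed)} for every sub-segment whose
    measured composition is missing or differs from the reference. An
    empty result means all compositions matched.
    """
    mismatches = {}
    for name, sequence in reference_subsegments.items():
        expected = base_composition(sequence)
        observed = measured.get(name)
        if observed != expected:
            mismatches[name] = (expected, observed)
    return mismatches


# Toy data: sub-segment "B" of the test nucleic acid carries a T-to-C change.
reference = {"A": "ACGTAC", "B": "GGTTAC"}
measured = {"A": (2, 1, 2, 1), "B": (1, 2, 2, 1)}
print(find_variant_subsegments(reference, measured))
# {'B': ((1, 2, 1, 2), (1, 2, 2, 1))}
```

Because a base composition records counts rather than order, a rearrangement that leaves the counts of every sub-segment unchanged would not be flagged by a comparison of this kind.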
[0041] "Amplification" is a special case of nucleic acid
replication involving template specificity. It is to be contrasted
with non-specific template replication (i.e., replication that is
template-dependent but not dependent on a specific template).
Template specificity is here distinguished from fidelity of
replication (i.e., synthesis of the proper polynucleotide sequence)
and nucleotide (ribo- or deoxyribo-) specificity. Template
specificity is frequently described in terms of "target"
specificity.
[0042] Template or target specificity is achieved in most
amplification techniques by the choice of enzyme. Amplification
enzymes are enzymes that, under the conditions in which they are
used, will process only specific sequences of nucleic acid in a
heterogeneous mixture of nucleic acid. For example, in the case of Qβ
replicase, MDV-1 RNA is the specific template for the replicase (D.
L. Kacian et al., Proc. Natl. Acad. Sci. USA 69:3038 [1972]). Other
nucleic acid will not be replicated by this amplification enzyme.
Similarly, in the case of T7 RNA polymerase, this amplification
enzyme has a stringent specificity for its own promoters
(Chamberlin et al., Nature 228:227 [1970]). In the case of T4 DNA
ligase, the enzyme will not ligate the two oligonucleotides or
polynucleotides, where there is a mismatch between the
oligonucleotide or polynucleotide substrate and the template at the
ligation junction (D. Y. Wu and R. B. Wallace, Genomics 4:560
[1989]). Finally, Taq and Pfu polymerases, by virtue of their
ability to function at high temperature, are found to display high
specificity for the sequences bounded and thus defined by the
primers; the high temperature results in thermodynamic conditions
that favor primer hybridization with the target sequences and not
hybridization with non-target sequences (H. A. Erlich (ed.), PCR
Technology, Stockton Press [1989]).
[0043] As used herein, the term "sample template" refers to nucleic
acid originating from a sample that is analyzed for the presence of
"target" (defined below). In contrast, "background template" is
used in reference to nucleic acid other than sample template that
may or may not be present in a sample. Background template is most
often inadvertent. It may be the result of carryover, or it may be
due to the presence of nucleic acid contaminants sought to be
purified away from the sample. For example, nucleic acids from
organisms other than those to be detected may be present as
background in a test sample.
[0044] As used herein, the term "primer" refers to an
oligonucleotide, whether occurring naturally, such as a purified
fragment from a restriction digest, or produced synthetically,
which is capable of acting as a point of initiation of synthesis
when placed under conditions in which synthesis of a primer
extension product which is complementary to a nucleic acid strand
is induced, (i.e., in the presence of nucleotides and an inducing
agent such as DNA polymerase and at a suitable temperature and pH).
The primer is preferably single stranded for maximum efficiency in
amplification, but may alternatively be double stranded. If double
stranded, the primer is first treated to separate its strands
before being used to prepare extension products. Preferably, the
primer is an oligodeoxyribonucleotide. Preferably, the primer is
sufficiently long to prime the synthesis of extension products in
the presence of the inducing agent. The exact lengths of the
primers will depend on many factors, including temperature, source
of primer and the use of the method. The primers can be any useful
length. Lengths of about 13 to about 35 nucleobases are preferred.
One with ordinary skill in the art of molecular biology can design
primers appropriate for amplification methods.
[0045] As used herein, a "pair of primers" or "a primer pair" is
used for amplification of a nucleic acid sequence. A pair of
primers comprises a forward primer and a reverse primer. The
forward primer hybridizes to a sense strand of a target gene
sequence to be amplified and primes synthesis of an antisense
strand (complementary to the sense strand) using the target
sequence as a template. A reverse primer hybridizes to the
antisense strand of a target gene sequence to be amplified and
primes synthesis of a sense strand (complementary to the antisense
strand) using the target sequence as a template.
[0046] As used herein, the term "polymerase chain reaction" ("PCR")
refers to the method of K. B. Mullis U.S. Pat. Nos. 4,683,195,
4,683,202, and 4,965,188, hereby incorporated by reference, that
describe a method for increasing the concentration of a segment of
a target sequence in a mixture of genomic DNA without cloning or
purification. This process for amplifying the target sequence
consists of introducing a large excess of two oligonucleotide
primers to the DNA mixture containing the desired target sequence,
followed by a precise sequence of thermal cycling in the presence
of a DNA polymerase. The two primers are complementary to their
respective strands of the double stranded target sequence. To
effect amplification, the mixture is denatured and the primers then
annealed to their complementary sequences within the target
molecule. Following annealing, the primers are extended with a
polymerase so as to form a new pair of complementary strands. The
steps of denaturation, primer annealing, and polymerase extension
can be repeated many times (i.e., denaturation, annealing and
extension constitute one "cycle"; there can be numerous "cycles")
to obtain a high concentration of an amplified segment of the
desired target sequence. The length of the amplified segment of the
desired target sequence is determined by the relative positions of
the primers with respect to each other, and therefore, this length
is a controllable parameter. By virtue of the repeating aspect of
the process, the method is referred to as the "polymerase chain
reaction" (hereinafter "PCR"). Because the desired amplified
segments of the target sequence become the predominant sequences
(in terms of concentration) in the mixture, they are said to be
"PCR amplified."
[0047] With PCR, it is possible to amplify a single copy of a
specific target sequence in genomic DNA to a level detectable by
several different methodologies (e.g., hybridization with a labeled
probe; incorporation of biotinylated primers followed by
avidin-enzyme conjugate detection; incorporation of
.sup.32P-labeled deoxynucleotide triphosphates, such as dCTP or
dATP, into the amplified segment). In addition to genomic DNA, any
oligonucleotide or polynucleotide sequence can be amplified with
the appropriate set of primer molecules. In particular, the
amplified segments created by the PCR process itself are,
themselves, efficient templates for subsequent PCR
amplifications.
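By way of illustration only, the effect of repeated cycling can be modeled
with an idealized calculation in which each cycle multiplies the number of
copies of the amplified segment by a fixed factor. The following Python
sketch is not part of the disclosed methods; the function name
`copies_after_cycles` and the `efficiency` parameter are hypothetical, and
the model ignores reagent depletion and plateau effects.

```python
def copies_after_cycles(
    initial_copies: float, cycles: int, efficiency: float = 1.0
) -> float:
    """Idealized PCR yield.

    Each cycle multiplies the copy number by (1 + efficiency), where an
    efficiency of 1.0 corresponds to perfect doubling per cycle.
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


# A single starting copy after 30 idealized doubling cycles.
yield_30 = copies_after_cycles(1, 30)
```

Under the perfect-doubling assumption, a single copy yields 2 to the power
of 30 copies after 30 cycles; real reactions generally fall short of this
figure.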
[0048] As used herein, the terms "PCR product," "PCR fragment," and
"amplification product" refer to the nucleic acid product obtained
after two or more cycles of the PCR steps of denaturation,
annealing and extension are complete. These terms encompass the
case where there has been amplification of one or more segments of
one or more target sequences.
[0049] As used herein, the term "amplification reagents" refers to
those reagents (deoxyribonucleotide triphosphates, buffer, etc.),
needed for amplification except for primers, nucleic acid template,
and the amplification enzyme. Typically, amplification reagents
along with other reaction components are placed and contained in a
reaction vessel (test tube, microwell, etc.).
[0050] As used herein, the terms "complementary" or
"complementarity" are used in reference to polynucleotides (i.e., a
sequence of nucleotides such as an oligonucleotide or a target
nucleic acid) related by the base-pairing rules. For example, the
sequence "5'-A-G-T-3'" is complementary to the sequence
"3'-T-C-A-5'." Complementarity may be "partial," in which only some
of the nucleic acids' bases are matched according to the base
pairing rules. Or, there may be "complete" or "total"
complementarity between the nucleic acids. The degree of
complementarity between nucleic acid strands has significant
effects on the efficiency and strength of hybridization between
nucleic acid strands. This is of particular importance in
amplification reactions, as well as detection methods which depend
upon binding between nucleic acids. Either term may also be used in
reference to individual nucleotides, especially within the context
of polynucleotides. For example, a particular nucleotide within an
oligonucleotide may be noted for its complementarity, or lack
thereof, to a nucleotide within another nucleic acid strand, in
contrast or comparison to the complementarity between the rest of
the oligonucleotide and the nucleic acid strand.
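The base-pairing rules and the notion of partial complementarity can be
illustrated, by way of example only, with the following Python sketch. The
function names are hypothetical, only the four standard DNA bases are
handled, and the fraction computed is a simple position-by-position count
rather than a measure of hybridization strength.

```python
# Watson-Crick pairing for the four standard DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_3_to_5(seq_5_to_3: str) -> str:
    """Complement each base; the result reads 3'->5' so that it aligns
    position by position under the 5'->3' input."""
    return "".join(COMPLEMENT[base] for base in seq_5_to_3.upper())


def reverse_complement(seq_5_to_3: str) -> str:
    """Complementary strand written in the conventional 5'->3' direction."""
    return complement_3_to_5(seq_5_to_3)[::-1]


def fraction_complementary(strand_5_to_3: str, other_3_to_5: str) -> float:
    """Fraction of aligned positions that obey the base-pairing rules."""
    a, b = strand_5_to_3.upper(), other_3_to_5.upper()
    if not a or len(a) != len(b):
        raise ValueError("strands must be non-empty and of equal length")
    matches = sum(COMPLEMENT[x] == y for x, y in zip(a, b))
    return matches / len(a)
```

Applied to the example above, `complement_3_to_5("AGT")` gives the
3'-T-C-A-5' strand, and a value of `fraction_complementary` below 1.0
corresponds to "partial" complementarity.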
[0051] The terms "homology," "homologous" and "sequence identity"
refer to a degree of identity. There may be partial homology or
complete homology. A partially homologous sequence is one that is
less than 100% identical to another sequence. Determination of
sequence identity is described in the following example: a primer
20 nucleobases in length which is otherwise identical to another 20
nucleobase primer but having two non-identical residues has 18 of
20 identical residues (18/20=0.9 or 90% sequence identity). In
another example, a primer 15 nucleobases in length having all
residues identical to a 15 nucleobase segment of primer 20
nucleobases in length would have 15/20=0.75 or 75% sequence
identity with the 20 nucleobase primer. In context of the present
invention, sequence identity is meant to be properly determined
when the query sequence and the subject sequence are both described
in the 5' to 3' direction.
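The two worked examples above can be reproduced, for illustration only,
with the following Python sketch. It assumes an ungapped comparison in which
the shorter sequence is slid along the longer one and the best match count
is divided by the length of the longer sequence; this is one reading of the
examples, not a definition drawn from the disclosure, and the function name
and primer sequences are hypothetical.

```python
def percent_identity(query: str, subject: str) -> float:
    """Ungapped percent identity, both sequences written 5'->3'.

    The shorter sequence is compared against every same-length window of
    the longer one; the best match count is divided by the longer length.
    """
    query, subject = query.upper(), subject.upper()
    if len(query) > len(subject):
        query, subject = subject, query
    if not query:
        raise ValueError("sequences must be non-empty")
    best = 0
    for offset in range(len(subject) - len(query) + 1):
        window = subject[offset:offset + len(query)]
        matches = sum(q == s for q, s in zip(query, window))
        best = max(best, matches)
    return 100.0 * best / len(subject)


# Hypothetical 20-nucleobase primer and two comparison primers.
primer_20 = "ACGTACGTACGTACGTACGT"
two_mismatches = "ACGTACGTTCGTACGTACGA"  # differs at two positions
segment_15 = primer_20[3:18]             # identical 15-nucleobase segment

identity_a = percent_identity(two_mismatches, primer_20)
identity_b = percent_identity(segment_15, primer_20)
```

Here `identity_a` corresponds to the 18/20 example and `identity_b` to the
15/20 example.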
[0052] As used herein, the term "hybridization" is used in
reference to the pairing of complementary nucleic acids.
Hybridization and the strength of hybridization (i.e., the strength
of the association between the nucleic acids) is influenced by such
factors as the degree of complementarity between the nucleic acids,
stringency of the conditions involved, and the T.sub.m of the
formed hybrid. "Hybridization" methods involve the annealing of one
nucleic acid to another, complementary nucleic acid, i.e., a
nucleic acid having a complementary nucleotide sequence. The
ability of two polymers of nucleic acid containing complementary
sequences to find each other and anneal through base pairing
interaction is a well-recognized phenomenon. The initial
observations of the "hybridization" process by Marmur and Lane,
Proc. Natl. Acad. Sci. USA 46:453 (1960) and Doty et al., Proc.
Natl. Acad. Sci. USA 46:461 (1960) have been followed by the
refinement of this process into an essential tool of modern
biology.
[0053] The complement of a nucleic acid sequence as used herein
refers to an oligonucleotide which, when aligned with the nucleic
acid sequence such that the 5' end of one sequence is paired with
the 3' end of the other, is in "antiparallel association." Certain
bases not commonly found in natural nucleic acids may be included
in the nucleic acids of the present invention and include, for
example, inosine and 7-deazaguanine. Complementarity need not be
perfect; stable duplexes may contain mismatched base pairs or
unmatched bases. Those skilled in the art of nucleic acid
technology can determine duplex stability empirically considering a
number of variables including, for example, the length of the
oligonucleotide, base composition and sequence of the
oligonucleotide, ionic strength and incidence of mismatched base
pairs.
[0054] As used herein, the term "T.sub.m" is used in reference to
the "melting temperature." The melting temperature is the
temperature at which a population of double-stranded nucleic acid
molecules becomes half dissociated into single strands. Several
equations for calculating the T.sub.m of nucleic acids are well
known in the art. As indicated by standard references, a simple
estimate of the T.sub.m value may be calculated by the equation:
T.sub.m=81.5+0.41(% G+C), when a nucleic acid is in aqueous
solution at 1 M NaCl (see e.g., Anderson and Young, Quantitative
Filter Hybridization, in Nucleic Acid Hybridization (1985)). Other
references (e.g., Allawi, H. T. & SantaLucia, J., Jr.
Thermodynamics and NMR of internal G·T mismatches in DNA.
Biochemistry 36, 10581-94 (1997)) include more sophisticated
computations which take structural and environmental, as well as
sequence characteristics into account for the calculation of
T.sub.m.
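The simple estimate given above, T.sub.m=81.5+0.41(% G+C), can be
evaluated as in the following Python sketch, which is offered for
illustration only. The function names are hypothetical, and the sketch
implements only the simple estimate stated for 1 M NaCl, not the more
sophisticated computations of the other cited references.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C nucleobases in the sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must be non-empty")
    gc_count = sum(base in "GC" for base in seq)
    return 100.0 * gc_count / len(seq)


def simple_tm(seq: str) -> float:
    """Simple T_m estimate in degrees C: 81.5 + 0.41 * (%G+C)."""
    return 81.5 + 0.41 * gc_percent(seq)
```

Because the estimate depends only on the G+C percentage, two sequences with
the same base composition receive the same value regardless of the order of
their nucleobases.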
[0055] The term "gene" refers to a DNA sequence that comprises
control and coding sequences necessary for the production of an RNA
having a non-coding function (e.g., a ribosomal or transfer RNA), a
polypeptide or a precursor. The RNA or polypeptide can be encoded
by a full length coding sequence or by any portion of the coding
sequence so long as the desired activity or function is
retained.
[0056] The term "wild-type" refers to a gene or a gene product that
has the characteristics of that gene or gene product when isolated
from a naturally occurring source. A wild-type gene is that which
is most frequently observed in a population and is thus arbitrarily
designated the "normal" or "wild-type" form of the gene. In
contrast, the term "modified", "mutant" or "polymorphic" refers to
a gene or gene product which displays modifications in sequence
and/or functional properties (i.e., altered characteristics) when
compared to the wild-type gene or gene product. It is noted that
naturally-occurring mutants can be isolated; these are identified
by the fact that they have altered characteristics when compared to
the wild-type gene or gene product.
[0057] The term "oligonucleotide" as used herein is defined as a
molecule comprising two or more deoxyribonucleotides or
ribonucleotides, preferably at least 5 nucleotides, more preferably
at least about 13 to 35 nucleotides. The exact size will depend on
many factors, which in turn depend on the ultimate function or use
of the oligonucleotide. The oligonucleotide may be generated in any
manner, including chemical synthesis, DNA replication, reverse
transcription, PCR, or a combination thereof.
[0058] Because mononucleotides are reacted to make oligonucleotides
in a manner such that the 5' phosphate of one mononucleotide
pentose ring is attached to the 3' oxygen of its neighbor in one
direction via a phosphodiester linkage, an end of an
oligonucleotide is referred to as the "5'-end" if its 5' phosphate
is not linked to the 3' oxygen of a mononucleotide pentose ring and
as the "3'-end" if its 3' oxygen is not linked to a 5' phosphate of
a subsequent mononucleotide pentose ring. As used herein, a nucleic
acid sequence, even if internal to a larger oligonucleotide, also
may be said to have 5' and 3' ends. A first region along a nucleic
acid strand is said to be upstream of another region if the 3' end
of the first region is before the 5' end of the second region when
moving along a strand of nucleic acid in a 5' to 3' direction. All
oligonucleotide primers disclosed herein are understood to be
presented in the 5' to 3' direction when reading left to right.
[0059] When two different, non-overlapping oligonucleotides anneal
to different regions of the same linear complementary nucleic acid
sequence, and the 3' end of one oligonucleotide points towards the
5' end of the other, the former may be called the "upstream"
oligonucleotide and the latter the "downstream" oligonucleotide.
Similarly, when two overlapping oligonucleotides are hybridized to
the same linear complementary nucleic acid sequence, with the first
oligonucleotide positioned such that its 5' end is upstream of the
5' end of the second oligonucleotide, and the 3' end of the first
oligonucleotide is upstream of the 3' end of the second
oligonucleotide, the first oligonucleotide may be called the
"upstream" oligonucleotide and the second oligonucleotide may be
called the "downstream" oligonucleotide.
[0060] The term "primer" refers to an oligonucleotide that is
capable of acting as a point of initiation of synthesis when placed
under conditions in which primer extension is initiated. An
oligonucleotide "primer" may occur naturally, as in a purified
restriction digest or may be produced synthetically. A primer is
selected to be "substantially" complementary to a strand of
specific sequence of the template. A primer must be sufficiently
complementary to hybridize with a template strand for primer
elongation to occur. A primer sequence need not reflect the exact
sequence of the template. For example, a non-complementary
nucleotide fragment may be attached to the 5' end of the primer,
with the remainder of the primer sequence being substantially
complementary to the strand. Non-complementary bases or longer
sequences can be interspersed into the primer, provided that the
primer sequence has sufficient complementarity with the sequence of
the template to hybridize and thereby form a template primer
complex for synthesis of the extension product of the primer.
[0061] The term "target nucleic acid" refers to a nucleic acid
molecule containing a sequence that has at least partial
complementarity with an oligonucleotide primer. The target nucleic
acid may comprise single- or double-stranded DNA or RNA.
[0062] The term "variable sequence" as used herein refers to
differences in nucleic acid sequence between two nucleic acids. For
example, the same gene of two different bacterial species may vary
in sequence by the presence of single base substitutions and/or
deletions or insertions of one or more nucleotides. These two forms
of the structural gene are said to vary in sequence from one
another.
[0063] The term "nucleotide analog" as used herein refers to
modified or non-naturally occurring nucleotides such as 5-propynyl
pyrimidines (i.e., 5-propynyl-dTTP and 5-propynyl-dCTP), 7-deaza
purines (i.e., 7-deaza-dATP and 7-deaza-dGTP). Nucleotide analogs
include base analogs and comprise modified forms of
deoxyribonucleotides as well as ribonucleotides.
[0064] The term "microorganism" as used herein means an organism
too small to be observed with the unaided eye and includes, but is
not limited to, bacteria, viruses, protozoans, fungi, and
ciliates.
[0065] The term "microbial gene sequences" refers to gene sequences
derived from a microorganism.
[0066] The term "bacteria" or "bacterium" refers to any member of
the groups of eubacteria and archaebacteria.
[0067] The term "virus" refers to obligate, ultramicroscopic,
intracellular parasites incapable of autonomous replication (i.e.,
replication requires the use of the host cell's machinery).
[0068] The term "sample" in the present specification and claims is
used in its broadest sense. On the one hand it is meant to include
a specimen or culture (e.g., microbiological cultures). On the
other hand, it is meant to include both biological and
environmental samples. A sample may include a specimen of synthetic
origin.
[0069] Biological samples may be animal, including human, fluid,
solid (e.g., stool) or tissue, as well as liquid and solid food and
feed products and ingredients such as dairy items, vegetables, meat
and meat by-products, and waste. Biological samples may be obtained
from all of the various families of domestic animals, as well as
feral or wild animals, including, but not limited to, such animals
as ungulates, bear, fish, lagomorphs, rodents, etc.
[0070] Environmental samples include environmental material such as
surface matter, soil, water and industrial samples, as well as
samples obtained from food and dairy processing instruments,
apparatus, equipment, utensils, disposable and non-disposable
items. These examples are not to be construed as limiting the
sample types applicable to the present invention.
[0071] The term "source of target nucleic acid" refers to any
sample that contains nucleic acids (RNA or DNA). Particularly
preferred sources of target nucleic acids are biological samples
including, but not limited to, blood, saliva, cerebrospinal fluid,
pleural fluid, milk, lymph, sputum and semen. The source of nucleic
acid may also be an organism such as a human, animal, bacterium,
virus or fungus for example.
[0072] The term "polymerization means" or "polymerization agent"
refers to any agent capable of facilitating the addition of
nucleoside triphosphates to an oligonucleotide. Preferred
polymerization means comprise DNA and RNA polymerases.
[0073] The term "adduct" is used herein in its broadest sense to
indicate any compound or element that can be added to an
oligonucleotide. An adduct may be charged (positively or
negatively) or may be charge-neutral. An adduct may be added to the
oligonucleotide via covalent or non-covalent linkages. Examples of
adducts include, but are not limited to, indodicarbocyanine dye
amidites, amino-substituted nucleotides, ethidium bromide, ethidium
homodimer, (1,3-propanediamino)propidium,
(diethylenetriamino)propidium, thiazole orange,
(N-N'-tetramethyl-1,3-propanediamino)propyl thiazole orange,
(N-N'-tetramethyl-1,2-ethanediamino)propyl thiazole orange,
thiazole orange-thiazole orange homodimer (TOTO), thiazole
orange-thiazole blue heterodimer (TOTAB), thiazole orange-ethidium
heterodimer 1 (TOED1), thiazole orange-ethidium heterodimer 2
(TOED2) and fluorescein-ethidium heterodimer (FED), psoralens,
biotin, streptavidin, avidin, etc.
[0074] Where a first oligonucleotide is complementary to a region
of a target nucleic acid and a second oligonucleotide has
complementarity to the same region (or a portion of this region), a
"region of overlap" exists along the target nucleic acid. The
degree of overlap will vary depending upon the nature of the
complementarity.
[0075] As used herein, the term "purified" or "to purify" refers to
the removal of contaminants from a sample.
[0076] As used herein the term "portion" when in reference to a
protein (as in "a portion of a given protein") refers to fragments
of that protein. The fragments may range in size from four amino
acid residues to the entire amino acid sequence minus one amino
acid (e.g., 4, 5, 6, . . . , n-1).
[0077] The term "nucleic acid" or "nucleic acid sequence" as used
herein refers to an oligonucleotide, nucleotide or polynucleotide,
and fragments or portions thereof, and to DNA or RNA of genomic or
synthetic origin which may be single or double stranded, and
represent the sense or antisense strand. Similarly, "amino acid
sequence" as used herein refers to peptide or protein sequence.
[0078] The term "peptide nucleic acid" ("PNA") as used herein
refers to a molecule comprising bases or base analogs such as would
be found in natural nucleic acid, but attached to a peptide
backbone rather than the sugar-phosphate backbone typical of
nucleic acids. The attachment of the bases to the peptide is such
as to allow the bases to base pair with complementary bases of
nucleic acid in a manner similar to that of an oligonucleotide.
These small molecules, also designated anti-gene agents, stop
transcript elongation by binding to their complementary strand of
nucleic acid (Nielsen, et al. Anticancer Drug Des. 8:53-63
[1993]).
[0079] The term "locked nucleic acid" ("LNA") as used herein, refers
to a conformationally restricted nucleic acid analogue, in which
the ribose ring is locked into a rigid C3'-endo (or Northern-type)
conformation by a simple 2'-O, 4'-C methylene bridge. Duplexes
involving LNA (hybridized to either DNA or RNA) display a large
increase in melting temperatures of between +3.0 and +9.3.degree. C.
per LNA modification, in comparison to corresponding unmodified
reference duplexes. LNA recognizes both DNA and RNA with remarkable
affinities and selectivities. Incorporation of a given number of
LNA monomers into oligonucleotides is a very convenient way of
vastly improving the stability and specificity of duplexes toward
complementary RNA or DNA such as, for example, primer binding
regions.
[0080] As used herein, the terms "purified" or "substantially
purified" refer to molecules, either nucleic or amino acid
sequences, that are removed from their natural environment,
isolated or separated, and are at least 60% free, preferably 75%
free, and most preferably 90% free from other components with which
they are naturally associated. An "isolated polynucleotide" or
"isolated oligonucleotide" is therefore a substantially purified
polynucleotide.
[0081] The term "duplex" refers to the state of nucleic acids in
which the base portions of the nucleotides on one strand are bound
through hydrogen bonding to their complementary bases arrayed on a
second strand. The condition of being in a duplex form reflects on
the state of the bases of a nucleic acid. By virtue of base
pairing, the strands of nucleic acid also generally assume the
tertiary structure of a double helix, having a major and a minor
groove. The assumption of the helical form is implicit in the act
of becoming duplexed.
[0082] The term "template" refers to a strand of nucleic acid on
which a complementary copy is built from nucleoside triphosphates
through the activity of a template-dependent nucleic acid
polymerase. Within a duplex the template strand is, by convention,
depicted and described as the "bottom" strand. Similarly, the
non-template strand is often depicted and described as the "top"
strand.
[0083] The term "template-dependent RNA polymerase" refers to a
nucleic acid polymerase that creates new RNA strands through the
copying of a template strand as described above and which does not
synthesize RNA in the absence of a template. This is in contrast to
the activity of the template-independent nucleic acid polymerases
that synthesize or extend nucleic acids without reference to a
template, such as terminal deoxynucleotidyl transferase, or Poly A
polymerase.
[0084] The term "in silico" when used in relation to a process
indicates that the process is simulated on or embedded in a
computer.
[0085] The term "priming region" refers to a region on a target
nucleic acid sequence to which a primer hybridizes for the purpose
of extension of the complementary strand of the target nucleic acid
sequence.
[0086] The term "non-templated T residue" as used herein refers to
a thymidine (T) residue added to the 5' end of a primer which does
not necessarily hybridize to the target nucleic acid being
amplified.
[0087] The term "genotype" as used herein refers to at least a
portion of the genetic makeup of an individual. A portion of a
genome can be sufficient for assignment of a genotype to an
individual provided that the portion of the genome contains a
representative sequence or base composition to distinguish the
genotype from other genotypes.
[0088] The term "nucleobase" as used herein is synonymous with
other terms in use in the art including "nucleotide,"
"deoxynucleotide," "nucleotide residue," "deoxynucleotide residue,"
"nucleotide triphosphate (NTP)," or "deoxynucleotide triphosphate
(dNTP)."
[0089] As defined herein, "base composition" refers to the numbers
of each of the four standard nucleobases that are present within a
given standard sequence or corresponding amplification product of a
standard, test or variant sequence. Methods including steps of
measuring base compositions are disclosed and claimed in commonly
owned published U.S. Patent Application Nos: 20030124556,
20030082539, 20040209260, 20040219517, and 20040180328 and U.S.
Ser. Nos. 10/728,486, 10/829,826, 10/660,998, 10/853,660,
60/604,329, 60/632,862, 60/639,068, 60/648,188, Ser. Nos.
11/060,135, 11/073,362, and 60/658,248, each of which is
incorporated herein by reference in entirety.
[0090] As used herein, the term "base composition analysis" refers
to determination of the base composition of an amplification
product representing a sub-segment of a target nucleic acid
sequence from the molecular mass of the amplification product
determined by mass spectrometry. In embodiments of the present
invention, base composition analysis may include determination of
base compositions of two or more amplification products
representing overlapping sub-segments of a nucleic acid sequence
which are to be compared with the defined base compositions of the
corresponding overlapping sub-segments of one or more reference
nucleic acids.
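The relationship between a measured molecular mass and a base composition
can be sketched, for illustration only, as an enumeration of the
compositions whose calculated mass falls within a tolerance of the
measurement. The following Python sketch is not the disclosed procedure:
the residue masses are approximate average values, the end-group correction
is a commonly quoted approximation for a single strand lacking a 5'
phosphate, and the function names, the known-length assumption and the
tolerance are all hypothetical.

```python
from typing import Dict, List

# Approximate average masses (Da) of in-chain deoxynucleotide residues.
# ASSUMPTION: rounded textbook values used for illustration only.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}
# ASSUMPTION: approximate end-group correction for an unphosphorylated strand.
END_GROUP_CORRECTION = -61.96


def strand_mass(composition: Dict[str, int]) -> float:
    """Calculated average mass of a single strand with this composition."""
    total = sum(RESIDUE_MASS[base] * n for base, n in composition.items())
    return total + END_GROUP_CORRECTION


def candidate_compositions(
    measured_mass: float, length: int, tolerance: float = 0.5
) -> List[Dict[str, int]]:
    """List every base composition of the given length whose calculated
    mass lies within `tolerance` Da of the measured mass."""
    hits = []
    for a in range(length + 1):
        for c in range(length + 1 - a):
            for g in range(length + 1 - a - c):
                t = length - a - c - g
                comp = {"A": a, "C": c, "G": g, "T": t}
                if abs(strand_mass(comp) - measured_mass) <= tolerance:
                    hits.append(comp)
    return hits
```

More than one composition may fall within the tolerance; the sketch simply
returns all of them and leaves any further disambiguation to the caller.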
[0091] As used herein, the term "reference nucleic acid" or
"reference nucleic acid segment" is a characterized nucleic acid of
known sequence and/or known base composition. A reference nucleic
acid segment is compared with uncharacterized sequences in various
embodiments of the present invention. For example, a characterized
vector or portion thereof can be used as a reference nucleic acid
segment. A characterized portion of human nucleic acid may also be
used as a reference nucleic acid provided the genotype, identity or
race of the human from which the reference nucleic acid is obtained
is known. A genome or a portion thereof of a bacterium, virus or
fungus may also be employed as a reference nucleic acid provided
that the species or genotype of the bacterium, virus or fungus is
known.
[0092] As used herein, the term "reference base composition" refers
to a characterized base composition. For example, a sub-segment of
a reference nucleic acid having the defined sequence AAAAATTTTCCCGG
has a standard base composition of A.sub.5 T.sub.4 C.sub.3
G.sub.2.
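The example above can be reproduced, for illustration only, by counting the
nucleobases of the defined sequence. In the following Python sketch the
function names `base_composition` and `format_composition` are hypothetical,
and only the four standard nucleobases are accepted.

```python
from collections import Counter
from typing import Dict


def base_composition(seq: str) -> Dict[str, int]:
    """Count each of the four standard nucleobases in a sequence."""
    counts = Counter(seq.upper())
    unexpected = set(counts) - set("ACGT")
    if unexpected:
        raise ValueError(f"non-standard nucleobases: {sorted(unexpected)}")
    # Report in the order A, T, C, G used in the example above.
    return {base: counts.get(base, 0) for base in "ATCG"}


def format_composition(composition: Dict[str, int]) -> str:
    """Render a composition in the form 'A5 T4 C3 G2'."""
    return " ".join(f"{base}{n}" for base, n in composition.items())


reference_composition = base_composition("AAAAATTTTCCCGG")
label = format_composition(reference_composition)
```

Because a base composition records counts only, any rearrangement of the
same nucleobases yields the same composition.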
[0093] As used herein, the term "test nucleic acid sequence" refers
to an uncharacterized nucleic acid sequence whose base composition
is to be characterized and compared with one or more standard
nucleic acid segments.
[0094] As used herein, the term "overlap" or "overlapping sub-segments"
refers to sub-segments of a standard nucleic acid segment which
have overlap as illustrated by the following example which employs
a standard nucleic acid segment of length of 300 nucleobases. A
first sub-segment may, for example, extend from position 1 to
position 100. A second sub-segment may, for example, extend from
position 60 to position 160, having overlap from position 60 to
position 100. A third sub-segment may, for example, extend from
position 120 to position 220, having overlap from position 120 to
position 160. A fourth sub-segment may, for example, extend from
position 180 to position 280, having overlap from position 180 to
position 220. Producing sub-segments with overlap is useful because
it provides redundancy and reduces the likelihood that sub-segments
containing variants relative to a given standard sub-segment will
be mischaracterized. If a primer used to amplify a given
sub-segment hybridizes to a position with a mutation relative to
the reference sequence, the amplification product will not contain
the mutation because the primer extension product is used as a
subsequent template in subsequent amplification cycles. Thus,
having overlap of two sub-segments wherein overlap of the second
sub-segment over the first sub-segment extends past the reverse
primer hybridization site of the first sub-segment eliminates the
possibility that the reverse primer for the first sub-segment will
mask a given mutation within the first sub-segment reverse primer
hybridization site. The extent of minimal overlap should be
determined by the length of the primer hybridization site of a
given sub-segment. Generally, overlap of sub-segments by several
nucleobases is appropriate but shorter overlap lengths may also be
appropriate provided the primer hybridization sites are
correspondingly shorter. The avoidance of overlap of primer hybridization sites
on overlapping sub-segments is preferred.
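The overlap arithmetic of the 300-nucleobase example above can be checked,
for illustration only, with the following Python sketch. The function name
`consecutive_overlaps` and the tuple representation of a sub-segment as
inclusive, 1-based (start, end) positions are hypothetical conventions
chosen for the sketch.

```python
from typing import List, Optional, Tuple

# A sub-segment as inclusive, 1-based (start, end) positions.
Segment = Tuple[int, int]


def consecutive_overlaps(segments: List[Segment]) -> List[Optional[Segment]]:
    """For each pair of neighboring sub-segments (ordered by start),
    return the overlapping region, or None if they do not overlap."""
    ordered = sorted(segments)
    overlaps: List[Optional[Segment]] = []
    for (s1, e1), (s2, e2) in zip(ordered, ordered[1:]):
        start, end = max(s1, s2), min(e1, e2)
        overlaps.append((start, end) if start <= end else None)
    return overlaps


# The four sub-segments of the example above.
segments = [(1, 100), (60, 160), (120, 220), (180, 280)]
overlaps = consecutive_overlaps(segments)
```

A `None` entry in the result would indicate a gap between neighboring
sub-segments, that is, a region lacking the redundancy described above.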
[0095] As used herein, the term "co-amplification" or
"co-amplified" refers to the process of obtaining more than one
amplification product in the same amplification reaction mixture
using the same pair of primers.
[0096] As used herein, the term "vector" refers to a nucleic acid
adapted for transfection into a host cell. Examples of vectors
include, but are not limited to, plasmids, cosmids, bacteriophages
and the like.
[0097] As used herein, the term "therapeutic protein" refers to any
protein product produced by biotechnological methods for use as a
therapeutic product. Examples of therapeutic proteins include, but
are not limited to protein products such as vaccines, antibodies,
structural proteins, hormones, and cell signaling proteins such as
receptors, cytokines and the like.
[0098] As used herein, the term "recombinant" refers to having been
created by genetic engineering. For example, a "recombinant insert"
refers to a nucleic acid segment inserted into another nucleic acid
sequence using techniques well known to those with ordinary skill
in the arts of genetic engineering and molecular biology.
[0099] A "nucleic acid variant" is herein defined as a nucleic acid
having substantial similarity or sequence identity with a
"standard" nucleic acid sequence, for example, between about 70% and
up to, but not including, 100% sequence identity.
[0100] As used herein, a "triplex combination of primer pairs"
refers to three primer pairs which are to be included in an
amplification mixture for the purpose of obtaining three distinct
amplification products from a given target nucleic acid.
DESCRIPTION OF EMBODIMENTS
[0101] Provided herein are compositions and methods for determining
the presence of a nucleic acid variant or a genotype relative to a
known and defined "reference" nucleic acid sequence. Identification
of a distinct genotype in certain embodiments is satisfied by
identification of a distinct base composition of a given
sub-segment of a target nucleic acid.
[0102] In the methods described herein where the genotype, and in
turn the identity, of a nucleic acid sample is determined, the
nucleic acid is measured to deliver a base composition profile.
That measured base composition profile is then compared to a
reference base composition profile that is further associated with
an identity. The comparison can be a head-to-head comparison or a
comparison against a standard reference database. In both the
head-to-head comparison and the standard reference database
comparison, the unknown sample is analyzed using the disclosed
compositions and methods to generate a measured base composition
profile. For the head-to-head comparison, the reference base
composition profile is generated by similarly analyzing samples
from a selected suspect population using the disclosed compositions
and methods. The measured base composition is then compared to the
reference base compositions and if a match occurs between the
unknown and a suspect, then the identity is determined. In the
standard reference database comparison the measured base
composition is compared to a pre-existing database of reference
base compositions. This database can be populated using standard
reference nucleic acids, previously measured base compositions, and
data converted to base compositions. For example, and without
limitation, a standard reference nucleic acid can include
commercially available vectors like pUC, the certified values for
CODIS 13 loci (SRM 2391b available from the National Institute of
Standards and Technology) and the Anderson mitochondrial DNA
sequence. Converted data can include, but is not limited to,
previously obtained sequence data, such as the reference data that
is stored in the SWGDAM database that is bioinformatically
converted to base composition data.
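The bioinformatic conversion of stored sequence data to base composition data amounts to counting residues. The following is a minimal sketch in Python and not the conversion actually used to populate any database; the function name base_composition and the output format (modeled on the "A47 G18 C25 T30" notation of Table 4) are assumptions made for illustration. The input is the forward primer sequence of primer pair number 2889 from Table 1, used only as a convenient short sequence.

```python
from collections import Counter

def base_composition(sequence: str) -> str:
    """Return the base composition of a DNA sequence as 'A# G# C# T#'."""
    counts = Counter(sequence.upper())
    unexpected = set(counts) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected residues: {sorted(unexpected)}")
    # Counter returns 0 for residues that do not occur in the sequence.
    return " ".join(f"{base}{counts[base]}" for base in "AGCT")

# Forward primer of primer pair 2889 (SEQ ID NO: 1), used only as sample input.
print(base_composition("TCTCGTCCCCATGGATGACC"))  # A3 G4 C8 T5
```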
[0103] Also provided herein are compositions and methods for
identifying a human by comparison of base compositions of
amplification products representing overlapping sub-segments of a
target nucleic acid with base compositions of reference
sub-segments of one or more reference nucleic acids.
[0104] Amplification products of portions of the target nucleic
acid which correspond to the sub-segments are produced and their
molecular masses are measured by mass spectrometry. Base
compositions of the amplification products are calculated from
their molecular masses and the base compositions are compared with
the base compositions of the corresponding sub-segments of the
reference nucleic acid. A given target region can have any length
depending upon the type of analysis to be conducted and in
recognition of the numbers of primer pairs required to obtain
amplification products representing overlapping sub-segments of the
target. If a bacterium with a large genome is to be analyzed, and
the target is the entire genome, a target nucleic acid may have a
length of several kilobases. Alternatively, a target region may
have a length of about 300 to about 1000 nucleobases.
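The calculation of base compositions from a measured molecular mass can be pictured as a search over residue counts. The sketch below is an illustration under stated assumptions and not the processing used in the disclosed methods: the residue masses are approximate average values for nucleotides within a DNA strand, the end-group correction assumes a strand with 5'-hydroxyl and 3'-hydroxyl termini, the strand length is assumed to be known, and the names strand_mass and candidate_compositions are hypothetical.

```python
# Approximate average residue masses (Da) of nucleotides within a DNA strand (assumed values).
RESIDUE_MASS = {"A": 313.21, "G": 329.21, "C": 289.18, "T": 304.20}
END_GROUPS = -61.96  # assumed correction for a strand with 5'-OH and 3'-OH ends

def strand_mass(a, g, c, t):
    """Approximate mass of a single strand with the given residue counts."""
    return (a * RESIDUE_MASS["A"] + g * RESIDUE_MASS["G"]
            + c * RESIDUE_MASS["C"] + t * RESIDUE_MASS["T"] + END_GROUPS)

def candidate_compositions(measured_mass, length, tolerance_da):
    """All (A, G, C, T) counts of the given length within tolerance of the mass."""
    hits = []
    for a in range(length + 1):
        for g in range(length + 1 - a):
            for c in range(length + 1 - a - g):
                t = length - a - g - c
                if abs(strand_mass(a, g, c, t) - measured_mass) <= tolerance_da:
                    hits.append((a, g, c, t))
    return hits

# Simulate a measurement from the composition A26 G15 C21 T27 listed in Table 4.
true_counts = (26, 15, 21, 27)
mass = strand_mass(*true_counts)
hits = candidate_compositions(mass, sum(true_counts), tolerance_da=0.5)
print(true_counts in hits, len(hits))
```

Several compositions may fall within the tolerance of a single strand mass, so a search of this kind returns candidates rather than a unique answer.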
[0105] In some embodiments, the nucleic acid variant has a sequence
identical to the standard sequence with the exception of having one
or more single nucleotide polymorphisms, insertions or
deletions.
[0106] In some embodiments, the reference nucleic acid and variant
nucleic acid are each either single stranded or double stranded DNA or
RNA. In some embodiments, the standard and variant nucleic acids
originate from the genome of a bacterium or a virus or are
synthesized nucleic acids such as PCR products, for example.
[0107] A set of sub-segments within the reference nucleic acid
sequence is defined. In some embodiments, the members of the set of
standard sub-segments are from about 45 to about 150 nucleobases in
length. One will recognize that this includes standard sub-segments
of lengths of 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57,
58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74,
75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91,
92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106,
107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119,
120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132,
133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145,
146, 147, 148, 149, or 150 nucleobases in length.
[0108] In some embodiments, the molecular masses of the test
amplification products are determined by mass spectrometry such as
electrospray Fourier transform ion cyclotron resonance (FTICR) mass
spectrometry or electrospray time-of-flight mass spectrometry. The
use of electrospray mass spectrometry permits the measurement of
large amplification products, as large as 500 nucleobases in
length, whereas amplification products analyzed by matrix-assisted
laser desorption ionization mass spectrometry are typically much
smaller in length (approximately 15 nucleobases in length).
[0109] If desired, the length of the standard segments can be
chosen such that some members of the set have calculated molecular
masses that are dissimilar from other members of the set. Having
standard segments of dissimilar molecular masses allows for
multiplexing or pooling of amplification products corresponding to
the standard segments prior to molecular mass determination, by
mass spectrometry for example. As is illustrated in FIGS. 2 and 3,
the resultant amplification products from a reaction using the at
least two primer pairs are sufficiently separated along the charge
axis of the mass spectrometry plot. This separation is preferred,
but not necessary, because the individually measured amplicon
strands can be easily visualized.
[0110] In some embodiments, the compositions and methods are used
for genotyping of a suspected variant of a known species of
bacterium or virus. The base compositions of the test amplification
products, if different from the base composition of the standard
segments, provide the means for identification of a previously
known variant, or for characterization of a previously unobserved
variant.
[0111] In some embodiments, the compositions and methods are used
for identification and characterization of genetically engineered
bacteria or viruses. Genetically engineered organisms are produced
by insertion or deletion of genes. These modifications are readily
detectable by the methods of the present invention.
[0112] In some embodiments, the compositions and methods can be
used for validation of reference nucleic acid sequences such as
those encoding therapeutic proteins including but not limited to
vaccines and biological drugs such as monoclonal antibodies for
example. A nucleic acid is "validated" by base composition analysis
according to the method of the present invention, wherein the
result indicates that the analyzed nucleic acid and/or sub-segments
thereof have the same base compositions as the reference nucleic
acid. The process of "validation" confirms that polymorphisms have
not been introduced into the target sequence relative to the
reference sequence.
[0113] In some embodiments, a known quantity of the standard
sequence is included in the sample (as an internal calibration
standard) containing the suspected variant and the quantity of the
variant is determined from the abundance data obtained from mass
spectrometry for example. Methods of using internal calibration
standards in base composition analyses are described in commonly
owned U.S. application Ser. No. 11/059,776 which is incorporated
herein by reference in entirety.
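The arithmetic behind quantitation with an internal calibration standard can be sketched as follows. This is a simplified illustration that assumes the mass spectrometry abundance is proportional to the amount of each amplification product and that the standard and the variant amplify with equal efficiency; the function name and the numbers are hypothetical, and the calibration method actually used is the one described in the application incorporated by reference.

```python
def variant_quantity(standard_quantity, standard_abundance, variant_abundance):
    """Estimate variant quantity from its abundance relative to a known standard."""
    if standard_abundance <= 0:
        raise ValueError("standard abundance must be positive")
    return standard_quantity * variant_abundance / standard_abundance

# Hypothetical numbers: 100 copies of calibrant, peak abundances in arbitrary units.
print(variant_quantity(100, standard_abundance=2000.0, variant_abundance=500.0))  # 25.0
```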
[0114] In some embodiments, the compositions and methods are used
for characterization of heterogeneity of a standard nucleic acid
test sample. For example, the standard nucleic acid test sample can
be a vaccine vector having a standard sequence. The present
invention can be used to identify a variant of said standard
sequence and also determine the quantity of the variant relative to
the standard sequence. Such an analysis is advantageous, for
example, in situations requiring rapid throughput analysis for
quality control. The methods described herein will be able to
determine if the quantity of a variant sub-population increases to
the point wherein quality of the product is compromised.
[0115] In some embodiments, the compositions and methods are used
for identification of a genotype of a given organism. This can be
accomplished by first selecting a series of primer pairs for
amplification of consecutive or overlapping segments of a standard
nucleic acid region found across known genotypes of a given
organism. The process continues by amplifying a test nucleic acid
of an organism of unknown genotype with the series of primer pairs
to obtain a corresponding series of amplification products, at
least some of which are then measured by mass spectrometry. Base
compositions of the amplification products are then calculated from
the molecular masses. These base compositions are compared with
measured or calculated amplification product base compositions
representing amplification products of known genotypes of a given
organism obtained with the same series of primers. One or more
matches of known and unknown base compositions provide the genotype
of the organism.
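The comparison of measured base compositions with those of known genotypes can be sketched as a lookup keyed by primer pair number. The following Python fragment is illustrative only; the genotype names, the second reference profile, and the function match_genotypes are hypothetical, and ranking by the number of matching sub-segments is one possible scoring choice rather than the method of the disclosure.

```python
def match_genotypes(measured, reference_db):
    """Return the reference genotypes sharing the most base compositions with the sample."""
    scores = {}
    for name, profile in reference_db.items():
        shared = set(measured) & set(profile)  # primer pairs present in both profiles
        scores[name] = sum(1 for pair in shared if measured[pair] == profile[pair])
    best = max(scores.values(), default=0)
    return sorted(name for name, score in scores.items() if score == best and best > 0)

# Hypothetical reference profiles keyed by primer pair number.
reference_db = {
    "genotype_1": {2889: "A21 G17 C36 T21", 2890: "A20 G14 C30 T21"},
    "genotype_2": {2889: "A21 G17 C36 T21", 2890: "A20 G14 C31 T20"},
}
measured = {2889: "A21 G17 C36 T21", 2890: "A20 G14 C31 T20"}
print(match_genotypes(measured, reference_db))  # ['genotype_2']
```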
[0116] Preferably, at least some or all of the amplification
products have a range of lengths between about 45 to about 150
nucleobases. However, and depending on the mass spectrometer
instrument used, the amplification products analyzed by mass
spectrometry can be as large as about 500 nucleobases. Moreover,
very large amplification products can be digested into smaller
fragments that are compatible with the mass spectrometer used.
Methods of base composition analysis are described in commonly
owned U.S. patent application Ser. Nos. 10/660,998, 10/853,660, and
11/209,439, each of which is incorporated herein by reference in
entirety.
[0117] In some embodiments, the amplification is effected using the
polymerase chain reaction (PCR). In some embodiments, the PCR
reaction is performed with an extension cycle having a length of
one second. The one second extension cycle is shorter than an
ordinary extension cycle and is employed for the purpose of
minimization of artifact amplification products arising from target
site crossover.
[0118] In some embodiments, the organism of unknown genotype is a
human individual. In some embodiments, obtaining a genotypic result
for a human individual provides the means to draw a forensic
conclusion with regard to the individual, for example, to conclude
with a very high probability that the individual has had contact
with another individual or was present at a particular
location.
[0119] In some embodiments with applications in human forensics, a
given forensic nucleic acid sample may be characterized by base
composition analysis that includes comparison with members of a
database of tens, hundreds or even thousands of reference nucleic
acid segments obtained from individuals of known identity or racial
profile, or with standard references like the Anderson
mitochondrial DNA sequence. Such a database can be stored on or
embedded in a computer-readable medium and accessed over a network
such as the internet for example. Preferably the database comprises
base compositions of individual sub-segments of the reference
nucleic acids.
[0120] In some embodiments, the nucleic acid being amplified for a
genotyping analysis is mitochondrial DNA. In other embodiments, the
nucleic acid is chromosomal DNA.
[0121] In some embodiments, the mitochondrial DNA being amplified
for a genotyping analysis is from one or both of the highly
variable regions HV1 or HV2.
[0122] In some embodiments, the DNA region being analyzed is 300 to
700 nucleobases in length. In other embodiments, the DNA region
being analyzed is 400 to 600 nucleobases in length or any length
therewithin.
[0123] In some embodiments, the amplifying step of the method is
carried out in the presence of a dNTP containing a molecular
mass-modifying tag. In some embodiments, only one of the four
canonical dNTPs has the molecular mass-modifying tag. In some
embodiments, the dNTP containing the molecular mass-modifying tag
is 2'-deoxy-guanosine-5'-triphosphate, which has the greatest mass
of the four canonical dNTPs. In other embodiments, any of the other
three canonical dNTPs can contain the molecular mass-modifying tag.
In some embodiments, the tag comprises a minor isotope of carbon or
nitrogen. In some embodiments, the isotope of the molecular
mass-modifying tag is ¹³C or ¹⁵N. The advantage of
employing the latter mass-modifying tags is that the dNTP structure
is not altered and thus, efficiency of the amplification process
should be retained.
[0124] In some embodiments, the 3' end residue of each primer
hybridizes to a conserved nucleic acid residue of the target
nucleic acid wherein the conserved nucleic acid residue is
conserved among different genotypes. In other embodiments, the
final two 3' end residues of each primer hybridize to conserved
nucleic acid residues of the target nucleic acid wherein the
conserved nucleic acid residues are conserved among different
genotypes. In other embodiments, the final three 3' end residues of
each primer hybridize to conserved nucleic acid residues of the
target nucleic acid wherein the conserved nucleic acid residues are
conserved among different genotypes.
[0125] In some embodiments, multiplexing amplification reactions
are carried out with at least two primer pairs. In other
embodiments, multiplexing reactions are carried out with three
primer pairs, also known as triplex combinations.
[0126] In some embodiments, the compositions and methods are used
for characterization of length or base composition heteroplasmy in
mitochondrial DNA and also for determination of the quantity of a
given heteroplasmic variant relative to a "standard" mitochondrial
DNA region. In some embodiments, characterization of length
heteroplasmy is used to diagnose and/or evaluate the progression of
a mitochondrial DNA-related genetic disease such as one or more of
the following mitochondrial diseases: Alpers Disease, Barth
syndrome, Beta-oxidation Defects, Carnitine-Acyl-Carnitine
Deficiency, Carnitine Deficiency, Co-Enzyme Q10 Deficiency, Complex
I Deficiency, Complex II Deficiency, Complex III Deficiency,
Complex IV Deficiency, Complex V Deficiency, COX Deficiency, CPEO,
CPT I Deficiency, CPT II Deficiency, Glutaric Aciduria Type II,
KSS, Lactic Acidosis, LCAD, LCHAD, Leigh Disease or Syndrome, LHON,
Lethal Infantile Cardiomyopathy, Luft Disease, MAD, MCA, MELAS,
MERRF, Mitochondrial Cytopathy, Mitochondrial DNA Depletion,
Mitochondrial Encephalopathy, Mitochondrial Myopathy, MNGIE, NARP,
Pearson Syndrome, Pyruvate Carboxylase Deficiency, Pyruvate
Dehydrogenase Deficiency, Respiratory Chain, SCAD, SCHAD, or
VLCAD.
[0127] Determination of sequence identity is described in the
following example: a nucleic acid 20 nucleobases in length which is
otherwise identical to another 20 nucleobase nucleic acid but
has two non-identical residues has 18 of 20 identical residues and
therefore 18/20=0.9 or 90% sequence identity. In another example, a
nucleic acid 15 nucleobases in length having all residues identical
to a 15 nucleobase segment of a nucleic acid 20 nucleobases in
length would have 15/20=0.75 or 75% sequence identity with the 20
nucleobase nucleic acid. In another example, a nucleic acid 17
nucleobases in length having all residues identical to a 15
nucleobase segment of a nucleic acid 20 nucleobases in length would
have 15/17=0.882 or 88.2% sequence identity. In some embodiments, a
nucleic acid variant has between about 70% and 99% sequence
identity with a standard nucleic acid sequence. In other
embodiments, the nucleic acid variant has between about 75% to
about 99% sequence identity. In other embodiments, the nucleic acid
has between about 80% to about 99% sequence identity. In other
embodiments, the nucleic acid has between about 85% to about 99%
sequence identity. In other embodiments, the nucleic acid has
between about 90% to about 99% sequence identity. In other
embodiments, the nucleic acid has between about 95% to about 99%
sequence identity. One will recognize that these embodiments
provide for nucleic acid variants having sequence identity with a
standard nucleic acid sequence ranging from about 70%, 71%, 72%,
73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%,
86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, or 98%,
to about 99%, as well as fractions thereof.
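The three worked examples of sequence identity given above reduce to a single division. The short Python sketch below reproduces that arithmetic; the function name percent_identity is hypothetical, and the denominator passed in each call is simply the length used in the corresponding example of the text.

```python
def percent_identity(identical_residues, length):
    """Sequence identity as a percentage, rounded to one decimal place."""
    return round(100.0 * identical_residues / length, 1)

print(percent_identity(18, 20))  # 90.0
print(percent_identity(15, 20))  # 75.0
print(percent_identity(15, 17))  # 88.2
```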
EXAMPLES
Example 1
Selection of Primers for Analysis of Mitochondrial DNA
[0128] An alignment of 5615 mitochondrial DNA sequences was
constructed and analyzed for regions of conservation which are
useful as primer binding sites for tiling coverage of the
mitochondrial DNA regions HV1 and HV2. A total of 24 primer binding
sites were chosen according to the criterion that the 5'-end of the
primer binding sites remain conserved across the alignment of
mitochondrial DNA sequences. In some cases, only the 5'-terminal
nucleobase itself is conserved. In other cases, as many as two or
three consecutive nucleobases at the 5' end of the primer binding
sites are conserved.
[0129] In cases where primer coverage at a particular region is
desired but complete conservation is absent, backup primer pairs
can be chosen to ensure that target sequences will be amplified.
For example, because the 5' end of the primer binding site for the
forward primer of primer pair number 2893 is only 99.7% conserved
among the 5615 mitochondrial DNA sequences of the alignment, a backup
primer pair was designed. Primer pair number 2894 has a G residue instead of an
A residue because A is 0.3% conserved at the 5' end of the primer
binding site.
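The degree of conservation at a given position of an alignment can be estimated by counting residues in that column. The following is a minimal Python sketch on a hypothetical toy alignment of four sequences (the alignment described above contains 5615); the function name residue_frequencies is an assumption, and a column whose most common residue falls below 100% is the kind of position for which a backup primer might be considered.

```python
from collections import Counter

def residue_frequencies(aligned_sequences, column):
    """Percent frequency of each residue at one column of an alignment."""
    counts = Counter(seq[column] for seq in aligned_sequences)
    total = sum(counts.values())
    return {residue: 100.0 * n / total for residue, n in counts.items()}

# Hypothetical toy alignment; the last column is not fully conserved.
alignment = ["ACGTA", "ACGTA", "ACGTG", "ACGTA"]
print(residue_frequencies(alignment, 4))  # {'A': 75.0, 'G': 25.0}
```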
[0130] Table 1 shows the panel of 25 primer pairs designed to tile
the informative HV1 (coordinates 15924 . . . 16428) and HV2
(coordinates 31-576) mitochondrial DNA regions for complete and
partially redundant coverage with partially overlapping
amplification products according to the general scheme shown in
FIG. 1. The extent of overlap may vary but generally overlapping
regions relative to two amplification products should range from
about ten nucleobases to about 50 nucleobases of overlap. The sizes
of amplification products produced with the primer pairs of Table 1
range in length from 85 to 140 nucleobase pairs. With the exception
of three amplification products, all are less than 130 nucleobase
pairs. The coordinates of the primer binding sites are given in the
forward and reverse primer names with reference to the standard
Anderson mitochondrial DNA sequence (SEQ ID NO: 51). For example,
the forward primer of primer pair number 2889 (SEQ ID NO: 1)
hybridizes to coordinates 16357-16376 of the standard Anderson
mitochondrial DNA sequence (SEQ ID NO: 51). The primer pair name
designation "HUMMTDNA" refers to human mitochondrial DNA. Primer
pair numbers 2901 and 2925 are designed to produce an amplification
product corresponding to the same sub-segment defined by Anderson
mitochondrial DNA coordinates 15924 . . . 15985 (see Table 2). This
extent of redundancy is sometimes beneficial in cases where high
variability occurs at chosen primer binding sites such that a given
primer of a primer pair does not effectively hybridize to the
mitochondrial DNA of certain individuals. For this reason, 25
primer pairs are used to obtain amplification products of 24
sub-segments.
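Whether a tiled panel leaves any part of a region without an overlapping amplification product can be checked with a simple interval sweep. The sketch below is an illustration and not part of the disclosed method; the function name find_gaps is hypothetical, and the intervals are the full amplicon extents for the HV1 panel (start of the forward primer binding site to end of the reverse primer binding site) as read from the primer names of Table 1. The check considers amplicon extents only and does not model which positions lie under a primer.

```python
def find_gaps(intervals, start, end):
    """Return uncovered (first, last) stretches of start..end, coordinates inclusive."""
    gaps = []
    position = start  # first coordinate not yet known to be covered
    for first, last in sorted(intervals):
        if first > position:
            gaps.append((position, min(first - 1, end)))
        position = max(position, last + 1)
        if position > end:
            break
    if position <= end:
        gaps.append((position, end))
    return gaps

# Amplicon extents of the HV1 primer pairs, taken from the primer names in Table 1.
hv1_amplicons = [
    (15893, 16012), (15937, 16041), (15985, 16073), (16025, 16119),
    (16055, 16155), (16102, 16224), (16130, 16224), (16154, 16268),
    (16231, 16338), (16256, 16366), (16318, 16402), (16357, 16451),
]
print(find_gaps(hv1_amplicons, 15924, 16428))  # []
print(find_gaps([(1, 10), (15, 20)], 1, 20))   # [(11, 14)]
```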
TABLE-US-00001 TABLE 1 Primer Pairs Used for Amplifying HV1 and HV2 Regions of Mitochondrial DNA
Primer pair number | Forward primer name | Forward sequence | Forward SEQ ID NO: | Reverse primer name | Reverse sequence | Reverse SEQ ID NO:
2889 | HUMMTDNA_ASN_16357_16376_F | TCTCGTCCCCATGGATGACC | 1 | HUMMTDNA_ASN_16429_16451_R | TCGAGGAGAGTAGCACTCTTGTG | 26
2890 | HUMMTDNA_ASN_16318_16341_F | TGCCATTTACCGTACATAGCACAT | 2 | HUMMTDNA_ASN_16382_16402_R | TGGTCAAGGGACCCCTATCTG | 27
2891 | HUMMTDNA_ASN_16256_16282_F | TCACCCCTCACCCACTAGGATACCAAC | 3 | HUMMTDNA_ASN_16345_16366_R | TGGGACGAGAAGGGATTTGACT | 28
2892 | HUMMTDNA_ASN_16231_16253_F | TCACACATCAACTGCAACTCCAA | 4 | HUMMTDNA_ASN_16306_16338_R | TGCTATGTACGGTAAATGGCTTTATGTACTATG | 29
2893 | HUMMTDNA_ASN_16154_16181_F | TAGTACATAAAAACCCAATCCACATCAA | 5 | HUMMTDNA_ASN_16251_16268_R | TGGTGAGGGGTGGCTTTG | 30
2894 | HUMMTDNA_ASN_16154_16181_2_F | TAGTACATAAAAACCCAATCCACATCAG | 6 | HUMMTDNA_ASN_16251_16268_R | TGGTGAGGGGTGGCTTTG | 31
2895 | HUMMTDNA_ASN_16130_16156_F | TTTCCATAAATACTTGACCACCTGTAG | 7 | HUMMTDNA_ASN_16202_16224_R | TGGGTTGATTGCTGTACTTGCTT | 32
2896 | HUMMTDNA_ASN_16102_16123_F | TACTGCCAGCCACCATGAATAT | 8 | HUMMTDNA_ASN_16202_16224_R | TGGGTTGATTGCTGTACTTGCTT | 33
2897 | HUMMTDNA_ASN_16055_16077_F | TCCAAGTATTGACTCACCCATCA | 9 | HUMMTDNA_ASN_16130_16155_R | TACAGGTGGTCAAGTATTTATGGTAC | 34
2898 | HUMMTDNA_ASN_16025_16047_F | TCTTTCATGGGGAAGCAGATTTG | 10 | HUMMTDNA_ASN_16099_16119_R | TCATGGTGGCTGGCAGTAATG | 35
2899 | HUMMTDNA_ASN_15985_16014_F | TGCACCCAAAGCTAAGATTCTAATTTAAAC | 11 | HUMMTDNA_ASN_16052_16073_R | TGGTGAGTCAATACTTGGGTGG | 36
2901 | HUMMTDNA_ASN_15893_15923_F | TGGGGTATAAACTAATACACCAGTCTTGTAA | 12 | HUMMTDNA_ASN_15986_16012_R | TTAAATTAGAATCTTAGCTTTGGGTGC | 37
2902 | HUMMTDNA_ASN_5_30_F | TCAGGTCTATCACCCTATTAACCACT | 13 | HUMMTDNA_ASN_77_97_R | TGTCTCGCAATGCTATCGCGT | 38
2903 | HUMMTDNA_ASN_20_40_F | TATTAACCACTCACGGGAGCT | 14 | HUMMTDNA_ASN_115_139_R | TTTCAAAGACAGATACTGCGACATA | 39
2904 | HUMMTDNA_ASN_83_102_F | TAGCATTGCGAGACGCTGGA | 15 | HUMMTDNA_ASN_163_187_R | TGCCTGTAATATTGAACGTAGGTGC | 40
2905 | HUMMTDNA_ASN_113_137_F | TCTATGTCGCAGTATCTGTCTTTGA | 16 | HUMMTDNA_ASN_218_245_R | TGGGTTATTATTATGTCCTACAAGCATT | 41
2906 | HUMMTDNA_ASN_154_177_F | TCCTTTATCGCACCTACGTTCAAT | 17 | HUMMTDNA_ASN_268_290_R | TGGTTGTTATGATGTCTGTGTGG | 42
2907 | HUMMTDNA_ASN_239_262_F | TAACAATTGAATGTCTGCACAGCC | 18 | HUMMTDNA_ASN_341_363_R | TGTTTTTGGGGTTTGGCAGAGAT | 43
2908 | HUMMTDNA_ASN_204_233_F | TGTGTTAATTAATTAATGCTTGTAGGACAT | 19 | HUMMTDNA_ASN_314_330_R | TCTGTGGCCAGAAGCGG | 44
2910 | HUMMTDNA_ASN_331_354_F | TCTTAAACACATCTCTGCCAAACC | 20 | HUMMTDNA_ASN_402_425_R | TAAAAGTGCATACCGCCAAAAGAT | 45
2912 | HUMMTDNA_ASN_409_430_F | TGCGGTATGCACTTTTAACAGT | 21 | HUMMTDNA_ASN_502_521_R | TGTGTGTGCTGGGTAGGATG | 46
2913 | HUMMTDNA_ASN_464_492_F | TCTCCCATACTACTAATCTCATCAATACA | 22 | HUMMTDNA_ASN_577_603_R | TGCTTTGAGGAGGTAAGCTACATAAAC | 47
2916 | HUMMTDNA_ASN_367-388_F | TACCCTAACACCAGCCTAACCA | 23 | HUMMTDNA_ASN_438_463_R | TGGAGGGGAAAATAATGTGTTAGTTG | 48
2923 | HUMMTDNA_ASN_262_288_F | TGCTTTCCACACAGACATCATAACAAA | 24 | HUMMTDNA_ASN_368_390_R | TCTGGTTAGGCTGGTGTTAGGGT | 49
2925 | HUMMTDNA_ASN_15937_15962_F | TCCTTTTTCCAAGGACAAATCAGAGA | 25 | HUMMTDNA_ASN_16018_16041_R | TGCTTCCCCATGAAAGAACAGAGA | 50
TABLE-US-00002 TABLE 2 Amplification Coordinates of Mitochondrial DNA for the Primer Pairs of Table 1
Primer pair number | Amplification Coordinates | mtDNA Region
2889 | 16377 . . . 16428 | HV1
2890 | 16342 . . . 16381 | HV1
2891 | 16283 . . . 16344 | HV1
2892 | 16254 . . . 16305 | HV1
2893 | 16182 . . . 16250 | HV1
2894 | 16182 . . . 16250 | HV1
2895 | 16157 . . . 16201 | HV1
2896 | 16124 . . . 16201 | HV1
2897 | 16078 . . . 16129 | HV1
2898 | 16048 . . . 16098 | HV1
2899 | 16015 . . . 16051 | HV1
2901 | 15924 . . . 15985 | HV1
2902 | 31 . . . 76 | HV2
2903 | 41 . . . 114 | HV2
2904 | 103 . . . 162 | HV2
2905 | 138 . . . 217 | HV2
2906 | 178 . . . 267 | HV2
2907 | 263 . . . 340 | HV2
2908 | 234 . . . 314 | HV2
2910 | 355 . . . 402 | HV2
2912 | 431 . . . 501 | HV2
2913 | 493 . . . 576 | HV2
2916 | 389 . . . 437 | HV2
2923 | 289 . . . 371 | HV2
2925 | 15924 . . . 15985 | HV1
Example 2
Validation of Triplex Tiling Mitochondrial DNA Assay
[0131] The 25 primer pairs of Table 1 were divided into triplex
combinations of three primer pairs such that the amplification
products of three primer pairs within a triplex combination have
sense and antisense strands which are significantly different in
molecular mass from the other sense and antisense strands of other
amplification products within the triplex combinations. The triplex
combinations are shown in Table 3 with reference to primer pair
combinations.
TABLE-US-00003 TABLE 3 Triplex Combinations of Primer Pairs for Simultaneous Analysis of Mitochondrial DNA Regions
Triplex Combination No. | Primer Pair Number | Primer Pair Number | Primer Pair Number
1 | 2892 | 2901 | 2906
2 | 2891 | 2908 | 2925
3 | 2890 | 2899 | 2907
4 | 2898 | 2889 | 2923
5 | 2902 | 2910 | 2893/2894
6 | 2916 | 2897 | 2893
7 | 2904 | 2896 | 2913
8 | 2895 | 2912 | 2905
[0132] PCR cycle conditions used for obtaining amplification
products for this assay are as follows: 10 minutes at 96° C.
followed by six cycles of steps (a) to (c) wherein: (a) is 20
seconds at 96° C., (b) is 1.5 minutes at 55° C., and
(c) is 1 second at 72° C., followed by 36 cycles of steps
(d) to (f) wherein (d) is 20 seconds at 96° C., (e) is 1.5
minutes at 50° C., and (f) is 1 second at 72° C.,
followed by a retention at 4° C. All PCR reactions were
carried out with an Eppendorf thermal cycler with 40 µl reaction
volumes in a 96-well microtiter plate format. Liquid manipulations
were performed using a Packard MPII liquid handling robotic
platform. The PCR reaction mixture consisted of 4 units of Amplitaq
Gold, 1× buffer II (Applied Biosystems, Foster City, Calif.),
1.5 mM MgCl₂, 800 µM dNTP mixture and 250 nM of each
primer. The dNTP mixture contained carbon-13 enriched
deoxyguanosine triphosphate, a chemically invisible molecular
mass-modifying tag which adds 10 Da to each G residue incorporated
into a given amplification product so that the number of possible
base compositions consistent with a measured molecular mass is
reduced and the probability of assignment of an incorrect base
composition to a given amplification product is greatly
decreased.
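The thermal cycling program of the preceding paragraph can be written down as a small data structure, which also makes the programmed hold time easy to total. The Python sketch below is illustrative only; the variable and function names are hypothetical, and the total excludes temperature ramp times and the final retention at 4° C., neither of which is specified above.

```python
# Thermal cycling program transcribed from the paragraph above: (seconds, degrees C).
INITIAL_HOLD = (600, 96)
STAGE_1 = {"cycles": 6, "steps": [(20, 96), (90, 55), (1, 72)]}
STAGE_2 = {"cycles": 36, "steps": [(20, 96), (90, 50), (1, 72)]}

def programmed_seconds(initial_hold, stages):
    """Total programmed hold time, ignoring ramping between temperatures."""
    total = initial_hold[0]
    for stage in stages:
        total += stage["cycles"] * sum(seconds for seconds, _ in stage["steps"])
    return total

total = programmed_seconds(INITIAL_HOLD, [STAGE_1, STAGE_2])
print(total, round(total / 60, 1))  # 5262 87.7
```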
[0133] Eleven saliva samples were obtained from in-house laboratory
personnel and subjected to PCR reactions as described above with
the 8 triplex primer pair sets shown in Table 3. The PCR
amplification products were purified according to the primary
amine-terminated magnetic bead separation method; a technique that
is well known in the art and that is described in US patent
publication 20050130196 which is incorporated herein by reference
in entirety. All amplification products were analyzed using a
Bruker Daltonics MicroTOF™ mass spectrometer. Ions from the ESI
source undergo orthogonal ion extraction and are focused in a
reflectron prior to detection. The TOF and FTICR are equipped with
the same automated sample handling and fluidics described above.
Ions are formed in the standard MicroTOF™ ESI source that is
equipped with the same off-axis sprayer and glass capillary as the
FTICR ESI source. Consequently, source conditions were the same as
those described above. External ion accumulation was also employed
to improve ionization duty cycle during data acquisition. Each
detection event on the TOF consisted of 75,000 data points
digitized over 75 µs.
[0134] Mass spectra of the amplification products were analyzed
independently using a maximum-likelihood processor, such as is
widely used in radar signal processing. This processor, referred to
as GenX, first makes maximum likelihood estimates of the input to
the mass spectrometer for each primer by running matched filters
for each base composition aggregate on the input data. This
processor is described in U.S. Patent Application Publication No.
20040209260 which is incorporated herein by reference in
entirety.
[0135] All duplicate reactions were analyzed independently and
duplicate results were identical in all cases. An example of a mass
spectrum of triplex primer combination 1 (primer pair nos. 2892,
2901 and 2906) is shown in FIG. 2 wherein each of the peaks labeled
A-F represents a single strand of DNA of an amplification product.
The strands are clearly separated which facilitates efficient
analysis of the molecular masses.
[0136] The applicability of the present invention for resolution of
mitochondrial DNA heteroplasmy is indicated in FIG. 3. Strands C',
D', C'' and D'' represent two amplification products having length
heteroplasmy of the amplification product of strands C and D. Each
of the strands of the heteroplasmic variants is visible in the mass
spectrum because they vary in molecular mass.
Example 3
Rapid Typing of Human Mitochondrial DNA
[0137] Mitochondrial DNA (mtDNA) analysis of forensic samples is
performed when the quantity and/or quality of DNA are insufficient
for nuclear DNA analysis, or when DNA analysis through a maternal
lineage is otherwise desired. Forensic mtDNA analysis is performed
by sequencing portions of the mtDNA genome, which is a lengthy and
labor intensive technique. We present a mass spectrometry-based
multiplexed PCR assay suitable for automated analysis of mtDNA
control region segments. The assay has been internally validated
with 20 DNA samples with known sequence profiles and 50 blinded
samples contributed by external collaborators. Correct profiles
were obtained in all cases when compared to sequencing data. Two
samples containing mixed templates were observed and the relative
contribution of each template was quantified directly from the mass
spectra of PCR products.
[0138] The primer pairs of Table 1 were designed to amplify 1051
bases of human mitochondrial DNA in the hypervariable regions HV1
and HV2. The primer pairs were combined in multiplex reactions in
groups which were chosen such that the target segments of the three
primer pairs being combined were maximally separated and such that
each of the three amplification product masses in a triplex mixture
was resolvable from the others by mass spectrometry. The triplex
groups are shown in Table 3. The lengths of the amplification
products were 85 to 140 base pairs. All except for three
amplification products were less than 130 base pairs in length. The
relative primer pair concentrations in the triplex mixtures were
adjusted in order to favor simultaneous amplification of all three
target segments.
[0139] Mass spectra were measured by electrospray time-of-flight
(TOF) mass spectrometry.
[0140] A standard reference human mitochondrial DNA database was
used to obtain the base composition profiles corresponding to the
series of amplification products produced by the overlapping primer
pairs. As described above, the database was populated with base
composition data from the Anderson reference mitochondrial DNA,
from base composition measurements earlier obtained, and by
conversions from databases of earlier obtained sequencing data.
These base composition profiles represent the "truth data."
[0141] Fifty blinded test samples, including 25 blood samples and
25 cheek swab samples were tested and compared to the pre-existing
truth data. Mitochondrial DNA was purified from the samples by the
Qiagen blood punch protocol or by the Qiagen buccal swab protocol
and quantified using the Quantifiler qPCR kit prior to analysis.
Two or more independent assays were performed with the overlapping
primers of Table 1 using between 100 and 500 pg of mitochondrial
DNA in each reaction.
[0142] The purified mitochondrial DNA was subjected to triplex PCR
amplification with the eight triplex primer groups of Table 3
according to the procedure indicated in Example 2. Amplified
mixtures were purified by solution capture of nucleic acids with
ion exchange resin linked to magnetic beads as follows: 25 µl of
a 2.5 mg/mL suspension of BioClone amine terminated
superparamagnetic beads were added to 25 to 50 µl of a PCR (or
RT-PCR) reaction containing approximately 10 pM of a typical PCR
amplification product. The above suspension was mixed for
approximately 5 minutes by vortexing or pipetting, after which the
liquid was removed using a magnetic separator. The beads
containing bound PCR amplification product were then washed three
times with 50 mM ammonium bicarbonate/50% MeOH or 100 mM ammonium
bicarbonate/50% MeOH, followed by three more washes with 50% MeOH.
The bound PCR amplicon was eluted with a solution of 25 mM
piperidine, 25 mM imidazole, 35% MeOH which included peptide
calibration standards.
[0143] Each mass spectrum obtained by ESI-TOF mass spectrometry was
independently calibrated by internal peptide calibrants and
noise-reduced prior to calculation of base composition. Base
compositions were obtained from molecular masses and compared to a
database developed from over 110,000 mitochondrial DNA sequences.
The base composition of each amplification product was associated
with mitochondrial DNA coordinates as shown, for example, in Table 4,
which provides the base compositions for sample AF-12 from the set
of 50 blinded samples.
TABLE 4 Mitochondrial DNA Base Composition Profile for Sample AF-12

Anderson/Cambridge Sequence
Coordinates (SEQ ID NO: 51)    Base Composition
15893 . . . 16012              A47 G18 C25 T30
15937 . . . 16041              A35 G14 C24 T32
15985 . . . 16073              A26 G15 C21 T27
16025 . . . 16119              A26 G17 C26 T26
16055 . . . 16155              A31 G13 C30 T27
16102 . . . 16224              A45 G13 C42 T23
16130 . . . 16224              A36 G7 C33 T19
16154 . . . 16268              A44 G7 C46 T18
16231 . . . 16338              A40 G9 C40 T19
16256 . . . 16366              A37 G9 C41 T24
16318 . . . 16402              A20 G14 C30 T21
16357 . . . 16451              A21 G17 C36 T21
5 . . . 97                     A19 G24 C24 T26
20 . . . 139                   A24 G34 C29 T33
83 . . . 187                   A23 G21 C29 T32
113 . . . 245                  A39 G18 C28 T48
154 . . . 290                  A49 G17 C31 T40
204 . . . 330                  A42 G16 C35 T32
204 . . . 330                  A42 G16 C36 T32
204 . . . 330                  A42 G16 C37 T32
239 . . . 363                  A43 G11 C46 T23
239 . . . 363                  A43 G11 C47 T23
239 . . . 363                  A43 G11 C48 T23
239 . . . 363                  A43 G11 C49 T23
262 . . . 390                  A47 G10 C50 T20
262 . . . 390                  A47 G10 C51 T20
262 . . . 390                  A47 G10 C52 T20
262 . . . 390                  A47 G10 C53 T20
331 . . . 425                  A33 G9 C27 T26
367 . . . 463                  A27 G8 C32 T30
409 . . . 521                  A32 G7 C48 T26
464 . . . 603                  A44 G10 C63 T23
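The step in paragraph [0143] of obtaining base compositions from molecular masses can be illustrated with a short Python sketch. It enumerates forward-strand compositions whose calculated mass agrees with a measured forward-strand mass and whose complementary composition agrees with the measured reverse-strand mass. This is only a sketch under stated assumptions: the average residue masses and the end-group correction are approximate reference values, and the 25 ppm tolerance, the per-base count limit, and the function names are hypothetical choices that do not come from this application.

```python
# Approximate average masses (Da) of deoxynucleotide residues in a DNA strand.
# These values and the end-group correction are assumptions for illustration.
RESIDUE_MASS = {"A": 313.21, "G": 329.21, "C": 289.18, "T": 304.20}
END_CORRECTION = -61.96  # assumed strand with 5'-OH and 3'-OH termini


def strand_mass(a, g, c, t):
    """Average mass of a single strand with the given base counts."""
    return (a * RESIDUE_MASS["A"] + g * RESIDUE_MASS["G"]
            + c * RESIDUE_MASS["C"] + t * RESIDUE_MASS["T"] + END_CORRECTION)


def candidate_compositions(fwd_mass, rev_mass, tol_ppm=25.0, max_count=70):
    """Return (A, G, C, T) tuples for the forward strand whose mass matches
    fwd_mass and whose complementary strand matches rev_mass."""
    fwd_tol = fwd_mass * tol_ppm * 1e-6
    rev_tol = rev_mass * tol_ppm * 1e-6
    hits = []
    for a in range(max_count + 1):
        for g in range(max_count + 1):
            for c in range(max_count + 1):
                # Solve for the T count that best explains the remaining mass.
                rest = fwd_mass - strand_mass(a, g, c, 0)
                if rest < -fwd_tol:
                    break  # adding more C only makes the strand heavier
                t = round(rest / RESIDUE_MASS["T"])
                if t > max_count:
                    continue
                if abs(strand_mass(a, g, c, t) - fwd_mass) > fwd_tol:
                    continue
                # Complementary strand: A and T counts swap, G and C counts swap.
                if abs(strand_mass(t, c, g, a) - rev_mass) <= rev_tol:
                    hits.append((a, g, c, t))
    return hits


# Simulated masses for the composition listed in Table 4 for 16102 . . . 16224.
fwd = strand_mass(45, 13, 42, 23)
rev = strand_mass(23, 42, 13, 45)
print(candidate_compositions(fwd, rev))
```

Requiring both strands to agree is what narrows the candidate list; a single strand mass alone is generally consistent with more compositions at the same tolerance.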
[0144] Heteroplasmy was detected in several of the samples. For
example, sample AF-4 has CT heteroplasmy at position 16176. Two
distinct amplification products having base compositions of A45 G13
C41 T24 and A45 G13 C40 T25 were obtained for this sample using
primer pair number 2896 which amplifies positions 16102 . . .
16224. If conventional sequencing analyses had been used to analyze the
amplification reaction mixture, heteroplasmy would not have been
detected. Table 5 indicates additional examples of heteroplasmy
detected in various samples.
TABLE 5 Summary of Heteroplasmy Detection in Selected Samples

Blinded                                            Approximate % of
Sample    Region                 Heteroplasmy      Minor Product
AF-2      16231 . . . 16338      C → T             32.4
          16256 . . . 16366
AF-4      16102 . . . 16224      C → T             49.2
          16130 . . . 16224
AF-7      16318 . . . 16402      T → C             10.2
AF-9      464 . . . 603          AC insertion      17.3
AF-19     15985 . . . 16073      A → G             44.9
          16025 . . . 16119
AF-22     16102 . . . 16224      C → A             36.2
          16130 . . . 16224
AF-24     464 . . . 603          AC deletion       13.5
FBI-22    16055 . . . 16155      A → C             7.0
FBI-37    16231 . . . 16338      C → T             20.0
          16256 . . . 16366
FBI-48    16055 . . . 16155      T → G             6.0
FBI-49    154 . . . 290          A → C             10.6
FBI-51    5 . . . 97             C → T             43.0
          20 . . . 139
FBI-57    16357 . . . 16451      T → C             6.0
FBI-61    464 . . . 603          AC insertion      17.0
FBI-66    113 . . . 245          C → T             50.0
          154 . . . 290
FBI-72    113 . . . 245          C → T             34.0
          154 . . . 290
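The heteroplasmy calls discussed in paragraph [0144] rest on two simple operations: comparing two base compositions obtained from the same primer pair, and estimating the share of the minor product. The Python sketch below illustrates both. The function names are hypothetical, the assignment of "major" and "minor" in the usage lines is arbitrary, and the signal values are invented placeholders rather than measured abundances.

```python
BASES = ("A", "G", "C", "T")


def describe_difference(major, minor):
    """Describe how the minor composition differs from the major one.

    Both arguments are dicts mapping 'A', 'G', 'C', 'T' to counts.
    """
    delta = {b: minor[b] - major[b] for b in BASES}
    gained = [b for b in BASES if delta[b] > 0]
    lost = [b for b in BASES if delta[b] < 0]
    net = sum(delta.values())
    if not gained and not lost:
        return "identical"
    if net == 0 and len(gained) == 1 and len(lost) == 1 and delta[gained[0]] == 1:
        return f"{lost[0]} -> {gained[0]} substitution"
    if net > 0 and not lost:
        return "insertion of " + "".join(b * delta[b] for b in gained)
    if net < 0 and not gained:
        return "deletion of " + "".join(b * -delta[b] for b in lost)
    return "complex difference"


def minor_fraction_percent(major_signal, minor_signal):
    """Percentage of the total signal contributed by the minor product."""
    return 100.0 * minor_signal / (major_signal + minor_signal)


# The two compositions reported for sample AF-4 with primer pair number 2896.
product_1 = {"A": 45, "G": 13, "C": 41, "T": 24}
product_2 = {"A": 45, "G": 13, "C": 40, "T": 25}
print(describe_difference(product_1, product_2))

# Placeholder signal intensities (arbitrary units), not measured values.
print(minor_fraction_percent(major_signal=90.0, minor_signal=10.0))
```

A composition comparison of this kind reports which bases changed but not the position of the change within the amplified region.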
[0145] The results of the investigation of the 50 blinded samples
indicated that 47 of 47 pure samples were directly concordant with
the sequence data available. One negative (no mitochondrial DNA
present) was confirmed as negative and two buccal swab samples were
confirmed as mixtures of existing buccal swab samples. Deduction of
contributors to mixtures was confirmed as accurate. Multiple
examples of length heteroplasmy and single nucleotide polymorphism
heteroplasmy were observed. These results indicate that the method
is useful for rapid typing of human mitochondrial DNA.
Example 4
Demonstration of the Feasibility of Rapid Detection of a Genetic
Engineering Event
[0146] To detect a genetic engineering event indicated by the
presence of foreign DNA sequences inserted into a parent virus, a
strategy of overlapping PCR primers to tile large sections of viral
genomes is employed. Primer binding sites are chosen such that each
PCR amplicon (standard segment) is approximately 150 nucleobases
in length, with overlapping segments defined by primer
hybridization regions every 50-100 nucleobases across the entire
target region (in a manner exemplified by FIG. 1).
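The tiling scheme of paragraph [0146] can be sketched in Python as a function that lays out overlapping sub-segment coordinates across a target region. The function name and the default step of 75 nucleobases (a value inside the stated 50-100 range) are assumptions for illustration; actual primer placement would also depend on primer design constraints that this sketch ignores.

```python
def tile_region(start, end, amplicon_len=150, step=75):
    """Return (first, last) coordinate pairs of overlapping sub-segments
    covering start..end inclusive (1-based coordinates)."""
    if step <= 0 or step >= amplicon_len:
        raise ValueError("step must be positive and smaller than amplicon_len")
    segments = []
    pos = start
    while True:
        # Clip the final sub-segment at the end of the target region.
        last = min(pos + amplicon_len - 1, end)
        segments.append((pos, last))
        if last == end:
            break
        pos += step
    return segments


for first, last in tile_region(1, 400):
    print(first, last)
```

Because the step is smaller than the amplicon length, every interior position falls in at least two sub-segments, which is what allows a variant to be confirmed by adjacent overlapping products.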
[0147] Target regions are chosen according to expectation of
identification of a genetic engineering event at a particular
region. For example, if "region X" of a genome of
a given virus is known to be a common insertion point for a gene
encoding a toxin used as a biowarfare agent, it would be
advantageous to simplify the base composition analysis by choosing
only the genomic coordinates of region X as the target (a portion
of the genome chosen as the target). The target region is then
divided into sub-segments and primer pairs are chosen to obtain
amplification products which represent the sub-segments for base
composition analysis. On the other hand, if it is known that any
point in an entire genome is appropriate for insertion of a gene,
it would be advantageous to define the entire genome as the target
in order to ensure that the insertion is detected. One with
ordinary skill will recognize that defining an entire genome as a
target will require design of many more primer pairs and
significantly more analysis resources.
[0148] A database of molecular masses and base compositions for
each standard segment for the standard target virus species will be
used to assemble a base composition map of each sampled region from
the mass spectrum derived from each amplification reaction. The
identification of at least one amplification product whose base
composition differs from the base composition of its corresponding
standard segment in one or more overlapping tiled regions will
indicate that a variant exists and the sample will be flagged for
further analysis. SNP variants are readily recognized and can be
directly analyzed by the methods described herein. As an example of
the proposed method, 10 Kb nucleobase regions of orthopoxvirus
species genetically engineered with a green-fluorescent protein
(GFP) construct are inserted into analogous regions in five
different orthopoxviruses which will serve as benign surrogates to
represent a potentially deadly engineered virus.
[0149] In the following proof-of-concept example using the
recombinant GFP-containing camelpoxvirus (CMPV-GFP), simulated
processed mass spectrometry data was used to reconstruct a standard
segment base composition map, associate it unambiguously to CMPV,
and identify the presence of a foreign insert in the virus by flagging
an unexpected/unmatched hole in two of the amplified regions.
Overlapping primer pairs were selected to span the CMPV-GFP
sequence. A theoretical prediction of the expected standard
amplification products using these primers was used to populate a
database that serves as an expected mass set for all poxvirus
species. Processed mass spectrometry data of the amplified regions
of CMPV-GFP were simulated and matched against the database of 16
poxvirus sequences (which did not include the GFP-engineered
sequence) to construct a base composition profile of each region.
The base composition profile is generated using the full set of
potential fragments from all database sequences, which helps
increase profile coverage in the case of strain-to-strain SNP
variations. If any SNP-generated fragments appear that do not occur
in any database sequence, the base composition of the
double-stranded fragment can be deduced directly from the masses.
The final base composition profile for each region can then be
compared to the compositions for all database sequences to
confirm/refine the identity of the parent virus. The presence of an
unmatched "hole" in the assembled profile that cannot be matched to
the expected viral sequence indicates the potential presence of an
engineered insert. This region may then be sequenced and compared
to the full sequence database via BLAST. The ability to rapidly
identify the presence of the insert, the location of the insertion,
and the flanking regions of the viral genome where the unexpected
genetic modification was done will serve as a powerful tool to flag
potential bioengineering events. It further reduces the burden of
sequencing to specific, targeted regions of the viral genome
instead of the entire virus from every sample.
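The matching logic described in paragraphs [0148] and [0149] can be sketched as follows: observed base compositions for each tiled region are compared with the expected compositions of every database sequence, the best-matching parent is chosen, and regions that do not match that parent are reported as "holes". This Python sketch is illustrative only. The function name, the region labels, and all compositions in the usage example are hypothetical toy values, not data from the simulation described above.

```python
def match_profile(observed, database):
    """Compare observed base compositions against a database of standards.

    observed: {region: composition or None}; None means no product matched.
    database: {virus name: {region: composition}}.
    Returns (best matching virus, number of matching regions, unmatched regions).
    """
    scores = {
        name: sum(1 for region, comp in observed.items()
                  if comp is not None and profile.get(region) == comp)
        for name, profile in database.items()
    }
    best = max(scores, key=scores.get)
    # Regions that disagree with the best-matching parent are the "holes".
    holes = [region for region, comp in observed.items()
             if comp != database[best].get(region)]
    return best, scores[best], holes


# Toy database: compositions are (A, G, C, T) tuples with invented values.
database = {
    "virus_1": {"r1": (40, 30, 35, 45), "r2": (42, 28, 36, 44),
                "r3": (38, 33, 34, 45), "r4": (41, 29, 37, 43)},
    "virus_2": {"r1": (40, 31, 34, 45), "r2": (43, 28, 35, 44),
                "r3": (38, 33, 34, 45), "r4": (40, 30, 37, 43)},
}
# r3 and r4 yield no expected product, as might happen around an insert.
observed = {"r1": (40, 30, 35, 45), "r2": (42, 28, 36, 44),
            "r3": None, "r4": (55, 60, 58, 50)}
print(match_profile(observed, database))
```

Flagged regions are candidates for targeted sequencing; the sketch does not attempt to identify the insert itself.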
Example 5
Vector Validation and Characterization of Vector Heterogeneity
[0150] This example illustrates a scenario where the method of the
present invention could be used to validate and/or characterize
heterogeneity of standard nucleic acid sequences encoding
biological products. The process of production of biological
therapeutic proteins such as vaccines and monoclonal antibodies
requires storage and manipulation of the nucleic acid sequences
encoding the therapeutic proteins. Mutations may occasionally arise
within a given nucleic acid sequence encoding the protein and
compromise its therapeutic effect. It is desirable to have a method
for rapid validation of such nucleic acid sequences and
characterization of heterogeneity of the sequences, if present.
[0151] Vector X contains a nucleic acid sequence encoding vaccine Y
which is used to vaccinate individuals against infection of virus
Z. Vector X is used to transfect a suitable host for production of
vaccine Y. Vaccine Y is suspected of being compromised by a
mutation that has arisen in the nucleic acid sequence encoding
vaccine Y and is being propagated via routine laboratory
manipulations of vector X.
[0152] The method of the present invention is used to analyze the
nucleic acid of vector X by base composition analysis of
sub-segments of the vector which encode vaccine Y. The nucleic acid
sequence encoding vaccine Y is 300 nucleobases in length. This
sequence is divided into four sub-segments as follows: sub-segment
1 represents coordinates 1 . . . 100 of the nucleic acid sequence
encoding vaccine Y; sub-segment 2 represents coordinates 61 . . .
160 of the nucleic acid sequence encoding vaccine Y; sub-segment 3
represents coordinates 141 . . . 240 of the nucleic acid sequence
encoding vaccine Y; and sub-segment 4 represents coordinates 221 .
. . 300 of the nucleic acid sequence encoding vaccine Y. The base
compositions of each of the four sub-segments are known because the
sequence of vaccine Y is known. Sub-segment 1 of the nucleic acid
of vaccine Y has a base composition of A25 T20 C30 G25;
sub-segment 2 of the nucleic acid of vaccine Y has a base
composition of A15 T20 C35 G30; sub-segment 3 of
the nucleic acid of vaccine Y has a base composition of
A20 T25 C30 G25; and sub-segment 4 of the
nucleic acid of vaccine Y has a base composition of A25
T15 C15 G20. Primer pair 1 is used to obtain an
amplification product of vector X wherein the amplification product
corresponds to sub-segment 1. Primer pair 2 is used to obtain an
amplification product of vector X wherein the amplification product
corresponds to sub-segment 2. Primer pair 3 is used to obtain an
amplification product of vector X wherein the amplification product
corresponds to sub-segment 3. Primer pair 4 is used to obtain an
amplification product of vector X wherein the amplification product
corresponds to sub-segment 4. The amplification products
corresponding to sub-segments 1-4 are analyzed by mass spectrometry
to determine their molecular masses. The base compositions of one
or more of the amplification products are calculated from the
molecular masses and compared with the base compositions of the
sub-segments of vaccine Y listed above.
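Because the sequence encoding vaccine Y is known, the expected base composition of each sub-segment in paragraph [0152] follows directly from counting bases within its coordinates. A minimal Python sketch of that bookkeeping is given below. The function names are hypothetical, and the 300-nucleobase placeholder sequence is an invented repeat used only to make the example runnable; it is not the sequence encoding vaccine Y.

```python
from collections import Counter


def base_composition(sequence):
    """Count A, G, C and T in a DNA sequence (case-insensitive)."""
    counts = Counter(sequence.upper())
    return {base: counts.get(base, 0) for base in "AGCT"}


def subsegment_compositions(sequence, coordinates):
    """Map 1-based inclusive (first, last) coordinates to base compositions."""
    return {(first, last): base_composition(sequence[first - 1:last])
            for first, last in coordinates}


# Sub-segment coordinates as listed in the example above.
coordinates = [(1, 100), (61, 160), (141, 240), (221, 300)]
placeholder_sequence = "ACGT" * 75  # hypothetical 300-nucleobase sequence
for coords, comp in subsegment_compositions(placeholder_sequence, coordinates).items():
    print(coords, comp)
```

The counts for each sub-segment sum to its length, which gives a quick consistency check on any table of expected compositions.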
[0153] In one example, production lot A-1 of vector X is analyzed
according to the method described above. The results of the base
composition calculations indicate that each of the experimentally
determined base compositions of the amplification products matches
the base composition of the corresponding sub-segment. The
conclusion of this exercise is that vector X, and the nucleic acid
encoding vaccine Y contained thereon, do not contain mutations and
that the vaccine vector is validated, indicating that future vaccine
production will not be affected.
[0154] In another example, production lot B-2 of vector X is
analyzed according to the method described above. The results of
the base composition calculations indicate that the expected base
composition of each of the four sub-segments is matched by an
experimentally determined amplification product. An
additional amplification product is observed in the mass spectrum
of the amplification reaction of primer pair 3. The additional
amplification product which corresponds to sub-segment 3 has a base
composition of A20 T25 C31 G24. This indicates
that the additional amplification product has a G→C
substitution relative to the standard base composition of
sub-segment 3. The conclusion of this exercise is that vector X and
the nucleic acid encoding vaccine Y are heterogeneous and that
production of vaccine Y from production lot B-2 of vector X may be
compromised. The mass spectrum indicating signals from two
amplification products corresponding to sub-segment 3 may also be
used to estimate the relative amounts of the two amplification
products, thereby further characterizing the extent of
heterogeneity of the nucleic acid sequence encoding vaccine Y. If
the relative quantity of nucleic acid containing the mutation is
low, it may be decided that heterogeneity is negligible. On the
other hand, if the relative quantity of nucleic acid containing the
mutation is high, it may be decided that vector X lot B-2 is
severely compromised and should be destroyed instead of being used
to produce vaccine Y.
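The lot-level decisions described for lots A-1 and B-2 can be sketched as a comparison of observed products against the expected composition of each sub-segment, followed by an estimate of the relative amount of any unexpected product. The Python sketch below is illustrative only. The function name, the status labels, the signal values, and the 5% threshold separating "low" from "high" relative quantity are all assumptions; the text above does not specify a numerical cut-off.

```python
def assess_lot(expected, observed, minor_threshold_percent=5.0):
    """Classify a vector lot from its observed amplification products.

    expected: {sub-segment: composition}
    observed: {sub-segment: [(composition, signal), ...]}
    Returns (status, findings), where findings lists unexpected products as
    (sub-segment, composition, percent of that sub-segment's total signal).
    """
    findings = []
    for segment, products in observed.items():
        total = sum(signal for _, signal in products)
        for composition, signal in products:
            if composition != expected[segment]:
                findings.append((segment, composition, 100.0 * signal / total))
    if not findings:
        return "validated", findings
    if all(percent < minor_threshold_percent for _, _, percent in findings):
        return "heterogeneous (minor)", findings
    return "heterogeneous (review lot)", findings


# Compositions are (A, T, C, G) tuples in the order used in the example above.
expected = {1: (25, 20, 30, 25), 2: (15, 20, 35, 30),
            3: (20, 25, 30, 25), 4: (25, 15, 15, 20)}
# Signal intensities are placeholders in arbitrary units.
lot_a1 = {seg: [(comp, 100.0)] for seg, comp in expected.items()}
lot_b2 = {seg: [(comp, 100.0)] for seg, comp in expected.items()}
lot_b2[3] = [((20, 25, 30, 25), 80.0), ((20, 25, 31, 24), 20.0)]
print(assess_lot(expected, lot_a1))
print(assess_lot(expected, lot_b2))
```

In practice the acceptable level of heterogeneity would be set by the production process rather than by a fixed default such as the one assumed here.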
[0155] Various modifications of the invention, in addition to those
described herein, will be apparent to those skilled in the art from
the foregoing description. Such modifications are also intended to
fall within the scope of the appended claims. Each reference
(including, but not limited to, journal articles, U.S. and non-U.S.
patents, patent application publications, international patent
application publications, gene bank accession numbers, internet web
sites, and the like) cited in the present application is
incorporated herein by reference in its entirety. Those skilled in
the art will appreciate that numerous changes and modifications may
be made to the embodiments of the invention and that such changes
and modifications may be made without departing from the spirit of
the invention. It is therefore intended that the appended claims
cover all such equivalent variations as fall within the true spirit
and scope of the invention.
Sequence Listing

Number of SEQ ID NOs: 52

SEQ ID NOs: 1-50 are DNA, Artificial Sequence (Description of
Artificial Sequence: Synthetic primer).

SEQ ID NO  Length  Sequence
1          20      tctcgtcccc atggatgacc
2          24      tgccatttac cgtacatagc acat
3          27      tcacccctca cccactagga taccaac
4          23      tcacacatca actgcaactc caa
5          28      tagtacataa aaacccaatc cacatcaa
6          28      tagtacataa aaacccaatc cacatcag
7          27      tttccataaa tacttgacca cctgtag
8          22      tactgccagc caccatgaat at
9          23      tccaagtatt gactcaccca tca
10         23      tctttcatgg ggaagcagat ttg
11         30      tgcacccaaa gctaagattc taatttaaac
12         31      tggggtataa actaatacac cagtcttgta a
13         26      tcaggtctat caccctatta accact
14         21      tattaaccac tcacgggagc t
15         20      tagcattgcg agacgctgga
16         25      tctatgtcgc agtatctgtc tttga
17         24      tcctttatcg cacctacgtt caat
18         24      taacaattga atgtctgcac agcc
19         30      tgtgttaatt aattaatgct tgtaggacat
20         24      tcttaaacac atctctgcca aacc
21         22      tgcggtatgc acttttaaca gt
22         29      tctcccatac tactaatctc atcaataca
23         22      taccctaaca ccagcctaac ca
24         27      tgctttccac acagacatca taacaaa
25         26      tcctttttcc aaggacaaat cagaga
26         23      tcgaggagag tagcactctt gtg
27         21      tggtcaaggg acccctatct g
28         22      tgggacgaga agggatttga ct
29         33      tgctatgtac ggtaaatggc tttatgtact atg
30         18      tggtgagggg tggctttg
31         18      tggtgagggg tggctttg
32         23      tgggttgatt gctgtacttg ctt
33         23      tgggttgatt gctgtacttg ctt
34         26      tacaggtggt caagtattta tggtac
35         21      tcatggtggc tggcagtaat g
36         22      tggtgagtca atacttgggt gg
37         27      ttaaattaga atcttagctt tgggtgc
38         21      tgtctcgcaa tgctatcgcg t
39         25      tttcaaagac agatactgcg acata
40         25      tgcctgtaat attgaacgta ggtgc
41         28      tgggttatta ttatgtccta caagcatt
42         23      tggttgttat gatgtctgtg tgg
43         23      tgtttttggg gtttggcaga gat
44         17      tctgtggcca gaagcgg
45         24      taaaagtgca taccgccaaa agat
46         20      tgtgtgtgct gggtaggatg
47         27      tgctttgagg aggtaagcta cataaac
48         26      tggaggggaa aataatgtgt tagttg
49         23      tctggttagg ctggtgttag ggt
50         24      tgcttcccca tgaaagaaca gaga

SEQ ID NO: 51, Length: 16568, Type: DNA, Organism: Homo sapiens
gatcacaggt
ctatcaccct attaaccact cacgggagct ctccatgcat ttggtatttt 60cgtctggggg
gtatgcacgc gatagcattg cgagacgctg gagccggagc accctatgtc
120gcagtatctg tctttgattc ctgcctcatc ctattattta tcgcacctac
gttcaatatt 180acaggcgaac atacttacta aagtgtgtta attaattaat
gcttgtagga cataataata 240acaattgaat gtctgcacag ccactttcca
cacagacatc ataacaaaaa atttccacca 300aaccccccct cccccgcttc
tggccacagc acttaaacac atctctgcca aaccccaaaa 360acaaagaacc
ctaacaccag cctaaccaga tttcaaattt tatcttttgg cggtatgcac
420ttttaacagt caccccccaa ctaacacatt attttcccct cccactccca
tactactaat 480ctcatcaata caacccccgc ccatcctacc cagcacacac
acaccgctgc taaccccata 540ccccgaacca accaaacccc aaagacaccc
cccacagttt atgtagctta cctcctcaaa 600gcaatacact gaaaatgttt
agacgggctc acatcacccc ataaacaaat aggtttggtc 660ctagcctttc
tattagctct tagtaagatt acacatgcaa gcatccccgt tccagtgagt
720tcaccctcta aatcaccacg atcaaaaggg acaagcatca agcacgcagc
aatgcagctc 780aaaacgctta gcctagccac acccccacgg gaaacagcag
tgattaacct ttagcaataa 840acgaaagttt aactaagcta tactaacccc
agggttggtc aatttcgtgc cagccaccgc 900ggtcacacga ttaacccaag
tcaatagaag ccggcgtaaa gagtgtttta gatcaccccc 960tccccaataa
agctaaaact cacctgagtt gtaaaaaact ccagttgaca caaaatagac
1020tacgaaagtg gctttaacat atctgaacac acaatagcta agacccaaac
tgggattaga 1080taccccacta tgcttagccc taaacctcaa cagttaaatc
aacaaaactg ctcgccagaa 1140cactacgagc cacagcttaa aactcaaagg
acctggcggt gcttcatatc cctctagagg 1200agcctgttct gtaatcgata
aaccccgatc aacctcacca cctcttgctc agcctatata 1260ccgccatctt
cagcaaaccc tgatgaaggc tacaaagtaa gcgcaagtac ccacgtaaag
1320acgttaggtc aaggtgtagc ccatgaggtg gcaagaaatg ggctacattt
tctaccccag 1380aaaactacga tagcccttat gaaacttaag ggtcgaaggt
ggatttagca gtaaactaag 1440agtagagtgc ttagttgaac agggccctga
agcgcgtaca caccgcccgt caccctcctc 1500aagtatactt caaaggacat
ttaactaaaa cccctacgca tttatataga ggagacaagt 1560cgtaacatgg
taagtgtact ggaaagtgca cttggacgaa ccagagtgta gcttaacaca
1620aagcacccaa cttacactta ggagatttca acttaacttg accgctctga
gctaaaccta 1680gccccaaacc cactccacct tactaccaga caaccttagc
caaaccattt acccaaataa 1740agtataggcg atagaaattg aaacctggcg
caatagatat agtaccgcaa gggaaagatg 1800aaaaattata accaagcata
atatagcaag gactaacccc tataccttct gcataatgaa 1860ttaactagaa
ataactttgc aaggagagcc aaagctaaga cccccgaaac cagacgagct
1920acctaagaac agctaaaaga gcacacccgt ctatgtagca aaatagtggg
aagatttata 1980ggtagaggcg acaaacctac cgagcctggt gatagctggt
tgtccaagat agaatcttag 2040ttcaacttta aatttgccca cagaaccctc
taaatcccct tgtaaattta actgttagtc 2100caaagaggaa cagctctttg
gacactagga aaaaaccttg tagagagagt aaaaaattta 2160acacccatag
taggcctaaa agcagccacc aattaagaaa gcgttcaagc tcaacaccca
2220ctacctaaaa aatcccaaac atataactga actcctcaca cccaattgga
ccaatctatc 2280accctataga agaactaatg ttagtataag taacatgaaa
acattctcct ccgcataagc 2340ctgcgtcaga ttaaaacact gaactgacaa
ttaacagccc aatatctaca atcaaccaac 2400aagtcattat taccctcact
gtcaacccaa cacaggcatg ctcataagga aaggttaaaa 2460aaagtaaaag
gaactcggca aatcttaccc cgcctgttta ccaaaaacat cacctctagc
2520atcaccagta ttagaggcac cgcctgccca gtgacacatg tttaacggcc
gcggtaccct 2580aaccgtgcaa aggtagcata atcacttgtt ccttaaatag
ggacctgtat gaatggctcc 2640acgagggttc agctgtctct tacttttaac
cagtgaaatt gacctgcccg tgaagaggcg 2700ggcataacac agcaagacga
gaagacccta tggagcttta atttattaat gcaaacagta 2760cctaacaaac
ccacaggtcc taaactacca aacctgcatt aaaaatttcg gttggggcga
2820cctcggagca gaacccaacc tccgagcagt acatgctaag acttcaccag
tcaaagcgaa 2880ctactatact caattgatcc aataacttga ccaacggaac
aagttaccct agggataaca 2940gcgcaatcct attctagagt ccatatcaac
aatagggttt acgacctcga tgttggatca 3000ggacatcccg atggtgcagc
cgctattaaa ggttcgtttg ttcaacgatt aaagtcctac 3060gtgatctgag
ttcagaccgg agtaatccag gtcggtttct atctacttca aattcctccc
3120tgtacgaaag gacaagagaa ataaggccta cttcacaaag cgccttcccc
cgtaaatgat 3180atcatctcaa cttagtatta tacccacacc cacccaagaa
cagggtttgt taagatggca 3240gagcccggta atcgcataaa acttaaaact
ttacagtcag aggttcaatt cctcttctta 3300acaacatacc catggccaac
ctcctactcc tcattgtacc cattctaatc gcaatggcat 3360tcctaatgct
taccgaacga aaaattctag gctatataca actacgcaaa ggccccaacg
3420ttgtaggccc ctacgggcta ctacaaccct tcgctgacgc cataaaactc
ttcaccaaag 3480agcccctaaa acccgccaca tctaccatca ccctctacat
caccgccccg accttagctc 3540tcaccatcgc tcttctacta tgaacccccc
tccccatacc caaccccctg gtcaacctca 3600acctaggcct cctatttatt
ctagccacct ctagcctagc cgtttactca atcctctgat 3660cagggtgagc
atcaaactca aactacgccc tgatcggcgc actgcgagca gtagcccaaa
3720caatctcata tgaagtcacc ctagccatca ttctactatc aacattacta
ataagtggct 3780cctttaacct ctccaccctt atcacaacac aagaacacct
ctgattactc ctgccatcat 3840gacccttggc cataatatga tttatctcca
cactagcaga gaccaaccga acccccttcg 3900accttgccga aggggagtcc
gaactagtct caggcttcaa catcgaatac gccgcaggcc 3960ccttcgccct
attcttcata gccgaataca caaacattat tataataaac accctcacca
4020ctacaatctt cctaggaaca acatatgacg cactctcccc tgaactctac
acaacatatt 4080ttgtcaccaa gaccctactt ctaacctccc tgttcttatg
aattcgaaca gcataccccc 4140gattccgcta cgaccaactc atacacctcc
tatgaaaaaa cttcctacca ctcaccctag 4200cattacttat atgatatgtc
tccataccca ttacaatctc cagcattccc cctcaaacct 4260aagaaatatg
tctgataaaa gagttacttt gatagagtaa ataataggag cttaaacccc
4320cttatttcta ggactatgag aatcgaaccc atccctgaga atccaaaatt
ctccgtgcca 4380cctatcacac cccatcctaa agtaaggtca gctaaataag
ctatcgggcc cataccccga 4440aaatgttggt tatacccttc ccgtactaat
taatcccctg gcccaacccg tcatctactc 4500taccatcttt gcaggcacac
tcatcacagc gctaagctcg cactgatttt ttacctgagt 4560aggcctagaa
ataaacatgc tagcttttat tccagttcta accaaaaaaa taaaccctcg
4620ttccacagaa gctgccatca agtatttcct cacgcaagca accgcatcca
taatccttct 4680aatagctatc ctcttcaaca atatactctc cggacaatga
accataacca atactaccaa 4740tcaatactca tcattaataa tcataatagc
tatagcaata aaactaggaa tagccccctt 4800tcacttctga gtcccagagg
ttacccaagg cacccctctg acatccggcc tgcttcttct 4860cacatgacaa
aaactagccc ccatctcaat catataccaa atctctccct cactaaacgt
4920aagccttctc ctcactctct caatcttatc catcatagca ggcagttgag
gtggattaaa 4980ccaaacccag ctacgcaaaa tcttagcata ctcctcaatt
acccacatag gatgaataat 5040agcagttcta ccgtacaacc ctaacataac
cattcttaat ttaactattt atattatcct 5100aactactacc gcattcctac
tactcaactt aaactccagc accacgaccc tactactatc 5160tcgcacctga
aacaagctaa catgactaac acccttaatt ccatccaccc tcctctccct
5220aggaggcctg cccccgctaa ccggcttttt gcccaaatgg gccattatcg
aagaattcac 5280aaaaaacaat agcctcatca tccccaccat catagccacc
atcaccctcc ttaacctcta 5340cttctaccta cgcctaatct actccacctc
aatcacacta ctccccatat ctaacaacgt 5400aaaaataaaa tgacagtttg
aacatacaaa acccacccca ttcctcccca cactcatcgc 5460ccttaccacg
ctactcctac ctatctcccc ttttatacta ataatcttat agaaatttag
5520gttaaataca gaccaagagc cttcaaagcc ctcagtaagt tgcaatactt
aatttctgta 5580acagctaagg actgcaaaac cccactctgc atcaactgaa
cgcaaatcag ccactttaat 5640taagctaagc ccttactaga ccaatgggac
ttaaacccac aaacacttag ttaacagcta 5700agcaccctaa tcaactggct
tcaatctact tctcccgccg ccgggaaaaa aggcgggaga 5760agccccggca
ggtttgaagc tgcttcttcg aatttgcaat tcaatatgaa aatcacctcg
5820gagctggtaa aaagaggcct aacccctgtc tttagattta cagtccaatg
cttcactcag 5880ccattttacc tcacccccac tgatgttcgc cgaccgttga
ctattctcta caaaccacaa 5940agacattgga acactatacc tattattcgg
cgcatgagct ggagtcctag gcacagctct 6000aagcctcctt attcgagccg
agctgggcca gccaggcaac cttctaggta acgaccacat 6060ctacaacgtt
atcgtcacag cccatgcatt tgtaataatc ttcttcatag taatacccat
6120cataatcgga ggctttggca actgactagt tcccctaata atcggtgccc
ccgatatggc 6180gtttccccgc ataaacaaca taagcttctg actcttacct
ccctctctcc tactcctgct 6240cgcatctgct atagtggagg ccggagcagg
aacaggttga acagtctacc ctcccttagc 6300agggaactac tcccaccctg
gagcctccgt agacctaacc atcttctcct tacacctagc 6360aggtgtctcc
tctatcttag gggccatcaa tttcatcaca acaattatca atataaaacc
6420ccctgccata acccaatacc aaacgcccct cttcgtctga tccgtcctaa
tcacagcagt 6480cctacttctc ctatctctcc cagtcctagc tgctggcatc
actatactac taacagaccg 6540caacctcaac accaccttct tcgaccccgc
cggaggagga gaccccattc tataccaaca 6600cctattctga tttttcggtc
accctgaagt ttatattctt atcctaccag gcttcggaat 6660aatctcccat
attgtaactt actactccgg aaaaaaagaa ccatttggat acataggtat
6720ggtctgagct atgatatcaa ttggcttcct agggtttatc gtgtgagcac
accatatatt 6780tacagtagga atagacgtag acacacgagc atatttcacc
tccgctacca taatcatcgc 6840tatccccacc ggcgtcaaag tatttagctg
actcgccaca ctccacggaa gcaatatgaa 6900atgatctgct gcagtgctct
gagccctagg attcatcttt cttttcaccg taggtggcct 6960gactggcatt
gtattagcaa actcatcact agacatcgta ctacacgaca cgtactacgt
7020tgtagcccac ttccactatg tcctatcaat aggagctgta tttgccatca
taggaggctt 7080cattcactga tttcccctat tctcaggcta caccctagac
caaacctacg ccaaaatcca 7140tttcactatc atattcatcg gcgtaaatct
aactttcttc ccacaacact ttctcggcct 7200atccggaatg ccccgacgtt
actcggacta ccccgatgca tacaccacat gaaacatcct 7260atcatctgta
ggctcattca tttctctaac agcagtaata ttaataattt tcatgatttg
7320agaagccttc gcttcgaagc gaaaagtcct aatagtagaa gaaccctcca
taaacctgga 7380gtgactatat ggatgccccc caccctacca cacattcgaa
gaacccgtat acataaaatc 7440tagacaaaaa aggaaggaat cgaacccccc
aaagctggtt tcaagccaac cccatggcct 7500ccatgacttt ttcaaaaagg
tattagaaaa accatttcat aactttgtca aagttaaatt 7560ataggctaaa
tcctatatat cttaatggca catgcagcgc aagtaggtct acaagacgct
7620acttccccta tcatagaaga gcttatcacc tttcatgatc acgccctcat
aatcattttc 7680cttatctgct tcctagtcct gtatgccctt ttcctaacac
tcacaacaaa actaactaat 7740actaacatct cagacgctca ggaaatagaa
accgtctgaa ctatcctgcc cgccatcatc 7800ctagtcctca tcgccctccc
atccctacgc atcctttaca taacagacga ggtcaacgat 7860ccctccctta
ccatcaaatc aattggccac caatggtact gaacctacga gtacaccgac
7920tacggcggac taatcttcaa ctcctacata cttcccccat tattcctaga
accaggcgac 7980ctgcgactcc ttgacgttga caatcgagta gtactcccga
ttgaagcccc cattcgtata 8040ataattacat cacaagacgt cttgcactca
tgagctgtcc ccacattagg cttaaaaaca 8100gatgcaattc ccggacgtct
aaaccaaacc actttcaccg ctacacgacc gggggtatac 8160tacggtcaat
gctctgaaat ctgtggagca aaccacagtt tcatgcccat cgtcctagaa
8220ttaattcccc taaaaatctt tgaaataggg cccgtattta ccctatagca
ccccctctac 8280cccctctaga gcccactgta aagctaactt agcattaacc
ttttaagtta aagattaaga 8340gaaccaacac ctctttacag tgaaatgccc
caactaaata ctaccgtatg gcccaccata 8400attaccccca tactccttac
actattcctc atcacccaac
taaaaatatt aaacacaaac 8460taccacctac ctccctcacc aaagcccata
aaaataaaaa attataacaa accctgagaa 8520ccaaaatgaa cgaaaatctg
ttcgcttcat tcattgcccc cacaatccta ggcctacccg 8580ccgcagtact
gatcattcta tttccccctc tattgatccc cacctccaaa tatctcatca
8640acaaccgact aatcaccacc caacaatgac taatcaaact aacctcaaaa
caaatgataa 8700ccatacacaa cactaaagga cgaacctgat ctcttatact
agtatcctta atcattttta 8760ttgccacaac taacctcctc ggactcctgc
ctcactcatt tacaccaacc acccaactat 8820ctataaacct agccatggcc
atccccttat gagcgggcac agtgattata ggctttcgct 8880ctaagattaa
aaatgcccta gcccacttct taccacaagg cacacctaca ccccttatcc
8940ccatactagt tattatcgaa accatcagcc tactcattca accaatagcc
ctggccgtac 9000gcctaaccgc taacattact gcaggccacc tactcatgca
cctaattgga agcgccaccc 9060tagcaatatc aaccattaac cttccctcta
cacttatcat cttcacaatt ctaattctac 9120tgactatcct agaaatcgct
gtcgccttaa tccaagccta cgttttcaca cttctagtaa 9180gcctctacct
gcacgacaac acataatgac ccaccaatca catgcctatc atatagtaaa
9240acccagccca tgacccctaa caggggccct ctcagccctc ctaatgacct
ccggcctagc 9300catgtgattt cacttccact ccataacgct cctcatacta
ggcctactaa ccaacacact 9360aaccatatac caatgatggc gcgatgtaac
acgagaaagc acataccaag gccaccacac 9420accacctgtc caaaaaggcc
ttcgatacgg gataatccta tttattacct cagaagtttt 9480tttcttcgca
ggatttttct gagcctttta ccactccagc ctagccccta ccccccaatt
9540aggagggcac tggcccccaa caggcatcac cccgctaaat cccctagaag
tcccactcct 9600aaacacatcc gtattactcg catcaggagt atcaatcacc
tgagctcacc atagtctaat 9660agaaaacaac cgaaaccaaa taattcaagc
actgcttatt acaattttac tgggtctcta 9720ttttaccctc ctacaagcct
cagagtactt cgagtctccc ttcaccattt ccgacggcat 9780ctacggctca
acattttttg tagccacagg cttccacgga cttcacgtca ttattggctc
9840aactttcctc actatctgct tcatccgcca actaatattt cactttacat
ccaaacatca 9900ctttggcttc gaagccgccg cctgatactg gcattttgta
gatgtggttt gactatttct 9960gtatgtctcc atctattgat gagggtctta
ctcttttagt ataaatagta ccgttaactt 10020ccaattaact agttttgaca
acattcaaaa aagagtaata aacttcgcct taattttaat 10080aatcaacacc
ctcctagcct tactactaat aattattaca ttttgactac cacaactcaa
10140cggctacata gaaaaatcca ccccttacga gtgcggcttc gaccctatat
cccccgcccg 10200cgtccctttc tccataaaat tcttcttagt agctattacc
ttcttattat ttgatctaga 10260aattgccctc cttttacccc taccatgagc
cctacaaaca actaacctgc cactaatagt 10320tatgtcatcc ctcttattaa
tcatcatcct agccctaagt ctggcctatg agtgactaca 10380aaaaggatta
gactgaaccg aattggtata tagtttaaac aaaacgaatg atttcgactc
10440attaaattat gataatcata tttaccaaat gcccctcatt tacataaata
ttatactagc 10500atttaccatc tcacttctag gaatactagt atatcgctca
cacctcatat cctccctact 10560atgcctagaa ggaataatac tatcgctgtt
cattatagct actctcataa ccctcaacac 10620ccactccctc ttagccaata
ttgtgcctat tgccatacta gtctttgccg cctgcgaagc 10680agcggtgggc
ctagccctac tagtctcaat ctccaacaca tatggcctag actacgtaca
10740taacctaaac ctactccaat gctaaaacta atcgtcccaa caattatatt
actaccactg 10800acatgacttt ccaaaaaaca cataatttga atcaacacaa
ccacccacag cctaattatt 10860agcatcatcc ctctactatt ttttaaccaa
atcaacaaca acctatttag ctgttcccca 10920accttttcct ccgaccccct
aacaaccccc ctcctaatac taactacctg actcctaccc 10980ctcacaatca
tggcaagcca acgccactta tccagtgaac cactatcacg aaaaaaactc
11040tacctctcta tactaatctc cctacaaatc tccttaatta taacattcac
agccacagaa 11100ctaatcatat tttatatctt cttcgaaacc acacttatcc
ccaccttggc tatcatcacc 11160cgatgaggca accagccaga acgcctgaac
gcaggcacat acttcctatt ctacacccta 11220gtaggctccc ttcccctact
catcgcacta atttacactc acaacaccct aggctcacta 11280aacattctac
tactcactct cactgcccaa gaactatcaa actcctgagc caacaactta
11340atatgactag cttacacaat agcttttata gtaaagatac ctctttacgg
actccactta 11400tgactcccta aagcccatgt cgaagccccc atcgctgggt
caatagtact tgccgcagta 11460ctcttaaaac taggcggcta tggtataata
cgcctcacac tcattctcaa ccccctgaca 11520aaacacatag cctacccctt
ccttgtacta tccctatgag gcataattat aacaagctcc 11580atctgcctac
gacaaacaga cctaaaatcg ctcattgcat actcttcaat cagccacata
11640gccctcgtag taacagccat tctcatccaa accccctgaa gcttcaccgg
cgcagtcatt 11700ctcataatcg cccacgggct tacatcctca ttactattct
gcctagcaaa ctcaaactac 11760gaacgcactc acagtcgcat cataatcctc
tctcaaggac ttcaaactct actcccacta 11820atagcttttt gatgacttct
agcaagcctc gctaacctcg ccttaccccc cactattaac 11880ctactgggag
aactctctgt gctagtaacc acgttctcct gatcaaatat cactctccta
11940cttacaggac tcaacatact agtcacagcc ctatactccc tctacatatt
taccacaaca 12000caatggggct cactcaccca ccacattaac aacataaaac
cctcattcac acgagaaaac 12060accctcatgt tcatacacct atcccccatt
ctcctcctat ccctcaaccc cgacatcatt 12120accgggtttt cctcttgtaa
atatagttta accaaaacat cagattgtga atctgacaac 12180agaggcttac
gaccccttat ttaccgagaa agctcacaag aactgctaac tcatgccccc
12240atgtctaaca acatggcttt ctcaactttt aaaggataac agctatccat
tggtcttagg 12300ccccaaaaat tttggtgcaa ctccaaataa aagtaataac
catgcacact actataacca 12360ccctaaccct gacttcccta attcccccca
tccttaccac cctcgttaac cctaacaaaa 12420aaaactcata cccccattat
gtaaaatcca ttgtcgcatc cacctttatt atcagtctct 12480tccccacaac
aatattcatg tgcctagacc aagaagttat tatctcgaac tgacactgag
12540ccacaaccca aacaacccag ctctccctaa gcttcaaact agactacttc
tccataatat 12600tcatccctgt agcattgttc gttacatggt ccatcataga
attctcactg tgatatataa 12660actcagaccc aaacattaat cagttcttca
aatatctact catcttccta attaccatac 12720taatcttagt taccgctaac
aacctattcc aactgttcat cggctgagag ggcgtaggaa 12780ttatatcctt
cttgctcatc agttgatgat acgcccgagc agatgccaac acagcagcca
12840ttcaagcaat cctatacaac cgtatcggcg atatcggttt catcctcgcc
ttagcatgat 12900ttatcctaca ctccaactca tgagacccac aacaaatagc
ccttctaaac gctaatccaa 12960gcctcacccc actactaggc ctcctcctag
cagcagcagg caaatcagcc caattaggtc 13020tccacccctg actcccctca
gccatagaag gccccacccc agtctcagcc ctactccact 13080caagcactat
agttgtagca ggaatcttct tactcatccg cttccacccc ctagcagaaa
13140atagcccact aatccaaact ctaacactat gcttaggcgc tatcaccact
ctgttcgcag 13200cagtctgcgc ccttacacaa aatgacatca aaaaaatcgt
agccttctcc acttcaagtc 13260aactaggact cataatagtt acaatcggca
tcaaccaacc acacctagca ttcctgcaca 13320tctgtaccca cgccttcttc
aaagccatac tatttatgtg ctccgggtcc atcatccaca 13380accttaacaa
tgaacaagat attcgaaaaa taggaggact actcaaaacc atacctctca
13440cttcaacctc cctcaccatt ggcagcctag cattagcagg aatacctttc
ctcacaggtt 13500tctactccaa agaccacatc atcgaaaccg caaacatatc
atacacaaac gcctgagccc 13560tatctattac tctcatcgct acctccctga
caagcgccta tagcactcga ataattcttc 13620tcaccctaac aggtcaacct
cgcttcccca cccttactaa cattaacgaa aataacccca 13680ccctactaaa
ccccattaaa cgcctggcag ccggaagcct attcgcagga tttctcatta
13740ctaacaacat ttcccccgca tcccccttcc aaacaacaat ccccctctac
ctaaaactca 13800cagccctcgc tgtcactttc ctaggacttc taacagccct
agacctcaac tacctaacca 13860acaaacttaa aataaaatcc ccactatgca
cattttattt ctccaacata ctcggattct 13920accctagcat cacacaccgc
acaatcccct atctaggcct tcttacgagc caaaacctgc 13980ccctactcct
cctagaccta acctgactag aaaagctatt acctaaaaca atttcacagc
14040accaaatctc cacctccatc atcacctcaa cccaaaaagg cataattaaa
ctttacttcc 14100tctctttctt cttcccactc atcctaaccc tactcctaat
cacataacct attcccccga 14160gcaatctcaa ttacaatata tacaccaaca
aacaatgttc aaccagtaac tactactaat 14220caacgcccat aatcatacaa
agcccccgca ccaataggat cctcccgaat caaccctgac 14280ccctctcctt
cataaattat tcagcttcct acactattaa agtttaccac aaccaccacc
14340ccatcatact ctttcaccca cagcaccaat cctacctcca tcgctaaccc
cactaaaaca 14400ctcaccaaga cctcaacccc tgacccccat gcctcaggat
actcctcaat agccatcgct 14460gtagtatatc caaagacaac catcattccc
cctaaataaa ttaaaaaaac tattaaaccc 14520atataacctc ccccaaaatt
cagaataata acacacccga ccacaccgct aacaatcaat 14580actaaacccc
cataaatagg agaaggctta gaagaaaacc ccacaaaccc cattactaaa
14640cccacactca acagaaacaa agcatacatc attattctcg cacggactac
aaccacgacc 14700aatgatatga aaaaccatcg ttgtatttca actacaagaa
caccaatgac cccaatacgc 14760aaaactaacc ccctaataaa attaattaac
cactcattca tcgacctccc caccccatcc 14820aacatctccg catgatgaaa
cttcggctca ctccttggcg cctgcctgat cctccaaatc 14880accacaggac
tattcctagc catgcactac tcaccagacg cctcaaccgc cttttcatca
14940atcgcccaca tcactcgaga cgtaaattat ggctgaatca tccgctacct
tcacgccaat 15000ggcgcctcaa tattctttat ctgcctcttc ctacacatcg
ggcgaggcct atattacgga 15060tcatttctct actcagaaac ctgaaacatc
ggcattatcc tcctgcttgc aactatagca 15120acagccttca taggctatgt
cctcccgtga ggccaaatat cattctgagg ggccacagta 15180attacaaact
tactatccgc catcccatac attgggacag acctagttca atgaatctga
15240ggaggctact cagtagacag tcccaccctc acacgattct ttacctttca
cttcatcttg 15300cccttcatta ttgcagccct agcaacactc cacctcctat
tcttgcacga aacgggatca 15360aacaaccccc taggaatcac ctcccattcc
gataaaatca ccttccaccc ttactacaca 15420atcaaagacg ccctcggctt
acttctcttc cttctctcct taatgacatt aacactattc 15480tcaccagacc
tcctaggcga cccagacaat tataccctag ccaacccctt aaacacccct
15540ccccacatca agcccgaatg atatttccta ttcgcctaca caattctccg
atccgtccct 15600aacaaactag gaggcgtcct tgccctatta ctatccatcc
tcatcctagc aataatcccc 15660atcctccata tatccaaaca acaaagcata
atatttcgcc cactaagcca atcactttat 15720tgactcctag ccgcagacct
cctcattcta acctgaatcg gaggacaacc agtaagctac 15780ccttttacca
tcattggaca agtagcatcc gtactatact tcacaacaat cctaatccta
15840ataccaacta tctccctaat tgaaaacaaa atactcaaat gggcctgtcc
ttgtagtata 15900aactaataca ccagtcttgt aaaccggaga tgaaaacctt
tttccaagga caaatcagag 15960aaaaagtctt taactccacc attagcaccc
aaagctaaga ttctaattta aactattctc 16020tgttctttca tggggaagca
gatttgggta ccacccaagt attgactcac ccatcaacaa 16080ccgctatgta
tttcgtacat tactgccagc caccatgaat attgtacggt accataaata
16140cttgaccacc tgtagtacat aaaaacccaa tccacatcaa aaccccctcc
ccatgcttac 16200aagcaagtac agcaatcaac cctcaactat cacacatcaa
ctgcaactcc aaagccaccc 16260ctcacccact aggataccaa caaacctacc
cacccttaac agtacatagt acataaagcc 16320atttaccgta catagcacat
tacagtcaaa tcccttctcg tccccatgga tgacccccct 16380cagatagggg
tcccttgacc accatcctcc gtgaaatcaa tatcccgcac aagagtgcta
16440ctctcctcgc tccgggccca taacacttgg gggtagctaa agtgaactgt
atccgacatc 16500tggttcctac ttcagggtca taaagcctaa atagcccaca
cgttcccctt aaataagaca 16560tcacgatg 165685214DNAArtificial
SequenceDescription of Artificial Sequence Synthetic
oligonucleotide 52aaaaattttc ccgg 14
* * * * *