U.S. patent application number 11/177646 was filed with the patent office on 2005-07-02 and published on 2006-09-21 for re-sequencing pathogen microarray.
Invention is credited to Brian K. Agan, Eric H. Hanson, Jennifer Johnson, Russell P. Kruzelock, Baochuan Lin, Robb K. Rowley, Donald Seto, David A. Stenger, Dzung C. Thach, Clark J. Tibbetts, Gary J. Vora, Elizabeth A. Walter, Zheng Wang.
Application Number | 20060210967 11/177646 |
Family ID | 36916882 |
Filed Date | 2005-07-02 |
United States Patent Application | 20060210967 |
Kind Code | A1 |
Agan; Brian K.; et al. | September 21, 2006 |
Re-sequencing pathogen microarray
Abstract
The present invention relates to pathogen detection and
identification by use of DNA resequencing microarrays. The present
invention also provides resequencing microarray chips for
differential diagnosis and serotyping of pathogens present in a
biological sample. The present invention further provides methods
of detecting the presence and identity of pathogens present in a
biological sample.
Inventors: |
Agan; Brian K.; (San Antonio, TX);
Hanson; Eric H.; (Alexandria, VA);
Kruzelock; Russell P.; (San Antonio, TX);
Lin; Baochuan; (Bethesda, MD);
Rowley; Robb K.; (Alexandria, VA);
Seto; Donald; (Manassas, VA);
Stenger; David A.; (Herndon, VA);
Johnson; Jennifer; (Finchville, KY);
Tibbetts; Clark J.; (Sperryville, VA);
Thach; Dzung C.; (Annandale, VA);
Vora; Gary J.; (Washington, DC);
Walter; Elizabeth A.; (San Antonio, TX);
Wang; Zheng; (Burke, VA) |
Correspondence Address: |
DEPARTMENT OF THE AIR FORCE
AFMC LO/JAZ
2240 B ST., RM. 100
WRIGHT-PATTERSON AFB, OH 45433-7109
US |
Family ID: | 36916882 |
Appl. No.: | 11/177646 |
Filed: | July 2, 2005 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
60590931 | Jul 2, 2004 | |
60609918 | Sep 15, 2004 | |
60631437 | Nov 29, 2004 | |
60631460 | Nov 29, 2004 | |
60691768 | Jun 16, 2005 | |
Current U.S. Class: | 435/5; 435/287.2; 435/6.11 |
Current CPC Class: | C12Q 1/6888 20130101; C12Q 1/6874 20130101; C12Q 1/6893 20130101; C12Q 1/689 20130101; C12Q 1/701 20130101; G16B 30/00 20190201 |
Class at Publication: | 435/005; 435/006; 435/287.2 |
International Class: | C12Q 1/70 20060101 C12Q001/70; C12Q 1/68 20060101 C12Q001/68; C12M 1/34 20060101 C12M001/34 |
Government Interests
STATEMENT REGARDING FEDERALLY FUNDED PROJECT
[0002] The United States Government owns rights in the present
invention pursuant to funding from the Defense Threat Reduction
Agency (DTRA; Interagency Cost Reimbursement Order (IACRO
#02-4118), MIPR numbers 01-2817, 02-2292, 02-2219, and 02-2887),
the Office of the U.S. Air Force Surgeon General (HQ USAF SGR; MIPR
Numbers NMIPR035203650, NMIPRONMIEPRO35203881,
NMIPRONMIPRO35203881), the U.S. Army Medical Research Acquisition
Activity (Contract # DAMD17-03-2-0089), the Defense Advanced
Research Projects Agency (DARPA; MIPR Number M189/02), and the
Office of Naval Research (NRL Work Unit 6456).
Claims
1. A resequencing DNA microarray comprising multiple groups of
oligonucleotide primers of a length ranging from 13 to 70
nucleotides immobilized to a solid phase support, wherein each
group of oligonucleotide primers is: (a) selected to span a
particular region of a distinct reference sequence, (b) occupies a
discrete area of the array termed a tiled region, and (c) comprises
at least four sets of primers arranged in a parallel fashion within
said discrete area of the array, wherein said four sets of primers
are selected as follows: i) a first primer that is exactly
complementary to the reference sequence; and ii) three additional
primers, each of which is identical to the first primer but for the
nucleotide at a central position, which is different in each of the
three sets such that all four conventional nucleotide bases are
present on said array.
2. The resequencing DNA microarray according to claim 1, wherein
the length of the oligonucleotide primers is 25 nucleotides.
3. The resequencing DNA microarray according to claim 1, wherein
the region of the reference sequence that is spanned by the
oligonucleotide primer moves by (n+1) nucleotides across the
reference sequence for each adjacent tiled region across the
microarray surface.
4. The resequencing DNA microarray according to claim 1, wherein
the resequencing DNA microarray contains 18×18 micron
features.
5. The resequencing DNA microarray according to claim 1, wherein
the resequencing DNA microarray contains 8×8 micron
features.
6. The resequencing DNA microarray according to claim 1, wherein
the sequences selected for tiling are a single gene or subsequence
that may represent a much broader class of organism genus, species
and subspecies.
7. The resequencing DNA microarray according to claim 1, wherein
the sequences selected for tiling are "prototypes" representing
genotypes of pathogen families.
8. The resequencing DNA microarray according to claim 1, wherein
the sequences selected for tiling are "prototypes" representing a
family or group of adenoviruses.
9. The resequencing DNA microarray according to claim 1, wherein
the sequences selected for tiling are "prototypes" representing a
family or group of influenza viruses.
10. The resequencing DNA microarray according to claim 1, wherein
the sequences selected for tiling are a single gene or subsequence
unique to an individual pathogenic strain.
11. The resequencing DNA microarray according to claim 1, wherein
the sequences selected for tiling encode a drug-resistance
marker.
12. The resequencing DNA microarray according to claim 1, wherein
the resequencing DNA microarray is RPMV1.
13. The resequencing DNA microarray according to claim 1, wherein
the resequencing DNA microarray is RPMV2.
14. The resequencing DNA microarray according to claim 1, wherein
said resequencing DNA microarray contains at least one reference
sequence for a common pathogen and at least one biological
terrorism agent.
15. A kit comprising: (a) the resequencing DNA microarray according
to claim 1, and (b) reagents suitable for specific hybridization of
target sequences to the probe sequences present on said
resequencing DNA microarray.
16. A method of detecting the presence of a drug-resistance marker
in a microorganism, wherein the method comprises: (a) providing the
resequencing DNA microarray according to claim 1; (b) contacting an
unknown sample comprising genetic material to said resequencing DNA
microarray; (c) hybridizing the contents of said unknown sample to
the probe sequences immobilized on said resequencing DNA microarray
under suitable conditions and for a suitable time; and (d)
detecting the presence of a drug-resistance marker in a
microorganism in said unknown sample.
17. The method according to claim 16, wherein said method further
comprises determining the identity of the drug-resistance marker by
sequence comparison between the DNA sequence of the drug-resistance
marker identified by said method and known resistance markers.
18. The method according to claim 16, wherein said hybridizing is
for a time ranging from 15 minutes to 24 hours.
19. The method according to claim 16, wherein the unknown sample is
a biological sample.
20. The method according to claim 19, wherein said biological
sample is selected from the group consisting of a nasal wash
specimen, a nasal aspirate, a throat swab, a blood sample, a sputum
sample, blood cells, a tissue sample, a fine needle biopsy sample,
a urine specimen, a peritoneal fluid sample, a visceral fluid
sample, and a pleural fluid sample, or cells therefrom.
21. The method according to claim 16, wherein said unknown sample
is an environmental sample selected from the group consisting of a
soil sample, an air sample and a water sample.
22. The method according to claim 16, wherein prior to said
hybridizing, the unknown sample is subjected to at least one
process selected from the group consisting of: (i) isolation of the
genetic material within said sample, (ii) enrichment for target
sequences of interest within said sample, (iii) amplification of
the genetic material contained within said sample, (iv) labeling
the genetic material within said sample, and (v) subtractive
hybridization.
23. The method according to claim 16, wherein prior to said
hybridizing one or more target nucleic acids of interest in the
unknown sample is amplified by at least one method selected from
the group consisting of specific reverse transcription, PCR,
multiplex PCR and random PCR.
24. The method according to claim 16, wherein prior to said
hybridizing one or more target nucleic acids of interest in the
unknown sample, the unknown sample is subjected to at least one
random amplification strategy selected from the group consisting of
random primed, isothermal Klenow polymerase-based, Φ29 DNA
polymerase-based, tandem amplification, multiplex PCR, and total
amplification.
25. The method according to claim 16, wherein the genetic material
present in said unknown sample has been enriched.
26. The method according to claim 16, wherein one or more target
nucleic acids of interest present in said unknown sample are
enriched by subtraction of the background nucleic acids from said
sample.
27. The method according to claim 16, wherein one or more target
nucleic acids of interest present in said unknown sample are
enriched by reverse-transcriptase subtractive hybridization.
28. The method according to claim 16, wherein one or more target
nucleic acids of interest present in said unknown sample are
enriched by selective removal of said target nucleic acids from a
mixture of nucleic acids present in said unknown sample.
29. The method according to claim 16, wherein said detecting is by:
(i) making base calls based on the hybridization response between
nucleic acids of interest in said unknown sample and the probe DNA
on the resequencing DNA microarray, and (ii) determining the
sequence of the corresponding full-length gene or genomic fragment
by comparing the sequence of the hybridized region to sequences
present in a sequence database.
30. The method according to claim 29, wherein the base calls are
made by Affymetrix GDAS software under "permissive" settings.
31. The method according to claim 29, wherein sequence
determination is by Resequencing Pathogen Identifier (REPI)
software.
32. The method according to claim 29, wherein the sequence database
is GenBank.
33. A method of routine diagnosis of common respiratory pathogens
by performing the method according to claim 16, wherein said
unknown sample is obtained from a subject in need of routine
diagnosis of common respiratory pathogens.
34. The method according to claim 33, wherein said unknown sample
is obtained during a physical examination.
35. A method of surveillance of common respiratory pathogens by
performing the method according to claim 16, wherein said unknown
sample is obtained from a subject in need of surveillance of common
respiratory pathogens.
36. The method according to claim 35, wherein said subject in need
of surveillance of common respiratory pathogens has a history of
respiratory pathogen infection.
37. A method of surveillance of one or more biological terrorism
agents by performing the method according to claim 16, wherein said
unknown sample is obtained from a subject in need of surveillance
of said biological terrorism agents.
38. The method according to claim 37, wherein said subject in need
of surveillance of biological terrorism agents is suspected of
being exposed to said biological terrorism agents.
39. A method of detecting the presence of a microorganism belonging
to a particular class of organism species or subspecies in an
unknown sample, wherein the method comprises: (a) providing a
resequencing DNA microarray according to claim 1; (b) contacting
the unknown sample comprising genetic material to said resequencing
DNA microarray; (c) hybridizing the contents of said unknown sample
to the probe sequences immobilized on said resequencing DNA
microarray under suitable conditions and for a suitable time; and
(d) detecting the presence of a microorganism belonging to a
particular class of organism species or subspecies in said unknown
sample.
40. The method according to claim 39, wherein said method further
comprises determining the identity of the particular class of
organism species or subspecies by sequence comparison between the
DNA sequence identified by said method and known sequences present
in a genomic database.
41. The method according to claim 40, wherein said determining
comprises detecting a sufficient amount of pathogen sequence in
order to allow a forensic assessment of the possible source of
pathogens.
42. The method according to claim 39, wherein said hybridizing is
for a time ranging from 15 minutes to 24 hours.
43. The method according to claim 39, wherein the unknown sample is
a biological sample.
44. The method according to claim 43, wherein said biological
sample is selected from the group consisting of a nasal wash
specimen, a nasal aspirate, a throat swab, a blood sample, a sputum
sample, blood cells, a tissue sample, a fine needle biopsy sample,
a urine specimen, a peritoneal fluid sample, a visceral fluid
sample, and a pleural fluid sample, or cells therefrom.
45. The method according to claim 39, wherein said unknown sample
is an environmental sample selected from the group consisting of a
soil sample, an air sample, and a water sample.
46. The method according to claim 39, wherein prior to said
hybridizing, the unknown sample is subjected to at least one
process selected from the group consisting of: (i) isolation of the
genetic material within said sample, (ii) enrichment for target
sequences of interest within said sample, (iii) amplification of
the genetic material contained within said sample, (iv) labeling
the genetic material within said sample, and (v) subtractive
hybridization.
47. The method according to claim 39, wherein prior to said
hybridizing one or more target nucleic acids of interest in the
unknown sample is amplified by at least one method selected from
the group consisting of specific reverse transcription, PCR,
multiplex PCR, and random PCR.
48. The method according to claim 39, wherein prior to said
hybridizing one or more target nucleic acids of interest in the
unknown sample, the unknown sample is subjected to at least one
random amplification strategy selected from the group consisting of
random primed, isothermal Klenow polymerase-based, Φ29 DNA
polymerase-based, tandem amplification, multiplex PCR, and total
amplification.
49. The method according to claim 39, wherein the genetic material
present in said unknown sample has been enriched.
50. The method according to claim 39, wherein one or more target
nucleic acids of interest present in said unknown sample are
enriched by subtraction of the background nucleic acids from said
sample.
51. The method according to claim 39, wherein one or more target
nucleic acids of interest present in said unknown sample are
enriched by reverse-transcriptase subtractive hybridization.
52. The method according to claim 39, wherein one or more target
nucleic acids of interest present in said unknown sample are
enriched by selective removal of said target nucleic acids from a
mixture of nucleic acids present in said unknown sample.
53. The method according to claim 39, wherein said detecting is by:
(i) making base calls based on the hybridization response between
nucleic acids of interest in said unknown sample and the probe DNA
on the resequencing DNA microarray, and (ii) determining the
sequence of the corresponding full-length gene or genomic fragment
by comparing the sequence of the hybridized region to sequences
present in a sequence database.
54. The method according to claim 53, wherein the base calls are
made by the Affymetrix GDAS software under "permissive"
settings.
55. The method according to claim 53, wherein sequence
determination is by Resequencing Pathogen Identifier (REPI)
software.
56. The method according to claim 53, wherein the sequence database
is GenBank.
57. A method of routine diagnosis of common respiratory pathogens
by performing the method according to claim 39, wherein said
unknown sample is obtained from a subject in need of routine
diagnosis of common respiratory pathogens.
58. The method according to claim 57, wherein said unknown sample
is obtained during a physical examination.
59. A method of surveillance of common respiratory pathogens by
performing the method according to claim 39, wherein said unknown
sample is obtained from a subject in need of surveillance of common
respiratory pathogens.
60. The method according to claim 59, wherein said subject in need
of surveillance of common respiratory pathogens has a history of
respiratory pathogen infection.
61. A method of surveillance of one or more biological terrorism
agents by performing the method according to claim 39, wherein said
unknown sample is obtained from a subject in need of surveillance
of said biological terrorism agents.
62. The method according to claim 61, wherein said subject in need
of surveillance of biological terrorism agents is suspected of
being exposed to said biological terrorism agents.
63. A method of estimating the relative amount of a pathogen in a
biological sample containing the same comprising: (a) providing a
resequencing DNA microarray according to claim 1; (b) contacting
said biological sample to said resequencing DNA microarray; (c)
hybridizing the contents of said unknown sample to the probe
sequences immobilized on said resequencing DNA microarray under
suitable conditions and for a suitable time; and (d) quantifying
the presence and/or identity of a drug-resistance marker in a
microorganism or a microorganism belonging to a particular class of
organism species or subspecies in said unknown sample.
64. The method according to claim 63, wherein said quantifying is
by determining the absolute intensity of the hybridization signals
on said resequencing DNA microarray.
65. The method according to claim 63, wherein said quantifying is
by determining the percentage of base calls, both as a percentage
of the total tile region size and as a percentage of base calls
within a selected subsequence satisfying a sliding window
algorithm.
66. The method according to claim 65, wherein said quantifying is
by Resequencing Pathogen Identifier (REPI) software.
67. A genomic sequence of an adenovirus strain selected from the
group consisting of Ad3, Ad3FS_navy, Ad4, Ad4vaccine, Ad4FS_navy,
Ad4FS_AF, Ad5FS, Ad7, Ad7FS_navy, Ad7 vaccine, Ad16, Ad1, and Ad21,
and fragments thereof.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] The present application claims priority to U.S. provisional
Application Ser. No. 60/590,931, filed on Jul. 2, 2004, U.S.
provisional Application Ser. No. 60/609,918 filed on Sep. 15, 2004,
U.S. provisional Application Ser. No. 60/631,437 filed on Nov. 29,
2004, U.S. provisional Application Ser. No. 60/631,460 filed on
Nov. 29, 2004 and U.S. provisional Application Ser. No. 60/691,768
filed on Jun. 16, 2005. This application is also related to U.S.
non-provisional application Ser. No. ______, titled
"Computer-Implemented Biological Sequence Identifier System and
Method," filed along with this application on Jul. 2, 2005. The
entire contents of these applications are incorporated herein by
reference.
REFERENCE TO SEQUENCE LISTING
[0003] The present application includes a sequence listing on an
accompanying compact disk containing a single file named
272918US59SDSt2-5.txt, created on Jul. 1, 2005, 639 KB in size, and
additionally labeled:
[0004] "Inventors: Brian K. Agan, Eric H. Hanson, Russell P.
Kruzelock, Baochuan Lin et al.
[0005] Invention: "Re-Sequencing Pathogen Microarray"
[0006] The entire contents of that accompanying compact disk are
incorporated by reference into this application.
BACKGROUND OF THE INVENTION
[0007] 1. Field of the Invention
[0008] The present invention provides pathogen detection by use of
DNA resequencing microarrays. Preferably, the present invention
provides for simultaneous detection of multiple pathogens. The
present invention also provides resequencing microarrays and
microarray chips for differential diagnosis and fine-scale
discrimination between closely related pathogens present in a
biological sample. The present invention further provides methods
of detecting the presence and identity of pathogens present in a
biological sample. The invention enables diagnosis and surveillance
of known pathogen sequences and pathogens that may be identified
due to unanticipated sequence variations, as well as mixtures of
such pathogens. Resequencing, combined with several amplification
strategies, allows simultaneous clinical diagnosis and performance
of traditional surveillance assays for serotyping, antibiotic
resistance profiling, genetic drift/shift analysis, forensics, and
rapid detection of biological terrorism events.
[0009] 2. Discussion of the Background
[0010] As we move through the biotechnology age fostered by the
human genome project, a premium has been placed on the development
of high throughput methodologies to obtain and analyze sequence
information. To meet this demand, the multifunctional DNA
microarray platform has gained prominence, leading to an explosive
growth in application methods using the same.
[0011] More importantly, the evolution of world events and the
emergence of bioterrorism in mainstream society have led to a
growing sentiment amongst the scientific community and lay people
alike that new, rapid, and accurate techniques for biological
threat identification and eradication must be developed. The
concept of a microarray used for broad-spectrum pathogen
identification has considerable and obvious appeal to both medical
practice and national defense. It is within this framework that the
present inventors have endeavored.
[0012] Heretofore, for the purpose of pathogen identification,
approaches generally rely on the ability of immobilized "probe" DNA
sequences on the surfaces of microarrays to hybridize with
complementary genomic "target" that uniquely identifies a
particular category or specific strain of microbial pathogen.
Various microarray technologies have been developed for this
purpose, varying in the density of probes and the time ranges
required for assay completion.
[0013] One technical challenge for pathogen detection with
microarrays arises due to the difficulty in obtaining samples with
a sufficient quantity of pathogen nucleic acid. Thus, for a
majority of sample types, some sort of target amplification will
likely be required to provide sufficient copies of pathogen gene
markers for detection by microarray hybridization. Unfortunately,
conventional methods for this amplification do not scale well in
comparison to the number of probes that can be placed on a
microarray chip. However, the most commonly employed means of
providing sufficient quantities of genomic target to detect
hybridization relies upon genotypic identification methods that
utilize molecular biology-based techniques, such as the polymerase
chain reaction (PCR). These techniques offer several potential
advantages over conventional microbiological approaches. Nucleic
acid amplification strategies base pathogen identification on the
detection of genetic information contained within the organism,
such that culturing the organism is not required.
[0014] Although PCR-based assays are sensitive, accurate, and
rapid, these methods also introduce a new set of problems.
Successful identification depends almost entirely on appropriately
chosen primer sets, because PCR-based testing requires assumptions
about the exact sequences pertaining to the identity of the target
organism(s). Consequently, there is a critical need for advanced
diagnostic systems that can detect both assumed and unanticipated
pathogen sequences. DNA microarrays, which enable the simultaneous
interrogation of thousands of genetic elements, address this
crucial need. Here, the term "microarray" refers to any type of
planar substrate or solid beads presenting a high multiplicity
(10² to 10⁶) of individual sites, each presenting nucleic
acid probes designed to selectively capture complementary strands
of target (i.e. pathogen or host) nucleic acid.
[0015] However, the majority of pathogen identification microarrays
described in the literature are prepared using oligonucleotides that
are robotically spotted onto derivatized glass surfaces (typically
3×1 inch microscope slides). This approach allows the most
flexibility with regard to the size of the oligonucleotides that
are deposited, ranging from 20-mers to cDNA PCR products of several
thousand base pairs (bp). With few exceptions, the detection event
is an increased level of fluorescence originating from a spot
following hybridization of a fluorophore-labeled target nucleic
acid.
[0016] Short (14-25 mer) oligonucleotides, immobilized inside
acrylamide pads, have been applied extensively to pathogen
identification (Strizhkov et al., 2000; Vasiliskov et al., 1999) in
a collaborative effort between Argonne National Lab (DOE, USA) and
the Engelhard Institute of Molecular Biology (Moscow, RU) under the
leadership of Andrei Mirzabekov. In addition, low-density
microarrays (several hundred features per 3×1 inch microscope
slide) have been used for determination of drug resistance
determinants (Volokhov et al., 2003). One distinguishing aspect of
this body of work is the use of three-dimensional polymer matrices
for probe immobilization instead of two-dimensional planar
surfaces.
[0017] More recently, Cherkasova et al. have described the use of
glass-immobilized short oligonucleotide spotted microarrays to map
poliovirus mutations using overlapping 14-25 mer probes (Cherkasova
et al., 2003). Two variations of this approach have been used: (1)
Microarrays for Resequencing and Sequence Heterogeneity (MARSH)
assay, and (2) Microarray Analysis of Viral Recombination (MAVR)
assay. MARSH uses a set of overlapping (at half length) nucleotide
probes for individual gene sequences. Hybridization patterns allow
the detection of single point mutations or substitution/deletion
events to a resolution of half probe lengths (e.g. 7-10 bp) but
do not allow for exact determination of position(s) or the nature
of the mutation. Accordingly, conventional DNA sequencing
technologies must be employed subsequently to determine these
changes. MAVR uses organism-specific oligonucleotide probes that
cover the entire genome at ~150 nt spacings and is used to
detect large scale genetic recombinations.
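The half-length overlap described for MARSH can be illustrated with a short Python sketch. This is not the published assay design: the function names, the 20-nt probe length, and the simplification that any probe whose window contains a substitution loses signal are all assumptions made for illustration. The sketch shows why two adjacent affected probes narrow a change down only to a half-probe-length interval rather than to an exact position.

```python
def half_overlap_probes(reference, probe_len=20):
    """Return (start, probe) pairs, stepping by half a probe length."""
    step = probe_len // 2
    last_start = len(reference) - probe_len
    return [(s, reference[s:s + probe_len])
            for s in range(0, last_start + 1, step)]


def affected_starts(reference, sample, probe_len=20):
    """Starts of probes whose window differs between reference and sample.

    Assumption: substitutions only, so both sequences have equal length.
    """
    if len(reference) != len(sample):
        raise ValueError("this sketch handles substitutions only")
    return [s for s, probe in half_overlap_probes(reference, probe_len)
            if sample[s:s + probe_len] != probe]


def localize(starts, probe_len=20):
    """Half-open interval shared by all affected probes, or None."""
    if not starts:
        return None
    lo = max(starts)
    hi = min(s + probe_len for s in starts)
    return (lo, hi) if lo < hi else None


# Toy 60-nt reference with one substitution at index 25 (C -> A).
reference = "ACGT" * 15
sample = reference[:25] + "A" + reference[26:]

hits = affected_starts(reference, sample)   # probes starting at 10 and 20
interval = localize(hits)                   # shared 10-nt interval
```

In this toy case the change is confined to a 10-nt interval, half of the 20-nt probe length, which is consistent with the need for follow-up conventional sequencing noted above.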
[0018] The DeRisi group at UCSF pioneered the use of long (70-mer)
oligonucleotide probe microarrays for broad-spectrum pathogen
identification (Wang et al., 2002; Wang et al., 2003). The use of
long (70 nt) oligonucleotides bears implicit advantages and
disadvantages. One advantage is that higher degrees of sensitivity
can usually be achieved with 70-mer probes compared to shorter ones
(e.g. 20-25 mers). However, specificity is reduced because 70-mer
target/probe hybridizations are generally insensitive to
significant numbers (e.g., 7-10) of single base mismatches, whereas
shorter probes provide much greater sequence specificity.
[0019] DeRisi's group described the use of spotted microarrays
having 1,600 different 70-mer oligonucleotide probes to identify a
variety of viruses responsible for common respiratory infections
(Wang et al., 2002). The probes were selected for each pathogen
using an algorithm that located discriminatory sequences from a
list of known viral genomes. A serial combination of a previously
described (Bohlander et al., 1992) method and subsequent PCR/Klenow
fragment-based amplification was used to achieve non-biased
amplification of both viral RNA and DNA, allowing generation of
sufficient amounts of target amplicons for successful microarray
hybridization and detection via fluorescent label. (N.B. This
protocol was placed into the public domain via the DeRisi lab
website, http://derisilab.ucsf.edu.) The time required from sample
preparation to obtained result was approximately 24 hours. Because
exact sequence information was not attainable from such arrays,
pathogen identifications were made on the basis of a hybridization
pattern that could be empirically determined for each pathogen or
strain. In a related report from the same group (Wang et al., 2003)
similar microarrays were prepared using highly conserved sequences
in an effort to capture as many microbial species as possible from
a sample. Following physical removal of the pathogen sequences from
the microarray, the sequences are cloned and sequenced using
conventional DNA sequencing technologies. No measure of
analytical/clinical sensitivity or specificity for pathogen
detection in clinical specimens was provided in the work from the
DeRisi group.
[0020] In contrast to the above-mentioned approaches using spotted
microarrays, Affymetrix, Inc. (Santa Clara, Calif.) uses
high-density probe fabrication technology to construct "tiled"
microarrays using 4 probes each in both the sense and anti-sense
directions for each nucleotide base to be resequenced. Thus, single
base substitutions are directly detected by the hybridization
pattern (for additional information see Affymetrix CustomSeq design
manual). Several groups described the use of tiled microarrays for
pathogen genotyping. Kozal et al. (Kozal et al., 1996) utilized this
type of microarray to measure mutational drift in HIV, while Gingeras
et al. (Gingeras et al., 1998; Troesch et al., 1999) used a tiled array of
65,000 oligomer probes to resequence and accurately identify 70
clinical isolates of 27 mycobacterial species and 15
rifampin-resistant M. tuberculosis strains. More recently, Andersen
et al. (Wilson et al., 2002b) described the use of tiled Affymetrix
microarrays for the identification of biological warfare agents.
Their approach relied entirely on the use of specific PCR reactions
performed in parallel to generate sufficient pathogen target DNA
for microarray hybridization. In all cases listed above, specific
PCR primers were used to amplify DNA targets prior to microarray
hybridizations, through the use of conserved primer sites, or in
the work of Andersen et al. (Wilson et al., 2002a; Wilson et al.,
2002b), by carrying out up to ~150 different PCR reactions in
multi-well format and pooling the amplicons.
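The tiled design described above (four probes per interrogated base, in both the sense and anti-sense directions) can be sketched in Python as follows. This is a minimal illustration, not the Affymetrix design software or its base-calling algorithm: the function names, the convention of writing probes in the orientation of the strand they copy, and the `min_ratio` threshold in `call_base` are assumptions introduced here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def four_probes(window):
    """Four probes for one window, keyed by the base at the central position."""
    mid = len(window) // 2
    return {base: window[:mid] + base + window[mid + 1:] for base in "ACGT"}


def tile(reference, probe_len=25):
    """Yield (centre_index, sense_probes, antisense_probes) per interrogated base."""
    if probe_len % 2 == 0:
        raise ValueError("probe_len must be odd so that a central base exists")
    half = probe_len // 2
    for centre in range(half, len(reference) - half):
        window = reference[centre - half:centre + half + 1]
        yield centre, four_probes(window), four_probes(reverse_complement(window))


def call_base(intensities, min_ratio=1.5):
    """Call the brightest probe's base, or 'N' if it does not clearly dominate.

    Hypothetical rule: the top intensity must be at least min_ratio times
    the runner-up.
    """
    ranked = sorted(intensities.items(), key=lambda kv: kv[1], reverse=True)
    (best_base, best), (_, second) = ranked[0], ranked[1]
    if best <= 0 or best < min_ratio * second:
        return "N"
    return best_base


# Toy example with 5-nt probes on a 9-nt reference: 5 interrogated positions,
# 8 probes (4 sense + 4 antisense) per position.
tiles = list(tile("ACGTACGTA", probe_len=5))
```

Under this layout a reference of N bases tiled with 25-mers yields N - 24 interrogated positions. A single-base substitution in the target shifts the brightest signal from the reference-matching probe to one of the three alternates at that position.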
[0021] U.S. Pat. No. 6,228,575 B1 describes the same data as
described by Gingeras (Gingeras et al., 1998) and Troesch (Troesch
et al., 1999). In this patent, which is incorporated herein by
reference in its entirety, target pathogen sequences are tiled onto
arrays. Because several types of variations (esp.
insertion/deletion or frequent multiple substitutions) in pathogen
sequence can perturb hybridization patterns, Gingeras et al. used
differential measures of specific pathogen hybridization patterns
to identify individual mycobacterial variants. That is,
identifications required a priori knowledge of a differential
hybridization pattern that was empirically determined in ground
truth experiments.
[0022] As stated above, there is a critical need for advanced
diagnostic systems that can detect known and unanticipated pathogen
genomic sequences as well as variations of those sequences. More
particularly, there remains a critical demand for DNA microarray
techniques that are fast and reliable, but are free from the
systemic bias flowing from the specific PCR based methods that have
heretofore been employed.
SUMMARY OF THE INVENTION
[0023] It is an object of the present invention to provide the
following preferred embodiments: [0024] A. In an embodiment of the
present invention is a resequencing DNA microarray chip of multiple
groups of oligonucleotide primers of a length ranging from 13 to 70
nucleotides immobilized to a solid phase support, wherein each
group of oligonucleotide primers is selected to span a particular
region of the reference sequence, occupying a discrete area of the
array (e.g. a tile), and comprising at least four sets of primers
arranged in a parallel fashion on the chip: 1) a first set that is
exactly complementary to the reference sequence; and 2) three
additional sets of primers, each of which is identical to the first
set of primers but for the nucleotide at a central position, which
is different in each of the three sets such that all four
conventional nucleotide bases are present on said array. [0025] a.
In a preferred aspect of this embodiment, the length of the
oligonucleotide primers is 25 nucleotides. [0026] b. In a preferred
aspect of this embodiment, the region of the reference sequence
that is spanned by the oligonucleotide primer moves by (n+1)
nucleotides across the reference sequence for each adjacent tiled
region across the microarray surface. [0027] c. In a preferred
aspect of this embodiment, the resequencing DNA microarray contains
18×18 micron features. [0028] d. In a preferred aspect of
this embodiment, the resequencing DNA microarray contains 8×8
micron features. [0029] e. In a preferred aspect of this
embodiment, the sequences selected for tiling are a single gene or
subsequence that may represent a much broader class of organism
genus, species and subspecies. [0030] f. In a preferred aspect of
this embodiment, the sequences selected for tiling are "prototypes"
representing genotypes of pathogen families. [0031] g. In a
preferred aspect of this embodiment, the sequences selected for
tiling are "prototypes" representing a family or group of
adenoviruses. [0032] h. In a preferred aspect of this embodiment,
the sequences selected for tiling are "prototypes" representing a
family or group of influenza viruses. [0033] i. In a preferred
aspect of this embodiment, the sequences selected for tiling are a
single gene or subsequence unique to an individual pathogenic
strain. [0034] j. In a preferred aspect of this embodiment, the
sequences selected for tiling encode a drug-resistance marker.
[0035] k. In a preferred aspect of this embodiment, the
resequencing DNA microarray is a Version 1 Respiratory Pathogen
Microarray (RPMV1). [0036] l. In a preferred aspect of this
embodiment, the resequencing DNA microarray is a Version 2
Respiratory Pathogen Microarray (RPMV2). [0037] m. In a preferred
aspect of this embodiment, at least one common pathogen and at
least one biological terrorism agent is represented on the same
chip. [0038] n. In a preferred aspect of this embodiment, the
resequencing DNA microarray embraces any combination of the
aforementioned aspects. [0039] B. In an embodiment of the present
invention is a kit containing (a) the aforementioned resequencing
DNA microarray, and (b) reagents suitable for specific
hybridization of target sequences to the probe sequences present on
said resequencing DNA microarray. C. In an embodiment of the
present invention is a method of detecting the presence of a
drug-resistance marker in a microorganism or a microorganism
belonging to a particular class of organism species or subspecies,
wherein the method comprises: (1) providing a resequencing DNA
microarray as described above; (2) contacting to said resequencing
DNA microarray an unknown sample; (3) hybridizing the contents of
said unknown sample to the probe sequences immobilized on said
resequencing DNA microarray under suitable conditions and for a
suitable time; (4) detecting the presence and/or identity of a
drug-resistance marker in a microorganism or a microorganism
belonging to a particular class of organism species or subspecies
in said unknown sample, and (5) detecting a sufficient amount of
pathogen sequence in order to allow a forensic assessment of the
possible source(s) of pathogens. [0040] a. In a preferred aspect of
this embodiment, the method is for detecting the presence of a
particular pathogenic species. [0041] b. In a preferred aspect of
this embodiment, the method is for detecting the presence of a
drug-resistance marker. [0042] c. In a preferred aspect of this
embodiment, the hybridization time ranges from 15 minutes to 24
hours. [0043] d. In a preferred aspect of this embodiment, the
unknown sample is a biological sample, including a nasal wash
specimen, a throat swab, a blood sample, and a sputum sample, or an
environmental sample, including a soil sample, an air sample, and a
water sample. [0044] e. In a preferred aspect of this embodiment,
prior to hybridizing the unknown sample is subjected to one or more
of the following steps: (1) isolation, (2) enrichment for target
sequences of interest, (3) amplification, (4) labeling, and (5)
hybridization (e.g., subtractive). [0045] f. In a preferred aspect
of this embodiment, prior to hybridizing the target nucleic acids
of interest in the unknown sample are amplified by specific reverse
transcription (RT), PCR, multiplex PCR, and/or random PCR. [0046]
g. In a preferred aspect of this embodiment, prior to hybridizing
the target nucleic acids of interest, the unknown sample is
subjected to a random amplification strategy (e.g., random primed,
isothermal Klenow polymerase-based, φ29 DNA polymerase-based,
tandem amplification, multiplex PCR, and total amplification).
[0047] h. In a preferred aspect of this embodiment, the DNA present
in said unknown sample has been enriched. [0048] i. In a preferred
aspect of this embodiment, the target nucleic acids of interest
present in said unknown sample are enriched by subtraction of the
background nucleic acids from said sample. [0049] j. In a preferred
aspect of this embodiment, the target nucleic acids of interest
present in said unknown sample are enriched by selective removal of
said target nucleic acids from a mixture of nucleic acids
presenting said unknown sample. [0050] k. In a preferred aspect of
this embodiment, the target nucleic acids of interest present in
said unknown sample are enriched by selective capture using probes
having complete or partial sequence homology, followed by
amplification and hybridization to the microarray. [0051] l. In a
preferred aspect of this embodiment, said detecting the presence
and/or identity is by (a) making base calls based on the
hybridization response between the nucleic acids of interest in
said unknown sample and the probe DNA on the resequencing DNA
microarray and (b) determining the sequence of the corresponding
full-length gene or genomic fragment by comparing the sequence of
the hybridized region to sequences present in a sequence database.
[0052] i. In a particularly preferred aspect, the base calls are
made by the Affymetrix GDAS software under "permissive" settings.
[0053] ii. In a particularly preferred aspect, sequence
determination is by Resequencing Pathogen Identifier (REPI)
software (see U.S. provisional Application Ser. No. 60/609,918
filed on Sep. 15, 2004, and U.S. provisional Application Ser. No.
60/631,460, filed on Nov. 29, 2004, which are incorporated herein
by reference in their entirety). [0054] iii. In a particularly
preferred aspect, the sequence database is GenBank. [0055] D. In an
embodiment of the present invention is a method of routine
diagnosis of common respiratory pathogens and/or biological
terrorism agents by using the method of C above. [0056] E. In an
embodiment of the present invention is the genomic sequences of
thirteen adenovirus strains, which were not known as of the date of
the present invention, including: Ad3, Ad3FS_navy, Ad4, Ad4vaccine,
Ad4FS_navy, Ad4FS_AF, Ad5FS, Ad7, Ad7FS_navy, Ad7 vaccine, Ad16,
Ad1, and Ad21, and fragments thereof. [0057] F. In an embodiment of
the present invention is a method of surveillance of common
respiratory pathogens and/or biological terrorism agents by using
the method of C above. [0058] G. In an embodiment of the present
invention is a method of estimating the relative amount of a
pathogen in a biological sample containing the same, comprising: (1)
providing a resequencing DNA microarray as described above; (2)
contacting to said resequencing DNA microarray said biological
sample; (3) hybridizing the contents of said unknown sample to the
probe sequences immobilized on said resequencing DNA microarray
under suitable conditions and for a suitable time; and (4)
quantifying the presence and/or identity of a drug-resistance
marker in a microorganism or a microorganism belonging to a
particular class of organism species or subspecies in said unknown
sample. [0059] a. In a preferred aspect of this embodiment, said
quantifying is by determining the absolute intensity of the
hybridization signals on the chip. [0060] b. In a preferred aspect
of this embodiment, said quantifying is by determining the
percentage of base calls, both as a percentage of the total tile
region size and as a percentage of base calls within a selected
subsequence satisfying the sliding window algorithm (i.e.,
REPI).
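As an illustration of the tiling recited in embodiment A, the following minimal Python sketch generates, for each interrogated position of a reference sequence, the four 25-mer probes that differ only at the central nucleotide. It is not the chip manufacturer's design software. The function names are hypothetical, and the sketch assumes that adjacent tiles advance by one nucleotide and that each probe is written as the reverse complement of the target strand it captures.

```python
# Hypothetical sketch of resequencing tile design; names are illustrative only.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
BASES = "ACGT"


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def tile_probe_sets(reference, probe_len=25):
    """Yield (position, reference_base, probes) for each interrogated base.

    `probes` maps each possible target base at the central position to the
    probe that would perfectly match a target carrying that base. The entry
    keyed by the reference base is the perfect-match probe; the other three
    are the central-position substitution probes.
    """
    center = probe_len // 2
    # Assumption: the window advances one nucleotide per tile position.
    for start in range(len(reference) - probe_len + 1):
        window = reference[start:start + probe_len]
        probes = {}
        for base in BASES:
            variant = window[:center] + base + window[center + 1:]
            probes[base] = reverse_complement(variant)
        yield start + center, window[center], probes
```

A reference region of length L yields L - 24 interrogated positions with 25-mer probes, so the first and last 12 bases of each tiled region are not interrogated under this layout.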
[0061] The above objects highlight certain aspects of the
invention. Additional objects, aspects and embodiments of the
invention are found in the following detailed description of the
invention.
BRIEF DESCRIPTION OF THE FIGURES
[0062] A more complete appreciation of the invention and many of
the attendant advantages thereof will be readily obtained as the
same becomes better understood by reference to the following
Figures in conjunction with the detailed description below.
[0063] FIG. 1 shows a graphical description of the Version 1
Respiratory Pathogen Microarray (RPMV1). The geometrical
distribution of all tile regions dedicated to each pathogen is
represented by assigned colors (right). The Affymetrix spike-in
controls are at the top of the microarray (white). Black regions
interspersed between tiled regions did not contain probes.
[0064] FIG. 2 shows purified adenovirus 4 prototype (accession
number AY594253) DNA hybridization to RPMV1 following nucleic acid
isolation and amplification using either (A) degenerate primer PCR
(Lin et al., 2004) or (B) GenomiPhi total amplification of target
DNA, as described in Example 1. Degenerate primer PCR (A) resulted
in hybridization of targets within those tiled regions covered by
conserved primer sites. Total amplification (B) resulted in target
hybridization across the entire Ad4 tile regions. Neither method
resulted in significant cross-hybridization across the microarray.
In each case, REPI identified the correct strain of adenovirus 4
(AY594253) as having achieved the highest BLAST ("Basic Local
Alignment Search Tool") bit score across all Ad4 tile regions where
amplicons hybridized, except in one case (described in Example
1).
[0065] FIG. 3 shows purified adenovirus 5 field strain (Ad5FS,
accession number AY601635) (A, B) and adenovirus 7 prototype
(accession number AY594255) (C, D) DNA hybridizations to RPMV1
following nucleic acid isolation and amplification using either (A,
C) degenerate primer PCR (Lin et al., 2004) or (B, D) GenomiPhi
total amplification of target DNA as described in Example 3.
Degenerate primer PCR (A, C) resulted in hybridization of targets
within those tiled regions having conserved primer sites for both
the Ad5FS and the Ad7 prototype. Total amplification (B, D)
resulted in target hybridization across the entire Ad5 and Ad7 tile
regions. Neither method resulted in significant cross-hybridization
across the microarray. In each case, REPI identified the correct
strain of adenovirus 5 (Ad5FS, accession number AY601635) or
adenovirus 7 (accession number AY594255) as having achieved the
highest BLAST bit score across all respective tile regions where
amplicons hybridized. For adenovirus 7, the correct assignment was
also made for the Ad7 prototype on the Ad7 vaccine tile region,
distinguishing it correctly from Ad7 vaccine strain (AY594256) from
which the tile region was derived. It is also noteworthy that the
degenerate primer solution used for amplification
of all adenoviruses herein was unintentionally contaminated with
small amounts of Ad4 and Ad7 prototypes (not considered positive
for contamination by real-time PCR). The contaminants did not
result in easily discernible hybridizations (as shown in FIG. 3)
but base calls were made by Genetic Data Analysis Software (GDAS)
(packaged with the microarray reader from Affymetrix, Santa Clara,
Calif.) and Resequencing Pathogen Identifier (REPI) assigned all
unintentional base calls to the correct contaminant. The GenomiPhi
solution did not become contaminated, and no base calls were made
on adenovirus tile regions outside of the Ad5 and Ad7 tile regions
in respective experiments.
[0066] FIG. 4 shows the results of the adenovirus 4,5 breakthrough
strain as described in Example 4. An aliquot of purified nucleic
acid from culture of throat swab of a patient with febrile
respiratory illness (who was previously vaccinated against
adenovirus) was amplified using conserved/degenerate primers as
described (Lin et al., 2004) and otherwise processed according to
the standard Affymetrix CustomSeq protocol. The patient was
diagnosed as Ad4 positive via serotypic antibody neutralization,
and diagnosed as Ad5 positive via a hexon sequence. The present
inventors obtained multiple target hybridizations, resulting in
base calls on all the adenovirus 5 and adenovirus 7 prototype tile
regions. REPI analysis revealed that the adenovirus was identified
with high confidence as Ad5 (Ad5FS, accession number AY601635),
while multiple Ad7 prototype regions gave evidence of a B subgroup
species other than Ad7, namely Ad21. It is again noteworthy
that the degenerate primer solution used for amplification of all
adenoviruses herein was unintentionally contaminated with small
amounts of Ad4 and Ad7 prototypes, as in the case shown in FIG. 3.
The contaminants did not result in easily discernible
hybridizations in the Ad4 tiles (as shown in FIG. 4) but base calls
were made by GDAS and REPI assigned all unintentional base calls to
the correct contaminant (adenovirus 4 prototype; accession number
AY594253), not a field strain that circulates in basic military
training venues. Moreover, the highest bit score achieved in the
Ad7 tiles was not Ad7 prototype (despite low-level contamination)
but adenovirus type 21 (Ad 21; AY601633), another member of the B
subgroup of adenoviruses represented by the Ad7 prototype
(AY594255). The observation of a mixture of Ad5 and Ad21 in this
sample was confirmed using conventional DNA sequencing for Ad5,
amplicon sequencing for Ad21, and spotted microarray results (Lin
et al., 2004). Yet, the low-level contaminant of Ad7 prototype was
still detected and distinguished by high bit scores in a single
tile region subsequence. Thus, in this example, an unanticipated
mixed adenovirus sample composed of Ad5 and Ad21, inadvertently
contaminated with small amounts of Ad4 and Ad7 prototype (not found
in field samples), was completely resolved. Resolution of this fine
detail would be impossible using other types of microarrays (i.e.
spotted) without anticipation of such types of events and extensive
probe design effort.
[0067] FIG. 5 shows the results of influenza A strain
identification as described in Example 5. Aliquots of a nasal wash
from an Influenza A (+) febrile patient (confirmed by standard
viral cell culturing techniques) who was previously immunized
(using an A/Moscow/10/99 (H3N2)-based vaccine) during the 2003-2004
flu season were amplified using (A) universal (Hoffmann et al.,
2001) or (B) multiplex (Offringa et al., 2000) RT/PCR primers, and
processed according to the standard Affymetrix CustomSeq protocol.
Because both strategies produce amplicons of the entire HA, NA, and
M genes, the respective tile regions on the microarrays were almost
completely hybridized in both cases. REPI output for both cases
showed that the highest bit score for HA3 was obtained for the
Fujian/411/2002 influenza strain
(ISDN38157_InfluenzaA/Fujian/411/2002_Hemagglutinin_1042)
that evaded vaccine protection during the 2003-2004 flu season. The
prototype influenza A HA sequence used to define the HA3 tile
region, (A/Panama/2007/99/H3N2), was not present in the REPI output
for the base calls on the HA tile. Thus, a prototype region for an
expected strain of influenza A allowed identification of an
unexpected strain.
[0068] FIG. 6 shows the results of the reduced hybridization time
assay as described in Example 6. In this example, a nasal wash that
was confirmed negative for all targets probed by the microarray
except the erythromycin resistance markers SPYERMB, SPYERMTR, and
SPYMEFAE was subjected to specific multiplex PCR for each of these
three markers. The amplicons were then hybridized to separate
microarrays for either 16 hours (A) or 15 minutes (B), otherwise
being processed in accordance with the Affymetrix CustomSeq
protocol. In comparison, the signal intensities for the 15-minute
hybridization microarray (B) were lower than those on the control
16-hour microarray (A) (note incomplete hybridization to the
Affymetrix control probes at the top of the microarray in (B)).
However, the REPI output for each of the three tile regions showed
that the highest bit scores in each region were the same for both
(A) and (B), although both the bit scores and expect values were
lower in the cases of reduced hybridization times. Similar results
were obtained for 30 minute and 1 hour hybridizations, with an
increase in the number of base calls made with increasing
hybridization times. However, this example clearly illustrates the
robustness of the method to make fine scale discrimination between
targets with a range of different hybridization patterns.
[0069] FIG. 7 shows the effects of subtractive hybridization
approaches as described in Example 7. (A) Shows the hybridization
pattern obtained following total amplification of the isolated
nucleic acids from an aliquot of nasal wash from a patient with
febrile respiratory illness at Lackland AFB. This sample was
positive for Ad4 at an estimated titer of 10^4 genomic copies
per microliter. The high background hybridization across the
microarray prevents GDAS from making base calls even though the
adenovirus type 4 region showed a discernibly higher signal than
that of the overall background. Co-hybridizing the same set of
total amplicons obtained in (A) with COT-1 fraction human genomic
DNA (B) did little to improve this as again no base calls were made
by GDAS. (C) Shows that the use of a magnetic bead-based
subtraction alone, prior to total amplification, did not result in
a sufficient number of base calls to allow similarity searching.
However, through the combined use of a bead-based subtraction
(Streptavidin-coated beads conjugated with biotinylated COT-1 human
DNA) prior to amplification and co-hybridization with solution
phase COT-1 human DNA (D), enough base calls could be made to
unambiguously rank the adenovirus 4 Air Force field strain
(accession number AY599837) highest in each of three subsequences
identified in the Ad4HEXON-1 tile region (D). Moreover, by
performing the same set of combined steps using 2 microliters of
starting sample material instead of 1 microliter (E), base calling
was extended into the Ad4FIBER tile region in addition to
Ad4HEXON-1, allowing unambiguous high bit-score ranking for an Ad4
field strain in the Ad4HEXON-1 tiles and in several Ad4FIBER
subsequences.
[0070] FIG. 8 shows hybridization of Influenza A targets (Fujian
411/2002), spiked into normal nasal wash, following amplification
using a recently-described modification (Kessler et al., 2004) of a
previous (Wang et al., 2003) protocol for non-biased amplification
of viral RNA genomic targets. FIG. 8(A-C) show hybridization
patterns for 10^5, 10^3, and 10^1 plaque-forming unit
(pfu) spike-in amounts, respectively (see Example 9). These results
demonstrate that the efficacy of the present approach can be
retained even when a non-biased amplification scheme is used. This
should allow extension of the overall approach to preparation of a
multiplicity of unknown RNA targets and for incorporation of this
particular protocol into a combined method for universal
amplification of both RNA and DNA pathogen targets for the
resequencing microarray.
[0071] FIG. 9 shows a flowchart representation of an embodiment of
the present invention in which the RNA and DNA pathways are
converged.
DETAILED DESCRIPTION OF THE INVENTION
[0072] Unless specifically defined, all technical and scientific
terms used herein have the same meaning as commonly understood by a
skilled artisan in enzymology, biochemistry, cellular biology,
molecular biology, bioinformatics, and the medical sciences.
[0073] All methods and materials similar or equivalent to those
described herein can be used in the practice or testing of the
present invention, with suitable methods and materials being
described herein. All publications, patent applications, patents,
and other references mentioned herein are incorporated by reference
in their entirety. In case of conflict, the present specification,
including definitions, will control. Further, the materials,
methods, and examples are illustrative only and are not intended to
be limiting, unless otherwise specified.
[0074] It is well known that DNA microarrays might be used to
analyze the sequence of target nucleic acids from pathogens for
diagnostic or surveillance purposes. The effects of oligonucleotide
probe sequence selection, length, positioning on surfaces, the
physical chemistry of binding, and labeling techniques, have been
the topics of vigorous research in the past ten years. However,
heretofore, the approaches taken have almost exclusively involved
the use of single oligonucleotide probes (13-70 mers) that are
specifically designed to hybridize to a single recognized pathogen
target sequence with high specificity (13-25 mers) or to a longer
stretch of pathogen target with a lesser degree of specificity (70
mers). These microarrays are commonly known as spotted microarrays,
but the same content of a spotted microarray can also be embodied
in other forms besides two-dimensional planar surfaces, one example
being "bead arrays" (Ferguson, Steemers & Walt, 2000).
[0075] There have been many fewer descriptions of pathogen
identification and characterization using Affymetrix resequencing
microarrays. Kozal (Kozal et al., 1996) measured mutational drift
in HIV and Gingeras (Gingeras et al., 1998; Gingeras et al., 2001),
and Troesch (Troesch et al., 1999) identified mycobacterial species
and Single Nucleotide Polymorphisms (SNPs) relating to antibiotic
resistance using resequencing arrays. Wilson et al. (Wilson et al.,
2002a) used the same types of arrays for bacterial identification
using ribosomal RNA and for detection of multiple biowarfare agents
(Wilson et al., 2002b).
[0076] The use of resequencing microarrays for simultaneous
pathogen identification and surveillance has fundamental advantages
over single oligonucleotide approaches because detailed target
sequence information becomes directly available as part of the raw
microarray data. By definition, spotted microarrays having 10^2
to 10^4 oligonucleotide probes are incapable of resolving
subtle sequence variations for more than 10-1000 base pairs of
total target sequence, even if the array were designed with a
tiling strategy. Thus, spotted oligonucleotide microarrays that
facilitate identification of a broad spectrum of pathogens at an
individual strain or SNP level (Cherkasova et al., 2003; Wang et
al., 2002; Wang et al., 2003) require that the pathogen target
sequence be determined using conventional DNA sequencing
technology, requiring specialized equipment, personnel, and time
periods of several days.
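To make concrete how sequence information can be read directly from the raw data of a tiled array, the sketch below calls one base per interrogated position from the four probe intensities. This is a deliberately simplified illustration and not the GDAS base-calling algorithm. The function names, the intensity-ratio rule, and the threshold value of 1.5 are all assumptions introduced here.

```python
# Simplified, hypothetical base caller; NOT the algorithm used by GDAS.
def call_base(intensities, min_ratio=1.5):
    """Call a base from a dict of probe intensities keyed by A, C, G, T.

    The brightest probe wins only if it exceeds the runner-up by
    `min_ratio` (an assumed threshold); otherwise 'N' (no call) is returned.
    """
    ranked = sorted(intensities.items(), key=lambda kv: kv[1], reverse=True)
    (best_base, best), (_, second) = ranked[0], ranked[1]
    if best <= 0 or best < min_ratio * second:
        return "N"
    return best_base


def call_sequence(per_position, min_ratio=1.5):
    """Concatenate base calls for a list of per-position intensity dicts."""
    return "".join(call_base(p, min_ratio) for p in per_position)
```

A looser threshold yields more base calls at the cost of more miscalls, which is the kind of trade-off that a "permissive" setting implies.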
[0077] Accordingly, the present invention generally relates to the
detection and identification of bacterial, viral and protozoan
pathogens and virulence markers using DNA microarray technology.
The present invention also relates to the design, test, and
analysis of resequencing microarrays having defined regions that
can be used to assign numeric detection probabilities for a large
number of specific pathogen genotypes and mixtures of pathogens.
Furthermore, it relates to methods required for the processing of
both simple and complex (e.g. clinical and environmental) samples
for said microarrays.
[0078] More specifically, the present invention relates to the use
of high plurality resequencing microarrays as a novel class of
devices for purposes that would include diagnosis of infectious
disease agents and pathogen surveillance. In particular, it
pertains to DNA microarrays that employ a large number (hundreds to
thousands) of selected "prototype" nucleic acid sequence regions
(typically 250-2000 base pairs each) from target genomic sequences
to detect and identify a logarithmically larger number (thousands
to tens of thousands) of both unrelated and closely related (strain
level) microbial pathogens. More particularly, the invention
relates to the design of microarrays fabricated to allow a
resequencing of the target nucleic acids using a "tiled" probe
approach. Yet even more particularly, the invention concerns the
design of Affymetrix resequencing microarrays using tiled probes
that will allow for analysis of up to 10^5 to 10^6 base
pairs of target sequence.
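The capacity figures quoted in paragraphs [0076] and [0078] can be checked with simple arithmetic. The sketch below assumes four probes per interrogated base on a single strand, following the four probe sets described for each tile. The helper names are hypothetical, and tiling both strands (an assumption not made in this passage) would double the probe count per base.

```python
# Back-of-envelope probe budget; assumes 4 probes per interrogated base.
def resequenced_bases(n_probes, probes_per_base=4):
    """Bases that can be interrogated with a given number of probes."""
    return n_probes // probes_per_base


def probes_required(n_bases, probes_per_base=4):
    """Probes needed to interrogate a given number of bases."""
    return n_bases * probes_per_base


# A spotted array of 10^2 to 10^4 probes, if fully tiled:
print(resequenced_bases(10**2), resequenced_bases(10**4))
# A tiled array covering 10^5 to 10^6 base pairs:
print(probes_required(10**5), probes_required(10**6))
```

Under these assumptions a spotted array could resequence at most tens to a few thousand bases, the same order of magnitude as the 10-1000 base pair limit stated above, whereas 10^5 to 10^6 base pairs call for several hundred thousand to several million probes.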
[0079] In a particular embodiment, the present invention provides a
method for simultaneously assaying for a large number of pathogens
by detecting their nucleic acid sequences and subjecting the
detected sequences to an automated similarity search against public
domain and private databases. This object is accomplished by a
non-obvious adaptation of a commercial technology (Affymetrix
CustomSeq™). The program employed, CustomSeq™, was designed
for detection of Single Nucleotide Polymorphisms (SNPs) by
resequencing target DNA. This so-called SNP-detection calls for:
(1) an error rate in base calling that is much lower than the
naturally occurring frequency of mutation (in humans, approximately
1 per 10^8 base pairs), (2) amplification of the purified
starting material, at a concentration of 10^6 genomic
copies/microliter, prior to microarray hybridization using specific
PCR primers, and (3) combination and processing of multiple chips for
replicate samples by an algorithm that restricts base calls to
those having a degree of confidence at which the presence of low
frequency SNPs can be deduced. The present inventors demonstrate
that Affymetrix resequencing technology can be adapted for
insertion into an integrated system for high multiplicity
infectious disease diagnostics and pathogen surveillance in a time
scale and level of sample preparation complexity that are enabling
for point-of-care diagnostics applications. This system is
substantively different from the intended use of the Affymetrix
technology, and is not obvious to the typical skilled practitioner
of microarray technology.
[0080] To date, there have been few, but relevant, literature
descriptions of application of Affymetrix resequencing technology
to pathogen identification. From the design, experimental, and
analysis standpoints, the present invention has substantive
advantages over the prior art in pathogen characterization using
resequencing. Kozal (Kozal et al., 1996) measured mutational drift
in HIV and Gingeras (Gingeras et al., 1998; Gingeras et al., 2001),
and Troesch (Troesch et al., 1999) identified mycobacterial species
and SNPs relating to antibiotic resistance using resequencing
arrays. In each of these cases, optimized sequences were selected
for tiling, based on alignments of the possible target sequences.
Specific PCR primers were designed to amplify targets for
hybridization. Unknown clinical isolate identifications were made
using pattern recognition algorithms based on empirically
determined differential hybridization patterns to the arrays. As
such, this approach would depend on the amplification and
hybridization of the full length of the target sequences, and would
not be amenable to: (1) contributions of confounding non-specific
binding resulting in lost base calls, (2) incomplete hybridization
across the length of the target due to low target concentration or
gaps caused by low homology, and (3) compromised hybridization
integrity caused by non-biased (total) amplification of unknown
target sequences from a clinical or environmental sample.
[0081] The method proffered by the present inventors does not share
the same limitations as those disclosed in the prior art using
resequencing microarrays. Moreover, technology enablements for at
least one intended use, namely the simultaneous detection of a
large number of diverse pathogen species, have only recently
occurred with the introduction of Affymetrix microarrays
(18×18 micron features) for the CustomSeq™ RPMV1 chip and
higher density (8×8 micron features) for Respiratory
Microarray Version 2 (RPMV2), allowing 29.7 kb and 300 kb,
respectively, to be tiled for resequencing. However, the most
significant improvement offered by the present invention is the use
of the increased density chips with sequence length-independent
similarity searches (BLASTN), which means that many fewer
assumptions must be made in advance of selecting sequences for
tiling. Furthermore, the use of length-independent similarity
searches (BLASTN) removes the constraint that a particular known
subsequence be fully resequenced, making the approach more
resistant to variations in target concentration and contributions
from nonspecific binding leading to lost base calls.
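One way to picture why a length-independent search tolerates lost base calls is a sliding-window filter that keeps only those stretches of a tile with enough called bases and submits each stretch separately to the similarity search. The sketch below is an assumption-laden illustration and not the REPI implementation: the function names, the window size, and the minimum call rate are hypothetical, and 'N' is assumed to mark a position where no base call was made.

```python
# Hypothetical sliding-window selection of searchable subsequences.
def call_rate(calls):
    """Fraction of positions with a base call ('N' means no call)."""
    if not calls:
        return 0.0
    return sum(b != "N" for b in calls) / len(calls)


def callable_subsequences(calls, window=20, min_rate=0.6):
    """Return (start, end, subsequence) for merged qualifying windows.

    A window qualifies when its call rate is at least `min_rate`;
    overlapping or adjacent qualifying windows are merged into one span.
    Both parameters are illustrative values, not documented settings.
    """
    spans = []
    for start in range(len(calls) - window + 1):
        end = start + window
        if call_rate(calls[start:end]) >= min_rate:
            if spans and start <= spans[-1][1]:
                spans[-1][1] = end  # extend the current span
            else:
                spans.append([start, end])
    return [(s, e, calls[s:e]) for s, e in spans]
```

Each returned subsequence could then be used as an independent query, so a tile that is only partly hybridized still contributes whatever stretches were successfully resequenced. The overall `call_rate` of a tile also corresponds to the percentage of base calls described as a quantification measure in the Summary.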
[0082] Thus, in the present invention, sequences selected for
tiling are "prototypes" in the sense that a single gene or
subsequence may represent a much broader class of organism species
and subspecies (alternatively types, strains, variants, or
mutants). The resulting method is robust with respect to minor
variations in the genotypes of individual pathogens and strains,
and enables detection and probable identification among a plurality
of candidate pathogens that may not be explicitly represented in
the design of the experimental chip layout, including the tiled
regions.
[0083] The chip layout also takes advantage of partially redundant
tile sets (these were discouraged by the manufacturer's design
guidelines and by the manufacturer during design), both intragenic
and intergenic variation within a single pathogen strain, and
between multiple similar or diverse pathogen types. However, the
present inventors have determined that the redundancy is important
to elevate confidence in results and to minimize likelihood of
false positive and false negative results. This
design/analysis approach permits incorporation of both
conserved and hyper-variable regions of genomes, facilitating
group, type, and strain level identification.
[0084] There are only a few literature reports describing generic
amplification strategies for microarray analysis of microbial
pathogens. Wang et al. (Wang et al., 2002) described a multi-step
process for amplification of RNA pathogen targets followed by
sequential PCR and Klenow fragment-based amplifications, prior to
hybridization on 70 mer oligonucleotide arrays. The time and/or
number of technical steps required for this amplification were not
specified but a subsequent paper from the same group (Wang et al.,
2003) reported it to be approximately 24 hours. It was also not
specified which of the amplification steps resulted in enhanced
detection efficiency. A recent report from our group (Vora et al.,
2004) describes a variety of nonspecific nucleic acid amplification
techniques, individually and in combination, for hybridization to
70 mer oligonucleotide arrays. There are no known reports of prior
art in the area of nonspecific amplification of DNA targets, either
purified or in a complex mixture (e.g. clinical sample), for
hybridization to a resequencing microarray.
[0085] The present invention generally advances the art of generic
amplification for pathogen detection by microarrays in several
ways: (1) it details specific methods for total amplification of
purified or highly enriched pathogen nucleic acids for presentation
to a high density short (25 mer) Affymetrix resequencing array,
which is neither previously described nor obvious, since the
methods are a large departure from the prescribed Affymetrix
protocols, and (2) it describes novel alternative approaches to the
use of enzymatic processes, competitive hybridizations, and
magnetic bead-based subtractive and enrichment steps to reduce
background and subsequent non-biased (i.e. total) amplification and
microarray hybridization.
[0086] The present invention is embodied by a specific set of
design and processing methods that enable broad-scale pathogen
identification and characterization by utilizing resequencing
microarrays. Specifically, the present invention allows for the
precise, sensitive, and high confidence identification of a large
multiplicity (thousands) of pathogens in a single assay.
[0087] In an embodiment of the present invention is a resequencing
DNA microarray chip of multiple groups of oligonucleotide primers
of a length ranging from 13 to 70 nucleotides (preferably 25
nucleotides, although it is possible and is within the scope of the
present invention to use primer lengths corresponding to each
integer value within this recited range) immobilized to a solid
phase support, wherein each group of oligonucleotide primers is
selected to span a particular region of the reference sequence,
occupying a discrete area of the array (e.g. a tile), and
comprising at least four sets of primers arranged in a parallel
fashion on the chip: 1) a first set that is exactly complementary
to the reference sequence; and 2) three additional sets of primers,
each of which is identical to the first set of primers but for the
nucleotide at a central position, which is different in each of the
three sets such that all four conventional nucleotide bases are
present on said array.
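As an illustration of the tile structure described above, the following Python sketch generates, for each interrogated position of a reference sequence, the probes that differ only at the central base. The function names, the example reference, and the optional reverse-complement probe set (both_strands) are assumptions made for illustration; the strand labels are a convention of the sketch, and it is not the manufacturer's design software.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def tile_probes(reference, probe_length=25, both_strands=True):
    """Yield (position, strand, center_base, probe) for each tiled base.

    Each interrogated position gets four probes per strand that are
    identical except for the base at the central probe position.
    center_base is reported in reference-strand orientation.
    Positions closer than probe_length // 2 to either end are skipped
    because a full-length probe cannot be centered on them.
    """
    if probe_length % 2 == 0:
        raise ValueError("probe_length must be odd to have a central base")
    half = probe_length // 2
    reference = reference.upper()
    for pos in range(half, len(reference) - half):
        window = reference[pos - half:pos + half + 1]
        for base in "ACGT":
            variant = window[:half] + base + window[half + 1:]
            yield pos, "sense", base, variant
            if both_strands:
                yield pos, "antisense", base, reverse_complement(variant)


# Hypothetical 30-base reference; real tiles span hundreds of bases.
example_reference = "ACGTTGCAAGCTTAGGCATCGATCGGATCC"
probes = list(tile_probes(example_reference))
print(len(probes), "probes for", len(example_reference) - 24, "tiled positions")
```

With a 25-base probe, a 30-base reference leaves 6 positions that can sit at the probe center, so the sketch emits 48 probes when both strands are requested and 24 otherwise.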
[0088] The present invention further provides methods for
processing complex clinical samples (e.g. nasal wash) requiring
minimal nucleic acid isolation/amplification step(s).
[0089] The present invention is distinct from the overwhelming
majority of microarray-based pathogen detection schemes because it
uses high-density "tiled" microarrays to determine the actual
sequences of pathogen genetic targets. It is further unique from
other resequencing pathogen identification strategies in a number
of important areas, including incorporation of: (1) a high
multiplicity of disparate pathogen "prototype" target regions that
exhibit little or no discernible cross-hybridization or
interference with one another, (2) a high sequence redundancy
within closely related pathogens that allows higher confidence
identification of specific strains (e.g. adenoviruses or influenza
viruses), (3) "prototype" regions of large tiled segments
representative of a class of pathogens allowing for the precise
identification of specific pathogen strains and the use of
specially-designed software to parse and arrange sequence fragments
for presentation to similarity search (e.g. BLAST) algorithms,
allowing discrimination of pathogen mixtures and recombination
events between pathogens, instead of the more restrictive tile
selections and differential algorithm described in the most
closely-related prior art (U.S. Pat. No. 6,228,575), (4)
minimally-biased nucleic acid amplification strategies that allow
precise, high-confidence pathogen target resequencing without
significant interference or cross-hybridization, and (5) sample
processing methodologies that allow the resequencing array to be
used in conjunction with minimally-biased nucleic acid
amplification strategies on complex clinical samples.
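The parsing of sequence fragments for presentation to a similarity search, mentioned in item (3) above, can be pictured with a minimal Python sketch. It assumes that base calls arrive as a string in which uncalled positions are written as N, and that only uninterrupted runs of A/C/G/T above a chosen length are worth submitting. The function name and the min_length default are hypothetical and are not taken from the specially-designed software referred to above.

```python
import re


def called_fragments(base_calls, min_length=20):
    """Split a base-call string into runs of unambiguous calls.

    Returns (start_index, fragment) pairs for every run of A/C/G/T
    that is at least min_length long. Any other character (for
    example N) ends the current run.
    """
    fragments = []
    for match in re.finditer(r"[ACGT]+", base_calls.upper()):
        if len(match.group()) >= min_length:
            fragments.append((match.start(), match.group()))
    return fragments


# Hypothetical base calls for a short tile region.
calls = "NNACGTACGTACNNNNACGNNTTGACCAGTNN"
for start, fragment in called_fragments(calls, min_length=8):
    print(start, fragment)
```

Keeping the start index with each fragment lets a later step map a search hit back to its location on the tile, which is useful when different parts of one tile match different organisms.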
[0090] The combination of these methods allows the simultaneous
detection and identification of a high multiplicity of pathogen(s)
from a clinical sample by a single qualified technician within a
period of 24 hours, but preferably within 4 hours, more preferably
within 2 hours, most preferably within 30 minutes.
[0091] Accordingly, owing to its embodiments, the present invention
supports: (a) routine diagnosis of infection in a clinical setting
within several hours of sample collection, (b) simultaneous
interrogation of the sample for indications of a rare infectious
event (e.g. unanticipated pathogen, antibiotic resistance pattern
or biological warfare agent), (c) routine molecular pathogen
surveillance, (d) vaccine quality control and (e) monitoring of
genetic changes of a pathogen as a result of natural genetic
variations, drug treatment, intentional manipulation, or other
events.
High-Density Resequencing Microarrays
[0092] High-density microarrays (HDMs) are fabricated by
light-directed combinatorial synthesis of DNA oligomers (Kozal et
al., 1996). The DNA oligomers synthesized on these sites typically
have lengths of 20-30 bases. Through subsequent improvements to the
method using high-resolution semiconductor photoresists, Affymetrix
has demonstrated fabrication of HDMs having individual features
with resolutions approaching 1 µm², enabling probe feature
densities of 10-100 times greater than that demonstrated in the
RPMV1. To date, HDM designs relevant to pathogen identification
have been based on a "tiling" strategy. Accordingly, four probes of
equal length are synthesized for each base in both the sense and
antisense directions, requiring that a total of eight 25-mer probes
are used for each base pair in a given reference sequence. One
probe in each direction (sense and antisense) exactly complements
the reference sequence while three others have a single base
mismatch at the position of the interrogated base. Thus, a tiled
HDM can effectively allow the target nucleic acid to be
"resequenced".
[0093] In this manner, each base of the unknown target is
interrogated against the four possible bases (the four probes of a
set differ only at position 13 of the tiled 25 mer probes),
allowing a direct read of the target
sequence from the corresponding positions across the array. In the
case of CustomSeq arrays, the GCOS (Version 1.1) software is used
to reduce the raw image (.DAT) file to a simplified file format
(.CEL file) with intensities assigned to each of the corresponding
probe positions. Finally, the GDAS (Version 2.0) software is used
to apply an embedded version of the ABACUS (Cutler et al., 2001)
algorithm to produce an estimate of the correct base calls,
comparing the respective intensities for the sense and antisense
probe sets. One of the available export file types from GDAS is the
FASTA-formatted base calls made for each tiled region of the
resequencing array.
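To make the step from probe intensities to FASTA-formatted base calls concrete, the following Python sketch applies a deliberately naive rule: on each strand the brightest of the four probes wins, provided it exceeds the runner-up by a minimum ratio, and a base is called only when both strands agree. This is not the ABACUS algorithm and does not reproduce GDAS output; the function names, the min_ratio threshold, and the assumption that intensities are keyed by the target base each probe interrogates are all illustrative.

```python
def strand_call(intensities, min_ratio=1.5):
    """Return the brightest base on one strand, or 'N' if ambiguous."""
    ranked = sorted(intensities.items(), key=lambda kv: kv[1], reverse=True)
    (top_base, top), (_, second) = ranked[0], ranked[1]
    if top <= 0 or top < min_ratio * second:
        return "N"
    return top_base


def call_base(sense, antisense, min_ratio=1.5):
    """Call a base only when both strands give the same confident call."""
    s = strand_call(sense, min_ratio)
    a = strand_call(antisense, min_ratio)
    return s if s == a and s != "N" else "N"


def to_fasta(name, calls, width=60):
    """Format a string of base calls as a single FASTA record."""
    lines = [calls[i:i + width] for i in range(0, len(calls), width)]
    return "\n".join([f">{name}"] + lines)


# Hypothetical intensities for one interrogated position.
sense = {"A": 1000.0, "C": 100.0, "G": 90.0, "T": 80.0}
antisense = {"A": 900.0, "C": 120.0, "G": 100.0, "T": 95.0}
print(call_base(sense, antisense))
print(to_fasta("example_tile", "ACGN"))
```

Requiring agreement between the two strands trades call rate for accuracy: positions where one strand hybridizes poorly become N rather than a possibly wrong base.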
[0094] HDMs of the type described above have been used to identify
pathogen species and detect drug resistance-conferring mutations in
a series of in vitro experiments using cultured microorganisms,
including HIV (Kozal et al., 1996). Troesch et al. (Troesch et al.,
1999) designed HDMs to discriminate between 54 different
Mycobacterium species and to detect Mycobacterium tuberculosis rifampin
resistance. A tiled array of 65,000 oligomer probes was used to
accurately resequence 70 clinical isolates of 27 mycobacterial
species and 15 rifampin-resistant M. tuberculosis strains. More
recently, sequence-specific identification of F. tularensis and Y.
pestis was demonstrated in environmental samples using tiled HDMs
(Wilson et al., 2002b). Both of these general approaches rely on
specific hybridization patterns based on ground-truth (control)
measurements. Moreover, the authors have provided no direction on
how a quantitative comparison might be made against closely-related
or unanticipated organisms for starting concentrations that may
vary by six orders of magnitude.
Array Types
[0095] The present invention is developed using Affymetrix
CustomSeq resequencing microarrays. For a discussion of
resequencing microarrays, the artisan is directed to U.S. Pat. No.
6,228,575, which is incorporated herein by reference in its
entirety. However, the present invention is not conceptually
limited to microarrays produced using that specific fabrication
strategy. In principle, resequencing can be performed on the scale
described with any technology that is capable of producing
microarrays with sufficient feature density to allow this approach.
Theoretically this can be accomplished using an oligonucleotide
printing technology, but it is more likely accomplished using a
photolithographic approach. Whereas the Affymetrix resequencing
chips are based on sequential lithographic steps using a separate
mask that corresponds to each step, an alternative approach could
use maskless lithography (Albert et al., 2003; Nuwaysir et al.,
2002) or nanolithographic methods (Ginger, Zhang & Mirkin,
2004). More generally, any method of producing a plurality of
oligonucleotide probes for the purpose of determining target
sequence would be applicable, even bead "arrays" that are not in a
2-dimensional format (Ferguson et al., 2000).
[0096] The probes themselves could be comprised of variants of DNA,
namely RNA or oligomeric peptide-nucleic acids (PNA). The probes
can be made sensitive to enzyme digestion, then subject to
subsequent handling. In a preferred embodiment, the probes will
incorporate dUTP instead of dTTP, making them sensitive to
uracil-DNA-glycosylase. This will make them amenable to selective
degradation following the capture of target. Moreover, within the
scope of the present invention it is also possible to immobilize
RNA and obtain complementary sequence recognition thereof.
Immobilization of RNA would require chemical stabilization of the
RNA. In a more general sense, the probes can be made from
chemically modified nucleic acids that would make them more or less
susceptible to subsequent chemical processing steps.
Array Design
[0097] The present invention, using the RPMV1 microarray,
illustrates the ability of tiled prototype sequences to identify a
wide variety of specific pathogen strains without assuming that
specific hybridization patterns are required for specific pathogen
identification. In the current apparatus, the prototypes for tiled
regions, particularly those for adenoviruses 4, 5, and 7, were
empirically chosen to be representative of adenovirus subgroups E,
C, and B, respectively.
[0098] A more preferred and systematic approach to the design would
involve the use of multiple sequence alignments to produce
consensus sequences, where consensus sequences are defined as those
representing the most frequent bases at the alignment position. In
a preferred embodiment, the alignment algorithm will produce a
hierarchical phylogenetic tree for target gene sequences from
individual pathogen strains or for a group or family of pathogens.
A consensus sequence will first be formed for members of each node
or group of nodes on the tree, whose distance measures fall within
a threshold, using an appropriate algorithm (Lee, 2003). The actual
target sequences would then be compared individually to the
consensus sequence, and the effects of hybridization to a tiled
microarray region defined by the consensus sequence would be
simulated. Known effects such as insertion or deletion of bases, as
well as the effects of multiple base substitutions within an
oligonucleotide probe region would define rules for the simulated
hybridization. Subsequent analysis of the resultant hybridization
and base call pattern would then indicate the suitability of a
proposed tile for acting as a prototype for a given range of
pathogens. This process would be reiterated until the most suitable
set of prototype regions were defined for coverage of a given set
of pathogens. In one very preferred embodiment, this process would
be used to maximize the efficiency of space usage on a chip,
resulting in the greatest reduction in the overall physical
dimensions of a resequencing microarray, and thus, the greatest
reduction in manufacture cost per microarray.
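The consensus-and-simulation loop described above can be sketched in Python under strong simplifying assumptions. The consensus is taken as the most frequent non-gap base in each alignment column, as defined in the text. The hybridization rule is a placeholder: a position is predicted to be called only if the 25 mer probe centered on it has at most max_flank_mismatches mismatches against the target outside the central base. That rule, the threshold, the function names, and the restriction to gap-free targets of the same length as the consensus are assumptions; insertions and deletions are not modeled.

```python
from collections import Counter


def consensus(aligned):
    """Most frequent non-gap base per alignment column ('-' if all gaps)."""
    result = []
    for column in zip(*aligned):
        counts = Counter(base for base in column if base != "-")
        result.append(counts.most_common(1)[0][0] if counts else "-")
    return "".join(result)


def simulate_calls(consensus_seq, target, probe_length=25,
                   max_flank_mismatches=1):
    """Predict base calls for a target hybridized to a consensus tile.

    Windows are truncated at the sequence ends for simplicity; a real
    tile would not interrogate the first and last half-probe lengths.
    """
    if len(consensus_seq) != len(target):
        raise ValueError("target must be aligned to the consensus length")
    half = probe_length // 2
    calls = []
    for pos in range(len(consensus_seq)):
        lo = max(0, pos - half)
        hi = min(len(consensus_seq), pos + half + 1)
        flank_mismatches = sum(
            1 for i in range(lo, hi)
            if i != pos and consensus_seq[i] != target[i]
        )
        calls.append(target[pos] if flank_mismatches <= max_flank_mismatches
                     else "N")
    return "".join(calls)


def call_rate(calls):
    """Fraction of positions with a predicted base call."""
    return 1.0 - calls.count("N") / len(calls)


print(consensus(["ACGT", "ACGA", "ACCT"]))
predicted = simulate_calls("AAAAAAAAAA", "AAAACAAAAA",
                           probe_length=5, max_flank_mismatches=0)
print(predicted, call_rate(predicted))
```

A candidate prototype tile could then be scored by the call rate it yields across all target strains in its group, and the grouping threshold adjusted until the set of tiles covers the intended pathogens.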
[0099] In one preferred embodiment, the resequencing array would be
designed in combination with another less complex solid phase
capture device (array, gel, or otherwise) that would be used
initially to cue the resequencing array. For example, an array
comprised of a multiplicity of long oligonucleotide probes would be
used initially for the detection of pathogen by recognizing
conserved sequences within a family of pathogens. The resequencing
array could then be used to interrogate variable regions associated
with the conserved probes on the initial array, providing detailed
sequence information on the pathogen. In a very preferred
embodiment, the sample preparation would be common for both the
solid phase capture device and the resequencing microarray. In yet
another very preferred embodiment, the initial array would be used
to capture target that would be isolated and then amplified using
non-biased amplification techniques for presentation to the
resequencing array.
Overall Design Philosophy for the Intended Use
[0100] This invention leverages pathogen genome information from
various public and/or private sources to design, fabricate,
evaluate, validate and integrate an advanced diagnostics platform
as part of an effective biodefense surveillance and operational
medicine system. It is an integral facet of the Epidemic Outbreak
Surveillance (EOS) program biodefense model that the system would
most effectively be enabled through the adoption of a widely
distributed device that would find utility in routine diagnostics
of infectious diseases, most particularly in diagnosis of
infectious respiratory disease (see the Defense Sciences Board
Summer 2000 Study Report and the Health Sciences Biodefense System
(HSBS) final briefing). It is important that the device (e.g.
microarray) will provide a cost-effective alternative to
conventional approaches to the diagnosis, management, and
surveillance of infectious diseases, most particularly respiratory
infections. It is equally important that the device is supported
with design and analysis informatics, and to ensure that decision
quality information from that device will be transmissible and
interpretable by a variety of care providers, public health
officials, and decision makers. Thus, it is a critical objective of
this invention that the device (e.g. microarray) be a key component
of an integrated system comprised of a local point-of-care
diagnostic device that provides automated, two-way data sharing
between health care providers, public health officials, and
decision makers (this may be a cross-reference to the business
model patent that is in the queue). The invention described herein
might fulfill this role in at least two ways: (1) through reduction
in array size (i.e. lower cost), process automation, and the
availability of portable hardware for processing resequencing
arrays, the invention could become the object of the point-of-care
device itself, and (2) in the event of lower-cost or easily
automated microarray alternatives, the resequencing array can be a
higher echelon component in a diagnostics/surveillance pipeline. In
the latter case, the lower-cost alternative device will provide for
initial sample processing, pathogen target enrichment,
amplification, and decision information at the point-of-care, while the
resequencing capability provides for a sequential testing
capability by facilitating a more detailed interrogation of the
sample when desired.
General Strategy for Resequencing Microarray Chip Design
[0101] According to the present invention, the process of designing
resequencing microarray chips is carried out by selecting pathogen
genomic sequences having sequence properties that make them unique
to a small number (ideally one) of pathogens, or are highly
conserved, allowing them to detect many types of microbial species
at the family or genus level, or are moderately conserved and
selected as "prototype" regions. Prototype regions will have an
intermediate level of sequence homology across a group of microbial
species and allow for both efficient hybridization and unique
identification of most or all of a subtype of pathogenic species.
The design strategy for layout of resequencing tiles includes
leveraging from alignment of similar sequences and application of a
consensus probe sequence for tiling to the chip. The consensus
sequence might not be identical to any of the selected pathogens, but
will interact with a plurality of similar pathogen genomes. The
patterns of perfect and imperfect matching of real pathogen genome
sequences with the consensus sequence will provide diagnostic
discrimination power.
[0102] In a particular embodiment of the present invention is a
general protocol for designing resequencing microarray chips that
permit the identification and designation of pathogens present in a
sample (e.g., a partially purified sample, a purified sample, an
enriched sample, a biological sample, etc.). The various aspects of
this design and validation protocol are embodied in the following 6
"phases."
Phase 1: Pathogen Identification:
[0103] a. Pathogen list (e.g. Adenovirus; Influenza; Streptococcus
pyogenes)--provided by pathogen experts or public domain
[0104] b. Genus/species (new pathogen i.e. Severe acute respiratory
syndrome (SARS) variant of corona virus)
[0105] c. Species/subspecies (Epidemiologic tracking;
forensics)
[0106] d. Pathogen unique segments (cross hybridization issues)
[0107] e. Issues of genetic shift/drift (e.g. Influenza, HIV)
[0108] f. Drug resistance markers
[0109] g. Pathogenicity-related genes or virulence markers (useful
for diagnostic and prognostic purposes)
[0110] h. Markers for genetically engineered traits
[0111] i. Plasmid DNA sequence (Bluescript, PUC etc.)
  [0112] i. Multiple cloning sites
  [0113] ii. Drug resistance markers (ampicillin, kanamycin,
  penicillin etc.) (or could list as: Amp, Kan, PBP, etc.)
  [0114] iii. Toxins (botulinum; ricin etc.)
Phase 2: Pathogen identification in relation to sample sources:
[0115] a. Common respiratory pathogens (and close relatives)
[0116] b. Biothreat agents: (identified by Centers for Disease
Control)
[0117] c. Consideration of background or commensal organisms in:
  [0118] i. Clinical samples (nasal wash, swab, stool, etc.)
  [0119] ii. Vectors (e.g. mosquito)
  [0120] iii. Environment (water, food, soil)
Phase 3: Gene Identification: (Most complicated issue associated with
pathogen design)
[0121] a. Sequence Homology (relative to pathogen identity)
  [0122] a. Highly conserved (Genus/Species identification;
  identification of novel or unanticipated organisms)
    [0123] i. Identify new/unanticipated organisms
      [0124] 1. chimeras (bacterial genetic exchange)
      [0125] 2. genetic shift/drift variants (i.e. Influenza)
      [0126] 3. man-made
    [0127] ii. Distinguish complicated pathogen families
      [0128] 1. Rhinovirus (many disparate variants)
  [0129] b. Less conserved to hypervariable (Species/subspecies)
  [0130] c. Pathogen-unique sequence segments (cross hybridization)
    [0131] i. Potentially important for a total amplification approach
[0132] b. Functional Sequences (relative to pathogenicity and
patient management)
  [0133] a. Drug Resistance genes in commensal microbes
  [0134] b. Pathogenicity related genes
    [0135] i. Toxin genes
    [0136] ii. Transmission (Infectivity) related genes
    [0137] iii. Pathogenicity islands
    [0138] iv. Virulence factors
  [0139] c. Other Host-pathogen interacting genes
    [0140] i. Immune response
    [0141] ii. Tumorigenesis
    [0142] iii. DNA repair
Phase 4: Gene Selection (What sequences should actually be placed on the
chip)
[0143] a. Identify Gene Accession Number(s)
  [0144] a. Complete gene
  [0145] b. Pathogen genome for flanking sequences (Controls)
  [0146] c. Most recent/prevalent variants available
    [0147] i. Especially important for rapidly evolving pathogens
    (Influenza)
[0148] b. BLAST search: (Inclusion/Exclusion criteria)
  [0149] a. Human sequence homology (exclusion criteria)
  [0150] b. Related pathogen sequences
    [0151] i. Possible exclusion/annotation criteria
    [0152] ii. If >90% homology, sequence needs only to be deposited
    once
      [0153] 1. Use consensus sequence for identity and annotate as
      such
    [0154] iii. Identify pathogen specific genes/sequences
[0155] c. Pragmatic Issues
  [0156] a. Horizontal gene transfer issues in bacteria
  [0157] b. Number of copies of gene in pathogen
  [0158] c. Sequence homology with human sequence (cross
  hybridization)
  [0159] d. Sequence homology with less pathogenic species (i.e.
  Bacillus; variola [pox])
Phase 5: RPM chip prioritization
[0160] a. Define chip "real estate" (total sequences represented)
  [0161] a. Target gene tile sizes
    [0162] i. Associate accession numbers with tiles
    [0163] ii. Associate accession numbers with consensus sequences
[0164] b. Refer to primary objectives for prioritization criteria.
Key issues include:
  [0165] a. Prevalence of pathogen depending on targeted population,
  geographic location, season, and other disease transmission factors
  [0166] b. Clinical, operational, and public health relevance
  [0167] c. Chip functionality issues:
    [0168] i. Mixed pathogens
    [0169] ii. Data annotation and presentation to end-user
Phase 6: Microarray Validation:
[0170] a. Design controls
  [0171] a. Nested primer sets:
    [0172] i. Outer primer set: Develop controls
    [0173] ii. Inner primer set: Assay positive control
    [0174] iii. Control and RPM sequence should be 100% sequence match
  [0175] b. Control clones developed
  [0176] c. Control clones sequence validated for match.
[0177] b. Develop pathogen chip annotation schema:
  [0178] a. Sites of poor hybridization
    [0179] i. Poor signal
    [0180] ii. Incorrect signal
  [0181] b. Sites of cross hybridization
    [0182] i. Human cross hybridization
    [0183] ii. Other pathogens (especially bioterrorism agents and
    toxins)
  [0184] c. Level of specificity of sequence
    [0185] i. Markers that can identify subspecies/variants
      [0186] 1. Forms the basis of a forensics database
    [0187] ii. Markers that can identify genus or species only
    [0188] iii. Identification of new variant markers as discovered
[0189] c. Chip Validation Experimental Framework:
  [0190] a. Human hybridization to detect cross hybridization regions
  [0191] b. Plasmid-based validation schema:
    [0192] i. Titrate plasmid for PCR sensitivity
    [0193] ii. Titrate for RPM detection sensitivity
    [0194] iii. Estimate base-calling accuracy with varied
    concentrations
      [0195] 1. Annotate any errors
  [0196] c. Culture-based validation schema
    [0197] i. Titration of virus
      [0198] 1. Efficiency of nucleic acid isolation
      [0199] 2. Chip hybridization sensitivity/specificity
        [0200] a. Total amplification sensitivity/specificity
        [0201] b. PCR comparison
      [0202] 3. Chip hybridization sensitivity from titrated culture
      [0203] 4. Sequence validation of culture sequence
  [0204] d. Spiked pathogens into complex media
    [0205] i. Titration of virus into solution
    [0206] ii. Hybridization to chip (background interference)
    [0207] iii. Sequence validation of virus if different than
    culture pathogen
  [0208] e. Estimation of target concentration from complex matrices.
    [0209] i. Nasal wash
    [0210] ii. Cotton swab
      [0211] 1. Nasal swab
      [0212] 2. Throat swab
    [0213] iii. Stabilization of virus in solution
      [0214] 1. Nasal wash
      [0215] 2. Swabs
    [0216] iv. Freeze/thaw effects of virus solution
    [0217] v. Sequence validation of target pathogen
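The Phase 4 criterion that sequences with more than 90% homology need only be deposited once can be illustrated with a short Python sketch. It assumes the candidate sequences have already been aligned to equal length, measures homology as simple percent identity, and keeps the first sequence of each group as its representative rather than building a consensus as the outline recommends. The function names and the greedy grouping are illustrative assumptions.

```python
def percent_identity(a, b):
    """Percent of aligned positions at which two sequences agree."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def deduplicate(sequences, threshold=90.0):
    """Greedily keep one representative per group of similar sequences.

    Returns (representatives, annotations): representatives maps the
    kept names to their sequences, and annotations maps each dropped
    name to the representative that stands in for it on the chip.
    """
    representatives = {}
    annotations = {}
    for name, seq in sequences.items():
        for rep_name, rep_seq in representatives.items():
            if percent_identity(seq, rep_seq) > threshold:
                annotations[name] = rep_name
                break
        else:
            representatives[name] = seq
    return representatives, annotations


# Hypothetical 20-base candidates.
candidates = {
    "strain_1": "ACGTACGTACGTACGTACGT",
    "strain_2": "ACGTACGTACGTACGTACGA",  # 95% identical to strain_1
    "strain_3": "TTTTTTTTTTTTTTTTTTTT",
}
kept, annotated = deduplicate(candidates)
print(sorted(kept), annotated)
```

The annotations dictionary corresponds to the "annotate as such" step: it records which deposited sequence represents each strain that was not tiled separately.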
[0218] The following table (Table 1) represents a preferred (but
not limiting) set of pathogens (both viral and bacterial) that may
be used within the context of the present invention as designed for
the detection and diagnosis of common respiratory pathogens:
TABLE-US-00001
TABLE 1 Microarray Pathogens
Viral Pathogens                      Bacterial Pathogens
Adenovirus (Serotypes/Genus level)   Streptococcus pyogenes (emm types/resistance)
Influenza A and B (strains)          Mycoplasma pneumoniae
Coronavirus/SARS                     Bordetella pertussis
Parainfluenza 1,2,3,4                Chlamydia pneumoniae
Respiratory Syncytial Virus          Streptococcus pneumoniae
Metapneumovirus                      Legionella (genus level)
Rhinoviruses                         Moraxella catarrhalis
Coxsackie virus                      Haemophilus influenzae
Echoviruses                          Neisseria meningitidis
West Nile Virus                      Mycobacterium tuberculosis
Varicella (HHV-3)                    Staphylococcus aureus
Hantaviruses                         Arcanobacterium haemolyticum
Rubella, Rubeola                     Chlamydia psittaci
Herpes simplex types 1 and 2
Enteroviruses (mumps, polio)
Parvovirus
[0219] For the intended use of a broadly distributed respiratory
diagnostic device with built-in surveillance capability for agents
of bioterrorism, the list of pathogens chosen for inclusion onto
the chip would also include those selected from the U.S. Centers
for Disease Control (CDC) Category A, B, and C bioterrorism agents.
These most notably include, but are not limited to:
[0220] CDC Category A
[0221] Bacillus anthracis (targets: lethal factor, protective
antigen)
[0222] Yersinia pestis
[0223] Smallpox (variola major)
[0224] Francisella tularensis
[0225] Viral hemorrhagic fevers (filoviruses [e.g. Ebola, Marburg]
and arenaviruses [e.g. Lassa, Machupo])
CDC Category B
[0226] Brucella abortus (2308 B37), Brucella melitensis (F6145),
Brucella suis (A44)
[0227] Burkholderia mallei (Glanders)
[0228] Burkholderia pseudomallei (Melioidosis)
[0229] Psittacosis (Chlamydia psittaci)
[0230] Typhus fever (Rickettsia prowazekii)
[0231] Viral encephalitis (alphaviruses [e.g., Venezuelan equine
encephalitis, eastern equine encephalitis, western equine
encephalitis])
CDC Category C
[0232] Emerging infectious diseases such as Nipah virus and
hantavirus
[0233] A complete and updated listing of these agents may be found
on the CDC website (http://www.cdc.gov/).
[0234] For purposes of illustration of the present invention, two
resequencing microarray chips (RPMV1 and RPMV2) will be described
herein below:
Resequencing Respiratory Pathogen Microarray Version 1 (RPMV1
Chip)
[0235] RPMV1 was made using a high-density Affymetrix microarray
fabrication process having an individual probe feature size of
18×18 microns. At this density, 29.7 kb of total pathogen
target sequence was tiled for resequencing. The fabrication was
performed as part of a pre-production beta-test of an Affymetrix
commercial product (CustomSeq) that was intended for SNP detection
in an arbitrary collection of sequences.
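As a rough check on what 29.7 kb of tiled sequence implies, the arithmetic can be written out in a few lines of Python. The sketch assumes the eight-probes-per-base tiling scheme described earlier and counts only the resequencing probes; control features, borders, and any manufacturer overhead are ignored, so the figures are lower bounds rather than the actual RPMV1 feature count.

```python
PROBES_PER_BASE = 8  # four candidate bases on each of two strands


def probes_required(tiled_bases, probes_per_base=PROBES_PER_BASE):
    """Number of probe features needed to tile the given number of bases."""
    return tiled_bases * probes_per_base


def probe_area_mm2(tiled_bases, feature_edge_um):
    """Total area of the square probe features, in square millimeters."""
    features = probes_required(tiled_bases)
    return features * feature_edge_um ** 2 / 1e6  # convert um^2 to mm^2


rpmv1_bases = 29_700  # 29.7 kb of tiled target sequence
print(probes_required(rpmv1_bases))               # 237600
print(round(probe_area_mm2(rpmv1_bases, 18), 1))  # 77.0
```

The same functions can be applied to any other feature size or sequence content to see how probe count and occupied area scale.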
[0236] The following overall design strategy was used for the
RPMV1:
[0237] The Affymetrix CustomSeq design protocol was followed. While
there were a variety of ongoing efforts within the project
consortium to non-specifically amplify targets for microarray
analysis, the present inventors made every effort to tile sequences
on the array that could interrogate conventional (specific primer
pair) PCR amplicons. Whenever possible, primer pairs were developed
or adapted for conserved sequences that flanked the variable
regions to be interrogated with tiled microarray probe sets. This
allowed for the ability to: (1) directly amplify with an existing
hardware platform (e.g. RAPID Light Cycler, Idaho Technologies),
and (2) provide control measurements for comparison with generic or
total amplification strategies that are on the immediate horizon.
In general, our strategy was to match the hybridization patterns
with specific strains of organisms and to detect subtle variations
in sequence that correspond to pathogenicity and drug
resistance.
[0238] Adenovirus (double-stranded DNA virus) types 4, 5, and 7
were designated as "prototypes" for the E, C and B subgroups,
respectively. Specifically, the present inventors postulated that
resequencing on the prototype-tiled regions would allow detection
and identification of subtle sequence variations between the
subgroup members. Three target gene regions were selected,
specifically from E1A, hexon, and fiber genes. However, it was not
postulated nor anticipated which regions or which sections of any
of the tiles would allow unique identification.
[0239] Thirteen adenovirus genomes were completely sequenced as
part of the Epidemic Outbreak Surveillance program. The names,
accession numbers, and sources for these are listed in Table 6 of
the Examples section. Multiple sequence alignments were performed
to determine variable regions of the E1A, hexon, and fiber genes
that were flanked by conserved regions that could be used to
amplify multiple adenoviruses with a single set of degenerate
primers (Lin et al., 2004). These common regions for the E1A, hexon,
and fiber genes were obtained for each of the 3 prototype
adenoviruses that are associated with respiratory illness, namely
types 7 (subgroup B), 5 (subgroup C), and 4 (subgroup E), and were
submitted to Affymetrix as part of the 29.7 kb total pathogen target
sequence for tiling on the RPMV1 microarray.
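The search for variable regions flanked by conserved regions can be sketched in Python as a scan over alignment columns. The sketch assumes a gap-free multiple sequence alignment supplied as equal-length strings and treats a window as a candidate primer site only if every column in it meets a minimum conservation fraction. The function names, window length, and threshold are hypothetical; degenerate primer design as actually performed tolerates partial conservation and weighs other factors that are not modeled here.

```python
from collections import Counter


def column_conservation(aligned):
    """Fraction of sequences sharing the most common base, per column."""
    fractions = []
    for column in zip(*aligned):
        most_common_count = Counter(column).most_common(1)[0][1]
        fractions.append(most_common_count / len(column))
    return fractions


def conserved_windows(fractions, window, min_fraction=1.0):
    """Start indices of windows in which every column is conserved."""
    starts = []
    for start in range(len(fractions) - window + 1):
        if all(f >= min_fraction for f in fractions[start:start + window]):
            starts.append(start)
    return starts


# Hypothetical alignment: conserved flanks around a 2-base variable core.
alignment = [
    "AAAACGTTTT",
    "AAAAGGTTTT",
    "AAAATCTTTT",
]
fractions = column_conservation(alignment)
print(conserved_windows(fractions, window=4))
```

A region lying between two reported windows is a candidate for tiling, since a single primer pair placed in the conserved flanks would amplify it from every aligned strain.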
Adenovirus Taxonomy:
[0240] Sub-Group B: 3, 7, 11, 14, 21, 34, 35 and 50
[0241] Sub-Group C: 1, 2, 5, and 6
[0242] Sub-Group E: 4
[0243] The present inventors made the hypothesis that if they tiled
three genes (E1A, Fiber and Hexon) on the array for prototype
sub-group representatives, types 7, 5, and 4 (for subgroups B, C,
and E, respectively), they would be able to identify any of the
fully-sequenced types (listed above) by variations in the
hybridization patterns that map to their sequence differences.
[0244] Influenza A and B viruses, which are negative-polarity
single stranded RNA viruses (ssRNA), were represented with
prototype regions for hemagglutinin (HA), neuraminidase (NA), and
matrix (M). These genes were represented for three types of
Influenza A (H1N1, H3N2 and H5N1) and Influenza B. Influenza is one
of the best examples of a prototype model system, as hundreds, if
not thousands of influenza strains have been at least partially
sequenced, and most have been sequenced for the hemagglutinin and
neuraminidase segments.
[0245] Prototype Influenza HA, NA and M genes were chosen from
strains that were either identical or closely related to the three
vaccine strains recommended for the Northern hemisphere by the
World Health Organization:
[0246] A/New Caledonia/20/99/(H1N1)
[0247] A/Moscow/10/99/(H3N2)
[0248] B/Hong Kong/330/2001
[0249] These sequences were publicly available from the Los Alamos
National Laboratory influenza Internet database. The present
inventors postulated that the sequence calls made for unknown
Influenza A or B on the prototype tile regions would allow the
identification of that target if it was sufficiently similar to
allow similarity search-based querying.
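The idea that base calls on a prototype tile can identify a sufficiently similar unknown strain can be shown with a simplified Python stand-in for a similarity search. The sketch assumes the base calls and the candidate reference sequences share the coordinates of the prototype tile, scores identity only at positions with an unambiguous call, and reports no match below a chosen identity. A real similarity search such as BLAST performs local alignment and handles insertions and deletions; the function names, strain names, and min_identity value here are hypothetical.

```python
def identity_on_called(calls, reference):
    """Fraction of called (non-N) positions that agree with a reference."""
    pairs = [(c, r) for c, r in zip(calls, reference) if c in "ACGT"]
    if not pairs:
        return 0.0
    return sum(c == r for c, r in pairs) / len(pairs)


def best_match(calls, references, min_identity=0.9):
    """Return (name, identity) of the closest reference.

    The name is None when even the closest reference falls below
    min_identity, which signals that the target is too dissimilar
    for this simple comparison.
    """
    scores = {name: identity_on_called(calls, seq)
              for name, seq in references.items()}
    name = max(scores, key=scores.get)
    if scores[name] < min_identity:
        return None, scores[name]
    return name, scores[name]


# Hypothetical base calls and reference strains on one tile.
calls = "ACGTNNACGT"
references = {
    "strain_A": "ACGTAAACGT",
    "strain_B": "ACGTAAACGA",
}
print(best_match(calls, references))
```

Ignoring N positions keeps poorly hybridizing stretches from counting against a reference, although a high proportion of N calls is itself a sign that the target diverges from the prototype.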
[0250] The remainder of RPMV1 was populated with tiles for a
variety of common respiratory pathogens, the first set being
viruses:
[0251] Rhinovirus A (pos)ssRNA
[0252] Rhinovirus B (pos)ssRNA
[0253] Coronavirus (pos)ssRNA; no DNA stage
[0254] Parainfluenza (neg)ssRNA
[0255] RSV (neg)ssRNA
[0256] These viral pathogens had relatively little sequence
available compared to adenovirus and influenza, complicating
efforts to perform analogous tests for use of prototype sequences
to identify a large number of related strains.
[0257] Common bacterial pathogens were also chosen:
[0258] Streptococcus pyogenes
[0259] Mycoplasma pneumoniae
[0260] Bordetella pertussis
[0261] Chlamydia pneumoniae
[0262] Streptococcus pneumoniae
[0263] Neisseria meningitidis
[0264] In addition, the following plasmid-conferred antibiotic
resistance genes were represented on the RPMV1 chip:
[0265] ermA
[0266] ermB
[0267] ermTR
[0268] macrolide-efflux determinant (mef)A
[0269] The following biological threat agents were also included on
the RPMV1 chip:
[0270] Bacillus anthracis (targets: lethal factor, protective antigen)
[0271] Yersinia pestis
[0272] Smallpox (variola major)
[0273] Francisella tularensis
[0274] Viral hemorrhagic fevers (filoviruses [e.g. Ebola, Marburg]
and arenaviruses [e.g. Lassa, Machupo])
[0276] In addition to the hybridization controls prescribed by
Affymetrix (see CustomSeq protocol), it was considered necessary to
include internal process controls to verify target isolation,
reverse transcription (RT) and cDNA/DNA amplification. Thus, we
included a specialized vector containing gene inserts from the
plant Arabidopsis thaliana (Yang et al., 2002) and provided tile
regions on the array for several gene inserts from that
vector.
[0277] For all targets other than adenovirus and influenza, the
present inventors selected targets from the open literature (e.g.,
NCBI GenBank) for genes containing diagnostic (via PCR) regions and
tiled that diagnostic region or a larger coding region containing
that region. The full listing of all target genes, PCR primer
sites, and primers for the RPMV1 are shown in Table 8 (infra) and
the attached Sequence Listing. The sequences submitted for tiling
and chip fabrication are summarized in Table 7 (infra) and the
attached Sequence Listing.
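As a non-limiting sketch of how a diagnostic region bounded by a PCR primer pair might be located within a gene sequence before being submitted for tiling, consider the following. The helper names and the example template and primers are hypothetical; exact string matching is assumed, whereas real primer sites may tolerate mismatches.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def amplicon(template: str, fwd_primer: str, rev_primer: str):
    """Return the template region spanned by the primer pair, primers included.

    Assumptions for this sketch: the forward primer matches the given strand
    exactly, and the reverse primer matches the opposite strand exactly, so
    its reverse complement is searched for downstream of the forward site.
    Returns None when either site is not found.
    """
    template = template.upper()
    fwd = fwd_primer.upper()
    start = template.find(fwd)
    if start == -1:
        return None
    rev_site = revcomp(rev_primer)
    end = template.find(rev_site, start + len(fwd))
    if end == -1:
        return None
    return template[start:end + len(rev_site)]


# Hypothetical example; these are not primers from Table 8.
example_template = "TTTACGTGGGCCCAAATTTCCGGAAT"
example_region = amplicon(example_template, "ACGTGG", "CCGGAA")
```

The returned string is the candidate tile region. Tiling a larger coding region that contains it, as described above, would simply extend the slice on either side.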
[0278] However, the present invention is not limited to the
specific sequences and/or microorganisms (bacteria or virus) listed
above. It is well within the purview of the skilled artisan to
tailor the "pathogen chip" to meet the particular requirements of
the intended field test. Such tailoring may be on a gene-level or
on a microorganism level. The artisan would be greatly
assisted in these efforts by the general protocol above.
Resequencing Microarray Chip Version 2 (RPMV2 Chip)
[0279] RPMV2 is built on a high-density Affymetrix microarray
having an intended feature size of 8 × 8 microns. At this
density, approximately 300 kb of target sequence information may be
tiled for resequencing.
[0280] The general layout strategy is described in Tables 2-4 (infra).
The RPMV2 design allowed for roughly 10 times the sequence content
of the RPMV1, and thus allowed inclusion of a large number of
pathogens, which included expanded representations of all the
pathogens present on the RPMV1, numerous other respiratory
pathogens, and virtually the entire set of CDC Category A, B, and C
bioterrorism agents. The full listing of the RPMV2 content is
provided in Example 8. The RPMV2 design expanded molecular
epidemiology capability by enabling inclusion of probes, such as
the emm types for Streptococcus pyogenes. This feature will provide
the epidemiologist/public health professional an ability to
determine a likely index disease case and to study transmission of
the typed pathogen in enclosed populations (jails, dormitories,
nursing homes, military barracks, etc.). These probes may also
predict pathogen virulence and clinical presentation of disease in
the population.
[0281] The number of pathogens represented can also be expanded
with increasing probes per unit area of microarray. The chips of
the present invention take advantage of high feature density to
simultaneously probe and discriminate among a large number of
candidate pathogens that may be present in a single clinical
specimen.
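The trade-off between array capacity and the number of tiled targets can be illustrated with a simple budget check. The sketch below assumes that each tile occupies (End - Start + 1) bases of capacity, that the Start and End coordinates of the Table 5 instruction file are inclusive, and that the capacity is the approximate 300 kb figure given above for the RPMV2 design. The function names are hypothetical; the three example tiles are taken from Table 5.

```python
CAPACITY_BASES = 300_000  # approximate tiling capacity stated for the RPMV2 design


def tile_length(start: int, end: int) -> int:
    """Length of a tile, assuming inclusive 1-based coordinates."""
    return end - start + 1


def tiling_budget(tiles: dict, capacity: int = CAPACITY_BASES):
    """Return (bases_used, bases_remaining, fits) for a set of tiles.

    `tiles` maps a tile name to its (start, end) coordinates.
    """
    used = sum(tile_length(start, end) for start, end in tiles.values())
    return used, capacity - used, used <= capacity


# Three adenovirus 3 tiles as listed in Table 5 (Start, End).
example_tiles = {
    "Ad3E1A": (1, 879),
    "Ad3HEXON": (1, 595),
    "Ad3FIBER": (1, 746),
}
example_budget = tiling_budget(example_tiles)
```

Here the three tiles consume 2,220 bases, leaving most of the assumed capacity available for further pathogens. Raising the feature density raises the capacity and therefore the number of targets that fit.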
[0282] It is also important to note that gene expression markers
can conceptually be tiled onto the array in the same manner as
genomic markers. Thus, sequences representing pathogen transcripts
(RNA templates coding for protein synthesis) can be simultaneously
positioned on arrays with those allowing resequencing of genomic
markers. This would enable the microarray to detect not only the
presence of a pathogen, but would also allow determination of
viability, since transcriptional markers would only be present in
intact pathogens.
TABLE 2. RPMV2 layout (Name)
Respiratory Pathogen Name BT Agent Name Species/Sub- Species/Sub-
Genus/ (CDC A; B; C) species species Species/Sub-species Species Rx
Res Adenovirus Subgroup A Hexon Fiber E1A Subgroup B1 Adenovirus 3
Hexon Fiber E1A Adenovirus 7 Hexon Fiber E1A Adenovirus 16 Hexon
Fiber E1A Adenovirus 21 Hexon Fiber E1A Subgroup B2 Adenovirus 11
Hexon Fiber E1A Adenovirus 35 Hexon Fiber E1A Subgroup C Adenovirus
1 Hexon Fiber E1A Adenovirus 2 Hexon Fiber E1A Adenovirus 5 Hexon
Fiber E1A Adenovirus 6 Hexon Fiber E1A Subgroup D Subgroup E
Adenovirus 4 Hexon Fiber E1A Subgroup F Hexon Fiber E1A Influenza
Influenza A Hemagglutinin 1 Neuraminidase 1 Neuraminidase 1 Matrix
(Full) (Full) (H5N1) Hemagglutinin 2 Neuraminidase 2 (Full)
Hemagglutinin 3 Neuraminidase 3 (Full) Hemagglutinin 4
Neuraminidase 4 Hemagglutinin 5 Neuraminidase 5 Hemagglutinin 6
Neuraminidase 6 Hemagglutinin 7 Neuraminidase 7 Hemagglutinin 8
Neuraminidase 8 Hemagglutinin 9 Neuraminidase 9 Hemagglutinin 10
Hemagglutinin 11 Hemagglutinin 12 Hemagglutinin 13 Hemagglutinin 14
Hemagglutinin 15 Influenza B Hemagglutinin B Neuraminidase B Matrix
Influenza C Hemagglutinin- Matrix esterase Parainfluenza
Parainfluenza 1 Hemagglutinin- Matrix neuraminidase Parainfluenza 2
Hemagglutinin- Matrix neuraminidase Parainfluenza 3 Hemagglutinin-
Matrix neuraminidase Parainfluenza 4A Hemagglutinin- Matrix
neuraminidase Rhinovirus 5' NCR HRV 5' NCR HRV21 5' NCR HRV29 5'
NCR 5' NCR 9501468 HRV HRV62 9501821 5' NCR HRV1A 5' NCR HRV58 5'
NCR HRV14 5' NCR HRV87 Coronavirus SARS Membrane Nucleocapsid Spike
Matrix Glycoprotein OC43 Hemagglutinin- Nucleocapsid Spike esterase
229E Surface Nucleocapsid Spike Glycoprotein Respiratory Syncytial
Virus Type 1 (RSV A) Nucleocapsid Matrix Type 2 (RSV B)
Nucleocapsid Matrix Streptococcus Streptococcus emm 1 ST2035
ST4529L ST4532 ST4264 ST4547 GyrA Erm(A); Erm pyogenes (B);
Erm(TR); MefA; MefE; prtF1; put. Rx resist prot.; tet(O) emm 75
emm13L ST3018 U92492 STI4973 STCMUK16 ST2267 U50338 ST2980 ST230-2
ST436 ST448L ST3365 ST1135 ST1161 ST1432 ST6949 ST1160 emm1 emm2
emm3 emm4 emm5 emm6 emm7 emm8 emm9 emm10 emm11 emm12 emm13 emm14
emm15 emm16 emm17 emm18 emm19 emm20 emm21 emm22 emm23 emm24 emm25
emm26 emm27 emm28 emm29 emm30 emm31 emm32 emm33 emm34 emm35 emm36
emm37 emm38 emm39 emm40 emm41 emm42 emm43 emm44 emm45 emm46 emm47
emm48 emm49 emm50 emm51 emm52 emm53 emm54 emm55 emm56 emm57 emm58
emm59 emm60 emm61 emm62 emm63 emm64 emm65 emm66 emm67 emm68 emm69
emm70 emm71 emm72 emm73 emm74 emm75 emm76 emm77 emm78 emm79 emm80
emm81 emm82 emm83 emm84 emm85 emm86 emm87 emm88 emm89 emm90 emm91
emm92 emm93 emm94 emm95 emm96 emm97 emm98 emm99 emm100 emm101
emm102 emm103 emm104 emm105 emm106 emm107 emm108 emm109 emm110
U74320 Csr R & Csr S sic SpeB Streptococcus ponA (Pbp1A) GyrA
GyrB; ParC pneumoniae Staphylococcus entQ entK tst seb GyrA MSR(A);
aureus mecR1; VanA; BlaZ; dfrA; qacC Mycoplasma P1 gene GyrA
pneumoniae Bordetella pertussis Pertussis toxin PrnA Chlamydia
Chlamydia OmpB DNA pneumoniae Gyrase Chlamydia psittaci OmpA SigA
Neisseria MviN GyrA meningitidis Bacillus Bacillus anthracis
protective antigen rpoB lethal factor edema GyrA factor (Cya)
Bacillus cereus rpoB Bacillus cry rpoB thuringiensis Bacillus
subtilis rpoB rpoB Yersinia pestis OmpA cve2155 sequence GyrA
Smallpox (variola Hemagglutinin SOD Chemo- major) kine binding
protein Monkeypox Hemagglutinin IA Varicella Hemagglutinin
Francisella TUL4 Region of mdh tularensis Difference (RD1)
Filoviridae Ebola L gene NP Protein Marburg L gene NP Protein
Arenaviridae Lassa L protein NP Protein Machupo L protein NP
Protein Burkholderia Burkholderia mallei PenA WaaF (Glanders)
Burkholderia PenA WaaF pseudomallei (Melioidosis) Burkholderia recA
cepacia Typhus fever Omp1 GyrA ermB (Rickettsia prowazekii)
Alphaviruses Venezuelan equine nonstructural nucleocapsid
encephalitis polyprotein proteins Eastern equine nonstructural
nucleocapsid encephalitis polyprotein proteins Western equine
nonstructural nucleocapsid encephalitis polyprotein proteins
Brucella Omp25 Omp2 RB51_WBOA_IS711J INSERT Brucella abortus (2308
B37) Brucella melitensis (F6145) Brucella suis (A44) GyrA Brucella
canis Brucella ovis Brucella neotoma Arcanobacterium 16S rRNA pld
hemolyticum Haemophilus OmpP5 (OmpA- GyrA influenzae family)
Moraxella hemagglutinin GyrA catarrhalis Mycobacterium Omp A GyrA
tuberculosis Clostridium Clostridium Ntnh Bont botulinum
Clostridium TmpC Epsilon Toxin GyrA perfringens Clostridium tetani
L & H chain Coxiella burnetii TolC GyrA Cryptosporidium Sod CP2
parvum E. coli O157:H7 Omp A Shiga Toxin I Shiga Toxin II GyrA
Ricinus communis Ricinus communis toxin Salmonella Salmonella
enterica OmpA GyrA Salmonella OmpA precursor typhimurium Shigella
Shigella dysenteriae OmpA Shigella flexneri OmpA GyrA Vibrio
cholerae OmpA CtxA & CtxB GyrA Nipah virus Nucleocapsid Matrix
Hantavirus Sin Nombre Nucleocapsid Glycoprotein Legionella MompS
GyrA pneumophila Histoplasma M antigen H antigen capsulatum
Blastomyces WI-1 bys1 dermatitidis Coccidioides bg12 Ag2 immitis
Varicella Varicella HHV-6 major capsid Major antigenic protein
structural protein Varicella HHV-3 major capsid immediate early
protein transactivator Epstein-Barr Virus major capsid immediate
early protein transactivator Corynebacterium Corynebacterium Dtx
gene GyrA diphtheriae Enterovirus (genus) Enterovirus Coxsackie
Coxsackie (subgroup) Echo (subgroup) Echovirus Polio (subgroup)
Poliovirus Paramyxoviridae Paramyxoviridae Hemagglutinin Matrix
morbillivirus (Rubeola) Newcastle Hemagglutinin- Matrix
Neuraminidase West Nile Virus Envelope Nonstructural
protein 5 Yellow Fever Envelope Nonstructural protein 5
Metapneumovirus Nucleocapsid Matrix Norwalk Virus RNA-dependent
Nucleocapsid RNA polymerase Dengue Virus capsid protein (C)
pre-M protein Foot & Mouth VP1 gene 3D gene St. Louis
polyprotein NS5 Encephalitis Rift Valley Fever nucleoprotein
nonstructural protein Usutu polyprotein tsutsugamushi fever GroEL
tsa Chandipura G Glycoprotein Matrix Man-made Insertion Bluescript
II (KS+) pUC pGEM Sequences Multiple Cloning Sites Rx Resistance
Markers
[0283] TABLE 3. RPMV2 layout (Accession Number)
Respiratory Pathogen Species/Sub- Species/Sub- Species/Sub-
Species/Sub- Species/Sub- Name species species species species
species Genus/Species Rx Res BT Agent Name (CDC A; B; C) Adenovirus
Subgroup A NC_001460 NC_001460 NC_001460 Subgroup B1 Adenovirus 3
AF542129 AY224415 AF492352 Adenovirus 7 X76551 Z48954 AY495969
Adenovirus 16 X74662 U06106 Adenovirus 21 AB053166 U06107 AF492353
Subgroup B2 Adenovirus 11 AF532578 L08232 BK001453 Adenovirus 35
AB052912 AB098563 AY128640 Subgroup C Adenovirus 1 AF534906
AB108423 AF534906 Adenovirus 2 AY224391 AY224410 BK000407
Adenovirus 5 AF542130 AY224411 AY147066 Adenovirus 6 X67710
AB108424 E1A Subgroup D NC_002067 NC_002067 NC_002067 Subgroup E
Adenovirus 4 X84646 X76547 E1A Subgroup F NC_001454 NC_001454
NC_001454 Influenza Criteria: Human; Swine; Avian (Newest 1st)
Influenza A AJ344014 AJ518092 AY526746 AJ458301 (vaccine) L11125
U42776 ISDN38157 AY300947 (vaccine) ISDN13277 K01030 (vaccine)
AF285885 AY526745 M24740 AJ507203 AF285887 AY338459 AY340079
AF310988 AY300948 AY294658 AY180830 AF311750 AF310986 AF310990
M26089 M35997 L43916 Influenza B AB126838 AY139074 AB036877
Influenza C AB093473 AB086809 Parainfluenza Parainfluenza 1
NC_003461 NC_003461 Parainfluenza 2 NC_003443 NC_003443
Parainfluenza 3 AY283063 NC_001796 Parainfluenza 4A E02727 E03809
Rhinovirus AF108164 AF108180 AF542420 AF542420 AF108184 AF108179
AF108183 AF108186 AF108187 Coronavirus SARS AY323974 AY365036
AY429079 AY390556 OC43 M76373 NC_005147 L14643 229E NC_002645
X51325 X16816 Respiratory Syncytial Virus Type 1 (RSV A) AF035006
AF035006 Type 2 (RSV B) AF013254 AF013254 Streptococcus emm source
Streptococcus pyogenes ST2035 ST4529L ST4532 ST4264 ST4547 AE006557
Erm(A); AY357120; Erm(TR); U70055; MefE; prtF1; AE006513 emm13L
ST3018 U92492 STI4973 STCMUK16 ST2267 U50338 ST2980 ST230-2 ST436
ST448L ST3365 ST1135 ST1161 ST1432 ST6949 ST1160 emm1 emm2 emm3
emm4 emm5 emm6 emm7 emm8 emm9 emm10 emm11 emm12 emm13 emm14 emm15
emm16 emm17 emm18 emm19 emm20 emm21 emm22 emm23 emm24 emm25 emm26
emm27 emm28 emm29 emm30 emm31 emm32 emm33 emm34 emm35 emm36 emm37
emm38 emm39 emm40 emm41 emm42 emm43 emm44 emm45 emm46 emm47 emm48
emm49 emm50 emm51 emm52 emm53 emm54 emm55 emm56 emm57 emm58 emm59
emm60 emm61 emm62 emm63 emm64 emm65 emm66 emm67 emm68 emm69 emm70
emm71 emm72 emm73 emm74 emm75 emm76 emm77 emm78 emm79 emm80 emm81
emm82 emm83 emm84 emm85 emm86 emm87 emm88 emm89 emm90 emm91 emm92
emm93 emm94 emm95 emm96 emm97 emm98 emm99 emm100 emm101 emm102
emm103 emm104 emm105 emm106 emm107 emm108 emm109 emm110 U74320
U11966 AF095713 AY229859 AB051298 Streptococcus X67867 AY157689
GyrB; ParC pneumoniae Staphylococcus aureus U93688 U93688 U93688
M11118 D10489 AF467080; AF142100; AE017171 Mycoplasma AF290002
X53555 pneumoniae Bordetella pertussis M13223 AJ507642 BX640413
(codon 286253) Chlamydia Chlamydia pneumoniae X53511 AB103388
Chlamydia psittaci AF269281 U04442 Neisseria meningitidis AE002384
AE002487 Bacillus Bacillus anthracis AF306783 AF205335 M29081
M24074 AY291534 Bacillus cereus AF205342 Bacillus thuringiensis
AF278797 AF205349 Bacillus subtilis AF205356 X02369 (orf 821)
Yersinia pestis NC_003143 AF350077 AE013898 Smallpox (variola
major) L22579 L22579 L22579 Monkeypox Varicella Francisella
tularensis M32059 AF469619 AF513319 Filoviridae Ebola NC_004161
NC_004161 Marburg NC_001608 NC_001608 Arenaviridae Lassa NC_004297
NC_004296 Machupo NC_005079 NC_005078 Burkholderia Burkholderia
mallei AY032868 AY124769 (Glanders) Burkholderia AY032869 AF097748
pseudomallei (Melioidosis) Burkholderia cepacia U70431 Typhus fever
(Rickettsia AJ235270 AJ235270 (codon AJ235270 prowazekii) 250672)
Alphaviruses Venezuelan equine L04653 L04653 encephalitis Eastern
equine NC003899 NC003899 encephalitis Western equine NC003908
NC003908 encephalitis Brucella Consensus Consensus Consensus
Brucella abortus (2308 B37) Brucella melitensis (F6145) Brucella
suis (A44) AE014411 Brucella canis Brucella ovis Brucella neotoma
Arcanobacterium X73952 L16583 hemolyticum Haemophilus influenzae
L20309 U32806 Moraxella catarrhalis AY077637 AF056196 Mycobacterium
BX842574 AE006915 tuberculosis Clostridium Clostridium botulinum
Y13630 X62683 Clostridium perfringens AP003191 X60694 AP003185
Clostridium tetani X04436 Coxiella burnetii AE016960 AE016960
Cryptosporidium AF529280 AY471868 parvum E. coli O157:H7 AE005582
AB083044 AB048837 NC_002655 Ricinus communis X52908 Salmonella
Salmonella enterica AL627269 AE016836 (codon 70224) Salmonella
typhimurium AE008746 Shigella Shigella dysenteriae V01344 Shigella
flexneri AE015125 AE016986 Vibrio cholerae AF030977 Gary Custom
NC_002505 Nipah virus NC_002728 NC_002728 Hantavirus Sin Nombre
NC_005216 L33474 Legionella AF078136 AY091594 pneumophila
Histoplasma AF026268 U20346 capsulatum Blastomyces S63772 AF277079
dermatitidis Coccidioides immitis AF022893 U32518 Varicella
Varicella HHV-6 NC_001664 NC_001664 Varicella HHV-3 NC_001348
NC_001348 Epstein-Barr Virus NC_001345 NC_001345 Corynebacterium
Corynebacterium A04646 BX248354 diphtheriae Enterovirus (genus)
NC_001612 Coxsackie (subgroup) AF499635 Echo (subgroup) NC_003986
Polio (subgroup) NC_002058 Paramyxoviridae Paramyxoviridae AY523581
NC_001498 morbillivirus (Rubeola) Newcastle AY510092 NC_002617 West
Nile Virus AF346319 AF208017 Yellow Fever AY359908 AF013417
Metapneumovirus AY145272 AY145271 Norwalk Virus NC_001959 NC_001959
Dengue Virus NC_001474 NC_001474 Foot & Mouth NC_004004
NC_004004 St. Louis Encephalitis AY289618 AF013416 Rift Valley
Fever X53771 X53771 Usutu AF452643 tsutsugamushi fever AY191589
AY283180 Chandipura J04350 AF128868 Man-made Insertion Stratagene
X06404 X65312 Sequences Provided Multiple Cloning Sites Rx
Resistance Markers
[0284] TABLE 4. RPMV2 layout (Size)
Respiratory
Pathogen Name BT Agent Name Species/Sub- Species/Sub- (CDC A; B; C)
species species Species/Sub-species Genus/Species Rx Res Adenovirus
Subgroup A 836 860 597 Subgroup B1 Adenovirus 3 685 829 880
Adenovirus 7 605 829 880 Adenovirus 16 673 906 Adenovirus 21 759
637 819 Subgroup B2 Adenovirus 11 629 1025 789 Adenovirus 35 641
711 789 Subgroup C Adenovirus 1 667 802 870 Adenovirus 2 789 698
870 Adenovirus 5 684 826 1055 Adenovirus 6 785 578 Subgroup D 644
783 579 Subgroup E Adenovirus 4 2811 1375 616 Subgroup F 828 659
909 Influenza Influenza A 1692 1459 575 734 757 1410 1042 804 1323
257 303 865 839 691 770 946 849 795 553 444 727 680 690 813 715 745
Influenza B 737 761 715 Influenza C 401 862 Parainfluenza
Parainfluenza 1 1649 910 Parainfluenza 2 1646 693 Parainfluenza 3
1581 498 Parainfluenza 4A 638 807 Rhinovirus 508 499 338 503 501
511 504 520 506 Coronavirus SARS 666 1269 3768 666 OC43 828 557
1042 229E 753 789 1393 Respiratory Syncytial Virus Type 1 (RSV A)
907 958 Type 2 (RSV B) 554 826 Streptococcus Streptococcus pyogenes
675 425 713 661 340 815 Erm(A); 763; 732; 563; MefE; prtF1 200 316
371 415 680 413 384 401 688 350 355 554 619 475 664 650 635 325 360
391 337 490 437 emm7 emm8 509 emm10 500 364 emm13 439 652 emm16 187
emm18 381 emm20 emm21 620 258 421 561 936 317 333 328 265 339 299
328 340 emm35 410 270 481 369 340 488 408 306 391 emm45 315 331 489
485 287 423 499 414 293 405 389 338 391 433 452 325 334 429 538 640
528 335 526 566 440 300 256 328 306 451 353 450 415 322 354 432 497
421 324 448 404 420 346 379 483 393 363 403 516 408 358 357 351 372
340 350 354 emm103 emm104 emm105 emm106 emm107 emm108 emm109 emm110
533 292 904 1008 681 Streptococcus pneumoniae 1237 815 GyrB; ParC
Staphylococcus aureus 771 729 705 713 821 400; 652; 1032; 846; ##;
## Mycoplasma pneumoniae 2526 809 Bordetella pertussis 824 730 815
Chlamydia Chlamydia pneumoniae 982 824 Chlamydia psittaci 991 835
Neisseria meningitidis 856 941 Bacillus Bacillus anthracis 551 777
638 985 732 Bacillus cereus 777 Bacillus thuringiensis 805 777
Bacillus subtilis 780 812 Bacillus globigii Yersinia pestis 913
517 812 Smallpox (variola major) 942 378 762 Monkeypox 764
Varicella Francisella tularensis 786 3322 960 Filoviridae Ebola 752
758 Marburg 1170 799 Arenaviridae Lassa 973 703 Machupo 1540 715
Burkholderia Burkholderia mallei 1117 (Glanders) Burkholderia
pseudomallei 1117 1100 (Melioidosis) Burkholderia cepacia 563 Typhus
fever (Rickettsia 985 968 1560 prowazekii) Alphaviruses Venezuelan
equine 923 1512 encephalitis Eastern equine 1312 975 encephalitis
Western equine 878 902 encephalitis Brucella 582 2428 1291 Brucella
abortus (2308 B37) Brucella melitensis (F6145) 995 Brucella suis
(A44) Brucella canis Brucella ovis Brucella neotoma Arcanobacterium
1489 1111 hemolyticum Haemophilus influenzae 937 896 Moraxella
catarrhalis 653 321 Mycobacterium 932 818 tuberculosis Clostridium
Clostridium botulinum 499 1000 Clostridium perfringens 1113 572 810
Clostridium tetani 1138 Coxiella burnetii 1851 812 Cryptosporidium
parvum 375 2304 E. coli O157:H7 660 948 960 812 Ricinus communis
1133 Salmonella Salmonella enterica 904 812 Salmonella typhimurium
904 Shigella Shigella dysenteriae 907 Shigella flexneri 898 812
Vibrio cholerae 942 984 887 Nipah virus 858 1359 Hantavirus Sin
Nombre 639 1293 Legionella pneumophila 1157 236 Histoplasma
capsulatum 919 1082 Blastomyces dermatitidis 942 912 Coccidioides
immitis 965 1234 Varicella Varicella HHV-6 890 1236 Varicella HHV-3
822 781 Epstein-Barr Virus 971 1317 Corynebacterium Corynebacterium
913 818 diphtheriae Enterovirus (genus) 1758 Coxsackie (subgroup)
920 Echo (subgroup) 1277 Polio (subgroup) 1226 Paramyxoviridae
Paramyxoviridae 1854 1008 morbillivirus (Rubeola) Newcastle 1734
1232 West Nile Virus 1504 917 Yellow Fever 1547 1035
Metapneumovirus 1185 765 Norwalk Virus 961 712 Dengue Virus 300 498
Foot & Mouth 633 799 St. Louis Encephalitis 679 1035 Rift
Valley Fever 738 798 Usutu 1035 tsutsugamushi fever 546 1011
Chandipura 704 755 Man-made Insertion 2961 3914 2912 Sequences
Multiple Cloning Sites Rx Resistance Markers Chip Size 293826
[0285] The sequences submitted for RPMV2 tiling and chip
fabrication were based on the Affymetrix instruction file
summarized in Table 5, which corresponds to the sequences appearing
as SEQ ID NOs: 59-421.
TABLE 5. RPMV2 Affymetrix instructions file for tiling and chip fabrication
Name Alias Start End SEQ ID NO: StartSeq EndSeq
ATTIM1 ATTIM1 1 523 59 ACATCGAC
GAGCTTGC Ad3E1A Ad3E1A 1 879 60 TGAGACAC GGCAATGA Ad3HEXON Ad3HEXON
1 595 61 CCAGCACA CATGTGGT Ad3FIBER Ad3FIBER 1 746 62 AAGCGGCT
AACCCTCA Ad7E1A Ad7E1A 1 868 63 TGAGACAC AAACTGCC Ad7HEXON Ad7HEXON
1 513 64 CCAGCACA TGCTGACG Ad7FIBER Ad7FIBER 1 977 65 TGACCAAG
ACGACTGA Ad7NAVYE1A Ad7NAVYE1A 1 879 66 ATGAGACA AGGCAATA
Ad7NAVYHEXON Ad7NAVYHEXON 1 504 67 CCAGCACA TGCTGACG Ad7NAVYFIBER
Ad7NAVYFIBER 1 975 68 TGGCTTTA CCCCACGC Ad16E1A Ad16E1A 1 878 69
TGAGACAC GGCAATGA Ad16HEXON Ad16HEXON 1 667 70 CGTTGACC AGGACACC
Ad16FIBER Ad16FIBER 1 652 71 GGGAGATG AGGAAGCC Ad21E1A Ad21E1A 1
878 72 TGAGACAC GACAATGA Ad21HEXON Ad21HEXON 1 807 73 CGCCACAG
GTGTACAA Ad21FIBER Ad21FIBER 1 685 74 GCTCAGTG CCTTTATG Ad11E1A
Ad11E1A 1 872 75 TGAGAGAT GACAATAA Ad11HEXON Ad11HEXON 1 677 76
CGCCACAG TCAGAAAG Ad11FIBER Ad11FIBER 1 977 77 TGACCAAG ACGACTGA
Ad35E1A Ad35E1A 1 872 78 TGAGAGAT GACAATAA Ad35HEXON Ad35HEXON 1
689 79 CGCCACAG TGAAAAAA Ad35FIBER Ad35FIBER 1 971 80 TGACCAAG
ACAACTAA Ad1E1A Ad1E1A 1 986 81 TGAGACAT GGCCATAA Ad1HEXON Ad1HEXON
1 715 82 AGTGGTCT TGCTACGG Ad1FIBER Ad1FIBER 1 750 83 GATGCTGT
AAGAATAA Ad2E1A Ad2E1A 1 983 84 TGAGACAT GGCCATAA Ad2HEXON Ad2HEXON
1 837 85 AGTGGTCT CGACCGGC Ad2FIBER Ad2FIBER 1 750 86 ATAGCTAT
AGGAATAA Ad5E1A Ad5E1A 1 985 87 TGAGACAT GGCCATAA Ad5HEXON Ad5HEXON
1 732 88 AGTGGTCT TGTAAAGC Ad5FIBER Ad5FIBER 1 747 89 ACAGCCAT
AAGAATAA Ad6E1A Ad6E1A 1 985 90 TGAGACAT GGCCATAA Ad6HEXON Ad6HEXON
1 833 91 GAATGAAG AATTGGGA Ad6FIBER Ad6FIBER 1 750 92 TCCTCAAA
CAGAAAAT Ad4E1A Ad4E1A 1 865 93 TGAGGCAC GGCATTAA Ad4HEXON Ad4HEXON
1 2810 94 TGGCCACC CCACATAA Ad4FIBER Ad4FIBER 1 1277 95 TGTCCAAA
AACAATAA Ad4AFE1A Ad4AFE1A 1 832 96 TGAGGCAC GACATTAA Ad4AFHEXON
Ad4AFHEXON 1 2810 97 TGGCCACC CCACATAA Ad4AFFIBER Ad4AFFIBER 1 1277
98 TGTCCAAA AAGAATAA Ad12E1A Ad12E1A 1 597 99 ATGAGAAC GGAGGTGA
Ad12HEXON Ad12HEXON 1 884 100 CCTACTTC TGCAAGAC Ad12FIBER Ad12FIBER
1 908 101 CAGCAGAA CGTTGCCG Ad17E1A Ad17E1A 1 579 102 ATGAGACA
GAGGCTGA Ad17HEXON Ad17HEXON 1 692 103 CTTCAGCC GAAGAATA Ad17FIBER
Ad17FIBER 1 829 104 TCCTGTCA TATCAGCC Ad40E1A Ad40E1A 1 824 105
CTTGAGTG ATAGAAGA Ad40HEXON Ad40HEXON 1 876 106 CGCAATGG TAATTACA
Ad40FIBER Ad40FIBER 1 707 107 CACTGACA TCAGTGTC FluAHA1 FluAHA1 1
1692 108 ATGAAAGC GAATATGC FluAHA2 FluAHA2 1 805 109 AAGGTCGA
TTTGGGAG FluAHA3 FluAHA3 1 1042 110 CAAAAACT AATGGTGG FluAHA4
FluAHA4 1 1371 111 AGGAAATC GACAAAGG FluAHA5 FluAHA5 1 303 112
ATGCCCCA TTTAACAA FluAHA6 FluAHA6 1 887 113 CCGTCACA GACTAAGA
FluAHA7 FluAHA7 1 818 114 AAATCCTG TGGGAATT FluAHA8 FluAHA8 1 897
115 CTCTTGGC GCAAACCC FluAHA9 FluAHA9 1 601 116 ACTCCACA CCAAGGCC
FluAHA10 FluAHA10 1 775 117 CCTGGAGC GTATGGTT FluAHA11 FluAHA11 1
728 118 CTGCATTC AGAGGCAA FluAHA12 FluAHA12 1 738 119 CACTGTTC
GGCCAAAC FluAHA13 FluAHA13 1 1765 120 AGCAAAAG TTTCTACT FluAHA14
FluAHA14 1 763 121 CACAAATG CAAGAGGC FluAHA15 FluAHA15 1 793 122
ACGGAGAC CCCTTTGC FluANA1-1 FluANA1-1 1 1459 123 CAAAAGCA TTTCTACT
FluANA1-2 FluANA1-2 1 575 124 TGCCATGA ATGATTTG FluANA2 FluANA2 1
1062 125 TCATGCGA TTTTAGAA FluANA3 FluANA3 1 852 126 GCCCTTTC
TGAAGTCA FluANA4 FluANA4 1 257 127 AGCAAAAG CAGCCCCC FluANA5
FluANA5 1 913 128 CGGTGAGA GCGGGAAG FluANA6 FluANA6 1 739 129
AGAGGATG TTGCATTC FluANA7 FluANA7 1 994 130 AGGAGGGT ACACCAGC
FluANA8 FluANA8 1 843 131 CAATACAG ATTAGCAG FluANA9 FluANA9 1 444
132 AACCTGAA GTCAATAT FluAH1N1MATRIX FluAH1N1MATRIX 1 734 133
ATGGAATG TAAACACG FluAH5N1MATRIX FluAH5N1MATRIX 1 657 134 AGACCAAT
TTGCACTT FluBHA FluBHA 1 785 135 GGGAAGTC AGGTAATA FluBNA FluBNA 1
809 136 GCCCTCAT CTCGAACG FluBMATRIX FluBMATRIX 1 763 137 GGAGAAGG
ATGGCTTG FluCHA FluCHA 1 401 138 CTTCTTGC ATGATCAT FluCMATRIX
FluCMATRIX 1 862 139 ATGTCCGA TTATATAA PIV1HN PIV1HN 1 1728 140
ATGGCTGA CATCTTGA PIV1MATRIX PIV1MATRIX 1 958 141 CCGGAGAA CAGTAGAA
PIV1NC PIV1NC 1 1682 142 AGGGTTAA AAGAAAAA PIV2HN PIV2HN 1 1716 143
ATGGAAGA TACCTTAA PIV2MATRIX PIV2MATRIX 1 741 144 CTTGCCTC CAGGTCGG
PIV2NC PIV2NC 1 1849 145 AGATTCGG AGAAAAAA PIV3HN PIV3HN 1 1725 146
ATGGAATA AATCATAA PIV3MATRIX PIV3MATRIX 1 544 147 CCAACAAA CCTGGCGA
PIV3NC PIV3NC 1 1548 148 ATGTTGAG GCAACTAA PIV4HN PIV4HN 1 686 149
GACGGGAG AAAGATTG PIV4MATRIX PIV4MATRIX 1 855 150 GGAACGGT TTGGCTCA
HRV14NCR HRV14NCR 1 520 151 TGATGTAC GTTTCTCA HRV1ANCR HRV1ANCR 1
511 152 TTCCGGTA GTTTCACT HRV21NCR HRV21NCR 1 499 153 TTCCGGTA
GTTTCACT HRV29NCR HRV29NCR 1 676 154 CGAAAACA TTGGGTGT HRV58NCR
HRV58NCR 1 504 155 TCACGGTA GTTTCCTG HRV62NCR HRV62NCR 1 501 156
TTCCGGTA GTTTCACT HRV87NCR HRV87NCR 1 506 157 TCTTGGTA GTTTCACT
HRV95NCR HRV95NCR 1 508 158 TTCCGGTA GTTTCTTG RSVABL RSVABL 1 379
159 AAGTGCTC AAGCAAAC RSVAMATRIX RSVAMATRIX 1 958 160 GGGGCAAA
ATAAAAAA RSVANC RSVANC 1 955 161 TCCAACGG CCGAGGAA RSVBMATRIX
RSVBMATRIX 1 770 162 ATGGAAAC GAGGATTA RSVBNC RSVBNC 1 602 163
GATGGGAG TACGCCAA HCV229ESPIKE HCV229ESPIKE 1 1534 164 GTTGATTG
GCCGTGGT HCV229EMEM HCV229EMEM 1 678 165 ATGTCAAA TTTTCTAA
HCV229ENC HCV229ENC 1 924 166 ATGGCTAC GTCACATT HCVOC43SPIKE
HCVOC43SPIKE 1 1456 167 ATTGATTG GCCTTGGT HCVOC43MEM HCVOC43MEM 1
693 168 ATGAGTAG ATATCTAA HCVOC43NC HCVOC43NC 1 966 169 AGAGCTCA
GTACACTT SARSSPIKE SARSSPIKE 1 1438 170 GTAGATTG GCCTTGGT SARSMEM
SARSMEM 1 666 171 ATGGCAGA TACAGTAA SARSNC SARSNC 1 932 172
ATGGGGCA ACAAAGAT HCVNL63SPIKE HCVNL63SPIKE 1 1534 173 GTTGATTG
GCCTTGGT HCVNL63ORF3 HCVNL63ORF3 1 678 174 ATGCCTTT TTAATTGA
HCVNL63MEM HCVNL63MEM 1 681 175 ATGTCTAA TAATCTAA HCVNL63NC
HCVNL63NC 1 879 176 ATGGCTAG GTGAGGTT MPVMATRIX MPVMATRIX 1 765 177
ATGGAGTC CCAGATAA
MPVNC MPVNC 1 1185 178 ATGTCTCT ATGAGTAA HHV1L HHV1L 1 1061 179
TACCAGGG AAGCGCCT HHV1CAPSID HHV1CAPSID 1 993 180 CGGGCGCC GTGGGCGT
HHV3L HHV3L 1 1061 181 TATAAAGG CGTCGCTT HHV3CAPSID HHV3CAPSID 1
993 182 CGGGAGCC ATGGGCAT HHV4L HHV4L 1 1067 183 TACCAGGG ACCCAGAT
HHV4CAPSID HHV4CAPSID 1 992 184 CGCCGACA CTGGGCAT HHV5L HHV5L 1
1136 185 TACCAGGG TCTAACCT HHV5CAPSID HHV5CAPSID 1 998 186 CGCGCAGC
CTGGGCCT HHV6L HHV6L 1 1058 187 TACAAAGG CCGAATCT HHV6CAPSID
HHV6CAPSID 1 1001 188 CGCGCAGC TTGGGCAT ENTEROVIRUS ENTEROVIRUS 1
1758 189 CACCAATG GATAGATA COXSACKIEVIRUS COXSACKIEVIRUS 1 920 190
CAATGCAA TCTTGAGG ECHO ECHO 1 1277 191 CACTTGCC ACAAAGAG POLIO
POLIO 1 1226 192 TGGATAGT ACTTATGT POLIO1NCR POLIO1NCR 1 436 193
CAAGCACT TGACAATC POLIO2NCR POLIO2NCR 1 437 194 CAAGCACT TGACAATC
POLIO3NCR POLIO3NCR 1 437 195 CAAGCACT TGACAATC MEASLESHA MEASLESHA
1 1854 196 ATGTCACC GCGGATAG MEASLESMATRIX MEASLESMATRIX 1 1008 197
ATGACAGA TTCTGTAG NEWCASTLEHN NEWCASTLEHN 1 1734 198 ATGGACCG
CTGGCTAG NEWCASTLEMATRIX NEWCASTLEMATRIX 1 1232 199 ACGGGTAG
CATCAAGT WNE WNE 1 1504 200 TTCAACTG GCACGCTG WNNS WNNS 1 917 201
GGCTGCTG GGGAAGGA WNCM WNCM 1 432 202 GGCCAATA TGATCCAG YFE YFE 1
1547 203 CTGCATTG TAGAGACT YFNS YFNS 1 1035 204 AAGCTGTC AGGGAGAG
VMVG3R VMVG3R 1 762 205 ATGAAACA GTGTCTGA VMVHA VMVHA 1 942 206
ATGACACG AAGTCTAG VMVSOD VMVSOD 1 378 207 ATGGCTGT GCGTTTGA VMVCRMB
VMVCRMB 1 291 208 TCGGGAAC CGTCTGTT MONKEYPOX MONKEYPOX 1 812 209
GTGAATGC TTTCGACG EBOLAL EBOLAL 1 800 210 AGTTGGAC GAAACACG EBOLANP
EBOLANP 1 806 211 AGGAGTAA CGACAATC EBOLAMATRIX EBOLAMATRIX 1 1498
212 GATGAAGA AAGAAAAA MARBURGL MARBURGL 1 1218 213 GCGGCACT
CAATTGAC MARBURGNP MARBURGNP 1 847 214 TCACAGAA GTCATTTG LASSAL
LASSAL 1 1021 215 GCATCTGG ACTACCTC LASSANP LASSANP 1 751 216
ATGGAGTG AGTTCAGG LASSAGP LASSAGP 1 1476 217 ATGGGACA AGAGATGA
MACHUPOL MACHUPOL 1 1588 218 GTGGCTGA GAGGCTAA MACHUPONP MACHUPONP
1 763 219 TTGAAGAC GGCACTAT MACHUPOG MACHUPOG 1 1491 220 ATGGGGCA
GACATTAA VEEVNS VEEVNS 1 923 221 GACAGCCC AAAGTGAC VEEVNC VEEVNC 1
1512 222 GGCCACCT AGCATATC EEEVNS EEEVNS 1 1312 223 GAGATAGA
ATTGCGTC EEEVNC EEEVNC 1 975 224 CCTGACTT TCAGCTAT WEEVNS WEEVNS 1
878 225 CGTATGTC CCACAATG WEEVNC WEEVNC 1 902 226 TGTTCTAG TGGCGACT
NIPAHMATRIX NIPAHMATRIX 1 1359 227 AGGAGACA ACAAAAAA NIPAHN NIPAHN
1 858 228 AGGAATCT CAATCAGC SINNOMBREGP SINNOMBREGP 1 1293 229
ATAGCTGG GGATGGAT SINNOMBRENG SINNOMBRENG 1 639 230 TCACTCTC
TGTGATTG NORWALKL NORWALKL 1 739 231 TTCTCCAT ATTCGTAA
NORWALKCAPSID NORWALKCAPSID 1 760 232 TGGTACCG CTGGATGG
DENGUECAPSID DENGUECAPSID 1 300 233 ATGAATGA GACGTAGA DENGUEM
DENGUEM 1 498 234 TTTCATCT CAATGACA DENGUE1NCR DENGUE1NCR 1 157 235
GGTTAGAG GCTGTCTC DENGUE2NCR DENGUE2NCR 1 159 236 GGTTAGAG GCTGTCTC
DENGUE3NCR DENGUE3NCR 1 156 237 GGTTAGAG GCTGTCTC DENGUE5NCR
DENGUE5NCR 1 162 238 GGTTAGAG GCTGTCTC FMDVVP1 FMDVVP1 1 633 239
ACCACCTC CAAAACAG FMDV3D FMDV3D 1 846 240 GTTGATCG ACGGAGCA SLEVNS5
SLEVNS5 1 1035 241 AAGACTGG AGGGTGAG SLEVPP SLEVPP 1 727 242
CTCGGTAG GTTTCACG RVFVN RVFVN 1 738 243 ATGGACAA CAGCCTAA RVFVNS
RVFVNS 1 798 244 ATGGATTA TTGATTAG USUTUPP USUTUPP 1 1035 245
AAGCTCGG CAGGTGAG JEVPP JEVPP 1 1035 246 AAGCCTGG AAGGAGAG
CHANDIPURAMATRIX CHANDIPURAMATRIX 1 755 247 AACAGAAA GAAAAAAA
CHANDIPURAGP CHANDIPURAGP 1 752 248 ATCACTCT GTAGTTGT ATTIM2 ATTIM2
1 523 249 ACATCGAC GAGCTTGC ATTIM3 ATTIM3 1 523 250 ACATCGAC
GAGCTTGC SPYEMM1 SPYEMM1 1 398 251 GCTTCAGT CAGGCAAG SPYEMM2
SPYEMM2 1 360 252 GCATCCGT GAGAAGTC SPYEMM3 SPYEMM3 1 391 253
ACGGCTTC GGAATATC SPYEMM4 SPYEMM4 1 337 254 AGCATCAG GTCAATAT
SPYEMM5 SPYEMM5 1 490 255 ACTGCATC AAAAGATA SPYEMM6 SPYEMM6 1 437
256 TACTGCAT CTTAAAAA SPYEMM9 SPYEMM9 1 509 257 CAGGTACA CTGCTCTT
SPYEMM11 SPYEMM11 1 500 258 GCATCCGT AATCACCA SPYEMM12 SPYEMM12 1
364 259 GCTTCAGT AAATGATG SPYEMM13L SPYEMM13L 1 325 260 CAGCATCC
AAAAATCA SPYEMM18 SPYEMM18 1 524 261 ACTGCTTC GAAGAACA SPYEMM22
SPYEMM22 1 620 262 GCATCAGT GACGCAAG SPYEMM28 SPYEMM28 1 333 263
CAGCATCC AGAACGTC SPYEMM29 SPYEMM29 1 328 264 TGCATCAG AAGAACAG
SPYEMM44 SPYEMM44 1 391 265 CAGCATCA CAAGAACA SPYEMM61 SPYEMM61 1
325 266 GCATCAGT AGAACGTC SPYEMM75 SPYEMM75 1 451 267 TCCGTAGC
AAGCCGTG SPYEMM77 SPYEMM77 1 450 268 GCTCAGTA AGCTGAGC SPYEMM89
SPYEMM89 1 378 269 CATCAGTA AGAAAAGC SPYEMM94 SPYEMM94 1 516 270
GCATCAGT CAGACGCA SPYCSR SPYCSR 1 952 271 TGGTCCTA CCCAGGCT SPYSFB1
SPYSFB1 1 615 272 AGAACCTG GGCATGAG SPYSPEB SPYSPEB 1 729 273
ACTCTACC TATCGATG SPNGYRA SPNGYRA 1 815 274 GAGGATTT ACTGATAC
SPNLYTA SPNLYTA 1 125 275 TATCGAAC CTCAGACC SPNPLY SPNPLY 1 99 276
GGTTTGGC ATCAAGAT SAUGYRA SAUGYRA 1 821 277 GAAGACTT ACTAATGC
SAUTST SAUTST 1 705 278 ATGAATAA TTAATTAA SAUENTK SAUENTK 1 729 279
ATGAAAAA CGATATAA SAUENTQ SAUENTQ 1 771 280 ATGCCTAT CTGAATAA
CPNGYRA CPNGYRA 1 824 281 GAAGACAT TCGAGTCA CPNOMPB CPNOMPB 1 1030
282 GCGAAGCT TCAGGTCC CPNMOMPVD4 CPNMOMPVD4 1 150 283 ATGCTGAT
TCAGATCA CPNMOMPVD2 CPNMOMPVD2 1 133 284 AGCGTTCA TAGGCGCT CPNRPOB
CPNRPOB 1 346 285 AAGGACTT CTGCAGGC CPSOMPA CPSOMPA 1 991 286
GGAACCCA TCGATTCA CPSSIGA CPSSIGA 1 883 287 CGCAAGCT GGTTCAGC
CDIDTX CDIDTX 1 913 288 GACGTGGT TTCTCCGG CDIGYRA CDIGYRA 1 818 289
GAAGACCT ACCTCCGC CDIDTXR CDIDTXR 1 1124 290 AATGAGTG GCGCCTGT
HINGYRA HINGYRA 1 896 291 GAAGATTT ACTGATGC HINOMPA HINOMPA 1 937
292 GCGTTAAA CCAGACCG LPNGYRA LPNGYRA 1 236 293 GATGTCGG GACCGTCG
LPNMOMPS LPNMOMPS 1 1157 294 GTCCTTAC TCATTAGA MCAGYRA MCAGYRA 1
321 295 AACTGGAA AGATTCCC MCAHA MCAHA 1 653 296 GATCAATC AATGGTCA
MTUGYRA MTUGYRA 1 818 297 GAGGATTT ACTTCCGG MTUOMPA MTUOMPA 1 932
298 GACGAACT TCAACTAA MTURPOB MTURPOB 1 411 299 TACGGTCG ACGCCGTA
MPNGYRA MPNGYRA 1 809 300 GAGGACTT TCTTCAGC MPNP1 MPNP1 1 2570 301
CAGTTGCA ACGCGAGC NMEGYRA NMEGYRA 1 941 302 GAAGACCT ACCAGCGG
NMEMVIN NMEMVIN 1 904 303 GATGAATA ATACGGAA
NMECTRA NMECTRA 1 135 304 TTGGATGC TTTTGCTG NMECRGA NMECRGA 1 254
305 GGTGCTGC TGCCGGTC AHE16S AHE16S 1 1489 306 CGAACGCT CCGGAAGG
AHEPLD AHEPLD 1 1111 307 GCAAAGTG CTCCTTTT BANGYRA BANGYRA 1 732
308 GAAGACTT AGACTTGT BANLEF BANLEF 1 685 309 ATATCGAG CTAGGTGC
BANPAG BANPAG 1 599 310 CAGAAGTG GGATAGCG BANRPOB BANRPOB 1 777 311
TAGTTCGC AGGGGATA BANCYA BANCYA 1 545 312 GCGATGAT CTGTCGAG BANCAPB
BANCAPB 1 246 313 TTACACGT ACCTATTA BCERPOB BCERPOB 1 777 314
TAGTTCGC AGGGGATA BSUGYRA BSUGYRA 1 812 315 GAAGATCT ACAGCTAG
BSURPOB BSURPOB 1 780 316 TTGTTCGG AGGCGACA BTHCRY BTHCRY 1 853 317
AGAACACA ATCGCATC BTHRPOB BTHRPOB 1 777 318 TAGTTCGC AGGGGATA
BPEGYRA BPEGYRA 1 815 319 GAAGACCT ACCACCGG BPEPRNA BPEPRNA 1 777
320 GGTTCAAG CGCCGACA BMEGYRA BMEGYRA 1 995 321 GAAGACCT TCGGATGG
BABRB51 BABRB51 1 1339 322 ATCCCATA AGCTAGTA BABOMP25 BABOMP25 1
630 323 AAGTCTCT AGTTCTAA BABOMP2 BABOMP2 1 1434 324 TGTTCTTC
GAGAGCAG BCAOMP2 BCAOMP2 1 1434 325 TGTTCTTC GAGAGCAG BMEOMP2
BMEOMP2 1 1434 326 TGTTCTTC GAGAGCAG BNEOMP2 BNEOMP2 1 1434 327
TGTTCTTC GAGAGCAG BOVOMP2 BOVOMP2 1 1449 328 TGTTCTTC GAGAGCAG
BSUIOMP2 BSUIOMP2 1 1434 329 TGTTCTTC GAGAGCAG BMAPENA BMAPENA 1
1117 330 GAGAGCTG AAGGTTCA BMAWAAF BMAWAAF 1 1015 331 CGTTGGTT
GGGATGCT BPSPENA BPSPENA 1 1117 332 GAGAGCTG AAGGTTCA BPSWAAF
BPSWAAF 1 1100 333 AGCGCGGC GTCCGCGG BCEPRECA BCEPRECA 1 611 334
CATGGAAG CAACCAGA CPEGYRA CPEGYRA 1 810 335 GAAGACTT ATAAATAG
CPETMPC CPETMPC 1 1113 336 ATGAAAAA TAAATTAA CBUGYRA CBUGYRA 1 812
337 GAAGATTT AGTGATAA CBUTOLC CBUTOLC 1 745 338 ATTTAGAC CTAGGAAA
FTURD1A FTURD1A 1 531 339 ATGAAAAA CAATTTAG FTURD1B FTURD1B 1 285
340 ATGGCTTT TAGACTAG FTUTUL4 FTUTUL4 1 834 341 GGCGAGTG CCAACCAC
FTUMDH FTUMDH 1 960 342 ATGGCTAG CAAAATAA FTU13KD FTU13KD 1 431 343
ATCGTAAT TAAGTATG FTUFOPA FTUFOPA 1 111 344 CAGATATA GATACTAC
OTSGROEL OTSGROEL 1 546 345 GTTGAAGT AAGAAAAA OTSSTA56 OTSSTA56 1
1059 346 CTAGTGCA AGCAGTAG RPRGYRA RPRGYRA 1 968 347 GAAGATTT
ACAAATAG RPROMP1 RPROMP1 1 985 348 TATATAAA ACAAGCTA YPEGYRA
YPEGYRA 1 812 349 GAAGACCT ACTGATGC YPEOMPA YPEOMPA 1 913 350
GTGGTAAA CCAGATCG YPECVE YPECVE 1 517 351 GTACAGAT TGAGGTAC YPECAF1
YPECAF1 1 525 352 TATGAAAA ATATAGAT ACAHAG ACAHAG 1 1082 353
GGTTGCGC TGCTCTCG ACAMAG ACAMAG 1 919 354 CCGTCTGT GTCATGTA ACAGH17
ACAGH17 1 810 355 ACACAGCA AAAAAAAA BDEWI-1 BDEWI-1 1 942 356
GGATCCAT TTTTTGTG BDEBYS1 BDEBYS1 1 912 357 ATGCATCT ATGATAAC
CIMAG2 CIMAG2 1 1234 358 CTCTCCCT TTTTGTTA CIMBG12 CIMBG12 1 965
359 ATAGAGGG GAAACGAT CPACP2 CPACP2 1 735 360 CTGAGGAA TTCAAAAA
CPASOD CPASOD 1 375 361 TTGAATTC GTGATGTA ECOGYRA ECOGYRA 1 812 362
GAAGATCT ACCGATGC ECOOMPA ECOOMPA 1 660 363 ATGAAGAA CGCTGTAA
SENGYRA SENGYRA 1 812 364 GAAGATCT ACGGATGC SENOMPA SENOMPA 1 904
365 GTGCTAAA CCGGATCG SDYOMPA SDYOMPA 1 907 366 GTGCTAAA CCGGATCG
SFLGYRA SFLGYRA 1 812 367 GAAGATCT ACCGATGC SFLOMPA SFLOMPA 1 898
368 GTGCTAAA CCGGATCG VCHGYRA VCHGYRA 1 887 369 GAAGAGCT ACCAATGC
VCHOMPA VCHOMPA 1 942 370 ATGAAAAA TCCCTGAA MSRA MSRA 1 400 371
GCAAATGG ATCACATG MECR1 MECR1 1 652 372 ATGGAGGT GAATCGAT MEFA MEFA
1 611 373 AATATGGG ACTACGGC ERMTR ERMTR 1 732 374 ATGAAACA TTCAATAA
ERMB ERMB 1 763 375 GATGTATC GGAAATAA EMRB EMRB 1 1560 376 ATGCCAAA
TTCACTAA GYRB GYRB 1 1947 377 ATGACAGA ATGTCTAA PARC PARC 1 2637
378 GAGTTTGC AATATAAG PARE PARE 1 2008 379 AAAATTTT TGTTTTAA PBP1
PBP1 1 1282 380 TTCGACCA ACGAGCTA PBP5 PBP5 1 668 381 TGACGATC
AACGAGCA MECA MECA 1 729 382 ATCGATGG ATGAATAA BLAZ BLAZ 1 846 383
TTGAAAAA AATTTTAA DFRA DFRA 1 486 384 ATGACATT GGAAATAG VANA VANA 1
1032 385 ATGAATAG AGGGGTGA QACC QACC 1 324 386 ATGCCTTA CGCATTAA
RMTB RMTB 1 756 387 ATGAACAT ATGGATAA STRA STRA 1 804 388 TTGAATCG
GGGGTTGA STRB STRB 1 837 389 ATGTTCAT CATACTAG AADA1 AADA1 1 792
390 ATGAGGGA GCAAATAA SULII SULII 1 816 391 ATGAATAA TTCGTTAA CTXM
CTXM 1 436 392 CAAGAAGA ATGGCACC KPC2 KPC2 1 918 393 CGTTGATG
CCACCACC AMPC AMPC 1 1140 394 ATGAAAAA TGGAATAA BLACMY2 BLACMY2 1
1146 395 ATGATGAA TGCAATAA AMPR AMPR 1 876 396 ATGGTCAG CGGCGTAA
SULI SULI 1 840 397 ATGGTGAC ATGCCTAG AACAAPHD AACAAPHD 1 1440 398
ATGAATAT AAGATTGA FLOR FLOR 1 1215 399 ATGACCAC TCGTCTAA TETM TETM
1 536 400 CACGCCAG CGGAAATG TETC TETC 1 502 401 TATCGTCC CAGTCAGC
TETS TETS 1 555 402 GCTACATT GGCATTCA TETA TETA 1 494 403 TGGCATTC
GCTATACG TETG TETG 1 550 404 CTCGGTGG GGCTTTGC TETL TETL 1 548 405
CTGGGTGA ATTCCTGA TETB TETB 1 571 406 AGTGCTGT TCCAAGCC PTX PTX 1
872 407 ATCACTAC CAGGAGCT BONT BONT 1 792 408 TGATGGAA TTCTACGG
NTNH NTNH 1 496 409 AATATTTG AATATGGT BOTE BOTE 1 1000 410 GATAAAAT
TAACTATG EPSILON EPSILON 1 620 411 ATGCGAAA GGAGGAGC TETANUS
TETANUS 1 1185 412 CCTGATAA CTAACGGA STX1A STX1A 1 948 413 ATGAAAAT
GCAGTTGA STX2A STX2A 1 960 414 ATGAAGTG GTAAATAA RICINUSTOXIN
RICINUSTOXIN 1 1133 415 ATGGTCCA CATCGGAT CTXAB CTXAB 1 984 416
CGGGCAGA CCTGAGGA PBLUEVEC PBLUEVEC 1 236 417 CTGCAGGA TGCGTTGC
PGEMVEC PGEMVEC 1 226 418 GAATATGC TGCGTTGC PUCVEC PUCVEC 1 252 419
AGACAGTT TGCGTTGC ATTIM4 ATTIM4 1 523 420 ACATCGAC GAGCTTGC
Sample Preparation
[0286] In any embodiment of the present invention, the target
nucleic acids (DNA and/or RNA) may be contained within a biological
sample. The term "biological sample", as used herein, refers to a
sample obtained from an organism or from components (e.g., cells)
of an organism. The sample may be of any biological tissue or
fluid. Alternatively, the sample may be one taken from the
environment (air, soil or water). Frequently the sample will be a
"clinical sample" which is a sample derived from a patient. Such
samples include, but are not limited to, sputum, nasal wash, nasal
aspirate, throat swab, blood, blood cells (e.g., white cells),
tissue or fine needle biopsy samples, urine, peritoneal fluid,
visceral fluid, and pleural fluid, or cells therefrom. Within the
context of the present invention it is preferred that the clinical
sample be a nasal wash, nasal aspirate or a throat swab. In a
particularly preferred embodiment the clinical sample is a nasal
wash. Biological samples may also include sections of tissues such
as frozen sections taken for histological purposes, or from
non-human animal sources, plants, or environmental sources such as
water, air, or soil.
[0287] In order to be detected using microarrays, the target
nucleic acids may require some degree of processing. For the
intended purposes, this will involve one or more of the following
target processing steps: (1) isolation, (2) enrichment for target
sequences of interest, (3) amplification, (4) labeling, and (5)
hybridization. Preferred embodiments corresponding to each of these
processing strategies are described below; however, the present
invention is not intended to be limited thereto. To this end, the
skilled artisan would readily appreciate alternative methods
corresponding to the aforementioned processing strategies both
based on those commonly in use and as described in U.S. Pat. No.
6,638,717, U.S. Pat. No. 6,376,191, U.S. Pat. No. 5,759,778, U.S.
Pat. No. 6,268,133, and U.S. Pat. No. 6,613,516 (each of which is
incorporated herein by reference in its entirety).
Target Nucleic Acid Isolation
[0288] In an embodiment of the present invention, the target
nucleic acids (RNA and/or DNA) to be assayed are isolated prior to
amplification of the same. Methods of isolating nucleic acids are
well known to the skilled artisan.
[0289] In a preferred embodiment, the target nucleic acid isolation
may be performed using a MasterPure.TM. DNA Purification Kit
(Epicentre Technologies, Madison, Wis.) ethanol purification method
(per manufacturer's instructions). In another preferred embodiment,
the target nucleic acids will be isolated using a rapid
microcentrifuge technique, such as the Xtra Amp Kit (XTRANA,
Inc., Broomfield, Colo.). In yet another preferred embodiment, the
nucleic acids will be isolated using an automated device for said
purpose, such as a GeneXpert (Cepheid, Sunnyvale, Calif.) or using
a robot for magnetic bead-based isolation (e.g. Qiagen or
Beckman).
[0290] A variety of other commercial products are available that
are geared towards purification and concentration of nucleic acids
from complex matrices. In addition to the methods described above
and in the invention, alternatives include:
QIAamp DNA Mini Kit (Qiagen)
(For Purification of Genomic, Mitochondrial, Bacterial, Parasite,
or Viral DNA)
[0291] The QIAamp DNA Mini Kit simplifies isolation of DNA from
human tissue samples with fast spin-column or vacuum procedures.
DNA binds specifically to the QIAamp silica-gel membrane while
contaminants pass through. PCR inhibitors such as divalent cations
and proteins are completely removed in two efficient wash steps,
leaving pure DNA to be eluted in either water or a buffer provided
with the kit. QIAamp DNA technology yields genomic, mitochondrial,
bacterial, parasite, or viral DNA from human tissue samples ready
to use in PCR and blotting procedures.
RNeasy Mini Kit (Qiagen)
[0292] The RNeasy Mini Kit allows efficient purification of total
RNA from very small amounts of tissue or cells. Total RNA is easily
purified from animal cells or tissues, Gram-positive or
Gram-negative bacteria, or yeast. RNeasy technology simplifies
total RNA isolation by combining the stringency of
guanidine-isothiocyanate lysis with the speed and purity of silica
gel-membrane purification.
UltraClean.TM. Tissue DNA Kit (Mo Bio Laboratories, Inc.)
[0293] Fresh or frozen tissue samples are homogenized using
bead-beating technology to lyse the cells. Lysates are loaded onto
a silica spin filter. During a brief spin, the DNA selectively
binds to the silica membrane while contaminants pass through.
Remaining contaminants and enzyme inhibitors are removed by a wash
step. Pure DNA is then eluted into certified, DNA-free Tris
buffer.
UltraClean.TM. Tissue RNA Kit (Mo Bio Laboratories, Inc.)
[0294] Prepared fresh or frozen tissues are homogenized using a
tissue homogenizer or mortar and pestle in the presence of lysis
solutions. The RNA is captured on a silica membrane spin filter
while contaminants are passed through the filter by centrifugation.
The filter is washed to remove any remaining contaminants and
salts. The RNA is then eluted into certified RNase-free water
(provided). RNA is of high quality and is ready for any downstream
applications.
Wizard.RTM. Genomic DNA Purification Kit (Promega)
[0295] The Wizard.RTM. Genomic DNA Purification Kit is designed for
isolation of DNA from white blood cells, tissue culture cells and
animal tissue, plant tissue, yeast, Gram-positive and Gram-negative
bacteria. The Wizard.RTM. Genomic DNA Purification Kit is based on
a four-step process. The first step in the purification procedure
lyses the cells and the nuclei. For isolation of DNA from white
blood cells, this step involves lysis of the red blood cells in the
Cell Lysis Solution, followed by lysis of the white blood cells and
their nuclei in the Nuclei Lysis Solution. An RNase digestion step
may be included at this time; it is optional for some applications.
The cellular proteins are then removed by a salt precipitation
step, which precipitates the proteins but leaves the high molecular
weight genomic DNA in solution. Finally, the genomic DNA is
concentrated and desalted by isopropanol precipitation.
SV Total RNA Isolation System (Promega)
[0296] The SV Total RNA Isolation System provides a fast and simple
technique for the preparation of purified and intact total RNA from
tissues, cultured cells and white blood cells using a
membrane-based purification system. The system incorporates a DNase
treatment step directly on the membrane of the minicolumn.
Purification is achieved without the use of phenol:chloroform
extractions or ethanol precipitations, and there is no DNase
carryover in the final RNA preparation.
RNAqueous Technology (Ambion, Inc.)
[0297] RNAqueous kits can be used to purify total RNA from many
different tissues and cells. Cells or tissue are disrupted in a
guanidinium thiocyanate solution; this chaotropic agent effectively
lyses cells and inactivates endogenous ribonucleases. The lysate is
then diluted with an ethanol solution and applied to an RNA-binding
glass fiber filter. Proteins, DNA and other contaminants are
removed in three rapid washing steps, and the bound RNA is then
eluted in concentrated form.
Nucleic Acid Isolation Robotics
[0298] In addition to those methods described above, several
vendors (e.g. PSS BIO Instruments, Roche Diagnostics, Qiagen,
Caliper) manufacture small (benchtop) and/or high-throughput
liquid handling robots and associated reagents that can be used
instead of the manual methods described above. In a preferred
embodiment, one or more such robots and their associated reagents
will be used to automatically isolate nucleic acids for the
subsequent processing (background removal and amplification).
Target Amplification
[0299] One set of technical challenges for pathogen detection with
microarrays arises because of the difficulty in obtaining samples
with a sufficient quantity of pathogen nucleic acids. Thus, for a
majority of sample types, some sort of amplification will likely be
required to provide sufficient copies of pathogen gene markers for
detection by microarray hybridization. Multiplex PCR as a
microarray preparative step is practically limited to tens of
different primer pairs, not thousands, and increasing numbers of
primer pairs give rise to varying numbers of spurious amplicons.
However, discrimination occurs when the labeled amplicons are
required to hybridize to specific probes on the array surface and,
fortuitously, spurious amplicons may not be detected by the microarray
assay (Chizhikov et al., 2001).
Multiplex PCR
[0300] Conserved (degenerate) multiplex PCR serves to reduce the
systemic bias introduced with specific PCR protocols by designing a
series of primers selected to target the conserved regions flanking
species-specific variable regions of a gene(s) to be probed. In the
examples of the present application, the strategy for E1A, fiber,
and hexon genes has been illustrated; however, the skilled artisan
may expand this strategy to target any gene that is conserved
across a broad spectrum of species, but still has species-specific
variable regions. To this end, candidate genes and the specific
regions (conserved and variable) can be readily identified by
global or local homology searches (i.e., sequence alignment).
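The search for conserved regions flanking a variable region can be
sketched as follows. This is a minimal illustration only, assuming
the sequences have already been aligned to equal length; the function
name conserved_windows, the toy sequences, and the window size are
hypothetical and are not taken from the examples of the present
application.

```python
def conserved_windows(aligned, window=6):
    """Return start indices of windows in which all aligned sequences agree."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    # A column is conserved when every sequence carries the same non-gap base.
    conserved = [
        len({s[i] for s in aligned}) == 1 and aligned[0][i] != "-"
        for i in range(length)
    ]
    return [
        i for i in range(length - window + 1)
        if all(conserved[i:i + window])
    ]


# Hypothetical toy alignment: conserved ends flank a 4-base variable core.
toy_alignment = [
    "GAAGACCTAAGTCCGATGC",
    "GAAGACCTTCGACCGATGC",
    "GAAGACCTGGCTCCGATGC",
]
candidate_starts = conserved_windows(toy_alignment, window=6)
```

Windows reported on either side of the variable core are candidate
primer sites; a real design would additionally weigh degeneracy,
melting temperature, and amplicon length.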
[0301] Applicants describe below a general strategy for the
selection of targets and primer design:
[0302] As used herein, the term "primer" (as generally appreciated
in the art) refers to an oligonucleotide that is capable of acting
as a point of initiation of polynucleotide synthesis along a
complementary strand when placed under conditions in which
synthesis of a primer extension product that is complementary to a
polynucleotide is catalyzed. Typical polynucleotide synthesis
conditions include the presence of four different nucleotide
triphosphates or nucleoside analogs and one or more enzymes to
catalyze polymerization (e.g., a DNA polymerase and/or reverse
transcriptase) in an appropriate buffer ("buffer" includes
substituents which are cofactors, or which affect pH, ionic
strength, etc.), and at a suitable temperature.
[0303] To facilitate specific PCR, amplification primers typically
range from 13 to 35 nucleotides, preferably from 20 to 25
nucleotides. The nucleotide sequence of the primer must have
sufficient sequence complementarity to the target sequence in order
to facilitate hybridization. Although the degree of complementarity
will depend in large part on the length of the primer, the degree
of complementarity generally is at least 80%, preferably at least
90%, more preferably at least 95%.
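The degree of complementarity between a primer and its target site
can be estimated as in the following sketch. It assumes an ungapped
comparison of a primer against a target site of the same length, both
written 5' to 3'; the function names and the example sequences are
hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def percent_complementarity(primer, target_site):
    """Percent of primer positions that base-pair with the target site.

    Assumes equal lengths and no gaps (a simplification).
    """
    if len(primer) != len(target_site):
        raise ValueError("primer and target site must have equal length")
    expected = reverse_complement(target_site)
    matches = sum(p == e for p, e in zip(primer.upper(), expected))
    return 100.0 * matches / len(primer)


# Hypothetical 20-nucleotide target site and two candidate primers.
target_site = "GAAGACCTACCAGCGGATCG"
perfect_primer = reverse_complement(target_site)
mismatched_primer = "AA" + perfect_primer[2:]  # first two bases altered

perfect_score = percent_complementarity(perfect_primer, target_site)
mismatched_score = percent_complementarity(mismatched_primer, target_site)
```

Under the thresholds stated above, a 20-mer with two mismatches would
satisfy the 80% and 90% levels but not the 95% level.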
[0304] For random PCR protocols, the preferred primer length is
from 6 to 10 nucleotides. The sequence would ideally include all
permutations of a hexamer (4.sup.6 permutations) to a decamer
(4.sup.10 permutations).
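With four possible nucleotides at each position, the number of
distinct random primers of length n is 4.sup.n. The following sketch
counts and enumerates such primers; the function names are
hypothetical and the code is illustrative only.

```python
from itertools import product


def primer_count(length, alphabet="ACGT"):
    """Number of distinct primers of the given length."""
    return len(alphabet) ** length


def all_primers(length, alphabet="ACGT"):
    """Lazily yield every primer sequence of the given length."""
    for bases in product(alphabet, repeat=length):
        yield "".join(bases)


# Pool sizes for the preferred random primer lengths (6 to 10 nucleotides).
pool_sizes = {n: primer_count(n) for n in range(6, 11)}
first_hexamer = next(all_primers(6))
```

Enumeration is practical for hexamers (4,096 sequences) but grows to
1,048,576 sequences for decamers, which is one reason random primers
are normally synthesized as degenerate mixtures rather than
individually.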
[0305] For the case of "prototype" region design, the preferred
scenario would be to have conserved primers that flank variable
regions of target genes. This was the case in RPMV1 for adenovirus
and has been adopted by the present inventors for Influenza A (using
conserved 3' and 5' ends of Influenza A segments for hemagglutinin
(HA), neuraminidase (NA), and matrix (M)). This concept can be
extended to any of a large number of pathogen types because highly
conserved regions are ubiquitous in nature and degenerate primers
can be designed for those sequences.
[0306] When targets are not selected as "prototypes", the amount of
real estate on the chip can be greatly reduced compared to
prototype regions. The objective in these cases is not necessarily
to identify a particular strain or sub-variant of the species, but
rather to allow for enough base calls to be made for an unambiguous
statistical estimate demonstrating that the sequence corresponds to
the pathogen of interest and not to a closely related species or
non-virulent strain. With this chip design of multiple, small
targets, it is not feasible to design and optimize conditions for
the large number of specific PCR reactions that would be needed and
total amplification becomes the optimal amplification strategy.
[0307] Homology, sequence similarity or sequence identity of
nucleotide or amino acid sequences may be determined conventionally
by using known software or computer programs such as the BestFit or
Gap pairwise comparison programs (GCG Wisconsin Package, Genetics
Computer Group, 575 Science Drive, Madison, Wis. 53711). BestFit
uses the local homology algorithm of Smith and Waterman, Advances
in Applied Mathematics 2: 482-489 (1981), to find the best segment
of identity or similarity between two sequences. Gap performs
global alignments: all of one sequence with all of another similar
sequence using the previously described methods (Needleman &
Wunsch, 1970). When using a sequence alignment program such as
BestFit, to determine the degree of sequence homology, similarity
or identity, the default setting may be used, or an appropriate
scoring matrix may be selected to optimize identity, similarity or
homology scores. Similarly, when using a program such as BestFit to
determine sequence identity, similarity or homology between two
different amino acid sequences, the default settings may be used,
or an appropriate scoring matrix, such as blosum45 or blosum80, may
be selected to optimize identity, similarity or homology
scores.
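For illustration, the local alignment scoring of Smith and Waterman
can be sketched in a few lines. This is a simplified version that
assumes a single linear gap penalty and returns only the best score;
the function name and the scoring values are assumptions chosen for
the example and are not the defaults of any named program.

```python
def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor of 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Two 8-base sequences differing at a single position.
score = smith_waterman_score("GAAGACCT", "GAAGATCT")
```

A global (Needleman and Wunsch) variant differs mainly in omitting
the floor of zero and in initializing the first row and column with
cumulative gap penalties.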
[0308] Targets may also be amplified using a combination of
specific PCR reactants in what is referred to as "multiplexed PCR".
In this strategy, PCR primer pairs for all target regions on the
RPM are combined into one reaction mixture. This is a preferred
method in cases where one or more of the pathogens is present in
too low a concentration to be detected using random amplification
strategies alone (described below).
Random Amplification Strategies
[0309] Despite the sensitivity and specificity associated with PCR
amplification, the inherent bias and limited throughput of this
approach limit the principal benefits of downstream
microarray-based applications. As successful identification depends
almost entirely on appropriately chosen primer sets, all PCR-based
testing requires a priori knowledge pertaining to the identity of
the contaminating organism. As such, there remains a critical need
for advanced diagnostic systems that can rapidly screen clinical
and environmental samples without bias for specifically anticipated
sequences. The advantage of microarray-based detection is that it
can combine unbiased (e.g. random) nucleic acid amplification
strategies with the subsequent discriminatory capability of
microarrays, resulting in high sensitivity, specificity and
throughput capacity.
[0310] The inventors have addressed the aforementioned issues
through the use of alternative approaches and have investigated
four front-end amplification strategies: random primed, isothermal
Klenow polymerase-based, .phi.29 DNA polymerase-based and multiplex PCR
(Vora et al., 2004).
[0311] Klenow-based amplification--The Klenow fragment of DNA
polymerase I can be used for isothermal, random amplification of
target DNA of unknown sequence with random octamer oligonucleotide
primers. For the present invention, this can be performed using the
BioPrime.RTM. DNA Labeling System (Invitrogen). Biotinylated
amplicons are generated (following the manufacturer's recommended
labeling protocol with minor modifications). This process is
typically performed at 37.degree. C. for 4 hours. Klenow
amplification results in some high molecular weight amplification
products, but the vast majority of amplicons are 50-450 basepairs
(bp) in length. As such, these amplicons are suitable for direct
hybridization to DNA microarrays.
[0312] Random PCR amplification--Random PCR (rPCR) amplicons can be
generated using the 2.5.times. Random Primers Solution and
10.times. dNTP mix from the Invitrogen BioPrime DNA Labeling
System. The reaction components typically include: 1.times. PCR
buffer (Qiagen), 2.5 mM MgCl.sub.2, 1.times. dNTP Mix (containing
biotin-14-dCTP), 5 .mu.l of 2.5.times. random octamers and 5 U Taq
DNA polymerase (Qiagen). The amplification reactions are performed
using the following conditions for 35 cycles: 94.degree. C. for 30
sec, 26.degree. C. for 2 min, and 72.degree. C. for 1 min.
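The cycling conditions above imply a minimum run time that can be
worked out as follows. The sketch counts only the programmed hold
times; ramp times between temperatures, and any initial denaturation
or final extension step, are not stated above and are therefore
excluded. The variable and function names are hypothetical.

```python
CYCLES = 35
STEP_SECONDS = {
    "denature_94C": 30,   # 94 degrees C for 30 sec
    "anneal_26C": 120,    # 26 degrees C for 2 min
    "extend_72C": 60,     # 72 degrees C for 1 min
}


def hold_time_minutes(cycles, step_seconds):
    """Total programmed hold time in minutes, excluding ramping."""
    return cycles * sum(step_seconds.values()) / 60.0


total_minutes = hold_time_minutes(CYCLES, STEP_SECONDS)
```

Each cycle holds for 210 seconds, so 35 cycles amount to 122.5
minutes of hold time, or roughly two hours before ramping is
considered.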
[0313] .phi.29-based amplification--Isothermal .phi.29 DNA
polymerase-based amplification with random hexamers can be
performed using the TempliPhi.TM. 100 Amplification Kit (Amersham
Biosciences Corp., Piscataway, N.J.). Biotinylated amplicons are
generated following the manufacturer's recommended protocol with
minor modifications (14.4 .mu.l amplification reactions contained
0.4 .mu.l enzyme mix and 3 .mu.l of 350 .mu.M biotin-14-dCTP and
are incubated at 30.degree. C. for 16 h).
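The final concentration of biotin-14-dCTP in such a reaction follows
from simple dilution. The sketch below assumes that the 14.4 .mu.l
figure is the total reaction volume, including the 3 .mu.l of labeled
nucleotide; the function name is hypothetical.

```python
def final_concentration(stock_conc, stock_volume, total_volume):
    """Concentration after dilution (same units as stock_conc)."""
    return stock_conc * stock_volume / total_volume


# 3 ul of 350 uM biotin-14-dCTP in a 14.4 ul reaction.
biotin_dctp_um = final_concentration(350.0, 3.0, 14.4)
```

Under that assumption the labeled nucleotide is present at
approximately 73 .mu.M in the reaction.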
[0314] The majority of both the rPCR and .phi.29 DNA polymerase
amplification products are too large to migrate into electrophoretic
gels (>1500 bp). Large amplification products hybridize poorly
to immobilized probes on two-dimensional surfaces due to spatial
and steric constraints (Vora et al., 2004). Thus, the rPCR and
.phi.29 polymerase amplification products are digested with a DNase
I fragmentation buffer to achieve amplicon sizes that are
comparable to the Klenow and multiplex amplicons prior to
hybridization.
Tandem Amplification
[0315] Increased sensitivity to pathogen targets can be achieved
using a combination of the random amplification strategies
described above. Tandem [Klenow+Klenow] and [.phi.29+Klenow]
random amplification strategies can provide better
sensitivity to enriched pathogen targets than multiplex PCR. The
same random amplification strategies are also able to detect
diagnostic genomic targets in spiked environmental water samples
containing a 63-fold excess of contaminating DNA (Vora et al.,
2004). The results presented underscore the feasibility of using
random amplification approaches and begin to systematically address
the versatility of these approaches for unbiased pathogen detection
from environmental sources.
Combination of Target Enrichment and Random Amplification
[0316] Random amplification methods result in amplification of
non-target DNA as well as target DNA. Thus, the inventors describe
herein a variety of methods for enrichment of pathogen diagnostic
target sequences in advance of random amplification steps. This is
especially important to improve sensitivity and/or specificity of
detection of analytes in complex matrices such as clinical (e.g.
nasal wash or throat swab) or environmental (e.g. water, soil, air
collector) samples.
[0317] Complex samples (clinical and environmental) contain
significant, and sometimes overwhelming, amounts of unknown genomes
and genomes not of interest (non-specific background). One approach
to reduce the non-specific background is to expose the nucleic
acids to enzymatic processes that selectively digest the background
genomic sequences. This process is described below within the
examples.
[0318] Another strategy for reducing non-pathogen genomic content
is to use background genomic sequences from a comparable complex
sample known to be negative for the organism(s) of interest and use
it to directly remove background genomic sequences. Such
"subtractive" methods can include: (1) immobilization of the
background genomes on solid phase surfaces such as gels or beads,
followed by hybridization with the test sample to absorb background
genomes from the sample; and (2) the use of unlabeled background
genomes during hybridization of targets on the microarray to block
the effects of non-specific binding due to the labeled background
genomes. This blocking approach has the advantage that it does
not require additional steps during assaying. Both strategies
are also described in the Examples section.
[0319] Alternatively or concurrently, one can also use strategies
to selectively enrich for the pathogen target nucleic acids prior
to random amplification. One possibility for direct selection (also
known as positive selection) from samples is to use immobilized
probes on a solid phase support (magnetic beads or gel matrices) to
selectively enrich for genomic targets of interest. Ideally, the
probes on the solid support should not be detectable on the array,
as they would result in false signals in the event that the
enrichment molecules are carried over or inadvertently amplified.
As such, the probes on the beads should select for target genomes
via a region adjacent to, or slightly overlapping with, the region of
the analytes' genomes to be queried on the microarray. If there is
a slight overlap between the probe sequences and the analytes'
sequences that is detectable on the array, then this queried region
should be masked out during in silico analysis. In the event that
it is desirable that the enrichment probes have homologous sequence
to that on the microarray, then those probes should be composed so
as not to undergo subsequent amplification, or be made susceptible
to selective enzymatic digestion prior to amplification. One can
use combinations of the above and other strategies if necessary
according to the specifics of the sample and applications of
interest.
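The in silico masking of a region that overlaps an enrichment probe
can be sketched as follows. The example assumes that base calls for a
tiled region are held as a string, that tile and probe positions
share one coordinate system, and that the probe interval is
half-open; the function name, coordinates, and the use of "N" as the
masking character are hypothetical.

```python
def mask_overlap(base_calls, tile_start, probe_start, probe_end, mask_char="N"):
    """Mask base calls that fall inside [probe_start, probe_end).

    base_calls[0] corresponds to coordinate tile_start.
    """
    tile_end = tile_start + len(base_calls)
    lo = max(tile_start, probe_start)
    hi = min(tile_end, probe_end)
    if lo >= hi:
        return base_calls  # no overlap, nothing to mask
    i, j = lo - tile_start, hi - tile_start
    return base_calls[:i] + mask_char * (j - i) + base_calls[j:]


# Hypothetical tile starting at position 100; probe covers 90 to 103.
masked = mask_overlap("GAAGACCTACCAGCGG", tile_start=100,
                      probe_start=90, probe_end=104)
```

Masked positions would then be excluded from any downstream
similarity search so that carried-over probe sequence cannot
contribute to an identification.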
Alternative Subtraction or Enrichment Methods
[0320] Immunoprecipitation is another way to enrich the pathogens'
nucleic acids. Antibodies for pathogens of interest could be mixed
with clinical samples, then precipitated with anti-IgG to pull down
pathogens, thereby removing background genomic DNA. This will be of
particular importance when it is desirable to correlate specific
genomic traits with a single pathogen, such as antibiotic
resistance markers or indicators of intentional genetic
manipulation.
[0321] Size exclusion is another method by which to subtract or
enrich pathogen DNA. Exemplary size exclusion methods include:
gradient centrifugation, column separation, or centrifugal filter units.
Gradient centrifugation or column separation methods are
time-consuming and require special set-up in the laboratory. The
use of "centrifugal filter units" for separation of high- from
low-molecular-weight species suffers from inconsistent and, often,
low recovery rates. Each of these methods requires a large volume
of starting material. Other size exclusion methods include flow
cytometry methods, such as fluorescence-activated cell sorting
(FACS), and electrophoresis methods, such as those using a special
electrophoretic chip in an Agilent bioanalyzer.
General Target Enrichment and Amplification Strategies
[0322] The present inventors have described methods for subtraction
of "normal" human and microbial nucleic acids from nasal wash
specimens. This same principle can also be applied to samples of
any origin (clinical or environmental) where a representative
"normal" mixture of nucleic acids can be collected, pooled, and
prepared as a subtractive reagent.
[0323] A variety of approaches can also be applied to enrich for
the pathogens or specific targets that are tiled on the array prior
to performing a total amplification strategy. One such approach
would use immobilized variants of the sequences that are tiled on
the array. These variants would be bound to a solid phase component
used for separation (beads, matrices, etc.) and serve to enrich for
targets that will be amplified without being amplified themselves.
This can be achieved by making the enrichment probes susceptible to
enzymatic degradation, or by composing them of modified nucleic
acids that are not amplifiable.
[0324] A more preferred embodiment would utilize probes that
recognize sequences that are adjacent to target gene sequences and
not represented on the tiled regions of the chip. In this manner,
their amplification in a non-biased amplification strategy would
not result in artifacts.
[0325] As conventional DNA sequencing technologies can be
considered as part of an overall information collecting process in
basic research, so can the described resequencing approach be
considered a component of an overall pathogen
detection/characterization scheme. Wang et al. (2003)
describe a method for using a spotted DNA microarray composed of
long oligonucleotides (70-mers) that recognize conserved
sequences within a family of pathogens. Target sequences
bound to these locations were isolated and used in a conventional
DNA sequencing approach to allow further characterization. In an
analogous manner, a very preferred embodiment would involve the use
of the resequencing array to replace the steps related to
alternative forms of DNA sequencing, thus providing specific
pathogen characterization within hours instead of days and
enablement of an effective biodefense system.
[0326] The method of the present invention preferably does not make
use of specific sequences for amplification (PCR). The invention
illustrates that there are alternate methods for enrichment of
pathogen nucleic acids, for example, using solid phase support
separations prior to applying reduced-bias, isothermal (e.g.
random-primed Klenow or strand displacement) amplification. In a
preferable embodiment, a skilled technician in a conventional
laboratory setting would be capable of performing time- and
cost-effective sample preparation using a minimal set of automated
steps to perform microarray experiments. In a highly preferred
embodiment, a minimally skilled technician (medical technologist or
medic) in a field environment (medical level 1) would be capable of
performing manual separation/enrichment of pathogen target nucleic
acids using a handheld instrument and perform isothermal
amplification of pathogen targets with few reagents and technical
steps.
Amplification and Hybridization Following Background Subtraction
[0327] In addition to random RT-PCR using primer D:
GTTTCCCAGTCACGATCNNNNNNNNN (SEQ ID NO: 573), and primer E:
GTTTCCCAGTCACGATC (SEQ ID NO: 574) (Kessler et al., 2004),
analogous, previously described primer sets (Wang et al., 2002),
variants of those primer sets, and/or random primers (6-9 mers) may
be employed to synthesize first strand cDNA from RNA viruses
(pathogens). A subtractive hybridization protocol could be employed
after first strand cDNA synthesis is complete to reduce the amount
of background DNA in clinical samples before subjecting complex
(i.e. clinical or environmental) samples to one or more DNA
amplification step(s).
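The relationship between the two primers can be illustrated with a
short sketch. It assumes that the fixed 5' portion of primer D is
identical to primer E and is followed by nine random bases, so that
products primed by primer D can subsequently be amplified with primer
E alone; the function name is hypothetical and the random generator
merely stands in for a degenerate synthesis.

```python
import random

PRIMER_E = "GTTTCCCAGTCACGATC"   # fixed 17-base tag (SEQ ID NO: 574)
RANDOM_TAIL_LENGTH = 9           # the run of N positions in primer D


def primer_d_instance(rng):
    """One concrete member of the degenerate primer D pool (assumed layout)."""
    tail = "".join(rng.choice("ACGT") for _ in range(RANDOM_TAIL_LENGTH))
    return PRIMER_E + tail


example_primer_d = primer_d_instance(random.Random(0))
pool_size = 4 ** RANDOM_TAIL_LENGTH  # distinct random tails
```

Every member of the pool carries the same 17-base tag, which is why a
single non-degenerate primer suffices for the later PCR step.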
[0328] One method is direct subtraction of the background genomic
DNA from amplified products (first strand cDNA synthesis, then
total amplification) in hybridization solution with COT-1 human DNA
(which consists largely of rapidly annealing repetitive elements).
Another method is bead-based subtraction of background genomic DNA
from clinical sample after first strand cDNA synthesis and prior to
the DNA amplification step.
[0329] The third method is the combination of the above methods. In
this case, the human genomic background DNA from complex samples
would be subtracted using bead-based subtraction after first strand
cDNA synthesis and prior to the DNA amplification step(s).
Subsequently, the background DNA could be further subtracted from
complex sample in hybridization solution with sequences designed to
selectively capture human DNA and RNA. The DNA amplification step
itself could be aimed at amplifying the cDNA products formed in the
random RT step specifically, non-specifically, or through a
combination of methods to amplify both the specific primer
site-labeled RT-PCR products as well as pathogen genomic DNA
targets and other non-cDNA targets that did not arise from the
random RT-PCR step. One approach for doing so would be to ligate a
specific primer sequence to the pathogen DNA genomic targets (this
could be the same primer as used attached to cDNA products or a
separate one), allowing a single PCR step to amplify all RNA and
DNA pathogen targets.
RNA Subtraction
[0330] Where separate processing pathways exist for detection of
RNA and DNA pathogens, the following may be used. For development of
the RNA processing pathway, spiked and clinical (nasal wash and
throat swab) Influenza A positive specimens were processed using
different methods (random, universal, and multiplexed PCR).
However, to achieve acceptable levels of sensitivity with random
amplification approaches, human messenger and ribosomal RNA should
be subtracted using the MICROBEnrich.TM. kit (Ambion, Inc., Austin,
Tex.) in accordance with the manufacturer's instructions. A
preferred random amplification strategy, adapted from a
previously-described method (Wang et al., 2002), may be employed.
[0331] It was found by the present inventors that by employing this
method, positive detection of H3N2 and H1N1 Flu A may be achieved in
16/19 different culture-positive Flu A nasal wash and throat swab
specimens at concentrations as low as 0.45-3.75 pfu/150 .mu.L
sample aliquot. It was further determined that it was
possible to detect Flu A in 6/8 specimens in the low concentration
range following subtraction of human RNA.
[0332] Using the random amplification approach, without background
subtraction, the flu sensitivity is about 0.25 pfu/.mu.l (1
ng/.mu.l) in spiked nasal wash. In clinical samples, the
sensitivity without subtraction is estimated to be about 1-10
pfu/.mu.l.
DNA Subtraction
[0333] Random amplification for DNA samples may be performed with
either bacteriophage .phi.29 DNA polymerase or a modified random
amplification protocol from previously published papers (Wang et al.
2002; Wang et al. 2003). Briefly, DNA amplification utilizing
bacteriophage .phi.29 DNA polymerase with random hexamers can be
conducted according to the instructions of the GenomiPhi.TM. DNA
Amplification Kit (Amersham Biosciences Corp., Sunnyvale, Calif.).
The amplified products are then ethanol precipitated according to
the manufacturer's recommended protocol. DNA amplification utilizing
modified random amplification may be performed with an initial round
of DNA synthesis with Sequenase.TM. version 2.0 DNA polymerase
(United States Biochemical, Cleveland, Ohio) using primer D,
followed by PCR amplification with primer E. For RNA amplification,
viral samples are amplified by a modified version of a random
PCR protocol (Wang et al. 2002; Wang et al. 2003; Kessler et al.
2004). Briefly, 10 .mu.l of total RNA can be reverse transcribed
using primer D and SuperScript III reverse transcriptase
(Invitrogen Corp., Carlsbad, Calif.) and then amplified by PCR
with primer E.
Convergence of RNA and DNA Pathways
[0334] The combination of the RNA and DNA pathways is described in
FIG. 9. In an embodiment of the present invention, the pathways for
RNA and DNA are merged. This protocol is adapted from the lab of
Joseph DeRisi at University of California San Francisco
(http://derisilab.ucsf.edu/pdfs/Round_A_B_C.pdf) and randomly
incorporates a single, fixed-sequence PCR primer binding site into
all pathogen DNA or cDNA molecules, allowing them to all be
amplified in a subsequent step using a conventional PCR thermal
cycling protocol.
Data Acquisition and Processing
Affymetrix/Microarray Apparatus
[0335] In general, a "microarray" is a linear or two-dimensional
array of preferably discrete regions, each having a defined area,
formed on the surface of a solid support. The density of the
discrete regions on a microarray is determined by the total numbers
of different target polynucleotides to be detected on the surface
of a single solid phase support, preferably at least about
10.sup.2/cm.sup.2, more preferably at least about
10.sup.4/cm.sup.2, even more preferably at least about
10.sup.6/cm.sup.2, and still more preferably at least about
10.sup.8/cm.sup.2. As used herein, a DNA microarray is an array of
oligonucleotide primers placed on a chip or other surfaces used to
detect and/or isolate target polynucleotides. Since the position of
each particular group of primers in the array is known, the
identities of the target polynucleotides can be determined based on
their binding to a particular position in the microarray.
[0336] One embodiment of the invention utilized standard Affymetrix
hardware (the Agilent GeneChip Scanner (phased out), the Affymetrix
Scanner 3000 workstation, and the Fluidics Station 450). In principle,
the described invention does not require this equipment. Given the
manufacturer's intended uses of the GeneChip system for
quantitative gene expression profiling and high confidence SNP
detection, the existing apparatuses are not optimized for the
dynamic range of signal intensities or background interferences
inherent to the present invention. Thus, a preferred embodiment
incorporates the use of image acquisition methods, including the
use of spatial frequency filtering and image enhancement, taking
advantage of the inherent regularity of the feature dimensions to
impose a filtering algorithm (e.g. edge enhancement, convolution,
etc.) that allows better contrast of noisy images.
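By way of non-limiting illustration, the kind of convolution-based
edge enhancement mentioned above can be sketched as follows. The
function name convolve3x3, the kernel SHARPEN, and the choice to
clamp at the image border are assumptions made for this example
only; the filter actually used with a given scanner is not
specified here.

```python
def convolve3x3(image, kernel):
    """Apply a 3x3 kernel to a 2-D list of pixel intensities.

    Border pixels are handled by clamping coordinates to the image
    edge. The kernel used below is symmetric, so correlation and
    convolution give the same result.
    """
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), rows - 1)
                    cc = min(max(c + dc, 0), cols - 1)
                    acc += image[rr][cc] * kernel[dr + 1][dc + 1]
            out[r][c] = acc
    return out


# A common sharpening kernel. Its coefficients sum to 1, so uniform
# regions keep their intensity while feature edges are accentuated.
SHARPEN = [[0, -1, 0],
           [-1, 5, -1],
           [0, -1, 0]]
```

Because the features on the array lie on a regular grid, a fixed
small kernel of this kind can be applied uniformly across the scan
before feature intensities are extracted.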
Affymetrix CustomSeq Protocol
[0337] In an embodiment of the present invention, sample processing
is accomplished by employing the standard Affymetrix CustomSeq.TM.
protocol. Generally, this method entails: (a) amplification of DNA
probe-containing materials, (b) pooling and quantitation of
amplified product, (c) fragmentation and labeling of the amplified
product, (d) target hybridization, and (e) washing, staining, and
scanning of the hybridized target. A detailed description of the
Affymetrix CustomSeq.TM. protocol can be found in the product
manual and protocol guide, which are available from the
manufacturer and are incorporated herein by reference.
[0338] In the standard Affymetrix CustomSeq.TM. protocol, step (a)
entails either long-range PCR or short-range PCR, with long-range
PCR being the preferred amplification strategy. For each of these
amplification strategies, the manufacturer recommended PCR DNA
polymerases are Taq variants, AmpliTaq Gold DNA polymerase
(short-range PCR) and LA Taq DNA polymerase (long-range PCR).
Although not specifically recommended by the manufacturer, any DNA
polymerase may be employed for step (a) so long as the DNA
polymerase used is a high fidelity DNA polymerase.
[0339] To facilitate DNA amplification step (a), the standard
Affymetrix CustomSeq.TM. protocol employs specific PCR primers.
However, the use of specific PCR primers significantly limits the
broad application of the inventive technique due to the
introduction of systemic bias flowing from the specific PCR based
methods that have heretofore been employed. Accordingly, in a
preferred embodiment of the present invention step (a) of the
Affymetrix CustomSeq.TM. protocol is replaced with an alternate
amplification strategy, such as multiplex PCR, total amplification
(GenomiPhi.TM.), or random RT/PCR. These alternate strategies are
discussed hereinabove. Conditions for optimal PCR amplification for
each of the manufacturer recommended PCR strategies, as well as the
preferred strategies of the present invention, can be determined by
routine experimentation by the skilled artisan.
[0340] Since variability exists between PCR reactions, the
Affymetrix CustomSeq.TM. protocol states that assay performance may
be compromised if amplicon concentration in the hybridization
varies by more than two fold. Therefore, step (b) of the Affymetrix
CustomSeq.TM. protocol entails pooling PCR reactions and
spectrophotometrically quantifying the same to ensure equimolar
application of sample to the microarray.
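By way of non-limiting illustration, the two-fold variation check
and the equimolar pooling of step (b) can be sketched as follows.
The function names are hypothetical, and the conversion assumes an
average mass of about 660 g/mol per base pair of double-stranded
DNA, which is a common approximation rather than a value taken from
the protocol.

```python
def fold_variation(concentrations):
    """Ratio of the highest to the lowest amplicon concentration."""
    lo, hi = min(concentrations), max(concentrations)
    if lo <= 0:
        raise ValueError("concentrations must be positive")
    return hi / lo


def equimolar_volumes(amplicons, pmol_each):
    """Volume (in microliters) of each amplicon giving pmol_each pmol.

    amplicons maps a name to (ng_per_ul, length_bp). The molar
    concentration uses ~660 g/mol per base pair (an approximation).
    """
    volumes = {}
    for name, (ng_per_ul, length_bp) in amplicons.items():
        pmol_per_ul = ng_per_ul * 1000.0 / (660.0 * length_bp)
        volumes[name] = pmol_each / pmol_per_ul
    return volumes
```

A pool whose fold_variation exceeds 2.0 would be flagged under the
manufacturer's guidance described above.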
[0341] However, the present invention offers several advantages over
the SNP detection tailored protocols of the Affymetrix
CustomSeq.TM. method. In particular, the use of the increased
density chips with sequence length-independent similarity searches
(BLASTN) in the present invention affords that fewer assumptions
must be made in advance of selecting sequences for tiling.
Furthermore, the use of length-independent similarity searches
(BLASTN) removes the constraint that a particular known subsequence
be successfully resequenced, making the approach more resistant to
variations in target concentration and contributions from
nonspecific binding leading to lost base calls. Accordingly, within
the present invention step (b) of the Affymetrix CustomSeq.TM.
protocol is optional and may be omitted.
[0342] Following DNA amplification, the DNA molecules obtained
thereby are too long to hybridize with the short probes on the
array surface. Accordingly, step (c) of the Affymetrix
CustomSeq.TM. protocol involves fragmentation and subsequent
labeling of the fragments with a fluorescent substrate. The method
and reagents for fragmentation and labeling are not particularly
limiting; however the label must be compatible with the detection
apparatus for the resequencing microarray. To this end, the
manufacturer recommended reagents and conditions may be
employed.
Alternative Variations of the Affymetrix Protocols
[0343] Fluorescent labels that may be advantageous for the
methods described herein, as these are routinely used with
automated instrumentation for simultaneous high throughput analysis
of multiple samples, include the Cy fluorophores, the rhodamine
based fluorophores: TAMRA, ROX, JOE, and FAM; the BigDye.TM.
fluorophores (Applied Biosystems, Inc.), the dansyl group,
fluorescein and substituted fluorescein derivatives, acridine
derivatives, coumarin derivatives, phthalocyanines,
tetramethylrhodamine, Texas Red.TM.,
9-(carboxyethyl)-3-hydroxy-6-oxo-6H-xanthenes, DABCYL.TM.,
BODIPY.TM., and ALEXA.TM. fluorophores (Molecular Probes, Eugene,
Oreg.).
[0344] Additionally, there are a variety of labels other than
fluorophores that will be suitable and perhaps preferable for a
variety of situations. These labels include, but are not limited
to: resonance light scattering (RLS) particles (Invitrogen,
Carlsbad, Calif.), quantum dots (Quantum Dot Corp.) and other
nanoscale particles having desirable optical qualities.
[0345] The target hybridization (step (d)) may be performed as
described in the Affymetrix CustomSeq.TM. protocol. The highlight
of this step is that the sample containing the fragmented and
labeled DNA is denatured by a high temperature (e.g.,
85-100.degree. C., preferably 95.degree. C.) incubation followed by
a hybridization temperature (e.g., 45.degree. C.) equilibration.
Once the DNA-containing sample has equilibrated the sample is
applied to the resequencing array. The manufacturer recommends
conducting the hybridization reaction for 16 hours; however, as
stated above, the method of the present invention is not
constrained to maximize sensitivity in order to make the
appropriate base calls. Therefore, shorter incubation times are
appropriate for target hybridization. Within the context of the
present invention, the target hybridization incubation time may
range from as short as 15 minutes to as long as 24 hours. It is
contemplated that this range of times embodies each of the
intermediate times as if they were explicitly stated. Preferable
times worth noting are 15 minutes, 30 minutes, 1 hour, 2 hours, 4
hours, 12 hours, and 16 hours.
[0346] The last step of the Affymetrix CustomSeq.TM. protocol
entails washing the hybridized array using the Affymetrix Fluidics
Station and scanning it using the Agilent GeneArray.TM. Scanner. The
fluidics station simply automates what would otherwise be manually
performable labeling and rinse steps. Thus, any instrument that
would be capable of delivering and withdrawing milliliter
quantities of labeling and rinse media on a timed basis would be a
suitable alternative. The invention described herein will be
amenable to any subsequent hardware variations offered by
Affymetrix. In addition, data acquisition from the types of
resequencing microarrays described herein may be obtained from any
manufacturer of equipment for microarray processing.
Bioinformatics Issues Concerning Pathogen Detection
[0347] Depending on the endpoints used for microarray-based
detection of pathogens, the emphasis of bioinformatics issues is
very different. Bioinformatic tools are indispensable for the
efficient design and selection of specific complementary nucleic
acid probe sequences for microarray development. For example,
target pathogen genomic nucleic acid sequences are often amplified
prior to microarray analysis and bioinformatics clearly has a role
in the design of primers (assessing T.sub.m/T.sub.a, secondary
structure, self-complementarity, and specificity issues) for
assaying genes considered specific to an organism and strain
(Kampke, Kieninger & Mecklenburg, 2001). These same assessments
must also be made for microarray probe design.
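By way of non-limiting illustration, two of the primer assessments
named above (T.sub.m and self-complementarity) can be sketched as
follows. The Wallace rule used here is a rough approximation
generally applied to short oligonucleotides, and the function names
are hypothetical; a production design tool would typically use a
nearest-neighbor thermodynamic model instead.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def wallace_tm(primer):
    """Rough melting temperature (deg C): 2*(A+T) + 4*(G+C)."""
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def longest_self_complementary_run(primer):
    """Length of the longest stretch of the primer whose reverse
    complement also occurs within the primer, a crude indicator of
    hairpin or self-dimer potential."""
    rc = reverse_complement(primer)
    best = 0
    n = len(primer)
    for i in range(n):
        # Only try substrings longer than the best found so far.
        for j in range(i + best + 1, n + 1):
            if primer[i:j] in rc:
                best = j - i
            else:
                break
    return best
```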
[0348] During the initial stages of experimental design, it is
assumed that primers and probes to genetic signatures associated
with a target pathogen are specific to that pathogen or family of
pathogens. It would follow that the generation of an amplicon or
positive hybridization reaction using specifically designed primers
or probe, respectively, would indicate the detection of the
designated molecular trait from the target pathogen; however, this
is not necessarily true. Bacterial and viral `genetic promiscuity`,
the propensity of microorganisms to exchange genetic material,
creates difficulties in developing single species or strain
specific probes (Ochman, Lawrence & Groisman, 2000). Thus,
preferred primer and probe design methodologies require the use of
bioinformatic tools to: (a) perform multiple sequence alignments
between different organisms or strains and design appropriate
primers with the appropriate biochemical properties, (b) compare
these sequences with those deposited in sequence databases to
determine the present uniqueness of particular sequences and the
potential for cross-reactivity, and (c) infer the probability of
target specificity based on the level of genetic conservation and
evolutionary relatedness with other pathogenic and non-pathogenic
species whose primary genetic sequence has not yet been
elucidated.
[0349] A very important bioinformatics aspect of the disclosed
invention involves the assembly, annotation and selection of
pathogen diagnostic targets into database(s) for incorporation into
microarray design, as well as the concomitant task of relating
detection events on the microarray to such database(s). An
advantage of the present invention is that the information
contained in the publicly available databases is ever increasing,
thus further adding to the robust nature of the present invention.
The present invention describes a process of manually selecting
pathogen target sequences from the published literature (e.g.
GenBank) and/or ascertaining an empirically determined diagnostic
target sequence from published literature. The described approach
has the advantage that a consortium of scientists, each possessing
sufficient "domain expertise" for each of a large number of
disparate pathogen species, can provide relevant, pathogen
diagnostic sequence information that can be incorporated into an
automated array design process without specific regard to specific
probe, reagent, amplification, and sample preparation methods.
[0350] In one very preferred embodiment, the requisite domain
expertise for each of a large number of unrelated pathogens will be
maintained in an up-to-date fashion through a web-portal enabled
database. Thus, an extended consortium, comprised of individual
researchers of specific pathogens, would be able to provide the
latest annotated target sequence information via a "pathogen page"
formatted web portal, analogous to the "molecular page" model
adopted by the Alliance for Cellular Signaling (AfCS). The AfCS
database then maintains an otherwise incomprehensible amount of
specific information on thousands of molecules involved in
intracellular signaling cascades. In this format, individual
researchers without specific knowledge about individual signaling
molecules can access detailed parameters that can be used in
numerical simulations of signaling events. Thus, in another very
preferable embodiment, the annotated target sequence data for
individual pathogens is organized into an automated data pipeline
that will impose user-defined design constraints (e.g. number
of probe features, number of pathogen targets, the levels of
sensitivity and specificity required for array performance, etc.)
upon the total information content of a pathogen database, allowing
automated, optimal target selection and submission of those targets
to a vendor in a format necessary for microarray fabrication.
[0351] In yet another very preferred embodiment, the selected
target sequences determined by the previous process will be
correlated with the data that is collected in actual use of the
microarray, such that metrics for probability and quality can
readily be used for decision-making. Two preferable approaches for
performing such automated pipelining of data and algorithms are
VIBE (Visual Integrated Bioinformatics Environment) software
(Incogen, Inc., Williamsburg, Va.) and iNquiry (BioTeam, Boston,
Mass.) which are representative of a class of integrated
bioinformatics environments that could be used to equal effect for
the intended purpose.
Data Acquisition
[0352] Raw sequence data from the resequencing microarray chips is
provided by the Genetic Data Analysis Software version 2.0 (GDAS)
packaged with the microarray reader from Affymetrix.
[0353] The Affymetrix resequencing array contains a defined number
of probe cells or features. During scanning, the software divides
each feature into subunit squares or pixels (3.times.3 .mu.m). Each
feature contains many copies of a unique 25-base oligonucleotide
probe of defined sequence, while a series of eight features query a
specific site in a known reference sequence. Four features
interrogate the sense strand and contain probes that are identical
except for the central base which is A, C, G, or T and four
features interrogate the anti-sense strand and contain probes that
are identical except for the central base which is A, C, G, or
T.
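By way of non-limiting illustration, the eight probes that query
one site of a reference sequence can be enumerated as follows. The
function name probe_set is hypothetical, and the way the two
strands are represented (the antisense probes are built from the
reverse complement of the same 25-base window) is an assumption
made for this sketch; the strand convention of a particular chip
design may differ.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def probe_set(reference, site, probe_len=25):
    """Return the eight probes centered on reference[site].

    Four probes per strand are produced; within each group the
    probes are identical except for the central base (A, C, G, T).
    """
    half = probe_len // 2
    if site < half or site + half >= len(reference):
        raise ValueError("site too close to the end of the reference")
    window = reference[site - half: site + half + 1]
    anti = reverse_complement(window)

    def variants(seq):
        # Substitute each of the four bases at the central position.
        return [seq[:half] + b + seq[half + 1:] for b in "ACGT"]

    return {"sense": variants(window), "antisense": variants(anti)}
```

Tiling a reference of N bases in this way requires on the order of
8 x N features, which is why feature count is a design constraint.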
[0354] GDAS uses the cell intensity data to make base calls for
every base position represented on the resequencing array. Under
the manufacturer setting for GDAS, the algorithm uses the intensity
data from multiple samples to improve its calling accuracy and
assigns a quality score for each call.
[0355] GDAS base calling is based on a previously described
base-calling algorithm, ABACUS, detailed in Cutler et al. (2001).
The model assumes that the pixel intensities of a feature are
independently and normally distributed. The algorithm computes the
estimated mean background and variance for the sense and anti-sense
strand features. The base-calling algorithm also specifies models
for the presence or absence of various genotypes in the sample
(haploid or diploid). A variety of base calling algorithm
parameters can be defined by the user (GDAS operator's
manual/user's guide, Affymetrix) to obtain a trade-off between base
calling percentage and accuracy.
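By way of non-limiting illustration, the trade-off between base
calling percentage and accuracy can be seen in a much simplified
base caller. The following sketch is not the ABACUS algorithm and
does not model pixel variance or genotype likelihoods; it merely
compares mean feature intensities, and the function name call_base
and the parameter min_ratio are hypothetical.

```python
from statistics import mean


def call_base(sense, antisense, min_ratio=1.5):
    """Call a base from per-feature pixel intensities.

    sense and antisense each map 'A', 'C', 'G', 'T' to a list of
    pixel intensities for the feature reporting that base (both
    keyed by the base they report, an assumption of this sketch).
    Returns the called base, or 'N' when the strongest signal does
    not exceed the runner-up by at least min_ratio.
    """
    combined = {b: mean(sense[b]) + mean(antisense[b]) for b in "ACGT"}
    ranked = sorted(combined, key=combined.get, reverse=True)
    best, runner_up = ranked[0], ranked[1]
    if combined[best] > 0 and combined[best] >= min_ratio * combined[runner_up]:
        return best
    return "N"
```

Lowering min_ratio yields more calls at the cost of more miscalls,
which mirrors the conservative versus permissive settings.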
[0356] Additional information regarding the GDAS algorithm and the
parameters that can be modified is available in the GDAS user
manual, which is incorporated herein by reference in its entirety.
A description of the parameters is found in the GDAS version 2.0
manual on pages 207-217. The recommended (default) settings for
GDAS are "conservative" settings that focus on the highest level of
accuracy. In contrast, the objective of the present invention is to
increase the percentage of base calls. To achieve this objective,
the present inventors adjusted the parameters to allow highly
permissive base calls (increased percentage) as listed below:
"Permissive" Base Calling Algorithm Settings
[0357] Filter Conditions
[0358] No Signal threshold=0.500 (default=1.000000)
[0359] Weak Signal Fold threshold=20000.000 (default=20.000000)
[0360] Large SNR threshold=20.000000 (default=20.000000)
[0361] Algorithm Parameters
[0362] Strand Quality Threshold=0.000 (default=0.000000)
[0363] Total Quality Threshold=25.0000 (default=75.000000)
[0364] Maximum Fraction of Heterozygote Calls=0.99000
(default=0.900000)
[0365] Model Type (0=Heterozygote, 1=Homozygote)=0
[0366] Perfect Call Quality Threshold=0.500 (default=2.000000)
[0367] Final Reliability Rules
[0368] Min Fraction of Calls in Neighboring Probes=1.0000
(disables filter)
[0369] Min Fraction of Calls of Samples=1.0000 (disables filter)
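By way of non-limiting illustration, the permissive settings listed
above can be recorded as data so that departures from the default
values are easy to audit. The dictionary layout and the function
name changed_from_default are assumptions made for this sketch and
do not correspond to any GDAS file format; a default of None means
that no default value is stated above.

```python
# name -> (permissive value, default value or None if not stated)
PERMISSIVE_SETTINGS = {
    "No Signal threshold": (0.5, 1.0),
    "Weak Signal Fold threshold": (20000.0, 20.0),
    "Large SNR threshold": (20.0, 20.0),
    "Strand Quality Threshold": (0.0, 0.0),
    "Total Quality Threshold": (25.0, 75.0),
    "Maximum Fraction of Heterozygote Calls": (0.99, 0.9),
    "Model Type": (0, None),  # 0=Heterozygote, 1=Homozygote
    "Perfect Call Quality Threshold": (0.5, 2.0),
    "Min Fraction of Calls in Neighboring Probes": (1.0, None),
    "Min Fraction of Calls of Samples": (1.0, None),
}


def changed_from_default(settings):
    """Names of settings whose value differs from a stated default."""
    return sorted(
        name
        for name, (value, default) in settings.items()
        if default is not None and value != default
    )
```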
[0370] The settings above are significant in the present
application because the base call algorithm is set up by default to
sacrifice the number of base calls made in order to make the most
accurate calls (i.e., for SNP detection). In the present
application, the technique is less concerned about achieving the
same degree of accuracy as required for SNP detection but instead
expanding the number of calls made so that the longest possible
stretches of contiguous sequence are produced by GDAS while
maintaining necessary specificity.
[0371] It is to be understood that within the scope of the present
invention, the above-listed permissive settings can be altered
individually or in toto as desired by the practitioner to obtain an
optimal sensitivity/specificity agreement. In addition, it is to be
understood that the settings above are exemplary and that each
setting may be altered by 10% or more (parameter dependent) without
altering the desired result of the present invention.
Resequencing Pathogen Identifier (REPI) and Alternatives,
Modifications, Developments
[0372] Also according to the invention, it is the sequence
information derived from a base-calling algorithm, as applied to
the microarray hybridization pattern that is used to identify
individual pathogens. Preferably, the sequence of target sequences
determined by the resequencing probes is used to query a database
using a similarity search algorithm. More preferably, the search
uses a commonly used local sequence alignment algorithm (e.g.
Smith-Waterman, BLASTN) to statistically determine the
probability that a given target sequence corresponds to a specific
sequence in a database record (Korf, Yandell & Bedell, 2003).
Even more preferably, a custom algorithm that determines
subsequences that are most suitable for producing meaningful
similarity searches against database records determines the set(s)
of sequences that are submitted for similarity search
automatically. Yet even more preferably, the automated
subsequence-parsing algorithm is the Resequencing Pathogen
Identifier (REPI) algorithm described in this invention and the
sequence database records will be in both the public (e.g. GenBank)
and private domain. Variants of nucleic acid sequence similarity
search algorithms that are suitable for use in the intended
invention include, but are not limited to: Washington University
BLAST (WU-BLAST), NCBI-BLAST, FastA, MPsrch, Scanps, and BestFit
(Korf et al., 2003).
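By way of non-limiting illustration, the local alignment principle
underlying such searches can be sketched with a minimal
Smith-Waterman scoring routine. The scoring values (match,
mismatch, gap) are arbitrary choices for this example, a linear gap
penalty is assumed, and the routine returns only the best local
score; BLASTN and the other listed tools use heuristics and
statistics that are not reproduced here.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b.

    Uses the standard dynamic programming recurrence with a linear
    gap penalty, keeping only two rows of the matrix in memory.
    An uncalled base ('N') simply scores as a mismatch against
    A, C, G, or T.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(0,
                          prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                          prev[j] + gap,       # gap in b
                          curr[j - 1] + gap)   # gap in a
            best = max(best, curr[j])
        prev = curr
    return best
```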
REPI Alternatives and Variants
[0373] In the described invention, REPI (Resequencing Pathogen
Identifier) software (see U.S. Application Ser. No. 60/609,918
filed on Sep. 15, 2004, and U.S. Application Ser. No. 60/631,460,
filed on Nov. 29, 2004, which are incorporated herein by reference
in their entirety) was used to ascertain which base call
subsequences from the CustomSeq/GCOS/GDAS process would likely
return significant BLAST results through the use of a customized
sliding window algorithm. Subsequently, REPI automatically returns
BLAST outputs to the end user that allow probabilistic assignment
of the likelihood that a given set of base calls corresponds to a
particular microbial sequence. This low-level software
functionality is analogous to the "kernel" of a UNIX or UNIX-derived
computer operating system, in that all higher-level functions and
user interfaces must pass through it for resequencing chip
analysis.
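By way of non-limiting illustration, a sliding window filter of the
general kind described above can be sketched as follows. This is
not the REPI implementation; the function name, the window length,
and the minimum fraction of called bases are hypothetical values
chosen only for the example.

```python
def select_subsequences(calls, window=20, min_fraction=0.8):
    """Return stretches of base calls worth submitting to a search.

    calls is a string over A, C, G, T and N (N = no call). A window
    passes when at least min_fraction of its positions are called.
    Overlapping or adjacent passing windows are merged, and leading
    and trailing N characters are trimmed from each merged stretch.
    """
    spans = []
    for start in range(0, len(calls) - window + 1):
        chunk = calls[start:start + window]
        called = sum(1 for b in chunk if b != "N")
        if called / window >= min_fraction:
            end = start + window
            if spans and start <= spans[-1][1]:
                spans[-1][1] = end  # extend the previous stretch
            else:
                spans.append([start, end])
    result = []
    for start, end in spans:
        segment = calls[start:end].strip("N")
        if segment:
            result.append(segment)
    return result
```

Each returned stretch could then be passed to a similarity search,
so that regions dominated by no-calls do not dilute the query.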
[0374] The low-level functionality provided by REPI will be central
to a number of higher bioinformatics tasks that will utilize
discontinuous segments of nucleic acid, or even amino acid
sequence. In the following examples, the present inventors provide
data showing that sequence fragments can be linked automatically to
individual pathogens. In several more preferred embodiments, this
approach can be refined to better discriminate between mixtures of
pathogens and genetic recombination between pathogens. In one very
preferred embodiment, the analysis software would allow for
automatic detection of overlapping or homologous sequence fragments
on different tiled regions of the array, allowing inference of a
mixture of pathogens. In an even more refined embodiment, the
analysis software would determine that the sequence outputs from
different tiled regions are not overlapping but correspond to
contiguous sequence that may be used to infer a genetic
recombination event.
[0375] For example, a co-infection of two strains of a virus may
produce a recombinant with a gene that is homologous with one virus
strain except for the 5' end, which has been substituted with the
corresponding section of gene for the other virus strain. When this
new recombinant virus genome is hybridized on a resequencing
microarray, it produces signal from the corresponding pieces of
both regions. One would need to have an assembly algorithm to
construct a "model" of the pathogen showing which parts might fit
together to form an entire target. If the two have significant
overlap, one might conclude that there is probably a mixture. But
if there were no overlap, there would remain a possibility that
there is a recombinant. The degree of overlap (or lack of) could be
affected by low concentrations of target with correspondingly
smaller amounts of the tiles being filled in. This same principle
can be applied even more readily, and with greater impact, on
viruses where the recombination is a steady and recurring event, as
in the case of influenza, where recombinations between viral
segments result regularly in the formation of new viral strains. In
fact, this described functionality in REPI will be essential for
the distinction of pathogen mixtures versus recombination.
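By way of non-limiting illustration, the overlap reasoning above can
be sketched as follows. Each hit is represented as a hypothetical
tuple (strain, start, end) giving the half-open interval of the
gene covered by base calls attributed to that strain, and the
threshold min_overlap is an arbitrary example value.

```python
def overlap_length(a, b):
    """Length of overlap of half-open intervals a and b (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def classify(hit_a, hit_b, min_overlap=50):
    """Tentatively interpret two hits on the same gene.

    Significant overlap between hits from different strains suggests
    a mixture; no overlap leaves open the possibility of a
    recombinant; anything in between is left undecided.
    """
    strain_a, start_a, end_a = hit_a
    strain_b, start_b, end_b = hit_b
    if strain_a == strain_b:
        return "single strain"
    shared = overlap_length((start_a, end_a), (start_b, end_b))
    if shared >= min_overlap:
        return "possible mixture"
    if shared == 0:
        return "possible recombinant"
    return "indeterminate"
```

As noted above, low target concentration reduces the filled-in
portion of each tile, so an absence of overlap is suggestive only.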
[0376] In another very preferred embodiment, REPI algorithms will
allow for the analysis of transcriptional markers (e.g. RNA) that
have been resequenced using the presently described type of
microarray (via hybridization of RNA or complementary cDNA). In a
method analogous to that described above for inference of genomic
recombination events, transcriptional sequences may also be
assembled to determine pathogen viability and transcriptional
editing events that can serve as markers for infection.
Another REPI Alternative (Estimating the Amount of Pathogen Target
in a Sample)
[0377] Not only is the present inventive approach able to
distinguish between mixtures of pathogens and recombination events
within a given pathogen (described elsewhere herein), it would also
be of great value to provide the end user with an estimate
(quantitation) of the relative amount of pathogen that was detected
in the resequencing microarray assay. In particular, this would be
of great utility when the clinician (technician) attempts to assign
cause and effect when multiple pathogen genomic signatures are
detected.
[0378] Two types of data may be used for this purpose. The first is
the absolute intensity of the hybridization signals on the chip. A
non-linear relationship exists between the amount of target in
solution and the amount that actually hybridizes and the resulting
signal. However, an estimate of the amount of target nucleic acid
in the sample could be made by comparison with a standard curve
prepared under control conditions. The signal intensity data is
readily available from the .CEL file in the Affymetrix data
hierarchy, and although the content of the .CEL files was not used
in this disclosure, the output of REPI could easily be modified to
include the intensity values of the .CEL files. Secondly, the
percentage of base calls, both as a percentage of the total tile
region size and as a percentage of base calls within a selected
subsequence satisfying the sliding window algorithm, could be used
as a measure of concentration. Our results show that both of these
percentage metrics decrease with decreasing target concentration,
although the correct pathogen can still be identified.
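By way of non-limiting illustration, the two types of data
described above can be turned into simple estimates as follows. The
function names are hypothetical, the standard curve values used in
any example would have to be measured under control conditions, and
piecewise-linear interpolation between measured standards is only
an approximation of the non-linear relationship noted above.

```python
def base_call_percentage(calls):
    """Percentage of positions in calls that are not 'N'."""
    if not calls:
        return 0.0
    return 100.0 * sum(1 for b in calls if b != "N") / len(calls)


def interpolate_concentration(signal, standard_curve):
    """Estimate concentration from a hybridization signal.

    standard_curve is a list of (concentration, signal) pairs from
    control experiments, assumed to be monotonic in signal. Returns
    None when the signal lies outside the measured range.
    """
    points = sorted(standard_curve, key=lambda p: p[1])
    for (c0, s0), (c1, s1) in zip(points, points[1:]):
        if s1 > s0 and s0 <= signal <= s1:
            frac = (signal - s0) / (s1 - s0)
            return c0 + frac * (c1 - c0)
    return None
```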
General Utility for Pathogen Detection
[0379] In a preferred embodiment, the invention described herein
will be used for the routine diagnosis and surveillance of common
respiratory pathogens in a clinical setting (at or near
point-of-care). Readily obtainable samples (e.g. nasal wash, nasal
swab, throat swab, sputum, or blood) will be processed in a simple
manner to produce nucleic acid isolates that are obtained using an
adsorptive process, enriched for pathogen-specific targets,
amplified using a non-biased (e.g. total) amplification method or
multiplexed PCR method, and hybridized on the resequencing
microarray for a minimal amount of time prior to washing and
imaging. The overall process will be sufficiently simple such that
a skilled technician (medical technologist level) will be able to
perform the assay without a significant interruption in their
routine work pattern. Base calls will be made using the custom
algorithms or using the steps specified by the vendor. REPI, or
some variant thereof, will be used to automatically parse the base
calls made by the microarray, and provide the end-user (e.g.,
physician, health care provider, public health officer, or other
decision-makers) with decision-quality information for management
(e.g., diagnostic, treatment, prognostic and outbreak
control/containment measures) of the infectious pathogen(s) that
are causative of the disease symptoms and complications. This
analysis would occur locally through the use of an embedded
sequence database that would be queried by REPI (e.g. local
dedicated BLAST server). In addition to providing a routine
diagnostic functionality, the microarray will also carry markers
for highly improbable (i.e. bioterrorism) pathogens that would be
cause for involvement of others, namely public health officials.
However, it is understood that a nasal wash or throat swab may not
be the optimal sample type for diagnosis of bioterrorism agents and
that a separate sample type may be needed.
[0380] The following scenarios, which further demonstrate the
utility of the microarrays and methods of the present invention,
are also within the scope of the present invention:
Scenario 1:
[0381] Patient arrives at a medical facility with T>100.5 and
respiratory symptoms. Nasal wash and/or throat swab are taken.
Pathogens which can be identified by this route include the
commonly occurring pathogens including those listed in Table 1. The
presence of fever has been found to be an important criterion for
isolating respiratory pathogens by culture, and the literature
demonstrates that pathogens are typically present at peak titer
during febrile periods.
[0382] For the bioterrorism agents, little information is available
in the literature on infectious titers in respiratory secretions
after intentional release of a biological agent. It is suspected
that an aerosol release of an agent of bioterrorism would lead to
detectable titers of organism in the first 24 hours post-exposure.
In individuals presenting after the first 24 hours post-exposure,
the microarray would serve the purpose of identifying common
pathogens that might be otherwise erroneously suspected of being
cases from the BT/BW agent. A proportion of individuals exposed to
a significant aerosol release will develop symptoms rapidly and
will retain the BT/BW agent in the nares for detection. For a
disease, such as smallpox, the incubation period for onset of
symptoms is much longer, but the virus can be isolated from throat
culture for days.
Scenario 2:
[0383] The individual has been ill for days and did not initially
present to a medical clinic, but chose to self-medicate. Patient
may not have initially noted fever, but now has fever and
respiratory symptoms and is concerned about lack of resolution.
Health Care Practitioner (HCP) sees patient and determines
appropriate clinical sample. If patient appears more ill, then
chest X-ray may be acquired. Common organisms causing lower
respiratory tract infection include, but are not limited to:
Mycoplasma pneumoniae, S. pneumoniae, C. pneumoniae, L.
pneumophila, S. pyogenes, Influenza A/B, RSV, Parainfluenza, S.
aureus, SARS. Amongst the BT/BW agents of concern, an abnormal
chest x-ray can be found with B. anthracis, Smallpox, Y. pestis, F.
tularensis. For the common respiratory organisms, nasal wash/throat
swab can still be appropriate sample techniques, although some
practitioners will choose to send a sputum sample. For the BT/BW
agents, sputum and blood can be appropriate sample techniques.
Scenario 3:
[0384] Ongoing outbreak of a known agent. The microarray can
continue to be useful for detecting cases of other common
pathogens, but may also be used to screen for known BT/BW agent
using any type of appropriate specimen. Other assays may prove less
expensive, but the microarray can provide forensic information and
antibiotic resistance data at the time of pathogen
identification.
Overall Business Model
[0385] The resequencing DNA microarray and associated devices (to
include other types of lower content microarrays or alternative
sequence detectors) will be but one class of elements of an
integrated pathogen diagnostic/surveillance system. This system
will be comprised of diagnostic, informatic, and epidemiologic
components. At the diagnostic level, the RPM (and ancillary
devices) will provide a rapid and cost-effective methodology for
providing a diagnosis, patient-specific treatment information, and
prognostic information (based on virulence and resistance markers)
of infectious respiratory disease. This will represent a shift in
diagnostic emphasis for the provider from clinical suspicion of one
or a few pathogens (e.g., Mycoplasma pneumoniae and other pathogens
with a presentation of "walking pneumonia" would be tested by
obtaining bacterial cultures, presence of antibodies to that
pathogen, etc.) to symptom-based and broad differential testing by
multiplexed DNA microarrays based on the symptom(s) and the organ
system or systems affected. The single highly multiplexed DNA
arrays will additionally provide diagnostic information on which
group of pathogens to treat (e.g., bacteria, viruses, parasitic,
and fungal pathogens) producing substantial return on investment by
reducing the number of prescriptions based on suspicion of causal
agents (e.g., prescribing antibiotics for a viral respiratory
infection will have no effect). Additional diagnostic applications
can be created to cover all pathogens affecting an organ or organ
systems (e.g., respiratory, genitourinary, central nervous system,
etc.), syndromes (e.g., fever of unknown origin evaluated in the
hospital for two weeks without identifying the cause to rule out
all potential infectious disease causes), symptoms (e.g., all
pathogens causing a urticaria or rash that itches, or cough, etc.),
around collections of pathogens or threats present in geographic
locations, or functional categories (e.g., groupings based on all
the organisms known to have antimicrobial resistance in hospital or
other confined settings like prisons or nursing homes). This highly
multiplexed differential diagnostic testing will be done by
determining the causative agent(s) of infection, antimicrobial
susceptibility of the agent(s), and genomic sequence-specific clues
as to the nature of the agent such that the maximum number of
opportunities is identified to increase the efficacy and efficiency
of clinical management.
[0386] However, because of the high number of specific molecular
queries that are being made, a large number of pathogens can be
detected, each at previously unattainable levels of detail. As
such, a large number of assays (culture, serotyping, and PCR
confirmation) that are subsequently and infrequently performed as
part of "surveillance" activities are now being performed
simultaneously with diagnosis, obviating the need for burdensome
and expensive tasks currently required at the clinical level for
surveillance support. This would have great relevance to diagnosis
and surveillance of variants of RNA viruses (e.g. Influenza and
SARS) in a manner that would be prohibitively complicated when
relying on specific oligonucleotide probe sets.
[0387] In the specific embodiment described here, it will be highly
advantageous to have diagnostic microarrays, the fabrication of
which will not rely on the availability of a large number of target
sequences and a means to fabricate arrays using them. More
importantly, it will be critical not to be restricted to the
assumption that the target sequences are invariant. Diagnostic DNA
microarrays will identify specific but unanticipated genomic
variants of a model pathogen, without requiring re-design of
specific oligonucleotide probes and array re-fabrication. This will
be critical to the ability to characterize the cause(s) of
infectious disease outbreaks in a time-effective manner. For
example, such microarrays could be used to rapidly detect new
variations of influenza or SARS virus without requiring that the
pathogens be isolated, cultured, and sequenced using conventional
approaches; a process that would require weeks to months if the
pathogen were readily cultivable.
[0388] The present invention also embraces applications for
detecting a mixture of pathogens, especially when there is no
preliminary evidence to suggest that the mixture might be
interrogated by specific reagents (e.g. PCR primers). Thus, the
present invention provides a means to impact the ability to
determine the complementary roles of interacting pathogens in
disease etiology. The present invention also has applications in
viral or bacterial stock quality control and in the assessment of
viral vaccine production, which involves the intentional mixture of
field strain and cultivable viruses to produce recombinants that
culture well. Thereby, the present invention enables the
presentation of the appropriate epitope(s) for vaccine efficacy.
[0389] The informatics component of the system will provide the
necessary components to allow local (point-of-care), automatic
microarray data analysis as well as coordinate multi-directional
information transfer. "Upward" flow of information will entail the
transfer of specific sequence base calls from the resequencing
chip, preferably in FASTA format, and all associated local
processing results, to local, regional, national and international
levels. "Lateral" flow of information will involve the exchange of
specific sequence base calls and associated local processing
results to other local point-of-care medical facilities. "Downward"
flow of information is defined as a provision of national level
data integration to regional and local health officials.
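The "upward" transfer of base calls in FASTA format described above can be illustrated with a minimal sketch. The function name `to_fasta`, the 60-character line width, and the use of "N" for positions without a confident call are assumptions made for illustration only; they are not taken from the present disclosure.

```python
def to_fasta(alias, base_calls, width=60):
    """Format the base calls of one tile region as a FASTA record.

    alias      -- tile region designator, e.g. "FluAHA5"
    base_calls -- string of called bases; "N" is assumed here to mark
                  positions where no confident call was made
    width      -- residues per line (60 is a common convention)
    """
    header = ">" + alias
    lines = [base_calls[i:i + width] for i in range(0, len(base_calls), width)]
    return "\n".join([header] + lines) + "\n"


if __name__ == "__main__":
    # Toy base calls, not real array output.
    print(to_fasta("FluAHA5", "AATCCACTNNACGT", width=8), end="")
```

A record of this kind could accompany the local processing results when data are passed to regional or national levels.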
[0390] Like clinical samples, environmental samples may contain
small amounts of target nucleic acids in a high genomic background
of unknown origin. But unlike clinical samples of a given type, the
background found in an environmental sample (e.g. soil, water, or
collected from an aerosol particle collector) might show a more
heterogeneous composition depending on the geographical location,
season, and environmental conditions. Accordingly, the
aforementioned amplification, enrichment and/or subtraction
strategies may be employed to obtain reliable base calling.
Forensic and Environmental Applications
[0391] The amount of detailed sequence information provided by the
RPM will be enabling for various applications other than medical
diagnosis and surveillance. Thus, the capability of the apparatus
extends to forensic fingerprinting of specific pathogen strains.
The capability enables preemptive diagnosis of etiologies of
infectious disease, as an alternative to the conventional practice
of corroborative diagnostic analysis. In the case of an intentional
infection, poisoning, or bioterrorism event, the resequencing
pathogen detection microarray would allow for detailed strain
identification that could be used to determine the possible
originator of the event and to allow for rapid mitigation of the
event (e.g. determination of infectious capacity, antimicrobial
resistance, or engineered modifications to an organism) by
implementing targeted public health containment measures directed
by having the pathogen identified to the strain as the completed
first step in the outbreak investigation process.
[0392] The present invention further suggests methods and processes
to automate and optimize the iterative and adaptive design,
fabrication and validation of arrays, including derivative
sub-arrays. In a very preferred embodiment, at an enterprise level, a
consortium of experts on individual pathogens would maintain a web
portal-enabled database. The consortium would maintain pathogen
target sequences for identification and virulence.
[0393] The same technology described in the present invention can
be used for non-clinical samples, including those collected from
air, water, soil or surface swabs. The only modifications necessary
to those described in the present invention will be those necessary
for nucleic acid extraction and background nucleic acid removal, if
a subtractive approach followed by generic amplification is a
desired approach.
Multiple Pathogen Surveillance in a Population
[0394] The invention further provides a specific implementation
that validates its capabilities in a real world operational
setting. This implementation relates to epidemic outbreaks of acute
respiratory disease involving common and less common etiologies,
while simultaneously evaluating the possible presentation of
hostile pathogenic agents (rare if ever) in individual encounters.
Thus, the invention provides a means to perform near real time
surveillance of a plurality of infectious pathogens involved in an
infectious outbreak. Such surveillance may be validated and
eventually become operational in a "real world testbed". In a
preferred embodiment, the real world testbed is a human population
that regularly encounters a variety of respiratory pathogens. In a
preferred embodiment, the population is comprised of military
personnel at an installation or base. In a very preferred
embodiment, the population is comprised of active duty military
personnel.
End User-Specified Applications
[0395] The integrated process of microarray design and assay will
require only that sequences be provided to a microarray
manufacturer and not involve the design of ancillary reagents (e.g.
for specific PCR). Thus, an end user who has no detailed knowledge
of genomics would be capable of defining qualitative attributes
of a microarray-based assay, and an automated bioinformatics
pipeline would be used to select suitable target gene subsequences
for submission to a resequencing microarray manufacturer. This
would allow rapid deployment of a new microarray design for a
specific geographic location or theater of operations. Thus, the
integrated design/analysis capability that is enabled by the
present invention will generalize to other envisioned applications
besides those listed herein.
Adenovirus Sequences
[0396] In an additional embodiment of the present invention are the
genomic sequences of thirteen adenovirus strains, which were not
known as of the date of the present invention. The thirteen
adenovirus strains are: Ad3, Ad3FS_navy, Ad4, Ad4vaccine,
Ad4FS_navy, Ad4FS_AF, Ad5FS, Ad7, Ad7FS_navy, Ad7vaccine, Ad16,
Ad1, and Ad21. These genomic sequences have been assigned the
GenBank accession numbers shown in Table 6 appearing in the
Examples. The full GenBank records, including partial annotation,
for each of these sequences are found in the Sequence Listing
attached herewith.
[0397] A "polypeptide" as used herein is understood to mean a
sequence of several amino acid residues linked by peptide bonds.
Such amino acids are known in the art and encompass the unmodified
and modified amino acids. In addition, one or more modifications
known in the art such as glycosylation, phosphorylation, etc., may
modify the polypeptide.
[0398] The term "isolated" means separated from its natural
environment. This term is intended to also embrace the terms
"purified" (100% pure) and "substantially purified" (at least 90%
pure).
[0399] The term "polynucleotide" refers in general to
polyribonucleotides and polydeoxyribonucleotides, and can denote an
unmodified RNA or DNA or a modified RNA or DNA.
[0400] The term "homologous" as used herein is understood to mean
sequence similarity between two or more polynucleotides or proteins
from the same species or from a different species. Within the
meaning of this term, said two or more polynucleotides (or
proteins) are homologous when at least 70%, preferably at least
80%, most preferably at least 90% of the nucleotide base (amino
acid) composition of a candidate sequence corresponds to the
sequence according to the invention. According to the invention, a
"homologous protein" is to be understood to retain at least 50%,
preferably at least 75%, more preferably at least 85%, most
preferably at least 95%, of the activity of the
sequence of the present invention. As used herein "corresponds" is
to be understood to mean that the corresponding amino acids are
either identical or are mutually homologous amino acids. The
expression "homologous amino acids" denotes those that have
corresponding properties, particularly with regard to their charge,
hydrophobic character, steric properties, etc. The same terminology
can be used to describe DNA or RNA sequence homology for gene
sequences that encode the corresponding proteins.
[0401] The term "homologous fragment" as used herein is understood
to mean two or more polynucleotides or proteins from the same
species or from a different species. In this context, it is
contemplated that a fragment is homologous when it shares at least
40% identity to a fragment having at least 50 amino acids.
Preferably, homologous fragments share at least 50% identity to a
fragment having at least 50 amino acids. More preferably,
homologous fragments share at least 60% identity, at least 70%
identity, at least 80% identity, at least 90% identity, or at least
95% identity to a fragment having at least 50 amino acids.
Accordingly, homologous fragments are included within the scope of
the present invention. For homologous polynucleotides, it is
understood that the same homology ranges are envisioned in the
present invention but over a range of up to 1000 nucleotides,
inclusive of all integers (e.g., 150, 250, 300, 500, 750,
etc.).
[0402] Homology, sequence similarity or sequence identity of
nucleotide or amino acid sequences may be determined conventionally
by using known software or computer programs such as the BestFit or
Gap pairwise comparison programs (GCG Wisconsin Package, Genetics
Computer Group, 575 Science Drive, Madison, Wis. 53711). BestFit
uses the local homology algorithm of Smith and Waterman, Advances
in Applied Mathematics 2: 482-489 (1981), to find the best segment
of identity or similarity between two sequences. Gap performs
global alignments: all of one sequence with all of another similar
sequence using the method of Needleman and Wunsch, J. Mol. Biol.
48:443-453 (1970). When using a sequence alignment program such as
BestFit, to determine the degree of sequence homology, similarity
or identity, the default setting may be used, or an appropriate
scoring matrix may be selected to optimize identity, similarity or
homology scores. Similarly, when using a program such as BestFit to
determine sequence identity, similarity or homology between two
different amino acid sequences, the default settings may be used,
or an appropriate scoring matrix, such as blosum45 or blosum80, may
be selected to optimize identity, similarity or homology
scores.
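As an illustration of the global alignment approach attributed above to Needleman and Wunsch, the following is a minimal sketch and not the Gap program itself. The scoring values (match +1, mismatch -1, linear gap penalty -1) and the definition of percent identity as matching columns divided by alignment length are assumptions chosen for simplicity; packaged programs use scoring matrices and affine gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align sequences a and b; return the two gapped strings."""
    n, m = len(a), len(b)
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aln_a, aln_b):
    """Identical, non-gap columns as a percentage of alignment length."""
    matches = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * matches / len(aln_a)


if __name__ == "__main__":
    aln_a, aln_b = needleman_wunsch("GATTACA", "GATACA")
    print(aln_a)
    print(aln_b)
    print(round(percent_identity(aln_a, aln_b), 1))
```

The resulting percentage can be compared against the homology thresholds recited above (e.g., at least 70%, 80%, or 90%), bearing in mind that different programs define the identity denominator differently.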
[0403] The present invention also relates to polynucleotides that
contain complete genes that are identified by open reading frames.
Examples of the preferred genes embraced by the present invention
include E1A, hexon, and fiber. The sequences of E1A, hexon, and
fiber, as well as other preferred polynucleotide sequences within
the scope of the present invention are found in the attached
Sequence Listing. The present invention also embraces fragments of
said genes and polynucleotides and fragments thereof that can be
obtained by screening by means of the hybridization of a
corresponding gene bank with a probe which contains the sequence of
said polynucleotide or a fragment thereof, and isolation of said
DNA sequence.
[0404] The present invention also relates to coding DNA sequences
that result from degeneration of the genetic code. Moreover, one
skilled in the art is also aware of conservative amino acid
replacements such as the replacement of glycine by alanine or of
aspartic acid by glutamic acid in proteins as "sense mutations"
that do not result in any fundamental change in the activity of the
protein, i.e. which are functionally neutral. It is also known that
changes at the N- and/or C-terminus of a protein do not
substantially impair the function thereof, and may even stabilize
said function.
[0405] Polynucleotide sequences according to the invention are
suitable as hybridization probes for RNA, cDNA and DNA, in order to
isolate those cDNAs or genes, which exhibit a high degree of
similarity to the probe sequence.
[0406] Polynucleotide sequences according to the invention are also
suitable as primers for polymerase chain reaction (PCR) for the
production of DNA that encodes an active enzyme.
[0407] Oligonucleotides such as these, which serve as probes or
primers, can contain more than 30, preferably up to 30, more
preferably up to 20, even more preferably at least 15, and most
preferably at least 13 successive nucleotides. Oligonucleotides
with a length of at least 40 or 50 nucleotides are also
suitable.
[0408] Hybridization protocols are known in the art and are
disclosed, for example, in Sambrook et al., Molecular Cloning: A
Laboratory Manual, Cold Spring Harbor Laboratory, N.Y. (1989).
However, as used herein, stringent hybridization conditions are
those conditions which allow hybridization between polynucleotides
that are 75%, 80%, 85%, 90%, 95%, or 98% homologous as determined
using conventional homology programs, an example of which is UWGCG
sequence analysis program available from the University of
Wisconsin (Devereux, Haeberli & Smithies, 1984). Typically,
stringent conditions will be those in which the salt concentration
is less than about 1.5 M Na ion, typically about 0.01 to 1.0 M Na
ion concentration (or other salts) at pH 7.0 to 8.3 and the
temperature is at least about 30° C. for short probes (e.g.,
10 to 50 nucleotides) and at least about 60° C. for long
probes (e.g., greater than 50 nucleotides). Stringent conditions
may also be achieved with the addition of destabilizing agents such
as formamide. Exemplary low stringency conditions include
hybridization with a buffer solution of 30 to 35% formamide, 1 M
NaCl, 1% SDS (sodium dodecyl sulphate) at 37° C., and a wash
in 1× to 2×SSC (20×SSC=3.0 M NaCl/0.3 M trisodium
citrate) at 50 to 55° C. Exemplary moderate stringency
conditions include hybridization in 40 to 45% formamide, 1 M NaCl,
1% SDS at 37° C., and a wash in 0.5× to 1×SSC at
55 to 60° C. Exemplary high stringency conditions include
hybridization in 50% formamide, 1 M NaCl, 1% SDS at 37° C.,
and a wash in 0.1×SSC at 60 to 65° C.
[0409] Specificity is typically the function of post-hybridization
washes, the critical factors being the ionic strength and
temperature of the final wash solution. For DNA-DNA hybrids, the
Tm can be approximated from the equation of Meinkoth and Wahl
(Meinkoth & Wahl, 1984): Tm=81.5° C.+16.6 (log M)+0.41
(% GC)-0.61 (% form)-500/L; where M is the molarity of monovalent
cations, % GC is the percentage of guanosine and cytosine
nucleotides in the DNA, % form is the percentage of formamide in
the hybridization solution, and L is the length of the hybrid in
base pairs. The Tm is the temperature (under defined ionic strength
and pH) at which 50% of a complementary target sequence hybridizes
to a perfectly matched probe. Tm is reduced by about 1° C.
for each 1% of mismatching; thus, Tm, hybridization and/or wash
conditions can be adjusted to hybridize to sequences of the desired
identity. For example, if sequences with approximately 90% identity
are sought, the Tm can be decreased 10° C. Generally,
stringent conditions are selected to be about 5° C. lower
than the thermal melting point (Tm) for the specific sequence and
its complement at a defined ionic strength and pH. However,
severely stringent conditions can utilize a hybridization and/or
wash at 1, 2, 3, or 4° C. lower than the thermal melting
point (Tm); moderately stringent conditions can utilize a
hybridization and/or wash at 6, 7, 8, 9, or 10° C. lower
than the thermal melting point (Tm); low stringency conditions can
utilize a hybridization and/or wash at 11, 12, 13, 14, 15, or
20° C. lower than the thermal melting point (Tm). Using the
equation, hybridization and wash compositions, and desired Tm,
those of ordinary skill will understand that variations in the
stringency of hybridization and/or wash solutions are inherently
described. If the desired degree of mismatching results in a Tm of
less than 45° C. (aqueous solution) or 32° C.
(formamide solution) it is preferred to increase the SSC
concentration so that a higher temperature can be used. An
extensive guide to the hybridization of nucleic acids is found in
Current Protocols in Molecular Biology, Chapter 2, Ausubel, et al.,
Eds., Greene Publishing and Wiley-Interscience, New York
(2000).
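The Tm approximation and the mismatch adjustment described above can be worked through numerically. The following is a minimal sketch; the function names are hypothetical, and the logarithm is taken as base 10, which is the customary reading of the equation. The example values (1 M monovalent cation, 50% GC, 50% formamide, 500 base pair hybrid) are illustrative and are not taken from the Examples.

```python
import math


def tm_meinkoth_wahl(molarity, pct_gc, pct_formamide, length_bp):
    """Approximate Tm (deg C) of a DNA-DNA hybrid.

    Tm = 81.5 + 16.6*log10(M) + 0.41*(%GC) - 0.61*(%form) - 500/L
    """
    return (81.5
            + 16.6 * math.log10(molarity)
            + 0.41 * pct_gc
            - 0.61 * pct_formamide
            - 500.0 / length_bp)


def tm_for_identity(tm_perfect, pct_identity):
    """Lower Tm by about 1 deg C for each 1% of mismatching."""
    return tm_perfect - (100.0 - pct_identity)


if __name__ == "__main__":
    tm = tm_meinkoth_wahl(molarity=1.0, pct_gc=50.0,
                          pct_formamide=50.0, length_bp=500)
    print(round(tm, 1))                         # perfectly matched hybrid
    print(round(tm_for_identity(tm, 90.0), 1))  # hybrids of about 90% identity
```

With these inputs the terms are 81.5 + 0 + 20.5 - 30.5 - 1.0, giving a Tm of about 70.5° C. for a perfect match and about 60.5° C. when sequences of approximately 90% identity are sought; a stringent wash would then be chosen about 5° C. below the relevant value.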
[0410] In this invention "primer" or "probe" means a
polynucleotide, especially an oligonucleotide, that is produced
synthetically or biologically and includes a specific nucleotide
sequence and permits hybridization to a section containing the
target nucleotide sequence.
[0411] Defined primers or probes, as well as all other
oligonucleotides and polynucleotides of the present invention, may
be produced by any of several well-known methods, including
automated solid-phase chemical synthesis using
cyanoethyl-phosphoramidite precursors. Other well-known methods for
construction of synthetic primers/oligonucleotides may, of course,
be employed. J. Sambrook, E. F. Fritsch and T. Maniatis, Molecular
Cloning 11 (2d ed. 1989).
[0412] The primers used to amplify the sample nucleic acids may be
coupled to a detectable moiety. A preferred example of such a
detectable moiety is fluorescein, which is a standard label used in
nucleic acid sequencing systems using laser light as a detection
system. Other detectable labels can also be employed, however,
including other fluorophores, radio labels, chemical couplers such
as biotin which can be detected with streptavidin-linked enzymes,
and epitope tags such as digoxigenin detected using antibodies. The
primers may be modified whereby another nucleotide is added to,
removed from, or substituted for at least one nucleotide in the
oligonucleotide. Introduction of known labels such as radioactive
substances, enzymes, fluorescence substances, etc. after synthesis
of the oligonucleotide is also included therein.
[0413] Similarly, the probes/oligonucleotides used to hybridize
with the polynucleotides coding for the polypeptides of the
invention, for example for the purpose of detection of such a
polynucleotide, may be coupled to a detectable moiety.
[0414] As used herein, the term "enhancement" means increasing the
intracellular activity of one or more enzymes in a plant cell
and/or plant that are encoded by the corresponding DNA. Enhancement
can be achieved with the aid of various manipulations of the
bacterial cell. In order to achieve enhancement, particularly
over-expression, the number of copies of the corresponding gene can
be increased, a strong promoter can be used, or the promoter- and
regulation region or the ribosome binding site which is situated
upstream of the structural gene can be mutated. Expression
cassettes that are incorporated upstream of the structural gene act
in the same manner. In addition, it is possible to increase
expression by employing inducible promoters. A gene can also be
used which encodes a corresponding enzyme with a high activity.
Expression can also be improved by measures for extending the life
of the mRNA. Furthermore, preventing the degradation of the enzyme
increases enzyme activity as a whole. Moreover, these measures can
optionally be combined in any desired manner.
[0415] A gene can also be used that encodes a corresponding or
variant enzyme with a high activity. Preferably the corresponding
enzyme has a greater activity than the native form of the enzyme,
more preferably at least in the range of 5, 10, 25% or 50% more
activity, most preferably more than twice the activity of the
native enzyme.
[0416] The above written description of the invention provides a
manner and process of making and using it such that any person
skilled in this art is enabled to make and use the same, this
enablement being provided in particular for the subject matter of
the appended claims, which make up a part of the original
description.
[0417] As used herein, the phrases "selected from the group
consisting of," "chosen from," and the like include mixtures of the
specified materials.
[0418] Where a numerical limit or range is stated herein, the
endpoints are included. Also, all values and sub-ranges within a
numerical limit or range are specifically included as if explicitly
written out.
[0419] The above description is presented to enable a person
skilled in the art to make and use the invention, and is provided
in the context of a particular application and its requirements.
Various modifications to the preferred embodiments will be readily
apparent to those skilled in the art, and the generic principles
defined herein may be applied to other embodiments and applications
without departing from the spirit and scope of the invention. Thus,
this invention is not intended to be limited to the embodiments
shown, but is to be accorded the widest scope consistent with the
principles and features disclosed herein.
[0420] Having generally described this invention, a further
understanding can be obtained by reference to certain specific
examples, which are provided herein for purposes of illustration
only, and are not intended to be limiting unless otherwise
specified.
EXAMPLES
Materials and Methods
[0421] Table 6 (below) lists the adenovirus strains cited in the
following examples. GenBank accession numbers refer to the genomic
sequence numbers assigned to each strain. These sequences were not
available to the public at the time of invention and form an
embodiment of the present invention, as well as fragments of the
same.

TABLE 6
GenBank name | Accession no. | Genome size | Origin
Ad3 | AY599834 | 35,345 | ATCC#VR-3, strain GB
Ad3FS_navy | AY599836 | 35,265 | Ad3FS NHRC#1276 from NTC Great Lakes, IL Nov. 07, 1997
Ad4 | AY594253 | 35,990 | ATCC#VR-4, strain RI-67
Ad4vaccine | AY594254 | 35,994 | Calif. Public Health (via Kevin Russell/NHRC)
Ad4FS_navy | AY599835 | 35,965 | Ad4FS NHRC#42606 from Ft Jackson, SC Apr. 02, 2003
Ad4FS_AF | AY599837 | 35,964 | Linda Canas, Brooks AFB, Lackland AFB AF#3 2002
Ad5FS | AY601635 | 35,931 | Ad5FS NHRC#7151 from Ft Jackson, SC Jan. 06, 1998
Ad7 | AY594255 | 35,305 | ATCC#AV-HAD7_AFIP, strain Gomen
Ad7FS_navy | AY601634 | 35,198 | Ad7FS NHRC#1315 from NTC Great Lakes, IL Nov. 19, 1997
Ad7vaccine | AY594256 | 35,236 | NHRC vaccine tablet (Margaret Ryan, Kevin Russell/NHRC)
Ad16 | AY601636 | 35,522 | ATCC#VR-17, strain ch.79
Ad1 | AF534906 | 36,001 | ATCC#VR-1, strain Adenoid 71 1953
Ad21 | AY601633 | 35,382 | ATCC#NIAID V-221-002-014, NIH Research Reagent, prepared in November 1963
ATCC = American Type Culture Collection (Manassas, VA)
NHRC = Naval Health Research Center (San Diego, CA)
NIH = National Institutes of Health (Bethesda, MD)

The full GenBank records, including partial annotation, for each of
these are found in the Sequence Listing attached herewith.
Influenza Content
[0422] The prototype influenza types represented by tiles on RPMV1
are:
[0423] FluAHA1 Influenza A virus (A/New Caledonia/20/99) H1 HA gene for
hemagglutinin 1
[0424] FluAHA3 Influenza A virus (A/Fujian/411/02) H3N2 gene for
hemagglutinin 3
[0425] FluAHA5 Influenza A virus (A/HongKong/156/97/H5N1),
hemagglutinin 5
[0426] FluANA1 Influenza (A/Chile/1/83), neuraminidase 1
[0427] FluANA2 Influenza A virus (A/Panama/2007/99/H3N2) NA gene
for neuraminidase 2
[0428] FluAMATRIX Influenza virus A/NWS/33/H1N1 matrix protein
(M)
[0429] FluBNA Influenza B virus (B/Yamagata/16/88), neuraminidase
glycoprotein genes
[0430] FluBHA Influenza B virus (B/Yamanashi/166/98) hemagglutinin
1 subunit (HA)
[0431] FluBMATRIX Influenza B virus (B/Yamagata/16/88) M1 matrix
protein (M)
[0432] The Accession numbers for each of the foregoing sequences,
as well as the remaining pathogen target sequences, obtained from
the public domain and personal communications, are listed in Table
9.
Preparation Example 1
RPM Version 1 Chip Design
[0433] DNA sequences were provided to Affymetrix for creation of
the resequencing microarray chip (RPM Version 1 chip) utilized in
the following examples. Submission of the DNA sequence and
instruction files to Affymetrix was in accordance with the
manufacturer's instructions (CustomSeq™ Array Protocol) and product
literature. Probes were nominally 25 nucleotides long and
contained a variable (interrogation point) central nucleotide for
each of four possible variants (A, C, T or G) in both the sense and
antisense directions.
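The probe layout described above, in which each interrogated position is covered by four 25-mer probes per strand that differ only at the central nucleotide, can be sketched as follows. This is an illustrative enumeration only and not the manufacturer's design software; the function names are hypothetical, and the convention of writing probes in the same sense as the sequence they are derived from (rather than as its complement) is an assumption.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def probe_set(target, center, length=25):
    """Four probes centered on target[center], one per candidate base."""
    half = length // 2
    if center - half < 0 or center + half >= len(target):
        raise ValueError("position too close to an end for a full-length probe")
    window = target[center - half:center + half + 1]
    return {base: window[:half] + base + window[half + 1:] for base in "ACGT"}


def tile(target, length=25):
    """Yield (position, sense probes, antisense probes) for each position
    that can be centered in a full-length probe on both strands."""
    half = length // 2
    n = len(target)
    antisense_strand = reverse_complement(target)
    for pos in range(half, n - half):
        sense = probe_set(target, pos, length)
        # The same position counted from the other end of the opposite strand.
        antisense = probe_set(antisense_strand, n - 1 - pos, length)
        yield pos, sense, antisense


if __name__ == "__main__":
    toy_target = "ACGT" * 7 + "AC"  # 30-nt toy sequence, not a real tile
    for pos, sense, antisense in tile(toy_target):
        print(pos, len(sense) + len(antisense), "probes")
```

Under this sketch each interrogated position contributes eight probes, and a tile of N nucleotides yields N minus 24 interrogated positions for a 25-mer design, which is consistent with the probe and amplicon sizes listed for several targets in the tables below differing by 24.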
[0434] The target genes selected for the RPMV1 pathogens listed
above are described in the version 1 layout shown in Table 8 and
the Sequence Listing along with the respective PCR primers used for
amplification of the same. The sequences submitted for tiling and
chip fabrication were based on the Affymetrix instruction file
summarized in Table 7, which corresponds to the sequences appearing
as SEQ ID NOs: 1-58. The corresponding "instruction file" lists the
alias designator (e.g. FluAHA5) for each tile region and provides a
"FASTA" formatted target gene sequence (this can be all or part of
the complete target gene).

TABLE 7 RPMV1 Affymetrix instructions file for tiling and chip fabrication
Name | Alias | Start | End | SEQ ID NO: | StartSeq | EndSeq | Design
FluAHA1 | FluAHA1 | 1 | 699 | 1 | TTGAGAAG | ATGGTATG | 1
FluAHA3 | FluAHA3 | 1 | 794 | 2 | GATAGTGA | AAGCATTC | 1
FluAHA5 | FluAHA5 | 1 | 524 | 3 | AATCCACT | GCTCCAAT | 1
FluANA1 | FluANA1 | 1 | 1360 | 4 | AAAAGCAG | TTTTGTGG | 1
FluANA2 | FluANA2 | 1 | 1449 | 5 | GCAAAAGC | TAGAAAAA | 1
FluAMATRIX | FluAMATRIX | 1 | 923 | 6 | AGCAAAAG | TGCCAGAG | 1
FluBHA | FluBHA | 1 | 684 | 7 | TTACATCC | AGCCATAG | 1
FluBNA | FluBNA | 1 | 896 | 8 | ATGAACAA | CAGTTACA | 1
FluBMATRIX | FluBMATRIX | 1 | 362 | 9 | ATGTCGCT | CATGAAAG | 1
Ad4HEXON | Ad4HEXON-1 | 1 | 1096 | 10 | GTGGCGCC | TAAAGTTA | 1
Ad4HEXON | Ad4HEXON-2 | 2226 | 2504 | 10 | CGAGGTTA | GCCCACGC | 1
Ad4FIBER | Ad4FLBER | 1 | 1258 | 11 | CGACCCCG | ACCCTGCA | 1
Ad4E1A | Ad4E1A | 1 | 1326 | 12 | GCGGGGCA | CCCAGGCA | 1
Ad5HEXON | Ad5HEXON-1 | 1 | 843 | 13 | GTGGCGCC | ATTGCTTT | 1
Ad5HEXON | Ad5HEXON-2 | 1655 | 1846 | 13 | GACCTAAG | CCAACGTG | 1
Ad5FIBER | Ad5FIBER | 1 | 2012 | 14 | TTCTGTCC | AGATCACC | 1
Ad5E1A | Ad5E1A | 1 | 616 | 15 | AGCCGGAG | CTGTGGAA | 1
Ad7HEXON | Ad7HEXON-1 | 1 | 807 | 16 | GTGGCGCC | ATTGGCTT | 1
Ad7HEXON | Ad7HEXON-2 | 1652 | 2245 | 16 | TCTGTATG | AATTACAC | 1
Ad7FIBER | Ad7FIBER | 1 | 712 | 17 | CCTTCAAC | AATGTTAA | 1
Ad7E1A | Ad7E1A | 1 | 615 | 18 | AAGAGTTT | ACTGCCAC | 1
PIVIHN | PIVIHN | 1 | 204 | 19 | TAGACCCA | TATAGGGA | 1
PIVIIIHN | PIVIIIHN | 1 | 213 | 20 | CAAATCTA | TGAAAGAT | 1
PIVIIINCFP | PIVIII5NCFP | 1 | 230 | 21 | ACTTAGGA | GTACAACC | 1
HRV5NT | HRV5NT | 1 | 412 | 22 | GTCAAAGG | TCGTGTTT | 1
RSVABL | RSVABL | 1 | 379 | 23 | AAGTGCTC | AAGCAAAC | 1
RSVAN | RSVAN | 1 | 106 | 24 | AATACAAA | AGATAGTA | 1
RSVBN | RSVBN | 1 | 128 | 25 | GGCAAATA | CAATTATG | 1
WNVCPRM | WNVCPRM | 1 | 432 | 26 | GGCCAATA | TGATCCAG | 1
WNVE | WNVE | 1 | 94 | 27 | ATTTGGCT | TTTGTGTG | 1
WNVNS1 | WNVNS1 | 1 | 153 | 28 | GAAGGTTG | GGGTACAA | 1
HCV229EMG | HCV229EMG | 1 | 598 | 29 | TAGAACAG | TAACCTAC | 1
HCVOC43MG | HCVOC43MG | 1 | 358 | 30 | TGATTATT | TATATGAC | 1
SPNLYTA | SPNLYTA | 1 | 125 | 31 | TATCGAAC | CTCAGACC | 1
SPNPLY | SPNPLY | 1 | 99 | 32 | GGTTTGGC | ATCAAGAT | 1
SPYSPEB | SPYSPEB | 1 | 281 | 33 | AATCTTTT | TAGACATG | 1
SPYMEFAE | SPYMEFAE | 1 | 370 | 34 | GGCAGGGC | TTACGAAA | 1
SPYERMB | SPYERMB | 1 | 248 | 35 | AACTGATT | TAGAATCC | 1
SPYERMTR | SPYERMTR | 1 | 176 | 36 | CAACGGGT | GATATTGT | 1
MPP1 | MPP1 | 1 | 369 | 37 | AGGGGGTT | ACTATGTT | 1
NMCTRA | NMCTRA | 1 | 135 | 38 | TTGGATGC | TTTTGCTG | 1
NMCRGA | NMCRGA | 1 | 254 | 39 | GGTGCTGC | TGCCGGTC | 1
BPPTXP | BPPTXP | 1 | 305 | 40 | GAAGTAGC | CAAACCGC | 1
BPPTXS1 | BPPTXS1 | 1 | 222 | 41 | CGGCGCAT | AGGCCGAA | 1
CPMOMPVD4 | CPMOMPVD4 | 1 | 150 | 42 | ATGCTGAT | TCAGATCA | 1
CPMOMPVD2 | CPMOMPVD2 | 1 | 133 | 43 | AGCGTTCA | TAGGCGCT | 1
CPRPOB | CPRPOB | 1 | 346 | 44 | AAGGACTT | CTGCAGGC | 1
BARPOB | BARPOB | 1 | 199 | 45 | CGTCCTGG | GGCAGAAG | 1
BAPAGA | BAPAG | 1 | 354 | 46 | TAGCGGCG | TAATTCGT | 1
BACAPB | BACAPB | 1 | 246 | 47 | TTACACGT | ACCTATTA | 1
VMVHA | VMVHA | 1 | 510 | 48 | AACTATTA | TCACCAAC | 1
VMVCRMB | VMVCRMB | 1 | 291 | 49 | TCGGGAAC | CGTCTGTT | 1
ZEVL | ZEVL | 1 | 443 | 50 | TACTACCA | TCACACTG | 1
LVGPC | LVGPC | 1 | 351 | 51 | GCGCACCG | GTGGGCAA | 1
FTLP | FTLP | 1 | 431 | 52 | ATCGTAAT | TAAGTATG | 1
FTFOPA | FTFOPA | 1 | 111 | 53 | CAGATATA | GATACTAC | 1
YPCVE | YPCVE | 1 | 265 | 54 | ATAAAGGG | AGGCGGGG | 1
YPCAF1 | YPCAF1 | 1 | 525 | 55 | TATGAAAA | ATATAGAT | 1
ATTIM | ATTIM | 1 | 523 | 56 | ACATCGAC | GAGCTTGC | 1
ATNAC1 | ATNAC1 | 1 | 543 | 57 | TATATGTA | ATTGTACA | 1
Ad7HEXVAC | Ad7HEXVAC | 168 | 383 | 58 | GGTGCTTG | AAGCCCAT | 1

[0435] TABLE-US-00008 TABLE 8 RPMV1 layout, along with the
respective PCR primers used for amplification of the same Gene
Forward found in Primer SEQ ID (SEQ ID Organism Gene Name Amplicon
Probe NO: NO:) Size % GC Influenza A Hemaglutinin 1 675 699 1 439
23 47.8 Influenza A Hemaglutinin 3 770 794 2 440 24 33.3 Influenza
A Hemaglutinin 5 500 524 3 441 26 42.3 Influenza A Hemaglutinin 5
219 442 26 30.8 Influenza A Neuraminidase 1 1336 1360 4 443 22 22.7
Influenza A Neuraminidase 2 1434 1449 5 444 22 45.5 Influenza A
Matrix Gene 911 923 6 445 20 40 Influenza B Hemaglutinin 660 684 7
446 22 45.5 Influenza B Neuraminidase 881 896 8 447 25 32 Influenza
B Matrix Gene 338 362 9 448 24 45.8 Adenovirus 5 Hexon 819 843 13
449 Adenovirus 5 Hexon 168 192 13 450 24 45.8 Adenovirus 5 Fiber
1988 2012 14 451 22 45.5 Adenovirus 5 E1A 171 452 24 54.2
Adenovirus 5 E1A 431 616 15 453 23 56.5 Adenovirus 4 Hexon 764 1096
10 454 Adenovirus 4 Hexon 255 279 10 455 18 61.1 Adenovirus 4 Hexon
511 456 19 57.1 Adenovirus 4 Fiber 967 457 20 55 Adenovirus 4 Fiber
435 1258 11 458 Adenovirus 4 E1A 844 459 22 45.5 Adenovirus 4 E1A
878 460 23 56.5 Adenovirus 4 E1A 409 1326 12 461 Adenovirus 7 Hexon
774 798 16 462 Adenovirus 7 Hexon 570 594 16 463 24 61.6 Adenovirus
7 Fiber 688 712 17 464 23 41.3 Adenovirus 7 E1A 205 465 20 65
Adenovirus 7 E1A 428 615 18 466 Adenovirus 7 Hexvac Parainfluenza
Virus I HN 180 204 19 467 27 48.1 Parainfluenza Virus HN 189 213 20
468 21 47.6 III Parainfluenza Virus 5' noncoding 206 230 21 469 21
38.1 III region of fusion protein Human rhinovirus 5' noncoding 388
412 22 470 16 56.2 region RSV (A, B) L-polymerase 355 379 23 471 19
36.8 major RSV (A) nucleocapsid, N 82 106 24 472 26 38.5 major RSV
(B) nucleocapsid, N 104 128 25 473 26 38.5 West Nile virus C and
prM 408 432 26 474 25 48 West Nile virus E 70 94 27 475 21 52.4
West Nile virus NS1 129 153 28 476 21 52.4 Human coronavirus
membrane 574 598 29 477 20 40 (229E) glycoprotein Human coronavirus
membrane 334 358 30 20 55 (OC43) glycoprotein Streptococcus
Autolysin, lytA 101 125 31 478 21 47.6 pneumoniae Streptococcus
pneumolysin, ply 75 99 32 479 22 50 pneumoniae Mycoplasma
Cytadhesin P1 345 369 37 480 24 58.3 pneumoniae protein Neisseria
capsular transport 111 135 38 481 19 57.9 meningitidis protein
(ctrA) gene Neisseria regulatory protein, 230 254 39 482 25 60
meningitidis crgA Bordetella pertussis Pertussis toxin 281 305 40
483 22 72.7 promoter region Bordetella pertussis Pertussis toxin S1
198 222 41 484 18 61.1 subunit ptxS1 Chlamydia major outer 126 150
42 485 16 62.5 pneumoniae membrane protein (MOMP) VD4 Chlamydia DNA
directed 322 346 44 486 21 42.9 pneumoniae RNA polymerase (rpoB)
Chlamydia major outer 109 133 43 487 20 40 pneumoniae membrane
protein (MOMP) VD2 Streptococcus pyrogenic 257 281 33 488 20 50
pyogenes exotoxin B (speB) Streptococcus macrolide-efflux 346 370
34 489 21 33.3 pyogenes determinant (mefA, mefE) Streptococcus
erythromycin 224 248 35 490 23 39.1 pyogenes resistance methylase
(ermB) Streptococcus erm(TR) 152 176 36 491 20 40 pyogenes Bacillus
anthracis RNA polymerase 175 199 45 492 21 47.6 beta-subunit (rpoB)
Bacillus anthracis protective antigen 330 354 46 493 25 40 (pag)
Bacillus anthracis Poly(D-glutamic 222 246 47 494 19 52.6 acid)
capsule (capB) Variola Major Virus hemagglutinin 486 510 48 495 18
27.8 (HA) Variola Major Virus cytokine response 267 291 49 496 28
39.3 modifier B (crmB) Ebola Virus L Gene 419 443 50 497 22 27.3
Lassa Virus GPC Gene 335 351 51 498 20 55 Francisella 13-kDa
lipoprotein 407 431 52 499 27 55 tularensis Francisella FopA 87 111
53 500 27 37 tularensis Yersinia pestis cve2155 sequence 242 265 54
501 22 54.5 Yersinia pestis Caf1 501 525 55 502 21 52.4 Arabidopsis
thaliana TIM 499 523 56 503 20 45 Arabidopsis thaliana NAC1 519 543
57 504 20 45 Reverse Taqman Primer Probe (SEQ ID (SEQ ID
Organism Tm NO:) Size % GC Tm NO:) Size % GC Tm Influenza A 54.8
505 24 33.3 55.6 Influenza A 57.6 506 22 36.4 55.9 Influenza A 59.8
507 26 42.3 58.7 Influenza A 58.9 508 24 45.8 58.5 Influenza A 54.2
509 18 55.6 53.5 Influenza A 57.9 510 23 39.1 57.8 Influenza A 55
511 571 20 55 62 Influenza B 55.5 512 24 47.9 57.5 Influenza B 56.3
513 22 45.5 56 Influenza B 59.9 514 25 36 59.6 572 22 54.5 63.4
Adenovirus 5 515 Adenovirus 5 57.6 516 24 54.2 58.5 Adenovirus 5
55.9 517 20 60 61 Adenovirus 5 61.5 518 21 66.7 61.3 Adenovirus 5
59.7 519 21 57.1 57.3 Adenovirus 4 520 Adenovirus 4 63 521 18 66.7
62 Adenovirus 4 66 522 18 50 64 Adenovirus 4 57.8 523 21 47.6 54.9
Adenovirus 4 524 Adenovirus 4 57.1 525 19 59.9 59.6 Adenovirus 4
61.2 526 25 48 60.6 Adenovirus 4 Adenovirus 7 527 Adenovirus 7 60.5
528 24 50 60.3 Adenovirus 7 54.7 529 23 47.8 58.2 Adenovirus 7 62
530 23 60.9 61.3 Adenovirus 7 531 Adenovirus 7 Parainfluenza Virus
I 62.8 532 28 35.7 58.2 Parainfluenza Virus 48.5 533 21 42.9 49.5
III Parainfluenza Virus 46.3 534 20 40 48 III Human rhinovirus 44.9
535 16 56.2 43.8 RSV (A, B) 43.9 536 19 36.8 42.9 RSV (A) 57.3 537
21 47.6 54.3 RSV (B) 58 538 30 33.3 59.1 West Nile virus 62.6 539
25 56 64.9 West Nile virus 53.8 540 21 52.4 53.8 West Nile virus
53.2 541 21 47.6 54.3 Human coronavirus 53.7 542 20 50 51.9 (229E)
Human coronavirus 55 543 20 45 53.7 (OC43) Streptococcus 52.7 544
21 42.9 53.2 pneumoniae Streptococcus 56.3 545 23 43.5 58
pneumoniae Mycoplasma 63.8 546 25 52 60.8 pneumoniae Neisseria 53.2
547 20 45 54.4 meningitidis Neisseria 73.6 548 24 62.5 69.4
meningitidis Bordetella pertussis 69.6 549 20 55 59.1 Bordetella
pertussis 56.2 550 19 63.2 58.6 Chlamydia 50.3 551 25 32 55.5
pneumoniae Chlamydia 52 552 20 50 49.2 pneumoniae Chlamydia 50.6
553 21 47.6 48.7 pneumoniae Streptococcus 48.1 554 20 40 48.7
pyogenes Streptococcus 43 555 21 38.1 47.2 pyogenes Streptococcus
50.7 556 20 50 51.7 pyogenes Streptococcus 45.2 557 23 30.4 48.7
pyogenes Bacillus anthracis 52.5 558 23 34.8 53.2 Bacillus
anthracis 54.9 559 27 44.4 57.1 Bacillus anthracis 49 560 19 42.1
51.7 Variola Major Virus 48 561 23 34.8 48.3 Variola Major Virus
59.1 562 24 50 59.8 Ebola Virus 50.1 563 30 33.3 60.1 Lassa Virus
59.5 564 29 31 57.6 Francisella 59.5 565 24 31 57.6 tularensis
Francisella 59.9 566 23 43.5 56 tularensis Yersinia pestis 60.1 567
20 65 61.8 Yersinia pestis 57.4 568 23 39.1 53.6 Arabidopsis
thaliana 47.7 569 20 50 49.9 Arabidopsis thaliana 51.1 570 20 50
53.3
[0436] The instruction file references the same alias designators,
but further specifies the subsequences that are actually to be
tiled onto the array. In most cases, this represents the entire
sequence from the sequence file, but in others (e.g. FluBMATRIX)
only nucleotides 1-362 of the total sequence were used for tiling.
Table 9 provides the information presented in the instructions
file.
[0437] Table 9 (below): Instruction file (specific file) of the RPM
V1 design specification. The alias designator is a name assigned to
each discrete "tile" region (e.g. a geometric region of the
microarray containing all probe combinations required for
resequencing of a stretch of pathogen genome sequence). The
pathogen, sequence accession number, and tile size for each tile
region are shown. The instructions, which were processed in
Affymetrix submission format, appear in Table 7. The instruction
file references complete or partial sequence of the complete target
genes that are found in SEQ ID NOs: 1-58 appearing in the attached
Sequence Listing. TABLE-US-00009 TABLE 9 RPMV1 Chip Table ALIAS
NAME GENE_NAME ACCESSION_NO LENGTH ATNAC1 Arabidopsis thaliana NAC1
543 ATTIM Arabidopsis thaliana TIM 523 Ad4E1A Adenovirus 4 E1A
AF594253(draft) 1326 Ad4FIBER Adenovirus 4 Fiber AF594253(draft)
1258 Ad4HEXON-1 Adenovirus 4 Hexon AF594253(draft) 1096 Ad4HEXON-2
Adenovirus 4 Hexon AF594253(draft) 279 Ad5E1A Adenovirus 5 E1A
AY147066 616 Ad5FIBER Adenovirus 5 Fiber M18369 2012 Ad5HEXON-1
Adenovirus 5 Hexon AF542130 843 Ad5HEXON-2 Adenovirus 5 Hexon
AF542130 192 Ad7E1A Adenovirus 7 E1A AY594255(draft) 615 Ad7FIBER
Adenovirus 7 Fiber AY594255(draft) 712 Ad7HEXON-1 Adenovirus 7
Hexon AY594255(draft) 807 Ad7HEXON-2 Adenovirus 7 Hexon
AY594255(draft) 594 Ad7HEXVAC Adenovirus 7 Hexon AY594256(draft)
216 BACAPB Bacillus anthracis Poly(D-glutamic acid) capsule (capB)
M24150 246 BAPAG Bacillus anthracis protective antigen (pag) M22589
354 BARPOB Bacillus anthracis RNA polymerase beta-subunit (rpoB)
AF205323 199 BPPTXP Bordetella pertussis Pertussis toxin promoter
region M13223 305 BPPTXS1 Bordetella pertussis Pertussis toxin S1
subunit ptxS1 M13223.1 222 CPMOMPVD2 Chlamydia pneumoniae major
outer membrane protein (MOMP) CP0694 133 VD2 CPMOMPVD4 Chlamydia
pneumoniae major outer membrane protein (MOMP) M69230 150 VD4
CPRPOB Chlamydia pneumoniae DNA directed RNA polymerase (rpoB)
NT01CP0714 346 FluAHA1 Influenza A Hemaglutinin 1 AJ344014 699
FluAHA3 Influenza A Hemaglutinin 3 private 794 communication
FluAHA5 Influenza A Hemaglutinin 5 AF028709 524 FluAMATRIX
Influenza A Matrix Gene L25814 923 FluANA1 Influenza A
Neuraminidase 1 M24783 1360 FluANA2 Influenza A Neuraminidase 2
AJ457937 1449 FluBHA Influenza B Hemaglutinin AF100355 684
FluBMATRIX Influenza B Matrix Gene AF100378 362 FluBNA Influenza B
Neuraminidase AY139081 896 FTFOPA Francisella tularensis FopA
AF097542 111 FTLP Francisella tularensis 13-kDa lipoprotein M32059
431 HCV229EMG Human coronavirus (229E) membrane glycoprotein
AF304460 598 HCVOC43MG Human coronavirus (OC43) membrane
glycoprotein M93390 358 HRV5NT Human rhinovirus 5' noncoding region
NC_001617 412 LVGPC Lassa Virus GPC Gene M15076 351 MPP1 Mycoplasma
pneumoniae Cytadhesin P1 protein M18639 369 NMCRGA Neisseria
meningitidis regulatory protein, crgA AF190471 254 NMCTRA Neisseria
meningitidis capsular transport protein (ctrA) NMB0071 135 PIVIHN
Parainfluenza Virus I HN U70948 204 PIVIII5NCFP Parainfluenza Virus
III 5' noncoding region Z11575 213 PIVIIIHN Parainfluenza Virus III
HN M18764 230 RSVABL RSV L-polymerase AF254574 379 RSVAN RSV A
major nucleocapsid, M11486 106 RSVBN RSV B major nucleocapsid,
D00736 128 SPNLYTA Streptococcus pneumoniae Autolysin, lytA SP1937
125 SPNPLY Streptococcus pneumoniae pneumolysin, ply SP1923 99
SPYERMB Streptococcus pyogenes erythromycin resistance methylase
(ermB) X52632 248 SPYERMTR Streptococcus pyogenes erm(TR) AF002716
176 SPYMEFAE Streptococcus pyogenes macrolide-efflux determinant
(mefA, mefE) U70055 370 SPYSPEB Streptococcus pyogenes pyrogenic
exotoxin B (speB) NT01SP1804 281 VMVCRMB Variola Major Virus
cytokine response mo U88145 291 VMVHA Variola Major Virus
hemagglutinin (HA) X65516 510 WNVCPRM West Nile virus C and prM
AF196835 432 WNVE West Nile virus E AF196835 94 WNVNS1 West Nile
virus NS1 AF196835 153 YPCAF1 Yersinia pestis Caf1 X61996 525 YPCVE
Yersinia pestis cve2155 sequence AF350077 265 ZEVL Ebola Virus L
Gene AF086833 443
[0438] The chip design team at Affymetrix used the combination of
the information above and the corresponding sequence file
information to generate the layout of the chip. FIG. 1 shows an
overview of the chip layout as a function of where the gene
sequences for any given pathogen are clustered on the resequencing
microarray chip (RPM Version 1 chip) utilized in the following
examples. This figure and chip layout is only for illustration of
the "real estate" allocation to various pathogens for the RPMV1 and
is not intended to be limiting in any way. The skilled artisan
would readily appreciate that the relative order and amounts of
sequence dedicated to each of the pathogen clusters on this chip
can be altered without intrinsic deleterious effects on the utility
of the chip.
[0439] It is important to note that the tiling strategy dictates
that the first 12 and last 12 nucleotides of each discrete tile
region are not queried by the resequencing microarray, since they
are used only as components of the first and last 25-mer probes,
which are varied at position 13.
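By way of illustration only, the arithmetic implied by this tiling
strategy can be sketched as follows. The function name and its
parameters are hypothetical and are not part of the Affymetrix
design files; the sketch assumes only that each 25-mer probe
interrogates its central (13th) base.

```python
def queried_positions(tile_length, probe_length=25):
    """Return the 1-based (first, last) tile positions that can be queried.

    Each probe interrogates its central base, so the first and last
    (probe_length - 1) // 2 bases of a tile only ever act as flanks.
    """
    if tile_length < probe_length:
        raise ValueError("tile is shorter than a single probe")
    flank = (probe_length - 1) // 2      # 12 for a 25-mer probe
    return flank + 1, tile_length - flank


# Ad4FIBER tile length as listed in Table 7 (1258 nucleotides).
first, last = queried_positions(1258)
print(first, last, last - first + 1)
```

For the 1258-nucleotide Ad4FIBER tile this gives positions 13
through 1246, i.e. 1234 queried bases. If the "Start = 12 End =
1245" header of the GDAS output is read as zero-based (an
assumption), it describes the same interval.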
[0440] The sequences used for the adenovirus regions (Ad4, Ad5, Ad7
and Ad7 vaccine) of the chip were all derived from early drafts of
genomes sequenced by the present inventors. The GenBank submission
files that correspond to the genomes used for tiled regions are
listed in Table 6. Because the sequences submitted to Affymetrix
for the prototypes were based on early drafts of the genomes, there
were discrepancies observed between those early sequences and the
final sequences submitted to GenBank. A list of those discrepancies
is given in Table 10: TABLE-US-00010 TABLE 10 Discrepancies
observed between the RPMV1 tiled sequences and the final sequences
submitted to GenBank appearing in Table 6 Length of target Sequence
Base # Target Sequence Contig for Ad4 E1A 2004 554 missing A 658 C
T 697 G A 698 A G 851 C T 1460 C T 1675 C T 1777 A G 2002 missing T
Hexon 2813 18319 C G 18330 missing T 18331 missing G 18332 missing
G 18385 A G 18451 C T 18523 T C 18547 T C 18571 C T 18586 T C 18617
T C 18640 T C 18659 G T 18662 A G 18687 T C 18700 A C 18843 A G
18889 T A 18901 C T 18940 G T 18965 A C 18997 T C 19013 G A 19020 A
C 19113 A C 19237 A G 19325 T C 19327 A G 19330 C T 19447 A G 19542
C A 19714 T C 19732 A C 19759 C T 19762 A G 19765 A G 19795 C A
19796 T A 19798 C T 19816 T C 19819 C T 19881 A missing 19897 C T
19906 C T 19911 A G 19915 T C 19916 T C 19936 T C 19976 T C 20038 C
T 20050 C T 20128 C C 20149 A C 20158 A C 20176 T C 20206 C G 20210
G A 20239 missing C 20245 C missing 20246 T A 20285 T C 20297 T C
20336 T C 20363 T C 20366 A C 20429 T C 20435 T C 20447 G C 20459 G
A 20499 T C 20511 T C 20519 T C 20528 T C 20570 T C 20579 T C 20658
C G 20660 T C 20663 T C 20666 G A 20684 T C 20687 C T 20690 T A
20713 G A 20753 T C 20759 A G 20768 C T 20819 T C 20864 T C 20939 T
C 21008 C T 21038 G A Fiber 1386 31602 missing C 31611 missing C
31616 missing C 31652 missing A 31672 G missing 31714 missing C
31746 T missing 31790 missing C 31798 C missing 31799 C missing
31816 missing C 31923 T missing 31943 C missing 32003 G missing
32047 T missing 32051 missing T 32260 T missing 32262 G T 32266
missing G 32473 A T 32475 T A 32618 T C 32619 C T 32934 missing C
Contig for Ad5_canji E1A 60 none Hexon 60 19020 G missing 19023 A
missing 19024 C missing 19025 A missing Fiber 60 none Contig for
Ad7 E1A 60 none Hexon 60 none Fiber 60 none Contig for Ad7_Navy E1A
60 590 C T Hexon 60 18109 A G Fiber 60 none Contig for Ad7_Vaccine
E1A 60 559 C G 586 C T Hexon 60 18142 A G Fiber 60 none
[0441] This relatively small number of discordances did not
interfere with the ability of the chip to make base calls that
could be associated with the correct organism, except in one
specific case of adenovirus type 4, described in the Examples.
Overall, the resequencing microarray technique of the present
invention corroborated the accurate (refined) final sequence of the
tiled genes with respect to base substitutions, validating the
unanticipated robustness of our method.
Preparation Example 2
PCR Primer Design and Amplification Protocols
Degenerate PCR Primer Design
[0442] The objective of primer selection to support conserved
(degenerate) multiplex PCR is to design primers that target the
conserved regions flanking species-specific variable regions of
the E1A, fiber, and hexon genes. In general, this method may be
applied to any organism, as conserved sequences within a species
are ubiquitous in nature. These target genes were selected based on
their function and location within the linear adenoviral genome.
E1A is located at the 5' end of the adenovirus genome and encodes
a trans-acting transcriptional regulatory factor that is necessary
for transcriptional activation of early genes. The hexon and fiber
genes, which are located in the middle and at the 3' end of the
adenovirus genome, encode antigenic determinants ε and γ,
respectively, which determine the viral serotype. Thus, detection
and serotyping of ARD-causing adenoviruses can be effectuated by
targeting the nucleic acid determinants that give rise to the
serotype. Thereby, the primers provided specific amplification
within the adenoviruses, while the variable regions supplied
serotype-specific characters for proper species identification.
[0443] The primers used for conserved (degenerate) multiplex PCR in
the following examples are based on a global alignment of E1A,
fiber, and hexon gene sequences, respectively, available from
GenBank (GenBank accession numbers are given in parentheses):
E1A-AdB (NC_004001), AdC (NC_001405), Ad3 (AF492352),
Ad4 (M14918), Ad7 (X03000); fiber-Ad2 (AJ278921), Ad5 (M18369), Ad3
(X01998), Ad4 (X76547), Ad7 (M23696), Ad16 (U06106), Ad21 (U06107);
hexon-Ad3 (X76549), Ad4 (X84646), Ad6 (AF161560, X67710, Y17245),
Ad7 (AF053087, X76551), Ad16 (X74662), Ad21 (AB053166). The global
sequence alignment for primer design of the E1A gene used the E1A
gene sequences from Ad3, Ad4, Ad7, Ad21, AdB, and AdC serotypes.
The global sequence alignment for primer design of the fiber gene
used fiber gene sequences from Ad2, Ad3, Ad4, Ad5, Ad7, Ad16, and
Ad21 serotypes. The global sequence alignment for primer design of
the hexon gene used the hexon gene sequences from Ad3, Ad4, Ad6,
Ad7, Ad16, and Ad21 serotypes.
[0444] Subsequent to the global sequence alignment, primer pairs
were selected based on their ability to amplify E1A, fiber, and
hexon genes of serotype 3, 4, 6, 7, 16 and 21 (data not shown).
Table 11 shows the primer pairs employed in the following Examples.
TABLE-US-00011 TABLE 11
Primer    Gene   Sequence (5' → 3')              SEQ ID NO:  Amplicon size (bp)
AdE1A-F   E1A    CGC TGC ACG ATC TGT ATG AT      421
AdE1A-R   E1A    TCT CAT ATA GCA AAG CGC ACA     422         409-446
AdB1*     Fiber  TST ACC CYT ATG AAG ATG AAA GC  423
AdB2*     Fiber  GGA TAA GCT GTA GTR CTK GGC AT  424         670-772
AdFib-F3  Fiber  ACT GTA KCW GYT TTG GYT GT      425
AdFib-R3  Fiber  TTA TTS YTG GGC WAT GTA KGA     426         430-437
AdHex-F7  Hexon  CAC GAY GTG ACC ACM GAC CG      427
AdHex-R5  Hexon  TTK GGT CTG TTW GGC ATK GCY TG  428         770-815
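By way of illustration only, the degenerate positions in the
primers of Table 11 (S, Y, R, K, W, M) can be read as standard
IUPAC nucleotide ambiguity codes, so that each degenerate primer
stands for a small pool of concrete oligonucleotides. The following
sketch counts and enumerates that pool. The function names are
hypothetical and do not describe software used by the inventors.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def _clean(primer):
    """Drop the triplet spacing used in Table 11 and upper-case the rest."""
    return primer.replace(" ", "").upper()


def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for base in _clean(primer):
        count *= len(IUPAC[base])
    return count


def expand(primer):
    """List every concrete sequence represented by a degenerate primer."""
    choices = [IUPAC[base] for base in _clean(primer)]
    return ["".join(combo) for combo in product(*choices)]


adb1 = "TST ACC CYT ATG AAG ATG AAA GC"       # AdB1, SEQ ID NO: 423
print(degeneracy(adb1))
for sequence in expand(adb1):
    print(sequence)
```

Under this reading, AdB1 (one S and one Y) represents four
sequences, while AdHex-R5 (two K, one W, one Y) represents sixteen.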
Multiplex Degenerate Primer PCR Protocol
[0445] The primer pairs (Lin et al., 2004) were evaluated in
various multiplex combinations to obtain amplification of
adenovirus serotypes 3, 4, 6, 7, 16, and 21. PCR was performed in
50 µl volumes containing 20 mM Tris-HCl (pH 8.4), 50 mM KCl, 200
µM each of dNTPs, 200 nM AdB1, AdB2 primers; 300 nM AdE1A-F,
AdE1A-R primers; 400 nM AdFib-F3, AdFib-R3 primers; 500 nM
AdHex-F7, AdHex-R5 primers, 2 U of Platinum Taq DNA polymerase
(Invitrogen), and 10^6 copies of DNA templates. The
amplification reaction was performed using a Peltier Thermal
Cycler-PTC225 (MJ Research) with preliminary denaturation at
94 °C for 3 minutes, followed by 40 cycles of denaturation at
94 °C for 30 seconds, annealing at 50 °C for 30 seconds, and
extension at 72 °C for 40 seconds, and a final extension at
72 °C for 10 minutes. The generic multiplex PCR assay could
amplify all three genes of all six ARD-causing adenoviruses, even
though the hexon gene of Ad4 showed a very weak band on the
agarose gel.
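By way of illustration only, the quantities implied by this
protocol can be worked out as follows. The sketch computes the
amount of each primer in a 50 µl reaction from the stated final
concentrations, and the minimum programmed cycling time, ignoring
temperature ramping. All names are hypothetical.

```python
REACTION_VOLUME_UL = 50

# Final concentration (nM) of each primer in the multiplex reaction.
PRIMER_CONC_NM = {
    "AdB1": 200, "AdB2": 200,
    "AdE1A-F": 300, "AdE1A-R": 300,
    "AdFib-F3": 400, "AdFib-R3": 400,
    "AdHex-F7": 500, "AdHex-R5": 500,
}


def primer_pmol(conc_nm, volume_ul):
    """nM x µl gives fmol; divide by 1000 to obtain pmol."""
    return conc_nm * volume_ul / 1000.0


def cycling_seconds(initial=180, cycles=40, steps=(30, 30, 40), final=600):
    """Programmed hold time in seconds, excluding ramp times."""
    return initial + cycles * sum(steps) + final


for name, conc in PRIMER_CONC_NM.items():
    print(name, primer_pmol(conc, REACTION_VOLUME_UL), "pmol")
print(cycling_seconds() / 60.0, "minutes")
```

On these figures each reaction contains 10, 15, 20 and 25 pmol of
the respective primers, and the programmed holds total 4780
seconds, or roughly 80 minutes.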
Total Amplification Protocol
[0446] Total amplification was conducted using the commercially
available GenomiPhi™ DNA Amplification Kit (Amersham
Biosciences) to representatively amplify linear genomic DNA
(hereinafter referred to as "GenomiPhi total amplification"). The
amplification method employed in the GenomiPhi total amplification
method utilizes bacteriophage Phi29 DNA polymerase enzyme to
exponentially amplify single- or double-stranded linear DNA
templates by strand displacement amplification. The starting sample
was not quantified, but rather was used directly.
Preparation Example 3
REPI Software
[0447] Raw sequence data from the resequencing microarray chips is
provided by the Genetic Data Analysis Software version 2.0 (GDAS)
packaged with the microarray reader from Affymetrix. GDAS base
calling is based on a previously described base-calling algorithm
(Cutler et al., 2001). Each of the FASTA output files containing
the base calls obtained from the GDAS software was analyzed using
specialized software (REPI) that the present inventors
developed.
[0448] In the case of the present invention, the sequence output of
GDAS is most often a scattered mixture of contiguous sequence calls
(A, T, C or G) that are interspersed with varying amounts of
no-calls (n's) where the GDAS software does not make a base call
due to lack of amplification, weak hybridization signal on the chip
and/or high background hybridization caused by non-specific binding
(Cutler et al., 2001). An example of the GDAS output for the
Adenovirus 4 prototype sample for the Ad4FIBER tile region is shown
below (SEQ ID NO: 429): TABLE-US-00012
>Ad4FIBER:CustemSeq-Adenovirus 4 Start = 12 End = 1245
nnnnnnnnnnnnnnncnnncncngaccgngnnnttcannaacnctcccntcgnnctcttcag
atgnattnnaagaaaagcccctgggggtgttgtccnnannnnnnngccgaccctgtcncnnn
aagaatgnnnaaattannnnnaagctgngngagggggtnnnncttgacgactcgggaaaact
cnttgcaancacagtaaacaaggccattgcnnctctcagnttttnccaacaacaccatttnn
nnttaacatggatacccctttatacaccaaagntggaaaactanccttacaagtttctncca
ccattaagtatattaaaatcaacaattttgaatacnnnnnnnnnagcttttggntcaggttt
nggactcagtnnnngcgcccnngcagtananttagnctcnccacttacatttgntgataaag
ggaatanaaagattacccnaaanagnnnnttgcatgttanaacaggagntgcaattgaaagc
aacatcagttgggctaaaggtntaaaattngaagatggtgccatagctacaaacattggtaa
nnnnnnnnnnntnnnaaccagnngtncagaannannagnnannaangcttatccaatccaan
nnnnnnntgncnctggtctcagctttgacagcacaggagccataatgnctggcaataaagnc
tatgataaattaactttgtggacaacgcctgacccatcaccaaactgncaaatncttgcaga
aaatgntgcaaaactaacactttgnnnnnnnnanngnnacagncaaatactggccactgtan
cngntttggntgttagaagnggaaacttaaacccaattactggcacagtaagcagtgctcaa
gnttttcnncgntttgatgcaaatggtgnncntntnacagaacactctanncnnaaaaaatn
ntggggcnanaagcaangagatagnatagatggcactccatacaccaatgctgttggtttta
tgccaaattcaacagcttntnnaaagacncaaagttctnctnctaaaaataatntagtgggt
caagtatacatgantnnagntgtttnanannncatgnttcttnctataactcttaatggtnc
tgatgacaccaccngtgcatnctcaatgncattttcatacacctggactaacggaagctata
tcggagcaacatttggagctaactcatacaccttctcntacatngcccannannnn
[0449] Similarity search algorithms such as BLAST (Korf et al.,
2003) allow the use of n's (analogously to a wildcard) but the
inclusion of too many n's results in an unacceptable degree of
ambiguity. In that case, no meaningful BLAST results will be
returned.
[0450] The REPI software was designed to parse the FASTA output
file, selecting and editing the sequence data into a format
suitable for sequence similarity searches using the NCBI BLASTN
algorithm. To accomplish this objective, REPI passes the data
through a series of filters that modify the data as little as
possible while extracting usable, "BLASTable" data from the FASTA
files. Due to the nature of the resequencing microarray, the
sequences often contain large numbers of non-base calls (n's).
BLAST is unable to return significant similarity for sequences with
a large number of non-base calls. Therefore, the original sequences
must be filtered to extract those portions that are most likely to
return a significant similarity.
[0451] The first filter that a prospective sequence encounters is a
control check. The control sequence incorporated into the
microarray is specifically designed to be a nonsense sequence;
therefore, it will never return a significant similarity. Next, the
sequence is evaluated for "BLASTable" data using a sliding window
algorithm. The window-size parameter is entered by the user and
represents the number of base calls the algorithm will evaluate at
one time. Starting from the first base call, the window slides
along the sequence searching for the first area containing relevant
data. Each window is evaluated using a scoring method in which all
valid bases are given a score of one and all n's are given a score
of zero. If the score is greater than or equal to a predetermined
threshold (here 25% is used), the program marks the start of this
window as the beginning of usable data. Once the start of usable
data is determined, the program reverses its parameters and begins
searching for the end of the usable data; the score must now be
less than or equal to the threshold. For each sequence, REPI
searches for the largest continuous string(s) of usable data, which
will represent that sequence's initial subsequence.
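By way of illustration only, the sliding-window search described
above can be sketched as follows. This is not the REPI source code.
The function names, the one-base step, and the treatment of a
sequence that ends while still inside usable data are assumptions
made for the sketch.

```python
def window_score(window):
    """Fraction of positions in the window that are base calls (not n)."""
    return sum(ch in "acgtACGT" for ch in window) / len(window)


def find_usable_regions(seq, window_size, threshold=0.25):
    """Return (start, end) slices of seq judged to hold usable data.

    A region opens at the start of the first window scoring at or
    above the threshold, and closes at the last position of the first
    later window scoring at or below it.
    """
    regions = []
    start = None
    i = 0
    while i + window_size <= len(seq):
        score = window_score(seq[i:i + window_size])
        if start is None:
            if score >= threshold:
                start = i                      # beginning of usable data
        elif score <= threshold:
            regions.append((start, i + window_size))
            start = None
            i += window_size                   # resume after the end window
            continue
        i += 1
    if start is not None:                      # sequence ended inside a region
        regions.append((start, len(seq)))
    return regions


def longest_region(regions):
    """Pick the largest region, or None if nothing usable was found."""
    return max(regions, key=lambda r: r[1] - r[0]) if regions else None


example = "n" * 30 + "acgt" * 10 + "n" * 30
print(find_usable_regions(example, window_size=10))
```

In this toy input the single region returned still carries n's at
both ends, because its boundaries are window boundaries rather than
base-call boundaries.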
[0452] This subsequence is then trimmed of leading and trailing
n's. Trimming is necessary because the previous filter uses the
starting position of the starting window as the beginning of the
subsequence and the last position of the ending window as the end
of the subsequence; therefore, although each window's score was
acceptable, there may be n's leading or trailing the subsequence.
The next filter the subsequence goes through is a length
evaluation. Subsequences longer than 50 nucleotides are allowed to
continue, subsequences shorter than 20 nucleotides are discarded,
and subsequences between 20 and 50 nucleotides are re-evaluated as
follows. Because these subsequences are short, they are rescored
using the same scoring system described earlier. Subsequences with
greater than 60% non-base calls are discarded; all others are
allowed to continue to be searched with the BLAST algorithm against
GenBank, or one's own modified database.
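By way of illustration only, the trimming and length evaluation
described above can be sketched as follows. The function names are
hypothetical, and the handling of subsequences of exactly 20 or 50
nucleotides (treated here as falling in the re-evaluated band) is
an assumption.

```python
def trim_no_calls(subsequence):
    """Remove leading and trailing no-calls (n's)."""
    return subsequence.strip("nN")


def passes_length_filter(subsequence, min_len=20, full_len=50,
                         max_no_call_fraction=0.60):
    """Apply the length evaluation to an already trimmed subsequence."""
    length = len(subsequence)
    if length > full_len:
        return True                       # long enough to search as is
    if length < min_len:
        return False                      # too short to be informative
    # Intermediate length: rescore and reject if mostly no-calls.
    no_calls = sum(ch in "nN" for ch in subsequence)
    return no_calls / length <= max_no_call_fraction


candidate = trim_no_calls("nnnnnnn" + "acgt" * 10 + "nnnnnnnn")
print(len(candidate), passes_length_filter(candidate))
```

The rescoring step only matters for internal n's, since trimming
has already removed those at either end.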
[0453] Once the similarity search is complete, REPI computes a
number of statistics on the subsequence, including the subsequence
percentage of the target sequence, the subsequence length, the
number of subsequence base calls, and the percentage of subsequence
base calls. The subsequence percentage of the target sequence and
the subsequence length show what portion of the target pathogen
gene was identified. The subsequence length and percentage of
subsequence base calls allow us to monitor the filtering algorithm
as well as the GDAS threshold parameters. REPI saves all
statistical results returned from the BLASTN algorithm, allowing
the user to manipulate which results are displayed in the graphical
user interface.
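By way of illustration only, the four subsequence statistics named
above can be computed as in the following sketch. The function and
key names are hypothetical, and the choice of target length (the
denominator of the percentage of target) is left to the caller
because the text does not define it.

```python
def subsequence_stats(subsequence, target_length):
    """Compute REPI-style summary statistics for one subsequence."""
    length = len(subsequence)
    base_calls = sum(ch in "acgtACGT" for ch in subsequence)
    return {
        "length": length,
        "base_calls": base_calls,
        "pct_of_target": 100.0 * length / target_length,
        "pct_base_calls": 100.0 * base_calls / length if length else 0.0,
    }


print(subsequence_stats("acgtnnacgt", target_length=20))
```

As a check on the arithmetic, a subsequence of 1215 positions
containing 1020 base calls gives 1020/1215, or about 84% base
calls.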
[0454] In the examples provided, REPI was interfaced to a local
BLAST (NCBI GenBank) database (contained on an Apple G5 single
processor (1.8 GHz) computer with 4.5 GB of random access memory)
via a CGI (Perl) interface. Displayed results included all database
sequences within an expect value (E-value) threshold of 1.0e-9. The
E-value represents the number of alignments expected at random
given the size of the search space, the scoring matrix, and the gap
penalties; the lower the E-value, the less likely it is that the
database sequence similarity match was in fact a random
identification. By definition e=2.71828182845904523536028747135.
[0455] The REPI output comprises the (BLASTable) subsequence
names, lengths, E-values, and bit scores, which are displayed for
each subsequence in descending order of bit score. The name is
reported as the GenBank record's FASTA definition line and includes
the sequence length. The score is the normalized score computed
from the scoring matrix and gap penalties; the higher the score,
the greater the similarity.
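By way of illustration only, the relationship between a normalized
(bit) score and an E-value can be sketched with the standard
Karlin-Altschul form E = m x n x 2^(-S'), where S' is the bit score
and m and n are the query and database lengths. The sketch uses raw
lengths rather than the effective lengths that BLAST itself
computes, so its values are indicative only. The function names are
hypothetical.

```python
def expect_value(bit_score, query_length, database_length):
    """Approximate E-value for a bit score: E = m * n * 2 ** (-S')."""
    return query_length * database_length * 2.0 ** (-bit_score)


def is_displayed(e_value, threshold=1.0e-9):
    """Apply the display threshold; 1.0e-9 means 1.0 x 10**-9."""
    return e_value < threshold


print(expect_value(30.0, query_length=100, database_length=1_000_000))
print(is_displayed(2.32306e-154), is_displayed(1.0e-5))
```

Because the E-value falls by half for every additional bit of
score, a higher bit score against the same database always
corresponds to a lower E-value.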
[0456] The REPI output of the example listed above is shown below.
For each "BLASTable" subsequence, REPI returns (in descending order
of bit score ranking) all GenBank data records having expect values
of <1.0 e-9. The highest bit score is achieved for the
adenovirus 4 prototype (AY594253), which is genetically
indistinguishable from the Ad4 vaccine strain (AY594254) across
this stretch, while lower bit scores suitably distinguish field
strains from Air Force and Navy training sites (SEQ ID NO: 430).
TABLE-US-00013 >Ad4FIBER:CustemSeq-Adenovirus 4 Start = 12 End =
1245 Subsequence:
cnnncncngaccgngnnnttcannaacnctcccntcgnnctcttcagatgnattnnaagaaaag
cccctgggggtgttgtccnnannnnnnngccgaccctgtcncnnnaagaatgaanaaattannn
nnaagctgngngagggggtnnnncttgacgactcgggaaaactcnttgcaancacagtaaacaa
ggccattgcnnctctcagnttttnccaacaacaccatttnnnnttaacatggatacccctttat
acaccaaagntggaaaactanccttacaagtttctnccaccattaagtatattaaaatcaacaa
ttttgaatacnnnnnnnnnagcttttggntcaggtttnggactcagtnnnngcgcccnngcagt
ananttagnctcnccacttacatttgntgataaagggaatanaaagattacccnaaanagnnnn
ttgcatgttanaacaggagntgcaattgaaagcaacatcagttgggctaaaggtntaaaattng
aagatggtgccatagctacaaacattggtaannnnnnnnnnntnnnaaccagnngtncagaann
annagnnannaangcttatccaatccaannnnnnnntgncnctggtctcagctttgacagcaca
ggagccataatgnctggcaataaagnctatgataaattaactttgtggacaacgcctgacccat
caccaaactgncaaatncttgcagaaaatgntgcaaaactaacactttgnnnnnnnnanngnna
cagncaaatactggccactgtancngntttggntgttagaagnggaaacttaaacccaattact
ggcacagtaagcagtgctcaagnttttcnncgntttgatgcaaatggtgnncntntnacagaac
actctanncnnaaaaaatnntggggcnanaagcaangagatagnatagatggcactccatacac
caatgctgttggttttatgccaaattcaacagcttntnnaaagacncaaagttctnctnctaaa
aataatntagtgggtcaagtatacatgantnnagntgtttnanannncatgnttcttnctataa
ctcttaatggtnctgatgacaccaccngtgcatnctcaatgncattttcatacacctggactaa
cggaagctatatcggagcaacatttggagctaactcatacaccttctcntacatngcccanna
[0457] Subsequence Percentage of Target: 98%
[0458] Subsequence Length: 1215
[0459] Number of Subsequence Base Calls: 1020
[0460] Percentage of Subsequence Base Calls: 84%
[0461] lcl|AY594254|Human Adenovirus serotype 4, vaccine strain#|35,994 bp; Length=35994
[0462] evalue: 0.0, score: 751.806 for Ad4FIBER
[0463] lcl|AY594253|Human Adenovirus Serotype 4|35,990 bp; Length=35990
[0464] evalue: 0.0, score: 751.806 for Ad4FIBER
[0465] gi|303967|gb|L19194.1|ADRFIBERX Mastadenovirus h4 fiber protein, complete cds; Length=1346
[0466] evalue: 0.0, score: 743.877 for Ad4FIBER
[0467] gi|22796371|emb|AJ315930.1|HAD315930 Human adenovirus type 4 DNA; Length=12718
[0468] evalue: 0.0, score: 735.947 for Ad4FIBER
[0469] lcl|AY599837|Human Adenovirus serotype 4, USAF Field Strain|35,964 bp; Length=35964
[0470] evalue: 0.0, score: 704.23 for Ad4FIBER
[0471] lcl|AY599835|Human Adenovirus serotype 4, US Navy Field Strain|35,965 bp; Length=35965
[0472] evalue: 0.0, score: 696.3 for Ad4FIBER
[0473] gi|434913|emb|X76547.1|AV4FIB1 Adenovirus type 4 gene for fiber protein; Length=1375
[0474] evalue: 2.32306E-154, score: 553.571 for Ad4FIBER
[0475] gi|17105037|gb|AF394196.1|AF394196 Simian adenovirus 25, complete genome; Length=36521
[0476] evalue: 6.5072E-53, score: 216.57 for Ad4FIBER
[0477] gi|33694802|tpg|BK000413.1|TPA: Simian adenovirus 25, complete genome; Length=36519
[0478] evalue: 6.5072E-53, score: 216.57 for Ad4FIBER
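By way of illustration only, result lines of the form "evalue: ...,
score: ... for ALIAS" shown in the REPI output above can be parsed
and ranked as in the following sketch. The regular expression and
the function names are assumptions based solely on the text of that
output and do not describe the REPI implementation.

```python
import re

# Matches e.g. "evalue: 2.32306E-154, score: 553.571 for Ad4FIBER".
HIT_PATTERN = re.compile(r"evalue:\s*(\S+),\s*score:\s*(\S+)\s+for\s+(\S+)")


def parse_hit(line):
    """Return the E-value, bit score and tile alias from one result line."""
    match = HIT_PATTERN.search(line)
    if match is None:
        return None
    return {
        "evalue": float(match.group(1)),
        "score": float(match.group(2)),
        "alias": match.group(3),
    }


def rank_hits(lines, threshold=1.0e-9):
    """Keep hits below the E-value threshold, highest bit score first."""
    hits = [hit for hit in map(parse_hit, lines) if hit is not None]
    kept = [hit for hit in hits if hit["evalue"] < threshold]
    return sorted(kept, key=lambda hit: hit["score"], reverse=True)


lines = [
    "evalue: 6.5072E-53, score: 216.57 for Ad4FIBER",
    "evalue: 0.0, score: 751.806 for Ad4FIBER",
    "evalue: 2.32306E-154, score: 553.571 for Ad4FIBER",
]
for hit in rank_hits(lines):
    print(hit["score"], hit["evalue"], hit["alias"])
```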
[0479] The application Java Archive (.jar) files for the REPI
program are generated and processed in accordance with the
description provided in U.S. Application Ser. No. 60/609,918 filed
on Sep. 15, 2004, and U.S. Application Ser. No. 60/631,460, filed
on Nov. 29, 2004, which are incorporated herein by reference in
their entirety.
EXAMPLE 1
Base Calling Algorithm Settings and BLAST Analysis of Base Calls
for Adenovirus 4 Using Conserved (Degenerate) PCR Primers and
GenomiPhi Total Amplification
[0480] Raw sequence data from the resequencing microarray chips is
provided by the Genetic Data Analysis Software version 2.0 (GDAS)
packaged with the microarray reader from Affymetrix. GDAS base
calling is based on a previously described base-calling algorithm,
ABACUS (Cutler et al., 2001). A variety of base-calling algorithm
parameters can be defined by the user (GDAS operator's manual) to
obtain a trade-off between base calling percentage and accuracy.
[0481] A description of the parameters is found in the GDAS manual
on pages 207-217. The recommended (default) settings for GDAS are
"conservative" settings that focus on the highest level of
accuracy. In contrast, the objective of the present invention is to
increase the percentage of base calls. To achieve this objective,
the present inventors adjusted the parameters to allow highly
permissive base calls (increased percentage) as listed below:
[0482] "Permissive" Base Calling Algorithm Settings
[0483] Filter Conditions [0484] No Signal threshold=0.500
(default=1.000000) [0485] Weak Signal Fold threshold=20000.000
(default=20.000000) [0486] Large SNR threshold=20.000000
(default=20.000000)
[0487] Algorithm Parameters [0488] Strand Quality Threshold=0.000
(default=0.000000) [0489] Total Quality Threshold=25.0000
(default=75.000000) [0490] Maximum Fraction of Heterozygote
Calls=0.99000 (default=0.900000) [0491] Model Type (0=Heterozygote,
1=Homozygote)=0 [0492] Perfect Call Quality Threshold=0.500
(default=2.000000)
[0493] Final Reliability Rules [0494] Min Fraction of Calls in
Neighboring Probes=1.0000 (disables filter) [0495] Min Fraction of
Calls of Samples=1.0000 (disables filter)
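The difference between the "permissive" and default settings listed above can be made explicit by holding both in a small data structure. This is only a sketch: the dictionary keys are illustrative names chosen here, not GDAS identifiers, and only parameters for which the list above states a default are included.

```python
# Values transcribed from the list above; key names are hypothetical.
DEFAULT = {
    "no_signal_threshold": 1.0,
    "weak_signal_fold_threshold": 20.0,
    "large_snr_threshold": 20.0,
    "strand_quality_threshold": 0.0,
    "total_quality_threshold": 75.0,
    "max_fraction_heterozygote_calls": 0.9,
    "perfect_call_quality_threshold": 2.0,
}

PERMISSIVE = {
    "no_signal_threshold": 0.5,
    "weak_signal_fold_threshold": 20000.0,
    "large_snr_threshold": 20.0,
    "strand_quality_threshold": 0.0,
    "total_quality_threshold": 25.0,
    "max_fraction_heterozygote_calls": 0.99,
    "perfect_call_quality_threshold": 0.5,
}


def changed_settings(default, permissive):
    """Map each parameter that differs to its (default, permissive) pair."""
    return {
        key: (default[key], permissive[key])
        for key in default
        if default[key] != permissive[key]
    }


for name, (old, new) in changed_settings(DEFAULT, PERMISSIVE).items():
    print(f"{name}: {old} -> {new}")
```

Of the seven parameters with a stated default, five are changed; the Large SNR and Strand Quality thresholds are left at their default values.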
[0496] The settings above are significant in the present
application because the base call algorithm is set up by default to
sacrifice the number of base calls made in order to make the most
accurate calls (i.e., for SNP detection). In the present
application, the technique is less concerned with achieving the
degree of accuracy required for SNP detection than with expanding
the number of calls made, so that the longest possible stretches of
contiguous sequence are produced by GDAS while maintaining
specificity.
[0497] In a comparative approach to sample preparation for analysis
by the RPM V1 microarray, Adenovirus 4 prototype (AY594253) at a
starting concentration of 10.sup.6 genomic copies per microliter
was amplified using either degenerate adenovirus PCR primers (Lin
et al., 2004) or GenomiPhi isothermal methods. For the total
amplification experiments, DNA was isolated from cultured
adenoviruses and aliquoted at concentrations of 10.sup.6 copies per
microliter. DNA was amplified using a total amplification strategy
(GenomiPhi, Amersham), then processed in accordance with the
standard Affymetrix CustomSeq.TM. protocol (available from
manufacturer).
[0498] Images of the hybridized microarrays are shown in FIG. 2.
GenomiPhi total amplification allowed the entire tiled region
(increased sensitivity) to be resequenced compared to the more
limited region resulting from conserved (degenerate) multiplex PCR.
This result arises due to the fact that GenomiPhi total
amplification does not rely on specific primer sequences within the
tiled fragment. However, it is also important to note that
irrespective of which amplification strategy was employed, i.e.,
conserved (degenerate) multiplex PCR or GenomiPhi total
amplification, the proper Ad4 calls were made. The consistency
between these methods is important as the conserved primers may
have a more direct application to serotyping members of the same
family of pathogens from complex samples without the need for
selective enrichment.
[0499] Based on the corresponding GDAS outputs (using "permissive"
settings), REPI identified the following list of top-ranked returns
for the conserved PCR amplification strategy for the E1A, Fiber,
and Hexon-1 tile regions of the V1 RPM microarray (note: omitted
from this list are several adenovirus type 4 GenBank records having
nearly identical sequences as the Ad4 prototype and Ad4 vaccine
strain).
TABLE-US-00014 TABLE 12 Ad4E1A: CustemSeq-Adenovirus 4
Accession #  Name                                               E value:      Bit Score:
AY594254     Human Adenovirus serotype 4, vaccine strain        1.04808E-107  396.964
AY594253     Human Adenovirus Serotype 4                        1.04808E-107  396.964
AY599837     Human Adenovirus serotype 4, USAF Field Strain     8.34268E-53   214.587
AY599835     Human Adenovirus serotype 4, US Navy Field Strain  8.34268E-53   214.587
[0500] TABLE-US-00015 TABLE 13 Ad4FIBER: CustemSeq-Adenovirus 4
Accession #  Name                                               E value:      Bit Score:
AY594254     Human Adenovirus serotype 4, vaccine strain        0.0           751.806
AY594253     Human Adenovirus Serotype 4                        0.0           751.806
AY599837     Human Adenovirus serotype 4, USAF Field Strain     0.0           704.23
AY599835     Human Adenovirus serotype 4, US Navy Field Strain  0.0           696.3
[0501] TABLE-US-00016 TABLE 14 Ad4HEXON-1: CustemSeq-Adenovirus 4
Accession #  Name                                               E value:      Bit Score:
AY594254     Human Adenovirus serotype 4, vaccine strain        0.0           751.806
AY594253     Human Adenovirus Serotype 4                        0.0           751.806
AY599835     Human Adenovirus serotype 4, US Navy Field Strain  1.73046E-169  603.13
AY599837     Human Adenovirus serotype 4, USAF Field Strain     4.2185E-167   595.2
[0502] In each case shown above, the adenovirus type 4 prototype
(AY594253) and vaccine strain (AY594254) returned the lowest
expect values and the highest bit scores for hybridization of the
adenovirus type 4 prototype amplicons obtained by conserved PCR
amplification. This is expected because the prototype strain was
used for vaccine production and the sequences were determined to be
identical. In each case, the returns were distinguished in order of
similarity from the closely related Air Force (AY599837) and Navy
(AY599835) adenovirus 4 field strains.
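The ranking described above, in which identical sequences tie for the top position and the field strains follow in order of similarity, can be sketched as a sort on bit score with tied records grouped together. The accession numbers and bit scores below are the Ad4FIBER values from Table 13; the function itself is an illustration and is not taken from REPI.

```python
from itertools import groupby

# (accession, bit score) pairs for Ad4FIBER, conserved PCR (Table 13).
HITS = [
    ("AY599835", 696.3),
    ("AY594254", 751.806),
    ("AY599837", 704.23),
    ("AY594253", 751.806),
]


def rank_hits(hits):
    """Order hits by descending bit score; records with equal scores share a rank."""
    ordered = sorted(hits, key=lambda hit: -hit[1])
    return [
        (score, sorted(accession for accession, _ in group))
        for score, group in groupby(ordered, key=lambda hit: hit[1])
    ]


for rank, (score, accessions) in enumerate(rank_hits(HITS), start=1):
    print(rank, score, ", ".join(accessions))
```

Grouping ties matters here because the prototype and vaccine strain sequences are identical, so neither can be preferred on score alone.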
[0503] Listed below are the corresponding outputs following
GenomiPhi amplification of the adenovirus type 4 prototype instead
of conserved PCR (note: not shown are several closely-related
adenoviruses with bit scores higher than the Air Force and Navy
field strains):
TABLE-US-00017 TABLE 15 Ad4E1A: Adenovirus 4 GenomiPhi3
Accession #                     Name                                         E value:  Bit Score:
AY594253                        Human Adenovirus Serotype 4                  0.0       868.765
AY594254                        Human Adenovirus serotype 4, vaccine strain  0.0       868.765
gi|209874|gb|M14918.1|ADRDE1AA  Adenovirus type 4 E1A region                 0.0       860.836
[0504] TABLE-US-00018 TABLE 16 Ad4FIBER: Adenovirus 4 GenomiPhi3
Accession #                     Name                                            E value:  Bit Score:
gi|434913|emb|X76547.1|AV4FIB1  Adenovirus type 4 gene for fiber protein        0.0       1031.32
AY594254                        Human Adenovirus serotype 4, vaccine strain     0.0       926.254
AY594253                        Human Adenovirus Serotype 4                     0.0       926.254
AY599837                        Human Adenovirus serotype 4, USAF Field Strain  0.0       743.877
[0505] TABLE-US-00019 TABLE 17 Ad4HEXON-1: Adenovirus 4 GenomiPhi3
Accession #                         Name                                                   E value:      Bit Score:
AY594254                            Human Adenovirus serotype 4, vaccine strain            0.0           1065.02
AY594253                            Human Adenovirus Serotype 4                            0.0           1065.02
gi|11693508|gb|AF065062.2|AF065062  Human adenovirus type 4 strain RI-67 pVI core protein  0.0           1065.02
AY599835                            Human Adenovirus serotype 4, US Navy Field Strain      2.96209E-147  529.782
[0506] Based on the results evidenced by FIG. 2, GenomiPhi total
amplification allowed the entire tiled region (increased
sensitivity) to be resequenced compared to the more limited region
resulting from conserved (degenerate) multiplex PCR. This result
arises due to the fact that GenomiPhi total amplification does not
rely on primer sequences within the tiled fragment. REPI listed the
appropriate adenovirus type (AY594254 or AY594253) as the
highest scoring return for each tile region with the exception of
Ad4FIBER. This discrepancy was later resolved by observing that the
early draft sequence of the adenovirus type 4 prototype (AY594253)
contained errors that gave rise to a slightly higher homology
between the sequence used to define Ad4FIBER and a GenBank record
for a different adenovirus type 4 strain
(gi|11693508|gb|AF065062.2|AF065062).
[0507] With this one exception, it is also important to note that
irrespective of which amplification strategy was employed, i.e.,
conserved (degenerate) multiplex PCR or GenomiPhi total
amplification, the proper Ad4 calls were made (except in case of
Ad4FIBER with GenomiPhi due to tile sequence errors). The
consistency between these methods is important as the conserved
primers may have a more direct application to serotyping members of
the same family of pathogens from complex samples without the need
for selective enrichment.
EXAMPLE 2
Ad4 Dilution Series
[0508] In the present example, hybridization and base calling were
assayed as a function of the initial concentration of the target
subsequent to conserved (degenerate) multiplex PCR. In addition,
this example also compares two different base-calling strategies
within the GDAS software: (a) "permissive" (described in
Specification) and (b) "conservative" (default) settings. The
biological sample utilized in this example was the adenovirus 4
prototype.
[0509] Hybridization and base calling were assayed as a function of
the initial concentration of the target subsequent to conserved
(degenerate) multiplex PCR. A dilution series of the adenovirus 4
prototype (ATCC) was prepared having 10.sup.5, 10.sup.3, and
10.sup.1 genomic copies per microliter. To this end, the Affymetrix
CustomSeq protocol was followed, except that samples aliquoted from
the aforementioned dilutions were amplified using the conserved
(degenerate) multiplex adenovirus primer strategy for E1A, fiber,
and hexon genes.
[0510] Tables 18-20 demonstrate the ability of the RPMV1 chip with
REPI analysis to detect the appropriate Adenovirus type 4 target
(not specific to strain) depending on whether "Conservative"
(default GDAS) or "Permissive" (from Example 1) base call settings
were used. A positive detection was counted when REPI returned one
or more GenBank (or local) database record(s) for an Adenovirus
type 4 target as the highest BLAST bit score value(s). In each
case, "length" corresponds to the number of base pairs in the
subsequence selected by REPI as having satisfied the conditions of
the sliding window algorithm.
TABLE-US-00020 TABLE 18 Ad4 Fiber
          Conservative                                     Permissive
Dilutions Expt. Score    Eval   name               length  Expt. Score    Eval   name               length
10.sup.5  1     835.065  E-0.0  Ad4 Fiber Protein  1227    1     1623.97  0.0    Ad4 Fiber Protein  1227
          1-2   1015.46  E-0.0  Ad4 Fiber Protein  990     1-2   1447.89  0.0    Ad4 Fiber Protein  997
          1-3   297.846  E-78   Ad4 Fiber Protein  451     1-3   703.626  0.0    Ad4 Fiber Protein  512
          1-4   839.03   0.0    Ad4 Fiber Protein  1059    1-4   1605.08  0.0    Ad4 Fiber Protein  1215
10.sup.3  2     488.153  E-134  Ad4 Fiber Protein  1208    2     1559.2   0.0    Ad4 Fiber Protein  1220
          2-2*  n/d      --     --                         2-2   n/d      --     --
          2-3   232.429  E-58   Ad4 Fiber Protein  421     2-3   405.389  E-110  Ad4 Fiber Protein  444
          2-4   n/d      --     --                         2-4   n/d      --     --
10.sup.1  3     172.958  E-40   Ad4 Fiber Protein  294     3     301.101  E-79   Ad4 Fiber Protein  442
          3-2*  n/d      --     --                         3-2   n/d      --     --
          3-3   n/d      --     --                         3-3   118.758  E-24   Ad4 Fiber Protein  279
          3-4   99.61    E-19   Ad4 Fiber Protein  93      3-4   116.39   E-24   Ad4 Fiber Protein  95
*indicates degraded template; n/d indicates "not determined" due to
insufficient base calls
[0511] TABLE-US-00021 TABLE 19 Ad4 E1A
          Conservative                                  Permissive
Dilutions Expt. Score    Eval   name            length  Expt. Score    Eval   name            length
10.sup.5  1     448.5    E-123  Ad4 E1A region  553     1     775.595  E-0.0  Ad4 E1A region  556
          1-2   420.8    E-115  Ad4 E1A region  422     1-2   691.471  E-0.0  Ad4 E1A region  444
          1-3   n/d      --     --                      1-3   561.5    E-157  Ad4 E1A region  413
          1-4   353.4    E-94   Ad4 E1A region  424     1-4   589.253  E-165  Ad4 E1A region  424
10.sup.3  2     341.5    E-91   Ad4 E1A region  399     2     607.1    E-171  Ad4 E1A region  412
          2-2*  n/d      --     --                      2-2   n/d      --     --
          2-3   194.8    E-47   Ad4 E1A region  404     2-3   470.774  E-130  Ad4 E1A region  421
          2-4   n/d      --     --                      2-4   n/d      --     --
10.sup.1  3     n/d      --     --                      3     385.836  E-104  Ad4 E1A region  405
          3-2*  n/d      --     --                      3-2   n/d      --     --
          3-3   n/d      --     --                      3-3   199.58   E-48   Ad4 E1A region  403
          3-4   412.8    E-112  Ad4 E1A region  397     3-4   640.794  E-0.0  Ad4 E1A region  398
[0512] TABLE-US-00022 TABLE 20 Ad4 Hexon
          Conservative                                  Permissive
Dilutions Expt. Score    Eval   name            length  Expt. Score    Eval   name            length
10.sup.5  1     377.1    E-101  Ad4 Hexon gene  760     1     893.45   0.0    Ad4 Hexon gene  769
          1-2   603.13   E-0.0  Ad4 Hexon gene  760     1-2   941.874  E-0.0  Ad4 Hexon gene  768
          1-3   107.5    E-21   Ad4 Hexon gene  262     1-3   246.762  E-62   Ad4 Hexon gene  497
          1-4   498.1    E-138  Ad4 Hexon gene  760     1-4   920.916  E-0.0  Ad4 Hexon gene  762
10.sup.3  2     383.1    E-103  Ad4 Hexon gene  759     2     826.865  E-0.0  Ad4 Hexon gene  759
          2-2*  n/d      --     --                      2-2   n/d      --     --
          2-3   n/d      --     --                      2-3   n/d      --     --
          2-4   n/d      --     --                      2-4   n/d      --     --
10.sup.1  3     n/d      --     --                      3     78.8677  E-12   Ad4 Hexon gene  143
          3-2*  71.86    E-11   Ad4 Hexon gene  65      3-2   107.54   E-21   Ad4 Hexon gene  72
          3-3   n/d      --     --                      3-3   n/d      --     --
          3-4   n/d      --     --                      3-4   149.17   E-33   Ad4 Hexon gene  209
*indicates degraded template; n/d indicates "not determined" due to
insufficient base calls
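The positive-detection rule used for Tables 18-20 (the target must appear among the records sharing the highest BLAST bit score) can be written as a short predicate. This is a minimal sketch under stated assumptions: the hit list is a plain list of (name, bit score) pairs, and matching the target by substring of the record name is a simplification chosen here, not the REPI implementation.

```python
def is_positive_detection(hits, target="Ad4"):
    """Return True if any record tied for the highest bit score names the target.

    hits: list of (record name, bit score) pairs; an empty list corresponds
    to an "n/d" entry (insufficient base calls) and is never positive.
    """
    if not hits:
        return False
    top_score = max(score for _, score in hits)
    return any(target in name for name, score in hits if score == top_score)


# Illustrative hit lists; names and scores follow the style of the tables above.
print(is_positive_detection([("Ad4 Fiber Protein", 835.065),
                             ("Simian adenovirus 25", 216.57)]))
print(is_positive_detection([]))
```

An "n/d" table entry maps naturally onto an empty hit list, so the same predicate covers both the detected and the not-determined cases.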
[0513] Across the range of dilutions, the "Permissive" base call
settings produced GDAS outputs that were used by REPI to
consistently yield higher subsequence lengths and bit scores than
those outputs using the default GDAS base call settings intended for SNP
detection. In several cases, the "Permissive" base call settings
resulted in a sufficient number of base calls for REPI to detect
the target whereas the default settings did not. This example
demonstrates that by lowering the restrictions on base calling by
GDAS and coupling the output to the REPI algorithm, higher
sensitivity and discriminatory power (strain identification) among
pathogens is achieved.
[0514] Of particular note here is that in some cases, there was
evidence contained in the image, GDAS output, and REPI analyses, of
unintentional contamination of the degenerate primer cocktail with
trace amounts of Adenovirus 7 and Adenovirus 5 (confirmed by
specific real-time PCR). Because there was no significant
cross-hybridization between the tile regions for the prototype
adenoviruses, this contamination did not result in a perturbation
of the base calling or results in the Adenovirus 4 tile regions.
This demonstrates the robustness of the methods described herein to
perform quantitative separation of mixtures of closely related
pathogens.
[0515] It is noteworthy that the Andersen group at Lawrence
Livermore National Laboratory (Wilson et al., 2002b) described the
use of an Affymetrix resequencing chip that could detect a low
concentration (.about.10.sup.1 copies) following specific PCR
amplification. However, it was not disclosed or suggested how this
sensitivity was defined or to what extent those results were
influenced by hybridization of closely related microbial species.
Rather, the emphasis in that paper was the percentage of probe
pairs that could be used, not the actual sequences as they were
called by the chip. Therefore, the artisan would not have any
expectation resulting from the disclosure. Nor would the skilled
artisan have inferred this possibility from the previous patent
(U.S. Pat. No. 6,228,575) or related publication (Gingeras et al.,
1998) since these references fail to disclose or suggest robustness
of the prescribed methods of pattern recognition to concentration
differences, errors in the definition of tile region sequences, or
other forms of interference.
EXAMPLE 3
Base Calling Algorithm Settings and BLAST Analysis of Base Calls
for Adenovirus Field Strain (Ad5FS: AY601635) and Adenovirus 7
Prototype (Ad7; AY594255) Using Conserved (Degenerate) PCR Primers
and GenomiPhi Total Amplification
[0516] This example is the same as Example 1, except the
resequencing microarray chip (RPMV1) described above in Preparation
Example 1 was probed with adenovirus type 5 field strain (Ad5FS;
AY601635) or adenovirus type 7 prototype (AY594255) using either
(a) conserved (degenerate) multiplex PCR or (b) GenomiPhi total
amplification (FIG. 3). Based on the GDAS outputs (using
"permissive" settings), REPI identified the 3 top "hits" for the
conserved PCR and total amplification strategies with each of the
Ad5 field strain and the Ad7 prototype. Without exception, each
subsequence called by the adenovirus type 5 tile regions (Ad5E1A,
Ad5FIBER, Ad5HEXON-1 and Ad5HEXON-2) listed the top "hit"
(highest BLAST bit score) as the correct strain for
adenovirus type 5 (Ad5FS; Accession number AY601635) using either
conserved PCR amplification or total (GenomiPhi) amplification.
[0517] Also, without exception, each subsequence for the adenovirus
type 7 tile regions (Ad7E1A, Ad7FIBER, Ad7HEXON-1 and Ad7HEXON-2)
listed the top "hit" (highest BLAST bit score) as the correct
prototype strain for adenovirus type 7 (Ad7; Accession number
AY594255) when using either conserved PCR or total (GenomiPhi)
amplification. In addition, GenomiPhi amplification led to base
calls from both the Ad4E1A regions and the Ad7vaccine regions, both
resulting in top bit scores assigned to the correct Ad7 prototype
strain.
[0518] The results obtained for the adenovirus type 5 and
adenovirus type 7 prototypes confirmed those for adenovirus type 4.
Specifically, the results shown in FIGS. 3A-D show that GenomiPhi
total amplification allowed the entire tiled region (increased
sensitivity) to be resequenced compared to the more limited region
resulting from conserved (degenerate) multiplex PCR. More
importantly, these results confirm that irrespective of which
amplification strategy was employed, i.e., conserved (degenerate)
multiplex PCR or GenomiPhi total amplification, the proper Ad5 or
Ad7 calls were made.
EXAMPLE 4
Ad4-5 Breakthrough Strain as a Model of Mixed Infection Detectable
by Resequencing Microarray Assay
[0519] The sample for this example was provided by the Naval Health
Research Center (San Diego). The archived sample was a nasal wash
from a basic trainee who had previously been immunized for
adenovirus (using the Ad4/Ad7 vaccine) and who subsequently was
diagnosed with adenovirus infection and febrile respiratory
illness. Strains associated with infection in spite of previous
vaccination are broadly referred to as "breakthrough strains". In
this case, the presumption was that a single variant form of
adenovirus was the causative agent. The sample described here is
referred to as Ad4-5 Breakthrough because of inconclusive culture
and serological assays that indicated properties of adenovirus
type 4, while sequencing of the hexon gene (one of the antigenic
determinants of adenovirus serotype) indicated properties of
adenovirus type 5.
[0520] The sample was processed according to the multiplex
degenerate primer PCR protocol described in Preparation Example 1.
This approach was demonstrated (Lin et al 2004) to successfully
amplify hypervariable regions on hexon and fiber genes (as well as
a relatively variable E1A region) for all adenoviruses associated
with febrile respiratory infection. Otherwise, the standard
Affymetrix CustomSeq protocol was used.
[0521] The resultant hybridization pattern is shown in FIG. 4.
Based on the GDAS outputs (using "permissive" settings), REPI
identified the following top "hits" after amplification by
multiplex PCR (See Tables 21-24).
TABLE-US-00023 TABLE 21 Ad5FIBER: Ad 4-5 Breakthrough Strain
Accession #                 Name                                                 E value:     Bit Score:
lcl|AY601635                Human Adenovirus Serotype 5, Field Strain            5.01036E-76  291.899
gi|33694637|tpg|BK000408.1  TPA: Human adenovirus type 5                         5.01036E-76  291.899
gi|32127287|gb|AY224398.1   Human adenovirus type 5 strain KNIH 99/5 fiber gene  5.01036E-76  291.899
[0522] TABLE-US-00024 TABLE 22 Ad5HEXON-1: Ad 4-5 Breakthrough Strain
Accession #                 Name                                                 E value:      Bit Score:
lcl|AY601635                Human Adenovirus Serotype 5, Field Strain            0.0           644.759
gi|33694637|tpg|BK000408.1  TPA: Human adenovirus type 5                         1.81596E-172  613.041
gi|32127287|gb|AY224398.1   Human adenovirus type 5 strain KNIH 99/5 fiber gene  2.63084E-165  589.253
[0523] TABLE-US-00025 TABLE 23 Ad7E1A: Ad 4-5 Breakthrough Strain
Accession #                          Name                                                     E value:     Bit Score:
gi|4127293|emb|AJ005536.1|ACA005536  Adenovirus clinical isolate, clone B44 from genomic DNA  3.18875E-83  315.688
lcl|AY601633                         Human Adenovirus Serotype 21                             4.97873E-82  311.723
gi|21311720|gb|AF492353.1            Human adenovirus type 21 E1A 13S protein gene            1.21371E-79  303.794
lcl|AY601634                         Human Adenovirus Serotype 7, US Navy Field Strain        2.95877E-77  295.864
[0524] TABLE-US-00026 TABLE 24 Ad7HEXON-1: Ad 4-5 Breakthrough Strain
Accession #                Name                                               E value:     Bit Score:
lcl|AY601633               Human Adenovirus Serotype 21                       6.98074E-24  117.452
gi|13919592|gb|AY008279.1  Human adenovirus type 21 hexon protein gene        6.98074E-24  117.452
gi|21311720|gb|AF492353.1  Human adenovirus type 50                           6.98074E-24  117.452
lcl|AY594255               Human Adenovirus Serotype 7                        6.0101E-12   77.8048
lcl|AY601634               Human Adenovirus Serotype 7, US Navy Field Strain  6.0101E-12   77.8048
[0525] Selected REPI return values for Adenovirus 4,5 Breakthrough
Strain on RPMV1. This "strain" was determined by several
independent means to be a mixture of adenovirus type 5 and a B
subgroup member, most likely being adenovirus type 21. This was
borne out in the combined hybridization results and REPI analysis,
with adenovirus type 5 being confirmed on all adenovirus 5 tiles,
while the B subgroup prototype (adenovirus 7; AY594255) tiles
returned the highest bit scores for adenovirus type 21, a
different member of the B subgroup.
[0526] Conventional DNA sequencing revealed that the complete
adenovirus 5 genome was recovered and that other sequences were
present that could not be assembled because an Adenovirus 5 model
assumption was required. However, a significant amount of
corroborating evidence was amassed to support the conclusion of an
Ad5/Ad 21 co-infection, including similar results obtained in
parallel experiments with 70-mer oligonucleotide arrays (Lin et al
2004) showing evidence of an Ad 5/Ad 21 mixture or co-infection,
and sequencing of amplicons using conventional approaches that
supported the presence of both Ad5 and Ad 21. An extended study
conducted subsequently of adenovirus breakthrough strains (in
collaboration with NHRC San Diego) showed that a variety of such
strains were actually comprised of mixtures of lesser adenoviruses
(manuscript in preparation).
[0527] There are several significant aspects and advantages
implicit in this discovery: [0528] 1. demonstration that a
prototype region (Ad 7 for B subgroup) could be used to
successfully identify a different B subgroup member (Ad 21) without
specific a priori design or anticipation of discovering Ad 21 in a
sample. [0529] 2. demonstration of a microarray to distinguish
mixed pathogens due to spatial resolution. This mixture confounded
a conventional DNA sequencing approach that relied on assembly of
contiguous fragments based on assumption of a single organism.
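The spatial-resolution argument above can be illustrated by tallying, tile by tile, which serotype the top-ranked record names. The sketch below uses the top-ranked names from Tables 21-24; parsing the serotype out of the record name with a regular expression is an assumption made for illustration and is not how REPI reports results.

```python
import re
from collections import defaultdict

# Top-ranked record name per tile region (Tables 21-24).
TOP_HITS = {
    "Ad5FIBER": "Human Adenovirus Serotype 5, Field Strain",
    "Ad5HEXON-1": "Human Adenovirus Serotype 5, Field Strain",
    "Ad7E1A": "Adenovirus clinical isolate, clone B44 from genomic DNA",
    "Ad7HEXON-1": "Human Adenovirus Serotype 21",
}


def serotypes_by_tile(top_hits):
    """Group tile regions by the serotype number found in their top-hit name.

    Tiles whose top hit does not name a serotype are left out.
    """
    found = defaultdict(list)
    for tile, name in top_hits.items():
        match = re.search(r"(?:serotype|type)\s+(\d+)", name, flags=re.IGNORECASE)
        if match:
            found[int(match.group(1))].append(tile)
    return dict(found)


groups = serotypes_by_tile(TOP_HITS)
print(groups)
print("possible mixture" if len(groups) > 1 else "single serotype")
```

Because each tile region is read independently, two serotypes can be reported side by side, whereas an assembly that assumes a single organism has no place to put the second one.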
EXAMPLE 5
Influenza A Strain Identification Using Prototype Regions
[0530] Influenza A positive clinical samples of unknown serotype
were collected during the 2002-2003 and 2003-2004 flu seasons and
provided to Dr. Zheng Wang as frozen nasal washes by Dr. Elizabeth
Walter.
[0531] Samples were processed for pathogen chip analysis as
follows:
[0532] EPICENTRE MasterPure.TM. DNA Purification Kits (Madison,
Wis.) were used to extract total nucleic acids from 50 .mu.l of nasal
wash (sample NW20031114-05-02) without RNase treatment. Sample
NW20031114-05-02 was collected on Nov. 14, 2003 at Lackland Air
Force Base (San Antonio, Tex.). The patient was vaccinated on Nov.
10, 2003. The total nucleic acids were suspended in 20 .mu.l
nuclease free H.sub.2O. Two-step RT-PCR was employed to amplify
each of the viral gene segments.
[0533] Briefly, the RNA in 4 .mu.l total nucleic acids was
transcribed into cDNA by using SuperScript.TM. III Reverse
Transcriptase (Invitrogen, Carlsbad, Calif.) and 100 pmol Uni3
primer in 20 .mu.l, according to the protocol provided by the
manufacturer. The RT (reverse transcription) reaction was performed
at 42.degree. C. for 1 hr and then was inactivated at 70.degree. C.
for 15 min. 2 .mu.l of the RT-reaction was used for cDNA
amplification. The cDNA was amplified by using TaqPlus Long System
(Stratagene, La Jolla, Calif.) according to the protocol provided.
Two different PCR conditions were used for amplification. For
universal PCR, universal primers uni3 and uni5 were used to amplify
all eight gene segments (see Hoffman et al, 2001). For multiplex
PCR, the hemagglutinin (HA), neuraminidase (NA) and matrix (M)
segments were amplified by mixing three segment-specific primer
pairs (Bm-HA-1/Bm-NS-890R, Ba-Na-1/Ba-Na-1413R and
Bm-M-1/Bm-M-1027R). The final concentration of Mg.sup.2+ ions was
2 mM and the final primer concentration was 1 .mu.M. The PCR
condition for universal amplification was: 94.degree. C. for 2 min
followed by 29 cycles of 94.degree. C. for 1 min, 40.degree. C. for
2 min, 72.degree. C. for 3 min, and final extension at 72.degree.
C. for 10 min. The multiplex PCR condition was basically the same
except that the annealing temperature was raised to 58.degree. C.
PCR products were purified by Qiagen PCR Purification kit. 500 ng
of universally amplified PCR products and 1000 ng of multiplex PCR
products were applied, respectively, to two V1 Pathogen chips for
hybridization according to the Affymetrix protocol.
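The two cycling programs described above differ only in annealing temperature, which is easy to see when they are written as data. The following Python sketch is illustrative: the class and function names are hypothetical, and the total counts programmed hold times only, ignoring instrument ramp time.

```python
from dataclasses import dataclass


@dataclass
class Step:
    temp_c: float   # hold temperature in degrees Celsius
    minutes: float  # hold time in minutes


def cycling_program(anneal_c):
    """Build (initial step, per-cycle steps, final step) for a given annealing temperature."""
    initial = Step(94, 2)
    cycle = [Step(94, 1), Step(anneal_c, 2), Step(72, 3)]
    final = Step(72, 10)
    return initial, cycle, final


def total_hold_minutes(anneal_c, cycles=29):
    """Sum of programmed hold times, excluding ramping between temperatures."""
    initial, cycle, final = cycling_program(anneal_c)
    return initial.minutes + cycles * sum(step.minutes for step in cycle) + final.minutes


UNIVERSAL_ANNEAL_C = 40
MULTIPLEX_ANNEAL_C = 58

print(total_hold_minutes(UNIVERSAL_ANNEAL_C))
print(total_hold_minutes(MULTIPLEX_ANNEAL_C))
```

Both programs come to 186 minutes of programmed hold time (2 + 29 x 6 + 10), so the choice between universal and multiplex amplification does not change the run length.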
[0534] Primer Sequences (from Hoffman et al, 2001) TABLE-US-00027
Uni3:         AGCAAAAAGCAGG                         (SEQ ID NO:431)
Uni5:         AGTAGAAACAAG                          (SEQ ID NO:432)
Bm-HA-1:      TATTCGTCTCAGGGAGCAAAAGCAGGGG          (SEQ ID NO:433)
Bm-NS-890R:   ATATCGTCTCGTATTAGTAGAAACAAGGGTGTTTT   (SEQ ID NO:434)
Ba-Na-1:      TATTGGTCTCAGGGAGCAAAAGCAGGAGT         (SEQ ID NO:435)
Ba-Na-1413R:  ATATGGTCTCGTATTAGTAGAAACAAGGAGTTTTTT  (SEQ ID NO:436)
Bm-M-1:       TATTCGTCTCAGGGAGCAAAAGCAGGTAG         (SEQ ID NO:437)
Bm-M-1027R:   ATATCGTCTCGTATTAGTAGAAACAAGGTAGTTTTT  (SEQ ID NO:438)
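For bookkeeping, the primer sequences listed above can be held in a dictionary and summarized by length and GC fraction. This is a minimal sketch; the sequences are transcribed as listed, and the helper name is hypothetical.

```python
# Primer sequences as listed above (SEQ ID NOs 431-438).
PRIMERS = {
    "Uni3": "AGCAAAAAGCAGG",
    "Uni5": "AGTAGAAACAAG",
    "Bm-HA-1": "TATTCGTCTCAGGGAGCAAAAGCAGGGG",
    "Bm-NS-890R": "ATATCGTCTCGTATTAGTAGAAACAAGGGTGTTTT",
    "Ba-Na-1": "TATTGGTCTCAGGGAGCAAAAGCAGGAGT",
    "Ba-Na-1413R": "ATATGGTCTCGTATTAGTAGAAACAAGGAGTTTTTT",
    "Bm-M-1": "TATTCGTCTCAGGGAGCAAAAGCAGGTAG",
    "Bm-M-1027R": "ATATCGTCTCGTATTAGTAGAAACAAGGTAGTTTTT",
}


def gc_fraction(sequence):
    """Fraction of positions that are G or C (case-insensitive)."""
    seq = sequence.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


for name, seq in PRIMERS.items():
    print(f"{name:12s} length={len(seq):2d} GC={gc_fraction(seq):.2f}")
```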
[0535] The chip scan results are shown for each of the two
amplification protocols: (a) universal primer PCR and (b) multiplex
PCR (FIG. 5). Based on the GDAS outputs (using "permissive"
settings), REPI identified the following top "hits" for the
universal primer (Hoffman et al., 2001) amplification method and
the specific multiplex PCR method (See Table 25).
[0536] For this experiment, the present inventors obtained sequence
for the HA gene of the Flu strain that evaded the 2003-2004 vaccine
and caused widespread illness, namely Fujian/411/2002 (note: only
the HA sequence was available for Fujian/411/2002) and incorporated
that into the database searched by REPI. Each of the two
amplification methods produced effectively the same results,
although bit scores varied slightly. Specifically, Fujian/411/2002
returned the highest bit score for FluAHA3 in the REPI output
files. Moscow 10/99, the strain used for vaccination in 2003, did
not return a REPI output for FluAHA3.
TABLE-US-00028 TABLE 25 FluAHA3: NW20031114-05-02 ACID04-B2
Accession #                Name                                                                  E value:  Bit Score:
lcl|ISDN38157              InfluenzaA/Fujian/411/2002_Hemagglutinin_104                          0.0       1431.75
gi|37530025|gb|AY389356.1  Influenza A virus (A/Middleburg/41/03(H3N2)) hemagglutinin (HA) gene  0.0       1431.75
gi|37530033|gb|AY389360.1  Influenza A virus (A/Pretoria/17/03(H3N2)) hemagglutinin (HA) gene    0.0       1423.83
gi|37530031|gb|AY389359.1  Influenza A virus (A/Pretoria/16/03(H3N2)) hemagglutinin (HA) gene    0.0       1423.83
[0537] The highest BLAST bit score for HA3 was achieved for
(InfluenzaA/Fujian/411/2002_Hemagglutinin_104), the H3N2
etiologic agent of the 2003-2004 Influenza outbreak in the United
States that evaded protection by vaccination. An indistinguishable
viral sequence for HA3 also achieved the same bit score. Over 500
GenBank records were returned for HA3, all having expect values of
0.0 and bit scores ranging from those shown above to 995.636. The
vaccine strain, (A/Panama/2007/99(H3N2)), was not in the list of
returned Influenza A strains in the HA3 region. Note: sequence for
InfluenzaA/Fujian/411/2002_Hemagglutinin_104 was only
available for HA, so other regions are not compared here.
[0538] Although not shown, the present inventors have been able to
identify the H1N1 outbreak strain from the 2002-2003 flu
season.
EXAMPLE 6
Reduced Hybridization Times
[0539] The Affymetrix CustomSeq protocol specifies that the
hybridization step be at least 16 hours to facilitate maximum
hybridization. For the intended purpose of a diagnostic capability,
this could be prohibitively long. Therefore, the applicability and
efficiency of the present methodologies to short hybridization
times was assessed.
[0540] In the present example hybridization and base calling were
performed, using samples that were amplified by specific PCR
reagents and then hybridized for periods of either 1 hour, 30
minutes, or 15 minutes on the resequencing microarray chip (RPMV1)
described above in Preparation Example 1. For this comparative
study the Affymetrix CustomSeq protocol was followed with the
exception of varying hybridization time. In the example data shown
(FIG. 6), a nasal wash that was negative for all targets probed by
the microarray except the erythromycin resistance markers SPYERMB,
SPYERMTR, and SPYMEFAE was subjected to specific PCR for each of
those markers. The amplicons were then hybridized to separate
microarrays for either 16 hours or 15 minutes, and then processed
otherwise as prescribed by the Affymetrix protocol.
[0541] The results appear in FIG. 6A [16 hours; ex-6a] and FIG. 6B
[15 minutes; ex-6b]. Based on the GDAS outputs (using "permissive"
settings), REPI identified the following top "hits" for the 16
hour hybridization and the 15 minute hybridization,
respectively.
[0542] The corresponding GDAS and REPI analyses show that fewer
base calls result from the 15 minute hybridization than from the 16
hour hybridization (See Tables 26-31). For example, the SPYERMB
tile region produced a subsequence length of 219 bp (98% of tile
region, 95% base calls made within the subsequence) for a 16 hour
hybridization, while discontinuities in the hybridization pattern
from a 15 minute hybridization resulted in fragmentation of the
tile into two subsequences of lengths 100 bp (45% of tile region,
99% base calls made within subsequence) and 87 bp (39% of tile
region, 99% base calls made in subsequence).
TABLE-US-00029 TABLE 26 SPYERMB: BL_JMS_020604_TW_3c (16 hour hybridization)
Accession #                Name                                                                              E value:     Bit Score:
gi|21886737|gb|AF516335.1  Enterococcus faecium plasmid pUW786 multiple antibiotic resistance gene cluster  1.90377E-98  365.247
[0543] TABLE-US-00030 TABLE 27 SPYERMB: BL_JMS_020604_TW_3b (15 minute hybridization)
Accession #                Name                                                                              E value:    Bit Score:
gi|21886737|gb|AF516335.1  Enterococcus faecium plasmid pUW786 multiple antibiotic resistance gene cluster  1.6409E-47  194.764
[0544] TABLE-US-00031 TABLE 28 SPYERMTR: BL_JMS_020604_TW_3c (16 hour hybridization)
Accession #                       Name                                                                                                                  E value:     Bit Score:
gi|2190969|gb|AF002716.1|AF002716  Streptococcus pyogenes leader peptides 1 (lpg1) and 2 (lpg2), and erythromycin resistance methylase (ermTR) gene  3.18989E-13  79.787
[0545] TABLE-US-00032 TABLE 29 SPYERMTR: BL_JMS_020604_TW_3b (15 minute hybridization)
Accession #  Name  E value:  Bit Score:
No calls     N/A   N/A       N/A
[0546] TABLE-US-00033 TABLE 30 SPYMEFAE: BL_JMS_020604_TW_3c (16 hour hybridization)
Accession #                Name                                                       E value:     Bit Score:
gi|22121182|gb|AY071836.1  Streptococcus sp. 6 macrolide-efflux protein (mefA) gene  1.01677E-60  240.358
[0547] TABLE-US-00034 TABLE 31 SPYMEFAE: BL_JMS_020604_TW_3b (15 minute hybridization)
Accession #                Name                                                       E value:   Bit Score:
gi|22121182|gb|AY071836.1  Streptococcus sp. 6 macrolide-efflux protein (mefA) gene  3.331E-36  159.081
[0548] The tables above show the comparative top rankings for the
three different antibiotic resistance markers described in Example
6 for 16 hour versus 15 minute hybridization times. Although the
SPYERMTR tile region did not produce a sufficient number of base
calls to allow identification following a 15-minute hybridization,
both the SPYERMB and SPYMEFAE returned high bit scores for the same
variants of the antibiotic resistance markers comprising the tile
regions (identified in Table 9).
[0549] The REPI output for each of the three tile regions showed
that the top-ranked hit in each region was the same at both
hybridization times (except for SPYERMTR, which lacked sufficient
base calls for the 15 minute hybridization), although both the bit
scores and expect values were different in each case. Similar results were obtained
for 30 minute and 1 hour hybridizations, with an increase in the
number of base calls made with increasing hybridization times.
Further, this example clearly illustrates the robustness of the
method to make fine scale discrimination between targets with a
range of different hybridization patterns.
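The comparison drawn from Tables 26-31 can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the REPI software: the dictionary layout and the function name compare_hybridizations are assumptions made for this example, while the accessions, expect values, and bit scores are transcribed from the tables above.

```python
# Top-ranked hits transcribed from Tables 26-31.
# Each entry is (accession, E value, bit score), or None when no base calls were made.
# The data layout and the function name are illustrative assumptions, not REPI output.
top_hits = {
    "SPYERMB": {
        "16 hour": ("AF516335.1", 1.90377e-98, 365.247),
        "15 minute": ("AF516335.1", 1.6409e-47, 194.764),
    },
    "SPYERMTR": {
        "16 hour": ("AF002716.1", 3.18989e-13, 79.787),
        "15 minute": None,
    },
    "SPYMEFAE": {
        "16 hour": ("AY071836.1", 1.01677e-60, 240.358),
        "15 minute": ("AY071836.1", 3.331e-36, 159.081),
    },
}


def compare_hybridizations(hits, long_key="16 hour", short_key="15 minute"):
    """Report, per tile region, whether the short hybridization gives the same top hit."""
    summary = {}
    for tile, runs in hits.items():
        long_hit, short_hit = runs[long_key], runs[short_key]
        if short_hit is None:
            summary[tile] = "no calls"
        elif short_hit[0] == long_hit[0]:
            summary[tile] = "same top hit"
        else:
            summary[tile] = "different top hit"
    return summary


summary = compare_hybridizations(top_hits)
for tile, verdict in summary.items():
    print(tile, verdict)
```

Only the identity of the top-ranked accession is compared here; the bit scores and expect values themselves differ between the two hybridization times, as the tables show.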
EXAMPLE 7
Subtractive Hybridization
[0550] A variety of methods might be employed to reduce the amount
of background human DNA in clinical samples when using total
amplification. One method is direct subtraction of the background
genomic DNA from amplified products in hybridization solution with
COT-1 fraction human genomic DNA that consists largely of rapidly
annealing repetitive elements. Another method is bead-based
subtraction of background genomic DNA from clinical sample prior to
total amplification. A third method combines the two methods above:
the genomic background DNA from the clinical sample is first
subtracted using bead-based subtraction prior to total
amplification, and the background DNA might then be further
subtracted from the clinical sample in hybridization solution with
COT-1 human DNA.
Methods for FIGS. 7A and 7B
[0551] A nasal wash was collected from one clinical sample and the
total nucleic acid was isolated using MasterPure.TM. DNA
purification kit (Epicentre). The nucleic acid was predigested with
McrBC (an endonuclease that cleaves DNA containing methylcytosine
on one or both strands) at 37.degree. C. for one hour, and then
heated to 65.degree. C. to denature the enzyme. Subsequently, the
sample was split into two aliquots and amplified with GenomiPhi DNA
amplification kit (Amersham Biosciences). The amplified products
were fragmented and labeled according to Affymetrix CustomSeq
protocol.
[0552] For this comparative study the Affymetrix CustomSeq protocol
was followed except for the following changes. The clinical sample
was hybridized to the resequencing microarray chips (RPMV1) either
with standard hybridization solution according to Affymetrix
CustomSeq protocol (FIG. 7A) or with hybridization mixture
containing 12 .mu.g Herring Sperm DNA and 10 .mu.g COT-1 human DNA
(Roche, Indianapolis, Ind.) instead of 22 .mu.g Herring sperm DNA
as Affymetrix recommended (FIG. 7B).
Methods for FIGS. 7C and 7D
[0553] The present example describes bead-based subtraction and the
combination of bead-based subtraction with subtractive hybridization
methods for increasing the pathogen specific signal-to-noise ratio
when using total amplification.
[0554] COT-1 human DNA, consisting largely of rapidly annealing
repetitive elements, was biotinylated at 3'-end using terminal
transferase (Tdt) (NEB) to generate Biotinylated Subtraction Probes
(BSPs). An aliquot of .about.400 ng of BSPs was incubated at
95.degree. C. for 10 minutes in 4.times.SSC and 0.2% SDS with
nucleic acid extract of clinical nasal wash (diagnosed as Ad4
positive clinical sample) predigested with McrBC at 37.degree. C.
for one hour, then slowly cooled to 65.degree. C. to allow
co-hybridization with the HMB for one hour. After hybridization,
the BSP-HMB complexes were incubated at room temperature for 30
minutes in 1.times.PBS, 0.1% BSA with BcMag.Streptavidin Magnetic
Beads (SMBs, Bioclone). The beads were separated from the supernatant
using a magnetic separator. The supernatant was then transferred to
a new tube and the magnetic beads were washed once with 1.times.PBS,
0.1% BSA. The supernatants were combined and precipitated with
ethanol. The DNA pellet was resuspended with nuclease free water
(Ambion), then split into two aliquots and amplified with GenomiPhi
DNA amplification kit (Amersham Biosciences) according to the
manufacturer's recommended protocol. Briefly, 1 .mu.l DNA template was
denatured with 9 .mu.l of sample buffer at 95.degree. C. for 3
minutes, then cooled down to 4.degree. C. In a separate tube, 1
.mu.l of enzyme was mixed with 9 .mu.l of reaction buffer, and then
the mixture was added to the denatured DNA template. The reaction
was then performed at 30.degree. C. for 18 hours in a Thermal
Cycler-PTC225 (MJ Research Inc., Reno, Nev.).
[0555] For this comparative study, the amplified products were
hybridized to the resequencing microarray chips (RPMV1) either with
standard hybridization solution according to Affymetrix CustomSeq
protocol (FIG. 7C) or with hybridization mixture containing 12
.mu.g Herring Sperm DNA and 10 .mu.g COT-1 human DNA instead of 22
.mu.g Herring sperm DNA as Affymetrix recommended (FIG. 7D).
Methods for FIG. 7E
[0556] The combined bead-based subtraction and subtractive
hybridization procedure was repeated with double the amount of
initial material, and the result appearing in FIG. 7E evidenced that the
present inventors were able to subtract human background from
clinical sample for the resequencing microarray chip (RPM Version 1
chip) described above in Preparation Example 1 to resolve the
presence and identification of Ad4 hexon and fiber genes in the
clinical sample.
Subtractive Hybridization Results
[0557] The results of the subtractive hybridization approaches are
shown in FIG. 7. As shown in FIG. 7A, total amplification of the
isolated nucleic acids resulted in significant background
hybridization to the RPMV1 microarray. Although the adenovirus type
4 region showed a discernibly higher signal than that of the
overall background, GDAS was unable to make enough base calls to
satisfy the sliding window condition in REPI, so no attempts to
BLAST the data were made. Co-hybridizing the same set of total
amplicons with COT-1 human genomic DNA did little to improve this
(FIG. 7B) and no base calls were made. FIG. 7C shows that the use
of a magnetic bead-based subtraction alone, prior to total
amplification, again did not result in a sufficient number of base
calls to allow similarity searching.
[0558] However, through the combined use of a bead-based
subtraction prior to amplification and co-hybridization with COT-1
human DNA, enough base calls could be made to identify and rank the
adenovirus 4 field strains (Navy and Air Force without
discrimination) higher than those of the vaccine and prototype
adenovirus 4 strains in each of three subsequences identified in
the Ad4HEXON-1 tile region (FIG. 7D and Table 32).
TABLE-US-00035 TABLE 32 Ad4HEXON-1: BL_818141_030503_NW_4
Accession #   Name                                               E value:     Bit Score:
lcl|AY599837  Human Adenovirus serotype 4, USAF Field Strain     6.14232E-37  161.064
lcl|AY599835  Human Adenovirus serotype 4, US Navy Field Strain  6.14232E-37  161.064
lcl|AY594254  Human Adenovirus serotype 4, vaccine strain        1.49737E-34  153.134
lcl|AY594253  Human Adenovirus Serotype 4                        1.49737E-34  153.134
[0559] Moreover, by performing the same set of combined steps using
2 microliters of starting sample material instead of 1 microliter
(FIG. 7E), base calling was extended into the Ad4FIBER tile region
(see Table 33) in addition to Ad4HEXON-1 (see Table 34), allowing
unambiguous high bit-score ranking for Ad4 Field Strain (though not
distinguishing between Air Force and Navy field strains) versus
prototype (AY594253) or vaccine strain (AY594254) in several
Ad4FIBER subsequences. However, the Ad4 Air Force Field Strain
(Accession No. AY599837) was distinguishable from the Navy field
strain (AY599835) in the Ad4HEXON-1 tile because an increased
number of base calls allowed for a more complete sequence
comparison and increased bit score resolution.
TABLE-US-00036 TABLE 33 Ad4FIBER: BL_818141_030503_NW_9b
Accession #   Name                                               E value:     Bit Score:
lcl|AY599837  Human Adenovirus serotype 4, USAF Field Strain     3.51948E-22  111.505
lcl|AY599835  Human Adenovirus serotype 4, US Navy Field Strain  3.51948E-22  111.505
lcl|AY594254  Human Adenovirus serotype 4, vaccine strain        8.57976E-20  103.575
lcl|AY594253  Human Adenovirus Serotype 4                        8.57976E-20  103.575
[0560] TABLE-US-00037 TABLE 34 Ad4HEXON-1: BL_818141_030503_NW_9b
Accession #   Name                                               E value:     Bit Score:
lcl|AY599837  Human Adenovirus serotype 4, USAF Field Strain     1.59752E-70  274.058
lcl|AY599835  Human Adenovirus serotype 4, US Navy Field Strain  9.49375E-66  258.199
lcl|AY594254  Human Adenovirus serotype 4, vaccine strain        5.64196E-61  242.34
lcl|AY594253  Human Adenovirus Serotype 4                        5.64196E-61  242.34
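The distinction between a tied ranking (Tables 32 and 33) and a uniquely resolved top hit (Table 34) can be illustrated with a short Python sketch. The names TABLES, top_ranked, and expected_e_ratio are hypothetical and introduced only for this example; the values are transcribed from the tables above. The second helper relies on the standard BLAST relationship in which, for a fixed query and database, the expect value scales as 2 raised to the negative bit score, so a bit-score gap of d should correspond to an expect-value ratio of about 2^d. Treating the search space as fixed across hits is an assumption of this sketch.

```python
# Hits transcribed from Tables 32-34 as (accession, E value, bit score).
# All names below are illustrative assumptions, not part of REPI or BLAST.
TABLES = {
    "Table 32 (Ad4HEXON-1, 1 microliter)": [
        ("AY599837", 6.14232e-37, 161.064),
        ("AY599835", 6.14232e-37, 161.064),
        ("AY594254", 1.49737e-34, 153.134),
        ("AY594253", 1.49737e-34, 153.134),
    ],
    "Table 33 (Ad4FIBER, 2 microliters)": [
        ("AY599837", 3.51948e-22, 111.505),
        ("AY599835", 3.51948e-22, 111.505),
        ("AY594254", 8.57976e-20, 103.575),
        ("AY594253", 8.57976e-20, 103.575),
    ],
    "Table 34 (Ad4HEXON-1, 2 microliters)": [
        ("AY599837", 1.59752e-70, 274.058),
        ("AY599835", 9.49375e-66, 258.199),
        ("AY594254", 5.64196e-61, 242.34),
        ("AY594253", 5.64196e-61, 242.34),
    ],
}


def top_ranked(hits, tol=1e-9):
    """Return every accession that shares the highest bit score."""
    best = max(score for _, _, score in hits)
    return [acc for acc, _, score in hits if abs(score - best) <= tol]


def expected_e_ratio(bits_high, bits_low):
    """E-value ratio implied by a bit-score gap, assuming a fixed search space."""
    return 2.0 ** (bits_high - bits_low)


for name, hits in TABLES.items():
    winners = top_ranked(hits)
    status = "unique top hit" if len(winners) == 1 else "tie"
    print(f"{name}: {', '.join(winners)} ({status})")
```

Under these assumptions the two field strains tie in Tables 32 and 33 and separate only in Table 34, which matches the discussion in paragraph [0559]. The tabulated expect values are also consistent with the bit-score gaps to within rounding.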
EXAMPLE 8
RPMV2 Chip and Design thereof
[0561] The sequence tiles for the RPM V2 chip are listed below
(Table 35). This represented an approximately 10-fold increase over
the content of the RPM V1 chip.
TABLE-US-00038 TABLE 35 RPMV2 Chip Table
Columns, in order: Alias; Organism; Gene Name; Accession Number: Seq Num; Length; SEQ ID NO:
ATTIM1 Arabidopsis thaliana(1) triosephosphate
isomerase (TIM) 523 59 Ad3E1A Adenovirus 3 E1A AY599834: 576-1455
879 60 Ad3HEXON Adenovirus 3 Hexon AY599834: 18420-21254 595 61
Ad3FIBER Adenovirus 3 Fiber AY599834: 31370-32329 746 62 Ad7E1A
Adenovirus 7 E1A AY594255: 577-1445 868 63 Ad7HEXON Adenovirus 7
Hexon AY594255: 18419-21232 513 64 Ad7FIBER Adenovirus 7 Fiber
AY594255: 31320-32297 977 65 Ad7NAVYE1A Adenovirus 7 FS Navy E1A
AY601634: 575-1454 879 66 Ad7NAVYHEXON Adenovirus 7 FS Navy Hexon
AY601634: 18408-21210 504 67 Ad7NAVYFIBER Adenovirus 7 FS Navy
Fiber AY601634: 31320-32295 975 68 Ad16E1A Adenovirus 16 E1A
AY601636: 574-1452 878 69 Ad16HEXON Adenovirus 16 Hexon AY601636:
18450-21272 667 70 Ad16FIBER Adenovirus 16 Fiber AY601636:
31448-32509 652 71 Ad21E1A Adenovirus 21 E1A AY601633: 574-1452 878
72 Ad21HEXON Adenovirus 21 Hexon AY601633: 18454-21303 807 73
Ad21FIBER Adenovirus 21 Fiber AY601633: 31406-32377 685 74 Ad11E1A
Adenovirus 11 E1A AY163756: 568-1440 872 75 Ad11HEXON Adenovirus 11
Hexon AY163756: 18254-21100 677 76 Ad11FIBER Adenovirus 11 Fiber
AY163756: 30811-31788 977 77 Ad35E1A Adenovirus 35 E1A AY271307:
569-1441 872 78 Ad35HEXON Adenovirus 35 Hexon AY271307: 18257-21115
689 79 Ad35FIBER Adenovirus 35 Fiber AY271307: 30827-31798 971 80
Ad1E1A Adenovirus 1 E1A 33330439: 560-1546 986 81 Ad1HEXON
Adenovirus 1 Hexon 33330439: 18861-21755 715 82 Ad1FIBER Adenovirus
1 Fiber 33330439: 31101-32849 750 83 Ad2E1A Adenovirus 2 E1A
33694600: 559-1542 983 84 Ad2HEXON Adenovirus 2 Hexon 33694600:
18838-21744 837 85 Ad2FIBER Adenovirus 2 Fiber 33694600:
31030-32778 750 86 Ad5E1A Adenovirus 5 E1A 33465830: 560-1545 985
87 Ad5HEXON Adenovirus 5 Hexon 33465830: 18842-21700 732 88
Ad5FIBER Adenovirus 5 Fiber 33465830: 31037-32782 747 89 Ad6E1A
Adenovirus 6 E1A CBI 985 90 Ad6HEXON Adenovirus 6 Hexon X67710 833
91 Ad6FIBER Adenovirus 6 Fiber AB108424 750 92 Ad4E1A Adenovirus 4
E1A AY594253: 576-1441 865 93 Ad4HEXON Adenovirus 4 Hexon AY594253:
18248-21058 2810 94 Ad4FIBER Adenovirus 4 Fiber AY594253:
31645-32922 1277 95 Ad4AFE1A Adenovirus 4 FS AF E1A AY599837:
575-1407 832 96 Ad4AFHEXON Adenovirus 4 FS AF Hexon AY599837:
18179-20989 2810 97 Ad4AFFIBER Adenovirus 4 FS AF Fiber AY599837:
31463-32740 1277 98 Ad12E1A Adenovirus 12 E1A 9626621: 503-1099 597
99 Ad12HEXON Adenovirus 12 Hexon 9626621: 17740-20499 884 100
Ad12FIBER Adenovirus 12 Fiber 9626621: 29368-31131 908 101 Ad17E1A
Adenovirus 17 E1A 9632407: 560-1138 579 102 Ad17HEXON Adenovirus 17
Hexon 9632407: 17754-20617 692 103 Ad17FIBER Adenovirus 17 Fiber
9632407: 30935-32035 829 104 Ad40E1A Adenovirus 40 E1A 9626553:
418-1326 824 105 Ad40HEXON Adenovirus 40 Hexon 9626553: 17643-20414
876 106 Ad40FIBER Adenovirus 40 Fiber 9626553: 28751-29914 707 107
FluAHA1 Influenza A H1N1 (New Caledonia Like) Hemagglutinin 1
AJ344014 1692 108 FluAHA2 Influenza A H2N2 (berkley) Hemagglutinin
2 L11125 805 109 FluAHA3 Influenza A H3N2 (Fujian) Hemagglutinin 3
ISDN38157 1042 110 FluAHA4 Influenza A H4N6 (Swine: Ontario)
Hemagglutinin 4 AF285885 1371 111 FluAHA5 Influenza A H5N1
(Vietnam) Hemagglutinin 5 AY526745 303 112 FluAHA6 Influenza A H6N2
(Turkey; Germany) Hemagglutinin 6 AJ507203 887 113 FluAHA7
Influenza A H7N7 (Netherlands) Hemagglutinin 7 AY338459 818 114
FluAHA8 Influenza A H8N4 (Duck; Alberta) Hemagglutinin 8 AF310988
897 115 FluAHA9 Influenza A H9N2 (Swine: China) Hemagglutinin 9
AY294658 601 116 FluAHA10 Influenza A H10 (Shorebird) Hemagglutinin
10 AF311750 775 117 FluAHA11 Influenza A H11 (Duck; Taiwan)
Hemagglutinin 11 AF310986 728 118 FluAHA12 Influenza A H12N4 (Ruddy
Turnstone; Delaware) Hemagglutinin 12 AF310990 738 119 FluAHA13
Influenza A H13N6 (Gull; Astrakan) Hemagglutinin 13 M26089 1765 120
FluAHA14 Influenza A H14 (mallard; Gurjev) Hemagglutinin 14 M35997
763 121 FluAHA15 Influenza A H15N8 (Duck; Australia) Hemagglutinin
15 L43916 793 122 FluANA1-1 Influenza A H1N1 (New Caledonia Like)
Neuraminidase 1 AJ518092 1459 123 FluANA1-2 Influenza A H5N1
(Vietnam) Neuraminidase 1 AY526746 575 124 FluANA2 Influenza A H3N2
(Ireland) Neuraminidase 2 AJ457947 1062 125 FluANA3 Influenza A
H5N3 (chicken: TX) Neuraminidase 3 AY300947 852 126 FluANA4
Influenza A H8N4 (Duck: Alberta) Neuraminidase 4 K01030 257 127
FluANA5 Influenza A H6N5 (Shearwater) Neuraminidase 5 M24740 913
128 FluANA6 Influenza A H4N6 (Swine: Ontario) Neuraminidase 6
AF285887 739 129 FluANA7 Influenza A H7N7 (Netherlands)
Neuraminidase 7 AY340079 994 130 FluANA8 Influenza A H5N8 (Duck:
NY) Neuraminidase 8 AY300948 843 131 FluANA9 Influenza A H2N9
(Duck: Nanchang) Neuraminidase 9 AY180830 444 132 FluAH1N1MATRIX
Influenza A H1N1 (New Caledonia Like) Matrix AJ458301 734 133
FluAH5N1MATRIX Influenza A H5N1 (Vietnam) Matrix AY526748 657 134
FluBHA Influenza B Hemagglutinin B AB126838 785 135 FluBNA
Influenza B Neuraminidase B AY139074 809 136 FluBMATRIX Influenza B
Matrix AB036877 763 137 FluCHA Influenza C Hemagglutinin-esterase
AB093473 401 138 FluCMATRIX Influenza C Matrix AB086809 862 139
PIV1HN Parainfluenza 1 Hemagglutinin-neuraminidase 19718363:
6903-8630 1728 140 PIV1MATRIX Parainfluenza 1 Matrix 19718363:
3637-4809 958 141 PIV1NC Parainfluenza 1 Nucleocapsid 19718363:
56-1737 1682 142 PIV2HN Parainfluenza 2
Hemagglutinin-neuraminidase 19525721: 6817-8532 1716 143
PIV2MATRIX Parainfluenza 2 Matrix 19525721: 3411-4742 741 144
PIV2NC Parainfluenza 2 Nucleocapsid 19525721: 71-1919 1849 145
PIV3HN Parainfluenza 3 Hemagglutinin-neuraminidase 10937870:
6806-8530 1725 146 PIV3MATRIX Parainfluenza 3 Matrix 10937870:
3753-4814 544 147 PIV3NC Parainfluenza 3 Nucleocapsid 10937870:
111-1658 1548 148 PIV4HN Parainfluenza 4
Hemagglutinin-neuraminidase E02727 686 149 PIV4MATRIX Parainfluenza
4 Matrix E03809 855 150 HRV14NCR Rhinovirus 14 5' NCR AF108186 520
151 HRV1ANCR Rhinovirus 1A 5' NCR AF108179 511 152 HRV21NCR
Rhinovirus 21 5' NCR AF108180 499 153 HRV29NCR Rhinovirus 29 5' NCR
AF542420 676 154 HRV58NCR Rhinovirus 58 5' NCR AF108183 504 155
HRV62NCR Rhinovirus 62 5' NCR AF108184 501 156 HRV87NCR Rhinovirus
87 5' NCR AF108187 506 157 HRV95NCR Rhinovirus 95-01468 5' NCR
AF108164 508 158 RSVABL RSV A&B polymerase L AF254574 379 159
RSVAMATRIX RSV A Matrix 3089371: 3253-4210 958 160 RSVANC RSV A
Nucleocapsid 3089371: 1126-2398 955 161 RSVBMATRIX RSV B Matrix
2582022: 3263-4033 770 162 RSVBNC RSV B Nucleocapsid 2582022:
1140-2315 602 163 HCV229ESPIKE Coronavirus 229E Spike 12175745:
20570-24091 1534 164 HCV229EMEM Coronavirus 229E Membrane protein
12175745: 24995-25672 678 165 HCV229ENC Coronavirus 229E
Nucleocapsid 12175745: 25686-26855 924 166 HCVOC43SPIKE Coronavirus
OC43 Spike 38018022: 23644-27729 1456 167 HCVOC43MEM Coronavirus
OC43 Membrane protein 38018022: 28402-29094 693 168 HCVOC43NC
Coronavirus OC43 Nucleocapsid 38018022: 29104-30450 966 169
SARSSPIKE Coronavirus SARS(Urbani) Spike 30027617: 21492-25259 1438
170 SARSMEM Coronavirus SARS(Urbani) Membrane Glycoprotein
30027617: 26398-27063 666 171 SARSNC Coronavirus SARS(Urbani)
Nucleocapsid 30027617: 28120-29388 932 172 HCVNL63SPIKE Coronavirus
NL63 Spike AY567487: 20472-24542 1534 173 HCVNL63ORF3 Coronavirus
NL63 ORF3 AY567487: 24542-25219 678 174 HCVNL63MEM Coronavirus NL63
Membrane protein AY567487: 25442-26122 681 175 HCVNL63NC
Coronavirus NL63 Nucleocapsid AY567487: 26133-27266 879 176
MPVMATRIX Metapneumovirus Matrix AY145271 765 177 MPVNC
Metapneumovirus Nucleocapsid AY145272 1185 178 HHV1L Human
herpesvirus 1 (Simplex) DNA polymerase 9629378: 62807-66514 1061
179 HHV1CAPSID Human herpesvirus 1 (Simplex) major capsid protein
9629378: c40528-36404 993 180 HHV3L Human herpesvirus 3 (VZV) DNA
polymerase 9625875: c50636-47052 1061 181 HHV3CAPSID Human
herpesvirus 3 (VZV) major capsid protein 9625875: 71540-75730 993
182 HHV4L Human herpesvirus 4 (EBV) DNA polymerase 9625578:
c156746-153699 1067 183 HHV4CAPSID Human herpesvirus 4 (EBV) major
capsid protein 9625578: c137466-133321 992 184 HHV5L Human
herpesvirus 5 (CMV) DNA polymerase 9628290: c59588-56550 1136 185
HHV5CAPSID Human herpesvirus 5 (CMV) major capsid protein 9628290:
c129226-125114 998 186 HHV6L Human herpesvirus 6 (Roseola) DNA
polymerase 9628290: c59588-56550 1058 187 HHV6CAPSID Human
herpesvirus 6 (Roseola) major capsid protein 9628290: c93912-89875
1001 188 ENTEROVIRUS Enterovirus (genus) Enterovirus NC_001612 1758
189 COXSACKIEVIRUS Human coxsackievirus unknown AF499635 920 190
ECHO Echo (subgroup) Echovirus NC_003986 1277 191 POLIO Polio
(subgroup) Poliovirus NC_002058 1226 192 POLIO1NCR Polio type 1
Sabin 5'NCR AY184219: 162-597 436 193 POLIO2NCR Polio type 2 Sabin
5'NCR AY184220: 162-598 437 194 POLIO3NCR Polio type 3 Sabin 5'NCR
AY184221: 164-600 437 195 MEASLESHA Paramyxoviridae morbillivirus
(Rubeola) Hemagglutinin AY523581 1854 196 MEASLESMATRIX
Paramyxoviridae morbillivirus (Rubeola) Matrix 9626945: 3438-4445
1008 197 NEWCASTLEHN Newcastle Hemagglutinin-Neuraminidase AY510092
1734 198 NEWCASTLEMATRIX Newcastle disease Matrix 11545722:
3256-4487 1232 199 WNE West Nile Virus Envelope AF346319 1504 200
WNNS West Nile Virus Nonstructural protein 5 AF208017 917 201 WNCM
West Nile virus C and prM 432 202 YFE Yellow Fever Envelope
AY359908 1547 203 YFNS Yellow Fever Nonstructural protein 5
AF013417 1035 204 VMVG3R Variola major virus G3R 623595:
183809-184570 762 205 VMVHA Variola major virus hemagglutinin
623595: 151032-151973 942 206 VMVSOD Variola major virus SOD
623595: 144137-144514 378 207 VMVCRMB Variola major virus CrmB 291
208 MONKEYPOX Monkeypox virus inclusion body protein U84503 812 209
EBOLAL Reston Ebola L 22789222: 11464-18866 800 210 EBOLANP Reston
Ebola NP 22789222: 56-3013 806 211 EBOLAMATRIX Reston Ebola Matrix
22789222: 4396-5893 1498 212 MARBURGL Marburg virus L protein
13489275: 11479-18474 1218 213 MARBURGNP Marburg virus NP 13489275:
103-2190 847 214 LASSAL Lassa virus L 23343512: c7122-466 1021 215
LASSANP Lassa virus Nucleoprotein 23343509: 101-1810 751 216
LASSAGP Lassa virus Glycoprotein 23343509: c3347-1872 1476 217
MACHUPOL Machupo virus L polymerase 34365535: c7094-465 1588 218
MACHUPONP Machupo virus Nucleoprotein 34365532: c3364-1670 763 219
MACHUPOG Machupo virus Glycoprotein 34365532: 89-1579 1491 220
VEEVNS Venezuelan equine encephalitis virus NS 9626526: 1-7526 923
221 VEEVNC Venezuelan equine encephalitis virus Structural protein
9626526: 7532-11444 1512 222 EEEVNS Eastern equine encephalitis
virus NS 21218484: 47-5683 1312 223 EEEVNC Eastern equine
encephalitis virus Nucleocapsid 21218484: 7592-11317 975 224 WEEVNS
Western equine encephalitis virus NS 21238454: 25-7428 878 225
WEEVNC Western equine encephalitis virus Structural protein
21238454: 7473-11183 902 226 NIPAHMATRIX Nipah virus Matrix
13559808: 5008-6366 1359 227 NIPAHN Nipah virus N 13559808: 56-2297
858 228 SINNOMBREGP Sin Nombre Glycoprotein 558060: 52-3474 1293
229 SINNOMBRENC Sin Nombre Nucleocapsid 38371725: 43-1329 639 230
NORWALKL Norwalk virus RNA-dependent RNA polymerase 9630803: 3848-5371 739 231
NORWALKCAPSID Norwalk virus capsid protein 9630803:
5358-6950 760 232 DENGUECAPSID Dengue virus capsid protein 9626681:
97-396 300 233 DENGUEM Dengue virus pre-M protein 9626681: 439-936
498 234 DENGUE1NCR Dengue type 1 3'NCR AF309641: 10501-10657 157
235 DENGUE2NCR Dengue type 2 3'NCR AF359579: 10501-10659 159 236
DENGUE3NCR Dengue type 3 3'NCR AF317645: 10477-10632 156 237
DENGUE5NCR Dengue type 4 3'NCR AF326573: 10424-10585 162 238
FMDVVP1 Foot and mouth disease virus VP1 21426907: 3231-3863 633
239 FMDV3D Foot and mouth disease virus 3D 21426907: 6615-8024 846
240 SLEVNS5 Saint Louis encephalitis virus NS5 AF013416 1035 241
SLEVPP Saint Louis encephalitis virus polyprotein precursor
AY289618 727 242 RVFVN Rift Valley fever virus N 61928: c1652-915
738 243 RVFVNS Rift Valley fever virus NS 61928: 35-832 798 244
USUTUPP Usutu virus polyprotein AF452643 1035 245 JEVPP Japanese
encephalitis virus polyprotein M18370: 9062-10097 1035 246
CHANDIPURAMATRIX Chandipura virus Matrix 4583436: 97-851 755 247
CHANDIPURAGP Chandipura virus glycoprotein J04350 752 248 ATTIM2
Arabidopsis thaliana(2) triosephosphate isomerase (TIM) 523 249
ATTIM3 Arabidopsis thaliana(3) triosephosphate isomerase (TIM) 523
250 SPYEMM1 Streptococcus pyogenes emm1 CDC 398 251 SPYEMM2
Streptococcus pyogenes emm2 CDC 360 252 SPYEMM3 Streptococcus
pyogenes emm3 CDC 391 253 SPYEMM4 Streptococcus pyogenes emm4 CDC
337 254 SPYEMM5 Streptococcus pyogenes emm5 CDC 490 255 SPYEMM6
Streptococcus pyogenes emm6 CDC 437 256 SPYEMM9 Streptococcus
pyogenes emm9 CDC 509 257 SPYEMM11 Streptococcus pyogenes emm11 CDC
500 258 SPYEMM12 Streptococcus pyogenes emm12 CDC 364 259 SPYEMM13L
Streptococcus pyogenes emm13L CDC 325 260 SPYEMM18 Streptococcus
pyogenes emm18 CDC 524 261
SPYEMM22 Streptococcus pyogenes emm22 CDC 620 262 SPYEMM28
Streptococcus pyogenes emm28 CDC 333 263 SPYEMM29 Streptococcus
pyogenes emm29 CDC 328 264 SPYEMM44 Streptococcus pyogenes emm44
CDC 391 265 SPYEMM61 Streptococcus pyogenes emm61 CDC 325 266
SPYEMM75 Streptococcus pyogenes emm75 CDC 451 267 SPYEMM77
Streptococcus pyogenes emm77 CDC 450 268 SPYEMM89 Streptococcus
pyogenes emm89 CDC 378 269 SPYEMM94 Streptococcus pyogenes emm94
CDC 516 270 SPYCSR Streptococcus pyogenes Csr R & Csr S
AF095713 952 271 SPYSFB1 Streptococcus pyogenes sfb1 AJ347842 615
272 SPYSPEB Streptococcus pyogenes SpeB AB051298 729 273 SPNGYRA
Streptococcus pneumoniae GyrA AY157689 815 274 SPNLYTA
Streptococcus pneumoniae Autolysin lytA 125 275 SPNPLY
Streptococcus pneumoniae pneumolysin ply 99 276 SAUGYRA
Staphylococcus aureus GyrA D10489 821 277 SAUTST Staphylococcus
aureus tst 18266750: 2118-2822 705 278 SAUENTK Staphylococcus
aureus EntK 18266750: 13059-13787 729 279 SAUENTQ Staphylococcus
aureus EntQ 18266750: 12265-13035 771 280 CPNGYRA Chlamydia
pneumoniae GyrA 28415636: 2451-4955 824 281 CPNOMPB Chlamydia
pneumoniae OmpB X53511 1030 282 CPNMOMPVD4 Chlamydia pneumoniae
MOMP VD4 M69230 150 283 CPNMOMPVD2 Chlamydia pneumoniae major outer
membrane protein (MOMP)VD2 133 284 CPNRPOB Chlamydia pneumoniae
Pstl fragment (rpoB) NT01CP0714 346 285 CPSOMPA Chlamydia psittaci
OmpA AF269281 991 286 CPSSIGA Chlamydia psittaci SigA U04442 883
287 CDIDTX Corynebacterium diphtheriae Dtx gene A04646 913 288
CDIGYRA Corynebacterium diphtheriae GyrA 38198900: 8792-11362 818
289 CDIDTXR Corynebacterium diphtheriae dtxR gene M80336 1124 290
HINGYRA Haemophilus influenzae GyrA 1574717: c7221-4579 896 291
HINOMPA Haemophilus influenzae OmpP5 (OmpA-family) L20309 937 292
LPNGYRA Legionella pneumophila GyrA AY091594 236 293 LPNMOMPS
Legionella pneumophila MompS AF078136 1157 294 MCAGYRA Moraxella
catarrhalis GyrA AF056196 321 295 MCAHA Moraxella catarrhalis
hemagglutinin AY077637 653 296 MTUGYRA Mycobacterium tuberculosis
GyrA 13879041: 7302-9818 818 297 MTUOMPA Mycobacterium tuberculosis
OmpA 38490207: 318539-319519 932 298 MTURPOB Mycobacterium
tuberculosis RpoB 468333: 1065-4598 411 299 MPNGYRA Mycoplasma
pneumoniae GyrA 11379479: 4821-7340 809 300 MPNP1 Mycoplasma
pneumoniae P1 gene AF290002 2570 301 NMEGYRA Neisseria meningitidis
GyrA 7413466: 89-2839 941 302 NMEMVIN Neisseria meningitidis MviN
7225498: c5929-4388 904 303 NMECTRA Neisseria meningitidis capsular
transport protein (ctrA) 135 304 NMECRGA Neisseria meningitidis
CrgA AF190471 254 305 AHE16S Arcanobacterium hemolyticum 16S rRNA
X73952 1489 306 AHEPLD Arcanobacterium hemolyticum pld L16583 1111
307 BANGYRA Bacillus anthracis GyrA AY291534 732 308 BANLEF
Bacillus anthracis lethal factor M29081 685 309 BANPAG Bacillus
anthracis protective antigen AF306783 599 310 BANRPOB Bacillus
anthracis rpoB AF205335 777 311 BANCYA Bacillus anthracis Cya
142812: 544-2946 545 312 BANCAPB Bacillus anthracis encapsulation
protein gene B(capB) 246 313 BCERPOB Bacillus cereus rpoB AF205342
777 314 BSUGYRA Bacillus subtilis GyrA 40012: 7769-10234 812 315
BSURPOB Bacillus subtilis rpoB AF205356 780 316 BTHCRY Bacillus
thuringiensis cry AF278797 853 317 BTHRPOB Bacillus thuringiensis
rpoB AF205349 777 318 BPEGYRA Bordetella pertussis GyrA 33571514:
286253-288934 815 319 BPEPRNA Bordetella pertussis PrnA AJ507642 777
320 BMEGYRA Brucella melitensis (F6145) GyrA AE009529: 4650-7358
995 321 BABRB51 Brucella abortus (RB51) WboA with insert AF107768
1339 322 Consensus Sequence BABOMP25 Brucella abortus (2308 B37)
Omp25 X79284 630 323 BABOMP2 Brucella abortus (2308 B37) Omp2
U26438 1434 324 BCAOMP2 Brucella canis Omp2 U26439 1434 325 BMEOMP2
Brucella melitensis (F6145) Omp2 U26440 1434 326 BNEOMP2 Brucella
neotoma Omp2 U26441 1434 327 BOVOMP2 Brucella ovis Omp2 U26442 1449
328 BSUIOMP2 Brucella suis (A44) Omp2 U26443 1434 329 BMAPENA
Burkholderia mallei (Glanders) PenA AY032868 1117 330 BMAWAAF
Burkholderia mallei (Glanders) WaaF AY124769 1015 331 BPSPENA
Burkholderia pseudomallei (Meliodosis) PenA AY032869 1117 332
BPSWAAF Burkholderia pseudomallei (Meliodosis) WaaF AF097748 1100
333 BCEPRECA Burkholderia cepacia RecA U70431 611 334 CPEGYRA
Clostridium perfringens GyrA 18143657: 7100-9391 810 335 CPETMPC
Clostridium perfringens TmpC 18146729: c45993-44881 1113 336
CBUGYRA Coxiella burnetii GyrA 29540947: 165494-168040 812 337
CBUTOLC Coxiella burnetii TolC 29540626: 51830-53680 745 338
FTURD1A Francisella tularensis subsp. tularensis strain RD1-A 32469331: c985-455 531
339 FTURD1B Francisella tularensis subsp. tularensis strain RD1-B
32469331: c1338-1054 285 340 FTUTUL4 Francisella
tularensis TUL4 M32059 834 341 FTUMDH Francisella tularensis mdh
AF513319 960 342 FTU13KD Francisella tularensis 13-kDa protein
M32059 431 343 FTUFOPA Francisella tularensis outer membrane
protein FopA 111 344 OTSGROEL Orientia tsutsugamushi GroEL AY191589
546 345 OTSSTA56 Orientia tsutsugamushi sta56 AY283180 1059 346
RPRGYRA Rickettsia prowazekii GyrA 3860572: 250672-253389 968 347
RPROMP1 Rickettsia prowazekii Omp1 3860572: c191770-189464 985 348
YPEGYRA Yersinia pestis GyrA 21959874: c10908-8233 812 349 YPEOMPA
Yersinia pestis OmpA 16120353: c1628253-1627192 913 350 YPECVE
Yersinia pestis cve2155 sequence AF350077 517 351 YPECAF1 Yersinia
pestis caf1 X61996 525 352 ACAHAG Ajellomyces capsulatus H antigen
U20346 1082 353 ACAMAG Ajellomyces capsulatus M antigen AF026268
919 354 ACAGH17 Ajellomyces capsulatus GH17 U27588 810 355 BDEWI-1
Blastomyces dermatitidis WI-1 S63772 942 356 BDEBYS1 Blastomyces
dermatitidis bys1 AF277079 912 357 CIMAG2 Coccidioides immitis Ag2
U32518 1234 358 CIMBG12 Coccidioides immitis bg12 AF022893 965 359
CPACP2 Cryptosporidium parvum CP2 AY471868 735 360 CPASOD
Cryptosporidium parvum Sod AF529280 375 361 ECOGYRA E. coli O157:H7
GyrA 16445223: c3136459-3133832 812 362 ECOOMPA E. coli O157:H7
OmpA 12518283: 3562-4221 660 363 SENGYRA Salmonella enterica GyrA
29136667: 70224-72860 812 364 SENOMPA Salmonella enterica OmpA
16502231: c18055-17003 904 365 SDYOMPA Shigella dysenteriae OmpA
46943: 568-1623 907 366 SFLGYRA Shigella flexneri GyrA 30041918:
c2708-81 812 367 SFLOMPA Shigella flexneri OmpA 24051234:
c4458-3340 898 368 VCHGYRA Vibrio cholerae GyrA 15640032:
1330207-1332891 887 369 VCHOMPA Vibrio cholerae OmpA 6031221: 1022-1987 942
370 MSRA Staphylococcus aureus MSR(A) AF467080 400 371 MECR1
Staphylococcus aureus mecR1 AF142100 652 372 MEFA Streptococcus
pyogenes MefA U70055 611 373 ERMTR Streptococcus pyogenes Erm(TR)
2190969: 211-942 732 374 ERMB Streptococcus pyogenes ErmB 38707181:
8545-9307 763 375 EMRB Rickettsia prowazekii EmrB 3860572:
c187837-186278 1560 376 GYRB Streptococcus pneumoniae GyrB X83917
1947 377 PARC Streptococcus pneumoniae ParC AF503577 2637 378 PARE
Streptococcus pneumoniae parE 6851036: 1255-3198 2008 379 PBP1
Streptococcus pneumoniae ponA (Pbp1A) AF446215 1282 380 PBP5
Enterococcus faecium pbp5 AF375986 668 381 MECA Staphylococcus
aureus pUB110 mecA 9181834: <1-729 729 382 BLAZ Staphylococcus
aureus plasmid pLW043 blaz 33390917: c41981-41136 846 383 DFRA
Staphylococcus aureus plasmid PLW043 dfrA 33390917: c2802-2317 486
384 VANA Staphylococcus aureus plasmid pLW043 vanA 33390917:
34299-35330 1032 385 QACC Staphylococcus aureus plasmid pLW043 qacC
33390917: c21313-20990 324 386 RMTB Escherichia coli rmtB AB117036
756 387 STRA Escherichia coli pMBSF1 plasmid strA 25815144:
723-1526 804 388 STRB Escherichia coli pMBSF1 plasmid strB
25815144: 1526-2362 837 389 AADA1 Escherichia coli plasmid p541
aadA1 40287459: 10986-11777 792 390 SULII Escherichia coli plasmid
pSTOJO1 sulii 17129524: 449-1264 816 391 CTXM Escherichia coli
strain EU2657 ctx-M AY517475 436 392 KPC2 Klebsiella pneumoniae
carbapenemase (KPC-2) AY034847 918 393 AMPC Klebsiella pneumoniae
plasmid AmpC 40548851: 1310-2449 1140 394 BLACMY2 Klebsiella
pneumoniae plasmid bla CMY-2 1212997: 1924-3069 1146 395 AMPR
Klebsiella pneumoniae plasmid pKP3 AmpR 40548851: c1199-324 876 396
SULI Plasmid R388 suli 45795: 880-1719 840 397 AACAAPHD
Staphylococcus aureus plasmid pLW043 aacA-aphD 33390917:
23440-24879 1440 398 FLOR Vibrio cholerae floR 32469306:
11605-12819 1215 399 TETM Enterococcus faecalis tet(M) M85225 536
400 TETC Escherichia coli partial plasmid (tetC) Y19114 502 401
TETS Listeria monocytogenes tet(S) L09756 555 402 TETA Pseudomonas
aeruginosa plasmid RP4 tet(A) X75761 494 403 TETG Pseudomonas sp.
plasmid pPSTG2 tet(G) AF133140 550 404 TETL Staphylococcus hyicus
(plasmid pSTE1) tet(L) X60828 548 405 TETB Transposon Tn10 tet(B)
V00611 571 406 PTX Bordetella pertussis Pertussis toxin M13223 872
407 BONT Clostridium botulinum bonT 3805779: 7268-11143 792 408
NTNH Clostridium botulinum ntnh 3805779: 3649-7242 496 409 BOTE
Clostridium botulinum botE X62683 1000 410 EPSILON Clostridium
perfringens plasmid epsilon-toxin X60694 620 411 TETANUS
Clostridium tetani tetanus toxin 40769: 281-4228 1185 412 STX1A E.
coli O157:H7 stx1A 32400301: 1-948 948 413 STX2A E. coli O157:H7
stx2A 13359151: 1-960 960 414 RICINUSTOXIN Ricinus communis Ricinus
communis toxin X52908 1133 415 CTXAB Vibrio cholerae ctxA + ctxB
Gary 984 416 PBLUEVEC pBluescript II KS(+) 236 417 PGEMVEC
pGEM-9Zf(-) X65312 226 418 PUCVEC pUC4KAN X06404 252 419 ATTIM4
Arabidopsis thaliana(4) triosephosphate isomerase (TIM) 523 420
308034
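Many entries in the "Accession Number: Seq Num" column of Table 35 pair a sequence identifier with a coordinate range, for example "AY599834: 576-1455" or "9629378: c40528-36404". The Python sketch below shows one way such entries might be parsed. The function name parse_region and the regular expression are hypothetical and introduced only for illustration. Reading the "c" prefix as a complement-strand range and "<" as a partial boundary follows common GenBank-style notation and is an assumption here, since the table itself does not define them.

```python
import re

# Matches entries such as "AY599834: 576-1455", "9629378: c40528-36404"
# and "9181834: <1-729". The pattern is an illustrative assumption.
REGION = re.compile(
    r"^\s*(?P<acc>[A-Za-z0-9_.]+):\s*(?P<comp>c?)[<>]?(?P<a>\d+)-[<>]?(?P<b>\d+)\s*$"
)


def parse_region(entry):
    """Split an 'accession: range' entry into its parts."""
    m = REGION.match(entry)
    if m is None:
        raise ValueError(f"unrecognized region: {entry!r}")
    a, b = int(m.group("a")), int(m.group("b"))
    return {
        "accession": m.group("acc"),
        "complement": m.group("comp") == "c",  # assumed meaning of the 'c' prefix
        "start": min(a, b),
        "end": max(a, b),
        "span": abs(a - b) + 1,  # inclusive length of the referenced range
    }


examples = ["AY599834: 576-1455", "9629378: c40528-36404", "9181834: <1-729"]
parsed = [parse_region(e) for e in examples]
for region in parsed:
    print(region)
```

The span computed this way is the length of the referenced range, which need not equal the Length column: Ad3HEXON, for instance, references AY599834: 18420-21254 but lists a length of 595, so the tiled sequence can be shorter than the range it is drawn from.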
EXAMPLE 9
Influenza A Strain Identification with Random RT-PCR Approach
[0562] This example presents the application of a newly modified
random RT-PCR protocol to isolated Influenza A (Fujian/411/2002
strain) nucleic acids spiked into normal nasal wash samples. A
random RT-PCR protocol has been applied by DeRisi's lab to amplify
viral pathogens for microarray interrogation (Wang et al., 2003).
Modification to this protocol was recently reported for amplifying
cultured influenza viruses (Kessler et al., 2004). However, the
modified protocol was not reported to amplify viral targets in
clinical samples.
[0563] The Influenza A virus cultures (H1N1 & H3N2) were
provided by the Air Force Institute for Operational Health (AFIOH)
and sent to Virapure, LLC for amplification and quantification
prior to the study. According to the Certificate of Analysis, the
virus was amplified for two rounds on early passage MDCK cells
purchased from ATCC exclusively for this project, followed by virus
amplification in the presence of 1 .mu.g/ml TPCK treated trypsin.
At 48 hours after infection, the virus supernatants were collected
and adjusted to contain 3% sucrose, 20 mM Tris pH 7.1 and pH
stabilizer. Prior to delivery, Corning screw top cryovials were
filled with 200-300 .mu.l aliquots of supernatant. The titration of
each sample vial was performed according to procedure AM110, Plaque
Forming Assay of Influenza Virus. A titer of 1.2.times.10.sup.7
PFU/ml was obtained for Influenza A H3N2, K0717 SV5/SV40 RMK and
5.times.10.sup.6 PFU/ml for Influenza
A/California/2935/03/H1N1.
[0564] To generate a clinical collection control, a nasal wash
(0.9% Normal Saline) was obtained from an asymptomatic member of
the Clinical Division of the Advanced Diagnostic Laboratory at
Lackland AFB. From each Virapure pre-quantified titer, a
10-fold/6-series dilution of Influenza-spiked nasal aspirate was
generated beginning with and including the Virapure stock of each
strain. A total of 100 .mu.l of each influenza-spiked nasal wash
were processed from the dilution using the MasterPure Total Nucleic
Acid Extraction distributed by EPICENTRE. During the protocol, 150
.mu.l of 2.times. T and C Lysis Solution containing Proteinase K
was added to each 100 .mu.l dilution of the spiked nasal wash, and the
mixture was vortexed thoroughly and incubated at 65.degree. C. for 15 minutes.
Next, the samples were placed on ice for 5 minutes followed by the
addition of 150 .mu.l of MPC Protein Precipitation Reagent. The
samples were vortex mixed vigorously for 10 seconds. The mixture
was then precipitated by centrifugation at 10,000.times.g for 10
minutes and the supernatant transferred to a microcentrifuge tube.
To the recovered supernatant, 500 .mu.l of isopropanol was added
and the tube was inverted 30-40 times.
[0565] The mixture was centrifuged at 4.degree. C. for 10 minutes
and the isopropanol decanted without dislodging the pellet. Next,
the pellet was rinsed twice with 75% ethanol and dried at room
temperature. Once dried, the nucleic acid pellet was resuspended in
35 .mu.l of TE Buffer. Each sample was properly labeled and placed
in dry ice prior to shipment to the Naval Research Laboratory.
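As a rough illustration of the spiking scheme described above, the following Python sketch tabulates a 10-fold, six-member dilution series starting from each Virapure stock titer. It assumes, although this is not stated explicitly above, that the undiluted stock counts as the first member of the series and that the PFU processed per extraction equals the titer multiplied by the 100 .mu.l (0.1 ml) aliquot volume. The function name dilution_series is hypothetical and is used only for this example.

```python
def dilution_series(stock_pfu_per_ml, steps=6, fold=10.0, aliquot_ml=0.1):
    """Return (dilution factor, PFU/ml, PFU per aliquot) for each series member.

    Assumption: member 0 is the undiluted stock, and each later member is a
    further `fold`-fold dilution of the previous one.
    """
    series = []
    for i in range(steps):
        factor = fold ** i
        titer = stock_pfu_per_ml / factor
        series.append((factor, titer, titer * aliquot_ml))
    return series


# Stock titers reported in the Certificate of Analysis (PFU/ml).
STOCKS = {
    "Influenza A H3N2": 1.2e7,
    "Influenza A/California/2935/03/H1N1": 5e6,
}

for name, titer in STOCKS.items():
    print(name)
    for factor, pfu_per_ml, pfu_per_aliquot in dilution_series(titer):
        print(f"  1:{factor:<8.0f} {pfu_per_ml:10.3g} PFU/ml "
              f"{pfu_per_aliquot:10.3g} PFU per 100 ul")
```

Under these assumptions the most dilute member of the H3N2 series carries on the order of ten PFU per 100 .mu.l aliquot, which is the same order of magnitude as the lowest spike level tested below.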
[0566] Each of three 100 .mu.l normal nasal wash aliquots was
spiked with a serial dilution (ranging from 10.sup.5 pfu to
10.sup.1 pfu) of influenza A H3N2 viral cultures. Total nucleic acids
were extracted with EPICENTRE MasterPure.TM. DNA Purification Kits
(Madison, Wis.), as in Example 5, and suspended in 40 .mu.l
H.sub.2O. Random RT-PCR was applied as follows, according to the
protocol described by Kessler et al. (2004):
[0567] 2 .mu.l of total nucleic acid was reverse transcribed by
using 1 .mu.l primer D (40 pmol/.mu.l; 5'-GTT TCC CAG TCA CGA TCN
NNN NNN NN; SEQ ID NO: 573) and 1 .mu.l SuperScript.TM. III Reverse
Transcriptase (50 units/.mu.l; Invitrogen, Carlsbad, Calif.) in 20
.mu.l. The RT (reverse transcription) reaction was performed at
42.degree. C. for 1 hr and then was inactivated at 70.degree. C.
for 15 min. Then 10 .mu.l RT products were amplified with 1 .mu.l
primer E (100 pmol/.mu.l; 5'-GTT TCC CAG TCA CGA TC; SEQ ID NO:
574) and 2.5 units TaqPlus Long polymerase (5 units/.mu.l,
Stratagene, La Jolla, Calif.) in 50 .mu.l. PCR was conducted with
35 cycles of 94.degree. C. for 30 s, 40.degree. C. for 30 s,
50.degree. C. for 30 s, and 72.degree. C. for 160 s, followed by an
additional 7 minutes at 72.degree. C. PCR products from 10.sup.5,
10.sup.3 and 10.sup.1 pfu flu spiking samples were applied to three
different V1RPM pathogen microarrays for hybridization according to
the Affymetrix CustomSeq protocol.
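The relationship between the two primers and the length of the cycling program above can be checked with a short Python sketch. It uses only the sequences and hold times stated in this paragraph; the function names are hypothetical, and ramp times between temperatures are ignored, so the computed duration is a lower bound rather than an actual instrument run time.

```python
# Primer sequences as given above (SEQ ID NOs: 573 and 574), spaces removed.
PRIMER_D = "GTTTCCCAGTCACGATCNNNNNNNNN"
PRIMER_E = "GTTTCCCAGTCACGATC"

# One PCR cycle as (temperature in deg C, hold time in seconds).
CYCLE_STEPS = [(94, 30), (40, 30), (50, 30), (72, 160)]
N_CYCLES = 35
FINAL_EXTENSION_S = 7 * 60  # additional 7 minutes at 72 deg C


def random_tail_length(primer_d, primer_e):
    """Length of the degenerate 3' tail that primer D adds to the primer E sequence."""
    if not primer_d.startswith(primer_e):
        raise ValueError("primer D does not begin with the primer E sequence")
    tail = primer_d[len(primer_e):]
    if set(tail) != {"N"}:
        raise ValueError("tail is empty or not fully degenerate")
    return len(tail)


def total_hold_seconds(steps, n_cycles, final_extension_s):
    """Sum of programmed hold times only; temperature ramping is ignored."""
    return n_cycles * sum(seconds for _, seconds in steps) + final_extension_s


tail = random_tail_length(PRIMER_D, PRIMER_E)
hold = total_hold_seconds(CYCLE_STEPS, N_CYCLES, FINAL_EXTENSION_S)
print("degenerate tail length (nt):", tail)
print("programmed PCR hold time (min):", round(hold / 60, 1))
```

The check makes explicit that primer D is the primer E sequence followed by a random nonamer, so every product primed by D in the RT step carries a fixed 5' tag that primer E can then amplify.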
[0568] The V1RPM chip scans for the three spike-in samples,
10.sup.5, 10.sup.3, and 10.sup.1 pfu, amplified by this random
RT-PCR method, are shown in FIG. 8 A-C, respectively. All samples
resulted in clear sequence calls in at least some of the Influenza
A tile regions, indicating that the random RT-PCR protocol
successfully amplified flu genomes in the spiking samples even
without interference of human genomic DNA. Based on the GDAS
outputs (using "permissive" settings), REPI identified the
following 4 top "hits" from each sample (Tables 36-38). The top
four hits for the 10.sup.5 (Table 36) and 10.sup.3 pfu (Table 37)
samples assigned the top ranking to the correct Fujian/411/2002
strain and were exactly the same as those
obtained for the Fujian/411/2002 (+) clinical sample
NW20031114-05-02 ACID04 -B2, amplified by influenza universal
primers (shown in Table 25). This result indicates that random
RT-PCR has similar amplification power to the Influenza A-specific
PCR when the titer of virus in the sample is relatively high. With a low
titer of flu virus (10.sup.1 pfu), the percentage of the tile region
that hybridized and the percentage of base calls decreased. This
resulted in a slight skewing of the rankings, making
Fujian/411/2002 tied for the second highest-ranking strain. However, the
V1RPM was still able to read the spiked virus as an H3N2 Pretoria
strain, which was originally derived from and is almost identical to
the Fujian/411/2002 strain.
[0569] Because of the success of random RT-PCR in this experiment, it
is believed that this protocol will work with flu clinical samples
and with other pathogenic viruses.

TABLE-US-00039 TABLE 36 FluAHA3: Influenza A H3N2 spiked samples (10.sup.5 pfu)
Accession # | Name | E value | Bit Score
lcl|ISDN38157 | InfluenzaA/Fujian/411/2002_Hemagglutinin_104 | 0.0 | 1400.04
gi|37530025|gb|AY389356.1 | Influenza A virus (A/Middleburg/41/03(H3N2)) hemagglutinin (HA) gene | 0.0 | 1400.04
gi|37530033|gb|AY389360.1 | Influenza A virus (A/Pretoria/17/03(H3N2)) hemagglutinin (HA) gene | 0.0 | 1392.11
gi|37530031|gb|AY389359.1 | Influenza A virus (A/Pretoria/16/03(H3N2)) hemagglutinin (HA) gene | 0.0 | 1392.11
[0570] TABLE-US-00040 TABLE 37 FluAHA: Influenza A H3N2 spiked samples (10.sup.3 pfu)
Accession # | Name | E value | Bit Score
lcl|ISDN38157 | InfluenzaA/Fujian/411/2002_Hemagglutinin_104 | 0.0 | 1471.4
gi|37530025|gb|AY389356.1 | Influenza A virus (A/Middleburg/41/03(H3N2)) hemagglutinin (HA) gene | 0.0 | 1471.4
gi|37530033|gb|AY389360.1 | Influenza A virus (A/Pretoria/17/03(H3N2)) hemagglutinin (HA) gene | 0.0 | 1463.47
gi|37530031|gb|AY389359.1 | Influenza A virus (A/Pretoria/16/03(H3N2)) hemagglutinin (HA) gene | 0.0 | 1463.47
[0571] TABLE-US-00041 TABLE 38 FluAHA3: Influenza A H3N2 spiked samples (10.sup.1 pfu)
Accession # | Name | E value | Bit Score
gi|37530033|gb|AY389360.1 | Influenza A virus (A/Pretoria/17/03(H3N2)) hemagglutinin (HA) gene | 0.0 | 686.389
gi|37530031|gb|AY389359.1 | Influenza A virus (A/Pretoria/16/03(H3N2)) hemagglutinin (HA) gene | 0.0 | 686.389
gi|37530029|gb|AY389358.1| | Influenza A virus (A/Pretoria/2/03(H3N2)) hemagglutinin (HA) gene | 0.0 | 686.389
gi|37530025|gb|AY389356.1| | Influenza A virus (A/Middleburg/41/03(H3N2)) hemagglutinin (HA) gene | 0.0 | 686.389
lcl|ISDN38157 | InfluenzaA/Fujian/411/2002_Hemagglutinin_104 | 0.0 | 678.459
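The effect of tied bit scores on the rankings in Tables 36-38 can be illustrated with a minimal Python sketch. It applies a simple dense ranking (equal scores share a rank) to the bit scores listed in Table 38. This is only an illustration: the actual ranking logic used by REPI is not described in this example, and the function name dense_rank and the shortened strain labels are hypothetical.

```python
def dense_rank(hits):
    """Map each hit name to a rank by bit score; equal scores share a rank."""
    distinct_scores = sorted({score for _, score in hits}, reverse=True)
    rank_of_score = {score: i + 1 for i, score in enumerate(distinct_scores)}
    return {name: rank_of_score[score] for name, score in hits}


# (strain label, bit score) pairs copied from Table 38 (10^1 pfu sample).
TABLE_38 = [
    ("A/Pretoria/17/03(H3N2)", 686.389),
    ("A/Pretoria/16/03(H3N2)", 686.389),
    ("A/Pretoria/2/03(H3N2)", 686.389),
    ("A/Middleburg/41/03(H3N2)", 686.389),
    ("A/Fujian/411/2002", 678.459),
]

ranks = dense_rank(TABLE_38)
for name, score in TABLE_38:
    print(f"rank {ranks[name]}  {score:8.3f}  {name}")
```

With these inputs the four strains that share the highest bit score take rank 1 and Fujian/411/2002 falls to rank 2, which matches the slight skewing of the rankings noted for the low-titer sample.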
[0572] This example is significant because it shows that a
completely non-biased amplification method, derived from a June
2001 public domain protocol, may be used as a preparative step for
resequencing microarrays, resulting in PCR-level sensitivity
(10.sup.1 target copies). Although this amplification protocol has
been used for several published examples of long oligomer (70-mer)
microarray-based assays, it is not obvious that it would suffice
for resequencing microarray approaches. Numerous modifications and
variations on the present invention are possible in light of the
above teachings. It is, therefore, to be understood that within the
scope of the accompanying claims, the invention may be practiced
otherwise than as specifically described herein.
EXAMPLE 10
Other Pathogen Tests
[0573] In summary, the following pathogens were
detected/discriminated from laboratory and/or clinical samples on
RPMV1 (in aggregate representing >180 chips tested), with 100%
concordance of identification and input identity, and with no type
cross-hybridizations, except to parts of sequence tiles
representing homologous sequences between type variant strains, in:
TABLE-US-00042
Human adenoviruses group C
  Ad5 prototype
  Ad5 USA field strain
Human adenoviruses group E
  Ad4 prototype
  Ad4 AF field strain
  Ad4 vaccine
  Ad4 USN field strain
Human adenoviruses group B
  Ad3 prototype
  Ad3 USN field strain
  Ad7 prototype
  Ad7 USN field strain
  Ad7 vaccine
  Ad16 prototype
  Ad21 prototype
  Ad21 field strain
Influenza A
  H1N1
  H3N2
Variola Major (smallpox)
Lassa Virus
Ebola Virus
Streptococcus pneumoniae
Streptococcus pyogenes
Chlamydia pneumoniae
Mycoplasma pneumoniae
Neisseria meningitidis
Bacillus anthracis
Yersinia pestis
Francisella tularensis
[0574] Selected examples of RPMV1 microarray results for prepared
non-infectious laboratory samples of Variola Major (smallpox),
Ebola Virus, B. anthracis, Y. pestis, and F. tularensis have been
obtained, but are not shown here. Variola Major (smallpox), Ebola
Virus, and F. tularensis samples were prepared using specific PCR
primers. B. anthracis and Y. pestis samples were amplified using
GenomiPhi reagents (as described in Example 3).
EXAMPLE 11
Features of Sequenced Adenoviruses
[0575] The following tables present the sequence features and
putative gene products for the sequenced adenoviruses appearing in
Table 6 and SEQ ID NOs: 575-587.

TABLE-US-00043 TABLE 39 Ad3 (Accession No. AY599834; SEQ ID NO: 575) Sequence Features and putative gene products
Feature Type | Location | Comment or Gene Locus | Product
repeat_region | 1 . . . 134 | |
promoter | 480 . . . 485 | "TATA box for the E1A genes" |
CDS | Join (576 . . . 662, 1250 . . . 1351) | gene = "E1A" | "6 KD protein"
CDS | Join (576 . . . 1155, 1250 . . . 1455) | gene = "E1A" | "29.1 kD protein"
CDS | Join (576 . . . 1062, 1250 . . . 1455) | gene = "E1A" | "25K protein"
polyA_site | 1494 . . . 1499 | "polyA site for E1A" |
promoter | 1549 . . . 1554 | "TATA box for the E1B genes" |
CDS | 1603 . . . 2139 | gene = "E1B" | "19 kD small T antigen"
CDS | 1908 . . . 3386 | gene = "E1B" | 55K protein
Promoter | 3384 . . . 3389 | "TATA box for pIX" |
polyA_site | 3450 . . . 3455 | "polyA site for E1B" |
CDS | 3480 . . . 3896 | gene = "protein IX" | "protein IX"
polyA_site | 3913 . . . 3918 | "polyA site for pIX" |
polyA_site | Complement (3925 . . . 3930) | "polyA site for E2B" |
CDS | Complement (join (3948 . . . 5281, 5560 . . . 5572)) | gene = "E2B" | "IVA2 maturation protein"
CDS | Complement (5051 . . . 8419) | gene = "E2B" | "DNA polymerase"
CDS | 5123 . . . 5692 | gene = "hypothetical" | "hypothetical 20.6 KD early protein"
Promoter | 5872 . . . 5877 | "TATA box for the Major Late Promoter" |
CDS | Complement (6868 . . . 7389) | gene = "hypothetical" | "hypothetical 19 kD protein"
CDS | 7829 . . . 8425 | gene = "L1" | "probable DNA binding agnoprotein"
CDS | Complement (join (8422 . . . 10389, 13848 . . . 13856)) | gene = "pTP" | "DNA terminal protein precursor"
CDS | 9757 . . . 10032 | gene = "hypothetical" | "hypothetical 9.7 kD protein"
misc_RNA | 10422 . . . 10595 | "virus-associated RNA I" |
CDS | 10890 . . . 12026 | gene = "L1" | "55 KD protein"
CDS | 12051 . . . 13817 | gene = "L1" | "pIIIa protein"
polyA_site | 13830 . . . 13835 | "polyA site for L1" |
CDS | 13905 . . . 15539 | gene = "L2" | "penton base protein"
CDS | 15553 . . . 16131 | gene = "L2" | "pVII"
CDS | 16174 . . . 17223 | gene = "L2" | "protein V"
polyA_site | 17498 . . . 17503 | "polyA site for L2" |
CDS | 17555 . . . 18001 | gene = "L3" | "protein VI"
CDS | 18420 . . . 21254 | gene = "L3" | "hexon protein"
CDS | 21291 . . . 21920 | gene = "L3" | "23K proteinase"
polyA_site | 21940 . . . 21945 | "polyA site for L3" |
polyA_site | Complement (21952 . . . 21957) | "polyA site for E2A" |
CDS | Complement (22006 . . . 23559) | gene = "E2A" | "DNA binding protein"
CDS | 23590 . . . 26076 | gene = "L4" | "hexon assembly-associated 100 KD protein"
CDS | 25778 . . . 26375 | gene = "L4" | "22K protein"
CDS | Join (25778 . . . 26159, 26296 . . . 26651) | gene = "L4" | "33 KD protein"
CDS | 26722 . . . 27405 | gene = "L4" | "proteinVIII"
Promoter | 27087 . . . 27092 | "TATA box for the E3 genes" |
CDS | 27405 . . . 27725 | gene = "E3" | "12.5 KD protein"
CDS | 27679 . . . 28002 | gene = "E3" | "16 KD glycoprotein"
CDS | 28104 . . . 28622 | gene = "E3" | "18.5 protein"
CDS | 28652 . . . 29191 | gene = "E3" | "20.1 KD protein"
CDS | 29204 . . . 29773 | gene = "E3" | "20.1 KD protein, duplicate copy"
CDS | 29788 . . . 30021 | gene = "E3" | "9K glycoprotein"
CDS | 30063 . . . 30338 | gene = "E3" | "10.3 KD protein"
CDS | 30310 . . . 30747 | gene = "E3" | "14.9 KD protein"
CDS | 30740 . . . 31150 | gene = "E3" | "14 KD protein"
polyA_site | 31183 . . . 31188 | "polyA site for E3" |
CDS | Complement (31194 . . . 31355) | gene = "U" | "U protein"
CDS | 31370 . . . 32329 | gene = "L5" | "fiber protein"
polyA_site | Complement (32354 . . . 32359) | "polyA signal for E4" |
CDS | Complement (32370 . . . 32621) | gene = "E4" | "ORF6/7"
CDS | Complement (32618 . . . 33517) | gene = "E4" | "33.2 KD protein"
CDS | Complement (33420 . . . 33788) | gene = "E4" | "13.6 KD protein"
CDS | 33643 . . . 34152 | gene = "L5" | "agnoprotein"
CDS | Complement (33797 . . . 34150) | gene = "E4" | "11 KD protein"
CDS | Complement (34147 . . . 34536) | gene = "E4" | "14.3 KD protein"
polyA_site | 34561 . . . 34566 | "polyA signal for L5" |
CDS | Complement (34578 . . . 34955) | gene = "E4" | "13.9 KD protein"
Promoter | Complement (35037 . . . 35042) | "TATA box for the E4 promoter" |
repeat_region | Complement (35212 . . . 35345) | "the inverted terminal repeat" |
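The Location entries in these feature tables follow a GenBank-style syntax, with ranges, join (...) for spliced features, and complement (...) for features on the reverse strand. The following Python sketch shows one way such strings might be parsed to obtain the strand and the spliced length of a feature. It is a simplified illustration rather than a full feature-table parser: the function names are hypothetical, and the regular expression is written only to accept both the spaced ". . ." form printed here and the compact ".." form.

```python
import re

# Matches "576 . . . 662" as printed in these tables, and also "576..662".
RANGE = re.compile(r"(\d+)\s*(?:\.\s*){2,3}(\d+)")


def parse_location(text):
    """Return (strand, [(start, end), ...]) for a feature-table location string."""
    strand = "-" if "complement" in text.lower() else "+"
    segments = [(int(start), int(end)) for start, end in RANGE.findall(text)]
    if not segments:
        raise ValueError(f"no coordinate range found in {text!r}")
    return strand, segments


def spliced_length(text):
    """Total number of nucleotides covered by all segments (coordinates are inclusive)."""
    _, segments = parse_location(text)
    return sum(end - start + 1 for start, end in segments)


if __name__ == "__main__":
    # Two Location entries copied from Table 39 (Ad3).
    examples = [
        "Join (576 . . . 662, 1250 . . . 1351)",
        "Complement (join (3948 . . . 5281, 5560 . . . 5572))",
    ]
    for location in examples:
        strand, segments = parse_location(location)
        length = spliced_length(location)
        print(strand, segments, length, "nt,", length // 3, "codons")
```

A spliced CDS length that is a multiple of three, as in both examples, is a quick consistency check on the transcribed coordinates.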
[0576] TABLE-US-00044 TABLE 40 Ad3FS_navy (Accession No. AY599836; SEQ ID NO: 576) Sequence Features and putative gene products
Feature Type | Location | Comment or Gene Locus | Product
repeat_region | 1 . . . 111 | "the inverted terminal repeat" |
promoter | 480 . . . 485 | "TATA box for E1A" |
CDS | join (576 . . . 647, 1248 . . . 1349) | gene = "E1A" | "6.9k protein"
CDS | join (576 . . . 1062, 1248 . . . 1453) | gene = "E1A" | "25K protein"
CDS | join (576 . . . 1155, 1248 . . . 1453) | gene = "E1A" | "29K protein"
polyA_site | 1492 . . . 1497 | "polyA site for E1A" |
promoter | 1547 . . . 1552 | "TATA box for E1B" |
CDS | 1601 . . . 2137 | gene = "E1B" | "19K small T-antigen protein"
CDS | 1906 . . . 3384 | gene = "E1B" | "55K large T-antigen protein"
Promoter | 3428 . . . 3433 | "TATA box for pIX" |
polyA_site | 3448 . . . 3453 | "polyA signal for E1B" |
CDS | 3478 . . . 3894 | gene = "pIX" | "proteinIX"
polyA_site | 3907 . . . 3912 | "polyA signal for pIX" |
polyA_site | complement (3923 . . . 3928) | "polyA signal for E2B" |
CDS | complement (join (3946 . . . 5279, 5558 . . . 5570)) | gene = "E2B" | "proteinIVa2"
CDS | complement(5049 . . . 8417) | gene = "E2B" | "DNA polymerase"
Promoter | 5870 . . . 5875 | "TATA box for the Major Late Promoter" |
CDS | 6142 . . . 6462 | gene = "hypothetical" | "hypothetical 11.5K early protein"
CDS | 7131 . . . 7418 | gene = "hypothetical" | "hypothetical 10.4K early protein"
CDS | 7827 . . . 8423 | gene = "L1" | "probable DNA binding agnoprotein"
CDS | complement (8227 . . . 8562) | gene = "E2B" | "hypothetical 12.6K early protein"
CDS | complement (join (8420 . . . 10378, 13840 . . . 13848)) | gene = "E2B" | "Terminal protein precursor, aka Bellet's protein"
CDS | complement (9532 . . . 9846) | gene = "hypothetical" | "11.3K early protein"
CDS | 9746 . . . 10021 | gene = "hypothetical" | "9.7K early protein"
misc_RNA | 10411 . . . 10584 | "VA RNA I" |
misc_RNA | 10662 . . . 10838 | "VA RNA II" |
CDS | 10861 . . . 12018 | gene = "L1" | "55K protein"
CDS | 12043 . . . 13809 | gene = "L1" | "hexon associated proteinIIIa precursor"
polyA_site | 13822 . . . 13827 | "polyA signal for L1" |
CDS | 13897 . . . 15531 | gene = "L2" | "penton base protein"
CDS | 15545 . . . 16123 | gene = "L2" | "pVII 21K protein"
CDS | 16166 . . . 17218 | gene = "L2" | "pV from L2"
polyA_site | 17493 . . . 17498 | "polyA signal for L2" |
CDS | 17550 . . . 18302 | gene = "L3" | "pVI"
CDS | 18415 . . . 21243 | gene = "L3" | "hexon protein"
CDS | 21286 . . . 21915 | gene = "L3" | "23K proteinase"
polyA_site | 21935 . . . 21940 | "polyA signal for L3" |
CDS | complement (21947 . . . 21952) | |
CDS | complement (22002 . . . 23552) | "E2A" | "DNA binding protein"
CDS | 23583 . . . 26057 | gene = "L4" | "100K protein"
CDS | 25759 . . . 26358 | gene = "L4" | "22K protein"
CDS | join (25759 . . . 26107, 26289 . . . 26632) | gene = "L4" | "33K protein"
CDS | 26702 . . . 27385 | gene = "L4" | "pVIII, hexon associated protein precursor"
Promoter | 27067 . . . 27072 | "TATA box for the E3 gene" |
CDS | 27385 . . . 27705 | gene = "E3" | "12.2K glycoprotein"
CDS | 27659 . . . 28099 | gene = "E3" | "16K glycoprotein"
CDS | 28084 . . . 28602 | gene = "E3" | "18.5K glycoprotein"
CDS | 28632 . . . 29171 | gene = "E4" | "20.1K glycoprotein"
CDS | 29184 . . . 29753 | gene = "E3" | "20.1K glycoprotein"
CDS | 29985 . . . 30260 | gene = "E3" | "10.3K glycoprotein"
CDS | 30232 . . . 30669 | gene = "E3" | "14.9K glycoprotein"
CDS | 30662 . . . 31072 | gene = "E3" | "15.3K protein"
polyA_site | 31105 . . . 31110 | "polyA signal for E3" |
CDS | complement (31116 . . . 31277) | gene = "U exon" | "U exon protein"
CDS | 31292 . . . 32251 | gene = "L5" | "fiber protein"
polyA_site | complement (32276 . . . 32281) | "polyA signal for E4" |
CDS | complement (32292 . . . 32543) | gene = "E4" | "ORF6/7"
CDS | complement (32540 . . . 33439) | gene = "E4" | "34K protein"
CDS | complement (33342 . . . 33710) | gene = "E4" | "13.6K protein"
CDS | 33565 . . . 34074 | gene = "L5" | "DNA binding agnoprotein"
CDS | complement (33719 . . . 34072) | gene = "E4" | "ORF3"
CDS | complement (34069 . . . 34458) | gene = "E4" | "14.3K protein"
polyA_site | 34483 . . . 34488 | "polyA signal for L5" |
CDS | complement (34500 . . . 34877) | gene = "E4" | "13.9K protein"
Promoter | complement (34959 . . . 34964) | "TATA box for the E4 gene" |
repeat_region | complement (35155 . . . 35265) | "the inverted terminal repeat" |
[0577] TABLE-US-00045 TABLE 41 Ad4 (Accession No. AY594253; SEQ ID
NO: 577) Sequence Features and putative gene products Feature Type
Location Comment or Gene Locus Product repeat_unit 1 . . . 116
"ITR" promoter 472 . . . 511 "E1A" CDS 576 . . . 1154 gene = "E1A"
"ORF1: putative" CDS join (576 . . . 650, 1236 . . . 1340) gene =
"E1A" "early E1A 6.8 KD protein" similarity to Simian Adenovirus 25
11 kDa, GI: 17227344 CDS join (576 . . . 1142, 1235 . . . 1441)
gene = "E1A" "EARLY E1A 28 KD PROTEIN" similarity to Simian
Adenovirus 25 28.2 kDa, GI: 17227342 CDS join (576 . . . 1049, 1235
. . . 1441) gene = "E1A" "early E1A 24.6 KD protein" similarity to
Simian Adenovirus 25 24.8 kDa, GI: 17227343 polyA_signal 1499 . . .
1504 "E1A" promoter 1553 . . . 1592 "E1B" CDS join (1600 . . .
2001, 2003 . . . 2029) gene = "E1B" "small T-antigen (E1B 19K)" CDS
1600 . . . 2115 gene = "E1B" "early E1B 20 KD protein" similarity
to Simian Adenovirus 25 20.5 kDa, GI: 17227345 CDS join (1905 . . .
2123, 3259 . . . 3276) gene = "E1B" "early E1B 8.2 KD protein"
similarity to Simian Adenovirus 25 10.1 kDa, GI: 17227348 CDS 1905
. . . 3356 gene = "E1B" "large T antigen" similarity to Simian
Adenovirus 25 54.7 kDa, GI: 17227347 CDS join (1905 . . . 2153,
3141 . . . 3356) gene = "E1B" "early E1B 16.8 KD protein"
similarity to Simian Adenovirus 25 18.5 kDa, GI: 17227346 CDS 3441
. . . 3869 gene = "IX" "protein IX (hexon- associated protein)"
similarity to Simian Adenovirus 25 pIX, GI: 17227349 polyA_signal
3880 . . . 3885 "E1B and IX" polyA_signal complement (3902 . . .
3907) "E2B and IVa2" CDS complement (join (3930 . . . 5263, gene =
"IVa2" "IVa2 protein 5542 . . . 5554)) (maturation protein)"
similarity to Simian Adenovirus 25 pIVa2, GI: 17227350 CDS
complement (join (5033 . . . 8605, gene = "E2B(POL)" "DNA
polymerase" 12212 . . . 12220)) similarity to Simian Adenovirus 25
TPA: pol, GI: 33694808 CDS 5105 . . . 5674 gene = "E2B" "19.4 KD
early protein" similarity to Human Adenovirus 7 hypothetical 20.6
kD early protein, GI: 58522 CDS 6126 . . . 6446 gene = "E2B" "11.5
KD early protein" similarity to Human Adenovirus 7 hypothetical 11.5
kD early protein, GI: 58524 CDS 7814 . . . 8407 gene = "L1"
"DNA-binding protein (agnoprotein)" similarity to Simian Adenovirus
25 21.9 kDa, GI: 17227351 CDS join (7814 . . . 7819, 8536 . . .
8928) gene = "E2B" "14.1 KD early protein" similarity to Human
Adenovirus 7 hypothetical 14.5 kD early protein, GI: 58528 CDS
complement (join gene = "E2B(pTP)" "precursor terminal (8404 . . .
10323, 12212 . . . 12220)) protein" similarity to Simian Adenovirus
25 TPA: pTP, GI: 33694809 misc_RNA 10356 . . . 10514
"virus-associated RNA I" "VA RNA-I, 159 nt" promoter complement
(10457 . . . 10496) "E2B" promoter 10541 . . . 10580 "L1" misc_RNA
10575 . . . 10743 "virus-associated RNA II" "VA RNA-II, 169 nt" CDS
10765 . . . 11937 gene = "L1(52K)" "52K protein" similarity to
Simian Adenovirus 25 42.9 kDa (52K), GI: 17227352 polyA_signal
11942 . . . 11947 "L1(52K)" CDS 11961 . . . 13736 gene = "L1(IIIa)"
"protein IIIa" similarity to Simian Adenovirus 25 TPA: pIIIa, GI:
33694811 polyA_signal 13749 . . . 13754 "L1(IIIa)" promoter 13758 .
. . 13797 "L2" CDS 13815 . . . 15422 gene = "L2(penton)" "penton
protein(protein III)" similarity to Simian Adenovirus 25 pIII, GI:
17227353 polyA_signal 15425 . . . 15430 "L2(penton)" CDS 15426 . .
. 16007 gene = "L2(pVII)" "major core protein (protein VII)"
similarity to Simian Adenovirus 25 pVII, GI: 17227354 CDS 16055 . .
. 17080 gene = "L2(pV)" "minor core protein (protein V)" similarity
to Simian Adenovirus 25 TPA: pV, GI: 33694814 CDS 17103 . . . 17336
gene = "L2(pX)" "protein X (protein mu)" similarity to Simian
Adenovirus 25 TPA: pX, GI: 33694815 polyA_signal 17357 . . . 17362
"L2(pX)" CDS 17368 . . . 18141 gene = "L3(pVI)" "protein VI (hexon-
associated protein)" similarity to Simian Adenovirus 25 pVI, GI:
17227356 CDS 18248 . . . 21058 gene = "L3(hexon)" "hexon protein
(protein II)" similarity to Simian Adenovirus 25 pII (hexon), GI:
17227357 CDS 21082 . . . 21702 gene = "L3(23K)" "late L3 23K
proteinase (Adenain)" similarity to Simian Adenovirus 25 protease,
GI: 33694818 polyA_signal 21725 . . . 21730 "L3" polyA_signal
complement (21767 . . . 21772) "E2A" CDS complement (21774 . . .
23312) gene = "E2A(DBP)" "early E2A DNA- binding protein"
similarity to Simian Adenovirus 25 DNA- binding protein, GI:
17227358 CDS 23341 . . . 25716 gene = "L4(100K)" "100K protein"
similarity to Simian Adenovirus 25 TPA: 100K, GI: 33694820 Promoter
complement (23345 . . . 23384) "E2A" CDS 25439 . . . 25978 gene =
"L4(22K)" "22K protein" similarity to Simian Adenovirus 25 TPA:
22K, GI: 33694822 CDS join(25439 . . . 25756, 25926 . . . 26249)
gene = "L4(33K)" "33K protein" similarity to Simian Adenovirus 25
TPA: 33K (24.3 kDa), GI: 17227359 CDS 26318 . . . 27001 gene =
"L4(pVIII)" "L4 protein VIII" similarity to Simian Adenovirus 25
pVIII, GI: 17227360 CDS 27002 . . . 27322 gene = "E3" "E3 12.1 kDa
protein" similarity to Simian Adenovirus 25 11.6 kDa, GI: 17227361
CDS 27276 . . . 27908 gene = "E3" "E3 23.3 kDa protein" similarity
to Simian Adenovirus 25 TPA: E3 CR1-alpha1, GI: 33694825 CDS 27890
. . . 28414 gene = "E3" "E3 19 kDa protein" similarity to Simian
Adenovirus 25 TPA: E3 gp19K (19.3 kDa), GI: 33694826 CDS 28445 . .
. 29107 gene = "E3" "E3 24.8 kDa protein" similarity to Simian
Adenovirus 25 22.3 kDa, TPA: E3 CR1- gamma1, GI: 17227364 CDS 29275
. . . 29439 gene = "E3" "E3 6.3 kDa protein" CDS 29436 . . . 30260
gene = "E3" "E3 29.7 kDa protein" similarity to Simian Adenovirus
25 TPA: E3 CR1-delta1, GI: 33694829 CDS 30269 . . . 30544 gene =
"E3" "E3 10.4 kDa protein" similarity to Simian Adenovirus 25 9.9
kDa, GI: 17227365 CDS 30550 . . . 30990 gene = "E3" "E3 14.5 kDa
protein" similarity to Simian Adenovirus 25 15.6 kDa, GI: 17227366
CDS 30983 . . . 31384 gene = "E3" "E3 14.7 kDa protein" similarity
to Simian Adenovirus 25 14.7 kDa, GI: 17227367 polyA_signal 31424 .
. . 31429 "E3" promoter 31473 . . . 31512 "L5" CDS 31645 . . .
32922 gene = "L5(fiber)" "fiber protein" similarity to Simian
Adenovirus 25 pIV(TPA: fiber), GI: 17227368 polyA_signal 32982 . .
. 32987 "L5" polyA_signal complement (32998 . . . 33003) "E4" CDS
complement (33018 . . . 33212) gene = "E4" "E4 7.4 KD protein"
similarity to Simian Adenovirus 25 Ad9 ORF7-like protein, GI:
17227369 CDS complement (join gene = "E4" "E4 15.9 KD protein"
(33018 . . . 33266, 33992 . . . 34165)) similarity to Simian
Adenovirus 25 TPA: E4 ORF6/7, GI: 33694835 CDS complement (33266 .
. . 34165) gene = "E4" "E4 34.6 KD protein" similarity to Simian
Adenovirus 25 33 kDa, GI: 17227370 CDS complement (34068 . . .
34436) gene = "E4" "E4 14.1 KD protein" similarity to Simian
Adenovirus 25 13.2 kDa, GI: 17227371 polyA_signal complement (34388
. . . 34393) "E4" CDS complement (34445 . . . 34798) gene = "E4"
"E4 13.7 KD protein" similarity to Simian Adenovirus 25 12.8 kDa,
GI: 17227372 CDS complement (34795 . . . 35184) gene = "E4" "E4
14.6 KD protein" similarity to Simian Adenovirus 25 14.2 kD, GI:
17227373 CDS complement (35232 . . . 35606) gene = "E4" "E4 13.5 KD
protein"
similarity to Simian Adenovirus 25 TPA: E4 ORF1, GI: 33694840
repeat_unit complement (35875 . . . 35990) "ITR"
[0578] TABLE-US-00046 TABLE 42 Ad4vaccine (Accession No. AY594254;
SEQ ID NO: 578) Sequence Features and putative gene products
Feature Type Location Comment or Gene Locus Product repeat_unit 1 .
. . 116 "ITR" promoter 472 . . . 511 "E1A" CDS 576 . . . 1154 gene
= "E1A" "ORF1: putative" CDS join (576 . . . 650, 1236 . . . 1340)
gene = "E1A" "early E1A 6.8 KD protein" similarity to Simian
Adenovirus 25 11 kDa, GI: 17227344 CDS join (576 . . . 1142, 1235 .
. . 1441) gene = "E1A" "early E1A 28 KD protein" similarity to
Simian Adenovirus 25 28.2 kDa, GI: 17227342 CDS join (576 . . .
1049, 1235 . . . 1441) gene = "E1A" "early E1A 24.6 KD protein"
similarity to Simian Adenovirus 25 24.8 kDa, GI: 17227343
polyA_signal 1499 . . . 1504 "E1A" promoter 1553 . . . 1592 "E1B"
CDS Join (1600 . . . 2001, 2003 . . . 2029) gene = "E1B" "small
T-antigen (E1B 19K)" CDS 1600 . . . 2115 gene = "E1B" "early E1B 20
KD protein" similarity to Simian Adenovirus 25 20.5 kDa, GI:
17227345 CDS join (1905 . . . 2123, 3259 . . . 3276) gene = "E1B"
"early E1B 8.2 KD protein" similarity to Simian Adenovirus 25 10.1
kDa, GI: 17227348 CDS 1905 . . . 3356 gene = "E1B" "large T
antigen" similarity to Simian Adenovirus 25 54.7 kDa, GI: 17227347
CDS join (1905 . . . 2153, 3141 . . . 3356) gene = "E1B" "early E1B
16.8 KD protein" similarity to Simian Adenovirus 25 18.5 kDa, GI:
17227346 CDS 3441 . . . 3869 gene = "IX" "protein IX (hexon-
associated protein)" similarity to Simian Adenovirus 25 pIX, GI:
17227349 polyA_signal 3880 . . . 3885 "E1B and IX" polyA_signal
complement (3902 . . . 3907) "E2B and IVa2" CDS complement (join
(3930 . . . 5263, gene = "IVa2" "IVa2 protein 5542 . . . 5554))
(maturation protein)" similarity to Simian Adenovirus 25 pIVa2, GI:
17227350 CDS complement (join (5033 . . . 8605, gene = "E2B(POL)"
"DNA polymerase" 12212 . . . 12220)) similarity to Simian
Adenovirus 25 TPA: pol, GI: 33694808 CDS 5105 . . . 5674 gene =
"E2B" "19.4 KD early protein" similarity to Human
Adenovirus 7 hypothetical 20.6 kD early protein, GI: 58522 CDS 6126
. . . 6446 gene = "E2B" "11.5 KD early protein" similarity to
Human Adenovirus 7 hypothetical 11.5 kD early protein,
GI: 58524 CDS 7814 . . . 8407 gene = "L1" "DNA-binding protein
(agnoprotein)" similarity to Simian Adenovirus 25 21.9 kDa, GI:
17227351 CDS join (7814 . . . 7819, 8536 . . . 8928) gene = "E2B"
"14.1 KD early protein" similarity to Human Adenovirus 7
hypothetical 14.5 kD early protein, GI: 58528 CDS complement (join
gene = "E2B(pTP)" "precursor terminal (8404 . . . 10323, 12212 . .
. 12220)) protein" similarity to Simian Adenovirus 25 TPA: pTP, GI:
33694809 misc_RNA 10356 . . . 10514 "virus-associated RNA I" "VA
RNA-I, 159 nt" promoter complement (10457 . . . 10496) "E2B"
promoter 10541 . . . 10580 "L1" misc_RNA 10575 . . . 10743
"virus-associated RNA II" "VA RNA-II, 169 nt" CDS 10765 . . . 11937
gene = "L1(52K)" "52K protein" similarity to Simian Adenovirus 25
42.9 kDa (52K), GI: 17227352 polyA_signal 11942 . . . 11947
"L1(52K)" CDS 11961 . . . 13736 gene = "L1(IIIa)" "protein IIIa"
similarity to Simian Adenovirus 25 TPA: pIIIa, GI: 33694811
polyA_signal 13749 . . . 13754 "L1(IIIa)" promoter 13758 . . .
13797 "L2" CDS 13815 . . . 15422 gene = "L2(penton)" "penton
protein(protein III)" similarity to Simian Adenovirus 25 pIII, GI:
17227353 polyA_signal 15425 . . . 15430 "L2(penton)" CDS 15426 . .
. 16007 gene = "L2(pVII)" "major core protein (protein VII)"
similarity to Simian Adenovirus 25 pVII, GI: 17227354 CDS 16055 . .
. 17080 gene = "L2(pV)" "minor core protein (protein V)" similarity
to Simian Adenovirus 25 TPA: pV, GI: 33694814 CDS 17103 . . . 17336
gene = "L2(pX)" "protein X (protein mu)" similarity to Simian
Adenovirus 25 TPA: pX, GI: 33694815 polyA_signal 17357 . . . 17362
"L2(pX)" CDS 17368 . . . 18141 gene = "L3(pVI)" "protein VI (hexon-
associated protein)" similarity to Simian Adenovirus 25 pVI, GI:
17227356 CDS 18248 . . . 21058 gene = "L3(hexon)" "hexon protein
(protein II)" similarity to Simian Adenovirus 25 pII (hexon), GI:
17227357 CDS 21082 . . . 21702 gene = "L3(23K)" "late L3 23K
proteinase (Adenain)" similarity to Simian Adenovirus 25 protease,
GI: 33694818 polyA_signal 21725 . . . 21730 "L3" polyA_signal
complement (21767 . . . 21772) "E2A" CDS complement (21774 . . .
23312) gene = "E2A(DBP)" "early E2A DNA- binding protein"
similarity to Simian Adenovirus 25 DNA- binding protein, GI:
17227358 CDS 23341 . . . 25716 gene = "L4(100K)" "100K protein"
similarity to Simian Adenovirus 25 TPA: 100K, GI: 33694820 promoter
complement (23345 . . . 23384) "E2A" CDS 25439 . . . 25978 gene =
"L4(22K)" "22K protein" similarity to Simian Adenovirus 25 TPA:
22K, GI: 33694822 CDS join (25439 . . . 25756, gene = "L4(33K)"
"33K protein" 25926 . . . 26252) similarity to Simian Adenovirus 25
TPA: 33K (24.3 kDa), GI: 17227359 CDS 26321 . . . 27004 gene =
"L4(pVIII)" "L4 protein VIII" similarity to Simian Adenovirus 25
pVIII, GI: 17227360 CDS 27005 . . . 27325 gene = "E3" "E3 12.1 kDa
protein" similarity to Simian Adenovirus 25 11.6 kDa, GI: 17227361
CDS 27279 . . . 27911 gene = "E3" "E3 23.3 kDa protein" similarity
to Simian Adenovirus 25 TPA: E3 CR1-alpha1, GI: 33694825 CDS 27893
. . . 28417 gene = "E3" "E3 19 kDa protein" similarity to Simian
Adenovirus 25 TPA: E3 gp19K (19.3 kDa), GI: 33694826 CDS 28449 . .
. 29111 gene = "E3" "E3 24.8 kDa protein" similarity to Simian
Adenovirus 25 22.3 kDa, TPA: E3 CR1- gamma1, GI: 17227364 CDS 29279
. . . 29443 gene = "E3" "E3 6.3 kDa protein" CDS 29440 . . . 30264
gene = "E3" "E3 29.7 kDa protein" similarity to Simian Adenovirus
25 TPA: E3 CR1-delta1, GI: 33694829 CDS 30273 . . . 30548 gene =
"E3" "E3 10.4 kDa protein" similarity to Simian Adenovirus 25 9.9
kDa, GI: 17227365 CDS 30554 . . . 30994 gene = "E3" "E3 14.5 kDa
protein" similarity to Simian Adenovirus 25 15.6 kDa, GI: 17227366
CDS 30987 . . . 31388 gene = "E3" "E3 14.7 kDa protein" similarity
to Simian Adenovirus 25 14.7 kDa, GI: 17227367 polyA_signal 31428 .
. . 31433 "E3" promoter 31477 . . . 31516 "L5" CDS 31649 . . .
32926 gene = "L5(fiber)" "fiber protein" similarity to Simian
Adenovirus 25 pIV(TPA: fiber), GI: 17227368 polyA_signal 32986 . .
. 32991 "L5" polyA_signal complement (33002 . . . 33007) "E4" CDS
complement (33022 . . . 33216) gene = "E4" "E4 7.4 KD protein"
similarity to Simian Adenovirus 25 Ad9 ORF7-like protein, GI:
17227369 CDS complement (join gene = "E4" "E4 15.9 KD protein"
(33022 . . . 33270, 33996 . . . 34169)) similarity to Simian
Adenovirus 25 TPA: E4 ORF6/7, GI: 33694835 CDS complement (33270 .
. . 34169) gene = "E4" "E4 34.6 KD protein" similarity to Simian
Adenovirus 25 33 kDa, GI: 17227370 CDS complement (34072 . . .
34440) gene = "E4" "E4 14.1 KD protein" similarity to Simian
Adenovirus 25 13.2 kDa, GI: 17227371 polyA_signal complement (34392
. . . 34397) "E4" CDS complement (34449 . . . 34802) gene = "E4"
"E4 13.7 KD protein" similarity to Simian Adenovirus 25 12.8 kDa,
GI: 17227372 CDS complement (34799 . . . 35188) gene = "E4" "E4
14.6 KD protein" similarity to Simian Adenovirus 25 14.2 kD,
GI: 17227373 CDS complement (35236 . . . 35610) gene = "E4" "E4
13.5 KD protein" similarity to Simian Adenovirus 25 TPA: E4 ORF1,
GI: 33694840 repeat_unit complement (35879 . . . 35994) "ITR"
[0579] TABLE-US-00047 TABLE 43 Ad4FS_navy (Accession No. AY599835;
SEQ ID NO: 579) Sequence Features and putative gene products
Feature Type Location Comment or Gene Locus Product repeat_region 1
. . . 208 "ITR" CDS 576 . . . 1121 gene = "E1A" "ORF1: putative"
CDS Join (576 . . . 650, 1203 . . . 1307) gene = "E1A" "early E1A
6.8 KD protein" similarity to Simian Adenovirus 25 11
kDa, GI: 17227344 CDS join (576 . . . 1109, 1202 . . . 1408) gene =
"E1A" "early E1A 27 KD protein" similarity to Simian Adenovirus 25
28.2 kDa, GI: 17227342 CDS join (576 . . . 1016, 1202 . . . 1408)
gene = "E1A" "early E1A 23.5 KD protein" similarity to Simian
Adenovirus 25 24.8 kDa, GI: 17227343 polyA_signal 1464 . . . 1469
"E1A" promoter 1518 . . . 1557 "E1B" CDS Join (1565 . . . 1966,
1968 . . . 1994) gene = "E1B" "small T antigen" CDS 1565 . . . 2119
gene = "E1B" "early E1B 21.5 KD protein" standard_name - Simian
Adenovirus 25 20.5 kDa, GI: 17227345 CDS join (1870 . . . 2127,
3263 . . . 3280) gene = "E1B" "early E1B 9 KD protein" similarity
to Simian Adenovirus 25 10.1 kDa, GI: 17227348 CDS 1870 . . . 3360
gene = "E1B" "large T antigen (55 KD)" similarity to Simian
Adenovirus 25 54.7 kDa, GI: 17227347 CDS join (1870 . . . 2157,
3145 . . . 3360) gene = "E1B" "early E1B 17.9 KD protein"
similarity to Simian Adenovirus 25 18.5 kDa, GI: 17227346 CDS 3444
. . . 3872 gene = "IX" "protein IX (hexon-associated protein)"
polyA_signal 3882 . . . 3887 "E1B and IX" polyA_signal Complement
(3904 . . . 3909) "E2B and IVa2" CDS complement (join (3935 . . . 5268,
5547 . . . 5559)) gene = "IVa2" "IVa2 protein (maturation protein)"
CDS complement (join (5038 . . . 8610, 12152 . . . 12160)) gene =
"E2B(POL)" "DNA polymerase" CDS 5110 . . . 5679 gene = "E2B" "19.4 KD
early protein" similarity to Human Adenovirus 7 hypothetical 20.6 kD
early protein, GI: 58522 CDS 6131 . . . 6451 gene = "E2B" "11.5 KD
early protein" similarity to Human Adenovirus 7 hypothetical 11.5 kD
early protein,
GI: 58524 CDS 7819 . . . 8412 gene = "L1" "DNA-binding protein
(agnoprotein)" similarity to Simian Adenovirus 25 21.9 kDa, GI:
17227351 CDS join (7819 . . . 7824, 8541 . . . 8933) gene = "E2B"
"14.1 KD early protein" similarity to Human Adenovirus 7
hypothetical 14.5 KD early protein, GI: 58528 CDS complement (join
(8409 . . . 10328, 12152 . . . 12160)) gene = "E2B(pTP)" "precursor
terminal protein" misc_RNA 10361 . . . 10519 "virus-associated RNA
I" "VA RNA-I, 159 nt" promoter complement(10462 . . . 10501) "E2B"
promoter 10542 . . . 10581 "L1" misc_RNA 10584 . . . 10684
"virus-associated RNA II" "VA RNA-II, 101 nt" CDS 10705 . . . 11877
gene = "L1(52K)" "52K protein" polyA_signal 11882 . . . 11887
"L1(52K)" CDS 11901 . . . 13682 gene = "L1(IIIa)" "protein IIIa"
polyA_signal 13696 . . . 13701 "L1(IIIa)" promoter 13755 . . .
13794 "L2" CDS 13762 . . . 15369 gene = "L2(penton)" "penton
protein(protein III)" polyA_signal 15372 . . . 15377 "L2(penton)"
CDS 15373 . . . 15954 gene = "L2(pVII)" "major core protein
(protein VII)" CDS 16002 . . . 17027 gene = "L2(pV)" "minor core
protein (protein V)" CDS 17050 . . . 17283 gene = "L2(pX)" "protein
X (protein mu)" CDS 17343 . . . 18074 gene = "L3(pVI)" "protein VI
(hexon-associated protein)" CDS 18181 . . . 20991 gene =
"L3(hexon)" "hexon protein (protein II)" CDS 21015 . . . 21635 gene
= "L3(23K)" "23K protease" polyA_signal 21661 . . . 21666 "L3"
polyA_signal complement (21703 . . . 21708) "E2A" CDS complement
(21710 . . . 23242) gene = "E2A(DBP)" "early E2A DNA-binding
protein" promoter complement (23170 . . . 23209) "E2A" promoter
23211 . . . 23250 "L4" CDS 23271 . . . 25634 gene = "L4(100K)"
"100K protein" CDS 25369 . . . 25884 gene = "L4(22K)" "22K protein"
CDS join (25369 . . . 25674, 25844 . . . 26158) gene = "L4(33K)" "33K
protein" CDS 26226 . . . 26909 gene = "L4(pVIII)" "L4 protein
VIII" CDS 26910 . . . 27230 gene = "E3" "E3 12.1 kDa protein" CDS
27184 . . . 27816 gene = "E3" "E3 23.3 kDa protein" CDS 27798 . . .
28322 gene = "E3" "E3 19 kDa protein" CDS 28352 . . . 28987 gene =
"E3" "E3 24.8 kDa protein" CDS 29296 . . . 30105 gene = "E3" "E3
29.7 kDa protein" CDS 30114 . . . 30389 gene = "E3" "E3 10.4 kDa
protein" CDS 30395 . . . 30835 gene = "E3" "E3 14.5 kDa protein"
CDS 30828 . . . 31229 gene = "E3" "E3 14.7 kDa protein"
polyA_signal 31279 . . . 31284 "E3" promoter 31292 . . . 31331 "L5"
CDS 31464 . . . 32741 gene = "L5(fiber)" "fiber protein"
polyA_signal 32801 . . . 32806 "L5" polyA_signal complement (32817
. . . 32822) "E4" CDS complement (32837 . . . 33031) gene = "E4"
"E4 7.4 KD protein" CDS complement (join (32837 . . . 33085, 33811 . .
. 33984)) gene = "E4" "E4 15.9 KD protein" CDS complement
(33085 . . . 33984) gene = "E4" "E4 34.6 KD protein" CDS complement
(33887 . . . 34255) gene = "E4" "E4 14.1 KD protein" polyA_signal
complement (34207 . . . 34212) "E4" CDS complement (34264 . . .
34617) gene = "E4" "E4 13.7 KD protein" CDS complement (34614 . . .
35003) gene = "E4" "E4 14.6 KD protein" CDS complement (35051 . . .
35425) gene = "E4" "E4 13.5 KD protein" repeat_region
complement(35758 . . . 35965) "ITR"
[0580] TABLE-US-00048 TABLE 44 Ad4FS_AF (Accession No. AY599837;
SEQ ID NO: 580) Sequence Features and putative gene products
Feature Type Location Comment or Gene Locus Product repeat_region 1
. . . 208 "ITR" CDS 575 . . . 1120 gene = "E1A" "ORF1; putative"
CDS Join (575 . . . 649, 1202 . . . 1306) gene = "E1A" "early E1A
6.8 KD protein" similarity to Simian Adenovirus 25 11 kDa, GI:
17227344 CDS Join (575 . . . 1108, 1201 . . . 1407) gene = "E1A"
"early E1A 27 KD protein" similarity to Simian Adenovirus 25 28.2
kDa, GI: 17227342 CDS Join (575 . . . 1015, 1201 . . . 1407) gene =
"E1A" "early E1A 23.5 KD protein" similarity to Simian Adenovirus
25 24.8 kDa, GI: 17227343 polyA_signal 1463 . . . 1468 "E1A"
promoter 1517 . . . 1556 "E1B" CDS join (1564 . . . 1965, 1976 . .
. 1993) gene = "E1B" "small T antigen (16.5 KD)" CDS 1564 . . .
2115 gene = "E1B" "early E1B 21.5 KD protein" similarity to Simian
Adenovirus 25 20.5 kDa, GI: 17227345 CDS join (1869 . . . 2123,
3258 . . . 3269) gene = "E1B" "early E1B 9 KD protein" similarity
to Simian Adenovirus 25 10.1 kDa, GI: 17227348 CDS join (1869 . . .
2615, 2618 . . . 3355) gene = "E1B" "large T antigen (55 KD)"
similarity to Simian Adenovirus 25 54.7 kDa, GI: 17227347 CDS join
(1869 . . . 2153, 3140 . . . 3355) gene = "E1B" "early E1B 17.9 KD
protein" similarity to Simian Adenovirus 25 18.5 kDa, GI: 17227346
CDS 3439 . . . 3867 gene = "IX" "protein IX (hexon-associated
protein)" polyA_signal 3877 . . . 3882 "E1B and IX" polyA_signal
complement (3899 . . . 3904) "E2B and IVa2" CDS complement (join
(3931 . . . 5264, 5543 . . . 5555)) gene = "IVa2" "IVa2 protein
(maturation protein)" CDS complement (join (5034 . . . 8606, 12151 . .
. 12159)) gene = "E2B(POL)" "DNA polymerase" CDS 5106 . . . 5675
gene = "E2B" "19.4 KD early protein" similarity to Human Adenovirus 7
hypothetical 20.6 kD early protein, GI: 58522
CDS 6127 . . . 6447 gene = "E2B" "11.5 KD early protein" similarity
to Human Adenovirus 7 hypothetical 11.5 kD early protein,
GI: 58524 CDS 7815 . . . 8408 gene = "L1" "DNA-binding protein
(agnoprotein)" similarity to Simian Adenovirus 25 21.9 kDa, GI:
17227351 CDS join (7815 . . . 7820, 8537 . . . 8929) gene = "E2B"
"14.1 KD early protein" similarity to Human Adenovirus 7
hypothetical 14.5 KD early protein, GI: 58528 CDS complement (join
(8405 . . . 10324, 12151 . . . 12159)) gene = "E2B" "precursor
terminal protein" misc_RNA 10357 . . . 10515 "virus-associated RNA
I" "VA RNA-I, 159 nt" promoter complement (10458 . . . 10497) "E2B"
promoter 10541 . . . 10580 "L1" misc_RNA 10583 . . . 10683
"virus-associated RNA II" "VA RNA-II, 101 nt" CDS 10704 . . . 11876
gene = "L1(52K)" "52K protein" polyA_signal 11881 . . . 11886
"L1(52K)" CDS 11900 . . . 13681 gene = "L1(IIIa)" "protein IIIa"
polyA_signal 13695 . . . 13700 "L1(IIIa)" promoter 13754 . . .
13793 "L2" CDS 13761 . . . 15368 gene = "L2(penton)" "penton
protein(protein III)" polyA_signal 15371 . . . 15376 "L2(penton)"
CDS 15372 . . . 15953 gene = "L2(pVII)" "major core protein
(protein VII)" CDS 16001 . . . 17026 gene = "L2(pV)" "minor core
protein (protein V)" CDS 17049 . . . 17282 gene = "L2(pX)" "protein
X (protein mu)" CDS 17314 . . . 18072 gene = "L3(pVI)" "protein VI
(hexon-associated protein)" CDS 18179 . . . 20989 gene =
"L3(hexon)" "hexon protein (protein II)" CDS 21013 . . . 21633 gene
= "L3(23K)" "L3 23K proteinase(Adenain)" polyA_signal 21659 . . .
21664 "L3" polyA_signal complement (21701 . . . 21706) "E2A" CDS
complement (21708 . . . 23240) gene = "E2A(DBP)" "early E2A
DNA-binding protein" promoter complement (23168 . . . 23207) "E2A"
promoter 23209 . . . 23248 "L4" CDS 23269 . . . 25632 gene =
"L4(100K)" "100K protein" CDS 25367 . . . 25882 gene = "L4(22K)"
"22K protein" CDS join (25367 . . . 25672, 25842 . . . 26156) gene =
"L4(33K)" "33K protein" CDS 26224 . . . 26907 gene =
"L4(pVIII)" "L4 protein VIII" CDS 26908 . . . 27228 gene = "E3" "E3
12.1 kDa protein" CDS 27182 . . . 27814 gene = "E3" "E3 23.3 kDa
protein" CDS 27796 . . . 28320 gene = "E3" "E3 19 kDa protein" CDS
28350 . . . 28985 gene = "E3" "E3 24.8 kDa protein" CDS 29295 . . .
30104 gene = "E3" "E3 29.7 kDa protein" CDS 30113 . . . 30388 gene
= "E3" "E3 10.4 kDa protein" CDS 30394 . . . 30834 gene = "E3" "E3
14.5 kDa protein" CDS 30827 . . . 31228 gene = "E3" "E3 14.7 kDa
protein" polyA_signal 31278 . . . 31283 "E3" promoter 31291 . . .
31330 "L5" CDS 31463 . . . 32740 gene = "L5(fiber)" "fiber protein"
polyA_signal 32800 . . . 32805 "L5" polyA_signal complement (32816
. . . 32821) "E4" CDS complement (32836 . . . 33030) gene = "E4"
"E4 7.4 KD protein" CDS complement (join (32836 . . . 33084, 33810 . .
. 33983)) gene = "E4" "E4 15.9 KD protein" CDS complement
(33084 . . . 33983) gene = "E4" "E4 34.6 KD protein" CDS complement
(33886 . . . 34254) gene = "E4" "E4 14.1 KD protein" polyA_signal
complement (34206 . . . 34211) "E4" CDS complement (34263 . . .
34616) gene = "E4" "E4 13.7 KD protein" CDS complement (34613 . . .
35002) gene = "E4" "E4 14.6 KD protein" CDS complement (35050 . . .
35424) gene = "E4" "E4 13.5 KD protein" repeat_region complement
(35757 . . . 35964) "ITR"
[0581] TABLE-US-00049 TABLE 45 Ad5FS (Accession No. AY601635; SEQ
ID NO: 581) Sequence Features and putative gene products Feature
Type Location Comment or Gene Locus Product CDS 11565 . . . 12297
"52/55K protein gene" protein_id = "AAA96406.1" CDS 12318 . . .
14075 product = "protein IIIa" protein_id = "AAA96407.1" CDS 16286
. . . 16474 product = "protein pVII" protein_id = "AAA96408.1" CDS
16544 . . . 17650 product = "protein V" protein_id = "AAA96409.1"
CDS 17774 . . . 17920 product = "protein 5 precursor" protein_id =
"AAA96410.1" CDS 18003 . . . 18755 product = "protein
pVI" protein_id = "AAA96411.1" CDS 25819 . . . 26484 "100K protein"
protein_id = "AAA96412.1" Exon 26485 . . . 26514 "of a 33K protein"
Exon 26713 . . . 27083 "of a 33K protein" CDS 27174 . . . 27857
product = "protein pVIII" protein_id = "AAA96413.1"
[0582] TABLE-US-00050 TABLE 46 Ad7 (Accession No. AY594255; SEQ ID
NO: 582) Sequence Features and putative gene products Feature Type
Location Comment or Gene Locus Product repeat_region 1 . . . 108
"the inverted terminal repeat" promoter 481 . . . 486 "TATA box for
E1A" CDS Join (577 . . . 648, 1250 . . . 1351) gene = "E1A" "6 kD
protein" CDS Join (577 . . . 1156, 1250 . . . 1455) gene = "E1A"
"E1A 28 kD protein" CDS Join (577 . . . 1063, 1250 . . . 1455) gene
= "E1A" "mRNAII product, homologue of gi|209784" polyA_site 1494 .
. . 1499 "polyA site for E1A gene. GenScan Prediction" promoter
1549 . . . 1554 "TATA box for E1B" CDS 1603 . . . 2139 gene = "E1B"
"19 kD small T antigen, homologous to the unnamed protein product
in Ad7, gi|58518" CDS 1908 . . . 3386 gene = "E1B" "55 kD protein,
homologous to gi|58519 in Ad7" CDS 3144 . . . 3386 gene = "E1B"
"hypothetical protein derived from 1.2 kB mRNA" Promoter 3384 . . .
3389 "TATA box for proteinIX" polyA_site 3405 . . . 3410 "possible
polyA site for the E1b genes" CDS 3481 . . . 3897 gene =
"proteinIX" "hexon associated protein IX, homologous to the unnamed
protein gi|58521, in Ad7" polyA_site 3910 . . . 3915 "Genscan
predicted polyA site for the proteinIX gene" polyA_site complement
(3928 . . . 3933) "Genscan predicted polyA site for E2B genes" CDS
complement (join (3950 . . . 5283, 5562 . . . 5574)) gene = "IVa2"
"maturation protein IVA2" CDS Complement (5053 . . . 8421)
gene = "E2B" "DNA polymerase" Promoter 5874 . . . 5879 "TATA box
for the major late promoter" CDS 7987 . . . 8427 gene = "L1" "DNA
binding protein, homologous to protein gi|17227351 in Adenovirus
type E" CDS complement (8231 . . . 8575) gene = "E2B" "hypothetical
12.6 kD early protein, gi|139931 in Ad7" CDS complement (8424 . . .
10346) gene = "E2B" "terminal protein precursor, homologue in Ad7
is gi|74214" CDS 8550 . . . 8951 gene = "L1" "hypothetical 14.5 kD
protein" CDS complement (9545 . . . 9859) gene = "E2B" "11.3 kD
early protein, homologue in Ad7 is gi|139914" CDS 9759 . . . 10034
gene = "L1" "9.7 kD hypothetical protein, homologue in Ad7 is
gi|140138" misc_RNA 10423 . . . 10592 "VA RNA I" misc_RNA 10668 . .
. 10838 "VA RNA II" CDS 10890 . . . 12026 gene = "L1" "55 kD
protein" CDS 12051 . . . 13817 gene = "L1" "peripentonal hexon
associated protein IIIa" polyA_site 13830 . . . 13835 "Genscan
predicted polyA site for L1" CDS 13905 . . . 15539 gene = "L2"
"penton base protein (III)" polyA_site 15545 . . . 15550 "Genscan
predicted polyA site for L2" CDS 15550 . . . 16128 gene = "L2"
"protein VII precursor" CDS 16171 . . . 17000 gene = "L2" "minor
core protein, protein V" polyA_site 17497 . . . 17502 "Genscan
predicted polyA site for L2" CDS 17554 . . . 18306 gene = "L3"
"protein VI precursor" CDS 18419 . . . 21232 gene = "L3" "hexon
protein" CDS 21269 . . . 21898 gene = "L3" "23K protease"
polyA_site 21918 . . . 21923 "Genscan predicted polyA site for L3"
polyA_site complement (21930 . . . 21935) "Genscan predicted polyA
site for E2A" CDS complement (21985 . . . 23538) gene = "E2A" "DNA
binding protein" CDS 23569 . . . 26055 gene = "L4" "100k
hexon-assembly associated protein" CDS complement (24924 . . . 25133)
gene = "hypothetical" "hypothetical protein" CDS 25757 . . . 26356
gene = "L4" "22K protein" CDS join (25850 . . . 26225, 26275 . . .
26630) gene = "L4" "33 kD protein" polyA_site 26662 . . . 26667
"possible polyA site for the L4 gene" CDS 26700 . . . 27383 gene =
"L4" "protein VIII, hexon-associated protein precursor" Promoter
27065 . . . 27070 "TATA box for E3" CDS 27383 . . . 27703 gene =
"E3" "12.1 kD glycoprotein" CDS 27657 . . . 28097 gene = "E3" "16.1
kD protein" CDS 28082 . . . 28600 gene = "E3" "19.3 kD MHC classI
antigen-binding glycoprotein precursor" CDS 28630 . . . 29169 gene
= "E3" "hypothetical 20.6 kD protein" CDS 29182 . . . 29751 gene =
"E3" "20.6 kD protein" CDS 29766 . . . 29966 gene = "E3" "7.7 kD
protein" CDS 30083 . . . 30313 gene = "E3" "10.3 kD protein" CDS
30285 . . . 30722 gene = "E3" "14.9 kD protein" CDS 30715 . . .
31122 gene = "E3" "14.7 kD protein" polyA_site 31128 . . . 31133
"Genscan predicted polyA site for E3" CDS complement (31141 . . .
31305) gene = "" "U protein" CDS 31320 . . . 32297 gene = "L5"
"fiber protein" polyA_site complement (32317 . . . 32322) "Genscan
prediction for polyA site of E4" CDS complement (32333 . . . 32584)
gene = "E4" "orf 6/7 protein" CDS complement (32581 . . . 33480)
gene = "E4" "33.2 kD protein" CDS complement (33383 . . . 33751)
gene = "E4" "13.6 kD protein" CDS 33606 . . . 34115 gene = "L5"
"agnoprotein" CDS complement (33760 . . . 34113) gene = "E4" "34 kD
protein" CDS complement (34110 . . . 34499) gene = "E4" "130aa
protein" polyA_site 34122 . . . 34127 "possible polyA site for the
L5 gene" CDS complement (34541 . . . 34918) gene = "E4" "13.9 kD
protein" Promoter complement (35000 . . . 35005) "TATA box for E4"
repeat_region complement (35199 . . . 35306) "the inverted terminal
repeat"
[0583] TABLE-US-00051 TABLE 47 Ad7FS_navy (Accession No. AY601634;
SEQ ID NO: 583) Sequence Features and putative gene products
Feature Type Location Comment or Gene Locus Product repeat_region 0
. . . 135 "the inverted terminal repeat" promoter 479 . . . 484
"TATA box for E1A" CDS 575 . . . 1168 gene = "E1A" "hypothetical
E1A protein" CDS join (575 . . . 646, 1249 . . . 1350) gene = "E1A"
"6 KD protein" CDS join (575 . . . 1154, 1249 . . . 1454) gene =
"E1A" "mRNAI, 28 KD E1A protein" CDS join (575 . . . 1061, 1249 . .
. 1454) gene = "E1A" "mRNAII protein" polyA_signal 1493 . . . 1498
"polyA signal for E1A gene" promoter 1548 . . . 1553 "TATA box for
E1B" CDS 1602 . . . 2138 gene = "E1B" "20 KD small T antigen" CDS
1907 . . . 3385 gene = "E1B" "transformation associated 55 KD
protein" Promoter 3383 . . . 3388 "TATA box for proteinIX"
polyA_signal 3401 . . . 3406 "possible polyA signal for E1B gene"
CDS 3479 . . . 3895 gene = "pIX" "proteinIX" polyA_signal 3908 . .
. 3913 "polyA signal for the proteinIX gene" polyA_signal
complement (3926 . . . 3931) "possible polyA signal for E2B genes"
CDS complement (join (5560 . . . 5572, 3948 . . . 5281)) gene = "E2B"
"maturation protein IVa2" CDS complement (5051 . . . 8419)
gene = "E2B" "DNA polymerase" Promoter 5872 . . . 5877 "TATA box
for the major late promoter" CDS 6144 . . . 6464 gene =
"hypothetical" "hypothetical protein A-106" CDS complement (6868 .
. . 7389) gene = "E2B" "hypothetical 19 KD protein" CDS 7133 . . .
7420 gene = "L1" "hypothetical 10.4 KD protein" CDS 7829 . . . 8425
gene = "L1" "15.3 KD agnoprotein" CDS complement (8328 . . . 8573)
gene = "E2B" "hypothetical 12.6 kD early protein" CDS complement
(8422 . . . 10344) gene = "E2B" "preterminal DNA-binding protein"
CDS 8548 . . . 8949 gene = "hypothetical" "hypothetical 14.5 KD
early protein" CDS 9757 . . . 10032 gene = "hypothetical"
"hypothetical 9.7 KD protein" Misc_RNA 10424 . . . 10584 "VA RNA
I" Misc_RNA 10655 . . . 10829 "VA RNA II" CDS 10806 . . . 11975
gene = "L1" "55 KD protein" CDS 12000 . . . 13766 gene = "L1"
"peripentonal hexon associated protein IIIA" polyA_signal 13779 . .
. 13784 "polyA signal for L1" CDS 13854 . . . 15488 gene = "L2"
"penton base protein" CDS 15500 . . . 16078 gene = "L2" "major core
protein precursor pVII" CDS 16121 . . . 17173 gene = "L2" "minor
core protein precursor pV" polyA_signal 17448 . . . 17453 "possible
polyA signal for L2" CDS 17505 . . . 18239 gene = "L3" "pVI
precursor" CDS 18352 . . . 21156 gene = "L3" "the hexon protein"
CDS 21193 . . . 21822 gene = "L3" "23 KD proteinase" polyA_signal
21842 . . . 21847 "possible polyA signal for L3" polyA_signal
complement (21854 . . . 21859) "possible polyA signal for E2A" CDS
complement (21911 . . . 23464) gene = "E2A" "DNA binding protein"
CDS 23495 . . . 25984 gene = "L4" "100 KD protein" CDS 25686 . . .
26285 gene = "L4" "22 KD protein" CDS join (25686 . . . 26034,
26204 . . . 26559) gene = "L4" "33 KD protein" CDS 26719 . . .
27312 gene = "L4" "pVIII protein" Promoter 26994 . . . 26999
"putative TATA box for the E3 gene" CDS 27312 . . . 27632 gene =
"E3A" "12.1 KD glycoprotein" polyA_signal 27391 . . . 27396
"possible polyA signal for L4" CDS 27586 . . . 28026 gene = "E3A"
"16.1 KD protein" CDS 28011 . . . 28529 gene = "E3A" "19 KD MHC
classI antigen-binding glycoprotein" CDS 28559 . . . 29083 gene =
"E3A" "20.3 KD glycoprotein" CDS 29110 . . . 29679 gene = "E3A"
"20.3 KD protein" CDS 29694 . . . 29819 gene = "E3A" "7.7 KD
protein" CDS 29931 . . . 30206 gene = "E3B" "10.3 KD protein" CDS
30178 . . . 30615 gene = "E3B" "14.9 KD protein" CDS 30608 . . .
31015 gene = "E3B" "15.3 KD protein" polyA_signal 31021 . . . 31026
"putative polyA signal for E3" CDS 31213 . . . 32190 gene = "L5"
"fiber protein" polyA_signal complement (32209 . . . 32214) "polyA
signal for the E4 gene" CDS complement (32225 . . . 32476) gene =
"E4" "ORF6/7" CDS complement (32473 . . . 33372) gene = "E4" "32 KD
protein" CDS complement (33275 . . . 33643) gene = "E4" "13.6 KD
protein" CDS 33498 . . . 34007 gene = "L5" "agnoprotein" CDS
complement (33652 . . . 34005) gene = "E4" "13 KD protein" CDS
complement (34002 . . . 34391) gene = "E4" "130aa protein" CDS
complement (34433 . . . 34810) gene = "E4" "13.9 KD protein"
Promoter complement (34891 . . . 34896) "possible TATA box for the
E4 gene" repeat_region complement (35062 . . . 35197) "the inverted
terminal repeat"
[0584] TABLE-US-00052 TABLE 48 Ad7vaccine (Accession No. AY594256;
SEQ ID NO: 584) Sequence Features and putative gene products
Feature Type Location Comment or Gene Locus Product misc_feature 1
. . . 136 "the inverted terminal repeat" promoter 476 . . . 481
"TATA box for the E1A genes" CDS join (572 . . . 647, 1247 . . .
1348) gene = "E1A" "E1A 6 kD protein" CDS join (572 . . . 1157,
1246 . . . 1452) gene = "E1A" "E1A mRNA I protein, homologous to
the 29.1 kD protein in Ad11" CDS join (572 . . . 1067, 1246 . . .
1452) gene = "E1A" "E1A mRNA II protein, homologous to the 25.7 kD
E1A protein in Ad 11" polyA_site 1490 . . . 1495 promoter 1545 . .
. 1550 "TATA box for E1B" CDS 1599 . . . 2136 gene = "E1B" "20 kd
protein, small T antigen" CDS 1904 . . . 3382 gene = "E1B" "55 kD
protein" promoter 3380 . . . 3385 "TATA box for pIX" CDS 3476 . . .
3892 gene = "pIX" "protein IX" polyA_site 3905 . . . 3910
polyA_site complement (3923 . . . 3928) CDS complement (join (3945
. . . 5278, 5557 . . . 5569)) gene = "E2B" "pIVA2" CDS complement
(5048 . . . 8416) gene = "E2B" "DNA polymerase" promoter 5869 . . .
5874 "TATA box for the major late promoter" CDS 6141 . . . 6461
gene = "hypothetical" "A-106 hypothetical protein" CDS 7826 . . .
8422 gene = "agnoprotein gene" "13.6 kD agnoprotein" CDS complement
(8419 . . . 10341) gene = "E2B" "DNA terminal protein" CDS
complement (9540 . . . 9854) gene = "hypothetical" "11.3 kD
hypothetical protein" CDS 9754 . . . 10029 gene = "hypothetical"
"protein = 9.7 kD hypothetical protein" misc_RNA 10403 . . . 10821
"VA RNA, SHORTENED "VA RNA" DUE TO A 25bp DELETION" CDS 10828 . . .
11997 gene = "L1" "55 kD protein" CDS 12022 . . . 13788 gene = "L1"
"pIIIA precursor" CDS 13876 . . . 15510 gene = "L2" "penton protein III
precursor" polyA_site 15512 . . . 15517 CDS 15520 . . . 16098 gene
= "L2" "protein VII precursor" CDS 16141 . . . 17193 gene = "L2"
"protein V" polyA_site 17467 . . . 17472 CDS 17523 . . . 18275 gene
= "L3" "protein VI precursor" CDS 18388 . . . 21192 gene = "L3"
"hexon" CDS 21229 . . . 21858 gene = "L3" "23K protease" polyA_site
21878 . . . 21883 polyA_site complement (21890 . . . 21895) CDS
complement (21947 . . . 23500) gene = "E2A" "DNA binding protein"
CDS 23531 . . . 26020 gene = "L4" "hexon protein" CDS 25722 . . .
26321 gene = "L4" "33 KD protein" CDS join (25722 . . . 26070, 26252
. . . 26595) gene = "L4" "33 kD protein" CDS 26665 . . . 27348
gene = "L4" "pIII protein" promoter 27030 . . . 27035 "TATA box for
E3" CDS 27348 . . . 27668 gene = "E3" "12.1 kD glycoprotein" CDS
27622 . . . 28062 gene = "E3" "16.1 kD protein" CDS 28047 . . .
28565 gene = "E3" "18.3 kD glycoprotein precursor" CDS 28595 . . .
29134 gene = "E3" "E3 20.1 kD protein" CDS 29147 . . . 29716 gene =
"E3" "E3 20.6 kD protein duplication" CDS 29731 . . . 29856 gene =
"E3" "E3 7.7 kD protein" CDS 29969 . . . 30244 gene = "E3" "E3 10.3
kD protein" CDS 30249 . . . 30653 gene = "E3B" "E3B 14.9 kD protein
precursor" CDS 30646 . . . 31053 gene = "E3B" "E3B 14.7 kD protein"
polyA_site 31059 . . . 31064 CDS 31251 . . . 32228 gene = "L5" "L5
fiber protein" polyA_site complement (32247 . . . 32252) CDS
complement (32263 . . . 32514) gene = "E4" "E4 orf6/7" polyA_site
32764 . . . 32769 CDS complement (33313 . . . 33681) gene = "E4"
"E4 13.6 kD protein" CDS 33536 . . . 34045 gene = "probable
agnoprotein gene" "probable agnoprotein" CDS complement (33690 . .
. 34043) gene = "E4" "E4 13 kD protein" CDS complement (34040 . . .
34429) gene = "E4" "13.9 kD protein" CDS complement (34471 . . .
34848) gene = "E4" "hypothetical protein"
[0585] TABLE-US-00053 TABLE 49 Ad16 (Accession No. AY594256; SEQ ID
NO: 585) Sequence Features and putative gene products Feature Type
Location Comment or Gene Locus Product repeat_region 1 . . . 114
"the inverted terminal repeat" promoter 478 . . . 483 "TATA box of
the E1 promoter" CDS join (574 . . . 645, 1247 . . . 1348) gene =
"E1A" "6.3 kDa protein" CDS join (574 . . . 1060, 1247 . . . 1452)
gene = "E1A" "25.7 kDa protein" CDS join (574 . . . 1153, 1247 . .
. 1452) gene = "E1A" "28 kDa protein" polyA_signal 1489 . . . 1494
"polyA signal for E1A" promoter 1544 . . . 1549 "TATA box for the
E1B gene" CDS 1598 . . . 2134 gene = "E1B" "19K small T-antigen
protein" CDS 1903 . . . 3381 gene = "E1B" "55K large T antigen
protein" promoter 3444 . . . 3449 "TATA box for pIX" CDS 3476 . . .
3892 gene = "pIX" "proteinIX" polyA_signal 3905 . . . 3910 "polyA
signal for pIX" polyA_signal complement (3923 . . . 3928) "polyA
signal for E2B" CDS complement (join (3945 . . . 5278, 5557 . . .
5569)) gene = "E2B" "maturation protein pIVa2" CDS complement (5048 .
. . 8416) gene = "E2B" "DNA polymerase" promoter 5869 . . . 5874
"TATA box for the Major Late Promoter" CDS 7130 . . . 7417 gene =
"hypothetical" "hypothetical 10.4K early protein" CDS 7826 . . .
8422 gene = "hypothetical" "probable DNA binding agnoprotein" CDS
complement (8226 . . . 8570) gene = "E2B" "hypothetical 12.6K early
protein" CDS complement (join (8419 . . . 10386, 13843 . . . 13851))
gene = "E2B" "Terminal protein precursor" CDS 10851 . . .
12020 gene = "L1" "55K protein" CDS 12045 . . . 13811 gene = "L1"
"protein IIIa precursor" polyA_signal 13825 . . . 13830 "polyA
signal for L1" CDS 13902 . . . 15569 gene = "L2" "penton base
protein" CDS 15582 . . . 16160 gene = "L2" "protein VII precursor"
CDS 16203 . . . 17255 gene = "L2" "32K proteinV" CDS 17284 . . .
17511 gene = "L2" "proteinX" polyA_signal 17529 . . . 17534 "polyA
signal for L2" CDS 17586 . . . 18284 gene = "L3" "protein VI
precursor" CDS 18450 . . . 21272 gene = "L3" "the hexon protein"
CDS 21309 . . . 21938 gene = "L3" "23K protease" polyA_signal 21958
. . . 21963 "polyA signal for L3" polyA_signal complement (21970 .
. . 21975) "polyA signal for E2A" CDS complement (22027 . . .
23580) gene = "E2A" "early DNA binding protein" CDS 23611 . . .
26097 gene = "L4" "100k protein" CDS 25799 . . . 26398 gene = "L4"
"22K protein" CDS join (25799 . . . 26147, 26317 . . . 26672) gene =
"L4" "33K protein" CDS 26742 . . . 27425 gene = "L4" "protein
VIII precursor" promoter 27107 . . . 27112 "TATA box for E3" CDS
27425 . . . 27745 gene = "E3A" "12.2K glycoprotein" CDS 27699 . . .
28139 gene = "E3A" "16.1K membrane protein" polyA_signal 27734 . .
. 27739 "possible polyA signal for L4" CDS 28124 . . . 28642 gene =
"E3" "18.5K glycoprotein precursor" CDS 28672 . . . 29211 gene =
"E3" "20.1K protein" CDS 29224 . . . 29793 gene = "E3" "20.5K
glycoprotein" CDS 29808 . . . 30023 gene = "E3" "7.7K protein" CDS
30133 . . . 30408 gene = "E3" "10.3K protein" CDS 30380 . . . 30817
gene = "E3" "14.9K protein" CDS 30810 . . . 31217 gene = "E3"
"14.7K protein" polyA_signal 31258 . . . 31263 "polyA signal for
E3" CDS complement (31269 . . . 31433) gene = "U" "U exon protein"
CDS 31448 . . . 32509 gene = "L5" "the fiber protein" polyA_signal
complement (31735 . . . 31740) "polyA signal for E4" polyA_signal
32520 . . . 32525 "possible polyA signal for L5" CDS complement
(32552 . . . 32803) gene = "E4" "ORF6/7" CDS complement (32800 . .
. 33696) gene = "E4" "34K protein" CDS complement (33599 . . .
33967) gene = "E4" "13.6K protein" CDS 33822 . . . 34331 gene =
"L5" "DNA binding agnoprotein" CDS complement (33976 . . . 34329)
gene = "E4" "13K protein" CDS complement (34326 . . . 34715) gene =
"E4" "14.3K protein" CDS complement (34757 . . . 35134) gene = "E4"
"13.9K protein" promoter complement (35216 . . . 35221) "TATA box
for E4" repeat_region complement (35409 . . . 35522) "the inverted
terminal repeat"
[0586] TABLE-US-00054 TABLE 50 Ad1 (Accession No. AF534906; SEQ ID
NO: 586) Sequence Features and putative gene products Feature Type
Location Comment or Gene Locus Product CDS join (560 . . . 1112,
1230 . . . 1546) gene = "E1a" "32 kDa protein" CDS join (560 . . .
976, 1232 . . . 1546) gene = "E1a" "26 kDa protein" CDS join (560 .
. . 643, 1236 . . . 1319) gene = "E1a" "6 kDa protein" CDS 1717 . .
. 2259 gene = "E1a" "21 kDa protein" CDS 2022 . . . 3524 gene =
"E1b" "transformation-associated protein 55 kDa" CDS join (2022 .
. . 2270, 3291 . . . 3524) gene = "E1b" "E1b" CDS join (2022 . . .
2270, 3233 . . . 3277) gene = "E1b" "E1b" CDS 3621 . . . 4043 gene
= "IX" "hexon-associated protein 14.5 kDa" CDS complement (4102 . .
. 5460) gene = "IVa2" "virion morphogenesis-associated protein 51
kDa" CDS complement (5208 . . . 8378) gene = "IVa2" "DNA polymerase
120 kDa" CDS 7989 . . . 8438 gene = "L1" "16.7 kDa protein" CDS
complement (8594 . . . 10552) gene = "E2b" "terminal protein 75
kDa" CDS complement (10598 . . . 10996) "unknown" CDS 11059 . . .
12306 gene = "L1" "47 kDa protein" CDS 12327 . . . 14084 gene =
"IIIa" "peripentonal hexon-associated protein 65 kDa" CDS 14166 .
. . 15890 gene = "L3_1" "penton protein 64 kDa" CDS 15897 . . .
16493 gene = "pro-VII" "major core protein 22 kDa precursor" CDS
16563 . . . 17669 gene = "pV" "minor core protein 42 kDa" CDS 17793
. . . 17939 gene = "L2" "pmu 8.8 kDa" CDS 18022 . . . 18774 gene =
"pVI" "hexon-associated protein 27 kDa precursor" CDS 18861 . . .
21755 gene = "L4" "hexon protein 109 kDa" CDS 21788 . . . 22402
gene = "L3_1" "endopeptidase 23 kDa" CDS complement (22500 . . .
24089) gene = "E2a_1" "DNA-binding protein 59 kDa" CDS 24118 . . .
26541 gene = "L5" "hexon assembly-associated protein 90 kDa" CDS
join (26252 . . . 26566, 26769 . . . 27137) gene = "E2a_2" "virion
morphogenesis-associated protein 25 kDa" CDS 27225 . . . 27908
gene = "pVIII" "hexon-associated protein 25 kDa" CDS 27909 . . .
28232 gene = "E3A" "12 kDa protein" CDS 28775 . . . 29257 gene =
"E3" "glycosylated membrane protein 18.6 kDa" CDS 29532 . . . 29816
gene = "E3" "10.7 kDa protein" CDS 30106 . . . 30507 gene = "E3"
"14.9 kDa protein" CDS 31101 . . . 32849 gene = "L5" "fiber protein
62 kDa" CDS complement (join (33976 . . . 34063, 34768 . . . 34865,
35232 . . . 35594)) gene = "E4" "20 kDa protein" CDS complement (join
(33976 . . . 34063, 34768 . . . 35054, 35232 . . . 35594)) gene = "E4"
"27 kDa protein" CDS complement (join (33976 . . . 34063, 34768 . . .
35168, 35232 . . . 35594)) gene = "E4" "32 kDa protein" CDS complement
(join (34764 . . . 34865, 35232 . . . 35594)) gene = "E4" "17 kDa
protein" CDS complement (join (34764 . . . 35054, 35232 . . . 35594))
gene = "E4" "24 kDa protein" CDS complement (join (34764 . . . 35168,
35232 . . . 35594)) gene = "E4" "28.6 kDa protein"
[0587] TABLE-US-00055 TABLE 51 Ad21 (Accession No. AY601633; SEQ ID
NO: 587) Sequence Features and putative gene products Feature Type
Location Comment or Gene Locus Product repeat_unit 1 . . . 114
"ITR" CDS join (574 . . . 645, 1247 . . . 1348) gene = "E1A" "E1A
6.8 KD protein" CDS join (574 . . . 1155, 1249 . . . 1452) gene =
"E1A" "E1A 28.4 KD protein" CDS join (574 . . . 1062, 1249 . . .
1452) gene = "E1A" "E1A 24.6 KD protein" polyA_signal 1491 . . .
1496 "E1A" promoter 1541 . . . 1580 "E1B and IX" CDS 1905 . . .
3383 gene = "E1B" "E1B large T antigen" CDS 3477 . . . 3893 gene =
"IX" "protein IX (hexon-associated protein)" polyA_signal 3906 . .
. 3911 "E1B and IX" polyA_signal complement (3924 . . . 3929) "E2B
and IVa2" CDS complement (join (3946 . . . 5279, 5558 . . . 5570))
gene = "IVa2" "IVa2 protein (maturation protein)" CDS
complement (5049 . . . 8417) gene = "E2B(POL)" "DNA polymerase" CDS
6142 . . . 6462 gene = "unassigned" "hypothetical 11.5 KD protein"
CDS complement (6866 . . . 7387) gene = "unassigned" "hypothetical
19 KD protein" CDS 7131 . . . 7418 gene = "unassigned"
"hypothetical 10.4 KD protein" CDS join (7827 . . . 8228, 9478 . .
. 9495) gene = "unassigned" "DNA binding protein" CDS complement
(8227 . . . 8571) gene = "unassigned" "hypothetical 12.6 KD
protein" CDS complement (8420 . . . 10342) gene = "E2B(pTP)" "DNA
terminal protein" CDS 8546 . . . 8947 gene = "unassigned"
"hypothetical 14.5 KD protein" CDS complement (9541 . . . 9855)
gene = "unassigned" "hypothetical 11.5 KD protein" CDS 9755 . . .
10030 gene = "unassigned" "hypothetical 9.7 KD protein" promoter
complement (10521 . . . 10560) "E2B and IVa2" promoter 10576 . . .
10615 "L1" CDS 10857 . . . 12026 gene = "L1(52K)" "L1 52K protein"
CDS 12054 . . . 13805 gene = "L1(IIIa)" "protein IIIa" CDS 13878 .
. . 15563 gene = "L2(penton)" "penton protein(protein III)"
polyA_signal 15565 . . . 15570 "L2 (penton)" CDS 15572 . . . 16150
gene = "L2(pVII)" "major core protein (protein VII)" CDS 16190 . .
. 17251 gene = "L2(pV)" "minor core protein (protein V)" CDS 17280
. . . 17510 gene = "L2(pX)" "protein X (protein mu)" polyA_signal
17528 . . . 17533 "L2 (X)" promoter 17542 . . . 17581 "L3" CDS
17583 . . . 18332 gene = "L3(pVI)" "protein VI (hexon- associated
protein)" CDS 18454 . . . 21303 gene = "L3(hexon)" "hexon protein
(protein II)" CDS 21340 . . . 21969 gene = "L3(23K)" "23K
proteinase (Adenain)" polyA_signal 21989 . . . 21994 "L3"
polyA_signal complement (22001 . . . 22006) "E2A" CDS complement
(22058 . . . 23611) gene = "E2A(DBP)" "early E2A DNA- binding
protein" promoter complement (23560 . . . 23599) "E2A" promoter
23601 . . . 23641 "L4" CDS 23642 . . . 26113 gene = "L4(100K)"
"100K protein" CDS 25815 . . . 26414 gene = "L4(22K)" "22K protein"
CDS join (25815 . . . 26163, gene = "L4(33K)" "33K protein" 26354 .
. . 26688) polyA_signal 26559 . . . 26564 "LA (100K and 22K)" CDS
26758 . . . 27441 gene = "L4(pVIII)" "LA protein VIII" CDS 27441 .
. . 27761 gene = "E3" "E3 12.1 KD protein" CDS 27715 . . . 28155
gene = "E3" "E3 16 KD protein" polyA_signal 27750 . . . 27755 "E3"
CDS 28140 . . . 28658 gene = "E3" "E3 19.2 KD protein" CDS 28688 .
. . 29227 gene = "E3" "E3 20 KD protein" CDS 29240 . . . 29827 gene
= "E3" "E3 21.2 KD protein" CDS 29857 . . . 30084 gene = "E3" "E3
8.8 KD protein" CDS 30124 . . . 30399 gene = "E3" "E3 10.3 KD
protein" CDS 30371 . . . 30808 gene = "E3" "E3 16.6 KD protein" CDS
30801 . . . 31208 gene = "E3" "E3 15.3 KD protein" CDS 31406 . . .
32377 gene = "L5(fiber)" "fiber protein" polyA_signal 32380 . . .
32385 "L5" polyA_signal complement (32397 . . . 32402) "E4" CDS
complement (join gene = "E4" "E4 16 KD protein" (32409 . . . 32660,
33383 . . . 33556)) CDS complement (32657 . . . 33556) gene = "E4"
"E4 34.7 KD protein" CDS complement (33459 . . . 33827) gene = "E4"
"E4 14.3 KD protein" CDS 33682 . . . 34191 gene = "unassigned"
"agonoprotein" similarity to Human Adenovirus B agonoprotein, GI:
32967054 CDS complement (33836 . . . 34189) gene = "E4" "E4 13.6 KD
protein" CDS complement (34186 . . . 34575) gene = "E4" "E4 14.4 KD
protein" CDS complement (34617 . . . 34994) gene = "E4" "E4 14.2 KD
protein" repeat_unit complement (35269 . . . 35382) "ITR"
[0588] For the products designated above in Tables 39-51, the
present inventors note that it is routine in the art to translate,
by reference to the universal genetic code, the nucleic acid
sequence identified in the "location" column into the corresponding
amino acid sequence. As such, the amino acid sequences designated
in the "product" column have not been explicitly listed.
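By way of illustration only, the translation described above can be
sketched in Python as follows. The sketch assumes the standard
genetic code (NCBI translation table 1), 1-based inclusive
coordinates, and that a "complement" location is read by joining the
listed spans in order and then reverse-complementing the result, as
in GenBank feature tables. The function names (parse_location,
extract_cds, translate) and the 20-nucleotide toy sequences are
hypothetical and are not taken from SEQ ID NO: 587; an actual genome
sequence would be substituted for the toy sequences.

```python
import re

# Standard genetic code, codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def parse_location(location):
    """Parse a 'location' entry into (is_complement, [(start, end), ...]).

    Accepts both 'join (574 . . . 645, 1247 . . . 1348)' as printed in the
    tables and the compact GenBank form 'join(574..645,1247..1348)'.
    """
    text = location.replace(" ", "")
    is_complement = text.startswith("complement")
    spans = [(int(a), int(b)) for a, b in re.findall(r"(\d+)\.+(\d+)", text)]
    if not spans:
        raise ValueError(f"no coordinate span found in {location!r}")
    return is_complement, spans


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def extract_cds(genome, location):
    """Join the listed spans (1-based, inclusive) and orient the coding strand."""
    is_complement, spans = parse_location(location)
    cds = "".join(genome.upper()[start - 1:end] for start, end in spans)
    return reverse_complement(cds) if is_complement else cds


def translate(cds):
    """Translate codon by codon, stopping at the first stop codon.

    Codons containing anything other than A, C, G, T are reported as 'X'.
    """
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE.get(cds[i:i + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Toy example (made-up sequence): two exons encoding Met-Ala-Lys, then a stop.
toy_genome = "CCATGGCCTTTTAAATAAGG"
print(translate(extract_cds(toy_genome, "join (3 . . . 8, 13 . . . 18)")))  # MAK

# The same coding sequence placed on the opposite strand of a second toy genome.
toy_reverse = reverse_complement(toy_genome)
print(translate(extract_cds(toy_reverse, "complement (join (3 . . . 8, 13 . . . 18))")))  # MAK
```

With a full genome sequence loaded into a string, each CDS row of a
table could be passed to extract_cds and translate in the same way.
Features of other types (repeat_unit, promoter, polyA_signal) are not
coding and would be skipped.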
REFERENCES
[0589] Albert, T. J., Norton, J., Ott, M., Richmond, T., Nuwaysir, K., Nuwaysir, E. F., Stengele, K. P., Green, R. D. 2003. Light-directed 5'→3' synthesis of complex oligonucleotide microarrays. Nucleic Acids Res 31:e35
[0590] Bohlander, S. K., Espinosa, R., 3rd, Le Beau, M. M., Rowley, J. D., Diaz, M. O. 1992. A method for the rapid sequence-independent amplification of microdissected chromosomal material. Genomics 13:1322-4
[0591] Cherkasova, E., Laassri, M., Chizhikov, V., Korotkova, E., Dragunsky, E., Agol, V. I., Chumakov, K. 2003. Microarray analysis of evolution of RNA viruses: evidence of circulation of virulent highly divergent vaccine-derived polioviruses. Proc Natl Acad Sci U S A 100:9398-403
[0592] Chizhikov, V., Rasooly, A., Chumakov, K., Levy, D. D. 2001. Microarray analysis of microbial virulence factors. Appl Environ Microbiol 67:3258-63
[0593] Cutler, D. J., Zwick, M. E., Carrasquillo, M. M., Yohn, C. T., Tobin, K. P., Kashuk, C., Mathews, D. J., Shah, N. A., Eichler, E. E., Warrington, J. A., Chakravarti, A. 2001. High-throughput variation detection and genotyping using microarrays. Genome Res 11:1913-25
[0594] Devereux, J., Haeberli, P., Smithies, O. 1984. A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Res 12:387-95
[0595] Ferguson, J. A., Steemers, F. J., Walt, D. R. 2000. High-density fiber-optic DNA random microsphere array. Anal Chem 72:5618-24
[0596] Ginger, D. S., Zhang, H., Mirkin, C. A. 2004. The evolution of dip-pen nanolithography. Angew Chem Int Ed Engl 43:30-45
[0597] Gingeras, T. R., Ghandour, G., Wang, E., Berno, A., Small, P. M., Drobniewski, F., Alland, D., Desmond, E., Holodniy, M., Drenkow, J. 1998. Simultaneous genotyping and species identification using hybridization pattern recognition analysis of generic Mycobacterium DNA arrays. Genome Res 8:435-48
[0598] Gingeras, T. R., Mack, D., Chee, M. S., Berno, A. J., Small, P. M., Drobniewski, F., Alland, D., Desmond, E., Holodniy, M., Drenkow, J. 2001. Chip-Based Species Identification and Phenotype Characterization of Microorganisms. Affymetrix, Inc., US
[0599] Hoffmann, E., Stech, J., Guan, Y., Webster, R. G., Perez, D. R. 2001. Universal primer set for the full-length amplification of all influenza A viruses. Arch Virol 146:2275-89
[0600] Kampke, T., Kieninger, M., Mecklenburg, M. 2001. Efficient primer design algorithms. Bioinformatics 17:214-25
[0601] Kessler, N., Ferraris, O., Palmer, K., Marsh, W., Steel, A. 2004. Use of the DNA Flow-Thru Chip, a Three-Dimensional Biochip, for Typing and Subtyping of Influenza Viruses. J Clin Microbiol 42:2173-2185
[0602] Korf, I., Yandell, M., Bedell, J. 2003. BLAST. O'Reilly and Associates, Sebastopol, Calif.
Kozal, M. J., Shah, N., Shen, N., Yang, R., Fucini, R., Merigan, T. C., Richman, D. D., Morris, D., Hubbell, E., Chee, M., Gingeras, T. R. 1996. Extensive polymorphisms observed in HIV-1 clade B protease gene using high-density oligonucleotide arrays. Nat Med 2:753-9
[0603] Lee, C. 2003. Generating consensus sequences from partial order multiple sequence alignment graphs. Bioinformatics 19:999-1008
[0604] Lin, B., Vora, G. J., Thach, D., Walter, E., Metzgar, D., Tibbetts, C., Stenger, D. A. 2004. Rapid detection and serotyping of acute respiratory disease-associated adenoviruses with oligonucleotide microarrays. J Clin Microbiol, in press
[0605] Meinkoth, J., Wahl, G. 1984. Hybridization of nucleic acids immobilized on solid supports. Anal Biochem 138:267-84
[0606] Needleman, S. B., Wunsch, C. D. 1970. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 48:443-53
[0607] Nuwaysir, E. F., Huang, W., Albert, T. J., Singh, J., Nuwaysir, K., Pitas, A., Richmond, T., Gorski, T., Berg, J. P., Ballin, J., McCormick, M., Norton, J., Pollock, T., Sumwalt, T., Butcher, L., Porter, D., Molla, M., Hall, C., Blattner, F., Sussman, M. R., Wallace, R. L., Cerrina, F., Green, R. D. 2002. Gene expression analysis using oligonucleotide arrays produced by maskless photolithography. Genome Res 12:1749-55
[0608] Ochman, H., Lawrence, J. G., Groisman, E. A. 2000. Lateral gene transfer and the nature of bacterial innovation. Nature 405:299-304
[0609] Offringa, D. P., Tyson-Medlock, V., Ye, Z., Levandowski, R. A. 2000. A comprehensive systematic approach to identification of influenza A virus genotype using RT-PCR and RFLP. J Virol Methods 88:15-24
[0610] Strizhkov, B. N., Drobyshev, A. L., Mikhailovich, V. M., Mirzabekov, A. D. 2000. PCR amplification on a microarray of gel-immobilized oligonucleotides: detection of bacterial toxin- and drug-resistant genes and their mutations. Biotechniques 29:844-8, 850-2, 854 passim
[0611] Troesch, A., Nguyen, H., Miyada, C. G., Desvarenne, S., Gingeras, T. R., Kaplan, P. M., Cros, P., Mabilat, C. 1999. Mycobacterium species identification and rifampin resistance testing with high-density DNA probe arrays. J Clin Microbiol 37:49-55
[0612] Vasiliskov, A. V., Timofeev, E. N., Surzhikov, S. A., Drobyshev, A. L., Shick, V. V., Mirzabekov, A. D. 1999. Fabrication of microarray of gel-immobilized compounds on a chip by copolymerization. Biotechniques 27:592-4, 596-8, 600 passim
[0613] Volokhov, D., Chizhikov, V., Chumakov, K., Rasooly, A. 2003. Microarray analysis of erythromycin resistance determinants. J Appl Microbiol 95:787-98
[0614] Vora, G. J., Meador, C. E., Stenger, D. A., Andreadis, J. D. 2004. Nucleic acid amplification strategies for DNA microarray-based pathogen detection. Appl Environ Microbiol 70:3047-54
[0615] Wang, D., Coscoy, L., Zylberberg, M., Avila, P. C., Boushey, H. A., Ganem, D., DeRisi, J. L. 2002. Microarray-based detection and genotyping of viral pathogens. Proc Natl Acad Sci U S A 99:15687-92
[0616] Wang, D., Urisman, A., Liu, Y. T., Springer, M., Ksiazek, T. G., Erdman, D. D., Mardis, E. R., Hickenbotham, M., Magrini, V., Eldred, J., Latreille, J. P., Wilson, R. K., Ganem, D., DeRisi, J. L. 2003. Viral discovery and sequence recovery using DNA microarrays. PLoS Biol 1:E2
[0617] Wilson, K. H., Wilson, W. J., Radosevich, J. L., DeSantis, T. Z., Viswanathan, V. S., Kuczmarski, T. A., Andersen, G. L. 2002a. High-density microarray of small-subunit ribosomal DNA probes. Appl Environ Microbiol 68:2535-41
[0618] Wilson, W. J., Strout, C. L., DeSantis, T. Z., Stilwell, J. L., Carrano, A. V., Andersen, G. L. 2002b. Sequence-specific identification of 18 pathogenic microorganisms using microarray technology. Mol Cell Probes 16:119-27
[0619] Yang, I. V., Chen, E., Hasseman, J. P., Liang, W., Frank, B. C., Wang, S., Sharov, V., Saeed, A. I., White, J., Li, J., Lee, N. H., Yeatman, T. J., Quackenbush, J. 2002. Within the fold: assessing differential expression measures and reproducibility in microarray assays. Genome Biol 3:research0062
SEQUENCE LISTING
The patent application contains a lengthy "Sequence Listing"
section. A copy of the "Sequence Listing" is available in
electronic form from the USPTO web site
(http://seqdata.uspto.gov/?pageRequest=docDetail&DocID=US20060210967A1).
An electronic copy of the "Sequence Listing" will also be available
from the USPTO upon request and payment of the fee set forth in 37
CFR 1.19(b)(3).
* * * * *