U.S. patent application number 17/330338 was filed with the patent office on 2021-05-25 and published on 2021-09-09 for methods and compositions for enriching non-host sequences in host samples.
The applicant listed for this patent is University of Alaska Fairbanks. Invention is credited to Jiguo Chen, Fang Ge.
Application Number | 17/330338 |
Publication Number | 20210277470 |
Family ID | 1000005611916 |
Filed Date | 2021-05-25 |
Publication Date | 2021-09-09 |
United States Patent Application | 20210277470 |
Kind Code | A1 |
Chen; Jiguo; et al. | September 9, 2021 |
METHODS AND COMPOSITIONS FOR ENRICHING NON-HOST SEQUENCES IN HOST
SAMPLES
Abstract
Disclosed are compositions and methods for enriching a non-host
sequence from a host sample. Also disclosed are compositions and
methods for detecting a non-host sequence in a host sample. For
example, a pathogen can be enriched and detected in a sample taken
from a human without knowing what the pathogen is.
Inventors: | Chen; Jiguo (Fairbanks, AK); Ge; Fang (Fairbanks, AK) |
Applicant: |
Name | City | State | Country | Type |
University of Alaska Fairbanks | Fairbanks | AK | US | |
Family ID: | 1000005611916 |
Appl. No.: | 17/330338 |
Filed: | May 25, 2021 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
15101590 | Jun 3, 2016 | 11035000 |
PCT/US2014/068644 | Dec 4, 2014 | |
17330338 | | |
61911642 | Dec 4, 2013 | |
Current U.S. Class: | 1/1 |
Current CPC Class: | C12Q 1/6874 20130101; C12Q 1/6888 20130101; C12Q 1/6806 20130101; C12N 15/1003 20130101 |
International Class: | C12Q 1/6874 20060101 C12Q001/6874; C12Q 1/6806 20060101 C12Q001/6806; C12N 15/10 20060101 C12N015/10; C12Q 1/6888 20060101 C12Q001/6888 |
Government Interests
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH
[0002] This invention was made with government support under
P20GM103395 awarded by Alaska INBRE through the National Institute
of General Medical Sciences of the National Institutes of Health.
The government has certain rights in the invention.
Claims
1.-74. (canceled)
75. A method of enriching non-human nucleic acids from a human
sample comprising selectively amplifying nucleic acids isolated
from the human sample using non-human target primers to form an
enriched population of non-human nucleic acids, wherein the
non-human target primers do not bind to at least the top 1,000
human B cell transcripts.
76. The method of claim 75, wherein the nucleic acids isolated from
the human sample are selected from the group consisting of DNA and
RNA.
77. The method of claim 75 further comprising performing
subtractive hybridization against a population of reference human
cDNAs, wherein the subtractive hybridization results in a further
enriched population of non-human nucleic acids.
78. The method of claim 75, wherein the top 1,000 human B cell
transcripts comprise at least 65% of all human B cell
transcripts.
79. The method of claim 75, wherein the top 1,000 human B cell
transcripts are greater than 200 base pairs in length.
80. The method of claim 75, wherein the non-human target primers do
not bind to at least the top 2000, 3000, 4000, 5000, 6000, 7000,
8000, 9000, 10000, 20000 human B cell transcripts or whole human B
cell transcriptome.
81. The method of claim 75, wherein the non-human nucleic acids are
pathogenic sequences.
82. The method of claim 81, wherein the pathogenic sequences are
from viruses, bacteria, fungi, or any infectious agents.
83. A method of enriching non-human nucleic acids from a human
sample comprising hybridizing a set of non-human target primers to
total RNA or mRNA isolated from the human sample and selectively
reverse transcribing total RNA or mRNA isolated from the human
sample to form an enriched population of non-human cDNA strands,
wherein the set of non-human target primers do not hybridize to at
least the top 1,000 human B cell transcripts.
84. The method of claim 83 further comprising performing
subtractive hybridization against a population of reference human
cDNAs, wherein the subtractive hybridization results in a further
enriched population of non-human cDNAs.
85. The method of claim 83, wherein the top 1,000 human transcripts
comprise at least 65% of all human B cell transcripts.
86. The method of claim 83, wherein the top 1,000 human B cell
transcripts are greater than 200 base pairs in length.
87. The method of claim 83, wherein the non-human target primers do
not bind to at least the top 2000, 3000, 4000, 5000, 6000, 7000,
8000, 9000, 10000, 20000 human transcripts or whole human
transcriptome.
88. The method of claim 83, wherein the non-human nucleic acids are
pathogenic sequences.
89. The method of claim 88, wherein the pathogenic sequences are
from viruses, bacteria, fungi, or any infectious agents.
90. A diagnostic assay, comprising non-human target primers for
amplifying nucleic acids from a human sample to form an enriched
population of non-human nucleic acids, wherein the non-human target
primers do not bind to at least the top 1,000 human B cell
transcripts.
91. The assay of claim 90, wherein the top 1,000 human B cell
transcripts comprise at least 65% of all human B cell
transcripts.
92. The assay of claim 90, wherein the top 1,000 human B cell
transcripts are greater than 200 base pairs in length.
93. The assay of claim 90, wherein the non-human target primers do
not bind to at least the top 2000, 3000, 4000, 5000, 6000, 7000,
8000, 9000, 10000, 20000 human B cell transcripts or whole human B
cell transcriptome.
94. The assay of claim 90, wherein the non-human target primers are
at least the oligonucleotides in FIG. 10 or FIG. 11.
Description
CROSS-REFERENCE TO RELATED APPLICATION
[0001] This application claims benefit of U.S. Provisional
Application No. 61/911,642, filed Dec. 4, 2013, incorporated herein
by reference in its entirety.
BACKGROUND
[0003] There is an increasing demand for public health laboratories
to provide diagnoses and references of human pathogens in an
accurate and timely manner, particularly in the context of an
infectious disease outbreak where the pathogen is unknown. The
biggest challenge with traditional diagnosis is that people have to
know what they are looking for. For example, a respiratory infection
outbreak may be caused by any one of dozens of human pathogens, but
the current practice can only detect one or a few of these
pathogens at a time. This often results in delayed treatment and
inconclusive diagnoses, as in the case of the SARS-coronavirus
outbreak in China. This practice becomes especially problematic
when an unknown infection or a complicated infectious disease case
occurs. When unsure about the source of an infection, physicians
must order a large number of diagnostic tests to cover a spectrum
of pathogens. This is time intensive, financially costly, and
physically uncomfortable for patients who must provide blood,
sputum, and other tissue samples to support numerous diagnostic
tests. There has been no alternative to this approach for diagnosis
until very recently, with the development of next generation
sequencing (NGS) technologies. The NGS technologies, including 2nd
and 3rd generation DNA sequencing platforms, have started a
revolution in genomics and provided opportunities for its broad
application in many fields, including in the discovery of human
pathogens. Currently, NGS technology is used as a research tool,
rather than a diagnostic tool. The current limitations of NGS
technology are due to the scarcity of pathogen sequences in human
clinical samples, the resulting need for extensive deep
sequencing, and the complexity of the bioinformatics analysis
required to identify the pathogenic sequences. For
example, a human clinical sample contains on average about 1-100
viral genome sequences per 10 million human genome sequence reads,
which usually requires deep sequencing and subsequent bioinformatics
analysis to identify the viral sequences. It is a big challenge for
diagnostic laboratories to diagnose unknown infections in a fast,
accurate, and comprehensive manner. Many laboratories have
developed various strategies to address this challenge, from
consensus PCR assays that use degenerate primers to computational
subtraction of large sequence data in order to find possible
pathogens--with little success.
BRIEF SUMMARY
[0004] Disclosed are methods of enriching non-host nucleic acids
from a host sample comprising selectively amplifying nucleic acids
isolated from the host sample using non-host target primers to form
an enriched population of non-host nucleic acids.
[0005] Disclosed are methods of enriching non-human nucleic acids
from a human sample comprising selectively amplifying nucleic acids
isolated from the human sample using non-human target primers to
form an enriched population of non-human nucleic acids. In some
aspects, the nucleic acids isolated from the human sample can be
DNA or RNA.
[0006] Disclosed are methods of enriching non-human nucleic acids
from a human sample comprising selectively reverse transcribing
total RNA or mRNA isolated from the human sample to form first cDNA
strands using non-human target primers, wherein the first cDNA
strands form an enriched population of non-human nucleic acids.
[0007] Disclosed are methods of enriching non-human nucleic acids
from a human sample comprising reverse transcribing total RNA or
mRNA isolated from the human sample to form first cDNA strands; and
selectively amplifying the first cDNA strands with non-human target
primers to form an enriched population of non-human nucleic
acids.
[0008] Disclosed are methods of detecting non-host nucleic acids in
a host sample comprising selectively amplifying nucleic acids
isolated from the host sample using non-host target primers to form
an enriched population of non-host nucleic acids; and detecting the
non-host nucleic acids by sequencing the enriched population of
non-host nucleic acids.
[0009] Disclosed are methods of detecting non-host nucleic acids in
a host sample comprising selectively amplifying nucleic acids
isolated from the host sample using non-host target primers to form
an enriched population of non-host nucleic acids; and detecting the
non-host nucleic acids by sequencing the enriched population of
non-host nucleic acids, further comprising performing subtractive
hybridization against a reference population of host cDNAs, wherein
the subtractive hybridization results in a further enriched
population of cDNAs, wherein the subtractive hybridization occurs
prior to the step of detecting.
[0010] Disclosed are methods of detecting a non-host sequence in a
host sample comprising: selectively reverse transcribing total RNA
or mRNA isolated from the host sample to form first cDNA strands
using non-host target primers, wherein the first cDNA strands form
an enriched population of non-host nucleic acids; and detecting the
non-host sequence by sequencing the enriched population of non-host
nucleic acids.
[0011] Disclosed are methods of detecting a non-host sequence in a
host sample comprising: selectively reverse transcribing total RNA
or mRNA isolated from the host sample to form first cDNA strands
using non-host target primers, wherein the first cDNA strands form
an enriched population of non-host nucleic acids; and detecting the
non-host sequence by sequencing the enriched population of non-host
nucleic acids, further comprising performing subtractive
hybridization against a reference population of host cDNAs, wherein
the subtractive hybridization results in a further enriched
population of cDNAs, wherein the subtractive hybridization occurs
prior to the step of detecting the non-host sequence by sequencing
the enriched population of non-host nucleic acids.
[0012] Disclosed are methods of detecting non-host nucleic acids
from a host sample comprising reverse transcribing total RNA or
mRNA obtained from the host sample to form first cDNA strands;
selectively amplifying the first cDNA strands with non-host target
primers to form an enriched population of non-host nucleic acids;
and detecting the non-host sequence by sequencing the enriched
population of non-host nucleic acids.
[0013] Disclosed are methods of detecting non-host nucleic acids
from a host sample comprising reverse transcribing total RNA or
mRNA obtained from the host sample to form first cDNA strands;
selectively amplifying the first cDNA strands with non-host target
primers to form an enriched population of non-host nucleic acids;
and detecting the non-host sequence by sequencing the enriched
population of non-host nucleic acids, further comprising performing
subtractive hybridization against a reference population of host
cDNAs, wherein the subtractive hybridization results in a further
enriched population of cDNAs, wherein the subtractive hybridization
occurs prior to the step of detecting.
[0014] The disclosed methods can further comprise performing
subtractive hybridization against a population of reference human
cDNAs, wherein the subtractive hybridization results in a further
enriched population of non-human nucleic acids. Non-human target
primers can be eight, nine, ten, or eleven nucleotides in length.
In some aspects, non-human target primers do not hybridize to the
most abundant human transcripts. In some aspects, the most abundant
human transcripts can comprise at least 65% of all human
transcripts. In some aspects, the most abundant human transcripts
can be greater than 200 base pairs in length. In some aspects, the
most abundant human transcripts can be the 1000, 2000, 3000, 4000,
5000, 6000, 7000, 8000, 9000, 10000, 20000 most abundant human
transcripts or whole human transcriptome identified in the ENCODE
database.
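As a non-limiting illustration of how a set of "most abundant" transcripts might be selected, the following Python sketch ranks transcripts by abundance, optionally keeps only those longer than a minimum length, and reports the share of total abundance that the top N transcripts represent. The function name `top_abundant_transcripts`, the tuple layout, and the toy values are assumptions made for illustration only; they are not taken from the ENCODE database or from this disclosure.

```python
def top_abundant_transcripts(transcripts, n, min_length=0):
    """Select the n most abundant transcripts longer than min_length.

    transcripts: list of (name, length_bp, abundance) tuples.
    Returns (names, share), where share is the fraction of the total
    abundance of ALL input transcripts carried by the selected ones.
    """
    eligible = [t for t in transcripts if t[1] > min_length]
    ranked = sorted(eligible, key=lambda t: t[2], reverse=True)
    top = ranked[:n]
    total = sum(t[2] for t in transcripts)
    share = sum(t[2] for t in top) / total if total else 0.0
    return [t[0] for t in top], share


# Hypothetical toy data (arbitrary abundance units), for illustration only.
toy_transcripts = [
    ("tx_A", 1500, 500.0),
    ("tx_B", 900, 300.0),
    ("tx_C", 150, 120.0),   # excluded by the 200 bp length filter
    ("tx_D", 2200, 50.0),
    ("tx_E", 400, 30.0),
]

names, share = top_abundant_transcripts(toy_transcripts, n=2, min_length=200)
print(names, f"{share:.0%} of total abundance")
```

In practice, N could be raised until the reported share reaches a chosen threshold, such as the 65% figure mentioned above.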
[0015] Non-human target primers can comprise one or more of the
oligonucleotides in FIG. 10 or 11.
[0016] Non-human nucleic acids can be pathogenic sequences. For
example, pathogenic sequences can be from viruses, bacteria, fungi,
or any infectious agents.
[0017] Disclosed are methods of enriching non-host nucleic acids in
a host sample, wherein the non-host nucleic acids can be, but are
not limited to, non-human nucleic acids. The non-human nucleic
acids can be pathogenic sequences. For example, the pathogenic
sequences can be sequences from viruses, bacteria, fungi, protozoa,
parasites or any infectious agent.
[0018] Disclosed are methods of detecting non-host sequences in a
host sample, wherein detecting non-host sequences can be achieved
by detecting the enriched population of non-host nucleic acids. The
detection can be performed using any known molecular biology
technique, such as NGS technology, Sanger sequencing, ensemble
sequencing, capillary electrophoresis, single molecule sequencing,
hybridization, or microarray.
[0019] Additional advantages of the disclosed method and
compositions will be set forth in part in the description which
follows, and in part will be understood from the description, or
may be learned by practice of the disclosed method and
compositions. The advantages of the disclosed method and
compositions will be realized and attained by means of the elements
and combinations particularly pointed out in the appended claims.
It is to be understood that both the foregoing general description
and the following detailed description are exemplary and
explanatory only and are not restrictive of the invention as
claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
[0020] The accompanying drawings, which are incorporated in and
constitute a part of this specification, illustrate several
embodiments of the disclosed method and compositions and together
with the description, serve to explain the principles of the
disclosed method and compositions.
[0021] FIG. 1 is a schematic representation of one embodiment, the
PATHseq (Preferential Amplification of Pathogenic Sequences)
method. (1) Total mRNAs from clinical sample, including human mRNAs
and relatively scarce pathogenic mRNAs; (2) Total mRNAs are
transcribed into first strand cDNAs with P1 primer; (3) RNase H
cleaves RNAs in RNA-DNA duplex; (4) Reverse transcriptase (RT)
synthesizes secondary cDNA strands with P2 primers; (5) T7 RNA
polymerase synthesizes RNAs in the presence of T7 promoter; (6)
Synthesized anti-sense RNAs; (7) Synthesized RNAs are hybridized to
human reference (non-pathogenic) cDNA library. RNase H cleaves
bound RNAs (human RNAs) in RNA-DNA duplex; (8) Pathogenic RNAs are
enriched; (9) Reverse transcription; (10) RNase H cleaves RNAs in
RNA-DNA duplex; (11) T7 RNA polymerase synthesizes RNAs; (12) New
RNAs synthesized from enriched pathogenic RNAs are amplified
100-1000 fold.
[0022] FIG. 2 shows P1 (Poly d(T)-T7 promoter) primer used to form
first cDNA strands.
[0023] FIG. 3 shows numbers of oligonucleotides that do not match
the sequences of human transcripts.
[0024] FIG. 4 shows Next Generation Sequencing (NGS) instruments
and comparison of their capacities.
[0025] FIGS. 5A and 5B are tables showing the generation of sample
identifiers (barcodes) through the combination of 20+20
adaptors.
[0026] FIG. 6 shows the number of short oligos that do not match to
the most abundant human transcripts.
[0027] FIG. 7 shows the percentage of human viruses that can be
covered by short oligos. K-mers total number: Number of total short
oligos that match to the human virome (can have multiple matches
within one virus); K-mers number in short list: Number of total
short oligos that match at least one site for each virus.
[0028] FIG. 8 is an example operating environment.
[0029] FIG. 9 is a table showing the percent of the sequences in
the virome covered by the oligos.
[0030] FIG. 10 is a table showing 179 10mers that exclude the top
10,000 human transcripts.
[0031] FIG. 11 is a table showing 171 10mers that exclude the top
20,000 human transcripts.
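As a non-limiting illustration of the kind of coverage figure summarized in FIG. 7 and FIG. 9, the following Python sketch computes the fraction of a set of viral genomes that contain at least one matching site for any oligo in a short list. It assumes exact matching on one strand only, and the function name `virome_coverage` and the toy sequences are hypothetical; they are not real viral genomes and are not taken from the figures.

```python
def virome_coverage(viral_genomes, oligos):
    """Return (fraction, covered_names) for genomes with at least one oligo site.

    viral_genomes: dict mapping a genome name to its sequence string.
    oligos: iterable of short oligonucleotide sequences.
    Matching is exact and single-stranded (a simplifying assumption).
    """
    probes = [o.upper() for o in oligos]
    covered = sorted(
        name
        for name, seq in viral_genomes.items()
        if any(p in seq.upper() for p in probes)
    )
    fraction = len(covered) / len(viral_genomes) if viral_genomes else 0.0
    return fraction, covered


# Hypothetical toy sequences, for illustration only.
toy_virome = {
    "virus_1": "AAACCCGGG",
    "virus_2": "TTTTTTTT",
    "virus_3": "GGGTTTAAA",
}

fraction, covered = virome_coverage(toy_virome, ["CCC", "GGT"])
print(f"{fraction:.0%} of toy genomes covered: {covered}")
```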
DETAILED DESCRIPTION
[0032] The disclosed method and compositions may be understood more
readily by reference to the following detailed description of
particular embodiments and the example included therein and to the
Figures and their previous and following description.
[0033] It is to be understood that the disclosed method and
compositions are not limited to specific synthetic methods,
specific analytical techniques, or to particular reagents unless
otherwise specified, and, as such, may vary. It is also to be
understood that the terminology used herein is for the purpose of
describing particular embodiments only and is not intended to be
limiting.
[0034] Disclosed are materials, compositions, and components that
can be used for, can be used in conjunction with, can be used in
preparation for, or are products of the disclosed method and
compositions. These and other materials are disclosed herein, and
it is understood that when combinations, subsets, interactions,
groups, etc. of these materials are disclosed that while specific
reference of each various individual and collective combinations
and permutation of these compounds may not be explicitly disclosed,
each is specifically contemplated and described herein. For
example, if a class of molecules A, B, and C are disclosed as well
as a class of molecules D, E, and F and an example of a combination
molecule, A-D is disclosed, then even if each is not individually
recited, each is individually and collectively contemplated. Thus,
in this example, each of the combinations A-E, A-F, B-D, B-E, B-F,
C-D, C-E, and C-F are specifically contemplated and should be
considered disclosed from disclosure of A, B, and C; D, E, and F;
and the example combination A-D. Likewise, any subset or
combination of these is also specifically contemplated and
disclosed. Thus, for example, the sub-group of A-E, B-F, and C-E
are specifically contemplated and should be considered disclosed
from disclosure of A, B, and C; D, E, and F; and the example
combination A-D. This concept applies to all aspects of this
application including, but not limited to, steps in methods of
making and using the disclosed compositions. Thus, if there are a
variety of additional steps that can be performed it is understood
that each of these additional steps can be performed with any
specific embodiment or combination of embodiments of the disclosed
methods, and that each such combination is specifically
contemplated and should be considered disclosed.
A. Definitions
[0035] The phrase "non-host target primers" refers to
oligonucleotides that can serve as primers that are designed to not
hybridize to host transcripts. In some aspects, the non-host target
primers are designed such that they do not hybridize to the most
abundant host transcripts.
[0036] The phrase "non-human target primers" refers to
oligonucleotides that can serve as primers that are designed to not
hybridize to one or more human transcripts. In some aspects, the
non-human target primers are designed such that they do not
hybridize to the most abundant human transcripts. For example, the
most abundant human transcripts can be the top 1000, 2000, or 4000
transcripts found in the human transcriptome. Non-human target
primers can be eight, nine, ten, or eleven nucleotides in length.
In some aspects, the non-human target primers can comprise one or
more of the sequences in Tables 5-8 or 3. For example, the
non-human target primers can be one or more of the sequences in
Table 6.
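As a non-limiting illustration of how candidate non-human target primers of a given length might be enumerated computationally, the following Python sketch lists every k-mer that has no exact match, on either strand, in a set of host transcript sequences. Exact string matching is used here only as a simplifying stand-in for hybridization, and the function names (`reverse_complement`, `kmers_in`, `candidate_primers`) and the toy sequence are hypothetical and not taken from this disclosure.

```python
from itertools import product

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers_in(seq, k):
    """Return the set of all length-k substrings of seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_primers(host_transcripts, k):
    """Return sorted k-mers absent from the host transcripts on both strands."""
    blocked = set()
    for transcript in host_transcripts:
        found = kmers_in(transcript, k)
        blocked |= found
        blocked |= {reverse_complement(m) for m in found}
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return sorted(m for m in all_kmers if m not in blocked)


# Hypothetical toy transcript; real use would load the chosen set of
# most abundant host transcripts and use k = 8, 9, 10, or 11.
toy_host = ["ACGTAC"]
primers = candidate_primers(toy_host, k=2)
print(len(primers), "candidate 2-mers, e.g.", primers[:4])
```

For k = 10 the enumeration covers 4^10 (about one million) k-mers, which is small enough to hold in memory; a fuller design would also screen candidates for properties not modeled here, such as mismatch tolerance.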
[0037] "Non-host nucleic acids" refers to nucleic acids exogenous
to the nucleic acids of the host. Non-host nucleic acids can also
be referred to as non-host sequences.
[0038] "Non-human nucleic acids" refers to nucleic acids exogenous
to the nucleic acids of a human. Non-human nucleic acids can also
be referred to as non-human sequences.
[0039] "Host specific DNA" refers to the host's own DNA. For
example, if the host is human then human specific DNA refers to DNA
present in the human genome. If a human had a viral infection, the
human specific DNA would not include the viral DNA present in the
human.
[0040] The term "host" refers to the biological organism from which
nucleic acids are isolated. For example, a host can be, but is not
limited to, a human, a plant, or an animal such as a dog, cat,
horse, or cow.
[0041] The term "host sample" refers to a biological sample
obtained from the host. Examples of biological samples include, but
are not limited to, biological fluids, cells, tissue, hair, and any
combinations thereof. Examples of biological fluids include, but
are not limited to, saliva, blood, sputum, urine, an aspirate, a
secretion, and any combinations thereof.
[0042] The phrase "selectively amplifying" refers to the
amplification of nucleic acids with primers that allow for
amplification of selected targets (i.e., non-host nucleic
acids).
[0043] "Reference host cDNA population" refers to a control cDNA
population wherein the control is a control for the host sample. A
reference cDNA population is a cDNA population comprising
substantially pure host cDNAs and free from other substances or
non-host cDNAs. For example, if the host is a human, then a
reference host cDNA population would be a pathogen-free human cDNA
population.
[0044] The term "purification" or "pure" (e.g., with respect to a
cDNA population or a composition containing a pathogen) refers to
the process of removing components from a composition, the presence
of which is not desired. Purification is a relative term, and does
not require that all traces of the undesirable component be removed
from the composition. In the context of a reference host cDNA
population, purification includes such processes as centrifugation,
amplification or precipitation. Thus, the term "purified" does not
require absolute purity; rather, it is intended as a relative term.
Thus, for example, a purified cDNA preparation or population is one
in which the cDNA is more enriched than it is in its generative
environment, for instance within a cell or population of cells or
other nucleic acids in which it is replicated naturally or in an
artificial environment. A preparation of substantially pure cDNA
population can be purified such that the desired cDNA represents at
least 50% of the total nucleic acid content of the preparation. In
certain embodiments, a substantially pure cDNA population will
represent at least 60%, at least 70%, at least 80%, at least 85%,
at least 90%, or at least 95% or more of the total nucleic acid
content of the preparation.
[0045] It must be noted that as used herein and in the appended
claims, the singular forms "a", "an", and "the" include plural
reference unless the context clearly dictates otherwise. Thus, for
example, reference to "a non-host target primer" can include a
plurality of such primers, reference to "the host sample" can be a
reference to one or more host samples and equivalents thereof known
to those skilled in the art, and so forth.
[0046] Ranges may be expressed herein as from "about" one
particular value, and/or to "about" another particular value. When
such a range is expressed, also specifically contemplated and
considered disclosed is the range from the one particular value
and/or to the other particular value unless the context
specifically indicates otherwise. Similarly, when values are
expressed as approximations, by use of the antecedent "about," it
will be understood that the particular value forms another,
specifically contemplated embodiment that should be considered
disclosed unless the context specifically indicates otherwise. It
will be further understood that the endpoints of each of the ranges
are significant both in relation to the other endpoint, and
independently of the other endpoint unless the context specifically
indicates otherwise. Finally, it should be understood that all of
the individual values and sub-ranges of values contained within an
explicitly disclosed range are also specifically contemplated and
should be considered disclosed unless the context specifically
indicates otherwise. The foregoing applies regardless of whether in
particular cases some or all of these embodiments are explicitly
disclosed.
[0047] Throughout the description and claims of this specification,
the word "comprise" and variations of the word, such as
"comprising" and "comprises," means "including but not limited to,"
and is not intended to exclude, for example, other additives,
components, integers or steps. In particular, in methods stated as
comprising one or more steps or operations it is specifically
contemplated that each step comprises what is listed (unless that
step includes a limiting term such as "consisting of"), meaning
that each step is not intended to exclude, for example, other
additives, components, integers or steps that are not listed in the
step.
[0048] Unless defined otherwise, all technical and scientific terms
used herein have the same meanings as commonly understood by one of
skill in the art to which the disclosed method and compositions
belong. Although any methods and materials similar or equivalent to
those described herein can be used in the practice or testing of
the present method and compositions, the particularly useful
methods, devices, and materials are as described. Publications
cited herein and the material for which they are cited are hereby
specifically incorporated by reference. Nothing herein is to be
construed as an admission that the present invention is not
entitled to antedate such disclosure by virtue of prior invention.
No admission is made that any reference constitutes prior art. The
discussion of references states what their authors assert, and
applicants reserve the right to challenge the accuracy and
pertinence of the cited documents. It will be clearly understood
that, although a number of publications are referred to herein,
such reference does not constitute an admission that any of these
documents forms part of the common general knowledge in the
art.
B. Methods of Enriching
[0049] Disclosed are methods of enriching non-host nucleic acids
from a host sample comprising selectively amplifying nucleic acids
isolated from the host sample using non-host target primers to form
an enriched population of non-host nucleic acids. The non-host
nucleic acids isolated or obtained from a host sample can be DNA or
RNA.
[0050] In some aspects, the host can be a human. Thus, disclosed
are methods of enriching non-human nucleic acids from a human
sample comprising selectively amplifying nucleic acids isolated
from the human sample using non-human target primers to form an
enriched population of non-human nucleic acids.
[0051] Enriching non-host nucleic acids from a host sample allows
for an increase in the ratio of non-host to host nucleic acids. The
increased ratio of non-host nucleic acids provides an increased
quantity of non-host nucleic acids, which can result in easier
detection of the non-host nucleic acids. For example, if the
initial host sample contains an average of 1-10 pathogenic sequences
in 1 million host sequences, the disclosed methods can enrich the
pathogenic sequences about 1,000 times. Thus, about 0.1-1% of the
sample would be enriched pathogenic sequences.
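The arithmetic in the preceding example can be checked with a short Python sketch. It assumes, for illustration only, that enrichment multiplies the number of pathogenic sequences by the stated fold while the number of host sequences stays fixed; the function name `enriched_fraction` is hypothetical.

```python
def enriched_fraction(pathogen_count, host_count, fold):
    """Fraction of the sample that is pathogenic after enrichment.

    Assumes pathogenic sequences are multiplied by `fold` and host
    sequences are unchanged (a simplifying assumption).
    """
    enriched = pathogen_count * fold
    return enriched / (enriched + host_count)


# 1-10 pathogenic sequences per 1 million host sequences, enriched 1,000-fold.
low = enriched_fraction(1, 1_000_000, 1_000)
high = enriched_fraction(10, 1_000_000, 1_000)
print(f"about {low:.2%} to {high:.2%} of the sample")
```

Under these assumptions the result is slightly below 0.1% and 1% respectively, because the enriched sequences also enlarge the total, which is consistent with the approximate 0.1-1% range stated above.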
[0052] 1. Methods of Enriching DNA
[0053] Disclosed are methods of enriching non-host nucleic acids
from a host sample comprising selectively amplifying nucleic acids
isolated from the host sample using non-host target primers to form
an enriched population of non-host nucleic acids, wherein the
nucleic acids isolated from the host sample are DNA.
[0054] Disclosed are methods of enriching non-host nucleic acids
from a host sample comprising selectively amplifying nucleic acids
isolated from the host sample using non-host target primers to form
an enriched population of non-host nucleic acids, wherein the
nucleic acids isolated from the host sample are DNA, wherein the
non-host target primers are eight, nine, ten, or eleven nucleotides
in length.
[0055] Disclosed are methods of enriching non-host nucleic acids
from a host sample comprising selectively amplifying nucleic acids
isolated from the host sample using non-host target primers to form
an enriched population of non-host nucleic acids, wherein the
nucleic acids isolated from the host sample are DNA, wherein the
non-host target primers do not hybridize to the most abundant host
transcripts. In some aspects, the most abundant host transcripts
comprise at least 65% of all host transcripts. In some aspects, the
most abundant host transcripts can be greater than 200 base pairs
in length. In some aspects, the most abundant host transcripts can
be the 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000,
20000 most abundant host transcripts or whole host transcriptome
identified in the ENCODE database.
[0056] Disclosed are methods of enriching non-host nucleic acids
from a host sample comprising selectively amplifying nucleic acids
isolated from the host sample using non-host target primers to form
an enriched population of non-host nucleic acids, wherein the
nucleic acids isolated from the host sample are DNA, wherein the
non-host target primers comprise one or more of the
oligonucleotides in Tables 5-8.
[0057] Disclosed are methods of enriching non-host nucleic acids
from a host sample comprising selectively amplifying nucleic acids
isolated from the host sample using non-host target primers to form
an enriched population of non-host nucleic acids, wherein the
nucleic acids isolated from the host sample are DNA, wherein the
non-host sequences can be pathogenic sequences. In some aspects,
the pathogenic sequences can be from viruses, bacteria, fungi, or
any infectious agents.
[0058] In some aspects, the host can be a human. Thus, disclosed
are methods of enriching non-human nucleic acids from a human
sample comprising selectively amplifying nucleic acids isolated
from the human sample using non-human target primers to form an
enriched population of non-human nucleic acids, wherein the nucleic
acids isolated from the human sample are DNA.
[0059] Disclosed are methods of enriching non-human nucleic acids
from a human sample comprising selectively amplifying nucleic acids
isolated from the human sample using non-human target primers to
form an enriched population of non-human nucleic acids, wherein the
nucleic acids isolated from the human sample are DNA, wherein the
non-human target primers are eight, nine, ten, or eleven
nucleotides in length.
[0060] Disclosed are methods of enriching non-human nucleic acids
from a human sample comprising selectively amplifying nucleic acids
isolated from the human sample using non-human target primers to
form an enriched population of non-human nucleic acids, wherein the
nucleic acids isolated from the human sample are DNA, wherein the
non-human target primers do not hybridize to the most abundant
human transcripts. In some aspects, the most abundant human
transcripts comprise at least 65% of all human transcripts. In some
aspects, the most abundant human transcripts can be greater than
200 base pairs in length. In some aspects, the most abundant human
transcripts can be the 1000, 2000, 3000, 4000, 5000, 6000, 7000,
8000, 9000, 10000, 20000 most abundant human transcripts or whole
human transcriptome identified in the ENCODE database.
[0061] Disclosed are methods of enriching non-human nucleic acids
from a human sample comprising selectively amplifying nucleic acids
isolated from the human sample using non-human target primers to
form an enriched population of non-human nucleic acids, wherein the
nucleic acids isolated from the human sample are DNA, wherein the
non-human target primers comprise one or more of the
oligonucleotides in FIG. 10 or 11.
[0062] Disclosed are methods of enriching non-human nucleic acids
from a human sample comprising selectively amplifying nucleic acids
isolated from the human sample using non-human target primers to
form an enriched population of non-human nucleic acids, wherein the
nucleic acids isolated from the human sample are DNA, wherein the
non-human nucleic acids can be pathogenic sequences. In some
aspects, the pathogenic sequences can be from viruses, bacteria,
fungi, or any infectious agents.
[0063] i. Isolating
[0064] Methods of enriching non-host nucleic acids from a host
sample can include the isolation of DNA from a host sample. DNA
isolated from a host sample can include host specific DNA as well
as DNA from any non-host pathogens that may be present within the
host. DNA can be isolated from a host sample and used as a template
for non-host target primers. Techniques well-known in the art can
be used to isolate DNA from a sample.
[0065] ii. Selectively Amplifying
[0066] Selectively amplifying DNA isolated from a host sample with
non-host target primers can form an enriched population of non-host
nucleic acids.
[0067] Non-host target primers can be used to selectively amplify
non-host DNA. The non-host target primers are designed to
hybridize to non-host sequences and not to hybridize to host
sequences. In some aspects, the non-host target primers do not
hybridize to the most abundant host transcripts. For example,
non-human target primers can be used that do not hybridize to the
most abundant human transcripts. Any of the non-host target primers
disclosed herein can be used. For example, when the host is human,
the non-host (i.e. non-human) target primers can be one or more of
the primers comprising the sequences in FIG. 10 or 11. In some
aspects, the non-human target primers can be one or more sequences
in Table 3.
[0068] Non-host target primers can hybridize to non-host DNA and
can serve to prime synthesis of non-host DNA.
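One way to picture the primer design constraint is as a screen of short oligonucleotides against the host sequences. The sketch below enumerates every k-mer and keeps those with no exact match on either strand of a set of host sequences. Exact matching is used here only as a crude stand-in for "does not hybridize"; real hybridization tolerates mismatches and depends on reaction conditions. All function names are hypothetical, and this is not presented as the procedure used to produce the disclosed primers.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def host_kmers(host_sequences, k):
    """Collect every k-mer found on either strand of the host sequences."""
    seen = set()
    for seq in host_sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            seen.add(kmer)
            seen.add(reverse_complement(kmer))
    return seen


def candidate_non_host_primers(host_sequences, k):
    """Return all k-mers with no exact match on either strand of the
    host sequences (an assumed proxy for 'does not hybridize')."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    blocked = host_kmers(host_sequences, k)
    candidates = []
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        if kmer not in blocked:
            candidates.append(kmer)
    return candidates


# Toy example with k=2 so the result is easy to inspect by hand.
toy_candidates = candidate_non_host_primers(["ACGT"], 2)
```

For primer lengths of eight to eleven nucleotides the same enumeration covers 4^8 to 4^11 sequences, which is still tractable but slower than the toy case.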
[0069] In some aspects, RNAs can be synthesized from dsDNA. Any DNA
dependent RNA polymerase can be used for synthesis of RNA from
dsDNA. For example, T7 RNA polymerase can be used. Other RNA
polymerases such as, but not limited to, T3 and SP6 RNA polymerases
can also be used.
[0070] The resulting RNAs can be RNA copies of the selectively
amplified DNAs. Therefore, the RNA copies can be considered
selective RNAs or an enriched RNA population. The selective RNAs
comprise RNA copies of non-host DNA sequences. In some aspects,
selective RNAs can be anti-sense RNAs.
[0071] Selective amplification results in an enriched population of
non-host nucleic acids. In some aspects, the enriched population of
non-host nucleic acids can comprise a DNA population that contains
a higher percentage of non-host nucleic acids compared to the DNA
population before selective amplification.
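The notion of "a higher percentage of non-host nucleic acids" can be made concrete by comparing the non-host fraction before and after selective amplification. The following sketch assumes that molecules (or sequencing reads) have already been counted as host or non-host; the function names and the counts are hypothetical and serve only as an illustration.

```python
def non_host_fraction(host_count, non_host_count):
    """Fraction of the population that is non-host."""
    total = host_count + non_host_count
    if total <= 0:
        raise ValueError("population is empty")
    return non_host_count / total


def fold_enrichment(before, after):
    """Ratio of the non-host fraction after enrichment to the fraction
    before enrichment. Each argument is (host_count, non_host_count)."""
    fraction_before = non_host_fraction(*before)
    fraction_after = non_host_fraction(*after)
    if fraction_before == 0:
        raise ValueError("no non-host material before enrichment")
    return fraction_after / fraction_before


# Hypothetical counts for illustration only.
before = (96, 32)   # 25% non-host
after = (64, 64)    # 50% non-host
fold = fold_enrichment(before, after)
```

A value above 1 indicates that the population after selective amplification contains a higher percentage of non-host nucleic acids than the starting population.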
[0072] iii. Subtractive Hybridization
[0073] In some aspects, subtractive hybridization can be performed
after the selective amplification. Thus, disclosed are methods of
enriching non-host nucleic acids from a host sample comprising
selectively amplifying nucleic acids isolated from the host sample
using non-host target primers to form an enriched population of
non-host nucleic acids, wherein the nucleic acids are DNA, further
comprising performing subtractive hybridization against a
population of reference host nucleic acids, wherein the subtractive
hybridization results in a further enriched population of non-host
nucleic acids. The reference host nucleic acids can be, but are not
limited to, cDNAs.
[0074] Because the host sample can be a human sample, the disclosed
methods include methods of enriching non-human nucleic acids from a
human sample comprising selectively amplifying nucleic acids
isolated from the human sample using non-human target primers to
form an enriched population of non-human nucleic acids, wherein the
nucleic acids are DNA, further comprising performing subtractive
hybridization against a population of reference human nucleic
acids, wherein the subtractive hybridization results in a further
enriched population of non-human nucleic acids.
[0075] Performing subtractive hybridization can further enrich the
enriched population of non-host nucleic acids previously enriched
by selectively amplifying non-host sequences with non-host target
primers.
[0076] Subtractive hybridization comprises the subtraction or
removal of unwanted nucleic acids (e.g. host nucleic acids).
Subtractive hybridization can comprise hybridizing the selective or
enriched nucleic acids with a reference host cDNA population.
Reference host cDNA populations can be created using techniques
well-known in the art. In some instances, a reference host cDNA
population can be one or more commercially available host cDNA
libraries, such as, but not limited to, a human peripheral blood
mononuclear cell (PBMC) cDNA library.
[0077] Subtractive hybridization can comprise hybridization between
selective or enriched nucleic acids and a reference host cDNA
population. For example, the enriched population of non-host
nucleic acids can be transcribed to make a selective or
enriched RNA population. The selective or enriched RNA population
can hybridize to the reference host cDNA population. If any of the
RNAs are host RNAs, then the host RNA can hybridize to the
reference host cDNA population to form an RNA/DNA duplex and can be
removed using RNase H. Thus, the remaining RNAs would be the
selective RNAs that did not hybridize to the reference host cDNA
population and therefore would be considered non-host nucleic
acids. The removal of any host RNAs that were present in the
selective or enriched RNA population can result in a further
enriched RNA population comprising non-host RNAs.
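The logic of the subtraction step can be sketched in software as a filter: any RNA that can pair with a reference host cDNA is treated as forming an RNA/DNA duplex and is removed, and the remaining RNAs are kept. The sketch below assumes exact reverse-complement matching as a simplified model of hybridization and uses hypothetical function names; it illustrates the bookkeeping only, not the wet-lab chemistry.

```python
RNA_TO_DNA_COMPLEMENT = str.maketrans("ACGU", "TGCA")


def rna_reverse_complement_dna(rna):
    """DNA sequence (5'->3') that would pair perfectly with the RNA."""
    return rna.upper().translate(RNA_TO_DNA_COMPLEMENT)[::-1]


def subtract_host(rnas, reference_host_cdnas):
    """Keep only RNAs that find no perfectly complementary stretch in
    the reference host cDNA population (assumed duplex model)."""
    references = [cdna.upper() for cdna in reference_host_cdnas]
    kept = []
    for rna in rnas:
        target = rna_reverse_complement_dna(rna)
        forms_duplex = any(target in cdna for cdna in references)
        if not forms_duplex:
            # No duplex formed, so RNase H would leave this RNA intact.
            kept.append(rna)
    return kept


# Toy sequences for illustration only.
reference = ["TTGCATGCAA"]
rnas = ["GCAUGC", "AAAAAA"]
remaining = subtract_host(rnas, reference)
```

Here the first RNA is complementary to part of the reference cDNA and is removed, while the second has no complementary stretch and remains in the further enriched population.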
[0078] After subtractive hybridization, the further enriched RNA
population can be reverse transcribed and the methods described
below can be performed or the cDNA produced from the reverse
transcription can then be selectively amplified using the method
described herein. For example, the enrichment steps can be repeated
continuously to increase the amount of non-host nucleic acids. In
one aspect, the further enriched RNA population can be reverse
transcribed to form a first cDNA strand; the first cDNA strand can
be selectively amplified with non-host target primers to form a
second enriched population of non-host nucleic acids. The second
enriched population of non-host nucleic acids can undergo subtractive
hybridization resulting in a second further enriched RNA
population. The cycle of reverse transcription, selective
amplification, and subtractive hybridization can be repeated
indefinitely.
[0079] 2. Methods of Enriching RNA
[0080] i. Selective Reverse Transcription
[0081] Disclosed are methods of enriching non-host nucleic acids
from a host sample comprising selectively reverse transcribing
total or mRNA isolated from the host sample to form first cDNA
strands using non-host target primers, wherein the first cDNA
strands form an enriched population of non-host nucleic acids.
[0082] Disclosed are methods of enriching non-host nucleic acids
from a host sample comprising selectively reverse transcribing
total or mRNA isolated from the host sample to form first cDNA
strands using non-host target primers, wherein the first cDNA
strands form an enriched population of non-host nucleic acids,
wherein the non-host target primers are eight, nine, ten, or eleven
nucleotides in length.
[0083] Disclosed are methods of enriching non-host nucleic acids
from a host sample comprising selectively reverse transcribing
total or mRNA isolated from the host sample to form first cDNA
strands using non-host target primers, wherein the first cDNA
strands form an enriched population of non-host nucleic acids,
wherein the non-host target primers do not hybridize to the most
abundant host transcripts. In some aspects, the most abundant host
transcripts comprise at least 65% of all host transcripts. In some
aspects, the most abundant host transcripts can be greater than 200
base pairs in length. In some aspects, the most abundant host
transcripts can be the 1000, 2000, 3000, 4000, 5000, 6000, 7000,
8000, 9000, 10000, or 20000 most abundant host transcripts, or the
whole host transcriptome, identified in the ENCODE database.
[0084] Disclosed are methods of enriching non-host nucleic acids
from a host sample comprising selectively reverse transcribing
total or mRNA isolated from the host sample to form first cDNA
strands using non-host target primers, wherein the first cDNA
strands form an enriched population of non-host nucleic acids,
wherein the non-host target primers comprise the complement of one
or more of the oligonucleotides in Tables 5-8.
[0085] Disclosed are methods of enriching non-host nucleic acids
from a host sample comprising selectively reverse transcribing
total or mRNA isolated from the host sample to form first cDNA
strands using non-host target primers, wherein the first cDNA
strands form an enriched population of non-host nucleic acids,
wherein the non-host sequences can be pathogenic sequences. In some
aspects, the pathogenic sequences can be from viruses, bacteria,
fungi, or any infectious agents.
[0086] In some aspects, the host can be a human. Thus, disclosed are
methods of enriching non-human nucleic acids from a human sample
comprising selectively reverse transcribing total or mRNA isolated
from the human sample to form first cDNA strands using non-human
target primers, wherein the first cDNA strands form an enriched
population of non-human nucleic acids.
[0087] Disclosed are methods of enriching non-human nucleic acids
from a human sample comprising selectively reverse transcribing
total or mRNA isolated from the human sample to form first cDNA
strands using non-human target primers, wherein the first cDNA
strands form an enriched population of non-human nucleic acids,
wherein the non-human target primers are eight, nine, ten, or
eleven nucleotides in length.
[0088] Disclosed are methods of enriching non-human nucleic acids
from a human sample comprising selectively reverse transcribing
total or mRNA isolated from the human sample to form first cDNA
strands using non-human target primers, wherein the first cDNA
strands form an enriched population of non-human nucleic acids,
wherein the non-human target primers do not hybridize to the most
abundant human transcripts. In some aspects, the most abundant
human transcripts comprise at least 65% of all human transcripts.
In some aspects, the most abundant human transcripts can be greater
than 200 base pairs in length. In some aspects, the most abundant
human transcripts can be the 1000, 2000, 3000, 4000, 5000, 6000,
7000, 8000, 9000, 10000, 20000 most abundant human transcripts or
whole human transcriptome identified in the ENCODE database.
[0089] Disclosed are methods of enriching non-human nucleic acids
from a human sample comprising selectively reverse transcribing
total or mRNA isolated from the human sample to form first cDNA
strands using non-human target primers, wherein the first cDNA
strands form an enriched population of non-human nucleic acids,
wherein the non-human target primers comprise one or more of the
oligonucleotides in Tables 5-8.
[0090] Disclosed are methods of enriching non-human nucleic acids
from a human sample comprising selectively reverse transcribing
total or mRNA isolated from the human sample to form first cDNA
strands using non-human target primers, wherein the first cDNA
strands form an enriched population of non-human nucleic acids,
wherein the non-human nucleic acids can be pathogenic sequences. In
some aspects, the pathogenic sequences can be from viruses,
bacteria, fungi, or any infectious agents.
[0091] a. Isolating
[0092] Methods of enriching non-host nucleic acids from a host
sample can include the isolation of total RNA or mRNA from a host
sample. Total RNA or mRNA can be isolated from a host sample and
used as a template for creating first cDNA strands. Techniques
well-known in the art can be used to isolate total RNA or mRNA from
a sample.
[0093] b. Reverse Transcribing and Selective Amplification
Combined
[0094] Non-host target primers can be used to selectively amplify
total RNA or mRNA. The non-host target primers are designed to
hybridize to non-host sequences and not to hybridize to host
sequences. The complement of any of the non-host target primers
disclosed herein can be used. For example, when the host is human,
the non-host (i.e. non-human) target primers can be one or more of
the primers comprising the complement of the sequences in FIG. 10
or 11. In some aspects, the non-human target primers can be one or
more of the complement of the sequences in Table 3.
[0095] Non-host target primers can hybridize to non-host total RNA
or mRNA and can serve to prime synthesis of non-host cDNA
strands.
[0096] The combination of reverse transcription and selective
amplification can also be known as selective reverse
transcription.
[0097] c. Subtractive Hybridization
[0098] In some aspects, a further step of subtractive hybridization
can be performed.
[0099] Thus, disclosed are methods of enriching non-host nucleic
acids from a host sample comprising selectively reverse
transcribing total or mRNA isolated from the host sample to form
first cDNA strands using non-host target primers, wherein the first
cDNA strands form an enriched population of non-host nucleic acids,
further comprising performing subtractive hybridization against a
population of reference host cDNAs, wherein the subtractive
hybridization results in a further enriched population of non-host
nucleic acids.
[0100] Because the host sample can be a human sample, the disclosed
methods include methods of enriching non-human nucleic acids from a
human sample comprising selectively reverse transcribing total or
mRNA isolated from the human sample to form first cDNA strands
using non-human target primers, wherein the first cDNA strands form
an enriched population of non-human nucleic acids, further
comprising performing subtractive hybridization against a
population of reference human cDNAs, wherein the subtractive
hybridization results in a further enriched population of non-human
cDNAs.
[0101] Performing subtractive hybridization can further enrich the
enriched population of non-host nucleic acids previously enriched
by selectively reverse transcribing non-host sequences with
non-host target primers.
[0102] Subtractive hybridization comprises the subtraction or
removal of unwanted nucleic acids (e.g. host nucleic acids).
Subtractive hybridization can comprise hybridizing the selective or
enriched RNAs with a reference host cDNA population. Reference host
cDNA populations can be created using techniques well-known in the
art. In some instances, a reference host cDNA population can be one
or more commercially available host cDNA libraries, such as, but
not limited to, a human peripheral blood mononuclear cell (PBMC)
cDNA library.
[0103] The enriched population of non-host nucleic acids can be
transcribed to form an enriched or selective population of
RNA. Subtractive hybridization can comprise hybridization between
selective or enriched RNA and a reference host cDNA population. Any
host RNAs present in the selective RNAs can hybridize to the
reference host cDNA population. Host RNAs hybridized to the
reference host cDNA population form an RNA/DNA duplex and can be
removed using RNase H. Thus, the remaining RNAs would be the
selective RNAs that did not hybridize to the reference host cDNA
population and therefore would be considered non-host nucleic
acids. The removal of any host RNAs that were present in the
selective or enriched RNA population can result in a further
enriched RNA population comprising non-host RNAs.
[0104] After subtractive hybridization, the further enriched RNA
population can be treated as the original RNA obtained from the
sample and the further enriched RNA population can be used to
repeat one or more of the steps described above. For example, the
enrichment steps can be repeated continuously to increase the
amount of non-host nucleic acids. In one aspect, the further
enriched RNA population can be selectively reverse transcribed
using non-host target primers to form a second enriched population
of non-host nucleic acids. The cycle of selective reverse
transcription and subtractive hybridization can be repeated
indefinitely.
[0105] ii. Reverse Transcription and Then Selective
Amplification
[0106] Disclosed are methods of enriching non-host nucleic acids
from a host sample comprising reverse transcribing total RNA or
mRNA obtained from the host sample to form first cDNA strands; and
selectively amplifying the first cDNA strands with non-host target
primers to form an enriched population of non-host nucleic
acids.
[0107] Disclosed are methods of enriching non-host nucleic acids
from a host sample comprising reverse transcribing total RNA or
mRNA obtained from the host sample to form first cDNA strands; and
selectively amplifying the first cDNA strands with non-host target
primers to form an enriched population of non-host nucleic acids,
wherein the non-host target primers are eight, nine, ten, or eleven
nucleotides in length.
[0108] Disclosed are methods of enriching non-host nucleic acids
from a host sample comprising reverse transcribing total RNA or
mRNA obtained from the host sample to form first cDNA strands; and
selectively amplifying the first cDNA strands with non-host target
primers to form an enriched population of non-host nucleic acids,
wherein the non-host target primers do not hybridize to the most
abundant host transcripts. In some aspects, the most abundant host
transcripts comprise at least 65% of all host transcripts. In some
aspects, the most abundant host transcripts can be greater than 200
base pairs in length. In some aspects, the most abundant host
transcripts can be the 1000, 2000, 3000, 4000, 5000, 6000, 7000,
8000, 9000, 10000, or 20000 most abundant host transcripts, or the
whole host transcriptome, identified in the ENCODE database.
[0109] Disclosed are methods of enriching non-host nucleic acids
from a host sample comprising reverse transcribing total RNA or
mRNA obtained from the host sample to form first cDNA strands; and
selectively amplifying the first cDNA strands with non-host target
primers to form an enriched population of non-host nucleic acids,
wherein the non-host target primers comprise one or more of the
oligonucleotides in FIG. 10 or 11.
[0110] Disclosed are methods of enriching non-host nucleic acids
from a host sample comprising reverse transcribing total RNA or
mRNA obtained from the host sample to form first cDNA strands; and
selectively amplifying the first cDNA strands with non-host target
primers to form an enriched population of non-host nucleic acids,
wherein the non-host sequences can be pathogenic sequences. In some
aspects, the pathogenic sequences can be from viruses, bacteria,
fungi, or any infectious agents.
[0111] In some aspects, the host can be a human. Thus, also
disclosed are methods of enriching non-human nucleic acids from a
human sample comprising reverse transcribing total RNA or mRNA
obtained from the human sample to form first cDNA strands; and
selectively amplifying the first cDNA strands with non-human target
primers to form an enriched population of non-human nucleic
acids.
[0112] Disclosed are methods of enriching non-human nucleic acids
from a human sample comprising reverse transcribing total RNA or
mRNA obtained from the human sample to form first cDNA strands; and
selectively amplifying the first cDNA strands with non-human target
primers to form an enriched population of non-human nucleic acids,
wherein the non-human target primers are eight, nine, ten, or
eleven nucleotides in length.
[0113] Disclosed are methods of enriching non-human nucleic acids
from a human sample comprising reverse transcribing total RNA or
mRNA obtained from the human sample to form first cDNA strands; and
selectively amplifying the first cDNA strands with non-human target
primers to form an enriched population of non-human nucleic acids,
wherein the non-human target primers do not hybridize to the most
abundant human transcripts. In some aspects, the most abundant
human transcripts comprise at least 65% of all human transcripts.
In some aspects, the most abundant human transcripts can be greater
than 200 base pairs in length. In some aspects, the most abundant
human transcripts can be the 1000, 2000, 3000, 4000, 5000, 6000,
7000, 8000, 9000, 10000, 20000 most abundant human transcripts or
whole human transcriptome identified in the ENCODE database.
[0114] Disclosed are methods of enriching non-human nucleic acids
from a human sample comprising reverse transcribing total RNA or
mRNA obtained from the human sample to form first cDNA strands; and
selectively amplifying the first cDNA strands with non-human target
primers to form an enriched population of non-human nucleic acids,
wherein the non-human target primers comprise one or more of the
oligonucleotides in FIG. 10 or 11.
[0115] Disclosed are methods of enriching non-human nucleic acids
from a human sample comprising reverse transcribing total RNA or
mRNA obtained from the human sample to form first cDNA strands; and
selectively amplifying the first cDNA strands with non-human target
primers to form an enriched population of non-human nucleic acids,
wherein the non-human nucleic acids can be pathogenic sequences. In
some aspects, the pathogenic sequences can be from viruses,
bacteria, fungi, or any infectious agents.
[0116] a. Isolating
[0117] Methods of enriching non-host nucleic acids from a host
sample can include the isolation of total RNA or mRNA from a host
sample. Total RNA or mRNA can be isolated from a host sample and
used as a template for creating first cDNA strands. Techniques
well-known in the art can be used to isolate total RNA or mRNA from
a sample.
[0118] b. Reverse Transcribing
[0119] Isolated total RNA or mRNA obtained from a host sample can
be used as the template for creating a first cDNA strand for each
of the isolated total RNA or mRNAs. Methods for reverse
transcribing mRNA are well-known in the art. For example, a poly
d(T) primer in combination with reverse transcriptase can be used
to produce a first cDNA strand from mRNA.
[0120] Reverse transcriptases have RNA dependent DNA polymerase
activity and thus can produce a cDNA strand from an RNA template.
In some instances, reverse transcriptases can further comprise
RNase H activity, which results in the cleavage of RNA from an
RNA-DNA duplex. In some instances, RNase H can be added separately
from the reverse transcriptase.
[0121] Several types of primers can be used during reverse
transcription. For example, the primers can be but are not limited
to random primers, sequence specific primers, and/or poly d(T)
primers. When reverse transcribing total RNA, random primers and
poly d(T) primers are often used since the sequences of the RNA may
not be known.
[0122] In some aspects, the primers can be poly d(T) primers
further comprising an RNA polymerase promoter sequence. Thus, a
poly d(T)-RNA polymerase promoter primer can be used to form first
cDNA strands from a host sample comprising reverse transcribing
mRNA obtained from the host sample. In some aspects, the poly
d(T)-RNA polymerase promoter primer can comprise poly d(T)-T7
promoter sequences as shown in FIG. 2. Promoter sequences for other
RNA polymerases such as, but not limited to, T3 and SP6 RNA
polymerases can also be used.
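As an illustration of how such a primer can be assembled, the sketch below concatenates an RNA polymerase promoter sequence with a run of deoxythymidines. The T7 promoter sequence shown is a commonly used one and is an assumption here; it is not copied from FIG. 2. The d(T) length of 24 and the function name are likewise hypothetical choices made for illustration.

```python
# Commonly used T7 promoter sequence (assumed; not taken from FIG. 2).
T7_PROMOTER = "TAATACGACTCACTATAGGG"


def poly_dt_promoter_primer(promoter=T7_PROMOTER, dt_length=24):
    """Build a primer, written 5'->3', with the promoter at the 5' end
    and a poly d(T) tract at the 3' end to anneal to poly(A) tails."""
    if dt_length < 1:
        raise ValueError("dt_length must be at least 1")
    return promoter.upper() + "T" * dt_length


primer = poly_dt_promoter_primer()
```

Placing the promoter at the 5' end leaves the poly d(T) tract free to prime first cDNA strand synthesis, while the promoter becomes part of the resulting cDNA and is available for later RNA synthesis once the cDNA is double stranded.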
[0123] c. Selectively Amplifying
[0124] The first cDNA strands produced from reverse transcribing
total RNA or mRNA obtained from a host sample can be used for
selective amplification. Selectively amplifying the first cDNA
strands with non-host target primers can form an enriched
population of non-host nucleic acids.
[0125] Before selectively amplifying the first cDNA strands, the
total RNA or mRNA that served as the template for the first cDNA
strand can be removed. Removal of the total RNA or mRNA can be
performed by adding RNase H before selective amplification. In some
instances, the RNA dependent DNA polymerase used to produce the
first cDNA strand can have RNase H activity. In some instances,
RNase H can be added as a separate enzyme independent of the DNA
polymerase. After removing the total RNA or mRNA, or after
selectively isolating the first cDNA strands, only first cDNA
strands remain.
[0126] Non-host target primers can be used to selectively amplify
non-host cDNA. The non-host target primers are designed to
hybridize to non-host sequences and not to hybridize to host
sequences. Any of the non-host target primers disclosed herein can
be used. For example, when the host is human, the non-host (i.e.
non-human) target primers can be one or more of the primers
comprising the sequences in FIG. 10 or 11. In some aspects, the
non-human target primers can be one or more sequences in Table
3.
[0127] Non-host target primers can hybridize to non-host first cDNA
strands and can serve to prime synthesis of second non-host cDNA
strands.
[0128] RNAs can also be synthesized from dsDNA. For example, RNA
can be synthesized from dsDNA comprising non-host first cDNA strands
and non-host second cDNA strands. Any DNA dependent RNA polymerase
can be used for synthesis of RNA from dsDNA. For example, T7 RNA
polymerase can be used. Other RNA polymerases such as, but not
limited to, T3 and SP6 RNA polymerases can also be used.
[0129] The resulting RNAs can be RNA copies of the selectively
amplified cDNAs. Therefore, the RNA copies can be considered
selective RNAs or an enriched RNA population. The selective RNAs
comprise RNA copies of non-host cDNA sequences. In some aspects,
selective RNAs can be anti-sense RNAs.
[0130] Selective amplification results in an enriched population of
non-host nucleic acids. In some aspects, the enriched population of
non-host nucleic acids can comprise a cDNA population that contains
a higher percentage of non-host nucleic acids compared to the cDNA
population before selective amplification. In some aspects, the
enriched population of non-host nucleic acids can comprise a
population of RNAs that contains a higher percentage of non-host
RNA sequences compared to the mRNA obtained from the host
sample.
[0131] d. Subtractive Hybridization
[0132] Disclosed are methods of enriching non-host nucleic acids
from a host sample comprising reverse transcribing mRNA isolated
from the host sample to form first cDNA strands; and selectively
amplifying the first cDNA strands with non-host target primers to
form an enriched population of non-host nucleic acids, further
comprising performing subtractive hybridization against a
population of reference host cDNAs, wherein the subtractive
hybridization results in a further enriched population of non-host
cDNAs.
[0133] Because the host sample can be a human sample, the disclosed
methods include methods of enriching non-human nucleic acids from a
human sample comprising reverse transcribing mRNA isolated from the
human sample to form first cDNA strands; and selectively amplifying
the first cDNA strands with non-human target primers to form an
enriched population of non-human nucleic acids, further comprising
performing subtractive hybridization against a population of
reference human cDNAs, wherein the subtractive hybridization
results in a further enriched population of non-human cDNAs.
[0134] Performing subtractive hybridization can further enrich the
enriched population of non-host nucleic acids previously enriched
by selectively amplifying non-host sequences with non-host target
primers.
[0135] Subtractive hybridization comprises the subtraction or
removal of unwanted nucleic acids (e.g. host nucleic acids).
Subtractive hybridization can comprise hybridizing the selective or
enriched RNAs with a reference host cDNA population. Reference host
cDNA populations can be created using techniques well-known in the
art. In some instances, a reference host cDNA population can be one
or more commercially available host cDNA libraries, such as, but
not limited to, a human peripheral blood mononuclear cell (PBMC)
cDNA library.
[0136] Subtractive hybridization can comprise hybridization between
selective or enriched RNA and a reference host cDNA population. Any
host RNAs present in the selective RNAs can hybridize to the
reference host cDNA population. Host RNAs hybridized to the
reference host cDNA population form an RNA/DNA duplex and can be
removed using RNase H. Thus, the remaining RNAs would be the
selective RNAs that did not hybridize to the reference host cDNA
population and therefore would be considered non-host nucleic
acids. The removal of any host RNAs that were present in the
selective or enriched RNA population can result in a further
enriched RNA population comprising non-host RNAs.
[0137] After subtractive hybridization, the further enriched RNA
population can be treated as the original RNA obtained from the
sample and the further enriched RNA population can be used to
repeat one or more of the steps described above. For example, the
enrichment steps can be repeated continuously to increase the
amount of non-host nucleic acids. In one aspect, the further
enriched RNA population can be reverse transcribed to form a first
cDNA strand; the first cDNA strand can be selectively amplified
with non-host target primers to form a second enriched population
of non-host nucleic acids. The second enriched population of
non-host nucleic acids can undergo subtractive hybridization
resulting in a second further enriched RNA population. The cycle of
reverse transcription, selective amplification, and subtractive
hybridization can be repeated indefinitely.
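The effect of repeating the cycle can be pictured with a toy numerical model. In the sketch below, each round of selective amplification is assumed to carry over only a fraction of the host molecules, and each round of subtractive hybridization is assumed to remove a fraction of the host molecules that remain. Both rates, the starting counts, and the function name are hypothetical values chosen for illustration; they are not measured results.

```python
def run_enrichment_cycles(host, non_host, cycles,
                          host_carryover=0.1, subtraction_removal=0.9):
    """Track the non-host fraction over repeated enrichment rounds.

    host_carryover: assumed fraction of host molecules still copied
        during selective amplification (non-host molecules are assumed
        to be copied in full).
    subtraction_removal: assumed fraction of the remaining host
        molecules removed by subtractive hybridization.
    """
    if host < 0 or non_host <= 0 or cycles < 0:
        raise ValueError("invalid starting population or cycle count")
    host = float(host)
    fractions = [non_host / (host + non_host)]
    for _ in range(cycles):
        host *= host_carryover               # selective amplification
        host *= (1.0 - subtraction_removal)  # subtractive hybridization
        fractions.append(non_host / (host + non_host))
    return fractions


# Hypothetical starting population: 1% non-host.
history = run_enrichment_cycles(host=990, non_host=10, cycles=3)
```

Under these assumed rates the non-host fraction rises with every round, which is the qualitative behavior the repeated cycle is intended to produce.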
[0138] In some aspects, the subtractive hybridization can be
performed prior to the selective amplification.
C. Methods of Detecting or Diagnosing
[0139] Disclosed are methods of detecting non-host nucleic acids in
a host sample comprising selectively amplifying nucleic acids
isolated from the host sample using non-host target primers to form
an enriched population of non-host nucleic acids; and detecting the
non-host nucleic acids by sequencing the enriched population of
non-host nucleic acids.
[0140] In some instances, the host can be a human. For example,
disclosed are methods of detecting non-human nucleic acids in a
human sample comprising selectively amplifying nucleic acids
isolated from the human sample using non-host target primers to
form an enriched population of non-human nucleic acids; and
detecting the non-human nucleic acids by sequencing the enriched
population of non-human nucleic acids.
[0141] Non-host nucleic acids being detected can be pathogenic or
non-pathogenic sequences. For example, the non-host sequences can
be pathogenic sequences from viruses, bacteria, fungi or can be
from any infectious agent. In some instances, the non-host sequence
is unknown. For example, a human sample can be tested to determine
whether a non-human nucleic acid is present in the sample. Once
detected by sequencing, it can be determined whether the non-human
nucleic acid is a pathogenic sequence. In some instances, the
detected pathogenic sequence is unknown. A non-host nucleic acid,
such as a pathogenic sequence, being unknown can mean that the
pathogen sequence was not previously known, or is a new variant of
a known pathogen, or is a sequence polymorphism different from host
reference.
[0142] In an exemplary aspect, the methods and systems can be
implemented on a computer 801 as illustrated in FIG. 8 and
described below. For example, it can be determined whether the
detected non-host (e.g., non-human) sequence is known. For example,
it can be determined whether the non-host (e.g., non-human)
sequence is a pathogenic (or other) sequence. Such a determination
can be made by utilizing a computing device to compare the detected
non-host (e.g., non-human) sequence to one or more other sequences
(e.g., a database of known pathogenic sequences). Databases
comprising full and partial sequences of known pathogens are well
known in the art. For example, a program, such as BLAST, can be
used to determine the percent identity of the detected non-host
sequence to a known pathogen sequence. Known pathogen sequences can
be found in databases similar to the ENCODE database but that are
specific to the many known pathogens. For example, there are an HIV
Database and a Hemorrhagic Fever Viruses Database that comprise
genetic sequences of HIV and Ebola, respectively. The methods and
systems disclosed can utilize one or more computers to perform one
or more functions in one or more locations. FIG. 8 is a block
diagram illustrating an exemplary operating environment 800 for
performing the disclosed methods. This exemplary operating
environment 800 is only an example of an operating environment and
is not intended to suggest any limitation as to the scope of use or
functionality of operating environment architecture. Neither should
the operating environment 800 be interpreted as having any
dependency or requirement relating to any one or combination of
components illustrated in the exemplary operating environment
800.
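The comparison of a detected non-host sequence to known sequences can be sketched in Python as follows. This is a minimal illustration only: it uses a simple global alignment (Needleman-Wunsch with unit scores) rather than BLAST, and the function names, scoring values, and the short reference strings in the usage example are hypothetical placeholders, not real pathogen sequences or database entries.

```python
def percent_identity(query, subject, match=1, mismatch=-1, gap=-1):
    """Globally align two sequences and return their percent identity."""
    q, s = query.upper(), subject.upper()
    rows, cols = len(q) + 1, len(s) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if q[i - 1] == s[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + step,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back through the matrix, counting aligned and identical columns.
    i, j = len(q), len(s)
    identical = aligned = 0
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            step = match if q[i - 1] == s[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + step:
                if q[i - 1] == s[j - 1]:
                    identical += 1
                aligned += 1
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in the subject
        else:
            j -= 1  # gap in the query
        aligned += 1
    return 100.0 * identical / aligned if aligned else 0.0


def best_match(query, references):
    """Return (name, percent identity) of the closest reference sequence."""
    scored = ((name, percent_identity(query, seq))
              for name, seq in references.items())
    return max(scored, key=lambda item: item[1])


if __name__ == "__main__":
    # Placeholder strings standing in for entries of a reference database.
    references = {"ref_a": "ACGTACGT", "ref_b": "TTTTTTTT"}
    print(best_match("ACGTACGA", references))
```

A quadratic-time alignment of this kind is practical only for short sequences; a heuristic search tool such as BLAST would ordinarily be used against a large database.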
[0143] The present methods and systems can be operational with
numerous other general purpose or special purpose computing system
environments or configurations. Examples of well-known computing
systems, environments, and/or configurations that can be suitable
for use with the systems and methods comprise, but are not limited
to, personal computers, server computers, laptop devices, and
multiprocessor systems. Additional examples comprise set top boxes,
programmable consumer electronics, network PCs, minicomputers,
mainframe computers, distributed computing environments that
comprise any of the above systems or devices, and the like.
[0144] The processing of the disclosed methods and systems can be
performed by software components. The disclosed systems and methods
can be described in the general context of computer-executable
instructions, such as program modules, being executed by one or
more computers or other devices. Generally, program modules
comprise computer code, routines, programs, objects, components,
data structures, and/or the like that perform particular tasks or
implement particular abstract data types. The disclosed methods can
also be practiced in grid-based and distributed computing
environments where tasks are performed by remote processing devices
that are linked through a communications network. In a distributed
computing environment, program modules can be located in local
and/or remote computer storage media including memory storage
devices.
[0145] Further, one skilled in the art will appreciate that the
systems and methods disclosed herein can be implemented via a
general-purpose computing device in the form of a computer 801. The
computer 801 can comprise one or more components, such as one or
more processors 803, a system memory 812, and a bus 813 that
couples various components of the computer 801 including the one or
more processors 803 to the system memory 812. In the case of
multiple processors 803, the computer 801 can utilize parallel
computing.
[0146] The bus 813 can comprise one or more of several possible
types of bus structures, such as a memory bus, memory controller, a
peripheral bus, an accelerated graphics port, and a processor or
local bus using any of a variety of bus architectures. The bus 813,
and all buses specified in this description, can also be implemented
over a wired or wireless network connection and one or more of the
components of the computer 801, such as the one or more processors
803, a mass storage device 804, an operating system 805, sequence
software 806, sequence data 807, a network adapter 808, a system
memory 812, an Input/Output Interface 810, a display adapter 809, a
display device 811, and a human machine interface 802 can be
contained within one or more remote computing devices 814a,b,c at
physically separate locations.
[0147] The computer 801 typically comprises a variety of computer
readable media. Exemplary readable media can be any available media
that is accessible by the computer 801 and comprises, for example
and not meant to be limiting, both volatile and non-volatile media,
removable and non-removable media. The system memory 812 can
comprise computer readable media in the form of volatile memory,
such as random access memory (RAM), and/or non-volatile memory,
such as read only memory (ROM). The system memory 812 typically can
comprise data such as sequence data 807 and/or program modules such
as operating system 805 and sequence software 806 that are
accessible to and/or are operated on by the one or more processors
803.
[0148] In another aspect, the computer 801 can also comprise other
removable/non-removable, volatile/non-volatile computer storage
media. The mass storage device 804 can provide non-volatile storage
of computer code, computer readable instructions, data structures,
program modules, and other data for the computer 801. For example,
a mass storage device 804 can be a hard disk, a removable magnetic
disk, a removable optical disk, magnetic cassettes or other
magnetic storage devices, flash memory cards, CD-ROM, digital
versatile disks (DVD) or other optical storage, random access
memories (RAM), read only memories (ROM), electrically erasable
programmable read-only memory (EEPROM), and the like.
[0149] Optionally, any number of program modules can be stored on
the mass storage device 804, including by way of example, an
operating system 805 and sequence software 806. One or more of the
operating system 805 and sequence software 806 (or some combination
thereof) can comprise elements of the program modules and the
sequence software 806. Sequence data 807 can also be stored on the
mass storage device 804. Sequence data 807 can be stored in any of
one or more databases known in the art. Examples of such databases
comprise DB2®, Microsoft® Access, Microsoft® SQL
Server, Oracle®, MySQL, PostgreSQL, and the like. The databases
can be centralized or distributed across multiple locations within
the network 815.
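By way of a hedged example, sequence data 807 could be kept in a relational table such as the one sketched below in Python. SQLite is used here only because it ships with the Python standard library; it is substituted for the databases named above, and the table name, column names, and helper functions are hypothetical.

```python
import sqlite3


def create_store(path=":memory:"):
    """Open a database and create a hypothetical table for sequence data."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sequence_data ("
        " read_id TEXT PRIMARY KEY,"
        " sequence TEXT NOT NULL,"
        " best_hit TEXT,"
        " percent_identity REAL)"
    )
    return conn


def add_read(conn, read_id, sequence, best_hit=None, percent_identity=None):
    """Store one detected sequence and, if available, its closest known match."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO sequence_data VALUES (?, ?, ?, ?)",
            (read_id, sequence, best_hit, percent_identity),
        )


def unmatched_reads(conn):
    """Return the identifiers of reads with no recorded match."""
    rows = conn.execute(
        "SELECT read_id FROM sequence_data "
        "WHERE best_hit IS NULL ORDER BY read_id"
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = create_store()
    add_read(conn, "read_1", "ACGTACGT", "ref_a", 99.0)
    add_read(conn, "read_2", "GGCCGGCC")
    print(unmatched_reads(conn))
```

Reads returned by `unmatched_reads` would correspond to detected sequences for which no known sequence has been recorded.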
[0150] A user can enter commands and information into the computer
801 via an input device (not shown). Examples of such input devices
comprise, but are not limited to, a keyboard, pointing device
(e.g., a computer mouse, remote control), a microphone, a joystick,
a scanner, tactile input devices such as gloves, and other body
coverings, motion sensor, and the like. These and other input
devices can be connected to the one or more processors 803 via a
human machine interface 802 that is coupled to the bus 813, but can
be connected by other interface and bus structures, such as a
parallel port, game port, an IEEE 1394 Port (also known as a
Firewire port), a serial port, network adapter 808, and/or a
universal serial bus (USB).
[0151] In yet another aspect, a display device 811 can also be
connected to the bus 813 via an interface, such as a display
adapter 809. It is contemplated that the computer 801 can have more
than one display adapter 809 and the computer 801 can have more
than one display device 811. For example, a display device 811 can
be a monitor, an LCD (Liquid Crystal Display), light emitting diode
(LED) display, television, smart lens, smart glass, and/or a
projector. In addition to the display device 811, other output
peripheral devices can comprise components such as speakers (not
shown) and a printer (not shown) which can be connected to the
computer 801 via Input/Output Interface 810. Any step and/or result
of the methods can be output in any form to an output device. Such
output can be any form of visual representation, including, but not
limited to, textual, graphical, animation, audio, tactile, and the
like. The display 811 and computer 801 can be part of one device,
or separate devices.
[0152] The computer 801 can operate in a networked environment
using logical connections to one or more remote computing devices
814a,b,c. By way of example, a remote computing device 814a,b,c can
be a personal computer, computing station (e.g., workstation),
portable computer (e.g., laptop, mobile phone, tablet device),
smart device (e.g., smartphone, smart watch, activity tracker,
smart apparel, smart accessory), security and/or monitoring device,
a server, a router, a network computer, a peer device, edge device
or other common network node, and so on. Logical connections
between the computer 801 and a remote computing device 814a,b,c can
be made via a network 815, such as a local area network (LAN)
and/or a general wide area network (WAN). Such network connections
can be through a network adapter 808. A network adapter 808 can be
implemented in both wired and wireless environments. Such
networking environments are conventional and commonplace in
dwellings, offices, enterprise-wide computer networks, intranets,
and the Internet.
[0153] For purposes of illustration, application programs and other
executable program components such as the operating system 805 are
illustrated herein as discrete blocks, although it is recognized
that such programs and components can reside at various times in
different storage components of the computing device 801, and are
executed by the one or more processors 803 of the computer 801. An
implementation of sequence software 806 can be stored on or
transmitted across some form of computer readable media. Any of the
disclosed methods can be performed by computer readable
instructions embodied on computer readable media. Computer readable
media can be any available media that can be accessed by a
computer. By way of example and not meant to be limiting, computer
readable media can comprise "computer storage media" and
"communications media." "Computer storage media" can comprise
volatile and non-volatile, removable and non-removable media
implemented in any method or technology for storage of information
such as computer readable instructions, data structures, program
modules, or other data. Exemplary computer storage media can
comprise RAM, ROM, EEPROM, flash memory or other memory technology,
CD-ROM, digital versatile disks (DVD) or other optical storage,
magnetic cassettes, magnetic tape, magnetic disk storage or other
magnetic storage devices, or any other medium which can be used to
store the desired information and which can be accessed by a
computer.
[0154] The methods and systems can employ artificial intelligence
(AI) techniques such as machine learning and iterative learning.
Examples of such techniques include, but are not limited to, expert
systems, case based reasoning, Bayesian networks, behavior based
AI, neural networks, fuzzy systems, evolutionary computation (e.g.
genetic algorithms), swarm intelligence (e.g. ant algorithms), and
hybrid intelligent systems (e.g. Expert inference rules generated
through a neural network or production rules from statistical
learning).
[0155] 1. Starting from a DNA Sample
[0156] Disclosed are methods of detecting non-host nucleic acids
from a host sample comprising selectively amplifying nucleic acids
isolated from the host sample using non-host target primers to form
an enriched population of non-host nucleic acids, wherein the
nucleic acids isolated from the host sample are DNA; and detecting
the non-host sequence by sequencing the enriched population of
non-host nucleic acids.
[0157] Disclosed are methods of detecting non-host nucleic acids
from a host sample comprising selectively amplifying nucleic acids
isolated from the host sample using non-host target primers to form
an enriched population of non-host nucleic acids, wherein the
nucleic acids isolated from the host sample are DNA; and detecting
the non-host sequence by sequencing the enriched population of
non-host nucleic acids, wherein the non-host target primers are
eight, nine, ten, or eleven nucleotides in length.
[0158] Disclosed are methods of detecting non-host nucleic acids
from a host sample comprising selectively amplifying nucleic acids
isolated from the host sample using non-host target primers to form
an enriched population of non-host nucleic acids, wherein the
nucleic acids isolated from the host sample are DNA; and detecting
the non-host sequence by sequencing the enriched population of
non-host nucleic acids, wherein the non-host target primers do not
hybridize to the most abundant host transcripts. In some aspects,
the most abundant host transcripts comprise at least 65% of all
host transcripts. In some aspects, the most abundant host
transcripts can be greater than 200 base pairs in length. In some
aspects, the most abundant host transcripts can be the 1000, 2000,
3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, or 20000 most
abundant human transcripts or whole host transcriptome identified
in the ENCODE database.
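One way to picture the selection of short primers that do not hybridize to abundant host transcripts is the following Python sketch. It is offered only under stated assumptions: hybridization is approximated as an exact sequence match on either strand, the function names are hypothetical, and the transcript passed in the usage example is a made-up placeholder rather than an ENCODE entry.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def host_kmers(transcripts, k):
    """Collect every k-mer found in the host transcripts, on either strand."""
    seen = set()
    for transcript in transcripts:
        transcript = transcript.upper()
        for i in range(len(transcript) - k + 1):
            kmer = transcript[i:i + k]
            seen.add(kmer)
            seen.add(reverse_complement(kmer))
    return seen


def candidate_primers(transcripts, k):
    """Yield every k-mer that has no exact match in the host transcripts."""
    excluded = host_kmers(transcripts, k)
    for bases in product("ACGT", repeat=k):
        kmer = "".join(bases)
        if kmer not in excluded:
            yield kmer


if __name__ == "__main__":
    # Placeholder transcript; real input would be a set of host transcripts.
    primers = list(candidate_primers(["ACGTTGCAAGGCTA"], 8))
    print(len(primers), primers[:3])
```

For lengths of eight to eleven nucleotides the full set of possible k-mers (4^8 to 4^11) is small enough to enumerate directly, which is why exhaustive enumeration is used in this sketch.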
[0159] Disclosed are methods of detecting non-host nucleic acids
from a host sample comprising selectively amplifying nucleic acids
isolated from the host sample using non-host target primers to form
an enriched population of non-host nucleic acids, wherein the
nucleic acids isolated from the host sample are DNA; and detecting
the non-host sequence by sequencing the enriched population of
non-host nucleic acids, wherein the non-host target primers
comprise one or more of the oligonucleotides in FIG. 10 or 11.
[0160] Disclosed are methods of detecting non-host nucleic acids
from a host sample comprising selectively amplifying nucleic acids
isolated from the host sample using non-host target primers to form
an enriched population of non-host nucleic acids, wherein the
nucleic acids isolated from the host sample are DNA; and detecting
the non-host sequence by sequencing the enriched population of
non-host nucleic acids, wherein the non-host sequences can be
pathogenic sequences. In some aspects, the pathogenic sequences can
be from viruses, bacteria, fungi, or any infectious agents.
[0161] In some aspects, the host can be a human. Thus, disclosed
are methods of detecting non-human nucleic acids from a human
sample comprising selectively amplifying nucleic acids isolated
from the human sample using non-human target primers to form an
enriched population of non-human nucleic acids, wherein the nucleic
acids isolated from the human sample are DNA; and detecting the
non-human sequence by sequencing the enriched population of
non-human nucleic acids.
[0162] Disclosed are methods of detecting non-human nucleic acids
from a human sample comprising selectively amplifying nucleic acids
isolated from the human sample using non-human target primers to
form an enriched population of non-human nucleic acids, wherein the
nucleic acids isolated from the human sample are DNA; and detecting
the non-human sequence by sequencing the enriched population of
non-human nucleic acids, wherein the non-human target primers are
eight, nine, ten, or eleven nucleotides in length.
[0163] Disclosed are methods of detecting non-human nucleic acids
from a human sample comprising selectively amplifying nucleic acids
isolated from the human sample using non-human target primers to
form an enriched population of non-human nucleic acids, wherein the
nucleic acids isolated from the human sample are DNA; and detecting
the non-human sequence by sequencing the enriched population of
non-human nucleic acids, wherein the non-human target primers do
not hybridize to the most abundant human transcripts. In some
aspects, the most abundant human transcripts comprise at least 65%
of all human transcripts. In some aspects, the most abundant human
transcripts can be greater than 200 base pairs in length. In some
aspects, the most abundant human transcripts can be the 1000, 2000,
3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, or 20000 most
abundant human transcripts or whole human transcriptome identified
in the ENCODE database.
[0164] Disclosed are methods of detecting non-human nucleic acids
from a human sample comprising selectively amplifying nucleic acids
isolated from the human sample using non-human target primers to
form an enriched population of non-human nucleic acids, wherein the
nucleic acids isolated from the human sample are DNA; and detecting
the non-human sequence by sequencing the enriched population of
non-human nucleic acids, wherein the non-human target primers
comprise one or more of the oligonucleotides in FIG. 10 or 11.
[0165] Disclosed are methods of detecting non-human nucleic acids
from a human sample comprising selectively amplifying nucleic acids
isolated from the human sample using non-human target primers to
form an enriched population of non-human nucleic acids, wherein the
nucleic acids isolated from the human sample are DNA; and detecting
the non-human sequence by sequencing the enriched population of
non-human nucleic acids, wherein the non-human sequences can be
pathogenic sequences. In some aspects, the pathogenic sequences can
be from viruses, bacteria, fungi, or any infectious agents.
[0166] i. Isolating
[0167] Methods of detecting non-host nucleic acids from a host
sample can include the isolation of DNA from a host sample. DNA
isolated from a host sample can include host specific DNA as well
as DNA from any non-host pathogens that may be present within the
host. DNA can be isolated from a host sample and used as a template
for non-host target primers. Techniques well-known in the art can
be used to isolate DNA from a sample.
[0168] ii. Selectively Amplifying
[0169] Selectively amplifying DNA isolated from a host sample with
non-host target primers can form an enriched population of non-host
nucleic acids.
[0170] Non-host target primers can be used to selectively amplify
non-host DNA. The non-host target primers are designed to
hybridize to non-host sequences and not to hybridize to host
sequences. Any of the non-host target primers disclosed herein can
be used. For example, when the host is human, the non-host (i.e.
non-human) target primers can be one or more of the primers
comprising the sequences in FIG. 10 or 11. In some aspects, the
non-human target primers can be one or more sequences in Table
3.
[0171] Non-host target primers can hybridize to non-host DNA and
can serve to prime synthesis of non-host DNA.
[0172] In some aspects, RNAs can be synthesized from dsDNA. Any DNA
dependent RNA polymerase can be used for synthesis of RNA from
dsDNA. For example, T7 RNA polymerase can be used. Other RNA
polymerases such as, but not limited to, T3 and SP6 RNA polymerases
can also be used.
[0173] The resulting RNAs can be RNA copies of the selectively
amplified DNAs. Therefore, the RNA copies can be considered
selective RNAs or an enriched RNA population. The selective RNAs
comprise RNA copies of non-host DNA sequences. In some aspects,
selective RNAs can be anti-sense RNAs.
[0174] Selective amplification results in an enriched population of
non-host nucleic acids. In some aspects, the enriched population of
non-host nucleic acids can comprise a DNA population that contains
a higher percentage of non-host nucleic acids compared to the DNA
population before selective amplification.
[0175] iii. Subtractive Hybridization
[0176] In some aspects, subtractive hybridization can be performed
after the selective amplification. Thus, disclosed are methods of
detecting non-host nucleic acids from a host sample comprising
selectively amplifying nucleic acids isolated from the host sample
using non-host target primers to form an enriched population of
non-host nucleic acids, wherein the nucleic acids isolated from the
host sample are DNA; and detecting the non-host sequence by
sequencing the enriched population of non-host nucleic acids,
further comprising performing subtractive hybridization against a
population of reference host nucleic acids, wherein the subtractive
hybridization results in a further enriched population of non-host
nucleic acids, wherein the subtractive hybridization occurs prior
to the detecting step. The reference host nucleic acids can be, but
are not limited to, cDNAs.
[0177] Because the host sample can be a human sample, the disclosed
methods include methods of detecting non-human nucleic acids from a
human sample comprising selectively amplifying nucleic acids
isolated from the human sample using non-human target primers to
form an enriched population of non-human nucleic acids, wherein the
nucleic acids isolated from the human sample are DNA; and detecting
the non-human sequence by sequencing the enriched population of
non-human nucleic acids, further comprising performing subtractive
hybridization against a population of reference human nucleic
acids, wherein the subtractive hybridization results in a further
enriched population of non-human nucleic acids, wherein the
subtractive hybridization occurs prior to the detecting step.
[0178] Performing subtractive hybridization can further enrich the
enriched population of non-host nucleic acids previously enriched
by selectively amplifying non-host sequences with non-host target
primers.
[0179] Subtractive hybridization comprises the subtraction or
removal of unwanted nucleic acids (e.g. host nucleic acids).
Subtractive hybridization can comprise hybridizing the selective or
enriched nucleic acids with a reference host cDNA population.
Reference host cDNA populations can be created using techniques
well-known in the art. In some instances, a reference host cDNA
population can be one or more commercially available host cDNA
libraries, such as, but not limited to, a human peripheral blood
mononuclear cell (PBMC) cDNA library.
[0180] Subtractive hybridization can comprise hybridization between
selective or enriched nucleic acids and a reference host cDNA
population. For example, the enriched population of non-host
nucleic acids can be transcribed to make a selective or
enriched RNA population. The selective or enriched RNA population
can hybridize to the reference host cDNA population. If any of the
RNAs are host RNAs, then the host RNA can hybridize to the
reference host cDNA population to form an RNA/DNA duplex and can be
removed using RNase H. Thus, the remaining RNAs would be the
selective RNAs that did not hybridize to the reference host cDNA
population and therefore would be considered non-host nucleic
acids. The removal of any host RNAs that were present in the
selective or enriched RNA population can result in a further
enriched RNA population comprising non-host RNAs.
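The logic of the subtraction can be illustrated with a computational analogy, sketched below in Python. This is not the laboratory procedure itself: hybridization is approximated by shared k-mers, all sequences are assumed to be written in the same orientation, and the function names, the k-mer length, and the threshold are hypothetical.

```python
def kmers(seq, k):
    """Return the set of k-mers contained in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def subtract_host(reads, host_reference, k=12, max_shared=0.5):
    """Keep reads that share few k-mers with the reference host sequences.

    A read whose k-mers are mostly present in the reference is treated as
    having "hybridized" and is removed, by analogy with RNase H digestion
    of RNA/DNA duplexes.
    """
    host_index = set()
    for reference in host_reference:
        host_index |= kmers(reference, k)
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read shorter than k cannot be assessed; drop it
        shared = len(read_kmers & host_index) / len(read_kmers)
        if shared <= max_shared:
            kept.append(read)
    return kept


if __name__ == "__main__":
    # Placeholder sequences standing in for a reference library and reads.
    host = ["ACGTACGTACGT"]
    reads = ["ACGTACGTAC", "TTTTGGGGCCCC"]
    print(subtract_host(reads, host, k=4))
```

In this analogy the reads that remain play the role of the further enriched population.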
[0181] After subtractive hybridization, the further enriched RNA
population can be reverse transcribed and the methods described
below can be performed or the cDNA produced from the reverse
transcription can then be selectively amplified using the method
described herein. For example, the enrichment steps can be repeated
continuously to increase the amount of non-host nucleic acids. In
one aspect, the further enriched RNA population can be reverse
transcribed to form a first cDNA strand; the first cDNA strand can
be selectively amplified with non-host target primers to form a
second enriched population of non-host nucleic acids. The second
enriched population of non-host nucleic acids can undergo subtractive
hybridization resulting in a second further enriched RNA
population. The cycle of reverse transcription, selective
amplification, and subtractive hybridization can be repeated
indefinitely.
[0182] iv. Detecting
[0183] Disclosed are methods of detecting a non-host sequence in a
host sample comprising selectively amplifying nucleic acids
isolated from the host sample using non-host target primers to form
an enriched population of non-host nucleic acids, wherein the
nucleic acids are DNA; and detecting the non-host sequence by
sequencing the enriched population of non-host nucleic acids. In
some aspects the host can be a human. As such, disclosed are
methods of detecting a non-human sequence in a human sample
comprising selectively amplifying nucleic acids isolated from the
human sample using non-human target primers to form an enriched
population of non-human nucleic acids, wherein the nucleic acids
are DNA; and detecting the non-human sequence by sequencing the
enriched population of non-human nucleic acids.
[0184] Detecting a non-host sequence in a host sample can comprise
detecting nucleic acids obtained from the host that have been
enriched for non-host sequences. Nucleic acid molecules can be
detected by any method known in the art. For example, sequencing,
hybridization, or microarray can be used in the methods described
herein. Sequencing techniques such as, but not limited to, Next
Generation sequencing (NGS), Sanger sequencing, ensemble
sequencing, capillary electrophoresis, and single molecule
sequencing can be used.
[0185] Detecting the non-host sequence can be performed by
sequencing the enriched population of non-host nucleic acids.
Sequencing can be performed after performing one or more cycles of
selective amplification or after performing selective amplification
and subtractive hybridization. Also disclosed are methods of
detecting a non-host sequence in a host sample comprising reverse
transcribing mRNA isolated from the host sample to form first cDNA
strands; selectively amplifying the first cDNA strands with
non-host target primers to form an enriched population of non-host
nucleic acids; and detecting the non-host sequence by sequencing
the enriched population of non-host nucleic acids, further
comprising performing subtractive hybridization against a reference
population of host cDNAs, wherein the subtractive hybridization
results in a further enriched population of cDNAs, wherein the
subtractive hybridization occurs prior to the step of detecting the
non-host sequence.
[0186] NGS refers to sequencing technologies having increased
throughput as compared to traditional Sanger- and capillary
electrophoresis-based approaches, for example with the ability to
generate hundreds of thousands of relatively small sequence reads
at a time. Some examples of NGS techniques include, but are not
limited to, sequencing by synthesis, sequencing by ligation,
sequencing by hybridization, pyrosequencing, ion semiconductor
sequencing, polony sequencing, DNA nanoball sequencing, nanopore
sequencing, or single molecule sequencing. NGS platforms including,
but not limited to, Illumina MiSeq or HiSeq, Ion Torrent Proton,
Oxford Nanopore, 454, SOLiD, and Heliscope can be used.
[0187] Besides NGS, sequencing can be performed by chain
termination and gel separation, as described by Sanger et al., Proc
Natl Acad Sci USA, 74(12): 5463-67 (1977). Another conventional
sequencing method involves chemical degradation of nucleic acid
fragments. See, Maxam et al., Proc. Natl. Acad. Sci., 74: 560-564
(1977). Finally, methods have been developed based upon sequencing
by hybridization. See, e.g., Drmanac, et al. (Nature Biotech., 16:
54-58, 1998). The contents of each of these references are incorporated
by reference herein in their entirety for their teaching of sequencing
processes.
[0188] In some aspects, sequencing can be performed by the Sanger
sequencing technique. Classical Sanger sequencing involves a
single-stranded DNA template, a DNA primer, a DNA polymerase,
radioactively or fluorescently labeled nucleotides, and modified
nucleotides that terminate DNA strand elongation. If the label is
not attached to the dideoxynucleotide terminator (e.g., labeled
primer), or is a monochromatic label (e.g., radioisotope), then the
DNA sample is divided into four separate sequencing reactions,
containing four standard deoxynucleotides (dATP, dGTP, dCTP and
dTTP) and the DNA polymerase. To each reaction is added only one of
the four dideoxynucleotides (ddATP, ddGTP, ddCTP, or ddTTP). These
dideoxynucleotides are the chain-terminating nucleotides, lacking a
3'-OH group required for the formation of a phosphodiester bond
between two nucleotides during DNA strand elongation. If each of
the dideoxynucleotides carries a different label, however, (e.g., 4
different fluorescent dyes), then all the sequencing reactions can
be carried out together without the need for separate reactions.
Sanger sequencing is well-known in the art.
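The four-reaction readout described above can be mimicked with a short Python sketch. It is an idealized illustration in which a terminated fragment is obtained at every position, fragment lengths are counted from the first nucleotide added after the primer, and the function names are hypothetical.

```python
def synthesized_strand(template):
    """Return the strand a polymerase would synthesize, written 5' to 3'.

    The template is given 5' to 3', so the new strand is its reverse
    complement.
    """
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(template.upper()))


def termination_lengths(new_strand):
    """Map each dideoxynucleotide to the fragment lengths it produces.

    Each key stands for one of the four reactions (ddATP, ddCTP, ddGTP,
    ddTTP); a fragment of length n ends at the n-th added nucleotide.
    """
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for position, base in enumerate(new_strand, start=1):
        lanes[base].append(position)
    return lanes


def read_ladder(lanes):
    """Read the sequence by ordering fragments from shortest to longest."""
    calls = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in calls)


if __name__ == "__main__":
    strand = synthesized_strand("GATTACA")
    lanes = termination_lengths(strand)
    print(lanes)
    print(read_ladder(lanes))
```

Ordering the fragments by length across the four reactions recovers the synthesized strand, which is the principle behind reading a sequencing gel.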
[0189] Sequencing can also be accomplished by a single-molecule
sequencing by synthesis technique. Single molecule sequencing is
shown for example in Lapidus et al. (U.S. Pat. No. 7,169,560),
Quake et al. (U.S. Pat. No. 6,818,395), Harris (U.S. Pat. No.
7,282,337), Quake et al. (U.S. patent application number
2002/0164629), and Braslavsky, et al., PNAS (USA), 100: 3960-3964
(2003). The contents of each of these references are incorporated by
reference herein in their entirety. Briefly, a single-stranded
nucleic acid (e.g., DNA or cDNA) is hybridized to oligonucleotides
attached to a surface of a flow cell. The oligonucleotides may be
covalently attached to the surface or various attachments other
than covalent linking as known to those of ordinary skill in the
art may be employed. Moreover, the attachment may be indirect,
e.g., via a polymerase directly or indirectly attached to the
surface. The surface may be planar or otherwise, and/or may be
porous or non-porous, or any other type of surface known to those
of ordinary skill to be suitable for attachment. The nucleic acid
is then sequenced by imaging the polymerase-mediated addition of
fluorescently-labeled nucleotides incorporated into the growing
strand surface oligonucleotide, at single molecule resolution.
[0190] v. Diagnosing
[0191] The methods described herein can also be used for diagnosing
infectious disease. In some aspects, after detecting a pathogenic
sequence in a host sample, the host can be diagnosed as having an
infectious disease related to the pathogenic sequence. For example,
if a pathogenic sequence from an influenza virus is detected in a
host sample, then the host can be diagnosed as having an
influenza virus infection.
[0192] Thus disclosed are methods of diagnosing infectious disease
in a host comprising selectively amplifying nucleic acids isolated
from the host sample using non-host target primers to form an
enriched population of non-host nucleic acids, wherein the nucleic
acids are DNA; detecting a non-host sequence by sequencing the
enriched population of non-host nucleic acids; and diagnosing the
host with an infectious disease when the non-host sequence is a
pathogenic sequence.
[0193] 2. Starting from an RNA Sample
[0194] i. Selective Reverse Transcription
[0195] Disclosed are methods of detecting non-host nucleic acids
from a host sample comprising selectively reverse transcribing
total RNA or mRNA isolated from the host sample to form first cDNA
strands using non-host target primers, wherein the first cDNA
strands form an enriched population of non-host nucleic acids; and
detecting the non-host sequence by sequencing the enriched
population of non-host nucleic acids.
[0196] Disclosed are methods of detecting non-host nucleic acids
from a host sample comprising selectively reverse transcribing
total RNA or mRNA isolated from the host sample to form first cDNA
strands using non-host target primers, wherein the first cDNA
strands form an enriched population of non-host nucleic acids; and
detecting the non-host sequence by sequencing the enriched
population of non-host nucleic acids, wherein the non-host target
primers are eight, nine, ten, or eleven nucleotides in length.
[0197] Disclosed are methods of detecting non-host nucleic acids
from a host sample comprising selectively reverse transcribing
total RNA or mRNA isolated from the host sample to form first cDNA
strands using non-host target primers, wherein the first cDNA
strands form an enriched population of non-host nucleic acids; and
detecting the non-host sequence by sequencing the enriched
population of non-host nucleic acids, wherein the non-host target
primers do not hybridize to the most abundant host transcripts. In
some aspects, the most abundant host transcripts comprise at least
65% of all host transcripts. In some aspects, the most abundant
host transcripts can be greater than 200 base pairs in length. In
some aspects, the most abundant host transcripts can be the 1000,
2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, or 20000 most
abundant human transcripts or the whole host transcriptome identified
in the ENCODE database.
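The selection of "most abundant" transcripts described above can be sketched as follows. This is a minimal illustration only: the function name, the tuple layout of the input, and the way abundance is measured are assumptions made for the example and are not taken from the disclosure or from the ENCODE database.

```python
def select_abundant_transcripts(transcripts, min_fraction=0.65, min_length=200):
    """Pick the most abundant transcripts until they cover min_fraction.

    transcripts: list of (name, length_bp, abundance) tuples (hypothetical format).
    Transcripts of min_length base pairs or shorter are skipped.
    The fraction is measured against the abundance of all transcripts.
    """
    total = sum(abundance for _, _, abundance in transcripts)
    if total <= 0:
        return []
    # Rank from most to least abundant.
    ranked = sorted(transcripts, key=lambda t: t[2], reverse=True)
    selected, covered = [], 0.0
    for name, length_bp, abundance in ranked:
        if covered / total >= min_fraction:
            break  # the selected set already covers the requested share
        if length_bp <= min_length:
            continue  # keep only transcripts longer than min_length
        selected.append(name)
        covered += abundance
    return selected
```

Non-host target primers would then be screened only against the returned set, which keeps the screening step small while still covering most of the host signal.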
[0198] Disclosed are methods of detecting non-host nucleic acids
from a host sample comprising selectively reverse transcribing
total RNA or mRNA isolated from the host sample to form first cDNA
strands using non-host target primers, wherein the first cDNA
strands form an enriched population of non-host nucleic acids; and
detecting the non-host sequence by sequencing the enriched
population of non-host nucleic acids, wherein the non-host target
primers comprise the complement of one or more of the
oligonucleotides in Tables 5-8.
[0199] Disclosed are methods of detecting non-host nucleic acids
from a host sample comprising selectively reverse transcribing
total RNA or mRNA isolated from the host sample to form first cDNA
strands using non-host target primers, wherein the first cDNA
strands form an enriched population of non-host nucleic acids; and
detecting the non-host sequence by sequencing the enriched
population of non-host nucleic acids, wherein the non-host
sequences can be pathogenic sequences. In some aspects, the
pathogenic sequences can be from viruses, bacteria, fungi, or any
infectious agents.
[0200] In some aspects the host can be a human. Thus, disclosed are
methods of detecting non-human nucleic acids from a human sample
comprising selectively reverse transcribing total RNA or mRNA isolated
from the human sample to form first cDNA strands using non-human
target primers, wherein the first cDNA strands form an enriched
population of non-human nucleic acids; and detecting the non-human
sequence by sequencing the enriched population of non-human nucleic
acids.
[0201] Disclosed are methods of detecting non-human nucleic acids
from a human sample comprising selectively reverse transcribing
total RNA or mRNA isolated from the human sample to form first cDNA
strands using non-human target primers, wherein the first cDNA
strands form an enriched population of non-human nucleic acids; and
detecting the non-human sequence by sequencing the enriched
population of non-human nucleic acids, wherein the non-human target
primers are eight, nine, ten, or eleven nucleotides in length.
[0202] Disclosed are methods of detecting non-human nucleic acids
from a human sample comprising selectively reverse transcribing
total RNA or mRNA isolated from the human sample to form first cDNA
strands using non-human target primers, wherein the first cDNA
strands form an enriched population of non-human nucleic acids; and
detecting the non-human sequence by sequencing the enriched
population of non-human nucleic acids, wherein the non-human target
primers do not hybridize to the most abundant human transcripts. In
some aspects, the most abundant human transcripts comprise at least
65% of all human transcripts. In some aspects, the most abundant
human transcripts can be greater than 200 base pairs in length. In
some aspects, the most abundant human transcripts can be the 1000,
2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, or 20000 most
abundant human transcripts or the whole human transcriptome identified
in the ENCODE database.
[0203] Disclosed are methods of detecting non-human nucleic acids
from a human sample comprising selectively reverse transcribing
total RNA or mRNA isolated from the human sample to form first cDNA
strands using non-human target primers, wherein the first cDNA
strands form an enriched population of non-human nucleic acids; and
detecting the non-human sequence by sequencing the enriched
population of non-human nucleic acids, wherein the non-human target
primers comprise one or more of the oligonucleotides in Tables
5-8.
[0204] Disclosed are methods of detecting non-human nucleic acids
from a human sample comprising selectively reverse transcribing
total RNA or mRNA isolated from the human sample to form first cDNA
strands using non-human target primers, wherein the first cDNA
strands form an enriched population of non-human nucleic acids; and
detecting the non-human sequence by sequencing the enriched
population of non-human nucleic acids, wherein the non-human
sequences can be pathogenic sequences. In some aspects, the
pathogenic sequences can be from viruses, bacteria, fungi, or any
infectious agents.
[0205] a. Isolating
[0206] Methods of detecting non-host nucleic acids from a host
sample can include the isolation of total RNA or mRNA from a host
sample. Total RNA or mRNA can be isolated from a host sample and
used as a template for creating first cDNA strands. Techniques
well-known in the art can be used to isolate total RNA or mRNA from
a sample.
[0207] b. Reverse Transcribing and Selective Amplification
Combined
[0208] Non-host target primers can be used to selectively amplify
total RNA or mRNA. The non-host target primers are designed to
hybridize to non-host sequences and not to hybridize to host
sequences. The complement of any of the non-host target primers
disclosed herein can be used. For example, when the host is human,
the non-host (i.e. non-human) target primers can be one or more of
the primers comprising the complement of the sequences in FIG. 10 or
11. In some aspects, the non-human target primers can be one or
more of the complement of the sequences in Table 3.
[0209] Non-host target primers can hybridize to non-host total RNA
or mRNA and can serve to prime synthesis of non-host cDNA
strands.
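One way to picture the design constraint on non-host target primers (hybridize to non-host sequences, not to host sequences) is to enumerate short oligonucleotides that have no exact match in a set of host sequences. The sketch below is an assumption-laden illustration, not the procedure used to produce the primers of FIG. 10, FIG. 11, or Table 3: it treats "does not hybridize" as "no exact match on either strand", which ignores mismatch tolerance and melting temperature, and all function names are hypothetical.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def host_kmers(host_sequences, k):
    """Collect every k-mer found on either strand of the host sequences."""
    seen = set()
    for seq in host_sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):  # ignore windows with ambiguous bases
                seen.add(kmer)
                seen.add(reverse_complement(kmer))
    return seen


def candidate_non_host_primers(host_sequences, k=8):
    """List all k-mers with no exact match on either strand of the host set."""
    seen = host_kmers(host_sequences, k)
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return [kmer for kmer in all_kmers if kmer not in seen]
```

Excluding both strands is a conservative choice made for the example; a primer intended only for reverse transcription would need to avoid only sequences complementary to the host RNA.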
[0210] The combination of reverse transcription and selective
amplification can also be known as selective reverse
transcription.
[0211] c. Subtractive Hybridization
[0212] In some aspects, a further step of subtractive hybridization
can be performed.
[0213] Thus, disclosed are methods of detecting non-host nucleic
acids from a host sample comprising selectively reverse
transcribing total RNA or mRNA isolated from the host sample to form
first cDNA strands using non-host target primers, wherein the first
cDNA strands form an enriched population of non-host nucleic acids;
and detecting the non-host sequence by sequencing the enriched
population of non-host nucleic acids, further comprising performing
subtractive hybridization against a population of reference host
cDNAs, wherein the subtractive hybridization results in a further
enriched population of non-host nucleic acids, wherein the
subtractive hybridization occurs prior to the detecting step.
[0214] Because the host sample can be a human sample, the disclosed
methods include methods of detecting non-human nucleic acids from a
human sample comprising selectively reverse transcribing total RNA or
mRNA isolated from the human sample to form first cDNA strands
using non-human target primers, wherein the first cDNA strands form
an enriched population of non-human nucleic acids; and detecting
the non-human sequence by sequencing the enriched population of
non-human nucleic acids, further comprising performing subtractive
hybridization against a population of reference human cDNAs,
wherein the subtractive hybridization results in a further enriched
population of non-human cDNAs, wherein the subtractive
hybridization occurs prior to the detecting step.
[0215] Performing subtractive hybridization can further enrich the
enriched population of non-host nucleic acids previously enriched
by selectively reverse transcribing non-host sequences with
non-host target primers.
[0216] Subtractive hybridization comprises the subtraction or
removal of unwanted nucleic acids (e.g. host nucleic acids).
Subtractive hybridization can comprise hybridizing the selective or
enriched RNAs with a reference host cDNA population. Reference host
cDNA populations can be created using techniques well-known in the
art. In some instances, a reference host cDNA population can be one
or more commercially available host cDNA libraries, such as, but
not limited to, a human peripheral blood mononuclear cell (PBMC)
cDNA library.
[0217] The enriched population of non-host nucleic acids can be
transcribed to form an enriched or selective population of
RNA. Subtractive hybridization can comprise hybridization between
selective or enriched RNA and a reference host cDNA population. Any
host RNAs present in the selective RNAs can hybridize to the
reference host cDNA population. Host RNAs hybridized to the
reference host cDNA population form an RNA/DNA duplex and can be
removed using RNase H. Thus, the remaining RNAs would be the
selective RNAs that did not hybridize to the reference host cDNA
population and therefore would be considered non-host nucleic
acids. The removal of any host RNAs that were present in the
selective or enriched RNA population can result in a further
enriched RNA population comprising non-host RNAs.
[0218] After subtractive hybridization, the further enriched RNA
population can be treated as the original RNA obtained from the
sample and the further enriched RNA population can be used to
repeat one or more of the steps described above. For example, the
enrichment steps can be repeated continuously to increase the
amount of non-host nucleic acids. In one aspect, the further
enriched RNA population can be selectively reverse transcribed
using non-host target primers to form a second enriched population
of non-host nucleic acids. The cycle of selective reverse
transcription and subtractive hybridization can be repeated
indefinitely.
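The effect of repeating the enrichment cycle can be illustrated with a toy calculation. The retention values below (the share of host and non-host material that survives one cycle) are arbitrary assumptions chosen for the example; the disclosure does not state per-cycle efficiencies, and the function name is hypothetical.

```python
def non_host_fraction_after_cycles(host_amount, non_host_amount, cycles,
                                   host_retention=0.1, non_host_retention=0.9):
    """Return the non-host fraction after each enrichment cycle.

    host_retention and non_host_retention are illustrative assumptions:
    the share of each population carried through one cycle of selective
    reverse transcription and subtractive hybridization.
    """
    fractions = []
    for _ in range(cycles):
        host_amount *= host_retention
        non_host_amount *= non_host_retention
        fractions.append(non_host_amount / (host_amount + non_host_amount))
    return fractions


# Example: a sample that starts at 1 part non-host per 1000 parts total.
fractions = non_host_fraction_after_cycles(999.0, 1.0, cycles=3)
```

Under these assumed values the non-host fraction rises with every cycle, which is the qualitative behavior the repeated cycle is meant to achieve.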
[0219] d. Detecting
[0220] Disclosed are methods of detecting a non-host sequence in a
host sample comprising selectively reverse transcribing total RNA or
mRNA isolated from the host sample to form first cDNA strands using
non-host target primers, wherein the first cDNA strands form an
enriched population of non-host nucleic acids; and detecting the
non-host sequence by sequencing the enriched population of non-host
nucleic acids. In some aspects the host can be a human. As such,
disclosed are methods of detecting a non-human sequence in a human
sample comprising selectively reverse transcribing total RNA or mRNA
isolated from the human sample to form first cDNA strands using
non-human target primers, wherein the first cDNA strands form an
enriched population of non-human nucleic acids; and detecting the
non-human sequence by sequencing the enriched population of non-human
nucleic acids.
[0221] Detecting a non-host sequence in a host sample can comprise
detecting nucleic acids obtained from the host that have been
enriched for non-host sequences. Nucleic acid molecules can be
detected by any method known in the art. For example, sequencing,
hybridization, or microarray can be used in the methods described
herein. Sequencing techniques such as but not limited to Next
Generation sequencing (NGS), Sanger sequencing, ensemble
sequencing, capillary electrophoresis, and single molecule
sequencing can be used.
[0222] Detecting the non-host sequence can be performed by
sequencing the enriched population of non-host nucleic acids.
Sequencing can be performed after performing one or more cycles of
selective amplification or after performing selective amplification
and subtractive hybridization. Also disclosed are methods of
detecting a non-host sequence in a host sample comprising reverse
transcribing mRNA isolated from the host sample to form first cDNA
strands; selectively amplifying the first cDNA strands with
non-host target primers to form an enriched population of non-host
nucleic acids; and detecting the non-host sequence by sequencing
the enriched population of non-host nucleic acids, further
comprising performing subtractive hybridization against a reference
population of host cDNAs, wherein the subtractive hybridization
results in a further enriched population of cDNAs, wherein the
subtractive hybridization occurs prior to the step of detecting the
non-host sequence.
[0223] NGS refers to sequencing technologies having increased
throughput as compared to traditional Sanger- and capillary
electrophoresis-based approaches, for example with the ability to
generate hundreds of thousands of relatively small sequence reads
at a time. Some examples of NGS techniques include, but are not
limited to, sequencing by synthesis, sequencing by ligation,
sequencing by hybridization, pyrosequencing, ion semiconductor
sequencing, polony sequencing, DNA nanoball sequencing, nanopore
sequencing, or single molecule sequencing. NGS platforms including,
but not limited to, Illumina MiSeq or HiSeq, Ion Torrent Proton,
Oxford Nanopore, 454, SOLiD, and Heliscope can be used.
[0224] Besides NGS, sequencing can be performed by chain
termination and gel separation, as described by Sanger et al., Proc
Natl Acad Sci USA, 74(12): 5463-67 (1977). Another conventional
sequencing method involves chemical degradation of nucleic acid
fragments. See, Maxam et al., Proc. Natl. Acad. Sci., 74: 560-564
(1977). Finally, methods have been developed based upon sequencing
by hybridization. See, e.g., Drmanac, et al. (Nature Biotech., 16:
54-58, 1998). The contents of each reference are incorporated by
reference herein in their entirety for their teaching of sequencing
processes.
[0225] In some aspects, sequencing can be performed by the Sanger
sequencing technique. Classical Sanger sequencing involves a
single-stranded DNA template, a DNA primer, a DNA polymerase,
radioactively or fluorescently labeled nucleotides, and modified
nucleotides that terminate DNA strand elongation. If the label is
not attached to the dideoxynucleotide terminator (e.g., labeled
primer), or is a monochromatic label (e.g., radioisotope), then the
DNA sample is divided into four separate sequencing reactions,
containing four standard deoxynucleotides (dATP, dGTP, dCTP and
dTTP) and the DNA polymerase. To each reaction is added only one of
the four dideoxynucleotides (ddATP, ddGTP, ddCTP, or ddTTP). These
dideoxynucleotides are the chain-terminating nucleotides, lacking a
3'-OH group required for the formation of a phosphodiester bond
between two nucleotides during DNA strand elongation. If each of
the dideoxynucleotides carries a different label, however, (e.g., 4
different fluorescent dyes), then all the sequencing reactions can
be carried out together without the need for separate reactions.
Sanger sequencing is well-known in the art.
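The logic of the four-reaction format described above can be sketched in a few lines. This is a simplified model offered for illustration only: it assumes every possible chain-terminated fragment is produced, ignores the primer length, and uses hypothetical function names.

```python
def sanger_fragments(synthesized_strand):
    """Group chain-terminated fragment lengths by their terminating base.

    Each key stands for one reaction (ddA, ddC, ddG, ddT); each value lists
    the lengths of the fragments that end in that dideoxynucleotide.
    """
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for position, base in enumerate(synthesized_strand.upper(), start=1):
        lanes[base].append(position)
    return lanes


def read_sequence(lanes):
    """Recover the sequence by ordering fragments from shortest to longest."""
    by_length = {length: base
                 for base, lengths in lanes.items()
                 for length in lengths}
    return "".join(by_length[length] for length in sorted(by_length))
```

Sorting by fragment length plays the role of the size separation step, and the lane in which each length appears identifies the base at that position.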
[0226] Sequencing can also be accomplished by a single-molecule
sequencing by synthesis technique. Single molecule sequencing is
shown for example in Lapidus et al. (U.S. Pat. No. 7,169,560),
Quake et al. (U.S. Pat. No. 6,818,395), Harris (U.S. Pat. No.
7,282,337), Quake et al. (U.S. patent application number
2002/0164629), and Braslavsky, et al., PNAS (USA), 100: 3960-3964
(2003), the contents of each of which are incorporated by
reference herein in their entirety. Briefly, a single-stranded
nucleic acid (e.g., DNA or cDNA) is hybridized to oligonucleotides
attached to a surface of a flow cell. The oligonucleotides may be
covalently attached to the surface or various attachments other
than covalent linking as known to those of ordinary skill in the
art may be employed. Moreover, the attachment may be indirect,
e.g., via a polymerase directly or indirectly attached to the
surface. The surface may be planar or otherwise, and/or may be
porous or non-porous, or any other type of surface known to those
of ordinary skill to be suitable for attachment. The nucleic acid
is then sequenced by imaging the polymerase-mediated addition of
fluorescently-labeled nucleotides incorporated into the growing
strand surface oligonucleotide, at single molecule resolution.
[0227] e. Diagnosing
[0228] The methods described herein can also be used for diagnosing
infectious disease. In some aspects, after detecting a pathogenic
sequence in a host sample, the host can be diagnosed as having an
infectious disease related to the pathogenic sequence. For example,
if a pathogenic sequence from an influenza virus is detected in a
host sample, then the host can be diagnosed as having an
influenza virus infection.
[0229] Thus disclosed are methods of diagnosing infectious disease
in a host comprising selectively amplifying nucleic acids isolated
from the host sample using non-host target primers to form an
enriched population of non-host nucleic acids, wherein the nucleic
acids are DNA; detecting a non-host sequence by sequencing the
enriched population of non-host nucleic acids; and diagnosing the
host with an infectious disease when the non-host sequence is a
pathogenic sequence.
[0230] ii. Reverse Transcription and Then Selective
Amplification
[0231] Disclosed are methods of detecting non-host nucleic acids
from a host sample comprising reverse transcribing total RNA or
mRNA obtained from the host sample to form first cDNA strands;
selectively amplifying the first cDNA strands with non-host target
primers to form an enriched population of non-host nucleic acids;
and detecting the non-host sequence by sequencing the enriched
population of non-host nucleic acids.
[0232] Disclosed are methods of detecting non-host nucleic acids
from a host sample comprising reverse transcribing total RNA or
mRNA obtained from the host sample to form first cDNA strands;
selectively amplifying the first cDNA strands with non-host target
primers to form an enriched population of non-host nucleic acids;
and detecting the non-host sequence by sequencing the enriched
population of non-host nucleic acids, wherein the non-host target
primers are eight, nine, ten, or eleven nucleotides in length.
[0233] Disclosed are methods of detecting non-host nucleic acids
from a host sample comprising reverse transcribing total RNA or
mRNA obtained from the host sample to form first cDNA strands;
selectively amplifying the first cDNA strands with non-host target
primers to form an enriched population of non-host nucleic acids;
and detecting the non-host sequence by sequencing the enriched
population of non-host nucleic acids, wherein the non-host target
primers do not hybridize to the most abundant host transcripts. In
some aspects, the most abundant host transcripts comprise at least
65% of all host transcripts. In some aspects, the most abundant
host transcripts can be greater than 200 base pairs in length. In
some aspects, the most abundant host transcripts can be the 1000,
2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, or 20000 most
abundant human transcripts or the whole host transcriptome identified
in the ENCODE database.
[0234] Disclosed are methods of detecting non-host nucleic acids
from a host sample comprising reverse transcribing total RNA or
mRNA obtained from the host sample to form first cDNA strands;
selectively amplifying the first cDNA strands with non-host target
primers to form an enriched population of non-host nucleic acids;
and detecting the non-host sequence by sequencing the enriched
population of non-host nucleic acids, wherein the non-host target
primers comprise one or more of the oligonucleotides in FIG. 10 or
11.
[0235] Disclosed are methods of detecting non-host nucleic acids
from a host sample comprising reverse transcribing total RNA or
mRNA obtained from the host sample to form first cDNA strands;
selectively amplifying the first cDNA strands with non-host target
primers to form an enriched population of non-host nucleic acids;
and detecting the non-host sequence by sequencing the enriched
population of non-host nucleic acids, wherein the non-host
sequences can be pathogenic sequences. In some aspects, the
pathogenic sequences can be from viruses, bacteria, fungi, or any
infectious agents.
[0236] In some aspects, the host can be a human. Thus, also
disclosed are methods of enriching non-human nucleic acids from a
human sample comprising reverse transcribing total RNA or mRNA
obtained from the human sample to form first cDNA strands; and
selectively amplifying the first cDNA strands with non-human target
primers to form an enriched population of non-human nucleic acids;
and detecting the non-human sequence by sequencing the enriched
population of non-human nucleic acids.
[0237] Disclosed are methods of enriching non-human nucleic acids
from a human sample comprising reverse transcribing total RNA or
mRNA obtained from the human sample to form first cDNA strands; and
selectively amplifying the first cDNA strands with non-human target
primers to form an enriched population of non-human nucleic acids;
and detecting the non-human sequence by sequencing the enriched
population of non-human nucleic acids, wherein the non-human target
primers are eight, nine, ten, or eleven nucleotides in length.
[0238] Disclosed are methods of enriching non-human nucleic acids
from a human sample comprising reverse transcribing total RNA or
mRNA obtained from the human sample to form first cDNA strands; and
selectively amplifying the first cDNA strands with non-human target
primers to form an enriched population of non-human nucleic acids;
and detecting the non-human sequence by sequencing the enriched
population of non-human nucleic acids, wherein the non-human target
primers do not hybridize to the most abundant human transcripts. In
some aspects, the most abundant human transcripts comprise at least
65% of all human transcripts. In some aspects, the most abundant
human transcripts can be greater than 200 base pairs in length. In
some aspects, the most abundant human transcripts can be the 1000,
2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, or 20000 most
abundant human transcripts or the whole human transcriptome identified
in the ENCODE database.
[0239] Disclosed are methods of enriching non-human nucleic acids
from a human sample comprising reverse transcribing total RNA or
mRNA obtained from the human sample to form first cDNA strands; and
selectively amplifying the first cDNA strands with non-human target
primers to form an enriched population of non-human nucleic acids;
and detecting the non-human sequence by sequencing the enriched
population of non-human nucleic acids, wherein the non-human target
primers comprise one or more of the oligonucleotides in FIG. 10 or
11.
[0240] Disclosed are methods of enriching non-human nucleic acids
from a human sample comprising reverse transcribing total RNA or
mRNA obtained from the human sample to form first cDNA strands; and
selectively amplifying the first cDNA strands with non-human target
primers to form an enriched population of non-human nucleic acids;
and detecting the non-human sequence by sequencing the enriched
population of non-human nucleic acids, wherein the non-human
sequences can be pathogenic sequences. In some aspects, the
pathogenic sequences can be from viruses, bacteria, fungi, or any
infectious agents.
[0241] a. Isolating
[0242] Methods of detecting a non-host sequence in a host sample
can include the isolation of total RNA or mRNA from a host sample.
Total RNA or mRNA can be isolated from a host sample and used as a
template for creating cDNA. Techniques well-known in the art can be
used to isolate total RNA or mRNA from a sample.
[0243] b. Reverse Transcribing
[0244] Isolated total RNA or mRNA obtained from a host sample can
be used as the template for creating a first cDNA strand for each
of the isolated total RNA or mRNAs. Methods for reverse
transcribing mRNA are well-known in the art. For example, a poly
d(T) primer in combination with reverse transcriptase can be used
to produce a first cDNA strand from mRNA.
[0245] Reverse transcriptases have RNA-dependent DNA polymerase
activity and thus can produce a cDNA strand from an RNA template.
In some instances, reverse transcriptases can further comprise
RNase H activity, which results in the cleavage of RNA from an
RNA-DNA duplex. In some instances, RNase H can be added separately
from the reverse transcriptase.
[0246] Several types of primers can be used during reverse
transcription. For example, the primers can be but are not limited
to random primers, sequence specific primers, and/or poly d(T)
primers. When reverse transcribing total RNA, random primers and
poly d(T) primers are often used since the sequences of the RNA may
not be known.
[0247] In some aspects, the primers can be poly d(T) primers
further comprising an RNA polymerase promoter sequence. Thus, a
poly d(T)-RNA polymerase promoter primer can be used to form first
cDNA strands from a host sample comprising reverse transcribing
mRNA obtained from the host sample. In some aspects, the poly
d(T)-RNA polymerase promoter primer can comprise poly d(T)-T7
promoter sequences as shown in FIG. 2. Promoter sequences for other
RNA polymerases such as, but not limited to, T3 and SP6 RNA
polymerases can also be used.
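The layout of a poly d(T)-RNA polymerase promoter primer can be sketched as a promoter sequence at the 5' end followed by a run of deoxythymidines at the 3' end, so that the d(T) run anneals to the poly(A) tail of the mRNA. The promoter string below is the commonly cited T7 promoter consensus and is used here as an assumption; it is not the sequence of FIG. 2, and the d(T) length of 24 and the function name are likewise illustrative.

```python
# Commonly cited T7 promoter consensus (assumption; NOT taken from FIG. 2).
T7_PROMOTER = "TAATACGACTCACTATAGGG"


def poly_dt_promoter_primer(promoter=T7_PROMOTER, dt_length=24):
    """Build a 5'-promoter-(dT)n-3' primer string.

    dt_length is an illustrative default, not a value from the disclosure.
    """
    if dt_length <= 0:
        raise ValueError("dt_length must be positive")
    return promoter.upper() + "T" * dt_length


primer = poly_dt_promoter_primer()
```

Passing a different promoter string would model the T3 or SP6 variants mentioned above, provided the correct promoter sequence is supplied by the user.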
[0248] c. Selectively Amplifying
[0249] When reverse transcription is performed without the
combination of selective amplification, the step of selective
amplification is performed on the first cDNA strands produced from
reverse transcription.
[0250] The first cDNA strands produced from reverse transcribing
mRNA obtained from a host sample can be used for selective
amplification. Selectively amplifying the first cDNA strands with
non-host target primers can form an enriched population of non-host
nucleic acids.
[0251] Before selectively amplifying the first cDNA strands, the
mRNA that served as the template for the first cDNA strand can be
removed. Removal of the mRNA can be performed by adding RNase H
before selective amplification. In some instances, the
RNA-dependent DNA polymerase used to produce the first cDNA strand can
have RNase H activity. In some instances, RNase H can be added as a
separate enzyme independent of the DNA polymerase. After removing
the mRNA, or after selectively isolating the first cDNA strands,
only first cDNA strands remain.
[0252] Non-host target primers can be used to selectively amplify
non-host cDNA. The non-host target primers are designed to
hybridize to non-host sequences and not to hybridize to host
sequences. Any of the non-host target primers disclosed herein can
be used. For example, when the host is human, the non-host (i.e.
non-human) target primers can be one or more of the primers
comprising the sequences in FIG. 10 or 11. In some aspects, the
non-human target primers can be one or more of the sequences in
Table 3.
[0253] Non-host target primers can hybridize to non-host first cDNA
strands and can serve to prime synthesis of second non-host cDNA
strands.
[0254] RNAs can also be synthesized from dsDNA. For example, RNA
can be synthesized from dsDNA comprising non-host first cDNA strands
and non-host second cDNA strands. Any DNA-dependent RNA polymerase
can be used for synthesis of RNA from dsDNA. For example, T7 RNA
polymerase can be used. Other RNA polymerases such as, but not
limited to, T3 and SP6 RNA polymerases can also be used.
[0255] The resulting RNAs can be RNA copies of the selectively
amplified cDNAs. Therefore, the RNA copies can be considered
selective RNAs or an enriched RNA population. The selective RNAs
comprise RNA copies of non-host cDNA sequences. In some aspects,
selective RNAs can be anti-sense RNAs.
[0256] Selective amplification can result in an enriched population
of non-host nucleic acids. In some aspects, the enriched population
of non-host nucleic acids can comprise a cDNA population that
contains a higher percentage of non-host nucleic acids compared to
the cDNA population before selective amplification. In some
aspects, the enriched population of non-host nucleic acids can
comprise a population of RNAs that contains a higher percentage of
non-host RNA sequences compared to the mRNA obtained from the host
sample.
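The comparison of percentages described above can be expressed as a simple fold-enrichment calculation. The sketch assumes that host and non-host counts (for example, classified sequence reads) are available before and after selective amplification; the function names and the example numbers are hypothetical.

```python
def non_host_percentage(host_count, non_host_count):
    """Percentage of the population that is non-host."""
    total = host_count + non_host_count
    if total == 0:
        raise ValueError("population is empty")
    return 100.0 * non_host_count / total


def fold_enrichment(before, after):
    """Ratio of non-host percentage after versus before enrichment.

    before and after are (host_count, non_host_count) pairs.
    """
    pct_before = non_host_percentage(*before)
    if pct_before == 0:
        raise ValueError("no non-host material before enrichment")
    return non_host_percentage(*after) / pct_before


# Hypothetical counts: 1% non-host before, 20% non-host after.
ratio = fold_enrichment(before=(990, 10), after=(800, 200))
```

A ratio greater than 1 indicates that the population contains a higher percentage of non-host sequences after selective amplification than before it.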
[0257] d. Subtractive Hybridization
[0258] In some aspects, subtractive hybridization can be performed
when detecting non-host nucleic acids in a host sample.
[0259] Disclosed are methods of detecting non-host nucleic acids in
a host sample comprising reverse transcribing total RNA or mRNA
isolated from the host sample to form first cDNA strands;
selectively amplifying the first cDNA strands with non-host target
primers to form an enriched population of non-host nucleic acids;
and detecting the non-host nucleic acids by sequencing the enriched
population of non-host nucleic acids, further comprising performing
subtractive hybridization against a reference population of host
cDNAs, wherein the subtractive hybridization results in a further
enriched population of cDNAs, wherein the subtractive hybridization
occurs prior to the detecting step.
[0260] Because the host sample can be a human sample, the disclosed
methods include methods of detecting a non-human sequence in a
human sample comprising reverse transcribing total RNA or mRNA
isolated from the human sample to form first cDNA strands;
selectively amplifying the first cDNA strands with non-human target
primers to form an enriched population of non-human nucleic acids;
and detecting the non-human nucleic acids by sequencing the
enriched population of non-human nucleic acids, further comprising
performing subtractive hybridization against a reference population
of human cDNAs, wherein the subtractive hybridization results in a
further enriched population of cDNAs, wherein the subtractive
hybridization occurs prior to the detecting step.
[0261] Performing subtractive hybridization can further enrich the
enriched population of non-host nucleic acids previously enriched
by selectively amplifying non-host sequences with non-host target
primers.
[0262] Subtractive hybridization can comprise the subtraction or
removal of unwanted nucleic acids (e.g. host nucleic acids).
Subtractive hybridization can comprise hybridizing the selective or
enriched RNAs with a reference host cDNA population. Reference host
cDNA populations can be created using techniques well-known in the
art. In some instances, a reference host cDNA population can be one
or more commercially available host cDNA libraries, such as, but
not limited to, a human peripheral blood mononuclear cell (PBMC) cDNA
library.
[0263] Subtractive hybridization can comprise hybridization between
selective or enriched RNA and a reference host cDNA population. Any
host RNAs present in the selective RNAs can bind to the reference
host cDNA population. Host RNAs hybridized to the reference host
cDNA population form an RNA/DNA duplex and can be removed using
RNase H. Thus, the remaining RNAs would be the selective RNAs that
did not hybridize to the reference host cDNA population and
therefore would be considered non-host nucleic acids. The removal
of any host RNAs that were present in the selective or enriched RNA
population can result in a further enriched RNA population
comprising non-host RNAs.
[0264] After subtractive hybridization, the further enriched RNA
population can be treated as the original RNA obtained from the
sample and the further enriched RNA population can be used to
repeat one or more of the steps described above. For example, the
enrichment steps can be repeated continuously to increase the
amount of non-host nucleic acids. In one aspect, the further
enriched RNA population can be reverse transcribed to form a first
cDNA strand; the first cDNA strand can be selectively amplified
with non-host target primers to form a second enriched population
of non-host nucleic acids. The second enriched population of
non-host nucleic acids can undergo subtractive hybridization,
resulting in a second further enriched RNA population. The cycle of
reverse transcription, selective amplification, and subtractive
hybridization can be repeated indefinitely. The first cDNA strand
can be reverse transcribed using a poly d(T)-T7 promoter primer
listed in FIG. 2.
[0265] e. Detecting
[0266] Disclosed are methods of detecting a non-host sequence in a
host sample comprising reverse transcribing total RNA or mRNA
obtained from the host sample to form first cDNA strands;
selectively amplifying the first cDNA strands with non-host target
primers to form an enriched population of non-host nucleic acids;
and detecting the non-host sequence by sequencing the enriched
population of non-host nucleic acids. In some aspects the host can
be a human. As such, disclosed are methods of detecting a non-human
sequence in a human sample comprising reverse transcribing total
RNA or mRNA obtained from the human sample to form first cDNA
strands; selectively amplifying the first cDNA strands with
non-human target primers to form an enriched population of
non-human nucleic acids; and detecting the non-human sequence by
sequencing the enriched population of non-human nucleic acids.
[0267] Detecting a non-host sequence in a host sample can comprise
detecting nucleic acids obtained from the host that have been
enriched for non-host sequences. Nucleic acid molecules can be
detected by any method known in the art. For example, sequencing,
hybridization, or microarray can be used in the methods described
herein. Sequencing techniques such as but not limited to Next
Generation sequencing (NGS), Sanger sequencing, ensemble
sequencing, capillary electrophoresis, and single molecule
sequencing can be used.
[0268] Detecting the non-host sequence can be performed by
sequencing the enriched population of non-human nucleic acids.
Sequencing can be performed after performing one or more cycles of
selective amplification or after performing selective amplification
and subtractive hybridization. Also disclosed are methods of
detecting a non-host sequence in a host sample comprising reverse
transcribing mRNA isolated from the host sample to form first cDNA
strands; selectively amplifying the first cDNA strands with
non-host target primers to form an enriched population of non-host
nucleic acids; and detecting the non-host sequence by sequencing
the enriched population of non-host nucleic acids, further
comprising performing subtractive hybridization against a reference
population of host cDNAs, wherein the subtractive hybridization
results in a further enriched population of cDNAs, wherein the
subtractive hybridization occurs prior to the step of detecting the
non-host sequence.
[0269] NGS refers to sequencing technologies having increased
throughput as compared to traditional Sanger- and capillary
electrophoresis-based approaches, for example with the ability to
generate hundreds of thousands of relatively small sequence reads
at a time. Some examples of NGS techniques include, but are not
limited to, sequencing by synthesis, sequencing by ligation,
sequencing by hybridization, pyrosequencing, ion semiconductor
sequencing, polony sequencing, DNA nanoball sequencing, nanopore
sequencing, or single molecule sequencing. NGS platforms including,
but not limited to, Illumina MiSeq or HiSeq, Ion Torrent Proton,
Oxford Nanopore, 454, SOLiD, and Heliscope can be used.
[0270] Besides NGS, sequencing can be performed by chain
termination and gel separation, as described by Sanger et al., Proc.
Natl. Acad. Sci. USA, 74(12): 5463-67 (1977). Another conventional
sequencing method involves chemical degradation of nucleic acid
fragments. See Maxam et al., Proc. Natl. Acad. Sci., 74: 560-564
(1977). Finally, methods have been developed based upon sequencing
by hybridization. See, e.g., Drmanac et al., Nature Biotech., 16:
54-58 (1998). The contents of each of these references are
incorporated by reference herein in their entirety for their
teaching of sequencing processes.
[0271] In some aspects, sequencing can be performed by the Sanger
sequencing technique. Classical Sanger sequencing involves a
single-stranded DNA template, a DNA primer, a DNA polymerase,
radioactively or fluorescently labeled nucleotides, and modified
nucleotides that terminate DNA strand elongation. If the label is
not attached to the dideoxynucleotide terminator (e.g., labeled
primer), or is a monochromatic label (e.g., radioisotope), then the
DNA sample is divided into four separate sequencing reactions,
containing four standard deoxynucleotides (dATP, dGTP, dCTP and
dTTP) and the DNA polymerase. To each reaction is added only one of
the four dideoxynucleotides (ddATP, ddGTP, ddCTP, or ddTTP). These
dideoxynucleotides are the chain-terminating nucleotides, lacking a
3'-OH group required for the formation of a phosphodiester bond
between two nucleotides during DNA strand elongation. If each of
the dideoxynucleotides carries a different label, however, (e.g., 4
different fluorescent dyes), then all the sequencing reactions can
be carried out together without the need for separate reactions.
Sanger sequencing is well-known in the art.
[0272] Sequencing can also be accomplished by a single-molecule
sequencing by synthesis technique. Single molecule sequencing is
shown for example in Lapidus et al. (U.S. Pat. No. 7,169,560),
Quake et al. (U.S. Pat. No. 6,818,395), Harris (U.S. Pat. No.
7,282,337), Quake et al. (U.S. patent application number
2002/0164629), and Braslavsky, et al., PNAS (USA), 100: 3960-3964
(2003), the contents of each of these references is incorporated by
reference herein in its entirety. Briefly, a single-stranded
nucleic acid (e.g., DNA or cDNA) is hybridized to oligonucleotides
attached to a surface of a flow cell. The oligonucleotides may be
covalently attached to the surface or various attachments other
than covalent linking as known to those of ordinary skill in the
art may be employed. Moreover, the attachment may be indirect,
e.g., via a polymerase directly or indirectly attached to the
surface. The surface may be planar or otherwise, and/or may be
porous or non-porous, or any other type of surface known to those
of ordinary skill to be suitable for attachment. The nucleic acid
is then sequenced by imaging the polymerase-mediated addition of
fluorescently-labeled nucleotides incorporated into the growing
strand surface oligonucleotide, at single molecule resolution.
[0273] f. Diagnosing
[0274] The methods described herein can also be used for diagnosing
infectious disease. In some aspects, after detecting a pathogenic
sequence in a host sample, the host can be diagnosed as having an
infectious disease related to the pathogenic sequence. For example,
if a pathogenic sequence from an influenza virus is detected in a
host sample, then the host can be diagnosed with having an
influenza virus infection.
[0275] Thus disclosed are methods of diagnosing infectious disease
in a host comprising reverse transcribing mRNA isolated from a
sample obtained from the host to form first cDNA strands;
selectively amplifying the first cDNA strands with non-host target
primers to form an enriched population of non-host nucleic acids;
detecting a non-host sequence by sequencing the enriched population
of non-host nucleic acids; and diagnosing the host with an
infectious disease when the non-host sequence is a pathogenic
sequence.
D. Non-Host Target Primers
[0276] Disclosed herein are non-host target primers that can be
used in any of the disclosed methods. Non-host target primers are
primers that do not hybridize (i.e. are not complementary) to any
host transcripts or to the most abundant host transcripts. Non-host
target primers are designed to target a sequence other than host
nucleic acid sequences. Non-host target primers can hybridize to
non-host sequences present in a host sample. Because the non-host
sequences present within a host sample may not be known, the
non-host target primers are not necessarily specific to non-host
sequences; however they are specifically designed not to hybridize
to host sequences or not to hybridize to the most abundant host
transcripts. Thus, the non-host target primers can be considered
random although they are not completely random because they are
specifically designed not to hybridize to the most abundant
transcripts of the host.
[0277] Non-host target primers can range in size depending on what
the host is and depending on what the non-host is. Because the
non-host target primers are designed not to bind to host nucleic
acid sequences, the non-host target primers are dependent on the
sequence of the host nucleic acids. For example, if the host is
human then the non-host target primers are considered non-human
target primers and are dependent on human nucleic acid sequences.
Based on the human nucleic sequences, the non-human target primers
can be designed so that they do not hybridize to the known human
sequences. Non-human target primers can be eight, nine, ten or
eleven nucleotides in length. When the host is not human, the
non-host target primers can be at least 5, 6, 7, 8, 9, 10, 15, 20,
25, or 30 nucleotides in length.
[0278] The length of the non-host target primer is based on its
ability to exclude the majority of host transcripts while retaining
the ability to hybridize to non-host sequences. For example, a
9-mer oligo is expected to find one match in a random sequence
roughly every 4^9 × 9 = 2,359,296 bases, which is larger than the
genome size of many human pathogens. This means that a 9-mer oligo,
although it can exclude all human transcripts, is less likely to
pick up (amplify) all potential pathogen sequences.
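The arithmetic above can be reproduced for the primer lengths mentioned for non-human target primers (eight to eleven nucleotides). The following Python sketch simply evaluates the expression used in the text, 4^k × k; the function name is hypothetical and the sketch makes no claim beyond that formula.

```python
def expected_bases_per_match(k: int) -> int:
    """Evaluate the text's expression 4^k * k for a primer of length k."""
    return (4 ** k) * k


# Primer lengths of 8 to 11 nucleotides, as discussed for non-human target primers.
for k in (8, 9, 10, 11):
    print(f"{k}-mer: {expected_bases_per_match(k):,} bases")
```

Under this expression the value grows quickly with primer length, which is consistent with the point that longer primers are less likely to find a match in a small pathogen genome.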
[0279] Non-host target primers can be designed so that they do not
hybridize to the most abundant host transcripts. The most abundant
host transcripts can be those transcripts that comprise at least
60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%
of all host transcripts. The most abundant transcripts can vary
depending on the host. Knowing the host transcriptome data can help
design non-host target primers. For example, the human
transcriptome data obtained from the Encyclopedia of DNA Elements
(ENCODE) was used to identify the most abundant human transcripts.
The most abundant human transcripts can be those transcripts that
comprise at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%,
98%, 99% or 100% of all human transcripts. Other species'
transcriptomes can be accessed from public databases for other
species including, but not limited to, dog, cattle, sheep, swine,
chicken, sea urchin, yeast, Arabidopsis, honey bee, etc.
[0280] The most abundant host transcripts can be considered those
transcripts having greater than 100, 150, 200, 250, 300, 350, 400,
450, 500, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, or
10000 base pairs in length. For example, when the host is human,
the most abundant human transcripts can be greater than 200 base
pairs in length. The most abundant host transcripts can be the 10,
20, 30, 40, 50, 60, 70, 80, 90, 100, 250, 500, 1000, 2000, 3000,
4000, 5000, 6000, 7000, 8000, 9000, 10000, 12000, 14000, 16000,
18000 or 20000 most abundant host transcripts known in that host.
For example, the most abundant human transcripts can be the 1000,
2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 20000 most
abundant human transcripts or all human transcripts identified in
the ENCODE database.
[0281] Non-host target primers can be non-human target primers.
Non-human target primers can comprise one or more of the
oligonucleotides of Tables 3 or FIG. 10 or 11.
[0282] Disclosed are compositions comprising at least two of the
non-human oligonucleotides listed in Tables 3 or FIG. 10 or 11.
[0283] In some aspects, compositions comprise at least 2, 3, 4, 5,
6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35,
40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95 or 100 of the
oligonucleotides listed in Tables 3 or FIG. 10 or 11. In some
aspects, compositions comprise up to 150, 200, 250, 300, 350, 400,
450, 500, 1000, 2000, 3000, 4000, or 5000 of the oligonucleotides
listed in Tables 3, or FIG. 10 or 11.
[0284] Also disclosed are compositions comprising at least two of
the complement of the non-human oligonucleotides listed in Tables 3
or FIG. 10 or 11.
[0285] In some aspects, compositions comprise at least 2, 3, 4, 5,
6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35,
40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95 or 100 of the
complement of the oligonucleotides listed in Tables 3 or FIG. 10 or
11. In some aspects, compositions comprise up to 150, 200, 250,
300, 350, 400, 450, 500, 1000, 2000, 3000, 4000, or 5000 of the
complement of the oligonucleotides listed in Tables 3 or FIG. 10 or
11.
E. Kits
[0286] The materials described above as well as other materials can
be packaged together in any suitable combination as a kit useful
for performing, or aiding in the performance of, the disclosed
method. The kit can be used for research use or for in vitro
diagnosis (IVD). It is useful if the kit components in a given kit
are designed and adapted for use together in the disclosed method.
For example disclosed are kits for enriching non-host sequences
from a host sample, the kit comprising non-host target primers. For
example, disclosed are kits that can comprise one or more of the
oligonucleotides of Tables 3, or 5-8. The kits also can contain
polymerase or oligo d(T) primers. In some aspects, the oligo d(T)
primers can be oligo d(T)-RNA polymerase promoter primers.
Examples
[0287] To make NGS technology a practical diagnostic tool in
clinical labs, the key is to greatly increase the presence of
pathogenic sequences in a clinical sample. To address this
challenge, the Preferential Amplification of Pathogenic Sequences
(PATHseq) technology was developed to preferentially amplify
non-human sequences in a clinical sample. The strategy is based on
the following rationales: 1) Active infection is the result of
pathogenic gene expression, which produces polyadenylated RNAs
(mRNAs), or pathogenic transcripts; 2) Only about 3% of the human
genome produces protein-coding transcripts (mRNAs), and among these,
the top 1,000 and 2,000 most abundant human transcripts comprise more
than 65% and 72% of all human transcripts, respectively; 3) By
selectively excluding the amplification of these abundant human
transcripts, pathogenic transcripts can be preferentially amplified
in human clinical samples; 4) Pathogenic transcripts can be further
enriched through subtractive hybridization against a reference
(normal) human transcription library (human transcriptome). The
PATHseq technology, in combination with NGS technology, can provide
comprehensive and unbiased diagnosis of pathogens responsible for
any infectious disease.
[0288] The NGS technology has started a revolution in genomics and
provided opportunities for its broad application in many fields,
including in the diagnosis of human pathogens. However, prior to
the methods described herein, NGS technology was still a research
tool, rather than a diagnostic tool, due to the scarcity of
pathogen sequences in human clinical samples and subsequent
requirement of deep sequencing, and the complexity of
bioinformatics analysis required to identify the pathogenic
sequences. In order to address these challenges, current mainstream
science focuses on three kinds of research: 1) Increasing the
capacity of deep sequencing; 2) Increasing the capacity of
bioinformatics analysis in order to identify the pathogenic
sequences; 3) Hoping to use traditional methods to solve novel
challenges (Table 1).
TABLE-US-00001 TABLE 1 List of current mainstream methods and their features
Method | Application | Problem
Shotgun sequencing | Viral metagenome studies; microbiome studies | Remove host sequences; mapping pathogenic sequences
Amplicon sequencing with targeted PCR primers | Identify known human pathogens; identify mutations and variation | Unknown or novel pathogens; advantage over regular PCR?
Amplicon sequencing with random PCR primers | Amplification of enriched viral particles; discovery of new pathogens | Many viruses do not produce particles; remove host sequences
Enrichment through cell culture | Known pathogens; unknown pathogens | Not all pathogens can be grown on cultured tissue; method for unknown pathogen culture
[0289] PATHseq is significantly different from mainstream science
in that it focuses on how to increase the presence of pathogenic
sequences in clinical samples through a new approach. For example,
in PATHseq technology, a specific set of 8-mer primers, instead of
random primers or pathogen-specific primers, is used to construct
the cDNA library. The 8-mer primers are short enough to amplify any
pathogen sequence larger than 5,958 bp ((4^8 × 8)/88),
while selectively excluding the amplification of the most abundant
human transcripts. The pathogenic sequences can further be enriched
by subtractive hybridization against reference human
transcripts.
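The 5,958 bp figure follows from dividing the single-primer expression by the number of primers in the set. A minimal Python sketch of that arithmetic, with a hypothetical function name and assuming the primers are treated as equally likely to match:

```python
def min_target_length_bp(k: int, n_primers: int) -> float:
    """Evaluate the text's expression (4^k * k) / n_primers."""
    return (4 ** k) * k / n_primers


# 88 primers of length 8, as used in PATHseq.
print(round(min_target_length_bp(8, 88)))  # 5958
```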
F. The Most Abundant Human Transcripts
[0290] One strategy for NGS application is to look at only the
transcripts. Although the size of the human genome is huge
(containing over 3 billion base pairs (bp)), it encodes only about
20,000 protein-coding genes, accounting for a very small fraction
of the genome (approximately 2%). A recent report found that most
protein-coding genes have one major transcript expressed at
significantly higher levels than others, and in human tissues these
major transcripts contribute almost 85 percent to the total mRNA.
Given that the average length of human mRNAs is 1.3 kb, the
complexity can be reduced by 26.8 times, if cDNA is sequenced
instead of genomic DNA. This strategy has been used successfully by
several publications in searching human pathogens and other
applications. However, this strategy is still impractical for
diagnostic laboratories, because the number of human transcripts is
still too large compared to the relative scarcity of pathogenic
transcripts.
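The 26.8-fold figure is not derived in the text. One reading that reproduces it, offered here only as an assumption, divides a genome of 3 billion bp by the combined length of 86,248 transcripts (the reproducible GM12878 transcript count reported in the following paragraph) at an average of 1.3 kb each. A minimal Python sketch of that reading:

```python
GENOME_BP = 3_000_000_000   # "over 3 billion base pairs" per the text
MEAN_MRNA_BP = 1_300        # average human mRNA length per the text
N_TRANSCRIPTS = 86_248      # ASSUMPTION: reproducible GM12878 transcripts (ENCODE)


def complexity_reduction(genome_bp: int = GENOME_BP,
                         mean_mrna_bp: int = MEAN_MRNA_BP,
                         n_transcripts: int = N_TRANSCRIPTS) -> float:
    """Ratio of genome size to total transcriptome length."""
    return genome_bp / (mean_mrna_bp * n_transcripts)


print(round(complexity_reduction(), 1))  # 26.8
```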
[0291] In order to solve this problem, an alternative strategy
using the most abundant human transcripts was used (Table 2). If
the most abundant human transcripts are eliminated from clinical
samples, the pathogenic sequences can be selectively enriched and
further reduce the sequencing complexity. The recent completion of
the Encyclopedia of DNA Elements (ENCODE) project provides a
genome-wide "landscape of transcription in human cells" in 14
different cell lines. Based on the publicly available ENCODE
database, the total human large transcripts (>200 bp
polyadenylated RNAs) in GM12878 (a cell line that contributed most
to the ENCODE database) are 161,999. Among these, 86,248
transcripts are reproducible (in a duplicated experiment). These
86,248 transcripts are defined as human transcriptome (Table 2). As
shown in table 2, the most abundant 1,000 and 2,000 transcripts
comprised about 65% and 72% of all human transcripts, respectively,
based on ENCODE data.
TABLE-US-00002 TABLE 2 The most abundant human transcripts (RPKM: Reads per kilobase of transcript per million mapped reads)
Most abundant human transcripts | RPKM | % of total human transcriptome
Top 1000 | 23391.45 | 65.52%
Top 2000 | 25847.42 | 72.40%
Top 3000 | 27355.52 | 76.62%
Top 4000 | 28440.66 | 79.66%
Top 5000 | 29287.62 | 82.04%
Top 6000 | 29973.64 | 83.96%
Top 7000 | 30544.67 | 85.56%
Top 8000 | 31035.38 | 86.93%
Top 9000 | 31463.82 | 88.13%
Top 10000 | 31838.97 | 89.18%
Top 20000 | 34018.78 | 95.29%
All 86248 | 35700.85 | 100%
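The percentage column of Table 2 appears to be the cumulative RPKM of each tier divided by the RPKM of all 86,248 transcripts. The Python sketch below checks that reading against a few rows of the table; the dictionary and function names are hypothetical.

```python
# Cumulative RPKM values copied from Table 2 (subset of rows).
CUMULATIVE_RPKM = {
    "Top 1000": 23391.45,
    "Top 2000": 25847.42,
    "Top 20000": 34018.78,
    "All 86248": 35700.85,
}


def percent_of_transcriptome(rpkm: float, total_rpkm: float) -> float:
    """Share of the total transcriptome RPKM, in percent, to two decimals."""
    return round(100 * rpkm / total_rpkm, 2)


total = CUMULATIVE_RPKM["All 86248"]
for tier, rpkm in CUMULATIVE_RPKM.items():
    print(tier, percent_of_transcriptome(rpkm, total))
```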
G. Generation of a Set of Non-Human Target Primers that do not
Match the Sequence of the Most Abundant Human Transcripts
[0292] A computer program was further developed to look for
specific patterns in the human transcriptome database. As
predicted, human transcript sequences are not randomly distributed.
Using this computer program, a set of 88 8-mer oligonucleotides
(non-human target primers) (Table 3) were generated that do not
match the sequences of the 2,000 most abundant human transcripts.
In other words, by using this set of oligos as primers in the
construction of cDNA library, 72% of human transcripts can be
eliminated from clinical samples, greatly increasing the chance of
selectively targeting pathogenic sequences. Theoretically, this set
of primers has the probability to amplify any sequence larger than
5,958 bp ((4^8 × 8)/88), which includes almost all human
pathogens (both viruses and bacteria).
TABLE-US-00003 TABLE 3 A list of 88 8-mer oligonucleotides (non-human target primers) that do not match the sequences of the 2,000 most abundant human transcripts
AAACGCGA ACGCGATA CCCTAACC CGCGATAC CGGTCGAT GTATAACG TAGCGTAT TCGAATAG
AACGCATA ATACCGGT CCGGTAAT CGCGCGTA CGTATATC GTTACGCG TAGTAACG TCGCGTAT
AATAACGC ATACGTAC CCGTAGTA CGCGGTTA CGTATTCG TAACCGTT TAGTCGAG TCGGTAAC
AATATCGT ATAGCGCA CGAACGTA CGCGTAAT CGTCGAAT TAACGTAA TAGTCGGT TCGTCGAT
AATATTCG ATAGCGCG CGAATAAC CGCGTATA CTAATACG TAAGCGCG TATAGCGC TCTAAGCG
AATCGGTA ATGCGATA CGACGTAC CGCGTATC CTTAGCGA TAAGCTCG TATCACGC TTAACGTA
ACACGTTA ATGCGTTA CGATAGGT CGCTAAAA GATACGTA TAATACGT TATCCGAC TTACGATA
ACCGGTTA ATTAGCGT CGATAGTA CGGGTCGA GCGAATAT TAGAGTCG TATCGCTA TTAGTCGA
ACGAACCG ATTGCGAC CGATATCC CGGTAAGC CCGACGTA TAGATCCG TATCGGAC TTATACCG
ACGAATAA ATTGTACG CGATCGTA CGGTAGAT GCGTAATT TAGCGAAT TATCGGTA TTATATCG
ACGATAGG CAATCGCG CGCAATAT CGGTAGTA GTACCGTA TAGCGTAC TATCGGTC TTATCGCG
H. Development and Test of a Set of 8-Mer Oligonucleotides
(Non-Human Target Primers) that do not Match the Sequence of the
Most Abundant Human Transcripts
[0293] The feasibility of using this set of oligos as primers to
selectively amplify non-human sequences during construction of cDNA
library can be tested. To make this set more efficient, these
non-human target primers can be fine-tuned based on different sets
of data of the most abundant human transcripts, for example, using
the top 1000, or top 4000 human transcripts (FIG. 3). Whether it
produces better results by using partial sequences of human
transcripts (i.e. sequences from upstream 500 bp or 1,000 bp to
3'-end of transcripts) (FIG. 3) can also be tested. Whether this
set will be different if the sequence data from a different cell
line is used can be tested. The current technology is based on
sequence data from GM12878, a blood cell line, therefore this
approach is suitable for its application in blood donor screening,
but may not be suitable for skin infection. In that case, the
sequence data from an epithelial or endothelial cell line can be
used. Currently, ENCODE database provides sequence data from 14
different cell lines.
I. Technology Development--Preferential Amplification of Pathogenic
Sequences (PATHseq)
[0294] A technology called PATHseq (FIG. 1) can be developed. This
technology is based on the following rationale: 1) Active infection
is the cause of pathogenic gene expression, which generates
protein-coding transcripts or polyadenylated RNAs (mRNA); 2) By
searching for pathogenic transcripts (mRNA) in the background of
human transcriptome instead of genomic DNA, the sequencing
complexity can be greatly reduced; 3) The 2,000 most abundant human
transcripts comprise about 72% of all human transcripts and the use
of a set of 8-mer primers can preferentially eliminate the
amplification of these human transcripts; 4) Pathogenic sequences
can be further enriched through subtractive hybridization against a
human reference (non-pathogenic) cDNA library.
[0295] Total RNA or mRNAs can be extracted and purified from
clinical samples (FIG. 1, step 1). Only RNAs larger than 200 bp are
collected.
Reverse transcription is shown in FIG. 1, step 2. A primer (P1) is
designed to specifically transcribe mRNA into first-strand cDNAs
(anti-sense) while introducing a T7 promoter/primer sequence into
the cDNA. RNase H activity occurs at FIG. 1, step 3. The
ribonuclease activity of RNase H cleaves RNA in a DNA/RNA duplex,
allowing the synthesis of secondary cDNA strands. Synthesis of
secondary cDNA strands is shown in FIG. 1, step 4. A set of 88
specific 8-mer oligonucleotides (Table 3) has been developed and
can be used as primers for the synthesis of secondary cDNA strands.
Because these primers do not amplify the 2,000 most abundant human
mRNAs, about 72% of all human mRNAs can be eliminated from
amplification, preferentially amplifying non-human (pathogenic)
sequences.
[0296] Synthesis of RNAs is shown in FIG. 1, step 5. Using the T7
promoter introduced in FIG. 1, step 2, the T7 RNA polymerase
synthesizes RNAs with double-stranded DNA as template. Anti-sense
RNAs can be synthesized as shown in FIG. 1, step 6. Because the T7
promoter is attached to the poly(A) end, newly generated RNAs are
anti-sense. Human reference cDNAs are used (FIG. 1, step 7). Human
reference cDNA library can be created using the same set of 8-mer
primers as in FIG. 1, step 4, plus a poly d(T) primer (not P1
primer). Normal (non-pathogenic) human mRNAs can be used as
templates when the library is constructed. Sense strands of human
reference cDNAs can be separated using poly d(T) beads. The beads
are further used as solid phase for subtractive hybridization.
Newly generated anti-sense human RNAs from FIG. 1, step 6 can be
captured (hybridized) by these cDNAs and specifically degraded by
RNase H in RNA-DNA hybrids. RNase H does not digest single or
double-stranded DNA.
[0297] Enrichment of pathogenic RNAs is shown in FIG. 1, step 8.
Pathogenic RNAs can be greatly enriched because they do not
hybridize to human reference cDNAs. Reverse transcription occurs
again as shown in FIG. 1, step 9. A poly d(T)-T7 promoter primer
can be used to synthesize the first cDNA strands. Another round of
RNase H activity occurs in FIG. 1, step 10. Again, RNase H cleaves
RNAs in a DNA/RNA duplex.
[0298] RNA synthesis is shown in FIG. 1, step 11. Synthesized RNAs
are anti-sense. Enrichment of pathogenic RNAs is shown in FIG. 1,
step 12. Steps 6 through 12 form a cycle in which pathogenic RNAs
are repeatedly enriched. Through this process, the non-human
(pathogenic) sequences can be increased by several orders of
magnitude in the final samples for sequencing.
J. Next Generation Sequencing
[0299] The development of NGS technologies has dramatically
increased the capacity to sequence large genomes; now, a
mammalian-sized genome can be sequenced in a time and at a cost
that would not have been feasible even a few years
ago. NGS technology is following "Moore's Law" (a metric of
technology improvement), with dramatic decreases in the cost per
genome sequenced since its invention. New NGS instruments and
platforms are coming to the market quickly. To determine the best
possible NGS platform for the PATHseq technology, a comparison of
currently available NGS instruments was done (FIG. 4, modified from
Glenn, T. C. 18 and updated from the "2013 NGS Field Guide").
[0300] Among these available NGS platforms, the Illumina MiSeq, Ion
Torrent Proton, and Oxford Nanopore can be good choices for
diagnostic laboratories, based on the consideration of instrument
cost, sequencing capacity, and running cost. For example, the
Illumina MiSeq (Illumina, CA) has a capacity of 4 million reads per
run, while the second version has a capacity of 15 million reads
with read lengths of up to 250+250 bp PE (paired end),
bringing the yield to 7,500 million bp per run, about 6
times that of the first version. The sequencing service fee has also
decreased rapidly, with a current price of $3,500 per lane using
HiSeq 2500.
K. Multiplex Sequencing Per Run
[0301] The maximal number of samples that can be loaded onto one
sequencing lane depends on two factors: 1) the minimal sequencing
coverage required for successfully identifying a pathogen, usually
the sequencing depth to achieve at least 1 RPKM (Reads per kilobase
of transcript per million mapped reads) of pathogen sequences is
required; 2) the capacity of the sequencing instrument. A single
run of the current HiSeq 2500 yields as many as 600 million reads
(FIG. 4), each approximately 150 bases in length with paired
ends (PE), for a total yield of 180,000 Mbp. The second version of
the MiSeq platform has a capacity of 15 million reads per run,
for a total yield of 7,500 Mbp.
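Both yield figures are consistent with multiplying the number of reads by the read length and by two for paired ends. A short Python sketch of that calculation follows; the function name is hypothetical and the 250 bp MiSeq read length is taken from the platform description given earlier.

```python
def run_yield_mbp(reads_millions: float, read_length_bp: int,
                  paired_end: bool = True) -> float:
    """Yield per run in Mbp: reads (millions) x read length x number of ends."""
    ends = 2 if paired_end else 1
    return reads_millions * read_length_bp * ends


print(run_yield_mbp(600, 150))  # HiSeq 2500: 180000 Mbp
print(run_yield_mbp(15, 250))   # MiSeq, second version: 7500 Mbp
```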
[0302] Based on this calculation, the sequencing cost per sample
can be further reduced by introducing multiple sample identifiers
(FIG. 4). Using the same computer program, a total of 40
10-mer oligonucleotides were generated that do not match the
sequences of any of the total 161,999 human transcripts. These
oligos can be divided into two sets and used as adaptors for the
construction of sequencing libraries. The maximum combination of
these 20+20 adaptors can generate 400 (20 × 20 = 400) sample
identifiers (barcodes) from A1 to T20. Therefore, a
maximum of 400 samples can be separately labeled by these two sets of
adaptors, mixed into one sample run for sequencing and then
separated by their own sample identifiers (barcodes) from A1 to T20
(FIG. 5). Of course, the actual number of samples loaded into one
sequencing lane may be fewer depending on the situation.
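The A1 to T20 labeling can be illustrated by pairing each adaptor of the first set (labeled A through T here) with each adaptor of the second set (labeled 1 through 20). The Python sketch below generates only the labels; the mapping of labels to actual 10-mer adaptor sequences is not given in this passage, and the function name is hypothetical.

```python
import string


def sample_identifiers(n_first: int = 20, n_second: int = 20) -> list:
    """Combine first-set labels (letters) with second-set labels (numbers)."""
    letters = string.ascii_uppercase[:n_first]
    return [f"{letter}{number}"
            for letter in letters
            for number in range(1, n_second + 1)]


barcodes = sample_identifiers()
print(len(barcodes), barcodes[0], barcodes[-1])  # 400 A1 T20
```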
L. Computational Subtraction
[0303] The sequencing data can be analyzed by subtracting fragments
that match human sequences and assembling the remaining fragments
into contiguous sequences for direct comparison with the GenBank
databases of nucleic acids using BLASTN software. By this method, any
non-matching sequences representing potential pathogens can be
enriched and remain in the final dataset.
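The subtraction step can be pictured with a toy Python sketch. It is not BLASTN and does not perform assembly; as a simplifying assumption, a read is treated as "matching human" if it shares any exact k-mer with a reference set, and all names and sequences are hypothetical.

```python
def kmers(sequence: str, k: int) -> set:
    """Return every k-length substring of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def subtract_host_reads(reads, host_reference, k: int = 8) -> list:
    """Keep only reads that share no exact k-mer with the host reference.

    Reads shorter than k have no k-mers and are therefore kept.
    """
    host_kmers = set()
    for reference in host_reference:
        host_kmers |= kmers(reference, k)
    return [read for read in reads if not (kmers(read, k) & host_kmers)]


host = ["ACGTACGTACGT"]                 # stand-in for human sequences
reads = ["ACGTACGT", "TTTTGGGGCCCC"]    # first read matches the host
print(subtract_host_reads(reads, host))
```

The reads that survive this filter correspond to the non-matching fragments that would then be assembled and compared against GenBank.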
M. Test of PATHseq Technology in Identification of Non-Human
Sequences and Application in Diagnosis of Human Infection
[0304] The PATHseq technology can be tested by imbedding pathogenic
nucleic acid into a human nucleic acid sample at different ratios of
between 1:10^4 and 1:10^10 (pathogenic sequences:human
sequences). Specific pathogenic sequences can then be
preferentially amplified over human sequences. To accurately
calculate the efficiency of this set of primers, quantitative PCR
assay can be used to monitor the fold increase of imbedded
pathogenic sequences. This technology can be used in the following
fields.
[0305] PATHseq can be used for diagnosis of infectious diseases.
The clinical specimens can be collected from routine test samples.
PATHseq can quickly and accurately diagnose clinical specimens from
unknown infections or from those samples with inconclusive
results.
[0306] PATHseq can be used during investigation of infectious
disease outbreaks and contributions to epidemiology studies. NGS
technologies represent an evolving and rapidly changing field, with
the potential to significantly reshape clinical practice and
diagnosis of infectious diseases. NGS technology has been recently
used to investigate outbreaks of drug-resistant bacteria in
hospitals at the NIH Clinical Center and in the UK, as well as an
outbreak of tuberculosis in Canada. Termed "genomic epidemiology,"
the NGS technology has the transformative potential to quickly
pinpoint the origin of emerging diseases, the degree of antibiotic
resistance of microbes, and the speed with which an infection moves
through a population. By applying PATHseq technology in this field,
the diagnosis of human pathogens can be improved by increasing the
accuracy and speed with which the infectious diseases outbreaks can
be characterized.
[0307] PATHseq can be used for blood donor screening. The
development of increasingly sensitive and inclusive laboratory
screening methods for blood products has greatly decreased the risk
of transmission of many known pathogens through blood transfusion.
However, transfusion is still not risk-free, and the emergence of
new pathogens continues to be a potential threat to the blood
supply. The current screening test system for preventing
transfusion-transmitted infections has several limitations: 1) the
current system requires some prior knowledge of the pathogen under
investigation and therefore cannot detect previously unknown
pathogens; 2) the current system does not test for all potential
human pathogens because of cost- and labor-efficiency
considerations; 3) the current system is already heavily burdened
with multiple testing steps, and more screening tests are likely to
be added to the requirements in the future; 4) the current system
reacts to emerging pathogens only after some patients have actually
been harmed, and there is a substantial delay before
implementation; 5) many donors who do not pose a risk to patients
are temporarily or permanently deferred because of the imprecision
of the current screening system, which is based on risk evaluation
of different groups. It is clear that a revolutionary screening
test needs to be developed to sufficiently protect the safety of
the blood supply while retaining cost- and labor-efficiency.
PATHseq can be a universal, one-step, and unbiased blood testing
method to detect potentially harmful pathogens (known or unknown)
and guarantee a safe blood supply.
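For the spike-in test described in paragraph [0304], the fold increase of the embedded pathogenic sequences could be estimated from quantitative PCR cycle thresholds. The sketch below assumes the common approximation that template doubles every cycle (amplification factor 2.0). The Ct-based calculation, the function names, and the parameter names are illustrative assumptions and are not taken from the disclosure.

```python
def fold_change(ct_before, ct_after, amplification_factor=2.0):
    """Fold increase of a target implied by a drop in qPCR cycle threshold.

    Assumes each PCR cycle multiplies the template by amplification_factor
    (2.0 corresponds to an idealized 100% efficient reaction).
    """
    return amplification_factor ** (ct_before - ct_after)


def relative_enrichment(pathogen_ct_before, pathogen_ct_after,
                        human_ct_before, human_ct_after,
                        amplification_factor=2.0):
    """Fold increase of the pathogen target relative to a human target."""
    pathogen = fold_change(pathogen_ct_before, pathogen_ct_after,
                           amplification_factor)
    human = fold_change(human_ct_before, human_ct_after, amplification_factor)
    return pathogen / human


if __name__ == "__main__":
    # Hypothetical Ct values: pathogen Ct drops by 10 cycles, human Ct is unchanged.
    print(relative_enrichment(30.0, 20.0, 15.0, 15.0))
```

A measured amplification factor from a standard curve would replace the default of 2.0 where one is available.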
N. Preferential Amplification of Pathogenic Sequences
[0308] 1. Human Transcriptome
[0309] Human transcriptome data were obtained from the publicly
available ENCODE (Encyclopedia of DNA Elements) database
(http://encodeproject.org/ENCODE/). The total number of
protein-coding human transcripts (>200 bp polyadenylated RNAs)
detected by this study was 59,822, representing 18,939 genes
expressed in at least one cell line (14 cell lines were studied in
total). A total of 161,999 large human transcripts were identified
in the cytosol of GM12878 cells (the cell line that contributed
most to the ENCODE database). Among these, 86,248 transcripts were
reproducible in the sequencing duplicate.sup.2. Therefore, these
86,248 transcripts were defined as the total human transcriptome.
Previously, the Mammalian Gene Collection (MGC), a trans-NIH
initiative, identified a total of 29,818 human full-length
protein-coding cDNA clones, representing 17,592 non-redundant human
genes. MGC cDNA clones were obtained by screening of cDNA
libraries, by transcript-specific RT-PCR cloning, and by DNA
synthesis of cDNA inserts. The MGC database was also used in the
study.
[0310] 2. The Most Abundant Human Transcripts
[0311] The most abundant human transcripts were identified from the
sequencing data for the cytosol portion of cell line GM12878, with
abundance calculated as RPKM (reads per kilobase of transcript per
million mapped reads). As shown in Table 2, the top 1,000 and 2,000
most abundant human transcripts account for 65.52% and 72.40% of
total human transcripts, respectively.
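The RPKM measure named above follows a standard formula: read count, divided by transcript length in kilobases, divided by total mapped reads in millions. The sketch below computes it and also shows one plausible way to obtain a "top N share" of the kind reported in Table 2. The helper `share_of_top` and the use of summed RPKM as the abundance total are assumptions, since the disclosure does not state how the percentages were derived.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


def share_of_top(abundances, top_n):
    """Percent of total abundance contributed by the top_n most abundant entries.

    Hypothetical helper: assumes the share is computed from summed abundance.
    """
    ranked = sorted(abundances, reverse=True)
    return 100.0 * sum(ranked[:top_n]) / sum(ranked)


if __name__ == "__main__":
    # 500 reads on a 2,000 bp transcript with 10 million mapped reads.
    print(rpkm(500, 2000, 10_000_000))
    # Toy abundance values, not data from the disclosure.
    print(share_of_top([50, 30, 15, 5], top_n=2))
```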
[0312] 3. Shortest Unmatched Sequences within Human Transcripts
[0313] A total of 86,248 transcripts were generated as the human
transcriptome based on ENCODE data in GM12878.sup.2. To find the
shortest unmatched sequences within human transcripts, all shortest
k-mers (substrings of length k) were then counted in the human
transcriptome sequencing data using a computer program described
previously, with some modification. The computer program counted
k-mers for a given k, the size of the substring in a DNA sequence.
It started from k=1 and checked whether all of the possible k-mers
occurred at least once. It stopped when it reached a k value where
there was at least one k-mer that was not found as a substring in
the set of transcripts. Table 4 lists the number of k-mers not found
in human transcripts. FIG. 9 lists the percent of the sequences in
the Virome covered by the oligos. For example, FIG. 9 shows that
the top 10,000 human transcripts can be eliminated by using the 179
10-mers in FIG. 10 while covering all (100%) of known human
viruses. By using the 171 10-mers in FIG. 11, the top 20,000 human
transcripts can be eliminated while covering 95.477% of known human
viruses.
TABLE-US-00004 TABLE 4 Number of k-mers not found in human transcripts
Transcript set | 8-mer | 9-mer | 10-mer
Top 1000 | 329 | 23473 | 351888
Top 2000 | 44 | 8883 | 203816
Top 3000 | 9 | 4402 | 139254
Top 4000 | 4 | 2411 | 100542
Top 5000 | 2 | 1493 | 76937
Top 6000 | 1 | 953 | 60510
Top 7000 | 1 | 651 | 49753
Top 8000 | 0 | 455 | 41374
Top 9000 | 0 | 347 | 35737
Top 10000 | 0 | 249 | 30336
Top 20000 | 0 | 28 | 10053
All 86248 | 0 | 1 | 1075
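The search for the shortest k-mers absent from a set of transcripts, as described in paragraph [0313], can be sketched as follows. This is a minimal in-memory illustration, not the program used in the study. The function names and the `max_k` cap are hypothetical, only the given strand is scanned, and windows containing characters other than A, C, G, or T are skipped.

```python
from itertools import product

ALPHABET = "ACGT"


def observed_kmers(transcripts, k):
    """Collect every k-mer over ACGT that occurs in at least one transcript."""
    seen = set()
    for seq in transcripts:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if all(base in ALPHABET for base in kmer):
                seen.add(kmer)
    return seen


def absent_kmers(transcripts, k):
    """List, in lexicographic order, the k-mers that occur in no transcript."""
    seen = observed_kmers(transcripts, k)
    candidates = ("".join(p) for p in product(ALPHABET, repeat=k))
    return [kmer for kmer in candidates if kmer not in seen]


def shortest_absent_kmers(transcripts, max_k=12):
    """Increase k from 1 and stop at the first k with at least one absent k-mer."""
    for k in range(1, max_k + 1):
        missing = absent_kmers(transcripts, k)
        if missing:
            return k, missing
    return None, []


if __name__ == "__main__":
    k, missing = shortest_absent_kmers(["ACGT"])
    print(k, len(missing))
```

Enumerating all 4^k candidates is practical for the small k values in Table 4. A dedicated k-mer counter would be more suitable for much larger k or for data that do not fit in memory.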
[0314] 4. Construction of cDNA Libraries and Subtractive
Hybridization
[0315] A clinical specimen from a patient with bronchitis and
pulmonary inflammation was collected. Total RNA was extracted using
Qiagen's RNeasy Mini Kit (Catalog number: 74106, Qiagen, Valencia,
Calif.) according to the product's protocol. A cDNA library was
constructed using NEB's T7 Quick High Yield RNA Synthesis Kit
(Catalog number: E2050S, New England Biolabs, Ipswich, Mass.) with
the following modifications. First, the random primer mix in the kit
was replaced by the 88 8-mer oligonucleotides listed in Table 8.
Second, the oligo d(T)23VN was replaced by a custom poly d(T) oligo,
T7-d(T)18VN, which adds a T7 promoter/primer sequence to the 5'-end
of the poly d(T) sequence
(5'-ACGGCCTAATACGACTCACTATAGGGTTTTTTTTTTTTTTTTTTVN-3'; SEQ
ID NO:41) (FIG. 2). Third, M-MuLV Reverse Transcriptase was
included in the reaction. A normal human cDNA library was
constructed from peripheral blood mononuclear cells (PBMCs) with the
same procedure.
[0316] Subtractive hybridization of cDNAs from the clinical specimen
against normal human cDNAs was carried out according to the
protocol published previously, with some modification (FIG. 1).
Briefly, two enzymes (M-MuLV Reverse Transcriptase and T7 RNA
polymerase) were used in the reaction, in addition to the custom
primer, T7-d(T)18VN. Since M-MuLV Reverse Transcriptase possesses
RNase H activity, the human transcripts resulting from the RNA-DNA
duplex were degraded by M-MuLV Reverse Transcriptase. Each of the
newly produced human or pathogen transcripts re-entered the
subtractive hybridization process and served as a template for a
new round of reverse transcription. The circulation of antisense
transcripts resulted in exponential amplification of pathogenic
transcripts (FIG. 1).
[0317] 5. Next Generation Sequencing
[0318] NGS was performed using the BGI sequencing service (BGI
Americas, Cambridge, Mass.). Briefly, sample DNA was sheared into
small fragments by nebulization. Overhangs resulting from
fragmentation were converted into blunt ends by T4 DNA polymerase,
Klenow fragment, and T4 polynucleotide kinase. After adding an "A"
base to the 3'-ends of the blunt phosphorylated DNA fragments,
adapters were ligated to the ends of the small DNA fragments.
Fragments that were too short were removed with AMPure beads
(Beckman Coulter, Inc., Indianapolis, Ind.) and the qualified DNA
library was used for the sequencing.
[0319] Sequencing was carried out on the Illumina HiSeq2000
(Illumina, San Diego, Calif.). Read length was 101 bp PE (paired
end). Output was set to 3 Gb of clean data. The actual raw read
output was 3,206.25 Mb. The high quality read output was 3,100 Mb,
representing approximately 15 million PE reads with 97% clean data.
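The figures in paragraph [0319] can be cross-checked with simple arithmetic. The sketch assumes that "Mb" means one million bases and that each paired-end read contributes two mates of 101 bases. Both are interpretive assumptions, and the function names are hypothetical.

```python
def clean_fraction(raw_mb, clean_mb):
    """Fraction of the raw output that survives as high quality data."""
    return clean_mb / raw_mb


def paired_reads_millions(clean_mb, read_length_bp):
    """Millions of read pairs implied by a yield in Mb, two mates per pair."""
    return clean_mb / (2 * read_length_bp)


if __name__ == "__main__":
    # Values quoted in the text: 3,206.25 Mb raw, 3,100 Mb clean, 101 bp reads.
    print(round(100 * clean_fraction(3206.25, 3100.0)))   # percent clean, about 97
    print(round(paired_reads_millions(3100.0, 101)))      # million pairs, about 15
```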
[0320] 6. Sequencing Data Analysis
[0321] Raw sequencing data was filtered by in-house scripts: 1)
Remove reads with 3 N; 2) Remove reads contaminated by adapter
(default: 15 bases overlapped by reads and adapter); 3) Remove
reads with a certain proportion of low quality (20) bases (40% as
default, parameter setting at 36 bp); 4) Remove duplication
contamination.
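The four filtering rules in paragraph [0321] can be sketched as a simple read filter. The in-house scripts are not disclosed, so the interpretations below are assumptions: "3 N" is read as three or more N bases, a low quality base is one with a Phred score of 20 or less, adapter contamination means the read contains any 15-base stretch of the adapter, and duplicates are reads with identical sequence. The "36 bp" parameter is left out because its meaning is unclear from the text. All names are hypothetical.

```python
def passes_filters(seq, quals, adapter, max_n=2, min_overlap=15,
                   low_quality=20, max_low_fraction=0.40):
    """Return True if one read survives the N, adapter, and quality rules."""
    seq = seq.upper()
    adapter = adapter.upper()
    # Rule 1 (assumed reading): reject reads with three or more N bases.
    if seq.count("N") > max_n:
        return False
    # Rule 2: reject reads containing any min_overlap-base piece of the adapter.
    for i in range(len(adapter) - min_overlap + 1):
        if adapter[i:i + min_overlap] in seq:
            return False
    # Rule 3: reject reads in which 40% or more of the bases are low quality.
    if quals:
        low = sum(1 for q in quals if q <= low_quality)
        if low / len(quals) >= max_low_fraction:
            return False
    return True


def filter_reads(reads, adapter):
    """Apply the per-read rules, then drop exact duplicate sequences (rule 4)."""
    seen = set()
    kept = []
    for seq, quals in reads:
        if seq in seen:
            continue
        seen.add(seq)
        if passes_filters(seq, quals, adapter):
            kept.append((seq, quals))
    return kept
```

Here `reads` is a list of `(sequence, quality_scores)` tuples with Phred scores given as integers. Parsing FASTQ files is outside the scope of the sketch.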
[0322] Using a computer program called STAR, quality sequencing
reads (approximately 15 million PE reads) were aligned against the
human genome primarily assembled from ENSEMBL
(uswest.ensembl.org/index.html). Reads that aligned to multiple
loci in the reference human genome were also considered as unmapped
reads and filtered out, which could reduce the false positive rate.
To obtain longer reads of non-human origin, 729,313 unaligned reads
were further assembled using a de novo assembly computer program
named Trinity, resulting in 2,067 contig sequences. The de novo
assembled unaligned sequences were searched with BLAST against the
nucleotide sequence database known as the NCBI "nr" database.
Finally, the assembled sequences were identified using the NCBI
genomic BLAST database for "Microbes", including bacteria, fungi,
and viruses.
[0323] 7. 9-Mer Non-Human Target Primers for the Construction of
cDNA Library
[0324] In order to get rid of all human transcripts, or as many as
possible, from a clinical sample, a set of 9-mer oligos can be used.
As shown in Table 7, there is one 9-mer oligo that does not match
the full length of any human transcript. However, this oligo is
not ideal because a match for a single 9-mer oligo is expected in a
random sequence only about once every 4.sup.9.times.9=2,359,296 bp,
which is larger than the genome size of many human pathogens. That
means that, by using this 9-mer oligo, even though it can get rid of
all human transcripts, it is less likely to pick up (amplify) most
potential pathogen sequences.
[0325] To make this strategy work, the focus is placed on the 3'-end
sequences of all human transcripts. As shown in Table 6, there are
197 9-mer oligos that do not match the sequences from 500 bp
upstream to the 3'-end of all human transcripts. In order to
control the cDNA length to approximately 500 bp, a mixture of ddNTP
(such as 1% ddATP, although this is adjustable) can be used with
normal dNTP in the construction of the first strand cDNA library. A
ddNTP lacks the 3'-OH group needed to continue the elongation of the
DNA strand. When ddATP is added to the reaction, the elongation of
the strand stops once a ddATP is incorporated into the new strand.
Using this set of 9-mer oligos, a match in a non-human sequence is
expected about once every 4.sup.9.times.9/197=11,976 bp. Most human
pathogens have genomes larger than this; therefore, this strategy is
feasible. Construction of a cDNA library can follow the same
procedure as described above except that 1% ddATP
(2',3'-Dideoxyadenosine 5'-Triphosphate, 100 mM Solution, GE
Healthcare, Catalog number 27-2051-01) is added into the dNTP
solution.
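The two figures quoted for the 9-mer strategy, 2,359,296 for a single oligo and 11,976 bp for the set of 197 oligos, can be reproduced with the formula used in the text. The sketch applies the document's own expression, 4^k × k divided by the number of oligos, without endorsing it as a statistical model. The function name is hypothetical.

```python
def expected_match_spacing_bp(k, n_oligos=1):
    """Spacing between expected matches, using the text's formula 4**k * k / n."""
    return (4 ** k) * k / n_oligos


if __name__ == "__main__":
    print(expected_match_spacing_bp(9))              # single 9-mer oligo
    print(round(expected_match_spacing_bp(9, 197)))  # set of 197 9-mer oligos
```

Comparing the second value with a pathogen's genome size gives a rough sense of whether at least one priming site is expected under this formula.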
[0326] It is understood that the disclosed method and compositions
are not limited to the particular methodology, protocols, and
reagents described as these may vary. It is also to be understood
that the terminology used herein is for the purpose of describing
particular embodiments only, and is not intended to limit the scope
of the present invention, which will be limited only by the
appended claims.
[0327] Those skilled in the art will recognize, or be able to
ascertain using no more than routine experimentation, many
equivalents to the specific embodiments of the method and
compositions described herein. Such equivalents are intended to be
encompassed by the following claims.
REFERENCES
[0328] 1 Metzker, M. L. Sequencing technologies--the next
generation. Nat Rev Genet 11, 31-46, doi:10.1038/nrg2626 (2010).
[0329] 2 Shendure, J. & Ji, H. Next-generation DNA sequencing.
Nat Biotechnol 26, 1135-1145, doi:10.1038/nbt1486 (2008). [0330] 3
Djebali, S. et al. Landscape of transcription in human cells. Nature
489, 101-108, doi:10.1038/nature11233 (2012). [0331] 4 Shendure, J.
& Lieberman Aiden, E.
The expanding scope of DNA sequencing. Nat Biotechnol 30,
1084-1094, doi:10.1038/nbt.2421 (2012). [0332] 5 Feng, H., Shuda,
M., Chang, Y. & Moore, P. S. Clonal integration of a
polyomavirus in human Merkel cell carcinoma. Science 319,
1096-1100, doi:10.1126/science.1152586 (2008). [0333] 6 Palacios,
G. et al. A New Arenavirus in a Cluster of Fatal
Transplant-Associated Diseases. The New England journal of medicine
(2008). [0334] 7 Consortium, E. P. et al. An integrated
encyclopedia of DNA elements in the human genome. Nature 489,
57-74, doi:10.1038/nature11247 (2012). [0335] 8 Gonzalez-Porta, M.,
Frankish, A., Rung, J., Harrow, J. & Brazma, A. Transcriptome
analysis of human tissues and cell lines reveals one dominant
transcript per gene. Genome Biol 14, R70,
doi:10.1186/gb-2013-14-7-r70 (2013). [0336] 9 Lander, E. S. et al.
Initial sequencing and analysis of the human genome. Nature 409,
860-921, doi:10.1038/35057062 (2001). [0337] 10 Feng, H. et al.
Human transcriptome subtraction by using short sequence tags to
search for tumor viruses in conjunctival carcinoma. J Virol 81,
11332-11340 (2007). [0338] 11 Wheeler, D. A. et al. The complete
genome of an individual by massively parallel DNA sequencing.
Nature 452, 872-876 (2008). 12 von Bubnoff, A. Next-generation
sequencing: the race is on. Cell 132, 721-723 (2008). [0339] 13
Mardis, E. R. The impact of next-generation sequencing technology
on genetics. [0340] Trends Genet 24, 133-141 (2008). [0341] 14
Schuster, S. C. Next-generation sequencing transforms today's
biology. Nat Methods 5, 16-18 (2008). [0342] 15 Shaffer, C.
Next-generation sequencing outpaces expectations. Nat Biotechnol
25, 149 (2007). [0343] 16 Mardis, E. R. Anticipating the 1,000
dollar genome. Genome Biol 7, 112 (2006). [0344] 17 Metzker, M. L.
Emerging technologies in DNA sequencing. Genome Res 15, 1767-1776
(2005). [0345] 18 Glenn, T. C. Field guide to next-generation DNA
sequencers. Mol Ecol Resour 11, 759-769,
doi:10.1111/j.1755-0998.2011.03024.x (2011). [0346] 19 Weber, G.,
Shendure, J., Tanenbaum, D. M., Church, G. M. & Meyerson, M.
[0347] Identification of foreign gene sequences by transcript
filtering against the human genome. Nat Genet 30, 141-142 (2002).
[0348] 20 Xu, Y. et al. Pathogen discovery from human tissue by
sequence-based computational subtraction. Genomics 81, 329-335
(2003). [0349] 21 Huang, X. & Madan, A. CAP3: A DNA sequence
assembly program. Genome Res 9, 868-877 (1999). [0350] 22 Li, W.
& Godzik, A. Cd-hit: a fast program for clustering and
comparing large sets of protein or nucleotide sequences.
Bioinformatics 22, 1658-1659 (2006). [0351] 23 Schmieder, R. &
Edwards, R. Quality control and preprocessing of metagenomic
datasets. Bioinformatics 27, 863-864,
doi:10.1093/bioinformatics/btr026 (2011). [0352] 24 Dobin, A. et
al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29,
15-21, doi:10.1093/bioinformatics/bts635 (2013). [0353] 25
Grabherr, M. G. et al. Full-length transcriptome assembly from
RNA-Seq data without a reference genome. Nat Biotechnol 29,
644-652, doi:10.1038/nbt.1883 (2011). [0354] 26 Morgulis, A. et al.
Database indexing for production MegaBLAST searches. [0355]
Bioinformatics 24, 1757-1764, doi:10.1093/bioinformatics/btn322
(2008). [0356] 27 Sayers, E. W. et al. Database resources of the
National Center for Biotechnology Information. Nucleic Acids Res
38, D5-16, doi:10.1093/nar/gkp967 (2010). [0357] 28 Kupferschmidt,
K. Epidemiology. Outbreak detectives embrace the genome era. [0358]
Science 333, 1818-1819, doi:10.1126/science.333.6051.1818 (2011).
[0359] 29 Alter, H. J., Stramer, S. L. & Dodd, R. Y. Emerging
infectious diseases that threaten the blood supply. Semin Hematol
44, 32-41 (2007). [0360] 30 Holland, P. V. Viral infections and the
blood supply. The NEJM 334, 1734-1735 (1996). [0361] 31 Schreiber,
G. B., Busch, M. P., Kleinman, S. H. & Korelitz, J. J. The risk
of transfusion-transmitted viral infections. The Retrovirus
Epidemiology Donor Study. The New England journal of medicine 334,
1685-1690 (1996). [0362] 32. Consortium, E. P. et al. An integrated
encyclopedia of DNA elements in the human genome. Nature 489,
57-74, doi:10.1038/nature11247 (2012). [0363] 33 Djebali, S. et al.
Landscape of transcription in human cells. Nature 489, 101-108,
doi:10.1038/nature11233 (2012). [0364] 34 Gerhard, D. S. et al. The
status, quality, and expansion of the NIH full-length cDNA project:
the Mammalian Gene Collection (MGC). Genome research 14, 2121-2127,
doi:10.1101/gr.2596504 (2004). [0365] 35 Strausberg, R. L. et al.
Generation and initial analysis of more than 15,000 full-length
human and mouse cDNA sequences. Proceedings of the National Academy
of Sciences of the United States of America 99, 16899-16903,
doi:10.1073/pnas.242603899 (2002). [0366] 36 Strausberg, R. L.,
Feingold, E. A., Klausner, R. D. & Collins, F. S. The mammalian
gene collection. Science 286, 455-457 (1999). [0367] 37 Team, M. G.
C. P. et al. The completion of the Mammalian Gene Collection (MGC).
Genome research 19, 2324-2333, doi:10.1101/gr.095976.109 (2009).
[0368] 38 Marcais, G. & Kingsford, C. A fast, lock-free
approach for efficient parallel counting of occurrences of k-mers.
Bioinformatics 27, 764-770, doi:10.1093/bioinformatics/btr011
(2011). [0369] 39 Rizk, G., Lavenier, D. & Chikhi, R. DSK:
k-mer counting with very low memory usage. Bioinformatics 29,
652-653, doi:10.1093/bioinformatics/btt020 (2013). [0370] 40 Chen,
J. Serial analysis of binding elements for human transcription
factors. Nat Protoc 1, 1481-1493 (2006). [0371] 41 Dobin, A. et al.
STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29,
15-21, doi:10.1093/bioinformatics/bts635 (2013). [0372] 42
Grabherr, M. G. et al. Full-length transcriptome assembly from
RNA-Seq data without a reference genome. Nat Biotechnol 29,
644-652, doi:10.1038/nbt.1883 (2011). [0373] 43 Sayers, E. W. et
al. Database resources of the National Center for Biotechnology
Information. Nucleic Acids Res 38, D5-16, doi:10.1093/nar/gkp967
(2010).
Sequence CWU 1
1
336110DNAArtificial Sequencesynthetic construct; oligonucleotide
1acgcgtatga 10210DNAArtificial Sequencesynthetic construct;
oligonucleotide 2acgtagcgtg 10310DNAArtificial Sequencesynthetic
construct; oligonucleotide 3atacgcgact 10410DNAArtificial
Sequencesynthetic construct; oligonucleotide 4atcgacgcaa
10510DNAArtificial Sequencesynthetic construct; oligonucleotide
5atcgttcgac 10610DNAArtificial Sequencesynthetic construct;
oligonucleotide 6attcgatcgc 10710DNAArtificial Sequencesynthetic
construct; oligonucleotide 7ccgtcgaagt 10810DNAArtificial
Sequencesynthetic construct; oligonucleotide 8cgaacgaatc
10910DNAArtificial Sequencesynthetic construct; oligonucleotide
9cgacgtattg 101010DNAArtificial Sequencesynthetic construct;
oligonucleotide 10cgatacgttc 101110DNAArtificial Sequencesynthetic
construct; oligonucleotide 11cgatctaaca 101210DNAArtificial
Sequencesynthetic construct; oligonucleotide 12cgattcggtt
101310DNAArtificial Sequencesynthetic construct; oligonucleotide
13cgcccgttaa 101410DNAArtificial Sequencesynthetic construct;
oligonucleotide 14cgcgatagtg 101510DNAArtificial Sequencesynthetic
construct; oligonucleotide 15cgcgtgttat 101610DNAArtificial
Sequencesynthetic construct; oligonucleotide 16cggatcgtta
101710DNAArtificial Sequencesynthetic construct; oligonucleotide
17cggtacgcat 101810DNAArtificial Sequencesynthetic construct;
oligonucleotide 18cggtcgtaga 101910DNAArtificial Sequencesynthetic
construct; oligonucleotide 19cgtaacgact 102010DNAArtificial
Sequencesynthetic construct; oligonucleotide 20cgtaactagg
102110DNAArtificial Sequencesynthetic construct; oligonucleotide
21cgtaatacgt 102210DNAArtificial Sequencesynthetic construct;
oligonucleotide 22cgtaatcggt 102310DNAArtificial Sequencesynthetic
construct; oligonucleotide 23cgtacaaacg 102410DNAArtificial
Sequencesynthetic construct; oligonucleotide 24cgtacgaaac
102510DNAArtificial Sequencesynthetic construct; oligonucleotide
25cgtacgttag 102610DNAArtificial Sequencesynthetic construct;
oligonucleotide 26gcgcgatagg 102710DNAArtificial Sequencesynthetic
construct; oligonucleotide 27gcgcgtaaat 102810DNAArtificial
Sequencesynthetic construct; oligonucleotide 28gtacgcgact
102910DNAArtificial Sequencesynthetic construct; oligonucleotide
29gtcgaacgag 103010DNAArtificial Sequencesynthetic construct;
oligonucleotide 30taacgtatcg 103110DNAArtificial Sequencesynthetic
construct; oligonucleotide 31taacgtcggc 103210DNAArtificial
Sequencesynthetic construct; oligonucleotide 32tacgcgattg
103310DNAArtificial Sequencesynthetic construct; oligonucleotide
33tagcgaacgc 103410DNAArtificial Sequencesynthetic construct;
oligonucleotide 34tagcgacgca 103510DNAArtificial Sequencesynthetic
construct; oligonucleotide 35tatgcgacgc 103610DNAArtificial
Sequencesynthetic construct; oligonucleotide 36tcgatcggtg
103710DNAArtificial Sequencesynthetic construct; oligonucleotide
37tcgcgaaatt 103810DNAArtificial Sequencesynthetic construct;
oligonucleotide 38tcgcgaatga 103910DNAArtificial Sequencesynthetic
construct; oligonucleotide 39tcgttcgtac 104010DNAArtificial
Sequencesynthetic construct; oligonucleotide 40ttatcgcgca
104146DNAArtificial Sequencesynthetic construct; poly d(T)
oligomisc_feature(46)..(46)n is a, c, g, or t 41acggcctaat
acgactcact atagggtttt tttttttttt ttttvn 464210DNAArtificial
Sequencesynthetic construct; oligonucleotide 42acgccgggtt
104310DNAArtificial Sequencesynthetic construct; oligonucleotide
43tcgcacatcg 104410DNAArtificial Sequencesynthetic construct;
oligonucleotide 44gatgcgttaa 104510DNAArtificial Sequencesynthetic
construct; oligonucleotide 45gcgtggttaa 104610DNAArtificial
Sequencesynthetic construct; oligonucleotide 46cggacgaaaa
104710DNAArtificial Sequencesynthetic construct; oligonucleotide
47ttattcgcgc 104810DNAArtificial Sequencesynthetic construct;
oligonucleotide 48ttattcgcgt 104910DNAArtificial Sequencesynthetic
construct; oligonucleotide 49cgactactaa 105010DNAArtificial
Sequencesynthetic construct; oligonucleotide 50agcttgcgtc
105110DNAArtificial Sequencesynthetic construct; oligonucleotide
51gacccgtaaa 105210DNAArtificial Sequencesynthetic construct;
oligonucleotide 52tagcgaccga 105310DNAArtificial Sequencesynthetic
construct; oligonucleotide 53ccgttgaacg 105410DNAArtificial
Sequencesynthetic construct; oligonucleotide 54actacgatta
105510DNAArtificial Sequencesynthetic construct; oligonucleotide
55tgcgatttcg 105610DNAArtificial Sequencesynthetic construct;
oligonucleotide 56tgtaatccgc 105710DNAArtificial Sequencesynthetic
construct; oligonucleotide 57gcgcagcgat 105810DNAArtificial
Sequencesynthetic construct; oligonucleotide 58cggtctatat
105910DNAArtificial Sequencesynthetic construct; oligonucleotide
59tcggtcgtaa 106010DNAArtificial Sequencesynthetic construct;
oligonucleotide 60gacgcatagg 106110DNAArtificial Sequencesynthetic
construct; oligonucleotide 61gcgcatacgt 106210DNAArtificial
Sequencesynthetic construct; oligonucleotide 62ccgaagtcga
106310DNAArtificial Sequencesynthetic construct; oligonucleotide
63atacgtcgga 106410DNAArtificial Sequencesynthetic construct;
oligonucleotide 64taccggttgc 106510DNAArtificial Sequencesynthetic
construct; oligonucleotide 65tccgattaac 106610DNAArtificial
Sequencesynthetic construct; oligonucleotide 66tccgattaaa
106710DNAArtificial Sequencesynthetic construct; oligonucleotide
67caatacgtac 106810DNAArtificial Sequencesynthetic construct;
oligonucleotide 68gaacaatgcg 106910DNAArtificial Sequencesynthetic
construct; oligonucleotide 69tacacgcgat 107010DNAArtificial
Sequencesynthetic construct; oligonucleotide 70aatcgatcga
107110DNAArtificial Sequencesynthetic construct; oligonucleotide
71gcgcgtctaa 107210DNAArtificial Sequencesynthetic construct;
oligonucleotide 72cgttaagagg 107310DNAArtificial Sequencesynthetic
construct; oligonucleotide 73atcgcctata 107410DNAArtificial
Sequencesynthetic construct; oligonucleotide 74cgtacgcgtc
107510DNAArtificial Sequencesynthetic construct; oligonucleotide
75tatttacggc 107610DNAArtificial Sequencesynthetic construct;
oligonucleotide 76cagaccggat 107710DNAArtificial Sequencesynthetic
construct; oligonucleotide 77taaggcaacg 107810DNAArtificial
Sequencesynthetic construct; oligonucleotide 78ggtacgctat
107910DNAArtificial Sequencesynthetic construct; oligonucleotide
79tatcggtcaa 108010DNAArtificial Sequencesynthetic construct;
oligonucleotide 80cgtacctaga 108110DNAArtificial Sequencesynthetic
construct; oligonucleotide 81attcgacgct 108210DNAArtificial
Sequencesynthetic construct; oligonucleotide 82agctcgatag
108310DNAArtificial Sequencesynthetic construct; oligonucleotide
83gttggcgaaa 108410DNAArtificial Sequencesynthetic construct;
oligonucleotide 84cgaacaacta 108510DNAArtificial Sequencesynthetic
construct; oligonucleotide 85gcacggtatc 108610DNAArtificial
Sequencesynthetic construct; oligonucleotide 86tgcgtaacta
108710DNAArtificial Sequencesynthetic construct; oligonucleotide
87ggtacgcgac 108810DNAArtificial Sequencesynthetic construct;
oligonucleotide 88acgcgtcata 108910DNAArtificial Sequencesynthetic
construct; oligonucleotide 89gatcacgtaa 109010DNAArtificial
Sequencesynthetic construct; oligonucleotide 90gatcacgtat
109110DNAArtificial Sequencesynthetic construct; oligonucleotide
91ccgtagtacc 109210DNAArtificial Sequencesynthetic construct;
oligonucleotide 92tcaagttacg 109310DNAArtificial Sequencesynthetic
construct; oligonucleotide 93cgcatataca 109410DNAArtificial
Sequencesynthetic construct; oligonucleotide 94aacgttacgt
109510DNAArtificial Sequencesynthetic construct; oligonucleotide
95gtaaccgcga 109610DNAArtificial Sequencesynthetic construct;
oligonucleotide 96gcacgatcga 109710DNAArtificial Sequencesynthetic
construct; oligonucleotide 97atcagttcgg 109810DNAArtificial
Sequencesynthetic construct; oligonucleotide 98cgatatcgga
109910DNAArtificial Sequencesynthetic construct; oligonucleotide
99gttttccggg 1010010DNAArtificial Sequencesynthetic construct;
oligonucleotide 100gttacgcgac 1010110DNAArtificial
Sequencesynthetic construct; oligonucleotide 101ataccggtcg
1010210DNAArtificial Sequencesynthetic construct; oligonucleotide
102tcgcatgggt 1010310DNAArtificial Sequencesynthetic construct;
oligonucleotide 103tgacgtacgg 1010410DNAArtificial
Sequencesynthetic construct; oligonucleotide 104attgcgcttt
1010510DNAArtificial Sequencesynthetic construct; oligonucleotide
105tgacgcaatg 1010610DNAArtificial Sequencesynthetic construct;
oligonucleotide 106tatcgcacta 1010710DNAArtificial
Sequencesynthetic construct; oligonucleotide 107agtgcggtaa
1010810DNAArtificial Sequencesynthetic construct; oligonucleotide
108ctaataagcg 1010910DNAArtificial Sequencesynthetic construct;
oligonucleotide 109ataatccgat 1011010DNAArtificial
Sequencesynthetic construct; oligonucleotide 110cgagtaagta
1011110DNAArtificial Sequencesynthetic construct; oligonucleotide
111gtacgacaaa 1011210DNAArtificial Sequencesynthetic construct;
oligonucleotide 112tcaggcgtaa 1011310DNAArtificial
Sequencesynthetic construct; oligonucleotide 113tcgtaacgct
1011410DNAArtificial Sequencesynthetic construct; oligonucleotide
114gcgtggataa 1011510DNAArtificial Sequencesynthetic construct;
oligonucleotide 115tggaacgccg 1011610DNAArtificial
Sequencesynthetic construct; oligonucleotide 116caacgatcaa
1011710DNAArtificial Sequencesynthetic construct; oligonucleotide
117aaggtcgacg 1011810DNAArtificial Sequencesynthetic construct;
oligonucleotide 118tacttacgga 1011910DNAArtificial
Sequencesynthetic construct; oligonucleotide 119cataacgcac
1012010DNAArtificial Sequencesynthetic construct; oligonucleotide
120gtcggtgaaa 1012110DNAArtificial Sequencesynthetic construct;
oligonucleotide 121gcgattgcga 1012210DNAArtificial
Sequencesynthetic construct; oligonucleotide 122ccgtaactta
1012310DNAArtificial Sequencesynthetic construct; oligonucleotide
123acgcaagaca 1012410DNAArtificial Sequencesynthetic construct;
oligonucleotide 124tcgtcccgat 1012510DNAArtificial
Sequencesynthetic construct; oligonucleotide 125ggcataatcg
1012610DNAArtificial
Sequencesynthetic construct; oligonucleotide 126acgtattcta
1012710DNAArtificial Sequencesynthetic construct; oligonucleotide
127cgcgtacgag 1012810DNAArtificial Sequencesynthetic construct;
oligonucleotide 128cgacttatcg 1012910DNAArtificial
Sequencesynthetic construct; oligonucleotide 129agtcgagtac
1013010DNAArtificial Sequencesynthetic construct; oligonucleotide
130ttcgtaacga 1013110DNAArtificial Sequencesynthetic construct;
oligonucleotide 131ctataatcgg 1013210DNAArtificial
Sequencesynthetic construct; oligonucleotide 132ctatcgatag
1013310DNAArtificial Sequencesynthetic construct; oligonucleotide
133aaccaatccg 1013410DNAArtificial Sequencesynthetic construct;
oligonucleotide 134tcgtgagtta 1013510DNAArtificial
Sequencesynthetic construct; oligonucleotide 135gctctcaacg
1013610DNAArtificial Sequencesynthetic construct; oligonucleotide
136ttatcggtca 1013710DNAArtificial Sequencesynthetic construct;
oligonucleotide 137tcatgtcgtt 1013810DNAArtificial
Sequencesynthetic construct; oligonucleotide 138cgtatacaag
1013910DNAArtificial Sequencesynthetic construct; oligonucleotide
139cgcaatagaa 1014010DNAArtificial Sequencesynthetic construct;
oligonucleotide 140tatctcgatc 1014110DNAArtificial
Sequencesynthetic construct; oligonucleotide 141catttgcgca
1014210DNAArtificial Sequencesynthetic construct; oligonucleotide
142cggtacgacg 1014310DNAArtificial Sequencesynthetic construct;
oligonucleotide 143taccccccgt 1014410DNAArtificial
Sequencesynthetic construct; oligonucleotide 144atgtaaccgt
1014510DNAArtificial Sequencesynthetic construct; oligonucleotide
145cgcatcatta 1014610DNAArtificial Sequencesynthetic construct;
oligonucleotide 146caatcgttga 1014710DNAArtificial
Sequencesynthetic construct; oligonucleotide 147tccgaatagg
1014810DNAArtificial Sequencesynthetic construct; oligonucleotide
148cgggtcgcaa 1014910DNAArtificial Sequencesynthetic construct;
oligonucleotide 149gatattcgca 1015010DNAArtificial
Sequencesynthetic construct; oligonucleotide 150attcgcgcat
1015110DNAArtificial Sequencesynthetic construct; oligonucleotide
151tatcgaggtt 1015210DNAArtificial Sequencesynthetic construct;
oligonucleotide 152acgttatagc 1015310DNAArtificial
Sequencesynthetic construct; oligonucleotide 153ataggggcgt
1015410DNAArtificial Sequencesynthetic construct; oligonucleotide
154aggttgcgac 1015510DNAArtificial Sequencesynthetic construct;
oligonucleotide 155cgacgttgca 1015610DNAArtificial
Sequencesynthetic construct; oligonucleotide 156gtcttcggta 10
SEQ ID NO | Length | Type | Organism | Other information | Sequence
157 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | agcgcaatca
158 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | tgttatgcga
159 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | gtcatacgta
160 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | cgtaattatg
161 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | tcgcaaaata
162 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | tagcaccgcc
163 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | gtctaaacga
164 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | tagcacgcca
165 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | ggcgtttagc
166 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | ctacgattat
167 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | ggtacggtta
168 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | atagagtcgg
169 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | cgttatgggt
170 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | cggacataaa
171 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | tcgaaccggc
172 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | acaagtcgca
173 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | cgatccctat
174 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | tagacgacca
175 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | tagtcggaat
176 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | attcgcagtt
177 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | cttcgttagc
178 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | cacgaacaac
179 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | tcgagatacg
180 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | ttgacccgta
181 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | tcgtaatcaa
182 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | ctcgaccaaa
183 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | aatactcgag
184 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | atgttttgcg
185 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | tgtcgcgtca
186 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | tcccataccg
187 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | tagcgagtag
188 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | ctaaacccgc
189 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | gatgaattcg
190 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | cgattgtact
191 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | cgcgaaatgg
192 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | ctatacgcaa
193 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | tacacgatat
194 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | agcgcacgta
195 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | ccgtcaaacg
196 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | attctcgtcg
197 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | ggcgcgatac
198 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | catattgcgt
199 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | ccgcggtaag
200 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | cacgattgat
201 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | gtctagacgc
202 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | cgttacgctt
203 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | tacgtcgagt
204 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | ggcataatcg
205 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | ggcgcatata
206 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | taactcgtgg
207 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | cgcggtatac
208 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | atgcgcgacg
209 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | cgttatccgc
210 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | aacgttacgt
211 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | gcgtaactag
212 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | gcgtaactaa
213 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | tgcgccgaac
214 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | atcagttcgg
215 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | cgatatcgga
216 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | ataccggtcg
217 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | tcgaggttac
218 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | ctcgtaccta
219 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | tatcgcacta
220 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | aagtctaacg
221 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | gtatcacgcg
222 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | taagacgggg
223 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | tactatcgac
224 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | cgagtaagta
225 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | cggtaagcgc
226 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | ctatcgatag
227 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | atatatcgaa
228 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | ttatcgcgag
229 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | tattggatcg
230 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | gtaagcgtag
231 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | ccgctgatac
232 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | gttcgcacta
233 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | gcgtcgaact
234 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | ttcgactagt
235 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | gatgcgttaa
236 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | cggacgaaaa
237 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | tcggtcgtaa
238 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | taccggttgc
239 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | aatagggcga
240 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | aatcgatcga
241 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | gcgcgtctaa
242 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | ggtacgctat
243 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | tatcggtcaa
244 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | cgtacctaga
245 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | ggtacgcgac
246 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | cggtacgacg
247 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | ccgcatatag
248 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | ataatatcgc
249 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | cttaacgatc
250 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | tcgccatacg
251 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | tatcgtattg
252 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | gtactcgtag
253 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | tacgaatgcg
254 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | acgcttgcgt
255 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | cggttagatc
256 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | acgataggac
257 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | cgacccataa
258 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | attcgcgcat
259 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | ataggggcgt
260 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | taacatgcga
261 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | atcgtatcga
262 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | tctacgcatc
263 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | cgtcatcggt
264 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | cacgattcgt
265 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | cgtctctcgt
266 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | tatcgattag
267 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | gttaatacga
268 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | tcgaaccggc
269 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | gtctaacgac
270 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | acaagtcgca
271 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | tatcgtacac
272 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | gtgcggtatc
273 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | actacgcatt
274 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | tgacggttcg
275 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | tacctaaccg
276 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | ataatggtcg
277 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | actcattccg
278 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | atcgttaacg
279 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | atgtcgcatc
280 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | cgtactatta
281 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | tagtgtcgcc
282 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | ctacggttag
283 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | taggctagcg
284 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | cgattcggtt
285 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | gatgcgacta
286 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | taagttaccg
287 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | ggtatgcgta
288 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | aaccgaacgt
289 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | tcacgataca
290 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | caattgccga
291 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | attaatatcg
292 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | caatccgtac
293 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | ggtcgaataa
294 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | cgcaataagg
295 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | tactttcggt
296 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | ctcgaccaaa
297 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | gttgacgtat
298 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | catcgctaga
299 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | tgtcgcgtca
300 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | cgcgcttatt
301 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | cgagcatgta
302 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | tagcgagtag
303 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | gaccatagcg
304 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | taatcaaccg
305 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | gcaatcgttg
306 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | attagtcgag
307 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | cgccgtttga
308 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | cgtttccgaa
309 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | cgactgatca
310 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | acgactaatg
311 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | gttgtccgat
312 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | acttatcgga
313 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | gaactatcgt
314 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | tatagtttcg
315 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | atacggacaa
316 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | taacgctagg
317 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | cggtccgtat
318 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | ttaaacggta
319 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | tatcgcgtgt
320 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | actagggtcg
321 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | tctagcgaat
322 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | taacatcgcc
323 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | ttcaatccgg
324 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | ttcgataact
325 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | accgtctcga
326 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | tatccgttcg
327 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | cggtgtatat
328 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | gggcatagcg
329 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | taagctacgg
330 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | tagaacgcga
331 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | tgctaatcgc
332 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | cgaaccgaac
333 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | atcgtatggt
334 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | tgtcgatcac
335 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | cgtaccgatg
336 | 10 | DNA | Artificial Sequence | synthetic construct; oligonucleotide | atacatgcgg
* * * * *