U.S. patent application number 17/251923 was published by the patent office on 2021-07-22 for "Method for discriminating between live and dead microbes in a sample".
This patent application is currently assigned to PATHOQUEST. The applicant listed for this patent is PATHOQUEST. Invention is credited to Pascale BEURDELEY, Justine CHEVAL, Stephane CRUVEILLER, Marc ELOIT, Erika MUTH.
Application Number: 17/251923
Publication Number: 20210222261
Family ID: 1000005507124
Publication Date: 2021-07-22
United States Patent Application 20210222261
Kind Code: A1
ELOIT; Marc; et al.
July 22, 2021
METHOD FOR DISCRIMINATING BETWEEN LIVE AND DEAD MICROBES IN A SAMPLE
Abstract
A method for discriminating between live and dead microbes in a
sample by discriminating between transcriptionally-active and
inert microbial nucleic acid sequences in the sample. In
particular, the method is based on the comparison of levels of
nucleotide substitution in a sample cultured in the presence of an
RNA-labelling agent. Also provided are a method of diagnosing
microbial infections in a subject and methods of assessing the risk
of contamination of a sample, both implementing the method for
discriminating between live and dead microbes in a sample.
Inventors:
ELOIT; Marc; (Paris, FR)
CHEVAL; Justine; (Le Perreux-sur-Marne, FR)
MUTH; Erika; (Bouconvilliers, FR)
BEURDELEY; Pascale; (Fontenay-sous-bois, FR)
CRUVEILLER; Stephane; (Yerres, FR)
Applicant: PATHOQUEST, Paris, FR
Assignee: PATHOQUEST, Paris, FR
Appl. No.: 17/251923
Filed: June 20, 2019
PCT Filed: June 20, 2019
PCT No.: PCT/EP2019/066367
371 Date: December 14, 2020
Current U.S. Class: 1/1
Current CPC Class: C12Q 1/6806 20130101; C12Q 1/70 20130101; C12Q 1/6869 20130101; C12Q 1/6888 20130101
International Class: C12Q 1/6888 20060101 C12Q001/6888; C12Q 1/6806 20060101 C12Q001/6806; C12Q 1/6869 20060101 C12Q001/6869; C12Q 1/70 20060101 C12Q001/70
Foreign Application Data:
Date | Code | Application Number
Jun 20, 2018 | EP | 18305777.7
Apr 3, 2019 | EP | 19305439.2
Claims
1-14. (canceled)
15. A method for discriminating between live and dead microbes in a
sample, comprising discriminating between transcriptionally-active
and inert microbial nucleic acid sequences in the sample, wherein
the method comprises the steps of: (a) sequencing a first and a
second set of RNAs extracted from the sample, wherein the first set
of RNAs is obtained by culturing the sample in presence of an
RNA-labelling agent and further by submitting the extracted RNAs to
conditions promoting nucleotide substitution; and the second set of
RNAs is obtained by culturing the sample in absence of an
RNA-labelling agent; thereby obtaining a first and a second set of
sequence reads; (b) comparing the number and/or rate of substituted
nucleotides in the sequence reads mapping against at least one
microbial nucleic acid sequence hit in the first and in the second
set of sequence reads; and (c) concluding that the at least one
microbial nucleic acid sequence hit belongs to a live microbe if
the number and/or rate of substituted nucleotides in the sequence
reads mapping against said at least one microbial nucleic acid
sequence hit in the first set of sequence reads is greater than the
number and/or rate of nucleotides randomly substituted in the
second set of sequence reads.
16. The method according to claim 15, wherein the RNA-labelling
agent is a thiol-labelled RNA precursor.
17. The method according to claim 16, wherein the thiol-labelled
RNA precursor is selected from the group consisting of
4-thiouridine, 2-thiouridine, 2,4-dithiouridine,
2-thio-4-deoxyuridine, 5-carbethoxy-2-thiouridine,
5-carboxy-2-thiouridine, 5-(n-propyl)-2-thiouridine,
6-methyl-2-thiouridine and 6-(n-propyl)-2-thiouridine, thereby
obtaining thiouridine-labelled RNAs.
18. The method according to claim 16, wherein the thiol-labelled
RNA precursor is 4-thiouridine, thereby obtaining
thiouridine-labelled RNAs.
19. The method according to claim 15, wherein conditions promoting
nucleotide substitution comprise chemically modifying the labelled
RNAs, thereby obtaining chemically-modified RNAs; and further
reverse-transcribing said chemically-modified RNAs.
20. The method according to claim 15, wherein conditions promoting
nucleotide substitution comprise chemically modifying the labelled
RNAs by alkylation, oxidative-nucleophilic-aromatic substitution or
osmium-mediated transformation, thereby obtaining
chemically-modified RNAs; and further reverse-transcribing said
chemically-modified RNAs.
21. The method according to claim 15, wherein conditions promoting
nucleotide substitution comprise chemically modifying the labelled
RNAs by alkylation, thereby obtaining chemically-modified RNAs; and
further reverse-transcribing said chemically-modified RNAs.
22. The method according to claim 21, wherein alkylation of the
labelled RNAs is carried out using an alkylating agent selected
from the group consisting of iodoacetamide, iodoacetic acid,
N-ethylmaleimide and 4-vinylpyridine.
23. The method according to claim 21, wherein alkylation of the
labelled RNAs is carried out using iodoacetamide.
24. The method according to claim 15, wherein the step of
sequencing the first and the second set of RNAs comprises: (i)
reverse-transcribing RNAs, thereby obtaining a cDNA library, (ii)
optionally, amplifying said cDNA library, and (iii) sequencing said
cDNA library.
25. The method according to claim 24, wherein reverse-transcribing
RNAs at step (i) converts uridine (U) in the labelled RNA to
cytosine (C) instead of thymidine (T) in the cDNA library when the
sample was cultured in presence of an RNA-labelling agent.
26. The method according to claim 24, wherein reverse-transcribing
RNAs converts uridine (U) in the labelled RNA to cytosine (C)
instead of thymidine (T) in the cDNA library when the sample was
cultured in presence of a thiol-labelled RNA precursor.
27. The method according to claim 24, wherein, during
reverse-transcription at step (i), RNAs undergo a first-strand
synthesis with adenine (A)-to-guanine (G) substitutions and a
second-strand synthesis leading to thymine (T)-to-cytosine (C)
substitutions in the cDNA library when the sample was cultured in
presence of an RNA-labelling agent.
28. The method according to claim 24, wherein, during
reverse-transcription at step (i), RNAs undergo a first-strand
synthesis with A-to-G substitutions and a second-strand synthesis
leading to T-to-C substitutions in the cDNA library when the sample
was cultured in presence of a thiol-labelled RNA precursor.
29. The method according to claim 24, wherein sequencing said cDNA
library at step (iii) is performed by Next-Generation Sequencing
(NGS), deep sequencing or targeted sequencing of custom
sequences.
30. The method according to claim 15, wherein the at least one
microbial nucleic acid sequence hit is identified through: (i)
optionally, filtering the first and/or second set of sequence
reads, (ii) optionally, assembling the sequence reads into contigs,
(iii) aligning the sequence reads or contigs onto a database
comprising microbial nucleic acid sequences, (iv) identifying the
at least one microbial nucleic acid sequence hit mapped against at
least one sequence read or contig, and (v) optionally, re-aligning
the sequence reads or contigs onto the microbial nucleic acid
sequence hit identified in step (iv), thereby determining a
consensus microbial nucleic acid sequence, wherein the consensus
microbial nucleic acid sequence corresponds to the microbial
nucleic acid sequence hit.
31. The method according to claim 15, wherein the at least one
microbial nucleic acid sequence hit belongs to a live microbe if
the number and/or rate of T-to-C substitutions in the sequence
reads mapping against the at least one microbial nucleic acid
sequence hit in the first set of sequence reads is greater than the
number and/or rate of T-to-C substitutions in the sequence reads
mapping against the at least one microbial nucleic acid sequence
hit in the second set of sequence reads.
32. The method according to claim 15, wherein the microbe is
selected from the group consisting of viruses, bacteria, archaea,
fungi and protozoans.
33. A method of treating a subject affected with a microbial
infection, comprising: (a) providing a sample from the subject, (b)
performing the method according to claim 15 on said sample, (c)
diagnosing the subject as having a microbial infection if the at
least one identified microbial nucleic acid sequence hit belongs to
a live microbe, and (d) treating the subject if said subject was
diagnosed as having a microbial infection in step c).
34. A method for assessing the risk of microbial contamination in a
non-biological sample, comprising: (a) providing a non-biological
sample, (b) performing the method according to claim 15 on said
non-biological sample, and (c) concluding that the non-biological
sample is at risk of being contaminated if the at least one
identified microbial nucleic acid sequence hit belongs to a live
microbe.
Description
FIELD OF INVENTION
[0001] The present invention relates to a method for discriminating
between live and dead microbes in a sample, by discriminating
between transcriptionally-active and inert microbial nucleic acid
sequences in the sample. In particular, the method according to the
present invention is based on the comparison of levels of
nucleotide substitution in a sample cultured in presence of an
RNA-labelling agent.
[0002] The present invention further relates to a method for
diagnosing microbial infections in a subject, and to methods of
assessing the risk of contamination of a sample, each implementing
the discriminating method according to the present invention.
BACKGROUND OF INVENTION
[0003] The ability to detect viruses in cells, and more generally
microbes, has numerous applications: in diagnosis, where it helps
identify infectious agents causing, e.g., diseases; in biomedical
research, where it affects the interpretation of experimental
results; in the safety evaluation of potentially contaminated
samples; and in assessing the viability of microorganisms used in
biotechnological processes.
[0004] Beyond the now-standard detection technique relying on
the amplification of microorganism-specific nucleic acid sequences,
several techniques have emerged to circumvent the major limitation
of techniques based solely on amplification, namely their inability
to distinguish dead microorganisms from active (or replicative)
microorganisms, including latent viruses. In the context of
screening a biological sample, this distinction is crucial, as the
presence of an active microbial agent may have consequences
different from those associated with the presence of dead and/or
inactive microbes, in particular viruses.
[0005] These techniques can be based on the detection of sequences
specific to replicating viruses: the presence of RNAs in the case
of DNA viruses; the stoichiometry of positive-sense relative to
negative-sense RNAs in the case of negative-sense single-stranded
RNA viruses; the presence of negative-sense RNAs (antigenome) in
the case of positive-sense single-stranded RNA viruses; and the
presence of DNA and positive-sense spliced RNA in the case of
retroviruses. These techniques rely on the amplification of the
reverse-complement strand, e.g., by RT-PCR, or on RNA sequencing
(RNA-seq, also called whole transcriptome shotgun sequencing) with
high-throughput sequencing methods, particularly stranded RNA-seq
techniques.
[0006] The current techniques still have several limitations. First
and foremost, they do not permit the distinction between
contaminating RNAs, such as replication intermediates present in
the sample independently of the presence of active microbes (e.g.,
active viral particles), and RNAs associated with the presence of
replicating microbes, such as viruses, in the sample. The
efficiency of such techniques in detecting replicating
double-stranded RNA viruses has yet to be demonstrated. In the case
of positive-sense single-stranded RNA viruses, the small number of
negative-sense RNA species produced limits the sensitivity of the
available detection techniques. In the case of negative-sense
single-stranded RNA viruses, distinguishing the positive-strand
antigenome from specific transcripts is difficult and requires a
virus species-specific bioinformatics analysis.
[0007] Another characteristic of the RNA species associated with
the presence of replicating viruses (and, more generally,
replicating microbes) is that, unlike potential contaminating RNAs,
they are in the process of being synthesized (i.e., in the host
cell in the case of viruses, or directly within the bacterium,
fungus or other organism in the case of other microbes). Several
techniques to label and purify nascent transcripts (also referred
to as RNA metabolic labelling techniques) have been described. For
instance, the incorporation of 4-thiouridine (4sU) or other types
of uridine analog (BudR) has been used to purify nascent eukaryotic
mRNA transcripts and/or to investigate the dynamics of
transcriptional networks (Herzog et al., 2017. Nat Methods.
14(12):1198-1204; Tani et al., 2012. RNA Biol. 9(10):1233-8).
[0008] Here, the Inventors have designed, optimized and validated a
method for the detection of virus-infected cells, including cells
infected by new viral strains or species, relying on the detection
of viral RNA synthesis inside the cells of a biological sample
using metabolic labelling. The Inventors have further shown herein
that this method is readily applicable to the detection of any
transcriptionally-active microbe, by relying on the detection of
microbial RNA synthesis in a sample using metabolic labelling.
Applications include, e.g., the detection of viral, bacterial,
archaeal, fungal or protozoal infections or contamination by living
microorganisms, and their differentiation from carryover of dead
microorganisms.
SUMMARY
[0009] The present invention relates to an in vitro method for
discriminating between live and dead microbes in a sample,
comprising discriminating between transcriptionally-active and
inert microbial nucleic acid sequences in the sample, wherein the
method comprises the steps of: [0010] (a) sequencing a first set of
RNAs extracted from the sample, wherein the first set of RNAs is
obtained by culturing the sample in presence of an RNA-labelling
agent and further by submitting the extracted RNAs to conditions
allowing for nucleotide substitution; thereby obtaining a first set
of sequence reads; [0011] (b) comparing the number of substituted
nucleotides in the first set of sequence reads mapping against at
least one microbial nucleic acid sequence hit with a control
sequence; and [0012] (c) concluding that the at least one microbial
nucleic acid sequence hit belongs to a live microbe if the number
of substituted nucleotides in the sequence reads mapping against
said at least one microbial nucleic acid sequence hit in the first
set of sequence reads is greater than the number of nucleotides
randomly substituted in the control sequence.
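Steps (b) and (c) reduce to comparing a substitution count, or rate, between the labelled reads mapping against a hit and a control. The following is a minimal sketch, assuming the per-hit substitution counts and the number of examined bases have already been tallied upstream; the names `HitCounts` and `is_live` are hypothetical, and the plain "greater than" test mirrors the wording above rather than any statistical threshold the method may apply in practice.

```python
from dataclasses import dataclass


@dataclass
class HitCounts:
    """Tallies for the reads mapping against one microbial sequence hit (hypothetical structure)."""
    substituted: int  # number of substituted nucleotides observed
    examined: int     # number of aligned nucleotides examined

    @property
    def rate(self) -> float:
        # Substitution rate; 0.0 when nothing was examined, to avoid dividing by zero.
        return self.substituted / self.examined if self.examined else 0.0


def is_live(first: HitCounts, control: HitCounts, use_rate: bool = True) -> bool:
    """Step (c): call the hit 'live' if the labelled (first) set shows more substitutions than the control."""
    if use_rate:
        return first.rate > control.rate
    return first.substituted > control.substituted


if __name__ == "__main__":
    labelled = HitCounts(substituted=120, examined=10_000)   # illustrative numbers only
    unlabelled = HitCounts(substituted=15, examined=9_000)
    print(is_live(labelled, unlabelled))  # True
```

Comparing rates rather than raw counts is a design choice here: it keeps the comparison meaningful when the labelled and control libraries are sequenced to different depths.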
[0013] In one embodiment, the control sequence is selected from:
[0014] a second set of sequence reads mapping against said at least
one microbial nucleic acid sequence hit, wherein the second set of
sequence reads is obtained by sequencing a second set of RNAs
obtained by culturing the sample in absence of an RNA-labelling
agent; [0015] a second set of sequence reads mapping against said
at least one microbial nucleic acid sequence hit, wherein the
second set of sequence reads is obtained by sequencing a second set
of RNAs obtained by culturing the sample in presence of an
RNA-labelling agent but without submitting the extracted RNAs to
conditions allowing for nucleotide substitution; [0016] a consensus
microbial nucleic acid sequence, obtained from the sequence reads
or contigs of the first set of sequence reads mapping against the
at least one microbial nucleic acid sequence hit; [0017] a sequence
corresponding to the same microbial nucleic acid sequence hit found
in the closest microbial strain identified in nucleic acid sequence
databases; and/or [0018] an analogous sequence corresponding to the
same microbial nucleic acid sequence hit identified in nucleic acid
sequence databases.
[0019] In one embodiment, the in vitro method is for discriminating
between infectious and non-infectious viral nucleic acid sequences
in a cell sample, and comprises: [0020] (a) sequencing a first and
a second set of RNAs extracted from the cell sample, wherein the
first set of RNAs is obtained by culturing the cell sample in
presence of an RNA-labelling agent and the second set of RNAs is
obtained by culturing the cell sample in absence of an
RNA-labelling agent, thereby obtaining a first and a second set of
sequence reads, [0021] (b) identifying at least one viral nucleic
acid sequence hit mapped against at least one sequence read of the
first set of sequence reads, [0022] (c) comparing the number of
substituted nucleotides in the sequence reads mapping the at least
one identified viral nucleic acid sequence hit in the first and
second set of sequence reads, and [0023] (d) concluding that the at
least one viral nucleic acid sequence hit belongs to an infectious
virus if the number of substituted nucleotides in the sequence
reads mapping the at least one identified viral nucleic acid
sequence hit in the first set of sequence reads is greater than in
the second set of sequence reads.
[0024] In one embodiment, the in vitro method for discriminating
between live and dead microbes in a sample comprises: [0025] (a)
sequencing a first and a second set of RNAs extracted from the
sample, wherein the first set of RNAs is obtained by culturing the
sample in presence of an RNA-labelling agent and the second set of
RNAs is obtained by culturing the sample in absence of an
RNA-labelling agent, thereby obtaining a first and a second set of
sequence reads, [0026] (b) comparing the number of substituted
nucleotides in the first set of sequence reads mapping against at
least one microbial nucleic acid sequence hit with the number of
substituted nucleotides in the second set of sequence reads mapping
against said at least one microbial nucleic acid sequence hit, and
[0027] (c) concluding that the at least one microbial nucleic acid
sequence hit belongs to a live microbe if the number of substituted
nucleotides in the sequence reads mapping against the at least one
microbial nucleic acid sequence hit in the first set of sequence
reads is greater than in the second set of sequence reads.
[0028] In one embodiment, the in vitro method for discriminating
between live and dead microbes in a sample comprises the steps of:
[0029] (a) sequencing a first and a second set of RNAs extracted
from the sample, wherein the first set of RNAs is obtained by
culturing the sample in presence of an RNA-labelling agent and
further by submitting the extracted RNAs to conditions allowing for
nucleotide substitution; and the second set of RNAs is obtained by
culturing the sample in absence of an RNA-labelling agent; thereby
obtaining a first and a second set of sequence reads; [0030] (b)
comparing the number of substituted nucleotides in the sequence
reads mapping against at least one microbial nucleic acid sequence
hit in the first and in the second set of sequence reads; and
[0031] (c) concluding that the at least one microbial nucleic acid
sequence hit belongs to a live microbe if the number of substituted
nucleotides in the sequence reads mapping against said at least one
microbial nucleic acid sequence hit in the first set of sequence
reads is greater than the number of nucleotides randomly
substituted in the second set of sequence reads.
[0032] In one embodiment, the first set of RNAs is obtained by
culturing the sample in presence of an RNA-labelling agent, thereby
obtaining labelled RNAs, and further submitting said labelled RNAs
to nucleotide substitution.
[0033] In one embodiment, the in vitro method for discriminating
between live and dead microbes in a sample comprises: [0034] (a)
sequencing a first and a second set of RNAs extracted from the
sample, wherein the first and the second set of RNAs are obtained
by culturing the sample in presence of an RNA-labelling agent,
thereby obtaining labelled RNAs, and wherein the first set of RNAs
is obtained from a first fraction of the labelled RNAs which is
submitted to nucleotide substitution, and the second set of RNAs is
obtained from a second fraction of the labelled RNAs which is not
submitted to nucleotide substitution, thereby obtaining a first and
a second set of sequence reads, [0035] (b) comparing the number of
substituted nucleotides in the first set of sequence reads mapping
against at least one microbial nucleic acid sequence hit with the
number of substituted nucleotides in the second set of sequence
reads mapping against said at least one microbial nucleic acid
sequence hit, and [0036] (c) concluding that the at least one
microbial nucleic acid sequence hit belongs to a live microbe if
the number of substituted nucleotides in the sequence reads mapping
against the at least one microbial nucleic acid sequence hit in the
first set of sequence reads is greater than in the second set of
sequence reads.
[0037] In one embodiment, the RNA-labelling agent is a
thiol-labelled RNA precursor.
[0038] In one embodiment, the thiol-labelled RNA precursor is
selected from the group comprising 4-thiouridine, 2-thiouridine,
2,4-dithiouridine, 2-thio-4-deoxyuridine,
5-carbethoxy-2-thiouridine, 5-carboxy-2-thiouridine,
5-(n-propyl)-2-thiouridine, 6-methyl-2-thiouridine and
6-(n-propyl)-2-thiouridine, thereby obtaining thiouridine-labelled
RNAs.
[0039] In one embodiment, the thiol-labelled RNA precursor is
preferably 4-thiouridine.
[0040] In one embodiment, nucleotide substitution comprises
chemically modifying the RNAs, preferably by alkylation,
oxidative-nucleophilic-aromatic substitution or osmium-mediated
transformation; more preferably by alkylation; and further
reverse-transcribing said chemically-modified RNAs.
[0041] In one embodiment, the second set of RNAs is obtained by
culturing the cell sample in presence of an RNA-labelling agent,
thereby obtaining labelled RNAs, and further alkylating said
labelled RNAs.
[0042] In one embodiment, labelled RNAs are alkylated using an
alkylating agent selected from the group comprising iodoacetamide,
iodoacetic acid, N-ethylmaleimide and 4-vinylpyridine.
[0043] In one embodiment, the alkylating agent is preferably
iodoacetamide.
[0044] In one embodiment, the step of sequencing RNAs extracted
from the cell sample comprises: [0045] (i) reverse-transcribing
RNAs, thereby obtaining a cDNA library, [0046] (ii) optionally,
amplifying said cDNA library, and [0047] (iii) sequencing said cDNA
library, preferably by Next-Generation Sequencing (NGS), deep
sequencing or targeted sequencing of custom sequences.
[0048] In one embodiment, the step of sequencing RNAs extracted
from the cell sample comprises: [0049] (i) reverse-transcribing
total RNAs, thereby obtaining a total cDNA library, [0050] (ii)
optionally, amplifying said total cDNA library, and [0051] (iii)
sequencing said total cDNA library by Next-Generation Sequencing
(NGS).
[0052] In one embodiment, reverse-transcribing RNAs converts
uridine (U) to cytidine (C) instead of uridine (U) to thymidine (T)
when the sample was cultured in presence of an RNA-labelling agent
and/or when the labelled RNAs are submitted to nucleotide
conversion.
[0053] In one embodiment, reverse-transcribing total RNAs converts
uridine (U) to cytidine (C) instead of uridine (U) to thymidine (T)
when the cell sample was cultured in presence of an RNA-labelling
agent.
[0054] In one embodiment, upon reverse transcription, RNAs undergo
first-strand synthesis with adenine (A)-to-guanine (G)
substitutions and second-strand synthesis with thymine
(T)-to-cytosine (C) substitutions when the sample was cultured in
presence of an RNA-labelling agent, preferably a thiol-labelled RNA
precursor.
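The chain of substitutions described above can be illustrated with a toy simulation. This is a sketch under simplifying assumptions: every labelled, chemically modified U is read by the reverse transcriptase as C (so G is incorporated opposite it in the first strand), unlabelled U pairs normally with A, and there are no sequencing errors. In practice only a fraction of uridines carry the label and conversion is not complete. The function names and the labelled positions are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "U": "A"}


def first_strand(rna: str, labelled: set[int]) -> str:
    """Reverse-transcribe an RNA (5'->3') into first-strand cDNA (5'->3').

    Assumption: a labelled, modified U (0-based index in `rna`) is read as C,
    so G is incorporated opposite it instead of A.
    """
    bases = []
    for i, base in enumerate(rna):
        if base == "U" and i in labelled:
            bases.append("G")  # modified U mis-pairs with G
        else:
            bases.append(COMPLEMENT[base])
    return "".join(reversed(bases))  # cDNA is antiparallel to its template


def reverse_complement(dna: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(dna))


def substitutions(reference: str, observed: str) -> list[tuple[int, str, str]]:
    """List (position, reference base, observed base) mismatches between equal-length sequences."""
    return [(i, r, o) for i, (r, o) in enumerate(zip(reference, observed)) if r != o]


if __name__ == "__main__":
    rna = "AUGUCAUU"
    labelled = {3, 6}                      # hypothetical labelled uridine positions
    reference_dna = rna.replace("U", "T")  # expected sense-strand cDNA: ATGTCATT

    fs = first_strand(rna, labelled)
    ss = reverse_complement(fs)            # second strand, same sense as the RNA

    print(substitutions(reverse_complement(reference_dna), fs))  # [(1, 'A', 'G'), (4, 'A', 'G')]
    print(substitutions(reference_dna, ss))                       # [(3, 'T', 'C'), (6, 'T', 'C')]
```

The A-to-G changes appear in the first strand at the positions mirroring the labelled uridines, and the second strand carries the corresponding T-to-C changes relative to the reference.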
[0055] In one embodiment, the step of identifying at least one
viral nucleic acid sequence hit mapped against at least one
sequence read of the first set of sequence reads comprises: [0056]
(i) optionally, filtering the first set of sequence reads, [0057]
(ii) optionally, assembling the sequence reads into contigs, [0058]
(iii) aligning the sequence reads or contigs onto a database
comprising viral nucleic acid sequences, [0059] (iv) identifying
the at least one viral nucleic acid sequence hit mapped against at
least one sequence read or contig, and [0060] (v) optionally,
re-aligning the sequence reads or contigs onto the viral nucleic
acid sequence hit identified in step (iv), thereby determining and
identifying at least one consensus viral nucleic acid sequence.
[0061] In one embodiment, the at least one microbial nucleic acid
sequence hit is identified through: [0062] (i) optionally,
filtering the first and/or second set of sequence reads, [0063]
(ii) optionally, assembling the sequence reads into contigs, [0064]
(iii) aligning the sequence reads or contigs onto a database
comprising microbial nucleic acid sequences, [0065] (iv)
identifying the at least one microbial nucleic acid sequence hit
mapped against at least one sequence read or contig, and [0066] (v)
optionally, re-aligning the sequence reads or contigs onto the
microbial nucleic acid sequence hit identified in step (iv),
thereby determining a consensus microbial nucleic acid sequence,
wherein the consensus microbial nucleic acid sequence corresponds
to the microbial nucleic acid sequence hit.
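Step (v), re-aligning reads onto the hit to derive a consensus, can be sketched as a per-position majority vote. This assumes the reads have already been placed on the hit's coordinates by an aligner (not shown) and are ungapped, which is a simplification; `consensus` and the `(start, sequence)` read representation are hypothetical. Positions without coverage keep the hit's own base.

```python
from collections import Counter


def consensus(hit: str, reads: list[tuple[int, str]]) -> str:
    """Majority-vote consensus of ungapped reads placed on `hit` coordinates.

    Each read is (0-based start on the hit, sequence). Bases falling outside
    the hit are ignored; uncovered positions fall back to the hit's own base.
    """
    columns = [Counter() for _ in hit]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(hit):
                columns[pos][base] += 1
    # most_common(1) returns [(base, count)]; ties resolve to the base counted first.
    return "".join(
        col.most_common(1)[0][0] if col else ref_base
        for col, ref_base in zip(columns, hit)
    )


if __name__ == "__main__":
    # The sample's strain differs from the database hit at position 4 (A -> T).
    print(consensus("ACGTACGT", [(0, "ACGTT"), (2, "GTTC"), (4, "TCG")]))  # ACGTTCGT
```

When such a consensus is later used as a control sequence, a majority vote would only retain the original T at a position as long as label-induced conversions remain a minority of the reads covering it.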
[0067] In one embodiment, the at least one microbial nucleic acid
sequence hit belongs to a live microbe if: [0068] the number and/or
rate of T→C substitutions in the sequence reads mapping
against the at least one microbial nucleic acid sequence hit in the
first set of sequence reads is greater than the number and/or rate
of T→C substitutions in the control sequence; and/or [0069]
the number and/or rate of T→C substitutions in the sequence
reads mapping against the at least one microbial nucleic acid
sequence hit in the first set of sequence reads is greater than the
number and/or rate of T→A and/or T→G substitutions in
the same sequence reads.
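The two criteria above can be sketched as follows, assuming ungapped reads already placed on the hit (as in a pileup) and counting, at every reference T, which base each read shows. This is a hypothetical illustration: the function names, the `(start, sequence)` read representation, the use of a second set of reads as the control, the use of rates rather than raw counts, and the reading of "T→A and/or T→G" as "greater than each of them" are all assumptions, and no statistical test is applied.

```python
from collections import Counter


def t_substitutions(hit: str, reads: list[tuple[int, str]]) -> tuple[Counter, int]:
    """Count T->X substitutions over all reference T positions covered by the reads.

    Returns (Counter keyed like 'T>C', 'T>A', 'T>G'; number of reference T bases examined).
    """
    subs: Counter = Counter()
    t_examined = 0
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(hit) and hit[pos] == "T":
                t_examined += 1
                if base != "T":
                    subs[f"T>{base}"] += 1
    return subs, t_examined


def live_by_t_to_c(hit, labelled_reads, control_reads) -> dict[str, bool]:
    """Apply both criteria: T->C rate vs the control, and T->C vs T->A / T->G in the same reads."""
    subs, n_t = t_substitutions(hit, labelled_reads)
    ctrl_subs, ctrl_n_t = t_substitutions(hit, control_reads)

    def rate(counts, key, n):
        return counts[key] / n if n else 0.0

    tc = rate(subs, "T>C", n_t)
    return {
        "vs_control": tc > rate(ctrl_subs, "T>C", ctrl_n_t),
        "vs_other_T_subs": tc > max(rate(subs, "T>A", n_t), rate(subs, "T>G", n_t)),
    }


if __name__ == "__main__":
    hit = "ATTGTTCA"
    labelled = [(0, "ACTGCTCA"), (0, "ATTGTTCA")]  # two T->C conversions in the first read
    control = [(0, "ATTGTTCA"), (0, "ATAGTTCA")]   # one background T->A substitution
    print(live_by_t_to_c(hit, labelled, control))
```

The second criterion needs no separate control library: if labelling drives the signal, T→C should stand out against the other T substitutions, which reflect background sequencing or alignment error.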
[0070] In one embodiment, the at least one viral nucleic acid
sequence hit belongs to an infectious virus if the number of
T→C substitutions in the sequence reads mapping the at least
one identified viral nucleic acid sequence hit in the first set of
sequence reads is greater than in the second set of sequence
reads.
[0071] In one embodiment, the in vitro method according to the
present invention comprises the steps of: [0072] (1) (i) sequencing
unlabelled total RNAs extracted from the cell sample, wherein
unlabelled total RNAs are obtained by culturing the cell sample in
absence of an RNA-labelling agent, thereby obtaining a plurality of
sequence reads, [0073] (ii) identifying at least one viral nucleic
acid sequence hit mapped against the sequence reads, and [0074]
(iii) determining the number of substituted nucleotides in the
sequence reads mapping said identified at least one viral nucleic
acid sequence hit; and [0075] (2) (i) sequencing labelled total
RNAs extracted from the cell sample, wherein labelled total RNAs
are obtained by culturing the cell sample in presence of a
labelling agent, thereby obtaining a plurality of sequence reads,
[0076] (ii) determining the number of substituted nucleotides in
the sequence reads mapping said identified at least one viral
nucleic acid sequence hit, [0077] (3) comparing the number of
substituted nucleotides determined in (1)(iii) and (2)(ii), and
[0078] (4) concluding that the viral nucleic acid sequence hit
belongs to an infectious virus if the number of substituted
nucleotides determined in (2)(ii) is greater than the number of
substituted nucleotides determined in (1)(iii).
[0079] In one embodiment, the microbe is selected from the group
comprising viruses, bacteria, archaea, fungi and protozoans.
[0080] The present invention also relates to an in vitro method for
the diagnosis of a microbial infection in a subject, comprising:
[0081] (a) providing a sample from the subject, [0082] (b)
performing the in vitro method for discriminating between live and
dead microbes on said sample, and [0083] (c) diagnosing the subject
as having a microbial infection if the at least one identified
microbial nucleic acid sequence hit belongs to a live microbe.
[0084] The present invention also relates to an in vitro method for
the diagnosis of a viral infection in a subject, comprising: [0085]
(a) providing a cell sample from the subject, [0086] (b) performing
the in vitro method for discriminating between infectious and
non-infectious viral nucleic acid sequences in a cell sample
according to the present invention on said cell sample, and [0087]
(c) diagnosing the subject as having a viral infection if the at
least one identified viral nucleic acid sequence hit belongs to an
infectious virus.
[0088] The present invention also relates to a method of treating a
subject affected with a microbial infection, comprising: [0089] (a)
providing a sample from the subject, [0090] (b) performing the in
vitro method for discriminating between live and dead microbes on
said sample, [0091] (c) diagnosing the subject as having a
microbial infection if the at least one identified microbial
nucleic acid sequence hit belongs to a live microbe, and [0092] (d)
treating the subject if said subject was diagnosed as having a
microbial infection in step c).
[0093] The present invention also relates to a method for assessing
the risk of microbial contamination in a sample, comprising: [0094]
(a) providing a sample, [0095] (b) performing the in vitro method
for discriminating between live and dead microbes on said sample,
and [0096] (c) concluding that the sample is at risk of being
contaminated if the at least one identified microbial nucleic acid
sequence hit belongs to a live microbe.
Definitions
[0097] In the present invention, the following terms have the
following meanings:
[0098] "About" or "approximately", as used herein, can mean within
an acceptable error range for the particular value as determined by
a person skilled in the art, which will depend in part on how the
value is measured or determined, i.e., on the limitations of the
measurement system. For example, "about" can mean within 1 or more
than 1 standard deviation, per the practice in the art.
Alternatively, "about" preceding a figure means plus or minus 10% of
the value of said figure. Alternatively, particularly with respect
to biological systems or processes, the term can mean within an
order of magnitude, preferably within 5-fold, and more preferably
within 2-fold, of a value. Where particular values are described in
the application and claims, unless otherwise stated, the term
"about" should be assumed to mean within an acceptable error range
for the particular value.
[0099] "Amplification", as used herein, refers to the process of
producing multiple copies, i.e., at least 2 copies, of a desired
template sequence. Techniques to amplify nucleic acids are well
known to those skilled in the art, and include specific
amplification methods as well as random amplification methods.
[0100] "Biological sample", as used herein, refers to any sample
that is obtained from, obtainable from, or otherwise derived
from a subject. "Biological samples" encompass "solid tissue
samples" and "fluid samples". The term "solid tissue sample" refers
herein to a sample of solid tissue isolated from anywhere in the
body. Tissue samples comprise cells that are not disaggregated, and
which occur in large clusters. Examples of tissue samples include,
but are not limited to, biopsy specimens and autopsy specimens. The
term "fluid sample" refers herein to a sample of fluid isolated
from anywhere in the body. Examples of fluid samples include, but
are not limited to, serum, plasma, whole blood, urine, saliva,
breast milk, tears, sweat, joint fluid, cerebrospinal fluid, lymph
fluid, sputum, mucus, pelvic fluid, synovial fluid, body cavity
washes, eye brushings, skin scrapings, buccal swabs, vaginal swabs,
pap smears, rectal swabs, aspirates, semen, vaginal fluid, ascitic
fluid and amniotic fluid. In a preferred embodiment, the
"biological sample" is a cell sample, i.e., any biological sample
as described herein, comprising at least one cell.
[0101] "cDNA library", as used herein, refers to a library composed
of complementary DNAs which are reverse-transcribed from mRNAs.
[0102] "Contig", as used herein, refers to overlapping sequence
reads. Typically, a contig is a continuous nucleic acid sequence
resulting from the reassembly of the small DNA fragments (sequence
reads) generated by next-generation sequencing. Practically,
assembly software will search for pairs of overlapping sequence
reads. Optionally, the assembly software can access nucleic acid or
amino acid databases to "align and check", thereby validating the
sequence read assembly. Assembling the sequences from pairs of
overlapping sequence reads produces a longer contiguous read
(contig) of sequenced DNA. By repeating this process multiple
times, at first with the initial short pairs of sequence reads,
then using increasingly longer pairs that are the result of
previous assembly, longer contigs can be determined.
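By way of non-limiting illustration, the iterative merging of overlapping sequence reads described above can be sketched as a toy greedy overlap assembler. The functions `overlap` and `greedy_assemble` and the default minimum overlap of 3 bases are hypothetical choices for this example. The sketch assumes error-free reads from a single strand, leaves reads that are fully contained in another read unmerged, and does not perform the optional database "align and check" validation; production assemblers rely on more elaborate overlap-layout-consensus or de Bruijn graph approaches.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`
    (at least `min_len` bases long), or 0 if there is none."""
    start = 0
    while True:
        # Candidate start positions: occurrences of b's first min_len bases in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences sharing the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    while True:
        best_len, best_pair = 0, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_pair = olen, (i, j)
        if best_pair is None:  # no overlaps left: what remains are the contigs
            return contigs
        i, j = best_pair
        merged = contigs[i] + contigs[j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCGGA"]))  # prints ['ATGGCGTACCGGA']
```

Each pass merges the longest overlapping pair first, so longer contigs produced by earlier passes are reused in later passes, mirroring the repetition described in the definition.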
[0103] "Deep sequencing", as used herein, refers to nucleic acid
sequencing to a depth that allows each base to be read multiple
times from independent nucleic acid molecules (e.g., a large number
of template molecules is sequenced relative to the length of the
sequence) and allows sequencing of thousands of molecules
simultaneously, thereby allowing the characterization of complex pools of
nucleic acid molecules and increasing sequencing accuracy. Deep
sequencing of the transcriptome, also known as RNA-Seq, provides
both the sequence and frequency of contained RNA molecule species
that are present at any particular time in a specific sample.
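As a rough illustration of sequencing depth, the sketch below computes the mean number of times each base of a target is read and, under the idealised Lander-Waterman (Poisson) assumption that reads start uniformly at random, the expected fraction of bases read at least k times. The function names and the read numbers are hypothetical. For RNA-Seq the Poisson model is only a first approximation, since the coverage of each transcript scales with its abundance.

```python
import math


def mean_depth(n_reads: int, read_len: int, target_len: int) -> float:
    """Average number of times each base of the target is read."""
    return n_reads * read_len / target_len


def fraction_covered_at_least(k: int, depth: float) -> float:
    """Expected fraction of bases read at least k times, assuming
    uniformly random read start positions (Poisson model)."""
    term = math.exp(-depth)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term                # add P(X = i)
        term *= depth / (i + 1)    # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)


depth = mean_depth(n_reads=2_000_000, read_len=150, target_len=30_000_000)
print(depth)  # 10.0
print(fraction_covered_at_least(5, depth))
```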
[0104] "Expected value" or "e-value", as used herein, refers to a
parameter that describes the number of sequence hits one can expect
to see "by chance" when aligning sequence reads or contigs on a
database of a particular size. The e-value decreases exponentially
as the score of the match increases. Essentially, the e-value
describes the random background noise. For example, an e-value of 1
assigned to a hit can be interpreted as meaning that, in a database
of the current size, one might expect to see 1 match with a similar
score simply by chance. The lower the e-value, or the closer it is
to zero, the more "significant" the match is.
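The exponential dependence of the e-value on the score can be illustrated with the Karlin-Altschul formula used by BLAST-type tools, E = K * m * n * exp(-lambda * S), where S is the raw alignment score, m and n are the query and database lengths, and lambda and K are statistical parameters of the scoring system. The sketch below assumes placeholder values for lambda and K (they are not taken from any real scoring matrix) and omits the effective-length corrections applied by actual search programs.

```python
import math


def evalue(raw_score: float, m: int, n: int, lam: float, k: float) -> float:
    """Expected number of chance hits: E = K * m * n * exp(-lambda * S)."""
    return k * m * n * math.exp(-lam * raw_score)


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalised score S' = (lambda * S - ln K) / ln 2, so that E = m * n * 2**(-S')."""
    return (lam * raw_score - math.log(k)) / math.log(2)


# Placeholder statistical parameters, for illustration only.
LAM, K = 0.3, 0.1
m, n = 150, 3_000_000_000  # read length, database size in residues

for s in (60, 80, 100):
    # Each extra score unit multiplies E by exp(-LAM): the e-value falls exponentially.
    print(s, evalue(s, m, n, LAM, K))
```

With these placeholder numbers a hit scoring 60 would be expected hundreds of times by chance, whereas a hit scoring 100 would have an e-value well below 1.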
[0105] "Live microbe", as used herein, refers to any microbe which
is transcriptionally active, i.e., which is able to synthesize RNAs,
either by itself (such as, for example, in the case of bacteria,
archaea, fungi or protozoans) or after infecting a host cell (such
as, for example, in the case of viruses). Live microbes include
latent microbes, i.e., dormant microbes which can reactivate. It is
to be noted that latent microbes, although dormant, exhibit a basal
transcriptional activity. By contrast, a "dead microbe" refers to a
microbe which is not transcriptionally active, i.e., for which no
transcribed gene is detectable. In the context of the present
invention, the method aims at distinguishing between live microbes
and inert microbial nucleic acid sequences, either free in the
sample or contained inside a so-called dead microbe.
[0106] "Lysate", as used herein, refers to a liquid or solid
collection of materials following a lysis procedure.
[0107] "Lysis" (noun) or "lyse" (verb), as used herein, refer to
the disruption of (or the action of disrupting) a biological sample
in order to gain access to materials that are otherwise
inaccessible. When the biological sample is a cell, lysis refers to
breaking the cellular membrane of the cell, causing the cellular
contents to spill out. Lysis methods are well known to one
skilled in the art, and include, but are not limited to,
proteolytic lysis, chemical lysis, thermal lysis, mechanical lysis
and osmotic lysis.
[0108] "Nucleic acid sequence primer" or "primer", as used herein,
refer to an oligonucleotide that is capable of hybridizing or
annealing with a nucleic acid sequence and serving as an initiation
site for nucleotide polymerization under appropriate conditions,
such as the presence of nucleoside triphosphates and an enzyme for
polymerization, such as DNA or RNA polymerase or reverse
transcriptase, in an appropriate buffer and at a suitable
temperature.
[0109] "Oligonucleotide", as used herein, refers to a polymer of
nucleotides, generally to a single-stranded polymer of nucleotides.
In some embodiments, the oligonucleotide comprises from 2 to 500
nucleotides, preferably from 10 to 150 nucleotides, preferably from
20 to 100 nucleotides. Oligonucleotides may be synthetic or may be
made enzymatically. In some embodiments, oligonucleotides may
comprise ribonucleotide monomers, deoxyribonucleotide monomers, or
a mix of both.
[0110] "Microbe" or "microorganism", as used herein, refer to an
organism, such as, without limitation, a virus, a bacterium, an
archaeon, a fungus or a protozoan, likely capable of infecting or
contaminating a sample; and/or of generating, transmitting or
carrying a disease in a subject.
[0111] "Polymerase chain reaction" or "PCR", as used herein,
encompass methods including, but not limited to, allele-specific
PCR, asymmetric PCR, hot-start PCR, intersequence-specific PCR,
methylation-specific PCR, miniprimer PCR, multiplex
ligation-dependent probe amplification, multiplex-PCR, nested PCR,
quantitative PCR, reverse transcription PCR and/or touchdown PCR.
DNA polymerase enzymes suitable to amplify nucleic acids comprise,
but are not limited to, Taq polymerase Stoffel fragment, Taq
polymerase, Advantage DNA polymerase, AmpliTaq, AmpliTaq Gold,
Titanium Taq polymerase, KlenTaq DNA polymerase, Platinum Taq
polymerase, Accuprime Taq polymerase, Pfu polymerase, Pfu
polymerase turbo, Vent polymerase, Vent exo-polymerase, Pwo
polymerase, 9°Nm DNA polymerase, Therminator, Pfx DNA polymerase,
Expand DNA polymerase, rTth DNA polymerase, DyNAzyme-EXT
Polymerase, Klenow fragment, DNA polymerase I, T7 polymerase,
Sequenase™, Tfi polymerase, T4 DNA polymerase, Bst polymerase,
Bca polymerase, BSU polymerase, phi-29 DNA polymerase and DNA
polymerase Beta or modified versions thereof. In one embodiment,
the DNA polymerase has a 3'-5' proofreading, i.e., exonuclease,
activity. In one embodiment, the DNA polymerase has a 5'-3'
proofreading, i.e., exonuclease, activity. In one embodiment, the
DNA polymerase has strand displacement activity, i.e., the DNA
polymerase causes the dissociation of a paired nucleic acid from
its complementary strand in a direction from 5' towards 3', in
conjunction with, and close to, the template-dependent nucleic acid
synthesis. DNA polymerases such as E. coli DNA polymerase I, Klenow
fragment of DNA polymerase I, T7 or T5 bacteriophage DNA
polymerase, and HIV reverse transcriptase are enzymes which
possess both the polymerase activity and the strand displacement
activity. Agents such as helicases can be used in conjunction with
inducing agents which do not possess strand displacement activity
in order to produce the strand displacement effect, that is to say
the displacement of a nucleic acid coupled to the synthesis of a
nucleic acid of the same sequence. Likewise, proteins such as Rec A
or Single Strand Binding Protein from E. coli or from another
organism could be used to produce or to promote the strand
displacement, in conjunction with other inducing agents (Kornberg
& Baker (1992). Chapters 4-6. In DNA replication (2nd ed., pp.
113-225). New York: W.H. Freeman).
[0112] "Random amplification techniques", as used herein, means
amplification of any nucleic acid present in a biological sample,
independently of its sequence. This includes without limitation,
multiple displacement amplification (MDA), random PCR, random
amplification of polymorphic DNA (RAPD) or multiple annealing and
looping based amplification cycles (MALBAC).
[0113] "Transcriptionally-active microbial nucleic acid sequence",
as used herein, refers to a nucleic acid sequence belonging to a
live microbe, i.e., a microbe expressing microbial genes, even if
the microbe is latent. By contrast, "inert microbial nucleic acid
sequence", as used herein, refers to a nucleic acid sequence
belonging to an inactive microbe, i.e., a dead microbe. The term
"inert microbial nucleic acid sequence" further refers to free
nucleic acid sequences, i.e., outside of a microbe, whether intact
or degraded/fragmented, but in any case, not active.
[0114] "Transcriptionally-active viral nucleic acid sequence", as
used herein, refers to a nucleic acid sequence belonging to an
active virus, i.e., a live virus expressing viral genes, even if
the virus cycle is abortive, i.e., does not lead to the formation
of virus particles (such as in the case, e.g., of latent viruses).
By contrast, "inert viral nucleic acid sequence", as used herein,
refers to a nucleic acid sequence belonging to an inactive virus,
i.e., a dead virus or nucleic acids not associated with virus
particles.
[0115] "Reverse transcription", as used herein, refers to the
replication of RNA using an RNA-directed DNA polymerase (reverse
transcriptase, abbreviated "RT") to produce complementary strands
of DNA ("cDNA"). The reverse-transcription of RNAs may be carried
out by techniques well known to one skilled in the art, using a
reverse transcriptase enzyme and a mix of the 4 deoxyribonucleoside
triphosphates (dNTPs), namely deoxyadenosine triphosphate (dATP),
deoxycytidine triphosphate (dCTP), deoxyguanosine triphosphate
(dGTP) and (deoxy)thymidine triphosphate (dTTP). In some
embodiments, the reverse-transcription of RNAs comprises a first
step of first-strand cDNA synthesis. Methods for first-strand cDNA
synthesis are well known to one skilled in the art.
First-strand cDNA synthesis reactions can use a combination of
sequence-specific primers, oligo(dT) primers or random primers.
Examples of reverse transcriptase enzymes include, but are not
limited to, M-MLV reverse transcriptase, SuperScript II
(Invitrogen), SuperScript III (Invitrogen), SuperScript IV
(Invitrogen), Maxima (ThermoFisher Scientific), ProtoScript II (New
England Biolabs) and PrimeScript (Clontech).
[0116] "Sequence read", as used herein, refers to a sequence or
data representing a sequence of nucleotide bases, in other words,
the order of monomers in a nucleic acid sequence, which is
determined by a sequencer.
[0117] "Sequencer" or "sequenator", as used herein, refer to
an apparatus used for determining the order of constituents in a
biological polymer, such as a nucleic acid or a protein.
Preferably, sequencers, in the sense of the present invention,
refer to next-generation sequencers. A "next-generation sequencer"
can include a number of different sequencers based on different
technologies, such as Illumina sequencing, Roche 454 sequencing,
Ion Torrent sequencing, SOLiD sequencing and the like.
[0118] "Subject", as used herein, refers to a mammal, preferably a
human. In one embodiment, the subject is a pet, including, without
limitation, a dog, a cat, a guinea pig, a hamster, a rat, a mouse,
a ferret, a rabbit, a bird or an amphibian. In one embodiment, a
subject may be a "patient", i.e., a female or a male, an adult or a
child, who/which is awaiting the receipt of, or is receiving
medical care or was/is/will be the object of a medical procedure,
or is monitored for the development of a disease, disorder or
condition, in particular a viral, bacterial, archaeal, fungal or
protozoan infection.
[0119] "Template" or "template sequence", as used herein, refer to
a nucleic acid sequence for which amplification is desired. A
template can comprise DNA or RNA. In one embodiment, the template
sequence is known. In one embodiment, the template sequence is not
known.
[0120] The terminology used herein is for the purpose of describing
particular cases only and is not intended to be limiting. As used
herein, the singular forms "a", "an" and "the" are intended to
include the plural forms as well, unless the context clearly
indicates otherwise. Furthermore, to the extent that the terms
"including", "includes", "having", "has", "with", or variants
thereof are used in either the detailed description and/or the
claims, such terms are intended to be inclusive in a manner similar
to the term "comprising".
DETAILED DESCRIPTION
[0121] The present invention relates to a method for discriminating
between live and dead microbes in a sample, preferably a cell
sample. In particular, the method according to the present
invention is based on the discrimination between
transcriptionally-active and inert microbial nucleic acid sequences
in a sample, preferably a cell sample.
[0122] The method of the present invention is particularly useful
for distinguishing between (1) dead microbes--such as viruses,
bacteria, archaea, fungi or protozoans--and inert microbial
sequences; and (2) active (or transcriptionally-active) and latent
microbes--such as viruses, bacteria, archaea, fungi or
protozoans.
[0123] It is therefore to be understood that the present method is
readily applicable to the detection of any sort of microbe, and the
discrimination between live and dead microbes.
[0124] In one embodiment, the microbe is selected from viruses,
bacteria, archaea, fungi and protozoans.
[0125] In one embodiment, the microbe is a virus.
[0126] Viruses are small infectious agents that replicate only inside
living cells, and that can infect all types of life forms.
[0127] The Baltimore classification of viruses is based on the
mechanism of mRNA production. Viruses must generate mRNAs from
their genomes to produce proteins and replicate themselves, but
different mechanisms are used to achieve this in each virus family.
Viral genomes may be single-stranded (ss) or double-stranded (ds),
RNA or DNA, and may or may not use reverse-transcriptase. In
addition, ssRNA viruses may be either sense (+) or antisense (-).
This classification places viruses into seven groups: [0128] I.
dsDNA viruses (such as, e.g., adenoviruses, herpesviruses or
poxviruses), [0129] II. (+)ssDNA viruses (such as, e.g.,
anelloviridae, bidnaviridae, circoviridae, geminiviridae,
genomoviridae, inoviridae, microviridae, nanoviridae, parvoviridae,
smacoviridae or spiraviridae), [0130] III. dsRNA viruses (such as,
e.g., reoviruses), [0131] IV. (+)ssRNA viruses (such as, e.g.,
picornaviruses or togaviruses), [0132] V. (-)ssRNA viruses (such
as, e.g., orthomyxoviruses, rhabdoviruses), [0133] VI. (+)ssRNA-RT
viruses with DNA intermediate in life-cycle (such as, e.g.,
retroviruses), [0134] VII. dsDNA-RT viruses with RNA
intermediate in life-cycle (such as, e.g., hepadnaviruses).
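The seven groups listed above can be encoded as a simple lookup keyed on genome nucleic acid, strandedness, sense and use of reverse transcription. The helper below is hypothetical and mirrors the list exactly as written here (in particular, group II is keyed only on (+)ssDNA, as above); it is not an ICTV tool.

```python
from typing import Optional

# (nucleic acid, strandedness, sense, reverse-transcribing) -> Baltimore group.
# Sense is only meaningful for single-stranded genomes, hence None for ds entries.
_GROUPS = {
    ("DNA", "ds", None, False): "I",
    ("DNA", "ss", "+", False): "II",
    ("RNA", "ds", None, False): "III",
    ("RNA", "ss", "+", False): "IV",
    ("RNA", "ss", "-", False): "V",
    ("RNA", "ss", "+", True): "VI",
    ("DNA", "ds", None, True): "VII",
}


def baltimore_group(nucleic_acid: str, strandedness: str,
                    sense: Optional[str] = None,
                    reverse_transcribing: bool = False) -> str:
    """Return the Baltimore group (I to VII) for a viral genome description."""
    key = (nucleic_acid, strandedness,
           sense if strandedness == "ss" else None, reverse_transcribing)
    try:
        return _GROUPS[key]
    except KeyError:
        raise ValueError(f"no Baltimore group for {key}") from None


print(baltimore_group("RNA", "ss", "+", reverse_transcribing=True))  # VI (retroviruses)
```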
[0135] In one embodiment, the method according to the present
invention is for discriminating samples, preferably cell samples,
containing transcriptionally-active and inert viral nucleic acid
sequences belonging to viruses selected from the group comprising
or consisting of dsDNA viruses, (+)ssDNA viruses, dsRNA viruses,
(+)ssRNA viruses, (-)ssRNA viruses, (+)ssRNA-RT viruses and
dsDNA-RT viruses.
[0136] In one embodiment, the method according to the present
invention is for discriminating samples, preferably cell samples,
containing transcriptionally-active and inert viral nucleic acid
sequences belonging to viruses selected from those disclosed in the
International Committee on Taxonomy of Viruses (ICTV) database,
preferably in the ICTV Master Species List 2018b.v2 of May 31, 2019
(MSL #34), which is herein incorporated by reference in its
entirety.
[0137] In one embodiment, the microbe is a bacterium.
[0138] In one embodiment, the method according to the present
invention is for discriminating samples, preferably cell samples,
containing transcriptionally-active and inert bacterial nucleic
acid sequences belonging to bacteria.
[0139] Examples of bacteria include, but are not limited to,
bacteria belonging to the Acidobacteria, Actinobacteria, Aquificae,
Bacteroidetes, Chlamydiae, Chlorobi, Chloroflexi, Chrysiogenetes,
Cyanobacteria, Deferribacteres, Deinococcus-Thermus, Dictyoglomi,
Fibrobacteres, Firmicutes, Fusobacteria, Gemmatimonadetes,
Nitrospirae, Planctomycetes, Proteobacteria, Spirochaetes,
Thermodesulfobacteria, Thermomicrobia, Thermotogae and
Verrucomicrobia phyla; including subtaxons thereof.
[0140] In a preferred embodiment, the bacterium is a Firmicute,
preferably of the Bacilli class, more preferably of the Mollicute
subclass, even more preferably of the Mycoplasma genus.
[0141] In one embodiment, the microbe is an archaeon.
[0142] In one embodiment, the method according to the present
invention is for discriminating samples, preferably cell samples,
containing transcriptionally-active and inert archaeal nucleic acid
sequences belonging to archaea.
[0143] Examples of archaea include, but are not limited to, archaea
belonging to the Aenigmarchaeota, Aigarchaeota, Altiarchaeia,
Archaeoglobi, Asgardarchaeota, Bathyarchaeota, Crenarchaeota,
Diapherotrites, Geoarchaeota, Halobacteria, Korarchaeota,
Methanobacteria, Methanococci, Methanomicrobia, Methanopyri,
Nanoarchaeota, Nanohaloarchaea, Parvarchaeota, Thalassoarchaeia,
Thaumarchaeota, Thermococci, Thermoplasmata and Woesearchaeota
phyla; including subtaxons thereof.
[0144] In one embodiment, the microbe is a fungus.
[0145] Fungi are eukaryotic organisms, including yeasts and molds,
characterized in that they comprise chitin in their cell walls.
[0146] In one embodiment, the method according to the present
invention is for discriminating samples, preferably cell samples,
containing transcriptionally-active and inert fungal nucleic acid
sequences belonging to fungi.
[0147] Examples of fungi include, but are not limited to, fungi
belonging to the Ascomycota, Basidiomycota, Entorrhizomycota,
Glomeromycota, Mucoromycota, Calcarisporiellomycota,
Mortierellomycota, Kickxellomycota, Entomophthoromycota,
Olpidiomycota, Basidiobolomycota, Neocallimastigomycota,
Chytridiomycota and Blastocladiomycota phyla; including subtaxons
thereof.
[0148] In one embodiment, the microbe is a protozoan.
[0149] In one embodiment, the method according to the present
invention is for discriminating samples, preferably cell samples,
containing transcriptionally-active and inert protozoan nucleic
acid sequences belonging to protozoans.
[0150] Examples of protozoans include, but are not limited to,
protozoans belonging to the Euglenozoa, Amoebozoa, Choanozoa,
Microsporidia and Sulcozoa phyla; including subtaxons thereof.
[0151] In one embodiment, the sample is a biological sample.
Examples of suitable biological samples include, but are not
limited to, solid tissue samples and fluid samples.
[0152] In one embodiment, the biological sample is/was obtained
through sampling by minimally invasive or non-invasive
approaches.
[0153] In one embodiment, the biological sample was previously
obtained from the subject, i.e., the methods according to the
present invention are in vitro methods.
[0154] In one embodiment, the biological sample is a cell sample.
By "cell sample", it is referred to any biological sample as
described herein, comprising at least one cell.
[0155] In one embodiment, the biological sample is cultured.
Therefore, encompassed under the term "biological sample" are cell
or tissue cultures, preferably in vitro cell or tissue cultures,
such as, e.g., a culture of cells or tissues isolated from a
cytology sample, a tissue sample or a biological fluid sample.
[0156] In one embodiment, the method according to the present
invention comprises an initial step of culturing the sample,
preferably the cell sample, preferably culturing in vitro the cell
sample.
[0157] The culture of cell samples, in particular the culture of
cells or tissues isolated from a cytology sample, a tissue sample
or a biological fluid sample, is well known to one skilled in
the art.
[0158] In one embodiment, the cell sample is seeded at a density
that allows exponential growth. In one embodiment, the biological
sample is seeded at about 50% to about 80% confluency.
[0159] The initial step of culturing the sample is required (1) to
allow the potential microbe (such as the virus, bacterium,
archaeon, fungus or protozoan) in the sample to transcribe RNAs
(which is the key biological process used in the present method to
discriminate between live and dead microbes) and (2) for metabolic
labelling. In the case where the microbe to be detected is not a
self-replicating microbe (such as, e.g., a virus, or some bacteria
such as Mycoplasma), the sample shall be a cell sample to allow the
potential microbe to infect said cells and replicate. In the case
where the microbe is a self-replicating microbe (i.e., the microbe
comprises or is itself a cell, such as, typically, a bacterium, an
archaeon, a fungus or a protozoan), it is not compulsory that the
sample be a cell sample.
[0160] In one embodiment, the sample is not a biological sample. In
this case, the sample may be, e.g., an environmental sample such as
water, soil, air, and the like. Other examples of non-biological
samples include food samples and preservation media.
[0161] In one embodiment, the method for discriminating between
live and dead microbes--preferably virus, bacterium, archaeon,
fungus or protozoan--in a sample, preferably a cell sample,
comprising discriminating between transcriptionally-active and
inert microbial--preferably viral, bacterial, archaeal, fungal or
protozoan--nucleic acid sequences in the sample, preferably the
cell sample, comprises the steps of: [0162] (a) sequencing a first
set of RNAs extracted from the sample, preferably the cell sample,
wherein the first set of RNAs is obtained by culturing the sample,
preferably the cell sample, in presence of an RNA-labelling agent
and further by submitting the extracted RNAs to conditions allowing
for nucleotide substitution; thereby obtaining a first set of
sequence reads; [0163] (b) comparing the number of substituted
nucleotides in the first set of sequence reads mapping against at
least one microbial--preferably viral, bacterial, archaeal, fungal
or protozoan--nucleic acid sequence hit with a control sequence;
and [0164] (c) concluding that the at least one
microbial--preferably viral, bacterial, archaeal, fungal or
protozoan--nucleic acid sequence hit belongs to a live
microbe--preferably virus, bacterium, archaeon, fungus or
protozoan--if the number of substituted nucleotides in the sequence
reads mapping against said at least one microbial--preferably
viral, bacterial, archaeal, fungal or protozoan--nucleic acid
sequence hit in the first set of sequence reads is greater than the
number of nucleotides randomly substituted in the control
sequence.
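A minimal sketch of steps (b) and (c) is given below, under several assumptions not stated in the text: reads are already aligned without gaps to the hit and given on the transcript (sense) strand, so that a labelled uridine appears as a T-to-C change; the control is itself a set of alignments, for instance reads on the same hit from a culture grown without the RNA-labelling agent; and "greater than" is judged with a one-sided binomial test at a hypothetical threshold alpha = 0.01. The function names are illustrative only. Reads from the opposite strand (A-to-G changes), sequencing errors and genuine sequence variants are not handled.

```python
import math


def count_t_to_c(alignments):
    """Count reference T sites and T->C mismatches over ungapped alignments.

    `alignments` is an iterable of (reference_segment, read) pairs of equal
    length, both written on the transcript (sense) strand.
    """
    t_sites = t_to_c = 0
    for ref, read in alignments:
        if len(ref) != len(read):
            raise ValueError("expected ungapped alignments of equal length")
        for r, q in zip(ref.upper(), read.upper()):
            if r == "T":
                t_sites += 1
                if q == "C":
                    t_to_c += 1
    return t_to_c, t_sites


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if k <= 0:
        return 1.0
    if k > n or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def looks_live(hit_alignments, control_alignments, alpha=0.01):
    """Return (is_live, p_value): is the T->C rate on the hit above the control rate?"""
    hit_tc, hit_t = count_t_to_c(hit_alignments)
    ctl_tc, ctl_t = count_t_to_c(control_alignments)
    if hit_t == 0 or ctl_t == 0:
        return False, 1.0  # nothing to compare
    background = ctl_tc / ctl_t  # substitution rate expected by chance
    p_value = binom_sf(hit_tc, hit_t, background)
    return hit_tc / hit_t > background and p_value < alpha, p_value


hit = [("TTGATTCATT", "TCGATCCATT"), ("ATTTGCTTAA", "ATCTGCTCAA")]
control = [("T" * 100, "T" * 50 + "C" + "T" * 49)]
print(count_t_to_c(hit))  # (4, 11)
print(looks_live(hit, control))
```

For large numbers of T sites, an established implementation such as `scipy.stats.binomtest(k, n, p, alternative="greater")` may be preferred to the explicit summation.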
[0165] In one embodiment, the method of the invention is carried
out in conditions causing a reverse-transcriptase enzyme to make
errors (i.e., to incorporate mispaired nucleotides) that can be
detected and compared, by reference to a consensus standard method
of reverse-transcription.
[0166] Such conditions include the presence, in the RNAs to be
reverse-transcribed, of labels such as thiol labels, and/or of
nucleotide modifications by nucleotide substitution techniques.
Exemplary conditions are further detailed in the following.
[0167] As used herein, the term "mispaired nucleotide" refers to a
nucleotide that is incorporated in a non-Watson-Crick base pairing
fashion.
[0168] In one embodiment, the error rate of the
reverse-transcriptase enzyme is not linked to the fidelity of the
reverse-transcriptase enzyme.
[0169] As used herein, the term "fidelity" with reference to a
reverse-transcriptase enzyme refers to the sequence accuracy
maintained by the enzyme during synthesis of DNA from RNA. Fidelity
is inversely correlated to the error rate of
reverse-transcription.
[0170] In one embodiment, the method according to the present
invention comprises a step of sequencing a first set of RNAs
extracted from the sample, preferably the cell sample. In one
embodiment, the method according to the present invention comprises
a step of sequencing a first set of total RNAs extracted from the
sample, preferably the cell sample. In one embodiment, the method
according to the present invention comprises a step of sequencing a
first set of total messenger RNAs (mRNAs) extracted from the
sample, preferably the cell sample.
[0171] In one embodiment, the step of sequencing a first set of
RNAs extracted from the sample, preferably the cell sample,
comprises one or more or all of the sub-steps of labelling RNAs,
lysing the cells, extracting RNAs, substituting nucleotides in
labelled RNAs, generating a cDNA library, amplifying the cDNA
library and sequencing the cDNA library.
[0172] Labelling RNA is typically carried out in culture, during in
vitro transcription, by addition in the culture medium of a label
to be incorporated in RNA transcripts, thereby obtaining labelled
RNAs. Alternatively or additionally, labelling RNA can also be
carried out in culture without addition of a label to be
incorporated in RNA transcripts, in the case where the sample from
which the RNAs are extracted already comprises such a label, as will
be further detailed below.
[0173] By "RNA transcript", it is meant any neosynthesized RNA
molecule.
[0174] The labelling of RNA transcripts can be carried out by
techniques well known to one skilled in the art. Such techniques
include, but are not limited to, those described in Schulz &
Rentmeister (2014. Chembiochem. 15(16):2342-7), Huang & Yu
(2013. Curr Protoc Mol Biol. Chapter 4: Unit 4.15) and Liu et al.
(2016. Bioessays. 38(2):192-200).
[0175] Preferably, metabolic labelling of RNAs alters Watson-Crick
base pairing and causes reverse-transcription of labelled RNAs to
substitute nucleotides, i.e., to pair a labelled nucleotide with a
non-Watson-Crick nucleotide. For example, a labelled uridine may be
paired with a guanosine instead of an adenosine during first-strand
synthesis. Consequently, a cytosine shall be incorporated during
second-strand synthesis, ultimately leading to a thymidine (T) to
cytosine (C) substitution with respect to the initial nucleic acid
sequence.
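The T-to-C substitution described above can be traced with the toy model below, which writes out the first and second cDNA strands for an RNA in which chosen uridines are labelled. It assumes, for simplicity, that every labelled uridine is mispaired with G and that every other base pairs normally; in practice only a fraction of labelled positions give rise to a substitution, and the chemistry that induces the mispairing is not modelled. The functions and the example sequence are hypothetical.

```python
DNA_PAIR = {"A": "T", "C": "G", "G": "C", "T": "A"}
RNA_TO_CDNA = {"A": "T", "C": "G", "G": "C", "U": "A"}


def first_strand(rna: str, labelled: set[int]) -> str:
    """First-strand cDNA (5'->3') for an RNA written 5'->3'.

    Positions in `labelled` (0-based, on the RNA) are labelled uridines that
    the reverse transcriptase pairs with G instead of A.
    """
    bases = []
    for i, nt in enumerate(rna):
        if i in labelled:
            if nt != "U":
                raise ValueError(f"position {i} is {nt!r}; only U is labelled here")
            bases.append("G")  # non-Watson-Crick pairing of the labelled U
        else:
            bases.append(RNA_TO_CDNA[nt])
    return "".join(reversed(bases))  # antiparallel: report the new strand 5'->3'


def second_strand(cdna: str) -> str:
    """Complementary DNA strand, i.e. the sense orientation seen in the reads."""
    return "".join(DNA_PAIR[nt] for nt in reversed(cdna))


rna = "AUGUCUUA"
reference = rna.replace("U", "T")
read = second_strand(first_strand(rna, labelled={3, 6}))
print(reference)  # ATGTCTTA
print(read)       # ATGCCTCA
```

Comparing the two outputs position by position shows a T in the reference and a C in the read exactly at the labelled positions 3 and 6.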
[0176] In one embodiment, the metabolic labelling of RNA
transcripts is carried out by thiol-labelling. Thiol-labelling is a
technique well known in the art, which comprises the incorporation of
thiol-labelled RNA precursors into newly synthesized RNAs. Such
techniques include, but are not limited to, those described in Cleary
et al. (2005. Nat Biotechnol. 23(2):232-7), Miller et al. (2009.
Nat Methods. 6(6):439-41), Garibaldi et al. (2017. Methods Mol
Biol. 1648:169-176), Russo et al. (2017. Methods. 120:39-48) and
Herzog et al. (2017. Nat Methods. 14(12):1198-1204).
[0177] Examples of suitable thiol-labelled RNA precursors include,
but are not limited to, 4-thiouridine, 2-thiouridine,
2,4-dithiouridine, 2-thio-4-deoxyuridine,
5-carbethoxy-2-thiouridine, 5-carboxy-2-thiouridine,
5-(n-propyl)-2-thiouridine, 6-methyl-2-thiouridine,
6-(n-propyl)-2-thiouridine, 6-thioguanosine, 6-methylthioguanosine,
6-thioinosine and 6-methylthioinosine.
[0178] In one embodiment, the thiol-labelled RNA precursor is a
thiouridine derivative, preferably selected from the group
comprising or consisting of 4-thiouridine (4sU), 2-thiouridine
(2sU), 2,4-dithiouridine (2,4sU), 2-thio-4-deoxyuridine,
5-carbethoxy-2-thiouridine, 5-carboxy-2-thiouridine,
5-(n-propyl)-2-thiouridine, 6-methyl-2-thiouridine and
6-(n-propyl)-2-thiouridine.
[0179] In a preferred embodiment, the thiol-labelled RNA precursor
is 4-thiouridine (sometimes abbreviated as "4sU" or "s4U").
[0180] In one embodiment, the thiol-labelled RNA precursor is
supplied to the sample, preferably the cell sample, from the
culture medium. In one embodiment, the thiol-labelled RNA precursor
is added in the culture medium.
[0181] Thiol-labelled RNA precursors, when added to the culture
medium, can be imported into cells of the sample, preferably the
cell sample, (such as, e.g., cells infected by a virus or microbes
comprising or being themselves a cell, e.g., a bacterium, a fungus
or a protozoan) through specific transporters, named "Equilibrative
Nucleoside Transporters". These transporters are quasi-ubiquitous
in metazoans. In particular, 4-thiouridine can be imported into
cells through the Equilibrative Nucleoside Transporter 1 (ENT1),
encoded in humans by the SLC29A1 gene.
[0182] In one embodiment, fresh thiol-labelled RNA precursor is
added to the culture every 1 hour, 2 hours, 3 hours, 4 hours, 5
hours, 6 hours or more.
[0183] In one embodiment, the sample, preferably the cell sample,
is cultured in a thiol-labelled RNA precursor-containing culture
medium for a period of time ranging from about 2 hours to about 15
hours, preferably from about 4 hours to about 12 hours, preferably
from about 6 hours to about 10 hours.
[0184] In one embodiment, the sample, preferably the cell sample,
is cultured in a thiol-labelled RNA precursor-containing culture
medium for a first period of time and a second period of time,
comprising addition of fresh thiol-labelled RNA precursor between
the first and the second period of time. In one embodiment, the
first period of time ranges from about 1 hour to about 10 hours,
preferably from about 2 hours to about 8 hours, preferably from
about 4 hours to about 6 hours, preferably is about 6 hours. In one
embodiment, the second period of time ranges from about 1 hour to
about 6 hours, preferably from about 2 hours to about 5 hours,
preferably from about 3 hours to about 4 hours, preferably is about
3 hours.
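The two-period labelling scheme can be written out as a timeline. The helper below is hypothetical, and the final "harvest" event is an assumption of the sketch marking the end of labelling. With the preferred periods of about 6 hours and about 3 hours, it adds precursor at 0 h and fresh precursor at 6 h, and ends labelling at 9 h, which falls within the range of about 2 to about 15 hours given above.

```python
def labelling_schedule(periods_h: list[float]) -> list[tuple[float, str]]:
    """(time in hours, action) events for consecutive labelling periods.

    Precursor is added at the start of every period (fresh precursor from
    the second period on); cells are harvested at the end of the last period.
    """
    if not periods_h or any(p <= 0 for p in periods_h):
        raise ValueError("periods must be positive durations")
    events, t = [], 0.0
    for i, period in enumerate(periods_h):
        events.append((t, "add fresh precursor" if i else "add precursor"))
        t += period
    events.append((t, "harvest"))
    return events


for time_h, action in labelling_schedule([6, 3]):
    print(f"t = {time_h:4.1f} h  {action}")
```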
[0185] Preferably, the thiol-labelled RNA precursor is not toxic to
the sample, preferably the cell sample.
[0186] In one embodiment, the thiol-labelled RNA precursor is
supplied to the sample, preferably the cell sample, at a
concentration which does not compromise cell viability. In one
embodiment, a "concentration which does not compromise cell
viability" ranges from about 1 µM to about 2 mM final,
preferably from about 10 µM to about 1.5 mM final, preferably
from about 100 µM to about 1 mM final, preferably from about 250
µM to about 1 mM final, preferably from about 500 µM to about
1 mM final, preferably from about 700 µM to about 900 µM
final, preferably about 800 µM final of thiol-labelled RNA
precursor.
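When the precursor is added from a concentrated stock, the volume to add follows from a simple mass balance, C_stock × V_stock = C_final × (V_medium + V_stock). The sketch below solves this for V_stock; the 100 mM stock and the 10 mL culture volume are hypothetical, and only the 800 µM final concentration comes from the text.

```python
def stock_volume_ul(final_um: float, medium_ml: float, stock_mm: float) -> float:
    """Volume of stock (µL) to add to `medium_ml` of medium to reach `final_um` µM.

    Solves C_stock * V_stock = C_final * (V_medium + V_stock), i.e. it accounts
    for the small volume that the stock itself adds to the culture.
    """
    stock_um = stock_mm * 1000.0
    if not 0 < final_um < stock_um:
        raise ValueError("final concentration must be positive and below the stock")
    medium_ul = medium_ml * 1000.0
    return final_um * medium_ul / (stock_um - final_um)


v = stock_volume_ul(final_um=800, medium_ml=10, stock_mm=100)
print(f"{v:.1f} uL")  # 80.6 uL
```

Neglecting the added volume (V_stock = C_final × V_medium / C_stock) would give 80.0 µL, a difference of under 1% at this dilution.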
[0187] In one embodiment, the thiol-labelled RNA precursor is
supplied to the sample, preferably the cell sample, directly from
the microbe--preferably the virus, bacterium, archaeon, fungus or
protozoan. In one embodiment, the thiol-labelled RNA precursor is
not added in the culture medium.
[0188] Certain microbes are able to catalyze the biosynthesis of
thiol-labelled RNA precursors, using enzymes such as, without
limitation, 4-thiouridine synthetase (ThiI) (Mueller et al., 1998.
Nucleic Acids Res. 26(11):2606-10) or 2-thiouridine synthetase
(MnmA) (Kambampati & Lauhon, 2003. Biochemistry. 42(4):1109-1;
Black & Dos Santos, 2015. J Bacteriol. 197(11):1952-62).
[0189] In a specific embodiment where the sample comprises a
microbe able to catalyze the biosynthesis of thiol-labelled RNA
precursors, it can be advantageous to further supply the
thiol-labelled RNA precursor to the sample, preferably the cell
sample, from the culture medium.
[0190] In this embodiment, the thiol-labelled RNA precursor further
supplied from the culture medium may be the same or may be
different from the thiol-labelled RNA precursor supplied by the
microbe.
[0191] In this embodiment, the thiol-labelled RNA precursor further
supplied from the culture medium may be supplied as described
hereinabove (with regard to, without limitation, addition of fresh
thiol-labelled RNA precursor, concentration, periods of time,
etc.).
[0192] Thiol-labelled RNA precursors and thiol-labelled RNAs are
light-sensitive, and prone to oxidation. Therefore, in one
embodiment, RNA labelling is carried out in the dark, or, at the
very least, with protection from light. In one embodiment, RNA
labelling is carried out in the presence of a reducing agent. Examples
of suitable reducing agents include, but are not limited to,
β-mercaptoethanol, dithiothreitol (DTT),
tris(2-carboxyethyl)phosphine (TCEP), cysteine, N-acetyl cysteine,
cysteamine, 2-mercaptoethanesulfonic acid sodium salt,
dithioerythritol (DTE) and bis(2-mercaptoethyl)sulfone.
[0193] Typically, lysing the cells of the sample, preferably the
cell sample, aims at releasing the cells' contents, in particular
their RNAs. In one embodiment, lysing the cells of the sample is
optional, such as in the case where the RNA content of the cells is
already released in the sample.
[0194] In one embodiment, cells are lysed by chemical lysis,
mechanical lysis, proteolytic lysis, thermal lysis and/or osmotic
lysis. These cell lysis techniques are well known to the person
skilled in the art.
[0195] In one embodiment, cells are lysed in a suitable lysis
solution. Lysis solutions can comprise various components,
including salts, buffers, detergents, reducing agents, protease
inhibitors, nuclease inhibitors, glycerol, sugars and the like. The
person skilled in the art is familiar with lysis solutions and is
readily able to design and/or choose the appropriate lysis solution
depending on the type of cells to lyse.
[0196] In one embodiment, cell lysis is carried out in the presence
of a ribonuclease (RNase) inhibitor. RNases can sometimes be
released from cells during cell lysis, or be co-purified with
isolated RNA, and can thereby compromise downstream applications.
RNase contamination can also be introduced via pipette tips, tubes
and reagents used during the procedure. RNase inhibitors are
commercially available.
[0197] Since thiol-labelled RNAs are light-sensitive and prone to
oxidation, in one embodiment, cell lysis is carried out in the
dark, or, at the very least, with protection from light. In one
embodiment, cell lysis is carried out in the presence of a reducing
agent. Examples of suitable reducing agents include, but are not
limited to, β-mercaptoethanol, dithiothreitol (DTT),
tris(2-carboxyethyl)phosphine (TCEP), cysteine, N-acetyl cysteine,
cysteamine, 2-mercaptoethanesulfonic acid sodium salt,
dithioerythritol (DTE) and bis(2-mercaptoethyl)sulfone.
[0198] Extracting RNAs can be carried out by techniques well known to
the person skilled in the art. Such techniques include, but are not
limited to, chloroform-isoamyl alcohol extraction,
phenol-chloroform extraction, alkaline extraction, guanidinium
thiocyanate-phenol-chloroform extraction, and binding to anion
exchange resins, silica matrices, glass particles, diatomaceous
earth, or magnetic particles made from various synthetic polymers,
biopolymers, porous glass or inorganic magnetic materials.
[0199] Preferably, extraction of RNAs is carried out by
chloroform-isoamyl alcohol extraction, using, e.g.,
chloroform:isoamyl alcohol 24:1.
[0200] In one embodiment, extracted RNAs are further precipitated.
Precipitating RNAs can be carried out by techniques well known to the
person skilled in the art. Such techniques include isopropanol-ethanol
precipitation, the TRIzol method (Chomczynski, 1993. Biotechniques.
15(3):532-4, 536-7) and the Pine Tree method (Chang et al., 1993. Plant
Mol Biol Report. 11(2):113-116).
[0201] Preferably, precipitation of RNAs is carried out by
isopropanol-ethanol precipitation.
[0202] Since thiol-labelled RNAs are light-sensitive and prone to
oxidation, in one embodiment, extraction of RNAs is carried out in
the dark, or, at the very least, with protection from light. In one
embodiment, extraction of RNAs is carried out in the presence of a
reducing agent. Examples of suitable reducing agents include, but
are not limited to, β-mercaptoethanol, dithiothreitol (DTT),
tris(2-carboxyethyl)phosphine (TCEP), cysteine, N-acetyl cysteine,
cysteamine, 2-mercaptoethanesulfonic acid sodium salt,
dithioerythritol (DTE) and bis(2-mercaptoethyl)sulfone.
[0203] In one embodiment, labelled RNAs undergo nucleotide
substitution. In one embodiment, labelled RNAs are submitted to
conditions allowing for nucleotide substitution.
[0204] In the following, the terms "substitution", "conversion" and
"transformation" may be used interchangeably to refer to the
incorporation of mispaired nucleotides.
[0205] Nucleotide substitution in labelled RNAs can be carried out
by chemically modifying the labelled RNAs; and further
reverse-transcribing said chemically-modified labelled RNAs.
Accordingly, conditions allowing for nucleotide substitution
include a chemical modification of the labelled RNAs; and the
reverse-transcription of said chemically-modified labelled
RNAs.
[0206] Preferably, methods of nucleotide conversion alter
Watson-Crick base pairing in labelled RNAs, and cause the
reverse-transcription of labelled RNAs during cDNA synthesis to
incorporate mispaired nucleotides, i.e., to pair a labelled
nucleotide with a non-Watson-Crick partner nucleotide.
[0207] For example, a labelled uridine (such as a thiol-labelled
uridine) may be paired with a guanine (G) instead of an adenine
(A) during cDNA first-strand synthesis. Consequently, a cytosine
is incorporated opposite this guanine during second-strand
synthesis, ultimately leading to a thymidine (T) to cytosine (C)
substitution with respect to the initial nucleic acid sequence.
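The base-pairing events described above can be illustrated with a toy Python model. This is a simplification under stated assumptions, not a model of any particular reverse-transcriptase: every chemically-modified labelled uridine is assumed to pair with G, whereas in practice only a fraction of labelled positions is misread, and the labelled positions are supplied as a hypothetical set of indices.

```python
# Toy model: every modified labelled U is assumed to be read as C (paired with G).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A", "T": "A"}


def first_strand(rna, labelled):
    """First-strand cDNA, aligned base-for-base with the RNA (i.e. read 3'->5').

    A labelled U at an index in `labelled` is paired with G instead of A.
    """
    return "".join(
        "G" if i in labelled and base == "U" else COMPLEMENT[base]
        for i, base in enumerate(rna)
    )


def second_strand(first):
    """Second-strand cDNA: complement of the first strand (RNA sense, DNA alphabet)."""
    return "".join(COMPLEMENT[base] for base in first)


rna = "AUGUCU"
fs = first_strand(rna, labelled={3})  # the uridine at index 3 is labelled
ss = second_strand(fs)
reference = rna.replace("U", "T")     # initial sequence in the DNA alphabet
diffs = [(i, r, s) for i, (r, s) in enumerate(zip(reference, ss)) if r != s]
print(fs, ss, diffs)
```

Here the first strand reads TACGGA (G opposite the labelled U) and the second strand reads ATGCCT, i.e. a single T→C substitution at index 3 relative to ATGTCT.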
[0208] Nucleotide substitution can therefore be defined as the
equivalent first-strand synthesis nucleotide substitution (i.e.,
the nucleotide substitution occurring upon first-strand synthesis);
or as the equivalent second-strand synthesis nucleotide
substitution (i.e., the nucleotide substitution occurring upon
second-strand synthesis).
[0209] In one embodiment, labelled RNAs undergo a first-strand
synthesis A-to-G (A→G) substitution. In one embodiment,
labelled RNAs undergo a second-strand synthesis T-to-C (T→C)
substitution. In these embodiments, a labelled uridine (U) in the
labelled RNA is therefore read as a cytosine (C) instead of a
thymidine (T) in the corresponding cDNA.
[0210] Unless explicitly stated otherwise, nucleotide substitutions
recited herein correspond to second-strand synthesis nucleotide
substitutions.
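Since the second strand is complementary to the first strand, a first-strand substitution X→Y corresponds to the second-strand substitution written with the complements of X and Y; this is why the A→G and T→C notations of paragraph [0209] describe the same event. A minimal helper for translating between the two notations, using a hypothetical "X>Y" string format, might read:

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def to_second_strand(substitution):
    """Convert a first-strand substitution 'X>Y' into its second-strand
    equivalent by complementing both bases (the mapping is its own inverse)."""
    ref, alt = substitution.split(">")
    return f"{COMPLEMENT[ref]}>{COMPLEMENT[alt]}"


print(to_second_strand("A>G"))  # T>C
```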
[0211] Suitable chemical modifications of labelled RNAs include,
but are not limited to, alkylation, oxidative-nucleophilic-aromatic
substitution, osmium-mediated transformation, or any other method
known to the one skilled in the art.
[0212] Alkylating labelled RNAs can be carried out by techniques
well known to the person skilled in the art. Such techniques include,
but are not limited to, those described in Herzog et al. (2017. Nat
Methods. 14(12):1198-1204).
[0213] Preferably, alkylation of labelled RNAs is carried out after
extraction of RNAs as detailed hereinabove.
[0214] In one embodiment, labelled RNA alkylation is carried out
using an alkylating agent. Examples of suitable alkylating agents
include, but are not limited to, iodoacetamide, iodoacetic acid,
N-ethylmaleimide and 4-vinylpyridine.
[0215] In a preferred embodiment, the alkylating agent is
iodoacetamide.
[0216] A non-limiting example of alkylation treatment of labelled
RNAs comprises adding to labelled RNAs: [0217] from about 1 mM
final to about 20 mM final, preferably from about 5 mM final to
about 15 mM final, preferably about 10 mM final of iodoacetamide in
100% ethanol, [0218] from about 10 mM final to about 100 mM final,
preferably from about 25 mM final to about 75 mM final, preferably
about 50 mM final of a buffer at pH 8.0 (such as, e.g., a sodium
phosphate (NaPO₄) buffer), [0219] from about 25% v/v to about
75% v/v, preferably from about 40% v/v to about 60% v/v, preferably
about 50% v/v of DMSO.
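As a non-limiting illustration, the preferred concentrations of paragraph [0216] can be turned into a pipetting scheme. The sketch assumes hypothetical stock solutions (100 mM iodoacetamide in ethanol, 500 mM sodium phosphate buffer at pH 8.0, neat DMSO) and a 50 µL final reaction volume; these stocks and this volume are illustrative choices, not values taken from the present description.

```python
def alkylation_mix(v_total=50.0, iaa_stock_mM=100.0, buffer_stock_mM=500.0,
                   iaa_final_mM=10.0, buffer_final_mM=50.0, dmso_fraction=0.5):
    """Per-component volumes (same unit as v_total) for the alkylation
    reaction; the remainder is made up with the RNA in water."""
    volumes = {
        "iodoacetamide": iaa_final_mM * v_total / iaa_stock_mM,
        "NaPO4 pH 8.0": buffer_final_mM * v_total / buffer_stock_mM,
        "DMSO": dmso_fraction * v_total,
    }
    remainder = v_total - sum(volumes.values())
    if remainder < 0:
        raise ValueError("stocks too dilute for the requested final volume")
    volumes["RNA in water"] = remainder
    return volumes


for name, vol in alkylation_mix().items():
    print(f"{name}: {vol:g} uL")
```

With the default values this yields 5 µL iodoacetamide, 5 µL buffer, 25 µL DMSO and 15 µL of RNA in water.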
[0220] Since thiol-labelled RNAs are light-sensitive, in one
embodiment, RNA alkylation is carried out in the dark, or, at the
very least, with protection from light.
[0221] In one embodiment, RNA alkylation is not carried out in the
presence of a reducing agent.
[0222] In one embodiment, RNA alkylation is quenched, i.e., stopped
at the end of the alkylation treatment.
[0223] Quenching the alkylation treatment can be carried out by
techniques well known to the person skilled in the art.
[0224] In one embodiment, RNA alkylation quenching is carried out
using a reducing agent.
[0225] Examples of suitable reducing agents include, but are not
limited to, β-mercaptoethanol, dithiothreitol (DTT),
tris(2-carboxyethyl)phosphine (TCEP), cysteine, N-acetyl cysteine,
cysteamine, 2-mercaptoethanesulfonic acid sodium salt,
dithioerythritol (DTE) and bis(2-mercaptoethyl)sulfone.
[0226] A non-limiting example of RNA alkylation quenching comprises
adding to alkylated RNAs from about 1 mM final to about 100 mM
final, preferably from about 10 mM final to about 50 mM final,
preferably from about 10 mM final to about 30 mM final, preferably
about 20 mM final of dithiothreitol (DTT).
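Unlike the set-up of a reaction from scratch, quenching adds the reducing agent to an existing volume V₀, so the added volume must itself be accounted for: C_stock × V_add = C_final × (V₀ + V_add), i.e. V_add = C_final × V₀ / (C_stock − C_final). The sketch assumes a hypothetical 1 M DTT stock and a 50 µL alkylation reaction.

```python
def volume_to_add(v0, c_final, c_stock):
    """Volume of stock to add to an existing volume v0 (free of the reagent)
    so that the mixture reaches c_final:
    c_stock * v_add = c_final * (v0 + v_add)."""
    if c_stock <= c_final:
        raise ValueError("stock must be more concentrated than the target")
    return c_final * v0 / (c_stock - c_final)


# 20 mM DTT final in a 50 uL reaction, from a hypothetical 1 M (1000 mM) stock.
v_dtt = volume_to_add(v0=50.0, c_final=20.0, c_stock=1000.0)
print(f"{v_dtt:.2f} uL")  # 1.02 uL
```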
[0227] Oxidative-nucleophilic-aromatic substitution of labelled
RNAs can be carried out by techniques well known to the person skilled
in the art. Such techniques include, but are not limited to, those
described in Schofield et al. (2018. Nat Methods.
15(3):221-225).
[0228] Preferably, oxidative-nucleophilic-aromatic substitution of
labelled RNAs is carried out after extraction of RNAs as detailed
hereinabove.
[0229] In one embodiment, labelled RNA
oxidative-nucleophilic-aromatic substitution is carried out using
an oxidant and a nucleophile.
[0230] Examples of suitable oxidants include, but are not limited
to, sodium periodate (NaIO₄), meta-chloroperoxybenzoic acid
(mCPBA), sodium iodate (NaIO₃) and hydrogen peroxide
(H₂O₂).
[0231] In a preferred embodiment, the oxidant is sodium
periodate (NaIO₄).
[0232] Examples of suitable nucleophiles include, but are not
limited to, 2,2,2-trifluoroethanamine (TFEA), hydrazine,
benzylamine, ammonia, methoxyamine, 1,1-dimethylethylenediamine,
aniline and 4-(trifluoromethyl)benzylamine.
[0233] In a preferred embodiment, the nucleophile is
2,2,2-trifluoroethanamine (TFEA).
[0234] Since thiol-labelled RNAs are light-sensitive, in one
embodiment, RNA oxidative-nucleophilic-aromatic substitution is
carried out in the dark, or, at the very least, with protection
from light.
[0235] In one embodiment, RNA oxidative-nucleophilic-aromatic
substitution is not carried out in the presence of a reducing
agent.
[0236] In one embodiment, RNA oxidative-nucleophilic-aromatic
substitution is quenched, i.e., stopped at the end of the
oxidative-nucleophilic-aromatic substitution treatment.
[0237] Quenching the oxidative-nucleophilic-aromatic substitution
treatment can be carried out by techniques well known to the
person skilled in the art.
[0238] Osmium-mediated transformation of labelled RNAs can be
carried out by techniques well known to the person skilled in the art.
Such techniques include, but are not limited to, those described in
Riml et al. (2017. Angew Chem Int Ed Engl. 56(43):13479-13483).
[0239] Preferably, osmium-mediated transformation of labelled RNAs
is carried out after extraction of RNAs as detailed
hereinabove.
[0240] In one embodiment, labelled RNA osmium-mediated
transformation is carried out using osmium tetroxide (OsO₄)
and ammonia.
[0241] Since thiol-labelled RNAs are light-sensitive, in one
embodiment, RNA osmium-mediated transformation is carried out in
the dark, or, at the very least, with protection from light.
[0242] In one embodiment, RNA osmium-mediated transformation is not
carried out in the presence of a reducing agent.
[0243] In one embodiment, RNA osmium-mediated transformation is
quenched, i.e., stopped at the end of the osmium-mediated
transformation treatment.
[0244] Quenching the osmium-mediated transformation treatment can
be carried out by techniques well known to the person skilled in the
art.
[0245] The generation of cDNA libraries, especially for sequencing
purposes, is part of the common general knowledge of the person
skilled in the art. Kits for cDNA library generation are commercially
available, including, but not limited to, SMARTer Stranded Total
RNA-Seq Kit (Clontech), QuantSeq 3' mRNA-Seq Library Prep Kit (Lexogen),
Nextera XT DNA Library Prep Kit (Illumina), TruSeq Nano DNA Library
Prep Kit (Illumina), NEBNext DNA Library Prep Master Mix (New
England Biolabs), NEBNext Ultra DNA Library Prep Kit (New England
Biolabs) and JetSeq DNA Library Preparation Kit (Bioline).
[0246] In one embodiment, generating a cDNA library comprises some
or all of the following sub-steps: [0247] reverse-transcription
of RNAs, including: [0248] first-strand cDNA
synthesis (thereby obtaining a double-stranded mixed RNA-cDNA
library), [0249] optionally, RNA template removal (thereby
obtaining a single-stranded cDNA library), [0250] second-strand
cDNA synthesis (thereby obtaining a double-stranded cDNA library),
and [0251] optionally, double-stranded cDNA library
purification.
[0252] Reverse-transcription of RNAs is carried out by techniques
well known to the person skilled in the art, using a
reverse-transcriptase enzyme and a mix of the four
deoxyribonucleoside triphosphates (dNTPs), namely deoxyadenosine
triphosphate (dATP), deoxycytidine triphosphate (dCTP),
deoxyguanosine triphosphate (dGTP) and (deoxy)thymidine
triphosphate (dTTP).
[0253] In particular, methods for first-strand cDNA synthesis are
well known to the person skilled in the art. First-strand cDNA
synthesis reactions can use a combination of sequence-specific
primers, oligo(dT) or random primers. In one embodiment, the
first-strand cDNA synthesis reaction uses oligo(dT) primers. In one
embodiment, the first-strand cDNA synthesis reaction uses
sequence-specific primers. In one embodiment, the first-strand cDNA
synthesis reaction uses random primers.
[0254] In one embodiment, primers used for first-strand cDNA
synthesis comprise a fixed nucleic acid sequence (comprising, e.g.,
adapters and/or indexes used for sequencing) and a priming nucleic
acid sequence (complementary to the RNA template). In one
embodiment, primers used for first-strand cDNA synthesis comprise a
fixed 5'-end sequence and a priming 3'-end sequence. In one
embodiment, primers used for first-strand cDNA synthesis comprise a
fixed 3'-end sequence and a priming 5'-end sequence.
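A primer comprising a fixed 5'-end sequence and a priming 3'-end sequence can be represented as the concatenation of the two parts. In the sketch below, FIXED_5P is a hypothetical placeholder rather than an adapter or index from any commercial kit, and a random hexamer and an 18-mer oligo(dT) are assumed as priming parts. Degenerate primers are in practice synthesized as mixtures; the function returns one molecule drawn from such a mixture.

```python
import random

FIXED_5P = "GATTACAGATTACA"  # hypothetical fixed 5' part (stands in for adapter/index)


def random_primer(fixed=FIXED_5P, n=6, rng=random):
    """Fixed 5' sequence followed by a random n-mer priming 3' end."""
    return fixed + "".join(rng.choice("ACGT") for _ in range(n))


def oligo_dt_primer(fixed=FIXED_5P, n=18):
    """Fixed 5' sequence followed by an oligo(dT) priming 3' end."""
    return fixed + "T" * n


rng = random.Random(0)  # seeded for reproducibility
print(random_primer(rng=rng))
print(oligo_dt_primer())
```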
[0255] In particular, methods for RNA template removal are
well known to the person skilled in the art. RNA template removal can
be carried out, e.g., by incubating the double-stranded mixed
RNA-cDNA library with RNase H.
[0256] Reverse-transcription of RNAs to generate a cDNA library can
be carried out in a random manner, i.e., using random primers and
thereby reverse-transcribing all or most of the RNAs.
Alternatively, reverse-transcription of RNAs to generate a cDNA
library can be carried out in a targeted manner, i.e., using
specific primers and thereby creating a cDNA library of custom
sequences only.
[0257] In one embodiment, generating a library of cDNA, in
particular reverse-transcribing RNAs, leads to nucleotide
substitution. In the absence of chemical modification, such
nucleotide substitutions occur randomly and at low frequency in any
reverse-transcribed RNA. However, an upsurge of such substitutions is observed
during reverse-transcription of RNAs which were previously
labelled, and further chemically-modified by techniques such as
alkylation, oxidative-nucleophilic-aromatic substitution,
osmium-mediated transformation or the like, as described
hereinabove. This upsurge of substitutions is illustrated in the
"Examples" section further below.
[0258] Amplifying the cDNA library can be carried out by methods
well known to the person skilled in the art.
[0259] Amplification of the cDNA library can be carried out in a
random manner, i.e., using random primers and thereby amplifying
the whole or major part of the cDNA library. Alternatively,
amplification of the cDNA library can be carried out in a targeted
manner, i.e., using specific primers and thereby amplifying only
custom sequences in the cDNA library.
[0260] Sequencing the cDNA library can be carried out by methods
well known to the person skilled in the art. In one embodiment,
sequencing the cDNA library is carried out by Next Generation
Sequencing (NGS), deep sequencing or targeted sequencing of custom
sequences.
[0261] Methods for NGS are known to the person skilled in the art, and
comprise, but are not limited to, paired-end sequencing, sequencing
by synthesis and single-read sequencing.
[0262] Platforms for NGS are commercially available, and include, but
are not limited to, Illumina MiSeq (Illumina), Ion Torrent PGM
(ThermoFisher Scientific), PacBio RS (PacBio), Illumina GAIIx
(Illumina) and Illumina HiSeq 2000 (Illumina).
[0263] The step of sequencing the cDNA library can be carried out
using commercially available kits, such as MiSeq reagent kit v2
(Illumina).
[0264] In one embodiment, sequencing the cDNA library yields a set
of sequence reads.
[0265] In one embodiment, the method according to the present
invention comprises a step of comparing the number of substituted
nucleotides in the first set of sequence reads mapping against at
least one microbial--preferably viral, bacterial, archaeal, fungal
or protozoan--nucleic acid sequence hit with the number of
substituted nucleotides in a control sequence.
[0266] By "substituted nucleotide", it is meant a nucleotide
replaced by another one with respect to the microbial--preferably
viral, bacterial, archaeal, fungal or protozoan--nucleic acid
sequence hit. Typically, any nucleotide can be replaced by any
other nucleotide, such as a thymidine being replaced by a cytosine
(T→C), by an adenine (T→A) or by a guanine
(T→G). The same applies to adenine (A), cytosine (C) and
guanine (G) being replaced by any of the three other
nucleotides.
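Counting substituted nucleotides by type amounts to tallying, at each aligned position where a read differs from the hit, the pair (hit base, read base). The sketch below assumes gap-free alignments on the same strand, given as hypothetical (offset, sequence) pairs, and ignores ambiguous bases; an actual pipeline would derive the same information from the alignment records (e.g. CIGAR strings) produced by the alignment software.

```python
from collections import Counter

BASES = "ACGT"


def substitution_counts(reference, reads):
    """Tally 'X>Y' substitutions (X = hit base, Y = read base) for reads
    aligned gap-free at 0-based offsets; `reads` is a list of (offset, seq)."""
    counts = Counter()
    for offset, read in reads:
        for i, read_base in enumerate(read):
            ref_base = reference[offset + i]
            if ref_base != read_base and ref_base in BASES and read_base in BASES:
                counts[f"{ref_base}>{read_base}"] += 1
    return counts


ref = "ATGTCTTAGC"
reads = [(0, "ATGCCT"), (4, "CCTAGC"), (2, "GTCTTAGA")]
print(substitution_counts(ref, reads))
```

On this toy data the sketch counts two T→C substitutions and one C→A substitution.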
[0267] Such substitutions occur randomly and at low frequency, in
particular during steps of reverse-transcription. The present
invention is however based on the upsurge of such substitutions in
the case where RNAs were previously labelled, and further submitted
to chemical modification methods such as alkylation,
oxidative-nucleophilic-aromatic substitution, osmium-mediated
transformation and the like.
[0268] In one embodiment, the total number of substituted
nucleotides in the first set of sequence reads mapping against the
at least one microbial--preferably viral, bacterial, archaeal,
fungal or protozoan--nucleic acid sequence hit is compared with the
total number of substituted nucleotides in the control
sequence.
[0269] In one embodiment, the number of T→C substitutions in
the first set of sequence reads mapping against the at least one
microbial--preferably viral, bacterial, archaeal, fungal or
protozoan--nucleic acid sequence hit is compared to the number of
T→C substitutions in the control sequence.
[0270] In one embodiment, the nucleotide substitution rates in the
first set of sequence reads mapping against the at least one
identified microbial--preferably viral, bacterial, archaeal, fungal
or protozoan--nucleic acid sequence hit are compared to the
nucleotide substitution rates in the control sequence.
[0271] In one embodiment, the T→C substitution rates in the
first set of sequence reads mapping against the at least one
microbial--preferably viral, bacterial, archaeal, fungal or
protozoan--nucleic acid sequence hit are compared to the T→C
substitution rates in the control sequence.
[0272] The "substitution rate", as used herein, is calculated as
the number of one or several given nucleotide substitutions (e.g.,
T→C, or any other nucleotide substitution as defined
hereinabove) divided by the total number of substitutions.
Alternatively, the "substitution rate" may be calculated as the
number of one or several given nucleotide substitutions divided by
the total number of nucleotides in the sequence reads mapping
against the at least one microbial--preferably viral, bacterial,
archaeal, fungal or protozoan--nucleic acid sequence hit.
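The two definitions of paragraph [0272] differ only in their denominator: the first divides the count of the selected substitution type(s) by the total number of substitutions, the second by the total number of nucleotides in the mapped reads. A sketch of both, taking a hypothetical mapping from substitution type ("X>Y") to count; the counts shown are invented for illustration and give 0.6 and 0.003 respectively.

```python
def rate_among_substitutions(counts, types=("T>C",)):
    """Selected substitutions divided by the total number of substitutions."""
    total = sum(counts.values())
    return sum(counts.get(t, 0) for t in types) / total if total else 0.0


def rate_per_nucleotide(counts, n_nucleotides, types=("T>C",)):
    """Selected substitutions divided by the total number of nucleotides
    in the sequence reads mapping against the hit."""
    if n_nucleotides <= 0:
        raise ValueError("n_nucleotides must be positive")
    return sum(counts.get(t, 0) for t in types) / n_nucleotides


counts = {"T>C": 30, "A>G": 5, "C>T": 10, "G>A": 5}  # illustrative numbers only
print(rate_among_substitutions(counts))
print(rate_per_nucleotide(counts, 10_000))
```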
[0273] In one embodiment, the ratio of the T→C substitution
rate between the sequence reads mapping against the at least one
microbial--preferably viral, bacterial, archaeal, fungal or
protozoan--nucleic acid sequence hit in the first set of sequence
reads and in the control sequence, and the ratio of the average
substitution rates of all other nucleotides (i.e., all but
T→C) between the sequence reads mapping against the at least
one microbial--preferably viral, bacterial, archaeal, fungal or
protozoan--nucleic acid sequence hit in the first set of sequence
reads and in the control sequence, are compared.
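The comparison of paragraph [0273] can be expressed as two ratios: R_TC, the T→C rate in the reads of the first set divided by that in the control, and R_other, the average rate of the eleven other substitution types in the first set divided by the same average in the control. For transcriptionally-active material one would expect R_TC to exceed R_other; the description does not specify a margin, and none is implied by the sketch below, whose rates are illustrative values only.

```python
SUBSTITUTIONS = [f"{x}>{y}" for x in "ACGT" for y in "ACGT" if x != y]  # 12 types
OTHERS = [s for s in SUBSTITUTIONS if s != "T>C"]                      # 11 types


def mean_other_rate(rates):
    """Average rate of the 11 substitution types other than T>C."""
    return sum(rates.get(s, 0.0) for s in OTHERS) / len(OTHERS)


def tc_vs_background(sample_rates, control_rates):
    """Return (R_TC, R_other).

    R_TC    = T>C rate in the first-set reads / T>C rate in the control
    R_other = mean other-type rate in the first set / same in the control
    A zero control rate raises ZeroDivisionError.
    """
    r_tc = sample_rates["T>C"] / control_rates["T>C"]
    r_other = mean_other_rate(sample_rates) / mean_other_rate(control_rates)
    return r_tc, r_other


# Illustrative rates only: background of 1e-4 everywhere, T>C raised in the sample.
control = {s: 1e-4 for s in SUBSTITUTIONS}
sample = {**control, "T>C": 2e-3}
r_tc, r_other = tc_vs_background(sample, control)
print(f"R_TC = {r_tc:.1f}, R_other = {r_other:.1f}")
```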
[0274] In one embodiment, the method comprises identifying at least
one microbial--preferably viral, bacterial, archaeal, fungal or
protozoan--nucleic acid sequence hit mapped against at least one
sequence read.
[0275] In one embodiment, identification of at least one
microbial--preferably viral, bacterial, archaeal, fungal or
protozoan--nucleic acid sequence hit mapped against at least one
sequence read comprises sub-steps of filtering the set of reads,
assembling the sequence reads into contigs, aligning the sequence
reads or contigs onto a database, identifying the at least one
microbial--preferably viral, bacterial, archaeal, fungal or
protozoan--nucleic acid sequence hit mapped against at least one
sequence read or contig, and re-aligning the sequence reads or
contigs onto the microbial--preferably viral, bacterial, archaeal,
fungal or protozoan--nucleic acid sequence hit.
[0276] Filtering a set of sequence reads is part of the common
general knowledge of the person skilled in the art.
[0277] In one embodiment, filtering a set of sequence reads may
include, without limitation, suppressing sequence read duplicates,
suppressing low quality sequence reads, suppressing sequence read
homopolymers, removing fixed nucleic acid sequences from the
sequence reads (such as, e.g., adapters and/or indexes used for
sequencing), discarding endogenous sequence reads (i.e., sequence
reads mapping to a nucleic acid sequence belonging to the subject's
cells), discarding unwanted sequence reads (such as, e.g., rRNA
sequence reads and the like) and the like.
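Several of the filters listed above can be sketched in a few lines of Python. The adapter sequence and the thresholds (minimum mean Phred quality of 20, homopolymer runs of 10 or more bases, minimum length of 30 nucleotides) are hypothetical choices for illustration; dedicated tools would normally be used, and the removal of endogenous and rRNA reads, which requires alignment to reference sequences, is not shown.

```python
import re

ADAPTER = "GATTACAGATTACA"  # hypothetical 3' adapter placeholder


def filter_reads(reads, adapter=ADAPTER, min_mean_q=20, max_homopolymer=10, min_len=30):
    """Filter (sequence, phred_scores) pairs.

    Trims the adapter and anything after it, then drops reads that are too
    short, exact duplicates of an already kept read, of low mean quality,
    or that contain a homopolymer run of at least `max_homopolymer` bases.
    """
    homopolymer = re.compile(r"([ACGT])\1{%d,}" % (max_homopolymer - 1))
    seen, kept = set(), []
    for seq, qual in reads:
        cut = seq.find(adapter)
        if cut != -1:
            seq, qual = seq[:cut], qual[:cut]
        if len(seq) < min_len or seq in seen:
            continue
        if sum(qual) / len(qual) < min_mean_q or homopolymer.search(seq):
            continue
        seen.add(seq)
        kept.append((seq, qual))
    return kept


reads = [
    ("ACGT" * 10, [30] * 40),                    # kept
    ("ACGT" * 10, [30] * 40),                    # duplicate
    ("TGCA" * 10, [10] * 40),                    # low quality
    ("A" * 12 + "CGT" * 10, [30] * 42),          # homopolymer
    ("ACGT" * 5 + ADAPTER + "CCCC", [30] * 38),  # too short once trimmed
    ("CAGT" * 10 + ADAPTER, [30] * 54),          # kept after trimming
]
print([seq for seq, _ in filter_reads(reads)])
```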
[0278] Such filtering can be carried out using software readily
available to the person skilled in the art.
[0279] Assembling a set of sequence reads into contigs is part of
the common general knowledge of the person skilled in the art.
[0280] Such assembly of sequence reads into contigs can be carried
out using software readily available to the person skilled in the
art.
[0281] Optionally, sequence reads or contigs may be translated into
amino acid sequences.
[0282] Aligning a set of sequence reads or contigs is part of the
common general knowledge of the person skilled in the art. Such
alignment of sequence reads or contigs can be carried out using
software readily available to the person skilled in the art.
[0283] In one embodiment, sequence reads or contigs are aligned on
a microbial database, i.e., a database comprising
microbial--preferably viral, bacterial, archaeal, fungal or
protozoan--nucleic acid sequences or amino acid sequences (in the
case where sequence reads or contigs were translated into amino
acid sequences). Such a database can be downloaded, e.g., from the
EMBL Nucleotide Sequence Database.
[0284] Identifying at least one microbial--preferably viral,
bacterial, archaeal, fungal or protozoan--nucleic acid sequence hit
(or amino acid sequence hit in the case where sequence reads or
contigs were translated into amino acid sequences) mapped against
at least one sequence read or contig is part of the common general
knowledge of the person skilled in the art.
[0285] Upon alignment of the set of sequence reads or contigs on a
database, hit sequences from said database can be identified.
[0286] In one embodiment, at least one hit sequence is identified
(and therefore selected) based on a threshold expected value
(e-value) obtained upon alignment with the sequence reads or
contigs. In one embodiment, a sequence hit is identified (and
therefore selected) if the e-value obtained upon alignment of said
sequence hit with at least one sequence read or contig is below
10⁻², preferably below 5×10⁻³, preferably below
10⁻³.
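Selecting hits by e-value amounts to a threshold filter. The sketch assumes that alignment results are already available as (query, subject, e-value) tuples, for instance parsed from a tabular alignment report; the identifiers are invented, and the most stringent threshold given above (10⁻³) is used as the default. For each subject passing the threshold, the best (lowest) e-value is kept.

```python
def select_hits(alignments, max_evalue=1e-3):
    """Keep subjects aligned to at least one read/contig with an e-value
    below max_evalue; return the lowest e-value per selected subject."""
    best = {}
    for query, subject, evalue in alignments:
        if evalue < max_evalue and evalue < best.get(subject, float("inf")):
            best[subject] = evalue
    return best


alignments = [
    ("contig_1", "virus_A", 1e-30),
    ("contig_2", "virus_A", 2e-4),
    ("read_17", "bacterium_B", 4e-3),  # passes 1e-2 and 5e-3, not 1e-3
    ("read_18", "fungus_C", 0.5),
]
print(select_hits(alignments))                   # {'virus_A': 1e-30}
print(select_hits(alignments, max_evalue=5e-3))  # {'virus_A': 1e-30, 'bacterium_B': 0.004}
```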
[0287] Re-aligning the sequence reads or contigs onto the at least
one microbial--preferably viral, bacterial, archaeal, fungal or
protozoan--nucleic acid sequence hit (or amino acid sequence hit in
the case where sequence reads or contigs were translated into amino
acid sequences) previously identified (and therefore selected) is
part of the common general knowledge of the person skilled in the art.
[0288] Such re-alignment of sequence reads or contigs can be
carried out using software readily available to the person skilled in
the art.
[0289] In one embodiment, upon re-alignment, at least one final
consensus sequence of the at least one microbial--preferably viral,
bacterial, archaeal, fungal or protozoan--nucleic acid sequence hit
(or amino acid sequence hit in the case where sequence reads or
contigs were translated into amino acid sequences) previously
identified (and therefore selected) is determined.
[0290] In one embodiment, the control sequence is selected from:
[0291] a second set of sequence reads mapping against said at least
one microbial--preferably viral, bacterial, archaeal, fungal or
protozoan--nucleic acid sequence hit, wherein the second set of
sequence reads is obtained by sequencing a second set of RNAs
obtained by culturing the sample, preferably the cell sample, in
the absence of an RNA-labelling agent; [0292] a second set of sequence
reads mapping against said at least one microbial--preferably
viral, bacterial, archaeal, fungal or protozoan--nucleic acid
sequence hit, wherein the second set of sequence reads is obtained
by sequencing a second set of RNAs obtained by culturing the
sample, preferably the cell sample, in the presence of an RNA-labelling
agent but without submitting the extracted RNAs to conditions
allowing for nucleotide substitution; [0293] a consensus
microbial--preferably viral, bacterial, archaeal, fungal or
protozoan--nucleic acid sequence, obtained from the sequence reads
or contigs of the first set of sequence reads mapping against the
at least one microbial--preferably viral, bacterial, archaeal,
fungal or protozoan--nucleic acid sequence hit; [0294] a sequence
corresponding to the same microbial--preferably viral, bacterial,
archaeal, fungal or protozoan--nucleic acid sequence hit found in
the closest microbial--preferably viral, bacterial, archaeal,
fungal or protozoan--strain identified in nucleic acid sequence
databases; and/or an analogous sequence corresponding to the same
microbial--preferably viral, bacterial, archaeal, fungal or
protozoan--nucleic acid sequence hit identified in nucleic acid
sequence databases.
[0295] In one embodiment, the control sequence is a second set of
sequence reads mapping against said at least one
microbial--preferably viral, bacterial, archaeal, fungal or
protozoan--nucleic acid sequence hit, wherein the second set of
sequence reads is obtained by sequencing a second set of RNAs
obtained by culturing the sample, preferably the cell sample, in
the absence of an RNA-labelling agent.
[0296] In this embodiment, the method according to the present
invention comprises the steps of: [0297] (a) sequencing a first and
a second set of RNAs extracted from the sample, preferably the cell
sample, wherein the first set of RNAs is obtained by culturing the
sample, preferably the cell sample, in the presence of an RNA-labelling
agent and the second set of RNAs is obtained by culturing the
sample, preferably the cell sample, in the absence of an RNA-labelling
agent, thereby obtaining a first and a second set of sequence
reads, [0298] (b) comparing the number of substituted nucleotides
in the first set of sequence reads mapping against at least one
microbial--preferably viral, bacterial, archaeal, fungal or
protozoan--nucleic acid sequence hit with the number of substituted
nucleotides in the second set of sequence reads mapping against
said at least one microbial--preferably viral, bacterial, archaeal,
fungal or protozoan--nucleic acid sequence hit, and [0299] (c)
concluding that the at least one microbial--preferably viral,
bacterial, archaeal, fungal or protozoan--nucleic acid sequence hit
belongs to a live microbe--preferably virus, bacterium, archaeon,
fungus or protozoan--if the number of substituted nucleotides in
the sequence reads mapping against the at least one
microbial--preferably viral, bacterial, archaeal, fungal or
protozoan--nucleic acid sequence hit in the first set of sequence
reads is greater than in the second set of sequence reads.
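Steps (b) and (c) reduce to a per-hit comparison of substitution counts between the first (labelled) and second (unlabelled) sets of sequence reads. The sketch follows the comparison as stated; the optional normalization by the number of mapped nucleotides is an assumption added here to account for unequal sequencing depths, and the counts are invented. No statistical test or minimum difference is implied.

```python
def is_live(first_subs, second_subs, first_nt=None, second_nt=None):
    """Conclude 'live' if the first (labelled) set carries more substituted
    nucleotides than the second (unlabelled) set for the same hit.
    If nucleotide totals are given, per-nucleotide rates are compared
    instead (an assumption, to account for different read depths)."""
    if first_nt is not None and second_nt is not None:
        return first_subs / first_nt > second_subs / second_nt
    return first_subs > second_subs


# Invented counts per hypothetical hit: (first_subs, first_nt, second_subs, second_nt)
hits = {
    "virus_A": (240, 50_000, 20, 48_000),
    "bacterium_B": (12, 40_000, 11, 41_000),
    "fungus_C": (30, 90_000, 15, 30_000),
}
for name, (s1, n1, s2, n2) in hits.items():
    print(name, is_live(s1, s2), is_live(s1, s2, n1, n2))
```

In this example fungus_C is called live on raw counts but not once read depth is taken into account.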
[0300] Preferably, the first set of RNAs is obtained by culturing
the sample, preferably the cell sample, in the presence of an
RNA-labelling agent, thereby obtaining labelled RNAs, and further
submitting said labelled RNAs to nucleotide substitution as
detailed hereinabove.
[0301] In one embodiment, the method according to the present
invention comprises the steps of: [0302] (a) sequencing a first and
a second set of RNAs extracted from a sample, preferably a cell
sample, [0303] wherein the first set of RNAs is obtained by
culturing the sample, preferably the cell sample, in the presence of an
RNA-labelling agent and the second set of RNAs is obtained by
culturing the sample, preferably the cell sample, in the absence of an
RNA-labelling agent, [0304] thereby obtaining a first and a second
set of sequence reads, [0305] (b) identifying at least one
microbial--preferably viral, bacterial, archaeal, fungal or
protozoan--nucleic acid sequence hit mapped against at least one
sequence read of the first set of sequence reads, [0306] (c)
comparing the number of substituted nucleotides in the sequence
reads mapping against the at least one identified microbial--preferably
viral, bacterial, archaeal, fungal or protozoan--nucleic acid
sequence hit in the first and second sets of sequence reads, and
[0307] (d) concluding that the at least one microbial--preferably
viral, bacterial, archaeal, fungal or protozoan--nucleic acid
sequence hit belongs to a live, active microbe--preferably virus,
bacterium, archaeon, fungus or protozoan--if the number of
substituted nucleotides in the sequence reads mapping against the at least
one identified microbial--preferably viral, bacterial, archaeal,
fungal or protozoan--nucleic acid sequence hit in the first set of
sequence reads is greater than in the second set of sequence
reads.
[0308] In this embodiment, the method comprises a step of
sequencing a first set of RNAs extracted from the sample,
preferably the cell sample. In this embodiment, the sample,
preferably the cell sample, was cultured in the presence of an
RNA-labelling agent.
[0309] In this embodiment, the step of sequencing a first set of
RNAs extracted from the sample, preferably the cell sample,
comprises one or more or all of the sub-steps of labelling RNAs,
lysing the cells, extracting RNAs, substituting nucleotides in
labelled RNAs, generating a cDNA library, amplifying the cDNA
library and sequencing the cDNA library.
[0310] These sub-steps are defined and detailed hereinabove and
apply to the sequencing of a first set of RNAs.
[0311] In this embodiment, the method comprises a further step of
sequencing a second set of RNAs extracted from the sample,
preferably the cell sample. In this embodiment, the sample,
preferably the cell sample, was cultured in the absence of an
RNA-labelling agent.
[0312] In this embodiment, the step of sequencing a second set of
RNAs extracted from the sample, preferably the cell sample,
comprises one or more or all of the sub-steps of lysing the cells,
extracting RNAs, generating a cDNA library, amplifying the cDNA
library and sequencing the cDNA library.
[0313] These sub-steps are defined and detailed hereinabove and
apply to the sequencing of a second set of RNAs.
[0314] In one embodiment, the control sequence is a second set of
sequence reads mapping against said at least one
microbial--preferably viral, bacterial, archaeal, fungal or
protozoan--nucleic acid sequence hit, wherein the second set of
sequence reads is obtained by sequencing a second set of RNAs
obtained by culturing the sample, preferably the cell sample, in
presence of an RNA-labelling agent but without submitting the
extracted RNAs to conditions allowing for nucleotide
substitution.
[0315] In this embodiment, the method according to the present
invention comprises the steps of: [0316] (a) sequencing a first and
a second set of RNAs extracted from the sample, preferably the cell
sample, [0317] wherein the first and the second set of RNAs are
obtained by culturing the sample, preferably the cell sample, in
presence of an RNA-labelling agent, thereby obtaining labelled
RNAs, and [0318] wherein the first set of RNAs is obtained from a
first fraction of the labelled RNAs which is submitted to
nucleotide substitution, and the second set of RNAs is obtained
from a second fraction of the labelled RNAs which is not submitted
to nucleotide substitution, thereby obtaining a first and a second
set of sequence reads, [0319] (b) comparing the number of
substituted nucleotides in the first set of sequence reads mapping
against at least one microbial--preferably viral, bacterial,
archaeal, fungal or protozoan--nucleic acid sequence hit with the
number of substituted nucleotides in the second set of sequence
reads mapping against said at least one microbial--preferably
viral, bacterial, archaeal, fungal or protozoan--nucleic acid
sequence hit, and [0320] (c) concluding that the at least one
microbial--preferably viral, bacterial, archaeal, fungal or
protozoan--nucleic acid sequence hit belongs to a live
microbe--preferably virus, bacterium, archaeon, fungus or
protozoan--if the number of substituted nucleotides in the sequence
reads mapping against the at least one microbial--preferably viral,
bacterial, archaeal, fungal or protozoan--nucleic acid sequence hit
in the first set of sequence reads is greater than in the second
set of sequence reads.
[0321] In one embodiment, the control sequence may be a consensus
microbial--preferably viral, bacterial, archaeal, fungal or
protozoan--nucleic acid sequence. In one embodiment, a consensus
microbial--preferably viral, bacterial, archaeal, fungal or
protozoan--nucleic acid sequence may be obtained from multiple
sequence reads of the first set of sequence reads mapping against
the at least one microbial--preferably viral, bacterial, archaeal,
fungal or protozoan--nucleic acid sequence hit. Such a consensus
sequence can be readily determined because it has been observed that
not all targeted nucleotides are thio-labelled and/or substituted
during the nucleotide substitution procedure. The number of
substituted nucleotides is high enough to allow discrimination
according to the method of the present invention, yet low enough
that the unsubstituted base still predominates at each position,
so that a consensus sequence can be established.
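The text does not specify how the consensus is computed. A simple per-position majority vote is one plausible approach, and it works precisely because only a minority of targeted T nucleotides carry a T→C change. The sketch below assumes ungapped `(start, sequence)` alignments and a hypothetical `min_depth` parameter that masks poorly covered positions with N; ties between equally frequent bases are broken by first occurrence, which is an arbitrary choice.

```python
from collections import Counter
from typing import List, Tuple


def consensus(length: int, reads: List[Tuple[int, str]], min_depth: int = 1) -> str:
    """Majority-vote consensus of ungapped reads over a hit of the given length.

    Positions covered by fewer than min_depth informative bases are reported as N.
    """
    columns = [Counter() for _ in range(length)]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            if base != "N":
                columns[start + offset][base] += 1
    calls = []
    for column in columns:
        if sum(column.values()) < min_depth:
            calls.append("N")
        else:
            calls.append(column.most_common(1)[0][0])  # most frequent base wins
    return "".join(calls)


# One read carries a T->C change at position 3; the majority still calls T there.
reads = [(0, "ATGTTA"), (0, "ATGCTA"), (0, "ATGTTA"), (2, "GTTACG")]
print(consensus(8, reads))               # ATGTTACG
print(consensus(8, reads, min_depth=2))  # ATGTTANN
```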
[0322] In one embodiment, the control sequence may be a nucleic
acid sequence corresponding to the same microbial--preferably
viral, bacterial, archaeal, fungal or protozoan--nucleic acid
sequence hit, but found in the closest microbial--preferably viral,
bacterial, archaeal, fungal or protozoan--strain identified in
nucleic acid sequence databases.
[0323] In one embodiment, the control sequence may be an analogous
sequence corresponding to the same microbial--preferably viral,
bacterial, archaeal, fungal or protozoan--nucleic acid sequence hit
identified in nucleic acid sequence databases.
[0324] In one embodiment, the method according to the present
invention comprises a step of concluding whether the at least one
microbial--preferably viral, bacterial, archaeal, fungal or
protozoan--nucleic acid sequence hit belongs to a live
microbe--preferably virus, bacterium, archaeon, fungus or
protozoan.
[0325] In one embodiment, the live microbe--preferably virus,
bacterium, archaeon, fungus or protozoan--is characterized by
taxonomic assignment of the at least one microbial--preferably
viral, bacterial, archaeal, fungal or protozoan--nucleic acid
sequence hit.
[0326] In one embodiment, it is concluded that the at least one
microbial--preferably viral, bacterial, archaeal, fungal or
protozoan--nucleic acid sequence hit belongs to a live
microbe--preferably virus, bacterium, archaeon, fungus or
protozoan--if the total number of nucleotide substitutions in the
sequence reads mapping against the at least one
microbial--preferably viral, bacterial, archaeal, fungal or
protozoan--nucleic acid sequence hit in the first set of sequence
reads is greater than the total number of nucleotide substitutions
in the control sequence.
[0327] In one embodiment, it is concluded that the at least one
microbial--preferably viral, bacterial, archaeal, fungal or
protozoan--nucleic acid sequence hit belongs to a live
microbe--preferably virus, bacterium, archaeon, fungus or
protozoan--if the number of T→C substitutions in the
sequence reads mapping against the at least one
microbial--preferably viral, bacterial, archaeal, fungal or
protozoan--nucleic acid sequence hit in the first set of sequence
reads is greater than the number of T→C substitutions in the
control sequence.
[0328] In one embodiment, "the [ . . . ] number of [ . . . ]
substitutions [ . . . ] in the first set of sequence reads is
greater than the [ . . . ] number of [ . . . ] substitutions [ . .
. ] in the control sequence" when the number of substitutions is
twice greater, preferably three times greater, more preferably 4,
5, 6, 7, 8, 9, 10, 15, 20, 50, 100 times greater in the sequence
reads mapping against the at least one microbial--preferably viral,
bacterial, archaeal, fungal or protozoan--nucleic acid sequence hit
in the first set of sequence reads than in the control
sequence.
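The fold criteria above ("twice greater, preferably three times greater...") can be expressed as a single comparison with an adjustable factor. This is a sketch under two stated assumptions: "N times greater" is read as first value ≥ N × control value, and a zero control is treated as exceeded by any positive value. The function name `exceeds_control` and the example counts are illustrative only.

```python
def exceeds_control(first_value: float, control_value: float, fold: float = 2.0) -> bool:
    """Return True when first_value is at least `fold` times control_value.

    Assumptions: "fold times greater" is read as first >= fold * control, and a
    zero control is exceeded by any positive value.
    """
    if control_value == 0:
        return first_value > 0
    return first_value >= fold * control_value


# Illustrative counts: 40 substitutions in the first set versus 12 in the control.
print(exceeds_control(40, 12, fold=2))  # True
print(exceeds_control(40, 12, fold=4))  # False
```

The same helper applies unchanged to substitution rates, since only the ratio between the two values matters.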
[0329] In one embodiment, it is concluded that the at least one
microbial--preferably viral, bacterial, archaeal, fungal or
protozoan--nucleic acid sequence hit belongs to a live
microbe--preferably virus, bacterium, archaeon, fungus or
protozoan--if the nucleotide substitution rate in the sequence
reads mapping against the at least one microbial--preferably viral,
bacterial, archaeal, fungal or protozoan--nucleic acid sequence hit
in the first set of sequence reads is greater than the nucleotide
substitution rate in the control sequence.
[0330] In one embodiment, it is concluded that the at least one
microbial--preferably viral, bacterial, archaeal, fungal or
protozoan--nucleic acid sequence hit belongs to a live
microbe--preferably virus, bacterium, archaeon, fungus or
protozoan--if the T→C substitution rate in the sequence
reads mapping against the at least one microbial--preferably viral,
bacterial, archaeal, fungal or protozoan--nucleic acid sequence hit
in the first set of sequence reads is greater than the T→C
substitution rate in the control sequence.
[0331] As used herein, the term "T→C substitution rate" may
be defined with the following formula:
$$\text{T}\to\text{C substitution rate} = \frac{\text{number of C nucleotides identified where a T was expected}}{\text{total number of expected T}}$$
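The formula can be computed directly from aligned reads. In this sketch, every read base that covers a T in the hit counts as one "expected T", so the denominator is coverage-weighted; the ungapped `(start, sequence)` input format and the 0.0 result for a hit without any T are assumptions, and `t_to_c_rate` is a hypothetical name.

```python
from typing import Iterable, Tuple


def t_to_c_rate(hit: str, reads: Iterable[Tuple[int, str]]) -> float:
    """Number of C calls where the hit has T, divided by the number of expected T."""
    expected_t = 0
    c_where_t_expected = 0
    for start, seq in reads:
        for offset, base in enumerate(seq):
            if hit[start + offset] == "T":  # a T is expected at this read position
                expected_t += 1
                if base == "C":
                    c_where_t_expected += 1
    return c_where_t_expected / expected_t if expected_t else 0.0


hit = "ATGTTACGTTAG"
reads = [(0, "ATGCTACGCTAG"), (4, "CACGTCAG")]
print(t_to_c_rate(hit, reads))  # 0.5 (4 C calls out of 8 expected T)
```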
[0332] In one embodiment, "the [ . . . ] substitution rate [ . . .
] in the first set of sequence reads is greater than the [ . . . ]
substitution rate [ . . . ] in the control sequence" when the
substitution rate is twice greater, preferably three times greater,
more preferably 4, 5, 6, 7, 8, 9, 10, 15, 20, 50, 100 times greater
in the sequence reads mapping against the at least one
microbial--preferably viral, bacterial, archaeal, fungal or
protozoan--nucleic acid sequence hit in the first set of sequence
reads than in the control sequence.
[0333] In one embodiment, it is concluded that the at least one
microbial--preferably viral, bacterial, archaeal, fungal or
protozoan--nucleic acid sequence hit belongs to a live
microbe--preferably virus, bacterium, archaeon, fungus or
protozoan--if the T→C substitution rate is greater than the
average substitution rates of all other nucleotides in the sequence
reads mapping against the at least one microbial--preferably viral,
bacterial, archaeal, fungal or protozoan--nucleic acid sequence hit
in the first set of sequence reads.
[0334] In one embodiment, "the T→C substitution rate is
greater than the average substitution rates of all other
nucleotides" when the T→C substitution rate is twice
greater, preferably three times greater, more preferably 4, 5, 6,
7, 8, 9, 10, 15, 20, 50, 100 times greater than the average
substitution rates of all other nucleotides.
[0335] By "average substitution rates of all other nucleotides", it
is meant the average of the A→C, A→G, A→T,
C→A, C→G, C→T, T→A, T→G,
G→A, G→C and G→T substitution rates.
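Extending the T→C rate to all twelve X→Y substitution types gives both the T→C rate and the background formed by the eleven other rates listed in [0335]. This is a minimal sketch assuming ungapped `(start, sequence)` alignments; read or hit positions holding N or any other non-ACGT symbol are skipped (an assumption), and the function names are hypothetical.

```python
from itertools import product
from typing import Dict, Iterable, Tuple

BASES = "ACGT"


def substitution_rates(hit: str, reads: Iterable[Tuple[int, str]]) -> Dict[Tuple[str, str], float]:
    """Rate of every X->Y substitution: count(X read as Y) / count(X expected)."""
    expected = {b: 0 for b in BASES}
    observed = {(x, y): 0 for x, y in product(BASES, BASES) if x != y}
    for start, seq in reads:
        for offset, base in enumerate(seq):
            ref = hit[start + offset]
            if ref not in expected or base not in expected:
                continue  # skip N and other ambiguous symbols
            expected[ref] += 1
            if base != ref:
                observed[(ref, base)] += 1
    return {pair: (n / expected[pair[0]] if expected[pair[0]] else 0.0)
            for pair, n in observed.items()}


def mean_of_other_rates(rates: Dict[Tuple[str, str], float]) -> float:
    """Average of the 11 substitution rates other than T->C."""
    others = [rate for pair, rate in rates.items() if pair != ("T", "C")]
    return sum(others) / len(others)


hit = "ATGTTACGTTAG"
reads = [(0, "ATGCTACGCTAG"), (4, "CACGTCAG"), (0, "ATGTTACGTTAA")]
rates = substitution_rates(hit, reads)
print(round(rates[("T", "C")], 3), round(mean_of_other_rates(rates), 4))  # 0.308 0.0114
```

Here four T→C changes over 13 expected T give a T→C rate of about 0.31, while the single G→A change (1 of 8 expected G) spread over eleven types gives a background of about 0.011.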
[0336] In one embodiment, it is concluded that the at least one
microbial--preferably viral, bacterial, archaeal, fungal or
protozoan--nucleic acid sequence hit belongs to a live
microbe--preferably virus, bacterium, archaeon, fungus or
protozoan--if the T→C substitution rate is greater than the
average substitution rates of T→A and T→G in the
sequence reads mapping against the at least one
microbial--preferably viral, bacterial, archaeal, fungal or
protozoan--nucleic acid sequence hit in the first set of sequence
reads.
[0337] In one embodiment, "the T→C substitution rate is
greater than the average substitution rates of T→A and
T→G" when the T→C substitution rate is twice greater,
preferably three times greater, more preferably 4, 5, 6, 7, 8, 9,
10, 15, 20, 50, 100 times greater than the average substitution
rates of T→A and T→G. In one embodiment, it is
concluded that the at least one microbial--preferably viral,
bacterial, archaeal, fungal or protozoan--nucleic acid sequence hit
belongs to a live microbe--preferably virus, bacterium, archaeon,
fungus or protozoan--if the ratio of the T→C substitution
rate between the sequence reads mapping against the at least one
microbial--preferably viral, bacterial, archaeal, fungal or
protozoan--nucleic acid sequence hit in the first set of sequence
reads and in the control sequence is greater than the ratio of the
average substitution rates of all other nucleotides between the
sequence reads mapping against the at least one
microbial--preferably viral, bacterial, archaeal, fungal or
protozoan--nucleic acid sequence hit in the first set of sequence
reads and in the control sequence.
[0338] In one embodiment, it is concluded that the at least one
microbial--preferably viral, bacterial, archaeal, fungal or
protozoan--nucleic acid sequence hit belongs to a live
microbe--preferably virus, bacterium, archaeon, fungus or
protozoan--if the ratio of the T→C substitution rate between
the sequence reads mapping against the at least one
microbial--preferably viral, bacterial, archaeal, fungal or
protozoan--nucleic acid sequence hit in the first set of sequence
reads and in the control sequence is greater than the ratio of the
average substitution rates of T→A and T→G between the
sequence reads mapping against the at least one
microbial--preferably viral, bacterial, archaeal, fungal or
protozoan--nucleic acid sequence hit in the first set of sequence
reads and in the control sequence.
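Both ratio-based embodiments compare how much the T→C rate rises from the control to the first set against how much the background rises. The background may be the average of all other substitution rates (second half of [0337]) or the average of the T→A and T→G rates ([0338]). The sketch takes precomputed rates as inputs; the `pseudo` pseudocount that keeps the ratios finite when a control rate is zero is an assumption and not part of the described method, and the example values are purely illustrative.

```python
def live_by_ratio(tc_first: float, tc_control: float,
                  other_first: float, other_control: float,
                  pseudo: float = 1e-6) -> bool:
    """Compare the first-set/control fold change of the T->C rate with that of
    the background (an average of other substitution rates).

    `pseudo` is an assumed pseudocount that keeps the ratios finite when a
    control rate is zero; it is not part of the described method.
    """
    tc_ratio = (tc_first + pseudo) / (tc_control + pseudo)
    other_ratio = (other_first + pseudo) / (other_control + pseudo)
    return tc_ratio > other_ratio


# Illustrative values only: T->C enriched about 50-fold, background about 1.2-fold.
print(live_by_ratio(0.05, 0.001, 0.0012, 0.001))    # True
# T->C rises about 1.1-fold, less than the background: no specific enrichment.
print(live_by_ratio(0.0011, 0.001, 0.0012, 0.001))  # False
```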
[0339] In one embodiment, it is concluded that the at least one
microbial--preferably viral, bacterial, archaeal, fungal or
protozoan--nucleic acid sequence hit belongs to a live
microbe--preferably virus, bacterium, archaeon, fungus or
protozoan--if the T→C substitution index in the sequence
reads mapping against the at least one microbial--preferably viral,
bacterial, archaeal, fungal or protozoan--nucleic acid sequence hit
in the first set of sequence reads is greater than a threshold
value. In this embodiment, the threshold value can be determined
experimentally. In one embodiment, the threshold value is greater
than the T→C substitution index in the sequence reads
mapping against the at least one microbial--preferably viral,
bacterial, archaeal, fungal or protozoan--nucleic acid sequence hit
in the second set of sequence reads. In one embodiment, the
threshold value is at least 2, preferably at least 2.5, 3, 3.5, 4,
4.5, 5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, 10, 11, 12, 13, 14,
15, 16, 17, 18, 19, 20 or more.
[0340] As used herein, the term "T→C substitution index" may
be defined with the following formula:
$$\text{T}\to\text{C substitution index} = \frac{\text{T}\to\text{C rate}}{\operatorname{Mean}\left(\text{T}\to\text{A rate},\ \text{T}\to\text{G rate}\right)}$$
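The index and the threshold test of [0339] can be sketched as follows. The default threshold of 2 is the lowest value cited in [0339]; in practice the threshold would be set experimentally. Returning infinity when both T→A and T→G rates are zero is an assumption made here to avoid division by zero, and the example rates are illustrative.

```python
def substitution_index(rate_tc: float, rate_ta: float, rate_tg: float) -> float:
    """T->C rate divided by the mean of the T->A and T->G rates."""
    background = (rate_ta + rate_tg) / 2
    if background == 0:
        # Assumption: with no T->A/T->G background, any T->C signal counts as unbounded.
        return float("inf") if rate_tc > 0 else 0.0
    return rate_tc / background


def is_live_by_index(index: float, threshold: float = 2.0) -> bool:
    """Live if the index exceeds the threshold (2 is the lowest value cited in [0339])."""
    return index > threshold


idx = substitution_index(0.024, 0.002, 0.004)  # background 0.003, index about 8
print(round(idx, 6), is_live_by_index(idx), is_live_by_index(idx, threshold=10))  # 8.0 True False
```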
[0341] The methods for discriminating between live and dead
microbes--preferably viruses, bacteria, archaea, fungi or
protozoans--in a sample, preferably a cell sample, comprising
discriminating between transcriptionally-active and inert
microbial--preferably viral, bacterial, archaeal, fungal or
protozoan--nucleic acid sequences in the sample, preferably the
cell sample, according to the present invention, are useful in a
number of different applications.
[0342] Indeed, the risk of microbial, in particular viral,
bacterial, archaeal, fungal or protozoan, contamination is a topic
of major concern for biological products. This includes both the
risk of contamination of Good Manufacturing Practice (GMP)
facilities and the final drug product. Virus testing of raw
materials, cells, virus seeds, master/working banks, serum batches
for vaccines, etc. is key for the safety of the drug product. This
is particularly critical for live vaccines, gene therapy viral
vectors and cell therapy drug products, since their production does
not include downstream viral elimination steps. As a result, the
safety of these products heavily relies on viral testing during the
production process.
[0343] All previously reported contaminations of products based on
cell cultures have been due to unpredictable animal viruses that
were not identified during viral testing of raw materials or
production cells. In fact, classical viral testing is limited since
many viruses do not grow in cell lines used for in vitro tests or
in rodents or eggs used for in vivo tests.
[0344] The methods of the present invention offer an alternative
approach to accurately test samples for contamination and to
distinguish live--including latent--microbes from harmless
contamination with inert microbial nucleic acid fragments (e.g.,
fragmented nucleic acids of microbes after gamma-irradiation
inactivation).
[0345] Based on this, numerous industrial applications are
foreseeable.
[0346] In the field of vaccines, the control of inactivated
vaccines is nowadays typically carried out by culturing the
vaccine deemed to be inactivated and then searching for the
presence of active microbes. The methods of the present invention
would make it possible to differentiate live microbes from the
background noise of inert microbial nucleic acid sequences, which
are inactivated and thus harmless.
[0347] Similarly, the methods according to the present invention
can be readily implemented to detect contamination with live
microbes in biological samples, such as raw material (e.g., serum
batches in the case of vaccines), cells, master/working banks, etc.
but also in blood cultures and other types of biological samples
used for diagnosis. For example, after antibiotic treatment of a
subject, the subject could be tested for the presence or absence of
remaining live microbes, thereby identifying potential
treatment-resistant microbes.
[0348] In the field of virotherapy, the methods according to the
present invention can be implemented to test for the presence or
absence of replicative revertant viruses in viral vectors, such as
those used in, e.g., gene therapy.
[0349] Preservation media can also be subject to microbial
contamination, and the methods according to the present invention
may readily be used to test for such contamination before the
medium is brought into contact with the sample to be preserved.
[0350] The field of possibilities also extends to non-biological
samples. For example, food safety is a major concern. Sanitary
scandals and growing consumer demand for quality and safety have
increased the need to test food for microbial contamination. The
methods of the present invention can address this need by
providing a practical and definite answer as to whether a food
sample is contaminated by live microbes or not.
[0351] Environmental samples may also be tested. For example, water
and/or air-conditioning circuits are known to potentially carry
microbes. The methods according to the present invention can be
implemented to confirm the presence or absence of such live
microbes.
[0352] Another object of the present invention is a diagnosis
method, preferably an in vitro diagnosis method, of a
microbial--preferably viral, bacterial, archaeal, fungal or
protozoan--infection in a subject.
[0353] In one embodiment, the diagnosis method according to the
present invention comprises a step of providing a sample,
preferably a cell sample, from the subject.
[0354] In one embodiment, the diagnosis method according to the
present invention further comprises a step of performing any of the
methods for discriminating between transcriptionally-active and
inert microbial--preferably viral, bacterial, archaeal, fungal or
protozoan--nucleic acid sequences in a sample, preferably a cell
sample, according to the present invention.
[0355] In one embodiment, the diagnosis method according to the
present invention further comprises a step of diagnosing the
subject as having a microbial--preferably viral, bacterial,
archaeal, fungal or protozoan--infection if the at least one
identified microbial--preferably viral, bacterial, archaeal, fungal
or protozoan--nucleic acid sequence hit belongs to a live
microbe--preferably virus, bacterium, archaeon, fungus or
protozoan.
[0356] Another object of the present invention is a method of
treating a microbial--preferably viral, bacterial, archaeal, fungal
or protozoan--infection in a subject.
[0357] In one embodiment, the method of treating a microbial
infection according to the present invention comprises a step of
carrying out the diagnosis method according to the present invention;
and a step of treating the subject if said subject was diagnosed as
having a microbial--preferably viral, bacterial, archaeal, fungal
or protozoan--infection.
[0358] Means and methods of treating a microbial infection are well
known to the one skilled in the art, and include, without
limitation, the administration of at least one antiviral,
antibacterial, antifungal or antiprotozoal agent to the
subject.
[0359] Suitable examples of antiviral agents include, without
limitation, those classified in the therapeutic subgroup J05 of the
Anatomical Therapeutic Chemical Classification System. Further
examples include, but are not limited to, acemannan, acyclovir,
acyclovir sodium, adamantanamine, adefovir, adenine arabinoside,
alovudine, alvircept sudotox, amantadine hydrochloride, aranotin,
arildone, atevirdine mesylate, avridine, cidofovir, cipamfylline,
cytarabine hydrochloride, BMS 806, C31G, carrageenan, zinc salts,
cellulose sulfate, cyclodextrins, dapivirine, delavirdine mesylate,
desciclovir, dextrin 2-sulfate, didanosine, disoxaril,
dolutegravir, edoxudine, enviradene, envirozime, etravirine,
famciclovir, famotine hydrochloride, fiacitabine, fialuridine,
fosarilate, foscarnet sodium, fosfonet sodium, FTC, ganciclovir,
ganciclovir sodium, GSK 1265744, 9-(2-hydroxyethoxymethyl)guanine,
ibalizumab, idoxuridine, interferon, 5-iodo-2'-deoxyuridine,
IQP-0528, kethoxal, lamivudine, lobucavir, maraviroc, memotine,
pirodavir, penciclovir, raltegravir, ribavirin, rimantadine
hydrochloride, rilpivirine (TMC-278), saquinavir mesylate, SCH-C,
SCH-D, somantadine hydrochloride, sorivudine, statolon, stavudine,
T20, tilorone hydrochloride, TMC120, TMC125, trifluridine,
trifluorothymidine, tenofovir, tenofovir alafenamide, tenofovir
disoproxil fumarate, prodrugs of tenofovir, UC-781, UK-427, UK-857,
valacyclovir, valacyclovir hydrochloride, vidarabine, vidarabine
phosphate, vidarabine sodium phosphate, viroxime, zalcitabine,
zidovudine, zinviroxime, and combinations thereof.
[0360] Suitable examples of antibacterial agents include, without
limitation, those classified in the therapeutic subgroup J01 of the
Anatomical Therapeutic Chemical Classification System. Further
examples include, but are not limited to, aminoglycosides (such as,
e.g., amikacin, gentamicin, kanamycin, neomycin, netilmicin,
streptomycin, tobramycin, paromomycin, and the like), ansamycins
(such as, e.g., geldanamycin, herbimycin and the like),
carbacephems (such as, e.g., loracarbef and the like), carbapenems
(such as, e.g., ertapenem, doripenem, imipenem, cilastatin,
meropenem, and the like), first generation cephalosporins (such as,
e.g., cefadroxil, cefazolin, cefalotin, cephalexin, and the like),
second generation cephalosporins (such as, e.g., cefaclor,
cefamandole, cefoxitin, cefprozil, cefuroxime, and the like), third
generation cephalosporins (such as, e.g., cefixime, cefdinir,
cefditoren, cefoperazone, cefotaxime, cefpodoxime, ceftazidime,
ceftibuten, ceftizoxime, ceftriaxone, and the like), fourth
generation cephalosporins (such as, e.g., cefepime and the like),
fifth generation cephalosporins (such as, e.g., ceftobiprole, and
the like), glycopeptides (such as, e.g., teicoplanin, vancomycin,
and the like), macrolides (such as, e.g., azithromycin,
clarithromycin, dirithromycin, erythromycin, roxithromycin,
troleandomycin, telithromycin, spectinomycin, and the like),
monobactams (such as, e.g., aztreonam, and the like), penicillins
(such as, e.g., amoxicillin, ampicillin, azlocillin, carbenicillin,
cloxacillin, dicloxacillin, flucloxacillin, mezlocillin,
meticillin, nafcillin, oxacillin, penicillin, piperacillin,
ticarcillin, and the like), antibiotic polypeptides (such as, e.g.,
bacitracin, colistin, polymyxin B, and the like), quinolones (such
as, e.g., ciprofloxacin, enoxacin, gatifloxacin, levofloxacin,
lomefloxacin, moxifloxacin, norfloxacin, ofloxacin, trovafloxacin,
and the like), sulfonamides (such as, e.g., mafenide, prontosil,
sulfacetamide, sulfamethizole, sulfanilamide, sulfasalazine,
sulfisoxazole, trimethoprim, trimethoprim-sulfamethoxazole, and the
like), tetracyclines (such as, e.g., demeclocycline, doxycycline,
minocycline, oxytetracycline, tetracycline, and the like), other
antibiotics (such as, e.g., arsphenamine, chloramphenicol,
clindamycin, lincomycin, ethambutol, fosfomycin, fusidic acid,
furazolidone, isoniazid, linezolid, metronidazole, mupirocin,
nitrofurantoin, platensimycin, pyrazinamide,
quinupristin/dalfopristin, rifampin/rifampicin, tinidazole, and the
like), and combinations thereof.
[0361] Suitable examples of antifungal agents include, without
limitation, those classified in the therapeutic subgroup J02 of the
Anatomical Therapeutic Chemical Classification System. Further
examples include, but are not limited to, abafungin, albaconazole,
amorolfine, amphotericin B, anidulafungin, atovaquone, biafungin,
bifonazole, bromochlorosalicylanilide, butenafine, butoconazole,
caspofungin, chlormidazole, chlorophetanol, chlorphenesin,
ciclopirox, cilofungin, citronella oil, clotrimazole, croconazole,
crystal violet, dapsone, dimazole, eberconazole, econazole,
efinaconazole, ethylparaben, fenticonazole, fluconazole,
flucytosine, flutrimazole, fosfluconazole, griseofulvin,
haloprogin, hamycin, hexaconazole, isavuconazole, isoconazole,
itraconazole, ketoconazole, lemon grass, lemon myrtle,
luliconazole, micafungin, miconazole, naftifine, natamycin,
neticonazole, nystatin, omoconazole, orange oil, oxiconazole,
patchouli, pentamidine, polynoxylin, posaconazole, potassium
iodide, ravuconazole, salicylic acid, selenium disulfide,
sertaconazole, sodium thiosulfate, sulbentine, sulconazole,
taurolidine, tavaborole, tea tree oil, terbinafine, terconazole,
ticlatone, tioconazole, tolciclate, tolnaftate, tribromometacresol,
undecylenic acid, voriconazole, Whitfield's ointment, and
combinations thereof.
[0362] Suitable examples of antiprotozoal agents include, without
limitation, those classified in the therapeutic subgroup P01 of the
Anatomical Therapeutic Chemical Classification System. Further
examples include, but are not limited to, albendazole, amodiaquine,
amphotericin B, arsthinol, artemether, artemisinin, artemotil,
arterolane, artesunate, atovaquone, azanidazole, benznidazole,
broxyquinoline, carnidazole, chiniofon, chlorhexidine, chloroquine,
chlorproguanil, chlorquinaldol, clefamide, clindamycin, clioquinol,
dehydroemetine, difetarsone, dihydroartemisinin,
diiodohydroxyquinoline, diloxanide, doxycycline, eflornithine,
emetine, etofamide, fexinidazole, fumagillin, furazolidone,
glycobiarsol, halofantrine, hydroxychloroquine, iodoquinol,
lumefantrine, mefloquine, meglumine antimoniate, melarsoprol,
mepacrine, metronidazole, miltefosine, nifurtimox, nimorazole,
nitazoxanide, ornidazole, pamaquine, paromomycin, pentamidine,
phanquinone, piperaquine, primaquine, proguanil, propamidine,
propenidazole, pyrimethamine, pyronaridine, quinacrine, quinidine,
quinine, secnidazole, sodium stibogluconate, sulfadiazine,
sulfadoxine, sulfalene, sulfamethoxazole, suramin, tafenoquine,
teclozan, tenonitrozole, tetracycline, tilbroquinol, tinidazole,
trimethoprim, trimetrexate, and combinations thereof.
[0363] Another object of the present invention is a method for
assessing the risk of microbial--preferably viral, bacterial,
archaeal, fungal or protozoan--contamination in a sample.
[0364] In one embodiment, the method for assessing the risk of
microbial--preferably viral, bacterial, archaeal, fungal or
protozoan--contamination according to the present invention
comprises a step of providing a sample. In one embodiment, the
sample may be a biological sample or a non-biological sample.
[0365] In one embodiment, the method for assessing the risk of
microbial--preferably viral, bacterial, archaeal, fungal or
protozoan--contamination according to the present invention
comprises a step of performing any of the methods for
discriminating between transcriptionally-active and inert
microbial--preferably viral, bacterial, archaeal, fungal or
protozoan--nucleic acid sequences in a sample according to the
present invention.
[0366] In one embodiment, the method for assessing the risk of
microbial--preferably viral, bacterial, archaeal, fungal or
protozoan--contamination according to the present invention
comprises a step of concluding that the sample is at risk of being
contaminated if the at least one identified microbial--preferably
viral, bacterial, archaeal, fungal or protozoan--nucleic acid
sequence hit belongs to a live microbe--preferably virus,
bacterium, archaeon, fungus or protozoan.
BRIEF DESCRIPTION OF THE DRAWINGS
[0367] FIG. 1 is a set of two graphs illustrating the substitution
rates and substitution indexes of T nucleotides. FIG. 1A:
substitution rates of T nucleotides expressed as the ratio of
substituted T to the total of T. FIG. 1B: Substitution indexes
expressed as the ratio of the "T-to-C" substitution rate to the
average of "T-to-A"+"T-to-G" substitution rates. [T: TBEV; S: SMRV;
C: cellular RNAs].
[0368] FIG. 2 is a graph illustrating the substitution rates (in %)
of T nucleotides to C, G or A, in a sample treated with 4sU and
alkylated, using as microbial nucleic acid sequence hit a TBEV
consensus sequence built from data of the current condition.
[0369] FIG. 3 is a graph illustrating the substitution rates (in %)
of T nucleotides to C, G or A, in a sample treated with 4sU and
alkylated, using as microbial nucleic acid sequence hit a SMRV
consensus sequence built from data of the current condition.
[0370] FIG. 4 is a graph showing the GC content distribution of the
662 contigs selected as candidates for LC5_ALAID_CNS reference
genome reconstruction.
[0371] FIG. 5 is a graph showing the tiling of the A. laidlawii
str. PG8A genome with the 662 selected contigs from the initial
assembly of reads of the LC5 experimental condition. Matching
contigs (forward in black, reverse in grey) are reported at their
real percentage of similarity (upper part) and normalized at 10%
similarity to flatten the coverage and ease visualization.
[0372] FIG. 6 is a set of seven graphs illustrating the
substitution rates (or conversion rate) (in %) of T nucleotides to
C, G or A along the LC5_ALAID_CNS reference sequence for the
different tested conditions. Only high-confidence events
(>20× depth) have been selected for the analysis. FIG. 6A:
CTRL5tag condition; FIG. 6B: LC5 condition; FIG. 6C: LC5tag
condition; FIG. 6D: 40-fold diluted LC5tag condition; FIG. 6E:
HC_HK5tag condition; FIG. 6F: HC_HK5tag condition; FIG. 6G:
HC_G5tag condition.
EXAMPLES
[0373] The above and other aspects and features of the present
invention will be further illustrated by the following examples.
These examples are illustrative only and not intended to be
limiting.
Example 1: Detection of Replicating Tick-Borne Encephalitis Virus
(TBEV) in Cultured Vero Cells
[0374] Materials and Methods
[0375] Material
[0376] Vero cells were grown in minimum essential medium (MEM)
supplemented with 2% fetal bovine serum (FBS). The virus used for
infection is the Tick-Borne Encephalitis Virus (TBEV), a member of
the family Flaviviridae, consisting of a ssRNA(+) genome with an
average size of 10 kb.
[0377] Methods
[0378] Virus Infection
[0379] Vero cells were plated at 400 000 cells/well in 3 wells of a
MW6 plate, in order to reach 10⁶ cells/well after 24
hours.
[0380] Cells were then infected with TBEV at an MOI
(multiplicity of infection) of 1 and incubated for 1 hour on ice with
agitation.
[0381] For one well, the medium was removed just after the
incubation, and the cells were lysed with 1 mL of TRIzol and stored
at -80 °C until RNA extraction (Condition 1).
[0382] For the two other wells (Conditions 2 and 3), the medium was
removed and replaced with MEM + 2% FBS, and the cells were incubated
overnight at 37 °C.
[0383] 4sU Labelling
[0384] This step was performed using the SLAMseq Kinetic
kit--Anabolic Kinetics Module (Lexogen, Cat. No. 061).
[0385] Adding 4-thiouridine (4sU) to the cell culture medium during
culture allows 4sU nucleotides to be incorporated into newly
synthesized RNA.
[0386] The medium containing 800 µM 4sU was prepared by adding 8
µL of 100 mM 4sU to 992 µL of MEM.
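The volumes in this paragraph follow from the dilution relation C_stock × V_stock = C_final × V_final. With a final volume of 1 mL (8 µL + 992 µL) and a target of 800 µM, the stock must be 100 mM; a nanomolar stock would only yield nanomolar 4sU. The sketch below checks that arithmetic; the helper name `stock_volume_ul` is hypothetical.

```python
def stock_volume_ul(final_conc_um: float, final_volume_ul: float, stock_conc_um: float) -> float:
    """Volume of stock to add so that C_stock * V_stock = C_final * V_final."""
    return final_conc_um * final_volume_ul / stock_conc_um


stock_um = 100_000  # 100 mM 4sU stock expressed in µM
v_stock = stock_volume_ul(800, 1000, stock_um)
print(v_stock, 1000 - v_stock)  # 8.0 992.0
```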
[0387] The day following the viral infection, the medium was
removed and replaced with medium without 4sU in one well (Condition
2) or with 4sU-containing medium (800 µM) in the last well
(Condition 3). Six hours later, the medium was removed and
replaced with fresh medium without 4sU in Condition 2, or with fresh
4sU-containing medium (800 µM) in Condition 3.
[0388] Three hours later, the medium was removed from the three
wells, and the cells were lysed with 1 mL of TRIzol and stored
at -80 °C until RNA extraction.
[0389] RNA Sampling
[0390] This step was performed using the SLAMseq Kinetic
kit--Anabolic Kinetics Module (Lexogen, Cat. No. 061).
[0391] The RNA extraction was performed in the dark and using a
chloroform:isoamyl alcohol mix 24:1 (Sigma Aldrich, Cat. No. 25666)
followed by isopropanol/ethanol precipitation. During extraction,
reducing agent (RA) was used to maintain the 4sU-treated samples
under reducing conditions.
[0392] The isolated total RNA contains both existing (unlabeled)
and newly synthesized (labeled) RNA.
[0393] Alkylation
[0394] This step was performed using the SLAMseq Kinetic
kit--Anabolic Kinetics Module (Lexogen, Cat. No. 061).
[0395] Total extracted RNA of Condition 3 was mixed with
iodoacetamide (IAA), which modifies the 4-thiol group of
4sU-containing nucleotides via the addition of a carboxyamidomethyl
group. The RNA was then purified using ethanol precipitation prior
to proceeding to library preparation.
[0396] Library Preparation
[0397] The SMARTer Stranded Total RNA-Seq Kit--Pico Input Mammalian
(ClonTech) was used for a direct construction of libraries starting
with 10 ng of RNA. The workflow used in this kit incorporates a
proprietary technology (PathoQuest, Paris, France) that depletes
ribosomal cDNA using probes specific to mammalian rRNA and some
mitochondrial RNA.
[0398] Sequencing
[0399] Sequencing was performed on the NextSeq instrument
(Illumina) using the NextSeq 500/550 High Output kit v2
(FC-404-2002, Illumina).
[0400] Sequencing was single-read with a read length of 150
nucleotides, and approximately 125 million reads per sample were
generated.
[0401] Outline
[0402] Table 1 below summarizes the protocol used for the three
different Conditions (1, 2 and 3).
TABLE 1 Outline of the protocol
Step | Condition 1 | Condition 2 | Condition 3
Step 1 | TBEV infection | TBEV infection | TBEV infection
Step 2 | RNA extraction (day 0) | RNA extraction (day 1) | 4sU labelling
Step 3 | Library preparation | Library preparation | RNA extraction (day 1)
Step 4 | Sequencing | Sequencing | Alkylation
Step 5 | -- | -- | Library preparation
Step 6 | -- | -- | Sequencing
[0403] Bioinformatic Analysis--TBEV Genome Analysis
[0404] The first objective of this study was to obtain a complete
TBEV genome sequence of this isolate to be used as a reference.
This analysis was performed on the infected sample at day 0,
without 4sU treatment.
[0405] Raw Reads Filtering
[0406] First, the raw data reads were filtered to select
high-quality and relevant reads.
[0407] Raw data were processed with proprietary software to remove
or trim duplicates, low-quality reads and homopolymers. Sequences
introduced during Illumina® library preparation (adapters, primers)
were removed with Skewer (Jiang et al., 2014. BMC Bioinformatics.
15:182).
[0408] Finally, endogenous primate reads (from Vero cells) aligned
to the human genome (Reference GRCh37/hg19) or reads aligned to
bacterial rRNA were discarded.
[0409] Local alignments were performed with BWA (Li et al., 2009.
Bioinformatics. 25(14):1754-60).
[0410] The human genome was downloaded from the UCSC Genome Browser
(Kent et al., 2002. Genome Res. 12(6):996-1006). The bacterial rRNA
database was downloaded from the EMBL-EBI ENA rRNA database,
followed by an additional in-house sequence cleaning and clustering
process.
[0411] These filtered reads were considered as sequences of
interest.
[0412] De Novo Assembly
[0413] The set of remaining and relevant reads was then assembled
into longer sequences named contigs. This de novo assembly step was
performed with CLC assembly cell solution (Qiagen).
[0414] Agnostic Virus Identification
[0415] The resulting contigs and non-assembled reads (singletons)
were aligned with BLAST (Altschul et al., 1990. J Mol Biol.
215(3):403-10) against viral and comprehensive databases. Contigs and
singletons were first aligned against a viral nucleotide database.
Hits with an e-value below 10^-3 were then aligned against a
comprehensive nucleotide database and were reported if their best
hit was still assigned to a viral taxonomy.
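The two-step nucleotide assignment above can be sketched as follows. This is a minimal illustration, assuming each query's best hits against the viral and comprehensive databases are already available as simple records; the `Hit` record and its `is_viral` taxonomy flag are hypothetical and do not reflect the proprietary pipeline.

```python
from dataclasses import dataclass
from typing import Optional

E_VALUE_CUTOFF = 1e-3  # viral-database hits must have an e-value below 10^-3

@dataclass
class Hit:
    subject: str    # hypothetical subject identifier
    evalue: float
    is_viral: bool  # hypothetical flag derived from the subject's taxonomy

def report_viral_hit(viral_best: Optional[Hit],
                     comprehensive_best: Optional[Hit]) -> Optional[Hit]:
    """Return the hit to report, or None.

    Step 1: the query must hit the viral database with e-value < cutoff.
    Step 2: its best hit in the comprehensive database must still be viral.
    """
    if viral_best is None or viral_best.evalue >= E_VALUE_CUTOFF:
        return None
    if comprehensive_best is None or not comprehensive_best.is_viral:
        return None
    return comprehensive_best

# A contig whose best comprehensive hit is still viral is reported...
print(report_viral_hit(Hit("TBEV", 1e-50, True), Hit("TBEV", 1e-52, True)))
# ...whereas one better explained by a non-viral sequence is not.
print(report_viral_hit(Hit("SMRV", 1e-20, True), Hit("host", 1e-40, False)))
```

The second step guards against queries that resemble a virus only because the viral database lacks their true (e.g. host) origin.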
[0416] The nucleotide viral and comprehensive databases were
downloaded in November 2017 from the EMBL-EBI nucleotide sequence
database STD. Proprietary software was developed to remove
duplicated and low-confidence sequences (e.g., too short, multiple
taxonomies, low-quality associated keywords).
[0417] The contigs without any viral nucleotide hit were similarly
aligned successively on viral and comprehensive protein databases
to check for more distant viral hits.
[0418] The protein viral and comprehensive databases were downloaded
in November 2017 from the UniRef100 database. Although UniRef100 is
already non-redundant, a taxonomic cleaning process was performed to
produce the final databases.
[0419] The taxonomic assignment reported the best hit results.
Contigs not assigned after these two rounds of alignment were
classified as unknown or non-viral species.
[0420] Table 2 below shows the results of the analysis.
TABLE 2 Agnostic virus identification
Parameter | Condition 1 | Condition 2 | Condition 3
Total reads | 153379022 | 153360490 | 149549706
FILTERING
Duplicates/Quality | 84575371 | 85885045 | 73631007
Adapters | 84575497 | 85884068 | 73629933
Host | 19348675 | 21678833 | 27726895
rRNA | 19314195 | 21637189 | 27664792
ASSEMBLY
Contigs | 32777 | 246669 | 460995
Singletons | 2800174 | 2937361 | 3951554
% assembled reads | 85.50% | 86.42% | 85.72%
TBEV RESULTS
Nb contigs | 4 | 127 | 143
Nb reads in contigs | 100031 | 2514666 | 2255875
Nb singletons | 145 | 5052 | 3932
Total reads | 100176 | 2519718 | 2259807
Average contigs identity (%) | 91.23% | 92.06% | 91.71%
ADDITIONAL CLOSE SPECIES (TOTAL READS)
Machupo mammarenavirus 1 2
Louping ill virus 14
Bovine viral diarrhea virus 1 12 7 9
Bovine viral diarrhea virus 2 1
Bovine viral diarrhea virus 3 1
Singapore grouper iridovirus 48 41
Rotavirus C 1
Cercopithecine betaherpesvirus 5 25
Stealth virus 5 1
Human gammaherpesvirus 8 23 6
uncultured virus 30
Simian retrovirus 3 17
Squirrel monkey retrovirus 923291 837907 1064215
Primate T-lymphotropic virus 1 5 29 10
Baboon endogenous virus 8503 8691 7710
Feline leukemia virus 2 3
Human endogenous retrovirus 478 280 529
Human endogenous retrovirus K 16 35 37
Human endogenous retrovirus W 64 53 37
Retroviridae (no genus, no species) 4
[0421] TBEV Final Consensus Edition
[0422] This process identified a contig encompassing the full TBEV
genome sequence.
[0423] All the reads were then realigned with CLC assembly cell
solution (Qiagen) on this sequence to extract a final consensus
sequence. This sequence was labeled "TBEV REFERENCE" for the
study.
[0424] Bioinformatic Analysis--Nucleotides Substitution Rate
Study
[0425] The objective was to compare the reads from the different
samples to check if the "T-to-C" substitution rate was
significantly higher in the 4sU+alkylation sample (Condition
3).
[0426] Tick-Borne Encephalitis Virus Bank Creation
[0427] The "TBEV REFERENCE" sequence was used to create a blast
bank.
[0428] In order to detect potential sequences with a very high
"T-to-C" substitution rate, the reference sequence was also
modified by substituting every T by a C. This sequence was named
"TBEV T-C REFERENCE". The "TBEV REFERENCE" and "TBEV T-C REFERENCE"
sequences were merged together to form a single "TBEV BLAST"
bank.
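Building the T-to-C reference and merging it with the original into a single bank can be sketched in Python as below. The toy sequence and FASTA record names are placeholders, not the actual TBEV consensus, and indexing the resulting FASTA as a BLAST bank is left to the BLAST tooling.

```python
def t_to_c(seq: str) -> str:
    """Return a copy of a DNA sequence with every T (and lowercase t) replaced by C."""
    return seq.replace("T", "C").replace("t", "c")

def to_fasta(records: dict[str, str], width: int = 60) -> str:
    """Format {name: sequence} as FASTA text with fixed-width sequence lines."""
    lines = []
    for name, seq in records.items():
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"

reference = "AGATTTCTTGCACGTGCATGCGTTTGC"  # toy placeholder, not the TBEV consensus
bank = {
    "TBEV_REFERENCE": reference,
    "TBEV_T-C_REFERENCE": t_to_c(reference),
}
print(to_fasta(bank))
```

Keeping both records in one bank means a read carrying many T>C conversions can still align well to at least one of them.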
[0429] Raw Reads Filtering
[0430] First, a quality filtering process was performed to remove
or trim low-quality reads (proprietary software).
[0431] Then sequences introduced during Illumina libraries
preparation (adapters, primers) were removed with Skewer (Jiang et
al., 2014. BMC Bioinformatics. 15:182).
[0432] To avoid any analysis bias, duplicated reads were not
removed.
[0433] Filtered Reads Blast on TBEV Blast Bank
[0434] The remaining relevant reads were then aligned by BLAST
(Altschul et al., 1990. J Mol Biol. 215(3):403-10) against the
previously designed "TBEV BLAST" bank. The maximal e-value was set
to 10^-8.
[0435] All aligned reads were considered as TBEV positive and
selected for the next step of the analysis.
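Selecting reads by e-value can be sketched on BLAST tabular output. This Python sketch assumes the default 12-column tabular format (`-outfmt 6`), in which the e-value is the 11th column; the sample lines are invented for illustration.

```python
MAX_EVALUE = 1e-8  # maximal e-value used against the "TBEV BLAST" bank

def select_positive_reads(tabular_lines, max_evalue=MAX_EVALUE):
    """Return the set of query (read) IDs with at least one hit at or below max_evalue.

    Expects tab-separated BLAST '-outfmt 6' lines: qseqid sseqid pident length
    mismatch gapopen qstart qend sstart send evalue bitscore.
    """
    selected = set()
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        read_id, evalue = fields[0], float(fields[10])
        if evalue <= max_evalue:
            selected.add(read_id)
    return selected

example = [
    "read1\tTBEV_REFERENCE\t99.3\t150\t1\t0\t1\t150\t1001\t1150\t2e-70\t270",
    "read2\tTBEV_T-C_REFERENCE\t96.0\t150\t6\t0\t1\t150\t2001\t2150\t5e-60\t240",
    "read3\tTBEV_REFERENCE\t85.0\t40\t6\t0\t1\t40\t500\t539\t0.002\t40",
]
print(sorted(select_positive_reads(example)))  # ['read1', 'read2']
```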
[0436] Mapping of Selected Reads on TBEV Complete Genome
[0437] TBEV-positive reads were then realigned by mapping with CLC
assembly cell solution (Qiagen) on the "TBEV REFERENCE" sequence. A
quality control was set to ensure that at least 99% of the
blast-selected reads were positively realigned on the
reference.
[0438] Table 3 below summarizes the number of mapped sense and
antisense reads obtained for each condition and the resulting
coverage of the sequence.
TABLE 3 Reads mapping, orientation and coverage
Condition | Total reads (coverage %) | Sense reads (coverage %) | Antisense reads (coverage %) | % of antisense reads
Condition 1 | 160085 (100.00) | 159647 (100.00) | 438 (45.48) | 0.27
Condition 2 | 6408291 (100.00) | 6385645 (100.00) | 22646 (99.06) | 0.353
Condition 3 | 5240211 (100.00) | 5221070 (100.00) | 19141 (95.95) | 0.365
[0439] Substitutions Rate Estimation
[0440] The CLC program "clc_find_variations" was used to detect
every mismatch at every position of the TBEV study reference. The
global variation profile was then analyzed by a proprietary script
to determine each nucleotide substitution rate, i.e., the proportion
of substituted nucleotides relative to the total number of aligned
nucleotides. Typically, the "T-to-C" substitution rate is
calculated using the following formula:
$$\text{T}\rightarrow\text{C substitution rate} = \frac{\text{Number of C nucleotides identified when a T was expected}}{\text{Total number of aligned nucleotides}}$$
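How a mismatch profile turns into substitution rates can be illustrated with a small tally. This Python sketch assumes aligned (reference base, read base) pairs have already been extracted from the mapping; it does not reproduce the CLC tool or the proprietary script, and it uses the denominator of the formula above (all aligned nucleotides).

```python
from collections import Counter

def substitution_rates(aligned_pairs):
    """Tally (reference base, read base) pairs.

    Returns ({"X-to-Y": rate}, total aligned), where each rate is the number of
    positions with reference base X read as Y, divided by all aligned nucleotides.
    """
    counts = Counter()
    total = 0
    for ref_base, read_base in aligned_pairs:
        total += 1
        if ref_base != read_base:
            counts[f"{ref_base}-to-{read_base}"] += 1
    return {kind: n / total for kind, n in counts.items()}, total

# Toy alignment: reference "TTAGCT" read as "TCAGCT" and as "CTAGTT"
pairs = list(zip("TTAGCT", "TCAGCT")) + list(zip("TTAGCT", "CTAGTT"))
rates, total = substitution_rates(pairs)
print(total, rates)
```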
[0441] TBEV Stranded Analysis
[0442] A targeted and stranded analysis was performed on the TBEV
identified reads. This analysis performs a more stringent mapping
alignment of filtered reads. This alignment provides a detailed
horizontal genome coverage and depth profile.
[0443] Local alignments were performed with BWA (Li et al., 2009.
Bioinformatics. 25(14):1754-60).
[0444] Since the samples' libraries were prepared using the SMARTer
Stranded RNA-Seq Kit, the RNA strand information was retained.
Therefore, a mapping alignment analysis provided information on the
mother strand of each read (sense or reverse relative to the mother
strand).
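Counting sense and antisense reads from a mapping can be sketched with the SAM FLAG field, in which bit 0x10 marks a read aligned to the reverse strand of the reference. This is a minimal sketch: whether a reverse-strand alignment corresponds to sense or antisense viral RNA depends on the stranded library chemistry, so the `reverse_means_antisense` switch is an assumption to be set for the kit actually used. The SAM lines are invented.

```python
REVERSE_FLAG = 0x10          # SAM FLAG: SEQ is reverse complemented
UNMAPPED_FLAG = 0x4          # SAM FLAG: segment unmapped
SECONDARY_OR_SUPP = 0x100 | 0x800  # secondary / supplementary alignments

def count_strands(sam_lines, reverse_means_antisense=True):
    """Return (sense, antisense) counts of primary mapped reads from SAM text lines."""
    sense = antisense = 0
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        flag = int(line.split("\t")[1])
        if flag & (UNMAPPED_FLAG | SECONDARY_OR_SUPP):
            continue  # count each mapped read once
        is_reverse = bool(flag & REVERSE_FLAG)
        if is_reverse == reverse_means_antisense:
            antisense += 1
        else:
            sense += 1
    return sense, antisense

example = [
    "@SQ\tSN:TBEV_REFERENCE\tLN:11000",
    "r1\t0\tTBEV_REFERENCE\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tTBEV_REFERENCE\t200\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t16\tTBEV_REFERENCE\t300\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
sense, antisense = count_strands(example)
print(f"sense={sense} antisense={antisense} "
      f"% antisense={100 * antisense / (sense + antisense):.2f}")
```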
[0445] The transcript coverage made it possible to draw conclusions
about the viral replication signature in the cell sample.
[0446] Results and Conclusion
[0447] The ratio of the "T-to-C" substitution rate in Condition 3
to that in Condition 1, both calculated on the mismatches mapped on
the TBEV reference genome, is 7.86:1 (Table 4).
[0448] This indicates an increase in the proportion of TBEV RNA
species that have incorporated 4sU, and hence neo-synthesis of
viral RNA during the 9-hour incubation of the Vero cell culture in
4sU-containing medium.
[0449] The method exemplified here thus allows detection of the
replicating (+)ssRNA virus TBEV using metabolic labelling.
TABLE 4 T-to-C substitution rate
Parameter | Condition 1 | Condition 2 | Condition 3
Total mapped nt | 22613213 | 906722034 | 731556753
Total mismatches | 64625 | 2551959 | 3131703
Mismatch rate | 0.29% | 0.28% | 0.43%
Number of T→C substitutions | 4751 | 189637 | 1209349
T→C substitutions / total mismatches | 7.35% | 7.43% | 38.62%
T→C substitutions / total mapped nt | 0.02% | 0.02% | 0.17%
Other nt substitution rates / total mapped nt
A→C | 0.03% | 0.03% | 0.02%
A→G | 0.02% | 0.02% | 0.02%
A→T | 0.02% | 0.02% | 0.01%
C→A | 0.05% | 0.04% | 0.04%
C→G | 0.01% | 0.01% | 0.01%
C→T | 0.02% | 0.02% | 0.03%
T→A | 0.03% | 0.03% | 0.03%
T→G | 0.02% | 0.02% | 0.02%
G→A | 0.03% | 0.03% | 0.03%
G→C | 0.03% | 0.03% | 0.03%
G→T | 0.02% | 0.02% | 0.02%
Minimum | 0.01% | 0.01% | 0.01%
Maximum | 0.05% | 0.04% | 0.04%
Average | 0.02% | 0.02% | 0.02%
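The 7.86:1 ratio reported above can be approximately recomputed from the raw counts in Table 4 (T→C substitutions divided by total mapped nucleotides). This Python sketch gives about 7.87; the small difference from the reported value presumably reflects rounding in the original calculation.

```python
# Counts from Table 4 (TBEV): T->C substitutions and total mapped nucleotides
table4 = {
    "Condition 1": {"t_to_c": 4751, "mapped_nt": 22613213},
    "Condition 2": {"t_to_c": 189637, "mapped_nt": 906722034},
    "Condition 3": {"t_to_c": 1209349, "mapped_nt": 731556753},
}

rates = {cond: c["t_to_c"] / c["mapped_nt"] for cond, c in table4.items()}
for cond, rate in rates.items():
    print(f"{cond}: T->C rate = {100 * rate:.3f}% of mapped nt")

ratio = rates["Condition 3"] / rates["Condition 1"]
print(f"Condition 3 / Condition 1 = {ratio:.2f}:1")
```

Conditions 1 and 2 (both without 4sU) give nearly identical rates, which is what makes Condition 1 usable as the background reference.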
Example 2: Detection of Replicating Squirrel Monkey Retrovirus
(SMRV) in Cultured Vero Cells
[0450] Following agnostic virus identification in Example 1, the
best-hit results showed contigs assigned to Squirrel Monkey
Retrovirus. This virus is known to be endogenous and fully
integrated in some monkey species. In particular, the Vero cells
used in this study have been described to harbor a variety of
simian endogenous type D retrovirus sequences, in particular SMRV
sequences (Sakuma et al., 2018. Sci Rep. 8(1):644).
[0451] Based on this knowledge and in view of the results shown in
Table 2 above, the same bioinformatic procedure was carried out to
identify an SMRV sequence hit and to study the nucleotide
substitution rate in this sequence hit.
[0452] Table 5 below summarizes the number of mapped sense and
antisense reads obtained for each condition and the resulting
coverage of the sequence.
TABLE 5 Reads mapping, orientation and coverage
Condition | Total reads (coverage %) | Sense reads (coverage %) | Antisense reads (coverage %) | % of antisense reads
Condition 1 | 1807960 (100.00) | 1800981 (100.00) | 6979 (45.48) | 0.386
Condition 2 | 1601090 (100.00) | 1594681 (100.00) | 6409 (98.75) | 0.400
Condition 3 | 1816788 (100.00) | 1808031 (100.00) | 8757 (95.99) | 0.482
[0453] The ratio of the "T-to-C" substitution rate in Condition 3
to that in Condition 1, both calculated on the mismatches mapped on
the SMRV reference genome, is 41.86:1 (Table 6).
[0454] This indicates an increase in the proportion of SMRV RNA
species that have incorporated 4sU, and hence neo-synthesis of
viral RNA during the 9-hour incubation of the Vero cell culture in
4sU-containing medium.
[0455] The method exemplified here thus allows detection of the
replicating (+)ssRNA-RT virus SMRV using metabolic labelling.
TABLE 6 T-to-C substitution rate
Parameter | Condition 1 | Condition 2 | Condition 3
Total mapped nt | 256422353 | 227104801 | 253753828
Total mismatches | 1288384 | 1132542 | 3531923
Mismatch rate | 0.502% | 0.499% | 1.392%
Number of T→C substitutions | 54668 | 47393 | 2230483
T→C substitutions / total mismatches | 4.243% | 4.185% | 63.152%
T→C substitutions / total mapped nt | 0.021% | 0.021% | 0.879%
Other nt substitution rates / total mapped nt
A→C | 0.08% | 0.08% | 0.07%
A→G | 0.03% | 0.03% | 0.04%
A→T | 0.02% | 0.02% | 0.02%
C→A | 0.10% | 0.10% | 0.09%
C→G | 0.03% | 0.03% | 0.03%
C→T | 0.02% | 0.02% | 0.04%
T→A | 0.05% | 0.05% | 0.06%
T→G | 0.05% | 0.05% | 0.07%
G→A | 0.06% | 0.06% | 0.07%
G→C | 0.03% | 0.02% | 0.03%
G→T | 0.02% | 0.02% | 0.02%
Minimum | 0.02% | 0.02% | 0.02%
Maximum | 0.10% | 0.10% | 0.09%
Average | 0.04% | 0.04% | 0.05%
Example 3
[0456] Materials and Methods
[0457] Cells and Viruses
[0458] A vial of Vero cells (ATCC CCL-81, batch #62488537,
Molsheim, France) was frozen at passage 3, then thawed in a BSL-3
laboratory, and the cells were grown in MEM supplemented with 10%
FBS. Cells were used at passage 18.
[0459] A second vial of Vero cells (batch #70005907) was bought
from the same source and used directly for PCR testing.
[0460] Tick-Borne Encephalitis Virus (TBEV) is a member of the
family Flaviviridae, with a positive-sense single-stranded RNA
(ssRNA(+)) genome. The Hypr strain (Wallner et al., 1996. J Gen
Virol. 77(Pt 5):1035-42) was kindly supplied by Sarah Moutailler
(ANSES, Maisons-Alfort, France).
[0461] TBEV Infection of Vero Cells
[0462] Vero cells were plated at 400,000 cells/well in 3 wells of an
MW6 plate in order to reach 10^6 cells/well after 24 hours. Cells
were then infected with the TBEV Hypr strain at a multiplicity of
infection of 1 and incubated for 1 hour on ice with agitation.
[0463] The medium was removed from one well immediately after
incubation, and the cells were lysed with 1 mL of TRIzol and stored
at -80 °C until RNA extraction (Condition "D0--no 4sU").
[0464] For the other two wells (which gave rise to Conditions 2, 3
and 4), the medium was removed and replaced by MEM + 10% FBS, and
the cells were incubated overnight at 37 °C.
[0465] 4sU Labelling and RNA Extraction
[0466] Adding 4-thiouridine (4sU) to the cell culture medium enables
4sU nucleotides to be incorporated into newly synthesized RNA.
During reverse transcription, 4sU causes a certain percentage of
misincorporation, resulting in T>C transitions in the cDNA that can
be identified by sequencing (Herzog et al., 2017. Nat Methods.
14(12):1198-1204).
[0467] The medium containing 800 µM 4sU was prepared by adding 8
µL of 100 mM 4sU to 992 µL of MEM. The day after viral infection,
the medium was removed and replaced either by medium without 4sU in
one well (Condition "D1--no 4sU") or by 4sU-containing medium
(800 µM) in the other well (Condition "D1--with 4sU").
[0468] Six hours later, the medium was removed and replaced by
fresh medium without 4sU in condition "D1--no 4sU", or by fresh
4sU-containing medium (800 µM) in condition "D1--with 4sU".
[0469] Three hours later, the medium was removed from the three
wells, and the cells were lysed with 1 mL of TRIzol and stored at
-80 °C until RNA extraction.
[0470] RNA extraction was performed in the dark using a
chloroform:isoamyl alcohol mix 24:1 (Sigma Aldrich, Cat. No. 25666,
Saint Louis, USA) followed by isopropanol/ethanol precipitation.
During extraction, reducing agent was used to maintain the
4sU-treated sample under reducing conditions.
[0471] Alkylation was performed using the SLAMseq Kinetic
kit--Anabolic Kinetics Module (Lexogen, Cat. No. 061, Vienna,
Austria) on only one part of the RNA from condition "D1--with 4sU".
Total extracted RNA was mixed with iodoacetamide (IAA), which
modifies the 4-thiol group of 4sU-containing nucleotides via the
addition of a carboxyamidomethyl group, giving the condition
"D1--with 4sU+alkylation". This alkylation increases the frequency
of T>C misincorporations during reverse transcription. The other
part was labelled "D1--with 4sU no alkylation".
[0472] The RNA was then purified using ethanol precipitation prior
to proceeding to library preparation.
[0473] Library Preparation and Sequencing
[0474] The SMARTer Stranded Total RNA-Seq Kit--Pico Input Mammalian
(ClonTech, Mountain View, USA) was used for a direct construction
of libraries starting with 10 ng of RNA. The workflow used with
this kit incorporates a proprietary technology (PathoQuest, Paris,
France) that depletes ribosomal cDNA using probes specific to
mammalian rRNA and some mitochondrial RNA. Sequencing was performed
on the NextSeq instrument (Illumina, San Diego, United States)
using the NextSeq 500/550 High Output kit v2 (FC-404-2002,
Illumina). Sequencing was single-read with a read length of 150
nucleotides, generating approximately 125 million reads per
sample.
[0475] Agnostic Bioinformatic Analysis
[0476] The raw data reads were filtered to select high-quality and
relevant reads. Raw data were processed to remove or trim
duplicates, low-quality reads and homopolymers (PathoQuest
proprietary software).
[0477] Sequences introduced during the preparation of Illumina
libraries (adapters, primers) were removed with Skewer (Jiang et
al., 2014. BMC Bioinformatics. 15:182).
[0478] Primate reads (from Vero cells) aligned to the human genome
(Reference GRCh37/hg19) or reads aligned to bacterial rRNA were
discarded. Local alignments were performed with BWA (Li et al.,
2009. Bioinformatics. 25(14):1754-60). Human genome was downloaded
from the UCSC Genome Browser (Kent et al., 2002. Genome Res.
12(6):996-1006). The bacterial rRNA database was initially
downloaded from the EMBL-EBI ENA rRNA database
(ebi.ac.uk/pub/databases/ena/rRNA/release) followed by an
additional in-house sequence cleaning and clustering process. These
filtered reads were considered as sequences of interest and were
assembled into longer sequences named "contigs" with CLC assembly
cell solution (Qiagen, Hilden, Germany). The resulting contigs and
non-assembled reads (singletons) were aligned with BLAST (Altschul
et al., 1990. J Mol Biol. 215(3):403-10) against viral and
comprehensive databases. Contigs and singletons were first aligned
against a viral nucleotide database. Hits with an e-value below
10^-3 were then aligned against a comprehensive nucleotide database
and were reported if their best hit was still assigned to a viral
taxonomy.
[0479] The nucleotide viral and comprehensive databases were
downloaded in November 2017 from the EMBL-EBI nucleotide sequence
database STD. Proprietary software (PathoQuest, Paris, France) was
developed to remove duplicated and low-confidence sequences (e.g.,
too short, multiple taxonomies, low-quality associated keywords).
Contigs without any viral nucleotide hits were similarly aligned
successively against viral and comprehensive protein databases to
check for more distant viral hits. The protein viral and
comprehensive databases were downloaded in November 2017 from the
UniRef100 database (https://www.uniprot.org). Although the UniRef100
database is already non-redundant, we applied a taxonomic cleaning
process to produce the final databases. The taxonomic assignment
reported the best-hit results, and contigs not assigned after these
two rounds of alignment were classified as unknown or non-viral
species.
[0480] The above process identified a contig encompassing the full
TBEV genome sequence (see "Results"). All the reads were then
realigned with CLC assembly cell solution (Qiagen, Hilden, Germany)
on this sequence to extract a final consensus sequence. The data
retrieved from the condition "D0--no 4sU" made it possible to
identify contigs covering the whole genomes of TBEV and SMRV, and
the resulting sequences were labeled "TBEV REFERENCE" and "SMRV
REFERENCE", respectively.
[0481] Estimation of T>C Substitution Ratio
[0482] In order to detect viral sequences with a very high "T-to-C"
substitution rate, each reference sequence was also modified by
substituting every T with a C. These sequences were named "TBEV T-C
REFERENCE" and "SMRV T-C REFERENCE". "TBEV REFERENCE" and "TBEV T-C
REFERENCE" were merged to form a bank named "TBEV BLAST", and "SMRV
REFERENCE" and "SMRV T-C REFERENCE" were merged to form a bank named
"SMRV BLAST". The set of quality-filtered reads was then aligned by
BLAST against these two banks. The maximal e-value was set to
10^-8. Only aligned reads were selected for the next step of the
analysis.
[0483] The CLC program "clc_find_variations" was used to detect
every mismatch at every position of the TBEV study reference. The
global variation profile was then analyzed by a proprietary script
(PathoQuest, Paris, France) to determine each nucleotide
substitution rate. The proportion of substituted nucleotides was
compared to the total number of aligned nucleotides. For example,
the "T-to-C" substitution rate was calculated using the following
formula:
$$\text{T}\rightarrow\text{C substitution rate} = \frac{\text{Number of C nucleotides identified when a T was expected}}{\text{Total number of expected T}}$$
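With this denominator, the T-to-C rate is normalized to reference T positions rather than to all aligned nucleotides. A minimal Python sketch, assuming the counts of read bases observed at reference T positions are already available (the counts below are invented):

```python
def rates_at_expected_t(observed_at_t):
    """Rates of T-to-X substitutions, each divided by the number of expected T.

    observed_at_t: {read_base: count} over all aligned positions where the
    reference carries a T (the "T" entry holds the unsubstituted calls).
    """
    expected_t = sum(observed_at_t.values())
    return {f"T-to-{base}": n / expected_t
            for base, n in observed_at_t.items() if base != "T"}

# Hypothetical read-base counts at reference T positions
counts = {"T": 99_000, "C": 800, "A": 80, "G": 120}
rates = rates_at_expected_t(counts)
print({k: f"{100 * v:.2f}%" for k, v in rates.items()})
```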
[0484] The substitution rates for each time point were normalized
with the following substitution index:
$$\text{T}\rightarrow\text{C substitution index} = \frac{\text{T}\rightarrow\text{C rate}}{\operatorname{Mean}\left(\text{T}\rightarrow\text{A rate},\ \text{T}\rightarrow\text{G rate}\right)}$$
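The substitution index can be computed directly from the three T-derived rates. This Python sketch uses the rounded rates reported for the "D1 - with 4sU + alkylation" condition in Table 9, so its results (about 6.32 for TBEV and 24.93 for SMRV) differ slightly from the tabulated indexes, which were presumably computed from unrounded rates. It also assumes, for illustration, that the "Cellular transcripts" row corresponds to the exon set used for the labelling QC with its threshold of 10.

```python
from statistics import mean

def substitution_index(t_to_c, t_to_a, t_to_g):
    """T->C substitution index = T->C rate / mean(T->A rate, T->G rate)."""
    return t_to_c / mean([t_to_a, t_to_g])

QC_THRESHOLD = 10  # labelling considered satisfactory above this index

# Rounded rates (%) for "D1 - with 4sU + alkylation", taken from Table 9
d1_alkylated = {
    "TBEV": (0.79, 0.08, 0.17),
    "SMRV": (1.87, 0.06, 0.09),
    "Cellular transcripts": (0.98, 0.03, 0.05),
}
for target, (c, a, g) in d1_alkylated.items():
    print(f"{target}: index = {substitution_index(c, a, g):.2f}")

cellular_ok = substitution_index(*d1_alkylated["Cellular transcripts"]) > QC_THRESHOLD
print("labelling QC passed:", cellular_ok)
```

Because T→A and T→G errors arise from the same sequencing and reverse-transcription noise as background T→C calls, dividing by their mean gives a dimensionless index that is comparable across samples of different depth.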
[0485] As a quality control for the labelling, we checked the mean
substitution index of a set of exons using non-labelled cells as a
reference. Exons were used from the following human genes described
by Eisenberg & Levanon (2013. Trends Genet. 29(10):569-74)
(RefSeq accession number): C1orf43 (NM_015449), CHMP2A (NM_014453),
EMC7 (NM_020154), GPI (NM_000175).
[0486] We used these human exons to identify their equivalents in
the genome of Chlorocebus sabaeus, the species from which Vero cells
are derived. The complete assembly of Chlorocebus sabaeus (accession
number GCF_000409795.2) was retrieved from the NCBI Assembly
database (https://www.ncbi.nlm.nih.gov/assembly/). The selected
human exons were mapped onto the C. sabaeus assembly using minimap2
(Li, 2018. Bioinformatics. 34(18):3094-3100), and the resulting .bam
file was converted to a .bed file using the bamtobed module of the
BEDTools utility (Quinlan & Hall, 2010. Bioinformatics.
26(6):841-2). Only hits with a mapping quality higher than 30 were
retained (41 exons), and the corresponding sequences were extracted
from the C. sabaeus assembly using the getfasta module of the
BEDTools utility and indexed for further analyses. Labelling was
considered satisfactory if the substitution index was greater than
10.
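The mapping-quality filter applied to the exon hits can be sketched on the BED output of `bedtools bamtobed`, which by default writes each alignment's mapping quality into the BED score (5th) column. This is a minimal Python sketch; the BED lines and coordinates are invented placeholders, and the script is not the one used in the study.

```python
MIN_MAPQ = 30  # only hits with mapping quality higher than 30 are retained

def filter_bed_by_mapq(bed_lines, min_mapq=MIN_MAPQ):
    """Keep BED6 records whose score column (MAPQ from bamtobed) exceeds min_mapq."""
    kept = []
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and track/browser lines
        fields = line.rstrip("\n").split("\t")
        if int(fields[4]) > min_mapq:
            kept.append(fields)
    return kept

bed = [
    "chr1\t1000\t1150\tGPI_exon1\t60\t+",      # placeholder coordinates
    "chr5\t2000\t2090\tEMC7_exon2\t12\t-",
    "chr20\t500\t620\tCHMP2A_exon3\t31\t+",
]
kept = filter_bed_by_mapq(bed)
print([f[3] for f in kept])
```

The retained records could then be written back to a BED file and passed to `bedtools getfasta` to extract the exon sequences.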
[0487] Stranded Analysis
[0488] A targeted and stranded analysis was performed on the
identified TBEV reads. This analysis was based on a more stringent
mapping alignment of filtered reads with the alignment providing a
detailed horizontal genome coverage and depth profile. Local
alignments were performed with BWA. Since the sample libraries were
prepared using the SMARTer Stranded RNA-Seq Kit, the RNA strand
information was also retained. As a result, a mapping alignment
analysis was able to provide information on the mother strand of
each read (sense or reverse relative to the mother strand).
[0489] Results
[0490] Identification of Adventitious Viruses by Agnostic RNA-Seq
in Vero Cells
[0491] Vero cells were first put in contact with a high dose of
TBEV at +4 °C (D0). At this temperature, only virus binding to cell
receptors occurs and virus entry is blocked. This experimental
setting therefore mimics the carryover of a non-replicating virus.
RNAs were extracted and sequenced as markers of DNA or RNA virus
infection. The results of the agnostic analysis and those of the
mapping of the reads against the two main viral hits found by the
agnostic analysis (TBEV and SMRV) are shown in Table 8 and Table 7,
respectively.
TABLE 7 Number of reads mapped onto the TBEV and SMRV genomes (in
parentheses: % negative-sense reads/total reads) and horizontal
coverage of the genome (% genome). Reads were mapped onto the
genomes of TBEV and SMRV found by the agnostic procedure (Table 8).
Sample condition | D0 - no 4sU | D1 - no 4sU | D1 - with 4sU + alkylation | D1 - with 4sU no alkylation
TBEV reads | 160085 (0.27) | 6408291 (0.35) | 5240211 (0.36) | 5722338 (0.32)
TBEV horizontal coverage | 100 | 100 | 100 | 100
SMRV reads | 1807960 (0.39) | 1601090 (0.40) | 1816788 (0.48) | 2479933 (0.37)
SMRV horizontal coverage | 100 | 100 | 100 | 100
TABLE 8 Agnostic analysis: number of reads after each step of the
filtering process, and results of the de novo assembly and of the
BLAST analysis
Parameter | D0 - no 4sU | D1 - no 4sU | D1 - with 4sU + alkylation | D1 - with 4sU no alkylation
Total reads | 153379022 | 153360490 | 149549706 | 163035957
FILTERING
Duplicates/Quality | 84575371 | 85885045 | 73631007 | 82102171
Adapters | 84575497 | 85884068 | 73629933 | 82101241
Host | 19348675 | 21678833 | 27726895 | 21993690
rRNA | 19314195 | 21637189 | 27664792 | 21957108
ASSEMBLY
Contigs | 32777 | 246669 | 460995 | 337130
Singletons | 2800174 | 2937361 | 3951554 | 2842044
% assembled reads | 85.50% | 86.42% | 85.72% | 87.06%
TBEV RESULTS
Nb contigs | 4 | 127 | 143 | 104
Nb reads in contigs | 100031 | 2514666 | 2255875 | 2060105
Nb singletons | 145 | 5052 | 3932 | 3570
Total reads | 100176 | 2519718 | 2259807 | 2063675
Average contigs identity (%) | 91.23% | 92.06% | 91.71% | 90.53%
ADDITIONAL CLOSE SPECIES (TOTAL READS)
Machupo mammarenavirus 1 2
Louping ill virus 14
Bovine viral diarrhea virus 1 12 7 9 9
Bovine viral diarrhea virus 2 1
Bovine viral diarrhea virus 3 1
Singapore grouper iridovirus 48 41 59
Orthohepevirus A 1
Rotavirus C 1
Cercopithecine betaherpesvirus 5 25 2
Stealth virus 4 5
Stealth virus 5 1
Human gammaherpesvirus 8 23 6
uncultured virus 30
Simian retrovirus 3 17
Squirrel monkey retrovirus 923291 837907 1064215 1107994
Primate T-lymphotropic virus 1 5 29 10
Baboon endogenous virus 8503 8691 7710 10617
Feline leukemia virus 2 3 28
Human endogenous retrovirus 478 280 529 322
Human endogenous retrovirus K 16 35 37 37
Human endogenous retrovirus W 64 53 37 60
Lnras* SN acutely transforming retrovirus 9
Retroviridae (no genus, no species) 4 7
[0492] As expected, the main viral species detected at D0 was TBEV,
but SMRV was unexpectedly detected as well (Table 7). More than
160,000 TBEV reads, out of a total of around 150 million raw reads
(Table 8), were identified, covering the whole genome. Vero cells
were then shifted to 37 °C to allow virus entry and incubated for
one day before harvest. The number of TBEV reads increased strongly,
with between 5.2 and 6.4 million TBEV reads recorded. Additionally,
between 1.6 and 1.8 million reads mapping to SMRV-H (an SMRV
isolated from a human lymphoid cell line (Oda et al., 1988.
Virology. 167(2):468-76)) were identified independently of the day
of harvest. This indicates that the SMRV transcripts were expressed
by the cells independently of the experimental TBEV infection.
[0493] We also identified a number of other hits (Table 8). The
main additional hit was Baboon endogenous virus, a known endogenous
virus of Vero cells (Ma et al., 2011. J Virol. 85(13):6579-88). A
few hundred reads mapping to human endogenous retroviruses were
also recorded; in our experience, this finding is frequent in
primate/human cell lines. We also found a few bovine viral diarrhea
virus (BVDV) reads, typically associated with the use of
gamma-irradiated bovine serum. Finally, we identified a few reads
(<50) mapping to different herpesviruses, which we considered
background noise.
[0494] Differentiation of Cell Infection Versus Carry-Over of Inert
Sequences
[0495] Since our primary objective was to mimic challenging
conditions for differentiating cell infection from carryover, while
testing the capability of HTS to detect early infection of cells,
we compared the results obtained with cells put in contact with high
doses of TBEV at +4 °C, which blocks virus replication, with those
of cells infected with the same dose of virus and harvested 24 hours
post-infection. The former mimicked cells carrying inactivated virus
or free nucleic acids, and the latter mimicked cells infected just
before banking. Since TBEV is a positive-sense ssRNA virus,
negative-sense RNA was used as a marker of virus replication. The
three conditions tested at D1 (no 4sU; with 4sU + alkylation; with
4sU, no alkylation) showed that 0.32 to 0.36% of the reads were
negative-sense, compared to 0.27% at D0, a very small but highly
significant difference (chi-square test, p<0.0001). This type of
comparative analysis is not relevant for the chronic infection of
cells by SMRV, a retrovirus whose transcription uses a DNA provirus
as template and leads mainly to positive-sense but also to
negative-sense RNAs (Manghera et al., 2017. Virol J. 14(1):9).
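The chi-square comparison of antisense-read proportions can be illustrated on the TBEV counts of Table 3 (Condition 1 = D0 versus Condition 2 = D1 without 4sU). This is a sketch of a Pearson chi-square test on a 2x2 table without continuity correction; the exact contingency design used in the study is not stated, so this pairing is an assumption. For one degree of freedom, the upper-tail p-value equals erfc(sqrt(chi2/2)).

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 df, no continuity correction) on [[a, b], [c, d]].

    Returns (statistic, p-value).
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# TBEV reads [antisense, sense]: D0 (Condition 1) vs D1 without 4sU (Condition 2)
stat, p = chi_square_2x2(438, 159647, 22646, 6385645)
print(f"chi-square = {stat:.1f}, p = {p:.1e}")
```

With millions of reads, even a shift from 0.27% to 0.35% yields a very small p-value, which is why the text describes the difference as small but highly significant.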
[0496] We then examined the TBEV "T-to-C" substitution rate
following metabolic labelling of newly synthesized RNAs with 4sU
(Table 9 and FIG. 1).
TABLE 9 Substitution rate of T nucleotides and substitution index
Sample condition | D0 - no 4sU | D1 - no 4sU | D1 - with 4sU + alkylation | D1 - with 4sU no alkylation
TBEV
Rate "T-to-C" (%) | 0.15 | 0.13 | 0.79 | 0.13
Rate "T-to-A" (%) | 0.04 | 0.04 | 0.08 | 0.04
Rate "T-to-G" (%) | 0.10 | 0.12 | 0.17 | 0.12
Substitution index | 2.09 | 1.68 | 6.39 | 1.71
SMRV
Rate "T-to-C" (%) | 0.12 | 0.12 | 1.87 | 0.12
Rate "T-to-A" (%) | 0.04 | 0.03 | 0.06 | 0.04
Rate "T-to-G" (%) | 0.07 | 0.07 | 0.09 | 0.07
Substitution index | 2.19 | 2.26 | 24.16 | 2.27
Cellular transcripts
Rate "T-to-C" (%) | 0.08 | 0.08 | 0.98 | 0.07
Rate "T-to-A" (%) | 0.02 | 0.02 | 0.03 | 0.01
Rate "T-to-G" (%) | 0.04 | 0.05 | 0.05 | 0.03
Substitution index | 2.73 | 2.30 | 26.21 | 3.36
[0497] At D1, in the absence of metabolic labelling, the "T-to-C"
substitution rate was very low (0.13%) and similar to the "T-to-A"
and "T-to-G" rates (0.04-0.13%), resulting in a calculated
background substitution index of 1.68. Similar results were obtained
at D0, indicating good reproducibility of the substitution
background.
[0498] In clear contrast, the "T-to-C" substitution rate of
labelled and alkylated TBEV RNAs at D1 was much higher (0.79%),
resulting in a substitution index of 6.4, a 3.8-fold increase over
background. The substitution index at D1 for labelled and alkylated
SMRV RNAs was 24.16, 10.7-fold over background.
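The fold increases quoted above follow from dividing the labelled-and-alkylated substitution index by the D1 background index, both taken from Table 9. A short Python check of that arithmetic:

```python
# Substitution indexes from Table 9 (D1 conditions)
background = {"TBEV": 1.68, "SMRV": 2.26}  # D1 - no 4sU
labelled = {"TBEV": 6.39, "SMRV": 24.16}   # D1 - with 4sU + alkylation

fold = {virus: labelled[virus] / background[virus] for virus in background}
for virus, f in fold.items():
    print(f"{virus}: {f:.1f}-fold over background")
```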
[0499] Comparing metabolically labelled with non-labelled RNAs
requires two culture conditions. We therefore also compared the TBEV
and SMRV substitution indexes obtained at D1 for the 4sU-labelled
culture, with and without RNA alkylation. This requires only one
culture condition, followed by RNA extraction and either alkylation
or no treatment. The low level of substitutions in 4sU-labelled,
non-alkylated RNA did not impair the detection of potential viral
hits by BLAST analysis (Table 8). As shown in Table 9 and FIG. 1B,
the substitution index of the 4sU-labelled, non-alkylated RNAs
remained low and close to that of the non-labelled condition (1.71
and 2.27 for TBEV and SMRV, respectively), increasing 4.0- and
10.6-fold, respectively, in the alkylated condition. This suggests
that non-alkylated RNAs extracted from the same cell culture can be
used to establish the reference consensus viral bank used to
calculate substitution rates. Therefore, our results show that,
following 4sU labelling of cells, RNA-Seq was able to specifically
identify newly synthesized viral RNAs with a high signal-to-noise
ratio.
[0500] Finally, we also compared the T→C substitution rate in 4sU-labelled, alkylated cells with the average of the T→A and T→G substitution rates observed in the same cells, for TBEV (FIG. 2) and SMRV (FIG. 3).
[0501] These ratios are given in Table 10. A substitution ratio above 1 is indicative of active transcription in the sample. These results therefore clearly show that the method of the invention is able to discriminate and detect live TBEV and SMRV by comparing the substitution rates of different nucleotides in a single condition (D1--with 4sU+alkylation).
TABLE 10: Ratios of T→C substitution vs average T→A/T→G substitutions
Substitution | TBEV | SMRV
T→A | 0.10 | 0.10
T→C | 0.83 | 3.54
T→G | 0.18 | 0.12
T→C/Avg(T→A, T→G) | 5.87 | 32.41
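The ratio in Table 10 can be recomputed directly. The sketch below is a minimal illustration, not the PathoQuest script: it takes the rounded rates from Table 10, computes T→C / mean(T→A, T→G) and applies the "ratio above 1" criterion of [0501]. The function names and the `threshold` parameter are hypothetical. Recomputed from the rounded rates, the ratios come out at about 5.93 (TBEV) and 32.18 (SMRV), close to but not identical with the tabulated 5.87 and 32.41, presumably because the table was computed from unrounded rates.

```python
from statistics import mean


def tc_ratio(t_to_c: float, t_to_a: float, t_to_g: float) -> float:
    """T->C substitution rate divided by the mean of the T->A and T->G rates."""
    background = mean([t_to_a, t_to_g])
    if background == 0:
        raise ValueError("T->A and T->G rates are both zero; ratio undefined")
    return t_to_c / background


def is_transcriptionally_active(ratio: float, threshold: float = 1.0) -> bool:
    """Apply the 'ratio above 1' criterion; the threshold is kept as a parameter."""
    return ratio > threshold


# Rounded rates from Table 10: (T->A, T->C, T->G)
table10 = {"TBEV": (0.10, 0.83, 0.18), "SMRV": (0.10, 3.54, 0.12)}
for virus, (t_a, t_c, t_g) in table10.items():
    r = tc_ratio(t_c, t_a, t_g)
    print(f"{virus}: ratio={r:.2f}, active={is_transcriptionally_active(r)}")
```

Keeping the threshold as a parameter makes it easy to test stricter cut-offs against the non-labelled background indexes reported above.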
Example 4
[0502] Materials and Methods
[0503] Cells and Mollicutes
[0504] A549 (ATCC_CCL-185) cells were grown in DMEM (Dulbecco's Modified Eagle Medium) to circa 70% confluence in a 6-well plate before contamination.
[0505] Acholeplasma laidlawii was selected as the representative mollicute to infect the A549 cells.
[0506] Acholeplasma laidlawii Infection of A549 Cells
[0507] At circa 70% confluence, the culture medium of the A549 cells was changed to MEM-Earle medium supplemented with 7% fetal bovine serum and 1% L-glutamine, without antibiotics. Cells were infected with various infectious doses of Acholeplasma laidlawii at day 0 (Table 11). At day 5, 4-thiouridine (4sU) (800 µM) was added to the culture medium 9 hours, 6 hours and 3 hours before supernatant harvest. After 5 days of incubation at 37 °C, 2 mL of culture medium were removed and clarified by centrifugation at 200 g for 5 minutes. 1 mL of clarified supernatant was centrifuged at 15,000-20,000 g for 10 minutes, 900 µL of supernatant were removed, and the pellet was homogenized in the remaining 100 µL of supernatant. Samples were then frozen prior to nucleic acid extraction.
[0508] Adding 4-thiouridine (4sU) to the cell culture medium enables 4sU nucleotides to be incorporated into newly synthesized RNA. During reverse transcription, 4sU causes a certain percentage of misincorporation, resulting in a T>C transition in the cDNA that can be identified by sequencing (Herzog et al., 2017. Nat Methods. 14(12):1198-1204).
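As a minimal illustration of how a 4sU-induced T>C transition shows up in sequencing data, the sketch below compares a read with the reference at every position where the reference carries a T. It assumes a gapless alignment on the forward strand, with reads already oriented to the RNA strand; a real pipeline would parse CIGAR strings and account for library strandedness. The function name `count_t_substitutions` is hypothetical.

```python
from collections import Counter


def count_t_substitutions(reference: str, read: str, start: int) -> Counter:
    """Tally the base read at each reference T covered by a gapless alignment.

    `start` is the 0-based reference position of the first read base.
    Keys look like "T>C" (a conversion) or "T>T" (no change).
    """
    if start < 0 or start + len(read) > len(reference):
        raise ValueError("read extends beyond the reference")
    counts = Counter()
    for offset, base in enumerate(read.upper()):
        if reference[start + offset].upper() == "T":
            counts[f"T>{base}"] += 1
    return counts


ref = "ACGTTAGCTTGA"
read = "GTCAGCTCG"  # aligned at position 2; two of the four covered Ts read as C
print(count_t_substitutions(ref, read, start=2))
```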
TABLE 11: Description of the test items
Test items | Acholeplasma sp. infection dose (cfu/mL) | Viable Acholeplasma sp. count at day 5 (cfu/mL)
CTRL5Tag | none | none
LC5 | 5 | >10^9
LC5Tag* | 5 | >10^9
Diluted LC5Tag* | | ~2.5 × 10^7
HC_HK5tag | 2.5 × 10^7** | none
4° HC_HK5tag | 2.5 × 10^7** | none
HC_G5tag | 2.5 × 10^7 | 10^4
*This test item is evaluated in this study with and without dilution; sample LC5Tag is diluted to obtain Acholeplasma sp. counts similar to those prior to inactivation and infection of the cells.
**Prior to inactivation.
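Table 11 suggests how the 40-fold dilution of LC5Tag reported later (Table 13) relates to the inocula. Taking about 10^9 cfu/mL as the LC5Tag count at day 5 (an assumption, since the table only gives a lower bound of >10^9) and the 2.5 × 10^7 cfu/mL high-concentration dose as the target, the required fold dilution is a single division. The function name is hypothetical.

```python
def dilution_factor(current_cfu_per_ml: float, target_cfu_per_ml: float) -> float:
    """Fold dilution that brings a suspension from its current titre to a target titre."""
    if target_cfu_per_ml <= 0 or current_cfu_per_ml < target_cfu_per_ml:
        raise ValueError("target must be positive and no higher than the current titre")
    return current_cfu_per_ml / target_cfu_per_ml


lc5tag_day5 = 1e9   # assumption: Table 11 only reports >10^9 cfu/mL
high_dose = 2.5e7   # HC inoculum before inactivation (Table 11)
print(dilution_factor(lc5tag_day5, high_dose))  # 40.0
```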
[0509] CTRL5Tag is a control sample, not infected with Acholeplasma laidlawii and labelled with 4sU at day 5.
[0510] LC5 is a sample infected with a Low Concentration of Acholeplasma laidlawii at day 5.
[0511] LC5Tag is a sample infected with a Low Concentration of Acholeplasma laidlawii and labelled with 4sU at day 5.
[0512] HC_HK5tag is a sample infected with a High Concentration of Acholeplasma laidlawii that was heat-killed before infection, and labelled with 4sU at day 5.
[0513] HC_G5tag is a sample infected with a high dose of Acholeplasma laidlawii that was treated with gentamycin before infection, and labelled with 4sU at day 5.
[0514] RNA Extraction
[0515] RNA extraction was performed in the dark using a chloroform:isoamyl alcohol mix 24:1 (Sigma Aldrich, Cat. No. 25666, Saint Louis, USA) followed by isopropanol/ethanol precipitation. During extraction, a reducing agent was used to keep the 4sU-treated sample under reducing conditions.
[0516] Alkylation was performed using the SLAMseq Kinetic kit--Anabolic Kinetics Module (Lexogen, Cat. No. 061, Vienna, Austria) for one part of the condition "D1--with 4sU" only. Total extracted RNA was mixed with iodoacetamide (IAA), which modifies the 4-thiol group of 4sU-containing nucleotides by adding a carboxyamidomethyl group, giving the condition "D1--with 4sU+alkylation". This alkylation increases the frequency of T>C misincorporations during reverse transcription. The other part was designated "D1--with 4sU no alkylation".
[0517] The RNA was then purified using ethanol precipitation prior
to proceeding to library preparation.
[0518] Library Preparation and Sequencing
[0519] The SMARTer Stranded Total RNA-Seq Kit--Pico Input Mammalian (Clontech, Mountain View, USA) was used to construct libraries directly from 10 ng of RNA. Ribosomal RNA of bacterial origin (16S and 23S) was depleted from total RNA using the RiboMinus Bacteria Transcriptome analysis kit (Thermo Fisher). Ribosomal cDNA was also depleted using probes specific to mammalian rRNA and some mitochondrial RNA (included in the SMARTer Stranded Total RNA-Seq kit), prior to the library preparation, following the manufacturer's recommendations (Clontech). Sequencing was performed on the NextSeq instrument (Illumina, San Diego, United States) using the NextSeq mid-output flow cell (FC-404-1001, Illumina). Sequencing was single-read, with a read length of 150 nucleotides, generating approximately 125 million reads per sample.
[0520] Agnostic Bioinformatic Analysis
[0521] The raw reads were filtered to select high-quality, relevant reads: duplicates, low-quality reads and homopolymers were removed or trimmed (PathoQuest proprietary software).
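The PathoQuest filtering software is proprietary and its thresholds are not given. As a hypothetical sketch of the three filters named in [0521], the generator below drops exact duplicate sequences, reads whose mean Phred quality (Phred+33 encoding assumed) falls below a cut-off, and reads containing a long homopolymer run. The values `min_mean_q=30` and `max_homopolymer=10` are illustrative only.

```python
import re
from typing import Iterable, Iterator, Tuple

Read = Tuple[str, str, str]  # (name, sequence, quality string)


def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string into Phred scores."""
    return [ord(ch) - offset for ch in quality]


def filter_reads(reads: Iterable[Read], min_mean_q: float = 30.0,
                 max_homopolymer: int = 10) -> Iterator[Read]:
    """Yield reads that pass duplicate, quality and homopolymer filters.

    Thresholds are illustrative assumptions, not the proprietary values.
    """
    seen: set[str] = set()
    # A base followed by >= max_homopolymer copies of itself, i.e. a run
    # longer than max_homopolymer.
    homopolymer = re.compile(r"([ACGTN])\1{%d,}" % max_homopolymer)
    for name, seq, qual in reads:
        if seq in seen:
            continue  # exact duplicate of a read already kept
        scores = phred_scores(qual)
        if not scores or sum(scores) / len(scores) < min_mean_q:
            continue  # low mean base quality
        if homopolymer.search(seq.upper()):
            continue  # long homopolymer run
        seen.add(seq)
        yield name, seq, qual
```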
[0522] Sequences introduced during the preparation of Illumina
libraries (adapters, primers) were removed with Skewer (Jiang et
al., 2014. BMC Bioinformatics. 15:182).
[0523] Filtered reads of the LC5 condition were considered first as sequences of interest. As this condition very likely includes a high content of unlabeled sequences of the organism of interest, it allows the genome of the targeted organism (Acholeplasma laidlawii) to be reconstructed. LC5 reads were therefore assembled into longer sequences called "contigs" with Megahit (Li et al., 2015. Bioinformatics. 31(10):1674-1676). The resulting contigs were then mapped back with minimap2 (Li, 2018. Bioinformatics. 34(18):3094-3100) onto the Acholeplasma laidlawii strain PG8A genome (RefSeq AccNum CP000896.1). Positive hits were then tiled on the Acholeplasma laidlawii strain PG8A genome using Mummer 3 (Kurtz et al., 2004. Genome Biol. 5(2):R12) in order to: [0524] 1. confirm the identity of contigs potentially detected as A. laidlawii, and [0525] 2. ensure completeness of the newly built sequence.
[0526] Once the identity of the contigs had been confirmed and the tiling validated, contigs were pooled in a FASTA file to serve as the reference genome (hereafter called LC5_ALAID_CNS) for further analyses.
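Contig selection can be sketched from minimap2's PAF output, whose standard columns include query name and length, query start and end, the number of residue matches and the alignment block length. The cut-offs below (95% identity, 80% of the contig aligned) are assumptions; the text only says that contigs were selected unambiguously and checked by tiling. The selected contigs are then rendered as FASTA, as done for LC5_ALAID_CNS. Function names are hypothetical.

```python
def select_contigs(paf_lines, min_identity=0.95, min_query_cov=0.8):
    """Names of contigs with at least one PAF hit passing both cut-offs.

    PAF columns used (0-based): 0 query name, 1 query length,
    2 query start, 3 query end, 9 residue matches, 10 alignment block length.
    """
    selected = set()
    for line in paf_lines:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        qname = fields[0]
        qlen, qstart, qend = int(fields[1]), int(fields[2]), int(fields[3])
        matches, block_len = int(fields[9]), int(fields[10])
        identity = matches / block_len
        query_cov = (qend - qstart) / qlen
        if identity >= min_identity and query_cov >= min_query_cov:
            selected.add(qname)
    return selected


def to_fasta(contigs, keep, width=60):
    """Render the kept contigs as FASTA text, wrapping sequences at `width`."""
    lines = []
    for name, seq in contigs.items():
        if name in keep:
            lines.append(f">{name}")
            lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Toy PAF records (target length 1500000 is a placeholder value)
paf = [
    "ctg1\t1000\t0\t950\t+\tCP000896.1\t1500000\t100\t1050\t940\t950\t60",
    "ctg2\t1000\t0\t300\t+\tCP000896.1\t1500000\t5000\t5300\t290\t300\t60",
]
keep = select_contigs(paf)
print(to_fasta({"ctg1": "ACGT" * 20, "ctg2": "TTTT"}, keep))
```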
[0527] Estimation of T>C Substitution Ratio
[0528] To detect A. laidlawii sequences with a very high number of T→C substitutions, the set of quality-filtered reads was mapped back to LC5_ALAID_CNS with minimap2 in non-multimap mode (Li, 2018. Bioinformatics. 34(18):3094-3100). The pileup module of the htsbox software (https://github.com/lh3/htsbox) was then used to detect all mismatches (with a base quality of at least 30) at every position of the LC5_ALAID_CNS sequence. The global variation profiles were then analyzed using a proprietary script (PathoQuest, Paris, France) to compute the substitution rate for each nucleotide, i.e., the proportion of substituted nucleotides relative to the total number of aligned nucleotides. For example, the T→C substitution rate was calculated using the following formula:
$$\text{T}\rightarrow\text{C substitution rate} = \frac{\text{number of C nucleotides identified where a T was expected}}{\text{total number of expected T}}$$
[0529] The substitution rates for each time point were normalized
with the following substitution index:
$$\text{T}\rightarrow\text{C substitution index} = \frac{\text{T}\rightarrow\text{C rate}}{\operatorname{mean}\left(\text{T}\rightarrow\text{A rate},\ \text{T}\rightarrow\text{G rate}\right)}$$
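The two formulas above translate directly into code. In this sketch the pileup is assumed to have already been reduced to counts of (expected reference base, observed base) pairs that pass the base-quality filter; this Counter layout is an illustrative choice, not the htsbox output format or the proprietary PathoQuest script. Rates are returned as fractions (multiply by 100 for percentages); the index is a ratio of rates, so the choice does not affect it.

```python
from collections import Counter


def substitution_rate(counts: Counter, expected: str, observed: str) -> float:
    """Calls of `observed` where `expected` was the reference base, divided by
    all calls at positions where `expected` was the reference base."""
    total_expected = sum(n for (ref, _), n in counts.items() if ref == expected)
    if total_expected == 0:
        raise ValueError(f"no aligned positions with reference base {expected}")
    return counts[(expected, observed)] / total_expected


def tc_substitution_index(counts: Counter) -> float:
    """T->C rate divided by the mean of the T->A and T->G rates."""
    t_to_a = substitution_rate(counts, "T", "A")
    t_to_g = substitution_rate(counts, "T", "G")
    background = (t_to_a + t_to_g) / 2
    if background == 0:
        raise ValueError("no T->A or T->G substitutions; index undefined")
    return substitution_rate(counts, "T", "C") / background


# Toy counts: 10,000 base calls at reference-T positions
counts = Counter({("T", "T"): 9_800, ("T", "C"): 150, ("T", "A"): 20, ("T", "G"): 30})
print(substitution_rate(counts, "T", "C"))  # 0.015
print(tc_substitution_index(counts))
```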
[0530] Results
[0531] Sequencing Throughput
[0532] Sequencing run throughputs are reported in Table 12. For almost all conditions, more than 10 million single-end reads were produced. For each condition, more than 90% of reads were retained after the filtering step, indicating that the sequencing runs were of good quality and thus suitable for subsequent analyses.
TABLE 12: Sequencing throughput for all experimental conditions
Test items | Raw reads | Filtered reads | Ratio
CTRL5Tag | 14,092,086 | 12,698,272 | 0.90
LC5 | 14,922,171 | 14,766,414 | 0.99
LC5Tag* | 17,690,706 | 17,519,522 | 0.99
Diluted LC5Tag* | 17,124,090 | 16,894,486 | 0.99
HC_HK5tag | 20,410,793 | 20,076,010 | 0.98
4° HC_HK5tag | 20,410,793 | 20,076,010 | 0.98
HC_G5tag | 16,591,955 | 16,060,004 | 0.97
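The Ratio column of Table 12 is the number of filtered reads divided by the number of raw reads. A minimal check on a few rows of the table, using the 90% retention level mentioned in [0532] as an assumed quality cut-off (the function name is hypothetical):

```python
def retention_ratio(raw_reads: int, filtered_reads: int) -> float:
    """Fraction of raw reads kept after filtering."""
    if raw_reads <= 0 or not 0 <= filtered_reads <= raw_reads:
        raise ValueError("expected raw_reads > 0 and 0 <= filtered_reads <= raw_reads")
    return filtered_reads / raw_reads


# (raw reads, filtered reads) copied from Table 12
table12 = {
    "CTRL5Tag": (14_092_086, 12_698_272),
    "LC5": (14_922_171, 14_766_414),
    "HC_G5tag": (16_591_955, 16_060_004),
}
for item, (raw, kept) in table12.items():
    ratio = retention_ratio(raw, kept)
    status = "ok" if ratio >= 0.90 else "check"
    print(f"{item}: {ratio:.2f} ({status})")
```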
[0533] Reference Genome Reconstruction
[0534] Assembly of the LC5 reads generated a set of 877 contigs (cumulative length 1,374,213; minimum length 201; average length 1,566.9; maximum length 15,043). Remapping onto the A. laidlawii PG8A (CP000896.1) genome sequence allowed the unambiguous selection of 662 contigs (cumulative length 1,287,020; minimum length 301; average length 1,944.1; maximum length 15,043) as candidates for LC5_ALAID_CNS reconstruction. As a first check, the GC content distribution and statistics were examined to detect a possible mixture of organisms in the contig set (FIG. 4).
[0535] As seen in FIG. 4, the GC content distribution is unimodal, suggesting a low probability that the contig set contains contigs from several organisms. Furthermore, the mean GC content of this set is not significantly different from the expected value (32.01% vs 31.93% for A. laidlawii str. PG8A).
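The length statistics in [0534] and the GC check in [0535] can be sketched as follows. Whether the reported mean GC content averages per-contig values or pools all bases is not stated; this sketch averages per-contig values, which are also the values one would histogram for a FIG. 4-style plot. Function names and the toy contigs are hypothetical.

```python
from statistics import mean


def gc_percent(seq: str) -> float:
    """G+C as a percentage of unambiguous A/C/G/T bases in `seq`."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def contig_summary(contigs: dict[str, str]) -> dict[str, float]:
    """Length statistics and mean per-contig GC content of a contig set."""
    if not contigs:
        raise ValueError("empty contig set")
    lengths = [len(s) for s in contigs.values()]
    gcs = [gc_percent(s) for s in contigs.values()]  # per-contig GC values
    return {
        "n_contigs": len(lengths),
        "cumulative_length": sum(lengths),
        "min_length": min(lengths),
        "avg_length": round(mean(lengths), 1),
        "max_length": max(lengths),
        "mean_gc_percent": round(mean(gcs), 2),
    }


toy = {"a": "GGCC", "b": "AATT", "c": "ACGTACGT"}
print(contig_summary(toy))
```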
[0536] To ensure that we were able to reconstruct the entire genome
(or at least a significant portion) of a close relative of A.
laidlawii str. PG8A, we "tiled" the latter with the selected
contigs from the initial assembly of reads of the LC5 experimental
condition. The results are presented in FIG. 5.
[0537] As shown in FIG. 5, the set of 662 contigs covers almost the entire A. laidlawii str. PG8A genome with high similarity (higher than 99% in all cases; data not shown), which strongly suggests that the reconstructed LC5_ALAID_CNS is a very close relative of A. laidlawii str. PG8A.
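The completeness check in FIG. 5 amounts to measuring how much of the reference is covered by the union of contig placements. The sketch below merges half-open [start, end) intervals on the reference (for example, target start and end coordinates from PAF records or a MUMmer tiling) and returns the covered fraction. It is an illustrative calculation, not the procedure used to draw FIG. 5; the function name is hypothetical.

```python
def covered_fraction(intervals: list[tuple[int, int]], genome_length: int) -> float:
    """Fraction of the reference covered by the union of half-open [start, end) hits."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current block: close it and open a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / genome_length


print(covered_fraction([(0, 100), (50, 150), (200, 300)], 400))  # 0.625
```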
[0538] In conclusion, we were able to: [0539] 1. select a clean set
of contigs corresponding to A. laidlawii, and [0540] 2. cover the
complete genome of a close relative (A. laidlawii str. PG8A).
[0541] This process thus validates our reference sequence
LC5_ALAID_CNS for further analyses.
[0542] Substitution Rates and Indexes
[0543] Positions with low coverage can bias rate estimates because they carry the same weight as well-covered positions. For instance, if a position is covered only 3 times and one of these reads carries a T→C substitution, the T→C substitution rate at this position is 33%, regardless of whether it is a true substitution or a sequencing/assembly error. Therefore, to avoid overestimating substitution rates, and hence substitution indexes, we first analyzed all detected events (i.e., positions covered at least once, 1×) and then only events covered at least 20 times (20×), the latter being considered highly confident events.
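The 33% example applies when positions are weighted equally, i.e., when per-position rates are averaged. Under that assumption, the sketch below averages per-position T→C rates over reference-T positions whose depth reaches a chosen threshold, so the 1× and 20× analyses differ only in `min_depth`. The pileup layout (position mapped to a Counter of observed bases) and the function name are hypothetical.

```python
from collections import Counter
from statistics import mean


def mean_tc_rate(pileup: dict[int, Counter], min_depth: int) -> float:
    """Average per-position T->C rate over reference-T positions with depth >= min_depth.

    Each retained position gets equal weight, as discussed above.
    """
    rates = []
    for counts in pileup.values():
        depth = sum(counts.values())
        if depth >= min_depth:
            rates.append(counts["C"] / depth)
    if not rates:
        raise ValueError(f"no position reaches depth {min_depth}")
    return mean(rates)


pileup = {
    10: Counter({"T": 2, "C": 1}),   # covered 3 times, one T->C call (33%)
    11: Counter({"T": 99, "C": 1}),  # well covered, 1% T->C
    12: Counter({"T": 49, "C": 1}),  # well covered, 2% T->C
}
print(mean_tc_rate(pileup, min_depth=1))   # inflated by position 10
print(mean_tc_rate(pileup, min_depth=20))  # position 10 excluded
```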
[0544] Substitution rates and indexes are reported in Table 13. Overall, the T→C transition rates are always higher than the T→A and T→G transversion rates, which is expected because classical mutation patterns favor transitions over transversions.
[0545] Moreover, the T→C substitution rates are significantly higher for the LC5tag and 40-fold diluted LC5tag conditions than for all other conditions (including the high-load inactivated sample HC_HK5Tag), regardless of the coverage threshold used to select events. Including low-coverage positions had little impact on the results, since the rates observed at the 1× and 20× thresholds were not significantly different; nevertheless, the 20× threshold would limit background noise. The same trend is observable for substitution indexes.
TABLE 13: Substitution rates and substitution index (SI) for each experimental condition, for all detected events (1× threshold) and for highly confident events (20× threshold)
Test items | 1×: T>A | 1×: T>C | 1×: T>G | 1×: SI | 20×: T>A | 20×: T>C | 20×: T>G | 20×: SI
LC5 | 0.02 | 0.07 | 0.04 | 2.03 | 0.02 | 0.07 | 0.04 | 2.03
LC5tag | 0.04 | 1.02 | 0.06 | 20.59 | 0.04 | 0.90 | 0.06 | 17.71
LC5tag, dilution 40× | 0.04 | 1.36 | 0.05 | 29.94 | 0.04 | 1.27 | 0.05 | 27.90
HC_HK5tag | 0.05 | 0.13 | 0.06 | 2.32 | 0.05 | 0.13 | 0.07 | 2.31
4 °C HC_HK5tag | 0.04 | 0.11 | 0.06 | 2.11 | 0.04 | 0.11 | 0.07 | 2.11
HC_G5tag | 0.04 | 0.15 | 0.05 | 3.24 | 0.04 | 0.15 | 0.05 | 3.26
CTRL5tag | 0.12 | 1.41 | 0.89 | 2.77 | 0.13 | 1.42 | 0.90 | 2.76
[0546] In conclusion, the reported results show that the experiments expected to be spiked with A. laidlawii and 4sU-labelled were detected as such.
[0547] Positional Analysis
[0548] We have reported a global increase in substitution rates and
substitution indexes. In order to investigate whether these
increases result from substitution hotspots, we conducted a
positional analysis evaluating substitution rates along the
LC5_ALAID_CNS reference sequence (FIGS. 6A-G).
[0549] We noticed the presence of A. laidlawii reads in the CTRL5tag experimental condition: some peaks were visible although this condition had not been spiked with Acholeplasma laidlawii (FIG. 6A). Most of these rates reached 100%, suggesting that the corresponding substitutions are actually real SNPs. This observation suggested either contamination at the experimental level or cross-index contamination during the sequencing phase when multiplexing samples (so-called index hopping). Nevertheless, as it concerned a quite limited number of positions, it did not impair the analysis.
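One simple way to separate SNP-like positions, such as those seen in CTRL5tag (FIG. 6A), from diffuse 4sU-induced conversions is to flag well-covered positions where a single non-reference base dominates the calls. The cut-offs below (depth of at least 20, non-reference fraction of at least 90%) are illustrative assumptions; the text only reports that most such rates reached 100%. The function name and pileup layout are hypothetical.

```python
from collections import Counter


def likely_snp_positions(pileup, min_depth=20, snp_fraction=0.9):
    """Positions where one non-reference base makes up >= snp_fraction of calls.

    `pileup` maps position -> (reference base, Counter of observed bases).
    Positions below `min_depth` are skipped as too poorly covered to judge.
    """
    flagged = []
    for pos, (ref_base, counts) in sorted(pileup.items()):
        depth = sum(counts.values())
        if depth < min_depth:
            continue
        top_alt = max((n for base, n in counts.items() if base != ref_base), default=0)
        if top_alt / depth >= snp_fraction:
            flagged.append(pos)
    return flagged


pileup = {
    5: ("T", Counter({"C": 30})),          # every read shows C: SNP-like
    6: ("T", Counter({"T": 28, "C": 2})),  # partial T->C conversion
    7: ("A", Counter({"G": 3})),           # only 3 reads: skipped
}
print(likely_snp_positions(pileup))  # [5]
```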
[0550] FIG. 6B shows a rather low background of substitutions for the LC5 condition, with all genome positions well covered (i.e., no coverage gaps). No dominant substitution type is visible, in agreement with the global-scale analysis. In contrast, for the LC5tag and 40-fold diluted LC5tag conditions, large T→C peaks clearly emerge from the background, indicating successful labelling and thus active transcription in A. laidlawii (FIGS. 6C and 6D).
[0551] FIGS. 6E and 6F show the results of 4sU labelling in the experimental conditions where A. laidlawii cells had been killed by heat (HC_HK5tag and 4° HC_HK5tag). In both cases, we observed fewer peaks (especially T→C peaks) than in the LC5tag and 40-fold diluted LC5tag conditions, confirming the lower amount of extracted RNA due to the low number of bacterial cells remaining alive in the medium after heating.
[0552] Likewise, gentamycin treatment had the same effect (HC_G5tag condition; FIG. 6G), although apparently much more moderate than the effect of heat in the HC_HK5tag experimental conditions (FIGS. 6E and 6F). Nevertheless, the 4sU labelling was still visible and confirms the global analyses, with a moderate substitution index (Table 13).
Example 5
[0553] Materials and Methods
[0554] Cells and Mollicutes
[0555] A549 (ATCC_CCL-185) cells are grown in DMEM (Dulbecco's Modified Eagle Medium) to circa 70% confluence in a 6-well plate before contamination.
[0556] Acholeplasma Sp or Mycoplasma sp Infection of A549 Cells
[0557] At circa 70% confluence, the culture medium of the A549 cells is changed to MEM-Earle medium supplemented with 7% fetal bovine serum and 1% L-glutamine, without antibiotics. Cells are infected with various infectious doses of Acholeplasma sp or Mycoplasma sp.
[0558] Several conditions are tested, among which: [0559] CTRL5Tag: control sample, not infected with Acholeplasma sp or Mycoplasma sp and labelled with 4sU at day 5; [0560] LC5: sample infected with a Low Concentration of Acholeplasma sp or Mycoplasma sp; [0561] LC5Tag: sample infected with a Low Concentration of Acholeplasma sp or Mycoplasma sp and labelled with 4sU at day 5; [0562] HC_HK5tag: sample infected with a High Concentration of Acholeplasma sp or Mycoplasma sp that was heat-killed before infection, and labelled with 4sU at day 5; [0563] HC_G5tag: sample infected with a high dose of Acholeplasma sp or Mycoplasma sp that was treated with gentamycin before infection, and labelled with 4sU at day 5.
[0564] At day 5, 4-thiouridine (4sU) (800 µM) is added to the culture medium 9 hours, 6 hours and 3 hours before cell harvest. After 5 days of incubation at 37 °C, the culture medium is removed, and the cells are pelleted and frozen prior to RNA extraction.
[0565] Adding 4-thiouridine (4sU) to the cell culture medium enables 4sU nucleotides to be incorporated into newly synthesized RNA. During reverse transcription, 4sU causes a certain percentage of misincorporation, resulting in a T>C transition in the cDNA that can be identified by sequencing (Herzog et al., 2017. Nat Methods. 14(12):1198-1204).
[0566] RNA Extraction
[0567] RNA extraction is performed in the dark using a chloroform:isoamyl alcohol mix 24:1 (Sigma Aldrich, Cat. No. 25666, Saint Louis, USA) followed by isopropanol/ethanol precipitation. During extraction, a reducing agent is used to keep the 4sU-treated sample under reducing conditions.
[0568] Alkylation is performed using the SLAMseq Kinetic
kit--Anabolic Kinetics Module (Lexogen, Cat. No. 061, Vienna,
Austria) for one part of the condition "D1--with 4sU" only. Total
extracted RNA is mixed with iodoacetamide (IAA), which modifies the
4-thiol group of 4sU-containing nucleotides via the addition of a
carboxyamidomethyl group leading to the condition "D1--with
4sU+alkylation". This alkylation increases the frequency of T>C
misincorporations during the reverse transcription.
[0569] The RNA is then purified using ethanol precipitation prior
to proceeding to library preparation.
[0570] Library Preparation and Sequencing
[0571] The SMARTer Stranded Total RNA-Seq Kit--Pico Input Mammalian (Clontech, Mountain View, USA) is used to construct libraries directly from 10 ng of RNA. Ribosomal RNA of bacterial origin (16S and 23S) is depleted from total RNA using the RiboMinus Bacteria Transcriptome analysis kit (Thermo Fisher). Ribosomal cDNA is also depleted using probes specific to mammalian rRNA and some mitochondrial RNA (included in the SMARTer Stranded Total RNA-Seq kit), prior to the library preparation, following the manufacturer's recommendations (Clontech). Sequencing is performed on the Illumina instrument (Illumina, San Diego, United States) using the NextSeq 500/550 High Output kit v2 (FC-404-2002, Illumina). Sequencing is paired-end, with a read length of 150 nucleotides, generating approximately 100 million reads per sample.
[0572] Agnostic Bioinformatic Analysis
[0573] The raw reads are filtered to select high-quality, relevant reads: duplicates, low-quality reads and homopolymers are removed or trimmed (PathoQuest proprietary software).
[0574] Sequences introduced during the preparation of Illumina
libraries (adapters, primers) are removed with Skewer (Jiang et
al., 2014. BMC Bioinformatics. 15:182).
[0575] Filtered reads of the negative control conditions (unlabeled, inactivated or both) are considered first as sequences of interest. As these conditions very likely include a high load of sequences of the organism of interest, they allow the genome of the targeted organism (Acholeplasma sp or Mycoplasma sp) to be reconstructed. These reads are therefore assembled into longer sequences called "contigs" with Megahit (Li et al., 2015. Bioinformatics. 31(10):1674-1676). The resulting contigs are then mapped back with minimap2 (Li, 2018. Bioinformatics. 34(18):3094-3100) onto a reference genome of the targeted organism (for example, the Acholeplasma laidlawii strain PG8A genome, RefSeq AccNum CP000896.1). Positive hits are then tiled on this reference genome using Mummer 3 (Kurtz et al., 2004. Genome Biol. 5(2):R12) in order to: [0576] 1. confirm the identity of contigs potentially detected as Acholeplasma sp or Mycoplasma sp, and [0577] 2. ensure completeness of the newly built sequence (hereafter referred to as ALAID_CNS).
[0578] Estimation of T>C Substitution Ratio
[0579] To detect Acholeplasma sp or Mycoplasma sp sequences with a very high number of T→C substitutions, the set of quality-filtered reads is mapped back to ALAID_CNS with minimap2 in non-multimap mode (Li, 2018. Bioinformatics. 34(18):3094-3100). The pileup module of the htsbox software (https://github.com/lh3/htsbox) is then used to detect all mismatches (with a base quality of at least 30) at every position of the ALAID_CNS sequence. The global variation profiles are then analyzed using a proprietary script (PathoQuest, Paris, France) to compute the substitution rate for each nucleotide, i.e., the proportion of substituted nucleotides relative to the total number of aligned nucleotides. For example, the T→C substitution rate is calculated using the following formula:
$$\text{T}\rightarrow\text{C substitution rate} = \frac{\text{number of C nucleotides identified where a T was expected}}{\text{total number of expected T}}$$
[0580] The substitution rates for each time point are normalized with the following substitution index:
$$\text{T}\rightarrow\text{C substitution index} = \frac{\text{T}\rightarrow\text{C rate}}{\operatorname{mean}\left(\text{T}\rightarrow\text{A rate},\ \text{T}\rightarrow\text{G rate}\right)}$$
* * * * *
References