U.S. patent application number 11/543601 was published by the patent office on 2007-02-08 for identification of new small RNAs and ORFs of E. coli as mediators of cell and intercell regulation.
The invention is credited to Susan Gottesman, Francis Repoila, Carsten Rosenow, Gisela Storz, and Karen Wassarman.
Application Number | 11/543601 |
Publication Number | 20070032639 |
Family ID | 23014422 |
Publication Date | 2007-02-08 |
United States Patent Application | 20070032639 |
Kind Code | A1 |
Gottesman; Susan; et al. | February 8, 2007 |
Identification of new small RNAs and ORFs of E. coli as mediators of cell and intercell regulation
Abstract
The invention relates to new small RNAs and ORFs of E. coli as
mediators of cell and intercell regulation.
Inventors:
Gottesman; Susan (Bethesda, MD)
Storz; Gisela (Bethesda, MD)
Repoila; Francis (Colomiers, FR)
Wassarman; Karen (Middleton, WI)
Rosenow; Carsten (Redwood Shores, CA)
Correspondence Address:
KNOBBE, MARTENS, OLSON & BEAR, LLP
2040 MAIN STREET
FOURTEENTH FLOOR
IRVINE, CA 92614
US
Filed: October 5, 2006
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
10627007 | Jul 25, 2003 | 7119193 |
11543601 | Oct 5, 2006 | |
PCT/US02/03147 | Jan 31, 2002 | |
10627007 | Jul 25, 2003 | |
60266402 | Feb 1, 2001 | |
Current U.S. Class: | 530/350; 530/388.1; 536/23.2 |
Current CPC Class: | C12N 15/11 20130101; C07K 14/245 20130101 |
Class at Publication: | 530/350; 536/023.2; 530/388.1 |
International Class: | C07K 14/47 20070101 C07K014/47; C07K 16/18 20070101 C07K016/18; C07H 21/04 20060101 C07H021/04 |
Claims
1. An isolated polynucleotide comprising Candidate #8, or its
complement, or its homolog having at least about 95% identity
thereto.
2. An isolated polynucleotide comprising Candidate #12, or its
complement, or its homolog having at least about 95% identity
thereto.
3. An isolated polynucleotide comprising Candidate #14, or its
complement, or its homolog having at least about 95% identity
thereto.
4. An isolated polynucleotide comprising Candidate #22, or its
complement, or its homolog having at least about 95% identity
thereto.
5. An isolated polynucleotide comprising Candidate #24, or its
complement, or its homolog having at least about 95% identity
thereto.
6. An isolated polynucleotide comprising Candidate #25, or its
complement, or its homolog having at least about 95% identity
thereto.
7. An isolated polynucleotide comprising Candidate #26, or its
complement, or its homolog having at least about 95% identity
thereto.
8. An isolated polynucleotide comprising Candidate #27, or its
complement, or its homolog having at least about 95% identity
thereto.
9. An isolated polynucleotide comprising Candidate #31, or its
complement, or its homolog having at least about 95% identity
thereto.
10. An isolated polynucleotide comprising Candidate #38, or its
complement, or its homolog having at least about 95% identity
thereto.
11. An isolated polynucleotide comprising Candidate #40, or its
complement, or its homolog having at least about 95% identity
thereto.
12. An isolated polynucleotide comprising Candidate #41-I, or its
complement, or its homolog having at least about 95% identity
thereto.
13. An isolated polynucleotide comprising Candidate #41-II, or its
complement, or its homolog having at least about 95% identity
thereto.
14. An isolated polynucleotide comprising Candidate #43, or its
complement, or its homolog having at least about 95% identity
thereto.
15. An isolated polynucleotide comprising Candidate #52-I, or its
complement, or its homolog having at least about 95% identity
thereto.
16. An isolated polynucleotide comprising Candidate #52-II, or its
complement, or its homolog having at least about 95% identity
thereto.
17. An isolated polynucleotide comprising Candidate #55-I, or its
complement, or its homolog having at least about 95% identity
thereto.
18. An isolated polynucleotide comprising Candidate #55-II, or its
complement, or its homolog having at least about 95% identity
thereto.
19. An isolated polynucleotide comprising Candidate #61, or its
complement, or its homolog having at least about 95% identity
thereto.
20. An isolated polynucleotide comprising Candidate #9, or its
complement, or its homolog having at least about 95% identity
thereto.
21. An isolated polynucleotide comprising Candidate #17, or its
complement, or its homolog having at least about 95% identity
thereto.
22. An isolated polynucleotide comprising Candidate #28, or its
complement, or its homolog having at least about 95% identity
thereto.
23. An isolated polynucleotide comprising Candidate #36, or its
complement, or its homolog having at least about 95% identity
thereto.
24. An isolated polynucleotide comprising Candidate #49, or its
complement, or its homolog having at least about 95% identity
thereto.
25. An isolated polynucleotide comprising Candidate #50, or its
complement, or its homolog having at least about 95% identity
thereto.
26.-56. (canceled)
Description
RELATED APPLICATIONS
[0001] This application is a divisional of U.S. application Ser.
No. 10/627,007, filed Jul. 25, 2003, which is a continuation and
claims the benefit of priority of International Application No.
PCT/US02/03147 filed Jan. 31, 2002, designating the United States
of America and published in English, which claims the benefit of
priority of U.S. Provisional Application No. 60/266,402 filed Feb.
1, 2001, all of which are hereby expressly incorporated by
reference in their entireties.
FIELD OF THE INVENTION
[0002] The invention relates to new small RNAs and ORFs of E. coli
as mediators of cell and intercell regulation.
BACKGROUND OF THE INVENTION
[0003] In the last few years, the importance of regulatory small
RNAs (sRNAs) as mediators of a number of cellular processes in
bacteria has begun to be recognized. Although instances of
naturally occurring antisense RNAs have been known for many years,
the participation of sRNAs in protein tagging for degradation,
modulation of RNA polymerase activity, and stimulation of
translation are relatively recent discoveries (see Wassarman, K. M.
et al. 1999 Trends Microbiol 7:37-45 for review; Wassarman, K. M.
and Storz, G. 2000 Cell 101:613-623). These findings have raised
questions about how extensively sRNAs are used, what other cellular
activities might be regulated by sRNAs, and what other mechanisms
of action exist for sRNAs. In addition, prokaryotic sRNAs appear to
target different cellular functions than their eukaryotic
counterparts, which act primarily during RNA biogenesis. It is
unclear whether this difference between prokaryotic and eukaryotic
sRNAs is real or merely reflects the incompleteness of current
knowledge. Implicit in these questions is the question of how many
sRNAs exist in a given organism and whether the currently known sRNAs
are truly representative of sRNA function in general.
[0004] To date, most known bacterial sRNAs have been identified
fortuitously by the direct detection of highly abundant sRNAs (4.5S
RNA, tmRNA, 6S RNA, RNase P RNA, and Spot42 RNA), by the
observation of an sRNA during studies on proteins (OxyS RNA, Crp
Tic RNA, CsrB RNA, and GcvB RNA) or by the discovery of activities
associated with overexpression of genomic fragments (MicF RNA, DicF
RNA, DsrA RNA, and RprA RNA) (Okamoto, K. and Freundlich, M. 1986
PNAS USA 83:5000-5004; Bhasin, R. S. 1989 Studies on the mechanism
of the autoregulation of the crp operon of E. coli K12 In: Dept. of
Biochemistry and Cell Biology, State University of New York at
Stony Brook; Urbanowski, M. L. et al. 2000 Mol Microbiol 37:856-868;
Wassarman, K. M. and Storz, G. 2000 Cell 101:613-623; Majdalani, N.
et al. 2001 Mol Microbiol 39:1382-1394; for review see Wassarman,
K. M. et al. 1999 Trends Microbiol 7:37-45). None of the E. coli
sRNAs were found as a result of mutational screens. This
observation may reflect the small target size of genes encoding
sRNAs compared to protein genes, or may be a consequence of the
regulatory rather than essential nature of many sRNA functions. The
complete genome sequence of an organism provides a rapid inventory
of most encoded proteins, tRNAs, and rRNAs, but it has not led to
the immediate recognition of other genes that are not translated.
In particular, new bacterial sRNA genes have been overlooked, as
there are no identifiable classes of sRNAs that can be found based
solely on sequence determinants.
SEGUE TO THE SUMMARY OF THE INVENTION
[0005] We and others have previously suggested several approaches
for finding new sRNAs, including computer searches of complete
genomes based on parameters common to sRNAs, probing of genomic
microarrays, and isolation of sRNAs based on an association with
general RNA binding proteins (Wassarman, K. M. et al. 1999 Trends
Microbiol 7:37-45; Eddy, S. R. 1999 Curr Opin Genet Dev 9:695-699).
Using a combination of these approaches, we have identified 17
novel sRNAs; in addition, we have found six small transcripts that
contain short conserved open reading frames (ORFs).
SUMMARY OF THE INVENTION
[0006] A burgeoning list of small RNAs with a variety of regulatory
functions has been identified in both prokaryotic and eukaryotic
cells. However, it remains difficult to identify small RNAs by
sequence inspection. We utilized the high conservation of small
RNAs among closely related bacterial species, as well as analysis
of transcripts detected by high-density oligonucleotide probe
arrays, to predict the presence of novel small RNA genes in the
intergenic regions of the Escherichia coli genome. The existence of
23 distinct new RNA species was confirmed by Northern analysis. Of
these, six are predicted to encode short ORFs, whereas 17 are novel
functional small RNAs. Based on the interaction of these small RNAs
with the RNA-binding protein Hfq, the modulation of rpoS
expression, and other information, we contemplate that these new
small RNAs and ORFs of E. coli act as mediators of cell and intercell
regulation. As such, we anticipate their use in the development of
diagnostics and antibiotics.
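The intergenic regions mentioned above are the stretches of the genome between annotated genes. The following is a minimal Python sketch, not the authors' software, of how such regions and the orientation of their flanking genes might be listed from a gene annotation. The `Gene` and `IgRegion` records, the `min_length` cutoff and the toy coordinates are all hypothetical, and the sketch ignores overlapping genes and the wrap-around of the circular chromosome.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Gene:
    name: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" = clockwise (Watson), "-" = counterclockwise (Crick)


@dataclass(frozen=True)
class IgRegion:
    left_gene: str
    right_gene: str
    start: int
    end: int
    orientation: str  # flanking-gene orientation, e.g. "><"

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def _arrow(gene: Gene) -> str:
    # ">" for clockwise and "<" for counterclockwise, as in the FIG. 1 legend
    return ">" if gene.strand == "+" else "<"


def intergenic_regions(genes: List[Gene], min_length: int = 1) -> List[IgRegion]:
    """Gaps between consecutive genes that are at least min_length nt long.

    Linear sketch: overlapping genes and the chromosome wrap-around are ignored.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    regions = []
    for left, right in zip(ordered, ordered[1:]):
        start, end = left.end + 1, right.start - 1
        if end - start + 1 >= min_length:
            regions.append(
                IgRegion(left.name, right.name, start, end, _arrow(left) + _arrow(right))
            )
    return regions


# Hypothetical toy annotation, not real E. coli coordinates
genes = [
    Gene("geneA", 1, 300, "+"),
    Gene("geneB", 451, 900, "-"),
    Gene("geneC", 905, 1200, "+"),
]
for region in intergenic_regions(genes, min_length=50):
    print(region.left_gene, region.right_gene, region.start, region.end, region.orientation)
```

Each region carries its flanking-gene orientation so that later filters (for example, position relative to an ORF start) can be applied per region.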
BRIEF DESCRIPTION OF THE DRAWINGS
[0007] FIG. 1 shows BLAST alignments of representative intergenic
(Ig) regions. The indicated Ig regions were used in a BLAST search of
the NCBI Unfinished Microbial Genomes database. Each panel shows the
summary figure provided by the BLAST program for matches to
Salmonella enteritidis, Salmonella paratyphi A, Salmonella typhi,
Salmonella typhimurium LT2 and Klebsiella pneumoniae. Of the Ig
regions shown, three contain known sRNA genes (rprA, csrB, and oxyS),
and four contain sRNA candidates (#14, #17, #52, and #36; see Table
1). For each panel, the center numbered line represents the length
of the full Ig region; the orientation of flanking genes is given by
> (clockwise) or < (counterclockwise). The top hatched line in each
panel is the
match to E. coli (full identity throughout the Ig). The other
hatched or double-diagonal lines resulted from the closest matches,
and the other lines indicate additional less homologous matches.
Location of the conserved region with respect to the borders of the
Ig region also was a criterion used for the selection of our
candidates; conservation 3' to an ORF or far from the 5' start of
an ORF was considered more likely to encode an sRNA. Note that the
conservation within the Ig region encoding oxyS might be
interpreted as a leader sequence based on location relative to the
start of the flanking gene (oxyR). However, the conservation
extends for 185 nt, and therefore candidate regions in our search
in which the conservation was near the start of an ORF but was
longer than 150 nt were considered further.
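The location and length criteria just described can be written as a simple filter. This is a minimal sketch under stated assumptions: the function name and the `near_start_nt` cutoff for "near the start of an ORF" are hypothetical (the text gives no value), only the 150 nt threshold comes from the text, and the sketch treats as a hard yes/no decision what the text describes as making a region more likely to encode an sRNA.

```python
from typing import Optional


def keep_candidate(
    conserved_length: int,
    distance_to_orf_start: Optional[int],
    near_start_nt: int = 50,
    leader_exception_nt: int = 150,
) -> bool:
    """Sketch of the FIG. 1 selection heuristic for a conserved Ig block.

    conserved_length: length (nt) of the conserved block in the Ig region.
    distance_to_orf_start: nt from the end of the block to the start of the
        flanking ORF that begins downstream of it, or None if the block lies
        only 3' to flanking ORFs.
    near_start_nt: hypothetical cutoff for "near the start of an ORF";
        the text does not give a value.
    leader_exception_nt: blocks near an ORF start are kept only if longer
        than this (150 nt in the text).
    """
    if distance_to_orf_start is None:
        return True  # conservation 3' to an ORF: favoured
    if distance_to_orf_start > near_start_nt:
        return True  # far from the 5' start of an ORF: favoured
    # Near an ORF start the block could be a leader, so require a long block
    return conserved_length > leader_exception_nt


# An oxyS-like case: 185 nt of conservation close to the start of oxyR
print(keep_candidate(185, distance_to_orf_start=10))  # True
```

The oxyS example above is exactly the case the 150 nt exception was meant to rescue: a block positioned like a leader that is nonetheless long enough to be considered further.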
[0008] FIG. 2 shows the expression profiles across high-density
oligonucleotide arrays for representative Ig regions. Probe
intensities are shown for the indicated Ig regions (solid bars) and
the flanking ORFs (hatched bars), calculated as the perfect-match
minus the mismatch intensities. All negative differences were set
to zero. The data shown are for one experiment using cDNA probes,
but similar results were seen in the duplicate experiment and with
directly labeled RNA probes. The Ig regions and each flanking gene
generally contain 15 interrogating probes. Upward bars correspond
to genes transcribed on the Watson (W, clockwise) strand and
downward bars correspond to genes transcribed on the Crick (C,
counterclockwise) strand. The C strand signal for the CsrB Ig
region corresponds well with the known location of the csrB gene.
Similarly for the RprA Ig region, the W strand signal corresponds
with the location of the rprA gene, but only one probe is positive.
The W strand signal for #14 and the C strand signal for #17 overlap
well with the conserved regions shown in the BLAST analysis in FIG.
1. #36 was chosen for further analysis because of the strong C
strand signal; both flanking ORFs are on the W strand. For #52, low
levels of expression were seen on both strands; the very low level
for probes in the middle of the Ig on the C strand overlapped best
with the conserved region found by the BLAST searches (FIG. 1).
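The per-probe values plotted in FIG. 2 follow a simple rule: perfect-match minus mismatch intensity, with negative differences set to zero. A minimal sketch of that rule follows; the intensities are made-up numbers, and the per-region mean is a hypothetical summary added for illustration, not a statistic reported in the text.

```python
from typing import List, Sequence


def probe_signals(perfect_match: Sequence[float], mismatch: Sequence[float]) -> List[float]:
    """Per-probe signal: perfect-match minus mismatch, negatives set to zero."""
    if len(perfect_match) != len(mismatch):
        raise ValueError("perfect_match and mismatch must have the same length")
    return [max(pm - mm, 0.0) for pm, mm in zip(perfect_match, mismatch)]


def positive_probe_count(signals: Sequence[float]) -> int:
    """Number of probes with a signal above zero (cf. 'only one probe is positive')."""
    return sum(1 for s in signals if s > 0)


def mean_signal(signals: Sequence[float]) -> float:
    """Hypothetical per-region summary (mean); the text does not specify one."""
    return sum(signals) / len(signals) if signals else 0.0


# Made-up intensities for three probes on one strand of one Ig region
pm = [120.0, 80.0, 300.0]
mm = [100.0, 95.0, 50.0]
sig = probe_signals(pm, mm)
print(sig, positive_probe_count(sig), mean_signal(sig))  # [20.0, 0.0, 250.0] 2 90.0
```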
[0009] FIG. 3 shows detection of novel sRNAs by Northern
hybridization. Northern hybridization using strand specific probes
for each candidate was done on RNA extracted from MG1655 cells
grown under three different growth conditions: (E), exponential
growth in LB medium; (M), exponential growth in M63-glucose medium,
and (S) stationary phase in LB medium. Five μg of total RNA was
loaded in each lane. Exposure times were optimized for each panel
for visualization here; therefore, the signal intensities shown do
not indicate relative abundance between sRNAs. Oligonucleotide
probes were used for #12, #22, #55-I, #55-II, and #61; RNA probes
were used for all other panels. DNA molecular weight markers
(5'-end-labeled MspI-digested pBR322 DNA) were run with each set of
samples for direct estimation of RNA transcript length. One lane of
DNA molecular weight markers is shown for comparison; these sizes
are approximate, as there was slight variation in the running of
the gels.
[0010] FIG. 4 shows results of coimmunoprecipitation of sRNAs with
the Hfq protein. (A) Immunoprecipitations using extract from MG1655
cells grown in LB medium in exponential growth (OD600 = 0.4)
were done using no antibody (lane 1); 5 μl of preimmune serum
(lane 2); or 0.5, 1, 5, or 10 μl of Hfq antiserum (lanes 3-6).
Selected RNAs were fractionated on a 10% polyacrylamide urea gel
after 3'-end labeling. Asterisks mark RNA bands present in the
anti-Hfq precipitated samples but not in the preimmune control
samples; these therefore represent Hfq-interacting RNAs. (B)
Immunoprecipitations were done using extract from MG1655 cells
grown under three different growth conditions: (E) exponential
growth in LB medium; (M) exponential growth in M63-glucose medium;
and (S) stationary phase in LB medium. Immunoprecipitations were
carried out with 5 μl of preimmune serum (lane 1) or 5 μl of Hfq
antiserum (lane 2) and compared to total RNA from 1/10 extract
equivalent used in the immunoprecipitations (lane 3). RNAs were
fractionated on 10% polyacrylamide urea gels and analyzed by
Northern hybridization using RNA probes to previously known sRNAs
or our novel RNAs as indicated.
[0011] Brief Description of the Sequences
Candidate Number | SEQ ID NO |
12 | 1 |
14 | 2 |
22 | 3 |
24 | 4 |
25 | 5 |
26 | 6 |
27 | 7 |
31 | 8 |
38 | 9 |
40 | 10 |
41-I | 11 |
41-II | 12 |
52-I | 13 |
52-II | 14 |
55-I | 15 |
55-II | 16 |
61 | 17 |
8 | 18 |
43 | 19 |
9 (nucleotide) | 20 |
9 (amino acid) | 21 |
17 (nucleotide) | 22 |
17 (amino acid) | 23 |
28 (nucleotide) | 24 |
28 (amino acid) | 25 |
36 (nucleotide) | 26 |
36 (amino acid) | 27 |
49 (nucleotide) | 28 |
49 (amino acid) | 29 |
50 (nucleotide) | 30 |
50 (amino acid) | 31 |
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
[0012] By "RNA" or "gene product" or "transcription product" is
meant the RNA encoded by the E. coli gene or RNA substantially
homologous or complementary thereto or a derivative or fragment
thereof having RNA activity. Encompassed by the definition of "RNA"
are variants of RNA in which there have been trivial mutations such
as substitutions, deletions, insertions or other modifications of
the native RNA. The term "substantial homology" or "substantial
identity", when referring to polypeptides or polynucleotides,
indicates that the sequence of a polypeptide or polynucleotide in
question, when properly aligned, exhibits at least about 30%
identity with the sequence of an entire naturally occurring
polypeptide or polynucleotide or a portion thereof. Polynucleotides
of the present invention which are homologous or substantially
homologous to, for example, the polynucleotides of the invention
usually have at least about 70% identity to the sequences shown in the
Sequence Listing, preferably at least about 90% identity and most
preferably at least about 95% identity, or are complements thereof.
Any technique known in the art can be used to sequence
polynucleotides, including, for example, dideoxynucleotide
sequencing (Sanger et al. 1977 PNAS USA 74:5463-5467), or using the
Sequenase™ kit (United States Biochemical Corp.). Homologs of
polynucleotides and polypeptides, whether synthetically or
recombinantly produced or found in nature, are also encompassed by
the scope of the invention, and are herein defined as
polynucleotides and polypeptides which are homologous to,
respectively, polynucleotides and polypeptides of the invention, or
fragments, variants, or complements thereof. Homologous
polynucleotides and polypeptides are generally encoded by
homologous genes as described above, and retain significant amino
acid residue or nucleotide identity to the genes of the invention.
Such polypeptides can be expressed by other organisms such as
bacteria, yeast and higher order organisms such as mammals. Various
methods of determining amino acid residue or nucleotide identity
are known in the art. Homologous polynucleotides or polypeptides
can be obtained by in vitro synthesis by expressing genes derived
from other bacteria or by mutagenizing genes of the invention. Also
included in the definition of "substantially homologous
polynucleotides" would be those polynucleotides which, when
annealed under conditions known in the art, would remain annealed
under moderate wash conditions also known in the art (such as
washing in 6×SSPE twice at room temperature and then twice at
37 °C) (Wahl et al. 1987 Methods in Enzymology 152, Academic
Press Inc., San Diego).
[0013] Polynucleotide and polypeptide homology is typically
measured using sequence analysis software. See, e.g., Sequence
Analysis Software Package of the Genetics Computer Group,
University of Wisconsin Biotechnology Center, 1710 University
Avenue, Madison, Wis. 53705.
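As an illustration of what the identity thresholds above measure, the sketch below computes percent identity for two sequences that have already been aligned (for example, by sequence analysis software such as that cited above). It is a simplification under stated assumptions: identity is taken as identical columns divided by aligned columns, which is only one of several conventions in use, the function names are hypothetical, and "at least about" is treated as a hard cutoff.

```python
from typing import Optional


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two sequences that are already aligned.

    Identity = identical columns / aligned columns * 100. Columns where both
    sequences have a gap are skipped; a residue opposite a gap counts as a
    mismatch. Other conventions (e.g. dividing by the shorter sequence length)
    give different numbers.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap and y == gap:
            continue
        columns += 1
        if x == y:
            matches += 1
    return 100.0 * matches / columns if columns else 0.0


def identity_tier(pct: float) -> Optional[int]:
    """Highest threshold named in the text (95, 90, 70, 30) that pct meets."""
    for threshold in (95, 90, 70, 30):
        if pct >= threshold:
            return threshold
    return None


pct = percent_identity("ACGTACGTAC", "ACGTACGTAA")
print(pct, identity_tier(pct))  # 90.0 90
```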
[0014] By "polynucleotide" or "nucleic acid" is meant a single- or
double-stranded DNA, genomic DNA, cDNA, RNA, DNA-RNA hybrid, or a
polymer comprising purine and pyrimidine bases, or other natural,
chemically or biochemically modified or containing non-natural or
derivatized nucleotide bases. The backbone of the polynucleotide
can comprise sugars and phosphate groups (as typically found in RNA
or DNA), or modified or substituted sugar or phosphate groups.
Alternatively, the backbone of the polynucleotide can comprise a
polymer of synthetic subunits such as phosphoramidates and is thus
an oligodeoxynucleoside phosphoramidate (P-NH2) or a mixed
phosphoramidate-phosphodiester oligomer (Peyrottes et al. 1996
Nucleic Acids Res 24:1841-8; Chaturvedi et al. 1996 Nucleic Acids
Res 24:2318-23; and Schultz et al. 1996 Nucleic Acids Res
24:2966-73). In another embodiment, a phosphorothioate linkage can
be used in place of a phosphodiester linkage (Braun et al. 1988 J
Immunol 141:2084-9; and Latimer et al. 1995 Mol Immunol
32:1057-1064). In addition, a double-stranded polynucleotide can be
obtained from the single-stranded polynucleotide product of
chemical synthesis either by synthesizing the complementary strand
and annealing the strands under appropriate conditions, or by
synthesizing the complementary strand de novo using a DNA
polymerase with an appropriate primer.
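The complementary strand that must be synthesized, or that primer extension will produce, is the reverse complement of the chemically synthesized strand, because the two strands pair antiparallel. A minimal sketch, assuming unambiguous A/C/G/T input only; the function names and the example sequence are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Complementary strand of a DNA sequence, written 5'->3' (antiparallel)."""
    seq = dna.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("only unambiguous A/C/G/T bases are handled in this sketch")
    return seq.translate(_COMPLEMENT)[::-1]


def is_perfect_duplex(strand: str, partner: str) -> bool:
    """True if partner pairs with strand along its full length with no mismatches."""
    return len(strand) == len(partner) and reverse_complement(strand) == partner.upper()


top = "ATGCGT"
bottom = reverse_complement(top)
print(bottom)                          # ACGCAT
print(is_perfect_duplex(top, bottom))  # True
```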
[0015] A nucleic acid is said to "encode" an RNA or a polypeptide
if, in its native state or when manipulated by methods known to
those skilled in the art, it can be transcribed and/or translated
to produce the RNA, the polypeptide or a fragment thereof. The
anti-sense strand of such a nucleic acid is also said to encode the
sequence. The polynucleotides of the present invention comprise
those which are naturally-occurring, synthetic or recombinant.
[0016] A "recombinant" nucleic acid is one which is chemically
synthesized or the product of the artificial manipulation of
isolated segments of nucleic acids, e.g., by genetic engineering
techniques. Isolated segments within a recombinant nucleic acid can
be naturally occurring sequences.
[0017] By "polynucleotide" or "gene" or "RNA" and the like is meant
a polynucleotide encoding or comprising the RNA of the invention,
or a homolog, fragment, derivative or complement thereof and having
RNA activity as described herein. As is known in the art, a DNA can
be transcribed by an RNA polymerase to produce RNA, and an RNA can
be reverse transcribed by reverse transcriptase to produce a DNA.
Thus a DNA can encode an RNA and vice versa.
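The sequence-level relationship between a DNA and the RNA it encodes can be sketched as follows. This is an illustration under simplifying assumptions (a DNA coding strand as input, uppercase A/C/G/T/U only, hypothetical function names), not a model of the enzymology: transcription yields an RNA matching the coding strand with U in place of T, and reverse transcription yields the antiparallel complementary DNA.

```python
_RNA_TO_CDNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def transcribe(coding_strand: str) -> str:
    """RNA with the sequence of the DNA coding (sense) strand: T is replaced by U."""
    return coding_strand.upper().replace("T", "U")


def reverse_transcribe(rna: str) -> str:
    """First-strand cDNA: the antiparallel DNA complement of the RNA, written 5'->3'."""
    return "".join(_RNA_TO_CDNA[base] for base in reversed(rna.upper()))


dna = "ATGACC"
rna = transcribe(dna)           # "AUGACC"
cdna = reverse_transcribe(rna)  # "GGTCAT", complementary to the original coding strand
print(rna, cdna)
```

Running the same two steps on the cDNA recovers the original coding-strand sequence, which is the sense in which a DNA can encode an RNA and vice versa.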
[0018] The invention also encompasses vectors such as single- and
double-stranded plasmids or viral vectors comprising RNA, DNA or a
mixture or variant thereof, further comprising a polynucleotide of
the invention. A wide variety of suitable expression systems are
known in the art and are selected based on the host cells used,
inducibility of expression desired and ease of use. The
non-transcribed portions of a gene and the non-coding portions of a
gene can be modified as known in the art. For example, the native
promoters can be deleted, substituted or supplemented with other
promoters known in the art; transcriptional enhancers, inducible
promoters or other transcriptional control elements can be added,
as can be replication origins and replication initiator proteins,
autonomously replicating sequence (ARS), marker genes (e.g.
antibiotic resistance markers), sequences for chromosomal
integration (e.g., viral integration sites or sequences homologous
to chromosomal sequences), restriction sites, multiple cloning
sites, ribosome-binding sites, RNA splice sites, polyadenylation
sites, transcriptional terminator sequences, mRNA stabilizing
sequences, 5' stem-loop to protect against degradation, and other
elements commonly found on plasmids and other vectors known in the
art. Secretion signals from secreted polypeptides can also be
included to allow the polypeptide to cross and/or lodge in cell
membranes or be secreted from the cell. Such vectors can be
prepared by means of standard recombinant techniques discussed, for
example, in Sambrook et al. 1989 Molecular Cloning: A Laboratory
Manual, 2nd edition, Cold Spring Harbor Press, Cold Spring Harbor
Laboratory, N.Y.; and Ausubel et al. (eds.), 1987 Current Protocols
in Molecular Biology, Greene Publishing Associates, Brooklyn,
N.Y. Many useful vectors are known in the art and can be obtained
from vendors including, but not limited to, Stratagene, New England
Biolabs, and Promega Biotech.
[0019] An appropriate promoter and other necessary vector sequences
are selected so as to be functional in the chosen host. While
prokaryotic host cells are preferred, mammalian or other eukaryotic
host cells, including, but not limited to, yeast, filamentous
fungi, plant, insect, amphibian or avian species, can also be
useful for production of the polypeptides of the present invention.
See, Kruse et al. (eds.) 1973 Tissue Culture Academic Press.
Examples of workable combinations of cell lines and expression
vectors are described in Sambrook et al. 1989 or Ausubel et al.
1987; see also, e.g., Metzger et al. 1988 Nature 334:31-36.
Examples of commonly used mammalian host cell lines are VERO and
HeLa cells, Chinese hamster ovary (CHO) cells, and WI-38, BHK, and
COS cell lines, or others as appropriate, e.g., to provide higher
expression, desirable glycosylation patterns, etc.
[0020] By "bacterial host cell" or "bacteria" or "bacterium" is
meant various micro-organism(s) containing at least one chromosome
but lacking a discrete nuclear membrane. Representatives include E.
coli, Bacillus, Salmonella, Pseudomonas, Staphylococcus and other
eubacteria, archaebacteria, chlamydia and rickettsia and related
organisms, and the like, and may be spherical, rod-like, straight,
curved, spiral, filamentous or other shapes.
[0021] Vectors suitable for use with various cells can comprise
promoters which can, when appropriate, include those naturally
associated with genes of the invention. Promoters can be operably
linked to a polynucleotide of the invention.
[0022] A nucleic acid sequence is "operably linked" when it is
placed into a functional relationship with another nucleic acid
sequence. For instance, a promoter is operably linked to a coding
sequence if the promoter affects the transcription or expression of
the gene. Generally, operably linked means that the DNA sequences
being linked are contiguous and, where necessary to join two
protein coding regions, contiguous and in reading frame.
[0023] Promoters can be inducible or repressible by factors which
respond biochemically to changes in temperature, osmolarity, carbon
source, sugars, etc., as is known in the art. Promoters including,
but not limited to, the trp, lac and phage promoters, tRNA
promoters and glycolytic enzyme promoters can be used in
prokaryotic hosts. Useful yeast promoters include, but are not
limited to, the promoter regions for metallothionein,
3-phosphoglycerate kinase or other glycolytic enzymes such as
enolase or glyceraldehyde-3-phosphate dehydrogenase, and enzymes
responsible for maltose and galactose utilization. Appropriate
foreign mammalian promoters include, but are not limited to, the
early and late promoters from SV40 (Fiers et al. 1978 Nature
273:113-120) and promoters derived from murine Moloney leukemia
virus, mouse mammary tumor virus, avian sarcoma viruses, adenovirus
II, bovine papilloma virus or polyoma. In addition, the construct
can be joined to an amplifiable gene (e.g., DHFR) so that multiple
copies of the construct can be made. For appropriate enhancer and
other expression control sequences suitable for vectors, see also
Enhancers and Eukaryotic Gene Expression, Cold Spring Harbor Press:
N.Y. 1983.
[0024] While expression vectors are preferably autonomously
replicating, they can also be inserted into the genome of the host
cell by methods known in the art. Expression and cloning vectors
preferably contain a selectable marker which is a gene encoding a
protein necessary under at least one condition for the survival or
growth of a host cell transformed with the vector. The presence of
this gene ensures the growth of only those host cells which express
the inserts. Typical selection genes are known in the art and
include, but are not limited to, those which encode proteins that
(a) confer resistance to antibiotics or other toxic substances,
e.g., ampicillin, neomycin, methotrexate, etc.; (b) complement
auxotrophic deficiencies, or (c) supply critical nutrients not
available from complex media, e.g., the gene encoding D-alanine
racemase for Bacilli. The choice of the proper selectable marker
depends on the host cell, as appropriate markers for different
hosts are well known.
[0025] As one of skill in the art will understand, the choice in
construction and arrangement of markers, promoters, origins of
replication, etc. in various vectors of the present invention will
be dictated by the desired level and timing of expression of RNA of
the invention, with the ultimate goal of regulating the production
of metabolic compounds in the host cell.
[0026] By "protein" or "polypeptide" is meant a polypeptide encoded
by the E. coli gene of the invention or a polypeptide substantially
homologous thereto and having protein activity. Encompassed by the
proteins of the invention are variants thereof in which there have
been trivial substitutions, deletions, insertions or other
modifications of the native polypeptide which substantially retain
protein characteristics, particularly silent or conservative
substitutions. Silent nucleotide substitutions are changes of one
or more nucleotides which do not change any amino acid of the protein.
Conservative substitutions include substitutions within the
following groups: glycine, alanine; valine, isoleucine, leucine;
aspartic acid, glutamic acid; asparagine, glutamine; serine,
threonine; lysine, arginine; and phenylalanine, tyrosine. Such
conservative substitutions are not expected to interfere with
biochemical activity, particularly when they occur in structural
regions (e.g., alpha helices or beta pleated sheets) of the
polypeptide, which can be predicted by standard computer analysis
of the amino acid sequence of the protein. Also encompassed by the
claimed polypeptides of the invention are polypeptides encoded by
polynucleotides which are substantially homologous to a
polynucleotide of the invention.
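The conservative-substitution groups listed above lend themselves to a direct check. The sketch below, with hypothetical function names and a made-up peptide, classifies each position at which a same-length variant differs from the native sequence. It uses only the groups named in the text and does not attempt to predict the structural regions mentioned.

```python
from typing import List, Tuple

# Conservative substitution groups as listed in the text (one-letter codes)
CONSERVATIVE_GROUPS = [
    frozenset("GA"),   # glycine, alanine
    frozenset("VIL"),  # valine, isoleucine, leucine
    frozenset("DE"),   # aspartic acid, glutamic acid
    frozenset("NQ"),   # asparagine, glutamine
    frozenset("ST"),   # serine, threonine
    frozenset("KR"),   # lysine, arginine
    frozenset("FY"),   # phenylalanine, tyrosine
]


def is_conservative(a: str, b: str) -> bool:
    """True if residues a and b fall in the same group above."""
    a, b = a.upper(), b.upper()
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def classify_substitutions(native: str, variant: str) -> List[Tuple[int, str, str, str]]:
    """List (1-based position, native residue, variant residue, kind) for each change.

    Only same-length variants (pure substitutions) are handled in this sketch.
    """
    if len(native) != len(variant):
        raise ValueError("native and variant must have the same length")
    changes = []
    for pos, (a, b) in enumerate(zip(native.upper(), variant.upper()), start=1):
        if a != b:
            kind = "conservative" if is_conservative(a, b) else "non-conservative"
            changes.append((pos, a, b, kind))
    return changes


print(classify_substitutions("GVDKS", "AVEWT"))
```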
[0027] Nucleic acids encoding the polypeptides of the present
invention include not only native or wild-type sequences but also
any sequence capable of encoding the polypeptide, which can be
synthesized by making use of the redundancy in the genetic code.
Various codon substitutions can be introduced, e.g., silent or
conservative changes as discussed above. Due to degeneracy in the
genetic code there is some degree of flexibility in the third base
of each codon and some amino acid residues are encoded by several
different codons. Each possible codon could be used in the gene to
encode the protein. While this may appear to present innumerable
choices, in practice, each host has a particular preferred codon
usage, so that genes can be tailored for optimal translation in the
host in which they are expressed. Thus, synthetic genes that encode
the proteins of the invention are included in this invention.
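The degeneracy argument can be made concrete with the standard genetic code. In this minimal sketch, `back_translate` builds one synthetic coding sequence from a protein using a per-amino-acid codon preference map; the map shown is a hypothetical placeholder rather than measured E. coli codon usage, and residues without a preference fall back to the alphabetically first synonymous codon. Translating the result recovers the protein, which is the property any codon-tailored synthetic gene must keep.

```python
from itertools import product
from typing import Dict, List, Optional

# Standard genetic code; codons enumerated in TCAG order (last base varies fastest)
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO)
}

# Amino acid (or "*" for stop) -> all codons that encode it
SYNONYMS: Dict[str, List[str]] = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def translate(dna: str) -> str:
    """Translate an in-frame DNA coding sequence (length a multiple of 3)."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def back_translate(protein: str, preferred: Optional[Dict[str, str]] = None) -> str:
    """Build one DNA sequence encoding `protein`.

    `preferred` maps an amino acid to the codon to use for it (a stand-in for a
    host's codon preference); other residues get the alphabetically first
    synonymous codon.
    """
    preferred = preferred or {}
    codons = []
    for aa in protein.upper():
        codon = preferred.get(aa, min(SYNONYMS[aa]))
        if CODON_TABLE.get(codon) != aa:
            raise ValueError(f"{codon} does not encode {aa}")
        codons.append(codon)
    return "".join(codons)


# Hypothetical preferences, not measured E. coli codon-usage data
prefs = {"L": "CTG", "R": "CGT"}
gene = back_translate("MLR", prefs)
print(gene, translate(gene))  # ATGCTGCGT MLR
```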
[0028] Techniques for nucleic acid manipulation are described
generally, for example, in Sambrook et al. (1989) and Ausubel et
al. (1987). Reagents useful in applying such techniques, such as
restriction enzymes and the like, are widely known in the art and
commercially available from vendors including, but not limited to,
New England BioLabs, Boehringer Mannheim, Amersham, Promega
Biotech, U.S. Biochemicals, New England Nuclear, and a number of
other sources.
[0029] Nucleic acid probes and primers based on sequences of the
invention can be prepared by standard techniques. Such a probe or
primer comprises an isolated nucleic acid. In the case of probes,
the nucleic acid further comprises a label (e.g., a radionuclide
such as [³²P]-ATP or ³⁵S) or a reporter molecule (e.g., a
ligand such as biotin or an enzyme such as horseradish peroxidase).
The [³²P]-ATP, [³⁵S]-dATP and [³⁵S]-methionine can
be purchased, for example, from DuPont NEN (Wilmington, Del.).
Probes can be used to identify the presence of a hybridizing
nucleic acid sequence, e.g., an RNA in a sample or a cDNA or
genomic clone in a library. Primers can be used, for example, for
amplification of nucleic acid sequences, e.g., by the polymerase
chain reaction (PCR). See, e.g., Innis et al. (eds.) 1990 PCR
Protocols: A Guide to Methods and Applications, Academic Press: San
Diego. The preparation and use of probes and primers is described,
e.g., in Sambrook et al. (1989) or Ausubel et al. (1987). Genes
encoding homologs of the RNAs of the invention in other species can
be obtained by generating cDNA from RNA of such species using any
technique known in the art, such as the Riboclone cDNA Synthesis
Systems AMV RT (Promega, Madison, Wis.), and then probing the cDNA
with radiolabeled primers containing portions (e.g., 30 or 40
bases long) of the sequences disclosed herein. To obtain homologs
of the proteins of the invention, degenerate primers can be designed
that encode the amino acid sequence of the disclosed E. coli protein
but differ in codon usage from the sequences disclosed.
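The degenerate-primer approach mentioned above can be illustrated with a minimal Python sketch. It back-translates a short peptide into a degenerate oligonucleotide written in IUPAC ambiguity codes. It also reports the fold degeneracy, which is the number of distinct sequences in the primer pool. The sketch assumes the standard genetic code and collapses synonymous codons position by position. The function names and the example peptide are hypothetical and do not come from the specification.

```python
from itertools import product
from math import prod

# Standard genetic code (NCBI translation table 1); codons are enumerated with
# the first base varying slowest, in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC nucleotide ambiguity codes keyed by the set of bases they represent.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def codon_position_sets(aa):
    """For each of the three codon positions, the set of bases used by codons for aa."""
    codons = [codon for codon, residue in CODON_TABLE.items() if residue == aa]
    if not codons:
        raise ValueError(f"unknown amino acid: {aa!r}")
    return [frozenset(codon[i] for codon in codons) for i in range(3)]


def degenerate_primer(peptide):
    """Back-translate a peptide into an IUPAC-coded primer and its fold degeneracy."""
    position_sets = [s for aa in peptide.upper() for s in codon_position_sets(aa)]
    primer = "".join(IUPAC[s] for s in position_sets)
    degeneracy = prod(len(s) for s in position_sets)
    return primer, degeneracy


if __name__ == "__main__":
    # Hypothetical peptide, chosen only to show the output format.
    print(degenerate_primer("MWKPE"))
```

Collapsing codons position by position over-represents some sequences. For example, Leu becomes YTN, which also covers the Phe codons TTT and TTC. For this reason, primers are usually placed over stretches rich in low-degeneracy residues such as Met and Trp.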
[0030] Antisense and ribozyme nucleic acids capable of specifically
binding to sequences of the invention are also useful for
interfering with gene expression.
[0031] The nucleic acids of the present invention (whether sense or
anti-sense, and whether encoding the genes of the invention, or a
homolog, variant, fragment or complement thereof) can be produced
in large amounts by replication of a suitable recombinant vector
comprising DNA sequences in a compatible host cell. Alternatively,
these nucleic acids can be chemically synthesized, e.g., by any
method known in the art, including, but not limited to, the
phosphoramidite method described by Beaucage et al. 1981 Tetrahedron
Lett 22:1859-1862, and the triester method according to Matteucci
et al. 1981 J Am Chem Soc 103:3191, preferably using commercial
automated synthesizers. The purification of nucleic acids produced
by the methods of the present invention can be achieved by any
method known in the art including, but not limited to, those
described, e.g., in Sambrook et al. (1989), or Ausubel et al.
(1987). Numerous commercial kits are available for DNA purification
including Qiagen plasmid mini DNA cartridges (Chatsworth,
Calif.).
[0032] The nucleic acids of the present invention can be introduced
into host cells by any method known in the art, which vary
depending on the type of cellular host, including, but not limited
to, electroporation; transfection employing calcium chloride,
rubidium chloride, calcium phosphate, DEAE-dextran, or other
substances; microprojectile bombardment; P1 transduction; use of
suicide vectors; lipofection; infection (where the vector is an
infectious agent, such as a retroviral genome); and other methods.
See generally, Sambrook et al. (1989), and Ausubel et al. (1987).
The cells into which these nucleic acids have been introduced also
include the progeny of such cells.
[0033] A polypeptide "fragment", "portion", or "segment" is a
stretch of amino acid residues of at least about 7 to 19 amino
acids (or the minimum size retaining an antigenic determinant). A
fragment of the present invention can comprise a portion of at
least 20 amino acids of the protein sequence, at least 30 amino
acids of the protein sequence, at least 40 amino acids of the
protein sequence, at least 50 amino acids of the protein sequence,
or all or substantially all of the protein sequence. In addition,
the invention encompasses polypeptides which comprise a portion of
the sequence of the lengths described in this paragraph, which
further comprise additional amino acid sequences on the ends or in
the middle of sequences. The additional amino acid sequences can,
for example, comprise another protein or a functional domain
thereof, such as signal peptides, membrane-binding moieties,
etc.
[0034] A polynucleotide fragment of the present invention can
comprise a polymer of at least six bases or base pairs. A fragment
of the present invention can comprise at least 6, at least 10, at
least 20, at least 40, at least 50, at least 100, at least 150, at
least 200, at least 250, or at least 300 bases or base pairs of the
gene sequence. In addition,
the invention encompasses polynucleotides which comprise a portion
of the sequence of the lengths described in this paragraph, which
further comprise additional nucleic acid sequences on the 5' or 3'
end or inserted into the sequence. These additional sequences can,
for example, encode a coding region of a gene or a functional
domain thereof or a promoter.
[0035] The terms "isolated", "pure", "substantially pure", and
"substantially homogeneous" are used interchangeably to describe a
polypeptide or polynucleotide that has been separated from
components which naturally accompany it. A monomeric protein or a
polynucleotide is substantially pure when at least about 60 to 75%
of a sample exhibits a single polypeptide or polynucleotide
sequence. A substantially pure protein or polynucleotide typically
comprises about 60 to 90% by weight of a protein or polynucleotide
sample, more usually about 95%, and preferably will be over about
99% pure.
[0036] Protein or polynucleotide purity or homogeneity may be
indicated by a number of means, such as polyacrylamide gel
electrophoresis of a sample, followed by visualizing a single band
upon staining the gel. For certain purposes higher resolution can
be provided by using high performance liquid chromatography (HPLC)
or other means well known in the art for purification.
[0037] An RNA or a protein is "isolated" when it is substantially
separated from the contaminants which accompany it in its natural
state. Thus, a polypeptide which is chemically synthesized or
expressed as a recombinant protein, i.e., an expression product of
an isolated and manipulated genetic sequence, is considered
isolated. A recombinant polypeptide is considered "isolated" even
if expressed in a homologous cell type.
[0038] A polypeptide can be purified from cells in which it is
produced by any of the purification methods known in the art. For
example, such polypeptides can be purified by immunoaffinity
chromatography employing, e.g., the antibodies provided by the
present invention. Various methods of protein purification include,
but are not limited to, those described in Guide to Protein
Purification, ed. Deutscher, vol. 182 of Methods in Enzymology,
Academic Press, Inc., San Diego, 1990, and Scopes, 1982 Protein
Purification: Principles and Practice Springer-Verlag, New
York.
[0039] Polypeptide fragments of the protein of the invention are
first obtained by digestion with enzymes such as trypsin,
clostripain, or Staphylococcus protease, or with chemical agents
such as cyanogen bromide, O-iodosobenzoate, hydroxylamine, or
2-nitro-5-thiocyanobenzoate. Peptide fragments can be separated by
reversed-phase HPLC and analyzed by gas-phase sequencing. Peptide
fragments are used to determine the partial amino acid sequence of
a polypeptide by methods known in the art, including, but not
limited to, Edman degradation.
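The specificity of these cleavage agents can be approximated with simple sequence rules. The Python below is a minimal in silico digest that assumes textbook simplifications:

- Trypsin cuts after Lys or Arg unless the next residue is Pro.
- Clostripain cuts after Arg.
- Staphylococcus aureus V8 protease cuts after Glu.
- Cyanogen bromide cuts after Met.

Real digests also show missed cleavages and buffer-dependent specificity, and the sketch ignores both. The rule table and the example sequence are illustrative assumptions.

```python
import re

# Simplified cleavage rules, written as zero-width regular expressions that
# match the position immediately C-terminal to the cut residue.
CLEAVAGE_RULES = {
    "trypsin": r"(?<=[KR])(?!P)",      # after Lys/Arg, not before Pro
    "clostripain": r"(?<=R)",          # after Arg
    "v8_protease": r"(?<=E)",          # S. aureus V8 (Glu-C), Glu only here
    "cyanogen_bromide": r"(?<=M)",     # after Met
}


def digest(protein, agent):
    """Return the fragments produced by complete cleavage with `agent`."""
    try:
        rule = CLEAVAGE_RULES[agent]
    except KeyError:
        raise ValueError(f"no rule for {agent!r}") from None
    # re.split accepts zero-width patterns (Python 3.7+); drop empty pieces.
    return [piece for piece in re.split(rule, protein.upper()) if piece]


if __name__ == "__main__":
    peptide = "MAKPRGEKR"  # hypothetical sequence
    for agent in CLEAVAGE_RULES:
        print(agent, digest(peptide, agent))
```

Comparing the predicted fragments from two different agents is one way to choose overlapping peptides for Edman sequencing.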
[0040] The present invention also provides polyclonal and/or
monoclonal antibodies capable of specifically binding to a
polypeptide of the invention, or homolog, fragment, complement or
derivative thereof. Antibodies can also be produced which bind
specifically to a polynucleotide of the invention, such as an RNA
of the invention or homolog, fragment, complement or derivative
thereof, and may be produced as described in, for example, Thiry
1994 Chromosoma 103:268-76; Thiry 1993 Eur J Cell Biol 62:259-69;
Reines 1991 J Biol Chem 266:10510-7; Putterman et al. 1996 J Clin
Invest 97:2251-9; and Fournie 1996 Clin Exp Immunol 104:236-40.
Antibodies capable of binding to polypeptides or polynucleotides of
the invention can be useful for detecting, titrating, or
quantifying protein, for purifying protein or polynucleotide, or
for other uses.
[0041] For production of polyclonal antibodies, an appropriate host
animal is selected, typically a mouse or rabbit. The substantially
purified antigen, whether the whole polypeptide, a fragment,
derivative, or homolog thereof, or a polypeptide coupled or fused
to another polypeptide, or polynucleotide or homolog, derivative,
complement or fragment thereof, is presented to the immune system
of the host by methods appropriate for the host, commonly by
injection into the footpads, intramuscularly, intraperitoneally, or
intradermally. Peptide fragments suitable for raising antibodies
can be prepared by chemical synthesis, and are commonly coupled to
a carrier molecule (e.g., keyhole limpet hemocyanin) and injected
into a host over a period of time suitable for the production of
antibodies. The sera are tested for immunoreactivity to the protein
or fragment. Monoclonal antibodies can be made by injecting the
host with the polypeptides, fusion proteins, or fragments
thereof and following methods known in the art for production of
such antibodies (Harlow et al. 1988 Antibodies: A Laboratory
Manual, Cold Spring Harbor Laboratories).
[0042] An immunological response is usually assayed with an
immunoassay, a variety of which are provided, e.g., in Harlow et
al. 1988, or Goding 1986 Monoclonal Antibodies: Principles and
Practice, 2d ed., Academic Press, New York, although any method
known in the art can be used.
[0043] Monoclonal antibodies with affinities of 10⁸ M⁻¹,
preferably 10⁹ to 10¹⁰ M⁻¹, or stronger are made by standard
procedures as described, e.g., in Harlow et al. 1988, or Goding
1986. Briefly, appropriate animals are immunized with the antigen
by a standard protocol. After the appropriate period of time, the
spleens of such animals are excised and individual spleen cells
fused to immortalized myeloma cells. Thereafter the cells are
clonally separated and the supernatants of each clone are tested
for their production of an appropriate antibody specific for the
desired region of the antigen.
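The affinities in this paragraph are association constants (Ka, in M⁻¹). The dissociation constant is the reciprocal, Kd = 1/Ka. The stated range therefore corresponds to Kd values of:

- 10 nM for 10⁸ M⁻¹
- 1 nM for 10⁹ M⁻¹
- 0.1 nM for 10¹⁰ M⁻¹

The short Python sketch below performs only this unit conversion. The function names are illustrative.

```python
def kd_molar(ka_per_molar):
    """Convert an association constant Ka (M^-1) to a dissociation constant Kd (M)."""
    if ka_per_molar <= 0:
        raise ValueError("Ka must be positive")
    return 1.0 / ka_per_molar


def kd_nanomolar(ka_per_molar):
    """Same conversion, expressed in nanomolar (1 M = 1e9 nM)."""
    return kd_molar(ka_per_molar) * 1e9


if __name__ == "__main__":
    for ka in (1e8, 1e9, 1e10):
        print(f"Ka = {ka:.0e} M^-1  ->  Kd = {kd_nanomolar(ka):g} nM")
```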
[0044] Other suitable techniques of antibody production include,
but are not limited to, in vitro exposure of lymphocytes to the
antigenic polypeptides or selection of libraries of antibodies in
phage or similar vectors (Huse et al. 1989 Science
246:1275-1281).
[0045] Frequently, the polypeptides and antibodies are labeled,
either covalently or noncovalently, with a substance which provides
for a detectable signal. A wide variety of labels and conjugation
techniques are known. Suitable labels include, but are not limited
to, radionuclides, enzymes, substrates, cofactors, inhibitors,
fluorescent agents, chemiluminescent agents, and magnetic particles.
Also, recombinant immunoglobulins can be produced by any method
known in the art.
Identification of Novel Small RNAs Using Comparative Genomics and
Microarrays
[0046] As a starting point for detecting novel sRNAs in E. coli, we
considered a number of common properties of the previously
identified sRNAs that might serve as a guide to identify genes
encoding new sRNAs. We define sRNAs as relatively short RNAs
that do not function by encoding a complete ORF. Of the 13 small
RNAs known when this work began, we were struck by the high
conservation of these genes between closely related organisms. In
most cases, the conservation between E. coli and Salmonella was
above 85%, whereas that of the typical gene encoding an ORF was
frequently below 70%. Conservation tests on random noncoding
regions of the genome suggested that extended conservation in
intergenic regions was unusual enough to be used as an initial
parameter to screen for new sRNA genes. We therefore tested this
approach to look for novel sRNAs in the E. coli genome.
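The conservation percentages cited above are percent identities between aligned sequences, and definitions of percent identity vary. The minimal Python sketch below makes three assumptions:

- The two sequences are already aligned, with gaps written as '-'.
- Columns that are gaps in both sequences are ignored.
- A gap opposite a base counts as a mismatch.

Other tools divide by the shorter sequence length or by the ungapped aligned length, which can give a different value for the same pair. The example sequences are made up.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned sequences of equal length.

    Columns that are gaps in both sequences are skipped; a gap opposite a
    residue counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap and y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


if __name__ == "__main__":
    # Hypothetical aligned fragments (4 identical columns out of 6).
    print(round(percent_identity("ACGT-A", "ACCTTA"), 1))
```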
[0047] All known sRNAs are encoded within intergenic (Ig) regions
(defined as regions between ORFs). A file containing all Ig
sequences from the E. coli genome (Blattner, F. R. et al. 1997
Science 277:1453-1474) was used as a starting point for our
homology search. We arbitrarily chose the 1.0- to 2.5-Mb region of
the 4.6-Mb E. coli genome to test and refine our approach and
developed the following steps for searching the full E. coli
genome.
[0048] All Ig regions of 180 nucleotides (nt) or larger were
compared to the NCBI Unfinished Microbial Genomes database using
the BLAST program (Altschul, S. F. et al. 1990 J Mol Biol
215:403-410). These 1097 Ig regions were rated based on the degree
of conservation and length of the conserved region when compared to
the closely related Salmonella and Klebsiella pneumoniae species.
The highest rating was given to Ig regions with a high degree of
conservation (raw BLAST score of >80) over at least 80 nt (see
below for explanation of ratings). Note that most promoters do not
meet these length and conservation requirements. FIG. 1 shows a set
of BLAST searches for three known sRNAs (RprA RNA, CsrB RNA, OxyS
RNA), three Ig regions with high conservation (#14, #17, #52) and
one Ig region with intermediate conservation (#36). Some Ig regions
had a large number of matches, often to several chromosomal regions
of the same organism. These Ig regions were noted and many were
found to contain tRNAs, rRNAs, REP, or other repeated sequences.
The 40 highly conserved Ig regions containing tRNAs and/or rRNAs
were eliminated from our search, as these regions were complicated
in their patterns of conservation.
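This first screening step can be summarized as a filter over Ig regions. The Python sketch below encodes only the rules stated in this paragraph:

- Regions shorter than 180 nt are skipped.
- Regions containing tRNA or rRNA genes are dropped.
- The highest rating is assigned when any hit from a related genome has a raw BLAST score above 80 over at least 80 nt.

The intermediate ratings are explained elsewhere in the specification and are not reproduced here. The data classes, field names, and example values are hypothetical, and parsing of actual BLAST output is left out.

```python
from dataclasses import dataclass, field


@dataclass
class BlastHit:
    subject: str          # e.g. a Salmonella or Klebsiella contig identifier
    raw_score: float      # raw BLAST score
    aligned_length: int   # length of the conserved (aligned) block, nt


@dataclass
class IgRegion:
    name: str
    length: int
    contains_stable_rna: bool = False   # tRNA and/or rRNA genes inside the region
    hits: list = field(default_factory=list)


def top_tier(region, min_score=80, min_block=80):
    """True if any hit shows high conservation (score > min_score over >= min_block nt)."""
    return any(h.raw_score > min_score and h.aligned_length >= min_block
               for h in region.hits)


def screen(regions, min_length=180):
    """Names of regions passing the length, stable-RNA, and conservation filters."""
    selected = []
    for region in regions:
        if region.length < min_length:
            continue  # too short to be examined
        if region.contains_stable_rna:
            continue  # tRNA/rRNA regions have complicated conservation patterns
        if top_tier(region):
            selected.append(region.name)
    return selected


if __name__ == "__main__":
    regions = [  # hypothetical values for illustration only
        IgRegion("ig_a", 300, hits=[BlastHit("salmonella_1", 120, 150)]),
        IgRegion("ig_b", 150, hits=[BlastHit("salmonella_2", 200, 140)]),
        IgRegion("ig_c", 400, hits=[BlastHit("klebsiella_1", 60, 200)]),
        IgRegion("ig_d", 500, contains_stable_rna=True,
                 hits=[BlastHit("salmonella_3", 300, 250)]),
    ]
    print(screen(regions))
```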
[0049] Next the orientation and identity of the ORFs bordering the
Ig regions were determined using the Colibri database, an annotated
listing of all E. coli genes and their coordinates. Inconsistencies
between the Colibri database and our original file led to the
reclassification of some Ig regions as shorter than 180 nt, and
these were not analyzed further. Of the remaining 1006 Ig regions,
13 contained known small RNAs, 295 were in the highest conservation
group, 88 showed intermediate conservation, and 610 showed no
conservation.
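Deriving Ig regions and the orientation of their flanking ORFs from an annotated gene list is a simple interval computation. The Python sketch below makes the following assumptions:

- Coordinates are 1-based and inclusive on a linear map; the wrap-around at the origin of the circular chromosome is ignored.
- Overlapping or abutting genes have no Ig region between them.
- Orientation is encoded with the > / < notation used in Table 1.

The gene names and coordinates in the example are invented. The input is not the Colibri database format.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int    # 1-based, inclusive (leftmost coordinate)
    end: int      # 1-based, inclusive (rightmost coordinate)
    strand: str   # "+" = clockwise (Watson), "-" = counterclockwise (Crick)


ARROW = {"+": ">", "-": "<"}


def intergenic_regions(genes, min_length=180):
    """List Ig regions between consecutive genes that are at least min_length nt long."""
    ordered = sorted(genes, key=lambda g: g.start)
    regions = []
    for left, right in zip(ordered, ordered[1:]):
        length = right.start - left.end - 1
        if length > 0 and length >= min_length:
            regions.append({
                "start": left.end + 1,
                "length": length,
                "flanking": f"{left.name}/{right.name}",
                "strand": ARROW[left.strand] + ARROW[right.strand],
            })
    return regions


if __name__ == "__main__":
    genes = [Gene("geneA", 1, 1000, "+"), Gene("geneB", 1300, 2000, "-"),
             Gene("geneC", 2050, 3000, "-")]  # hypothetical annotation
    print(intergenic_regions(genes))
```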
[0050] The location of the conservation relative to the orientation
of the flanking ORFs was an important consideration in choosing
candidates for further analysis. In many cases (132/295 Ig
regions), the conserved region was just upstream of the start of an
ORF, consistent with conservation of regulatory regions, including
untranslated leaders. Cases where the conserved region was >50
nt from an ORF start or extended over more than 150 nt in length
(RprA RNA, CsrB RNA, OxyS RNA, #17, and #52 in FIG. 1), or where
the bordering ORFs ended rather than started at the Ig region (#14
in FIG. 1), were considered better candidates for novel sRNAs.
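The positional rule in this paragraph can be expressed as a small predicate. The Python sketch below makes two assumptions:

- The conserved block is given as 0-based, half-open offsets within the Ig region.
- An ORF start coincides with the Ig boundary next to a gene that points away from the region. That is a < gene on the left or a > gene on the right.

The 50-nt and 150-nt thresholds come from the text. The function name and the test values are hypothetical.

```python
def is_better_candidate(orientation, ig_length, block_start, block_end,
                        max_leader_gap=50, long_block=150):
    """Apply the positional rules used to favor novel sRNA candidates.

    orientation: flanking-gene orientation in Table 1 notation, e.g. '><'.
    ig_length: length of the Ig region (nt).
    block_start, block_end: 0-based, half-open offsets of the conserved block.
    """
    if not 0 <= block_start < block_end <= ig_length:
        raise ValueError("conserved block must lie inside the Ig region")
    if block_end - block_start > long_block:
        return True  # conservation extends over more than 150 nt
    # Distances from the conserved block to each adjacent ORF start.
    distances = []
    if orientation[0] == "<":   # left gene points away: its start is at the left edge
        distances.append(block_start)
    if orientation[1] == ">":   # right gene points away: its start is at the right edge
        distances.append(ig_length - block_end)
    if not distances:
        return True  # both flanking ORFs end at the Ig region ('><')
    return min(distances) > max_leader_gap
```

A block just upstream of an ORF start (for example, within 10 nt of a > gene on the right) is rejected. This matches the text's point that such conservation often reflects promoters or untranslated leaders.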
[0051] Published information on promoters and other known
regulatory sites within conserved regions of promising candidates
was tabulated and used to eliminate many candidates in which the
conservation could be attributed to previously identified promoter
or 5' untranslated leaders. Finally, the remaining candidate
regions were examined for sequence elements such as potential
promoters, terminators, and inverted repeat regions. We considered
evidence for possible stem-loops, in particular those with
characteristics of rho-independent terminators, as especially
indicative of possible sRNA genes (Table 1).
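Rho-independent terminators are usually recognized as a GC-rich stem-loop immediately followed by a run of U residues (T in the DNA). The Python sketch below is a deliberately naive scan for that pattern. It requires:

- a perfectly complementary stem of 5 to 10 bp,
- a loop of 3 to 9 nt,
- at least 50% GC in the stem, and
- at least four T's directly after the 3' arm.

These parameter values are assumptions for illustration, not criteria from the specification. Dedicated terminator predictors instead score hairpin stability and tolerate mismatches and G-U pairs.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq):
    return (seq.count("G") + seq.count("C")) / len(seq)


def find_terminator_candidates(seq, stems=range(10, 4, -1), loops=range(3, 10),
                               min_t_run=4, min_gc=0.5):
    """Return (hairpin_start, stem_length, loop_length, t_run_start) tuples.

    For each T-run, only the longest perfect stem found is reported.
    """
    seq = seq.upper()
    hits = []
    for p in range(len(seq) - min_t_run + 1):
        # p must be the first T of a run of at least min_t_run T's.
        if seq[p:p + min_t_run] != "T" * min_t_run or (p > 0 and seq[p - 1] == "T"):
            continue
        found = None
        for s in stems:                      # longest stems first
            for loop in loops:
                start = p - 2 * s - loop     # 5' arm, loop, 3' arm, then T-run
                if start < 0:
                    continue
                arm5 = seq[start:start + s]
                arm3 = seq[p - s:p]
                if arm5 == revcomp(arm3) and gc_fraction(arm5 + arm3) >= min_gc:
                    found = (start, s, loop, p)
                    break
            if found:
                break
        if found:
            hits.append(found)
    return hits


if __name__ == "__main__":
    # Hypothetical hairpin: GCCGCC / GAAA loop / GGCGGC, then TTTTT.
    print(find_terminator_candidates("AAACGCCGCCGAAAGGCGGCTTTTTAAC"))
```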
[0052] Using these criteria, together with microarray expression
data (see below), a set of 59 candidates was selected (Table 1).
Candidates 1-18 were chosen in the first round of screening of the
1.0- to 2.5-Mb region; some of these candidates would not have met
the more stringent criteria applied to the rest of the genome.
TABLE 1 sRNA Candidates
No.^a | Ig Start | Ig Length (nt) | Flanking Genes | Strand^b | Selection Criteria^c | Microarray Detection^d | Northern Detection^e | Interpretation of Conservation^f
1 | 1019277 | 359 | ompA/sulA | << | C (4), S | < | large | known ompA leader
2 | 1102420 | 754 | csgD/csgB | <> | C (4), L | none | faint large | known csgD leader, promoter
3 | 1150625 | 213 | fabG/acpP | >> | C* (4), S | > | multiple, 300+ nt | known acpP mRNA & operon
4 | 1194145 | 201 | ymfC/icd | <> | C* (0), S | > | large | leader
5 | 1297345 | 476 | adhE/ychE | <> | C (4), L | none | large | known adhE leader
6 | 1298466 | 740 | yhcE/oppA | >> | C (2), L, S | > | large + faint others | leader, promoter?
7 | 1328693 | 376 | yciN/topA | <> | C (4) | none | large | known leader, promoter
8 | 1407055 | 480 | ydaN/dbpA | >> | C (4), L | none | none | predicted sRNA
9 | 1515024 | 314 | ydcW/ydcX | >> | C (4), L, S | <, > | 180 nt (<) | mRNA, 31 aa ORF
10 | 1671526 | 411 | ydgF/ydgG | <> | C (4), L, T | none | none | promoter/leader?
11 | 1755132 | 313 | pykF/lpp | >> | C (4) | > (rif) | 300 nt | known lpp mRNA
12 | 1762411 | 550 | ydiC/ydiH | << | C (4), T | none | 60 nt (<) | sRNA
13 | 1860454 | 341 | yeaA/gapA | <> | C (4), S | > (rif) | large | known gapA leader, promoter?
14 | 2165049 | 278 | yegQ/orgK | >< | C (4), L, S | > | 86 nt (>) | sRNA
15 | 2276258 | 335 | yejG/bcr | << | C (4), L, S | < | large | leader
16 | 2403093 | 633 | nuoA/lrhA | << | C (4), L, S | < | large + 300 nt | known processed leader
17 | 2588726 | 540 | acrD/yffB | >> | C (4), S, I | < | 175, 266 nt (<) | mRNA, 19 aa ORF
18 | 1339749 | 196 | yciM/pyrF | >> | C (3), S* | none | none | promoter/leader?
19 | 450835 | 462 | cyoA/ampG | << | C (4), S* | > | faint large | promoter/leader?
20 | 753692 | 708 | gltA/sdhC | <> | C* (4), S | < (rif) | faint large | known gltA, sdhC leaders
21 | 986206 | 605 | ompF/asnS | << | (4), S*, I, P, T | < (rif), > | large | known ompF leader, promoter
22 | 2651357 | 823 | sseA/sseB | >< | C (4), L, S, I, T | > (rif) | 320 nt (>) | sRNA
24 | 3348110 | 223 | elbB/arcB | << | C* (4), S* | <, > | 45 nt (>) | sRNA
25 | 3578437 | 332 | yhhX/yhhY | <> | C (4), L, P, T | none | 90 nt (<) | sRNA
26 | 3983621 | 681 | aslA/hemY | << | C (4), T | > | 210 nt (>) | sRNA
27 | 4275510 | 548 | soxR/yjcD | >> | C (4), L, S*, T | < | 140 nt (<) | sRNA
28 | 4609568 | 412 | osmY/yjjU | >> | C (4), L, S* | <, > (rif) | 350 nt (>) | mRNA, 53 aa ORF
29 | 454011 | 346 | bolA/tig | >> | C* (4), S, I | > (rif) | large | leader or operon
30 | 668152 | 370 | ybeB/cobC | << | C (4), L, S*, I, P | <, > (rif) | large (>) | leader/promoter?
31 | 887180 | 180 | ybjK/ybjL | >< | C (4), L | none | 80 nt (<) | sRNA
32 | 2590752 | 343 | dapE/ypfH | >< | C (0), L, S | <, > | none | 66 aa ORF
33 | 2967000 | 684 | ygdP/mutH | <> | C* (4) | none | none | promoter/leader?
34 | 3672003 | 413 | yhjD/yhjE | >> | C* (4) | none | none | promoter/leader?
35 | 3719676 | 284 | yiaZ/glyS | << | C (4), L, P | none | large | leader/promoter?
36 | 3773784 | 508 | mtlR/yibL | >> | (2), S* | < (rif) | 500 nt (<) | mRNA, 69 aa ORF
37 | 4638109 | 402 | yjjY/lasT | >> | C (4), L, P, F | > | none/faint | known arcA leader
38 | 4048313 | 614 | yihA/yihI | <> | C* (4), S, T | > (rif) | 270 nt (>) | sRNA
39 | 279100 | 512 | afaB/yagB | << | C (4), L, S* | <, > | faint large | IS30, leader/promoter?
40 | 852161 | 245 | b0816/ybiQ | <> | C (4), L, P | none | 205 nt (<) | sRNA
41 | 2974037 | 584 | aas/galR | <> | C (4), L, S, T | <, > | 89, 83 nt (<) | sRNA
42 | 2781229 | 432 | pinH/ypjB | << | C (1), L, T | none | none | not conserved
43 | 3192539 | 424 | yqiK/rfaE | >< | C* (4), L, S | <, > | none | predicted sRNA
44 | 3245066 | 347 | exuR/yqjA | >> | C (4), L | > | none | promoter/leader?
45 | 3376287 | 221 | rplM/yhcM | << | C* (4), (S), T | < (rif) | large | leader
46 | 2531398 | 386 | cysK/ptsH | >> | C (4), S*, T | <, > (rif) | large | known ptsH leader
47 | 4403561 | 207 | purA/yjeB | >> | C (4), S*, I | > | large | leader/promoter?
48 | 1239170 | 391 | dadX/ycgO | >< | C (4), L | none | none | IS end
49 | 1306670 | 373 | cls/kch | << | C* (4) | none | 250 nt (>) | mRNA, 57 aa ORF
50 | 1620541 | 446 | ydeE/ydeH | >< | C (4), L, I | > | 185, 220 nt (>) | mRNA, 31 aa ORF
51 | 1903281 | 377 | yobD/yebN | >> | C (4), L | none | none | promoter/leader?
52 | 1920997 | 395 | pphA/yebY | << | C (4), L, S* | <, > | 275 nt (>), 100 nt (<) | sRNA
53 | 1932629 | 237 | edd/zwf | << | C (4) | < | none | promoter/leader?
54 | 2085091 | 263 | yeeF/yeeY | << | C (4), T | < | large | leader
55 | 2151151 | 740 | yegL/yegM | <> | C (4), L | > | 143 nt + others (>) | sRNA
56 | 2494583 | 497 | ddg/yfdZ | >< | C (4), L | < | none | known ORF
57 | 3717395 | 283 | yiaG/cspA | >> | C* (4), S* | <, > | large | known cspA leader
58 | 4177159 | 415 | rplA/rplJ | >> | C (4), S* | <, > | large | known operon
59 | 1668974 | 396 | ynfM/asr | >> | (2), S* | < | none | promoter/leader?
60 | 2033263 | 591 | yedS/yedU | >> | (1), S* | <, > | none | not conserved
61 | 3054807 | 394 | ygfA/serA | >< | (1), D | <, > | 139 nt (>) | sRNA
^a Candidate numbers. #23 was not analyzed; the region of conservation corresponds to a published leader sequence. Candidate #61 was added because it is homologous to candidate #43 and to the duplicated regions within #55 (see text and Table 2).
^b Orientation of flanking genes. > and < denote genes present on the clockwise (Watson) or counterclockwise (Crick) strand of the E. coli chromosome, respectively.
^c Criteria used for selection of candidates: C, conservation; C*, long conservation; (#), conservation score. Ig regions were assigned scores on the basis of BLAST searches (see text below). #4 and #32 were rerated from 4 (conserved) to 0 on reanalysis of the endpoints of the flanking ORF (#4) and information on an ORF within the Ig region (#32). L, location of conservation either far from the 5' end of the flanking gene or near the 3' end of a gene; S, signal detected in microarray experiments; S*, microarray signal on the strand opposite to the flanking genes; I, inverted repeat; P, predicted promoter; T, predicted terminator; D, duplicated gene.
^d Detection on high-density oligonucleotide probe arrays. > and <, orientation of signal as in b. Rif, signals present after 20 min treatment with rifampicin.
^e Northern analysis of RNA extracted from MG1655 cells grown in three conditions (LB medium, exponential phase; minimal medium, exponential phase; LB medium, stationary phase). Strand-specific probes were used for sRNAs and mRNAs encoding novel ORFs (orientation noted < or > as in b); double-stranded DNA probes were used for the rest. For #43, bands were originally detected with a double-stranded probe, but appear to be from homologs (see text). Large, >400 nt.
^f Interpretation of high conservation was based on microarray and Northern analyses as well as the literature. "mRNAs", small RNA transcripts predicted to encode new polypeptides (see text). "known leaders", literature references supported the existence of leaders corresponding to the conservation. For #37, conservation is consistent with the leader of the arcA gene (Compan, I. and Touati, D. 1994 Mol Microbiol 11: 955-964). The ORF noted for #56 is described in Seoane, A. S. and Levy, S. B. 1995 J Bacteriol 177: 530-535; and Bouvier, J. et al. 1992 J Bacteriol 174: 5265-5271; see GenBank entry BAA16347.1. The IS sequence fragment in the conserved region of #48 is homologous to that described by McVeigh, A. et al. 2000 Infect Immun 68: 5710-5715. "leaders", a large band on Northern analysis, coupled with conservation near the 5' end of an ORF. "promoter/leader?", ... coupled with conservation near the 5' end of a gene. "leader/promoter?", RNA signal from microarray or Northern analyses suggested a leader, while the conservation is far from the expected position of a leader. "leader or operon" (for #29), microarray analysis suggested a continuous transcript throughout the Ig region. "predicted sRNAs" (for #8 and #43), the Ig regions contain the hallmarks expected for an sRNA, but RNA transcripts were not detected. Ig regions encoding sRNAs also may include leaders; this is not included in the conclusion column.
Selecting Candidate Genes by Whole Genome Expression Analysis
[0053] In an independent series of experiments, high-density
oligonucleotide probe arrays were used to detect transcripts that
might correspond to sRNAs from Ig regions. Total RNA isolated from
MG1655 cells grown to late exponential phase in LB medium was
labeled directly for use as a probe or used to generate cDNA
probes (see text below). From a single RNA isolation, each labeling
approach was
carried out in duplicate and individually hybridized to
high-density oligonucleotide microarrays. The high-density
oligonucleotide probe arrays used are appropriate for this analysis
as they have probes specific for both the clockwise (Watson) and
counterclockwise (Crick) strands of each Ig region as well as for
the sense strand of each ORF. The resulting data from the four
experiments were analyzed to examine global expression within Ig
regions, as well as neighboring ORFs.
[0054] Our criteria for analyzing the microarray data evolved
during the course of this analysis. Stringent criteria (longer
transcripts in the Ig region, higher expression levels) identified
many of the previously known sRNAs but did not uncover many strong
candidates for new small RNAs. More relaxed criteria (shorter
transcripts, lower expression levels) gave a very large number of
candidates and therefore were not by themselves useful as the
initial basis for identifying candidates. However, these data were
very useful as an additional criterion for selection of candidate
regions based on the conservation approach. Detection of a
transcript by microarray on the strand opposite to that of
surrounding ORFs was considered a strong indicator of an sRNA (S*
in Table 1). Microarray data contributed to the selection of 34 of
59 candidates (Table 1). Examples of the different types of
expression observed in microarray experiments are shown in FIG. 2.
Signal corresponding to CsrB RNA clearly is detected on the Crick
(C) strand. #17 and #36 have a transcript in the Ig region on the
opposing strand (C) to that for the flanking genes (Watson; W).
However, the expression patterns were not as obvious in many cases,
either because expression levels were low or because the pattern of
expression could be interpreted in a number of ways. For instance,
very little expression was detected for RprA RNA encoded on the W
strand, and there is unexplained signal detected from the opposite
strand of the rprA and csrB Ig regions. #14 and #52 also had some
expression on each strand (FIG. 2). #14 proved to express a small
RNA from the Watson strand, while #52 expresses sRNAs from each
strand (see below and Table 2).
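One reading of the S and S* annotations in Table 1 can be written as a small classification function. The Python sketch below assumes that microarray signal has already been called per strand (W or C) for an Ig region. It returns:

- S* when the flanking genes share a strand and signal is seen on the other strand,
- S for any other detected signal, and
- an empty string when no signal was detected.

How strand-specific signal is called from probe intensities is not specified here and is not modeled. The function and argument names are hypothetical.

```python
# Table 1 notation: '>' = clockwise (Watson, W) strand, '<' = counterclockwise (Crick, C).
GENE_STRAND = {">": "W", "<": "C"}


def microarray_flag(flanking, detected_strands):
    """Annotate an Ig region as 'S', 'S*', or '' (no signal).

    flanking: two-character orientation of the flanking genes, e.g. '>>' or '<>'.
    detected_strands: iterable of strands ('W', 'C') with microarray signal.
    """
    detected = set(detected_strands)
    if not detected <= {"W", "C"}:
        raise ValueError(f"unexpected strand labels: {detected}")
    if not detected:
        return ""
    flank_strands = {GENE_STRAND[symbol] for symbol in flanking}
    # "Opposite strand" is only well defined when both flanking genes share a strand.
    if len(flank_strands) == 1 and detected - flank_strands:
        return "S*"
    return "S"
```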
[0055] Given that a number of the known sRNAs are relatively
stable, we tested whether selection for stable RNAs might allow the
microarray data to be more useful for de novo identification of
sRNA candidates. The transcription inhibitor rifampicin was added
to cells for 20 min prior to harvesting the RNA with the intention
of enriching for stable RNAs. Many of the known sRNAs can be
detected after the rifampicin treatment. Of the 59 candidates in
Table 1, twelve retained a hybridization signal (marked rif in Table
1), and four of these proved to correspond to small transcripts
(see below). Other rifampicin-resistant transcripts detected in Ig regions
appeared to be due to highly expressed leaders.
TABLE 2 Novel sRNAs and Predicted Small ORFs^a
No. | Gene | Minute | Size (nt)^b,c,d | Strand^e | RNA Expression^f | Hfq Binding^g | Effect on rpoS-lacZ^h (S) | Effect on rpoS-lacZ^h (M) | Other Information^j
Detected sRNAs
12 | rydB | 38 | 60^b | <<< | M >> S > E | NT | 0.4 | 1.0 |
14 | ryeE | 47 | 86^b | >>< | E, S > M | + (E) | 0.25 | 1.2 | bordered by cryptic prophage
22 | ryfA | 57 | 320^c | >>< | E, M | NT | NT | NT | PAIR3 (Rudd, K. E. 1999 Res Microbiol 150: 653-664)
24 | ryhA | 72 | 45^b | <>< | S >> M > E | + (S) | 1.0 | 1.9 | 105, 120 nt, present S >> M > E; 105 nt binds Hfq (+, S)
25 | ryhB | 77 | 90^b | <<> | M >> S | + (M) | 1.2 | 0.4 | multicopy plasmid restricts growth on succinate
26 | ryiA | 86 | 210^b | <>< | E > M, S | + (E) | 0.9 | 1.5 | 155 nt, present M > E, S
27 | ryjA | 92 | 140^b | ><> | S >> M | - (S) | NT | NT |
31 | rybB | 19 | 80^b | ><< | S >> M | + (S) | 1.0 | 2.3 |
38 | ryiB | 87 | 270^b | <>> | M > S >> E | - (M) | 1.0 | 1.6 | CsrC (Romeo, pers. commun.)
40 | rybA | 18 | 205^b | ><> | S > M > E | - (S) | 1.2 | 1.5 | ladder up from 255, 300 nt, present S > M > E
41-I | rygA | 64 | 89^b | <<> | S >> M, E | + (S) | 1.3^i | 1.7^i | PAIR2 (Rudd, K. E. 1999 Res Microbiol 150: 653-664)
41-II | rygB | 64 | 83^b | <<> | S, E > M | + (S) | 1.3^i | 1.7^i | PAIR2 (Rudd, K. E. 1999 Res Microbiol 150: 653-664)
52-I | ryeA | 41 | 275^b | <>< | M > E > S | -/+ (M) | 1.1^i | 1.0^i | 148, 152, 180 nt (+ others), present M, S
52-II | ryeB | 41 | 100^b | <<< | S >> M | + (S) | 1.1^i | 1.0^i | 70 nt, present S >> M
55-I | ryeC | 46 | 143^c | <>> | S > M > E | NT | 1.2 | 1.6 | QUAD1a (Rudd, K. E. 1999 Res Microbiol 150: 653-664); 107 nt^c, M > E, S
55-II | ryeD | 46 | 137^c | <>> | M > E > S | NT | NT | NT | QUAD1b (Rudd, K. E. 1999 Res Microbiol 150: 653-664); 102 nt^c, M > E
61 | rygC | 65 | 139^c | >>< | S >> M > E | NT | NT | NT | QUAD1c (Rudd, K. E. 1999 Res Microbiol 150: 653-664); 107 nt^c, S, M > E
Predicted sRNAs
8 | rydA | 30 | 139^d | >(>)> | none | NT | NT | NT | Expression not detected; predicted sRNA
43 | rygD | 69 | 143^d | >(<)< | none | NT | NT | NT | QUAD1d (Rudd, K. E. 1999 Res Microbiol 150: 653-664); expression not detected
Detected RNAs predicted to encode small ORFs
9 | yncL | 32 | 180^b | ><> | S > M > E | +/- (S) | NT | NT | 31 aa ORF
17 | ypfM | 55 | 266^b | ><> | E >> M | -/+ (E) | 2.0 | 1.5 | 19 aa ORF; 175 nt, present E, M
28 | ytjA | 99 | 305^b | >>> | S > M | NT | NT | NT | 53 aa ORF
36 | yibT | 81 | 500^b | ><> | S >> E, M | NT | 1.3 | 1.0 | 69 aa ORF
49 | yciY | 28 | 250^b | <>< | E, M | NT | NT | NT | 57 aa ORF
50 | yneM | 35 | 185^b | >>< | S | NT | NT | NT | 31 aa ORF; 220 nt^b, M > E
^a The table is divided into three sections: detected sRNAs, predicted sRNAs, and detected RNAs predicted to encode small ORFs.
^b,c,d RNA sizes estimated from Northern analyses using (b) single-stranded RNA probes or (c) oligonucleotide probes, or (d) from predictions resulting from sequence analysis (see text).
^e > and < denote the orientation of the sRNA and flanking genes as in Table 1.
^f Relative expression in three growth conditions: E, LB medium, exponential phase; M, minimal medium, exponential phase; and S, LB medium, stationary phase.
^g RNA coimmunoprecipitation with Hfq as detected by Northern analysis: +, strong binding (>30% of RNA bound); +/-, weak binding (5-10%); -/+, minimal binding (<5%); and -, no detectable binding. E, M, S refer to cell growth conditions as in f. NT, not tested.
^h Expression of an rpoS-lacZ fusion in the presence of multicopy plasmids carrying intergenic regions. Activity was measured in stationary phase in LB medium (S) or minimal medium (M) and normalized to the activity of the vector control in the same experiment. In parallel experiments, cells carrying the vector alone gave 1.3-2 (S) and 0.7-2.6 (M) units; cells carrying the pRS-DsrA plasmid gave a 4.9-fold increase (S) and a 12-fold increase (M); cells carrying the pRS-RprA plasmid gave a 3.1-fold (S) and 3.3-fold (M) increase. Results in the table are the average of at least three independent assays. Values in bold were considered significantly different from the control. NT, not tested.
^i #41 and #52 each express two sRNAs, so it is not possible to assign a phenotype to a given small RNA. Thus far there is no evidence for a strong phenotype for either candidate.
^j Included is information about additional RNA bands detected in Northern analysis.
Small RNA Transcripts Detected by Northern Hybridization
[0056] The final test for the presence of an sRNA gene was the
direct detection of a small RNA transcript. The candidates in Table
1 were analyzed by Northern hybridization using RNA extracted from
MG1655 cells harvested from three growth conditions (exponential
phase in LB medium, exponential phase in M63-glucose medium, or
stationary phase in LB medium). The microarray analysis discussed
above used RNA isolated from cells grown to late exponential phase
in LB medium, which is intermediate between the two LB growth
conditions used for the Northern analysis. Initially, Northern
analysis was carried out using double-stranded DNA probes
containing the full Ig region for most candidates. In three cases
(#8, #22, and #55) PCR amplification of the Ig region to generate a
probe was not successful and therefore oligonucleotide probes were
used for Northern analysis. Seventeen candidates gave distinct
bands consistent with small RNAs, and one additional candidate gave
a somewhat larger RNA, but the location of conservation was not
consistent with a leader sequence for a flanking ORF (#36). In some
of these cases, two or more RNA species were detected with a single
Ig probe (Table 2, see also FIG. 3). One candidate (#43) gave a
signal with the double stranded DNA probe, but contains regions
duplicated elsewhere in E. coli that probably account for this
signal (see below). Of the remaining 41 candidates, 17 gave no
detectable transcript. These Ig regions could encode sRNAs
expressed only under very specific growth conditions. For instance,
#8 has all the sequence hallmarks of an sRNA gene (a well-conserved
region preceded by a possible promoter and ending with a
terminator), but has not been detected. Alternatively, the observed
conservation could be due to nontranscribed regulatory regions.
Fairly large RNAs were detected for another 24 candidates. Given
the size of these transcripts together with data on the orientation
of flanking genes and the location of conserved regions, it is
likely these are leader sequences within mRNAs (Table 1).
[0057] For candidates expressing RNAs not expected to be 5'
untranslated leaders, Northern analysis was carried out with
strand-specific probes to determine gene orientation (FIG. 3). For
many of the candidates, we used sequence elements (see below) as
well as expression information from the microarray experiments to
predict which strand was most likely expressed; both strands were
tested when predictions were unclear. The results from the
strand-specific probes generally agreed with predictions and were
used to estimate the RNA size (Table 2). Interestingly, in one case
(#52; FIG. 3), sRNAs are expressed from both the W and the C strand
of the same Ig region. For #12, although no sRNA had been detected
using a double-stranded DNA probe, the presence of a potential
terminator and promoter remained suggestive of the presence of an sRNA gene.
Therefore, oligonucleotide probes also were used in Northern
analysis of this candidate, and a small RNA transcript was detected
(FIG. 3; Table 1).
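Strand-specific detection relies on a labeled probe hybridizing only to RNA complementary to it. Table 3 lists, for candidates #8 and #22, oligonucleotides that are exact reverse complements of each other (KW-124/KW-125 and KW-127/KW-128), which is the property a pair of opposite-strand probes needs. The sketch below only checks that sequence relationship; which strand (W or C) each probe was used to detect is not stated in Table 3, and the helper name is illustrative.

```python
# Translation table mapping each DNA base to its Watson-Crick complement.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# Oligonucleotide probe pairs listed in Table 3.
probe_pairs = {
    "#8": ("GTCGGCGTCGTACGAATCAATTGTGC",            # KW-124
           "GCACAATTGATTCGTACGACGCCGAC"),           # KW-125
    "#22": ("GCTGTGAAACACCTGCATTTACGGCCACGG",       # KW-127
            "CCGTGGCCGTAAATGCAGGTGTTTCACAGC"),      # KW-128
}

for candidate, (a, b) in probe_pairs.items():
    # Two probes that are reverse complements hybridize to transcripts
    # from opposite strands of the same Ig segment.
    print(candidate, reverse_complement(a) == b)  # True for both pairs
```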
[0058] Examination of expression profiles of the RNAs under
different growth conditions gave an indication of specificity of
expression. Some candidates were detected under all three growth
conditions; others were preferentially expressed under one growth
condition (FIG. 3; Table 2). For instance, #25 was present
primarily during growth in minimal medium, consistent with its not
being detected in the whole-genome expression experiment,
which analyzed RNA isolated from cells grown in rich medium.
Sequence Predictions of sRNA Genes and ORFs
[0059] For the candidates expressing small RNA transcripts, the
conserved sequence blocks (contigs) from K. pneumoniae, the most
highly conserved Salmonella species, and in a few cases Yersinia pestis,
were selected from the NCBI Unfinished Microbial Genome database
and aligned with the E. coli Ig region using GCG Gap (Devereux, J.
et al. 1984 Nucleic Acids Res 12:387-395). Multiple alignments were
assembled by hand, and the conserved regions were examined for
likely promoters and terminators and other conserved structures.
Information from the alignments, together with results from
strand-specific Northern and microarray expression analyses,
allowed assignments of gene orientation, putative regulatory
regions, and RNA length from the predicted starting and ending
positions. Where a terminator sequence was very apparent (13 of 19
candidates), transcription was assumed to end at the terminator,
and the observed size of the transcript was used to help identify
possible promoters. The identification of promoters and terminators
was less definite when there was only one species with conservation
to E. coli.
[0060] As the alignments were assembled, the pattern of
conservation in some cases was reminiscent of patterns expected
from ORFs, with higher sequence variation in positions consistent
with the third nucleotide of codons. GCG Map (Devereux, J. et al.
1984 Nucleic Acids Res 12:387-395) was used to predict translation
in all frames for all of the candidate small RNAs. In six cases,
the conservation and translation potential suggested the presence
of a short ORF. In these cases, a ribosome-binding site and the
potential ORF were well conserved, with the most variation in the
third position of codons, but other elements of the predicted RNA
were less well conserved. For example, #17 expresses an RNA of
about 266 nt, containing a predicted ORF of only 19 amino acids.
Within the predicted Shine-Dalgarno sequence and ORF, only 9/80
positions showed variation for either Klebsiella or Salmonella,
while the overall RNA is less than 60% conserved. We predict that
for #17, as well as five others (Table 2), the detected RNA
transcript is functioning as an mRNA, encoding a short, conserved
ORF. An evaluation of both the new predicted ORFs and the
untranslated sRNAs with GLIMMER, a program designed to predict ORFs
within genomes, gave complete agreement with our designations
(Delcher, A. L. et al. 1999 Nucleic Acids Res 27:4636-4641).
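The ORF signature described above, with variation concentrated at the third codon position, can be quantified by counting mismatched alignment columns by codon position. This is a minimal sketch under stated assumptions: the two sequences are already aligned without gaps, the reading frame is known, and the 12-nt example sequences are hypothetical rather than taken from this study.

```python
from collections import Counter


def variation_by_codon_position(ref: str, other: str, frame_start: int = 0) -> Counter:
    """Count mismatched columns at codon positions 1, 2 and 3.

    ref and other are assumed to be gap-free, equal-length aligned sequences;
    frame_start is the 0-based index of the first base of the first codon.
    """
    if len(ref) != len(other):
        raise ValueError("sequences must be aligned to the same length")
    counts = Counter({1: 0, 2: 0, 3: 0})
    for i in range(frame_start, len(ref)):
        if ref[i] != other[i]:
            counts[(i - frame_start) % 3 + 1] += 1
    return counts


# Hypothetical ORF fragments: AAA->AAG (Lys), CTG->CTC (Leu) and
# CGT->CGC (Arg) are all synonymous third-position changes.
ref = "ATGAAACTGCGT"
other = "ATGAAGCTCCGC"
print(dict(variation_by_codon_position(ref, other)))  # {1: 0, 2: 0, 3: 3}
```

A strong excess at position 3 relative to positions 1 and 2, as in this toy case, is the pattern that suggested coding potential for the six short ORFs.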
[0061] We have assigned gene names to all candidates that we have
confirmed are expressed as RNAs (see Table 2). The genes we predict
to encode ORFs were given names according to accepted practice for
ORFs (Rudd, K. E. 1998 Microbiol Mol Biol Rev 62:985-1019). The
genes that express sRNAs without evidence of conserved ORFs were
named with a similar nomenclature: ryx, with ry denoting RNA and x
indicating the 10 min interval on the E. coli genetic map.
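The interval letter can be derived from a map position in minutes on the 100-min E. coli map. The sketch below assumes the letters run from a (0 to 10 min) through j (90 to 100 min), mirroring the y-gene convention for unassigned ORFs cited above; the function name is hypothetical, and any further letter that distinguishes genes within one interval is not modeled.

```python
import string


def srna_name_prefix(map_minute: float) -> str:
    """Return 'ry' plus the letter for the 10-min interval containing map_minute.

    Assumes interval letters run a (0-10 min) through j (90-100 min).
    """
    if not 0 <= map_minute < 100:
        raise ValueError("E. coli map positions run from 0 to 100 min")
    return "ry" + string.ascii_lowercase[int(map_minute // 10)]


print(srna_name_prefix(5.0), srna_name_prefix(77.9))  # rya ryh
```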
[0062] We noted one instance of overlap in sequence between our new
sRNAs. The conserved region within #43 is highly homologous to a
duplicated region within #55, as well as to a fourth region of the
chromosome within a more poorly conserved Ig (#61 in Table 1). This
repeated region was previously denoted the QUAD repeat and
suggested to encode sRNAs (Rudd, K. E. 1999 Res Microbiol
150:653-664). Each of the QUAD repeats contains a short stretch
homologous to boxC, a repeat element of unknown function present in
50 copies or more within the genome of E. coli (Bachellier, S. et
al. 1996 Repeated Sequences In: Escherichia coli and Salmonella:
Cellular and Molecular Biology eds. F. C. Neidhardt, et al. pp.
2012-2040 American Society for Microbiology, Washington, D.C.).
Rudd also has detected transcripts from the QUAD regions. To
determine which of the four QUAD genes was being expressed, we
designed oligonucleotide probes unique for each of the four
repeats. These oligonucleotide probes demonstrated expression for
three of the four QUAD genes (#55-I, #55-II, and #61); furthermore,
each gave two RNA bands (FIG. 3; Table 2). No signal was detected
for the fourth repeat (#43). The #41 Ig region encodes another pair
of repeats, PAIR2 (Rudd, K. E. 1999 Res Microbiol 150:653-664), and
we observed two RNA species, suggesting that each of the repeats
may be transcriptionally active. Finally, another repeat region
noted by Rudd, PAIR3, is encoded by the #22 Ig region.
Many sRNAs Bind Hfq and Modulate rpoS Expression
[0063] Hfq is a small, highly abundant RNA-binding protein first
identified for its role in replication of the RNA phage Qβ
(Franze de Fernandez, M. et al. 1968 Nature 219:588-590; reviewed
in Blumenthal, T. and Carmichael, G. G. 1979 Annu Rev Biochem
48:525-548). Recently, Hfq has been shown to be involved in a
number of RNA transactions in the cell, including translational
regulation (rpoS), mRNA polyadenylation, and mRNA stability (ompA,
mutS, and miaA) (Muffler, A. et al. 1996 Genes & Dev
10:1143-1151; Tsui, H.-C. T. et al. 1997 J Bacteriol 179:7476-7487;
Vytvytska, O. et al. 1998 PNAS USA 95:14118-14123; Hajnsdorf, E.
and Regnier, P. 2000 PNAS USA 97:1501-1505; Vytvytska, O. et al.
2000 Genes & Dev 14:1109-1118). Three of the known E. coli
sRNAs regulate rpoS expression: DsrA RNA and RprA RNA positively
regulate rpoS translation, whereas OxyS RNA represses its
translation. In all three cases the Hfq protein is required for
regulation (Zhang, A. et al. 1998 EMBO J. 17:6061-6068; Majdalani,
N. et al. 2001 Mol Microbiol 39:1382-1394; Sledjeski, D. D. et al.
2001 J Bacteriol 183:1997-2005), and binding studies have revealed
a direct interaction between Hfq and the OxyS and DsrA RNAs (Zhang,
A. et al. 1998 EMBO J. 17:6061-6068; Sledjeski, D. D. et al. 2001 J
Bacteriol 183:1997-2005).
[0064] Given the interaction of the Hfq protein with at least three
of the known sRNAs, we asked how many of the newly discovered sRNAs
are bound by this protein. Hfq-specific antiserum was used to
immunoprecipitate Hfq-associated RNAs from extracts of cells grown
under the conditions used for the Northern analysis. Total
immunoprecipitated RNA was examined using two methods. First, RNA
was 3'-end labeled and selected RNAs were visualized directly on
polyacrylamide gels. Under each growth condition, several RNA
species co-immunoprecipitated with Hfq-specific sera but not with
preimmune sera, which indicates that many sRNAs interact with Hfq
(FIG. 4A). Second, selected RNAs were examined using Northern
hybridization to determine whether other known sRNAs and any of our
newly discovered sRNAs interact with Hfq. For each sRNA, Hfq
binding was examined under growth conditions where the sRNA was
most abundant (FIG. 4B; Table 2). sRNAs recovered with the Hfq
antiserum but not with preimmune serum were concluded to interact
with Hfq. Comparison of the level of each selected sRNA with the
total amount of that sRNA in the extract revealed that many of the
sRNAs bound Hfq quite efficiently (>30% bound) (#14, #24, #25,
#26, #31, #41, #52-II, Spot42 RNA, and RprA RNA), but other sRNAs
bound Hfq less efficiently (<10% bound) (#9, #17, and #52-I), or
not at all (#27, #38, #40, 6S RNA, 5S RNA, and tmRNA) (FIG. 4;
Table 2).
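The binding categories above can be expressed as a simple classification of Northern signals. This is a minimal sketch under stated assumptions: signals are in arbitrary units without background subtraction, "not bound" is taken to mean no more signal with Hfq antiserum than with preimmune serum, and the 10 to 30% band is labeled here only because the text does not name it.

```python
def classify_hfq_binding(hfq_ip: float, preimmune_ip: float, total: float) -> str:
    """Classify Hfq association from Northern signals (arbitrary units).

    hfq_ip: sRNA signal recovered with Hfq antiserum
    preimmune_ip: sRNA signal recovered with preimmune serum
    total: sRNA signal in the input extract
    """
    if total <= 0:
        raise ValueError("total signal must be positive")
    if hfq_ip <= preimmune_ip:
        return "not bound"
    fraction_bound = hfq_ip / total
    if fraction_bound > 0.30:
        return "efficient (>30% bound)"
    if fraction_bound < 0.10:
        return "less efficient (<10% bound)"
    return "intermediate (10-30%, not a category used in the text)"


# Hypothetical signals, not values from FIG. 4 or Table 2.
print(classify_hfq_binding(hfq_ip=45, preimmune_ip=1, total=100))
```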
[0065] As mentioned above, at least three of the known sRNAs that
interact with Hfq also regulate translation of rpoS, the stationary
phase sigma factor. In light of the fact that many of the new sRNAs
also interact with Hfq, we examined whether these new sRNAs affect
rpoS expression. Plasmids carrying the Ig regions encoding either
control sRNAs (pRS-DsrA and pRS-RprA) or many of our novel sRNAs
were introduced into an MG1655 Δlac derivative carrying an rpoS-lacZ
translational fusion. We then compared expression of the rpoS-lacZ
fusion in these cells with that in cells carrying the control vector
by measuring β-galactosidase activity at stationary phase in LB
or M63-glucose medium (Table 2). As expected, overproduction of
either DsrA RNA or RprA RNA increased rpoS-lacZ expression
significantly (Table 2 legend). Some plasmids (pRS-#24 and pRS-#31)
led to increased rpoS-lacZ expression, whereas others (pRS-#12,
pRS-#14, and pRS-#25) led to decreased expression. These
results indicate that the corresponding sRNAs may directly regulate
rpoS expression or indirectly affect rpoS expression by altering
Hfq activity, possibly by competition. Intriguingly, there is not a
complete correlation between Hfq binding and altered rpoS-lacZ
expression in these studies.
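The comparison of each sRNA plasmid with the vector control amounts to a ratio of β-galactosidase activities. The text does not state the units, so the sketch below assumes activities in Miller units, computed with the standard Miller formula (A550 corrects for light scattering); all readings are hypothetical, not data from Table 2.

```python
def miller_units(a420: float, a550: float, od600: float,
                 minutes: float, volume_ml: float) -> float:
    """Beta-galactosidase activity by the standard Miller formula.

    a420: absorbance of the o-nitrophenol product; a550: scattering
    correction; od600: culture density; minutes: reaction time;
    volume_ml: volume of culture assayed.
    """
    return 1000.0 * (a420 - 1.75 * a550) / (minutes * volume_ml * od600)


def fold_change(activity: float, vector_control: float) -> float:
    """Ratio of activity with an sRNA plasmid to activity with the vector."""
    return activity / vector_control


# Hypothetical readings for one strain.
units = miller_units(a420=0.8, a550=0.02, od600=2.0, minutes=10, volume_ml=0.1)
print(round(units, 1))                      # 382.5
print(round(fold_change(units, 127.5), 2))  # 3.0
```

A fold change above 1 corresponds to the increased rpoS-lacZ expression seen with, for example, pRS-#24, and a value below 1 to the decreases seen with pRS-#12, pRS-#14, and pRS-#25.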
[0066] As another strategy in defining possible functions for the
sRNAs, we screened strains carrying the multicopy plasmids for
effects on growth in LB medium at various temperatures as well as
growth in minimal medium containing a number of different carbon
sources. pRS-#25 renders cells unable to grow on succinate, in
agreement with predictions for #25 RNA interaction with sdh mRNA
(discussed below). We were unable to isolate plasmids carrying the
#27 Ig region without mutations, indicating that overproduction of
this small RNA may interfere with growth. No other growth
phenotypes were observed. A caveat in interpreting results with the
multicopy plasmids is that they contain the full intergenic region;
therefore, we cannot rule out effects of sequences outside the sRNA
genes but within the intergenic regions.
[0067] In summary, a multifaceted search strategy to predict sRNA
genes was validated by our discovery of 17 novel sRNAs. Northern
analysis determined that 44 of 60 candidate regions express RNA
transcripts, some of them expressing more than one RNA species. Of
these transcripts, 24 were concluded to be 5' untranslated leaders
for mRNAs of flanking genes, and another six are predicted to
encode new, short ORFs (Tables 1 and 2). The 17 transcripts
believed to be novel, functional sRNAs range from 45 nt to 320 nt
in length and vary significantly in expression levels and
expression profiles under different growth conditions. More than
half of the new sRNAs were found to interact with the RNA-binding
protein Hfq, indicating that Hfq binding may be a defining
characteristic of a family of prokaryotic sRNAs.
Evaluation of Selection Criteria
[0068] Three general approaches for predicting sRNA genes were
evaluated in this work. In the primary approach, Ig regions were
scored for degree and length of conservation between closely
related bacterial species followed by examination of sequence
features. This approach proved to be very productive in identifying
Ig regions encoding novel sRNAs in E. coli; more than 30% of the
candidates selected primarily on the basis of their conservation
proved to encode novel small transcripts. The availability of
nearly completed genome sequences for Salmonella and Klebsiella
made this approach possible. Any organism for which the genome
sequences of closely related species are known can be analyzed in
this way. Comparative genomics of this sort has been used before
to search for regulatory sites (for review, see Gelfand, M. S. 1999
Res Microbiol 150:755-771) but has not previously been employed
to find sRNAs.
[0069] Although we found the conservation-based approach to be the
most productive in identifying sRNA genes, we note a number of
limitations to its use. A high level of conservation is not
sufficient to indicate the presence of an sRNA gene. Many of the
most highly conserved regions, not unexpectedly, were consistent
with regulatory and leader sequences for flanking genes. We also
did not analyze any Ig regions where conservation was attributable
to sources other than an sRNA. For example, potential sRNAs
processed from mRNAs, or any sRNAs encoded by the antisense strand
of ORFs or leaders, may have been missed in our approach. We made
the assumption that Ig regions must be >180 nt to encode an sRNA
of >60 nt, a 50-60-nt promoter and regulatory region to control
expression of the sRNA, as well as regulatory regions for flanking
genes. Any sRNA genes in smaller Ig regions would have been
overlooked. We also excluded the highly conserved tRNA and rRNA
operons from our consideration because of their complexity. It is
certainly possible that sRNA genes may be associated with these
other RNA genes. In fact, sRNA genes have been predicted to be
encoded in at least one tRNA operon. In addition, conservation need
not be a property of all sRNAs. We expect sRNAs that play a role in
modulating cellular metabolism to be well conserved, as is the case
for the previously identified sRNAs. Nevertheless, sRNAs may be
encoded within or act upon regions for which there is no homology
between E. coli, Klebsiella, and Salmonella (e.g., in cryptic
prophages and pathogenicity islands), and they would be missed by
this approach. Only one of 24 Ig regions within the e14, CP4-54, or
CP4-6 prophages showed conservation. A few of these Ig regions
showed evidence of transcription by microarray analysis, and RNAs
have been implicated in immunity regulation in phage P4 (Ghisotti,
D. et al. 1992 Mol Microbiol 6:3405-3413), which is related to the
prophages CP4-54 and CP4-6. Despite the limitations listed above, we
believe that the use of conservation provides a relatively quick
means of identifying the majority of sRNAs.
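The >180-nt cutoff described above can be read as a length budget for an Ig region. The symbols are introduced here for illustration, and the allowance left for regulatory regions of flanking genes is only what the stated numbers imply, not a value given in the text:

$$
L_{\mathrm{Ig}} \ge L_{\mathrm{sRNA}} + L_{\mathrm{prom}} + L_{\mathrm{flank}},
\qquad
L_{\mathrm{flank}} \approx 180 - 60 - (50\ \text{to}\ 60) = 60\ \text{to}\ 70\ \text{nt}.
$$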
[0070] An alternative genomic sequence-based strategy for
identifying sRNAs would be to search for orphan promoter and
terminator elements as well as other potential RNA structural
elements. Potential promoter elements were generally too abundant
to be useful predictors without other information on their expected
location and orientation. We found sequences predicted to be
rho-independent terminators to be a more useful indicator of sRNAs; such
sequences were clearly present for 13/17 of the sRNAs and 3/6 of
the new mRNAs. In a number of cases, it appears that the sRNAs
share a terminator with a convergent gene for an ORF. In other
cases, either no terminator was detected or it appeared to be in a
neighboring ORF. A search using promoter and terminator sequences
as the requirements for identifying sRNAs might therefore have
found two-thirds of the sRNAs described here. Phage integration
target sequences also could be scanned for nearby sRNA genes. Many
phage att sites overlap tRNAs (reviewed in Campbell, A. M. 1992 J
Bacteriol 174: 7495-7499), and ssrA, encoding the tmRNA, has a 3'
structure like a tRNA and overlaps the att site of a cryptic
prophage (Kirby, J. E. et al. 1994 J Bacteriol 176:2068-2081). In
this work, we found that the 3' end and terminator of #14 overlap
the previously mapped phage P2 att site (Barreiro, V. and
Haggard-Ljungquist, E. 1992 J Bacteriol 174:4086-4093). The #14 sRNA
does not obviously resemble a tRNA, suggesting that the overlap
between phage att sites and RNA genes extends beyond tRNAs and
related molecules and may be common to additional sRNAs.
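A rho-independent terminator is typically a GC-rich hairpin followed by a run of T residues on the coding strand. The sketch below is a deliberately crude scan for that pattern, not the method used in this work and not a published terminator predictor: it requires a perfectly paired stem (no mismatches, bulges, or G-U pairs), and every threshold (stem and loop lengths, spacer, GC fraction, T-run length) is a hypothetical default.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def find_terminator_like(seq: str, min_t: int = 4, stems=range(5, 11),
                         loops=range(3, 9), max_gap: int = 3,
                         min_gc: float = 0.5):
    """Scan for a perfect inverted repeat followed by a T-tract.

    Returns (t_start, stem_len, loop_len, gap) tuples, where t_start is the
    0-based index of the first T of a run of >= min_t Ts and gap is the
    number of bases between the 3' stem arm and that run.
    """
    seq = seq.upper()
    hits = []
    for t_start in range(len(seq) - min_t + 1):
        if seq[t_start:t_start + min_t] != "T" * min_t:
            continue
        if t_start > 0 and seq[t_start - 1] == "T":
            continue  # only test the first T of each T-tract
        for gap in range(max_gap + 1):
            end = t_start - gap  # exclusive end of the 3' stem arm
            for stem in stems:
                for loop in loops:
                    left_start = end - 2 * stem - loop
                    if left_start < 0:
                        continue
                    left = seq[left_start:left_start + stem]
                    right = seq[end - stem:end]
                    if left == reverse_complement(right) and gc_fraction(left + right) >= min_gc:
                        hits.append((t_start, stem, loop, gap))
    return hits


# Hypothetical sequence: GCCGCC stem, GAAA loop, GGCGGC stem, then a T-tract.
example = "ACGT" + "GCCGCC" + "GAAA" + "GGCGGC" + "TTTTTTTT" + "ACG"
print((20, 6, 4, 0) in find_terminator_like(example))  # True
```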
[0071] Our second approach, high-density oligonucleotide probe
array expression analysis, proved to be more useful in confirming
the presence of sRNA genes first found by the conservation approach
than in identifying new sRNA genes de novo. Further consideration
of the location of microarray signal compared to flanking genes as
well as analysis of microarray signals after a variety of growth
conditions should expand the ability to detect sRNAs in this
manner. Under a single growth condition, signal consistent with the
RNA identified by Northern analysis was detected for 5/15 of the Ig
regions proven to encode new sRNAs and for 4/6 of the new mRNAs.
Thus, a similar analysis of microarray data in nonconserved genomic
regions might help in the identification of sRNAs missed by the
conservation-based approaches. We predict that sRNAs from any
organism expressed at reasonably high levels under normal growth
conditions will be detected by microarrays that interrogate the
entire genome, inclusive of noncoding regions.
[0072] One clear limitation in detecting sRNAs with microarray or
Northern analyses is the fact that some sRNAs may be expressed only
under limited growth conditions or at extremely low levels. We
chose three growth conditions to scan our samples. While most of
the previously known sRNAs were seen under these conditions, OxyS
RNA, which is induced by oxidative stress, was not detectable. For
a few of our candidates in which no RNA was detected, it is
possible that an sRNA is encoded but is not expressed sufficiently
to be detected under any of our growth conditions. Another possible
limitation of hybridization-based approaches is that highly
structured sRNAs may be refractory to probe generation. sRNA
transcripts may not remain quantitatively represented after the
fragmentation used in the direct labeling approach here. cDNA
labeling also may underrepresent sRNAs because they are a small
target for the oligonucleotide primers, and secondary structure can
interfere with efficiency of extension.
[0073] As our third approach, sRNAs were selected on the basis of
their ability to bind to the general RNA binding protein, Hfq.
Northern analysis revealed that many of our novel sRNAs interact
with Hfq. In preliminary microarray analysis of Hfq-selected RNAs
to look for additional unknown sRNAs, DsrA RNA, DicF RNA, Spot42
RNA, #14, #24, #25, #31, #41, and #52-II were detected among those
RNAs with the largest difference in levels between Hfq-specific
sera and preimmune sera. This preliminary experiment suggests that
microarray analysis of selected RNAs will be very valuable on a
genome-wide basis. Interestingly, a large number of mRNAs with
leaders and a number of operon transcripts were found to
co-immunoprecipitate with Hfq, including the known Hfq target
nlpD-rpoS mRNA (Brown, L. and Elliott, T. 1996 J Bacteriol
178:3763-3770). It seems likely that sRNAs binding a common protein
will also form a functionally related subset; the
sRNAs of known function associated with Hfq in our experiments
appear to be those involved in regulating mRNA translation and
stability. Other sRNAs have been shown to interact with specific
prokaryotic RNA-binding proteins, for example, tmRNA with SmpB
(Karzai, A. W. et al. 1999 EMBO J. 18:3793-3799), and the
possibility of other sRNAs interacting with these proteins or other
general sRNA-binding proteins should be tested. This approach is
adaptable to all organisms, and, in fact, binding to Sm and
Fibrillarin proteins has been the basis for identification of
several sRNAs in eukaryotic cells (Montzka, K. A. and Steitz, J. A.
1988 PNAS USA 85:8885-8889; Tyc, K. and Steitz, J. A. 1989 EMBO
J. 8:3113-3119).
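Selecting candidate sRNAs from the Hfq microarray experiment amounts to ranking RNAs by how much more signal they give with Hfq-specific serum than with preimmune serum. The text speaks of the "largest difference in levels", so the sketch below ranks by a simple subtraction; normalization across arrays is not modeled, and the RNA names and signal values are hypothetical.

```python
def rank_by_ip_difference(hfq_signal: dict, preimmune_signal: dict) -> list:
    """Rank RNAs by (Hfq antiserum signal - preimmune signal), largest first.

    RNAs missing from preimmune_signal are treated as having zero signal.
    """
    differences = {
        rna: hfq_signal[rna] - preimmune_signal.get(rna, 0.0)
        for rna in hfq_signal
    }
    return sorted(differences.items(), key=lambda item: item[1], reverse=True)


# Hypothetical normalized microarray signals (arbitrary units).
hfq = {"RNA-1": 900.0, "RNA-2": 50.0, "RNA-3": 600.0}
preimmune = {"RNA-1": 30.0, "RNA-2": 45.0, "RNA-3": 20.0}
print(rank_by_ip_difference(hfq, preimmune))
# [('RNA-1', 870.0), ('RNA-3', 580.0), ('RNA-2', 5.0)]
```

A log ratio is a common alternative that weights low-abundance RNAs more heavily; which measure suits a given array design is a judgment call.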
[0074] All the criteria we used to identify sRNAs also will detect
short genes encoding new small peptides, and we have found six
conserved short ORFs. Although our approach was intended to develop
methods to identify non-translated genes within the genome, short
ORFs also are missing from annotated genome sequences. The
combination of a requirement for conservation and/or transcription
with sequence predictions for ORFs should add significantly to our
ability to recognize short ORFs. Small polypeptides have been shown
to have a variety of interesting cellular roles. We expect that the
short ORFs we have found are involved in signaling pathways, akin
to those of B. subtilis peptides that enter the medium and carry
out cell-cell signaling (reviewed in Lazazzera, B. A. 2000 Curr
Opin Microbiol 3:177-182).
Characteristics and Functions of New sRNAs
[0075] The current work serves as a blueprint for the prediction,
detection, and characterization of a large group of novel sRNAs. We
have obtained definitive information on characteristics that shed
light on the cellular roles of these new sRNAs.
Several known sRNAs that bind the Hfq protein act via base pairing
to target mRNAs. The finding that a number of our new sRNAs bind
Hfq indicates a similar mechanism of action for this subset of
sRNAs. We searched the E. coli genome for possible complementary
target sequences and examined phenotypes associated with multicopy
plasmids containing new sRNA genes. Intriguingly, #25, an sRNA
preferentially expressed in minimal medium, has extended
complementarity to a sequence near the start of sdhD, the second
gene of the succinate dehydrogenase operon. When the #25 Ig region
is present on a multicopy plasmid, it interferes with growth on
succinate minimal medium (Table 2), consistent with #25 sRNA acting
as an antisense RNA for sdhD. Complementarity to many target mRNAs
was found for a number of other novel sRNAs, confirming the
validity of this analysis.
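Because an sRNA and its target pair antiparallel, the longest uninterrupted stretch of Watson-Crick complementarity is the longest common substring of the target and the reverse complement of the sRNA. The text does not describe the search method used, so this dynamic-programming sketch is an assumption; it ignores G-U wobble pairs, mismatches, and bulges, and the example sequences are hypothetical (not #25 or sdhD).

```python
_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(seq: str) -> str:
    return seq.translate(_RNA_COMPLEMENT)[::-1]


def longest_complementary_stretch(srna: str, mrna: str):
    """Return (length, start_in_mrna) of the longest perfectly paired stretch.

    start_in_mrna is 0-based; G-U pairs are not counted as complementary.
    """
    query = reverse_complement_rna(srna)
    best_len, best_end = 0, 0
    prev = [0] * (len(mrna) + 1)
    for i in range(1, len(query) + 1):
        cur = [0] * (len(mrna) + 1)
        for j in range(1, len(mrna) + 1):
            if query[i - 1] == mrna[j - 1]:
                cur[j] = prev[j - 1] + 1  # extend the diagonal run
                if cur[j] > best_len:
                    best_len, best_end = cur[j], j
        prev = cur
    return best_len, best_end - best_len


# Hypothetical sRNA pairing with GGAGGCAUG in a hypothetical mRNA.
print(longest_complementary_stretch("GGCAUGCCUCCGG", "UUUUGGAGGCAUGUUU"))  # (9, 4)
```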
[0076] As outlined in the evaluation of each of our approaches, we
do not expect that our searches have been exhaustive. sRNAs also have
been detected by others using a variety of approaches. The sRNA
encoded by #38 was independently identified as a regulatory RNA
(CsrC RNA; T. Romeo, pers. comm.), and others have found additional
sRNAs using variations of the approaches used here (Argaman, L. et
al. 2001. Curr. Biol. in press). Nevertheless, we think it unlikely
that there are many more than 50 sRNAs encoded by the E. coli
chromosome and by closely related bacteria. We expect such sRNAs to
be present and playing important regulatory roles in all organisms.
Using the approaches described here, it is feasible to search all
sequenced organisms for these important regulatory molecules. We
anticipate that study of the expanded list of sRNAs in E. coli will
allow a more complete understanding of the range of roles played by
regulatory sRNAs.
EXAMPLE 1
Computer Searches
[0077] Ig regions are defined here as sequences between two
neighboring ORFs. We compared Ig regions of >180 nt against the
NCBI Unfinished Microbial Genomes database (Worldwide web at
ncbi.nlm.nih.gov/Microb_blast/unfinishedgenome.html) using the
BLAST program (Altschul, S. F. et al. 1990 J Mol Biol 215:403-410).
Salmonella enteritidis sequence data were from the University of
Illinois, Department of Microbiology (Worldwide web at
salmonella.org). Salmonella typhi and Yersinia pestis sequence data
were from the Sanger Centre (Worldwide web at
sanger.ac.uk/Projects/S_typhi/ and
sanger.ac.uk/Projects/Y_pestis/). Salmonella typhimurium,
Salmonella paratyphi, and Klebsiella pneumoniae sequences were from
the Washington University Genome Sequencing Center.
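Defining Ig regions as the sequences between neighboring ORFs and keeping those longer than 180 nt can be sketched from a list of ORF coordinates. This is a minimal illustration, assuming 1-based inclusive coordinates on one linear coordinate system; strand is ignored, the wrap-around region of the circular chromosome is not handled, and the coordinates are hypothetical rather than taken from the Colibri annotation.

```python
from typing import List, Tuple


def intergenic_regions(orfs: List[Tuple[int, int]],
                       min_exclusive: int = 180) -> List[Tuple[int, int]]:
    """Return (start, end) of gaps between neighboring ORFs longer than min_exclusive nt.

    Nested or overlapping ORFs are handled by tracking the furthest ORF end
    seen so far.
    """
    if not orfs:
        return []
    ordered = sorted(orfs)
    regions = []
    furthest_end = ordered[0][1]
    for start, end in ordered[1:]:
        gap_length = start - furthest_end - 1
        if gap_length > min_exclusive:
            regions.append((furthest_end + 1, start - 1))
        furthest_end = max(furthest_end, end)
    return regions


# Hypothetical ORF coordinates; (100, 200) is nested inside (1, 300).
orfs = [(1, 300), (100, 200), (500, 900), (950, 1200)]
print(intergenic_regions(orfs))  # [(301, 499)]
```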
[0078] Each Ig region was rated based on the best match to
Salmonella or K. pneumoniae species. Ig regions containing
previously identified sRNAs were rated 5 (each of them met the
criteria to be rated 4). Ig regions were rated 4 if the raw BLAST
score was >200 (hatched bars in FIG. 1) or 80-200
(double-diagonal bars in FIG. 1) extending for >80 nt; 3 if the
raw BLAST score was 80-200 (double-diagonal bar) extending for
60-80 nt; 2 if the raw BLAST score was 50-80 (diagonal bar)
extending for >65 nt; and 1 if the raw BLAST score was <50
(diagonal-dash, solid or none) or <65 nt. The location of the
longest conserved section(s) within each Ig and the number of
matches to the NCBI Unfinished Microbial database were recorded.
Note that the computer searches were done from May 2000 to December
2000; more sequences are expected to match as the database
continues to expand. The identity and orientation of genes flanking
each Ig region were determined from the Colibri database (Worldwide
web at genolist.pasteur.fr/Colibri). Ig regions that the
Colibri database predicted to be <180 nt in length and Ig
regions containing tRNA and/or rRNAs were rated 0 and removed from
further consideration.
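The rating rules above can be written as a single function. This is a sketch, and several details are assumptions because the stated ranges touch or leave gaps: a score of exactly 80 is placed in the 80-200 band, a score above 200 earns a 4 regardless of match length, and combinations the text does not assign (for example, a 50-80 score over exactly 65 nt) fall through to 1.

```python
def rate_ig_region(ig_length: int, raw_score: float, match_length: int,
                   has_known_srna: bool = False,
                   has_trna_or_rrna: bool = False) -> int:
    """Return the 0-5 conservation rating of an Ig region.

    raw_score: raw BLAST score of the best Salmonella or K. pneumoniae match;
    match_length: length (nt) of that match.
    """
    if ig_length < 180 or has_trna_or_rrna:
        return 0  # removed from further consideration
    if has_known_srna:
        return 5
    if raw_score > 200:
        return 4
    if 80 <= raw_score <= 200:
        if match_length > 80:
            return 4
        if 60 <= match_length <= 80:
            return 3
        return 1  # conserved stretch shorter than 60 nt
    if 50 <= raw_score < 80 and match_length > 65:
        return 2
    return 1  # score < 50, short match, or a case the text leaves unassigned


print(rate_ig_region(300, 150, 70), rate_ig_region(300, 60, 70))  # 3 2
```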
Strains and Plasmids
[0079] Strains were grown at 37 °C in Luria-Bertani (LB)
medium or M63 minimal medium supplemented with 0.2% glucose and
0.002% vitamin B1 (Silhavy, T. J. et al. 1984 Experiments with gene
fusions Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.)
except for phenotype testing of strains carrying multicopy plasmids
as described below. Ampicillin (50 μg/ml) was added where
appropriate. E. coli MG1655 was the parent for all strains used in
this study. MG1655 Δlac (DJ480, obtained from D. Jin, NCI)
was lysogenized with a λ phage carrying an rpoS-lacZ
translational fusion (Sledjeski, D. D. et al. 1996 EMBO
J. 15:3993-4000) to create strain SG30013.
[0080] To generate clones containing the Ig region of each
candidate (pCR-#N, where N refers to the candidate number; see Table 1),
Ig regions were amplified by PCR from an MG1655 colony and cloned
into the pCRII vector using the TOPO TA cloning kit (Invitrogen).
Oligonucleotides were designed so that the entire conserved region
and, in most cases, the full Ig region were included. In a few cases,
repeated sequences or other irregularities required a reduction in
the Ig regions cloned. See Table 3 for a list of all
oligonucleotides used in this paper. Ig regions encoding sRNAs also
were cloned into multicopy expression vectors (pRS-#N) in which
each Ig region is flanked by several vector-encoded transcription
terminators. To generate pRS-#N plasmids, pCR-#N plasmids were
digested with BamHI and XhoI, and the Ig-containing fragments were
cloned into the BamHI and SalI sites of pRS1553 (Pepe, C. M. et al.
1997 J Mol Biol 270:14-25), replacing the lacZ α-peptide. To
construct pBS-spot42, the Spot42-containing fragment was amplified
by PCR from K-12 genomic DNA, digested with EcoRI and BamHI, and
cloned into the corresponding sites of pBluescript II SK+
(Stratagene). All DNA manipulations were carried out using standard
procedures. All clones were confirmed by sequencing.
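A BamHI-XhoI fragment can be ligated into BamHI-SalI sites because XhoI (C^TCGAG) and SalI (G^TCGAC) leave identical 5'-TCGA overhangs; the resulting junction, CTCGAC, is recognized by neither enzyme. The sketch below checks both points from the recognition sequences and top-strand cut positions of the enzymes named above; the helper names are illustrative.

```python
# Recognition sites and top-strand cut offsets (site cut after this many bases).
ENZYMES = {
    "BamHI": ("GGATCC", 1),  # G^GATCC
    "EcoRI": ("GAATTC", 1),  # G^AATTC
    "SalI": ("GTCGAC", 1),   # G^TCGAC
    "XhoI": ("CTCGAG", 1),   # C^TCGAG
}


def overhang(enzyme: str) -> str:
    """5' overhang left by a palindromic site cut symmetrically on both strands."""
    site, cut = ENZYMES[enzyme]
    return site[cut:len(site) - cut]


def compatible(a: str, b: str) -> bool:
    """True if ends made by enzymes a and b can anneal and be ligated."""
    return overhang(a) == overhang(b)


def junction(left: str, right: str) -> str:
    """Top-strand sequence formed when a `left` end is ligated to a `right` end."""
    site_l, cut_l = ENZYMES[left]
    site_r, cut_r = ENZYMES[right]
    return site_l[:cut_l] + site_r[cut_r:]


print(overhang("XhoI"), overhang("SalI"), compatible("XhoI", "SalI"))  # TCGA TCGA True
print(junction("XhoI", "SalI"))  # CTCGAC
```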
TABLE 3
Oligonucleotides
Oligo Name | Sequence | SEQ ID NO | Candidate Number
KW-39 | GCGCCTCGTTATCATCCAAAATACG | 32 | #1
KW-40 | GTCGCCCAGCCAATGCTTTCAGTCG | 33 | #1
KW-41 | ATTGATCGCACACCTGACAGCTGCC | 34 | #2
KW-42 | GTTGTCACCCTGGACCTGGTCGTAC | 35 | #2
KW-43 | TGACCGCGATTTGCACAAAATGC | 36 | #3
KW-44 | ACTCTTAAATTTCCTATCAAAACTCGC | 37 | #3
KW-45 | GGTATTTTCAGAGATTATGAATTGCCG | 38 | #4
KW-46 | TCACCTCTCCTTCGAGCGCTACTGG | 39 | #4
KW-47 | AATGCTCTCCTGATAATGTTAAACTT | 40 | #5
KW-48 | GGTTAGCTCCGAAGCAAAAGCCGGAT | 41 | #5
KW-49 | TAATTCCTTTCAAATGAAACGGAGC | 42 | #6
KW-50 | GGACTCCCTCATTATAATTACTGG | 43 | #6
KW-51 | CTCCTTAAACAAGGACATTAGTCTACG | 44 | #7
KW-52 | ATTCACCTTACCTAATTTGATTCTTCC | 45 | #7
KW-123 | CCATCGCTTGACGTTGCATTCACCTGC | 46 | #8 (probes)
KW-124 | GTCGGCGTCGTACGAATCAATTGTGC | 47 | #8 (probes)
KW-125 | GCACAATTGATTCGTACGACGCCGAC | 48 | #8 (probes)
KW-55 | TAAGGATAATATTGCAGATCGTAAG | 49 | #9
KW-56 | ATCATCAAACAGCAACTTGCCC | 50 | #9
KW-57 | TGTCCTTCTCCTGCAAGAGAATTATT | 51 | #10
KW-58 | GCTAATAATAATGTCTTTTTCGCTCC | 52 | #10
FR-100 | GCTTTTGTGAATTAATTTGTATATCGAAGCG | 53 | #11
FR-101 | TATTAATACCCTCTAGATTGAGTTAATC | 54 | #11
FR-102 | CGATTTACCTCACTTCATCGCTTTCAG | 55 | #12
FR-103 | TGATCCTGACTTAATGCCGCAAGTTC | 56 | #12
FR-104 | GCTTATCTCCGGCACTCTCAGTGGCTTAGCTCTTGAAGG | 57 | #12 (probe)
FR-105 | TTGCTCACATCTCACTTTAATCGTGCTC | 58 | #13
FR-106 | ATATTCCACCAGCTATTTGTTAGTGAATAAAAGG | 59 | #13
FR-107 | TGATTAATTTCGATTATTTTTCCCGGATGG | 60 | #14
FR-108 | ATTAGAAACAGGAAGCCCCTCAGTCGAG | 61 | #14
FR-109 | TTATTTTCCCCGGAAGCACATTCACTTCAC | 62 | #15
FR-110 | TGATCTATTGCACAACGAGGAAGC | 63 | #15
FR-111 | TGCTTACTCATCAAAAGTAGCGCCAGATTC | 64 | #16
FR-112 | TAATCGACGGACGATAGATAATTCCTG | 65 | #16
FR-113 | CCAATGTGTCGCCTTTTTCAACTTTCCG | 66 | #17
FR-114 | CGATTTATGAGAATAAATACTCATTTAAGGGTG | 67 | #17
FR-115 | AAATCCGACTTTAGTTACAACATAC | 68 | #18
FR-116 | GACCAGACCTTCTTGATGATGGGCAC | 69 | #18
KW-69 | CGACCTCAATTCCACGGGATCTGG | 70 | #19
KW-70 | ATTTAGCTGTAGTAATCACTCGCCG | 71 | #19
KW-71 | GGTCTCCTTAGCGCCTTATTGCG | 72 | #20
KW-72 | CGCCCACATGCTGTTCTTATTATTCCC | 73 | #20
KW-73 | TTTATGACACCTGCCACTGCCGTC | 74 | #21
KW-74 | CTGTCAAGTTATCTGTTTGTTAAGTCAAGC | 75 | #21
KW-126 | GCTGTGAAGCACCTGCGTTGCTCATG | 76 | #22 (probes)
KW-127 | GCTGTGAAACACCTGCATTTACGGCCACGG | 77 | #22 (probes)
KW-128 | CCGTGGCCGTAAATGCAGGTGTTTCACAGC | 78 | #22 (probes)
KW-77 | CCTTTCGCAATTGACTGAAACAC | 79 | #24
KW-78 | GGCTAGACCGGGGTGCGCG | 80 | #24
KW-79 | AAGGTGGTTATTTACACCTTAGCG | 81 | #25
KW-80 | GTCCTCTTTGGGGTAAATGTC | 82 | #25
KW-81 | AATGCTCCGGTTTCATGTCATC | 83 | #26
KW-82 | TAGTTCCTTCTCACCCGGAG | 84 | #26
FR-117 | CACAAGGGCGCTTTAGTTTGTTTTCCG | 85 | #27
FR-118 | ATCCCCTGAGAGTTTAATTTTCGTCAAG | 86 | #27
KW-85 | TAATTCGTCGTAATTCGTCCTCC | 87 | #28
KW-86 | CTCTGCCTTCCTGTTTTTGTTGTG | 88 | #28
FR-119 | AAACGCATTTGCAACTGTCGGCGCTTTTCC | 89 | #29
FR-120 | CTTGTTACCTCAAAAAATCACAGTGCTCG | 90 | #29
FR-121 | GCAGTCGGTGATGCTGGATTTGCCCTG | 91 | #30
FR-122 | GTTTTTTTACGGGTAAGCCGCAACGACCATTG | 92 | #30
FR-123 | TAGTAGATAAGTTTTAGATAAC | 93 | #31
FR-124 | TAAAACTGAAGTTGCCCTGAAAATG | 94 | #31
FR-125 | TGATGAGTGGTTCTGCAAGAGG | 95 | #32
FR-126 | TAAAAGACAGATTACCTGGCCTG | 96 | #32
FR-127 | CGGACTACCTCAAAATAAAGCTTTATATACG | 97 | #33
FR-128 | GTCATGATACCTTGATTAAAAAACAAACAGC | 98 | #33
FR-129 | GGCTATAATGCGCACATAACCTCTTG | 99 | #34
FR-130 | AATCTTTTCTTATTTTTTGGCTAACGAATAGCC | 100 | #34
FR-131 | GTCCAACTTTTTGGGGTCAGTACAAACTTTG | 101 | #35
FR-132 | TAATAACGCCGTTATTAAATAGCCTGCC | 102 | #35
FR-133 | TAAGCAACGTCTGCTTACTGCCCCTC | 103 | #36
FR-134 | GTGATGGCTTCTGATAAAGATAAATTTATAGCC | 104 | #36
FR-135 | TAACAGGCTAAGAGGGGC | 105 | #37
FR-136 | ATTGCCACTCTTCTTGATCAAATAACCG | 106 | #37
FR-137 | AATGCGTCTGTTGATAATTCAAATTAGTC | 107 | #38
FR-138 | TAGCCGTTTTATTCAGTATAGATTTGCG | 108 | #38
KW-89 | GTTCGTCGGTAACCCGTTTCAGC | 109 | #39
KW-90 | ATGGCTTAAAGAGAGGTGCC | 110 | #39
KW-91 | CGTACTTTAAAGGGAGAATGAC | 111 | #40
KW-92 | GTGCTTCCTCATTATGGTGACG | 112 | #40
KW-93 | GAATGGAGGGAGATTACACG | 113 | #41
KW-94 | CCTTAGTGGGTAAACGCTTAC | 114 | #41
KW-95 | CTTTCAGGCAGCTAAGGAAAG | 115 | #42
KW-96 | CAATATGTATTATTGATTGAGTAAACGGG | 116 | #42
KW-97 | CCTCTTCCAGGAATAATCCC | 117 | #43
KW-98 | CGGAAAGCGGTTCACAGATC | 118 | #43
KW-132 | CTCGTAAGTTTCGCAGCTTATTA | 119 | #43 (probe)
KW-99 | TGAAATTCCTGTCCGACAGG | 120 | #44
KW-100 | GCACTACCGCAATGTTATTGC | 121 | #44
KW-101 | GCTTAGCCAATAAATAGTTACACG | 122 | #45
KW-102 | TAAAACCTGTCACAAATCACAAA | 123 | #45
KW-103 | GTGGCCTGCTTCAAACTTTCG | 124 | #46
KW-104 | GTAAAGTCTAGCCTGGCGGTTCG | 125 | #46
FR-139 | TAATTCTGGTACGCCTGGCAGATATTTTGCC | 126 | #47
FR-140 | ATCAACCTCAAAAGGGAAATCGGG | 127 | #47
KW-105 | TAACTTGTTGTAAGCCGGATCGG | 128 | #48
KW-106 | TGAAGCATCTATCGCCGGTTGCG | 129 | #48
KW-107 | GATTAGAAATCCTTTTGAAAGCGCATTG | 130 | #49
KW-108 | CTTATTGGGCACCGCAATGG | 131 | #49
KW-109 | CGAACACAATAAAGATTTAATTCAGCC | 132 | #50
KW-110 | CTGATGCTACTGTGTCAACG | 133 | #50
KW-111 | AATAATCAGACATAGCTTAGGC | 134 | #51
KW-112 | GCCGTGATGGTTTTCGCGTTC | 135 | #51
KW-113 | TATTTTCCTCCCGCGCTAAAG | 136 | #52
KW-114 | TTCAGCTGATGACCACCACGCTT | 137 | #52
KW-115 | GAGTTGTCAGAGCAGGATGATTC | 138 | #53
KW-116 | TATCTGCGCTTATCCTTTATGG | 139 | #53
KW-117 | CCTTTACGGTGATAACCGTCGCG | 140 | #54
KW-118 | CTGACAAGCCTCTCATTCTCTTGTC | 141 | #54
KW-119 | GAGAATTATCGAGGTCCGGTATC | 142 | #55
KW-120 | CTACGCGTTAGCGATAGACTGC | 143 | #55
FR-141 | AGGCTTACTAAGAACACCAGGGGGAGGGGAA | 144 | probe for #55-I
FR-142 | AGTCATAAGCTTCCCCGCTTACTAAGACTA | 145 | probe for #55-II
KW-121 | CCTCAAATCGGCCATAATAACC | 146 | #56
KW-122 | TAAACACCGTCGTCAGAAATGC | 147 | #56
FR-143 | TAGACTTTTATCCACTTTATTGCTG | 148 | #57
FR-144 | GTGTGCCTTTCGGCGATATGGCGTG | 149 | #57
FR-145 | CCTTTACGTGGGCGGTGATTTTGTC | 150 | #58
FR-146 | TAGCTTTGCTCCTGGATGTTTGCC | 151 | #58
FR-147 | GCTGTAATTTATTCAGCGTTTGTACATACG | 152 | #59 (probe)
FR-148 | TCAGTCAACTCGCTGCGGCGTGTTAC | 153 | #60
FR-149 | CTTATTGTTGCTTAGTTAGGGTAGTCAC | 154 | #60
KW-131 | CAGTCAGTCTCAGGGGAGGAGCAATC | 155 | #61 (probe)
KW-59 | TGAATGCACAATAAAAAAATCCCGACCCTG | 156 | for DsrA Ig region
KW-60 | AGTCGCGCAGTACTCCTCTTACCAG | 157 | for DsrA Ig region
KW-63 | TAATTTCTCATCAGGCGGCTCTGC | 158 | for RprA Ig region
KW-64 | TAACATTATCAGCCTGCTGACGGC | 159 | for RprA Ig region
sp42-5'-1 | GGCCGAATTCGTAGGGTACAGAGGTAAG | 160 | for cloning pBS-spot42
sp42-3'-1 | GGCCGGATCCGTCATTACTGACTGGGGCGG | 161 | for cloning pBS-spot42
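The two pBS-spot42 cloning primers in Table 3 carry the restriction sites named in the text: an EcoRI site in sp42-5'-1 and a BamHI site in sp42-3'-1, each preceded by GGCC (presumably a clamp that helps the enzymes cut near the end of the PCR product; the text does not say so). The sketch below scans a primer for the four sites used in this section; the function name is illustrative.

```python
SITES = {"BamHI": "GGATCC", "EcoRI": "GAATTC", "SalI": "GTCGAC", "XhoI": "CTCGAG"}


def find_sites(primer: str) -> dict:
    """Return {enzyme: [0-based positions]} for each site present in primer."""
    primer = primer.upper()
    hits = {}
    for name, site in SITES.items():
        positions = [i for i in range(len(primer) - len(site) + 1)
                     if primer[i:i + len(site)] == site]
        if positions:
            hits[name] = positions
    return hits


print(find_sites("GGCCGAATTCGTAGGGTACAGAGGTAAG"))    # {'EcoRI': [4]}  (sp42-5'-1)
print(find_sites("GGCCGGATCCGTCATTACTGACTGGGGCGG"))  # {'BamHI': [4]}  (sp42-3'-1)
```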
RNA Analysis
[0081] RNA for Northern analysis was isolated directly from
~3 × 10⁹ cells in exponential growth
(OD₆₀₀ = 0.2-0.4) or stationary phase (overnight growth) as
described previously (Wassarman, K. M. and Storz, G. 2000 Cell
101:613-623). RNA samples (5 μg) were fractionated on 10%
polyacrylamide-urea gels and transferred to Hybond N membranes as
described previously (Wassarman, K. M. and Storz, G. 2000 Cell
101:613-623). For Northern analysis of candidate regions,
double-stranded DNA probes were generated by PCR from a colony of
MG1655 cells or from the pCR-#N plasmids with the oligonucleotides
used for cloning the pCR-#N plasmids. PCR amplification was done with
52 °C annealing for 30 cycles in 1× PCR buffer (1 mM
each dATP, dGTP, and dTTP; 2.5 μM dCTP; 100 μCi
[α-³²P]dCTP; 10 ng plasmid; 1 unit Taq polymerase)
(Perkin Elmer). Probes were purified over G-50 microspin columns
(Amersham Pharmacia Biotech) prior to use. Northern membranes were
prehybridized in a 1:1 mixture of Hybrisol I and Hybrisol II
(Intergen) at 40 °C. DNA probes mixed with 500 μg sonicated
salmon sperm DNA were heated to 95 °C for 5 min and added to the
prehybridization solution; membranes were then hybridized overnight
at 40 °C. Membranes were washed by rinsing twice with
4× SSC/0.1% SDS at room temperature, followed by three washes
with 2× SSC/0.1% SDS at 40 °C. Northern blot analysis
using RNA probes was done as described previously (Wassarman, K. M.
and Steitz, J. A. 1992 Mol Cell Biol 12:1276-1285). RNA probes were
generated by in vitro transcription according to the manufacturer's
protocols (Roche Molecular Biochemicals) from pCR-#N plasmids
linearized with EcoRV or HindIII and transcribed with SP6 RNA
polymerase or T7 RNA polymerase, respectively; pBS-6S (pGS0112;
Wassarman, K. M. and Storz, G. 2000 Cell 101:613-623) or pBS-spot42
was linearized with EcoRI and transcribed with T3 RNA polymerase;
pGEM-5S (pG5019; Altuvia, S. et al. 1997 Cell 90:43-53) or pGEM-10Sa
(Altuvia, S. et al. 1997 Cell 90:43-53) was linearized with EcoRI and
transcribed with SP6 RNA polymerase. Oligonucleotide probes were
labeled with polynucleotide kinase according to the manufacturer's
protocols (New England Biolabs) using
[γ-³²P]ATP (>5000 Ci/mmol; Amersham Pharmacia
Biotech). For oligonucleotide probes, Northern membranes were
prehybridized in
[0082] Ultrahyb (Ambion) at 40.degree. C. followed by addition of
labeled oligonucleotide probe and hybridization overnight at
40.degree. C. Membranes were washed twice with 2.times.SSC/0.1% SDS
at room temperature followed by two washes with 0.1.times.SSC/0.1%
SDS at 40.degree. C. for 15 minutes each.
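An oligonucleotide probe only detects an RNA on a Northern blot if the probe is complementary to that RNA. The Python sketch below illustrates this; it is not necessarily how the probes were designed. It checks that SEQ ID NO: 57 in the Sequence Listing is the reverse complement of nucleotides 4-42 of SEQ ID NO: 1, with the RNA written as its DNA sequence. The function names are hypothetical helpers.

```python
# Sketch: find where an antisense oligonucleotide probe pairs with its target.
# Sequences are copied from the Sequence Listing (SEQ ID NO: 1 and 57).

COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, ignoring spaces."""
    return seq.replace(" ", "").translate(COMPLEMENT)[::-1]

def probe_target_site(probe: str, target: str) -> int:
    """Return the 1-based start of the target segment the probe can pair with, or 0."""
    site = target.replace(" ", "").find(reverse_complement(probe))
    return site + 1 if site >= 0 else 0

SEQ_ID_1 = ("gccccttcaa gagctaagcc actgagagtg ccggagataa gcgccggatg gggtagaaac "
            "ccttaagcct gtgtcgcaca gacttaaggg ttt")
SEQ_ID_57 = "gcttatctcc ggcactctca gtggcttagc tcttgaagg"

print(probe_target_site(SEQ_ID_57, SEQ_ID_1))  # 4: pairs with nt 4-42 of SEQ ID NO: 1
```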
Immunoprecipitation
[0083] Immunoprecipitations were carried out using extracts from cells in exponential growth (OD600 = 0.2-0.4) or stationary phase (overnight growth) as described previously (Wassarman, K. M. and Storz, G. 2000 Cell 101:613-623), using rabbit antisera against the Hfq protein or preimmune serum. After immunoprecipitation, RNA was isolated from the Protein A Sepharose-antibody pellets by extraction with phenol:chloroform:isoamyl alcohol (50:50:1), followed by ethanol precipitation. The RNA was either examined directly on gels after 3'-end labeling, or fractionated on 10% polyacrylamide urea gels and analyzed by Northern hybridization, as described previously (Wassarman, K. M. and Storz, G. 2000 Cell 101:613-623).
rpoS-lacZ Expression
[0084] The effects of multicopy plasmids carrying the novel sRNAs on rpoS-lacZ expression were measured using a single colony of SG30013 transformed with pRS-#N. The colony was grown for 18 h at 37°C in 5 ml of LB-ampicillin medium, or in 5 ml of M63-ampicillin medium supplemented with 0.2% glucose. β-Galactosidase activity in the culture was assayed as described previously (Zhou, Y.-N. and Gottesman, S. 1998 J Bacteriol 180:1154-1158). The values in Table 2 are the ratio of the activity measured with each pRS-#N plasmid to the activity measured with the pRS1553 vector control.
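Each value in Table 2 is a ratio of two β-galactosidase activities. The sketch below assumes the activities are expressed in Miller units, using the standard Miller formula. The text cites Zhou and Gottesman (1998) for the assay but gives neither the formula nor the units, so that choice, the function names, and the example readings are all assumptions. None of the readings are data from Table 2.

```python
def miller_units(od420: float, od550: float, od600: float,
                 minutes: float, volume_ml: float) -> float:
    """Standard Miller formula (an assumption here; not stated in the text).

    od550 corrects od420 for light scattering by cell debris, od600 is the
    culture density, minutes is the reaction time, and volume_ml is the volume
    of culture assayed.
    """
    return 1000.0 * (od420 - 1.75 * od550) / (minutes * volume_ml * od600)

def ratio_to_vector(plasmid_activity: float, vector_activity: float) -> float:
    """Activity with a pRS-#N plasmid divided by activity with the pRS1553 vector."""
    if vector_activity <= 0:
        raise ValueError("vector control activity must be positive")
    return plasmid_activity / vector_activity

# Hypothetical readings, for illustration only.
vector = miller_units(od420=0.20, od550=0.0, od600=1.0, minutes=10, volume_ml=0.1)
plasmid = miller_units(od420=0.60, od550=0.0, od600=1.0, minutes=10, volume_ml=0.1)
print(ratio_to_vector(plasmid, vector))  # roughly 3-fold over the vector control
```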
Phenotype Testing
[0085] To test carbon source utilization or temperature sensitivity associated with the multicopy plasmids containing the novel sRNAs, a single colony of MG1655 transformed with a given pRS-#N was grown for 6 hours in 5 ml LB-ampicillin medium at 37°C. Then 10 μl of serial dilutions (10⁻², 10⁻⁴, and 10⁻⁶) were spotted either on M63-ampicillin plates containing 0.2% of the carbon source being tested (glucose, arabinose, lactose, glycerol, ribose, or succinate) and incubated at 37°C, or on LB plates incubated at room temperature or 42°C. Plates were examined after both 1 and 2 days. In Table 2, failure to grow indicates an efficiency of plating of <10⁻³.
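The efficiency-of-plating cutoff can be computed from the spotted dilutions. The sketch below does this under two assumptions. First, the titer is estimated from the colonies counted in a single 10 μl spot; the spot volume is the same on every plate, so it cancels. Second, the comparison is against a reference plate, such as the same culture under a permissive condition. The text does not say which plate serves as the reference. The function names and colony counts are hypothetical.

```python
def titer_per_spot(colonies: int, dilution_exponent: int) -> int:
    """Viable units per 10 ul of undiluted culture, from a count at 10**-exponent."""
    return colonies * 10 ** dilution_exponent

def efficiency_of_plating(test_colonies: int, test_exp: int,
                          ref_colonies: int, ref_exp: int) -> float:
    """Titer on the test plate divided by titer on the (assumed) reference plate."""
    return titer_per_spot(test_colonies, test_exp) / titer_per_spot(ref_colonies, ref_exp)

def fails_to_grow(eop: float, threshold: float = 1e-3) -> bool:
    """Table 2 scores failure to grow as an efficiency of plating below 10^-3."""
    return eop < threshold

# Hypothetical counts: 20 colonies in the 10^-6 spot on the reference plate,
# only 5 colonies in the 10^-2 spot on the test plate.
eop = efficiency_of_plating(5, 2, 20, 6)
print(eop, fails_to_grow(eop))
```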
Microarray Analysis
[0086] RNA for microarray analysis was isolated from MG1655 cells grown to OD600 = 0.8 in LB medium at 37°C, using the MasterPure RNA purification kit according to the manufacturer's protocols (Epicentre). DNA was removed from the RNA samples by digestion with DNase I for 30 min at 37°C. Probes for microarray analysis were generated by one of two methods: direct labeling of enriched mRNA, or generation of labeled cDNA.
[0087] To generate directly labeled RNA probes, mRNA enrichment and labeling were done as described in the Affymetrix expression handbook (Affymetrix). Oligonucleotide primers complementary to 16S and 23S rRNA were annealed to total RNA, followed by reverse transcription to synthesize cDNA strands complementary to the 16S and 23S rRNA species. The 16S and 23S rRNAs in the resulting RNA:cDNA hybrids were degraded with RNase H, followed by DNase I treatment to remove the cDNA and oligonucleotides. The enriched RNA was fragmented for 30 min at 95°C in 1× T4 polynucleotide kinase buffer (New England Biolabs), labeled with γ-S-ATP and T4 polynucleotide kinase, and ethanol precipitated. The biotin label was introduced by resuspending the RNA in 96 μl of 30 mM MOPS (pH 7.5) and 4 μl of a 50 mM iodoacetylbiotin solution, and incubating at 37°C for 1 hr. RNA was purified using the RNA/DNA Mini Kit according to the manufacturer's protocols (QIAGEN).
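The biotinylation mixture in [0087] is a simple two-component dilution. The arithmetic below applies C1·V1 = C2·V2 to the 100 μl reaction to give the final concentrations. It assumes the resuspended RNA adds negligible volume, which the text does not state.

```python
def final_concentration(stock_conc: float, stock_vol: float, total_vol: float) -> float:
    """Solve C1 * V1 = C2 * V2 for the final concentration C2."""
    return stock_conc * stock_vol / total_vol

total_ul = 96 + 4                                 # 30 mM MOPS + 50 mM iodoacetylbiotin
biotin_mM = final_concentration(50, 4, total_ul)  # 2.0 mM iodoacetylbiotin
mops_mM = final_concentration(30, 96, total_ul)   # 28.8 mM MOPS
print(biotin_mM, mops_mM)
```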
[0088] To generate cDNA probes, 5 μg of total RNA was reverse transcribed using the Superscript II system for first-strand cDNA synthesis (Life Technologies) and 500 ng of random hexamers. RNA and primers were heated to 70°C and cooled to 25°C. Reaction buffer and then Superscript II were added, and the reaction was incubated at 42°C. RNA was removed with RNase H and RNase A. The cDNA was purified using the QIAquick cDNA purification kit (QIAGEN). It was then fragmented by incubating up to 5 μg of cDNA with 0.2 U DNase I for 10 min at 37°C in 1× One-Phor-All buffer (Amersham Pharmacia Biotech). The reaction was stopped by incubation for 10 min at 99°C, and fragmentation was checked on a 0.7% agarose gel to confirm that the average fragment length was 50-100 nt. The fragmented cDNA was 3'-end-labeled with terminal transferase (Roche Molecular Biochemicals) and biotin-N6-ddATP (DuPont/NEN) for 2 hours at 37°C in 1× TdT buffer (Roche Molecular Biochemicals) containing 2.5 mM cobalt chloride.
[0089] Hybridization to microarrays and staining were done according to the Affymetrix expression manual (Affymetrix). The arrays were read at 570 nm with a resolution of 3 μm using a laser scanner.
[0090] Gene expression was analyzed using the Affymetrix Microarray Suite 4.01 software program. Transcripts in intergenic (Ig) regions were detected by comparing the intensity of each probe designed to be a perfect match with the intensity of the corresponding probe designed to be a mismatch. If the perfect-match probe showed an intensity 200 units higher than the mismatch probe, the probe pair was called positive. Two neighboring positive probe pairs were considered evidence of a transcript. The location and length of each transcript were estimated from the first and last positive probe pairs identified within an Ig region.
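The transcript-calling rule in [0090] can be sketched in Python as follows. Several details are assumptions not stated in the text:

- "200 units higher" is read as PM − MM ≥ 200.
- "Neighboring" means adjacent in genomic order within one Ig region.
- Probes have a fixed length, taken here as 25 nt, so the end of the last positive probe can be computed.
- Strand is ignored.

The `ProbePair` type and `call_transcript` function are hypothetical and are not part of the Affymetrix software.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class ProbePair:
    position: int  # genomic start of the probe (hypothetical field)
    pm: float      # perfect-match probe intensity
    mm: float      # mismatch probe intensity

def call_transcript(pairs: List[ProbePair], min_difference: float = 200.0,
                    probe_length: int = 25) -> Optional[Tuple[int, int, int]]:
    """Return (start, end, length) of a putative transcript in one Ig region, or None."""
    ordered = sorted(pairs, key=lambda p: p.position)
    positive = [p.pm - p.mm >= min_difference for p in ordered]
    # Require at least two adjacent positive probe pairs.
    if not any(a and b for a, b in zip(positive, positive[1:])):
        return None
    hits = [p for p, ok in zip(ordered, positive) if ok]
    start = hits[0].position                      # first positive probe pair
    end = hits[-1].position + probe_length - 1    # last positive pair (assumed length)
    return start, end, end - start + 1

region = [ProbePair(100, 300, 250), ProbePair(120, 900, 400),
          ProbePair(140, 1200, 500), ProbePair(160, 350, 300)]
print(call_transcript(region))  # (120, 164, 45)
```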
[0091] While the present invention has been described in some
detail for purposes of clarity and understanding, one skilled in
the art will appreciate that various changes in form and detail can
be made without departing from the true scope of the invention. All
patents, patent applications and publications referred to above are
hereby incorporated by reference.
Sequence Listing
Number of sequences: 161
SEQ ID NO: 1, 93 nt, DNA, E. coli
gccccttcaa gagctaagcc actgagagtg ccggagataa gcgccggatg gggtagaaac 60
ccttaagcct gtgtcgcaca gacttaaggg ttt 93
SEQ ID NO: 2, 86 nt, DNA, E. coli
tcgctgaaaa acataaccca taaaatgcta gctgtaccag gaaccacctc cttagcctgt 60
gtaatctccc ttacacgggc ttattt 86
SEQ ID NO: 3, 307 nt, DNA, E. coli
actgcggccc tttccgccgt ctcgcaaacg ggcgctggct ttaggaaagg atgttccgtg 60
ccgtaaatgc aggtgtttca cagcgcttgc tatcgcggca atatcgccag tggtgctgtc 120
gtgatgcggt cttcgcatgg accgcacaat gaagatacgg tgcttttgta tcgtacttat 180
tgtttctggt gcgctgttaa ccgaggtaaa taataaccgg agtctctccg gcgacaattt 240
actggtggtt aacaaccttc agagcagcaa gtaagcccga atgccgccct ttgggcggca 300
tatttta 307
SEQ ID NO: 4, 65 nt, DNA, E. coli
acggcgcagc caagatttcc ctggtgttgg cgcagtattc gcgcaccccg gtctagccgg 60
ggtca 65
SEQ ID NO: 5, 92 nt, DNA, E. coli
cgcgatcagg aagaccctcg cggagaacct gaaagcacga cattgctcac attgcttcca 60
gtattactta gccagccggg tgctggcttt tt 92
SEQ ID NO: 6, 211 nt, DNA, E. coli
aacgagtaga tgctcattcc atctcttatg ttcgccttag tgcctcataa actccggaat 60
gacgcagagc cgtttacggt gcttatcgtc cactgacaga tgtcgcttat gcctcatcag 120
acaccatgga cacaacgttg agtgaagcac ccacttgttg tcatacagac ctgttttaac 180
gcctgctccg taataagagc aggcgttttt t 211
SEQ ID NO: 7, 141 nt, DNA, E. coli
catcaacacc aaccggaacc tccaccacgt gctcgaatga ggtgtgttga cgtcggggga 60
aaccctcctg tgtaccagcg ggatagagag aaagacaaag accggaaaac aaactaaagc 120
gcccttgtgg cgctttagtt t 141
SEQ ID NO: 8, 79 nt, DNA, E. coli
tgccactgct tttctttgat gtccccattt tgtggagccc atcaaccccg ccatttcggt 60
tcaaggttga tgggttttt 79
SEQ ID NO: 9, 272 nt, DNA, E. coli
tgtttaaagc aaaggcgtaa agtagcaccc atagagcgag gacgctaaca ggaacaatga 60
ctcaggatga gggtcaggag cgccaggagg cgaagacaga ggattgtcag gaagacaaac 120
gtccggagac gtaattaaac ggaaatggaa tcaacacgga ttgttcccta aaggaaaaac 180
agggtgtgtt ggcggcctgc aaggattgta agacccgtta agggttatga gtcaggaaaa 240
aaggcgacag agtaatctgt cgcctttttt ct 272
SEQ ID NO: 10, 195 nt, DNA, E. coli
acattgtaaa ccagagttgc gaaggtacaa aaaattaacg ttttagcaat agctatataa 60
tatagcctgt gctatatctg tatgtaatgc aatcatccct caaggatcga cgggattagc 120
aagtcaggag gtcttatgaa tgagttcaag aggtgtatgc gcgtgtttag tcattctccc 180
tttaaagtac ggtta 195
SEQ ID NO: 11, 82 nt, DNA, E. coli
atcccagagg tattgatagg tgaagtcaac ttcgggttga gcacatgaat tacaccagcc 60
tgcgcagatg cgcaggtttt tt 82
SEQ ID NO: 12, 92 nt, DNA, E. coli
atcccagagg tattgattgg tgagattatt cggtacgctc tcttcgtacc ctgtctcttg 60
caccaacctg cgcggatgcg caggtttttt tt 92
SEQ ID NO: 13, 278 nt, DNA, E. coli
actataaagt cagcgaagga aatgcttctg gcttttaaca gataaaaaga gaccgaacac 60
gattcctgta ttcggtccag ggaaatggct cttgggagag agccgtgcgc taaaagttgg 120
cattaatgca ggcttagttg ccttgccctt taagaataga tgacgacgcc aggttttcca 180
gtttgcgtgc aaaatggtca ataaaaagcg tggtggtcat cagctgaaat gttaaaaacc 240
gcccgttctg gtgaaagaac tgaggcggtt tttttatt 278
SEQ ID NO: 14, 105 nt, DNA, E. coli
agggcaaggc aactaagcct gcattaatgc caacttttag cgcacggctc tctcccaaga 60
gccatttccc tggaccgaat acaggaatcg tgttcggtct ctttt 105
SEQ ID NO: 15, 144 nt, DNA, E. coli
agtgagggtt agggagaggt ttccccctcc ccctggtgtt cttagtaagc ctggaagcta 60
atcactaaga gtatcaccag tatgatgacg tgcttcatca taaccctttc cttattaaaa 120
gccctcttct ccgggagagg cttt 144
SEQ ID NO: 16, 137 nt, DNA, E. coli
agtgagggta gagcggggtt tcccccgccc tggtagtctt agtaagcggg gaagcttatg 60
actaagagca ccacgatgat gagtagcttc atcatgaccc tttccttatt tatggcccct 120
tcctcgggag gggcttt 137
SEQ ID NO: 17, 112 nt, DNA, E. coli
aggaacaagg gtaagggagg atttctcccc cctctgattg gctgttaata agctgcgaaa 60
cttacgagta acaacacaat cagtatgatg acgagcttca tcataaccct tt 112
SEQ ID NO: 18, 139 nt, DNA, E. coli
cagggcaata tctctcttgc aggtgaatgc aacgtcaagc gatgggcgtt gcgctccata 60
ttgtcttact tccttttttg aattactgca tagcacaatt gattcgtacg acgccgactt 120
tgatgagtcg gcttttttt 139
SEQ ID NO: 19, 155 nt, DNA, E. coli
tagagtaaag gaacaagggt aagggaggat ttctcccccc tctgattggc tgttaataag 60
ctgcgaaact tacgagtaac aacacaatca gtatgatgac gagcttcatc ataacccttt 120
ccttctgtaa ggcccccttc ttcgggaggg gcttt 155
SEQ ID NO: 20, 128 nt, DNA, E. coli
cataggggca atgataaaag gtggcaaaaa tgaatgtttc cagtagaact gtagtactga 60
taaatttctt tgctgctgtt ggtttgttta ctcttatctc tatgagattt ggctggttta 120
tttgatgt 128
SEQ ID NO: 21, 31 aa, PRT, E. coli
Met Asn Val Ser Ser Arg Thr Val Val Leu Ile Asn Phe Phe Ala Ala 16
Val Gly Leu Phe Thr Leu Ile Ser Met Arg Phe Gly Trp Phe Ile 31
SEQ ID NO: 22, 84 nt, DNA, E. coli
ataattataa gagaggttgt tatgattgaa cgtgaactgg ggaactggaa agactttatc 60
gaagttatgc ttcgtaagta attc 84
SEQ ID NO: 23, 19 aa, PRT, E. coli
Met Ile Glu Arg Glu Leu Gly Asn Trp Lys Asp Phe Ile Glu Val Met 16
Leu Arg Lys 19
SEQ ID NO: 24, 180 nt, DNA, E. coli
aaaggagacg cttatgtttc gttggggcat catatttctg gttatcgcgt taatcgccgc 60
cgcacttggg tttggtggtc tggccggtac cgctgcaggc gcagctaaaa ttgtctttgt 120
cgtcgggatt attctgttcc tggtgagttt gttcatgggc cgaaaacgac cctagatttc 180
SEQ ID NO: 25, 53 aa, PRT, E. coli
Met Phe Arg Trp Gly Ile Ile Phe Leu Val Ile Ala Leu Ile Ala Ala 16
Ala Leu Gly Phe Gly Gly Leu Ala Gly Thr Ala Ala Gly Ala Ala Lys 32
Ile Val Phe Val Val Gly Ile Ile Leu Phe Leu Val Ser Leu Phe Met 48
Gly Arg Lys Arg Pro 53
SEQ ID NO: 26, 226 nt, DNA, E. coli
atacggagat atcatcatgg gcaaattagg tgaaaacgtt ccgcttctta tcgataaagc 60
cgtagatttc atggcatcaa gccaggcgtt ccgggagtat ctgaaaaaac ttcctccccg 120
taacgcgatt ccgtccggaa tacccgatga aagcgtgccg ttatatctac aacgtctgga 180
gtattatcgt cggctttatc ggccgaagca ggtagagggg cagtaa 226
SEQ ID NO: 27, 69 aa, PRT, E. coli
Met Gly Lys Leu Gly Glu Asn Val Pro Leu Leu Ile Asp Lys Ala Val 16
Asp Phe Met Ala Ser Ser Gln Ala Phe Arg Glu Tyr Leu Lys Lys Leu 32
Pro Pro Arg Asn Ala Ile Pro Ser Gly Ile Pro Asp Glu Ser Val Pro 48
Leu Tyr Leu Gln Arg Leu Glu Tyr Tyr Arg Arg Leu Tyr Arg Pro Lys 64
Gln Val Glu Gly Gln 69
SEQ ID NO: 28, 189 nt, DNA, E. coli
gagtagttaa catgaagcgg agtagaacgg aagtggggcg ctggcgcatg cagcgtcagg 60
ctagccgacg taaatcgcgt tggcttgagg ggcaatcgcg ccgaaatatg cgtatccaca 120
gcatcaggaa gtgcattcta aacaaacagc gtaactcgtt attgtttgcg atctacaata 180
tctaaatgt 189
SEQ ID NO: 29, 57 aa, PRT, E. coli
Met Lys Arg Ser Arg Thr Glu Val Gly Arg Trp Arg Met Gln Arg Gln 16
Ala Ser Arg Arg Lys Ser Arg Trp Leu Glu Gly Gln Ser Arg Arg Asn 32
Met Arg Ile His Ser Ile Arg Lys Cys Ile Leu Asn Lys Gln Arg Asn 48
Ser Leu Leu Phe Ala Ile Tyr Asn Ile 57
SEQ ID NO: 30, 117 nt, DNA, E. coli
aacggaggca aataatgctg ggtaatatga atgtttttat ggccgtactg ggaataattt 60
tattttctgg ttttctggcc gcgtatttca gccacaaatg ggatgactaa tgaacgg 117
SEQ ID NO: 31, 31 aa, PRT, E. coli
Met Leu Gly Asn Met Asn Val Phe Met Ala Val Leu Gly Ile Ile Leu 16
Phe Ser Gly Phe Leu Ala Ala Tyr Phe Ser His Lys Trp Asp Asp 31
SEQ ID NO: 32, 25 nt, DNA, Artificial Sequence (Oligonucleotide)
gcgcctcgtt atcatccaaa atacg 25
SEQ ID NO: 33, 25 nt, DNA, Artificial Sequence (Oligonucleotide)
gtcgcccagc caatgctttc agtcg 25
SEQ ID NO: 34, 25 nt, DNA, Artificial Sequence (Oligonucleotide)
attgatcgca cacctgacag ctgcc 25
SEQ ID NO: 35, 25 nt, DNA, Artificial Sequence (Oligonucleotide)
gttgtcaccc tggacctggt cgtac 25
SEQ ID NO: 36, 23 nt, DNA, Artificial Sequence (Oligonucleotide)
tgaccgcgat ttgcacaaaa tgc 23
SEQ ID NO: 37, 27 nt, DNA, Artificial Sequence (Oligonucleotide)
actcttaaat ttcctatcaa aactcgc 27
SEQ ID NO: 38, 27 nt, DNA, Artificial Sequence (Oligonucleotide)
ggtattttca gagattatga attgccg 27
SEQ ID NO: 39, 25 nt, DNA, Artificial Sequence (Oligonucleotide)
tcacctctcc ttcgagcgct actgg 25
SEQ ID NO: 40, 26 nt, DNA, Artificial Sequence (Oligonucleotide)
aatgctctcc tgataatgtt aaactt 26
SEQ ID NO: 41, 26 nt, DNA, Artificial Sequence (Oligonucleotide)
ggttagctcc gaagcaaaag ccggat 26
SEQ ID NO: 42, 25 nt, DNA, Artificial Sequence (Oligonucleotide)
taattccttt caaatgaaac ggagc 25
SEQ ID NO: 43, 24 nt, DNA, Artificial Sequence (Oligonucleotide)
ggactccctc attataatta ctgg 24
SEQ ID NO: 44, 27 nt, DNA, Artificial Sequence (Oligonucleotide)
ctccttaaac aaggacatta gtctacg 27
SEQ ID NO: 45, 27 nt, DNA, Artificial Sequence (Oligonucleotide)
attcacctta cctaatttga ttcttcc 27
SEQ ID NO: 46, 27 nt, DNA, Artificial Sequence (Oligonucleotide)
ccatcgcttg acgttgcatt cacctgc 27
SEQ ID NO: 47, 26 nt, DNA, Artificial Sequence (Oligonucleotide)
gtcggcgtcg tacgaatcaa ttgtgc 26
SEQ ID NO: 48, 26 nt, DNA, Artificial Sequence (Oligonucleotide)
gcacaattga ttcgtacgac gccgac 26
SEQ ID NO: 49, 25 nt, DNA, Artificial Sequence (Oligonucleotide)
taaggataat attgcagatc gtaag 25
SEQ ID NO: 50, 22 nt, DNA, Artificial Sequence (Oligonucleotide)
atcatcaaac agcaacttgc cc 22
SEQ ID NO: 51, 26 nt, DNA, Artificial Sequence (Oligonucleotide)
tgtccttctc ctgcaagaga attatt 26
SEQ ID NO: 52, 26 nt, DNA, Artificial Sequence (Oligonucleotide)
gctaataata atgtcttttt cgctcc 26
SEQ ID NO: 53, 31 nt, DNA, Artificial Sequence (Oligonucleotide)
gcttttgtga attaatttgt atatcgaagc g 31
SEQ ID NO: 54, 28 nt, DNA, Artificial Sequence (Oligonucleotide)
tattaatacc ctctagattg agttaatc 28
SEQ ID NO: 55, 27 nt, DNA, Artificial Sequence (Oligonucleotide)
cgatttacct cacttcatcg ctttcag 27
SEQ ID NO: 56, 26 nt, DNA, Artificial Sequence (Oligonucleotide)
tgatcctgac ttaatgccgc aagttc 26
SEQ ID NO: 57, 39 nt, DNA, Artificial Sequence (Oligonucleotide)
gcttatctcc ggcactctca gtggcttagc tcttgaagg 39
SEQ ID NO: 58, 28 nt, DNA, Artificial Sequence (Oligonucleotide)
ttgctcacat ctcactttaa tcgtgctc 28
SEQ ID NO: 59, 34 nt, DNA, Artificial Sequence (Oligonucleotide)
atattccacc agctatttgt tagtgaataa aagg 34
SEQ ID NO: 60, 30 nt, DNA, Artificial Sequence (Oligonucleotide)
tgattaattt cgattatttt tcccggatgg 30
SEQ ID NO: 61, 28 nt, DNA, Artificial Sequence (Oligonucleotide)
attagaaaca ggaagcccct cagtcgag 28
SEQ ID NO: 62, 30 nt, DNA, Artificial Sequence (Oligonucleotide)
ttattttccc cggaagcaca ttcacttcac 30
SEQ ID NO: 63, 24 nt, DNA, Artificial Sequence (Oligonucleotide)
tgatctattg cacaacgagg aagc 24
SEQ ID NO: 64, 30 nt, DNA, Artificial Sequence (Oligonucleotide)
tgcttactca tcaaaagtag cgccagattc 30
SEQ ID NO: 65, 27 nt, DNA, Artificial Sequence (Oligonucleotide)
taatcgacgg acgatagata attcctg 27
SEQ ID NO: 66, 28 nt, DNA, Artificial Sequence (Oligonucleotide)
ccaatgtgtc gcctttttca actttccg 28
SEQ ID NO: 67, 33 nt, DNA, Artificial Sequence (Oligonucleotide)
cgatttatga gaataaatac tcatttaagg gtg 33
SEQ ID NO: 68, 25 nt, DNA, Artificial Sequence (Oligonucleotide)
aaatccgact ttagttacaa catac 25
SEQ ID NO: 69, 26 nt, DNA, Artificial Sequence (Oligonucleotide)
gaccagacct tcttgatgat gggcac 26
SEQ ID NO: 70, 24 nt, DNA, Artificial Sequence (Oligonucleotide)
cgacctcaat tccacgggat ctgg 24
SEQ ID NO: 71, 25 nt, DNA, Artificial Sequence (Oligonucleotide)
atttagctgt agtaatcact cgccg 25
SEQ ID NO: 72, 23 nt, DNA, Artificial Sequence (Oligonucleotide)
ggtctcctta gcgccttatt gcg 23
SEQ ID NO: 73, 27 nt, DNA, Artificial Sequence (Oligonucleotide)
cgcccacatg ctgttcttat tattccc 27
SEQ ID NO: 74, 24 nt, DNA, Artificial Sequence (Oligonucleotide)
tttatgacac ctgccactgc cgtc 24
SEQ ID NO: 75, 30 nt, DNA, Artificial Sequence (Oligonucleotide)
ctgtcaagtt atctgtttgt taagtcaagc 30
SEQ ID NO: 76, 26 nt, DNA, Artificial Sequence (Oligonucleotide)
gctgtgaagc acctgcgttg ctcatg 26
SEQ ID NO: 77, 30 nt, DNA, Artificial Sequence (Oligonucleotide)
gctgtgaaac acctgcattt acggccacgg 30
SEQ ID NO: 78, 30 nt, DNA, Artificial Sequence (Oligonucleotide)
ccgtggccgt aaatgcaggt gtttcacagc 30
SEQ ID NO: 79, 23 nt, DNA, Artificial Sequence (Oligonucleotide)
cctttcgcaa ttgactgaaa cac 23
SEQ ID NO: 80, 19 nt, DNA, Artificial Sequence (Oligonucleotide)
ggctagaccg gggtgcgcg 19
SEQ ID NO: 81, 24 nt, DNA, Artificial Sequence (Oligonucleotide)
aaggtggtta tttacacctt agcg 24
SEQ ID NO: 82, 21 nt, DNA, Artificial Sequence (Oligonucleotide)
gtcctctttg gggtaaatgt c 21
SEQ ID NO: 83, 22 nt, DNA, Artificial Sequence (Oligonucleotide)
aatgctccgg tttcatgtca tc 22
SEQ ID NO: 84, 20 nt, DNA, Artificial Sequence (Oligonucleotide)
tagttccttc tcacccggag 20
SEQ ID NO: 85, 27 nt, DNA, Artificial Sequence (Oligonucleotide)
cacaagggcg ctttagtttg ttttccg 27
SEQ ID NO: 86, 28 nt, DNA, Artificial Sequence (Oligonucleotide)
atcccctgag agtttaattt tcgtcaag 28
SEQ ID NO: 87, 23 nt, DNA, Artificial Sequence (Oligonucleotide)
taattcgtcg taattcgtcc tcc 23
SEQ ID NO: 88, 24 nt, DNA, Artificial Sequence (Oligonucleotide)
ctctgccttc ctgtttttgt tgtg 24
SEQ ID NO: 89, 30 nt, DNA, Artificial Sequence (Oligonucleotide)
aaacgcattt gcaactgtcg gcgcttttcc 30
SEQ ID NO: 90, 29 nt, DNA, Artificial Sequence (Oligonucleotide)
cttgttacct caaaaaatca cagtgctcg 29
SEQ ID NO: 91, 27 nt, DNA, Artificial Sequence (Oligonucleotide)
gcagtcggtg atgctggatt tgccctg 27
SEQ ID NO: 92, 32 nt, DNA, Artificial Sequence (Oligonucleotide)
gtttttttac gggtaagccg caacgaccat tg 32
SEQ ID NO: 93, 22 nt, DNA, Artificial Sequence (Oligonucleotide)
tagtagataa gttttagata ac 22
SEQ ID NO: 94, 25 nt, DNA, Artificial Sequence (Oligonucleotide)
taaaactgaa gttgccctga aaatg 25
SEQ ID NO: 95, 22 nt, DNA, Artificial Sequence (Oligonucleotide)
tgatgagtgg ttctgcaaga gg 22
SEQ ID NO: 96, 23 nt, DNA, Artificial Sequence (Oligonucleotide)
taaaagacag attacctggc ctg 23
SEQ ID NO: 97, 31 nt, DNA, Artificial Sequence (Oligonucleotide)
cggactacct caaaataaag ctttatatac g 31
SEQ ID NO: 98, 31 nt, DNA, Artificial Sequence (Oligonucleotide)
gtcatgatac cttgattaaa aaacaaacag c 31
SEQ ID NO: 99, 26 nt, DNA, Artificial Sequence (Oligonucleotide)
ggctataatg cgcacataac ctcttg 26
SEQ ID NO: 100, 33 nt, DNA, Artificial Sequence (Oligonucleotide)
aatcttttct tattttttgg ctaacgaata gcc 33
SEQ ID NO: 101, 31 nt, DNA, Artificial Sequence (Oligonucleotide)
gtccaacttt ttggggtcag tacaaacttt g 31
SEQ ID NO: 102, 28 nt, DNA, Artificial Sequence (Oligonucleotide)
taataacgcc gttattaaat agcctgcc 28
SEQ ID NO: 103, 26 nt, DNA, Artificial Sequence (Oligonucleotide)
taagcaacgt ctgcttactg cccctc 26
SEQ ID NO: 104, 33 nt, DNA, Artificial Sequence (Oligonucleotide)
gtgatggctt ctgataaaga taaatttata gcc 33
SEQ ID NO: 105, 18 nt, DNA, Artificial Sequence (Oligonucleotide)
taacaggcta agaggggc 18
SEQ ID NO: 106, 28 nt, DNA, Artificial Sequence (Oligonucleotide)
attgccactc ttcttgatca aataaccg 28
SEQ ID NO: 107, 29 nt, DNA, Artificial Sequence (Oligonucleotide)
aatgcgtctg ttgataattc aaattagtc 29
SEQ ID NO: 108, 28 nt, DNA, Artificial Sequence (Oligonucleotide)
tagccgtttt attcagtata gatttgcg 28
SEQ ID NO: 109, 23 nt, DNA, Artificial Sequence (Oligonucleotide)
gttcgtcggt aacccgtttc agc 23
SEQ ID NO: 110, 20 nt, DNA, Artificial Sequence (Oligonucleotide)
atggcttaaa gagaggtgcc 20
SEQ ID NO: 111, 22 nt, DNA, Artificial Sequence (Oligonucleotide)
cgtactttaa agggagaatg ac 22
SEQ ID NO: 112, 22 nt, DNA, Artificial Sequence (Oligonucleotide)
gtgcttcctc attatggtga cg 22
SEQ ID NO: 113, 20 nt, DNA, Artificial Sequence (Oligonucleotide)
gaatggaggg agattacacg 20
SEQ ID NO: 114, 21 nt, DNA, Artificial Sequence (Oligonucleotide)
ccttagtggg taaacgctta c 21
SEQ ID NO: 115, 21 nt, DNA, Artificial Sequence (Oligonucleotide)
ctttcaggca gctaaggaaa g 21
SEQ ID NO: 116, 29 nt, DNA, Artificial Sequence (Oligonucleotide)
caatatgtat tattgattga gtaaacggg 29
SEQ ID NO: 117, 20 nt, DNA, Artificial Sequence (Oligonucleotide)
cctcttccag gaataatccc 20
SEQ ID NO: 118, 20 nt, DNA, Artificial Sequence (Oligonucleotide)
cggaaagcgg ttcacagatc 20
SEQ ID NO: 119, 23 nt, DNA, Artificial Sequence (Oligonucleotide)
ctcgtaagtt tcgcagctta tta 23
SEQ ID NO: 120, 20 nt, DNA, Artificial Sequence (Oligonucleotide)
tgaaattcct gtccgacagg 20
SEQ ID NO: 121, 21 nt, DNA, Artificial Sequence (Oligonucleotide)
gcactaccgc aatgttattg c 21
SEQ ID NO: 122, 24 nt, DNA, Artificial Sequence (Oligonucleotide)
gcttacccaa taaatagtta cacg 24
SEQ ID NO: 123, 23 nt, DNA, Artificial Sequence (Oligonucleotide)
taaaacctgt cacaaatcac aaa 23
SEQ ID NO: 124, 21 nt, DNA, Artificial Sequence (Oligonucleotide)
gtggcctgct tcaaactttc g 21
SEQ ID NO: 125, 23 nt, DNA, Artificial Sequence (Oligonucleotide)
gtaaagtcta gcctggcggt tcg 23
SEQ ID NO: 126, 31 nt, DNA, Artificial Sequence (Oligonucleotide)
taattctggt acgcctggca gatattttgc c 31
SEQ ID NO: 127, 24 nt, DNA, Artificial Sequence (Oligonucleotide)
atcaacctca aaagggaaat cggg 24
SEQ ID NO: 128, 23 nt, DNA, Artificial Sequence (Oligonucleotide)
taacttgttg taagccggat cgg 23
SEQ ID NO: 129, 23 nt, DNA, Artificial Sequence (Oligonucleotide)
tgaagcatct atcgccggtt gcg 23
SEQ ID NO: 130, 28 nt, DNA, Artificial Sequence (Oligonucleotide)
gattagaaat ccttttgaaa gcgcattg 28
SEQ ID NO: 131, 20 nt, DNA, Artificial Sequence (Oligonucleotide)
cttattgggc accgcaatgg 20
SEQ ID NO: 132, 27 nt, DNA, Artificial Sequence (Oligonucleotide)
cgaacacaat aaagatttaa ttcagcc 27
SEQ ID NO: 133, 20 nt, DNA, Artificial Sequence (Oligonucleotide)
ctgatgctac tgtgtcaacg 20
SEQ ID NO: 134, 22 nt, DNA, Artificial Sequence (Oligonucleotide)
aataatcaga catagcttag gc 22
SEQ ID NO: 135, 21 nt, DNA, Artificial Sequence (Oligonucleotide)
gccgtgatgg ttttcgcgtt c 21
SEQ ID NO: 136, 21 nt, DNA, Artificial Sequence (Oligonucleotide)
tattttcctc ccgcgctaaa g 21
SEQ ID NO: 137, 23 nt, DNA, Artificial Sequence (Oligonucleotide)
ttcagctgat gaccaccacg ctt 23
SEQ ID NO: 138, 23 nt, DNA, Artificial Sequence (Oligonucleotide)
gagttgtcag agcaggatga ttc 23
SEQ ID NO: 139, 22 nt, DNA, Artificial Sequence (Oligonucleotide)
tatctgcgct tatcctttat gg 22
SEQ ID NO: 140, 23 nt, DNA, Artificial Sequence (Oligonucleotide)
cctttacggt gataaccgtc gcg 23
SEQ ID NO: 141, 25 nt, DNA, Artificial Sequence (Oligonucleotide)
ctgacaagcc tctcattctc ttgtc 25
SEQ ID NO: 142, 23 nt, DNA, Artificial Sequence (Oligonucleotide)
gagaattatc gaggtccggt atc 23
SEQ ID NO: 143, 22 nt, DNA, Artificial Sequence (Oligonucleotide)
ctacgcgtta gcgatagact gc 22
SEQ ID NO: 144, 31 nt, DNA, Artificial Sequence (Oligonucleotide)
aggcttacta agaacaccag ggggagggga a 31
SEQ ID NO: 145, 30 nt, DNA, Artificial Sequence (Oligonucleotide)
agtcataagc ttccccgctt actaagacta 30
SEQ ID NO: 146, 22 nt, DNA, Artificial Sequence (Oligonucleotide)
cctcaaatcg gccataataa cc 22
SEQ ID NO: 147, 22 nt, DNA, Artificial Sequence (Oligonucleotide)
taaacaccgt cgtcagaaat gc 22
SEQ ID NO: 148, 25 nt, DNA, Artificial Sequence (Oligonucleotide)
tagactttta tccactttat tgctg 25
SEQ ID NO: 149, 25 nt, DNA, Artificial Sequence (Oligonucleotide)
gtgtgccttt cggcgatatg gcgtg 25
SEQ ID NO: 150, 25 nt, DNA, Artificial Sequence (Oligonucleotide)
cctttacgtg ggcggtgatt ttgtc 25
SEQ ID NO: 151, 24 nt, DNA, Artificial Sequence (Oligonucleotide)
tagctttgct cctggatgtt tgcc 24
SEQ ID NO: 152, 30 nt, DNA, Artificial Sequence (Oligonucleotide)
gctgtaattt attcagcgtt tgtacatacg 30
SEQ ID NO: 153, 26 nt, DNA, Artificial Sequence (Oligonucleotide)
tcagtcaact cgctgcggcg tgttac 26
SEQ ID NO: 154, 28 nt, DNA, Artificial Sequence (Oligonucleotide)
cttattgttg cttagttagg gtagtcac 28
SEQ ID NO: 155, 26 nt, DNA, Artificial Sequence (Oligonucleotide)
cagtcagtct caggggagga gcaatc 26
SEQ ID NO: 156, 30 nt, DNA, Artificial Sequence (Oligonucleotide)
tgaatgcaca ataaaaaaat cccgaccctg 30
SEQ ID NO: 157, 25 nt, DNA, Artificial Sequence (Oligonucleotide)
agtcgcgcag tactcctctt accag 25
SEQ ID NO: 158, 24 nt, DNA, Artificial Sequence (Oligonucleotide)
taatttctca tcaggcggct ctgc 24
SEQ ID NO: 159, 24 nt, DNA, Artificial Sequence (Oligonucleotide)
taacattatc agcctgctga cggc 24
SEQ ID NO: 160, 28 nt, DNA, Artificial Sequence (Oligonucleotide)
ggccgaattc gtagggtaca gaggtaag 28
SEQ ID NO: 161, 30 nt, DNA, Artificial Sequence (Oligonucleotide)
ggccggatcc gtcattactg actggggcgg 30
* * * * *
References