U.S. patent application number 10/205220, filed on 2002-07-26 and published on 2003-09-11, covers the nucleotide sequence of the Mycoplasma genitalium genome, fragments thereof, and uses thereof.
This patent application is currently assigned to Johns Hopkins University. Invention is credited to Adams, Mark D., Fraser, Claire M., Gocayne, Jeannine D., Hutchison, Clyde A. III, Smith, Hamilton O., Venter, J. Craig, White, Owen R..
Application Number | 20030170663 10/205220 |
Family ID | 27413253 |
Filed Date | 2002-07-26 |
United States Patent
Application |
20030170663 |
Kind Code |
A1 |
Fraser, Claire M. ; et
al. |
September 11, 2003 |
Nucleotide sequence of the Mycoplasma genitalium genome, fragments
thereof, and uses thereof
Abstract
The present invention provides the nucleotide sequence of the
entire genome of Mycoplasma genitalium, SEQ ID NO: 1. The present
invention further provides the sequence information stored on
computer readable media, and computer-based systems and methods
which facilitate its use. In addition to the entire genomic
sequence, the present invention identifies protein encoding
fragments of the genome, and identifies, by position relative to
two (2) genes known to flank the origin of replication, any
regulatory elements which modulate the expression of the protein
encoding fragments of the Mycoplasma genitalium genome.
Inventors: |
Fraser, Claire M.; (Potomac,
MD) ; Adams, Mark D.; (Rockville, MD) ;
Gocayne, Jeannine D.; (Potomac, MD) ; Hutchison,
Clyde A. III; (Chapel Hill, NC) ; Smith, Hamilton
O.; (Reisterstown, MD) ; Venter, J. Craig;
(Queenstown, MD) ; White, Owen R.; (Rockville,
MD) |
Correspondence
Address: |
HUMAN GENOME SCIENCES INC
9410 KEY WEST AVENUE
ROCKVILLE
MD
20850
|
Assignee: |
Johns Hopkins University
Baltimore
MD
|
Family ID: |
27413253 |
Appl. No.: |
10/205220 |
Filed: |
July 26, 2002 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
10205220 | Jul 26, 2002 | |
08545528 | Oct 19, 1995 | |
08545528 | Oct 19, 1995 | |
08488018 | Jun 7, 1995 | |
08545528 | Oct 19, 1995 | |
08473545 | Jun 7, 1995 | |
Current U.S.
Class: |
435/6.12 ;
435/183; 435/252.3; 435/320.1; 435/6.15; 435/69.1; 536/23.7 |
Current CPC
Class: |
C07K 14/30 20130101;
A61K 38/00 20130101 |
Class at
Publication: |
435/6 ; 435/69.1;
435/183; 435/252.3; 435/320.1; 536/23.7 |
International
Class: |
C12Q 001/68; C07H
021/04; C12N 009/00; C12N 001/21; C12P 021/02 |
Government Interests
[0002] Part of the work performed during development of this
invention utilized U.S. Government funds. The U.S. Government may
have certain rights in the invention--DE-FC02-95ER61962.A000;
NP-838C; NIH-AI08998, AI33161, and HL19171.
Claims
What is claimed is:
1. An isolated polynucleotide comprising the nucleotide sequence of
any one of the fragments of SEQ ID NO: 1 depicted in Tables 1a, 1c
and 2, or a degenerate variant thereof.
2. An isolated polynucleotide complementary to the polynucleotide
of claim 1.
3. The isolated polynucleotide of claim 1, wherein said
polynucleotide comprises a heterologous nucleic acid sequence.
4. A method for making a recombinant vector comprising inserting
the isolated polynucleotide of claim 1 into a vector.
5. A recombinant vector comprising the isolated polynucleotide of
claim 1.
6. The recombinant vector of claim 5, wherein said polynucleotide
is operably associated with a heterologous regulatory sequence that
controls gene expression.
7. A recombinant host cell comprising the isolated polynucleotide
of claim 1.
8. The recombinant host cell of claim 7, wherein said
polynucleotide is operably associated with a heterologous
regulatory sequence that controls gene expression.
9. An isolated polynucleotide comprising a nucleic acid sequence
which hybridizes under hybridization conditions comprising
hybridization in 5×SSC and 50% formamide at 50° C. and
washing in a wash buffer consisting of 0.5× SSC at 65°
C., to the complementary strand of any one of the fragments of SEQ
ID NO: 1 depicted in Tables 1a, 1c and 2, or a degenerate variant
thereof.
10. An isolated polynucleotide complementary to the polynucleotide
of claim 9.
11. The isolated polynucleotide of claim 9, wherein said
polynucleotide comprises a heterologous nucleic acid sequence.
12. A recombinant vector comprising the isolated polynucleotide of
claim 9.
13. A recombinant host cell comprising the isolated polynucleotide
of claim 9.
14. An isolated polynucleotide comprising at least 50 contiguous
nucleotides of any one of the fragments of SEQ ID NO: 1 depicted in
Tables 1a, 1c and 2, or a degenerate variant thereof.
15. The isolated polynucleotide of claim 14, wherein said
polynucleotide comprises at least 100 contiguous nucleotides of any
one of the fragments of SEQ ID NO: 1 depicted in Tables 1a, 1c and
2, or a degenerate variant thereof.
16. An isolated polynucleotide complementary to the polynucleotide
of claim 14.
17. The isolated polynucleotide of claim 14, wherein said
polynucleotide comprises a heterologous nucleic acid sequence.
18. A recombinant vector comprising the isolated polynucleotide of
claim 14.
19. A recombinant host cell comprising the isolated polynucleotide
of claim 14.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application is a divisional of and claims priority
under 35 U.S.C. § 120 to U.S. application Ser. No. 08/545,528,
filed Oct. 19, 1995, which is a continuation-in-part of and claims
priority under 35 U.S.C. § 120 to U.S. application Ser. Nos.
08/488,018 and 08/473,545, both filed Jun. 7, 1995. U.S.
application Ser. Nos. 08/488,018 and 08/473,545 are each hereby
incorporated herein by reference.
REFERENCE TO SEQUENCE LISTING
[0003] This application refers to a "Sequence Listing" listed
below, which is provided as an electronic document on two identical
compact discs (CD-R), labeled "Copy 1" and "Copy 2." These compact
discs each contain the file "PB196P1D1.ST25.txt" (735,244 bytes,
created on Jun. 24, 2002), which is hereby incorporated in its
entirety herein.
FIELD OF THE INVENTION
[0004] The present invention relates to the field of molecular
biology. The invention discloses compositions comprising the
nucleotide sequence of Mycoplasma genitalium, fragments thereof,
and its use in medical diagnostics, therapies and pharmaceutical
development.
BACKGROUND OF THE INVENTION
[0005] Mycoplasmas are the smallest free-living bacterial organisms
known (Colman, S. D. et al., Mol. Microbiol. 4:683-687 (1990)).
Mycoplasmas are thought to have evolved from higher gram-positive
bacteria through the loss of genetic material (Bailey, C. C. et
al., J. Bacteriol. 176:5814-5819 (1994)). Mycoplasma genitalium (M.
genitalium) is widely considered to be the smallest
self-replicating biological system, as the molecular size of its
genome has been shown to be only 570-600 kb (Pyle, L. E. et al.,
Nucleic Acids Res. 16(13):6015-6025 (1988); Peterson, S. N. et al.,
J. Bacteriol. 175:7918-7930 (1993)). All mycoplasmas lack a cell
wall and have small genomes and a characteristically low G+C
content (Razin, S., Microbiol. Rev. 49(4):419-455 (1985); Peterson,
S. N. et al., J. Bacteriol. 175:7918-7930 (1993)). Some
mycoplasmas, including M. genitalium, have a specialized codon
usage, whereby UGA encodes tryptophan rather than serving as a stop
codon (Inamine, J. M. et al., J. Bacteriol. 172:504-506 (1990);
Tanaka, J. G. et al., Nucleic Acids Res. 19:6787-6792 (1991);
Yamao, F. A. et al., Proc. Natl. Acad. Sci. USA 82:2306-2309
(1985)).
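The effect of this codon usage can be illustrated with a minimal sketch. The codon tables below are deliberately partial (only the codons in the example are listed), and the RNA string is a hypothetical sequence chosen for illustration, not one taken from SEQ ID NO: 1.

```python
# Illustrative only: partial codon tables covering just the codons used below.
# A complete table would list all 64 codons.
STANDARD = {"AUG": "M", "AAA": "K", "UGG": "W", "UGA": "*", "UAA": "*"}
MYCOPLASMA = dict(STANDARD, UGA="W")  # UGA read as tryptophan, not stop

def translate(rna, table):
    """Translate an RNA string codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = table[rna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

rna = "AUGAAAUGAAAAUAA"  # hypothetical sequence: AUG AAA UGA AAA UAA
print(translate(rna, STANDARD))    # MK   (UGA terminates translation)
print(translate(rna, MYCOPLASMA))  # MKWK (UGA is read through as Trp)
```

Under the standard table the reading frame appears to end at UGA, which suggests why software assuming the standard code would truncate predicted M. genitalium proteins.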
[0006] Mycoplasmas are widely known to be significant pathogens of
humans, animals, and plants (Bailey, C. C. et al., J. Bacteriol.
176:5814-5819 (1994)). The metabolic systems of mycoplasmas
indicate that they are generally biosynthetically deficient, and
thus depend on the microenvironment of the host by
characteristically adhering to host cells in order to obtain
essential precursor molecules, i.e., amino acids, fatty acids and
sterols etc. (Baseman, J. B., 1987. Mycoplasma Cell Membranes, Vol.
20. The Plenum Press, New York, N.Y.).
[0007] In particular, M. genitalium, a newly discovered species, is
a pathogenic etiological agent first isolated in 1980 from the
urethras of human males infected with non-gonococcal urethritis
(Tully, J. G. et al., Lancet 1:1288-1291 (1981); Tully, J. G., et
al., Int. J. Syst. Bacteriol. 33:387-396 (1983)). M. genitalium has
also been identified in specimens of pneumonia patients as a
co-isolate of Mycoplasma pneumoniae (Baseman, J. B. et al., J.
Clin. Microbiol. 26:2266-2269 (1988)). M. genitalium opportunistic
infection has often been observed in individuals infected with
human immunodeficiency virus type 1 (HIV-1) (Lo, S. -C. et al.,
Amer. J. Trop. Med. Hyg. 41:601-616 (1989); Sasaki, Y. et al., AIDS
Res. Hum. Retrov. 9(8):775-780 (1993)). Mycoplasmas can also induce
various cytokines, including tumor necrosis factor, which may
enhance HIV replication (Chowdhury, I. H. et al., Biochem. Biophys.
Res. Commun. 170:1365-1370 (1990)).
[0008] A high amino acid homology exists between the attachment
protein of M. genitalium and the aligned proteins of several human
Class II major histocompatibility complex proteins (HLA),
suggesting that M. genitalium infection may play an important role
in triggering autoimmune mechanisms, thereby aggravating the
immunodeficiency characteristics of acquired immune deficiency
syndrome (AIDS) (Montagnier, L. et al., C.R. Acad. Sci. Paris
311(3):425-430 (1990); Root-Bernstein, R. S. et al., Res. Immunol.
142:519-523 (1991); Bisset, L. R. Autoimmunity 14:167-168 (1992)).
A diagnostic immunoassay for detecting M. genitalium infection
using monoclonal antibodies specific for some M. genitalium
antigens has been developed. Baseman, J. B. et al., U.S. Pat. No.
5,158,870.
[0009] Due to its diminutive genomic size, M. genitalium provides a
useful model for determining the minimum number of genes and
protein products necessary for a host-independent existence. M.
genitalium has a characteristically small number of base pairs
and a low G+C content, which, along with its UGA tryptophan codon, have
hampered sequencing efforts by conventional techniques (Razin, S.,
Microbiol. Rev. 49(4):419-455 (1985); Colman, S. D. et al., Gene
87:91-96 (1990); Dybvig, K. 1992. Gene Transfer In: Maniloff, J.
(ed.) Mycoplasmas: Molecular Biology and Pathogenesis., Am. Soc.
Microbiol. Washington, D.C., pp.355-362). M. genitalium possesses
a single circular chromosome (Colman, S. D. et al., Gene 87:91-96
(1990); Peterson, S. N. et al., J. Bacteriol. 175:7918-7930
(1993)). The characterization of the genome of M. genitalium has
also been hampered by the lack of auxotrophic mutants and by the
lack of a system for genetic exchange, precluding reverse genetic
approaches. Thus, the sequencing of the M. genitalium genome would
enhance the understanding of how M. genitalium causes or promotes
various invasive or immunodeficiency diseases and of how best to
medically combat mycoplasma infection.
[0010] Prior attempts at characterizing the structure and gene
arrangement of the chromosomes of mycoplasmas using pulsed-field
gel electrophoretic methods (Pyle, L. E. et al., Nucleic Acids Res.
16(13):6015-6025 (1988); Neimark, H. C. et al., Nucleic Acids Res.
18(18):5443-5448 (1990)), indicated that mycoplasmas have genomes
ranging widely in size. Southern blot hybridization of digested
DNAs of M. genitalium compared to the well-known human pathogen, M.
pneumoniae, indicated overall low homology values of approximately
6-8% (Yogev, D. et al., Int. J. Syst. Bacteriol. 36(3):426-430
(1986)). However, high homologies have been reported between the
adhesin genes of M. genitalium and M. pneumoniae (Dallo, S. F. et
al., Microbial Path. 6:69-73 (1989)). Initial studies at
characterizing the genome of M. genitalium by comparison to the
well-known M. pneumoniae species, indicated that both species have
three (3) rRNA genes clustered together in a chromosomal segment of
about 5 kb and form a single operon organized in classical
procaryotic fashion, but differences exist between their respective
restriction sites (Yogev, D. et al., Int. J. Syst. Bacteriol.
36(3):426-430 (1986)).
[0011] Restriction enzyme mapping of M. genitalium indicates that
the genome is approximately 600 kb. Several genes have also been
mapped, including the single ribosomal operon, and the gene
encoding the MgPa cytadhesion protein (Su, C. J. et al., J.
Bacteriol. 172:4705-4707 (1990); Colman, S. D. et al., Mol.
Microbiol. 4(4):683-687 (1990)). The entire restriction map of the
genome of M. genitalium has also been cloned in an ordered library
of 20 overlapping cosmids and one λ clone (Lucier, T. S. et
al., Gene 150:27-34 (1994)).
[0012] An initial study using random sequencing techniques to
characterize the M. genitalium genome resulted in forty-four (44)
random clones being partially sequenced; several long open reading
frames were also found (Peterson, S. N. et al., Nucleic Acids Res.
19:6027-6031 (1991)). Subsequent work using random sequencing of
508 random nonidentical clones has allowed sequence information to
be compiled for approximately seventeen percent (17%) (100,993
nucleotides) of the M. genitalium genome (Peterson, S. N. et al.,
J. Bacteriol. 175:7918-7930 (1993)). Sequence information indicates
that the diminutive genome of M. genitalium contains numerous genes
involved in various metabolic processes. The genome is estimated to
encode approximately 390 proteins, indicating that M. genitalium
makes very efficient use of its limited amount of DNA (Peterson, S.
N. et al., J. Bacteriol. 175:7918-7930 (1993)).
[0013] Several studies have been undertaken to sequence and
characterize individual genes identified in M. genitalium. In
particular, the medically important aspects of M. genitalium have
helped to direct interest to those genes which determine the degree
of infectivity and the virulence characteristics of the organism.
The nucleotide sequence and deduced amino acid sequence for the
MgPa adhesin gene, i.e., the gene encoding the surface cytadhesion
protein of M. genitalium, indicates that the complete gene contains
4,335 nucleotides coding for a protein of 159,668 Da. (Dallo, S. F.
et al., Infect. Immun. 57(4):1059-1065 (1989)). Furthermore,
subsequent nucleotide sequencing of the M. genitalium MgPa adhesin
gene revealed the specific codon order for this important gene
(Inamine, J. M. et al., Gene 82:259-267 (1989)). The MgPa adhesin
gene also has been shown to express restriction fragment length
polymorphism (Dallo, S. F. et al., Microbial Path. 10:475-480
(1991)). Nucleotide homology to the well-known highly conserved
procaryotic origin-of-replication gene (gyrA) was noted for M.
genitalium (Bailey, C. C. et al., J. Bacteriol. 176:5814-5819
(1994)). The highly conserved procaryotic elongation factor, Tu,
encoded by the tuf gene, has been noted and sequenced for M.
genitalium, and was found to contain an open reading frame encoding
a protein of approximately 393 amino acids (Loechel, S. et al.,
Nucleic Acids Res. 17(23):10127 (1989)). The tuf gene of M.
genitalium has also been determined to use a signal other than a
Shine-Dalgarno (ribosomal binding site) sequence preceding the
initiation codon (Loechel, S. et al., Nucleic Acids Res.
19:6905-6911 (1991)).
SUMMARY OF THE INVENTION
[0014] The present invention is based on the sequencing of the
Mycoplasma genitalium genome. The primary nucleotide sequence which
was generated is provided in SEQ ID NO: 1.
[0015] The present invention provides the generated nucleotide
sequence of the Mycoplasma genitalium genome, or a representative
fragment thereof, in a form which can be readily used, analyzed,
and interpreted by a skilled artisan. In one embodiment, the present
invention is provided as a contiguous string of primary sequence
information corresponding to the nucleotide sequence depicted in
SEQ ID NO: 1.
[0016] The present invention further provides nucleotide sequences
which are at least 99.9% identical to the nucleotide sequence of
SEQ ID NO: 1.
[0017] The nucleotide sequence of SEQ ID NO: 1, a representative
fragment thereof, or a nucleotide sequence which is at least 99.9%
identical to the nucleotide sequence of SEQ ID NO: 1 may be
provided in a variety of mediums to facilitate its use. In one
application of this embodiment, the sequences of the present
invention are recorded on computer readable media. Such media
include, but are not limited to: magnetic storage media, such as
floppy discs, hard disc storage medium, and magnetic tape; optical
storage media such as CD-ROM; electrical storage media such as RAM
and ROM; and hybrids of these categories such as magnetic/optical
storage media.
[0018] The present invention further provides systems, particularly
computer-based systems which contain the sequence information
herein described stored in a data storage means. Such systems are
designed to identify commercially important fragments of the
Mycoplasma genitalium genome.
[0019] Another embodiment of the present invention is directed to
isolated fragments of the Mycoplasma genitalium genome. The
fragments of the Mycoplasma genitalium genome of the present
invention include, but are not limited to, fragments which encode
peptides, hereinafter open reading frames (ORFs), fragments which
modulate the expression of an operably linked ORF, hereinafter
expression modulating fragments (EMFs), fragments which mediate the
uptake of a linked DNA fragment into a cell, hereinafter uptake
modulating fragments (UMFs), and fragments which can be used to
diagnose the presence of Mycoplasma genitalium in a sample,
hereinafter, diagnostic fragments (DFs).
[0020] Each of the ORF fragments of the Mycoplasma genitalium
genome disclosed in Tables 1(a), 1(c) and 2, and the EMF found 5'
to the ORF, can be used in numerous ways as polynucleotide
reagents. The sequences can be used as diagnostic probes or
diagnostic amplification primers for the presence of a specific
microbe in a sample, for the production of commercially important
pharmaceutical agents, and to selectively control gene
expression.
[0021] The present invention further includes recombinant
constructs comprising one or more fragments of the Mycoplasma
genitalium genome of the present invention. The recombinant
constructs of the present invention comprise vectors, such as a
plasmid or viral vector, into which a fragment of the Mycoplasma
genitalium genome has been inserted.
[0022] The present invention further provides host cells containing
any one of the isolated fragments of the Mycoplasma genitalium
genome of the present invention. The host cells can be a higher
eukaryotic host such as a mammalian cell, a lower eukaryotic cell
such as a yeast cell, or can be a procaryotic cell such as a
bacterial cell.
[0023] The present invention is further directed to isolated
proteins encoded by the ORFs of the present invention. A variety of
methodologies known in the art can be utilized to obtain any one of
the proteins of the present invention. At the simplest level, the
amino acid sequence can be synthesized using commercially available
peptide synthesizers. In an alternative method, the protein is
purified from bacterial cells which naturally produce the protein.
Lastly, the proteins of the present invention can alternatively be
purified from cells which have been altered to express the desired
protein.
[0024] The invention further provides methods of obtaining homologs
of the fragments of the Mycoplasma genitalium genome of the present
invention and homologs of the proteins encoded by the ORFs of the
present invention. Specifically, by using the nucleotide and amino
acid sequences disclosed herein as a probe or as primers, and
techniques such as PCR cloning and colony/plaque hybridization, one
skilled in the art can obtain homologs.
[0025] The invention further provides antibodies which selectively
bind one of the proteins of the present invention. Such antibodies
include both monoclonal and polyclonal antibodies.
[0026] The invention further provides hybridomas which produce the
above-described antibodies. A hybridoma is an immortalized cell
line which is capable of secreting a specific monoclonal
antibody.
[0027] The present invention further provides methods of
identifying test samples derived from cells which express one of
the ORFs of the present invention, or a homolog thereof. Such methods
comprise incubating a test sample with one or more of the
antibodies of the present invention, or one or more of the DFs of
the present invention, under conditions which allow a skilled
artisan to determine if the sample contains the ORF or product
produced therefrom.
[0028] In another embodiment of the present invention, kits are
provided which contain the necessary reagents to carry out the
above-described assays.
[0029] Specifically, the invention provides a compartmentalized kit
to receive, in close confinement, one or more containers which
comprises: (a) a first container comprising one of the antibodies,
or one of the DFs of the present invention; and (b) one or more
other containers comprising one or more of the following: wash
reagents, reagents capable of detecting the presence of bound
antibodies or hybridized DFs.
[0030] Using the isolated proteins of the present invention, the
present invention further provides methods of obtaining and
identifying agents capable of binding to a protein encoded by one
of the ORFs of the present invention. Specifically, such agents
include antibodies (described above), peptides, carbohydrates,
pharmaceutical agents and the like. Such methods comprise the steps
of:
[0031] (a) contacting an agent with an isolated protein encoded by
one of the ORFs of the present invention; and
[0032] (b) determining whether the agent binds to said protein.
[0033] The complete genomic sequence of M. genitalium will be of
great value to all laboratories working with this organism and for
a variety of commercial purposes. Many fragments of the Mycoplasma
genitalium genome will be immediately identified by similarity
searches against GenBank or protein databases and will be of
immediate value to Mycoplasma researchers and for immediate
commercial value for the production of proteins or to control gene
expression. A specific example concerns PHA synthase. It has been
reported that polyhydroxybutyrate is present in the membranes of M.
genitalium and that the amount correlates with the level of
competence for transformation. The PHA synthase that synthesizes
this polymer has been identified and sequenced in a number of
bacteria, none of which are evolutionarily close to M. genitalium.
This gene has yet to be isolated from M. genitalium by use of
hybridization probes or PCR techniques. However, the genomic
sequence of the present invention allows the identification of the
gene by utilizing search means described below.
[0034] Developing the methodology and technology for elucidating
the entire genomic sequence of bacterial and other small genomes
has enhanced, and will continue to greatly enhance, the ability to analyze and understand
chromosomal organization. In particular, sequenced genomes will
provide the models for developing tools for the analysis of
chromosome structure and function, including the ability to
identify genes within large segments of genomic DNA, the structure,
position, and spacing of regulatory elements, the identification of
genes with potential industrial applications, and the ability to perform
comparative genomics and molecular phylogeny.
BRIEF DESCRIPTION OF THE FIGURES
[0035] FIG. 1--EcoRI restriction map of the Mycoplasma genitalium
genome.
[0036] FIG. 2--Block diagram of a computer system 102 that can be
used to implement the computer-based systems of the present
invention.
[0037] FIG. 3--Summary of the Mycoplasma genitalium sequencing
project.
[0038] FIG. 4--A circular representation of the M. genitalium
chromosome. Outer concentric circle: Coding regions on the plus
strand for which a gene identification was made. Second concentric
circle: Coding regions on the minus strand for which a gene
identification was made. Third concentric circle: The direction of
transcription on each strand of the chromosome is depicted as an
arrow starting at the putative origin of replication. Fourth
concentric circle: Coverage by cosmid and lambda clones. Nineteen
cosmid clones and one lambda clone were sequenced from each end to
confirm the overall structure of the genome. Fifth concentric
circle: The locations of the single ribosomal operon and the 33
tRNAs. The clusters of tRNAs (trnA, trnB, trnC, trnD and trnE) are
indicated by the letters A-E with the number of tRNAs in each
cluster listed in parentheses. Sixth concentric circle: Location of
the MgPa operon and MgPa repeat fragments.
[0039] FIGS. 5A-5R--Gene map of the M. genitalium genome. Predicted
coding regions are shown on each strand. The rRNA operon and tRNA
genes are shown as described in the Figure key. Gene identification
numbers correspond to those in Table 6.
[0040] FIG. 6--Location of the MgPa repeats in the M. genitalium
genome. The structure of the MgPa operon (ORF1-MgPa gene-ORF3) in
the M. genitalium genome is illustrated across the top. In addition
to the complete operon, nine repetitive elements which are
composites of particular regions of the MgPa operon were found. The
coordinates of each repeat in the genome are indicated on the left
and right end of each line. The repetitive elements are located
directly below those regions in the operon for which there is
sequence similarity. The percent of sequence identity between the
repeat elements and the MgPa gene ranges from 78%-90%. In some of
the repeats, the MgPa-related sequences are separated in the genome
by a variable length, A-T rich spacer sequence (indicated in the
figure by a line with the length of the spacer indicated in bp). In
cases where no spacer sequence is shown, the composites of the
operon are co-linear in the genome. In repeats 7 and 9, the order
of the sequences in the repeats differs from that in the operon. In
these cases, the order of the elements in each repeat in the genome
is indicated numerically where element 1 is followed by element 2
which is followed by element 3, etc.
DETAILED DESCRIPTION
[0041] The present invention is based on the sequencing of the
Mycoplasma genitalium genome. The primary nucleotide sequence which
was generated is provided in SEQ ID NO: 1. As used herein, the
"primary sequence" refers to the nucleotide sequence represented by
the IUPAC nomenclature system.
[0042] The sequence provided in SEQ ID NO: 1 is oriented relative
to two genes (DNAA and DNA gyrase) known to flank the origin of
replication of the Mycoplasma genitalium genome. A skilled artisan
will readily recognize that this start/stop point was chosen for
convenience and does not reflect a structural significance.
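Because the chromosome is circular, the choice of start point only changes how the same sequence is written out. The following is a minimal sketch of that idea; the function names and the short example sequence are hypothetical and are not part of the disclosure.

```python
def rotate_circular(seq, start):
    """Write the same circular sequence beginning at a different position."""
    start %= len(seq)
    return seq[start:] + seq[:start]

def same_circle(a, b):
    """Return True if b is a rotation of a (same strand only)."""
    return len(a) == len(b) and b in a + a

genome = "ATGCCGTA"                  # hypothetical 8-nt "circular" sequence
rotated = rotate_circular(genome, 3)
print(rotated)                       # CCGTAATG
print(same_circle(genome, rotated))  # True
```

The check covers one strand only; comparing against the reverse complement as well would be a natural extension.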
[0043] The present invention provides the nucleotide sequence of
SEQ ID NO: 1, or a representative fragment thereof, in a form which
can be readily used, analyzed, and interpreted by a skilled
artisan. In one embodiment, the sequence is provided as a
contiguous string of primary sequence information corresponding to
the nucleotide sequence provided in SEQ ID NO: 1.
[0044] As used herein, a "representative fragment of the nucleotide
sequence depicted in SEQ ID NO: 1" refers to any portion of SEQ ID
NO: 1 which is not presently represented within a publicly
available database. Preferred representative fragments of the
present invention are Mycoplasma genitalium open reading frames,
expression modulating fragments, uptake modulating fragments, and
fragments which can be used to diagnose the presence of Mycoplasma
genitalium in a sample. A non-limiting identification of such
preferred representative fragments is provided in Tables 1(a), 1(c)
and 2.
[0045] The nucleotide sequence information provided in SEQ ID NO: 1
was obtained by sequencing the Mycoplasma genitalium genome using a
megabase shotgun sequencing method. The nucleotide sequence
provided in SEQ ID NO: 1 is a highly accurate, although not
necessarily a 100% perfect, representation of the nucleotide
sequence of the Mycoplasma genitalium genome.
[0046] As discussed in detail below, using the information provided
in SEQ ID NO: 1 and in Tables 1(a), 1(c) and 2 together with
routine cloning and sequencing methods, one of ordinary skill in
the art would be able to clone and sequence all "representative
fragments" of interest including open reading frames (ORFs)
encoding a large variety of Mycoplasma genitalium proteins. In very
rare instances, this may reveal a nucleotide sequence error present
in the nucleotide sequence disclosed in SEQ ID NO: 1. Thus, once
the present invention is made available (i.e., once the information
in SEQ ID NO: 1 and Tables 1(a), 1(c) and 2 has been made
available), resolving a rare sequencing error in SEQ ID NO: 1 would
be well within the skill of the art. Nucleotide sequence editing
software is publicly available. For example, Applied Biosystems'
(AB) AutoAssembler™ can be used as an aid during visual
inspection of nucleotide sequences.
[0047] Even if all of the very rare sequencing errors in SEQ ID NO:
1 were corrected, the resulting nucleotide sequence would still be
at least 99.9% identical to the nucleotide sequence in SEQ ID NO:
1.
[0048] The nucleotide sequences of the genomes from different
strains of Mycoplasma genitalium differ slightly. However, the
nucleotide sequence of the genomes of all Mycoplasma genitalium
strains will be at least 99.9% identical to the nucleotide sequence
provided in SEQ ID NO: 1.
[0049] Thus, the present invention further provides nucleotide
sequences which are at least 99.9% identical to the nucleotide
sequence of SEQ ID NO: 1 in a form which can be readily used,
analyzed and interpreted by the skilled artisan. Methods for
determining whether a nucleotide sequence is at least 99.9%
identical to the nucleotide sequence of SEQ ID NO: 1 are routine
and readily available to the skilled artisan. For example, the
well-known FASTA algorithm (Pearson and Lipman, Proc. Natl. Acad. Sci.
USA 85:2444 (1988)) can be used to generate the percent identity of
nucleotide sequences.
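As a rough illustration of what a 99.9% identity threshold means numerically, the sketch below computes percent identity for two sequences that are assumed to be already aligned and of equal length. It is not the FASTA algorithm, which also performs the alignment; the function names and test sequences are hypothetical.

```python
def percent_identity(a, b):
    """Percent of aligned columns that match; gap characters never count as matches."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)

def meets_threshold(a, b, threshold=99.9):
    """True if the two aligned sequences are at least `threshold` percent identical."""
    return percent_identity(a, b) >= threshold

reference = "ACGT" * 500   # hypothetical 2,000-nt reference
variant = list(reference)
variant[10] = "A"          # reference has G at this position
variant[1500] = "C"        # reference has A at this position
variant = "".join(variant)

print(percent_identity(reference, variant))  # 2 mismatches in 2,000 nt
print(meets_threshold(reference, variant))
```

Two mismatches in 2,000 aligned positions sit exactly at the 99.9% threshold; a third mismatch would fall below it.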
[0050] Computer Related Embodiments
[0051] The nucleotide sequence provided in SEQ ID NO: 1, a
representative fragment thereof, or a nucleotide sequence at least
99.9% identical to SEQ ID NO: 1 may be "provided" in a variety of
mediums to facilitate use thereof. As used herein, provided refers
to a manufacture, other than an isolated nucleic acid molecule,
which contains a nucleotide sequence of the present invention,
i.e., the nucleotide sequence provided in SEQ ID NO: 1, a
representative fragment thereof, or a nucleotide sequence at least
99.9% identical to SEQ ID NO: 1. Such a manufacture provides the
Mycoplasma genitalium genome or a subset thereof (e.g., a
Mycoplasma genitalium open reading frame (ORF)) in a form which
allows a skilled artisan to examine the manufacture using means not
directly applicable to examining the Mycoplasma genitalium genome
or a subset thereof as it exists in nature or in purified form.
[0052] In one application of this embodiment, a nucleotide sequence
of the present invention can be recorded on computer readable
media. As used herein, "computer readable media" refers to any
medium which can be read and accessed directly by a computer. Such
media include, but are not limited to: magnetic storage media, such
as floppy discs, hard disc storage medium, and magnetic tape;
optical storage media such as CD-ROM; electrical storage media such
as RAM and ROM; and hybrids of these categories such as
magnetic/optical storage media. A skilled artisan can readily
appreciate how any of the presently known computer readable mediums
can be used to create a manufacture comprising computer readable
medium having recorded thereon a nucleotide sequence of the present
invention.
[0053] As used herein, "recorded" refers to a process for storing
information on computer readable medium. A skilled artisan can
readily adopt any of the presently known methods for recording
information on computer readable medium to generate manufactures
comprising the nucleotide sequence information of the present
invention.
[0054] A variety of data storage structures are available to a
skilled artisan for creating a computer readable medium having
recorded thereon a nucleotide sequence of the present invention.
The choice of the data storage structure will generally be based on
the means chosen to access the stored information. In addition, a
variety of data processor programs and formats can be used to store
the nucleotide sequence information of the present invention on
computer readable medium. The sequence information can be
represented in a word processing text file, formatted in
commercially-available software such as WordPerfect and Microsoft
Word, or represented in the form of an ASCII file, stored in a
database application, such as DB2, Sybase, Oracle, or the like. A
skilled artisan can readily adapt any number of data processor
structuring formats (e.g. text file or database) in order to obtain
computer readable medium having recorded thereon the nucleotide
sequence information of the present invention.
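
By way of illustration only, the ASCII-file option mentioned above can be sketched in Python as follows. The FASTA-style layout (a ">" header line followed by wrapped sequence lines) is an assumption, since the specification says only "ASCII file"; the record name and the short sequence in the usage note are placeholders, not SEQ ID NO: 1.

```python
import io


def write_fasta(handle, name: str, sequence: str, width: int = 60) -> None:
    """Write one sequence as plain ASCII text: a header line, then wrapped lines."""
    handle.write(f">{name}\n")
    for i in range(0, len(sequence), width):
        handle.write(sequence[i:i + width] + "\n")


def read_fasta(handle) -> dict:
    """Read records written by write_fasta back into a {name: sequence} mapping."""
    records = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}
```

A round trip through an in-memory buffer (io.StringIO standing in for a file on any of the media listed above) returns the sequence unchanged: write_fasta(buffer, "placeholder", "ACGT" * 40), then buffer.seek(0), then read_fasta(buffer).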
[0055] By providing the nucleotide sequence of SEQ ID NO: 1, a
representative fragment thereof, or a nucleotide sequence at least
99.9% identical to SEQ ID NO: 1 in computer readable form, a
skilled artisan can routinely access the sequence information for a
variety of purposes. Computer software is publicly available which
allows a skilled artisan to access sequence information provided in
a computer readable medium. The examples which follow demonstrate
how software which implements the BLAST (Altschul et al., J. Mol.
Biol. 215:403-410 (1990)) and BLAZE (Brutlag et al., Comp. Chem.
17:203-207 (1993)) search algorithms on a Sybase system was used to
identify open reading frames (ORFs) within the Mycoplasma
genitalium genome which contain homology to ORFs or proteins from
other organisms. Such ORFs are protein encoding fragments within
the Mycoplasma genitalium genome and are useful in producing
commercially important proteins such as enzymes used in
fermentation reactions and in the production of commercially useful
metabolites.
[0056] The present invention further provides systems, particularly
computer-based systems, which contain the sequence information
described herein. Such systems are designed to identify
commercially important fragments of the Mycoplasma genitalium
genome.
[0057] As used herein, "a computer-based system" refers to the
hardware means, software means, and data storage means used to
analyze the nucleotide sequence information of the present
invention. The minimum hardware means of the computer-based systems
of the present invention comprises a central processing unit (CPU),
input means, output means, and data storage means. A skilled
artisan can readily appreciate that any one of the currently
available computer-based systems is suitable for use in the present
invention.
[0058] As stated above, the computer-based systems of the present
invention comprise a data storage means having stored therein a
nucleotide sequence of the present invention and the necessary
hardware means and software means for supporting and implementing a
search means. As used herein, "data storage means" refers to memory
which can store nucleotide sequence information of the present
invention, or a memory access means which can access manufactures
having recorded thereon the nucleotide sequence information of the
present invention.
[0059] As used herein, "search means" refers to one or more
programs which are implemented on the computer-based system to
compare a target sequence or target structural motif with the
sequence information stored within the data storage means. Search
means are used to identify fragments or regions of the Mycoplasma
genitalium genome which match a particular target sequence or
target motif. A variety of known algorithms are disclosed publicly
and a variety of commercially available software for conducting
search means are available and can be used in the computer-based
systems of the present invention. Examples of such software
include, but are not limited to, MacPattern (EMBL), BLASTN and
BLASTX (NCBI). A skilled artisan can readily recognize that any
one of the available algorithms or implementing software packages
for conducting homology searches can be adapted for use in the
present computer-based systems.
[0060] As used herein, a "target sequence" can be any DNA or amino
acid sequence of six or more nucleotides or two or more amino
acids. A skilled artisan can readily recognize that the longer a
target sequence is, the less likely a target sequence will be
present as a random occurrence in the database. The most preferred
sequence length of a target sequence is from about 10 to 100 amino
acids or from about 30 to 300 nucleotide residues. However, it is
well recognized that searches for commercially important fragments
of the Mycoplasma genitalium genome, such as sequence fragments
involved in gene expression and protein processing, may be of
shorter length.
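
The relationship between target length and chance occurrence can be made concrete with a short calculation. The sketch below assumes a random database with equal base frequencies, which is a simplification for any real genome; the function name expected_random_hits and the 580,000-nucleotide database size used in the usage note are illustrative assumptions and are not taken from this section.

```python
def expected_random_hits(target_length: int, database_length: int,
                         alphabet_size: int = 4, strands: int = 2) -> float:
    """Expected number of exact chance matches of a target in a random database.

    Assumptions: residues are independent and equally frequent, and each of the
    (database_length - target_length + 1) start positions is tested on each strand.
    """
    if target_length < 1 or database_length < target_length:
        raise ValueError("target must be non-empty and no longer than the database")
    positions = database_length - target_length + 1
    return strands * positions * (1.0 / alphabet_size) ** target_length
```

Under these assumptions, a six-nucleotide target is expected to occur by chance several hundred times in a database of about 580,000 nucleotides, whereas a 30-nucleotide target is expected to occur by chance essentially never.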
[0061] As used herein, "a target structural motif," or "target
motif," refers to any rationally selected sequence or combination
of sequences in which the sequence(s) are chosen based on a
three-dimensional configuration which is formed upon the folding of
the target motif. There are a variety of target motifs known in the
art. Protein target motifs include, but are not limited to,
enzymatic active sites and signal sequences. Nucleic acid target
motifs include, but are not limited to, promoter sequences, hairpin
structures and inducible expression elements (protein binding
sequences).
[0062] A variety of structural formats for the input and output
means can be used to input and output the information in the
computer-based systems of the present invention. A preferred format
for an output means ranks fragments of the Mycoplasma genitalium
genome possessing varying degrees of homology to the target
sequence or target motif. Such presentation provides a skilled
artisan with a ranking of sequences which contain various amounts
of the target sequence or target motif and identifies the degree of
homology contained in the identified fragment.
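
A ranked output of the kind described above can be sketched as follows. The scoring used here is a plain ungapped position-by-position comparison on the forward strand only; it is a stand-in chosen for brevity and is not the BLAST or BLAZE algorithm. The function name rank_fragments is hypothetical.

```python
def rank_fragments(genome: str, target: str, top: int = 3) -> list:
    """Rank genome windows by percent of positions matching the target.

    Returns up to `top` tuples of (percent_match, 1-based start position),
    best match first. Ungapped, forward strand only (simplifying assumptions).
    """
    genome = genome.upper()
    target = target.upper()
    n = len(target)
    if n == 0 or n > len(genome):
        raise ValueError("target must be non-empty and no longer than the genome")
    hits = []
    for pos in range(len(genome) - n + 1):
        window = genome[pos:pos + n]
        matches = sum(a == b for a, b in zip(window, target))
        hits.append((100.0 * matches / n, pos + 1))
    # Highest score first; ties are broken by position.
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return hits[:top]
```

Each entry pairs a degree of match with the address of the fragment, so the skilled artisan sees both where a fragment lies and how closely it resembles the target.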
[0063] A variety of comparing means can be used to compare a target
sequence or target motif with the data storage means to identify
sequence fragments of the Mycoplasma genitalium genome. In the
present examples, software which implements the BLAST (Altschul et
al., J. Mol. Biol. 215:403-410 (1990)) and BLAZE (Brutlag et al.,
Comp. Chem. 17:203-207 (1993)) algorithms was used to identify open
reading frames within the
Mycoplasma genitalium genome. A skilled artisan can readily
recognize that any one of the publicly available homology search
programs can be used as the search means for the computer-based
systems of the present invention.
[0064] One application of this embodiment is provided in FIG. 2.
FIG. 2 provides a block diagram of a computer system 102 that can
be used to implement the present invention. The computer system 102
includes a processor 106 connected to a bus 104. Also connected to
the bus 104 are a main memory 108 (preferably implemented as random
access memory, RAM) and a variety of secondary storage devices 110,
such as a hard drive 112 and a removable medium storage device 114.
The removable medium storage device 114 may represent, for example,
a floppy disk drive, a CD-ROM drive, a magnetic tape drive, etc. A
removable storage medium 116 (such as a floppy disk, a compact
disk, a magnetic tape, etc.) containing control logic and/or data
recorded therein may be inserted into the removable medium storage
device 114. The computer system 102 includes appropriate software
for reading the control logic and/or the data from the removable
medium storage device 114 once inserted in the removable medium
storage device 114.
[0065] A nucleotide sequence of the present invention may be stored
in a well known manner in the main memory 108, any of the secondary
storage devices 110, and/or a removable storage medium 116.
Software for accessing and processing the genomic sequence (such as
search tools, comparing tools, etc.) resides in main memory 108
during execution.
[0066] Biochemical Embodiments
[0067] Another embodiment of the present invention is directed to
isolated fragments of the Mycoplasma genitalium genome. The
fragments of the Mycoplasma genitalium genome of the present
invention include, but are not limited to fragments which encode
peptides, hereinafter open reading frames (ORFs), fragments which
modulate the expression of an operably linked ORF, hereinafter
expression modulating fragments (EMFs), fragments which mediate the
uptake of a linked DNA fragment into a cell, hereinafter uptake
modulating fragments (UMFs), and fragments which can be used to
diagnose the presence of Mycoplasma genitalium in a sample,
hereinafter diagnostic fragments (DFs).
[0068] As used herein, an "isolated nucleic acid molecule" or an
"isolated fragment of the Mycoplasma genitalium genome" refers to a
nucleic acid molecule possessing a specific nucleotide sequence
which has been subjected to purification means to reduce, from the
composition, the number of compounds which are normally associated
with the composition. A variety of purification means can be used
to generate the isolated fragments of the present invention. These
include, but are not limited to methods which separate constituents
of a solution based on charge, solubility, or size.
[0069] In one embodiment, Mycoplasma genitalium DNA can be
mechanically sheared to produce fragments of 15-20 kb in length.
These fragments can then be used to generate a Mycoplasma
genitalium library by inserting them into lambda clones as
described in the Examples below. Primers flanking, for example, an
ORF provided in Table 1(a), 1(c) or 2 can then be generated using
nucleotide sequence information provided in SEQ ID NO: 1. PCR
cloning can then be used to isolate the ORF from the lambda DNA
library. PCR cloning is well known in the art. Thus, given the
availability of SEQ ID NO: 1, Table 1(a), 1(c) and Table 2, it
would be routine to isolate any ORF or other representative
fragment of the present invention.
[0070] The isolated nucleic acid molecules of the present invention
include, but are not limited to single stranded and double stranded
DNA, and single stranded RNA.
[0071] As used herein, an "open reading frame," ORF, means a series
of triplets coding for amino acids without any termination codons
and is a sequence translatable into protein. Tables 1(a), 1(b),
1(c) and 2 identify ORFs in the Mycoplasma genitalium genome. In
particular, Table 1(a) indicates the location of ORFs (i.e., the
addresses) within the Mycoplasma genitalium genome which encode the
recited protein based on homology matching with protein sequences
from the organism appearing in parentheticals (see the fifth column
of Table 1(a)).
[0072] The first column of Table 1(a) provides the "UID" (an
arbitrary identification number) of a particular ORF. The second
and third columns in Table 1(a) indicate an ORF's position in the
nucleotide sequence provided in SEQ ID NO: 1. One of ordinary skill
in the art will recognize that ORFs may be oriented in opposite
directions in the Mycoplasma genitalium genome. This is reflected
in columns 2 and 3.
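
A minimal sketch of how the two coordinate columns might be used to retrieve an ORF is given below. It assumes 1-based, inclusive coordinates and assumes that a start coordinate larger than the end coordinate denotes an ORF on the opposite strand; the specification states only that orientation is reflected in columns 2 and 3, so this convention should be checked against the tables. The names reverse_complement and extract_orf are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_orf(genome: str, start: int, end: int) -> str:
    """Return the ORF read 5' to 3' from 1-based inclusive coordinates.

    Assumption: start <= end means the forward strand; start > end means the
    ORF lies on the opposite strand and is returned reverse-complemented.
    """
    if min(start, end) < 1 or max(start, end) > len(genome):
        raise ValueError("coordinates fall outside the sequence")
    if start <= end:
        return genome[start - 1:end]
    return reverse_complement(genome[end - 1:start])
```

With the toy sequence "AAACCCGGGTTT", coordinates (4, 6) return "CCC", while the reversed pair (6, 4) returns the complementary-strand reading "GGG".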
[0073] The fourth column of Table 1(a) provides the accession
number of the database match for the ORF. As indicated above, the
fifth column of Table 1(a) provides the name of the database match
for the ORF.
[0074] The sixth column of Table 1(a) indicates the percent
identity of the protein encoded for by an ORF to the corresponding
protein from the organism appearing in parentheticals in the fifth
column. The seventh column of Table 1(a) indicates the percent
similarity of the protein encoded for by an ORF to the
corresponding protein from the organism appearing in parentheticals
in the fifth column. The concepts of percent identity and percent
similarity of two polypeptide sequences are well understood in the
art. For example, two polypeptides 10 amino acids in length which
differ at three amino acid positions (e.g., at positions 1, 3 and
5) are said to have a percent identity of 70%. However, the same
two polypeptides would be deemed to have a percent similarity of
80% if, for example, at position 5 the amino acid moieties,
although not identical, were "similar" (i.e., possessed similar
biochemical characteristics). The eighth column in Table 1(a)
indicates the length of the ORF in nucleotides.
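
The 70% and 80% figures in the example above can be reproduced with the following sketch. The two ten-residue sequences are invented for illustration, and the similarity groups are a simplified, hypothetical grouping by side-chain character; they are not the scoring scheme used to prepare Table 1(a).

```python
# Simplified, hypothetical groups of biochemically similar amino acids.
SIMILAR_GROUPS = [set("DE"), set("KRH"), set("ILVM"), set("FYW"), set("ST"), set("NQ")]


def are_similar(x: str, y: str) -> bool:
    """Identical residues, or residues sharing one of the groups above."""
    return x == y or any(x in group and y in group for group in SIMILAR_GROUPS)


def identity_and_similarity(a: str, b: str) -> tuple:
    """Return (percent identity, percent similarity) for equal-length polypeptides."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    identical = sum(x == y for x, y in zip(a, b))
    similar = sum(are_similar(x, y) for x, y in zip(a, b))
    return 100.0 * identical / len(a), 100.0 * similar / len(a)


# Two invented 10-residue polypeptides differing at positions 1, 3 and 5;
# only position 5 (E versus D) is a "similar" substitution.
first = "MKTLEVAGRS"
second = "GKWLDVAGRS"
print(identity_and_similarity(first, second))  # (70.0, 80.0)
```

Seven of ten positions are identical (70%), and counting the E/D pair at position 5 as similar raises the similarity to eight of ten (80%).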
[0075] Table 1(b) is a list of ORFs that have database matches to
previously published Mycoplasma genitalium sequences over the full
length of the ORF. The table headings for Table 1(b) are identical
to those for Table 1(a) with the following two exceptions: (I) The heading
for the eighth column in Table 1(a) (i.e., nucleotide length of the
ORF) has been replaced with the following in Table 1(b):
"Match_info". "Match_info" refers to the coordinates of the match
of the ORF and the previously published Mycoplasma genitalium
sequence. For example, "MG002 (1-930 of 930) GB:U09251 (298-1227 of
6140)," indicates that for ORF MG002, which is 930 nucleotides in
length, there is a database match to accession number GB:U09251,
which has a total length of 6140 nucleotides. The ORF matches this
accession from position 298 to 1227.
[0076] (II) Where an ORF shows homology matches for both a
previously published Mycoplasma genitalium sequence and a
previously published sequence from a different organism, columns 3,
4, 5, and 6 of Table 1(b) respectively provide the accession
number, protein name (and organism in parentheticals), percent
identity and percent similarity for the "other organism," rather
than for the previously published Mycoplasma genitalium sequence.
(However, in this scenario, the accession number for the Mycoplasma
genitalium sequence is still provided in column 8.)
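
A "Match_info" entry of the form shown in the example above can be read programmatically. The sketch below assumes that every entry follows the layout of that single example; the class name MatchInfo, its field names and the function parse_match_info are hypothetical.

```python
import re
from typing import NamedTuple


class MatchInfo(NamedTuple):
    orf_id: str
    orf_start: int
    orf_end: int
    orf_length: int
    accession: str
    match_start: int
    match_end: int
    accession_length: int


# Layout assumed from the single example: "ID (a-b of n) ACCESSION (c-d of m)".
_PATTERN = re.compile(
    r"^(\S+)\s+\((\d+)-(\d+) of (\d+)\)\s+(\S+)\s+\((\d+)-(\d+) of (\d+)\)$"
)


def parse_match_info(text: str) -> MatchInfo:
    """Split a Match_info string into ORF and database-match coordinates."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized Match_info entry: {text!r}")
    orf_id, o_start, o_end, o_len, acc, m_start, m_end, m_len = match.groups()
    return MatchInfo(orf_id, int(o_start), int(o_end), int(o_len),
                     acc, int(m_start), int(m_end), int(m_len))
```

Applied to "MG002 (1-930 of 930) GB:U09251 (298-1227 of 6140)", the parser yields ORF MG002 of length 930 matching accession GB:U09251 from position 298 to 1227, a span of 930 nucleotides.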
[0077] Table 1(c) provides ORFs having database matches to
previously published Mycoplasma genitalium sequences but only over
a portion of the ORF. The table headings are the same as above for
Table 1(b).
[0078] In Tables 1(a), 1(b) and 1(c), unique identifiers are used
to identify the recited ORFs, (e.g., "MG123"). In the parent U.S.
application Ser. Nos. 08/488,018 and 08/473,545, the recited ORFs
are identified using the "MORF" identifier. Table 1(d) lists which
of the new and old identifiers refer to the same ORF. For example,
the first entry in Table 1(d) indicates that the ORF identified as
MG001 in the current application is the same ORF which was
previously identified as MORF-20072 in parent U.S. application Ser.
Nos. 08/488,018 and 08/473,545. Similarly, the third entry in Table
1(d) indicates that the ORF identified as MG003 in the current
application is the same ORF which was previously identified as
MORF-19818 and MORF-20073 in the parent applications.
[0079] Table 2 provides ORFs of the Mycoplasma genitalium genome
which did not elicit a "homology match" with a known sequence from
either M. genitalium or another organism.
[0080] Table 6 classifies each ORF according to its role category
(adapted from Riley, M., Microbiol. Rev. 57:862 (1992)). The gene
identification, the accession number from public archives that
corresponds to the best match, the percent amino acid identity, and
the length of the match in amino acids are also listed for each
entry as above in Tables 1 (a-c). Those genes in M. genitalium that
also match a gene in H. influenzae are indicated by an asterisk
(*). For the purposes of Tables 6 and 7 and FIG. 4, each of the
MgPa repetitive elements has been assigned an MG number, even
though there is evidence to suggest that these repeats may not be
transcribed.
[0081] Table 7 sorts the gene content in H. influenzae and M.
genitalium by functional category. The number of genes in each
category is listed for each organism. The number in parentheses
indicates the percent of the putatively identified genes devoted to
each functional category. For the category of unassigned genes, the
percent of the genome indicated in parentheses represents the
percent of the total number of putative coding regions.
[0082] Further details concerning the algorithms and criteria used
for homology searches are provided in the Examples below.
[0083] A skilled artisan can readily identify ORFs in the
Mycoplasma genitalium genome other than those listed in Tables
1(a), 1(b), 1(c) and 2, such as ORFs which are overlapping or
encoded by the opposite strand of an identified ORF in addition to
those ascertainable using the computer-based systems of the present
invention.
[0084] As used herein, an "expression modulating fragment," EMF,
means a series of nucleotide molecules which modulates the
expression of an operably linked ORF or EMF.
[0085] As used herein, a sequence is said to "modulate the
expression of an operably linked sequence" when the expression of
the sequence is altered by the presence of the EMF. EMFs include,
but are not limited to, promoters, and promoter modulating
sequences (inducible elements). One class of EMFs are fragments
which induce the expression of an operably linked ORF in response
to a specific regulatory factor or physiological event. A review of
known EMFs from Mycoplasma is provided by Tomb et al., Gene
104:1-10 (1991), and Chandler, M. S., Proc. Natl. Acad. Sci. USA
89:1626-1630 (1992).
[0086] EMF sequences can be identified within the Mycoplasma
genitalium genome by their proximity to the ORFs provided in Tables
1(a), 1(b), 1(c) and 2. An intergenic segment, or a fragment of the
intergenic segment, from about 10 to 200 nucleotides in length,
taken 5' from any one of the ORFs of Tables 1(a), 1(b), 1(c) or 2
will modulate the expression of an operably linked 3' ORF in a
fashion similar to that found with the naturally linked ORF
sequence. As used herein, an "intergenic segment" refers to the
fragments of the Mycoplasma genome which are between two ORF(s)
herein described. Alternatively, EMFs can be identified using known
EMFs as a target sequence or target motif in the computer-based
systems of the present invention.
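
The proximity-based approach described above can be sketched as a simple coordinate calculation. The sketch assumes 1-based inclusive ORF coordinates, assumes that start > end denotes an ORF on the opposite strand, treats the sequence as linear, and does not trim the segment at a neighboring ORF; all of these are simplifications, and the names reverse_complement and upstream_segment are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_segment(genome: str, start: int, end: int,
                     max_len: int = 200, min_len: int = 10):
    """Return up to max_len nucleotides lying 5' of an ORF, read 5' to 3'.

    Returns None when fewer than min_len nucleotides are available.
    Assumptions: linear sequence, 1-based inclusive coordinates, and
    start > end for an ORF on the opposite strand.
    """
    if start <= end:
        # Forward strand: the 5' side lies at lower coordinates.
        low = max(0, start - 1 - max_len)
        segment = genome[low:start - 1]
    else:
        # Opposite strand: the 5' side lies at higher coordinates.
        high = min(len(genome), start + max_len)
        segment = reverse_complement(genome[start:high])
    return segment if len(segment) >= min_len else None
```

A segment obtained this way is only a candidate; its activity would still need to be confirmed experimentally, for example with a trap vector of the kind described in the following paragraphs.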
[0087] The presence and activity of an EMF can be confirmed using
an EMF trap vector. An EMF trap vector contains a cloning site 5'
to a marker sequence. A marker sequence encodes an identifiable
phenotype, such as antibiotic resistance or a complementing
nutrition auxotrophic factor, which can be identified or assayed
when the EMF trap vector is placed within an appropriate host under
appropriate conditions. As described above, an EMF will modulate
the expression of an operably linked marker sequence. A more
detailed discussion of various marker sequences is provided
below.
[0088] A sequence which is suspected as being an EMF is cloned in
all three reading frames in one or more restriction sites upstream
from the marker sequence in the EMF trap vector. The vector is then
transformed into an appropriate host using known procedures and the
phenotype of the transformed host is examined under appropriate
conditions. As described above, an EMF will modulate the expression
of an operably linked marker sequence.
[0089] As used herein, an "uptake modulating fragment," UMF, means
a series of nucleotide molecules which mediate the uptake of a
linked DNA fragment into a cell. UMFs can be readily identified
using known UMFs as a target sequence or target motif with the
computer-based systems described above.
[0090] The presence and activity of a UMF can be confirmed by
attaching the suspected UMF to a marker sequence. The resulting
nucleic acid molecule is then incubated with an appropriate host
under appropriate conditions and the uptake of the marker sequence
is determined. As described above, a UMF will increase the
frequency of uptake of a linked marker sequence. A review of DNA
uptake in Mycoplasma is provided by Goodgall, S. H., et al., J.
Bact. 172:5924-5928 (1990).
[0091] As used herein, a "diagnostic fragment," DF, means a series
of nucleotide molecules which selectively hybridize to Mycoplasma
genitalium sequences. DFs can be readily identified by identifying
unique sequences within the Mycoplasma genitalium genome, or by
generating and testing probes or amplification primers consisting
of the DF sequence in an appropriate diagnostic format which
determines amplification or hybridization selectivity.
[0092] The sequences falling within the scope of the present
invention are not limited to the specific sequences herein
described, but also include allelic and species variations thereof.
Allelic and species variations can be routinely determined by
comparing the sequence provided in SEQ ID NO: 1, a representative
fragment thereof, or a nucleotide sequence at least 99.9% identical
to SEQ ID NO: 1 with a sequence from another isolate of the same
species. Furthermore, to accommodate codon variability, the
invention includes nucleic acid molecules coding for the same amino
acid sequences as do the specific ORFs disclosed herein. In other
words, in the coding region of an ORF, substitution of one codon
for another which encodes the same amino acid is expressly
contemplated.
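
Codon substitution of the kind contemplated above can be checked by translating both sequences and comparing the results. The sketch below builds the standard genetic code; the specification does not name a translation table, so the optional tga_as_trp flag, which reads TGA as tryptophan as Mycoplasma species are known to do, is offered as an assumption. The function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def codon_table(tga_as_trp: bool = False) -> dict:
    """Map each DNA codon to a one-letter amino acid ('*' marks termination)."""
    table = {"".join(codon): aa
             for codon, aa in zip(product(BASES, repeat=3), STANDARD_AAS)}
    if tga_as_trp:
        table["TGA"] = "W"  # assumption: TGA read as tryptophan
    return table


def translate(orf: str, table: dict = None) -> str:
    """Translate an in-frame DNA sequence codon by codon."""
    table = table or codon_table()
    orf = orf.upper()
    if len(orf) % 3:
        raise ValueError("sequence length must be a multiple of three")
    return "".join(table[orf[i:i + 3]] for i in range(0, len(orf), 3))


def is_degenerate_variant(orf_a: str, orf_b: str, table: dict = None) -> bool:
    """True when the sequences differ but encode the same amino acid sequence."""
    return (orf_a.upper() != orf_b.upper()
            and translate(orf_a, table) == translate(orf_b, table))
```

For example, ATGGCT and ATGGCC differ in the third position of the second codon, yet both encode methionine followed by alanine, so the second is a synonymous variant of the first.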
[0093] Any specific sequence disclosed herein can be readily
screened for errors by resequencing a particular fragment, such as
an ORF, in both directions (i.e., sequence both strands).
Alternatively, error screening can be performed by sequencing
corresponding polynucleotides of Mycoplasma genitalium origin
isolated by using part or all of the fragments in question as a
probe or primer.
[0094] Each of the ORFs of the Mycoplasma genitalium genome
disclosed in Tables 1(a), 1(b), 1(c) and 2, and the EMF found 5' to
the ORF, can be used in numerous ways as polynucleotide reagents.
The sequences can be used as diagnostic probes or diagnostic
amplification primers to detect the presence of a specific microbe,
such as Mycoplasma genitalium, in a sample. This is especially the
case with the fragments or ORFs of Table 2, which will be highly
selective for Mycoplasma genitalium.
[0095] In addition, the fragments of the present invention, as
broadly described, can be used to control gene expression through
triple helix formation or antisense DNA or RNA, both of which
methods are based on the binding of a polynucleotide sequence to
DNA or RNA. Polynucleotides suitable for use in these methods are
usually 20 to 40 bases in length and are designed to be
complementary to a region of the gene involved in transcription
(triple helix--see Lee et al., Nucl. Acids Res. 6:3073 (1979);
Cooney et al., Science 241:456 (1988); and Dervan et al., Science
251:1360 (1991)) or to the mRNA itself (antisense--Okano, J.
Neurochem. 56:560 (1991); Oligodeoxynucleotides as Antisense
Inhibitors of Gene Expression, CRC Press, Boca Raton, Fla.
(1988)).
[0096] Triple helix-formation optimally results in a shut-off of
RNA transcription from DNA, while antisense RNA hybridization
blocks translation of an mRNA molecule into polypeptide. Both
techniques have been demonstrated to be effective in model systems.
Information contained in the sequences of the present invention is
necessary for the design of an antisense or triple helix
oligonucleotide.
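
As an illustration of how sequence information feeds antisense design, the sketch below derives a DNA oligonucleotide complementary to a chosen stretch of an mRNA. It enforces only the 20 to 40 base length range mentioned above and ignores practical design criteria such as secondary structure and specificity; the mRNA in the usage note is invented, and the names reverse_complement and antisense_oligo are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def antisense_oligo(mrna: str, start: int, length: int = 20) -> str:
    """Return a DNA oligonucleotide complementary to mrna[start:start + length].

    `start` is a 0-based offset into the mRNA. The mRNA may be written with
    U or T; the oligonucleotide is returned 5' to 3' in DNA letters.
    """
    if not 20 <= length <= 40:
        raise ValueError("length should fall in the 20 to 40 base range")
    target = mrna.upper().replace("U", "T")[start:start + length]
    if len(target) < length:
        raise ValueError("target region runs past the end of the mRNA")
    return reverse_complement(target)
```

For the invented mRNA "AUGGCUAAAGGUCCUGAAUUCGGAUCCAAG", antisense_oligo(mrna, 0, 20) returns a 20-base oligonucleotide whose reverse complement is the first 20 bases of the message.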
[0097] The present invention further provides recombinant
constructs comprising one or more fragments of the Mycoplasma
genitalium genome of the present invention. The recombinant
constructs of the present invention comprise a vector, such as a
plasmid or viral vector, into which a fragment of the Mycoplasma
genitalium genome has been inserted, in a forward or reverse orientation.
In the case of a vector comprising one of the ORFs of the present
invention, the vector may further comprise regulatory sequences,
including for example, a promoter, operably linked to the ORF. For
vectors comprising the EMFs and UMFs of the present invention, the
vector may further comprise a marker sequence or heterologous ORF
operably linked to the EMF or UMF. Large numbers of suitable
vectors and promoters are known to those of skill in the art and
are commercially available for generating the recombinant
constructs of the present invention. The following vectors are
provided by way of example. Bacterial: pBs, phagescript, PsiX174,
pBluescript SK, pBs KS, pNH8a, pNH16a, pNH18a, pNH46a (Stratagene);
pTrc99A, pKK223-3, pKK233-3, pDR540, pRIT5 (Pharmacia). Eukaryotic:
pWLneo, pSV2cat, pOG44, pXT1, pSG (Stratagene); pSVK3, pBPV, pMSG,
pSVL (Pharmacia).
[0098] Promoter regions can be selected from any desired gene using
CAT (chloramphenicol acetyltransferase) vectors or other vectors with
selectable markers. Two appropriate vectors are pKK232-8 and pCM7.
Particular named bacterial promoters include lacI, lacZ, T3, T7,
gpt, lambda P.sub.R, and trc. Eukaryotic promoters include CMV
immediate early, HSV thymidine kinase, early and late SV40, LTRs
from retrovirus, and mouse metallothionein-I. Selection of the
appropriate vector and promoter is well within the level of
ordinary skill in the art.
[0099] The present invention further provides host cells containing
any one of the isolated fragments of the Mycoplasma genitalium
genome of the present invention, wherein the fragment has been
introduced into the host cell using known transformation methods.
The host cell can be a higher eukaryotic host cell, such as a
mammalian cell, a lower eukaryotic host cell, such as a yeast cell,
or the host cell can be a procaryotic cell, such as a bacterial
cell. Introduction of the recombinant construct into the host cell
can be effected by calcium phosphate transfection, DEAE-dextran
mediated transfection, or electroporation (Davis, L. et al., Basic
Methods in Molecular Biology (1986)).
[0100] The host cells containing one of the fragments of the
Mycoplasma genitalium genome of the present invention, can be used
in conventional manners to produce the gene product encoded by the
isolated fragment (in the case of an ORF) or can be used to produce
a heterologous protein under the control of the EMF.
[0101] The present invention further provides isolated
polypeptides encoded by the nucleic acid fragments of the present
invention or by degenerate variants of the nucleic acid fragments
of the present invention. By "degenerate variant" is intended
nucleotide fragments which differ from a nucleic acid fragment of
the present invention (e.g., an ORF) by nucleotide sequence but,
due to the degeneracy of the Genetic Code, encode an identical
polypeptide sequence. Preferred nucleic acid fragments of the
present invention are the ORFs depicted in Tables 1(a), 1(c) and
2.
[0102] A variety of methodologies known in the art can be utilized
to obtain any one of the isolated polypeptides or proteins of the
present invention. At the simplest level, the amino acid sequence
can be synthesized using commercially available peptide
synthesizers. This is particularly useful in producing small
peptides and fragments of larger polypeptides. Fragments are
useful, for example, in generating antibodies against the native
polypeptide. In an alternative method, the polypeptide or protein
is purified from bacterial cells which naturally produce the
polypeptide or protein. One skilled in the art can readily follow
known methods for isolating polypeptides and proteins in order to
obtain one of the isolated polypeptides or proteins of the present
invention. These include, but are not limited to,
immunochromatography, HPLC, size-exclusion chromatography,
ion-exchange chromatography, and immuno-affinity
chromatography.
[0103] The polypeptides and proteins of the present invention can
alternatively be purified from cells which have been altered to
express the desired polypeptide or protein. As used herein, a cell
is said to be altered to express a desired polypeptide or protein
when the cell, through genetic manipulation, is made to produce a
polypeptide or protein which it normally does not produce or which
the cell normally produces at a lower level. One skilled in the art
can readily adapt procedures for introducing and expressing either
recombinant or synthetic sequences into eukaryotic or prokaryotic
cells in order to generate a cell which produces one of the
polypeptides or proteins of the present invention.
[0104] Any host/vector system can be used to express one or more of
the ORFs of the present invention. These include, but are not
limited to, eukaryotic hosts such as HeLa cells, CV-1 cells, COS
cells, and Sf9 cells, as well as prokaryotic hosts such as E. coli
and B. subtilis. The most preferred cells are those which do not
normally express the particular polypeptide or protein or which
express the polypeptide or protein at a low natural level.
[0105] "Recombinant," as used herein, means that a polypeptide or
protein is derived from recombinant (e.g., microbial or mammalian)
expression systems. "Microbial" refers to recombinant polypeptides
or proteins made in bacterial or fungal (e.g., yeast) expression
systems. As a product, "recombinant microbial" defines a
polypeptide or protein essentially free of native endogenous
substances and unaccompanied by associated native glycosylation.
Polypeptides or proteins expressed in most bacterial cultures,
e.g., E. coli, will be free of glycosylation modifications;
polypeptides or proteins expressed in yeast will have a
glycosylation pattern different from that expressed in mammalian
cells.
[0106] "Nucleotide sequence" refers to a heteropolymer of
deoxyribonucleotides. Generally, DNA segments encoding the
polypeptides and proteins provided by this invention are assembled
from fragments of the Mycoplasma genitalium genome and short
oligonucleotide linkers, or from a series of oligonucleotides, to
provide a synthetic gene which is capable of being expressed in a
recombinant transcriptional unit comprising regulatory elements
derived from a microbial or viral operon.
[0107] "Recombinant expression vehicle or vector" refers to a
plasmid or phage or virus or vector, for expressing a polypeptide
from a DNA (RNA) sequence. The expression vehicle can comprise a
transcriptional unit comprising an assembly of (1) a genetic
element or elements having a regulatory role in gene expression,
for example, promoters or enhancers, (2) a structural or coding
sequence which is transcribed into mRNA and translated into
protein, and (3) appropriate transcription initiation and
termination sequences. Structural units intended for use in yeast
or eukaryotic expression systems preferably include a leader
sequence enabling extracellular secretion of translated protein by
a host cell. Alternatively, where recombinant protein is expressed
without a leader or transport sequence, it may include an
N-terminal methionine residue. This residue may or may not be
subsequently cleaved from the expressed recombinant protein to
provide a final product.
[0108] "Recombinant expression system" means host cells which have
stably integrated a recombinant transcriptional unit into
chromosomal DNA or carry the recombinant transcriptional unit
extrachromosomally. The cells can be prokaryotic or eukaryotic.
Recombinant expression systems as defined herein will express
heterologous polypeptides or proteins upon induction of the
regulatory elements linked to the DNA segment or synthetic gene to
be expressed.
[0109] Mature proteins can be expressed in mammalian cells, yeast,
bacteria, or other cells under the control of appropriate
promoters. Cell-free translation systems can also be employed to
produce such proteins using RNAs derived from the DNA constructs of
the present invention. Appropriate cloning and expression vectors
for use with prokaryotic and eukaryotic hosts are described by
Sambrook, et al., in Molecular Cloning: A Laboratory Manual, Second
Edition, Cold Spring Harbor, N.Y. (1989), the disclosure of which
is hereby incorporated by reference.
[0110] Generally, recombinant expression vectors will include
origins of replication and selectable markers permitting
transformation of the host cell, e.g., the ampicillin resistance
gene of E. coli and S. cerevisiae TRP1 gene, and a promoter derived
from a highly-expressed gene to direct transcription of a
downstream structural sequence. Such promoters can be derived from
operons encoding glycolytic enzymes such as 3-phosphoglycerate
kinase (PGK), α-factor, acid phosphatase, or heat shock proteins,
among others. The heterologous structural sequence is assembled in
appropriate phase with translation initiation and termination
sequences, and preferably, a leader sequence capable of directing
secretion of translated protein into the periplasmic space or
extracellular medium. Optionally, the heterologous sequence can
encode a fusion protein including an N-terminal identification
peptide imparting desired characteristics, e.g., stabilization or
simplified purification of expressed recombinant product.
[0111] Useful expression vectors for bacterial use are constructed
by inserting a structural DNA sequence encoding a desired protein
together with suitable translation initiation and termination
signals in operable reading phase with a functional promoter. The
vector will comprise one or more phenotypic selectable markers and
an origin of replication to ensure maintenance of the vector and
to, if desirable, provide amplification within the host. Suitable
prokaryotic hosts for transformation include E. coli, Bacillus
subtilis, Salmonella typhimurium and various species within the
genera Pseudomonas, Streptomyces, and Staphylococcus, although
others may also be employed as a matter of choice.
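The requirement that the structural sequence carry translation initiation and termination signals in operable reading phase can be illustrated with a minimal sketch. The function name, the use of ATG as the only start codon, and the standard stop codons are assumptions made for illustration and are not taken from the specification. The standard genetic code of a host such as E. coli is assumed; Mycoplasma genitalium itself reads TGA as tryptophan, so a native ORF containing internal TGA codons would be reported as out of phase by this check.

```python
# Illustrative check that a coding sequence is "in phase": it begins with a
# start codon, ends with a stop codon, has a length divisible by three, and
# contains no premature stop codon. Standard genetic code is assumed.
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_in_operable_reading_phase(coding_sequence: str) -> bool:
    seq = coding_sequence.upper()
    # Need at least a start codon and a stop codon, in whole codons.
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    has_start = codons[0] == START_CODON
    has_stop = codons[-1] in STOP_CODONS
    premature_stop = any(codon in STOP_CODONS for codon in codons[1:-1])
    return has_start and has_stop and not premature_stop
```

A sequence such as ATGGCTTAA passes this check, while a sequence whose length is not a multiple of three, or which contains an internal stop codon, does not.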
[0112] As a representative but nonlimiting example, useful
expression vectors for bacterial use can comprise a selectable
marker and bacterial origin of replication derived from
commercially available plasmids comprising genetic elements of the
well known cloning vector pBR322 (ATCC 37017). Such commercial
vectors include, for example, pKK223-3 (Pharmacia Fine Chemicals,
Uppsala, Sweden) and GEM 1 (Promega Biotec, Madison, Wis., USA).
These pBR322 "backbone" sections are combined with an appropriate
promoter and the structural sequence to be expressed.
[0113] Following transformation of a suitable host strain and
growth of the host strain to an appropriate cell density, the
selected promoter is derepressed by appropriate means (e.g.,
temperature shift or chemical induction) and cells are cultured for
an additional period. Cells are typically harvested by
centrifugation, disrupted by physical or chemical means, and the
resulting crude extract retained for further purification.
[0114] Various mammalian cell culture systems can also be employed
to express recombinant protein. Examples of mammalian expression
systems include the COS-7 lines of monkey kidney fibroblasts,
described by Gluzman, Cell 23:175 (1981), and other cell lines
capable of expressing a compatible vector, for example, the C127,
3T3, CHO, HeLa and BHK cell lines. Mammalian expression vectors
will comprise an origin of replication, a suitable promoter and
enhancer, and also any necessary ribosome binding sites,
polyadenylation site, splice donor and acceptor sites,
transcriptional termination sequences, and 5' flanking
nontranscribed sequences. DNA sequences derived from the SV40 viral
genome, for example, SV40 origin, early promoter, enhancer, splice,
and polyadenylation sites may be used to provide the required
nontranscribed genetic elements.
[0115] Recombinant polypeptides and proteins produced in bacterial
culture are usually isolated by initial extraction from cell
pellets, followed by one or more salting-out, aqueous ion exchange
or size exclusion chromatography steps. Protein refolding steps can
be used, as necessary, in completing configuration of the mature
protein. Finally, high performance liquid chromatography (HPLC) can
be employed for final purification steps. Microbial cells employed
in expression of proteins can be disrupted by any convenient
method, including freeze-thaw cycling, sonication, mechanical
disruption, or use of cell lysing agents.
[0116] The present invention further includes isolated
polypeptides, proteins and nucleic acid molecules which are
substantially equivalent to those herein described. As used herein,
substantially equivalent can refer both to nucleic acid and amino
acid sequences, for example a mutant sequence, that varies from a
reference sequence by one or more substitutions, deletions, or
additions, the net effect of which does not result in an adverse
functional dissimilarity between reference and subject sequences.
For purposes of the present invention, sequences having equivalent
biological activity, and equivalent expression characteristics are
considered substantially equivalent. For purposes of determining
equivalence, truncation of the mature sequence should be
disregarded.
[0117] The invention further provides methods of obtaining homologs
from other strains of Mycoplasma genitalium, of the fragments of
the Mycoplasma genitalium genome of the present invention and
homologs of the proteins encoded by the ORFs of the present
invention. As used herein, a sequence or protein of Mycoplasma
genitalium is defined as a homolog of a fragment of the Mycoplasma
genitalium genome or a protein encoded by one of the ORFs of the
present invention, if it shares significant homology to one of the
fragments of the Mycoplasma genitalium genome of the present
invention or a protein encoded by one of the ORFs of the present
invention. Specifically, by using the sequence disclosed herein as
a probe or as primers, and techniques such as PCR cloning and
colony/plaque hybridization, one skilled in the art can obtain
homologs.
[0118] As used herein, two nucleic acid molecules or proteins are
said to "share significant homology" if the two contain regions
which possess greater than 85% sequence (amino acid or nucleic
acid) homology.
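The 85% criterion can be illustrated with a minimal sketch. The specification does not state how homology is computed; the sketch assumes, purely for illustration, that homology is measured as percent identity over two regions that have already been aligned to equal length. The function and constant names are hypothetical.

```python
# Illustrative only: "homology" is approximated here as percent identity
# between two pre-aligned regions of equal length (an assumption).
SIGNIFICANT_HOMOLOGY_PERCENT = 85.0  # threshold stated in the text


def percent_identity(region_a: str, region_b: str) -> float:
    if len(region_a) != len(region_b):
        raise ValueError("regions must be pre-aligned to equal length")
    if not region_a:
        raise ValueError("regions must be non-empty")
    matches = sum(1 for a, b in zip(region_a.upper(), region_b.upper()) if a == b)
    return 100.0 * matches / len(region_a)


def shares_significant_homology(region_a: str, region_b: str) -> bool:
    # The text requires "greater than 85%", so exactly 85% does not qualify.
    return percent_identity(region_a, region_b) > SIGNIFICANT_HOMOLOGY_PERCENT
```

The same comparison applies to amino acid or nucleic acid sequences, since only character identity at each aligned position is counted.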
[0119] Region specific primers or probes derived from the
nucleotide sequence provided in SEQ ID NO: 1 or from a nucleotide
sequence at least 99.9% identical to SEQ ID NO: 1 can be used to
prime DNA synthesis and PCR amplification, as well as to identify
colonies containing cloned DNA encoding a homolog using known
methods (Innis et al., PCR Protocols, Academic Press, San Diego,
Calif. (1990)).
[0120] When using primers derived from SEQ ID NO: 1 or from a
nucleotide sequence at least 99.9% identical to SEQ ID NO: 1, one
skilled in the art will recognize that by employing high stringency
conditions (e.g., annealing at 50-60° C.) only sequences
which are greater than 75% homologous to the primer will be
amplified. By employing lower stringency conditions (e.g.,
annealing at 35-37° C.), sequences which are greater than
40-50% homologous to the primer will also be amplified.
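The relationship between annealing stringency and the sequences expected to amplify can be sketched as a scan of a template for candidate primer binding sites. This is an illustration under stated assumptions, not a model of primer annealing: homology is approximated as ungapped percent identity, the lower bound of the stated 40-50% range is used for low stringency, and the names `MIN_IDENTITY` and `candidate_sites` are hypothetical.

```python
# Illustrative scan for primer binding sites at the homology cutoffs given
# in the text. Identity is ungapped and position by position (an assumption).
MIN_IDENTITY = {
    "high": 75.0,  # annealing at about 50-60 degrees C
    "low": 40.0,   # annealing at about 35-37 degrees C; lower end of 40-50%
}


def candidate_sites(primer: str, template: str, stringency: str = "high"):
    """Return (start, percent_identity) for each window above the cutoff."""
    primer = primer.upper()
    template = template.upper()
    cutoff = MIN_IDENTITY[stringency]
    hits = []
    for start in range(len(template) - len(primer) + 1):
        window = template[start:start + len(primer)]
        matches = sum(1 for p, w in zip(primer, window) if p == w)
        identity = 100.0 * matches / len(primer)
        if identity > cutoff:
            hits.append((start, identity))
    return hits
```

Under the high stringency cutoff only close matches to the primer are reported, whereas the low stringency cutoff also reports more divergent sites, which is how more distantly related homologs could be recovered.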
[0121] When using DNA probes derived from SEQ ID NO: 1 or from a
nucleotide sequence at least 99.9% identical to SEQ ID NO: 1 for
colony/plaque hybridization, one skilled in the art will recognize
that by employing high stringency conditions (e.g., hybridizing at
50-65° C. in 5× SSC and 50% formamide, and washing at
50-65° C. in 0.5× SSC), sequences having regions which
are greater than 90% homologous to the probe can be obtained, and
that by employing lower stringency conditions (e.g., hybridizing at
35-37° C. in 5× SSC and 40-45% formamide, and washing
at 42° C. in SSC), sequences having regions which are
greater than 35-45% homologous to the probe will be obtained.
[0122] Any organism can be used as the source for homologs of the
present invention so long as the organism naturally expresses such
a protein or contains genes encoding the same. The most preferred
organisms for isolating homologs are bacteria which are closely
related to Mycoplasma genitalium.
[0123] Uses for the Compositions of the Invention
[0124] Each ORF provided in Tables 1(a), 1(b) and 1(c) was assigned
to biological role categories adapted from Riley, M., Microbiology
Reviews 57(4):862 (1993). This allows the skilled artisan to
determine a use for each identified coding sequence. Tables 1(a),
1(b) and 1(c) further provide an identification of the type of
polypeptide which is encoded by each ORF. As a result, one
skilled in the art can use the polypeptides of the present
invention for commercial, therapeutic and industrial purposes
consistent with the type of putative identification of the
polypeptide.
[0125] Such identifications permit one skilled in the art to use
the Mycoplasma genitalium ORFs in a manner similar to the known
type of sequences for which the identification is made; for
example, to ferment a particular sugar source or to produce a
particular metabolite. (For a review of enzymes used within the
commercial industry, see Biochemical Engineering and Biotechnology
Handbook 2nd, eds. Macmillan Publ. Ltd., NY (1991) and Biocatalysts
in Organic Syntheses, ed. J. Tramper et al., Elsevier Science
Publishers, Amsterdam, The Netherlands (1985)).
[0126] 1. Biosynthetic Enzymes
[0127] Open reading frames encoding proteins that mediate the
catalytic reactions of intermediary and macromolecular metabolism,
the biosynthesis of small molecules, cellular processes and other
functions can be used for industrial biosynthesis. These proteins
include enzymes involved in the degradation of the intermediary
products of metabolism, enzymes involved in central intermediary
metabolism, enzymes involved in respiration, both aerobic and
anaerobic, enzymes involved in fermentation, enzymes involved in
ATP proton motive force conversion, enzymes involved in broad
regulatory function, enzymes involved in amino acid synthesis,
enzymes involved in nucleotide synthesis, and enzymes involved in
cofactor and vitamin synthesis. The various metabolic pathways present in
Mycoplasma can be identified based on absolute nutritional
requirements as well as by examining the various enzymes identified
in Table 1(a), 1(b) and 1(c).
[0128] Identified within the category of intermediary metabolism, a
number of the proteins encoded by the identified ORFs in Tables
1(a), 1(b) and 1(c) are particularly involved in the degradation of
intermediary metabolites as well as non-macromolecular metabolism.
Some of the enzymes identified include amylases, glucose oxidases,
and catalase.
[0129] Proteolytic enzymes are another class of commercially
important enzymes. Proteolytic enzymes find use in a number of
industrial processes including the processing of flax and other
vegetable fibers, in the extraction, clarification and
depectinization of fruit juices, in the extraction of vegetable
oil and in the maceration of fruits and vegetables to give
unicellular fruits. A detailed review of the proteolytic enzymes
used in the food industry is provided by Rombouts et al., Symbiosis
21:79 (1986) and Voragen et al. in Biocatalyst in Agricultural
Biotechnology, edited J. R. Whitaker et al., American Chemical
Society Symposium Series 389:93 (1989).
[0130] The metabolism of glucose, galactose, fructose and xylose
is an important part of the primary metabolism of Mycoplasma.
Enzymes involved in the degradation of these sugars can be used in
industrial fermentation. Some of the important sugar transforming
enzymes, from a commercial viewpoint, include sugar isomerases such
as glucose isomerase. Other metabolic enzymes have found commercial
use, such as glucose oxidases which produce ketogulonic acid (KGA).
KGA is an intermediate in the commercial production of ascorbic
acid using Reichstein's procedure (see Krueger et al.,
Biotechnology 6(A), Rhine, H. J. et al., eds., Verlag Press,
Weinheim, Germany (1984)).
[0131] Glucose oxidase (GOD) is commercially available and has been
used in purified form as well as in an immobilized form for the
deoxygenation of beer. See Hartmeir et al., Biotechnology Letters
1:21 (1979). The most important application of GOD is the
industrial scale fermentation of gluconic acid. There is a market
for gluconic acids, which are used in the detergent, textile,
leather, photographic, pharmaceutical, food, feed and concrete
industries (see Bigelis in Gene Manipulations and Fungi, Benett, J.
W. et al., eds., Academic Press, New York (1985), p. 357). In
addition to industrial applications, GOD has found applications in
medicine for quantitative determination of glucose in body fluids
and, more recently, in biotechnology for analyzing syrups from
starch and cellulose hydrolysates. See Owusu et al., Biochem. et
Biophysica. Acta.
872:83 (1986).
[0132] The main sweetener used in the world today is sugar which
comes from sugar beets and sugar cane. In the field of industrial
enzymes, the glucose isomerase process shows the largest expansion
in the market today. Initially, soluble enzymes were used and later
immobilized enzymes were developed (Krueger et al., Biotechnology,
The Textbook of Industrial Microbiology, Sinauer Associated
Incorporated, Sunderland, Mass. (1990)). Today, the use of
glucose-produced high fructose syrups is by far the largest
industrial business using immobilized enzymes. A review of the
industrial use of these enzymes is provided by Jorgensen, Starch
40:307 (1988).
[0133] Proteinases, such as alkaline serine proteinases, are used
as detergent additives and thus represent one of the largest
volumes of microbial enzymes used in the industrial sector. Because
of their industrial importance, there is a large body of published
and unpublished information regarding the use of these enzymes in
industrial processes. (See Faultman et al., Acid Proteases
Structure Function and Biology, Tang, J., ed., Plenum Press, New
York (1977) and Godfrey et al., Industrial Enzymes, MacMillan
Publishers, Surrey, UK (1983) and Hepner et al., Report Industrial
Enzymes by 1990, Hel Hepner & Associates, London (1986)).
[0134] Another class of commercially usable proteins of the present
invention are the microbial lipases identified in Tables 1(a), 1(b)
and 1(c) (see Macrae et al., Philosophical Transactions of the
Royal Society of London 310:227 (1985) and Poserke, Journal of the
American Oil Chemist Society 61:1758 (1984)). A major use of lipases
is in the fat and oil industry for the production of neutral
glycerides using lipase catalyzed inter-esterification of readily
available triglycerides. Applications of lipases include the use as
a detergent additive to facilitate the removal of fats from fabrics
in the course of the washing procedures.
[0135] The use of enzymes, and in particular microbial enzymes, as
catalysts for key steps in the synthesis of complex organic
molecules is gaining popularity at a great rate. One area of great
interest is the preparation of chiral intermediates. Preparation of
chiral intermediates is of interest to a wide range of synthetic
chemists particularly those scientists involved with the
preparation of new pharmaceuticals, agrochemicals, fragrances and
flavors. (See Davies et al., Recent Advances in the Generation of
Chiral Intermediates Using Enzymes, CRC Press, Boca Raton, Fla.
(1990)). The following reactions catalyzed by enzymes are of
interest to organic chemists: hydrolysis of carboxylic acid esters,
phosphate esters, amides and nitriles, esterification reactions,
transesterification reactions, synthesis of amides, reduction of
alkanones and oxoalkanoates, oxidation of alcohols to carbonyl
compounds, oxidation of sulfides to sulfoxides, and carbon bond
forming reactions such as the aldol reaction.
[0136] When considering the use of an enzyme encoded by one of the
ORFs of the present invention for biotransformation and organic
synthesis it is sometimes necessary to consider the respective
advantages and disadvantages of using a microorganism as opposed to
an isolated enzyme. The pros and cons of using a whole cell system on
the one hand or an isolated partially purified enzyme on the other
hand have been described in detail by Bud et al., Chemistry in
Britain (1987), p. 127.
[0137] Amino transferases, enzymes involved in the biosynthesis and
metabolism of amino acids, are useful in the catalytic production
of amino acids. The advantage of using microbial based enzyme
systems is that the amino transferase enzymes catalyze the
stereo-selective synthesis of only L-amino acids and generally
possess uniformly high catalytic rates. A description of the use of
amino transferases for amino acid production is provided by
Roselle-David, Methods of Enzymology 136:479 (1987).
[0138] 2. Generation of Antibodies
[0139] As described here, the proteins of the present invention, as
well as homologs thereof, can be used in a variety of procedures and
methods known in the art which are currently applied to other
proteins. The proteins of the present invention can further be used
to generate an antibody which selectively binds the protein. Such
antibodies can be either monoclonal or polyclonal antibodies, as
well as fragments of these antibodies, and humanized forms.
[0140] The invention further provides antibodies which selectively
bind to one of the proteins of the present invention and hybridomas
which produce these antibodies. A hybridoma is an immortalized cell
line which is capable of secreting a specific monoclonal
antibody.
[0141] In general, techniques for preparing polyclonal and
monoclonal antibodies as well as hybridomas capable of producing
the desired antibody are well known in the art (Campbell, A. M.,
Monoclonal Antibody Technology: Laboratory Techniques in
Biochemistry and Molecular Biology, Elsevier Science Publishers,
Amsterdam, The Netherlands (1984); St. Groth et al., J. Immunol.
Methods 35:1-21 (1980); Kohler and Milstein, Nature 256:495-497
(1975)). Such techniques include the trioma technique and the human
B-cell hybridoma technique (Kozbor et al., Immunology Today 4:72
(1983); Cole et al., in Monoclonal Antibodies and Cancer Therapy,
Alan R. Liss, Inc. (1985), pp. 77-96).
[0142] Any animal (mouse, rabbit, etc.) which is known to produce
antibodies can be immunized with the polypeptide.
Methods for immunization are well known in the art. Such methods
include subcutaneous or intraperitoneal injection of the
polypeptide. One skilled in the art will recognize that the amount
of the protein encoded by the ORF of the present invention used for
immunization will vary based on the animal which is immunized, the
antigenicity of the peptide and the site of injection.
[0143] The protein which is used as an immunogen may be modified or
administered in an adjuvant in order to increase the protein's
antigenicity. Methods of increasing the antigenicity of a protein
are well known in the art and include, but are not limited to,
coupling the antigen with a heterologous protein (such as globulin
or β-galactosidase) or through the inclusion of an adjuvant
during immunization.
[0144] For monoclonal antibodies, spleen cells from the immunized
animals are removed, fused with myeloma cells, such as SP2/0-Ag14
myeloma cells, and allowed to become monoclonal antibody producing
hybridoma cells.
[0145] Any one of a number of methods well known in the art can be
used to identify the hybridoma cell which produces an antibody with
the desired characteristics. These include screening the hybridomas
with an ELISA assay, western blot analysis, or radioimmunoassay
(Lutz et al., Exp. Cell Res. 175:109-124 (1988)).
[0146] Hybridomas secreting the desired antibodies are cloned and
the class and subclass are determined using procedures known in the
art (Campbell, A. M., Monoclonal Antibody Technology: Laboratory
Techniques in Biochemistry and Molecular Biology, Elsevier Science
Publishers, Amsterdam, The Netherlands (1984)).
[0147] Techniques described for the production of single chain
antibodies (U.S. Pat. No. 4,946,778) can be adapted to produce
single chain antibodies to proteins of the present invention.
[0148] For polyclonal antibodies, antibody-containing antiserum is
isolated from the immunized animal and is screened for the presence
of antibodies with the desired specificity using one of the
above-described procedures.
[0149] The present invention further provides the above-described
antibodies in detectably labeled form. Antibodies can be detectably
labeled through the use of radioisotopes, affinity labels (such as
biotin, avidin, etc.), enzymatic labels (such as horseradish
peroxidase, alkaline phosphatase, etc.), fluorescent labels (such as
FITC or rhodamine, etc.), paramagnetic atoms, etc. Procedures for
accomplishing such labeling are well known in the art (see, for
example, Sternberger, L. A. et al., J. Histochem. Cytochem. 18:315
(1970); Bayer, E. A. et al., Meth. Enzym. 62:308 (1979); Engvall, E.
et al., Immunol. 109:129 (1972); Goding, J. W. J. Immunol. Meth.
13:215 (1976)).
[0150] The labeled antibodies of the present invention can be used
for in vitro, in vivo, and in situ assays to identify cells or
tissues in which a fragment of the Mycoplasma genitalium genome is
expressed.
[0151] The present invention further provides the above-described
antibodies immobilized on a solid support. Examples of such solid
supports include plastics such as polycarbonate, complex
carbohydrates such as agarose and sepharose, acrylic resins
such as polyacrylamide, and latex beads. Techniques for coupling
antibodies to such solid supports are well known in the art (Weir,
D. M. et al., "Handbook of Experimental Immunology" 4th Ed.,
Blackwell Scientific Publications, Oxford, England, Chapter 10
(1986); Jacoby, W. D. et al., Meth. Enzym. 34 Academic Press, N.Y.
(1974)). The immobilized antibodies of the present invention can be
used for in vitro, in vivo, and in situ assays as well as for
immunoaffinity purification of the proteins of the present
invention.
[0152] 3. Diagnostic Assays and Kits
[0153] The present invention further provides methods to identify
the expression of one of the ORFs of the present invention, or
homolog thereof, in a test sample, using one of the DFs or
antibodies of the present invention.
[0154] In detail, such methods comprise incubating a test sample
with one or more of the antibodies or one or more of the DFs of the
present invention and assaying for binding of the DFs or antibodies
to components within the test sample.
[0155] Conditions for incubating a DF or antibody with a test
sample vary. Incubation conditions depend on the format employed in
the assay, the detection methods employed, and the type and nature
of the DF or antibody used in the assay. One skilled in the art
will recognize that any one of the commonly available
hybridization, amplification or immunological assay formats can
readily be adapted to employ the DFs or antibodies of the present
invention. Examples of such assays can be found in Chard, T., An
Introduction to Radioimmunoassay and Related Techniques, Elsevier
Science Publishers, Amsterdam, The Netherlands (1986); Bullock, G.
R. et al., Techniques in Immunocytochemistry, Academic Press,
Orlando, Fla. Vol. 1 (1982), Vol. 2 (1983), Vol. 3 (1985); Tijssen,
P., Practice and Theory of Enzyme Immunoassays: Laboratory
Techniques in Biochemistry and Molecular Biology, Elsevier Science
Publishers, Amsterdam, The Netherlands (1985).
[0156] The test samples of the present invention include cells,
protein or membrane extracts of cells, or biological fluids such as
sputum, blood, serum, plasma, or urine. The test sample used in the
above-described method will vary based on the assay format, nature
of the detection method and the tissues, cells or extracts used as
the sample to be assayed. Methods for preparing protein extracts or
membrane extracts of cells are well known in the art and can
readily be adapted in order to obtain a sample which is compatible
with the system utilized.
[0157] In another embodiment of the present invention, kits are
provided which contain the necessary reagents to carry out the
assays of the present invention.
[0158] Specifically, the invention provides a compartmentalized kit
to receive, in close confinement, one or more containers which
comprises: (a) a first container comprising one of the DFs or
antibodies of the present invention; and (b) one or more other
containers comprising one or more of the following: wash reagents,
reagents capable of detecting presence of a bound DF or
antibody.
[0159] In detail, a compartmentalized kit includes any kit in which
reagents are contained in separate containers. Such containers
include small glass containers, plastic containers or strips of
plastic or paper. Such containers allow one to efficiently
transfer reagents from one compartment to another compartment such
that the samples and reagents are not cross-contaminated, and the
agents or solutions of each container can be added in a
quantitative fashion from one compartment to another. Such
containers will include a container which will accept the test
sample, a container which contains the antibodies used in the
assay, containers which contain wash reagents (such as phosphate
buffered saline, Tris-buffers, etc.), and containers which contain
the reagents used to detect the bound antibody or DF.
[0160] Types of detection reagents include labeled nucleic acid
probes, labeled secondary antibodies, or in the alternative, if the
primary antibody is labeled, the enzymatic, or antibody binding
reagents which are capable of reacting with the labeled antibody.
One skilled in the art will readily recognize that the disclosed
DFs and antibodies of the present invention can be readily
incorporated into one of the established kit formats which are well
known in the art.
[0161] 4. Screening Assay for Binding Agents
[0162] Using the isolated proteins of the present invention, the
present invention further provides methods of obtaining and
identifying agents which bind to a protein encoded by one of the
ORFs of the present invention or to one of the fragments of the
Mycoplasma genome herein described.
[0163] In detail, said method comprises the steps of:
[0164] (a) contacting an agent with an isolated protein encoded by
one of the ORFs of the present invention, or an isolated fragment
of the Mycoplasma genome; and
[0165] (b) determining whether the agent binds to said protein or
said fragment.
[0166] The agents screened in the above assay can be, but are not
limited to, peptides, carbohydrates, vitamin derivatives, or other
pharmaceutical agents. The agents can be selected and screened at
random or rationally selected or designed using protein modeling
techniques.
[0167] For random screening, agents such as peptides,
carbohydrates, pharmaceutical agents and the like are selected at
random and are assayed for their ability to bind to the protein
encoded by the ORF of the present invention.
[0168] Alternatively, agents may be rationally selected or
designed. As used herein, an agent is said to be "rationally
selected or designed" when the agent is chosen based on the
configuration of the particular protein. For example, one skilled
in the art can readily adapt currently available procedures to
generate peptides, pharmaceutical agents and the like capable of
binding to a specific peptide sequence in order to generate
rationally designed antipeptide peptides (for example, see Hurby et
al., "Application of Synthetic Peptides: Antisense Peptides," In
Synthetic Peptides, A User's Guide, W. H. Freeman, NY (1992), pp.
289-307, and Kaspezak et al., Biochemistry 28:9230-8 (1989)), or
pharmaceutical agents, or the like.
[0169] In addition to the foregoing, one class of agents of the
present invention, as broadly described, can be used to control
gene expression through binding to one of the ORFs or EMFs of the
present invention. As described above, such agents can be randomly
screened or rationally designed/selected. Targeting the ORF or EMF
allows a skilled artisan to design sequence specific or element
specific agents, modulating the expression of either a single ORF
or multiple ORFs which rely on the same EMF for expression
control.
[0170] One class of DNA binding agents are agents which contain
base residues which hybridize or form a triple helix by
binding to DNA or RNA. Such agents can be based on the classic
phosphodiester, ribonucleic acid backbone, or can be a variety of
sulfhydryl or polymeric derivatives which have base attachment
capacity.
[0171] Agents suitable for use in these methods usually contain 20
to 40 bases and are designed to be complementary to a region of the
gene involved in transcription (triple helix--see Lee et al., Nucl.
Acids Res. 6:3073 (1979); Cooney et al., Science 241:456 (1988);
and Dervan et al., Science 251:1360 (1991)) or to the mRNA itself
(antisense--Okano, J. Neurochem. 56:560 (1991);
Oligodeoxynucleotides as Antisense Inhibitors of Gene Expression,
CRC Press, Boca Raton, Fla. (1988)). Triple helix-formation
optimally results in a shut-off of RNA transcription from DNA,
while antisense RNA hybridization blocks translation of an mRNA
molecule into polypeptide. Both techniques have been demonstrated
to be effective in model systems. Information contained in the
sequences of the present invention is necessary for the design of
an antisense or triple helix oligonucleotide and other DNA binding
agents.
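The dependence of antisense design on sequence information can be illustrated with a minimal sketch that derives an oligonucleotide complementary to a chosen region of an mRNA. The function name and the choice of an RNA oligonucleotide are assumptions for illustration; only the 20 to 40 base length range comes from the text, and no selection of an effective target region is attempted.

```python
# Illustrative derivation of an antisense RNA oligonucleotide: the reverse
# complement of a 20 to 40 base target region of an mRNA sequence.
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def antisense_oligo(mrna: str, start: int, length: int = 20) -> str:
    if not 20 <= length <= 40:
        raise ValueError("length must be between 20 and 40 bases")
    if start < 0:
        raise ValueError("start must be non-negative")
    target = mrna.upper()[start:start + length]
    if len(target) != length:
        raise ValueError("target region runs past the end of the mRNA")
    if set(target) - set("ACGU"):
        raise ValueError("mRNA must contain only A, C, G and U")
    # Complement each base, then reverse so the result reads 5' to 3'.
    return target.translate(_COMPLEMENT)[::-1]
```

The returned oligonucleotide pairs base for base with the target region when the two strands are aligned antiparallel.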
[0172] Agents which bind to a protein encoded by one of the ORFs of
the present invention can be used as diagnostic agents or in the
control of bacterial infection by modulating the activity of the
protein encoded by the ORF. Agents which bind to a protein encoded
by one of the ORFs of the present invention can be formulated using
known techniques to generate a pharmaceutical composition for use
in controlling Mycoplasma growth and infection.
[0173] 5. Vaccine and Pharmaceutical Composition
[0174] The present invention further provides pharmaceutical agents
which can be used to modulate the growth of Mycoplasma genitalium,
or another related organism, in vivo or in vitro. As used herein, a
"pharmaceutical agent" is defined as a composition of matter which
can be formulated using known techniques to provide a
pharmaceutical composition. As used herein, the "pharmaceutical
agents of the present invention" refers to the pharmaceutical agents
which are derived from the proteins encoded by the ORFs of the
present invention or are agents which are identified using the
herein described assays.
[0175] As used herein, a pharmaceutical agent is said to "modulate
the growth of Mycoplasma sp., or a related organism, in vivo or in
vitro," when the agent reduces the rate of growth, rate of
division, or viability of the organism in question. The
pharmaceutical agents of the present invention can modulate the
growth of an organism in many fashions, although an understanding
of the underlying mechanism of action is not needed to practice the
use of the pharmaceutical agents of the present invention. Some
agents will modulate the growth by binding to an important protein
thus blocking the biological activity of the protein, while other
agents may bind to a component of the outer surface of the organism
blocking attachment or rendering the organism more prone to attack by
the body's natural immune system. Alternatively, the agent may
comprise a protein encoded by one of the ORFs of the present
invention and serve as a vaccine. The development and use of a
vaccine based on outer membrane components, such as the LPS, are
well known in the art.
[0176] As used herein, a "related organism" is a broad term which
refers to any organism whose growth can be modulated by one of the
pharmaceutical agents of the present invention. In general, such an
organism will contain a homolog of the protein which is the target
of the pharmaceutical agent or the protein used as a vaccine. As
such, related organisms do not need to be bacterial but may be
fungal or viral pathogens.
[0177] The pharmaceutical agents and compositions of the present
invention may be administered in a convenient manner such as by the
oral, topical, intravenous, intraperitoneal, intramuscular,
subcutaneous, intranasal or intradermal routes. The pharmaceutical
compositions are administered in an amount which is effective for
treating and/or prophylaxis of the specific indication. In general,
they are administered in an amount of at least about 10 .mu.g/kg
body weight and in most cases they will be administered in an
amount not in excess of about 8 mg/Kg body weight per day. In most
cases, the dosage is from about 10 .mu.g/kg to about 1 mg/kg body
weight daily, taking into account the routes of administration,
symptoms, etc.
[0178] The agents of the present invention can be used in native
form or can be modified to form a chemical derivative. As used
herein, a molecule is said to be a "chemical derivative" of another
molecule when it contains additional chemical moieties not normally
a part of the molecule. Such moieties may improve the molecule's
solubility, absorption, biological half life, etc. The moieties may
alternatively decrease the toxicity of the molecule, eliminate or
attenuate any undesirable side effect of the molecule, etc.
Moieties capable of mediating such effects are disclosed in
Remington's Pharmaceutical Sciences (1980).
[0179] For example, a change in the immunological character of the
functional derivative, such as affinity for a given antibody, is
measured by a competitive type immunoassay. Changes in
immunomodulation activity are measured by the appropriate assay.
Modifications of such protein properties as redox or thermal
stability, biological half-life, hydrophobicity, susceptibility to
proteolytic degradation or the tendency to aggregate with carriers
or into multimers are assayed by methods well known to the
ordinarily skilled artisan.
[0180] The therapeutic effects of the agents of the present
invention may be obtained by providing the agent to a patient by
any suitable means (e.g., inhalation, intravenously,
intramuscularly, subcutaneously, enterally, or parenterally). It is
preferred to administer the agent of the present invention so as to
achieve an effective concentration within the blood or tissue in
which the growth of the organism is to be controlled.
[0181] To achieve an effective blood concentration, the preferred
method is to administer the agent by injection. The administration
may be by continuous infusion, or by single or multiple
injections.
[0182] In providing a patient with one of the agents of the present
invention, the dosage of the administered agent will vary depending
upon such factors as the patient's age, weight, height, sex,
general medical condition, previous medical history, etc. In
general, it is desirable to provide the recipient with a dosage of
agent which is in the range of from about 1 pg/kg to 10 mg/kg (body
weight of patient), although a lower or higher dosage may be
administered. The therapeutically effective dose can be lowered by
using combinations of the agents of the present invention or
another agent.
[0183] As used herein, two or more compounds or agents are said to
be administered "in combination" with each other when either (1)
the physiological effects of each compound, or (2) the serum
concentrations of each compound can be measured at the same time.
The composition of the present invention can be administered
concurrently with, prior to, or following the administration of the
other agent.
[0184] The agents of the present invention are intended to be
provided to recipient subjects in an amount sufficient to decrease
the rate of growth (as defined above) of the target organism.
[0185] The administration of the agent(s) of the invention may be
for either a "prophylactic" or "therapeutic" purpose. When provided
prophylactically, the agent(s) are provided in advance of any
symptoms indicative of the organism's growth. The prophylactic
administration of the agent(s) serves to prevent, attenuate, or
decrease the rate of onset of any subsequent infection. When
provided therapeutically, the agent(s) are provided at (or shortly
after) the onset of an indication of infection. The therapeutic
administration of the compound(s) serves to attenuate the
pathological symptoms of the infection and to increase the rate of
recovery.
[0186] The agents of the present invention are administered to the
mammal in a pharmaceutically acceptable form and in a
therapeutically effective concentration. A composition is said to
be "pharmacologically acceptable" if its administration can be
tolerated by a recipient patient. Such an agent is said to be
administered in a "therapeutically effective amount" if the amount
administered is physiologically significant. An agent is
physiologically significant if its presence results in a detectable
change in the physiology of a recipient patient.
[0187] The agents of the present invention can be formulated
according to known methods to prepare pharmaceutically useful
compositions, whereby these materials, or their functional
derivatives, are combined in admixture with a pharmaceutically
acceptable carrier vehicle. Suitable vehicles and their
formulation, inclusive of other human proteins, e.g., human serum
albumin, are described, for example, in Remington's Pharmaceutical
Sciences (16th ed., Osol, A., Ed., Mack, Easton Pa. (1980)). In
order to form a pharmaceutically acceptable composition suitable
for effective administration, such compositions will contain an
effective amount of one or more of the agents of the present
invention, together with a suitable amount of carrier vehicle.
[0188] Additional pharmaceutical methods may be employed to control
the duration of action. Controlled release preparations may be
achieved through the use of polymers to complex or absorb one or
more of the agents of the present invention. The controlled
delivery may be exercised by selecting appropriate macromolecules
(for example polyesters, polyamino acids, polyvinylpyrrolidone,
ethylenevinylacetate, methylcellulose, carboxymethylcellulose, or
protamine sulfate) and the concentration of macromolecules as well
as the methods of incorporation in order to control release.
Another possible method to control the duration of action by
controlled release preparations is to incorporate agents of the
present invention into particles of a polymeric material such as
polyesters, polyamino acids, hydrogels, poly(lactic acid) or
ethylene vinylacetate copolymers. Alternatively, instead of
incorporating these agents into polymeric particles, it is possible
to entrap these materials in microcapsules prepared, for example,
by coacervation techniques or by interfacial polymerization, for
example, hydroxymethylcellulose or gelatine-microcapsules and
poly(methylmethacrylate) microcapsules, respectively, or in
colloidal drug delivery systems, for example, liposomes, albumin
microspheres, microemulsions, nanoparticles, and nanocapsules or in
macroemulsions. Such techniques are disclosed in Remington's
Pharmaceutical Sciences (1980).
[0189] The invention further provides a pharmaceutical pack or kit
comprising one or more containers filled with one or more of the
ingredients of the pharmaceutical compositions of the invention.
Associated with such container(s) can be a notice in the form
prescribed by a governmental agency regulating the manufacture, use
or sale of pharmaceuticals or biological products, which notice
reflects approval by the agency of manufacture, use or sale for
human administration. In addition, the agents of the present
invention may be employed in conjunction with other therapeutic
compounds.
EXPERIMENTAL
EXAMPLE 1
[0190] Overview of Experimental Design and Methods
[0191] 1. Shotgun Sequencing Strategy
[0192] The overall strategy for a shotgun approach to whole genome
sequencing is outlined in Table 3. The theory of shotgun sequencing
follows from the application of the equation for the Poisson
distribution p.sub.x=m.sup.xe.sup.-m/x!, where x is the number of
occurrences of an event and m is the mean number of occurrences. To
determine the probability that any given base is not sequenced
after a certain amount of random sequence has been generated, if L
is the genome length, n is the number of clone insert ends
sequenced, and w is the sequencing read length, then m=nw/L, and
the probability that no clone originates at any of the w bases
preceding a given base, i.e., the probability that the base is not
sequenced, is p.sub.0=e.sup.-m. Using the fold coverage as the unit
form, one sees that after 580 kb of sequence has been randomly
generated, m=1, representing 1.times. coverage. In this case,
p.sub.0=e.sup.-1=0.37, thus approximately 37% is unsequenced. A
5.times. coverage (approximately 3150 clones sequenced from both
insert ends) yields p.sub.0=e.sup.-5=0.0067, or 0.67% unsequenced.
The total gap length is Le.sup.-m and the average gap size is L/n.
5.times. coverage would leave about 48 gaps averaging about 80 bp in
size. The treatment is essentially that of Lander and Waterman.
Table 4 illustrates a computer simulation of a random sequencing
experiment for coverage of a 580 kb genome with an average fragment
size of 400 bp.
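The estimates in this paragraph can be reproduced with a short calculation. The following is a minimal sketch only, assuming the 580 kb genome and 400 bp fragment size stated above; the function name `shotgun_stats` and the dictionary keys are illustrative, and the expected number of gaps is taken as the total gap length divided by the average gap size, i.e., n times e.sup.-m.

```python
import math


def shotgun_stats(genome_len, n_reads, read_len):
    """Poisson-model estimates for a random shotgun project (illustrative names)."""
    m = n_reads * read_len / genome_len       # fold coverage, m = nw/L
    p0 = math.exp(-m)                         # probability a given base is unsequenced
    return {
        "coverage": m,
        "fraction_unsequenced": p0,
        "total_gap_bp": genome_len * p0,      # L * e^-m
        "mean_gap_bp": genome_len / n_reads,  # L / n
        "expected_gaps": n_reads * p0,        # total gap length / mean gap size
    }


if __name__ == "__main__":
    L, w = 580_000, 400  # genome length and fragment size taken from the text
    for fold in (1, 5):
        n = fold * L // w  # number of reads giving this fold coverage
        stats = shotgun_stats(L, n, w)
        print(fold, {key: round(value, 4) for key, value in stats.items()})
```

With these inputs, 5.times. coverage corresponds to 7250 reads of 400 bp, and the sketch returns roughly 49 gaps with a mean size of 80 bp, in line with the figures quoted above.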
[0193] 2. Random Library Construction
[0194] In order to approximate the random model described above
during actual sequencing, a nearly ideal library of cloned genomic
fragments is required. M. genitalium genomic chromosomal DNA was
mechanically sheared, digested with BAL31 nuclease to produce
blunt-ends, and size-fractionated by agarose gel electrophoresis.
Fragments in the 2.0 kb size range were excised and recovered.
These fragments were ligated to SmaI-cut, phosphatased pUC18 vector
and the ligated products were fractionated on an agarose gel. The
linear vector plus insert band was excised and recovered. The ends
of the linear recombinant molecules were repaired with T4
polymerase treatment and the molecules were then ligated into
circles. This two-stage procedure resulted in a molecularly random
collection of single-insert plasmid recombinants with minimal
contamination from double-insert chimeras (<1%) or free vector
(<1%). Deviation from randomness is most likely to occur during
cloning. E. coli host cells deficient in all recombination and
restriction functions were used to prevent rearrangements,
deletions, and loss of clones by restriction. Transformed cells
were plated directly on antibiotic diffusion plates to avoid the
usual broth recovery phase which allows multiplication and
selection of the most rapidly growing cells. All colonies were
picked for template preparation regardless of size. Only clones
lost due to "poison" DNA or deleterious gene products would be
deleted from the library, resulting in a slight increase in gap
number over that expected.
[0195] In order to evaluate the quality of the M. genitalium random
insert library, sequence data was obtained from approximately 2000
templates using the M13F primer. The random sequence fragments were
assembled using The Institute for Genomic Research (TIGR)
autoassembler software after obtaining 500, 1000, 1500, and 2000
sequence fragments, and the number of unique assembled base pairs
was determined. The progression of assembly was plotted using the
actual data obtained from the assembly of up to 2000 sequence
fragments and compared to the data that is provided in the ideal plot.
There was essentially no deviation of the actual assembly data from
the ideal plot, indicating that we had constructed close to an
ideal random library with minimal contamination from double insert
chimeras and free vector.
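The comparison between actual assembly progress and an ideal plot can be illustrated with a small simulation. The following is a sketch under stated assumptions, not the procedure used with the TIGR autoassembler: reads are placed uniformly at random on a circular genome of 580 kb, every read is taken to be 450 bp long, and the ideal curve is taken as L(1-e.sup.-nw/L). The function names are hypothetical.

```python
import math
import random


def simulate_unique_bases(genome_len, read_len, n_reads, seed=0):
    """Count bases covered by n_reads placed uniformly at random (circular genome assumed)."""
    rng = random.Random(seed)
    covered = bytearray(genome_len)
    for _ in range(n_reads):
        start = rng.randrange(genome_len)
        end = start + read_len
        if end <= genome_len:
            covered[start:end] = b"\x01" * read_len
        else:
            # The read wraps around the origin of the circular sequence.
            covered[start:] = b"\x01" * (genome_len - start)
            covered[:end - genome_len] = b"\x01" * (end - genome_len)
    return sum(covered)


def ideal_unique_bases(genome_len, read_len, n_reads):
    """Expected number of covered bases under the Poisson model."""
    return genome_len * (1.0 - math.exp(-n_reads * read_len / genome_len))


if __name__ == "__main__":
    L, w = 580_000, 450  # assumed genome length and uniform read length
    for n in (500, 1000, 1500, 2000):
        observed = simulate_unique_bases(L, w, n, seed=n)
        expected = ideal_unique_bases(L, w, n)
        print(n, observed, round(expected))
```

A library that tracks the ideal curve closely at each checkpoint behaves as a random library would; a large shortfall in unique bases would suggest cloning bias.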
[0196] 3. Random DNA Sequencing
[0197] Five-thousand seven hundred and sixty (5,760) plasmid
templates were prepared using a "boiler bead" preparation method
developed in collaboration with AGTC (Gaithersburg, Md.), as
suggested by the manufacturer. The AGTC method is performed in a
96-well format for all stages of DNA preparation from bacterial
growth through final DNA purification. Template concentration was
determined using Hoechst Dye and a Millipore Cytofluor. DNA
concentrations were not adjusted and low-yielding templates were
identified and not sequenced where possible. Sequencing reactions
were carried out on plasmid templates using the AB Catalyst Lab
station or Perkin-Elmer 9600 Thermocyclers with Applied Biosystems
PRISM Ready Reaction Dye Primer Cycle Sequencing Kits for the M13
forward (-21M13) and the M13 reverse (RP1) primers. Dye terminator
sequencing reactions were carried out on the lambda templates on a
Perkin-Elmer 9600 Thermocycler using the Applied Biosystems Ready
Reaction Dye Terminator Cycle Sequencing kits. Nine-thousand eight
hundred and forty-six (9,846) sequencing reactions were performed
during the random phase of the project by 4 individuals using an
average of 10 AB373 DNA Sequencers over a 2 month period. All
sequencing reactions were analyzed using the Stretch modification
of the AB373, primarily using a 36 cm well-to-read distance. The
overall sequencing success rate for M13-21 sequences was 88% and
84% for M13RP1 sequences. The average usable read length for M13-21
sequences was 485 and 441 for M13RP1 sequences.
[0198] The art has described the value of using sequence from both
ends of sequencing templates to facilitate ordering of contigs in
shotgun assembly projects. A skilled artisan must balance the
desirability of both-end sequencing (including the reduced cost of
lower total number of templates) against shorter read-lengths and
lower success rates for sequencing reactions performed with the
M13RP1 (reverse) primer compared to the M13-21 (forward) primer.
For this project, essentially all of the templates were sequenced
from both ends.
[0199] 4. Protocol for Automated Cycle Sequencing
[0200] The sequencing consisted of using five (5) ABI Catalyst
robots and ten (10) ABI 373 Automated DNA Sequencers. The Catalyst
robot is a publicly available sophisticated pipetting and
temperature control robot which has been developed specifically for
DNA sequencing reactions. The Catalyst combines pre-aliquoted
templates and reaction mixes consisting of deoxy- and
dideoxynucleotides, the Taq thermostable DNA polymerase,
fluorescently-labeled sequencing primers, and reaction buffer.
Reaction mixes and templates were combined in the wells of an
aluminum 96-well thermocycling plate. Thirty consecutive cycles of
linear amplification (e.g., one primer synthesis) steps were
performed including denaturation, annealing of primer and template,
and extension of DNA synthesis. A heated lid with rubber gaskets on
the thermocycling plate prevented evaporation without the need for
an oil overlay.
[0201] Two sequencing protocols were used: dye-labeled primers and
dye-labeled dideoxy chain terminators. The shotgun sequencing
involves use of four dye-labeled sequencing primers, one for each
of the four terminator nucleotides. Each dye-primer is labeled with
a different fluorescent dye, permitting the four individual
reactions to be combined into one lane of the 373 DNA Sequencer for
electrophoresis, detection, and base-calling. ABI currently
supplies pre-mixed reaction mixes in bulk packages containing all
the necessary non-template reagents for sequencing. Sequencing can
be done with both plasmid and PCR-generated templates with both
dye-primers and dye-terminators with approximately equal fidelity,
although plasmid templates generally give longer usable
sequences.
[0202] Thirty-two reactions were loaded per 373 Sequencer each day,
for a total of 960 samples. Electrophoresis was run overnight
following the manufacturer's protocols, and the data was collected
for twelve hours. Following electrophoresis and fluorescence
detection, the ABI 373 performs automatic lane tracking and
base-calling. The lane-tracking was confirmed visually. Each
sequence electropherogram (or fluorescence lane trace) was
inspected visually and assessed for quality. Trailing sequences of
low quality were removed and the sequence itself was loaded via
software into a Sybase database (archived daily to an 8 mm tape). Leading
vector polylinker sequence was removed automatically by a software
program. The average edited lengths of sequences from the ABI 373
Sequencers converted to Stretch Liners were approximately 460
bp.
[0203] Informatics
[0204] 1. Data Management
[0205] A number of laboratory information management systems (LIMS) for a
large-scale sequencing lab have been developed. A system was used
which allowed an automated data flow wherever possible to reduce
user error. The system used to collect and assemble the sequence
information obtained is centered upon a relational data management
system built using the Sybase RDBMS. The database is designed to
store and correlate all information collected during the entire
operation from template preparation to final analysis of the
genome. Because the raw output of the AB 373 Sequencers is based on
a Macintosh platform and the data management system chosen is based
on a Unix platform, it was necessary to design and implement a
variety of multi-user, client server applications which allow the
raw data as well as analysis results to flow seamlessly into the
database with a minimum of user effort.
[0206] 2. Assembly
[0207] The sequence data from 8,472 sequence fragments was used to
assemble the M. genitalium genome. The assembly was performed by
using a new assembly engine (TIGR Assembler--previously designated
ASMG) developed at TIGR. The TIGR Assembler simultaneously clusters
and assembles fragments of the genome. In order to obtain the
necessary speed, the TIGR Assembler builds a hash table of 10 bp
oligonucleotide subsequences to generate a list of potential
sequence fragment overlaps. The number of potential overlaps for each
fragment determines which fragments are likely to fall into
repetitive elements. Beginning with a single seed sequence
fragment, the TIGR Assembler extends the current contig by
attempting to add the best matching fragment based on
oligonucleotide content. The current contig and candidate fragment
are aligned using a modified version of the Smith-Waterman
algorithm which provides for optimal gap alignments. The current
contig is extended by the fragment only if strict criteria for the
quality of the match are met. The match criteria include the
minimum length of overlap, the maximum length of an unmatched end,
and the minimum percentage match. These criteria are automatically
lowered by the TIGR Assembler in regions of minimal coverage and
raised in regions with a good chance of containing repetitive
elements. Potentially chimeric fragments and fragments representing
the boundaries of repetitive elements are often rejected based on
partial mismatches at the ends of alignments and excluded from the
current contig. The TIGR Assembler is designed to take advantage of
clone size information coupled with sequencing from both ends of
each template. The TIGR Assembler enforces the constraint that
sequence fragments from two ends of the same template point toward
one another in the contig and are located within a certain range of
base pairs (definable for each clone based on the known clone size
range for a given library). Assembly of the 8,472 sequence
fragments of M. genitalium required 10 hours of CPU time on a
SPARCenter 2000. All contigs were loaded into a Sybase structure
representing the location of each fragment in the contig and
extensive information about the consensus sequence itself. The
result of this process was approximately 40 contigs ordered into 2
groups (See below). Because of the high stringency of the TIGR
Assembler process it was found to be useful to perform a FASTA
(GRASTA) alignment of all contigs built by the TIGR Assembler
process against each other. In this way additional overlaps were
detected which enabled compression of the data set into 26 contigs
in 2 groups.
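The use of a hash table of 10 bp oligonucleotides to shortlist potential fragment overlaps can be sketched as follows. This is an illustration of the general technique only and is not the TIGR Assembler code: the function names and example fragments are hypothetical, only the forward strand is indexed, and the subsequent alignment and match-criteria steps are omitted.

```python
from collections import defaultdict
from itertools import combinations


def build_kmer_index(fragments, k=10):
    """Map each k-mer to the set of fragment ids that contain it."""
    index = defaultdict(set)
    for frag_id, seq in fragments.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(frag_id)
    return index


def candidate_overlaps(fragments, k=10):
    """Count distinct shared k-mers for every pair of fragments sharing at least one."""
    shared = defaultdict(int)
    for frag_ids in build_kmer_index(fragments, k).values():
        for a, b in combinations(sorted(frag_ids), 2):
            shared[(a, b)] += 1
    return dict(shared)


if __name__ == "__main__":
    # Hypothetical fragments: f1 and f2 overlap by 13 bases, f3 is unrelated.
    frags = {
        "f1": "ACGTACGTTAGCCGATAGGCT",
        "f2": "TAGCCGATAGGCTTTACGGA",
        "f3": "GGGGGGGGGGCCCCCCCCCC",
    }
    print(candidate_overlaps(frags))
```

In a sketch of this kind, a fragment that shares oligonucleotides with an unusually large number of other fragments would be a candidate for lying in a repetitive element, and each shortlisted pair would still need to pass an alignment step before being merged.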
[0208] Achieving Closure
[0209] The complete genome sequence was obtained by sequencing
across the gaps between contigs. While gap filling has occupied a
major portion of the time and expense of other genome sequencing
projects, it was minimal in the present invention. This was
primarily due to 1) saturation of the genome as a result of the
number of random clones and sequencing reactions performed, 2) the
longer read lengths obtained from the Stretch Liners, 3) the
anchored ends which were obtained for joining contigs, and 4) the
overall capacity and efficiency of the high throughput sequencing
facility.
[0210] Gaps occurred on a predicted random basis, as shown in Table
4, which illustrates simulated random sequencing. These gaps
generally were less than 200 bp in size. All of the gaps were
closed by sequencing further on the templates bordering the gaps.
In these cases, oligo primers for extension of the sequence from
both ends of the gap were generated using techniques known in the
art. This gave double-stranded coverage across the gap areas.
[0211] The high redundancy of sequence information that was
obtained from the shotgun approach gave a highly accurate sequence.
Our sequence accuracy was confirmed by comparing the sequence
information obtained against known M. genitalium genes present in
the GenBank database. The accuracy of our chromosome structure was
confirmed by comparison of restriction digests to the known
restriction map of M. genitalium. The EcoRI restriction map of M.
genitalium is shown in FIG. 1 and expressed in tabular form in
Table 5.
[0212] Identifying Genes
[0213] M. genitalium ORFs were initially defined by evaluating
their coding potential with the program GeneWorks using composition
matrices specific to Mycoplasma genomic DNA. The ORF sequences
(plus 300 bp of flanking sequence) were used in searches against a
database of non-redundant bacterial proteins (NRBP). Redundancy was
removed from NRBP at two stages. (1) All DNA coding sequences were
extracted from GenBank (release 85), and sequences from the same
species were searched against each other. Sequences having >97%
similarity over regions >100 nucleotides were combined. (2) The
sequences were translated and used in protein comparisons with all
sequences in Swiss-Prot (release 30). Sequences belonging to the
same species and having >98% similarity over 33 amino acids were
combined. NRBP is composed of 21445 sequences from 23751 GenBank
sequences and 11183 Swiss-Prot sequences from 1099 different
species.
[0214] Searches were performed using an algorithm that (1)
translates the query DNA sequence in all six reading frames for
searching against a protein database, (2) identifies the protein
sequences that match the query, and (3) aligns the protein-protein
matches using a modified Smith-Waterman algorithm. In cases where
insertion or deletions in the DNA sequence produced a frame shift
error, the alignment algorithm started with protein regions of
maximum similarity and extended the alignment to the same database
match using the 300 bp flanking region. Regions known to contain
frame shift errors were saved to the database and evaluated for
possible correction. The role categories were adopted from those
previously defined by Riley et al. for E. coli gene products. Role
assignments were made to M. genitalium ORFs at the protein sequence
level by linking the protein sequence of the ORFs with the
Swiss-Prot sequences in the Riley database.
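The first step of the search procedure, translation of a query DNA sequence in all six reading frames, can be sketched as shown below. This is an illustrative sketch rather than the search program actually used. The codon table is the standard genetic code, with an optional variant in which TGA is read as tryptophan; that variant is an assumption added here because mycoplasmas are known to use TGA as a tryptophan codon, and it is not stated in this section. All names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

STANDARD_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
# Assumed Mycoplasma variant: TGA encodes tryptophan instead of stop.
MYCOPLASMA_TABLE = dict(STANDARD_TABLE, TGA="W")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq, table):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(table.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame(seq, table=MYCOPLASMA_TABLE):
    """Translate a DNA sequence in three forward and three reverse frames."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames["+%d" % (offset + 1)] = translate(seq[offset:], table)
        frames["-%d" % (offset + 1)] = translate(rc[offset:], table)
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame("ATGTGATAA").items():
        print(frame, protein)
```

Each of the six translations would then be compared against the protein database, with the best matches passed to the alignment step.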
[0215] Detailed Description of Sequencing the Mycoplasma genitalium
Genome, Genome Analysis and Comparative Genomics
[0216] We have determined the complete nucleotide sequence (580,071
bp) of the Mycoplasma genitalium genome using the approach of whole
chromosome shotgun sequencing and assembly, which has successfully
been applied to the analysis of the Haemophilus influenzae genome
(R. Fleischmann et al., Science 269:496 (1995)). These data,
together with the description of the complete genome sequence (1.83
Mb) of the eubacterium Haemophilus influenzae, have provided the
opportunity for comparative genomics on a whole genome level for
the first time. Our initial whole genome comparisons reveal
fundamental differences in genome content which are reflected in
different physiological and metabolic capacities of M. genitalium
and H. influenzae.
[0217] The strategy and methodology for whole genome shotgun
sequencing and assembly was similar to that previously described
for H. influenzae (R. Fleischmann et al., Science 269:496 (1995)).
In particular, a total of 50 .mu.g of purified M. genitalium strain
G-37 DNA (ATCC No. 33530) was isolated from cells grown in
Hayflick's medium. A mixture (990 .mu.l) containing 50 .mu.g of
DNA, 300 mM sodium acetate, 10 mM tris HCl, 1 mM EDTA, and 30
percent glycerol was chilled to 0.degree. C. in a nebulizer chamber
and sheared at 4 lbs/in.sup.2 for 60 seconds. The DNA was
precipitated in ethanol and redissolved in 50 .mu.l of tris-EDTA
(TE) buffer. To create blunt ends, a 40 .mu.l portion was digested
for 10 minutes at 30.degree. C. in 85 .mu.l of BAL31 buffer with 2
units of BAL 31 nuclease (New England BioLabs). The DNA was
extracted with phenol, precipitated in ethanol, dissolved in 60
.mu.l of TE buffer, and fractionated on a 1.0 percent low melting
agarose gel. A fraction (2.0 kb) was excised, extracted with
phenol, and redissolved in 20 .mu.l of TE buffer. A two-step
ligation procedure was used to produce a plasmid library in which
99% of the recombinants contained inserts, of which >99% were
single inserts. The first ligation mixture (50 .mu.l) contained
approximately 2 .mu.g DNA fragments, 2 .mu.g of SmaI+bacterial
alkaline phosphatase pUC 18 DNA (Pharmacia), and 10 units of T4 DNA
ligase (GIBCO/BRL), and incubation was for 5 hours at 4.degree. C.
After extraction with phenol and ethanol precipitation, the DNA was
dissolved in 20 .mu.l of TE buffer and separated by electrophoresis
on a 1.0 percent low melting agarose gel. A ladder of ethidium
bromide-stained, linearized DNA bands, identified by size as insert
(i), vector (v), v+i, v+2i, v+3i, etc. was visualized by 360 nm
ultraviolet light. The v+i DNA was excised and recovered in 20
.mu.l of TE buffer. The v+i DNA was blunt-ended by T4 polymerase
treatment for 5 minutes at 37.degree. C. in a reaction mixture
(50 .mu.l) containing the linearized v+i fragments, four
deoxynucleotide triphosphates (dNTPs) (25 .mu.M each), and 3 units
of T4 polymerase (New England Biolabs) under buffer conditions
recommended by the supplier. After phenol extraction and ethanol
precipitation, the repaired v+i linear pieces were dissolved in 20
.mu.l of TE. The final ligation to produce circles was carried out
in a 50 .mu.l reaction containing 5 .mu.l of v+i DNA and 5 units of
T4 ligase at 15.degree. C. overnight. The reaction mixture was
heated at 67.degree. C. for 10 minutes and stored at -20.degree.
C.
[0218] For transformation, a 100 .mu.l portion of Epicurian-SURE 2
Supercompetent Cells (Stratagene 200152) was thawed on ice and
transferred to a chilled Falcon 2059 tube on ice. A 1.7 .mu.l
volume of 1.42M .beta.-mercaptoethanol was added to the cells to a
final concentration of 25 mM. Cells were incubated on ice for 10
minutes. A 1 .mu.l sample of the final ligation mix was added to
the cells and incubated on ice for 30 minutes. The cells were
heat-treated for 30 seconds at 42.degree. C. and placed back on ice
for 2 minutes. The outgrowth period in liquid culture was omitted
to minimize the preferential growth of any transformed cell.
Instead, the transformed cells were plated directly on a nutrient
rich SOB plate containing a 5 ml bottom layer of SOB agar (1.5
percent SOB agar consisted of 20 g of tryptone, 5 g of yeast
extract, 0.5 g of NaCl, and 1.5 percent Difco agar/liter). The 5 ml
bottom layer was supplemented with 0.4 ml of ampicillin (50 mg/ml)
per 100 ml of SOB agar. The 15 ml top layer of SOB agar was
supplemented with 1 ml of MgCl.sub.2 (1M) and 1 ml of MgSO.sub.4
(1M) per 100 ml of SOB agar. The 15 ml top layer was poured just
before plating. The titer of the library was approximately 100
colonies per 10 .mu.l aliquot of transformation.
[0219] One of the lessons learned from sequencing and assembly of
the complete H. influenzae genome was that contig ordering and gap
closure is most efficient if the random sequencing phase of the
project is continued until at least 99.8%-99.9% of the genome is
sequenced with at least 6-fold coverage. To calculate the number of
random sequencing reactions necessary to obtain this coverage for
the M. genitalium genome, we made use of the Lander and Waterman
[E. S. Lander and M. S. Waterman, Genomics 2:231 (1988)]
application of the Poisson distribution, where p.sub.0=e.sup.-nw/L;
p.sub.0 is the probability that any given base is not sequenced, n
is the number of clone insert ends sequenced, w is the average read
length of each template in bp, and L is the size of the genome in
bp. For a genome of 580 kb with an average sequencing read length
of 450 bp after editing, approximately 8650 sequencing reactions
(or 4325 clones sequenced from both ends) should theoretically
provide 99.85% coverage of the genome. This level of coverage
should leave approximately 10 gaps with an average size of 70 bp
unsequenced.
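The relationship between a target level of coverage and the number of sequencing reactions can be sketched by solving the Poisson expression for n. This is an approximate illustration using the 580 kb genome size and 450 bp read length given above; the function names are hypothetical, and the result depends on rounding and on the exact inputs, so it should be read as an order-of-magnitude check rather than a reproduction of the planning figures.

```python
import math


def fraction_covered(genome_len, read_len, n_reads):
    """Expected fraction of bases sequenced at least once: 1 - e^(-nw/L)."""
    return 1.0 - math.exp(-n_reads * read_len / genome_len)


def reads_needed(genome_len, read_len, target_fraction):
    """Smallest n such that the expected covered fraction reaches target_fraction."""
    p_unsequenced = 1.0 - target_fraction
    return math.ceil(-genome_len * math.log(p_unsequenced) / read_len)


if __name__ == "__main__":
    L, w = 580_000, 450
    n = reads_needed(L, w, 0.9985)
    print("reads for 99.85% coverage:", n)
    print("coverage with 8650 reads: %.4f" % fraction_covered(L, w, 8650))
    print("expected gaps with 8650 reads: %.1f" % (8650 * math.exp(-8650 * w / L)))
```

Under these assumptions the calculation gives a read count in the mid-8000s and on the order of 10 expected gaps, which is in the same range as the figures given in the text.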
[0220] To evaluate the quality of the M. genitalium library,
sequence data were obtained from both ends of approximately 600
templates using both the M13 forward (M13-21) and the M13 reverse
(M13RP1) primers. Sequence fragments were assembled using the TIGR
ASSEMBLER and found to approximate a Poisson distribution of
fragments with an average read length of 450 bp for a 580 kb
library, indicating that the library was essentially random.
[0221] For this project, a total of 5760 double-stranded DNA
plasmid templates were prepared in a 96-well format using a boiling
bead method. Ninety-four percent of the templates prepared yielded
a DNA concentration .gtoreq.30 ng/.mu.l and were used for
sequencing reactions. To facilitate ordering of contigs each
template was sequenced from both ends. Reactions were carried out
using the AB Catalyst LabStation with Applied Biosystems PRISM
Ready Reaction Dye Primer Cycle Sequencing Kits for the M13 forward
(M13-21) and the M13 reverse (M13RP1) primers. The success rate and
average read length after editing with the M13-21 primer were 88
percent and 444 bp, respectively, and 84 percent and 435 bp,
respectively, with the M13RP1 primer. All data from template
preparation to final analysis of the project were stored in a
relational data management system developed at TIGR [A. R.
Kerlavage et al., Proceedings of the Twenty-Sixth Annual Hawaii
International Conference on System Science (IEEE Computer Society
Press, Washington, D.C., 1993), p. 585]. A total of 9846
sequencing reactions were performed by five individuals using an
average of 8 AB 373 DNA Sequencers per day for a total of 8 weeks.
Assembly of 8472 high quality M. genitalium sequence fragments
along with 299 random genomic sequences from Peterson et al. (S. N.
Peterson et al., J. Bacteriol. 175:7918 (1993)) was performed with
the TIGR ASSEMBLER. The assembly process generated 39 contigs (size
range: 606 to 73,351 bp) which contained a total of 3,806,280 bp of
primary DNA sequence data. Contigs were ordered by ASM_ALIGN, a
program which links contigs based on information derived from
forward and reverse sequencing reactions from the same clone.
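The idea of linking contigs through forward and reverse reads from the same clone can be sketched as follows. This is a hypothetical illustration and not the ASM_ALIGN program: the input format (clone identifier, read end, contig identifier) and the function name are assumptions, and read orientation and clone-size constraints are not checked.

```python
from collections import defaultdict


def link_contigs(read_placements):
    """Group clones whose forward and reverse reads fall in different contigs.

    read_placements: iterable of (clone_id, end, contig_id), with end "F" or "R".
    Returns a dict mapping a sorted contig pair to the list of linking clones.
    """
    by_clone = defaultdict(dict)
    for clone_id, end, contig_id in read_placements:
        by_clone[clone_id][end] = contig_id

    links = defaultdict(list)
    for clone_id, ends in by_clone.items():
        if "F" in ends and "R" in ends and ends["F"] != ends["R"]:
            pair = tuple(sorted((ends["F"], ends["R"])))
            links[pair].append(clone_id)
    return dict(links)


if __name__ == "__main__":
    # Hypothetical placements: clones c1 and c2 bridge contigs 1 and 2.
    placements = [
        ("c1", "F", "contig1"), ("c1", "R", "contig2"),
        ("c2", "F", "contig2"), ("c2", "R", "contig1"),
        ("c3", "F", "contig3"), ("c3", "R", "contig3"),
        ("c4", "F", "contig3"),
    ]
    print(link_contigs(placements))
```

A pair of contigs supported by one or more bridging clones is adjacent in the ordering, and any bridging clone is also a template that spans the gap between them.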
[0222] ASM_ALIGN analysis revealed that all 39 gaps were spanned by
an existing template from the small insert genomic DNA library
(i.e., there were no physical gaps in the sequence assembly). The
order of the contigs was confirmed by comparing the order of the
random genomic sequences from Peterson et al. (S. N. Peterson et
al., J. Bacteriol. 175:7918 (1993)) that were incorporated into the
assembly with their known position on the physical map of the M.
genitalium chromosome (T. S. Lucier et al., Gene 150:27 (1994);
Peterson et al., J. Bacteriol. 177:3199 (1995)). Because of the
high stringency of the TIGR ASSEMBLER, the 39 contigs were searched
against each other with GRASTA (a modified FASTA (D. Brutlag et
al., Comp. Chem. 1:203 (1993))) to detect overlaps (<30 bp) that
would have been missed during the initial assembly process. (The
BLOSUM 60 amino acid substitution matrix was used in all
protein-protein comparisons [S. Henikoff and J. G. Henikoff, Proc.
Natl. Acad. Sci. USA 89:1091 (1992)].) Eleven overlaps were detected
with this approach which reduced the total number of gaps from 39
to 28.
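As a rough illustration of what detecting a short terminal overlap involves, the sketch below looks for the longest exact match between the end of one contig and the start of another. It is a deliberate simplification and is not GRASTA or FASTA, which tolerate mismatches; the function name and the `min_len` default are assumptions, and `max_len=29` simply reflects the "<30 bp" overlaps mentioned above.

```python
def end_overlap(left, right, min_len=10, max_len=29):
    """Length of the longest exact suffix-of-left / prefix-of-right overlap.

    Only overlaps between min_len and max_len bases are considered.
    Returns 0 when no such overlap exists.
    """
    upper = min(max_len, len(left), len(right))
    for k in range(upper, min_len - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0

# Hypothetical contig ends sharing a 12-base overlap.
contig_left = "CCCCC" + "GATTACAGATTA"
contig_right = "GATTACAGATTA" + "TTTTT"
overlap = end_overlap(contig_left, contig_right)
```

Each overlap found in this way joins two contigs and removes one gap, which is how eleven overlaps reduce 39 gaps to 28.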
[0223] Templates spanning each of the sequence gaps were identified
and oligonucleotide primers were designed from the sequences at the
end of each contig. All gaps were less than 300 bp; thus a primer
walk from both ends of each template was sufficient for closure.
All electropherograms were visually inspected with TIGR EDITOR (R.
Fleischmann et al., Science 269:496 (1995)) for initial sequence
editing. Where a discrepancy could not be resolved or a clear
assignment made, the automatic base calls were left unchanged.
[0224] Several criteria for determination of sequence completion
were established for the H. influenzae genome sequencing project
and these same criteria were applied to this study. Across the
assembled M. genitalium genome there is an average sequence
redundancy of 6.5-fold. The completed sequence contains less than
1% single-sequence coverage. For each of the 53 ambiguities
remaining after editing and the 25 potential frameshifts found
after sequence-similarity searching, the appropriate template was
resequenced with an alternative sequencing chemistry (dye
terminator vs. dye primer) to resolve ambiguities. Although it is
extremely difficult to assess sequence accuracy, we estimate our
error rate to be less than 1 base in 10,000 based upon frequency of
shifts in open reading frames, unresolved ambiguities, overall
quality of raw data, and fold coverage.
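The redundancy figure can be checked against numbers given elsewhere in this example. The short calculation below is only a consistency check using the stated totals (3,806,280 bp of primary sequence, a 580,071 bp genome, 8472 plus 299 assembled fragments, and 9846 sequencing reactions); the variable names are illustrative.

```python
total_read_bp = 3_806_280      # primary sequence data in the assembly
genome_bp = 580_071            # finished genome size
assembled_fragments = 8472 + 299
sequencing_reactions = 9846

# Average fold redundancy across the genome.
fold_redundancy = total_read_bp / genome_bp

# Mean length of an assembled fragment, in bp.
mean_fragment_bp = total_read_bp / assembled_fragments

# Fraction of sequencing reactions that yielded a high quality fragment.
success_fraction = 8472 / sequencing_reactions

print(f"{fold_redundancy:.2f}-fold, {mean_fragment_bp:.0f} bp, {success_fraction:.1%}")
```

The values come out at roughly 6.56-fold redundancy, about 434 bp per fragment, and about 86% success, which is in line with the 6.5-fold redundancy, the 435 to 444 bp read lengths, and the 84 to 88 percent success rates reported above.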
[0225] A direct cost estimate for sequencing, assembly, and
annotation of the M. genitalium genome was determined by summing
reagent and labor costs for library construction, template
preparation and sequencing, gap closure, sequence confirmation,
annotation, and preparation for publication, and dividing by the
size of the genome in base pairs. This yielded a final cost of 30
cents per finished base pair.
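The cost calculation is a simple ratio, sketched below. The itemized costs are not given in this example, so the dictionary passed to the function is a hypothetical placeholder; only the 30 cent figure and the 580,071 bp genome size come from the text, and the total derived from them is an implied value rather than a reported one.

```python
def cost_per_bp(costs, genome_bp):
    """Sum itemized direct costs (in dollars) and divide by genome size in bp."""
    return sum(costs.values()) / genome_bp

# Hypothetical itemization, for illustration only.
example = cost_per_bp({"reagents": 100.0, "labor": 200.0}, genome_bp=1000)

# Total direct cost implied by 30 cents per finished bp for a 580,071 bp genome.
implied_total_dollars = 0.30 * 580_071
```

Under these figures the implied direct cost is on the order of 174,000 dollars.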
[0226] Genomic Analysis
[0227] The M. genitalium genome is a circular chromosome of 580,071
bp. The overall G+C content is 32% (A, 34%; C, 16%; G, 16%; and T,
34%). The G+C content across the genome varies between 27 and 37%
(using a window of 5000 bp), with the regions of lowest G+C content
flanking the presumed origin of replication of the organism. As in
H. influenzae (Fleischmann, R. et al., Science 269:496 (1995)), the
rRNA operon in M. genitalium contains a higher G+C content (44%)
than the rest of the genome, as do the tRNA genes (52%). The higher
G+C content in these regions may reflect the necessity of retaining
essential G+C base pairing for secondary structure in rRNAs and
tRNAs (Rogers, M. J. et al., Isr. J. Med. Sci. 20:768 (1984)).
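A windowed G+C calculation of the kind described above can be sketched as follows. This is a minimal illustration, not the program used for the analysis; the function names are assumptions, the window of 5000 bp follows the text, and the non-overlapping step and the wrap-around at the end of the circular chromosome are choices made here.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

def windowed_gc(genome, window=5000, step=5000):
    """G+C fraction for successive windows along a circular genome.

    Returns a list of (window start, G+C fraction) pairs. The last windows
    wrap around the origin so that every window has the same length.
    """
    n = len(genome)
    doubled = genome + genome[:window]  # lets windows run past the end
    return [(start, gc_fraction(doubled[start:start + window]))
            for start in range(0, n, step)]

# Toy example with a 4 bp window on an 8 bp "genome".
profile = windowed_gc("GGCCAATT", window=4, step=4)
```

The overall figure follows directly from the base composition: 16% C plus 16% G gives the stated 32% G+C.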
[0228] The genome of M. genitalium contains 74 EcoRI fragments, as
predicted by cosmid mapping data (Lucier, T. S. et al., Gene 150:27
(1994); Peterson et al., J. Bacteriol. 177:3199 (1995)). The order
and sizes of the EcoRI fragments determined from sequence analysis
are in agreement with those previously reported (Lucier, T. S. et
al., Gene 150:27 (1994); Peterson et al., J. Bacteriol. 177:3199
(1995)), with one apparent discrepancy between coordinates 62,708
and 94,573 in the sequence. However, re-evaluation of cosmid
hybridization data in light of results from genome sequence
analysis confirms that the sequence data are correct, and the extra
4.0 kb EcoRI fragment in this region of the cosmid map reflects a
misinterpretation of the overlap between cosmids J-8 and 21
(Lucier, T. S., unpublished observation). The ends of each clone
from the ordered cosmid library were sequenced and are shown on the
circular chromosome in FIG. 4. The order of the cosmids based on
sequence analysis is in complete agreement with that determined by
physical mapping (Lucier, T. S. et al., Gene 150:27 (1994);
Peterson et al., J. Bacteriol. 177:3199 (1995)).
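Predicting restriction fragments from a finished circular sequence can be sketched as below. This is an illustrative in silico digest, not the procedure used by the inventors; the function name is an assumption. EcoRI recognizes GAATTC and cuts after the first G, and because the chromosome is circular a site that spans the arbitrary start coordinate must also be found.

```python
def circular_digest(genome, site="GAATTC", cut_offset=1):
    """Fragment sizes from cutting a circular sequence at every site.

    cut_offset is the number of bases of the site that lie before the cut
    (1 for EcoRI, which cuts G^AATTC). Sizes are listed in map order,
    starting from the first cut.
    """
    genome = genome.upper()
    n = len(genome)
    # Append the first len(site)-1 bases so sites spanning the junction match.
    extended = genome + genome[:len(site) - 1]
    cuts = []
    pos = extended.find(site)
    while pos != -1:
        cuts.append((pos + cut_offset) % n)
        pos = extended.find(site, pos + 1)
    cuts.sort()
    if not cuts:
        return [n]  # an uncut circle
    # Distance from each cut to the next one, wrapping around the circle.
    return [(cuts[(i + 1) % len(cuts)] - cuts[i]) % n or n
            for i in range(len(cuts))]

# Toy circular sequence with two EcoRI sites.
fragments = circular_digest("GAATTC" + "A" * 10 + "GAATTC" + "C" * 4)
```

Applied to the complete sequence, a digest of this kind would be expected to return the 74 EcoRI fragments reported above, and the ordered list of sizes is what would be compared with the cosmid map.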
[0229] We defined the first bp of the chromosomal sequence of M.
genitalium based on the putative origin of replication (Bailey
& Bott, J. Bacteriol. 176:5814 (1994)). Studies of origins of
replication in some prokaryotes have shown that DNA synthesis is
initiated in an untranscribed AT rich region between dnaA and dnaN
(Ogasawara, N. et al., in The Bacterial Chromosome, Drlica &
Riley, eds., American Society for Microbiology, Washington, D.C.
(1990), pp. 287-295; Ogasawara & Yoshikawa, Mol. Microbiol.
6:629 (1992)). A search of the M. genitalium sequence for "DnaA
boxes" around the putative origin of replication with consensus
"DnaA boxes" from Escherichia coli, Bacillus subtilis, and
Pseudomonas aeruginosa revealed no significant matches. Although we
have not been able to precisely localize the origin, the
co-localization of dnaA and dnaN to a 4000 bp region of the
chromosome lends support to the hypothesis that it is the
functional origin of replication in M. genitalium (Ogasawara, N. et
al., in The Bacterial Chromosome, Drlica & Riley, eds.,
American Society for Microbiology, Washington, D.C. (1990), pp.
287-295; Ogasawara & Yoshikawa, Mol. Microbiol. 6:629 (1992),
Miyata, M. et al., Nucleic Acids Res. 21:4816 (1993)). We have
chosen an untranscribed region between dnaA and dnaN so that dnaN
is numbered as the first open reading frame in the genome. As seen
in FIG. 4, genes to the right of this region are preferentially
transcribed from the plus strand, and genes to the left of this
region are preferentially transcribed from the minus strand. The apparent
polarity in gene transcription is maintained across each half of
the genome (FIGS. 4 and 5). This stands in marked contrast to H.
influenzae which displays no apparent polarity of transcription
around the origin of replication. The significance of this
observation remains to be determined.
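A consensus search of the kind described for "DnaA boxes" can be sketched as a mismatch-tolerant scan of both strands. The consensus used below, TTATCCACA, is the commonly cited E. coli DnaA box and is an assumption here, since the consensus sequences actually used are not listed in this example; the function names and the mismatch threshold are likewise illustrative. The sketch treats the sequence as linear.

```python
def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def consensus_hits(seq, consensus="TTATCCACA", max_mismatches=1):
    """Positions where seq matches the consensus on either strand.

    Returns (start, strand, mismatches) tuples; '-' means the window
    matches the reverse complement of the consensus.
    """
    seq = seq.upper()
    k = len(consensus)
    patterns = (("+", consensus), ("-", revcomp(consensus)))
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        for strand, pattern in patterns:
            mismatches = sum(a != b for a, b in zip(window, pattern))
            if mismatches <= max_mismatches:
                hits.append((i, strand, mismatches))
    return hits

# Toy sequence carrying one box on each strand.
toy = "GGG" + "TTATCCACA" + "GGG" + "TGTGGATAA" + "CC"
hits = consensus_hits(toy, max_mismatches=0)
```

A scan of this kind returning no hits above background in the region between dnaA and dnaN would correspond to the negative result reported above.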
[0230] The predicted coding regions of M. genitalium were initially
defined by searching the entire genome for open reading frames
greater than 100 amino acids. Translations were made using the
genetic code for mycoplasma species in which UGA encodes
tryptophan. All open reading frames were searched with BLAZE
(Brutlag, D. et al., Comp. Chem. 1:203 (1993)) against a
non-redundant bacterial protein database (NRBP) (Fleischmann, R. et
al., Science 269:496 (1995)) developed at TIGR on a MasPar MP-2
massively parallel computer with 4096 microprocessors. The BLOSUM 60
amino acid substitution matrix was used in all protein-protein
comparisons (Henikoff, S. and Henikoff, J. G., Proc. Natl. Acad.
Sci. USA 89:1091 (1992)). Protein matches were aligned with PRAZE, a
modified Smith-Waterman (Waterman, M. S., Methods Enzymol. 164:765
(1988)) algorithm. Segments between predicted coding regions of the
genome were used in additional searches against all protein
sequences from GenPept, Swiss-Prot, and PIR. Pairwise alignments
between M. genitalium predicted open reading frames and sequences
from the public archives were examined. Motif matches were
annotated in cases where sequence similarity was confined to short
domains in the predicted coding region. The coding potential of 170
unidentified open reading frames was analyzed with GeneMark
(Borodovsky & McIninch, ibid, p. 123) which had been trained
with 308 M. genitalium sequences. Open reading frames that had low
coding potential (based on the GeneMark analysis) and were smaller
than 100 nucleotides (a total of 53) were removed from the final
set of putative coding regions. In a separate analysis, open
reading frames were searched against the complete set of translated
sequences from H. influenzae (GSDB accession L42023, see
(Fleischmann, R. et al., Science 269:496 (1995))). In total, these
processes resulted in the identification of 482 predicted coding
regions, of which 365 were putatively identified (twenty-three of
the protein matches in Table 6 were annotated as motifs; these
matches were not full-length protein matches, but nonetheless
displayed regions of significant amino acid similarity) and 117 had
no matches to protein sequences from any other organism.
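The initial open reading frame search can be illustrated with a minimal six-frame scan that uses the mycoplasma genetic code, in which UGA (TGA in DNA) encodes tryptophan rather than stop. This is a sketch under stated assumptions, not the software used in the analysis: an open reading frame is taken here to be any stop-to-stop stretch, the sequence is treated as linear, and the function names are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order, then modified so that TGA encodes Trp.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
CODON_TABLE["TGA"] = "W"  # mycoplasma code: UGA is tryptophan, not stop

def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(seq):
    """Translate in frame 0; unknown codons become 'X', partial codons are dropped."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))

def find_orfs(genome, min_aa=100):
    """Stop-to-stop open reading frames longer than min_aa in all six frames.

    Returns (strand, frame, protein) tuples.
    """
    genome = genome.upper()
    orfs = []
    for strand, seq in (("+", genome), ("-", revcomp(genome))):
        for frame in range(3):
            for segment in translate(seq[frame:]).split("*"):
                if len(segment) > min_aa:
                    orfs.append((strand, frame, segment))
    return orfs

# Toy example: with the standard code this would stop after Met.
toy_orfs = find_orfs("ATG" + "TGA" * 3 + "TAA", min_aa=3)
```

The choice of code matters: under the standard code every in-frame TGA would truncate the predicted protein, so reading frames would appear far shorter than they are.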
[0231] The 365 predicted coding regions that matched protein
sequences from the public sequence archives were assigned
biological roles. The role classifications were developed from
Riley (Riley, M., Microbiol. Rev. 57:862 (1993)) and are identical to
those used in H. influenzae assignments (Fleischmann, R., et al.,
Science 269:496 (1995)). A separate search procedure was used in
cases where we were unable to detect genes in the M. genitalium
genome. Query peptide sequences that were available from eubacteria
such as E. coli, B. subtilis, M. capricolum, and H. influenzae were
used in searches against all six reading frame translations of the
entire genome sequence, and the alignments were examined. The
possibility remains that current searching methods, an incomplete
set of query sequences, or the subjective analysis of the database
matches, are not sensitive enough to identify certain M. genitalium
gene sequences.
[0232] One-half of all predicted coding regions in M. genitalium
for which a putative identification could be assigned display the
greatest degree of similarity to a protein from either a
gram-positive organism (e.g., B. subtilis) or a Mycoplasma species.
The significance of this finding is underscored by the fact that
NRBP contained 3885 sequences from E. coli and only 1975 sequences
from B. subtilis. In the majority of cases where M. genitalium
coding regions matched sequences from both E. coli and Bacillus
species, the better match was to a sequence from Bacillus (average
of 62 percent similarity) rather than to a sequence from E. coli
(average of 56 percent similarity). The evolutionary relationship
between Mycoplasma and the Lactobacillus-Clostridium branch of the
gram-positive phylum has been deduced from small subunit rRNA
sequences (Maidak, B. L. et al., Nucleic Acids Research 22:3485
(1994)). Our data from whole genome analysis support this
hypothesis.
[0233] Comparative Genomics: M. genitalium and H. influenzae
[0234] A survey of the genes and their organization in M.
genitalium makes possible the description of a minimal set of genes
required for survival. One would predict that a minimal cell must
contain genes for replication and transcription, at least one rRNA
operon and a set of ribosomal proteins, tRNAs and tRNA synthetases,
transport proteins to derive nutrients from the environment,
biochemical pathways to generate ATP and reducing power, and
mechanisms for maintaining cellular homeostasis. Comparison of the
genes identified in M. genitalium with those in H. influenzae
allows for identification of a basic complement of genes conserved
in these two species and provides insights into physiological
differences between one of the simplest self-replicating
prokaryotes and a more complex, gram-negative bacterium.
[0235] The M. genitalium genome contains 482 predicted coding
sequences (Table 6) as compared to 1,727 identified in H.
influenzae (Fleischmann, R. et al., Science 269:496 (1995)). Table
7 summarizes the gene content of both organisms sorted by
functional category. The percent of the total genome in M.
genitalium and H. influenzae encoding genes involved in cell
envelope, cellular processes, energy metabolism, purine and
pyrimidine metabolism, replication, transcription, transport, and
other categories is similar; although the total number of genes in
these categories is considerably fewer in M. genitalium. A smaller
percentage of the M. genitalium genome encodes genes involved in
amino acid biosynthesis, biosynthesis of co-factors, central
intermediary metabolism, fatty acid and phospholipid metabolism,
and regulatory functions as compared with H. influenzae. A greater
percentage of the M. genitalium genome encodes proteins involved in
translation than in H. influenzae, as shown by the similar numbers
of ribosomal proteins and tRNA synthetases in both organisms.
[0236] The 482 predicted coding regions in M. genitalium (average
size of 1100 bp) cover 85% of the genome (on average, one gene
every 1169 bp), a value similar to that found in H. influenzae
where 1727 predicted coding regions (average size of 900 bp) cover
91% of the genome (one gene every 1042 bp). These data indicate
that the reduction in genome size that has occurred within
Mycoplasma has not led to an increase in gene density or a decrease
in gene size (Bork, P. et al., Mol. Microbiol. 16:955 (1995)). A
global search of M. genitalium and H. influenzae genomes reveals
short regions of conservation of gene order, particularly two
clusters of ribosomal proteins.
[0237] Replication. Two major protein complexes are formed during
replication: the primosome and the replisome. We have identified
genes encoding many of the essential proteins in the replication
process, including M. genitalium isologs of the primosome proteins
DnaA, DnaB, GyrA, GyrB, a single stranded DNA binding protein, and
the primase protein, DnaE. DnaJ and DnaK, heat shock proteins that
may function in the release of the primosome complex, are also
found in M. genitalium. A gene encoding the DnaC protein,
responsible for delivery of DnaB to the primosome, has yet to be
identified.
[0238] Genes encoding most of the essential subunit proteins for
DNA polymerase III in M. genitalium were also identified. The polC
gene encodes the α subunit which contains the polymerase activity.
We have also identified the isolog of dnaH in B. subtilis (dnaX in
E. coli) which encodes the γ and τ subunits as alternative
products from the same gene. These proteins are necessary for the
processivity of DNA polymerase III. An isolog of dnaN which encodes
the β subunit was previously identified in M. genitalium (Bailey
& Bott, J. Bacteriol. 176:5814 (1994)) and is involved in the
process of clamping the polymerase to the DNA template. While we
have yet to identify a gene encoding the subunit responsible for
the 3'-5' proofreading activity, it is possible that this activity
is encoded in the α subunit as has been previously described
(Sanjanwala, B. and Ganesan, A. T., Mol. Gen. Genet. 226:467 (1991);
Sanjanwala, B. and Ganesan, A. T., Proc. Natl. Acad. Sci. USA
86:4421 (1989)). Finally, we have identified a gene encoding a DNA
ligase, necessary for the joining of the Okazaki fragments formed
during synthesis of the lagging strand.
[0239] While we have identified genes encoding many of the isologs
thought to be essential for DNA replication, some genes encoding
proteins with key functions have yet to be identified. Examples of
these are the DnaC protein mentioned above as well as Dnaθ
and Dnaδ whose functions are less well understood but are
thought to be involved in the assembly and processivity of
polymerase III. Also apparently absent is a specific RNaseH protein
responsible for the hydrolysis of the RNA primer synthesized during
lagging strand synthesis.
[0240] DNA Repair. It has been suggested that in E. coli as many as
100 genes are involved in DNA repair (Kornberg, A. and Baker, T.
A., DNA Replication--2nd Ed., W. H. Freeman and Co., New York
(1992)), and in H. influenzae the number of putatively identified
DNA repair enzymes is approximately 30 (Fleischmann, R. et al.,
Science 269:496 (1995)). Although M. genitalium appears to have the
necessary genes to repair many of the more common lesions in DNA,
the number of genes devoted to the task is much smaller. Excision
repair of regions containing missing bases (apurinic/apyrimidinic
(AP) sites) can likely occur by a pathway involving endonuclease IV
(nfo), Pol I, and ligase. The ung gene which encodes uracil-DNA
glycosylase is present. This activity removes uracil residues from
DNA which usually arise by spontaneous deamination of cytosine.
This produces an AP site which could then be repaired as described
above.
[0241] All three genes necessary for production of the uvrABC
excinuclease are present, and along with Pol I, helicase II, and
ligase should provide a mechanism for repair of damage such as
cross-linking, which requires replacement of both strands. Although
recA is present, which in E. coli is activated as it binds to
single strand DNA, thereby initiating the SOS response, we find no
evidence for a lexA gene which encodes the repressor which
regulates the SOS genes. We have not identified photolyase (phr) in
M. genitalium which repairs UV-induced pyrimidine dimers, or other
genes involved in reversal of DNA damage rather than excision and
replacement of the lesion.
[0242] Transcription. The critical components for transcription
were identified in M. genitalium. In addition to the α, β, and
β′ subunits of the core RNA polymerase, M. genitalium appears
to encode a single σ factor, whereas E. coli and B. subtilis
encode at least six and seven, respectively. We have not detected a
homolog of the Rho termination factor gene, so it seems likely that
a mechanism similar to Rho-independent termination in E. coli
operates in M. genitalium. We have clear evidence for homologs of
only two other genes which modulate transcription, nusA and
nusG.
[0243] Translation. M. genitalium possesses a single rRNA operon
which contains three rRNA subunits in the order: 16S rRNA (1518
bp)-spacer (203 bp)-23S rRNA (2905 bp)-spacer (56 bp)-5S rRNA (103
bp). The small subunit rRNA sequence was compared with the
Ribosomal Database Project's (Maidak, B. L. et al., Nucleic Acids
Research 22:3485 (1994)) prokaryote database with the program
"similarity_yank." Our sequence is identical to the M. genitalium
(strain G37) sequence deposited there, and the 10 most similar taxa
returned by this search are also in the genus Mycoplasma.
[0244] A total of 33 tRNA genes were identified in M. genitalium;
these were organized into five clusters plus nine single genes. In
all cases, the best match for each tRNA gene in M. genitalium was
the corresponding gene in M. pneumoniae (Simoneau, P. et al., Nuc.
Acids Res. 21:4967 (1993)). Furthermore, the grouping of tRNAs into
clusters (trnA, trnB, trnC, trnD, and trnE) was identical in M.
genitalium and M. pneumoniae as was gene order within the cluster
(Simoneau, P. et al., Nuc. Acids Res. 21:4967 (1993)). The only
difference between M. genitalium and M. pneumoniae observed with
regard to tRNA gene organization was an inversion between trnD and
GTG. In contrast to H. influenzae and many other eubacteria, no
tRNAs were found in the spacer region between the 16S and 23S rRNA
genes in the rRNA operon of M. genitalium, similar to what has been
reported for M. capricolum (Sawada, M. et al., Mol. Gen. Genet.
182:502 (1981)).
[0245] A search of the M. genitalium genome for tRNA synthetase
genes identified all of the expected genes with the exception of
glutaminyl tRNA synthetase. We expect that this gene is present in
the M. genitalium genome, but we have not been able to identify it
by similarity searches. The latest GenBank release (release 89)
contains only a single entry for a glutaminyl tRNA synthetase from
a bacterial species; this was from E. coli, a gram-negative
organism only distantly related to Mycoplasma. In general, tRNA
synthetase sequences from gram-positive organisms such as B.
subtilis displayed greater similarity to those from M. genitalium
than the corresponding sequences from E. coli, lending support to
the notion that the similarity between the E. coli and M.
genitalium glutaminyl tRNA synthetase may not have been high enough
to be detected.
[0246] Metabolic pathways. The reduction in genome size among
Mycoplasma species is associated with a marked reduction in the
number and components of biosynthetic pathways in these organisms,
requiring them to use metabolic products from their hosts. In the
laboratory, M. genitalium has not been grown in a chemically
defined medium. The complex growth requirements of this organism
can be explained by the almost complete lack of enzymes involved in
amino acid biosynthesis, de novo nucleotide biosynthesis, and fatty
acid biosynthesis (Table 6 and FIGS. 5A-5R). When the number of
genes in the categories of central intermediary metabolism, energy
metabolism, and fatty acid and phospholipid metabolism are summed,
marked differences in gene content between H. influenzae and M.
genitalium are apparent. For example, whereas the H. influenzae
genome contains 68 genes involved in amino acid biosynthesis, the
M. genitalium genome contains only one. In total, the H. influenzae
genome has 167 genes associated with metabolic pathways whereas the
M. genitalium genome has just 42. A recent analysis of 214 kb of
sequence from Mycoplasma capricolum (Bork, P. et al., Mol.
Microbiol. 16:955 (1995)), a related organism whose genome size is
twice as large as that of M. genitalium, reveals that M. capricolum
contains a number of biosynthetic enzymes not present in M.
genitalium. This observation suggests that M. capricolum's larger
genome confers a greater anabolic capacity.
[0247] M. genitalium is a facultative anaerobe that ferments
glucose and possibly other sugars via glycolysis to lactate and
acetate. Genes that encode all the enzymes of the glycolytic
pathway were identified, including genes for components of the
pyruvate dehydrogenase complex, phosphotransacetylase, and acetate
kinase. The major route for ATP synthesis may be through substrate
level phosphorylation since no cytochromes are present. M.
genitalium also lacks all the components of the tricarboxylic acid
cycle. None of the genes coding for glycogen or
poly-beta-hydroxybutyrate production were identified, indicating
limited capacity for carbon and energy storage. The pentose
phosphate pathway also appears limited since only genes encoding
6-phosphogluconate dehydrogenase and transketolase were identified.
The limited metabolic capacity of M. genitalium sharply contrasts
with the complexity of catabolic pathways in H. influenzae,
reflecting the four-fold greater number of genes involved in energy
metabolism found in H. influenzae.
[0248] Transport. The transporters identified in M. genitalium are
specific for a range of nutritional substrates. Using protein
transport as an example, both oligopeptide and amino acid
transporters are represented. One interesting peptide transporter
has homology to a lactococcin transporter (lcnDR3) and related
bacteriocin transporters, suggesting that M. genitalium may export a
small peptide with antibacterial activity. The M. genitalium isolog
of the M. hyorhinis p37 high-affinity transport system also has a
conserved lipid modification site, providing further evidence that
the Mycoplasma binding-protein dependent transport systems are
organized in a manner analogous to gram positive bacteria (Gilson,
E. et al., EMBO J. 7:3971 (1988)).
[0249] Genes encoding proteins that function in the transport of
glucose via the phosphoenolpyruvate:sugar phosphotransferase system (PTS)
have been identified in M. genitalium. These include enzyme I (EI),
HPr and sugar specific enzyme IIs (EII) (Postma, P. W. et al.,
Microbiol. Rev. 57:543 (1993)). EIIs consist of a complex of at
least three domains, EIIA, EIIB and EIIC. In some bacteria (e.g.,
E. coli), EIIA is a soluble protein, while in others (Bacillus
subtilis), a single membrane protein contains all three domains,
EIIA, B and C. These variations in the proteins that make up the
EII complex are due to fusion or splitting of domains during
evolution and are not considered to be mechanistic differences
(Postma, P. W. et al., Microbiol. Rev. 57:543 (1993)). In M.
genitalium EIIA, B, and C are located in a single protein similar
to the protein found in B. subtilis. In Mycoplasma capricolum, ptsH,
the gene which encodes HPr, is located on a monocistronic
transcriptional unit while genes encoding EI (ptsI) and EIIA (crr)
are located on a dicistronic operon (Zhu, P. P. et al., Protein
Sci. 3:2115 (1994); Zhu, P. P. et al., J. Biol. Chem. 268:26531
(1993)). In most bacterial species studied to date, ptsI, ptsH, and
crr are part of a polycistronic operon (pts operon). In M.
genitalium ptsH, ptsI and the gene encoding EIIABC reside at
different locations of the genome and thus each of these genes may
constitute monocistronic transcriptional units. We have also
identified an EIIBC component for uptake of fructose; however, other
components of the fructose PTS were not found. Thus, M. genitalium
may be limited to the use of glucose as an energy source. In
contrast, H. influenzae has the ability to use at least six
different sugars as a source of carbon and energy.
[0250] Regulatory Systems. It appears that regulatory systems found
in other bacteria are absent in M. genitalium. For instance,
although two component systems have been described for a number of
gram-positive organisms, no sensor or response regulator genes are
found in the M. genitalium genome. Furthermore, the lack of a heat
shock σ factor raises the question of how the heat shock
response is regulated. Another stress faced by all metabolically
active organisms is the generation of reactive oxygen intermediates
such as superoxide anions and hydrogen peroxide. Although H.
influenzae has an oxyR homologue, as well as catalase and
superoxide dismutase, M. genitalium appears to lack these genes as
well as an NADH peroxidase. The importance of these reactive
intermediate molecules in host cell damage suggests that some as
yet unidentified protective mechanism may exist within the
cell.
[0251] Antigenic variation. Numerous examples exist of microbial
pathogens expressing outer membrane proteins that vary due to DNA
rearrangements as a mechanism for providing antigenic and
functional variations that influence virulence potential
(Bergstrom, S. et al., Proc. Natl. Acad. Sci. USA 83:3890 (1986);
Meier, J. T. et al., Cell 47:61 (1986); Majiwa, P. A. O. et al.,
Nature 297:514 (1982)). Because humans are the natural host for
both M. genitalium and H. influenzae, it was of interest to compare
mechanisms for generating antigenic variation in these organisms.
In H. influenzae, a number of virulence-related genes encoding
membrane proteins contain tandem tetramer repeats that undergo
frequent addition and deletion of one or more repeat units during
replication, such that the reading frame of the gene is changed and
its expression altered (Weiser, J. N. et al., Cell 59:657
(1989)).
[0252] M. genitalium appears to use a different system for evading
host immune responses. The 140 kDa adhesion protein of M.
genitalium is densely clustered at a differentiated tip of this
organism and elicits a strong immune response in humans and
experimentally infected animals (Collier, A. M. et al., Zbl. Bkt.
Suppl. 20:73 (1992)). The adhesion protein (MgPa) operon in M.
genitalium contains a 29 kDa ORF, the MgPa protein (160 kDa) and a
114 kDa ORF with intervening regions of 6 and 1 nt, respectively
(Inamine, J. M. et al., Gene 82:259 (1989)). Based on hybridization
experiments (Dallo, S. F. and Baseman, J. B., Microb. Pathog. 8:371
(1990)), multiple copies of regions of the M. genitalium MgPa gene
and the 114 kDa ORF are known to exist throughout the genome.
[0253] The availability of the complete genomic sequence from M.
genitalium has allowed a comprehensive mapping of the MgPa repeats
(FIGS. 4 and 6). In addition to the complete operon, nine
repetitive elements which are composites of particular regions of
the MgPa operon were found. The percent of sequence identity
between the repeat elements and the MgPa gene ranges from 78% to 90%.
In some of the repeats, the MgPa-related sequences are separated in
the genome by a variable length, A-T rich spacer sequence, as has
previously been described (Peterson, S. N., PhD dissertation, Univ.
No. Carolina 1992, Univ. Mi. Dissertation Services #6246). The
sequences contained in the MgPa operon and the nine repeats
scattered throughout the chromosome represent 4.5% of the total
genomic sequence. At first glance this might appear to contradict
the expectation for a minimal genome. However, recent evidence for
recombination between the repetitive elements and the MgPa operon
has been reported (Peterson, S. N. et al., Proc. Natl. Acad. Sci.
USA, in press (1995)). Such recombination may allow M. genitalium
to evade the host immune response through mechanisms that induce
antigenic variation within the population. Since M. genitalium
survives in nature by obtaining essential nutrients from its
mammalian host, an efficient mechanism to evade the immune response
may be a necessary part of this minimal genome.
[0254] The M. genitalium genome contains 93 putatively identified
genes that are apparently not present in H. influenzae. Almost 60%
of these genes have database matches to known or hypothetical
proteins from gram-positive bacteria or other Mycoplasma species,
suggesting that these genes may encode proteins with a restricted
phylogenetic distribution. One hundred seventeen potential coding
regions in M. genitalium have no database match to any sequences in
public archives including the entire H. influenzae genome;
therefore, these likely represent novel genes in M. genitalium
and related organisms.
[0255] The predicted coding sequences of the hypothetical ORFs, the
ORFs with motif matches and the ORFs that have no similarities to
known peptide sequences were analyzed. The two programs used were
the Kyte-Doolittle algorithm (Kyte, J. and Doolittle, R. F., J.
Mol. Biol. 157:105 (1982)) with a range of 11 residues, and PSORT
which is available on the WWW site http://psort.nibb.ac.jp. PSORT
predicts the presence of signal sequences by the methods of McGeoch
(McGeoch, D. J., Virus Res. 3:271 (1985)) and von Heijne (von
Heijne, G., Nucl. Acids Res. 14:4683 (1986)), and detects potential
transmembrane domains by the method of Klein et al. (Klein, P. et
al., Biochim. Biophys. Acta 815:468 (1985)). Of a total of 201 ORFs
examined, 90 potential membrane proteins were found. Eleven of them
are predicted to have type I signal peptides, and five type II
signal peptides. Using this approach, at least fifty potential
membrane proteins were identified from the list of ORFs with known
functions. This brings the total number of membrane proteins in M.
genitalium to approximately 140.
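The hydropathy analysis can be illustrated with a short sliding-window calculation using the published Kyte-Doolittle scale and the 11-residue range mentioned above. This is a sketch, not PSORT or the program actually used; the function name is an assumption, and no membrane-protein threshold is applied because none is stated in this example.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(protein, window=11):
    """Mean Kyte-Doolittle hydropathy for each full window along a protein.

    Raises KeyError on residues outside the 20 standard amino acids.
    Returns an empty list if the protein is shorter than the window.
    """
    scores = [KD[aa] for aa in protein.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]

# Toy example: a short, uniformly hydrophobic stretch.
profile = hydropathy_profile("I" * 12, window=11)
```

Sustained stretches of high positive values in such a profile are the usual indication of a candidate membrane-spanning segment. The tally in the paragraph above is then simple addition: 90 candidates among the 201 ORFs examined plus at least 50 among the ORFs with known functions gives approximately 140.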
[0256] To manage these putative membrane proteins, M. genitalium
has at its disposal a minimal secretory machinery composed of seven
functions: three chaperonins, GroEL, DnaK and the trigger factor Tig
(Pugsley, A. P., Microbiol. Rev. 57:50 (1993); Guthrie, B. and
Wickner, W., J. Bacteriol. 172:5555 (1990)); an ATPase pilot protein,
SecA; one integral membrane protein translocase (SecY); a signal
recognition particle protein (Ffh); and a lipoprotein-specific
signal peptidase, LspA (Pugsley, A. P., Microbiol. Rev. 57:50
(1993)). Perhaps the lack of other known translocases like SecE,
SecD, and SecF, which are present in E. coli and H. influenzae, is
related to the fact that M. genitalium has a one-layer cell
envelope. Also, the absence of a SecB homologue, the secretory
chaperonin of E. coli, in M. genitalium (it is also absent in B.
subtilis (Collier, D. N., J. Bacteriol. 176:4937 (1994))) might
reflect a difference between gram negative and wall-less Mollicutes
in handling nascent proteins destined for the general secretory
pathway. Considering the presence of several putative membrane
proteins that contain type I signal peptides, the absence of a
signal peptidase I (lepB) is most surprising. A direct electronic
search for the M. genitalium lepB gene using the E. coli lepB and
the B. subtilis sipS (van Dijl, J. M. et al., EMBO J. 11:2819
(1992)) as queries did not reveal any significant similarities.
[0257] There are a number of possible explanations as to why genes
encoding some of the proteins thought to be essential for a
self-replicating organism appear to be absent in M. genitalium. One
possibility is that a limited number of proteins may have adapted
to take on other functions. A second possibility is that certain
proteins thought to be essential for life based on studies in E.
coli are not required in a simpler prokaryote like M. genitalium.
Finally, it may be that sequences from M. genitalium have such a
low similarity to known sequences from other species that matches
are not detectable above a reasonable confidence threshold.
[0258] Determination of the complete genome sequence of M.
genitalium provides a new starting point in understanding the
biology of this and related organisms. Comparison of the genes
expressed in M. genitalium, a simple prokaryote, with those in H.
influenzae, a more complex organism, has revealed a myriad of
differences between these species. Fifty-six percent of the genes
in M. genitalium have apparent isologs in H. influenzae, suggesting
that this subset of the M. genitalium genome may encode the genes
that are truly essential for a self-replicating organism. Notable
among the genes that are conserved between M. genitalium and H.
influenzae are those involved in DNA replication and repair,
transcription and translation, cell division, and basic energy
metabolism via glycolysis. Isologs of these genes are found in
eukaryotes as well.
EXAMPLE 2
[0259] Production of an Antibody to a Mycoplasma genitalium
Protein
[0260] Substantially pure protein or polypeptide is isolated from
the transfected or transformed cells using any one of the methods
known in the art. The protein can also be produced in a recombinant
prokaryotic expression system, such as E. coli, or can be
chemically synthesized. Concentration of protein in the final
preparation is adjusted, for example, by concentration on an Amicon
filter device, to the level of a few micrograms/ml. Monoclonal or
polyclonal antibody to the protein can then be prepared as
follows:
[0261] Monoclonal Antibody Production by Hybridoma Fusion
[0262] Monoclonal antibody to epitopes of any of the peptides
identified and isolated as described can be prepared from murine
hybridomas according to the classical method of Kohler, G. and
Milstein, C., Nature 256:495 (1975) or modifications thereof.
Briefly, a mouse is repeatedly inoculated with a few
micrograms of the selected protein over a period of a few weeks.
The mouse is then sacrificed, and the antibody producing cells of
the spleen isolated. The spleen cells are fused by means of
polyethylene glycol with mouse myeloma cells, and the excess
unfused cells destroyed by growth of the system on selective media
comprising aminopterin (HAT media). The successfully fused cells
are diluted and aliquots of the dilution placed in wells of a
microtiter plate where growth of the culture is continued.
Antibody-producing clones are identified by detection of antibody
in the supernatant fluid of the wells by immunoassay procedures,
such as ELISA, as originally described by Engvall, E., Meth.
Enzymol. 70:419 (1980), and modified methods thereof. Selected
positive clones can be expanded and their monoclonal antibody
product harvested for use. Detailed procedures for monoclonal
antibody production are described in Davis, L. et al. Basic Methods
in Molecular Biology Elsevier, New York. Section 21-2 (1989).
[0263] Polyclonal Antibody Production by Immunization
[0264] Polyclonal antiserum containing antibodies to heterogeneous
epitopes of a single protein can be prepared by immunizing suitable
animals with the expressed protein described above, which can be
unmodified or modified to enhance immunogenicity. Effective
polyclonal antibody production is affected by many factors related
both to the antigen and the host species. For example, small
molecules tend to be less immunogenic than others and may require
the use of carriers and adjuvants. Also, host animals vary in
response to site of inoculation and dose, with both inadequate and
excessive doses of antigen resulting in low titer antisera. Small
doses (ng level) of antigen administered at multiple intradermal
sites appear to be most reliable. An effective immunization
protocol for rabbits can be found in Vaitukaitis, J. et al., J.
Clin. Endocrinol. Metab. 33:988-991 (1971).
[0265] Booster injections can be given at regular intervals, and
antiserum harvested when antibody titer thereof, as determined
semi-quantitatively, for example, by double immunodiffusion in
agar against known concentrations of the antigen, begins to fall
(See Ouchterlony, O. et al., Chap. 19 in: Handbook of Experimental
Immunology, Weir, D., ed., Blackwell (1973)). Plateau concentration
of antibody is usually in the range of 0.1 to 0.2 mg/ml of serum
(about 12 μM). Affinity of the antisera for the antigen is
determined by preparing competitive binding curves, as described,
for example, by Fisher, D., Chap. 42 in: Manual of Clinical
Immunology, second edition, Rose and Friedman, (eds.), Amer. Soc.
For Microbio., Washington, D.C. (1980).
[0266] Antibody preparations prepared according to either protocol
are useful in quantitative immunoassays which determine
concentrations of antigen-bearing substances in biological samples;
they are also used semi-quantitatively or qualitatively to identify
the presence of antigen in a biological sample.
EXAMPLE 3
[0267] Preparation of PCR Primers and Amplification of DNA
[0268] Various fragments of the Mycoplasma genitalium genome, such
as those disclosed in Tables 1a, 1b, 1c and 2 can be used, in
accordance with the present invention, to prepare PCR primers for a
variety of uses. The PCR primers are preferably at least 15 bases,
and more preferably at least 18 bases in length. When selecting a
primer sequence, it is preferred that the primer pairs have
approximately the same G/C ratio, so that melting temperatures are
approximately the same. The PCR primers and amplified DNA of this
Example find use in the examples that follow.
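The primer criteria above (at least 15 bases, preferably at least 18, and similar G/C content within a pair) can be checked mechanically. The following is a minimal sketch; the function names, the 0.10 tolerance on G/C fraction and the example primers are assumptions for illustration, and the melting temperature uses the simple Wallace rule (2 degrees per A/T, 4 degrees per G/C), which is only a rough estimate for short oligonucleotides.

```python
def gc_fraction(primer):
    """Fraction of bases in the primer that are G or C."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer):
    """Rough melting temperature in degrees C by the Wallace rule."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def check_primer_pair(fwd, rev, min_len=15, preferred_len=18, max_gc_diff=0.10):
    """Summarize whether a primer pair meets the length and G/C guidelines.

    max_gc_diff is a hypothetical tolerance, not a value from the text.
    """
    shortest = min(len(fwd), len(rev))
    return {
        "long_enough": shortest >= min_len,
        "preferred_length": shortest >= preferred_len,
        "gc_matched": abs(gc_fraction(fwd) - gc_fraction(rev)) <= max_gc_diff,
        "tm_difference": abs(wallace_tm(fwd) - wallace_tm(rev)),
    }


if __name__ == "__main__":
    # Hypothetical primers, not taken from SEQ ID NO: 1.
    print(check_primer_pair("ATGCATGCATGCATGCAT", "TACGTACGTACGTACGTA"))
```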
EXAMPLE 4
[0269] Gene Expression from DNA Sequences Corresponding to ORFs
[0270] A fragment of the Mycoplasma genitalium genome provided in
Tables 1a, 1b, 1c and 2 is introduced into an expression vector
using conventional technology (techniques to transfer cloned
sequences into expression vectors that direct protein translation
in mammalian, yeast, insect or bacterial expression systems are
well known in the art). Commercially available vectors and
expression systems are available from a variety of suppliers
including Stratagene (La Jolla, Calif.), Promega (Madison, Wis.),
and Invitrogen (San Diego, Calif.). If desired, to enhance
expression and facilitate proper protein folding, the codon context
and codon pairing of the sequence may be optimized for the
particular expression organism, as explained by Hatfield et al.,
U.S. Pat. No. 5,082,767, which is hereby incorporated by
reference.
[0271] The following is provided as one exemplary method to
generate polypeptide(s) from cloned ORFs of the Mycoplasma genome
fragment. Since the ORF lacks a poly A sequence because of the
bacterial origin of the ORF, this sequence can be added to the
construct by, for example, splicing out the poly A sequence from
pSG5 (Stratagene) using BglI and SalI restriction endonuclease
enzymes and incorporating it into the mammalian expression vector
pXT1 (Stratagene) for use in eukaryotic expression systems. pXT1
contains the LTRs and a portion of the gag gene from Moloney Murine
Leukemia Virus. The position of the LTRs in the construct allows
efficient stable transfection. The vector includes the Herpes
Simplex thymidine kinase promoter and the selectable neomycin gene.
The Mycoplasma DNA is obtained by PCR from the bacterial vector
using oligonucleotide primers complementary to the Mycoplasma DNA
and containing restriction endonuclease sequences for PstI
incorporated into the 5' primer and BglII at the 5' end of the
corresponding Mycoplasma DNA 3' primer, taking care to ensure that
the Mycoplasma DNA is positioned such that it is followed by the
poly A sequence. The purified fragment obtained from the resulting
PCR reaction is digested with PstI, blunt ended with an
exonuclease, digested with BglII, purified and ligated to pXT1, now
containing a poly A sequence and digested with BglII.
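The primer design in this paragraph, with a PstI site on the 5' primer and a BglII site at the 5' end of the 3' primer, can be sketched as follows. The recognition sequences (CTGCAG for PstI, AGATCT for BglII) are standard; the function names, the 18-base annealing length and the toy ORF are assumptions, and a practical design would usually also add a few extra bases 5' of each site so that the enzymes cut efficiently near the fragment end.

```python
PSTI_SITE = "CTGCAG"   # PstI recognition sequence
BGLII_SITE = "AGATCT"  # BglII recognition sequence

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence containing only A, C, G, T."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def design_cloning_primers(orf, anneal_len=18):
    """Return (forward, reverse) primers carrying PstI and BglII sites.

    The forward primer anneals to the start of the ORF; the reverse primer
    is the reverse complement of the ORF end, so that after amplification
    the BglII site lies downstream of the coding sequence.
    """
    orf = orf.upper()
    if len(orf) < anneal_len:
        raise ValueError("ORF is shorter than the annealing length")
    forward = PSTI_SITE + orf[:anneal_len]
    reverse = BGLII_SITE + reverse_complement(orf[-anneal_len:])
    return forward, reverse


if __name__ == "__main__":
    # Hypothetical toy ORF, not a sequence from Tables 1a, 1b, 1c or 2.
    toy_orf = "ATGAAACCCGGGTTTAAACCCGGGTAA"
    fwd, rev = design_cloning_primers(toy_orf)
    print(fwd)
    print(rev)
```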
[0272] The ligated product is transfected into mouse NIH 3T3 cells
using Lipofectin (Life Technologies, Inc., Grand Island, N.Y.)
under conditions outlined in the product specification. Positive
transfectants are selected after growing the transfected cells in
600 μg/ml G418 (Sigma, St. Louis, Mo.). The protein is
preferably released into the supernatant. However, if the protein
has membrane binding domains, the protein may additionally be
retained within the cell or expression may be restricted to the
cell surface.
[0273] Since it may be necessary to purify and locate the
transfected product, synthetic 15-mer peptides synthesized from the
predicted Mycoplasma DNA sequence are injected into mice to
generate antibody to the polypeptide encoded by the Mycoplasma
DNA.
[0274] If antibody production is not possible, the Mycoplasma DNA
sequence is additionally incorporated into eukaryotic expression
vectors and expressed as a chimeric with, for example,
β-globin. Antibody to β-globin is used to purify the
chimeric. Corresponding protease cleavage sites engineered between
the β-globin gene and the Mycoplasma DNA are then used to
separate the two polypeptide fragments from one another after
translation. One useful expression vector for generating
β-globin chimerics is pSG5 (Stratagene). This vector encodes
rabbit β-globin. Intron II of the rabbit β-globin gene
facilitates splicing of the expressed transcript, and the
polyadenylation signal incorporated into the construct increases
the level of expression. These techniques as described are well
known to those skilled in the art of molecular biology. Standard
methods are published in methods texts such as Davis et al. and
many of the methods are available from the technical assistance
representatives from Stratagene, Life Technologies, Inc., or
Promega. Polypeptide may additionally be produced from either
construct using in vitro translation systems such as the In vitro
Express™ Translation Kit (Stratagene).
[0275] While the present invention has been described in some
detail for purposes of clarity and understanding, one skilled in
the art will appreciate that various changes in form and detail
can be made without departing from the true scope of the
invention.
[0276] All patents, patent applications and publications recited
herein are hereby incorporated by reference.
TABLE 1(a)
UID | end5 | end3 | db_match | db_match name | per_id | per_sim | gene_len
MG006 | 8552 | 9181 | SP: P00572 | thymidylate kinase (CDC8) {Saccharomyces cerevisiae} | 27.5862 | 51.7241 | 630
MG009 | 11252 | 12037 | GB: D26185_102 | hypothetical protein (GB: D26185_102) {Bacillus subtilis} | 35.4331 | 55.1181 | 786
MG010 | 12069 | 12722 | SP: P33655 | DNA primase (dnaE) {Clostridium acetobutylicum} | 25.731 | 53.2164 | 654
MG012 | 14247 | 13573 | SP: P17116 | ribosomal protein S6 modification protein (rimK) {Escherichia coli} | 31.4961 | 54.3307 | 675
MG013 | 15217 | 14399 | GB: D10588_1 | 5,10-methylene-tetrahydrofolate dehydrogenase (folD) {Escherichia coli} | 33.0472 | 53.2189 | 819
MG015 | 17474 | 19240 | SP: P27299 | transport ATP-binding protein (msbA) {Escherichia coli} | 32.2382 | 57.4949 | 1767
MG023 | 26478 | 27341 | GB: M22039_4 | fructose-bisphosphate aldolase (tsr) {Bacillus subtilis} | 45.9649 | 65.9649 | 864
MG024 | 27345 | 28445 | GP: U02423_1 | GTP-binding protein (gtpl) {Escherichia coli} | 46.8401 | 67.658 | 1101
MG032 | 36978 | 38975 | GB: M63489_1 | ATP-dependent nuclease (addA) {Bacillus subtilis} | 26.8293 | 54.2683 | 1998
MG033 | 39242 | 39901 | GB: M99611_2 | glycerol uptake facilitator (glpF) {Bacillus subtilis} | 35.8974 | 55.3846 | 660
MG034 | 40514 | 39876 | GB: M97678_5 | thymidine kinase (tdk) {Bacillus subtilis} | 48.1283 | 69.5187 | 639
MG035 | 40543 | 41784 | GB: U00011_2 | histidyl-tRNA synthetase (hisS) {Mycobacterium leprae} | 30.7107 | 50.7614 | 1242
MG038 | 46277 | 44754 | GB: L19201_68 | glycerol kinase (glpK) {Escherichia coli} | 46.8254 | 70.2381 | 1524
MG039 | 47422 | 46271 | PIR: S48379 | glycerol-3-phosphate dehydrogenase (GUT2) {Saccharomyces cerevisiae} | 43.2099 | 60.4938 | 1152
MG041 | 49377 | 49640 | GB: L22432_2 | phosphohistidinoprotein-hexose phosphotransferase (ptsH) {Mycoplasma capricolum} | 48.8636 | 70.4545 | 264
MG042 | 50060 | 51517 | GB: M64519_1 | spermidine/putrescine transport ATP-binding protein (potA) {Escherichia coli} | 41.9231 | 65.3846 | 1458
MG043 | 51525 | 52379 | GB: M64519_2 | spermidine/putrescine transport system permease protein (potB) {Escherichia coli} | 26.5116 | 57.2093 | 855
MG044 | 52366 | 53217 | GB: M64519_3 | spermidine/putrescine transport system permease protein (potC) {Escherichia coli} | 29.4574 | 58.1395 | 852
MG046 | 54658 | 55602 | GB: M62364_1 | sialoglycoprotease (gcp) {Pasteurella haemolytica} | 36.6013 | 59.4771 | 945
MG048 | 58310 | 56973 | SP: P37105 | signal recognition particle protein (ffh) {Bacillus subtilis} | 43.0206 | 66.1327 | 1338
MG049 | 58117 | 59076 | GB: U14003_295 | purine-nucleoside phosphorylase (deoD) {Escherichia coli} | 44.7826 | 63.0435 | 960
MG050 | 59083 | 59751 | GB: X13544_1 | deoxyribose-phosphate aldolase (deoC) {Mycoplasma pneumoniae} | 83.0357 | 91.5179 | 669
MG056 | 65731 | 64901 | GB: D26185_99 | hypothetical protein (GB: D26185_99) {Bacillus subtilis} | 30.2583 | 54.6125 | 831
MG057 | 66249 | 65716 | GB: D26185_104 | hypothetical protein (GB: D26185_104) {Bacillus subtilis} | 28.9017 | 28.9017 | 534
MG067 | 81047 | 82594 | GB: D00730_1 | glutamic acid specific protease (SPase) {Staphylococcus aureus} | 28.8462 | 48.0769 | 1548
MG070 | 91065 | 91916 | SP: P34831 | ribosomal protein S2 (rpS2) {Spirulina platensis} | 34.8 | 55.2 | 852
MG077 | 103104 | 104324 | SP: P24138 | oligopeptide transport system permease protein (oppB) {Bacillus subtilis} | 28.0528 | 58.4158 | 1221
MG078 | 104320 | 105447 | SP: P26904 | oligopeptide transport system permease protein (dciAC) {Bacillus subtilis} | 33.4572 | 55.0186 | 1128
MG079 | 105452 | 106657 | SP: P18765 | oligopeptide transport ATP-binding protein (amiE) {Streptococcus pneumoniae} | 47.9412 | 67.9412 | 1206
MG081 | 109262 | 109672 | SP: P29395 | ribosomal protein L11 (RPL11) {Thermotoga maritima} | 51.7986 | 71.9424 | 411
MG085 | 111790 | 112722 | PIR: S24760 | hydroxymethylglutaryl-CoA reductase (NADPH) {Nicotiana sylvestris} | 23.3216 | 49.1166 | 933
MG086 | 112718 | 113863 | GB: L13259_2 | prolipoprotein diacylglyceryl transferase (lgt) {Salmonella typhimurium} | 29.1262 | 53.8835 | 1146
MG091 | 117553 | 118032 | GB: U04997_2 | single-stranded DNA binding protein (ssb) {Haemophilus influenzae} | 21.7949 | 41.6667 | 480
MG092 | 118025 | 118339 | GB: U14003_114 | ribosomal protein S18 (rpS18) {Escherichia coli} | 45.4545 | 68.1818 | 315
MG093 | 118345 | 118794 | GB: M57623_1 | ribosomal protein L9 (rpL9) {Bacillus stearothermophilus} | 32.8859 | 56.3758 | 450
MG099 | 125852 | 127282 | GB: M61151_1 | hydrolase (aux2) {Agrobacterium rhizogenes} | 32.1212 | 51.8182 | 1431
MG106 | 134826 | 134149 | SP: P27251 | formylmethionine deformylase (def) {Escherichia coli} | 36.9369 | 68.4685 | 678
MG107 | 134558 | 135334 | GB: L10328_14 | 5'guanylate kinase (gmk) {Escherichia coli} | 42.623 | 65.0273 | 777
MG114 | 141345 | 142052 | GB: M12299_2 | phosphatidylglycerophosphate synthase (pgsA) {Escherichia coli} | 29.2994 | 57.3248 | 708
MG118 | 143935 | 144954 | SP: P09147 | UDP-glucose 4-epimerase (galE) {Escherichia coli} | 34.0557 | 53.87 | 1020
MG121 | 148238 | 149155 | SP: P32720 | hypothetical protein (SP: P32720) {Escherichia coli} | 30.8824 | 50.7353 | 918
MG125 | 153081 | 153935 | GB: L10328_61 | hypothetical protein (GB: L10328_61) {Escherichia coli} | 31.9149 | 48.227 | 855
MG126 | 154962 | 153922 | GB: M24068_1 | tryptophanyl-tRNA synthetase (trpS) {Bacillus subtilis} | 41.1585 | 61.5854 | 1041
MG127 | 154998 | 155432 | SP: P19434 | hypothetical protein (SP: P19434) {Streptomyces viridochromogenes} | 25.9615 | 49.0385 | 435
MG128 | 155443 | 156219 | GB: U00021_19 | hypothetical protein (GB: U00021_19) {Mycobacterium leprae} | 27.7027 | 49.3243 | 777
MG129 | 156222 | 156572 | GB: U12340_1 | PTS glucose-specific permease {Bacillus stearothermophilus} | 25.4545 | 51.8182 | 351
MG130 | 156565 | 158016 | GB: M91593_1 | hypothetical protein (GB: M91593_1) {Mycoplasma mycoides} | 30.6773 | 55.7769 | 1452
MG131 | 158022 | 158243 | GB: M31161_3 | hypothetical protein (GB: M31161_3) {Spiroplasma citri} | 21.5909 | 56.8182 | 222
MG132 | 159005 | 158583 | SP: P32083 | hypothetical protein (SP: P32083) {Mycoplasma hyorhinis} | 30.0971 | 56.3107 | 423
MG136 | 160964 | 162431 | GB: D26185_144 | lysyl-tRNA synthetase (lysS) {Bacillus subtilis} | 45.6212 | 68.4318 | 1470
MG137 | 162376 | 163587 | GP: L41518_4 | dTDP-4-dehydrorhamnose reductase (rfbD) {Klebsiella pneumoniae} | 32.1622 | 55.9459 | 1212
MG139 | 165470 | 167176 | GB: L18927_2 | hypothetical protein (GB: L18927_2) {Buchnera aphidicola} | 28.5714 | 62.8571 | 1707
MG143 | 182853 | 183188 | SP: P09170 | hypothetical protein (SP: P09170) {Escherichia coli} | 25 | 53.7037 | 336
MG145 | 184055 | 184861 | GB: M35367_1 | protein X {Pseudomonas fluorescens} | 29.0698 | 48.4496 | 807
MG148 | 187304 | 188530 | GB: L18965_6 | hypothetical protein (GB: L18965_6) {Thermophilic bacterial sp.} | 25.2874 | 52.8736 | 1227
MG150 | 190048 | 190365 | SP: P38518 | ribosomal protein S10 (rpS10) {Thermotoga maritima} | 48.913 | 71.7391 | 318
MG152 | 191145 | 191777 | SP: P28601 | ribosomal protein L4 (rpL4) {Bacillus stearothermophilus} | 39.2345 | 63.1579 | 633
MG153 | 191784 | 192101 | SP: P04454 | ribosomal protein L23 (rpL23) {Bacillus stearothermophilus} | 38.7097 | 62.3656 | 318
MG154 | 192104 | 192958 | SP: P04257 | ribosomal protein L2 (rpL2) {Bacillus stearothermophilus} | 58.7814 | 72.4014 | 855
MG155 | 192961 | 193221 | GB: X02613_6 | ribosomal protein S19 (rpS19) {Escherichia coli} | 58.6207 | 77.0115 | 261
MG156 | 193227 | 193658 | GB: M74770_4 | ribosomal protein L22 (rpL22) {Mycoplasma-like organism} | 49.0385 | 67.3077 | 432
MG157 | 193664 | 194467 | SP: P02353 | ribosomal protein S3 (rpS3) {Mycoplasma capricolum} | 46.729 | 67.2897 | 804
MG158 | 194476 | 194889 | SP: P02415 | ribosomal protein L16 (rpL16) {Mycoplasma capricolum} | 63.5037 | 78.1022 | 414
MG159 | 194892 | 195491 | SP: P38514 | ribosomal protein L29 (rpL29) {Thermotoga maritima} | 41.6667 | 65 | 600
MG160 | 195494 | 195748 | SP: P10131 | ribosomal protein S17 (rpS17) {Mycoplasma capricolum} | 51.1905 | 67.8571 | 255
MG161 | 195755 | 196120 | SP: P04450 | ribosomal protein L14 (rpL14) {Bacillus stearothermophilus} | 63.1148 | 86.0656 | 366
MG162 | 196123 | 196446 | SP: P04455 | ribosomal protein L24 (rpL24) {Bacillus stearothermophilus} | 44.5783 | 66.2651 | 324
MG163 | 196455 | 196994 | SP: P08895 | ribosomal protein L5 (rpL5) {Bacillus stearothermophilus} | 57.5419 | 77.095 | 540
MG164 | 197000 | 197182 | GB: X06414_15 | ribosomal protein S14 (rpS14) {Mycoplasma capricolum} | 70.4918 | 83.6066 | 183
MG165 | 197179 | 197601 | SP: P04446 | ribosomal protein S8 (rpS8) {Mycoplasma capricolum} | 46.875 | 71.0938 | 423
MG166 | 197611 | 198162 | SP: P04448 | ribosomal protein L6 (rpL6) {Mycoplasma capricolum} | 46.9945 | 66.6667 | 552
MG167 | 198167 | 198511 | GB: M57624_1 | ribosomal protein L18 (rpL18) {Bacillus stearothermophilus} | 42.9825 | 57.8947 | 345
MG169 | 199160 | 199609 | SP: P10138 | ribosomal protein L15 (rpL15) {Mycoplasma capricolum} | 41.8919 | 66.2162 | 450
MG170 | 199612 | 201036 | SP: P10250 | preprotein translocase secY subunit (secY) {Mycoplasma capricolum} | 38.7892 | 68.1614 | 1425
MG171 | 201033 | 201674 | GB: M88104_2 | adenylate kinase (adk) {Bacillus stearothermophilus} | 32.2115 | 57.6923 | 642
MG172 | 201680 | 202423 | GB: D00619_5 | methionine amino peptidase (map) {Bacillus subtilis} | 36.2903 | 58.4677 | 744
MG173 | 202426 | 202635 | GB: M26414_1 | initiation factor 1 (infA) {Bacillus subtilis} | 48.5294 | 67.6471 | 210
MG174 | 202649 | 202759 | SP: P38015 | ribosomal protein L36 (rpL36) {Chlamydia trachomatis} | 78.3784 | 83.7838 | 111
MG177 | 203516 | 204499 | GB: M26414_5 | RNA polymerase alpha core subunit (rpoA) {Bacillus subtilis} | 39.3939 | 65.9933 | 984
MG178 | 204515 | 204515 | GB: M26414_6 | ribosomal protein L17 (rpL17) {Bacillus subtilis} | 34.7826 | 59.1304 | 369
MG179 | 204873 | 205694 | SP: P11599 | haemolysin secretion ATP-binding protein (hlyB) {Proteus vulgaris} | 34.5992 | 62.0253 | 822
MG187 | 216762 | 218516 | GB: M77351_7 | ATP-binding protein (msmK) {Streptococcus mutans} | 40.5325 | 65.6805 | 1755
MG188 | 218522 | 219508 | GB: M77351_4 | membrane protein (msmF) {Streptococcus mutans} | 22.4719 | 51.6854 | 987
MG189 | 219435 | 220436 | GB: M77351_5 | membrane protein (msmG) {Streptococcus mutans} | 27.1429 | 52.8571 | 1002
MG196 | 235635 | 236057 | GB: X16188_1 | translation initiation factor IF3 (infC) {Bacillus stearothermophilus} | 31.3433 | 62.6866 | 423
MG197 | 236063 | 236239 | PIR: S05347 | ribosomal protein L35 (rpL35) {Bacillus stearothermophilus} | 60 | 72.7273 | 177
MG198 | 236245 | 236616 | SP: Q05427 | ribosomal protein L20 (rpL20) {Mycoplasma fermentans} | 57.5221 | 73.4513 | 372
MG201 | 239163 | 239813 | GB: M84964_2 | heat shock protein (grpE) {Bacillus subtilis} | 31.677 | 49.6894 | 651
MG205 | 245596 | 244568 | GB: M84964_1 | hypothetical protein (GB: M84964_1) {Bacillus subtilis} | 30.9942 | 58.1871 | 1029
MG213 | 252579 | 253991 | GB: L09228_16 | hypothetical protein (GB: L09228_16) {Bacillus subtilis} | 27.1186 | 54.661 | 1413
MG214 | 253978 | 254598 | GB: L09228_17 | hypothetical protein (GB: L09228_17) {Bacillus subtilis} | 34.8571 | 59.4286 | 621
MG215 | 254620 | 255588 | SP: P20275 | 6-phosphofructokinase (pfk) {Spiroplasma citri} | 39.441 | 63.0435 | 969
MG217 | 258040 | 259155 | SP: P29126 | bifunctional endo-1,4-beta-xylanase xyla precursor (xynA) {Ruminococcus flavefaciens} | 37.5839 | 48.9933 | 1116
MG219 | 265596 | 266039 | GB: M87491_1 | IgA1 protease {Haemophilus influenzae} | 32.2314 | 51.2397 | 444
MG220 | 266382 | 266077 | GB: Z26883_1 | pre-procytotoxin (vacA) {Helicobacter pylori} | 36.1446 | 51.8072 | 306
MG222 | 267080 | 268006 | GB: D10483_63 | hypothetical protein (GB: D10483_63) {Escherichia coli} | 35.1974 | 56.5789 | 927
MG224 | 269249 | 270355 | GB: U06462_1 | cell division protein (ftsZ) {Staphylococcus aureus} | 30.8824 | 50.7353 | 1107
MG234 | 279491 | 279802 | GB: K02665_2 | ribosomal protein L27 (rpL27) {Bacillus subtilis} | 64.3678 | 80.4598 | 312
MG235 | 279798 | 280670 | SP: P12638 | endonuclease IV (nfo) {Escherichia coli} | 29.368 | 51.3011 | 873
MG245 | 293446 | 293940 | GB: M12965_1 | hypothetical protein (GB: M12965_1) {Escherichia coli} | 33.8462 | 56.9231 | 495
MG247 | 295484 | 294768 | SP: P31056 | hypothetical protein (SP: P31056) {Escherichia coli} | 32.973 | 56.2162 | 717
MG248 | 296127 | 295474 | GP: U17284_2 | major sigma factor (rpoD) {Listeria monocytogenes} | 28.4848 | 51.5152 | 654
MG251 | 300802 | 299465 | GB: L08106_1 | glycyl-tRNA synthetase {Bombyx mori} | 35.8974 | 56.1772 | 1338
MG252 | 301550 | 300825 | GP: Z33076_2 | rRNA methylase {Mycoplasma capricolum} | 38.8626 | 59.7156 | 726
MG253 | 302839 | 301556 | GB: D26185_156 | cysteinyl-tRNA synthetase (cysS) {Bacillus subtilis} | 34.3458 | 56.3084 | 1284
MG257 | 307635 | 307925 | GB: L19201_78 | ribosomal protein L31 (rpL31) {Escherichia coli} | 37.3134 | 61.194 | 291
MG258 | 307928 | 309004 | GB: M11519_1 | peptide chain release factor 1 (RF-1) {Escherichia coli} | 43.1677 | 66.4596 | 1077
MG259 | 309008 | 310375 | GB: D28567_2 | protoporphyrinogen oxidase (hemK) {Escherichia coli} | 30.5732 | 54.1401 | 1368
MG260 | 310509 | 312803 | GB: Z32651_1 | hypothetical protein (GB: Z32651_1) {Mycoplasma pneumoniae} | 57.1429 | 71.4286 | 2295
MG262 | 318330 | 319202 | GB: L11920_1 | DNA polymerase I (polI) {Mycobacterium tuberculosis} | 29.9419 | 47.9651 | 873
MG264 | 321044 | 321637 | GB: M64324_1 | 6-phosphogluconate dehydrogenase (gnd) {Escherichia coli} | 29.8507 | 47.7612 | 594
MG265 | 322412 | 321579 | GB: L10328_61 | hypothetical protein (GB: L10328_61) {Escherichia coli} | 27.193 | 48.6842 | 834
MG268 | 325877 | 325194 | GB: U01881_2 | deoxyguanosine/deoxyadenosine kinase(I) subunit 2 {Lactobacillus acidophilus} | 29.5181 | 49.3976 | 684
MG270 | 328442 | 327435 | GB: U14003_297 | hypothetical protein (GB: U14003_297) {Escherichia coli} | 38.2838 | 57.7558 | 1008
MG272 | 330984 | 329833 | GB: M81753_3 | dihydrolipoamide acetyltransferase (pdhC) {Acholeplasma laidlawii} | 45.1524 | 62.0499 | 1152
MG273 | 332214 | 331237 | GB: M81753_2 | pyruvate dehydrogenase E1-beta subunit (pdhB) {Acholeplasma laidlawii} | 55.0314 | 76.7296 | 978
MG274 | 333308 | 332235 | GB: M81753_1 | pyruvate dehydrogenase E1-alpha subunit (pdhA) {Acholeplasma laidlawii} | 42.9825 | 61.1111 | 1074
MG277 | 338323 | 335414 | GB: L16960_2 | spore germination apparatus protein (gerBB) {Bacillus subtilis} | 31.2 | 55.2 | 2910
MG280 | 341920 | 341177 | GB: Z35086_1 | sensory rhodopsin II transducer (htrII) {Natronobacterium pharaonis} | 15.7143 | 46.6667 | 744
MG288 | 353034 | 351793 | GB: L04466_1 | protein L {Peptostreptococcus magnus} | 31.1475 | 50.8197 | 1242
MG290 | 355119 | 355853 | SP: P15361 | ATP-binding protein P29 {Mycoplasma hyorhinis} | 32.3009 | 58.8496 | 735
MG292 | 360592 | 357893 | GB: J01581_1 | alanyl-tRNA synthetase (alaS) {Escherichia coli} | 33.8403 | 55.64 | 2700
MG295 | 364022 | 362922 | SP: P25745 | hypothetical protein (SP: P25745) {Escherichia coli} | 34.7107 | 57.0248 | 1101
MG299 | 369694 | 368735 | SP: P39646 | phosphotransacetylase (pta) {Clostridium acetobutylicum} | 44.6541 | 63.522 | 960
MG303 | 373998 | 372928 | GB: M61017_1 | membrane transport protein (glnQ) {Bacillus stearothermophilus} | 31.982 | 54.955 | 1071
MG304 | 374741 | 373983 | GB: U13043_1 | membrane associated ATPase (cbiO) {Propionibacterium freudenreichii} | 30.0448 | 53.8117 | 759
MG310 | 386462 | 387265 | GB: D11037_1 | proline iminopeptidase (pip) {Bacillus coagulans} | 29.2079 | 51.4851 | 804
MG311 | 387892 | 387278 | GB: M59358_1 | ribosomal protein S4 (rpS4) {Bacillus subtilis} | 43 | 65.5 | 615
MG313 | 392023 | 391397 | GP: L38997_5 | cytadherence-accessory protein (hmwl) {Mycoplasma pneumoniae} | 53.8462 | 79.8077 | 627
MG315 | 394550 | 393660 | GP: L38997_3 | cytadherence accessory protein (hmwl) {Mycoplasma pneumoniae} | 44.3878 | 69.898 | 891
MG316 | 395583 | 394477 | GB: L15202_4 | competence locus E (comE3) {Bacillus subtilis} | 30.4933 | 52.4664 | 1107
MG322 | 405398 | 403725 | GB: D17462_11 | Na+ATPase subunit J (ntpJ) {Enterococcus hirae} | 31.0811 | 56.3063 | 1674
MG323 | 405455 | 406135 | GB: D37799_6 | hypothetical protein (GB: D37799_6) {Bacillus subtilis} | 27.5701 | 54.2056 | 681
MG325 | 408953 | 408795 | SP: P23375 | ribosomal protein L33 (rpL33) {Bacillus stearothermophilus} | 58.1395 | 69.7674 | 159
MG326 | 409857 | 408973 | GB: Z18629_1 | hypothetical protein (GB: Z18629_1) {Bacillus subtilis} | 27.0758 | 52.7076 | 885
MG329 | 414318 | 412975 | GB: U00021_5 | hypothetical protein (GB: U00021_5) {Mycobacterium leprae} | 32.1839 | 54.2529 | 1344
MG332 | 416329 | 415613 | GB: D10165_3 | hypothetical protein (GB: D10165_3) {Escherichia coli} | 26.9231 | 49.1453 | 717
MG346 | 443922 | 444419 | GB: M65289_3 | hypothetical protein (GB: M65289_3) {Bacillus stearothermophilus} | 37.9747 | 60.1266 | 498
MG347 | 444413 | 445042 | SP: P32049 | hypothetical protein (SP: P32049) {Escherichia coli} | 28.4615 | 46.9231 | 630
MG351 | 449665 | 450216 | SP: P37981 | inorganic pyrophosphatase (ppa) {Thermoplasma acidophilum} | 38.8535 | 61.7834 | 552
MG355 | 453757 | 451616 | GB: M29364_2 | ATP-dependent protease binding subunit (clpB) {Escherichia coli} | 47.7337 | 70.6799 | 2142
MG356 | 454753 | 453914 | GB: M27280_1 | lic-1 operon protein (licA) {Haemophilus influenzae} | 27.7778 | 56.25 | 840
MG359 | 457347 | 458267 | GB: M21298_2 | Holliday junction DNA helicase (ruvB) {Escherichia coli} | 34.6939 | 64.966 | 921
MG360 | 459495 | 458263 | SP: P14303 | UV protection protein (mucB) {Salmonella typhimurium} | 22.0859 | 48.1595 | 1233
MG363 | 460497 | 460667 | GB: M29698_2 | ribosomal protein L32 (rpL32) {Escherichia coli} | 48.1481 | 62.963 | 171
MG364 | 461015 | 461686 | GB: M95954_1 | mobilization protein (mobl3) {Leuconostoc oenos} | 30.8725 | 53.6913 | 672
MG367 | 465434 | 464649 | GB: X02673_1 | ribonuclease III (rnc) {Escherichia coli} | 30.1724 | 65.5172 | 786
MG380 | 478999 | 479574 | GB: L10328_105 | glucose inhibited division protein (gidB) {Escherichia coli} | 24.8276 | 51.7241 | 576
MG382 | 480691 | 481329 | SP: P31218 | uridine kinase (udk) {Escherichia coli} | 34.4828 | 62.5616 | 639
MG383 | 482075 | 481332 | GB: M15811_1 | sporulation protein (outB) {Bacillus subtilis} | 36.3636 | 54.9784 | 744
MG384 | 483369 | 482071 | GB: M24537_2 | GTP-binding protein (obg) {Bacillus subtilis} | 39.627 | 62.0047 | 1299
MG387 | 490711 | 489842 | SP: P37214 | GTP-binding protein era homolog (spg) {Streptococcus mutans} | 27.3859 | 51.0373 | 870
MG396 | 500719 | 500264 | GB: M80797_2 | galactosidase acetyltransferase (lacA) {Streptococcus mutans} | 40.5797 | 57.971 | 456
MG398 | 502823 | 502425 | SP: P33255 | ATP synthase epsilon chain (atpC) {Mycoplasma gallisepticum} | 36.9231 | 55.3846 | 399
MG402 | 507201 | 506674 | SP: P33254 | ATP synthase delta chain (atpH) {Mycoplasma gallisepticum} | 33.9181 | 58.4795 | 528
MG403 | 507820 | 507197 | SP: P33256 | ATP synthase B chain (atpF) {Mycoplasma gallisepticum} | 36.5979 | 66.4948 | 624
MG404 | 508131 | 507826 | SP: P33258 | ATP synthase C chain (atpE) {Mycoplasma gallisepticum} | 50 | 74.359 | 306
MG407 | 510836 | 509463 | GB: L29475_4 | enolase (eno) {Bacillus subtilis} | 54.0793 | 74.1259 | 1374
MG408 | 510903 | 511373 | SP: P14930 | pilin repressor (pilB) {Neisseria gonorrhoeae} | 49.2188 | 68.75 | 471
MG409 | 512050 | 511376 | GB: L10328_88 | peripheral membrane protein U (phoU) {Escherichia coli} | 27.027 | 48.6486 | 675
MG420 | 524144 | 523365 | GB: D26185_83 | DNA polymerase III subunit (dnaH) {Bacillus subtilis} | 49.115 | 68.5841 | 780
MG424 | 531479 | 531222 | SP: P05766 | ribosomal protein S15 (BS18) {Bacillus stearothermophilus} | 48.1481 | 71.6049 | 258
MG426 | 533040 | 533231 | GB: L12244_2 | ribosomal protein L28 (rpL28) {Bacillus subtilis} | 36.0656 | 59.0164 | 192
MG429 | 536036 | 534321 | GB: M69050_2 | PEP-dependent HPr protein kinase phosphoryltransferase (ptsI) {Staphylococcus carnosus} | 46.4789 | 66.5493 | 1716
MG430 | 537563 | 536043 | GB: L29475_3 | phosphoglycerate mutase (pgm) {Bacillus subtilis} | 45.1866 | 62.4754 | 1521
MG432 | 539546 | 538353 | SP: P27712 | hypothetical protein (SP: P27712) {Spiroplasma citri} | 28.436 | 48.8152 | 1194
MG433 | 539632 | 540525 | GB: M31161_2 | elongation factor Ts (tsf) {Spiroplasma citri} | 39.0572 | 62.6263 | 894
MG434 | 540848 | 541237 | GB: D26562_56 | mukB suppressor protein (smbA) {Escherichia coli} | 40.8696 | 61.7391 | 390
MG435 | 541240 | 541788 | GB: D26562_57 | ribosome releasing factor (frr) {Escherichia coli} | 34.9112 | 57.3965 | 549
MG438 | 543004 | 544152 | GB: J01631_1 | restriction-modification enzyme EcoD specificity subunit (hsdS) {Escherichia coli} | 24.5734 | 45.7338 | 1149
MG442 | 547690 | 546881 | GB: U00021_5 | hypothetical protein (GB: U00021_5) {Mycobacterium leprae} | 26.8966 | 42.069 | 810
MG443 | 548849 | 547665 | GB: D16311_1 | hypothetical protein (GB: D16311_1) {Bacillus subtilis} | 26.1818 | 52 | 1185
MG444 | 549224 | 548868 | SP: P30529 | ribosomal protein L19 (rpL19) {Bacillus stearothermophilus} | 49.1071 | 69.6429 | 357
MG445 | 549903 | 549211 | SP: P36245 | tRNA (guanine-N1)-methyltransferase (trmD) {Salmonella typhimurium} | 40.8072 | 64.1256 | 693
MG446 | 550172 | 549906 | SP: P21474 | ribosomal protein S16 (BS17) {Bacillus subtilis} | 48.7805 | 64.6341 | 267
MG448 | 552897 | 552448 | GB: Z33052_1 | pilin repressor (pilB) {Mycoplasma capricolum} | 53.4884 | 72.093 | 450
MG454 | 557770 | 557306 | SP: P23929 | osmotically inducible protein (osmC) {Escherichia coli} | 28.4091 | 51.1364 | 465
MG457 | 562602 | 560497 | GB: D26185_132 | cell division protein (ftsH) {Bacillus subtilis} | 49.7445 | 68.1431 | 2106
MG461 | 566203 | 564929 | GB: X73124_94 | hypothetical protein (GB: X73124_94) {Bacillus subtilis} | 40 | 64.2857 | 1275
MG464 | 569554 | 568400 | GB: D14982_3 | hypothetical protein (GB: D14982_3) {Mycoplasma capricolum} | 32.3699 | 53.7572 | 1155
MG465 | 569912 | 569529 | GB: D14982_2 | RNaseP C5 subunit (rnpA) {Mycoplasma capricolum} | 40 | 58.75 | 384
MG466 | 570027 | 569884 | GB: L10328_67 | ribosomal protein L34 (rpL34) {Escherichia coli} | 67.3913 | 80.4348 | 144
MG470 | 580030 | 579224 | GB: D26185_55 | Spo0J regulator {Bacillus subtilis} | 27.8884 | 53.3865 | 807
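For most rows of Table 1(a), the gene_len column equals the inclusive distance between end5 and end3, and rows with end5 greater than end3 appear to lie on the complementary strand. The sketch below encodes that reading; the strand interpretation and the function names are assumptions rather than statements from the specification, and the check simply flags rows whose listed length does not match the coordinates without deciding which value is at fault.

```python
def gene_length_and_strand(end5, end3):
    """Return (length, strand) from 1-based inclusive genome coordinates.

    Assumes end5 > end3 means the gene is read on the complementary strand.
    """
    length = abs(end3 - end5) + 1
    strand = "+" if end5 <= end3 else "-"
    return length, strand


def is_consistent(end5, end3, gene_len):
    """True if the listed gene_len matches the coordinate span."""
    return gene_length_and_strand(end5, end3)[0] == gene_len


if __name__ == "__main__":
    # (UID, end5, end3, gene_len) copied from Table 1(a).
    rows = [
        ("MG006", 8552, 9181, 630),
        ("MG012", 14247, 13573, 675),
        ("MG178", 204515, 204515, 369),
    ]
    for uid, end5, end3, gene_len in rows:
        length, strand = gene_length_and_strand(end5, end3)
        print(uid, length, strand, is_consistent(end5, end3, gene_len))
```

A check of this kind is useful mainly for catching transcription problems in the coordinate columns, such as the MG178 row, where the two listed coordinates are identical.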
[0277]
TABLE 1(b)
UID | end5 | end3 | db_match | db_match name | per_sim | per_id | match_info
MG002 | 1829 | 2758 | SP: P35514 | heat shock protein (dnaJ) {Lactococcus lactis} | 40 | 61.6667 | MG002(1-930 of 930) GB: U09251(298-1227 of 6140)
MG003 | 2846 | 4795 | GB: U09251_3 | DNA gyrase subunit B (gyrB) {Mycoplasma genitalium} | 99.3846 | 99.3846 | MG003(1-1950 of 1950) GB: U09251(1315-3264 of 6140)
MG004 | 4813 | 7320 | GB: U09251_4 | DNA gyrase subunit A (gyrA) {Mycoplasma genitalium} | 99.8804 | 99.8804 | MG004(1-2508 of 2508) GB: U09251(3282-5789 of 6140)
MG191 | 221571 | 225902 | SP: P20796 | attachment protein, MgPa operon (mgp) {Mycoplasma genitalium} | 100 | 100 | MG191(1-4332 of 4332) GB: M31431(1066-5397 of 8760)
MG192 | 225907 | 229062 | SP: P22747 | 114 kDa protein, MgPa operon (mgp) {Mycoplasma genitalium} | 100 | 100 | MG192(1-3156 of 3156) GB: M31431(5402-8557 of 8760)
MG232 | 278904 | 279203 | SP: P26908 | ribosomal protein L21 (rpL21) {Bacillus subtilis} | 37.8947 | 65.2632 | MG232(1-300 of 300) GB: U02141(138-437 of 827)
MG233 | 279199 | 279495 | GP: U02141_2 | ribosomal protein L21 homolog {Mycoplasma genitalium} | 100 | 100 | MG233(1-297 of 297) GB: U02141(433-729 of 827)
MG287 | 348882 | 349133 | SP: P04686 | nodulation protein F (nodE) {Rhizobium leguminosarum} | 34.9398 | 56.6265 | MG287(1-252 of 252) GB: U01810(152-403 of 917)
MG417 | 521868 | 521473 | SP: P07842 | ribosomal protein S9 (rpS9) {Bacillus stearothermophilus} | 51.9685 | 71.6535 | MG417(1-396 of 396) GB: U01744(127-522 of 620)
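For readers handling these tables electronically, the coordinate fields can be checked with a short script. The following is only an illustrative sketch and forms no part of the disclosure: the function names `orf_length` and `parse_span` are hypothetical, coordinates are assumed to be inclusive, and a start value greater than the end value is assumed to indicate the reverse orientation.

```python
import re

# Matches one span of a match_info entry, e.g. "MG002(1-930 of 930)" or
# "U09251(298-1227 of 6140)". Optional spaces around the hyphen are tolerated
# because the flattened table text sometimes contains them.
SPAN_RE = re.compile(
    r"(?P<name>[A-Za-z0-9_]+)\s*\(\s*(?P<start>\d+)\s*-\s*(?P<end>\d+)"
    r"\s+of\s+(?P<total>\d+)\s*\)"
)


def orf_length(end5: int, end3: int) -> int:
    """Length in nucleotides of a feature, assuming inclusive coordinates."""
    return abs(end3 - end5) + 1


def parse_span(text: str) -> dict:
    """Parse the first 'NAME(start-end of total)' span found in text."""
    m = SPAN_RE.search(text)
    if m is None:
        raise ValueError(f"no span found in {text!r}")
    start = int(m.group("start"))
    end = int(m.group("end"))
    return {
        "name": m.group("name"),
        "start": start,
        "end": end,
        "total": int(m.group("total")),
        "aligned": abs(end - start) + 1,  # inclusive aligned length
        "reversed": start > end,          # assumption: start > end means reverse
    }


if __name__ == "__main__":
    # MG002 row of Table 1(b): end5 = 1829, end3 = 2758
    print(orf_length(1829, 2758))
    print(parse_span("MG002(1-930 of 930)"))
    print(parse_span("GB: U09251(298-1227 of 6140)"))
```

Under these assumptions, the length computed from end5 and end3 for MG002 agrees with the "of 930" total in its match_info entry, and the same number of positions is aligned on the GenBank side.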
[0278]
TABLE 1(c) UID end5 end3 db_match db_match name per_sim per_id
match_info MG001 1026 1826 GB: U09251_1 DNA polymerase III beta
subunit (dnaN) 100 100 MG001(507-801 of {Mycoplasma genitalium}
801) GB: U09251(1-295 of 6140) MG005 7295 8545 GB: D26185_77
seryl-tRNA synthetase (serS) {Bacillus subtilis} 42.615 66.3438
MG005(1-377 of 1251) GB: U09251(5764- 6140 of 6140) MG005 7295 8545
GB: D26185_77 seryl-tRNA synthetase (serS) {Bacillus subtilis}
42.615 66.3438 MG005(16-337 of 1251) GB: U02210(1-322 of 322) MG007
9157 9918 GB: D26185_83 DNA polymerase III subunit (dnaH) {Bacillus
subtilis} 22.695 45.3901 MG007(762-711 of 762) GB: U02216(270- 321
of 321) MG008 9924 11249 GB: D26185_60 thiophene and furan oxidizer
(tdhF) {Bacillus subtilis} 31.9101 59.7753 MG008(264-1 of 1326) GB:
U02216(1-264 of 321) MG011 13565 12705 -- -- -- -- MG011(473-767 of
861) GB: U02257(2- 296 of 296) MG014 15556 17424 SP: P27299
transport ATP-binding protein (msbA) {Escherichia coli} 28.0702
52.6316 MG014(1005-678 of 1869) GB: U02235(1- 326 of 326) MG018
21063 22343 SP: P32333 helicase (mot1) {Saccharomyces cerevisiae}
36.6972 60.0917 MG018(1281-1067 of 1281) GB: U01723(89- 304 of 304)
MG018 21063 22343 SP: P32333 helicase (mot1) {Saccharomyces
cerevisiae} 36.6972 60.0917 MG018(409-105 of 1281) GB: U02179(1-
305 of 305) MG018 21063 22343 SP: P32333 helicase (mot1)
{Saccharomyces cerevisiae} 36.6972 60.0917 MG018(592-896 of 1281)
GB: U01757(1- 305 of 305) MG019 22388 23554 SP: P35514 heat shock
protein (dnaJ) {Lactococcus lactis} 33.9779 51.105 MG019(44-1 of
1167) GB: U01723(1- 44 of 304) MG020 23541 24464 GB: Z25461_2
proline iminopeptidase (pip) {Neisseria gonorrhoeae} 37.5439
55.7895 MG020(723-924 of 924) GB: U02229(1- 202 of 333) MG021 24467
26002 GB: D26185_101 methionyl-tRNA synthetase (metS) {Bacillus
subtilis} 37.5494 58.8933 MG021(1-129 of 1536) GB: U02229(205- 333
of 333) MG021 24467 26002 GB: D26185_101 methionyl-tRNA synthetase
(metS) {Bacillus subtilis} 37.5494 58.8933 MG021(1318-1527 of 1536)
GB: X61513(1-209 of 209) MG022 26035 26469 GB: M21677_1 RNA
polymerase delta subunit (rpoE) {Bacillus subtilis} 28.6765 49.2647
MG022(254-1 of 435) GB: U01721(1-254 of 299) MG025 28651 29544 GP:
Z47767_4 TrsB {Yersinia enterocolitica} 27.551 54.0816
MG025(514-894 of 894) GB: U02253(1-381 of 649) MG026 29551 30120
GB: U14003_62 elongation factor P (efp) {Escherichia coli} 26.3804
47.2393 MG026(1-262 of 570) GB: U02253(388- 649 of 649) MG029 31702
31145 GB: L19300_1 hypothetical protein (GB: L19300_1) 27.027
45.045 MG029(1-93 {Staphylococcus aureus} of 558) GB: U01773
(210-302 of 302) MG030 32324 31707 GB: Z27121_3 uracil
phosphoribosyltransferase (upp) 44.9275 66.6667 MG030(414-618
{Mycoplasma hominis} of 618) GB: U01773(1-205 of 302) MG031 36713
32361 GB: U06833_1 DNA polymerase III (polC) 38.0303 59.3182
MG031(1473-1701 {Mycoplasma pulmonis} of 4353) GB: U01807(1- 229 of
229) MG031 36713 32361 GB: U06833_1 DNA polymerase III (polC)
38.0303 59.3182 MG031(2923-3309 {Mycoplasma pulmonis} of 4353) GB:
U01712(1- 387 of 387) MG031 36713 32361 GB: U06833_1 DNA polymerase
III (polC) {Mycoplasma pulmonis} 38.0303 59.3182 MG031(3330- 3676
of 4353) GB: U02208(1- 347 of 347) MG036 41777 43426 SP: P36419
aspartyl-tRNA synthetase (aspS) {Thermus aquaticus} 40.8582 62.8731
MG036(1115- 1650 of 1650) GB: U01814(1- 532 of 1006) MG036 41777
43426 SP: P36419 aspartyl-tRNA synthetase (aspS) {Thermus
aquaticus} 40.8582 62.8731 MG036(1407- 1638 of 1650) GB: X61511(1-
232 of 232) MG036 41777 43426 SP: P36419 aspartyl-tRNA synthetase
(aspS) {Thermus aquaticus} 40.8582 62.8731 MG036(1412- 1160 of
1650) GB: X61523(1- 252 of 252) MG037 43402 44751 GP: U02020_1
pre-B cell enhancing factor (PBEF) {Homo sapiens} 34.3164 52.2788
MG037(1- 500 of 1350) GB: U01814(508- 1006 of 1006) MG040 47581
49353 SP: P29724 membrane lipoprotein (tmpC) {Treponema pallidum}
30.8594 48.0469 MG040(1341- 1552 of 1773) GB: U02125(1- 212 of 212)
MG045 53205 54653 -- -- -- -- MG045(381- 4 of 1449) GB: U02166(1-
378 of 378) MG047 55589 56737 SP: P30869 S-adenosylmethionine
synthetase 2 (metX) 43.6111 60.5556 MG047(787- {Escherichia coli}
1070 of 1149) GB: U02123(1- 284 of 284) MG051 59741 61003 GB:
L13289_3 thymidine phosphorylase (deoA) {Mycoplasma pirum} 52.7316
73.6342 MG051(1161- 1263 of 1263) GB: U02191(1- 103 of 183) MG052
61015 61404 GB: L13289_4 cytidine deaminase (cdd) {Mycoplasma
pirum} 38.2114 64.2276 MG052(1- 69 of 390) GB: U02191(115- 183 of
183) MG052 61015 61404 GB: L13289_4 cytidine deaminase (cdd)
{Mycoplasma pirum} 38.2114 64.2276 MG052(320- 390 of 390) GB:
U02108(1- 71 of 212) MG053 61407 63056 GB: L13289_5
phosphomannomutase (cpsG) {Mycoplasma pirum} 38.7868 58.0882
MG053(1- 140 of 1650) GB: U02108(74- 212 of 212) MG054 63986 63039
GB: D13303_4 transcription antitermination factor (nusG) 30.8571
51.4286 MG054(688- {Bacillus subtilis} 44 of 948) GB: U01710(1- 645
of 645) MG054 63986 63039 GB: D13303_4 transcription
antitermination factor (nusG) 30.8571 51.4286 MG054(948- {Bacillus
subtilis} 719 of 948) GB: U02236(45- 274 of 276) MG055 64361 63993
-- -- -- -- MG055(1- 326 of 369) GB: U02240(23- 348 of 348) MG058
67121 66231 GB: D26185_114 phosphoribosylpyrophosphate synthetase
(prs) 44.4089 63.5783 MG058(72 - {Bacillus subtilis} 1 of 891) GB:
U01693(1- 72 of 350) MG059 67644 67210 GB: D12501_1 small protein
(smpB) {Escherichia coli} 32.5581 62.0155 MG059(435- 247 of 435)
GB: U01693(161- 350 of 350) MG060 67651 68541 SP: P26401
lipopolysaccharide biosynthesis protein (rfbV) 36.0656 59.8361
MG060(723- {Salmonella typhimurium} 396 of 891) GB: U02262(1- 328
of 328) MG061 69908 68526 GB: M89480_4 hexosephosphate transport
protein (uhpT) 30.9091 57.2727 MG061(1273- {Salmonella typhimurium}
613 of 1383) GB: U01705(1- 661 of 661) MG062 70531 72570 SP: P20966
fructose-permease IIBC component (fruA) 42.723 60.5634 MG062(439-
{Escherichia coli} 761 of 2040) GB: U02138(1- 323 of 323) MG063
72668 73432 SP: P23539 1-phosphofructokinase (fruK) {Escherichia
coli} 26.3158 51.5038 MG063(363- 626 of 765) GB: U01777(1- 264 of
264) MG065 77686 79083 GB: X75422_1 heterocyst maturation protein
(devA) {Anabaena sp.} 35.2941 59.7285 MG065(1398- 1176 of 1398) GB:
U02154(133- 354 of 354) MG066 79090 81033 SP: P27302 transketolase
1 (TK 1) (tktA) 32.5617 54.9383 MG066(126- {Escherichia coli} 1 of
1944) GB: U02154(1- 126 of 354) MG068 82621 84042 -- -- -- --
MG068(1244- 919 of 1422) GB: U02162(1- 326 of 326) MG069 88228
90951 SP: P20166 phosphotransferase enzyme II, ABC component (ptsG)
43.1596 61.0749 MG069(1127- {Bacillus subtilis} 849 of 2724) GB:
U02207(1- 279 of 279) MG071 91924 94545 SP: P37278
cation-transporting ATPase (pacL) 34.3897 57.277 MG071(1470-
{Synechococcus sp.} 1209 of 2622) GB: X61532(1- 262 of 262) MG072
94535 96952 GB: D10279_2 preprotein translocase (secA) {Bacillus
subtilis} 43.6601 66.7974 MG072(2269- 2418 of 2418) GB: U01743(1-
150 of 365) MG073 96933 98900 SP: P07025 excinuclease ABC subunit B
(uvrB) 47.9751 67.2897 MG073(1- {Escherichia coli} 235 of 1968) GB:
U01743(131- 365 of 365) MG073 96933 98900 SP: P07025 excinuclease
ABC subunit B (uvrB) 47.9751 67.2897 MG073(1584- {Escherichia coli}
1240 of 1968) GB: U01698(1- 345 of 345) MG073 96933 98900 SP:
P07025 excinuclease ABC subunit B (uvrB) 47.9751 67.2897 MG073(305-
{Escherichia coli} 694 of 1968) GB: U02119(1- 391 of 391) MG074
98906 99316 -- -- -- -- MG074(369- 411 of 411) GB: U01715(1- 43 of
576) MG075 99383 102454 -- -- -- -- MG075(1- 467 of 3072) GB:
U01715(110- 576 of 576) MG075 99383 102454 -- -- -- -- MG075(1206-
804 of 3072) GB: U02251(1- 403 of 403) MG075 99383 102454 -- -- --
-- MG075(1927- 2210 of 3072) GB: U01749(1- 284 of 284) MG075 99383
102454 -- -- -- -- MG075(2841- 2422 of 3072) GB: U01775(1- 420 of
420) MG080 106660 109203 SP: P18766 oligopeptide transport
ATP-binding protein (amiF) 46.6403 67.1937 MG080(2268-
{Streptococcus pneumoniae} 1954 of 2544) GB: U02129(1- 315 of 315)
MG080 106660 109203 SP: P18766 oligopeptide transport ATP-binding
protein (amiF) 46.6403 67.1937 MG080(951- {Streptococcus
pneumoniae} 646 of 2544) GB: U01758(1- 306 of 306) MG082 109675
110352 SP: P04447 ribosomal protein L1 (rpL1) 48.1982 67.5676
MG082(446- {Bacillus stearothermophilus} 170 of 678) GB: U02113(1-
278 of 278) MG083 110355 110921 GB: L32144_1 peptidyl-tRNA
hydrolase homolog (pth) 38.2166 57.3248 MG083(567- {Borrelia
burgdorferi} 220 of 567) GB: U02185(26- 373 of 373) MG084 110917
111786 SP: P37563 hypothetical protein (SP: P37563) 28.125 46.3542
MG084(30- {Bacillus subtilis} 1 of 870) GB: U02185(1- 30 of 373)
MG084 110917 111786 SP: P37563 hypothetical protein (SP: P37563)
{Bacillus subtilis} 28.125 46.3542 MG084(794- 870 of 870) GB:
U01783(1- 77 of 269) MG087 113895 114311 SP: P09901 ribosomal
protein S12 (rpS12) 75.3731 82.0896 MG087(417- {Bacillus
stearothermophilus} 349 of 417) GB: U02212(326- 394 of 394) MG088
114331 114795 SP: P22744 ribosomal protein S7 (rpS7) 64.9351
81.1688 MG088(305- {Bacillus stearothermophilus} 1 of 465) GB:
U02212(2- 306 of 394) MG089 114808 116871 SP: P13551 elongation
factor G (fus) 59.2105 78.0702 MG089(1878- {Thermus aquaticus} 1540
of 2064) GB: U02180(1- 339 of 340) MG089 114808 116871 SP: P13551
elongation factor G (fus) {Thermus aquaticus} 59.2105 78.0702
MG089(1885- 2064 of 2064) GB: U02136(1- 180 of 410) MG089 114808
116871 SP: P13551 elongation factor G (fus) {Thermus aquaticus}
59.2105 78.0702 MG089(687- 1374 of 2064) GB: U01722(1- 688 of 688)
MG090 116926 117549 SP: P02358 ribosomal protein S6 (rpS6)
{Escherichia coli} 23.8636 44.3182 MG090(1- 176 of 624) GB:
U02136(235- 410 of 410) MG094 118847 120184 SP: P03005 replicative
DNA helicase (dnaB) 33.105 55.0228 MG094(1068- {Escherichia coli}
731 of 1338) GB: U01803(1- 336 of 336) MG094 118847 120184 SP:
P03005 replicative DNA helicase (dnaB) 33.105 55.0228 MG094(228-
{Escherichia coli} 1 of 1338) GB: U02158(1- 228 of 301) MG095
120191 121384 -- -- -- -- MG095(355- 759 of 1194) GB: U01787(1- 403
of 403) MG096 121939 123519 -- -- -- -- MG096(1- 309 of 1581) GB:
U01713(58- 366 of 366) MG096 121939 123519 -- -- -- -- MG096(361-
531 of 1581) GB: U01762(1- 171 of 171) MG097 123579 124313 GB:
D13169_3 uracil DNA glycosylase (ung) 32.5688 51.8349 MG097(220-
{Escherichia coli} 694 of 735) GB: U02201(1- 475 of 475) MG098
124416 125846 GP: M74170_2 p48 eggshell protein (p48) {Schistosoma
mansoni} 23.0769 47.9853 MG098(1260- 831 of 1431) GB: U01782(1- 431
of 431) MG098 124416 125846 GP: M74170_2 p48 eggshell protein (p48)
{Schistosoma mansoni} 23.0769 47.9853 MG098(134- 467 of 1431) GB:
U01701(1- 334 of 334) MG100 127278 128708 GP: L22072_1 PET112
protein {Saccharomyces cerevisiae} 30.8696 54.1304 MG100(533- 238
of 1431) GB: U01799(1- 296 of 296) MG101 128686 129351 -- -- -- --
MG101(89- 398 of 666) GB: U02103(1- 309 of 309) MG102 129347 130291
GB: J03762_1 thioredoxin reductase (trxB) 38.5906 59.396 MG102(45-
{Escherichia coli} 367 of 945) GB: U02197(1- 322 of 322) MG103
130284 131123 -- -- -- -- MG103(623- 256 of 840) GB: U02170(1- 368
of 369) MG104 131384 133558 GB: U14003_91 virulence associated
protein homolog (vacB) 29.2335 52.2282 MG104(215- {Escherichia
coli} 491 of 2175) GB: U01795(1- 277 of 277) MG108 135337 136116
SP: P35182 protein phosphatase 2C homolog (ptc1) 27.5362 52.1739
MG108(780- {Saccharomyces cerevisiae} 598 of 780) GB: U02111(33-
215 of 215) MG109 136179 137264 PIR: S36944 protein
serine/threonine kinase {Arabidopsis thaliana} 33.7398 52.0325
MG109(425- 786 of 1086) GB: U01720(1- 362 of 362) MG109 136179
137264 PIR: S36944 protein serine/threonine kinase {Arabidopsis
thaliana} 33.7398 52.0325 MG109(781- 1084 of 1086) GB: U01748(1-
303 of 303) MG110 137380 138087 GB: U14003_76 hypothetical protein
(GB: U14003_76) 28.5714 54.1126 MG110(140- {Escherichia coli} 242
of 708) GB: X61518(1- 102 of 102) MG110 137380 138087 GB: U14003_76
hypothetical protein (GB: U14003_76) 28.5714 54.1126 MG110(670-
{Escherichia coli} 378 of 708) GB: U01714(1- 293 of 293) MG111
138105 139403 SP: P13376 phosphoglucose isomerase B (pgiB) 34.8235
53.6471 MG111(1- {Bacillus stearothermophilus} 98 of 1299) GB:
U01747(38- 135 of 135) MG112 139396 140022 GB: M64173_3
D-ribulose-5-phosphate 3 epimerase (cfxEc) 33.1361 53.8462
MG112(207- {Alcaligenes eutrophus} 473 of 627) GB: U02181(1- 267 of
267) MG113 140039 141406 GB: M33145_1 asparaginyl-tRNA synthetase
(asnS) {Escherichia coli} 41.4579 64.2369 MG113(1231- 941 of 1368)
GB: U01692(1- 291 of 291) MG115 142314 142550 SP: P31131
hypothetical protein (SP: P31131) {Escherichia coli} 32.6087 50
MG115(198- 237 of 237) GB: U02127(1- 40 of 234) MG116 142562 143314
-- -- -- -- MG116(1- 183 of 753) GB: U02127(52- 234 of 234) MG119
144972 146663 GB: M59444_2 methylgalactoside permease ATP-binding
protein (mglA) 33.1984 57.6923 MG119(1660- {Escherichia coli} 1692
of 1692) GB: U02147(1- 33 of 301) MG119 144972 146663 GB: M59444_2
methylgalactoside permease ATP-binding protein (mglA) 33.1984
57.6923 MG119(192- {Escherichia coli} 1 of 1692) GB: U02149(1- 192
of 681) MG120 146673 148232 SP: P36948 ribose transport system
permease protein (rbsC) 27.4809 51.9084 MG120(1- {Bacillus
subtilis} 259 of 1560) GB: U02147(43- 301 of 301) MG122 149198
151324 GB: L27797_2 DNA topoisomerase I (topA) {Bacillus subtilis}
38.9222 59.7305 MG122(1193- 1443 of 2127) GB: U02134(1- 251 of 251)
MG122 149198 151324 GB:
L27797_2 DNA topoisomerase I (topA) {Bacillus subtilis} 38.9222
59.7305 MG122(1578- 1971 of 2127) GB: U02242(1- 394 of 394) MG123
151305 152717 GB: M91593_1 hypothetical protein (GB: M91593_1)
23.9837 50.4065 MG123(1413- {Mycoplasma mycoides} 1236 of 1413) GB:
U01796(114- 291 of 291) MG124 152767 153072 GB: J03294_1
thioredoxin (trx) {Bacillus subtilis} 36.0825 65.9794 MG124(64- 1
of 306) GB: U01796(1- 64 of 291) MG133 159669 158986 -- -- -- --
MG133(1- 110 of 684) GB: U02144(237- 345 of 345) MG133 159669
158986 -- -- -- -- MG133(435- 673 of 684) GB: X61537(1- 238 of 238)
MG134 159797 160096 GB: M38777_3 hypothetical protein (GB:
M38777_3) 28.5714 57.1429 MG134(109- {Escherichia coli} 1 of 300)
GB: U02144(1- 109 of 345) MG135 160913 160074 PIR: E22845
hypothetical protein 4 (GP: Z33006_1) 30.7692 55.9441 MG135(485-
{Trypanosoma brucei} 782 of 840) GB: U02114(1- 298 of 298) MG138
163590 165383 GB: K00426_1 GTP-binding membrane protein (lepA)
47.5465 70.5584 MG138(1237- {Escherichia coli} 938 of 1794) GB:
U02133(2- 301 of 301) MG138 163590 165383 GB: K00426_1 GTP-binding
membrane protein (lepA) 47.5465 70.5584 MG138(1318- {Escherichia
coli} 1794 of 1794) GB: U01745(1- 477 of 524) MG138 163590 165383
GB: K00426_1 GTP-binding membrane protein (lepA) 47.5465 70.5584
MG138(323- {Escherichia coli} 591 of 1794) GB: X61521(1- 269 of
269) MG140 175807 179145 -- -- -- -- MG140(1- 41 of 3339) GB:
U02110(178- 218 of 218) MG140 175807 179145 -- -- -- -- MG140(2727-
2429 of 3339) GB: U01730(1- 297 of 297) MG140 175807 179145 -- --
-- -- MG140(3302- 2994 of 3339) GB: U02156(1- 308 of 308) MG140
175807 179145 -- -- -- -- MG140(382- 834 of 3339) GB: U01729(1- 454
of 454) MG140 175807 179145 -- -- -- -- MG140(834- 616 of 3339) GB:
X61512(1- 220 of 220) MG140 175807 179145 -- -- -- -- MG140(880-
1182 of 3339) GB: U01742(1- 303 of 303) MG141 179153 180745 SP:
P32727 N-utilization substance protein A homolog (nusA) 30.8743
53.8251 MG141(223- {Bacillus subtilis} 871 of 1593) GB: U01778(1-
652 of 652) MG142 181007 182863 GB: M34836_1 protein synthesis
initiation factor 2 (infB) 46.0292 64.6677 MG142(265- {Bacillus
subtilis} 393 of 1857) GB: U01765(1- 129 of 129) MG144 183216
184052 -- -- -- -- MG144(190- 420 of 837) GB: U02121(1- 231 of 231)
MG146 184877 186148 GB: X73141_2 hemolysin (tlyC) {Serpulina
hyodysenteriae} 26.2712 52.1186 MG146(1272- 1174 of 1272) GB:
U02223(19- 117 of 117) MG149 188609 189451 -- -- -- -- MG149(843-
765 of 843) GB: U02135(182- 260 of 260) MG151 190372 191142 SP:
P10134 ribosomal protein L3 (rpL3) {Mycoplasma capricolum} 42.5926
61.5741 MG151(528- 1 of 771) GB: U02153(1- 527 of 543) MG168 198519
199151 GB: M57621_1 ribosomal protein S5 (rpS5) 55.9748 72.327
MG168(505- {Bacillus stearothermophilus} 633 of 633) GB: U01726(1-
129 of 260) MG175 202762 203133 GB: M26414_3 ribosomal protein S13
(rpS13) {Bacillus subtilis} 63.3333 82.5 MG175(22- 372 of 372) GB:
U01733(1- 351 of 600) MG176 203136 203528 GB: X02543_2 ribosomal
protein S11 (rpS11) {Escherichia coli} 47.7876 69.9115 MG176(1- 247
of 393) GB: U01733(354- 600 of 600) MG180 205682 206593 GB:
M61017_1 membrane transport protein (glnQ) 37.3832 63.0841
MG180(249- {Bacillus stearothermophilus} 1 of 912) GB: U01754(1-
248 of 265) MG180 205682 206593 GB: M61017_1 membrane transport
protein (glnQ) 37.3832 63.0841 MG180(912- {Bacillus
stearothermophilus} 784 of 912) GB: U01750(167- 295 of 295) MG181
206589 207848 -- -- -- -- MG181(171- 1 of 1260) GB: U01750(1- 171
of 295) MG182 207844 208575 SP: P07649 pseudouridylate synthase I
(hisT) {Escherichia coli} 27.0042 45.1477 MG182(1- 308 of 732) GB:
U02176(70- 377 of 377) MG182 207844 208575 SP: P07649
pseudouridylate synthase I (hisT) {Escherichia coli} 27.0042
45.1477 MG182(732- 383 of 732) GB: U02100(31- 380 of 380) MG183
208568 210388 GB: Z32522_1 oligoendopeptidase F (pepF) {Lactococcus
lactis} 30 50.6667 MG183(27- 335 of 1821) GB: U02198(1- 309 of 309)
MG183 208568 210388 GB: Z32522_1 oligoendopeptidase F (pepF)
{Lactococcus lactis} 30 50.6667 MG183(38- 1 of 1821) GB: U02100(1-
38 of 380) MG184 210392 211342 GB: M97479_2 methyltransferase
(ssoIM) {Shigella sonnei} 42.5249 67.4419 MG184(520- 719 of 951)
GB: U02115(1- 200 of 201) MG190 220479 221561 PIR: JS0068 29 kDa
protein, MgPa operon (mgp) 62.0833 82.0833 MG190(28- {Mycoplasma
genitalium} 1083 of 1083) GB: M31431(1- 1056 of 8760) MG194 232007
233029 GB: V00291_5 phenylalanyl-tRNA synthetase beta-subunit
(pheS) 35.0769 56.3077 MG194(194- {Escherichia coli} 359 of 1023)
GB: U02120(1- 166 of 166) MG195 233036 235453 SP: P17922
phenylalanyl-tRNA synthetase beta chain (pheT) 25.4597 49.0806
MG195(2044- {Bacillus subtilis} 2396 of 2418) GB: U02173(1- 353 of
353) MG200 237346 239148 GB: L36455_1 heat shock protein (dnaJ)
{Coxiella burnetii} 33.5938 51.5625 MG200(842- 1227 of 1803) GB:
U02163(2- 387 of 387) MG203 240322 242220 GB: U25549_1
topoisomerase IV subunit B (parE) 100 100 MG203(1216- {Mycoplasma
genitalium} 1899 of 1899) GB: U25549(1- 684 of 2124) MG204 242223
244565 GB: U25549_2 topoisomerase IV subunit A (parC) 99.7912
99.7912 MG204(1- {Mycoplasma genitalium} 1438 of 2343) GB:
U25549(687- 2124 of 2124) MG204 242223 244565 GB: U25549_2
topoisomerase IV subunit A (parC) 99.7912 99.7912 MG204(1950-
{Mycoplasma genitalium} 1641 of 2343) GB: U02155(1- 308 of 308)
MG206 246127 247422 SP: P14951 excinuclease ABC subunit C (uvrC)
28.0872 51.0896 MG206(738- 399 of 1296) GB: U02182(1- 341 of 341)
MG208 248492 247905 -- -- -- -- MG208(585- 162 of 588) GB:
U01785(1- 423 of 423) MG209 249402 248479 SP: P23851 hypothetical
protein (SP: P23851) {Escherichia coli} 30.4498 55.0173 MG209(730-
372 of 924) GB: U02214(1- 359 of 359) MG210 249947 249405 GB:
M83994_1 prolipoprotein signal peptidase (lsp) 32.3944 52.1127
MG210(1- {Staphylococcus aureus} 116 of 543) GB: U01759(196- 311 of
311) MG212 251780 252583 GB: L32861_1
1-acyl-sn-glycerol-3-phosphate acetyltransferase (plsC) 32.1429
60.7143 MG212(7- {Borrelia burgdorferi} 315 of 804) GB: U02160(5-
313 of 313) MG216 255594 257117 GB: L07920_2 pyruvate kinase (pyk)
{Lactococcus lactis} 35.3319 57.6017 MG216(1118- 790 of 1524) GB:
U01798(1- 329 of 329) MG218 259176 264590 PIR: S37536 no score
generated -score shown is bogus -1 -1 MG218(1669- 1977 of 5415) GB:
U02165(1- 309 of 309) MG221 266626 267087 SP: P22186 hypothetical
protein (SP: P22186) {Escherichia coli} 28.8732 56.338 MG221(337-
49 of 462) GB: U02195(1- 290 of 290) MG225 270404 271870 GB:
U14003_71 hypothetical protein (GB: U14003_71) 21.9565 48.0435
MG225(1467- {Escherichia coli} 1409 of 1467) GB: U02264(289- 347 of
347) MG226 271938 273314 GB: D26562_11 aromatic amino acid
transport protein (aroP) 24.5902 47.2131 MG226(221- {Escherichia
coli} 1 of 1377) GB: U02264(1- 221 of 347) MG227 273789 274649 SP:
P13954 thymidylate synthase (thyA) {Staphylococcus aureus} 56.5972
75.3472 MG227(577- 861 of 861) GB: U01718(1- 285 of 439) MG228
274652 275131 GB: X60681_1 dihydrofolate reductase (dhfr)
{Lactococcus lactis} 33.1288 59.5092 MG228(480- 385 of 480) GB:
U02137(174- 269 of 269) MG229 275140 276159 SP: P17424
ribonucleotide reductase 2 (nrdF) 50 70.0637 MG229(1020-
{Salmonella typhimurium} 697 of 1020) GB: U01739(22- 344 of 344)
MG231 276646 278808 GB: X73226_1 ribonucleoside-diphosphate
reductase (nrdE) 54.1193 73.1534 MG231(2122- {Salmonella
typhimurium} 2163 of 2163) GB: U02141(1- 42 of 827) MG237 281078
281959 -- -- -- -- MG237(647- 882 of 882) GB: U01774(1- 236 of 289)
MG238 281992 283323 GB: M34066_1 trigger factor (tig) {Escherichia
coli} 24.6193 47.9695 MG238(420- 648 of 1332) GB: U01772(1- 229 of
229) MG239 283395 285779 SP: P37945 ATP-dependent protease (lon)
{Bacillus subtilis} 43.6268 65.8344 MG239(1818- 1449 of 2385) GB:
U02148(1- 370 of 370) MG240 286657 285782 GB: M91593_1 hypothetical
protein (GB: M91593_1) 27.8195 53.3835 MG240(876- {Mycoplasma
mycoides} 598 of 876) GB: U01734(27- 305 of 305) MG242 288752 290641
-- -- -- -- MG242(886- 543 of 1890) GB: U02194(1- 344 of 344) MG244
291332 293440 GB: M99049_1 DNA helicase II (mutB1) {Haemophilus
influenzae} 36.0078 55.9687 MG244(829- 1035 of 2109) GB: X61517(1-
207 of 207) MG249 297604 296114 SP: P33656 RNA polymerase sigma-A
factor (sigA) 43.6842 66.0526 MG249(970- {Clostridium
acetobutylicum} 666 of 1491) GB: X61535(1- 306 of 306) MG250 299472
297652 GB: M10040_1 DNA primase (dnaE) {Bacillus subtilis} 27.2727
52.2078 MG250(1530- 1821 of 1821) GB: U01771(1- 292 of 572) MG250
299472 297652 GB: M10040_1 DNA primase (dnaE) {Bacillus subtilis}
27.2727 52.2078 MG250(648- 231 of 1821) GB: U02146(1- 418 of 418)
MG254 304823 302847 GB: M24278_1 DNA ligase (lig) {Escherichia
coli} 38.2263 59.3272 MG254(1429- 1722 of 1977) GB: U02152(1- 294
of 294) MG254 304823 302847 GB: M24278_1 DNA ligase (lig)
{Escherichia coli} 38.2263 59.3272 MG254(37- 367 of 1977) GB:
U01761(1- 330 of 330) MG255 304999 306093 -- -- -- -- MG255(726-
1095 of 1095) GB: U02164(1- 370 of 370) MG255 304999 306093 -- --
-- -- MG255(729- 400 of 1095) GB: U02174(1- 333 of 333) MG261
315699 318320 GB: M19334_4 DNA polymerase III alpha subunit (dnaE)
31.9115 55.7662 MG261(2442- {Escherichia coli} 2159 of 2622) GB:
U01738(1- 284 of 284) MG263 320175 321047 GB: L10328_61
hypothetical protein (GB: L10328_61) 27.8008 47.7178 MG263(828-
{Escherichia coli} 489 of 873) GB: U01764(1- 340 of 340) MG266
324809 322434 GB: M88581_1 leucyl-tRNA synthetase (leuS) 43.401
64.2132 MG266(78- {Bacillus stearothermophilus} 287 of 2376) GB:
U01780(1- 210 of 210) MG266 324809 322434 GB: M88581_1 leucyl-tRNA
synthetase (leuS) 43.401 64.2132 MG266(957- {Bacillus
stearothermophilus} 622 of 2376) GB: U02167(1- 336 of 336) MG269
327050 326031 GB: D90354_1 surface protein antigen precursor (pag)
25.5144 47.3251 MG269(239- {Streptococcus sobrinus} 1 of 1020) GB:
U02215(1- 239 of 366) MG271 329826 328456 SP: P11959
Dihydrolipoamide dehydrogenase (pdhD) 38.3592 62.306 MG271(914-
{Bacillus stearothermophilus} 1214 of 1371) GB: U01784(1- 301 of
301) MG275 334772 333339 SP: P37061 NADH oxidase (nox)
{Enterococcus faecalis} 39.229 62.1315 MG275(81- 1 of 1434) GB:
U01786(4- 84 of 280) MG276 335397 334858 GB: M14040_1 Adenine
phosphoribosyltransferase (apt) 34.3373 58.4337 MG276(540-
{Escherichia coli} 430 of 540) GB: U01786(170- 280 of 280) MG278
338366 340525 GB: X72832_5 stringent response-like protein (rel)
29.1339 55.1181 MG278(391- {Streptococcus equisimilis} 697 of 2160)
GB: U01770(1- 308 of 308) MG281 343702 342035 -- -- -- --
MG281(748- 1051 of 1668) GB: U01706(1- 303 of 303) MG282 344849
344367 SP: P2740 transcription elongation factor (greA) 40.146
65.6934 MG282(483- {Rickettsia prowazekii} 356 of 483) GB:
U02104(187- 314 of 314) MG283 345181 346629 GB: M97858_1
prolyl-tRNA synthetase (proS) {Escherichia coli} 22.6562 46.0938
MG283(839- 1183 of 1449) GB: U02205(1- 346 of 346) MG285 347214
348254 -- -- -- -- MG285(315- 493 of 1041) GB: U02266(1- 180 of
180) MG289 354023 355126 SP: P15363 high affinity transport system
protein P37 (P37) 35.7798 58.4098 MG289(105- {Mycoplasma hyorhinis}
1 of 1104) GB: U02132(1- 105 of 571) MG291 355846 357474 SP: P15362
transport system permease protein P69 (P69) 27.9159 54.8757
MG291(1216- {Mycoplasma hyorhinis} 1629 of 1629) GB: U01768(1- 415
of 705) MG291 355846 357474 SP: P15362 transport system permease
protein P69 (P69) 27.9159 54.8757 MG291(279- {Mycoplasma hyorhinis}
1 of 1629) GB: U02171(1- 279 of 346) MG293 361384 360653 SP: P37965
Glycerophosphoryl diester phosphodiesterase (glpQ) 30.3965 55.9471
MG293(357- {Bacillus subtilis} 41 of 732) GB: U02118(1- 317 of 317)
MG294 362801 361380 GB: L19201_18 hypothetical protein (GB:
L19201_18) 23.1013 46.2025 MG294(256- {Escherichia coli} 592 of
1422) GB: U02243(1- 337 of 337) MG297 365574 364537 GB: U00039_18
cell division protein (ftsY) {Escherichia coli} 36.1371 57.9439
MG297(1- 57 of 1038) GB: U02177(215- 271 of 271) MG298 368529
365584 GB: M34956_1 115 kDa protein (p115) {Mycoplasma hyorhinis}
33.4059 57.5626 MG298(2743- 2946 of 2946) GB: U02177(1- 205 of 271)
MG300 370962 369715 SP: P36204 phosphoglycerate kinase (pgk)
{Thermotoga maritima} 51.2887 70.6186 MG300(1- 167 of 1248) GB:
U02178(167- 333 of 333) MG300 370962 369715 SP: P36204
phosphoglycerate kinase (pgk) {Thermotoga maritima} 51.2887 70.6186
MG300(935- 609 of 1248) GB: U02226(1- 326 of 326) MG300 370962
369715 SP: P36204 phosphoglycerate kinase (pgk) {Thermotoga
maritima} 51.2887 70.6186 MG300(939- 1243 of 1248) GB: U02234(1-
305 of 305) MG301 371962 370952 GB: X72219_1
glyceraldehyde-3-phosphate dehydrogenase (gap) 56.0606 73.0303
MG301(244- {Clostridium pasteurianum} 1 of 1011) GB: U02213(1- 244
of 364) MG301 371962 370952 GB: X72219_1 glyceraldehyde-3-phosphate
dehydrogenase (gap) 56.0606 73.0303 MG301(835- {Clostridium
pasteurianum} 1011 of 1011) GB: U02178(1- 177 of 333) MG302 372946
371996 -- -- -- -- MG302(951- 865 of 951) GB: U02213(278- 364 of
364) MG305 376705 374921 GB: D30690_3 heat shock protein 70 (hsp70)
{Staphylococcus aureus} 57.4359 75.8974 MG305(1382- 1055 of 1785)
GB: U02204(1- 327 of 327) MG307 381507 377977 -- -- -- --
MG307(3175- 2042 of 3531) GB: U01767(1- 1134 of 1134) MG308 382724
381495 SP: P23304 ATP-dependent RNA helicase (deaD) {Escherichia
coli} 23.0986 48.169 MG308(1- 89 of 1230) GB: U02200(276- 364 of
364) MG309 386408 382734 -- -- -- -- MG309(3410-
3675 of 3675) GB: U02200(1- 266 of 364) MG312 391334 387918 GB:
U11381_1 cytadherence-accessory protein (hmw1) 39.3235 60.6765
MG312(2541- {Mycoplasma pneumoniae} 2160 of 3417) GB: U02261(1- 382
of 382) MG314 393633 392305 GP: L38997_4 hypothetical protein (GP:
L38997_4) 51.4477 71.4922 MG314(514- {Mycoplasma pneumoniae} 206
of 1329) GB: U02151(1- 309 of 309) MG317 397423 395627 GB: M82965_1
cytadherence-accessory protein (hmw3) 41.1458 59.8958 MG317(1329-
{Mycoplasma pneumoniae} 1542 of 1797) GB: U02267(1- 214 of 214)
MG317 397423 395627 GB: M82965_1 cytadherence-accessory protein
(hmw3) 41.1458 59.8958 MG317(509- {Mycoplasma pneumoniae} 169 of
1797) GB: U02224(1- 341 of 341) MG317 397423 395627 GB: M82965_1
cytadherence-accessory protein (hmw3) 41.1458 59.8958 MG317(73-
{Mycoplasma pneumoniae} 1 of 1797) GB: U01716(1- 73 of 325) MG318
398280 397441 GB: J04151_1 fibronectin-binding protein (fnbA)
24.6154 43.0769 MG318(840- {Staphylococcus aureus} 604 of 840) GB:
U01716(91- 325 of 325) MG319 398833 398300 -- -- -- -- MG319(423- 1
of 534) GB: U01769(1- 426 of 541) MG320 399797 398940 -- -- -- --
MG320(371- 781 of 858) GB: U01700(1- 410 of 410) MG324 408792
407731 GB: D00398_1 aminopeptidase P (pepP) {Escherichia coli}
30.531 54.4248 MG324(883- 1062 of 1062) GB: U01717(1- 181 of 223)
MG324 408792 407731 GB: D00398_1 aminopeptidase P (pepP)
{Escherichia coli} 30.531 54.4248 MG324(889- 1062 of 1062) GB:
U01755(2- 175 of 217) MG327 410676 409873 SP: P26174
magnesium-chelatase 30 kDa subunit (bchO) 26.7281 51.1521
MG327(782- {Rhodobacter capsulatus} 533 of 804) GB: U02232(1- 250
of 250) MG328 412933 410666 GB: X62467_1 protein V (fcrV)
{Streptococcus sp.} 27.5434 48.3871 MG328(339- 53 of 2268) GB:
U02188(1- 287 of 287) MG328 412933 410666 GB: X62467_1 protein V
(fcrV) {Streptococcus sp.} 27.5434 48.3871 MG328(817- 462 of 2268)
GB: U02203(1- 356 of 356) MG330 414975 414325 SP: P38493 cytidylate
kinase (cmk) {Bacillus subtilis} 40.3756 61.0329 MG330(537- 226 of
651) GB: U02241(1- 312 of 314) MG334 419480 416970 SP: Q05873
valyl-tRNA synthetase (valS) {Bacillus subtilis} 38.5629 60.5988
MG334(1109- 781 of 2511) GB: U02202(1- 330 of 330) MG334 419480
416970 SP: Q05873 valyl-tRNA synthetase (valS) {Bacillus subtilis}
38.5629 60.5988 MG334(2400- 2511 of 2511) GB: U02249(1- 112 of 305)
MG335 420045 419473 SP: P38424 hypothetical protein (SP: P38424)
{Bacillus subtilis} 34.5238 61.3095 MG335(1- 95 of 573) GB:
U02190(200- 294 of 294) MG336 421467 422690 GB: U00013_6 nitrogen
fixation protein (nifS) {Mycobacterium leprae} 26.2295 47.2678
MG336(990- 719 of 1224) GB: U02256(1- 272 of 272) MG337 422697
423110 -- -- -- -- MG337(414- 151 of 414) GB: U01709(35- 297 of
297) MG338 426915 423103 -- -- -- -- MG338(1- 251 of 3813) GB:
U02269(65- 315 of 315) MG338 426915 423103 -- -- -- -- MG338(1304-
917 of 3813) GB: U02221(1- 388 of 388) MG338 426915 423103 -- -- --
-- MG338(3342- 3067 of 3813) GB: U01809(1- 276 of 276) MG338 426915
423103 -- -- -- -- MG338(3772- 3813 of 3813) GB: U01709(1- 42 of
297) MG339 428115 427096 GB: L25893_1 recombination protein (recA)
{Staphylococcus aureus} 46.5986 69.3878 MG339(372- 93 of 1020) GB:
U01704(1- 279 of 279) MG340 434458 430583 SP: P00577 DNA-directed
RNA polymerase beta' chain (rpoC) 44.4828 66.0345 MG340(1294-
{Escherichia coli} 999 of 3876) GB: X61534(1- 295 of 295) MG340
434458 430583 SP: P00577 DNA-directed RNA polymerase beta' chain
(rpoC) 44.4828 66.0345 MG340(1519- {Escherichia coli} 1289 of 3876)
GB: X61528(1- 231 of 231) MG340 434458 430583 SP: P00577
DNA-directed RNA polymerase beta' chain (rpoC) 44.4828 66.0345
MG340(3444- {Escherichia coli} 3083 of 3876) GB: U02169(1- 361 of
361) MG340 434458 430583 SP: P00577 DNA-directed RNA polymerase
beta' chain (rpoC) 44.4828 66.0345 MG340(3772- {Escherichia coli}
3876 of 3876) GB: U01766(1- 105 of 467) MG340 434458 430583 SP:
P00577 DNA-directed RNA polymerase beta' chain (rpoC) 44.4828
66.0345 MG340(426- {Escherichia coli} 66 of 3876) GB: U01797(1- 361
of 361) MG341 438640 434471 GB: L24376_3 RNA polymerase beta
subunit (rpoB) {Bacillus subtilis} 46.5338 67.5043 MG341(1- 107 of
4170) GB: U02230(217- 323 of 323) MG341 438640 434471 GB: L24376_3
RNA polymerase beta subunit (rpoB) {Bacillus subtilis} 46.5338
67.5043 MG341(1932- 1595 of 4170) GB: U01737(1- 338 of 338) MG341
438640 434471 GB: L24376_3 RNA polymerase beta subunit (rpoB)
{Bacillus subtilis} 46.5338 67.5043 MG341(2833- 3201 of 4170) GB:
U01735(1- 369 of 369) MG342 439236 438733 -- -- -- -- MG342(381-
504 of 504) GB: U02230(1- 124 of 323) MG342 439236 438733 -- -- --
-- MG342(386- 65 of 504) GB: U02231(1- 322 of 322) MG343 440355
439318 -- -- -- -- MG343(108- 452 of 1038) GB: U01811(1- 345 of
345) MG344 441180 440362 GP: U17036_2 lipase-esterase (lip1)
{Mycoplasma mycoides} 26.6667 47.5 MG344(575- 767 of 819) GB:
U02222(1- 193 of 193) MG345 443878 441194 SP: P00956 isoleucyl-tRNA
synthetase (ileS) {Escherichia coli} 33.2963 56.2708 MG345(1115-
782 of 2685) GB: U02196(1- 334 of 334) MG345 443878 441194 SP:
P00956 isoleucyl-tRNA synthetase (ileS) {Escherichia coli} 33.2963
56.2708 MG345(1811- 2134 of 2685) GB: U02254(1- 324 of 324) MG348
446165 445200 -- -- -- -- MG348(166- 459 of 966) GB: U01781(1- 292
of 292) MG352 450222 450719 GB: U11883_2 hypothetical protein (GB:
U11883_2) {Bacillus subtilis} 33.3333 56.7901 MG352(366- 498 of
498) GB: U02237(1- 133 of 310) MG353 451048 450722 -- -- -- --
MG353(327- 153 of 327) GB: U02237(136- 309 of 310) MG357 455947
454769 GB: L17320_2 acetate kinase (ackA) {Bacillus subtilis}
42.6735 65.5527 MG357(342- 131 of 1179) GB: X61531(1- 211 of 211)
MG358 456590 457369 GB: M21298_1 Holliday junction DNA helicase
(ruvA) 26.2411 42.5532 MG358(350- {Escherichia coli} 87 of 780) GB:
U02233(1- 265 of 265) MG361 459615 460100 SP: P29394 ribosomal
protein L10 (rpL10) {Thermotoga maritima} 29.8137 61.4907
MG361(274- 486 of 486) GB: U02206(1- 213 of 345) MG362 460126
460491 SP: P02394 ribosomal protein L7/L12 (`A`type) (rpL7/L12)
47.5 70 MG362(1- {Bacillus subtilis} 107 of 366) GB: U02206(239-
345 of 345) MG365 461682 462614 GB: X63666_2 methionyl-tRNA
formyltransferase (fmt) 24.43 50.8143 MG365(292- {Escherichia coli}
1 of 933) GB: U02238(1- 292 of 349) MG368 466410 465427 GB:
M96793_1 fatty acid/phospholipid synthesis protein (plsX) 28.972
52.3364 MG368(227- {Escherichia coli} 1 of 984) GB: U01791(1- 227
of 326) MG369 468083 466413 -- -- -- -- MG369(1146- 1446 of 1671)
GB: U01763(1- 300 of 300) MG370 469123 468155 SP: P23851
hypothetical protein (SP: P23851) {Escherichia coli} 26.9531
48.8281 MG370(240- 599 of 969) GB: U02220(1- 360 of 360) MG371
470084 469113 GB: D26185_10 hypothetical protein (GB: D26185_10)
25.8065 47.0046 MG371(349- {Bacillus subtilis} 689 of 972) GB:
U02263(1- 341 of 341) MG374 472891 472070 -- -- -- -- MG374(1- 178
of 822) GB: U02250(159- 337 of 337) MG375 474578 472887 GB:
M36594_1 threonyl-tRNA synthetase (thrSv) {Bacillus subtilis}
38.7097 60.7527 MG375(1048- 1389 of 1692) GB: U02130(1- 342 of 342)
MG375 474578 472887 GB: M36594_1 threonyl-tRNA synthetase (thrSv)
{Bacillus subtilis} 38.7097 60.7527 MG375(1530- 1692 of 1692) GB:
U02250(1- 163 of 337) MG378 477139 475529 SP: P35868 arginyl-tRNA
synthetase (argS) 33.6406 56.9124 MG378(1364- {Corynebacterium
glutamicum} 1047 of 1611) GB: U01740(1- 319 of 319) MG378 477139
475529 SP: P35868 arginyl-tRNA synthetase (argS) 33.6406 56.9124
MG378(765- {Corynebacterium glutamicum} 456 of 1611) GB: U02168(1-
309 of 309) MG379 477168 479003 GB: L10328_106 glucose inhibited
division protein (gidA) 40.7346 61.9366 MG379(900- {Escherichia
coli} 1184 of 1836) GB: U01812(1- 285 of 285) MG385 484699 483992
-- -- -- -- MG385(234- 6 of 708) GB: U02112(1- 229 of 229) MG385
484699 483992 -- -- -- -- MG385(523- 708 of 708) GB: U02239(1- 186
of 320) MG385 484699 483992 -- -- -- -- MG385(528- 259 of 708) GB:
U02246(1- 270 of 270) MG386 489552 484705 GB: U11381_1
cytadherence-accessory protein (hmwl) 31.1755 49.4037 MG386(1294-
{Mycoplasma pneumoniae} 1628 of 4848) GB: U02175(1- 335 of 335)
MG386 489552 484705 GB: U11381_1 cytadherence-accessory protein
(hmwl) 31.1755 49.4037 MG386(2274- {Mycoplasma pneumoniae} 1991 of
4848) GB: X61519(1- 283 of 284) MG386 489552 484705 GB: U11381_1
cytadherence-accessory protein (hmwl) 31.1755 49.4037 MG386(3247-
{Mycoplasma pneumoniae} 3420 of 4848) GB: U02126(1- 174 of 174)
MG386 489552 484705 GB: U11381_1 cytadherence-accessory protein
(hmwl) 31.1755 49.4037 MG386(3842- {Mycoplasma pneumoniae} 4196 of
4848) GB: U02192(1- 355 of 355) MG386 489552 484705 GB: U11381_1
cytadherence-accessory protein (hmwl) 31.1755 49.4037 MG386(767-
{Mycoplasma pneumoniae} 1281 of 4848) GB: U02245(2- 515 of 515)
MG388 491004 490702 GB: U00016_19 hypothetical protein (GB:
U00016_19) 30.9278 56.701 MG388(285- {Mycobacterium leprae} 1 of
303) GB: U02265(1- 285 of 339) MG389 491530 491150 -- -- -- --
MG389(320- 129 of 381) GB: U01813(1- 192 of 192) MG390 493516
491537 SP: P37608 lactococcin transport ATP-binding protein
(lcnDR3) 22.3421 46.5331 MG390(1395- {Lactococcus lactis} 1744 of
1980) GB: U02218(1- 350 of 350) MG390 493516 491537 SP: P37608
lactococcin transport ATP-binding protein (lcnDR3) 22.3421 46.5331
MG390(1400- {Lactococcus lactis} 1174 of 1980) GB: U02248(1- 227 of
227) MG391 494967 493627 GB: D17450_1 aminopeptidase {Mycoplasma
salivarium} 41.2921 60.3933 MG391(1- 217 of 1341) GB: U02268(256-
472 of 472) MG391 494967 493627 GB: D17450_1 aminopeptidase
{Mycoplasma salivarium} 41.2921 60.3933 MG391(412- 735 of 1341) GB:
U01801(1- 324 of 324) MG391 494967 493627 GB: D17450_1
aminopeptidase {Mycoplasma salivarium} 41.2921 60.3933 MG391(412-
735 of 1341) GB: U01802(1- 324 of 324) MG392 496615 494987 GB:
L10132_2 heat shock protein (groEL) 51.5209 71.4829 MG392(1394-
{Bacillus stearothermophilus} 1629 of 1629) GB: U02268(1- 236 of
472) MG392 496615 494987 GB: L10132_2 heat shock protein (groEL)
51.5209 71.4829 MG392(181- {Bacillus stearothermophilus} 1 of 1629)
GB: U02252(1- 181 of 296) MG393 496960 496631 GB: D17398_1 heat
shock protein 60-like protein (PggroES) 39.5604 54.9451 MG393(330-
{Porphyromonas gingivalis} 231 of 330) GB: U02252(197- 296 of 296)
MG394 498306 497089 SP: P06192 serine hydroxymethyltransferase
(glyA) 55.303 70.7071 MG394(328- {Salmonella typhimurium} 683 of
1218) GB: U02131(1- 356 of 356) MG395 499890 498319 -- -- -- --
MG395(457- 116 of 1572) GB: U02260(1- 342 of 342) MG395 499890
498319 -- -- -- -- MG395(763- 979 of 1572) GB: X61530(1- 217 of
217) MG399 503976 502831 SP: P33253 ATP synthase beta chain (atpD)
80.9524 89.418 MG399(447- {Mycoplasma gallisepticum} 852 of 1146)
GB: U01752(1- 406 of 406) MG400 505099 504263 SP: P33257 ATP
synthase gamma chain (atpG) 37.9433 62.0567 MG400(160- {Mycoplasma
gallisepticum} 711 of 837) GB: U01703(1- 552 of 552) MG401 506655
505102 SP: P33252 ATP synthase alpha chain (atpA) 63.3911 79.5761
MG401(973- {Mycoplasma gallisepticum} 1554 of 1554) GB: U01727(1-
583 of 598) MG405 509012 508137 GB: X64256_2
adenosinetriphosphatase (atpB) 36.4261 63.9175 MG405(75-
{Mycoplasma gallisepticum} 1 of 876) GB: U01728(1- 75 of 299) MG406
509319 508981 SP: P15362 transport system permease protein P69
(P69) 40 57.1429 MG406(339- {Mycoplasma hyorhinis} 84 of 339) GB:
U01728(44- 299 of 299) MG410 513042 512056 GB: L10328_89 peripheral
membrane protein B (pstB) {Escherichia coli} 50.813 70.3252
MG410(301- 941 of 987) GB: U01707(1- 640 of 640) MG411 514991
513030 GB: X75297_1 periplasmic phosphate permease homolog (AG88)
30.7692 56.2753 MG411(406- {Mycobacterium tuberculosis} 632 of
1962) GB: U01746(1- 227 of 229) MG412 516124 514994 -- -- -- --
MG412(252- 1 of 1131) GB: U01702(1- 252 of 313) MG412 516124 514994
-- -- -- -- MG412(675- 563 of 1131) GB: U02101(1- 113 of 113) MG413
518389 516248 GB: L22432_4 hypothetical protein (GB: L22432_4) 25
54.1667 MG413(1179- {Mycoplasma capricolum} 701 of 2142) GB:
U01699(1- 480 of 480) MG413 518389 516248 GB: L22432_4 hypothetical
protein (GB: L22432_4) 25 54.1667 MG413(1535- {Mycoplasma
capricolum} 1230 of 2142) GB: U01804(1- 305 of 305) MG414 519355
516248 -- -- -- -- MG414(438- 154 of 917) GB: U01695(1- 285 of 285)
MG416 521414 520371 -- -- -- -- MG416(1- 39 of 1044) GB:
U01744(580- 618 of 620) MG416 521414 520371 -- -- -- -- MG416(7-
351 of 1044) GB: U02102(1- 345 of 345) MG418 522314 521877 SP:
P02410 ribosomal protein L13 (rpL13) {Escherichia coli} 41.3043
70.2899 MG418(321- 438 of 438) GB: U01744(1- 118 of 620) MG421
526696 524153 SP: P07671 excinuclease ABC subunit A (uvrA)
{Escherichia coli} 47.7541 68.5579 MG421(1693- 1393 of 2544) GB:
X61514(1- 301 of 301) MG422 529493 526989 -- -- -- -- MG422(2274-
2101 of 2505) GB: U02117(1- 174 of 174) MG422 529493 526989 -- --
-- -- MG422(2439- 2505 of 2505) GB: U02172(1- 67 of 318) MG422
529493 526989 -- -- -- -- MG422(35- 1 of 2505) GB: U02228(1- 35 of
304) MG423 531216 529534 -- -- -- -- MG423(1434- 1197 of 1683) GB:
X61510(1- 238 of 238) MG423 531216 529534 -- -- -- -- MG423(161-
413 of 1683) GB: X61524(1- 252 of 255) MG423 531216 529534 -- -- --
-- MG423(1683- 1455 of 1683) GB: U02228(76- 304
of 304) MG425 531668 533014 SP: P23304 ATP-dependent RNA helicase
(deaD) {Escherichia coli} 32.4121 58.0402 MG425(989- 769 of 1347)
GB: U01805(1- 220 of 220) MG431 538290 537559 GB: L27492_1
triosephosphate isomerase (tim) {Thermotoga maritima} 39.7541
61.8852 MG431(463- 732 of 732) GB: U02109(1- 270 of 277) MG437
542067 542981 GB: M11330_1 CDP-diglyceride synthetase (cdsA)
{Escherichia coli} 38.0165 55.3719 MG437(679- 378 of 915) GB:
U02189(2- 303 of 303) MG441 546707 546300 -- -- -- -- MG441(20- 318
of 408) GB: U02128(1- 299 of 299) MG447 552444 550804 GB: L08897_1
hypothetical protein (GB: L08897_1) 34.058 55.0725 MG447(319-
{Mycoplasma gallisepticum} 645 of 1641) GB: U01788(1- 327 of 327)
MG451 555612 554431 SP: P13927 elongation factor TU (tuf)
{Mycoplasma genitalium} 100 100 MG451(927- 586 of 1182) GB:
U02255(1- 342 of 342) MG453 556435 557310 GB: L12272_1 UDP-glucose
pyrophosphorylase (gtaB) 48.0287 65.233 MG453(491- {Bacillus
subtilis} 181 of 876) GB: U02258(1- 311 of 311) MG455 557724 558944
GB: M77668_1 tyrosyl tRNA synthetase (tyrS) 38.539 61.7128
MG455(604- {Bacillus stearothermophilus} 362 of 1221) GB: U02247(5-
247 of 247) MG456 559941 558940 -- -- -- -- MG456(256- 568 of 1002)
GB: U01790(1- 312 of 312) MG458 563307 562783 SP: Q02522
hypoxanthine-guanine phosphoribosyltransferase (hpt) 38.3721
66.8605 MG458(295- {Lactococcus lactis} 24 of 525) GB: U02193(1-
272 of 272) MG459 563818 563312 GB: M64978_2 surface exclusion
protein (prgA) (Plasmid pCF10) 28.3582 49.2537 MG459(330-
{Enterococcus faecalis} 1 of 507) GB: U01725(1- 330 of 638) MG460
563991 564926 SP: P33572 L-lactate dehydrogenase (ldh) 50.3226
67.7419 MG460(1- {Mycoplasma hyopneumoniae} 136 of 936) GB:
U01725(503- 638 of 638) MG462 567638 566187 GB: M55072_1
glutamyl-tRNA synthetase (gltX) 42.887 65.272 MG462(1452- {Bacillus
stearothermophilus} 1081 of 1452) GB: U02122(9- 379 of 379) MG463
568404 567628 GB: D26185_105 high level kasugamycin resistance
(ksgA) 35.6164 53.8813 MG463(777- {Bacillus subtilis} 409 of 777)
GB: U01719(36- 405 of 405) MG467 570988 570056 GB: X75422_1
heterocyst maturation protein (devA) {Anabaena sp.} 39.899 63.1313
MG467(40- 352 of 933) GB: U01741(1- 313 of 313) MG469 578578 577268
SP: P34028 chromosomal replication initiator protein (dnaA) 30.9469
57.2748 MG469(845- {Spiroplasma citri} 547 of 1311) GB: U02259(1-
299 of 299) MG469 578578 577268 SP: P34028 chromosomal replication
initiator protein (dnaA) 30.9469 57.2748 MG469(855- {Spiroplasma
citri} 1206 of 1311) GB: U02145(1- 352 of 352)
[0279]
TABLE 1(d) UID Old_id(s) MG001 MORF-20072 MG002 MORF-19817 MG003
MORF-19818 MORF-20073 MG004 MORF-19819 MORF-20074 MG005 MORF-20075
MG006 MORF-20076 MG007 MORF-19820 MG008 MORF-20077 MG009 MORF-20078
MG010 MORF-20079 MG011 MORF-19821 MORF-19822 MG012 MORF-20080 MG013
MORF-19823 MORF-20080 MORF-20081 MG014 MORF-20082 MG015 MORF-20084
MG016 MORF-19824 MG017 MORF-19825 MG018 MORF-20085 MG019 MORF-20086
MG020 MORF-20088 MG021 MORF-20089 MG022 MORF-20091 MG023 MORF-20092
MG024 MORF-19826 MORF-20093 MG025 MORF-20094 MG026 MORF-20095 MG027
MORF-19827 MG028 MORF-19828 MG029 MORF-19829 MG030 MORF-20096 MG031
MORF-19830 MORF-20097 MG032 MORF-20099 MG033 MORF-20100 MG034
MORF-20101 MG035 MORF-20102 MG036 MORF-20103 MG037 MORF-20104 MG038
MORF-20105 MG039 MORF-19831 MORF-20106 MG040 MORF-20107 MG042
MORF-19832 MORF-20108 MG043 MORF-20110 MG044 MORF-20111 MG045
MORF-19833 MG046 MORF-20112 MG047 MORF-20113 MG048 MORF-19834
MORF-20114 MORF-20115 MG049 MORF-20114 MORF-20115 MG050 MORF-20117
MG051 MORF-19835 MORF-20118 MG052 MORF-20119 MG053 MORF-20120 MG054
MORF-20120 MORF-20121 MG055 MORF-19836 MG056 MORF-20122 MG057
MORF-20123 MG058 MORF-20124 MG059 MORF-20124 MORF-20125 MG060
MORF-20126 MG061 MORF-19838 MG062 MORF-19839 MORF-20127 MORF-20128
MG063 MORF-19840 MORF-20128 MG064 MORF-19841 MORF-19842 MG065
MORF-19843 MORF-20129 MG066 MORF-19844 MORF-20130 MG067 MORF-19845
MG068 MORF-20131 MG069 MORF-19847 MORF-20135 MG070 MORF-20136 MG071
MORF-19848 MORF-19849 MORF-19850 MORF-19851 MORF-20137 MG072
MORF-19852 MORF-19853 MORF-19854 MORF-20138 MG073 MORF-20139 MG074
MORF-19855 MG075 MORF-19856 MORF-19857 MG076 MORF-19858 MG077
MORF-20140 MG078 MORF-19859 MORF-20141 MG079 MORF-20142 MG080
MORF-20143 MG081 MORF-20144 MG082 MORF-20145 MG083 MORF-20146 MG084
MORF-20147 MG085 MORF-20147 MORF-20148 MG086 MORF-19860 MORF-19861
MG087 MORF-20149 MG088 MORF-20150 MG089 MORF-20151 MORF-20152 MG090
MORF-19862 MG091 MORF-20153 MG092 MORF-20154 MG093 MORF-20155 MG094
MORF-20156 MG095 MORF-19863 MG096 MORF-20157 MG097 MORF-20158 MG098
MORF-20159 MG099 MORF-19864 MORF-20160 MG100 MORF-19865 MORF-20161
MG101 MORF-19866 MG102 MORF-20162 MG103 MORF-19867 MORF-19868 MG104
MORF-20163 MG105 MORF-19869 MG106 MORF-20164 MORF-20165 MG107
MORF-20164 MORF-20165 MG108 MORF-20166 MG109 MORF-20167 MG110
MORF-20168 MG111 MORF-20169 MG112 MORF-20170 MG113 MORF-19870
MORF-20171 MORF-20172 MG114 MORF-20171 MORF-20172 MG116 MORF-19871
MG117 MORF-19872 MG118 MORF-20173 MG119 MORF-19873 MORF-20174 MG120
MORF-19874 MG121 MORF-19875 MORF-20175 MG122 MORF-20176 MG123
MORF-19876 MG124 MORF-20177 MG125 MORF-19877 MG126 MORF-20178 MG127
MORF-20179 MG128 MORF-20180 MG129 MORF-20181 MG130 MORF-20182 MG132
MORF-20183 MG133 MORF-19878 MG134 MORF-20184 MG135 MORF-20185 MG136
MORF-20186 MORF-20187 MG137 MORF-20186 MORF-20187 MG138 MORF-20188
MG139 MORF-19879 MG140 MORF-19884 MG141 MORF-19885 MORF-20192 MG142
MORF-19886 MORF-20193 MG143 MORF-20194 MG144 MORF-19887 MG145
MORF-20195 MG146 MORF-20196 MG147 MORF-19888 MORF-19889 MG148
MORF-19890 MG149 MORF-19891 MG150 MORF-19893 MORF-20197 MG151
MORF-19893 MORF-20198 MG152 MORF-19895 MORF-20199 MG153 MORF-19894
MG154 MORF-19896 MORF-20200 MG156 MORF-19897 MG157 MORF-20201 MG158
MORF-20202 MG159 MORF-19898 MG161 MORF-19900 MORF-20203 MG162
MORF-19899 MORF-19900 MG163 MORF-20204 MG165 MORF-20205 MG166
MORF-19901 MORF-20206 MG167 MORF-19901 MORF-20207 MG168 MORF-19902
MORF-20208 MG169 MORF-20209 MG170 MORF-20210 MG171 MORF-20211 MG172
MORF-20212 MG175 MORF-20213 MG176 MORF-20214 MG177 MORF-19903
MORF-20215 MG178 MORF-20216 MG179 MORF-19904 MORF-20217 MG180
MORF-20218 MG181 MORF-19905 MG182 MORF-20219 MG183 MORF-20219 MG184
MORF-20220 MG185 MORF-20221 MG186 MORF-19907 MG187 MORF-19908
MORF-19909 MORF-20225 MG188 MORF-20226 MORF-20227 MG189 MORF-20226
MORF-20227 MG190 MORF-20228 MG191 MORF-19910 MORF-19911 MORF-20229
MG192 MORF-19911 MORF-19912 MORF-20230 MG194 MORF-19913 MORF-20234
MG195 MORF-20235 MG196 MORF-20236 MG199 MORF-19914 MG200 MORF-19915
MORF-20237 MG201 MORF-19916 MORF-20239 MG202 MORF-19917 MG203
MORF-19918 MORF-19919 MORF-20240 MG204 MORF-20241 MORF-20242 MG205
MORF-20243 MG206 MORF-20244 MG207 MORF-19920 MG208 MORF-19921 MG209
MORF-20245 MG210 MORF-20246 MG211 MORF-19922 MG212 MORF-19924
MORF-20247 MORF-20248 MG213 MORF-20248 MG214 MORF-20249 MG215
MORF-20250 MG216 MORF-20251 MG217 MORF-20252 MG218 MORF-19926
MORF-19927 MORF-20253 MG219 MORF-19928 MORF-19930 MORF-20253 MG220
MORF-19931 MG221 MORF-20255 MG222 MORF-20256 MG223 MORF-19932 MG224
MORF-20257 MG225 MORF-20258 MG226 MORF-20259 MG227 MORF-20260 MG228
MORF-19933 MG229 MORF-19934 MORF-20261 MG230 MORF-19935 MG231
MORF-20262 MG232 MORF-20263 MG234 MORF-20264 MG235 MORF-19936
MORF-20265 MG236 MORF-19937 MG237 MORF-19938 MG238 MORF-19939
MORF-20266 MG239 MORF-20267 MG240 MORF-20268 MG241 MORF-19940
MORF-19941 MORF-19942 MG242 MORF-19943 MG243 MORF-19945 MG244
MORF-20269 MG245 MORF-19946 MG246 MORF-19947 MG247 MORF-20270 MG248
MORF-19948 MG249 MORF-19949 MORF-20271 MG250 MORF-20272 MG251
MORF-19950 MORF-20273 MG252 MORF-20274 MG253 MORF-20275 MG254
MORF-20276 MG255 MORF-19951 MORF-19952 MG256 MORF-19953 MG258
MORF-19954 MORF-20277 MG259 MORF-20278 MG260 MORF-19955 MORF-19956
MORF-20279 MG261 MORF-19958 MORF-20282 MG262 MORF-20283 MG263
MORF-20285 MG264 MORF-20286 MORF-20287 MG265 MORF-20286 MORF-20287
MG266 MORF-20288 MG267 MORF-19959 MORF-19960 MG268 MORF-20290 MG269
MORF-20291 MG270 MORF-20292 MG271 MORF-20293 MG272 MORF-19961
MORF-19962 MORF-20294 MG273 MORF-20295 MG274 MORF-20296 MG275
MORF-20297 MG276 MORF-20298 MG277 MORF-19963 MORF-20299 MG278
MORF-19964 MORF-20300 MG279 MORF-19965 MG280 MORF-19966 MORF-20301
MG281 MORF-19967 MORF-19968 MG282 MORF-20302 MG283 MORF-20303 MG284
MORF-19969 MORF-19970 MORF-19971 MG285 MORF-19969 MORF-19970
MORF-19971 MG286 MORF-19972 MG288 MORF-20306 MG289 MORF-20307 MG290
MORF-20308 MG291 MORF-20309 MG292 MORF-20310 MG293 MORF-20311 MG294
MORF-19974 MORF-20312 MG295 MORF-20313 MG296 MORF-19975 MG297
MORF-20314 MG298 MORF-19976 MORF-20315 MG299 MORF-20316 MG300
MORF-20317 MG301 MORF-19977 MORF-20318 MG302 MORF-19978 MG303
MORF-20319 MG304 MORF-20320 MG305 MORF-19979 MORF-20321 MG306
MORF-19980 MG307 MORF-19981 MORF-19982 MG308 MORF-20323 MG309
MORF-19983 MORF-19984 MG310 MORF-20324 MG311 MORF-20325 MG312
MORF-20326 MG314 MORF-19985 MORF-19986 MG315 MORF-19987 MORF-19988
MORF-20327 MG316 MORF-19988 MORF-20327 MG317 MORF-20328 MORF-20329
MG318 MORF-19989 MORF-19990 MG319 MORF-20330 MG320 MORF-19991 MG321
MORF-19992 MG322 MORF-19993 MORF-20331 MG323 MORF-19994 MORF-20332
MG324 MORF-19995 MORF-20333 MG326 MORF-20334 MG327 MORF-20335 MG328
MORF-19996 MORF-20336 MG329 MORF-19997 MORF-20337 MG330 MORF-20338
MORF-20339 MG331 MORF-20339 MG332 MORF-20340 MG333 MORF-19998 MG334
MORF-20341 MG336 MORF-20343 MORF-20344 MG337 MORF-19999 MG338
MORF-20000 MG339 MORF-20001 MORF-20345 MG340 MORF-20006 MORF-20348
MG341 MORF-20349 MG342 MORF-20350 MG343 MORF-20007 MG344 MORF-20008
MG345 MORF-20351 MG346 MORF-20352 MG348 MORF-20009 MG349 MORF-20010
MG350 MORF-20011 MG351 MORF-20353 MG352 MORF-20354 MG353 MORF-20355
MG354 MORF-20013 MORF-20014 MG355 MORF-20015 MORF-20016 MORF-20356
MG356 MORF-20357 MG357 MORF-20358 MG358 MORF-20017 MORF-20018
MORF-20019 MORF-20359 MG359 MORF-20019 MORF-20359 MORF-20360 MG360
MORF-20361 MG361 MORF-20362 MG362 MORF-20363 MG364 MORF-20364 MG365
MORF-20020 MORF-20365 MG366 MORF-20021 MG367 MORF-20366 MG368
MORF-20022 MORF-20366 MORF-20367 MG369 MORF-20022 MORF-20023 MG370
MORF-20368 MG371 MORF-20368 MORF-20369 MG372 MORF-20370 MG373
MORF-20024 MG374 MORF-20025 MG375 MORF-20371 MG376 MORF-20026 MG377
MORF-20027 MG378 MORF-20372 MG379 MORF-20373 MG380 MORF-20374 MG381
MORF-20028 MG382 MORF-20375 MG383 MORF-20376 MG384 MORF-20029
MORF-20377 MG385 MORF-20031 MORF-20378 MG386 MORF-20032 MORF-20379
MORF-20381 MG387 MORF-20382 MG388 MORF-20383 MG389 MORF-20033 MG390
MORF-20034 MORF-20384 MG391 MORF-20034 MORF-20035 MORF-20385 MG392
MORF-20036 MORF-20037 MORF-20386 MG393 MORF-20038 MG394 MORF-20387
MG395 MORF-20039 MG396 MORF-20388 MG397 MORF-20040 MORF-20041 MG398
MORF-20042 MG399 MORF-20389 MG400 MORF-20390 MG401 MORF-20043
MORF-20391 MG402 MORF-20392 MG403 MORF-20393 MG404 MORF-20394 MG405
MORF-20395 MORF-20396 MG406 MORF-20395 MORF-20396 MG407 MORF-20044
MORF-20397 MG408 MORF-20398 MG409 MORF-20045 MG410 MORF-20046
MORF-20399 MG411 MORF-20400 MG412 MORF-20047 MG413 MORF-20401 MG414
MORF-20048 MG415 MORF-20049 MG416 MORF-20050 MORF-20051 MG417
MORF-20402 MG418 MORF-20052 MG419 MORF-20053 MG420 MORF-20403 MG421
MORF-20404 MG422 MORF-20054 MORF-20055 MG423 MORF-20056 MG425
MORF-20406 MG427 MORF-20057 MG428 MORF-20058 MG429 MORF-20059
MORF-20407 MG430 MORF-20408 MG431 MORF-20409 MG432 MORF-20410 MG433
MORF-20411 MG435 MORF-20060 MORF-20412 MG436 MORF-20060 MORF-20412
MG437 MORF-20413 MG438 MORF-20414 MG439 MORF-20061 MG440 MORF-20062
MG441 MORF-20063 MG442 MORF-20415 MG443 MORF-20064 MG444 MORF-20065
MORF-20416 MG445 MORF-20417 MG447 MORF-20418 MG448 MORF-20419
MORF-20420 MG449 MORF-20419 MORF-20420 MG450 MORF-20066 MG451
MORF-20421 MG452 MORF-20067 MG453 MORF-20422 MG454 MORF-20423
MORF-20424 MG455 MORF-20423 MORF-20424 MG456 MORF-20068 MG457
MORF-20069 MORF-20425 MG458 MORF-20426 MG459 MORF-20070 MG460
MORF-20427 MG461 MORF-20428 MG462 MORF-20429 MG463 MORF-20430 MG464
MORF-20431 MG467 MORF-20432 MG468 MORF-20283 MG469 MORF-20434 MG470
MORF-20071 MORF-20435
[0280]
TABLE 2
UID | end5 | end3 | gene_len
MG016 | 19253 | 19756 | 504
MG017 | 19825 | 20352 | 528
MG027 | 30092 | 30544 | 453
MG028 | 30547 | 31149 | 603
MG064 | 74066 | 77683 | 3618
MG076 | 102870 | 102457 | 414
MG105 | 133569 | 134168 | 600
MG117 | 143310 | 143951 | 642
MG147 | 186138 | 187262 | 1125
MG185 | 211445 | 213547 | 2103
MG186 | 216017 | 216766 | 750
MG199 | 237094 | 236594 | 501
MG202 | 239826 | 240191 | 366
MG207 | 247523 | 247906 | 384
MG211 | 250997 | 251437 | 441
MG223 | 268011 | 269243 | 1233
MG230 | 276166 | 276624 | 459
MG236 | 280663 | 281082 | 420
MG241 | 286884 | 288743 | 1860
MG243 | 290976 | 291323 | 348
MG246 | 293936 | 294778 | 843
MG256 | 306819 | 307586 | 768
MG267 | 325157 | 324813 | 345
MG279 | 341181 | 340528 | 654
MG284 | 346853 | 347248 | 396
MG286 | 348260 | 348847 | 588
MG296 | 364414 | 364028 | 387
MG306 | 377974 | 376796 | 1179
MG321 | 402922 | 400121 | 2802
MG331 | 415622 | 414987 | 636
MG333 | 416716 | 416339 | 378
MG349 | 446576 | 447787 | 1212
MG350 | 447790 | 448722 | 933
MG354 | 451197 | 451607 | 411
MG366 | 462619 | 464619 | 2001
MG372 | 471234 | 470080 | 1155
MG373 | 472066 | 471224 | 843
MG376 | 474892 | 474581 | 312
MG377 | 475479 | 474901 | 579
MG381 | 479570 | 480223 | 654
MG397 | 502420 | 500723 | 1698
MG415 | 520238 | 519929 | 310
MG419 | 523215 | 522355 | 861
MG427 | 533270 | 533692 | 423
MG428 | 533806 | 534318 | 513
MG436 | 542092 | 541739 | 354
MG439 | 545378 | 544563 | 816
MG440 | 546154 | 545381 | 774
MG449 | 553295 | 552864 | 432
MG450 | 554269 | 553559 | 711
MG452 | 555665 | 556447 | 783
MG468 | 318330 | 319202 | 873
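The relationship between the end5, end3 and gene_len columns of Table 2 can be illustrated with a minimal sketch. It assumes 1-based inclusive coordinates, and it assumes that end5 greater than end3 marks a gene on the complementary strand; the latter is the usual convention but is not stated in this table. The function name `orf_length_and_strand` is hypothetical.

```python
from typing import Tuple


def orf_length_and_strand(end5: int, end3: int) -> Tuple[int, str]:
    """Return (length in bp, strand) for 1-based inclusive gene ends.

    Assumption: end5 > end3 means the gene lies on the complementary strand.
    """
    length = abs(end3 - end5) + 1
    strand = "+" if end3 >= end5 else "-"
    return length, strand


# A few rows copied from Table 2: (UID, end5, end3, gene_len)
rows = [
    ("MG016", 19253, 19756, 504),
    ("MG076", 102870, 102457, 414),
    ("MG468", 318330, 319202, 873),
]

for uid, end5, end3, gene_len in rows:
    length, strand = orf_length_and_strand(end5, end3)
    # The last field reports whether the computed length matches the table.
    print(uid, length, strand, length == gene_len)
```

For the rows shown, the computed length agrees with the tabulated gene_len.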
[0281]
TABLE 3 Whole Genome Sequencing Strategy
Stage | Description
Random small insert and large insert library construction | Randomly shear genomic DNA on the order of 2 kb and 15-20 kb, respectively
Library plating | Maximize random selection of small insert and large insert clones for template production
High-throughput DNA sequencing | Sequence xxx,xxxx templates from both ends (>99% genome coverage)
Assembly (TIGR Assembler, GRASTA) | Assembly of sequence fragments into contigs
Gap closure |
a. Physical gaps | Order all contigs into a circular genome and provide templates for closure of all physical gaps
b. Sequence gaps | Complete the genome by primer walking
Editing | Visual inspection and resolution of all sequence ambiguities when possible, including frameshifts
Annotation | Identification and description of all ORF's, putative identification, role assignments
[0282]
TABLE 4 Computer simulation of random sequencing experiments where L = 580,000 and w = 400.
Clones sequenced (n) | Percent of genome unsequenced | Base pairs unsequenced | Number of double strand gaps | Average gap length (bp)
1000 | 50.18 | 291014 | 501 | 580
2000 | 25.18 | 146016 | 503 | 289
4000 | 6.34 | 36759 | 253 | 145
6000 | 1.60 | 9254 | 97 | 96
7250 | 0.67 | 3886 | 48 | 80
8000 | 0.40 | 2330 | 32 | 72
10000 | 0.10 | 586 | 10 | 59
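The values in Table 4 closely track a simple Poisson coverage model, sketched below. This is an assumption made for illustration: the table itself says only that the figures come from a computer simulation, and it does not state the formula. Under the model, with n reads of length w drawn at random from a genome of length L, the expected fraction of unsequenced bases is exp(-nw/L), the expected number of gaps is n times that fraction, and the average gap length is L/n. The function name `poisson_coverage` is hypothetical.

```python
import math


def poisson_coverage(n: int, L: int = 580_000, w: int = 400) -> dict:
    """Expected coverage statistics under a Poisson random-sequencing model.

    Assumption: reads start uniformly at random and independently, so the
    probability that a given base is not covered is exp(-n*w/L).
    """
    c = n * w / L            # average coverage (reads per base)
    p0 = math.exp(-c)        # probability a base is unsequenced
    return {
        "pct_unsequenced": 100 * p0,
        "bp_unsequenced": L * p0,
        "gaps": n * p0,
        "avg_gap_bp": L / n,  # equals bp_unsequenced / gaps
    }


for n in (1000, 2000, 4000, 6000, 7250, 8000, 10000):
    r = poisson_coverage(n)
    print(
        n,
        round(r["pct_unsequenced"], 2),
        round(r["bp_unsequenced"]),
        round(r["gaps"]),
        round(r["avg_gap_bp"]),
    )
```

Small differences between the model output and the tabulated numbers are to be expected, since the table reports a simulation rather than a closed-form calculation.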
[0283]
TABLE 5 Mycoplasma genitalium - EcoRI fragments
5' Enzyme | Start Res | 3' Enzyme | End Res | Length | MW
EcoRI | 572231 | EcoRI | 1530 | 9367 | 5763365
EcoRI | 1531 | EcoRI | 6723 | 5193 | 3195384
EcoRI | 6724 | EcoRI | 15283 | 8560 | 5266795
EcoRI | 15284 | EcoRI | 25781 | 10498 | 6459359
EcoRI | 25782 | EcoRI | 35532 | 9751 | 5999831
EcoRI | 35533 | EcoRI | 39821 | 4289 | 2639037
EcoRI | 39822 | EcoRI | 43179 | 3358 | 2066196
EcoRI | 43180 | EcoRI | 43707 | 528 | 324906
EcoRI | 43708 | EcoRI | 49410 | 5703 | 3509174
EcoRI | 49411 | EcoRI | 62708 | 13298 | 8182420
EcoRI | 62709 | EcoRI | 71387 | 8679 | 5340230
EcoRI | 71388 | EcoRI | 80769 | 9382 | 5772840
EcoRI | 80770 | EcoRI | 84845 | 4076 | 2507946
EcoRI | 84846 | EcoRI | 89622 | 4777 | 2939580
EcoRI | 89623 | EcoRI | 93383 | 3761 | 2314332
EcoRI | 93384 | EcoRI | 94573 | 1190 | 732268
EcoRI | 94574 | EcoRI | 102229 | 7656 | 4710994
EcoRI | 102230 | EcoRI | 107347 | 5118 | 3149292
EcoRI | 107348 | EcoRI | 110797 | 3450 | 2122895
EcoRI | 110798 | EcoRI | 114909 | 4112 | 2530290
EcoRI | 114910 | EcoRI | 116440 | 1531 | 942140
EcoRI | 116441 | EcoRI | 137514 | 21074 | 12967294
EcoRI | 137515 | EcoRI | 144092 | 6578 | 4047534
EcoRI | 144093 | EcoRI | 155336 | 11244 | 6918646
EcoRI | 155337 | EcoRI | 162136 | 6800 | 4184109
EcoRI | 162137 | EcoRI | 163907 | 1771 | 1089750
EcoRI | 163908 | EcoRI | 169816 | 5909 | 3636217
EcoRI | 169817 | EcoRI | 171885 | 2069 | 1273325
EcoRI | 171886 | EcoRI | 176630 | 4745 | 2920129
EcoRI | 176631 | EcoRI | 221880 | 45250 | 27844584
EcoRI | 221881 | EcoRI | 225692 | 3812 | 2345923
EcoRI | 225693 | EcoRI | 228254 | 2562 | 1576700
EcoRI | 228255 | EcoRI | 277826 | 49572 | 30503951
EcoRI | 277827 | EcoRI | 282740 | 4914 | 3023818
EcoRI | 282741 | EcoRI | 285470 | 2730 | 1679928
EcoRI | 285471 | EcoRI | 292152 | 6682 | 4111409
EcoRI | 292153 | EcoRI | 293879 | 1727 | 1062607
EcoRI | 293880 | EcoRI | 312725 | 18846 | 11596154
EcoRI | 312726 | EcoRI | 347231 | 34506 | 21232617
EcoRI | 347232 | EcoRI | 352330 | 5099 | 3137714
EcoRI | 352331 | EcoRI | 362310 | 9980 | 6140434
EcoRI | 362311 | EcoRI | 377990 | 15680 | 9648201
EcoRI | 377991 | EcoRI | 390080 | 12090 | 7439090
EcoRI | 390081 | EcoRI | 402043 | 11963 | 7361170
EcoRI | 402044 | EcoRI | 408452 | 6409 | 3943775
EcoRI | 408453 | EcoRI | 419230 | 10778 | 6631662
EcoRI | 419231 | EcoRI | 422653 | 3423 | 2106066
EcoRI | 422654 | EcoRI | 425383 | 2730 | 1679735
EcoRI | 425384 | EcoRI | 426391 | 1008 | 620235
EcoRI | 426392 | EcoRI | 439467 | 13076 | 8046286
EcoRI | 439468 | EcoRI | 444297 | 4830 | 2971763
EcoRI | 444298 | EcoRI | 444940 | 643 | 395631
EcoRI | 444941 | EcoRI | 452525 | 7585 | 4667018
EcoRI | 452526 | EcoRI | 455595 | 3070 | 1888976
EcoRI | 455596 | EcoRI | 461533 | 5938 | 3653550
EcoRI | 461534 | EcoRI | 467016 | 5483 | 3373523
EcoRI | 467017 | EcoRI | 483871 | 16855 | 10370549
EcoRI | 483872 | EcoRI | 487269 | 3398 | 2090889
EcoRI | 487270 | EcoRI | 488085 | 816 | 502090
EcoRI | 488086 | EcoRI | 488496 | 411 | 252914
EcoRI | 488497 | EcoRI | 498574 | 10078 | 6201025
EcoRI | 498575 | EcoRI | 499113 | 539 | 331666
EcoRI | 499114 | EcoRI | 516146 | 17033 | 10480304
EcoRI | 516147 | EcoRI | 524998 | 8852 | 5446303
EcoRI | 524999 | EcoRI | 527362 | 2364 | 1454583
EcoRI | 527363 | EcoRI | 529777 | 2415 | 1485826
EcoRI | 529778 | EcoRI | 530256 | 479 | 294749
EcoRI | 530257 | EcoRI | 531045 | 789 | 485489
EcoRI | 531046 | EcoRI | 533591 | 2546 | 1566584
EcoRI | 533592 | EcoRI | 549000 | 15409 | 9480966
EcoRI | 549001 | EcoRI | 550638 | 1638 | 1007852
EcoRI | 550639 | EcoRI | 563713 | 13075 | 8045103
EcoRI | 563714 | EcoRI | 566925 | 3212 | 1976345
EcoRI | 566926 | EcoRI | 572230 | 5305 | 3264227
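The Length column of Table 5 follows from the start and end residues, with the first fragment wrapping around the origin of the circular genome. The sketch below shows that calculation under stated assumptions: coordinates are 1-based and inclusive, and the genome length used (`GENOME_LEN = 580067`) is back-calculated from the wrap-around row of this table rather than quoted from the text. The function name `fragment_length` is hypothetical.

```python
def fragment_length(start: int, end: int, genome_len: int) -> int:
    """Length of a restriction fragment on a circular genome.

    Coordinates are assumed to be 1-based and inclusive. When end < start
    the fragment is taken to span the origin.
    """
    if end >= start:
        return end - start + 1
    return (genome_len - start + 1) + end


# Assumption: inferred from the first row of Table 5 (572231 -> 1530, 9367 bp).
GENOME_LEN = 580_067

print(fragment_length(1531, 6723, GENOME_LEN))      # ordinary fragment
print(fragment_length(572231, 1530, GENOME_LEN))    # fragment spanning the origin
```

The same function can be applied to every row as a consistency check on the tabulated lengths.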
[0284]
MG# Identification MatchAcc % ID Length MG# Identification
MatchAcc % ID Length *MG394 Uridine Kinase (udk) (Escherichia coli)
SP: P31218 34.5 204 *MG390 arginyl-tRNA synthetase (argS)
(Corynebacterium glutamicum) SP: P35868 33.6 431 Purine
ribonucleotide biosynthesis *MG114 asparaginyl-tRNA synthetase
(asnS) (Escherichia coli) GP: M33145_1 41.5 449 *MG107 5' guanylate
kinase (gmk) (Escherichia coli) GP: L10328_14 42.6 183 *MG036
aspartyl-tRNA synthetase (aspS) (Thermus aquaticus) SP: P36419 40.9
563 *MG175 adenylate kinase (adk) (Bacillus stearothermophilus) GP:
M88104_2 32.2 210 *MG258 cysteinyl-tRNA synthetase (cysS) (Bacillus
subtilis) GP: D26185_158 34.3 437 *MG058 phosphoribosylpyrophosphate
synthetase (prs) GP: D26185_114 44.4 310 *MG474 glutamyl-tRNA
synthetase (gltX) (Bacillus stearothermophilus) GP: M55072_1 42.9
480 (Bacillus subtilis) Pyrimidine ribonucleotide biosynthesis
*MG256 glycyl-tRNA synthetase (Bombyx mori) GP: L06106_1 35.9 574
Salvage of nucleosides and nucleotides *MG035 histidyl-tRNA
synthetase (hisS) (Mycobacterium leprae) GP: U00011_2 30.7 386
*MG284 adenine phosphoribosyltransferase (apt) GP: M14040_1 34.1
153 *MG357 Isoleucyl-tRNA synthetase (ileS) (Escherichia coli) SP:
P00958 33.3 921 (Escherichia coli) *MG052 cytidine deaminase (cdd)
(Mycoplasma pirum) GP: L13289_4 38.2 121 *MG274 leucyl-tRNA
synthetase (leuS) (Bacillus stearothermophilus) GP: M88581_1 43.4
799 *MG340 cytidylate kinase (cmk) (Bacillus subtilis) SP: P38493
40.4 215 *MG137 lysyl-tRNA synthetase (lysS) (Bacillus subtilis)
GP: D26185_144 45.6 490 MG276 deoxyguanosine/deoxyadenosine
kinase(I) subunit 2 GP: U01881_2 29.5 164 *MG377 methionyl-tRNA
formyltransferase (fmt) (Escherichia coli) GP: X63668_2 24.1 304
(Lactobacillus acidophilus) *MG021 methionyl-tRNA synthetase (metS)
(Bacillus subtilis) GP: D26185_101 37.5 515 *MG470
hypoxanthine-guanine phosphoribosyltransferase SP: Q02522 38.4 170
*MG085 peptidyl-tRNA hydrolase homolog (pth) (Borrelia burgdorferi)
GP: L32144_1 38.2 154 (hpt) (Lactococcus lactis) *MG201
phenylalanyl-tRNA synthetase beta chain (pheT) (Bacillus subtilis)
SP: P17922 26.0 677 *MG048 purine-nucleoside phosphorylase (deoD)
GP: U14003_295 44.3 228 *MG200 phenylalanyl-tRNA synthetase
beta-subunit (pheS) (Escherichia coli) GP: V00291_5 35.1 320
(Escherichia coli) *MG034 thymidine kinase (Bacillus subtilis) GP:
M97678_5 48.1 187 *MG292 prolyl-tRNA synthetase (proS) (Escherichia
coli) GP: M97858_1 22.7 438 MG051 thymidine phosphorylase (deoA)
GP: L13289_3 52.7 416 *MG005 Seryl-tRNA synthetase (serS) (Bacillus
subtilis) GP: D26185_77 42.6 416 (Mycoplasma pirum) *MG030 uracil
phosphoribosyltransferase (upp) GP: Z27121_3 44.9 206 *MG387
threonyl-tRNA synthetase (thrSv) (Bacillus subtilis) GP: M36594_1
38.7 558 (Mycoplasma hominis) Sugar-nucleotide biosynthesis and
conversions *MG457 tRNA (guanine-N1)-methyltransferase (trmD)
(Salmonella SP: P36245 40.8 223 *MG119 UDP-glucose 4-epimerase
(galE) (Escherichia coli) SP: P09147 34.1 322 typhimurium) *MG465
UDP-glucose pyrophosphorylase (gtaB) GP: L12272_1 48.0 277 *MG127
tryptophanyl-tRNA synthetase (trpS) (Bacillus subtilis) GP:
M24068_1 41.2 324 (Bacillus subtilis) *MG466 tyrosyl tRNA
synthetase (tyrS) (Bacillus stearothermophilus) GP: M77668_1 38.5
418 Regulatory functions *MG344 valyl-tRNA synthetase (valS)
(Bacillus subtilis) SP: O06873 38.5 857 *MG396 GTP-binding protein
(obg) (Bacillus subtilis) GP: M24537_2 39.6 426 Degradation of
proteins, peptides, and glycopeptides *MG399 GTP-binding protein
era homolog (spg) SP: P37214 27.4 273 *MG334 aminopeptidase P
(pepP) (Escherichia coli) GP: D00398_1 30.5 254 (Streptococcus
mutans) *MG460 pilB homolog transcription repressor GP: Z33052_1
53.5 128 *MG403 aminopeptidase (Mycoplasma salivarium) GP: D17450_1
44.6 303 (Mycoplasma capricolum) *MG420 PILB protein MOTIF
(Neisseria gonorrhoeae) SP: P14930 49.2 127 *MG244 ATP-dependent
protease (lon) (Bacillus subtilis) SP: P37945 43.6 753 *MG105
virulence associated protein homolog (vacB) GP: U14003_91 29.2 560
*MG367 ATP-dependent protease binding subunit (ctpB) (Escherichia
coli) GP: M29364_2 47.7 709 (Escherichia coli) *MG067 glutamic acid
specific protease prepropetide (Staphylococcus GP: D00730_1 28.8
250 Replication aureus) Degradation of DNA MG224 IgA1 protease
(Haemophilus influenzae) GP: M87491_1 32.2 675 MG032 ATP-dependent
nuclease (addA) (Bacillus subtilis) GP: M63489_1 26.8 706 MG186
oligoendopeptidase F (pepF) (Lactococcus lactis) GP: Z32522_1 30.0
442 MG240 endonuclease IV (nfo) (Escherichia coli) SP: P12638 29.4
267 MG321 proline iminopeptidase (pip) (Bacillus coagulans) GP:
D11037_1 29.2 209 DNA replication, restriction, modification,
recombination, and repair MG020 proline iminopeptidase (pip)
(Neisseria gonorrhoeae) GP: Z25461_2 37.5 281 *MG481 chromosomal
replication initiator protein (dnaA) SP: P34028 30.9 432 *MG046
sialoglycoprotease (gcp) (Pasteurella haemolytica) GP: M62384_1
36.4 313 (Spiroplasma citri) *MG210 DNA gyrase subunit A
(Mycoplasma genitalium) GP: U09251_4 37.4 782 Nucleoproteins *MG004
DNA gyrase subunit A (Mycoplasma genitalium) GP: U09251_4 99.9 835
Protein modification and translation factors *MG003 DNA gyrase
subunit B (gyrB) GP: U09251_3 99.2 645 *MG090 elongation factor G
(fus) (Thermus aquaticus) SP: P13551 59.2 683 (Mycoplasma
genitalium) *MG249 DNA helicase II (mutB1) (Haemophilus
influenzae) GP: M99049_1 36.0 715 *MG026 elongation factor P (efp)
(Escherichia coli) GP: U14003_62 26.4 162 *MG259 DNA ligase (lig)
(Escherichia coli) GP: M24278_1 38.2 657 *MG445 elongation factor
Ts (tsf) (Spiroplasma citri) GP: M31161_2 39.1 294 *MG269 DNA
polymerase I (polI) MOTIF GP: L11920_1 29.9 837 *MG463 elongation
factor TU (tuf) (Mycoplasma genitalium) SP: P13927 100.0 383
(Mycobacterium tuberculosis) *MG031 DNA polymerase III (polC)
(Mycoplasma pulmonis) GP: U06833_1 38.1 1352 *MG176 methionine
amino peptidase (Bacillus subtilis) GP: D00619_5 36.3 245 *MG001
DNA polymerase III beta subunit (dnaN) GP: U09251_1 100.0 97 *MG263
peptide chain release factor I (RF-1) (Escherichia coli) GP:
M11519_1 43.2 320 (Mycoplasma genitalium) MG007 DNA polymerase III
subunit (dnaH) MOTIF GP: D26185_83 22.7 142 *MG108 polypeptide
deformylase (formylmethionine deformylase) (def) SP: P27251 36.9
107 (Bacillus subtilis) *MG432 DNA polymerase III subunit (dnaH)
GP: D26185_83 49.1 224 MOTIF (Escherichia coli) (Bacillus subtilis)
*MG268 DNA polymerase III, alpha chain (dnaE) GP: M19334_4 31.9 843
MG109 protein phosphatase 2C homolog (ptc1) MOTIF (Saccharomyces
SP: P35182 27.5 141 *MG010 DNA primase (dnaE) MOTIF (Clostridium
SP: P33655 25.7 174 cerevisiae) acetobutylicum) (Escherichia coli)
*MG255 DNA primase (dnaE) (Bacillus subtilis) GP: M10040_1 27.3 587
MG110 protein serine/threonine kinase MOTIF (Arabidopsis thaliana)
PIR: S36944 33.7 242 *MG123 DNA topoisomerase I (topA) (Bacillus
subtilis) GP: L27797_2 38.9 658 *MG146 protein synthesis initiation
factor 2 (inIB) (Bacillus subtilis) GP: M34836_1 48.0 619 *MG433
excinuclease ABC subunit A (uvrA) SP: P07671 47.8 842 *MG447
ribosome releasing factor (irr) (Escherichia coli) GP: D26552_57
34.9 169 (Escherichia coli) *MG075 excinuclease ABC subunit B
(uvrB) SP: P07025 48.0 662 *MG291 transcription elongation factor
(greA) (Rickettsia prowazekii) SP: P27640 40.1 135 (Escherichia
coli) *MG270 formamidopyrimidine-DNA glycosylase (tpg) SP: P19210
37.6 272 *MG202 translation initiation factor IF3 (inIC) (Bacillus
stearothermophilus) GP: X16188_1 31.3 133 (Bacillus firmus) *MG391
glucose inhibited division protein (gidA) GP: L10328_106 40.3 600
Ribosomal proteins synthesis and modification (Escherichia coli)
*MG392 glucose inhibited division protein (gidB) GP: L10328_105
24.8 143 *MG084 ribosomal protein L1 (rpL1) (Bacillus
stearothermophilus) SP: P04447 48.2 221 (Escherichia coli) *MG370
Holliday junction DNA helicase (ruvA) GP: M21298_1 26.2 153 *MG373
ribosomal protein L10 (rpL10) (Thermotoga maritima) SP: P29394 29.8
162 (Escherichia coli) *MG371 Holliday junction DNA helicase (ruvB)
GP: M21298_2 34.7 297 *MG083 ribosomal protein L11 (RPL11)
(Thermotoga maritima) SP: P29395 51.8 140 (Escherichia coli) MG187
methyltransferase (ssoIM) GP: M97479_2 42.5 314 *MG430 ribosomal
protein L13 (Escherichia coli) SP: P02410 39.9 137 (Shigella
sonnei) *MG349 recombination protein (recA) GP: L25893_1 46.6 292
*MG165 ribosomal protein L14 (rpL14) (Bacillus stearothermophilus)
SP: P04450 63.1 121 (Staphylococcus aureus) *MG095 replicative DNA
helicase (dnaB) SP: P03005 33.1 439 *MG173 ribosomal protein L15
(rpL15) (Mycoplasma capricolum) SP: P10138 41.9 144 (Escherichia
coli) MG450 restriction-modification enzyme EcoD GP: J01631_1 24.6
390 *MG162 ribosomal protein L16 (rpL16) (Mycoplasma capricolum)
SP: P02415 63.5 136 specificity subunit (nsdS) (Escherichia coli)
*MG161 ribosomal protein L17 (rpL17) (Bacillus subtilis) GP:
M26414_6 34.8 115 *MG047 S-adenosylmethionine synthetase 2 (metX)
SP: P30869 43.8 363 *MG171 ribosomal protein L18 (rpL18) (Bacillus
stearothermophilus) GP: M57624_1 43.0 113 (Escherichia coli) *MG092
single-stranded DNA binding protein (ssb) GP: U04997_2 21.8 162
*MG456 ribosomal protein L19 (rpL19) (Bacillus stearothermophilus)
SP: P30529 49.1 111 (Haemophilus influenzae) *MG209 topoisomerase
II subunit B (topIIB) GP: L35044_2 52.4 630 *MG158 ribosomal
protein L2 (rpL2) (Bacillus stearothermophilus) SP: P04257 58.4 273
(Mycoplasma gallisepticum) *MG098 uracil DNA glycosylase (ung)
(Escherichia coli) GP: D13169_3 32.6 217 *MG238 ribosomal protein
L21 (rpL21) (Bacillus subtilis) SP: P26908 37.9 98 *MG160 ribosomal
protein L22 (rpL22) (Mycoplasma-like organism) GP: M74770_4 49.0
103 Transcription *MG157 ribosomal protein L23 (Bacillus
stearothermophilus) SP: P04454 38.7 89 Degradation of RNA *MG166
ribosomal protein L24 (Bacillus stearothermophilus) SP: P04455 44.6
83 *MG379 ribonuclease III (rnc) (Escherichia coli) GP: X02673_1
30.2 118 *MG239 ribosomal protein L27 (rpL27) (Bacillus subtilis)
GP: K02665_2 64.4 88 *MG477 RNaseP C5 subunit (Mycoplasma
capricolum) GP: D14982_2 40.0 78 *MG163 ribosomal protein L29
(Thermotoga maritima) SP: P38514 41.7 59 RNA synthesis,
modification, and DNA transcription *MG155 ribosomal protein L3
(rpL3) (Mycoplasma capricolum) SP: P10134 42.6 213 *MG319
ATP-dependent RNA helicase (deaD) SP: P23304 23.1 369 *MG335
ribosomal protein L33 (Bacillus stearothermophilus) SP: P23375 58.1
42 (Escherichia coli) *MG437 ATP-dependent RNA helicase (deaD) SP:
P23304 32.4 390 *MG478 ribosomal protein L34 (rpL34) (Escherichia
coli) GP: L10328_67 67.4 45 (Escherichia coli) *MG352 DNA-directed
RNA polymerase beta' chain (rpoC) SP: P00577 44.5 1348 *MG156
ribosomal protein L4 (rpL4) (Bacillus stearothermophilus) SP:
P28601 39.2 205 (Escherichia coli) *MG018 helicase (mol1) MOTIF
(Saccharomyces cerevisiae) SP: P32333 36.5 502 *MG167 ribosomal
protein L5 (rpL5) (Bacillus stearothermophilus) SP: P08895 57.5 178
*MG145 N-utilization substance protein A homolog (nusA) SP: P32727
30.9 360 *MG170 ribosomal protein L6 (rpL6) (Mycoplasma capricolum)
SP: P04446 46.4 179 (Bacillus subtilis) *MG180 RNA polymerase
alpha-core-subunit (rpoA) GP: M26414_5 39.4 295 *MG374 ribosomal
protein L7/L12 (`A`type) (rpL7/L12)(Bacillus subtilis) SP: P02394
47.5 118 (Bacillus subtilis) *MG353 RNA polymerase beta-subunit
(rpoB) GP: L24376_3 46.5 1144 *MG094 ribosomal protein L9 (rpL9)
(Bacillus stearothermophilus) GP: M57623_1 32.9 148 (Bacillus
subtilis) MG022 RNA polymerase delta-subunit (rpoE) GP: M21677_1
28.7 152 *MG154 ribosomal protein S10 (rpS10) (Thermotoga maritima)
SP: P38518 48.9 91 (Bacillus subtilis) *MG254 RNA polymerase
sigma-A factor (sigA) SP: P33656 43.7 370 *MG179 ribosomal protein
S11 (rpS11) (Escherichia coli) GP: X02543_2 47.8 112 (Clostridium
acetobutylicum) *MG054 transcription antitermination factor (nusG)
GP: D13303_4 30.9 171 *MG088 ribosomal protein S12 (rpS12)
(Bacillus stearothermophilus) SP: P09901 75.4 133 (Bacillus
subtilis) *MG178 ribosomal protein S13 (rpS13) (Bacillus subtilis)
GP: M26414_3 63.3 119 Translation *MG168 ribosomal protein S14
(Mycoplasma capricolum) GP: X06414_15 70.0 59 Amino acyl tRNA
synthetases and tRNA modification *MG438 ribosomal protein S15
(BS18) (Bacillus stearothermophilus) SP: P05768 48.1 80 *MG303
alanyl-tRNA-synthetase (alaS) (Escherichia coli) GP: J01581_1 33.8
795 *MG458 ribosomal protein S16 (BS17) (Bacillus subtilis) SP:
P21474 48.8 81 Amino acid biosynthesis Central Intermediary
metabolism Aromatic amino acid family Amino sugars Aspartate family
Degradation of polysaccharides Branched chain family *MG222
bifunctional endo-1,4-beta-xylanase xyla precursor MOTIF SP: P29126
37.6 240 Glutamate family (Ruminococcus flavefaciens) Pyruvate
family Other Serine family *MG369 acetate kinase (Bacillus
subtilis) GP: L17320_2 42.7 391 *MG406 serine
hydroxymethyltransferase (glyA) SP: P06192 55.3 397 *MG038 glycerol
kinase (glpK) (Escherichia coli) GP: L19201_68 46.8 498 (Salmonella
typhimurium) *MG304 glycerophosphoryl diester phosphodiesterase
(glpO) (Bacillus subtilis) SP: P37965 30.4 235 Biosynthesis of
cofactors, prosthetic groups, and carriers *MG310
phosphotransacetylase (Clostridium acetobutylicum) SP: P39648 44.7
320 Biotin Phosphorus compounds Folic acid *MG363 Inorganic
pyrophosphatase (ppa) (Thermoplasma acidophilum) SP: P37981 38.9
156 *MG013 5,10-methylene-tetrahydrofolate dehydrogenase GP:
D10588_1 33.0 238 Polyamine biosynthesis (foID) (Escherichia coli)
Polysaccharides --(cytoplasmic) *MG234 dihydrofolate reductase GP:
X60681_1 33.1 166 Sulfur metabolism (Lactococcus lactis) Heme and
porphyrin *MG264 protoporphyrinogen oxidase (hemK) GP: D28567_2
30.6 160 Energy metabolism (Escherichia coli) Lipoate Aerobic
Menaquinone and ubiquinone *MG039 glycerol-3-phosphate dehydrogenase
(GUT2) (Saccharomyces PIR: S48379 43.2 212 Molybdopterin
cerevisiae) Pantothenate *MG472 L-lactate dehydrogenase (ldh)
(Mycoplasma hyopneumoniae) SP: P33572 50.3 312 Pyridoxine MG283 NADH
oxidase (nox) (Enterococcus faecalis) SP: P37061 39.2 433
Riboflavin Amino acids and amines Thioredoxin, glutaredoxin, and
glutathione Anaerobic ATP-proton motive force interconversion Cell
envelope *MG410 ATP synthase epsilon chain (atpC) (Mycoplasma
gallisepticum) SP: P33255 36.9 129 Membranes, lipoproteins, and
porins *MG411 ATP synthase beta chain (atpD) (Mycoplasma
gallisepticum) SP: P33253 81.0 377 MG328 fibronectin-binding
protein (fnbA) GP: J04151_1 24.6 913 *MG412 ATP synthase gamma
chain (atpG) (Mycoplasma gallisepticum) SP: P33257 37.9 285
(Staphylococcus aureus) MG040 membrane lipoprotein (tmpC)
(Treponema pallidum) SP: P29724 30.9 248 *MG413 ATP synthase alpha
chain (atpA) (Mycoplasma gallisepticum) SP: P33252 63.4 517 *MG087
prolipoprotein diacylglyceryl transferase GP: L13259_2 29.1 261
*MG414 ATP synthase delta chain (atpH) (Mycoplasma gallisepticum)
SP: P33254 33.9 168 (Salmonella typhimurium) Murein sacculus and
peptidoglycan *MG415 ATP synthase B chain (atpF) (Mycoplasma
gallisepticum) SP: P33258 36.6 192 Surface polysaccharides,
lipopolysaccharides and antigens *MG416 ATP synthase C chain (atpE)
(Mycoplasma gallisepticum) SP: P33258 50.0 77 *MG368 lic-1 operon
protein (licA) MOTIF GP: M27280_1 27.8 152 *MG417
adenosinetriphosphatase (atpB) (Mycoplasma gallisepticum) GP:
X64256_2 35.7 292 (Haemophilus influenzae) *MG060
lipopolysaccharide biosynthesis protein SP: P26401 36.1 185
Electron transport (rfbV) MOTIF (Salmonella typhimurium)
Entner-Doudoroff *MG277 surface protein antigen precursor (pag) GP:
D90354_1 25.5 797 Fermentation MOTIF (Streptococcus sobrinus)
Gluconeogenesis Surface structures Glycolysis MG196 attachment
protein (mgpA) SP: P20796 100.0 1443 *MG063 1-phosphofructokinase
(fruK) (Escherichia coli) SP: P23539 26.3 268 (Mycoplasma
genitalium) MG190 attachment protein repeat (mgpA) SP: P20796 36.6
903 *MG220 6-phosphofructokinase (phosphofructokinase)
(phosphohexokinase) SP: P20275 39.4 321 (Mycoplasma genitalium)
MG267 attachment protein repeat (mgpA) SP: P20796 38.0 963
(Spiroplasma citri) (Mycoplasma genitalium) MG188 attachment protein
repeat (mgpA) SP: P20796 61.8 943 *MG419 enolase (Bacillus
subtilis) GP: L29475_4 54.1 425 (Mycoplasma
genitalium) MG069 attachment protein repeat (mgpA) SP: P20796 76.4
760 *MG023 fructose-bisphosphate aldolase (tsr) (Bacillus subtilis)
GP: M22039_4 46.0 282 (Mycoplasma genitalium) MG189 attachment
protein repeat (mgpA) SP: P20796 77.9 763 *MG312
glyceraldehyde-3-phosphate dehydrogenase (gap) (Clostridium GP:
X72219_1 56.1 329 (Mycoplasma genitalium) MG232 attachment protein
repeat (mgpA) SP: P20796 78.2 86 pasteurianum) (Mycoplasma
genitalium) MG297 attachment protein repeat (mgpA) SP: P20796 80.2
756 *MG112 phosphoglucose isomerase B (pgiB) (Bacillus
stearothermophilus) SP: P13376 34.8 424 (Mycoplasma genitalium)
MG141 attachment protein repeat (mgpA) SP: P20796 80.3 753 *MG311
phosphoglycerate kinase (Thermotoga maritima) SP: P36204 51.3 383
(Mycoplasma genitalium) *MG198 attachment protein repeat (mgpA) SP:
P20796 81.3 753 MG442 phosphoglycerate mutase (pgm) (Bacillus
subtilis) GP: L29475_3 45.2 510 (Mycoplasma genitalium) MG266
attachment protein repeat (mgpA) SP: P20796 82.2 753 *MG221
pyruvate kinase (pyk) (Lactococcus lactis) GP: L07920_2 35.3 467
(Mycoplasma genitalium) MG351 attachment protein repeat (mgpA) SP:
P20796 84.3 734 *MG443 triosephosphate isomerase (tim) (Thermotoga
maritima) GP: L27492_1 39.8 247 (Mycoplasma genitalium) *MG398
Cytadherence-accessory protein (hmw1) GP: U11381_1 34.1 876 Pentose
phosphate pathway (Mycoplasma pneumoniae) MG323
Cytadherence-accessory protein (hmw1) GP: U11381_1 39.3 1015 *MG272
6-phosphogluconate dehydrogenase (gnd) (Escherichia coli) GP:
M64324_1 29.9 440 (Mycoplasma pneumoniae) *MG327
Cytadherence-accessory protein (hmw3) GP: M82965_1 41.1 669 *MG066
transketolase 1 (TK 1) (tk1A) (Escherichia coli) SP: P27302 32.8
647 (Mycoplasma pneumoniae) Pyruvate dehydrogenase Cellular
processes *MG280 dihydrolipoamide acetyltransferase (pdhC)
(Acholeplasma laidlawii) GP: M81753_3 45.2 524 Cell division *MG279
lipoamide dehydrogenase component (E3) of pyruvate dehydrogenase
SP: P11959 38.4 453 *MG469 cell division protein (ftsH) (Bacillus
subtilis) GP: D26185_132 49.7 627 complex dihydrolipoamide dehydrog
*MG308 cell division protein (ftsY) (Escherichia coli) GP:
U00039_18 36.1 323 *MG282 pyruvate dehydrogenase E1-alpha subunit
(pdhA) (Acholeplasma GP: M81753_1 43.0 341 *MG229 cell division
protein (ftsZ) (Staphylococcus aureus) GP: U06462_1 30.9 274
laidlawii) Cell killing *MG281 pyruvate dehydrogenase E1-beta subunit
(pdhB) (Acholeplasma GP: M81753_2 55.0 317 *MG150 hemolysin (ftyC)
(Serpulina hyodysenteriae) GP: X73141_2 26.3 234 laidlawii) MG225
pre-procytotoxin (Helicobacter pylori) GP: Z26883_1 36.1 789 Sugars
Chaperones *MG113 D-ribulose-5-phosphate 3 epimerase (ctxEc)
(Alcaligenes eutrophus) GP: M64173_3 33.1 175 *MG404 groEL protein
(Bacillus stearothermophilus) GP: L10132_2 51.5 524 *MG050
deoxyribose-phosphate aldolase (deoC) (Mycoplasma pneumoniae) GP:
X13544_1 83.0 223 *MG206 heat shock protein (dnaJ) MOTIF (Coxiella
burnetii) GP: L36455_1 33.6 349 MG408 galactosidase
acetyltransferase (Streptococcus mutans) GP: M80797_2 40.3 135
MG002 heat shock protein (dnaJ) MOTIF SP: P35514 40.0 60 *MG053
phosphomannomutase (cpsG) (Mycoplasma pirum) GP: L13289_5 38.6 534
(Lactococcus lactis) *MG019 heat shock protein (dnaJ) (Lactococcus
lactis) SP: P35514 34.0 357 TCA cycle *MG207 heat shock protein
(grpE) (Bacillus subtilis) GP: M84964_2 31.7 158 *MG405 heat shock
protein 60 (GroEL) like protein GP: D17398_1 39.6 87 Fatty acid and
phospholipid metabolism (PggroES) (Porphyromonas gingivalis) *MG217
1-acyl-sn-glycerol-3-phosphate acetyltransferase (ptsC) (Borrelia
GP: L32881_1 32.1 119 *MG316 heat shock protein 70 (HSP70) GP:
D30690_3 57.3 580 burgdorferi) (Staphylococcus aureus)
Detoxification *MG448 CDP-diglyceride synthetase (cdsA)
(Escherichia coli) GP: M11330_1 38.0 120 *MG008 thiophene and furan
oxidizer (tohF) GP: D26185_60 31.9 456 MG380 fatty acid
phospholipid synthesis protein (ptsX) (Escherichia coli) GP:
M96793_1 29.0 327 (Bacillus subtilis) Protein and peptide secretion
MG086 hydroxymethylglutaryl-CoA reductase (NADPH) PIR: S24760 23.3
502 (Nicotiana sylvestris) *MG139 GTP-binding membrane protein
(lepA) GP: K00426_1 47.5 589 *MG115 phosphatidylglycerophosphate
synthase (pgsA) (Escherichia coli) GP: M12299_2 29.3 156
(Escherichia coli) *MG182 haemolysin secretion ATP-binding protein
SP: P11599 34.6 236 (hlyB) MOTIF (Proteus vulgaris) Purines,
Pyrimidines, nucleosides, and nucleotides *MG074 preprotein
translocase (secA) (Bacillus subtilis) GP: D10279_2 43.7 764
2'-Deoxyribonucleotide metabolism *MG174 preprotein translocase
secY subunit (SecY) SP: P10250 38.8 449 *MG237
ribonucleoside-diphosphate reductase (nrdE) (Salmonella GP:
X73226_1 54.1 703 (Mycoplasma capricolum) MG215 prolipoprotein
signal peptidase (lsp) GP: M83994_1 32.4 145 typhimurium)
(Staphylococcus aureus) *MG049 signal recognition particle protein
(lfh) SP: P37105 43.0 439 *MG235 ribonucleotide reductase 2 (nrdF)
(Salmonella typhimurium) SP: P17424 50.0 313 (Bacillus subtilis)
*MG243 trigger factor (tig) (Escherichia coli) GP: M34066_1 24.6 391
*MG125 thioredoxin (trx) (Bacillus subtilis) GP: J03294_1 36.1 98
Transformation *MG103 thioredoxin reductase (trxB) (Escherichia
coli) GP: J03762_1 38.6 299 MG326 competence locus E (comE3) MOTIF
GP: L15202_4 30.5 239 *MG233 thymidylate synthase (thyA)
(Staphylococcus aureus) SP: P13954 56.6 311 (Bacillus subtilis)
Nucleotide and nucleoside interconversions *MG164 ribosomal protein
S17 (Mycoplasma capricolum) SP: P10131 51.2 82 *MG309 115 kDa
protein (p115) (Mycoplasma hyorhinis) GP: M34958_1 33.4 975 *MG093
ribosomal protein S18 (rpS18) (Escherichia coli) GP: U14003_114
45.5 64 *MG065 heterocyst maturation protein (devA) (Anabaena sp.)
GP: X75422_1 35.3 221 *MG159 ribosomal protein S19 (Escherichia
coli) GP: X02613_6 58.6 86 *MG479 heterocyst maturation protein
(devA) (Anabaena sp.) GP: X75422_1 39.9 198 *MG072 ribosomal
protein S2 (Spirulina platensis) SP: P34831 34.8 247 MG100
hydrolase (aux2) (Agrobacterium rhizogenes) GP: M61151_1 32.1 458
*MG161 ribosomal protein S3 (rpS3) SP: P02353 46.7 212 *MG223
macrogolgin (Homo sapiens) PIR: S37538 25.3 3055 (Mycoplasma
capricolum) *MG322 ribosomal protein S4 (rpS4) (Bacillus subtilis)
GP: M59358_1 43.0 197 *MG337 magnesium-chelatase 30 kDa subunit
(bchO) (Rhodobacter SP: P26174 26.7 245 *MG172 ribosomal protein S5
(Bacillus stearothermophilus) GP: M57621_1 56.0 157 capsulatus)
*MG012 ribosomal protein S6 modification protein (nmK) SP: P17116
31.5 127 *MG315 membrane associated ATPase (cbrO)
(Propionibacterium GP: U13043_1 30.0 227 MOTIF (Escherichia coli)
freudenreichii) *MG091 ribosomal protein S6 (Escherichia coli) SP:
P02358 23.9 87 MG376 mobilization protein (mob13) MOTIF
(Leuconostoc oenos) GP: M95954_1 30.9 161 *MG089 ribosomal protein
S7 (rpS7) SP: P22744 64.9 153 MG372 muc8 protein (muc8) (Salmonella
typhimurium) SP: P14303 22.1 331 (Bacillus stearothermophilus)
*MG169 ribosomal protein S8 (Mycoplasma capricolum) SP: P04446
46.9 125 *MG346 nitrogen fixation protein (nifS) (Mycobacterium
leprae) GP: U00013_6 26.2 358 *MG429 ribosomal protein S9 (rpS9)
SP: P07842 52.0 125 MG296 nodulation protein F (host-specificity of
nodulation protein A) SP: P04686 34.9 86 (Bacillus
stearothermophilus) (Rhizobium leguminosarum) Transport and binding
protein MG299 protein L (Peptostreptococcus magnus) GP: L04466_1
31.1 663 Amino acids, peptides and amines *MG338 protein V (IcrV)
(Streptococcus sp.) GP: X62467_1 28.3 478 MG231 aromatic amino acid
transport protein (aroP) GP: D26562_11 24.6 389 *MG149 protein X
(Pseudomonas fluorescens) GP: M35367_1 29.1 280 (Escherichia coli)
*MG314 membrane transport protein (gtnQ) GP: M61017_1 32.0 219
MG132 protein X (Spiroplasma citri) GP: M31161_3 21.6 88 (Bacillus
stearothermophilus) *MG183 membrane transport protein (gtnQ) GP:
M61017_1 37.4 210 *MG288 sensory rhodopsin II transducer (hiril)
MOTIF (Natronobacterium GP: Z35088_1 15.7 208 (Bacillus
stearothermophilus) *MG081 oligopeptide transport ATP-binding
protein SP: P18765 47.9 336 pharaonis) (amiE) (Streptococcus
pneumoniae) *MG059 small protein (smpB) (Escherichia coli) GP:
D12501_1 32.6 128 *MG082 oligopeptide transport ATP-binding protein
(amiF) SP: P18766 46.6 250 (Streptococcus pneumoniae) Hypothetical
*MG080 oligopeptide transport system permease protein SP: P26904
33.5 269 MG142 hypothetical 130K protein (P1 operon) MOTIF
(Mycoplasma PIR: JS0069 55.4 512 (dciAC) (Bacillus subtilis)
pneumoniae) *MG079 oligopeptide transport system permease protein
SP: P24138 28.1 308 MG199 hypothetical 130K protein (P1 operon)
(Mycoplasma pneumoniae) PIR: JS0069 45.2 570 (oppB) (Bacillus
subtilis) MG195 hypothetical 28K protein (P1 operon) (Mycoplasma
pneumoniae) PIR: JS0068 61.7 239 *MG042 spermidine/putrescine
transport ATP-binding protein GP: M64519_1 41.9 262 *MG342
hypothetical protein (GB: D10165_3) (Escherichia coli) GP: D10165_3
28.9 233 (potA) (Escherichia coli) *MG227 hypothetical protein (GB:
D10483_63) (Escherichia coli) GP: D10483_63 35.2 304 *MG043
spermidine/putrescine transport system permease GP: M64519_2 26.5
221 *MG476 hypothetical protein (GB: D14982_3) (Mycoplasma
capricolum) GP: D14982_3 32.0 377 protein (potB) (Escherichia coli)
MG455 hypothetical protein (GB: D16311_1) (Bacillus subtilis) GP:
D16311_1 26.2 267 *MG044 spermidine/putrescine transport system
permease GP: M64519_3 29.5 252 MG383 hypothetical protein (GB:
D26185_10) (Bacillus subtilis) GP: D26185_10 25.8 221 protein
(potC) (Escherichia coli) *MG009 hypothetical protein (GB:
D26185_102) (Bacillus subtilis) GP: D26185_102 35.4 249 Anions
MG057 hypothetical protein (GB: D26185_104) (Bacillus subtilis) GP:
D26185_104 28.9 175 *MG422 peripheral membrane protein B (pstB) GP:
L10328_89 50.8 244 *MG024 hypothetical protein (GB: D26185_50)
(Bacillus subtilis) GP: D26185_50 51.1 363 (Escherichia coli) MG421
peripheral membrane protein U (Escherichia coli) GP: L10328_88 27.0
169 *MG006 hypothetical protein (GB: D26185_92) (Bacillus subtilis)
GP: D26185_92 41.5 178 *MG423 periplasmic phosphate permease
homolog (AG88) GP: X75297_1 30.8 254 *MG056 hypothetical protein
(GB: D26185_99) (Bacillus subtilis) GP: D26185_99 29.3 275
(Mycobacterium tuberculosis) MG333 hypothetical protein (GB:
D37799_6) (Bacillus subtilis) GP: D37799_6 27.6 211 Carbohydrates,
organic alcohols, and acids MG459 hypothetical protein (GB:
L08897_1) (Mycoplasma gallisepticum) GP: L08897_1 34.1 138 *MG192
ATP-binding protein (msmK) GP: M77351_7 40.5 357 MG218 hypothetical
protein (GB: L09228_16) (Bacillus subtilis) GP: L09228_16 27.1 238
(Streptococcus mutans) *MG062 fructose-permease IIBC component
(truA) SP: P20966 42.7 416 MG219 hypothetical protein (GB:
L09228_17) (Bacillus subtilis) GP: L09228_17 34.9 174 (Escherichia
coli) *MG033 glycerol uptake facilitator (glpF) GP: M99611_2 35.9
189 *MG273 hypothetical protein (GB: L10328_61) (Escherichia coli)
GP: L10328_61 27.2 267 (Bacillus subtilis) MG061 hexosephosphate
transport protein (uhpT) GP: M89480_4 30.9 158 *MG271 hypothetical
protein (GB: L10328_61) (Escherichia coli) GP: L10328_61 27.8 250
(Salmonella typhimurium) *MG193 membrane protein (msmF)
(Streptococcus mutans) GP: M77351_4 22.5 263 MG126 hypothetical
protein (GB: L10328_61) (Escherichia coli) GP: L10328_61 31.9 252
MG194 membrane protein (msmG) GP: M77351_5 27.1 272 MG140
hypothetical protein (GB: L18927_2) (Buchnera aphidicola) GP:
L18927_2 28.6 68 *MG120 methylgalactoside permease ATP-binding
protein GP: M59444_2 33.2 487 MG152 hypothetical protein (GB:
L18965_6) (Thermophilic bacterial sp.) GP: L18965_6 25.3 170 (mglA)
(Escherichia coli) MG305 hypothetical protein (GB: L19201_18)
(Escherichia coli) GP: L19201_18 23.1 328 *MG441 PEP-dependent HPr
protein kinase GP: M69050_2 46.5 570 MG029 hypothetical protein
(GB: L19300_1) (Staphylococcus aureus) GP: L19300_1 27.0 109
phosphoryltransferase (ptsI) MG425 hypothetical protein (GB:
L22432_4) (Mycoplasma capricolum) GP: L22432_4 25.0 94
(Staphylococcus carnosus) MG041 phosphohistidinoprotein-hexose
phosphotransferase GP: L22432_2 48.9 86 MG250 hypothetical protein
(GB: M12965_1) (Escherichia coli) GP: M12965_1 33.8 64 (ptsH)
(Mycoplasma capricolum) *MG135 hypothetical protein (GB: M38777_3)
(Escherichia coli) GP: M38777_3 28.6 98 *MG071 phosphotransferase
enzyme II, ABC component SP: P20166 43.2 620 *MG358 hypothetical
protein (GB: M65289_3) (Bacillus stearothermophilus) GP: M65289_3
38.0 155 (ptsG) (Bacillus subtilis) MG211 hypothetical protein (GB:
M84964_1) (Bacillus subtilis) GP: M84964_1 30.7 341 MG130 PTS
glucose-specific permease GP: U12340_1 25.5 108 MG124 hypothetical
protein (GB: M91593_1) (Mycoplasma mycoides) GP: M91593_1 24.0 249
(Bacillus stearothermophilus) *MG121 ribose transport system
permease protein RBSC SP: P36948 27.5 199 MG245 hypothetical
protein (GB: M91593_1) (Mycoplasma mycoides) GP: M91593_1 27.8 130
(Bacillus subtilis) Cations MG131 hypothetical protein (GB:
M91593_1) (Mycoplasma mycoides) GP: M91593_1 30.7 246 MG073
cation-transporting ATPase (pacL) SP: P37278 34.4 887 *MG400
hypothetical protein (GB: U00016_19) (Mycobacterium leprae) GP:
U00016_19 30.9 106 (Synechococcus sp) Nucleosides, purines and
pyrimidines *MG129 hypothetical protein (GB: U00021_19)
(Mycobacterium leprae) GP: U00021_19 27.7 152 Other *MG454
hypothetical protein (GB: U00021_5) (Mycobacterium leprae) GP:
U00021_5 26.9 150 *MG301 ATP-binding protein P29 (Mycoplasma
hyorhinis) SP: P15361 32.3 227 *MG339 hypothetical protein (GB:
U00021_5) (Mycobacterium leprae) GP: U00021_5 32.9 430 *MG402
lactococcin transport ATP-binding protein (lcnDR3) SP: P37608 22.3
654 MG364 hypothetical protein (GB: U11883_2) (Bacillus subtilis)
GP: U11883_2 33.3 167 (Lactococcus lactis) MG230 hypothetical
protein (GB: U14003_71) (Escherichia coli) GP: U14003_71 22.0 481
MG332 Na+ ATPase subunit J (ntpJ) (Enterococcus hirae) GP:
D17462_11 31.1 436 *MG111 hypothetical protein (GB: U14003_76)
(Escherichia coli) GP: U14003_76 28.6 230 MG300 protein P37
precursor (Mycoplasma hyorhinis) SP: P15363 35.8 331 MG473
hypothetical protein (GB: X73124_94) (Bacillus subtilis) GP:
X73124_94 40.0 68 *MG014 transport ATP-binding protein (msbA) SP:
P27299 28.1 518 MG265 hypothetical protein (GB: Z32651_1)
(Mycoplasma pneumoniae) GP: Z32651_1 57.1 41 (Escherichia coli)
*MG015 transport ATP-binding protein (msbA) SP: P27299 32.2 482
*MG257 hypothetical protein (GB: Z33076_2) (Mycoplasma capricolum)
GP: Z33076_2 37.7 210 (Escherichia coli) *MG418 transport system
permease protein P69 MOTIF SP: P15362 40.0 252 *MG147 hypothetical
protein (SP: P09170) (Escherichia coli) SP: P09170 24.1 109
(Mycoplasma hyorhinis) *MG128 hypothetical protein (SP: P19434)
(Streptomyces viridochromogenes) SP: P19434 26.0 105 MG302 transport
system permease protein P69 SP: P15362 27.9 524 *MG226 hypothetical
protein (SP: P22186) (Escherichia coli) SP: P22186 28.9 148
(Mycoplasma hyorhinis) *MG382 hypothetical protein (SP: P23851)
(Escherichia coli) SP: P23851 27.0 253 Other categories *MG214
hypothetical protein (SP: P23851) (Escherichia coli) SP: P23851
30.5 295 Adaptations and atypical conditions *MG306 hypothetical
protein (SP: P25745) (Escherichia coli) SP: P25745 34.7 123 MG467
osmotically inducible protein (osmC) SP: P23929 28.4 88 MG444
hypothetical protein (SP: P27712) (Spiroplasma citri) SP: P27712
28.4 231 (Escherichia coli) MG640 phosphate limitation protein
(sphX) GP: D26161_1 30.9 271 *MG252 hypothetical protein (SP:
P31056) (Escherichia coli) SP: P31058 33.0 180 (Synechococcus sp)
MG482 SpoOJ regulator MOTIF (Bacillus subtilis) GP: D26185_55 27.5
245 MG116 hypothetical protein (SP: P31131) (Escherichia coli) SP:
P31131 32.6 45 *MG285 spore germination apparatus protein (gerBB)
GP: L16960_2 31.2 128 *MG359 hypothetical protein (SP: P32049)
(Escherichia coli) SP: P32049 28.5 128 MOTIF (Bacillus subtilis)
MG480 hypothetical protein (SP: P32049) (Escherichia coli) SP:
P32049 28.5 128 MG395 sporulation protein (outB) MOTIF (Bacillus
subtilis) GP: M15811_1 36.4 235 *MG133 hypothetical protein (SP:
P32083) (Mycoplasma hyorhinis) SP: P32083 30.1 102 Colicin-related
functions *MG122 hypothetical protein (SP: P32720) (Escherichia
coli) SP: P32720 30.9 132 Drug and analog sensitivity MG138
hypothetical protein (SP: P37747) (Escherichia coli) SP: P37747
34.1 363 *MG475 high level kasugamycin resistance (ksgA)
GP: D26185_105 35.6 224 *MG345 hypothetical protein (SP: P38424)
(Bacillus subtilis) SP: P38424 33.9 167 (Bacillus subtilis)
Phage-related functions and prophages *MG136 hypothetical protein 4
(Trypanosoma brucei) PIR: E22845 30.8 302 Radiation sensitivity
*MG286 stringent response-like protein (rel) (Streptococcus
equisimilis) GP: X72832_5 29.1 713 Transposon-related functions
MG338 U3 protein (Bacillus subtilis) GP: Z18629_1 27.1 272 Other
MG278 $$F protein (Escherichia coli) GP: U14003_297 38.3 302
[0285]
TABLE 7 Summary of gene content in H. influenzae and M. genitalium sorted by functional category

Biological role | H. influenzae | M. genitalium
Amino acid biosynthesis | 68 (6.8%) | 1 (0.3%)
Biosynthesis of cofactors | 54 (5.4%) | 3 (0.8%)
Cell envelope | 84 (8.3%) | 21 (5.8%)
Cellular processes | 53 (5.3%) | 21 (5.8%)
  Cell division | 16 | 3
  Cell killing | 5 | 2
  Chaperones | 6 | 7
  Detoxification | 3 | 1
  Protein secretion | 15 | 7
  Transformation | 8 | 1
Central intermediary metabolism | 30 (3%) | 6 (1.7%)
Energy metabolism | 112 (10.4%) | 31 (8.5%)
  Aerobic | 4 | 3
  Amino acids and amines | 4 | 0
  Anaerobic | 24 | 0
  ATP-proton force interconversion | 9 | 8
  Electron transport | 9 | 0
  Entner-Doudoroff | 9 | 0
  Fermentation | 8 | 0
  Gluconeogenesis | 2 | 0
  Glycolysis | 10 | 10
  Pentose phosphate pathway | 3 | 2
  Pyruvate dehydrogenase | 4 | 4
  Sugars | 15 | 4
  TCA cycle | 11 | 0
Fatty acid and phospholipid metabolism | 25 (2.5%) | 5 (1.4%)
Purines, pyrimidines, nucleosides and nucleotides | 53 (5.3%) | 20 (5.4%)
  2' Deoxyribonucleotide metabolism | 8 | 5
  Nucleotide and nucleoside interconversions | 3 | 1
  Purine ribonucleotide biosynthesis | 18 | 3
  Pyrimidine ribonucleotide biosynthesis | 5 | 0
  Salvage of nucleosides and nucleotides | 13 | 9
  Sugar-nucleotide biosynthesis and conversions | 6 | 2
Regulatory functions | 64 (6.3%) | 5 (1.4%)
Replication | 87 (8.6%) | 32 (8.8%)
  Degradation of DNA | 8 | 2
  DNA replication, restriction, modification, recombination and repair | 76 | 30
Transcription | 27 (2.7%) | 12 (3.3%)
  Degradation of RNA | 10 | 2
  RNA synthesis, modification and DNA transcription | 17 | 10
Translation | 141 (14%) | 90 (24.7%)
Transport and binding proteins | 123 (12.2%) | 34 (9.3%)
  Amino acids and peptides | 38 | 10
  Anions | 8 | 3
  Carbohydrates | 30 | 12
  Cations | 24 | 1
  Other transporters | 22 | 8
Other Categories | 93 (9.2%) | 23 (6.3%)
Unassigned role | 736 (43%) | 178 (37%)
  No database match | 389 | 117
  Match hypothetical proteins | 347 | 61
[0286]
References