Nucleotide sequence of the Mycoplasma genitalium genome, fragments thereof, and uses thereof Fraser, Claire M. ; et al. [Johns Hopkins University]

Patent Application Summary

U.S. patent application number 10/205220 was filed with the patent office on 2002-07-26 and published on 2003-09-11 for nucleotide sequence of the Mycoplasma genitalium genome, fragments thereof, and uses thereof. This patent application is currently assigned to Johns Hopkins University. Invention is credited to Adams, Mark D., Fraser, Claire M., Gocayne, Jeannine D., Hutchison, Clyde A. III, Smith, Hamilton O., Venter, J. Craig, White, Owen R.

Application Number	20030170663 10/205220
Family ID	27413253
Publication Date	2003-09-11

United States Patent Application	20030170663
Kind Code	A1
Fraser, Claire M. ; et al.	September 11, 2003

Nucleotide sequence of the Mycoplasma genitalium genome, fragments thereof, and uses thereof

Abstract

The present invention provides the nucleotide sequence of the entire genome of Mycoplasma genitalium, SEQ ID NO: 1. The present invention further provides the sequence information stored on computer readable media, and computer-based systems and methods which facilitate its use. In addition to the entire genomic sequence, the present invention identifies protein encoding fragments of the genome, and identifies, by position relative to two (2) genes known to flank the origin of replication, any regulatory elements which modulate the expression of the protein encoding fragments of the Mycoplasma genitalium genome.

Inventors:	Fraser, Claire M.; (Potomac, MD) ; Adams, Mark D.; (Rockville, MD) ; Gocayne, Jeannine D.; (Potomac, MD) ; Hutchison, Clyde A. III; (Chapel Hill, MD) ; Smith, Hamilton O.; (Reisterstown, MD) ; Venter, J. Craig; (Queenstown, MD) ; White, Owen R.; (Rockville, MD)
Correspondence Address:	HUMAN GENOME SCIENCES INC 9410 KEY WEST AVENUE ROCKVILLE MD 20850
Assignee:	Johns Hopkins University Baltimore MD
Family ID:	27413253
Appl. No.:	10/205220
Filed:	July 26, 2002

Related U.S. Patent Documents


Application Number	Filing Date	Patent Number
10205220	Jul 26, 2002
08545528	Oct 19, 1995
08545528	Oct 19, 1995
08488018	Jun 7, 1995
08545528	Oct 19, 1995
08473545	Jun 7, 1995

Current U.S. Class:	435/6.12 ; 435/183; 435/252.3; 435/320.1; 435/6.15; 435/69.1; 536/23.7
Current CPC Class:	C07K 14/30 20130101; A61K 38/00 20130101
Class at Publication:	435/6 ; 435/69.1; 435/183; 435/252.3; 435/320.1; 536/23.7
International Class:	C12Q 001/68; C07H 021/04; C12N 009/00; C12N 001/21; C12P 021/02

Government Interests

[0002] Part of the work performed during development of this invention utilized U.S. Government funds. The U.S. Government may have certain rights in the invention--DE-FC02-95ER61962.A000; NP-838C; NIH-AI08998, AI33161, and HL19171.

Claims

What is claimed is:

1. An isolated polynucleotide comprising the nucleotide sequence of any one of the fragments of SEQ ID NO: 1 depicted in Tables 1a, 1c and 2, or a degenerate variant thereof.

2. An isolated polynucleotide complementary to the polynucleotide of claim 1.

3. The isolated polynucleotide of claim 1, wherein said polynucleotide comprises a heterologous nucleic acid sequence.

4. A method for making a recombinant vector comprising inserting the isolated polynucleotide of claim 1 into a vector.

5. A recombinant vector comprising the isolated polynucleotide of claim 1.

6. The recombinant vector of claim 5, wherein said polynucleotide is operably associated with a heterologous regulatory sequence that controls gene expression.

7. A recombinant host cell comprising the isolated polynucleotide of claim 1.

8. The recombinant host cell of claim 7, wherein said polynucleotide is operably associated with a heterologous regulatory sequence that controls gene expression.

9. An isolated polynucleotide comprising a nucleic acid sequence which hybridizes under hybridization conditions comprising hybridization in 5× SSC and 50% formamide at 50° C. and washing in a wash buffer consisting of 0.5× SSC at 65° C., to the complementary strand of any one of the fragments of SEQ ID NO: 1 depicted in Tables 1a, 1c and 2, or a degenerate variant thereof.

10. An isolated polynucleotide complementary to the polynucleotide of claim 9.

11. The isolated polynucleotide of claim 9, wherein said polynucleotide comprises a heterologous nucleic acid sequence.

12. A recombinant vector comprising the isolated polynucleotide of claim 9.

13. A recombinant host cell comprising the isolated polynucleotide of claim 9.

14. An isolated polynucleotide comprising at least 50 contiguous nucleotides of any one of the fragments of SEQ ID NO: 1 depicted in Tables 1a, 1c and 2, or a degenerate variant thereof.

15. The isolated polynucleotide of claim 14, wherein said polynucleotide comprises at least 100 contiguous nucleotides of any one of the fragments of SEQ ID NO: 1 depicted in Tables 1a, 1c and 2, or a degenerate variant thereof.

16. An isolated polynucleotide complementary to the polynucleotide of claim 14.

17. The isolated polynucleotide of claim 14, wherein said polynucleotide comprises a heterologous nucleic acid sequence.

18. A recombinant vector comprising the isolated polynucleotide of claim 14.

19. A recombinant host cell comprising the isolated polynucleotide of claim 14.

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application is a divisional of and claims priority under 35 U.S.C. § 120 to U.S. application Ser. No. 08/545,528, filed Oct. 19, 1995, which is a continuation-in-part of and claims priority under 35 U.S.C. § 120 to U.S. application Ser. Nos. 08/488,018 and 08/473,545, both filed Jun. 7, 1995. U.S. application Ser. Nos. 08/488,018 and 08/473,545 are each hereby incorporated herein by reference.

REFERENCE TO SEQUENCE LISTING

[0003] This application refers to a "Sequence Listing" listed below, which is provided as an electronic document on two identical compact discs (CD-R), labeled "Copy 1" and "Copy 2." These compact discs each contain the file "PB196P1D1.ST25.txt" (735,244 bytes, created on Jun. 24, 2002), which is hereby incorporated in its entirety herein.

FIELD OF THE INVENTION

[0004] The present invention relates to the field of molecular biology. The invention discloses compositions comprising the nucleotide sequence of Mycoplasma genitalium, fragments thereof, and their use in medical diagnostics, therapies and pharmaceutical development.

BACKGROUND OF THE INVENTION

[0005] Mycoplasmas are the smallest free-living bacterial organisms known (Colman, S. D. et al., Mol. Microbiol. 4:683-687 (1990)). Mycoplasmas are thought to have evolved from higher gram-positive bacteria through the loss of genetic material (Bailey, C. C. et al., J. Bacteriol. 176:5814-5819 (1994)). Mycoplasma genitalium (M. genitalium) is widely considered to be the smallest self-replicating biological system, as the molecular size of its genome has been shown to be only 570-600 kb (Pyle, L. E. et al., Nucleic Acids Res. 16(13):6015-6025 (1988); Peterson, S. N. et al., J. Bacteriol. 175:7918-7930 (1993)). All mycoplasmas lack a cell wall and have small genomes and a characteristically low G+C content (Razin, S., Microbiol. Rev. 49(4):419-455 (1985); Peterson, S. N. et al., J. Bacteriol. 175:7918-7930 (1993)). Some mycoplasmas, including M. genitalium, have a specialized codon usage, whereby UGA encodes tryptophan rather than serving as a stop codon (Inamine, J. M. et al., J. Bacteriol. 172:504-506 (1990); Tanaka, J. G. et al., Nucleic Acids Res. 19:6787-6792 (1991); Yamao, F. A. et al., Proc. Natl. Acad. Sci. USA 82:2306-2309 (1985)).
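The effect of the UGA reassignment on translation can be illustrated with a minimal sketch. The function names (codon_table, translate) and the short example sequence are hypothetical and are not taken from SEQ ID NO: 1; the amino acid string is the standard genetic code written in TCAG codon order, with the single TGA change applied on request.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order ("*" marks a stop).
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def codon_table(tga_is_trp):
    """Build a codon -> amino acid table, optionally reading TGA as tryptophan."""
    codons = ["".join(c) for c in product(BASES, repeat=3)]
    table = dict(zip(codons, STANDARD_AA))
    if tga_is_trp:
        table["TGA"] = "W"  # mycoplasma usage: UGA encodes Trp instead of stop
    return table


def translate(dna, tga_is_trp=True):
    """Translate a DNA coding strand from its first base until a stop codon."""
    table = codon_table(tga_is_trp)
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = table[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical example: ATG TGA AAA TGG TAA
example = "ATGTGAAAATGGTAA"
print(translate(example))                    # mycoplasma reading
print(translate(example, tga_is_trp=False))  # standard reading stops at TGA
```

Under the standard code the in-frame TGA would truncate the product, which is one reason expression of mycoplasma genes in other hosts requires care.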

[0006] Mycoplasmas are widely known to be significant pathogens of humans, animals, and plants (Bailey, C. C. et al., J. Bacteriol. 176:5814-5819 (1994)). The metabolic systems of mycoplasmas indicate that they are generally biosynthetically deficient, and thus depend on the microenvironment of the host by characteristically adhering to host cells in order to obtain essential precursor molecules, i.e., amino acids, fatty acids and sterols etc. (Baseman, J. B., 1987. Mycoplasma Cell Membranes, Vol. 20. The Plenum Press, New York, N.Y.).

[0007] In particular, M. genitalium, a newly discovered species, is a pathogenic etiological agent first isolated in 1980 from the urethras of human males infected with non-gonococcal urethritis (Tully, J. G. et al., Lancet 1:1288-1291 (1981); Tully, J. G., et al., Int. J. Syst. Bacteriol. 33:387-396 (1983)). M. genitalium has also been identified in specimens of pneumonia patients as a co-isolate of Mycoplasma pneumoniae (Baseman, J. B. et al., J. Clin. Microbiol. 26:2266-2269 (1988)). M. genitalium opportunistic infection has often been observed in individuals infected with human immunodeficiency virus type 1 (HIV-1) (Lo, S. -C. et al., Amer. J. Trop. Med. Hyg. 41:601-616 (1989); Sasaki, Y. et al., AIDS Res. Hum. Retrov. 9(8):775-780 (1993)). Mycoplasmas can also induce various cytokines, including tumor necrosis factor, which may enhance HIV replication (Chowdhury, I. H. et al., Biochem. Biophys. Res. Commun. 170:1365-1370 (1990)).

[0008] A high amino acid homology exists between the attachment protein of M. genitalium and the aligned proteins of several human Class II major histocompatibility complex proteins (HLA), suggesting that M. genitalium infection may play an important role in triggering autoimmune mechanisms, thereby aggravating the immunodeficiency characteristics of acquired immune deficiency syndrome (AIDS) (Montagnier, L. et al., C.R. Acad. Sci. Paris 311(3):425-430 (1990); Root-Bernstein, R. S. et al., Res. Immunol. 142:519-523 (1991); Bisset, L. R. Autoimmunity 14:167-168 (1992)). A diagnostic immunoassay for detecting M. genitalium infection using monoclonal antibodies specific for some M. genitalium antigens has been developed. Baseman, J. B. et al., U.S. Pat. No. 5,158,870.

[0009] Due to its diminutive genomic size, M. genitalium provides a useful model for determining the minimum number of genes and protein products necessary for a host-independent existence. M. genitalium has a characteristically small number of base pairs and a low G+C content, which, along with its UGA tryptophan codon, have hampered sequencing efforts by conventional techniques (Razin, S., Microbiol. Rev. 49(4):419-455 (1985); Colman, S. D. et al., Gene 87:91-96 (1990); Dybvig, K. 1992. Gene Transfer In: Maniloff, J. (ed.) Mycoplasmas: Molecular Biology and Pathogenesis., Am. Soc. Microbiol. Washington, D.C., pp. 355-362). M. genitalium possesses a single circular chromosome (Colman, S. D. et al., Gene 87:91-96 (1990); Peterson, S. N. et al., J. Bacteriol. 175:7918-7930 (1993)). The characterization of the genome of M. genitalium has also been hampered by the lack of auxotrophic mutants and by the lack of a system for genetic exchange, precluding reverse genetic approaches. Thus, the sequencing of the M. genitalium genome would enhance the understanding of how M. genitalium causes or promotes various invasive or immunodeficiency diseases and of how best to medically combat mycoplasma infection.

[0010] Prior attempts at characterizing the structure and gene arrangement of the chromosomes of mycoplasmas using pulsed-field gel electrophoretic methods (Pyle, L. E. et al., Nucleic Acids Res. 16(13):6015-6025 (1988); Neimark, H. C. et al., Nucleic Acids Res. 18(18):5443-5448 (1990)), indicated that mycoplasmas have genomes ranging widely in size. Southern blot hybridization of digested DNAs of M. genitalium compared to the well-known human pathogen, M. pneumoniae, indicated overall low homology values of approximately 6-8% (Yogev, D. et al., Int. J. Syst. Bacteriol. 36(3):426-430 (1986)). However, high homologies have been reported between the adhesin genes of M. genitalium and M. pneumoniae (Dallo, S. F. et al., Microbial Path. 6:69-73 (1989)). Initial studies at characterizing the genome of M. genitalium by comparison to the well-known M. pneumoniae species, indicated that both species have three (3) rRNA genes clustered together in a chromosomal segment of about 5 kb and form a single operon organized in classical procaryotic fashion, but differences exist between their respective restriction sites (Yogev, D. et al., Int. J. Syst. Bacteriol. 36(3):426-430 (1986)).

[0011] Restriction enzyme mapping of M. genitalium indicates that the genome is approximately 600 kb. Several genes have also been mapped, including the single ribosomal operon, and the gene encoding the MgPa cytadhesion protein (Su, C. J. et al., J. Bacteriol. 172:4705-4707 (1990); Colman, S. D. et al., Mol. Microbiol. 4(4):683-687 (1990)). The entire restriction map of the genome of M. genitalium has also been cloned in an ordered library of 20 overlapping cosmids and one λ clone (Lucier, T. S. et al., Gene 150:27-34 (1994)).

[0012] An initial study using random sequencing techniques to characterize the M. genitalium genome resulted in forty-four (44) random clones being partially sequenced; several long open reading frames were also found (Peterson, S. N. et al., Nucleic Acids Res. 19:6027-6031 (1991)). Subsequent work using random sequencing of 508 random nonidentical clones has allowed sequence information to be compiled for approximately seventeen percent (17%) (100,993 nucleotides) of the M. genitalium genome (Peterson, S. N. et al., J. Bacteriol. 175:7918-7930 (1993)). Sequence information indicates that the diminutive genome of M. genitalium contains numerous genes involved in various metabolic processes. The genome is estimated to encode approximately 390 proteins, indicating that M. genitalium makes very efficient use of its limited amount of DNA (Peterson, S. N. et al., J. Bacteriol. 175:7918-7930 (1993)).
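The figures quoted in this paragraph can be cross-checked with simple arithmetic. The sketch below assumes that the 17% figure is a rounded value, so the results are approximate; the variable names are illustrative only.

```python
# Figures quoted in the paragraph above.
sequenced_nt = 100_993      # nucleotides compiled from 508 random clones
fraction_of_genome = 0.17   # "approximately seventeen percent" (rounded)
estimated_proteins = 390    # estimated number of encoded proteins

# Genome size implied by the sequenced fraction.
implied_genome_bp = sequenced_nt / fraction_of_genome

# Average amount of DNA available per encoded protein,
# counting coding and non-coding sequence together.
mean_bp_per_protein = implied_genome_bp / estimated_proteins

print(f"implied genome size: about {implied_genome_bp / 1000:.0f} kb")
print(f"DNA per protein:     about {mean_bp_per_protein:.0f} bp")
```

The implied size falls within the 570-600 kb range reported from physical mapping, and the small amount of DNA per estimated protein is consistent with the statement that the organism makes efficient use of its genome.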

[0013] Several studies have been undertaken to sequence and characterize individual genes identified in M. genitalium. In particular, the medically important aspects of M. genitalium have helped to direct interest to those genes which determine the degree of infectivity and the virulence characteristics of the organism. The nucleotide sequence and deduced amino acid sequence for the MgPa adhesin gene, i.e., the gene encoding the surface cytadhesion protein of M. genitalium, indicates that the complete gene contains 4,335 nucleotides coding for a protein of 159,668 Da. (Dallo, S. F. et al., Infect. Immun. 57(4):1059-1065 (1989)). Furthermore, subsequent nucleotide sequencing of the M. genitalium MgPa adhesin gene revealed the specific codon order for this important gene (Inamine, J. M. et al., Gene 82:259-267 (1989)). The MgPa adhesin gene also has been shown to express restriction fragment length polymorphism (Dallo, S. F. et al., Microbial Path. 10:475-480 (1991)). Nucleotide homology to the well-known highly conserved procaryotic origin-of-replication gene (gyrA) was noted for M. genitalium (Bailey, C. C. et al., J. Bacteriol. 176:5814-5819 (1994)). The highly conserved procaryotic elongation factor, Tu, encoded by the tuf gene, has been noted and sequenced for M. genitalium, and was found to contain an open reading frame encoding a protein of approximately 393 amino acids (Loechel, S. et al., Nucleic Acids Res. 17(23):10127 (1989)). The tuf gene of M. genitalium has also been determined to use a signal other than a Shine-Dalgarno (ribosomal binding site) sequence preceding the initiation codon (Loechel, S. et al., Nucleic Acids Res. 19:6905-6911 (1991)).

SUMMARY OF THE INVENTION

[0014] The present invention is based on the sequencing of the Mycoplasma genitalium genome. The primary nucleotide sequence which was generated is provided in SEQ ID NO: 1.

[0015] The present invention provides the generated nucleotide sequence of the Mycoplasma genitalium genome, or a representative fragment thereof, in a form which can be readily used, analyzed, and interpreted by a skilled artisan. In one embodiment, the present invention is provided as a contiguous string of primary sequence information corresponding to the nucleotide sequence depicted in SEQ ID NO: 1.

[0016] The present invention further provides nucleotide sequences which are at least 99.9% identical to the nucleotide sequence of SEQ ID NO: 1.

[0017] The nucleotide sequence of SEQ ID NO: 1, a representative fragment thereof, or a nucleotide sequence which is at least 99.9% identical to the nucleotide sequence of SEQ ID NO: 1 may be provided in a variety of mediums to facilitate its use. In one application of this embodiment, the sequences of the present invention are recorded on computer readable media. Such media include, but are not limited to: magnetic storage media, such as floppy discs, hard disc storage medium, and magnetic tape; optical storage media such as CD-ROM; electrical storage media such as RAM and ROM; and hybrids of these categories such as magnetic/optical storage media.

[0018] The present invention further provides systems, particularly computer-based systems which contain the sequence information herein described stored in a data storage means. Such systems are designed to identify commercially important fragments of the Mycoplasma genitalium genome.

[0019] Another embodiment of the present invention is directed to isolated fragments of the Mycoplasma genitalium genome. The fragments of the Mycoplasma genitalium genome of the present invention include, but are not limited to, fragments which encode peptides, hereinafter open reading frames (ORFs), fragments which modulate the expression of an operably linked ORF, hereinafter expression modulating fragments (EMFs), fragments which mediate the uptake of a linked DNA fragment into a cell, hereinafter uptake modulating fragments (UMFs), and fragments which can be used to diagnose the presence of Mycoplasma genitalium in a sample, hereinafter, diagnostic fragments (DFs).

[0020] Each of the ORF fragments of the Mycoplasma genitalium genome disclosed in Tables 1(a), 1(c) and 2, and the EMF found 5' to the ORF, can be used in numerous ways as polynucleotide reagents. The sequences can be used as diagnostic probes or diagnostic amplification primers for the presence of a specific microbe in a sample, for the production of commercially important pharmaceutical agents, and to selectively control gene expression.

[0021] The present invention further includes recombinant constructs comprising one or more fragments of the Mycoplasma genitalium genome of the present invention. The recombinant constructs of the present invention comprise vectors, such as a plasmid or viral vector, into which a fragment of the Mycoplasma genitalium has been inserted.

[0022] The present invention further provides host cells containing any one of the isolated fragments of the Mycoplasma genitalium genome of the present invention. The host cells can be a higher eukaryotic host such as a mammalian cell, a lower eukaryotic cell such as a yeast cell, or can be a procaryotic cell such as a bacterial cell.

[0023] The present invention is further directed to isolated proteins encoded by the ORFs of the present invention. A variety of methodologies known in the art can be utilized to obtain any one of the proteins of the present invention. At the simplest level, the amino acid sequence can be synthesized using commercially available peptide synthesizers. In an alternative method, the protein is purified from bacterial cells which naturally produce the protein. Lastly, the proteins of the present invention can alternatively be purified from cells which have been altered to express the desired protein.

[0024] The invention further provides methods of obtaining homologs of the fragments of the Mycoplasma genitalium genome of the present invention and homologs of the proteins encoded by the ORFs of the present invention. Specifically, by using the nucleotide and amino acid sequences disclosed herein as a probe or as primers, and techniques such as PCR cloning and colony/plaque hybridization, one skilled in the art can obtain homologs.

[0025] The invention further provides antibodies which selectively bind one of the proteins of the present invention. Such antibodies include both monoclonal and polyclonal antibodies.

[0026] The invention further provides hybridomas which produce the above-described antibodies. A hybridoma is an immortalized cell line which is capable of secreting a specific monoclonal antibody.

[0027] The present invention further provides methods of identifying test samples derived from cells which express one of the ORFs of the present invention, or a homolog thereof. Such methods comprise incubating a test sample with one or more of the antibodies of the present invention, or one or more of the DFs of the present invention, under conditions which allow a skilled artisan to determine if the sample contains the ORF or product produced therefrom.

[0028] In another embodiment of the present invention, kits are provided which contain the necessary reagents to carry out the above-described assays.

[0029] Specifically, the invention provides a compartmentalized kit to receive, in close confinement, one or more containers which comprises: (a) a first container comprising one of the antibodies, or one of the DFs of the present invention; and (b) one or more other containers comprising one or more of the following: wash reagents, reagents capable of detecting the presence of bound antibodies or hybridized DFs.

[0030] Using the isolated proteins of the present invention, the present invention further provides methods of obtaining and identifying agents capable of binding to a protein encoded by one of the ORFs of the present invention. Specifically, such agents include antibodies (described above), peptides, carbohydrates, pharmaceutical agents and the like. Such methods comprise the steps of:

[0031] (a) contacting an agent with an isolated protein encoded by one of the ORFs of the present invention; and

[0032] (b) determining whether the agent binds to said protein.

[0033] The complete genomic sequence of M. genitalium will be of great value to all laboratories working with this organism and for a variety of commercial purposes. Many fragments of the Mycoplasma genitalium genome will be immediately identified by similarity searches against GenBank or protein databases and will be of immediate value to Mycoplasma researchers and of immediate commercial value for the production of proteins or the control of gene expression. A specific example concerns PHA synthase. It has been reported that polyhydroxybutyrate is present in the membranes of M. genitalium and that the amount correlates with the level of competence for transformation. The PHA synthase that synthesizes this polymer has been identified and sequenced in a number of bacteria, none of which are evolutionarily close to M. genitalium. This gene has yet to be isolated from M. genitalium by use of hybridization probes or PCR techniques. However, the genomic sequence of the present invention allows the identification of the gene by utilizing search means described below.

[0034] Developing the methodology and technology for elucidating the entire genomic sequence of bacterial and other small genomes has greatly enhanced, and will continue to enhance, the ability to analyze and understand chromosomal organization. In particular, sequenced genomes will provide the models for developing tools for the analysis of chromosome structure and function, including the ability to identify genes within large segments of genomic DNA, the structure, position, and spacing of regulatory elements, the identification of genes with potential industrial applications, and the ability to perform comparative genomics and molecular phylogeny.

BRIEF DESCRIPTION OF THE FIGURES

[0035] FIG. 1--EcoRI restriction map of the Mycoplasma genitalium genome.

[0036] FIG. 2--Block diagram of a computer system 102 that can be used to implement the computer-based systems of the present invention.

[0037] FIG. 3--Summary of the Mycoplasma genitalium sequencing project.

[0038] FIG. 4--A circular representation of the M. genitalium chromosome. Outer concentric circle: Coding regions on the plus strand for which a gene identification was made. Second concentric circle: Coding regions on the minus strand for which a gene identification was made. Third concentric circle: The direction of transcription on each strand of the chromosome is depicted as an arrow starting at the putative origin of replication. Fourth concentric circle: Coverage by cosmid and lambda clones. Nineteen cosmid clones and one lambda clone were sequenced from each end to confirm the overall structure of the genome. Fifth concentric circle: The locations of the single ribosomal operon and the 33 tRNAs. The clusters of tRNAs (trnA, trnB, trnC, trnD and trnE) are indicated by the letters A-E with the number of tRNAs in each cluster listed in parentheses. Sixth concentric circle: Location of the MgPa operon and MgPa repeat fragments.

[0039] FIGS. 5A-5R--Gene map of the M. genitalium genome. Predicted coding regions are shown on each strand. The rRNA operon and tRNA genes are shown as described in the Figure key. Gene identification numbers correspond to those in Table 6.

[0040] FIG. 6--Location of the MgPa repeats in the M. genitalium genome. The structure of the MgPa operon (ORF1-MgPa gene-ORF3) in the M. genitalium genome is illustrated across the top. In addition to the complete operon, nine repetitive elements which are composites of particular regions of the MgPa operon were found. The coordinates of each repeat in the genome are indicated on the left and right end of each line. The repetitive elements are located directly below those regions in the operon for which there is sequence similarity. The percent of sequence identity between the repeat elements and the MgPa gene ranges from 78%-90%. In some of the repeats, the MgPa-related sequences are separated in the genome by a variable length, A-T rich spacer sequence (indicated in the figure by a line with the length of the spacer indicated in bp). In cases where no spacer sequence is shown, the composites of the operon are co-linear in the genome. In repeats 7 and 9, the order of the sequences in the repeats differs from that in the operon. In these cases, the order of the elements in each repeat in the genome is indicated numerically where element 1 is followed by element 2 which is followed by element 3, etc.

DETAILED DESCRIPTION

[0041] The present invention is based on the sequencing of the Mycoplasma genitalium genome. The primary nucleotide sequence which was generated is provided in SEQ ID NO: 1. As used herein, the "primary sequence" refers to the nucleotide sequence represented by the IUPAC nomenclature system.

[0042] The sequence provided in SEQ ID NO: 1 is oriented relative to two genes (DNAA and DNA gyrase) known to flank the origin of replication of the Mycoplasma genitalium genome. A skilled artisan will readily recognize that this start/stop point was chosen for convenience and does not reflect a structural significance.
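Because the chromosome is circular, the choice of first base is a convention, as the paragraph above notes. The following minimal sketch shows how a linear record of a circular sequence can be re-written to start at a different coordinate without changing the molecule it describes. The function names and the short example string are hypothetical and do not come from SEQ ID NO: 1.

```python
def rotate_circular(seq, new_start):
    """Re-write a circular sequence so that it begins at 0-based index new_start."""
    if not seq:
        return seq
    k = new_start % len(seq)
    return seq[k:] + seq[:k]


def same_circle(a, b):
    """Return True if a and b are rotations of one circular sequence (same strand)."""
    return len(a) == len(b) and b in a + a


# Hypothetical 6-base circle written from two different start points.
original = "ACGTTG"
rotated = rotate_circular(original, 2)
print(rotated, same_circle(original, rotated))
```

The doubled-string test in same_circle is a simple way to confirm that two linear records differ only in their start/stop point; it does not consider the reverse-complement strand.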

[0043] The present invention provides the nucleotide sequence of SEQ ID NO: 1, or a representative fragment thereof, in a form which can be readily used, analyzed, and interpreted by a skilled artisan. In one embodiment, the sequence is provided as a contiguous string of primary sequence information corresponding to the nucleotide sequence provided in SEQ ID NO: 1.

[0044] As used herein, a "representative fragment of the nucleotide sequence depicted in SEQ ID NO: 1" refers to any portion of SEQ ID NO: 1 which is not presently represented within a publicly available database. Preferred representative fragments of the present invention are Mycoplasma genitalium open reading frames, expression modulating fragments, uptake modulating fragments, and fragments which can be used to diagnose the presence of Mycoplasma genitalium in a sample. A non-limiting identification of such preferred representative fragments is provided in Tables 1(a), 1(c) and 2.

[0045] The nucleotide sequence information provided in SEQ ID NO: 1 was obtained by sequencing the Mycoplasma genitalium genome using a megabase shotgun sequencing method. The nucleotide sequence provided in SEQ ID NO: 1 is a highly accurate, although not necessarily a 100% perfect, representation of the nucleotide sequence of the Mycoplasma genitalium genome.

[0046] As discussed in detail below, using the information provided in SEQ ID NO: 1 and in Tables 1(a), 1(c) and 2 together with routine cloning and sequencing methods, one of ordinary skill in the art would be able to clone and sequence all "representative fragments" of interest including open reading frames (ORFs) encoding a large variety of Mycoplasma genitalium proteins. In very rare instances, this may reveal a nucleotide sequence error present in the nucleotide sequence disclosed in SEQ ID NO: 1. Thus, once the present invention is made available (i.e., once the information in SEQ ID NO: 1 and Tables 1(a), 1(c) and 2 has been made available), resolving a rare sequencing error in SEQ ID NO: 1 would be well within the skill of the art. Nucleotide sequence editing software is publicly available. For example, Applied Biosystems' (AB) AutoAssembler™ can be used as an aid during visual inspection of nucleotide sequences.

[0047] Even if all of the very rare sequencing errors in SEQ ID NO: 1 were corrected, the resulting nucleotide sequence would still be at least 99.9% identical to the nucleotide sequence in SEQ ID NO: 1.

[0048] The nucleotide sequences of the genomes from different strains of Mycoplasma genitalium differ slightly. However, the nucleotide sequence of the genomes of all Mycoplasma genitalium strains will be at least 99.9% identical to the nucleotide sequence provided in SEQ ID NO: 1.

[0049] Thus, the present invention further provides nucleotide sequences which are at least 99.9% identical to the nucleotide sequence of SEQ ID NO: 1 in a form which can be readily used, analyzed and interpreted by the skilled artisan. Methods for determining whether a nucleotide sequence is at least 99.9% identical to the nucleotide sequence of SEQ ID NO: 1 are routine and readily available to the skilled artisan. For example, the well known fasta algorithm (Pearson and Lipman, Proc. Natl. Acad. Sci. USA 85:2444 (1988)) can be used to generate the percent identity of nucleotide sequences.
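The 99.9% criterion can be illustrated with a minimal sketch. It is not the fasta algorithm: it assumes the two sequences have already been aligned to equal length by some other tool, with "-" marking gaps, and it simply counts identical columns. The function names and test sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of aligned columns that hold the same base (gaps count as mismatches)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where both sequences carry a gap.
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if not (x == "-" and y == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a, aligned_b, threshold=99.9):
    """Return True if the pair is at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical 2,000-base reference and two variants.
reference = "ACGT" * 500
one_change = "T" + reference[1:]        # 1 mismatch in 2,000 columns
three_changes = "TTT" + reference[3:]   # 3 mismatches in 2,000 columns
print(meets_threshold(reference, one_change))
print(meets_threshold(reference, three_changes))
```

At this threshold a sequence may differ from the reference at no more than one position per thousand aligned columns.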

[0050] Computer Related Embodiments

[0051] The nucleotide sequence provided in SEQ ID NO: 1, a representative fragment thereof, or a nucleotide sequence at least 99.9% identical to SEQ ID NO: 1 may be "provided" in a variety of mediums to facilitate use thereof. As used herein, provided refers to a manufacture, other than an isolated nucleic acid molecule, which contains a nucleotide sequence of the present invention, i.e., the nucleotide sequence provided in SEQ ID NO: 1, a representative fragment thereof, or a nucleotide sequence at least 99.9% identical to SEQ ID NO: 1. Such a manufacture provides the Mycoplasma genitalium genome or a subset thereof (e.g., a Mycoplasma genitalium open reading frame (ORF)) in a form which allows a skilled artisan to examine the manufacture using means not directly applicable to examining the Mycoplasma genitalium genome or a subset thereof as it exists in nature or in purified form.

[0052] In one application of this embodiment, a nucleotide sequence of the present invention can be recorded on computer readable media. As used herein, "computer readable media" refers to any medium which can be read and accessed directly by a computer. Such media include, but are not limited to: magnetic storage media, such as floppy discs, hard disc storage medium, and magnetic tape; optical storage media such as CD-ROM; electrical storage media such as RAM and ROM; and hybrids of these categories such as magnetic/optical storage media. A skilled artisan can readily appreciate how any of the presently known computer readable mediums can be used to create a manufacture comprising computer readable medium having recorded thereon a nucleotide sequence of the present invention.

[0053] As used herein, "recorded" refers to a process for storing information on computer readable medium. A skilled artisan can readily adopt any of the presently known methods for recording information on computer readable medium to generate manufactures comprising the nucleotide sequence information of the present invention.

[0054] A variety of data storage structures are available to a skilled artisan for creating a computer readable medium having recorded thereon a nucleotide sequence of the present invention. The choice of the data storage structure will generally be based on the means chosen to access the stored information. In addition, a variety of data processor programs and formats can be used to store the nucleotide sequence information of the present invention on computer readable medium. The sequence information can be represented in a word processing text file, formatted in commercially-available software such as WordPerfect and Microsoft Word, or represented in the form of an ASCII file, stored in a database application, such as DB2, Sybase, Oracle, or the like. A skilled artisan can readily adapt any number of data processor structuring formats (e.g. text file or database) in order to obtain computer readable medium having recorded thereon the nucleotide sequence information of the present invention.
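By way of a nonlimiting sketch, the two storage routes named above (an ASCII text file and a database application) might be implemented as follows. The FASTA-style text layout, the table and column names, and the use of SQLite in place of DB2, Sybase or Oracle are all assumptions chosen for illustration; none of them is prescribed by the present disclosure.

```python
import sqlite3


def to_ascii_record(name: str, sequence: str, width: int = 60) -> str:
    """Format a sequence as a FASTA-style ASCII record (assumed layout)."""
    lines = [f">{name}"]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"


def from_ascii_record(text: str) -> tuple[str, str]:
    """Recover (name, sequence) from a record written by to_ascii_record."""
    lines = text.strip().splitlines()
    return lines[0][1:], "".join(lines[1:])


def store_in_database(conn: sqlite3.Connection, name: str, sequence: str) -> None:
    """Record a sequence in a hypothetical 'sequences' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sequences "
        "(name TEXT PRIMARY KEY, residues TEXT NOT NULL)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO sequences (name, residues) VALUES (?, ?)",
        (name, sequence),
    )
    conn.commit()


def load_from_database(conn: sqlite3.Connection, name: str):
    """Return the stored sequence, or None when the name is unknown."""
    row = conn.execute(
        "SELECT residues FROM sequences WHERE name = ?", (name,)
    ).fetchone()
    return None if row is None else row[0]
```

The text record suits sequential reading by search programs, while the database form suits lookup by identifier; the choice would follow the means chosen to access the stored information, as noted above.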

[0055] By providing the nucleotide sequence of SEQ ID NO: 1, a representative fragment thereof, or a nucleotide sequence at least 99.9% identical to SEQ ID NO: 1 in computer readable form, a skilled artisan can routinely access the sequence information for a variety of purposes. Computer software is publicly available which allows a skilled artisan to access sequence information provided in a computer readable medium. The examples which follow demonstrate how software which implements the BLAST (Altschul et al., J. Mol. Biol. 215:403-410 (1990)) and BLAZE (Brutlag et al., Comp. Chem. 17:203-207 (1993)) search algorithms on a Sybase system was used to identify open reading frames (ORFs) within the Mycoplasma genitalium genome which contain homology to ORFs or proteins from other organisms. Such ORFs are protein encoding fragments within the Mycoplasma genitalium genome and are useful in producing commercially important proteins such as enzymes used in fermentation reactions and in the production of commercially useful metabolites.

[0056] The present invention further provides systems, particularly computer-based systems, which contain the sequence information described herein. Such systems are designed to identify commercially important fragments of the Mycoplasma genitalium genome.

[0057] As used herein, "a computer-based system" refers to the hardware means, software means, and data storage means used to analyze the nucleotide sequence information of the present invention. The minimum hardware means of the computer-based systems of the present invention comprises a central processing unit (CPU), input means, output means, and data storage means. A skilled artisan can readily appreciate that any one of the currently available computer-based systems is suitable for use in the present invention.

[0058] As stated above, the computer-based systems of the present invention comprise a data storage means having stored therein a nucleotide sequence of the present invention and the necessary hardware means and software means for supporting and implementing a search means. As used herein, "data storage means" refers to memory which can store nucleotide sequence information of the present invention, or a memory access means which can access manufactures having recorded thereon the nucleotide sequence information of the present invention.

[0059] As used herein, "search means" refers to one or more programs which are implemented on the computer-based system to compare a target sequence or target structural motif with the sequence information stored within the data storage means. Search means are used to identify fragments or regions of the Mycoplasma genitalium genome which match a particular target sequence or target motif. A variety of known algorithms are disclosed publicly and a variety of commercially available software for conducting search means are available and can be used in the computer-based systems of the present invention. Examples of such software include, but are not limited to, MacPattern (EMBL), BLASTN and BLASTX (NCBI). A skilled artisan can readily recognize that any one of the available algorithms or implementing software packages for conducting homology searches can be adapted for use in the present computer-based systems.

[0060] As used herein, a "target sequence" can be any DNA or amino acid sequence of six or more nucleotides or two or more amino acids. A skilled artisan can readily recognize that the longer a target sequence is, the less likely a target sequence will be present as a random occurrence in the database. The most preferred sequence length of a target sequence is from about 10 to 100 amino acids or from about 30 to 300 nucleotide residues. However, it is well recognized that searches for commercially important fragments of the Mycoplasma genitalium genome, such as sequence fragments involved in gene expression and protein processing, may be of shorter length.

[0061] As used herein, "a target structural motif," or "target motif," refers to any rationally selected sequence or combination of sequences in which the sequence(s) are chosen based on a three-dimensional configuration which is formed upon the folding of the target motif. There are a variety of target motifs known in the art. Protein target motifs include, but are not limited to, enzymatic active sites and signal sequences. Nucleic acid target motifs include, but are not limited to, promoter sequences, hairpin structures and inducible expression elements (protein binding sequences).

[0062] A variety of structural formats for the input and output means can be used to input and output the information in the computer-based systems of the present invention. A preferred format for an output means ranks fragments of the Mycoplasma genitalium genome possessing varying degrees of homology to the target sequence or target motif. Such presentation provides a skilled artisan with a ranking of sequences which contain various amounts of the target sequence or target motif and identifies the degree of homology contained in the identified fragment.
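As a nonlimiting sketch of such a ranked output, the following routine slides a target along a stored sequence, scores every window by ungapped percent identity, and lists the best-scoring fragments first. This simple scan is an assumption made for illustration; it is not the BLAST or BLAZE algorithm and does not handle gaps or the complementary strand. The function name and tuple layout are hypothetical.

```python
def rank_fragments(genome: str, target: str, top_n: int = 5):
    """Rank genome windows by ungapped percent identity to the target.

    Returns up to top_n tuples of (percent_identity, start_position, fragment),
    where start_position is 1-based, ordered from best to worst match.
    """
    genome, target = genome.upper(), target.upper()
    k = len(target)
    if k == 0 or k > len(genome):
        return []
    hits = []
    for start in range(len(genome) - k + 1):
        window = genome[start:start + k]
        matches = sum(1 for a, b in zip(window, target) if a == b)
        hits.append((100.0 * matches / k, start + 1, window))
    # Highest identity first; ties are broken by leftmost position.
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return hits[:top_n]
```

Each entry in the returned list identifies both the fragment and its degree of match, which corresponds to the preferred output format described above.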

[0063] A variety of comparing means can be used to compare a target sequence or target motif with the data storage means to identify sequence fragments of the Mycoplasma genitalium genome. In the present examples, software which implements the BLAST (Altschul et al., J. Mol. Biol. 215:403-410 (1990)) and BLAZE (Brutlag et al., Comp. Chem. 17:203-207 (1993)) algorithms was used to identify open reading frames within the Mycoplasma genitalium genome. A skilled artisan can readily recognize that any one of the publicly available homology search programs can be used as the search means for the computer-based systems of the present invention.

[0064] One application of this embodiment is provided in FIG. 2. FIG. 2 provides a block diagram of a computer system 102 that can be used to implement the present invention. The computer system 102 includes a processor 106 connected to a bus 104. Also connected to the bus 104 are a main memory 108 (preferably implemented as random access memory, RAM) and a variety of secondary storage devices 110, such as a hard drive 112 and a removable medium storage device 114. The removable medium storage device 114 may represent, for example, a floppy disk drive, a CD-ROM drive, a magnetic tape drive, etc. A removable storage medium 116 (such as a floppy disk, a compact disk, a magnetic tape, etc.) containing control logic and/or data recorded therein may be inserted into the removable medium storage device 114. The computer system 102 includes appropriate software for reading the control logic and/or the data from the removable medium storage device 114 once inserted in the removable medium storage device 114.

[0065] A nucleotide sequence of the present invention may be stored in a well known manner in the main memory 108, any of the secondary storage devices 110, and/or a removable storage medium 116. Software for accessing and processing the genomic sequence (such as search tools, comparing tools, etc.) resides in main memory 108 during execution.

[0066] Biochemical Embodiments

[0067] Another embodiment of the present invention is directed to isolated fragments of the Mycoplasma genitalium genome. The fragments of the Mycoplasma genitalium genome of the present invention include, but are not limited to fragments which encode peptides, hereinafter open reading frames (ORFs), fragments which modulate the expression of an operably linked ORF, hereinafter expression modulating fragments (EMFs), fragments which mediate the uptake of a linked DNA fragment into a cell, hereinafter uptake modulating fragments (UMFs), and fragments which can be used to diagnose the presence of Mycoplasma genitalium in a sample, hereinafter diagnostic fragments (DFs).

[0068] As used herein, an "isolated nucleic acid molecule" or an "isolated fragment of the Mycoplasma genitalium genome" refers to a nucleic acid molecule possessing a specific nucleotide sequence which has been subjected to purification means to reduce, from the composition, the number of compounds which are normally associated with the composition. A variety of purification means can be used to generate the isolated fragments of the present invention. These include, but are not limited to methods which separate constituents of a solution based on charge, solubility, or size.

[0069] In one embodiment, Mycoplasma genitalium DNA can be mechanically sheared to produce fragments of 15-20 kb in length. These fragments can then be used to generate a Mycoplasma genitalium library by inserting them into lambda clones as described in the Examples below. Primers flanking, for example, an ORF provided in Table 1(a), 1(c) or 2 can then be generated using nucleotide sequence information provided in SEQ ID NO: 1. PCR cloning can then be used to isolate the ORF from the lambda DNA library. PCR cloning is well known in the art. Thus, given the availability of SEQ ID NO: 1, Table 1(a), 1(c) and Table 2, it would be routine to isolate any ORF or other representative fragment of the present invention.

[0070] The isolated nucleic acid molecules of the present invention include, but are not limited to single stranded and double stranded DNA, and single stranded RNA.

[0071] As used herein, an "open reading frame," ORF, means a series of triplets coding for amino acids without any termination codons and is a sequence translatable into protein. Tables 1(a), 1(b), 1(c) and 2 identify ORFs in the Mycoplasma genitalium genome. In particular, Table 1(a) indicates the location of ORFs (i.e., the addresses) within the Mycoplasma genitalium genome which encode the recited protein based on homology matching with protein sequences from the organism appearing in parentheticals (see the fifth column of Table 1(a)).
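The definition of an ORF as a run of triplets uninterrupted by termination codons can be sketched in code. The routine below scans a single reading frame of one strand and reports each stop-free run. It assumes, as a default, that only TAA and TAG terminate translation, on the understanding that mycoplasmas read TGA as tryptophan; a different set of stop codons can be passed for other genetic codes. The function name and coordinate convention (1-based, inclusive) are hypothetical, and the sketch does not require an initiation codon.

```python
def stop_free_runs(seq: str, frame: int = 0,
                   stops=("TAA", "TAG"), min_codons: int = 1):
    """Return (start, end) pairs of stop-free codon runs in one frame.

    Coordinates are 1-based and inclusive and exclude the stop codon itself.
    `frame` is 0, 1 or 2. Only the given strand is scanned.
    """
    seq = seq.upper()
    runs = []
    run_start = frame  # 0-based index of the first base of the current run
    for pos in range(frame, len(seq) - 2, 3):
        if seq[pos:pos + 3] in stops:
            if (pos - run_start) // 3 >= min_codons:
                runs.append((run_start + 1, pos))
            run_start = pos + 3
    # Close the final run at the last complete codon in this frame.
    end = frame + ((len(seq) - frame) // 3) * 3
    if (end - run_start) // 3 >= min_codons:
        runs.append((run_start + 1, end))
    return runs
```

Scanning all three frames of each strand, and raising `min_codons`, would give a crude list of candidate coding regions for comparison against a protein database.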

[0072] The first column of Table 1(a) provides the "UID" (an arbitrary identification number) of a particular ORF. The second and third columns in Table 1(a) indicate an ORF's position in the nucleotide sequence provided in SEQ ID NO: 1. One of ordinary skill in the art will recognize that ORFs may be oriented in opposite directions in the Mycoplasma genitalium genome. This is reflected in columns 2 and 3.
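One way the two position columns could be used to recover an ORF in its coding orientation is sketched below. The sketch assumes a convention, not stated explicitly here, in which a second-column value greater than the third-column value denotes an ORF on the complementary strand; it also assumes 1-based, inclusive coordinates. Both assumptions and the function names are illustrative only.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_orf(genome: str, column2: int, column3: int) -> str:
    """Extract an ORF given the two position columns (1-based, inclusive).

    Assumed convention: column2 <= column3 means the given strand;
    column2 > column3 means the complementary strand.
    """
    low, high = min(column2, column3), max(column2, column3)
    if low < 1 or high > len(genome):
        raise ValueError("coordinates fall outside the sequence")
    fragment = genome[low - 1:high]
    return fragment if column2 <= column3 else reverse_complement(fragment)
```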

[0073] The fourth column of Table 1(a) provides the accession number of the database match for the ORF. As indicated above, the fifth column of Table 1(a) provides the name of the database match for the ORF.

[0074] The sixth column of Table 1(a) indicates the percent identity of the protein encoded for by an ORF to the corresponding protein from the organism appearing in parentheticals in the fifth column. The seventh column of Table 1(a) indicates the percent similarity of the protein encoded for by an ORF to the corresponding protein from the organism appearing in parentheticals in the fifth column. The concepts of percent identity and percent similarity of two polypeptide sequences are well understood in the art. For example, two polypeptides 10 amino acids in length which differ at three amino acid positions (e.g., at positions 1, 3 and 5) are said to have a percent identity of 70%. However, the same two polypeptides would be deemed to have a percent similarity of 80% if, for example at position 5, the amino acid moieties, although not identical, were "similar" (i.e., possessed similar biochemical characteristics). The eighth column in Table 1(a) indicates the length of the ORF in nucleotides.
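The 70% and 80% figures in the example above can be reproduced with a short sketch. The two peptides and the grouping of "similar" amino acids below are hypothetical and deliberately simplified; homology search programs typically rely on scoring matrices rather than fixed groups.

```python
# Hypothetical, simplified groups of biochemically similar amino acids.
SIMILAR_GROUPS = ("ILVM", "FYW", "KRH", "DE", "ST", "NQ", "AG")


def _is_similar(x: str, y: str) -> bool:
    """Identical residues, or residues sharing a group, count as similar."""
    return x == y or any(x in group and y in group for group in SIMILAR_GROUPS)


def identity_and_similarity(a: str, b: str):
    """Return (percent identity, percent similarity) for equal-length peptides."""
    if len(a) != len(b) or not a:
        raise ValueError("peptides must be non-empty and of equal length")
    identical = sum(1 for x, y in zip(a, b) if x == y)
    similar = sum(1 for x, y in zip(a, b) if _is_similar(x, y))
    return 100.0 * identical / len(a), 100.0 * similar / len(a)


# Two hypothetical 10-residue peptides differing at positions 1, 3 and 5;
# only the position 5 pair (L versus I) falls within one similarity group.
peptide_a = "AKDGLSTPEN"
peptide_b = "WKKGISTPEN"
print(identity_and_similarity(peptide_a, peptide_b))  # (70.0, 80.0)
```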

[0075] Table 1(b) is a list of ORFs that have database matches to previously published Mycoplasma genitalium sequences over the full length of the ORF. The table headings for Table 1(b) are identical to those for Table 1(a) with the following two exceptions: (I) The heading for the eighth column in Table 1(a) (i.e., nucleotide length of the ORF) has been replaced with the following in Table 1(b): "Match_info". "Match_info" refers to the coordinates of the match of the ORF and the previously published Mycoplasma genitalium sequence. For example, "MG002 (1-930 of 930) GB:U09251 (298-1227 of 6140)," indicates that for ORF MG002, which is 930 nucleotides in length, there is a database match to accession number GB:U09251, which has a total length of 6140 nucleotides. The ORF matches this accession from position 298 to 1227.

[0076] (II) Where an ORF shows homology matches for both a previously published Mycoplasma genitalium sequence and a previously published sequence from a different organism, columns 3, 4, 5, and 6 of Table 1(b) respectively provide the accession number, protein name (and organism in parentheticals), percent identity and percent similarity for the "other organism," rather than for the previously published Mycoplasma genitalium sequence. (However, in this scenario, the accession number for the Mycoplasma genitalium sequence is still provided in column 8.)
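A "Match_info" entry of the form quoted above can be read programmatically. The following sketch parses the quoted example into its parts and tests whether the match spans the full length of the ORF. The regular expression, field names and function names are assumptions made for illustration and presume that every entry follows the layout of the quoted example.

```python
import re

# Pattern for entries such as "MG002 (1-930 of 930) GB:U09251 (298-1227 of 6140)".
_MATCH_INFO = re.compile(
    r"(?P<orf>\S+)\s+\((?P<orf_start>\d+)-(?P<orf_end>\d+) of (?P<orf_len>\d+)\)\s+"
    r"(?P<acc>\S+)\s+\((?P<acc_start>\d+)-(?P<acc_end>\d+) of (?P<acc_len>\d+)\)"
)


def parse_match_info(text: str) -> dict:
    """Split a Match_info entry into identifier, accession and coordinates."""
    found = _MATCH_INFO.search(text)
    if found is None:
        raise ValueError(f"unrecognized Match_info entry: {text!r}")
    fields = found.groupdict()
    # Convert every coordinate and length field to an integer.
    return {
        key: value if key in ("orf", "acc") else int(value)
        for key, value in fields.items()
    }


def is_full_length_match(info: dict) -> bool:
    """True when the match covers the ORF from its first to its last nucleotide."""
    return info["orf_start"] == 1 and info["orf_end"] == info["orf_len"]
```

Applied to the quoted example, the parsed ORF span (1 to 930) and the accession span (298 to 1227) both cover 930 nucleotides, consistent with a full-length match of the kind collected in Table 1(b).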

[0077] Table 1(c) provides ORFs having database matches to previously published Mycoplasma genitalium sequences but only over a portion of the ORF. The table headings are the same as above for Table 1(b).

[0078] In Tables 1(a), 1(b) and 1(c), unique identifiers are used to identify the recited ORFs, (e.g., "MG123"). In the parent U.S. application Ser. Nos. 08/488,018 and 08/473,545, the recited ORFs are identified using the "MORF" identifier. Table 1(d) lists which of the new and old identifiers refer to the same ORF. For example, the first entry in Table 1(d) indicates that the ORF identified as MG001 in the current application is the same ORF which was previously identified as MORF-20072 in parent U.S. application Ser. Nos. 08/488,018 and 08/473,545. Similarly, the third entry in Table 1(d) indicates that the ORF identified as MG003 in the current application is the same ORF which was previously identified as MORF-19818 and MORF-20073 in the parent applications.

[0079] Table 2 provides ORFs of the Mycoplasma genitalium genome which did not elicit a "homology match" with a known sequence from either M. genitalium or another organism.

[0080] Table 6 classifies each ORF according to its role category (adapted from Riley, M., Microbiol. Rev. 57:862 (1992)). The gene identification, the accession number from public archives that corresponds to the best match, the percent amino acid identity, and the length of the match in amino acids are also listed for each entry as above in Tables 1 (a-c). Those genes in M. genitalium that also match a gene in H. influenzae are indicated by an asterisk (*). For the purposes of Tables 6 and 7 and FIG. 4, each of the MgPa repetitive elements has been assigned an MG number, even though there is evidence to suggest that these repeats may not be transcribed.

[0081] Table 7 sorts the gene content in H. influenzae and M. genitalium by functional category. The number of genes in each category is listed for each organism. The number in parentheses indicates the percent of the putatively identified genes devoted to each functional category. For the category of unassigned genes, the percent of the genome indicated in parentheses represents the percent of the total number of putative coding regions.

[0082] Further details concerning the algorithms and criteria used for homology searches are provided in the Examples below.

[0083] A skilled artisan can readily identify ORFs in the Mycoplasma genitalium genome other than those listed in Tables 1(a), 1(b), 1(c) and 2, such as ORFs which are overlapping or encoded by the opposite strand of an identified ORF in addition to those ascertainable using the computer-based systems of the present invention.

[0084] As used herein, an "expression modulating fragment," EMF, means a series of nucleotide molecules which modulates the expression of an operably linked ORF or EMF.

[0085] As used herein, a sequence is said to "modulate the expression of an operably linked sequence" when the expression of the sequence is altered by the presence of the EMF. EMFs include, but are not limited to, promoters, and promoter modulating sequences (inducible elements). One class of EMFs are fragments which induce the expression of an operably linked ORF in response to a specific regulatory factor or physiological event. Known EMFs from Mycoplasma are reviewed by Tomb et al., Gene 104:1-10 (1991), and Chandler, M. S., Proc. Natl. Acad. Sci. USA 89:1626-1630 (1992).

[0086] EMF sequences can be identified within the Mycoplasma genitalium genome by their proximity to the ORFs provided in Tables 1(a), 1(b), 1(c) and 2. An intergenic segment, or a fragment of the intergenic segment, from about 10 to 200 nucleotides in length, taken 5' from any one of the ORFs of Tables 1(a), 1(b), 1(c) or 2 will modulate the expression of an operably linked 3' ORF in a fashion similar to that found with the naturally linked ORF sequence. As used herein, an "intergenic segment" refers to the fragments of the Mycoplasma genome which are between two ORF(s) herein described. Alternatively, EMFs can be identified using known EMFs as a target sequence or target motif in the computer-based systems of the present invention.
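As a nonlimiting sketch, a candidate EMF could be taken from the intergenic segment immediately 5' of an ORF as follows. The sketch assumes an ORF on the given strand, 1-based inclusive coordinates, and that the end position of the preceding ORF is known; the function name and parameters are hypothetical. It returns only a candidate sequence, whose activity would remain to be confirmed experimentally.

```python
def upstream_candidate(genome: str, orf_start: int, prev_orf_end: int = 0,
                       max_len: int = 200, min_len: int = 10):
    """Return up to max_len nucleotides lying 5' of an ORF.

    The segment is confined to the intergenic region, i.e. it begins after
    prev_orf_end and ends at the base preceding orf_start. Returns None when
    fewer than min_len nucleotides are available.
    """
    segment_end = orf_start - 1  # last base before the ORF (1-based)
    segment_start = max(prev_orf_end + 1, segment_end - max_len + 1, 1)
    if segment_end - segment_start + 1 < min_len:
        return None
    return genome[segment_start - 1:segment_end]
```

An ORF on the complementary strand would be handled by applying the same logic to the reverse complement of the genome.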

[0087] The presence and activity of an EMF can be confirmed using an EMF trap vector. An EMF trap vector contains a cloning site 5' to a marker sequence. A marker sequence encodes an identifiable phenotype, such as antibiotic resistance or a complementing nutrition auxotrophic factor, which can be identified or assayed when the EMF trap vector is placed within an appropriate host under appropriate conditions. As described above, an EMF will modulate the expression of an operably linked marker sequence. A more detailed discussion of various marker sequences is provided below.

[0088] A sequence which is suspected as being an EMF is cloned in all three reading frames in one or more restriction sites upstream from the marker sequence in the EMF trap vector. The vector is then transformed into an appropriate host using known procedures and the phenotype of the transformed host is examined under appropriate conditions. As described above, an EMF will modulate the expression of an operably linked marker sequence.

[0089] As used herein, an "uptake modulating fragment," UMF, means a series of nucleotide molecules which mediate the uptake of a linked DNA fragment into a cell. UMFs can be readily identified using known UMFs as a target sequence or target motif with the computer-based systems described above.

[0090] The presence and activity of a UMF can be confirmed by attaching the suspected UMF to a marker sequence. The resulting nucleic acid molecule is then incubated with an appropriate host under appropriate conditions and the uptake of the marker sequence is determined. As described above, a UMF will increase the frequency of uptake of a linked marker sequence. A review of DNA uptake in Mycoplasma is provided by Goodgall, S. H., et al., J. Bact. 172:5924-5928 (1990).

[0091] As used herein, a "diagnostic fragment," DF, means a series of nucleotide molecules which selectively hybridize to Mycoplasma genitalium sequences. DFs can be readily identified by identifying unique sequences within the Mycoplasma genitalium genome, or by generating and testing probes or amplification primers consisting of the DF sequence in an appropriate diagnostic format which determines amplification or hybridization selectivity.

[0092] The sequences falling within the scope of the present invention are not limited to the specific sequences herein described, but also include allelic and species variations thereof. Allelic and species variations can be routinely determined by comparing the sequence provided in SEQ ID NO: 1, a representative fragment thereof, or a nucleotide sequence at least 99.9% identical to SEQ ID NO: 1 with a sequence from another isolate of the same species. Furthermore, to accommodate codon variability, the invention includes nucleic acid molecules coding for the same amino acid sequences as do the specific ORFs disclosed herein. In other words, in the coding region of an ORF, substitution of one codon for another which encodes the same amino acid is expressly contemplated.
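Whether two coding sequences differ only by substitution of synonymous codons can be checked by translating both and comparing the results, as sketched below. The codon table is built from the standard genetic code with one modification assumed for mycoplasmas (TGA read as tryptophan rather than as a termination codon); this modification and the function names are assumptions stated for illustration.

```python
_BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}
# Assumption: mycoplasmas translate TGA as tryptophan (W), not stop.
CODON_TABLE["TGA"] = "W"


def translate(coding_sequence: str) -> str:
    """Translate a coding sequence codon by codon ('*' marks a stop codon)."""
    seq = coding_sequence.upper()
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of three")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def encode_same_polypeptide(sequence_a: str, sequence_b: str) -> bool:
    """True when both sequences encode an identical amino acid sequence."""
    return translate(sequence_a) == translate(sequence_b)
```

For instance, ATGGCTAAA and ATGGCAAAG differ at two nucleotide positions yet both translate to Met-Ala-Lys under this table.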

[0093] Any specific sequence disclosed herein can be readily screened for errors by resequencing a particular fragment, such as an ORF, in both directions (i.e., sequence both strands). Alternatively, error screening can be performed by sequencing corresponding polynucleotides of Mycoplasma genitalium origin isolated by using part or all of the fragments in question as a probe or primer.

[0094] Each of the ORFs of the Mycoplasma genitalium genome disclosed in Tables 1(a), 1(b), 1(c) and 2, and the EMF found 5' to the ORF, can be used in numerous ways as polynucleotide reagents. The sequences can be used as diagnostic probes or diagnostic amplification primers to detect the presence of a specific microbe, such as Mycoplasma genitalium, in a sample. This is especially the case with the fragments or ORFs of Table 2, which will be highly selective for Mycoplasma genitalium.

[0095] In addition, the fragments of the present invention, as broadly described, can be used to control gene expression through triple helix formation or antisense DNA or RNA, both of which methods are based on the binding of a polynucleotide sequence to DNA or RNA. Polynucleotides suitable for use in these methods are usually 20 to 40 bases in length and are designed to be complementary to a region of the gene involved in transcription (triple helix--see Lee et al., Nucl. Acids Res. 6:3073 (1979); Cooney et al., Science 241:456 (1988); and Dervan et al., Science 251:1360 (1991)) or to the mRNA itself (antisense--Okano, J. Neurochem. 56:560 (1991); Oligodeoxynucleotides as Antisense Inhibitors of Gene Expression, CRC Press, Boca Raton, Fla. (1988)).
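The antisense case can be illustrated with a minimal sketch that derives an oligonucleotide complementary to a chosen region of the sense (mRNA-like) strand. The sketch enforces only the 20 to 40 base length range mentioned above and represents the sequence in the DNA alphabet; it makes no attempt to assess binding strength, secondary structure or specificity. The function names and the example sequence are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def antisense_oligo(sense: str, start: int, length: int = 20) -> str:
    """Return an oligonucleotide complementary to a region of the sense strand.

    `start` is the 1-based position of the target region on the sense strand;
    `length` must lie between 20 and 40 bases inclusive.
    """
    if not 20 <= length <= 40:
        raise ValueError("length must be between 20 and 40 bases")
    if start < 1 or start + length - 1 > len(sense):
        raise ValueError("target region falls outside the sequence")
    target = sense[start - 1:start - 1 + length]
    return reverse_complement(target)


# Hypothetical sense-strand sequence used only to exercise the function.
example_sense = "ATGGCTAAAGGTCCTGAATTCGGATCC"
example_oligo = antisense_oligo(example_sense, start=1, length=20)
```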

[0096] Triple helix formation optimally results in a shut-off of RNA transcription from DNA, while antisense RNA hybridization blocks translation of an mRNA molecule into polypeptide. Both techniques have been demonstrated to be effective in model systems. Information contained in the sequences of the present invention is necessary for the design of an antisense or triple helix oligonucleotide.

[0097] The present invention further provides recombinant constructs comprising one or more fragments of the Mycoplasma genitalium genome of the present invention. The recombinant constructs of the present invention comprise a vector, such as a plasmid or viral vector, into which a fragment of the Mycoplasma genitalium genome has been inserted, in a forward or reverse orientation. In the case of a vector comprising one of the ORFs of the present invention, the vector may further comprise regulatory sequences, including for example, a promoter, operably linked to the ORF. For vectors comprising the EMFs and UMFs of the present invention, the vector may further comprise a marker sequence or heterologous ORF operably linked to the EMF or UMF. Large numbers of suitable vectors and promoters are known to those of skill in the art and are commercially available for generating the recombinant constructs of the present invention. The following vectors are provided by way of example. Bacterial: pBs, phagescript, PsiX174, pBluescript SK, pBs KS, pNH8a, pNH16a, pNH18a, pNH46a (Stratagene); pTrc99A, pKK223-3, pKK233-3, pDR540, pRIT5 (Pharmacia). Eukaryotic: pWLneo, pSV2cat, pOG44, pXT1, pSG (Stratagene); pSVK3, pBPV, pMSG, pSVL (Pharmacia).

[0098] Promoter regions can be selected from any desired gene using CAT (chloramphenicol acetyltransferase) vectors or other vectors with selectable markers. Two appropriate vectors are pKK232-8 and pCM7. Particular named bacterial promoters include lacI, lacZ, T3, T7, gpt, lambda P.sub.R, and trc. Eukaryotic promoters include CMV immediate early, HSV thymidine kinase, early and late SV40, LTRs from retrovirus, and mouse metallothionein-I. Selection of the appropriate vector and promoter is well within the level of ordinary skill in the art.

[0099] The present invention further provides host cells containing any one of the isolated fragments of the Mycoplasma genitalium genome of the present invention, wherein the fragment has been introduced into the host cell using known transformation methods. The host cell can be a higher eukaryotic host cell, such as a mammalian cell, a lower eukaryotic host cell, such as a yeast cell, or the host cell can be a prokaryotic cell, such as a bacterial cell. Introduction of the recombinant construct into the host cell can be effected by calcium phosphate transfection, DEAE-dextran mediated transfection, or electroporation (Davis, L. et al., Basic Methods in Molecular Biology (1986)).

[0100] The host cells containing one of the fragments of the Mycoplasma genitalium genome of the present invention, can be used in conventional manners to produce the gene product encoded by the isolated fragment (in the case of an ORF) or can be used to produce a heterologous protein under the control of the EMF.

[0101] The present invention further provides isolated polypeptides encoded by the nucleic acid fragments of the present invention or by degenerate variants of the nucleic acid fragments of the present invention. By "degenerate variant" is intended nucleotide fragments which differ from a nucleic acid fragment of the present invention (e.g., an ORF) by nucleotide sequence but, due to the degeneracy of the Genetic Code, encode an identical polypeptide sequence. Preferred nucleic acid fragments of the present invention are the ORFs depicted in Tables 1(a), 1(c) and 2.

[0102] A variety of methodologies known in the art can be utilized to obtain any one of the isolated polypeptides or proteins of the present invention. At the simplest level, the amino acid sequence can be synthesized using commercially available peptide synthesizers. This is particularly useful in producing small peptides and fragments of larger polypeptides. Fragments are useful, for example, in generating antibodies against the native polypeptide. In an alternative method, the polypeptide or protein is purified from bacterial cells which naturally produce the polypeptide or protein. One skilled in the art can readily follow known methods for isolating polypeptides and proteins in order to obtain one of the isolated polypeptides or proteins of the present invention. These include, but are not limited to, immunochromatography, HPLC, size-exclusion chromatography, ion-exchange chromatography, and immuno-affinity chromatography.

[0103] The polypeptides and proteins of the present invention can alternatively be purified from cells which have been altered to express the desired polypeptide or protein. As used herein, a cell is said to be altered to express a desired polypeptide or protein when the cell, through genetic manipulation, is made to produce a polypeptide or protein which it normally does not produce or which the cell normally produces at a lower level. One skilled in the art can readily adapt procedures for introducing and expressing either recombinant or synthetic sequences into eukaryotic or prokaryotic cells in order to generate a cell which produces one of the polypeptides or proteins of the present invention.

[0104] Any host/vector system can be used to express one or more of the ORFs of the present invention. These include, but are not limited to, eukaryotic hosts such as HeLa cells, CV-1 cells, COS cells, and Sf9 cells, as well as prokaryotic hosts such as E. coli and B. subtilis. The most preferred cells are those which do not normally express the particular polypeptide or protein or which express the polypeptide or protein at a low natural level.

[0105] "Recombinant," as used herein, means that a polypeptide or protein is derived from recombinant (e.g., microbial or mammalian) expression systems. "Microbial" refers to recombinant polypeptides or proteins made in bacterial or fungal (e.g., yeast) expression systems. As a product, "recombinant microbial" defines a polypeptide or protein essentially free of native endogenous substances and unaccompanied by associated native glycosylation. Polypeptides or proteins expressed in most bacterial cultures, e.g., E. coli, will be free of glycosylation modifications; polypeptides or proteins expressed in yeast will have a glycosylation pattern different from that expressed in mammalian cells.

[0106] "Nucleotide sequence" refers to a heteropolymer of deoxyribonucleotides. Generally, DNA segments encoding the polypeptides and proteins provided by this invention are assembled from fragments of the Mycoplasma genitalium genome and short oligonucleotide linkers, or from a series of oligonucleotides, to provide a synthetic gene which is capable of being expressed in a recombinant transcriptional unit comprising regulatory elements derived from a microbial or viral operon.

[0107] "Recombinant expression vehicle or vector" refers to a plasmid or phage or virus or vector, for expressing a polypeptide from a DNA (RNA) sequence. The expression vehicle can comprise a transcriptional unit comprising an assembly of (1) a genetic element or elements having a regulatory role in gene expression, for example, promoters or enhancers, (2) a structural or coding sequence which is transcribed into mRNA and translated into protein, and (3) appropriate transcription initiation and termination sequences. Structural units intended for use in yeast or eukaryotic expression systems preferably include a leader sequence enabling extracellular secretion of translated protein by a host cell. Alternatively, where recombinant protein is expressed without a leader or transport sequence, it may include an N-terminal methionine residue. This residue may or may not be subsequently cleaved from the expressed recombinant protein to provide a final product.

[0108] "Recombinant expression system" means host cells which have stably integrated a recombinant transcriptional unit into chromosomal DNA or carry the recombinant transcriptional unit extrachromosomally. The cells can be prokaryotic or eukaryotic. Recombinant expression systems as defined herein will express heterologous polypeptides or proteins upon induction of the regulatory elements linked to the DNA segment or synthetic gene to be expressed.

[0109] Mature proteins can be expressed in mammalian cells, yeast, bacteria, or other cells under the control of appropriate promoters. Cell-free translation systems can also be employed to produce such proteins using RNAs derived from the DNA constructs of the present invention. Appropriate cloning and expression vectors for use with prokaryotic and eukaryotic hosts are described by Sambrook, et al., in Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor, N.Y. (1989), the disclosure of which is hereby incorporated by reference.

[0110] Generally, recombinant expression vectors will include origins of replication and selectable markers permitting transformation of the host cell, e.g., the ampicillin resistance gene of E. coli and S. cerevisiae TRP1 gene, and a promoter derived from a highly-expressed gene to direct transcription of a downstream structural sequence. Such promoters can be derived from operons encoding glycolytic enzymes such as 3-phosphoglycerate kinase (PGK), α-factor, acid phosphatase, or heat shock proteins, among others. The heterologous structural sequence is assembled in appropriate phase with translation initiation and termination sequences, and preferably, a leader sequence capable of directing secretion of translated protein into the periplasmic space or extracellular medium. Optionally, the heterologous sequence can encode a fusion protein including an N-terminal identification peptide imparting desired characteristics, e.g., stabilization or simplified purification of expressed recombinant product.

[0111] Useful expression vectors for bacterial use are constructed by inserting a structural DNA sequence encoding a desired protein together with suitable translation initiation and termination signals in operable reading phase with a functional promoter. The vector will comprise one or more phenotypic selectable markers and an origin of replication to ensure maintenance of the vector and to, if desirable, provide amplification within the host. Suitable prokaryotic hosts for transformation include E. coli, Bacillus subtilis, Salmonella typhimurium and various species within the genera Pseudomonas, Streptomyces, and Staphylococcus, although others may also be employed as a matter of choice.

[0112] As a representative but nonlimiting example, useful expression vectors for bacterial use can comprise a selectable marker and bacterial origin of replication derived from commercially available plasmids comprising genetic elements of the well known cloning vector pBR322 (ATCC 37017). Such commercial vectors include, for example, pKK223-3 (Pharmacia Fine Chemicals, Uppsala, Sweden) and GEM 1 (Promega Biotec, Madison, Wis., USA). These pBR322 "backbone" sections are combined with an appropriate promoter and the structural sequence to be expressed.

[0113] Following transformation of a suitable host strain and growth of the host strain to an appropriate cell density, the selected promoter is derepressed by appropriate means (e.g., temperature shift or chemical induction) and cells are cultured for an additional period. Cells are typically harvested by centrifugation, disrupted by physical or chemical means, and the resulting crude extract retained for further purification.

[0114] Various mammalian cell culture systems can also be employed to express recombinant protein. Examples of mammalian expression systems include the COS-7 lines of monkey kidney fibroblasts, described by Gluzman, Cell 23:175 (1981), and other cell lines capable of expressing a compatible vector, for example, the C127, 3T3, CHO, HeLa and BHK cell lines. Mammalian expression vectors will comprise an origin of replication, a suitable promoter and enhancer, and also any necessary ribosome binding sites, polyadenylation site, splice donor and acceptor sites, transcriptional termination sequences, and 5' flanking nontranscribed sequences. DNA sequences derived from the SV40 viral genome, for example, SV40 origin, early promoter, enhancer, splice, and polyadenylation sites may be used to provide the required nontranscribed genetic elements.

[0115] Recombinant polypeptides and proteins produced in bacterial culture are usually isolated by initial extraction from cell pellets, followed by one or more salting-out, aqueous ion exchange or size exclusion chromatography steps. Protein refolding steps can be used, as necessary, in completing configuration of the mature protein. Finally, high performance liquid chromatography (HPLC) can be employed for final purification steps. Microbial cells employed in expression of proteins can be disrupted by any convenient method, including freeze-thaw cycling, sonication, mechanical disruption, or use of cell lysing agents.

[0116] The present invention further includes isolated polypeptides, proteins and nucleic acid molecules which are substantially equivalent to those herein described. As used herein, substantially equivalent can refer both to nucleic acid and amino acid sequences, for example a mutant sequence, that varies from a reference sequence by one or more substitutions, deletions, or additions, the net effect of which does not result in an adverse functional dissimilarity between reference and subject sequences. For purposes of the present invention, sequences having equivalent biological activity, and equivalent expression characteristics are considered substantially equivalent. For purposes of determining equivalence, truncation of the mature sequence should be disregarded.

[0117] The invention further provides methods of obtaining homologs from other strains of Mycoplasma genitalium, of the fragments of the Mycoplasma genitalium genome of the present invention and homologs of the proteins encoded by the ORFs of the present invention. As used herein, a sequence or protein of Mycoplasma genitalium is defined as a homolog of a fragment of the Mycoplasma genitalium genome or a protein encoded by one of the ORFs of the present invention, if it shares significant homology to one of the fragments of the Mycoplasma genitalium genome of the present invention or a protein encoded by one of the ORFs of the present invention. Specifically, by using the sequence disclosed herein as a probe or as primers, and techniques such as PCR cloning and colony/plaque hybridization, one skilled in the art can obtain homologs.

[0118] As used herein, two nucleic acid molecules or proteins are said to "share significant homology" if the two contain regions which possess greater than 85% sequence (amino acid or nucleic acid) homology.
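The 85% criterion above can be illustrated with a minimal Python sketch. It assumes the two regions have already been aligned to equal length without gaps and treats "homology" as simple percent identity over that region; both are simplifications made here for illustration, and the function names `percent_identity` and `shares_significant_homology` are hypothetical rather than taken from the specification.

```python
def percent_identity(region_a: str, region_b: str) -> float:
    """Percent of positions that match between two pre-aligned, equal-length regions.

    Assumption: the caller has already aligned the regions; no gap handling is done.
    """
    if len(region_a) != len(region_b):
        raise ValueError("regions must be pre-aligned to equal length")
    if not region_a:
        raise ValueError("regions must be non-empty")
    matches = sum(1 for x, y in zip(region_a.upper(), region_b.upper()) if x == y)
    return 100.0 * matches / len(region_a)


def shares_significant_homology(region_a: str, region_b: str,
                                threshold: float = 85.0) -> bool:
    """True when identity is strictly greater than the threshold (85% by default)."""
    return percent_identity(region_a, region_b) > threshold


if __name__ == "__main__":
    a = "ACGTACGTAC"
    b = "ACGTACGTAA"  # one mismatch in ten positions
    print(percent_identity(a, b), shares_significant_homology(a, b))
```

The comparison is strict because the text requires "greater than 85%"; a region at exactly 85% identity would not qualify under this reading.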

[0119] Region specific primers or probes derived from the nucleotide sequence provided in SEQ ID NO: 1 or from a nucleotide sequence at least 99.9% identical to SEQ ID NO: 1 can be used to prime DNA synthesis and PCR amplification, as well as to identify colonies containing cloned DNA encoding a homolog using known methods (Innis et al., PCR Protocols, Academic Press, San Diego, Calif. (1990)).

[0120] When using primers derived from SEQ ID NO: 1 or from a nucleotide sequence at least 99.9% identical to SEQ ID NO: 1, one skilled in the art will recognize that by employing high stringency conditions (e.g., annealing at 50-60°C) only sequences which are greater than 75% homologous to the primer will be amplified. By employing lower stringency conditions (e.g., annealing at 35-37°C), sequences which are greater than 40-50% homologous to the primer will also be amplified.

[0121] When using DNA probes derived from SEQ ID NO: 1 or from a nucleotide sequence at least 99.9% identical to SEQ ID NO: 1 for colony/plaque hybridization, one skilled in the art will recognize that by employing high stringency conditions (e.g., hybridizing at 50-65°C in 5× SSC and 50% formamide, and washing at 50-65°C in 0.5× SSC), sequences having regions which are greater than 90% homologous to the probe can be obtained, and that by employing lower stringency conditions (e.g., hybridizing at 35-37°C in 5× SSC and 40-45% formamide, and washing at 42°C in SSC), sequences having regions which are greater than 35-45% homologous to the probe will be obtained.

[0122] Any organism can be used as the source for homologs of the present invention so long as the organism naturally expresses such a protein or contains genes encoding the same. The most preferred organisms for isolating homologs are bacteria which are closely related to Mycoplasma genitalium.

[0123] Uses for the Compositions of the Invention

[0124] Each ORF provided in Tables 1(a), 1(b) and 1(c) was assigned to biological role categories adapted from Riley, M., Microbiology Reviews 57(4):862 (1993). This allows the skilled artisan to determine a use for each identified coding sequence. Tables 1(a), 1(b) and 1(c) further provide an identification of the type of polypeptide which is encoded by each ORF. As a result, one skilled in the art can use the polypeptides of the present invention for commercial, therapeutic and industrial purposes consistent with the type of putative identification of the polypeptide.

[0125] Such identifications permit one skilled in the art to use the Mycoplasma genitalium ORFs in a manner similar to the known type of sequences for which the identification is made; for example, to ferment a particular sugar source or to produce a particular metabolite. (For a review of enzymes used within the commercial industry, see Biochemical Engineering and Biotechnology Handbook 2nd, eds. Macmillan Publ. Ltd., NY (1991) and Biocatalysts in Organic Syntheses, ed. J. Tramper et al., Elsevier Science Publishers, Amsterdam, The Netherlands (1985)).

[0126] 1. Biosynthetic Enzymes

[0127] Open reading frames encoding proteins that mediate the catalytic reactions of intermediary and macromolecular metabolism, the biosynthesis of small molecules, cellular processes and other functions can be used for industrial biosynthesis. These include enzymes involved in the degradation of the intermediary products of metabolism, enzymes involved in central intermediary metabolism, enzymes involved in respiration, both aerobic and anaerobic, enzymes involved in fermentation, enzymes involved in ATP proton motive force conversion, enzymes involved in broad regulatory function, enzymes involved in amino acid synthesis, enzymes involved in nucleotide synthesis, and enzymes involved in cofactor and vitamin synthesis. The various metabolic pathways present in Mycoplasma can be identified based on absolute nutritional requirements as well as by examining the various enzymes identified in Tables 1(a), 1(b) and 1(c).

[0128] Identified within the category of intermediary metabolism, a number of the proteins encoded by the identified ORFs in Tables 1(a), 1(b) and 1(c) are particularly involved in the degradation of intermediary metabolites as well as non-macromolecular metabolism. Some of the enzymes identified include amylases, glucose oxidases, and catalase.

[0129] Proteolytic enzymes are another class of commercially important enzymes. Proteolytic enzymes find use in a number of industrial processes including the processing of flax and other vegetable fibers, in the extraction, clarification and depectinization of fruit juices, in the extraction of vegetable oil and in the maceration of fruits and vegetables to give unicellular fruits. A detailed review of the proteolytic enzymes used in the food industry is provided by Rombouts et al., Symbiosis 21:79 (1986) and Voragen et al. in Biocatalyst in Agricultural Biotechnology, edited J. R. Whitaker et al., American Chemical Society Symposium Series 389:93 (1989).

[0130] The metabolism of glucose, galactose, fructose and xylose is an important part of the primary metabolism of Mycoplasma. Enzymes involved in the degradation of these sugars can be used in industrial fermentation. Some of the important sugar transforming enzymes, from a commercial viewpoint, include sugar isomerases such as glucose isomerase. Other metabolic enzymes have found commercial use, such as glucose oxidases which produce ketogulonic acid (KGA). KGA is an intermediate in the commercial production of ascorbic acid using Reichstein's procedure (see Krueger et al., Biotechnology 6(A), Rhine, H. J. et al., eds., Verlag Press, Weinheim, Germany (1984)).

[0131] Glucose oxidase (GOD) is commercially available and has been used in purified form as well as in an immobilized form for the deoxygenation of beer. See Hartmeir et al., Biotechnology Letters 1:21 (1979). The most important application of GOD is the industrial scale fermentation of gluconic acid. There is a market for gluconic acids, which are used in the detergent, textile, leather, photographic, pharmaceutical, food, feed and concrete industries (see Bigelis in Gene Manipulations and Fungi, Benett, J. W. et al., eds., Academic Press, New York (1985), p. 357). In addition to industrial applications, GOD has found applications in medicine for quantitative determination of glucose in body fluids and, more recently, in biotechnology for analyzing syrups from starch and cellulose hydrolysates. See Owusu et al., Biochem. et Biophysica. Acta. 872:83 (1986).

[0132] The main sweetener used in the world today is sugar which comes from sugar beets and sugar cane. In the field of industrial enzymes, the glucose isomerase process shows the largest expansion in the market today. Initially, soluble enzymes were used and later immobilized enzymes were developed (Krueger et al., Biotechnology, The Textbook of Industrial Microbiology, Sinauer Associated Incorporated, Sunderland, Mass. (1990)). Today, the use of glucose-produced high fructose syrups is by far the largest industrial business using immobilized enzymes. A review of the industrial use of these enzymes is provided by Jorgensen, Starch 40:307 (1988).

[0133] Proteinases, such as alkaline serine proteinases, are used as detergent additives and thus represent one of the largest volumes of microbial enzymes used in the industrial sector. Because of their industrial importance, there is a large body of published and unpublished information regarding the use of these enzymes in industrial processes. (See Faultman et al., Acid Proteases Structure Function and Biology, Tang, J., ed., Plenum Press, New York (1977) and Godfrey et al., Industrial Enzymes, MacMillan Publishers, Surrey, UK (1983) and Hepner et al., Report Industrial Enzymes by 1990, Hel Hepner & Associates, London (1986)).

[0134] Another class of commercially usable proteins of the present invention are the microbial lipases identified in Tables 1(a), 1(b) and 1(c) (see Macrae et al., Philosophical Transactions of the Royal Society of London 310:227 (1985) and Poserke, Journal of the American Oil Chemists' Society 61:1758 (1984)). A major use of lipases is in the fat and oil industry for the production of neutral glycerides using lipase catalyzed inter-esterification of readily available triglycerides. Applications of lipases include use as a detergent additive to facilitate the removal of fats from fabrics in the course of washing procedures.

[0135] The use of enzymes, and in particular microbial enzymes, as catalysts for key steps in the synthesis of complex organic molecules is gaining popularity at a great rate. One area of great interest is the preparation of chiral intermediates. Preparation of chiral intermediates is of interest to a wide range of synthetic chemists, particularly those scientists involved with the preparation of new pharmaceuticals, agrochemicals, fragrances and flavors. (See Davies et al., Recent Advances in the Generation of Chiral Intermediates Using Enzymes, CRC Press, Boca Raton, Fla. (1990)). The following reactions catalyzed by enzymes are of interest to organic chemists: hydrolysis of carboxylic acid esters, phosphate esters, amides and nitrites, esterification reactions, transesterification reactions, synthesis of amides, reduction of alkanones and oxoalkanates, oxidation of alcohols to carbonyl compounds, oxidation of sulfides to sulfoxides, and carbon bond forming reactions such as the aldol reaction.

[0136] When considering the use of an enzyme encoded by one of the ORFs of the present invention for biotransformation and organic synthesis it is sometimes necessary to consider the respective advantages and disadvantages of using a microorganism as opposed to an isolated enzyme. The pros and cons of using a whole cell system on the one hand or an isolated partially purified enzyme on the other hand have been described in detail by Bud et al., Chemistry in Britain (1987), p. 127.

[0137] Amino transferases, enzymes involved in the biosynthesis and metabolism of amino acids, are useful in the catalytic production of amino acids. The advantage of using microbial based enzyme systems is that the amino transferase enzymes catalyze the stereo-selective synthesis of only L-amino acids and generally possess uniformly high catalytic rates. A description of the use of amino transferases for amino acid production is provided by Roselle-David, Methods of Enzymology 136:479 (1987).

[0138] 2. Generation of Antibodies

[0139] As described here, the proteins of the present invention, as well as homologs thereof, can be used in a variety of procedures and methods known in the art which are currently applied to other proteins. The proteins of the present invention can further be used to generate an antibody which selectively binds the protein. Such antibodies can be either monoclonal or polyclonal antibodies, as well as fragments of these antibodies, and humanized forms.

[0140] The invention further provides antibodies which selectively bind to one of the proteins of the present invention and hybridomas which produce these antibodies. A hybridoma is an immortalized cell line which is capable of secreting a specific monoclonal antibody.

[0141] In general, techniques for preparing polyclonal and monoclonal antibodies as well as hybridomas capable of producing the desired antibody are well known in the art (Campbell, A. M., Monoclonal Antibody Technology: Laboratory Techniques in Biochemistry and Molecular Biology, Elsevier Science Publishers, Amsterdam, The Netherlands (1984); St. Groth et al., J. Immunol. Methods 35:1-21 (1980); Kohler and Milstein, Nature 256:495-497 (1975)), the trioma technique, the human B-cell hybridoma technique (Kozbor et al., Immunology Today 4:72 (1983); Cole et al., in Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, Inc. (1985), pp. 77-96).

[0142] Any animal (mouse, rabbit, etc.) which is known to produce antibodies can be immunized with the pseudogene polypeptide. Methods for immunization are well known in the art. Such methods include subcutaneous or intraperitoneal injection of the polypeptide. One skilled in the art will recognize that the amount of the protein encoded by the ORF of the present invention used for immunization will vary based on the animal which is immunized, the antigenicity of the peptide and the site of injection.

[0143] The protein which is used as an immunogen may be modified or administered in an adjuvant in order to increase the protein's antigenicity. Methods of increasing the antigenicity of a protein are well known in the art and include, but are not limited to, coupling the antigen with a heterologous protein (such as globulin or β-galactosidase) or through the inclusion of an adjuvant during immunization.

[0144] For monoclonal antibodies, spleen cells from the immunized animals are removed, fused with myeloma cells, such as SP2/0-Ag14 myeloma cells, and allowed to become monoclonal antibody producing hybridoma cells.

[0145] Any one of a number of methods well known in the art can be used to identify the hybridoma cell which produces an antibody with the desired characteristics. These include screening the hybridomas with an ELISA assay, western blot analysis, or radioimmunoassay (Lutz et al., Exp. Cell Res. 175:109-124 (1988)).

[0146] Hybridomas secreting the desired antibodies are cloned and the class and subclass is determined using procedures known in the art (Campbell, A. M., Monoclonal Antibody Technology: Laboratory Techniques in Biochemistry and Molecular Biology, Elsevier Science Publishers, Amsterdam, The Netherlands (1984)).

[0147] Techniques described for the production of single chain antibodies (U.S. Pat. No. 4,946,778) can be adapted to produce single chain antibodies to proteins of the present invention.

[0148] For polyclonal antibodies, antibody containing antisera is isolated from the immunized animal and is screened for the presence of antibodies with the desired specificity using one of the above-described procedures.

[0149] The present invention further provides the above-described antibodies in detectably labeled form. Antibodies can be detectably labeled through the use of radioisotopes, affinity labels (such as biotin, avidin, etc.), enzymatic labels (such as horseradish peroxidase, alkaline phosphatase, etc.), fluorescent labels (such as FITC or rhodamine, etc.), paramagnetic atoms, etc. Procedures for accomplishing such labeling are well known in the art (see, for example, Sternberger, L. A. et al., J. Histochem. Cytochem. 18:315 (1970); Bayer, E. A. et al., Meth. Enzym. 62:308 (1979); Engval, E. et al., Immunol. 109:129 (1972); Goding, J. W. J. Immunol. Meth. 13:215 (1976)).

[0150] The labeled antibodies of the present invention can be used for in vitro, in vivo, and in situ assays to identify cells or tissues in which a fragment of the Mycoplasma genitalium genome is expressed.

[0151] The present invention further provides the above-described antibodies immobilized on a solid support. Examples of such solid supports include plastics such as polycarbonate, complex carbohydrates such as agarose and sepharose, and acrylic resins such as polyacrylamide and latex beads. Techniques for coupling antibodies to such solid supports are well known in the art (Weir, D. M. et al., "Handbook of Experimental Immunology" 4th Ed., Blackwell Scientific Publications, Oxford, England, Chapter 10 (1986); Jacoby, W. D. et al., Meth. Enzym. 34 Academic Press, N.Y. (1974)). The immobilized antibodies of the present invention can be used for in vitro, in vivo, and in situ assays as well as for immunoaffinity purification of the proteins of the present invention.

[0152] 3. Diagnostic Assays and Kits

[0153] The present invention further provides methods to identify the expression of one of the ORFs of the present invention, or homolog thereof, in a test sample, using one of the DFs or antibodies of the present invention.

[0154] In detail, such methods comprise incubating a test sample with one or more of the antibodies or one or more of the DFs of the present invention and assaying for binding of the DFs or antibodies to components within the test sample.

[0155] Conditions for incubating a DF or antibody with a test sample vary. Incubation conditions depend on the format employed in the assay, the detection methods employed, and the type and nature of the DF or antibody used in the assay. One skilled in the art will recognize that any one of the commonly available hybridization, amplification or immunological assay formats can readily be adapted to employ the DFs or antibodies of the present invention. Examples of such assays can be found in Chard, T., An Introduction to Radioimmunoassay and Related Techniques, Elsevier Science Publishers, Amsterdam, The Netherlands (1986); Bullock, G. R. et al., Techniques in Immunocytochemistry, Academic Press, Orlando, Fla. Vol. 1 (1982), Vol. 2 (1983), Vol. 3 (1985); Tijssen, P., Practice and Theory of Enzyme Immunoassays: Laboratory Techniques in Biochemistry and Molecular Biology, Elsevier Science Publishers, Amsterdam, The Netherlands (1985).

[0156] The test samples of the present invention include cells, protein or membrane extracts of cells, or biological fluids such as sputum, blood, serum, plasma, or urine. The test sample used in the above-described method will vary based on the assay format, nature of the detection method and the tissues, cells or extracts used as the sample to be assayed. Methods for preparing protein extracts or membrane extracts of cells are well known in the art and can readily be adapted in order to obtain a sample which is compatible with the system utilized.

[0157] In another embodiment of the present invention, kits are provided which contain the necessary reagents to carry out the assays of the present invention.

[0158] Specifically, the invention provides a compartmentalized kit to receive, in close confinement, one or more containers which comprise: (a) a first container comprising one of the DFs or antibodies of the present invention; and (b) one or more other containers comprising one or more of the following: wash reagents, reagents capable of detecting presence of a bound DF or antibody.

[0159] In detail, a compartmentalized kit includes any kit in which reagents are contained in separate containers. Such containers include small glass containers, plastic containers or strips of plastic or paper. Such containers allow one to efficiently transfer reagents from one compartment to another compartment such that the samples and reagents are not cross-contaminated, and the agents or solutions of each container can be added in a quantitative fashion from one compartment to another. Such containers will include a container which will accept the test sample, a container which contains the antibodies used in the assay, containers which contain wash reagents (such as phosphate buffered saline, Tris-buffers, etc.), and containers which contain the reagents used to detect the bound antibody or DF.

[0160] Types of detection reagents include labeled nucleic acid probes, labeled secondary antibodies, or in the alternative, if the primary antibody is labeled, the enzymatic, or antibody binding reagents which are capable of reacting with the labeled antibody. One skilled in the art will readily recognize that the disclosed DFs and antibodies of the present invention can be readily incorporated into one of the established kit formats which are well known in the art.

[0161] 4. Screening Assay for Binding Agents

[0162] Using the isolated proteins of the present invention, the present invention further provides methods of obtaining and identifying agents which bind to a protein encoded by one of the ORFs of the present invention or to one of the fragments of the Mycoplasma genome herein described.

[0163] In detail, said method comprises the steps of:

[0164] (a) contacting an agent with an isolated protein encoded by one of the ORFs of the present invention, or an isolated fragment of the Mycoplasma genome; and

[0165] (b) determining whether the agent binds to said protein or said fragment.

[0166] The agents screened in the above assay can be, but are not limited to, peptides, carbohydrates, vitamin derivatives, or other pharmaceutical agents. The agents can be selected and screened at random or rationally selected or designed using protein modeling techniques.

[0167] For random screening, agents such as peptides, carbohydrates, pharmaceutical agents and the like are selected at random and are assayed for their ability to bind to the protein encoded by the ORF of the present invention.

[0168] Alternatively, agents may be rationally selected or designed. As used herein, an agent is said to be "rationally selected or designed" when the agent is chosen based on the configuration of the particular protein. For example, one skilled in the art can readily adapt currently available procedures to generate peptides, pharmaceutical agents and the like capable of binding to a specific peptide sequence in order to generate rationally designed antipeptide peptides, for example see Hurby et al., "Application of Synthetic Peptides: Antisense Peptides," In Synthetic Peptides, A User's Guide, W. H. Freeman, NY (1992), pp. 289-307, and Kaspezak et al., Biochemistry 28:9230-8 (1989), or pharmaceutical agents, or the like.

[0169] In addition to the foregoing, one class of agents of the present invention, as broadly described, can be used to control gene expression through binding to one of the ORFs or EMFs of the present invention. As described above, such agents can be randomly screened or rationally designed/selected. Targeting the ORF or EMF allows a skilled artisan to design sequence specific or element specific agents, modulating the expression of either a single ORF or multiple ORFs which rely on the same EMF for expression control.

[0170] One class of DNA binding agents are agents which contain base residues which hybridize or form a triple helix formation by binding to DNA or RNA. Such agents can be based on the classic phosphodiester, ribonucleic acid backbone, or can be a variety of sulfhydryl or polymeric derivatives which have base attachment capacity.

[0171] Agents suitable for use in these methods usually contain 20 to 40 bases and are designed to be complementary to a region of the gene involved in transcription (triple helix--see Lee et al., Nucl. Acids Res. 6:3073 (1979); Cooney et al., Science 241:456 (1988); and Dervan et al., Science 251: 1360 (1991)) or to the mRNA itself (antisense--Okano, J. Neurochem. 56:560 (1991); Oligodeoxynucleotides as Antisense Inhibitors of Gene Expression, CRC Press, Boca Raton, Fla. (1988)). Triple helix formation optimally results in a shut-off of RNA transcription from DNA, while antisense RNA hybridization blocks translation of an mRNA molecule into polypeptide. Both techniques have been demonstrated to be effective in model systems. Information contained in the sequences of the present invention is necessary for the design of an antisense or triple helix oligonucleotide and other DNA binding agents.
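As an illustration of the antisense case described above, the following Python sketch derives a DNA oligonucleotide complementary to a chosen window of an mRNA, restricted to the 20 to 40 base range mentioned in the text. The function name `antisense_oligo`, its parameters and the example sequence are hypothetical; the sketch performs only Watson-Crick reverse complementation and does not model triple helix design or any selection of effective target sites.

```python
# Watson-Crick complements for a DNA oligo targeting RNA (or DNA) bases.
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A", "T": "A"}


def antisense_oligo(mrna: str, start: int, length: int = 20) -> str:
    """Return a DNA oligo (written 5'->3') complementary to mrna[start:start+length].

    Assumption: `length` is limited to the 20-40 base range given in the text.
    """
    if not 20 <= length <= 40:
        raise ValueError("length must be between 20 and 40 bases")
    if start < 0:
        raise ValueError("start must be non-negative")
    target = mrna.upper()[start:start + length]
    if len(target) != length:
        raise ValueError("target window extends past the end of the sequence")
    try:
        # Reverse the target, then complement each base.
        return "".join(_COMPLEMENT[base] for base in reversed(target))
    except KeyError as exc:
        raise ValueError(f"unexpected base in sequence: {exc.args[0]!r}") from None


if __name__ == "__main__":
    example_mrna = "AUGC" * 6  # made-up 24-base sequence, for illustration only
    print(antisense_oligo(example_mrna, start=0, length=20))
```

The oligo is returned in the 5' to 3' orientation so that it pairs antiparallel with the target window.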

[0172] Agents which bind to a protein encoded by one of the ORFs of the present invention can be used as a diagnostic agent or in the control of bacterial infection by modulating the activity of the protein encoded by the ORF. Agents which bind to a protein encoded by one of the ORFs of the present invention can be formulated using known techniques to generate a pharmaceutical composition for use in controlling Mycoplasma growth and infection.

[0173] 5. Vaccine and Pharmaceutical Composition

[0174] The present invention further provides pharmaceutical agents which can be used to modulate the growth of Mycoplasma genitalium, or another related organism, in vivo or in vitro. As used herein, a "pharmaceutical agent" is defined as a composition of matter which can be formulated using known techniques to provide a pharmaceutical composition. As used herein, the "pharmaceutical agents of the present invention" refers to the pharmaceutical agents which are derived from the proteins encoded by the ORFs of the present invention or are agents which are identified using the herein described assays.

[0175] As used herein, a pharmaceutical agent is said to "modulate the growth of Mycoplasma sp., or a related organism, in vivo or in vitro," when the agent reduces the rate of growth, rate of division, or viability of the organism in question. The pharmaceutical agents of the present invention can modulate the growth of an organism in many fashions, although an understanding of the underlying mechanism of action is not needed to practice the use of the pharmaceutical agents of the present invention. Some agents will modulate the growth by binding to an important protein thus blocking the biological activity of the protein, while other agents may bind to a component of the outer surface of the organism, blocking attachment or rendering the organism more prone to attack by the body's natural immune system. Alternatively, the agent may comprise a protein encoded by one of the ORFs of the present invention and serve as a vaccine. The development and use of a vaccine based on outer membrane components, such as the LPS, are well known in the art.

[0176] As used herein, a "related organism" is a broad term which refers to any organism whose growth can be modulated by one of the pharmaceutical agents of the present invention. In general, such an organism will contain a homolog of the protein which is the target of the pharmaceutical agent or the protein used as a vaccine. As such, related organisms do not need to be bacterial but may be fungal or viral pathogens.

[0177] The pharmaceutical agents and compositions of the present invention may be administered in a convenient manner such as by the oral, topical, intravenous, intraperitoneal, intramuscular, subcutaneous, intranasal or intradermal routes. The pharmaceutical compositions are administered in an amount which is effective for treatment and/or prophylaxis of the specific indication. In general, they are administered in an amount of at least about 10 µg/kg body weight and in most cases they will be administered in an amount not in excess of about 8 mg/kg body weight per day. In most cases, the dosage is from about 10 µg/kg to about 1 mg/kg body weight daily, taking into account the routes of administration, symptoms, etc.
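The per-kilogram figures above translate into absolute daily amounts by simple multiplication with body weight. The Python sketch below shows only that arithmetic, using the typical range (about 10 micrograms per kg to about 1 mg per kg) and the stated ceiling (about 8 mg per kg per day); the function name `daily_dose_range_mg`, its parameter names and the 70 kg example weight are assumptions introduced for illustration.

```python
def daily_dose_range_mg(body_weight_kg: float,
                        low_ug_per_kg: float = 10.0,
                        high_mg_per_kg: float = 1.0,
                        ceiling_mg_per_kg: float = 8.0) -> tuple[float, float, float]:
    """Convert per-kg daily dosage figures into absolute amounts in milligrams.

    Returns (typical_low_mg, typical_high_mg, ceiling_mg) for the given body weight.
    Default values are the approximate figures quoted in the text.
    """
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    typical_low_mg = low_ug_per_kg * body_weight_kg / 1000.0  # micrograms -> milligrams
    typical_high_mg = high_mg_per_kg * body_weight_kg
    ceiling_mg = ceiling_mg_per_kg * body_weight_kg
    return typical_low_mg, typical_high_mg, ceiling_mg


if __name__ == "__main__":
    low, high, ceiling = daily_dose_range_mg(70.0)  # hypothetical 70 kg body weight
    print(f"typical: {low:.2f} to {high:.2f} mg/day; ceiling: {ceiling:.2f} mg/day")
```

For a 70 kg body weight this gives a typical range of roughly 0.7 to 70 mg per day, with a ceiling of about 560 mg per day.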

[0178] The agents of the present invention can be used in native form or can be modified to form a chemical derivative. As used herein, a molecule is said to be a "chemical derivative" of another molecule when it contains additional chemical moieties not normally a part of the molecule. Such moieties may improve the molecule's solubility, absorption, biological half life, etc. The moieties may alternatively decrease the toxicity of the molecule, eliminate or attenuate any undesirable side effect of the molecule, etc. Moieties capable of mediating such effects are disclosed in Remington's Pharmaceutical Sciences (1980).

[0179] For example, a change in the immunological character of the functional derivative, such as affinity for a given antibody, is measured by a competitive type immunoassay. Changes in immunomodulation activity are measured by the appropriate assay. Modifications of such protein properties as redox or thermal stability, biological half-life, hydrophobicity, susceptibility to proteolytic degradation or the tendency to aggregate with carriers or into multimers are assayed by methods well known to the ordinarily skilled artisan.

[0180] The therapeutic effects of the agents of the present invention may be obtained by providing the agent to a patient by any suitable means (i.e., inhalation, intravenously, intramuscularly, subcutaneously, enterally, or parenterally). It is preferred to administer the agent of the present invention so as to achieve an effective concentration within the blood or tissue in which the growth of the organism is to be controlled.

[0181] To achieve an effective blood concentration, the preferred method is to administer the agent by injection. The administration may be by continuous infusion, or by single or multiple injections.

[0182] In providing a patient with one of the agents of the present invention, the dosage of the administered agent will vary depending upon such factors as the patient's age, weight, height, sex, general medical condition, previous medical history, etc. In general, it is desirable to provide the recipient with a dosage of agent which is in the range of from about 1 pg/kg to 10 mg/kg (body weight of patient), although a lower or higher dosage may be administered. The therapeutically effective dose can be lowered by using combinations of the agents of the present invention or another agent.

[0183] As used herein, two or more compounds or agents are said to be administered "in combination" with each other when either (1) the physiological effects of each compound, or (2) the serum concentrations of each compound can be measured at the same time. The composition of the present invention can be administered concurrently with, prior to, or following the administration of the other agent.

[0184] The agents of the present invention are intended to be provided to recipient subjects in an amount sufficient to decrease the rate of growth (as defined above) of the target organism.

[0185] The administration of the agent(s) of the invention may be for either a "prophylactic" or "therapeutic" purpose. When provided prophylactically, the agent(s) are provided in advance of any symptoms indicative of the organism's growth. The prophylactic administration of the agent(s) serves to prevent, attenuate, or decrease the rate of onset of any subsequent infection. When provided therapeutically, the agent(s) are provided at (or shortly after) the onset of an indication of infection. The therapeutic administration of the compound(s) serves to attenuate the pathological symptoms of the infection and to increase the rate of recovery.

[0186] The agents of the present invention are administered to the mammal in a pharmaceutically acceptable form and in a therapeutically effective concentration. A composition is said to be "pharmacologically acceptable" if its administration can be tolerated by a recipient patient. Such an agent is said to be administered in a "therapeutically effective amount" if the amount administered is physiologically significant. An agent is physiologically significant if its presence results in a detectable change in the physiology of a recipient patient.

[0187] The agents of the present invention can be formulated according to known methods to prepare pharmaceutically useful compositions, whereby these materials, or their functional derivatives, are combined in admixture with a pharmaceutically acceptable carrier vehicle. Suitable vehicles and their formulation, inclusive of other human proteins, e.g., human serum albumin, are described, for example, in Remington's Pharmaceutical Sciences (16th ed., Osol, A., Ed., Mack, Easton Pa. (1980)). In order to form a pharmaceutically acceptable composition suitable for effective administration, such compositions will contain an effective amount of one or more of the agents of the present invention, together with a suitable amount of carrier vehicle.

[0188] Additional pharmaceutical methods may be employed to control the duration of action. Controlled release preparations may be achieved through the use of polymers to complex or absorb one or more of the agents of the present invention. The controlled delivery may be exercised by selecting appropriate macromolecules (for example polyesters, polyamino acids, polyvinylpyrrolidone, ethylenevinylacetate, methylcellulose, carboxymethylcellulose, or protamine sulfate) and the concentration of macromolecules as well as the methods of incorporation in order to control release. Another possible method to control the duration of action by controlled release preparations is to incorporate agents of the present invention into particles of a polymeric material such as polyesters, polyamino acids, hydrogels, poly(lactic acid) or ethylene vinylacetate copolymers. Alternatively, instead of incorporating these agents into polymeric particles, it is possible to entrap these materials in microcapsules prepared, for example, by coacervation techniques or by interfacial polymerization, for example, hydroxymethylcellulose or gelatine-microcapsules and poly(methylmethacrylate) microcapsules, respectively, or in colloidal drug delivery systems, for example, liposomes, albumin microspheres, microemulsions, nanoparticles, and nanocapsules or in macroemulsions. Such techniques are disclosed in Remington's Pharmaceutical Sciences (1980).

[0189] The invention further provides a pharmaceutical pack or kit comprising one or more containers filled with one or more of the ingredients of the pharmaceutical compositions of the invention. Associated with such container(s) can be a notice in the form prescribed by a governmental agency regulating the manufacture, use or sale of pharmaceuticals or biological products, which notice reflects approval by the agency of manufacture, use or sale for human administration. In addition, the agents of the present invention may be employed in conjunction with other therapeutic compounds.

EXPERIMENTAL

EXAMPLE 1

[0190] Overview of Experimental Design and Methods

[0191] 1. Shotgun Sequencing Strategy

[0192] The overall strategy for a shotgun approach to whole genome sequencing is outlined in Table 3. The theory of shotgun sequencing follows from the application of the equation for the Poisson distribution $p_x = m^x e^{-m}/x!$, where x is the number of occurrences of an event and m is the mean number of occurrences. To determine the probability that any given base is not sequenced after a certain amount of random sequence has been generated, if L is the genome length, n is the number of clone insert ends sequenced, and w is the sequencing read length, then $m = nw/L$, and the probability that no clone originates at any of the w bases preceding a given base, i.e., the probability that the base is not sequenced, is $p_0 = e^{-m}$. Using the fold coverage as the unit for m, one sees that after 580 kb of sequence has been randomly generated, m=1, representing 1× coverage. In this case, $p_0 = e^{-1} = 0.37$, thus approximately 37% is unsequenced. A 5× coverage (approximately 3150 clones sequenced from both insert ends) yields $p_0 = e^{-5} = 0.0067$, or 0.67% unsequenced. The total gap length is $Le^{-m}$ and the average gap size is $L/n$. 5× coverage would leave about 48 gaps averaging about 80 bp in size. The treatment is essentially that of Lander and Waterman. Table 4 illustrates a computer simulation of a random sequencing experiment for coverage of a 580 kb genome with an average fragment size of 400 bp.
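The Poisson relationships above can be checked numerically. The following is a minimal sketch, not part of the original disclosure; the function name `shotgun_coverage` is hypothetical, and the 400 bp read length is borrowed from the simulation mentioned for Table 4 purely for illustration.

```python
import math

def shotgun_coverage(genome_len, n_reads, read_len):
    """Poisson estimates for a random shotgun project (illustrative only)."""
    m = n_reads * read_len / genome_len       # fold coverage, m = nw/L
    p0 = math.exp(-m)                         # probability a given base is unsequenced
    return {
        "fold_coverage": m,
        "fraction_unsequenced": p0,
        "total_gap_length": genome_len * p0,        # L * e^-m
        "average_gap_size": genome_len / n_reads,   # L / n
    }

# 1x coverage: 1,450 reads of 400 bp generate 580 kb of raw sequence.
one_x = shotgun_coverage(580_000, 1_450, 400)
# 5x coverage: five times as many reads of the same length.
five_x = shotgun_coverage(580_000, 7_250, 400)

print(round(one_x["fraction_unsequenced"], 2))    # about 0.37
print(round(five_x["fraction_unsequenced"], 4))   # about 0.0067
```

The gap estimates returned by this sketch depend on the read length and read count supplied, so they will only match the figures quoted in the text when the same inputs are used.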

[0193] 2. Random Library Construction

[0194] In order to approximate the random model described above during actual sequencing, a nearly ideal library of cloned genomic fragments is required. M. genitalium genomic chromosomal DNA was mechanically sheared, digested with BAL31 nuclease to produce blunt-ends, and size-fractionated by agarose gel electrophoresis. Fragments in the 2.0 kb size range were excised and recovered. These fragments were ligated to SmaI-cut, phosphatased pUC18 vector and the ligated products were fractionated on an agarose gel. The linear vector plus insert band was excised and recovered. The ends of the linear recombinant molecules were repaired with T4 polymerase treatment and the molecules were then ligated into circles. This two-stage procedure resulted in a molecularly random collection of single-insert plasmid recombinants with minimal contamination from double-insert chimeras (<1%) or free vector (<1%). Deviation from randomness is most likely to occur during cloning. E. coli host cells deficient in all recombination and restriction functions were used to prevent rearrangements, deletions, and loss of clones by restriction. Transformed cells were plated directly on antibiotic diffusion plates to avoid the usual broth recovery phase which allows multiplication and selection of the most rapidly growing cells. All colonies were picked for template preparation regardless of size. Only clones lost due to "poison" DNA or deleterious gene products would be deleted from the library, resulting in a slight increase in gap number over that expected.

[0195] In order to evaluate the quality of the M. genitalium random insert library, sequence data was obtained from approximately 2000 templates using the M13F primer. The random sequence fragments were assembled using The Institute for Genomic Research (TIGR) autoassembler software after obtaining 500, 1000, 1500, and 2000 sequence fragments, and the number of unique assembled base pairs was determined. The progression of assembly was plotted using the actual data obtained from the assembly of up to 2000 sequence fragments and compared with the data provided in the ideal plot. There was essentially no deviation of the actual assembly data from the ideal plot, indicating that we had constructed close to an ideal random library with minimal contamination from double insert chimeras and free vector.

[0196] 3. Random DNA Sequencing

[0197] Five-thousand seven hundred and sixty (5,760) plasmid templates were prepared using a "boiler bead" preparation method developed in collaboration with AGTC (Gaithersburg, Md.), as suggested by the manufacturer. The AGTC method is performed in a 96-well format for all stages of DNA preparation from bacterial growth through final DNA purification. Template concentration was determined using Hoechst Dye and a Millipore Cytofluor. DNA concentrations were not adjusted and low-yielding templates were identified and not sequenced where possible. Sequencing reactions were carried out on plasmid templates using the AB Catalyst Lab station or Perkin-Elmer 9600 Thermocyclers with Applied Biosystems PRISM Ready Reaction Dye Primer Cycle Sequencing Kits for the M13 forward (-21M13) and the M13 reverse (RP1) primers. Dye terminator sequencing reactions were carried out on the lambda templates on a Perkin-Elmer 9600 Thermocycler using the Applied Biosystems Ready Reaction Dye Terminator Cycle Sequencing kits. Nine-thousand eight hundred and forty-six (9,846) sequencing reactions were performed during the random phase of the project by 4 individuals using an average of 10 AB373 DNA Sequencers over a 2 month period. All sequencing reactions were analyzed using the Stretch modification of the AB373, primarily using a 36 cm well-to-read distance. The overall sequencing success rate was 88% for M13-21 sequences and 84% for M13RP1 sequences. The average usable read length was 485 bp for M13-21 sequences and 441 bp for M13RP1 sequences.

[0198] The art has described the value of using sequence from both ends of sequencing templates to facilitate ordering of contigs in shotgun assembly projects. A skilled artisan must balance the desirability of both-end sequencing (including the reduced cost of lower total number of templates) against shorter read-lengths and lower success rates for sequencing reactions performed with the M13RP1 (reverse) primer compared to the M13-21 (forward) primer. For this project, essentially all of the templates were sequenced from both ends.

[0199] 4. Protocol for Automated Cycle Sequencing

[0200] The sequencing consisted of using five (5) ABI Catalyst robots and ten (10) ABI 373 Automated DNA Sequencers. The Catalyst robot is a publicly available sophisticated pipetting and temperature control robot which has been developed specifically for DNA sequencing reactions. The Catalyst combines pre-aliquoted templates and reaction mixes consisting of deoxy- and dideoxynucleotides, the Taq thermostable DNA polymerase, fluorescently-labeled sequencing primers, and reaction buffer. Reaction mixes and templates were combined in the wells of an aluminum 96-well thermocycling plate. Thirty consecutive cycles of linear amplification (e.g., one primer synthesis) steps were performed including denaturation, annealing of primer and template, and extension of DNA synthesis. A heated lid with rubber gaskets on the thermocycling plate prevented evaporation without the need for an oil overlay.

[0201] Two sequencing protocols were used: dye-labeled primers and dye-labeled dideoxy chain terminators. The shotgun sequencing involves use of four dye-labeled sequencing primers, one for each of the four terminator nucleotides. Each dye-primer is labeled with a different fluorescent dye, permitting the four individual reactions to be combined into one lane of the 373 DNA Sequencer for electrophoresis, detection, and base-calling. ABI currently supplies pre-mixed reaction mixes in bulk packages containing all the necessary non-template reagents for sequencing. Sequencing can be done with both plasmid and PCR-generated templates with both dye-primers and dye-terminators with approximately equal fidelity, although plasmid templates generally give longer usable sequences.

[0202] Thirty-two reactions were loaded per 373 Sequencer each day, for a total of 960 samples. Electrophoresis was run overnight following the manufacturer's protocols, and the data was collected for twelve hours. Following electrophoresis and fluorescence detection, the ABI 373 performs automatic lane tracking and base-calling. The lane-tracking was confirmed visually. Each sequence electropherogram (or fluorescence lane trace) was inspected visually and assessed for quality. Trailing sequences of low quality were removed and the sequence itself was loaded via software into a Sybase database (archived daily to an 8 mm tape). Leading vector polylinker sequence was removed automatically by a software program. The average edited lengths of sequences from the ABI 373 Sequencers converted to Stretch Liners were approximately 460 bp.

[0203] Informatics

[0204] 1. Data Management

[0205] A number of laboratory information management systems (LIMS) for a large-scale sequencing lab have been developed. A system was used which allowed an automated data flow wherever possible to reduce user error. The system used to collect and assemble the sequence information obtained is centered upon a relational data management system built using the Sybase RDBMS. The database is designed to store and correlate all information collected during the entire operation from template preparation to final analysis of the genome. Because the raw output of the AB 373 Sequencers is based on a Macintosh platform and the data management system chosen is based on a Unix platform, it was necessary to design and implement a variety of multi-user, client server applications which allow the raw data as well as analysis results to flow seamlessly into the database with a minimum of user effort.

[0206] 2. Assembly

[0207] The sequence data from 8,472 sequence fragments was used to assemble the M. genitalium genome. The assembly was performed by using a new assembly engine (TIGR Assembler--previously designated ASMG) developed at TIGR. The TIGR Assembler simultaneously clusters and assembles fragments of the genome. In order to obtain the necessary speed, the TIGR Assembler builds a hash table of 10 bp oligonucleotide subsequences to generate a list of potential sequence fragment overlaps. The number of potential overlaps for each fragment determines which fragments are likely to fall into repetitive elements. Beginning with a single seed sequence fragment, the TIGR Assembler extends the current contig by attempting to add the best matching fragment based on oligonucleotide content. The current contig and candidate fragment are aligned using a modified version of the Smith-Waterman algorithm which provides for optimal gap alignments. The current contig is extended by the fragment only if strict criteria for the quality of the match are met. The match criteria include the minimum length of overlap, the maximum length of an unmatched end, and the minimum percentage match. These criteria are automatically lowered by the TIGR Assembler in regions of minimal coverage and raised in regions with a good chance of containing repetitive elements. Potentially chimeric fragments and fragments representing the boundaries of repetitive elements are often rejected based on partial mismatches at the ends of alignments and excluded from the current contig. The TIGR Assembler is designed to take advantage of clone size information coupled with sequencing from both ends of each template. The TIGR Assembler enforces the constraint that sequence fragments from two ends of the same template point toward one another in the contig and are located within a certain range of base pairs (definable for each clone based on the known clone size range for a given library).
Assembly of the 8,472 sequence fragments of M. genitalium required 10 hours of CPU time on a SPARCenter 2000. All contigs were loaded into a Sybase structure representing the location of each fragment in the contig and extensive information about the consensus sequence itself. The result of this process was approximately 40 contigs ordered into 2 groups (See below). Because of the high stringency of the TIGR Assembler process it was found to be useful to perform a FASTA (GRASTA) alignment of all contigs built by the TIGR Assembler process against each other. In this way additional overlaps were detected which enabled compression of the data set into 26 contigs in 2 groups.
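The hash-table step described for the assembler can be illustrated with a small sketch. This is not the TIGR Assembler itself; the function names and toy fragments below are hypothetical, only same-strand matches are counted, and the alignment and match-criteria stages described above are omitted.

```python
from collections import defaultdict

K = 10  # oligonucleotide length named in the text

def build_kmer_index(fragments, k=K):
    """Map each k-mer to the set of fragment ids that contain it."""
    index = defaultdict(set)
    for frag_id, seq in fragments.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(frag_id)
    return index

def candidate_overlaps(fragments, k=K):
    """Count distinct shared k-mers for every pair of fragments."""
    shared = defaultdict(int)
    for ids in build_kmer_index(fragments, k).values():
        ordered = sorted(ids)
        for pos, a in enumerate(ordered):
            for b in ordered[pos + 1:]:
                shared[(a, b)] += 1
    return dict(shared)

# Toy fragments: f1 and f2 share a 15 bp overlap, f3 is unrelated.
frags = {
    "f1": "ACGTTGCATGCCGTAAGCTTAGGC",
    "f2": "GCCGTAAGCTTAGGCTTACGATCG",
    "f3": "TTTTTTTTTTGGGGGGGGGGCCCC",
}
pairs = candidate_overlaps(frags)
print(pairs)  # {('f1', 'f2'): 6}
```

A fragment that shares k-mers with an unusually large number of other fragments would, under the scheme described in the text, be flagged as likely to lie in a repetitive element.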

[0208] Achieving Closure

[0209] The complete genome sequence was obtained by sequencing across the gaps between contigs. While gap filling has occupied a major portion of the time and expense of other genome sequencing projects, it was minimal in the present invention. This was primarily due to 1) saturation of the genome as a result of the number of random clones and sequencing reactions performed, 2) the longer read lengths obtained from the Stretch Liners, 3) the anchored ends which were obtained for joining contigs, and 4) the overall capacity and efficiency of the high throughput sequencing facility.

[0210] Gaps occurred on a predicted random basis, as shown in Table 4, which illustrates simulated random sequencing. These gaps generally were less than 200 bp in size. All of the gaps were closed by sequencing further on the templates bordering the gaps. In these cases, oligo primers for extension of the sequence from both ends of the gap were generated using techniques known in the art. This gave double-stranded coverage across the gap areas.

[0211] The high redundancy of sequence information that was obtained from the shotgun approach gave a highly accurate sequence. Our sequence accuracy was confirmed by comparing the sequence information obtained against known M. genitalium genes present in the GenBank database. The accuracy of our chromosome structure was confirmed by comparison of restriction digests to the known restriction map of M. genitalium. The EcoRI restriction map of M. genitalium is shown in FIG. 1 and expressed in tabular form in Table 5.
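A comparison of this kind can be sketched as an in silico digest that predicts fragment sizes from an assembled sequence. The code below is an illustrative assumption rather than the method actually used; the function name `ecori_fragments` is hypothetical. EcoRI recognizes GAATTC and cuts after the first G. For simplicity the sketch ignores a site that happens to span the arbitrary start coordinate of a circular sequence.

```python
def ecori_fragments(seq, site="GAATTC", cut_offset=1, circular=True):
    """Return predicted digest fragment lengths, largest first."""
    seq = seq.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)   # EcoRI cuts G^AATTC
        pos = seq.find(site, pos + 1)
    if not cuts:
        return [len(seq)]
    frags = [b - a for a, b in zip(cuts, cuts[1:])]
    if circular:
        # The last cut joins back around the origin to the first cut.
        frags.append(len(seq) - cuts[-1] + cuts[0])
    else:
        frags = [cuts[0]] + frags + [len(seq) - cuts[-1]]
    return sorted(frags, reverse=True)

toy = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAATTC" + "TT"   # 26 bp, two sites
print(ecori_fragments(toy))                   # [14, 12]
print(ecori_fragments(toy, circular=False))   # [14, 7, 5]
```

The predicted sizes could then be compared with the fragment sizes of an experimentally determined map.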

[0212] Identifying Genes

[0213] M. genitalium ORFs were initially defined by evaluating their coding potential with the program GeneWorks using composition matrices specific to Mycoplasma genomic DNA. The ORF sequences (plus 300 bp of flanking sequence) were used in searches against a database of non-redundant bacterial proteins (NRBP). Redundancy was removed from NRBP at two stages. (1) All DNA coding sequences were extracted from GenBank (release 85), and sequences from the same species were searched against each other. Sequences having >97% similarity over regions >100 nucleotides were combined. (2) The sequences were translated and used in protein comparisons with all sequences in Swiss-Prot (release 30). Sequences belonging to the same species and having >98% similarity over 33 amino acids were combined. NRBP is composed of 21445 sequences from 23751 GenBank sequences and 11183 Swiss-Prot sequences from 1099 different species.

[0214] Searches were performed using an algorithm that (1) translates the query DNA sequence in all six reading frames for searching against a protein database, (2) identifies the protein sequences that match the query, and (3) aligns the protein-protein matches using a modified Smith-Waterman algorithm. In cases where insertions or deletions in the DNA sequence produced a frame shift error, the alignment algorithm started with protein regions of maximum similarity and extended the alignment to the same database match using the 300 bp flanking region. Regions known to contain frame shift errors were saved to the database and evaluated for possible correction. The role categories were adopted from those previously defined by Riley et al. for E. coli gene products. Role assignments were made to M. genitalium ORFs at the protein sequence level by linking the protein sequence of the ORFs with the Swiss-Prot sequences in the Riley database.
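Step (1) of the search, translation in all six reading frames, can be sketched as follows. This is an illustration only, not the search program referred to above; the function names are hypothetical. The alternative table is included because Mycoplasma species are known to read TGA as tryptophan rather than as a stop codon, a point the passage above does not itself discuss.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order.
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
# Mycoplasma-style variant: TGA encodes tryptophan instead of stop.
MYCOPLASMA = dict(STANDARD, TGA="W")
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def translate(dna, table=STANDARD):
    """Translate complete codons; unknown codons become 'X'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(table.get(c, "X") for c in codons)

def six_frame_translations(dna, table=STANDARD):
    """Return translations of the three forward and three reverse frames."""
    dna = dna.upper()
    rev = dna.translate(COMPLEMENT)[::-1]   # reverse complement
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:], table)
        frames[f"-{offset + 1}"] = translate(rev[offset:], table)
    return frames

frames = six_frame_translations("ATGGCTTGATAA")
print(frames["+1"])   # MA**
print(six_frame_translations("ATGGCTTGATAA", MYCOPLASMA)["+1"])   # MAW*
```

Each of the six translated frames would then be compared against the protein database, with the best-matching frame carried forward to the alignment step.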

[0215] Detailed Description of Sequencing the Mycoplasma genitalium Genome, Genome Analysis and Comparative Genomics

[0216] We have determined the complete nucleotide sequence (580,071 bp) of the Mycoplasma genitalium genome using the approach of whole chromosome shotgun sequencing and assembly, which has successfully been applied to the analysis of the Haemophilus influenzae genome (R. Fleischmann et al., Science 269:496 (1995)). These data, together with the description of the complete genome sequence (1.83 Mb) of the eubacterium Haemophilus influenzae, have provided the opportunity for comparative genomics on a whole genome level for the first time. Our initial whole genome comparisons reveal fundamental differences in genome content which are reflected in different physiological and metabolic capacities of M. genitalium and H. influenzae.

[0217] The strategy and methodology for whole genome shotgun sequencing and assembly was similar to that previously described for H. influenzae (R. Fleischmann et al., Science 269:496 (1995)). In particular, a total of 50 μg of purified M. genitalium strain G-37 DNA (ATCC No. 33530) was isolated from cells grown in Hayflick's medium. A mixture (990 μl) containing 50 μg of DNA, 300 mM sodium acetate, 10 mM tris HCl, 1 mM EDTA, and 30 percent glycerol was chilled to 0 °C in a nebulizer chamber and sheared at 4 lbs/in² for 60 seconds. The DNA was precipitated in ethanol and redissolved in 50 μl of tris-EDTA (TE) buffer. To create blunt ends, a 40 μl portion was digested for 10 minutes at 30 °C in 85 μl of BAL31 buffer with 2 units of BAL 31 nuclease (New England BioLabs). The DNA was extracted with phenol, precipitated in ethanol, dissolved in 60 μl of TE buffer, and fractionated on a 1.0 percent low melting agarose gel. A fraction (2.0 kb) was excised, extracted with phenol, and redissolved in 20 μl of TE buffer. A two-step ligation procedure was used to produce a plasmid library in which 99% of the recombinants contained inserts, of which >99% were single inserts. The first ligation mixture (50 μl) contained approximately 2 μg DNA fragments, 2 μg of SmaI+bacterial alkaline phosphatase pUC 18 DNA (Pharmacia), and 10 units of T4 DNA ligase (GIBCO/BRL), and incubation was for 5 hours at 4 °C. After extraction with phenol and ethanol precipitation, the DNA was dissolved in 20 μl of TE buffer and separated by electrophoresis on a 1.0 percent low melting agarose gel. A ladder of ethidium bromide-stained, linearized DNA bands, identified by size as insert (i), vector (v), v+i, v+2i, v+3i, etc. was visualized by 360 nm ultraviolet light. The v+i DNA was excised and recovered in 20 μl of TE buffer. The v+i DNA was blunt-ended by T4 polymerase treatment for 5 minutes at 37 °C in a reaction mixture (50 μl) containing the linearized v+i fragments, four deoxynucleotide triphosphates (dNTPs) (25 μM each), and 3 units of T4 polymerase (New England Biolabs) under buffer conditions recommended by the supplier. After phenol extraction and ethanol precipitation, the repaired v+i linear pieces were dissolved in 20 μl of TE. The final ligation to produce circles was carried out in a 50 μl reaction containing 5 μl of v+i DNA and 5 units of T4 ligase at 15 °C overnight. The reaction mixture was heated at 67 °C for 10 minutes and stored at -20 °C.

[0218] For transformation, a 100 μl portion of Epicurian-SURE 2 Supercompetent Cells (Stratagene 200152) was thawed on ice and transferred to a chilled Falcon 2059 tube on ice. A 1.7 μl volume of 1.42 M β-mercaptoethanol was added to the cells to a final concentration of 25 mM. Cells were incubated on ice for 10 minutes. A 1 μl sample of the final ligation mix was added to the cells and incubated on ice for 30 minutes. The cells were heat-treated for 30 seconds at 42 °C and placed back on ice for 2 minutes. The outgrowth period in liquid culture was omitted to minimize the preferential growth of any transformed cell. Instead, the transformed cells were plated directly on a nutrient rich SOB plate containing a 5 ml bottom layer of SOB agar (1.5 percent SOB agar consisted of 20 g of tryptone, 5 g of yeast extract, 0.5 g of NaCl, and 1.5 percent Difco agar/liter). The 5 ml bottom layer was supplemented with 0.4 ml of ampicillin (50 mg/ml) per 100 ml of SOB agar. The 15 ml top layer of SOB agar was supplemented with 1 ml of MgCl₂ (1 M) and 1 ml of MgSO₄ (1 M) per 100 ml of SOB agar. The 15 ml top layer was poured just before plating. The titer of the library was approximately 100 colonies per 10 μl aliquot of transformation.

[0219] One of the lessons learned from sequencing and assembly of the complete H. influenzae genome was that contig ordering and gap closure is most efficient if the random sequencing phase of the project is continued until at least 99.8% to 99.9% of the genome is sequenced with at least 6-fold coverage. To calculate the number of random sequencing reactions necessary to obtain this coverage for the M. genitalium genome, we made use of the Lander and Waterman [E. S. Lander and M. S. Waterman, Genomics 2:231 (1988)] application of the Poisson distribution, where $p_0 = e^{-nw/L}$. Here $p_0$ is the probability that any given base is not sequenced, n is the number of clone insert ends sequenced, w is the average read length of each template in bp, and L is the size of the genome in bp. For a genome of 580 kb with an average sequencing read length of 450 bp after editing, approximately 8650 sequencing reactions (or 4325 clones sequenced from both ends) should theoretically provide 99.85% coverage of the genome. This level of coverage should leave approximately 10 gaps with an average size of 70 bp unsequenced.
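The planning calculation can be inverted to ask how many reads a target coverage requires. The following is a rough sketch under the stated Poisson model only; the function name `reads_needed` is hypothetical. Solving the exponential relationship for n gives n = -L ln(unsequenced fraction)/w.

```python
import math

def reads_needed(genome_len, read_len, target_fraction):
    """Smallest n for which 1 - exp(-n*w/L) reaches target_fraction."""
    unsequenced = 1.0 - target_fraction
    return math.ceil(-genome_len * math.log(unsequenced) / read_len)

n_reads = reads_needed(580_000, 450, 0.9985)
n_clones = math.ceil(n_reads / 2)   # each clone is read from both ends
print(n_reads, n_clones)
```

With the inputs named in the text, this idealized formula lands within a few percent of the roughly 8650 reactions quoted above; the remaining difference presumably reflects rounding or inputs that are not stated in the passage.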

[0220] To evaluate the quality of the M. genitalium library, sequence data were obtained from both ends of approximately 600 templates using both the M13 forward (M13-21) and the M13 reverse (M13RP1) primers. Sequence fragments were assembled using the TIGR ASSEMBLER and found to approximate a Poisson distribution of fragments with an average read length of 450 bp for a 580 kb library, indicating that the library was essentially random.

[0221] For this project, a total of 5760 double-stranded DNA plasmid templates were prepared in a 96-well format using a boiling bead method. Ninety-four percent of the templates prepared yielded a DNA concentration ≥30 ng/μl and were used for sequencing reactions. To facilitate ordering of contigs, each template was sequenced from both ends. Reactions were carried out using the AB Catalyst LabStation with Applied Biosystems PRISM Ready Reaction Dye Primer Cycle Sequencing Kits for the M13 forward (M13-21) and the M13 reverse (M13RP1) primers. The success rate and average read length after editing with the M13-21 primer were 88 percent and 444 bp, respectively, and 84 percent and 435 bp, respectively, with the M13RP1 primer. All data from template preparation to final analysis of the project were stored in a relational data management system developed at TIGR [A. R. Kerlavage et al., Proceedings of the Twenty-Sixth Annual Hawaii International Conference on System Science (IEEE Computer Society Press, Washington, D.C., 1993), p. 585]. A total of 9846 sequencing reactions were performed by five individuals using an average of 8 AB 373 DNA Sequencers per day for a total of 8 weeks. Assembly of 8472 high quality M. genitalium sequence fragments along with 299 random genomic sequences from Peterson et al. (S. N. Peterson et al., J. Bacteriol. 175:7918 (1993)) was performed with the TIGR ASSEMBLER. The assembly process generated 39 contigs (size range: 606 to 73,351 bp) which contained a total of 3,806,280 bp of primary DNA sequence data. Contigs were ordered by ASM_ALIGN, a program which links contigs based on information derived from forward and reverse sequencing reactions from the same clone.
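The idea of linking contigs through forward and reverse reads of the same clone can be sketched as follows. This is not ASM_ALIGN; the function name, data layout, and toy identifiers are all hypothetical, and read orientation and clone-size constraints are left out.

```python
from collections import defaultdict

def link_contigs(read_to_contig, clone_reads):
    """Count clones whose two end reads were placed in different contigs."""
    links = defaultdict(int)
    for fwd, rev in clone_reads.values():
        c_fwd = read_to_contig.get(fwd)
        c_rev = read_to_contig.get(rev)
        # Skip clones with an unplaced read or both reads in one contig.
        if c_fwd is None or c_rev is None or c_fwd == c_rev:
            continue
        links[tuple(sorted((c_fwd, c_rev)))] += 1
    return dict(links)

# Toy placement of reads in contigs (c4.r was never assembled).
read_to_contig = {
    "c1.f": "contigA", "c1.r": "contigB",
    "c2.f": "contigA", "c2.r": "contigB",
    "c3.f": "contigB", "c3.r": "contigB",
    "c4.f": "contigC",
}
clone_reads = {
    "c1": ("c1.f", "c1.r"),
    "c2": ("c2.f", "c2.r"),
    "c3": ("c3.f", "c3.r"),
    "c4": ("c4.f", "c4.r"),
}
links = link_contigs(read_to_contig, clone_reads)
print(links)  # {('contigA', 'contigB'): 2}
```

A pair of contigs supported by one or more spanning clones is adjacent in the genome, and the spanning clone itself is a template from which the intervening gap can be sequenced.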

[0222] ASM_ALIGN analysis revealed that all 39 gaps were spanned by an existing template from the small insert genomic DNA library (i.e., there were no physical gaps in the sequence assembly). The order of the contigs was confirmed by comparing the order of the random genomic sequences from Peterson et al. (S. N. Peterson et al., J. Bacteriol. 175:7918 (1993)) that were incorporated into the assembly with their known position on the physical map of the M. genitalium chromosome (T. S. Lucier et al., Gene 150:27 (1994); Peterson et al., J. Bacteriol. 177:3199 (1995)). Because of the high stringency of the TIGR ASSEMBLER, the 39 contigs were searched against each other with GRASTA (a modified FASTA (B. Brutlag et al., Comp. Chem. 1:203 (1993))) to detect overlaps (<30 bp) that would have been missed during the initial assembly process. The BLOSUM 60 amino acid substitution matrix was used in all protein-protein comparisons [S. Henikoff and J. G. Henikoff, Proc. Natl. Acad. Sci. USA 89:1091 (1992)]. Eleven overlaps were detected with this approach, which reduced the total number of gaps from 39 to 28.

[0223] Templates spanning each of the sequence gaps were identified and oligonucleotide primers were designed from the sequences at the end of each contig. All gaps were less than 300 bp; thus a primer walk from both ends of each template was sufficient for closure. All electropherograms were visually inspected with TIGR EDITOR (R. Fleischmann et al., Science 269:496 (1995)) for initial sequence editing. Where a discrepancy could not be resolved or a clear assignment made, the automatic base calls were left unchanged.

[0224] Several criteria for determination of sequence completion were established for the H. influenzae genome sequencing project and these same criteria were applied to this study. Across the assembled M. genitalium genome there is an average sequence redundancy of 6.5-fold. The completed sequence contains less than 1% single sequence coverage. For each of the 53 ambiguities remaining after editing and the 25 potential frameshifts found after sequence-similarity searching, the appropriate template was resequenced with an alternative sequencing chemistry (dye terminator vs. dye primer) to resolve ambiguities. Although it is extremely difficult to assess sequence accuracy, we estimate our error rate to be less than 1 base in 10,000 based upon frequency of shifts in open reading frames, unresolved ambiguities, overall quality of raw data, and fold coverage.
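The stated redundancy can be checked approximately from figures given in this document: 3,806,280 bp of primary sequence data in the assembly and a finished genome of 580,071 bp (reported in the Genomic Analysis section). The short calculation below is only a consistency check; the variable names are illustrative, and it assumes that all primary sequence data contribute to coverage.

```python
primary_bp = 3_806_280   # total primary sequence data in the 39 contigs (from the text)
genome_bp = 580_071      # length of the finished circular chromosome (from the text)

# Average fold redundancy = sequenced bases / genome length.
redundancy = primary_bp / genome_bp
print(f"average redundancy: {redundancy:.2f}-fold")

# Upper bound on expected errors if the rate is below 1 base in 10,000.
max_expected_errors = genome_bp / 10_000
print(f"errors at that bound: about {max_expected_errors:.0f}")
```

The ratio comes out slightly above 6.5, in line with the reported 6.5-fold redundancy, and the stated error bound corresponds to fewer than about 58 errors across the whole genome.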

[0225] A direct cost estimate for sequencing, assembly, and annotation of the M. genitalium genome was determined by summing reagent and labor costs for library construction, template preparation and sequencing, gap closure, sequence confirmation, annotation, and preparation for publication, and dividing by the size of the genome in base pairs. This yielded a final cost of 30 cents per finished base pair.

[0226] Genomic Analysis

[0227] The M. genitalium genome is a circular chromosome of 580,071 bp. The overall G+C content is 32% (A, 34%; C, 16%; G, 16%; and T, 34%). The G+C content across the genome varies between 27 and 37% (using a window of 5000 bp), with the regions of lowest G+C content flanking the presumed origin of replication of the organism. As in H. influenzae (Fleischmann, R. et al., Science 269:496 (1995)), the rRNA operon in M. genitalium contains a higher G+C content (44%) than the rest of the genome, as do the tRNA genes (52%). The higher G+C content in these regions may reflect the necessity of retaining essential G+C base pairing for secondary structure in rRNAs and tRNAs (Rogers, M. J. et al., Isr. J. Med. Sci. 20:768 (1984)).
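A windowed G+C profile of the kind described above can be sketched as follows. This is a minimal illustration, not the program used for the analysis: the function names, the non-overlapping step, and the toy sequence are assumptions, and the genome is treated as linear for simplicity.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(seq: str, window: int = 5000, step: int = 5000):
    """Yield (start, gc_fraction) for successive full windows along the sequence."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, gc_fraction(seq[start:start + window])


# Toy sequence (hypothetical): an AT-rich stretch followed by a GC-rich stretch.
toy = "ATATATATAT" + "GCGCGCATAT"
profile = list(windowed_gc(toy, window=10, step=10))

# The overall figure follows from the reported base composition: C 16% + G 16%.
overall_gc_percent = 16 + 16
```

With the default `window=5000`, the same function would produce the 5000 bp profile described in the text when given the full genome sequence.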

[0228] The genome of M. genitalium contains 74 EcoRI fragments, as predicted by cosmid mapping data (Lucier, T. S. et al., Gene 150:27 (1994); Peterson et al., J. Bacteriol. 177:3199 (1995)). The order and sizes of the EcoRI fragments determined from sequence analysis are in agreement with those previously reported (Lucier, T. S. et al., Gene 150:27 (1994); Peterson et al., J. Bacteriol. 177:3199 (1995)), with one apparent discrepancy between coordinates 62,708 and 94,573 in the sequence. However, re-evaluation of cosmid hybridization data in light of results from genome sequence analysis confirms that the sequence data are correct, and the extra 4.0 kb EcoRI fragment in this region of the cosmid map reflects a misinterpretation of the overlap between cosmids J-8 and 21 (Lucier, T. S., unpublished observation). The ends of each clone from the ordered cosmid library were sequenced and are shown on the circular chromosome in FIG. 4. The order of the cosmids based on sequence analysis is in complete agreement with that determined by physical mapping (Lucier, T. S. et al., Gene 150:27 (1994); Peterson et al., J. Bacteriol. 177:3199 (1995)).
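An in silico EcoRI digest of a circular sequence can be sketched as below. EcoRI recognizes GAATTC; on a circular molecule the number of fragments equals the number of sites. The function names and the toy sequence are assumptions for illustration, and fragment lengths are measured between site starts, which gives the same lengths as measuring between cut positions.

```python
ECORI_SITE = "GAATTC"  # EcoRI recognition sequence (palindromic)


def circular_sites(seq: str, site: str = ECORI_SITE) -> list[int]:
    """0-based start positions of `site` in a circular sequence, including
    sites that span the junction between the end and the start."""
    seq = seq.upper()
    extended = seq + seq[:len(site) - 1]
    positions = []
    pos = extended.find(site)
    while pos != -1:
        positions.append(pos)
        pos = extended.find(site, pos + 1)
    return positions


def circular_fragment_lengths(seq: str, site: str = ECORI_SITE) -> list[int]:
    """Fragment lengths from a complete digest of a circular molecule.

    A circle cut at n sites yields n fragments; an uncut circle stays one piece.
    """
    cuts = circular_sites(seq, site)
    if not cuts:
        return [len(seq)]
    lengths = [b - a for a, b in zip(cuts, cuts[1:])]
    lengths.append(len(seq) - cuts[-1] + cuts[0])  # fragment spanning the origin
    return lengths


# Toy circular sequence (hypothetical) with two EcoRI sites.
toy = "AA" + ECORI_SITE + "CCCC" + ECORI_SITE + "T" * 8
sites = circular_sites(toy)
fragments = circular_fragment_lengths(toy)
```

Applied to the full genome sequence, the length of the returned list could be compared with the 74 fragments predicted by cosmid mapping.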

[0229] We defined the first bp of the chromosomal sequence of M. genitalium based on the putative origin of replication (Bailey & Bott, J. Bacteriol. 176:5814 (1994)). Studies of origins of replication in some prokaryotes have shown that DNA synthesis is initiated in an untranscribed AT rich region between dnaA and dnaN (Ogasawara, N. et al., in The Bacterial Chromosome, Drlica & Riley, eds., American Society for Microbiology, Washington, D.C. (1990), pp. 287-295; Ogasawara & Yoshikawa, Mol. Microbiol. 6:629 (1992)). A search of the M. genitalium sequence for "DnaA boxes" around the putative origin of replication with consensus "DnaA boxes" from Escherichia coli, Bacillus subtilis, and Pseudomonas aeruginosa revealed no significant matches. Although we have not been able to precisely localize the origin, the co-localization of dnaA and dnaN to a 4000 bp region of the chromosome lends support to the hypothesis that it is the functional origin of replication in M. genitalium (Ogasawara, N. et al., in The Bacterial Chromosome, Drlica & Riley, eds., American Society for Microbiology, Washington, D.C. (1990), pp. 287-295; Ogasawara & Yoshikawa, Mol. Microbiol. 6:629 (1992); Miyata, M. et al., Nucleic Acids Res. 21:4816 (1993)). We have chosen an untranscribed region between dnaA and dnaN so that dnaN is numbered as the first open reading frame in the genome. As seen in FIG. 4, genes to the right of this region are preferentially transcribed from the plus strand, and genes to the left of this region are preferentially transcribed from the minus strand. The apparent polarity in gene transcription is maintained across each half of the genome (FIGS. 4 and 5). This stands in marked contrast to H. influenzae, which displays no apparent polarity of transcription around the origin of replication. The significance of this observation remains to be determined.

[0230] The predicted coding regions of M. genitalium were initially defined by searching the entire genome for open reading frames greater than 100 amino acids. Translations were made using the genetic code for mycoplasma species in which UGA encodes tryptophan. All open reading frames were searched with BLAZE (Brutlag, D. et al., Comp. Chem. 1:203 (1993)) against a non-redundant bacterial protein database (NRBP) (Fleischmann, R. et al., Science 269:496 (1995)) developed at TIGR on a MasPar MP-2 massively parallel computer with 4096 microprocessors. The BLOSUM 60 amino acid substitution matrix was used in all protein-protein comparisons (Henikoff, S. and Henikoff, J. G., Proc. Natl. Acad. Sci. USA 89:1091 (1992)). Protein matches were aligned with PRAZE, a modified Smith-Waterman (Waterman, M. S., Methods Enzymol. 164:765 (1988)) algorithm. Segments between predicted coding regions of the genome were used in additional searches against all protein sequences from GenPept, Swiss-Prot, and PIR. Pairwise alignments between M. genitalium predicted open reading frames and sequences from the public archives were examined. Motif matches were annotated in cases where sequence similarity was confined to short domains in the predicted coding region. The coding potential of 170 unidentified open reading frames was analyzed with GeneMark (Borodovsky & McIninch, ibid, p. 123) which had been trained with 308 M. genitalium sequences. Open reading frames that had low coding potential (based on the GeneMark analysis) and were smaller than 100 nucleotides (a total of 53) were removed from the final set of putative coding regions. In a separate analysis, open reading frames were searched against the complete set of translated sequences from H. influenzae (GSDB accession L42023; see Fleischmann, R. et al., Science 269:496 (1995)). In total, these processes resulted in the identification of 482 predicted coding regions, of which 365 were putatively identified and 117 had no matches to protein sequences from any other organism. (Twenty-three of the protein matches in Table 6 were annotated as motifs. These matches were not full-length protein matches, but nonetheless displayed regions of significant amino acid similarity.)
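The effect of reading UGA as tryptophan on open reading frame detection can be sketched as follows. This is a simplified illustration, not the procedure used here: an open reading frame is taken to be any stop-free run of codons in one of the six frames (no start codon is required), and the function names, the `longer_than` parameter, and the toy sequence are assumptions.

```python
# Under the mycoplasma code only TAA and TAG terminate translation; TGA is read as Trp.
STOPS_MYCOPLASMA = frozenset({"TAA", "TAG"})
STOPS_STANDARD = frozenset({"TAA", "TAG", "TGA"})

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def open_reading_frames(seq: str, stops=STOPS_MYCOPLASMA, longer_than: int = 100):
    """Return (strand, frame, start, end, n_codons) for stop-free codon runs
    longer than `longer_than` codons. Coordinates are 0-based on the scanned strand."""
    seq = seq.upper()
    found = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            run_start = frame
            for i in range(frame, len(s) - 2, 3):
                if s[i:i + 3] in stops:
                    n = (i - run_start) // 3
                    if n > longer_than:
                        found.append((strand, frame, run_start, i, n))
                    run_start = i + 3
            # Trailing run that reaches the end of the sequence without a stop.
            last = frame + ((len(s) - frame) // 3) * 3
            n = (last - run_start) // 3
            if n > longer_than:
                found.append((strand, frame, run_start, last, n))
    return found


# Toy sequence (hypothetical) with an internal TGA codon in frame 0.
toy = "ATG" + "TGG" * 2 + "TGA" + "GCT" * 3 + "TAA"
orfs_myco = open_reading_frames(toy, STOPS_MYCOPLASMA, longer_than=4)
orfs_std = open_reading_frames(toy, STOPS_STANDARD, longer_than=4)
```

In the toy sequence, frame 0 of the plus strand is a single seven-codon run under the mycoplasma code, but is split at the TGA codon into two three-codon runs under the standard code. This is why applying the standard code to a mycoplasma genome would truncate many predicted coding regions.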

[0231] The 365 predicted coding regions that matched protein sequences from the public sequence archives were assigned biological roles. The role classifications were developed from Riley (Riley, M., Microbiol. Rev. 57:862 (1992)) and are identical to those used in H. influenzae assignments (Fleischmann, R., et al., Science 269:496 (1995)). A separate search procedure was used in cases where we were unable to detect genes in the M. genitalium genome. Query peptide sequences that were available from eubacteria such as E. coli, B. subtilis, M. capricolum, and H. influenzae were used in searches against all six reading frame translations of the entire genome sequence, and the alignments were examined. The possibility remains that current searching methods, an incomplete set of query sequences, or the subjective analysis of the database matches are not sensitive enough to identify certain M. genitalium gene sequences.

[0232] One-half of all predicted coding regions in M. genitalium for which a putative identification could be assigned display the greatest degree of similarity to a protein from either a gram-positive organism (e.g., B. subtilis) or a Mycoplasma species. The significance of this finding is underscored by the fact that NRBP contained 3885 sequences from E. coli and only 1975 sequences from B. subtilis. In the majority of cases where M. genitalium coding regions matched sequences from both E. coli and Bacillus species, the better match was to a sequence from Bacillus (average of 62 percent similarity) rather than to a sequence from E. coli (average of 56 percent similarity). The evolutionary relationship between Mycoplasma and the Lactobacillus-Clostridium branch of the gram-positive phylum has been deduced from small subunit rRNA sequences (Maidak, B. L. et al., Nucleic Acids Research 22:3485 (1994)). Our data from whole genome analysis support this hypothesis.

[0233] Comparative Genomics: M. genitalium and H. influenzae

[0234] A survey of the genes and their organization in M. genitalium makes possible the description of a minimal set of genes required for survival. One would predict that a minimal cell must contain genes for replication and transcription, at least one rRNA operon and a set of ribosomal proteins, tRNAs and tRNA synthetases, transport proteins to derive nutrients from the environment, biochemical pathways to generate ATP and reducing power, and mechanisms for maintaining cellular homeostasis. Comparison of the genes identified in M. genitalium with those in H. influenzae allows for identification of a basic complement of genes conserved in these two species and provides insights into physiological differences between one of the simplest self-replicating prokaryotes and a more complex, gram-negative bacterium.

[0235] The M. genitalium genome contains 482 predicted coding sequences (Table 6) as compared to 1,727 identified in H. influenzae (Fleischmann, R. et al., Science 269:496 (1995)). Table 7 summarizes the gene content of both organisms sorted by functional category. The percent of the total genome in M. genitalium and H. influenzae encoding genes involved in cell envelope, cellular processes, energy metabolism, purine and pyrimidine metabolism, replication, transcription, transport, and other categories is similar, although the total number of genes in these categories is considerably smaller in M. genitalium. A smaller percentage of the M. genitalium genome encodes genes involved in amino acid biosynthesis, biosynthesis of co-factors, central intermediary metabolism, fatty acid and phospholipid metabolism, and regulatory functions as compared with H. influenzae. A greater percentage of the M. genitalium genome encodes proteins involved in translation than in H. influenzae, as shown by the similar numbers of ribosomal proteins and tRNA synthetases in both organisms.

[0236] The 482 predicted coding regions in M. genitalium (average size of 1100 bp) cover 85% of the genome (on average, one gene every 1169 bp), a value similar to that found in H. influenzae where 1727 predicted coding regions (average size of 900 bp) cover 91% of the genome (one gene every 1042 bp). These data indicate that the reduction in genome size that has occurred within Mycoplasma has not led to an increase in gene density or a decrease in gene size (Bork, P. et al., Mol. Microbiol. 16:955 (1995)). A global search of M. genitalium and H. influenzae genomes reveals short regions of conservation of gene order, particularly two clusters of ribosomal proteins.

[0237] Replication. Two major protein complexes are formed during replication: the primosome and the replisome. We have identified genes encoding many of the essential proteins in the replication process, including M. genitalium isologs of the primosome proteins DnaA, DnaB, GyrA, GyrB, a single stranded DNA binding protein, and the primase protein, DnaE. DnaJ and DnaK, heat shock proteins that may function in the release of the primosome complex, are also found in M. genitalium. A gene encoding the DnaC protein, responsible for delivery of DnaB to the primosome, has yet to be identified.

[0238] Genes encoding most of the essential subunit proteins for DNA polymerase III in M. genitalium were also identified. The polC gene encodes the α subunit which contains the polymerase activity. We have also identified the isolog of dnaH in B. subtilis (dnaX in E. coli) which encodes the γ and τ subunits as alternative products from the same gene. These proteins are necessary for the processivity of DNA polymerase III. An isolog of dnaN which encodes the β subunit was previously identified in M. genitalium (Bailey & Bott, J. Bacteriol. 176:5814 (1994)) and is involved in the process of clamping the polymerase to the DNA template. While we have yet to identify a gene encoding the subunit responsible for the 3'-5' proofreading activity, it is possible that this activity is encoded in the α subunit as has been previously described (Sanjanwala, B. and Ganesan, A. T., Mol. Gen. Genet. 226:467 (1991); Sanjanwala, B. and Ganesan, A. T., Proc. Natl. Acad. Sci. USA 86:4421 (1989)). Finally, we have identified a gene encoding a DNA ligase, necessary for the joining of the Okazaki fragments formed during synthesis of the lagging strand.

[0239] While we have identified genes encoding many of the isologs thought to be essential for DNA replication, some genes encoding proteins with key functions have yet to be identified. Examples of these are the DnaC protein mentioned above as well as Dnaθ and Dnaδ, whose functions are less well understood but are thought to be involved in the assembly and processivity of polymerase III. Also apparently absent is a specific RNaseH protein responsible for the hydrolysis of the RNA primer synthesized during lagging strand synthesis.

[0240] DNA Repair. It has been suggested that in E. coli as many as 100 genes are involved in DNA repair (Kornberg, A. and Baker, T. A., DNA Replication--2nd Ed., W. H. Freeman and Co., New York (1992)), and in H. influenzae the number of putatively identified DNA repair enzymes is approximately 30 (Fleischmann, R. et al., Science 269:496 (1995)). Although M. genitalium appears to have the necessary genes to repair many of the more common lesions in DNA, the number of genes devoted to the task is much smaller. Excision repair of regions containing missing bases (apurinic/apyrimidinic (AP) sites) can likely occur by a pathway involving endonuclease IV (nfo), Pol I, and ligase. The ung gene which encodes uracil-DNA glycosylase is present. This activity removes uracil residues from DNA which usually arise by spontaneous deamination of cytosine. This produces an AP site which could then be repaired as described above.

[0241] All three genes necessary for production of the uvrABC excinuclease are present, and along with Pol I, helicase II, and ligase should provide a mechanism for repair of damage such as cross-linking, which requires replacement of both strands. Although recA is present, which in E. coli is activated as it binds to single strand DNA, thereby initiating the SOS response, we find no evidence for a lexA gene, which encodes the repressor that regulates the SOS genes. We have not identified photolyase (phr), which repairs UV-induced pyrimidine dimers, in M. genitalium, or other genes involved in reversal of DNA damage rather than excision and replacement of the lesion.

[0242] Transcription. The critical components for transcription were identified in M. genitalium. In addition to the α, β, and β′ subunits of the core RNA polymerase, M. genitalium appears to encode a single σ factor, whereas E. coli and B. subtilis encode at least six and seven, respectively. We have not detected a homolog of the Rho termination factor gene, so it seems likely that a mechanism similar to Rho-independent termination in E. coli operates in M. genitalium. We have clear evidence for homologs of only two other genes which modulate transcription, nusA and nusG.

[0243] Translation. M. genitalium possesses a single rRNA operon which contains three rRNA subunits in the order: 16S rRNA (1518 bp)-spacer (203 bp)-23S rRNA (2905 bp)-spacer (56 bp)-5S rRNA (103 bp). The small subunit rRNA sequence was compared with the Ribosomal Database Project's (Maidak, B. L. et al., Nucleic Acids Research 22:3485 (1994)) prokaryote database with the program "similarity_yank." Our sequence is identical to the M. genitalium (strain G37) sequence deposited there, and the 10 most similar taxa returned by this search are also in the genus Mycoplasma.

[0244] A total of 33 tRNA genes were identified in M. genitalium; these were organized into five clusters plus nine single genes. In all cases, the best match for each tRNA gene in M. genitalium was the corresponding gene in M. pneumoniae (Simoneau, P. et al., Nuc. Acids Res. 21:4967 (1993)). Furthermore, the grouping of tRNAs into clusters (trnA, trnB, trnC, trnD, and trnE) was identical in M. genitalium and M. pneumoniae, as was gene order within the cluster (Simoneau, P. et al., Nuc. Acids Res. 21:4967 (1993)). The only difference between M. genitalium and M. pneumoniae observed with regard to tRNA gene organization was an inversion between trnD and GTG. In contrast to H. influenzae and many other eubacteria, no tRNAs were found in the spacer region between the 16S and 23S rRNA genes in the rRNA operon of M. genitalium, similar to what has been reported for M. capricolum (Sawada, M. et al., Mol. Gen. Genet. 182:502 (1981)).

[0245] A search of the M. genitalium genome for tRNA synthetase genes identified all of the expected genes with the exception of glutaminyl tRNA synthetase. We expect that this gene is present in the M. genitalium genome, but we have not been able to identify it by similarity searches. The latest GenBank release (release 89) contains only a single entry for a glutaminyl tRNA synthetase from a bacterial species; this was from E. coli, a gram-negative organism only distantly related to Mycoplasma. In general, tRNA synthetase sequences from gram-positive organisms such as B. subtilis displayed greater similarity to those from M. genitalium than the corresponding sequences from E. coli, lending support to the notion that the similarity between the E. coli and M. genitalium glutaminyl tRNA synthetase may not have been high enough to be detected.

[0246] Metabolic pathways. The reduction in genome size among Mycoplasma species is associated with a marked reduction in the number and components of biosynthetic pathways in these organisms, requiring them to use metabolic products from their hosts. In the laboratory, M. genitalium has not been grown in a chemically defined medium. The complex growth requirements of this organism can be explained by the almost complete lack of enzymes involved in amino acid biosynthesis, de novo nucleotide biosynthesis, and fatty acid biosynthesis (Table 6 and FIGS. 5A-5R). When the number of genes in the categories of central intermediary metabolism, energy metabolism, and fatty acid and phospholipid metabolism are summed, marked differences in gene content between H. influenzae and M. genitalium are apparent. For example, whereas the H. influenzae genome contains 68 genes involved in amino acid biosynthesis, the M. genitalium genome contains only one. In total, the H. influenzae genome has 167 genes associated with metabolic pathways whereas the M. genitalium genome has just 42. A recent analysis of 214 kb of sequence from Mycoplasma capricolum (Bork, P. et al., Mol. Microbiol. 16:955 (1995)), a related organism whose genome size is twice as large as that of M. genitalium, reveals that M. capricolum contains a number of biosynthetic enzymes not present in M. genitalium. This observation suggests that M. capricolum's larger genome confers a greater anabolic capacity.

[0247] M. genitalium is a facultative anaerobe that ferments glucose and possibly other sugars via glycolysis to lactate and acetate. Genes that encode all the enzymes of the glycolytic pathway were identified, including genes for components of the pyruvate dehydrogenase complex, phosphotransacetylase, and acetate kinase. The major route for ATP synthesis may be through substrate level phosphorylation since no cytochromes are present. M. genitalium also lacks all the components of the tricarboxylic acid cycle. None of the genes coding for glycogen or poly-beta-hydroxybutyrate production were identified, indicating limited capacity for carbon and energy storage. The pentose phosphate pathway also appears limited since only genes encoding 6-phosphogluconate dehydrogenase and transketolase were identified. The limited metabolic capacity of M. genitalium sharply contrasts with the complexity of catabolic pathways in H. influenzae, reflecting the four-fold greater number of genes involved in energy metabolism found in H. influenzae.

[0248] Transport. The transporters identified in M. genitalium are specific for a range of nutritional substrates. Using protein transport as an example, both oligopeptide and amino acid transporters are represented. One interesting peptide transporter has homology to a lactococcin transporter (lcnDR3) and related bacteriocin transporters, suggesting that M. genitalium may export a small peptide with antibacterial activity. The M. genitalium isolog of the M. hyorhinis p37 high-affinity transport system also has a conserved lipid modification site, providing further evidence that the Mycoplasma binding-protein dependent transport systems are organized in a manner analogous to gram positive bacteria (Gilson, E. et al., EMBO J. 7:3971 (1988)).

[0249] Genes encoding proteins that function in the transport of glucose via the phosphoenolpyruvate:sugar phosphotransferase system (PTS) have been identified in M. genitalium. These include enzyme I (EI), HPr and sugar specific enzyme IIs (EII) (Postma, P. W. et al., Microbiol. Rev. 57:543 (1993)). EIIs consist of a complex of at least three domains, EIIA, EIIB and EIIC. In some bacteria (e.g., E. coli), EIIA is a soluble protein, while in others (e.g., Bacillus subtilis), a single membrane protein contains all three domains, EIIA, B and C. These variations in the proteins that make up the EII complex are due to fusion or splitting of domains during evolution and are not considered to be mechanistic differences (Postma, P. W. et al., Microbiol. Rev. 57:543 (1993)). In M. genitalium EIIA, B, and C are located in a single protein similar to the protein found in B. subtilis. In Mycoplasma capricolum, ptsH, the gene which encodes HPr, is located on a monocistronic transcriptional unit while genes encoding EI (ptsI) and EIIA (crr) are located on a dicistronic operon (Zhu, P. P. et al., Protein Sci. 3:2115 (1994); Zhu, P. P. et al., J. Biol. Chem. 268:26531 (1993)). In most bacterial species studied to date, ptsI, ptsH, and crr are part of a polycistronic operon (pts operon). In M. genitalium, ptsH, ptsI and the gene encoding EIIABC reside at different locations of the genome and thus each of these genes may constitute a monocistronic transcriptional unit. We have also identified an EIIBC component for uptake of fructose; however, other components of the fructose PTS were not found. Thus, M. genitalium may be limited to the use of glucose as an energy source. In contrast, H. influenzae has the ability to use at least six different sugars as a source of carbon and energy.

[0250] Regulatory Systems. It appears that regulatory systems found in other bacteria are absent in M. genitalium. For instance, although two component systems have been described for a number of gram-positive organisms, no sensor or response regulator genes are found in the M. genitalium genome. Furthermore, the lack of a heat shock σ factor raises the question of how the heat shock response is regulated. Another stress faced by all metabolically active organisms is the generation of reactive oxygen intermediates such as superoxide anions and hydrogen peroxide. Although H. influenzae has an oxyR homologue, as well as catalase and superoxide dismutase, M. genitalium appears to lack these genes as well as an NADH peroxidase. The importance of these reactive intermediate molecules in host cell damage suggests that some as yet unidentified protective mechanism may exist within the cell.

[0251] Antigenic variation. Numerous examples exist of microbial pathogens expressing outer membrane proteins that vary due to DNA rearrangements as a mechanism for providing antigenic and functional variations that influence virulence potential (Bergstrom, S. et al., Proc. Natl. Acad. Sci. USA 83:3890 (1986); Meier, J. T. et al., Cell 47:61 (1986); Majiwa, P. A. O. et al., Nature 297:514 (1982)). Because humans are the natural host for both M. genitalium and H. influenzae, it was of interest to compare mechanisms for generating antigenic variation in these organisms. In H. influenzae, a number of virulence-related genes encoding membrane proteins contain tandem tetramer repeats that undergo frequent addition and deletion of one or more repeat units during replication, such that the reading frame of the gene is changed and its expression altered (Weiser, J. N. et al., Cell 59:657 (1989)).
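The reason gain or loss of tetramer repeat units alters expression is that 4 is not a multiple of the 3-nt codon length, so each added or deleted unit shifts the downstream reading frame. The small calculation below illustrates this; the function name and the choice of 21 copies as the in-frame reference allele are hypothetical.

```python
def frame_after_repeats(n_units: int, unit_len: int = 4, reference_units: int = 0) -> int:
    """Reading-frame offset (0, 1 or 2) downstream of a repeat tract,
    relative to a tract holding `reference_units` copies of the unit."""
    return ((n_units - reference_units) * unit_len) % 3


# Offsets for tracts of 20 to 24 copies, measured against a hypothetical
# in-frame allele carrying 21 copies of the 4-nt unit.
offsets = {n: frame_after_repeats(n, reference_units=21) for n in range(20, 25)}
```

An offset of 0 means the downstream sequence is read in the original frame. Only a change of three units (12 nt) restores the frame, so most single-unit slippage events switch the gene between in-frame and out-of-frame states.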

[0252] M. genitalium appears to use a different system for evading host immune responses. The 140 kDa adhesion protein of M. genitalium is densely clustered at a differentiated tip of this organism and elicits a strong immune response in humans and experimentally infected animals (Collier, A. M. et al., Zbl. Bkt. Suppl. 20:73 (1992)). The adhesion protein (MgPa) operon in M. genitalium contains a 29 kDa ORF, the MgPa protein (160 kDa) and a 114 kDa ORF with intervening regions of 6 and 1 nt, respectively (Inamine, J. M. et al., Gene 82:259 (1989)). Based on hybridization experiments (Dallo, S. F. and Baseman, J. B., Microb. Pathog. 8:371 (1990)), multiple copies of regions of the M. genitalium MgPa gene and the 114 kDa ORF are known to exist throughout the genome.

[0253] The availability of the complete genomic sequence from M. genitalium has allowed a comprehensive mapping of the MgPa repeats (FIGS. 4 and 6). In addition to the complete operon, nine repetitive elements which are composites of particular regions of the MgPa operon were found. The percent of sequence identity between the repeat elements and the MgPa gene ranges from 78%-90%. In some of the repeats, the MgPa-related sequences are separated in the genome by a variable length, A-T rich spacer sequence, as has previously been described (Peterson, S. N., PhD dissertation, Univ. No. Carolina 1992, Univ. Mi. Dissertation Services #6246). The sequences contained in the MgPa operon and the nine repeats scattered throughout the chromosome represent 4.5% of the total genomic sequence. At first glance this might appear to contradict the expectation for a minimal genome. However, recent evidence for recombination between the repetitive elements and the MgPa operon has been reported (Peterson, S. N. et al., Proc. Natl. Acad. Sci. USA, in press (1995)). Such recombination may allow M. genitalium to evade the host immune response through mechanisms that induce antigenic variation within the population. Since M. genitalium survives in nature by obtaining essential nutrients from its mammalian host, an efficient mechanism to evade the immune response may be a necessary part of this minimal genome.

[0254] The M. genitalium genome contains 93 putatively identified genes that are apparently not present in H. influenzae. Almost 60% of these genes have database matches to known or hypothetical proteins from gram-positive bacteria or other Mycoplasma species, suggesting that these genes may encode proteins with a restricted phylogenetic distribution. One hundred seventeen potential coding regions in M. genitalium have no database match to any sequences in public archives including the entire H. influenzae genome; therefore, these likely represent novel genes in M. genitalium and related organisms.

[0255] The predicted coding sequences of the hypothetical ORFs, the ORFs with motif matches and the ORFs that have no similarities to known peptide sequences were analyzed. The two programs used were the Kyte-Doolittle algorithm (Kyte, J. and Doolittle, R. F., J. Mol. Biol. 157:105 (1982)) with a range of 11 residues, and PSORT which is available on the WWW site http://psort.nibb.ac.jp. PSORT predicts the presence of signal sequences by the methods of McGeoch (McGeoch, D. J., Virus Res. 3:271 (1985)) and von Heijne (von Heijne, G., Nucl. Acids Res. 14:4683 (1986)), and detects potential transmembrane domains by the method of Klein et al. (Klein, P. et al., Biochim. Biophys. Acta 815:468 (1985)). Of a total of 201 ORFs examined, 90 potential membrane proteins were found. Eleven of them are predicted to have type I signal peptides, and five type II signal peptides. Using this approach, at least fifty potential membrane proteins were identified from the list of ORFs with known functions. This brings the total number of membrane proteins in M. genitalium to approximately 140.
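A Kyte-Doolittle hydropathy scan with an 11-residue window can be sketched as follows. The hydropathy values are those of the published Kyte-Doolittle scale; the function names, the cutoff of 1.6, and the toy peptides are assumptions for illustration and do not reproduce the PSORT procedure.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 11) -> list[float]:
    """Mean hydropathy of each full window along the protein sequence."""
    values = [KD[aa] for aa in protein.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def has_hydrophobic_segment(protein: str, window: int = 11, threshold: float = 1.6) -> bool:
    """True if any window average exceeds `threshold` (an assumed cutoff)."""
    return any(v > threshold for v in hydropathy_profile(protein, window))


# Toy peptides (hypothetical): one with a leucine-rich stretch, one fully polar.
membrane_like = "MKK" + "L" * 11 + "DDE"
soluble_like = "DEKRNQ" * 3

flag_membrane = has_hydrophobic_segment(membrane_like)
flag_soluble = has_hydrophobic_segment(soluble_like)
```

A stretch of windows with high average hydropathy marks a candidate membrane-spanning segment; in practice the cutoff and window size are tuned to the class of proteins being examined.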

[0256] To manage these putative membrane proteins, M. genitalium has at its disposal a minimal secretory machinery composed of seven functions: three chaperonins, GroEL, DnaK and the trigger factor Tig (Pugsley, A. P., Microbiol. Rev. 57:50 (1993); Guthrie, B. and Wickner, W., J. Bacteriol. 172:5555 (1990)), an ATPase pilot protein SecA, one integral membrane protein translocase (SecY), a signal recognition particle protein (Ffh) and a lipoprotein-specific signal peptidase LspA (Pugsley, A. P., Microbiol. Rev. 57:50 (1993)). Perhaps the lack of other known translocases like SecE, SecD, and SecF, which are present in E. coli and H. influenzae, is related to the fact that M. genitalium has a one-layer cell envelope. Also, the absence of a SecB homologue, the secretory chaperonin of E. coli, in M. genitalium (it is also absent in B. subtilis (Collier, D. N., J. Bacteriol. 176:4937 (1994))) might reflect a difference between gram negative bacteria and wall-less Mollicutes in handling nascent proteins destined for the general secretory pathway. Considering the presence of several putative membrane proteins that contain type I signal peptides, the absence of a signal peptidase I (lepB) is most surprising. A direct electronic search for the M. genitalium lepB gene using the E. coli lepB and the B. subtilis sipS (van Dijl, J. M. et al., EMBO J. 11:2819 (1992)) as queries did not reveal any significant similarities.

[0257] There are a number of possible explanations as to why genes encoding some of the proteins thought to be essential for a self-replicating organism appear to be absent in M. genitalium. One possibility is that a limited number of proteins may have adapted to take on other functions. A second possibility is that certain proteins thought to be essential for life based on studies in E. coli are not required in a simpler prokaryote like M. genitalium. Finally, it may be that sequences from M. genitalium have such a low similarity to known sequences from other species that matches are not detectable above a reasonable confidence threshold.

[0258] Determination of the complete genome sequence of M. genitalium provides a new starting point in understanding the biology of this and related organisms. Comparison of the genes expressed in M. genitalium, a simple prokaryote, with those in H. influenzae, a more complex organism, has revealed a myriad of differences between these species. Fifty-six percent of the genes in M. genitalium have apparent isologs in H. influenzae, suggesting that this subset of the M. genitalium genome may encode the genes that are truly essential for a self-replicating organism. Notable among the genes that are conserved between M. genitalium and H. influenzae are those involved in DNA replication and repair, transcription and translation, cell division, and basic energy metabolism via glycolysis. Isologs of these genes are found in eukaryotes as well.

EXAMPLE 2

[0259] Production of an Antibody to a Mycoplasma genitalium Protein

[0260] Substantially pure protein or polypeptide is isolated from the transfected or transformed cells using any one of the methods known in the art. The protein can also be produced in a recombinant prokaryotic expression system, such as E. coli, or can be chemically synthesized. Concentration of protein in the final preparation is adjusted, for example, by concentration on an Amicon filter device, to the level of a few micrograms/ml. Monoclonal or polyclonal antibody to the protein can then be prepared as follows:

[0261] Monoclonal Antibody Production by Hybridoma Fusion

[0262] Monoclonal antibody to epitopes of any of the peptides identified and isolated as described can be prepared from murine hybridomas according to the classical method of Kohler, G. and Milstein, C., Nature 256:495 (1975) or modifications of the methods thereof. Briefly, a mouse is repetitively inoculated with a few micrograms of the selected protein over a period of a few weeks. The mouse is then sacrificed, and the antibody producing cells of the spleen isolated. The spleen cells are fused by means of polyethylene glycol with mouse myeloma cells, and the excess unfused cells destroyed by growth of the system on selective media comprising aminopterin (HAT media). The successfully fused cells are diluted and aliquots of the dilution placed in wells of a microtiter plate where growth of the culture is continued. Antibody-producing clones are identified by detection of antibody in the supernatant fluid of the wells by immunoassay procedures, such as ELISA, as originally described by Engvall, E., Meth. Enzymol. 70:419 (1980), and modified methods thereof. Selected positive clones can be expanded and their monoclonal antibody product harvested for use. Detailed procedures for monoclonal antibody production are described in Davis, L. et al. Basic Methods in Molecular Biology Elsevier, New York. Section 21-2 (1989).

[0263] Polyclonal Antibody Production by Immunization

[0264] Polyclonal antiserum containing antibodies to heterogeneous epitopes of a single protein can be prepared by immunizing suitable animals with the expressed protein described above, which can be unmodified or modified to enhance immunogenicity. Effective polyclonal antibody production is affected by many factors related both to the antigen and to the host species. For example, small molecules tend to be less immunogenic than others and may require the use of carriers and adjuvant. Also, host animals vary in response to site of inoculation and dose, with both inadequate and excessive doses of antigen resulting in low titer antisera. Small doses (ng level) of antigen administered at multiple intradermal sites appear to be most reliable. An effective immunization protocol for rabbits can be found in Vaitukaitis, J. et al., J. Clin. Endocrinol. Metab. 33:988-991 (1971).

[0265] Booster injections can be given at regular intervals, and antiserum harvested when the antibody titer, as determined semi-quantitatively, for example, by double immunodiffusion in agar against known concentrations of the antigen, begins to fall (see Ouchterlony, O. et al., Chap. 19 in: Handbook of Experimental Immunology, Weir, D., ed., Blackwell (1973)). Plateau concentration of antibody is usually in the range of 0.1 to 0.2 mg/ml of serum (about 12 μM). Affinity of the antisera for the antigen is determined by preparing competitive binding curves, as described, for example, by Fisher, D., Chap. 42 in: Manual of Clinical Immunology, second edition, Rose and Friedman (eds.), Amer. Soc. For Microbio., Washington, D.C. (1980).

[0266] Antibody preparations prepared according to either protocol are useful in quantitative immunoassays which determine concentrations of antigen-bearing substances in biological samples; they are also used semi-quantitatively or qualitatively to identify the presence of antigen in a biological sample.

EXAMPLE 3

[0267] Preparation of PCR Primers and Amplification of DNA

[0268] Various fragments of the Mycoplasma genitalium genome, such as those disclosed in Tables 1a, 1b, 1c and 2 can be used, in accordance with the present invention, to prepare PCR primers for a variety of uses. The PCR primers are preferably at least 15 bases, and more preferably at least 18 bases in length. When selecting a primer sequence, it is preferred that the primer pairs have approximately the same G/C ratio, so that melting temperatures are approximately the same. The PCR primers and amplified DNA of this Example find use in the examples that follow.
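The selection criteria above (a minimum primer length and a matched G/C ratio within a pair) can be sketched in Python as follows. This is only an illustrative sketch: the Wallace rule used to estimate melting temperature, the 4 degree tolerance, and the two example primer sequences are assumptions introduced here and are not taken from the specification or from SEQ ID NO: 1.

```python
def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in a primer (0.0 to 1.0)."""
    seq = primer.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature in degrees C by the Wallace rule:
    2 degrees per A/T plus 4 degrees per G/C (an assumed estimator)."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def compatible_pair(fwd: str, rev: str, min_len: int = 18, max_tm_diff: int = 4) -> bool:
    """True if both primers meet the length floor and their Tm estimates are close.

    min_len=18 follows the 'more preferably at least 18 bases' guidance;
    max_tm_diff is a hypothetical tolerance chosen for this sketch.
    """
    if len(fwd) < min_len or len(rev) < min_len:
        return False
    return abs(wallace_tm(fwd) - wallace_tm(rev)) <= max_tm_diff


# Hypothetical primers, invented for illustration only.
fwd = "ATGGCTAAAGTTCCAGGT"
rev = "TTACCAGCTTTAGGAACA"
print(len(fwd), gc_fraction(fwd), wallace_tm(fwd))
print(len(rev), gc_fraction(rev), wallace_tm(rev))
print(compatible_pair(fwd, rev))
```

In practice a nearest-neighbor thermodynamic model would give better melting temperature estimates than the Wallace rule; the simple rule is used here only because it makes the link between G/C content and melting temperature explicit.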

EXAMPLE 4

[0269] Gene Expression from DNA Sequences Corresponding to ORFs

[0270] A fragment of the Mycoplasma genitalium genome provided in Tables 1a, 1b, 1c and 2 is introduced into an expression vector using conventional technology (techniques to transfer cloned sequences into expression vectors that direct protein translation in mammalian, yeast, insect or bacterial expression systems are well known in the art). Commercially available vectors and expression systems are available from a variety of suppliers including Stratagene (La Jolla, Calif.), Promega (Madison, Wis.), and Invitrogen (San Diego, Calif.). If desired, to enhance expression and facilitate proper protein folding, the codon context and codon pairing of the sequence may be optimized for the particular expression organism, as explained by Hatfield et al., U.S. Pat. No. 5,082,767, which is hereby incorporated by reference.

[0271] The following is provided as one exemplary method to generate polypeptide(s) from cloned ORFs of the Mycoplasma genome fragment. Since the ORF lacks a poly A sequence because of the bacterial origin of the ORF, this sequence can be added to the construct by, for example, splicing out the poly A sequence from pSG5 (Stratagene) using BglI and SalI restriction endonuclease enzymes and incorporating it into the mammalian expression vector pXT1 (Stratagene) for use in eukaryotic expression systems. pXT1 contains the LTRs and a portion of the gag gene from Moloney Murine Leukemia Virus. The position of the LTRs in the construct allows efficient stable transfection. The vector includes the Herpes Simplex thymidine kinase promoter and the selectable neomycin gene. The Mycoplasma DNA is obtained by PCR from the bacterial vector using oligonucleotide primers complementary to the Mycoplasma DNA and containing restriction endonuclease sequences for PstI incorporated into the 5' primer and BglII at the 5' end of the corresponding Mycoplasma DNA 3' primer, taking care to ensure that the Mycoplasma DNA is positioned such that it is followed by the poly A sequence. The purified fragment obtained from the resulting PCR reaction is digested with PstI, blunt ended with an exonuclease, digested with BglII, purified and ligated to pXT1, now containing a poly A sequence and digested with BglII.
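The primer design step described above, in which a PstI site is carried on the 5' primer and a BglII site on the 5' end of the 3' primer, can be sketched in Python as follows. The recognition sequences (CTGCAG for PstI, AGATCT for BglII) are the standard ones; the 18-base annealing length, the two-base 5' clamp, the helper names, and the example ORF are assumptions made for illustration and do not come from the specification.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

PSTI_SITE = "CTGCAG"   # PstI recognition sequence
BGLII_SITE = "AGATCT"  # BglII recognition sequence


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def has_internal_site(orf: str) -> bool:
    """True if the insert itself contains a PstI or BglII site,
    in which case digestion would also cut inside the amplified DNA."""
    orf = orf.upper()
    return PSTI_SITE in orf or BGLII_SITE in orf


def tailed_primers(orf: str, anneal_len: int = 18, clamp: str = "GC"):
    """Return (forward, reverse) primers with restriction sites on their 5' ends.

    anneal_len and clamp are hypothetical defaults for this sketch; the clamp
    is a short 5' extension often added so the enzyme can cut near a fragment end.
    """
    orf = orf.upper()
    forward = clamp + PSTI_SITE + orf[:anneal_len]
    reverse = clamp + BGLII_SITE + reverse_complement(orf[-anneal_len:])
    return forward, reverse


# Hypothetical 36-base ORF, invented for illustration only.
example_orf = "ATGAAAGCTTTAGGTCCAGATTTAGCTAAAGGTTAA"
fwd, rev = tailed_primers(example_orf)
print(fwd)
print(rev)
print(has_internal_site(example_orf))
```

Checking for internal sites before ordering primers matters because an ORF that already contains either recognition sequence would be fragmented during the digestion steps.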

[0272] The ligated product is transfected into mouse NIH 3T3 cells using Lipofectin (Life Technologies, Inc., Grand Island, N.Y.) under conditions outlined in the product specification. Positive transfectants are selected after growing the transfected cells in 600 μg/ml G418 (Sigma, St. Louis, Mo.). The protein is preferably released into the supernatant. However, if the protein has membrane binding domains, the protein may additionally be retained within the cell or expression may be restricted to the cell surface.

[0273] Since it may be necessary to purify and locate the transfected product, synthetic 15-mer peptides synthesized from the predicted Mycoplasma DNA sequence are injected into mice to generate antibody to the polypeptide encoded by the Mycoplasma DNA.
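One way to derive candidate 15-mer peptides from a predicted coding sequence is sketched below in Python. The sketch assumes the mycoplasma genetic code (NCBI translation table 4), in which TGA encodes tryptophan rather than a stop; the window step of 5 residues, the function names, and the test sequences are hypothetical choices for illustration, and the sketch makes no attempt to rank peptides by antigenicity.

```python
BASES = "TCAG"
# Amino acids of the standard genetic code for codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    a + b + c: STANDARD[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
CODON_TABLE["TGA"] = "W"  # assumption: mycoplasma code, TGA read as Trp


def translate(dna: str) -> str:
    """Translate a coding sequence from its first base until the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def peptide_windows(protein: str, size: int = 15, step: int = 5):
    """List peptides of length `size` taken every `step` residues along the protein."""
    return [protein[i:i + size] for i in range(0, len(protein) - size + 1, step)]


# Hypothetical inputs, invented for illustration only.
print(translate("ATGTGAAAATAA"))
print(peptide_windows("ACDEFGHIKLMNPQRSTVWY"))
```

Translating with the standard code instead would truncate any predicted protein at its first TGA codon, so the choice of codon table directly affects which peptides are synthesized.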

[0274] If antibody production is not possible, the Mycoplasma DNA sequence is additionally incorporated into eukaryotic expression vectors and expressed as a chimeric with, for example, β-globin. Antibody to β-globin is used to purify the chimeric. Corresponding protease cleavage sites engineered between the β-globin gene and the Mycoplasma DNA are then used to separate the two polypeptide fragments from one another after translation. One useful expression vector for generating β-globin chimerics is pSG5 (Stratagene). This vector encodes rabbit β-globin. Intron II of the rabbit β-globin gene facilitates splicing of the expressed transcript, and the polyadenylation signal incorporated into the construct increases the level of expression. These techniques as described are well known to those skilled in the art of molecular biology. Standard methods are published in methods texts such as Davis et al. and many of the methods are available from the technical assistance representatives from Stratagene, Life Technologies, Inc., or Promega. Polypeptide may additionally be produced from either construct using in vitro translation systems such as In vitro Express™ Translation Kit (Stratagene).

[0275] While the present invention has been described in some detail for purposes of clarity and understanding, one skilled in the art will appreciate that various changes in form and detail can be made without departing from the true scope of the invention.

[0276] All patents, patent applications and publications recited herein are hereby incorporated by reference.

TABLE 1(a)
UID	end5	end3	db_match	db_match name	per_id	per_sim	gene_len
MG006	8552	9181	SP: P00572	thymidylate kinase (CDC8) {Saccharomyces cerevisiae}	27.5862	51.7241	630
MG009	11252	12037	GB: D26185_102	hypothetical protein (GB: D26185_102) {Bacillus subtilis}	35.4331	55.1181	786
MG010	12069	12722	SP: P33655	DNA primase (dnaE) {Clostridium acetobutylicum}	25.731	53.2164	654
MG012	14247	13573	SP: P17116	ribosomal protein S6 modification protein (rimK) {Escherichia coli}	31.4961	54.3307	675
MG013	15217	14399	GB: D10588_1	5,10-methylene-tetrahydrofolate dehydrogenase (folD) {Escherichia coli}	33.0472	53.2189	819
MG015	17474	19240	SP: P27299	transport ATP-binding protein (msbA) {Escherichia coli}	32.2382	57.4949	1767
MG023	26478	27341	GB: M22039_4	fructose-bisphosphate aldolase (tsr) {Bacillus subtilis}	45.9649	65.9649	864
MG024	27345	28445	GP: U02423_1	GTP-binding protein (gtpl) {Escherichia coli}	46.8401	67.658	1101
MG032	36978	38975	GB: M63489_1	ATP-dependent nuclease (addA) {Bacillus subtilis}	26.8293	54.2683	1998
MG033	39242	39901	GB: M99611_2	glycerol uptake facilitator (glpF) {Bacillus subtilis}	35.8974	55.3846	660
MG034	40514	39876	GB: M97678_5	thymidine kinase (tdk) {Bacillus subtilis}	48.1283	69.5187	639
MG035	40543	41784	GB: U00011_2	histidyl-tRNA synthetase (hisS) {Mycobacterium leprae}	30.7107	50.7614	1242
MG038	46277	44754	GB: L19201_68	glycerol kinase (glpK) {Escherichia coli}	46.8254	70.2381	1524
MG039	47422	46271	PIR: S48379	glycerol-3-phosphate dehydrogenase (GUT2) {Saccharomyces cerevisiae}	43.2099	60.4938	1152
MG041	49377	49640	GB: L22432_2	phosphohistidinoprotein-hexose phosphotransferase (ptsH) {Mycoplasma capricolum}	48.8636	70.4545	264
MG042	50060	51517	GB: M64519_1	spermidine/putrescine transport ATP-binding protein (potA) {Escherichia coli}	41.9231	65.3846	1458
MG043	51525	52379	GB: M64519_2	spermidine/putrescine transport system permease protein (potB) {Escherichia coli}	26.5116	57.2093	855
MG044	52366	53217	GB: M64519_3	spermidine/putrescine transport system permease protein (potC) {Escherichia coli}	29.4574	58.1395	852
MG046	54658	55602	GB: M62364_1	sialoglycoprotease (gcp) {Pasteurella haemolytica}	36.6013	59.4771	945
MG048	58310	56973	SP: P37105	signal recognition particle protein (ffh) {Bacillus subtilis}	43.0206	66.1327	1338
MG049	58117	59076	GB: U14003_295	purine-nucleoside phosphorylase (deoD) {Escherichia coli}	44.7826	63.0435	960
MG050	59083	59751	GB: X13544_1	deoxyribose-phosphate aldolase (deoC) {Mycoplasma pneumoniae}	83.0357	91.5179	669
MG056	65731	64901	GB: D26185_99	hypothetical protein (GB: D26185_99) {Bacillus subtilis}	30.2583	54.6125	831
MG057	66249	65716	GB: D26185_104	hypothetical protein (GB: D26185_104) {Bacillus subtilis}	28.9017	28.9017	534
MG067	81047	82594	GB: D00730_1	glutamic acid specific protease (SPase) {Staphylococcus aureus}	28.8462	48.0769	1548
MG070	91065	91916	SP: P34831	ribosomal protein S2 (rpS2) {Spirulina platensis}	34.8	55.2	852
MG077	103104	104324	SP: P24138	oligopeptide transport system permease protein (oppB) {Bacillus subtilis}	28.0528	58.4158	1221
MG078	104320	105447	SP: P26904	oligopeptide transport system permease protein (dciAC) {Bacillus subtilis}	33.4572	55.0186	1128
MG079	105452	106657	SP: P18765	oligopeptide transport ATP-binding protein (amiE) {Streptococcus pneumoniae}	47.9412	67.9412	1206
MG081	109262	109672	SP: P29395	ribosomal protein L11 (RPL11) {Thermotoga maritima}	51.7986	71.9424	411
MG085	111790	112722	PIR: S24760	hydroxymethylglutaryl-CoA reductase (NADPH) {Nicotiana sylvestris}	23.3216	49.1166	933
MG086	112718	113863	GB: L13259_2	prolipoprotein diacylglyceryl transferase (lgt) {Salmonella typhimurium}	29.1262	53.8835	1146
MG091	117553	118032	GB: U04997_2	single-stranded DNA binding protein (ssb) {Haemophilus influenzae}	21.7949	41.6667	480
MG092	118025	118339	GB: U14003_114	ribosomal protein S18 (rpS18) {Escherichia coli}	45.4545	68.1818	315
MG093	118345	118794	GB: M57623_1	ribosomal protein L9 (rpL9) {Bacillus stearothermophilus}	32.8859	56.3758	450
MG099	125852	127282	GB: M61151_1	hydrolase (aux2) {Agrobacterium rhizogenes}	32.1212	51.8182	1431
MG106	134826	134149	SP: P27251	formylmethionine deformylase (def) {Escherichia coli}	36.9369	68.4685	678
MG107	134558	135334	GB: L10328_14	5'guanylate kinase (gmk) {Escherichia coli}	42.623	65.0273	777
MG114	141345	142052	GB: M12299_2	phosphatidylglycerophosphate synthase (pgsA) {Escherichia coli}	29.2994	57.3248	708
MG118	143935	144954	SP: P09147	UDP-glucose 4-epimerase (galE) {Escherichia coli}	34.0557	53.87	1020
MG121	148238	149155	SP: P32720	hypothetical protein (SP: P32720) {Escherichia coli}	30.8824	50.7353	918
MG125	153081	153935	GB: L10328_61	hypothetical protein (GB: L10328_61) {Escherichia coli}	31.9149	48.227	855
MG126	154962	153922	GB: M24068_1	tryptophanyl-tRNA synthetase (trpS) {Bacillus subtilis}	41.1585	61.5854	1041
MG127	154998	155432	SP: P19434	hypothetical protein (SP: P19434) {Streptomyces viridochromogenes}	25.9615	49.0385	435
MG128	155443	156219	GB: U00021_19	hypothetical protein (GB: U00021_19) {Mycobacterium leprae}	27.7027	49.3243	777
MG129	156222	156572	GB: U12340_1	PTS glucose-specific permease {Bacillus stearothermophilus}	25.4545	51.8182	351
MG130	156565	158016	GB: M91593_1	hypothetical protein (GB: M91593_1) {Mycoplasma mycoides}	30.6773	55.7769	1452
MG131	158022	158243	GB: M31161_3	hypothetical protein (GB: M31161_3) {Spiroplasma citri}	21.5909	56.8182	222
MG132	159005	158583	SP: P32083	hypothetical protein (SP: P32083) {Mycoplasma hyorhinis}	30.0971	56.3107	423
MG136	160964	162431	GB: D26185_144	lysyl-tRNA synthetase (lysS) {Bacillus subtilis}	45.6212	68.4318	1470
MG137	162376	163587	GP: L41518_4	dTDP-4-dehydrorhamnose reductase (rfbD) {Klebsiella pneumoniae}	32.1622	55.9459	1212
MG139	165470	167176	GB: L18927_2	hypothetical protein (GB: L18927_2) {Buchnera aphidicola}	28.5714	62.8571	1707
MG143	182853	183188	SP: P09170	hypothetical protein (SP: P09170) {Escherichia coli}	25	53.7037	336
MG145	184055	184861	GB: M35367_1	protein X {Pseudomonas fluorescens}	29.0698	48.4496	807
MG148	187304	188530	GB: L18965_6	hypothetical protein (GB: L18965_6) {Thermophilic bacterial sp.}	25.2874	52.8736	1227
MG150	190048	190365	SP: P38518	ribosomal protein S10 (rpS10) {Thermotoga maritima}	48.913	71.7391	318
MG152	191145	191777	SP: P28601	ribosomal protein L4 (rpL4) {Bacillus stearothermophilus}	39.2345	63.1579	633
MG153	191784	192101	SP: P04454	ribosomal protein L23 (rpL23) {Bacillus stearothermophilus}	38.7097	62.3656	318
MG154	192104	192958	SP: P04257	ribosomal protein L2 (rpL2) {Bacillus stearothermophilus}	58.7814	72.4014	855
MG155	192961	193221	GB: X02613_6	ribosomal protein S19 (rpS19) {Escherichia coli}	58.6207	77.0115	261
MG156	193227	193658	GB: M74770_4	ribosomal protein L22 (rpL22) {Mycoplasma-like organism}	49.0385	67.3077	432
MG157	193664	194467	SP: P02353	ribosomal protein S3 (rpS3) {Mycoplasma capricolum}	46.729	67.2897	804
MG158	194476	194889	SP: P02415	ribosomal protein L16 (rpL16) {Mycoplasma capricolum}	63.5037	78.1022	414
MG159	194892	195491	SP: P38514	ribosomal protein L29 (rpL29) {Thermotoga maritima}	41.6667	65	600
MG160	195494	195748	SP: P10131	ribosomal protein S17 (rpS17) {Mycoplasma capricolum}	51.1905	67.8571	255
MG161	195755	196120	SP: P04450	ribosomal protein L14 (rpL14) {Bacillus stearothermophilus}	63.1148	86.0656	366
MG162	196123	196446	SP: P04455	ribosomal protein L24 (rpL24) {Bacillus stearothermophilus}	44.5783	66.2651	324
MG163	196455	196994	SP: P08895	ribosomal protein L5 (rpL5) {Bacillus stearothermophilus}	57.5419	77.095	540
MG164	197000	197182	GB: X06414_15	ribosomal protein S14 (rpS14) {Mycoplasma capricolum}	70.4918	83.6066	183
MG165	197179	197601	SP: P04446	ribosomal protein S8 (rpS8) {Mycoplasma capricolum}	46.875	71.0938	423
MG166	197611	198162	SP: P04448	ribosomal protein L6 (rpL6) {Mycoplasma capricolum}	46.9945	66.6667	552
MG167	198167	198511	GB: M57624_1	ribosomal protein L18 (rpL18) {Bacillus stearothermophilus}	42.9825	57.8947	345
MG169	199160	199609	SP: P10138	ribosomal protein L15 (rpL15) {Mycoplasma capricolum}	41.8919	66.2162	450
MG170	199612	201036	SP: P10250	preprotein translocase secY subunit (secY) {Mycoplasma capricolum}	38.7892	68.1614	1425
MG171	201033	201674	GB: M88104_2	adenylate kinase (adk) {Bacillus stearothermophilus}	32.2115	57.6923	642
MG172	201680	202423	GB: D00619_5	methionine amino peptidase (map) {Bacillus subtilis}	36.2903	58.4677	744
MG173	202426	202635	GB: M26414_1	initiation factor 1 (infA) {Bacillus subtilis}	48.5294	67.6471	210
MG174	202649	202759	SP: P38015	ribosomal protein L36 (rpL36) {Chlamydia trachomatis}	78.3784	83.7838	111
MG177	203516	204499	GB: M26414_5	RNA polymerase alpha core subunit (rpoA) {Bacillus subtilis}	39.3939	65.9933	984
MG178	204515	204515	GB: M26414_6	ribosomal protein L17 (rpL17) {Bacillus subtilis}	34.7826	59.1304	369
MG179	204873	205694	SP: P11599	haemolysin secretion ATP-binding protein (hlyB) {Proteus vulgaris}	34.5992	62.0253	822
MG187	216762	218516	GB: M77351_7	ATP-binding protein (msmK) {Streptococcus mutans}	40.5325	65.6805	1755
MG188	218522	219508	GB: M77351_4	membrane protein (msmF) {Streptococcus mutans}	22.4719	51.6854	987
MG189	219435	220436	GB: M77351_5	membrane protein (msmG) {Streptococcus mutans}	27.1429	52.8571	1002
MG196	235635	236057	GB: X16188_1	translation initiation factor IF3 (infC) {Bacillus stearothermophilus}	31.3433	62.6866	423
MG197	236063	236239	PIR: S05347	ribosomal protein L35 (rpL35) {Bacillus stearothermophilus}	60	72.7273	177
MG198	236245	236616	SP: Q05427	ribosomal protein L20 (rpL20) {Mycoplasma fermentans}	57.5221	73.4513	372
MG201	239163	239813	GB: M84964_2	heat shock protein (grpE) {Bacillus subtilis}	31.677	49.6894	651
MG205	245596	244568	GB: M84964_1	hypothetical protein (GB: M84964_1) {Bacillus subtilis}	30.9942	58.1871	1029
MG213	252579	253991	GB: L09228_16	hypothetical protein (GB: L09228_16) {Bacillus subtilis}	27.1186	54.661	1413
MG214	253978	254598	GB: L09228_17	hypothetical protein (GB: L09228_17) {Bacillus subtilis}	34.8571	59.4286	621
MG215	254620	255588	SP: P20275	6-phosphofructokinase (pfk) {Spiroplasma citri}	39.441	63.0435	969
MG217	258040	259155	SP: P29126	bifunctional endo-1,4-beta-xylanase xyla precursor (xynA) {Ruminococcus flavefaciens}	37.5839	48.9933	1116
MG219	265596	266039	GB: M87491_1	IgA1 protease {Haemophilus influenzae}	32.2314	51.2397	444
MG220	266382	266077	GB: Z26883_1	pre-procytotoxin (vacA) {Helicobacter pylori}	36.1446	51.8072	306
MG222	267080	268006	GB: D10483_63	hypothetical protein (GB: D10483_63) {Escherichia coli}	35.1974	56.5789	927
MG224	269249	270355	GB: U06462_1	cell division protein (ftsZ) {Staphylococcus aureus}	30.8824	50.7353	1107
MG234	279491	279802	GB: K02665_2	ribosomal protein L27 (rpL27) {Bacillus subtilis}	64.3678	80.4598	312
MG235	279798	280670	SP: P12638	endonuclease IV (nfo) {Escherichia coli}	29.368	51.3011	873
MG245	293446	293940	GB: M12965_1	hypothetical protein (GB: M12965_1) {Escherichia coli}	33.8462	56.9231	495
MG247	295484	294768	SP: P31056	hypothetical protein (SP: P31056) {Escherichia coli}	32.973	56.2162	717
MG248	296127	295474	GP: U17284_2	major sigma factor (rpoD) {Listeria monocytogenes}	28.4848	51.5152	654
MG251	300802	299465	GB: L08106_1	glycyl-tRNA synthetase {Bombyx mori}	35.8974	56.1772	1338
MG252	301550	300825	GP: Z33076_2	rRNA methylase {Mycoplasma capricolum}	38.8626	59.7156	726
MG253	302839	301556	GB: D26185_156	cysteinyl-tRNA synthetase (cysS) {Bacillus subtilis}	34.3458	56.3084	1284
MG257	307635	307925	GB: L19201_78	ribosomal protein L31 (rpL31) {Escherichia coli}	37.3134	61.194	291
MG258	307928	309004	GB: M11519_1	peptide chain release factor 1 (RF-1) {Escherichia coli}	43.1677	66.4596	1077
MG259	309008	310375	GB: D28567_2	protoporphyrinogen oxidase (hemK) {Escherichia coli}	30.5732	54.1401	1368
MG260	310509	312803	GB: Z32651_1	hypothetical protein (GB: Z32651_1) {Mycoplasma pneumoniae}	57.1429	71.4286	2295
MG262	318330	319202	GB: L11920_1	DNA polymerase I (polI) {Mycobacterium tuberculosis}	29.9419	47.9651	873
MG264	321044	321637	GB: M64324_1	6-phosphogluconate dehydrogenase (gnd) {Escherichia coli}	29.8507	47.7612	594
MG265	322412	321579	GB: L10328_61	hypothetical protein (GB: L10328_61) {Escherichia coli}	27.193	48.6842	834
MG268	325877	325194	GB: U01881_2	deoxyguanosine/deoxyadenosine kinase(I) subunit 2 {Lactobacillus acidophilus}	29.5181	49.3976	684
MG270	328442	327435	GB: U14003_297	hypothetical protein (GB: U14003_297) {Escherichia coli}	38.2838	57.7558	1008
MG272	330984	329833	GB: M81753_3	dihydrolipoamide acetyltransferase (pdhC) {Acholeplasma laidlawii}	45.1524	62.0499	1152
MG273	332214	331237	GB: M81753_2	pyruvate dehydrogenase E1-beta subunit (pdhB) {Acholeplasma laidlawii}	55.0314	76.7296	978
MG274	333308	332235	GB: M81753_1	pyruvate dehydrogenase E1-alpha subunit (pdhA) {Acholeplasma laidlawii}	42.9825	61.1111	1074
MG277	338323	335414	GB: L16960_2	spore germination apparatus protein (gerBB) {Bacillus subtilis}	31.2	55.2	2910
MG280	341920	341177	GB: Z35086_1	sensory rhodopsin II transducer (htrII) {Natronobacterium pharaonis}	15.7143	46.6667	744
MG288	353034	351793	GB: L04466_1	protein L {Peptostreptococcus magnus}	31.1475	50.8197	1242
MG290	355119	355853	SP: P15361	ATP-binding protein P29 {Mycoplasma hyorhinis}	32.3009	58.8496	735
MG292	360592	357893	GB: J01581_1	alanyl-tRNA synthetase (alaS) {Escherichia coli}	33.8403	55.64	2700
MG295	364022	362922	SP: P25745	hypothetical protein (SP: P25745) {Escherichia coli}	34.7107	57.0248	1101
MG299	369694	368735	SP: P39646	phosphotransacetylase (pta) {Clostridium acetobutylicum}	44.6541	63.522	960
MG303	373998	372928	GB: M61017_1	membrane transport protein (glnQ) {Bacillus stearothermophilus}	31.982	54.955	1071
MG304	374741	373983	GB: U13043_1	membrane associated ATPase (cbiO) {Propionibacterium freudenreichii}	30.0448	53.8117	759
MG310	386462	387265	GB: D11037_1	proline iminopeptidase (pip) {Bacillus coagulans}	29.2079	51.4851	804
MG311	387892	387278	GB: M59358_1	ribosomal protein S4 (rpS4) {Bacillus subtilis}	43	65.5	615
MG313	392023	391397	GP: L38997_5	cytadherence-accessory protein (hmwl) {Mycoplasma pneumoniae}	53.8462	79.8077	627
MG315	394550	393660	GP: L38997_3	cytadherence accessory protein (hmwl) {Mycoplasma pneumoniae}	44.3878	69.898	891
MG316	395583	394477	GB: L15202_4	competence locus E (comE3) {Bacillus subtilis}	30.4933	52.4664	1107
MG322	405398	403725	GB: D17462_11	Na+ATPase subunit J (ntpJ) {Enterococcus hirae}	31.0811	56.3063	1674
MG323	405455	406135	GB: D37799_6	hypothetical protein (GB: D37799_6) {Bacillus subtilis}	27.5701	54.2056	681
MG325	408953	408795	SP: P23375	ribosomal protein L33 (rpL33) {Bacillus stearothermophilus}	58.1395	69.7674	159
MG326	409857	408973	GB: Z18629_1	hypothetical protein (GB: Z18629_1) {Bacillus subtilis}	27.0758	52.7076	885
MG329	414318	412975	GB: U00021_5	hypothetical protein (GB: U00021_5) {Mycobacterium leprae}	32.1839	54.2529	1344
MG332	416329	415613	GB: D10165_3	hypothetical protein (GB: D10165_3) {Escherichia coli}	26.9231	49.1453	717
MG346	443922	444419	GB: M65289_3	hypothetical protein (GB: M65289_3) {Bacillus stearothermophilus}	37.9747	60.1266	498
MG347	444413	445042	SP: P32049	hypothetical protein (SP: P32049) {Escherichia coli}	28.4615	46.9231	630
MG351	449665	450216	SP: P37981	inorganic pyrophosphatase (ppa) {Thermoplasma acidophilum}	38.8535	61.7834	552
MG355	453757	451616	GB: M29364_2	ATP-dependent protease binding subunit (clpB) {Escherichia coli}	47.7337	70.6799	2142
MG356	454753	453914	GB: M27280_1	lic-1 operon protein (licA) {Haemophilus influenzae}	27.7778	56.25	840
MG359	457347	458267	GB: M21298_2	Holliday junction DNA helicase (ruvB) {Escherichia coli}	34.6939	64.966	921
MG360	459495	458263	SP: P14303	UV protection protein (mucB) {Salmonella typhimurium}	22.0859	48.1595	1233
MG363	460497	460667	GB: M29698_2	ribosomal protein L32 (rpL32) {Escherichia coli}	48.1481	62.963	171
MG364	461015	461686	GB: M95954_1	mobilization protein (mobl3) {Leuconostoc oenos}	30.8725	53.6913	672
MG367	465434	464649	GB: X02673_1	ribonuclease III (rnc) {Escherichia coli}	30.1724	65.5172	786
MG380	478999	479574	GB: L10328_105	glucose inhibited division protein (gidB) {Escherichia coli}	24.8276	51.7241	576
MG382	480691	481329	SP: P31218	uridine kinase (udk) {Escherichia coli}	34.4828	62.5616	639
MG383	482075	481332	GB: M15811_1	sporulation protein (outB) {Bacillus subtilis}	36.3636	54.9784	744
MG384	483369	482071	GB: M24537_2	GTP-binding protein (obg) {Bacillus subtilis}	39.627	62.0047	1299
MG387	490711	489842	SP: P37214	GTP-binding protein era homolog (spg) {Streptococcus mutans}	27.3859	51.0373	870
MG396	500719	500264	GB: M80797_2	galactosidase acetyltransferase (lacA) {Streptococcus mutans}	40.5797	57.971	456
MG398	502823	502425	SP: P33255	ATP synthase epsilon chain (atpC) {Mycoplasma gallisepticum}	36.9231	55.3846	399
MG402	507201	506674	SP: P33254	ATP synthase delta chain (atpH) {Mycoplasma gallisepticum}	33.9181	58.4795	528
MG403	507820	507197	SP: P33256	ATP synthase B chain (atpF) {Mycoplasma gallisepticum}	36.5979	66.4948	624
MG404	508131	507826	SP: P33258	ATP synthase C chain (atpE) {Mycoplasma gallisepticum}	50	74.359	306
MG407	510836	509463	GB: L29475_4	enolase (eno) {Bacillus subtilis}	54.0793	74.1259	1374
MG408	510903	511373	SP: P14930	pilin repressor (pilB) {Neisseria gonorrhoeae}	49.2188	68.75	471
MG409	512050	511376	GB: L10328_88	peripheral membrane protein U (phoU) {Escherichia coli}	27.027	48.6486	675
MG420	524144	523365	GB: D26185_83	DNA polymerase III subunit (dnaH) {Bacillus subtilis}	49.115	68.5841	780
MG424	531479	531222	SP: P05766	ribosomal protein S15 (BS18) {Bacillus stearothermophilus}	48.1481	71.6049	258
MG426	533040	533231	GB: L12244_2	ribosomal protein L28 (rpL28) {Bacillus subtilis}	36.0656	59.0164	192
MG429	536036	534321	GB: M69050_2	PEP-dependent HPr protein kinase phosphoryltransferase (ptsI) {Staphylococcus carnosus}	46.4789	66.5493	1716
MG430	537563	536043	GB: L29475_3	phosphoglycerate mutase (pgm) {Bacillus subtilis}	45.1866	62.4754	1521
MG432	539546	538353	SP: P27712	hypothetical protein (SP: P27712) {Spiroplasma citri}	28.436	48.8152	1194
MG433	539632	540525	GB: M31161_2	elongation factor Ts (tsf) {Spiroplasma citri}	39.0572	62.6263	894
MG434	540848	541237	GB: D26562_56	mukB suppressor protein (smbA) {Escherichia coli}	40.8696	61.7391	390
MG435	541240	541788	GB: D26562_57	ribosome releasing factor (frr) {Escherichia coli}	34.9112	57.3965	549
MG438	543004	544152	GB: J01631_1	restriction-modification enzyme EcoD specificity subunit (hsdS) {Escherichia coli}	24.5734	45.7338	1149
MG442	547690	546881	GB: U00021_5	hypothetical protein (GB: U00021_5) {Mycobacterium leprae}	26.8966	42.069	810
MG443	548849	547665	GB: D16311_1	hypothetical protein (GB: D16311_1) {Bacillus subtilis}	26.1818	52	1185
MG444	549224	548868	SP: P30529	ribosomal protein L19 (rpL19) {Bacillus stearothermophilus}	49.1071	69.6429	357
MG445	549903	549211	SP: P36245	tRNA (guanine-N1)-methyltransferase (trmD) {Salmonella typhimurium}	40.8072	64.1256	693
MG446	550172	549906	SP: P21474	ribosomal protein S16 (BS17) {Bacillus subtilis}	48.7805	64.6341	267
MG448	552897	552448	GB: Z33052_1	pilin repressor (pilB) {Mycoplasma capricolum}	53.4884	72.093	450
MG454	557770	557306	SP: P23929	osmotically inducible protein (osmC) {Escherichia coli}	28.4091	51.1364	465
MG457	562602	560497	GB: D26185_132	cell division protein (ftsH) {Bacillus subtilis}	49.7445	68.1431	2106
MG461	566203	564929	GB: X73124_94	hypothetical protein (GB: X73124_94) {Bacillus subtilis}	40	64.2857	1275
MG464	569554	568400	GB: D14982_3	hypothetical protein (GB: D14982_3) {Mycoplasma capricolum}	32.3699	53.7572	1155
MG465	569912	569529	GB: D14982_2	RNaseP C5 subunit (rnpA) {Mycoplasma capricolum}	40	58.75	384
MG466	570027	569884	GB: L10328_67	ribosomal protein L34 (rpL34) {Escherichia coli}	67.3913	80.4348	144
MG470	580030	579224	GB: D26185_55	SpoOJ regulator {Bacillus subtilis}	27.8884	53.3865	807
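The relationship between the end5, end3 and gene_len columns of Table 1(a) can be illustrated with the short Python sketch below. It assumes, as an interpretation not stated in the table itself, that a gene with end5 greater than end3 lies on the reverse strand and that gene_len counts both end coordinates. The function names are hypothetical; the example rows are copied from the table.

```python
def gene_span(end5: int, end3: int):
    """Return (length, strand) for a gene given its 5' and 3' end coordinates.

    Assumption: coordinates are inclusive, and end5 > end3 marks the reverse strand.
    """
    length = abs(end3 - end5) + 1
    strand = "+" if end3 >= end5 else "-"
    return length, strand


def inconsistent_rows(rows):
    """Return the UIDs whose listed gene_len differs from the computed span.

    Rows are flagged rather than corrected, since the listed values are the record.
    """
    return [uid for uid, end5, end3, listed in rows if gene_span(end5, end3)[0] != listed]


# (UID, end5, end3, gene_len) as listed in Table 1(a).
rows = [
    ("MG006", 8552, 9181, 630),
    ("MG012", 14247, 13573, 675),
    ("MG038", 46277, 44754, 1524),
    ("MG178", 204515, 204515, 369),
]
for uid, end5, end3, listed in rows:
    print(uid, gene_span(end5, end3), listed)
print(inconsistent_rows(rows))
```

A check of this kind is a quick way to spot rows in which a coordinate was garbled in transcription, such as an entry whose two end coordinates are identical.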

[0277]

TABLE 1(b)
UID	end5	end3	db_match	db_match name	per_sim	per_id	match_info
MG002	1829	2758	SP: P35514	heat shock protein (dnaJ) {Lactococcus lactis}	40	61.6667	MG002(1-930 of 930) GB: U09251(298-1227 of 6140)
MG003	2846	4795	GB: U09251_3	DNA gyrase subunit B (gyrB) {Mycoplasma genitalium}	99.3846	99.3846	MG003(1-1950 of 1950) GB: U09251(1315-3264 of 6140)
MG004	4813	7320	GB: U09251_4	DNA gyrase subunit A (gyrA) {Mycoplasma genitalium}	99.8804	99.8804	MG004(1-2508 of 2508) GB: U09251(3282-5789 of 6140)
MG191	221571	225902	SP: P20796	attachment protein, MgPa operon (mgp) {Mycoplasma genitalium}	100	100	MG191(1-4332 of 4332) GB: M31431(1066-5397 of 8760)
MG192	225907	229062	SP: P22747	114 kDa protein, MgPa operon (mgp) {Mycoplasma genitalium}	100	100	MG192(1-3156 of 3156) GB: M31431(5402-8557 of 8760)
MG232	278904	279203	SP: P26908	ribosomal protein L21 (rpL21) {Bacillus subtilis}	37.8947	65.2632	MG232(1-300 of 300) GB: U02141(138-437 of 827)
MG233	279199	279495	GP: U02141_2	ribosomal protein L21 homolog {Mycoplasma genitalium}	100	100	MG233(1-297 of 297) GB: U02141(433-729 of 827)
MG287	348882	349133	SP: P04686	nodulation protein F (nodE) {Rhizobium leguminosarum}	34.9398	56.6265	MG287(1-252 of 252) GB: U01810(152-403 of 917)
MG417	521868	521473	SP: P07842	ribosomal protein S9 (rpS9) {Bacillus stearothermophilus}	51.9685	71.6535	MG417(1-396 of 396) GB: U01744(127-522 of 620)

[0278]

TABLE 1(c)
UID	end5	end3	db_match	db_match name	per_sim	per_id	match_info
MG001	1026	1826	GB: U09251_1	DNA polymerase III beta subunit (dnaN) {Mycoplasma genitalium}	100	100	MG001(507-801 of 801) GB: U09251(1-295 of 6140)
MG005	7295	8545	GB: D26185_77	seryl-tRNA synthetase (serS) {Bacillus subtilis}	42.615	66.3438	MG005(1-377 of 1251) GB: U09251(5764-6140 of 6140)
MG005	7295	8545	GB: D26185_77	seryl-tRNA synthetase (serS) {Bacillus subtilis}	42.615	66.3438	MG005(16-337 of 1251) GB: U02210(1-322 of 322)
MG007	9157	9918	GB: D26185_83	DNA polymerase III subunit (dnaH) {Bacillus subtilis}	22.695	45.3901	MG007(762-711 of 762) GB: U02216(270-321 of 321)
MG008	9924	11249	GB: D26185_60	thiophene and furan oxidizer (tdhF) {Bacillus subtilis}	31.9101	59.7753	MG008(264-1 of 1326) GB: U02216(1-264 of 321)
MG011	13565	12705	--	--	--	--	MG011(473-767 of 861) GB: U02257(2-296 of 296)
MG014	15556	17424	SP: P27299	transport ATP-binding protein (msbA) {Escherichia coli}	28.0702	52.6316	MG014(1005-678 of 1869) GB: U02235(1-326 of 326)
MG018	21063	22343	SP: P32333	helicase (mot1) {Saccharomyces cerevisiae}	36.6972	60.0917	MG018(1281-1067 of 1281) GB: U01723(89-304 of 304)
MG018	21063	22343	SP: P32333	helicase (mot1) {Saccharomyces cerevisiae}	36.6972	60.0917	MG018(409-105 of 1281) GB: U02179(1-305 of 305)
MG018	21063	22343	SP: P32333	helicase (mot1) {Saccharomyces cerevisiae}	36.6972	60.0917	MG018(592-896 of 1281) GB: U01757(1-305 of 305)
MG019	22388	23554	SP: P35514	heat shock protein (dnaJ) {Lactococcus lactis}	33.9779	51.105	MG019(44-1 of 1167) GB: U01723(1-44 of 304)
MG020	23541	24464	GB: Z25461_2	proline iminopeptidase (pip) {Neisseria gonorrhoeae}	37.5439	55.7895	MG020(723-924 of 924) GB: U02229(1-202 of 333)
MG021	24467	26002	GB: D26185_101	methionyl-tRNA synthetase (metS) {Bacillus subtilis}	37.5494	58.8933	MG021(1-129 of 1536) GB: U02229(205-333 of 333)
MG021	24467	26002	GB: D26185_101	methionyl-tRNA synthetase (metS) {Bacillus subtilis}	37.5494	58.8933	MG021(1318-1527 of 1536) GB: X61513(1-209 of 209)
MG022	26035	26469	GB: M21677_1	RNA polymerase delta subunit (rpoE) {Bacillus subtilis}	28.6765	49.2647	MG022(254-1 of 435) GB: U01721(1-254 of 299)
MG025	28651	29544	GP: Z47767_4	TrsB {Yersinia enterocolitica}	27.551	54.0816	MG025(514-894 of 894) GB: U02253(1-381 of 649)
MG026	29551	30120	GB: U14003_62	elongation factor P (efp) {Escherichia coli}	26.3804	47.2393	MG026(1-262 of 570) GB: U02253(388-649 of 649)
MG029	31702	31145	GB: L19300_1	hypothetical protein (GB: L19300_1) {Staphylococcus aureus}	27.027	45.045	MG029(1-93 of 558) GB: U01773(210-302 of 302)
MG030	32324	31707	GB: Z27121_3	uracil phosphoribosyltransferase (upp) {Mycoplasma hominis}	44.9275	66.6667	MG030(414-618 of 618) GB: U01773(1-205 of 302)
MG031	36713	32361	GB: U06833_1	DNA polymerase III (polC) {Mycoplasma pulmonis}	38.0303	59.3182	MG031(1473-1701 of 4353) GB: U01807(1-229 of 229)
MG031	36713	32361	GB: U06833_1	DNA polymerase III (polC) {Mycoplasma pulmonis}	38.0303	59.3182	MG031(2923-3309 of 4353) GB: U01712(1-387 of 387)
MG031	36713	32361	GB: U06833_1	DNA polymerase III (polC) {Mycoplasma pulmonis}	38.0303	59.3182	MG031(3330-3676 of 4353) GB: U02208(1-347 of 347)
MG036	41777	43426	SP: P36419	aspartyl-tRNA synthetase (aspS) {Thermus aquaticus}	40.8582	62.8731	MG036(1115-1650 of 1650) GB: U01814(1-532 of 1006)
MG036	41777	43426	SP: P36419	aspartyl-tRNA synthetase (aspS) {Thermus aquaticus}	40.8582	62.8731	MG036(1407-1638 of 1650) GB: X61511(1-232 of 232)
MG036	41777	43426	SP: P36419	aspartyl-tRNA synthetase (aspS) {Thermus aquaticus}	40.8582	62.8731	MG036(1412-1160 of 1650) GB: X61523(1-252 of 252)
MG037	43402	44751	GP: U02020_1	pre-B cell enhancing factor (PBEF) {Homo sapiens}	34.3164	52.2788	MG037(1-500 of 1350) GB: U01814(508-1006 of 1006)
MG040	47581	49353	SP: P29724	membrane lipoprotein (tmpC) {Treponema pallidum}	30.8594	48.0469	MG040(1341-1552 of 1773) GB: U02125(1-212 of 212)
MG045	53205	54653	--	--	--	--	MG045(381-4 of 1449) GB: U02166(1-378 of 378)
MG047	55589	56737	SP: P30869	S-adenosylmethionine synthetase 2 (metX) {Escherichia coli}	43.6111	60.5556	MG047(787-1070 of 1149) GB: U02123(1-284 of 284)
MG051	59741	61003	GB: L13289_3	thymidine phosphorylase (deoA) {Mycoplasma pirum}	52.7316	73.6342	MG051(1161-1263 of 1263) GB: U02191(1-103 of 183)
MG052	61015	61404	GB: L13289_4	cytidine deaminase (cdd) {Mycoplasma pirum}	38.2114	64.2276	MG052(1-69 of 390) GB: U02191(115-183 of 183)
MG052	61015	61404	GB: L13289_4	cytidine deaminase (cdd) {Mycoplasma pirum}	38.2114	64.2276	MG052(320-390 of 390) GB: U02108(1-71 of 212)
MG053	61407	63056	GB: L13289_5	phosphomannomutase (cpsG) {Mycoplasma pirum}	38.7868	58.0882	MG053(1-140 of 1650) GB: U02108(74-212 of 212)
MG054	63986	63039	GB: D13303_4	transcription antitermination factor (nusG) {Bacillus subtilis}	30.8571	51.4286	MG054(688-44 of 948) GB: U01710(1-645 of 645)
MG054	63986	63039	GB: D13303_4	transcription antitermination factor (nusG) {Bacillus subtilis}	30.8571	51.4286	MG054(948-719 of 948) GB: U02236(45-274 of 276)
MG055	64361	63993	--	--	--	--	MG055(1-326 of 369) GB: U02240(23-348 of 348)
MG058	67121	66231	GB: D26185_114	phosphoribosylpyrophosphate synthetase (prs) {Bacillus subtilis}	44.4089	63.5783	MG058(72-1 of 891) GB: U01693(1-72 of 350)
MG059	67644	67210	GB: D12501_1	small protein (smpB) {Escherichia coli}	32.5581	62.0155	MG059(435-247 of 435) GB: U01693(161-350 of 350)
MG060	67651	68541	SP: P26401	lipopolysaccharide biosynthesis protein (rfbV) {Salmonella typhimurium}	36.0656	59.8361	MG060(723-396 of 891) GB: U02262(1-328 of 328)
MG061	69908	68526	GB: M89480_4	hexosephosphate transport protein (uhpT) {Salmonella typhimurium}	30.9091	57.2727	MG061(1273-613 of 1383) GB: U01705(1-661 of 661)
MG062	70531	72570	SP: P20966	fructose-permease IIBC component (fruA) {Escherichia coli}	42.723	60.5634	MG062(439-761 of 2040) GB: U02138(1-323 of 323)
MG063 72668 73432 SP: P23539 1-phosphofructokinase (fruK) 
{Escherichia coli} 26.3158 51.5038 MG063(363- 626 of 765) GB: U01777(1- 264 of 264) MG065 77686 79083 GB: X75422_1 heterocyst maturation protein (devA) {Anabaena sp.} 35.2941 59.7285 MG065(1398- 1176 of 1398) GB: U02154(133- 354 of 354) MG066 79090 81033 SP: P27302 transketolase 1 (TK 1) (tktA) 32.5617 54.9383 MG066(126- {Escherichia coli} 1 of 1944) GB: U02154(1- 126 of 354) MG068 82621 84042 -- -- -- -- MG068(1244- 919 of 1422) GB: U02162(1- 326 of 326) MG069 88228 90951 SP: P20166 phosphotransferase enzyme II, ABC component (ptsG) 43.1596 61.0749 MG069(1127- {Bacillus subtilis} 849 of 2724) GB: U02207(l- 279 of 279) MG071 91924 94545 SP: P37278 cation-transporting ATPase (pacL) 34.3897 57.277 MG071(1470- {Synechococcus sp.} 1209 of 2622) GB: X61532(1- 262 of 262) MG072 94535 96952 GB: D10279_2 preprotein translocase (secA) {Bacillus subtilis} 43.6601 66.7974 MG072(2269- 2418 of 2418) GB: U01743(1- 150 of 365) MG073 96933 98900 SP: P07025 excinuclease ABC subunit B (uvrB) 47.9751 67.2897 MG073(1- {Escherichia coli} 235 of 1968) GB: U01743(131- 365 of 365) MG073 96933 98900 SP: P07025 excinuclease ABC subunit B (uvrB) 47.9751 67.2897 MG073(1584- {Escherichia coli} 1240 of 1968) GB: U01698(1- 345 of 345) MG073 96933 98900 SP: P07025 excinuclease ABC subunit B (uvrB) 47.9751 67.2897 MG073(305- {Escherichia coli} 694 of 1968) GB: U02119(1- 391 of 391) MG074 98906 99316 -- -- -- -- MG074(369- 411 of 411) GB: U01715(1- 43 of 576) MG075 99383 102454 -- -- -- -- MG075(1- 467 of 3072) GB: U01715(110- 576 of 576) MG075 99383 102454 -- -- -- -- MG075(1206- 804 of 3072) GB: U02251(1- 403 of 403) MG075 99383 102454 -- -- -- -- MG075(1927- 2210 of 3072) GB: U01749(1- 284 of 284) MG075 99383 102454 -- -- -- -- MG075(2841- 2422 of 3072) GB: U01775(1- 420 of 420) MG080 106660 109203 SP: P18766 oligopeptide transport ATP-binding protein (amiF) 46.6403 67.1937 MG080(2268- {Streptococcus pneumoniae} 1954 of 2544) GB: U02129(1- 315 of 315) MG080 106660 109203 SP: P18766 oligopeptide 
transport ATP-binding protein (amiF) 46.6403 67.1937 MG080(951- {Streptococcus pneumoniae} 646 of 2544) GB: U01758(1- 306 of 306) MG082 109675 110352 SP: P04447 ribosomal protein L1 (rpL1) 48.1982 67.5676 MG082(446- {Bacillus stearothermophilus} 170 of 678) GB: U02113(1- 278 of 278) MG083 110355 110921 GB: L32144_1 peptidyl-tRNA hydrolase homolog (pth) 38.2166 57.3248 MG083(567- {Borrelia burgdorferi} 220 of 567) GB: U02185(26- 373 of 373) MG084 110917 111786 SP: P37563 hypothetical protein (SP: P37563) 28.125 46.3542 MG084(30- {Bacillus subtilis} 1 of 870) GB: U02185(1- 30 of 373) MG084 110917 111786 SP: P37563 hypothetical protein (SP: P37563) {Bacillus subtilis} 28.125 46.3542 MG084(794- 870 of 870) GB: U01783(1- 77 of 269) MG087 113895 114311 SP: P09901 ribosomal protein S12 (rpS12) 75.3731 82.0896 MG087(417- {Bacillus stearothermophilus} 349 of 417) GB: U02212(326- 394 of 394) MG088 114331 114795 SP: P22744 ribosomal protein S7 (rpS7) 64.9351 81.1688 MG088(305- {Bacillus stearothermophilus} 1 of 465) GB: U02212(2- 306 of 394) MG089 114808 116871 SP: P13551 elongation factor G (fus) 59.2105 78.0702 MG089(1878- {Thermus aquaticus} 1540 of 2064) GB: U02180(1- 339 of 340) MG089 114808 116871 SP: P13551 elongation factor G (fus) {Thermus aquaticus} 59.2105 78.0702 MG089(1885- 2064 of 2064) GB: U02136(1- 180 of 410) MG089 114808 116871 SP: P13551 elongation factor G (fus) {Thermus aquaticus} 59.2105 78.0702 MG089(687- 1374 of 2064) GB: U01722(1- 688 of 688) MG090 116926 117549 SP: P02358 ribosomal protein S6 (rpS6) {Escherichia coli} 23.8636 44.3182 MG090(1- 176 of 624) GB: U02136(235- 410 of 410) MG094 118847 120184 SP: P03005 replicative DNA helicase (dnaB) 33.105 55.0228 MG094(1068- {Escherichia coli} 731 of 1338) GB: U01803(1- 336 of 336) MG094 118847 120184 SP: P03005 replicative DNA helicase (dnaB) 33.105 55.0228 MG094(228- {Escherichia coli} 1 of 1338) GB: U02158(1- 228 of 301) MG095 120191 121384 -- -- -- -- MG095(355- 759 of 1194) GB: U01787(1- 403 of 403) 
MG096 121939 123519 -- -- -- -- MG096(1- 309 of 1581) GB: U01713(58- 366 of 366) MG096 121939 123519 -- -- -- -- MG096(361- 531 of 1581) GB: U01762(1- 171 of 171) MG097 123579 124313 GB: D13169_3 uracil DNA glycosylase (ung) 32.5688 51.8349 MG097(220- {Escherichia coli} 694 of 735) GB: U02201(1- 475 of 475) MG098 124416 125846 GP: M74170_2 p48 eggshell protein (p48) {Schistosoma mansoni} 23.0769 47.9853 MG098(1260- 831 of 1431) GB: U01782(1- 431 of 431) MG098 124416 125846 GP: M74170_2 p48 eggshell protein (p48) {Schistosoma mansoni} 23.0769 47.9853 MG098(134- 467 of 1431) GB: U01701(1- 334 of 334) MG100 127278 128708 GP: L22072_1 PET112 protein {Saccharomyces cerevisiae} 30.8696 54.1304 MG100(533- 238 of 1431) GB: U01799(1- 296 of 296) MG101 128686 129351 -- -- -- -- MG101(89- 398 of 666) GB: U02103(1- 309 of 309) MG102 129347 130291 GB: J03762_1 thioredoxin reductase (trxB) 38.5906 59.396 MG102(45- {Escherichia coli} 367 of 945) GB: U02197(1- 322 of 322) MG103 130284 131123 -- -- -- -- MG103(623- 256 of 840) GB: U02170(1- 368 of 369) MG104 131384 133558 GB: U14003_91 virulence associated protein homolog (vacB) 29.2335 52.2282 MG104(215- {Escherichia coli} 491 of 2175) GB: U01795(1- 277 of 277) MG108 135337 136116 SP: P35182 protein phosphatase 2C homolog (ptc1) 27.5362 52.1739 MG108(780- {Saccharomyces cerevisiae} 598 of 780) GB: U02111(33- 215 of 215) MG109 136179 137264 PIR: S36944 protein serine/threonine kinase {Arabidopsis thaliana} 33.7398 52.0325 MG109(425- 786 of 1086) GB: U01720(1- 362 of 362) MG109 136179 137264 PIR: S36944 protein serine/threonine kinase {Arabidopsis thaliana} 33.7398 52.0325 MG109(781- 1084 of 1086) GB: U01748(1- 303 of 303) MG110 137380 138087 GB: U14003_76 hypothetical protein (GB: U14003_76) 28.5714 54.1126 MG110(140- {Escherichia coli} 242 of 708) GB: X61518(1- 102 of 102) MG110 137380 138087 GB: U14003_76 hypothetical protein (GB: U14003_76) 28.5714 54.1126 MG110(670- {Escherichia coli} 378 of 708) GB: U01714(1- 293 of 293) MG111 
138105 139403 SP: P13376 phosphoglucose isomerase B (pgiB) 34.8235 53.6471 MG111(1- {Bacillus stearothermophilus} 98 of 1299) GB: U01747(38- 135 of 135) MG112 139396 140022 GB: M64173_3 D-ribulose-5-phosphate 3 epimerase (cfxEc) 33.1361 53.8462 MG112(207- {Alcaligenes eutrophus} 473 of 627) GB: U02181(1- 267 of 267) MG113 140039 141406 GB: M33145_1 asparaginyl-tRNA synthetase (asnS) {Escherichia coli} 41.4579 64.2369 MG113(1231- 941 of 1368) GB: U01692(1- 291 of 291) MG115 142314 142550 SP: P31131 hypothetical protein (SP: P31131) {Escherichia coli} 32.6087 50 MG115(198- 237 of 237) GB: U02127(1- 40 of 234) MG116 142562 143314 -- -- -- -- MG116(1- 183 of 753) GB: U02127(52- 234 of 234) MG119 144972 146663 GB: M59444_2 methylgalactoside permease ATP-binding protein (mglA) 33.1984 57.6923 MG119(1660- {Escherichia coli} 1692 of 1692) GB: U02147(1- 33 of 301) MG119 144972 146663 GB: M59444_2 methylgalactoside permease ATP-binding protein (mglA) 33.1984 57.6923 MG119(192- {Escherichia coli} 1 of 1692) GB: U02149(1- 192 of 681) MG120 146673 148232 SP: P36948 ribose transport system permease protein (rbsC) 27.4809 51.9084 MG120(1- {Bacillus subtilis} 259 of 1560) GB: U02147(43- 301 of 301) MG122 149198 151324 GB: L27797_2 DNA topoisomerase I (topA) {Bacillus subtilis} 38.9222 59.7305 MG122(1193- 1443 of 2127) GB: U02134(1- 251 of 251) MG122 149198 151324 GB:

L27797_2 DNA topoisomerase I (topA) {Bacillus subtilis} 38.9222 59.7305 MG122(1578- 1971 of 2127) GB: U02242(1- 394 of 394) MG123 151305 152717 GB: M91593_1 hypothetical protein (GB: M91593_1) 23.9837 50.4065 MG123(1413- {Mycoplasma mycoides} 1236 of 1413) GB: U01796(114- 291 of 291) MG124 152767 153072 GB: J03294_1 thioredoxin (trx) {Bacillus subtilis} 36.0825 65.9794 MG124(64- 1 of 306) GB: U01796(1- 64 of 291) MG133 159669 158986 -- -- -- -- MG133(1- 110 of 684) GB: U02144(237- 345 of 345) MG133 159669 158986 -- -- -- -- MG133(435- 673 of 684) GB: X61537(1- 238 of 238) MG134 159797 160096 GB: M38777_3 hypothetical protein (GB: M38777_3) 28.5714 57.1429 MG134(109- {Escherichia coli} 1 of 300) GB: U02144(1- 109 of 345) MG135 160913 160074 PIR: E22845 hypothetical protein 4 (GP: Z33006_1) 30.7692 55.9441 MG135(485- {Trypanosoma brucei} 782 of 840) GB: U02114(1- 298 of 298) MG138 163590 165383 GB: K00426_1 GTP-binding membrane protein (lepA) 47.5465 70.5584 MG138(1237- {Escherichia coli} 938 of 1794) GB: U02133(2- 301 of 301) MG138 163590 165383 GB: K00426_1 GTP-binding membrane protein (lepA) 47.5465 70.5584 MG138(1318- {Escherichia coli} 1794 of 1794) GB: U01745(1- 477 of 524) MG138 163590 165383 GB: K00426_1 GTP-binding membrane protein (lepA) 47.5465 70.5584 MG138(323- {Escherichia coli} 591 of 1794) GB: X61521(1- 269 of 269) MG140 175807 179145 -- -- -- -- MG140(1- 41 of 3339) GB: U02110(178- 218 of 218) MG140 175807 179145 -- -- -- -- MG140(2727- 2429 of 3339) GB: U01730(1- 297 of 297) MG140 175807 179145 -- -- -- -- MG140(3302- 2994 of 3339) GB: U02156(1- 308 of 308) MG140 175807 179145 -- -- -- -- MG140(382- 834 of 3339) GB: U01729(1- 454 of 454) MG140 175807 179145 -- -- -- -- MG140(834- 616 of 3339) GB: X61512(1- 220 of 220) MG140 175807 179145 -- -- -- -- MG140(880- 1182 of 3339) GB: U01742(1- 303 of 303) MG141 179153 180745 SP: P32727 N-utilization substance protein A homolog (nusA) 30.8743 53.8251 MG141(223- {Bacillus subtilis} 871 of 1593) GB: 
U01778(1- 652 of 652) MG142 181007 182863 GB: M34836_1 protein synthesis initiation factor 2 (infB) 46.0292 64.6677 MG142(265- {Bacillus subtilis} 393 of 1857) GB: U01765(1- 129 of 129) MG144 183216 184052 -- -- -- -- MG144(190- 420 of 837) GB: U02121(1- 231 of 231) MG146 184877 186148 GB: X73141_2 hemolysin (tlyC) {Serpulina hyodysenteriae} 26.2712 52.1186 MG146(1272- 1174 of 1272) GB: U02223(19- 117 of 117) MG149 188609 189451 -- -- -- -- MG149(843- 765 of 843) GB: U02135(182- 260 of 260) MG151 190372 191142 SP: P10134 ribosomal protein L3 (rpL3) {Mycoplasma capricolum} 42.5926 61.5741 MG151(528- 1 of 771) GB: U02153(1- 527 of 543) MG168 198519 199151 GB: M57621_1 ribosomal protein S5 (rpS5) 55.9748 72.327 MG168(505- {Bacillus stearothermophilus} 633 of 633) GB: U01726(1- 129 of 260) MG175 202762 203133 GB: M26414_3 ribosomal protein S13 (rpS13) {Bacillus subtilis} 63.3333 82.5 MG175(22- 372 of 372) GB: U01733(1- 351 of 600) MG176 203136 203528 GB: X02543_2 ribosomal protein S11 (rpS11) {Escherichia coli} 47.7876 69.9115 MG176(1- 247 of 393) GB: U01733(354- 600 of 600) MG180 205682 206593 GB: M61017_1 membrane transport protein (glnQ) 37.3832 63.0841 MG180(249- {Bacillus stearothermophilus} 1 of 912) GB: U01754(1- 248 of 265) MG180 205682 206593 GB: M61017_1 membrane transport protein (glnQ) 37.3832 63.0841 MG180(912- {Bacillus stearothermophilus} 784 of 912) GB: U01750(167- 295 of 295) MG181 206589 207848 -- -- -- -- MG181(171- 1 of 1260) GB: U01750(1- 171 of 295) MG182 207844 208575 SP: P07649 pseudouridylate synthase I (hisT) {Escherichia coli} 27.0042 45.1477 MG182(1- 308 of 732) GB: U02176(70- 377 of 377) MG182 207844 208575 SP: P07649 pseudouridylate synthase I (hisT) {Escherichia coli} 27.0042 45.1477 MG182(732- 383 of 732) GB: U02100(31- 380 of 380) MG183 208568 210388 GB: Z32522_1 oligoendopeptidase F (pepF) {Lactococcus lactis} 30 50.6667 MG183(27- 335 of 1821) GB: U02198(1- 309 of 309) MG183 208568 210388 GB: Z32522_1 oligoendopeptidase F (pepF) 
{Lactococcus lactis} 30 50.6667 MG183(38- 1 of 1821) GB: U02100(1- 38 of 380) MG184 210392 211342 GB: M97479_2 methyltransferase (ssoIM) {Shigella sonnei} 42.5249 67.4419 MG184(520- 719 of 951) GB: U02115(1- 200 of 201) MG190 220479 221561 PIR.JS0068 29 kDa protein, MgPa operon (mgp) 62.0833 82.0833 MG190(28- {Mycoplasma genitalium} 1083 of 1083) GB: M31431(1- 1056 of 8760) MG194 232007 233029 GB: V00291_5 phenylalanyl-tRNA synthetase beta-subunit (pheS) 35.0769 56.3077 MG194(194- {Escherichia coli} 359 of 1023) GB: U02120(1- 166 of 166) MG195 233036 235453 SP: P17922 phenylalanyl-tRNA synthetase beta chain (pheT) 25.4597 49.0806 MG195(2044- {Bacillus subtilis} 2396 of 2418) GB: U02173(1- 353 of 353) MG200 237346 239148 GB: L36455_1 heat shock protein (dnaJ) {Coxiella burnetii} 33.5938 51.5625 MG200(842- 1227 of 1803) GB: U02163(2- 387 of 387) MG203 240322 242220 GB: U25549_1 topoisomerase IV subunit B (parE) 100 100 MG203(1216- {Mycoplasma genitalium} 1899 of 1899) GB U25549(1- 684 of 2124) MG204 242223 244565 GB: U25549_2 topoisomerase IV subunit A (parC) 99.7912 99.7912 MG204(1- {Mycoplasma genitalium} 1438 of 2343) GB: U25549(687- 2124 of 2124) MG204 242223 244565 GB: U25549_2 topoisomerase IV subunit A (parC) 99.7912 99.7912 MG204(1950- {Mycoplasma genitalium} 1641 of 2343) GB: U02155(1- 308 of 308) MG206 246127 247422 SP: P14951 excinuclease ABC subunit C (uvrC) 28.0872 51.0896 MG206(738- 399 of 1296) GB: U02182(1- 341 of 341) MG208 248492 247905 -- -- -- -- MG208(585- 162 of 588) GB: U01785(1- 423 of 423) MG209 249402 248479 SP: P23851 hypothetical protein (SP: P23851) {Escherichia coli} 30.4498 55.0173 MG209(730- 372 of 924) GB: U02214(1- 359 of 359) MG210 249947 249405 GB: M83994_1 prolipoprotein signal peptidase (lsp) 32.3944 52.1127 MG210(1- {Staphylococcus aureus} 116 of 543) GB: U01759(196- 311 of 311) MG212 251780 252583 GB: L32861_1 1-acyl-sn-glycerol-3-phosphate acetyltransferase (plsC) 32.1429 60.7143 MG212(7- {Borrelia burgdorferi} 315 of 804) GB: 
U02160(5- 313 of 313) MG216 255594 257117 GB: L07920_2 pyruvate kinase (pyk) {Lactococcus lactis} 35.3319 57.6017 MG216(1118- 790 of 1524) GB: U01798(1- 329 of 329) MG218 259176 264590 PIR: S37536 no score generated -score shown is bogus -1 -1 MG218(1669- 1977 of 5415) GB: U02165(1- 309 of 309) MG221 266626 267087 SP: P22186 hypothetical protein (SP: P22186) {Escherichia coli} 28.8732 56.338 MG221(337- 49 of 462) GB: U02195(1- 290 of 290) MG225 270404 271870 GB: U14003_71 hypothetical protein (GB: U14003_71) 21.9565 48.0435 MG225(1467- {Escherichia coli} 1409 of 1467) GB: U02264(289- 347 of 347) MG226 271938 273314 GB: D26562_11 aromatic amino acid transport protein (aroP) 24.5902 47.2131 MG226(221- {Escherichia coli} 1 of 1377) GB: U02264(1- 221 of 347) MG227 273789 274649 SP: P13954 thymidylate synthase (thyA) {Staphylococcus aureus} 56.5972 75.3472 MG227(577- 861 of 861) GB: U01718(1- 285 of 439) MG228 274652 275131 GB: X60681_1 dihydrofolate reductase (dhfr) {Lactococcus lactis} 33.1288 59.5092 MG228(480- 385 of 480) GB: U02137(174- 269 of 269) MG229 275140 276159 SP: P17424 ribonucleotide reductase 2 (nrdF) 50 70.0637 MG229(1020- {Salmonella typhimurium} 697 of 1020) GB: U01739(22- 344 of 344) MG231 276646 278808 GB: X73226_1 ribonucleoside-diphosphate reductase (nrdE) 54.1193 73.1534 MG231(2122- {Salmonella typhimurium} 2163 of 2163) GB: U02141(1- 42 of 827) MG237 281078 281959 -- -- -- -- MG237(647- 882 of 882) GB: U01774(1- 236 of 289) MG238 281992 283323 GB: M34066_1 trigger factor (tig) {Escherichia coli} 24.6193 47.9695 MG238(420- 648 of 1332) GB: U01772(1- 229 of 229) MG239 283395 285779 SP: P37945 ATP-dependent protease (Ion) {Bacillus subtilis} 43.6268 65.8344 MG239(1818- 1449 of 2385) GB: U02148(1- 370 of 370) MG240 286657 285782 GB: M91593_1 hypothetical protein (GB: M91593_1) 27.8195 53.3835 MG240(876- {Mycoplasma mycoides} 598 of 876) GB.U01734(27- 305 of 305) MG242 288752 290641 -- -- -- -- MG242(886- 543 of 1890) GB: U02194(1- 344 of 344) MG244 
291332 293440 GB: M99049_1 DNA helicase II (mutB1) {Haemophilus influenzae} 36.0078 55.9687 MG244(829- 1035 of 2109) GB: X61517(1- 207 of 207) MG249 297604 296114 SP: P33656 RNA polymerase sigma-A factor (sigA) 43.6842 66.0526 MG249(970- {Clostridium acetobutylicum} 666 of 1491) GB: X61535(1- 306 of 306) MG250 299472 297652 GB: M10040_1 DNA primase (dnaE) {Bacillus subtilis} 27.2727 52.2078 MG250(1530- 1821 of 1821) GB: U01771(1- 292 of 572) MG250 299472 297652 GB: M10040_1 DNA primase (dnaE) {Bacillus subtilis} 27.2727 52.2078 MG250(648- 231 of 1821) GB: U02146(1- 418 of 418) MG254 304823 302847 GB: M24278_1 DNA ligase (lig) {Escherichia coli} 38.2263 59.3272 MG254(1429- 1722 of 1977) GB: U02152(1- 294 of 294) MG254 304823 302847 GB: M24278_1 DNA ligase (lig) {Escherichia coli} 38.2263 59.3272 MG254(37- 367 of 1977) GB: U01761(1- 330 of 330) MG255 304999 306093 -- -- -- -- MG255(726- 1095 of 1095) GB: U02164(1- 370 of 370) MG255 304999 306093 -- -- -- -- MG255(729- 400 of 1095) GB: U02174(1- 333 of 333) MG261 315699 318320 GB: M19334_4 DNA polymerase III alpha subunit (dnaE) 31.9115 55.7662 MG261(2442- {Escherichia coli} 2159 of 2622) GB: U01738(1- 284 of 284) MG263 320175 321047 GB: L10328_61 hypothetical protein (GB: L10328_61) 27.8008 47.7178 MG263(828- {Escherichia coli} 489 of 873) GB: U01764(1- 340 of 340) MG266 324809 322434 GB: M88581_1 leucyl-tRNA synthetase (leuS) 43.401 64.2132 MG266(78- {Bacillus stearothermophilus} 287 of 2376) GB: U01780(1- 210 of 210) MG266 324809 322434 GB: M88581_1 leucyl-tRNA synthetase (leuS) 43.401 64.2132 MG266(957- {Bacillus stearothermophilus} 622 of 2376) GB: U02167(1- 336 of 336) MG269 327050 326031 GB: D90354_1 surface protein antigen precursor (pag) 25.5144 47.3251 MG269(239- {Streptococcus sobrinus} 1 of 1020) GB: U02215(1- 239 of 366) MG271 329826 328456 SP: P11959 Dihydrolipoamide dehydrogenase (pdhD) 38.3592 62.306 MG271(914- {Bacillus stearothermophilus} 1214 of 1371) GB: U01784(1- 301 of 301) MG275 334772 333339 
SP: P37061 NADH oxidase (nox) {Enterococcus faecalis} 39.229 62.1315 MG275(81- 1 of 1434) GB: U01786(4- 84 of 280) MG276 335397 334858 GB: M14040_1 Adenine phosphoribosyltransferase (apt) 34.3373 58.4337 MG276(540- {Escherichia coli} 430 of 540) GB: U01786(170- 280 of 280) MG278 338366 340525 GB: X72832_5 stringent response-like protein (rel) 29.1339 55.1181 MG278(391- {Streptococcus equisimilis} 697 of 2160) GB: U01770(1- 308 of 308) MG281 343702 342035 -- -- -- -- MG281(748- 1051 of 1668) GB: U01706(1- 303 of 303) MG282 344849 344367 SP: P2740 transcription elongation factor (greA) 40.146 65.6934 MG282(483- {Rickettsia prowazekii} 356 of 483) GB: U02104(187- 314 of 314) MG283 345181 346629 GB: M97858_1 prolyl-tRNA synthetase (proS) {Escherichia coli} 22.6562 46.0938 MG283(839- 1183 of 1449) GB: U02205(1- 346 of 346) MG285 347214 348254 -- -- -- -- MG285(315- 493 of 1041) GB: U02266(1- 180 of 180) MG289 354023 355126 SP: P15363 high affinity transport system protein P37 (P37) 35.7798 58.4098 MG289(105- {Mycoplasma hyorhinis} 1 of 1104) GB: U02132(1- 105 of 571) MG291 355846 357474 SP: P15362 transport system permease protein P69 (P69) 27.9159 54.8757 MG291(1216- {Mycoplasma hyorhinis} 1629 of 1629) GB: U01768(1- 415 of 705) MG291 355846 357474 SP: P15362 transport system permease protein P69 (P69) 27.9159 54.8757 MG291(279- {Mycoplasma hyorhinis} 1 of 1629) GB: U02171(1- 279 of 346) MG293 361384 360653 SP: P37965 Glycerophosphoryl diester phosphodiesterase (glpQ) 30.3965 55.9471 MG293(357- {Bacillus subtilis} 41 of 732) GB: U02118(1- 317 of 317) MG294 362801 361380 GB: L19201_18 hypothetical protein (GB: L19201_18) 23.1013 46.2025 MG294(256- {Escherichia coli} 592 of 1422) GB: U02243(1- 337 of 337) MG297 365574 364537 GB: U00039_18 cell division protein (ftsY) {Escherichia coli} 36.1371 57.9439 MG297(1- 57 of 1038) GB: U02177(215- 271 of 271) MG298 368529 365584 GB: M34956_1 115 kDa protein (p115) {Mycoplasma hyorhinis} 33.4059 57.5626 MG298(2743- 2946 of 2946) 
GB: U02177(1- 205 of 271) MG300 370962 369715 SP: P36204 phosphoglycerate kinase (pgk) {Thermotoga maritima} 51.2887 70.6186 MG300(1- 167 of 1248) GB: U02178(167- 333 of 333) MG300 370962 369715 SP: P36204 phosphoglycerate kinase (pgk) {Thermotoga maritima} 51.2887 70.6186 MG300(935- 609 of 1248) GB: U02226(1- 326 of 326) MG300 370962 369715 SP: P36204 phosphoglycerate kinase (pgk) {Thermotoga maritima} 51.2887 70.6186 MG300(939- 1243 of 1248) GB: U02234(1- 305 of 305) MG301 371962 370952 GB: X72219_1 glyceraldehyde-3-phosphate dehydrogenase (gap) 56.0606 73.0303 MG301(244- {Clostridium pasteurianum} 1 of 1011) GB: U02213(1- 244 of 364) MG301 371962 370952 GB: X72219_1 glyceraldehyde-3-phosphate dehydrogenase (gap) 56.0606 73.0303 MG301(835- {Clostridium pasteurianum} 1011 of 1011) GB: U02178(1- 177 of 333) MG302 372946 371996 -- -- -- -- MG302(951- 865 of 951) GB: U02213(278- 364 of 364) MG305 376705 374921 GB: D30690_3 heat shock protein 70 (hsp70) {Staphylococcus aureus} 57.4359 75.8974 MG305(1382- 1055 of 1785) GB: U02204(1- 327 of 327) MG307 381507 377977 -- -- -- -- MG307(3175- 2042 of 3531) GB: U01767(1- 1134 of 1134) MG308 382724 381495 SP: P23304 ATP-dependent RNA helicase (deaD) {Escherichia coli} 23.0986 48.169 MG308(1- 89 of 1230) GB: U02200(276- 364 of 364) MG309 386408 382734 -- -- -- -- MG309(3410-

3675 of 3675) GB: U02200(1- 266 of 364) MG312 391334 387918 GB: U11381_1 cytadherence-accessory protein (hmwl) 39.3235 60.6765 MG312(2541- {Mycoplasma pneumoniae} 2160 of 3417) GB: U02261(1- 382 of 382) MG314 393633 392305 GP: L38997_4 hypothetical protein (GP: L38997_4) 51.4477 71.4922 MG314(514- {Mycoplasma pneurnoniae} 206 of 1329) GB: U02151(1- 309 of 309) MG317 397423 395627 GB: M82965_1 cytadherence-accessory protein (hmw3) 41.1458 59.8958 MG317(1329- {Mycoplasma pneumoniae} 1542 of 1797) GB: U02267(1- 214 of 214) MG317 397423 395627 GB: M82965_1 cytadherence-accessory protein (hmw3) 41.1458 59.8958 MG317(509- {Mycoplasma pneumoniae} 169 of 1797) GB: U02224(1- 341 of 341) MG317 397423 395627 GB: M82965_1 cytadherence-accessory protein (hmw3) 41.1458 59.8958 MG317(73- {Mycoplasma pneumoniae} 1 of 1797) GB: U01716(1- 73 of 325) MG318 398280 397441 GB: J04151_1 fibronectin-binding protein (fnbA) 24.6154 43.0769 MG318(840- {Staphylococcus aureus} 604 of 840) GB: U01716(91- 325 of 325) MG319 398833 398300 -- -- -- -- MG319(423- 1 of 534) GB: U01769(1- 426 of 541) MG320 399797 398940 -- -- -- -- MG320(371- 781 of 858) GB: U01700(1- 410 of 410) MG324 408792 407731 GB: D00398_1 aminopeptidase P (pepP) {Escherichia coli} 30.531 54.4248 MG324(883- 1062 of 1062) GB: U01717(1- 181 of 223) MG324 408792 407731 GB: D00398_1 aminopeptidase P (pepP) {Escherichia coli} 30.531 54.4248 MG324(889- 1062 of 1062) GB: U01755(2- 175 of 217) MG327 410676 409873 SP: P26174 magnesium-chelatase 30 kDa subunit (bchO) 26.7281 51.1521 MG327(782- {Rhodobacter capsulatus} 533 of 804) GB: U02232(1- 250 of 250) MG328 412933 410666 GB: X62467_1 protein V (fcrV) {Streptococcus sp.} 27.5434 48.3871 MG328(339- 53 of 2268) GB: U02188(1- 287 of 287) MG328 412933 410666 GB: X62467_1 protein V (fcrV) {Streptococcus sp.} 27.5434 48.3871 MG328(817- 462 of 2268) GB: U02203(1- 356 of 356) MG330 414975 414325 SP: P38493 cytidylate kinase (cmk) {Bacillus subtilis} 40.3756 61.0329 MG330(537- 226 of 651) GB: 
U02241(1- 312 of314) MG334 419480 416970 SP: Q05873 valyl-tRNA synthetase (valS) {Bacillus subtilis} 38.5629 60.5988 MG334(1109- 781 of 2511) GB: U02202(1- 330 of 330) MG334 419480 416970 SP: Q05873 valyl-tRNA synthetase (valS) {Bacillus subtilis} 38.5629 60.5988 MG334(2400- 2511 of 2511) GB: U02249(1- 112 of 305) MG335 420045 419473 SP: P38424 hypothetical protein (SP: P38424) {Bacillus subtilis} 34.5238 61.3095 MG335(1- 95 of 573) GB: U02190(200- 294 of 294) MG336 421467 422690 GB: U00013_6 nitrogen fixation protein (nifS) {Mycobacterium leprae} 26.2295 47.2678 MG336(990- 719 of 1224) GB: U02256(1- 272 of 272) MG337 422697 423110 -- -- -- -- MG337(414- 151 of 414) GB: U01709(35- 297 of 297) MG338 426915 423103 -- -- -- -- MG338(1- 251 of 3813) GB: U02269(65- 315 of 315) MG338 426915 423103 -- -- -- -- MG338(1304- 917 of 3813) GB: U02221(1- 388 of 388) MG338 426915 423103 -- -- -- -- MG338(3342- 3067 of 3813) GB: U01809(1- 276 of 276) MG338 426915 423103 -- -- -- -- MG338(3772- 3813 of 3813) GB: U01709(1- 42 of 297) MG339 428115 427096 GB: L25893_1 recombination protein (recA) {Staphylococcus aureus} 46.5986 69.3878 MG339(372- 93 of 1020) GB: U01704(1- 279 of 279) MG340 434458 430583 SP: P00577 DNA-directed RNA polymerase beta'chain (rpoC) 44.4828 66.0345 MG340(1294- {Escherichia coli} 999 of 3876) GB: X61534(1- 295 of 295) MG340 434458 430583 SP: P00577 DNA-directed RNA polymerase beta'chain (rpoC) 44.4828 66.0345 MG340(1519- {Escherichia coli} 1289 of 3876) GB: X61528(1- 231 of 231) MG340 434458 430583 SP: P00577 DNA-directed RNA polymerase beta'chain (rpoC) 44.4828 66.0345 MG340(3444- {Escherichia coli} 3083 of 3876) GB: U02169(1- 361 of 361) MG340 434458 430583 SP: P00577 DNA-directed RNA polymerase beta'chain (rpoC) 44.4828 66.0345 MG340(3772- {Escherichia coli} 3876 of 3876) GB: U01766(1- 105 of 467) MG340 434458 430583 SP: P00577 DNA-directed RNA polymerase beta'chain (rpoC) 44.4828 66.0345 MG340(426- {Escherichia coli} 66 of 3876) GB: U01797(1- 361 of 361) 
MG341 438640 434471 GB: L24376_3 RNA polymerase beta subunit (rpoB) {Bacillus subtilis} 46.5338 67.5043 MG341(1-107 of 4170) GB: U02230(217-323 of 323)
MG341 438640 434471 GB: L24376_3 RNA polymerase beta subunit (rpoB) {Bacillus subtilis} 46.5338 67.5043 MG341(1932-1595 of 4170) GB: U01737(1-338 of 338)
MG341 438640 434471 GB: L24376_3 RNA polymerase beta subunit (rpoB) {Bacillus subtilis} 46.5338 67.5043 MG341(2833-3201 of 4170) GB: U01735(1-369 of 369)
MG342 439236 438733 -- -- -- -- MG342(381-504 of 504) GB: U02230(1-124 of 323)
MG342 439236 438733 -- -- -- -- MG342(386-65 of 504) GB: U02231(1-322 of 322)
MG343 440355 439318 -- -- -- -- MG343(108-452 of 1038) GB: U01811(1-345 of 345)
MG344 441180 440362 GP: U17036_2 lipase-esterase (lip1) {Mycoplasma mycoides} 26.6667 47.5 MG344(575-767 of 819) GB: U02222(1-193 of 193)
MG345 443878 441194 SP: P00956 isoleucyl-tRNA synthetase (ileS) {Escherichia coli} 33.2963 56.2708 MG345(1115-782 of 2685) GB: U02196(1-334 of 334)
MG345 443878 441194 SP: P00956 isoleucyl-tRNA synthetase (ileS) {Escherichia coli} 33.2963 56.2708 MG345(1811-2134 of 2685) GB: U02254(1-324 of 324)
MG348 446165 445200 -- -- -- -- MG348(166-459 of 966) GB: U01781(1-292 of 292)
MG352 450222 450719 GB: U11883_2 hypothetical protein (GB: U11883_2) {Bacillus subtilis} 33.3333 56.7901 MG352(366-498 of 498) GB: U02237(1-133 of 310)
MG353 451048 450722 -- -- -- -- MG353(327-153 of 327) GB: U02237(136-309 of 310)
MG357 455947 454769 GB: L17320_2 acetate kinase (ackA) {Bacillus subtilis} 42.6735 65.5527 MG357(342-131 of 1179) GB: X61531(1-211 of 211)
MG358 456590 457369 GB: M21298_1 Holliday junction DNA helicase (ruvA) {Escherichia coli} 26.2411 42.5532 MG358(350-87 of 780) GB: U02233(1-265 of 265)
MG361 459615 460100 SP: P29394 ribosomal protein L10 (rpL10) {Thermotoga maritima} 29.8137 61.4907 MG361(274-486 of 486) GB: U02206(1-213 of 345)
MG362 460126 460491 SP: P02394 ribosomal protein L7/L12 (`A`type) (rpL7/L12) {Bacillus subtilis} 47.5 70 MG362(1-107 of 366) GB: U02206(239-345 of 345)
MG365 461682 462614 GB: X63666_2 methionyl-tRNA formyltransferase (fmt) {Escherichia coli} 24.43 50.8143 MG365(292-1 of 933) GB: U02238(1-292 of 349)
MG368 466410 465427 GB: M96793_1 fatty acid/phospholipid synthesis protein (plsX) {Escherichia coli} 28.972 52.3364 MG368(227-1 of 984) GB: U01791(1-227 of 326)
MG369 468083 466413 -- -- -- -- MG369(1146-1446 of 1671) GB: U01763(1-300 of 300)
MG370 469123 468155 SP: P23851 hypothetical protein (SP: P23851) {Escherichia coli} 26.9531 48.8281 MG370(240-599 of 969) GB: U02220(1-360 of 360)
MG371 470084 469113 GB: D26185_10 hypothetical protein (GB: D26185_10) {Bacillus subtilis} 25.8065 47.0046 MG371(349-689 of 972) GB: U02263(1-341 of 341)
MG374 472891 472070 -- -- -- -- MG374(1-178 of 822) GB: U02250(159-337 of 337)
MG375 474578 472887 GB: M36594_1 threonyl-tRNA synthetase (thrSv) {Bacillus subtilis} 38.7097 60.7527 MG375(1048-1389 of 1692) GB: U02130(1-342 of 342)
MG375 474578 472887 GB: M36594_1 threonyl-tRNA synthetase (thrSv) {Bacillus subtilis} 38.7097 60.7527 MG375(1530-1692 of 1692) GB: U02250(1-163 of 337)
MG378 477139 475529 SP: P35868 arginyl-tRNA synthetase (argS) {Corynebacterium glutamicum} 33.6406 56.9124 MG378(1364-1047 of 1611) GB: U01740(1-319 of 319)
MG378 477139 475529 SP: P35868 arginyl-tRNA synthetase (argS) {Corynebacterium glutamicum} 33.6406 56.9124 MG378(765-456 of 1611) GB: U02168(1-309 of 309)
MG379 477168 479003 GB: L10328_106 glucose inhibited division protein (gidA) {Escherichia coli} 40.7346 61.9366 MG379(900-1184 of 1836) GB: U01812(1-285 of 285)
MG385 484699 483992 -- -- -- -- MG385(234-6 of 708) GB: U02112(1-229 of 229)
MG385 484699 483992 -- -- -- -- MG385(523-708 of 708) GB: U02239(1-186 of 320)
MG385 484699 483992 -- -- -- -- MG385(528-259 of 708) GB: U02246(1-270 of 270)
MG386 489552 484705 GB: U11381_1 cytadherence-accessory protein (hmwl) {Mycoplasma pneumoniae} 31.1755 49.4037 MG386(1294-1628 of 4848) GB: U02175(1-335 of 335)
MG386 489552 484705 GB: U11381_1 cytadherence-accessory protein (hmwl) {Mycoplasma pneumoniae} 31.1755 49.4037 MG386(2274-1991 of 4848) GB: X61519(1-283 of 284)
MG386 489552 484705 GB: U11381_1 cytadherence-accessory protein (hmwl) {Mycoplasma pneumoniae} 31.1755 49.4037 MG386(3247-3420 of 4848) GB: U02126(1-174 of 174)
MG386 489552 484705 GB: U11381_1 cytadherence-accessory protein (hmwl) {Mycoplasma pneumoniae} 31.1755 49.4037 MG386(3842-4196 of 4848) GB: U02192(1-355 of 355)
MG386 489552 484705 GB: U11381_1 cytadherence-accessory protein (hmwl) {Mycoplasma pneumoniae} 31.1755 49.4037 MG386(767-1281 of 4848) GB: U02245(2-515 of 515)
MG388 491004 490702 GB: U00016_19 hypothetical protein (GB: U00016_19) {Mycobacterium leprae} 30.9278 56.701 MG388(285-1 of 303) GB: U02265(1-285 of 339)
MG389 491530 491150 -- -- -- -- MG389(320-129 of 381) GB: U01813(1-192 of 192)
MG390 493516 491537 SP: P37608 lactococcin transport ATP-binding protein (lcnDR3) {Lactococcus lactis} 22.3421 46.5331 MG390(1395-1744 of 1980) GB: U02218(1-350 of 350)
MG390 493516 491537 SP: P37608 lactococcin transport ATP-binding protein (lcnDR3) {Lactococcus lactis} 22.3421 46.5331 MG390(1400-1174 of 1980) GB: U02248(1-227 of 227)
MG391 494967 493627 GB: D17450_1 aminopeptidase {Mycoplasma salivarium} 41.2921 60.3933 MG391(1-217 of 1341) GB: U02268(256-472 of 472)
MG391 494967 493627 GB: D17450_1 aminopeptidase {Mycoplasma salivarium} 41.2921 60.3933 MG391(412-735 of 1341) GB: U01801(1-324 of 324)
MG391 494967 493627 GB: D17450_1 aminopeptidase {Mycoplasma salivarium} 41.2921 60.3933 MG391(412-735 of 1341) GB: U01802(1-324 of 324)
MG392 496615 494987 GB: L10132_2 heat shock protein (groEL) {Bacillus stearothermophilus} 51.5209 71.4829 MG392(1394-1629 of 1629) GB: U02268(1-236 of 472)
MG392 496615 494987 GB: L10132_2 heat shock protein (groEL) {Bacillus stearothermophilus} 51.5209 71.4829 MG392(181-1 of 1629) GB: U02252(1-181 of 296)
MG393 496960 496631 GB: D17398_1 heat shock protein 60-like protein (PggroES) {Porphyromonas gingivalis} 39.5604 54.9451 MG393(330-231 of 330) GB: U02252(197-296 of 296)
MG394 498306 497089 SP: P06192 serine hydroxymethyltransferase (glyA) {Salmonella typhimurium} 55.303 70.7071 MG394(328-683 of 1218) GB: U02131(1-356 of 356)
MG395 499890 498319 -- -- -- -- MG395(457-116 of 1572) GB: U02260(1-342 of 342)
MG395 499890 498319 -- -- -- -- MG395(763-979 of 1572) GB: X61530(1-217 of 217)
MG399 503976 502831 SP: P33253 ATP synthase beta chain (atpD) {Mycoplasma gallisepticum} 80.9524 89.418 MG399(447-852 of 1146) GB: U01752(1-406 of 406)
MG400 505099 504263 SP: P33257 ATP synthase gamma chain (atpG) {Mycoplasma gallisepticum} 37.9433 62.0567 MG400(160-711 of 837) GB: U01703(1-552 of 552)
MG401 506655 505102 SP: P33252 ATP synthase alpha chain (atpA) {Mycoplasma gallisepticum} 63.3911 79.5761 MG401(973-1554 of 1554) GB: U01727(1-583 of 598)
MG405 509012 508137 GB: X64256_2 adenosinetriphosphatase (atpB) {Mycoplasma gallisepticum} 36.4261 63.9175 MG405(75-1 of 876) GB: U01728(1-75 of 299)
MG406 509319 508981 SP: P15362 transport system permease protein P69 (P69) {Mycoplasma hyorhinis} 40 57.1429 MG406(339-84 of 339) GB: U01728(44-299 of 299)
MG410 513042 512056 GB: L10328_89 peripheral membrane protein B (pstB) {Escherichia coli} 50.813 70.3252 MG410(301-941 of 987) GB: U01707(1-640 of 640)
MG411 514991 513030 GB: X75297_1 periplasmic phosphate permease homolog (AG88) {Mycobacterium tuberculosis} 30.7692 56.2753 MG411(406-632 of 1962) GB: U01746(1-227 of 229)
MG412 516124 514994 -- -- -- -- MG412(252-1 of 1131) GB: U01702(1-252 of 313)
MG412 516124 514994 -- -- -- -- MG412(675-563 of 1131) GB: U02101(1-113 of 113)
MG413 518389 516248 GB: L22432_4 hypothetical protein (GB: L22432_4) {Mycoplasma capricolum} 25 54.1667 MG413(1179-701 of 2142) GB: U01699(1-480 of 480)
MG413 518389 516248 GB: L22432_4 hypothetical protein (GB: L22432_4) {Mycoplasma capricolum} 25 54.1667 MG413(1535-1230 of 2142) GB: U01804(1-305 of 305)
MG414 519355 516248 -- -- -- -- MG414(438-154 of 917) GB: U01695(1-285 of 285)
MG416 521414 520371 -- -- -- -- MG416(1-39 of 1044) GB: U01744(580-618 of 620)
MG416 521414 520371 -- -- -- -- MG416(7-351 of 1044) GB: U02102(1-345 of 345)
MG418 522314 521877 SP: P02410 ribosomal protein L13 (rpL13) {Escherichia coli} 41.3043 70.2899 MG418(321-438 of 438) GB: U01744(1-118 of 620)
MG421 526696 524153 SP: P07671 excinuclease ABC subunit A (uvrA) {Escherichia coli} 47.7541 68.5579 MG421(1693-1393 of 2544) GB: X61514(1-301 of 301)
MG422 529493 526989 -- -- -- -- MG422(2274-2101 of 2505) GB: U02117(1-174 of 174)
MG422 529493 526989 -- -- -- -- MG422(2439-2505 of 2505) GB: U02172(1-67 of 318)
MG422 529493 526989 -- -- -- -- MG422(35-1 of 2505) GB: U02228(1-35 of 304)
MG423 531216 529534 -- -- -- -- MG423(1434-1197 of 1683) GB: X61510(1-238 of 238)
MG423 531216 529534 -- -- -- -- MG423(161-413 of 1683) GB: X61524(1-252 of 255)
MG423 531216 529534 -- -- -- -- MG423(1683-1455 of 1683) GB: U02228(76-304 of 304)
MG425 531668 533014 SP: P23304 ATP-dependent RNA helicase (deaD) {Escherichia coli} 32.4121 58.0402 MG425(989-769 of 1347) GB: U01805(1-220 of 220)
MG431 538290 537559 GB: L27492_1 triosephosphate isomerase (tim) {Thermotoga maritima} 39.7541 61.8852 MG431(463-732 of 732) GB: U02109(1-270 of 277)
MG437 542067 542981 GB: M11330_1 CDP-diglyceride synthetase (cdsA) {Escherichia coli} 38.0165 55.3719 MG437(679-378 of 915) GB: U02189(2-303 of 303)
MG441 546707 546300 -- -- -- -- MG441(20-318 of 408) GB: U02128(1-299 of 299)
MG447 552444 550804 GB: L08897_1 hypothetical protein (GB: L08897_1) {Mycoplasma gallisepticum} 34.058 55.0725 MG447(319-645 of 1641) GB: U01788(1-327 of 327)
MG451 555612 554431 SP: P13927 elongation factor TU (tuf) {Mycoplasma genitalium} 100 100 MG451(927-586 of 1182) GB: U02255(1-342 of 342)
MG453 556435 557310 GB: L12272_1 UDP-glucose pyrophosphorylase (gtaB) {Bacillus subtilis} 48.0287 65.233 MG453(491-181 of 876) GB: U02258(1-311 of 311)
MG455 557724 558944 GB: M77668_1 tyrosyl tRNA synthetase (tyrS) {Bacillus stearothermophilus} 38.539 61.7128 MG455(604-362 of 1221) GB: U02247(5-247 of 247)
MG456 559941 558940 -- -- -- -- MG456(256-568 of 1002) GB: U01790(1-312 of 312)
MG458 563307 562783 SP: Q02522 hypoxanthine-guanine phosphoribosyltransferase (hpt) {Lactococcus lactis} 38.3721 66.8605 MG458(295-24 of 525) GB: U02193(1-272 of 272)
MG459 563818 563312 GB: M64978_2 surface exclusion protein (prgA) (Plasmid pCF10) {Enterococcus faecalis} 28.3582 49.2537 MG459(330-1 of 507) GB: U01725(1-330 of 638)
MG460 563991 564926 SP: P33572 L-lactate dehydrogenase (ldh) {Mycoplasma hyopneumoniae} 50.3226 67.7419 MG460(1-136 of 936) GB: U01725(503-638 of 638)
MG462 567638 566187 GB: M55072_1 glutamyl-tRNA synthetase (gltX) {Bacillus stearothermophilus} 42.887 65.272 MG462(1452-1081 of 1452) GB: U02122(9-379 of 379)
MG463 568404 567628 GB: D26185_105 high level kasugamycin resistance (ksgA) {Bacillus subtilis} 35.6164 53.8813 MG463(777-409 of 777) GB: U01719(36-405 of 405)
MG467 570988 570056 GB: X75422_1 heterocyst maturation protein (devA) {Anabaena sp.} 39.899 63.1313 MG467(40-352 of 933) GB: U01741(1-313 of 313)
MG469 578578 577268 SP: P34028 chromosomal replication initiator protein (dnaA) {Spiroplasma citri} 30.9469 57.2748 MG469(845-547 of 1311) GB: U02259(1-299 of 299)
MG469 578578 577268 SP: P34028 chromosomal replication initiator protein (dnaA) {Spiroplasma citri} 30.9469 57.2748 MG469(855-1206 of 1311) GB: U02145(1-352 of 352)

[0279]

TABLE 1(d)
UID Old_id(s)
MG001 MORF-20072
MG002 MORF-19817
MG003 MORF-19818 MORF-20073
MG004 MORF-19819 MORF-20074
MG005 MORF-20075
MG006 MORF-20076
MG007 MORF-19820
MG008 MORF-20077
MG009 MORF-20078
MG010 MORF-20079
MG011 MORF-19821 MORF-19822
MG012 MORF-20080
MG013 MORF-19823 MORF-20080 MORF-20081
MG014 MORF-20082
MG015 MORF-20084
MG016 MORF-19824
MG017 MORF-19825
MG018 MORF-20085
MG019 MORF-20086
MG020 MORF-20088
MG021 MORF-20089
MG022 MORF-20091
MG023 MORF-20092
MG024 MORF-19826 MORF-20093
MG025 MORF-20094
MG026 MORF-20095
MG027 MORF-19827
MG028 MORF-19828
MG029 MORF-19829
MG030 MORF-20096
MG031 MORF-19830 MORF-20097
MG032 MORF-20099
MG033 MORF-20100
MG034 MORF-20101
MG035 MORF-20102
MG036 MORF-20103
MG037 MORF-20104
MG038 MORF-20105
MG039 MORF-19831 MORF-20106
MG040 MORF-20107
MG042 MORF-19832 MORF-20108
MG043 MORF-20110
MG044 MORF-20111
MG045 MORF-19833
MG046 MORF-20112
MG047 MORF-20113
MG048 MORF-19834 MORF-20114 MORF-20115
MG049 MORF-20114 MORF-20115
MG050 MORF-20117
MG051 MORF-19835 MORF-20118
MG052 MORF-20119
MG053 MORF-20120
MG054 MORF-20120 MORF-20121
MG055 MORF-19836
MG056 MORF-20122
MG057 MORF-20123
MG058 MORF-20124
MG059 MORF-20124 MORF-20125
MG060 MORF-20126
MG061 MORF-19838
MG062 MORF-19839 MORF-20127 MORF-20128
MG063 MORF-19840 MORF-20128
MG064 MORF-19841 MORF-19842
MG065 MORF-19843 MORF-20129
MG066 MORF-19844 MORF-20130
MG067 MORF-19845
MG068 MORF-20131
MG069 MORF-19847 MORF-20135
MG070 MORF-20136
MG071 MORF-19848 MORF-19849 MORF-19850 MORF-19851 MORF-20137
MG072 MORF-19852 MORF-19853 MORF-19854 MORF-20138
MG073 MORF-20139
MG074 MORF-19855
MG075 MORF-19856 MORF-19857
MG076 MORF-19858
MG077 MORF-20140
MG078 MORF-19859 MORF-20141
MG079 MORF-20142
MG080 MORF-20143
MG081 MORF-20144
MG082 MORF-20145
MG083 MORF-20146
MG084 MORF-20147
MG085 MORF-20147 MORF-20148
MG086 MORF-19860 MORF-19861
MG087 MORF-20149
MG088 MORF-20150
MG089 MORF-20151 MORF-20152
MG090 MORF-19862
MG091 MORF-20153
MG092 MORF-20154
MG093 MORF-20155
MG094 MORF-20156
MG095 MORF-19863
MG096 MORF-20157
MG097 MORF-20158
MG098 MORF-20159
MG099 MORF-19864 MORF-20160
MG100 MORF-19865 MORF-20161
MG101 MORF-19866
MG102 MORF-20162
MG103 MORF-19867 MORF-19868
MG104 MORF-20163
MG105 MORF-19869
MG106 MORF-20164 MORF-20165
MG107 MORF-20164 MORF-20165
MG108 MORF-20166
MG109 MORF-20167
MG110 MORF-20168
MG111 MORF-20169
MG112 MORF-20170
MG113 MORF-19870 MORF-20171 MORF-20172
MG114 MORF-20171 MORF-20172
MG116 MORF-19871
MG117 MORF-19872
MG118 MORF-20173
MG119 MORF-19873 MORF-20174
MG120 MORF-19874
MG121 MORF-19875 MORF-20175
MG122 MORF-20176
MG123 MORF-19876
MG124 MORF-20177
MG125 MORF-19877
MG126 MORF-20178
MG127 MORF-20179
MG128 MORF-20180
MG129 MORF-20181
MG130 MORF-20182
MG132 MORF-20183
MG133 MORF-19878
MG134 MORF-20184
MG135 MORF-20185
MG136 MORF-20186 MORF-20187
MG137 MORF-20186 MORF-20187
MG138 MORF-20188
MG139 MORF-19879
MG140 MORF-19884
MG141 MORF-19885 MORF-20192
MG142 MORF-19886 MORF-20193
MG143 MORF-20194
MG144 MORF-19887
MG145 MORF-20195
MG146 MORF-20196
MG147 MORF-19888 MORF-19889
MG148 MORF-19890
MG149 MORF-19891
MG150 MORF-19893 MORF-20197
MG151 MORF-19893 MORF-20198
MG152 MORF-19895 MORF-20199
MG153 MORF-19894
MG154 MORF-19896 MORF-20200
MG156 MORF-19897
MG157 MORF-20201
MG158 MORF-20202
MG159 MORF-19898
MG161 MORF-19900 MORF-20203
MG162 MORF-19899 MORF-19900
MG163 MORF-20204
MG165 MORF-20205
MG166 MORF-19901 MORF-20206
MG167 MORF-19901 MORF-20207
MG168 MORF-19902 MORF-20208
MG169 MORF-20209
MG170 MORF-20210
MG171 MORF-20211
MG172 MORF-20212
MG175 MORF-20213
MG176 MORF-20214
MG177 MORF-19903 MORF-20215
MG178 MORF-20216
MG179 MORF-19904 MORF-20217
MG180 MORF-20218
MG181 MORF-19905
MG182 MORF-20219
MG183 MORF-20219
MG184 MORF-20220
MG185 MORF-20221
MG186 MORF-19907
MG187 MORF-19908 MORF-19909 MORF-20225
MG188 MORF-20226 MORF-20227
MG189 MORF-20226 MORF-20227
MG190 MORF-20228
MG191 MORF-19910 MORF-19911 MORF-20229
MG192 MORF-19911 MORF-19912 MORF-20230
MG194 MORF-19913 MORF-20234
MG195 MORF-20235
MG196 MORF-20236
MG199 MORF-19914
MG200 MORF-19915 MORF-20237
MG201 MORF-19916 MORF-20239
MG202 MORF-19917
MG203 MORF-19918 MORF-19919 MORF-20240
MG204 MORF-20241 MORF-20242
MG205 MORF-20243
MG206 MORF-20244
MG207 MORF-19920
MG208 MORF-19921
MG209 MORF-20245
MG210 MORF-20246
MG211 MORF-19922
MG212 MORF-19924 MORF-20247 MORF-20248
MG213 MORF-20248
MG214 MORF-20249
MG215 MORF-20250
MG216 MORF-20251
MG217 MORF-20252
MG218 MORF-19926 MORF-19927 MORF-20253
MG219 MORF-19928 MORF-19930 MORF-20253
MG220 MORF-19931
MG221 MORF-20255
MG222 MORF-20256
MG223 MORF-19932
MG224 MORF-20257
MG225 MORF-20258
MG226 MORF-20259
MG227 MORF-20260
MG228 MORF-19933
MG229 MORF-19934 MORF-20261
MG230 MORF-19935
MG231 MORF-20262
MG232 MORF-20263
MG234 MORF-20264
MG235 MORF-19936 MORF-20265
MG236 MORF-19937
MG237 MORF-19938
MG238 MORF-19939 MORF-20266
MG239 MORF-20267
MG240 MORF-20268
MG241 MORF-19940 MORF-19941 MORF-19942
MG242 MORF-19943
MG243 MORF-19945
MG244 MORF-20269
MG245 MORF-19946
MG246 MORF-19947
MG247 MORF-20270
MG248 MORF-19948
MG249 MORF-19949 MORF-20271
MG250 MORF-20272
MG251 MORF-19950 MORF-20273
MG252 MORF-20274
MG253 MORF-20275
MG254 MORF-20276
MG255 MORF-19951 MORF-19952
MG256 MORF-19953
MG258 MORF-19954 MORF-20277
MG259 MORF-20278
MG260 MORF-19955 MORF-19956 MORF-20279
MG261 MORF-19958 MORF-20282
MG262 MORF-20283
MG263 MORF-20285
MG264 MORF-20286 MORF-20287
MG265 MORF-20286 MORF-20287
MG266 MORF-20288
MG267 MORF-19959 MORF-19960
MG268 MORF-20290
MG269 MORF-20291
MG270 MORF-20292
MG271 MORF-20293
MG272 MORF-19961 MORF-19962 MORF-20294
MG273 MORF-20295
MG274 MORF-20296
MG275 MORF-20297
MG276 MORF-20298
MG277 MORF-19963 MORF-20299
MG278 MORF-19964 MORF-20300
MG279 MORF-19965
MG280 MORF-19966 MORF-20301
MG281 MORF-19967 MORF-19968
MG282 MORF-20302
MG283 MORF-20303
MG284 MORF-19969 MORF-19970 MORF-19971
MG285 MORF-19969 MORF-19970 MORF-19971
MG286 MORF-19972
MG288 MORF-20306
MG289 MORF-20307
MG290 MORF-20308
MG291 MORF-20309
MG292 MORF-20310
MG293 MORF-20311
MG294 MORF-19974 MORF-20312
MG295 MORF-20313
MG296 MORF-19975
MG297 MORF-20314
MG298 MORF-19976 MORF-20315
MG299 MORF-20316
MG300 MORF-20317
MG301 MORF-19977 MORF-20318
MG302 MORF-19978
MG303 MORF-20319
MG304 MORF-20320
MG305 MORF-19979 MORF-20321
MG306 MORF-19980
MG307 MORF-19981 MORF-19982
MG308 MORF-20323
MG309 MORF-19983 MORF-19984
MG310 MORF-20324
MG311 MORF-20325
MG312 MORF-20326
MG314 MORF-19985 MORF-19986
MG315 MORF-19987 MORF-19988 MORF-20327
MG316 MORF-19988 MORF-20327
MG317 MORF-20328 MORF-20329
MG318 MORF-19989 MORF-19990
MG319 MORF-20330
MG320 MORF-19991
MG321 MORF-19992
MG322 MORF-19993 MORF-20331
MG323 MORF-19994 MORF-20332
MG324 MORF-19995 MORF-20333
MG326 MORF-20334
MG327 MORF-20335
MG328 MORF-19996 MORF-20336
MG329 MORF-19997 MORF-20337
MG330 MORF-20338 MORF-20339
MG331 MORF-20339
MG332 MORF-20340
MG333 MORF-19998
MG334 MORF-20341
MG336 MORF-20343 MORF-20344
MG337 MORF-19999
MG338 MORF-20000
MG339 MORF-20001 MORF-20345
MG340 MORF-20006 MORF-20348
MG341 MORF-20349
MG342 MORF-20350
MG343 MORF-20007
MG344 MORF-20008
MG345 MORF-20351
MG346 MORF-20352
MG348 MORF-20009
MG349 MORF-20010
MG350 MORF-20011
MG351 MORF-20353
MG352 MORF-20354
MG353 MORF-20355
MG354 MORF-20013 MORF-20014
MG355 MORF-20015 MORF-20016 MORF-20356
MG356 MORF-20357
MG357 MORF-20358
MG358 MORF-20017 MORF-20018 MORF-20019 MORF-20359
MG359 MORF-20019 MORF-20359 MORF-20360
MG360 MORF-20361
MG361 MORF-20362
MG362 MORF-20363
MG364 MORF-20364
MG365 MORF-20020 MORF-20365
MG366 MORF-20021
MG367 MORF-20366
MG368 MORF-20022 MORF-20366 MORF-20367
MG369 MORF-20022 MORF-20023
MG370 MORF-20368
MG371 MORF-20368 MORF-20369
MG372 MORF-20370
MG373 MORF-20024
MG374 MORF-20025
MG375 MORF-20371
MG376 MORF-20026
MG377 MORF-20027
MG378 MORF-20372
MG379 MORF-20373
MG380 MORF-20374
MG381 MORF-20028
MG382 MORF-20375
MG383 MORF-20376
MG384 MORF-20029 MORF-20377
MG385 MORF-20031 MORF-20378
MG386 MORF-20032 MORF-20379 MORF-20381
MG387 MORF-20382
MG388 MORF-20383
MG389 MORF-20033
MG390 MORF-20034 MORF-20384
MG391 MORF-20034 MORF-20035 MORF-20385
MG392 MORF-20036 MORF-20037 MORF-20386
MG393 MORF-20038
MG394 MORF-20387
MG395 MORF-20039
MG396 MORF-20388
MG397 MORF-20040 MORF-20041
MG398 MORF-20042
MG399 MORF-20389
MG400 MORF-20390
MG401 MORF-20043 MORF-20391
MG402 MORF-20392
MG403 MORF-20393
MG404 MORF-20394
MG405 MORF-20395 MORF-20396
MG406 MORF-20395 MORF-20396
MG407 MORF-20044 MORF-20397
MG408 MORF-20398
MG409 MORF-20045
MG410 MORF-20046 MORF-20399
MG411 MORF-20400
MG412 MORF-20047
MG413 MORF-20401
MG414 MORF-20048
MG415 MORF-20049
MG416 MORF-20050 MORF-20051
MG417 MORF-20402
MG418 MORF-20052
MG419 MORF-20053
MG420 MORF-20403
MG421 MORF-20404
MG422 MORF-20054 MORF-20055
MG423 MORF-20056
MG425 MORF-20406
MG427 MORF-20057
MG428 MORF-20058
MG429 MORF-20059 MORF-20407
MG430 MORF-20408
MG431 MORF-20409
MG432 MORF-20410
MG433 MORF-20411
MG435 MORF-20060 MORF-20412
MG436 MORF-20060 MORF-20412
MG437 MORF-20413
MG438 MORF-20414
MG439 MORF-20061
MG440 MORF-20062
MG441 MORF-20063
MG442 MORF-20415
MG443 MORF-20064
MG444 MORF-20065 MORF-20416
MG445 MORF-20417
MG447 MORF-20418
MG448 MORF-20419 MORF-20420
MG449 MORF-20419 MORF-20420
MG450 MORF-20066
MG451 MORF-20421
MG452 MORF-20067
MG453 MORF-20422
MG454 MORF-20423 MORF-20424
MG455 MORF-20423 MORF-20424
MG456 MORF-20068
MG457 MORF-20069 MORF-20425
MG458 MORF-20426
MG459 MORF-20070
MG460 MORF-20427
MG461 MORF-20428
MG462 MORF-20429
MG463 MORF-20430
MG464 MORF-20431
MG467 MORF-20432
MG468 MORF-20283
MG469 MORF-20434
MG470 MORF-20071 MORF-20435

[0280]

TABLE 2
UID    end5    end3    gene_len
MG016  19253   19756   504
MG017  19825   20352   528
MG027  30092   30544   453
MG028  30547   31149   603
MG064  74066   77683   3618
MG076  102870  102457  414
MG105  133569  134168  600
MG117  143310  143951  642
MG147  186138  187262  1125
MG185  211445  213547  2103
MG186  216017  216766  750
MG199  237094  236594  501
MG202  239826  240191  366
MG207  247523  247906  384
MG211  250997  251437  441
MG223  268011  269243  1233
MG230  276166  276624  459
MG236  280663  281082  420
MG241  286884  288743  1860
MG243  290976  291323  348
MG246  293936  294778  843
MG256  306819  307586  768
MG267  325157  324813  345
MG279  341181  340528  654
MG284  346853  347248  396
MG286  348260  348847  588
MG296  364414  364028  387
MG306  377974  376796  1179
MG321  402922  400121  2802
MG331  415622  414987  636
MG333  416716  416339  378
MG349  446576  447787  1212
MG350  447790  448722  933
MG354  451197  451607  411
MG366  462619  464619  2001
MG372  471234  470080  1155
MG373  472066  471224  843
MG376  474892  474581  312
MG377  475479  474901  579
MG381  479570  480223  654
MG397  502420  500723  1698
MG415  520238  519929  310
MG419  523215  522355  861
MG427  533270  533692  423
MG428  533806  534318  513
MG436  542092  541739  354
MG439  545378  544563  816
MG440  546154  545381  774
MG449  553295  552864  432
MG450  554269  553559  711
MG452  555665  556447  783
MG468  318330  319202  873
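The relationship between the end5, end3, and gene_len columns of Table 2 can be illustrated with a short sketch. It assumes that gene_len is the inclusive span between the two coordinates and that an end5 greater than end3 indicates a gene on the complementary strand; the function name `gene_span` and the "+"/"-" strand labels are illustrative and do not appear in the table. The three rows used below are taken from Table 2.

```python
def gene_span(end5: int, end3: int) -> tuple:
    """Return (length in bp, strand label) for a gene from its 5' and 3' coordinates.

    Assumption: coordinates are 1-based and inclusive, and end5 > end3
    means the gene lies on the complementary strand.
    """
    length = abs(end3 - end5) + 1
    strand = "+" if end5 < end3 else "-"
    return length, strand


# Rows copied from Table 2: (UID, end5, end3, listed gene_len)
rows = [
    ("MG016", 19253, 19756, 504),
    ("MG076", 102870, 102457, 414),
    ("MG415", 520238, 519929, 310),
]

for uid, end5, end3, listed in rows:
    length, strand = gene_span(end5, end3)
    print(uid, length, strand, "matches table" if length == listed else "differs")
```

For the rows checked here, the computed span agrees with the listed gene_len.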

[0281]

TABLE 3 Whole Genome Sequencing Strategy
Stage: Description
Random small insert and large insert library construction: Randomly shear genomic DNA on the order of 2 kb and 15-20 kb, respectively
Library plating: Maximize random selection of small insert and large insert clones for template production
High-throughput DNA sequencing: Sequence xxx,xxxx templates from both ends (>99% genome coverage)
Assembly (TIGR Assembler, GRASTA): Assembly of sequence fragments into contigs
Gap closure
a. Physical gaps: Order all contigs into a circular genome and provide templates for closure of all physical gaps
b. Sequence gaps: Complete the genome by primer walking
Editing: Visual inspection and resolution of all sequence ambiguities when possible, including frameshifts
Annotation: Identification and description of all ORF's, putative identification, role assignments

[0282]

TABLE 4 Computer simulation of random sequencing experiments where L = 580,000 and w = 400.
Clones sequenced (n)  Percent of genome unsequenced  Base pairs unsequenced  Number of double strand gaps  Average gap length (bp)
1000                  50.18                          291014                  501                           580
2000                  25.18                          146016                  503                           289
4000                  6.34                           36759                   253                           145
6000                  1.60                           9254                    97                            96
7250                  0.67                           3886                    48                            80
8000                  0.40                           2330                    32                            72
10000                 0.10                           586                     10                            59
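The figures in Table 4 can be approximated with a simple Poisson model of random sequencing, sketched below. This is an assumption: the section does not state how the simulation was performed, and the function name `random_coverage` is illustrative. Under the model, the probability that a base is unsequenced after n clones of read length w on a genome of length L is exp(-nw/L); the expected number of gaps is n times that probability, and the average gap length is roughly L/n.

```python
import math


def random_coverage(n: int, genome_len: int = 580_000, read_len: int = 400) -> dict:
    """Poisson estimate of coverage after n random reads (assumed model)."""
    p0 = math.exp(-n * read_len / genome_len)  # probability a base is unsequenced
    return {
        "percent_unsequenced": 100 * p0,
        "bp_unsequenced": genome_len * p0,
        "gaps": n * p0,
        "avg_gap_bp": genome_len / n,
    }


# Same clone counts as the rows of Table 4
for n in (1000, 2000, 4000, 6000, 7250, 8000, 10000):
    r = random_coverage(n)
    print(
        n,
        f"{r['percent_unsequenced']:.2f}",
        int(r["bp_unsequenced"]),
        int(r["gaps"]),
        int(r["avg_gap_bp"]),
    )
```

The model tracks the tabulated base pairs unsequenced and gap counts closely; a few tabulated average gap lengths differ from L/n by one base pair, so the sketch should be read as an approximation of the table and not as the method that produced it.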

[0283]

TABLE 5 Mycoplasma genitalium - EcoRI fragments
5' Enzyme  Start Res  3' Enzyme  End Res  Length  MW
EcoRI      572231     EcoRI      1530     9367    5763365
EcoRI      1531       EcoRI      6723     5193    3195384
EcoRI      6724       EcoRI      15283    8560    5266795
EcoRI      15284      EcoRI      25781    10498   6459359
EcoRI      25782      EcoRI      35532    9751    5999831
EcoRI      35533      EcoRI      39821    4289    2639037
EcoRI      39822      EcoRI      43179    3358    2066196
EcoRI      43180      EcoRI      43707    528     324906
EcoRI      43708      EcoRI      49410    5703    3509174
EcoRI      49411      EcoRI      62708    13298   8182420
EcoRI      62709      EcoRI      71387    8679    5340230
EcoRI      71388      EcoRI      80769    9382    5772840
EcoRI      80770      EcoRI      84845    4076    2507946
EcoRI      84846      EcoRI      89622    4777    2939580
EcoRI      89623      EcoRI      93383    3761    2314332
EcoRI      93384      EcoRI      94573    1190    732268
EcoRI      94574      EcoRI      102229   7656    4710994
EcoRI      102230     EcoRI      107347   5118    3149292
EcoRI      107348     EcoRI      110797   3450    2122895
EcoRI      110798     EcoRI      114909   4112    2530290
EcoRI      114910     EcoRI      116440   1531    942140
EcoRI      116441     EcoRI      137514   21074   12967294
EcoRI      137515     EcoRI      144092   6578    4047534
EcoRI      144093     EcoRI      155336   11244   6918646
EcoRI      155337     EcoRI      162136   6800    4184109
EcoRI      162137     EcoRI      163907   1771    1089750
EcoRI      163908     EcoRI      169816   5909    3636217
EcoRI      169817     EcoRI      171885   2069    1273325
EcoRI      171886     EcoRI      176630   4745    2920129
EcoRI      176631     EcoRI      221880   45250   27844584
EcoRI      221881     EcoRI      225692   3812    2345923
EcoRI      225693     EcoRI      228254   2562    1576700
EcoRI      228255     EcoRI      277826   49572   30503951
EcoRI      277827     EcoRI      282740   4914    3023818
EcoRI      282741     EcoRI      285470   2730    1679928
EcoRI      285471     EcoRI      292152   6682    4111409
EcoRI      292153     EcoRI      293879   1727    1062607
EcoRI      293880     EcoRI      312725   18846   11596154
EcoRI      312726     EcoRI      347231   34506   21232617
EcoRI      347232     EcoRI      352330   5099    3137714
EcoRI      352331     EcoRI      362310   9980    6140434
EcoRI      362311     EcoRI      377990   15680   9648201
EcoRI      377991     EcoRI      390080   12090   7439090
EcoRI      390081     EcoRI      402043   11963   7361170
EcoRI      402044     EcoRI      408452   6409    3943775
EcoRI      408453     EcoRI      419230   10778   6631662
EcoRI      419231     EcoRI      422653   3423    2106066
EcoRI      422654     EcoRI      425383   2730    1679735
EcoRI      425384     EcoRI      426391   1008    620235
EcoRI      426392     EcoRI      439467   13076   8046286
EcoRI      439468     EcoRI      444297   4830    2971763
EcoRI      444298     EcoRI      444940   643     395631
EcoRI      444941     EcoRI      452525   7585    4667018
EcoRI      452526     EcoRI      455595   3070    1888976
EcoRI      455596     EcoRI      461533   5938    3653550
EcoRI      461534     EcoRI      467016   5483    3373523
EcoRI      467017     EcoRI      483871   16855   10370549
EcoRI      483872     EcoRI      487269   3398    2090889
EcoRI      487270     EcoRI      488085   816     502090
EcoRI      488086     EcoRI      488496   411     252914
EcoRI      488497     EcoRI      498574   10078   6201025
EcoRI      498575     EcoRI      499113   539     331666
EcoRI      499114     EcoRI      516146   17033   10480304
EcoRI      516147     EcoRI      524998   8852    5446303
EcoRI      524999     EcoRI      527362   2364    1454583
EcoRI      527363     EcoRI      529777   2415    1485826
EcoRI      529778     EcoRI      530256   479     294749
EcoRI      530257     EcoRI      531045   789     485489
EcoRI      531046     EcoRI      533591   2546    1566584
EcoRI      533592     EcoRI      549000   15409   9480966
EcoRI      549001     EcoRI      550638   1638    1007852
EcoRI      550639     EcoRI      563713   13075   8045103
EcoRI      563714     EcoRI      566925   3212    1976345
EcoRI      566926     EcoRI      572230   5305    3264227
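The Length column of Table 5 can be reproduced from the Start Res and End Res columns, as the sketch below illustrates. It assumes 1-based inclusive coordinates on a circular genome, so that a fragment whose end is numerically smaller than its start wraps through the origin. The genome length of 580,067 used here is only the value implied by the first row of the table under that assumption; it is not stated in this section. The function name `fragment_length` is illustrative.

```python
def fragment_length(start: int, end: int, genome_len: int) -> int:
    """Length in bp of a restriction fragment on a circular genome.

    Assumption: coordinates are 1-based and inclusive; when end < start
    the fragment wraps around the origin.
    """
    if end >= start:
        return end - start + 1
    return (genome_len - start + 1) + end


# Assumed genome length, inferred from the wrap-around row of Table 5.
GENOME_LEN = 580_067

# Rows copied from Table 5: (start, end, listed length)
rows = [
    (572231, 1530, 9367),    # wraps through the origin
    (1531, 6723, 5193),
    (566926, 572230, 5305),
]

for start, end, listed in rows:
    computed = fragment_length(start, end, GENOME_LEN)
    print(start, end, computed, "matches table" if computed == listed else "differs")
```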

[0284]

9 MG# Identification MatchAcc % ID Length MG# Identification MatchAcc % ID Length *MG394 Uridine Kinase (udk) (Escherichia coli) SP: P31218 34.5 204 *MG390 arginyl-tRNA synthetase (argS) (Corynebacterium glutamicum) SP: P35868 33.6 431 Purine ribonucleotide biosynthesis *MG114 asparaginyl-tRNA synthetase (asnS) (Escherichia coli) GP: M33145_1 41.5 449 *MG107 5' guanylate kinase (gmk) (Escherichia coli) GP: L10328_14 42.6 183 *MG036 aspartyl-tRNA synthetase (aspS) (Thermus aqusticus) SP: P36419 40.9 563 *MG175 adenylate kinase (adk) (Bacillus stearothermophilus) GP: M88104_2 32.2 210 *MG258 cysteinyl-tRNA synthetase (cryS) (Bacillus subtilis) GP: D26185_158 34.3 437 *MG058 phosphoribosyloyrophospha- te synthetase (prs) GP: D26185_114 44.4 310 *MG474 glutamyl-tRNA synthetase (gtiX) (Bacillus stearothermophilus) GP: M55072_1 42.9 480 (Bacillus subtilis) Pyrimidine nibonucleotide biosynthesis *MG256 glycyl-tRNA synthetase (Bombyx mori) GP: L06106_1 35.9 574 Salvage of nucleosides and nucleotides *MG035 histidyl-tRNA synthetase (hisS) (Mycobacterium leprae) GP: U00011_2 30.7 386 *MG284 adenine phosphoribosyltransferase (apt) GP: M14040_1 34.1 153 *MG357 Isoleucyl-tRNA synthetase (ileS) (Escherichia coli) SP: P00958 33.3 921 (Escherichia coli) *MG052 cytidine deaminase (cdd) (Mycoplasma pirum) GP: L13289_4 38.2 121 *MG274 leucyl-tRNA synthetase (leuS) (Bacillus stearothermophilus) GP: M88581_1 43.4 799 *MG340 cytidylate kinase (cmk) (Bacillus subtilis) SP: P38493 40.4 215 *MG137 lysyl-tRNA synthetase (lysS) (Bacillus subtilis) GP: D26185_144 45.6 490 MG276 deoxyguanosine/deoxyadenosine kinase(I) subunit 2 GP: U01881_2 29.5 164 *MG377 methlonyl-tRNa lormytransferase (lmt) (Escherichia coli) GP: X63668_2 24.1 304 (Lactobacillus acrdophilus) *MG021 methlonyl-tRNA synthetase (metS) (Bacillus subtilis) GP: D26185_101 37.5 515 *MG470 hypoxanthine-guanine phosphorlbosyltransferase SP: O02522 38.4 170 *MG085 peptidyl-tRNA hyorolase homolog (ptn) (Borrelia burgdorieri) GP: 
L32144_1 38.2 154 (hpt) (Lactococcus lactus) *MG201 phenylalanyl-tRNA synthetase beta chain (pheT) (Bacillus subtilis) SP: P17922 26.0 677 *MG048 punne-nucleoside phosphorylase (deoD) GP: U14003_295 44.3 228 *MG200 phenylalanyl-tRNA synthetase beta-subunit (pheS) (Escherichia coli) GP: V00291_5 35.1 320 (Escherichia coli) *MG034 thymidine kinase (Bacillus subtilis) GP: M97678_5 48.1 187 *MG292 prolyl-tRNA synthetase (proS) (Escherichia coli) GP: M97858_1 22.7 438 MG051 thymidine phosphorylase (deoA) GP: L13289_3 52.7 416 *MG005 Seryl-tRNA synthetase (serS) (Bacillus subtilis) GP: D26185_77 42.6 416 (Mycoplasma plrum) *MG030 uracil phosphorlbosyltransferase (upp) GP: Z27121_3 44.9 206 *MG387 threonyl-tRNA synthetase (thrSv) (Bacillus subtills) GP: M36594_1 38.7 558 (Mycoplasma hominis) Sugar-nucleotide biosynthesis and conversions *MG457 tRNA (guanine-N1)-methyltransferase (trmD) (Salmonelia SP: P36245 40.8 223 *MG119 UDP-glucose 4-epinerase (galE) (Escherichia coli) SP: P09147 34.1 322 typhimurium) *MG465 UDP-glucose pyrophosphorylase (gtaB) GP: L12272_1 48.0 277 *MG127 tryptophanyl-tRNA synthetase (trpS) (Bacillus subtills) GP: M24068_1 41.2 324 (Bacillus subtilis) *MG466 tyrosyl tRNA synthetase (tyrS) (Bacillus stearothermophilus) GP: M77668_1 38.5 418 Regulatory functions *MG344 valyl-tRNA synthetase (valS) (Bacillus subtilis) SP: O06873 38.5 857 *MG396 GTP-binding protein (obg) (Bacillus subtilis) GP: M24537_2 39.6 426 Degradation of proteins, peptides, and glycopeptides *MG399 GTP-binding protein era homolog (spg) SP: P37214 27.4 273 *MG334 aminopeptidase P (pepP) (Escherichia coli) GP: D00398_1 30.5 254 (Streptococcus mulans) *MG460 pilB homolog transcription repressor GP: Z33052_1 53.5 128 *MG403 aminopeptidase (Mycoplasma salivarium) GP: D17450_1 44.6 303 (Mycoplasma capricolum) *MG420 PILB protein MOTIF (Neisseria gonorrhoeae) SP: P14930 49.2 127 *MG244 ATP-dependent protease (lon) (Bacillus subtilis) SP: P37945 43.6 753 *MG105 virulence associated protein 
homolog (vacB) GP: U14003_91 29.2 560 *MG367 ATP-dependent protease binding subunit (ctpB) (Escherichia coli) GP: M29364_2 47.7 709 (Escherichia coli) *MG067 glutamic acid specific protease prepropetide (Staphylococcus GP: D00730_1 28.8 250 Replication aureus) Degradation of DNA MG224 IgA1 protease (Haemophilus Influenzae) GP: M87491_1 32.2 675 MG032 ATP-dependent nuclease (addA) (Bacillus subtilis) GP: M63489_1 26.8 706 MG186 oligoendopeptidase F (pepF) (Lactococcus lactis) GP: Z32522_1 30.0 442 MG240 endonuclease IV (nfo) (Escherichia coli) SP: P12638 29.4 267 MG321 proline iminopeptidase (pip) (Bacillus coagulans) GP: D11037_1 29.2 209 DNA replication, restinction, modification, recombination, and repair MG020 proline iminopeptidase (pip) (Neisseria gonorrhoaeae) GP: Z25461_2 37.5 281 *MG481 chromosomal replication initiator protein (dnaA) SP: P34028 30.9 432 *MG046 sialoglycoprotease (gcp) (Pasteurella haemolytica) GP: M62384_1 36.4 313 (Spiroplasma citri) *MG210 DNA gyrase subunit A (Mycoplasma genitalium) GP: U09251_4 37.4 782 Nucleoproteins *MG004 DNA gyrase subunit A (Mycoplasma genitalium) GP: U09251_4 99.9 835 Protein modification and translation factors *MG003 DNA gyrase subunit B (gyrB) GP: U09251_3 99.2 645 *MG090 elongation factor G (fus) (Thermus aquaticus) SP: P13551 59.2 683 (Mycoplasma genitalium) *MG249 DNA helicase II (mutB1) (Haernophilus influenzae) GP: M99049_1 36.0 715 *MG026 elongation factor P (efp) (Escherichia coli) GP: U14003_62 26.4 162 *MG259 DNA ligase (lig) (Escherichia coli) GP: M24278_1 38.2 657 *MG445 elongation factor Ts (lsf) (Spiroplasma citri) GP: M31161_2 39.1 294 *MG269 DNA polymerase I (poll) MOTIF GP: L11920_1 29.9 837 *MG463 elongation factor TU (luf) (Mycoplasma genitalium) SP: P13927 100.0 383 (Mycobacterium tuberculosis) *MG031 DNA polymerase III (polC) (Mycoplasma pulmonis) GP: U06833_1 38.1 1352 *MG176 methionine amino peptidase (Bacillus subtilis) GP: D00619_5 36.3 245 *MG001 DNA polymerase III beta subunit (dnaN) 
GP: U09251_1 100.0 97 *MG263 peptide chain release factor I (RF-1) (Escherichia coli) GP: M11519_1 43.2 320 (Mycoplasma genitalium) MG007 DNA polymerase III subunit (dnaH) MOTIF GP: D26185_83 22.7 142 *MG108 polypeptide delormylase (lormylmethionine delormylase) (def) SP: P27251 36.9 107 (Bacillus subtilis) *MG432 DNA polymerase III subunit (dnaH) GP: D26185_83 49.1 224 MOTIF (Escherichia coli) (Bacillus subtilis) *MG268 DNA polymerase III, alpha chain (dnaE) GP: M19334_4 31.9 843 MG109 protein phosphatase 2C homolog (ptc1) MOTIF (Saccharomyces SP: P35182 27.5 141 *MG010 DNA primase (dnaE) MOTIF (Clostridium SP: P33655 25.7 174 cerevisiae) acatobutylicum) (Escherichia coli) *MG255 DNA primase (dnaE) (Bacillus subtilis) GP: M10040_1 27.3 587 MG110 protein serine/threonine kinase MOTIF (Arabidopsis thaliana) PIR: S36944 33.7 242 *MG123 DNA topolsomerase I (topA) (Bacillus subtilis) GP: L27797_2 38.9 658 *MG146 protein synthesis initiation factor 2 (inIB) (Bacillus subtilis) GP: M34836_1 48.0 619 *MG433 excinuclease ABC subunit A (uvrA) SP: P07671 47.8 842 *MG447 ribosome releasing factor (irr) (Escherichia coli) GP: D26552_57 34.9 169 (Escherichia coli) *MG075 excinuclease ABC subunit B (uvrB) SP: P07025 48.0 662 *MG291 transcription elongation factor (greA) (Rickettsia prowazekll) SP: P27640 40.1 135 (Escherichia coli) *MG270 formamidopyrimidine-DNA glycosylase (tpg) SP: P19210 37.6 272 *MG202 translation initiation factor IF3 (inIC) Bacillus stearothermophilus) GP: X16188_1 31.3 133 (Bacillus firmus) *MG391 glucose inhibited division protein (gidA) GP: L10328_106 40.3 600 Ribosomal proteins synthesis end modification (Escherichia coli) *MG392 glucose inhibited division protein (gidB) GP: L10328_105 24.8 143 *MG084 ribosomal protein L1 (rpL1) (Bacillus stearothermophilis) SP: P04447 48.2 221 (Escherichia coli) *MG370 Holliday junction DNA helicase (ruvA) GP: M21298_1 26.2 153 *MG373 ribosomal protein L10 (rpL10) (Thermologa maritime) SP: P29394 29.8 162 (Escherichia 
coli) *MG371 Holliday junction DNA helicase (ruvB) GP: M21298_2 34.7 297 *MG083 ribosomal protein L11 (RPL11) (Thermologa maritime) SP: P29395 51.8 140 (Escherichia coli) MG187 methyltransferase (ssoIM) GP: M97479_2 42.5 314 *MG430 ribosomal protein L13 (Escherichia coli) SP: P02410 39.9 137 (Shigella sonnei) *MG349 recombination protein (recA) GP: L25893_1 46.6 292 *MG165 ribosomal protein L14 (rpL14) (Bacillus stearothermophilus) SP: P04450 63.1 121 (Staphylococcus aureus) *MG095 replicative DNA helicase (dnaB) SP: P03005 33.1 439 *MG173 ribosomal protein L15 (rpL15) (Mycoplasma capricolum) SP: P10138 41.9 144 (Escherichia coli) MG450 restriction-modification enzyme EcoD GP: J01631_1 24.6 390 *MG162 ribosomal protein L16 (rpL16) (Mycoplasma capricolum) SP: P02415 63.5 136 specificity subunit (nsdS) (Escherichia coli) *MG161 ribosomal protein L17 (rpL17) (Bacillus subtilis) GP: M26414_6 34.8 115 *MG047 S-adenosylmethlonine synthetase 2 (metX) SP: P30869 43.8 363 *MG171 ribosomal protein L18 (rpL18) (Bacillus stearothermophilus) GP: M57624_1 43.0 113 (Escherichia coli) *MG092 single-stranded DNA binding protein (ssb) GP: U04997_2 21.8 162 *MG456 ribosomal protein L19 (rpL19) (Bacillus stearothermophilus) SP: P30529 49.1 111 (Haemophilus Influenzae) *MG209 topoisomerase II subunit B (topIIB) GP: L35044_2 52.4 630 *MG158 ribosomal protein L2 (rpL2) (Bacillus stearothermophilus) SP: P04257 58.4 273 (Mycoplasma gallisepticum) *MG098 uracil DNA glycosylase (ung) (Escherichia coli) GP: D13169_3 32.6 217 *MG238 ribosomal protein L21 (rpL21) (Bacillus subtilis) SP: P26908 37.9 98 *MG160 ribosomal protein L22 (rpL22) (Mycoplasma-like organism) GP: M74770_4 49.0 103 Transcription *MG157 ribosomal protein L23 (Bacillus stearothermophilus) SP: P04454 38.7 89 Degradation of RNA *MG166 ribosomal protein L24 (Bacillus stearothermophilus) SP: P04455 44.6 83 *MG379 ribonuclease III (rnc) (Escherichia coli) GP: X02673_1 30.2 118 *MG239 ribosomal protein L27 (rpL27) (Bacillus 
subtilis) GP: K02665_2 64.4 88 *MG477 RNaseP C5 subunit (Mycoplasma capricolum) GP: D14982_2 40.0 78 *MG163 ribosomal protein L29 (Thermologa mantima) SP: P38514 41.7 59 RNA synthesis, modification, and DNA transcription *MG155 ribosomal protein L3 (rpL3) (Mycoplasma capricolum) SP: P10134 42.6 213 *MG319 ATP-dependent RNA helicase (deaD) SP: P23304 23.1 369 *MG335 ribosomal protein L33 (Bacillus stearothermophilus) SP: P23375 58.1 42 (Escherichia coli) *MG437 ATP-dependent RNA helicase (deaD) SP: P23304 32.4 390 *MG478 ribosomal protein L34 (rpL34) (Escherichia coli) GP: L10328_67 67.4 45 (Escherichia coli) *MG352 DNA-directed RNA polymerase beta' chain (rpoC) SP: P00577 44.5 1348 *MG156 ribosomal protein L4 (rpL4) (Bacillus stearothermophilus) SP: P28601 39.2 205 (Escherichia coli) *MG018 helicase (mol1) MOTIF (Saccharomyces cerevisiae) SP: P32333 36.5 502 *MG167 ribosomal protein L5 (rpL5) (Bacillus stearothermophilus) SP: P08895 57.5 178 *MG145 N-utilization substance protein A homolog (nusA) SP: P32727 30.9 360 *MG170 ribosomal protein L6 (rpL6) (Mycoplasma capricolum) SP: P04446 46.4 179 (Bacillus subtilis) *MG180 RNA polymerase alpha-core-subunit (rpoA) GP: M26414_5 39.4 295 *MG374 ribosomal protein L7/L12 (`A`type) (rpL7/L12)(Bacillus subtilis) SP: P02394 47.5 118 (Bacillus subtilis) *MG353 RNA polymerase beta-subunit (rpoB) GP: L24376_3 46.5 1144 *MG094 ribosomal protein L9 (rpL9) (Bacillus stearothermophilus) GP: M57623_1 32.9 148 (Bacillus subtilis) MG022 RNA polymerase delta-subunit (rpoE) GP: M21677_1 28.7 152 *MG154 ribosomal protein S10 (rpS10) (Thermologa maritime) SP: P38518 48.9 91 (Bacillus subtilis) *MG254 RNA polymerase sigma-A factor (sigA) SP: P33656 43.7 370 *MG179 ribosomal protein S11 (rpS11) (Escherichia coli) GP: X02543_2 47.8 112 (Clostridium acetobutylicum) *MG054 transcription antitermination factor (nusG) GP: D13303_4 30.9 171 *MG088 ribosomal protein S12 (rpS12) (Bacillus stearothermophilus) SP: P09901 75.4 133 (Bacillus subtilis) 
*MG178 ribosomal protein S13 (rpS13) (Bacillus subtilis) GP: M26414_3 63.3 119 Translation *MG168 ribosomal protein S14 (Mycoplasma capricolum) GP: X06414_15 70.0 59 Amino acyl tRNA synthetases and tRNA modification *MG438 ribosomal protein S15 (BS18) (Bacillus stearothermophilus) SP: P05768 48.1 80 *MG303 alanyl-tRNA-synthetase (alaS) (Escherichia coli) GP: J01581_1 33.8 795 *MG458 ribosomal protein S16 (BS17) (Bacillus subtilis) SP: P21474 48.8 81 Amino acid biosynthesis Central Intermediary metabolism Aromatic amino acid family Amino sugars Aspartate family Degradation of polysaccharides Branched chain family *MG222 bifunctional endo-1,4-beta-xylanase xyla precursor MOTIF SP: P29126 37.6 240 Glutamate family (Ruminococcus flavefaciens) Pyruvate family Other Sarine family *MG369 acetate kinase (Bacillus subtilis) GP: L17320_2 42.7 391 *MG406 serine hydroxymethyltransferase (glyA) SP: P06192 55.3 397 *MG038 glycerol kinase (glpK) (Escherchia coli) GP: L19201_68 46.8 498 (Salmonella typhimurium) *MG304 glycerophosphoryl diester phosphodiesterase (glpO) (Bacillus subtilis) SP: P37965 30.4 235 Biosynthesis of cofactors, prosthetic groups, and carriers *MG310 phosphotransacetylase (Closindium acetobutylicum) SP: P39648 44.7 320 Biotin Phosphorus compounds Folic acid *MG363 Inorganic pyrophosphalase (ppa) (Thermoplasma acidophilum) SP: P37981 38.9 156 *MG013 5,10-methylene-tetrahydrofolate dehydrogenase GP: D10588_1 33.0 238 Polyamine biosynthesis (foID) (Escherichia coli) Polysaccharides --(cytoplasmic) *MG234 dihydrofolate reductase GP: X60681_1 33.1 166 Sulfur metabolism (Lactococcus lactis) Hemo and porphyrin *MG264 protoporphyrinogen oxidase (hernK) GP: D28567_2 30.6 160 Energy metabolism (Escherichia coli) Locate Aerobic Menaquinone and ubiquinone *MG039 glycerol-3-phospate dehydrogenase (GUT2) (Saccharomyces PIR: S48379 43.2 212 Molybdopterin cerevisiae) Pantothenate *MG472 L-lactate dehydrogenase (ldh) (Mycoplasma hyopneumoniae) SP: P33572 50.3 312 Pyndoxne 
MG283 NADH oxidase (nox) (Enterococcus faecalis) SP: P37061 39.2 433 Riboflavin Amino acids and amines Thioredoxin, glutaredoxin, and glutathione Anaerobic ATP-proton motive force interconversion Cell envelope *MG410 ATP synthase epsilon chain (atpC) (Mycoplasma gallisepticum) SP: P33255 36.9 129 Membranes, lipoproteins, and porins *MG411 ATP synthase beta chain (atpD) (Mycoplasma gallisepticum) SP: P33253 81.0 377 MG328 fibronectin-binding protein (fnbA) GP: J04151_1 24.6 913 *MG412 ATP synthase gamma chain (atpG) (Mycoplasma gallisepticum) SP: P33257 37.9 285 (Staphylococcus aureus) MG040 membrane lipoprotein (tmpC) (Treponema pallidum) SP: P29724 30.9 248 *MG413 ATP synthase alpha chain (atpA) (Mycoplasma gallisepticum) SP: P33252 63.4 517 *MG087 prolipoprotein diacylglyceryl transferase GP: L13259_2 29.1 261 *MG414 ATP synthase delta chain (atpH) (Mycoplasma gallisepticum) SP: P33254 33.9 168 (Salmonella typhimurium) Murein sacculus and peptidoglycan *MG415 ATP synthase B chain (atpF) (Mycoplasma gallisepticum) SP: P33258 36.6 192 Surface polysaccharides, lipopolysaccharides and antigens *MG416 ATP synthase C chain (atpE) (Mycoplasma gallisepticum) SP: P33258 50.0 77 *MG368 lic-1 operon protein (licA) MOTIF GP: M27280_1 27.8 152 *MG417 adenosinetriphosphatase (atpB) (Mycoplasma gallisepticum) GP: X64256_2 35.7 292 (Haemophilus influenzae) *MG060 lipopolysaccharide biosynthesis protein SP: P26401 36.1 185 Electron transport (rfbV) MOTIF (Salmonella typhimurium) Entner-Doudoroff *MG277 surface protein antigen precursor (pag) GP: D90354_1 25.5 797 Fermentation MOTIF (Streptococcus sobrinus) Gluconeogenesis Surface structures Glycolysis MG196 attachment protein (mgpA) SP: P20796 100.0 1443 *MG063 1-phosphofructokinase (fruK) (Escherichia coli) SP: P23539 26.3 268 (Mycoplasma genitalium) MG190 attachment protein repeat (mgpA) SP: P20796 36.6 903 *MG220 6-phosphofructokinase (phosphofructokinase) (phosphohexokinase) SP: P20275 39.4 321 (Mycoplasma genitalium) MG267 
attachment protein repeat (mgpA) SP: P20796 38.0 963 (Spiroplasma citri) (Mycoplasma genitalium) MG188 attachment protein repeat (mgpA) SP: P20796 61.8 943 *MG419 enolase (Bacillus subtilis) GP: L29475_4 54.1 425 (Mycoplasma

genitalium) MG069 attachment protein repeat (mgpA) SP: P20796 76.4 760 *MG023 fructose-bisphosphate aidolase (tsr) (Bacillus subtilis) GP: M22039_4 46.0 282 (Mycoplasma genitalium) MG189 attachment protein repeat (mgpA) SP: P20796 77.9 763 *MG312 glyceraldehyde-3-phospha- te dehydrogenase (gap) (Clostridium GP: X72219_1 56.1 329 (Mycoplasma genitalium) MG232 attachment protein repeat (mgpA) SP: P20796 78.2 86 pasteurianum) (Mycoplasma genitalium) MG297 attachment protein repeat (mgpA) SP: P20796 80.2 756 *MG112 phosphoglucose Isomerase B (pgiB) (Bacillus stearothermophilus) SP: P13376 34.8 424 (Mycoplasma genitalium) MG141 attachment protein repeat (mgpA) SP: P20796 80.3 753 *MG311 phosphoglycerate kinase (Thermotoga maritima) SP: P36204 51.3 383 (Mycoplasma genitalium) *MG198 attachment protein repeat (mgpA) SP: P20796 81.3 753 MG442 phosphoglycerate mutase (pgm) (Bacillus subtilis) GP: L29475_3 45.2 510 (Mycoplasma genitalium) MG266 attachment protein repeat (mgpA) SP: P20796 82.2 753 *MG221 pyruvate kinase (pyk) (Lactococcus lactis) GP: L07920_2 35.3 467 (Mycoplasma genitalium) MG351 attachment protein repeat (mgpA) SP: P20796 84.3 734 *MG443 triosephosphate Isomerase (tim) Thermotoga maritima) GP: L27492_1 39.8 247 (Mycoplasma genitalium) *MG398 Cylacherence-accessory protein (hmw1) GP: U11381_1 34.1 876 Pentose phosphate pathway (Mycoplasma pneumoniae) MG323 Cylacherence-accessory protein (hmw1) GP: U11381_1 39.3 1015 *MG272 6-phosphogluconate dehydrogenase (gnd) (Escherichia coli) GP: M64324_1 29.9 440 (Mycoplasma pneumoniae) *MG327 Cylacherence-accessory protein (hmw3) GP: M82965_1 41.1 669 *MG066 transketolase 1 (TK 1) (tk1A) (Escherichia coli) SP: P27302 32.8 647 (Mycoplasma pneumoniae) Pyruvate dehydrogenase Cellular processes *MG280 dihydrolipoamide acetyltransferase (pdhC) (Acholeplasma Isidiawi) GP: M81753_3 45.2 524 Cell division *MG279 lipoamide dehydrogenase component (E3) of pyruvate dehydrogenase SP: P11959 38.4 453 *MG469 cell division protein 
(ftsH) (Bacillus subtilis) GP: D26185_132 49.7 627 complex dihydrolipoamide dehydrog *MG308 cell division protein (ftsY) (Escherichia coli) GP: U00039_18 36.1 323 *MG282 pyruvate dehydrogenase E1-alpha subunit (pdhA) (Acholeplasma GP: M81753_1 43.0 341 *MG229 cell division protein (ftsZ) (Staphylococcus aureus) GP: U06462_1 30.9 274 ladiawi) Cell killing *MG281 pyruvate dehydrogenase E1-beta subunit (pdhB) (Acholeplasma GP: M81753_2 55.0 317 *MG150 hemolysin (ftyC) (Serpulina hyodysenterise) GP: X73141_2 26.3 234 ladiawi) MG225 pre-procylotoxin (Helicobacter pylon) GP: Z26883_1 36.1 789 Sugars Chaperones *MG113 D-ribulose-5-phosphate 3 epimerase (ctxEc) (Aicafigenes eutrophus) GP: M64173_3 33.1 175 *MG404 groEL protein (Bacillus stearothermophilus) GP: L10132_2 51.5 524 *MG050 deoxyribose-phosphate aidolase (deoC) (Mycoplasma pneumoniae) GP: X13544_1 83.0 223 *MG206 heat shock protein (dnaJ) MOTIF (Coxiella burnatil) GP: L36455_1 33.6 349 MG408 galactosidase acetyltransferase (Streptococcus mutans) GP: M80797_2 40.3 135 MG002 heat shock protein (dnaJ) MOTIF SP: P35514 40.0 60 *MG053 phosphomannomutase (cpsG) (Mycoplasma pirum) GP: L13289_5 38.6 534 (Lactococcus lactis) *MG019 heat shock protein (dnaJ) (Lactococcus lactis) SP: P35514 34.0 357 TCA cycle *MG207 heat shock protein (grpE) (Bacillus subtilis) GP: M84964_2 31.7 158 *MG405 heat shock protein 60 (GroEL) like protein GP: D17398_1 39.6 87 Fatty acid and phospholipid metabolism (PggroES) (Porphyromonas gingivalis) *MG217 1-acyl-sn-glycerol-3-pho- sphate acetyltransferase (ptsC) (Borrella GP: L32881_1 32.1 119 *MG316 heat shock protein 70 (HSP70) GP: D30690_3 57.3 580 burgdorfen) (Staphylococcus aureus) Detoxification *MG448 CDP-diglyceride synthetase (cdsA) (Escherichia coli) GP: M11330_1 38.0 120 *MG008 thiophene and furan oxidizer (tohF) GP: D26185_60 31.9 456 MG380 fatty acid phosphol-pid synthesis protein (ptsX) (Escherichia coli) GP: M96793_1 29.0 327 (Bacillus subtilis) Protein and peptide secretion 
MG086 hydroxymethylglutaryl-CoA reductase (NADPH) PIR: S24760 23.3 502 (Nicotiana sylvestris) *MG139 GTP-binding membrane protein (lepA) GP: K00426_1 47.5 589 *MG115 phosphatidylglycerophosphate synthase (pgsA) (Escherichia coli) GP: M12299_2 29.3 156 (Escherichia coli) *MG182 haemolysin secretion ATP-binding protein SP: P11599 34.6 236 (hlyB) MOTIF (Proteus vulgaris) Purines, Pyrimidines, nucleosides, and nucleotides *MG074 preprotein translocase (secA) (Bacillus sutilis) GP: D10279_2 43.7 764 2'-Deoxyribonucleotide metabolism *MG174 preprotein translocase secY subunit (SecY) SP: P10250 38.8 449 *MG237 ribonucleoside-diphosphate reductase (nrdE) (Salmonella GP: X73226_1 54.1 703 (Mycoplasma capricolum) MG215 prolipoprotein signal peptidase (lsp) GP: M83994_1 32.4 145 typhimurium) (Staphylococcus aureus) *MG049 signal recognition particle protein (lfh) SP: P37105 43.0 439 *MG235 ribonucleotide reductase 2 (nrdF) (Salmonella typhimurium) SP: P17424 50.0 313 (Bacillus subtilis) *MG243 trigger factor (tig) Escherichia coli) GP: M34066_1 24.6 391 *MG125 thioredoxin (trx) (Bacillus subtilis) GP: J03294_1 36.1 98 Transformation *MG103 thioredoxin reductase (trxB) (Escherichia coli) GP: J03762_1 38.6 299 MG326 competence locus E (comE3) MOTIF GP: L15202_4 30.5 239 *MG233 thymidylate synthase (thyA) (Staphylococcus aureus) SP: P13954 56.6 311 (Bacillus subtilis) Nucleotide and nucleoside interconversions *MG164 ribosomal protein S17 (Mycoplasma capricolum) SP: P10131 51.2 82 *MG309 115 kDa protein (p115) (Mycoplasma hyorhinis) GP: M34958_1 33.4 975 *MG093 ribosomal protein S18 (rpS18) (Escherichia coli) GP: U14003_114 45.5 64 *MG065 heterocysl maturation protein (devA) (Anabaena sp.) GP: X75422_1 35.3 221 *MG159 ribosomal protein S19 (Escherichia coli) GP: X02613_6 58.6 86 *MG479 heterocysl maturation protein (devA) (Anabaena sp.) 
GP: X75422_1 39.9 198 *MG072 ribosomal protein S2 (Spirulina platensis) SP: P34831 34.8 247 MG100 hydrolase (aux2) (Agrobacterium rhizogenes) GP: M61151_1 32.1 458 *MG161 ribosomal protein S3 (rpS3) SP: P02353 46.7 212 *MG223 macrogolgin (Homo sapiens) PIR: S37538 25.3 3055 (Mycoplasma capricolum) *MG322 ribosomal protein S4 (rpS4) (Bacillus subtilis) GP: M59358_1 43.0 197 *MG337 magnesium-chelatase 30 kDa subunit (bchO) (Rhodobacter SP: P26174 26.7 245 *MG172 ribosomal protein S5 (Bacillus stearothermophilus) GP: M57621_1 56.0 157 capsulatus) *MG012 ribosomal protein S6 modification protein (nmK) SP: P17116 31.5 127 *MG315 membrane associated ATPase (cbrO)) (Propionibacterium GP: U13043_1 30.0 227 MOTIF (Escherichia coli) Ireudenrechii) *MG091 ribosomal protein S6 (Escherichia coli) SP: P02358 23.9 87 MG376 mobilization protein (mob13) MOTIF (Leuconostoc cenos) GP: M95954_1 30.9 161 *MG089 ribosomal protein S7 rpS7) SP: P22744 64.9 153 MG372 muc8 protein (muc8) (Salmonella typhimurium) SP: P14303 22.1 331 (Bacillus stearothermophilus) *MG169 ribosomal protein S8 (Mycoplasma capriocolum) SP: P04446 46.9 125 *MG346 nitrogen fixation protein (nifS) (Mycobacterium leprae) GP: U00013_6 26.2 358 *MG429 ribosomal protein S9 (rpS9) SP: P07842 52.0 125 MG296 nodulation protein F (host-specificity of nodulation protein A) SP: P04686 34.9 86 (Bacillus stearothermophilus) (Rhizobium legumnosarum) Transport and binding protein MG299 protein L (Peptostreptococcus magnus) GP: L04466_1 31.1 663 Amino acids, peptides and amines *MG338 protein V (IcrV) (Streptococcus sp.) 
GP: X62467_1 28.3 478 MG231 aromatic amino acid transport protein (aroP) GP: D26562_11 24.6 389 *MG149 protein X (Pseudomonas fluorescens) GP: M35367_1 29.1 280 (Escherichia coli) *MG314 membrane transport protein (gtnQ) GP: M61017_1 32.0 219 MG132 protein X (Spiroplasma citn) GP: M31161_3 21.6 88 (Bacillus stearothermophilus) *MG183 membrane transport protein (gtnQ) GP: M61017_1 37.4 210 *MG288 sensory rhodopsin II transducer (hiril) MOTIF (Natronobacterium GP: Z35088_1 15.7 208 (Bacillus stearothermophilus) *MG081 oligopeptide transport ATP-binding protein SP: P18765 47.9 336 pharaonis) (amiE) (Streptococcus pneumoniae) *MG059 small protein (smpB) (Escherichia coli) GP: D12501_1 32.6 128 *MG082 oligopeptide transport ATP-binding protein (amiF) SP: P18766 46.6 250 (Streptococcus pneumoniae) Hypothetical *MG080 oligopeptide transport system permease protein SP: P26904 33.5 269 MG142 hypothetical 130K protein (P1 operon) MOTIF (Mycoplasma PIR: JS0069 55.4 512 (dciAC) (Bacillus subtilis) pneumoniae *MG079 oligopeptide transport system permease protein SP: P24138 28.1 308 MG199 hypothetical 130K protein (P1 operon) (Mycoplasma pneumoniae) PIR: JS0069 45.2 570 (oppB) (Bacillus subtilis) MG195 hypothetical 28K protein (P1 operon) (Mycoplasma pneumoniae) PIR: JS0068 61.7 239 *MG042 spermidine/putrescine transport ATP-binding protein GP: M64519_1 41.9 262 *MG342 hypothetical protein (GB: D10165_3) (Escherichia coli) GP: D10165_3 28.9 233 (potA) (Escherichia coli) *MG227 hypothetical protein (GB: D10483_63) (Escherichia coli) GP: D10483_63 35.2 304 *MG043 spermidine/putrescine transport system permease GP: M64519_2 26.5 221 *MG476 hypothetical protein (GB: D14982_3) (Mycoplasma capricolum) GP: D14982_3 32.0 377 protein (potB) (Escherichia coli) MG455 hypothetical protein (GB: D16311_1) (Bacillus subtilis) GP: D16311_1 26.2 267 *MG044 spermidine/putrescine transport system permease GP: M64519_3 29.5 252 MG383 hypothetical protein (GB: D26185_10) (Bacillus subtilis) GP: 
D26185_10 25.8 221 protein (potC) (Escherichia coli) *MG009 hypothetical protein (GB: D26185_102) (Bacillus subtilis) GP: D26185_102 35.4 249 Anions MG057 hypothetical protein (GB: D26185_104) (Bacillus subtilis) GP: D26185_104 28.9 175 *MG422 peripheral membrane protein B (pstB) GP: L10328_89 50.8 244 *MG024 hypothetical protein (GB: D26185_50) (Bacillus subtilis) GP: D26185_50 51.1 363 (Escherichia coli) MG421 peripheral membrane protein U (Escherichia coli) GP: L10328_88 27.0 169 *MG006 hypothetical protein (GB: D26185_92) (Bacillus subtilis) GP: D26185_92 41.5 178 *MG423 periplasmic phosphate permease homolog (AG88) GP: X75297_1 30.8 254 *MG056 hypothetical protein (GB: D26185_99) (Bacillus subtilis) GP: D26185_99 29.3 275 (Mycobacterium tuberculosis) MG333 hypothetical protein (GB: D37799_6) (Bacillus subtilis) GP: D37799_6 27.6 211 Carbohydrates, organic alcohols, and acids MG459 hypothetical protein (GB: L08897_1) (Mycoplasma gallisepticum) GP: L08897_1 34.1 138 *MG192 ATP-binding protein (msmK) GP: M77351_7 40.5 357 MG218 hypothetical protein (GB: L09228_16) (Bacillus subtilis) GP: L09228_16 27.1 238 (Streptococcus mutans) *MG062 fructose-permease IIBC component (truA) SP: P20966 42.7 416 MG219 hypothetical protein (GB: L09228_17) (Bacillus subtilis) GP: L10228_17 34.9 174 (Escherichia coli) *MG033 glycerol uptake facilitator (glpF) GP: M99611_2 35.9 189 *MG273 hypothetical protein (GB: L10328_61) (Escherichia coli) GP: L10328_61 27.2 267 (Bacillus subtilis) MG061 hexosephosphate transport protein (uhpT) GP: M89480_4 30.9 158 *MG271 hypothetical protein (GB: L10328_61) (Escherichia coli) GP: L10328_61 27.8 250 (Salmonella typhimurium) *MG193 membrane protein (msmF) (Streptococcus mutans) GP: M77351_4 22.5 263 MG126 hypothetical protein (GB: L10328_61) (Escherichia coli) GP: L10328_61 31.9 252 MG194 membrane protein (msmG) GP: M77351_5 27.1 272 MG140 hypothetical protein (GB: L18927_2) (Buchnera aphidicola) GP: L18927_2 28.6 68 *MG120 methylgalactoside 
permease ATP-binding protein GP: M59444_2 33.2 487 MG152 hypothetical protein (GB: L18965_6) (Thermophilic bacterial sp.) GP: L18965_6 25.3 170 (mglA) (Escherichia coli) MG305 hypothetical protein (GB: L9201_18) (Escherichia coli) GP: L19201_18 23.1 328 *MG441 PEP-dependent HPr protein kinase GP: M69050_2 46.5 570 MG029 hypothetical protein (GB: L19300_1) (Staphylococcus aureus) GP: L19300_1 27.0 109 phosphoryltransferase (ptsI) MG425 hypothetical protein (GB: L22432_4) (Mycoplasma capricolum) GP: L22432_4 25.0 94 (Staphylococcus camosus) MG041 phosphohistidinoprotein-hexose phosphotransferase GP: L22432_2 48.9 86 MG250 hypothetical protein (GB: M12965_1) (Escherichia coli) GP: M12965_1 33.8 64 (ptsH) (Mycoplasma capricolum) *MG135 hypothetical protein (GB: M38777_3) (Escherichia coli) GP: M38777_3 28.6 98 *MG071 phosphotransferase enzyme II, ABC component SP: P20166 43.2 620 *MG358 hypothetical protein (GB: M65289_3) (Bacillus stearothermophilus) GP: M65289_3 38.0 155 (ptsG) (Bacillus subtilis) MG211 hypothetical protein (GB: M84964_1) (Bacillus subtilis) GP: M84964_1 30.7 341 MG130 PTS glucose-specific permease GP: U12340_1 25.5 108 MG124 hypothetical protein (GB: M91593_1) (Mycoplasma mycoides) GP: M91593_1 24.0 249 (Bacillus stearothermophilus) *MG121 ribose transport system permease protein RBSC SP: P36948 27.5 199 MG245 hypothetical protein (GB: M91593_1) (Mycoplasma mycoides) GP: M91593_1 27.8 130 (Bacillus subtilis) Cations MG131 hypothetical protein (GB: M91593_1) (Myocoplasma mycoides) GP: M91593_1 30.7 246 MG073 cation-transporting ATPase (pacL) SP: P37278 34.4 887 *MG400 hypothetical protein (GB: U00016_19) (Mycobacterium leprae) GP: U00016_19 30.9 106 (Synechococcus sp) Nucleosides, purines and pyrimidines *MG129 hypothetical protein (GB: U00021_19) (Mycobacterium leprae) GP: U00021_19 27.7 152 Other *MG454 hypothetical protein (GB: U00021_5) (Mycobacterium leprae) GP: U00021_5 26.9 150 *MG301 ATP-binding protein P29 (Mycoplasma hyorhinis) SP: P15361 
32.3 227 *MG339 hypothetical protein (GB: U00021_5) (Mycobacterium leprae) GP: U00021_5 32.9 430 *MG402 lactococcin transport ATP-binding protein (lcnDR3) SP: P37608 22.3 654 MG364 hypothetical protein (GB: U11883_2) (Bacillus subtilis) GP: U11883_2 33.3 167 (Laciococcus tactis) MG230 hypothetical protein (GB: U14003_71) (Escherichia coli) GP: U14003_71 22.0 481 MG332 Na + ATPase subunit J (ntpJ) (Enterococcus hirae) GP: D17462_11 31.1 436 *MG111 hypothetical protein (GB: U14003_76) (Escherichia coli) GP: U14003_76 28.6 230 MG300 protein P37 precursor (Mycoplasma hyorhinis) SP: P15363 35.8 331 MG473 hypothetical protein (GB: X73124_94) (Bacillus subtilis) GP: X73124_94 40.0 68 *MG014 transport ATP-binding protein (msbA) SP: P27299 28.1 518 MG265 hypothetical protein (GB: Z32651_1) (Mycoplasma pneumoniae) GP: Z32651_1 57.1 41 (Escherichia coli) *MG015 transport ATP-binding protein (msbA) SP: P27299 32.2 482 *MG257 hypothetical protein (GB: Z33076_2) (Mycoplasma capricolum) GP: Z33076_2 37.7 210 (Escherichia coli) *MG418 transport system permease protein P69 MOTIF SP: P15362 40.0 252 *MG147 hypothetical protein (SP: P09170) (Escherichia coli) SP: P09170 24.1 109 (Mycoplasma hyorhinis) *MG128 hypothetical protein (SP: P19434) (Streptomyces vindochromogenes) SP: P19434 26.0 105 MG302 transport system permease protein P69 SP: P15362 27.9 524 *MG226 hypothetical protein (SP: P22186) (Escherichia coli) SP: P22186 28.9 148 (Mycoplasma hyorhinis) *MG382 hypothetical protein (SP: P23851) (Escherichia coli) SP: P23851 27.0 253 Other categories *MG214 hypothetical protein (SP: P23851) (Escherichia coli) SP: P23851 30.5 295 Adaptations and atypical conditions *MG306 hypothetical protein (SP: P25745) (Escherichia coli) SP: P25745 34.7 123 MG467 osmotically inducible protein (osmC) SP: P23929 28.4 88 MG444 hypothetical protein (SP: P27712) (Spiroplasma citri) SP: P27712 28.4 231 (Escherichia coli) MG640 phosphate limitation protein (sphX) GP: D26161_1 30.9 271 *MG252 hypothetical 
protein (SP: P31056) (Escherichia coli) SP: P31058 33.0 180 (Synechococcus sp) MG482 Spo0J regulator MOTIF (Bacillus subtilis) GP: D26185_55 27.5 245 MG116 hypothetical protein (SP: P31131) (Escherichia coli) SP: P31131 32.6 45 *MG285 spore germination apparatus protein (gerBB) GP: L16960_2 31.2 128 *MG359 hypothetical protein (SP: P32049) (Escherichia coli) SP: P32049 28.5 128 MOTIF (Bacillus subtilis) MG480 hypothetical protein (SP: P32049) (Escherichia coli) SP: P32049 28.5 128 MG395 sporulation protein (outB) MOTIF (Bacillus subtilis) GP: M15811_1 36.4 235 *MG133 hypothetical protein (SP: P32083) (Mycoplasma hyorhinis) SP: P32083 30.1 102 Colicin-related functions *MG122 hypothetical protein (SP: P32720) (Escherichia coli) SP: P32720 30.9 132 Drug and analog sensitivity MG138 hypothetical protein (SP: P37747) (Escherichia coli) SP: P37747 34.1 363 *MG475 high level kasugamycin resistance (ksgA)

GP: D26185_105 35.6 224 *MG345 hypothetical protein (SP: P38424) (Bacillus subtilis) SP: P38424 33.9 167 (Bacillus subtilis) Phage-related functions and prophages *MG136 hypothetical protein 4 (Trypanosoma brucei) PIR: E22845 30.8 302 Radiation sensitivity *MG286 stringent response-like protein (rel) (Streptococcus equisimilis) GP: X72832_5 29.1 713 Transposon-related functions MG338 U3 protein (Bacillus subtilis) GP: Z18629_1 27.1 272 Other MG278 $$F protein (Escherichia coli) GP: U14003_297 38.3 302

[0285]

TABLE 7 Summary of gene content in H. influenzae and M. genitalium sorted by functional category

Biological role	H. influenzae	M. genitalium
Amino acid biosynthesis	68 (6.8%)	1 (0.3%)
Biosynthesis of cofactors	54 (5.4%)	3 (0.8%)
Cell envelope	84 (8.3%)	21 (5.8%)
Cellular processes	53 (5.3%)	21 (5.8%)
  Cell division	16	3
  Cell killing	5	2
  Chaperones	6	7
  Detoxification	3	1
  Protein secretion	15	7
  Transformation	8	1
Central intermediary metabolism	30 (3%)	6 (1.7%)
Energy metabolism	112 (10.4%)	31 (8.5%)
  Aerobic	4	3
  Amino acids and amines	4	0
  Anaerobic	24	0
  ATP-proton force interconversion	9	8
  Electron transport	9	0
  Entner-Doudoroff	9	0
  Fermentation	8	0
  Gluconeogenesis	2	0
  Glycolysis	10	10
  Pentose phosphate pathway	3	2
  Pyruvate dehydrogenase	4	4
  Sugars	15	4
  TCA cycle	11	0
Fatty acid and phospholipid metabolism	25 (2.5%)	5 (1.4%)
Purines, pyrimidines, nucleosides and nucleotides	53 (5.3%)	20 (5.4%)
  2'-Deoxyribonucleotide metabolism	8	5
  Nucleotide and nucleoside interconversions	3	1
  Purine ribonucleotide biosynthesis	18	3
  Pyrimidine ribonucleotide biosynthesis	5	0
  Salvage of nucleosides and nucleotides	13	9
  Sugar-nucleotide biosynthesis and conversions	6	2
Regulatory functions	64 (6.3%)	5 (1.4%)
Replication	87 (8.6%)	32 (8.8%)
  Degradation of DNA	8	2
  DNA replication, restriction, modification, recombination and repair	76	30
Transcription	27 (2.7%)	12 (3.3%)
  Degradation of RNA	10	2
  RNA synthesis, modification and DNA transcription	17	10
Translation	141 (14%)	90 (24.7%)
Transport and binding proteins	123 (12.2%)	34 (9.3%)
  Amino acids and peptides	38	10
  Anions	8	3
  Carbohydrates	30	12
  Cations	24	1
  Other transporters	22	8
Other Categories	93 (9.2%)	23 (6.3%)
Unassigned role	736 (43%)	178 (37%)
  No database match	389	117
  Match hypothetical proteins	347	61
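Several categories in Table 7 list sub-category counts beneath a category total, so the figures can be cross-checked by simple addition. The following is a minimal illustrative sketch, not part of the original disclosure: the dictionary layout and the function name `subtotal_mismatches` are hypothetical, and only three categories are transcribed, with counts copied from the table as printed.

```python
# Counts transcribed from Table 7 as (H. influenzae, M. genitalium).
# The structure and names below are illustrative assumptions, not from the source.
TABLE7 = {
    "Cellular processes": {
        "total": (53, 21),
        "subcategories": {
            "Cell division": (16, 3),
            "Cell killing": (5, 2),
            "Chaperones": (6, 7),
            "Detoxification": (3, 1),
            "Protein secretion": (15, 7),
            "Transformation": (8, 1),
        },
    },
    "Energy metabolism": {
        "total": (112, 31),
        "subcategories": {
            "Aerobic": (4, 3),
            "Amino acids and amines": (4, 0),
            "Anaerobic": (24, 0),
            "ATP-proton force interconversion": (9, 8),
            "Electron transport": (9, 0),
            "Entner-Doudoroff": (9, 0),
            "Fermentation": (8, 0),
            "Gluconeogenesis": (2, 0),
            "Glycolysis": (10, 10),
            "Pentose phosphate pathway": (3, 2),
            "Pyruvate dehydrogenase": (4, 4),
            "Sugars": (15, 4),
            "TCA cycle": (11, 0),
        },
    },
    "Replication": {
        "total": (87, 32),
        "subcategories": {
            "Degradation of DNA": (8, 2),
            "DNA replication, restriction, modification, "
            "recombination and repair": (76, 30),
        },
    },
}


def subtotal_mismatches(table):
    """Return {category: (summed, stated)} wherever the sub-category
    counts do not add up to the stated category total."""
    mismatches = {}
    for category, entry in table.items():
        # zip(*rows) regroups the (H. influenzae, M. genitalium) pairs by organism.
        summed = tuple(sum(column) for column in zip(*entry["subcategories"].values()))
        if summed != entry["total"]:
            mismatches[category] = (summed, entry["total"])
    return mismatches


if __name__ == "__main__":
    for category, (summed, stated) in subtotal_mismatches(TABLE7).items():
        print(f"{category}: sub-categories sum to {summed}, table states {stated}")
```

With the counts as transcribed, the Cellular processes and Energy metabolism sub-categories add up to their stated totals for both organisms, while the H. influenzae Replication sub-categories sum to 84 against a stated 87. The sketch only reports such differences; it cannot say whether they originate in the table itself or in its transcription.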

[0286]

* * * * *
