Conserved Nucleotide Elements In Ribosomal RNA Gerbi; Susan A. ; et al. [Brown University]

Patent Application Summary

U.S. patent application number 14/204223, titled "Conserved Nucleotide Elements In Ribosomal RNA," was filed with the patent office on 2014-03-11 and published on 2014-09-18. This patent application is currently assigned to Brown University. The applicant listed for this patent is Brown University. Invention is credited to Julia Beamesderfer, Stephen M. Doris, Susan A. Gerbi, Benjamin J. Raphael, Deborah R. Smith.

Application Number	20140278134 14/204223
Family ID	51531643
Filed Date	2014-03-11

United States Patent Application	20140278134
Kind Code	A1
Gerbi; Susan A. ; et al.	September 18, 2014

Conserved Nucleotide Elements In Ribosomal RNA

Abstract

The present invention relates to a method of determining conserved ribosomal RNA (rRNA) nucleotide motifs that are specific to one domain of life, Eukarya, Bacteria, or Archaea, and degenerate in at least one other domain of life. The invention also relates to a method of determining conserved ribosomal RNA (rRNA) nucleotide motifs that are specific to one subgroup and degenerate in another subgroup within a domain of life or for a subset group within a domain of life. The invention relates to a method of identifying a compound that is a domain-specific or subgroup-specific ribosomal RNA inhibitor.

Inventors:

Gerbi; Susan A.; (Providence, RI) ; Doris; Stephen M.; (Providence, RI) ; Smith; Deborah R.; (Providence, RI) ; Beamesderfer; Julia; (Providence, RI) ; Raphael; Benjamin J.; (Providence, RI)

Applicant:

Name	City	State	Country	Type
Brown University	Providence	RI	US

Assignee:

Brown University
Providence
RI

Family ID:

51531643

Appl. No.:

14/204223

Filed:

March 11, 2014

Related U.S. Patent Documents


Application Number	Filing Date	Patent Number
61798468	Mar 15, 2013

Current U.S. Class:	702/19
Current CPC Class:	G16B 30/00 20190201; G16B 20/00 20190201
Class at Publication:	702/19
International Class:	G06F 19/22 20060101 G06F019/22

Government Interests

GOVERNMENT SUPPORT

[0002] This invention was made with government support under MCB-0718714 and MCB-1120971 awarded by the National Science Foundation. The government has certain rights in the invention.

Claims

1. A method of determining conserved ribosomal RNA (rRNA) nucleotide motifs that are specific to one domain of life and degenerate in at least one other domain of life, comprising the steps of: a) generating a data set of a single copy of full length rRNA sequences, including a greater than or equal to about 70% identity to a sequence of about 15 nucleotides proximate to the 3' end of the small subunit ribosomal RNA or the large ribosomal subunit RNA, for each of the Eukarya, Bacteria or Archaea domains of life or a merger of the domains of life or for a subgroup within a domain of life; b) filtering the data set against at least one representative structural sequence from each of the Eukarya, Bacteria or Archaea domains of life to align all sequences to the representative secondary structure; c) using overlapping windows of at least about 6 nucleotides for each of the Eukarya, Bacteria or Archaea domains of life to obtain rRNA nucleotide sequences that have an informational content score of greater than or equal to about 11 and a nucleotide sequence identity of greater than about 90%, with subsequent merger of the about 6 nucleotide stretches that overlap to generate a collection of rRNA nucleotide motifs in Eukarya (eCNE=conserved nucleotide elements in Eukarya), rRNA nucleotide motifs in Bacteria (bCNE=conserved nucleotide elements in Bacteria), rRNA nucleotide motifs in Archaea (aCNE=conserved nucleotide elements in Archaea), or in any subgroup within a domain of life; and d) determining conserved rRNA nucleotide motifs of at least about 6 nucleotides in length that are specific for one domain of life and degenerate in at least one other domain of life from the collection of rRNA nucleotide motifs in Eukarya (domain-specific d-s eCNE), rRNA nucleotide motifs in Bacteria (domain-specific d-s bCNE), and rRNA nucleotide motifs in Archaea (domain-specific d-s aCNE).

2. The method of claim 1, wherein the representative structural sequence specific for Bacteria is at least one member selected from the group consisting of Escherichia coli and Clostridium ramosum.

3. The method of claim 1, wherein the representative structural sequence specific for Eukarya is at least one member selected from the group consisting of Saccharomyces cerevisiae and Arabidopsis thaliana.

4. The method of claim 1, wherein the representative structural sequence specific for Archaea is at least one member selected from the group consisting of Haloarcula marismortui and Sulfolobus solfataricus.

5. The method of claim 1, wherein the conserved rRNA nucleotide motifs are small ribosomal subunit conserved rRNA nucleotide motifs.

6. The method of claim 1, wherein the conserved rRNA nucleotide motifs are large ribosomal subunit conserved rRNA nucleotide motifs.

7. The method of claim 1, wherein the conserved rRNA nucleotide motifs that are specific to Eukarya (d-s eCNE), Bacteria (d-s bCNE), or Archaea (d-s aCNE) and degenerate to at least one other domain of life have a length of at least one member selected from the group consisting of at least about 6 nucleotides, at least about 8 nucleotides, at least about 10 nucleotides, at least about 15 nucleotides, at least about 20 nucleotides, at least about 25 nucleotides, at least about 30 nucleotides and at least about 35 nucleotides.

8. The method of claim 1, wherein the conserved rRNA nucleotide motif is specific to Bacteria and degenerate in Eukarya.

9. The method of claim 8, wherein the conserved rRNA nucleotide motif that is specific to Bacteria is at least one of AGCACU or UCGCUCAACG.

10. The method of claim 8, wherein the Eukarya is a vertebrate Eukarya.

11. The method of claim 10, wherein the vertebrate Eukarya is a human.

12. The method of claim 8, wherein the Bacteria is gram-positive bacteria.

13. The method of claim 8, wherein the Bacteria is gram-negative bacteria.

14. A method of determining conserved ribosomal RNA (rRNA) nucleotide motifs that are specific to one subgroup and degenerate in at least one other subgroup within Eukarya, comprising the steps of: a) generating a data set of a single copy of full length rRNA sequences, including a greater than or equal to about 70% identity to a sequence of about 15 nucleotides near the 3' end of the small subunit ribosomal RNA or the large ribosomal subunit RNA, for the Eukarya domain of life or for a subset group within a domain of life; b) filtering the data set against at least one representative structural sequence from the subgroup within Eukarya to align all sequences to the representative secondary structure; c) using overlapping windows of at least about 6 nucleotides for each of the subgroups within Eukarya to obtain rRNA nucleotide sequences that have an informational content score of greater than or equal to about 11 and a nucleotide sequence identity of greater than about 90%, with subsequent merger of the about 6 nucleotide stretches that overlap to generate a collection of rRNA nucleotide motifs in the subgroup within Eukarya; and d) determining conserved rRNA nucleotide motifs of at least about 6 nucleotides in length that are specific for one subgroup within Eukarya and degenerate in at least one other subgroup within Eukarya from the collection of rRNA nucleotide motifs in the subgroup within Eukarya.

15. The method of claim 14, wherein the conserved rRNA nucleotide motifs are a small ribosomal subunit conserved rRNA nucleotide motif.

16. The method of claim 14, wherein the conserved rRNA nucleotide motifs are a large ribosomal subunit conserved rRNA nucleotide motif.

17. The method of claim 14, wherein the conserved rRNA nucleotide motif is specific to Protista and degenerate in other Animalia.

18. The method of claim 14, wherein the conserved rRNA nucleotide motif is specific to Fungi and degenerate in other Animalia.

19. The method of claim 14, wherein the conserved rRNA nucleotide motif is specific to Nematodes and degenerate in other Animalia.

20. The method of claim 14, wherein the Animalia is in the Vertebrata subphylum.

21. The method of claim 20, wherein the Vertebrata is a human.

22. The method of claim 14, wherein the conserved rRNA nucleotide motif is specific to a sub-group of Eukarya selected from the group consisting of yeast, protozoa, and worms and is degenerate in other subgroups of Eukarya.

23. A method of determining conserved ribosomal RNA (rRNA) nucleotide motifs that are specific to one subgroup and degenerate in at least one other subgroup within Bacteria, comprising the steps of: a) generating a data set of a single copy of full length rRNA sequences, including a greater than or equal to about 70% identity to a sequence of about 15 nucleotides near the 3' end of the small subunit ribosomal RNA or the large ribosomal subunit RNA, for the Bacteria domain of life or for a subset group within a domain of life; b) filtering the data set against at least one representative structural sequence from the subgroup within Bacteria to align all sequences to the representative secondary structure; c) using overlapping windows of at least about 6 nucleotides for each of the subgroups within Bacteria to obtain rRNA nucleotide sequences that have an informational content score of greater than or equal to about 11 and a nucleotide sequence identity of greater than about 90%, with subsequent merger of the about 6 nucleotide stretches that overlap to generate a collection of rRNA nucleotide motifs in the subgroup within Bacteria; and d) determining conserved rRNA nucleotide motifs of at least about 6 nucleotides in length that are specific for one subgroup within Bacteria and degenerate in at least one other subgroup within Bacteria from the collection of rRNA nucleotide motifs in the subgroup within Bacteria.

24. The method of claim 23, wherein the conserved rRNA nucleotide motifs are a small ribosomal subunit conserved rRNA nucleotide motif.

25. The method of claim 23, wherein the conserved rRNA nucleotide motifs are a large ribosomal subunit conserved rRNA nucleotide motif.

26. The method of claim 23, wherein the conserved rRNA nucleotide motif is specific to pathogenic Bacteria and degenerate in other Bacteria.

27. A method of determining conserved ribosomal RNA (rRNA) nucleotide motifs that are specific to one subgroup and degenerate in at least one other subgroup within Archaea, comprising the steps of: a) generating a data set of a single copy of full length rRNA sequences, including a greater than or equal to about 70% identity to a sequence of about 15 nucleotides near the 3' end of the small subunit ribosomal RNA or the large ribosomal subunit RNA, for the Archaea domain of life or for a subset group within a domain of life; b) filtering the data set against at least one representative structural sequence from the subgroup within Archaea to align all sequences to the representative secondary structure; c) using overlapping windows of at least about 6 nucleotides for each of the subgroups within Archaea to obtain rRNA nucleotide sequences that have an informational content score of greater than or equal to about 11 and a nucleotide sequence identity of greater than about 90%, with subsequent merger of the about 6 nucleotide stretches that overlap to generate a collection of rRNA nucleotide motifs in the subgroup within Archaea; and d) determining conserved rRNA nucleotide motifs of at least about 6 nucleotides in length that are specific for one subgroup within Archaea and degenerate in at least one other subgroup within Archaea from the collection of rRNA nucleotide motifs in the subgroup within Archaea.

28. The method of claim 27, wherein the conserved rRNA nucleotide motifs are a small ribosomal subunit conserved rRNA nucleotide motif.

29. The method of claim 27, wherein the conserved rRNA nucleotide motifs are a large ribosomal subunit conserved rRNA nucleotide motif.

30. The method of claim 27, wherein the conserved rRNA nucleotide motif is specific to pathogenic Archaea and degenerate in other Archaea.

31. A method of identifying a compound that is a domain-specific rRNA inhibitor, comprising the steps of: a) generating a space-filling model of an rRNA nucleotide motif identified using the method of claim 1 and a test compound; and b) determining docking of the test compound to at least one rRNA nucleotide motif identified using the method of claim 1 in the space-filling model, wherein the fitting accuracy based on three-dimensional structure and functional surface of the docking of the test compound to the rRNA nucleotide motif identified using the method of claim 1 identifies a compound that specifically inhibits the domain-specific rRNA nucleotide motif identified using the method of claim 1.

32. The method of claim 31, wherein the domain-specific rRNA nucleotide motif is in the rRNA of Bacteria and not in Eukarya.

33. The method of claim 32, wherein the domain-specific motif is AGCACU (SEQ ID NO: 136) or UCGCUCAACG (SEQ ID NO: 163) in the rRNA of Bacteria and not in Eukarya.

34. The method of claim 31, wherein the domain-specific rRNA nucleotide motif is in rRNA of Archaea and not in Eukarya.

35. The method of claim 31, wherein the domain-specific rRNA nucleotide motif is in the small ribosomal subunit.

36. The method of claim 31, wherein the domain-specific rRNA nucleotide motif is in the large ribosomal subunit.

37. A method of identifying a compound that is a subgroup-specific rRNA inhibitor, comprising the steps of: a) generating a space-filling model of an rRNA nucleotide motif identified using the method of claim 14 and a test compound; and b) determining docking of the test compound to at least one conserved rRNA nucleotide motif that is specific to one domain of life and degenerate in at least one other domain of life in the space-filling model, wherein the fitting accuracy based on three-dimensional structure and functional surface of the docking of the test compound to the rRNA nucleotide motif identified using the method of claim 14 identifies a compound that specifically inhibits the subgroup-specific rRNA nucleotide motif identified using the method of claim 14.

38. The method of claim 37, wherein the subgroup-specific rRNA nucleotide motif is in rRNA of Eukarya.

39. The method of claim 37, wherein the subgroup-specific rRNA nucleotide motif is in rRNA of Bacteria.

40. The method of claim 37, wherein the subgroup-specific rRNA nucleotide motif is in rRNA of Archaea.

41. The method of claim 37, wherein the domain-specific rRNA nucleotide motif is in the small ribosomal subunit.

42. The method of claim 37, wherein the domain-specific rRNA nucleotide motif is in the large ribosomal subunit.

Description

RELATED APPLICATION

[0001] This application claims the benefit of U.S. Provisional Application No. 61/798,468, filed on Mar. 15, 2013. The entire teachings of the above application are incorporated herein by reference.

INCORPORATION BY REFERENCE OF MATERIAL IN ASCII TEXT FILE

[0003] This application incorporates by reference the Sequence Listing contained in the following ASCII text file being submitted concurrently herewith:

[0004] a) File name: 26702024001SeqList.txt; created Mar. 11, 2014, 35 KB in size.

BACKGROUND OF THE INVENTION

[0005] Studies of ribosomal RNA (rRNA) sequence evolution have elucidated deep phylogenetic relationships. However, this powerful approach has not been fully applied to understanding functions of the ribosome itself. Accordingly, a need exists for methods to provide additional insights into aspects of ribosomes. Methods to provide additional insights into aspects of ribosomes can identify drug targets to combat, for example, pathogenic bacteria.

SUMMARY OF THE INVENTION

[0006] The present invention relates to a method of determining conserved ribosomal RNA (rRNA) nucleotide motifs that are specific to one domain of life, Eukarya, Bacteria, or Archaea, and degenerate in at least one other domain of life. The invention also relates to a method of determining conserved ribosomal RNA (rRNA) nucleotide motifs that are specific to one subgroup and degenerate in another subgroup within a domain of life or for a subset group within a domain of life. The invention relates to a method of identifying a compound that is a domain-specific or subgroup-specific ribosomal RNA inhibitor.

[0007] In an embodiment, the invention is a method of determining conserved ribosomal RNA (rRNA) nucleotide motifs that are specific to one domain of life and degenerate in at least one other domain of life, comprising the steps of: a) generating a data set of a single copy of full length rRNA sequences, including a greater than or equal to about 70% identity to a sequence of about 15 nucleotides proximate to the 3' end of the small subunit ribosomal RNA or the large ribosomal subunit RNA, for each of the Eukarya, Bacteria or Archaea domains of life or a merger of the domains of life or for a subgroup within a domain of life; b) filtering the data set against at least one representative structural sequence from each of the Eukarya, Bacteria or Archaea domains of life to align all sequences to the representative secondary structure; c) using overlapping windows of at least about 6 nucleotides for each of the Eukarya, Bacteria or Archaea domains of life to obtain rRNA nucleotide sequences that have an informational content score of greater than or equal to about 11 and a nucleotide sequence identity of greater than about 90%, with subsequent merger of the about 6 nucleotide stretches that overlap to generate a collection of rRNA nucleotide motifs in Eukarya (eCNE=conserved nucleotide elements in Eukarya), rRNA nucleotide motifs in Bacteria (bCNE=conserved nucleotide elements in Bacteria), rRNA nucleotide motifs in Archaea (aCNE=conserved nucleotide elements in Archaea), or in any subgroup within a domain of life; and d) determining conserved rRNA nucleotide motifs of at least about 6 nucleotides in length that are specific for one domain of life and degenerate in at least one other domain of life from the collection of rRNA nucleotide motifs in Eukarya (domain-specific d-s eCNE), rRNA nucleotide motifs in Bacteria (domain-specific d-s bCNE), and rRNA nucleotide motifs in Archaea (domain-specific d-s aCNE).
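Step a) can be illustrated with a minimal Python sketch. It assumes ungapped, uppercase RNA strings. The function names, the 50-nucleotide tail length, and the probe sequence are hypothetical and are not taken from the application; only the probe length of about 15 nucleotides and the 70% identity threshold come from the text above.

```python
def best_identity(seq: str, probe: str) -> float:
    """Highest fraction of matching positions between `probe` and any
    ungapped window of `seq` that has the same length as `probe`."""
    k = len(probe)
    if len(seq) < k:
        return 0.0
    best = 0
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        matches = sum(a == b for a, b in zip(window, probe))
        best = max(best, matches)
    return best / k


def has_intact_3prime_end(seq: str, probe: str,
                          tail: int = 50,            # assumed search region
                          min_identity: float = 0.70) -> bool:
    """Keep a sequence only if its 3'-terminal region contains a stretch
    that is at least `min_identity` identical to the probe."""
    return best_identity(seq[-tail:], probe) >= min_identity


# Hypothetical 15-nt probe; a real run would use the 3'-end sequence
# of the chosen reference rRNA.
PROBE = "ACGUACGUACGUACG"
full_length = "G" * 40 + PROBE      # ends with the probe: kept
truncated = "G" * 55                # lacks the 3' end: discarded
kept = [s for s in (full_length, truncated) if has_intact_3prime_end(s, PROBE)]
```

The filter is deliberately ungapped; an aligner that tolerates insertions and deletions could be substituted without changing the threshold logic.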

[0008] In another embodiment, the invention is a method of determining conserved ribosomal RNA (rRNA) nucleotide motifs that are specific to one subgroup and degenerate in at least one other subgroup within Eukarya, comprising the steps of: a) generating a data set of a single copy of full length rRNA sequences, including a greater than or equal to about 70% identity to a sequence of about 15 nucleotides near the 3' end of the small subunit ribosomal RNA or the large ribosomal subunit RNA, for the Eukarya domain of life or for a subset group within a domain of life; b) filtering the data set against at least one representative structural sequence from the subgroup within Eukarya to align all sequences to the representative secondary structure; c) using overlapping windows of at least about 6 nucleotides for each of the subgroups within Eukarya to obtain rRNA nucleotide sequences that have an informational content score of greater than or equal to about 11 and a nucleotide sequence identity of greater than about 90%, with subsequent merger of the about 6 nucleotide stretches that overlap to generate a collection of rRNA nucleotide motifs in the subgroup within Eukarya; and d) determining conserved rRNA nucleotide motifs of at least about 6 nucleotides in length that are specific for one subgroup within Eukarya and degenerate in at least one other subgroup within Eukarya from the collection of rRNA nucleotide motifs in the subgroup within Eukarya.

[0009] In yet another embodiment, the invention is a method of determining conserved ribosomal RNA (rRNA) nucleotide motifs that are specific to one subgroup and degenerate in at least one other subgroup within Bacteria, comprising the steps of: a) generating a data set of a single copy of full length rRNA sequences, including a greater than or equal to about 70% identity to a sequence of about 15 nucleotides near the 3' end of the small subunit ribosomal RNA or the large ribosomal subunit RNA, for the Bacteria domain of life or for a subset group within a domain of life; b) filtering the data set against at least one representative structural sequence from the subgroup within Bacteria to align all sequences to the representative secondary structure; c) using overlapping windows of at least about 6 nucleotides for each of the subgroups within Bacteria to obtain rRNA nucleotide sequences that have an informational content score of greater than or equal to about 11 and a nucleotide sequence identity of greater than about 90%, with subsequent merger of the about 6 nucleotide stretches that overlap to generate a collection of rRNA nucleotide motifs in the subgroup within Bacteria; and d) determining conserved rRNA nucleotide motifs of at least about 6 nucleotides in length that are specific for one subgroup within Bacteria and degenerate in at least one other subgroup within Bacteria from the collection of rRNA nucleotide motifs in the subgroup within Bacteria.

[0010] In a further embodiment, the invention is a method of determining conserved ribosomal RNA (rRNA) nucleotide motifs that are specific to one subgroup and degenerate in at least one other subgroup within Archaea, comprising the steps of: a) generating a data set of a single copy of full length rRNA sequences, including a greater than or equal to about 70% identity to a sequence of about 15 nucleotides near the 3' end of the small subunit ribosomal RNA or the large ribosomal subunit RNA, for the Archaea domain of life or for a subset group within a domain of life; b) filtering the data set against at least one representative structural sequence from the subgroup within Archaea to align all sequences to the representative secondary structure; c) using overlapping windows of at least about 6 nucleotides for each of the subgroups within Archaea to obtain rRNA nucleotide sequences that have an informational content score of greater than or equal to about 11 and a nucleotide sequence identity of greater than about 90%, with subsequent merger of the about 6 nucleotide stretches that overlap to generate a collection of rRNA nucleotide motifs in the subgroup within Archaea; and d) determining conserved rRNA nucleotide motifs of at least about 6 nucleotides in length that are specific for one subgroup within Archaea and degenerate in at least one other subgroup within Archaea from the collection of rRNA nucleotide motifs in the subgroup within Archaea.

[0011] In yet another embodiment, the invention is a method of identifying a compound that is a domain-specific rRNA inhibitor, comprising the steps of: a) generating a space-filling model of an rRNA nucleotide motif identified using the method of claim 1 and a test compound; and b) determining docking of the test compound to at least one rRNA nucleotide motif identified using the method of claim 1 in the space-filling model, wherein the fitting accuracy based on three-dimensional structure and functional surface of the docking of the test compound to the rRNA nucleotide motif identified using the method of claim 1 identifies a compound that specifically inhibits the domain-specific rRNA nucleotide motif identified using the method of claim 1.

[0012] In yet another embodiment, the invention is a method of identifying a compound that is a subgroup-specific rRNA inhibitor, comprising the steps of: a) generating a space-filling model of an rRNA nucleotide motif identified using the method of claim 14, 23, or 27 and a test compound; and b) determining docking of the test compound to at least one rRNA nucleotide motif identified using the method of claim 1 in the space-filling model, wherein the fitting accuracy based on three-dimensional structure and functional surface of the docking of the test compound to the rRNA nucleotide motif identified using the method of claim 14, 23, or 27 identifies a compound that specifically inhibits the subgroup-specific rRNA nucleotide motif identified using the method of claim 14, 23, or 27.

BRIEF DESCRIPTION OF THE DRAWINGS

[0013] The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawings will be provided by the Office upon request and payment of the necessary fee.

[0014] FIG. 1. CNEs in the large ribosomal subunit of Eukarya. The positions of universally conserved uCNEs (≥90% sequence conservation in all three domains) are outlined in red. The domain-specific d-s CNEs that are ≤50% conserved in sequence in the other two domains of life are shown in blue.

[0015] FIG. 2. Universal CNEs (uCNEs) in rRNA of the large ribosomal subunit. uCNEs that are conserved in position in the three domains of life are shown in blue. The subset of these that are ≥90% conserved in sequence in all forms of life are outlined in red. Functional regions of the rRNA are labeled (see text).

[0016] FIG. 3. Heat map of conservation of CNEs in Eukarya, Archaea and Bacteria. Degree of sequence conservation is color-coded for each CNE, ranging from green (most conserved) through black to red (least conserved).

[0017] FIG. 4. Three-dimensional view of the large ribosomal subunit. Panel A: crown view (from the subunit interface) of the uCNEs that are ≥90% conserved in sequence in all domains of life. Panel B: crown view for the d-s CNEs in Eukarya with ≤50% sequence conservation in Bacteria and Archaea. In both panels, the L1 stalk is at the upper left of the image.

[0018] FIG. 5. Evolutionary distribution of eukaryotic (yellow), archaeal (blue) and bacterial (red) ribosomal RNA sequences within the FLORA databases. Tree of life cladogram within the ARB/SILVA LSU Ref guide tree using the interactive tree of life (iTOL) software platform (http://itol.embl.de/) to show the phylogenetic relationships and branch distances of organisms whose 23S-28S rRNA sequences were included in FLORA and used in this analysis.

[0019] FIG. 6. CNEs in the large ribosomal subunit of Archaea. The positions of universally conserved uCNEs (≥90% sequence conservation in all three domains) are outlined in red. The domain-specific d-s CNEs that are ≤50% conserved in sequence in the other two domains of life are shown in blue.

[0020] FIG. 7. CNEs in the large ribosomal subunit of Bacteria. The positions of universally conserved uCNEs (≥90% sequence conservation in all three domains) are outlined in red. The domain-specific d-s CNEs that are ≤50% conserved in sequence in the other two domains of life are shown in blue.

DETAILED DESCRIPTION

[0021] In an embodiment, the invention is a method of determining conserved ribosomal RNA (rRNA) nucleotide motifs that are specific to one domain of life and degenerate in at least one other domain of life, comprising the steps of: a) generating a data set of a single copy of full length rRNA sequences, including a greater than or equal to about 70% identity to a sequence of about 15 nucleotides proximate to the 3' end of the small subunit ribosomal RNA or the large ribosomal subunit RNA, for each of the Eukarya, Bacteria or Archaea domains of life or a merger of the domains of life or for a subgroup within a domain of life; b) filtering the data set against at least one representative structural sequence from each of the Eukarya, Bacteria or Archaea domains of life to align all sequences to the representative secondary structure; c) using overlapping windows of at least about 6 nucleotides for each of the Eukarya, Bacteria or Archaea domains of life to obtain rRNA nucleotide sequences that have an informational content score of greater than or equal to about 11 and a nucleotide sequence identity of greater than about 90%, with subsequent merger of the about 6 nucleotide stretches that overlap to generate a collection of rRNA nucleotide motifs in Eukarya (eCNE=conserved nucleotide elements in Eukarya), rRNA nucleotide motifs in Bacteria (bCNE=conserved nucleotide elements in Bacteria), rRNA nucleotide motifs in Archaea (aCNE=conserved nucleotide elements in Archaea), or in any subgroup within a domain of life; and d) determining conserved rRNA nucleotide motifs of at least about 6 nucleotides in length that are specific for one domain of life and degenerate in at least one other domain of life from the collection of rRNA nucleotide motifs in Eukarya (domain-specific d-s eCNE), rRNA nucleotide motifs in Bacteria (domain-specific d-s bCNE), and rRNA nucleotide motifs in Archaea (domain-specific d-s aCNE).
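The window scan and merger of step c) can be sketched in Python as follows. This is an illustration under stated assumptions, not the applicants' implementation: the informational content score is assumed to be the Shannon information per alignment column (at most 2 bits for four nucleotides) summed over the window, so that a 6-nucleotide window scores at most 12; identity is assumed to be the mean, over the window's columns, of the fraction of sequences carrying the most common nucleotide, with gaps counted as mismatches. The function names are hypothetical.

```python
import math
from collections import Counter

NUCLEOTIDES = "ACGU"


def column_stats(column):
    """Return (information content in bits, identity) for one alignment column.
    Assumption: gaps ('-') are ignored for entropy but count against identity."""
    counts = Counter(c for c in column if c in NUCLEOTIDES)
    n = sum(counts.values())
    if n == 0:
        return 0.0, 0.0
    entropy = -sum((k / n) * math.log2(k / n) for k in counts.values())
    return 2.0 - entropy, max(counts.values()) / len(column)


def find_cnes(alignment, window=6, min_ic=11.0, min_identity=0.90):
    """Scan overlapping windows and merge those that pass both thresholds.
    Returns half-open (start, end) column intervals."""
    length = len(alignment[0])
    stats = [column_stats([seq[i] for seq in alignment]) for i in range(length)]
    merged = []
    for start in range(length - window + 1):
        cols = stats[start:start + window]
        ic = sum(s[0] for s in cols)
        identity = sum(s[1] for s in cols) / window
        if ic >= min_ic and identity > min_identity:
            end = start + window
            if merged and start < merged[-1][1]:
                # Overlaps the previous passing window: extend that interval.
                merged[-1] = (merged[-1][0], end)
            else:
                merged.append((start, end))
    return merged


# Toy alignment: 8 invariant columns followed by 4 fully variable columns.
tails = ["ACGU", "CGUA", "GUAC", "UACG"]
toy = ["AGCACUGG" + tails[i % 4] for i in range(20)]
intervals = find_cnes(toy)
```

Computing the per-column statistics once and then sliding the window keeps the scan linear in alignment length for a fixed window size.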

[0022] The representative structural sequence specific for Bacteria can be at least one member selected from the group consisting of Escherichia coli and Clostridium ramosum. The representative structural sequence specific for Eukarya is at least one member selected from the group consisting of Saccharomyces cerevisiae and Arabidopsis thaliana. The representative structural sequence specific for Archaea is at least one member selected from the group consisting of Haloarcula marismortui and Sulfolobus solfataricus.

[0023] The conserved rRNA nucleotide motifs can be small ribosomal subunit conserved rRNA nucleotide motifs. The conserved rRNA nucleotide motifs can be large ribosomal subunit conserved rRNA nucleotide motifs. The conserved rRNA nucleotide motifs that are specific to Eukarya (d-s eCNE), Bacteria (d-s bCNE), or Archaea (d-s aCNE) and degenerate to at least one other domain of life can have a length of at least one member selected from the group consisting of at least about 6 nucleotides, at least about 8 nucleotides, at least about 10 nucleotides, at least about 15 nucleotides, at least about 20 nucleotides, at least about 25 nucleotides, at least about 30 nucleotides and at least about 35 nucleotides.

[0024] The conserved rRNA nucleotide motif can be specific to Bacteria and degenerate in Eukarya. The conserved rRNA nucleotide motif that is specific to Bacteria can be at least one of AGCACU (SEQ ID NO: 136) or UCGCUCAACG (SEQ ID NO: 163). The Eukarya can be a vertebrate Eukarya. The vertebrate Eukarya can be a human. The Bacteria can be gram-positive bacteria. The Bacteria can be gram-negative bacteria.
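The determination of step d) can be sketched as a simple classification once a conservation fraction has been computed for a CNE in each domain. The sketch below is illustrative only. It borrows the 90% and 50% cut-offs from the figure descriptions, and it requires low conservation in both other domains, which is stricter than the "at least one other domain" wording of the method. The function name, the labels, and the example conservation values are hypothetical.

```python
def classify_cne(conservation, home,
                 conserved_min=0.90, degenerate_max=0.50):
    """Classify one CNE from per-domain conservation fractions (0.0 to 1.0).

    conservation: dict mapping domain name -> fraction of sequences in that
                  domain that carry the motif at the aligned position.
    home:         the domain in which the CNE was originally identified.
    """
    others = [v for domain, v in conservation.items() if domain != home]
    if conservation[home] < conserved_min:
        return "not conserved in home domain"
    if all(v >= conserved_min for v in others):
        return "universal"            # uCNE in the figure descriptions
    if all(v <= degenerate_max for v in others):
        return "domain-specific"      # d-s CNE in the figure descriptions
    return "intermediate"


# Hypothetical values for a bacterial CNE; not data from the application.
example = {"Bacteria": 0.97, "Eukarya": 0.12, "Archaea": 0.40}
label = classify_cne(example, home="Bacteria")
```

Changing `all` to `any` in the domain-specific test would give the looser reading, in which degeneracy in a single other domain is enough.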

[0025] In another embodiment, the invention is a method of determining conserved ribosomal RNA (rRNA) nucleotide motifs that are specific to one subgroup and degenerate in at least one other subgroup within Eukarya, comprising the steps of: a) generating a data set of a single copy of full length rRNA sequences, including a greater than or equal to about 70% identity to a sequence of about 15 nucleotides near the 3' end of the small subunit ribosomal RNA or the large ribosomal subunit RNA, for the Eukarya domain of life or for a subset group within a domain of life; b) filtering the data set against at least one representative structural sequence from the subgroup within Eukarya to align all sequences to the representative secondary structure; c) using overlapping windows of at least about 6 nucleotides for each of the subgroups within Eukarya to obtain rRNA nucleotide sequences that have an informational content score of greater than or equal to about 11 and a nucleotide sequence identity of greater than about 90%, with subsequent merger of the about 6 nucleotide stretches that overlap to generate a collection of rRNA nucleotide motifs in the subgroup within Eukarya; and d) determining conserved rRNA nucleotide motifs of at least about 6 nucleotides in length that are specific for one subgroup within Eukarya and degenerate in at least one other subgroup within Eukarya from the collection of rRNA nucleotide motifs in the subgroup within Eukarya.

[0026] The conserved rRNA nucleotide motifs can be a small ribosomal subunit conserved rRNA nucleotide motif. The conserved rRNA nucleotide motifs can be a large ribosomal subunit conserved rRNA nucleotide motif. The conserved rRNA nucleotide motif can be specific to Protista and degenerate in other Animalia. The conserved rRNA nucleotide motif can be specific to Fungi and degenerate in other Animalia. The conserved rRNA nucleotide motif can be specific to Nematodes and degenerate in other Animalia. The Animalia can be in the Vertebrata subphylum. The Vertebrata can be a human. The conserved rRNA nucleotide motif can be specific to a sub-group of Eukarya selected from the group consisting of yeast, protozoa, and worms and is degenerate in other subgroups of Eukarya.

[0027] In yet another embodiment, the invention is a method of determining conserved ribosomal RNA (rRNA) nucleotide motifs that are specific to one subgroup and degenerate in at least one other subgroup within Bacteria, comprising the steps of: a) generating a data set of a single copy of full length rRNA sequences, including a greater than or equal to about 70% identity to a sequence of about 15 nucleotides near the 3' end of the small subunit ribosomal RNA or the large ribosomal subunit RNA, for the Bacteria domain of life or for a subset group within a domain of life; b) filtering the data set against at least one representative structural sequence from the subgroup within Bacteria to align all sequences to the representative secondary structure; c) using overlapping windows of at least about 6 nucleotides for each of the subgroups within Bacteria to obtain rRNA nucleotide sequences that have an informational content score of greater than or equal to about 11 and a nucleotide sequence identity of greater than about 90%, with subsequent merger of the about 6 nucleotide stretches that overlap to generate a collection of rRNA nucleotide motifs in the subgroup within Bacteria; and d) determining conserved rRNA nucleotide motifs of at least about 6 nucleotides in length that are specific for one subgroup within Bacteria and degenerate in at least one other subgroup within Bacteria from the collection of rRNA nucleotide motifs in the subgroup within Bacteria. The conserved rRNA nucleotide motifs can be a small ribosomal subunit conserved rRNA nucleotide motif. The conserved rRNA nucleotide motifs can be a large ribosomal subunit conserved rRNA nucleotide motif. The conserved rRNA nucleotide motif can be specific to pathogenic Bacteria and degenerate in other Bacteria.

[0028] In yet another embodiment, the invention is a method of determining conserved ribosomal RNA (rRNA) nucleotide motifs that are specific to one subgroup and degenerate in at least one other subgroup within Archaea, comprising the steps of: a) generating a data set of a single copy of full length rRNA sequences, including a greater than or equal to about 70% identity to a sequence of about 15 nucleotides near the 3' end of the small subunit ribosomal RNA or the large ribosomal subunit RNA, for the Archaea domain of life or for a subset group within a domain of life; b) filtering the data set against at least one representative structural sequence from the subgroup within Archaea to align all sequences to the representative secondary structure; c) using overlapping windows of at least about 6 nucleotides for each of the subgroups within Archaea to obtain rRNA nucleotide sequences that have an informational content score of greater than or equal to about 11 and a nucleotide sequence identity of greater than about 90%, with subsequent merger of the about 6 nucleotide stretches that overlap to generate a collection of rRNA nucleotide motifs in the subgroup within Archaea; and d) determining conserved rRNA nucleotide motifs of at least about 6 nucleotides in length that are specific for one subgroup within Archaea and degenerate in at least one other subgroup within Archaea from the collection of rRNA nucleotide motifs in the subgroup within Archaea.

[0029] The conserved rRNA nucleotide motifs can be a small ribosomal subunit conserved rRNA nucleotide motif. The conserved rRNA nucleotide motifs can be a large ribosomal subunit conserved rRNA nucleotide motif. The conserved rRNA nucleotide motif is specific to pathogenic Archaea and degenerate in other Archaea.

[0030] In a further embodiment, the invention is a method of identifying a compound that is a domain-specific rRNA inhibitor, comprising the steps of: a) generating a space-filling model of an rRNA nucleotide motif identified using the method of claim 1 and a test compound; and b) determining docking of the test compound to at least one rRNA nucleotide motif identified using the method of claim 1 in the space-filling model, wherein the fitting accuracy based on three-dimensional structure and functional surface of the docking of the test compound to the rRNA nucleotide motif identified using the method of claim 1 identifies a compound that specifically inhibits the domain-specific rRNA nucleotide motif identified using the method of claim 1. In this method, the domain-specific rRNA nucleotide motif can be in the rRNA of Bacteria and not in Eukarya; the domain-specific motif is AGCACU (SEQ ID NO: 136) or UCGCUCAACG (SEQ ID NO: 163) in the rRNA of Bacteria and not in Eukarya. In this method, the domain-specific rRNA nucleotide motif is in rRNA of Archaea and not in Eukarya; the domain-specific rRNA nucleotide motif can be in the small ribosomal subunit; the domain-specific rRNA nucleotide motif is in the large ribosomal subunit.

[0031] In yet another embodiment, the invention is a method of identifying a compound that is a subgroup-specific rRNA inhibitor, comprising the steps of: a) generating a space-filling model of an rRNA nucleotide motif identified using the method of claim 14, 23, or 27 and a test compound; and b) determining docking of the test compound to at least one rRNA nucleotide motif identified using the method of claim 1 in the space-filling model, wherein the fitting accuracy based on three-dimensional structure and functional surface of the docking of the test compound to the rRNA nucleotide motif identified using the method of claim 14, 23, or 27 identifies a compound that specifically inhibits the subgroup-specific rRNA nucleotide motif identified using the method of claim 14, 23, or 27. In this method, the subgroup-specific rRNA nucleotide motif can be in rRNA of Eukarya; the subgroup-specific rRNA nucleotide motif can be in rRNA of Bacteria; the subgroup-specific rRNA nucleotide motif is in rRNA of Archaea; the domain-specific rRNA nucleotide motif is in the small ribosomal subunit; the domain-specific rRNA nucleotide motif is in the large ribosomal subunit.

[0032] All cells require a system for storing and extracting biological information, and the basic aspects of this system are conserved in all forms of life. Ribosomes are large macromolecular machines that function toward this requirement as the conserved site of protein synthesis. Structural studies of the ribosome have shown that the active site of peptide bond formation is composed solely of ribosomal RNA (rRNA)(1); thus, the ribosome is the largest known ribozyme. This underscores the central role of rRNA in translation and the probability that the initial ribosome in early evolution was composed only of rRNA (2, 3). Since translation is an ancient and ubiquitous process to which rRNA is central, the evolution of rRNA sequences has provided a wealth of information about phylogenetic relationships, including a revised tree of life containing three primary domains: Bacteria, Archaea, and Eukarya (4).

[0033] Phylogenetic comparisons have been mined less extensively to understand the function of ribosomes. With regard to ribosome structure, such studies revealed that although the rRNA primary sequence largely differs, a universal core secondary structure is maintained by compensatory base changes (5, 6). Domain-specific features are superimposed on the conserved secondary structure of rRNA, such as the insertion of expansion segments (7) that accounts for the increased length of rRNA in Eukarya compared to Bacteria and Archaea. In addition, a comparative structural analysis of bacterial and archaeal rRNAs revealed domain-specific structural features found within their core structures, including insertions/deletions and alternative secondary or tertiary conformations (8). The presence of these domain-specific features suggests that, outside of the catalytic core, rRNA may have adapted specialized structures, and thus functions, in each lineage. However, this idea is largely unexplored. Although ribosomal proteins can be domain-specific, with several occurring in Eukarya (8-11), knowledge of the universally conserved characteristics of the ribosome is much deeper than knowledge of the domain-specific characteristics.

[0034] As a step towards fully characterizing the specialized structures/functions of the ribosome in each domain of life, we have examined the comparative molecular evolution of 23S-28S ribosomal RNA sequences in a new database that we created to widely represent the phylogenetic diversity within all three domains. Described herein are the de novo identification and quantitative characterization of Conserved Nucleotide Elements (CNEs) in rRNA of the large ribosomal subunit within each of the three phylogenetic domains of life. Unlike a previous study that identified individual nucleotides that are conserved in Bacteria and Archaea (8), Eukarya is included here to identify rRNA sequence conservation in all three domains of life. Moreover, in order to identify potential RNA- and protein-recognition motifs, we searched specifically for conserved regions at least six nucleotides in length. Several CNEs were identified: 57, 49 and 47 CNEs that are ≥6 nt in 23S-28S rRNA of Eukarya, Bacteria and Archaea, respectively. Of these, 22 CNEs are universally conserved (uCNEs) in position and sequence in all domains of life, with nine of these ≥90% conserved in sequence. The uCNEs map to regions of rRNA with established functions, but, unexpectedly, some uCNEs reside in areas with no functions identified to date. This underscores the value of our approach to identify new areas in rRNA of potential functional importance. In addition, we discovered domain-specific (d-s) CNEs that are highly conserved in one domain of life but degenerate in the other domains. The majority of the d-s CNEs are in Eukarya, representing new, not previously appreciated, structural features of eukaryotic ribosomes. Together, these analyses represent a new framework for investigations on the assembly, structure and function of ribosomes.

[0035] The major advance of the X-ray crystal structure of the ribosome in Bacteria and Archaea (17-20) and recently in Eukarya (10-11, 21-22) offers snapshots of the dynamic ribosome, which undergoes conformational changes during translation (23), as first visualized by cryo-EM (24). Since the heart of the ribosome is rRNA, understanding its role requires the discovery of which nucleotides are essential for ribosome function. Evolutionary comparisons provide a method to identify sequences within rRNA that are vital for its function. Over evolutionary time, mutations accumulate in nonfunctional nucleotides, whereas sequences important for function are maintained by natural selection. In this study, conserved motifs in the large ribosomal subunit rRNA were identified. The fact that we found the previously known regions of rRNA required for translation validates our approach for identifying novel sequence motifs of potential functional importance. We began by establishing FLORA, with full-length and non-redundant rRNA sequence entries derived from ARB/SILVA, where they are aligned according to secondary structure. Conserved nucleotide elements (CNEs) ≥6 nt that are ≥90% conserved in 23S-28S rRNA from each of the three domains of life were identified. Sequence comparisons between the three domains allowed us to discover universal CNEs (uCNEs) and other CNEs that are domain-specific (d-s CNEs).

Universal CNEs (uCNEs)

[0036] There are 22 uCNEs that are conserved in their secondary structure position and sequence in 23S-28S rRNA in all three domains of life (Table 1; FIG. 2). Of these, 9 uCNEs are ≥90% conserved in primary sequence in the three domains of life, suggesting that they are essential for the ribosome. When superimposed on the X-ray crystal structure of the yeast 60S ribosomal subunit (10), it can be seen that the uCNE motifs are centrally clustered and mostly at the subunit interface (FIG. 4A). Many of the activities of the ribosome occur at the subunit interface, and several of these coincide with the uCNEs, as discussed below.

[0037] Bridges Between the Ribosomal Subunits.

[0038] Bridges between the two ribosomal subunits help to coordinate their activities and conformational changes. Of the 12 bridges universal to all domains of life, two-thirds involve the large ribosomal subunit rRNA (10, 20-21). Almost all of the 23S-28S rRNA-containing universal bridges coincide with CNEs (Table 9), most of which are clustered in the secondary structure of 23S-28S rRNA (FIG. 2). In addition, almost all of these bridge-containing CNEs coincide with uCNEs, including two (uCNE 4 and uCNE 5) that are universally ≥90% conserved in sequence (Table 1). Thus, in general, many but not all of the contact sites in 23S-28S rRNA that are involved in universal bridges between the ribosomal subunits coincide with uCNEs (Table 9). Since contact sites have been mapped for only a few of the ribosome states of conformational changes during ratcheting, some of the uCNEs in the bridge region (FIG. 2) may reflect inter-subunit contact sites that are yet to be discovered. In contrast to the universal bridges, the additional eukaryotic-specific bridges (25) involve interactions with expansion segment rRNA or eukaryotic-specific ribosomal proteins and not CNEs.

[0039] Peptidyl Transferase Center (PTC).

[0040] The peptidyl transferase center (PTC) (26), where peptide bond formation occurs in the large ribosomal subunit, is made up almost exclusively of uCNEs (FIG. 2), including uCNEs 6, 7 and 8 that are ≥90% conserved in sequence in all domains of life. The CCA terminus of tRNA, adjacent to the nascent peptide, also interacts with some nucleotides of the PTC (20, 27), as well as with the P-loop and A-loop (FIG. 2) (28-32). The P-loop and A-loop coincide with CNEs in Eukarya (FIG. 1: eCNEs 45 and 54) and in Bacteria (FIG. 7: bCNEs 32 and 44) but not in Archaea (FIG. 6), and therefore are not classified as uCNEs.

[0041] The Sarcin-Ricin Loop (SRL) and GTPase Associated Center (GAC).

[0042] The sarcin-ricin loop (SRL) anchors Elongation Factor G (EF-G) on the ribosome during mRNA-tRNA translocation (33). The SRL coincides with uCNE 9 (FIG. 2), which is conserved in ≥90% of rRNA sequences in all three domains of life. The GTPase Associated Center (GAC; FIG. 2), which is composed of 23S-28S rRNA helices 43 and 44 and ribosomal protein L11 and is found close to the SRL near the base of the L7/L12 stalk (P stalk in Eukarya) (34), activates the GTPase activity of translation factors including EF-G. Much of the rRNA sequence composing the GAC is highly conserved in Eukarya (FIG. 1, eCNEs 19-21), moderately in Bacteria (FIG. 7, bCNEs 12-13) and less so in Archaea (FIG. 6, aCNEs 12-13), with the universal overlap of ≥6 nt represented by uCNE 19 (FIG. 2). Just as the uCNEs of the inter-subunit bridges region coincide with areas implicated in conformational changes of the ribosome during translation, the uCNE of the GAC also undergoes conformational changes (34-37).

[0043] While many of the uCNEs correspond to regions of known function in the ribosome, as discussed above, some are in regions of 23S-28S rRNA of unknown function. Most of these map to the 5' half of the molecule. Of special interest are uCNEs 1-3 that are ≥90% conserved in sequence in all three domains of life. They underscore the power of our approach to highlight new areas of the ribosome of likely great functional importance that are worthy of future study.

Domain-Specific CNEs (d-s CNEs)

[0044] Of the CNEs found in each domain (eCNEs, aCNEs, bCNEs), only a subset of them are universally conserved in all forms of life (uCNEs), and the remainder shows varying degrees of sequence degeneracy when compared between domains. Those that have ≤50% sequence conservation between domains are termed here domain-specific CNEs (d-s CNEs) and may play important roles unique to ribosomes from that domain of life. To our knowledge, this is the first report of stretches of conserved sequence in rRNA that are domain-specific.

[0045] There are two d-s bCNEs (bCNEs 10 and 37; FIG. 7 and Table 8) with sequences that are ≥90% conserved in all Bacteria but ≤50% conserved in the other two domains of life. They represent excellent potential drug targets to combat pathogenic bacteria, as the corresponding rRNA sequences are degenerate in the eukaryotic hosts. Roberts et al. (8) identified individual nt that are conserved signatures of the bacterial or the archaeal domain, with ≥90% conservation in one domain and ≤10% conservation in the other domain. With this rigorous definition, they identified only 4/10 nt in bCNE 37 and none in bCNE 10 as domain-specific signatures in Bacteria. The function of bCNE 37 is unknown. bCNE 10 includes nt U860 and G864 in E. coli that were observed to be strongly deleterious when mutated (38). bCNE 10 resides in Helix 38 (H38), named the A-site Finger (ASF) because it interacts with A-site tRNA (20, 39). The apex of H38 forms bridge B1a between the ribosomal subunits, but mutation of the apex cannot account for the lethal phenotype of mutating U860 or G864 (38). The precise functions of d-s bCNE 10 and d-s bCNE 37 in Bacteria and of d-s aCNE 18 in Archaea remain to be established.

[0046] In contrast to the one and two d-s CNEs found in Archaea and Bacteria, respectively, there are nine d-s CNEs in Eukarya (FIGS. 1, 6, 7 and Tables 6-8). Therefore, d-s CNEs are largely a eukaryotic phenomenon. The positions of the d-s eCNEs on the three-dimensional structure of the ribosome give clues to their functions, as discussed below and summarized in Table 10.

[0047] The d-s eCNEs form a semi-circle in the large ribosomal subunit.

[0048] When superimposed on the X-ray crystal structure of the yeast 60S ribosomal subunit (10), it can be seen that the d-s eCNEs are arranged as a semi-circle cluster, with several exposed to the subunit interface (FIG. 4B). This is reminiscent of the findings of Ben-Shem et al. (10) who found that the expansion segments (not conserved in sequence and structurally distinctive for the eukaryotic domain) are arranged as a ring on the solvent (back) side of the yeast 60S ribosomal subunit. Our data reveal that some of the eCNEs are near expansion segments, including d-s eCNE 34 and d-s eCNE 43 that abut expansion segments ES27L and ES31L, respectively (compare FIG. 1 to FIG. 1 of ref (40)). Perhaps the insertion of these expansion segments created additional evolutionary constraints on the neighboring sequences, resulting in conservation of d-s eCNEs 34 and 43.

[0049] In addition to expansion segments, the eukaryotic-specific ribosomal proteins as well as the eukaryotic extensions on the ribosomal proteins found in the other domains of life are associated with this ring (10). Of the six ribosomal proteins that are unique to eukaryotes (10-11, 40), L36e contacts d-s eCNE 37 and L29e contacts d-s eCNEs 14, 16 and 50, as well as eCNEs 10 and 45. Therefore, four of the nine d-s eCNEs contact ribosomal proteins that are unique to eukaryotes.

[0050] L1 Stalk.

[0051] tRNA leaves the ribosome through the Exit (E) site (41). The dynamic changes in conformation of the rRNA stalk that binds ribosomal protein L1 (27, 42-43) play a role in the exit of tRNA from the ribosome (42, 44-45). No uCNE is near the L1 stalk (FIG. 2), but eCNEs 42 and 43 are part of this structure (FIG. 1). Moreover, eCNE 43 is a d-s eCNE that is uniquely conserved in Eukarya. This suggests a eukaryotic-specific functional role for d-s eCNE 43 in evacuating tRNA from the ribosome, and complements the notion that the E-site for tRNA on the ribosome evolved relatively late (46-48), as reflected in E site differences between the domains of life (49).

[0052] Many eCNEs Coincide with the Tunnel of the Large Ribosomal Subunit.

[0053] Nascent polypeptides leave the PTC of the large ribosomal subunit via a tunnel (50-51), the walls of which are primarily composed of rRNA (1, 17, 52-53). The narrow 10-20 Å diameter of the tunnel precludes much folding of the nascent polypeptide beyond the formation of α-helices (54). Recently it has been suggested that the tunnel may play a more active, though as yet unknown, role than previously believed (55). In this regard, it is exciting to note that there is enormous overlap of the eCNEs (FIG. 1) with rRNA stretches that compose the tunnel (FIG. 1F of ref (1)). Even more noteworthy is the congruence of d-s eCNEs 14, 16, 23 and 40, accounting for about half of the sequences that are ≥90% conserved in all Eukarya but very degenerate in the other two domains of life. These observations suggest that these d-s eCNEs may play a heretofore unknown function for the tunnel with the nascent polypeptide.

[0054] The domain-specific CNEs are prevalent primarily in Eukarya, where they represent a new feature of eukaryotic ribosomes. As discussed above and summarized in Table 10, all of the nine d-s eCNEs, except for d-s eCNE 27, correlate with sites suggesting their potential eukaryotic-specific functions in structure of the ribosome and in translation. Eukaryotic CNEs may also serve as binding sites for biogenesis factors or function in rRNA folding.

[0055] Because ribosomal RNA contains a universally conserved core structure, it is believed that the ribosome formed before life differentiated into branches. Upon splitting into three domains, the ribosomes within these branches maintained universal and unique characteristics. At the root of the tree, these features became fixed and remained constant throughout evolution. Tracing of the evolutionary path of 23S-28S rRNA through the study of conserved nucleotide elements (CNEs) is described herein. The invariant nature of CNEs highlights their biological importance, and it appears that CNEs evolved with the basic functions of the cell. Although some of these functions are highlighted here, the analysis of individual CNEs will yield additional insights into previously unknown aspects of ribosomes.

EXEMPLIFICATION

[0056] Studies of ribosomal RNA (rRNA) sequence evolution have elucidated deep phylogenetic relationships. However, this powerful approach has not been fully applied to understanding functions of the ribosome itself. Highly conserved nucleotide elements (CNEs) in 23S-28S rRNA sequences from each phylogenetic domain (Eukarya, Bacteria and Archaea) were identified systematically using a new structurally aligned rRNA database, FLORA (Full-Length Organismal rRNA Alignment). By quantifying conservation of CNE motifs across phylogenetic domains, we identified universal CNEs (uCNEs) located at the same structural position in all three major branches of the phylogenetic tree and domain-specific CNEs (d-s CNEs) that are uniquely conserved in one phylogenetic domain but absent in the other two. As expected, most uCNEs reside within the functionally important regions of rRNA essential for translation. However, a few uCNEs do not correspond to sites of known function, thus identifying novel sequences in rRNA of potential importance. In contrast to the uCNEs, the d-s CNEs provide new insights into facets of ribosomes that are unique to that domain of life. The d-s CNEs are largely a eukaryotic phenomenon and provide evidence for sites within rRNA that have eukaryotic-specific functions in ribosome biogenesis and translation, including nascent polypeptide transit. Thus, the data described herein give new insight into the evolution of ribosomes and support the hypothesis that motifs within the rRNA core have been tailored by evolution for specialized functions in each phylogenetic domain.

[0057] rRNA data were obtained from the SILVA Ref database (12) and curated to create the Full-Length Organismal rRNA Alignment (FLORA) database for 23S-28S rRNA sequences. ARB (56) was used to construct individual position tree servers for each domain of life and for rRNA sequence alignments. A sliding window of 6 nucleotides was used to identify conserved motifs with an information content ≥11.0, and overlapping motifs were merged into longer motifs to derive the CNEs. The consensus sequence for the CNE motif in each domain was derived using WebLogo (57), and the percent conservation of each CNE was calculated based on the frequency of mismatches. To identify the uCNEs, the coordinates of the CNEs in each domain of life were aligned in ARB to identify all motifs that were structurally conserved in position. The false discovery rate (FDR) was derived from p-values.

FLORA: The Customization of rDNA Alignments for Optimized, Unbiased Identification of Conserved Elements

[0058] The first step in comprehensive rRNA motif discovery is to produce a global sequence alignment with broad phylogenetic representation from each domain of life. Several databases exist for rRNA sequences, but often they just include the small ribosomal subunit rRNA, lack eukaryotic sequences, or are not compatible with high-throughput computational analysis. ARB/SILVA was employed for the study because it provides the most comprehensive resource of rRNA sequences from Bacteria, Archaea and Eukarya, and the thousands of rRNA sequences are aligned according to secondary structure (12-14).

[0059] As our starting point, the thousands of sequences in the complete SILVA LSU Reference database of 23S-28S rRNA were catalogued into three position-tree servers according to phylogenetic domain. Several parameters were then used to produce a global alignment containing only complete 23S-28S rRNA sequences: (i) All sequence data containing the term "partial" or "shotgun" in their abstract were eliminated. (ii) Sequences were only included if they had the highly conserved sarcin-ricin loop (SRL) sequence at the 3' end of 23S-28S rRNA (15). In addition, to avoid phylogenetic biases stemming from the fact that the SILVA LSU Reference database allows multiple entries for a single species, all duplicate species entries were eliminated such that the final datasets contain only one full-length rRNA sequence per species. These steps reduced the number of large ribosomal subunit sequences to 342 (Eukarya), 915 (Bacteria) and 86 (Archaea), which is double the number of entries for each domain of life as used in a previous rRNA database (16; http://www.rna.icmb.utexas.edu). Our refined data set represents a Full-Length Organismal rRNA Alignment (FLORA) that represents a broad distribution of organisms from the tree of life (FIG. 6) and is optimized for comprehensive, global motif discovery.
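The curation rules described above can be sketched in Python as follows. This is only an illustrative outline and not the pipeline actually used to build FLORA: the record fields (`species`, `description`, `sequence`), the function name `curate`, and the caller-supplied `has_srl` predicate are hypothetical, and keeping the first entry encountered for each species is an assumption, since the text does not state which duplicate is retained.

```python
from typing import Callable, Iterable, List

def curate(records: Iterable[dict], has_srl: Callable[[str], bool]) -> List[dict]:
    """Keep complete, SRL-containing entries, one per species.

    Each record is a hypothetical dict with keys 'species',
    'description' and 'sequence'. has_srl is any predicate that
    reports whether the conserved 3' SRL sequence is present.
    """
    kept = []
    seen_species = set()
    for rec in records:
        text = rec["description"].lower()
        # (i) drop entries annotated as partial or shotgun data
        if "partial" in text or "shotgun" in text:
            continue
        # (ii) require the conserved sarcin-ricin loop sequence
        if not has_srl(rec["sequence"]):
            continue
        # one entry per species (keeping the first seen is an assumption)
        if rec["species"] in seen_species:
            continue
        seen_species.add(rec["species"])
        kept.append(rec)
    return kept
```

Applying the filters in this order means that a partial entry never occupies the single slot reserved for its species.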

Database Construction and Server Construction

[0060] Ribosomal RNA data were obtained from the SILVA Comprehensive Ribosomal RNA database maintained by the Microbial Genomics and Bioinformatics Research Group at the Max Planck Institute (58; LSU ref 96) and refined to contain only full length 23S-28S rRNA sequences with only one entry per organism. Accessions that did not contain the 14 nucleotide sarcin-ricin loop (SRL) AGUACGAGAGGAAC sequence at least 70% conserved (i.e., ≤4 mismatches) at the appropriate structural position at the 3' end of 23S-28S rRNA were eliminated. To balance the distribution of representative organisms from the eukaryotic tree, an equal number of plants were removed from each subtaxon to maintain phylogenetic breadth in the plant species that were retained. As a result of these steps, the Full-Length Organismal rRNA Alignment (FLORA) was created and is publicly available through Brown University and is maintained by the Gerbi Research Group. Organisms in FLORA were organized into phylogenetic trees and individual position tree servers for each domain of life were constructed using ARB (59).
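The SRL criterion can be illustrated with a short Python sketch. It assumes that the 14 nucleotides at the appropriate structural position have already been extracted from the alignment; the function names are hypothetical. With 14 positions, at most 4 mismatches leaves at least 10 matches (about 71%), which is the smallest match count that satisfies the 70% threshold.

```python
SRL = "AGUACGAGAGGAAC"  # 14-nt sarcin-ricin loop sequence given in the text
MAX_MISMATCHES = 4      # at least 70% of the 14 positions must match

def srl_mismatches(segment: str) -> int:
    """Count positions where an aligned 14-nt segment differs from the SRL."""
    if len(segment) != len(SRL):
        raise ValueError("segment must be aligned to the 14-nt SRL")
    return sum(a != b for a, b in zip(segment.upper(), SRL))

def passes_srl_filter(segment: str) -> bool:
    """True if the segment has no more than MAX_MISMATCHES mismatches."""
    return srl_mismatches(segment) <= MAX_MISMATCHES
```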

Identification of Conserved Nucleotide Elements (CNEs) in the Large Ribosomal Subunit within Each Domain of Life

[0061] Motif discovery in rRNA presents unique challenges owing to the variable lengths of the 23S-28S molecule. This is especially true for the eukaryotes, as human 28S rRNA (5100 nt) is about 1.5 times larger than budding yeast 25S rRNA (3400 nt). Much of this variation is due to expansion segments that are of lesser concern because neither their lengths nor sequences are evolutionarily conserved (7). To overcome the problem of rRNA length variation, the analyses began on structurally filtered alignments. A representative model organism was chosen from each domain as the structural filter, producing a database where all alignment columns are structurally homologous to the filtering organism, insertions are excluded, and deletions are held by gaps. This allowed us to compare orthologous positions in rRNA that descended from the same structure throughout evolution. Conserved motifs were identified in each structurally aligned FLORA database using a combined approach of information content (IC) (scores ≥11.0) and percent sequence identity (≥90% throughout the entire domain). A minimum length of six bases with no maximum length was imposed in order to select for biologically significant motifs likely to act as either protein- or RNA-binding sites. When carried out separately for each of the three domains of life, this identified 57 eukaryotic conserved nucleotide elements (eCNEs), 49 bacterial CNEs (bCNEs), and 47 archaeal CNEs (aCNEs) of various lengths up to 69 bases (Tables 2-4, respectively). In some cases, two adjacent CNEs may be separated by only a few non-conserved nucleotides. To identify any biases imposed by structural filters, CNE motif discovery was repeated using a different filtering organism for each domain of life, chosen from a phylogenetic kingdom that was distant from the first. Both sets of filters discovered the same set of CNEs, with only a few cases where the motif boundaries changed (Tables 2-4). As confirmed by sliding window motif discovery conducted on 500 randomized FLORA alignments, CNEs are exceptionally well conserved above background, with CNEs ≥8 nucleotides long showing the lowest false discovery rates (FDRs; Table 5). Thus, the CNEs represent the highly invariant and evolutionarily fixed core of rRNA sequence elements within each domain of life.

[0062] By definition, all CNEs are ≥90% conserved within their respective phylogenetic domains, but by conducting cross-domain analysis, how well each motif is conserved in the other two domains was examined (Tables 6-8). As evident from conservation heat maps (FIG. 3), CNEs demonstrate varying degrees of sequence degeneracy between phylogenetic domains. Along this continuum, the most degenerate of these sequences (<50% sequence conservation) are identified as domain-specific CNEs (d-s CNEs). There are 9 d-s CNEs in Eukarya, 2 d-s CNEs in Bacteria, and 1 d-s CNE in Archaea (FIGS. 1, 6, 7 and Tables 6-8). Therefore, domain-specific motifs are largely a eukaryotic phenomenon (16% of all CNEs in Eukarya are d-s CNEs compared to 4% in Bacteria and 2% in Archaea). Thus, the identification of d-s CNEs focuses attention on special features that may play unique roles for ribosome biogenesis and function in eukaryotes.
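The classification rule and the percentages quoted above can be checked with a small Python sketch. The function name and its inputs are hypothetical; the rule shown (at least 90% conservation in the home domain and less than 50% in every other domain compared) is one reading of the thresholds stated in this paragraph. The CNE and d-s CNE counts are the ones given in the text.

```python
def is_domain_specific(own_pct, other_pcts, own_min=90.0, other_max=50.0):
    """Hypothetical rule: conserved at home, degenerate in the other domains."""
    return own_pct >= own_min and all(p < other_max for p in other_pcts)

# (d-s CNEs, all CNEs) per domain, as stated in the text
counts = {"Eukarya": (9, 57), "Bacteria": (2, 49), "Archaea": (1, 47)}

# share of CNEs that are domain-specific, rounded to whole percent
ds_share = {dom: round(100 * ds / total) for dom, (ds, total) in counts.items()}
print(ds_share)
```

The rounded shares reproduce the 16%, 4% and 2% figures quoted for Eukarya, Bacteria and Archaea.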

Sequence Alignments

[0063] All sequence alignments for the 23S-like molecule were obtained using the alignment tool in ARB (59). For alignment within each domain, a structural filter was employed using Saccharomyces cerevisiae (Sc; Eukarya; Accession J01355), Haloarcula marismortui (Hm; Archaea; Accession X13738), or Escherichia coli (Ec; Bacteria; Accession J01695). This process was repeated using a second structural filter from a different set of organisms: Arabidopsis thaliana (At; Eukarya; Accession X52320), Sulfolobus solfataricus (Ss; Archaea, Accession AE006720) and Clostridium ramosum (Cr; Bacteria; Accession ABFX02000008).
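A structural filter of this kind can be sketched in Python. This is not the ARB implementation; it is a minimal illustration that assumes the alignment is held as a dict of equal-length strings and that `-` and `.` mark gaps. Columns where the filtering organism has a gap (insertions in other organisms) are dropped, while gaps in other sequences at retained columns (deletions) are preserved.

```python
def structural_filter(aln, ref_name, gap_chars="-."):
    """Keep only alignment columns where the reference organism has a residue.

    aln maps sequence names to aligned strings of equal length;
    ref_name is the key of the filtering organism (hypothetical layout).
    """
    ref = aln[ref_name]
    keep = [j for j, ch in enumerate(ref) if ch not in gap_chars]
    return {name: "".join(seq[j] for j in keep) for name, seq in aln.items()}

example = {"Sc": "AC-GU", "other": "ACAG-"}
filtered = structural_filter(example, "Sc")
# filtered["Sc"] is "ACGU" and filtered["other"] is "ACG-"
```

Repeating the analysis with a second filtering organism, as described above, amounts to calling the same function with a different `ref_name`.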

Motif-Finding Algorithm and Information Content (IC) Scores

[0064] Motifs in the rRNA alignments were identified using the following algorithm. First, positions (columns in the alignment) were removed where 10% or more of the sequences contained a non-nucleotide character (e.g., an indel) at the position. For the remaining positions, a position weight matrix (PWM) of length 6 was computed starting at each position. The information content (IC) was computed for each PWM (60) by summing the relative entropy of each column using the following equation:

$$\mathrm{IC} = \sum_{i,j} P(i,j)\,\log_2\!\left[\frac{P(i,j)}{Q(i)}\right].$$

Here P(i,j) is the observed frequency of character i at position j in the motif, and Q(i) is the background frequency of character i across all positions of the alignment. In cases where P(i,j)=0, the corresponding term was set to zero,

$$P(i,j)\,\log_2\!\left[\frac{P(i,j)}{Q(i)}\right] = 0,$$

rather than using pseudocounts. Therefore, each summand (over j) is the relative entropy of the position. Note that if a position is 100% conserved, and the background frequencies are uniform, then the relative entropy of the position equals 2 (bits). Thus, a 100% conserved motif of length L has IC=2L. A position was considered to indicate a conserved motif of length 6 if the IC score of its PWM was at least 11.0; overlapping motifs were then merged into longer motifs to derive the CNEs. Note that the IC scores for the merged CNEs can only be compared between different CNEs if normalized for the various CNE lengths.
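
The motif-finding steps above can be sketched in Python as follows. This is an illustrative reading, not the authors' code: the function names are hypothetical, P(i,j) is assumed to be normalized over nucleotide characters only, and Q(i) is assumed to be computed over the columns that survive the 10% filter.

```python
import math
from collections import Counter

NUCLEOTIDES = "ACGU"


def kept_columns(alignment, max_non_nt=0.10):
    """Indices of columns where fewer than 10% of rows hold a non-nucleotide."""
    n_rows = len(alignment)
    return [
        c for c in range(len(alignment[0]))
        if sum(row[c] not in NUCLEOTIDES for row in alignment) / n_rows < max_non_nt
    ]


def background(alignment, columns):
    """Q(i): frequency of each nucleotide over all retained positions (assumed)."""
    counts = Counter(
        row[c] for row in alignment for c in columns if row[c] in NUCLEOTIDES
    )
    total = sum(counts.values())
    return {nt: counts[nt] / total for nt in NUCLEOTIDES}


def column_relative_entropy(alignment, column, q):
    """Sum over i of P(i,j) * log2(P(i,j) / Q(i)) for one column j."""
    counts = Counter(row[column] for row in alignment if row[column] in NUCLEOTIDES)
    total = sum(counts.values())
    # Only observed characters are visited, so terms with P(i,j) = 0
    # contribute nothing, matching the "set to zero" convention.
    return sum((n / total) * math.log2((n / total) / q[nt]) for nt, n in counts.items())


def find_cnes(alignment, window=6, min_ic=11.0):
    """Return merged (start, end) spans, end exclusive, in retained-column indices."""
    columns = kept_columns(alignment)
    q = background(alignment, columns)
    scores = [column_relative_entropy(alignment, c, q) for c in columns]
    spans = []
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) >= min_ic:
            end = start + window
            if spans and start < spans[-1][1]:  # overlaps the previous hit
                spans[-1] = (spans[-1][0], end)
            else:
                spans.append((start, end))
    return spans


# Toy input: ten identical rows with uniform base composition, so every
# column scores 2 bits and each 6-column window has IC = 12.
identical = ["ACGUACGU"] * 10
print(find_cnes(identical))
```

With the toy input, the three overlapping length-6 windows merge into a single span covering all eight columns, which is the 100% conserved case (IC = 2L per window) noted in the text.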

Homology Modeling for 2D and 3D Structures

[0065] Homologous sequence positions in the three domains of life were obtained using the ARB (V. 07.12.07) sequence aligner tool matched to S. cerevisiae (Eukarya), H. marismortui (Archaea), or E. coli (Bacteria) for modeling onto the 23S-25S rRNA secondary structures which were downloaded and modified from the Comparative RNA Website (61). The S. cerevisiae X-ray crystal structure was used for three-dimensional modeling (62; PDB 3U5D) using MacPyMol (2006 DeLano Scientific LLC).

Calculating Percent Conservation of CNEs

[0066] The consensus sequence for the CNE motif in each domain (eCNE, aCNE, bCNE) was derived using WebLogo (63). Percent conservation for each CNE was calculated in two steps, without the use of structural filters. First, the frequency of mismatches relative to the consensus sequence was computed for each position in the alignment, and an average mismatch count was determined based on the total number of aligned sequences. In this calculation, an indel of one or more nucleotides (insertion or deletion) was penalized as a single-nucleotide mismatch. Next, the percent conservation was calculated from the frequency of mismatches: conservation=(L-M)/L, where L is the motif length and M is the average mismatch. The same method was used to calculate the percent conservation of a given CNE when compared to the consensus motif of its homologous position (based on the ARB secondary structure alignment) in each of the other two domains.
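
The two-step calculation can be sketched as below. The input representation is an assumption: each sequence is given as a row already aligned to the consensus row, with `-` marking gaps in either row, and a run of consecutive gap columns is counted as one mismatch. The function names and the toy rows are hypothetical.

```python
def count_mismatches(consensus_row, sequence_row, gap="-"):
    """Mismatches of one sequence against the consensus; an indel run counts once."""
    mismatches = 0
    in_indel = False
    for c, s in zip(consensus_row, sequence_row):
        if c == gap and s == gap:
            continue  # column empty in both rows
        if c == gap or s == gap:
            if not in_indel:  # first column of an indel run
                mismatches += 1
            in_indel = True
        else:
            in_indel = False
            if c != s:
                mismatches += 1
    return mismatches


def percent_conservation(consensus_row, sequence_rows, gap="-"):
    """100 * (L - M) / L, with L the motif length and M the average mismatch."""
    motif_length = sum(ch != gap for ch in consensus_row)
    average_mismatch = sum(
        count_mismatches(consensus_row, row, gap) for row in sequence_rows
    ) / len(sequence_rows)
    return 100.0 * (motif_length - average_mismatch) / motif_length


# Toy rows (hypothetical) aligned to the uCNE 1 consensus from Table 1.
rows = ["CCGAUAG", "CCGAUAG", "CCGAUAA", "CC--UAG"]
print(round(percent_conservation("CCGAUAG", rows), 1))
```

In the toy rows the third sequence has one substitution and the fourth has one two-nucleotide deletion, so M = 0.5 and the conservation is (7 - 0.5)/7, about 92.9%.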

Identification of Universally Conserved Nucleotide Elements (uCNEs)

[0067] Homology modeling was used to position the CNEs from each domain of life onto the secondary structure of rRNA (FIGS. 1, 6, and 7). Although less than half of the CNEs discovered in one domain overlap in structural position with CNEs in the other domains of life, there were 22 universal CNEs (uCNEs) that are structurally conserved in their position in rRNA in all forms of life (FIG. 2). We quantified the sequence conservation of the 22 uCNEs (Table 1): all but 4 display at least 80% sequence conservation in all three phylogenetic domains, and 9 of the 22 display over 90% sequence conservation in all three domains. The uCNEs are of high statistical significance (Table 5), and, as expected, most of them reside within regions important for translation such as the peptidyl transferase center (uCNE6-8), and regions that undergo conformational changes including the sarcin-ricin loop (uCNE9), GTPase-associated center (uCNE19), and bridges between the ribosomal subunits (uCNE4 and uCNE5) (FIG. 2). Interestingly, however, some universal CNEs do not correspond to sites of known function, demonstrating the power of our approach to highlight as-yet-uncharacterized features of the ribosome warranting future study.

[0068] To identify the universal CNEs, the coordinates of the CNEs in each domain of life were aligned in ARB to identify all motifs that were structurally conserved in position (uCNE). The longest commonly shared core of each structurally conserved CNE was then used to define the 5' and 3' uCNE coordinates. To derive the uCNE consensus sequence, a consensus was derived first in each individual domain of life, before deriving the final universal sequence that represents the consensus of the three domains. An "N" is used to indicate positions where a consensus could not be derived. Percent conservation was calculated as described in the preceding section.
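
A minimal sketch of the two operations described above, the shared core and the universal consensus, is given below. Two points are assumptions rather than statements from the text: the three CNEs are taken to be already mapped into one coordinate frame (a constant offset stands in for the ARB alignment here), and the universal consensus is taken by simple majority with "N" where no base has a majority. The function names are hypothetical.

```python
from collections import Counter


def shared_core(intervals):
    """Intersection of inclusive (start, end) intervals in one coordinate frame."""
    start = max(s for s, _ in intervals)
    end = min(e for _, e in intervals)
    return (start, end) if start <= end else None


def universal_consensus(domain_consensuses):
    """Column-wise majority over per-domain consensus strings; 'N' if no majority."""
    out = []
    for column in zip(*domain_consensuses):
        base, n = Counter(column).most_common(1)[0]
        out.append(base if n > len(column) / 2 else "N")
    return "".join(out)


# uCNE 1: eCNE 2 (Euk 338, length 8), aCNE 3 (Arc 449, length 8) and
# bCNE 3 (Bac 443, length 8) from Tables 2-4. Offsets of -112 and -106 map
# Arc and Bac positions to Euk positions, as implied by Table 7
# (Arc 449 = Euk 337 = Bac 443); a constant offset is a simplification.
core = shared_core([
    (338, 345),
    (449 - 112, 456 - 112),
    (443 - 106, 450 - 106),
])
print(core)

# uCNE 12: the corresponding 9-nt stretches of eCNE 38, aCNE 27 and bCNE 25.
print(universal_consensus(["AGUAACUAU", "GGUAACUAU", "CGUAACUAU"]))
```

Under these assumptions the core spans Euk 338-344 (7 nt), matching the position and length of uCNE 1 in Table 1, and the majority rule yields NGUAACUAU, matching the uCNE 12 motif.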

Statistical Tests

[0069] To assess the statistical significance of the observed CNEs, p-values were computed by comparing the number of CNEs of a given length to the number of conserved motifs observed in random sequences obtained by permuting the columns of the rRNA alignment. This permutation approach generates a random alignment with the same base composition as the actual rRNA data set, but where the positions of the nucleotide similarities are not preserved. For each such random alignment, the number of conserved sequence motifs with length and information content at least as large as those in the actual rRNA alignments was counted, by computing the IC of position weight matrices in sliding windows across the alignment. 500 permutations were used for all calculations. This permutation test was computed separately in each domain of life to calculate intra-domain p-values. The permutation test was also computed on the merged alignment to compute a p-value for each uCNE. From these p-values, the False Discovery Rate (FDR; 64) for the number of observed CNEs was computed using the method of Siegmund et al. (65).
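
The column-permutation test can be sketched as follows. Because Q(i) is defined over the whole alignment, permuting columns leaves each column's relative entropy unchanged, so the sketch simply shuffles a list of per-column scores. The empirical p-value formula (fraction of permutations with at least as many qualifying windows as observed), the fixed-length window count, and the function names are assumptions; the FDR step of Siegmund et al. is not implemented here.

```python
import random


def count_windows(scores, length, min_ic):
    """Number of sliding windows of the given length whose summed IC >= min_ic."""
    return sum(
        1
        for start in range(len(scores) - length + 1)
        if sum(scores[start:start + length]) >= min_ic
    )


def permutation_pvalue(scores, length, min_ic, n_perm=500, seed=0):
    """Observed window count and the fraction of permutations matching or exceeding it."""
    rng = random.Random(seed)
    observed = count_windows(scores, length, min_ic)
    shuffled = list(scores)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # equivalent to permuting alignment columns
        if count_windows(shuffled, length, min_ic) >= observed:
            hits += 1
    return observed, hits / n_perm


# Toy per-column scores (hypothetical): 8 fully conserved columns in a row,
# followed by 92 nearly uninformative columns.
toy_scores = [2.0] * 8 + [0.1] * 92
observed, p_value = permutation_pvalue(toy_scores, length=6, min_ic=11.0)
print(observed, p_value)
```

In the toy data the eight conserved columns form three qualifying windows; a shuffle would have to keep all eight adjacent to match that count, which is why the empirical p-value comes out very small.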

REFERENCES

[0070] 1. Nissen P, Ban N, Hansen J, Moore P B and Steitz T A (2000) The structural basis of ribosome activity in peptide bond synthesis. Science 289: 920-930.
[0071] 2. Moore P B and Steitz T A (2010) The roles of RNA in the synthesis of protein. Cold Spring Harbor Perspect. Biol. doi: 10.1101/cshperspect.a003780.
[0072] 3. Noller H F (2012) Evolution of protein synthesis from an RNA world. Cold Spring Harbor Perspect. Biol. 4 (4): a003681. doi: 10.1101.
[0073] 4. Woese C R, Kandler O and Wheelis M L (1990) Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eucarya. Proc Nat Acad Sci 87: 4576-4579.
[0074] 5. Clark C G, Tague B W, Ware V C, and Gerbi S A (1984) Xenopus laevis 28S ribosomal RNA: a secondary structure model and its evolutionary and functional implications. Nucleic Acids Res 12: 6197-6220.
[0075] 6. Gutell R R, Larsen N and Woese C R (1994) Lessons from an evolving rRNA: 16S and 23S rRNA structures from a comparative perspective. Microbiol Rev 58(1): 10-26.
[0076] 7. Gerbi S A (1996) Expansion Segments: Regions of Variable Size that Interrupt the Universal Core Secondary Structure of Ribosomal RNA. "Ribosomal RNA--Structure, Evolution, Processing, and Function in Protein Biosynthesis" (eds.: R. A. Zimmermann and A. E. Dahlberg), 71-87.
[0077] 8. Roberts E, Sethi A, Montoya J, Woese C R and Luthey-Schulten Z (2008) Molecular signatures of ribosomal evolution. Proc Nat Acad Sci 105: 13953-13958.
[0078] 9. Dresios J, Panopoulos P and Synetos D (2006) Eukaryotic ribosomal proteins lacking a eubacterial counterpart: important players in ribosomal function. Mol Microbiol 59: 1651-1663.
[0079] 10. Ben-Shem A et al. (2011) The structure of the eukaryotic ribosome at 3.0 Å resolution. Science 334: 1524-1529.
[0080] 11. Klinge S, Voigts-Hoffmann F, Leibundgut M, Arpagaus S and Ban N (2011) Crystal structure of the eukaryotic 60S ribosomal subunit in complex with initiation factor 6. Science 334: 941-948.
[0081] 12. Pruesse E et al. (2007) SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. Nucleic Acids Res 35: 7188-7196.
[0082] 13. Yarza P et al. (2010) Update of the All-Species Living Tree Project based on 16S and 23S rRNA sequence analysis. Syst. Appl. Microbiol. 33: 291-299.
[0083] 14. Quast C et al. (2013) The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 41 (D1): D590-D596.
[0084] 15. Chan Y L, Endo Y and Wool I G (1983) The sequence of the nucleotides at the alpha-sarcin cleavage site in rat 28 S ribosomal ribonucleic acid. J Biol Chem 258: 12768-12770.
[0085] 16. Cannone J J et al. (2002) The comparative RNA web (CRW) site: an online database of comparative sequence and structure information for ribosomal, intron, and other RNAs. BMC Bioinformatics 3: 2. Erratum: BMC Bioinformatics 3: 15.
[0086] 17. Ban N, Nissen P, Hansen J, Moore P B and Steitz T A (2000) The complete atomic structure of the large ribosomal subunit at 2.4 Å resolution. Science 289: 905-920.
[0087] 18. Schluenzen F et al. (2000) Structure of functionally activated small ribosomal subunit at 3.3 angstroms resolution. Cell 102: 615-623.
[0088] 19. Wimberly B T et al. (2000) Structure of the 30S ribosomal subunit. Nature 407: 327-339.
[0089] 20. Yusupov M M et al. (2001) Crystal structure of the ribosome at 5.5 Å resolution. Science 292: 883-896.
[0090] 21. Ben-Shem A, Jenner L, Yusupova G and Yusupov M (2010) Crystal structure of the eukaryotic ribosome. Science 330: 1203-1209.
[0091] 22. Rabl J, Leibundgut M, Ataide S F, Haag A and Ban N (2011) Crystal structure of the eukaryotic 40S ribosomal subunit in complex with initiation factor 1. Science 331: 730-736.
[0092] 23. Noeske J and Cate J H D (2012) Structural basis for protein synthesis: snapshots of the ribosome in motion. Curr. Opin. Struct. Biol. 22: 743-749.
[0093] 24. Frank J and Agrawal R K (2000) A ratchet-like inter-subunit reorganization of the ribosome during translocation. Nature 406: 318-322.
[0094] 25. Spahn C M et al. (2001) Structure of the 80S ribosome from Saccharomyces cerevisiae--tRNA-ribosome and subunit-subunit interactions. Cell 107: 373-386.
[0095] 26. Polacek N and Mankin A S (2005) The ribosomal peptidyl transferase center: structure, function, evolution, inhibition. Crit. Rev. Biochem. Mol. Biol. 40: 285-311.
[0096] 27. Budkevich T et al. (2011) Structure and dynamics of the mammalian ribosomal pretranslocation complex. Mol. Cell 44: 214-224.
[0097] 28. Samaha R R, Green R and Noller H F (1995) A base pair between tRNA and 23S rRNA in the peptidyl transferase centre of the ribosome. Nature 377: 309-314. Erratum in Nature 378: 419 (1995).
[0098] 29. Green R, Switzer C and Noller H F (1998) Ribosome-catalyzed peptide bond formation with an A-site substrate covalently linked to 23S ribosomal RNA. Science 280: 286-289.
[0099] 30. Kim D F and Green R (1999) Base-pairing between 23S rRNA and tRNA in the ribosomal A site. Mol. Cell 4: 859-864.
[0100] 31. Blanchard S C and Puglisi J D (2001) Solution structure of the A loop of 23S ribosomal RNA. Proc. Nat. Acad. Sci. 98: 3720-3725.
[0101] 32. Hansen J L, Schmeing T M, Moore P B and Steitz T A (2002) Structural insights into peptide bond formation. Proc. Nat. Acad. Sci. 99: 11670-11675.
[0102] 33. Shi X, Khade P K, Sanbonmatsu K Y and Joseph S (2012) Functional role of the sarcin-ricin loop of the 23S rRNA in the elongation cycle of protein synthesis. J. Mol. Biol. 419: 125-138.
[0103] 34. Li W, Sengupta J, Rath B K and Frank J (2006) Functional conformations of the L11-ribosomal RNA complex revealed by correlative analysis of cryo-EM and molecular dynamics simulations. RNA 12: 1240-1253.
[0104] 35. Cundliffe E (1987) On the nature of antibiotic binding sites in ribosomes. Biochimie 69: 863-869.
[0105] 36. Gao Y G et al. (2009) The structure of the ribosome with elongation factor G trapped in the posttranslocation state. Science 326: 694-699.
[0106] 37. Li W, Trabuco L G, Schulten K and Frank J (2011) Molecular dynamics of EF-G during translocation. Proteins 79: 1478-1486.
[0107] 38. Yassin A and Mankin A S (2007) Potential new antibiotic sites in the ribosome revealed by deleterious mutations in RNA of the large ribosomal subunit. J. Biol. Chem. 282: 24329-24342.
[0108] 39. Gao H et al. (2003) Study of the structural dynamics of the E. coli 70S ribosome using real-space refinement. Cell 113: 789-801.
[0109] 40. Jenner L et al. (2012) Crystal structure of the 80S yeast ribosome. Curr. Opin. Struct. Biol. 22: 759-767.
[0110] 41. Rheinberger H J, Sternbach H and Nierhaus K H (1981) Three tRNA binding sites on Escherichia coli ribosomes. Proc. Nat. Acad. Sci. 78: 5310-5314.
[0111] 42. Cornish P V et al. (2009) Following movement of the L1 stalk between three functional states in single ribosomes. Proc Nat Acad Sci 106: 2571-2576.
[0112] 43. Munro J B et al. (2010) Spontaneous formation of the unlocked state of the ribosome is a multistep process. Proc Nat Acad Sci 107: 709-714.
[0113] 44. Korostelev A, Ermolenko D N, and Noller H F (2008) Structural dynamics of the ribosome. Curr. Opin. Chem. Biol. 12: 674-683.
[0114] 45. Trabuco L G et al. (2010) The role of L1 stalk-tRNA interaction in the ribosome elongation cycle. J Mol Biol 402: 741-760.
[0115] 46. Schmeing T M, Moore P B and Steitz T A (2003) Structure of deacylated tRNA mimics bound to the E site of the large ribosomal subunit. RNA 9: 1345-1352.
[0116] 47. Selmer M et al. (2006) Structure of the 70S ribosome complexed with mRNA and tRNA. Science 313: 1935-1942.
[0117] 48. Bokov K and Steinberg S V (2009) A hierarchical model for evolution of 23S ribosomal RNA. Nature 457: 977-980.
[0118] 49. Dunkle J A et al. (2011) Structure of the bacterial ribosome in classical and hybrid states of tRNA binding. Science 332: 981-984.
[0119] 50. Yonath A, Leonard K R and Wittmann H G (1987) A tunnel in the large ribosomal subunit revealed by three-dimensional image reconstruction. Science 236: 813-816.
[0120] 51. Gabashvili I S et al. (2001) The polypeptide tunnel system in the ribosome and its gating in erythromycin resistance mutants of L4 and L22. Mol. Cell 8: 181-188.
[0121] 52. Harms J et al. (2001) High resolution structure of the large ribosomal subunit from a mesophilic eubacterium. Cell 107: 679-688.
[0122] 53. Jenni S and Ban N (2003) The chemistry of protein synthesis and voyage through the ribosomal tunnel. Curr. Opin. Struct. Biol. 13: 212-219.
[0123] 54. Voss N R, Gerstein M, Steitz T A and Moore P B (2006) The geometry of the ribosomal polypeptide exit tunnel. J. Mol. Biol. 360: 893-906.
[0124] 55. Wilson D N and Beckmann R (2011) The ribosomal tunnel as a functional environment for nascent polypeptide folding and translational stalling. Curr. Opin. Struct. Biol. 21: 274-282.
[0125] 56. Ludwig W et al. (2004) ARB: a software environment for sequence data. Nucleic Acids Res 32: 1363-1371.
[0126] 57. Crooks G E, Hon G, Chandonia J M and Brenner S E (2004) WebLogo: A sequence logo generator. Genome Res 14: 1188-1190.
[0127] 58. Pruesse E et al. (2007) SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. Nucleic Acids Res 35: 7188-7196.
[0128] 59. Ludwig W et al. (2004) ARB: a software environment for sequence data. Nucleic Acids Res 32: 1363-1371.
[0129] 60. Stormo G D (2000) DNA binding sites: representation and discovery. Bioinformatics 16: 16-23.
[0130] 61. Cannone J J et al. (2002) The comparative RNA web (CRW) site: an online database of comparative sequence and structure information for ribosomal, intron, and other RNAs. BMC Bioinformatics 3: 2. Erratum: BMC Bioinformatics 3: 15.
[0131] 62. Ben-Shem A et al. (2011) The structure of the eukaryotic ribosome at 3.0 Å resolution. Science 334: 1524-1529.
[0132] 63. Crooks G E, Hon G, Chandonia J M and Brenner S E (2004) WebLogo: A sequence logo generator. Genome Res 14: 1188-1190.
[0133] 64. Benjamini Y and Hochberg Y (1995) Controlling the false discovery rate. J. R. Stat. Soc. Ser. B 57: 289-300.
[0134] 65. Siegmund D O, Zhang N R and Yakir B (2011) False discovery rates for scanning statistics. Biometrika 98: 979-985.

TABLE 1 Conservation of universally distributed conserved nucleotide elements (CNEs).
uCNE No.	Motif	Length	5'-Euk	% Euk	5'-Arc	% Arc	5'-Bac	% Bac
1	CCGAUAG (SEQ ID NO: 1)	7	338	97.7	450	98.8	444	97.8
2	CCUAAG (SEQ ID NO: 2)	6	1530	98.4	1453	96.5	1350	98.4
3	CGUACC (SEQ ID NO: 3)	6	1830	90.1	1672	99.0	1600	94.9
4	UAACUU (SEQ ID NO: 4)	6	1918	94.3	1763	92.2	1688	98.3
5	GACUGUUUA (SEQ ID NO: 5)	9	2129	94.7	1825	93.4	1772	97.4
6	AAGACCC (SEQ ID NO: 6)	7	2400	99.7	2097	98.3	2059	99.8
7	GGAUAAC (SEQ ID NO: 7)	7	2811	99.7	2477	97.2	2446	100.0
8	GAGCUGGGUUUA (SEQ ID NO: 8)	12	2941	99.7	2606	98.0	2576	93.9
9	UAGUACGAGAGGAAC (SEQ ID NO: 9)	15	3017	98.5	2685	97.8	2653	92.8
10	CUGGUUCCC (SEQ ID NO: 10)	9	937	95.0	898	91.1	806	86.3
11	CAAACUC (SEQ ID NO: 11)	7	1044	85.3	1003	96.8	908	95.5
12	NGUAACUAU (SEQ ID NO: 12)	9	2251	88.4	1947	83.5	1909	88.0
13	ACNCUCUUAAGGUAGC (SEQ ID NO: 13)	16	2261	93.1	1957	95.0	1919	81.0
14	GCAUGAA (SEQ ID NO: 14)	7	2306	99.7	2002	98.8	1964	86.6
15	ACUGUCCC (SEQ ID NO: 15)	8	2331	99.5	2027	98.1	1989	81.7
16	AGCUUUACU (SEQ ID NO: 16)	9	2412	88.8	2109	93.8	2071	91.7
17	UUGNUACCUCGAUGUCG (SEQ ID NO: 17)	17	2857	82.1	2523	90.7	2492	88.1
18	GACCGUCGUGAGACAGGU (SEQ ID NO: 18)	18	2953	99.7	2618	97.6	2588	88.1
19	CGUAACAG (SEQ ID NO: 19)	8	1266	74.6	1195	95.2	1092	89.8
20	UNCCUUGUC (SEQ ID NO: 20)	9	2281	77.5	1977	93.2	1939	88.3
21	GCAUCUA (SEQ ID NO: 21)	7	3112	78.9	2781	99.8	2751	94.7
22	CAUCCUG (SEQ ID NO: 22)	7	2882	56.6	2548	94.4	2517	95.0

TABLE 2 Conserved nucleotide elements (CNEs) in Eukarya. Length and IC score are given for each structural filter (S. cerevisiae and A. thaliana).
eCNE No.	eCNE motif	Position	Length (S. cerevisiae)	IC Score (S. cerevisiae)	Length (A. thaliana)	IC Score (A. thaliana)
1	GCUAAAUA (SEQ ID NO: 23)	319	8	15.0	8	15.1
2	CCGAUAGC (SEQ ID NO: 24)	338	8	15.2	8	15.3
3	AACAAGUAC (SEQ ID NO: 25)	347	9	17.1	9	17.2
4	CCCGUCUUGAAACACGGACCAAG (SEQ ID NO: 26)	635	23	44.3	23	44.5
5	ACCCGAA (SEQ ID NO: 27)	800	7	12.8	7	12.9
6	AACUAUGC (SEQ ID NO: 28)	815	8	14.5	8	14.6
7	GAAACUCUG (SEQ ID NO: 29)	844	9	16.9	9	17.0
8	CUGACGUGCAAAUCG (SEQ ID NO: 30)	872	15	29.0	15	29.2
9	GGGCGAAAGACUAAUCGAACCAUCUAGUAGCUGGUUCCC (SEQ ID NO: 31)	907	39	72.2	39	74.2
10	AAGUUUCCCUCAGGAUAGC (SEQ ID NO: 32)	950	19	36.1	19	36.3
11	GGUAAAGC (SEQ ID NO: 33)	992	8	14.9	8	14.9
12	AAUGAUUAG (SEQ ID NO: 34)	1001	9	15.1	9	16.8
13	CCUAUUCUCAAACUUUAA (SEQ ID NO: 35)	1036	18	34.5	18	34.7
14	GUGGGCCA (SEQ ID NO: 36)	1112	8	15.0	8	15.0
15	UGGUAAGCA (SEQ ID NO: 37)	1124	9	16.7	9	16.7
16	ACUGGCG (SEQ ID NO: 38)	1135	7	12.8	7	12.8
17	UGAACC (SEQ ID NO: 39)	1150	6	11.2	6	11.2
18	GACAGCA (SEQ ID NO: 40)	1221	7	12.8	7	12.8
19	GACGGUGG (SEQ ID NO: 41)	1229	8	14.5	8	14.5
20	CAUGGAAGUCG (SEQ ID NO: 42)	1238	11	20.5	11	20.5
21	AUCCGCUAAGGAGUGUGUAACAACUCACCUGCCGAAU (SEQ ID NO: 43)	1251	37	71.8	37	71.7
22	CUAGCCCUGAAAAUGGAUGGCGCU (SEQ ID NO: 44)	1291	24	42.7	24	44.4
23	GCAGAUCUUGGU (SEQ ID NO: 45)	1430	12	22.3	12	22.4
24	GGUAGUAGCAAAUAUUCA (SEQ ID NO: 46)	1442	18	33.2	18	33.2
25	GGUUCCAU (SEQ ID NO: 47)	1491	8	14.5	8	14.5
26	UCCUAAG (SEQ ID NO: 48)	1529	7	13.3	7	13.3
27	UCUUUUCU (SEQ ID NO: 49)	1682	8	15.6	8	15.5
28	CCUUGAAAA (SEQ ID NO: 50)	1790	9	16.8	9	16.8
29	UCGUACU (SEQ ID NO: 51)	1829	7	12.5	7	12.5
30	AACCGCAUCAGGUCUCCAAGGU (SEQ ID NO: 52)	1839	22	41.5	22	41.5
31	CAGCCUCUG (SEQ ID NO: 53)	1864	9	16.3	9	16.4
32	AGGGAAGUCGGCAA (SEQ ID NO: 54)	1894	14	26.2	14	26.2
33	GAUCCGUAACUUCG (SEQ ID NO: 55)	1912	14	26.4	14	26.4
34	AGGAUUGGCUCU (SEQ ID NO: 56)	1931	12	22.6	12	22.6
35	AUCCGACUGUUUAAUUAAAACA (SEQ ID NO: 57)	2125	22	42.0	22	42.0
36	GUGAUUUCUGCCCAGUGCUCUGAAUGUCAA (SEQ ID NO: 58)	2184	30	58.5	30	58.4
37	AAUUCAA (SEQ ID NO: 59)	2222	7	13.1	7	13.0
38	CAAGCGCGGGUAAACGGCGGGAGUAACUAUGACUCUCUUAAGGUAGCCAAAUGCCUCGUCAUCUAAUUA (SEQ ID NO: 60)	2230	69	133.2	69	133.1
39	UGACGCGCAUGAAUGGAUUAACGAGAUUCCCACUGUCCCUA (SEQ ID NO: 61)	2300	41	79.0	41	78.9
40	UCUACUAUCUAGCGAAACCACAGCCAAGGGAACGG (SEQ ID NO: 62)	2341	35	63.0	35	64.4
41	AGCGGGGAAAGAAGACCCUGUUGAGCUUGACUCUAGU (SEQ ID NO: 63)	2389	37	71.9	37	71.9
42	AAAUACCACUAC (SEQ ID NO: 64)	2483	12	22.1	12	22.1
43	UUUACUUAUU (SEQ ID NO: 65)	2507	10	18.5	10	18.4
44	GAGUUUG (SEQ ID NO: 66)	2607	7	13.0	7	12.9
45	CUGGGGC (SEQ ID NO: 67)	2615	7	13.2	7	13.2
46	ACAUCUGU (SEQ ID NO: 68)	2625	8	15.2	8	15.2
47	GUGUCCUAAG (SEQ ID NO: 69)	2648	10	18.3	10	18.3
48	ACAGAAAUCU (SEQ ID NO: 70)	2673	10	18.4	9	16.6
49	UUGAUU (SEQ ID NO: 71)	2709	6	11.2	6	11.2
50	AUUUUCAGU (SEQ ID NO: 72)	2717	8	14.7	6	11.0
51	UGGCCUAUCGAUCCUUUA (SEQ ID NO: 73)	2748	18	33.6	18	33.6
52	CAGAAAAGUUACCACAGGGAUAACUGGCUUGUGGC (SEQ ID NO: 74)	2794	35	68.4	35	68.3
53	GCCAAGCGUUCAUAGCGACGUUGCUUUUUGAUCCUUCGAUGUCGGCUCUUCCUAUCAUU (SEQ ID NO: 75)	2830	59	115.5	59	115.4
54	GGAUUGUUCACCCAC (SEQ ID NO: 76)	2913	15	29.3	15	29.3
55	AGGGAACGUGAGCUGGGUUUAGACCGUCGUGAGACAGGUUAGUUUUACCCUACUGA (SEQ ID NO: 77)	2932	56	109.5	56	109.4
56	UAGUACGAGAGGAAC (SEQ ID NO: 78)	3017	15	28.3	15	28.2
57	CGCCUCUA (SEQ ID NO: 79)	3111	8	14.7	8	14.7

TABLE 3 Conserved nucleotide elements (CNEs) in Archaea. Length and IC score are given for each structural filter (H. marismortui and S. solfataricus).
aCNE No.	aCNE motif	Position	Length (H. marismortui)	IC Score (H. marismortui)	Length (S. solfataricus)	IC Score (S. solfataricus)
1	AAACAUCUUA (SEQ ID NO: 80)	165	10	18.1	10	18.1
2	UAAAUA (SEQ ID NO: 81)	434	6	11.1	6	11.1
3	ACCGAUAG (SEQ ID NO: 82)	449	8	15.2	8	15.2
4	CUGAAAAG (SEQ ID NO: 83)	480	8	15.4	8	15.4
5	UGAAAC (SEQ ID NO: 84)	517	6	11.4	6	11.4
6	CGAUCUA (SEQ ID NO: 85)	773	7	13.4	7	13.4
7	CCAAUC (SEQ ID NO: 86)	878	6	11.2	6	11.2
8	CUGGUUCCC (SEQ ID NO: 87)	898	9	16.1	9	16.1
9	UCAAACUCCGAA (SEQ ID NO: 88)	1002	12	22.8	12	22.8
10	GGUUAAGG (SEQ ID NO: 89)	1092	8	14.3	8	14.3
11	CUAAGUG (SEQ ID NO: 90)	1114	7	13.0	7	13.0
12	AGCAGC (SEQ ID NO: 91)	1173	6	11.4	6	11.4
13	CGUAACAG (SEQ ID NO: 92)	1195	8	14.5	8	14.5
14	UGGACC (SEQ ID NO: 93)	1336	6	11.2	6	11.2
15	AUCCUG (SEQ ID NO: 94)	1356	6	11.0	N/D	N/D
16	GGUCCUAAG (SEQ ID NO: 95)	1450	9	16.1	9	16.1
17	GGUUAAUAUUCC (SEQ ID NO: 96)	1495	12	24.5	12	24.5
18	CGUAAU (SEQ ID NO: 97)	1592	6	11.1	6	11.1
19	UGAAAA (SEQ ID NO: 98)	1651	6	11.1	6	11.1
20	CCGUACC (SEQ ID NO: 99)	1671	7	13.1	7	13.1
21	AGGGAA (SEQ ID NO: 100)	1739	6	11.0	6	11.0
22	UCGGCAAAUU (SEQ ID NO: 101)	1746	10	18.9	10	18.9
23	UAACUU (SEQ ID NO: 102)	1763	6	11.2	6	11.2
24	GUCGCA (SEQ ID NO: 103)	1803	6	11.1	6	11.1
25	GACUGUUUAAU (SEQ ID NO: 104)	1825	11	20.1	11	20.1
26	AACAUA (SEQ ID NO: 105)	1839	6	11.1	6	11.1
27	GGUAACUAU (SEQ ID NO: 106)	1947	9	16.8	9	16.8
28	ACCCUCUUAAGGUAGC (SEQ ID NO: 107)	1957	16	31.6	16	31.6
29	UACCUUGCC (SEQ ID NO: 108)	1977	9	16.2	9	16.1
30	GCAUGAAU (SEQ ID NO: 109)	2002	8	15.2	8	15.2
31	CACUGUCCC (SEQ ID NO: 110)	2026	9	17.2	9	17.2
32	AAGACCC (SEQ ID NO: 111)	2097	7	13.4	7	13.4
33	GAGCUUUACUGCA (SEQ ID NO: 112)	2108	13	24.4	13	24.4
34	GCAGUU (SEQ ID NO: 113)	2269	6	11.2	6	11.2
35	AGAAAA (SEQ ID NO: 114)	2461	6	11.1	6	11.1
36	CUACCCC (SEQ ID NO: 115)	2468	7	12.4	7	12.4
37	GGAUAAC (SEQ ID NO: 116)	2477	7	12.8	7	12.8
38	UUGCUACCUC (SEQ ID NO: 117)	2523	10	18.6	10	18.6
39	GAUGUCG (SEQ ID NO: 118)	2533	7	13.0	7	13.0
40	CCAUCCUGG (SEQ ID NO: 119)	2547	9	15.2	9	16.8
41	AAGGGU (SEQ ID NO: 120)	2571	6	11.3	6	11.3
42	CCUAUUAAAGG (SEQ ID NO: 121)	2588	11	20.3	11	20.3
43	UGAGCUGGGUUUAGACCGUCGUGAGACAGGU (SEQ ID NO: 122)	2605	31	58.3	31	58.3
44	UAGUACGAGAGGAAC (SEQ ID NO: 123)	2685	15	28.0	15	28.0
45	GUUGUC (SEQ ID NO: 124)	2726	6	11.2	6	11.1
46	GCUGAA (SEQ ID NO: 125)	2774	6	11.4	6	11.4
47	GCAUCUAAGC (SEQ ID NO: 126)	2781	10	19.7	10	19.7

TABLE 4 Conserved nucleotide elements (CNEs) in Bacteria. Length and IC score are given for each structural filter (E. coli and C. ramosum).
bCNE No.	bCNE motif	Position	Length (E. coli)	IC Score (E. coli)	Length (C. ramosum)	IC Score (C. ramosum)
1	UGAAACAUCU (SEQ ID NO: 127)	193	10	19.3	10	19.3
2	CUAAAU (SEQ ID NO: 128)	426	6	11.1	6	11.1
3	ACCGAUAG (SEQ ID NO: 129)	443	8	15.0	8	15.0
4	AGUACCGU (SEQ ID NO: 130)	457	8	15.1	8	15.1
5	CCUUUUG (SEQ ID NO: 131)	564	7	13.4	7	13.4
6	ACCCGAA (SEQ ID NO: 132)	670	7	13.2	7	13.2
7	UGAUCUA (SEQ ID NO: 133)	683	7	13.2	7	13.2
8	CCGAAC (SEQ ID NO: 134)	731	6	11.3	6	11.3
9	AAUAGCUGGUUCUCC (SEQ ID NO: 135)	802	14	26.0	14	26.0
10	AGCACU (SEQ ID NO: 136)	863	6	11.5	6	11.5
11	CAAACUC (SEQ ID NO: 137)	908	7	12.8	7	12.8
12	GAAGCAGCCA (SEQ ID NO: 138)	1068	10	18.4	10	18.4
13	GCGUAAUAGCUCACUG (SEQ ID NO: 139)	1091	16	29.9	16	29.9
14	UGAGUA (SEQ ID NO: 140)	1263	6	11.1	6	11.1
15	CCUAAG (SEQ ID NO: 141)	1350	6	11.5	6	11.5
16	CGUACC (SEQ ID NO: 142)	1600	6	11.2	6	11.2
17	ACCGACAC (SEQ ID NO: 143)	1610	8	15.2	8	15.2
18	AGGAACUCGGC (SEQ ID NO: 144)	1665	11	20.2	11	20.2
19	CCGUAACUUCGG (SEQ ID NO: 145)	1685	12	22.6	12	22.6
20	GACUGUUUA (SEQ ID NO: 146)	1772	9	17.8	9	17.8
21	AAAAACACAG (SEQ ID NO: 147)	1783	10	18.6	10	18.6
22	ACGCCUGCCCGGU (SEQ ID NO: 148)	1829	13	24.1	13	24.1
23	AAGCCC (SEQ ID NO: 149)	1889	6	11.2	6	11.2
24	AACGGC (SEQ ID NO: 150)	1900	6	11.3	6	11.2
25	CCGUAACUAUAACGGUCCUAAGGUAGCGAAAUUCCUUGUCGG (SEQ ID NO: 151)	1908	42	81.1	42	81.1
26	GUAAGUUCCGACCUGCACGAA (SEQ ID NO: 152)	1950	21	39.1	21	39.1
27	ACUGUCUC (SEQ ID NO: 153)	1989	8	15.9	8	15.9
28	AAAGACCCCGU (SEQ ID NO: 154)	2058	11	20.1	11	20.1
29	ACCUUUACU (SEQ ID NO: 155)	2071	9	17.5	9	17.5
30	UCUAAC (SEQ ID NO: 156)	2195	6	11.3	6	11.3
31	CAGUUUG (SEQ ID NO: 157)	2240	7	12.4	7	12.4
32	UGGGGC (SEQ ID NO: 158)	2249	6	11.2	6	11.2
33	GCCUCCCAA (SEQ ID NO: 159)	2259	9	16.4	9	16.4
34	GUAACGGA (SEQ ID NO: 160)	2271	8	15.1	8	15.1
35	UUGACUG (SEQ ID NO: 161)	2343	7	12.9	7	12.9
36	UAGUGAUCCG (SEQ ID NO: 162)	2387	10	19.6	10	19.6
37	UCGCUCAACG (SEQ ID NO: 163)	2419	10	18.9	10	18.9
38	GAUAAAAG (SEQ ID NO: 164)	2429	8	14.8	8	14.8
39	GGGAUAACAGGCUGAU (SEQ ID NO: 165)	2445	16	30.1	16	30.1
40	CACAUCGACG (SEQ ID NO: 166)	2475	10	18.4	10	18.4
41	GUUUGGCACCUCGAUGUCGGCUCAUC (SEQ ID NO: 167)	2490	26	51.8	26	51.8
42	CAUCCUGGG (SEQ ID NO: 168)	2517	9	16.1	9	16.1
43	GUCCCAAGGGU (SEQ ID NO: 169)	2536	11	20.7	11	20.7
44	GGCUGUUCGCC (SEQ ID NO: 170)	2549	11	22.1	11	22.1
45	UUAAAG (SEQ ID NO: 171)	2562	6	11.5	6	11.5
46	GAGCUGGGUUCA (SEQ ID NO: 172)	2576	12	22.4	12	22.4
47	GAACGUCGUGAGACAGUUCGGUCCCUAUCU (SEQ ID NO: 173)	2588	30	56.7	30	56.7
48	CUAGUACGAGAGGACC (SEQ ID NO: 174)	2652	16	29.9	16	29.9
49	GCAUCUAA (SEQ ID NO: 175)	2751	8	14.7	8	14.7

TABLE 5 Summary of conserved nucleotide element false discovery rates (FDRs).
Length	Euk CNE	Arc CNE	Bac CNE	Univ CNE
6	0.549517241	0.279764706	0.38428	0.116222222
7	0.288321429	0.197225806	0.229435897	0.053789474
8	0.198208333	0.141666667	0.156363636	0.025882353
9	0.158944444	0.097666667	0.110615385	0.0175
10	0.108258065	0.070923077	0.072818182	0.0086
11	0.066785714	0.047555556	0.050125	0.00475
12	0.040518519	0.031428571	0.033166667	0.0032
13	0.030666667	0.03	0.0248	0.002
14	0.020833333	0.023	0.017555556	0.000666667
15	0.015454545	0.0185	0.01225	0
16	0.013578947	0.02	0.008	0
17	0.011052632	0.022	0.0104	0
18	0.008210526	0.017	0.008	0
19	0.00775	0.01	0.0068	0
20	0.006933333	0.01	0.0064	0
21	0.006	0.007	0.0048	0
22	0.005333333	0.006	0.006	0
23	0.005846154	0.004	0.004	0
24	0.006727273	0.003	0.004	0
25	0.006363636	0.003	0.004	0
26	0.006	0.003	0.004	0
27	0.005818182	0.002	0.004	0
28	0.005454545	0.002	0.004	0
29	0.005454545	0.002	0.002	0
30	0.004727273	0.002	0.001333333	0

TABLE 6 Cross-domain conservation of CNEs from eukaryotes. The domain-specific (DS) CNEs are indicated.
eCNE No.	Euk Position	Euk Percent	Arc Position	Arc Percent	Bac Position	Bac Percent
1	319	98.0	432	87.6	425	92.7
2	338	97.8	450	94.3	444	89.6
3	347	98.1	459	73.5	453	83.5
4	635	98.3	619	73.3	562	47.4
5	800	96.9	760	82.8	670	98.0
6	815	96.5	775	69.3	685	66.5
7	844	97.7	804	57.3	714	32.5
8	872	98.8	832	66.0	741	39.0
9	907	97.2	868	68.2	776	68.7
10	950	97.7	911	56.1	819	46.6
11	992	98.7	951	79.2	858	89.5
12	1001	97.7	960	64.5	866	62.1
13	1036	97.3	995	56.1	900	46.3
14 (DS)	1112	99.0	1040	33.2	942	46.4
15	1124	98.2	1052	67.4	954	47.1
16 (DS)	1135	98.2	1063	38.9	966	40.7
17	1150	96.9	1078	67.3	981	57.4
18	1221	97.2	1150	86.6	1047	73.9
19	1229	99.0	1158	64.5	1055	54.6
20	1238	98.7	1167	58.9	1064	63.4
21	1251	98.7	1180	74.8	1077	64.5
22	1291	96.6	1220	65.6	1117	60.5
23 (DS)	1430	97.8	1352	45.5	1250	48.7
24	1442	97.9	1364	53.0	1262	45.5
25	1491	96.0	1414	72.1	1310	69.3
26	1529	97.9	1452	89.1	1349	85.6
27 (DS)	1682	96.4	1558	4.9	1456	23.6
28	1790	97.7	1649	76.2	1565	64.7
29	1829	91.6	1671	74.4	1599	70.4
30	1839	96.8	1681	58.1	1609	57.5
31	1864	93.5	1707	53.4	1634	42.6
32	1894	99.1	1739	92.1	1664	83.3
33	1912	96.8	1757	80.6	1682	81.6
34 (DS)	1931	98.2	1776	33.8	1701	45.2
35	2125	96.9	1821	80.5	1768	76.5
36	2184	98.6	1879	64.9	1826	70.6
37 (DS)	2222	97.3	1917	11.8	1871	-1191.7
38	2230	98.8	1926	72.3	1888	74.1
39	2300	98.2	1996	77.8	1958	66.2
40 (DS)	2341	96.7	2037	43.9	1999	34.3
41	2389	99.5	2086	73.0	2048	60.5
42	2483	96.5	2211	60.8	2169	69.9
43 (DS)	2507	96.4	2234	46.0	2192	42.9
44	2607	98.7	2270	87.0	2240	82.2
45	2615	99.2	2278	96.8	2248	97.4
46	2625	97.8	2288	51.3	2258	34.5
47	2648	97.0	2311	71.9	2280	65.1
48	2673	95.6	2336	62.5	2305	60.6
49	2709	97.1	2374	61.6	2343	77.6
50 (DS)	2717	94.3	2383	-1.2	2352	27.5
51	2748	96.6	2415	68.0	2382	54.7
52	2794	99.5	2460	76.5	2427	66.3
53	2830	98.1	2496	66.7	2465	68.9
54	2913	98.2	2578	65.1	2548	64.0
55	2932	99.4	2597	76.1	2567	68.4
56	3017	98.7	2685	97.8	2653	92.8
57	3111	90.2	2780	80.7	2750	70.3
DS = domain-specific conservation

TABLE 7 Cross-domain conservation of CNEs from Archaea. The domain-specific (DS) CNEs are indicated.
aCNE No.	Arc Position	Arc Percent	Euk Position	Euk Percent	Bac Position	Bac Percent
1	165	90.0	39	58.0	195	88.4
2	434	92.6	321	97.9	427	94.5
3	449	97.4	337	95.2	443	97.8
4	480	99.1	368	84.9	474	82.4
5	517	96.1	404	85.5	511	96.1
6	773	95.0	813	70.3	683	84.3
7	878	94.2	917	82.7	786	90.2
8	898	91.1	937	95.0	806	86.3
9	1002	96.0	1043	75.0	907	88.4
10	1092	96.4	1164	88.7	996	76.1
11	1114	95.3	1186	47.2	1018	91.3
12	1173	99.8	1244	50.3	1070	99.3
13	1195	95.2	1266	74.6	1092	89.8
14	1336	97.9	1414	72.9	1234	74.0
15	1356	93.6	1434	81.2	1254	68.8
16	1450	94.3	1527	89.4	1346	44.7
17	1495	98.6	1597	55.7	1388	90.1
18 (DS)	1592	95.3	1718	27.6	1492	41.7
19	1651	96.1	1793	98.6	1567	77.3
20	1671	95.2	1829	79.9	1599	93.4
21	1739	100.0	1894	99.7	1664	83.1
22	1746	96.4	1901	90.1	1671	88.8
23	1763	92.2	1918	94.3	1688	98.3
24	1803	96.7	2107	45.3	1750	85.9
25	1825	91.9	2129	95.4	1772	84.0
26	1839	92.2	2143	87.3	1786	83.8
27	1947	91.3	2251	88.5	1909	88.6
28	1957	97.6	2261	93.1	1919	81.3
29	1977	87.5	2281	66.5	1939	77.5
30	2002	97.4	2306	99.6	1964	83.5
31	2026	94.8	2330	98.5	1988	81.4
32	2097	98.3	2400	99.7	2059	99.8
33	2108	95.7	2411	76.9	2070	78.6
34	2269	95.7	2606	82.3	2239	89.1
35	2461	98.3	2795	99.2	2428	65.9
36	2468	87.0	2802	70.8	2437	71.6
37	2477	97.2	2811	99.7	2446	100.0
38	2523	93.0	2857	69.8	2492	79.9
39	2533	97.7	2867	99.7	2502	99.9
40	2547	95.1	2881	62.0	2516	83.3
41	2571	99.8	2906	81.1	2541	99.9
42	2588	93.0	2923	58.2	2558	78.0
43	2605	97.8	2940	99.7	2575	89.0
44	2685	97.8	3017	98.5	2653	92.8
45	2726	94.6	3059	59.2	2694	88.4
46	2774	98.4	3105	80.1	2744	94.6
47	2781	98.8	3112	74.5	2751	89.7
DS = domain-specific conservation

TABLE 8 Cross-domain conservation of CNEs from Bacteria. The domain-specific (DS) CNEs are indicated.
bCNE No.	Bac Position	Bac Percent	Euk Position	Euk Percent	Arc Position	Arc Percent
1	193	95.2	37	55.3	163	91.6
2	426	94.9	320	97.9	433	91.9
3	443	97.8	337	95.2	449	97.4
4	457	96.9	351	91.0	463	91.3
5	564	94.3	637	70.1	621	69.8
6	670	98.0	800	97.0	760	87.9
7	683	95.8	813	83.8	773	88.7
8	731	97.8	861	59.5	821	53.7
9	802	94.6	933	75.7	894	78.0
10 (DS)	863	95.9	997	49.2	956	46.1
11	908	95.4	1044	85.3	1003	96.7
12	1068	98.6	1242	60.6	1171	91.9
13	1091	95.8	1265	69.6	1194	76.1
14	1263	96.8	1443	66.8	1365	69.0
15	1350	98.2	1530	98.4	1453	96.5
16	1600	94.8	1830	90.1	1672	99.0
17	1610	98.0	1840	58.9	1682	93.5
18	1665	97.0	1895	81.5	1740	84.8
19	1685	97.2	1915	96.0	1760	91.8
20	1772	97.4	2129	94.7	1825	93.4
21	1783	98.9	2140	80.0	1836	87.8
22	1829	95.3	2187	69.1	1882	75.7
23	1889	97.6	2231	83.3	1927	73.1
24	1900	99.0	2242	99.2	1938	89.1
25	1908	98.3	2250	73.7	1946	77.0
26	1950	96.4	2292	62.5	1988	77.0
27	1989	94.1	2331	87.0	2027	85.6
28	2058	92.9	2399	81.4	2096	85.3
29	2071	93.4	2412	77.7	2109	82.9
30	2195	95.9	2510	51.8	2237	74.4
31	2240	89.5	2607	84.6	2270	90.5
32	2249	99.5	2616	99.2	2279	99.6
33	2259	90.1	2627	48.0	2290	55.9
34	2271	99.6	2639	75.6	2302	53.2
35	2343	94.6	2709	69.3	2374	75.6
36	2387	99.0	2753	69.4	2420	67.6
37 (DS)	2419	97.4	2787	30.1	2453	37.0
38	2429	99.1	2796	49.9	2462	51.5
39	2445	98.4	2810	81.0	2476	76.2
40	2475	93.9	2840	78.3	2506	73.7
41	2490	98.3	2855	76.7	2521	79.4
42	2517	95.0	2882	53.0	2548	66.4
43	2536	97.5	2901	57.9	2566	77.0
44	2549	99.4	2914	72.4	2579	89.1
45	2562	98.7	2927	42.9	2592	96.5
46	2576	96.5	2941	91.4	2606	92.0
47	2588	95.9	2953	66.5	2618	82.9
48	2652	97.6	3016	87.9	2684	87.1
49	2751	94.1	3112	80.4	2781	99.7
DS = domain-specific conservation

TABLE 9. Inter-subunit bridges involving 23S-28S rRNA CNEs.

Bridge*	uCNE	eCNE
B1a***	X	X
B2a	uCNE 12, 13	eCNE 38
B2b	uCNE 13	eCNE 36, 38
B2c	X	eCNE 36
B3	uCNE 20	eCNE 38, 39
B4***	X	eCNE 7
B5**	uCNE 4, 15	X
B6	uCNE 4	X
B7a	X	eCNE 36, 38

*Positions of bridges taken from Yusupov et al. (2001) for bacteria and Ben-Shem et al. (2010, 2011) for yeast.
**Bridge B5 has shifted upward from its positions in bacteria to sites in an expansion segment region in yeast.
***Bridges B1a and B4 are not clustered with the other rRNA-containing bridges.

TABLE 10. Correlation of domain-specific eCNEs with ribosome functions.

d-s eCNE No.	Function
14	tunnel; contacts eukaryotic-specific ribosomal protein L29e
16	tunnel; contacts eukaryotic-specific ribosomal protein L29e
23	tunnel
27	unknown
34	abuts expansion segment 27L
37	contacts eukaryotic-specific ribosomal protein L36e
40	tunnel
43	E site tRNA; abuts expansion segment 31L
50	contacts eukaryotic-specific ribosomal protein L29e

[0135] The teachings of all of the above references are hereby incorporated by reference in their entirety.

[0136] While this invention has been particularly shown and described with references to example embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the invention encompassed by the appended claims.

Sequence Listing

Number of sequences: 175. All entries are of type RNA, organism Artificial Sequence. Sequences are shown in groups of ten residues.

SEQ ID NOS 1-22: Universal conserved nucleotide elements

SEQ ID NO	Length	Sequence
1	7	ccgauag
2	6	ccuaag
3	6	cguacc
4	6	uaacuu
5	9	gacuguuua
6	7	aagaccc
7	7	ggauaac
8	12	gagcuggguu ua
9	15	uaguacgaga ggaac
10	9	cugguuccc
11	7	caaacuc
12	9	nguaacuau
13	16	acncucuuaa gguagc
14	7	gcaugaa
15	8	acuguccc
16	9	agcuuuacu
17	17	uugnuaccuc gaugucg
18	18	gaccgucgug agacaggu
19	8	cguaacag
20	9	unccuuguc
21	7	gcaucua
22	7	cauccug

SEQ ID NOS 23-79: Eukarya conserved nucleotide elements

SEQ ID NO	Length	Sequence
23	8	gcuaaaua
24	8	ccgauagc
25	9	aacaaguac
26	23	cccgucuuga aacacggacc aag
27	7	acccgaa
28	8	aacuaugc
29	9	gaaacucug
30	15	cugacgugca aaucg
31	39	gggcgaaaga cuaaucgaac caucuaguag cugguuccc
32	19	aaguuucccu caggauagc
33	8	gguaaagc
34	9	aaugauuag
35	18	ccuauucuca aacuuuaa
36	8	gugggcca
37	9	ugguaagca
38	7	acuggcg
39	6	ugaacc
40	7	gacagca
41	8	gacggugg
42	11	cauggaaguc g
43	37	auccgcuaag gaguguguaa caacucaccu gccgaau
44	24	cuagcccuga aaauggaugg cgcu
45	12	gcagaucuug gu
46	18	gguaguagca aauauuca
47	8	gguuccau
48	7	uccuaag
49	8	ucuuuucu
50	9	ccuugaaaa
51	7	ucguacu
52	22	aaccgcauca ggucuccaag gu
53	9	cagccucug
54	14	agggaagucg gcaa
55	14	gauccguaac uucg
56	12	aggauuggcu cu
57	22	auccgacugu uuaauuaaaa ca
58	30	gugauuucug cccagugcuc ugaaugucaa
59	7	aauucaa
60	69	caagcgcggg uaaacggcgg gaguaacuau gacucucuua agguagccaa augccucguc aucuaauua
61	41	ugacgcgcau gaauggauua acgagauucc cacugucccu a
62	35	ucuacuaucu agcgaaacca cagccaaggg aacgg
63	37	agcggggaaa gaagacccug uugagcuuga cucuagu
64	12	aaauaccacu ac
65	10	uuuacuuauu
66	7	gaguuug
67	7	cuggggc
68	8	acaucugu
69	10	guguccuaag
70	10	acagaaaucu
71	6	uugauu
72	9	auuuucagu
73	18	uggccuaucg auccuuua
74	35	cagaaaaguu accacaggga uaacuggcuu guggc
75	59	gccaagcguu cauagcgacg uugcuuuuug auccuucgau gucggcucuu ccuaucauu
76	15	ggauuguuca cccac
77	56	agggaacgug agcuggguuu agaccgucgu gagacagguu aguuuuaccc uacuga
78	15	uaguacgaga ggaac
79	8	cgccucua

SEQ ID NOS 80-126: Archaea conserved nucleotide elements

SEQ ID NO	Length	Sequence
80	10	aaacaucuua
81	6	uaaaua
82	8	accgauag
83	8	cugaaaag
84	6	ugaaac
85	7	cgaucua
86	6	ccaauc
87	9	cugguuccc
88	12	ucaaacuccg aa
89	8	gguuaagg
90	7	cuaagug
91	6	agcagc
92	8	cguaacag
93	6	uggacc
94	6	auccug
95	9	gguccuaag
96	12	gguuaauauu cc
97	6	cguaau
98	6	ugaaaa
99	7	ccguacc
100	6	agggaa
101	10	ucggcaaauu
102	6	uaacuu
103	6	gucgca
104	11	gacuguuuaa u
105	6	aacaua
106	9	gguaacuau
107	16	acccucuuaa gguagc
108	9	uaccuugcc
109	8	gcaugaau
110	9	cacuguccc
111	7	aagaccc
112	13	gagcuuuacu gca
113	6	gcaguu
114	6	agaaaa
115	7	cuacccc
116	7	ggauaac
117	10	uugcuaccuc
118	7	gaugucg
119	9	ccauccugg
120	6	aagggu
121	11	ccuauuaaag g
122	31	ugagcugggu uuagaccguc gugagacagg u
123	15	uaguacgaga ggaac
124	6	guuguc
125	6	gcugaa
126	10	gcaucuaagc

SEQ ID NOS 127-175: Bacteria conserved nucleotide elements

SEQ ID NO	Length	Sequence
127	10	ugaaacaucu
128	6	cuaaau
129	8	accgauag
130	8	aguaccgu
131	7	ccuuuug
132	7	acccgaa
133	7	ugaucua
134	6	ccgaac
135	14	auagcugguu cucc
136	6	agcacu
137	7	caaacuc
138	10	gaagcagcca
139	16	gcguaauagc ucacug
140	6	ugagua
141	6	ccuaag
142	6	cguacc
143	8	accgacac
144	11	aggaacucgg c
145	12	ccguaacuuc gg
146	9	gacuguuua
147	10	aaaaacacag
148	13	acgccugccc ggu
149	6	aagccc
150	6	aacggc
151	42	ccguaacuau aacgguccua agguagcgaa auuccuuguc gg
152	21	guaaguuccg accugcacga a
153	8	acugucuc
154	11	aaagaccccg u
155	9	accuuuacu
156	6	ucuaac
157	7	caguuug
158	6	uggggc
159	9	gccucccaa
160	8	guaacgga
161	7	uugacug
162	10	uagugauccg
163	10	ucgcucaacg
164	8	gauaaaag
165	16	gggauaacag gcugau
166	10	cacaucgacg
167	26	guuuggcacc ucgaugucgg cucauc
168	9	cauccuggg
169	11	gucccaaggg u
170	11	ggcuguucgc c
171	6	uuaaag
172	12	gagcuggguu ca
173	30	gaacgucgug agacaguucg gucccuaucu
174	16	cuaguacgag aggacc
175	8	gcaucuaa
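
Each entry in the sequence listing carries a declared length alongside a sequence written in groups of ten residues, so the two can be cross-checked mechanically. The following Python sketch is one way to do that for a few entries copied from the listing; the names `ENTRIES`, `normalize`, and `length_mismatches` are hypothetical, and the allowed alphabet (a, c, g, u, plus n for an unspecified residue) is an assumption based on the characters that appear in the listing.

```python
# A few entries copied from the listing: (SEQ ID NO, declared length, grouped sequence).
ENTRIES = [
    (8, 12, "gagcuggguu ua"),
    (12, 9, "nguaacuau"),
    (60, 69, "caagcgcggg uaaacggcgg gaguaacuau gacucucuua "
             "agguagccaa augccucguc aucuaauua"),
    (151, 42, "ccguaacuau aacgguccua agguagcgaa auuccuuguc gg"),
]

# Assumption: RNA residues plus 'n' for an unspecified position.
ALLOWED = set("acgun")


def normalize(grouped):
    """Remove the grouping whitespace and lowercase the sequence."""
    return "".join(grouped.split()).lower()


def length_mismatches(entries):
    """Return SEQ ID NOs whose sequence length or alphabet disagrees with the listing."""
    bad = []
    for seq_id, declared, grouped in entries:
        seq = normalize(grouped)
        if len(seq) != declared or set(seq) - ALLOWED:
            bad.append(seq_id)
    return bad


print(length_mismatches(ENTRIES))
```

An empty result means every checked entry has the declared number of residues and uses only the assumed alphabet.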

* * * * *
