U.S. patent application number 11/367512 was filed with the patent office on 2006-03-03 and published on 2007-02-22 for techniques for linking non-coding and gene-coding deoxyribonucleic acid sequences and applications thereof.
This patent application is currently assigned to International Business Machines Corporation. Invention is credited to Tien Huynh, Alice Carolyn McHardy, Kevin Charles Miranda, Isidore Rigoutsos, Aristotelis Tsirigos.
Application Number | 20070042397 11/367512 |
Family ID | 37767730 |
Filed Date | 2006-03-03 |
United States Patent
Application |
20070042397 |
Kind Code |
A1 |
Rigoutsos; Isidore ; et
al. |
February 22, 2007 |
Techniques for linking non-coding and gene-coding deoxyribonucleic
acid sequences and applications thereof
Abstract
Techniques for linking non-coding and gene coding regions of a
genome are provided. In one aspect, a method of determining
associations between non-coding sequences and gene coding sequences
in a genome of an organism comprises the following steps. At least
one conserved region is identified from one or more non-coding
sequences. Additional instances of the conserved region are located
in the untranslated or amino acid coding regions of one or more
genes in the organism under consideration, and the conserved region
is associated with the one or more biological processes in which
these one or more genes participate.
Inventors: |
Rigoutsos; Isidore;
(Astoria, NY) ; Huynh; Tien; (Yorktown Heights,
NY) ; Tsirigos; Aristotelis; (Astoria, NY) ;
McHardy; Alice Carolyn; (New York, NY) ; Miranda;
Kevin Charles; (McDowall, AU) |
Correspondence
Address: |
RYAN, MASON & LEWIS, LLP
90 FOREST AVENUE
LOCUST VALLEY
NY
11560
US
|
Assignee: |
International Business Machines
Corporation
Armonk
NY
|
Family ID: |
37767730 |
Appl. No.: |
11/367512 |
Filed: |
March 3, 2006 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
60658251 | Mar 3, 2005 | |
60696213 | Jul 1, 2005 | |
Current U.S.
Class: |
435/6.11 ;
702/20 |
Current CPC
Class: |
G16B 20/00 20190201;
G16B 30/00 20190201 |
Class at
Publication: |
435/006 ;
702/020 |
International
Class: |
C12Q 1/68 20060101
C12Q001/68; G06F 19/00 20060101 G06F019/00 |
Claims
1. A method of determining associations between non-coding
sequences and gene coding sequences in a genome of an organism, the
method comprising the steps of: identifying at least one conserved
region from a plurality of the non-coding sequences; linking the at
least one conserved region with one or more of the gene coding
sequences of the genome; and associating the at least one conserved
region with one or more biological processes of the organism.
2. The method of claim 1, wherein the genome comprises
deoxyribonucleic acid.
3. The method of claim 1, wherein the non-coding sequences comprise
one or more of intergenic sequences and intronic sequences.
4. The method of claim 1, wherein the step of identifying further
comprises the step of discovering patterns in the non-coding
sequences.
5. The method of claim 4, further comprising the step of retaining
patterns that have a minimum length and a minimum number of
appearances in the non-coding sequences.
6. The method of claim 5, wherein the minimum length is sixteen
nucleotides.
7. The method of claim 5, wherein the minimum number of appearances
is forty.
8. The method of claim 4, further comprising the step of sorting
patterns so that longer and more frequent patterns are considered
before shorter and less frequent patterns.
9. The method of claim 4, further comprising the step of sorting
patterns in order of decreasing value of the product of
length-of-pattern and copy-number-of-pattern.
10. The method of claim 9, wherein the patterns are considered in
their order of appearance in the sorted list.
11. The method of claim 10, wherein instances of a pattern under
consideration are located in the sequences of the region under
consideration.
12. The method of claim 11, wherein the region under consideration
comprises all 5' untranslated sequences for the genome under
consideration.
13. The method of claim 11, wherein the region under consideration
comprises all amino acid coding sequences for the genome under
consideration.
14. The method of claim 11, wherein the region under consideration
comprises all 3' untranslated sequences for the genome under
consideration.
15. The method of claim 11, wherein a pattern is selected or
discarded based on a criterion.
16. The method of claim 15, wherein a pattern is selected if all of
its instances occur at positions that do not contain one or more
instances of an earlier selected pattern.
17. The method of claim 15, wherein a pattern is discarded if one
or more of its instances occur at positions that contain one or
more instances of an earlier selected pattern.
18. The method of claim 1, wherein the organism is a eukaryotic
organism.
19. The method of claim 18, wherein the eukaryotic organism is one
of H. sapiens, C. familiaris, M. musculus, R. norvegicus, G.
gallus, D. melanogaster and C. elegans.
20. The method of claim 1, wherein the identifying step further
comprises the step of discovering patterns in the non-coding
sequences and the linking step further comprises the step of
searching for instances of the discovered patterns in the genome
associated with one or more of the untranslated or amino acid
coding sequences of the genes in the organism under
consideration.
21. The method of claim 1, further comprising the step of: linking
each identified pattern with the one or more biological processes
that are associated with the one or more genes whose untranslated
and amino acid coding regions contain one or more instances of the
pattern.
22. An apparatus for determining associations between non-coding
sequences and gene coding sequences in a genome of an organism, the
apparatus comprising: a memory; and at least one processor, coupled
to the memory, operative to: identify at least one conserved region
from a plurality of the non-coding sequences; link the at least one
conserved region with one or more of the gene coding sequences of
the genome; and associate the at least one conserved region with
one or more biological processes of the organism.
23. An article of manufacture for determining associations between
non-coding sequences and gene coding sequences in a genome of an
organism, comprising a machine readable medium containing one or
more programs which when executed implement the steps of:
identifying at least one conserved region from a plurality of the
non-coding sequences; linking the at least one conserved region
with one or more of the gene coding sequences of the genome; and
associating the at least one conserved region with one or more
biological processes of the organism.
24. A method of designing one or more sequences of short
interfering RNAs that can interact with one or more sites in a
given transcript of a given sequence in a given organism and result
in the down-regulation of the expression of a protein product
encoded by the given transcript, the method comprising the steps
of: identifying one or more regions of interest in the sequence of the
given transcript; sub-selecting one or more regions from the
collection of the regions of interest; generating one or more
derived sequences from the sequence of the one or more sub-selected
regions; using the one or more derived sequences to create one or
more instances of a corresponding molecule that the one or more
derived sequences represent; and using the one or more instances of
the created molecule in an appropriate environment to regulate the
expression of the given transcript.
25. The method of claim 24, wherein a region of interest in the
collection of regions of interest is identified to be an instance
of a motif that has one or more copies in intergenic and intronic
regions of the genome of interest and one or more copies in
untranslated and amino acid coding regions of one or more genes in
the genome of interest.
26. The method of claim 24, wherein a region of interest in the
collection of regions of interest is identified using a method that
is based on pattern discovery.
27. The method of claim 24, wherein a region of interest in the
collection of regions of interest is identified to be a target
island.
28. The method of claim 24, wherein a region of interest is located
in a 5' untranslated region of the given transcript.
29. The method of claim 24, wherein a region of interest is located
in an amino acid coding region of the given transcript.
30. The method of claim 29, wherein a region of interest is located
in a 3' untranslated region of the given transcript.
31. The method of claim 24, wherein the genome of interest is a
eukaryotic genome.
32. The method of claim 31, where the eukaryotic genome is a human
genome.
33. The method of claim 31, wherein the eukaryotic genome is a
mouse genome.
34. The method of claim 31, wherein the eukaryotic genome is a rat
genome.
35. The method of claim 31, wherein the eukaryotic genome is a dog
genome.
36. The method of claim 31, wherein the eukaryotic genome is a
fruit fly genome.
37. The method of claim 31, wherein the eukaryotic genome is a worm
genome.
38. The method of claim 24, wherein a region of interest is
sub-selected based on one or more of its attributes.
39. The method of claim 38, wherein an attribute is length of the
region.
40. The method of claim 38, wherein an attribute is location of the
region in the transcript.
41. The method of claim 24, wherein a derived sequence is a reverse
complement of the sequence of the one or more sub-selected
regions.
42. The method of claim 24, wherein a derived sequence is a
near-reverse complement of the sequence of the one or more
sub-selected regions.
43. The method of claim 24, wherein the one or more copies of the
molecule can be built using any of a set of biochemical
processes.
44. A method of engineering a given transcript of a given gene in a
given organism in order to regulate its expression, the method
comprising the steps of: identifying one or more regions of
interest in a sequence of the given transcript; sub-selecting one
or more regions from the collection of the regions of interest; and
using the one or more sub-selected regions to make one or more
modifications to the sequence of the given transcript.
45. The method of claim 44, wherein a region of interest in the
collection of regions of interest comprises an instance of a motif
that has one or more copies in intergenic and intronic regions of
the genome of interest and one or more copies in untranslated and
amino acid coding regions of one or more genes in the genome of
interest.
46. The method of claim 45, where the motif is computed using a
process wherein associations are determined between non-coding
sequences and gene coding sequences in a genome of an organism, the
process comprising the steps of: identifying at least one conserved
region from a plurality of the non-coding sequences; linking the at
least one conserved region with one or more of the gene coding
sequences of the genome; and associating the at least one conserved
region with one or more biological processes of the organism.
47. The method of claim 44, wherein a region of interest in the
collection of regions of interest is computed using a method that
is based on pattern discovery.
48. The method of claim 44, wherein a region of interest is located
in a 5' untranslated region of the given transcript.
49. The method of claim 44, wherein a region of interest is located
in an amino acid coding region of the given transcript.
50. The method of claim 44, wherein a region of interest is located
in a 3' untranslated region of the given transcript.
51. The method of claim 44, wherein a region of interest is
sub-selected based on one or more of its attributes.
52. The method of claim 51, wherein an attribute is length of the
region.
53. The method of claim 51, wherein an attribute is location of the
region in the transcript.
54. The method of claim 51, wherein an attribute is association of
the region with a given biological process.
55. The method of claim 51, wherein an attribute is association of
the region with a given tissue.
56. The method of claim 51, wherein an attribute is association of
the region with a given cellular compartment.
57. The method of claim 44, wherein a modification comprises an
extension of the sequence of the given transcript.
58. The method of claim 57, wherein the extension comprises one or
more instances of a region of interest.
59. The method of claim 44, wherein a modification comprises a
shortening of the sequence of the given transcript.
60. The method of claim 59, wherein the shortening comprises one or
more instances of a region of interest.
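The retention, sorting and selection steps recited in claims 5 through 17 can be illustrated with a minimal sketch. It is not the applicants' implementation: the function names `find_instances` and `select_patterns`, the dictionary of copy numbers, and the rule that a pattern with no instance in the region under consideration is skipped are all assumptions made for illustration.

```python
from typing import Dict, List


def find_instances(pattern: str, sequence: str) -> List[int]:
    """Return the start offset of every exact match, overlaps included."""
    starts = []
    pos = sequence.find(pattern)
    while pos != -1:
        starts.append(pos)
        pos = sequence.find(pattern, pos + 1)
    return starts


def select_patterns(copy_numbers: Dict[str, int], regions: List[str],
                    min_length: int = 16, min_copies: int = 40) -> List[str]:
    """Hypothetical helper: filter, sort and greedily select patterns.

    copy_numbers maps each pattern to its number of copies in the
    non-coding input; regions holds the sequences of the region under
    consideration (for example, all 3' untranslated sequences).
    """
    # Claims 5-7: retain patterns with a minimum length and copy number.
    kept = [p for p, n in copy_numbers.items()
            if len(p) >= min_length and n >= min_copies]
    # Claim 9: decreasing product of pattern length and copy number.
    kept.sort(key=lambda p: len(p) * copy_numbers[p], reverse=True)

    covered = [set() for _ in regions]  # positions claimed per sequence
    selected = []
    for pattern in kept:  # claim 10: consider in sorted order
        hits = [(i, s) for i, seq in enumerate(regions)
                for s in find_instances(pattern, seq)]
        if not hits:
            continue  # assumption: a pattern with no instance is skipped
        positions = [(i, s + k) for i, s in hits for k in range(len(pattern))]
        if any(p in covered[i] for i, p in positions):
            continue  # claim 17: overlaps an earlier selected pattern
        for i, p in positions:
            covered[i].add(p)
        selected.append(pattern)  # claim 16: all instances are free
    return selected
```

With toy thresholds (`min_length=4`, `min_copies=2`), a pattern whose instance overlaps an earlier selected pattern is discarded, while non-overlapping patterns are kept in sorted order.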
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] The present application claims priority to U.S. provisional
applications identified as: Ser. No. 60/658,251 (attorney docket
no. YOR920050130US1), filed Mar. 3, 2005, and entitled
"Overrepresentation of Nucleotides in Human Chromosomes and in the
3' Untranslated Regions of Human Genes;" and Ser. No. 60/696,213
(attorney docket no. YOR920050350US1), filed Jul. 1, 2005, and
entitled "Techniques For Linking Non-Coding And Gene-Coding
Deoxyribonucleic Acid Sequences," the disclosures of which are
incorporated by reference herein.
FIELD OF THE INVENTION
[0002] The present invention relates to genes and, more
particularly, to techniques for linking regions of a genome.
BACKGROUND OF THE INVENTION
[0003] It is known that the intergenic and intronic regions
comprise most of the genomic sequence of higher organisms. The
intergenic and intronic regions are collectively referred to as the
"non-coding region" of an organism's genome, as opposed to the
"gene-" or "protein-coding region" of the genome. Even though
recent work suggested participation of the intergenic and intronic
regions in a regulatory role, for the most part, their true
function remains elusive. The search for conserved motifs, presumed
to be regulatory and control signals, in regions upstream of the 5'
untranslated regions (5'UTRs) of genes, has been the focus of
research activities for many years.
[0004] More recently, researchers began studying the 3'
untranslated regions (3'UTRs) of genes where they discovered
conserved regions and showed them to be functionally significant,
in direct analogy to the cis-motifs of promoter regions.
Large-scale comparative analyses allowed researchers to also study
conservation in the vicinity of genes and elsewhere in the genome
with great success. However, these studies were carried out on only
a handful of organisms at a time because of the magnitude of the
necessary computations.
[0005] The analysis of 3'UTRs intensified further after it was
discovered that they contain binding sites that are targeted by
short interfering ribonucleic acids (RNAs) that induce the
post-transcriptional control of the corresponding gene's expression
through either messenger RNA (mRNA) degradation or translational
inhibition. Accumulating evidence that non-coding RNAs control
developmental and physiological processes and that a considerable
part of the human genome is transcribed, has helped researchers
identify "functional" elements in areas of the genome that are not
associated with protein-coding regions.
[0006] Thus, techniques for efficiently and effectively identifying
and associating non-coding regions with gene coding regions of a
genome would be desirable.
SUMMARY OF THE INVENTION
[0007] Techniques for linking non-coding and gene coding regions of
a genome are provided, in accordance with an illustrative
embodiment of the present invention.
[0008] In a first aspect of the invention, a method of determining
associations between non-coding sequences and gene coding sequences
in a genome of an organism comprises the following steps. At least
one conserved region is identified from a plurality of the
non-coding sequences. The at least one conserved region is linked
with one or more of the gene coding sequences of the genome. The at
least one conserved region is associated with one or more
biological processes of the organism.
[0009] In a second aspect of the invention, a method of designing
one or more sequences of small interfering RNAs that can interact
with one or more sites in a given transcript of a given sequence in
a given organism and result in the down-regulation of the
expression of the protein product encoded by the given transcript
comprises the following steps. One or more regions of interest are
identified in the sequence of a given transcript. One or more
regions are sub-selected from the collection of these regions. One
or more derived sequences are generated from the sequence of the
one or more sub-selected regions. The one or more derived sequences
are used to create one or more instances of the corresponding molecule
that the one or more derived sequences represent. The one or more
instances of the created molecule are used in an appropriate
environment to regulate the expression of the given transcript.
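By way of illustration only, claim 41 names the reverse complement of a sub-selected region as one possible derived sequence. The sketch below shows that single step. The function names and the assumption that the transcript region is supplied in the DNA alphabet (A, C, G, T) are hypothetical, and the near-reverse complement of claim 42 is not covered.

```python
# Translation table mapping each DNA base to its complement.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(region: str) -> str:
    """Return the reverse complement of a DNA-alphabet sequence."""
    return region.translate(_COMPLEMENT)[::-1]


def derived_rna(region: str) -> str:
    """Hypothetical helper: reverse complement written in the RNA alphabet."""
    return reverse_complement(region).upper().replace("T", "U")
```

For example, `reverse_complement("AACGT")` yields `"ACGTT"`, and applying the function twice returns the original sequence.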
[0010] In a third aspect of the invention, a method of engineering
a given transcript of a given gene in a given organism in order to
regulate its expression comprises the following steps. One or more
regions of interest are identified in the sequence of a given
transcript. One or more regions are sub-selected from the
collection of these regions. The one or more sub-selected regions
are used to make one or more modifications to the sequence of the
given transcript.
[0011] A more complete understanding of the present invention, as
well as further features and advantages of the present invention,
will be obtained by reference to the following detailed description
and drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
[0012] FIG. 1 is a diagram illustrating an exemplary methodology
for determining associations between non-coding and gene coding
sequences in a genome of an organism, according to an embodiment of
the present invention;
[0013] FIG. 2 is a diagram illustrating a probability density
function and a cumulative distribution for lengths of patterns
discovered in analyzed intergenic and intronic sequences of the
human genome, according to an embodiment of the present
invention;
[0014] FIG. 3 is a diagram illustrating a probability density
function for a number of 16-mers with a given number of copies in a
random input set, according to an embodiment of the present
invention;
[0015] FIG. 4 is a diagram illustrating a preprocessing
methodology, according to an embodiment of the present
invention;
[0016] FIG. 5 is a diagram illustrating pattern specificity and
number of appearances, according to an embodiment of the present
invention;
[0017] FIG. 6 is a diagram illustrating a process for determining
logically distinct patterns, according to an embodiment of the
present invention;
[0018] FIG. 7 is a diagram illustrating a probability density
function for lengths of pyknons, according to an embodiment of the
present invention;
[0019] FIG. 8 is a diagram illustrating a number of blocks that are
shared by three sets whose union makes up a pyknon collection,
according to an embodiment of the present invention;
[0020] FIG. 9 is a diagram illustrating a cumulative function for a
number of intergenic and intronic copies of pyknons, according to
an embodiment of the present invention;
[0021] FIG. 10 is a diagram illustrating a cumulative function
showing what percentage of affected transcripts contain N or more
pyknon instances, according to an embodiment of the present
invention;
[0022] FIG. 11 is a diagram illustrating combinatorial arrangements
of pyknons in 3'UTRs, according to an embodiment of the present
invention;
[0023] FIG. 12 is a diagram illustrating combinatorial arrangements
of pyknons in 5'UTRs, according to an embodiment of the present
invention;
[0024] FIG. 13 is a diagram illustrating combinatorial arrangements
of pyknons in amino acid-coding regions, according to an embodiment
of the present invention;
[0025] FIG. 14 is a diagram illustrating a probability density
function and corresponding cumulative function for a variable
representing the fraction of pyknon instances located in
repeat-free regions, according to an embodiment of the present
invention;
[0026] FIG. 15 is a diagram illustrating a partial list of
biological processes whose corresponding genes show significant
enrichment or depletion in pyknon instances in their 5'UTRs, CRs or
3'UTRs, according to an embodiment of the present invention;
[0027] FIG. 16 is a diagram illustrating probability density
functions for the distance between starting points of consecutive
instances of pyknons, according to an embodiment of the present
invention;
[0028] FIG. 17 is a diagram illustrating probability density
functions for the number of intergenic and intronic copies of
variable-length strings derived by counting instances of
3'UTR-conserved pyknons after shifting, according to an embodiment
of the present invention;
[0029] FIG. 18 is a diagram illustrating the number of
intergenic/intronic neighborhoods each of which contains the
reverse complement of a pyknon and is predicted to fold into a
hairpin-like structure, according to an embodiment of the present
invention;
[0030] FIG. 19 is a diagram illustrating the number of positions
per 10,000 nucleotides that human pyknons cover in other genomes,
according to an embodiment of the present invention;
[0031] FIG. 20 is a diagram illustrating the number of human
pyknons that can be found in the untranslated and coding regions of
other genomes, and the number of intergenic/intronic positions that
human pyknons cover in other genomes, according to an embodiment of
the present invention;
[0032] FIG. 21 is a diagram illustrating a first methodology for
using pyknons, according to an embodiment of the present
invention;
[0033] FIG. 22 is a diagram illustrating a second methodology for
using pyknons, according to an embodiment of the present invention;
and
[0034] FIG. 23 is a block diagram of an exemplary hardware
implementation of one or more of the methodologies of the present
invention, according to an embodiment of the present invention.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
[0035] FIG. 1 is a diagram illustrating exemplary methodology 100
for determining associations between non-coding and gene coding
sequences in a genome of an organism. In step 102, non-coding
sequences from the genome of an organism are obtained. Preferably,
the non-coding sequences comprise intergenic and/or intronic
sequences. As used herein, the term "intergenic" refers generally
to any portion of a deoxyribonucleic acid (DNA) strand that is
between gene sequences. Further, as used herein, the term
"intronic" refers to any portion of a DNA strand that is
encompassed by an intron.
[0036] According to one exemplary embodiment, the genome comprises
the human genome. Further, as will be described below, the starting
point of the present techniques may be the genome of a single
organism, e.g., the human genome.
[0037] In step 104, conserved regions in the intergenic/intronic
sequences are identified. As will be described in detail below,
these conserved regions may be identified by using pattern discovery
techniques to identify patterns existing in the sequences.
[0038] In step 106, the identified conserved regions (also referred
to as `conserved motifs`) of the intergenic/intronic non-coding
sequences are linked to gene coding regions of the genome.
Specifically, instances of the patterns, described, for example, in
conjunction with the description of step 104, above, may be
searched for in gene coding regions of the genome. For example, as
will be described in detail below, sequences will be identified,
e.g., that are at least 16 nucleotide bases in length and appear a
minimum of 40 times in the non-coding region of a genome, and which
also appear at least one time in the coding region of the genome.
These identified sequences (motifs) link the coding and non-coding
regions of the genome. As will also be described in detail below,
linking the conserved regions and gene coding regions of the genome
provides for an association to be made with the biological
processes of the organism.
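The linking criterion of step 106 can be expressed as a simple test, sketched below under stated assumptions. The names `count_occurrences` and `links_regions` are hypothetical, matching is exact, and overlapping occurrences are counted separately; the source does not specify how occurrences are tallied.

```python
from typing import List


def count_occurrences(motif: str, sequences: List[str]) -> int:
    """Count exact occurrences of motif across sequences, overlaps included."""
    total = 0
    for seq in sequences:
        pos = seq.find(motif)
        while pos != -1:
            total += 1
            pos = seq.find(motif, pos + 1)
    return total


def links_regions(motif: str, noncoding: List[str], genic: List[str],
                  min_length: int = 16, min_noncoding_copies: int = 40) -> bool:
    """Return True if motif meets the example thresholds of step 106."""
    if len(motif) < min_length:
        return False  # shorter than 16 nucleotide bases
    if count_occurrences(motif, noncoding) < min_noncoding_copies:
        return False  # fewer than 40 copies in the non-coding input
    # The motif must also appear at least once in the genic sequences.
    return count_occurrences(motif, genic) >= 1
```

The default thresholds (16 bases, 40 non-coding copies, one genic copy) are the example values given in the paragraph above; other values may be passed as arguments.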
[0039] In step 108, as will be described in detail below, the
identified motifs that link the non-genic and genic regions of a
genome also provide for an association between these motifs and
specific biological processes, an association that persists even in
organisms other than human.
[0040] As used herein, the phrase "conserved region" may also be
referred to as a "conserved motif" or a "conserved block" or "an
exceptionally-well-conserved block (EWCB)" or a "pyknon." The term
"pyknon" is from the Greek adjective πυκνός/πυκνή/πυκνόν
meaning "serried, dense, frequent."
[0041] Accordingly, using an unsupervised pattern discovery method,
we processed the human intergenic and intronic regions and
catalogued all variable-length patterns with identically conserved
copies and multiplicities above what is expected by chance. Among
the millions of discovered patterns, we found a subset of 127,998
patterns, termed pyknons, which have one or more additional
non-overlapping instances in the untranslated and protein-coding
regions of 30,675 transcripts from 20,059 human genes. The pyknons
are found to arrange combinatorially in the 5' untranslated,
coding, and 3' untranslated regions of numerous human genes where
they form mosaics. Consecutive instances of pyknons in these
regions exhibit a strong bias in their relative placement favoring
distances of about 22 nucleotides. We also found that pyknons are
enriched in a statistically significant manner in genes involved in
specific processes, e.g., cell communication, transcription,
regulation of transcription, signaling, transport, etc. For about
one third of the pyknons, the intergenic/intronic instances of
their reverse complement lie within 382,244 non-overlapping
regions, typically 60-80 nucleotides long, which are predicted to
form double-stranded, energetically stable, hairpin-shaped RNA
secondary structures; additionally, the pyknons subsume about 40%
of the known microRNA sequences, thus suggesting a possible link
with post-transcriptional gene silencing and RNA interference.
Cross-genome comparisons revealed that many of the pyknons are also
present in the 3'UTRs of genes from other vertebrates and
invertebrates where they are over-represented in similar biological
processes as in the human genome. These novel and unexpected
findings suggest potential functional connections between the
coding and non-coding parts of the human genome.
[0042] Thus, in accordance with an illustrative methodology of the
invention, we examine whether highly specific patterns exist within
a single genome that may act as targets or sources for such
putative regulatory activity or as a `vocabulary` for yet
undiscovered mechanisms. Our analysis represents a substantial
point of departure from previous efforts. First, we carry out all
of the analysis on a single genome. Second, we seek patterns in the
intergenic and intronic regions of the genome (not the UTRs or
protein-coding regions). Third, the pattern instances transcend
chromosomal boundaries. And, fourth, we rely on the unsupervised
discovery of conserved motifs instead of searching schemes. In
particular, we sought to discover identically conserved,
variable-length motifs of certain minimum length but unlimited
maximum length in human intergenic and intronic regions. We
discovered more than 66 million motifs with multiplicities well
above what is expected by chance. A sizeable subset of these
motifs, referred to as the pyknons, have one or more additional
instances in the untranslated and coding regions of almost all
known human genes and exhibit properties that suggest the
possibility of an extensive link between the non-genic and genic
regions of the genome and a connection with post-transcriptional
gene silencing (PTGS) and RNA interference (RNAi).
[0043] As described, for example, in conjunction with the
description of step 104 of FIG. 1, above, according to the
techniques described herein, a pattern discovery step may be
performed. We used the parallel version of a pattern discovery
algorithm described in I. Rigoutsos et al., Combinatorial Pattern
Discovery in Biological Sequences: the TEIRESIAS Algorithm, 14
BIOINFORMATICS 1, pgs. 55-67 (January 1998) (hereinafter
"Rigoutsos"), the disclosure of which is incorporated by reference
herein. The pattern discovery (Teiresias) algorithm seeks
variable-length motifs that are identically conserved across all of
their instances, comprise a minimum of L=16 nucleotides and appear
in a minimum of K=40 copies in the processed input (see below
regarding the values of L, K). The algorithm guarantees the
reporting of all composition-maximal and length-maximal patterns
satisfying these parameters. The input comprised the intergenic and
intronic sequences (step 102 of FIG. 1) of the human genome from
ENSEMBL Rel. 31 (see Stabenau, A., McVicker, G., Melsopp, C.,
Proctor, G., Clamp, M. & Birney, E. (2004) Genome Res 14,
929-33) for a total of 6,039,720,050 nucleotides. The input did not
include the reverse complement of the 5' untranslated, amino acid
coding or 3' untranslated regions of any human genes. This
exclusion ensures that any discovered patterns are not connected in
any way to sequences of known genes, protein motifs or domains. The
algorithm ran on a shared-memory architecture with 128 Gigabytes of
main memory and 8 processors running at a clock frequency of 1.75
GHz, and generated an initial set P_init of more than 66 million
statistically significant patterns (see below). Most of the
patterns in P_init were a few tens of nucleotides in length.
FIG. 2 shows the probability density function (in black) and
cumulative distribution (in light gray) for the lengths of the more
than 66 million patterns discovered in the analyzed intergenic and
intronic sequences of the human genome. These patterns form the set
P_init. As can be seen from FIG. 2, more than 95% of all
discovered patterns are shorter than 100 nucleotides. Note that the
primary Y-axis is logarithmic whereas the secondary is linear.
[0044] The Teiresias discovery algorithm that we used for this
analysis requires the setting of three parameters: L, W and K. The
parameter L controls the minimum possible size of the discovered
patterns but has no bearing on the patterns' maximum length; the
latter is not constrained in any way. The parameter W satisfies the
inequality W ≥ L and controls the `degree of conservation`
across the various instances of the reported patterns: smaller
(respectively, larger) values of W will tolerate fewer
(respectively, more) mismatches across the instances. Since for
this analysis, we are interested only in patterns with identically
conserved instances, we set W=L (i.e., the patterns contained no
"wild cards"). Finally, the parameter K controls the minimum
required number of appearances before a pattern can be reported by
the algorithm.
[0045] For a given choice of L, W and K, the algorithm guarantees
the reporting of all patterns that have K or more appearances in
the processed input and are such that any L consecutive (but not
necessarily contiguous) positions span at most W positions. These
patterns are generally overlapping: a given sequence location can
simultaneously appear in multiple, distinct, non-redundant
patterns. It is also important to stress three properties of the
algorithm. First, as stated above, the value L does not impose any
constraint on the maximum length of a pattern, which is unbounded.
Second, each reported pattern will be maximal in composition, i.e.,
it cannot be made more specific by specifying the value of a
wild-card without decreasing the number of locations where it
appears. And, third, each reported pattern will be maximal in
length, i.e., it cannot be made longer without decreasing the
number of locations where it appears. In this discussion, we use
the terms pattern, block and motif interchangeably.
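As a rough illustration of the roles of L and K when W=L (no wild cards), the following sketch counts exact L-mers and keeps those with at least K occurrences. It is not the Teiresias algorithm: it performs no extension to maximal length, and the function name `frequent_lmers` and the toy input are assumptions made only for illustration.

```python
from collections import Counter

def frequent_lmers(sequences, L, K):
    """Return {L-mer: count} for exact L-mers seen at least K times.

    Simplified stand-in for pattern discovery with W = L (no wild cards);
    it does NOT perform the maximal extension done by Teiresias.
    """
    counts = Counter()
    for seq in sequences:
        # Slide a window of width L over each sequence.
        for i in range(len(seq) - L + 1):
            counts[seq[i:i + L]] += 1
    return {lmer: n for lmer, n in counts.items() if n >= K}

# Toy input (hypothetical); the analysis in the text uses L=16 and K=40.
toy = ["ACGTACGTAC", "TTACGTAA"]
result = frequent_lmers(toy, L=4, K=3)
```

Lowering K in this toy call admits more, rarer L-mers, which mirrors the trade-off between sensitivity and specificity discussed for the parameters.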
[0046] Opting for small L values generally permits the
identification of shorter conserved motifs that may be present in
the processed input, in addition to all longer ones--see above
properties. Generally, for short motifs to be claimed as
statistically significant they need to have a large number of
copies in the processed input; requiring a lot of copies runs the
risk of discarding bona fide motifs. On the other hand, larger
values of L will generally permit the identification of
statistically significant motifs even if these motifs repeat only a
small number of times. This happens at the expense of significant
decreases in sensitivity; i.e. bona fide motifs will be missed.
[0047] For our analysis, we have selected L=16, a value that
strikes a balance between the desirable sensitivity (which favors
lower L values) and achievable specificity (which favors higher L
values). We stress that the maximality properties of the pattern
discovery step ensure that we will be able to report any and all
motifs that are 16 nucleotides or longer. And as explained above,
we will set W=L.
[0048] The last parameter that needs to be set is K, the required
number of appearances for a pattern to be reported. K needs to be
set to a value that can ensure that the reported patterns could not
have been derived from a random database with the same size as the
input at hand. In order to determine this value, we generated several
randomly-shuffled versions of our intergenic and intronic input (of
approximately 6 billion bases) and searched them for frequent,
fixed-size 16-mers, with all low-complexity 16-mers removed by NSEG
(see Wootton, J.C. & Federhen, S. (1993) Computers in Chemistry
17, 149-163). The idea here is that if a randomly-shuffled version
of our input set cannot give rise to any 16-mers that appear more
than K.sub.x times, then it will also be true that no patterns
exist in that shuffled set that are longer than 16 nucleotides and
have more than K.sub.x copies. Several iterations of this process
allowed us to establish that K.sub.x=23. FIG. 3 shows the
probability density function for the number of 16-mers with a given
number of copies in the random input set--note that both the X and
Y axes are logarithmic. From this, it follows that a
randomly-shuffled version of our input set cannot possibly give
rise to patterns which are longer than 16 nucleotides and have more
than 23 copies: in fact, as a pattern increases in length, the
number of times it appears in a given input set can only decrease.
We thus opted for the even larger threshold of K=40 for our pattern
discovery step.
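The shuffling procedure can be sketched on a toy scale as follows. This is a minimal illustration under stated assumptions: the helper names `max_kmer_count` and `null_threshold`, the toy sequence and the small k are hypothetical, and the NSEG low-complexity filtering used in the text is omitted.

```python
import random
from collections import Counter

def max_kmer_count(seq, k):
    """Largest number of copies of any exact k-mer in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return max(counts.values()) if counts else 0

def null_threshold(seq, k, n_shuffles=5, seed=0):
    """Largest k-mer copy number seen across several shuffled versions of seq."""
    rng = random.Random(seed)
    bases = list(seq)
    worst = 0
    for _ in range(n_shuffles):
        rng.shuffle(bases)  # shuffling preserves the base composition
        worst = max(worst, max_kmer_count("".join(bases), k))
    return worst

# Toy, highly repetitive input (hypothetical); the text uses k=16 on about 6 billion bases.
toy = "ACGT" * 50
k_x = null_threshold(toy, k=8)
observed = max_kmer_count(toy, k=8)
```

Any copy-number threshold K chosen above `k_x` would then exclude k-mers of the kind that a composition-preserving shuffle can produce.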
[0049] Before we sought to discover patterns in the intergenic and
intronic regions of the human genome, we preprocessed the sequences
and removed: a) all the regions that corresponded to 5'
untranslated, coding and 3' untranslated regions of known genes;
and, b) all the regions that were the reverse complement of 5'
untranslated, coding and 3' untranslated regions of known genes. We
show this preprocessing step pictorially in FIG. 4. The genomic
input before the preprocessing step is shown above the arrow, and
the input upon which pattern discovery is run is shown below the
arrow.
[0050] Under the assumption that all four nucleotides are
equiprobable (i.e., p.sub.A=p.sub.T=p.sub.C=p.sub.G=1/4),
independent, and, identically distributed, we estimate the
probability p of a pattern of length l to be p=4.sup.-l. We can
compute the probability Pr.sub.k to observe k instances of a given
pattern in a database of size D (D>>1) to be
Pr.sub.k.apprxeq.(pD).sup.ke.sup.-pD/k! (Poisson distribution). The
least specific pattern that our method will discover is one that is
the shortest possible (i.e., l=L=16) and appears the smallest
allowed number of times (i.e., k=K=40): if D=6.0.times.10.sup.9
bases (=all chromosomes and both strands), then
Pr.sub.k=1.95.times.10.sup.-43.
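The Poisson estimate above can be checked numerically. The sketch below evaluates (pD)^k e^(-pD)/k! in log space to avoid overflow in the factorial; the function name `poisson_pmf` is ours, and the parameter values are those given in the text.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability of exactly k events with mean lam, via logarithms."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

L, K, D = 16, 40, 6.0e9   # values from the text
p = 4.0 ** -L             # equiprobable, independent nucleotides
pr_k = poisson_pmf(K, p * D)
print(f"Pr_k = {pr_k:.2e}")
```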
[0051] We now revisit this calculation by taking into account the
nucleotides' natural probability of occurrence. Using ENSEMBL
Release 31 from May 2005 (based on NCBI Assembly 35 from July 2004)
as our database D, we see that the fraction of bases that are
undetermined across the 24 human chromosomes ranges from roughly
1.2% to 61.0%, the latter for the Y chromosome. Of course, the following
constraints should apply: p.sub.A=p.sub.T and p.sub.C=p.sub.G.
Since the fractions of nucleotides that are undetermined are not
equal, the required balance between A/T and C/G is only
approximately preserved. Ignoring the unspecified positions and
recomputing ratios based on the remaining bases, we find that
p.sub.A=p.sub.T.apprxeq. 3/10 and p.sub.C=p.sub.G.apprxeq.
2/10.
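Recomputing the nucleotide ratios while ignoring undetermined positions amounts to the following small calculation. The function name `base_frequencies` and the toy sequence are hypothetical; only the idea of excluding non-A/C/G/T characters comes from the text.

```python
from collections import Counter

def base_frequencies(seq):
    """Fractions of A, C, G and T among the determined bases of seq."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no determined bases")
    return {b: counts[b] / total for b in "ACGT"}

# Toy sequence (hypothetical) with a run of undetermined bases (N).
toy = "AATTTACGNNNNATCGAATT"
freqs = base_frequencies(toy)
```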
[0052] Let us consider a block of size l and let "match" indicate
the match between the i-th character of this block and a character
c at position in a database D of nucleotide sequences. Then it is
easy to see that:
$$
\begin{aligned}
\Pr(\text{match}) &= \Pr(\text{match with } c) \\
&= \Pr\bigl(\text{match} \wedge (c \text{ is one of } A, C, G, T)\bigr) \\
&= \Pr\bigl((\text{match} \wedge c = A) \vee (\text{match} \wedge c = C) \vee (\text{match} \wedge c = G) \vee (\text{match} \wedge c = T)\bigr) \\
&= \Pr(\text{match} \mid A)\Pr(A) + \Pr(\text{match} \mid C)\Pr(C) + \Pr(\text{match} \mid G)\Pr(G) + \Pr(\text{match} \mid T)\Pr(T) \\
&= \Pr(A)\Pr(A) + \Pr(C)\Pr(C) + \Pr(G)\Pr(G) + \Pr(T)\Pr(T) \\
&= \Pr(A)^2 + \Pr(C)^2 + \Pr(G)^2 + \Pr(T)^2 \\
&= p_A^2 + p_C^2 + p_G^2 + p_T^2 \\
&= 0.3^2 + 0.2^2 + 0.2^2 + 0.3^2 \\
&= 0.26
\end{aligned}
$$
[0053] In this analysis, we consider blocks of length l with
l.gtoreq.16. Naturally, these shortest blocks will be associated
with the largest probability p of observing a pattern
accidentally--the value p decreases as the value of l increases.
The probability that a block of length l=16 will have one instance
in the database D is then p.sub.1=Pr(match).sup.16=(0.26).sup.16 or
p.sub.1=4.4.times.10.sup.-10.
[0054] An alternative way to approach this is to assume that the
block of length l is constructed by drawing from the same
nucleotide distribution that gives rise to the database D. Then, a
block of length l=16 would comprise p.sub.A*16.apprxeq.5 A's,
p.sub.T*16.apprxeq.5 T's, p.sub.C*16.apprxeq.3 C's and
p.sub.G*16.apprxeq.3 G's. Then, the probability that this block
will arise accidentally is
p.sub.2=p.sub.A.sup.5*p.sub.T.sup.5*p.sub.C.sup.3*p.sub.G.sup.3=3.8*10.sup.-10.
[0055] We can compute the probability of finding k accidental
instances in a database D that contains 6.times.10.sup.9 bases
where each of the instances is independent of all the preceding
instances using the Poisson distribution
Pr.sub.k=(pD).sup.ke.sup.-pD/k!. The probability Pr.sub.k that a
16-mer will appear k times with k=40 is equal to 4.5*10.sup.-33
(resp. 2.6*10.sup.-35) if p.sub.1 (resp. p.sub.2) is used in the
calculation.
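The two composition-aware estimates can be recomputed with a few lines. This is a sketch under stated assumptions: it takes p_A=p_T=0.3 and p_C=p_G=0.2, and it takes the composition of the second block to be five A's, five T's, three C's and three G's, which is what those frequencies imply for 16 positions. The results should agree with the quoted probabilities in order of magnitude; the leading digits depend on how p is rounded before it enters the Poisson formula.

```python
import math

# Approximate natural nucleotide frequencies from the text.
P = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}

def poisson_pmf(k, lam):
    """Poisson probability of exactly k events with mean lam, via logarithms."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

pr_match = sum(f * f for f in P.values())   # per-position match probability
p1 = pr_match ** 16                          # 16 independent matching positions
p2 = P["A"] ** 5 * P["T"] ** 5 * P["C"] ** 3 * P["G"] ** 3  # assumed composition

D, K = 6.0e9, 40
pr_k_p1 = poisson_pmf(K, p1 * D)
pr_k_p2 = poisson_pmf(K, p2 * D)
print(f"p1={p1:.2e}  p2={p2:.2e}  Pr_k(p1)={pr_k_p1:.1e}  Pr_k(p2)={pr_k_p2:.1e}")
```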
[0056] We thus can see that even if we take into account the
natural frequency of appearance in the human genome of each of the
four nucleotides, the probability that one of our discovered blocks
is accidental remains very small even for blocks of size 16 that
appear only 40 times.
[0057] Alternatively, we can estimate the significance of our
patterns using z-scores: for the least specific patterns of length
16 that have exactly 40 identical copies we obtain the
remarkably-high value of z=32.66. Longer patterns and patterns with
more intergenic/intronic copies have even higher z-scores. These
analyses confirm in different ways that every one of our discovered
patterns is statistically significant and not the result of a
random process. These conclusions hold true for the reverse
complements of the discovered patterns as well and for the pyknons
that are a subset of the discovered patterns P.sub.init.
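The text does not state which z-score formula was used. One formula that reproduces a value of this size is z=(k-mu)/sigma with mu=sigma^2=pD, i.e., a Poisson approximation under the equiprobable-nucleotide model; the sketch below is offered under that assumption only.

```python
import math

p = 4.0 ** -16     # uniform model, pattern of length 16
D = 6.0e9          # database size used in the text
k = 40             # observed number of copies

mu = p * D                      # expected number of copies (assumed Poisson mean)
z = (k - mu) / math.sqrt(mu)    # assumed definition of the z-score
print(f"z = {z:.2f}")
```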
[0058] It is to be noted that we will use the terms "coding" and
"coding region" (abbreviated as CR and CRs) to refer to the
translated, amino-acid coding part of exons.
[0059] We now describe the step of determining which of the
discovered patterns have additional instances in the 5'UTRs, CRs or
3'UTRs of known genes. Once the pattern discovery step has produced
the set P.sub.init of variable length patterns, we processed it to
identify `logically distinct` patterns using the following
approach. Let there be a recurrent logical unit that appears
several times in the intergenic/intronic regions of the human
genome; its instances are assumed to have different lengths that
reflect varying degrees of conservation. For simplicity, we assume
here that different degrees of conservation will result in
variable-length instances of the pattern. Because we seek only
patterns with identically conserved instances, this assumption
holds. As an example, we will assume that all
variations of the logical unit contain an intact copy of an
18-nucleotide core motif; let TCCCATACCACGGGGATT represent this
core. As the instances of the logical unit become longer and thus
more specific, the number of appearances in the input decreases.
FIG. 5 shows this example in more detail. Several hypothetical
variations of the logical unit are aligned around the common core
motif and the number of instances is listed next to each
variation.
[0060] We reasoned that these patterns should be processed in order
of decreasing value of the total number of positions that they
span: this number is simply the product of each pattern's length by
the number of times it appears in the input. As patterns are
examined in turn, some of them are selected and kept whereas others
collide with earlier-made selections.
[0061] Two collision scenarios are possible and we examine them
with the help of FIG. 6. Two blocks, light and darker gray, are
shown therein together with a `reference set` of sequences. The
light gray block corresponds to a pattern that has already been
examined and placed at all its instances. The instances of the
darker gray block show the intended placements for the pattern
currently under consideration. The blocks collide at two locations
(they overlap in the first and second sequence) but the rest of
their instances are disjoint. There are two possibilities for
handling collisions. Under the first, the darker gray block is kept
if and only if there is at least one other location in the reference
sequence set where it can be placed without generating a collision
(e.g. the fifth and sixth sequences in FIG. 6). Under the second, the
darker gray block is kept if and only if it generates no collisions
whatsoever with any block that has already been selected and
placed. We have opted for the stricter, second choice: if a
pattern's instance uses a position that has already been claimed by
an earlier-selected pattern, then the pattern under consideration
will be discarded and not considered further. Generally, it will be
redundant variations of the same pattern that will generate
collisions: only one pattern will be used to represent a core motif
such as the one shown in FIG. 5.
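The selection procedure of the two preceding paragraphs can be sketched as a greedy pass over the patterns. This is a minimal illustration, not the implementation used for the analysis: the names `find_all` and `select_patterns`, the data layout (a dictionary from pattern to its number of copies in the discovery input) and the toy data are all assumptions.

```python
def find_all(seq, pat):
    """Start offsets of every (possibly overlapping) occurrence of pat in seq."""
    hits, i = [], seq.find(pat)
    while i != -1:
        hits.append(i)
        i = seq.find(pat, i + 1)
    return hits

def select_patterns(patterns, reference):
    """Greedy selection under the stricter collision rule.

    patterns:  {pattern: number of copies in the discovery input}
    reference: list of reference sequences (e.g., 3'UTRs)
    """
    claimed = [set() for _ in reference]
    kept = []
    # Process by decreasing (length x number of copies).
    order = sorted(patterns, key=lambda p: len(p) * patterns[p], reverse=True)
    for pat in order:
        cells = [(s, j)
                 for s, seq in enumerate(reference)
                 for i in find_all(seq, pat)
                 for j in range(i, i + len(pat))]
        if not cells:
            continue  # no instance in the reference set
        if any(j in claimed[s] for s, j in cells):
            continue  # any collision discards the pattern
        for s, j in cells:
            claimed[s].add(j)
        kept.append(pat)
    return kept

# Toy data (hypothetical).
toy_patterns = {"ACGTAC": 50, "CGTA": 60, "GGGT": 45}
toy_reference = ["TTACGTACTT", "AAGGGTAA", "CCCGTACC"]
kept = select_patterns(toy_patterns, toy_reference)
```

In the toy run, CGTA is discarded because one of its instances overlaps the already placed ACGTAC, even though its instance in the third sequence is collision-free; the more lenient rule would have kept it.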
[0062] The one remaining element is to decide which sequences to
use as the reference set. We have chosen to use each of the 5'UTRs,
CRs, and 3'UTRs in turn. Sub-selecting among the patterns in
P.sub.init with the help of each of the 5' untranslated, coding and
3' untranslated regions gives rise to the pattern collections
P.sub.5'UTR, P.sub.CODING and P.sub.3'UTR respectively. The union
of these sets, P.sub.5'UTR U P.sub.CODING U P.sub.3'UTR comprises
the pyknons, i.e., patterns that were originally discovered in the
intergenic and intronic regions of the human genome and which have
additional instances in the 5' untranslated, coding and 3'
untranslated regions of known human genes.
[0063] We used the above steps to determine which of the discovered
patterns has additional instances in the untranslated and coding
regions of genes. After filtering the surviving patterns for
low-complexity with NSEG (Wootton, J.C. & Federhen, S. (1993)
Computers in Chemistry 17, 149-163), we generated three pattern
sets P.sub.5'UTR, P.sub.CODING and P.sub.3'UTR that contained
12,267, 54,396 and 67,544 patterns respectively and had additional
instances in 5'UTRs, CRs and 3'UTRs. The union P.sub.5'UTR U
P.sub.CODING U P.sub.3'UTR contained 127,998 patterns, indicating
that the three pattern sets are largely disjoint. We refer to these
127,998 patterns as pyknons.
[0064] We now describe some properties of the pyknons. The three
pattern sets P.sub.5'UTR, P.sub.CODING and P.sub.3'UTR contain
12,267, 54,396 and 67,544 blocks respectively. The union
P.sub.5'UTR U P.sub.CODING U P.sub.3'UTR comprises the 127,998
pyknons. In FIG. 7, we show the probability density function for
the length of the pyknons; the function is shown separately for
each of the three subsets that make up the pyknon collection. Note
that the Y-axis is logarithmic.
[0065] The patterns in each of the three collections, P.sub.5'UTR,
P.sub.CODING and P.sub.3'UTR, fall into one of two types. "Type-A"
patterns are patterns whose reverse complement is also present in
the same collection (note that reverse palindromes are included
among the type-A patterns). "Type-B" patterns are patterns whose
reverse complement is absent from the collection. The breakdown for
each of P.sub.5'UTR, P.sub.CODING and P.sub.3'UTR is as follows:
P.sub.5'UTR contains 217 type-A blocks and 11,835 type-B blocks;
P.sub.CODING contains 1,038 type-A blocks and 52,330 type-B blocks;
and P.sub.3'UTR contains 2,501 type-A blocks and 62,577 type-B
blocks. The clear majority of the blocks in each of the three
collections are type-B blocks.
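The type-A/type-B distinction reduces to a reverse-complement membership test. The sketch below classifies each pattern individually; the function names and toy patterns are hypothetical, and the sketch makes no claim about how the reported block counts were tallied.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA string over the alphabet A, C, G, T."""
    return seq.translate(_COMPLEMENT)[::-1]

def split_types(patterns):
    """Return (type_a, type_b) as sets of patterns."""
    pool = set(patterns)
    type_a = {p for p in pool if reverse_complement(p) in pool}
    return type_a, pool - type_a

# Toy collection (hypothetical): ACGT is a reverse palindrome,
# AAAC and GTTT are reverse complements of each other.
toy = ["ACGT", "AAAC", "GTTT", "CCCA"]
type_a, type_b = split_types(toy)
```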
[0066] With respect to their content, the three collections are
largely disjoint, a characteristic that presumably reflects
sequence differences that are inherent to the actual 5'UTRs, CRs
and 3'UTRs. FIG. 8 shows pictorially the relationship among the
members of the three sets P.sub.5'UTR, P.sub.CODING and
P.sub.3'UTR: note the small cardinalities of the various
intersections.
[0067] Finally, we comment on the number of intergenic and intronic
copies of a pyknon. This number spans a very wide range of values
with the most frequent pyknon having 356,989 copies--the minimum
number of copies is, by design, equal to K=40. For about 95% of the
pyknons, their intergenic/intronic copies are fewer than 2,000.
FIG. 9 shows the cumulative distributions for the number of
intergenic and intronic copies of the pyknons--the distribution is
again shown separately for each of P.sub.5'UTR, P.sub.CODING and
P.sub.3'UTR in order to highlight the similarities and differences
of the three sets.
[0068] The pyknons also exhibit a number of properties that connect
the non-genic and genic regions of the human genome, as well as
other genomes, in unexpected ways. In particular:
[0069] The pyknons have one or more instances within nearly all
known genes. The 127,998 pyknons that we originally discovered in
the human intergenic and intronic regions have an additional
226,874 non-overlapping copies in the 5'UTRs, CRs or 3'UTRs of
20,059 genes (30,675 transcripts). That is, more than 90% of all
human genes contain one or more pyknon instances. The pyknons in
P.sub.5'UTR cover 3.82% of the 6,947,437 nucleotides in human
5'UTRs; the pyknons in P.sub.CODING cover 3.04% of the 50,737,024
nucleotides in human CRs; and, the pyknons in P.sub.3'UTR cover
7.33% of the 25,597,040 nucleotides in human 3'UTRs. The
distribution of pyknons in the various transcripts is not uniform.
FIG. 10 shows the cumulative distribution for the number of transcripts with a
given number of pyknon instances in them. As can be seen, about
52% of the 30,675 affected transcripts contain four or more pyknon
instances; of these about 2,200 transcripts contain 20 or more
pyknon instances in them.
[0070] The pyknons arrange combinatorially in many human 5'UTRs,
CRs and 3'UTRs forming mosaics. In those cases where we find many
pyknons in one transcript, the pyknons arrange combinatorially and
form mosaics. FIG. 11 shows an example of such a combinatorial
arrangement in the 3'UTRs of birc4 (an apoptosis inhibitor) and
nine other human genes. The 3'UTR of birc4 contains 100 instances
of 95 distinct pyknons: of these, 22 are also present in the 3'UTRs
of the other nine genes shown. One or more instances of the 95
pyknons from birc4's 3'UTR exist in the 3'UTRs of 2,306 transcripts
(data not shown).
[0071] We next show two more examples, one involving 5'
untranslated and the other involving coding regions. It is
important to stress here that the pyknons are initially discovered
in an input that includes neither untranslated/amino-acid-coding
sequences nor their reverse-complement; thus, pyknon arrangements
such as the ones shown in the following two examples represent
non-trivial findings from the standpoint of statistical
significance. FIG. 12 shows an example of combinatorial
rearrangement in the 5'UTRs of ENSG00000196809, a gene of unknown
function, and 8 more human genes. 63 distinct pyknons have a total
of 65 instances in the 5'UTR of ENSG00000196809. Of the 63 pyknons
in the 5'UTR of ENSG00000196809, nine are also shared with the
remaining eight genes of the shown group.
[0072] FIG. 13 shows an example of combinatorial rearrangement in
human coding regions with the help of the amino-acid-coding
sequences from 10 distinct genes: 9 pyknons have a total of 124
instances in the coding regions of the shown transcripts with
several of the conserved pyknons appearing twice or more in a given
sequence.
[0073] Recall that we initially discovered the pyknons in an input
that included neither transcribed gene-related sequences nor their
reverse-complement. Thus, finding so many pyknons with instances in
human 5'UTRs, CRs and 3'UTRs is significant, especially in view of
the three striking examples of combinatorial rearrangements shown
above.
[0074] The pyknons account for 1/6.sup.th of the human intergenic
and intronic regions. The intergenic and intronic copies of the
pyknons span 692,393,548 positions on the forward and reverse
strands. For those pyknons whose reverse complements are not
already in the list of 127,998 pyknons, their Watson-strand
instances impose constraints on their Crick-strand instances.
Considering this and recalculating shows that 898,424,004
positions, i.e., about 1/6.sup.th of the human intergenic/intronic
regions, are covered by pyknons and their reverse complement.
[0075] The pyknons are non-redundant. We clustered the pyknons
using a scheme based on BLASTN (Altschul, S. F., Gish, W., Miller,
W., Myers, E. W. & Lipman, D. J. (1990) J Mol Biol 215,
403-10). Two pyknons are redundant if they agree on at least X% of
their positions. Since our collection includes pyknon pairs whose
members are the reverse complement of one another, we had to ensure
that the clustering scheme did not over-count: when comparing
sequences A and B, we examined for redundancy the pair (A,B) and
the pair (reverse-complement-of-A,B). Clustering at X=70, 80 and
90%, we obtained 32,621, 44,417 and 89,159 clusters,
respectively. The high numbers of surviving clusters show that the
pyknons are largely distinct.
[0076] We next describe the BLASTN-based clustering scheme. Let us
assume that we are given a set of N nucleic acid sequences of
variable length, and a user-defined threshold X for the maximum
permitted remaining pair-wise sequence similarity. Then, we carry out
the following steps:
TABLE-US-00001
sort the N sequences in order of decreasing length ;
let S.sub.i denote the i-th sequence of the sorted set ;
let S.sub.1 be the longest sequence of the sorted set ;
CLEANED_UP_SET <- { S.sub.1 } ;
for i = 2 through N do
    use S.sub.i as query to run BLAST against the current contents of CLEANED_UP_SET
    if the top BLAST hit T agrees with S.sub.i or with the reverse complement of S.sub.i at more than X % of T's positions
    then
        make S.sub.i a member of the cluster represented by T ;
        discard S.sub.i ;
    else
        CLEANED_UP_SET <- CLEANED_UP_SET U { S.sub.i } ;
end-for-loop
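The greedy scheme above can be sketched in runnable form as follows. To keep the example self-contained, BLASTN is replaced by a simple ungapped comparison, which is an assumption of this sketch and not the method used in the analysis; agreement is measured over the positions of the shorter sequence, and the names `identity` and `cluster` are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]

def identity(short, long_):
    """Best ungapped fraction of positions of `short` that match inside `long_`.

    Stand-in for a BLASTN comparison (assumption of this sketch).
    """
    best = 0
    for off in range(len(long_) - len(short) + 1):
        hits = sum(a == b for a, b in zip(short, long_[off:]))
        best = max(best, hits)
    return best / len(short)

def cluster(seqs, x_percent):
    """Greedy clustering; returns {representative: [members]}."""
    ordered = sorted(seqs, key=len, reverse=True)  # longest first
    clusters = {}
    for s in ordered:
        best_rep, best_score = None, 0.0
        for rep in clusters:  # every representative is at least as long as s
            score = max(identity(s, rep), identity(reverse_complement(s), rep))
            if score > best_score:
                best_rep, best_score = rep, score
        if best_rep is not None and best_score * 100 > x_percent:
            clusters[best_rep].append(s)   # s joins an existing cluster
        else:
            clusters[s] = []               # s becomes a new representative
    return clusters

# Toy input (hypothetical).
toy = ["ACGTACGTAC", "ACGTACGAAC", "GTACGTACGT", "TTTTTTTT"]
clusters = cluster(toy, x_percent=70)
```

In the toy run the third sequence joins the first cluster through its reverse complement, which is the over-counting safeguard described in the text.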
[0077] Upon termination, the set CLEANED_UP_SET contains sequences
no pair of which agrees on more than X % of the positions in the
shorter of the two sequences.
On pyknons and repeat elements.
1,292 pyknons (1.0%) have instances occurring exclusively inside
repeat elements as determined with the help of RepeatMasker (Smit,
A. & Green, P. RepeatMasker:
ftp.genome.washington.edu/RM/RepeatMasker.html). Seventy-nine
pyknons have instances exclusively in repeat-free regions. And, the
remaining 126,627 pyknons (98.9% of total) have instances both
inside repeat elements and in repeat-free regions. A question that
arises here is what fraction, on average, of the total number of
copies of pyknons is generated from repeat-free regions. We have
computed the probability density and cumulative functions for this
fraction, and plot them in FIG. 14. As can be seen, about 60% of
the pyknons have more than 90% of their copies inside repeat
elements. However, the remaining 40% of the pyknons, which amounts
to a little more than 50,000 pyknons, have between 10% and 100% of
their instances in regions that are free of repeats.
[0078] The pyknons are distinct from the "ultraconserved elements."
52 pyknons have instances in 46 of the 481 ultraconserved elements
(Bejerano, G., Pheasant, M., Makunin, I., Stephen, S., Kent, W. J.,
Mattick, J. S. & Haussler, D. (2004) Science 304, 1321-5) and
cover 0.67% of the 126,007 positions: uc.73+ contains four pyknons;
uc.23+, uc.66+, uc.143+ and uc.414+ each contain two pyknons; the
remaining 41 elements contain a single pyknon each.
[0079] The pyknons are associated with specific biological
processes. For 663 GO terms (Ashburner, M., Ball, C. A., Blake, J.
A., Botstein, D., Butler, H., Cherry, J. M., Davis, A. P.,
Dolinski, K., Dwight, S. S., Eppig, J. T., Harris, M. A., Hill, D.
P., Issel-Tarver, L., Kasarskis, A., Lewis, S., Matese, J. C.,
Richardson, J. E., Ringwald, M., Rubin, G. M. & Sherlock, G.
(2000) Nat Genet 25, 25-9) describing biological processes at
varying levels of detail, we found that the corresponding genes had
either a significant enrichment or a significant depletion in
pyknon instances. FIG. 15 shows a partial list of GO terms that are
enriched or depleted in pyknons.
[0080] We determined these associations as follows: each gene was
included in a list n times, where n is the number of pyknons found
in its 5' untranslated, coding or 3' untranslated region,
respectively--to avoid over-counting, pyknons with multiple
instances in the transcript(s) of a given gene were counted only
once. For the sets of sequences belonging to human 5'UTRs, CRs and
3'UTRs, respectively, the binomial distribution was used to
estimate the significance of enrichment (or depletion) of pyknons
encountered in a group of genes associated with a certain term,
compared to the expected frequency of this term in a background set
comprising all genes with 5' untranslated, coding or 3'
untranslated regions respectively.
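A binomial enrichment or depletion test of this kind can be sketched with exact tail sums. The numbers below are hypothetical, and the function names are ours; the sketch does not reproduce the multiple-testing correction mentioned in the text.

```python
import math

def binom_pmf(i, n, q):
    """Probability of exactly i successes in n trials with success rate q."""
    return math.comb(n, i) * q ** i * (1 - q) ** (n - i)

def binomial_tails(k, n, q):
    """Return (P[X >= k], P[X <= k]) for X ~ Binomial(n, q)."""
    upper = sum(binom_pmf(i, n, q) for i in range(k, n + 1))
    lower = sum(binom_pmf(i, n, q) for i in range(0, k + 1))
    return upper, lower

# Hypothetical example: a list of 200 gene entries, 30 of which carry a
# GO term whose frequency in the background set is 5%.
enrich_p, deplete_p = binomial_tails(k=30, n=200, q=0.05)
```

A small upper-tail value suggests enrichment of the term and a small lower-tail value suggests depletion.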
[0081] Two control tests helped ensure the significance of our
findings. First, we generated gene lists identical to the ones
derived from the real data but which were created by random
associations with pyknons: we found that only 1 of the generated
84,780 p-values exceeded our selected significance threshold of a
Bonferroni-corrected |log(p-value)| of about 2.3 (data not shown).
Second, we examined the relation between GO-process associations
and the amount of sequence covered by the pyknons: this test
allowed us to rule out the possibility that the derived significant
enrichment/depletion were due to variations in sequence length for
the genes associated with given cellular processes.
[0082] The relative positioning of pyknons in 5'UTRs, CRs and
3'UTRs is strongly biased but consecutive pyknon instances are not
correlated. We examined the distances between consecutive pyknons,
independently for each of the 5'UTRs, CRs and 3'UTRs. FIG. 16 shows
the calculated probability density functions. Given the stringent
criteria we used when selecting pyknons, the coverage of each
region is not dense, hence the tail-heavy distributions. The three
curves have similar shapes, pronounced peaks at abscissas 18 and
22, and an overall preference for distances between 18 and 31
nucleotides.
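A distance histogram of this kind can be tabulated as sketched below. The text does not specify how the distance between consecutive instances is measured; the sketch assumes start-to-start distance within each region, and the function name and toy coordinates are hypothetical.

```python
from collections import Counter

def consecutive_distances(starts_by_region):
    """Histogram of start-to-start distances between consecutive instances.

    starts_by_region: one list of instance start positions per region
    (e.g., per 3'UTR). Distances are never taken across regions.
    """
    hist = Counter()
    for starts in starts_by_region:
        ordered = sorted(starts)
        for a, b in zip(ordered, ordered[1:]):
            hist[b - a] += 1
    return hist

# Toy coordinates (hypothetical).
toy = [[5, 23, 45, 120], [10, 32]]
hist = consecutive_distances(toy)
```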
[0083] We next examined whether or not the pyknons are fragments of
larger conserved regions. Let b denote a pyknon and let us assume
that, unbeknownst to us, b is part of a larger-size conserved unit
B. Then B will correspond to a larger area than the instance carved
out by b, and thus there will be length(B)-length(b)+1 strings in
the immediate neighborhood of b whose intergenic and intronic
counterparts have as many identically-conserved copies as B. We
tested this possibility in 3'UTRs by taking each instance of a
pyknon, shifting it by +d (resp. -d), generating a new string and
locating the new string's instances in the human intergenic and
intronic regions. Had b been part of a larger logical unit, then
for some values of d the number of intergenic and intronic copies
of the newly formed string would have remained identical to those
of b. On the other hand, if b were not part of a larger unit, then
the new string would now cross the "natural boundaries" of the
underlying presumed logical units and the new string's
intergenic/intronic copies would be reduced drastically. Given the
strict criteria that we used in identifying pyknons, it is possible
that we discarded blocks that are conserved in intergenic/intronic
regions and have instances in human coding regions. In this case, a
shift of .+-.d may end up generating a string that was not included
in our set of pyknons but continues to have numerous
intergenic/intronic copies. FIG. 17 shows the results obtained for
the 3'UTRs for d .epsilon. and separately for intergenic (top half)
and intronic (bottom half) regions; the curves for d=0 correspond
to the pyknons in P.sub.3'UTR. Note that even for a small shift of
d=2 positions, the derived, shifted strings have strikingly fewer
copies than the pyknons in P.sub.3'UTR, and this holds true for
both the intergenic and intronic instances. We obtained similar
results for negative d's (data not shown).
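The shift test can be sketched as follows: take the window occupied by a pyknon instance, slide it by d positions, and count exact copies of the resulting string in a background sequence. The function names, the single-string background and the toy data are assumptions of this sketch.

```python
def count_occurrences(text, pat):
    """Number of (possibly overlapping) exact occurrences of pat in text."""
    n, i = 0, text.find(pat)
    while i != -1:
        n += 1
        i = text.find(pat, i + 1)
    return n

def shifted_copy_counts(region, start, length, background, shifts):
    """Copies in `background` of the window region[start+d : start+d+length]."""
    out = {}
    for d in shifts:
        s = start + d
        if 0 <= s and s + length <= len(region):  # skip windows that fall off the region
            out[d] = count_occurrences(background, region[s:s + length])
    return out

# Toy data (hypothetical): the pyknon ACGTAC sits at offset 2 of the region.
region = "TTACGTACGG"
background = "ACGTACAA" * 3
counts = shifted_copy_counts(region, start=2, length=6,
                             background=background, shifts=(0, 1, 2))
```

In the toy run the copy number drops from 3 at d=0 to 0 at d=1 and d=2, the qualitative behavior expected when the original window already coincides with the boundaries of the conserved unit.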
[0084] The pyknons are possibly linked to PTGS. The most
conspicuous feature of FIG. 16 is the strong preference for
distances typically encountered in the context of PTGS. By
definition, the 127,998 pyknons have one or more instances in the
untranslated and coding regions of human genes: for each pyknon, we
generated its reverse complement {overscore (.beta.)}, identified
all of {overscore (.beta.)}'s intergenic and intronic instances,
and predicted the RNA structure and folding energy of the
immediately surrounding neighborhoods using the Vienna package
(Hofacker, I. L., Fontana, W., Stadler, P., Bonhoeffer, L. S.,
Tacker, M. & Schuster, P. (1994) Monatshefte f. Chemie 125,
167-188). We discarded structures whose predicted folding energies
were >-30 kcal/mol, and structures (including ones with
favorable folding energies) that were predicted to locally
self-hybridize, even if the involved positions represented a
minuscule fraction of the total length of the regions under
consideration. We also discarded structures that contained either a
single large bulge or many unmatched bases. Each of the surviving
regions was predicted to fold into a hairpin-shaped RNA structure
that had a straightforward arm-loop-arm architecture, contained
very small bulges if any, and was energetically very stable. The
analysis identified 380,084 non-overlapping regions predicted to
form hairpin-shaped structures (298,197 in intergenic and 81,887 in
intronic sequences). These 380,084 regions contained instances of
the reverse complement of 37,421 pyknons (29.23% of total). In
terms of length, the clear majority of these regions are between 60
and 80 nucleotides long.
[0085] FIG. 18 shows the density of the surviving regions per
10,000 nucleotides and for each chromosome separately. The density
is reported for each chromosome and separately for the intergenic
and intronic regions. Per unit length, there are more predicted
hairpins in intronic than in intergenic regions, but the sheer
difference in the magnitude of these regions results in the
intergenic regions contributing the bulk of the hairpins.
Interestingly, the density of discovered hairpins is not constant
across chromosomes: chromosomes 16, 17, 19 and 22, which are the most
densely packed in terms of predicted hairpins, are also among the
shortest in length. We emphasize that the average pyknon has length
similar to that of a typical microRNA and that there is a
straightforward sense-antisense relationship between segments of
the 380,084 hairpins and the pyknon instances in human
5'UTRs/CRs/3'UTRs. Also note that the regions containing the 81,887
intronic hairpins will be transcribed: these regions account for
21,727 of the 37,421 pyknons that are linked to hairpins.
[0086] If pyknons are indeed connected to PTGS, then two hypotheses
arise from FIG. 16: a) in addition to 3'UTRs, gene silencing is
likely effected through the 5'UTRs and amino acid coding regions;
and, b) RNAi products in animals likely form distinct quantized
categories based on size and have preferences for lengths of 18,
22, 24, 26, 29, 30 and 31 nucleotides.
[0087] The pyknons relate to known microRNAs. We formed the union
of the RFAM (Griffiths-Jones, S., Bateman, A., Marshall, M.,
Khanna, A. & Eddy, S. R. (2003) Nucleic Acids Res 31, 439-41)
and pyknon collections, and clustered it with the above-described
BLASTN-based scheme, using a threshold of pair-wise remaining
sequence similarity of 70%; i.e., we allowed up to six mismatches
in 22 nucleotides. When comparing two sequences A and B, we avoided
over-counting by examining for redundancy the pairs (A,B) and
(reverse-complement-of-A,B). In total, 1,087 known microRNAs
clustered with 689 pyknons across 279 of the 32,994 formed
clusters.
[0088] The pyknons relate to recently discovered 3'UTR motifs. We
compared the pyknons in P.sub.3'UTR to the 72 8-mer motifs that
were recently reported to be conserved in human, mouse, rat and dog
3'UTRs (Xie, X., Lu, J., Kulbokas, E. J., Golub, T. R., Mootha, V.,
Lindblad-Toh, K., Lander, E. S. & Kellis, M. (2005) Nature 434,
338-45). We say that one of these 8-mers coincides with a pyknon of
length l if one of the following conditions holds: the 8-mer agrees
with letters l-7 through l of a pyknon (`type 0` agreement); the
8-mer agrees with letters l-8 through l-1 (`type 1` agreement); or,
the 8-mer agrees with letters l-9 through l-2 (`type 2` agreement).
Of the 72 reported conserved 8-mers, 39 were in `type 0` agreement,
10 in `type 1` agreement, and seven in `type 2` agreement with one
or more pyknons from P.sub.3'UTR. Six of the 8-mers did not match
any of the pyknons in P.sub.3'UTR at all. In summary, the pyknons
that we have derived by intragenomic analysis overlap with 56 out
of the 72 motifs that were discovered through cross-species
comparisons.
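The three agreement types translate directly into substring comparisons at the 3' end of a pyknon. In the sketch below the function name `agreement_type` and the toy pyknon are hypothetical; the letter positions follow the 1-based definitions given above.

```python
def agreement_type(eightmer, pyknon):
    """Return 0, 1 or 2 for the agreement type, or None if there is none."""
    l = len(pyknon)
    for t in (0, 1, 2):
        # Letters (l-7-t) through (l-t), 1-based, start at 0-based index l-8-t.
        start = l - 8 - t
        if start >= 0 and pyknon[start:start + 8] == eightmer:
            return t
    return None

# Toy pyknon (hypothetical) of length 16.
pyknon = "ACGTACGTTTGGCCAA"
types = [agreement_type(m, pyknon)
         for m in ("TTGGCCAA", "TTTGGCCA", "GTTTGGCC", "ACGTACGT")]
```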
[0089] Human pyknons are also present in other genomes where they
associate with similar biological processes. In FIG. 19, and for
each of 7 genomes in turn, we show how many positions in region X
of the genome at hand are covered by the human pyknons contained in
set P.sub.x, X={5'UTR,CODING,3'UTR}. We account for length
differences across genomes by reporting the number of covered
positions per 10,000 nucleotides. FIG. 20 shows how many of the
human pyknons contained in set P.sub.X can also be found in the
region X of the genome under consideration, X={5'UTR,CODING,3'UTR}.
FIG. 20 also shows the total number of intergenic and intronic
positions covered by those of the human pyknons that are also in
other genomes. Notably, the human genome contains more than 600
million nucleotides that are associated with identical copies of
pyknons and are absent from the mouse and rat genomes.
Interestingly, the human pyknons have many instances in the
intergenic and intronic regions of the phylogenetically distant
worm and fruit-fly genomes covering about 1.6 million nucleotides
in each.
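By way of illustration only, the normalization to covered positions
per 10,000 nucleotides can be sketched as follows. The function
names are hypothetical, and the sketch assumes exact,
forward-strand matching of each pyknon against a single sequence,
which may differ from the procedure used to produce the figures.

```python
def covered_positions(sequence, motifs):
    """Return the set of 0-based positions covered by any instance of
    any motif. Each position is counted once, however many instances
    overlap it."""
    covered = set()
    for motif in motifs:
        if not motif:
            continue
        start = sequence.find(motif)
        while start != -1:
            covered.update(range(start, start + len(motif)))
            start = sequence.find(motif, start + 1)  # allow overlaps
    return covered


def coverage_per_10k(sequence, motifs):
    """Number of covered positions per 10,000 nucleotides."""
    if not sequence:
        return 0.0
    covered = len(covered_positions(sequence, motifs))
    return 10000.0 * covered / len(sequence)
```

Dividing by the sequence length is what allows regions of different
sizes, and hence genomes of different sizes, to be compared on the
same scale.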
[0090] A set of 6,160 human-genome-derived pyknons are
simultaneously present in human 3'UTRs (5,752 genes) and mouse
3'UTRs (4,905 genes) whereas a second set of 388 pyknons are
simultaneously present in human 3'UTRs (565 genes), mouse 3'UTRs
(673 genes) and fruit-fly 3'UTRs (554 genes). Strikingly, we found
these two sets of common pyknons to be significantly
over-represented in the same biological processes in these other
genomes (i.e. mouse and fruit-fly) as in the human genome, even
though the pyknons were initially discovered by processing the
human genome in isolation (data not shown). The common processes
include regulation of transcription, cell communication, and signal
transduction, among others. Finally, for each of the 388 pyknons in this
second set, we manually analyzed about 130 nucleotide-long
neighborhoods centered on the instances of each pyknon across the
human, mouse and fruit-fly 3'UTRs and for a total of more than
4,000 such neighborhoods: notably, we did not find any instance of
syntenic conservation across the three genomes.
[0091] Accordingly, as explained above, we explored the existence
of sequence-based links between coding and non-coding regions of
the human genome and identified 127,998 pyknons with a combined
226,874 non-overlapping instances in the 5'UTRs, CRs or 3' UTRs of
30,675 transcripts from 20,059 human genes. In transcripts that
contained multiple pyknon instances, we found that the pyknons
arrange themselves combinatorially forming mosaics. Statistical
analysis revealed that the untranslated and/or coding regions of
genes associated with specific biological processes are
significantly enriched/depleted in pyknons.
[0092] We also found that the pyknon placement in 5'UTRs, CRs and
3'UTRs is strongly biased: the starting positions of consecutive
pyknons show a clear preference for distances between 18 and 31
nucleotides. Importantly, we found an apparent lack of correlation
between consecutive pyknon instances in these regions. The observed
bias in the relative placement of the pyknons is conspicuously
reminiscent of lengths that are associated with small RNA molecules
that induce PTGS, suggesting the hypothesis that the pyknons'
instances correspond to binding sites for microRNAs. Analysis of
the regions immediately surrounding the intergenic and intronic
instances of the reverse complement of the 127,998 discovered
pyknons revealed that 30.0% of the pyknons have instances within
about 380,000 distinct, non-overlapping regions between 60 and 80
nucleotides in length that are predicted to fold into
hairpin-shaped RNA secondary structures with folding energies
≤ -30 kcal/mol. Many of these predicted hairpin-shaped
structures are located inside known introns and thus will be
transcribed. Our analysis also suggests that PTGS may be effected
through the genes' 5'UTRs and amino acid coding regions, in addition to
their 3'UTRs. Another resulting hypothesis is that RNAi products in
animals likely fall into distinct categories that are quantized in
terms of size and have preferences for lengths of 18, 22, 24, 26,
29, 30 and 31 nucleotides. Notably, through sequence-based
analysis, we showed that about 40% of the known microRNAs are
similar to 689 pyknons, and that the pyknons subsume 56 of the 72
recently reported 3'UTR motifs, lending further support to the
possibility of a connection between the pyknons and RNAi/PTGS.
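By way of illustration only, the bias in the relative placement of
the pyknons noted above can be examined by tabulating the distances
between the starting positions of consecutive instances within a
region. The function names and the input format (a list of 0-based
starting positions for one region) are assumptions made for the
sketch.

```python
from collections import Counter


def start_distance_histogram(starts):
    """Histogram of the distances between the starting positions of
    consecutive pyknon instances within one region."""
    ordered = sorted(starts)
    return Counter(b - a for a, b in zip(ordered, ordered[1:]))


def fraction_in_range(histogram, low=18, high=31):
    """Fraction of the tabulated distances that lie in [low, high]."""
    total = sum(histogram.values())
    if total == 0:
        return 0.0
    inside = sum(n for d, n in histogram.items() if low <= d <= high)
    return inside / total
```

Histograms from many regions can be added together with the `+`
operator of `Counter` before the fraction is computed.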
[0093] The intergenic/intronic copies of the 127,998 pyknons
constrain approximately 900 million nucleotides of the human
genome. Instances of human pyknons can also be found in other
genomes, namely C. elegans, D. melanogaster, G. gallus, M. musculus,
R. norvegicus and C. familiaris. The number of human pyknons that
can be located in the 5'UTRs, CRs and 3'UTRs of other genomes
decreases with phylogenetic distance. Strikingly, the pyknons that
we found inside mouse and fruit-fly 3'UTRs were over-represented in
the same biological processes as in the human genome. On a related
note, more than 600 million bases, which correspond to identically
conserved intergenic and intronic copies of human pyknons, are not
present in the mouse and rat genomes.
[0094] The fact that some of the intergenic/intronic copies of
pyknons originate in repeat elements may lead one to assume that
our analysis has merely `rediscovered` such elements. However, as
mentioned above, more than 50,000 of the pyknons have many of their
instances in repeat-free regions. Moreover, the typical length of a
pyknon is substantially smaller than, e.g., that of an ALU. It was
recently reported that genes can achieve evolutionary novelty
through the `careful` incorporation of ALUs in their coding regions
(Iwashita, S., Osada, N., Itoh, T., Sezaki, M., Oshima, K.,
Hashimoto, E., Kitagawa-Arita, Y., Takahashi, I., Masui, T.,
Hashimoto, K. & Makalowski, W. (2003) Mol Biol Evol 20,
1556-63; and Lev-Maor, G., Sorek, R., Shomron, N. & Ast, G.
(2003) Science 300, 1288-91). Also, the "pack-mule" paradigm
revealed that entire genes, large fragments from a single gene, or
fragments from multiple genes can be `hijacked` by transposable
elements (Jiang, N., Bao, Z., Zhang, X., Eddy, S. R. & Wessler,
S. R. (2004) Nature 431, 569-73). However, `fortuitous coincidence`
is generally considered the prevailing mechanism by which such
potential is unleashed. Contrasting this, the combinatorial
arrangement of the pyknons within the untranslated and coding
regions of genes together with the large number of instances in
these regions and the association of pyknons with specific
biological processes suggests that their placement is not
accidental and likely serves a specific purpose. Our findings do
not rule out a link with transposable elements. On the contrary,
the findings seem to support a dynamic view of a genome (Jorgensen,
R. A. (2004) Cold Spring Harb Symp Quant Biol 69, 349-54) that has
learned to respond, and likely continues to respond, to
environmental challenges or "stress" in a controlled, organized
manner.
[0095] Taken together, the results suggest the existence of an
extensive link between the non-coding and gene-coding parts in
animal genomes. It is conceivable that this link could be the
result of integration into the genome of dsRNA-breakdown products.
Since many genes are known to give rise to antisense transcripts,
it is possible that these genes were at some point subjected to
RNAi-mediated dsRNA breakdown which in turn gave rise to products
about 20 nucleotides in length. The latter, through repeated
integration, could have eventually given rise to the numerous
intergenic and intronic copies of the pyknons that we have
identified. However, this explanation would have to be reconciled
with four of our findings. First, the pyknons have identically
conserved copies in non-genic regions. Second, pyknons appear to
favor a specific size and, in genic regions, a specific relative
placement. Third, slight modification of the 3'UTR instances of the
pyknons by either prepending or appending immediately neighboring
positions results in new strings whose intergenic and intronic
copies are markedly decreased. And fourth, we can discover human
pyknons in other organisms such as the mouse and the fruit-fly
where they exhibit a persistent enrichment within specific
processes yet are not the result of syntenic conservation. It may
well be that we are seeing traces of an organized, coordinated
activity that involves nearly all known genes. The existence of a
pyknon-based regulatory layer that is massive in scope and extent,
originates in the non-coding part of the genome, operates through
the genes' untranslated and coding regions, and is likely linked
to PTGS, is a tantalizing possibility. Moreover, the observed
disparity in the number of intergenic/intronic positions covered by
human pyknons in the human and the phylogenetically-close mouse/rat
genomes suggests that pyknons and thus the presumed regulatory
layer may be organism-specific to some degree ("pyknome").
Addressing such questions might eventually help explain the
apparent lack of correlation between the number of amino-acid
coding genes in an organism and the organism's apparent
complexity.
[0096] In the above description, and in order to identify motifs
that are present in both non-genic and genic regions, we proceeded
by first carrying out pattern discovery in the intergenic and
intronic regions of the human genome. Once those patterns were
determined, we identified additional instances for them in the
genic regions of the genome and in particular in the 5'
untranslated, amino acid coding and 3' untranslated regions of the
genes. In other words, the computational analysis flowed from the
non-genic to the genic regions. But there is nothing that
inherently prevents us from carrying out the computation in the
other direction, i.e., from the genic to the non-genic regions,
although there is potential for a loss in sensitivity that might
result in the identification of smaller sets of motifs linking
non-genic with genic regions. One could carry out the
genic/non-genic analysis in a number of ways. For example, one
could use a pattern discovery method to process the full collection
of 5' untranslated, amino acid coding and 3' untranslated regions
(with the regions processed separately or together), identifying
recurrent motifs contained therein, and finally establishing links
with the non-genic regions of the genome by locating the intergenic
and intronic copies for these motifs.
[0097] Instead of working with the full length sequences of the
genes' untranslated and coding regions, an alternative method would
be to delineate areas of interest in these regions (effectively
sub-selecting), analyze those areas to derive motifs, and finally
locate additional instances of these motifs in the non-genic
parts of the genome. Such areas of interest could, for example, be
known or putative microRNA binding sites. Alternatively, the areas
of interest could be what, in our work on the problem of RNA
interference, we refer to as "target islands." The work is
described in detail in the U.S. patent application
identified as Ser. No. 11/351,821, filed on Feb. 10, 2006, and
entitled "System and Method for Identification of MicroRNA Target
Sites and Corresponding Targeting MicroRNA Sequences," the
disclosure of which is incorporated herein.
[0098] Summarily, our approach for finding microRNA target sites is
known as rna22 and proceeds as follows: it discovers statistically
significant patterns that are contained in the sequences of known
microRNAs, generates their reverse complement, identifies all the
instances of these reverse-complement patterns in a region of
interest (namely one of 5'UTRs, CRs or 3'UTRs) and finally reports
groups of consecutive locations from the region of interest as long
as they are `hit` a minimum number of times by these patterns.
Generally, the groups of consecutive locations that rna22 reports
will be variable in length and may correspond to one or more
binding sites; consequently, and so as not to lose generality, we
have been referring to them as "target islands."
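By way of illustration only, the final stages of the approach
summarized above (reverse-complementing the patterns, locating
their instances, and reporting runs of consecutive locations that
are hit a minimum number of times) can be sketched as follows. The
sketch is not the rna22 implementation: the patterns are treated
here as literal DNA strings, pattern discovery itself is omitted,
and all names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def target_islands(region, patterns, min_hits=1):
    """Return half-open (start, end) intervals of consecutive
    positions in `region` that are each hit at least `min_hits`
    times by instances of the given patterns."""
    hits = [0] * len(region)
    for pattern in patterns:
        if not pattern:
            continue
        start = region.find(pattern)
        while start != -1:
            for i in range(start, start + len(pattern)):
                hits[i] += 1
            start = region.find(pattern, start + 1)
    islands = []
    island_start = None
    for i, count in enumerate(hits):
        if count >= min_hits and island_start is None:
            island_start = i
        elif count < min_hits and island_start is not None:
            islands.append((island_start, i))
            island_start = None
    if island_start is not None:
        islands.append((island_start, len(region)))
    return islands
```

Raising `min_hits` shrinks or removes islands that are supported by
few patterns, which is why the reported islands vary in length.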
[0099] Let us assume that target islands are available for the
region of interest. One could proceed by doing an all-against-all
comparison of the target islands forming clusters. Any two
target-islands that end up in the same cluster have the property
that their corresponding sequences share a substantial portion of
their extent, say a minimum of N locations. Initially, each target
island is in its own cluster. There is always the possibility that
the thresholds used in the various stages of the process are too
stringent, causing the method to miss some target islands
that could otherwise have become members of some cluster c. In
order to account for this, one could enhance the cluster-forming
process as follows. Using the Clustal-W multiple alignment
algorithm (Chenna, R., Sugawara, H., Koike, T., Lopez, R., Gibson,
T. J., Higgins, D. G. & Thompson, J. D. (2003) Nucleic Acids
Res 31, 3497-500), we could align the sequences in cluster c and
extract the core region of the alignment, then use it to search the
sequences of interest for instances of the core region that were
skipped because of the employed thresholds. If a given cluster
contains more than one core region, then it can be replaced by as
many new clusters as it has distinct core regions. For each formed
cluster whose core region, as obtained from the Clustal-W alignment
of its members, is at least N nucleotides in length, we report the
core region as a (genic) motif.
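By way of illustration only, the initial all-against-all clustering
of target islands can be sketched as follows. The sketch makes
simplifying assumptions: two islands are linked when they share an
identical substring of at least N nucleotides, and linked islands
are merged by single linkage, so two members of a cluster may be
connected only through intermediate islands. The Clustal-W
alignment and core-region extraction are not reproduced, and all
names are hypothetical.

```python
def shares_substring(a, b, n):
    """True if a and b share an identical substring of length n."""
    if n <= 0:
        return True
    kmers = {a[i:i + n] for i in range(len(a) - n + 1)}
    return any(b[j:j + n] in kmers for j in range(len(b) - n + 1))


def cluster_islands(islands, n):
    """Group island sequences by single linkage on shared substrings.

    Initially each island is in its own cluster; clusters are merged
    whenever a pair of islands shares at least n consecutive letters.
    """
    parent = list(range(len(islands)))

    def find(i):
        # Follow parent links to the cluster representative,
        # shortening the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(islands)):
        for j in range(i + 1, len(islands)):
            if shares_substring(islands[i], islands[j], n):
                parent[find(j)] = find(i)

    clusters = {}
    for i, seq in enumerate(islands):
        clusters.setdefault(find(i), []).append(seq)
    return list(clusters.values())
```

The pairwise loop is quadratic in the number of islands, which is
adequate for a sketch but would call for indexing of the shared
substrings on large collections.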
[0100] Optionally, one can discard core regions that exhibit
low-complexity using the NSEG algorithm (Wootton, J. C. &
Federhen, S. (1993) Computers in Chemistry 17, 149-163). Instances
of these motifs are then sought in the intergenic/intronic regions
of the corresponding genome to establish links between coding and
non-coding parts of the genome. Finally, it is clear
that instead of clustering the target islands to determine motifs,
one could simply use a pattern discovery approach and subselect
among the reported patterns to keep only those that, for example,
satisfy a minimum length requirement, or some other property.
[0101] Given the above description, a few points should be noted.
First, it is clear that the method which we have described and the
ensuing analysis are not specific to the human genome; in fact, they
can be carried out separately in other eukaryotic genomes such as
chimpanzee, mouse, rat, dog, chicken, fruit-fly, worm, etc. It is
expected that the resulting pyknomes will have non-zero
intersections with one another but will likely also contain
organism-specific pyknons. Whether generated from the human or some
other genome, the pyknons are statistically significant and link
the non-genic and genic regions of the genome at hand. The links
that are instantiated by the pyknons are `natural` in that they
involve large numbers of sequences that occur naturally in the
genome at hand. Consequently, the pyknons would form natural
candidates for a number of processes that to date have been carried
out using schemes that make use of local information alone and do
not take into account long-range conservations of the kind that we
presented in our discussion.
[0102] One such application would be the design of small
interfering RNAs (siRNAs) to regulate the expression of one
specific gene. Some of our pyknons have the property of being
shared by two or more genes, which allows the design of siRNAs that
can interfere with a cluster of genes at once. As illustrated in
the flow diagram of FIG. 21, a method for designing one or more
sequences of siRNAs that can interact with one or more sites in a
given transcript of a given sequence in a given organism and result
in the down-regulation of the expression of the protein product
encoded by the given transcript can comprise the following steps.
One or more regions of interest are identified in the sequence of a
given transcript (step 2102). One or more regions are sub-selected
from the collection of these regions (step 2104). One or more
derived sequences are generated from the sequence of the one or
more sub-selected regions (step 2106). The one or more derived
sequences are used to create one or more instances of the
corresponding molecule that the one or more derived sequences
represent (step 2108). The one or more instances of the created
molecule are used in an appropriate environment to regulate the
expression of the given transcript (step 2110).
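By way of illustration only, the computational steps 2102, 2104 and
2106 can be sketched as follows; steps 2108 and 2110 are
laboratory steps and are not modeled. The sketch assumes that the
regions of interest are exact instances of known motifs in a
transcript written in the DNA alphabet, that sub-selection is by
region length, and that the derived sequence is the exact reverse
complement written in the RNA alphabet. All names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_regions(transcript, motifs):
    """Step 2102: return sorted (start, end) spans of motif instances."""
    regions = set()
    for motif in motifs:
        if not motif:
            continue
        start = transcript.find(motif)
        while start != -1:
            regions.add((start, start + len(motif)))
            start = transcript.find(motif, start + 1)
    return sorted(regions)


def subselect(regions, min_len, max_len):
    """Step 2104: keep regions whose length lies in [min_len, max_len]."""
    return [(s, e) for s, e in regions if min_len <= e - s <= max_len]


def derive_sequences(transcript, regions):
    """Step 2106: reverse complement of each region, as RNA."""
    derived = []
    for start, end in regions:
        dna = transcript[start:end].translate(COMPLEMENT)[::-1]
        derived.append(dna.replace("T", "U"))
    return derived
```

For example, for a transcript containing one instance of the pyknon
of SEQ ID NO: 12 (TGCACTCCAGCCTGGG), the derived sequence would be
CCCAGGCUGGAGUGCA.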
[0103] Further, the method of designing one or more siRNAs may use
a region of interest in the collection of regions of interest
identified to be an instance of a motif that has one or more copies
in the intergenic and intronic regions of the genome of interest,
and one or more copies in the untranslated and amino acid coding
regions of one or more genes in the genome of interest, each such
region of interest being computed using the method and system for
finding pyknons described above.
[0104] The method may use a region of interest in the collection of
regions of interest identified using a method that is based on
pattern discovery, for example, the method described in the
above-referenced U.S. patent application identified as Ser. No.
11/351,821. A region of interest in the collection of regions of
interest can also be identified to be a target island that is
computed using the method also described in the above-referenced
U.S. patent application identified as Ser. No. 11/351,821.
[0105] The method of designing one or more siRNAs may also use a
region of interest, for example, located in the 5' untranslated
region of the given transcript, located in the amino acid coding
region of the given transcript, or located in the 3' untranslated
region of the given transcript.
[0106] As detailed above, the method of designing one or more
siRNAs can be used where the genome of interest is a eukaryotic
genome, and wherein the eukaryotic genome is, for example, the
human genome, the mouse genome, the rat genome, the dog genome, the
fruit fly genome, or the worm genome.
[0107] Also, the method of designing one or more siRNAs may use a
region of interest that is sub-selected based on one or more of its
attributes. These attributes may include, for example, the region's
length and the region's location in the transcript.
[0108] The method of designing one or more siRNAs can also use a
derived sequence that is, for example, the reverse complement of
the sequence of the one or more sub-selected regions, or a
near-reverse complement of the sequence of the one or more
sub-selected regions, i.e. it contains mismatches at one or more
locations.
[0109] The method of designing one or more siRNAs can be used such
that the one or more copies of the molecule can be built using any
of a set of biochemical processes.
[0110] Another application would involve the rational use of
pyknons to appropriately engineer a transcript of interest in order
to control its expression (either up-regulate or down-regulate) in
a specific tissue or for a specific cellular process. For example,
one could remove one or more of the pyknons existing in the
transcript of interest leading to an up-regulation of the
transcript. Alternatively, one could down-regulate the transcript
of interest by adding more instances of existing pyknons and rely
on the naturally occurring agent that targets this pyknon to induce
down-regulation. Or one could add the sequence of a pyknon that is
not among those contained in the transcript and selectively control
the transcript's expression by adding or removing appropriately
generated instances of the reverse complement of the pyknon.
[0111] As illustrated in the flow diagram of FIG. 22, a method for
engineering a given transcript of a given gene in a given organism
in order to regulate its expression may comprise the following
steps. One or more regions of interest are identified in the
sequence of a given transcript (step 2202). One or more regions are
sub-selected from the collection of these regions (step 2204). The
one or more sub-selected regions are used to make one or more
modifications to the sequence of the given transcript (step
2206).
[0112] Further, the method of engineering a given transcript to
regulate gene expression can comprise many of the same steps as
mentioned above in the method for designing one or more siRNAs. For
example, the method of engineering a given transcript to regulate
gene expression may use a region of interest in the collection of
regions of interest identified to be an instance of a motif that
has one or more copies in the intergenic and intronic regions of
the genome of interest, and one or more copies in the untranslated
and amino acid coding regions of one or more genes in the genome of
interest. The motif can be computed, for example, using the pyknon
discovery method and system described above.
[0113] Also, as above, the method of engineering a given transcript
to regulate gene expression may use a region of interest in the
collection of regions of interest computed using a method that is
based on pattern discovery, for example, the method described in
the above-referenced U.S. patent application identified as Ser. No.
11/351,821.
[0114] The present method may also use a region of interest, for
example, located in the 5' untranslated region of the given
transcript, located in the amino acid coding region of the given
transcript, or located in the 3' untranslated region of the given
transcript.
[0115] Also, similar to the above methodology, the method of
engineering a given transcript to regulate gene expression may use
a region of interest that is sub-selected based on one or more of
its attributes including, for example, the region's length and the
region's location in the transcript. Additional attributes may
include the association of the region with a given biological
process, the region's association with a given tissue, and the
region's association with a given cellular compartment.
[0116] Further, the method of engineering a given transcript to
regulate gene expression can include a modification that, for
example, comprises an extension of the sequence of the given
transcript, or a shortening of the sequence of the given
transcript. The extension can, for example, comprise one or more
instances of a region of interest, and the shortening can, for
example, comprise the removal of one or more instances of a region
of interest.
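By way of illustration only, the two kinds of modification can be
sketched at the level of the transcript's sequence as follows. The
function names, the optional spacer, and the choice of appending at
the 3' end are assumptions made for the sketch; the sketch does not
address where in the transcript a modification may be made without
altering the encoded protein.

```python
def remove_instances(transcript, motif):
    """Shorten the transcript by deleting every non-overlapping
    instance of the motif, scanning from left to right."""
    if not motif:
        return transcript
    return transcript.replace(motif, "")


def append_instances(transcript, motif, copies=1, spacer=""):
    """Extend the transcript by appending `copies` instances of the
    motif at its 3' end, each preceded by an optional spacer."""
    if copies < 0:
        raise ValueError("copies must be non-negative")
    return transcript + (spacer + motif) * copies
```

The spacer parameter is included because the relative placement of
consecutive instances may matter; its value here is left to the
user.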
[0117] Another application of pyknons, for example, would be the
measuring of the impact that one or more pyknons can have on a
gene's regulation "by proxy." This would entail the engineering of
an assay that involves a reporter gene (for example, luciferase)
and instances of the one or more pyknons placed downstream from the
region that codes for the reporter's amino acid sequence. Then, one
can measure the impact on the expression of the reporter gene by
using various combinations of appropriately generated instances of
the reverse complement of these pyknons. The observations made in
the context of the reporter assay can then be carried over to the
gene that is studied. Additional applications are also possible if
one assumes that for the organism that is being studied the
sequences of the corresponding pyknons are available.
[0118] FIG. 23 is a block diagram of an exemplary hardware
implementation of one or more of the methodologies of the present
invention. That is, apparatus 2300 may implement one or more of the
steps/components described above in the context of FIGS. 1-22.
Apparatus 2300 comprises a computer system 2310 that interacts with
media 2350. Computer system 2310 comprises a processor 2320, a
network interface 2325, a memory 2330, a media interface 2335 and
an optional display 2340. Network interface 2325 allows computer
system 2310 to connect to a network, while media interface 2335
allows computer system 2310 to interact with media 2350, such as a
Digital Versatile Disk (DVD) or a hard drive.
[0119] As is known in the art, the methods and apparatus discussed
herein may be distributed as an article of manufacture that itself
comprises a computer-readable medium having computer-readable code
means embodied thereon. The computer-readable program code means is
operable, in conjunction with a computer system such as computer
system 2310, to carry out all or some of the steps to perform one
or more of the methods or create the apparatus discussed herein.
For example, the computer-readable code is configured to implement
a method of determining associations between non-coding sequences
and gene coding sequences in a genome of an organism, by the steps
of: identifying at least one conserved region from a plurality of
the non-coding sequences; and linking the at least one conserved
region with one or more of the gene coding sequences of the genome
to associate the at least one conserved region with one or more
biological processes of the organism. The computer-readable medium
may be a recordable medium (e.g., floppy disks, hard drive, optical
disks such as a DVD, or memory cards) or may be a transmission
medium (e.g., a network comprising fiber-optics, the world-wide
web, cables, or a wireless channel using time-division multiple
access, code-division multiple access, or other radio-frequency
channel). Any medium known or developed that can store information
suitable for use with a computer system may be used. The
computer-readable code means is any mechanism for allowing a
computer to read instructions and data, such as magnetic variations
on a magnetic medium or height variations on the surface of a
compact disk.
[0120] Memory 2330 configures the processor 2320 to implement the
methods, steps, and functions disclosed herein. The memory 2330
could be distributed or local and the processor 2320 could be
distributed or singular. The memory 2330 could be implemented as an
electrical, magnetic or optical memory, or any combination of these
or other types of storage devices. Moreover, the term "memory"
should be construed broadly enough to encompass any information
able to be read from or written to an address in the addressable
space accessed by processor 2320. With this definition, information
on a network, accessible through network interface 2325, is still
within memory 2330 because the processor 2320 can retrieve the
information from the network. It should be noted that each
distributed processor that makes up processor 2320 generally
contains its own addressable memory space. It should also be noted
that some or all of computer system 2310 can be incorporated into
an application-specific or general-use integrated circuit.
[0121] Optional video display 2340 is any type of video display
suitable for interacting with a human user of apparatus 2300.
Generally, video display 2340 is a computer monitor or other
similar video display.
[0122] Although illustrative embodiments of the present invention
have been described herein with reference to the accompanying
drawings, it is to be understood that the invention is not limited
to those precise embodiments, and that various other changes and
modifications may be made by one skilled in the art without
departing from the scope or spirit of the invention.
Sequence CWU 1
1
Number of SEQ ID NOs: 52

SEQ ID NO: 1; length: 18; type: DNA; organism: Unknown
The pattern is a representation of an 18-nucleotide core motif
presently assumed to be present in all variations of the logical
unit which appears several times in the intergenic/intronic regions
of the human genome.
Sequence: tcccatacca cggggatt

SEQ ID NO: 2; length: 18; type: DNA; organism: Unknown
The pattern is a representation of a core pattern presently assumed
to be present in all variations of the logical unit which appears
several times in the intergenic/intronic regions of the human
genome.
Sequence: tcccatacca cggggatt

SEQ ID NO: 3; length: 19; type: DNA; organism: Unknown
The pattern is a representation of a core pattern presently assumed
to be present in all variations of the logical unit which appears
several times in the intergenic/intronic regions of the human
genome.
Sequence: tcccatacca cggggatta

SEQ ID NO: 4; length: 19; type: DNA; organism: Unknown
The pattern is a representation of a core pattern presently assumed
to be present in all variations of the logical unit which appears
several times in the intergenic/intronic regions of the human
genome.
Sequence: ctcccatacc acggggatt

SEQ ID NO: 5; length: 20; type: DNA; organism: Unknown
The pattern is a representation of a core pattern presently assumed
to be present in all variations of the logical unit which appears
several times in the intergenic/intronic regions of the human
genome.
Sequence: ctcccatacc acggggatta

SEQ ID NO: 6; length: 20; type: DNA; organism: Unknown
The pattern is a representation of a core pattern presently assumed
to be present in all variations of the logical unit which appears
several times in the intergenic/intronic regions of the human
genome.
Sequence: tcccatacca cggggattac

SEQ ID NO: 7; length: 21; type: DNA; organism: Unknown
The pattern is a representation of a core pattern presently assumed
to be present in all variations of the logical unit which appears
several times in the intergenic/intronic regions of the human
genome.
Sequence: gcctcccata ccacggggat t

SEQ ID NO: 8; length: 21; type: DNA; organism: Unknown
The pattern is a representation of a core pattern presently assumed
to be present in all variations of the logical unit which appears
several times in the intergenic/intronic regions of the human
genome.
Sequence: tcccatacca cggggattac a

SEQ ID NO: 9; length: 22; type: DNA; organism: Unknown
The pattern is a representation of a core pattern presently assumed
to be present in all variations of the logical unit which appears
several times in the intergenic/intronic regions of the human
genome.
Sequence: gcctcccata ccacggggat ta

SEQ ID NO: 10; length: 24; type: DNA; organism: Unknown
The pattern is a representation of a core pattern presently assumed
to be present in all variations of the logical unit which appears
several times in the intergenic/intronic regions of the human
genome.
Sequence: gcctcccata ccacggggat taca

SEQ ID NO: 11; length: 23; type: DNA; organism: Unknown
The pattern is a representation of a core pattern presently assumed
to be present in all variations of the logical unit which appears
several times in the intergenic/intronic regions of the human
genome.
Sequence: ggcctcccat accacgggga tta

SEQ ID NO: 12; length: 16; type: DNA; organism: Unknown
The pattern is an example of a pyknon.
Sequence: tgcactccag cctggg

SEQ ID NO: 13; length: 19; type: DNA; organism: Unknown
The pattern is an example of a pyknon.
Sequence: taatcccagc actttggga

SEQ ID NO: 14; length: 17; type: DNA; organism: Unknown
The pattern is an example of a pyknon.
Sequence: ggctgaggca ggagaat

SEQ ID NO: 15; length: 16; type: DNA; organism: Unknown
The pattern is an example of a pyknon.
Sequence: gaggttgcag tgagcc

SEQ ID NO: 16; length: 1748; type: DNA; organism: Unknown
The pattern is a combinatorial
arrangement of pyknons in 3'UTRs. misc_feature: n indicates the
number of nucleotides separating the pyknons that surround it, as
specified by Figure 11A. misc_feature: n is a, c, g, or t at each
of the following single positions: 1, 18, 35, 52, 69, 86, 103, 120,
137, 154, 174, 191, 208, 225, 242, 260, 278, 295, 312, 348, 365,
382, 418, 436, 455, 472, 490, 508, 543, 560, 577, 594, 611, 628,
664, 685, 702, 719, 737, 755, 772, 789, 822, 839, 856, 889, 906,
923, 940, 957, 974, 991, 1008, 1025, 1042, 1059, 1076, 1096, 1113,
1130, 1147, 1164, 1181, 1198, 1215, 1232, 1265, 1282, 1319, 1336,
1353, 1389, 1406, 1427, 1468, 1488, 1505, 1526, 1543, 1560, 1577,
1595, 1612.
misc_feature (1629)..(1629) n is a, c, g, or t misc_feature
(1646)..(1646) n is a, c, g, or t misc_feature (1663)..(1663) n is
a, c, g, or t misc_feature (1680)..(1680) n is a, c, g, or t
misc_feature (1697)..(1697) n is a, c, g, or t misc_feature
(1714)..(1714) n is a, c, g, or t misc_feature (1731)..(1731) n is
a, c, g, or t misc_feature (1748)..(1748) n is a, c, g, or t 16
ntttcttgat ttttcagnat actgatttaa tttcnatttt cttttaaagt tnactgacat
60 ggaaagatnt gtattacttt tgtaanattt ttagaaagta ttntggatga
aaaatatttn 120 ctcacaacaa acctatnatt catttaaaac attncttttt
gagatggagt cttncccagg 180 ctggagtgca natctctgct cactgcangc
cttctgggtt caagnattct cgtgcctcag 240 cnagtagctg gaattacagn
gccaccatgc ccgactantt ggccaggctg gtatnaactc 300 ctgacctcaa
gntcccaaag tgctgggatt acaggcttga gccaccantt attttacatt 360
ttagnaaatt tgtattttga ancagtggct cacgcctgta atcccagcac tttggganga
420 tcacgaggtc aggagnagac catcctggct aacantctct actaaaaaac
anttagccgg 480 gcgtggtggn ctgtagtccc agctactngg ctgaggcagg
agaatggtgt gaacccggga 540 ggngagcttg cagtgagccn tgcactccag
cctgggnaga gcaagactct gtcntctgtg 600 aaaggaaaat nactcccatc
ctaatacncg gtggctcatg cctgtaatcc cagcactttg 660 ggangatcac
ctgaggtcgg gaggnagacc agcctgacca anttagctgg gcgtggtgnc 720
tgtaatccca gctactnggc tgaggcagga gaatncttga acccaggagg cngaggttgt
780 ggtgagcgnt gcactccagc ctgggcaaca agagcaaaac tnttagccag
gcgtggtgng 840 cagctactct ggaggnagag gcaggaggat cacttgagcc
catgaattng aggcagcagt 900 gagctntgta ctccagtctg ggnacagagt
gagaccccan ttgaaaagat tattctntca 960 ttttacaggt gagnccatgg
attcaaccaa nctgttgtta agcaacanta acaactattt 1020 acatnagcaa
ttatttttaa anattgtatt aggtattanc cagcctggac aaaagnaaac 1080
cctgtctcta caaaanttag ctgggcatgg tgntgtagtc ctggctactn ggaggatcgc
1140 ttgagtngag gctgcattga gctntgcatt ccagcctggg nagaccttgt
ctcagaanat 1200 tatatgcaaa tactncagtc tcactgtgtt gnggatggag
tgcaatggca caatcttggc 1260 tcatncagct gggactacag gngtgcccag
ttaatttttt ttgtattctt agtagagant 1320 tggccaggct agtctnaatt
tctgacctca agntcccaaa gtgctgggat tacaggcgtg 1380 agccaccant
tgaccaggct ggtctnaact cctgatctca ggtgatnctc ggcctcacaa 1440
agtgctggga ttacaggtgt gaaccacngg cacggtggct cacgcctncc gaggctgagg
1500 caggnctcac ctgaggtcag gagttnagac cagcctggcc aancctgtct
gtacaaaaan 1560 atagctgggc atggtgnctg tagtcccagc tactnactga
ggcaggagaa tncttgaacc 1620 tgggaggcng aggttgcagg gagccncgca
ctccagccta ggngatagag tgagactccn 1680 atgttttgag acagagngaa
aactaagaaa attnttattt ttctgtgaat ncaataaaat 1740 actattcn 1748 17
603 DNA Unknown The pattern is a combinatorial arrangement of
pyknons in 3'UTRs. misc_feature n indicates the number of
nucleotides separating the pyknons that surround it, as specified
by Figure 11B. misc_feature (1)..(1) n is a, c, g, or t
misc_feature (18)..(18) n is a, c, g, or t misc_feature (38)..(38)
n is a, c, g, or t misc_feature (59)..(59) n is a, c, g, or t
misc_feature (76)..(76) n is a, c, g, or t misc_feature (94)..(94)
n is a, c, g, or t misc_feature (112)..(112) n is a, c, g, or t
misc_feature (130)..(130) n is a, c, g, or t misc_feature
(147)..(147) n is a, c, g, or t misc_feature (164)..(164) n is a,
c, g, or t misc_feature (181)..(181) n is a, c, g, or t
misc_feature (198)..(198) n is a, c, g, or t misc_feature
(234)..(234) n is a, c, g, or t misc_feature (252)..(252) n is a,
c, g, or t misc_feature (271)..(271) n is a, c, g, or t
misc_feature (288)..(288) n is a, c, g, or t misc_feature
(305)..(305) n is a, c, g, or t misc_feature (322)..(322) n is a,
c, g, or t misc_feature (339)..(339) n is a, c, g, or t
misc_feature (356)..(356) n is a, c, g, or t misc_feature
(373)..(373) n is a, c, g, or t misc_feature (390)..(390) n is a,
c, g, or t misc_feature (410)..(410) n is a, c, g, or t
misc_feature (448)..(448) n is a, c, g, or t misc_feature
(466)..(466) n is a, c, g, or t misc_feature (483)..(483) n is a,
c, g, or t misc_feature (500)..(500) n is a, c, g, or t
misc_feature (518)..(518) n is a, c, g, or t misc_feature
(535)..(535) n is a, c, g, or t misc_feature (552)..(552) n is a,
c, g, or t misc_feature (569)..(569) n is a, c, g, or t
misc_feature (586)..(586) n is a, c, g, or t misc_feature
(603)..(603) n is a, c, g, or t 17 ngccaggtat ggtggctnta atcccagcac
tttggganat cacctgatgt caggagttna 60 gaccagcctg gccaanacta
gccaggcgtg gtgnctgtaa tcccagctac tnggctgagg 120 caggagaatn
cttgaaccca ggaggcngag gttgcagtga gccncactgc actccagccc 180
ngtgacagtg tgagactncg gtggctcaag cctgtaatcc cagcactttg ggangatcac
240 gaggtcagga gnagaccatc ctggctaaca ntagtcgggc gtggtggncc
tgtattcaca 300 gctangaggt tgaggcagga gngggtgaac ccgggaggng
agcttgcagt gagccntgca 360 ctccagcctg ggngacagag ccagactccn
aggctcacac ctgtaatccn gacgctgagg 420 tgggaggatc acttgagccc
aggagttntg ggcaatatag tgaganctac aaaaaagttt 480 ttnagcatgg
tggcacatgn ctgtagtccc acctactnag ggtcacctga gcctngaggc 540
tgcagtgagc cntgcactcc agcctgggna gagtaagacc ctgtcntttt attgagcagt
600 ttn 603 18 1070 DNA Unknown The pattern is a combinatorial
arrangement of pyknons in 3'UTRs. misc_feature n indicates the
number of nucleotides separating the pyknons that surround it, as
specified by Figure 11B. misc_feature (1)..(1) n is a, c, g, or t
misc_feature (18)..(18) n is a, c, g, or t misc_feature (36)..(36)
n is a, c, g, or t misc_feature (56)..(56) n is a, c, g, or t
misc_feature (74)..(74) n is a, c, g, or t misc_feature
(118)..(118) n is a, c, g, or t misc_feature (135)..(135) n is a,
c, g, or t misc_feature (152)..(152) n is a, c, g, or t
misc_feature (169)..(169) n is a, c, g, or t misc_feature
(186)..(186) n is a, c, g, or t misc_feature (203)..(203) n is a,
c, g, or t misc_feature (220)..(220) n is a, c, g, or t
misc_feature (237)..(237) n is a, c, g, or t misc_feature
(254)..(254) n is a, c, g, or t misc_feature (271)..(271) n is a,
c, g, or t misc_feature (291)..(291) n is a, c, g, or t
misc_feature (324)..(324) n is a, c, g, or t misc_feature
(341)..(341) n is a, c, g, or t misc_feature (359)..(359) n is a,
c, g, or t misc_feature (377)..(377) n is a, c, g, or t
misc_feature (394)..(394) n is a, c, g, or t misc_feature
(411)..(411) n is a, c, g, or t misc_feature (428)..(428) n is a,
c, g, or t misc_feature (445)..(445) n is a, c, g, or t
misc_feature (462)..(462) n is a, c, g, or t misc_feature
(479)..(479) n is a, c, g, or t misc_feature (496)..(496) n is a,
c, g, or t misc_feature (513)..(513) n is a, c, g, or t
misc_feature (530)..(530) n is a, c, g, or t misc_feature
(547)..(547) n is a, c, g, or t misc_feature (564)..(564) n is a,
c, g, or t misc_feature (581)..(581) n is a, c, g, or t
misc_feature (598)..(598) n is a, c, g, or t misc_feature
(615)..(615) n is a, c, g, or t misc_feature (632)..(632) n is a,
c, g, or t misc_feature (649)..(649) n is a, c, g, or t
misc_feature (685)..(685) n is a, c, g, or t misc_feature
(704)..(704) n is a, c, g, or t misc_feature (721)..(721) n is a,
c, g, or t misc_feature (739)..(739) n is a, c, g, or t
misc_feature (757)..(757) n is a, c, g, or t misc_feature
(776)..(776) n is a, c, g, or t misc_feature (793)..(793) n is a,
c, g, or t misc_feature (810)..(810) n is a, c, g, or t
misc_feature (827)..(827) n is a, c, g, or t misc_feature
(845)..(845) n is a, c, g, or t misc_feature (862)..(862) n is a,
c, g, or t misc_feature (879)..(879) n is a, c, g, or t
misc_feature (896)..(896) n is a, c, g, or t misc_feature
(913)..(913) n is a, c, g, or t misc_feature (930)..(930) n is a,
c, g, or t misc_feature (947)..(947) n is a, c, g, or t
misc_feature (964)..(964) n is a, c, g, or t misc_feature
(997)..(997) n is a, c, g, or t misc_feature (1016)..(1016) n is a,
c, g, or t misc_feature (1034)..(1034) n is a, c, g, or t
misc_feature (1070)..(1070) n is a, c, g, or t 18 naaagaagat
cattttgnag cagtggctca cacctntaat cccagcactt tggganaggt 60
gggcggatca cccnaaacca gcctgaccaa catggtgaaa ccctgtctct actaaatntt
120 agcggggtgt ggtgngcacc tgtaatcgca gngaggctga gacaggagnc
ttgaacccta 180 gaggcngagt ttgcagtgag ccnccattgt actccagctn
agtaagactc tgtctcntgg 240 gagtcatcct tganggccag gcatggtggc
ntaatcccag cattttggga ntgaggtcag 300 gagctcaaga ccagcctggc
caantagtcg ggcgtggtgg nctgtaatcc cagctactng 360 gctgaggcag
gagaatnctt gaacctggga ggcngaggtt gcagtgagcc ntgtactcca 420
gcctgggnag tgagactctg tcttnctgat aaatattgat gncagctcca ctaggaagnc
480 cattcaattc catttngagc tctttgaggc cancctagca catagtaggn
tgaatgaatg 540 aatgaantca ttttatgaag ctanttttat cagaaaaaaa
nacttaatcc ccagtgtnac 600 aaaggaatga agagnagcat ttaggccatt
tnaaatggta tttagaaanc ggtggctcat 660 gcctgtaatc ccagcacttt
gggantgagg caggcggatc actnagacca gcctggccaa 720 nttagccggg
cgtggtggnc tgtaatccca gctactnagg ctgaggcagg agaaancctg 780
aacccagaag gcngaggttg cagtgagccn cactgcactc cagccgncct ctgtctcaaa
840 aaaantttct ttatctgtaa angtttagaa agtaaaaanc ccaggttgga
gtgcanggct 900 cactgcaacc tcngcctccc gggttcaagn ctcctgtctc
agcctcngag tacctgggac 960 tacngcccgg ctaatttttt gtatttgtag
tagagantgt tagccaggat ggtctnctcc 1020 tgacctcatg atcntcccaa
agtgctggga ttacaggcgt gagcccccgn 1070 19 723 DNA Unknown The
pattern is a combinatorial arrangement of pyknons in 3'UTRs.
misc_feature n indicates the number of nucleotides separating the
pyknons that surround it, as specified by Figure 11B. misc_feature
(1)..(1) n is a, c, g, or t misc_feature (18)..(18) n is a, c, g,
or t misc_feature (35)..(35) n is a, c, g, or t misc_feature
(54)..(54) n is a, c, g, or t misc_feature (73)..(73) n is a, c, g,
or t misc_feature (90)..(90) n is a, c, g, or t
misc_feature (107)..(107) n is a, c, g, or t misc_feature
(126)..(126) n is a, c, g, or t misc_feature (143)..(143) n is a,
c, g, or t misc_feature (160)..(160) n is a, c, g, or t
misc_feature (177)..(177) n is a, c, g, or t misc_feature
(194)..(194) n is a, c, g, or t misc_feature (211)..(211) n is a,
c, g, or t misc_feature (228)..(228) n is a, c, g, or t
misc_feature (245)..(245) n is a, c, g, or t misc_feature
(262)..(262) n is a, c, g, or t misc_feature (279)..(279) n is a,
c, g, or t misc_feature (316)..(316) n is a, c, g, or t
misc_feature (333)..(333) n is a, c, g, or t misc_feature
(350)..(350) n is a, c, g, or t misc_feature (367)..(367) n is a,
c, g, or t misc_feature (385)..(385) n is a, c, g, or t
misc_feature (402)..(402) n is a, c, g, or t misc_feature
(435)..(435) n is a, c, g, or t misc_feature (452)..(452) n is a,
c, g, or t misc_feature (469)..(469) n is a, c, g, or t
misc_feature (505)..(505) n is a, c, g, or t misc_feature
(526)..(526) n is a, c, g, or t misc_feature (543)..(543) n is a,
c, g, or t misc_feature (564)..(564) n is a, c, g, or t
misc_feature (581)..(581) n is a, c, g, or t misc_feature
(602)..(602) n is a, c, g, or t misc_feature (619)..(619) n is a,
c, g, or t misc_feature (636)..(636) n is a, c, g, or t
misc_feature (654)..(654) n is a, c, g, or t misc_feature
(672)..(672) n is a, c, g, or t misc_feature (689)..(689) n is a,
c, g, or t misc_feature (706)..(706) n is a, c, g, or t
misc_feature (723)..(723) n is a, c, g, or t 19 ncctgttgtg
gggagggncg gtggctcatg cctgncatcc cagcactttg gganaacctg 60
aggtcaggag ttnaacaaca tggtgaaacn tgcctgcctg taatccngag gctgaggcag
120 gaaaancttg aacccgaaag gcngaggttg cagtgtgccn cactgaactc
cagcctncaa 180 caagagtgaa actntattga acacttacta nggcagagcc
aggatttntg ttttttaaaa 240 agaantttac agacaaggaa anggccaggc
atggtggcnt aatcccagca gtttgggagg 300 ctgaggtggg aggagnagac
cagcctaggc aancccatct ctacaaaaan ttatctgggc 360 ctggtgnctg
tagtcccagc tactnagagg ctgaggtggg ancttgagcc cagaagttga 420
ggctgcagtg agccntgtac tccagcctgg gncaaagcaa aaccctgtnc ggtggctcac
480 acctgtaatc ccaacacttt ggganatcac ctgaggtcag gagttnagac
cagcctggcc 540 aancctctac tgaaaataca aaangcattg tggcacatgc
natcacctga ggtcaggagt 600 tnagaccagc ctggccaant tagctgggcg
tggtgnctgt agtcccagct actnggctga 660 ggcaggagaa tncttgaacc
tgggaggtng aggttgcagt aagccntgca ctccagcctg 720 ggn 723 20 826 DNA
Unknown The pattern is a combinatorial arrangement of pyknons in
3'UTRs. misc_feature n indicates the number of nucleotides
separating the pyknons that surround it, as specified by Figure
11B. misc_feature (1)..(1) n is a, c, g, or t misc_feature
(18)..(18) n is a, c, g, or t misc_feature (35)..(35) n is a, c, g,
or t misc_feature (52)..(52) n is a, c, g, or t misc_feature
(69)..(69) n is a, c, g, or t misc_feature (102)..(102) n is a, c,
g, or t misc_feature (119)..(119) n is a, c, g, or t misc_feature
(152)..(152) n is a, c, g, or t misc_feature (204)..(204) n is a,
c, g, or t misc_feature (224)..(224) n is a, c, g, or t
misc_feature (241)..(241) n is a, c, g, or t misc_feature
(258)..(258) n is a, c, g, or t misc_feature (276)..(276) n is a,
c, g, or t misc_feature (293)..(293) n is a, c, g, or t
misc_feature (310)..(310) n is a, c, g, or t misc_feature
(327)..(327) n is a, c, g, or t misc_feature (344)..(344) n is a,
c, g, or t misc_feature (361)..(361) n is a, c, g, or t
misc_feature (378)..(378) n is a, c, g, or t misc_feature
(395)..(395) n is a, c, g, or t misc_feature (412)..(412) n is a,
c, g, or t misc_feature (429)..(429) n is a, c, g, or t
misc_feature (446)..(446) n is a, c, g, or t misc_feature
(463)..(463) n is a, c, g, or t misc_feature (480)..(480) n is a,
c, g, or t misc_feature (497)..(497) n is a, c, g, or t
misc_feature (514)..(514) n is a, c, g, or t misc_feature
(532)..(532) n is a, c, g, or t misc_feature (549)..(549) n is a,
c, g, or t misc_feature (566)..(566) n is a, c, g, or t
misc_feature (583)..(583) n is a, c, g, or t misc_feature
(600)..(600) n is a, c, g, or t misc_feature (617)..(617) n is a,
c, g, or t misc_feature (634)..(634) n is a, c, g, or t
misc_feature (670)..(670) n is a, c, g, or t misc_feature
(689)..(689) n is a, c, g, or t misc_feature (706)..(706) n is a,
c, g, or t misc_feature (723)..(723) n is a, c, g, or t
misc_feature (741)..(741) n is a, c, g, or t misc_feature
(759)..(759) n is a, c, g, or t misc_feature (776)..(776) n is a,
c, g, or t misc_feature (793)..(793) n is a, c, g, or t
misc_feature (826)..(826) n is a, c, g, or t 20 nactgtactg
tatttatnat attttacaga aatanttatt catttgttta ancagtggct 60
catgcctgng ctgaggcagg aggacagttt gaggccagga gnagactagc ctggacaant
120 ctctacaaaa acataaaaat aaattagctg gnaaaaatag gctgggtgtg
gtggttcatg 180 cctgtaatcc tagcactttg gganggatca cctgaggtca
ggtncccaat atggtgaaac 240 ntagccaggt gtggtggnct gtagtcccag
ctactngagg ctgacacagg agncttgaac 300 ccaggaagtn gaggctgcag
tgagccntgc actccagcct gggnagtgag actctgtctc 360 ngtgtggtgg
catgtgtntg tggtcccagc tactnccaga aggtcaaggc tntgagctgt 420
gattgcatnt gcactccagc ctgggnagca agaccctatc tcnaatttac aatttacaan
480 ctgaagaact ttctttnatt ttgtttctaa atanagagat ggggtctcgc
tncccaggct 540 ggagtgcant agctcactgc agtctnaaat gcatttttaa
aantttctga ttaataaatn 600 tgtatgtgcc acatttncac acacacatgt
gtgncagtgg ctcacgcctg taatcccagc 660 actttgggan aacctgaggt
caggagttna gaccagcctg accaanttag ccaggcatgg 720 tgnctgtaat
cccagctact nggctgaggc aggagaatnc ttgaacccgg gaggcngagg 780
ttgcagttag ccntgcactc cagcctgggc aacaagagta aaactn 826 21 730 DNA
Unknown The pattern is a combinatorial arrangement of pyknons in
3'UTRs. misc_feature n indicates the number of nucleotides
separating the pyknons that surround it, as specified by Figure
11B. misc_feature (1)..(1) n is a, c, g, or t misc_feature
(18)..(18) n is a, c, g, or t misc_feature (35)..(35) n is a, c, g,
or t misc_feature (52)..(52) n is a, c, g, or t misc_feature
(88)..(88) n is a, c, g, or t misc_feature (109)..(109) n is a, c,
g, or t misc_feature (126)..(126) n is a, c, g, or t misc_feature
(143)..(143) n is a, c, g, or t misc_feature (161)..(161) n is a,
c, g, or t misc_feature (179)..(179) n is a, c, g, or t
misc_feature (196)..(196) n is a, c, g, or t misc_feature
(213)..(213) n is a, c, g, or t misc_feature (230)..(230) n is a,
c, g, or t misc_feature (247)..(247) n is a, c, g, or t
misc_feature (264)..(264) n is a, c, g, or t misc_feature
(281)..(281) n is a, c, g, or t misc_feature (298)..(298) n is a,
c, g, or t misc_feature (315)..(315) n is a, c, g, or t
misc_feature (332)..(332) n is a, c, g, or t misc_feature
(349)..(349) n is a, c, g, or t misc_feature (401)..(401) n is a,
c, g, or t misc_feature (418)..(418) n is a, c, g, or t
misc_feature (435)..(435) n is a, c, g, or t misc_feature
(452)..(452) n is a, c, g, or t misc_feature (469)..(469) n is a,
c, g, or t misc_feature (486)..(486) n is a, c, g, or t
misc_feature (503)..(503) n is a, c, g, or t misc_feature
(520)..(520) n is a, c, g, or t misc_feature (537)..(537) n is a,
c, g, or t misc_feature (554)..(554) n is a, c, g, or t
misc_feature (571)..(571) n is a, c, g, or t misc_feature
(592)..(592) n is a, c, g, or t misc_feature (609)..(609) n is a,
c, g, or t misc_feature (626)..(626) n is a, c, g, or t
misc_feature (644)..(644) n is a, c, g, or t misc_feature
(662)..(662) n is a, c, g, or t misc_feature (679)..(679) n is a,
c, g, or t misc_feature (696)..(696) n is a, c, g, or t
misc_feature (713)..(713) n is a, c, g, or t misc_feature
(730)..(730) n is a, c, g, or t 21 ncacacacac acccctgnaa ttgttttttc
taatnctttt tgaattttta ancagtggcc 60 catgcctgta atcccagcac
tttggggnat cacctgagga caggagttna gaccagtctg 120 gccaantagc
cgggcatggt ggnctgtaat cccagctact nggctgaggc aggagaatnc 180
ttgaacctag gaggcngagg ttgcagtgag ccntgcactc cagcctgggn gacaagagtg
240 aaactcnttt gaattaattt ttcngcctct aaaactgtga ngcatcagaa
tcacctgnag 300 ggcttgttaa aacanacatt tattgagctc cnagggcaga
ggaaacaanc aatggctcat 360 gcctgtaatc ccaacacttt gggaggccaa
ggtgggagga nagaccagcc tggacaanag 420 tgagatccta tctcnttagc
caggcatggt gncctatagt cctggctang ctgaggcagg 480 aggatnaggc
tgcagtaagc cangccactg cactcagccn agcaagaccc tgtctcntct 540
gggctgggca cagncacact ttgggaggcc natcacctga ggtcaggagt tnagaccagc
600 ctggccaant tagctgggcg tggtgnctgt aatcccagct actnggctga
ggcaggagaa 660 tncttgaacc caggaggcng aggttgcagt gagccntgca
ctccagcctg ggngacaaga 720 gcaaaactcn 730 22 1097 DNA Unknown The
pattern is a combinatorial arrangement of pyknons in 3'UTRs.
misc_feature n indicates the number of nucleotides separating the
pyknons that surround it, as specified by Figure 11B. misc_feature
(1)..(1) n is a, c, g, or t misc_feature (18)..(18) n is a, c, g,
or t misc_feature (35)..(35) n is a, c, g, or t misc_feature
(52)..(52) n is a, c, g, or t misc_feature (76)..(76) n is a, c, g,
or t misc_feature (95)..(95) n is a, c, g, or t misc_feature
(112)..(112) n is a, c, g, or t misc_feature (129)..(129) n is a,
c, g, or t misc_feature (147)..(147) n is a, c, g, or t
misc_feature (165)..(165) n is a, c, g, or t misc_feature
(182)..(182) n is a, c, g, or t misc_feature (199)..(199) n is a,
c, g, or t misc_feature (216)..(216) n is a, c, g, or t
misc_feature (235)..(235) n is a, c, g, or t misc_feature
(268)..(268) n is a, c, g, or t misc_feature (285)..(285) n is a,
c, g, or t misc_feature (320)..(320) n is a, c, g, or t
misc_feature (337)..(337) n is a, c, g, or t misc_feature
(355)..(355) n is a, c, g, or t misc_feature (372)..(372) n is a,
c, g, or t misc_feature (405)..(405) n is a, c, g, or t
misc_feature (438)..(438) n is a, c, g, or t misc_feature
(456)..(456) n is a, c, g, or t misc_feature (474)..(474) n is a,
c, g, or t misc_feature (507)..(507) n is a, c, g, or t
misc_feature (524)..(524) n is a, c, g, or t misc_feature
(561)..(561) n is a, c, g, or t misc_feature (578)..(578) n is a,
c, g, or t misc_feature (595)..(595) n is a, c, g, or t
misc_feature (612)..(612) n is a, c, g, or t misc_feature
(629)..(629) n is a, c, g, or t misc_feature (646)..(646) n is a,
c, g, or t misc_feature (664)..(664) n is a, c, g, or t
misc_feature (681)..(681) n is a, c, g, or t misc_feature
(698)..(698) n is a, c, g, or t misc_feature (715)..(715) n is a,
c, g, or t misc_feature (732)..(732) n is a, c, g, or t
misc_feature (749)..(749) n is a, c, g, or t misc_feature
(769)..(769) n is a, c, g, or t misc_feature (786)..(786) n is a,
c, g, or t misc_feature (803)..(803) n is a, c, g, or t
misc_feature (820)..(820) n is a, c, g, or t misc_feature
(837)..(837) n is a, c, g, or t misc_feature (854)..(854) n is a,
c, g, or t misc_feature (871)..(871) n is a, c, g, or t
misc_feature (888)..(888) n is a, c, g, or t misc_feature
(905)..(905) n is a, c, g, or t misc_feature (922)..(922) n is a,
c, g, or t misc_feature (942)..(942) n is a, c, g, or t
misc_feature (960)..(960) n is a, c, g, or t misc_feature
(979)..(979) n is a, c, g, or t misc_feature (996)..(996) n is a,
c, g, or t misc_feature (1013)..(1013) n is a, c, g, or t
misc_feature (1030)..(1030) n is a, c, g, or t misc_feature
(1063)..(1063) n is a, c, g, or t misc_feature (1080)..(1080) n is
a, c, g, or t misc_feature (1097)..(1097) n is a, c, g, or t 22
ntgtcaccca gctggagntg gttggagcag gaggncagtg gctcacgcct angaagccga
60 ggcgggtgga tcaccnagac catcctggct aacancctat ctctacaaaa
anttagctgg 120 gtgtggtgnc tgtagtccca gctactnggc tgaggcagga
gaatnaacct ctgcctccgg 180 gncctgcctc agcgtcccna gtagcaggga
ctacancgtg caccaccatg cccgngtatt 240 tttagtagag attggccagg
ctggtctnaa ctcttgacct caagngccaa agtgctggga 300 ttataggcgt
gagccaccgn ctgccaggac tgggttntgg ggacttgggg ggaancctct 360
ctgggctgca gnagtttcac tcttgttgcc caggctggag tgcangctca cctcaacctc
420 cgcctcccag gttcaagnat tctcctgcct cagccnagta gctgggatta
cagnatgcct 480 ggctaatttt gtatttttag tagaganagc tggtctcgaa
ctcnatctgc ccacttcggc 540 ctcccaaagt gctgggatta naggcatgag
ccaccgcnga gacagagtct cactncccag 600 gttggagtgc anatctgagc
tcactgcang cctcttgggt tcaagnattc tcctgcctca 660 gccnagtagc
taggactaca ncaccatgcc cagctaangt atttttagta gaganttggc 720
caggctggtc tnacctaagg tgatccacnt cccaaagtgc tgggattant gagccatggc
780 acctgnggaa aggggtcagg gcnggaaact gaggcccagn ctctgcctct
gggattngtg 840 ggcagcccag gagncatctg tgaaatggga ntgcctgggt
ttgaatcntc tgtgccttca 900 tttcngctgg gattccaggc antaatccca
gcactttggg angatcacga ggtcaggagn 960 agaccatcct agctaacana
gtgaaaccct atctcnttag ctgggcatgg tgntgtagtc 1020 ccagatactn
taaggcagaa gaatcgcttc aacctgggag gcngaggttg cagtgagccn 1080
tgcactccag cctgggn 1097 23 425 DNA Unknown The pattern is a
combinatorial arrangement of pyknons in 3'UTRs. misc_feature n
indicates the number of nucleotides separating the pyknons that
surround it, as specified by Figure 11B. misc_feature (1)..(1) n is
a, c, g, or t misc_feature (37)..(37) n is a, c, g, or t
misc_feature (55)..(55) n is a, c, g, or t misc_feature (74)..(74)
n is a, c, g, or t misc_feature (91)..(91) n is a, c, g, or t
misc_feature (109)..(109) n is a, c, g, or t misc_feature
(144)..(144) n is a, c, g, or t misc_feature (161)..(161) n is a,
c, g, or t misc_feature (178)..(178) n is a, c, g, or t
misc_feature (195)..(195) n is a, c, g, or t misc_feature
(212)..(212) n is a, c, g, or t misc_feature (231)..(231) n is a,
c, g, or t misc_feature (252)..(252) n is a, c, g, or t
misc_feature (269)..(269) n is a, c, g, or t misc_feature
(287)..(287) n is a, c, g, or t misc_feature (306)..(306) n is a,
c, g, or t misc_feature (323)..(323) n is a, c, g, or t
misc_feature (341)..(341) n is a, c, g, or t misc_feature
(374)..(374) n is a, c, g, or t misc_feature (391)..(391) n is a,
c, g, or t misc_feature (408)..(408) n is a, c, g, or t
misc_feature (425)..(425) n is a, c, g, or t 23 ncggtggctc
acacctgtaa tcccagcact ttgggangat cacaaggtca ggagnagacc 60
atcctgtcta acanaaatta gtcggacatg nctgtagtcc cagctactng gctgaggcag
120 gagaatggcg tgaacccagg aggngagctt gcagtgagcc ntgcactcca
gcctgggnga 180 cagagtgaga ctccncagtg gctcacgcct gncacattgg
gaggctgagg natcacctga 240 ggtcaggagt tnagaccagc ctggccaanc
tgtaatccca gctactngag gctgaggccg 300 gagaancttg agcccgagag
gtnaagccaa gatcatgcca ntgcactcca gcctgggcaa 360 cacagggaga
ctcntttgag aggcctaggc ntttttcttg tagaggtnca aaaatgggca 420 aaatn
425 24 721 DNA Unknown The pattern is a combinatorial arrangement
of pyknons in 3'UTRs. misc_feature n indicates the number of
nucleotides separating the pyknons that surround it, as specified
by Figure 11B. misc_feature (1)..(1) n is a, c, g, or t
misc_feature (18)..(18) n is a, c, g, or t misc_feature (35)..(35)
n is a, c, g, or t misc_feature (52)..(52) n is a, c, g, or t
misc_feature (69)..(69) n is a, c, g, or t misc_feature (86)..(86)
n is a, c, g, or t misc_feature (123)..(123) n is a, c, g, or t
misc_feature (141)..(141) n is a, c, g, or t misc_feature
(158)..(158) n is a, c, g, or t misc_feature (175)..(175) n is a,
c, g, or t misc_feature (192)..(192) n is a, c, g, or t
misc_feature (209)..(209) n is a, c, g, or t misc_feature
(226)..(226) n is a, c, g, or t misc_feature (246)..(246) n is a,
c, g, or t misc_feature (283)..(283) n is a, c, g, or t
misc_feature (300)..(300) n is a, c, g, or t misc_feature
(318)..(318) n is a, c, g, or t misc_feature (335)..(335) n is a,
c, g, or t misc_feature (352)..(352) n is a, c, g, or t
misc_feature (369)..(369) n is a, c, g, or t misc_feature
(386)..(386) n is a, c, g, or t misc_feature (403)..(403) n is a,
c, g, or t misc_feature (420)..(420) n is a, c, g, or t
misc_feature (437)..(437) n is a, c, g, or t misc_feature
(454)..(454) n is a, c, g, or t misc_feature (471)..(471) n is a,
c, g, or t misc_feature (488)..(488) n is a, c, g, or t
misc_feature (505)..(505) n is a, c, g, or t misc_feature
(525)..(525) n is a, c, g, or t misc_feature (542)..(542) n is a,
c, g, or t misc_feature (563)..(563) n is a, c, g, or t
misc_feature (580)..(580) n is a, c, g, or t misc_feature
(600)..(600) n is a, c, g, or t misc_feature (618)..(618) n is a,
c, g, or t misc_feature (636)..(636) n is a, c, g, or t
misc_feature (654)..(654) n is a, c, g, or t misc_feature
(671)..(671) n is a, c, g, or t misc_feature (688)..(688) n is a,
c, g, or t misc_feature (721)..(721) n is a, c, g, or t 24
ntttttgttt ttgaggcntc gcccagggtg gagtnggctc attgcaacct cngcctcccg
60 gcttcaagnc tgctgcctca gcctcngagt agctgggata acaggtgtgg
tggcgcatgc 120 ctncagctat tctggaggct ncttgaaccc aggaggtnga
ggttgcagtg agccncaaca 180 agagcaaaac tnaagagatg acatttgant
gcaaaggccc tgaggntaat cccagcactt 240 tggganatca cctgaggtca
ggagttagac cagcctggcc aanttagctg ggtgtggtgn 300 ctgtaatccc
agatactngc tgaggcatga gaatncttga acctgggagg cntgcagtga 360
gctgagatnt gcactccagc ctgggntagc tgggcctggt ggntatagtc ccagctactn
420 gaggctaagg caggagnctg tgagctatga ttgntatact ccagcctggc
ngcaagaccc 480 tgtctttnaa aaaaaaatct aggcnggcac ggtggctcac
gcctntaatc tcaacacttt 540 gnatcacctg aggtcaggag ttnagaccag
actgaccaan tctctactaa aaacgcaaan 600 ttagccgggc gtggtggnct
gtaatcccag ctactnggct gaggcaggag aatncttgaa 660 cccaggaggc
ntgcagtgag ctgagatntg cactccagcc taggcaacaa gagtgaaact 720 n 721 25
830 DNA Unknown The pattern is a combinatorial arrangement of
pyknons in 3'UTRs. misc_feature n
indicates the number of nucleotides separating the pyknons that
surround it, as specified by Figure 11B. misc_feature (1)..(1) n is
a, c, g, or t misc_feature (18)..(18) n is a, c, g, or t
misc_feature (35)..(35) n is a, c, g, or t misc_feature (71)..(71)
n is a, c, g, or t misc_feature (88)..(88) n is a, c, g, or t
misc_feature (105)..(105) n is a, c, g, or t misc_feature
(122)..(122) n is a, c, g, or t misc_feature (155)..(155) n is a,
c, g, or t misc_feature (172)..(172) n is a, c, g, or t
misc_feature (189)..(189) n is a, c, g, or t misc_feature
(206)..(206) n is a, c, g, or t misc_feature (223)..(223) n is a,
c, g, or t misc_feature (240)..(240) n is a, c, g, or t
misc_feature (257)..(257) n is a, c, g, or t misc_feature
(274)..(274) n is a, c, g, or t misc_feature (291)..(291) n is a,
c, g, or t misc_feature (308)..(308) n is a, c, g, or t
misc_feature (325)..(325) n is a, c, g, or t misc_feature
(342)..(342) n is a, c, g, or t misc_feature (359)..(359) n is a,
c, g, or t misc_feature (376)..(376) n is a, c, g, or t
misc_feature (394)..(394) n is a, c, g, or t misc_feature
(427)..(427) n is a, c, g, or t misc_feature (444)..(444) n is a,
c, g, or t misc_feature (463)..(463) n is a, c, g, or t
misc_feature (480)..(480) n is a, c, g, or t misc_feature
(514)..(514) n is a, c, g, or t misc_feature (531)..(531) n is a,
c, g, or t misc_feature (548)..(548) n is a, c, g, or t
misc_feature (565)..(565) n is a, c, g, or t misc_feature
(582)..(582) n is a, c, g, or t misc_feature (599)..(599) n is a,
c, g, or t misc_feature (616)..(616) n is a, c, g, or t
misc_feature (633)..(633) n is a, c, g, or t misc_feature
(650)..(650) n is a, c, g, or t misc_feature (670)..(670) n is a,
c, g, or t misc_feature (688)..(688) n is a, c, g, or t
misc_feature (707)..(707) n is a, c, g, or t misc_feature
(724)..(724) n is a, c, g, or t misc_feature (742)..(742) n is a,
c, g, or t misc_feature (779)..(779) n is a, c, g, or t
misc_feature (796)..(796) n is a, c, g, or t misc_feature
(813)..(813) n is a, c, g, or t misc_feature (830)..(830) n is a,
c, g, or t 25 ntcctctgtc tctgcctntt gaattttgta aaatncagtg
gctcacacct gtaatcccag 60 ccctttggga nctgaggcag gtggattnct
tgagcccagg agttnctcat ctctacaaaa 120 angtggtggc gtgtacctct
agtcccagct acccnctagg ctgcagtgag cntgcactcc 180 agcctggtna
cagtgaggcc ctgtcnaaaa aaaatttttc tgnactgagc acttaaaatn 240
aaaatgttac tgaaatnatc atgttatttt tctnttggaa attattttaa nttttgttga
300 aagcaaanat attcactttt taaanataca tatttacata anatagaaac
ttttaaaang 360 ggcatggtgg ctcacntaat cccagcactt tgtngaggtg
ggcggatctc ctaaggtcag 420 gagttcnaga ccagcctggc caanatggtg
aaaccccgtc ttnttagcca ggcgtggtgn 480 ctgtaatcct agctactcag
gaggcaggaa aatncttgaa cctgggaggc ngaggttgca 540 gtgagccntg
cactccagcc tgggngacag agtgagactc cntgagaata ttttaaatna 600
ttttattttc aagatntcat tccaatttta gtncaaaggc cgggcgcggn taatcccagc
660 actttgggan gatcacgagg tcaggagnag accatcctgg ctaacanatt
agccaggcct 720 ggtnctgtag tcccagctac tngaggctga ggcaggagaa
cggcgtgaac ccgggaggng 780 agcttgcagt gagccntgca ctccagccta
ggnggcagag ccagactccn 830 26 19 DNA Unknown The pattern is an
example of a pyknon. 26 tcccaaagtg ctgggatta 19 27 16 DNA Unknown
The pattern is an example of a pyknon. 27 cccaggctgg agtgca 16 28
17 DNA Unknown The pattern is an example of a pyknon. 28 agtagctggg
attacag 17 29 20 DNA Unknown The pattern is an example of a pyknon.
29 aactcctgac ctcaggtgat 20 30 1222 DNA Homo sapiens misc_feature n
indicates a number of nucleotides that separate the surrounding
pyknons, as specified in Figure 12. misc_feature (1)..(1) n is a,
c, g, or t misc_feature (18)..(18) n is a, c, g, or t misc_feature
(35)..(35) n is a, c, g, or t misc_feature (52)..(52) n is a, c, g,
or t misc_feature (69)..(69) n is a, c, g, or t misc_feature
(86)..(86) n is a, c, g, or t misc_feature (106)..(106) n is a, c,
g, or t misc_feature (124)..(124) n is a, c, g, or t misc_feature
(142)..(142) n is a, c, g, or t misc_feature (160)..(160) n is a,
c, g, or t misc_feature (178)..(178) n is a, c, g, or t
misc_feature (199)..(199) n is a, c, g, or t misc_feature
(219)..(219) n is a, c, g, or t misc_feature (236)..(236) n is a,
c, g, or t misc_feature (253)..(253) n is a, c, g, or t
misc_feature (270)..(270) n is a, c, g, or t misc_feature
(287)..(287) n is a, c, g, or t misc_feature (305)..(305) n is a,
c, g, or t misc_feature (346)..(346) n is a, c, g, or t
misc_feature (363)..(363) n is a, c, g, or t misc_feature
(380)..(380) n is a, c, g, or t misc_feature (397)..(397) n is a,
c, g, or t misc_feature (414)..(414) n is a, c, g, or t
misc_feature (431)..(431) n is a, c, g, or t misc_feature
(466)..(466) n is a, c, g, or t misc_feature (483)..(483) n is a,
c, g, or t misc_feature (500)..(500) n is a, c, g, or t
misc_feature (517)..(517) n is a, c, g, or t misc_feature
(536)..(536) n is a, c, g, or t misc_feature (553)..(553) n is a,
c, g, or t misc_feature (570)..(570) n is a, c, g, or t
misc_feature (587)..(587) n is a, c, g, or t misc_feature
(608)..(608) n is a, c, g, or t misc_feature (628)..(628) n is a,
c, g, or t misc_feature (645)..(645) n is a, c, g, or t
misc_feature (662)..(662) n is a, c, g, or t misc_feature
(679)..(679) n is a, c, g, or t misc_feature (699)..(699) n is a,
c, g, or t misc_feature (718)..(718) n is a, c, g, or t
misc_feature (738)..(738) n is a, c, g, or t misc_feature
(756)..(756) n is a, c, g, or t misc_feature (773)..(773) n is a,
c, g, or t misc_feature (792)..(792) n is a, c, g, or t
misc_feature (810)..(810) n is a, c, g, or t misc_feature
(827)..(827) n is a, c, g, or t misc_feature (844)..(844) n is a,
c, g, or t misc_feature (863)..(863) n is a, c, g, or t
misc_feature (880)..(880) n is a, c, g, or t misc_feature
(897)..(897) n is a, c, g, or t misc_feature (914)..(914) n is a,
c, g, or t misc_feature (932)..(932) n is a, c, g, or t
misc_feature (985)..(985) n is a, c, g, or t misc_feature
(1005)..(1005) n is a, c, g, or t misc_feature (1022)..(1022) n is
a, c, g, or t misc_feature (1039)..(1039) n is a, c, g, or t
misc_feature (1060)..(1060) n is a, c, g, or t misc_feature
(1081)..(1081) n is a, c, g, or t misc_feature (1098)..(1098) n is
a, c, g, or t misc_feature (1133)..(1133) n is a, c, g, or t
misc_feature (1150)..(1150) n is a, c, g, or t misc_feature
(1170)..(1170) n is a, c, g, or t misc_feature (1187)..(1187) n is
a, c, g, or t misc_feature (1204)..(1204) n is a, c, g, or t
misc_feature (1222)..(1222) n is a, c, g, or t 30 ntttcttcat
tcattcanga cagtctcgct ctgtncccag gctggagtgc anatctcagc 60
tcactgcang ccttccaggt tcaagnagtc tcctgcctca gcctcnagta gctgggatta
120 cagnaccacg cctggctaat tncattttta gtagagacgn cttcaccatg
ttggccanaa 180 ctcctgacct caggtaatnt cccaaagtgc tgagattang
tgagctaccg tgcccnggtg 240 gtcagggaag gcnaggcagg aggaacagcn
ttcattcaac aaatatnttg cccaggctgg 300 agttnagctc actgcaacct
ccacctccca gattcaagtg attctnaata gctgggacta 360 canatgtcac
catgcccagn ttggccaggc tggtctnaac tcctggcctc aaanaatcta 420
cctgccttgg ntccaaagtg ctaggattac aggtgtgagc caccgnccca ggctggagtg
480 canggctcat tgcaacctcn acctcccggg ttcaagntca gcttcctgag
tagctntaca 540 ggcacttgcc acngacaggg ttttgccatn ttggccaggc
tggtctnaac tcctgacctc 600 aggtgatntc ccaaagtgct ggcattangg
cttgagccac cacgnggagt ctcgctctgt 660 cngctggagt gcagtgacnt
gatctcggct cactgcagnc tccgccttct gggttcanat 720 tctcctgcct
cagcctcnag tagctgggac tacagnccac ctcgcctggc tantgttagc 780
caggatggtc tnctcgtgac ctcgtgatcn ccagcctcgg cctcccntga gatggagttt
840 cgcntcaggc tggagtgcaa tgnatctcag ctcactgcan gcctcccagg
ttcaagnatt 900 ctcccacctc agcnagtagc tgggattaca gnggggtttc
actatgttgg ccaagctggt 960 cttgaactcc tgacctcagg tgatntccca
aagtgctggg attantggag gtagggcctg 1020 gntgaatgaa agaatgaant
ttgggaggcc aaggtgggtn atcacctgag gtcaggagtt 1080 nagaccagcc
tggccaancc tctactaaaa atacaaaaaa ttagctgggc gtntgcctgt 1140
aatcacagcn gaggctgagg caggagaatn gaggttgcag tgagccntgc cattgcactc
1200 cagnggcaac agagcgagac tn 1222 31 451 DNA Unknown The pattern
is a 5'UTR mosaic composed of pyknons. misc_feature n indicates a
number of nucleotides that separate the surrounding pyknons, as
specified in Figure 12. misc_feature (1)..(1) n is a, c, g, or t
misc_feature (18)..(18) n is a, c, g, or t misc_feature (52)..(52)
n is a, c, g, or t misc_feature (69)..(69) n is a, c, g, or t
misc_feature (86)..(86) n is a, c, g, or t misc_feature
(103)..(103) n is a, c, g, or t misc_feature (120)..(120) n is a,
c, g, or t misc_feature (137)..(137) n is a, c, g, or t
misc_feature (154)..(154) n is a, c, g, or t misc_feature
(174)..(174) n is a, c, g, or t misc_feature (191)..(191) n is a,
c, g, or t misc_feature (208)..(208) n is a, c, g, or t
misc_feature (225)..(225) n is a, c, g, or t misc_feature
(242)..(242) n is a, c, g, or t misc_feature (259)..(259) n is a,
c, g, or t misc_feature (292)..(292) n is a, c, g, or t
misc_feature (309)..(309) n is a, c, g, or t misc_feature
(327)..(327) n is a, c, g, or t misc_feature (344)..(344) n is a,
c, g, or t misc_feature (361)..(361) n is a, c, g, or t
misc_feature (382)..(382) n is a, c, g, or t misc_feature
(399)..(399) n is a, c, g, or t misc_feature (416)..(416) n is a,
c, g, or t misc_feature (433)..(433) n is a, c, g, or t
misc_feature (451)..(451) n is a, c, g, or t 31 ngagtcttgc
tctgttgnga ggctggagtg cagtgatctc agctcactgg angcctcctg 60
ggttcaagnt ctcaacctcc caagtntgcc accacacctg ccntttttag tagggatggn
120 ttgggcaggc tggtctnaac tcctgacctc aagntcccag agtgctggga
ttantgagcc 180 accatgcctg ncctgtctct aaaacaanca gaaattttct
ttttngagac agagtcttac 240 tntcgctcag gctggagtna tctcagctca
ctgcaacctc tgcctcctgg tnctccttcc 300 tcagcctcna gtagctggga
ttacacncca ccgcgcctgg ctanttggcc aggctggtct 360 naactcctga
cctcaggtga tnagcctccc aaaatgctnc aggcatgagc caccgnctca 420
gtttcttcat ttnggggttg ctgccagctg n 451 32 402 DNA Unknown The
pattern is a 5'UTR mosaic composed of pyknons. misc_feature n
indicates a number of nucleotides that separate the surrounding
pyknons, as specified in Figure 12. misc_feature (1)..(1) n is a,
c, g, or t misc_feature (20)..(20) n is a, c, g, or t misc_feature
(37)..(37) n is a, c, g, or t misc_feature (54)..(54) n is a, c, g,
or t misc_feature (72)..(72) n is a, c, g, or t misc_feature
(89)..(89) n is a, c, g, or t misc_feature (125)..(125) n is a, c,
g, or t misc_feature (142)..(142) n is a, c, g, or t misc_feature
(159)..(159) n is a, c, g, or t misc_feature (176)..(176) n is a,
c, g, or t misc_feature (193)..(193) n is a, c, g, or t
misc_feature (210)..(210) n is a, c, g, or t misc_feature
(230)..(230) n is a, c, g, or t misc_feature (248)..(248) n is a,
c, g, or t misc_feature (281)..(281) n is a, c, g, or t
misc_feature (298)..(298) n is a, c, g, or t misc_feature
(315)..(315) n is a, c, g, or t misc_feature (332)..(332) n is a,
c, g, or t misc_feature (368)..(368) n is a, c, g, or t
misc_feature (385)..(385) n is a, c, g, or t misc_feature
(402)..(402) n is a, c, g, or t 32 ntgctgggat tacaggcgcn accacgcccg
gctaatnggg tttcactgtg ttgnaggatg 60 gtctctatct cngtgatatg
cccgcctcnt cccaaagtgc tgggattaca ggcttgagcc 120 accgnggcct
atttatttat tngagacgga gtgttgctnc ccaggctgga gtgcanggct 180
cactgcaacc tcngcctccc gggttcaagn attctcctgc ctcagcctcn agtagctggg
240 attacagnac acccggctaa ttttgtattt ttagtagaaa ngggtttctc
catgttgnct 300 ggtttcgaac tcccngtcat ctgcctgcct cntcccaaag
tgctgggatt acaggcgtga 360 gccaccgntc tgtattttta aaaantaaat
gatttgccca an 402 33 550 DNA Unknown The pattern is a 5'UTR mosaic
composed of pyknons. misc_feature n indicates a number of
nucleotides that separate the surrounding pyknons, as specified in
Figure 12. misc_feature (1)..(1) n is a, c, g, or t misc_feature
(18)..(18) n is a, c, g, or t misc_feature (35)..(35) n is a, c, g,
or t misc_feature (52)..(52) n is a, c, g, or t misc_feature
(69)..(69) n is a, c, g, or t misc_feature (86)..(86) n is a, c, g,
or t misc_feature (103)..(103) n is a, c, g, or t misc_feature
(120)..(120) n is a, c, g, or t misc_feature (140)..(140) n is a,
c, g, or t misc_feature (158)..(158) n is a, c, g, or t
misc_feature (175)..(175) n is a, c, g, or t misc_feature
(192)..(192) n is a, c, g, or t misc_feature (213)..(213) n is a,
c, g, or t misc_feature (233)..(233) n is a, c, g, or t
misc_feature (250)..(250) n is a, c, g, or t misc_feature
(267)..(267) n is a, c, g, or t misc_feature (284)..(284) n is a,
c, g, or t misc_feature (301)..(301) n is a, c, g, or t
misc_feature (318)..(318) n is a, c, g, or t misc_feature
(335)..(335) n is a, c, g, or t misc_feature (352)..(352) n is a,
c, g, or t misc_feature (370)..(370) n is a, c, g, or t
misc_feature (387)..(387) n is a, c, g, or t misc_feature
(404)..(404) n is a, c, g, or t misc_feature (421)..(421) n is a,
c, g, or t misc_feature (439)..(439) n is a, c, g, or t
misc_feature (460)..(460) n is a, c, g, or t misc_feature
(478)..(478) n is a, c, g, or t misc_feature (495)..(495) n is a,
c, g, or t misc_feature (512)..(512) n is a, c, g, or t
misc_feature (534)..(534) n is a, c, g, or t 33 nacaacaccc
aagtcctngc ctgagtgcag tggcnttctt caaagaagaa antttgagac 60
agagtctcnc ccaggctgga gtgcanggct cactacaacc tcngcctccc aggttcaaan
120 attctcctgc ctcagcctcn agtagctggg attacagnat agagatgggg
tttcnttggc 180 caggctggtc tnaactcctg acctcaggtg atntcccaaa
gtgctgggat tancaccaca 240 cccagccatn attatgtttt ctaaaancag
actctgcctc aaangagtct tgctctgttg 300 ncgctggagt gcagtggngg
ctcactgcaa cctcngctcc caggttcaag cnagtagctg 360 ggactacagn
acatccggct aattttnata gagacggggt tttnttggcc aggatggtct 420
nctcctgacc tcatgatcna gcctcccaaa gtgctggggn aggcatgagc caccacgngt
480 cttgctctgt caccntgcag ccttgaactc cncctcctgc ctcagcctcc
cganacaggt 540 atgtaccacc 550 34 475 DNA Unknown The pattern is a
5'UTR mosaic composed of pyknons. misc_feature n indicates a number
of nucleotides that separate the surrounding pyknons, as specified
in Figure 12. misc_feature (1)..(1) n is a, c, g, or t misc_feature
(35)..(35) n is a, c, g, or t misc_feature (52)..(52) n is a, c, g,
or t misc_feature (69)..(69) n is a, c, g, or t misc_feature
(89)..(89) n is a, c, g, or t misc_feature (109)..(109) n is a, c,
g, or t misc_feature (127)..(127) n is a, c, g, or t misc_feature
(144)..(144) n is a, c, g, or t misc_feature (161)..(161) n is a,
c, g, or t misc_feature (182)..(182) n is a, c, g, or t
misc_feature (218)..(218) n is a, c, g, or t misc_feature
(251)..(251) n is a, c, g, or t misc_feature (268)..(268) n is a,
c, g, or t misc_feature (286)..(286) n is a, c, g, or t
misc_feature (303)..(303) n is a, c, g, or t misc_feature
(320)..(320) n is a, c, g, or t misc_feature (353)..(353) n is a,
c, g, or t misc_feature (371)..(371) n is a, c, g, or t
misc_feature (390)..(390) n is a, c, g, or t misc_feature
(407)..(407) n is a, c, g, or t misc_feature (424)..(424) n is a,
c, g, or t misc_feature (441)..(441) n is a, c, g, or t
misc_feature (458)..(458) n is a, c, g, or t misc_feature
(475)..(475) n is a, c, g, or t 34 nagagtcttg ctctgtcgcc caggctggag
tgcanggctc actgcaacct cngcctccca 60 ggttcaagna ttctcctgcc
tcagcctcnt gtagctggga ttacaggcnc accatgcccg 120 gctaatntta
tttttagtag aganttggcc aggctggtct naactcctga cctcaggtga 180
tntcccaaag tgctgggatt acaggcgtga gccaccgntg gtcttgctgt gtcacccagg
240 ctggagtgca ntcttgaatt cctgggcnaa gcctcccaag tagctncagg
cacatactac 300 cantttttgt agagacaggn ttactatgtt gcccagactg
gtcttgaact ccnattacag 360 gtgtgagcct ngtttttttg agacagggtn
cccaggctgg agtgcanggc tcactgcagc 420 ctcncctccc aaggctcagg
natcctcttg cctcagcnac caagtagctg ggacn 475 35 199 DNA Unknown The
pattern is a 5'UTR mosaic composed of pyknons. misc_feature n
indicates a number of nucleotides that separate the surrounding
pyknons, as specified in Figure 12. misc_feature (1)..(1) n is a,
c, g, or t misc_feature (18)..(18) n is a, c, g, or t misc_feature
(35)..(35) n is a, c, g, or t misc_feature (53)..(53) n is a, c, g,
or t misc_feature (73)..(73) n is a, c, g, or t misc_feature
(91)..(91) n is a, c, g, or t misc_feature (108)..(108) n is a, c,
g, or t misc_feature (125)..(125) n is a, c, g, or t misc_feature
(146)..(146) n is a, c, g, or t misc_feature (182)..(182) n is a,
c, g, or t misc_feature (199)..(199) n is a, c, g, or t 35
nggctggagg gcagaggncc caggctggag tgcangatct cggctcactg cgnattctcc
60 tgcctcagcc tcnagtagct gggattacag nccatcacgc ccggctantt
ggtcaggcta 120 gtctnaactc ctgacctcag gtgatntccc aaagtgctgg
gattacaggc gtgagccacc 180 gngcagagac tggagggan 199 36 506 DNA
Unknown The pattern is a 5'UTR mosaic composed of pyknons.
misc_feature n indicates a number of nucleotides that separate the
surrounding pyknons, as specified in Figure 12. misc_feature
(1)..(1) n is a, c, g, or t misc_feature (53)..(53) n is a, c, g,
or t misc_feature (105)..(105) n is a, c, g, or t misc_feature
(122)..(122) n is a, c, g, or t misc_feature (143)..(143) n is a,
c, g, or t misc_feature (160)..(160) n is a, c, g, or t
misc_feature (177)..(177) n is a, c, g, or t misc_feature
(213)..(213) n is a, c, g, or t misc_feature (230)..(230) n is a,
c, g, or t misc_feature (247)..(247) n is a, c, g, or t
misc_feature
(280)..(280) n is a, c, g, or t misc_feature (297)..(297) n is a,
c, g, or t misc_feature (314)..(314) n is a, c, g, or t
misc_feature (331)..(331) n is a, c, g, or t misc_feature
(349)..(349) n is a, c, g, or t misc_feature (366)..(366) n is a,
c, g, or t misc_feature (383)..(383) n is a, c, g, or t
misc_feature (400)..(400) n is a, c, g, or t misc_feature
(419)..(419) n is a, c, g, or t misc_feature (455)..(455) n is a,
c, g, or t misc_feature (472)..(472) n is a, c, g, or t
misc_feature (489)..(489) n is a, c, g, or t misc_feature
(506)..(506) n is a, c, g, or t 36 ntctgttgcc aggctggagt gccgtggtgt
gatctcggct cactgcaacc tcnctctcag 60 gttcaagcga ttctcctgcc
tcagcctcct cagtagctgg gattnaggca tgcactacca 120 tntagagatg
gggtttcacc acnttgccca ggctggtctn aactcctgac ctcaacntcc 180
caaagtgctg ggattacagg cgtgagccac cgncccggcc tgattttttn ctggataaag
240 agaatgngag tcttgctctg ttgctcaggc tggagtgcan gatctcagct
cattgcngcc 300 ccccaggttc aagnttcctg cctcagcctc nagtagctgg
gattacagna cacctggcta 360 attttnattt tagtagagat ggnttggcca
ggatggtctn cctcaagtga tctgcctgnt 420 cccaaagtgc taggattaca
ggtgtgagcc atcantttcc ttatctgtaa angaatgact 480 ttggggtant
tctttctttt ccatgn 506 37 542 DNA Unknown The pattern is a 5'UTR
mosaic composed of pyknons. misc_feature n indicates a number of
nucleotides that separate the surrounding pyknons, as specified in
Figure 12. misc_feature (1)..(1) n is a, c, g, or t misc_feature
(18)..(18) n is a, c, g, or t misc_feature (51)..(51) n is a, c, g,
or t misc_feature (68)..(68) n is a, c, g, or t misc_feature
(85)..(85) n is a, c, g, or t misc_feature (103)..(103) n is a, c,
g, or t misc_feature (121)..(121) n is a, c, g, or t misc_feature
(138)..(138) n is a, c, g, or t misc_feature (159)..(159) n is a,
c, g, or t misc_feature (176)..(176) n is a, c, g, or t
misc_feature (194)..(194) n is a, c, g, or t misc_feature
(230)..(230) n is a, c, g, or t misc_feature (247)..(247) n is a,
c, g, or t misc_feature (264)..(264) n is a, c, g, or t
misc_feature (281)..(281) n is a, c, g, or t misc_feature
(300)..(300) n is a, c, g, or t misc_feature (317)..(317) n is a,
c, g, or t misc_feature (350)..(350) n is a, c, g, or t
misc_feature (368)..(368) n is a, c, g, or t misc_feature
(387)..(387) n is a, c, g, or t misc_feature (405)..(405) n is a,
c, g, or t misc_feature (441)..(441) n is a, c, g, or t
misc_feature (458)..(458) n is a, c, g, or t misc_feature
(491)..(491) n is a, c, g, or t misc_feature (508)..(508) n is a,
c, g, or t misc_feature (525)..(525) n is a, c, g, or t
misc_feature (542)..(542) n is a, c, g, or t 37 nacatatttt
aaattctnga gtcttgctct gttgcccagg ctggagtgca ngtctcagct 60
cactgcangc ctcttgggtt caagntccta cctcagcctc ctnagtagct gggattacag
120 ntaccaccat gtccagcntt ttactagaga tggggtttnt tggccaggct
ggtctnctcc 180 tgacctcatg atcntcccaa agtgctggga ttacaggcat
gagccaccgn gaatgggcaa 240 aagctgngtc ttgctctgtc accnaggcta
gagtgcagtg nttcagctca ctgcaacctn 300 gcctcccggg ttcaagngcc
tccacctcct gagcagctgg gattacaggn caccatgccc 360 ggctaatntg
ttagccagga ttgtctnctc ttgacctcgt gatcntccca aagtgctggg 420
attacaggcg tgagccactg ncaatgtttt acagtttnct caaggttaca cagcgagtgt
480 gtgtgagaga naacccagga ggttgaanat ctgacctttc tctcntgtgc
caggcccagg 540 gn 542 38 467 DNA Unknown The pattern is a 5'UTR
mosaic composed of pyknons. misc_feature n indicates a number of
nucleotides that separate the surrounding pyknons, as specified in
Figure 12. misc_feature (1)..(1) n is a, c, g, or t misc_feature
(18)..(18) n is a, c, g, or t misc_feature (35)..(35) n is a, c, g,
or t misc_feature (52)..(52) n is a, c, g, or t misc_feature
(69)..(69) n is a, c, g, or t misc_feature (86)..(86) n is a, c, g,
or t misc_feature (103)..(103) n is a, c, g, or t misc_feature
(120)..(120) n is a, c, g, or t misc_feature (137)..(137) n is a,
c, g, or t misc_feature (154)..(154) n is a, c, g, or t
misc_feature (171)..(171) n is a, c, g, or t misc_feature
(188)..(188) n is a, c, g, or t misc_feature (206)..(206) n is a,
c, g, or t misc_feature (223)..(223) n is a, c, g, or t
misc_feature (240)..(240) n is a, c, g, or t misc_feature
(257)..(257) n is a, c, g, or t misc_feature (274)..(274) n is a,
c, g, or t misc_feature (291)..(291) n is a, c, g, or t
misc_feature (309)..(309) n is a, c, g, or t misc_feature
(328)..(328) n is a, c, g, or t misc_feature (345)..(345) n is a,
c, g, or t misc_feature (362)..(362) n is a, c, g, or t
misc_feature (379)..(379) n is a, c, g, or t misc_feature
(399)..(399) n is a, c, g, or t misc_feature (416)..(416) n is a,
c, g, or t misc_feature (433)..(433) n is a, c, g, or t
misc_feature (450)..(450) n is a, c, g, or t misc_feature
(467)..(467) n is a, c, g, or t 38 nttgtcgccc acgctggnat ctcagttcac
tgcangcctc ccaggttcaa gntgcctcag 60 cttcccgana gctaggacta
caggtnacgc tcagctaatt ttngatgggg ttttcccatn 120 ttggccaggc
tggtctnaac tcctgacctc agancaggcg tgagccactg nacattttgg 180
tatgtttntg aacaggcaac ttacanaata atattttctt cantgctgga gtgcaatggn
240 ggctcactgc aacctcngcc tcctgggttc aggnctcctg cctcagactc
nagtagctgg 300 gattacagng cctgccacca cgcccggngt gtttcactat
gttgngggct ggtctcgaac 360 tnctgccctc aggtgatcnt cccaaagtgc
tgggattang gcgtgagcca ctgctntaga 420 tgaaacaaga ttntatgcat
gtatctttan ttttttttta aagactn 467 39 16 DNA Unknown The pattern is
an example of a pyknon. 39 aaatgtgaag aatgtg 16 40 16 DNA Unknown
The pattern is an example of a pyknon. 40 tcatactgga gagaaa 16 41
16 DNA Unknown The pattern is an example of a pyknon. 41 acaagtgtga
agaatg 16 42 16 DNA Unknown The pattern is an example of a pyknon.
42 catactgaag agaaac 16 43 290 DNA Unknown The pattern is an amino
acid coding region mosaic composed of pyknons. misc_feature n
indicates a number of nucleotides that separate the surrounding
pyknons, as specified by Figure 13. misc_feature (1)..(1) n is a,
c, g, or t misc_feature (18)..(18) n is a, c, g, or t misc_feature
(35)..(35) n is a, c, g, or t misc_feature (52)..(52) n is a, c, g,
or t misc_feature (69)..(69) n is a, c, g, or t misc_feature
(86)..(86) n is a, c, g, or t misc_feature (103)..(103) n is a, c,
g, or t misc_feature (120)..(120) n is a, c, g, or t misc_feature
(137)..(137) n is a, c, g, or t misc_feature (154)..(154) n is a,
c, g, or t misc_feature (171)..(171) n is a, c, g, or t
misc_feature (188)..(188) n is a, c, g, or t misc_feature
(205)..(205) n is a, c, g, or t misc_feature (222)..(222) n is a,
c, g, or t misc_feature (239)..(239) n is a, c, g, or t
misc_feature (256)..(256) n is a, c, g, or t misc_feature
(273)..(273) n is a, c, g, or t misc_feature (290)..(290) n is a,
c, g, or t 43 naaggaaaaa aacctttnag cagagcataa aaganattta
cagtttaaaa anaaatgtga 60 agaatgtgnt catactggag agaaanaaat
gtgaagaatg tgntcatact ggagagaaan 120 aaatgtgaag aatgtgnaaa
tgtgaagaat gtgntcatac tggagagaaa naaatgtgaa 180 gaatgtgnaa
atgtgaagaa tgtgntcata ctggagagaa anaaatgtga agaatgtgnt 240
catactggag agaaanaaat gtgaagaatg tgntcatact ggagagaaan 290 44 222
DNA Unknown The pattern is an amino acid coding region mosaic
composed of pyknons. misc_feature n indicates a number of
nucleotides that separate the surrounding pyknons, as specified by
Figure 13. misc_feature (1)..(1) n is a, c, g, or t misc_feature
(18)..(18) n is a, c, g, or t misc_feature (35)..(35) n is a, c, g,
or t misc_feature (52)..(52) n is a, c, g, or t misc_feature
(69)..(69) n is a, c, g, or t misc_feature (86)..(86) n is a, c, g,
or t misc_feature (103)..(103) n is a, c, g, or t misc_feature
(120)..(120) n is a, c, g, or t misc_feature (137)..(137) n is a,
c, g, or t misc_feature (154)..(154) n is a, c, g, or t
misc_feature (171)..(171) n is a, c, g, or t misc_feature
(188)..(188) n is a, c, g, or t misc_feature (205)..(205) n is a,
c, g, or t misc_feature (222)..(222) n is a, c, g, or t 44
naaatgtgaa gaatgtgnaa atgtgaagaa tgtgntcata ctggagagaa antcatactg
60 gagagaaant catactggag agaaanaaat gtgaagaatg tgntcatact
ggagagaaan 120 tcatactgga gagaaanaaa tgtgaagaat gtgntcatac
tggagagaaa naaatgtaaa 180 gaatgtgntc atactggaga gaaanaaatg
tgaagaatgt gn 222 45 188 DNA Unknown The pattern is an amino acid
coding region mosaic composed of pyknons. misc_feature n indicates
a number of nucleotides that separate the surrounding pyknons, as
specified by Figure 13. misc_feature (1)..(1) n is a, c, g, or t
misc_feature (18)..(18) n is a, c, g, or t misc_feature (35)..(35)
n is a, c, g, or t misc_feature (52)..(52) n is a, c, g, or t
misc_feature (69)..(69) n is a, c, g, or t misc_feature (86)..(86)
n is a, c, g, or t misc_feature (103)..(103) n is a, c, g, or t
misc_feature (120)..(120) n is a, c, g, or t misc_feature
(137)..(137) n is a, c, g, or t misc_feature (154)..(154) n is a,
c, g, or t misc_feature (171)..(171) n is a, c, g, or t
misc_feature (188)..(188) n is a, c, g, or t 45 naaatgtgaa
gaatgtgntc atactggaga gaaanaaatg tgaagaatgt gnaaatgtga 60
agaatgtgnc atactgaaga gaaacnaaat gtgaagaatg tgntgtgaaa aatgtggcan
120 aaatgtgaag aatgtgnaaa tgtgaagaat gtgnaaatgt gaagaatgtg
ntcatactgg 180 agagaaan 188 46 409 DNA Unknown The pattern is an
amino acid coding region mosaic composed of pyknons. misc_feature n
indicates a number of nucleotides that separate the surrounding
pyknons, as specified by Figure 13. misc_feature (1)..(1) n is a,
c, g, or t misc_feature (18)..(18) n is a, c, g, or t misc_feature
(35)..(35) n is a, c, g, or t misc_feature (52)..(52) n is a, c, g,
or t misc_feature (69)..(69) n is a, c, g, or t misc_feature
(86)..(86) n is a, c, g, or t misc_feature (103)..(103) n is a, c,
g, or t misc_feature (120)..(120) n is a, c, g, or t misc_feature
(137)..(137) n is a, c, g, or t misc_feature (154)..(154) n is a,
c, g, or t misc_feature (171)..(171) n is a, c, g, or t
misc_feature (188)..(188) n is a, c, g, or t misc_feature
(205)..(205) n is a, c, g, or t misc_feature (222)..(222) n is a,
c, g, or t misc_feature (239)..(239) n is a, c, g, or t
misc_feature (256)..(256) n is a, c, g, or t misc_feature
(273)..(273) n is a, c, g, or t misc_feature (290)..(290) n is a,
c, g, or t misc_feature (307)..(307) n is a, c, g, or t
misc_feature (324)..(324) n is a, c, g, or t misc_feature
(341)..(341) n is a, c, g, or t misc_feature (358)..(358) n is a,
c, g, or t misc_feature (375)..(375) n is a, c, g, or t
misc_feature (392)..(392) n is a, c, g, or t misc_feature
(409)..(409) n is a, c, g, or t 46 naatttatat agaaatgntc ttttcaaaaa
gcaantttac agttaagaaa antgataaat 60 atttgaaana aatgtaaaga
atgtgnaaat gtgaagaatg tgntcatact ggagagaaan 120 acaagtgtga
agaatgnaca agtgtgaaga atgnacactg gagagaaacc naaatgtgaa 180
gaatgtgntc atactggaga gaaantcata ctggagagaa anacaagtgt gaagaatgna
240 aatgtgaaga atgtgnacac tggagagaaa ccnaaatgtg aagaatgtgn
tcatactgga 300 gagaaanaaa tgtgaagaat gtgncatact gaagagaaac
ntgtgaaaaa tgtggcantc 360 atactggaga gaaanaaatg tgaagaatgt
gnaaatgtga agaatgtgn 409 47 239 DNA Unknown The pattern is an amino
acid coding region mosaic composed of pyknons. misc_feature n
indicates a number of nucleotides that separate the surrounding
pyknons, as specified by Figure 13. misc_feature (1)..(1) n is a,
c, g, or t misc_feature (18)..(18) n is a, c, g, or t misc_feature
(35)..(35) n is a, c, g, or t misc_feature (52)..(52) n is a, c, g,
or t misc_feature (69)..(69) n is a, c, g, or t misc_feature
(86)..(86) n is a, c, g, or t misc_feature (103)..(103) n is a, c,
g, or t misc_feature (120)..(120) n is a, c, g, or t misc_feature
(137)..(137) n is a, c, g, or t misc_feature (154)..(154) n is a,
c, g, or t misc_feature (171)..(171) n is a, c, g, or t
misc_feature (188)..(188) n is a, c, g, or t misc_feature
(205)..(205) n is a, c, g, or t misc_feature (222)..(222) n is a,
c, g, or t misc_feature (239)..(239) n is a, c, g, or t 47
naaatgtgaa gaatgtgnta aacaattctc aaaantcata ctggagagaa anaaatgtga
60 agaatgtgna aatgtgaaga atgtgncata ctgaagagaa acnaaatgtg
aagaatgtgn 120 tcatactgga gagaaantgt gaaaaatgtg gcanaaatgt
gaagaatgtg ncatactgaa 180 gagaaacnca tactgaagag aaacnaaatg
tgaagaatgt gntcatactg gagagaaan 239 48 171 DNA Unknown The pattern
is an amino acid coding region mosaic composed of pyknons.
misc_feature n indicates a number of nucleotides that separate the
surrounding pyknons, as specified by Figure 13. misc_feature
(1)..(1) n is a, c, g, or t misc_feature (18)..(18) n is a, c, g,
or t misc_feature (35)..(35) n is a, c, g, or t misc_feature
(52)..(52) n is a, c, g, or t misc_feature (69)..(69) n is a, c, g,
or t misc_feature (86)..(86) n is a, c, g, or t misc_feature
(103)..(103) n is a, c, g, or t misc_feature (120)..(120) n is a,
c, g, or t misc_feature (137)..(137) n is a, c, g, or t
misc_feature (154)..(154) n is a, c, g, or t misc_feature
(171)..(171) n is a, c, g, or t 48 ncagccagag cagggcanaa agaaacattt
caaantcata ctggagagaa antcatactg 60 gagagaaana aatgtgaaga
atgtgntcat actggagaga aantcatact ggagagaaan 120 aaatgtgaag
aatgtgntca tactggagag aaanaaatgt gaagaatgtg n 171 49 205 DNA
Unknown The pattern is an amino acid coding region mosaic composed
of pyknons. misc_feature n indicates a number of nucleotides that
separate the surrounding pyknons, as specified by Figure 13.
misc_feature (1)..(1) n is a, c, g, or t misc_feature (18)..(18) n
is a, c, g, or t misc_feature (35)..(35) n is a, c, g, or t
misc_feature (52)..(52) n is a, c, g, or t misc_feature (69)..(69)
n is a, c, g, or t misc_feature (86)..(86) n is a, c, g, or t
misc_feature (103)..(103) n is a, c, g, or t misc_feature
(120)..(120) n is a, c, g, or t misc_feature (137)..(137) n is a,
c, g, or t misc_feature (154)..(154) n is a, c, g, or t
misc_feature (171)..(171) n is a, c, g, or t misc_feature
(188)..(188) n is a, c, g, or t misc_feature (205)..(205) n is a,
c, g, or t 49 naaatgtgaa gaatgtgnaa atgtgaagaa tgtgnaaatg
tgaagaatgt gntcatactg 60 gagagaaant catactggag agaaantcat
actggagaga aanaattcat atggaattgn 120 aaatgtgaag aatgtgnact
aatcataaga gaanacactg gagagaaacc naaatgtgaa 180 gaatgtgnaa
atgtgaagaa tgtgn 205 50 256 DNA Unknown The pattern is an amino
acid coding region mosaic composed of pyknons. misc_feature n
indicates a number of nucleotides that separate the surrounding
pyknons, as specified by Figure 13. misc_feature (1)..(1) n is a,
c, g, or t misc_feature (18)..(18) n is a, c, g, or t misc_feature
(35)..(35) n is a, c, g, or t misc_feature (52)..(52) n is a, c, g,
or t misc_feature (69)..(69) n is a, c, g, or t misc_feature
(86)..(86) n is a, c, g, or t misc_feature (103)..(103) n is a, c,
g, or t misc_feature (120)..(120) n is a, c, g, or t misc_feature
(137)..(137) n is a, c, g, or t misc_feature (154)..(154) n is a,
c, g, or t misc_feature (171)..(171) n is a, c, g, or t
misc_feature (188)..(188) n is a, c, g, or t misc_feature
(205)..(205) n is a, c, g, or t misc_feature (222)..(222) n is a,
c, g, or t misc_feature (239)..(239) n is a, c, g, or t
misc_feature (256)..(256) n is a, c, g, or t 50 naaaaatgtg
gaaatgantt taataaattt tcacnaaatg taaagaatgt gnaaagaaat 60
tataccaana aatgtgaaga atgtgnaaat gtgaagaatg tgnaaatgtg aagaatgtgn
120 aaatgtaaag aatgtgnaca ctcctcagcc cttnacactg gagagaaacc
naaatgtgaa 180 gaatgtgnaa atgtgaagaa tgtgnacaag tgtgaagaat
gnaaatgtga agaatgtgnt 240 catactggag agaaan 256 51 205 DNA Unknown
The pattern is an amino acid coding region mosaic composed of
pyknons. misc_feature n indicates a number of nucleotides that
separate the surrounding pyknons, as specified by Figure 13.
misc_feature (1)..(1) n is a, c, g, or t misc_feature (18)..(18) n
is a, c, g, or t misc_feature (35)..(35) n is a, c, g, or t
misc_feature (52)..(52) n is a, c, g, or t misc_feature (69)..(69)
n is a, c, g, or t misc_feature (86)..(86) n is a, c, g, or t
misc_feature (103)..(103) n is a, c, g, or t misc_feature
(120)..(120) n is a, c, g, or t misc_feature (137)..(137) n is a,
c, g, or t misc_feature (154)..(154) n is a, c, g, or t
misc_feature (171)..(171) n is a, c, g, or t misc_feature
(188)..(188) n is a, c, g, or t misc_feature (205)..(205) n is a,
c, g, or t 51 ncagccagag cagggcanaa agaaacattt caaantcata
ctggagagaa antcatactg 60 gagagaaana aatgtgaaga atgtgntcat
actggagaga aantcatact ggagagaaan 120 aaatgtgaag aatgtgntca
tactggagag aaanaaatgt gaagaatgtg ntcatactgg 180 agagaaanaa
atgtgaagaa tgtgn 205 52 205 DNA Unknown The pattern is an amino
acid coding region mosaic composed of pyknons. misc_feature n
indicates a number of nucleotides that separate the surrounding
pyknons, as specified by Figure 13. misc_feature (1)..(1) n is a,
c, g, or t misc_feature (18)..(18) n is a, c, g, or t misc_feature
(35)..(35) n is a, c, g, or t misc_feature (52)..(52) n is a, c, g,
or t misc_feature (69)..(69) n is a, c, g, or t misc_feature
(86)..(86) n is a, c, g, or t misc_feature (103)..(103) n is a, c,
g, or t misc_feature (120)..(120) n is a, c, g, or t misc_feature
(137)..(137) n is a, c, g, or t misc_feature (154)..(154) n is a,
c, g, or t misc_feature (171)..(171) n is a, c, g, or t
misc_feature (188)..(188) n is a, c, g, or t misc_feature
(205)..(205) n is a, c, g, or t 52 ntttacaaat aagaaaanaa atgtaaagaa
tgtgncatac tgaagagaaa cnacaagtgt 60 gaagaatgna aatgtgaaga
atgtgnaaat gtgaagaatg tgnaaatgtg aagaatgtgn 120 catactgaag
agaaacnaaa tgtgaagaat gtgncatact gaagagaaac ntgaagaatg 180
tatcagantc atactggaga gaaan 205
* * * * *