U.S. patent application number 09/270391 was filed with the patent office on March 16, 1999 and published on 2001-08-02 for isolation and identification of control sequences and genes modulated by transcription factors.
Invention is credited to ADAMSON, EILEEN, DE BELLE, IAN, MERCOLA, DANIEL.
Publication Number: 20010010907
Application Number: 09/270391
Family ID: 23031159
Publication Date: 2001-08-02
United States Patent Application 20010010907
Kind Code: A1
MERCOLA, DANIEL; et al.
August 2, 2001
ISOLATION AND IDENTIFICATION OF CONTROL SEQUENCES AND GENES
MODULATED BY TRANSCRIPTION FACTORS
Abstract
One aspect of the present invention is a method for isolating at
least one nucleic acid molecule comprising at least a portion of a
gene, including: cross-linking at least one transcription factor to
a nucleic acid molecule in at least one cell or at least one
nucleus, forming at least one transcription factor/nucleic acid
molecule complex; fragmenting the nucleic acid molecule to form at
least one transcription factor/nucleic acid molecule fragment; and
isolating at least one nucleic acid molecule from said at least one
transcription factor/nucleic acid molecule fragment to form at
least one isolated nucleic acid molecule fragment; wherein said at
least one isolated nucleic acid molecule fragment comprises at
least a portion of the first exon of a gene whose expression is
modulated by said transcription factor; further wherein said at
least one isolated nucleic acid molecule fragment comprises at
least one transcription factor binding site that is in close
proximity to or operably linked to said first exon of a gene.
Another aspect of the present invention is a method for isolating
at least one nucleic acid molecule that can include at least a
portion of a gene operably linked to or in close proximity to a
nucleic acid sequence that binds with at least one transcription
factor, comprising: cross-linking at least one transcription factor
to a nucleic acid molecule in at least one cell or at least one
nucleus, forming at least one transcription factor/nucleic acid
molecule complex; fragmenting the nucleic acid molecule to form at
least one transcription factor/nucleic acid molecule fragment;
isolating at least one nucleic acid molecule fragment from said at
least one transcription factor/nucleic acid molecule fragment to
form at least one isolated nucleic acid molecule fragment;
combining the at least one isolated nucleic acid molecule fragment
with either: a cDNA library, or a cDNA derived from reverse
transcription of a population of RNA molecules, to form a mixture
comprising isolated nucleic acid molecule fragment/cDNA complexes;
and isolating the cDNA that binds with the isolated nucleic acid
molecule fragment to obtain at least one isolated cDNA
molecule.
Inventors: MERCOLA, DANIEL (RANCHO SANTA FE, CA); ADAMSON, EILEEN
(RANCHO SANTA FE, CA); DE BELLE, IAN (SAN DIEGO, CA)
Correspondence Address:
PATENT SOLUTIONS
11404 SORRENTO VALLEY ROAD
SUITE 104
SAN DIEGO, CA 92121
US
Appl. No.: 09/270391
Filed: March 16, 1999
Current U.S. Class: 435/6.11; 506/9; 536/24.1
Current CPC Class: C12N 15/1034 20130101
Class at Publication: 435/6; 536/24.1
International Class: C12Q 001/68; C07H 021/04
Government Interests
[0001] This invention was made in part with government support
awarded by the Public Health Service, National Institutes of Health,
Grant R01 CA 67888. The United States Government may have certain
rights in the invention.
Claims
We claim:
1. A method for isolating a nucleic acid molecule comprising at
least a portion of a gene, comprising: a) cross-linking at least
one transcription factor to a nucleic acid molecule in at least one
cell or at least one nucleus, forming at least one transcription
factor/nucleic acid molecule complex; b) fragmenting said nucleic
acid molecule to form at least one transcription factor/nucleic
acid molecule fragment; and c) isolating the nucleic acid molecule
fragment from said at least one transcription factor/nucleic acid
molecule fragment to form at least one isolated nucleic acid
molecule fragment; wherein said at least one isolated nucleic acid
molecule fragment comprises at least a portion of the first exon of
a gene whose expression is modulated by said transcription factor;
further wherein said at least one isolated nucleic acid molecule
fragment comprises at least one transcription factor binding site
that is operably linked or in close proximity to said first exon of
a gene.
2. The method of claim 1, wherein said nucleic acid molecule
comprises DNA.
3. The method of claim 1, wherein said transcription factor is
selected from the group consisting of leucine zipper factors,
helix-loop-helix factors, helix-loop-helix/leucine zipper factors,
NF-1 factors, RF-X factors, bHSH factors, Cys4 zinc finger of
nuclear receptor factors, diverse Cys4 zinc finger factors,
Cys2His2 zinc finger factors, Cys6 cysteine-zinc cluster factors,
Homeo domain factors, paired box factors, fork head/winged helix
factors, heat shock factors, tryptophan cluster factors, TEA
domain factors, RHR factors, p53 factors, MADS box factors,
beta-barrel alpha-helix factors, TATA-binding factors, HMG factors,
heteromeric CCAAT factors, Grainyhead factors, cold-shock domain
factors, Runt factors, copper fist factors, HMGI(Y) factors, STAT
factors and pocket domain factors.
4. The method of claim 1, wherein said transcription factor is
Egr-1.
5. The method of claim 1, wherein said at least one cell is a
living cell.
6. The method of claim 1, wherein said at least one cell is in
culture, from at least a portion of an embryo or from a tissue;
wherein said at least one cell is derived from a vertebrate, an
invertebrate, a mammal, or a human.
7. The method of claim 1, further comprising sequencing said at
least one isolated nucleic acid molecule fragment.
8. The method of claim 1, wherein said at least one isolated
nucleic acid molecule fragment is amplified.
9. The method of claim 1, wherein said at least one isolated
nucleic acid molecule fragment comprises a detectable label.
10. The method of claim 1, wherein said at least one isolated
nucleic acid molecule fragment or a nucleic acid molecule
comprising at least a portion of said at least one nucleic acid
molecule fragment is used as a probe to screen an immobilized
nucleic acid molecule array.
11. A method for isolating a nucleic acid molecule, comprising: a)
cross-linking at least one transcription factor to a nucleic acid
molecule in at least one cell or at least one nucleus, forming at
least one transcription factor/nucleic acid molecule complex; b)
fragmenting said nucleic acid molecule to form at least one
transcription factor/nucleic acid molecule fragment; c) isolating
at least one nucleic acid molecule fragment from said at least one
transcription factor/nucleic acid molecule fragment to form an
isolated nucleic acid molecule fragment; d) combining said isolated
nucleic acid molecule fragment with either: 1) a cDNA library, or
2) a cDNA derived from reverse transcription of a population of RNA
molecules, to form a mixture comprising isolated nucleic acid
molecule fragment/cDNA complexes; and e) isolating the cDNA that
binds with the isolated nucleic acid molecule fragment to obtain at
least one isolated cDNA molecule.
12. The method of claim 11, wherein said at least one isolated cDNA
molecule comprises at least a portion of a gene operably linked to
or in close proximity to a nucleic acid sequence that binds with at
least one transcription factor.
13. The method of claim 11, wherein said nucleic acid molecule
comprises DNA.
14. The method of claim 11, wherein said transcription factor is
selected from the group consisting of leucine zipper factors,
helix-loop-helix factors, helix-loop-helix/leucine zipper factors,
NF-1 factors, RF-X factors, bHSH factors, Cys4 zinc finger of
nuclear receptor factors, diverse Cys4 zinc finger factors,
Cys2His2 zinc finger factors, Cys6 cysteine-zinc cluster factors,
Homeo domain factors, paired box factors, fork head/winged helix
factors, heat shock factors, tryptophan cluster factors, TEA
domain factors, RHR factors, p53 factors, MADS box factors,
beta-barrel alpha-helix factors, TATA-binding factors, HMG factors,
heteromeric CCAAT factors, Grainyhead factors, cold-shock domain
factors, Runt factors, copper fist factors, HMGI(Y) factors, STAT
factors and pocket domain factors.
15. The method of claim 11, wherein said transcription factor is
Egr-1.
16. The method of claim 1, wherein said at least one cell comprises
cultured cells, at least a portion of an embryo or tissue.
17. The method of claim 11, wherein said at least one isolated cDNA
molecule or said at least one isolated nucleic acid molecule
fragment is amplified.
18. The method of claim 11, wherein said at least one isolated cDNA
molecule is amplified using said at least one isolated nucleic acid
molecule fragment.
19. The method of claim 11, wherein said at least one isolated cDNA
molecule or said at least one isolated nucleic acid molecule
fragment is detectably labeled to form at least one detectably
labeled cDNA or at least one detectably labeled nucleic acid
molecule fragment.
20. The method of claim 11, wherein said at least one isolated cDNA
molecule, said at least one isolated nucleic acid molecule
fragment, a nucleic acid molecule comprising at least a portion of
said at least one isolated cDNA molecule or a nucleic acid molecule
comprising at least a portion of said at least one nucleic acid
molecule fragment is used as a probe to screen an immobilized
nucleic acid molecule array.
Description
TECHNICAL FIELD
[0002] The present invention relates generally to the field of
molecular biology, in particular to transcription factors and the
identification of genes modulated by transcription factors.
BACKGROUND
[0003] Gene expression is modulated by proteins that bind to
specific sequences in the control regions of genes. Once bound,
these factors modulate transcription of the DNA into messenger RNA.
A transcription factor typically influences the expression of
several genes. By identifying these genes, the mechanisms of a
cell's response during development, under stress conditions, or
while undergoing tumorigenesis may be revealed and
investigated.
[0004] In order to elucidate these mechanisms, it is necessary to
identify the gene targets of the transcription factors that are
active in the cell. A variety of methods have been utilized but
most are indirect. For example, both subtraction cloning and
differential RNA display can be used to obtain cDNAs of genes that
are unique to a particular condition in which the transcription
factor is present. The disadvantage of these methods is that the
genes obtained may not be directly regulated by the transcription
factor of interest. The genes may be controlled by other
transcription factors that are induced under the same conditions or
that act downstream of the transcription factor of interest.
Consequently, the genes identified in these methods may not be part
of the regulatory program being investigated. Another process
screens DNA arrays to identify the genes that hybridize to RNA
prepared from cells which express a particular transcription factor
but not to RNA isolated from cells which do not express the
transcription factor. Unfortunately, this technique also may not
lead to identification of genes under the direct regulation of the
transcription factor.
[0005] To understand a modulated network, such as a signal
transduction pathway, it is important to characterize as many of
the genes that are being controlled by the transcription factor as
possible. Unfortunately, the procedure of isolating the genes from
libraries has hindered progress toward identifying a set of genes
regulated together by the transcription factor of interest.
Screening cDNA libraries by hybridization to obtain genes
corresponding to the DNA fragments obtained by a variety of methods
requires that each fragment isolated be used individually to screen
the library. This is extremely time-consuming, labor-intensive, and
costly. Consequently, there is a need in the art to increase
the efficiency of obtaining gene targets of transcription factors
of interest.
BRIEF DESCRIPTION OF THE FIGURES
[0006] FIG. 1 depicts a schematic diagram of one aspect of the
present invention.
[0007] FIG. 2 depicts a Western Blot showing the recovery of Egr-1
from cross-linked nucleic acids.
[0008] FIG. 3 depicts an ethidium bromide stained gel showing PCR
products derived from nucleic acid molecules cross-linked to
Egr-1.
[0009] FIG. 4 depicts TGF-beta1 promoter region PCR products
obtained using methods of the present invention.
[0010] FIG. 5 depicts gel shift assays using nucleic acid molecules
identified using a method of the present invention and recombinant
Egr-1 protein.
[0011] FIG. 6 depicts competitive gel shift assays using
recombinant Egr-1, Egr-binding domains and nucleic acid molecules
identified by the methods of the present invention.
[0012] FIG. 7 depicts the functionality of the nucleic acids of the
present invention as modulators of gene expression using a reporter
gene in vivo.
[0013] FIG. 8 depicts PCR amplification products from two
independent amplifications (lanes 2 and 5) of a cDNA library using a
5' PCR primer derived from DNA cross-linked to Egr-1 in a living
cell and a 3' primer derived from library vector sequences.
[0014] FIG. 9 depicts in vitro transcription and translation of a
DNA sequence obtained using the methods of the present invention,
yielding a 37 kDa protein (SEQ ID NO: 16).
[0015] FIG. 10 depicts a gel shift assay for binding of Egr-1 to
the 5' region of a nucleic acid molecule isolated using the methods
of the present invention.
[0016] FIG. 11 depicts the complete sequence of a clone identified
using the methods of the present invention (SEQ ID NO: 15) (Clone
1). Sequences used in gel-shift assays are in bold, with a
presumptive Egr-1 binding site in bold italics. An open reading
frame of 702 base pairs is underlined. A presumptive TATA box
(TTATAT) is also shown in bold.
[0017] FIG. 12 depicts an in vivo luciferase reporter assay using
the 5' non-coding region of Clone 1 placed upstream of a luciferase
reporter gene.
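The sequence features annotated in FIG. 11 (a presumptive Egr-1 binding site, a TATA box written as TTATAT, and an open reading frame) can be located computationally once a clone has been sequenced. The Python sketch below is illustrative only: the Egr-1 consensus GCG(G/T)GGGCG is an assumption taken from the general literature rather than from this application, the function names are hypothetical, motifs are found by exact pattern matching on both strands, and the ORF search covers the forward strand only, with ATG as the start codon.

```python
from __future__ import annotations

import re

# ASSUMPTION: Egr-1 consensus GCG(G/T)GGGCG; not specified in this application.
# The lookahead group lets overlapping sites be reported.
EGR1_SITE = re.compile(r"(?=(GCG[GT]GGGCG))")
# TATA box as written for FIG. 11.
TATA_BOX = re.compile(r"(?=(TTATAT))")
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(seq: str, pattern: re.Pattern) -> list[tuple[int, str]]:
    """Return (0-based forward-strand start, strand) for every hit on either strand."""
    seq = seq.upper()
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    for m in pattern.finditer(reverse_complement(seq)):
        # Map a reverse-strand hit back to forward-strand coordinates.
        site_len = len(m.group(1))
        hits.append((len(seq) - m.start() - site_len, "-"))
    return sorted(hits)


def longest_orf(seq: str) -> tuple[int, int]:
    """Longest ATG...stop ORF on the forward strand as (start, end), end exclusive.

    The stop codon is included; (0, 0) means no complete ORF was found.
    """
    seq = seq.upper()
    best = (0, 0)
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None
    return best
```

Both strands are scanned because a binding site in double-stranded DNA may be written on either strand of a reported sequence.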
SUMMARY
[0018] The present invention recognizes that nucleotide sequences
that regulate the expression of a gene can be identified by the
binding of at least one transcription factor to at least a portion
of a nucleotide sequence that regulates the expression of a gene.
The present invention also recognizes that nucleotide sequences
that encode at least a portion of a gene can be isolated, sequenced
and characterized based on the binding of at least one
transcription factor to a nucleotide sequence in close proximity to
such nucleotide sequences that encode at least a portion of a
gene.
[0019] One aspect of the present invention is a method for
isolating at least one nucleic acid molecule comprising at least a
portion of a gene, including: cross-linking at least one
transcription factor to a nucleic acid molecule in at least one
cell or at least one nucleus, forming at least one transcription
factor/nucleic acid molecule complex; fragmenting the nucleic acid
molecule to form at least one transcription factor/nucleic acid
molecule fragment; and isolating at least one nucleic acid molecule
from said at least one transcription factor/nucleic acid molecule
fragment to form at least one isolated nucleic acid molecule
fragment; wherein said at least one isolated nucleic acid molecule
fragment comprises at least a portion of the first exon of a gene
whose expression is modulated by said transcription factor; further
wherein said at least one isolated nucleic acid molecule fragment
comprises at least one transcription factor binding site that is in
close proximity to or operably linked to said first exon of a gene.
The nucleic acid molecule is preferably DNA and the transcription
factor can be any transcription factor known in the art or later
identified. The cell can be any cell, such as a living or dead
eukaryotic or prokaryotic cell. The isolated nucleic acid molecule
fragment can be amplified, cloned and sequenced using appropriate
methods. Such sequences can be compared against sequence databases,
such as those known in the art or later developed, to identify
novel genes. The at least one isolated nucleic acid molecule
fragment, or a product or portion thereof, can be linked to a
detectable label and be used as a probe to screen at least one
immobilized nucleic acid molecule, such as on a nucleic acid
molecule array.
[0020] Another aspect of the present invention is a method for
isolating at least one nucleic acid molecule that can include at
least a portion of a gene operably linked to or in close proximity
to a nucleic acid sequence that binds with at least one
transcription factor, comprising: cross-linking at least one
transcription factor to a nucleic acid molecule in at least one
cell or at least one nucleus, forming at least one transcription
factor/nucleic acid molecule complex; fragmenting the nucleic acid
molecule to form at least one transcription factor/nucleic acid
molecule fragment; isolating at least one nucleic acid molecule
fragment from said at least one transcription factor/nucleic acid
molecule fragment to form at least one isolated nucleic acid
molecule fragment; combining the at least one isolated nucleic acid
molecule fragment with either: a cDNA library, or cDNA derived from
reverse transcription of a population of RNA molecules, to form a
mixture comprising isolated nucleic acid molecule fragment/cDNA
complexes; and isolating the cDNA that binds with the isolated
nucleic acid molecule fragment to obtain at least one isolated cDNA
molecule. The nucleic acid molecule is preferably DNA and the
transcription factor can be any transcription factor known in the
art or later identified. The cell can be any cell such as a living
or dead eukaryotic or prokaryotic cell. The isolated nucleic acid
molecule fragment or isolated cDNA molecule can be sequenced and
compared against sequence databases, such as those known in the art
or later developed, to identify novel genes. The isolated
nucleic acid molecule fragment or isolated cDNA molecule can also
be amplified using appropriate methods, such as PCR, and linked to
a detectable label. Preferably, the isolated cDNA molecule is
amplified using the isolated nucleic acid molecule fragment as a
primer, such as a 3' primer or a 5' primer, more preferably as a 5'
primer. The isolated nucleic acid molecule fragment or a portion
thereof, or the isolated cDNA molecule or a portion thereof, can
also be used as a probe to screen at least one immobilized nucleic
acid molecule, such as on a nucleic acid molecule array.
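Using the isolated fragment as a 5' primer together with a 3' primer from library vector sequence (as in FIG. 8) can be modeled in silico to predict the product expected from a known template. The sketch below assumes exact primer matches on a linear template written 5' to 3'; real primers tolerate some mismatches, and the function name and example sequences are hypothetical.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def predict_amplicon(template: str, fwd_primer: str, rev_primer: str) -> Optional[str]:
    """Predict the PCR product from a linear template, or None if there is none.

    The template and both primers are written 5'->3'. The forward primer matches
    the template's top strand; the reverse primer anneals to the top strand through
    its reverse complement. Exact matches only (a simplifying assumption).
    """
    template = template.upper()
    start = template.find(fwd_primer.upper())
    if start == -1:
        return None
    rev_site = reverse_complement(rev_primer)
    # The reverse-primer site must lie downstream of the forward primer.
    end = template.find(rev_site, start + len(fwd_primer))
    if end == -1:
        return None
    return template[start:end + len(rev_site)]
```

For example, `predict_amplicon("GGGATGCCCTTTAAACCCGGGTTT", "ATGCCC", "CGGGTT")` returns `"ATGCCCTTTAAACCCG"`: the product runs from the 5' primer through the site that pairs with the 3' primer.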
DETAILED DESCRIPTION OF THE INVENTION
[0021] Definitions
[0022] Unless defined otherwise, all technical and scientific terms
used herein have the same meaning as commonly understood by one of
ordinary skill in the art to which this invention belongs.
Generally, the nomenclature used herein and the laboratory
procedures in cell culture, chemistry, microbiology, molecular
biology and cell science described below are well known and
commonly employed in the art. Conventional methods are used for
these procedures, such as those provided in the art and various
general references (Sambrook et al., Molecular Cloning: A
Laboratory Manual, 2nd edition, Cold Spring Harbor Press, Cold
Spring Harbor, N.Y. (1989)). Where a term is provided in the
singular, the inventors also contemplate the plural of that term.
As employed throughout the disclosure, the following terms, unless
otherwise indicated, shall be understood to have the following
meanings:
[0023] "Isolated polynucleotide" refers to a polynucleotide of
genomic, cDNA, or synthetic origin, or some combination thereof,
which, by virtue of its origin, (1) is not associated with the cell
in which it is found in nature, or (2) is operably linked to a
polynucleotide to which it is not linked in nature. The isolated
polynucleotide can optionally be linked to promoters, enhancers, or
other regulatory sequences using methods known in the art (Sambrook
et al., supra, 1989).
[0024] "Isolated protein" refers to a protein derived from cDNA or
recombinant RNA, of synthetic origin, or some combination thereof,
which, by virtue of its origin, (1) is not associated with proteins
normally found in nature, (2) is isolated from the cell in which it
normally occurs, (3) is isolated and substantially free of other
proteins from the same cellular source (for example, free of
cellular proteins), (4) is expressed by a cell from a different
species, or (5) does not occur in nature. Such a protein can be
obtained by isolation procedures known in the art.
[0025] "Polypeptide" is used herein as a generic term to refer to
native protein, fragments, or analogs thereof.
[0026] "Active fragment" refers to a fragment of a parent molecule,
such as an organic molecule, nucleic acid molecule, or protein or
polypeptide, or combinations thereof, that retains at least one
activity of the parent molecule.
[0027] "Naturally occurring" refers to the fact that an object can
be found in nature. For example, a polypeptide or polynucleotide
sequence that is present in an organism, including viruses, that
can be isolated from a source in nature and which has not been
intentionally modified by man in the laboratory is naturally
occurring.
[0028] "Operably linked" refers to a juxtaposition wherein the
components so described are in a relationship permitting them to
function in their intended manner. For example, a control sequence
operably linked to a coding sequence is ligated in such a way that
expression of the coding sequence is achieved under conditions
compatible with the control sequences.
[0029] "Control sequences" refer to polynucleotide sequences that
effect the expression of coding and non-coding sequences to which
they are ligated. The nature of such control sequences differs
depending upon the host organism; in prokaryotes, such control
sequences generally include promoter, ribosomal binding site, and
transcription termination sequences; in eukaryotes, generally, such
control sequences include promoters, enhancers and transcription
termination sequences. The term control sequences is intended to
include components whose presence can influence expression, and can
also include additional components whose presence is advantageous,
for example, leader sequences and fusion partner sequences.
[0030] "Polynucleotide" refers to a polymeric form of nucleotides
of at least ten bases in length, either ribonucleotides or
deoxynucleotides or a modified form of either type of nucleotide.
The term includes single and double stranded forms of DNA or RNA or
a combination of both.
[0031] "Nucleic acid molecule" refers to a polymeric form of
nucleotides of at least two bases in length, either ribonucleotides
or deoxynucleotides or a modified form of either type of
nucleotide. The term includes single and double stranded forms of
DNA or RNA or a combination of both.
[0032] "Nucleotide" refers to a single nucleotide that can
polymerize to form a polynucleotide or a nucleic acid molecule.
[0033] "Directly" in the context of a biological process or
processes, refers to direct causation of a process that does not
require intermediate steps, usually caused by one molecule
contacting or binding to another molecule (the same type or
different type of molecule). For example, molecule A contacts
molecule B, which causes molecule B to exert effect X that is part
of a biological process.
[0034] "Indirectly" in the context of a biological process or
processes, refers to indirect causation that requires intermediate
steps, usually caused by two or more direct steps. For example,
molecule A contacts molecule B to exert effect X which in turn
causes effect Y.
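The direct/indirect distinction in [0033] and [0034] is what separates a gene bound by the transcription factor of interest from one regulated downstream of it, the shortcoming of indirect screens noted in the Background. A hypothetical regulatory graph makes this concrete: each edge X -> Y is assumed to mean that X directly contacts or acts on Y, and the node names are placeholders rather than real genes.

```python
from __future__ import annotations

from collections import deque


def direct_targets(graph: dict[str, set[str]], source: str) -> set[str]:
    """Nodes acted on by `source` with no intermediate step."""
    return set(graph.get(source, set()))


def indirect_targets(graph: dict[str, set[str]], source: str) -> set[str]:
    """Nodes reachable from `source` only through one or more intermediates.

    A node reached both directly and via a longer path is counted as direct.
    """
    reachable: set[str] = set()
    queue = deque(graph.get(source, set()))
    while queue:  # breadth-first search over downstream effects
        node = queue.popleft()
        if node in reachable:
            continue
        reachable.add(node)
        queue.extend(graph.get(node, set()))
    return reachable - direct_targets(graph, source) - {source}


# Hypothetical network: TF binds A and B; A regulates C, and C regulates D.
network = {"TF": {"A", "B"}, "A": {"C"}, "C": {"D"}, "B": set()}
print(sorted(direct_targets(network, "TF")))    # ['A', 'B']
print(sorted(indirect_targets(network, "TF")))  # ['C', 'D']
```

A screen that compares cells with and without the factor would flag A, B, C and D alike; only cross-linking the factor to its binding sites singles out A and B.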
[0035] "Sequence homology" refers to the proportion of base matches
between two nucleic acid sequences or the proportion of amino acid
matches between two amino acid sequences. When sequence homology is
expressed as a percentage, for example 50%, the percentage denotes
the proportion of matched positions over the length of a desired
sequence that is compared to some other sequence. Gaps (in either
of the two sequences) are permitted to maximize matching; gap
lengths of 15 bases or less are usually used, 6 bases or less are
preferred with 2 bases or less more preferred. When using
oligonucleotides as probes or treatments, the sequence homology
between the target nucleic acid and the oligonucleotide sequence is
generally not less than 17 target base matches out of 20 possible
oligonucleotide base pair matches (85%); preferably not less than 9
matches out of 10 possible base pair matches (90%), and most
preferably not less than 19 matches out of 20 possible base pair
matches (95%).
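For an ungapped oligonucleotide probe aligned position by position with its target, the thresholds above reduce to counting matched bases. The sketch below is a minimal illustration under that equal-length, no-gap assumption; the function names and tier labels are hypothetical names for the 85%, 90% and 95% levels stated in the text.

```python
def count_matches(probe: str, target: str) -> int:
    """Number of positions at which an ungapped probe and target carry the same base."""
    if len(probe) != len(target) or not probe:
        raise ValueError("probe and target must be non-empty and of equal length")
    return sum(p == t for p, t in zip(probe.upper(), target.upper()))


def homology_tier(probe: str, target: str) -> str:
    """Place a probe/target pair against the 85%, 90% and 95% levels in [0035]."""
    n = len(probe)
    m = count_matches(probe, target)
    # Integer comparisons avoid floating-point error exactly at a threshold.
    if m * 100 >= 95 * n:
        return "most preferred"   # e.g. 19 of 20
    if m * 100 >= 90 * n:
        return "preferred"        # e.g. 9 of 10
    if m * 100 >= 85 * n:
        return "acceptable"       # e.g. 17 of 20
    return "below threshold"
```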
[0036] "Selectively hybridize" refers to at least two molecules
that can detectably and specifically bind. For example,
polynucleotides, oligonucleotides and fragments thereof can
selectively hybridize to target nucleic acid strands under
hybridization and wash conditions that minimize appreciable amounts
of detectable binding to nonspecific nucleic acids. High stringency
conditions can be used to achieve selective hybridization, as known
in the art. Generally, the nucleic acid sequence homology between
the polynucleotides, oligonucleotides, and fragments thereof and a
nucleic acid sequence of interest will be at least 30%, and more
typically and preferably at least 40%, 50%, 60%, 70%, 80% or 90%.
[0037] Hybridization and washing conditions are typically performed
at high stringency according to conventional hybridization
procedures. Typical hybridization conditions and methods for
screening plaque lifts and other purposes are known in the art
(Benton and Davis, Science 196:180 (1977); Sambrook et al., supra,
(1989)).
[0038] Two amino acid sequences are homologous if there is a
partial or complete identity between their sequences. For example,
85% homology means that 85% of the amino acids are identical when
the two sequences are aligned for maximum matching. Gaps (in either
of the two sequences being matched) are allowed in maximizing
matching; gap lengths of 5 or less are preferred with 2 or less
being more preferred. Alternatively and preferably, two protein
sequences (or polypeptide sequences derived from them of at least
30 amino acids in length) are homologous, as this term is used
herein, if they have an alignment score of at least 5 (in standard
deviation units) using the program ALIGN with the mutation data
matrix and a gap penalty of 6 or greater (Dayhoff, in Atlas of
Protein Sequence and Structure, National Biomedical Research
Foundation, volume 5, pp. 101-110 (1972) and Supplement 2, pp.
1-10). The two sequences or parts thereof are more preferably
homologous if their amino acids are greater than or equal to 30%
identical when optimally aligned using the ALIGN program.
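The ALIGN criterion above reports an alignment score in standard-deviation units, that is, how far the real score lies above scores obtained for randomized versions of the sequences. The sketch below illustrates that idea only: a simple Needleman-Wunsch global score with match, mismatch and linear gap values chosen for illustration stands in for the mutation data matrix, and the shuffle count and seed are assumptions rather than ALIGN's own settings.

```python
from __future__ import annotations

import random
import statistics


def global_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -6) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty (illustrative scores)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


def z_score(observed: float, background: list[float]) -> float:
    """Observed score in standard-deviation units above the mean background score."""
    sd = statistics.stdev(background)
    if sd == 0:
        raise ValueError("background scores have zero spread")
    return (observed - statistics.mean(background)) / sd


def align_z(a: str, b: str, shuffles: int = 100, seed: int = 0) -> float:
    """Compare the real score with scores obtained after shuffling the letters of b."""
    rng = random.Random(seed)
    observed = global_score(a, b)
    letters = list(b)
    background = []
    for _ in range(shuffles):
        rng.shuffle(letters)
        background.append(global_score(a, "".join(letters)))
    return z_score(observed, background)
```

Shuffling keeps the residue composition fixed, so a high score in these units reflects the order of residues rather than merely similar composition.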
[0039] "Corresponds to" refers to a polynucleotide sequence that is
homologous (that is, identical, not necessarily evolutionarily
related) to all or a portion of a reference polynucleotide
sequence, or that a polypeptide sequence is identical to all or a
portion of a reference polypeptide sequence. In contradistinction,
the term "complementary to" is used herein to mean that the
complementary sequence will hybridize to all or a portion of a
reference polynucleotide sequence. For illustration, the nucleotide
sequence TATAC corresponds to a reference sequence TATAC and is
complementary to a reference sequence GTATA.
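The TATAC example can be checked mechanically. This sketch assumes uppercase DNA over the alphabet ACGT with every string written 5' to 3', and it treats complementarity as an exact Watson-Crick match to all or part of the reference, which is stricter than hybridization under real conditions; the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base and reverse, giving the pairing strand read 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def corresponds_to(seq: str, reference: str) -> bool:
    """True if seq is identical to all or a portion of the reference."""
    return seq.upper() in reference.upper()


def complementary_to(seq: str, reference: str) -> bool:
    """True if seq would pair, antiparallel, with all or a portion of the reference."""
    return reverse_complement(seq) in reference.upper()
```

Here `reverse_complement("TATAC")` gives `"GTATA"`, so TATAC corresponds to TATAC and is complementary to GTATA, as stated above.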
[0040] The following terms are used to describe the sequence
relationships between two or more polynucleotides: "reference
sequence," "comparison window," "sequence identity," "percentage of
sequence identity," and "substantial identity." A reference
sequence is a defined sequence used as a basis for a sequence
comparison; a reference sequence can be a subset of a larger
sequence, for example, as a segment of a full length cDNA or gene
sequence given in a sequence listing, or may comprise a complete
cDNA or gene sequence. Generally, a reference sequence is at least
20 nucleotides in length, frequently at least 25 nucleotides in
length, and often at least 50 nucleotides in length. Since two
polynucleotides can each (1) comprise a sequence (for example a
portion of the complete polynucleotide sequence) that is similar
between the two polynucleotides, and (2) may further comprise a
sequence that is divergent between the two polynucleotides,
sequence comparisons between two (or more) polynucleotides are
typically performed by comparing sequences of the two
polynucleotides over a "comparison window" to identify and compare
local regions of sequence similarity. A comparison window, as used
herein, refers to a conceptual segment of at least 20 contiguous
nucleotide positions wherein a polynucleotide sequence may be
compared to a reference sequence of at least 20 contiguous
nucleotides and wherein the portion of the polynucleotide sequence
in the comparison window can comprise additions and deletions (for
example, gaps) of 20 percent or less as compared to the reference
sequence (which would not comprise additions or deletions) for
optimal alignment of the two sequences. Optimal alignment of
sequences for aligning a comparison window can be conducted by the
local homology algorithm (Smith and Waterman, Adv. Appl. Math.,
2:482 (1981)), by the homology alignment algorithm (Needleman and
Wunsch, J. Mol. Biol., 48:443 (1970)), by the search for similarity
method (Pearson and Lipman, Proc. Natl. Acad. Sci. U.S.A. 85:2444
(1988)), by the computerized implementations of these algorithms
such as GAP, BESTFIT, FASTA and TFASTA (Wisconsin Genetics Software
Package Release 7.0, Genetics Computer Group, Madison, Wis.), BLAST
(http://ncbi.nlm.nih.gov/BLAST (Mar. 7, 1999) and Altschul et al.,
Nucleic Acids Res. 25:3389-3402 (1997)) or by inspection.
Preferably, the best alignment (for example, the result having the
highest percentage of homology over the comparison window)
generated by the various methods is selected.
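The local homology algorithm of Smith and Waterman cited above can be sketched in a few lines. This version is a simplified illustration: it uses a linear gap penalty and flat match/mismatch scores chosen for illustration, whereas programs such as BESTFIT and BLAST use their own scoring schemes and gap costs and, in BLAST's case, heuristic search.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Return (best local score, aligned segment of a, aligned segment of b)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Scores are floored at zero so an alignment can restart anywhere.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero cell ends the local alignment.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        step = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + step:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

Because scores are floored at zero, `smith_waterman("TTACGTAA", "GGACGTCC")` reports only the shared local region, `(8, "ACGT", "ACGT")`, rather than forcing the unrelated flanks into the alignment.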
[0041] "Sequence identity" means that two polynucleotide sequences
are identical (for example, on a nucleotide-by-nucleotide basis)
over the window of comparison.
[0042] "Percentage of sequence identity" is calculated by comparing
two optimally aligned sequences over the window of comparison,
determining the number of positions at which the identical nucleic
acid base occurs in both sequences to yield the number of matched
positions, dividing the number of matched positions by the total
number of positions in the window of comparison (for example, the
window size), and multiplying the result by 100 to yield the
percentage of sequence identity.
[0043] "Substantial identity" as used herein denotes a
characteristic of a polynucleotide sequence, wherein the
polynucleotide comprises a sequence that has at least 30 percent
sequence identity, preferably at least 50 to 60 percent sequence identity,
more usually at least 60 percent sequence identity as compared to a
reference sequence over a comparison window of at least 20
nucleotide positions, frequently over a window of at least 25 to 50
nucleotides, wherein the percentage of sequence identity is
calculated by comparing the reference sequence to the
polynucleotide sequence that may include deletions or additions
which total 20 percent or less of the reference sequence over the
window of comparison. "Substantial identity" as applied to
polypeptides herein means that two peptide sequences, when
optimally aligned, such as by the programs GAP or BESTFIT using
default gap weights, share at least 30 percent sequence identity,
preferably at least 40 percent sequence identity, and more
preferably at least 50 percent sequence identity, and most
preferably at least 60 percent sequence identity. Preferably,
residue positions that are not identical differ by conservative
amino acid substitutions.
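The polynucleotide criteria of paragraph [0043] can likewise be expressed as a check on a pairwise alignment. In this hypothetical sketch, the default thresholds are the broadest values recited (30 percent identity, a window of at least 20 positions, deletions or additions totalling no more than 20 percent of the reference); counting each gapped column as one deletion or addition is an interpretive assumption, as is the helper name.

```python
def has_substantial_identity(
    ref_aln: str,
    query_aln: str,
    min_identity: float = 30.0,
    min_window: int = 20,
    max_indel_fraction: float = 0.20,
) -> bool:
    """Apply the polynucleotide 'substantial identity' criteria to an alignment.

    ref_aln and query_aln are equal-length aligned strings using '-' for gaps;
    the whole alignment is taken as the comparison window.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    ref, query = ref_aln.upper(), query_aln.upper()
    window = len(ref)
    ref_bases = sum(1 for c in ref if c != "-")  # reference length in window
    if window < min_window or ref_bases == 0:
        return False
    # Each column with a gap in either sequence counts as one deletion/addition.
    indels = sum(1 for r, q in zip(ref, query) if r == "-" or q == "-")
    matched = sum(1 for r, q in zip(ref, query) if r == q and r != "-")
    identity = 100.0 * matched / window
    return identity >= min_identity and indels <= max_indel_fraction * ref_bases


REF = "ACGTACGTACGTACGTACGT"
print(has_substantial_identity(REF, "----ACGTACGTACGTACGT"))  # True: 4 indels allowed
print(has_substantial_identity(REF, "-----CGTACGTACGTACGT"))  # False: 5 indels
```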
[0044] "Fragment" as used herein for a protein, peptide or
polypeptide is a portion of the parent molecule. Fragment as used
herein for a nucleic acid molecule is a portion of the parent
molecule.
[0045] "Active fragment" as used herein for a protein, peptide or
polypeptide is a fragment of a parent molecule that retains at
least one activity of the parent protein, peptide or polypeptide.
Active fragment as used herein for a nucleic acid molecule is a
fragment that retains at least one activity of the parent nucleic
acid molecule. An active fragment of a nucleic acid molecule also
refers to a fragment of a nucleic acid molecule that encodes a
protein, peptide or polypeptide having at least one activity of the
full-length protein.
[0046] "Conservative amino acid substitutions" refer to the
interchangeability of residues having similar side chains. For
example, a group of amino acids having aliphatic side chains is
glycine, alanine, valine, leucine, and isoleucine; a group of amino
acids having aliphatic-hydroxyl side chains is serine and
threonine; a group of amino acids having amide-containing side
chains is asparagine and glutamine; a group of amino acids having
aromatic side chains is phenylalanine, tyrosine and tryptophan; a
group of amino acids having basic side chains is lysine, arginine
and histidine; and a group of amino acids having sulfur-containing
side chains is cysteine and methionine. Preferred conservative
amino acid substitution groups are: valine-leucine-isoleucine;
phenylalanine-tyrosine; lysine-arginine; alanine-valine; glutamic
acid-aspartic acid; and asparagine-glutamine.
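A minimal sketch of how the preferred substitution groups listed above might be applied when comparing two aligned peptide sequences. The one-letter amino acid codes are standard; the helper names and the "fraction of differences that are conservative" summary are illustrative assumptions rather than part of the definition.

```python
# Preferred conservative substitution groups (one-letter amino acid codes).
PREFERRED_GROUPS = [
    {"V", "L", "I"},  # valine-leucine-isoleucine
    {"F", "Y"},       # phenylalanine-tyrosine
    {"K", "R"},       # lysine-arginine
    {"A", "V"},       # alanine-valine
    {"E", "D"},       # glutamic acid-aspartic acid
    {"N", "Q"},       # asparagine-glutamine
]


def is_conservative(a: str, b: str) -> bool:
    """True if residues a and b are identical or fall in one preferred group."""
    a, b = a.upper(), b.upper()
    return a == b or any(a in group and b in group for group in PREFERRED_GROUPS)


def conservative_fraction(seq1: str, seq2: str) -> float:
    """Fraction of non-identical aligned positions that are conservative.

    Assumes equal-length, ungapped, already-aligned peptide sequences.
    """
    diffs = [(x, y) for x, y in zip(seq1.upper(), seq2.upper()) if x != y]
    if not diffs:
        return 1.0
    return sum(is_conservative(x, y) for x, y in diffs) / len(diffs)


print(is_conservative("V", "I"), is_conservative("A", "L"))  # True False
print(conservative_fraction("MKVF", "MRVW"))  # 0.5: K->R conservative, F->W not
```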
[0047] "Modulation" or "modulated" refers to the capacity to either
enhance or interfere with a functional property of a biological
activity or process, for example, but not limited to, enzyme
activity, transcription factor activity or receptor binding. Such
enhancement or interference may be contingent on the occurrence of
a specific event, such as activation of a signal transduction
pathway and/or may be manifest only in particular cell types.
[0048] "Modulator" refers to a chemical (naturally occurring or
non-naturally occurring), such as a biological macromolecule (for
example, nucleic acid, protein, non-peptide or organic molecule) or
an extract made from biological materials, such as prokaryotes,
bacteria, eukaryotes, plants, fungi, multicellular organisms or
animals, invertebrates, vertebrates, mammals and humans, including,
where appropriate, extracts of: whole organisms or portions of
organisms, cells, organs, tissues, fluids, whole cultures or
portions of cultures, or environmental samples or portions thereof
that alters the activity of a biological process or molecule, such
as, for example, a receptor, enzyme or transcription factor.
Modulators are typically evaluated for potential activity to
enhance or interfere with (directly or indirectly) a biological
process or processes (for example, agonist, partial antagonist,
partial agonist, antagonist, antineoplastic agent, cytotoxins,
inhibitors of neoplastic transformation or cell proliferation, cell
proliferation promoting agents, antiviral agents, antimicrobial
agents, antibacterial agents, antibiotics, and the like) by
inclusion in assays described herein. The activity of a modulator
may be known, unknown or partially known.
[0049] "Label" or "labeled" refers to incorporation of a detectable
marker, for example by incorporation of a radiolabeled compound or
attachment to a polypeptide of moieties such as biotin that can be
detected by the binding of a second moiety, such as marked avidin.
Various methods of labeling polypeptides, nucleic acids,
carbohydrates, and other biological or organic molecules are known
in the art. Labels can be radioactive, fluorescent, chromogenic,
chemiluminescent, or have other readouts or properties known in the
art or later developed. Detection can be based on enzymatic
activity, such as beta-galactosidase, beta-lactamase, horseradish
peroxidase, alkaline phosphatase or luciferase; radioisotopes, such as
³H, ¹⁴C, ³⁵S, ¹²⁵I or ¹³¹I; fluorescent
proteins, such as green fluorescent proteins; or other fluorescent
labels, such as FITC, rhodamine, and lanthanides. Where
appropriate, these labels can be the product of the expression of
reporter genes, as that term is understood in the art. Examples of
reporter genes are beta-lactamase (U.S. Pat. No. 5,741,657 to Tsien
et al., issued Apr. 21, 1998) and green fluorescent protein (U.S.
Pat. No. 5,777,079 to Tsien et al., issued Jul. 7, 1998; U.S. Pat.
No. 5,804,387 to Cormack et al., issued Sep. 8, 1998).
[0050] As used herein a nucleic acid molecule "encodes" a
polypeptide if transcription of the nucleic acid molecule and
translation of the mRNA produce the polypeptide. Thus, nucleic acid
molecules of the present invention include those whose nucleotide
sequence encodes a polypeptide directly, such as cDNA, or whose
nucleotide sequence includes introns that are spliced out following
transcription into mRNA, such as genomic DNA. The term also includes
nucleic acid molecules having sequences which are degenerate
versions of any of the aforementioned nucleotide sequences.
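The degeneracy referred to above follows from the standard genetic code, in which most amino acids are specified by more than one codon. The following Python sketch assumes an intron-free coding sequence read in a single frame from its first base; it shows two different nucleotide sequences that encode the same tripeptide. The function name `translate` is illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order
# (first base varies slowest); '*' marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate an intron-free coding sequence (e.g. cDNA) up to a stop codon."""
    cds = cds.upper().replace("U", "T")  # accept mRNA-style input as well
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Two degenerate versions of one coding sequence encode the same peptide.
print(translate("ATGGCTAAATAA"), translate("ATGGCCAAGTGA"))  # MAK MAK
```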
[0051] "Transcription factor" means a molecule that can modulate
the expression or transcription of a gene or nucleic acid sequence.
Such transcription factors are known in the art, such as those
described in
http://transfac.gbf-braunschweig.de/TRANSFAC/cl/cl.html (Feb. 17,
1999). Transcription factors include, but are not limited to,
leucine zipper factors, helix-loop-helix factors,
helix-loop-helix/leucine zipper factors, NF-1 factors, RF-X
factors, bHSH factors, Cys4 zinc finger of nuclear receptor
factors, diverse Cys4 zinc finger factors, Cys2His2 zinc finger
factors, Cys6 cysteine-zinc cluster factors, homeo domain factors,
paired box factors, fork head/winged helix factors, heat shock
factors, tryptophan cluster factors, TEA domain factors, RHR
factors, p53 factors, MADS box factors, beta-barrel alpha-helix
factors, TATA-binding factors, HMG factors, heteromeric CCAAT
factors, Grainyhead factors, cold-shock domain factors, Runt
factors, copper fist factors, HMGI(Y) factors, STAT factors and
pocket domain factors.
[0052] "In close proximity" means within between about 0 and about
1,000 nucleotide bases, preferably within between about 10 and
about 750 nucleotide bases or within between about 20 and about 500
nucleotide bases, more preferably within between about 30 and about
300 nucleotide bases or within between about 40 and about 200
nucleotide bases, and most preferably within between about 50 and
about 100 nucleotide bases of a nucleotide that binds with a
transcription factor.
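One way to apply this definition computationally is to measure the number of bases separating a binding site from another feature, such as a first exon, and report the narrowest recited range that contains that distance. In this sketch, intervals are half-open (start, end) genomic coordinates and "about" is treated as an exact bound; both are simplifying assumptions, and the function names are hypothetical.

```python
# Ranges recited in the definition, broadest first (in nucleotide bases).
PROXIMITY_RANGES = [(0, 1000), (10, 750), (20, 500), (30, 300), (40, 200), (50, 100)]


def gap_between(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Bases separating two half-open intervals; 0 if they touch or overlap."""
    return max(0, max(a_start, b_start) - min(a_end, b_end))


def proximity_range(site: tuple[int, int], feature: tuple[int, int]):
    """Narrowest recited range containing the site-to-feature distance,
    or None if the feature is not in close proximity at all."""
    distance = gap_between(*site, *feature)
    for low, high in reversed(PROXIMITY_RANGES):
        if low <= distance <= high:
            return (low, high)
    return None


# Hypothetical binding site at 100-110 and a feature starting 75 bases later.
print(proximity_range((100, 110), (185, 300)))  # (50, 100)
```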
[0053] Other technical terms used herein have their ordinary
meaning in the art in which they are used, as exemplified by a variety
of technical dictionaries, such as the McGraw-Hill Dictionary of
Chemical Terms and the Stedman's Medical Dictionary.
[0054] Introduction
[0055] The present invention recognizes that nucleotide sequences
that regulate the expression of a gene can be identified by the
binding of transcription factors to such sequences. Such sequences,
and genes that are regulated by such transcription factors, can be
isolated, sequenced and characterized.
[0056] As a non-limiting introduction to the breadth of the present
invention, the present invention includes several general and
useful aspects, including:
[0057] One aspect of the present invention is a method for
isolating at least one nucleic acid molecule comprising at least a
portion of a gene, including: cross-linking at least one
transcription factor to a nucleic acid molecule in at least one
cell or at least one nucleus, forming at least one transcription
factor/nucleic acid molecule complex; fragmenting the nucleic acid
molecule to form at least one transcription factor/nucleic acid
molecule fragment; and isolating at least one nucleic acid molecule
from said at least one transcription factor/nucleic acid molecule
fragment to form at least one isolated nucleic acid molecule
fragment; wherein said at least one isolated nucleic acid molecule
fragment comprises at least a portion of the first exon of a gene
whose expression is modulated by said transcription factor; further
wherein said at least one isolated nucleic acid molecule fragment
comprises at least one transcription factor binding site that is in
close proximity to or operably linked to said first exon of a gene.
The nucleic acid molecule is preferably DNA and the transcription
factor can be any transcription factor known in the art or later
identified. The cell can be any cell, such as a living or dead
eukaryotic or prokaryotic cell. The isolated nucleic acid molecule
fragment can be amplified, cloned and sequenced using appropriate
methods. Such sequences can be compared to databases of sequences,
such as those known in the art or later developed, to identify
novel genes. The at least one isolated nucleic acid molecule
fragment, or a product or portion thereof, can be linked to a
detectable label and be used as a probe to screen at least one
immobilized nucleic acid molecule, such as on a nucleic acid
molecule array.
[0058] Another aspect of the present invention is a method for
isolating at least one nucleic acid molecule that can include at
least a portion of a gene operably linked to or in close proximity
to a nucleic acid sequence that binds with at least one
transcription factor, comprising: cross-linking at least one
transcription factor to a nucleic acid molecule in at least one
cell or at least one nucleus, forming at least one transcription
factor/nucleic acid molecule complex; fragmenting the nucleic acid
molecule to form at least one transcription factor/nucleic acid
molecule fragment; isolating at least one nucleic acid molecule
fragment from said at least one transcription factor/nucleic acid
molecule fragment to form at least one isolated nucleic acid
molecule fragment; combining the at least one isolated nucleic acid
molecule fragment with either: a cDNA library, or a cDNA derived
from reverse transcription of a population of RNA molecules, to
form a mixture comprising isolated nucleic acid molecule
fragment/cDNA complexes; and isolating the cDNA that binds with the
isolated nucleic acid molecule fragment to obtain at least one
isolated cDNA molecule. The nucleic acid molecule is preferably DNA
and the transcription factor can be any transcription factor known
in the art or later identified. The cell can be any cell, such as a
living or dead eukaryotic or prokaryotic cell. The isolated nucleic
acid molecule fragment or isolated cDNA molecule can be sequenced
and compared to databases of sequences, such as those known in
the art or later developed, to identify novel genes. The isolated
nucleic acid molecule fragment or isolated cDNA molecule can also
be amplified using appropriate methods, such as PCR, and linked to
a detectable label. Preferably, the isolated cDNA molecule is
amplified using the isolated nucleic acid molecule fragment as a
primer, such as a 3' primer or a 5' primer, more preferably as a 5'
primer. The isolated nucleic acid molecule fragment or a portion
thereof, or the isolated cDNA molecule or a portion thereof, or an
amplified product or portion thereof can also be used as a probe to
screen at least one immobilized nucleic acid molecule, such as on a
nucleic acid molecule array.
[0059] These aspects of the present invention, as well as others
described herein, can be achieved by using the methods, articles of
manufacture and compositions of matter described herein and as they
are known in the art. To gain a full appreciation of the scope of
the present invention, it will be further recognized that various
aspects of the present invention can be combined to make desirable
embodiments of the invention.
[0060] A method for identifying a nucleic acid molecule comprising
at least a portion of a gene.
[0061] One aspect of the present invention is a method for
isolating at least one nucleic acid molecule comprising at least a
portion of a gene, including: cross-linking at least one
transcription factor to a nucleic acid molecule in at least one
cell or at least one nucleus, forming at least one transcription
factor/nucleic acid molecule complex; fragmenting the nucleic acid
molecule to form at least one transcription factor/nucleic acid
molecule fragment; and isolating at least one nucleic acid molecule
from said at least one transcription factor/nucleic acid molecule
fragment to form at least one isolated nucleic acid molecule
fragment; wherein said at least one isolated nucleic acid molecule
fragment comprises at least a portion of the first exon of a gene
whose expression is modulated by said transcription factor; further
wherein said at least one isolated nucleic acid molecule fragment
comprises at least one transcription factor binding site that is in
close proximity to or operably linked to said first exon of a gene.
The nucleic acid molecule is preferably DNA and the transcription
factor can be any transcription factor known in the art or later
identified. The cell can be any cell, such as a living or dead
eukaryotic or prokaryotic cell. The isolated nucleic acid molecule
fragment can be amplified, cloned and sequenced using appropriate
methods. Such sequences can be compared to databases of sequences,
such as those known in the art or later developed, to identify
novel genes. The at least one isolated nucleic acid molecule
fragment, or a product or portion thereof, can be linked to a
detectable label and be used as a probe to screen at least one
immobilized nucleic acid molecule, such as on a nucleic acid
molecule array.
[0062] In practice, the present invention provides at least one
transcription factor and at least one nucleic acid molecule. The
transcription factor and the nucleic acid molecule are preferably
in at least one cell or nucleus. The nucleic acid molecule can be
any nucleic acid molecule, but is preferably genomic DNA.
[0063] The transcription factor can be any transcription factor as
that term is known in the art. The transcription factor can be a
known transcription factor, a presumptive transcription factor, or
an unknown transcription factor. One preferred transcription factor
of the present invention is Egr-1 and transcription factors that
are similar to Egr-1 in sequence, function and binding affinities
to target nucleic acid sequences.
[0064] A cell, such as a prokaryotic or eukaryotic cell, can be
living or dead and be provided in a primary cell line, a continuous
cell line, a clonal population of cells, or a biological sample,
such as a tissue, organ, embryo, fluid or extract thereof.
Eukaryotic organisms are preferred, such as yeast or multicellular
organisms, such as invertebrates or vertebrates, such as, but not
limited to, test animals such as mice, rats, rabbits or monkeys, or
human subjects. Nuclei isolated from prokaryotic or eukaryotic
cells can also be used in the present invention. Such nuclei can be
prepared using methods known in the art (see, Sambrook et al.,
supra, 1989).
[0065] The transcription factor binds to the nucleic acid molecule
in order to modulate the expression of genes encoded by the nucleic
acid molecule. Such binding may be, and usually is, reversible. The
transcription factor-nucleic acid molecule complex is made
substantially irreversible, preferably by cross-linking the
transcription factor to the nucleic acid molecule. Such
cross-linking can be accomplished by a variety of methods, such as
by contacting or exposing the cell or nuclei to cross-linking
agents, such as ultraviolet irradiation (Graba et al., The EMBO
Journal, 11:3375-3384 (1992)), or to chemical fixatives, such as paraformaldehyde or
formaldehyde (Deveaux et al., The EMBO Journal, 16:5654-5661 (1997)
and Cohen-Kaminsky et al., The EMBO Journal 17:5151-5160 (1998) and
Botquin et al., Genes and Development 12:2073-2090 (1998)). The
duration and amount of radiation or chemical used to cross-link the
transcription factor to the nucleic acid molecule can be readily
determined by one skilled in the art using the methods of the
present invention to confirm that such cross-linking has occurred.
However, such cross-linking is not critical to the present
invention (see, for example, Gould and White, Development,
116:1163-1174 (1992), Gould et al., Nature, 348:308-312 (1990),
Bigler and Eisenman, Mol. Cell. Biol., 14:7621-7632 (1994),
Grandori et al., The EMBO Journal, 15:4344-4357 (1996), and Bigler
et al., The EMBO J. 14:5710-5723 (1995)).
[0066] The cell or nuclei are then lysed using methods known in the
art to free the transcription factor-nucleic acid molecule complex
from the cell or nucleus (Bigler et al., Mol. Cell. Biol.
14:7621-7632 (1994); Gould et al., Nature, 348:308-312 (1990);
Grandori et al., EMBO J. 15:4344-4357 (1996) and Graba et al., EMBO
J. 11:3375-3384 (1992)). For example, cells or nuclei can be lysed
using a variety of methods, such as detergent solutions, such as
SDS, or by mechanical means, such as passage through a nozzle such
as a needle, or by sonication. The transcription factor-nucleic
acid molecule complexes in the sample can be isolated using a
variety of methods known in the art, such as centrifugation through
a gradient, such as urea or cesium chloride. The regions of the
gradient containing the transcription factor-nucleic acid molecule
complex are collected, and the compound or composition in the
sample used to make the gradient, such as urea or cesium chloride,
is preferably substantially removed by methods known in the art,
such as dialysis, to prevent the compound or composition used in
the gradient from substantially interfering with later reactions or
steps.
[0067] The transcription factor-nucleic acid molecule complexes are
preferably contacted with nucleases, such as endonucleases and/or
exonucleases, in order to divide the nucleic acid molecule into
fragments. In the alternative, such fragments can be obtained using
chemicals that cleave nucleic acid molecules, such as a strong base
or a strong acid, or by mechanical methods, such as passing the
nucleic acid molecule through a nozzle such as a needle, or by
sonication to shear a nucleic acid molecule. This procedure
provides transcription factor-nucleic acid molecule fragment
complexes.
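The effect of random fragmentation can be modelled with a short simulation. This toy model is an assumption, not a description of any particular nuclease or sonicator: breakpoints are drawn uniformly and without replacement at roughly one per `mean_fragment` bases, and the fragments are returned as half-open intervals that tile the original molecule. The function name `shear` is hypothetical.

```python
import random


def shear(length: int, mean_fragment: int, seed: int = 0) -> list[tuple[int, int]]:
    """Randomly break a molecule of `length` bases into contiguous fragments.

    Toy model: roughly one breakpoint per `mean_fragment` bases, positions
    drawn uniformly without replacement. Returns half-open (start, end)
    intervals that tile the molecule from 0 to `length`.
    """
    if length <= 0 or mean_fragment <= 0:
        raise ValueError("length and mean_fragment must be positive")
    rng = random.Random(seed)  # seeded for reproducibility
    n_breaks = min(max(0, length // mean_fragment - 1), length - 1)
    breaks = sorted(rng.sample(range(1, length), n_breaks))
    edges = [0] + breaks + [length]
    return list(zip(edges, edges[1:]))


frags = shear(10_000, 500, seed=1)
print(len(frags), sum(end - start for start, end in frags) / len(frags))  # 20 500.0
```

Real fragment-size distributions depend on enzyme concentration, digestion time or sonication energy and would normally be measured, for example by gel electrophoresis, rather than assumed.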
[0068] The transcription factor-nucleic acid molecule fragment
complexes are optionally isolated using methods known in the art,
such as molecular sieve chromatography, density gradient
centrifugation, affinity chromatography, affinity adsorption (such
as onto a solid phase, such as a plate or bead) or
immunoprecipitation or a specific-binding reaction. Methods that
utilize specific binding reactions can use receptor preparations,
such as antibodies or active fragments thereof, such as the Fv
region of an antibody, that specifically bind with a transcription
factor. Methods that utilize specific binding reactions are
preferred because they result in a product that is of substantial
purity. The receptor preparations can bind to a known transcription
factor, a presumptive transcription factor, or be directed to a
variety of cellular components. Preferably, the receptor is an
antibody, which can be a monoclonal antibody.
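The specific-binding step can be sketched as a filter over fragment intervals: only fragments overlapping a site occupied by the cross-linked transcription factor are recovered. This hypothetical model, including the function name and coordinates, ignores antibody efficiency and nonspecific background, which in practice also contribute fragments that carry no binding site.

```python
def immunoprecipitate(fragments, bound_sites):
    """Keep fragments that overlap at least one cross-linked binding site.

    Fragments and sites are half-open (start, end) intervals on one molecule.
    """
    def overlaps(a, b):
        return a[0] < b[1] and b[0] < a[1]

    return [f for f in fragments if any(overlaps(f, s) for s in bound_sites)]


frags = [(0, 400), (400, 900), (900, 1500)]
sites = [(450, 460), (1480, 1490)]  # hypothetical occupied binding sites
print(immunoprecipitate(frags, sites))  # [(400, 900), (900, 1500)]
```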
[0069] The nucleic acid molecule fragment in the transcription
factor-nucleic acid molecule fragment complex is isolated.
Preferably, the transcription factor is removed using proteolytic
digestion or treatment with protein denaturing agents, such as
phenol, optionally with heat (de Belle et al., J. Cell Biol.
141:335-348 (1998)). The resulting nucleic acid molecule fragments
are optionally separated from other components of the mixture using
methods known in the art, such as dialysis, ethanol precipitation,
electrophoresis or molecular sieve chromatography.
[0070] Preferably, the nucleic acid molecule fragments are attached
to linker nucleic acid molecules using methods known in the art,
such as blunt-end ligation or cohesive-end ligation of linkers. The
choice of linkers and method of ligation depends on whether the
nucleic acid molecule has blunt ends or cohesive ends as a result
of the digestion of the nucleic acids in the transcription
factor/nucleic acid molecule complex. Preferably, the linker
nucleic acid molecules are chosen so that they can serve as 5' or
3' primers for nucleic acid molecule amplification procedures, such
as polymerase chain reaction (PCR), and/or for cloning into vectors
using methods known in the art (Sambrook et al., supra, 1989). If
cloned into vectors, the nucleic acid molecule fragments can be
amplified in a host cell appropriate for the vector. Such methods
can result in a library of clones that comprise nucleic acid
molecules that bind with a transcription factor and preferably at
least a portion of at least one gene, such as a control sequence,
3' untranslated region, intron or exon. The vectors can also be
used to amplify the nucleic acid molecule fragment using nucleic
acid amplification procedures, such as PCR, using appropriate
primers that correspond to the linkers. PCR generally employs a pair
of primers, one in the sense orientation and one in the antisense
orientation, used under optimized conditions (see, Innis, PCR
Strategies, Academic Press, San Diego, 1995). Alternatively, mRNA
transcribed from the vector can be reverse transcribed and amplified
using appropriate primers. The nucleic
acid molecule fragments of the present invention can be isolated
and/or amplified using a variety of methods, such as those
described below.
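Why linker sequences can serve as primers can be shown with a small sketch. It assumes a blunt-ended fragment with the same double-stranded linker ligated to both ends; the linker sequence and helper names are hypothetical. Each strand of the ligation product then ends in the reverse complement of the linker, so a single linker-derived primer can anneal to both strands.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def ligate_linkers(fragment: str, linker: str) -> str:
    """Top strand after ligating one double-stranded linker to both ends of a
    blunt fragment. The linker reads 5'->3' into the fragment at the left end,
    so the right end carries the linker's reverse complement."""
    return linker + fragment + revcomp(linker)


def primer_sites(template: str, primer: str) -> list[int]:
    """Start positions on `template` where `primer` can anneal, i.e. where the
    primer's reverse complement occurs exactly."""
    site = revcomp(primer)
    return [
        i
        for i in range(len(template) - len(site) + 1)
        if template[i:i + len(site)] == site
    ]


LINKER = "GACTCGAGTCGACATCG"  # hypothetical linker top strand
top = ligate_linkers("ATGCCGTTAGC", LINKER)
bottom = revcomp(top)
# The same linker-derived primer finds one site at the 3' end of each strand.
print(primer_sites(top, LINKER), primer_sites(bottom, LINKER))  # [28] [28]
```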
[0071] In one aspect of the invention, linker nucleic acid
molecules are ligated to at least one end of the nucleic acid
molecule fragment of the present invention. This nucleic acid
molecule fragment is amplified using appropriate nucleic acid
amplification procedures, such as PCR, using appropriate primers,
such as those derived from the sequences of the linker nucleic acid
molecules or the nucleic acid molecule fragment. The amplified
nucleic acid molecule fragments optionally, but preferably, are
cloned into a vector, such as a plasmid, to create a library of
nucleic acid molecule fragments of the present invention. The
vectors can be optionally digested using, for example, at least one
restriction enzyme, to remove the amplified nucleic acid molecule
fragment from the vector. The vector or the excised amplified
fragment is isolated by, for example, gel electrophoresis, to
obtain nucleic acid molecules including the nucleic acid molecule
fragments of the present invention.
[0072] In another aspect of the present invention, linker nucleic
acid molecules are ligated to at least one end of the nucleic acid
molecule fragment of the present invention. This ligation product is
amplified using appropriate nucleic acid molecule amplification
procedures, such as PCR, using appropriate primers, such as those
derived from the linker nucleic acid molecule's sequence. The
linker nucleic acid molecules are removed from the amplification
product using, for example, at least one appropriate restriction
enzyme. The products of this reaction are separated using
appropriate methods, such as gel electrophoresis, to obtain
isolated nucleic acid molecule fragments of the present
invention.
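Removal of the linkers with a restriction enzyme can be sketched as cutting the amplified product at each recognition site. The defaults model EcoRI, which recognizes GAATTC and cuts between the G and the first A; the linker sequences and the function name are hypothetical, only the top strand is tracked, and the approach assumes the insert itself contains no internal site for the chosen enzyme. The released middle piece carries the enzyme's cut ends flanking the insert.

```python
def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list[str]:
    """Cut the top strand `cut_offset` bases into every recognition site.

    Defaults model EcoRI (G^AATTC). Sticky-end overhangs on the bottom
    strand are not represented.
    """
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    edges = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(edges, edges[1:])]


# Hypothetical linkers carrying EcoRI sites flank an 8-base insert.
product = "TTGAATTC" + "ACGTTGCA" + "GAATTCAA"
pieces = digest(product)
print(pieces)  # ['TTG', 'AATTCACGTTGCAG', 'AATTCAA']
```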
[0073] In a further aspect of the present invention, the nucleic
acid molecule fragments of the present invention are cloned into a
vector, such as a plasmid, using appropriate methods. The vector is
digested using, for example, at least one appropriate restriction
enzyme. The products of this reaction are separated using, for
example, gel electrophoresis, and the nucleic acid molecule
fragments of the present invention isolated.
[0074] The nucleic acid molecule fragments of the present
invention, their PCR products or their cloned counterparts
optionally digested from a vector can be used in PCR reactions as
described below. The nucleic acid molecule fragments of the present
invention can also be used in hybridization reactions, such as
screening nucleic acid molecule arrays, or be part of a nucleic
acid molecule array. In this instance, the nucleic acid molecule
fragments of the present invention are preferably linked to a
detectable label.
[0075] The nucleotide sequence of the nucleic acid molecule
fragments of the present invention can be determined using methods
known in the art (Sambrook et al., supra, 1989). In addition to
sequences that bind with a transcription factor, the nucleic acid
molecule fragments of the present invention can be linked to at
least a portion of an open reading frame of a gene. When the
nucleotide sequences of the nucleic acid molecule fragments of the
present invention are compared with databases of known nucleic acid
sequences, such genes can be identified. If the sequences of the
present invention do not match any known sequence, then the present invention has
identified at least a portion of a novel gene that is presumptively
regulated by a transcription factor. If the transcription
factor/nucleic acid molecule fragment was isolated using specific
binding reactions, such as anti-transcription factor antibodies,
then the identified gene is presumptively regulated by
transcription factors that bind with such anti-transcription factor
antibodies.
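Database comparison is normally performed with alignment tools such as BLAST or FASTA. The following toy sketch substitutes a simple shared k-mer count to illustrate the decision described above: a fragment that shares most of its k-mers with a database entry presumptively derives from that known gene, while a fragment with no adequate hit is presumptively novel. The values of `k` and `min_shared`, the function names and the example sequences are arbitrary assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set[str]:
    """All length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_database_hit(fragment: str, database: dict[str, str], k: int = 8,
                      min_shared: float = 0.5):
    """Return (name, shared_fraction) for the best entry, or None if no entry
    shares at least `min_shared` of the fragment's k-mers (presumptively novel)."""
    frag = kmers(fragment.upper(), k)
    if not frag:
        return None
    best = None
    for name, seq in database.items():
        seq = seq.upper()
        # Index both strands, since a fragment may derive from either one.
        entry = kmers(seq, k) | kmers(revcomp(seq), k)
        shared = len(frag & entry) / len(frag)
        if best is None or shared > best[1]:
            best = (name, shared)
    if best is not None and best[1] >= min_shared:
        return best
    return None


DATABASE = {  # hypothetical entries
    "geneA": "ATGGCTAAACCCGGGTTTAAACCCGGGATATAT",
    "geneB": "GGGGCCCCAAAATTTTGGGGCCCCAAAATTTT",
}
print(best_database_hit("GCTAAACCCGGGTTTAAA", DATABASE))  # ('geneA', 1.0)
print(best_database_hit("ACGTACGTACGTACGTAC", DATABASE))  # None
```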
[0076] The nucleic acid molecule fragments isolated by the present
invention can include sequences that bind with a transcription
factor as well as regions that are in close proximity to regions or
sequences that bind with a transcription factor. Not wishing to be
limited to any mechanism, the inventors contemplate that the
methods of the present invention result in nucleic acid molecule
fragments that include regions cross-linked to transcription
factors and regions that are not cross-linked to transcription
factors. The regions that are not cross-linked to transcription
factors are in close proximity to the regions that are cross-linked
to transcription factors. Regions that are in close proximity to
regions that are cross-linked to transcription factors can be
upstream or downstream from the regions that bind with a
transcription factor and can include introns or exons. Thus, the
methods of the present invention can isolate nucleic acids
including introns or exons of a gene.
[0077] Preferably, an isolated nucleic acid molecule fragment of
the present invention includes at least a portion of the first exon
of a gene that is regulated by at least one transcription factor.
More preferably, the isolated nucleic acid molecule fragment of the
present invention includes at least a portion of the control
sequence or control sequences that bind with a transcription factor
that modulates the transcription of the gene, which need not be
operably linked to or in close proximity with the first exon of a
gene that is regulated by at least one transcription factor.
Preferably, the nucleic acid molecule fragment of the present
invention includes at least a portion of the control sequence and
at least a portion of the first exon of a gene that are operably
linked or in close proximity to each other. Thus, the isolated
nucleic acid molecule of the present invention comprises control
sequences together with at least a portion of the gene they modulate,
such as at least a portion of an open reading frame, preferably the
first exon of the open reading frame. Accordingly, the present
invention identifies the appropriate gene whose transcription is
modulated by a transcription factor.
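The preferred outcome described in paragraph [0077] can be checked by intersecting a recovered fragment with an annotation of the gene. In this sketch the coordinates, feature names and function names are hypothetical; intervals are half-open, and a fragment qualifies when it overlaps both a transcription factor binding site and at least part of the first exon.

```python
def overlapping_features(fragment, features):
    """Names of annotated features that a fragment at least partly covers."""
    f_start, f_end = fragment
    return [name for name, (s, e) in features.items() if s < f_end and f_start < e]


def meets_preferred_criterion(fragment, features, site_prefix="TF site", exon="exon 1"):
    """True if the fragment spans a binding site and part of the first exon."""
    hits = overlapping_features(fragment, features)
    return exon in hits and any(h.startswith(site_prefix) for h in hits)


# Hypothetical annotation of a gene (coordinates are illustrative only).
features = {
    "TF site 1": (940, 950),
    "exon 1": (1000, 1180),
    "intron 1": (1180, 2400),
    "exon 2": (2400, 2550),
}
print(overlapping_features((900, 1300), features))       # ['TF site 1', 'exon 1', 'intron 1']
print(meets_preferred_criterion((900, 1300), features))  # True
```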
[0078] For example, a region of a nucleic acid molecule that binds
with a transcription factor can be within a gene, upstream of a
gene or downstream of a gene. The isolation of a region that binds
with a transcription factor can result in the isolation of a
portion of a gene that is upstream or downstream from the region
that binds with a transcription factor. The nucleic acid molecule
fragments of the present invention can be optionally cloned or
amplified using appropriate procedures, and the sequence of the
nucleic acid molecule fragments obtained using established methods.
These sequences can be compared to databases of known sequences. If
the present invention isolates at least a portion of a gene having
a known sequence, then that gene is presumptively modulated by the
transcription factor. If the present invention isolates a nucleic
acid molecule having a novel sequence, then the present invention
has isolated at least a portion of a nucleic acid molecule that
comprises a novel control sequence or a novel gene. Novel nucleic
acid sequences identified by the present invention can be used as
primers to isolate the novel gene.
[0079] Alternatively, the nucleic acid molecule fragments of the
present invention, or their amplification products, can be
optionally linked to a detectable label and used to screen arrays
of nucleic acids, such as those including cDNA libraries. The
binding of a nucleic acid molecule fragment of the present
invention to a member of such a nucleic acid molecule array
identifies the cDNA that the nucleic acid molecule fragment of the
present invention corresponds to.
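Array screening with a labeled fragment can be approximated in silico by asking which spotted sequences share a sufficiently long exact stretch with either strand of the probe. This is an idealization: actual hybridization depends on stringency, mismatches, GC content and probe length, and the 15-base threshold, the spot identifiers, the sequences and the function names are assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (converted to uppercase)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def screen_array(probe: str, spots: dict[str, str], min_len: int = 15) -> list[str]:
    """IDs of spots sharing an exact stretch of at least `min_len` bases with
    either strand of the probe (an idealized hybridization criterion)."""
    strands = [probe.upper(), revcomp(probe)]
    windows = {
        s[i:i + min_len] for s in strands for i in range(len(s) - min_len + 1)
    }
    hits = []
    for spot_id, spot_seq in spots.items():
        spot = spot_seq.upper()
        if any(spot[i:i + min_len] in windows for i in range(len(spot) - min_len + 1)):
            hits.append(spot_id)
    return hits


PROBE = "GCTAAACCCGGGTTTAAA"  # hypothetical labeled fragment
SPOTS = {
    "A1": "ATGGCTAAACCCGGGTTTAAACCC",         # contains the probe strand
    "A2": revcomp("TTGCTAAACCCGGGTTTAAAGG"),  # contains the opposite strand
    "A3": "GGGGCCCCAAAATTTTGGGGCCCC",         # unrelated sequence
}
print(screen_array(PROBE, SPOTS))  # ['A1', 'A2']
```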
[0080] A method for isolating a nucleic acid molecule that includes
at least a portion of a gene using a cDNA molecule.
[0081] Another aspect of the present invention is a method for
isolating at least one nucleic acid molecule that can include at
least a portion of a gene operably linked to or in close proximity
to a nucleic acid sequence that binds with at least one
transcription factor, comprising: cross-linking at least one
transcription factor to a nucleic acid molecule in at least one
cell or at least one nucleus, forming at least one transcription
factor/nucleic acid molecule complex; fragmenting the nucleic acid
molecule to form at least one transcription factor/nucleic acid
molecule fragment; isolating at least one nucleic acid molecule
fragment from said at least one transcription factor/nucleic acid
molecule fragment to form at least one isolated nucleic acid
molecule fragment; combining the at least one isolated nucleic acid
molecule fragment with either: a cDNA library, or a cDNA or cDNA
population derived from reverse transcription of a population of
RNA molecules, to form a mixture comprising isolated nucleic acid
molecule fragment/cDNA complexes; and isolating the cDNA that binds
with the isolated nucleic acid molecule fragment to obtain at least
one isolated cDNA molecule. The nucleic acid molecule is preferably
DNA and the transcription factor can be any transcription factor
known in the art or later identified. The cell can be any cell, such
as a living or dead eukaryotic or prokaryotic cell. The isolated
nucleic acid molecule fragment or isolated cDNA molecule can be
sequenced and compared to databases of sequences, such as those
known in the art or later developed, to identify novel genes. The
isolated nucleic acid molecule fragment or isolated cDNA molecule
can also be amplified using appropriate methods, such as PCR, and
linked to a detectable label. Preferably, the isolated cDNA
molecule is amplified using the isolated nucleic acid molecule
fragment as a primer, such as a 3' primer or a 5' primer, more
preferably as a 5' primer. The isolated nucleic acid molecule
fragment or a portion thereof, or the isolated cDNA molecule or a
portion thereof, can also be used as a probe to screen at least one
immobilized nucleic acid molecule, such as on a nucleic acid
molecule array.
[0082] Nucleic acid molecules comprising the nucleic acid molecule
fragments of the present invention can also be contacted with a
cDNA library derived from a cell of choice. Preferably, the cell is
the same cell used to make the nucleic acid molecule fragments of
the present invention. More preferably, the cell used to make the
cDNA library was subjected to the same conditions as the cell used
to make the nucleic acid molecule fragments of the present
invention so that the same nucleic acid molecules were transcribed.
Such cDNA libraries can be made using methods known in the art, or
purchased. Alternatively, a different cell, or a cell subjected to
different conditions than the cell used to make the nucleic acid
molecule fragments and cDNA libraries of the present invention, can
be used to identify genes that are expressed under
different conditions. Such methods are known as array screening
methods (see, for example, Iyer et al., Science 283:83-87
(1999)).
[0083] The binding of the nucleic acid molecule fragment of the
present invention to a cDNA molecule can be used as the basis of a
PCR reaction to amplify cDNA molecules that bind with a nucleic
acid molecule fragment of the present invention. A cDNA library
cloned into a known site of a vector, preferably derived from the
cell that was used to produce the nucleic acid molecule of the
present invention, is preferably used as the template. The nucleic acid
molecule of the present invention is used as a 5' PCR primer, and
an appropriate 3' PCR primer is derived from vector sequences.
Preferably, the 3' PCR primer is derived from vector sequences that
are adjacent to the site at which the cDNA is ligated into the
vector.
[0084] Alternatively, the nucleic acid molecule of the present
invention is used as a 3' PCR primer, and an appropriate 5' PCR
primer is derived from vector sequences. Preferably, the 5' PCR
primer is derived from vector sequences that are adjacent to the
site at which the cDNA is ligated into the vector. PCR reactions
are performed such that cDNA molecules that hybridize with the
primer derived from the nucleic acid molecule of the present
invention are amplified. In some cases, the transcription factor
binding site may be within an intron or in the 3' untranslated
region of a gene. This invention contemplates that PCR can also be
used in this instance, using the nucleic acid molecule fragments of
the present invention as 3' primers and sequences derived from the
cDNA vector as 5' primers in the amplification reaction.
[0085] The PCR products are isolated and cloned into an appropriate
vector. These PCR amplified sequences can be compared to databases
containing known nucleotide sequences in order to identify the gene
that gave rise to the cDNA molecule. The identified gene is
presumptively regulated by a transcription factor. If the
transcription factor/nucleic acid molecule fragment was isolated
using a specific binding reaction, such as binding with an
anti-transcription factor antibody, then the identified gene is
presumptively regulated by a transcription factor that binds with
that antibody.
[0086] The present invention includes a nucleic acid molecule
comprising a nucleic acid molecule identified by the method of the
present invention, such as SEQ ID NO: 15, exons thereof, protein
coding regions thereof, control regions thereof, genes thereof,
transcription factor binding regions thereof; sequences having
substantial identity thereto; sequences having substantial homology
thereto, such as between about 60% and about 99%,
preferably between about 70% and about 95%, and most preferably
between about 80% and about 90% homology to a nucleic acid molecule
identified by the method of the present invention; and fragments or
active fragments of any of the foregoing. The nucleic acid molecule
of the present invention can be cloned into an appropriate vector,
and the vector can be transfected or transformed into an
appropriate host cell using methods established in the art to make
transfected or transformed host cells (see, Sambrook et al., supra,
1989). The transfected or transformed host cells can be used to
make a protein of the present invention.
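The homology bands above are stated as percentages. As a hedged illustration only, the sketch below computes percent identity over a gap-free, equal-length alignment and checks the result against the broadest band (about 60% to about 99%). Real sequence comparisons normally rely on gapped alignment tools such as BLAST; the function names and the two short sequences are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences (no gaps)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and equal in length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def in_band(pid: float, low: float = 60.0, high: float = 99.0) -> bool:
    """True if a percent identity falls inside an inclusive band."""
    return low <= pid <= high


# Hypothetical 10-nt sequences that differ at two positions.
pid = percent_identity("ACGTACGTAC", "ACGTTCGTAA")
print(pid, in_band(pid))  # 80.0 True
```

Passing `low=80.0, high=90.0` checks the most preferred band instead.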
[0087] Control regions identified by the present invention and
nucleic acid molecules comprising control regions identified by the
present invention are useful, for example, as part of an expression
vector to express a desired gene. Such expression vectors can be
made by operably linking a control region of the present invention
with a gene of interest using methods known in the art (Sambrook et
al., supra, (1989)). Such vectors can be transfected or transduced
into appropriate host cells using methods known in the art (Sambrook
et al., supra, (1989)). Within such cells, the control region can
drive or repress the expression of the gene of interest under a
given set of conditions, such as stresses, for example UV
irradiation.
[0088] The protein encoded by the protein coding region of SEQ ID
NO: 16 is considered to be a nucleic acid molecule binding protein,
such as a DNA binding protein, based on homologies with known
nucleic acid molecule binding proteins. For example, the sequence
CDNFSAYGWCPLGPQCPQSH (SEQ ID NO: 3) has an anchor BLOCKS score of
1089 (88.6 percentile) matching a zinc-finger motif, based on the
BLOCKS search software (www.blocks.fhcrc.org, Mar. 12, 1999). Also,
the sequence IIDTDEAAAEDKRRRRRRREKRKRALLNLPG (SEQ ID NO: 4) has an
anchor BLOCKS score of 1092 (90.1 percentile) matching REV protein,
an HIV anti-repression transcription activator, using the BLOCKS
search software. In addition, the sequence HRAGFDAFMTGYV (SEQ ID
NO: 5) has an anchor BLOCKS score of 1137 (98.1 percentile)
matching exonuclease, an ATP-dependent helicase, using the BLOCKS
search software. The function of proteins, polypeptides, peptides,
and fragments or active fragments thereof can be identified by
determining their amino acid sequences, either by amino acid
sequencing reactions or by deducing the amino
acid sequence from a nucleic acid sequence (see, Sambrook et al.,
supra, 1989). The function of a protein, peptide, polypeptide, or
fragment or active fragment thereof can be inferred by comparing
amino acid sequences or nucleic acid sequences encoding such amino
acid sequences with appropriate databases, wherein substantial
homology with an amino acid sequence or nucleic acid sequence of
known function is predictive of the function of the nucleic acid
molecule or protein identified by a method of the present
invention.
[0089] The present invention also includes proteins identified by
the present methods, such as the protein encoded by SEQ ID NO: 16,
proteins having conservative amino acid substitutions thereof, and
proteins having substantial identity thereto; and portions,
fragments or active fragments of any of the foregoing or proteins
comprising any of the foregoing.
[0090] The present invention also includes antibodies, either
polyclonal or monoclonal, that specifically bind with a protein,
portion thereof, fragment thereof or active fragment thereof of the
present invention. Such antibodies can be made and screened for
such specific binding using methods known in the art (Sambrook et
al., supra, (1989); Harlow et al., Antibodies, A Laboratory Manual,
Cold Spring Harbor Press, (1988)). Such antibodies are useful, for
example, in immunoassays to detect the presence or amount of a
protein, portion thereof, fragment thereof or active fragment
thereof of the present invention. The antibodies can be linked to a
detectable label, such as an enzyme such as horseradish peroxidase,
a radionuclide such as ³²P, a fluorescent protein such as green
fluorescent protein (GFP), or a fluorophore such as rhodamine, using
methods known in the art to detect the specific binding of the
antibody.
[0091] III Nucleic acid molecule Arrays
[0092] In another aspect of the present invention, isolated nucleic
acid molecule fragments, such as isolated DNA fragments or their
cloned or amplified counterparts, or RNAs derived from the isolated
nucleic acid molecule fragments, are labeled according to any
technique known or later developed in the art, such as by
incorporation of the cyanine dye-labeled dUTP analog Cy3-dUTP during
polymerase chain reaction. Other
labels, such as the fluorescent markers Cy5, fluorescein,
rhodamine, or phycoerythrin, linked to nucleotides or nucleotide
analogs, or radioisotopes such as ³³P or ³²P incorporated
into nucleotides or nucleotide analogs, or biotin linked to
nucleotides or nucleotide analogs, and other labeling techniques,
such as nick translation, random priming, RNA synthesis, or reverse
transcription of RNA derived from the DNA fragments, may be used
(Yu, et al. NAR 22:3226-3232, Lockhart, et al., Nature
Biotechnology 14:1675-1680; DeRisi, Nature Genetics 14:457-460,
Chalifour, et al., Anal. Biochem 216:299-304; Pietu, et al., Genome
Research 6:492-503). These labeled nucleic acid molecule fragments
are added to hybridization reactions with nucleic acid molecule
arrays, such as DNA arrays. These arrays may be synthesized or
purchased and preferably contain sequences of expressed genes
spotted on an ordered matrix made of glass, nitrocellulose, nylon,
silicon, or other suitable material
(http://cmgm.stanford.edu/pbrown/protocols; Schena, et al.,
Science 270:467-470; Pietu, et al., Genome Research 6:492-503,
Chalifour, et al. Anal. Biochem, 216:299-304; Lockhart, et al.,
Nature Biotechnology 14: 1675-1680; http://www.clonetech.com;
http://www.affymetrix.com; http://www.incyte.com; Stratagene, 11011
North Torrey Pines, La Jolla, Calif. 92037). The sequences spotted
on the array may be from any appropriate source. For example,
databases such as UniGene and IMAGE
(http://www.ncbi.nlm.nih.gov/UniGene;
http://www-bio.llnl.gov/bbrp/image/image.html) provide catalogs
of unique human expressed sequence tags (ESTs). Alternatively,
the arrays may contain genes of known identity and may comprise
genes encoding proteins of a particular type, for example, proteins
that function in apoptosis. Hybridization reactions may be
performed according to methods known in the art or developed in the
future (http://cmgm.stanford.edu/pbrown/protocols; Schena, et
al. Science 270:467-470; Pietu, et al., Genome Research 6:492-503;
Chalifour, et al., Anal. Biochem. 216:299-304; and Lockhart, et al.
Nature Biotechnology 14:1675-1680) and positive hybridization
signals are detected by machinery such as, in the case of
fluorescently labeled probes, a confocal microscope that scans the
array and detects the presence of labeled nucleic acid molecule
(DeRisi, et al., Nature Genetics 14:457-460; Lockhart, et al.,
Nature Biotechnology 14:1675-1680). Scanning and detection systems
using focused laser beams are available from Affymetrix
(GeneArray™ Scanner, Santa Clara, Calif.,
http://www.affymetrix.com), General Scanning (ScanArray™
Scanner, Menlo Park, Calif., http://www.genscan.com), and Incyte
(GemArray Scanner, Palo Alto, Calif., http://www.incyte.com), among
other companies. If the probes are radiolabeled, the array may be
subjected to autoradiography or phosphorimaging (Chalifour, et al.
Anal. Biochem. 216:299-304, Pietu, et al., Genome Research
6:492-503). Other methods of detection may be used in accordance
with the nucleic acid molecule labeling techniques that may be
used. The position of the labeled nucleic acid molecule may be
localized on the array to identify the specific nucleic acid
molecules, such as DNA molecules, on the array that have hybridized
to the isolated nucleic acid molecule fragment (Lockhart, et al.,
Nature Biotechnology 14:1675-1680; DeRisi, Nature Genetics
14:457-460; Chalifour, et al., Anal. Biochem. 216:299-304; and Pietu,
et al., Genome Research 6:492-503). Software is commercially
available to facilitate the localization and determine the
intensity of positive hybridization signals (the GeneChip
Workstation Expression Data Mining Tool from Affymetrix, Santa
Clara, Calif., the ScanArray™ Acquisition QuantArray™ Tools
from General Scanning, Menlo Park, Calif., and the GemTools™
LifeArray™ system from Incyte, Palo Alto, Calif.). Positively
hybridizing nucleic acid molecules, such as DNA molecules, whether
of known or unknown identity, are derived from genes presumptively
regulated by the transcription factor.
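Deciding which spots count as "positive hybridization signals" is usually a thresholding step. The sketch below is a hedged illustration that calls a spot positive when its intensity exceeds the mean plus three standard deviations of negative-control spots. The mean + k*SD rule, k = 3, the function name, and all intensities are assumptions for illustration, not the criterion used by any scanner software named above.

```python
import statistics


def call_positive_spots(spots: dict, negative_controls: list, k: float = 3.0) -> list:
    """Return names of spots whose intensity exceeds mean + k*SD of negative controls.

    The mean + k*SD rule and the default k = 3 are illustrative assumptions.
    """
    mean = statistics.mean(negative_controls)
    sd = statistics.stdev(negative_controls)  # sample standard deviation
    threshold = mean + k * sd
    return [name for name, value in spots.items() if value > threshold]


# Hypothetical raw intensities (arbitrary units).
negatives = [100, 110, 90, 105, 95]
spots = {"A1": 450, "A2": 118, "B1": 130, "B2": 95}
print(call_positive_spots(spots, negatives))  # ['A1', 'B1']
```

Lowering `k` trades specificity for sensitivity; with `k=0.0` spot A2 is also called positive.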
EXAMPLES
Example 1
[0093] Identification of control elements and genes regulated by
the transcription factor Egr-1.
[0094] H4 cells, subcloned from fibrosarcoma HT1080 cells (ATCC NO:
CCL-121), do not express detectable amounts of transcription factor
EGR-1 (Huang et al., Cancer Res. 55:5054-5062 (1995)). Fibrosarcoma
HT1080 subclone H4E9 cells (E9 cells) were prepared by transfection
of H4 cells with expression vectors for mouse wild-type Egr-1
(pCMV-Egr-1) as described by Huang et al., Cancer Res.
55:5054-5062 (1995). These cells were maintained in DMEM
supplemented with 10% fetal bovine serum and cultured in the
presence of penicillin, streptomycin and 200 micrograms/ml of
G-418. Cell numbers in culture were determined by direct cell
counting following the general methods of Huang et al., Cancer Res.
55:5054-5062 (1995).
[0095] Separate cultures of H4 and E9 cells (approximately
5 × 10⁶ to 1 × 10⁷ cells) were irradiated with
approximately 40 J/m² UV-C, a procedure that leads to new
Egr-1 synthesis in cells with a normal Egr-1 gene. In E9 cells,
this treatment leads to the hyperphosphorylation of exogenous
constitutive Egr-1, peaking at two hours. As controls, cultures of
H4 and E9 cells that were not treated with UV-C were also subjected
to the following procedures. The cells were contacted with 1%
formaldehyde in 5 mM Tris, pH 8; 10 mM NaCl; 0.1 mM EGTA; 0.1 mM
EDTA, for about 2 hours following the general procedures of Orlando
et al., Cell 75:1187-1198 (1993) to cross-link EGR-1 to its target
DNA sequence. The cells were lysed in 4% SDS in 10 mM Tris, pH 8, 1
mM EDTA, and the lysate was passed through a 20-gauge needle or
briefly sonicated. The cross-linked nucleic acids were separated on
a urea gradient (about 5 M to about 8 M) by centrifugation at
30,000 rpm in an SW41 rotor (Beckman) for about 16 hours. Fractions
from the urea gradient were collected, and fractions containing
cross-linked nucleic acid molecules were identified in the pellets.
Excess urea was removed from these fractions by dialysis.
[0096] Dialyzed fractions containing cross-linked nucleic acid
molecules were treated with the restriction enzyme EcoRI (about 20
units, overnight at 37 °C) to provide a mixture of
cross-linked nucleic acid molecule fragments. The restriction
enzyme-digested preparation was immunoprecipitated with rabbit
anti-Egr-1 antibodies and Protein A-Sepharose (Sigma Chemical
Co.). The immunoprecipitates were collected by centrifugation in a
microcentrifuge.
[0097] To verify that Egr-1 was present in the immunoprecipitated
fractions, samples of the immunoprecipitates were digested with
DNase I (about 10 units at 37 °C for about 30 minutes) to
release proteins bound to the nucleic acid molecule fragments.
These preparations were separated by electrophoresis through a
10% SDS-PAGE gel. The proteins in the gel were transferred to
membranes, and Western blots were performed on the membranes using
rabbit anti-Egr-1 antibodies, which were detected using anti-rabbit
secondary antibodies conjugated to horseradish peroxidase and an
appropriate detectable substrate (ECL), based on horseradish
peroxidase/hydrogen peroxide-catalyzed oxidation of luminol. As
shown in FIG. 2, Egr-1 protein was detected in samples derived from
UV-irradiated E9 cells, to a lesser extent in unirradiated E9
cells, but not in irradiated or unirradiated H4 cells.
[0098] For the remainder of the immunoprecipitates, the anti-EGR-1
antibodies and EGR-1 transcription factor were removed by treatment
with proteinase K (0.25% SDS with 250 micrograms/ml of proteinase
K, at 37 °C overnight, and at 68 °C for about six hours). The
resulting preparations of DNA fragments were ligated with linkers
of the sequence
[0099] 5'-AATTCGAAGCTTGGATCCGAGCAG-3' (SEQ ID NO: 11) and
5'-CTGCTCGGATCCAAGCTTCG-3' (SEQ ID NO: 12) having EcoRI-compatible
ends, which ligate to the EcoRI-digested fragments. These fragments
were then amplified by PCR using the SEQ ID NO: 11 and SEQ ID NO: 12
oligonucleotides as primers. The conditions used were:
95 °C for 45 minutes, 55 °C for 30 minutes and
72 °C for 5 minutes. Samples of each of these reactions
(H4, E9, H4/UV, and E9/UV) were electrophoresed through a 1%
agarose gel and stained with ethidium bromide to detect nucleic
acid molecules. As shown in FIG. 3, DNA was visibly amplified only
from the E9 cells and from E9 cells irradiated with UV light.
Immunoprecipitates from unirradiated H4 cells, and irradiated H4
cells, did not give rise to detectable levels of amplified DNA
fragments.
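The two linker oligonucleotides are designed to anneal into a duplex with an EcoRI-compatible end. The sketch below, using SEQ ID NO: 11 and SEQ ID NO: 12 as given in the sequence listing, checks that SEQ ID NO: 12 pairs with the 3' portion of SEQ ID NO: 11 and reports the single-stranded 5' extension left over. It also shows that joining an EcoRI end (...G) to the AATTC... top strand recreates GAATTC, which is consistent with the later EcoRI digestion used to separate inserts from linker-primer sequences. The function names are illustrative and the check assumes perfect pairing.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (returned in uppercase)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def five_prime_overhang(top: str, bottom: str) -> str:
    """Return the 5' single-stranded extension left on `top` after annealing.

    Assumes `bottom` pairs perfectly with the 3'-terminal part of `top`.
    """
    rc = reverse_complement(bottom)
    if not top.upper().endswith(rc):
        raise ValueError("bottom strand does not pair with the 3' end of top")
    return top.upper()[: len(top) - len(rc)]


SEQ_ID_11 = "AATTCGAAGCTTGGATCCGAGCAG"
SEQ_ID_12 = "CTGCTCGGATCCAAGCTTCG"

print(five_prime_overhang(SEQ_ID_11, SEQ_ID_12))  # AATT
# Ligation to the ...G of an EcoRI-cut fragment restores the site.
print("G" + SEQ_ID_11[:5] == "GAATTC")            # True
```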
[0100] The amplified DNA fragments were digested to completion with
EcoRI. The digested fragments were separated from linker-primer
sequences by agarose gel electrophoresis and cloned into
pBluescript plasmids by EcoRI digestion of the plasmids and
subsequent ligation at 16 °C overnight. These plasmids were
transformed into E. coli strain XL2B (Stratagene). Selected
amplified DNA fragments were sequenced (SEQ ID NO: 17, SEQ ID NO:
18, SEQ ID NO: 19, SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:22, SEQ ID
NO:23, SEQ ID NO:24 and SEQ ID NO:25).
[0101] To determine whether we could obtain promoter regions of
genes regulated by Egr-1, an aliquot of the amplified isolated
fragments was used as the template in a PCR with primers from the
promoter region of TGF-beta1 spanning -201 to +138 (339 base pairs)
of the human TGF-beta1 promoter (5'-GGGCTGAAGGGACCCCCCTC-3' (SEQ ID
NO: 10) and 5'-TCCTCGGCGACTCCTTCCTC-3' (SEQ ID NO: 1)). A 339
base-pair fragment was amplified from fragments isolated from
nonirradiated E9 cells, which constitutively express EGR-1, but not
from H4 cells (not shown), which are EGR-1 deficient (FIG. 4).
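The 339-bp length follows from the usual promoter numbering, assumed here, in which +1 is the transcription start site, -1 is the nucleotide immediately upstream, and there is no position 0. A minimal sketch of that arithmetic (the function name is hypothetical):

```python
def promoter_span_length(start: int, end: int) -> int:
    """Inclusive length of a span in promoter coordinates that have no position 0.

    Assumed convention: +1 is the transcription start site and -1 the
    nucleotide immediately upstream, so no nucleotide is numbered 0.
    """
    if start == 0 or end == 0:
        raise ValueError("position 0 does not exist in this numbering")
    if start > end:
        raise ValueError("start must not exceed end")
    length = end - start + 1
    if start < 0 < end:
        length -= 1  # skip the non-existent position 0
    return length


print(promoter_span_length(-201, 138))  # 339
```

Naively subtracting coordinates (138 - (-201) + 1 = 340) overcounts by one because it includes a position 0 that does not exist.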
[0102] As a further test that the DNA fragments were isolated
because they were bound by EGR-1, ten cloned inserts were released
by EcoRI digestion and labeled with [α-³²P]dATP by Klenow fill-in.
The labeled nucleic acid molecules were mixed with bacterially
synthesized EGR-1 protein (GST-Egr-1) at between about 20 pM and
about 100 pM. These mixtures were separated by electrophoresis
under non-denaturing conditions in a 6% acrylamide gel. All ten
cloned inserts exhibited gel shifts in the presence of GST-Egr-1,
indicating that the nucleic acid molecule isolates bound to
GST-Egr-1; gel shifts of three of these isolates, DNA4, DNA5 and
DNA3, are shown in FIG. 5. Nucleic acid molecule DNA4 from FIG. 5,
labeled with [α-³²P]dATP, was subjected to competitive gel shift
assays (FIG. 6). Briefly, labeled DNA4 (lane 1) was incubated with
about 100 pM Egr-1 (lane 2) and then with an excess (about 50-fold
molar) of unlabeled wild-type Egr-1 binding site (EBS),
5'-GATCACTCGCGGGGGCGAGGATGAGCGCCCCCGCTCCTCTTAG-3' (SEQ ID NO: 13)
(lane 3), or of mutant EBS (EBSmut), which does not bind Egr-1,
5'-GATCACTCACATTTACAAGGATGAGTGTAAATGTTCCTCTAG-3' (SEQ ID NO: 14)
(lane 4). As shown in FIG. 6, EBS, but not EBSmut, competed with
DNA4 for binding to Egr-1.
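The difference between the two competitors can be seen at the sequence level. The sketch below scans both strands of SEQ ID NO: 13 and SEQ ID NO: 14 for the commonly cited Egr-1 consensus GCG(G/T)GGGCG; treating that consensus as the definition of a site is an assumption, and the function names are illustrative. In this scan EBS carries one consensus match on each strand, while EBSmut carries none, which fits the competition result in FIG. 6.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")
# Commonly cited Egr-1 consensus GCG(G/T)GGGCG; used here as an assumption.
# The lookahead lets overlapping matches be reported.
EGR1_CONSENSUS = re.compile(r"(?=(GCG[GT]GGGCG))")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_ebs(seq: str) -> list:
    """Return (strand, 1-based start on the top strand) for each consensus match."""
    seq = seq.upper()
    n = len(seq)
    hits = [("+", m.start() + 1) for m in EGR1_CONSENSUS.finditer(seq)]
    for m in EGR1_CONSENSUS.finditer(reverse_complement(seq)):
        site_len = len(m.group(1))
        # Map a match on the reverse complement back to top-strand coordinates.
        hits.append(("-", n - (m.start() + site_len) + 1))
    return hits


EBS = "GATCACTCGCGGGGGCGAGGATGAGCGCCCCCGCTCCTCTTAG"      # SEQ ID NO: 13
EBS_MUT = "GATCACTCACATTTACAAGGATGAGTGTAAATGTTCCTCTAG"  # SEQ ID NO: 14
print(find_ebs(EBS))      # [('+', 9), ('-', 26)]
print(find_ebs(EBS_MUT))  # []
```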
[0103] Three of the DNA fragments of the present invention were
functionally linked to a reporter gene to determine the
functionality of the isolated nucleic acid molecule sequence.
Sequences E9BS-3, E9BS-4, and E9BS-6 were functionally linked to a
luciferase gene having a minimal fos56 promoter by cloning into
plasmid pGL3-Basic (Promega) to form pE9BS-3, pE9BS-4 and pE9BS-6.
Each plasmid was transfected into H4 cells, which do not express
Egr-1, with and without cotransfection of the EGR-1 expression
plasmid pCMV-Egr-1. The transfected cells were then either left
untreated or irradiated with UV-C (40 J/m²), and the amount of
luciferase in the untreated or treated cells was measured. As shown in FIG. 7, all
of the fragments tested showed Egr-1 induced stimulation of
expression of the reporter gene. In addition, E9BS-6 showed
increased expression of the reporter gene after irradiation.
Example 2
[0104] Isolation of Expressed Genes by PCR Using a cDNA
Library.
[0105] Bacteria containing the plasmids carrying the isolated
fragments as inserts, described in Example 1, were grown as a bulk,
mixed culture, and plasmid DNA was isolated using Qiagen maxi-prep
columns generally following the manufacturer's instructions. The
isolated plasmid DNA was digested with EcoRI (about 10 units for
about 4 hours) to release the DNA inserts. The digest was
electrophoresed on a 1% agarose gel, and gel sections containing
the digested inserts were excised. The DNA inserts were isolated
from the gel using a Qiagen maxi-prep column generally following the
manufacturer's instructions. The DNA inserts, a mixture of many
different clones of many different sequences, were mixed with DNA
isolated by excision from a lambda-gt11 library. The library was
made from RNA isolated from the NT2 human carcinoma cell line
(ATCC NO: CRL-1973), which is known to express EGR-1, and was
purchased from Stratagene. The T7 vector primer
(5'-TAATACGACTCACTATAGGGAGA-3' (SEQ ID NO: 2)) was added to the
mixture to serve as a 3' primer in the amplification reaction. PCR
was performed under the following conditions: 95 °C for 45
minutes, 50 °C for 30 minutes, 72 °C for 5 minutes,
for thirty cycles. These PCR conditions were optimized by varying
the amount of isolated DNA insert used in the PCR reactions so that,
when products of the PCR reactions were electrophoresed on agarose
gels and stained with ethidium bromide, bands were visible in the
reactions that contained both the cDNA library and the isolated
fragment inserts, but not in control reactions that lacked either
the cDNA library DNA or the isolated DNA inserts (FIG. 8).
[0106] The products of the PCR were separated on a 1% agarose gel.
Seven ethidium bromide stained bands were excised from the gel and
the DNA was isolated and cloned into a TA plasmid vector pCR 3.1
from Invitrogen. One of the seven clones obtained was sequenced.
The 5' end of the clone was found to contain sequences
approximating the Egr-1 binding site. Sequence analysis also
revealed a putative TATA site preceding an open reading frame (702
base pairs) (FIG. 11).
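A 702-bp open reading frame corresponds to 234 codons, the length of SEQ ID NO: 16, with the TGA stop codon not counted. As a hedged cross-check, the sketch below translates the first 14 codons of the coding sequence annotated in SEQ ID NO: 15 (nucleotides 969 to 1010, copied from the sequence listing) with the standard genetic code and compares the result with the start of SEQ ID NO: 16. The compact codon-table construction is a common idiom; the function name is illustrative.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str, to_stop: bool = True) -> str:
    """Translate a coding sequence in frame 1 using the standard genetic code."""
    cds = cds.upper().replace(" ", "")
    protein = []
    for pos in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE[cds[pos:pos + 3]]
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


# First 14 codons of the CDS in SEQ ID NO: 15 (nucleotides 969 to 1010).
cds_start = "atg agg gac cat att gat tac cgc tgc tgc ctg ccc cca gca"
print(translate(cds_start))  # MRDHIDYRCCLPPA
print(234 * 3)               # 702
```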
[0107] Clone 1 (SEQ ID NO: 15) was used as a template in an in
vitro transcription and translation reaction to produce a protein
determined to be approximately 37 kDa by comparison with standard
molecular mass markers when subjected to electrophoresis through
10% SDS-PAGE. The in vitro transcription and translation reaction
was performed using the TnT-coupled reticulocyte lysate system
generally according to the manufacturer's instructions (Promega).
For the reaction, 1 microgram of clone 1 template DNA or control
empty vector DNA was used in the presence of 10 U of T7 RNA
polymerase (Promega) and 40 microCi of [³⁵S]methionine (NEN).
Products were analyzed by 10% SDS-PAGE followed by autoradiography
with Kodak XR5 X-ray film. FIG. 9 shows that the control empty
vector produced no protein, while clone 1 (SEQ ID NO: 15) DNA
produced a polypeptide of approximately 37 kDa.
[0108] A gel shift assay was performed using a probe derived from
the 5' region of clone 1 (FIG. 10). The probe was generated by PCR
using specific primers designed to amplify the region encompassing
nucleotides 507 to 700 of SEQ ID NO: 15, generating a probe of 193
base pairs. The sequence of this probe is indicated in FIG. 11 in
bold type. The 5' and 3' primers used to generate the probe were
5'-TACCATAAGGGCAATGACAA-3' (SEQ ID NO:6) and
5'-CATCTCACACAGGTCAGCGGT-3' (SEQ ID NO:7) respectively. The PCR
product was radiolabeled using 10 U of T4 polynucleotide kinase
(Gibco, Life Technologies) in the presence of 50 microgram of
³²P-ATP. For the gel shift assay, 10,000 to 20,000 cpm of probe was
mixed with 1 microgram of poly(dI-dC) (Pharmacia) as non-specific
competitor DNA and 20 to 60 pmol of bacterially produced recombinant
Egr-1, in a buffer consisting of 10 mM HEPES, pH 7.9/10% (v/v)
glycerol/1 mM DTT/50 mM KCl/2.5 mM MgCl₂. The tubes were incubated at
room temperature for 15 minutes, and then either consensus Egr-1
binding site or mutated binding site oligonucleotides as previously
described were added to the appropriate tubes at a 50 fold molar
excess to the probe. The tubes were incubated for a further 15
minutes at room temperature, and were then subjected to
non-denaturing gel electrophoresis through a 6% polyacrylamide gel
containing 10 mM TRIS, pH 8.7/60 mM Glycine/0.2 mM EDTA. Following
electrophoresis, the gel was dried and subjected to autoradiography
with Kodak XR5 X-ray film.
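The probe coordinates can be cross-checked by locating the two primers in SEQ ID NO: 15. The sketch below copies nucleotides 501 to 700 from the sequence listing, finds SEQ ID NO: 6 on the top strand and the reverse complement of SEQ ID NO: 7 downstream of it, and reports the product in 1-based coordinates. It assumes exact primer matches with no mismatch tolerance, which is a simplification of real primer annealing; the function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (returned in uppercase)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def predict_amplicon(template: str, fwd: str, rev: str, offset: int = 0):
    """Predict a PCR product from exact primer matches on the top strand.

    Returns (first nt, last nt, length) in 1-based template coordinates
    shifted by `offset`, or None if either primer site is missing.
    """
    top = template.upper()
    f = top.find(fwd.upper())
    if f < 0:
        return None
    # The reverse primer anneals to the top strand, so its reverse
    # complement must appear at or after the forward primer site.
    rev_site = reverse_complement(rev)
    r = top.find(rev_site, f)
    if r < 0:
        return None
    end = r + len(rev_site)  # 0-based, exclusive
    return offset + f + 1, offset + end, end - f


# Nucleotides 501-700 of SEQ ID NO: 15, copied from the sequence listing.
SEQ15_501_700 = (
    "atcccctacc" "ataagggcaa" "tgacaagggt" "gatgagagcc" "agagccagtc"
    "agtacggacc" "ctattcctgg" "agctaatccg" "aagcccgccg" "gcccctgttg"
    "ctacacaatg" "gccttataga" "cttggtgttc" "ctgtaccaaa" "acttctatgc"
    "acacctccct" "gagagtctgg" "gaaccttcac" "cgctgacctg" "tgtgagatgt"
)

print(predict_amplicon(SEQ15_501_700,
                       "TACCATAAGGGCAATGACAA",   # SEQ ID NO: 6
                       "CATCTCACACAGGTCAGCGGT",  # SEQ ID NO: 7
                       offset=500))              # (507, 699, 193)
```

On this excerpt the product starts at nucleotide 507 and is 193 bp long, matching the probe size given above; its last nucleotide falls at position 699.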
[0109] Recombinant Egr-1 was prepared by cloning the coding
sequence of Egr-1 into the pGEX-2T vector (Pharmacia) in frame.
This cloning generated a GST-Egr-1 fusion protein which was
expressed in and purified from XL2B cells using glutathione-agarose
beads generally according to the manufacturer's instructions
(Pharmacia). For this in frame cloning, Egr-1 was generated by PCR
using specific primers designed to amplify from amino acids 2 to
533 of mouse Egr-1. The template for PCR was mouse Egr-1 cloned
into the plasmid vector pcDNA3 (Invitrogen), and the primers used
were 5'-CGCGGATCCGCAGCGGCCAAGGCC-3' (SEQ ID NO:8) and
5'-CCGGAATTCGCAAATTTCAATTGT-3' (SEQ ID NO:9) containing BamHI and
EcoRI sites respectively, which were digested post-PCR to allow in
frame cloning into BamHI- and EcoRI-digested pGEX-2T vector. The
sequences of the competitor oligonucleotides used in the shift assay
were the same as those used in FIG. 6. FIG. 10 shows that recombinant Egr-1 is
able to specifically bind to this region of clone 1. A potential
Egr-1 binding site within this region is indicated in FIG. 11 in
bold italics.
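The cloning strategy depends on each PCR primer carrying one restriction site. The sketch below locates the recognition sequences of EcoRI (GAATTC) and BamHI (GGATCC), the two enzymes named above, plus HindIII (AAGCTT), whose site also occurs in the linker of SEQ ID NO: 11. All three sites are palindromic, so scanning one strand finds every occurrence. The short 5' extensions ahead of each site (CGC, CCG) are typically added so the enzyme can cut close to a DNA end; that rationale is general practice, not stated in the text. The function name is illustrative.

```python
# Recognition sequences; all three are palindromic, so one strand suffices.
ENZYMES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def find_sites(seq: str, enzymes: dict = ENZYMES) -> dict:
    """Map enzyme name to 1-based start positions of its recognition site."""
    seq = seq.upper()
    found = {}
    for name, site in enzymes.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start + 1)
            start = seq.find(site, start + 1)
        if positions:
            found[name] = positions
    return found


print(find_sites("CGCGGATCCGCAGCGGCCAAGGCC"))  # SEQ ID NO: 8 -> {'BamHI': [4]}
print(find_sites("CCGGAATTCGCAAATTTCAATTGT"))  # SEQ ID NO: 9 -> {'EcoRI': [4]}
```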
[0110] To generate a clone 1 reporter construct, the 5' region of
clone 1 containing nucleotides 1 to 922 of SEQ ID NO: 15 was cloned
into the pGL3-Basic luciferase reporter vector (Promega) (FIG.
12). The clone 1 reporter construct was generated by digesting the
original clone 1, inserted in the pCR 3.1 TA vector, with KpnI and
PvuII. The 922 base pair digestion product consisting of the 5'
region of clone 1 was then purified by 1% agarose gel
electrophoresis, and then ligated into the pGL3-Basic vector which
had been digested with KpnI and SmaI. The transcriptional effects
of Egr-1 on this reporter construct were determined by transient
transfection assays in 293T cells. For these assays, 0.5 microgram
of the reporter construct (pGL3-luciferase reporter with the 5'
region of clone 1) was transfected together with 3 microgram of an
Egr-1 expression construct, or the same amount of the corresponding
empty vector, and 0.2 microgram of pCMV-beta-gal vector by
liposome-mediated transfection with the Lipofectamine reagent
according to the manufacturer's instructions (Gibco, Life
Technologies). Twenty-four hours after transfection, some dishes of
cells were irradiated with 40 J/m² of UV-C radiation using a Stratalinker
(Stratagene). Four hours after irradiation the cells were harvested
and lysed in a buffer consisting of 100 mM KPO4, pH 7.8/0.2% (w/v)
Triton X-100/1 mM DTT. For each sample, the beta-galactosidase
(GAL) activity was measured by incubating a fraction of the cell
extract with 400 micrograms of ONPG
(o-nitrophenyl-β-D-galactopyranoside) at 37 °C for 10 to 30
minutes. The absorbance of the resulting reaction was measured at
420 nm, and these readings were used to correct the luciferase
reporter assays for differences in transfection efficiency.
For the Luciferase assays, the corrected amount of cell extract was
mixed with 0.07 mM luciferin substrate (D-Luciferin, potassium
salt, Analytical Luminescence Laboratory), in the presence of 0.01
M ATP. Luciferase activity was measured in a 96-well microtiter
plate in an EG&G Berthold MicroLumat LB96P luminometer. FIG.
12 shows that, when transfected into 293T cells, Egr-1
transactivates the reporter construct, while UV-activated Egr-1
represses transactivation activity. These results demonstrate that
Egr-1 up-regulates the clone 1 gene, but UV irradiation presumably
leads to the formation of phosphorylated Egr-1, which then represses
transcription of the clone 1 gene. This is the first demonstration that
Egr-1 modification alters its trans-activating function.
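In the assay above the amount of extract is adjusted by the beta-galactosidase reading before the luciferase measurement. A common alternative, arithmetically equivalent if both assays are linear in the amount of extract (an assumption), is to divide luciferase light units by the beta-galactosidase reading afterward and express the result as fold activation over the empty-vector control. The sketch below shows that calculation; the function names and all readings are hypothetical and are not data from FIG. 12.

```python
def normalized_activity(luciferase_rlu: float, beta_gal_a420: float) -> float:
    """Luciferase light units per unit of beta-galactosidase absorbance at 420 nm."""
    if beta_gal_a420 <= 0:
        raise ValueError("beta-galactosidase reading must be positive")
    return luciferase_rlu / beta_gal_a420


def fold_activation(sample: tuple, control: tuple) -> float:
    """Ratio of normalized activities, each given as (RLU, A420)."""
    return normalized_activity(*sample) / normalized_activity(*control)


# Hypothetical readings: (luciferase RLU, beta-gal A420).
empty_vector = (2000.0, 0.5)   # reporter + empty expression vector
egr1 = (12000.0, 0.75)         # reporter + Egr-1 expression construct
print(fold_activation(egr1, empty_vector))  # 4.0
```

A ratio above 1 indicates transactivation and a ratio below 1 indicates repression relative to the empty-vector control.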
[0111] All publications, including patent documents and scientific
articles, referred to in this application are incorporated by
reference in their entirety for all purposes to the same extent as
if each individual publication were individually incorporated by
reference.
[0112] All headings are for the convenience of the reader and
should not be used to limit the meaning of the text that follows
the heading, unless so specified.
Sequence CWU 1
1
25 1 20 DNA Homo sapiens 1 tcctcggcga ctccttcctc 20 2 23 DNA Homo
sapiens 2 taatacgact cactataggg aga 23 3 20 PRT Homo sapiens 3 Cys
Asp Asn Phe Ser Ala Tyr Gly Trp Cys Pro Leu Gly Pro Gln Cys 1 5 10
15 Pro Gln Ser His 20 4 31 PRT Homo sapiens 4 Ile Ile Asp Thr Asp
Glu Ala Ala Ala Glu Asp Lys Arg Arg Arg Arg 1 5 10 15 Arg Arg Arg
Glu Lys Arg Lys Arg Ala Leu Leu Asn Leu Pro Gly 20 25 30 5 13 PRT
Homo sapiens 5 His Arg Ala Gly Phe Asp Ala Phe Met Thr Gly Tyr Val
1 5 10 6 20 DNA Homo sapiens 6 taccataagg gcaatgacaa 20 7 21 DNA
Homo sapiens 7 catctcacac aggtcagcgg t 21 8 24 DNA Homo sapiens 8
cgcggatccg cagcggccaa ggcc 24 9 23 DNA Homo sapiens 9 ccggaattcg
caaatttcaa ttg 23 10 20 DNA Homo sapiens 10 gggctgaagg gacccccctc
20 11 24 DNA Homo sapiens 11 aattcgaagc ttggatccga gcag 24 12 20
DNA Homo sapiens 12 ctgctcggat ccaagcttcg 20 13 43 DNA Homo sapiens
13 gatcactcgc gggggcgagg atgagcgccc ccgctcctct tag 43 14 42 DNA
Homo sapiens 14 gatcactcac atttacaagg atgagtgtaa atgttcctct ag 42
15 2007 DNA Homo sapiens TATA_signal (853)..(858) CDS (969)..(1673)
15 taatacgact cactataggg agacgagcgg tgtcatggcc gccgacagtg
acgatggcgc 60 agtttcagct cccgcagctt ccgacggtgg tgtcagcaaa
agcacaacat ctggggagga 120 gctagtagtc caggttcccg tagtggatgt
gcaaagcaac aacttcaagg agatgtggcc 180 atccctcctg ctagccataa
agacagctaa tttcgttggc tgtggacacg gagctgagtg 240 ggcttgggga
caagaagagt ttgctgaacc agtgcattga ggaacgttac aaggccgtgt 300
gtcatgctgc caggacccgt tctatccttt ccctgggcct cgcctgcttc aagcggcagc
360 cagacaaggg tgaacattcc tatctggctc aagtgttcaa tctcactctg
ctgtgcatgg 420 aggagtatgt catagaacca aagtctgtgc agttcctgat
acagcatggc ttcaacttca 480 accagcagta tgcccaaggc atcccctacc
ataagggcaa tgacaagggt gatgagagcc 540 agagccagtc agtacggacc
ctattcctgg agctaatccg aagcccgccg gcccctgttg 600 ctacacaatg
gccttataga cttggtgttc ctgtaccaaa acttctatgc acacctccct 660
gagagtctgg gaaccttcac cgctgacctg tgtgagatgt tcccagcagg catttatgac
720 accaaatatg ctgctgagtt tcatgcccgt ttcgtggcct cctacttaga
atatgccttc 780 cggaaatgtg ttttaggtgc tgaggattca gcagtgaaca
aaacagacca caaaaccctg 840 ctcttatgga gcttatatgc tagtggacca
ttaccctctt gcgctgttgc agtgaacggg 900 aaaatgggaa gcagcgggca
gctggcagcc cacaccttac cctggagttc tgcaactatc 960 cttccagc atg agg
gac cat att gat tac cgc tgc tgc ctg ccc cca gca 1010 Met Arg Asp
His Ile Asp Tyr Arg Cys Cys Leu Pro Pro Ala 1 5 10 acc cac cgt cct
cat ccc acc agc atc tgt gac aac ttc tcg gct tat 1058 Thr His Arg
Pro His Pro Thr Ser Ile Cys Asp Asn Phe Ser Ala Tyr 15 20 25 30 ggc
tgg tgc ccc ctg gga cca cag tgt cct cag tct cac gat att gac 1106
Gly Trp Cys Pro Leu Gly Pro Gln Cys Pro Gln Ser His Asp Ile Asp 35
40 45 cct atc att gac act gat gag gct gcg gca gag gac aag cgg cga
cgg 1154 Pro Ile Ile Asp Thr Asp Glu Ala Ala Ala Glu Asp Lys Arg
Arg Arg 50 55 60 cga cga cgt agg gaa aaa cgg aag agg gct tta ttg
aac cta ccg ggg 1202 Arg Arg Arg Arg Glu Lys Arg Lys Arg Ala Leu
Leu Asn Leu Pro Gly 65 70 75 aca cag acc tct ggg gaa gct aag gat
ggt cct ccc aag aag cag gtc 1250 Thr Gln Thr Ser Gly Glu Ala Lys
Asp Gly Pro Pro Lys Lys Gln Val 80 85 90 tgt ggg gat agc atc aag
cct gaa gaa acc gag cag gag gtg gct gcc 1298 Cys Gly Asp Ser Ile
Lys Pro Glu Glu Thr Glu Gln Glu Val Ala Ala 95 100 105 110 gat gaa
act agg aac ctg cct cac tcc aag caa ggc aac aaa aat gac 1346 Asp
Glu Thr Arg Asn Leu Pro His Ser Lys Gln Gly Asn Lys Asn Asp 115 120
125 tta gag atg ggg att aag gca gca agg cct gaa ata gct gat aga gct
1394 Leu Glu Met Gly Ile Lys Ala Ala Arg Pro Glu Ile Ala Asp Arg
Ala 130 135 140 acc tca gaa gtg cca ggg agc caa gcc agt cct aac cca
gtg cct ggg 1442 Thr Ser Glu Val Pro Gly Ser Gln Ala Ser Pro Asn
Pro Val Pro Gly 145 150 155 ggt gga ttg cac cgg gct ggt ttt gat gcc
ttt atg aca ggt tat gtg 1490 Gly Gly Leu His Arg Ala Gly Phe Asp
Ala Phe Met Thr Gly Tyr Val 160 165 170 atg gcc tat gtg gaa gtg agc
cag gga ccg caa ccc tgc agc tct gga 1538 Met Ala Tyr Val Glu Val
Ser Gln Gly Pro Gln Pro Cys Ser Ser Gly 175 180 185 190 ccc tgg ctc
cct gaa tgc cac aat aag gta tat ttg agt ggc aaa gct 1586 Pro Trp
Leu Pro Glu Cys His Asn Lys Val Tyr Leu Ser Gly Lys Ala 195 200 205
gta ccc ctc aca gtg gcc aag agc cag ttc tct cgt tcc tcc aaa gcc
1634 Val Pro Leu Thr Val Ala Lys Ser Gln Phe Ser Arg Ser Ser Lys
Ala 210 215 220 cac aat cag aag atg aag ctc act tgg ggc agt agc tga
tgcaacttcc 1683 His Asn Gln Lys Met Lys Leu Thr Trp Gly Ser Ser 225
230 235 accttgctct caggtggaac agaggtattt tgggtctctc tagcctgaaa
tgtcatcctc 1743 aactgctact gagtttgggg gagggggaat gtcttgacag
acatcactgc attgccctgg 1803 accgcctcct ttatcccagt gtttgaggta
caagtaagaa ggctgaccag cacctgtaac 1863 actgacttta tttttaagtc
tgaaaatgtc ttgggaaagt tttacaaaaa aaaaaatcaa 1923 cagaagcaag
ttatgaaaaa aaaaaaaaaa aaaaaactcg agggggggcc cggtacccaa 1983
ttctccctat agtgagtcgt atta 2007 16 234 PRT Homo sapiens 16 Met Arg
Asp His Ile Asp Tyr Arg Cys Cys Leu Pro Pro Ala Thr His 1 5 10 15
Arg Pro His Pro Thr Ser Ile Cys Asp Asn Phe Ser Ala Tyr Gly Trp 20
25 30 Cys Pro Leu Gly Pro Gln Cys Pro Gln Ser His Asp Ile Asp Pro
Ile 35 40 45 Ile Asp Thr Asp Glu Ala Ala Ala Glu Asp Lys Arg Arg
Arg Arg Arg 50 55 60 Arg Arg Glu Lys Arg Lys Arg Ala Leu Leu Asn
Leu Pro Gly Thr Gln 65 70 75 80 Thr Ser Gly Glu Ala Lys Asp Gly Pro
Pro Lys Lys Gln Val Cys Gly 85 90 95 Asp Ser Ile Lys Pro Glu Glu
Thr Glu Gln Glu Val Ala Ala Asp Glu 100 105 110 Thr Arg Asn Leu Pro
His Ser Lys Gln Gly Asn Lys Asn Asp Leu Glu 115 120 125 Met Gly Ile
Lys Ala Ala Arg Pro Glu Ile Ala Asp Arg Ala Thr Ser 130 135 140 Glu
Val Pro Gly Ser Gln Ala Ser Pro Asn Pro Val Pro Gly Gly Gly 145 150
155 160 Leu His Arg Ala Gly Phe Asp Ala Phe Met Thr Gly Tyr Val Met
Ala 165 170 175 Tyr Val Glu Val Ser Gln Gly Pro Gln Pro Cys Ser Ser
Gly Pro Trp 180 185 190 Leu Pro Glu Cys His Asn Lys Val Tyr Leu Ser
Gly Lys Ala Val Pro 195 200 205 Leu Thr Val Ala Lys Ser Gln Phe Ser
Arg Ser Ser Lys Ala His Asn 210 215 220 Gln Lys Met Lys Leu Thr Trp
Gly Ser Ser 225 230 17 725 DNA Homo sapiens unsure (2)..(597) "n"
refers to unknown nucleotide at positions 2, 3, 7, 9, 10, 12, 13,
14, 15, 16, 27, 45, 312, 320, 341, 421, 453, 501, 519, 529, 575,
578, 597 17 gnngggngnn gnnnnngggg gaacttntat cggtgcctac tcacngaaaa
ggctgaagag 60 tctcccatgt ctacttcttt ctacacagac acagcaacca
tccgatttct caatcttttc 120 cccacctttc ccccttttct attccacaaa
accgccattg tcatcatggg ccgttctcaa 180 tgagctgttg ggtgagatat
tagaattcta ctcacagaac gaaatgaaaa gtctcccatg 240 tctacttctt
ctacacaaga cacagcaaca tccgatttct caatcctttc cccaactttc 300
ccccttttct antccacaan accgccattg tcatcatggg ncgttctcaa tgagctgttg
360 ggtgagatat tagaattctg ggctgggaat gagttcagcc tggtggaatg
tgaacctgca 420 ncagtttggc atgaacgggc aaatgctgtg tancctccgg
aaaggagcgc ttcctggaag 480 ctggcgcctg actttgtggg ngacatcctc
cgggaaaang gttcactant tctaaagcgg 540 gcggcaacgc ggtggggctc
caattcgccc taaantgngt ccgtattaca attcacnggg 600 cggccgtttt
anaagtcctg nncggggaaa acccnggggt anccaacttt atcnccctgg 660
nngaaanccc cccttncnca acnggggtna naaccnannn gggccncccn ntttgcccct
720 cccaa 725 18 619 DNA Homo sapiens unsure (437)..(619) "n"
refers to unknown nucleotide at positions 437, 504, 512, 583, 591,
617, 619 18 agaagcttga attcgagcag agaagcttga attcgagcag aattggccca
attttgcctt 60 ataccacttt ccaatacctt cacttggagt gacttacact
gtggttaatt gcagttacaa 120 tgaagagatt aacatgggaa tgtcataata
attgaatcta aagaagacat aatttcaaaa 180 taagagcttg agtaataata
ccattgtgta acaatctgat ttccatccct cttatttttc 240 ctatattatg
cagtttagtt ctttactatc atgtgtttca tgtttgttcg gttttaccaa 300
cacatcatta gtaaattgaa tgtaaggctt ctcatttctt ttgtatccta catctaaaag
360 attttagtcc ttagaatcct cttgaaatgt tctccattta aaatggagaa
atagttcatg 420 ctctctcatc taagtangag ctaaaatcta aaaaattaat
aaataaaata gtccatcctc 480 taataataat aatgaatact gaanttgtta
antaataatt aatttttgag aagggggttc 540 actaatgtcg tccaagctgg
agtgcaatgg cgtgatcact aanttctaaa ncggcgccaa 600 cgcggtggag
ctccaantn 619 19 716 DNA Homo sapiens unsure (3)..(564) "n" refers
to unknown nucleotide at positions 3, 9, 11, 13, 23, 28, 32, 37, 39,
44, 49, 53, 63, 85, 89, 100, 282, 355, 361, 396, 413, 418, 424,
476, 504, 517, 525, 528, 532, 536, 541, 550, 557, 564 19 ggngtgggng
nnnggggggg ggnntttnng gnncggntnt tctnaagtnt ccngggcctc 60
atnaaacagc gggccgagaa cgggncaana tgacaatggn ggttttgtgg aatagaaaag
120 ggggaaaggt ggggaaatga ttgagaaatc ggatggttgc tgtgtctgtg
tagaaagaag 180 tagacatggg agacttttca ttttgttctg tgagtagaat
tctgggctgg gaatgagttc 240 agcctggtga atgtgaacct gcaccagttt
ggcatgaacg gncagatgct gtgtaacctc 300 ggcaaggagc gcttcctgga
gctggcgcct gactttgtgg gcgacatcct ctggnacagg 360 ntccactagt
tctagagcgg gcgccaccgc ggtggngctc caattcgccc tanagtgngt 420
cgtnttacaa ttcactggcc gtcgttttac aacgtcgtga ctgggaaaac cctggngtta
480 cccaacttaa tcgccttgca gcanatcccc ctttcgncag ctggngtnnt
ancgangagg 540 nccgcaccgn ttgcccntcc caanaagttg cgcagcctgn
atggggantg ggancgncct 600 gtnncgggng cantaagcgc ggngggtgtg
gtggntangc ncancgtgnn cgnnnnannt 660 gnnagngcct tangccngnn
ccttcgnttc tcccttcctt cnngnnangt ngcggg 716 20 619 DNA Homo sapiens
unsure (326)..(609) "n" refers to unknown nucleotide at positions
326, 435, 438, 472, 505, 556, 609 20 agaagcttga attcgagcag
agaagcttga attcgagcag aattggccca attttgcctt 60 ataccacttt
ccaatacctt cacttggagt gacttacact gtggttaatt gcagttacaa 120
tgaagagatt aacatgggaa tgtcataata attgaatcta aagaagacat aatttcaaaa
180 taagagcttg agtaataata ccattgtgta acaatctgat ttccatccct
cttatttttc 240 ctatattatg cagtttaagt tctttactat catgtgtttc
atgtttgttc ggttttacca 300 acacatcatt agtaaattga atgtanggct
tctcatttct tttgtatcct acatctaaaa 360 gattttagtc tttagaatcc
tcttgaaatg ttctccattt aaaatggaga aatagttcat 420 gctctctcat
ctaantanga gctaaaatct aaaaaataaa taaataaaat antccatcct 480
ctaataataa taatgaatac tgaanttgta aataataatt aatttttgag aatggggttc
540 actaatgtcg tccaanctgg agtgcaatgg cgtgatcact agttctaaac
cggcgccaac 600 gcggtgggnc tccaattcc 619 21 911 DNA Homo sapiens 21
accacatcca gacaatgaga agccaaaacc ttcatccttc atgatttcct tagccctccc
60 taattcctat ttaccttggt gtagttacat tccttccctg ctgtataaac
tcccaatttt 120 agtcagtaag ggagatggat ttgagataca tctcccaact
ccttggcagc agcacctggt 180 taaagcctcc tttcctggca atactatagt
ctcagtgatt ggctttcttt gtggtgagca 240 gcaggaccta gactgaaatt
gtagtatttt ggtaacagta tctgctctcc attcaaatct 300 atgctcagcc
atacagaatt attttttcag tttctttgaa tattctgcat attttcttct 360
acctctaagc ctccaaaaat aatctgaaaa gcagcaaaat cgccacaatg tggaatcaaa
420 ataggggtaa aaagcccttt agacattctt ttggcaataa actaactgaa
cttagtagga 480 cctggctcat agagacttct ctctttagga agtggacatc
tggtgactca agcatttggc 540 ttgaagcagt tttcagggga gtttcaactg
caattccaca ggatttcatt accagctatt 600 tgcggtcttg ctttttcctt
tgctggtact aaacaggtga catatatttt acattgataa 660 ttagtgtcat
ctgacttgag gccactgctt ttcttcttag tttctggtgc cctttgcagt 720
agtgcctttc ctaccatttt acatttggca gactggaaca gctcaaatag ctccaagaaa
780 gaaaaaactg cctcctttgt ctattcaagg ctctcacttc accttaaatg
cagaattttt 840 tctttttctt tttttttaag ttatgtatga ggattttttc
ttttcttttt tcttttttga 900 gacagggtct t 911 22 419 DNA Homo sapiens
unsure (120)..(298) "n" refers to unknown nucleotide at positions
120, 174, 270, 281, 298 22 acttgagtcc aggagttcaa ggctgtagtg
agttgtgatt gcaccaccgc actccagcct 60 cgatgacaga gtgagaccct
gtctgttaaa aaataataat aataatagat aatgggatan 120 gagtgtaaag
aaagacagga tgcttcttag caaagttaca aaaaatatta atangtcttt 180
gtcacaaata tatgtttgcc tatgagctga gaagagaaaa tgaaaaagtg aaaataagat
240 ttctcaaggt acaactttga tgcagttcan gtcaaactta ngtaagattt
tgttgtanag 300 tttgggaaat aaccattgtg gcaaggctgg aatgcaaatc
gattttttgc tgttacagaa 360 acagtaaatg aatttatggg attttatttt
aatttagtta gctttttatg aggagaatt 419 23 565 DNA Homo sapiens 23
ataattccat tcgattccac tcgatgattc cattcgagtt cattgactgt tccattccat
60 tccattcgat gatttcattc gagtccactc gatgattcta ttcgattgca
ttcgataatt 120 ccattcgttt gcattcgata attccattcg attccattgg
aggataattc catttgagtc 180 cattcgatga ttgttccatt cgattctatt
cggtgattcc attcgattcc atttgataat 240 gattccaatc gagaccattc
gatgattcca ttcaattcca ttcaatcatg atccctttcg 300 agtccattca
atgattccat tccagtccat tcgatgattc catctgattc cattcaatga 360
atccattcga ttccattcta tgacgattcc attcatttca tctgatgatg attccattcg
420 attcattcag tgataccatt cgattcattc gatgatgatt caatcaattt
aatcgatgat 480 tcattcgaat cattcgatga tgagtcatca tttcaattca
tggtaattca ttcgtttcaa 540 tcgatggtgt tcatttgatc atcga 565 24 584
DNA Homo sapiens unsure (328)..(582) "n" refers to unknown
nucleotide at positions 328, 403, 427, 492, 506, 557, 582 24
agagcagtcc agtatatata catacatata caagctacaa gctgcatatg taatttaaaa
60 ttttctaata accacattta aaaaggtaaa aagaaactgt tgaaataaat
tttaatatct 120 ttcattgaac ccaatatatg caaaatacta tcatttcaat
tataaccaaa ttaaaattaa 180 ggagatattt tacaattttc atattaacgt
ttccaattct ggtgtgaatt ttacactcac 240 cgaacatctc aattctgaca
agtcatattt taagtgctca acagctacgt gaggatagtg 300 gctattatgt
cacaaaatgc agctctangg atgaggacag tttacagaag atacttgagg 360
atacaggagc aagttaaatg gcagtttaag aaagcaaatc cangatgtgg gaaactccac
420 agaatanatg acctggtttc tcccttcact catccctcca aaatagaaat
caatggcaga 480 aagaaaaaag anggaggctg ttgtancata aaatacttag
ggacatacaa taaaaacagt 540 gtagggtttt gttgaanccg attcactaca
atgattcaca antt 584 25 678 DNA Homo sapiens unsure (5)..(675) "n"
refers to unknown nucleotide at positions 5, 7, 10, 12, 14, 16, 26,
42, 69, 545, 550, 566, 610, 635, 656, 675 25 ggggnnnntn tnnngnaaat
ctctgngttc gggccccccc ancaaggtcg aggcctatcg 60 ataagctana
tatcgaattc ctgcagcccg ggggatctga tggttttata aaggggagtt 120
gccctgcgaa agctctctct tacctgccgc catgtaagac cggactttgc tcctcattag
180 gtcaccctag ccatgtggaa ctgtgagtcc attaaacctc tttcctttat
aaattatgca 240 gtctcggata tgtctttatt agcaaggtga aaatgaacta
atacaagggt cacgtggtaa 300 atatatttaa tattaaaaaa aaatcttcca
aactattttc cagagtgtct gtaccttttt 360 acatttccat gagcaacgta
tgagtgattt agtttctttg acagcatttg gtatagttac 420 tattttttat
tttagttgtt ctcatcctgg tcttaatttg aattttccca atgatgagtg 480
atgttgaaaa tttttcttgt gcttacttgt catctggata ttctcgtcaa taaaatgtct
540 cttantatcn tttgcccatt ttcaantgga ttccttttgt gttttatcat
tgaattttaa 600 gaattcttcn atttatagat atgaattaca gatanaatca
tagatattat agatanatat 660 gagttatggt tcacnatt 678
* * * * *
References