U.S. patent application number 10/593790 was published by the patent office on 2007-07-19 for novel modular Type II restriction endonuclease, CspCI, and the use of modular endonucleases for generating endonucleases with new specificities.
This patent application is currently assigned to New England Biolabs, Inc. The invention is credited to Jack Benner, Daniel Heiter, Keith Lunnen, Richard Morgan, Celine Nguefeu Nkenfou, Stephen Picone, Geoffrey Wilson.
Publication Number | 20070166719 |
Application Number | 10/593790 |
Family ID | 35064409 |
Publication Date | 2007-07-19 |
United States Patent Application | 20070166719 |
Kind Code | A1 |
Morgan; Richard; et al. |
July 19, 2007 |
Novel modular Type II restriction endonuclease, CspCI, and the use
of modular endonucleases for generating endonucleases with new
specificities
Abstract
A novel restriction endonuclease, obtainable from either
Citrobacter species 2144 (NEB#1398) or the recombinant strain
Escherichia coli (NEB#1554), and methods of making it are provided.
The endonuclease recognizes the nucleotide sequence
5'-CAANNNNNGTGG-3' (SEQ ID NO:14) in double-stranded DNA molecules
and cleaves the DNA on both sides of it. The novel restriction
endonuclease is a modular protein in which the specificity moiety
is a module independent of the restriction-modification module.
Inventors: |
Morgan; Richard; (Middleton,
MA) ; Wilson; Geoffrey; (South Hamilton, MA) ;
Lunnen; Keith; (Essex, MA) ; Heiter; Daniel;
(Groveland, MA) ; Benner; Jack; (South Hamilton,
MA) ; Nkenfou; Celine Nguefeu; (Yaounde, CM) ;
Picone; Stephen; (Beverly, MA) |
Correspondence
Address: |
HARRIET M. STRIMPEL; NEW ENGLAND BIOLABS, INC.
240 COUNTY ROAD
IPSWICH
MA
01938-2723
US
|
Assignee: |
New England Biolabs, Inc.
240 County Road
Ipswich
MA
01938
|
Family ID: |
35064409 |
Appl. No.: |
10/593790 |
Filed: |
March 23, 2005 |
PCT Filed: |
March 23, 2005 |
PCT NO: |
PCT/US05/09824 |
371 Date: |
September 25, 2006 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
60555796 | Mar 24, 2004 | |
Current U.S.
Class: |
435/6.12 ;
435/199; 435/252.33; 435/488; 435/6.13; 435/69.1; 536/23.2 |
Current CPC
Class: |
C12N 9/22 20130101 |
Class at
Publication: |
435/006 ;
435/069.1; 435/199; 435/252.33; 435/488; 536/023.2 |
International
Class: |
C12Q 1/68 20060101
C12Q001/68; C07H 21/04 20060101 C07H021/04; C12P 21/06 20060101
C12P021/06; C12N 9/22 20060101 C12N009/22; C12N 15/74 20060101
C12N015/74; C12N 1/21 20060101 C12N001/21 |
Claims
1-14. (canceled)
15. A method of making a Type II restriction endonuclease having an
altered specificity, comprising: (a) selecting a restriction
endonuclease characterized by a modular structure having a
specificity subunit and a catalytic subunit, the specificity
subunit further comprising an N-terminal domain for binding one
half site of a bipartite recognition sequence and a C-terminal
domain for binding a second half site of the bipartite recognition
sequence; (b) modifying the specificity subunit; and (c) obtaining
the Type II restriction endonuclease with altered specificity.
16. A method according to claim 15, wherein the restriction
endonuclease is selected from a set of enzymes having a modular
structure comprising a specificity subunit and a catalytic subunit,
the specificity subunit further comprising an N-terminal domain for
binding one half site of a bipartite recognition sequence and a
C-terminal domain for binding a second half site of the bipartite
recognition sequence.
17. A method according to claim 15, wherein modifying the
specificity subunit in step (b) further comprises substituting the
N-terminal domain with a second C-terminal domain or substituting
the C-terminal domain with a second N-terminal domain.
18. A method according to claim 15, wherein modifying the
specificity subunit further comprises substituting the N-terminal
domain or the C-terminal domain or both N-terminal and C-terminal
domain with a binding domain from a second restriction endonuclease
or methyltransferase.
19. A method according to claim 15, wherein modifying the
specificity subunit further comprises mutating the N-terminal
domain, the C-terminal domain or both domains to alter the binding
specificity.
20. A method according to claim 15, 16, 17, 18 or 19, wherein
modifying the specificity subunit further comprises changing the
length of the spacer amino acid sequence between the N-terminal and
C-terminal domains of the specificity module.
21. A method according to claim 18, wherein the second restriction
endonuclease or methyltransferase is selected from a group
consisting of a Type I restriction endonuclease, a Type IIG
restriction endonuclease and a γ-type m⁶A
methyltransferase.
22. A method according to claim 15, wherein the specificity subunit
and the catalytic subunit are encoded by different genes.
23. A substantially pure Type IIG restriction endonuclease
obtainable from Citrobacter species 2144 (NEB#1398) (ATCC Patent
Accession No. PTA-5846) or from Escherichia coli NEB#1554 (ATCC
Patent Accession No. PTA-5887) capable of recognizing at least one
sequence selected from the group consisting of SEQ ID NO:32, SEQ ID
NO:33, SEQ ID NO:34 and SEQ ID NO:35, and cleaving the DNA on both
sides of the recognition sequence.
24. An isolated DNA encoding CspCI restriction endonuclease
obtainable from Escherichia coli NEB#1554 (ATCC Patent Accession
No. PTA-5887) or from Citrobacter species 2144 (NEB#1398) (ATCC
Patent Accession No. PTA-5846).
25. Isolated DNA encoding the restriction endonuclease of claim 1,
wherein the DNA comprises a first DNA segment encoding an
endonuclease and methyltransferase catalytic function and a second
DNA segment encoding a sequence specificity function of the
restriction endonuclease wherein the first and second DNA segments
comprise one or more DNA molecules.
26. A recombinant DNA vector, comprising: at least one of a first
DNA segment coding for the restriction and modification domains of
CspCI restriction endonuclease and a second segment coding for the
specificity domain of the restriction endonuclease.
27. A host cell transformed with a first DNA segment coding for the
restriction and modification domains of CspCI restriction
endonuclease and a second segment coding for the specificity domain
of the restriction endonuclease wherein the first DNA segment and
the second DNA segment are contained within one or more DNA
vectors.
28. A method for obtaining the endonuclease of claim 23, comprising
cultivating a sample of Citrobacter species 2144 (NEB#1398) or a
host cell according to claim 6 under conditions favoring the
production of the endonuclease; and purifying the endonuclease
therefrom.
Description
BACKGROUND OF THE INVENTION
[0001] Restriction endonucleases are enzymes that occur naturally
in certain unicellular microbes--mainly bacteria and archaea--and
that function to protect those organisms from infections by viruses
and other parasitic DNA elements. Restriction endonucleases bind to
specific sequences of nucleotides (`recognition sequence`) in
double-stranded DNA molecules (dsDNA) and cleave the DNA, usually
within or close to these sequences, disrupting the DNA and
triggering its destruction. Restriction endonucleases usually occur
with one or more companion enzymes termed modification
methyltransferases. Methyltransferases bind to the same sequences
in dsDNA as the restriction endonucleases they accompany, but
instead of cleaving the DNA, they alter it by the addition of a
methyl group to one of the bases within the sequence. This
modification (`methylation`) prevents the restriction endonuclease
from productively recognizing that site thereafter, rendering the
site resistant to cleavage. Methyltransferases function as cellular
antagonists to the restriction endonucleases they accompany,
protecting the cell's own DNA from destruction by its restriction
endonucleases. Together, a restriction endonuclease and its
companion modification methyltransferase(s) form a
restriction-modification (R-M) system, an enzymatic partnership
that accomplishes for microbes what the immune system accomplishes,
in some respects, for multicellular organisms.
[0002] A large and varied class of restriction endonucleases has
been classified as `Type II` restriction endonucleases.
These enzymes cleave DNA at defined positions, and when purified
can be used to cut DNA molecules into precise fragments for gene
cloning and analysis. The biochemical precision of Type II
restriction endonucleases far exceeds anything achievable by
chemical methods, making these enzymes the reagents sine qua non of
molecular biology laboratories. In this capacity as molecular tools
for gene dissection Type II restriction endonucleases have had a
profound impact on the life sciences and medicine in the past 25
years, transforming the academic and commercial arenas alike.
Their utility has spurred a continuous search for new restriction
endonucleases, and a large number have been found: today more than
250 Type II endonucleases are known, each possessing different DNA
cleavage characteristics (Roberts, R. J. et al., Nucl. Acids Res.
33:D230-D232 (2005); REBASE, http://rebase.neb.com/rebase). The
production and purification of these enzymes have also been
improved by the cloning and overexpression of the genes that encode
them, usually in the context of non-native host cells such as E.
coli.
[0003] Since the various restriction enzymes appear to perform
similar biological roles, and share the biochemistry of causing
dsDNA breaks, it might be thought that they would closely resemble
one another in amino acid sequence. Experience shows this not
to be true, however. Surprisingly, far from sharing significant
amino acid similarity with one another, most enzymes appear unique,
with their amino acid sequences resembling neither other
restriction enzymes nor any other known kind of protein. Type II
restriction endonucleases seem to have arisen independently of each
other during evolution, for the most part, and to have done so
hundreds of times, so that today's enzymes represent a
heterogeneous collection rather than a discrete family descended
from a common ancestor. Restriction endonucleases are biochemically
diverse in organization and action: some act as homodimers, some as
monomers, others as heterodimers. Some bind symmetric sequences,
others asymmetric sequences; some bind continuous sequences, others
discontinuous sequences; some bind unique sequences, others
multiple sequences. Some are accompanied by a single
methyltransferase, others by two, and yet others by none at all.
When two methyltransferases are present, sometimes they are
separate proteins and at other times they are fused. The orders and
orientations of restriction and modification genes vary, with all
possible organizations occurring. Several kinds of
methyltransferases exist, some methylating adenines, others
methylating cytosines at the N-4 position or at the C-5 position.
Usually there is no way of predicting, a priori, which
modifications will block a particular restriction endonuclease,
which kind(s) of methyltransferase(s) will accompany that
restriction endonuclease in any specific instance, nor what their
gene orders or orientations will be.
[0004] From the point of view of cloning a Type II restriction
endonuclease, the great variability that exists among R-M systems
means that, for experimental purposes, each is unique. Each enzyme
is unique in amino acid sequence and catalytic behavior; each
occurs in unique enzymatic association, adapted to unique microbial
circumstances; and each presents the experimenter with a unique
challenge. Sometimes a restriction endonuclease can be cloned and
over-expressed in a straightforward manner, but very often it
cannot, and what works well for one enzyme may fail altogether for
the next. Success with one is no guarantee of success with
another.
[0005] Novel endonucleases provide opportunities for innovative
genetic engineering.
SUMMARY OF THE INVENTION
[0006] In an embodiment of the invention, a substantially pure Type
IIG restriction endonuclease and an isolated DNA obtainable from
Citrobacter species 2144 (NEB#1398) (ATCC Patent Accession No.
PTA-5846) have been obtained. Also provided are the recombinant DNA
encoding the enzyme from the Citrobacter species and the cloned
product thereof in Escherichia coli NEB#1554 (ATCC Patent Accession
No. PTA-5887).
[0007] A further characteristic of the above-described restriction
endonuclease is that it recognizes the following base sequence in
double-stranded deoxyribonucleic acid molecules:
5'-↓N₁₀CAANNNNNGTGGN₁₂↓-3' (SEQ ID NO:33)
3'-↑N₁₂GTTNNNNNCACCN₁₀↑-5'
and/or
5'-↓N₁₀CAANNNNNGTGGN₁₃↓-3' (SEQ ID NO:34)
3'-↑N₁₂GTTNNNNNCACCN₁₁↑-5'
and/or
5'-↓N₁₁CAANNNNNGTGGN₁₂↓-3' (SEQ ID NO:35)
3'-↑N₁₃GTTNNNNNCACCN₁₀↑-5'
and/or
5'-↓N₁₁CAANNNNNGTGGN₁₃↓-3' (SEQ ID NO:32)
3'-↑N₁₃GTTNNNNNCACCN₁₁↑-5'
and cleaves the DNA on both sides of the recognition sequence at
the alternative positions shown by the arrows.
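The recognition and cleavage pattern above can be illustrated with a minimal Python sketch. It scans a linear top-strand sequence for CAANNNNNGTGG in either orientation and reports where the top strand would be cut under one of the four alternative geometries (SEQ ID NO:33). The function names, the dictionary layout and the restriction to a single geometry are illustrative assumptions; the enzyme itself may cut at any of the positions shown.

```python
import re

SITE_LEN = 12  # CAANNNNNGTGG
FWD = re.compile(r"(?=CAA[ACGT]{5}GTGG)")  # site reads CAA...GTGG on the top strand
REV = re.compile(r"(?=CCAC[ACGT]{5}TTG)")  # reverse complement: site on the bottom strand

# Distances drawn for SEQ ID NO:33, counted outward from the site:
# top strand 10 (left) / 12 (right), bottom strand 12 (left) / 10 (right).
NO33 = {"top_left": 10, "top_right": 12, "bottom_left": 12, "bottom_right": 10}


def revcomp(seq: str) -> str:
    """Reverse complement of an ACGT/N sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[b] for b in reversed(seq.upper()))


def find_sites(seq: str):
    """Sorted (start, orientation) pairs; start is a 0-based top-strand index."""
    seq = seq.upper()
    hits = [(m.start(), "+") for m in FWD.finditer(seq)]
    hits += [(m.start(), "-") for m in REV.finditer(seq)]
    return sorted(hits)


def top_strand_cuts(seq: str, geometry=NO33):
    """Top-strand cut positions; a cut at p falls between seq[p-1] and seq[p].

    For a '-' site the top strand plays the role of the drawn bottom strand,
    mirrored, so its distances come from the bottom row of the drawing.
    """
    cuts = set()
    for start, orient in find_sites(seq):
        end = start + SITE_LEN
        if orient == "+":
            left, right = start - geometry["top_left"], end + geometry["top_right"]
        else:
            left, right = start - geometry["bottom_right"], end + geometry["bottom_left"]
        # Keep only cuts that fall inside a linear molecule.
        cuts.update(p for p in (left, right) if 0 < p < len(seq))
    return sorted(cuts)


s = "A" * 20 + "CAAACGTAGTGG" + "T" * 20
print(top_strand_cuts(s))           # [10, 44]
print(top_strand_cuts(revcomp(s)))  # [10, 44]
```

Under the SEQ ID NO:33 geometry each strand is cut 10 nt on the 5' side and 12 nt on the 3' side of the site as it reads on that strand, so both orientations give the same top-strand offsets; the other three geometries are not symmetric in this way.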
[0008] The DNA encoding the restriction endonuclease described
above may include a first DNA segment expressing endonuclease and
methyltransferase catalytic functions and a second DNA segment
encoding a sequence specificity function of the restriction
endonuclease wherein the first and second DNA segments are
contained in one or more DNA molecules.
[0009] The above-described DNA may be inserted into a vector. The
vector may include at least one of a first DNA segment coding for
the restriction and modification domains of CspCI restriction
endonuclease and a second segment coding for the specificity domain
of the restriction endonuclease.
[0010] In an embodiment of the invention, a host cell is provided
which is transformed by a first DNA segment coding for the
restriction and modification domains of CspCI restriction
endonuclease and a second segment coding for the specificity domain
of the restriction endonuclease. The first DNA segment and the
second DNA segment may be contained within one or more DNA
vectors.
[0011] In an embodiment of the invention, a method is provided for
obtaining the restriction endonuclease which includes the steps of
cultivating a sample of Citrobacter species 2144 (NEB#1398) or a
host cell as described above under conditions favoring the
production of the endonuclease; and purifying the endonuclease
therefrom.
[0012] In an embodiment of the invention, a method of making a Type
II restriction endonuclease having an altered specificity includes:
(a) selecting a restriction endonuclease from a set of enzymes
wherein each enzyme in the set is characterized by a modular
structure having a specificity subunit and a catalytic subunit. The
specificity subunit further includes an N-terminal domain for
binding one half site of a bipartite recognition sequence and a
C-terminal domain for binding a remaining half site of the
bipartite recognition sequence; (b) modifying the specificity
subunit; and (c) obtaining the restriction endonuclease with
altered specificity.
[0013] Where the restriction endonuclease is CspCI, one half site
is CAA and the other half site is GTGG.
[0014] In this method, the step of modifying the specificity
subunit may further include (a) substituting the N-terminal domain
with a second copy of the C-terminal domain or substituting the
C-terminal domain with a second copy of the N-terminal domain; (b)
substituting the N-terminal domain or the C-terminal domain or both
N-terminal and C-terminal domains with a DNA-binding domain from a
second restriction endonuclease or methylase; or (c) mutating the
N-terminal domain, the C-terminal domain or both domains to alter
the binding specificity. With or without any of these
modifications, the specificity subunit may additionally be modified
by changing the length of the spacer amino acid sequence between
its N-terminal and C-terminal domains. In any of the above, the
specificity subunit and the catalytic subunit may be encoded by
separate and distinct genes.
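The modifications listed above can be pictured with a deliberately simplified model, sketched in Python below. It is hypothetical throughout: it assumes each domain reads its half site in a fixed orientation, so that a duplicated domain reads the same half site on the opposite strand (by analogy with the symmetric sequences reported for engineered Type I enzymes), that duplicating a domain leaves the number of unspecified bases unchanged, and that changing the spacer shifts that number by `delta` bases. The class and function names are illustrative, and none of the predicted sequences is a result from this document; the specificity of an engineered variant would have to be determined experimentally.

```python
from dataclasses import dataclass

_COMP = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT/N string."""
    return seq.translate(_COMP)[::-1]


@dataclass(frozen=True)
class SpecificityModel:
    """Hypothetical toy model of a bipartite specificity subunit."""
    left: str   # half site read by the N-terminal domain (top strand)
    gap: int    # unspecified bases set by the spacer between the domains
    right: str  # half site read by the C-terminal domain (top strand)

    def recognition(self) -> str:
        return self.left + "N" * self.gap + self.right


CSPCI = SpecificityModel(left="CAA", gap=5, right="GTGG")


def duplicate_c_terminal(m: SpecificityModel) -> SpecificityModel:
    """N-terminal domain replaced by a second C-terminal domain
    (assumed to read its half site on the opposite strand, gap unchanged)."""
    return SpecificityModel(revcomp(m.right), m.gap, m.right)


def duplicate_n_terminal(m: SpecificityModel) -> SpecificityModel:
    """C-terminal domain replaced by a second N-terminal domain (same assumptions)."""
    return SpecificityModel(m.left, m.gap, revcomp(m.left))


def change_spacer(m: SpecificityModel, delta: int) -> SpecificityModel:
    """Spacer length changed; assumed to shift the gap by `delta` bases."""
    return SpecificityModel(m.left, m.gap + delta, m.right)


print(CSPCI.recognition())                        # CAANNNNNGTGG
print(duplicate_c_terminal(CSPCI).recognition())  # CCACNNNNNGTGG
print(duplicate_n_terminal(CSPCI).recognition())  # CAANNNNNTTG
print(change_spacer(CSPCI, 1).recognition())      # CAANNNNNNGTGG
```

Under these assumptions both domain duplications yield palindromic (symmetric) sequences, which is the kind of outcome the Type I precedent suggests; the spacer rule in particular is only a placeholder.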
[0015] In an embodiment of the invention, the DNA-binding domain
from the second restriction endonuclease or methylase may be derived
from a Type I restriction endonuclease, another Type IIG restriction
endonuclease, or a γ-type m⁶A methyltransferase.
Additionally, it is envisioned that the N-terminal cleavage domains
can be grafted onto other DNA-binding proteins.
BRIEF DESCRIPTION OF THE FIGURES
[0016] FIG. 1 is an agarose gel showing CspCI-cleavage of phage
lambda, T7, PhiX174, pBR322 and pUC19 DNAs. Lanes are as
follows:
[0017] lanes 1, 10, 15: lambda-HindIII, PhiX174-HaeIII size
standards;
[0018] lane 2: lambda DNA+CspCI;
[0019] lane 3: T7 DNA+CspCI;
[0020] lane 4: PhiX174 DNA;
[0021] lane 5: PhiX174 DNA+CspCI;
[0022] lane 6: PhiX174 DNA+CspCI+PstI;
[0023] lane 7: PhiX174 DNA+CspCI+SspI;
[0024] lane 8: PhiX174 DNA+CspCI+NciI;
[0025] lane 9: PhiX174 DNA+CspCI+StuI;
[0026] lane 11: pBR322 DNA;
[0027] lane 12: pBR322 DNA+CspCI;
[0028] lane 13: pUC19 DNA;
[0029] lane 14: pUC19 DNA+CspCI.
[0030] FIG. 2 is a high-concentration agarose gel of CspCI-cleaved
pUC2CspC DNA showing the 35±1 bp internal `mini-fragment`
(arrows).
[0031] FIG. 3 is a high-resolution agarose gel showing
partial-digestion doublet fragments. DNA: BglI-cleaved pUC2CspC
re-digested with increasing amounts of CspCI. Transient CspCI-BglI
fragment doublets are shown by the arrows.
[0032] FIGS. 4a and 4b show a determination of the CspCI cleavage
sites by primed synthesis. Two experiments were performed using the
same M13mp18 template and primer combination. (-) is CspCI-cleaved
DNA only; (+) is Klenow-treatment of the CspCI-cleaved DNA.
[0033] FIG. 5 shows a determination of the CspCI cleavage sites by
run-off automated sequencing.
[0034] FIG. 5a: pUC1CspC-4 template; forward primer (SEQ ID
NO:1)
[0035] FIG. 5b: pUC1CspC-4 template; reverse primer (SEQ ID
NO:2)
[0036] FIG. 5c: pUC1CspC-1 template; forward primer (SEQ ID
NO:3)
[0037] FIG. 5d: pUC1CspC-1 template; reverse primer (SEQ ID
NO:4)
[0038] A-anomalies, signifying template cleavage, are shown as
triangles (Δ) below the tracings.
[0039] FIG. 6 shows the complete nucleotide sequence of the DNA
cloned from Citrobacter species 2144 (NEB#1398, New England
Biolabs, Inc., Beverly, Mass.) (SEQ ID NO:5).
[0040] FIG. 7a shows the nucleotide sequence of the CspCI-R-M gene
(SEQ ID NO:6).
[0041] FIG. 7b shows the nucleotide sequence of the CspCI-S gene
(SEQ ID NO:7).
[0042] FIG. 8a shows the gene organization of the CspCI
restriction-modification system.
[0043] FIG. 8b shows the gene organization of the plasmid clone
pUC19-CspCI-R-M-S ApoI #3 carrying the CspCI genes inserted into
the EcoRI site of pUC19.
[0044] FIG. 9a shows the predicted amino acid sequence of the
R-M-CspCI endonuclease-methyltransferase subunit (SEQ ID NO:8).
[0045] FIG. 9b shows the predicted amino acid sequence of the
CspCI specificity subunit (SEQ ID NO:9).
DETAILED DESCRIPTION OF THE INVENTION
[0046] In most restriction enzymes, the parts of the protein
responsible for binding to the recognition sequence
(`specificity`:S) and for cleaving it (`catalysis`) are
interlinked. Experience has taught that altering either of these
functions frequently impairs the other, and renders the enzyme
inactive. A new class of enzymes has been identified in which the
functions of specificity and catalysis are largely separated. These
members of the Type IIG class of restriction endonucleases are
large enzymes in which the twin activities of restriction and
modification are combined in a single polypeptide chain while
specificity resides with a different polypeptide chain. Examples of
restriction endonucleases in this class are CspCI, BcgI and BaeI.
While not wishing to be limited by theory, CspCI is believed to act
as a dimer of one R-M-subunit and one S-subunit, while BcgI acts as
a trimer of two R-M subunits and one S-subunit.
[0047] The separated functional organization of this class of
enzymes provides unusual opportunities for protein engineering
because the functional modules can be independently manipulated to
generate novel specificities of choice as described in more detail
in Example V.
[0048] This new class of endonucleases is characterized by a gene
encoding the specificity subunit that is distinct from the R-M
gene. The two genes naturally occur side by side and are
expressed in cis. These genes can also be separated into different
replicons and expressed in trans without loss of activity. The
separate location of these genes in different replicons permits the
S and the R-M genes to be altered individually, and allows the
endonuclease, or variants of it, to be reconstituted easily in
vivo, simply by introducing the two replicons into the same cell,
rather than rejoining the genes into the same DNA molecule.
Reconstitution can be performed individually, or in bulk by
transforming libraries of one altered gene into cells harboring the
other. Both genes may alternatively be co-transformed, together in
a mixture.
[0049] Alternatively, the R-M and S genes can be separated to allow
them to be expressed individually in different host cells. It will
be appreciated that since neither protein alone exhibits toxic
activity, the cells producing either subunit will be viable.
Expressing the subunits separately allows them to be purified
individually, and enables the enzyme, or variants of it, to be
reconstituted easily in vitro, simply by mixing together
preparations of the two subunits. High-throughput screening, and/or
multiplexing can be achieved using extracts of cells instead of
purified proteins.
[0050] The presence of DNA-methyltransferase motifs within this
class of endonuclease suggests that the endonucleases have
intrinsic methylation activity, in addition to endonuclease
activity. For example, CspCI is dependent on
S-adenosyl-L-methionine (AdoMet). By mutating the catalytic sites
for these activities, variants of these endonucleases can be
isolated. DNA-cleavage activity, DNA-methylation activity, or both,
may be abolished in these mutants.
[0051] Typically, the specificity subunit of endonucleases in the
Type IIG class determines which target sequence in a DNA molecule
will undergo cleavage by means of the R-M subunit. The R-M subunit
has a distinct N-terminal domain for DNA-cleavage, and a distinct
C-terminal domain for DNA-methylation. The S subunit has a distinct
N-terminal domain for binding one-half of the bipartite recognition
sequence, and a distinct C-terminal domain, for binding the other
half.
[0052] Other modular enzymes exist that characteristically cleave
DNA at a distance from the recognition site. However,
these enzymes are monomers (CjeI and AloI) or homodimers (HaeIV),
both types being single proteins with a composition of R-M-S.
[0053] For any unknown restriction endonuclease that is observed to
have a modular structure, the recognition sequence of the
endonuclease of the class may be determined by mapping the
locations of the cleavage sites in a target DNA of known sequence.
The DNA sequences of these regions are compared for similarity and
common features. Candidate recognition sequences are compared with
the observed restriction fragments produced by
endonuclease-cleavage of a variety of DNAs. The approximate size of
DNA fragments produced by endonuclease digestion can be entered
into the program REBPredictor, which can be accessed at
http://taq.neb.com/~vincze/REBpredictor/index.php. Example
III describes how REBPredictor was used to predict potential
recognition sites for CspCI.
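The comparison of candidate recognition sequences with observed fragments can be sketched in Python as follows. This is not REBPredictor, whose interface is not described here, but an independent illustration of the same check. It assumes a linear DNA, patterns written with A, C, G, T and N only, and a cut geometry in which every strand is cut `before` nt on the 5' side and `after` nt on the 3' side of the site as it reads on that strand (true of the SEQ ID NO:33 positions). Fragments shorter than `min_size`, such as the ~35 bp mini-fragments, are dropped because they need a high-concentration gel to be seen; the 5% tolerance is an arbitrary choice.

```python
import re


def revcomp(seq: str) -> str:
    """Reverse complement of an ACGT/N string."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def site_regex(pattern: str) -> str:
    """Turn an ACGT/N pattern into a regex in which N matches any base."""
    return "".join("[ACGT]" if b == "N" else b for b in pattern.upper())


def top_strand_cuts(seq: str, pattern: str, before: int = 10, after: int = 12):
    """Top-strand cut positions for sites in either orientation."""
    seq = seq.upper()
    cuts = set()
    for pat in {pattern.upper(), revcomp(pattern)}:  # set() dedups palindromes
        for m in re.finditer("(?=" + site_regex(pat) + ")", seq):
            for p in (m.start() - before, m.start() + len(pat) + after):
                if 0 < p < len(seq):
                    cuts.add(p)
    return sorted(cuts)


def fragment_sizes(seq: str, pattern: str, min_size: int = 50, **cut_kwargs):
    """Linear-DNA fragment sizes (top strand), largest first."""
    bounds = [0] + top_strand_cuts(seq, pattern, **cut_kwargs) + [len(seq)]
    sizes = [b - a for a, b in zip(bounds, bounds[1:])]
    return sorted((s for s in sizes if s >= min_size), reverse=True)


def consistent(predicted, observed, tol: float = 0.05) -> bool:
    """True if both lists have equal length and sorted sizes agree within `tol`."""
    if len(predicted) != len(observed):
        return False
    return all(abs(p - o) <= tol * o
               for p, o in zip(sorted(predicted), sorted(observed)))


# Toy substrate: one site in each orientation.
dna = "A" * 100 + "CAAACGTAGTGG" + "A" * 200 + "CCACTACGTTTG" + "A" * 150
predicted = fragment_sizes(dna, "CAANNNNNGTGG")
print(predicted)                                # [178, 138, 90]
print(consistent(predicted, [180, 140, 88]))    # True
```

In practice each candidate pattern would be scored against several substrate DNAs, and only candidates consistent with all of them retained.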
[0054] A modular endonuclease of the type described above can be
obtained as a product of recombination in a host cell or by
culturing the native strain. Host cells are grown in suitable media
supplemented with 100 micrograms/ml ampicillin and incubated
aerobically at 37°C. Cells in the late logarithmic stage of growth
are collected by centrifugation and either disrupted immediately or
stored frozen at -70°C.
[0055] Conventional protein purification techniques can be used to
isolate the endonuclease from lysed cells. Cell paste is suspended
in a buffer solution and ruptured by sonication, high-pressure
dispersion or enzymatic digestion to allow extraction of the
endonuclease by the buffer solution. Intact cells and cellular
debris are then removed by centrifugation to produce a cell-free
extract containing the endonuclease. The endonuclease is then
purified from the cell-free extract by ion-exchange chromatography,
affinity chromatography, molecular sieve chromatography, or a
combination of these methods.
[0056] Alteration of the specificity domains in Type I restriction
enzymes has been achieved to generate novel enzymes that recognize
symmetric DNA sequences, and hybrid DNA sequences (Bickle et al.
Journal of Cellular Biochemistry 18C:136 (1994); Bickle et al. EMBO
Journal 15: 4775-4783 (1996)). Example VI describes how the
specificity domain in a modular Type II restriction enzyme can be
manipulated to alter the specificity of the enzyme.
[0057] Present embodiments of the invention are further illustrated
by the following Examples. These Examples are provided to aid in
the understanding of embodiments of the invention and are not to be
construed as a limitation thereof.
[0058] The references cited above and below as well as provisional
application No. 60/555,795 are herein incorporated by
reference.
EXAMPLES
Example I
Isolation of CspCI
[0059] CspCI was obtained by culturing either (i) Citrobacter
species 2144 (NEB#1398) or (ii) the transformed host, E. coli
NEB#1554, and recovering the endonuclease from the cells. A sample
of Citrobacter species 2144 (NEB#1398) has been deposited under the
terms and conditions of the Budapest Treaty with the American Type
Culture Collection (ATCC) on Mar. 4, 2004 and bears the Patent
Accession No. PTA-5846. A sample of a recombinant strain expressing
CspCI, E. coli (NEB#1554), has also been deposited under the terms
and conditions of the Budapest Treaty with the American Type
Culture Collection (ATCC) on Mar. 24, 2004 and bears the Patent
Accession No. PTA-5887.
[0060] Citrobacter species 2144 (NEB#1398) or E. coli (NEB#1554)
were incubated aerobically at 37°C. Cells in the late
logarithmic stage of growth were collected by centrifugation and
either disrupted immediately or stored frozen at -70°C.
[0061] The CspCI endonuclease was isolated from Citrobacter species
2144 (NEB#1398) or Escherichia coli (NEB#1554) by conventional
protein purification techniques. The cell paste was suspended in a
buffer solution and ruptured by sonication, high-pressure
dispersion or enzymatic digestion to allow extraction of the
endonuclease by the buffer solution. Intact cells and cellular
debris were then removed by centrifugation to produce a cell-free
extract containing CspCI. The CspCI endonuclease was then purified
from the cell-free extract by ion-exchange chromatography, affinity
chromatography, molecular sieve chromatography, or a combination of
these methods to produce the endonuclease.
Example II
Production of Native or Recombinant CspCI Endonuclease
[0062] 277 grams of E. coli NEB#1554 CspCI cell pellet or
Citrobacter species 2144 (NEB#1398) (New England Biolabs, Inc.,
Beverly, Mass.) were suspended in 1 liter of Buffer A (20 mM
Tris-HCl (pH 7.4), 1.0 mM DTT, 0.1 mM EDTA, 5% Glycerol) containing
300 mM NaCl, and passed through a Gaulin homogenizer at
~12,000 psig. The lysate was centrifuged at
~13,000×g for 40 minutes and the supernatant
collected.
[0063] The supernatant solution was applied to a 400 ml DEAE
Fast-Flow column (GE Healthcare, formerly Amersham Biosciences,
Piscataway N.J.) equilibrated in buffer A plus 300 mM NaCl,
and the flow-through, containing the CspCI endonuclease activity,
was diluted 1:1 with buffer A.
[0064] The diluted enzyme was applied to a 375 ml Heparin Hyper-D
column (Biosepra, Marlborough Mass.), which had been equilibrated
in buffer B (20 mM Tris-HCl (pH 7.4), 150 mM NaCl, 1.0 mM DTT, 0.1
mM EDTA, 5% Glycerol). A 2.5 L wash of buffer B was applied, then a
2 L gradient of NaCl from 0.15M to 1M in buffer B was applied and
fractions were collected. Fractions were assayed for CspCI
endonuclease activity by incubating with 1 microgram of phage
lambda DNA (NEB) in 50 microliter NEBuffer 2, supplemented with 20
microMolar AdoMet, for 15 minutes at 37°C. CspCI activity
eluted at 0.3M to 0.35M NaCl.
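Two numbers in these steps can be checked with simple arithmetic. Diluting the 300 mM NaCl DEAE flow-through 1:1 with buffer A, which contains no NaCl, brings it to 150 mM, the NaCl concentration of buffer B used to equilibrate the Heparin Hyper-D column. And, assuming a perfectly linear gradient and ignoring column and mixer dead volume (assumptions not stated in the text), CspCI eluting at 0.3M to 0.35M NaCl corresponds to roughly 353 to 471 ml into the 2 L gradient. The helper names below are hypothetical.

```python
def diluted_concentration(conc, sample_vol, diluent_vol, diluent_conc=0.0):
    """Concentration after mixing a sample with a diluent (simple mass balance)."""
    return (conc * sample_vol + diluent_conc * diluent_vol) / (sample_vol + diluent_vol)


def gradient_volume_at(conc, start, stop, gradient_vol):
    """Volume into a linear gradient at which the eluent reaches `conc`.

    Assumes a perfectly linear gradient and ignores dead volume.
    """
    lo, hi = min(start, stop), max(start, stop)
    if not lo <= conc <= hi:
        raise ValueError("concentration lies outside the gradient")
    return gradient_vol * (conc - start) / (stop - start)


# DEAE flow-through (300 mM NaCl) diluted 1:1 with buffer A (no NaCl).
print(diluted_concentration(300, 1, 1))  # 150.0

# Heparin Hyper-D: 2 L (2000 ml) gradient from 0.15 M to 1 M NaCl.
for c in (0.30, 0.35):
    print(c, round(gradient_volume_at(c, 0.15, 1.0, 2000)))
```

The same function applies to the later gradients (for example the 400 ml, 0.1M to 0.8 M NaCl gradient on the Heparin TSK column) by changing the arguments.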
[0065] The Heparin Hyper-D column fractions containing the CspCI
activity were pooled and loaded directly onto a 200 ml Ceramic HTP
column (Biosepra, Marlborough Mass.) equilibrated in Buffer B. A 1
L wash of buffer B was applied, then a 1 L gradient of KHPO4
(pH 7.5) from 0M to 0.6M in buffer B was applied and fractions were
collected. Fractions were assayed for CspCI endonuclease activity
by incubating with 1 microgram of phage lambda DNA in 50 microliter
NEBuffer 2, supplemented with 20 microMolar AdoMet, for 15 minutes
at 37°C. CspCI activity eluted at 0.4M to 0.5M KHPO4.
[0066] The Ceramic HTP column fractions containing the CspCI
activity were pooled and dialyzed into Buffer C (20 mM Tris-HCl (pH
7.4), 100 mM NaCl, 1.0 mM DTT, 0.1 mM EDTA, 5% Glycerol).
[0067] This pool was flowed through a 50 ml Source Q column (GE
Healthcare, formerly Amersham Biosciences, Piscataway N.J.)
equilibrated in buffer C and directly onto a Heparin TSK column
equilibrated in buffer C. A 250 ml wash of buffer C was applied,
then a 400 ml gradient of NaCl from 0.1M to 0.8 M in buffer C was
applied and fractions were collected. Fractions were assayed for
CspCI endonuclease activity by incubating with 1 microgram of phage
lambda DNA (New England Biolabs, Inc., Beverly, Mass.) in 50
microliter NEBuffer 2, supplemented with 20 microMolar AdoMet, for
15 minutes at 37°C. CspCI activity eluted at 0.3M to 0.35M
NaCl.
[0068] The pool was dialyzed into Storage Buffer (10 mM Tris-HCl
(pH 7.4), 100 mM NaCl, 1.0 mM DTT, 0.1 mM EDTA, 50% Glycerol). One
million units of CspCI were obtained from this procedure. The CspCI
endonuclease thus produced was substantially pure and free of
contaminating nucleases. SDS polyacrylamide gel electrophoresis of
a sample of this preparation showed it comprised two principal
proteins of approximately 70 kDa and 35 kDa in the approximate
ratio by mass of 2:1.
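The SDS-PAGE result can be related to subunit stoichiometry with one line of arithmetic. Assuming, as this passage does not state, that the ~70 kDa band is the R-M subunit and the ~35 kDa band the S subunit, a 2:1 mass ratio corresponds to a 1:1 molar ratio, which is consistent with the one R-M subunit plus one S subunit organization proposed for CspCI in [0046]. A minimal Python sketch (the function name is illustrative):

```python
def molar_ratio(mass_a: float, mw_a: float, mass_b: float, mw_b: float) -> float:
    """Molar ratio a:b from the masses present and the two molecular masses."""
    return (mass_a / mw_a) / (mass_b / mw_b)


# ~70 kDa and ~35 kDa bands in an approximate 2:1 mass ratio.
print(molar_ratio(2, 70, 1, 35))  # 1.0
```

Because both the band masses and the mass ratio are approximate, the result supports, but does not by itself establish, a 1:1 subunit ratio.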
[0069] Activity Determination
[0070] CspCI activity: Samples of 1 to 10 microliters were
added to 50 microliters of substrate solution consisting of
1×NEBuffer 2 (New England Biolabs, Inc., Beverly, Mass.)
containing 1 microgram of phage lambda DNA supplemented with
20 microMolar AdoMet. The reaction was incubated at 37°C
for 60 minutes. The reaction was terminated by adding 20 microliters
of stop solution (50% glycerol, 50 mM EDTA pH 8.0, and 0.02%
Bromophenol Blue). The reaction mixture was applied to a 1%
agarose gel and electrophoresed. The bands obtained were identified
by comparison with DNA size standards.
[0071] Unit Definition: One unit of CspCI is defined as the amount
of CspCI required to completely cleave one microgram of phage
lambda DNA in a reaction volume of 50 microliter of
1×NEBuffer 2 (New England Biolabs, Inc., Beverly, Mass.)
supplemented with 20 microMolar AdoMet, within one hour at
37°C.
[0072] Properties of CspCI:
[0073] AdoMet: Supplementing the CspCI reaction with 20 microMolar AdoMet
greatly enhanced the activity of the enzyme. In reactions where
AdoMet was omitted, the enzyme exhibited less than 5% of the
cutting activity it exhibited in the AdoMet-supplemented reactions,
indicating that AdoMet is a necessary cofactor for this enzyme.
[0074] Activity in various reaction buffers: CspCI was found to be
most active in NEBuffer 2 + AdoMet and NEBuffer 3 + AdoMet, relative
to other standard NEBuffers (New England Biolabs, Inc, Beverly, Mass.).
[0075] Digestion at 37°C for one hour in the following
NEBuffers yielded the following approximate cleavage activities,
expressed as a percentage of the activity in NEBuffer 2 (New England
Biolabs, Inc, Beverly, Mass.) + 20 microMolar AdoMet:
[0076] NEBuffer 1 + 20 microMolar AdoMet: 10%
[0077] NEBuffer 2 + 20 microMolar AdoMet: 100%
[0078] NEBuffer 3 + 20 microMolar AdoMet: 100%
[0079] NEBuffer 4 + 20 microMolar AdoMet: 75%
[0080] NEBuffer 2 (no AdoMet): <5%
[0081] Activity in a 16-hour reaction: 0.5 units of CspCI are
required to cut one microgram of phage lambda DNA in a 16-hour
digest, compared with the one unit required to cut one microgram
of phage lambda DNA in a one-hour digest.
[0082] Temperature: The CspCI unit titer was determined at
37°C by a one-hour incubation in 1× NEBuffer 2 plus
20 microMolar AdoMet. Incubation of CspCI at 70°C for 20
minutes prior to performing a reaction at 37°C does not
inactivate the enzyme: after this heat treatment, CspCI retains
nearly full activity.
[0083] Bilateral cleavage: CspCI cleaves DNA on both sides of its
recognition sequence. As a result, in addition to producing regular
restriction fragments, CspCI cleavage generates small, internal
`mini-fragments` of 35 ± 1 bp, one from each recognition site.
These mini-fragments, which can be visualized by gel
electrophoresis (FIG. 2), comprise the recognition sequence and the
flanking DNA on each side up to the cut sites. The two cleavage
events that produce the mini-fragments appear to proceed
separately: cleavage occurs first on one side of the recognition
sequence and later on the other side, rather than on both
sides simultaneously. As a result, when partially digested samples
of DNA are examined by gel electrophoresis, the DNA fragments
appear as doublets or triplets depending on whether the
mini-fragments have yet been trimmed from their termini (FIG.
3).
Example III
Determination of the CspCI Cleavage Site
[0084] The location of CspCI-induced cleavage relative to the
recognition sequence was determined by two methods, primed
synthesis and run-off automated sequencing.
[0085] A: Primed Synthesis Method
[0086] The locations of CspCI cleavage relative to the recognition
sequence were determined by cleaving a primer extension product,
which was then electrophoresed alongside a set of standard dideoxy
sequencing reactions produced from the same primer and template.
M13mp18 DNA was employed as the template, with a primer near the
CspCI recognition sequence at position 3009. Readable sequence for this
primer-template combination begins at position 3069 and continues
through the CspCI site.
[0087] Sequencing Reactions
[0088] The sequencing reactions were performed using the Sequenase
version 2.0 DNA sequencing kit (GE Healthcare, formerly Amersham
Life Science) with modifications for the cleavage site
determination. The template and primer were assembled in a 0.5 ml
Eppendorf tube by combining 2.5 microliter dH2O, 3 microliter
5× sequencing buffer (200 mM Tris pH 7.5, 250 mM NaCl, 100 mM
MgCl₂), 8 microliter M13mp18 single-stranded DNA (1.6
microgram) and 1.5 microliter of primer at 3.2 mM concentration.
The primer-template solution was incubated at 65°C for 2
minutes, then cooled to 37°C over 20 minutes in a beaker
of 65°C water on the bench top to anneal the primer. The
labeling mix was diluted 1:20, and T7 Sequenase polymerase was
diluted according to the manufacturer's instructions. The annealed
primer-template tube was placed on ice, and to it were added
1.5 microliter 100 mM DTT, 3 microliter diluted dGTP labeling mix,
1 microliter [α-³³P] dATP (2000 Ci/mmol, 10 mCi/ml) and 3
microliter diluted T7 Sequenase polymerase (GE Healthcare, formerly
Amersham, Piscataway, N.J.). The reaction was mixed and incubated
at room temperature for 4 minutes.
[0089] 3.5 microliter of this reaction was then transferred into
each of four tubes containing 2.5 microliter termination mix for
the A, C, G and T sequencing termination reactions. To the
remaining reaction was added 10 microliter of Sequence Extending
Mix (GE Healthcare, formerly Amersham Biosciences, Piscataway,
N.J.), a mixture of dNTPs (no ddNTPs) that allows extension
of the primer through and well beyond the CspCI site without
termination, creating a labeled strand of DNA extending through
the CspCI recognition site for subsequent cleavage. The reactions
were incubated for 5 minutes at 37°C. To the A, C, G and T
reactions were added 4 microliter of stop solution, and the samples
were stored on ice. The extension reaction was then incubated at
70°C for 20 minutes to inactivate the DNA polymerase
(Sequenase) (GE Healthcare, formerly Amersham, Piscataway, N.J.),
then cooled on ice.
[0090] 10 microliter of the extension reaction was then placed in
one 0.5 ml Eppendorf tube and 7 microliter was placed in a second
tube. To the first tube was added 1 microliter (approximately 0.5
unit) of CspCI endonuclease. The reaction was mixed, and then 2
microliter was transferred to the second tube. These enzyme digest
reactions were mixed and incubated at 37°C for 1
hour, after which the reactions were divided in half. To one
half, 4 microliter of stop solution was added and mixed (the
`minus` polymerase reaction). To the second half, 0.4 microliter
Klenow DNA polymerase (NEB#210) (New England Biolabs, Inc.,
Beverly, Mass.) containing 80 mM dNTPs was added (the `plus`
reaction), and the reaction was incubated at room temperature for
15 minutes, after which 4 microliter of stop solution was
added.
[0091] The sequencing reaction products were electrophoresed on a
6% Bis-Acrylamide sequencing gel (Stratagene Corporation, La Jolla,
Calif.), with the CspCI digestions of the extension reaction next
to the set of sequencing reactions produced from the same primer
and template combination.
[0092] Results
[0093] Digestion of the extension reaction product (the `minus`
reaction) produced a band which co-migrated with the C residue 12
bases 5' to the CspCI recognition sequence,
5'-CAGAGAGATAACCCACAAGAATTG-3' (SEQ ID NO:10), indicating cleavage
between the 11th and 12th bases 5' of the recognition
sequence on this strand. A second band was produced which
co-migrated with the A residue 12 bases 3' to the CspCI recognition
site on this strand, CCACAAGAATTGAGTTAAGCCCAA (SEQ ID NO:11),
indicating cleavage between the 12th and 13th bases 3' to
the recognition site. There was also a faint band one base farther
from the recognition site, indicating that a small portion of the
molecules were cut between the 13th and 14th bases 3' to
the recognition sequence. Treatment of the cleaved extension
reaction product with Klenow DNA polymerase (the `plus` reaction)
produced a band two bases shorter than the first band described
above, which co-migrated with the A residue 14 bases 5' to the
recognition sequence, 5'-ATCGAGAGATAACCCACAAGAATTG-3' (SEQ ID
NO:12), indicating cleavage between the 13th and 14th
bases 3' to the recognition sequence on the opposite strand of the
DNA (5'-CAANNNNNGTGG(N₁₃), SEQ ID NO:13). Several additional
bands were observed in the `plus` lane as well, corresponding to
the original band, 12 bases 3' to the site, and bands one and two
bases shorter, produced from cuts on the opposite strand of DNA
closer to the recognition sequence (FIG. 4).
[0094] These results, when combined with those obtained by the
second method described below, indicate that CspCI cleaves DNA on
both sides of its recognition sequence, and can do so at either
N11/N13 or N10/N12 5' to the sequence 5'-CAANNNNNGTGG-3' (SEQ ID
NO:14) and at N13/N11 or N12/N10 3' to the sequence, to produce DNA
fragments with 2-base 3'-extensions, and an excised fragment of 34,
35 or 36 bases that contains the recognition site.
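The excised fragment lengths and overhangs quoted above follow from simple arithmetic on the cut distances. The Python sketch below assumes that, in the notation a/b CAA N₅ GTGG c/d, a and c are the top-strand cut distances and b and d the bottom-strand cut distances on the left and right sides respectively (the convention that fits the run-off sequences shown later in this Example); the name mini_fragment is hypothetical.

```python
from itertools import product

SITE_LEN = 12  # CAA + N5 + GTGG

def mini_fragment(a: int, b: int, c: int, d: int) -> dict:
    """Strand lengths and 3' overhangs of the excised CspCI mini-fragment.

    Assumed convention for "a/b CAA N5 GTGG c/d":
      a, b = nt from the site to the top- and bottom-strand cuts, left side
      c, d = nt from the site to the top- and bottom-strand cuts, right side
    """
    return {
        "top": a + SITE_LEN + c,      # top strand of the excised piece
        "bottom": b + SITE_LEN + d,   # bottom strand of the excised piece
        "left_3p_overhang": b - a,    # bottom-strand 3' end protrudes on the left
        "right_3p_overhang": c - d,   # top-strand 3' end protrudes on the right
    }

LEFT = [(11, 13), (10, 12)]   # observed left-side cut pairs
RIGHT = [(12, 10), (13, 11)]  # observed right-side cut pairs

for (a, b), (c, d) in product(LEFT, RIGHT):
    print(f"{a}/{b} CAA N5 GTGG {c}/{d}", mini_fragment(a, b, c, d))
```

Every combination gives 2-base 3' overhangs on both sides, and the excised piece comes out at 34, 35 or 36 bp, in line with the 35 ± 1 bp mini-fragments described in [0083].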
[0095] B: Run-Off Sequencing Method
[0096] The second approach employed automated sequencing of
partially CspCI-cleaved template DNA with forward and reverse
primers to produce sequencing traces that extended through the
sites of cleavage. Two plasmids served as templates, pUC1CspC-1 and
pUC1CspC-4, constructed by inserting an oligonucleotide containing
the CspCI recognition sequence into the AatII site at nt 2617 of
pUC19 in both orientations (described in Example IV, section 2,
below).
[0097] CspCI-Cleavage of pUC1CspC-1 and pUC1CspC-4
[0098] Sequencing reactions were carried out on partial digests of
pUC1CspC-1 and pUC1CspC-4, in order to determine the sites of
cleavage on both sides of the recognition site.
[0099] The digests were performed as follows:
[0100] a. Combine:
[0101] 25 microgram pUC1CspC-1 or pUC1CspC-4
[0102] 100 microliter NEBuffer 2
[0103] 1 microliter 32 mM AdoMet
[0104] dH2O to 1000 microliter
[0105] b. Distribute the mixture: 200 microliter in one reaction
tube, 100 microliter in 8 subsequent tubes.
[0106] c. Add 160 units CspCI endonuclease to the first tube, mix,
remove 100 microliter and add it to the second tube, mix, remove
100 microliter and add it to the third tube, etc. until the 9th
tube is reached.
[0107] d. Incubate all 9 reactions at 37°C for 60 minutes,
then place on ice.
[0108] e. Analyze a sample of each reaction on agarose gel; select
completely cleaved and partially cleaved plasmids.
[0109] f. Purify the cleaved plasmids for sequencing using Zymo DNA
Clean and Concentrator-5 spin-columns according to the
manufacturer's recommendations (Zymo Research, Orange, Calif.).
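The volumes in steps a through c fix the reaction conditions in each tube. The Python sketch below works through that arithmetic under stated assumptions: NEBuffer 2 is taken to be a 10× stock, the volume of the added enzyme is neglected, and the ninth tube is assumed to keep its full 200 microliter (the text does not say whether 100 microliter is discarded from it). All names are hypothetical.

```python
TOTAL_UL = 1000.0   # step a: final volume of the master mix
DNA_UG = 25.0       # step a: pUC1CspC-1 or pUC1CspC-4
BUFFER_UL = 100.0   # step a: NEBuffer 2, assumed to be a 10x stock
ADOMET_MM = 32.0    # step a: AdoMet stock concentration (mM)
ADOMET_UL = 1.0     # step a: AdoMet stock volume
UNITS = 160.0       # step c: CspCI added to the first tube

def final_conditions() -> tuple[float, float]:
    """Buffer strength (x) and AdoMet concentration (microMolar) in the mix."""
    buffer_x = 10.0 * BUFFER_UL / TOTAL_UL
    adomet_um = ADOMET_MM * 1000.0 * ADOMET_UL / TOTAL_UL
    return buffer_x, adomet_um

def dilution_series(n_tubes: int = 9) -> list[tuple[int, float, float]]:
    """(tube, units of CspCI, micrograms DNA) in each tube after the transfers."""
    rows = []
    units = UNITS  # units present in the 200 microliter of the current tube
    for tube in range(1, n_tubes + 1):
        if tube < n_tubes:
            # Half stays in this tube; the other, equal half is what the next tube receives.
            units /= 2
            rows.append((tube, units, DNA_UG * 100.0 / TOTAL_UL))
        else:
            # Assumed: the last tube is not diluted further and keeps 200 microliter.
            rows.append((tube, units, DNA_UG * 200.0 / TOTAL_UL))
    return rows

print(final_conditions())
for tube, units, dna in dilution_series():
    print(f"tube {tube}: {units:.3f} U on {dna:.1f} ug ({units / dna:.3f} U/ug)")
```

Under these assumptions the mix is 1× NEBuffer 2 with 32 microMolar AdoMet, and the enzyme-to-DNA ratio falls twofold per tube, from 32 U/microgram in the first tube to 0.125 U/microgram in the last.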
[0110] Sequencing Reactions
[0111] The reactions were performed with an ABI 377 DNA sequencer
using CspCI-cleaved pUC1CspC-1 and -4 plasmid templates and a pair
of primers that initiate synthesis approximately 250 nt away from
the CspCI site on one side (forward primer) and 160 nt away from
the CspCI site on the other side (reverse primer). The sequences of
these two primers are:
5'-CAGTTCGATGTAACCCACTCG-3' (SEQ ID NO:15)
[0112] forward primer; corresponds to pUC19 nt 2346-2366;
[0113] interrogates the minus-strand of the vector.
5'-CCCGCTGACGCGCCCTGACGGGC-3' (SEQ ID NO:16)
[0114] reverse primer; corresponds to the complement of pUC19 nt 96-118;
[0115] interrogates the plus-strand of the vector.
[0116] When sequencing reactions encounter the 5' end of a template
strand, they frequently add a final, non-templated A to the
synthesized strand. If the template DNA comprises a mixture of
intact and truncated strands, as occurs in incompletely
cleaved DNA samples, the position of cleavage reveals itself in the
sequencing trace by an anomalous A peak superimposed on the normal
peak, and by an overall reduction in the heights of the following
peaks. If the base normally present at the position of the anomaly
is something other than A (G, for example), a mixed signal is
seen, in this example G plus A. However, if the base normally
present at this position is also A, then a single A peak is seen,
perhaps higher than normal, which prevents unambiguous
identification.
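The A-anomaly logic can be illustrated with an idealized model of peak heights. The Python sketch below is a simplification, assuming equal signal per base, a single cut, and a fixed fraction of intact template; cut_index is taken to be the first read position absent from the truncated template, which is where the non-templated A appears. Function names are hypothetical.

```python
def simulate_trace(read: str, cut_index: int, intact: float) -> list[dict[str, float]]:
    """Idealized peak heights for a read through a partially cleaved template.

    read      -- bases the intact template would yield
    cut_index -- 0-based position where truncated templates run off
    intact    -- fraction of template molecules left uncut (0..1)
    """
    peaks = []
    for i, base in enumerate(read):
        if i < cut_index:
            peaks.append({base: 1.0})  # intact and truncated templates both contribute
        elif i == cut_index:
            signal = {base: intact}    # intact templates give the normal base
            # Truncated templates run off here and add one non-templated A.
            signal["A"] = signal.get("A", 0.0) + (1.0 - intact)
            peaks.append(signal)
        else:
            peaks.append({base: intact})  # only intact templates remain: reduced peaks
    return peaks

def is_ambiguous(read: str, cut_index: int) -> bool:
    """A cut is hard to place when the normal base at that position is itself A."""
    return read[cut_index] == "A"

print(simulate_trace("GATTACA", 2, 0.5))
```

In this model a cut before a non-A base yields a mixed peak (for example T plus A), whereas a cut before an A merges both contributions into a single A peak of full height, followed by reduced peaks, which is the ambiguous case described above.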
[0117] Results
[0118] Unambiguous results were obtained for the positions of
cleavage on the 5' sides of the recognition sequence, but the data
were poorer regarding cleavage on the 3' sides. As a whole, however,
they were consistent with the endonuclease cleaving to produce
fragments with 2-base 3'-overhangs. Sequence traces from
representative reactions are shown in FIG. 5.
[0119] The reaction of partially cleaved pUC1CspC-4 with the
forward primer displayed a strong anomalous A superimposed on the G
13 nt before the recognition sequence, and a stronger-than-expected
A peak 11 nt after it:
5' . . . AAGTGccacctgacgtgcaacctaggtggcacgtctaagaaac . . . (SEQ ID NO:17)
[0120] (Notation. Underlined: CspCI recognition site; bold: normal
base over which anomalous A superimposed; UPPER CASE: peaks of
normal height; lower case: peaks of reduced height)
[0121] These results suggest that cleavage of the complementary
strand (indicated |) occurs:
5' . . . GTTT|CTTAGACGTGCCACCTAGGTTGCACGTCAGGTGGC|AGTT . . . (SEQ ID NO:18)
[0122] The reaction of partially cleaved pUC1CspC-4 with the
reverse primer displayed a strong A-anomaly on the T 12 nt before
the recognition sequence, and a suggestion of two anomalous A's
under the two G's 11 and 12 nt after the sequence:
5' . . . TGGTTtcttagacgtgccacctaggttgcacgtcaggtggcact . . . (SEQ ID NO:19)
[0123] Ignoring momentarily the G-11 anomaly, these results
suggest that cleavage of the complementary strand occurs:
5' . . . TGC|CACCTGACGTGCAACCTAGGTGGCACGTCTAAGAA|ACCA . . . (SEQ ID NO:20)
[0124] Combining these results, CspCI-cleavage at the site in
pUC1CspC-4 appears to be:
5' . . . AGTGC|CACCTGACGTGCAACCTAGGTGGCACGTCTAAGAA|ACC . . . (SEQ ID NO:21)
3' . . . TCA|CGGTGGACTGCACGTTGGATCCACCGTGCAGATTC|TTTGG . . . (SEQ ID NO:22)
That is to say: 11/13 CAA N₅ GTGG 12/10 (SEQ ID NO:14)
[0125] The same G-13 and A-11 A-anomalies were seen when
partially cleaved pUC1CspC-1 was interrogated with the forward primer,
and the same T-12 A-anomaly was seen when it was interrogated with
the reverse primer. Consequently, cleavage at the site in
pUC1CspC-1 appears to be:
5' . . . AGTGC|CACCTGACGTGCCACCCGGGTTGCACGTCTAAGAA|ACC . . . (SEQ ID NO:23)
3' . . . TCA|CGGTGGACTGCACGGTGGGCCCAACGTGCAGATTC|TTTGG . . . (SEQ ID NO:24)
That is to say: 10/12 CAA N₅ GTGG 13/11 (SEQ ID NO:14)
[0126] This numerical reversal in cleavage distances indicates that
the positions of DNA cleavage are independent of
recognition-sequence orientation, and dependent on the nature of the
flanking sequence. The sequence to the left (counter-clockwise) of
the recognition site is the same in both plasmids, as also is the
sequence to the right (clockwise). The latter, which is somewhat
A:T-rich, would seem to be more extended, physically, than the
G:C-rich DNA to the left, such that the endonuclease, as it
`measures` out from its binding site, cleaves 12/10 on either side
if the DNA is extended, and 13/11 on either side if the DNA is
compact.
[0127] Returning to the G-11 anomaly momentarily ignored above,
its presence in the pUC1CspC-4/reverse primer reaction suggests
that the otherwise compact leftward DNA can become more extended,
perhaps due to torsional relaxation that accompanies
supercoil-release during digestion, and hence that CspCI can also
cleave:
[0128] 10/12 CAA N₅ GTGG 12/10 (SEQ ID NO:14), and by
extension,
[0129] 11/13 CAA N₅ GTGG 13/11 (SEQ ID NO:14).
Example IV
Cloning of the CspCI Restriction-Modification Genes
[0130] 1. Preparation of Genomic DNA
[0131] Genomic DNA was prepared from 2.5 g of Citrobacter species
2144 by the following steps:
[0132] a. Cell wall digestion by addition of lysozyme (2 mg/ml final), sucrose (1% final), and 50 mM Tris-HCl, pH 8.0.
[0133] b. Cell lysis by addition of 24 ml of Lysis mixture (50 mM Tris-HCl pH 8.0, 62.5 mM EDTA, 1% Triton).
[0134] c. Removal of proteins by phenol-CHCl₃ extraction of the DNA 2 times (equal volume).
[0135] d. Dialysis in 4 liters of TE buffer, with the buffer changed four times.
[0136] e. RNase A treatment to remove RNA.
[0137] f. Genomic DNA precipitation in 0.4 M NaCl and 0.55 volume of 100% isopropanol; the DNA was spooled, dried and resuspended in TE buffer.
[0138] 2. Preparation of Plasmid Vector pUC2CspC
[0139] Plasmid cloning vector pUC2CspC was constructed from E. coli
cloning vector pUC19 by inserting two CspCI recognition sites, one
at the unique AatII site at nt 2617, and another at the DraI site
at nt 1563.
[0140] a. Two pairs of complementary oligonucleotides
were synthesized. Annealing of each pair produces a CspCI
recognition site, and double-stranded ends that can be ligated to
either AatII or DraI DNA fragments such that the ligation product
no longer contains the AatII or DraI site.
[0141] The oligonucleotide sequences, shown below in annealed
double-strand format, were:
[0142] AatII-Site Linker (SEQ ID NO:25):
5'-    GCAACCNGGGTGGCACGT-3'
       ||||||||||||||
3'-TGCACGTTGGNCCCACCG-5'
[0143] DraI-Site Linker (SEQ ID NO:14):
5'-CAANNNNNGTGG-3'
   ||||||||||||
3'-GTTNNNNNCACC-5'
[0144] b. For the AatII site linker, 1 microgram pUC19 was digested in a small volume with AatII.
[0145] c. Annealed oligonucleotide linker was added to the reaction, along with T4 DNA ligase and ligase buffer, and the reaction was incubated at room temperature for two hours.
[0146] d. Reaction products were transformed into E. coli and grown in the presence of ampicillin.
[0147] e. ApR transformants were isolated, their plasmids prepared using a FastPlasmid® Mini Kit (Eppendorf, Hamburg, Germany), and analyzed by digestion with restriction enzymes AatII and CspCI.
[0148] f. Two plasmids were identified, pUC1CspC-1 and pUC1CspC-4, each lacking an AatII site but containing one CspCI recognition site in either of the two possible, opposite orientations. One of these, pUC1CspC-4, was purified on a larger scale, using a Qiagen Plasmid Midi Kit (Qiagen, Valencia, Calif.) according to the manufacturer's recommendations, for linker insertion at the DraI site.
[0149] g. For the DraI site linker, only partial digestion products were desired; therefore digestion, ligation, and DraI site linker components were all added simultaneously.
[0150] h. Samples of the reaction were removed and placed on ice after incubation times of 2, 5, 10, 20, 40, and 100 minutes.
[0151] i. Reaction samples were transformed into E. coli, and plasmids were prepared and analyzed as in d. and e. above, by digestion with restriction enzymes DraI and CspCI.
[0152] j. One plasmid, pUC2CspC, containing two CspCI sites was identified and prepared on a large scale using a Qiagen Plasmid Mega Kit according to the manufacturer's recommendations (Qiagen, Valencia, Calif.).
[0153] Plasmid pUC2CspC was used as the plasmid selection vector
for cloning the genes for the CspCI restriction-modification
system. Plasmids pUC1CspC-1 and -4 were used as substrates for
analysis of the CspCI-cleavage reactions (Example III, section B,
above).
[0154] 3. Genomic DNA Digestion and Library Construction
[0155] Restriction enzymes ApoI, BamHI, BglII, and Sau3AI were used
to individually digest ~10 microgram quantities of
Citrobacter sp. 2144 genomic DNA to achieve complete and partial
digestions. Following heat-inactivation of the restriction enzymes
at 65°C for 15 minutes, the ApoI-digests were ligated to
EcoRI-cleaved, CIP-dephosphorylated pUC2CspC vector, and the
BamHI-, BglII-, and Sau3AI-digests were ligated to BamHI-cleaved,
CIP-dephosphorylated pCspIx2. The ligations, performed overnight
with T4 DNA ligase, were then used to transform the endA⁻ E.
coli host ER2683 (New England Biolabs, Inc., Beverly, Mass.), made
competent by the CaCl₂ method. Several thousand
ampicillin-resistant (ApR) transformants were obtained from
each ligation. These colonies from each ligation were pooled and
amplified in 500 ml LB + Ap overnight, and plasmid DNA was prepared
from them by CsCl gradient purification to make primary plasmid
libraries.
[0156] 4. Cloning the CspCI Genes by Methylase-Selection
[0157] One microgram of each of the primary plasmid libraries was
challenged by digestion with ~8 units of CspCI at 37°C
for 1 hr. The digestions were transformed back into ER2683 and
plated for survivors. Approximately 500 ApR survivors arose
from the BglII-library, and 5, 29, and 20 from the BamHI-, Sau3AI-,
and ApoI-libraries, respectively. Plasmids from BamHI, Sau3AI and
ApoI survivors were prepared individually using the Compass Mini
Plasmid Kit method, and subjected to CspCI-digestion. Three of the 20
clones from the ApoI-library were found to be resistant to CspCI,
but all those from the BamHI- and Sau3AI-libraries were found to be
sensitive. The survivors from the BglII-library were pooled and
used to prepare a secondary plasmid library. This was challenged
again with CspCI and plated, and among the survivors several
additional CspCI-resistant clones were found.
[0158] 5. Identification of the cspCI-R-M
Endonuclease-Methyltransferase Gene, and the cspCI-S Specificity
Gene
[0159] The nt sequence of the inserted DNA in the CspCI-resistant
plasmid clones was determined by dideoxy automated sequencing.
Transposon-insertion into clone ApoI #3, using the GPS-1 System
(New England Biolabs, Inc., Beverly, Mass.), provided the initial
substrates for sequencing, and primer-walking was used
subsequently, on clones ApoI #3 and #12, and BglII #2 and #17, to
finalize the sequence. A total of 4616 bp was determined (FIG. 6),
within which two complete open reading frames (ORFs) of 1899 bp (nt
1604-3502), and 960 bp (nt 3489-4448) were found (FIG. 7). The two
ORFs have the same orientation and overlap by 14 bp (FIG. 8).
Analysis of the ORFs indicated that the larger, termed cspCI-R-M,
encodes a combined restriction-and-modification enzyme, R-M-CspCI,
and the smaller, termed cspCI-S, encodes a DNA-sequence-specificity
protein, S-CspCI (FIG. 9). R-M-CspCI is predicted to be 632 aa in
length and to have a molecular mass of 70,712 Daltons (or 631 aa
and 70,580 Daltons, without the N-terminal fMet). S-CspCI is
predicted to be 319 aa in length and to have a molecular mass of
35,267 Daltons (318 aa and 35,136 Daltons without the fMet). Both
proteins are necessary for CspCI restriction endonuclease
activity.
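The ORF coordinates above can be cross-checked arithmetically. The Python sketch below assumes, as the figures imply, that coordinates are 1-based and inclusive and that each range ends with the stop codon; the helper names are hypothetical.

```python
R_M_ORF = (1604, 3502)  # cspCI-R-M, nt coordinates from the text
S_ORF = (3489, 4448)    # cspCI-S, nt coordinates from the text

def length(orf: tuple[int, int]) -> int:
    """Length in nt of an inclusive, 1-based coordinate range."""
    return orf[1] - orf[0] + 1

def overlap(x: tuple[int, int], y: tuple[int, int]) -> int:
    """Number of nucleotides shared by two inclusive ranges."""
    return max(0, min(x[1], y[1]) - max(x[0], y[0]) + 1)

def residues(orf: tuple[int, int]) -> int:
    """Amino acids encoded, assuming the range ends with a stop codon."""
    nt = length(orf)
    assert nt % 3 == 0, "ORF length should be a whole number of codons"
    return nt // 3 - 1

for name, orf in (("cspCI-R-M", R_M_ORF), ("cspCI-S", S_ORF)):
    aa = residues(orf)
    print(name, length(orf), "bp,", aa, "aa,", aa - 1, "aa without fMet")
print("overlap:", overlap(R_M_ORF, S_ORF), "bp")
```

The computed values (1899 bp and 632 aa, 960 bp and 319 aa, a 14 bp overlap) agree with the figures stated in the paragraph.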
[0160] R-M-CspCI appears to comprise a DNA-cleavage catalytic
moiety joined to a DNA-methylation catalytic moiety. Amino acids
2-300, the N-terminal half of R-M-CspCI, more-or-less, are believed
to form an endonuclease domain, and to be responsible, primarily,
for DNA strand-cleavage activity of CspCI. This section includes
the aa sequence motif . . . PE-X₁₅-ECK . . . (aa 57-76), a
motif found at the catalytic site of numerous DNA-endonucleases,
and likely therefore to be the endonuclease catalytic site of
CspCI. Amino acids 301-632 of R-M-CspCI, the C-terminal half of the
protein, are believed to form a methyltransferase domain, and to be
responsible, primarily, for DNA-modification. This section includes
several aa sequence motifs characteristic of the gamma-class of
DNA-adenine methyltransferases including . . . VLTP . . . (aa
325-328), . . . VLDICAGTGGF . . . (SEQ ID NO:26) (aa 347-357), and
. . . NPPY . . . (aa 435-438). On the basis of this, CspCI is
predicted to accomplish modification by methylating adenine
residues within its recognition sequence. Symmetry considerations
suggest that the bases modified are the second A in the top strand
(left sub-sequence), and the only A in the bottom strand (right
sub-sequence), thus (SEQ ID NO:14):
5' . . . CAA N₅ GTGG . . . 3'        5' . . . CAA N₅ GTGG . . . 3'
3' . . . GTT N₅ CACC . . . 5'   ->   3' . . . GTT N₅ CACC . . . 5'
[0161] R-M-CspCI displays substantial homology to the fused R-M
subunit of the BcgI restriction enzyme, and to several similar
putative R-M-subunits in Genbank.
[0162] S-CspCI also appears to be a fusion protein. In this case,
the two sections are similar in sequence and function, and are
believed to confer upon CspCI the ability to bind to the two
specific components of its recognition sequence. S-CspCI is
analogous to, and indeed weakly homologous to, the specificity
subunits of type I R-M systems. Amino acids 2-168, the N-terminal
half of S-CspCI, more or less, are believed to form one
target-recognition domain (TRD), likely the one responsible for
binding to the left, 5'-CAA-3', component of the recognition
sequence. Amino acids 169-319 are believed to form the other TRD,
which likely binds the other, 5'-CCAC-3', component. These two TRDs
display considerable homology to each other, and consequently
S-CspCI contains several internal repeated sequences. Among these
is the proximal repeat INDLF (aa 4-8) and LQDLF (aa 172-176), and
the distal repeat PDAYQGVRS (aa 144-152) and PDWDFMEKY (aa
300-308). Similar repeats occur within other specificity proteins,
and perhaps mediate the binding between the S-subunit and
R-M-subunit. S-CspCI displays substantial homology to the
specificity subunit of BcgI, and to several similar putative
specificity subunits in Genbank.
[0163] 6. Characterization of the Cloned CspCI Endonuclease
[0164] CspCI restriction endonuclease purified according to example
1, above, was subjected to SDS-polyacrylamide gel electrophoresis
and found to comprise two proteins of approximately 70 kDa and 35
kDa. High-pressure liquid chromatography of the same sample
demonstrated that the 70 kDa and 35 kDa proteins occurred in the
mass ratio of 1:0.47, implying a molar ratio of approximately 1:0.94
(equivalently, 1.06:1). We take
this to indicate that CspCI purifies as, and likely is active as, a
heterodimer comprising one large subunit (R-M-CspCI) and one small
subunit (S-CspCI).
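Converting an HPLC mass ratio into a molar ratio divides each mass by the corresponding molecular mass. The Python sketch below uses the predicted masses of the mature subunits given in section 5 (70,580 and 35,136 Daltons); treating these predicted values as the true masses is an assumption, and the function name is hypothetical.

```python
LARGE_DA = 70_580  # predicted mass of R-M-CspCI without fMet
SMALL_DA = 35_136  # predicted mass of S-CspCI without fMet

def small_per_large(mass_large: float, mass_small: float) -> float:
    """Moles of small subunit per mole of large subunit."""
    return (mass_small / SMALL_DA) / (mass_large / LARGE_DA)

ratio = small_per_large(1.0, 0.47)  # HPLC mass ratio large:small = 1:0.47
print(f"small:large = {ratio:.2f}:1, large:small = {1 / ratio:.2f}:1")
```

Expressed either way, the ratio is within about 6% of 1:1, which is the basis for the heterodimer interpretation.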
[0165] N-terminal sequence analysis of the isolated large subunit
indicated that it began with the probable amino acid sequence,
ANERKTEELV (SEQ ID NO:27). The initial codons of the CspCI-R-M ORF
specify almost the same sequence: MANERKTESLV (SEQ ID NO:28). This
result confirms that the large subunit is encoded by the CspCI-R-M
ORF; that its translation begins at the predicted ATG at nt 1604;
and that the initiating fMet is likely absent in the mature
protein. N-terminal analysis of the isolated small subunit
indicated that it began with the probable amino acid sequence,
PKINDLFHLE (SEQ ID NO:29). The initial codons of the cspCI-S ORF
specify almost the same sequence: MPKINDLFHLE (SEQ ID NO:30). This
result confirms that the small subunit is encoded by the CspCI-S
ORF; that its translation begins at the predicted ATG at nt 3489;
and that its initiating fMet is also likely absent from the mature
protein.
[0166] 7. Establishing the Cleavage Site of CspCI
[0167] The endonuclease CspCI was found to cleave PhiX174 DNA
twice, producing fragments of approximately 3300 bp and 2050 bp.
The locations of the cut sites were mapped to approximate positions
of nt 1575 and nt 4875 by simultaneously digesting PhiX174 DNA
with CspCI and with additional restriction endonucleases which
cleave at known positions, such as PstI, SspI, NciI, and StuI (FIG.
1). CspCI did not cut pBR322 DNA or pUC19 DNA. The approximate sizes
of the DNA fragments produced by CspCI digestion of phage lambda
DNA (18 kb, 11 kb, 8.3 kb, 5.1 kb, 4.3 kb and 1.8 kb) were entered
into the program REBPredictor, which can be accessed at
http://taq.neb.com/~vincze/REBpredictor/index.php
[0168] REBPredictor uses the algorithm of Gingeras, et al. Nucl.
Acids Res. 5:4105 (1978), to predict potential recognition
sequences by comparing observed fragment sizes with those produced
by cleaving the DNA in silico at any given recognition pattern. One
predicted potential pattern computed was 5'-CCACNNNNNTTG-3' [SEQ ID
NO:31] (or 5'-CAANNNNNGTGG-3' [SEQ ID NO:14] on the complementary
strand), which occurs in PhiX174 DNA at positions consistent with
the mapping data obtained, i.e. at positions 1563 and 4866. This
sequence does not occur in pBR322 or pUC19 DNA. The size of
fragments predicted from cleavage at 5'-CAANNNNNGTGG-3' (SEQ ID
NO:14) sites in PhiX174, T7 and phage lambda DNAs matched the
observed size of fragments from the actual cleavage of these DNAs
with CspCI. From these results we conclude that CspCI recognizes
the sequence 5'-CAANNNNNGTGG-3' (SEQ ID NO:14).
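The in silico comparison described above can be sketched in two steps: locate the interrupted site on both strands, then convert site positions into fragment sizes. The Python below is an illustrative sketch, not the REBPredictor implementation. It uses site start positions as stand-ins for cut positions (the actual cuts lie 10 to 13 nt from the site), and the 5,386 bp PhiX174 genome length in the example is supplied here rather than taken from the text. All names are hypothetical.

```python
import re

# Zero-width lookaheads so that overlapping occurrences are all reported.
TOP = re.compile(r"(?=CAA[ACGT]{5}GTGG)")     # site as written, on the given strand
BOTTOM = re.compile(r"(?=CCAC[ACGT]{5}TTG)")  # same site on the complementary strand

def find_sites(seq: str) -> list[tuple[int, str]]:
    """1-based start and strand ('+' or '-') of each CspCI site in seq."""
    seq = seq.upper()
    hits = [(m.start() + 1, "+") for m in TOP.finditer(seq)]
    hits += [(m.start() + 1, "-") for m in BOTTOM.finditer(seq)]
    return sorted(hits)

def circular_fragments(positions: list[int], genome_length: int) -> list[int]:
    """Fragment sizes, largest first, for a circular DNA cut once at each position."""
    pos = sorted(positions)
    sizes = [b - a for a, b in zip(pos, pos[1:])]
    sizes.append(genome_length - pos[-1] + pos[0])  # fragment spanning the origin
    return sorted(sizes, reverse=True)

# Insert region from pUC1CspC-4 (SEQ ID NO:18), where the site reads CCAC N5 TTG.
print(find_sites("GTTTCTTAGACGTGCCACCTAGGTTGCACGTCAGGTGGCAGTT"))
# PhiX174 site positions quoted above; the genome length is an external assumption.
print(circular_fragments([1563, 4866], 5386))
```

The two predicted PhiX174 fragments, 3303 and 2083 bp, are close to the approximately 3300 bp and 2050 bp fragments observed in [0167].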
[0169] The positions of cleavage at the CspCI recognition sequence
were determined by dideoxy sequencing analysis of the terminal base
sequence obtained from CspCI-cleavage of a suitable DNA substrate,
and by comparing the lengths of the CspCI-cleavage products of a
labeled DNA to a sequence ladder made from the same primer-template
pair (Sanger, et al., PNAS 74:5463-5467 (1977); Brown, et al., J.
Mol. Biol. 140:143-148 (1980)). By the above referenced methods, it
was found that CspCI, like several other endonucleases including
BcgI, BsaXI, CjeI and HaeIV, cleaves on both sides of its
recognition sequence. Our observations suggest that the position of
cleavage can vary by one base-pair on either side, being either
5'-N11/N13-CAANNNNNGTGG-N13/N11-3' (SEQ ID NO:32), or
5'-N10/N12-CAANNNNNGTGG-N12/N10-3' (SEQ ID NO:33) or
5'-N10/N12-CAANNNNNGTGG-N13/N11-3' (SEQ ID NO:34) or
5'-N11/N13-CAANNNNNGTGG-N12/N10-3' (SEQ ID NO:35). While not
wishing to be limited by theory, we believe the enzyme cuts at a
certain distance from the recognition sequence, and that it is the
degree of compactness of the DNA within this span that determines
whether this results in cutting at 11/13 or 10/12 base pairs.
Example V
Expression of CspCI Endonuclease in E. coli
[0170] The plasmid [pUC19-CspCI-R-M-S ApoI #3] was transferred into
ER2683 and plated on ampicillin plates at 37°C overnight.
Several individual colonies were inoculated into 50 ml LB + ampicillin
and grown at 37°C overnight. All clones expressed CspCI
endonuclease activity at >10⁵ units per gram of wet E. coli
cells. While pUC19-CspCI-R-M-S ApoI contains all three domains
(cleavage, methylase and specificity moieties) of the endonuclease
on a single plasmid for transforming a host cell, it is within the
ability of one of ordinary skill in the art to place the cleavage
moiety, methylase moiety and specificity moiety on separate
plasmids, or on a plurality of plasmids in which two of the three
domains are present on a single plasmid and the third domain is on
a second plasmid.
[0171] The strain NEB#1554, ER2683 [pUC19-CspCI-R-M-S ApoI #3] has
been deposited under the terms and conditions of the Budapest
Treaty with the American Type Culture Collection on Mar. 24, 2004
and received ATCC Accession No. PTA-5887.
Example VI
Engineering Variants of CspCI
[0172] CspCI offers a variety of engineering opportunities stemming
from its modular organization.
[0173] The specificity subunit of CspCI has a duplicated
organization that includes a pair of autonomous sequence-selection
domains. The domains occur as direct repeats within the linear
amino acid sequence, but they adopt reverse orientations in the
folded protein to match the anti-parallel organization of
double-strand DNA. One domain of S-CspCI is selective for 5'-CAA in
dsDNA, and the other for 5'-CCAC; the two domains are separated by
about 15 angstroms in the subunit so that as a whole it recognizes
5'-CAANNNNNGTGG (SEQ ID NO:14) in dsDNA. While not wishing to be
limited by theory, it is proposed that actual binding to this
sequence involves cooperation between the S-CspCI and the
methyltransferase domain of R-M-CspCI, the one sequence-specific,
the other non-specific. Alterations introduced into S-CspCI can
change the sequence it recognizes in the same ways they have been
shown to do in type I R-M systems:
[0174] The separation between the sequence-selection domains, and
hence the length of the non-specific interval in the
recognition sequence, can be altered by introducing changes in the
`spacer' region. Examples of such changes include insertions such
as small duplications (e.g. to CAA N₆ GTGG [SEQ ID NO:36]) to
increase the length, or deletions to reduce it (e.g. to CAA N₄
GTGG [SEQ ID NO:37]).
[0175] Various approaches exemplified below are used to alter the
specificity of CspCI.
[0176] (a) The recognition sequence of the endonuclease can be
altered by tandemly duplicating one of the two specificity domains.
In this way, the specificity domain is transformed from recognizing
an asymmetric recognition site to recognizing a symmetrical
recognition site (e.g. CAA N₅ TTG [SEQ ID NO:38] or CCAC
N₅ GTGG [SEQ ID NO:39]). This is accomplished without
physically joining the domains in a single polypeptide chain where
dimerization of the tandem repeat can occur spontaneously.
[0177] (b) Amino acid changes can be introduced within either
domain to alter the sequence selected by that domain, resulting in
altered specificity and causing nucleotide discrimination to be
diminished (e.g. CAA N.sub.5 GTGR [SEQ ID NO:40]) or lost (e.g.
CAA N.sub.5 GTG [SEQ ID NO:41]). Amino acid changes in the
S-subunit within the regions flanking the sequence-selection
domains are expected to abolish cleavage on both sides of its
recognition sequence. The ability of the R-M-subunit to bind to the
S-subunit in either orientation can be modified to limit its
binding to a single orientation. Accordingly, CspCI, or a variant,
may be transformed into an endonuclease that cleaves unilaterally,
on only one side of its recognition sequence.
[0178] (c) Swaps between the sequence-selection domains of S-CspCI
and those of other type IIG enzymes are expected to generate
chimeric S-subunits with hybrid specificities. A protein comprising
the N-terminus of S-CspCI (recognition sequence CAA N.sub.5 GTGG)
(SEQ ID NO:14) and the C-terminus of, for example, S-BcgI
(recognition sequence CGA N.sub.5 TGC) (SEQ ID NO:42), when
combined with R-M-CspCI, may result in an endonuclease that
recognizes CAA N.sub.5 TGC (SEQ ID NO:43). Furthermore, N- and
C-terminal domains are expected to be interchangeable, allowing
combinations of two C-terminal domains or two N-terminal domains.
In this way, the C-terminal domains of S-CspCI and S-BcgI together
will recognize GCA N.sub.5 GTGG (SEQ ID NO:44). In some Type IIG
enzymes, such as HaeIV, AloI, and CjeI, the specificity domain(s)
are fused at the C-terminus of the combined R-M-S protein. These
domains can also be swapped into S-CspCI.
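The site changes proposed in [0176] and [0178] follow from the model stated in [0173]: each sequence-selection domain reads a short half-site 5'->3' on its own strand, and the two domains face opposite strands. The Python sketch below encodes that model so the predicted sites can be checked. It is an illustration under that assumption only. The domain labels, the half-sites assigned to BcgI (taken from the CGA N5 TGC site quoted above, assuming its N-terminal domain reads the left half as in CspCI) and the fixed 5-nt spacer are assumptions, and nothing here predicts whether a given chimera would actually fold, bind or cleave.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

# Half-site each domain is assumed to read 5'->3' on its own strand.
# "CCAC" (CspCI, C-terminal) appears as GTGG on the top strand and
# "GCA" (BcgI, C-terminal) appears as TGC.
DOMAINS = {
    "CspCI-N": "CAA",
    "CspCI-C": "CCAC",
    "BcgI-N": "CGA",
    "BcgI-C": "GCA",
}


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def predicted_site(first, second, spacer=5):
    """Top-strand site for domain `first` in the N-terminal position and
    domain `second` in the C-terminal position of the S-subunit."""
    return DOMAINS[first] + "N" * spacer + revcomp(DOMAINS[second])


def is_symmetric(site):
    """Symmetric (palindromic) sites equal their own reverse complement."""
    return site == revcomp(site)


for first, second in product(DOMAINS, repeat=2):
    site = predicted_site(first, second)
    print(f"{first:8} + {second:8} -> {site:14} symmetric={is_symmetric(site)}")
```

Under this model, duplicating either CspCI domain yields the symmetric sites given in [0176], while the CspCI-N/BcgI-C and BcgI-C/CspCI-C pairings reproduce the CAA N5 TGC and GCA N5 GTGG sites predicted in [0178].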
[0179] Sequence-specificity modules are abundant in nature,
occurring both as individual proteins and as domains within
composite proteins. Coupling these specificity modules to an
endonuclease catalytic site will create endonucleases with new
specificities.
[0180] Examples of specificity domains from Type IIG restriction
enzymes that may be used to replace the N- and the C-terminal
domains of S-CspCI are as follows:
Enzyme | Source | Recognition sequence
BcgI | New England Biolabs, Inc., Beverly, MA | CGANNNNNNTGC (SEQ ID NO:45)
BaeI | New England Biolabs, Inc., Beverly, MA | ACNNNNGTAYC (SEQ ID NO:46)
BplI | Fermentas GmbH, Vilnius, Lithuania | GAGNNNNNCTC (SEQ ID NO:47)
CjeI | Campylobacter jejuni (Vitor, J.M.B., Morgan, R.D. Gene 157: 109-110 (1995)) | CCANNNNNNGT (SEQ ID NO:48)
AloI | Fermentas GmbH, Vilnius, Lithuania | GAACNNNNNNTCC (SEQ ID NO:49)
HaeIV | Piekarowicz, A., et al. J. Mol. Biol. 293: 1055-1065 (1999) | GAYNNNNNRTC (SEQ ID NO:50)
BsaXI | New England Biolabs, Inc., Beverly, MA | ACNNNNNCTCC (SEQ ID NO:51)
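Before a domain from the table is swapped into S-CspCI, its recognition sequence has to be divided into the two half-sites that the individual domains are expected to read. A minimal Python sketch of that bookkeeping follows. It assumes each site is a simple bipartite sequence (specified bases, a run of N, specified bases), which holds for every entry in the table; the helper name is hypothetical. Which half corresponds to the N-terminal domain of each enzyme is not stated here and would need to be established experimentally.

```python
import re

# Recognition sequences from the table of Type IIG enzymes above
SITES = {
    "BcgI": "CGANNNNNNTGC",
    "BaeI": "ACNNNNGTAYC",
    "BplI": "GAGNNNNNCTC",
    "CjeI": "CCANNNNNNGT",
    "AloI": "GAACNNNNNNTCC",
    "HaeIV": "GAYNNNNNRTC",
    "BsaXI": "ACNNNNNCTCC",
}
COMPLEMENT = str.maketrans("ACGTRYN", "TGCAYRN")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def split_site(site):
    """Split a bipartite site into (left half, spacer length, right half)."""
    m = re.fullmatch(r"([^N]+)(N+)([^N]+)", site)
    if m is None:
        raise ValueError(f"not a simple bipartite site: {site}")
    left, spacer, right = m.groups()
    return left, len(spacer), right


for name, site in SITES.items():
    left, spacer, right = split_site(site)
    # The right half is also shown as read 5'->3' on the bottom strand
    print(f"{name:6} {left:5} N{spacer} {right:6} (bottom strand: {revcomp(right)})")
```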
[0181] In addition to the above, Type I specificity proteins are a
rich potential source of specificity-domains for domain-swaps with
S-CspCI. The sequence-selection domains of S-CspCI bear some
homology to those of the specificity subunits of Type I R-M
systems. Hundreds of generally uncharacterized type I S-subunits
can be found in GenBank. These proteins interact naturally with
Type I modification subunits, which belong to the same gamma-class
of DNA-adenine methyltransferases as R-M-CspCI, and can therefore be
used as specificity domains for domain swaps.
[0182] The C-terminal section of stand-alone gamma-class
DNA-adenine methyltransferases is thought to act as a
sequence-selection domain, conveying to the otherwise
indiscriminate catalytic site a particular nt sequence to be
methylated. These methyltransferases, some solitary, others from
Type II and Type IIS R-M systems, abound in nature. Over one
hundred have been characterized and many more uncharacterized
examples can be found in GenBank. In general, these enzymes
recognize continuous nt sequences. Most recognize symmetric
sequences 4 to 6 nt in length; others recognize asymmetric
sequences of up to 7 nt. These stand-alone methyltransferases also
represent a rich potential source of specificity-domains for
domain-swaps with S-CspCI. CspCI endonuclease variants with
recognition sequences of considerable length could be assembled
from these enzymes.
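The benefit of assembling longer recognition sequences can be made concrete with a back-of-the-envelope estimate of how often a site is expected by chance. The Python sketch below assumes random DNA with equal, independent base frequencies (real genomes deviate from this, so the figures are only indicative) and counts one strand only. The function name and the 12-base assembled site are hypothetical examples, not sequences from this application.

```python
from fractions import Fraction

# How many of the four bases each IUPAC code accepts
DEGENERACY = {"A": 1, "C": 1, "G": 1, "T": 1, "R": 2, "Y": 2, "N": 4}


def site_probability(site):
    """Probability that a random position on one strand matches `site`,
    assuming each base occurs with frequency 1/4, independently."""
    p = Fraction(1)
    for code in site:
        p *= Fraction(DEGENERACY[code], 4)
    return p


examples = {
    "CspCI site": "CAANNNNNGTGG",       # 7 specified bases
    "relaxed variant": "CAANNNNNGTGR",  # GTGR variant from [0177]
    # Hypothetical site assembled from two 6-nt half-sites
    "assembled (hypothetical)": "ACGTCANNNNNTGACGT",
}

for label, site in examples.items():
    print(f"{label}: about 1 in {1 / site_probability(site)} bp")
# CspCI site: about 1 in 16384 bp
# relaxed variant: about 1 in 8192 bp
# assembled (hypothetical): about 1 in 16777216 bp
```

Each fully specified base divides the expected frequency by four, so a site built from two 6-nt half-sites is about a thousand times rarer than the native CspCI site under these assumptions.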
[0183] Type I S-proteins interact naturally with Type I
modification (M) subunits, forming trimers of composition 2M:1S.
These trimers bind specifically to the sequences selected by the
S-subunits and subsequently catalyze their methylation. Type I
M-subunits are homologous to the C-terminal methyltransferase
domain of R-M-CspCI, but they lack the N-terminal portion of this
protein that forms the endonuclease domain. CspCI can be used to
confer endonuclease activity on Type I modification enzymes by
transferring the endonuclease domain of R-M-CspCI to a Type I
M-subunit, a `domain graft`. This will cause the Type I
methyltransferase to cleave DNA as well as to modify it.
[0184] This experimental approach of grafting the endonuclease
domain of R-M-CspCI onto the N-terminus of a Type I
methyltransferase can be applied to other stand-alone
methyltransferases so that they cleave at sequences that originally
were only modified. For example, the N-terminal cleavage domain of
R-M-CspCI, which is a gamma-class DNA-adenine methyltransferase, can
be transferred to other gamma-class DNA-adenine methyltransferases.
Sequence CWU 1
1
49 1 61 DNA unknown synthetic misc_feature (12)..(12) n=a,c, g or t
1 ccccgaaaag tnccacctga cgtgcaacct aggtggcacg tctaagaaac cattattatc
60 a 61 2 61 DNA unknown synthetic misc_feature (14)..(14) n=a,c, g
or t 2 tgataataat ggtntcttag acgtgccacc taggttgcac gtcaggtggc
acttttcggg 60 g 61 3 61 DNA unknown synthetic misc_feature
(12)..(12) n=a,c,t or g 3 ccccgaaaag tnccacctga cgtgccaccc
gggttgcacg tctaagaaac cattattatc 60 a 61 4 61 DNA unknown synthetic
misc_feature (14)..(14) n=a,c,g or t 4 tgataataat ggtntcttag
acgtgcaacc cgggtggcac gtcaggtggc acttttcggg 60 g 61 5 4616 DNA
unknown Citrobacter species 2144 5 agatctgcca atactgtttc gacagcgcca
cttaattcct tcaatttcgc gcaggttaga 60 tggcacttgt tcggaagagg
cgtctgtaac tcggtctcaa gctgcgggat tgccccggta 120 ttttgctcat
ccccttttaa ttcaacgatc atctgctgaa aagtcagacc ctcacgatag 180
tcagaggcca gcttctgttc ctcagatgcc agccctttca gatcggtctt acccgtctct
240 agttgttggg ttgccccatc aagagtgcgg cgtttttctg ccaattgagt
agcttttaca 300 cccgtcagat cgatatatct ggcatcaatc gccgaagtaa
aatttcggac aaaatcatta 360 aatgcatcca gcccaaataa cgttgagata
agttcagtct gtcttgccgg ggccagcgcc 420 gctattcttg agaagttgtc
aattcggttt ttttcaacaa agcaaaagcg gtgctgtgct 480 tcgttatgct
caattgctaa atcctgtcct tgctctccta cgccagtaat tacaggtgca 540
gaaaactgat cgacatgtgc atttctaaaa tagtcggttt gattacgaaa acgcttacta
600 tcagcctcag ctacgctacc cagtaatgta tattcaagcg cttcgcagaa
actggacttc 660 ccggtaccat tggggccata aatcagcacc agacgcgaat
ccaggtcaaa ttcctcctgt 720 ctggcaaatc ctctgaacgg tccaacggac
aacctcctga gtcgattgaa agtggagacg 780 cgttcgttgc tttgttcagg
cagtggctgg acttccaggc tgagggtatc ccaggcaggt 840 tgcgccagat
cgacgatacg ccttatccgc tgtccctgtg aggtacctaa cgggataata 900
ttatccagat tatcccatac aagattcgcc atttttctga catcaccggg tatatctgct
960 gtgtctaaag tttggaaaaa gcgtaaaaac tcgttactga gcattatgaa
tcctttttta 1020 cttgtcgttt tctcacgtta taagacaatg ataaaagata
cactcttagc taacgtattc 1080 acgtgatctg tagatcaatt atcttcagtt
ccgctctcaa gctgaactga accgggatga 1140 agacggtatg gcgcttgcca
cactagtaca ggtgtattac taaaaaaccg aaaggtattc 1200 gataaagccg
attacaacgc gttggtggac aacaccgaag ccacgctcgg cgatgaactg 1260
gtggcaaaga aagaaataca ggtccgccgg gagtaaacgt ccaccttcat caagccgatg
1320 gatgagcagg cgtaatatgt cgcagtgctt gcgaagcgcc gtactccgga
tgtgcgcaag 1380 aacgactgac gtctggtact gagccgtgac gatctggcct
ctgatgggcc cgcattaatg 1440 agatggtaaa tcctcactaa tattgaaggc
aaaaaataaa ggtctccaaa atcgactctt 1500 gtaaagaggc ttgcgaggcc
ctcctgcact ctagccatag ttcggaattg gtcgttaaaa 1560 tgtcgtacac
taccatcatt ttaaaatcga aatggaatat tgaatggcga acgaacgcaa 1620
aacagaatcc ttagttcgag accagctacg gacatttggc tactacgaac cggacaacgg
1680 catttctgta gaggagcaaa agtccgagat tgtcaagatt aagggtttgc
tttcaaaagc 1740 aagtaagaac gccaagggca atattggtta tcccgagttc
atcatctcta accggaaaga 1800 tactgcattc ctgatagttg tggagtgcaa
gccggatgtg aaaaagcacg agagcccaag 1860 ccgtgataag ccggtagact
atgcggtgga tggcgttctc cactacgcca gacacctagc 1920 caagcactat
accgtattgg cggtggctgt gagcggcacg acggcaagtt ctatgaaggt 1980
gtccaacttc cttgtgcctg cgggtaccac ggatgtgaag gcgctggtca acgagagtaa
2040 ttcctcagtt gccgaattgg tgccttatga tgactactac cgcctggcgt
cttatgatcc 2100 ggatgttgct cagaagcgcc actctgactt gctggcgttc
tcacgcgagc tgcacgagtt 2160 tatttggacg aaggcaaaaa tctccgaaga
agaaaagcct ctgctggtga gtgggacctt 2220 gattgcgttg atgaacaaca
cattcatcaa gacctttgac gctctacctg cagaagatgt 2280 gcaggaagcg
tggctgacgg ctatcaagaa ggagctggac aaagcttcta tcccccaggc 2340
caagaaggac acgatgctgc agccgtatac gacgattgcg gttaatccca atcttggcaa
2400 gcctgacagc aagacggcta aagagtatcc agatggagtt ttcaaggaaa
taatcacccg 2460 catcgccgac aacgtctggc cctacatcaa tgtctttcac
gactttgatg tggtcggaca 2520 attctacggt gagtttctga aatatactgc
gggcgacaaa aaagcgctgg gcatcgtgct 2580 gacgccgcgc catgtggctg
aactgttctc gctcatcgcc aacgttaacc ccaagtctaa 2640 ggtgctggac
atctgtgcgg gcacgggcgg ctttctcatc tcggccatgc aacacatgct 2700
caagaaggcc gtaacggaca aagagcgcaa cgacatcaag caaaatcggc tcatcgggat
2760 tgaaaacaac cccaagatgt ttgccttggc tgccagcaac atgattctgc
gtggtgatgg 2820 taaggctaac ctgcaccagg ccagttgctt tgataatgca
gtgattgcgg ccgtgcagaa 2880 gatgaagccc aacgtgggca tgcttaaccc
cccgtattcg cagtccaaga gcgacgcgga 2940 actgcatgag ctgtatttcg
tcaagcaaat gctcgacacg cttacaccag gtggagttgg 3000 tatcgcgatt
gttcccatgt caagcgccat ctcgcccaac ccaatgcgtg aagagctgat 3060
gaagtaccac tcactggatg cggtcatgtc aatgccccag gagctgtttt atccagtggg
3120 cacggtcacc tgtgtcatgg tctggattgc cggtgtgcca catgagcaaa
tgtccaagaa 3180 gacatggttt ggctactggc gcgacgatgg ctttgtgaaa
accaagcata aggggcgcat 3240 cgacatgaat ggcacctggc cagacatccg
tgaccgatgg attgaaatgt atcgcaatcg 3300 cgaagtgcat gctggcgaga
gcatcatgca gaaggtaggc cccgatgatg aatggtgcgc 3360 tgaagcctat
atggaaacgg actactcagt gctgactcag tccgactttg agaaggtcgt 3420
tcaaagctac gcgctattta aactatttgg tcaaggcagt agccagtccg aagtgaaagg
3480 ggcaacggat gccgaagatt aacgaccttt ttcatctgga gtacggtcac
agcctggagt 3540 tgaaccggct agagcaatcc acagcagccg atgccgtcaa
cttcgttgga cgggcagcta 3600 ggaacaatgg agtcaccgca cgcgtggctc
cccctccaaa cttgaaaccg gcagccgcag 3660 gcaccatcag cgtagcgctg
ggagggcaag gtggcgcagg agtcgccttc ctccaaccgc 3720 gtccctactt
ttgtggccgc gatgtgatgg tgctgacccc caagaagcac atgacagacc 3780
aagaaaagct gtggtgggtc atgtgcatca cagccaaccg tttccgcttt ggatttggtc
3840 gccaagctaa tcggacgcta aaggacttga atctgcctgc gccccaaaaa
actccaagct 3900 gggtgcatac agcgaacccc gatgcctacc aaggtgtcag
gtcccccgca agtgttcatc 3960 cagtcggcac gctggctgtg agcaactgga
aggctttcat tcttcaagac ttgtttacca 4020 tccgtaaagg acagcgactc
accaaggcca acatgttgcc cggtacggtg ccctacatcg 4080 gcgcatcgga
cacttccaac ggcgttactg cgcacatcgg gcaaaaacca atccacgagg 4140
gcggcaccat cagcgtcaca tatgacggtt caatagctga agcgttttac cagccctccc
4200 cattttgggc atcggatgct gtgaacgtgc tctatcccaa gggtttcaca
ctcacaccgg 4260 ccactgcctt gtttatctgc gcaatcatca ggatggagaa
atatcgcttc aactatggcc 4320 gaaaatggca cttagagcgt atgcgagaga
cagttatcag gttaccagct actgcaacag 4380 gtgcaccaga ttgggacttt
atggagaaat acatcaaaac tttgccctat agctcgcagt 4440 tgcaataatc
atggctgatt tcctaaattt cctgccgcat ctacgggtat tgcatgttca 4500
ggacggtggt gatcatcgct aggtggaggc ggaaagccgt gttttgctga ccgcttgccc
4560 ggcctgcggt gaaaagcctt cccattcagg gaaggcttta atcgagttat agatct
4616 6 1899 DNA unknown restriction and modification system of
Citrobacter species 2144 6 atggcgaacg aacgcaaaac agaatcctta
gttcgagacc agctacggac atttggctac 60 tacgaaccgg acaacggcat
ttctgtagag gagcaaaagt ccgagattgt caagattaag 120 ggtttgcttt
caaaagcaag taagaacgcc aagggcaata ttggttatcc cgagttcatc 180
atctctaacc ggaaagatac tgcattcctg atagttgtgg agtgcaagcc ggatgtgaaa
240 aagcacgaga gcccaagccg tgataagccg gtagactatg cggtggatgg
cgttctccac 300 tacgccagac acctagccaa gcactatacc gtattggcgg
tggctgtgag cggcacgacg 360 gcaagttcta tgaaggtgtc caacttcctt
gtgcctgcgg gtaccacgga tgtgaaggcg 420 ctggtcaacg agagtaattc
ctcagttgcc gaattggtgc cttatgatga ctactaccgc 480 ctggcgtctt
atgatccgga tgttgctcag aagcgccact ctgacttgct ggcgttctca 540
cgcgagctgc acgagtttat ttggacgaag gcaaaaatct ccgaagaaga aaagcctctg
600 ctggtgagtg ggaccttgat tgcgttgatg aacaacacat tcatcaagac
ctttgacgct 660 ctacctgcag aagatgtgca ggaagcgtgg ctgacggcta
tcaagaagga gctggacaaa 720 gcttctatcc cccaggccaa gaaggacacg
atgctgcagc cgtatacgac gattgcggtt 780 aatcccaatc ttggcaagcc
tgacagcaag acggctaaag agtatccaga tggagttttc 840 aaggaaataa
tcacccgcat cgccgacaac gtctggccct acatcaatgt ctttcacgac 900
tttgatgtgg tcggacaatt ctacggtgag tttctgaaat atactgcggg cgacaaaaaa
960 gcgctgggca tcgtgctgac gccgcgccat gtggctgaac tgttctcgct
catcgccaac 1020 gttaacccca agtctaaggt gctggacatc tgtgcgggca
cgggcggctt tctcatctcg 1080 gccatgcaac acatgctcaa gaaggccgta
acggacaaag agcgcaacga catcaagcaa 1140 aatcggctca tcgggattga
aaacaacccc aagatgtttg ccttggctgc cagcaacatg 1200 attctgcgtg
gtgatggtaa ggctaacctg caccaggcca gttgctttga taatgcagtg 1260
attgcggccg tgcagaagat gaagcccaac gtgggcatgc ttaacccccc gtattcgcag
1320 tccaagagcg acgcggaact gcatgagctg tatttcgtca agcaaatgct
cgacacgctt 1380 acaccaggtg gagttggtat cgcgattgtt cccatgtcaa
gcgccatctc gcccaaccca 1440 atgcgtgaag agctgatgaa gtaccactca
ctggatgcgg tcatgtcaat gccccaggag 1500 ctgttttatc cagtgggcac
ggtcacctgt gtcatggtct ggattgccgg tgtgccacat 1560 gagcaaatgt
ccaagaagac atggtttggc tactggcgcg acgatggctt tgtgaaaacc 1620
aagcataagg ggcgcatcga catgaatggc acctggccag acatccgtga ccgatggatt
1680 gaaatgtatc gcaatcgcga agtgcatgct ggcgagagca tcatgcagaa
ggtaggcccc 1740 gatgatgaat ggtgcgctga agcctatatg gaaacggact
actcagtgct gactcagtcc 1800 gactttgaga aggtcgttca aagctacgcg
ctatttaaac tatttggtca aggcagtagc 1860 cagtccgaag tgaaaggggc
aacggatgcc gaagattaa 1899 7 960 DNA unknown specificity subunit of
Citrobacter species 2144 7 atgccgaaga ttaacgacct ttttcatctg
gagtacggtc acagcctgga gttgaaccgg 60 ctagagcaat ccacagcagc
cgatgccgtc aacttcgttg gacgggcagc taggaacaat 120 ggagtcaccg
cacgcgtggc tccccctcca aacttgaaac cggcagccgc aggcaccatc 180
agcgtagcgc tgggagggca aggtggcgca ggagtcgcct tcctccaacc gcgtccctac
240 ttttgtggcc gcgatgtgat ggtgctgacc cccaagaagc acatgacaga
ccaagaaaag 300 ctgtggtggg tcatgtgcat cacagccaac cgtttccgct
ttggatttgg tcgccaagct 360 aatcggacgc taaaggactt gaatctgcct
gcgccccaaa aaactccaag ctgggtgcat 420 acagcgaacc ccgatgccta
ccaaggtgtc aggtcccccg caagtgttca tccagtcggc 480 acgctggctg
tgagcaactg gaaggctttc attcttcaag acttgtttac catccgtaaa 540
ggacagcgac tcaccaaggc caacatgttg cccggtacgg tgccctacat cggcgcatcg
600 gacacttcca acggcgttac tgcgcacatc gggcaaaaac caatccacga
gggcggcacc 660 atcagcgtca catatgacgg ttcaatagct gaagcgtttt
accagccctc cccattttgg 720 gcatcggatg ctgtgaacgt gctctatccc
aagggtttca cactcacacc ggccactgcc 780 ttgtttatct gcgcaatcat
caggatggag aaatatcgct tcaactatgg ccgaaaatgg 840 cacttagagc
gtatgcgaga gacagttatc aggttaccag ctactgcaac aggtgcacca 900
gattgggact ttatggagaa atacatcaaa actttgccct atagctcgca gttgcaataa
960 8 632 PRT unknown predicted amino acid sequence of restriction
modification system of Citrobacter species 2144 8 Met Ala Asn Glu
Arg Lys Thr Glu Ser Leu Val Arg Asp Gln Leu Arg 1 5 10 15 Thr Phe
Gly Tyr Tyr Glu Pro Asp Asn Gly Ile Ser Val Glu Glu Gln 20 25 30
Lys Ser Glu Ile Val Lys Ile Lys Gly Leu Leu Ser Lys Ala Ser Lys 35
40 45 Asn Ala Lys Gly Asn Ile Gly Tyr Pro Glu Phe Ile Ile Ser Asn
Arg 50 55 60 Lys Asp Thr Ala Phe Leu Ile Val Val Glu Cys Lys Pro
Asp Val Lys 65 70 75 80 Lys His Glu Ser Pro Ser Arg Asp Lys Pro Val
Asp Tyr Ala Val Asp 85 90 95 Gly Val Leu His Tyr Ala Arg His Leu
Ala Lys His Tyr Thr Val Leu 100 105 110 Ala Val Ala Val Ser Gly Thr
Thr Ala Ser Ser Met Lys Val Ser Asn 115 120 125 Phe Leu Val Pro Ala
Gly Thr Thr Asp Val Lys Ala Leu Val Asn Glu 130 135 140 Ser Asn Ser
Ser Val Ala Glu Leu Val Pro Tyr Asp Asp Tyr Tyr Arg 145 150 155 160
Leu Ala Ser Tyr Asp Pro Asp Val Ala Gln Lys Arg His Ser Asp Leu 165
170 175 Leu Ala Phe Ser Arg Glu Leu His Glu Phe Ile Trp Thr Lys Ala
Lys 180 185 190 Ile Ser Glu Glu Glu Lys Pro Leu Leu Val Ser Gly Thr
Leu Ile Ala 195 200 205 Leu Met Asn Asn Thr Phe Ile Lys Thr Phe Asp
Ala Leu Pro Ala Glu 210 215 220 Asp Val Gln Glu Ala Trp Leu Thr Ala
Ile Lys Lys Glu Leu Asp Lys 225 230 235 240 Ala Ser Ile Pro Gln Ala
Lys Lys Asp Thr Met Leu Gln Pro Tyr Thr 245 250 255 Thr Ile Ala Val
Asn Pro Asn Leu Gly Lys Pro Asp Ser Lys Thr Ala 260 265 270 Lys Glu
Tyr Pro Asp Gly Val Phe Lys Glu Ile Ile Thr Arg Ile Ala 275 280 285
Asp Asn Val Trp Pro Tyr Ile Asn Val Phe His Asp Phe Asp Val Val 290
295 300 Gly Gln Phe Tyr Gly Glu Phe Leu Lys Tyr Thr Ala Gly Asp Lys
Lys 305 310 315 320 Ala Leu Gly Ile Val Leu Thr Pro Arg His Val Ala
Glu Leu Phe Ser 325 330 335 Leu Ile Ala Asn Val Asn Pro Lys Ser Lys
Val Leu Asp Ile Cys Ala 340 345 350 Gly Thr Gly Gly Phe Leu Ile Ser
Ala Met Gln His Met Leu Lys Lys 355 360 365 Ala Val Thr Asp Lys Glu
Arg Asn Asp Ile Lys Gln Asn Arg Leu Ile 370 375 380 Gly Ile Glu Asn
Asn Pro Lys Met Phe Ala Leu Ala Ala Ser Asn Met 385 390 395 400 Ile
Leu Arg Gly Asp Gly Lys Ala Asn Leu His Gln Ala Ser Cys Phe 405 410
415 Asp Asn Ala Val Ile Ala Ala Val Gln Lys Met Lys Pro Asn Val Gly
420 425 430 Met Leu Asn Pro Pro Tyr Ser Gln Ser Lys Ser Asp Ala Glu
Leu His 435 440 445 Glu Leu Tyr Phe Val Lys Gln Met Leu Asp Thr Leu
Thr Pro Gly Gly 450 455 460 Val Gly Ile Ala Ile Val Pro Met Ser Ser
Ala Ile Ser Pro Asn Pro 465 470 475 480 Met Arg Glu Glu Leu Met Lys
Tyr His Ser Leu Asp Ala Val Met Ser 485 490 495 Met Pro Gln Glu Leu
Phe Tyr Pro Val Gly Thr Val Thr Cys Val Met 500 505 510 Val Trp Ile
Ala Gly Val Pro His Glu Gln Met Ser Lys Lys Thr Trp 515 520 525 Phe
Gly Tyr Trp Arg Asp Asp Gly Phe Val Lys Thr Lys His Lys Gly 530 535
540 Arg Ile Asp Met Asn Gly Thr Trp Pro Asp Ile Arg Asp Arg Trp Ile
545 550 555 560 Glu Met Tyr Arg Asn Arg Glu Val His Ala Gly Glu Ser
Ile Met Gln 565 570 575 Lys Val Gly Pro Asp Asp Glu Trp Cys Ala Glu
Ala Tyr Met Glu Thr 580 585 590 Asp Tyr Ser Val Leu Thr Gln Ser Asp
Phe Glu Lys Val Val Gln Ser 595 600 605 Tyr Ala Leu Phe Lys Leu Phe
Gly Gln Gly Ser Ser Gln Ser Glu Val 610 615 620 Lys Gly Ala Thr Asp
Ala Glu Asp 625 630 9 319 PRT unknown predicted amino acid sequence
of the specificity subunit of Citrobacter species 2144 9 Met Pro
Lys Ile Asn Asp Leu Phe His Leu Glu Tyr Gly His Ser Leu 1 5 10 15
Glu Leu Asn Arg Leu Glu Gln Ser Thr Ala Ala Asp Ala Val Asn Phe 20
25 30 Val Gly Arg Ala Ala Arg Asn Asn Gly Val Thr Ala Arg Val Ala
Pro 35 40 45 Pro Pro Asn Leu Lys Pro Ala Ala Ala Gly Thr Ile Ser
Val Ala Leu 50 55 60 Gly Gly Gln Gly Gly Ala Gly Val Ala Phe Leu
Gln Pro Arg Pro Tyr 65 70 75 80 Phe Cys Gly Arg Asp Val Met Val Leu
Thr Pro Lys Lys His Met Thr 85 90 95 Asp Gln Glu Lys Leu Trp Trp
Val Met Cys Ile Thr Ala Asn Arg Phe 100 105 110 Arg Phe Gly Phe Gly
Arg Gln Ala Asn Arg Thr Leu Lys Asp Leu Asn 115 120 125 Leu Pro Ala
Pro Gln Lys Thr Pro Ser Trp Val His Thr Ala Asn Pro 130 135 140 Asp
Ala Tyr Gln Gly Val Arg Ser Pro Ala Ser Val His Pro Val Gly 145 150
155 160 Thr Leu Ala Val Ser Asn Trp Lys Ala Phe Ile Leu Gln Asp Leu
Phe 165 170 175 Thr Ile Arg Lys Gly Gln Arg Leu Thr Lys Ala Asn Met
Leu Pro Gly 180 185 190 Thr Val Pro Tyr Ile Gly Ala Ser Asp Thr Ser
Asn Gly Val Thr Ala 195 200 205 His Ile Gly Gln Lys Pro Ile His Glu
Gly Gly Thr Ile Ser Val Thr 210 215 220 Tyr Asp Gly Ser Ile Ala Glu
Ala Phe Tyr Gln Pro Ser Pro Phe Trp 225 230 235 240 Ala Ser Asp Ala
Val Asn Val Leu Tyr Pro Lys Gly Phe Thr Leu Thr 245 250 255 Pro Ala
Thr Ala Leu Phe Ile Cys Ala Ile Ile Arg Met Glu Lys Tyr 260 265 270
Arg Phe Asn Tyr Gly Arg Lys Trp His Leu Glu Arg Met Arg Glu Thr 275
280 285 Val Ile Arg Leu Pro Ala Thr Ala Thr Gly Ala Pro Asp Trp Asp
Phe 290 295 300 Met Glu Lys Tyr Ile Lys Thr Leu Pro Tyr Ser Ser Gln
Leu Gln 305 310 315 10 23 DNA unknown synthetic 10 cagagagata
acccacaaga ttg 23 11 24 DNA unknown synthetic 11 ccacaagaat
tgagttaagc ccaa 24 12 25 DNA unknown synthetic 12 atcgagagat
aacccacaag aattg 25 13 25 DNA unknown synthetic misc_feature
(4)..(8) n=a,c,t or g misc_feature (13)..(25) n=a,c,t or g 13
caannnnngt ggnnnnnnnn nnnnn 25 14 12 DNA unknown synthetic
misc_feature (4)..(8) n=a,t,c or g 14 caannnnngt gg 12 15 21 DNA
unknown primer 15 cagttcgatg taacccactc g 21 16 23 DNA unknown
primer 16 cccgctgacg cgccctgacg ggc
23 17 43 DNA unknown synthetic 17 aagtgccacc tgacgtgcaa cctaggtggc
acgtctaaga aac 43 18 43 DNA unknown synthetic 18 gtttcttaga
cgtgccacct aggttgcacg tcaggtggca ctt 43 19 44 DNA unknown synthetic
19 tggtttctta gacgtgccac ctaggttgca cgtcaggtgg cact 44 20 42 DNA
unknown synthetic 20 tgccacctga cgtgcaacct aggtggcacg tctaagaaac ca
42 21 43 DNA unknown synthetic 21 agtgccacct gacgtgcaac ctaggtggca
cgtctaagaa acc 43 22 43 DNA unknown synthetic 22 agtgccacct
gacgtgccac ccgggttgca cgtctaagaa acc 43 23 18 DNA unknown synthetic
misc_feature (7)..(7) n=a,c,t or g 23 gcaaccnggg tggcacgt 18 24 11
PRT unknown synthetic 24 Val Leu Asp Ile Cys Ala Gly Thr Gly Gly
Phe 1 5 10 25 10 PRT unknown synthetic 25 Ala Asn Glu Arg Lys Thr
Glu Glu Leu Val 1 5 10 26 11 PRT unknown synthetic 26 Met Ala Asn
Glu Arg Lys Thr Glu Ser Leu Val 1 5 10 27 10 PRT unknown synthetic
27 Pro Lys Ile Asn Asp Leu Phe His Leu Glu 1 5 10 28 11 PRT unknown
synthetic 28 Met Pro Lys Ile Asn Asp Leu Phe His Leu Glu 1 5 10 29
12 DNA unknown synthetic misc_feature (5)..(8) n=a,c,t or g
misc_feature (9)..(9) n is a, c, g, or t 29 ccacnnnnnt tg 12 30 36
DNA unknown synthetic misc_feature (1)..(11) n=a,c,t or g
misc_feature (15)..(19) n=a,c,t or g misc_feature (24)..(36)
n=a,c,t or g 30 nnnnnnnnnn ncaannnnng tggnnnnnnn nnnnnn 36 31 34
DNA unknown synthetic misc_feature (1)..(10) n=a,c,g or t
misc_feature (14)..(18) n=a,c,g or t misc_feature (23)..(34)
n=a,c,g or t 31 nnnnnnnnnn caannnnngt ggnnnnnnnn nnnn 34 32 35 DNA
unknown synthetic misc_feature (1)..(10) n=a,c,t or g misc_feature
(14)..(18) n=a,c,t or g misc_feature (23)..(35) n=a,c,t or g 32
nnnnnnnnnn caannnnngt ggnnnnnnnn nnnnn 35 33 35 DNA unknown
synthetic misc_feature (1)..(11) n=a,c,t or g misc_feature
(15)..(19) n=a,c,t or g misc_feature (24)..(35) n=a,c,t or g 33
nnnnnnnnnn ncaannnnng tggnnnnnnn nnnnn 35 34 13 DNA unknown
synthetic misc_feature (4)..(9) n=a,c,t or g 34 caannnnnng tgg 13
35 11 DNA unknown synthetic misc_feature (4)..(7) n=a,c,g or t 35
caannnngtg g 11 36 11 DNA unknown synthetic misc_feature (4)..(8)
n=a,c,t or g 36 caannnnntt g 11 37 13 DNA unknown synthetic
misc_feature (5)..(9) n=a,c,t or g 37 ccacnnnnng tgg 13 38 12 DNA
unknown synthetic misc_feature (4)..(4) n is a, c, g, or t
misc_feature (5)..(8) n=a,c,t or g misc_feature (12)..(12) r=a or g
38 caannnnngt gr 12 39 11 DNA unknown synthetic misc_feature
(4)..(8) n=a,c,t or g 39 caannnnngt g 11 40 11 DNA unknown
synthetic misc_feature (4)..(8) n=a,c,t or g 40 cgannnnntg c 11 41
11 DNA unknown synthetic misc_feature (4)..(8) n=a,c,t or g 41
caannnnntg c 11 42 12 DNA unknown synthetic misc_feature (4)..(8)
n=a,c,t or g 42 gcannnnngt gg 12 43 12 DNA Bacillus coagulans
misc_feature (4)..(9) n=a,c,t or g 43 cgannnnnnt gc 12 44 11 DNA
Bacillus sphaericus misc_feature (3)..(6) n=a,c,t or g misc_feature
(10)..(10) y=c or t 44 acnnnngtay c 11 45 11 DNA Bacillus pumilus
misc_feature (4)..(8) n=a,c,g or t 45 gagnnnnnct c 11 46 11 DNA
Campylobacter jejuni misc_feature (4)..(9) n=a, c, t or g 46
ccannnnnng t 11 47 13 DNA Acinetobacter lwoffii misc_feature
(5)..(10) n=a, c, t or g 47 gaacnnnnnn tcc 13 48 11 DNA Haemophilus
aegyptius misc_feature (3)..(3) y=c or t misc_feature (4)..(8)
n=a,c,t or g misc_feature (9)..(9) r=a or g 48 gaynnnnnrt c 11 49
11 DNA Bacillus stearothermophilus misc_feature (3)..(7) n=a,c, t
or g 49 acnnnnnctc c 11
* * * * *
References