U.S. patent application number 11/751441 was filed with the patent office on 2007-05-21 and published on 2007-11-08 for methods and compositions for protein expression and purification.
Invention is credited to Tauseef R. Butt, Michael P. Malakhov, Oxana A. Malakhova, Hiep T. Tran, Stephen D. Weeks.
Application Number | 20070259414 11/751441 |
Family ID | 23359446 |
Filed Date | 2007-05-21 |
United States Patent
Application |
20070259414 |
Kind Code |
A1 |
Butt; Tauseef R. ; et
al. |
November 8, 2007 |
Methods and Compositions for Protein Expression and
Purification
Abstract
Methods for enhancing expression levels and secretion of
heterologous fusion proteins in a host cell are disclosed.
Inventors: |
Butt; Tauseef R.; (Audubon,
PA) ; Weeks; Stephen D.; (Philadelphia, PA) ;
Tran; Hiep T.; (West Chester, PA) ; Malakhov; Michael
P.; (San Diego, CA) ; Malakhova; Oxana A.;
(San Diego, CA) |
Correspondence
Address: |
DANN, DORFMAN, HERRELL & SKILLMAN
1601 MARKET STREET
SUITE 2400
PHILADELPHIA
PA
19103-2307
US
|
Family ID: |
23359446 |
Appl. No.: |
11/751441 |
Filed: |
May 21, 2007 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
10389640 | Mar 14, 2003 | 7220576 |
11751441 | May 21, 2007 | |
10338411 | Jan 7, 2003 | 7060461 |
10389640 | Mar 14, 2003 | |
60346449 | Jan 7, 2002 | |
Current U.S.
Class: |
435/195 ;
435/243; 435/252.8; 435/255.1; 435/320.1; 435/325 |
Current CPC
Class: |
C07K 2319/60 20130101;
C07K 2319/95 20130101; C07K 14/00 20130101; C12P 21/02 20130101;
C12N 15/62 20130101; C07K 2319/02 20130101; C07K 2319/21
20130101 |
Class at
Publication: |
435/195 ;
435/243; 435/252.8; 435/255.1; 435/320.1; 435/325 |
International
Class: |
C12N 5/00 20060101
C12N005/00; C12N 1/16 20060101 C12N001/16; C12N 1/20 20060101
C12N001/20; C12N 9/14 20060101 C12N009/14; C12N 15/63 20060101
C12N015/63 |
Claims
1. A kit comprising a recombinant vector containing a nucleic acid
sequence encoding a UBL molecule selected from the group of SUMO,
RUB, HUB, URM1, and ISG15 operably linked to a promoter suitable
for expression in the desired host cell and a multiple cloning site
suitable for cloning a nucleic acid encoding the protein of
interest in-frame with the nucleic acid sequence encoding the UBL
molecule.
2. The kit of claim 1, wherein said kit further comprises host
cells suitable for expression of said vector.
3. The kit of claim 2, wherein said host cells are selected from
the group of yeast cells, E. coli, insect cells, and mammalian
cells.
4. The kit of claim 1, wherein said kit further comprises a kit
which comprises reagents for altering the nucleic acid encoding
said protein of interest to generate amino termini which are
different from those native to the wild-type protein.
5. The kit of claim 4, wherein said kit comprises reagents suitable
for site-directed mutagenesis.
6. The kit of claim 5, wherein said kit comprises oligonucleotides
for performing oligonucleotide-based site-directed mutagenesis.
7. A kit for purification of a protein from a host cell comprising:
i) a recombinant vector containing a nucleic acid sequence encoding
a UBL molecule selected from the group of SUMO, RUB, HUB, URM1, and
ISG15 operably linked to a promoter suitable for expression in the
desired host cell, a nucleic acid sequence encoding for a
purification tag in-frame with the nucleic acid sequence encoding
the UBL molecule, and a multiple cloning site suitable for cloning
a nucleic acid encoding the protein of interest in-frame with the
nucleic acid sequence encoding the UBL molecule, and ii) a protease
composition capable of cleaving the UBL molecule from the fusion
protein.
8. The kit of claim 7, wherein said kit further comprises host
cells suitable for expression of said vector.
9. The kit of claim 8 wherein said host cell is selected from the
group of yeast cells, E. coli, insect cells, and mammalian
cells.
10. The kit of claim 7 further comprising: i) a solid support for
binding the purification tag, ii) lysis buffers, iii) wash buffers,
iv) elution buffers, v) cleavage buffers, and vi) instruction
material.
Description
CROSS REFERENCE TO RELATED APPLICATION
[0001] This application is a continuation-in-part of pending U.S.
application Ser. No. 10/338,411 filed Jan. 7, 2003 which claims
priority to U.S. Provisional Application 60/346,449 entitled
"Methods for Protein Expression and Purification" filed Jan. 7,
2002. The entire disclosure of both documents is incorporated by
reference herein.
FIELD OF THE INVENTION
[0002] The present invention relates to the field of recombinant
gene expression and purification of expressed proteins. More
specifically, the invention provides materials and methods which
facilitate purification of heterologous proteins from a variety of
different host species.
BACKGROUND OF THE INVENTION
[0003] Several publications and patent documents are cited
throughout the specification in order to describe the state of the
art to which this invention pertains. Full citations for those
references that are numbered can be found at the end of the
specification. Each citation is incorporated herein as though set
forth in full.
[0004] Functional genomic studies have been hampered by the
inability to uniformly express and purify biologically active
proteins in heterologous expression systems. Despite the use of
identical transcriptional and translational signals in a given
expression vector, expressed protein levels have been observed to
vary dramatically (5, 7). For this reason, several strategies have
been developed to express heterologous proteins in bacteria, yeast,
mammalian and insect cells as gene-fusions.
[0005] The expression of heterologous genes in bacteria is by far
the simplest and most inexpensive means available for research or
commercial purposes. However, some heterologous gene products fail
to attain their correct three-dimensional conformation in E. coli
while others become sequestered in large insoluble aggregates or
"inclusion bodies" when overproduced. Major denaturant-induced
solubilization methods followed by removal of the denaturant under
conditions that favor refolding are often required to produce a
reasonable yield of the recombinant protein. Selection of ORFs for
structural genomics projects has also shown that only about 20% of
the genes expressed in E. coli render proteins that were soluble or
correctly folded (36, 38). These numbers are startlingly
disappointing especially given that most scientists rely on E. coli
for initial attempts to express gene products. Several gene fusion
systems such as NUS A, maltose binding protein (MBP), glutathione S
transferase (GST), and thioredoxin (TRX) have been developed (17).
All of these systems have certain drawbacks, ranging from
inefficient expression to inconsistent cleavage from desired
structure. Comprehensive data showing that a particular fusion is
best for a certain family of proteins is not available.
[0006] Ubiquitin and ubiquitin-like proteins (UBLs) have been
described in the literature. The SUMO system has also been
characterized. SUMO (small ubiquitin related modifier) is also
known as Sentrin, SMT3, PIC1, GMP1 and UBL1. SUMO and the SUMO
pathway are present throughout the eukaryotic kingdom and the
proteins are highly conserved from yeast to humans (12, 15, 28).
SUMO homologues have also been identified in C. elegans and plants.
SUMO has 18% sequence identity with ubiquitin (28, 39). Yeast has
only a single SUMO gene, which has also been termed SMT3 (23, 16).
The yeast Smt3 gene is essential for viability (29). In contrast to
yeast, three members of SUMO have been described in vertebrates:
SUMO-1 and close homologues SUMO-2 and SUMO-3. Human SUMO-1, a 101
amino-acid polypeptide, shares 50% sequence identity with human
SUMO-2/SUMO-3 (29). Yeast SUMO (SMT3) shares 47% sequence identity
with mammalian SUMO-1. Although overall sequence homology between
ubiquitin and SUMO is only 18%, structure determination by nuclear
magnetic resonance (NMR) reveals that the two proteins share a
common three dimensional structure that is characterized by a
tightly packed globular fold with β-sheets wrapped around one
α-helix (4). Examination of the chaperoning properties of SUMO
reveals that attachment of a tightly packed globular structure to
N-termini of proteins can act as nucleus for folding and protect
the labile protein. All SUMO genes encode precursor proteins with a
short C-terminal sequence that extends from the conserved
C-terminal Gly-Gly motif. The extension sequence, 2-12 amino acids
in length, is different in all cases. Cells contain potent SUMO
proteases that remove the C-terminal extensions. The C-terminus of
SUMO is conjugated to ε-amino groups of lysine residues of target
proteins. The similarity of the enzymes of the sumoylation pathway
to ubiquitin pathway enzymes is remarkable, given the different
effects of these two protein modification pathways. Sumoylation of
cellular proteins has been proposed to regulate nuclear transport,
signal transduction, stress response, and cell cycle progression
(29). It is very likely that SUMO chaperones translocation of
proteins among various cell compartments; however, the precise
mechanistic details of this function of SUMO are not known.
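The sequence identity percentages quoted above (for example, 18% between SUMO and ubiquitin) are computed over an alignment of the two proteins. The following is a minimal sketch of that arithmetic, assuming the two sequences have already been aligned with "-" marking gaps, and assuming identity is reported over columns in which neither sequence has a gap; other denominators are also in common use. The function name and the toy sequences are hypothetical and are not taken from the specification.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: inputs are pre-aligned, equal-length strings with '-' for gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment (hypothetical, not real SUMO or ubiquitin sequence)
print(percent_identity("MKV-LT", "MRVALS"))
```

In the toy alignment, five columns are ungapped and three of them match.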
[0007] Other fusions promote solubility of partner proteins
presumably due to their large size (e.g., NUS A). Fusion of
proteins with glutathione S-transferase (GST) or maltose binding
protein (MBP) has been proposed to enhance expression and yield of
fusion partners. However, enhanced expression is not always
observed when GST is used as GST forms dimers and can retard
protein solubility. Another problem with GST or other fusion
systems is that the desired protein may have to be removed from the
fusion. To circumvent this problem, protease sites, such as factor
X, thrombin or TEV protease sites are often engineered downstream
of the fusion partner. However, incomplete cleavage and
inappropriate cleavage within the fusion protein are often observed.
The present invention circumvents these problems.
SUMMARY OF THE INVENTION
[0008] In accordance with the present invention compositions and
methods for enhancing expression levels of a protein of interest in
a host cell are provided. An exemplary method comprises i) operably
linking a nucleic acid sequence encoding a molecule selected from the
group consisting of SUMO, RUB, HUB, APG8, APG12, URM1, and ISG15 to
a nucleic acid sequence encoding said protein of interest thereby
generating a construct encoding a fusion protein, ii) introducing
said nucleic acid into said host cell, whereby the presence of said
molecule in said fusion protein increases the expression level of
said protein of interest in said host cell. In a preferred
embodiment the molecule is SUMO encoded by a nucleic acid of SEQ ID
NO: 2. The method optionally entails cleavage of said fusion
protein and isolation of the protein of interest.
[0009] In yet another embodiment of the invention, an exemplary
method for generating a protein of interest having an altered amino
terminus is provided. Such a method comprises i) providing a
nucleic acid sequence encoding the protein of interest; ii)
altering the N-terminal amino acid coding sequence in the nucleic
acid; iii) operably linking a SUMO molecule to the nucleic acid
sequence; and iv) expressing the nucleic acid in a eukaryotic cell,
thereby producing the protein of interest in the cell, wherein the
eukaryotic cell expresses endogenous SUMO cleaving enzymes, which
effect cleavage of SUMO from the sequence encoding the protein of
interest, thereby producing a protein of interest having an altered
amino terminus. All amino acids with the exception of proline may
be added to the amino terminus using this method.
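The rule stated in the preceding paragraph, that any amino acid except proline may follow SUMO and still be released by the SUMO cleaving enzymes, can be sketched as a small lookup. This is an illustrative sketch only: the names below are hypothetical, and the function encodes nothing beyond the proline exception described in the text.

```python
# One-letter codes for the 20 standard amino acids
STANDARD_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sumo_fusion_is_cleaved(first_residue: str) -> bool:
    """Return True if a SUMO fusion whose partner begins with this residue
    is expected to be cleaved, per the proline exception in the text."""
    residue = first_residue.upper()
    if len(residue) != 1 or residue not in STANDARD_AMINO_ACIDS:
        raise ValueError(f"not a standard amino acid code: {first_residue!r}")
    return residue != "P"


# Amino termini that can be generated with this method
permitted_n_termini = [aa for aa in STANDARD_AMINO_ACIDS if sumo_fusion_is_cleaved(aa)]
print(len(permitted_n_termini), "".join(permitted_n_termini))
```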
[0010] The invention also provides a method for producing a
sumoylated protein for tracking protein localization within a host
cell. An exemplary method comprises i) providing a nucleic acid
sequence encoding said protein; ii) substituting the N-terminal
amino acid coding sequence in the nucleic acid for a codon which
encodes proline; iii) operably linking a SUMO molecule to said
nucleic acid sequence; and iv) expressing said SUMO linked protein
in said host cell.
[0011] In another aspect of the invention, a method for enhancing
secretion levels of a protein of interest from a host cell is
provided. Such a method comprises i) operably linking a nucleic
acid sequence encoding a molecule selected from the group consisting
of SUMO, RUB, HUB, URM1, and ISG15 to a nucleic acid sequence
encoding said protein of interest thereby generating a construct
encoding a fusion protein, ii) introducing said nucleic acid into
said host cell, whereby the presence of said molecule in said
fusion protein increases the secretion of said protein of interest
from said host cell.
[0012] In yet a further aspect of the invention, kits are provided
for performing the methods described above. Such kits comprise a
recombinant vector containing a nucleic acid sequence encoding a
UBL molecule selected from the group of SUMO, RUB, HUB, URM1, and
ISG15 operably linked to a promoter suitable for expression in the
desired host cell and a multiple cloning site suitable for cloning
a nucleic acid encoding the protein of interest. The recombinant
vector may also contain a nucleic acid sequence encoding for a
purification tag. The kits may further comprise a preparation of a
protease capable of cleaving the UBL molecule from the fusion
protein, an appropriate solid phase for binding the purification
tag, appropriate buffers including wash and cleavage buffers, and
frozen stocks of host cells.
BRIEF DESCRIPTION OF THE DRAWINGS
[0013] FIG. 1 is a schematic drawing illustrating the conjugation
pathways for ubiquitin and ubiquitin-like proteins (UBLs). An arrow
in the "C-terminal hydrolase" column indicates the cleavage of the
precursor proteins. Only enzymes previously described are provided.
The failure to list a particular enzyme in a particular pathway
does not preclude the existence of that enzyme.
[0014] FIG. 2 is a schematic representation of the cloning strategy
used to express SUMO fusion proteins. In this cloning strategy, a
Bsa I site is introduced directly downstream of a SUMO sequence
within a desired vector. The nucleic acid sequence encoding the
protein to be expressed as a fusion with SUMO is amplified by PCR
with primers that introduce a Bsa I site at the 5' end. The vector
and the PCR product are cleaved by Bsa I and an appropriate
restriction enzyme (represented by Xxx) that allows for insertion
of the cleaved PCR product into the vector.
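When planning the Bsa I cloning strategy described for FIG. 2, it may be useful to check that the coding sequence to be amplified does not itself contain a Bsa I site, since an internal site would also be cut. The sketch below scans both strands for the Bsa I recognition sequence. The recognition sequence GGTCTC is a known property of the enzyme rather than something stated in this specification, and the function names and example sequence are hypothetical.

```python
BSAI_SITE = "GGTCTC"  # Bsa I recognition sequence (not stated in the specification)


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence containing only A, C, G, T."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_bsai_sites(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, strand) for every Bsa I site on either strand."""
    seq = seq.upper()
    hits = []
    for strand, motif in (("+", BSAI_SITE), ("-", reverse_complement(BSAI_SITE))):
        start = seq.find(motif)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(motif, start + 1)
    return sorted(hits)


# Hypothetical insert with one site on each strand
print(find_bsai_sites("AAGGTCTCTTTGAGACCAA"))
```

An empty result suggests the insert can be cloned with Bsa I without internal cutting; a non-empty result would call for a different enzyme or silent mutation of the site.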
[0015] FIG. 3 is a circular map of pSUMO, an E. coli SUMO
expression vector. The nucleic acid sequence provided (SEQ ID NO:
37) encompasses the SUMO encoding region and the multiple cloning
site. The amino acid sequence provided (SEQ ID NO: 38) is
6×His tagged SUMO. Restriction enzymes are indicated above
their recognition sequence. The pSUMO expression vector has been
constructed in the backbone of the pET-24d expression vector
(Novagen).
[0016] FIGS. 4A and 4B show Coomassie stained gels and graphic data
that demonstrate that the attachment of the carboxy-terminus of
UBLs to the amino-terminus of target proteins increases expression
and/or enhances solubility of the protein in E. coli. Green
fluorescent protein (GFP) and UBL-GFP fusions encoded in pET24d E.
coli expression vectors were expressed in the E. coli Rosetta pLysS
strain (Novagen). Expression was induced either at 37° C.
with 1 mM IPTG for four hours in LB medium (FIG. 4A) or in
minimal media with 1 mM IPTG at 26° C. overnight (FIG. 4B).
Left panels are Coomassie stained SDS-polyacrylamide gels of total
cellular protein (top) and soluble proteins (bottom). The first
lanes of each gel are molecular weight markers. Dark arrow
indicates observed GFP species and light arrow indicates size of
expected GFP species. Right panel is quantitative representation in
Arbitrary Units (AU) of GFP fluorescence present in soluble
fractions as measured in a Fluoroskan Ascent FL fluorometer
(LabSystems).
[0017] FIG. 5 is a Coomassie stained SDS-polyacrylamide gel
demonstrating the expression and purification of a human tyrosine
kinase as a SUMO fusion protein in E. coli. Tyrosine kinase and the
fusion protein SUMO-tyrosine kinase were expressed in the Rosetta
pLysS strain (Novagen) of E. coli in LB or minimal media (MM). The
right panel shows the Ni-NTA resin purified proteins from the
transformed E. coli cells. The left panel has the same lane
arrangement as the right panel, but 1/3 of the amount of protein was
loaded on the SDS-polyacrylamide gel. Numbers indicate molecular
weight standards in the first lane.
[0018] FIG. 6 shows a Coomassie stained SDS-polyacrylamide gel
representing purified SUMO hydrolase from E. coli and the partial
purification and elution of SUMO-tyrosine kinase fusion protein. E.
coli cells were transformed with a vector expressing either SUMO
hydrolase Ulp1 or SUMO-tyrosine kinase and cultured in minimal
media. Proteins were subsequently purified by Ni-NTA resin.
SUMO-tyrosine kinase was further purified by elution with either
100 mM EDTA or 250 mM imidazole. The gel shows that the current
methods yield approximately 90% pure Ulp1 protein.
[0019] FIG. 7 is a stained SDS-polyacrylamide gel of the expression
of the liver X receptor (LXR) ligand binding domain as a fusion
protein with SUMO. E. coli cells were transformed with a SUMO-LXR
expression vector. The cells were subsequently induced with 1 mM
IPTG at 20° C. overnight or 37° C. for 3 hours. 10
μg of total protein (WC), soluble protein (CS), and insoluble
protein (insol) from each induction were loaded per well of a 12%
SDS-polyacrylamide gel.
[0020] FIGS. 8A and 8B display stained SDS-polyacrylamide gels
demonstrating the solubility of the SUMO-MAPKAPK2 fusion protein
expressed at 37° C. (FIG. 8A) and 20° C. (FIG. 8B).
E. coli cells expressing a SUMO-fusion of MAPKAP2 kinase were
induced with 0.1 (lanes 2-4), 0.25 (lanes 5-7), and 0.5 (lanes
8-10) mM IPTG. The original induction sample (I) in addition to the
supernatant (S) and resuspended pellet (P) following lysis and
centrifugation were analyzed by SDS-PAGE. The first lanes are
BioRad low molecular weight markers.
[0021] FIG. 9 is a Western blot (top panel) of UBL-GFP fusion
proteins expressed in yeast cells demonstrating that UBL-GFP fusion
proteins are co-translationally cleaved in yeast. Yeast strain
BJ1991 was transformed with a vector expressing Ub-GFP, SUMO-GFP,
Urm1-GFP, Hub1-GFP, Rub1-GFP, Apg8-GFP, Apg12-GFP or ISG15-GFP
under the control of a copper sulfate regulated promoter. Total
cell extracts were prepared by boiling the cells in SDS-PAGE buffer
and briefly sonicating the sample to reduce viscosity. 20 μg of
the total yeast proteins were resolved on 12% SDS-PAGE minigels and
analyzed by Western blot with a rabbit polyclonal antibody against
GFP and a secondary HRP-conjugated antibody. The arrow indicates
the size of unfused GFP. An identical gel (bottom panel) was run in
parallel and stained with Coomassie to ensure equal loading of the
proteins from all samples.
[0022] FIG. 10 is a series of Western blots that indicate SUMO-GFP
fusions are co-translationally cleaved in yeast generating novel
amino termini. In addition to methionine as the first amino acid of
GFP following the C-terminal Gly-Gly sequence of SUMO, we have
engineered the remaining 19 amino acids as the amino-terminal
residue of GFP in yeast SUMO-(X)20-GFP expression vectors. All
expression vectors containing the 20 amino-terminal variants of GFP
fusion proteins were expressed in yeast under the control of copper
inducible promoter. Yeast lysates were separated by SDS-PAGE and
analyzed by Western blot with antibodies against GFP. The
"unfused-GFP" lanes represent the expression of GFP alone with no
SUMO fusion. The "SSUMO-GFP" lanes are bacterially expressed
SUMO-GFP.
[0023] FIGS. 11A and 11B are schematic representations of the SUMO
(FIG. 11A) and ubiquitin (FIG. 11B) GFP fusion proteins that also
contain the gp67 secretory signal. In construct E, only unfused GFP
protein is expressed. In construct G, a 7 kDa secretory sequence
from gp67 was attached to the N-terminus of GFP. In constructs S
and U, SUMO and ubiquitin sequences, respectively, are inserted in
frame to the N-terminus of GFP. In constructs GS and GU, gp67
sequences are followed by SUMO and ubiquitin, respectively, and
then GFP. In constructs SG and UG, gp67 sequences are inserted in
between the C-terminus of SUMO and ubiquitin, respectively, and the
N-terminus of GFP.
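The eight constructs described for FIGS. 11A and 11B differ only in which domains are present and in what order from N-terminus to C-terminus. As a reading aid, the layout can be written as a small lookup table. This is a sketch based solely on the description above; the dictionary and helper names are hypothetical.

```python
# Domain order, N-terminus to C-terminus, for each construct label
CONSTRUCTS = {
    "E": ["GFP"],
    "G": ["gp67", "GFP"],
    "S": ["SUMO", "GFP"],
    "U": ["ubiquitin", "GFP"],
    "GS": ["gp67", "SUMO", "GFP"],
    "GU": ["gp67", "ubiquitin", "GFP"],
    "SG": ["SUMO", "gp67", "GFP"],
    "UG": ["ubiquitin", "gp67", "GFP"],
}


def describe(label: str) -> str:
    """Return the domain order of a construct as a hyphenated string."""
    return "-".join(CONSTRUCTS[label])


for label in CONSTRUCTS:
    print(f"{label}: {describe(label)}")
```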
[0024] FIGS. 12A and 12B are Western blots demonstrating expression
of SUMO and ubiquitin fusion proteins in insect cells. Hi-five
insect cells were infected with recombinant baculovirus encoding
for SUMO or ubiquitin fusion proteins. At 24 hours post-infection,
equal amounts of cell lysates (FIG. 12A) and media (FIG. 12B) were
separated by SDS-PAGE and analyzed by Western blot with antibodies
against GFP. Lane markers: Hi5 is Hi Five cells, E is eGFP, G is
gp67-eGFP, U is ubiquitin-eGFP, S is SUMO-eGFP, GU is
gp67-ubiquitin-eGFP, UG is ubiquitin-gp67-eGFP, GS is
gp67-SUMO-eGFP, SG is SUMO-gp67-eGFP, and eGFP is a positive
control.
[0025] FIGS. 13A, 13B, and 13C are Western blots demonstrating
expression of SUMO and ubiquitin fusion proteins in insect cells.
Hi-five insect cells were infected with recombinant baculovirus
encoding for SUMO or ubiquitin fusion proteins. At 48 hours
post-infection, equal amounts of cell lysates (FIGS. 13A and 13C)
and media (FIG. 13B) were separated by SDS-PAGE and analyzed by
Western blot with antibodies against GFP. The lanes are: Hi5 is Hi
Five cells, E is eGFP, G is gp67-eGFP, U is ubiquitin-eGFP, S is
SUMO-eGFP, GU is gp67-ubiquitin-eGFP, UG is ubiquitin-gp67-eGFP, GS
is gp67-SUMO-eGFP, SG is SUMO-gp67-eGFP, and S-P is
SUMO-proline-GFP.
[0026] FIG. 14 is a series of micrographs of eGFP expression in
Hi-Five cells infected with different eGFP fusion baculoviruses.
Pictures were taken with a Leitz Fluovert inverted microscope with
excitation at 488 nm and a Hamamatsu Orca cooled CCD camera.
[0027] FIG. 15 contains stained SDS-polyacrylamide gels
representing the in vitro Ulp1 cleavage of Ni-NTA resin purified
His6SUMO-eGFP fusion proteins expressed in E. coli. The purified
His6SUMO-eGFP fusions, containing a different amino acid at the +1
position of the Ulp1 cleavage site, were incubated at 30° C.
for 3 hours with purified Ulp1 hydrolase. The lanes are marked with
the single letter code of the +1 amino acid. The negative control
(-Ve) is the incubation of His6SUMO-eGFP at 30° C. for 3
hours in the absence of enzyme. Low molecular weight markers (LMW)
are also provided.
[0028] FIG. 16 contains a pair of stained SDS-polyacrylamide gels
representing the effects of various conditions on Ulp1. Ni-NTA
purified His6SUMO-GFP was incubated with Ulp1 under the indicated
conditions for one hour at room temperature unless indicated
otherwise. Low molecular weight markers (LMW) are also
provided.
[0029] FIG. 17 is a stained SDS-polyacrylamide gel representing the
effects of various protease inhibitors on Ulp1. Ni-NTA purified
His6SUMO-GFP was incubated with Ulp1 and 10 mM of various protease
inhibitors for 1 hour at room temperature. Lane markers: Norm is
addition of Ulp1 and N-ethylmaleimide (NEM) to the substrate at the
same time, Pre is the incubation of Ulp1 with NEM prior to the
addition of substrate, +Ve is the absence of any inhibitor, -Ve is
in the absence of Ulp1, lane 1 is with E-64, lane 2 is with EDTA,
lane 3 is with leupeptin, lane 4 is with NEM, lane 5 is with
pepstatin, lane 6 is with TLCK. Low molecular weight markers (LMW)
are also provided.
[0030] FIG. 18 is a stained SDS-polyacrylamide gel showing
purification and cleavage of MAPKAP2. E. coli transformed with the
expression vector for SUMO-MAPKAP2 were either grown at 37°
C. and induced with 0.1 mM IPTG (lanes 2-7) or at 20° C. and
induced with 0.5 mM IPTG (lanes 8-13). Cell lysates were Ni-NTA
purified and separated by SDS-PAGE. Lane 1: BioRad low molecular
weight marker; lanes 2 and 8: soluble fraction of cell lysates;
lanes 3 and 9: flow through from Ni-NTA column; lanes 4 and 10: 15
mM imidazole wash of Ni-NTA column; lanes 5 and 11: 300 mM
imidazole elution of Ni-NTA column; lanes 6 and 12: supernatant of
2 hour incubation of elution with SUMO hydrolase at 30° C.;
and lanes 7 and 13: pellet of hydrolase incubation.
[0031] FIG. 19 is a stained SDS-polyacrylamide gel showing SUMO
hydrolase function at pH 7.5 and 8.0. Purified SUMO-GFP was cleaved
using 1/50 diluted purified stock of SUMO hydrolase in sodium
phosphate buffer pH 7.5 (lanes 1-6) and 8.0 (lanes 8-13) at room
temperature for the following length of times: lanes 1 and 8: 0
minutes, lanes 2 and 9: 1 min, lanes 3 and 10: 2.5 min, lanes 4 and
11: 5 min, lanes 5 and 12: 10 min, and lanes 6 and 13: 20 min. Lane
7 is blank and M is molecular weight markers.
[0032] FIG. 20 is a stained SDS-polyacrylamide gel indicating SUMO
hydrolase cleaves SUMO-β-Galactosidase. Purified SUMO
hydrolase was incubated with E. coli produced
SUMO-β-Galactosidase at room temperature for 0 minutes (lane
1), 2.5 min (lane 2), 5 min (lane 3), 10 min (lane 4), and 20 min
(lane 5). Molecular weight markers are provided in lane M.
[0033] FIG. 21 is a stained SDS-polyacrylamide gel showing the
cleavage of SUMO-GUS by SUMO Hydrolase in the presence of urea.
Ni-NTA purified SUMO-β-GUS was incubated with 1/50 dilution of
purified stock of SUMO hydrolase for 1 hour in increasing
concentrations of urea at pH 8.0. Lane markers: M is broad range
molecular weight marker; lane 1 is SUMO-GUS from soluble E. coli
fraction; lane 2: flow through from nickel column; lane 3: wash;
lane 4: elution; lanes 5-9: SUMO-GUS and hydrolase with various
denaturants, specifically, lane 5: none; lane 6: 1 mM DTT; lane 7:
0.5 M Urea; lane 8: 1.0 M Urea; lane 9: 2.0 M Urea.
[0034] FIG. 22 is a stained SDS-polyacrylamide gel demonstrating
the rapid isolation of a SUMO fusion protein. E. coli cells
expressing a single IgG binding domain from Protein G fused to
His6Smt3 were lysed with guanidinium chloride lysis buffer. Cell
lysate supernatants were purified over Ni-NTA and eluted in a
native buffer that allows for cleavage by Ulp1. Lane markers: PMW
is molecular weight markers; lane 1 is cellular proteins prior to
treatment with guanidinium chloride, lane 2 is guanidinium chloride
cell lysates, lane 3 is flow through from Ni-NTA column, lane 4 is
elution, and lane 5 is Ulp1 cleavage of elution.
[0035] FIG. 23 is the amino acid (SEQ ID NO: 1) and nucleotide
(SEQ ID NO: 2) sequences of SUMO.
[0036] FIGS. 24A and 24B are the amino acid (SEQ ID NO: 3) and
nucleotide (SEQ ID NO: 4) sequences of GFP.
[0037] FIGS. 25A and 25B are the amino acid (SEQ ID NO: 5) and
nucleotide (SEQ ID NO: 6) sequences of SUMO-GFP.
[0038] FIGS. 26A and 26B are the amino acid (SEQ ID NO: 7) and
nucleotide (SEQ ID NO: 8) sequences of ubiquitin-GFP.
[0039] FIGS. 27A and 27B are the amino acid (SEQ ID NO: 9) and
nucleotide (SEQ ID NO: 10) sequences of URM1-GFP.
[0040] FIGS. 28A and 28B are the amino acid (SEQ ID NO: 11) and
nucleotide (SEQ ID NO: 12) sequences of HUB1-GFP.
[0041] FIGS. 29A and 29B are the amino acid (SEQ ID NO: 13) and
nucleotide (SEQ ID NO: 14) sequences of RUB1-GFP.
[0042] FIGS. 30A and 30B are the amino acid (SEQ ID NO: 15) and
nucleotide (SEQ ID NO: 16) sequences of APG8-GFP.
[0043] FIGS. 31A and 31B are the amino acid (SEQ ID NO: 17) and
nucleotide (SEQ ID NO: 18) sequences of APG12-GFP.
[0044] FIGS. 32A and 32B are the amino acid (SEQ ID NO: 19) and
nucleotide (SEQ ID NO: 20) sequences of ISG15-GFP.
[0045] FIG. 33 is the amino acid (SEQ ID NO: 21) and nucleotide
(SEQ ID NO: 22) sequences of SUMO-Protein G.
[0046] FIGS. 34A, 34B, and 34C are the amino acid (SEQ ID NO: 23)
and nucleotide (SEQ ID NO: 24) sequences of SUMO-β GUS.
[0047] FIGS. 35A, 35B, and 35C are the amino acid (SEQ ID NO: 25)
and nucleotide (SEQ ID NO: 26) sequences of SUMO-LXRα.
[0048] FIGS. 36A and 36B are the amino acid (SEQ ID NO: 27) and
nucleotide (SEQ ID NO: 28) sequences of SUMO-Tyrosine Kinase.
[0049] FIGS. 37A and 37B are the amino acid (SEQ ID NO: 29) and
nucleotide (SEQ ID NO: 30) sequences of SUMO-MAPKAP2 Kinase.
[0050] FIGS. 38A, 38B, 38C, 38D, and 38E are the amino acid (SEQ ID
NO: 31) and nucleotide (SEQ ID NO: 32) sequences of SUMO-β
GAL.
[0051] FIG. 39 is a circular map of YEpSUMO-eGFP.
[0052] FIGS. 40A, 40B, 40C, 40D, and 40E are the nucleotide
sequence (SEQ ID NO: 33) of YEpSUMO-eGFP. Select restriction enzyme
sites are indicated.
[0053] FIG. 41 is a circular map of YEpUbGUS.
[0054] FIGS. 42A, 42B, 42C, 42D, 42E, 42F, and 42G are the
nucleotide sequence (SEQ ID NO: 34) of YEpUbGUS. Select
restriction enzyme sites are indicated.
[0055] FIG. 43 is a circular map of pFastBac SUMO-eGFP.
[0056] FIGS. 44A, 44B, 44C, 44D, and 44E are the nucleotide
sequence (SEQ ID NO: 35) of pFastBac SUMO-eGFP. Select restriction
enzyme sites are indicated.
[0057] FIG. 45 is a circular map of pSUMO (pET24d6HisxSUMO).
[0058] FIGS. 46A, 46B, 46C, 46D, and 46E are the nucleotide
sequence (SEQ ID NO: 36) of pSUMO (pET24d6HisxSUMO). Select
restriction enzyme sites are indicated.
DETAILED DESCRIPTION OF THE INVENTION
[0059] There are a number of reasons for the lack of efficient
recombinant protein expression in a host, including, for example,
short half life, improper folding or compartmentalization and codon
bias. While the Human Genome project has successfully created a DNA
"map" of the human genome, the development of protein expression
technologies that function uniformly in different expression
platforms and for all the protein motifs has not yet been
achieved.
[0060] In accordance with the present invention, it has been
discovered that N-terminal fusion of the ubiquitin homologue
SUMO or Smt3 to otherwise unexpressed or poorly expressed proteins
remarkably enhances the expression levels of biologically active
proteins in both prokaryotes and eukaryotes. The Ubiquitin-Like
protein (UBL) family contains many proteins, including for example,
SUMO, Rub1, Hub1, ISG15, Apg12, Apg8, Urm1, Ana1a and Ana1b (15,
28). See Table 1. The hallmark of all of these proteins, except
APG12 and URM1, is that they are synthesized as precursors and
processed by a hydrolase (or proteases) to generate a mature
carboxy-terminal sequence. Secondly, all of the UBLs share a common
structure.
[0061] In E. coli, fusion proteins remained intact while in yeast
or insect cells fusion proteins were efficiently cleaved, except
when proline was the N-terminal residue of the target protein.
While any of the UBLs set forth in Table 1 may be utilized in the
compositions and methods of the invention to enhance expression of
heterologous fusion proteins of interest, SUMO is exemplified in
the gene fusion system provided herein.
TABLE-US-00001 TABLE 1 Properties of Ubiquitin-like Proteins (UBLs)
UBL (yeast) | Function | Knockout phenotype | Substrate | % UB Identity | KDa | Hydrolase | COOH Residues
UB | Translocation to proteasome for degradation. | not viable | many | 100 | 8.5 | UCH/UBPs | LRLRGG (SEQ ID NO: 39)
SUMO (SMT3) | Translocation to nucleus | not viable | Sentrins, RanGap, others | 18 | 11.6 | Aut1/Aut2 | GG
RUB1 (NEDD8) | Regulation of mitosis. | viable; non-essential. | cullins, cytoskelet. proteins | 60 | 8.7 | not known | GG
HUB1 | Cell polarization during mating projections. | viable; deficient in mating. | Sph1, Hbt1 cell polarity factors | 22 | 8.2 | not known | YY
ISG-15 (UCRP) | Unknown | IFN, LPS hypersensitivity; death | many | ~30; 28 (two domains) | 15.0 | UBP43 (USP18) | LRLRGG (SEQ ID NO: 39)
APG12 | Autophagy | viable, defective in autophagy | Apg5 | 18 | 21.1 | not cleaved | FG
URM1 | Unknown | ts growth; non-essential. | unknown | 20 | 11.0 | not known | GG
APG8 (LC3) | Autophagy | viable; no autophagocytosis or sporulation | phosphatidylethanolamine | 18 | 13.6 | Apg4/Aut2 | FG
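The properties listed in Table 1 lend themselves to simple lookups. The following is an illustrative sketch only: the record layout and the field names `ubl`, `kda`, `hydrolase` and `cooh` are assumptions introduced here, while the values are transcribed from Table 1.

```python
# Table 1 held as a list of records. Field names are illustrative
# assumptions; the values are transcribed from Table 1.
TABLE_1 = [
    {"ubl": "UB", "kda": 8.5, "hydrolase": "UCH/UBPs", "cooh": "LRLRGG"},
    {"ubl": "SUMO", "kda": 11.6, "hydrolase": "Aut1/Aut2", "cooh": "GG"},
    {"ubl": "RUB1", "kda": 8.7, "hydrolase": "not known", "cooh": "GG"},
    {"ubl": "HUB1", "kda": 8.2, "hydrolase": "not known", "cooh": "YY"},
    {"ubl": "ISG-15", "kda": 15.0, "hydrolase": "UBP43 (USP18)", "cooh": "LRLRGG"},
    {"ubl": "APG12", "kda": 21.1, "hydrolase": "not cleaved", "cooh": "FG"},
    {"ubl": "URM1", "kda": 11.0, "hydrolase": "not known", "cooh": "GG"},
    {"ubl": "APG8", "kda": 13.6, "hydrolase": "Apg4/Aut2", "cooh": "FG"},
]


def ubls_with_cooh(suffix):
    """Names of UBLs whose listed COOH residues end with `suffix`."""
    return [row["ubl"] for row in TABLE_1 if row["cooh"].endswith(suffix)]


def ubls_with_listed_hydrolase():
    """Names of UBLs for which Table 1 names a hydrolase."""
    unlisted = ("not known", "not cleaved")
    return [row["ubl"] for row in TABLE_1 if row["hydrolase"] not in unlisted]


if __name__ == "__main__":
    print(ubls_with_cooh("GG"))
    print(ubls_with_listed_hydrolase())
```

For example, `ubls_with_cooh("GG")` selects the entries whose carboxy terminus ends in the di-glycine motif.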
[0062] The SUMO fusion system of the present invention has been
successfully applied to express proteins of widely differing
molecular weights, from the 6 kDa Protein G domain to the 110 kDa
β-galactosidase, in E. coli and eukaryotic cells. More
specifically, the system allows
one to: (1) enhance the expression of under-expressed proteins; (2)
increase the solubility of proteins that are insoluble; (3) protect
candidate proteins from degradation by intracellular proteases by
fusing UBLs to their N-termini; (4) cleave the fusion protein to
efficiently generate authentic proteins using naturally-present
enzymes; (5) generate proteins with novel amino termini; and (6)
cleave all fusion proteins with remarkable efficiency irrespective
of the N-terminal sequence of the fused protein, using UBL
hydrolases such as the SUMO hydrolase Ulp1. Because UBLs are small
molecular weight proteins (~100 amino acids), they can also
be used as purification tags. These remarkable properties
of UBLs make them excellent candidates for enhancing expression and
solubility of proteins. The method may also be utilized to generate
novel amino termini on proteins of interest for a variety of
research, diagnostic and therapeutic applications.
[0063] The ultimate fate of ubiquitinated or sumoylated proteins
within a cell varies. A protein can be monoubiquitinated or
polyubiquitinated. Ubiquitination of protein has multiple functions
and gives rise to different fates for the protein within a cell
(11). Ubiquitination primarily targets proteins to the 26S proteasome
for degradation (13). On the other hand, sumoylation of target
proteins does not lead to degradation, but, rather, leads directly
or indirectly to altered localization of proteins (15). There are
about 17 deubiquitinating enzymes that cleave conjugated ubiquitin
from target proteins as well as ubiquitin-ubiquitin and ubiquitin
artificial-fusion proteins (1, 35). Thus far it appears that yeast
has two cysteinyl proteases, called Ulp1 and Ulp2, that remove SUMO
from ε-amino groups of lysine as well as from artificial
linear SUMO fusions (20, 21).
[0064] To determine if UBLs and SUMO fusion will enhance expression
of recombinant proteins of different sizes and function, we have
designed several UBL-GFP fusion proteins in addition to SUMO-fusion
proteins and monitored their expression levels in E. coli, yeast
and insect cells. In E. coli, the proteins are expressed as intact
fusions, while in eukaryotes, the fusions were efficiently cleaved.
A dramatic increase in the yield of proteins after fusion with SUMO
and expression in E. coli was observed. In additional studies,
SUMO-GFP protein was used as a model fusion for detailed studies in
yeast and insect cells. We have designed SUMO-GFP fusions in which
the N-terminal methionine residue of GFP has been replaced with each
of the other 19 amino acids. We have purified 20 SUMO-GFP fusion proteins
from E. coli and cleaved them in vitro with Ulp1. Ulp1 efficiently
cleaved 19 out of the 20 possible amino acid junctions. The proline
junction was not cleaved. As compared to deubiquitinating enzyme
(3), Ulp1 demonstrated broad specificity and robustness in its
digestion properties. Proteins having a wide range of molecular
weights were cleaved efficiently by Ulp1. Similarly, in yeast, and
insect cells, the fusion proteins were efficiently processed,
yielding intact, biologically active proteins. In addition to
enhancing protein expression levels, the SUMO-fusion approach can
be used to advantage to generate desired N-termini to study novel
N-terminal protein functions in the cell. Since SUMO fusion can
both enhance recombinant protein yield and generate new N-termini,
this technology provides an important tool for post-genomic
biotechnology analyses.
[0065] The present invention also encompasses kits for use in
effecting enhanced expression, secretion, purification,
localization, and alteration of the amino terminus of a protein of
interest. Such kits comprise a recombinant vector containing a
nucleic acid sequence encoding a UBL molecule selected from the
group of SUMO, RUB, HUB, URM1, and ISG15 operably linked to a
promoter suitable for expression in the desired host cell and a
multiple cloning site suitable for cloning a nucleic acid encoding
the protein of interest in-frame with the nucleic acid sequence
encoding the UBL molecule. The promoter is preferably a strong
promoter and may be constitutive or regulated. Such promoters are
well known in the art and include, but are not limited to, the
promoters provided hereinbelow such as the ADH1, T7, and CUP1
promoters.
[0066] The recombinant vector may also contain a nucleic acid
sequence encoding a purification tag in-frame with the sequence
encoding the UBL molecule. Purification tags are well known in the
art (see Sambrook et al., 2001, Molecular Cloning, Cold Spring
Harbor Laboratory) and include, but are not limited to:
polyhistidine, glutathione-S-transferase, maltose binding protein,
thioredoxin, the FLAG.TM. epitope, and the c-myc epitope. Materials
and methods for the purification of fusion proteins via
purification tags are also well known in the art (see Sambrook et
al., Novagen catalog, 2002, examples hereinbelow). Reagents
including, but not limited to, solid supports capable of binding
the purification tag, lysis buffers, wash buffers, and elution
buffers may also be included in the kits.
[0067] The kits may further comprise a composition comprising a
protease or proteases capable of cleaving the UBL molecule from the
fusion protein, cleavage buffers, frozen stocks of host cells, and
instruction manuals. The kits may also further comprise reagents
for altering the nucleic acid encoding a protein of interest to
generate amino termini which are different from those native to the
wild-type protein. Methods for altering the nucleic acid are well
known in the art and include, but are not limited to, site-directed
mutagenesis and oligonucleotide-based site-directed mutagenesis
(see BD Biosciences Catalog, 2001; Qiagen Catalog, 2001; Ausubel et
al., eds., 1995, Current Protocols in Molecular Biology, John Wiley
and Sons, Inc.).
[0068] As used herein, an "instructional material" includes a
publication, a recording, a diagram, or any other medium of
expression which can be used to communicate the usefulness of the
composition of the invention for performing a method of the
invention. The instructional material of the kit of the invention
can, for example, be affixed to a container which contains a kit of
the invention to be shipped together with a container which
contains the kit. Alternatively, the instructional material can be
shipped separately from the container with the intention that the
instructional material and kit be used cooperatively by the
recipient.
[0069] The materials and methods set forth below are provided to
facilitate the practice of the present invention.
Design and Construction of E. coli Expression Vectors:
[0070] The original vector backbone was developed using pET 24d
vector from Novagen (see FIG. 3 as well as FIGS. 45-46A-E). pET24d
uses a T7 promoter system that is inducible with IPTG. The vector
has a kanamycin selection marker and does not contain any
translation terminator.
Construction of Variable His6SUMO-GFP Fusions:
[0071] An N-terminal six-His-tagged SUMO fusion vector was
constructed as follows. A PCR product was generated with the
primers 5'-CCATGGGTCATCACCATCATCATCACGGGTCGGACTCAGAAGTCAATCAA-3'
(SEQ ID NO: 40) and 5'-GGATCCGGTCTCAACCTCCAATCTGTTCGCGGTGAG-3' (SEQ
ID NO:41) using yeast Smt3 gene (16) as a template (kind gift of
Erica Johnson). The PCR fragment was double digested with Nco I and
Bam HI, and then ligated into pET24d, which had been similarly
digested. It is important to note that the current invention
utilizes a variant of the wild type yeast SUMO sequence. The A
nucleotide at position 255 has been replaced with a G nucleotide,
thus encoding an alanine instead of a threonine (SEQ ID NOS: 1 and
2). The detailed cloning strategy is provided in FIG. 2. The pET24d
His6Smt3eGFP fusions, containing each of the twenty different amino
acids at the +1 position of the cleavage site were generated as
follows. The eGFP sequence was amplified from a template with the
primers 5'-GGTCTCAAGGTNNNGTGAGCAAGGGCGAGGAGC-3' (SEQ ID NO:42) and
5'-AAGCTTATTACTTGTACAGCTCGTCCATGCC-3' (SEQ ID NO: 43), where the
NNN in the forward primer corresponds to the variable codon
encoding one of the twenty amino acids. The PCR products were
purified, double digested with Bsa I and Hind III, and
then ligated into the pET24dHisSUMO vector, which had been similarly
digested. Plasmids from clones containing the variable inserts
were sequenced to confirm the presence of the novel codon in
each.
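The set of twenty forward primers described above can be enumerated mechanically. The sketch below is illustrative only: the fixed 5' and 3' portions are taken from SEQ ID NO:42, but the particular codon chosen for each amino acid is an assumption, since the codons actually used are not listed here.

```python
# One representative codon per amino acid (standard genetic code).
# ASSUMPTION: these codon choices are illustrative; the codons actually
# used in the constructs are not stated in the text.
CODONS = {
    "A": "GCG", "C": "TGC", "D": "GAT", "E": "GAA", "F": "TTT",
    "G": "GGC", "H": "CAT", "I": "ATT", "K": "AAA", "L": "CTG",
    "M": "ATG", "N": "AAC", "P": "CCG", "Q": "CAG", "R": "CGT",
    "S": "AGC", "T": "ACC", "V": "GTG", "W": "TGG", "Y": "TAT",
}

# Fixed parts of the forward primer, from SEQ ID NO:42.
PREFIX = "GGTCTCAAGGT"           # Bsa I site (GGTCTC), one base, then AGGT
EGFP_5P = "GTGAGCAAGGGCGAGGAGC"  # eGFP sequence following the NNN codon


def variable_primers():
    """Map each amino acid (one-letter code) to its forward primer."""
    return {aa: PREFIX + codon + EGFP_5P for aa, codon in CODONS.items()}


if __name__ == "__main__":
    for aa, primer in sorted(variable_primers().items()):
        print(aa, "5'-" + primer + "-3'")
```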
Construction of SUMO-Fusion Vectors from pSUMO:
[0072] The gene encoding the protein of interest is cloned in frame
with the SUMO tag, in the pSUMO vector, by utilizing the encoded
Bsa I site. Bsa I belongs to the family of Class IIS restriction
enzymes, which recognize non-palindromic sequences, and cleave at a
site that is separate from their recognition sequences. The latter
trait gives Class IIS enzymes two useful properties. First, when a
Class IIS enzyme recognition site is engineered at the end of a
primer, the site is cleaved when digested. Second, overhangs
created by Class IIS enzymes are template-derived and thus unique.
This is in clear contrast to regular Class II restriction enzymes
such as EcoRI, which creates an enzyme-defined overhang that will
ligate to any EcoRI-digested end. The unique overhangs produced by
Class IIS enzymes can be ligated only to their original
partner.
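The template-derived overhang produced by Bsa I can be illustrated with a short calculation. This is a minimal sketch, assuming the commonly documented Bsa I cut positions (one nucleotide past the GGTCTC site on the top strand and five on the bottom strand, leaving a four-nucleotide 5' overhang); the function name is hypothetical and only a forward-oriented site is handled.

```python
BSAI_SITE = "GGTCTC"  # Bsa I recognition sequence


def bsai_top_strand_cut(seq: str) -> tuple[str, str]:
    """Return (overhang, downstream top strand) for the first forward Bsa I site.

    ASSUMPTION: the top strand is cut 1 nt past the site and the bottom
    strand 5 nt past it, so the 4 nt in between form the 5' overhang of
    the downstream fragment. The overhang is read from the template,
    not fixed by the enzyme.
    """
    seq = seq.upper()
    site_start = seq.find(BSAI_SITE)
    if site_start < 0:
        raise ValueError("no forward Bsa I site found")
    cut = site_start + len(BSAI_SITE) + 1  # top strand: 1 nt past the site
    if len(seq) < cut + 4:
        raise ValueError("sequence too short to contain the 4-nt overhang")
    return seq[cut:cut + 4], seq[cut:]


if __name__ == "__main__":
    overhang, fragment = bsai_top_strand_cut("GGTCTCAAGGTATGACC")
    print(overhang, fragment)
```

Changing the bases that follow the site changes the overhang, which is the property that distinguishes Class IIS enzymes from enzymes such as EcoRI.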
[0073] It is often preferable to amplify the gene encoding the
protein of interest via PCR prior to cloning into the pSUMO vector.
The forward primer must contain the additional standard
sequence:
[0074] 5'-GGTCTCAAGGTNNN-3'(SEQ ID NO:44) where GGTCTC is the Bsa I
site and NNN is the first codon of the gene encoding the protein of
interest. Additional nucleotides are required for the primer to
anneal specifically with the gene of interest during the PCR
amplification. The reverse primer may contain another restriction
enzyme such as Xho I to allow for directional cloning of a gene
into pSUMO. Bsa I can also be employed in the reverse primer to
simplify cloning steps, for example, in the following primer:
TABLE-US-00002 (SEQ ID NO:45) 5'-GGTCTCCTCGAGTTANNN-3'
The PCR product can be digested with both Xho I and Bsa I. A
digestion reaction containing just the latter enzyme generates a
product that would directionally ligate into the pSUMO vector
between the Bsa I and Xho I sites of the MCS.
Construction of pSUMO-Protein G Fusion E. coli Expression Vector:
[0075] The B2 IgG binding domain (9) from Streptococcus G148
protein was synthesized from three synthetic oligonucleotides. The
sequence of the gene is 5'-GT CTTAAGA CTA AGA GGT GGC ACG CCG GCG
GTG ACC ACC TAT AAA CTG GTG ATT AAC GGC AAA ACC CTG AAA GGC GAA ACC
ACC-3'. (SEQ ID NO:46) The 81 bps oligo sequence is 5'-GCC GTT ATC
GTT CGC ATA CTG TTT AAA CGC TTT TTC CGC GGT TTC CGC ATC CAC CGC TTT
GGT GGT TTC GCC TTT CAG-3'. (SEQ ID NO:47) The 86 bps oligo
sequence is 5'-CAG TAT GCG AAC GAT AAC GGC GTG GAT GGC GTG TGG ACC
TAT GAT GAT GCG ACC AAA ACC TTT ACC GTG ACC GAA TAA GGT ACC
CC-3' (SEQ ID NO:48). The bolded nucleotides refer to the AflII and
KpnI sites that flank the protein G domain. ACG encodes the first amino
acid residue of the domain. The above three oligos were annealed
using the Life Technologies protocol. The annealed fragments were
extended by Pol I enzyme. The resultant gene was PCR amplified by
the following oligo primers G1 forward 5'-CTT GTC TTA AGA GGT-3'
(SEQ ID NO:49) and G2 reverse primer 5'-GCT GGG TAC CTT ATT CGG
TCA-3' (SEQ ID NO:50). The above protein G gene was cloned at the
AflII and KpnI sites of the human ubiquitin gene and expressed as
ubiquitin-protein G fusion protein in an E. coli pET 22 expression
vector (Novagen). The protein G sequence was in turn amplified from
the ubiquitin-protein G fusion plasmid by using the primers
5'-GGTCTCAAGGTACGCCGGCGGTGACCACCT-3'(SEQ ID NO:51) and
5'-AAGCTTATTATTCGGTCACGGTAAAGGTTT-3'(SEQ ID NO:52) and inserted in
pSUMO to generate pSUMO-protein G expression vector.
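The way the three synthetic oligonucleotides tile the protein G gene can be checked by computing their overlaps. The sketch below is illustrative: it treats the first and third oligos as top strand and the second as bottom strand, which is an inference from the sequences rather than something stated in the text, and the helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def overlap(left, right):
    """Length of the longest suffix of `left` that is a prefix of `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:k]):
            return k
    return 0


# Oligo sequences copied from SEQ ID NOS: 46-48 (spaces removed).
OLIGO_1 = ("GT CTTAAGA CTA AGA GGT GGC ACG CCG GCG GTG ACC ACC TAT AAA CTG "
           "GTG ATT AAC GGC AAA ACC CTG AAA GGC GAA ACC ACC").replace(" ", "")
OLIGO_2 = ("GCC GTT ATC GTT CGC ATA CTG TTT AAA CGC TTT TTC CGC GGT TTC CGC "
           "ATC CAC CGC TTT GGT GGT TTC GCC TTT CAG").replace(" ", "")
OLIGO_3 = ("CAG TAT GCG AAC GAT AAC GGC GTG GAT GGC GTG TGG ACC TAT GAT GAT "
           "GCG ACC AAA ACC TTT ACC GTG ACC GAA TAA GGT ACC CC").replace(" ", "")


def assemble():
    """Return (overlap 1-2, overlap 2-3, assembled top strand)."""
    middle = revcomp(OLIGO_2)  # ASSUMPTION: oligo 2 is the bottom strand
    k12 = overlap(OLIGO_1, middle)
    k23 = overlap(middle, OLIGO_3)
    return k12, k23, OLIGO_1 + middle[k12:] + OLIGO_3[k23:]


if __name__ == "__main__":
    k12, k23, gene = assemble()
    print(k12, k23, len(gene))
```

Under these assumptions the assembled top strand carries the AflII site (CTTAAG) near its 5' end and the KpnI site (GGTACC) near its 3' end, consistent with the flanking sites described above.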
Construction of E. coli SUMO-β-Galactosidase Expression
Vector.
[0076] E. coli β-galactosidase was amplified using Pfu polymerase
(Stratagene), with a preparation of genomic DNA from BL21(DE3)
(Stratagene) as a template, and the primers
5'-GGTCTCAAGGTATGACCATGATTACGGATTCACT-3' (SEQ ID NO:53) and
5'-AAGCTTATTATTATTATTTTTGACACCAGACC-3'(SEQ ID NO:54). The PCR
products were purified and double digested with Bsa I and Hind III.
These were then ligated into the vector pET24d6xHisSUMO, which had
been similarly digested.
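The forward primer used here follows the standard pSUMO convention of prepending GGTCTCAAGGT to the 5' end of the gene. As a minimal sketch, assuming a hypothetical helper `forward_primer` and an annealing length chosen only to reproduce SEQ ID NO:53, the construction is:

```python
BSAI_PREFIX = "GGTCTCAAGGT"  # standard 5' addition (SEQ ID NO:44 without NNN)


def forward_primer(gene_5prime, anneal_len):
    """Prepend the pSUMO prefix to the first `anneal_len` bases of a gene.

    `anneal_len` is an illustrative parameter: the text only says that
    additional nucleotides are needed for specific annealing.
    """
    gene = gene_5prime.upper().replace(" ", "")
    if anneal_len < 3 or len(gene) < anneal_len:
        raise ValueError("gene sequence shorter than requested annealing length")
    return BSAI_PREFIX + gene[:anneal_len]


if __name__ == "__main__":
    # Gene-specific portion copied from SEQ ID NO:53.
    lacz_start = "ATG ACC ATG ATT ACG GAT TCA CT"
    print(forward_primer(lacz_start, 23))
```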
Construction of E. coli pSUMO-Liver X Receptor (LXR) Expression
Vector:
[0077] The PCR product of LXR spanning amino acid residue 189 to
the end of the protein, which covers the ligand binding domain, was
digested with BsaI and HindIII and ligated into the pSUMO vector,
also digested with BsaI and HindIII.
Construction of E. coli pSUMO-MAPKAP2 Expression Vector:
[0078] The fragment of MAPKAP2, encoded in the plasmid pMON45641,
was amplified by PCR and cloned into pET24d 6HisSUMO vector by
designing PCR primers that flank the sequence shown in FIGS. 8A and
8B. The SUMO vector was digested with Bsa I and Hind III. The
cloning procedure yields a fusion protein, which, upon expression,
purification and cleavage, generates the desired protein whose
first amino acid is a glutamine (CAG).
Construction of E. coli pSUMO-Tyrosine Kinase Expression
Vector:
[0079] For the tyrosine kinase, both SUMO fusion and unfused
expression vectors were designed. As described above, the kinase
region was amplified by PCR with flanking BsaI and Hind III sites
and cloned into similarly digested pSUMO.
Construction of E. coli pSUMO-β-Glucuronidase Expression
Vector:
[0080] E. coli β-glucuronidase (a kind gift of Ben Glick,
University of Chicago) was amplified with the primers
TABLE-US-00003 (SEQ ID NO:55)
5'-GGTCTCAAGGTATGCAGATCTTCGTCAAGACGTT-3' and (SEQ ID NO:56)
5'-AAGCTTATTATTGTTTGCCTCCCTGCTGCG-3'.
Construction of E. coli SUMO-Hydrolase Expression Vector:
[0081] C-terminal His-tagged SUMO hydrolase/protease Ulp1(403-621)p
(21) (27) was expressed from pET24d in Rosetta(DE3) pLysS (Novagen).
The recombinant protein was purified using Ni-NTA agarose (Qiagen)
and buffer exchanged into 20 mM Tris-HCl pH 8.0, 150 mM NaCl and 5
mM β-mercaptoethanol using a PD-10 column (AP Biotech). About
2 µg of the pure protein was analyzed on gels and the data are shown in
FIG. 6, lane Ulp1. The protein was almost 90% pure as judged by
SDS-PAGE analysis.
Construction of E. coli UBL-GFP Fusion Vectors.
[0082] DNA sequences encoding ubiquitin (Ub), SUMO, Urm1, Hub1,
Rub1, Apg8, and Apg12 were PCR-amplified using Deep-Vent polymerase
(NEB) and yeast strain DNA to generate a template. Full-length
human ISG15 cDNA was a kind gift of Dr. A. Haas, Medical College of
Wisconsin, Milwaukee. A unique NcoI site followed by 6His sequence
was introduced by PCR at the 5'-end of each Ubl cDNA. Primer
sequence at the 3'-end included unique Esp3I and HindIII sites. PCR
products were digested with NcoI/HindIII and inserted into
respective sites of pET24d vector (Novagen) as described above.
Full length GFP sequence (Clontech Cat # 60610-1) flanked by Esp3I
and HindIII sites, respectively, was PCR-amplified and cloned into
pCR4-TOPO-TA vector (Invitrogen). Esp3I/HindIII digested
GFP-encoding gene was inserted into respective sites of pET24d-UBL
plasmids, creating final UBL-GFP expression vectors for E. coli. In
toto, there were nine plasmid constructs coding for the following
structures: 6His-Ubl-GFP. All plasmids were sequenced to confirm
the expected structure.
Design and Construction of Yeast UBL-Fusion Vectors:
[0083] Saccharomyces cerevisiae has been used as a eukaryotic model
for all the experiments involving yeast. All of the expression
vectors for these studies were designed on multicopy yeast vectors
that contain tryptophan or leucine as a selectable marker and 2µ
as an origin of replication (22). Proteins were expressed as unfused
products or as ubiquitin, SUMO or other UBL fusion proteins.
Construction of the .beta.-Glucuronidase Yeast Expression
Vectors:
[0084] To demonstrate that UBLs increase the level of secretion of
the protein to the media, in addition to enhancing the level of
expression, expression vectors were constructed with and without
ubiquitin. We have also compared ubiquitin fusion and SUMO fusion
using GFP as a model protein (see FIG. 9 and FIG. 10). pRS425-GUS
plasmid was produced by cloning the XhoI-SacI fragment (containing
E. coli .beta.-Glucuronidase (GUS)) from plasmid pGUS1 (25, 22)
into the XhoI-SacI sites of plasmid pRS425 (32). The next
construction involved addition of a promoter, and resulted in the
plasmid pRS425-ADH1p-GUS. The fragment XhoI-HindIII (containing the
ADH1) was inserted into the XhoI-HindIII sites of the plasmid
pRS425-GUS. The ADH1 promoter XhoI-HindIII fragment was cloned
using polymerase chain reaction (PCR), amplifying the ADH1 promoter
from the plasmid pGRIP1 (37). The following primers were used to
amplify the full length ADH1 promoter: ADH1-XhoI:
5'-gctcgagagcagatgcttcgttg-3'(SEQ ID NO:57), and ADH1-HindIII:
5'-gcaaagcttggagttgattgtatgc-3'(SEQ ID NO:58). The underlining
indicates the nucleotide sequence of the XhoI and HindIII
restriction sites. PCR of the DNA fragment involved amplification
in 30 cycles (96° C.-30 sec., 54° C.-1 min. and
72° C.-3 min.) using high replication fidelity Deep Vent
Polymerase (New England Biolabs). The PCR product was then digested
with XhoI and HindIII, and subsequently cloned into the
XhoI-HindIII sites of pRS425-GUS. Construction of the next set of
plasmids involved a change in promoter. The following two plasmids
were constructed to give expression vectors containing either a
methionine or proline junction between the ubiquitin and the GUS.
pRS425-GPDp-Ub(Methionine)-GUS and pRS425-GPDp-Ub(Proline)-GUS were
similarly constructed using both pre-constructed plasmids and PCR
amplification. The final expression construct was
pRS425-CUP1p-SUMO-GUS, which was the only plasmid produced with the
CUP1, copper regulated promoter. This plasmid was digested with the
enzymes BglII and NsiI, releasing the CUP1 promoter (6). The CUP1
fragment was then ligated to pRS425-GPDp-Ub-GUS, having also been
digested with BglII-NsiI.
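The thermocycling conditions given above for amplification of the ADH1 promoter (30 cycles of 96° C. for 30 sec., 54° C. for 1 min. and 72° C. for 3 min.) can be written down as a small program description. In this illustrative sketch the step names "denature", "anneal" and "extend" are assumptions based on conventional PCR usage, and ramp times and any initial or final holds are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str       # conventional step label (assumed, not from the text)
    temp_c: float   # block temperature in degrees Celsius
    seconds: int    # hold time


# Temperatures and times as stated for the ADH1 promoter amplification.
ADH1_PROGRAM = (
    Step("denature", 96, 30),
    Step("anneal", 54, 60),
    Step("extend", 72, 180),
)
CYCLES = 30


def cycling_minutes(steps, cycles):
    """Total hold time in minutes, ignoring ramping between steps."""
    return cycles * sum(step.seconds for step in steps) / 60


if __name__ == "__main__":
    print(cycling_minutes(ADH1_PROGRAM, CYCLES))
```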
Construction of SUMO-N-GFP Yeast Expression Vector:
[0085] To determine what variety of N-terminal variant amino acids
at the junction of SUMO and GFP can be cleaved in yeast we designed
SUMO-GFP vectors in which all 20 amino acid residues were encoded
at the N-terminus of GFP. Essentially all 20 SUMO-X-GFP vectors
designed for E. coli expression were digested with Bsa I-Hind III,
and the inserts were purified. The 20 inserts were cloned in Yep12
that was slightly modified. Specifically, YepSW was generated by
digesting Yep12 with Bam HI and SacI. The CUP1 promoter region was
recovered from the fragment by PCR. A polylinker was created at the
3' end of CUP1 with a variety of restriction sites including NcoI
and XhoI. All 20 SUMO-GFPs (N end variants) were digested with
NcoI-XhoI enzymes and cloned directly into YepSW. The resultant vector
YepSW-SUMO-eGFP utilizes tryptophan selection and expresses
SUMO-GFP proteins under the control of the copper promoter. All
vectors were sequenced to ensure correct codons at the junction of
SUMO and GFP.
Construction of UBL-GFP Fusion Yeast Expression Vectors:
[0086] Construction of the UBL-GFP fusion vectors for E. coli has
been described above. In order to make the UBL yeast expression
vectors, NcoI/XhoI fragments carrying GFP alone and all the Ubl-GFP
fusions were inserted into respective sites of YepSW (see above) that
was similarly digested with NcoI/XhoI. Insertion of the UBL-GFP
cassette in YepSW (see FIGS. 39 and 40A-40F) allows copper-inducible
expression of Ubl-GFP fusions in the yeast system.
Design and Construction of Recombinant Baculovirus for SUMO and
Ubiquitin GFP Fusion Expression:
[0087] To demonstrate that attachment of SUMO or ubiquitin to GFP
increases its expression and enhances secretion into the media,
several GFP fusion vectors were designed with different
configurations of gp67 secretory signals. The basic GFP vector for
expression is essentially based on E. coli vectors described above.
Derivatives of this vector representing each candidate gene have
been constructed by designing PCR primers. The construction of GFP
plasmid transfer vectors for baculovirus is described. To help
appreciate the rationale for the secretory signal in the context of
GFP-fusion, see the diagrammatic representation shown in FIG. 11.
Single letter code refers to unfused GFP (E); gp67-sec signal-GFP
(G); ubiquitin-GFP (U); SUMO-GFP (S); gp67-Ub-GFP (GU); Ub-gp67-GFP
(UG); gp67-SUMO-GFP (GS); and SUMO-gp67-GFP (SG).
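The letter codes listed above map directly onto the order of the fused elements. As an illustrative sketch only (the function name and the shortened label "gp67" for the gp67 secretory signal are assumptions), the mapping can be expressed as:

```python
# Element abbreviations used in the construct codes.
PARTS = {"G": "gp67", "U": "Ub", "S": "SUMO"}

# The eight codes listed in the text, in order.
CODES = ["E", "G", "U", "S", "GU", "UG", "GS", "SG"]


def construct_name(code):
    """Expand a letter code into an N-to-C terminal element order."""
    if code == "E":  # E denotes unfused GFP
        return "GFP"
    return "-".join(PARTS[letter] for letter in code) + "-GFP"


if __name__ == "__main__":
    for code in CODES:
        print(code, construct_name(code))
```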
[0088] (i) pFastbacE. A synthetic oligonucleotide containing the
Esp3I site was inserted between BamHI and EcoRI cloning site of the
transfer vector pFastbac1, which had been modified by removing
Esp3I site from Gmr region. (ii) pFastbacG. The signal sequence of
the gp67 gene derived from pACSecG2T was isolated by PCR using 2
primers (f-gp67 and r-gp67), digested with BglII and EcoRI in the
next step, and then inserted between BamHI and EcoRI cloning sites
of the transfer vector pFastbacE. (iii) pFastbacS. A full-length
SUMO gene derived from pET SUMO was generated by PCR using 2
primers (f-bacsmt and r-bacsmt), digested with BsaI and EcoRI in
the next step, and then inserted between BamHI and EcoRI cloning
sites of the transfer vector pFastbacE. (iv) pFastbacG/S. The
signal sequence of the gp67 gene in the pACSecG2T vector was
generated by PCR using 2 primers (f-fusgp67 and r-fusgp67), and
inserted between BamHI and EcoRI cloning sites of the transfer
vector pFastbacE to create a new pFastbacG, which was used for
fusion with SUMO afterward. A full-length SUMO gene derived from
pET SUMO as described above (iii) was digested with BsaI and SacI
and inserted between Esp3I and SacI cloning sites of the new
transfer vector pFastbacG. (v) pFastbacS/G. A full-length SUMO gene
derived from pET SUMO was generated by PCR using 2 primers
(f-fussmt3 and r-fusgp67) and inserted between BamHI and EcoRI
cloning sites of the transfer vector pFastbacE to create the new
pFastbacS, used for fusion with gp67 afterward. The signal sequence
of the gp67 gene derived from pACSecG2T as described above (ii) was
digested with BsaI and SacI, and then inserted between the Esp3I
and SacI cloning sites of the new transfer vector pFastbacS.
Preparation of Baculovirus Stocks and Cell Growth.
[0089] Transfer vector constructs based on the pFastbac 1 shuttle
plasmid (Invitrogen, Inc.) were transposed in DH10Bac E. coli
competent cells to transfer the respective e-GFP fusion sequences
into recombinant virus DNA by site-specific integration. After
alkaline lysis of transformed E. coli cells (white colonies),
which contain recombinant virus (bacmid) DNA, and extraction of the
recombinant bacmid DNA, the bacmid DNA was used to transfect
Spodoptera frugiperda (Sf9) insect cells, in which virus
replication occurs. The virus was then amplified to produce passage
2 (for long-term storage) and passage 3 virus (for working) stocks
by infection of fresh Sf9 cell cultures and used directly to infect
cells for fusion protein expression. Virus infectivity (pfu/ml) was
determined by titration in Sf9 cells using the BacPAK.TM. Rapid
Titer Kit (BD Sciences Clontech, Inc.). A 50 ml culture of Hi-Five
cells at a concentration of 1×10^6 cells/ml was infected with
recombinant virus at MOI=5 in Express Five media (serum free
media). The cells were grown in a 100 ml spinner flask at 27°
C. Every 24 hours, cell viability was determined by trypan blue and
cell counting. 5 ml of the suspension culture was removed at 24 hour
intervals and centrifuged at 500×g at 4° C. for 10
minutes. The supernatant was transferred into a fresh tube to
monitor any protein that may have been secreted into the media (see
below).
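The amount of virus stock needed for an infection at a given MOI follows from the cell number and the measured titer. The following is a minimal sketch; the function name is hypothetical and the titer of 1×10^8 pfu/ml used in the example is an assumed value, since no titer is stated here.

```python
def virus_volume_ml(culture_ml, cells_per_ml, moi, titer_pfu_per_ml):
    """Volume of virus stock (ml) that delivers `moi` pfu per cell."""
    if titer_pfu_per_ml <= 0:
        raise ValueError("titer must be positive")
    total_cells = culture_ml * cells_per_ml
    pfu_needed = total_cells * moi
    return pfu_needed / titer_pfu_per_ml


if __name__ == "__main__":
    # 50 ml at 1e6 cells/ml and MOI = 5, as described in the text.
    # ASSUMPTION: a stock titer of 1e8 pfu/ml, chosen only for illustration.
    print(virus_volume_ml(50, 1e6, 5, 1e8))
```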
Analysis of Proteins from Insect Cell Compartments:
[0090] Cell pellets (from above step) were gently washed in 1 ml
PBS and recentrifuged at 500×g at 4° C. for 10
minutes. All supernatant and pellets are stored at -80° C.
The presence of recombinant protein in cells and media was
ascertained by SDS-PAGE and Western blotting of supernatant and
cell pellets. The total intracellular protein was extracted by
M-PER extraction buffer (Pierce), a neutral buffer for protein
extraction. The cell pellet was mixed with rapid pipetting and
incubated for 1 hour on an orbital shaker. The suspension was
centrifuged at 500×g at 4° C. for 10 minutes to remove
debris. The supernatant contained extracted cellular proteins that
were either analyzed by PAGE or stored at -80° C. To analyze
the proteins present in the media, the following procedure was
adopted. Trichloroacetic acid was added to 5 ml media to a final
concentration of 20%. The suspension was mixed well and left on ice
for three hours, and then centrifuged at 500×g at 4° C.
for 10 minutes. The white pellet was washed with 80% ethyl alcohol
twice, and then dried. The pellet was suspended in 1 ml of M-PER
buffer for PAGE to compare the distribution of control (unfused)
and SUMO-fused proteins inside and outside the cell.
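The volume of trichloroacetic acid stock required to reach a given final concentration depends on the stock strength, because the added stock also increases the total volume. The sketch below is illustrative; the function name is hypothetical and the 100% stock used in the example is an assumption, since the stock concentration is not stated.

```python
def stock_volume_to_add(sample_ml, final_pct, stock_pct):
    """Volume of stock (ml) to add to `sample_ml` to reach `final_pct`.

    Solves stock_pct * v = final_pct * (sample_ml + v) for v.
    """
    if not 0 < final_pct < stock_pct:
        raise ValueError("final concentration must be below stock concentration")
    return sample_ml * final_pct / (stock_pct - final_pct)


if __name__ == "__main__":
    # 5 ml of media brought to 20% TCA.
    # ASSUMPTION: a 100% stock, chosen only for illustration.
    print(stock_volume_to_add(5.0, 20.0, 100.0))
```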
Methods for Analysis of Yeast Expressed Fusion Proteins:
[0091] Yeast cultures were grown in synthetic or rich media.
Standard yeast and E. coli media were prepared as described (31).
The yeast strain Y4727 (Mata his3-Δ200 leu2-Δ0
lys2-Δ0 met5-Δ0 trp1-Δ63 ura3-Δ0; gift from Dr. Jeff Boeke)
or BJ 1991 was used as a host. Yeast
transformation was performed according to published procedures (8).
Yeast transformants with autonomously replicating plasmids were
maintained in yeast selective media. The E. coli
β-Galactosidase and β-Glucuronidase proteins were
expressed under the regulation of either the alcohol dehydrogenase
(ADH), or Glyceraldehyde-Phosphate-Dehydrogenase (GPD) promoter or
copper metallothionein (CUP1) promoter in 2 µm multicopy
plasmids with the LEU2 selective marker.
[0092] Yeast cells were transformed with appropriate expression
vectors, and single colonies were grown in synthetic media minus
the selectable marker. For each protein, at least two single
colonies were independently analyzed for protein expression. Cells
were grown in 5 ml culture overnight and, in the morning, the
culture was diluted to an O.D. at 600 nm of 0.5. If the gene was
under the control of copper inducible promoter, copper sulfate was
added to 100 µM and the culture was allowed to grow for at least
three hours. Cells were pelleted at 2000×g for 5 minutes and
washed with 10 mM Tris-EDTA buffer pH 7.5. If enzymatic assays were
performed, cells were disrupted in assay buffer with glass beads,
2× the volume of the pellet. Cells were centrifuged and
the supernatant was recovered for enzymatic or protein analysis.
Alternatively, if the level and the type of protein was analyzed by
SDS-PAGE, cell pellet was suspended in SDS-PAGE buffer and boiled
for 5 mins. The suspension was centrifuged, and 10-20 µl aliquots
were run on 12% SDS-PAGE.
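The dilution of the overnight culture and the addition of copper sulfate described above reduce to two small calculations. This is a minimal sketch with hypothetical function names; the overnight O.D. of 4.0 and the 100 mM copper sulfate stock in the example are assumed values, and O.D. is treated as proportional to cell density.

```python
def dilute_to_od(overnight_od, target_od, final_ml):
    """Return (ml of overnight culture, ml of fresh medium) for `target_od`."""
    if not 0 < target_od <= overnight_od:
        raise ValueError("target O.D. must be positive and not exceed the culture O.D.")
    culture_ml = final_ml * target_od / overnight_od
    return culture_ml, final_ml - culture_ml


def inducer_volume_ul(final_um, stock_mm, culture_ml):
    """Microlitres of stock giving `final_um` (uM) in `culture_ml` (ml).

    Uses 1 mM = 1 nmol/ul and ignores the small volume added.
    """
    if stock_mm <= 0:
        raise ValueError("stock concentration must be positive")
    return final_um * culture_ml / stock_mm


if __name__ == "__main__":
    # ASSUMPTIONS: overnight O.D. of 4.0 and a 100 mM copper sulfate stock.
    print(dilute_to_od(4.0, 0.5, 5.0))
    print(inducer_volume_ul(100, 100, 5.0))
```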
Measurement of β-GUS Activity from Yeast:
[0093] β-Glucuronidase (GUS) is a 65 kDa protein that is a
useful marker for protein trafficking. We have used GUS to
determine the role of N-terminal ubiquitin on secretion of GUS in
yeast. Yeast cells were transformed with various GUS vectors, grown
overnight in selective liquid media at 30° C., and diluted
in the liquid selective media to 0.1 OD600 (OD culture). Yeast
cells were incubated in the presence of inducer in a shaker at
30° C. After 4 hours of incubation, 100 µl of 2×
"Z" Sarcosine-ONPG buffer (120 mM Na2HPO4, 80 mM NaH2PO4, 20 mM
KCl, 2 mM MgSO4, 100 mM β-mercaptoethanol, pH 7.0, 0.4%
lauroyl sarcosine) was added. (The 2× "Z" Sarcosine buffer
is freshly prepared or stored at -20° C. prior to use.) We used
a fluorometric assay with 4-methylumbelliferyl β-D-glucuronide
as the substrate for the β-GUS assay. After incubation at
37° C. for 1 hour (t incubation), the reaction was stopped
by adding 100 µl of quenching solution, 0.5 M Na2CO3.
The GUS activity was determined by reading the plates in a
fluorometric plate reader. For colorimetric reactions, relative
activity was calculated as follows: (1000×OD reaction)/(t
incubation×OD culture).
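The relative activity formula given above can be applied as follows. This is an illustrative sketch; the function name is hypothetical, and because the time unit for t incubation is not specified, the use of minutes in the example is an assumption.

```python
def relative_activity(od_reaction, t_incubation, od_culture):
    """Relative activity = (1000 x OD reaction) / (t incubation x OD culture)."""
    if t_incubation <= 0 or od_culture <= 0:
        raise ValueError("incubation time and culture OD must be positive")
    return 1000 * od_reaction / (t_incubation * od_culture)


if __name__ == "__main__":
    # Example values are illustrative only.
    # ASSUMPTION: t incubation expressed in minutes (1 hour = 60).
    print(relative_activity(od_reaction=0.6, t_incubation=60, od_culture=0.1))
```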
E. coli Growth, Compartmentalization and Protein Expression.
[0094] Protein expression studies were carried out in the Rosetta
bacterial strain (Novagen). This strain is derived from the lambda
DE3 lysogen strain and carries a chromosomal copy of the IPTG
inducible T7 RNA polymerase along with tRNAs on a pACYC based
plasmid. Cultures were grown in LB as well as minimal media and at
growth temperatures of 37° C. and 20° C. with 100
µg/mL ampicillin and 30 µg/mL chloramphenicol. The culture was
diluted 50 fold and grown to mid log (OD at 600 nm=0.5-0.7), at
which time the culture was induced with 1 mM IPTG. Induction was
allowed to proceed for 4-5 hrs. Upon completion of induction, cells
were centrifuged and resuspended in a buffer containing 20%
sucrose. To analyze protein induction in total cells, SDS-PAGE
buffer was added and the protein was analyzed following SDS-PAGE
and staining with Coomassie blue.
Separation of Soluble and Insoluble Fractions.
[0095] E. coli were harvested by mild centrifugation and washed
once with PBS buffer. Cells were resuspended in 4 ml of PBS and
ruptured by several pulses of sonication. Unbroken cells were
removed by mild centrifugation (5 min at 1500×g) and
supernatants were sonicated again to ensure complete cell lysis. An
aliquot (5 µl) was mixed with 2% SDS to ensure that no viscosity
is detected owing to lysis of unbroken cells. After ensuring that
no unbroken cells remained in the lysate, insoluble material
consisting of cell walls, inclusion bodies and membrane fragments
was sedimented by centrifugation (18,000×g for 10 min). The
supernatant was considered "Soluble fraction".
[0096] The pellets were washed free of any remaining soluble proteins,
lipids and peptidoglycan as follows. Pellets were resuspended in
600 µl of PBS and to the suspensions 600 µl of solution
containing 3 M urea and 1% Triton X-100 was added. The suspension
was briefly vortexed and insoluble material was collected by
centrifugation as above. The PBS/Urea/Triton wash was repeated two
more times to ensure complete removal of soluble proteins. The
washed pellets, designated as "insoluble fraction," consisted
primarily of inclusion bodies formed by over expressed proteins.
Approximately 10 µg of protein from each fraction was resolved on
12% SDS-PAGE minigels and stained with Coomassie Brilliant
Blue.
Fluorescence (GFP Activity) Assessment.
[0097] GFP fluorescence was measured in soluble fractions (approx.
0.1 mg of soluble protein in a final volume of 40 µl) using
Fluoroscan Accent FL fluorometer (LabSystems) with Excitation 485
nm/Emission 510 nm filter set with the exposure set to 40 sec. The
data are presented in Arbitrary Units (AU).
Western Blotting.
[0098] Twenty µg of total yeast protein per lane were resolved
on 12% SDS-PAGE minigel and electro-blotted to nitrocellulose
membranes by standard methods. Membranes were blocked with 5% milk
in TTBS buffer and incubated with rabbit anti-GFP antibodies
(Clontech, cat no. 8367) at 1:100 dilution overnight at 4°
C. Secondary HRP-conjugated antibodies were from Amersham.
Identical gels were run in parallel and stained with Coomassie to
ensure equal loading of the samples.
[0099] The various 6HisxSUMO-GFP (16) fusions were expressed in
Rosetta(DE3) pLysS (Novagen) using the procedures recommended by
the manufacturer. Expression levels in the absence and presence of
the fusion proteins were compared by SDS-PAGE analysis. The
recombinant proteins were purified using Ni-NTA agarose (Qiagen)
using procedures recommended by the manufacturer.
Cleavage of Proteins
[0100] For studies in E. coli, an organism that does not possess
SUMO or ubiquitin cleaving enzymes, each cleavage reaction
contained 100 ul of purified fusion protein, 99 ul of the buffer 20
mM Tris-HCl pH 8.0, 150 mM NaCl, 5 mM .beta.-mercaptoethanol, and 1
ul of enzyme. The reactions were incubated for 3 hours at
30.degree. C., and stopped by addition of 6.times. Laemmli SDS-PAGE
loading buffer followed by boiling at 95.degree. C. for 5 minutes.
The products of the cleavage reaction were analyzed by
SDS-PAGE.
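As a worked illustration of the reaction composition described above, the following Python sketch computes volume-weighted final concentrations for the 200 ul mixture. It is only a sketch under stated assumptions: the paragraph does not say which buffer the purified fusion protein is in, so the example assumes an imidazole-containing elution buffer (20 mM Tris-HCl, 150 mM NaCl, 300 mM imidazole, 5 mM beta-mercaptoethanol), and it assumes the 1 ul of enzyme contributes no buffer components. The function name `final_concentrations` is hypothetical.

```python
def final_concentrations(parts):
    """Return volume-weighted final concentrations (mM).

    parts: list of (volume_ul, {component_name: concentration_mM}) tuples.
    """
    total_ul = sum(volume for volume, _ in parts)
    names = {name for _, components in parts for name in components}
    return {
        name: sum(volume * components.get(name, 0.0) for volume, components in parts) / total_ul
        for name in sorted(names)
    }


# Assumption: the protein sample is still in an imidazole-containing elution buffer.
protein = (100.0, {"Tris-HCl": 20.0, "NaCl": 150.0, "imidazole": 300.0, "BME": 5.0})
# Reaction buffer as given in the text: 20 mM Tris-HCl pH 8.0, 150 mM NaCl, 5 mM BME.
reaction_buffer = (99.0, {"Tris-HCl": 20.0, "NaCl": 150.0, "BME": 5.0})
# Assumption: the enzyme stock adds volume only.
enzyme = (1.0, {})

mix = final_concentrations([protein, reaction_buffer, enzyme])
for name, millimolar in mix.items():
    print(f"{name}: {millimolar:.2f} mM")
```

Under these assumptions the imidazole carried over from the protein sample is halved in the final reaction, while Tris-HCl, NaCl and beta-mercaptoethanol stay close to their nominal values.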
[0101] The following examples are provided to illustrate various
embodiments of the present invention. They are not intended to
limit the invention in any way.
EXAMPLE I
Attachment of C-Terminus of UBLs to N-Terminus of GFP Enhances the
Expression and Solubility of the Protein in E. coli
[0102] The design and construction of all the UBL E. coli
expression vectors has been described above. The DNA sequences,
accession numbers of the UBL-GFP fusion proteins, and translation
frames are shown in FIGS. 25-32. FIG. 4A shows the 37.degree. C.
expression pattern of GFP, Ub-GFP, SUMO-GFP, Urm1-GFP, Hub1-GFP,
Rub1-GFP, Apg8-GFP, Apg12-GFP, ISG15-GFP. Un-fused GFP is generally
poorly expressed in E. coli. The data show that all of the UBLs
enhance the expression level of GFP to varying degrees. However,
the greatest amount of induction was observed with Ub, SUMO, Urm1,
Apg8 and Apg12. Induced cells were broken by sonication and soluble
proteins were analyzed on SDS-polyacrylamide gels. The stained gel
shows (FIG. 4A, Soluble Panel) that ubiquitin, SUMO, Urm1, Hub1 and
ISG15 were able to solubilize the GFP, while Rub1, Apg8 and Apg12
fusion proteins were not soluble; however, fusion to these proteins
did enhance the level of expression several fold. To determine if
the fusion proteins were folded correctly, we determined the
fluorescence properties of proteins in the soluble fraction. FIG.
4A also shows GFP fluorescence in approximately 0.1 mg of soluble
protein in a final volume of 40 ul, measured using a Fluoroscan
Accent FL fluorometer (LabSystems) with an Excitation 485
nm/Emission 510 nm filter set and the exposure set to 40 sec. The
data are presented in Arbitrary Units (AU) and show that Ub, SUMO,
Urm1, Hub1 and ISG15 produced GFP protein that was able to
fluoresce and, thus, was folded correctly. Fusions of GFP with
Rub1, Apg8 and Apg12 were induced in large amounts but were not
soluble and did not show any fluorescence.
[0103] In addition, it has been shown that ISG15 plays a role in immune
response (24). Thus presentation of ISG15 as a fusion protein is a
viable tool for novel vaccine candidates. Similarly, Apg8 and Apg12
translocate protein to compartments in the cell for autophagy
(30).
[0104] Similar experiments were performed with all the UBL-GFP
fusion proteins, but the induction was performed at 26.degree. C.
overnight. The data shown in FIG. 4B confirm the findings in FIG.
4A. Almost all of the UBLs except Hub1 showed dramatically
enhanced expression of GFP after fusion. In the case of SUMO, the
level of expression was increased about 20 fold. Analysis of the
soluble fraction showed that Ub, SUMO, Urm1 and ISG15 were able to
solubilize fused GFP (see FIG. 4B, Soluble panel). Functional
analysis of fusion GFP was performed by measuring fluorescence
from the soluble fraction. These data confirm the observation made
in FIG. 4A. Combining all the data from the induction studies
demonstrates that fusion of all the UBLs to GFP enhances expression
level by 2-40 fold. In addition, Ub, SUMO, Urm1, Hub1 and ISG15 also
increase the solubility of the GFP. These UBLs are therefore
capable of producing correctly folded proteins in E. coli.
[0105] To gain more insight into the role of UBLs in enhancement of
expression and solubility, we have tested the SUMO-fusion systems
with other proteins as well. Serine/threonine kinases, tyrosine
kinases and human nuclear receptors have proven difficult to express
in E. coli. Researchers have opted to use tissue culture systems to
express soluble kinases or receptors. FIG. 5 shows expression of
6His-SUMO-Tyr-Kinase and unfused Tyr-Kinase in E. coli using LB or
minimal medium (MM), and purified on Ni-NTA resin as described
previously. A small fraction of the resin was boiled with
1.times.SDS-PAGE sample buffer and aliquots were resolved on
12% SDS-PAGE. Equal amounts of E. coli culture were taken for
SUMO-Tyr-kinase and unfused Tyr-kinase and purification was
performed under identical conditions. The stained gel in FIG. 5
shows that SUMO fusion increases the yield of the kinase at least
20 fold in cells grown in LB media. FIG. 6 also shows the pattern
of the SUMO-Tyr kinase that was eluted from Ni-NTA by 100 mM EDTA
or 250 mM imidazole. These data further demonstrate that SUMO
fusion enhances the expression of difficult-to-express proteins such
as Tyr-kinase, and that the expressed fusion protein is
soluble.
[0106] Human nuclear receptor proteins, such as steroid receptors,
contain ligand-binding domains. These proteins have proven hard to
express in soluble form in E. coli. We have used human liver X
receptor (LXR) ligand binding domain to demonstrate that SUMO
fusion promotes solubility of the protein in E. coli. The
ligand-binding domain of LXR was expressed as a SUMO fusion in
Rosetta pLysS cells at 20.degree. C. or 37.degree. C. and the
pattern of soluble and insoluble protein was analyzed. FIG. 7 shows
the stained SDS-polyacrylamide gel demonstrating that about 40% of
the LXR protein was solubilized by SUMO fusion; see lane CS in the
20.degree. C. box in FIG. 7 (predominant band in 40 kDa range). If
the cells were induced at 37.degree. C., hardly any SUMO-LXR was
soluble although the level of protein induction had increased
dramatically. Further proof that SUMO promotes solubility of
previously insoluble proteins was gained by expressing MAPKAP2
kinase as a SUMO-fusion in E. coli. FIGS. 8A and 8B show induction
kinetics in E. coli cells expressing kinase at 20.degree. C. and
37.degree. C. Numbers at the top of the gel, 0.1, 0.25 and 0.5
refer to the mM concentration of inducer IPTG, in the culture. The
original induced culture (I), supernatant from lysed cells (S) and
resuspended pellet (P) were analyzed on 12% SDS-PAGE. The data
clearly demonstrate that 90% of the SUMO kinase is soluble when the
cells are induced at 20.degree. C. with 0.25 mM IPTG. Although
induction at 37.degree. C. allows greater degree of expression,
more than 50% of the kinase is still insoluble under these
conditions. Cleavage of SUMO-MAPKAP2 kinase by SUMO hydrolase is
described in Example III. Also see FIG. 18.
[0107] Overall, these results show that in bacteria, fusion of UBLs
to GFP increases the level of expression by 2-40 fold. Some of
the UBLs such as Ub, SUMO, Urm1, Hub1, and ISG15 solubilize
otherwise insoluble proteins. In particular, SUMO has been
demonstrated to increase solubility of kinases and LXR.alpha. under
controlled temperature induction from 50-95% of the total expressed
protein.
EXAMPLE II
SUMO-Fusion Expression in Yeast and Insect Cells
Fusions of C-Terminal UBLs to the N-Terminus of GFPs are Cleaved in
Yeast
[0108] To further assess the utility of UBL fusion in eukaryotic
cells we expressed all of the UBL-GFP fusions previously described
in FIG. 4 in yeast. S. cerevisiae BJ1991 strain was transformed
with either YEp-GFP or YEp-UBL-GFP fusion constructs using standard
procedures. Positive clones were grown in YPD medium and induced
with 100 .mu.M CuSO.sub.4 at cell density OD600=0.2 for 3.5 hours.
Total cell extracts were prepared by boiling the yeast cells in
SDS-PAGE buffer. Twenty ug of proteins were analyzed on 12% SDS
gels. A replica gel was stained in Coomassie blue and another gel
was blotted and probed with antibodies against GFP. Data in FIG. 9
shows that Ub-GFP, SUMO-GFP and ISG15-GFP fusions were efficiently
cleaved in yeast, while Rub1-GFP fusion was partially cleaved.
Apg8-GFP fusion was cleaved into two fragments. It is noteworthy
that all the UBL-GFP fusions were designed with methionine as the
first amino terminus. GFP fusion with Urm1, Hub1 and Apg12
expressed well, but were not cleaved in yeast. There was a modest
increase in expression of GFP following fusion with Ub, SUMO, ISG15
and cleavage in yeast. Generally we have observed 10-20 fold
increase in the level of protein expression following fusion to UBL
in prokaryotes and eukaryotes (see FIGS. 4B, 10 and 11). The reason
for the modest increase in GFP fusion following cleavage is that
the cells were grown in induction media containing 100 uM copper
sulfate in rich YPD media. Rich media contains many copper binding
sites, and less free copper is available to induce the gene. A
nearly 100-fold increase in GFP production has been observed with a
variety of N-terminal fusions when cells were induced with 100 uM
copper sulfate in synthetic media. See FIG. 10.
Generation of New Amino Termini:
[0109] The identity of the N-terminus of a protein has been
proposed to control its half-life (the N-end Rule) (35). Many
important biopharmaceuticals such as growth factors, chemokines,
and other cellular proteins, require desired N-termini for
therapeutic activity. It has not been possible to generate desired
N-termini, as nature initiates translation from methionine, but the
SUMO system offers a novel way to accomplish this.
[0110] To demonstrate that all N-termini of GFP in SUMO-GFP fusions
were efficiently cleaved when expressed in yeast, a comprehensive
study of SUMO-GFP with 20 N-termini was carried out. Multi-copy
yeast expression plasmids were designed as described above.
Plasmids were transformed into yeast strain BJ1991, four single
colonies were selected, and the levels and cleavage patterns of two
of the strains were analyzed by SDS-PAGE and western blotting. Data
from Western blots of a single colony are presented in FIG. 10.
These results are in agreement with our in vitro studies of
purified SUMO-X-GFPs (from E. coli) and their cleavage pattern with
SUMO hydrolase. All of the SUMO-GFP fusions were cleaved
efficiently except those containing proline at the junction (see
FIG. 10, middle panel lane "Pro"). It is also interesting to note
that SUMO-Ileu-GFP was partially cleaved during the phase of copper
induction. All of the genes are under the control of copper
inducible promoter. It is possible that SUMO-Ileu-GFP is resistant
to cleavage due to the non-polar nature of the residue at the +1
active site of SUMO hydrolase. In this respect SUMO-Val-GFP was
also partially resistant to cleavage in vivo (see lower most panel
lane labeled "Val"). It is clear from these results that
SUMO-Pro-GFP fusion was completely resistant to cleavage by yeast
SUMO hydrolases as no GFP was observed (see lane "pro" in middle
panel of FIG. 10). This data is consistent with our previous
observations. See FIG. 15. Another important aspect of these
findings is that fusion of SUMO with various N-termini of GFP
appears to increase the expression of almost all the proteins,
although to various degrees. For example Cys-GFP, Gly-GFP and
His-GFP accumulated in greater amounts as compared to other
N-terminal GFPs. A direct comparison of the increase in the level
of GFP following fusion to SUMO can be made by comparing the level
of un-fused GFP (see last lanes of lower most panel in FIG. 10).
Although 20 ug of yeast proteins were loaded on SDS-PAGE the GFP
signal was not detected. To ensure that we were not dealing with
mutation or any artifact, we loaded a protein sample from another
single colony that was induced under similar conditions and the
sample was loaded next to the previous GFP. No signal was detected,
suggesting that unfused GFP is made in very small amounts that
cannot be detected under the present experimental conditions
(i.e., a four hour induction with copper sulfate). These studies
show that fusion with SUMO leads to a dramatic increase in the
amount of protein expressed in yeast. All of the N-terminal fusions
are cleaved by endogenous SUMO hydrolases except when the
N-terminal residue is proline. Thus for enhanced expression of a
protein in eukaryotes permanent attachment of SUMO is not required
as significant (.about.100 fold) increased accumulation of the
protein was observed even after the cleavage of SUMO. At the same
time, SUMO-pro-fusions are also useful as 6.times.His-SUMO can be
used to purify the protein from yeast, and the SUMO moiety can be
removed with 10 times greater amounts of the SUMO hydrolase (see
Example III).
[0111] Previous studies have shown that attachment of ubiquitin to
the N-termini of proteins in yeast enhances expression, and protein
fusions containing any amino acid as the N-terminal residue, except
proline, are efficiently cleaved in yeast (2, 10, 34). However,
these technologies have several drawbacks. Firstly, none of the
deubiquitinating enzymes (DUBs) have been shown to efficiently
cleave ubiquitin fusion proteins of varying sizes and structures
(3,1), despite the fact that they were discovered more than 15
years ago (35, 19, 3). Secondly, and perhaps more importantly,
ubiquitin predominantly functions as a signal for proteolysis (14).
Therefore, for physiological reasons and for the lack of robust
cleavage of artificial ubiquitin-fusions by DUBs, the ubiquitin
gene fusion system has not been successfully developed for
commercial applications. We have observed that the SUMO system
appears to perform in a manner that is remarkably superior to that
of ubiquitin, as SUMO and other UBL fusions enhance protein
expression and solubility in prokaryotes. In addition, many of the
UBLs increase expression of GFP, following the cleavage of UBL in
yeast. Unlike the ubiquitin-fusion system, which may direct the
protein to the ubiquitin-proteasome pathway, the current cleavage
of fusion-protein in yeast is the result of C-terminal fusion with
SUMO, and proteins generated with novel N-termini are not subject
to degradation by the ubiquitin-proteasome pathway. This is one of
the reasons that a large amount of GFP has accumulated in yeast
after cleavage of the SUMO fusion (see FIG. 10).
N-terminal Attachment of Ubiquitin Promotes Protein Secretion:
[0112] To date, a role for ubiquitin in the secretion of proteins
has not been determined. We have assessed whether N-terminal fusion
of ubiquitin to a protein promotes its secretion in yeast. Several
yeast expression vectors that express E. coli .beta.-glucuronidase
(GUS) were designed. All of the yeast GUS expression vectors
described in Table 2 are engineered under the control of the strong
glycolytic GPD promoter that expresses constitutively. Some of the
constructs were also expressed under the control of a copper
regulated metallothionein promoter (CUP1) as well. CUP1 promoter
driven synthesis of the SUMO-GUS constructs was induced by addition
of 100 uM copper sulfate and incubation of 3 hours. To determine
the level of GUS from media, cells were harvested by centrifugation
at 2000.times.g for 10 mins. Supernatant was collected and equal
amounts of aliquots were assayed for enzymatic activity or western
blot analysis as described above. For the comparative study, all
strains were treated identically and grown at the same time to
equal O.D., and the assays were performed at the same time. To
examine intracellular enzymatic activity, the cells were harvested
by centrifugation and washed with Tris EDTA buffer, pH 7.5. The
cell pellets were suspended in sarcosine buffer and ruptured with
glass beads at 4.degree. C., three times by vigorously vortexing.
Supernatant was collected for assay of the enzymatic activity. The
amount of protein secretion was determined by estimating relative
activity of the enzyme in the media. The data is shown in Table 2.
TABLE-US-00004
TABLE 2 Ubiquitin-GUS Expression and Secretion in Yeast

Vector (pRS425)                   Promoter  Signal Sequence                 GUS Activity  GUS Activity
                                                                            Inside Cell   In Supernatant
ADH1-GUS1                         ADH1      --                              +++           -
GPD-.alpha.-factor-GUS1           GPD       .alpha.-factor                  ++
GPD-Ub-GUS1                       GPD       Ubiquitin                       ++++          ++++
GPD-Ub-.alpha.-factor-GUS1        GPD       Ubiquitin-.alpha.-factor        ++++          -
GPD-.alpha.-factor-Ub(pro)-GUS1   GPD       .alpha.-factor-Ubiquitin(pro)   ++            -
GPD-.alpha.-factor-Ub(met)-GUS1   GPD       .alpha.-factor-Ubiquitin(met)   ++            -
CUP1-Ub-GUS1                      CUP1      Ubiquitin                       ++++          ++

GUS activity was measured as described. It was not possible to
measure specific units of GUS in the media because the yeast were
grown in synthetic media; yeast secrete little protein, and current
methods of protein estimation (the BioRad kit) cannot estimate the
protein. The data are therefore presented as +, where one + is equal
to 2 units of GUS as described herein. A - sign means no GUS
activity was detected.
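The semi-quantitative scoring in the Table 2 footnote (one + equals 2 units of GUS, - means none detected) can be turned into approximate numbers for side-by-side comparison. The Python sketch below is illustrative only; the function name `score_to_units` is hypothetical, and only rows for which both activity columns are legible in the table are included.

```python
UNITS_PER_PLUS = 2  # from the table footnote: one "+" equals 2 units of GUS


def score_to_units(score):
    """Convert a '+' score to approximate GUS units; '-' means none detected."""
    score = score.strip()
    if score and set(score) == {"+"}:
        return len(score) * UNITS_PER_PLUS
    if score and set(score) == {"-"}:
        return 0
    raise ValueError(f"unrecognized score: {score!r}")


# (inside cell, in supernatant) scores copied from Table 2.
table2 = {
    "ADH1-GUS1": ("+++", "-"),
    "GPD-Ub-GUS1": ("++++", "++++"),
    "CUP1-Ub-GUS1": ("++++", "++"),
}

units = {
    vector: (score_to_units(inside), score_to_units(supernatant))
    for vector, (inside, supernatant) in table2.items()
}
for vector, (inside, supernatant) in units.items():
    print(f"{vector}: inside ~{inside} units, supernatant ~{supernatant} units")
```

Because the underlying scores are coarse, the converted values are suitable only for ranking constructs, not for quantitative claims.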
[0113] The following conclusions are drawn from this study.
[0114] 1) Fusion of ubiquitin to GUS leads to a several fold
increase in GUS activity when yeast extracts were analyzed by
enzymatic assays.
[0115] 2) Insertion of proline at the junction of ubiquitin and GUS
did not allow cleavage of the ubiquitin-GUS fusion protein.
[0116] 3) The attachment of alpha factor secretory sequences to the
N-terminus of ubiquitin-fusion did not show any appreciable
increase in secretion of the protein into the media.
[0117] 4) Presence of alpha factor sequences between ubiquitin and
GUS did not lead to any increase in extracellular level of GUS
activity.
[0118] 5) Greatest amount of secretion was observed with
ubiquitin-Met-GUS. These observations suggest that endogenous
secretory sequences of GUS in the context of ubiquitin promote the
best secretion for GUS. To this end the current data from yeast
correlate very well with the ubiquitin-GFP protein secretion in
insect cells (see FIG. 13).
Fusion of SUMO and Ubiquitin to the N-Terminus of GFP Promotes
Enhanced Expression and Secretion in Insect Cells.
[0119] The role of SUMO in enhanced expression and secretion of
proteins in cultured cells has also been studied in insect cells.
Baculovirus vectors expressing SUMO-GFP constructs and appropriate
controls have been described above. See FIG. 11A for the
orientation of gp67 secretory signals in the SUMO-GFP constructs. Data
from a 24 hour infection is shown in FIG. 12. Panel A shows
intracellular protein analysis by Western blots. It is clear that
fusion with ubiquitin and SUMO promotes a large increase in the
amount of protein (compare lane E with lane U and S). Insertion of
gp67 signal sequences to the N-terminus of SUMO leads to further
increase in the amount of protein in insect cells (compare unfused
GFP lane E with gp67-SUMO-GFP lane GS). On the other hand
attachment of gp67 signal sequence to the N-terminus of GFP (lane
G, UG or SG) did not increase the level of protein expression; to
the contrary, there was diminution of signal when gp67 was attached
to the N-terminus of GFP (lane G) or between SUMO and GFP (lane SG).
We estimate that the level of expression in the context of
gp67-SUMO-GFP is 20-fold higher as compared to unfused GFP
(lane E) or 40-fold higher as compared to gp67-GFP (lane G).
No unfused GFP was secreted by any of the constructs at 24 hours
post infection, as shown in the blot in FIG. 12 panel B. These results
show that fusion with SUMO leads to a dramatic increase in
expression of GFP in insect cells. Additionally, both SUMO-GFP and
gp67-SUMO-GFP were efficiently cleaved by endogenous SUMO
hydrolases.
[0120] Similar experiments were performed with cells 48 hours post
infection. The data in FIGS. 13A and B show that the pattern of
intracellular expression was similar to the one seen at 24 hours of
infection; however, large amounts of ubiquitin and SUMO-GFP protein
were secreted at 48 hours post infection. Examination of the blots
from media and intracellular protein shows that reasonable
expression of unfused GFP was observed inside the cell, but hardly
any protein was secreted in the media (compare lane E of panel A
and panel B in FIG. 13). Attachment of gp67 to the N-terminus of
SUMO-GFP leads to the greatest amount of protein secreted into the
media (see lane GS in panel B). Another important finding is that
attachment of ubiquitin without any signal sequences shows very
high secretion of GFP in the media. This result is completely
consistent with our finding that attachment of ubiquitin to the
N-terminus of GUS promotes the greatest amount of secretion of GUS
into the yeast media.
[0121] We have also discovered that SUMO-Pro-GFP fusion was not
cleaved by endogenous SUMO hydrolases in insect cells (FIG. 13C).
Although some non-specific degradation of SUMO-Pro-GFP was observed
in these experiments (see lane S-P in FIG. 13C), we conclude that
unlike SUMO-GFP, SUMO-Pro-GFP is not cleaved in insect cells. This
observation is also consistent with the finding in yeast that
SUMO-Pro-GFP is not cleaved in cells while other N-terminal GFP
fusions are processed in yeast.
[0122] Further confirmation of these observations was obtained by
fluorescence imaging of the cells expressing GFP fusion proteins.
FIG. 14 shows that cells expressing GFP and fusion GFP fluoresce
intensely. The fluorescence imaging was the strongest and most
widely diffused in cells expressing gp67-SUMO-GFP and Ub-GFP. These
cells show the largest amount of GFP secreted into the media (FIG.
13 panel B). It appears that secretory signal attachment directly
to the N-terminus of GFP produces less GFP in the media and inside
the cells. This observation is borne out by low fluorescence
intensity and granulated pigmented fluorescence (see panel G-eGFP,
S/G-eGFP and U/G-eGFP). These data have led to the following
conclusions:
[0123] 1) The increase in the amount of SUMO-fusion protein
expression in insect cells was several-fold higher (20-40 fold)
than that of unfused protein, as determined by Western blot
analysis.
[0124] 2) All of the SUMO-GFP constructs that contain methionine at
the +1 position were cleaved except SUMO-Proline-GFP. This aspect
of the SUMO-fusion technology allows us to express proteins that
are stably sumoylated.
[0125] 3) Attachment of ubiquitin to the N-terminus of GFP led to
dramatic enhancement in secretion of the protein in the media.
Ubiquitin promotes secretion of proteins that may or may not have
an endogenous secretory signal. Thus, N-terminal ubiquitination may
be utilized as a tool to enhance secretion of proteins in
eukaryotic cells.
[0126] 4) N-terminal SUMO also promotes secretion of protein in
insect cells.
EXAMPLE III
[0126] SUMO Protease ULP1 Cleaves a Variety of SUMO-Fusion
Proteins:
Properties and Applications in Protein and Peptide Expression and
Purification
[0127] Yeast cells contain two SUMO proteases, Ulp1 and Ulp2, which
cleave sumoylated proteins in the cell. At least eight SUMO
hydrolases have been identified in mammalian systems. The yeast
SUMO hydrolase Ulp1 catalyzes two reactions. It processes full
length SUMO into its mature form and it also de-conjugates SUMO
from side chain lysines of target proteins. Examples I and II
establish our findings that attachment of SUMO to the N-terminus of
under-expressed proteins dramatically enhances their expression in
E. coli, yeast and insect cells. To broaden the application of SUMO
fusion technology as a tool for expression of proteins and peptides
of different sizes and structures, the ability of Ulp1 to cleave a
variety of proteins and peptides has been examined. Purified
recombinant SUMO-GFPs were efficiently cleaved when any amino acid
except proline is present in the +1 position of the cleavage site.
Similar properties of SUMO hydrolase Ulp1 were observed when
SUMO-tyrosine kinase, SUMO-protein G, SUMO-.beta.-GUS, and
SUMO-MAPKAP2 kinase were used as substrates. The in vitro activity of
the enzyme showed that it was active under broad ranges of pH,
temperature, and salt and imidazole concentration. These findings
suggest that the Ulp1 is much more robust in cleavage of the
SUMO-fusion proteins as compared to its counterpart,
ubiquitin-fusion hydrolase. Broad specificity and highly efficient
cleavage properties of the Ulp1 indicate that SUMO-fusion
technology can be used as a universal tag to purify a variety of
proteins and peptides, which are readily cleaved to render highly
pure proteins.
[0128] The following materials and methods are provided to
facilitate the practice of Example III.
Affinity Purification and Cleavage of SUMO Fusion Proteins with
SUMO Hydrolase.
[0129] The following table lists the solutions required for the
affinity purification and cleavage procedures:
TABLE-US-00005

Solution                 Components
Lysis buffer             25 mM Tris pH 8.0; 50 mM NaCl
Wash Buffer              25 mM imidazole; 50 mM Tris pH 8.0; 250 mM NaCl;
                         (optional) 5-10 mM .beta.-mercaptoethanol
                         (protein dependent)
Elution Buffer           300 mM imidazole; 50 mM Tris pH 8.0; 250 mM NaCl;
                         (optional) 5-10 mM .beta.-mercaptoethanol
                         (protein dependent)
SUMO hydrolase (Ulp1)    ; 250 mM NaCl; 5 mM .beta.-mercaptoethanol
Cleavage Buffer          (protein dependent)

From typical 250 ml cultures, the samples are pelleted by
centrifugation, and supernatants are removed by decanting.
Generally, from 250 ml of culture, 1.0-1.5 grams of wet cells are
produced. Pelleted cells are then resuspended in 5-10 ml of lysis
buffer. RNase and DNase are added to final concentration of 10
ug/ml lysis solution. Samples are kept on ice throughout the
sonication procedure. Using an appropriate tip, the samples are
sonicated 3-5 times for 10 second pulses at 50% duty cycle.
Sonicates are incubated on ice for 30 minutes; if the samples are
viscous after this time, the sonication procedure is repeated.
Lysed samples (in lysis solution) are loaded onto 1-ml columns. The
columns are washed with 5 to 10 volumes of wash buffer (wash
fractions are saved until the procedure is complete). Columns are
developed with 2.5 ml of elution buffer, and SUMO hydrolase
cleavage is performed by one of two methods: 1) cleavage is
performed in elution buffer, with SUMO hydrolase added at 50 ul/2.5
ml buffer, samples incubated at room temperature for 2 hr or
overnight at 4.degree. C., and cleavage monitored by gel
electrophoresis; 2) imidazole is first removed by dialysis, gel
filtration, or desalting, samples are then resuspended in SUMO
hydrolase cleavage buffer, SUMO hydrolase is added at 50 ul/2.5 ml
buffer, and samples are incubated at room temperature for 2 hr or
at 4.degree. C. overnight, with cleavage monitored by gel
electrophoresis. Units of SUMO hydrolase are defined as the amount
of enzyme that cleaves 1 ug of pure SUMO-Met-GFP (up to 95%) in 50
mM Tris-HCl pH 8.0, 0.5 mM DTT, 150 mM NaCl at room temperature in
60 minutes.
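The unit definition and the 50 ul per 2.5 ml dosing given above lend themselves to simple scaling arithmetic. The Python sketch below is illustrative only: it assumes that cleavage scales linearly with both enzyme amount and incubation time, which the text does not state, and the function names `units_required` and `enzyme_volume_ul` are hypothetical.

```python
def units_required(substrate_ug, minutes=60.0):
    """Units of SUMO hydrolase needed to cleave substrate_ug of fusion protein.

    Uses the definition in the text (1 unit cleaves 1 ug in 60 minutes at room
    temperature) and ASSUMES linear scaling with time.
    """
    if substrate_ug < 0 or minutes <= 0:
        raise ValueError("substrate_ug must be >= 0 and minutes must be > 0")
    return substrate_ug * 60.0 / minutes


def enzyme_volume_ul(buffer_ml, ul_per_ml=50.0 / 2.5):
    """Volume of SUMO hydrolase stock for a given buffer volume.

    The default ratio is the 50 ul per 2.5 ml dosing stated in the procedure.
    """
    if buffer_ml < 0:
        raise ValueError("buffer_ml must be >= 0")
    return buffer_ml * ul_per_ml


print(units_required(100.0))               # 100 ug in the standard 60 minutes
print(units_required(100.0, minutes=120))  # same amount with a 2 hr incubation
print(enzyme_volume_ul(2.5))               # the 2.5 ml eluate from a 1-ml column
```

In practice the stated incubation times (2 hr at room temperature or overnight at 4.degree. C.) and monitoring by gel electrophoresis remain the reference; the calculation only gives a starting estimate.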
[0130] After cleavage, protein can be stored at 4.degree. C., or
subjected to purification. The expression and purification
of carboxy terminus of Ulp1p is described above.
In vitro Cleavage Experiments
[0131] The various His6smt3XeGFP fusions were expressed in Rosetta
(DE3) pLysS (Novagen). The recombinant proteins were purified using
Ni-NTA agarose (Qiagen). The comparative in vitro cleavage
reactions were carried out by first normalizing the amount of the
various fusions in each reaction. This was done by measuring the
fluorescence properties of the purified fusion proteins using the
fluorimeter Fluoriskan II (Lab Systems) and then diluting the more
concentrated samples with the Ni-NTA agarose elution buffer (20 mM
Tris-HCl pH 8.0, 150 mM NaCl, 300 mM imidazole and 5 mM
beta-mercaptoethanol), such that their fluorescence values equaled
that of the lowest yielder. Each cleavage reaction contained 100 ul
of protein, 99 ul of the buffer 20 mM Tris-HCl pH 8.0, 150 mM NaCl
and 5 mM beta-mercaptoethanol and 1 ul of enzyme. The reactions
were incubated for 3 hours at 30.degree. C. after which they were
stopped by addition of 6.times. Laemmli SDS-PAGE loading buffer
followed by boiling at 95.degree. C. for 5 minutes. The products of
the cleavage reaction were analyzed by SDS-PAGE.
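The normalization step described above (diluting the more concentrated fusions until their fluorescence matches the lowest yielder) can be sketched as a small calculation. The Python example below assumes that fluorescence is proportional to fusion protein concentration over the measured range, which is an assumption made for illustration; the readings, the sample labels and the function name `normalization_volumes` are hypothetical and not data from this document.

```python
def normalization_volumes(readings, final_ul=100.0):
    """Return {sample: (sample_ul, buffer_ul)} so every sample matches the lowest reading.

    ASSUMES fluorescence is linear in protein concentration.
    """
    lowest = min(readings.values())
    if lowest <= 0:
        raise ValueError("all fluorescence readings must be positive")
    plan = {}
    for name, value in readings.items():
        sample_ul = final_ul * lowest / value  # dilution factor = value / lowest
        plan[name] = (sample_ul, final_ul - sample_ul)
    return plan


# Hypothetical fluorescence readings (arbitrary units) for three fusions.
readings = {"Met": 400.0, "Gly": 200.0, "Pro": 100.0}
plan = normalization_volumes(readings)
for name, (sample_ul, buffer_ul) in plan.items():
    print(f"{name}: {sample_ul:.1f} ul protein + {buffer_ul:.1f} ul elution buffer")
```

The default `final_ul` of 100 matches the 100 ul of protein used per cleavage reaction, so each normalized sample can go directly into a reaction.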
[0132] Proline cleavage experiments were carried out in a fashion
similar to those described above. The purified His6smt3PeGFP was
buffer exchanged into 20 mM Tris-HCl pH 8.0, 150 mM NaCl and 5 mM
beta-mercaptoethanol using a PD-10 column. A 10-fold greater
amount of Ulp1 was added to each reaction. Digestions were
incubated for 3 hours at 30.degree. C. All reactions were stopped
by addition of Laemmli loading buffer and analyzed by SDS-PAGE.
FIG. 15 shows the stained SDS-PAGE analysis of all the SUMO-X-GFPs
and their digestion by SUMO hydrolase. The findings clearly show
that Ulp1 hydrolase was able to cleave all the SUMO-GFP fusions
except the proline fusion. These findings are similar to the
observations made in yeast (FIG. 10) and in insect cells (FIG. 13).
[0133] Conjugation of ubiquitin and SUMO to their target proteins is
a highly regulated and dynamic process. Several deubiquitinating
enzymes (DUBs) have been identified in yeast and other eukaryotic
cells (1). Yeast genetics studies show that many of these enzymes
are not essential, suggesting that an overlapping function is
performed by most of these enzymes. DUBs have been most extensively
studied and shown to cleave linear ubiquitin fusions as well as
isopeptide bonds (3, 35). Much less is known about the enzymes
that remove SUMO from isopeptide bonds or artificial SUMO-fusion
proteins. Hochstrasser and Li have shown that Ulp1 and Ulp2 remove
Smt3 and SUMO-1 from proteins and play a role in progression
through the G2/M phase and recovery of cells from checkpoint
arrest, respectively (20, 21). Ulp1 and Ulp2 cleave the C-terminus of
SUMO (-GGATY; SEQ ID NO: 59) to mature form (-GG) and de-conjugate
Smt3 from the side chains of lysines (20, 21). The sequence
similarity of the two enzymes is restricted to a 200-amino acid
sequence called ULP that contains the catalytically active region.
The three-dimensional structure of the ULP domain from Ulp1 has
been determined in a complex form with SUMO (Smt3) precursor (27).
These studies show that conserved surfaces of SUMO determine the
processing and de-conjugation of SUMO. Database searches of the
human genome and recent findings suggest that there are at least 7
human ULPs with the size ranging from 238 to 1112 amino acid
residues (18, 33, 39). It is intriguing to note that SUMO Ulps are
not related to DUBs, suggesting that SUMO Ulps evolved separately
from DUBs. The findings that ULP structure is distantly related to
adenovirus processing protease, intracellular pathogen Chlamydia
trachomatis and other proposed bacterial cysteine protease core
domains suggest that this sequence evolved in prokaryotes (20, 21).
Detailed properties of the SUMO proteases are provided in Table 3.
TABLE-US-00006
TABLE 3 SUMO Hydrolases/Proteases

Enzyme                       Properties (MW)                          Reference

Ubl-specific Protease        72 KDa, 621 residues. Cleaves linear     Li and Hochstrasser,
ULP1                         fusions and SUMO isopeptide bonds.       1999 (REF 20)

ULP2 (Yeast)                 117 KDa, 1034 residues. Cleaves linear   Li and Hochstrasser,
                             fusions and SUMO isopeptide              2000 (REF 21)
                             structures.

SUMO-1 C-Terminal            30 KDa. Cleaves linear fusions and       Suzuki, et al,
                             SUMO isopeptide structures.              1999 (REF 33)

SUMO-1 specific Protease     126 KDa, 1112 residues. Specific for     Kim, et al,
SUSP1 (Human)                SUMO-1 fusion but not Smt3 fusion.       2000 (REF 18)
                             Does not cleave isopeptide bond.

Sentrin specific Proteases   All of the SENP enzymes have a           Yeh, et al,
(SENP): SENP1, SENP2,        conserved C-terminal region with core    2000 (REF 39)
SENP3, SENP4, SENP5,         catalytic cysteine. The smallest,
SENP6, SENP7                 SENP7, is 238 residues and the
                             largest, SENP6, is 1112 residues.

Ulp1 has proven extremely robust in cleaving a variety of
SUMO-fusion proteins expressed in E. coli as described in the
present example. We have designed SUMO-GFP fusions in which the
N-terminal methionine of GFP has been replaced with each of the
other 19 amino acids. Attachment of 6×His to the N-terminus of SUMO
afforded easy purification of the 20 SUMO-GFP fusions from E. coli.
The enzyme was active under broad ranges of pH, temperature, salt
and imidazole concentration and was very effective in cleaving a
variety of proteins from SUMO fusions, including BPTI (6.49 KDa),
Protein G (7 KDa), β-Glucuronidase (GUS) and the 110 KDa
β-Galactosidase (GAL). These findings suggest that Ulp1 is much
more robust in cleavage of SUMO-fusion proteins than its
counterpart, ubiquitin-fusion hydrolase.
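The construction of the twenty SUMO-X-GFP variants can be illustrated with a short sketch. It assumes that the hydrolase releases the passenger protein directly after the C-terminal Gly-Gly of SUMO, and it uses only short excerpts taken from the sequence listing (the last five residues of SEQ ID NO: 1 and residues 2-11 of SEQ ID NO: 3) rather than the full constructs. The function names are hypothetical, and the sketch does not model cleavage efficiency, which may differ between N-terminal residues.

```python
# Illustrative sketch only: the fragments below are short excerpts,
# not the full-length constructs.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes

SUMO_C_TERMINUS = "EQIGG"     # Glu Gln Ile Gly Gly, end of SEQ ID NO: 1
GFP_AFTER_MET = "VSKGEELFTG"  # residues 2-11 of SEQ ID NO: 3


def make_variants(sumo_tail, passenger_rest):
    """Return {X: fusion fragment} with residue X at the passenger N-terminus."""
    return {x: sumo_tail + x + passenger_rest for x in AMINO_ACIDS}


def cleave(fusion, sumo_tail):
    """Split a fusion directly after the SUMO tail (assumed cleavage site).

    Returns a (sumo_part, passenger_part) tuple.
    """
    site = fusion.index(sumo_tail) + len(sumo_tail)
    return fusion[:site], fusion[site:]


variants = make_variants(SUMO_C_TERMINUS, GFP_AFTER_MET)
for residue in AMINO_ACIDS:
    _, passenger = cleave(variants[residue], SUMO_C_TERMINUS)
    print(residue, passenger)
```

Each printed passenger fragment begins with the chosen residue X, which is the point of the design: the released protein carries the desired N-terminus with no residual tag residues.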
SUMO Protease/Hydrolase is a Robust Enzyme
Effects of Temperature and Additives
[0134] The effects of various additives/conditions and temperature
upon the in vitro cleavage reaction were determined as follows:
His6smt3MeGFP was expressed from pET24d in Rosetta(DE3) pLysS
(Novagen). The recombinant protein was purified as before using
Ni-NTA agarose (Qiagen) and then buffer exchanged into 20 mM
Tris-HCl pH 8.0, 150 mM NaCl and 5 mM β-mercaptoethanol using
a PD-column (AP Biotech). Cleavage reactions were performed with
100 µg of the purified protein, 0.5 µl of enzyme, the appropriate
amount of a stock solution of additive to generate the final
concentrations listed in Table 4, plus the exchange buffer up to a
final volume of 200 µl. Reactions were incubated for 1 hour at
37° C., except for those at 4° C., which were incubated for 3
hours. The data in FIG. 16 show that Ulp1 was extremely active at
37° C. as well as at 4° C. Generally, His tagged
proteins are purified on nickel columns and eluted with imidazole.
We have discovered that the enzyme was remarkably active at 0-300
mM imidazole concentration. The enzyme was highly active at 0.01%
SDS and up to 1% Triton X-100. See Table 4. Similarly, chaotropic
agents such as urea did not affect the activity of the enzyme
up to 2 M. Ulp1 showed 50% activity at 0.5 M concentration of
guanidinium hydrochloride (FIG. 16 and Table 4). A variety of
reagents, including cysteine protease inhibitors, EDTA, PMSF,
Pepstatin, Leupeptin and TLCK, had no effect on the enzymatic activity
(FIG. 17 and Table 4). N-ethylmaleimide was effective only if incubated
with the enzyme prior to addition of the substrate. All the data
shown in Table 4 demonstrate that this enzyme is extremely robust
and thus constitutes a superior reagent for cleaving fusion
proteins under a variety of conditions.

TABLE-US-00007 TABLE 4 The Effect of Different Conditions on the Ulp1 Hydrolase Activity

Conditions/Additions | Effect
Environmental: |
Temperature | Ulp1 is active over a broad range of temperatures, cleaving from 4 to 37° C.
Salts: |
Imidazole | Ulp1 shows similar activity in the range of 0 to 300 mM
Detergents: |
SDS | 0.01% SDS blocks activity
Triton-X | Ulp1 shows similar activity in the range of 0 to 0.1%
Chaotropes: |
Urea | Ulp1 shows complete activity up to and including a 2M concentration
Gdm HCl | Ulp1 shows 50% activity in 0.5M but is completely inactive in 1M concentrations
Protease inhibitors: |
E-64 | Cysteine protease inhibitor; no effect
EDTA | Metalloprotease inhibitor; no effect
PMSF | Serine protease inhibitor; no effect
Pepstatin | Aspartate protease inhibitor; no effect
Leupeptin | Inhibits serine and cysteine proteases with trypsin-like specificity; no effect
TLCK-HCl | Inhibits serine and cysteine proteases with chymotrypsin-like specificity; no effect
N-ethylmaleimide | Cysteine protease inhibitor; only effective if enzyme is preincubated with inhibitor before addition of substrate

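The reaction set-up described above (100 µg of protein, 0.5 µl of enzyme, additive stock to the desired final concentration, exchange buffer to 200 µl) reduces to simple dilution arithmetic. The following is a minimal sketch of that calculation, not part of the original protocol. The protein concentration and the additive stock concentration are not stated in the text, so the values used in the example call (2 µg/µl protein, 2 M imidazole stock) are purely hypothetical, as is the function name.

```python
def cleavage_reaction_volumes(protein_ug, protein_ug_per_ul, enzyme_ul,
                              additive_stock, additive_final, total_ul=200.0):
    """Return the volume (in ul) of each component of one cleavage reaction.

    additive_stock and additive_final must be in the same unit (e.g. M or %).
    """
    if additive_final > additive_stock:
        raise ValueError("final additive concentration exceeds the stock")
    # Volume of purified protein needed to deliver the requested mass.
    protein_ul = protein_ug / protein_ug_per_ul
    # C1 * V1 = C2 * V2, solved for the stock volume V1.
    additive_ul = total_ul * additive_final / additive_stock
    # Exchange buffer fills the remainder of the final volume.
    buffer_ul = total_ul - protein_ul - enzyme_ul - additive_ul
    if buffer_ul < 0:
        raise ValueError("components exceed the final reaction volume")
    return {
        "protein_ul": protein_ul,
        "enzyme_ul": enzyme_ul,
        "additive_ul": additive_ul,
        "buffer_ul": buffer_ul,
    }


# Hypothetical inputs: 2 ug/ul protein, 2 M imidazole stock, 0.3 M final.
mix = cleavage_reaction_volumes(100, 2.0, 0.5,
                                additive_stock=2.0, additive_final=0.3)
for name, volume in mix.items():
    print(f"{name}: {volume:.1f} ul")
```

Running the same function once per row of Table 4, with the appropriate stock and final concentrations, would give a pipetting scheme for the whole additive panel.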
Robust Properties of SUMO Hydrolase: Cleavage of Different Size
Fusion Proteins Under Broad pH Range
[0135] FIG. 18 shows purification of a 40 kDa MAPKAP2 kinase that
was difficult to express unless fused to SUMO. We have shown in
Example I (FIG. 8) that this kinase was expressed in a highly
soluble form (95%) as a fusion to SUMO. FIG. 18 shows that, whether
purified from cells expressing at 37° C. or 20° C.,
the SUMO fusion was efficiently cleaved under the conditions
described.
[0136] The SUMO hydrolase also functions over a broad pH range. FIG.
19 shows kinetics of cleavage at pH 7.5 and 8.0. The data show
that purified SUMO-GFP was completely digested at room temperature.
We have also performed experiments from pH 5.5 to 10. The data (not
shown) support the notion that this enzyme is active over a broad
range of pH.
[0137] As discussed above, for broad utility of the system it is
important that the enzyme be able to cleave fusion proteins of
different sizes and structures in vitro. FIG. 20 shows the
digestion pattern of SUMO-β-galactosidase (β-Gal), a 110
KDa protein. The β-Gal enzyme is tetrameric.
The digestion pattern demonstrates that in 20 minutes, SUMO
hydrolase was able to cleave 100% of the protein.
[0138] Among dozens of proteins expressed as SUMO fusions in our
lab, only one, β-GUS, proved partially resistant to cleavage
by the hydrolase. Configurations of artificial SUMO fusions are
bound to occur wherein the structure of the protein will hinder the
ability of the enzyme to recognize and bind the cleavage site of
the fusion protein. This problem has been solved by adding small
concentrations of urea, which does not inhibit the hydrolase but
results in cleavage of the fusion that was previously resistant. FIG.
21 shows the digestion pattern of purified β-GUS and SUMO hydrolase
before and after addition of urea. Lanes 6 and 9 contain the same
amount of SUMO hydrolase to which 2M urea was added during the
incubation. Addition of urea allowed complete cleavage of 65 KDa
β-GUS in 20 min at room temperature. These data further show that
the SUMO hydrolase cleaves a broad spectrum of fusion proteins
efficiently. Additives such as urea can be added to aid complete
cleavage of those structures that are resistant to hydrolase
action.
High Throughput Protein Purification of Fusion Proteins:
Rapid Peptide Miniprep
[0139] We have discovered that, due to the rapid folding properties
of SUMO, the fused protein can also be rapidly re-natured after
treatment of the crude protein mix with chaotropic agents such as
guanidinium hydrochloride or urea. We have developed a simple and
rapid procedure to purify SUMO-fused proteins that are expressed in
prokaryotes and eukaryotes. This method was tested with a
SUMO-protein G fusion expressed in E. coli. Cells expressing the
6×His-SUMO-G protein fusion were harvested and frozen until
required for protein purification. Lysis buffer (6 M Guanidinium
Chloride, 20 mM Tris-HCl, 150 mM NaCl, pH 8.0) was added to the
cell pellet at three times the pellet weight per volume to rapidly
lyse the cells. The supernatant was loaded onto a pre-equilibrated column
containing Ni-NTA agarose (Qiagen), and the flow-through was collected
for analysis. The column was then washed, first with 2 column
volumes (CV) of lysis buffer, followed by 3 CV of wash buffer (20
mM Tris-HCl, 150 mM NaCl, 15 mM Imidazole, pH 8.0). The fusion
protein was then eluted using 2 CV of elution buffer (20 mM
Tris-HCl, 150 mM NaCl, 300 mM Imidazole, pH 8.0). The purified
product is present in a native buffer that allows for cleavage and
release of the peptide from the SUMO fusion using Ulp1. See FIG.
22. These data demonstrate that it is possible to rapidly purify
the fusion protein and cleave it from the resin with Ulp1. It is
possible that proteins of higher molecular weights may not rapidly
re-nature and be amenable to cleavage by Ulp1. However, since
Ulp1 requires that the three-dimensional structure of SUMO be
intact, the purification and cleavage properties depend more on the
refolding of SUMO.
Similar to DNA mini-preps, rapid mini-preps for the expression and
purification analysis of the fused proteins may be readily
employed. Table 5 summarizes the data showing the dramatic
enhancement of protein production observed when utilizing the
compositions and methods of the present invention. The sequences
and vectors utilized in the practice of the invention are shown in
FIGS. 23-46.

TABLE-US-00008 TABLE 5 Fusion with SUMO Enhances Protein Expression

E. coli Expression of UBLs | All of the fusions have Met N-Termini
SUMO-GFP | 40 fold
Ub-GFP | 40 fold
Urm1-GFP | 50 fold
Hub1-GFP | 2 fold
Rub1-GFP | 50 fold
Apg8-GFP | 40 fold
Apg12-GFP | 20 fold
ISG15-GFP | 3-5 fold

Yeast | Met and Various N-Termini
Various UBLs expressed in rich media | Copper induction not observed in rich media; however, Ub, SUMO, ISG15 fusions were processed and GFP induced 3-5 fold.
All of the twenty N-terminal variants were expressed in yeast as SUMO-X-GFP fusions. GFP was processed in all cases, except when N-terminal residue was proline. | Dramatic induction of GFP following fusion with SUMO. At least 50-100 fold induction as compared to unfused GFP expression. Under current loading conditions (20 ug) GFP was not detectable.

Insect Cells | Met as N-termini
SUMO-GFP | 10 fold compared to GFP
gp67-SUMO-GFP | 30 fold compared to gp-GFP
gp67-SUMO-GFP | 50 fold compared to SUMO-gp67-GFP
Secretion SUMO-GFP | At least 50 fold compared to GFP
Secretion Ub-GFP | At least 50 fold compared to GFP
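
The buffer volumes for the rapid miniprep described in paragraph [0139] scale with the pellet weight and the column volume, which can be sketched as follows. The sketch assumes that "three times the weight per volume" means 3 ml of lysis buffer per gram of wet cell pellet; that reading, the function name and the example inputs are assumptions and are not specified in the text.

```python
def miniprep_buffer_volumes(pellet_g, column_volume_ml):
    """Return the buffer volumes (ml) for one denaturing Ni-NTA miniprep.

    Assumes 3 ml of lysis buffer per gram of wet pellet (an interpretation
    of the protocol text) and the column-volume (CV) multiples given there.
    """
    return {
        "lysis_buffer_for_lysis_ml": 3.0 * pellet_g,        # cell lysis
        "lysis_buffer_wash_ml": 2.0 * column_volume_ml,     # first wash, 2 CV
        "wash_buffer_ml": 3.0 * column_volume_ml,           # 15 mM imidazole, 3 CV
        "elution_buffer_ml": 2.0 * column_volume_ml,        # 300 mM imidazole, 2 CV
    }


# Hypothetical example: 2 g of pellet on a 1 ml Ni-NTA column.
for name, volume in miniprep_buffer_volumes(2.0, 1.0).items():
    print(f"{name}: {volume:.1f} ml")
```

Because every step is a fixed multiple of either pellet weight or column volume, the same table of volumes can be generated for many samples at once when the procedure is run in a high-throughput format.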
REFERENCES
[0140] 1. Amerik, A. Y., S. J. Li, and M. Hochstrasser. 2000.
Analysis of the deubiquitinating enzymes of the yeast Saccharomyces
cerevisiae. Biol Chem 381:981-92. [0141] 2. Bachmair, A., D.
Finley, and A. Varshavsky. 1986. In vivo half-life of a protein is
a function of its amino-terminal residue. Science 234:179-86.
[0142] 3. Baker, R. T. 1996. Protein expression using ubiquitin
fusion and cleavage. Curr Opin Biotechnol 7:541-6. [0143] 4. Bayer,
P., A. Arndt, S. Metzger, R. Mahajan, F. Melchior, R. Jaenicke, and
J. Becker. 1998. Structure determination of the small
ubiquitin-related modifier SUMO-1. J Mol Biol 280:275-86. [0144] 5.
Butt, T. R., S. Jonnalagadda, B. P. Monia, E. J. Sternberg, J. A.
Marsh, J. M. Stadel, D. J. Ecker, and S. T. Crooke. 1989. Ubiquitin
fusion augments the yield of cloned gene products in Escherichia
coli. Proc Natl Acad Sci U S A 86:2540-4. [0145] 6. Butt, T. R., E.
J. Sternberg, J. A. Gorman, P. Clark, D. Hamer, M. Rosenberg, and
S. T. Crooke. 1984. Copper metallothionein of yeast, structure of
the gene, and regulation of expression. Proc Natl Acad Sci U S A
81:3332-6. [0146] 7. Ecker, D. J., J. M. Stadel, T. R. Butt, J. A.
Marsh, B. P. Monia, D. A. Powers, J. A. Gorman, P. E. Clark, F.
Warren, A. Shatzman, and et al. 1989. Increasing gene expression in
yeast by fusion to ubiquitin. J Biol Chem 264:7715-9. [0147] 8.
Gietz, D., A. St. Jean, R. A. Woods, and R. H. Schiestl. 1992.
Improved method for high efficiency transformation of intact yeast
cells. Nucleic Acids Res 20:1425. [0148] 9. Goward, C. R., J. P.
Murphy, T. Atkinson, and D. A. Barstow. 1990. Expression and
purification of a truncated recombinant streptococcal protein G.
Biochem J 267:171-7. [0149] 10. Graumann, K., J. L. Wittliff, W.
Raffelsberger, L. Miles, A. Jungbauer, and T. R. Butt. 1996.
Structural and functional analysis of N-terminal point mutants of
the human estrogen receptor. J Steroid Biochem Mol Biol 57:293-300.
[0150] 11. Hicke, L. 1997. Ubiquitin-dependent internalization and
down-regulation of plasma membrane proteins. Faseb J 11:1215-26.
[0151] 12. Hochstrasser, M. 2000. Evolution and function of
ubiquitin-like protein-conjugation systems. Nat Cell Biol 2:E153-7.
[0152] 13. Hochstrasser, M. 1995. Ubiquitin, proteasomes, and the
regulation of intracellular protein degradation. Curr Opin Cell
Biol 7:215-23. [0153] 14. Hochstrasser, M. 1996.
Ubiquitin-dependent protein degradation. Annu Rev Genet 30:405-39.
[0154] 15. Jentsch, S., and G. Pyrowolakis. 2000. Ubiquitin and its
kin: how close are the family ties? Trends Cell Biol
10:335-42. [0155] 16. Johnson, E.
S., I. Schwienhorst, R. J. Dohmen, and G. Blobel. 1997. The
ubiquitin-like protein Smt3p is activated for conjugation to other
proteins by an Aos1p/Uba2p heterodimer. Embo J 16:5509-19. [0156]
17. Kapust, R. B., and D. S. Waugh. 1999. Escherichia coli
maltose-binding protein is uncommonly effective at promoting the
solubility of polypeptides to which it is fused. Protein Sci
8:1668-74. [0157] 18. Kim, K. I., S. H. Baek, Y. J. Jeon, S.
Nishimori, T. Suzuki, S. Uchida, N. Shimbara, H. Saitoh, K. Tanaka,
and C. H. Chung. 2000. A new SUMO-1-specific protease, SUSP1, that
is highly expressed in reproductive organs. J Biol Chem
275:14102-6. [0158] 19. LaBean, T. H., S. A. Kauffman, and T. R.
Butt. 1995. Libraries of random-sequence polypeptides produced with
high yield as carboxy-terminal fusions with ubiquitin. Mol Divers
1:29-38. [0159] 20. Li, S. J., and M. Hochstrasser. 1999. A new
protease required for cell-cycle progression in yeast. Nature
398:246-51. [0160] 21. Li, S. J., and M. Hochstrasser. 2000. The
yeast ULP2 (SMT4) gene encodes a novel protease specific for the
ubiquitin-like Smt3 protein. Mol Cell Biol 20:2367-77. [0161] 22.
Lyttle, C. R., P. Damian-Matsumura, H. Juui, and T. R. Butt. 1992.
Human estrogen receptor regulation in a yeast model system and
studies on receptor agonists and antagonists. J Steroid Biochem Mol
Biol 42:677-85. [0162] 23. Mahajan, R., L. Gerace, and F. Melchior.
1998. Molecular characterization of the SUMO-1 modification of
RanGAP1 and its role in nuclear envelope association. J Cell Biol
140:259-70. [0163] 24. Malakhova, O., M. Malakhov, C. Hetherington,
and D. E. Zhang. 2002. Lipopolysaccharide activates the expression
of ISG15-specific protease UBP43 via interferon regulatory factor
3. J Biol Chem 277:14703-11. [0164] 25. Marathe, S. V., and J. E.
McEwen. 1995. Vectors with the gus reporter gene for identifying
and quantitating promoter regions in Saccharomyces cerevisiae. Gene
154:105-7. [0165] 26. Matunis, M. J., J. Wu, and G. Blobel. 1998.
SUMO-1 modification and its role in targeting the Ran
GTPase-activating protein, RanGAP1, to the nuclear pore complex. J
Cell Biol 140:499-509. [0166] 27. Mossessova, E., and C. D. Lima.
2000. Ulp1-SUMO crystal structure and genetic analysis reveal
conserved interactions and a regulatory element essential for cell
growth in yeast. Mol Cell 5:865-76. [0167] 28. Muller, S., C.
Hoege, G. Pyrowolakis, and S. Jentsch. 2001. SUMO, ubiquitin's
mysterious cousin. Nat Rev Mol Cell Biol 2:202-10. [0168] 29.
Muller, S., M. J. Matunis, and A. Dejean. 1998. Conjugation with
the ubiquitin-related modifier SUMO-1 regulates the partitioning of
PML within the nucleus. Embo J 17:61-70. [0169] 30. Ohsumi, Y.
2001. Molecular dissection of autophagy: two ubiquitin-like
systems. Nat Rev Mol Cell Biol 2:211-6. [0170] 31. Sherman, F., G.
Fink, and J. Hicks. 1986. Methods in yeast genetics. Cold Spring
Harbor Laboratory, Cold Spring Harbor, N.Y. [0171] 32. Sikorski, R.
S., and P. Hieter. 1989. A system of shuttle vectors and yeast host
strains designed for efficient manipulation of DNA in Saccharomyces
cerevisiae. Genetics 122:19-27. [0172] 33. Suzuki, T., A. Ichiyama,
H. Saitoh, T. Kawakami, M. Omata, C. H. Chung, M. Kimura, N.
Shimbara, and K. Tanaka. 1999. A new 30-kDa ubiquitin-related
SUMO-1 hydrolase from bovine brain. J Biol Chem 274:31131-4. [0173]
34. Varshavsky, A. 1996. The N-end rule: functions, mysteries,
uses. Proc Natl Acad Sci U S A 93:12142-9. [0174] 35. Varshavsky,
A. 2000. Ubiquitin fusion technique and its descendants. Methods
Enzymol 327:578-93. [0175] 36. Waldo, G. S., B. M. Standish, J.
Berendzen, and T. C. Terwilliger. 1999. Rapid protein-folding assay
using green fluorescent protein. Nat Biotechnol 17:691-5. [0176]
37. Walfish, P. G., T. Yoganathan, Y. F. Yang, H. Hong, T. R. Butt,
and M. R. Stallcup. 1997. Yeast hormone response element assays
detect and characterize GRIP1 coactivator-dependent activation of
transcription by thyroid and retinoid nuclear receptors. Proc Natl
Acad Sci U S A 94:3697-702. [0177] 38. Wright, L. C., J. Seybold,
A. Robichaud, I. M. Adcock, and P. J. Barnes. 1998.
Phosphodiesterase expression in human epithelial cells. Am J
Physiol 275:L694-700. [0178] 39. Yeh, E. T., L. Gong, and T.
Kamitani. 2000. Ubiquitin-like proteins: new wines in new bottles.
Gene 248:1-14.
[0179] While certain of the preferred embodiments of the present
invention have been described and specifically exemplified above,
it is not intended that the invention be limited to such
embodiments. Various modifications may be made thereto without
departing from the scope and spirit of the present invention, as
set forth in the following claims.
Sequence CWU 1
1
65 1 106 PRT Artificial Sequence Synthetic Sequence 1 Met Gly His
His His His His His Gly Ser Asp Ser Glu Val Asn Gln 1 5 10 15 Glu
Ala Lys Pro Glu Val Lys Pro Glu Val Lys Pro Glu Thr His Ile 20 25
30 Asn Leu Lys Val Ser Asp Gly Ser Ser Glu Ile Phe Phe Lys Ile Lys
35 40 45 Lys Thr Thr Pro Leu Arg Arg Leu Met Glu Ala Phe Ala Lys
Arg Gln 50 55 60 Gly Lys Glu Met Asp Ser Leu Arg Phe Leu Tyr Asp
Gly Ile Arg Ile 65 70 75 80 Gln Ala Asp Gln Ala Pro Glu Asp Leu Asp
Met Glu Asp Asn Asp Ile 85 90 95 Ile Glu Ala His Arg Glu Gln Ile
Gly Gly 100 105 2 320 DNA Artificial Sequence Synthetic Sequence 2
ccatgggtca tcaccatcat catcacgggt cggactcaga agtcaatcaa gaagctaagc
60 cagaggtcaa gccagaagtc aagcctgaga ctcacatcaa tttaaaggtg
tccgatggat 120 cttcagagat cttcttcaag atcaaaaaga ccactccttt
aagaaggctg atggaagcgt 180 tcgctaaaag acagggtaag gaaatggact
ccttaagatt cttgtacgac ggtattagaa 240 ttcaagctga tcaggcccct
gaagatttgg acatggagga taacgatatt attgaggctc 300 accgcgaaca
gattggaggt 320 3 239 PRT Artificial Sequence Synthetic Sequence 3
Met Val Ser Lys Gly Glu Glu Leu Phe Thr Gly Val Val Pro Ile Leu 1 5
10 15 Val Glu Leu Asp Gly Asp Val Asn Gly His Lys Phe Ser Val Ser
Gly 20 25 30 Glu Gly Glu Gly Asp Ala Thr Tyr Gly Lys Leu Thr Leu
Lys Phe Ile 35 40 45 Cys Thr Thr Gly Lys Leu Pro Val Pro Trp Pro
Thr Leu Val Thr Thr 50 55 60 Leu Thr Tyr Gly Val Gln Cys Phe Ser
Arg Tyr Pro Asp His Met Lys 65 70 75 80 Gln His Asp Phe Phe Lys Ser
Ala Met Pro Glu Gly Tyr Val Gln Glu 85 90 95 Arg Thr Ile Phe Phe
Lys Asp Asp Gly Asn Tyr Lys Thr Arg Ala Glu 100 105 110 Val Lys Phe
Glu Gly Asp Thr Leu Val Asn Arg Ile Glu Leu Lys Gly 115 120 125 Ile
Asp Phe Lys Glu Asp Gly Asn Ile Leu Gly His Lys Leu Glu Tyr 130 135
140 Asn Tyr Asn Ser His Asn Val Tyr Ile Met Ala Asp Lys Gln Lys Asn
145 150 155 160 Gly Ile Lys Val Asn Phe Lys Ile Arg His Asn Ile Glu
Asp Gly Ser 165 170 175 Val Gln Leu Ala Asp His Tyr Gln Gln Asn Thr
Pro Ile Gly Asp Gly 180 185 190 Pro Val Leu Leu Pro Asp Asn His Tyr
Leu Ser Thr Gln Ser Ala Leu 195 200 205 Ser Lys Asp Pro Asn Glu Lys
Arg Asp His Met Val Leu Leu Glu Phe 210 215 220 Val Thr Ala Ala Gly
Ile Thr Leu Gly Met Asp Glu Leu Tyr Lys 225 230 235 4 727 DNA
Artificial Sequence Synthetic Sequence 4 atggtgagca agggcgagga
gctgttcacc ggggtggtgc ccatcctggt cgagctggac 60 ggcgacgtaa
acggccacaa gttcagcgtg tccggcgagg gcgagggcga tgccacctac 120
ggcaagctga ccctgaagtt catctgcacc accggcaagc tgcccgtgcc ctggcccacc
180 ctcgtgacca ccctgaccta cggcgtgcag tgcttcagcc gctaccccga
ccacatgaag 240 cagcacgact tcttcaagtc cgccatgccc gaaggctacg
tccaggagcg caccatcttc 300 ttcaaggacg acggcaacta caagacccgc
gccgaggtga agttcgaggg cgacaccctg 360 gtgaaccgca tcgagctgaa
gggcatcgac ttcaaggagg acggcaacat cctggggcac 420 aagctggagt
acaactacaa cagccacaac gtctatatca tggccgacaa gcagaagaac 480
ggcatcaagg tgaacttcaa gatccgccac aacatcgagg acggcagcgt gcagctcgcc
540 gaccactacc agcagaacac ccccatcggc gacggccccg tgctgctgcc
cgacaaccac 600 tacctgagca cccagtccgc cctgagcaaa gaccccaacg
agaagcgcga tcacatggtc 660 ctgctggagt tcgtgaccgc cgccgggatc
actctcggca tggacgagct gtacaagtaa 720 taagctt 727 5 345 PRT
Artificial Sequence Synthetic Sequence 5 Met Gly His His His His
His His Gly Ser Asp Ser Glu Val Asn Gln 1 5 10 15 Glu Ala Lys Pro
Glu Val Lys Pro Glu Val Lys Pro Glu Thr His Ile 20 25 30 Asn Leu
Lys Val Ser Asp Gly Ser Ser Glu Ile Phe Phe Lys Ile Lys 35 40 45
Lys Thr Thr Pro Leu Arg Arg Leu Met Glu Ala Phe Ala Lys Arg Gln 50
55 60 Gly Lys Glu Met Asp Ser Leu Arg Phe Leu Tyr Asp Gly Ile Arg
Ile 65 70 75 80 Gln Ala Asp Gln Ala Pro Glu Asp Leu Asp Met Glu Asp
Asn Asp Ile 85 90 95 Ile Glu Ala His Arg Glu Gln Ile Gly Gly Met
Val Ser Lys Gly Glu 100 105 110 Glu Leu Phe Thr Gly Val Val Pro Ile
Leu Val Glu Leu Asp Gly Asp 115 120 125 Val Asn Gly His Lys Phe Ser
Val Ser Gly Glu Gly Glu Gly Asp Ala 130 135 140 Thr Tyr Gly Lys Leu
Thr Leu Lys Phe Ile Cys Thr Thr Gly Lys Leu 145 150 155 160 Pro Val
Pro Trp Pro Thr Leu Val Thr Thr Leu Thr Tyr Gly Val Gln 165 170 175
Cys Phe Ser Arg Tyr Pro Asp His Met Lys Gln His Asp Phe Phe Lys 180
185 190 Ser Ala Met Pro Glu Gly Tyr Val Gln Glu Arg Thr Ile Phe Phe
Lys 195 200 205 Asp Asp Gly Asn Tyr Lys Thr Arg Ala Glu Val Lys Phe
Glu Gly Asp 210 215 220 Thr Leu Val Asn Arg Ile Glu Leu Lys Gly Ile
Asp Phe Lys Glu Asp 225 230 235 240 Gly Asn Ile Leu Gly His Lys Leu
Glu Tyr Asn Tyr Asn Ser His Asn 245 250 255 Val Tyr Ile Met Ala Asp
Lys Gln Lys Asn Gly Ile Lys Val Asn Phe 260 265 270 Lys Ile Arg His
Asn Ile Glu Asp Gly Ser Val Gln Leu Ala Asp His 275 280 285 Tyr Gln
Gln Asn Thr Pro Ile Gly Asp Gly Pro Val Leu Leu Pro Asp 290 295 300
Asn His Tyr Leu Ser Thr Gln Ser Ala Leu Ser Lys Asp Pro Asn Glu 305
310 315 320 Lys Arg Asp His Met Val Leu Leu Glu Phe Val Thr Ala Ala
Gly Ile 325 330 335 Thr Leu Gly Met Asp Glu Leu Tyr Lys 340 345 6
1047 DNA Artificial Sequence Synthetic Sequence 6 ccatgggtca
tcaccatcat catcacgggt cggactcaga agtcaatcaa gaagctaagc 60
cagaggtcaa gccagaagtc aagcctgaga ctcacatcaa tttaaaggtg tccgatggat
120 cttcagagat cttcttcaag atcaaaaaga ccactccttt aagaaggctg
atggaagcgt 180 tcgctaaaag acagggtaag gaaatggact ccttaagatt
cttgtacgac ggtattagaa 240 ttcaagctga tcaggcccct gaagatttgg
acatggagga taacgatatt attgaggctc 300 accgcgaaca gattggaggt
atggtgagca agggcgagga gctgttcacc ggggtggtgc 360 ccatcctggt
cgagctggac ggcgacgtaa acggccacaa gttcagcgtg tccggcgagg 420
gcgagggcga tgccacctac ggcaagctga ccctgaagtt catctgcacc accggcaagc
480 tgcccgtgcc ctggcccacc ctcgtgacca ccctgaccta cggcgtgcag
tgcttcagcc 540 gctaccccga ccacatgaag cagcacgact tcttcaagtc
cgccatgccc gaaggctacg 600 tccaggagcg caccatcttc ttcaaggacg
acggcaacta caagacccgc gccgaggtga 660 agttcgaggg cgacaccctg
gtgaaccgca tcgagctgaa gggcatcgac ttcaaggagg 720 acggcaacat
cctggggcac aagctggagt acaactacaa cagccacaac gtctatatca 780
tggccgacaa gcagaagaac ggcatcaagg tgaacttcaa gatccgccac aacatcgagg
840 acggcagcgt gcagctcgcc gaccactacc agcagaacac ccccatcggc
gacggccccg 900 tgctgctgcc cgacaaccac tacctgagca cccagtccgc
cctgagcaaa gaccccaacg 960 agaagcgcga tcacatggtc ctgctggagt
tcgtgaccgc cgccgggatc actctcggca 1020 tggacgagct gtacaagtaa taagctt
1047 7 323 PRT Artificial Sequence Synthetic Sequence 7 Met Gly His
His His His His His Gly Gln Ile Phe Val Lys Thr Leu 1 5 10 15 Thr
Gly Lys Thr Ile Thr Leu Glu Val Glu Pro Ser Asp Thr Ile Glu 20 25
30 Asn Val Lys Ala Lys Ile Gln Asp Lys Glu Gly Ile Pro Pro Asp Gln
35 40 45 Gln Arg Leu Ile Phe Ala Gly Lys Gln Leu Glu Asp Gly Arg
Thr Leu 50 55 60 Ser Asp Tyr Asn Ile Gln Lys Glu Ser Thr Leu His
Leu Val Leu Arg 65 70 75 80 Leu Arg Gly Gly Met Val Ser Lys Gly Glu
Glu Leu Phe Thr Gly Val 85 90 95 Val Pro Ile Leu Val Glu Leu Asp
Gly Asp Val Asn Gly His Lys Phe 100 105 110 Ser Val Ser Gly Glu Gly
Glu Gly Asp Ala Thr Tyr Gly Lys Leu Thr 115 120 125 Leu Lys Phe Ile
Cys Thr Thr Gly Lys Leu Pro Val Pro Trp Pro Thr 130 135 140 Leu Val
Thr Thr Leu Thr Tyr Gly Val Gln Cys Phe Ser Arg Tyr Pro 145 150 155
160 Asp His Met Lys Gln His Asp Phe Phe Lys Ser Ala Met Pro Glu Gly
165 170 175 Tyr Val Gln Glu Arg Thr Ile Phe Phe Lys Asp Asp Gly Asn
Tyr Lys 180 185 190 Thr Arg Ala Glu Val Lys Phe Glu Gly Asp Thr Leu
Val Asn Arg Ile 195 200 205 Glu Leu Lys Gly Ile Asp Phe Lys Glu Asp
Gly Asn Ile Leu Gly His 210 215 220 Lys Leu Glu Tyr Asn Tyr Asn Ser
His Asn Val Tyr Ile Met Ala Asp 225 230 235 240 Lys Gln Lys Asn Gly
Ile Lys Val Asn Phe Lys Ile Arg His Asn Ile 245 250 255 Glu Asp Gly
Ser Val Gln Leu Ala Asp His Tyr Gln Gln Asn Thr Pro 260 265 270 Ile
Gly Asp Gly Pro Val Leu Leu Pro Asp Asn His Tyr Leu Ser Thr 275 280
285 Gln Ser Ala Leu Ser Lys Asp Pro Asn Glu Lys Arg Asp His Met Val
290 295 300 Leu Leu Glu Phe Val Thr Ala Ala Gly Ile Thr Leu Gly Met
Asp Glu 305 310 315 320 Leu Tyr Lys 8 981 DNA Artificial Sequence
Synthetic Sequence 8 ccatgggtca tcaccatcat catcacgggc agatcttcgt
caagacgtta accggtaaaa 60 ccataactct agaagttgaa ccatccgata
ccatcgaaaa cgttaaggct aaaattcaag 120 acaaggaagg cattccacct
gatcaacaaa gattgatctt tgccggtaag cagctcgagg 180 acggtagaac
gctgtctgat tacaacattc agaaggagtc gaccttacat cttgtcttac 240
gcctacgtgg aggtatggtg agcaagggcg aggagctgtt caccggggtg gtgcccatcc
300 tggtcgagct ggacggcgac gtaaacggcc acaagttcag cgtgtccggc
gagggcgagg 360 gcgatgccac ctacggcaag ctgaccctga agttcatctg
caccaccggc aagctgcccg 420 tgccctggcc caccctcgtg accaccctga
cctacggcgt gcagtgcttc agccgctacc 480 ccgaccacat gaagcagcac
gacttcttca agtccgccat gcccgaaggc tacgtccagg 540 agcgcaccat
cttcttcaag gacgacggca actacaagac ccgcgccgag gtgaagttcg 600
agggcgacac cctggtgaac cgcatcgagc tgaagggcat cgacttcaag gaggacggca
660 acatcctggg gcacaagctg gagtacaact acaacagcca caacgtctat
atcatggccg 720 acaagcagaa gaacggcatc aaggtgaact tcaagatccg
ccacaacatc gaggacggca 780 gcgtgcagct cgccgaccac taccagcaga
acacccccat cggcgacggc cccgtgctgc 840 tgcccgacaa ccactacctg
agcacccagt ccgccctgag caaagacccc aacgagaagc 900 gcgatcacat
ggtcctgctg gagttcgtga ccgccgccgg gatcactctc ggcatggacg 960
agctgtacaa gtaataagct t 981 9 346 PRT Artificial Sequence Synthetic
Sequence 9 Met Gly His His His His His His Gly Val Asn Val Lys Val
Glu Phe 1 5 10 15 Leu Gly Gly Leu Asp Ala Ile Phe Gly Lys Gln Arg
Val His Lys Ile 20 25 30 Lys Met Asp Lys Glu Asp Pro Val Thr Val
Gly Asp Leu Ile Asp His 35 40 45 Ile Val Ser Thr Met Ile Asn Asn
Pro Asn Asp Val Ser Ile Phe Ile 50 55 60 Glu Asp Asp Ser Ile Arg
Pro Gly Ile Ile Thr Leu Ile Asn Asp Thr 65 70 75 80 Asp Trp Glu Leu
Glu Gly Glu Lys Asp Tyr Ile Leu Glu Asp Gly Asp 85 90 95 Ile Ile
Ser Phe Thr Ser Thr Leu His Gly Gly Met Val Ser Lys Gly 100 105 110
Glu Glu Leu Phe Thr Gly Val Val Pro Ile Leu Val Glu Leu Asp Gly 115
120 125 Asp Val Asn Gly His Lys Phe Ser Val Ser Gly Glu Gly Glu Gly
Asp 130 135 140 Ala Thr Tyr Gly Lys Leu Thr Leu Lys Phe Ile Cys Thr
Thr Gly Lys 145 150 155 160 Leu Pro Val Pro Trp Pro Thr Leu Val Thr
Thr Leu Thr Tyr Gly Val 165 170 175 Gln Cys Phe Ser Arg Tyr Pro Asp
His Met Lys Gln His Asp Phe Phe 180 185 190 Lys Ser Ala Met Pro Glu
Gly Tyr Val Gln Glu Arg Thr Ile Phe Phe 195 200 205 Lys Asp Asp Gly
Asn Tyr Lys Thr Arg Ala Glu Val Lys Phe Glu Gly 210 215 220 Asp Thr
Leu Val Asn Arg Ile Glu Leu Lys Gly Ile Asp Phe Lys Glu 225 230 235
240 Asp Gly Asn Ile Leu Gly His Lys Leu Glu Tyr Asn Tyr Asn Ser His
245 250 255 Asn Val Tyr Ile Met Ala Asp Lys Gln Lys Asn Gly Ile Lys
Val Asn 260 265 270 Phe Lys Ile Arg His Asn Ile Glu Asp Gly Ser Val
Gln Leu Ala Asp 275 280 285 His Tyr Gln Gln Asn Thr Pro Ile Gly Asp
Gly Pro Val Leu Leu Pro 290 295 300 Asp Asn His Tyr Leu Ser Thr Gln
Ser Ala Leu Ser Lys Asp Pro Asn 305 310 315 320 Glu Lys Arg Asp His
Met Val Leu Leu Glu Phe Val Thr Ala Ala Gly 325 330 335 Ile Thr Leu
Gly Met Asp Glu Leu Tyr Lys 340 345 10 1050 DNA Artificial Sequence
Synthetic Sequence 10 ccatgggtca tcaccatcat catcacgggg taaacgtgaa
agtggagttt ctaggtggac 60 ttgatgctat ttttggaaaa caaagagtac
ataaaattaa gatggacaaa gaagatcctg 120 tcacagtggg cgatttgatt
gaccacattg tatctactat gatcaataac cctaatgacg 180 ttagtatctt
catcgaagat gattctataa gacccggtat catcacatta atcaacgaca 240
ccgactggga gctcgaaggc gaaaaagact acatattgga agacggtgac atcatctctt
300 ttacttcaac attacatgga ggtatggtga gcaagggcga ggagctgttc
accggggtgg 360 tgcccatcct ggtcgagctg gacggcgacg taaacggcca
caagttcagc gtgtccggcg 420 agggcgaggg cgatgccacc tacggcaagc
tgaccctgaa gttcatctgc accaccggca 480 agctgcccgt gccctggccc
accctcgtga ccaccctgac ctacggcgtg cagtgcttca 540 gccgctaccc
cgaccacatg aagcagcacg acttcttcaa gtccgccatg cccgaaggct 600
acgtccagga gcgcaccatc ttcttcaagg acgacggcaa ctacaagacc cgcgccgagg
660 tgaagttcga gggcgacacc ctggtgaacc gcatcgagct gaagggcatc
gacttcaagg 720 aggacggcaa catcctgggg cacaagctgg agtacaacta
caacagccac aacgtctata 780 tcatggccga caagcagaag aacggcatca
aggtgaactt caagatccgc cacaacatcg 840 aggacggcag cgtgcagctc
gccgaccact accagcagaa cacccccatc ggcgacggcc 900 ccgtgctgct
gcccgacaac cactacctga gcacccagtc cgccctgagc aaagacccca 960
acgagaagcg cgatcacatg gtcctgctgg agttcgtgac cgccgccggg atcactctcg
1020 gcatggacga gctgtacaag taataagctt 1050 11 320 PRT Artificial
Sequence Synthetic Sequence 11 Met Gly His His Tyr His His His Gly
Met Ile Glu Val Val Val Asn 1 5 10 15 Asp Arg Leu Gly Lys Lys Val
Arg Val Lys Cys Leu Ala Glu Asp Ser 20 25 30 Val Gly Asp Phe Lys
Lys Val Leu Ser Leu Gln Ile Gly Thr Gln Pro 35 40 45 Asn Lys Ile
Val Leu Gln Lys Gly Gly Ser Val Leu Lys Asp His Ile 50 55 60 Ser
Leu Glu Asp Tyr Glu Val His Asp Gln Thr Asn Leu Glu Leu Tyr 65 70
75 80 Tyr Met Val Ser Lys Gly Glu Glu Leu Phe Thr Gly Val Val Pro
Ile 85 90 95 Leu Val Glu Leu Asp Gly Asp Val Asn Gly His Lys Phe
Ser Val Ser 100 105 110 Gly Glu Gly Glu Gly Asp Ala Thr Tyr Gly Lys
Leu Thr Leu Lys Phe 115 120 125 Ile Cys Thr Thr Gly Lys Leu Pro Val
Pro Trp Pro Thr Leu Val Thr 130 135 140 Thr Leu Thr Tyr Gly Val Gln
Cys Phe Ser Arg Tyr Pro Asp His Met 145 150 155 160 Lys Gln His Asp
Phe Phe Lys Ser Ala Met Pro Glu Gly Tyr Val Gln 165 170 175 Glu Arg
Thr Ile Phe Phe Lys Asp Asp Gly Asn Tyr Lys Thr Arg Ala 180 185 190
Glu Val Lys Phe Glu Gly Asp Thr Leu Val Asn Arg Ile Glu Leu Lys 195
200 205 Gly Ile Asp Phe Lys Glu Asp Gly Asn Ile Leu Gly His Lys Leu
Glu 210 215 220 Tyr Asn Tyr Asn Ser His Asn Val Tyr Ile Met Ala Asp
Lys Gln Lys 225 230 235 240 Asn Gly Ile Lys Val Asn Phe Lys Ile Arg
His Asn Ile Glu Asp Gly 245 250 255 Ser Val Gln Leu Ala Asp His Tyr
Gln Gln Asn Thr Pro Ile Gly Asp 260 265 270 Gly Pro Val Leu Leu Pro
Asp Asn His Tyr Leu Ser Thr Gln Ser Ala 275 280 285 Leu Ser Lys Asp
Pro Asn Glu Lys Arg Asp His Met Val Leu Leu Glu 290 295 300 Phe Val
Thr Ala Ala
Gly Ile Thr Leu Gly Met Asp Glu Leu Tyr Lys 305 310 315 320 12 972
DNA Artificial Sequence Synthetic Sequence 12 ccatgggtca tcactatcat
catcacggga tgattgaggt agttgtgaat gaccgattag 60 gcaaaaaagt
cagagtgaag tgccttgctg aagatagtgt aggtgatttc aaaaaagtat 120
tgtccttgca aattggcacc caaccaaaca aaattgtgtt gcagaagggt ggaagtgttt
180 taaaagacca tatctctctg gaagattatg aggtacatga tcagacaaat
ttggagctgt 240 attacatggt gagcaagggc gaggagctgt tcaccggggt
ggtgcccatc ctggtcgagc 300 tggacggcga cgtaaacggc cacaagttca
gcgtgtccgg cgagggcgag ggcgatgcca 360 cctacggcaa gctgaccctg
aagttcatct gcaccaccgg caagctgccc gtgccctggc 420 ccaccctcgt
gaccaccctg acctacggcg tgcagtgctt cagccgctac cccgaccaca 480
tgaagcagca cgacttcttc aagtccgcca tgcccgaagg ctacgtccag gagcgcacca
540 tcttcttcaa ggacgacggc aactacaaga cccgcgccga ggtgaagttc
gagggcgaca 600 ccctggtgaa ccgcatcgag ctgaagggca tcgacttcaa
ggaggacggc aacatcctgg 660 ggcacaagct ggagtacaac tacaacagcc
acaacgtcta tatcatggcc gacaagcaga 720 agaacggcat caaggtgaac
ttcaagatcc gccacaacat cgaggacggc agcgtgcagc 780 tcgccgacca
ctaccagcag aacaccccca tcggcgacgg ccccgtgctg ctgcccgaca 840
accactacct gagcacccag tccgccctga gcaaagaccc caacgagaag cgcgatcaca
900 tggtcctgct ggagttcgtg accgccgccg ggatcactct cggcatggac
gagctgtaca 960 agtaataagc tt 972 13 323 PRT Artificial Sequence
Synthetic Sequence misc_feature 13, 69 Xaa = unknown 13 Met Gly His
His His His His His Gly Ile Val Lys Xaa Lys Thr Leu 1 5 10 15 Thr
Gly Lys Glu Ile Ser Val Glu Leu Lys Glu Ser Asp Leu Val Tyr 20 25
30 His Ile Lys Glu Leu Leu Glu Glu Lys Glu Gly Ile Pro Pro Ser Gln
35 40 45 Gln Arg Leu Ile Phe Gln Gly Lys Gln Ile Asp Asp Lys Leu
Thr Val 50 55 60 Thr Asp Ala His Xaa Val Glu Gly Met Gln Leu His
Leu Val Leu Thr 65 70 75 80 Leu Arg Gly Gly Met Val Ser Lys Gly Glu
Glu Leu Phe Thr Gly Val 85 90 95 Val Pro Ile Leu Val Glu Leu Asp
Gly Asp Val Asn Gly His Lys Phe 100 105 110 Ser Val Ser Gly Glu Gly
Glu Gly Asp Ala Thr Tyr Gly Lys Leu Thr 115 120 125 Leu Lys Phe Ile
Cys Thr Thr Gly Lys Leu Pro Val Pro Trp Pro Thr 130 135 140 Leu Val
Thr Thr Leu Thr Tyr Gly Val Gln Cys Phe Ser Arg Tyr Pro 145 150 155
160 Asp His Met Lys Gln His Asp Phe Phe Lys Ser Ala Met Pro Glu Gly
165 170 175 Tyr Val Gln Glu Arg Thr Ile Phe Phe Lys Asp Asp Gly Asn
Tyr Lys 180 185 190 Thr Arg Ala Glu Val Lys Phe Glu Gly Asp Thr Leu
Val Asn Arg Ile 195 200 205 Glu Leu Lys Gly Ile Asp Phe Lys Glu Asp
Gly Asn Ile Leu Gly His 210 215 220 Lys Leu Glu Tyr Asn Tyr Asn Ser
His Asn Val Tyr Ile Met Ala Asp 225 230 235 240 Lys Gln Lys Asn Gly
Ile Lys Val Asn Phe Lys Ile Arg His Asn Ile 245 250 255 Glu Asp Gly
Ser Val Gln Leu Ala Asp His Tyr Gln Gln Asn Thr Pro 260 265 270 Ile
Gly Asp Gly Pro Val Leu Leu Pro Asp Asn His Tyr Leu Ser Thr 275 280
285 Gln Ser Ala Leu Ser Lys Asp Pro Asn Glu Lys Arg Asp His Met Val
290 295 300 Leu Leu Glu Phe Val Thr Ala Ala Gly Ile Thr Leu Gly Met
Asp Glu 305 310 315 320 Leu Tyr Lys
14 981 DNA Artificial Sequence Synthetic Sequence
misc_feature 40, 207 n = a, c, g, or t
14
ccatgggtca tcaccatcat catcacggga ttgttaaagn gaagacactg actgggaagg
60 agatctctgt tgagctgaag gaatcagatc tcgtatatca catcaaggaa
cttttggagg 120 aaaaagaagg gattccacca tctcaacaaa gacttatatt
ccagggaaaa caaattgatg 180 ataaattaac agtaacggat gcacatntag
tagagggaat gcaactccac ttggtattaa 240 cactacgcgg aggtatggtg
agcaagggcg aggagctgtt caccggggtg gtgcccatcc 300 tggtcgagct
ggacggcgac gtaaacggcc acaagttcag cgtgtccggc gagggcgagg 360
gcgatgccac ctacggcaag ctgaccctga agttcatctg caccaccggc aagctgcccg
420 tgccctggcc caccctcgtg accaccctga cctacggcgt gcagtgcttc
agccgctacc 480 ccgaccacat gaagcagcac gacttcttca agtccgccat
gcccgaaggc tacgtccagg 540 agcgcaccat cttcttcaag gacgacggca
actacaagac ccgcgccgag gtgaagttcg 600 agggcgacac cctggtgaac
cgcatcgagc tgaagggcat cgacttcaag gaggacggca 660 acatcctggg
gcacaagctg gagtacaact acaacagcca caacgtctat atcatggccg 720
acaagcagaa gaacggcatc aaggtgaact tcaagatccg ccacaacatc gaggacggca
780 gcgtgcagct cgccgaccac taccagcaga acacccccat cggcgacggc
cccgtgctgc 840 tgcccgacaa ccactacctg agcacccagt ccgccctgag
caaagacccc aacgagaagc 900 gcgatcacat ggtcctgctg gagttcgtga
ccgccgccgg gatcactctc ggcatggacg 960 agctgtacaa gtaataagct t 981
15 363 PRT Artificial Sequence Synthetic Sequence
15 Met Gly His His
His His His His Gly Lys Ser Thr Phe Lys Ser Glu 1 5 10 15 Tyr Pro
Phe Glu Lys Arg Lys Ala Glu Ser Glu Arg Ile Ala Asp Arg 20 25 30
Phe Lys Asn Arg Ile Pro Val Ile Cys Glu Lys Ala Glu Lys Ser Asp 35
40 45 Ile Pro Glu Ile Asp Lys Arg Lys Tyr Leu Val Pro Ala Asp Leu
Thr 50 55 60 Val Gly Gln Phe Val Tyr Val Ile Arg Lys Arg Ile Met
Leu Pro Pro 65 70 75 80 Glu Lys Ala Ile Phe Ile Phe Val Asn Asp Thr
Leu Pro Pro Thr Ala 85 90 95 Ala Leu Met Ser Ala Ile Tyr Gln Glu
His Lys Asp Lys Asp Gly Phe 100 105 110 Leu Tyr Val Thr Tyr Ser Gly
Glu Asn Thr Phe Gly Met Val Ser Lys 115 120 125 Gly Glu Glu Leu Phe
Thr Gly Val Val Pro Ile Leu Val Glu Leu Asp 130 135 140 Gly Asp Val
Asn Gly His Lys Phe Ser Val Ser Gly Glu Gly Glu Gly 145 150 155 160
Asp Ala Thr Tyr Gly Lys Leu Thr Leu Lys Phe Ile Cys Thr Thr Gly 165
170 175 Lys Leu Pro Val Pro Trp Pro Thr Leu Val Thr Thr Leu Thr Tyr
Gly 180 185 190 Val Gln Cys Phe Ser Arg Tyr Pro Asp His Met Lys Gln
His Asp Phe 195 200 205 Phe Lys Ser Ala Met Pro Glu Gly Tyr Val Gln
Glu Arg Thr Ile Phe 210 215 220 Phe Lys Asp Asp Gly Asn Tyr Lys Thr
Arg Ala Glu Val Lys Phe Glu 225 230 235 240 Gly Asp Thr Leu Val Asn
Arg Ile Glu Leu Lys Gly Ile Asp Phe Lys 245 250 255 Glu Asp Gly Asn
Ile Leu Gly His Lys Leu Glu Tyr Asn Tyr Asn Ser 260 265 270 His Asn
Val Tyr Ile Met Ala Asp Lys Gln Lys Asn Gly Ile Lys Val 275 280 285
Asn Phe Lys Ile Arg His Asn Ile Glu Asp Gly Ser Val Gln Leu Ala 290
295 300 Asp His Tyr Gln Gln Asn Thr Pro Ile Gly Asp Gly Pro Val Leu
Leu 305 310 315 320 Pro Asp Asn His Tyr Leu Ser Thr Gln Ser Ala Leu
Ser Lys Asp Pro 325 330 335 Asn Glu Lys Arg Asp His Met Val Leu Leu
Glu Phe Val Thr Ala Ala 340 345 350 Gly Ile Thr Leu Gly Met Asp Glu
Leu Tyr Lys 355 360
16 1099 DNA Artificial Sequence Synthetic Sequence
16 atgggtcatc accatcatca tcacgggaag tctacattta agtctgaata
tccatttgaa 60 aaaaggaagg cggagtcgga gaggattgct gacaggttca
agaataggat acctgtgatt 120 tgcgaaaaag ctgaaaagtc agatattcca
gagattgata agcgtaaata tctagttcct 180 gctgacctta ccgtagggca
atttgtttat gttataagaa agaggattat gctaccccct 240 gagaaggcca
tcttcatttt tgtcaatgat actttgccac ctactgcggc gttgatgtct 300
gccatatatc aagaacacaa ggataaggac gggtttttgt atgtcactta ctcaggagaa
360 aatacatttg gtatggtgag caagggcgag gagctgttca ccggggtggt
gcccatcctg 420 gtcgagctgg acggcgacgt aaacggccac aagttcagcg
tgtccggcga gggcgagggc 480 gatgccacct acggcaagct gaccctgaag
ttcatctgca ccaccggcaa gctgcccgtg 540 ccctggccca ccctcgtgac
caccctgacc tacggcgtgc agtgcttcag ccgctacccc 600 gaccacatga
agcagcacga cttcttcaag tccgccatgc ccgaaggcta cgtccaggag 660
cgcaccatct tcttcaagga cgacggcaac tacaagaccc gcgccgaggt gaagttcgag
720 ggcgacaccc tggtgaaccg catcgagctg aagggcatcg acttcaagga
ggacggcaac 780 atcctggggc acaagctgga gtacaactac aacagccaca
acgtctatat catggccgac 840 aagcagaaga acggcatcaa ggtgaacttc
aagatccgcc acaacatcga ggacggcagc 900 gtgcagctcg ccgaccacta
ccagcagaac acccccatcg gcgacggccc cgtgctgctg 960 cccgacaacc
actacctgag cacccagtcc gccctgagca aagaccccaa cgagaagcgc 1020
gatcacatgg tcctgctgga gttcgtgacc gccgccggga tcactctcgg catggacgag
1080 ctgtacaagt aataagctt 1099
17 433 PRT Artificial Sequence Synthetic Sequence
misc_feature 176 Xaa = unknown
17 Met Gly His
His His His His His Gly Ser Arg Ile Leu Glu Ser Glu 1 5 10 15 Asn
Glu Thr Glu Ser Asp Glu Ser Ser Ile Ile Ser Thr Asn Asn Gly 20 25
30 Thr Ala Met Glu Arg Ser Arg Asn Asn Gln Glu Leu Arg Ser Ser Pro
35 40 45 His Thr Val Gln Asn Arg Leu Glu Leu Phe Ser Arg Arg Leu
Ser Gln 50 55 60 Leu Gly Leu Ala Ser Asp Ile Ser Val Asp Gln Gln
Val Glu Asp Ser 65 70 75 80 Ser Ser Gly Thr Tyr Glu Gln Glu Glu Thr
Ile Lys Thr Asn Ala Gln 85 90 95 Thr Ser Lys Gln Lys Ser His Lys
Asp Glu Lys Asn Ile Gln Lys Ile 100 105 110 Gln Ile Lys Phe Gln Pro
Ile Gly Ser Ile Gly Gln Leu Lys Pro Ser 115 120 125 Val Cys Lys Ile
Ser Met Ser Gln Ser Phe Ala Met Val Ile Leu Phe 130 135 140 Leu Lys
Arg Arg Leu Lys Met Asp His Val Tyr Cys Tyr Ile Asn Asn 145 150 155
160 Ser Phe Ala Pro Ser Pro Gln Gln Asn Ile Gly Glu Leu Trp Met Xaa
165 170 175 Phe Lys Thr Asn Asp Glu Leu Ile Val Ser Tyr Cys Ala Ser
Val Ala 180 185 190 Phe Gly Met Val Ser Lys Gly Glu Glu Leu Phe Thr
Gly Val Val Pro 195 200 205 Ile Leu Val Glu Leu Asp Gly Asp Val Asn
Gly His Lys Phe Ser Val 210 215 220 Ser Gly Glu Gly Glu Gly Asp Ala
Thr Tyr Gly Lys Leu Thr Leu Lys 225 230 235 240 Phe Ile Cys Thr Thr
Gly Lys Leu Pro Val Pro Trp Pro Thr Leu Val 245 250 255 Thr Thr Leu
Thr Tyr Gly Val Gln Cys Phe Ser Arg Tyr Pro Asp His 260 265 270 Met
Lys Gln His Asp Phe Phe Lys Ser Ala Met Pro Glu Gly Tyr Val 275 280
285 Gln Glu Arg Thr Ile Phe Phe Lys Asp Asp Gly Asn Tyr Lys Thr Arg
290 295 300 Ala Glu Val Lys Phe Glu Gly Asp Thr Leu Val Asn Arg Ile
Glu Leu 305 310 315 320 Lys Gly Ile Asp Phe Lys Glu Asp Gly Asn Ile
Leu Gly His Lys Leu 325 330 335 Glu Tyr Asn Tyr Asn Ser His Asn Val
Tyr Ile Met Ala Asp Lys Gln 340 345 350 Lys Asn Gly Ile Lys Val Asn
Phe Lys Ile Arg His Asn Ile Glu Asp 355 360 365 Gly Ser Val Gln Leu
Ala Asp His Tyr Gln Gln Asn Thr Pro Ile Gly 370 375 380 Asp Gly Pro
Val Leu Leu Pro Asp Asn His Tyr Leu Ser Thr Gln Ser 385 390 395 400
Ala Leu Ser Lys Asp Pro Asn Glu Lys Arg Asp His Met Val Leu Leu 405
410 415 Glu Phe Val Thr Ala Ala Gly Ile Thr Leu Gly Met Asp Glu Leu
Tyr 420 425 430 Lys
18 1311 DNA Artificial Sequence Synthetic Sequence
misc_feature 401, 529 n = a, c, g, or t
18 ccatgggtca
tcaccatcat catcacggga gtaggatcct agagagcgaa aatgaaacag 60
aaagtgacga aagctccatc atatccacaa ataatggaac ggcaatggaa agatccagaa
120 ataatcaaga attaagatca tctcctcata ccgttcaaaa tagattggaa
ctttttagca 180 ggagattgtc tcagcttggt ttggcgagtg acatttctgt
cgaccagcaa gttgaagatt 240 cctctagtgg cacttatgaa caggaagaga
caatcaaaac gaatgcacaa acaagcaaac 300 aaaaaagcca taaagacgaa
aaaaacatac aaaagataca gataaaattt cagcccattg 360 gttctattgg
gcagttaaaa ccatctgttt gtaaaatatc natgtcacag tcttttgcaa 420
tggttatttt atttcttaag agacggctga aaatggacca tgtttattgt tatataaata
480 attcgtttgc gccaagtccg cagcaaaata ttggtgaact ttggatgcna
ttcaagacta 540 atgatgagct tattgtaagt tattgtgcat ccgtagcgtt
tggtatggtg agcaagggcg 600 aggagctgtt caccggggtg gtgcccatcc
tggtcgagct ggacggcgac gtaaacggcc 660 acaagttcag cgtgtccggc
gagggcgagg gcgatgccac ctacggcaag ctgaccctga 720 agttcatctg
caccaccggc aagctgcccg tgccctggcc caccctcgtg accaccctga 780
cctacggcgt gcagtgcttc agccgctacc ccgaccacat gaagcagcac gacttcttca
840 agtccgccat gcccgaaggc tacgtccagg agcgcaccat cttcttcaag
gacgacggca 900 actacaagac ccgcgccgag gtgaagttcg agggcgacac
cctggtgaac cgcatcgagc 960 tgaagggcat cgacttcaag gaggacggca
acatcctggg gcacaagctg gagtacaact 1020 acaacagcca caacgtctat
atcatggccg acaagcagaa gaacggcatc aaggtgaact 1080 tcaagatccg
ccacaacatc gaggacggca gcgtgcagct cgccgaccac taccagcaga 1140
acacccccat cggcgacggc cccgtgctgc tgcccgacaa ccactacctg agcacccagt
1200 ccgccctgag caaagacccc aacgagaagc gcgatcacat ggtcctgctg
gagttcgtga 1260 ccgccgccgg gatcactctc ggcatggacg agctgtacaa
gtaataagct t 1311
19 410 PRT Artificial Sequence Synthetic Sequence
19 Met Gly His His His His His His Gly Gly Trp Asp Leu Thr Val Lys
1 5 10 15 Met Leu Ala Gly Asn Glu Phe Gln Val Ser Leu Ser Ser Ser
Met Ser 20 25 30 Val Ser Glu Leu Lys Ala Gln Ile Thr Gln Lys Ile
Gly Val His Ala 35 40 45 Phe Gln Gln Arg Leu Ala Val His Pro Ser
Gly Val Ala Leu Gln Asp 50 55 60 Arg Val Pro Leu Ala Ser Gln Gly
Leu Gly Pro Gly Ser Thr Val Leu 65 70 75 80 Leu Val Val Asp Lys Cys
Asp Glu Pro Leu Ser Ile Leu Val Arg Asn 85 90 95 Asn Lys Gly Arg
Ser Ser Thr Tyr Glu Val Arg Leu Thr Gln Thr Val 100 105 110 Ala His
Leu Lys Gln Gln Val Ser Gly Leu Glu Gly Val Gln Asp Asp 115 120 125
Leu Phe Trp Leu Thr Phe Glu Gly Lys Pro Leu Glu Asp Gln Leu Pro 130
135 140 Leu Gly Glu Tyr Gly Leu Lys Pro Leu Ser Thr Val Phe Met Asn
Leu 145 150 155 160 Arg Leu Arg Gly Gly Gly Thr Glu Pro Gly Gly Met
Val Ser Lys Gly 165 170 175 Glu Glu Leu Phe Thr Gly Val Val Pro Ile
Leu Val Glu Leu Asp Gly 180 185 190 Asp Val Asn Gly His Lys Phe Ser
Val Ser Gly Glu Gly Glu Gly Asp 195 200 205 Ala Thr Tyr Gly Lys Leu
Thr Leu Lys Phe Ile Cys Thr Thr Gly Lys 210 215 220 Leu Pro Val Pro
Trp Pro Thr Leu Val Thr Thr Leu Thr Tyr Gly Val 225 230 235 240 Gln
Cys Phe Ser Arg Tyr Pro Asp His Met Lys Gln His Asp Phe Phe 245 250
255 Lys Ser Ala Met Pro Glu Gly Tyr Val Gln Glu Arg Thr Ile Phe Phe
260 265 270 Lys Asp Asp Gly Asn Tyr Lys Thr Arg Ala Glu Val Lys Phe
Glu Gly 275 280 285 Asp Thr Leu Val Asn Arg Ile Glu Leu Lys Gly Ile
Asp Phe Lys Glu 290 295 300 Asp Gly Asn Ile Leu Gly His Lys Leu Glu
Tyr Asn Tyr Asn Ser His 305 310 315 320 Asn Val Tyr Ile Met Ala Asp
Lys Gln Lys Asn Gly Ile Lys Val Asn 325 330 335 Phe Lys Ile Arg His
Asn Ile Glu Asp Gly Ser Val Gln Leu Ala Asp 340 345 350 His Tyr Gln
Gln Asn Thr Pro Ile Gly Asp Gly Pro Val Leu Leu Pro 355 360 365 Asp
Asn His Tyr Leu Ser Thr Gln Ser Ala Leu Ser Lys Asp Pro Asn 370 375
380 Glu Lys Arg Asp His Met Val Leu Leu Glu Phe Val Thr Ala Ala Gly
385 390 395 400 Ile Thr Leu Gly Met Asp Glu Leu Tyr Lys 405 410
20 1242 DNA Artificial Sequence Synthetic Sequence
20 ccatgggtca
tcaccatcat catcacgggg gctgggacct gacggtgaag atgctggcgg 60
gcaacgaatt ccaggtgtcc ctgagcagct ccatgtcggt gtcagagctg aaggcgcaga
120 tcacccagaa gattggcgtg cacgccttcc agcagcgtct ggctgtccac
ccgagcggtg 180 tggcgctgca ggacagggtc ccccttgcca gccagggcct
gggccctggc agcacggtcc 240 tgctggtggt ggacaaatgc gacgaacctc
tgagcatcct ggtgaggaat aacaagggcc 300 gcagcagcac ctacgaggtc
cggctgacgc agaccgtggc ccacctgaag cagcaagtga 360 gcgggctgga
gggtgtgcag gacgacctgt tctggctgac cttcgagggg aagcccctgg 420
aggaccagct cccgctgggg gagtacggcc tcaagcccct gagcaccgtg ttcatgaatc
480 tgcgcctgcg gggaggcggc acagagcctg gaggtatggt gagcaagggc
gaggagctgt 540
tcaccggggt ggtgcccatc ctggtcgagc tggacggcga cgtaaacggc cacaagttca
600 gcgtgtccgg cgagggcgag ggcgatgcca cctacggcaa gctgaccctg
aagttcatct 660 gcaccaccgg caagctgccc gtgccctggc ccaccctcgt
gaccaccctg acctacggcg 720 tgcagtgctt cagccgctac cccgaccaca
tgaagcagca cgacttcttc aagtccgcca 780 tgcccgaagg ctacgtccag
gagcgcacca tcttcttcaa ggacgacggc aactacaaga 840 cccgcgccga
ggtgaagttc gagggcgaca ccctggtgaa ccgcatcgag ctgaagggca 900
tcgacttcaa ggaggacggc aacatcctgg ggcacaagct ggagtacaac tacaacagcc
960 acaacgtcta tatcatggcc gacaagcaga agaacggcat caaggtgaac
ttcaagatcc 1020 gccacaacat cgaggacggc agcgtgcagc tcgccgacca
ctaccagcag aacaccccca 1080 tcggcgacgg ccccgtgctg ctgcccgaca
accactacct gagcacccag tccgccctga 1140 gcaaagaccc caacgagaag
cgcgatcaca tggtcctgct ggagttcgtg accgccgccg 1200 ggatcactct
cggcatggac gagctgtaca agtaataagc tt 1242
21 166 PRT Artificial Sequence Synthetic Sequence
21 Met Gly His His His His His His Gly
Ser Asp Ser Glu Val Asn Gln 1 5 10 15 Glu Ala Lys Pro Glu Val Lys
Pro Glu Val Lys Pro Glu Thr His Ile 20 25 30 Asn Leu Lys Val Ser
Asp Gly Ser Ser Glu Ile Phe Phe Lys Ile Lys 35 40 45 Lys Thr Thr
Pro Leu Arg Arg Leu Met Glu Ala Phe Ala Lys Arg Gln 50 55 60 Gly
Lys Glu Met Asp Ser Leu Arg Phe Leu Tyr Asp Gly Ile Arg Ile 65 70
75 80 Gln Ala Asp Gln Thr Pro Glu Asp Leu Asp Met Glu Asp Asn Asp
Ile 85 90 95 Ile Glu Ala His Arg Glu Gln Ile Gly Gly Thr Pro Ala
Val Thr Thr 100 105 110 Tyr Lys Leu Val Ile Asn Gly Lys Thr Leu Lys
Gly Glu Thr Thr Thr 115 120 125 Lys Ala Val Asp Ala Glu Thr Ala Glu
Lys Ala Phe Lys Gln Tyr Ala 130 135 140 Asn Asp Asn Gly Val Asp Gly
Val Trp Thr Tyr Asp Asp Ala Thr Lys 145 150 155 160 Thr Phe Thr Val
Thr Glu 165
22 510 DNA Artificial Sequence Synthetic Sequence
22
ccatgggtca tcaccatcat catcacgggt cggactcaga agtcaatcaa gaagctaagc
60 cagaggtcaa gccagaagtc aagcctgaga ctcacatcaa tttaaaggtg
tccgatggat 120 cttcagagat cttcttcaag atcaaaaaga ccactccttt
aagaaggctg atggaagcgt 180 tcgctaaaag acagggtaag gaaatggact
ccttaagatt cttgtacgac ggtattagaa 240 ttcaagctga tcagacccct
gaagatttgg acatggagga taacgatatt attgaggctc 300 accgcgaaca
gattggaggt acgccggcgg tgaccaccta taaactggtg attaacggca 360
aaaccctgaa aggcgaaacc accaccaaag cggtggatgc ggaaaccgcg gaaaaagcgt
420 ttaaacagta tgcgaacgat aacggcgtgg atggcgtgtg gacctatgat
gatgcgacca 480 aaacctttac cgtgaccgaa taataagctt 510
23 711 PRT Artificial Sequence Synthetic Sequence
23 Met Gly His His His His
His His Gly Ser Asp Ser Glu Val Asn Gln 1 5 10 15 Glu Ala Lys Pro
Glu Val Lys Pro Glu Val Lys Pro Glu Thr His Ile 20 25 30 Asn Leu
Lys Val Ser Asp Gly Ser Ser Glu Ile Phe Phe Lys Ile Lys 35 40 45
Lys Thr Thr Pro Leu Arg Arg Leu Met Glu Ala Phe Ala Lys Arg Gln 50
55 60 Gly Lys Glu Met Asp Ser Leu Arg Phe Leu Tyr Asp Gly Ile Arg
Ile 65 70 75 80 Gln Ala Asp Gln Thr Pro Glu Asp Leu Asp Met Glu Asp
Asn Asp Ile 85 90 95 Ile Glu Ala His Arg Glu Gln Ile Gly Gly Met
Glu Phe Met Leu Arg 100 105 110 Pro Val Glu Thr Pro Thr Arg Glu Ile
Lys Lys Leu Asp Gly Leu Trp 115 120 125 Ala Phe Ser Leu Asp Arg Glu
Asn Cys Gly Ile Asp Gln Arg Trp Trp 130 135 140 Glu Ser Ala Leu Gln
Glu Ser Arg Ala Ile Ala Val Pro Gly Ser Phe 145 150 155 160 Asn Asp
Gln Phe Ala Asp Ala Asp Ile Arg Asn Tyr Ala Gly Asn Val 165 170 175
Trp Tyr Gln Arg Glu Val Phe Ile Pro Lys Gly Trp Ala Gly Gln Arg 180
185 190 Ile Val Leu Arg Phe Asp Ala Val Thr His Tyr Gly Lys Val Trp
Val 195 200 205 Asn Asn Gln Glu Val Met Glu His Gln Gly Gly Tyr Thr
Pro Phe Glu 210 215 220 Ala Asp Val Thr Pro Tyr Val Ile Ala Gly Lys
Ser Val Arg Ile Thr 225 230 235 240 Val Cys Val Asn Asn Glu Leu Asn
Trp Gln Thr Ile Pro Pro Gly Met 245 250 255 Val Ile Thr Asp Glu Asn
Gly Lys Lys Lys Gln Ser Tyr Phe His Asp 260 265 270 Phe Phe Asn Tyr
Ala Gly Ile His Arg Ser Val Met Leu Tyr Thr Thr 275 280 285 Pro Asn
Thr Trp Val Asp Asp Ile Thr Val Val Thr His Val Ala Gln 290 295 300
Asp Cys Asn His Ala Ser Val Asp Trp Gln Val Val Ala Asn Gly Asp 305
310 315 320 Val Ser Val Glu Leu Arg Asp Ala Asp Gln Gln Val Val Ala
Thr Gly 325 330 335 Gln Gly Thr Ser Gly Thr Leu Gln Val Val Asn Pro
His Leu Trp Gln 340 345 350 Pro Gly Glu Gly Tyr Leu Tyr Glu Leu Cys
Val Thr Ala Lys Ser Gln 355 360 365 Thr Glu Cys Asp Ile Tyr Pro Leu
Arg Val Gly Ile Arg Ser Val Ala 370 375 380 Val Lys Gly Gln Gln Phe
Leu Ile Asn His Lys Pro Phe Tyr Phe Thr 385 390 395 400 Gly Phe Gly
Arg His Glu Asp Ala Asp Leu Arg Gly Lys Gly Phe Asp 405 410 415 Asn
Val Leu Met Val His Asp His Ala Leu Met Asp Trp Ile Gly Ala 420 425
430 Asn Ser Tyr Arg Thr Ser His Tyr Pro Tyr Ala Glu Glu Met Leu Asp
435 440 445 Trp Ala Asp Glu His Gly Ile Val Val Ile Asp Glu Thr Ala
Ala Val 450 455 460 Gly Phe Asn Leu Ser Leu Gly Ile Gly Phe Glu Ala
Gly Asn Lys Pro 465 470 475 480 Lys Glu Leu Tyr Ser Glu Glu Ala Val
Asn Gly Glu Thr Gln Gln Ala 485 490 495 His Leu Gln Ala Ile Lys Glu
Leu Ile Ala Arg Asp Lys Asn His Pro 500 505 510 Ser Val Val Met Trp
Ser Ile Ala Asn Glu Pro Asp Thr Arg Pro Gln 515 520 525 Val His Gly
Asn Ile Ser Pro Leu Ala Glu Ala Thr Arg Lys Leu Asp 530 535 540 Pro
Thr Arg Pro Ile Thr Cys Val Asn Val Met Phe Cys Asp Ala His 545 550
555 560 Thr Asp Thr Ile Ser Asp Leu Phe Asp Val Leu Cys Leu Asn Arg
Tyr 565 570 575 Tyr Gly Trp Tyr Val Gln Ser Gly Asp Leu Glu Thr Ala
Glu Lys Val 580 585 590 Leu Glu Lys Glu Leu Leu Ala Trp Gln Glu Lys
Leu His Gln Pro Ile 595 600 605 Ile Ile Thr Glu Tyr Gly Val Asp Thr
Leu Ala Gly Leu His Ser Met 610 615 620 Tyr Thr Asp Met Trp Ser Glu
Glu Tyr Gln Cys Ala Trp Leu Asp Met 625 630 635 640 Tyr His Arg Val
Phe Asp Arg Val Ser Ala Val Val Gly Glu Gln Val 645 650 655 Trp Asn
Phe Ala Asp Phe Ala Thr Ser Gln Gly Ile Leu Arg Val Gly 660 665 670
Gly Asn Lys Lys Gly Ile Phe Thr Arg Asp Arg Lys Pro Lys Ser Ala 675
680 685 Ala Phe Leu Leu Gln Lys Arg Trp Thr Gly Met Asn Phe Gly Glu
Lys 690 695 700 Pro Gln Gln Gly Gly Lys Gln 705 710
24 2133 DNA Artificial Sequence Synthetic Sequence
24 atgggtcatc accatcatca
tcacgggtcg gactcagaag tcaatcaaga agctaagcca 60 gaggtcaagc
cagaagtcaa gcctgagact cacatcaatt taaaggtgtc cgatggatct 120
tcagagatct tcttcaagat caaaaagacc actcctttaa gaaggctgat ggaagcgttc
180 gctaaaagac agggtaagga aatggactcc ttaagattct tgtacgacgg
tattagaatt 240 caagctgatc agacccctga agatttggac atggaggata
acgatattat tgaggctcac 300 cgcgaacaga ttggaggtat ggaattcatg
ttacgtcctg tagaaacccc aacccgtgaa 360 atcaaaaaac tcgacggcct
gtgggcattc agtctggatc gcgaaaactg tggaattgat 420 cagcgttggt
gggaaagcgc gttacaagaa agccgggcaa ttgctgtgcc aggcagtttt 480
aacgatcagt tcgccgatgc agatattcgt aattatgcgg gcaacgtctg gtatcagcgc
540 gaagtcttta taccgaaagg ttgggcaggc cagcgtatcg tgctgcgttt
cgatgcggtc 600 actcattacg gcaaagtgtg ggtcaataat caggaagtga
tggagcatca gggcggctat 660 acgccatttg aagccgatgt cacgccgtat
gttattgccg ggaaaagtgt acgtatcacc 720 gtttgtgtga acaacgaact
gaactggcag actatcccgc cgggaatggt gattaccgac 780 gaaaacggca
agaaaaagca gtcttacttc catgatttct ttaactatgc cggaatccat 840
cgcagcgtaa tgctctacac cacgccgaac acctgggtgg acgatatcac cgtggtgacg
900 catgtcgcgc aagactgtaa ccacgcgtct gttgactggc aggtggtggc
caatggtgat 960 gtcagcgttg aactgcgtga tgcggatcaa caggtggttg
caactggaca aggcactagc 1020 gggactttgc aagtggtgaa tccgcacctc
tggcaaccgg gtgaaggtta tctctatgaa 1080 ctgtgcgtca cagccaaaag
ccagacagag tgtgatatct acccgcttcg cgtcggcatc 1140 cggtcagtgg
cagtgaaggg ccaacagttc ctgattaacc acaaaccgtt ctactttact 1200
ggctttggtc gtcatgaaga tgcggactta cgtggcaaag gattcgataa cgtgctgatg
1260 gtgcacgacc acgcattaat ggactggatt ggggccaact cctaccgtac
ctcgcattac 1320 ccttacgctg aagagatgct cgactgggca gatgaacatg
gcatcgtggt gattgatgaa 1380 actgctgctg tcggctttaa cctctcttta
ggcattggtt tcgaagcggg caacaagccg 1440 aaagaactgt acagcgaaga
ggcagtcaac ggggaaactc agcaagcgca cttacaggcg 1500 attaaagagc
tgatagcgcg tgacaaaaac cacccaagcg tggtgatgtg gagtattgcc 1560
aacgaaccgg atacccgtcc gcaagtgcac gggaatattt cgccactggc ggaagcaacg
1620 cgtaaactcg acccgacgcg tccgatcacc tgcgtcaatg taatgttctg
cgacgctcac 1680 accgatacca tcagcgatct ctttgatgtg ctgtgcctga
accgttatta cggatggtat 1740 gtccaaagcg gcgatttgga aacggcagag
aaggtactgg aaaaagaact tctggcctgg 1800 caggagaaac tgcatcagcc
gattatcatc accgaatacg gcgtggatac gttagccggg 1860 ctgcactcaa
tgtacaccga catgtggagt gaagagtatc agtgtgcatg gctggatatg 1920
tatcaccgcg tctttgatcg cgtcagcgcc gtcgtcggtg aacaggtatg gaatttcgcc
1980 gattttgcga cctcgcaagg catattgcgc gttggcggta acaagaaagg
gatcttcact 2040 cgcgaccgca aaccgaagtc ggcggctttt ctgctgcaaa
aacgctggac tggcatgaac 2100 ttcggtgaaa aaccgcagca gggaggcaaa caa
2133
25 553 PRT Artificial Sequence Synthetic Sequence
25 Met Gly
His His His His His His Gly Ser Asp Ser Glu Val Asn Gln 1 5 10 15
Glu Ala Lys Pro Glu Val Lys Pro Glu Val Lys Pro Glu Thr His Ile 20
25 30 Asn Leu Lys Val Ser Asp Gly Ser Ser Glu Ile Phe Phe Lys Ile
Lys 35 40 45 Lys Thr Thr Pro Leu Arg Arg Leu Met Glu Ala Phe Ala
Lys Arg Gln 50 55 60 Gly Lys Glu Met Asp Ser Leu Arg Phe Leu Tyr
Asp Gly Ile Arg Ile 65 70 75 80 Gln Ala Asp Gln Thr Pro Glu Asp Leu
Asp Met Glu Asp Asn Asp Ile 85 90 95 Ile Glu Ala His Arg Glu Gln
Ile Gly Gly Met Ser Leu Trp Leu Gly 100 105 110 Ala Pro Val Pro Asp
Ile Pro Pro Asp Ser Ala Val Glu Leu Trp Lys 115 120 125 Pro Gly Ala
Gln Asp Ala Ser Ser Gln Ala Gln Gly Gly Ser Ser Cys 130 135 140 Ile
Leu Arg Glu Glu Ala Arg Met Pro His Ser Ala Gly Gly Thr Ala 145 150
155 160 Gly Val Gly Leu Glu Ala Ala Glu Pro Thr Ala Leu Leu Thr Arg
Ala 165 170 175 Glu Pro Pro Ser Glu Pro Thr Glu Ile Arg Pro Gln Lys
Arg Lys Lys 180 185 190 Gly Pro Ala Pro Lys Met Leu Gly Asn Glu Leu
Cys Ser Val Cys Gly 195 200 205 Asp Lys Ala Ser Gly Phe His Tyr Asn
Val Leu Ser Cys Glu Gly Cys 210 215 220 Lys Gly Phe Phe Arg Arg Ser
Val Ile Lys Gly Ala His Tyr Ile Cys 225 230 235 240 His Ser Gly Gly
His Cys Pro Met Asp Thr Tyr Met Arg Arg Lys Cys 245 250 255 Gln Glu
Cys Arg Leu Arg Lys Cys Arg Gln Ala Gly Met Arg Glu Glu 260 265 270
Cys Val Leu Ser Glu Glu Gln Ile Arg Leu Lys Lys Leu Lys Arg Gln 275
280 285 Glu Glu Glu Gln Ala His Ala Thr Ser Leu Pro Pro Arg Arg Ser
Ser 290 295 300 Pro Pro Gln Ile Leu Pro Gln Leu Ser Pro Glu Gln Leu
Gly Met Ile 305 310 315 320 Glu Lys Leu Val Ala Ala Gln Gln Gln Cys
Asn Arg Arg Ser Phe Ser 325 330 335 Asp Arg Leu Arg Val Thr Pro Trp
Pro Met Ala Pro Asp Pro His Ser 340 345 350 Arg Glu Ala Arg Gln Gln
Arg Phe Ala His Phe Thr Glu Leu Ala Ile 355 360 365 Val Ser Val Gln
Glu Ile Val Asp Phe Ala Lys Gln Leu Pro Gly Phe 370 375 380 Leu Gln
Leu Ser Arg Glu Asp Gln Ile Ala Leu Leu Lys Thr Ser Ala 385 390 395
400 Ile Glu Val Met Leu Leu Glu Thr Ser Arg Arg Tyr Asn Pro Gly Ser
405 410 415 Glu Ser Ile Thr Phe Leu Lys Asp Phe Ser Tyr Asn Arg Glu
Asp Phe 420 425 430 Ala Lys Ala Gly Leu Gln Val Glu Phe Ile Asn Pro
Ile Phe Glu Phe 435 440 445 Ser Arg Ala Met Asn Glu Leu Gln Leu Asn
Asp Ala Glu Phe Ala Leu 450 455 460 Leu Ile Ala Ile Ser Ile Phe Ser
Ala Asp Arg Pro Asn Val Gln Asp 465 470 475 480 Gln Leu Gln Val Glu
Arg Leu Gln His Thr Tyr Val Glu Ala Leu His 485 490 495 Ala Tyr Val
Ser Ile His His Pro His Asp Arg Leu Met Phe Pro Arg 500 505 510 Met
Leu Met Lys Leu Val Ser Leu Arg Thr Leu Ser Ser Val His Ser 515 520
525 Glu Gln Val Phe Ala Leu Arg Leu Gln Asp Lys Lys Leu Pro Pro Leu
Leu Ser Glu Ile Trp Asp Val His Glu 545 550
26 1662 DNA Artificial Sequence Synthetic Sequence
26 atgggtcatc accatcatca
tcacgggtcg gactcagaag tcaatcaaga agctaagcca 60 gaggtcaagc
cagaagtcaa gcctgagact cacatcaatt taaaggtgtc cgatggatct 120
tcagagatct tcttcaagat caaaaagacc actcctttaa gaaggctgat ggaagcgttc
180 gctaaaagac agggtaagga aatggactcc ttaagattct tgtacgacgg
tattagaatt 240 caagctgatc agacccctga agatttggac atggaggata
acgatattat tgaggctcac 300 cgcgaacaga ttggaggtat gtccttgtgg
ctgggggccc ctgtgcctga cattcctcct 360 gactctgcgg tggagctgtg
gaagccaggc gcacaggatg caagcagcca ggcccaggga 420 ggcagcagct
gcatcctcag agaggaagcc aggatgcccc actctgctgg gggtactgca 480
ggggtggggc tggaggctgc agagcccaca gccctgctca ccagggcaga gcccccttca
540 gaacccacag agatccgtcc acaaaagcgg aaaaaggggc cagcccccaa
aatgctgggg 600 aacgagctat gcagcgtgtg tggggacaag gcctcgggct
tccactacaa tgttctgagc 660 tgcgagggct gcaagggatt cttccgccgc
agcgtcatca agggagcgca ctacatctgc 720 cacagtggcg gccactgccc
catggacacc tacatgcgtc gcaagtgcca ggagtgtcgg 780 cttcgcaaat
gccgtcaggc tggcatgcgg gaggagtgtg tcctgtcaga agaacagatc 840
cgcctgaaga aactgaagcg gcaagaggag gaacaggctc atgccacatc cttgcccccc
900 aggcgttcct caccccccca aatcctgccc cagctcagcc cggaacaact
gggcatgatc 960 gagaagctcg tcgctgccca gcaacagtgt aaccggcgct
ccttttctga ccggcttcga 1020 gtcacgcctt ggcccatggc accagatccc
catagccggg aggcccgtca gcagcgcttt 1080 gcccacttca ctgagctggc
catcgtctct gtgcaggaga tagttgactt tgctaaacag 1140 ctacccggct
tcctgcagct cagccgggag gaccagattg ccctgctgaa gacctctgcg 1200
atcgaggtga tgcttctgga gacatctcgg aggtacaacc ctgggagtga gagtatcacc
1260 ttcctcaagg atttcagtta taaccgggaa gactttgcca aagcagggct
gcaagtggaa 1320 ttcatcaacc ccatcttcga gttctccagg gccatgaatg
agctgcaact caatgatgcc 1380 gagtttgcct tgctcattgc tatcagcatc
ttctctgcag accggcccaa cgtgcaggac 1440 cagctccagg tggagaggct
gcagcacaca tatgtggaag ccctgcatgc ctacgtctcc 1500 atccaccatc
cccatgaccg actgatgttc ccacggatgc taatgaaact ggtgagcctc 1560
cggaccctga gcagcgtcca ctcagagcaa gtgtttgcac tgcgtctgca ggacaaaaag
1620 ctcccaccgc tgctctctga gatctgggat gtgcacgaat ga 1662
27 473 PRT Artificial Sequence Synthetic Sequence
27 Met Gly His His His His
His His Gly Ser Asp Ser Glu Val Asn Gln 1 5 10 15 Glu Ala Lys Pro
Glu Val Lys Pro Glu Val Lys Pro Glu Thr His Ile 20 25 30 Asn Leu
Lys Val Ser Asp Gly Ser Ser Glu Ile Phe Phe Lys Ile Lys 35 40 45
Lys Thr Thr Pro Leu Arg Arg Leu Met Glu Ala Phe Ala Lys Arg Gln 50
55 60 Gly Lys Glu Met Asp Ser Leu Arg Phe Leu Tyr Asp Gly Ile Arg
Ile 65 70 75 80 Gln Ala Asp Gln Thr Pro Glu Asp Leu Asp Met Glu Asp
Asn Asp Ile 85 90 95 Ile Glu Ala His Arg Glu Gln Ile Gly Gly Met
Cys Pro Asn Ser Ser 100 105 110 Ala Ser Asn Ala Ser Gly Ala Ala Ala
Pro Thr Leu Pro Ala His Pro 115 120 125 Ser Thr Leu Thr His Pro Gln
Arg Arg Ile Asp Thr Leu Asn Ser Asp 130
135 140 Gly Tyr Thr Pro Glu Pro Ala Arg Ile Thr Ser Pro Asp Lys Pro
Arg 145 150 155 160 Pro Met Pro Met Asp Thr Ser Val Tyr Glu Ser Pro
Tyr Ser Asp Pro 165 170 175 Glu Glu Leu Lys Asp Lys Lys Leu Phe Leu
Lys Arg Asp Asn Leu Leu 180 185 190 Ile Ala Asp Ile Glu Leu Gly Cys
Gly Asn Phe Gly Ser Val Arg Gln 195 200 205 Gly Val Tyr Arg Met Arg
Lys Lys Gln Ile Asp Val Ala Ile Lys Val 210 215 220 Leu Lys Gln Gly
Thr Glu Lys Ala Asp Thr Glu Glu Met Met Arg Glu 225 230 235 240 Ala
Gln Ile Met His Gln Leu Asp Asn Pro Tyr Ile Val Arg Leu Ile 245 250
255 Gly Val Cys Gln Ala Glu Ala Leu Met Leu Val Met Glu Met Ala Gly
260 265 270 Gly Gly Pro Leu His Lys Phe Leu Val Gly Lys Arg Glu Glu
Ile Pro 275 280 285 Val Ser Asn Val Ala Glu Leu Leu His Gln Val Ser
Met Gly Met Lys 290 295 300 Tyr Leu Glu Glu Lys Asn Phe Val His Arg
Asp Leu Ala Ala Arg Asn 305 310 315 320 Val Leu Leu Val Asn Arg His
Tyr Ala Lys Ile Ser Asp Phe Gly Leu 325 330 335 Ser Lys Ala Leu Gly
Ala Asp Asp Ser Tyr Tyr Thr Ala Arg Ser Ala 340 345 350 Gly Lys Trp
Pro Leu Lys Trp Tyr Ala Pro Glu Cys Ile Asn Phe Arg 355 360 365 Lys
Phe Ser Ser Arg Ser Asp Val Trp Ser Tyr Gly Val Thr Met Trp 370 375
380 Glu Ala Leu Ser Tyr Gly Gln Lys Pro Tyr Lys Lys Met Lys Gly Pro
385 390 395 400 Glu Val Met Ala Phe Ile Glu Gln Gly Lys Arg Met Glu
Cys Pro Pro 405 410 415 Glu Cys Pro Pro Glu Leu Tyr Ala Leu Met Ser
Asp Cys Trp Ile Tyr 420 425 430 Lys Trp Glu Asp Arg Pro Asp Phe Leu
Thr Val Glu Gln Arg Met Arg 435 440 445 Ala Cys Tyr Tyr Ser Leu Ala
Ser Lys Val Glu Gly Pro Pro Gly Ser 450 455 460 Thr Gln Lys Ala Glu
Ala Ala Cys Ala 465 470
28 1422 DNA Artificial Sequence Synthetic Sequence
28 atgggtcatc accatcatca tcacgggtcg gactcagaag tcaatcaaga
agctaagcca 60 gaggtcaagc cagaagtcaa gcctgagact cacatcaatt
taaaggtgtc cgatggatct 120 tcagagatct tcttcaagat caaaaagacc
actcctttaa gaaggctgat ggaagcgttc 180 gctaaaagac agggtaagga
aatggactcc ttaagattct tgtacgacgg tattagaatt 240 caagctgatc
agacccctga agatttggac atggaggata acgatattat tgaggctcac 300
cgcgaacaga ttggaggtat gtgccccaac agcagtgcca gcaacgcctc aggggctgct
360 gctcccacac tcccagccca cccatccacg ttgactcatc ctcagagacg
aatcgacacc 420 ctcaactcag atggatacac ccctgagcca gcacgcataa
cgtccccaga caaaccgcgg 480 ccgatgccca tggacacgag cgtgtatgag
agcccctaca gcgacccaga ggagctcaag 540 gacaagaagc tcttcctgaa
gcgcgataac ctcctcatag ctgacattga acttggctgc 600 ggcaactttg
gctcagtgcg ccagggcgtg taccgcatgc gcaagaagca gatcgacgtg 660
gccatcaagg tgctgaagca gggcacggag aaggcagaca cggaagagat gatgcgcgag
720 gcgcagatca tgcaccagct ggacaacccc tacatcgtgc ggctcattgg
cgtctgccag 780 gccgaggccc tcatgctggt catggagatg gctgggggcg
ggccgctgca caagttcctg 840 gtcggcaaga gggaggagat ccctgtgagc
aatgtggccg agctgctgca ccaggtgtcc 900 atggggatga agtacctgga
ggagaagaac tttgtgcacc gtgacctggc ggcccgcaac 960 gtcctgctgg
ttaaccggca ctacgccaag atcagcgact ttggcctctc caaagcactg 1020
ggtgccgacg acagctacta cactgcccgc tcagcaggga agtggccgct caagtggtac
1080 gcacccgaat gcatcaactt ccgcaagttc tccagccgca gcgatgtctg
gagctatggg 1140 gtcaccatgt gggaggcctt gtcctacggc cagaagccct
acaagaagat gaaagggccg 1200 gaggtcatgg ccttcatcga gcagggcaag
cggatggagt gcccaccaga gtgtccaccc 1260 gaactgtacg cactcatgag
tgactgctgg atctacaagt gggaggatcg ccccgacttc 1320 ctgaccgtgg
agcagcgcat gcgagcctgt tactacagcc tggccagcaa ggtggaaggg 1380
cccccaggca gcacacagaa ggctgaggct gcctgtgcct ga 1422
29 434 PRT Artificial Sequence Synthetic Sequence
29 Met Gly His His His His
His His Gly Ser Asp Ser Glu Val Asn Gln 1 5 10 15 Glu Ala Lys Pro
Glu Val Lys Pro Glu Val Lys Pro Glu Thr His Ile 20 25 30 Asn Leu
Lys Val Ser Asp Gly Ser Ser Glu Ile Phe Phe Lys Ile Lys 35 40 45
Lys Thr Thr Pro Leu Arg Arg Leu Met Glu Ala Phe Ala Lys Arg Gln 50
55 60 Gly Lys Glu Met Asp Ser Leu Arg Phe Leu Tyr Asp Gly Ile Arg
Ile 65 70 75 80 Gln Ala Asp Gln Thr Pro Glu Asp Leu Asp Met Glu Asp
Asn Asp Ile 85 90 95 Ile Glu Ala His Arg Glu Gln Ile Gly Gly Met
Gln Phe His Val Lys 100 105 110 Ser Gly Leu Gln Ile Lys Lys Asn Ala
Ile Ile Asp Asp Tyr Lys Val 115 120 125 Thr Ser Gln Val Leu Gly Leu
Gly Ile Asn Gly Lys Val Leu Gln Ile 130 135 140 Phe Asn Lys Arg Thr
Gln Glu Lys Phe Ala Leu Lys Met Leu Gln Asp 145 150 155 160 Cys Pro
Lys Ala Arg Arg Glu Val Glu Leu His Trp Arg Ala Ser Gln 165 170 175
Cys Pro His Ile Val Arg Ile Val Asp Val Tyr Glu Asn Leu Tyr Ala 180
185 190 Gly Arg Lys Cys Leu Leu Ile Val Met Glu Cys Leu Asp Gly Gly
Glu 195 200 205 Leu Phe Ser Arg Ile Gln Asp Arg Gly Asp Gln Ala Phe
Thr Glu Arg 210 215 220 Glu Ala Ser Glu Ile Met Lys Ser Ile Gly Glu
Ala Ile Gln Tyr Leu 225 230 235 240 His Ser Ile Asn Ile Ala His Arg
Asp Val Lys Pro Glu Asn Leu Leu 245 250 255 Tyr Thr Ser Lys Arg Pro
Asn Ala Ile Leu Lys Leu Thr Asp Phe Gly 260 265 270 Phe Ala Lys Glu
Thr Thr Ser His Asn Ser Leu Thr Thr Pro Cys Tyr 275 280 285 Thr Pro
Tyr Tyr Val Ala Pro Glu Val Leu Gly Pro Glu Lys Tyr Asp 290 295 300
Lys Ser Cys Asp Met Trp Ser Leu Gly Val Ile Met Tyr Ile Leu Leu 305
310 315 320 Cys Gly Tyr Pro Pro Phe Tyr Ser Asn His Gly Leu Ala Ile
Ser Pro 325 330 335 Gly Met Lys Thr Arg Ile Arg Met Gly Gln Tyr Glu
Phe Pro Asn Pro 340 345 350 Glu Trp Ser Glu Val Ser Glu Glu Val Lys
Met Leu Ile Arg Asn Leu 355 360 365 Leu Lys Thr Glu Pro Thr Gln Arg
Met Thr Ile Thr Glu Phe Met Asn 370 375 380 His Pro Trp Ile Met Gln
Ser Thr Lys Val Pro Gln Thr Pro Leu His 385 390 395 400 Thr Ser Arg
Val Leu Lys Glu Asp Lys Glu Arg Trp Glu Asp Val Lys 405 410 415 Glu
Glu Met Thr Ser Ala Leu Ala Thr Met Arg Val Asp Tyr Glu Gln 420 425
430 Ile Lys
30 1305 DNA Artificial Sequence Synthetic Sequence
30
atgggtcatc accatcatca tcacgggtcg gactcagaag tcaatcaaga agctaagcca
60 gaggtcaagc cagaagtcaa gcctgagact cacatcaatt taaaggtgtc
cgatggatct 120 tcagagatct tcttcaagat caaaaagacc actcctttaa
gaaggctgat ggaagcgttc 180 gctaaaagac agggtaagga aatggactcc
ttaagattct tgtacgacgg tattagaatt 240 caagctgatc agacccctga
agatttggac atggaggata acgatattat tgaggctcac 300 cgcgaacaga
ttggaggtat gcagttccac gtcaagtccg gcctgcagat caagaagaac 360
gccatcatcg atgactacaa ggtcaccagc caggtcctgg ggctgggcat caacggcaaa
420 gttttgcaga tcttcaacaa gaggacccag gagaaattcg ccctcaaaat
gcttcaggac 480 tgccccaagg cccgcaggga ggtggagctg cactggcggg
cctcccagtg cccgcacatc 540 gtacggatcg tggatgtgta cgagaatctg
tacgcaggga ggaagtgcct gctgattgtc 600 atggaatgtt tggacggtgg
agaactcttt agccgaatcc aggatcgagg agaccaggca 660 ttcacagaaa
gagaagcatc cgaaatcatg aagagcatcg gtgaggccat ccagtatctg 720
cattcaatca acattgccca tcgggatgtc aagcctgaga atctcttata cacctccaaa
780 aggcccaacg ccatcctgaa actcactgac tttggctttg ccaaggaaac
caccagccac 840 aactctttga ccactccttg ttatacaccg tactatgtgg
ctccagaagt gctgggtcca 900 gagaagtatg acaagtcctg tgacatgtgg
tccctgggtg tcatcatgta catcctgctg 960 tgtgggtatc cccccttcta
ctccaaccac ggccttgcca tctctccggg catgaagact 1020 cgcatccgaa
tgggccagta tgaatttccc aacccagaat ggtcagaagt atcagaggaa 1080
gtgaagatgc tcattcggaa tctgctgaaa acagagccca cccagagaat gaccatcacc
1140 gagtttatga accacccttg gatcatgcaa tcaacaaagg tccctcaaac
cccactgcac 1200 accagccggg tcctgaagga ggacaaggag cggtgggagg
atgtcaagga ggagatgacc 1260 agtgccttgg ccacaatgcg cgttgactac
gagcagatca agtaa 1305 31 1130 PRT Artificial Sequence Synthetic
Sequence 31 Met Gly His His His His His His Gly Ser Asp Ser Glu Val
Asn Gln 1 5 10 15 Glu Ala Lys Pro Glu Val Lys Pro Glu Val Lys Pro
Glu Thr His Ile 20 25 30 Asn Leu Lys Val Ser Asp Gly Ser Ser Glu
Ile Phe Phe Lys Ile Lys 35 40 45 Lys Thr Thr Pro Leu Arg Arg Leu
Met Glu Ala Phe Ala Lys Arg Gln 50 55 60 Gly Lys Glu Met Asp Ser
Leu Arg Phe Leu Tyr Asp Gly Ile Arg Ile 65 70 75 80 Gln Ala Asp Gln
Thr Pro Glu Asp Leu Asp Met Glu Asp Asn Asp Ile 85 90 95 Ile Glu
Ala His Arg Glu Gln Ile Gly Gly Met Thr Met Ile Thr Asp 100 105 110
Ser Leu Ala Val Val Leu Gln Arg Arg Asp Trp Glu Asn Pro Gly Val 115
120 125 Thr Gln Leu Asn Arg Leu Ala Ala His Pro Pro Phe Ala Ser Trp
Arg 130 135 140 Asn Ser Glu Glu Ala Arg Thr Asp Arg Pro Ser Gln Gln
Leu Arg Ser 145 150 155 160 Leu Asn Gly Glu Trp Arg Phe Ala Trp Phe
Pro Ala Pro Glu Ala Val 165 170 175 Pro Glu Ser Trp Leu Glu Cys Asp
Leu Pro Glu Ala Asp Thr Val Val 180 185 190 Val Pro Ser Asn Trp Gln
Met His Gly Tyr Asp Ala Pro Ile Tyr Thr 195 200 205 Asn Val Thr Tyr
Pro Ile Thr Val Asn Pro Pro Phe Val Pro Thr Glu 210 215 220 Asn Pro
Thr Gly Cys Tyr Ser Leu Thr Phe Asn Val Asp Glu Ser Trp 225 230 235
240 Leu Gln Glu Gly Gln Thr Arg Ile Ile Phe Asp Gly Val Asn Ser Ala
245 250 255 Phe His Leu Trp Cys Asn Gly Arg Trp Val Gly Tyr Gly Gln
Asp Ser 260 265 270 Arg Leu Pro Ser Glu Phe Asp Leu Ser Ala Phe Leu
Arg Ala Gly Glu 275 280 285 Asn Arg Leu Ala Val Met Val Leu Arg Trp
Ser Asp Gly Ser Tyr Leu 290 295 300 Glu Asp Gln Asp Met Trp Arg Met
Ser Gly Ile Phe Arg Asp Val Ser 305 310 315 320 Leu Leu His Lys Pro
Thr Thr Gln Ile Ser Asp Phe His Val Ala Thr 325 330 335 Arg Phe Asn
Asp Asp Phe Ser Arg Ala Val Leu Glu Ala Glu Val Gln 340 345 350 Met
Cys Gly Glu Leu Arg Asp Tyr Leu Arg Val Thr Val Ser Leu Trp 355 360
365 Gln Gly Glu Thr Gln Val Ala Ser Gly Thr Ala Pro Phe Gly Gly Glu
370 375 380 Ile Ile Asp Glu Arg Gly Gly Tyr Ala Asp Arg Val Thr Leu
Arg Leu 385 390 395 400 Asn Val Glu Asn Pro Lys Leu Trp Ser Ala Glu
Ile Pro Asn Leu Tyr 405 410 415 Arg Ala Val Val Glu Leu His Thr Ala
Asp Gly Thr Leu Ile Glu Ala 420 425 430 Glu Ala Cys Asp Val Gly Phe
Arg Glu Val Arg Ile Glu Asn Gly Leu 435 440 445 Leu Leu Leu Asn Gly
Lys Pro Leu Leu Ile Arg Gly Val Asn Arg His 450 455 460 Glu His His
Pro Leu His Gly Gln Val Met Asp Glu Gln Thr Met Val 465 470 475 480
Gln Asp Ile Leu Leu Met Lys Gln Asn Asn Phe Asn Ala Val Arg Cys 485
490 495 Ser His Tyr Pro Asn His Pro Leu Trp Tyr Thr Leu Cys Asp Arg
Tyr 500 505 510 Gly Leu Tyr Val Val Asp Glu Ala Asn Ile Glu Thr His
Gly Met Val 515 520 525 Pro Met Asn Arg Leu Thr Asp Asp Pro Arg Trp
Leu Pro Ala Met Ser 530 535 540 Glu Arg Val Thr Arg Met Val Gln Arg
Asp Arg Asn His Pro Ser Val 545 550 555 560 Ile Ile Trp Ser Leu Gly
Asn Glu Ser Gly His Gly Ala Asn His Asp 565 570 575 Ala Leu Tyr Arg
Trp Ile Lys Ser Val Asp Pro Ser Arg Pro Val Gln 580 585 590 Tyr Glu
Gly Gly Gly Ala Asp Thr Thr Ala Thr Asp Ile Ile Cys Pro 595 600 605
Met Tyr Ala Arg Val Asp Glu Asp Gln Pro Phe Pro Ala Val Pro Lys 610
615 620 Trp Ser Ile Lys Lys Trp Leu Ser Leu Pro Gly Glu Thr Arg Pro
Leu 625 630 635 640 Ile Leu Cys Glu Tyr Ala His Ala Met Gly Asn Ser
Leu Gly Gly Phe 645 650 655 Ala Lys Tyr Trp Gln Ala Phe Arg Gln Tyr
Pro Arg Leu Gln Gly Gly 660 665 670 Phe Val Trp Asp Trp Val Asp Gln
Ser Leu Ile Lys Tyr Asp Glu Asn 675 680 685 Gly Asn Pro Trp Ser Ala
Tyr Gly Gly Asp Phe Gly Asp Thr Pro Asn 690 695 700 Asp Arg Gln Phe
Cys Met Asn Gly Leu Val Phe Ala Asp Arg Thr Pro 705 710 715 720 His
Pro Ala Leu Thr Glu Ala Lys His Gln Gln Gln Phe Phe Gln Phe 725 730
735 Arg Leu Ser Gly Gln Thr Ile Glu Val Thr Ser Glu Tyr Leu Phe Arg
740 745 750 His Ser Asp Asn Glu Leu Leu His Trp Met Val Ala Leu Asp
Gly Lys 755 760 765 Pro Leu Ala Ser Gly Glu Val Pro Leu Asp Val Ala
Pro Gln Gly Lys 770 775 780 Gln Leu Ile Glu Leu Pro Glu Leu Pro Gln
Pro Glu Ser Ala Gly Gln 785 790 795 800 Leu Trp Leu Thr Val Arg Val
Val Gln Pro Asn Ala Thr Ala Trp Ser 805 810 815 Glu Ala Gly His Ile
Ser Ala Trp Gln Gln Trp Arg Leu Ala Glu Asn 820 825 830 Leu Ser Val
Thr Leu Pro Ala Ala Ser His Ala Ile Pro His Leu Thr 835 840 845 Thr
Ser Glu Met Asp Phe Cys Ile Glu Leu Gly Asn Lys Arg Trp Gln 850 855
860 Phe Asn Arg Gln Ser Gly Phe Leu Ser Gln Met Trp Ile Gly Asp Lys
865 870 875 880 Lys Gln Leu Leu Thr Pro Leu Arg Asp Gln Phe Thr Arg
Ala Pro Leu 885 890 895 Asp Asn Asp Ile Gly Val Ser Glu Ala Thr Arg
Ile Asp Pro Asn Ala 900 905 910 Trp Val Glu Arg Trp Lys Ala Ala Gly
His Tyr Gln Ala Glu Ala Ala 915 920 925 Leu Leu Gln Cys Thr Ala Asp
Thr Leu Ala Asp Ala Val Leu Ile Thr 930 935 940 Thr Ala His Ala Trp
Gln His Gln Gly Lys Thr Leu Phe Ile Ser Arg 945 950 955 960 Lys Thr
Tyr Arg Ile Asp Gly Ser Gly Gln Met Ala Ile Thr Val Asp 965 970 975
Val Glu Val Ala Ser Asp Thr Pro His Pro Ala Arg Ile Gly Leu Asn 980
985 990 Cys Gln Leu Ala Gln Val Ala Glu Arg Val Asn Trp Leu Gly Leu
Gly 995 1000 1005 Pro Gln Glu Asn Tyr Pro Asp Arg Leu Thr Ala Ala
Cys Phe Asp Arg 1010 1015 1020 Trp Asp Leu Pro Leu Ser Asp Met Tyr
Thr Pro Tyr Val Phe Pro Ser 1025 1030 1035 1040 Glu Asn Gly Leu Arg
Cys Gly Thr Arg Glu Leu Asn Tyr Gly Pro His 1045 1050 1055 Gln Trp
Arg Gly Asp Phe Gln Phe Asn Ile Ser Arg Tyr Ser Gln Gln 1060 1065
1070 Gln Leu Met Glu Thr Ser His Arg His Leu Leu His Ala Glu Glu
Gly 1075 1080 1085 Thr Trp Leu Asn Ile Asp Gly Phe His Met Gly Ile
Gly Gly Asp Asp 1090 1095 1100 Ser Trp Ser Pro Ser Val Ser Ala Glu
Phe Gln Leu Ser Ala Gly Arg 1105 1110 1115 1120 Tyr His Tyr Gln Leu
Val Trp Cys Gln Lys 1125 1130 32 3396 DNA Artificial Sequence
Synthetic Sequence 32 atgggtcatc accatcatca tcacgggtcg gactcagaag
tcaatcaaga agctaagcca 60 gaggtcaagc cagaagtcaa gcctgagact
cacatcaatt taaaggtgtc cgatggatct 120 tcagagatct tcttcaagat
caaaaagacc actcctttaa gaaggctgat ggaagcgttc 180 gctaaaagac
agggtaagga aatggactcc ttaagattct tgtacgacgg tattagaatt 240
caagctgatc agacccctga agatttggac atggaggata acgatattat tgaggctcac
300 cgcgaacaga ttggaggtat gaccatgatt acggattcac tggccgtcgt
tttacaacgt 360 cgtgactggg aaaaccctgg cgttacccaa
cttaatcgcc ttgcagcaca tccccctttc 420 gccagctggc gtaatagcga
agaggcccgc accgatcgcc cttcccaaca gttgcgcagc 480 ctgaatggcg
aatggcgctt tgcctggttt ccggcaccag aagcggtgcc ggaaagctgg 540
ctggagtgcg atcttcctga ggccgatact gtcgtcgtcc cctcaaactg gcagatgcac
600 ggttacgatg cgcccatcta caccaacgta acctatccca ttacggtcaa
tccgccgttt 660 gttcccacgg agaatccgac gggttgttac tcgctcacat
ttaatgttga tgaaagctgg 720 ctacaggaag gccagacgcg aattattttt
gatggcgtta actcggcgtt tcatctgtgg 780 tgcaacgggc gctgggtcgg
ttacggccag gacagtcgtt tgccgtctga atttgacctg 840 agcgcatttt
tacgcgccgg agaaaaccgc ctcgcggtga tggtgctgcg ttggagtgac 900
ggcagttatc tggaagatca ggatatgtgg cggatgagcg gcattttccg tgacgtctcg
960 ttgctgcata aaccgactac acaaatcagc gatttccatg ttgccactcg
ctttaatgat 1020 gatttcagcc gcgctgtact ggaggctgaa gttcagatgt
gcggcgagtt gcgtgactac 1080 ctacgggtaa cagtttcttt atggcagggt
gaaacgcagg tcgccagcgg caccgcgcct 1140 ttcggcggtg aaattatcga
tgagcgtggt ggttatgccg atcgcgtcac actacgtctg 1200 aacgtcgaaa
acccgaaact gtggagcgcc gaaatcccga atctctatcg tgcggtggtt 1260
gaactgcaca ccgccgacgg cacgctgatt gaagcagaag cctgcgatgt cggtttccgc
1320 gaggtgcgga ttgaaaatgg tctgctgctg ctgaacggca agccgttgct
gattcgaggc 1380 gttaaccgtc acgagcatca tcctctgcat ggtcaggtca
tggatgagca gacgatggtg 1440 caggatatcc tgctgatgaa gcagaacaac
tttaacgccg tgcgctgttc gcattatccg 1500 aaccatccgc tgtggtacac
gctgtgcgac cgctacggcc tgtatgtggt ggatgaagcc 1560 aatattgaaa
cccacggcat ggtgccaatg aatcgtctga ccgatgatcc gcgctggcta 1620
ccggcgatga gcgaacgcgt aacgcgaatg gtgcagcgcg atcgtaatca cccgagtgtg
1680 atcatctggt cgctggggaa tgaatcaggc cacggcgcta atcacgacgc
gctgtatcgc 1740 tggatcaaat ctgtcgatcc ttcccgcccg gtgcagtatg
aaggcggcgg agccgacacc 1800 acggccaccg atattatttg cccgatgtac
gcgcgcgtgg atgaagacca gcccttcccg 1860 gctgtgccga aatggtccat
caaaaaatgg ctttcgctac ctggagagac gcgcccgctg 1920 atcctttgcg
aatacgccca cgcgatgggt aacagtcttg gcggtttcgc taaatactgg 1980
caggcgtttc gtcagtatcc ccgtttacag ggcggcttcg tctgggactg ggtggatcag
2040 tcgctgatta aatatgatga aaacggcaac ccgtggtcgg cttacggcgg
tgattttggc 2100 gatacgccga acgatcgcca gttctgtatg aacggtctgg
tctttgccga ccgcacgccg 2160 catccagcgc tgacggaagc aaaacaccag
cagcagtttt tccagttccg tttatccggg 2220 caaaccatcg aagtgaccag
cgaatacctg ttccgtcata gcgataacga gctcctgcac 2280 tggatggtgg
cgctggatgg taagccgctg gcaagcggtg aagtgcctct ggatgtcgct 2340
ccacaaggta aacagttgat tgaactgcct gaactaccgc agccggagag cgccgggcaa
2400 ctctggctca cagtacgcgt agtgcaaccg aacgcgaccg catggtcaga
agccgggcac 2460 atcagcgcct ggcagcagtg gcgtctggcg gaaaacctca
gtgtgacgct ccccgccgcg 2520 tcccacgcca tcccgcatct gaccaccagc
gaaatggatt tttgcatcga gctgggtaat 2580 aagcgttggc aatttaaccg
ccagtcaggc tttctttcac agatgtggat tggcgataaa 2640 aaacaactgc
tgacgccgct gcgcgatcag ttcacccgtg caccgctgga taacgacatt 2700
ggcgtaagtg aagcgacccg cattgaccct aacgcctggg tcgaacgctg gaaggcggcg
2760 ggccattacc aggccgaagc agcgttgttg cagtgcacgg cagatacact
tgctgatgcg 2820 gtgctgatta cgaccgctca cgcgtggcag catcagggga
aaaccttatt tatcagccgg 2880 aaaacctacc ggattgatgg tagtggtcaa
atggcgatta ccgttgatgt tgaagtggcg 2940 agcgatacac cgcatccggc
gcggattggc ctgaactgcc agctggcgca ggtagcagag 3000 cgggtaaact
ggctcggatt agggccgcaa gaaaactatc ccgaccgcct tactgccgcc 3060
tgttttgacc gctgggatct gccattgtca gacatgtata ccccgtacgt cttcccgagc
3120 gaaaacggtc tgcgctgcgg gacgcgcgaa ttgaattatg gcccacacca
gtggcgcggc 3180 gacttccagt tcaacatcag ccgctacagt caacagcaac
tgatggaaac cagccatcgc 3240 catctgctgc acgcggaaga aggcacatgg
ctgaatatcg acggtttcca tatggggatt 3300 ggtggcgacg actcctggag
cccgtcagta tcggcggaat tccagctgag cgccggtcgc 3360 taccattacc
agttggtctg gtgtcaaaaa taataa 3396 33 6865 DNA Artificial Sequence
Synthetic Sequence 33 cgccttgtta ctagttagaa aaagacattt ttgctgtcag
tcactgtcaa gagattcttt 60 tgctggcatt tcttctagaa gcaaaaagag
cgatgcgtct tttccgctga accgttccag 120 caaaaaagac taccaacgca
atatggattg tcagaatcat ataaaagaga agcaaataac 180 tccttgtctt
gtatcaattg cattataata tcttcttgtt agtgcaatat catatagaag 240
tcatcgaaat agatattaag aaaaacaaac tgtacaatcc atgggtcatc accatcatca
300 tcacgggtcg gactcagaag tcaatcaaga agctaagcca gaggtcaagc
cagaagtcaa 360 gcctgagact cacatcaatt taaaggtgtc cgatggatct
tcagagatct tcttcaagat 420 caaaaagacc actcctttaa gaaggctgat
ggaagcgttc gctaaaagac agggtaagga 480 aatggactcc ttaagattct
tgtacgacgg tattagaatt caagctgatc agacccctga 540 agatttggac
atggaggata acgatattat tgaggctcac cgcgaacaga ttggaggtat 600
ggtgagcaag ggcgaggagc tgttcaccgg ggtggtgccc atcctggtcg agctggacgg
660 cgacgtaaac ggccacaagt tcagcgtgtc cggcgagggc gagggcgatg
ccacctacgg 720 caagctgacc ctgaagttca tctgcaccac cggcaagctg
cccgtgccct ggcccaccct 780 cgtgaccacc ctgacctacg gcgtgcagtg
cttcagccgc taccccgacc acatgaagca 840 gcacgacttc ttcaagtccg
ccatgcccga aggctacgtc caggagcgca ccatcttctt 900 caaggacgac
ggcaactaca agacccgcgc cgaggtgaag ttcgagggcg acaccctggt 960
gaaccgcatc gagctgaagg gcatcgactt caaggaggac ggcaacatcc tggggcacaa
1020 gctggagtac aactacaaca gccacaacgt ctatatcatg gccgacaagc
agaagaacgg 1080 catcaaggtg aacttcaaga tccgccacaa catcgaggac
ggcagcgtgc agctcgccga 1140 ccactaccag cagaacaccc ccatcggcga
cggccccgtg ctgctgcccg acaaccacta 1200 cctgagcacc cagtccgccc
tgagcaaaga ccccaacgag aagcgcgatc acatggtcct 1260 gctggagttc
gtgaccgccg ccgggatcac tctcggcatg gacgagctgt acaagtaata 1320
agcttgcggc cgcactcgag gagctccctg gcgaattgta ccaagatggc ctttggtggg
1380 ttgaagaagg aaaaagacag aaacgactta attacctact tgaaaaaagc
ctgtgagtaa 1440 acaggcccct tttcctttgt cgatatcatg taattagtta
tgtcacgctt acattcacgc 1500 cctcccccca catccgctct aaccgaaaag
gaaggagtta gacaacctga agtctaggtc 1560 cctatttatt tttttatagt
tatgttagta ttaagaacgt tatttatatt tcaaattttt 1620 cttttttttc
tgtacagacg cgtgtacgca tgtaacatta tactgaaaac cttgcttgag 1680
aaggttttgg gacgctcgaa ggctttaatt tgcaagctta tcgatgataa gctgtcaaac
1740 atgagaattc ggtcgaaaaa agaaaaggag agggccaaga gggagggcat
tggtgactat 1800 tgagcacgtg agtatacgtg attaagcaca caaaggcagc
ttggagtatg tctgttatta 1860 atttcacagg tagttctggt ccattggtga
aagtttgcgg cttgcagagc acagaggccg 1920 cagaatgtgc tctagattcc
gatgctgact tgctgggtat tatatgtgtg cccaatagaa 1980 agagaacaat
tgacccggtt attgcaagga aaatttcaag tcttgtaaaa gcatataaaa 2040
atagttcagg cactccgaaa tacttggttg gcgtgtttcg taatcaacct aaggaggatg
2100 ttttggctct ggtcaatgat tacggcattg atatcgtcca actgcatgga
gatgagtcgt 2160 ggcaagaata ccaagagttc ctcggtttgc cagttattaa
aagactcgta tttccaaaag 2220 actgcaacat actactcagt gcagcttcac
agaaacctca ttcgtttatt cccttgtttg 2280 attcagaagc aggtgggaca
ggtgaacttt tggattggaa ctcgatttct gactgggttg 2340 gaaggcaaga
gagccccgaa agcttacatt ttatgttagc tggtggactg acgccagaaa 2400
atgttggtga tgcgcttaga ttaaatggcg ttattggtgt tgatgtaagc ggaggtgtgg
2460 agacaaatgg tgtaaaagac tctaacaaaa tagcaaattt cgtcaaaaat
gctaagaaat 2520 aggttattac tgagtagtat ttatttaagt attgtttgtg
cacttgcctg cagcttctca 2580 atgatattcg aatacgcttt gaggagatac
agcctaatat ccgacaaact gttttacaga 2640 tttacgatcg tacttgttac
ccatcattga attttgaaca tccgaacctg ggagttttcc 2700 ctgaaacaga
tagtatattt gaacctgtat aataatatat agtctagcgc tttacggaag 2760
acaatgtatg tatttcggtt cctggagaaa ctattgcatc tattgcatag gtaatcttgc
2820 acgtcgcatc cccggttcat tttctgcgtt tccatcttgc acttcaatag
catatctttg 2880 ttaacgaagc atctgtgctt cattttgtag aacaaaaatg
caacgcgaga gcgctaattt 2940 ttcaaacaaa gaatctgagc tgcattttta
cagaacagaa atgcaacgcg aaagcgctat 3000 tttaccaacg aagaatctgt
gcttcatttt tgtaaaacaa aaatgcaacg cgagagcgct 3060 aatttttcaa
acaaagaatc tgagctgcat ttttacagaa cagaaatgca acgcgagagc 3120
gctattttac caacaaagaa tctatacttc ttttttgttc tacaaaaatg catcccgaga
3180 gcgctatttt tctaacaaag catcttagat tacttttttt ctcctttgtg
cgctctataa 3240 tgcagtctct tgataacttt ttgcactgta ggtccgttaa
ggttagaaga aggctacttt 3300 ggtgtctatt ttctcttcca taaaaaaagc
ctgactccac ttcccgcgtt tactgattac 3360 tagcgaagct gcgggtgcat
tttttcaaga taaaggcatc cccgattata ttctataccg 3420 atgtggattg
cgcatacttt gtgaacagaa agtgatagcg ttgatgattc ttcattggtc 3480
agaaaattat gaacggtttc ttctattttg tctctatata ctacgtatag gaaatgttta
3540 cattttcgta ttgttttcga ttcactctat gaatagttct tactacaatt
tttttgtcta 3600 aagagtaata ctagagataa acataaaaaa tgtagaggtc
gagtttagat gcaagttcaa 3660 ggagcgaaag gtggatgggt aggttatata
gggatatagc acagagatat atagcaaaga 3720 gatacttttg agcaatgttt
gtggaagcgg tattcgcaat attttagtag ctcgttacag 3780 tccggtgcgt
ttttggtttt ttgaaagtgc gtcttcagag cgcttttggt tttcaaaagc 3840
gctctgaagt tcctatactt tctagagaat aggaacttcg gaataggaac ttcaaagcgt
3900 ttccgaaaac gagcgcttcc gaaaatgcaa cgcgagctgc gcacatacag
ctcactgttc 3960 acgtcgcacc tatatctgcg tgttgcctgt atatatatat
acatgagaag aacggcatag 4020 tgcgtgttta tgcttaaatg cgtacttata
tgcgtctatt tatgtaggat gaaaggtagt 4080 ctagtacctc ctgtgatatt
atcccattcc atgcggggta tcgtatgctt ccttcagcac 4140 taccctttag
ctgttctata tgctgccact cctcaattgg attagtctca tccttcaatg 4200
ctatcatttc ctttgatatt ggatcatatg catagtaccg agaaactagt gcgaagtagt
4260 gatcaggtat tgctgttatc tgatgagtat acgttgtcct ggccacggca
gaagcacgct 4320 tatcgctcca atttcccaca acattagtca actccgttag
gcccttcatt gaaagaaatg 4380 aggtcatcaa atgtcttcca atgtgagatt
ttgggccatt ttttatagca aagattgaat 4440 aaggcgcatt tttcttcaaa
gctttattgt acgatctgac taagttatct tttaataatt 4500 ggtattcctg
tttattgctt gaagaattgc cggtcctatt tactcgtttt aggactggtt 4560
cagaattctt gaagacgaaa gggcctcgtg atacgcctat ttttataggt taatgtcatg
4620 ataataatgg tttcttagac gtcaggtggc acttttcggg gaaatgtgcg
cggaacccct 4680 atttgtttat ttttctaaat acattcaaat atgtatccgc
tcatgagaca ataaccctga 4740 taaatgcttc aataatattg aaaaaggaag
agtatgagta ttcaacattt ccgtgtcgcc 4800 cttattccct tttttgcggc
attttgcctt cctgtttttg ctcacccaga aacgctggtg 4860 aaagtaaaag
atgctgaaga tcagttgggt gcacgagtgg gttacatcga actggatctc 4920
aacagcggta agatccttga gagttttcgc cccgaagaac gttttccaat gatgagcact
4980 tttaaagttc tgctatgtgg cgcggtatta tcccgtgttg acgccgggca
agagcaactc 5040 ggtcgccgca tacactattc tcagaatgac ttggttgagt
actcaccagt cacagaaaag 5100 catcttacgg atggcatgac agtaagagaa
ttatgcagtg ctgccataac catgagtgat 5160 aacactgcgg ccaacttact
tctgacaacg atcggaggac cgaaggagct aaccgctttt 5220 ttgcacaaca
tgggggatca tgtaactcgc cttgatcgtt gggaaccgga gctgaatgaa 5280
gccataccaa acgacgagcg tgacaccacg atgcctgcag caatggcaac aacgttgcgc
5340 aaactattaa ctggcgaact acttactcta gcttcccggc aacaattaat
agactggatg 5400 gaggcggata aagttgcagg accacttctg cgctcggccc
ttccggctgg ctggtttatt 5460 gctgataaat ctggagccgg tgagcgtggg
tctcgcggta tcattgcagc actggggcca 5520 gatggtaagc cctcccgtat
cgtagttatc tacacgacgg ggagtcaggc aactatggat 5580 gaacgaaata
gacagatcgc tgagataggt gcctcactga ttaagcattg gtaactgtca 5640
gaccaagttt actcatatat actttagatt gatttaaaac ttcattttta atttaaaagg
5700 atctaggtga agatcctttt tgataatctc atgaccaaaa tcccttaacg
tgagttttcg 5760 ttccactgag cgtcagaccc cgtagaaaag atcaaaggat
cttcttgaga tccttttttt 5820 ctgcgcgtaa tctgctgctt gcaaacaaaa
aaaccaccgc taccagcggt ggtttgtttg 5880 ccggatcaag agctaccaac
tctttttccg aaggtaactg gcttcagcag agcgcagata 5940 ccaaatactg
tccttctagt gtagccgtag ttaggccacc acttcaagaa ctctgtagca 6000
ccgcctacat acctcgctct gctaatcctg ttaccagtgg ctgctgccag tggcgataag
6060 tcgtgtctta ccgggttgga ctcaagacga tagttaccgg ataaggcgca
gcggtcgggc 6120 tgaacggggg gttcgtgcac acagcccagc ttggagcgaa
cgacctacac cgaactgaga 6180 tacctacagc gtgagctatg agaaagcgcc
acgcttcccg aagggagaaa ggcggacagg 6240 tatccggtaa gcggcagggt
cggaacagga gagcgcacga gggagcttcc agggggaaac 6300 gcctggtatc
tttatagtcc tgtcgggttt cgccacctct gacttgagcg tcgatttttg 6360
tgatgctcgt caggggggcg gagcctatgg aaaaacgcca gcaacgcggc ctttttacgg
6420 ttcctggcct tttgctggcc ttttgctcac atgttctttc ctgcgttatc
ccctgattct 6480 gtggataacc gtattaccgc ctttgagtga gctgataccg
ctcgccgcag ccgaacgacc 6540 gagcgcagcg agtcagtgag cgaggaagcg
gaagagcgcc tgatgcggta ttttctcctt 6600 acgcatctgt gcggtatttc
acaccgcata tggtgcactc tcagtacaat ctgctctgat 6660 gccgcatagt
taagccagta tacactccgc tatcgctacg tgactgggtc atggctgcgc 6720
cccgacaccc gccaacaccc gctgacgcgc cctgacgggc ttgtctgctc ccggcatccg
6780 cttacagaca agctgtgacc gtctccggga gctgcatgtg tcagaggttt
tcaccgtcat 6840 caccgaaacg cgcgaggcag ggatc 6865 34 7894 DNA
Artificial Sequence Synthetic Sequence 34 ccttgttact agttagaaaa
agacattttt gctgtcagtc actgtcaaga gattcttttg 60 ctggcatttc
ttctagaagc aaaaagagcg atgcgtcttt tccgctgaac cgttccagca 120
aaaaagacta ccaacgcaat atggattgtc agaatcatat aaaagagaag caaataactc
180 cttgtcttgt atcaattgca ttataatatc ttcttgttag tgcaatatca
tatagaagtc 240 atcgaaatag atattaagaa aaacaaactg tacaatccat
gggtcatcac catcatcatc 300 acgggcagat cttcgtcaag acgttaaccg
gtaaaaccat aactctagaa gttgaaccat 360 ccgataccat cgaaaacgtt
aaggctaaaa ttcaagacaa ggaaggcatt ccacctgatc 420 aacaaagatt
gatctttgcc ggtaagcagc tcgaggacgg tagaacgctg tctgattaca 480
acattcagaa ggagtcgacc ttacatcttg tcttacgcct acgtggaggt atggaattca
540 tgttacgtcc tgtagaaacc ccaacccgtg aaatcaaaaa actcgacggc
ctgtgggcat 600 tcagtctgga tcgcgaaaac tgtggaattg atcagcgttg
gtgggaaagc gcgttacaag 660 aaagccgggc aattgctgtg ccaggcagtt
ttaacgatca gttcgccgat gcagatattc 720 gtaattatgc gggcaacgtc
tggtatcagc gcgaagtctt tataccgaaa ggttgggcag 780 gccagcgtat
cgtgctgcgt ttcgatgcgg tcactcatta cggcaaagtg tgggtcaata 840
atcaggaagt gatggagcat cagggcggct atacgccatt tgaagccgat gtcacgccgt
900 atgttattgc cgggaaaagt gtacgtatca ccgtttgtgt gaacaacgaa
ctgaactggc 960 agactatccc gccgggaatg gtgattaccg acgaaaacgg
caagaaaaag cagtcttact 1020 tccatgattt ctttaactat gccggaatcc
atcgcagcgt aatgctctac accacgccga 1080 acacctgggt ggacgatatc
accgtggtga cgcatgtcgc gcaagactgt aaccacgcgt 1140 ctgttgactg
gcaggtggtg gccaatggtg atgtcagcgt tgaactgcgt gatgcggatc 1200
aacaggtggt tgcaactgga caaggcacta gcgggacttt gcaagtggtg aatccgcacc
1260 tctggcaacc gggtgaaggt tatctctatg aactgtgcgt cacagccaaa
agccagacag 1320 agtgtgatat ctacccgctt cgcgtcggca tccggtcagt
ggcagtgaag ggccaacagt 1380 tcctgattaa ccacaaaccg ttctacttta
ctggctttgg tcgtcatgaa gatgcggact 1440 tacgtggcaa aggattcgat
aacgtgctga tggtgcacga ccacgcatta atggactgga 1500 ttggggccaa
ctcctaccgt acctcgcatt acccttacgc tgaagagatg ctcgactggg 1560
cagatgaaca tggcatcgtg gtgattgatg aaactgctgc tgtcggcttt aacctctctt
1620 taggcattgg tttcgaagcg ggcaacaagc cgaaagaact gtacagcgaa
gaggcagtca 1680 acggggaaac tcagcaagcg cacttacagg cgattaaaga
gctgatagcg cgtgacaaaa 1740 accacccaag cgtggtgatg tggagtattg
ccaacgaacc ggatacccgt ccgcaagtgc 1800 acgggaatat ttcgccactg
gcggaagcaa cgcgtaaact cgacccgacg cgtccgatca 1860 cctgcgtcaa
tgtaatgttc tgcgacgctc acaccgatac catcagcgat ctctttgatg 1920
tgctgtgcct gaaccgttat tacggatggt atgtccaaag cggcgatttg gaaacggcag
1980 agaaggtact ggaaaaagaa cttctggcct ggcaggagaa actgcatcag
ccgattatca 2040 tcaccgaata cggcgtggat acgttagccg ggctgcactc
aatgtacacc gacatgtgga 2100 gtgaagagta tcagtgtgca tggctggata
tgtatcaccg cgtctttgat cgcgtcagcg 2160 ccgtcgtcgg tgaacaggta
tggaatttcg ccgattttgc gacctcgcaa ggcatattgc 2220 gcgttggcgg
taacaagaaa gggatcttca ctcgcgaccg caaaccgaag tcggcggctt 2280
ttctgctgca aaaacgctgg actggcatga acttcggtga aaaaccgcag cagggaggca
2340 aacaataagc ttgcggccgc actcgaggag ctccctggcg aattgtacca
agatggcctt 2400 tggtgggttg aagaaggaaa aagacagaaa cgacttaatt
acctacttga aaaaagcctg 2460 tgagtaaaca ggcccctttt cctttgtcga
tatcatgtaa ttagttatgt cacgcttaca 2520 ttcacgccct ccccccacat
ccgctctaac cgaaaaggaa ggagttagac aacctgaagt 2580 ctaggtccct
atttattttt ttatagttat gttagtatta agaacgttat ttatatttca 2640
aatttttctt ttttttctgt acagacgcgt gtacgcatgt aacattatac tgaaaacctt
2700 gcttgagaag gttttgggac gctcgaaggc tttaatttgc aagcttatcg
atgataagct 2760 gtcaaacatg agaattcggt cgaaaaaaga aaaggagagg
gccaagaggg agggcattgg 2820 tgactattga gcacgtgagt atacgtgatt
aagcacacaa aggcagcttg gagtatgtct 2880 gttattaatt tcacaggtag
ttctggtcca ttggtgaaag tttgcggctt gcagagcaca 2940 gaggccgcag
aatgtgctct agattccgat gctgacttgc tgggtattat atgtgtgccc 3000
aatagaaaga gaacaattga cccggttatt gcaaggaaaa tttcaagtct tgtaaaagca
3060 tataaaaata gttcaggcac tccgaaatac ttggttggcg tgtttcgtaa
tcaacctaag 3120 gaggatgttt tggctctggt caatgattac ggcattgata
tcgtccaact gcatggagat 3180 gagtcgtggc aagaatacca agagttcctc
ggtttgccag ttattaaaag actcgtattt 3240 ccaaaagact gcaacatact
actcagtgca gcttcacaga aacctcattc gtttattccc 3300 ttgtttgatt
cagaagcagg tgggacaggt gaacttttgg attggaactc gatttctgac 3360
tgggttggaa ggcaagagag ccccgaaagc ttacatttta tgttagctgg tggactgacg
3420 ccagaaaatg ttggtgatgc gcttagatta aatggcgtta ttggtgttga
tgtaagcgga 3480 ggtgtggaga caaatggtgt aaaagactct aacaaaatag
caaatttcgt caaaaatgct 3540 aagaaatagg ttattactga gtagtattta
tttaagtatt gtttgtgcac ttgcctgcag 3600 cttctcaatg atattcgaat
acgctttgag gagatacagc ctaatatccg acaaactgtt 3660 ttacagattt
acgatcgtac ttgttaccca tcattgaatt ttgaacatcc gaacctggga 3720
gttttccctg aaacagatag tatatttgaa cctgtataat aatatatagt ctagcgcttt
3780 acggaagaca atgtatgtat ttcggttcct ggagaaacta ttgcatctat
tgcataggta 3840 atcttgcacg tcgcatcccc ggttcatttt ctgcgtttcc
atcttgcact tcaatagcat 3900 atctttgtta acgaagcatc tgtgcttcat
tttgtagaac aaaaatgcaa cgcgagagcg 3960 ctaatttttc aaacaaagaa
tctgagctgc atttttacag aacagaaatg caacgcgaaa 4020 gcgctatttt
accaacgaag aatctgtgct tcatttttgt aaaacaaaaa tgcaacgcga 4080
gagcgctaat ttttcaaaca aagaatctga gctgcatttt tacagaacag aaatgcaacg
4140 cgagagcgct attttaccaa caaagaatct atacttcttt tttgttctac
aaaaatgcat 4200 cccgagagcg ctatttttct aacaaagcat cttagattac
tttttttctc ctttgtgcgc 4260 tctataatgc agtctcttga taactttttg
cactgtaggt ccgttaaggt tagaagaagg 4320 ctactttggt gtctattttc
tcttccataa aaaaagcctg actccacttc ccgcgtttac 4380 tgattactag
cgaagctgcg ggtgcatttt ttcaagataa aggcatcccc gattatattc 4440
tataccgatg tggattgcgc atactttgtg aacagaaagt gatagcgttg atgattcttc
4500 attggtcaga aaattatgaa cggtttcttc tattttgtct ctatatacta
cgtataggaa 4560 atgtttacat tttcgtattg ttttcgattc actctatgaa
tagttcttac tacaattttt 4620 ttgtctaaag agtaatacta gagataaaca
taaaaaatgt agaggtcgag tttagatgca 4680 agttcaagga gcgaaaggtg
gatgggtagg ttatataggg atatagcaca gagatatata 4740 gcaaagagat
acttttgagc aatgtttgtg gaagcggtat tcgcaatatt ttagtagctc 4800
gttacagtcc ggtgcgtttt tggttttttg aaagtgcgtc ttcagagcgc ttttggtttt
4860 caaaagcgct ctgaagttcc tatactttct agagaatagg aacttcggaa
taggaacttc 4920 aaagcgtttc cgaaaacgag cgcttccgaa aatgcaacgc
gagctgcgca catacagctc 4980 actgttcacg tcgcacctat atctgcgtgt
tgcctgtata tatatataca tgagaagaac 5040 ggcatagtgc gtgtttatgc ttaaatgcgt acttatatgc
gtctatttat gtaggatgaa 5100 aggtagtcta gtacctcctg tgatattatc
ccattccatg cggggtatcg tatgcttcct 5160 tcagcactac cctttagctg
ttctatatgc tgccactcct caattggatt agtctcatcc 5220 ttcaatgcta
tcatttcctt tgatattgga tcatatgcat agtaccgaga aactagtgcg 5280
aagtagtgat caggtattgc tgttatctga tgagtatacg ttgtcctggc cacggcagaa
5340 gcacgcttat cgctccaatt tcccacaaca ttagtcaact ccgttaggcc
cttcattgaa 5400 agaaatgagg tcatcaaatg tcttccaatg tgagattttg
ggccattttt tatagcaaag 5460 attgaataag gcgcattttt cttcaaagct
ttattgtacg atctgactaa gttatctttt 5520 aataattggt attcctgttt
attgcttgaa gaattgccgg tcctatttac tcgttttagg 5580 actggttcag
aattcttgaa gacgaaaggg cctcgtgata cgcctatttt tataggttaa 5640
tgtcatgata ataatggttt cttagacgtc aggtggcact tttcggggaa atgtgcgcgg
5700 aacccctatt tgtttatttt tctaaataca ttcaaatatg tatccgctca
tgagacaata 5760 accctgataa atgcttcaat aatattgaaa aaggaagagt
atgagtattc aacatttccg 5820 tgtcgccctt attccctttt ttgcggcatt
ttgccttcct gtttttgctc acccagaaac 5880 gctggtgaaa gtaaaagatg
ctgaagatca gttgggtgca cgagtgggtt acatcgaact 5940 ggatctcaac
agcggtaaga tccttgagag ttttcgcccc gaagaacgtt ttccaatgat 6000
gagcactttt aaagttctgc tatgtggcgc ggtattatcc cgtgttgacg ccgggcaaga
6060 gcaactcggt cgccgcatac actattctca gaatgacttg gttgagtact
caccagtcac 6120 agaaaagcat cttacggatg gcatgacagt aagagaatta
tgcagtgctg ccataaccat 6180 gagtgataac actgcggcca acttacttct
gacaacgatc ggaggaccga aggagctaac 6240 cgcttttttg cacaacatgg
gggatcatgt aactcgcctt gatcgttggg aaccggagct 6300 gaatgaagcc
ataccaaacg acgagcgtga caccacgatg cctgcagcaa tggcaacaac 6360
gttgcgcaaa ctattaactg gcgaactact tactctagct tcccggcaac aattaataga
6420 ctggatggag gcggataaag ttgcaggacc acttctgcgc tcggcccttc
cggctggctg 6480 gtttattgct gataaatctg gagccggtga gcgtgggtct
cgcggtatca ttgcagcact 6540 ggggccagat ggtaagccct cccgtatcgt
agttatctac acgacgggga gtcaggcaac 6600 tatggatgaa cgaaatagac
agatcgctga gataggtgcc tcactgatta agcattggta 6660 actgtcagac
caagtttact catatatact ttagattgat ttaaaacttc atttttaatt 6720
taaaaggatc taggtgaaga tcctttttga taatctcatg accaaaatcc cttaacgtga
6780 gttttcgttc cactgagcgt cagaccccgt agaaaagatc aaaggatctt
cttgagatcc 6840 tttttttctg cgcgtaatct gctgcttgca aacaaaaaaa
ccaccgctac cagcggtggt 6900 ttgtttgccg gatcaagagc taccaactct
ttttccgaag gtaactggct tcagcagagc 6960 gcagatacca aatactgtcc
ttctagtgta gccgtagtta ggccaccact tcaagaactc 7020 tgtagcaccg
cctacatacc tcgctctgct aatcctgtta ccagtggctg ctgccagtgg 7080
cgataagtcg tgtcttaccg ggttggactc aagacgatag ttaccggata aggcgcagcg
7140 gtcgggctga acggggggtt cgtgcacaca gcccagcttg gagcgaacga
cctacaccga 7200 actgagatac ctacagcgtg agctatgaga aagcgccacg
cttcccgaag ggagaaaggc 7260 ggacaggtat ccggtaagcg gcagggtcgg
aacaggagag cgcacgaggg agcttccagg 7320 gggaaacgcc tggtatcttt
atagtcctgt cgggtttcgc cacctctgac ttgagcgtcg 7380 atttttgtga
tgctcgtcag gggggcggag cctatggaaa aacgccagca acgcggcctt 7440
tttacggttc ctggcctttt gctggccttt tgctcacatg ttctttcctg cgttatcccc
7500 tgattctgtg gataaccgta ttaccgcctt tgagtgagct gataccgctc
gccgcagccg 7560 aacgaccgag cgcagcgagt cagtgagcga ggaagcggaa
gagcgcctga tgcggtattt 7620 tctccttacg catctgtgcg gtatttcaca
ccgcatatgg tgcactctca gtacaatctg 7680 ctctgatgcc gcatagttaa
gccagtatac actccgctat cgctacgtga ctgggtcatg 7740 gctgcgcccc
gacacccgcc aacacccgct gacgcgccct gacgggcttg tctgctcccg 7800
gcatccgctt acagacaagc tgtgaccgtc tccgggagct gcatgtgtca gaggttttca
7860 ccgtcatcac cgaaacgcgc gaggcaggga tccg 7894 35 5800 DNA
Artificial Sequence Synthetic Sequence 35 atcatggaga taattaaaat
gataaccatc tcgcaaataa ataagtattt tactgttttc 60 gtaacagttt
tgtaataaaa aaacctataa atattccgga ttattcatac cgtcccacca 120
tcgggcgcga tgggtcatca ccatcatcat cacgggtcgg actcagaagt caatcaagaa
180 gctaagccag aggtcaagcc agaagtcaag cctgagactc acatcaattt
aaaggtgtcc 240 gatggatctt cagagatctt cttcaagatc aaaaagacca
ctcctttaag aaggctgatg 300 gaagcgttcg ctaaaagaca gggtaaggaa
atggactcct taagattctt gtacgacggt 360 attagaattc aagctgatca
gacccctgaa gatttggaca tggaggataa cgatattatt 420 gaggctcacc
gcgaacagat tggaggtatg gtgagcaagg gcgaggagct gttcaccggg 480
gtggtgccca tcctggtcga gctggacggc gacgtaaacg gccacaagtt cagcgtgtcc
540 ggcgagggcg agggcgatgc cacctacggc aagctgaccc tgaagttcat
ctgcaccacc 600 ggcaagctgc ccgtgccctg gcccaccctc gtgaccaccc
tgacctacgg cgtgcagtgc 660 ttcagccgct accccgacca catgaagcag
cacgacttct tcaagtccgc catgcccgaa 720 ggctacgtcc aggagcgcac
catcttcttc aaggacgacg gcaactacaa gacccgcgcc 780 gaggtgaagt
tcgagggcga caccctggtg aaccgcatcg agctgaaggg catcgacttc 840
aaggaggacg gcaacatcct ggggcacaag ctggagtaca actacaacag ccacaacgtc
900 tatatcatgg ccgacaagca gaagaacggc atcaaggtga acttcaagat
ccgccacaac 960 atcgaggacg gcagcgtgca gctcgccgac cactaccagc
agaacacccc catcggcgac 1020 ggccccgtgc tgctgcccga caaccactac
ctgagcaccc agtccgccct gagcaaagac 1080 cccaacgaga agcgcgatca
catggtcctg ctggagttcg tgaccgccgc cgggatcact 1140 ctcggcatgg
acgagctgta caagtaatga gacggaattc aaaggcctac gtcgacgagc 1200
tcactagtcg cggccgcttt cgaatctaga gcctgcagtc tcgaggcatg cggtaccaag
1260 cttgtcgaga agtactagag gatcataatc agccatacca catttgtaga
ggttttactt 1320 gctttaaaaa acctcccaca cctccccctg aacctgaaac
ataaaatgaa tgcaattgtt 1380 gttgttaact tgtttattgc agcttataat
ggttacaaat aaagcaatag catcacaaat 1440 ttcacaaata aagcattttt
ttcactgcat tctagttgtg gtttgtccaa actcatcaat 1500 gtatcttatc
atgtctggat ctgatcactg cttgagccta ggagatccga accagataag 1560
tgaaatctag ttccaaacta ttttgtcatt tttaattttc gtattagctt acgacgctac
1620 acccagttcc catctatttt gtcactcttc cctaaataat ccttaaaaac
tccatttcca 1680 cccctcccag ttcccaacta ttttgtccgc ccacagcggg
gcatttttct tcctgttatg 1740 tttttaatca aacatcctgc caactccatg
tgacaaaccg tcatcttcgg ctactttttc 1800 tctgtcacag aatgaaaatt
tttctgtcat ctcttcgtta ttaatgtttg taattgactg 1860 aatatcaacg
cttatttgca gcctgaatgg cgaatgggac gcgccctgta gcggcgcatt 1920
aagcgcggcg ggtgtggtgg ttacgcgcag cgtgaccgct acacttgcca gcgccctagc
1980 gcccgctcct ttcgctttct tcccttcctt tctcgccacg ttcgccggct
ttccccgtca 2040 agctctaaat cgggggctcc ctttagggtt ccgatttagt
gctttacggc acctcgaccc 2100 caaaaaactt gattagggtg atggttcacg
tagtgggcca tcgccctgat agacggtttt 2160 tcgccctttg acgttggagt
ccacgttctt taatagtgga ctcttgttcc aaactggaac 2220 aacactcaac
cctatctcgg tctattcttt tgatttataa gggattttgc cgatttcggc 2280
ctattggtta aaaaatgagc tgatttaaca aaaatttaac gcgaatttta acaaaatatt
2340 aacgtttaca atttcaggtg gcacttttcg gggaaatgtg cgcggaaccc
ctatttgttt 2400 atttttctaa atacattcaa atatgtatcc gctcatgaga
caataaccct gataaatgct 2460 tcaataatat tgaaaaagga agagtatgag
tattcaacat ttccgtgtcg cccttattcc 2520 cttttttgcg gcattttgcc
ttcctgtttt tgctcaccca gaaacgctgg tgaaagtaaa 2580 agatgctgaa
gatcagttgg gtgcacgagt gggttacatc gaactggatc tcaacagcgg 2640
taagatcctt gagagttttc gccccgaaga acgttttcca atgatgagca cttttaaagt
2700 tctgctatgt ggcgcggtat tatcccgtat tgacgccggg caagagcaac
tcggtcgccg 2760 catacactat tctcagaatg acttggttga gtactcacca
gtcacagaaa agcatcttac 2820 ggatggcatg acagtaagag aattatgcag
tgctgccata accatgagtg ataacactgc 2880 ggccaactta cttctgacaa
cgatcggagg accgaaggag ctaaccgctt ttttgcacaa 2940 catgggggat
catgtaactc gccttgatcg ttgggaaccg gagctgaatg aagccatacc 3000
aaacgacgag cgtgacacca cgatgcctgt agcaatggca acaacgttgc gcaaactatt
3060 aactggcgaa ctacttactc tagcttcccg gcaacaatta atagactgga
tggaggcgga 3120 taaagttgca ggaccacttc tgcgctcggc ccttccggct
ggctggttta ttgctgataa 3180 atctggagcc ggtgagcgtg ggtctcgcgg
tatcattgca gcactggggc cagatggtaa 3240 gccctcccgt atcgtagtta
tctacacgac ggggagtcag gcaactatgg atgaacgaaa 3300 tagacagatc
gctgagatag gtgcctcact gattaagcat tggtaactgt cagaccaagt 3360
ttactcatat atactttaga ttgatttaaa acttcatttt taatttaaaa ggatctaggt
3420 gaagatcctt tttgataatc tcatgaccaa aatcccttaa cgtgagtttt
cgttccactg 3480 agcgtcagac cccgtagaaa agatcaaagg atcttcttga
gatccttttt ttctgcgcgt 3540 aatctgctgc ttgcaaacaa aaaaaccacc
gctaccagcg gtggtttgtt tgccggatca 3600 agagctacca actctttttc
cgaaggtaac tggcttcagc agagcgcaga taccaaatac 3660 tgtccttcta
gtgtagccgt agttaggcca ccacttcaag aactctgtag caccgcctac 3720
atacctcgct ctgctaatcc tgttaccagt ggctgctgcc agtggcgata agtcgtgtct
3780 taccgggttg gactcaagac gatagttacc ggataaggcg cagcggtcgg
gctgaacggg 3840 gggttcgtgc acacagccca gcttggagcg aacgacctac
accgaactga gatacctaca 3900 gcgtgagcat tgagaaagcg ccacgcttcc
cgaagggaga aaggcggaca ggtatccggt 3960 aagcggcagg gtcggaacag
gagagcgcac gagggagctt ccagggggaa acgcctggta 4020 tctttatagt
cctgtcgggt ttcgccacct ctgacttgag cgtcgatttt tgtgatgctc 4080
gtcagggggg cggagcctat ggaaaaacgc cagcaacgcg gcctttttac ggttcctggc
4140 cttttgctgg ccttttgctc acatgttctt tcctgcgtta tcccctgatt
ctgtggataa 4200 ccgtattacc gcctttgagt gagctgatac cgctcgccgc
agccgaacga ccgagcgcag 4260 cgagtcagtg agcgaggaag cggaagagcg
cctgatgcgg tattttctcc ttacgcatct 4320 gtgcggtatt tcacaccgca
gaccagccgc gtaacctggc aaaatcggtt acggttgagt 4380 aataaatgga
tgccctgcgt aagcgggtgt gggcggacaa taaagtctta aactgaacaa 4440
aatagatcta aactatgaca ataaagtctt aaactagaca gaatagttgt aaactgaaat
4500 cagtccagtt atgctgtgaa aaagcatact ggacttttgt tatggctaaa
gcaaactctt 4560 cattttctga agtgcaaatt gcccgtcgta ttaaagaggg
gcgtggccaa gggcatggta 4620 aagactatat tcgcggcgtt gtgacaattt
accgaacaac tccgcggccg ggaagccgat 4680 ctcggcttga acgaattgtt
aggtggcggt acttgggtcg atatcaaagt gcatcacttc 4740 ttcccgtatg
cccaactttg tatagagagc cactgcggga tcgtcaccgt aatctgcttg 4800
cacgtagatc acataagcac caagcgcgtt ggcctcatgc ttgaggagat tgatgagcgc
4860 ggtggcaatg ccctgcctcc ggtgctcgcc ggagactgcg agatcataga
tatagatctc 4920 actacgcggc tgctcaaacc tgggcagaac gtaagccgcg
agagcgccaa caaccgcttc 4980 ttggtcgaag gcagcaagcg cgatgaatgt
cttactacgg agcaagttcc cgaggtaatc 5040 ggagtccggc tgatgttggg
agtaggtggc tacgtctccg aactcacgac cgaaaagatc 5100 aagagcagcc
cgcatggatt tgacttggtc agggccgagc ctacatgtgc gaatgatgcc 5160
catacttgag ccacctaact ttgttttagg gcgactgccc tgctgcgtaa catcgttgct
5220 gctgcgtaac atcgttgctg ctccataaca tcaaacatcg acccacggcg
taacgcgctt 5280 gctgcttgga tgcccgaggc atagactgta caaaaaaaca
gtcataacaa gccatgaaaa 5340 ccgccactgc gccgttacca ccgctgcgtt
cggtcaaggt tctggaccag ttgcgtgagc 5400 gcatacgcta cttgcattac
agtttacgaa ccgaacaggc ttatgtcaac tgggttcgtg 5460 ccttcatccg
tttccacggt gtgcgtcacc cggcaacctt gggcagcagc gaagtcgagg 5520
catttctgtc ctggctggcg aacgagcgca aggtttcggt ctccacgcat cgtcaggcat
5580 tggcggcctt gctgttcttc tacggcaagg tgctgtgcac ggatctgccc
tggcttcagg 5640 agatcggaag acctcggccg tcgcggcgct tgccggtggt
gctgaccccg gatgaagtgg 5700 ttcgcatcct cggttttctg gaaggcgagc
atcgtttgtt cgcccaggac tctagctata 5760 gttctagtgg ttggctacgt
atactccgga atattaatag 5800 36 5598 DNA Artificial Sequence
Synthetic Sequence 36 atccggatat agttcctcct ttcagcaaaa aacccctcaa
gacccgttta gaggccccaa 60 ggggttatgc tagttattgc tcagcggtgg
cagcagccaa ctcagcttcc tttcgggctt 120 tgttagcagc cggatctcag
tggtggtggt ggtggtgctc gagtgcggcc gcaagcttgt 180 cgacggagct
cgaattcgga tccggtctca acctccaatc tgttcgcggt gagcctcaat 240
aatatcgtta tcctccatgt ccaaatcttc aggggtctga tcagcttgaa ttctaatacc
300 gtcgtacaag aatcttaagg agtccatttc cttaccctgt cttttagcga
acgcttccat 360 cagccttctt aaaggagtgg tctttttgat cttgaagaag
atctctgaag atccatcgga 420 cacctttaaa ttgatgtgag tctcaggctt
gacttctggc ttgacctctg gcttagcttc 480 ttgattgact tctgagtccg
acccgtgatg atgatggtga tgacccatgg tatatctcct 540 tcttaaagtt
aaacaaaatt atttctagag gggaattgtt atccgctcac aattccccta 600
tagtgagtcg tattaatttc gcgggatcga gatctcgatc ctctacgccg gacgcatcgt
660 ggccggcatc accggcgcca caggtgcggt tgctggcgcc tatatcgccg
acatcaccga 720 tggggaagat cgggctcgcc acttcgggct catgagcgct
tgtttcggcg tgggtatggt 780 ggcaggcccc gtggccgggg gactgttggg
cgccatctcc ttgcatgcac cattccttgc 840 ggcggcggtg ctcaacggcc
tcaacctact actgggctgc ttcctaatgc aggagtcgca 900 taagggagag
cgtcgagatc ccggacacca tcgaatggcg caaaaccttt cgcggtatgg 960
catgatagcg cccggaagag agtcaattca gggtggtgaa tgtgaaacca gtaacgttat
1020 acgatgtcgc agagtatgcc ggtgtctctt atcagaccgt ttcccgcgtg
gtgaaccagg 1080 ccagccacgt ttctgcgaaa acgcgggaaa aagtggaagc
ggcgatggcg gagctgaatt 1140 acattcccaa ccgcgtggca caacaactgg
cgggcaaaca gtcgttgctg attggcgttg 1200 ccacctccag tctggccctg
cacgcgccgt cgcaaattgt cgcggcgatt aaatctcgcg 1260 ccgatcaact
gggtgccagc gtggtggtgt cgatggtaga acgaagcggc gtcgaagcct 1320
gtaaagcggc ggtgcacaat cttctcgcgc aacgcgtcag tgggctgatc attaactatc
1380 cgctggatga ccaggatgcc attgctgtgg aagctgcctg cactaatgtt
ccggcgttat 1440 ttcttgatgt ctctgaccag acacccatca acagtattat
tttctcccat gaagacggta 1500 cgcgactggg cgtggagcat ctggtcgcat
tgggtcacca gcaaatcgcg ctgttagcgg 1560 gcccattaag ttctgtctcg
gcgcgtctgc gtctggctgg ctggcataaa tatctcactc 1620 gcaatcaaat
tcagccgata gcggaacggg aaggcgactg gagtgccatg tccggttttc 1680
aacaaaccat gcaaatgctg aatgagggca tcgttcccac tgcgatgctg gttgccaacg
1740 atcagatggc gctgggcgca atgcgcgcca ttaccgagtc cgggctgcgc
gttggtgcgg 1800 atatctcggt agtgggatac gacgataccg aagacagctc
atgttatatc ccgccgttaa 1860 ccaccatcaa acaggatttt cgcctgctgg
ggcaaaccag cgtggaccgc ttgctgcaac 1920 tctctcaggg ccaggcggtg
aagggcaatc agctgttgcc cgtctcactg gtgaaaagaa 1980 aaaccaccct
ggcgcccaat acgcaaaccg cctctccccg cgcgttggcc gattcattaa 2040
tgcagctggc acgacaggtt tcccgactgg aaagcgggca gtgagcgcaa cgcaattaat
2100 gtaagttagc tcactcatta ggcaccggga tctcgaccga tgcccttgag
agccttcaac 2160 ccagtcagct ccttccggtg ggcgcggggc atgactatcg
tcgccgcact tatgactgtc 2220 ttctttatca tgcaactcgt aggacaggtg
ccggcagcgc tctgggtcat tttcggcgag 2280 gaccgctttc gctggagcgc
gacgatgatc ggcctgtcgc ttgcggtatt cggaatcttg 2340 cacgccctcg
ctcaagcctt cgtcactggt cccgccacca aacgtttcgg cgagaagcag 2400
gccattatcg ccggcatggc ggccccacgg gtgcgcatga tcgtgctcct gtcgttgagg
2460 acccggctag gctggcgggg ttgccttact ggttagcaga atgaatcacc
gatacgcgag 2520 cgaacgtgaa gcgactgctg ctgcaaaacg tctgcgacct
gagcaacaac atgaatggtc 2580 ttcggtttcc gtgtttcgta aagtctggaa
acgcggaagt cagcgccctg caccattatg 2640 ttccggatct gcatcgcagg
atgctgctgg ctaccctgtg gaacacctac atctgtatta 2700 acgaagcgct
ggcattgacc ctgagtgatt tttctctggt cccgccgcat ccataccgcc 2760
agttgtttac cctcacaacg ttccagtaac cgggcatgtt catcatcagt aacccgtatc
2820 gtgagcatcc tctctcgttt catcggtatc attaccccca tgaacagaaa
tcccccttac 2880 acggaggcat cagtgaccaa acaggaaaaa accgccctta
acatggcccg ctttatcaga 2940 agccagacat taacgcttct ggagaaactc
aacgagctgg acgcggatga acaggcagac 3000 atctgtgaat cgcttcacga
ccacgctgat gagctttacc gcagctgcct cgcgcgtttc 3060 ggtgatgacg
gtgaaaacct ctgacacatg cagctcccgg agacggtcac agcttgtctg 3120
taagcggatg ccgggagcag acaagcccgt cagggcgcgt cagcgggtgt tggcgggtgt
3180 cggggcgcag ccatgaccca gtcacgtagc gatagcggag tgtatactgg
cttaactatg 3240 cggcatcaga gcagattgta ctgagagtgc accatatatg
cggtgtgaaa taccgcacag 3300 atgcgtaagg agaaaatacc gcatcaggcg
ctcttccgct tcctcgctca ctgactcgct 3360 gcgctcggtc gttcggctgc
ggcgagcggt atcagctcac tcaaaggcgg taatacggtt 3420 atccacagaa
tcaggggata acgcaggaaa gaacatgtga gcaaaaggcc agcaaaaggc 3480
caggaaccgt aaaaaggccg cgttgctggc gtttttccat aggctccgcc cccctgacga
3540 gcatcacaaa aatcgacgct caagtcagag gtggcgaaac ccgacaggac
tataaagata 3600 ccaggcgttt ccccctggaa gctccctcgt gcgctctcct
gttccgaccc tgccgcttac 3660 cggatacctg tccgcctttc tcccttcggg
aagcgtggcg ctttctcata gctcacgctg 3720 taggtatctc agttcggtgt
aggtcgttcg ctccaagctg ggctgtgtgc acgaaccccc 3780 cgttcagccc
gaccgctgcg ccttatccgg taactatcgt cttgagtcca acccggtaag 3840
acacgactta tcgccactgg cagcagccac tggtaacagg attagcagag cgaggtatgt
3900 aggcggtgct acagagttct tgaagtggtg gcctaactac ggctacacta
gaaggacagt 3960 atttggtatc tgcgctctgc tgaagccagt taccttcgga
aaaagagttg gtagctcttg 4020 atccggcaaa caaaccaccg ctggtagcgg
tggttttttt gtttgcaagc agcagattac 4080 gcgcagaaaa aaaggatctc
aagaagatcc tttgatcttt tctacggggt ctgacgctca 4140 gtggaacgaa
aactcacgtt aagggatttt ggtcatgaac aataaaactg tctgcttaca 4200
taaacagtaa tacaaggggt gttatgagcc atattcaacg ggaaacgtct tgctctaggc
4260 cgcgattaaa ttccaacatg gatgctgatt tatatgggta taaatgggct
cgcgataatg 4320 tcgggcaatc aggtgcgaca atctatcgat tgtatgggaa
gcccgatgcg ccagagttgt 4380 ttctgaaaca tggcaaaggt agcgttgcca
atgatgttac agatgagatg gtcagactaa 4440 actggctgac ggaatttatg
cctcttccga ccatcaagca ttttatccgt actcctgatg 4500 atgcatggtt
actcaccact gcgatccccg ggaaaacagc attccaggta ttagaagaat 4560
atcctgattc aggtgaaaat attgttgatg cgctggcagt gttcctgcgc cggttgcatt
4620 cgattcctgt ttgtaattgt ccttttaaca gcgatcgcgt atttcgtctc
gctcaggcgc 4680 aatcacgaat gaataacggt ttggttgatg cgagtgattt
tgatgacgag cgtaatggct 4740 ggcctgttga acaagtctgg aaagaaatgc
ataaactttt gccattctca ccggattcag 4800 tcgtcactca tggtgatttc
tcacttgata accttatttt tgacgagggg aaattaatag 4860 gttgtattga
tgttggacga gtcggaatcg cagaccgata ccaggatctt gccatcctat 4920
ggaactgcct cggtgagttt tctccttcat tacagaaacg gctttttcaa aaatatggta
4980 ttgataatcc tgatatgaat aaattgcagt ttcatttgat gctcgatgag
tttttctaag 5040 aattaattca tgagcggata catatttgaa tgtatttaga
aaaataaaca aataggggtt 5100 ccgcgcacat ttccccgaaa agtgccacct
gaaattgtaa acgttaatat tttgttaaaa 5160 ttcgcgttaa atttttgtta
aatcagctca ttttttaacc aataggccga aatcggcaaa 5220 atcccttata
aatcaaaaga atagaccgag atagggttga gtgttgttcc agtttggaac 5280
aagagtccac tattaaagaa cgtggactcc aacgtcaaag ggcgaaaaac cgtctatcag
5340 ggcgatggcc cactacgtga accatcaccc taatcaagtt ttttggggtc
gaggtgccgt 5400 aaagcactaa atcggaaccc taaagggagc ccccgattta
gagcttgacg gggaaagccg 5460 gcgaacgtgg cgagaaagga agggaagaaa
gcgaaaggag cgggcgctag ggcgctggca 5520 agtgtagcgg tcacgctgcg
cgtaaccacc acacccgccg cgcttaatgc gccgctacag 5580 ggcgcgtccc
attcgcca 5598 37 478 DNA Artificial Sequence Synthetic Sequence 37
agatctcgat cccgcgaaat taatacgact cactataggg gaattgtgag cggataacaa
60 ttcccctcta gaaataattt tgtttaactt taagaaggag atataccatg
ggtcatcacc 120 atcatcatca cgggtcggac tcagaagtca atcaagaagc
taagccagag gtcaagccag 180 aagtcaagcc tgagactcac atcaatttaa
aggtgtccga tggatcttca gagatcttct 240 tcaagatcaa aaagaccact
cctttaagaa ggctgatgga agcgttcgct aaaagacagg 300 gtaaggaaat
ggactcctta agattcttgt acgacggtat tagaattcaa gctgatcaga 360
cccctgaaga tttggacatg gaggataacg atattattga ggctcaccgc gaacagattg
420 gaggttgaga ccggatccga attcgagctc cgtcgacaag cttgcggccg cactcgag
478 38 106 PRT Saccharomyces cerevisiae 38 Met Gly His His His His
His His Gly Ser Asp
Ser Glu Val Asn Gln 1 5 10 15 Glu Ala Lys Pro Glu Val Lys Pro Glu
Val Lys Pro Glu Thr His Ile 20 25 30 Asn Leu Lys Val Ser Asp Gly
Ser Ser Glu Ile Phe Phe Lys Ile Lys 35 40 45 Lys Thr Thr Pro Leu
Arg Arg Leu Met Glu Ala Phe Ala Lys Arg Gln 50 55 60 Gly Lys Glu
Met Asp Ser Leu Arg Phe Leu Tyr Asp Gly Ile Arg Ile 65 70 75 80 Gln
Ala Asp Gln Thr Pro Glu Asp Leu Asp Met Glu Asp Asn Asp Ile 85 90
95 Ile Glu Ala His Arg Glu Gln Ile Gly Gly 100 105 39 6 PRT
Artificial Sequence Synthetic Sequence 39 Leu Arg Leu Arg Gly Gly 1
5 40 50 DNA Artificial Sequence primer 40 ccatgggtca tcaccatcat
catcacgggt cggactcaga agtcaatcaa 50 41 36 DNA Artificial Sequence
primer 41 ggatccggtc tcaacctcca atctgttcgc ggtgag 36 42 33 DNA
Artificial Sequence primer misc_feature (12)...(14) n = a, c, g, or
t 42 ggtctcaagg tnnngtgagc aagggcgagg agc 33 43 31 DNA Artificial
Sequence primer 43 aagcttatta cttgtacagc tcgtccatgc c 31 44 14 DNA
Artificial Sequence primer misc_feature (12)...(14) n = a, c, g, or
t 44 ggtctcaagg tnnn 14 45 18 DNA Artificial Sequence primer
misc_feature (16)...(18) n = a, c, g, or t 45 ggtctcctcg agttannn
18 46 84 DNA Artificial Sequence Synthetic Sequence 46 gtcttaagac
taagaggtgg cacgccggcg gtgaccacct ataaactggt gattaacggc 60
aaaaccctga aaggcgaaac cacc 84 47 78 DNA Artificial Sequence
Synthetic Sequence 47 gccgttatcg ttcgcatact gtttaaacgc tttttccgcg
gtttccgcat ccaccgcttt 60 ggtggtttcg cctttcag 78 48 86 DNA
Artificial Sequence Synthetic Sequence 48 cagtatgcga acgataacgg
cgtggatggc gtgtggacct atgatgatgc gaccaaaacc 60 tttaccgtga
ccgaataagg tacccc 86 49 15 DNA Artificial Sequence primer 49
cttgtcttaa gaggt 15 50 21 DNA Artificial Sequence primer 50
gctgggtacc ttattcggtc a 21 51 30 DNA Artificial Sequence primer 51
ggtctcaagg tacgccggcg gtgaccacct 30 52 30 DNA Artificial Sequence
primer 52 aagcttatta ttcggtcacg gtaaaggttt 30 53 34 DNA Artificial
Sequence primer 53 ggtctcaagg tatgaccatg attacggatt cact 34 54 32
DNA Artificial Sequence primer 54 aagcttatta ttattatttt tgacaccaga
cc 32 55 34 DNA Artificial Sequence primer 55 ggtctcaagg tatgcagatc
ttcgtcaaga cgtt 34 56 30 DNA Artificial Sequence primer 56
aagcttatta ttgtttgcct ccctgctgcg 30 57 25 DNA Artificial Sequence
primer 57 gctcgagagc acagatgctt cgttg 25 58 25 DNA Artificial
Sequence primer 58 gcaaagcttg gagttgattg tatgc 25 59 5 PRT
Artificial Sequence Synthetic Sequence 59 Gly Gly Ala Thr Tyr 1 5
60 18 DNA Artificial Sequence primer 60 ttttggtctc caggttgt 18 61
18 DNA Artificial Sequence primer 61 acaacctgga gaccaaaa 18 62 13
DNA Artificial Sequence primer 62 ggaggttgag acc 13 63 13 DNA
Artificial Sequence primer 63 ggtctcaacc tcc 13 64 294 DNA
Artificial Sequence Synthetic Sequence 64 atgtcggact cagaagtcaa
tcaagaagct aagccagagg tcaagccaga agtcaagcct 60 gagactcaca
tcaatttaaa ggtgtccgat ggatcttcag agatcttctt caagatcaaa 120
aagaccactc ctttaagaag gctgatggaa gcgttcgcta aaagacaggg taaggaaatg
180 gactccttaa gattcttgta cgacggtatt agaattcaag ctgatcaggc
ccctgaagat 240 ttggacatgg aggataacga tattattgag gctcaccgcg
aacagattgg aggt 294 65 98 PRT Artificial Sequence Synthetic
Sequence 65 Met Ser Asp Ser Glu Val Asn Gln Glu Ala Lys Pro Glu Val
Lys Pro 1 5 10 15 Glu Val Lys Pro Glu Thr His Ile Asn Leu Lys Val
Ser Asp Gly Ser 20 25 30 Ser Glu Ile Phe Phe Lys Ile Lys Lys Thr
Thr Pro Leu Arg Arg Leu 35 40 45 Met Glu Ala Phe Ala Lys Arg Gln
Gly Lys Glu Met Asp Ser Leu Arg 50 55 60 Phe Leu Tyr Asp Gly Ile
Arg Ile Gln Ala Asp Gln Ala Pro Glu Asp 65 70 75 80 Leu Asp Met Glu
Asp Asn Asp Ile Ile Glu Ala His Arg Glu Gln Ile 85 90 95 Gly Gly