U.S. patent application number 16/628456 was filed with the patent office on 2022-05-19 for evolution of trna synthetases.
This patent application is currently assigned to President and Fellows of Harvard College. The applicant listed for this patent is President and Fellows of Harvard College. Invention is credited to David Irby Bryson, JR., David R. Liu.
Application Number | 20220154237 16/628456 |
Filed Date | 2022-05-19 |
United States Patent
Application |
20220154237 |
Kind Code |
A1 |
Liu; David R. ; et
al. |
May 19, 2022 |
EVOLUTION OF TRNA SYNTHETASES
Abstract
The disclosure provides amino acid sequence variants of
orthogonal aminoacyl-tRNA synthetases (AARSs) having increased
activity and selectivity compared to previous AARSs, and methods of
producing the same.
Inventors: |
Liu; David R.; (Lexington,
MA) ; Bryson, JR.; David Irby; (Dorchester,
MA) |
|
Applicant: |
Name |
City |
State |
Country |
Type |
President and Fellows of Harvard College |
Cambridge |
MA |
US |
|
|
Assignee: |
President and Fellows of Harvard
College
Cambridge
MA
|
Appl. No.: |
16/628456 |
Filed: |
July 3, 2018 |
PCT Filed: |
July 3, 2018 |
PCT NO: |
PCT/US18/40692 |
371 Date: |
January 3, 2020 |
Related U.S. Patent Documents
|
|
|
|
|
|
Application
Number |
Filing Date |
Patent Number |
|
|
62529320 |
Jul 6, 2017 |
|
|
|
62535090 |
Jul 20, 2017 |
|
|
|
International
Class: |
C12P 21/02 20060101
C12P021/02; C12N 9/00 20060101 C12N009/00 |
Government Interests
FEDERALLY SPONSORED RESEARCH
[0002] This invention was made with government support under grant
numbers N66001-12-C-4207, awarded by the Defense Advanced Research
Projects Agency; EB022376, GM118062, AI119813, GM022854 and
GM106621, awarded by the National Institutes of Health; and
FG02-98ER2031, awarded by the Department of Energy. The government
has certain rights in the invention.
Claims
1. A tRNA synthetase protein variant comprising an amino acid
sequence that is at least 80% identical to SEQ ID NO: 20 or SEQ ID
NO: 21, and includes at least one mutation at a position selected
from V31, T56, H62, and A100.
2. The tRNA synthetase protein variant of claim 1, wherein the at
least one mutation is V31I, T56P, H62Y, A100E, or A100S.
3. The tRNA synthetase protein variant of claim 1 comprising
mutations at V31, T56, H62, and A100.
4. The tRNA synthetase protein variant of claim 3, wherein the
mutations are V31I, T56P, H62Y, and A100E.
5. The tRNA synthetase protein variant of claim 1, wherein the
nucleic acid sequence encoding the amino acid sequence comprises
one or more premature stop codons.
6.-8. (canceled)
9. A chimeric pyrrolysyl-tRNA synthetase (PylRS) protein variant
comprising: (i) a first portion comprising amino acid residues
1-149 of Methanosarcina barkeri PylRS (SEQ ID NO: 20); and (ii) a
second portion comprising amino acid residues 185-454 of
Methanosarcina mazei PylRS (SEQ ID NO: 21), wherein the first
portion or the second portion comprises at least one of the amino
acid substitutions set forth in Tables 2-6.
10. The chimeric PylRS protein variant of claim 9, wherein the
chimeric PylRS protein variant comprises an amino acid substitution
at at least one of the following positions: V31, T56, H62, or
A100.
11. The chimeric PylRS protein variant of claim 10, wherein the
amino acid substitution is V31I, T56P, H62Y, A100E, or any
combination thereof.
12. A tRNA synthetase (TyrRS) protein variant comprising at least
one mutation at a position selected from L69 and V235.
13. The tRNA synthetase protein variant of claim 12, wherein the at
least one mutation is L69F or V235I.
14. The tRNA synthetase protein variant of claim 12 comprising
mutations at L69 and V235.
15. The tRNA synthetase protein variant of claim 14, wherein the
mutations are L69F and V235I.
16. The tyrosyl-tRNA synthetase (TyrRS) protein variant of claim
12, wherein the TyrRS protein comprises an amino acid sequence that
is at least 80% identical to SEQ ID NO: 24.
17. (canceled)
18. An isolated nucleic acid comprising the sequence represented by
any one of SEQ ID NO: 5-19.
19. A protein encoded by the isolated nucleic acid of claim 18.
20. A method for aminoacylation of a tRNA, the method comprising
contacting a tRNA encoding an amber codon with the tRNA synthetase
protein variant of claim 1 in the presence of a non-canonical amino
acid.
21. The method of claim 20, wherein the non-canonical amino acid is
a pyrrolysine or a p-iodo-L-phenylalanine.
22. The method of claim 20 wherein the tRNA is contacted with the
tRNA synthetase inside a cell.
23. A method for incorporating a non-canonical amino acid into a
peptide, the method comprising expressing in a cell: (i) an mRNA
transcript, wherein the transcript comprises an amber codon at a
position in which a non-canonical amino acid (ncAA) is to be
translated; (ii) a tRNA capable of incorporating the ncAA; and
(iii) the tRNA synthetase protein variant of claim 1.
24. The method of claim 23, wherein the non-canonical amino acid is
a pyrrolysine or a p-iodo-L-phenylalanine.
25. (canceled)
Description
RELATED APPLICATIONS
[0001] This application is a national stage filing under 35 U.S.C.
§ 371 of international PCT application, PCT/US2018/040692,
filed Jul. 3, 2018, which claims priority under 35 U.S.C. §
119(e) to U.S. provisional patent applications, U.S. Ser. No.
62/535,090, filed Jul. 20, 2017, entitled "EVOLUTION OF TRNA
SYNTHETASES", and U.S. Ser. No. 62/529,320, filed Jul. 6, 2017,
entitled "EVOLUTION OF TRNA SYNTHETASES", the entire contents of
each of which are incorporated herein by reference.
BACKGROUND OF THE INVENTION
[0003] The directed evolution of orthogonal aminoacyl-tRNA
synthetases (AARSs) enables genetic code expansion through the
site-specific installation of non-canonical amino acids into
proteins. Traditional laboratory evolution techniques typically
produce AARSs with greatly reduced activity (often
~1,000-fold lower) and poor amino acid selectivity compared
to their wild-type counterparts, limiting their utility.
[0004] Although researchers have evolved many AARSs to incorporate
non-canonical amino acids (ncAAs) into proteins, several
outstanding challenges limit their utility and generality.
Laboratory evolution of AARSs with altered amino acid specificity
typically relies on three to five rounds of sequential positive and
negative selections from an AARS library containing either
partially or fully randomized residues in the amino acid-binding
pocket. The limited number of rounds of selection typically
conducted in AARS evolution campaigns reflects the effort required
to complete each round of evolution, which is on the order of one
week or longer. A consequence of conducting relatively few rounds
of selection on libraries that focus mutagenesis on and around the
amino acid-binding pocket is that laboratory-evolved AARSs
routinely emerge with suboptimal properties, including
~1,000-fold reduced activity (k_cat/K_M) compared to
their wild-type counterparts, and modest selectivity for the target
ncAA over endogenous amino acids that can require compensation with
high concentrations of ncAA and expression in minimal media,
lowering protein yields. The modest enzymatic efficiency and
selectivity of many laboratory-evolved AARSs are longstanding
challenges that limit the production and purity of expressed
proteins containing ncAAs.
SUMMARY OF THE INVENTION
[0005] In some aspects, the disclosure relates to evolved AARSs
that increase the utility of orthogonal translation systems and
establish the capability of rapidly and continuously evolving
orthogonal AARSs with high activity and amino acid specificity. The
disclosure is based, in part, on the discovery that positive and
negative phage-assisted continuous evolution (PACE) selections
produce highly active and selective orthogonal AARSs through
hundreds of generations on rapid time scales. For example, as
described in the Examples section, continuous evolution of a
pyrrolysyl-tRNA synthetase (PylRS), in some embodiments, improved
enzymatic efficiency (k_cat/K_M^tRNA) up to 45-fold
compared to the wild-type enzyme.
[0006] In some aspects, the disclosure relates to the discovery
that PACE unexpectedly generated highly active, split-PylRS
variants produced as two mutually dependent polypeptide fragments,
recapitulating natural PylRS homologs. It was observed that
simultaneous positive and negative selection PACE over 48 h greatly
improved the selectivity of a promiscuous tyrosyl-tRNA synthetase
variant for site-specific incorporation of p-iodo-L-phenylalanine,
rejecting p-nitro-L-phenylalanine.
[0007] Accordingly, in some aspects, the disclosure provides
pyrrolysyl-tRNA synthetase (PylRS) protein variants. In some
embodiments, a PylRS protein variant described herein comprises a
nucleic acid sequence or an amino acid sequence that is at least
90% identical to a Methanosarcina PylRS or a fragment thereof
(e.g., the N-terminal domain of a Methanosarcina PylRS or the
C-terminal domain of a Methanosarcina PylRS), for example M. barkeri
PylRS (e.g., SEQ ID NO: 6) or M. mazei PylRS (e.g., SEQ ID NO:
7).
[0008] In some aspects, the disclosure provides tyrosyl-tRNA
synthetase (TyrRS) protein variants capable of incorporating a
p-iodo-L-phenylalanine into a protein. In some embodiments, a TyrRS
protein variant described herein comprises a nucleic acid sequence
or an amino acid sequence that is at least 90% identical to a
Methanocaldococcus jannaschii (M. jannaschii) TyrRS (MjTyrRS), for
example SEQ ID NO: 24.
[0009] In some embodiments, a PylRS comprises an amino acid
sequence that is at least 90% identical to the amino acid sequence
of a wild-type M. barkeri or M. mazei PylRS (e.g., the amino acid
sequence set forth in SEQ ID NO: 20 or 21), or a fragment thereof
(e.g., amino acids 1-149 of SEQ ID NO: 20 or amino acids 185-454 of
SEQ ID NO: 21).
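By way of non-limiting illustration, the "at least 90% identical" language above can be read as a percent-identity calculation over a pairwise alignment. The following is a minimal sketch that assumes the two sequences have already been aligned (gaps written as "-") and that identity is counted over all aligned columns; the function name and the toy sequences are hypothetical and are not taken from the sequence listing, and other identity conventions (e.g., different gap handling) would give different values.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over a pre-computed pairwise alignment (gaps as '-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Drop columns in which both sequences carry a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    # A column counts as a match only if both residues are present and equal.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


# Toy 10-column alignment (NOT a real PylRS sequence): one mismatch, one gap.
toy_reference = "MDKKPLNTLI"
toy_variant = "MDKKPLDTL-"
identity = percent_identity(toy_reference, toy_variant)
print(f"{identity:.1f}% identical")
```

Under this convention a gap opposite a residue counts against identity, which is one of several reasonable choices.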
[0010] In some aspects, the disclosure provides a pyrrolysyl-tRNA
synthetase (PylRS) protein variant having an N-terminal domain
amino acid substitution present in at least one of the following
positions: V31, T56, H62, or A100 (e.g., relative to SEQ ID NO: 20
or 21). In some embodiments, the amino acid substitution is V31I,
T56P, H62Y, or A100E.
[0011] In some aspects, the disclosure provides a chimeric
pyrrolysyl-tRNA synthetase (PylRS) protein variant comprising: a
first portion comprising amino acid residues 1-149 of
Methanosarcina barkeri PylRS (SEQ ID NO: 20); and a second portion
comprising amino acid residues 185-454 of Methanosarcina mazei
PylRS (SEQ ID NO: 21), wherein the first portion or the second
portion comprises at least one of the amino acid substitutions set
forth in Tables 2-6.
[0012] In some embodiments, the chimeric protein variant comprises
an amino acid substitution in at least one of the following
positions: V31, T56, H62, or A100. In some embodiments, the amino
acid substitution is V31I, T56P, H62Y, A100E, or any combination
thereof.
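By way of non-limiting illustration, the chimera described above (residues 1-149 of one sequence joined to residues 185-454 of another) and the substitution notation (e.g., V31I: valine at position 31 replaced by isoleucine) can be sketched as string operations. The sketch below uses placeholder sequences, not SEQ ID NO: 20 or 21, and the helper names are hypothetical; it assumes 1-based, inclusive residue numbering and that substitution positions are numbered from the start of the first portion.

```python
import re


def subseq(seq: str, start: int, end: int) -> str:
    """Return residues start..end using 1-based, inclusive numbering."""
    if not (1 <= start <= end <= len(seq)):
        raise ValueError("residue range falls outside the sequence")
    return seq[start - 1:end]


def apply_substitutions(seq: str, substitutions: list[str]) -> str:
    """Apply substitutions written as <wild-type><position><new>, e.g. 'V31I'."""
    residues = list(seq)
    for sub in substitutions:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", sub)
        if match is None:
            raise ValueError(f"unrecognized substitution: {sub}")
        wild_type, pos, new = match.group(1), int(match.group(2)), match.group(3)
        if not (1 <= pos <= len(residues)) or residues[pos - 1] != wild_type:
            raise ValueError(f"{sub}: wild-type residue not found at position {pos}")
        residues[pos - 1] = new
    return "".join(residues)


# Placeholder sequences (hypothetical): glycine/serine filler with the four
# wild-type residues placed where the substitutions expect them.
toy_mb = list("G" * 149)
for position, residue in {31: "V", 56: "T", 62: "H", 100: "A"}.items():
    toy_mb[position - 1] = residue
toy_mb = "".join(toy_mb)
toy_mm = "S" * 454

chimera = subseq(toy_mb, 1, 149) + subseq(toy_mm, 185, 454)
variant = apply_substitutions(chimera, ["V31I", "T56P", "H62Y", "A100E"])
print(len(chimera), variant[30], variant[55], variant[61], variant[99])
```

Checking the wild-type residue before substituting guards against numbering errors when a substitution list is transplanted between homologs.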
[0013] In some aspects, the disclosure provides a tyrosyl-tRNA
synthetase (TyrRS) protein variant having an amino acid
substitution present in at least one of the following positions:
L69 or V235. In some embodiments, the amino acid substitution is
L69F, V235I, or L69F and V235I.
[0014] In some aspects, the disclosure provides an isolated nucleic
acid comprising a sequence represented by any one of SEQ ID NO:
5-19. In some aspects, the disclosure provides a protein encoded by
an isolated nucleic acid comprising a sequence represented by any
one of SEQ ID NO: 5-19.
[0015] In some aspects, the disclosure relates to a host cell
comprising a tRNA synthetase protein variant as described by the
disclosure. In some aspects, the disclosure relates to an isolated
nucleic acid as described by the disclosure. In some embodiments,
the tRNA synthetase protein variant is orthogonal to the host cell
(e.g., not expressed naturally in the host cell). In some
embodiments, a host cell is a bacterial cell. In some embodiments,
a bacterial cell is an E. coli cell.
[0016] In some aspects, the disclosure relates to a selection
system comprising: a first container housing a selection phagemid
as described by the disclosure; a second container housing a
positive selection system as described by the disclosure; and,
optionally, a third container housing a negative selection system as
described by the disclosure.
[0017] In some embodiments, a selection system further comprises a
container housing one or more bacterial cells. In some embodiments,
the bacterial cells are E. coli cells.
[0018] In some aspects, the disclosure relates to methods of using
tRNA synthetase protein variants described by the disclosure. In
some embodiments, the disclosure relates to methods for
aminoacylation of a tRNA, the methods comprising, contacting a tRNA
encoding an amber codon with a tRNA synthetase protein variant as
described by the disclosure in the presence of a non-canonical
amino acid.
[0019] In some embodiments, a non-canonical amino acid is a
pyrrolysine or a p-iodo-L-phenylalanine. In some embodiments, the
tRNA is contacted with the tRNA synthetase inside a cell.
[0020] In some aspects, the disclosure relates to methods for
incorporating a non-canonical amino acid into a peptide, the method
comprising expressing in a cell: (i) an mRNA transcript, wherein
the transcript comprises an amber codon at a position in which a
non-canonical amino acid (ncAA) is to be translated; (ii) a tRNA
capable of incorporating the ncAA; (iii) a tRNA synthetase protein
variant as described herein, wherein (i), (ii), and (iii) are
expressed in the presence of the ncAA.
[0021] In some embodiments, the non-canonical amino acid is a
pyrrolysine or a p-iodo-L-phenylalanine. In some embodiments, the
tRNA synthetase is orthogonal to the cell. In some embodiments, the
cell is an E. coli cell.
[0022] The summary above is meant to illustrate, in a non-limiting
manner, some of the embodiments, advantages, features, and uses of
the technology disclosed herein. Other embodiments, advantages,
features, and uses of the technology disclosed herein will be
apparent from the Detailed Description, the Drawings, the Examples,
and the Claims.
BRIEF DESCRIPTION OF DRAWINGS
[0023] FIG. 1 shows an overview of orthogonal translation in
biological systems. The orthogonal amber suppressor tRNA is not
recognized by any of the cell's endogenous AARS enzymes, but is
selectively aminoacylated by the orthogonal AARS with the desired
ncAA. The charged amber suppressor tRNA decodes `UAG` stop codons
during translation of the protein of interest, enabling
site-specific incorporation of the ncAA into proteins made by the
cell.
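By way of non-limiting illustration, the amber suppression described above can be sketched as a translation loop in which a UAG codon is read as the ncAA when a charged suppressor tRNA is available and as a stop otherwise. The codon table below is deliberately partial, and the transcript, the function name, and the "X" symbol for the ncAA are hypothetical; competition with release factors and incomplete suppression are ignored.

```python
# Partial codon table covering only the codons used in the toy transcript.
CODON_TABLE = {"AUG": "M", "GCU": "A", "AAA": "K", "GGU": "G"}
STOP_CODONS = {"UAA", "UAG", "UGA"}


def translate(mrna: str, suppressor_charged: bool, ncaa_symbol: str = "X") -> str:
    """Translate an mRNA, decoding UAG as the ncAA when the suppressor is charged."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon == "UAG" and suppressor_charged:
            peptide.append(ncaa_symbol)  # amber codon decoded by the suppressor tRNA
            continue
        if codon in STOP_CODONS:
            break  # translation terminates
        peptide.append(CODON_TABLE[codon])
    return "".join(peptide)


# Hypothetical transcript with an in-frame amber codon at the third position.
toy_mrna = "AUGGCUUAGAAAGGUUAA"
full_length = translate(toy_mrna, suppressor_charged=True)
truncated = translate(toy_mrna, suppressor_charged=False)
print(full_length, truncated)
```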
[0024] FIGS. 2A-2C show an overview of PACE positive selections for
the continuous evolution of AARS activity. FIG. 2A shows strategies
for linking AARS activity to the expression of gene III, which
encodes the pIII protein required for phage to be infectious. In
strategy 1, AARS-catalyzed aminoacylation of an amber suppressor
tRNA enables translation of full-length T7 RNAP from a transcript
containing a premature amber stop codon. T7 RNAP subsequently
drives expression of gene III from the T7 promoter (P_T7). In
strategy 2, amber suppressor tRNA aminoacylation permits
full-length translation of pIII from gene III mRNA containing a
premature stop codon. FIG. 2B shows host-cell plasmids used to
implement both selection strategies. The accessory plasmid (AP)
encodes gene III and the amber suppressor tRNA. The complementary
plasmid (CP) encodes T7 RNAP controlled by the phage-shock promoter
(P_psp), which is induced only upon phage infection. The
mutagenesis plasmid (MP) increases the rate of evolution during
PACE through arabinose-induced production of mutagenic proteins.
The selection phage (SP) encodes all phage genes except gene III,
which is replaced by the evolving AARS gene. FIG. 2C shows a
diagram of PACE with selection strategy 1 plasmids shown. SPs
capable of catalyzing aminoacylation of the amber suppressor tRNA
result in production of pIII protein from gene III of the AP in
host E. coli. Under continuous dilution in the fixed-volume vessel
(the "lagoon"), phage that are capable of triggering the production
of pIII propagate faster than the rate of dilution, resulting in
the continuous enrichment of SPs encoding active AARS variants.
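By way of non-limiting illustration, the competition between phage propagation and lagoon dilution described above can be sketched with a simple exponential model, N(t) = N0 * exp((r - d) * t), where r is a propagation rate and d is the dilution rate. This model and every numerical value below are assumptions made for illustration only; they are not measurements or parameters from the disclosure, and the function name is hypothetical.

```python
import math


def phage_titer(initial_titer: float, growth_rate_per_h: float,
                dilution_rate_per_h: float, hours: float) -> float:
    """Titer under a simple exponential model: N(t) = N0 * exp((r - d) * t)."""
    net_rate = growth_rate_per_h - dilution_rate_per_h
    return initial_titer * math.exp(net_rate * hours)


# Hypothetical values chosen only to show the two regimes.
N0 = 1e5          # starting titer (arbitrary units)
DILUTION = 1.0    # lagoon volumes exchanged per hour

active = phage_titer(N0, growth_rate_per_h=1.5, dilution_rate_per_h=DILUTION, hours=16)
inactive = phage_titer(N0, growth_rate_per_h=0.0, dilution_rate_per_h=DILUTION, hours=16)
print(f"SP that triggers pIII production after 16 h: {active:.3g}")
print(f"SP that cannot trigger pIII production after 16 h: {inactive:.3g}")
```

In this simplified picture a phage persists only when its propagation rate exceeds the dilution rate; a real lagoon also depends on host-cell density and infection kinetics, which the sketch omits.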
[0025] FIG. 3 shows the non-canonical amino acids used in this study: (1)
p-nitro-L-phenylalanine, (2) Nε-(tert-butoxycarbonyl)-L-lysine, (3)
L-pyrrolysine, (4) Nε-acetyl-L-lysine, (5) 3-iodo-L-phenylalanine,
and (6) p-iodo-L-phenylalanine.
[0026] FIGS. 4A-4C show optimization of the T7 RNAP-mediated PACE
positive selection for aminoacylation. Two amber stop codons in T7
RNAP are required to make reporter expression completely dependent
on orthogonal translation of full-length T7 RNAP. FIG. 4A shows a
luciferase reporter assay for optimizing the position and number of
TAG stop codons in T7 RNAP. The orthogonal AARS (inducible with
IPTG) charges the ncAA onto the amber suppressor tRNA, enabling
translation of full-length T7 RNAP (inducible with
anhydrotetracycline, ATc). Production of T7 RNAP results in
subsequent expression of the luciferase reporter gene, luxAB. FIG.
4B shows that using p-NFRS to site-specifically incorporate p-NF at two
positions (Ser12TAG+Ser203TAG) in T7 RNAP provided optimal reporter
signal that was dependent on orthogonal AARS activity (+IPTG,
+ncAA) and on expression of T7 RNAP (+ATc). FIG. 4C shows that, using
chPylRS, reporter signal resulting from site-specific incorporation
of BocK into T7 RNAP(Ser12TAG+Ser203TAG) suggests broad
ncAA tolerance at both sites of ncAA installation. Each value and
error bar in FIGS. 4B and 4C reflects the mean and s.d. of at least three
independent biological replicates.
[0027] FIGS. 5A-5D show non-continuous propagation of SP in
positive selections designed for PACE. To confirm activity
dependence of phage propagation for each of the two positive
selections (suppression of stop codons in T7 RNAP or in gene III),
phage titers resulting from 16 h of propagation in batch culture
were compared for SP expressing p-NFRS (FIG. 5A, FIG. 5C) or
chPylRS (FIG. 5B, FIG. 5D). In each experiment, equal amounts of SP
encoding the AARS of interest were used to infect cultures of S1030
host cells harboring the required PACE AP and CP plasmids in the
presence or absence of the ncAA. Controls representing the starting
titers for each set of experiments were prepared by diluting the
same amount of SP into media lacking cells. Results indicate that
selection stringency increases as the number of stop codons is
increased in T7 RNAP (FIGS. 5A-5B) or gIII (FIGS. 5C-5D).
[0028] FIGS. 6A-6B show positive selections for aminoacylation
support activity-dependent, continuous propagation in PACE. FIG. 6A
shows that, in lagoon 1 (L1) supplemented with 1 mM p-NF, SP-p-NFRS
propagates for 48 h of PACE using the selection based on amber
suppression of two stop codons in T7 RNAP. SP-Kan, which lacks AARS
activity, however, rapidly washed out of lagoon 2 (L2) by the first
time point (16 h) under identical conditions. FIG. 6B shows that phage
were propagated for 30 h of PACE in the presence of 1 mM p-NF
starting from a 1:1 mixture of SP-p-NFRS and SP-MBP-TEV using the
selection based on amber suppression of a single stop codon in gene
III. Activity-dependent phage titers and PCR analysis of phage
taken from each time point sampled during PACE confirmed that
SP-p-NFRS propagated exclusively while SP-MBP-TEV rapidly washed
out.
[0029] FIGS. 7A-7B show evolution of AARS activity during mock
PACE. FIG. 7A shows p-NFRS was challenged to aminoacylate the amber
suppressor tRNA in the absence of its cognate ncAA substrate, p-NF,
over 48 h of positive selection PACE conducted in two separate
lagoons (L1 and L2). Enhanced mutagenesis from the MP was supplied
in L2 only. Phage titers of L2 (green) rapidly increased after 16
h, while titers in L1 (magenta) were relatively stable throughout
the evolution. FIG. 7B shows mutations in PACE-evolved clones and
the relative amino acid substrate specificities of clones from L2.
Relative aminoacylation activity was compared in the PACE host
strain, S1030, by measuring luminescence signal resulting from
amber suppression of a premature stop codon at position 361 of a
luciferase gene (luxAB). More coding mutations were obtained in
phage isolates from L2, in which the MP provided enhanced
mutagenesis, and every characterized L2 mutant emerged from PACE
with increased activity on endogenous amino acids (no ncAA)
compared to the progenitor enzyme, p-NFRS. Each value and error bar
in FIG. 7B reflects the mean and s.d. of at least three independent
biological replicates.
[0030] FIGS. 8A-8E show continuous evolution and characterization
of chimeric pyrrolysyl-tRNA synthetase (chPylRS) variants with
enhanced aminoacylation activity. FIG. 8A shows PACE was performed
in three segments designed to gradually increase selection
stringency. The first two segments (Pyl-1 and Pyl-2) used the
selection requiring amber suppression of two stop codons in T7
RNAP, and the final segment (Pyl-3) used the selection requiring
direct amber suppression of stop codons in gene III. The number of
stop codons in the gene required for each selection and the
concentration of BocK substrate are shown above the phage titer
graph. Dotted lines (black) indicate transfer of evolved phage from
the end of each PACE segment into the subsequent segment. Triangles
indicate convergence toward the specified mutations. FIGS. 8B-8C
show the relative expression of luciferase containing BocK at
position 361 resulting from aminoacylation by progenitor enzyme,
chPylRS, compared to evolved variants from the end of PACE segment
Pyl-1 (FIG. 8B) or compared to variants containing only the
consensus mutations from the end of each PACE segment (FIG. 8C).
Labels correspond to PACE segments in FIG. 8A. FIG. 8D shows the
relative efficiency of multisite, BocK incorporation into sfGFP
resulting from aminoacylation by chPylRS variants with or without
beneficial mutations discovered in PACE (V31I, T56P, H62Y, and
A100E; IPYE). FIG. 8E shows the relative efficiency of AcK
incorporation at position 2 of sfGFP resulting from aminoacylation
by AcK3RS variants with or without transplanted mutations from
PACE. Each value and error bar in FIGS. 8B-8E reflects the mean and s.d. of
at least three independent biological replicates.
[0031] FIGS. 9A-9F show mutations emerging from PACE enhance the
activity of PylRS variants on their target ncAA. FIG. 9A shows
contributions toward improved activity from consensus mutations in
chPylRS generated during PACE segments Pyl-1 and Pyl-2. FIGS. 9B-9D
show that transplantation of the activity-enhancing PACE mutations V31I,
T56P, H62Y, and A100E (IPYE) into M. barkeri (Mb) or M. mazei (Mm)
PylRS greatly improved the expression levels of luciferase
containing the ncAA BocK at position 361 (FIG. 9B) and the
expression levels of sfGFP containing a BocK at position 2 (FIG.
9C) or position 151 (FIG. 9D). FIGS. 9E-9F show transplantation of
the `IPYE` mutations into multiple variants of AcK3RS (FIG. 9E) or
into the chimeric IFRS (FIG. 9F) improved expression of luciferase
containing the ncAA residue at position 361. Each value and error
bar in FIGS. 9B-9E reflects the mean and s.d. of at least three independent
biological replicates.
[0032] FIGS. 10A-10D show ESI-MS analysis of purified sfGFP
containing up to three BocK residues produced by chPylRS(IPYE).
Analysis of purified wild type sfGFP (FIG. 10A) or sfGFP containing
one (FIG. 10B), two (FIG. 10C) or three (FIG. 10D) BocK residues
produced by chPylRS(IPYE) in the presence of 1 mM ncAA. BocK
substitutions in sfGFP were made in response to premature amber
stop codons at positions 39 (1×TAG), 39 and 151
(2×TAG), or 39, 135, and 151 (3×TAG). Protein was
expressed in TOP10 cells in LB media. The major peak in each of the
spectra was in agreement with the calculated mass of BocK
incorporation. In each of the spectra containing BocK, a minor peak
corresponding to an unclipped N-terminal methionine was also
observed (calculated mass+131.19 Da).
[0033] FIG. 11 shows ESI-MS analysis of purified sfGFP containing
an AcK residue at position 2 produced by chAcK3RS(IPYE) in the
presence of 1 mM AcK. Protein was expressed in TOP10 cells in LB
media, and the major peak found at 27,812.58 Da was in agreement
with the calculated value (27,812.3 Da).
[0034] FIGS. 12A-12C show characterization of split variants of
chPylRS emerging from PACE. Evolved split variants of chPylRS
require the `IPYE` tetramutation to retain high activity.
Aminoacylation is dependent on both the N- and C-terminal fragments
of the chPylRS variants shown. FIGS. 12A-12B show the relative
expression of sfGFP containing three premature stop codons at
positions 39, 135, and 151 (sfGFP(3×TAG)), compared in the
presence or absence of 1 mM BocK for the six split proteins
containing the `IPYE` tetramutation (FIG. 12A) or for variants
lacking the tetramutation (FIG. 12B). FIG. 12C shows the relative
expression of sfGFP(Asn39TAG) in the presence of the unsplit
chPylRS(IPYE) compared to expression in the presence of the
N-terminal fragments of split2 (NTerm.S2), split3 (NTerm.S3), or
split6 (NTerm.S6) or the C-terminal fragment (CTerm) that would
result from reinitiation at Met-107. Each value and error bar
reflects the mean and s.d. of four independent biological
replicates.
[0035] FIG. 13 shows characterization of the S326I mutation
emerging from lagoon 2 during the Pyl-3 segment of PACE. The
relative activities of split1, split2, and split3 containing the
additional mutation, S326I, were compared to those of variants lacking the
mutation and to that of the full-length chPylRS(IPYE). Each variant was
used to produce sfGFP(3×TAG) containing three premature stop
codons at positions 39, 135, and 151. Each value and error bar
reflects the mean and s.d. of four independent biological replicates.
[0036] FIGS. 14A-14B show Western blot analysis of full-length and
split chPylRS variants from PACE. FIG. 14A shows that the chPylRS
variants were N-terminally tagged with c-Myc and C-terminally
tagged with 6×His to enable two-color detection of the
expressed proteins in order to characterize translation of stop
codon-containing mutants that arose during PACE.
[0037] FIG. 14B shows that Western blot analysis of the protein lysates
expressed in BL21 star DE3 cells indicated that the full-length
variants chPylRS and PylRS(IPYE) were expressed with the N- and
C-termini intact, but the presence of an internal start site also
promotes alternative expression of the truncated, C-terminal
fragment. Each of the split variants (split2, split3, and split6)
is expressed as two distinct N- and C-terminal fragments,
indicating termination of translation at the premature stop codon
and reinitiation at an internal start site.
[0038] FIGS. 15A-15D show ESI-MS analysis of affinity-tagged
Ni-NTA-purified chPylRS variants from PACE. The evolved
synthetases, chPylRS(IPYE) (FIG. 15A), Split2 (FIG. 15B), Split3
(FIG. 15C), and Split6 (FIG. 15D) were labeled with an N-terminal
c-Myc-tag and a C-terminal 6×His-tag and purified over Ni-NTA
resin prior to ESI-MS analysis. In the split variants of chPylRS,
the N-terminal fragment is lost upon affinity purification. Protein
was expressed in BL21 star DE3 cells in LB media. The major peaks
in each spectrum were in agreement with the calculated mass of the
full-length enzyme, chPylRS(IPYE) (FIG. 15A), or the C-terminal
fragment resulting from reinitiation at position Met-107 (FIGS.
15A-15D).
[0039] FIG. 16 shows alignment of PylRS sequences from multiple
organisms and from PACE variants. Activity enhancing mutations from
PACE and premature stop codons (*) that emerged in each of the
split variants are shown. Note that the activity-enhancing A100E
mutation became A100S in Split1 and Split2 due to the frameshift.
Split3, Split4, and Split6 each lack the A100E mutation because
they terminate earlier in the sequence. Arrows denote the PylSn and
PylSc gene products of the D. hafniense strains.
[0040] FIGS. 17A-17E show evolution of AARS variants from dual
positive- and negative-selection PACE with greatly improved amino
acid specificity. FIG. 17A shows the strategy for linking undesired
aminoacylation to gene III-neg expression, which encodes the
pIII-neg protein. When undesired aminoacylation occurs in the
negative selection, pIII-neg is produced, impeding progeny phage
infectivity. In the absence of undesired aminoacylation, only pIII
is produced, resulting in infectious phage progeny.
Negative-selection stringency is modulated by ATc concentration.
FIG. 17B shows host-cell plasmids used to implement the negative
selection.
[0041] FIG. 17C shows a diagram of dual-selection PACE using
simultaneous positive and negative selections. Evolving phage are
continuously cross-seeded between positive and negative selection
lagoons at a 50-fold dilution. FIG. 17D shows the relative
site-specific incorporation efficiency of either endogenous amino
acids (no ncAA), p-NF, or p-IF at position 39 of sfGFP resulting
from aminoacylation by p-NFRS, p-IFRS, or evolved variants from
PACE (Iodo.1, Iodo.5, Iodo.7, and Iodo.8). FIG. 17E shows the
predicted position of mutations evolved during dual-selection PACE.
The crystal structure shown is the p-NFRS protein sequence aligned
to PDB entry 2AG6 (ref. 36), which is the crystal structure of an AARS that has
a protein sequence identical to that of p-IFRS and is bound to the ncAA
substrate, p-bromo-L-phenylalanine. The shaded spheres in the
crystal structure correspond to the mutations in the table to the
left. Active-site residues within a 5 Å radius around the ncAA
substrate are colored gray. Each value and error bar in FIG. 17D reflects
the mean and s.d. of at least three independent biological
replicates.
[0042] FIGS. 18A-18B show an overview of the PACE negative
selection for AARS activity using the dominant-negative variant of
pIII (pIII-neg). FIG. 18A shows a diagram of PACE negative
selection plasmids. PACE host cells (S1030) are cotransformed with
the negative-selection accessory plasmid (AP-) and a
negative-selection complementary plasmid (CP-). When an SP infects
the negative selection host, production of pIII protein from gene
III is induced from the phage shock promoter (Ppsp) of the AP-. If
the AARS encoded by the SP can catalyze aminoacylation under the
conditions of the negative selection (e.g., in the absence of
ncAA), full-length T7 RNAP is produced from the AP- through amber
suppression of amber stop codons at positions 12 and 203 of the T7
RNAP gene. When full-length T7 RNAP is produced, expression of gene
III-neg is induced from the T7 promoter (PT7) of the CP-, resulting
in production of the dominant-negative pIII-neg protein. The
infectivity of progeny phage decreases with the amount of pIII-neg
in the host cell. Expression levels of the T7 RNAP gene on the AP-
are also controlled by an ATc-inducible promoter (Ptet), allowing
the negative selection to be turned on or off during PACE. FIG. 18B
shows a diagram of inputs and outputs of the AND logic gate created
by the PACE negative selection. The dominant-negative pIII-neg
protein is produced only in the presence of both aminoacylation
activity and ATc. In the absence of either negative-selection
input, progeny phage are infectious and carry forward the encoded
AARS into the subsequent round of evolution in PACE.
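By way of non-limiting illustration, the AND logic gate described above can be written as a truth table: pIII-neg is produced only when both aminoacylation activity and ATc are present, and progeny phage remain infectious otherwise. The sketch below treats both inputs as binary, which is a simplification (the text notes that infectivity decreases with the amount of pIII-neg); the function names are hypothetical.

```python
from itertools import product


def piii_neg_produced(aminoacylation_active: bool, atc_present: bool) -> bool:
    """pIII-neg requires both undesired aminoacylation and ATc induction."""
    return aminoacylation_active and atc_present


def progeny_infectious(aminoacylation_active: bool, atc_present: bool) -> bool:
    """Progeny phage stay infectious whenever pIII-neg is not produced."""
    return not piii_neg_produced(aminoacylation_active, atc_present)


# Enumerate all four input combinations of the gate.
truth_table = [(a, t, progeny_infectious(a, t))
               for a, t in product([False, True], repeat=2)]
for aminoacylation, atc, infectious in truth_table:
    print(f"aminoacylation={aminoacylation!s:5}  ATc={atc!s:5}  infectious={infectious}")
```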
[0043] FIGS. 19A-19C show validation of the PACE negative
selection. FIG. 19A shows mock PACE experiments were performed in
parallel to demonstrate that the negative selection is dependent on
both aminoacylation activity and the concentration of ATc. In
lagoon 1 (L1), SP-p-NFRS was propagated in the absence of substrate
amino acid (-p-NF) to determine the maximum concentration of ATc
that could be tolerated without decreasing the rate of phage
propagation when aminoacylation does not occur. In lagoon 2 (L2),
SP-p-NFRS and SP-MBP-TEV were both propagated in the presence of
the p-NFRS substrate (+p-NF) to determine the minimum concentration
of ATc that would support negative selection when aminoacylation
does occur. FIG. 19B shows activity-dependent titers were measured
to detect the relative amount of active SP-p-NFRS present in the
lagoons at each sampled time point of PACE. In L1, the maximum
concentration of ATc (broken gray line) that did not affect phage
propagation was 30 ng/mL. In L2 (magenta line), the minimum
concentration of ATc that induced negative selection against
aminoacylation was 10 ng/mL. FIG. 19C shows PCR analysis of phage
from each sampled time point of L2 confirms that the inactive
SP-MBP-TEV was selectively enriched from a 1000:1 excess of
SP-p-NFRS at time points that correspond to ATc concentrations
between 10 and 30 ng/mL (16-40 h of PACE).
[0044] FIGS. 20A-20D show the previously evolved AARS, p-NFRS,
accepts multiple amino acid substrates. ESI-MS analysis of purified
wild type sfGFP (FIG. 20A) or sfGFP(Asn39TAG) expressed with p-NFRS
in the presence of 1 mM p-NF (FIG. 20B), no ncAA (FIG. 20C), or 1
mM p-IF (FIG. 20D) demonstrates that p-NFRS accepts Phe, p-NF, and
p-IF. Protein was expressed in BL21 star DE3 cells in LB media.
FIG. 20B shows a peak corresponding to incorporation of p-NF into
sfGFP was observed at 27,918.09 Da (calculated: 27,918.31 Da).
FIGS. 20B-20C show peaks corresponding to incorporation of Phe were
found at 27,873.01 Da and 27,873.09 Da, respectively (calculated:
27,873.32 Da), from expression in the presence or absence of 1 mM
p-NF. FIG. 20D shows a peak corresponding to incorporation of
p-IF into sfGFP was found at 27,999.04 Da (calculated: 27,999.22
Da). Minor peaks in each spectrum correspond to an unclipped
N-terminal methionine (calculated mass+131.19 Da).
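By way of non-limiting illustration, the agreement between the observed and calculated masses quoted above can be checked with simple arithmetic. The Python sketch below uses only the mass values stated in this paragraph; the function names and the choice to report errors in both Da and ppm are assumptions made for illustration.

```python
MET_MASS_SHIFT_DA = 131.19  # unclipped N-terminal methionine, as stated in the text

# (species and condition, observed mass in Da, calculated mass in Da)
PEAKS = [
    ("p-NF, expressed with 1 mM p-NF", 27918.09, 27918.31),
    ("Phe, expressed with 1 mM p-NF", 27873.01, 27873.32),
    ("Phe, expressed without ncAA", 27873.09, 27873.32),
    ("p-IF, expressed with 1 mM p-IF", 27999.04, 27999.22),
]


def mass_error_da(observed: float, calculated: float) -> float:
    """Signed difference between observed and calculated mass, in Da."""
    return observed - calculated


def mass_error_ppm(observed: float, calculated: float) -> float:
    """Signed mass error relative to the calculated mass, in parts per million."""
    return (observed - calculated) / calculated * 1e6


def met_adduct_mass(calculated: float) -> float:
    """Expected mass of the minor peak retaining the N-terminal methionine."""
    return calculated + MET_MASS_SHIFT_DA


if __name__ == "__main__":
    for label, obs, calc in PEAKS:
        print(f"{label}: {mass_error_da(obs, calc):+.2f} Da, "
              f"{mass_error_ppm(obs, calc):+.1f} ppm, "
              f"Met-retained peak expected near {met_adduct_mass(calc):.2f} Da")
```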
[0045] FIGS. 21A-21B show dual-selection PACE of the polyspecific
MjTyrRS variant, p-NFRS, to evolve selective activity on p-IF. FIG.
21A shows a diagram of chemostats and lagoons during dual-selection
PACE. DRM media supplemented with 4 mM p-NF was pumped into the
negative selection lagoon and DRM media supplemented with 1 mM p-IF
was pumped into the positive selection lagoon. Host cell cultures
from each chemostat were pumped through the corresponding lagoons
that were supplemented with required inducers (ATc and arabinose).
The opposing lagoons were coupled such that material was
continuously exchanged (`cross-seeded`) between each lagoon at a
50-fold slower flow rate (gray arrows) with respect to the flow
rate from the chemostats through each lagoon (black arrows). FIG.
21B shows a plot of phage titers measured from samples taken at the
indicated time points from each lagoon during PACE. Positive
selection was conducted exclusively for the first 24 h of the
experiment, and dual-selection began at the 24-h time point by
cross-seeding phage between the opposing lagoons. The flow rate
from the chemostats through the lagoons (broken gray line) was
doubled after the two lagoons were coupled, and the flow rate of
cross-seeded material was adjusted to maintain 50-fold dilution
into the opposing selections.
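By way of non-limiting illustration, the relationship between the lagoon flow rate and the cross-seeding flow rate can be expressed as a fixed ratio. In the Python sketch below, the 50-fold factor comes from the description above, while the function name and the example flow rates (in arbitrary units) are hypothetical, since no absolute flow rates are given here.

```python
CROSS_SEED_DILUTION = 50.0  # cross-seeding runs 50-fold slower than lagoon flow


def cross_seed_flow(lagoon_flow: float,
                    dilution: float = CROSS_SEED_DILUTION) -> float:
    """Return the cross-seeding flow rate that keeps the stated fold dilution."""
    if lagoon_flow <= 0 or dilution <= 0:
        raise ValueError("flow rate and dilution must be positive")
    return lagoon_flow / dilution


if __name__ == "__main__":
    initial = 1.0            # hypothetical lagoon flow rate, arbitrary units
    doubled = 2.0 * initial  # flow rate after the lagoons are coupled
    print(cross_seed_flow(initial), cross_seed_flow(doubled))
```

Doubling the lagoon flow rate doubles the cross-seeding flow rate in this model, so the 50-fold dilution into the opposing selection is preserved.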
[0046] FIGS. 22A-22B show non-continuous counterselections to
isolate p-IF-selective evolved AARS variants after dual-selection
PACE. FIG. 22A shows two counterselections were performed in
parallel without enhanced mutagenesis (no MP) on the evolved pool
of SP sampled from the negative-selection lagoon at the end of
dual-selection PACE. Negative selections were performed in batch
culture to non-continuously propagate phage lacking unwanted AARS
activity on canonical amino acids and p-NF. The stringent negative
selection (left) was performed in host cells containing an
AP-:CP- pair in which the ATc-inducible promoter driving expression
of T7 RNAP(Ser12TAG, Ser203TAG) on AP- (FIG. 18) was replaced with
the strong, PproD constitutive-promoter1. A less stringent negative
selection was performed (right) using an AP-:CP- pair in which the
weaker PproA constitutive-promoter1 was upstream of T7
RNAP(Ser12TAG, Ser203TAG). SPs that propagated overnight in the
non-continuous negative selection were isolated and used to infect
positive-selection host cells to conduct activity-dependent plaque
assays in the presence of p-NF or p-IF. Plaques that formed in the
presence of the desired amino acid, p-IF, were isolated and
subjected to DNA sequencing. FIG. 22B shows data from parallel
counterselections. The enrichment factor reports the number of
activity-dependent plaques that formed in 1 mM p-IF divided by the
number of plaques that formed in 1 mM p-NF. Mutants marked "RBS
mutation" indicate that the ribosome-binding site (RBS) driving
translation of the AARS was mutated; these clones were not further
characterized.
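By way of non-limiting illustration, the enrichment factor defined above is a ratio of plaque counts. The Python sketch below assumes a hypothetical function name and hypothetical plaque counts; the handling of a zero count in p-NF is likewise an assumption, since the description does not address that case.

```python
def enrichment_factor(plaques_p_if: int, plaques_p_nf: int) -> float:
    """Plaques formed in 1 mM p-IF divided by plaques formed in 1 mM p-NF."""
    if plaques_p_if < 0 or plaques_p_nf < 0:
        raise ValueError("plaque counts cannot be negative")
    if plaques_p_nf == 0:
        # Assumption: report an unbounded enrichment when no p-NF plaques form.
        return float("inf") if plaques_p_if > 0 else float("nan")
    return plaques_p_if / plaques_p_nf


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(enrichment_factor(120, 4))
```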
[0047] FIGS. 23A-23B show the PACE-evolved Iodo.5 variant is highly
selective for the desired substrate, p-IF. ESI-MS analysis of
purified sfGFP from expression of sfGFP(Asn39TAG) with p-IFRS (FIG.
23A) or Iodo.5 (FIG. 23B) in LB media supplemented with both 1 mM
p-NF and 1 mM p-IF demonstrates that each AARS enzyme selectively
incorporates p-IF. In FIGS. 23A and 23B, a peak corresponding to incorporation of
p-IF into sfGFP was found at 27,999.52 Da and 27,999.45 Da,
respectively (calculated: 27,999.22 Da). Incorporation of p-NF into
sfGFP was calculated to have a mass of 27,918.31 Da (dashed
line).
DEFINITIONS
[0048] The term "phage-assisted continuous evolution (PACE)," as
used herein, refers to continuous evolution that employs phage as
viral vectors. The general concept of PACE technology has been
described, for example, in International PCT Application,
PCT/US2009/056194, filed Sep. 8, 2009, published as WO 2010/028347
on Mar. 11, 2010; International PCT Application, PCT/US2011/066747,
filed Dec. 22, 2011, published as WO 2012/088381 on Jun. 28, 2012;
U.S. Pat. No. 9,023,594, issued May 5, 2015;
International PCT Application, PCT/US2015/012022, filed Jan. 20,
2015, published as WO 2015/134121 on Sep. 11, 2015; and
International PCT Application, PCT/US2016/027795, filed Apr. 15,
2016, published as WO 2016/168631 on Oct. 20, 2016, the entire
contents of each of which are incorporated herein by reference.
[0049] The term "continuous evolution," as used herein, refers to
an evolution process, in which a population of nucleic acids
encoding a gene to be evolved is subjected to multiple rounds of
(a) replication, (b) mutation, and (c) selection to produce a
desired evolved version of the gene to be evolved that is different
from the original version of the gene, for example, in that a gene
product, such as, e.g., an RNA or protein encoded by the gene,
exhibits a new activity not present in the original version of the
gene product, or in that an activity of a gene product encoded by
the original gene to be evolved is modulated (increased or
decreased). The multiple rounds can be performed without
investigator intervention, and the steps (a)-(c) can be carried out
simultaneously. Typically, the evolution procedure is carried out
in vitro, for example, using cells in culture as host cells. In
general, a continuous evolution process provided herein relies on a
system in which a gene encoding a gene product of interest is
provided in a nucleic acid vector that undergoes a life-cycle
including replication in a host cell and transfer to another host
cell, wherein a critical component of the life-cycle is deactivated
and reactivation of the component is dependent upon an activity of
the gene to be evolved that is a result of a mutation in the
nucleic acid vector.
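By way of non-limiting illustration, the cycle of (a) replication, (b) mutation, and (c) selection can be sketched as a loop over a population. The Python sketch below is a toy model and not a model of PACE: each gene variant is reduced to a single hypothetical "activity" number, mutation is a random perturbation, and selection simply retains the most active variants from the combined pool of parents and offspring. All names and parameter values are assumptions.

```python
import random


def evolve(population, rounds, rng, mutation_sd=0.05):
    """Toy replicate -> mutate -> select loop over scalar activities."""
    size = len(population)
    for _ in range(rounds):
        # (a) replication: every variant produces one copy
        offspring = list(population)
        # (b) mutation: each copy's activity is perturbed at random
        offspring = [max(0.0, a + rng.gauss(0.0, mutation_sd))
                     for a in offspring]
        # (c) selection: keep the most active variants, constant pool size
        pool = population + offspring
        population = sorted(pool, reverse=True)[:size]
    return population


if __name__ == "__main__":
    rng = random.Random(0)      # fixed seed so runs are repeatable
    start = [0.1] * 20          # hypothetical starting activities
    final = evolve(start, rounds=30, rng=rng)
    print(min(final), max(final))
```

Because parents remain in the pool during selection, activity in this toy model can never decrease from one round to the next; a real continuous-flow system removes vectors by dilution and has no such guarantee.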
[0050] The term "vector," as used herein, refers to a nucleic acid
that can be modified to encode a gene of interest and that is able
to enter into a host cell, mutate and replicate within the host
cell, and then transfer a replicated form of the vector into
another host cell. Exemplary suitable vectors include viral
vectors, such as retroviral vectors or bacteriophages, and
conjugative plasmids. Additional suitable vectors will be apparent
to those of skill in the art based on the instant disclosure.
[0051] The term "viral vector," as used herein, refers to a nucleic
acid comprising a viral genome that, when introduced into a
suitable host cell, can be replicated and packaged into viral
particles able to transfer the viral genome into another host cell.
The term viral vector extends to vectors comprising truncated or
partial viral genomes. For example, in some embodiments, a viral
vector is provided that lacks a gene encoding a protein essential
for the generation of infectious viral particles. In suitable host
cells, for example, host cells comprising the lacking gene under
the control of a conditional promoter, however, such truncated
viral vectors can replicate and generate viral particles able to
transfer the truncated viral genome into another host cell. In some
embodiments, the viral vector is a phage, for example, a
filamentous phage (e.g., an M13 phage). In some embodiments, a
viral vector, for example, a phage vector, is provided that
comprises a gene of interest to be evolved.
[0052] The term "phage," as used herein interchangeably with the
term "bacteriophage," refers to a virus that infects bacterial
cells. Typically, phages consist of an outer protein capsid
enclosing genetic material. The genetic material can be ssRNA,
dsRNA, ssDNA, or dsDNA, in either linear or circular form. Phages
and phage vectors are well known to those of skill in the art and
non-limiting examples of phages that are useful for carrying out
the methods provided herein are λ (Lysogen), T2, T4, T7, T12, R17,
M13, MS2, G4, P1, P2, P4, Phi X174, N4, Φ6, and Φ29. In certain
embodiments, the phage utilized in the present invention is M13.
Additional suitable phages and host cells will be apparent to those
of skill in the art and the invention is not limited in this
aspect. For an exemplary description of additional suitable phages
and host cells, see Elizabeth Kutter and Alexander Sulakvelidze:
Bacteriophages: Biology and Applications. CRC Press; 1st
edition (December 2004), ISBN: 0849313368; Martha R. J. Clokie and
Andrew M. Kropinski: Bacteriophages: Methods and Protocols, Volume
1: Isolation, Characterization, and Interactions (Methods in
Molecular Biology) Humana Press; 1st edition (December 2008),
ISBN: 1588296822; Martha R. J. Clokie and Andrew M. Kropinski:
Bacteriophages: Methods and Protocols, Volume 2: Molecular and
Applied Aspects (Methods in Molecular Biology) Humana Press;
1st edition (December 2008), ISBN: 1603275649; all of which
are incorporated herein in their entirety by reference for
disclosure of suitable phages and host cells as well as methods and
protocols for isolation, culture, and manipulation of such
phages.
[0053] The term "accessory plasmid," as used herein, refers to a
plasmid comprising a gene required for the generation of infectious
viral particles under the control of a conditional promoter. In the
context of continuous evolution of genes, transcription from the
conditional promoter of the accessory plasmid is typically
activated, directly or indirectly, by a function of the gene to be
evolved. Accordingly, the accessory plasmid serves the function of
conveying a competitive advantage to those viral vectors in a given
population of viral vectors that carry a version of the gene to be
evolved able to activate the conditional promoter or able to
activate the conditional promoter more strongly than other versions
of the gene to be evolved. In some embodiments, only viral vectors
carrying an "activating" version of the gene to be evolved will be
able to induce expression of the gene required to generate
infectious viral particles in the host cell, and, thus, allow for
packaging and propagation of the viral genome in the flow of host
cells. Vectors carrying non-activating versions of the gene to be
evolved, on the other hand, will not induce expression of the gene
required to generate infectious viral vectors, and, thus, will not
be packaged into viral particles that can infect fresh host
cells.
[0054] The term "helper phage," as used herein interchangeably
with the terms "helper phagemid" and "helper plasmid," refers to a
nucleic acid construct comprising a phage gene required for the
phage life cycle, or a plurality of such genes, but lacking a
structural element required for genome packaging into a phage
particle. For example, a helper phage may provide a wild-type phage
genome lacking a phage origin of replication. In some embodiments,
a helper phage is provided that comprises a gene required for the
generation of phage particles, but lacks a gene required for the
generation of infectious particles, for example, a full-length pIII
gene. In some embodiments, the helper phage provides only some, but
not all, genes for the generation of infectious phage particles.
Helper phages are useful to allow modified phages that lack a gene
for the generation of infectious phage particles to complete the
phage life cycle in a host cell. Typically, a helper phage will
comprise the genes for the generation of infectious phage particles
that are lacking in the phage genome, thus complementing the phage
genome. In the continuous evolution context, the helper phage
typically complements the selection phage, but both lack a phage
gene required for the production of infectious phage particles.
[0055] The term "selection phage," as used herein interchangeably
with the term "selection plasmid," refers to a modified phage that
comprises a nucleic acid sequence encoding a tRNA synthetase to be
evolved, and lacks a full-length gene encoding a protein required
for the generation of infectious phage particles. For example, some
M13 selection phages provided herein comprise a nucleic acid
sequence encoding a gene to be evolved, e.g., under the control of
an M13 promoter, and lack all or part of a phage gene encoding a
protein required for the generation of infectious phage particles,
e.g., gI, gII, gIII, gIV, gV, gVI, gVII, gVIII, gIX, or gX, or any
combination thereof. For example, some M13 selection phages
provided herein comprise a nucleic acid sequence encoding a tRNA
synthetase protein to be evolved, e.g., under the control of an M13
promoter, and lack all or part of a gene encoding a protein
required for the generation of infective phage particles, e.g., the
gIII gene encoding the pIII protein.
[0056] The term "mutagenesis plasmid," as used herein, refers to a
plasmid comprising a gene encoding a gene product that acts as a
mutagen. In some embodiments, the gene encodes a DNA polymerase
lacking a proofreading capability. In some embodiments, the gene is
a gene involved in the bacterial SOS stress response, for example,
a UmuC, UmuD', or RecA gene. In some embodiments, the gene is a
GATC methylase gene, for example, a deoxyadenosine methylase (dam
methylase) gene. In some embodiments, the gene is involved in
binding of hemimethylated GATC sequences, for example, a seqA gene.
In some embodiments, the gene is involved with repression of
mutagenic nucleobase export, for example emrR. Mutagenesis plasmids
(also referred to as mutagenesis constructs) are described, for
example, by International Patent Application, PCT/US2016/027795,
filed Apr. 15, 2016, published as WO 2016/168631 on Oct. 20, 2016,
the entire contents of which are incorporated herein by
reference.
[0057] The term "nucleic acid," as used herein, refers to a polymer
of nucleotides. The polymer may include natural nucleosides (i.e.,
adenosine, thymidine, guanosine, cytidine, uridine, deoxyadenosine,
deoxythymidine, deoxyguanosine, and deoxycytidine), nucleoside
analogs (e.g., 2-aminoadenosine, 2-thiothymidine, inosine,
pyrrolo-pyrimidine, 3-methyl adenosine, 5-methylcytidine,
C5-bromouridine, C5-fluorouridine, C5-iodouridine,
C5-propynyl-uridine, C5-propynyl-cytidine, C5-methylcytidine,
7-deazaadenosine, 7-deazaguanosine, 8-oxoadenosine, 8-oxoguanosine,
O(6)-methylguanine, 4-acetylcytidine,
5-(carboxyhydroxymethyl)uridine, dihydrouridine,
methylpseudouridine, 1-methyl adenosine, 1-methyl guanosine,
N6-methyl adenosine, and 2-thiocytidine), chemically modified
bases, biologically modified bases (e.g., methylated bases),
intercalated bases, modified sugars (e.g., 2'-fluororibose, ribose,
2'-deoxyribose, 2'-O-methylcytidine, arabinose, and hexose), or
modified phosphate groups (e.g., phosphorothioates and
5'-N-phosphoramidite linkages).
[0058] The term "protein," as used herein, refers to a polymer of
amino acid residues linked together by peptide bonds. The term, as
used herein, refers to proteins, polypeptides, and peptides of any
size, structure, or function. Typically, a protein will be at least
three amino acids long. A protein may refer to an individual
protein or a collection of proteins. Inventive proteins preferably
contain only natural amino acids, although non-natural amino acids
(i.e., compounds that do not occur in nature but that can be
incorporated into a polypeptide chain; see, for example,
cco.caltech.edu/~dadgrp/Unnatstruct.gif, which displays structures
of non-natural amino acids that have been successfully incorporated
into functional ion channels) and/or amino acid analogs as are
known in the art may alternatively be employed. Also, one or more
of the amino acids in an inventive protein may be modified, for
example, by the addition of a chemical entity such as a
carbohydrate group, a hydroxyl group, a phosphate group, a farnesyl
group, an isofarnesyl group, a fatty acid group, a linker for
conjugation, functionalization, or other modification, etc. A
protein may also be a single molecule or may be a multi-molecular
complex. A protein may be just a fragment of a naturally occurring
protein or peptide. A protein may be naturally occurring,
recombinant, or synthetic, or any combination of these.
[0059] The term "evolved synthetase protein," as used herein,
refers to a tRNA synthetase protein variant that is expressed by a
gene of interest (e.g., a gene encoding a wild-type synthetase
protein, such as a wild-type pyrrolysyl-tRNA synthetase) that has
been subjected to continuous evolution, such as PACE or SE-PACE.
Examples of evolved synthetase proteins include but are not limited
to evolved aminoacyl-tRNA synthetases (AARSs), such as evolved
pyrrolysyl-tRNA synthetase (PylRS) proteins and evolved
tyrosyl-tRNA synthetase (TyrRS) proteins.
[0060] The term "wild-type pyrrolysyl-tRNA synthetase (PylRS)"
refers to the amino acid sequence of a pyrrolysyl-tRNA synthetase
(PylRS) protein as it naturally occurs in the genome of the host
from which it is derived. Examples of wild-type PylRS proteins
include Methanosarcina barkeri PylRS (MbPylRS), which is
represented by the amino acid sequence set forth in SEQ ID NO: 20
or the amino acid sequence of NCBI Accession Number WP_011305865.1,
or Methanosarcina mazei PylRS (MmPylRS), which is represented by
the amino acid sequence set forth in SEQ ID NO: 21 or the amino
acid sequence of NCBI Accession Number WP_011033391.1.
[0061] The term "wild-type tyrosyl-tRNA synthetase (TyrRS)" refers
to the amino acid sequence of a tyrosyl-tRNA synthetase (TyrRS)
protein as it naturally occurs in the genome of the host from which
it is derived. Examples of wild-type TyrRS proteins include M.
jannaschii TyrRS (MjTyrRS), which is represented by the amino acid
sequence set forth in SEQ ID NO: 24 or the amino acid sequence of
NCBI Accession Number WP_010869888.1.
[0062] The term "pyrrolysyl-tRNA synthetase (PylRS) protein
variant" refers to a PylRS protein having one or more amino acid
variations introduced into the amino acid sequence, e.g., as a
result of application of the PACE method, as compared to the amino
acid sequence of a naturally-occurring or wild-type PylRS protein.
Amino acid sequence variations may include one or more mutated
residues within the amino acid sequence of the PylRS protein
variant, e.g., as a result of a change in the nucleotide sequence
encoding the protein that results in a change in the codon at any
particular position in the coding sequence, the deletion of one or
more amino acids (e.g., a truncated protein), the insertion of one
or more amino acids, or any combination of the foregoing. In some
embodiments, the N- or C-terminal domain of a PylRS variant is a
variant of a naturally-occurring PylRS from an organism, that does
not occur in nature. In some embodiments, a PylRS variant or PylRS
N- or C-terminal domain variant is at least 50%, at least 55%, at
least 60%, at least 65%, at least 70%, at least 75%, at least 80%,
at least 85%, at least 90%, at least 95%, at least 96%, at least
97%, at least 98%, at least 99%, or at least 99.5% identical to a
naturally-occurring PylRS (or corresponding PylRS domain) from an
organism.
[0063] The term "tyrosyl-tRNA synthetase (TyrRS) protein variant"
refers to a TyrRS protein having one or more amino acid variations
introduced into the amino acid sequence, e.g., as a result of
application of the PACE method, as compared to the amino acid
sequence of a naturally-occurring or wild-type TyrRS protein. Amino
acid sequence variations may include one or more mutated residues
within the amino acid sequence of the TyrRS protein variant, e.g.,
as a result of a change in the nucleotide sequence encoding the
protein that results in a change in the codon at any particular
position in the coding sequence, the deletion of one or more amino
acids (e.g., a truncated protein), the insertion of one or more
amino acids, or any combination of the foregoing. In some
embodiments, a TyrRS variant is a variant of a naturally-occurring
TyrRS from an organism, or a variant of an evolved TyrRS that does
not occur in nature. In some embodiments, a TyrRS variant or TyrRS
N- or C-terminal domain variant is at least 50%, at least 55%, at
least 60%, at least 65%, at least 70%, at least 75%, at least 80%,
at least 85%, at least 90%, at least 95%, at least 96%, at least
97%, at least 98%, at least 99%, or at least 99.5% identical to a
naturally-occurring TyrRS (or corresponding TyrRS domain) from an
organism.
[0064] The term "recombinant" as used herein in the context of
proteins or nucleic acids refers to proteins or nucleic acids that
do not occur in nature, but are the product of human engineering.
For example, in some embodiments, a recombinant protein or nucleic
acid molecule comprises an amino acid or nucleotide sequence that
comprises at least one, at least two, at least three, at least
four, at least five, at least six, or at least seven mutations as
compared to any naturally occurring sequence.
[0065] The term "mutation," as used herein, refers to a
substitution of a residue within a sequence, e.g., a nucleic acid
or amino acid sequence, with another residue, or a deletion or
insertion of one or more residues within a sequence. Mutations are
typically described herein by identifying the original residue
followed by the position of the residue within the sequence and by
the identity of the newly substituted residue. Various methods for
making the amino acid substitutions (mutations) provided herein are
well known in the art, and are provided by, for example, Green and
Sambrook, Molecular Cloning: A Laboratory Manual (4th ed.,
Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.
(2012)).
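By way of non-limiting illustration, mutation labels written in the order described above (original residue, position, new residue) can be split into their parts programmatically. The Python sketch below assumes labels of the kind that appear in this disclosure, such as Asn39TAG or Ser12TAG, and also accepts one-letter residue codes; the function name, the tuple type, and the accepted formats are assumptions rather than a notation standard.

```python
import re
from typing import NamedTuple


class Mutation(NamedTuple):
    original: str      # original residue, e.g. "Asn" or "N"
    position: int      # 1-based position within the sequence
    replacement: str   # new residue or codon, e.g. "Ala", "A", "TAG", "*"


# Original residue: three-letter or one-letter code.
# Replacement: three-letter code, three-letter codon, one-letter code, or "*".
_PATTERN = re.compile(
    r"([A-Z][a-z]{2}|[A-Z])(\d+)([A-Z][a-z]{2}|[A-Z]{3}|[A-Z]|\*)"
)


def parse_mutation(label: str) -> Mutation:
    """Split a label such as 'Asn39TAG' into (original, position, replacement)."""
    match = _PATTERN.fullmatch(label.strip())
    if match is None:
        raise ValueError(f"unrecognized mutation label: {label!r}")
    return Mutation(match.group(1), int(match.group(2)), match.group(3))


if __name__ == "__main__":
    for text in ("Asn39TAG", "Ser12TAG", "Ser203TAG"):
        print(parse_mutation(text))
```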
[0066] The term "fusion protein" as used herein refers to a hybrid
polypeptide which comprises protein domains from at least two
different proteins. One protein may be located at the
amino-terminal (N-terminal) portion of the fusion protein or at the
carboxy-terminal (C-terminal) portion, thus forming an
"amino-terminal fusion protein" or a "carboxy-terminal fusion
protein," respectively. A protein may comprise different domains,
for example, a nucleic acid binding domain (e.g., the gRNA binding
domain of Cas9 that directs the binding of the protein to a target
site) and a nucleic acid cleavage domain or a catalytic domain of a
nucleic-acid editing protein. Any of the proteins provided herein
may be produced by any method known in the art. For example, the
proteins provided herein may be produced via recombinant protein
expression and purification, which is especially suited for fusion
proteins comprising a peptide linker. Methods for recombinant
protein expression and purification are well known, and include
those described by Green and Sambrook, Molecular Cloning: A
Laboratory Manual (4th ed., Cold Spring Harbor Laboratory
Press, Cold Spring Harbor, N.Y. (2012)), the entire contents of
which are incorporated herein by reference.
[0067] The term "chimeric protein" refers to a fusion protein in
which the first protein portion and the second protein portion are
derived from different species. For example, in some embodiments, a
chimeric PylRS protein comprises an N-terminal domain of a MbPylRS
(e.g., MbPylRS amino acids 1-149 as set forth in SEQ ID NO: 20) and
a C-terminal domain from a MmPylRS (e.g., MmPylRS amino acids
185-454 as set forth in SEQ ID NO: 21).
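By way of non-limiting illustration, assembling a chimeric sequence from the residue ranges named above is a matter of concatenating two subsequences, taking care that residue numbering is 1-based and inclusive. In the Python sketch below, the function names are hypothetical and the two input strings are placeholders, not the sequences of SEQ ID NO: 20 or SEQ ID NO: 21.

```python
def subsequence(seq: str, start: int, end: int) -> str:
    """Return residues start..end using 1-based, inclusive numbering."""
    if not (1 <= start <= end <= len(seq)):
        raise ValueError("residue range lies outside the sequence")
    return seq[start - 1:end]


def make_chimera(n_donor: str, c_donor: str,
                 n_range=(1, 149), c_range=(185, 454)) -> str:
    """Join an N-terminal range of one protein to a C-terminal range of another."""
    return subsequence(n_donor, *n_range) + subsequence(c_donor, *c_range)


if __name__ == "__main__":
    # Placeholder sequences only; real sequences would be substituted here.
    n_donor = "M" * 200
    c_donor = "K" * 454
    chimera = make_chimera(n_donor, c_donor)
    print(len(chimera))
```

With the default ranges, the chimera contains 149 residues from the first protein and 270 residues (185 through 454, inclusive) from the second.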
[0068] The term "host cell," as used herein, refers to a cell that
can host, replicate, and transfer a phage vector useful for a
continuous evolution process as provided herein. In embodiments
where the vector is a viral vector, a suitable host cell is a cell
that can be infected by the viral vector, can replicate it, and can
package it into viral particles that can infect fresh host cells. A
cell can host a viral vector if it supports expression of genes of
the viral vector, replication of the viral genome, and/or the
generation of viral particles. One criterion to determine whether a
cell is a suitable host cell for a given viral vector is to
determine whether the cell can support the viral life cycle of a
wild-type viral genome that the viral vector is derived from. For
example, if the viral vector is a modified M13 phage genome, as
provided in some embodiments described herein, then a suitable host
cell would be any cell that can support the wild-type M13 phage
life cycle. Suitable host cells for viral vectors useful in
continuous evolution processes are well known to those of skill in
the art, and the disclosure is not limited in this respect.
[0069] The term "stop codon", as used herein, refers to a
three-nucleotide sequence that is present within messenger RNA
(mRNA) and typically functions to terminate protein translation.
Examples of stop codons include, written as DNA or RNA sequences,
respectively, "TAG" or "UAG" (also referred to as an "amber"
codon), "TAA" or "UAA" (also referred to as an "ochre" codon), and
"TGA" or "UGA" (also referred to as an "opal" or "umber" codon).
In some embodiments, a tRNA
synthetase protein variant, for example a PylRS protein variant, is
evolved to recognize one or more stop codons and allow protein
translation to read through the codon to produce a full-length
protein. In some embodiments, a PylRS protein variant is evolved to
enable a tRNA to insert a pyrrolysine amino acid at a protein
position encoded by a canonical stop codon (e.g., an amber stop
codon) of an mRNA. In some embodiments, a TyrRS protein variant is
evolved to enable a tRNA to insert a p-iodo-L-phenylalanine amino
acid at a protein position encoded by a canonical stop codon (e.g.,
an amber stop codon) of an mRNA.
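By way of non-limiting illustration, the effect of amber suppression on translation can be sketched by translating a coding sequence with and without read-through of the TAG codon. The Python sketch below uses the standard genetic code; the function name, the placeholder symbol "X" for the inserted non-canonical amino acid, and the example sequence are assumptions.

```python
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
# Standard genetic code, codons in DNA letters; "*" marks a stop codon.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str, amber_suppressor=None) -> str:
    """Translate coding-strand DNA from its first codon.

    If amber_suppressor is given, TAG is read through as that symbol;
    otherwise TAG terminates translation like TAA and TGA.
    """
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        codon = dna[i:i + 3].upper()
        if codon == "TAG" and amber_suppressor is not None:
            protein.append(amber_suppressor)
            continue
        residue = CODON_TABLE[codon]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


if __name__ == "__main__":
    gene = "ATGAGCTAGGGCTAA"  # hypothetical: Met-Ser-amber-Gly-ochre
    print(translate(gene))                        # truncated at the amber codon
    print(translate(gene, amber_suppressor="X"))  # full length, X at the amber site
```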
DETAILED DESCRIPTION OF CERTAIN EMBODIMENTS
[0070] Some aspects of this disclosure provide tRNA synthetase
variants (e.g., PylRS protein variants, TyrRS protein variants,
etc.) and methods, compositions, and systems for producing the
same. In some embodiments, the disclosure relates to the use of
phage-assisted continuous evolution (PACE) to produce tRNA
synthetase protein variants. In some embodiments, tRNA synthetase
protein variants described by the disclosure exhibit improved
activity (e.g., improved incorporation of target non-canonical
amino acids (ncAAs) into tRNAs) or amino acid specificity (e.g.,
charging of preferred ncAAs) relative to the wild-type or variant
tRNA synthetase protein from which they are derived. Some aspects
of this disclosure provide fusion proteins, such as chimeric PylRS
protein variants comprising an N-terminal domain of a MbPylRS
protein or MbPylRS protein variant and a C-terminal domain of a
MmPylRS protein or MmPylRS protein variant.
PylRS Protein Variants
[0071] Some aspects of the disclosure relate to tRNA synthetase
protein variants. The disclosure is based, in part, on certain tRNA
synthetase protein variants (e.g., PylRS variants, etc.) that are
orthogonal (e.g., with respect to a non-archaebacterial cell, for
example an E. coli cell) and are characterized by increased
activity and amino acid specificity relative to wild-type tRNA
synthetase proteins (e.g., the PylRS protein from which the variant
was evolved). In some embodiments, tRNA synthetase protein variants
described by the disclosure are characterized by improved (e.g.,
increased) charging activity (e.g., binding of a non-canonical
amino acid to a tRNA via aminoacylation) relative to wild-type tRNA
synthetase proteins (e.g., the PylRS protein from which the variant
was evolved).
[0072] The tRNA synthetase protein variants described by the
disclosure are typically derived from a wild-type PylRS protein and
have at least one variation in the amino acid sequence of the
variant protein as compared to the amino acid sequence of the
cognate wild-type tRNA synthetase protein. In some embodiments, a
tRNA synthetase protein variant has at least one variation in its
encoding nucleic acid sequence that results in a change in the
amino acid sequence present within a cognate wild-type tRNA
synthetase protein. The variation in amino acid sequence generally
results from a mutation, insertion, or deletion in a DNA coding
sequence. Mutation of a DNA sequence can result in a nonsense
mutation (e.g., a translation termination codon (TAA, TAG, or
TGA) that produces a truncated protein), a missense mutation (e.g.,
a codon change that results in a different amino acid), a
frameshift mutation (e.g., an insertion or deletion mutation that
shifts the reading frame of the coding sequence), or a silent
mutation (e.g., a change in the
coding sequence that results in a codon that codes for the same
amino acid normally present in the cognate protein, also referred
to sometimes as a synonymous mutation). In some embodiments,
mutation of a DNA sequence results in a non-synonymous (i.e.,
conservative, semi-conservative, or radical) amino acid
substitution.
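By way of non-limiting illustration, the conventional classes of coding-sequence mutation can be distinguished by comparing codons under the standard genetic code. In the Python sketch below, the function names and category labels are assumptions, and the classification covers only single-codon substitutions and simple length changes.

```python
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
# Standard genetic code, codons in DNA letters; "*" marks a stop codon.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Classify a change of one codon into another."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if alt_aa == ref_aa:
        return "silent"      # same amino acid (synonymous)
    if alt_aa == "*":
        return "nonsense"    # new stop codon, truncated protein
    if ref_aa == "*":
        return "stop-loss"   # original stop codon now encodes an amino acid
    return "missense"        # different amino acid (non-synonymous)


def classify_indel(length_change_nt: int) -> str:
    """Classify an insertion (+) or deletion (-) by its length in nucleotides."""
    if length_change_nt == 0:
        raise ValueError("an insertion or deletion must change the length")
    return "in-frame" if length_change_nt % 3 == 0 else "frameshift"


if __name__ == "__main__":
    print(classify_substitution("TCG", "TAG"))  # Ser codon to amber stop
    print(classify_substitution("TCG", "TCA"))  # Ser codon to Ser codon
    print(classify_substitution("TCG", "TTG"))  # Ser codon to Leu codon
    print(classify_indel(1), classify_indel(-3))
```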
[0073] The tRNA synthetase protein can be any tRNA synthetase
protein known in the art. In some embodiments, a tRNA synthetase
protein variant is a pyrrolysyl-tRNA synthetase (PylRS) protein
variant. In some embodiments, a wild-type PylRS protein is a M.
barkeri PylRS (MbPylRS) protein. In some embodiments, a MbPylRS is
represented by the amino acid sequence set forth in NCBI Accession
Number WP_011305865.1 or SEQ ID NO: 20. Additional PylRS proteins
are described, for example, in Wan et al. (2014) Biochim Biophys
Acta 1844(6):1059-1070.
[0074] In some aspects, the disclosure relates to chimeric tRNA
synthetase proteins. In some embodiments, a chimeric PylRS protein
or chimeric PylRS protein variant (e.g., a chimeric PylRS protein
that has been subjected to PACE) comprises an N-terminal domain
from a first PylRS protein and a C-terminal domain from a second
PylRS protein. In some embodiments, an N-terminal domain comprises
amino acids 1-149 of a PylRS protein, for example MbPylRS (amino
acids 1-149 of SEQ ID NO: 20). In some embodiments, a C-terminal
domain comprises amino acids 185-454 of a PylRS protein, for
example MmPylRS (amino acids 185-454 of SEQ ID NO: 21). Examples of
chimeric PylRS proteins and chimeric PylRS protein variants are
described by the nucleic acid sequences set forth in SEQ ID NOs: 5,
8, and 11-17.
[0075] In some embodiments, a PylRS protein variant and a wild-type
PylRS protein (e.g., MbPylRS or MmPylRS) are from about 50% to
about 99.9% identical, about 55% to about 95% identical, about 60%
to about 90% identical, about 65% to about 85% identical, or about
70% to about 80% identical at the amino acid sequence level. In
some embodiments, a PylRS protein variant comprises an amino acid
sequence that is at least 50%, at least 60%, at least 70%, at least
80%, at least 90%, at least 95%, at least 99%, or at least 99.9%
identical to the amino acid sequence of a wild-type PylRS protein
(e.g., MbPylRS or MmPylRS). In some embodiments, amino acid
sequence identity is based on an alignment against a reference
sequence (e.g., a wild-type PylRS protein, for example, SEQ ID NO:
20 or 21) by NCBI Constraint-based Multiple Alignment Tool
(COBALT), using the following parameters: Alignment Parameters: Gap
penalties -11, -1 and End-Gap penalties -5, -1, CDD Parameters: Use RPS
BLAST on; Blast E-value 0.003; Find Conserved columns and Recompute
on, and Query Clustering Parameters: Use query clusters on; Word
Size 4; Max cluster distance 0.8; Alphabet Regular.
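As an illustration of how a percent identity value may be derived once an alignment is available, the following Python sketch counts identical columns in two pre-aligned sequences. It does not run COBALT. The identity definition used here (identical columns divided by all columns in which at least one sequence has a residue) is an assumption, since several conventions exist, and the function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_ref: str, aligned_var: str, gap: str = "-") -> float:
    """Percent identity between two rows of an existing alignment."""
    if len(aligned_ref) != len(aligned_var):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for r, v in zip(aligned_ref, aligned_var):
        if r == gap and v == gap:
            continue  # a gap in both rows carries no information for this pair
        compared += 1
        if r == v:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Toy sequences (not taken from the sequence listing).
print(percent_identity("MDKKPLDVLI", "MDKKPLNVLI"))  # 90.0, one substitution in ten
print(percent_identity("MDKK-PLD", "MDKKAPLD"))      # 87.5, one insertion in eight columns
```

A gap opposite a residue is counted as a mismatch in this sketch; a different convention would give a different percentage for the second example.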
[0076] In some embodiments, a PylRS protein variant is about 70%,
about 71%, about 72%, about 73%, about 74%, about 75%, about 76%,
about 77%, about 78%, about 79%, about 80%, about 81%, about 82%,
about 83%, about 84%, about 85%, about 86%, about 87%, about 88%,
about 89%, about 90%, about 91%, about 92%, about 93%, about 94%,
about 95%, about 96%, about 97%, about 98%, about 99%, or about
99.9% identical to a wild-type PylRS protein (e.g., MbPylRS,
MmPylRS, or a chimeric PylRS).
[0077] The amount or level of variation between a wild-type PylRS
protein and a PylRS protein variant can also be expressed as the
number of mutations present in the amino acid sequence encoding the
PylRS protein variant relative to the amino acid sequence encoding
the wild-type PylRS protein. In some embodiments, an amino acid
sequence encoding a PylRS protein variant comprises between about 1
mutation and about 100 mutations, about 10 mutations and about 90
mutations, about 20 mutations and about 80 mutations, about 30
mutations and about 70 mutations, or about 40 and about 60
mutations relative to an amino acid sequence encoding a wild-type
PylRS protein (e.g., MbPylRS, MmPylRS, or a chimeric PylRS). In
some embodiments, an amino acid sequence encoding a PylRS protein
variant comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32,
33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49,
or 50 mutations relative to an amino acid sequence encoding a
wild-type PylRS protein (e.g., MbPylRS, MmPylRS, or a chimeric
PylRS protein). In some embodiments, an amino acid sequence of a
PylRS protein variant comprises more than 100 mutations relative to
an amino acid sequence of a wild-type PylRS protein.
[0078] Particular combinations of mutations present in an amino
acid sequence encoding a PylRS protein variant can be referred to
as the "genotype" of the PylRS protein variant. For example, a
PylRS protein variant genotype may comprise the mutations V31I,
T56P, H62Y, and A100E, relative to a wild-type PylRS protein (e.g.,
SEQ ID NO: 20 or 21).
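The mutation notation used for genotypes (wild-type residue, position, replacement residue) lends itself to a simple representation. The Python sketch below parses labels such as V31I, applies them to a reference sequence, and recovers a genotype by comparing a variant with its reference. The toy reference sequence and the function names are hypothetical; only substitutions between sequences of equal length are handled.

```python
import re

MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(label: str):
    """Split a label such as 'V31I' into (wild-type residue, position, new residue)."""
    m = MUTATION.match(label)
    if m is None:
        raise ValueError(f"unrecognized mutation label: {label!r}")
    return m.group(1), int(m.group(2)), m.group(3)


def apply_genotype(wild_type: str, genotype) -> str:
    """Return the variant sequence obtained by applying each substitution."""
    seq = list(wild_type)
    for label in genotype:
        old, pos, new = parse_mutation(label)
        if not 1 <= pos <= len(seq) or seq[pos - 1] != old:
            raise ValueError(f"{label} does not match the reference sequence")
        seq[pos - 1] = new
    return "".join(seq)


def call_genotype(wild_type: str, variant: str):
    """List substitutions (1-based positions) that distinguish variant from wild type."""
    if len(wild_type) != len(variant):
        raise ValueError("sequences must have equal length for this simple comparison")
    return [f"{w}{i}{v}"
            for i, (w, v) in enumerate(zip(wild_type, variant), start=1)
            if w != v]


# Toy 100-residue reference with the wild-type residues placed at the cited positions.
toy = list("G" * 100)
toy[30], toy[55], toy[61], toy[99] = "V", "T", "H", "A"
toy_wild_type = "".join(toy)

example_genotype = ["V31I", "T56P", "H62Y", "A100E"]
toy_variant = apply_genotype(toy_wild_type, example_genotype)
print(call_genotype(toy_wild_type, toy_variant))
print(len(call_genotype(toy_wild_type, toy_variant)))  # number of mutations: 4
```

The length of the list returned by `call_genotype` corresponds to the number of mutations relative to the reference.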
TyrRS Protein Variants
[0079] In some aspects, the disclosure relates to tyrosyl-tRNA
synthetase (TyrRS) protein variants. The disclosure is based, in
part, on certain tRNA synthetase protein variants (e.g., TyrRS
variants, etc.) that are orthogonal (e.g., with respect to a
non-archaebacterial cell, for example an E. coli cell) and are
characterized by increased activity and amino acid specificity
relative to wild-type tRNA synthetase proteins (e.g., the TyrRS
protein from which the variant was evolved). In some embodiments,
tRNA synthetase protein variants described by the disclosure are
characterized by improved (e.g., increased) charging activity
(e.g., binding of a non-canonical amino acid to a tRNA via
aminoacylation) relative to wild-type tRNA synthetase proteins
(e.g., the TyrRS protein from which the variant was evolved).
[0080] The tRNA synthetase protein variants described by the
disclosure are typically derived from a wild-type TyrRS protein and
have at least one variation in the amino acid sequence of the
variant protein as compared to the amino acid sequence of the
cognate wild-type tRNA synthetase protein. In some embodiments, a
tRNA synthetase protein variant has at least one variation in its
encoding nucleic acid sequence that results in a change in the
amino acid sequence present within a cognate wild-type tRNA
synthetase protein. The variation in amino acid sequence generally
results from a mutation, insertion, or deletion in a DNA coding
sequence. Mutation of a DNA sequence can result in a nonsense
mutation (e.g., a translation termination codon (TAA, TAG, or
TGA) that produces a truncated protein), a missense mutation (e.g.,
an insertion or deletion mutation that shifts the reading frame of
the coding sequence), or a silent mutation (e.g., a change in the
coding sequence that results in a codon that codes for the same
amino acid normally present in the cognate protein, also referred
to sometimes as a synonymous mutation). In some embodiments,
mutation of a DNA sequence results in a non-synonymous (i.e.,
conservative, semi-conservative, or radical) amino acid
substitution.
[0081] The TyrRS protein can be any TyrRS protein known in the art.
In some embodiments, a tRNA synthetase protein variant is a
tyrosyl-tRNA synthetase (TyrRS) protein variant. In some
embodiments, a wild-type TyrRS protein is a M. jannaschii TyrRS
(MjTyrRS) protein. In some embodiments, a MjTyrRS is represented by
the amino acid sequence set forth in SEQ ID NO: 24 or NCBI
Accession Number WP_010869888.1. Additional TyrRS proteins are
described, for example, in Bedouelle H. Tyrosyl-tRNA Synthetases.
In: Madame Curie Bioscience Database [Internet]. Austin, Tex.:
Landes Bioscience; 2000-2013.
[0082] In some embodiments, a TyrRS protein variant and a wild-type
TyrRS protein (e.g., MjTyrRS) are from about 50% to about 99.9%
identical, about 55% to about 95% identical, about 60% to about 90%
identical, about 65% to about 85% identical, or about 70% to about
80% identical at the amino acid sequence level. In some
embodiments, a TyrRS protein variant comprises an amino acid
sequence that is at least 50%, at least 60%, at least 70%, at least
80%, at least 90%, at least 95%, at least 99%, or at least 99.9%
identical to the amino acid sequence of a wild-type TyrRS protein
(e.g., MjTyrRS). In some embodiments, amino acid sequence identity
is based on an alignment against a reference sequence (e.g., a
wild-type TyrRS protein, for example, SEQ ID NO: 24) by NCBI
Constraint-based Multiple Alignment Tool (COBALT), using the
following parameters: Alignment Parameters: Gap penalties -11, -1 and
End-Gap penalties -5, -1, CDD Parameters: Use RPS BLAST on; Blast
E-value 0.003; Find Conserved columns and Recompute on, and Query
Clustering Parameters: Use query clusters on; Word Size 4; Max
cluster distance 0.8; Alphabet Regular.
[0083] In some embodiments, a TyrRS protein variant is about 70%,
about 71%, about 72%, about 73%, about 74%, about 75%, about 76%,
about 77%, about 78%, about 79%, about 80%, about 81%, about 82%,
about 83%, about 84%, about 85%, about 86%, about 87%, about 88%,
about 89%, about 90%, about 91%, about 92%, about 93%, about 94%,
about 95%, about 96%, about 97%, about 98%, about 99%, or about
99.9% identical to a wild-type TyrRS protein (e.g., MjTyrRS).
[0084] The amount or level of variation between a wild-type TyrRS
protein and a TyrRS protein variant can also be expressed as the
number of mutations present in the amino acid sequence encoding the
TyrRS protein variant relative to the amino acid sequence encoding
the wild-type TyrRS protein. In some embodiments, an amino acid
sequence encoding a TyrRS protein variant comprises between about 1
mutation and about 100 mutations, about 10 mutations and about 90
mutations, about 20 mutations and about 80 mutations, about 30
mutations and about 70 mutations, or about 40 and about 60
mutations relative to an amino acid sequence encoding a wild-type
TyrRS protein (e.g., MjTyrRS). In some embodiments, an amino acid
sequence encoding a TyrRS protein variant comprises 1, 2, 3, 4, 5,
6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23,
24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 mutations relative to an
amino acid sequence encoding a wild-type TyrRS protein (e.g.,
MjTyrRS). In some embodiments, an amino acid sequence of a TyrRS
protein variant comprises more than 100 mutations relative to an
amino acid sequence of a wild-type TyrRS protein.
[0085] Particular combinations of mutations present in an amino
acid sequence encoding a TyrRS protein variant can be referred to
as the "genotype" of the TyrRS protein variant. For example, a
TyrRS protein variant genotype may comprise the mutations L69F and
V235I, relative to a wild-type TyrRS protein (e.g., SEQ ID NO:
24).
Methods of Use
[0086] Some aspects of this disclosure provide methods of using the
tRNA synthetase protein variants provided herein. For example, some
aspects of this disclosure provide methods comprising contacting a
tRNA with a tRNA synthetase protein variant as described by the
disclosure (e.g., a PylRS protein variant or a TyrRS protein
variant), in the presence of a cognate non-canonical amino acid,
for example pyrrolysine (in the case of PylRS) or
p-iodo-L-phenylalanine (in the case of TyrRS), under conditions
under which the tRNA synthetase protein variant "charges" (binds)
the non-canonical amino acid to the tRNA.
[0087] In some embodiments, the tRNA, tRNA synthetase protein
variant, and the non-canonical amino acid are contacted to one
another in a cell, for example a bacterial cell. In some
embodiments, the cell in which the tRNA, tRNA synthetase protein
variant, and the non-canonical amino acid are contacted to one
another does not naturally express the tRNA synthetase protein
variant or the tRNA synthetase protein from which the variant is
derived (e.g., the tRNA synthetase protein variant is orthogonal to
the cell). In some embodiments, the cell is a non-archaebacterial
cell, for example an E. coli cell.
[0088] Methods described by the disclosure are useful, in some
embodiments, for charging (e.g., binding) a transfer RNA (tRNA)
with a non-canonical amino acid by an aminoacylation reaction. In
some embodiments, the activity (e.g., catalytic efficiency of
aminoacylation) of a tRNA synthetase protein variant described by
the disclosure is between about 2-fold and about 50-fold increased
relative to a wild-type tRNA synthetase enzyme. In some
embodiments, the activity of a tRNA synthetase protein variant is
increased about 2-fold, 3-fold, 4-fold, 5-fold, 6-fold, 7-fold,
8-fold, 9-fold, 10-fold, 15-fold, 20-fold, 25-fold, 30-fold,
40-fold, or 50-fold relative to the activity of a wild-type tRNA
synthetase protein. In some embodiments, the activity of a tRNA
synthetase protein variant is increased more than 50-fold relative
to a wild-type tRNA synthetase protein.
[0089] In some embodiments, tRNA synthetase protein variants
described by the disclosure are characterized by improved substrate
specificity (e.g., reduced incorporation of off-target or
undesirable amino acids into a tRNA) relative to a wild-type tRNA
synthetase protein. In some embodiments, substrate specificity of a
tRNA synthetase protein variant described by the disclosure is
increased about 2-fold, 3-fold, 4-fold, 5-fold, 6-fold, 7-fold,
8-fold, 9-fold, 10-fold, 15-fold, 20-fold, 25-fold, 30-fold,
40-fold, or 50-fold relative to the substrate specificity of a wild-type tRNA
synthetase protein.
[0090] In some aspects, the disclosure relates to orthogonal
translation systems (OTSs) that allow non-canonical amino acids
(ncAAs) to be site-specifically incorporated into recombinant
proteins, for example during translation of the recombinant protein
in a cell or in vitro. Accordingly, in some embodiments, the
disclosure provides a method for incorporating a non-canonical
amino acid (ncAA) into a peptide, the method comprising expressing
in a cell containing an ncAA: 1) a tRNA synthetase protein variant
(e.g., PylRS or TyrRS protein variant); 2) a tRNA capable of
incorporating the ncAA; and 3) a nucleic acid sequence encoding a
protein, wherein the nucleic acid sequence comprises a codon that
is recognized (e.g., bound) by the tRNA.
[0091] In some embodiments, the tRNA synthetase protein variant is
orthogonal to the cell in which it is being expressed. In some
embodiments, the cell is an E. coli cell. In some embodiments, the
nucleic acid sequence is an mRNA sequence. In some embodiments, the
nucleic acid sequence (e.g., mRNA sequence) comprises an amber
codon (UAG) that is recognized by the tRNA.
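For illustration, in-frame amber codons in a coding sequence can be located with a simple scan. The Python sketch below assumes that the sequence begins at the first base of the reading frame; the example sequences and the function name `amber_codon_positions` are hypothetical.

```python
def amber_codon_positions(mrna: str) -> list[int]:
    """Return 1-based codon indices of in-frame UAG codons.

    The reading frame is assumed to start at the first base of the input.
    DNA input is accepted and converted to RNA by replacing T with U.
    """
    mrna = mrna.upper().replace("T", "U")
    return [i // 3 + 1
            for i in range(0, len(mrna) - 2, 3)
            if mrna[i:i + 3] == "UAG"]


# AUG GCA UAG GGC UAA: the third codon is an in-frame amber codon.
print(amber_codon_positions("AUGGCAUAGGGCUAA"))  # [3]

# AUG UUA GGC UAA: "UAG" occurs only out of frame and is not reported.
print(amber_codon_positions("AUGUUAGGCUAA"))     # []
```

Only in-frame occurrences are reported because an out-of-frame UAG triplet is not read as a codon during translation of that frame.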
[0092] In some embodiments, the ncAA is a pyrrolysine or a
p-iodo-L-phenylalanine. In some embodiments, the ncAA is introduced
into the culture media surrounding the cell prior to being
contained by the cell. In some embodiments, the tRNA synthetase
protein variant, tRNA, and the nucleic acid sequence are expressed
in the cell prior to the cell containing the ncAA (e.g., the ncAA
is added to the cell after expression of the tRNA synthetase
protein variant and the nucleic acid sequence).
Vectors and Systems
[0093] Some aspects of this disclosure provide expression
constructs encoding gene products that select for a desired
physicochemical characteristic or desired function of an evolved
tRNA synthetase protein, such as PylRS or TyrRS in a host cell,
e.g., in a bacterial host cell. In some embodiments, a PACE
selection system comprises one or more gene products encoded by a
nucleic acid (e.g., an isolated nucleic acid). In some embodiments,
one or more nucleic acids that are operably linked comprise an
expression construct. Expression constructs are sometimes also
referred to as vectors. In some embodiments, the expression
constructs are plasmids.
[0094] In some embodiments, a PACE selection system for production
of an evolved tRNA synthetase protein comprises one or more
positive selection plasmids. In some embodiments, at least one of
the positive selection plasmids is an accessory plasmid (AP). In
some embodiments, a positive selection AP comprises a nucleic acid
sequence encoding an amber suppressor tRNA and a gene III (e.g.,
pIII protein). In some embodiments, a positive selection system
comprises a complementary plasmid (CP) that encodes T7 RNAP
controlled by the phage-shock promoter (P.sub.psp), which is
induced only upon phage infection. In some embodiments, a positive
selection system comprises a mutagenesis plasmid (MP) that
increases the rate of evolution during PACE through
arabinose-induced production of mutagenic proteins. In some
embodiments, the selection phage (SP) encodes all phage genes
except gene III, which is replaced by the evolving AARS gene (e.g.,
gene encoding the tRNA synthetase protein to be evolved). FIGS.
2A-2C provide schematics of positive selection systems described by
the disclosure.
[0095] In some embodiments, a PACE selection system for production
of an evolved tRNA synthetase protein comprises one or more
negative selection plasmids. In some embodiments, one or more of
the negative selection plasmids is a negative accessory plasmid
(AP-) and a negative complementary plasmid (CP-). In some
embodiments, a negative accessory plasmid comprises one or more
nucleic acid sequences encoding gene III under the control of a
P.sub.psp promoter, an amber suppressor tRNA, and a T7 RNA
polymerase that comprises amber stop codons and is under the
control of a Tet promoter. In some embodiments, a negative
complementary plasmid comprises a nucleic acid sequence encoding a
dominant-negative variant of gene III (e.g., pIII-neg) under the
control of a T7 promoter.
[0096] Without wishing to be bound by any theory, when an SP
infects the negative selection host, production of pIII protein
from gene III is induced from the phage shock promoter (Ppsp) of
the AP-. If the AARS encoded by the SP (e.g., tRNA synthetase
protein to be evolved) can catalyze aminoacylation under the
conditions of the negative selection (e.g., in the absence of
ncAA), full-length T7 RNAP is produced from the AP- through amber
suppression of amber stop codons at positions 12 and 203 of the T7
RNAP gene. When full-length T7 RNAP is produced, expression of gene
III-neg is induced from the T7 promoter (PT7) of the CP- resulting
in production of the dominant-negative pIII-neg protein. The
infectivity of progeny phage decreases with the amount of pIII-neg
in the host cell. Expression levels of the T7 RNAP gene on the AP-
are also controlled by an ATc-inducible promoter (Ptet), allowing
the negative selection to be turned on or off during PACE.
Non-limiting examples of negative selection PACE systems are
described in FIGS. 18A-18B.
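The negative selection circuit described above can be summarized as a qualitative logic model. The Python sketch below is a deliberate simplification: it treats each step as all-or-none, whereas the description states that infectivity decreases with the amount of pIII-neg. The class, field, and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NegativeSelectionState:
    phage_infected: bool            # SP has infected the host, so P_psp is induced
    atc_present: bool               # ATc induces P_tet, turning the selection on
    aars_active_without_ncaa: bool  # SP-encoded AARS aminoacylates in the absence of ncAA


def negative_selection(state: NegativeSelectionState) -> dict:
    """Qualitative (all-or-none) outcome of the negative selection circuit."""
    # pIII is produced from gene III on the AP- upon phage infection.
    piii = state.phage_infected
    # Full-length T7 RNAP needs transcription from P_tet plus suppression of
    # both amber codons; the AARS is supplied by the infecting SP.
    full_length_t7 = (state.phage_infected
                      and state.atc_present
                      and state.aars_active_without_ncaa)
    # T7 RNAP drives gene III-neg from P_T7 on the CP-.
    piii_neg = full_length_t7
    return {
        "pIII": piii,
        "full_length_T7_RNAP": full_length_t7,
        "pIII_neg": piii_neg,
        "progeny_infectivity_reduced": piii and piii_neg,
    }


promiscuous = NegativeSelectionState(True, True, True)
selective = NegativeSelectionState(True, True, False)
print(negative_selection(promiscuous)["progeny_infectivity_reduced"])  # True
print(negative_selection(selective)["progeny_infectivity_reduced"])    # False
```

In this model, withholding ATc switches the negative selection off regardless of AARS activity, which corresponds to the role described for the ATc-inducible promoter.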
[0097] In some embodiments, a selection system for production of an
evolved tRNA synthetase protein comprises one or more positive
selection plasmids and one or more negative selection plasmids as
described herein, and also may be referred to as a "dual-selection
system". In some embodiments, a dual-selection system is described
by FIGS. 17A-17C and further in the Examples section below.
[0098] In some aspects, the disclosure relates to methods of
evolving a tRNA synthetase protein variant. In some embodiments,
the methods comprise the steps of (i) introducing a selection
phagemid comprising a gene encoding a tRNA synthetase to be evolved
into a flow of bacterial host cells through a lagoon, the host
cells comprise phage genes required to package the selection
phagemid into infectious phage particles, wherein at least one gene
required to package the selection phagemid into infectious phage
particles is expressed in response to expression of the gene to be
evolved in the host cell, and wherein the flow rate of the host
cells through the lagoon permits replication of the phagemid, but
not of the host cells, in the lagoon; (ii) replicating and mutating
the phagemid within the flow of host cells; and (iii) isolating a
phagemid comprising a mutated gene encoding an evolved tRNA
synthetase protein variant from the flow of cells.
EXAMPLES
Example 1
[0099] General methods. PCR and all cloning steps were performed in
HyClone water (GE Healthcare Life Sciences). In all other
experiments, water was purified by a MilliQ purification system
(EMD Millipore). PCR was performed with Q5 Hot Start High-Fidelity
DNA polymerase (New England Biolabs) when unmodified primers were
used, and Phusion U Hot Start DNA polymerase (Thermo Fisher
Scientific) was used when deoxyuridine-containing primers were
required for USER cloning. Plasmids and selection phage were
prepared using isothermal assembly with Gibson Assembly 2.times.
Master Mix (New England Biolabs), USER cloning with USER enzyme
(New England Biolabs), or ligation cycling reaction with Ampligase
(Epicentre). Genes were either synthesized from gBlock gene
fragments (Integrated DNA Technologies) or PCR amplified from
native sources. Chimeric PylRS, MbPylRS, and MmPylRS were obtained
from pTECH plasmid sources. Premature stop codons and single point
mutations were placed into genes using the Q5 Site-Directed
Mutagenesis kit (New England Biolabs). The gene encoding p-NFRS was
PCR amplified from the pEVOL plasmid, which was generously provided
to us by P. Schultz of the Scripps Research Institute. DNA vector
amplification was performed using TOP10, Mach1 (Thermo Fisher
Scientific) or NEB 5-alpha F' Iq (New England Biolabs) cells. All
Sanger sequencing of plasmids and SPs was performed from DNA
samples that had been amplified using the Illustra Templiphi 100
Amplification Kit (GE Healthcare Life Sciences). All ncAAs were
purchased from Chem-Impex International except for
4-nitro-L-phenylalanine (Nanjing Pharmatechs) and
4-iodo-L-phenylalanine (Astatech, Inc.).
[0100] Non-continuous phage propagation. S1030 cells (25 .mu.L)
were electroporated with the accessory plasmid of interest and a
complementary plasmid, when required (Table 1). Transformed cells
recovered 1 h in SOC media (New England Biolabs) at 37.degree. C.
while shaking. Recovered cells were plated on LB agar (United
States Biologicals) containing the antibiotics required for plasmid
maintenance and grew 20 h at 37.degree. C. Single colonies of the
transformed cells were picked and grown for 16 h in a 37.degree. C.
shaker at 230 rpm using 3 mL of Davis rich media (DRM) containing
antibiotics. The saturated cultures were diluted 1,000-fold into 3
mL of identical media or media supplemented with 1 mM ncAA where
noted. The diluted cultures were grown at 37.degree. C. while
shaking to mid-log phase (absorbance at 600 nm
(A.sub.600)=0.5-0.7). Once the desired cell density was reached,
the cultures were inoculated with selection phage (SP) to provide
a desired starting titer of .about.1.times.10.sup.5 pfu/mL. A
dilution reference was also prepared by diluting an identical
volume of SP into media containing no cells. All cultures and
dilution references were shaken for 16 h at 37.degree. C. The
resulting saturated cultures were centrifuged 8 min at 3,000 g, and
the supernatant was filtered using a 0.22 .mu.m cellulose acetate,
Spin-X centrifuge tube filter (Costar), and the samples were stored
at 4.degree. C.
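The inoculation step above involves a simple titer calculation, sketched below in Python. The stock titer in the example is a hypothetical value. Expressing propagation as the ratio of the culture titer to the dilution-reference titer is an assumption about how the reference may be used, and the function names are illustrative only.

```python
def inoculum_volume_ul(stock_titer_pfu_per_ml: float,
                       culture_volume_ml: float = 3.0,
                       target_titer_pfu_per_ml: float = 1e5) -> float:
    """Volume of phage stock (in microliters) giving the target starting titer."""
    if stock_titer_pfu_per_ml <= 0:
        raise ValueError("stock titer must be positive")
    total_pfu = target_titer_pfu_per_ml * culture_volume_ml
    return 1000.0 * total_pfu / stock_titer_pfu_per_ml


def fold_propagation(culture_titer_pfu_per_ml: float,
                     reference_titer_pfu_per_ml: float) -> float:
    """Ratio of the titer after growth to the titer of the cell-free dilution reference."""
    if reference_titer_pfu_per_ml <= 0:
        raise ValueError("reference titer must be positive")
    return culture_titer_pfu_per_ml / reference_titer_pfu_per_ml


# Hypothetical stock at 1e9 pfu/mL added to a 3 mL culture.
print(inoculum_volume_ul(1e9))       # about 0.3 microliters
print(fold_propagation(2e9, 1e5))    # 20000.0
```

A calculated volume well below 1 .mu.L, as in this example, suggests diluting the stock before inoculation so that a pipettable volume can be used.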
TABLE-US-00001 TABLE 1 Plasmids
Plasmid Name | Class (resistance) | Origin | ORF1 Prom | ORF1 [RBS].sup.2 Genes | ORF2 Prom | ORF2 Genes | ORF3 Prom | ORF3 [RBS] Genes | PACE Experiments |
pDB007(+) | AP (carb.sup.R) | SC101 | P.sub.T7 | [SD8] gIII, luxAB | P.sub.ProK | tyrT.sup.Opt.sub.CUA | -- | -- | p-NFRS |
pDB021CH(+) | AP (carb.sup.R) | SC101 | P.sub.T7 | [SD8] gIII, luxAB | P.sub.ProK | pylT | -- | -- | Pyl-1, Pyl-2 |
pDB026a | AP (carb.sup.R) | SC101 | P.sub.psp | [SD8] gIII(P29*), luxAB | P.sub.ProK | tyrT.sup.Opt.sub.CUA | -- | -- | |
pDB026b | AP (carb.sup.R) | SC101 | P.sub.psp | [SD8] gIII(P83*), luxAB | P.sub.ProK | tyrT.sup.Opt.sub.CUA | -- | -- | |
pDB026c | AP (carb.sup.R) | SC101 | P.sub.psp | [SD8] gIII(T177*), luxAB | P.sub.ProK | tyrT.sup.Opt.sub.CUA | -- | -- | |
pDB026d | AP (carb.sup.R) | SC101 | P.sub.psp | [SD8] gIII(Y184*), luxAB | P.sub.ProK | tyrT.sup.Opt.sub.CUA | -- | -- | |
pDB026e | AP (carb.sup.R) | SC101 | P.sub.psp | [SD8] gIII(P29*, Y184*), luxAB | P.sub.ProK | tyrT.sup.Opt.sub.CUA | -- | -- | |
pDB026f | AP (carb.sup.R) | SC101 | P.sub.psp | [SD8] gIII(P29*, P83*, Y184*), luxAB | P.sub.ProK | tyrT.sup.Opt.sub.CUA | -- | -- | |
pDB026g | AP (carb.sup.R) | SC101 | P.sub.psp | [SD8] gIII(P29*, P83*, T177*, Y184*), luxAB | P.sub.ProK | tyrT.sup.Opt.sub.CUA | -- | -- | |
pJC175e | AP (carb.sup.R) | SC101 | P.sub.psp | [SD8] gIII, luxAB | -- | -- | -- | -- | |
pDB038 | AP (spec.sup.R) | ColE1 | P.sub.psp | [SD8] gIII(P29*), luxAB | P.sub.ProK | pylT | -- | -- | Pyl-3 |
pDB038a | AP (spec.sup.R) | ColE1 | P.sub.psp | [SD8] gIII(P29*, Y184*), luxAB | P.sub.ProK | pylT | -- | -- | Pyl-3 |
pDB038b | AP (spec.sup.R) | ColE1 | P.sub.psp | [SD8] gIII(P29*, P83*, Y184*), luxAB | P.sub.ProK | pylT | -- | -- | Pyl-3 |
pDB007ns2a | AP.sup.- (carb.sup.R) | SC101 | P.sub.psp | [SD8] gIII | P.sub.ProK | tyrT.sup.Opt.sub.CUA | P.sub.tet | [SD4] T7RNAP(S12*, S203*) | p-NFRS |
pDB036a | AP.sup.- (carb.sup.R) | SC101 | P.sub.psp | [SD8] gIII | P.sub.ProK | tyrT.sup.Opt.sub.CUA | P.sub.proD | [SD4] T7RNAP(S12*, S203*) | Countersel. |
pDB036d | AP.sup.- (carb.sup.R) | SC101 | P.sub.psp | [SD8] gIII | P.sub.ProK | tyrT.sup.Opt.sub.CUA | P.sub.proA | [SD4] T7RNAP(S12*, S203*) | Countersel. |
pDB023f | CP (spec.sup.R) | ColE1 | P.sub.psp | [SD8] T7RNAP(S12*, S203*) | -- | -- | -- | -- | Pyl-1, Pyl-2 |
pDB023f1 | CP (spec.sup.R) | ColE1 | P.sub.psp | [SD4] T7RNAP(S12*, S203*) | -- | -- | -- | -- | p-NFRS |
pDB023k | CP (spec.sup.R) | ColE1 | P.sub.psp | [SD8] T7RNAP(S12*, S203*, S527*) | -- | -- | -- | -- | |
pDB016 | CP.sup.- (spec.sup.R) | ColE1 | P.sub.T7 | [SD8] gIII-neg | -- | -- | -- | -- | p-NFRS, Countersel. |
DP4 | DP (chlor.sup.R) | cloDF13 | P.sub.psp | dnaQ926, dam, seqA | P.sub.C | araC | P.sub.psp-tet | [sd8] gIII | Pyl-1, Pyl-2, p-NFRS |
DP6 | DP (chlor.sup.R) | cloDF13 | P.sub.psp | dnaQ926, dam, seqA, emrR, ugi, cda1 | P.sub.C | araC | P.sub.psp-tet | [sd8] gIII | Pyl-3 |
pBAD-sfGFP | EP (carb.sup.R) | pBR322 | P.sub.BAD | sfGFP-6xHis variant | P.sub.C | araC | -- | -- | |
pDB005x(-) | EP (carb.sup.R) | SC101 | P.sub.lacZ | [SD8] chPylRS | P.sub.ProK | pylT | P.sub.T7 | [SD8] luxAB | |
pDB007xb(-) | EP (carb.sup.R) | SC101 | P.sub.lacZ | [SD8] p-NFRS | P.sub.ProK | tyrT.sup.Opt.sub.CUA | P.sub.T7 | [SD8] luxAB | |
pDB027c | EP (carb.sup.R) | SC101 | P.sub.BAD | [SD8] luxAB(Y361*), [SD8] MjTyrRS variant | P.sub.ProK | tyrT.sup.Opt.sub.CUA | P.sub.C | araC | |
pDB032c | EP (carb.sup.R) | SC101 | P.sub.BAD | [SD8] luxAB(Y361*), [SD8] PylRS variant | P.sub.ProK | pylT | P.sub.C | araC | |
pDB059c | EP (carb.sup.R) | SC101 | P.sub.BAD | [SD8] luxAB(Y361*) | P.sub.C | araC | -- | -- | |
pDB070 | EP (chlor.sup.R) | p15A | P.sub.tet | MjTyrRS variant | P.sub.ProK | tyrT.sup.Opt.sub.CUA | P.sub.PN25 | TetR | |
pTECH-AcK3RS | EP (chlor.sup.R) | p15A | P.sub.lpp | AcK3RS variant | P.sub.ProK | pylT | -- | -- | |
pTECH-PylRS | EP (chlor.sup.R) | p15A | P.sub.lpp | PylRS variant | P.sub.ProK | pylT | -- | -- | |
pET28b(+)-sfGFP | EP (Kan.sup.R) | pBR322 | P.sub.T7 | sfGFP-6xHis variant | P.sub.I | LacI | -- | -- | |
pDB009a | EP (spec.sup.R) | ColE1 | P.sub.tet | [SD8] wt T7 RNAP | -- | -- | -- | -- | |
pDB009b | EP (spec.sup.R) | ColE1 | P.sub.tet | [SD8] T7 RNAP(S12*) | -- | -- | -- | -- | |
pDB009c | EP (spec.sup.R) | ColE1 | P.sub.tet | [SD8] T7 RNAP(S203*) | -- | -- | -- | -- | |
pDB009d | EP (spec.sup.R) | ColE1 | P.sub.tet | [SD8] T7 RNAP(S527*) | -- | -- | -- | -- | |
pDB009f | EP (spec.sup.R) | ColE1 | P.sub.tet | [SD8] T7 RNAP(S12*, S203*) | -- | -- | -- | -- | |
pDB009g | EP (spec.sup.R) | ColE1 | P.sub.tet | [SD8] T7 RNAP(Y250*) | -- | -- | -- | -- | |
pDB009h | EP (spec.sup.R) | ColE1 | P.sub.tet | [SD8] T7 RNAP(Y312*) | -- | -- | -- | -- | |
pDB009i | EP (spec.sup.R) | ColE1 | P.sub.tet | [SD8] T7 RNAP(Y250*, Y312*) | -- | -- | -- | -- | |
pDB009j | EP (spec.sup.R) | ColE1 | P.sub.tet | [SD8] T7 RNAP(S12*, S527*) | -- | -- | -- | -- | |
pDB060-AcK3RS | EP (spec.sup.R) | ColE1 | P.sub.lpp | AcK3RS variant | P.sub.ProK | pylT | -- | -- | |
pDB060-IFRS | EP (spec.sup.R) | ColE1 | P.sub.lpp | IFRS variant | P.sub.ProK | pylT | -- | -- | |
pDB060-PylRS | EP (spec.sup.R) | ColE1 | P.sub.lpp | PylRS variant | P.sub.ProK | pylT | -- | -- | |
MP4 | MP (chlor.sup.R) | cloDF13 | P.sub.psp | dnaQ926, dam, seqA | P.sub.C | araC | -- | -- | Pyl-2, p-NFRS |
SP-Kan | SP (kan.sup.R) | M13 f1 | P.sub.gIII | Kan | -- | -- | -- | -- | |
SP-chPylRS | SP (none) | M13 f1 | P.sub.gIII | [SD4] chPyl | -- | -- | -- | -- | Pyl-1 |
SP-MBP-TEV | SP (none) | M13 f1 | P.sub.gIII | [SD8] MBP-TEV | -- | -- | -- | -- | |
SP-p-NFRS | SP (none) | M13 f1 | P.sub.gIII | [SD4] p-NFRS | -- | -- | -- | -- | p-NFRS |
[0101] Plaque assay. S1030 cells transformed with the appropriate
plasmids were grown in 2.times.YT liquid media (United States
Biologicals) supplemented with antibiotics required for plasmid
maintenance to A.sub.600=0.6-0.8. Phage supernatant was serially
diluted at 10-fold or 100-fold increments yielding either eight or
four total samples, respectively, including undiluted sample. For
each phage sample, 100 .mu.L of cells were combined with 10 .mu.L
of phage. Within 2 min from phage infection, 950 .mu.L of
55.degree. C. top agar (7 g/L bacteriological agar in 2.times.YT;
no antibiotics) was added and mixed with the phage-infected cells
by gentle pipetting once up and down while avoiding formation of
bubbles. The final mixtures were plated onto quartered Petri plates
that had been previously poured with 1.5 mL of bottom agar (15 g/L
bacteriological agar in 2.times.YT; no antibiotics). Once the
overlaid agar congealed, the plates were incubated 16 h at
37.degree. C. to allow plaque formation. When plaque formation was
dependent on orthogonal AARS activity, 1 mM ncAA was also added to
all liquid and solid media when denoted. When clonal-phage isolates
were required, well separated plaques were picked from plates and
grown individually at 37.degree. C. while shaking in 3 mL of DRM
supplemented with 1 mM ncAA of interest where required. The
resulting saturated cultures were pelleted at 3,000 g for 8 min,
and the phage supernatant was sterile filtered and stored at
4.degree. C. for further analysis.
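Phage titers may be calculated from plaque counts obtained with the dilution schemes described above. The Python sketch below assumes that 10 .mu.L of diluted phage is plated, as stated, and that the titer is reported for the undiluted sample; the plaque count in the example is hypothetical, as are the function names.

```python
def dilution_series(step: int, samples: int) -> list[int]:
    """Dilution factors for a serial dilution that includes the undiluted sample."""
    return [step ** i for i in range(samples)]


def titer_pfu_per_ml(plaques: int, dilution_factor: float,
                     plated_volume_ul: float = 10.0) -> float:
    """Titer of the undiluted sample from a plaque count at one dilution."""
    if plated_volume_ul <= 0:
        raise ValueError("plated volume must be positive")
    # 1000 converts the plated volume from microliters to milliliters.
    return plaques * dilution_factor * 1000.0 / plated_volume_ul


print(dilution_series(10, 8))    # eight samples, 10-fold steps
print(dilution_series(100, 4))   # four samples, 100-fold steps

# Hypothetical count: 42 plaques on the quadrant plated from the 10^4-fold dilution.
print(titer_pfu_per_ml(42, 10 ** 4))  # 4.2e7 pfu/mL
```

Both dilution schemes start at the undiluted sample; the 10-fold series extends to a 10.sup.7-fold dilution and the 100-fold series to a 10.sup.6-fold dilution.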
[0102] Phage-assisted continuous evolution of aminoacyl-tRNA
synthetases. In general, the PACE apparatus (including host-cell
strains, lagoons, chemostats, and media) was used as previously
described, for example in WO2010/028347. All liquid and solid media
contained antibiotics required for plasmid maintenance unless
indicated otherwise. To prepare each PACE strain, the accessory
plasmid (AP), complementary plasmid (CP), and MP or drift plasmid
(DP) of interest were cotransformed into electrocompetent S1030
cells, which recovered for 1 h in SOC medium without antibiotics
(New England Biolabs). The recovered transformants were plated onto
2.times.YT agar containing 0.4% glucose to prevent induction of
mutagenesis prior to PACE, and colonies were grown for 16-20 h in a
37.degree. C. incubator. Three colonies were picked and resuspended
in DRM. A portion of each suspension was tested for arabinose
sensitivity as previously described, and the remainder was used to
inoculate liquid cultures in DRM, which were subsequently grown for
16 h in a 230 rpm shaker at 37.degree. C.
[0103] Each PACE chemostat was prepared by diluting an
arabinose-sensitive overnight culture into 40 mL or 80 mL of DRM,
which was supplemented with ncAA where noted, and the chemostats
grew at 37.degree. C. while stirring with a magnetic stir bar. Once
the culture reached an approximate cell density of A.sub.600=1.0,
fresh DRM (supplemented with ncAA where noted) was used to
continuously dilute the chemostat culture at a dilution rate of 1.6
chemostat volumes per h while maintaining a constant culture volume
as previously described.
[0104] Lagoons flowing from the chemostats were continuously
diluted using the indicated flow rates while maintaining a 25-mL
constant volume by adjusting the height of the needle drawing waste
out of each lagoon. All lagoons were supplemented with 25 mM
arabinose from a syringe pump to induce mutagenesis from the MP or
DP, unless otherwise indicated. Arabinose supplementation began at
least two hours prior to phage infection to ensure cells were
maximally induced at the start of each experiment. Lagoons were
also supplemented with anhydrotetracycline (ATc), where noted, to
induce either genetic drift (mutagenesis under weak or no selective
pressure) or negative selection depending on the nature of the
host-cell plasmids.
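The dilution rates stated for the chemostats and lagoons translate into volumetric flow rates as sketched below in Python. The calculation assumes a well-mixed vessel held at constant volume, for which the mean residence time is the reciprocal of the dilution rate; the function names are hypothetical.

```python
def flow_ml_per_h(dilution_rate_per_h: float, vessel_volume_ml: float) -> float:
    """Volumetric flow needed to exchange the stated number of vessel volumes per hour."""
    return dilution_rate_per_h * vessel_volume_ml


def mean_residence_time_h(dilution_rate_per_h: float) -> float:
    """Mean residence time in a well-mixed, constant-volume vessel."""
    if dilution_rate_per_h <= 0:
        raise ValueError("dilution rate must be positive")
    return 1.0 / dilution_rate_per_h


# Chemostats diluted at 1.6 volumes per h.
for volume_ml in (40, 80):
    print(volume_ml, "mL chemostat:", flow_ml_per_h(1.6, volume_ml), "mL/h")

# 25 mL lagoon at 1 and 2 lagoon volumes per h.
for rate in (1, 2):
    print(rate, "vol/h lagoon:", flow_ml_per_h(rate, 25), "mL/h;",
          mean_residence_time_h(rate), "h mean residence time")
```

Under these assumptions, a 40 mL chemostat requires roughly 64 mL of fresh medium per hour, and doubling the lagoon flow rate halves the mean time that phage and host cells spend in the lagoon.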
[0105] Samples of evolving SP pools were taken periodically at
indicated time points from the waste line of each lagoon. Collected
samples were centrifuged at 10,000 g for 2 min, and the supernatant
was passed through a 0.22 .mu.m filter and stored at 4.degree. C.
for subsequent analysis. Phage titers were determined by plaque
assays using S1059 cells (containing the phage-responsive pJC175e
plasmid to report total phage titer) and untransformed S1030 cells
(reporting cheaters from unwanted recombination of gene III into
the SP) for all collected samples. Activity-dependent plaque assays
were performed for mock selection PACE experiments, using S1030
cells cotransformed with the AP and CP used in the host cells of
the corresponding experiment. Mock selections were also monitored
by PCR performed on phage aliquots using primers DB212
(5'-CAAGCCTCAGCGACCGAATA; SEQ ID NO: 1) and DB213
(5'-GGAAACCGAGGAAACGCAA; SEQ ID NO: 2), which anneal to regions of
the phage backbone flanking the gene of interest.
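Basic properties of the two primers, and the size of the product expected from a given template, can be computed as sketched below in Python. Treating DB212 as the forward primer and DB213 as the reverse primer is an assumption, as is the toy template, which is not a phage sequence; the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

PRIMERS = {
    "DB212": "CAAGCCTCAGCGACCGAATA",  # SEQ ID NO: 1
    "DB213": "GGAAACCGAGGAAACGCAA",   # SEQ ID NO: 2
}


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def amplicon_length(template: str, forward: str, reverse: str):
    """Length of the product on the given strand, or None if a primer site is missing."""
    start = template.find(forward)
    if start == -1:
        return None
    end = template.find(reverse_complement(reverse), start + len(forward))
    if end == -1:
        return None
    return end + len(reverse) - start


for name, seq in PRIMERS.items():
    print(name, len(seq), "nt, GC fraction", round(gc_fraction(seq), 2))

# Toy template: primer sites flanking a 100-nt placeholder insert.
toy_insert = "ACGT" * 25
toy_template = ("TTTT" + PRIMERS["DB212"] + toy_insert
                + reverse_complement(PRIMERS["DB213"]) + "TTTT")
print(amplicon_length(toy_template, PRIMERS["DB212"], PRIMERS["DB213"]))  # 139
```

Because the primers anneal to backbone regions flanking the gene of interest, the product size changes with the size of the encoded insert, which is what allows different SPs to be distinguished by PCR.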
[0106] Evolution of chPylRS (Pyl-1). Host cells cotransformed with
pDB021CH(+), pDB023f, and DP4 were maintained in an 80 mL chemostat
using media containing 1 mM BocK. At the beginning of PACE, genetic
drift was induced (200 ng/mL ATc) in a lagoon that was flowing from
the chemostat at 1 lagoon volume per h. The lagoon was infected
with 10.sup.8 pfu of SP-chPylRS to start the experiment. ATc
supplementation was adjusted to 20 ng/mL at 16 h of PACE to slowly
reduce the amount of genetic drift, and ATc supplementation was
stopped at 24 h. The lagoon flow rate was increased to 2 volumes
per h at 40 h of PACE to increase selection stringency for the
remainder of the experiment, which ended at 120 h.
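The Pyl-1 conditions form a piecewise-constant schedule, which can be encoded and queried as sketched below in Python. The sketch assumes that each change takes effect exactly at the stated time point; the variable and function names are hypothetical.

```python
from bisect import bisect_right

# (start time in h, value) pairs taken from the Pyl-1 description.
ATC_NG_PER_ML = [(0, 200), (16, 20), (24, 0)]
FLOW_LAGOON_VOL_PER_H = [(0, 1), (40, 2)]
END_H = 120


def value_at(schedule, t_h: float):
    """Value of a piecewise-constant schedule at time t_h (hours of PACE)."""
    if not 0 <= t_h <= END_H:
        raise ValueError("time is outside the experiment")
    starts = [start for start, _ in schedule]
    return schedule[bisect_right(starts, t_h) - 1][1]


def lagoon_volumes_exchanged(schedule, end_h: float = END_H):
    """Total number of lagoon volumes passed through the lagoon by end_h."""
    total = 0
    for i, (start, rate) in enumerate(schedule):
        stop = schedule[i + 1][0] if i + 1 < len(schedule) else end_h
        total += rate * (stop - start)
    return total


for t in (0, 16, 24, 40, 120):
    print(t, "h:", value_at(ATC_NG_PER_ML, t), "ng/mL ATc,",
          value_at(FLOW_LAGOON_VOL_PER_H, t), "lagoon vol/h")

print(lagoon_volumes_exchanged(FLOW_LAGOON_VOL_PER_H))  # 1*40 + 2*80 = 200
```

Under this schedule the lagoon contents are exchanged about 200 times over the 120 h experiment.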
[0107] Continuation of chPylRS evolution (Pyl-2). Three
preparations of media were used, which contained different
concentrations of N.epsilon.-(tert-butoxycarbonyl)-L-lysine (BocK) (DRM-A:
1 mM BocK; DRM-B: 0.5 mM BocK; DRM-C: 0.25 mM BocK). Host cells
cotransformed with pDB021CH(+), pDB023f, and MP4 were maintained in
a 40 mL chemostat containing DRM-A at the start of the experiment.
A single lagoon was flowed from the chemostat at 1 lagoon volume
per h, and the experiment was initiated by infecting the lagoon
with 100 .mu.L (2.times.10.sup.4 pfu) of the evolved pool of SP
collected from the 120-h end point of Pyl-1. To increase the
selection stringency during the experiment, the media being pumped
into the chemostat was changed to DRM-B at 42 h of PACE and was
changed to DRM-C at 69 h. The experiment ended at 168 h.
[0108] Continuation of chPylRS evolution (Pyl-3). PACE was
conducted in two separate lagoons, L1 and L2, and a concentration
of 1 mM BocK was maintained throughout the experiment. Selection
stringency was increased during the experiment by modulating the
lagoon flow rate and altering the ratio of host cells in the
lagoons to increase the number of amber suppression events required
to produce full-length pIII during translation (Host-A: pDB038 and
DP6; Host-B: pDB038a and DP6; Host-C: pDB038b and DP6). Host cells
were maintained separately in three 40 mL chemostats (C1-C3,
respectively), and each chemostat was individually prepared and
coupled to both lagoons, as needed, over the course of the
experiment to minimize media waste and to minimize the total growth
time of each chemostat culture.
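The effect of requiring more amber suppression events can be illustrated with a simple model. The sketch below assumes, purely for illustration, that each amber codon is suppressed independently with the same efficiency, so that the fraction of translation events yielding full-length pIII falls geometrically with the number of amber codons. The efficiency value and the codon counts are hypothetical; the number of amber codons carried by each host is not specified in this passage.

```python
# Idealized model (an assumption, not a measured relationship):
# each amber codon is read through independently with the same efficiency.

def full_length_fraction(per_codon_efficiency, n_amber_codons):
    """Fraction of translation events that read through every amber codon."""
    if not 0.0 <= per_codon_efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if n_amber_codons < 0:
        raise ValueError("codon count must be non-negative")
    return per_codon_efficiency ** n_amber_codons


if __name__ == "__main__":
    efficiency = 0.3  # hypothetical per-codon suppression efficiency
    for n in (1, 2, 3):
        print(n, full_length_fraction(efficiency, n))
```

In this model, an AARS variant must raise its per-codon suppression efficiency to maintain the same pIII output each time an additional amber codon is introduced, which is one way to picture the increase in stringency.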
[0109] At the start of the experiment L1 and L2 were continuously
diluted with Host-A from C1 at a rate of 0.5 lagoon volumes per h,
and genetic drift was induced only in L1 (100 ng/mL ATc). Each
lagoon was infected with 10.sup.8 pfu of clonal-phage isolate
SP-Pyl2.288-2, which was isolated from the Pyl-2 segment (Table 2;
bold residues responsible for enhancing activity of chPylRS). The
flow rate from C1 through each lagoon was increased to 1 lagoon
volume per h at 41 h of PACE. At the 91-h mark of the experiment,
L1 and L2 were fed a 1:1 mixture of Host-A:Host-B supplied from C1
and C2, respectively, and the flow through each lagoon was
maintained at 1 lagoon volume per h. At the 120-h mark, 100% Host-B
was flowed to each lagoon at 0.5 lagoon volumes per h, and the flow
rate was later doubled to 1 lagoon volume per h at 136 h of PACE.
At the 162 h mark, L1 and L2 were fed a 1:1 mixture of
Host-B:Host-C supplied from C2 and C3, respectively, and the flow
through each lagoon was maintained at 1 lagoon volume per h. ATc
supplementation to L1 was stopped at 184 h of PACE to end genetic
drift. At the 190-h time point of PACE, 100% Host-C was flowed to
each lagoon at 0.5 lagoon volumes per h for the remainder of the
experiment, which was stopped at 209 h.
TABLE-US-00002 TABLE 2
Clones: Pyl-2.162.1-5 | Pyl-2.189.1-5 | Pyl-2.288.1-5 (clones 1 2 3 4 5 of each)
chPylRS Residue | Substitutions
D2 | E
D7 | E
A12 | G
V31 | I I I I I I I I I I I I I I
T56 | P P P P P P P P P P P P P P
H62 | Y
E77 | K
T91 | S
A100 | E E E E E
K104 | E
R113 | H
L118 | M
A150 | V
R217 | S
D257 | G G G G G G G G G G G G G
N259 | S S
L266 | I I
P282 | S
I327 | M M
G336 | E
D338 | E
[0110] Evolution of p-NFRS with dual selection. The media of the
positive selection contained 1 mM p-IF, and media of the negative
selection contained 4 mM p-NF. Three host-cell strains (Host-A:
pDB007(+), pDB023fl, and DP4; Host-B: pDB007(+), pDB023fl, and MP4;
Host-C: pDB007(+)ns2a, pDB016, and MP4) were used and were
maintained separately in three 80 mL chemostats (C1-C3,
respectively). Host-A was pumped into a positive-selection lagoon
(L1-pos) at a flow rate of 1 lagoon volume per h, and genetic drift
was induced by supplementation with 200 ng/mL ATc to the lagoon.
L1-pos was infected with 10.sup.8 pfu of SP-p-NFRS to initiate the
experiment, and supplementation of ATc was stopped at 24 h to end
genetic drift. Concomitant with the end of genetic drift, C1 was
disconnected from L1-pos, and the lagoon was connected to C2
(containing Host-B), which was pumped into L1-pos at 1 lagoon
volume per h. Also at this time, L1-pos was cross coupled to a
second lagoon (L2-neg), which was being continuously flowed with
negative-selection Host-C from C3. The maximum level of
negative-selection stringency from Host-C was maintained by
supplementing L2-neg with 30 ng/mL ATc. Cross coupling of the
opposing selection lagoons was accomplished using two lines of
Masterflex Microbore two-stop tubes (silicone; platinum cured; 0.89
mm ID) (Cole-Parmer), which each had a dead volume of 1 mL. One of
the tubes was used to transfer small volumes of culture from L1-pos
into L2-neg, and the second tube transferred material in the
opposing direction. Material was peristaltically pumped through the
cross-coupling lines with a Masterflex L/S Standard Digital Drive
(Cole-Parmer) equipped with a Masterflex L/S 8-channel multichannel
pump head for microbore tubing (Cole-Parmer). The flow rate through
each cross-coupling line was initially set to 0.5 mL/h to maintain
a 50-fold dilution of the transferred material into the opposing
lagoons. The flow rate through L1-pos and L2-neg was doubled to 2
lagoon volumes per h at 28 h, and flow through the cross-coupling
lines was adjusted to 1 mL/h to maintain 50-fold dilution of
transferred material in each direction. The experiment ended at 48
h.
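The relationship between lagoon flow and cross-coupling flow can be checked with simple arithmetic. The sketch below is illustrative only: it defines dilution as the ratio of host-cell inflow to transfer flow (ignoring the small contribution of the transfer line itself to total inflow), and it assumes a lagoon volume of 25 mL, which is not stated in this passage but is the value at which the quoted transfer rates give a 50-fold dilution.

```python
# Illustrative arithmetic for the cross-coupling lines.
# ASSUMPTION: lagoon volume of 25 mL (not stated in the text).

ASSUMED_LAGOON_VOLUME_ML = 25.0


def transfer_rate_ml_per_h(lagoon_volume_ml, volumes_per_h, fold_dilution):
    """Transfer flow that is diluted `fold_dilution`-fold by the host-cell inflow."""
    host_inflow_ml_per_h = lagoon_volume_ml * volumes_per_h
    return host_inflow_ml_per_h / fold_dilution


if __name__ == "__main__":
    # 1 lagoon volume per h, 50-fold dilution
    print(transfer_rate_ml_per_h(ASSUMED_LAGOON_VOLUME_ML, 1.0, 50.0))
    # 2 lagoon volumes per h, 50-fold dilution
    print(transfer_rate_ml_per_h(ASSUMED_LAGOON_VOLUME_ML, 2.0, 50.0))
```

Doubling the lagoon flow rate requires doubling the transfer rate to hold the dilution constant, consistent with the adjustment from 0.5 mL/h to 1 mL/h described above.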
[0111] Luciferase assay. S1030 cells (25 .mu.L) were electroporated
with the appropriate plasmid(s) and recovered in SOC media (New
England Biolabs) for 1 h while shaking at 37.degree. C. Transformed
cells were plated and grown overnight at 37.degree. C. on LB agar
containing the antibiotics required for plasmid maintenance. Single
colonies were used to inoculate 2-3 mL of DRM containing
antibiotics and were grown overnight at 37.degree. C. while shaking
at 230 rpm. The saturated overnight cultures were diluted 100-fold
in a 96-well deep well plate using 1 mL of DRM containing the
required antibiotic and supplemented with 1 mM ncAA where denoted.
The plate was shaken at 37.degree. C. for 2 h at 230 rpm and then
supplemented with the indicated concentration of
isopropyl-.beta.-D-thiogalactopyranoside (IPTG), anhydrotetracycline (ATc),
or 1 mM arabinose (depending on the nature of the plasmids) to
induce protein expression. The plate continued to incubate with
shaking at 37.degree. C. for an additional 2-3 h until maximum
luminescence signal was observed. Each luminescence measurement was
taken on 150 .mu.L of each culture, which had been transferred to a
96-well black wall, clear bottom plate (Costar). The A.sub.600 and
luminescence measurements from each well were taken using an
Infinite M1000 Pro microplate reader (Tecan). Background A.sub.600
measurements were taken on wells containing media only. The raw
luminescence value from each well was divided by the
background-subtracted A.sub.600 value of the corresponding well to
provide the luminescence value normalized to cell density. All
variants were assayed in at least biological triplicate, and error
bars represent the standard deviation of the independent
measurements.
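The normalization described above can be expressed as a short calculation. The following is a minimal sketch with hypothetical function names and made-up readings; it divides raw luminescence by background-subtracted A.sub.600 and reports the mean and sample standard deviation of the replicates.

```python
# Sketch of the luminescence normalization described above.
# Function names and example readings are hypothetical.
from statistics import mean, stdev


def normalized_luminescence(raw_luminescence, a600, a600_background):
    """Raw luminescence divided by background-subtracted A600."""
    density = a600 - a600_background
    if density <= 0:
        raise ValueError("background-subtracted A600 must be positive")
    return raw_luminescence / density


def summarize_replicates(replicates, a600_background):
    """Return (mean, sample standard deviation) over (luminescence, A600) pairs."""
    values = [
        normalized_luminescence(lum, a600, a600_background)
        for lum, a600 in replicates
    ]
    return mean(values), stdev(values)


if __name__ == "__main__":
    # Three hypothetical biological replicates: (raw luminescence, A600)
    example = [(1000.0, 0.55), (1320.0, 0.65), (1045.0, 0.60)]
    print(summarize_replicates(example, a600_background=0.05))
```

The sample standard deviation is used here because the replicates are treated as independent measurements; at least two replicates are required for it to be defined.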
[0112] sfGFP assay. In assays of MjTyrRS variants (p-NFRS, p-IFRS,
and PACE-evolved), a pDB070 plasmid containing the AARS of interest
and a pET28b(+) containing the superfolder GFP (sfGFP) of interest
were cotransformed into chemically competent BL21 Star (DE3) cells
(Thermo Fisher Scientific). The transformed cells recovered in SOC
(New England Biolabs) for 1 h while shaking at 37.degree. C. and
were then plated and grown overnight at 37.degree. C. on LB agar
containing 50 .mu.g/mL kanamycin and 25 .mu.g/mL chloramphenicol.
Single colonies were used to inoculate 2 mL of LB media (United
States Biologicals) containing antibiotics and were grown overnight
at 37.degree. C. while shaking at 230 rpm. The saturated overnight
cultures were diluted 100-fold in a 96-well deep well plate using
600 .mu.L of LB media containing the required antibiotic and were
grown at 37.degree. C. to a cell density of A.sub.600=0.3 while
shaking at 230 rpm. AARS expression was induced by addition of LB
media (200 .mu.L) containing antibiotics and the additional
components to provide each well with a final concentration of 200
ng/mL anhydrotetracycline (ATc) and 1 mM ncAA where indicated.
Incubation continued until cultures reached a cell density of
A.sub.600=0.5. Each well was then supplemented with 1 mM
isopropyl-.beta.-D-thiogalactopyranoside (IPTG) to induce sfGFP
expression.
[0113] In assays of PylRS variants (including AcK3RS variants), a
pTECH plasmid containing the AARS of interest and a pBAD plasmid
containing the sfGFP of interest were cotransformed into chemically
competent TOP10 cells (Thermo Fisher Scientific). The transformed
cells recovered in SOC (New England Biolabs) for 1 h while shaking
at 37.degree. C. and were then plated and grown overnight at
37.degree. C. on LB agar containing 100 .mu.g/mL carbenicillin and
25 .mu.g/mL chloramphenicol. Single colonies were used to inoculate
2-3 mL of LB media (United States Biologicals) containing
antibiotics and were grown overnight at 37.degree. C. while shaking
at 230 rpm. The saturated overnight cultures were diluted 100-fold
in a 96-well deep well plate using 500 .mu.L of LB media containing
the required antibiotic. The plate was shaken at 37.degree. C. for
3 h at 230 rpm and an additional 500 .mu.L of LB was added
containing antibiotics and additional components to provide each
well with a final concentration of 1 mM ncAA where denoted and 1.5
mM arabinose to induce expression of sfGFP.
[0114] For all experiments, the cultures incubated with shaking at
37.degree. C. for an additional 16 h after induction of sfGFP, and
150 .mu.L of each culture was transferred to a 96-well black wall,
clear bottom plate (Costar). The A.sub.600 and fluorescence
(excitation=485 nm; emission=510 nm; bandwidth of excitation and
emission=5 nm) readings from each well were taken using an Infinite
M1000 Pro microplate reader (Tecan). Background A.sub.600 and
background fluorescence measurements were taken on wells containing
LB media only. The background-subtracted fluorescence value from
each well was divided by the background-subtracted A.sub.600 value
of the same well to provide the fluorescence value normalized to
cell density. All variants were assayed in at least biological
triplicate, and error bars represent the standard deviation of the
independent measurements.
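For the fluorescence readout, both the fluorescence and the A.sub.600 values are background-subtracted before taking the ratio. The sketch below is one illustrative way to apply this across a plate, assuming that the background is taken as the mean of the media-only wells; the well names, readings, and function name are hypothetical.

```python
# Sketch of plate-wide fluorescence normalization.
# ASSUMPTION: background = mean of media-only wells. Well IDs and values are made up.
from statistics import mean


def plate_normalized_fluorescence(samples, blanks):
    """Map each sample well to (F - F_bg) / (A600 - A600_bg).

    `samples` and `blanks` map well IDs to (fluorescence, A600) tuples.
    """
    fluor_bg = mean([f for f, _ in blanks.values()])
    a600_bg = mean([a for _, a in blanks.values()])
    result = {}
    for well, (fluor, a600) in samples.items():
        density = a600 - a600_bg
        if density <= 0:
            raise ValueError(f"well {well}: non-positive corrected A600")
        result[well] = (fluor - fluor_bg) / density
    return result


if __name__ == "__main__":
    blanks = {"H11": (100.0, 0.04), "H12": (120.0, 0.06)}
    samples = {"A1": (5110.0, 1.05), "A2": (610.0, 1.05)}
    print(plate_normalized_fluorescence(samples, blanks))
```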
[0115] Protein expression and purification of sfGFP. Expression of
His-tagged sfGFP was performed using the plasmids, cell strains,
and antibiotic concentrations described in the methods for sfGFP
assays. Saturated overnight cultures were prepared from single
colonies of cotransformed cells, which were diluted 1,000-fold into
300 mL of LB media containing 1 mM ncAA where denoted and were
grown while shaking at 230 rpm at 37.degree. C. Once the cultures
utilizing a pDB070 plasmid grew to a cell density of A.sub.600=0.3,
AARS expression was induced by supplementing with
anhydrotetracycline (ATc) to a final concentration of 200 ng/mL,
and incubation was continued. Cultures utilizing a pTECH plasmid
did not require this step as the AARS was expressed constitutively.
Once cultures grew to a cell density of A.sub.600=0.5, sfGFP
expression was induced by supplementation with a final
concentration of 1 mM isopropyl-.beta.-D-thiogalactopyranoside (IPTG)
for cultures utilizing a pET28b(+) plasmid or a final concentration
of 1 mM arabinose for cultures utilizing a pBAD plasmid. Incubation
with shaking at 37.degree. C. continued for an additional 16 h
after induction of sfGFP expression. Cells were harvested by
centrifugation at 5,000 g for 10 min at 4.degree. C., and the
resulting pellets were resuspended in B-PER II Bacterial Protein
Extraction Reagent (Thermo Fisher Scientific) containing EDTA-free
protease inhibitor cocktail (Roche). The soluble fractions of the
cell lysates were diluted with an equal volume of equilibration
buffer (20 mM Tris (pH 7.4), 10 mM imidazole, 300 mM NaCl) and were
separately loaded onto a column containing 2 mL of HisPur Ni-NTA
resin (Thermo Fisher Scientific) that had been pre-washed with two
bed-volumes of equilibration buffer. The resin was washed with two
bed-volumes of wash buffer (20 mM Tris (pH 7.4), 25 mM imidazole,
300 mM NaCl) and protein was then eluted in 3 mL of elution buffer
(20 mM Tris (pH 7.4), 250 mM imidazole, 300 mM NaCl). The purified
protein was dialyzed against 20 mM Tris (pH 7.4), 150 mM NaCl, 5 mM
EDTA, 1 mM 2-mercaptoethanol (BME), and 10% glycerol. Purified
protein was stored at -80.degree. C. until analysis.
[0116] AARS variant expression and purification for aminoacylation
assays. The genes of chPylRS variants and MjTyrRS variants were
cloned into pET15a and transformed into BL21(DE3) (New England
Biolabs) cells for expression. Cells were grown in 500 mL of LB
media supplemented with 100 .mu.g/mL ampicillin at 37.degree. C. to
an A.sub.600 of 0.6-0.8, and protein expression was induced by
addition of 0.5 mM IPTG (chPylRS variants) or 1 mM IPTG (MjTyrRS
variants). Cells were incubated at 30.degree. C. for an additional
6 h (chPylRS variants) or 4 h (MjTyrRS variants) and harvested by
centrifugation at 5,000 g for 10 min at 4.degree. C. The cell
pellet was resuspended in 15 mL of lysis buffer (50 mM Tris (pH
7.5), 300 mM NaCl, 20 mM imidazole), and cells were lysed by
sonication. The crude extract was centrifuged at 20,000 g for 30
min at 4.degree. C. The soluble fraction was loaded onto a column
containing 2 mL of Ni-NTA resin (Qiagen) previously equilibrated
with 20 mL of lysis buffer. The column was washed with 20 mL of
lysis buffer. The bound protein was then eluted with 2 mL of 50 mM
Tris (pH 7.5), 300 mM NaCl, 300 mM imidazole. The purified protein
was dialyzed with 50 mM HEPES-KOH (pH 7.5), 50 mM KCl, 1 mM DTT and
50% glycerol and stored at -80.degree. C. for further studies.
[0117] Purification of c-Myc-chPylRS-6.times.His variants. The
chPylRS variants were cloned into the pTech plasmid using insertion
primers that incorporate the N-terminal c-Myc sequence
(MEQKLISEEDL-; SEQ ID NO: 3) and the C-terminal 6.times.His
sequence (-GSHHHHHH; SEQ ID NO: 4). BL21 star (DE3) cells (Thermo
Fisher Scientific) transformed with the appropriate pTech plasmids
were grown in LB media (United States Biologicals) supplemented
with 25 .mu.g/mL chloramphenicol. For each variant, a saturated
overnight culture was prepared from a single colony, and a 1:100
dilution of culture was made into 5 mL of fresh LB media containing
chloramphenicol. The starter culture grew at 37.degree. C. while
shaking at 230 rpm until the cell density reached A.sub.600=0.3.
The starter culture was then used to inoculate a 1 L culture of LB
media containing chloramphenicol, which continued to incubate while
shaking for an additional 16 h. Cells were harvested by
centrifugation at 5,000 g for 10 min at 4.degree. C., and cell
pellets were resuspended in lysis buffer (20 mM Tris (pH 7.4), 300
mM NaCl, 10 mM imidazole, and EDTA-free protease inhibitor cocktail
(Roche)). The cells were lysed by sonication on ice, and the crude
extract was centrifuged at 15,000 g for 15 min at 4.degree. C.
Lysates were loaded onto columns containing 2 mL of HisPur Ni-NTA
resin (Thermo Fisher Scientific) that had been pre-washed with two
bed-volumes of equilibration buffer. The resin was washed with 10
bed-volumes of wash buffer (20 mM Tris (pH 7.4), 25 mM imidazole,
300 mM NaCl) and protein was then eluted in 3 mL of elution buffer
(20 mM Tris (pH 7.4), 250 mM imidazole, 300 mM NaCl). The purified
protein was dialyzed against 20 mM Tris (pH 7.4), 150 mM NaCl, 5 mM
EDTA, 1 mM dithiothreitol. Purified protein was stored in 20%
glycerol at -80.degree. C. until analysis.
[0118] Western blot analysis of c-Myc-chPylRS-6.times.His variants.
Cell lysates (30 .mu.L) of expressed protein were combined with 25
.mu.L of XT Sample Buffer (Bio-Rad), 5 .mu.L of 2-mercaptoethanol,
and 40 .mu.L water. The samples were heated at 70.degree. C. for 10
min and 7.5 .mu.L of prepared sample was loaded per well of a Bolt
Bis-Tris Plus Gel (Thermo Fisher Scientific). Precision Plus
Protein Dual Color Standard (4 .mu.L) (Bio-Rad) was used as the
reference ladder. The loaded gel was run at 200V for 22 min in
1.times. Bolt MES SDS running buffer (Thermo Fisher Scientific).
The gel was transferred to a PVDF membrane using the iBlot 2 Gel
Transfer Device (Thermo Fisher Scientific). The membrane was
blocked for 1 h at room temperature in 50% Odyssey blocking buffer
(PBS) (Li-Cor) and was then soaked 4 times for 5 min in PBS
containing 0.1% Tween-20 (PBST). The blocked membrane was soaked
with primary antibodies (rabbit anti-6.times.His (1:1,000 dilution)
(Abcam, ab9108) and mouse anti-c-Myc (1:7,000 dilution)
(Sigma-Aldrich, M4439)) in 50% Odyssey buffer (PBS) containing 0.2%
Tween-20 for 4 h at room temperature. The membrane was washed four
times in PBST, and then soaked for 1 h in the dark at room
temperature with secondary antibodies (donkey anti-mouse 800CW
(1:20,000 dilution) (Li-Cor) and goat anti-rabbit 680RD (1:20,000
dilution) (Li-Cor)) in Odyssey buffer containing 0.01% SDS, 0.2%
Tween-20. The membrane was washed 4 times in PBST and finally
rinsed with PBS. The membrane was scanned using an Odyssey Imaging
System (Li-Cor).
[0119] LCMS analysis of intact purified proteins. Purified protein
samples were diluted to 10 .mu.M in dialysis buffer lacking
reducing agent or glycerol prior to analysis on an Agilent 6220
ESI-TOF mass spectrometer equipped with an Agilent 1260 HPLC.
Separation and desalting were performed on an Agilent PLRP-S Column
(1,000 .ANG., 4.6.times.50 mm, 5 .mu.m). Mobile phase A was 0.1%
formic acid in water and mobile phase B was acetonitrile with 0.1%
formic acid. A constant flow rate of 0.250 mL/min was used. Ten
microliters of the protein solution was injected and washed on the
column for the first 3 min at 5% B, diverting non-retained
materials to waste. The protein was then eluted using a linear
gradient from 5% B to 100% B over 7 min. The mobile phase
composition was maintained at 100% B for 5 min and then returned to
5% B over 1 minute. The column was then re-equilibrated at 5% B for
the next 4 min. Data was analyzed using Agilent MassHunter
Qualitative Analysis software (B.06.00, Build 6.0.633.0 with
Bioconfirm). The charge state distribution for the protein produced
by electrospray ionization was deconvoluted to neutral charge state
using Bioconfirm's implementation of the MaxEnt algorithm, giving a
measurement of average molecular weight. The average molecular
weights of the proteins were predicted using the ExPASy Compute pI/Mw
tool (http://web.expasy.org/compute_pi/), and each calculation was
adjusted for chromophore maturation in sfGFP and any ncAA
substitutions.
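The LC program described above amounts to a piecewise-linear timetable. The following sketch encodes it for illustration, using the stated times and compositions; the breakpoint list and function name are hypothetical, and the program is taken to run for 20 min in total (3 min wash, 7 min gradient, 5 min hold, 1 min return, 4 min re-equilibration).

```python
# Piecewise-linear representation of the LC gradient described above.
# Names are hypothetical; times (min) and %B values are taken from the text.

GRADIENT = [
    (0.0, 5.0),     # start of on-column wash at 5% B
    (3.0, 5.0),     # end of wash, start of linear gradient
    (10.0, 100.0),  # gradient reaches 100% B after 7 min
    (15.0, 100.0),  # end of 5 min hold
    (16.0, 5.0),    # returned to 5% B over 1 min
    (20.0, 5.0),    # end of 4 min re-equilibration
]

FLOW_ML_PER_MIN = 0.250


def percent_b(t_min):
    """Linearly interpolate %B at time t_min within the program."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time is outside the gradient program")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


if __name__ == "__main__":
    for t in (0.0, 3.0, 6.5, 10.0, 15.5, 20.0):
        print(t, percent_b(t))
    # Total mobile phase consumed per injection, in mL
    print(GRADIENT[-1][0] * FLOW_ML_PER_MIN)
```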
[0120] Amber suppressor tRNA preparation. Template plasmid
containing the tRNA.sup.Pyl or tRNA.sub.CUA.sup.Tyr/Opt gene was
purified with the plasmid maxi kit (Qiagen). The plasmid containing
tRNA.sup.Pyl (100 .mu.g) was digested with BstNI (New England
Biolabs). The tRNA.sub.CUA.sup.Tyr/Opt gene was amplified by PCR.
The BstNI digested template DNA or PCR product was purified by
phenol chloroform extraction, followed by ethanol precipitation and
dissolved in double distilled water. A His-tagged T7 RNA polymerase
was purified over a column of Ni-NTA resin according to
manufacturer's instructions (Qiagen). The transcription reaction
(40 mM Tris (pH 8); 4 mM each of UTP, CTP, GTP, and ATP at pH 7.0;
22 mM MgCl.sub.2; 2 mM spermidine; 10 mM DTT; 6 .mu.g
pyrophosphatase (Roche Applied Science); 60 .mu.g/mL of DNA
template, approximately 0.2 mg/mL T7 RNA polymerase) was performed
in 10 mL reactions overnight at 37.degree. C. The tRNA was purified
on 12% denaturing polyacrylamide gel containing 8 M urea and TBE
buffer (90 mM Tris, 90 mM boric acid, 2 mM EDTA). UV shadowing was
used to illuminate the pure tRNA band, which was excised and
extracted three times with 1M sodium acetate pH 5.3 at 4.degree. C.
The tRNA extractions were then ethanol precipitated, dissolved in
RNase-free distilled water, pooled, and finally desalted using a
Biospin 30 column (BioRad). The tRNA was refolded by heating to
100.degree. C. for 5 min and slow cooling to room temperature. At
65.degree. C., MgCl.sub.2 was added to a final concentration of 10
mM to aid folding. A His-tagged CCA-adding enzyme was purified over
a column of Ni-NTA resin according to the manufacturer's instructions
(Qiagen). 16 .mu.M refolded tRNA in 50 mM Tris (pH 8.0), 20 mM
MgCl.sub.2, 5 mM DTT, and 50 .mu.M NaPPi was incubated at room
temperature for 1 h with approximately 0.2 mg/mL CCA-adding enzyme
and 1.6 .mu.Ci/.mu.L of (.alpha.-.sup.32P)-labeled ATP
(PerkinElmer). The sample was phenol/chloroform extracted and then
passed over a Bio-spin 30 column (Bio-Rad) to remove excess
ATP.
[0121] Aminoacylation assay. A 20 .mu.L aminoacylation reaction
contained the following components for chPylRS variants: 50 mM
HEPES-KOH (pH 7.2), 25 mM KCl, 10 mM MgCl.sub.2, 5 mM DTT, 10 mM
ATP, 25 .mu.g/mL pyrophosphatase (Roche Applied Science), 10 mM
amino acids, 500 nM PylRS variants, 5 .mu.M unlabeled tRNA.sup.Pyl,
and 100 nM .sup.32P-labeled tRNA.sup.Pyl. A 20 .mu.L aminoacylation
reaction contained the following components for MjTyrRS variants:
50 mM Tris-HCl (pH 7.5), 1 mM DTT, 10 mM MgCl.sub.2, 10 mM ATP, 20
.mu.M unlabeled tRNA.sub.CUA.sup.Tyr/Opt, 3 .mu.M .sup.32P-labeled
tRNA.sub.CUA.sup.Tyr/Opt, and 2 .mu.M MjTyrRS variants. Various
concentrations of ATP (1-100 .mu.M), BocK (0.1-10 mM), Pyl (5-500
.mu.M), Phe (0.1-3.2 mM), p-NF (1-32 mM), p-IF (1-32 mM), and tRNA
(0.05-5 .mu.M) were used to determine K.sub.M values for
corresponding substrates. Time points were taken at 5 min, 20 min
and 60 min by removing 2 .mu.L aliquots from the reaction and
immediately quenching the reaction into an ice-cold 3 .mu.L quench
solution (0.66 .mu.g/.mu.L nuclease P1 (Sigma) in 100 mM sodium
citrate (pH 5.0)). For each reaction, 2 .mu.L of blank reaction
mixture (containing no enzyme) was added to the quench solution as
the start time point. The nuclease P1 mixture was then incubated at
room temperature for 30 min and 1 .mu.L aliquots were spotted on
PEI-cellulose plates (Merck) and developed in running buffer
containing 5% acetic acid and 100 mM ammonium acetate. Radioactive
spots for AMP and AA-AMP (representing free tRNA and
aminoacyl-tRNA, respectively) were separated and then visualized
and quantified by phosphorimaging using a Molecular Dynamics Storm
860 phosphorimager (Amersham Biosciences). The ratio of
aminoacylated tRNA to total tRNA was determined to monitor reaction
progress.
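The quantification step above reduces to a ratio of spot intensities, and K.sub.M values can then be estimated from initial rates measured across substrate concentrations. The sketch below is illustrative only. The fitting procedure is not specified in this passage; a Hanes-Woolf linearization (S/v plotted against S) is used here as a stand-in, and all names and example values are hypothetical.

```python
# Sketch: fraction of tRNA aminoacylated, plus a Michaelis-Menten estimate.
# ASSUMPTION: Hanes-Woolf linearization is a stand-in; the text does not
# state which fitting method was used.


def fraction_aminoacylated(aa_amp_signal, amp_signal):
    """AA-AMP spot intensity divided by total (AA-AMP + AMP) intensity."""
    total = aa_amp_signal + amp_signal
    if total <= 0:
        raise ValueError("total signal must be positive")
    return aa_amp_signal / total


def hanes_woolf_fit(substrate_conc, initial_rates):
    """Least-squares line through (S, S/v); returns (K_M, V_max).

    For Michaelis-Menten kinetics, S/v = S/V_max + K_M/V_max.
    """
    xs = list(substrate_conc)
    ys = [s / v for s, v in zip(xs, initial_rates)]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    v_max = 1.0 / slope
    k_m = intercept * v_max
    return k_m, v_max


if __name__ == "__main__":
    print(fraction_aminoacylated(30.0, 70.0))
    # Synthetic, noise-free rates generated with K_M = 2 and V_max = 10
    conc = [0.1, 0.5, 1.0, 5.0, 10.0]
    rates = [10.0 * s / (2.0 + s) for s in conc]
    print(hanes_woolf_fit(conc, rates))
```

With real data, a nonlinear least-squares fit to the Michaelis-Menten equation is generally preferred over linearizations, because linear transforms distort the measurement error.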
Example 2
[0122] The development of orthogonal translation systems (OTSs)
that allow non-canonical amino acids (ncAAs) to be
site-specifically incorporated into recombinant proteins has
enabled researchers to dramatically expand the genetic code. More
than 200 ncAAs have been installed into designer proteins using
OTSs in prokaryotes, eukaryotic cells, and even in whole animals.
The most common strategy for genetic code expansion in vivo
requires three key components. An unused or rarely used codon
(typically the TAG nonsense codon) is placed into a gene's coding
sequence at the position(s) of desired ncAA incorporation. An
orthogonal tRNA (o-tRNA) that is not recognized by host endogenous
aminoacyl-tRNA synthetases (AARSs) decodes the nonsense codon
during translation. Lastly, an orthogonal AARS is required, which
is typically a variant that researchers have evolved to selectively
aminoacylate the o-tRNA, but not endogenous tRNAs, with the target
ncAA (FIG. 1). This third component must be generated for each
different ncAA of interest, and evolving a tailor-made orthogonal
AARS is by far the most challenging and labor-intensive requirement
of this strategy.
[0123] Although researchers have evolved many AARSs to incorporate
ncAAs into proteins, several outstanding challenges limit their
utility and generality. Laboratory evolution of AARSs with altered
amino acid specificity typically relies on three to five rounds of
sequential positive and negative selections from an AARS library
containing either partially or fully randomized residues in the
amino acid-binding pocket. The limited number of rounds of
selection typically conducted in AARS evolution campaigns reflects
the effort required to complete each round of evolution, which is
on the order of one week or longer. A consequence of conducting
relatively few rounds of selection on libraries that focus
mutagenesis on and around the amino acid-binding pocket is that
laboratory-evolved AARSs routinely emerge with suboptimal
properties, for example .about.1,000-fold reduced activity
(k.sub.cat/K.sub.M) compared to their wild-type counterparts, and
modest selectivity for the target ncAA over endogenous amino acids
that can require compensation with high concentrations of ncAA and
expression in minimal media, lowering protein yields. The modest
enzymatic efficiency and selectivity of many laboratory-evolved
AARSs are longstanding challenges that limit the production and
purity of expressed proteins containing ncAAs.
[0124] This example describes phage-assisted continuous evolution
(PACE) selections that enable the laboratory evolution of
orthogonal AARSs over hundreds of generations of mutation,
selection, and replication on practical time scales. AARS PACE was
performed over 268 generations to evolve pyrrolysyl-tRNA synthetase
(PylRS) variants that acquired up to a 45-fold improvement in
enzymatic efficiency (k.sub.cat/K.sub.M.sup.tRNA) compared to the
parent PylRS. The enabling mutations from PACE were also
successfully transplanted into other PylRS variants without
requiring further evolution, resulting in up to 9.7-fold higher
expression of ncAA-containing protein when introduced into a
previously reported PylRS-derived synthetase, AcK3RS.
Interestingly, PACE also gave rise to unexpected mutations that
split PylRS into mutually dependent N- and C-terminal fragments
that maintained high activity and specificity when co-expressed,
mimicking naturally occurring split PylRS homologs. In addition, a
promiscuous mutant Methanocaldococcus jannaschii tyrosyl-tRNA
synthetase (MjTyrRS) was evolved into a variant with >23-fold
higher selectivity for the desired amino acid,
p-iodo-L-phenylalanine, over the undesired substrate,
p-nitro-L-phenylalanine, in 48 h of PACE. Together, these results
establish a rapid and effective approach to improve the catalytic
efficiency and alter the amino acid specificity of AARS
enzymes.
[0125] Development of a positive PACE selection for AARS activity.
PACE has enabled the rapid laboratory evolution of diverse classes
of proteins including polymerases, proteases, genome-editing
agents, and insecticidal proteins. Two strategies by which
aminoacylation of an orthogonal amber suppressor tRNA would induce
pIII production through amber suppression are described. In the
first strategy, amber suppression of premature stop codons in the
T7 RNA polymerase (T7 RNAP) gene allows translation of full-length
T7 RNAP, which transcribes gene III from an upstream T7 promoter.
This approach results in pIII production in an amplified manner
since each amber suppression event can give rise to many gene III
transcripts. In the second, more stringent strategy, amber
suppression of premature stop codons in gene III results in direct
translation of full-length pIII without amplification (FIGS.
2A-2C).
[0126] To implement the first selection strategy, permissive
residues in T7 RNAP that would not inhibit enzymatic activity when
mutated to a wide variety of amino acids were identified. The
number of amber codons needed in the T7 RNAP gene to make
full-length translation of the polymerase completely dependent on
orthogonal translation was also determined. Amber mutations were
installed in the T7 RNAP gene at Ser-12, Ser-203, Tyr-250, Tyr-312,
and Ser-527, positions predicted from the crystal structure that
avoid perturbation of RNA polymerization or DNA binding.
Suppression with p-nitro-L-phenylalanine (p-NF) (FIG. 3) at
combinations of these sites using a previously evolved MjTyrRS
variant (p-NFRS) revealed that a minimum of two amber stop codons
were required for transcriptional activation by T7 RNAP to become
fully dependent on both the AARS and the ncAA substrate (FIGS.
4A-4B). Similar results were observed with site-specific
installation of N.sup..epsilon.-(tert-butoxycarbonyl)-L-lysine (BocK) using a
chimeric PylRS (chPylRS), comprising residues 1-149 of
Methanosarcina barkeri PylRS (MbPylRS) and residues 185-454 of
Methanosarcina mazei PylRS (MmPylRS) (FIG. 4C).
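Installing amber mutations at chosen residues corresponds to replacing the codons at those positions with TAG. The sketch below illustrates the operation on a short toy sequence; it is not the T7 RNAP gene, the function name is hypothetical, and residue 1 is assumed to be the initiator methionine.

```python
# Sketch: replace codons at selected residue positions with the amber codon.
# The sequence below is a toy example, not the T7 RNAP coding sequence.
# ASSUMPTION: residues are numbered from 1 starting at the start codon.

AMBER = "TAG"


def install_amber(coding_sequence, residue_positions):
    """Return the sequence with each listed residue's codon replaced by TAG."""
    if len(coding_sequence) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [coding_sequence[i:i + 3] for i in range(0, len(coding_sequence), 3)]
    for position in residue_positions:
        if not 1 <= position <= len(codons):
            raise ValueError(f"residue {position} is outside the sequence")
        codons[position - 1] = AMBER
    return "".join(codons)


if __name__ == "__main__":
    toy_cds = "ATGAGCTATTCTTAA"  # Met-Ser-Tyr-Ser-stop
    print(install_amber(toy_cds, [2, 4]))  # ATGTAGTATTAGTAA
```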
[0127] To test the ability of this selection to support phage
propagation, selection phage (SP) expressing either p-NFRS
(SP-p-NFRS) or chPylRS (SP-chPylRS) were propagated
non-continuously in cultures of host E. coli cells harboring an
accessory plasmid (AP) and complementary plasmid (CP) that together
expressed the requisite amber suppressor tRNA, T7 RNAP(S12TAG,
S203TAG), and gene III downstream of a T7 promoter. It was observed
that SP propagation in these cultures was dependent on the presence
of a matched ncAA substrate (FIGS. 5A-5B). Together, these results
validate the PACE selection strategy based on amplified expression
of gene III through amber suppression of two or more stop codons in
T7 RNAP.
[0128] To implement the second, more stringent selection strategy
based on direct amber suppression of premature stop codons in gene
III, amber mutations were installed at positions Pro-29, Pro-83,
Thr-177, or Tyr-184 of gene III. These residues were chosen because
they are predicted to be uninvolved in pIII binding to the host
cell TolA protein or to the host cell F pilus. The N-terminal
signal peptide of pIII, which spans residues 1-18, was not targeted
for amber suppression as this region is required for insertion of
pIII into the host inner membrane. The ability of this selection to
support phage propagation was investigated by challenging selection
phage expressing either p-NFRS (SP-p-NFRS) or chPylRS (SP-chPylRS)
to propagate non-continuously in cultures of host E. coli cells
harboring an accessory plasmid that expressed the requisite amber
suppressor tRNA and gene III containing one or more premature stop
codons. It was observed that each of the mutated positions in pIII
was permissive to ncAA incorporation, and the presence of a single
premature stop codon in the coding sequence of pIII was sufficient
to make robust phage propagation dependent on AARS activity from
SP-p-NFRS or SP-chPylRS (FIGS. 5C-5D). Collectively, these
developments identify positions in T7 RNAP and pIII that tolerate a
range of amino acid side chains, and thereby establish two
strategies to link AARS activity to phage infectivity through amber
suppression of premature stop codons in T7 RNAP or in gene III.
[0129] Next, it was investigated whether the PACE positive selection
for aminoacylation based on amber suppression of stop codons in T7
RNAP could support activity-dependent phage propagation in the
continuous flow format of PACE. Mock PACE selections lasting 48 h
were conducted.
It was observed that SP-p-NFRS propagated at high phage titer
levels in a lagoon supplemented with p-NF substrate without further
adaptation. In a separate control lagoon, SP expressing a kanamycin
resistance gene rather than an AARS were unable to propagate in the
positive selection and rapidly washed out (FIG. 6A). It was also
observed that the PACE positive selection based on direct amber
suppression of stop codons in gene III supported activity-dependent
phage propagation in continuous flow. In a single, 30-h mock PACE
using this selection strategy, active SP-p-NFRS was highly enriched
starting from a 1:1 input-mixture of SP-p-NFRS and an SP expressing
an unrelated gene (FIG. 6B). Together, these results confirmed that
both selection strategies were capable of supporting phage
propagation in PACE in a manner dependent on orthogonal AARS
activity.
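The washout of non-propagating phage under continuous flow can be illustrated with the standard dilution model for a well-mixed vessel, in which a species that does not replicate declines exponentially at the dilution rate. The sketch below is an idealization; the starting titer and the flow rate of 1 lagoon volume per h are assumed values for illustration, and the function name is hypothetical.

```python
# Idealized washout of a non-replicating phage from a well-mixed lagoon.
# ASSUMPTIONS: perfect mixing, no replication, constant dilution rate;
# the starting titer and flow rate below are illustrative values.
import math


def washout_titer(initial_titer, dilution_rate_per_h, hours):
    """Titer remaining after `hours` of dilution with no replication."""
    return initial_titer * math.exp(-dilution_rate_per_h * hours)


if __name__ == "__main__":
    start = 1e8  # hypothetical starting titer (pfu/mL)
    for t in (0, 5, 10, 20):
        print(t, washout_titer(start, 1.0, t))
```

In this model, a phage persists only if its replication rate in the host cells matches or exceeds the dilution rate, which is why raising the lagoon flow rate increases selection stringency.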
[0130] To demonstrate that an AARS with little or no starting
activity on a target amino acid could evolve new activity to
propagate in the positive selection, an additional mock PACE
experiment was performed in which SP-p-NFRS was challenged to
evolve acceptance of endogenous amino acids. The phage was
propagated in Davis rich media (DRM) that was not supplemented with
p-NF, using the selection requiring amber suppression of stop
codons in T7 RNAP. Under these conditions, high titers of SP-p-NFRS were
dependent on induction of a mutagenesis plasmid (MP), MP4, that
enhances the rate of mutagenesis in the host E. coli (FIG. 7A).
This observation indicates that mutation of the AARS was required
in order for SP to propagate when the cognate amino acid substrate
was unavailable. Sanger sequencing analysis of clonal phage from
the experiments confirmed that more mutations accumulated in the
gene encoding p-NFRS when MP4 was induced. Additionally, the
evolved mutants, but not the starting p-NFRS, displayed strong
aminoacylation activity in the absence of p-NF using a luciferase
reporter of amber suppression (FIG. 7B), confirming that the AARS
evolved to accept one or more canonical amino acids during PACE.
Together, results of these experiments validated the positive
selection for continuously evolving an orthogonal AARS.
[0131] Continuous evolution of catalytically enhanced PylRS
variants. PylRS enzymes from archaebacteria are preferred evolutionary
starting points for genetic code expansion efforts due to their
tRNA orthogonality in a range of hosts. Wild-type PylRS variants,
however, are generally hampered by poor catalytic efficiency, which
is typically further diminished as an undesired consequence of
traditional laboratory evolution.
[0132] chPylRS--the chimera of residues 1-149 of MbPylRS and
residues 185-454 of MmPylRS--was evolved to have improved
aminoacylation activity over 497 h of PACE in three segments (FIG.
8A). The PylRS substrate analog, BocK, was used in the evolution of
PylRS rather than the natural cognate substrate, L-pyrrolysine,
which is not readily available. In the first two segments of PACE
(Pyl-1 and Pyl-2), SP-chPylRS was evolved using the less stringent
selection requiring amber suppression of T7 RNAP. During Pyl-1, the
flow rate was modulated to increase selection stringency. The pool
of phage surviving Pyl-1 was further evolved in Pyl-2, which
challenged SP-chPylRS to propagate as the ncAA substrate
concentration was incrementally reduced. The final PACE segment,
Pyl-3, was conducted in two lagoons using the more stringent
selection strategy, and the number of amber stop codons in gene III
was incrementally increased from one to three to increase demands
on PylRS efficiency. This approach gradually increased selection
stringency over the 497-h evolution of chPylRS, and emerging
variants had survived on average 268 generations of mutation,
selection, and replication.
[0133] Clonal SP isolates from the 120-h endpoint of Pyl-1 were
sequenced and revealed mutations throughout the PylRS gene with
strong convergence toward a pair of mutations in PylRS: V31I and
T56P. Sequencing of clonal isolates from the second PACE segment
(Pyl-2) revealed strong convergence toward D257G at 162 h, and full
convergence on A100E by the end of Pyl-2 (288 h). The more
stringent conditions of the Pyl-3 segment selected for complete
convergence toward mutation H62Y in all sequenced clones from 408 h
of PACE (Tables 3-6).
TABLE-US-00003 TABLE 3 Summary of mutations observed in PACE
segment Pyl-1 chPylRS Pyl-1.120.1-8 Residue 1 2 3 4 5 6 7 8 D7 A
V31 I I I I F I I I A41 E T56 P P P P P P A100 T S127 P P A152 V
D257 G G343 D Mutations in chPylRS from the Pyl-1 segment were
determined by Sanger sequencing of eight clonal SP isolates from
120 h of total PACE. Only coding mutations are shown. Shaded
mutations were shown to be responsible for enhancing the activity
of chPylRS.
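The degree of convergence in Table 3 can be expressed as the fraction of sequenced clones carrying each substitution. The following Python sketch tallies selected rows of Table 3. The dictionary layout and the function name `convergence` are illustrative assumptions; only per-residue counts are used, because the per-clone column positions of sparse rows are not recoverable from the flattened table.

```python
# Illustrative tally of selected Table 3 rows (Pyl-1, 120 h, eight clones).
# Counts are read from the table; all names here are hypothetical helpers.
N_CLONES = 8

# residue -> {substituted amino acid: number of clones carrying it}
table3_counts = {
    "V31": {"I": 7, "F": 1},
    "T56": {"P": 6},
    "S127": {"P": 2},
    "D257": {"G": 1},
}


def convergence(counts, n_clones):
    """Return {mutation name: fraction of sequenced clones carrying it}."""
    fractions = {}
    for residue, substitutions in counts.items():
        for new_aa, n in substitutions.items():
            fractions[f"{residue}{new_aa}"] = n / n_clones
    return fractions


fractions = convergence(table3_counts, N_CLONES)

# Print mutations from most to least prevalent.
for mutation, frac in sorted(fractions.items(), key=lambda kv: -kv[1]):
    print(f"{mutation}: {frac:.1%} of clones")
```

On these counts, V31I and T56P are the two most prevalent substitutions, consistent with the convergence reported for the 120-h endpoint of Pyl-1.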
TABLE-US-00004 TABLE 4 Summary of mutations observed in PACE
segment Pyl-2 chPylRS Pyl-2.162.1-5 Pyl-2.189.1-5 Pyl-2.288.1-5
Residue 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 D2 E D7 E A12 G V31 I I I I I
I I I I I I I I I T56 P P P P P P P P P P P P P P H62 Y E77 K T91 S
A100 E E E E E K104 E R113 H L118 M A150 V R217 S D257 G G G G G G
G G G G G G G N259 S S L266 I I P282 S I327 M M G336 E D338 E
Mutations in chPylRS from the Pyl-2 segment were determined by
Sanger sequencing of five clonal SP isolates from 162 h, 189 h, and
288 h of total PACE. Only coding mutations are shown. Shaded
mutations were shown to be responsible for enhancing the activity
of chPylRS.
TABLE-US-00005 TABLE 5 Summary of mutations observed in lagoon 1 of
PACE segment Pyl-3 chPylRS Pyl-3-L1.408.1-8 Pyl-3-L1.450.1-8
Pyl-3-L1.497.1-8 Residue 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5
6 7 8 V31 I I I I I I I I I I I I I I I I I I I I I I I I T56 P P P
P P P P P P P P P P P P P P P P P P P P P H62 Y Y Y Y Y Y Y Y Y Y Y
Y Y Y Y Y Y Y Y Y Y Y Y Y K90 V97 A A A S99 L L L L L L L L A100 E
E E E E E E E E E E S E S E E E S S S S S S P101 R R R R R R R R
V103 * * * * * * * * K104 E A106 T M107 M' M' M' M' M' M' M' M'
V111 I G A114 T V122 G V134 I K157 R S156 R E203 D Y207 S F K251 R
D257 G G G G G G G G G G G G G G G G G G G G G G G G N259 S F260 S
L266 I N323 S S326 I I I I I V I I I H334 T L335 W D351 E K396 Q
K403 R A405 V A406 S Mutations in chPylRS from lagoon 1 (L1) of the
Pyl-3 segment were determined by Sanger sequencing of eight clonal
SP isolates from 408 h, 450 h, and 497 h of total PACE. Only coding
mutations are shown. Shaded mutations were shown to be responsible
for enhancing the activity of chPylRS. Mutations denoted by a star
indicate stop codons that resulted in split-protein variants in
which translation reinitiates at the position corresponding to
Met-107 of chPylRS (M').
TABLE-US-00006 TABLE 6 Summary of mutations observed in lagoon 2 of
PACE segment Pyl-3 chPylRS Pyl3-L2.408.1-8 Pyl3-L2.450.1-8
Pyl3-L2.497.1-8 Residue 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6
7 8 V8 G G T20 P P P P I26 V H28 Y V31 I I I I I I I I I I I I I I
I I I I I I I I I I D44 G H45 R S53 F T56 P P P P P P P P P P P P P
P P P P P P P P P P P A59 T H62 Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y
Y Y Y Y Y Y R73 H D78 E N80 D T91 A S92 L V93 C K94 * S99 F L F L L
A100 E E E E E E E E E E E E * E E E S * E E S E S P101 R R R R R R
V103 * * * * * * M107 M' M' M' M' M' M' M' M' M' S112 P E119 G N120
Y Y A126 T T T T130 P N143 K P147 L L P153 V V P234 S S E236 G G G
D257 G G G G G G G G G G G G G G G G G G G G G G G G K258 E R321 W
S326 N G343 D V367 I I378 V D379 N N A406 S Mutations in chPylRS
from lagoon 2 (L2) of the Pyl-3 segment were determined by Sanger
sequencing of eight clonal SP isolates from 408 h, 450 h, and 497 h
of total PACE. Only coding mutations are shown. Shaded mutations
were shown to be responsible for enhancing the activity of chPylRS.
Mutations denoted by a star indicate stop codons that resulted in
split-protein variants in which translation reinitiates at the
position corresponding to Met-107 of chPylRS (M').
[0134] Each of the Pyl-1 variants exhibited improved aminoacylation
activity in a luciferase reporter of amber suppression with BocK
(FIG. 8B). Comparison of the consensus mutations acquired in each
segment of PACE showed that the two combined mutations from Pyl-1
increased luciferase signal 8.5-fold compared to the progenitor
chPylRS, and the additional two mutations from Pyl-2 improved amber
suppression signal 21-fold. The variant containing all consensus
mutations from the three segments of PACE provided 24-fold improved
amber suppression signal compared to chPylRS while maintaining
substrate specificity (FIG. 8C). Further analysis of the consensus
mutations acquired in the first two segments of PACE demonstrated
that D257G did not contribute significantly to enhancing the
activity of chPylRS (FIG. 9A). Therefore, the tetramutant comprising V31I,
T56P, H62Y, and A100E was responsible for the large improvement in
apparent activity.
[0135] BocK was incorporated at up to three positions in sfGFP to
compare the relative activity of chPylRS to the tetramutant
variant, chPylRS(IPYE), containing the activity-enhancing mutations
from PACE (FIG. 8D and FIGS. 10A-10D). Expression of sfGFP
containing three BocK residues was improved nearly 4-fold by
chPylRS(IPYE) compared to chPylRS. Biochemical characterization of
chPylRS(IPYE) using BocK confirmed that the k_cat improved
8.7-fold, and the K_M for the tRNA^Pyl substrate improved
5.7-fold, such that the catalytic efficiency
(k_cat/K_M^tRNA) of the evolved variant was enhanced
45-fold compared to chPylRS (Table 7). These findings indicate that
the increased apparent activity of the tetramutant results from
catalytic enhancement of chPylRS, rather than solely from
non-catalytic improvements such as enhanced protein expression or
stability. The outcome of these experiments demonstrates that PACE
positive selection is highly effective at improving the activity of
an AARS commonly used for genetic code expansion.
TABLE-US-00007 TABLE 7 Kinetic parameters of chPylRS variants
containing mutations from PACE.
PylRS variant | k_cat, s^-1 × 10^-3 | K_M^ATP, µM | K_M^tRNA, µM | K_M^BocK, mM | k_cat/K_M^tRNA, µM^-1 s^-1 × 10^-3 | Relative catalytic efficiency
chPylRS | 11.88 ± 0.18 | 2.54 ± 0.16 | 0.26 ± 0.07 | 1.03 ± 0.05 | 45.69 | 1
V31I, T56P | 73.15 ± 1.01 | 5.74 ± 0.20 | 0.10 ± 0.02 | 0.82 ± 0.18 | 731.50 | 15.9
V31I, T56P, A100E | 110.23 ± 4.65 | 3.45 ± 1.19 | 0.13 ± 0.03 | 0.91 ± 0.08 | 847.92 | 18.4
V31I, T56P, H62Y, A100E | 103.87 ± 2.37 | 3.96 ± 0.52 | 0.05 ± 0.01 | 1.13 ± 0.23 | 2,077.40 | 45.2
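The catalytic efficiency column of Table 7 is the ratio of k_cat to K_M for tRNA, and the relative efficiency is that ratio normalized to the progenitor chPylRS. The Python sketch below recomputes both from the rounded table entries. The variable and function names are illustrative assumptions, and the variant labels use the V31I, T56P, H62Y, and A100E substitutions named in the text.

```python
# Recompute k_cat / K_M^tRNA from the rounded Table 7 entries.
# Names are hypothetical; values are the tabulated means (errors omitted).
table7 = {
    # variant: (k_cat in 1e-3 s^-1, K_M^tRNA in uM)
    "chPylRS": (11.88, 0.26),
    "V31I, T56P": (73.15, 0.10),
    "V31I, T56P, A100E": (110.23, 0.13),
    "V31I, T56P, H62Y, A100E": (103.87, 0.05),
}


def catalytic_efficiency(k_cat, k_m):
    """Catalytic efficiency in uM^-1 s^-1 x 1e-3 (units follow the inputs)."""
    return k_cat / k_m


efficiency = {
    variant: catalytic_efficiency(k_cat, k_m)
    for variant, (k_cat, k_m) in table7.items()
}

# Normalize every variant to the progenitor enzyme.
reference = efficiency["chPylRS"]
relative = {variant: value / reference for variant, value in efficiency.items()}

for variant in table7:
    print(f"{variant}: {efficiency[variant]:.2f} ({relative[variant]:.1f}x)")
```

Because the inputs are rounded, the recomputed relative values (about 16.0, 18.6, and 45.5) differ slightly from the tabulated 15.9, 18.4, and 45.2, which were presumably derived from unrounded measurements.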
[0136] The activity-enhancing coding mutations discovered through
PACE were localized exclusively in the N-terminal domain of
chPylRS, which is involved in tRNA binding and is typically not
targeted for mutagenesis in traditional laboratory evolution
efforts. These changes occur at positions conserved in MbPylRS and
MmPylRS, and in some embodiments, the PACE mutations may generally
improve the activity of other natural and engineered PylRS
homologs. Amber suppression assays using several different
reporters demonstrated that the activity of the MbPylRS(IPYE)
variant was dramatically enhanced while the MmPylRS(IPYE) variant
was also improved, albeit more modestly (FIGS. 9B-9D). To test
whether the beneficial mutations also enhance evolved PylRS
enzymes, the four PACE-derived mutations were transplanted into
AcK3RS, which was previously evolved to accept Nε-acetyl-L-lysine
(AcK). MbAcK3RS, MmAcK3RS, and chimeric AcK3RS (chAcK3RS) variants
containing the four mutations each exhibited increased expression
of reporter proteins up to 9.7-fold compared to their unmodified
PylRS counterparts, without sacrificing amino acid selectivity
(FIG. 8E and FIGS. 9E and 11). Reporter expression was also
enhanced more than 5-fold when the mutations were transplanted into
the PylRS-derived IFRS, which was previously evolved to charge
3-iodo-L-phenylalanine (3-IF) (FIG. 9F). Collectively, these
results show that the beneficial mutations discovered exclusively
in the N-terminal domain of chPylRS substantially enhance activity
in all six additional PylRS variants tested.
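Transplanting the four PACE-derived mutations amounts to replacing the codons for Val-31, Thr-56, His-62, and Ala-100 with ATT, CCC, TAT, and GAG, as stated in the sequence listing. The Python sketch below applies those substitutions to the first 100 codons of the chPylRS coding sequence (SEQ ID NO: 5). The function name `substitute_codons` and the check against the expected wild-type codon are illustrative assumptions, not a procedure described in the disclosure.

```python
# First 100 codons of the chPylRS coding sequence (SEQ ID NO: 5), written
# codon by codon for readability; whitespace is removed before use.
CHPYLRS_5PRIME = """
ATG GAT AAG AAG CCG CTG GAT GTT CTG ATC
TCT GCG ACC GGT CTG TGG ATG TCC CGT ACC
GGC ACG CTG CAC AAG ATC AAG CAC TAT GAG
GTT TCT CGT TCT AAA ATC TAC ATC GAA ATG
GCG TGT GGT GAC CAT CTG GTT GTG AAC AAC
TCT CGT TCT TGT CGT ACC GCA CGT GCA TTC
CGT CAT CAT AAA TAC CGT AAA ACC TGC AAA
CGT TGT CGT GTT TCT GAC GAA GAT ATC AAC
AAC TTC CTG ACC CGT TCT ACC GAA GGC AAA
ACC TCT GTT AAA GTT AAA GTT GTT TCT GCG
"""

# 1-based residue position -> (expected wild-type codon, replacement codon)
IPYE = {
    31: ("GTT", "ATT"),   # V31I
    56: ("ACC", "CCC"),   # T56P
    62: ("CAT", "TAT"),   # H62Y
    100: ("GCG", "GAG"),  # A100E
}


def substitute_codons(dna, substitutions):
    """Return dna with the given codons replaced; positions are 1-based."""
    dna = "".join(dna.split()).upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    for position, (expected, replacement) in substitutions.items():
        found = codons[position - 1]
        if found != expected:
            # Guard against numbering errors before editing the sequence.
            raise ValueError(f"codon {position} is {found}, expected {expected}")
        codons[position - 1] = replacement
    return "".join(codons)


wild_type = "".join(CHPYLRS_5PRIME.split())
ipye_dna = substitute_codons(CHPYLRS_5PRIME, IPYE)
```

Each of the four substitutions changes a single nucleotide, so `ipye_dna` differs from the wild-type fragment at exactly four positions. The same dictionary could be applied to a homolog only where the expected wild-type codons match, which is why the function checks them first.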
Unexpected Evolution of Split PylRS Enzymes
[0137] Although there was no strong convergence toward new
beneficial coding mutations between the 408-h and 497-h time
points, 13 of 16 (81%) of the sequenced SP isolates from the two
lagoons of Pyl-3 had acquired a surprising premature stop codon in
their coding sequences by 497 h. Twelve of the 13 affected clones
contained a single frameshift at one of four different locations in
chPylRS (Tables 5 and 6). In each case, the shifted reading frame
in the chPylRS gene produced a premature ochre (TAA) or opal (TGA)
stop codon resulting in a truncated protein of 93, 99, or 102
residues. In addition, one of the 13 affected isolates from the
497-h time point contained an in-frame ochre stop codon at position
Lys-90, resulting in a truncated protein of 89 residues. Downstream
of the premature stop codon in every case is a Met codon at
canonical position 107 of chPylRS. In some embodiments, protein
synthesis reinitiates from Met-107 resulting in a split chPylRS. In
assays of amber suppression, the split chPylRS(IPYE) variants
exhibited apparent activity comparable to that of the full-length
chPylRS(IPYE) enzyme. Activity of the split variants was strictly
dependent on the presence of both fragments (FIG. 12A, FIG. 12C,
and Tables 8-9). In contrast, split chPylRS variants lacking the
PACE-evolved coding mutations in their N-terminal fragment showed a
significant loss of activity (FIG. 12B), which may explain why
the split enzyme was not observed in PACE until the four
activity-enhancing mutations were acquired. Results from western
blot analysis and ESI-MS analysis of split variants confirmed
translational reinitiation from Met-107 (FIGS. 14-15). The
prevalence of the split PylRS variants suggests a fitness advantage
to the split constructs during PACE, although the molecular basis
of this potential advantage is currently unknown.
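The in-frame case described above can be checked directly against SEQ ID NO: 5: converting the Lys-90 codon to an ochre stop should leave an 89-residue N-terminal fragment, with the next in-frame Met codon at position 107. The Python sketch below covers only this in-frame example, because the exact frameshift positions are not given in the text. The function name `split_sites` and the codon-table construction are illustrative assumptions.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# First 110 codons of the chPylRS coding sequence (SEQ ID NO: 5).
CHPYLRS_5PRIME = """
ATG GAT AAG AAG CCG CTG GAT GTT CTG ATC
TCT GCG ACC GGT CTG TGG ATG TCC CGT ACC
GGC ACG CTG CAC AAG ATC AAG CAC TAT GAG
GTT TCT CGT TCT AAA ATC TAC ATC GAA ATG
GCG TGT GGT GAC CAT CTG GTT GTG AAC AAC
TCT CGT TCT TGT CGT ACC GCA CGT GCA TTC
CGT CAT CAT AAA TAC CGT AAA ACC TGC AAA
CGT TGT CGT GTT TCT GAC GAA GAT ATC AAC
AAC TTC CTG ACC CGT TCT ACC GAA GGC AAA
ACC TCT GTT AAA GTT AAA GTT GTT TCT GCG
CCG AAA GTG AAA AAA GCG ATG CCG AAA TCT
"""


def split_sites(codons):
    """Return (N-fragment length, 1-based position of the next in-frame Met).

    Returns None when the reading frame contains no stop codon.
    """
    protein = [CODON_TABLE[c] for c in codons]
    if "*" not in protein:
        return None
    stop_index = protein.index("*")  # residues translated before the stop
    try:
        restart = protein.index("M", stop_index + 1) + 1
    except ValueError:
        restart = None  # no downstream Met codon in this fragment
    return stop_index, restart


codons = CHPYLRS_5PRIME.split()
assert codons[89] == "AAA"  # Lys-90 in the wild-type sequence

mutant = list(codons)
mutant[89] = "TAA"  # in-frame ochre stop at position 90

print(split_sites(mutant))  # (89, 107)
```

The wild-type fragment contains no stop codon, so `split_sites(codons)` returns `None`. Only the mutant yields a truncated N-terminal fragment followed by a downstream restart site.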
TABLE-US-00008 TABLE 8 Kinetic parameters of chPylRS variants using
L-pyrrolysine substrate.
PylRS variant | k_cat, s^-1 × 10^-3 | K_M^Pyl, µM
chPylRS | 33.24 ± 2.74 | 21.03 ± 0.15
V31I, T56P, A100E | 289.16 ± 11.45 | 18.42 ± 0.69
TABLE-US-00009 TABLE 9 Kinetic parameters of the fusions of split
chPylRS variants from PACE.
AARS variant | k_cat, s^-1 × 10^-3 | K_M^BocK, mM | K_M^tRNA, µM
Fused Split2 | 20 ± 1 | 1.68 ± 0.19 | 3.62 ± 0.51
Fused Split3 | 33 ± 3 | 4.90 ± 0.92 | 3.84 ± 0.34
Fused Split6 | 19 ± 0.2 | 1.00 ± 0.05 | 3.61 ± 0.38
[0138] The evolution of a split PylRS variant in PACE appears to
mirror the evolution of PylRS in nature, as PylRS homologs in
certain bacteria are expressed from two separate genes (pylSc and
pylSn). The D. hafniense pylSn encodes a 110-residue polypeptide
that is homologous to the N-terminal region of archaeal PylRS, and
an alignment of PylSn to the N-terminal split PylRS evolved in PACE
shows that they terminate near the same location (FIG. 16). These
observations together demonstrate the ability of PACE to evolve
unexpected changes in protein topology.
[0139] Development and validation of AARS negative selections in
PACE. While positive selection PACE was able to greatly increase
the activity of PylRS, the evolution of AARSs to recognize
non-cognate substrates requires the use of negative selections to
minimize activity on endogenous amino acids. A PACE negative
selection that links tRNA aminoacylation to the inhibition of phage
propagation was developed. A dominant-negative variant of pIII
(pIII-neg) was used as the basis of a PACE negative selection for
RNA polymerase activity and DNA binding activity. Because pIII-neg
poisons the infectivity of emergent phage, variants possessing
undesired activity are unable to effectively propagate and are
gradually washed out from the evolving pool of SP under constant
dilution.
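The washout behavior described above follows from simple dilution arithmetic. As a sketch, assume that phage titer changes exponentially at a net rate equal to a replication rate minus the lagoon dilution rate. This model, the function name `titer`, and every numerical value below are illustrative assumptions and are not taken from the disclosure.

```python
import math


def titer(t_hours, initial_titer, replication_rate, dilution_rate):
    """Phage titer after t hours under N(t) = N0 * exp((r - d) * t).

    replication_rate (r) and dilution_rate (d) are per-hour rates; both
    values are hypothetical inputs rather than measured parameters.
    """
    net_rate = replication_rate - dilution_rate
    return initial_titer * math.exp(net_rate * t_hours)


n0 = 1e8         # assumed starting titer, pfu/mL
dilution = 1.0   # assumed lagoon volumes exchanged per hour

# A phage whose progeny are rendered non-infectious does not replicate
# effectively (r near 0), so dilution alone determines its fate.
poisoned = titer(24, n0, replication_rate=0.0, dilution_rate=dilution)

# A phage that replicates faster than the dilution rate persists.
fit = titer(24, n0, replication_rate=1.5, dilution_rate=dilution)

print(f"poisoned phage after 24 h: {poisoned:.3g} pfu/mL")
print(f"fit phage after 24 h: {fit:.3g} pfu/mL")
```

Under these assumptions any variant with r below d declines toward zero, whereas variants with r above d are retained, which is the qualitative behavior expected of a selection run under constant dilution. Real lagoons are bounded by host-cell availability, so the unbounded growth of the fit phage in this sketch should not be read quantitatively.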
[0140] In the PACE negative selection for aminoacylation, amber
suppression of two stop codons in T7 RNAP(S12TAG, S203TAG) allows
transcriptional activation of the gene encoding pIII-neg. Amber
suppression in this context thus results in expression of pIII-neg
and reduced progeny phage infectivity (FIGS. 17A-17B, and FIGS.
18A-18B). Mock PACE negative selections with SP-p-NFRS confirmed
negative selection against AARS activity. SP-p-NFRS in the presence
of p-NF quickly washed out of PACE lagoons under negative
selection, whereas an SP lacking any AARS activity propagated
robustly under the same conditions (FIGS. 19A-19C). These findings
established a PACE negative selection against undesired
aminoacylation activity.
Continuous Evolution of an AARS with Greatly Improved Amino Acid
Selectivity.
[0141] Homogeneity of ncAA incorporation is often crucial for
downstream applications, as it is usually impractical or impossible
to purify proteins containing the desired ncAA substitution from
mixtures containing undesired amino acids at the position(s) of
interest. The amino acid selectivity of evolved AARSs is therefore
a critical determinant of their utility. The laboratory-evolved
MjTyrRS variant, p-NFRS, selectively charges p-NF in minimal media,
but overnight expression in LB media demonstrated that p-NFRS also
efficiently charges Phe in the presence or in the absence of 1 mM
p-NF (FIGS. 20A-20C). Additionally, p-NFRS is a polyspecific
enzyme, as it efficiently charges p-iodo-L-phenylalanine (p-IF) in
addition to p-NF (FIG. 20D). The ability of coupled PACE positive
and negative selections to generate a highly specific AARS by
evolving p-NFRS to charge p-IF selectively was investigated.
[0142] Opposing positive and negative selections were coupled
continuously by constantly exchanging small volumes of material
from opposing PACE lagoons, which allows the pool of AARS variants
to be evolved in both selections simultaneously, rather than
performing iterative counterselections (FIG. 17C). This strategy's
effectiveness relies on three premises: (1) the only actively
replicating element in the selection lagoons is the SP; (2) the
comparatively small number of host cells diverted into the opposing
selection should not greatly affect either selection, owing to the
much larger population of correct host cells being continuously
infused; and (3) any contaminating ncAA diverted into the opposing
selection would be diluted to a concentration too low to support
effective aminoacylation. In some embodiments, coupling the
opposing selection lagoons provides an
opportunity for SP variants capable of propagating in both
selections--i.e., those AARS variants that evolved high amino acid
selectivity--to outcompete variants able to propagate exclusively
in one of the opposing selections.
[0143] SP-p-NFRS was evolved for 24 h of positive selection PACE
toward p-IF followed by 24 h of coupled positive selection with
negative selection against the undesired ncAA, p-NF (FIGS.
21A-21B). SPs that acquired preferential activity toward p-IF in
PACE were isolated from the evolved pool using a single round of
non-continuous counterselections. To enrich variants possessing
little to no activity on the undesired ncAA, endpoint SPs from the
PACE negative selection were challenged to propagate
non-continuously on negative-selection host cells in media
containing 4 mM p-NF. The resulting SPs were then challenged with
positive-selection host cells in the presence of the desired
substrate, 1 mM p-IF, for their ability to promote formation of
activity-dependent plaques (the result of phage propagation in
semi-solid media) and eight of the resulting plaques were sequenced
(FIGS. 22A-22B).
[0144] Of the eight sequenced phage isolates, four acquired no new
mutations in the AARS gene, but instead emerged from PACE with
weakened ribosome binding sites driving AARS expression. Each of
the remaining four SP variants contained one or more coding
mutations and demonstrated a strong preference for charging p-IF
over p-NF (FIGS. 17D-17E and Table 10). The best performing
PACE-evolved variant, Iodo.5, which contained mutations L69F and
V235I with respect to p-NFRS, matched the amino acid specificity of
a previously reported MjTyrRS variant, p-IFRS, that was evolved to
charge p-IF through positive and negative selection on agar plates.
Based on our limit of detection in the assay, expression of
sfGFP(Asn39TAG) using variant Iodo.5 was >23-fold higher with
p-IF than with p-NF.
TABLE-US-00010 TABLE 10 Kinetic parameters of MjTyrRS variants
containing mutations from PACE.
AARS variant | ncAA | k_cat, s^-1 × 10^-3 | K_M^ncAA, mM | k_cat/K_M^ncAA, mM^-1 s^-1 × 10^-3 | Relative catalytic efficiency
p-NFRS | p-NF | 1.40 ± 0.05 | 3.68 ± 0.29 | 0.38 | 1.00
p-NFRS | p-IF | 0.87 ± 0.11 | 2.23 ± 0.46 | 0.39 | 1.03
p-NFRS | Phe | 0.14 ± 0.003 | 0.16 ± 0.03 | 0.875 | 2.3
Iodo.5 | p-NF | ND | ND | ND | ND
Iodo.5 | p-IF | ND | ND | ND | ND
Iodo.1 | p-IF | 1.60 ± 1.27 | 5.65 ± 1.82 | 0.28 | 0.74
Iodo.7 | p-IF | 0.21 ± 0.03 | 0.92 ± 0.22 | 0.23 | 0.61
Iodo.8 | p-IF | 1.00 ± 0.10 | 3.80 ± 0.84 | 0.26 | 0.68
[0145] The protein sequences of p-IFRS and p-NFRS differ by only a
single amino acid; p-NFRS contains Asn160 and p-IFRS contains
His160. It is possible that His160 also emerged in PACE but was not
isolated. We further tested Iodo.5 by expressing the sfGFP reporter
in LB media containing both 1 mM p-NF and 1 mM p-IF in a single
culture. Intact protein mass spectrometry of the resulting purified
protein revealed the desired mass corresponding to incorporation of
p-IF with only trace p-NF incorporation and no detectable
incorporation of Phe at the site of interest (FIGS. 23A-23B). These
results establish that PACE can rapidly evolve a highly selective
AARS from a polyspecific variant in 48 h with no library
cloning.
EQUIVALENTS AND SCOPE
[0146] Those skilled in the art will recognize, or be able to
ascertain using no more than routine experimentation, many
equivalents of the embodiments described herein. The scope of the
present disclosure is not intended to be limited to the above
description, but rather is as set forth in the appended claims.
[0147] Articles such as "a," "an," and "the" may mean one or more
than one unless indicated to the contrary or otherwise evident from
the context. Claims or descriptions that include "or" between two
or more members of a group are considered satisfied if one, more
than one, or all of the group members are present, unless indicated
to the contrary or otherwise evident from the context. The
disclosure of a group that includes "or" between two or more group
members provides embodiments in which exactly one member of the
group is present, embodiments in which more than one member of the
group is present, and embodiments in which all of the group
members are present. For purposes of brevity those embodiments have
not been individually spelled out herein, but it will be understood
that each of these embodiments is provided herein and may be
specifically claimed or disclaimed.
[0148] It is to be understood that the invention encompasses all
variations, combinations, and permutations in which one or more
limitations, elements, clauses, or descriptive terms from one or
more of the claims or from one or more relevant portions of the
description are introduced into another claim. For example, a claim
that is dependent on another claim can be modified to include one
or more of the limitations found in any other claim that is
dependent on the same base claim. Furthermore, where the claims
recite a composition, it is to be understood that methods of making
or using the composition according to any of the methods of making
or using disclosed herein or according to methods known in the art,
if any, are included, unless otherwise indicated or unless it would
be evident to one of ordinary skill in the art that a contradiction
or inconsistency would arise.
[0149] Where elements are presented as lists, e.g., in Markush
group format, it is to be understood that every possible subgroup
of the elements is also disclosed, and that any element or subgroup
of elements can be removed from the group. It is also noted that
the term "comprising" is intended to be open and permits the
inclusion of additional elements or steps. It should be understood
that, in general, where an embodiment, product, or method is
referred to as comprising particular elements, features, or steps,
embodiments, products, or methods that consist of, or consist
essentially of, such elements, features, or steps, are provided as
well. For purposes of brevity those embodiments have not been
individually spelled out herein, but it will be understood that
each of these embodiments is provided herein and may be
specifically claimed or disclaimed.
[0150] Where ranges are given, endpoints are included. Furthermore,
it is to be understood that unless otherwise indicated or otherwise
evident from the context and/or the understanding of one of
ordinary skill in the art, values that are expressed as ranges can
assume any specific value within the stated ranges in some
embodiments, to the tenth of the unit of the lower limit of the
range, unless the context clearly dictates otherwise. For purposes
of brevity, the values in each range have not been individually
spelled out herein, but it will be understood that each of these
values is provided herein and may be specifically claimed or
disclaimed. It is also to be understood that unless otherwise
indicated or otherwise evident from the context and/or the
understanding of one of ordinary skill in the art, values expressed
as ranges can assume any subrange within the given range, wherein
the endpoints of the subrange are expressed to the same degree of
accuracy as the tenth of the unit of the lower limit of the
range.
[0151] In addition, it is to be understood that any particular
embodiment of the present invention may be explicitly excluded from
any one or more of the claims. Where ranges are given, any value
within the range may explicitly be excluded from any one or more of
the claims. Any embodiment, element, feature, application, or
aspect of the compositions and/or methods of the invention, can be
excluded from any one or more claims. For purposes of brevity, all
of the embodiments in which one or more elements, features,
purposes, or aspects is excluded are not set forth explicitly
herein.
TABLE-US-00011 SEQUENCES DNA sequence of chPylRS. Bolded codons
(Val-31, Thr-56, His-62, and Ala-100) were mutated to `ATT`, `CCC`,
`TAT`, and `GAG`, respectively, in the `IPYE` variant of the enzyme
(SEQ ID NO: 25, 26). >SEQ ID NO: 5
ATGGATAAGAAGCCGCTGGATGTTCTGATCTCTGCGACCGGTCTGTGGATGTCCCGTACCGGCACGCT
GCACAAGATCAAGCACTATGAGGTTTCTCGTTCTAAAATCTACATCGAAATGGCGTGTGGTGACCATC
TGGTTGTGAACAACTCTCGTTCTTGTCGTACCGCACGTGCATTCCGTCATCATAAATACCGTAAAACC
TGCAAACGTTGTCGTGTTTCTGACGAAGATATCAACAACTTCCTGACCCGTTCTACCGAAGGCAAAAC
CTCTGTTAAAGTTAAAGTTGTTTCTGCGCCGAAAGTGAAAAAAGCGATGCCGAAATCTGTTTCTCGTG
CGCCGAAACCGCTGGAAAATCCGGTTTCTGCGAAAGCGTCTACCGACACCTCTCGTTCTGTTCCGTCT
CCGGCGAAATCTACCCCGAACTCTCCGGTTCCGACCTCTGCAAGTGCCCCCGCACTTACGAAGAGCCA
GACTGACAGGCTTGAAGTCCTGTTAAACCCAAAAGATGAGATTTCCCTGAATTCCGGCAAGCCTTTCA
GGGAGCTTGAGTCCGAATTGCTCTCTCGCAGAAAAAAAGACCTGCAGCAGATCTACGCGGAAGAAAG
GGAGAATTATCTGGGGAAACTCGAGCGTGAAATTACCAGGTTCTTTGTGGACAGGGGTTTTCTGGAAA
TAAAATCCCCGATCCTGATCCCTCTTGAGTATATCGAAAGGATGGGCATTGATAATGATACCGAACTT
TCAAAACAGATCTTCAGGGTTGACAAGAACTTCTGCCTGAGACCCATGCTTGCTCCAAACCTTTACAA
CTACCTGCGCAAGCTTGACAGGGCCCTGCCTGATCCAATAAAAATTTTTGAAATAGGCCCATGCTACA
GAAAAGAGTCCGACGGCAAAGAACACCTCGAAGAGTTTACCATGCTGAACTTCTGCCAGATGGGATC
GGGATGCACACGGGAAAATCTTGAAAGCATAATTACGGACTTCCTGAACCACCTGGGAATTGATTTCA
AGATCGTAGGCGATTCCTGCATGGTCTATGGGGATACCCTTGATGTAATGCACGGAGACCTGGAACTT
TCCTCTGCAGTAGTCGGACCCATACCGCTTGACCGGGAATGGGGTATTGATAAACCCTGGATAGGGGC
AGGTTTCGGACTCGAACGCCTTCTAAAGGTTAAACACGACTTTAAAAATATCAAGAGAGCTGCAAGG
TCCGAGTCTTACTATAACGGGATTTCTACCAACCTGTAA; DNA sequence of MbPylRS.
Bolded codons (Val-31, Thr-56, His-62, and Ala-100) were mutated to
`ATT`, `CCC`, `TAT`, and `GAG`, respectively, in the `IPYE` variant
of the enzyme (SEQ ID NO: 27, 28). >SEQ ID NO: 6
ATGGATAAGAAGCCGCTGGATGTTCTGATCTCTGCGACCGGTCTGTGGATGTCCCGTACCGGCACGCT
GCACAAGATCAAGCACTATGAGGTTTCTCGTTCTAAAATCTACATCGAAATGGCGTGTGGTGACCATC
TGGTTGTGAACAACTCTCGTTCTTGTCGTACCGCACGTGCATTCCGTCATCATAAATACCGTAAAACC
TGCAAACGTTGTCGTGTTTCTGACGAAGATATCAACAACTTCCTGACCCGTTCTACCGAAGGCAAAAC
CTCTGTTAAAGTTAAAGTTGTTTCTGCGCCGAAAGTGAAAAAAGCGATGCCGAAATCTGTTTCTCGTG
CGCCGAAACCGCTGGAAAATCCGGTTTCTGCGAAAGCGTCTACCGACACCTCTCGTTCTGTTCCGTCT
CCGGCGAAATCTACCCCGAACTCTCCGGTTCCGACCTCTGCGCCGGCGCCGTCTCTGACCCGTTCTCA
GCTGGATCGTGTTGAAGCGCTGCTGTCTCCGGAAGATAAAATCTCTCTGAACATCGCGAAACCGTTCC
GTGAACTGGAATCTGAACTGGTTACCCGTCGTAAAAACGATTTCCAGCGTCTGTACACCAACGATCGT
GAAGACTACCTGGGTAAACTGGAACGTGACATCACCAAATTCTTCGTTGACCGTGATTTCCTGGAAAT
CAAATCTCCGATCCTGATCCCGGCGGAATACGTTGAACGTATGGGTATCAACAACGATACCGAACTGT
CTAAACAGATCTTCCGTGTTGATAAAAACCTGTGCCTGCGTCCGATGCTGGCGCCGACCCTGTACAAC
TATCTGCGTAAACTGGATCGTATCCTGCCGGACCCGATCAAAATCTTCGAAGTTGGTCCGTGCTACCG
TAAAGAATCTGACGGTAAAGAACACCTGGAAGAGTTCACCATGGTGAACTTCTGCCAGATGGGTTCT
GGTTGCACCCGTGAGAACCTGGAATCTCTGATCAAAGAATTTCTGGACTACCTGGAAATCGACTTCGA
AATCGTTGGTGACTCCTGCATGGTGTACGGTGATACCCTGGACATCATGCACGGTGACCTGGAACTGT
CTTCTGCGGTTGTTGGTCCGGTTCCGCTGGATCGTGAATGGGGTATCGACAAACCGTGGATCGGTGCG
GGTTTCGGTCTGGAACGTCTGCTGAAAGTTATGCACGGTTTCAAAAACATCAAACGTGCGTCTCGTTC
TGAATCTTACTACAACGGTATCTCTACCAACCTGTAA; DNA sequence of MmPylRS.
Bolded codons (Val-31, Thr-56, His-62, and Ala-100) were mutated to
`ATT`, `CCC`, `TAT`, and `GAG`, respectively, in the `IPYE` variant
of the enzyme (SEQ ID NO: 29, 30). >SEQ ID NO: 7
ATGGATAAAAAACCACTAAACACTCTGATATCTGCAACCGGGCTCTGGATGTCCAGGACCGGAACAA
TTCATAAAATAAAACACCACGAAGTCTCTCGAAGCAAAATCTATATTGAAATGGCATGCGGAGACCA
CCTTGTTGTAAACAACTCCAGGAGCAGCAGGACTGCAAGAGCGCTCAGGCACCACAAATACAGGAA
GACCTGCAAACGCTGCAGGGTTTCGGATGAGGATCTCAATAAGTTCCTCACAAAGGCAAACGAAGAC
CAGACAAGCGTAAAAGTCAAGGTCGTTTCTGCCCCTACCAGAACGAAAAAGGCAATGCCAAAATCCG
TTGCGAGAGCCCCGAAACCTCTTGAGAATACAGAAGCGGCACAGGCTCAACCTTCTGGATCTAAATTT
TCACCTGCGATACCGGTTTCCACCCAAGAGTCAGTTTCTGTCCCGGCATCTGTTTCAACATCAATATCA
AGCATTTCTACAGGAGCAACTGCATCCGCACTGGTAAAAGGGAATACGAATCCCATTACATCCATGTC
TGCCCCTGTTCAGGCAAGTGCCCCCGCACTTACGAAGAGCCAGACTGACAGGCTTGAAGTCCTGTTAA
ACCCAAAAGATGAGATTTCCCTGAATTCCGGCAAGCCTTTCAGGGAGCTTGAGTCCGAATTGCTCTCT
CGCAGAAAAAAAGACCTGCAGCAGATCTACGCGGAAGAAAGGGAGAATTATCTGGGGAAACTCGAG
CGTGAAATTACCAGGTTCTTTGTGGACAGGGGTTTTCTGGAAATAAAATCCCCGATCCTGATCCCTCTT
GAGTATATCGAAAGGATGGGCATTGATAATGATACCGAACTTTCAAAACAGATCTTCAGGGTTGACA
AGAACTTCTGCCTGAGACCCATGCTTGCTCCAAACCTTTACAACTACCTGCGCAAGCTTGACAGGGCC
CTGCCTGATCCAATAAAAATTTTTGAAATAGGCCCATGCTACAGAAAAGAGTCCGACGGCAAAGAAC
ACCTCGAAGAGTTTACCATGCTGAACTTCTGCCAGATGGGATCGGGATGCACACGGGAAAATCTTGA
AAGCATAATTACGGACTTCCTGAACCACCTGGGAATTGATTTCAAGATCGTAGGCGATTCCTGCATGG
TCTATGGGGATACCCTTGATGTAATGCACGGAGACCTGGAACTTTCCTCTGCAGTAGTCGGACCCATA
CCGCTTGACCGGGAATGGGGTATTGATAAACCCTGGATAGGGGCAGGTTTCGGGCTCGAACGCCTTCT
AAAGGTTAAACACGACTTTAAAAATATCAAGAGAGCTGCAAGGTCCGAGTCTTACTATAACGGGATT
TCTACCAACCTGTAA; DNA sequence of chAcK3RS. Bolded codons (Val-31,
Thr-56, His-62, and Ala-100) were mutated to `ATT`, `CCC`, `TAT`,
and `GAG`, respectively, in the `IPYE` variant of the enzyme (SEQ
ID NO: 31, 32). >SEQ ID NO: 8
ATGGATAAGAAGCCGCTGGATGTTCTGATCTCTGCGACCGGTCTGTGGATGTCCCGTACCGGCACGCT
GCACAAGATCAAGCACTATGAGGTTTCTCGTTCTAAAATCTACATCGAAATGGCGTGTGGTGACCATC
TGGTTGTGAACAACTCTCGTTCTTGTCGTACCGCACGTGCATTCCGTCATCATAAATACCGTAAAACC
TGCAAACGTTGTCGTGTTTCTGGTGAAGATATCAACAACTTCCTGACCCGTTCTACCGAAGGCAAAAC
CTCTGTTAAAGTTAAAGTTGTTTCTGCGCCGAAAGTGAAAAAAGCGATGCCGAAATCTGTTTCTCGTG
CGCCGAAACCGCTGGAAAATCCGGTTTCTGCGAAAGCGTCTACCGACACCTCTCGTTCTGTTCCGTCT
CCGGCGAAATCTACCCCGAACTCTCCGGTTCCGACCTCTGCAAGTGCCCCCGCACTTACGAAGAGCCA
GACTGACAGGCTTGAAGTCCTGTTAAACCCAAAAGATGAGATTTCCCTGAATTCCGGCAAGCCTTTCA
GGGAGCTTGAGTCCGAATTGCTCTCTCGCAGAAAAAAAGACCTGCAGCAGATCTACGCGGAAGAAAG
GGAGAATTATCTGGGGAAACTCGAGCGTGAAATTACCAGGTTCTTTGTGGACAGGGGTTTTCTGGAAA
TAAAATCCCCGATCCTGATCCCTCTTGAGTATATCGAAAGGATGGGCATTGATAATGATACCGAACTT
TCAAAACAGATCTTCAGGGTTGACAAGAACTTCTGCCTGAGACCCATGATGGCTCCAAACATTTTTAA
CTACGCTCGCAAGCTTGACAGGGCCCTGCCTGATCCAATAAAAATTTTTGAAATAGGCCCATGCTACA
GAAAAGAGTCCGACGGCAAAGAACACCTCGAAGAGTTTACCATGCTGAACTTCTTTCAGATGGGATC
GGGATGCACACGGGAAAATCTTGAAAGCATAATTACGGACTTCCTGAACCACCTGGGAATTGATTTCA
AGATCGTAGGCGATTCCTGCATGGTCTATGGGGATACCCTTGATGTAATGCACGGAGACCTGGAACTT
TCCTCTGCAGTAGTCGGACCCATACCGCTTGACCGGGAATGGGGTATTGATAAACCCTGGATAGGGGC
AGGTTTCGGACTCGAACGCCTTCTAAAGGTTAAACACGACTTTAAAAATATCAAGAGAGCTGCAAGG
TCCGAGTCTTACTATAACGGGATTTCTACCAACCTGTAA; DNA sequence of MbAcK3RS.
Bolded codons (Val-31, Thr-56, His-62, and Ala-100) were mutated to
`ATT`, `CCC`, `TAT`, and `GAG`, respectively, in the `IPYE` variant
of the enzyme (SEQ ID NO: 33, 34). >SEQ ID NO: 9
ATGGATAAGAAGCCGCTGGATGTTCTGATCTCTGCGACCGGTCTGTGGATGTCCCGTACCGGCACGCT
GCACAAGATCAAGCACTATGAGGTTTCTCGTTCTAAAATCTACATCGAAATGGCGTGTGGTGACCATC
TGGTTGTGAACAACTCTCGTTCTTGTCGTACCGCACGTGCATTCCGTCATCATAAATACCGTAAAACC
TGCAAACGTTGTCGTGTTTCTGGTGAAGATATCAACAACTTCCTGACCCGTTCTACCGAAGGCAAAAC
CTCTGTTAAAGTTAAAGTTGTTTCTGCGCCGAAAGTGAAAAAAGCGATGCCGAAATCTGTTTCTCGTG
CGCCGAAACCGCTGGAAAATCCGGTTTCTGCGAAAGCGTCTACCGACACCTCTCGTTCTGTTCCGTCT
CCGGCGAAATCTACCCCGAACTCTCCGGTTCCGACCTCTGCGCCGGCGCCGTCTCTGACCCGTTCTCA
GCTGGATCGTGTTGAAGCGCTGCTGTCTCCGGAAGATAAAATCTCTCTGAACATCGCGAAACCGTTCC
GTGAACTGGAATCTGAACTGGTTACCCGTCGTAAAAACGATTTCCAGCGTCTGTACACCAACGATCGT
GAAGACTACCTGGGTAAACTGGAACGTGACATCACCAAATTCTTCGTTGACCGTGATTTCCTGGAAAT
CAAATCTCCGATCCTGATCCCGGCGGAATACGTTGAACGTATGGGTATCAACAACGATACCGAACTGT
CTAAACAGATCTTCCGTGTTGATAAAAACCTGTGCCTGCGTCCGATGATGGCGCCGACCATTTTTAAC
TATGCTCGTAAACTGGATCGTATCCTGCCGGACCCGATCAAAATCTTCGAAGTTGGTCCGTGCTACCG
TAAAGAATCTGACGGTAAAGAACACCTGGAAGAGTTCACCATGGTGAACTTCTTTCAGATGGGTTCTG
GTTGCACCCGTGAGAACCTGGAATCTCTGATCAAAGAATTTCTGGACTACCTGGAAATCGACTTCGAA
ATCGTTGGTGACTCCTGCATGGTGTACGGTGATACCCTGGACATCATGCACGGTGACCTGGAACTGTC
TTCTGCGGTTGTTGGTCCGGTTCCGCTGGATCGTGAATGGGGTATCGACAAACCGTGGATCGGTGCGG
GTTTCGGTCTGGAACGTCTGCTGAAAGTTATGCACGGTTTCAAAAACATCAAACGTGCGTCTCGTTCT
GAATCTTACTACAACGGTATCTCTACCAACCTGTAA; DNA sequence of MmAcK3RS.
Bolded codons (Val-31, Thr-56, His-62, and Ala-100) were mutated to
`ATT`, `CCC`, `TAT`, and `GAG`, respectively, in the `IPYE` variant
of the enzyme (SEQ ID NO: 35, 36). >SEQ ID NO: 10
ATGGATAAAAAACCACTAAACACTCTGATATCTGCAACCGGGCTCTGGATGTCCAGGACCGGAACAA
TTCATAAAATAAAACACCACGAAGTCTCTCGAAGCAAAATCTATATTGAAATGGCATGCGGAGACCA
CCTTGTTGTAAACAACTCCAGGAGCAGCAGGACTGCAAGAGCGCTCAGGCACCACAAATACAGGAA
GACCTGCAAACGCTGCAGGGTTTCGGGTGAGGATCTCAATAAGTTCCTCACAAAGGCAAACGAAGAC
CAGACAAGCGTAAAAGTCAAGGTCGTTTCTGCCCCTACCAGAACGAAAAAGGCAATGCCAAAATCCG
TTGCGAGAGCCCCGAAACCTCTTGAGAATACAGAAGCGGCACAGGCTCAACCTTCTGGATCTAAATTT
TCACCTGCGATACCGGTTTCCACCCAAGAGTCAGTTTCTGTCCCGGCATCTGTTTCAACATCAATATCA
AGCATTTCTACAGGAGCAACTGCATCCGCACTGGTAAAAGGGAATACGAATCCCATTACATCCATGTC
TGCCCCTGTTCAGGCAAGTGCCCCCGCACTTACGAAGAGCCAGACTGACAGGCTTGAAGTCCTGTTAA
ACCCAAAAGATGAGATTTCCCTGAATTCCGGCAAGCCTTTCAGGGAGCTTGAGTCCGAATTGCTCTCT
CGCAGAAAAAAAGACCTGCAGCAGATCTACGCGGAAGAAAGGGAGAATTATCTGGGGAAACTCGAG
CGTGAAATTACCAGGTTCTTTGTGGACAGGGGTTTTCTGGAAATAAAATCCCCGATCCTGATCCCTCTT
GAGTATATCGAAAGGATGGGCATTGATAATGATACCGAACTTTCAAAACAGATCTTCAGGGTTGACA
AGAACTTCTGCCTGAGACCCATGATGGCTCCAAACATTTTTAACTACGCTCGCAAGCTTGACAGGGCC
CTGCCTGATCCAATAAAAATTTTTGAAATAGGCCCATGCTACAGAAAAGAGTCCGACGGCAAAGAAC
ACCTCGAAGAGTTTACCATGCTGAACTTCTTTCAGATGGGATCGGGATGCACACGGGAAAATCTTGAA
AGCATAATTACGGACTTCCTGAACCACCTGGGAATTGATTTCAAGATCGTAGGCGATTCCTGCATGGT
CTATGGGGATACCCTTGATGTAATGCACGGAGACCTGGAACTTTCCTCTGCAGTAGTCGGACCCATAC
CGCTTGACCGGGAATGGGGTATTGATAAACCCTGGATAGGGGCAGGTTTCGGGCTCGAACGCCTTCTA
AAGGTTAAACACGACTTTAAAAATATCAAGAGAGCTGCAAGGTCCGAGTCTTACTATAACGGGATTTC
TACCAACCTGTAA; DNA sequence of chIFRS. Bolded codons (Val-31,
Thr-56, His-62, and Ala-100) were mutated to `ATT`, `CCC`, `TAT`,
and `GAG`, respectively, in the `IPYE` variant of the enzyme (SEQ
ID NO: 37, 38). >SEQ ID NO: 11
ATGGATAAGAAGCCGCTGGATGTTCTGATCTCTGCGACCGGTCTGTGGATGTCCCGTACCGGCACGCT
GCACAAGATCAAGCACTATGAGGTTTCTCGTTCTAAAATCTACATCGAAATGGCGTGTGGTGACCATC
TGGTTGTGAACAACTCTCGTTCTTGTCGTACCGCACGTGCATTCCGTCATCATAAATACCGTAAAACC
TGCAAACGTTGTCGTGTTTCTGACGAAGATATCAACAACTTCCTGACCCGTTCTACCGAAGGCAAAAC
CTCTGTTAAAGTTAAAGTTGTTTCTGCGCCGAAAGTGAAAAAAGCGATGCCGAAATCTGTTTCTCGTG
CGCCGAAACCGCTGGAAAATCCGGTTTCTGCGAAAGCGTCTACCGACACCTCTCGTTCTGTTCCGTCT
CCGGCGAAATCTACCCCGAACTCTCCGGTTCCGACCTCTGCAAGTGCCCCCGCACTTACGAAGAGCCA
GACTGACAGGCTTGAAGTCCTGTTAAACCCAAAAGATGAGATTTCCCTGAATTCCGGCAAGCCTTTCA
GGGAGCTTGAGTCCGAATTGCTCTCTCGCAGAAAAAAAGACCTGCAGCAGATCTACGCGGAAGAAAG
GGAGAATTATCTGGGGAAACTCGAGCGTGAAATTACCAGGTTCTTTGTGGACAGGGGTTTTCTGGAAA
TAAAATCCCCGATCCTGATCCCTCTTGAGTATATCGAAAGGATGGGCATTGATAATGATACCGAACTT
TCAAAACAGATCTTCAGGGTTGACAAGAACTTCTGCCTGAGACCCATGCTTGCTCCAAACCTTTACAA
CTACCTGCGCAAGCTTGACAGGGCCCTGCCTGATCCAATAAAAATTTTTGAAATAGGCCCATGCTACA
GAAAAGAGTCCGACGGCAAAGAACACCTCGAAGAGTTTACCATGCTGTCGTTCATTCAGATGGGATC
GGGATGTACACGGGAAAATCTTGAAAGCATAATTACGGACTTCCTGAACCACCTGGGAATTGATTTCA
AGATCGTAGGCGATTCCTGCATGGTCTATGGGGATACCCTTGATGTAATGCACGGAGACCTGGAACTT
TCCTCTGCAGTAGTCGGACCCATACCGCTTGACCGGGAATGGGGTATTGATAAACCCTGGATAGGGGC
AGGTTTCGGGCTCGAACGCCTTCTAAAGGTTAAACACGACTTTAAAAATATCAAGAGAGCTGCAAGG
TCCGAGTCTTACTATAACGGGATTTCTACCAACCTGTAA; DNA sequence of
PACE-evolved chPylRS variant, Split1. The in-frame, premature stop
codon of the split enzyme is underlined, and the position of
translational reinitiation, corresponding to Met-107 of chPylRS, is
italicized. In the Split1' variant, bolded codons were reverted to
`GTT`, `ACC`, `CAT`, and `GCG`, respectively. >SEQ ID NO: 12
ATGGATAAGAAGCCGCTGGATGTTCTGATCTCTGCGACCGGTCTGTGGATGTCCCGTACCGGCACGCT
GCACAAGATCAAGCACTATGAGATTTCTCGTTCTAAAATCTACATCGAAATGGCGTGTGGTGACCATC
TGGTTGTGAACAACTCTCGTTCTTGTCGTCCCGCACGTGCATTCCGTTATCATAAATACCGTAAAACC
TGCAAACGTTGTCGTGTTTCTGACGAAGATATCAACAACTTCCTGACCCGTTCTACCGAAGGCAAAAC
CTCTGTTAAAGTTAAAGCTGTTCTGAGCCGAAAGTGAAAAAAGCGATGCCGAAATCTGTTTCTCGTG
CGCCGAAACCGCTGGAAAATCCGGTTTCTGCGAAAGCGTCTACCGACACCTCTCGTTCTGTTCCGTCT
CCGGCGAAATCTACCCCGAACTCTCCGGTTCCGACCTCTGCAAGTGCCCCCGCACTTACGAAGAGCCA
GACTGACAGGCTTGAAGTCCTGTTAAACCCAAAAGATGAGATTTCCCTGAATTCCGGCAAGCCTTTCA
GGGAGCTTGAGTCCGAATTGCTCTCTCGCAGAAAAAAAGACCTGCAGCAGATCTACGCGGAAGAAAG
GGAGAATTATCTGGGGAAACTCGAGCGTGAAATTACCAGGTTCTTTGTGGACAGGGGTTTTCTGGAAA
TAAAATCCCCGATCCTGATCCCTCTTGAGTATATCGAAAGGATGGGCATTGATAATGATACCGAACTT
TCAAAACAGATCTTCAGGGTTGACAAGAACTTCTGCCTGAGACCCATGCTTGCTCCAAACCTTTACAA
CTACCTGCGCAAGCTTGACAGGGCCCTGCCTGATCCAATAAAAATTTTTGAAATAGGCCCATGCTACA
GAAAAGAGTCCGACGGCAAAGAACACCTCGAAGAGTTTACCATGCTGAACTTCTGCCAGATGGGATC
GGGATGCACACGGGAAAATCTTGAAAGCATAATTACGGACTTCCTGAACCACCTGGGAATTGATTTCA
AGATCGTAGGCGATTCCTGCATGGTCTATGGGGATACCCTTGATGTAATGCACGGAGACCTGGAACTT
TCCTCTGCAGTAGTCGGACCCATACCGCTTGACCGGGAATGGGGTATTGATAAACCCTGGATAGGGGC
AGGTTTCGGGCTCGAACGCCTTCTAAAGGTTAAACACGACTTTAAAAATATCAAGAGAGCTGCAAGG
TCCGAGTCTTACTATAACGGGATTTCTACCAACCTGTAA; DNA sequence of
PACE-evolved chPylRS variant, Split2. The in-frame, premature stop
codon of the split enzyme is underlined, and the position of
translational reinitiation, corresponding to Met-107 of chPylRS, is
italicized. In the Split2' variant, bolded codons were reverted to
`GTT`, `ACC`, `CAT`, and `GCG`, respectively. >SEQ ID NO: 13
ATGGATAAGAAGCCGCTGGATGTTCTGATCTCTGCGACCGGTCTGTGGATGTCCCGTACCGGCACGCT
GCACAAGATCAAGCACTATGAGATTTCTCGTTCTAAAATCTACATCGAAATGGCGTGTGGTGACCATC
TGGTTGTGAACAACTCTCGTTCTTGTCGTCCCGCACGTGCATTCCGTTATCATAAATACCGTAAAACC
TGCAAACGTTGTCGTGTTTCTGACGAAGATATCAACAACTTCCTGACCCGTTCTACCGAAGGCAAAAC
CTCTGTTAAAGTTAAAGTTGTTCTGAGCCGAAAGTGAAAAAAGCGATGCCGAAATCTGTTTCTCGTGC
GCCGAAACCGCTGGAAAATCCGGTTTCTGCGAAAGCGTCTACCGACACCTCTCGTTCTGTTCCGTCTC
CGGCGAAATCTACCCCGAACTCTCCGGTTCCGACCTCTGCAAGTGCCCCCGCACTTACGAAGAGCCAG
ACTGACAGGCTTGAAGTCCTGTTAAACCCAAAAGATGAGATTTCCCTGAATTCCGGCAAGCCTTTCAG
GGAGCTTGAGTCCGAATTGCTCTCTCGCAGAAAAAAAGACCTGCAGCAGATCTACGCGGAAGAAAGG
GAGAATTATCTGGGGAAACTCGAGCGTGAAATTACCAGGTTCTTTGTGGACAGGGGTTTTCTGGAAAT
AAAATCCCCGATCCTGATCCCTCTTGAGTATATCGAAAGGATGGGCATTGATAATGATACCGAACTTT
CAAAACAGATCTTCAGGGTTGACAAGAACTTCTGCCTGAGACCCATGCTTGCTCCAAACCTTTACAAC
TACCTGCGCAAGCTTGACAGGGCCCTGCCTGATCCAATAAAAATTTTTGAAATAGGCCCATGCTACAG
AAAAGAGTCCGACGGCAAAGAACACCTCGAAGAGTTTACCATGCTGAACTTCTGCCAGATGGGATCG
GGATGCACACGGGAAAATCTTGAAAGCATAATTACGGACTTCCTGAACCACCTGGGAATTGATTTCAA
GATCGTAGGCGATTCCTGCATGGTCTATGGGGATACCCTTGATGTAATGCACGGAGACCTGGAACTTT
CCTCTGCAGTAGTCGGACCCATACCGCTTGACCGGGAATGGGGTATTGATAAACCCTGGATAGGGGC
AGGTTTCGGGCTCGAACGCCTTCTAAAGGTTAAACACGACTTTAAAAATATCAAGAGAGCTGCAAGG
TCCGAGTCTTACTATAACGGGATTTCTACCAACCTGTAA; DNA sequence of
PACE-evolved chPylRS variant, Split3. The in-frame, premature stop
codon of the split enzyme is underlined, and the position of
translational reinitiation, corresponding to Met-107 of chPylRS, is
italicized. In the Split3' variant, bolded codons were reverted to
`GTT`, `ACC`, `CAT`, and `GCG`, respectively. >SEQ ID NO: 14
ATGGATAAGAAGCCGCTGGATGTTCTGATCTCTGCGACCGGTCTGTGGATGTCCCGTACCGGCACGCT
GCACAAGATCAAGCACTATGAGATTTCTCGTTCTAAAATCTACATCGAAATGGCGTGTGGTGACCATC
TGGTTGTGAACAACTCTCGTTCTTGTCGTCCCGCACGTGCATTCCGTTATCATAAATACCGTAAAACC
TGCAAACGTTGTCGTGTTTCTGACGAAGATATCAACAACTTCCTGACCCGTTCTACCGAAGGCAAAAC
CTCTGTTAAAGTTAAAGTTGTTTTCTGAGCCGAAAGTGAAAAAAGCGATGCCGAAATCTGTTTCTCGT
GCGCCGAAACCGCTGGAAAATCCGGTTTCTGCGAAAGCGTCTACCGACACCTCTCGTTCTGTTCCGTC
TCCGGCGAAATCTACCCCGAACTCTCCGGTTCCGACCTCTGCAAGTGCCCCCGCACTTACGAAGAGCC
AGACTGACAGGCTTGAAGTCCTGTTAAACCCAAAAGATGAGATTTCCCTGAATTCCGGCAAGCCTTTC
AGGGAGCTTGAGTCCGAATTGCTCTCTCGCAGAAAAAAAGACCTGCAGCAGATCTACGCGGAAGAAA
GGGAGAATTATCTGGGGAAACTCGAGCGTGAAATTACCAGGTTCTTTGTGGACAGGGGTTTTCTGGAA
ATAAAATCCCCGATCCTGATCCCTCTTGAGTATATCGAAAGGATGGGCATTGATAATGATACCGAACT
TTCAAAACAGATCTTCAGGGTTGACAAGAACTTCTGCCTGAGACCCATGCTTGCTCCAAACCTTTACA
ACTACCTGCGCAAGCTTGACAGGGCCCTGCCTGATCCAATAAAAATTTTTGAAATAGGCCCATGCTAC
AGAAAAGAGTCCGACGGCAAAGAACACCTCGAAGAGTTTACCATGCTGAACTTCTGCCAGATGGGAT
CGGGATGCACACGGGAAAATCTTGAAAGCATAATTACGGACTTCCTGAACCACCTGGGAATTGATTTC
AAGATCGTAGGCGATTCCTGCATGGTCTATGGGGATACCCTTGATGTAATGCACGGAGACCTGGAACT
TTCCTCTGCAGTAGTCGGACCCATACCGCTTGACCGGGAATGGGGTATTGATAAACCCTGGATAGGGG
CAGGTTTCGGGCTCGAACGCCTTCTAAAGGTTAAACACGACTTTAAAAATATCAAGAGAGCTGCAAG
GTCCGAGTCTTACTATAACGGGATTTCTACCAACCTGTAA; DNA sequence of
PACE-evolved chPylRS variant, Split4. The in-frame, premature stop
codon of the split enzyme is underlined, and the position of
translational reinitiation, corresponding to Met-107 of chPylRS, is
italicized. In the Split4' variant, bolded codons were reverted to
`GTT`, `ACC`, `CAT`, and `GCG`, respectively. >SEQ ID NO: 15
ATGGATAAGAAGCCGCTGGATGTTCTGATCTCTGCGACCGGTCTGTGGATGTCCCGTACCGGCACGCT
GCACAAGATCAAGCACTATGAGATTTCTCGTTCTAAAATCTACATCGAAATGGCGTGTGGTGACCATC
TGGTTGTGAACAACTCTCGTTCTTGTCGTCCCGCACGTGCATTCCGTTATCATAAATACCGTAAAACC
TGCAAACGTTGTCGTGTTTCTGACGAAGATATCAACAACTTCCTGACCCGTTCTACCGAAGGCTAAAC
CTCTGTTAAAGTTAAAGTTGTTTCTGAGCCGAAAGTGAAAAAAGCGATGCCGAAATCTGTTTCTCGTG
CGCCGAAACCGCTGGAAAATCCGGTTTCTGCGAAAGCGTCTACCGACACCTCTCGTTCTGTTCCGTCT
CCGGCGAAATCTACCCCGAACTCTCCGGTTCCGACCTCTGCAAGTGCCCCCGCACTTACGAAGAGCCA
GACTGACAGGCTTGAAGTCCTGTTAAACCCAAAAGATGAGATTTCCCTGAATTCCGGCAAGCCTTTCA
GGGAGCTTGAGTCCGAATTGCTCTCTCGCAGAAAAAAAGACCTGCAGCAGATCTACGCGGAAGAAAG
GGAGAATTATCTGGGGAAACTCGAGCGTGAAATTACCAGGTTCTTTGTGGACAGGGGTTTTCTGGAAA
TAAAATCCCCGATCCTGATCCCTCTTGAGTATATCGAAAGGATGGGCATTGATAATGATACCGAACTT
TCAAAACAGATCTTCAGGGTTGACAAGAACTTCTGCCTGAGACCCATGCTTGCTCCAAACCTTTACAA
CTACCTGCGCAAGCTTGACAGGGCCCTGCCTGATCCAATAAAAATTTTTGAAATAGGCCCATGCTACA
GAAAAGAGTCCGACGGCAAAGAACACCTCGAAGAGTTTACCATGCTGAACTTCTGCCAGATGGGATC
GGGATGCACACGGGAAAATCTTGAAAGCATAATTACGGACTTCCTGAACCACCTGGGAATTGATTTCA
AGATCGTAGGCGATTCCTGCATGGTCTATGGGGATACCCTTGATGTAATGCACGGAGACCTGGAACTT
TCCTCTGCAGTAGTCGGACCCATACCGCTTGACCGGGAATGGGGTATTGATAAACCCTGGATAGGGGC
AGGTTTCGGGCTCGAACGCCTTCTAAAGGTTAAACACGACTTTAAAAATATCAAGAGAGCTGCAAGG
TCCGAGTCTTACTATAACGGGATTTCTACCAACCTGTAA; DNA sequence of
PACE-evolved chPylRS variant, Split5. The in-frame, premature stop
codon of the split enzyme is underlined, and the position of
translational reinitiation, corresponding to Met-107 of chPylRS, is
italicized. In the Split5' variant, bolded codons were reverted to
`GTT`, `ACC`, `CAT`, and `GCG`, respectively. >SEQ ID NO: 16
ATGGATAAGAAGCCGCTGGATGTTCTGATCTCTGCGACCGGTCTGTGGATGTCCCGTACCGGCACGCT
GCACAAGATCAAGCACTATGAGATTTCTCGTTCTAAAATCTACATCGAAATGGCGTGTGGTGACCATC
TGGTTGTGAACAACTCTCGTTCTTGTCGTCCCGCACGTGCATTCCGTTATCATAAATACCGTAAAACC
TGCAAACGTTGTCGTGTTTCTGACGAAGATATCAACAACTTCCTGACCCGTTCTACCGAAGGCAAAAC
CTCTGTTAAAGTTAAAGTTGTTTCTGAGCGAAAGTGAAAAAAGCGATGCCGAAATCTGTTTCTCGTGC
GCCGAAACCGCTGGAAAATCCGGTTTCTGCGAAAGCGTCTACCGACACCTCTCGTTCTGTTCCGTCTC
CGGCGAAATCTACCCCGAACTCTCCGGTTCCGACCTCTGCAAGTGCCCCCGCACTTACGAAGAGCCAG
ACTGACAGGCTTGAAGTCCTGTTAAACCCAAAAGATGAGATTTCCCTGAATTCCGGCAAGCCTTTCAG
GGAGCTTGAGTCCGAATTGCTCTCTCGCAGAAAAAAAGACCTGCAGCAGATCTACGCGGAAGAAAGG
GAGAATTATCTGGGGAAACTCGAGCGTGAAATTACCAGGTTCTTTGTGGACAGGGGTTTTCTGGAAAT
AAAATCCCCGATCCTGATCCCTCTTGAGTATATCGAAAGGATGGGCATTGATAATGATACCGAACTTT
CAAAACAGATCTTCAGGGTTGACAAGAACTTCTGCCTGAGACCCATGCTTGCTCCAAACCTTTACAAC
TACCTGCGCAAGCTTGACAGGGCCCTGCCTGATCCAATAAAAATTTTTGAAATAGGCCCATGCTACAG
AAAAGAGTCCGACGGCAAAGAACACCTCGAAGAGTTTACCATGCTGAACTTCTGCCAGATGGGATCG
GGATGCACACGGGAAAATCTTGAAAGCATAATTACGGACTTCCTGAACCACCTGGGAATTGATTTCAA
GATCGTAGGCGATTCCTGCATGGTCTATGGGGATACCCTTGATGTAATGCACGGAGACCTGGAACTTT
CCTCTGCAGTAGTCGGACCCATACCGCTTGACCGGGAATGGGGTATTGATAAACCCTGGATAGGGGC
AGGTTTCGGGCTCGAACGCCTTCTAAAGGTTAAACACGACTTTAAAAATATCAAGAGAGCTGCAAGG
TCCGAGTCTTACTATAACGGGATTTCTACCAACCTGTAA; DNA sequence of
PACE-evolved chPylRS variant, Split6. This split enzyme contained
several in-frame, premature stop codons (underlined) between the
frameshift and the position of translational reinitiation,
corresponding to Met-107 of chPylRS (italicized). In the Split6'
variant, bolded codons were reverted to `GTT`, `ACC`, `CAT`,
and `GCG`, respectively. >SEQ ID NO: 17
ATGGATAAGAAGCCGCTGGATGTTCTGATCTCTGCGACCGGTCTGTGGATGTCCCGTACCGGCACGCT
GCACAAGATCAAGCACTATGAGATTTCTCGTTCTAAAATCTACATCGAAATGGCGTGTGGTGACCATC
TGGTTGTGAACAACTCTCGTTCTTGTCGTCCCGCACGTGCATTCCGTTATCATAAATACCGTAAAACC
TGCAAACGTTGTCGTGTTTCTGACGAAGATATCAACAACTTCCTGACCCGTTCTACCGAAGGCAAAAC
CCTCTGTTAAAGTTAAAGTTGTTTCTGAGCCGAAAGTGAAAAAAGCGATGCCGAAATCTGTTTCTCGT
GCGCCGAAACCGCTGGAAAATCCGGTTTCTGCGAAAGCGTCTACCGACACCTCTCGTTCTGTTCCGTC
TCCGGCGAAATCTACCCCGAACTCTCCGGTTCCGACCTCTGCAAGTGCCCCCGCACTTACGAAGAGCC
AGACTGACAGGCTTGAAGTCCTGTTAAACCCAAAAGATGAGATTTCCCTGAATTCCGGCAAGCCTTTC
AGGGAGCTTGAGTCCGAATTGCTCTCTCGCAGAAAAAAAGACCTGCAGCAGATCTACGCGGAAGAAA
GGGAGAATTATCTGGGGAAACTCGAGCGTGAAATTACCAGGTTCTTTGTGGACAGGGGTTTTCTGGAA
ATAAAATCCCCGATCCTGATCCCTCTTGAGTATATCGAAAGGATGGGCATTGATAATGATACCGAACT
TTCAAAACAGATCTTCAGGGTTGACAAGAACTTCTGCCTGAGACCCATGCTTGCTCCAAACCTTTACA
ACTACCTGCGCAAGCTTGACAGGGCCCTGCCTGATCCAATAAAAATTTTTGAAATAGGCCCATGCTAC
AGAAAAGAGTCCGACGGCAAAGAACACCTCGAAGAGTTTACCATGCTGAACTTCTGCCAGATGGGAT
CGGGATGCACACGGGAAAATCTTGAAAGCATAATTACGGACTTCCTGAACCACCTGGGAATTGATTTC
AAGATCGTAGGCGATTCCTGCATGGTCTATGGGGATACCCTTGATGTAATGCACGGAGACCTGGAACT
TTCCTCTGCAGTAGTCGGACCCATACCGCTTGACCGGGAATGGGGTATTGATAAACCCTGGATAGGGG
CAGGTTTCGGGCTCGAACGCCTTCTAAAGGTTAAACACGACTTTAAAAATATCAAGAGAGCTGCAAG
GTCCGAGTCTTACTATAACGGGATTTCTACCAACCTGTAA; DNA sequence of p-NFRS
(amino acid sequence SEQ ID NO: 39). >SEQ ID NO: 18
ATGGACGAATTTGAAATGATAAAGAGAAACACATCTGAAATTATCAGCGAGGAAGAGTTAAGAGAG
GTTTTAAAAAAAGATGAAAAATCTGCTCTGATAGGTTTTGAACCAAGTGGTAAAATACATTTAGGGCA
TTATCTCCAAATAAAAAAGATGATTGATTTACAAAATGCTGGATTTGATATAATTATATTGTTGGCTG
ATTTACACGCCTATTTAAACCAGAAAGGAGAGTTGGATGAGATTAGAAAAATAGGAGATTATAACAA
AAAAGTTTTTGAAGCAATGGGGTTAAAGGCAAAATATGTTTATGGAAGTTCGTTCCAGCTTGATAAGG
ATTATACACTGAATGTCTATAGATTGGCTTTAAAAACTACCTTAAAAAGAGCAAGAAGGAGTATGGA
ACTTATAGCAAGAGAGGATGAAAATCCAAAGGTTGCTGAAGTTATCTATCCAATAATGCAGGTTAATC
CTCTTAATTATGAGGGCGTTGATGTTGCAGTTGGAGGGATGGAGCAGAGAAAAATACACATGTTAGC
AAGGGAGCTTTTACCAAAAAAGGTTGTTTGTATTCACAACCCTGTCTTAACGGGTTTGGATGGAGAAG
GAAAGATGAGTTCTTCAAAAGGGAATTTTATAGCTGTTGATGACTCTCCAGAAGAGATTAGGGCTAAG
ATAAAGAAAGCATACTGCCCAGCTGGAGTTGTTGAAGGAAATCCAATAATGGAGATAGCTAAATACT
TCCTTGAATATCCTTTAACCATAAAAAGGCCAGAAAAATTTGGTGGAGATTTGACAGTTAATAGCTAT
GAGGAGTTAGAGAGTTTATTTAAAAATAAGGAATTGCATCCAATGCGCTTAAAAAATGCTGTAGCTG
AAGAACTTATAAAGATTTTAGAGCCAATTAGAAAGAGATTATAA; DNA sequence of
p-IFRS (amino acid sequence SEQ ID NO: 40). >SEQ ID NO: 19
ATGGACGAATTTGAAATGATAAAGAGAAACACATCTGAAATTATCAGCGAGGAAGAGTTAAGAGAG
GTTTTAAAAAAAGATGAAAAATCTGCTCTGATAGGTTTTGAACCAAGTGGTAAAATACATTTAGGGCA
TTATCTCCAAATAAAAAAGATGATTGATTTACAAAATGCTGGATTTGATATAATTATATTGTTGGCTG
ATTTACACGCCTATTTAAACCAGAAAGGAGAGTTGGATGAGATTAGAAAAATAGGAGATTATAACAA
AAAAGTTTTTGAAGCAATGGGGTTAAAGGCAAAATATGTTTATGGAAGTTCGTTCCAGCTTGATAAGG
ATTATACACTGAATGTCTATAGATTGGCTTTAAAAACTACCTTAAAAAGAGCAAGAAGGAGTATGGA
ACTTATAGCAAGAGAGGATGAAAATCCAAAGGTTGCTGAAGTTATCTATCCAATAATGCAGGTTAATC
CTCTTCATTATGAGGGCGTTGATGTTGCAGTTGGAGGGATGGAGCAGAGAAAAATACACATGTTAGC
AAGGGAGCTTTTACCAAAAAAGGTTGTTTGTATTCACAACCCTGTCTTAACGGGTTTGGATGGAGAAG
GAAAGATGAGTTCTTCAAAAGGGAATTTTATAGCTGTTGATGACTCTCCAGAAGAGATTAGGGCTAAG
ATAAAGAAAGCATACTGCCCAGCTGGAGTTGTTGAAGGAAATCCAATAATGGAGATAGCTAAATACT
TCCTTGAATATCCTTTAACCATAAAAAGGCCAGAAAAATTTGGTGGAGATTTGACAGTTAATAGCTAT
GAGGAGTTAGAGAGTTTATTTAAAAATAAGGAATTGCATCCAATGCGCTTAAAAAATGCTGTAGCTG
AAGAACTTATAAAGATTTTAGAGCCAATTAGAAAGAGATTATAA; Amino acid sequence
of MbPylRS >SEQ ID NO: 20
MDKKPLDVLISATGLWMSRTGTLHKIKHYEVSRSKIYIEMACGDHLVVNNSRSCRTARAFRHHKYRKTCK
RCRVSDEDINNFLTRSTEGKTSVKVKVVSAPKVKKAMPKSVSRAPKPLENPVSAKASTDTSRSVPSPAKST
PNSPVPTSAPAPSLTRSQLDRVEALLSPEDKISLNIAKPFRELESELVTRRKNDFQRLYTNDREDYLGKLERD
ITKFFVDRDFLEIKSPILIPAEYVERMGINNDTELSKQIFRVDKNLCLRPMLAPTLYNYLRKLDRILPDPIKIFE
VGPCYRKESDGKEHLEEFTMVNFCQMGSGCTRENLESLIKEFLDYLEIDFEIVGDSCMVYGDTLDIMHGDL
ELSSAVVGPVPLDREWGIDKPWIGAGFGLERLLKVMHGFKNIKRASRSESYYNGISTNL; Amino
acid sequence of MmPylRS >SEQ ID NO: 21
MDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCK
RCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPV
STQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSG
KPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQI
FRVDKNFCLRPMLAPNLYNYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENL
ESIITDFLNHLGIDFKIVGDSCMVYGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVK
HDFKNIKRAARSESYYNGISTNL; Amino acid sequence of p-NFRS >SEQ ID
NO: 22
MDEFEMIKRNTSEIISEEELREVLKKDEKSALIGFEPSGKIHLGHYLQIKKMIDLQNAGFDIIILLADLHAYLN
QKGELDEIRKIGDYNKKVFEAMGLKAKYVYGSSFQLDKDYTLNVYRLALKTTLKRARRSMELIAREDENP
KVAEVIYPIMQVNPLNYEGVDVAVGGMEQRKIHMLARELLPKKVVCIHNPVLTGLDGEGKMSSSKGNFIA
VDDSPEEIRAKIKKAYCPAGVVEGNPIMEIAKYFLEYPLTIKRPEKFGGDLTVNSYEELESLFKNKELHPMR
LKNAVAEELIKILEPIRKRL; Amino acid sequence of p-IFRS >SEQ ID NO: 23
MDEFEMIKRNTSEIISEEELREVLKKDEKSALIGFEPSGKIHLGHYLQIKKMIDLQNAGFDIIILLADLHAYLN
QKGELDEIRKIGDYNKKVFEAMGLKAKYVYGSSFQLDKDYTLNVYRLALKTTLKRARRSMELIAREDENP
KVAEVIYPIMQVNPLHYEGVDVAVGGMEQRKIHMLARELLPKKVVCIHNPVLTGLDGEGKMSSSKGNFIA
VDDSPEEIRAKIKKAYCPAGVVEGNPIMEIAKYFLEYPLTIKRPEKFGGDLTVNSYEELESLFKNKELHPMR
LKNAVAEELIKILEPIRKRL; Amino acid sequence of M. jannaschii TyrRS
>SEQ ID NO: 24
MDEFEMIKRNTSEIISEEELREVLKKDEKSAYIGFEPSGKIHLGHYLQIKKMIDLQNAGFDIIILLADLHAYLN
QKGELDEIRKIGDYNKKVFEAMGLKAKYVYGSEFQLDKDYTLNVYRLALKTTLKRARRSMELIAREDENP
KVAEVIYPIMQVNDIHYLGVDVAVGGMEQRKIHMLARELLPKKVVCIHNPVLTGLDGEGKMSSSKGNFIA
VDDSPEEIRAKIKKAYCPAGVVEGNPIMEIAKYFLEYPLTIKRPEKFGGDLTVNSYEELESLFKNKELHPMD
LKNAVAEELIKILEPIRKR;
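The codon substitutions described for the sequences above (for example, Val-31 changed to `ATT` in the `IPYE` variants) can be illustrated with a minimal sketch. It assumes residues are numbered from 1 at the start codon, so that residue n occupies nucleotides 3(n-1)+1 through 3n of the coding sequence. The function names `codon_at` and `substitute_codons` are hypothetical and are not part of the disclosure; the test string is the first 93 nucleotides of SEQ ID NO: 8 as listed above.

```python
# Illustrative sketch only: codon lookup and replacement by residue number.
# Assumes 1-based residue numbering starting at the ATG start codon.

def codon_at(dna: str, residue: int) -> str:
    """Return the 3-nucleotide codon encoding the given 1-based residue."""
    start = 3 * (residue - 1)
    return dna[start:start + 3]


def substitute_codons(dna: str, changes: dict) -> str:
    """Return a copy of dna with the codons at the given residues replaced."""
    seq = list(dna)
    for residue, codon in changes.items():
        start = 3 * (residue - 1)
        if len(codon) != 3 or residue < 1 or start + 3 > len(dna):
            raise ValueError(f"invalid substitution at residue {residue}")
        seq[start:start + 3] = codon
    return "".join(seq)


# First 31 codons (93 nt) of SEQ ID NO: 8 as listed in this document.
PREFIX = (
    "ATGGATAAGAAGCCGCTGGATGTTCTGATCTCTGCGACCGGTCTGTGGATGTCCCGTACC"  # codons 1-20
    "GGCACGCTGCACAAGATCAAGCACTATGAGGTT"                             # codons 21-31
)

# Codon 31 (Val, GTT) replaced with ATT (Ile), as in the first IPYE change.
mutated = substitute_codons(PREFIX, {31: "ATT"})
print(codon_at(PREFIX, 31), "->", codon_at(mutated, 31))
```

The same call with the remaining positions (56, 62, and 100) would apply to a full-length sequence; the 93-nucleotide prefix used here is only long enough to cover residue 31.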
Sequence CWU 1
1
40
1 20 DNA Artificial Sequence Synthetic Polynucleotide
1 caagcctcag cgaccgaata 20
2 19 DNA Artificial Sequence Synthetic Polynucleotide
2 ggaaaccgag gaaacgcaa 19
3 11 PRT Artificial Sequence Synthetic Polypeptide
3 Met Glu Gln Lys Leu Ile Ser Glu Glu Asp Leu 1 5 10
4 8 PRT Artificial Sequence Synthetic Polypeptide
4 Gly Ser His His His His His His 1 5
5 1260 DNA Artificial Sequence Synthetic Polynucleotide
5 atggataaga agccgctgga tgttctgatc tctgcgaccg
gtctgtggat gtcccgtacc 60ggcacgctgc acaagatcaa gcactatgag gtttctcgtt
ctaaaatcta catcgaaatg 120gcgtgtggtg accatctggt tgtgaacaac
tctcgttctt gtcgtaccgc acgtgcattc 180cgtcatcata aataccgtaa
aacctgcaaa cgttgtcgtg tttctgacga agatatcaac 240aacttcctga
cccgttctac cgaaggcaaa acctctgtta aagttaaagt tgtttctgcg
300ccgaaagtga aaaaagcgat gccgaaatct gtttctcgtg cgccgaaacc
gctggaaaat 360ccggtttctg cgaaagcgtc taccgacacc tctcgttctg
ttccgtctcc ggcgaaatct 420accccgaact ctccggttcc gacctctgca
agtgcccccg cacttacgaa gagccagact 480gacaggcttg aagtcctgtt
aaacccaaaa gatgagattt ccctgaattc cggcaagcct 540ttcagggagc
ttgagtccga attgctctct cgcagaaaaa aagacctgca gcagatctac
600gcggaagaaa gggagaatta tctggggaaa ctcgagcgtg aaattaccag
gttctttgtg 660gacaggggtt ttctggaaat aaaatccccg atcctgatcc
ctcttgagta tatcgaaagg 720atgggcattg ataatgatac cgaactttca
aaacagatct tcagggttga caagaacttc 780tgcctgagac ccatgcttgc
tccaaacctt tacaactacc tgcgcaagct tgacagggcc 840ctgcctgatc
caataaaaat ttttgaaata ggcccatgct acagaaaaga gtccgacggc
900aaagaacacc tcgaagagtt taccatgctg aacttctgcc agatgggatc
gggatgcaca 960cgggaaaatc ttgaaagcat aattacggac ttcctgaacc
acctgggaat tgatttcaag 1020atcgtaggcg attcctgcat ggtctatggg
gatacccttg atgtaatgca cggagacctg 1080gaactttcct ctgcagtagt
cggacccata ccgcttgacc gggaatgggg tattgataaa 1140ccctggatag
gggcaggttt cggactcgaa cgccttctaa aggttaaaca cgactttaaa
1200aatatcaaga gagctgcaag gtccgagtct tactataacg ggatttctac
caacctgtaa 1260
6 1260 DNA Artificial Sequence Synthetic Polynucleotide
6 atggataaga agccgctgga tgttctgatc tctgcgaccg gtctgtggat gtcccgtacc
60ggcacgctgc acaagatcaa gcactatgag gtttctcgtt ctaaaatcta catcgaaatg
120gcgtgtggtg accatctggt tgtgaacaac tctcgttctt gtcgtaccgc
acgtgcattc 180cgtcatcata aataccgtaa aacctgcaaa cgttgtcgtg
tttctgacga agatatcaac 240aacttcctga cccgttctac cgaaggcaaa
acctctgtta aagttaaagt tgtttctgcg 300ccgaaagtga aaaaagcgat
gccgaaatct gtttctcgtg cgccgaaacc gctggaaaat 360ccggtttctg
cgaaagcgtc taccgacacc tctcgttctg ttccgtctcc ggcgaaatct
420accccgaact ctccggttcc gacctctgcg ccggcgccgt ctctgacccg
ttctcagctg 480gatcgtgttg aagcgctgct gtctccggaa gataaaatct
ctctgaacat cgcgaaaccg 540ttccgtgaac tggaatctga actggttacc
cgtcgtaaaa acgatttcca gcgtctgtac 600accaacgatc gtgaagacta
cctgggtaaa ctggaacgtg acatcaccaa attcttcgtt 660gaccgtgatt
tcctggaaat caaatctccg atcctgatcc cggcggaata cgttgaacgt
720atgggtatca acaacgatac cgaactgtct aaacagatct tccgtgttga
taaaaacctg 780tgcctgcgtc cgatgctggc gccgaccctg tacaactatc
tgcgtaaact ggatcgtatc 840ctgccggacc cgatcaaaat cttcgaagtt
ggtccgtgct accgtaaaga atctgacggt 900aaagaacacc tggaagagtt
caccatggtg aacttctgcc agatgggttc tggttgcacc 960cgtgagaacc
tggaatctct gatcaaagaa tttctggact acctggaaat cgacttcgaa
1020atcgttggtg actcctgcat ggtgtacggt gataccctgg acatcatgca
cggtgacctg 1080gaactgtctt ctgcggttgt tggtccggtt ccgctggatc
gtgaatgggg tatcgacaaa 1140ccgtggatcg gtgcgggttt cggtctggaa
cgtctgctga aagttatgca cggtttcaaa 1200aacatcaaac gtgcgtctcg
ttctgaatct tactacaacg gtatctctac caacctgtaa 1260
7 1365 DNA Artificial Sequence Synthetic Polynucleotide
7 atggataaaa aaccactaaa cactctgata
tctgcaaccg ggctctggat gtccaggacc 60ggaacaattc ataaaataaa acaccacgaa
gtctctcgaa gcaaaatcta tattgaaatg 120gcatgcggag accaccttgt
tgtaaacaac tccaggagca gcaggactgc aagagcgctc 180aggcaccaca
aatacaggaa gacctgcaaa cgctgcaggg tttcggatga ggatctcaat
240aagttcctca caaaggcaaa cgaagaccag acaagcgtaa aagtcaaggt
cgtttctgcc 300cctaccagaa cgaaaaaggc aatgccaaaa tccgttgcga
gagccccgaa acctcttgag 360aatacagaag cggcacaggc tcaaccttct
ggatctaaat tttcacctgc gataccggtt 420tccacccaag agtcagtttc
tgtcccggca tctgtttcaa catcaatatc aagcatttct 480acaggagcaa
ctgcatccgc actggtaaaa gggaatacga atcccattac atccatgtct
540gcccctgttc aggcaagtgc ccccgcactt acgaagagcc agactgacag
gcttgaagtc 600ctgttaaacc caaaagatga gatttccctg aattccggca
agcctttcag ggagcttgag 660tccgaattgc tctctcgcag aaaaaaagac
ctgcagcaga tctacgcgga agaaagggag 720aattatctgg ggaaactcga
gcgtgaaatt accaggttct ttgtggacag gggttttctg 780gaaataaaat
ccccgatcct gatccctctt gagtatatcg aaaggatggg cattgataat
840gataccgaac tttcaaaaca gatcttcagg gttgacaaga acttctgcct
gagacccatg 900cttgctccaa acctttacaa ctacctgcgc aagcttgaca
gggccctgcc tgatccaata 960aaaatttttg aaataggccc atgctacaga
aaagagtccg acggcaaaga acacctcgaa 1020gagtttacca tgctgaactt
ctgccagatg ggatcgggat gcacacggga aaatcttgaa 1080agcataatta
cggacttcct gaaccacctg ggaattgatt tcaagatcgt aggcgattcc
1140tgcatggtct atggggatac ccttgatgta atgcacggag acctggaact
ttcctctgca 1200gtagtcggac ccataccgct tgaccgggaa tggggtattg
ataaaccctg gataggggca 1260ggtttcgggc tcgaacgcct tctaaaggtt
aaacacgact ttaaaaatat caagagagct 1320gcaaggtccg agtcttacta
taacgggatt tctaccaacc tgtaa 1365
8 1260 DNA Artificial Sequence Synthetic Polynucleotide
8 atggataaga agccgctgga tgttctgatc
tctgcgaccg gtctgtggat gtcccgtacc 60ggcacgctgc acaagatcaa gcactatgag
gtttctcgtt ctaaaatcta catcgaaatg 120gcgtgtggtg accatctggt
tgtgaacaac tctcgttctt gtcgtaccgc acgtgcattc 180cgtcatcata
aataccgtaa aacctgcaaa cgttgtcgtg tttctggtga agatatcaac
240aacttcctga cccgttctac cgaaggcaaa acctctgtta aagttaaagt
tgtttctgcg 300ccgaaagtga aaaaagcgat gccgaaatct gtttctcgtg
cgccgaaacc gctggaaaat 360ccggtttctg cgaaagcgtc taccgacacc
tctcgttctg ttccgtctcc ggcgaaatct 420accccgaact ctccggttcc
gacctctgca agtgcccccg cacttacgaa gagccagact 480gacaggcttg
aagtcctgtt aaacccaaaa gatgagattt ccctgaattc cggcaagcct
540ttcagggagc ttgagtccga attgctctct cgcagaaaaa aagacctgca
gcagatctac 600gcggaagaaa gggagaatta tctggggaaa ctcgagcgtg
aaattaccag gttctttgtg 660gacaggggtt ttctggaaat aaaatccccg
atcctgatcc ctcttgagta tatcgaaagg 720atgggcattg ataatgatac
cgaactttca aaacagatct tcagggttga caagaacttc 780tgcctgagac
ccatgatggc tccaaacatt tttaactacg ctcgcaagct tgacagggcc
840ctgcctgatc caataaaaat ttttgaaata ggcccatgct acagaaaaga
gtccgacggc 900aaagaacacc tcgaagagtt taccatgctg aacttctttc
agatgggatc gggatgcaca 960cgggaaaatc ttgaaagcat aattacggac
ttcctgaacc acctgggaat tgatttcaag 1020atcgtaggcg attcctgcat
ggtctatggg gatacccttg atgtaatgca cggagacctg 1080gaactttcct
ctgcagtagt cggacccata ccgcttgacc gggaatgggg tattgataaa
1140ccctggatag gggcaggttt cggactcgaa cgccttctaa aggttaaaca
cgactttaaa 1200aatatcaaga gagctgcaag gtccgagtct tactataacg
ggatttctac caacctgtaa 1260
9 1260 DNA Artificial Sequence Synthetic Polynucleotide
9 atggataaga agccgctgga tgttctgatc tctgcgaccg
gtctgtggat gtcccgtacc 60ggcacgctgc acaagatcaa gcactatgag gtttctcgtt
ctaaaatcta catcgaaatg 120gcgtgtggtg accatctggt tgtgaacaac
tctcgttctt gtcgtaccgc acgtgcattc 180cgtcatcata aataccgtaa
aacctgcaaa cgttgtcgtg tttctggtga agatatcaac 240aacttcctga
cccgttctac cgaaggcaaa acctctgtta aagttaaagt tgtttctgcg
300ccgaaagtga aaaaagcgat gccgaaatct gtttctcgtg cgccgaaacc
gctggaaaat 360ccggtttctg cgaaagcgtc taccgacacc tctcgttctg
ttccgtctcc ggcgaaatct 420accccgaact ctccggttcc gacctctgcg
ccggcgccgt ctctgacccg ttctcagctg 480gatcgtgttg aagcgctgct
gtctccggaa gataaaatct ctctgaacat cgcgaaaccg 540ttccgtgaac
tggaatctga actggttacc cgtcgtaaaa acgatttcca gcgtctgtac
600accaacgatc gtgaagacta cctgggtaaa ctggaacgtg acatcaccaa
attcttcgtt 660gaccgtgatt tcctggaaat caaatctccg atcctgatcc
cggcggaata cgttgaacgt 720atgggtatca acaacgatac cgaactgtct
aaacagatct tccgtgttga taaaaacctg 780tgcctgcgtc cgatgatggc
gccgaccatt tttaactatg ctcgtaaact ggatcgtatc 840ctgccggacc
cgatcaaaat cttcgaagtt ggtccgtgct accgtaaaga atctgacggt
900aaagaacacc tggaagagtt caccatggtg aacttctttc agatgggttc
tggttgcacc 960cgtgagaacc tggaatctct gatcaaagaa tttctggact
acctggaaat cgacttcgaa 1020atcgttggtg actcctgcat ggtgtacggt
gataccctgg acatcatgca cggtgacctg 1080gaactgtctt ctgcggttgt
tggtccggtt ccgctggatc gtgaatgggg tatcgacaaa 1140ccgtggatcg
gtgcgggttt cggtctggaa cgtctgctga aagttatgca cggtttcaaa
1200aacatcaaac gtgcgtctcg ttctgaatct tactacaacg gtatctctac
caacctgtaa 1260
10 1365 DNA Artificial Sequence Synthetic Polynucleotide
10 atggataaaa aaccactaaa cactctgata tctgcaaccg ggctctggat gtccaggacc
60ggaacaattc ataaaataaa acaccacgaa gtctctcgaa gcaaaatcta tattgaaatg
120gcatgcggag accaccttgt tgtaaacaac tccaggagca gcaggactgc
aagagcgctc 180aggcaccaca aatacaggaa gacctgcaaa cgctgcaggg
tttcgggtga ggatctcaat 240aagttcctca caaaggcaaa cgaagaccag
acaagcgtaa aagtcaaggt cgtttctgcc 300cctaccagaa cgaaaaaggc
aatgccaaaa tccgttgcga gagccccgaa acctcttgag 360aatacagaag
cggcacaggc tcaaccttct ggatctaaat tttcacctgc gataccggtt
420tccacccaag agtcagtttc tgtcccggca tctgtttcaa catcaatatc
aagcatttct 480acaggagcaa ctgcatccgc actggtaaaa gggaatacga
atcccattac atccatgtct 540gcccctgttc aggcaagtgc ccccgcactt
acgaagagcc agactgacag gcttgaagtc 600ctgttaaacc caaaagatga
gatttccctg aattccggca agcctttcag ggagcttgag 660tccgaattgc
tctctcgcag aaaaaaagac ctgcagcaga tctacgcgga agaaagggag
720aattatctgg ggaaactcga gcgtgaaatt accaggttct ttgtggacag
gggttttctg 780gaaataaaat ccccgatcct gatccctctt gagtatatcg
aaaggatggg cattgataat 840gataccgaac tttcaaaaca gatcttcagg
gttgacaaga acttctgcct gagacccatg 900atggctccaa acatttttaa
ctacgctcgc aagcttgaca gggccctgcc tgatccaata 960aaaatttttg
aaataggccc atgctacaga aaagagtccg acggcaaaga acacctcgaa
1020gagtttacca tgctgaactt ctttcagatg ggatcgggat gcacacggga
aaatcttgaa 1080agcataatta cggacttcct gaaccacctg ggaattgatt
tcaagatcgt aggcgattcc 1140tgcatggtct atggggatac ccttgatgta
atgcacggag acctggaact ttcctctgca 1200gtagtcggac ccataccgct
tgaccgggaa tggggtattg ataaaccctg gataggggca 1260ggtttcgggc
tcgaacgcct tctaaaggtt aaacacgact ttaaaaatat caagagagct
1320gcaaggtccg agtcttacta taacgggatt tctaccaacc tgtaa
1365
11 1260 DNA Artificial Sequence Synthetic Polynucleotide
11 atggataaga agccgctgga tgttctgatc tctgcgaccg gtctgtggat gtcccgtacc
60ggcacgctgc acaagatcaa gcactatgag gtttctcgtt ctaaaatcta catcgaaatg
120gcgtgtggtg accatctggt tgtgaacaac tctcgttctt gtcgtaccgc
acgtgcattc 180cgtcatcata aataccgtaa aacctgcaaa cgttgtcgtg
tttctgacga agatatcaac 240aacttcctga cccgttctac cgaaggcaaa
acctctgtta aagttaaagt tgtttctgcg 300ccgaaagtga aaaaagcgat
gccgaaatct gtttctcgtg cgccgaaacc gctggaaaat 360ccggtttctg
cgaaagcgtc taccgacacc tctcgttctg ttccgtctcc ggcgaaatct
420accccgaact ctccggttcc gacctctgca agtgcccccg cacttacgaa
gagccagact 480gacaggcttg aagtcctgtt aaacccaaaa gatgagattt
ccctgaattc cggcaagcct 540ttcagggagc ttgagtccga attgctctct
cgcagaaaaa aagacctgca gcagatctac 600gcggaagaaa gggagaatta
tctggggaaa ctcgagcgtg aaattaccag gttctttgtg 660gacaggggtt
ttctggaaat aaaatccccg atcctgatcc ctcttgagta tatcgaaagg
720atgggcattg ataatgatac cgaactttca aaacagatct tcagggttga
caagaacttc 780tgcctgagac ccatgcttgc tccaaacctt tacaactacc
tgcgcaagct tgacagggcc 840ctgcctgatc caataaaaat ttttgaaata
ggcccatgct acagaaaaga gtccgacggc 900aaagaacacc tcgaagagtt
taccatgctg tcgttcattc agatgggatc gggatgtaca 960cgggaaaatc
ttgaaagcat aattacggac ttcctgaacc acctgggaat tgatttcaag
1020atcgtaggcg attcctgcat ggtctatggg gatacccttg atgtaatgca
cggagacctg 1080gaactttcct ctgcagtagt cggacccata ccgcttgacc
gggaatgggg tattgataaa 1140ccctggatag gggcaggttt cgggctcgaa
cgccttctaa aggttaaaca cgactttaaa 1200aatatcaaga gagctgcaag
gtccgagtct tactataacg ggatttctac caacctgtaa 1260
12 1259 DNA Artificial Sequence Synthetic Polynucleotide
12 atggataaga agccgctgga tgttctgatc
tctgcgaccg gtctgtggat gtcccgtacc 60ggcacgctgc acaagatcaa gcactatgag
atttctcgtt ctaaaatcta catcgaaatg 120gcgtgtggtg accatctggt
tgtgaacaac tctcgttctt gtcgtcccgc acgtgcattc 180cgttatcata
aataccgtaa aacctgcaaa cgttgtcgtg tttctgacga agatatcaac
240aacttcctga cccgttctac cgaaggcaaa acctctgtta aagttaaagc
tgttctgagc 300cgaaagtgaa aaaagcgatg ccgaaatctg tttctcgtgc
gccgaaaccg ctggaaaatc 360cggtttctgc gaaagcgtct accgacacct
ctcgttctgt tccgtctccg gcgaaatcta 420ccccgaactc tccggttccg
acctctgcaa gtgcccccgc acttacgaag agccagactg 480acaggcttga
agtcctgtta aacccaaaag atgagatttc cctgaattcc ggcaagcctt
540tcagggagct tgagtccgaa ttgctctctc gcagaaaaaa agacctgcag
cagatctacg 600cggaagaaag ggagaattat ctggggaaac tcgagcgtga
aattaccagg ttctttgtgg 660acaggggttt tctggaaata aaatccccga
tcctgatccc tcttgagtat atcgaaagga 720tgggcattga taatgatacc
gaactttcaa aacagatctt cagggttgac aagaacttct 780gcctgagacc
catgcttgct ccaaaccttt acaactacct gcgcaagctt gacagggccc
840tgcctgatcc aataaaaatt tttgaaatag gcccatgcta cagaaaagag
tccgacggca 900aagaacacct cgaagagttt accatgctga acttctgcca
gatgggatcg ggatgcacac 960gggaaaatct tgaaagcata attacggact
tcctgaacca cctgggaatt gatttcaaga 1020tcgtaggcga ttcctgcatg
gtctatgggg atacccttga tgtaatgcac ggagacctgg 1080aactttcctc
tgcagtagtc ggacccatac cgcttgaccg ggaatggggt attgataaac
1140cctggatagg ggcaggtttc gggctcgaac gccttctaaa ggttaaacac
gactttaaaa 1200atatcaagag agctgcaagg tccgagtctt actataacgg
gatttctacc aacctgtaa 1259
13 1259 DNA Artificial Sequence Synthetic Polynucleotide
13 atggataaga agccgctgga tgttctgatc tctgcgaccg
gtctgtggat gtcccgtacc 60ggcacgctgc acaagatcaa gcactatgag atttctcgtt
ctaaaatcta catcgaaatg 120gcgtgtggtg accatctggt tgtgaacaac
tctcgttctt gtcgtcccgc acgtgcattc 180cgttatcata aataccgtaa
aacctgcaaa cgttgtcgtg tttctgacga agatatcaac 240aacttcctga
cccgttctac cgaaggcaaa acctctgtta aagttaaagt tgttctgagc
300cgaaagtgaa aaaagcgatg ccgaaatctg tttctcgtgc gccgaaaccg
ctggaaaatc 360cggtttctgc gaaagcgtct accgacacct ctcgttctgt
tccgtctccg gcgaaatcta 420ccccgaactc tccggttccg acctctgcaa
gtgcccccgc acttacgaag agccagactg 480acaggcttga agtcctgtta
aacccaaaag atgagatttc cctgaattcc ggcaagcctt 540tcagggagct
tgagtccgaa ttgctctctc gcagaaaaaa agacctgcag cagatctacg
600cggaagaaag ggagaattat ctggggaaac tcgagcgtga aattaccagg
ttctttgtgg 660acaggggttt tctggaaata aaatccccga tcctgatccc
tcttgagtat atcgaaagga 720tgggcattga taatgatacc gaactttcaa
aacagatctt cagggttgac aagaacttct 780gcctgagacc catgcttgct
ccaaaccttt acaactacct gcgcaagctt gacagggccc 840tgcctgatcc
aataaaaatt tttgaaatag gcccatgcta cagaaaagag tccgacggca
900aagaacacct cgaagagttt accatgctga acttctgcca gatgggatcg
ggatgcacac 960gggaaaatct tgaaagcata attacggact tcctgaacca
cctgggaatt gatttcaaga 1020tcgtaggcga ttcctgcatg gtctatgggg
atacccttga tgtaatgcac ggagacctgg 1080aactttcctc tgcagtagtc
ggacccatac cgcttgaccg ggaatggggt attgataaac 1140cctggatagg
ggcaggtttc gggctcgaac gccttctaaa ggttaaacac gactttaaaa
1200atatcaagag agctgcaagg tccgagtctt actataacgg gatttctacc
aacctgtaa 1259
14 1261 DNA Artificial Sequence Synthetic Polynucleotide
14 atggataaga agccgctgga tgttctgatc tctgcgaccg gtctgtggat gtcccgtacc
60ggcacgctgc acaagatcaa gcactatgag atttctcgtt ctaaaatcta catcgaaatg
120gcgtgtggtg accatctggt tgtgaacaac tctcgttctt gtcgtcccgc
acgtgcattc 180cgttatcata aataccgtaa aacctgcaaa cgttgtcgtg
tttctgacga agatatcaac 240aacttcctga cccgttctac cgaaggcaaa
acctctgtta aagttaaagt tgttttctga 300gccgaaagtg aaaaaagcga
tgccgaaatc tgtttctcgt gcgccgaaac cgctggaaaa 360tccggtttct
gcgaaagcgt ctaccgacac ctctcgttct gttccgtctc cggcgaaatc
420taccccgaac tctccggttc cgacctctgc aagtgccccc gcacttacga
agagccagac 480tgacaggctt gaagtcctgt taaacccaaa agatgagatt
tccctgaatt ccggcaagcc 540tttcagggag cttgagtccg aattgctctc
tcgcagaaaa aaagacctgc agcagatcta 600cgcggaagaa agggagaatt
atctggggaa actcgagcgt gaaattacca ggttctttgt 660ggacaggggt
tttctggaaa taaaatcccc gatcctgatc cctcttgagt atatcgaaag
720gatgggcatt gataatgata ccgaactttc aaaacagatc ttcagggttg
acaagaactt 780ctgcctgaga cccatgcttg ctccaaacct ttacaactac
ctgcgcaagc ttgacagggc 840cctgcctgat ccaataaaaa tttttgaaat
aggcccatgc tacagaaaag agtccgacgg 900caaagaacac ctcgaagagt
ttaccatgct gaacttctgc cagatgggat cgggatgcac 960acgggaaaat
cttgaaagca taattacgga cttcctgaac cacctgggaa ttgatttcaa
1020gatcgtaggc gattcctgca tggtctatgg ggataccctt gatgtaatgc
acggagacct 1080ggaactttcc tctgcagtag tcggacccat accgcttgac
cgggaatggg gtattgataa 1140accctggata ggggcaggtt tcgggctcga
acgccttcta aaggttaaac acgactttaa 1200aaatatcaag agagctgcaa
ggtccgagtc ttactataac gggatttcta ccaacctgta 1260a
1261
15 1260 DNA Artificial Sequence Synthetic Polynucleotide
15 atggataaga agccgctgga tgttctgatc tctgcgaccg gtctgtggat gtcccgtacc
60ggcacgctgc acaagatcaa gcactatgag atttctcgtt ctaaaatcta catcgaaatg
120gcgtgtggtg accatctggt tgtgaacaac tctcgttctt gtcgtcccgc
acgtgcattc 180cgttatcata aataccgtaa aacctgcaaa cgttgtcgtg
tttctgacga agatatcaac 240aacttcctga cccgttctac cgaaggctaa
acctctgtta aagttaaagt tgtttctgag 300ccgaaagtga aaaaagcgat
gccgaaatct gtttctcgtg cgccgaaacc gctggaaaat 360ccggtttctg
cgaaagcgtc taccgacacc tctcgttctg ttccgtctcc ggcgaaatct
420accccgaact ctccggttcc gacctctgca agtgcccccg cacttacgaa
gagccagact 480gacaggcttg aagtcctgtt aaacccaaaa gatgagattt
ccctgaattc cggcaagcct 540ttcagggagc ttgagtccga attgctctct
cgcagaaaaa aagacctgca gcagatctac 600gcggaagaaa gggagaatta
tctggggaaa ctcgagcgtg aaattaccag gttctttgtg 660gacaggggtt
ttctggaaat aaaatccccg atcctgatcc ctcttgagta tatcgaaagg
720atgggcattg ataatgatac cgaactttca aaacagatct tcagggttga
caagaacttc 780tgcctgagac ccatgcttgc tccaaacctt tacaactacc
tgcgcaagct tgacagggcc 840ctgcctgatc caataaaaat ttttgaaata
ggcccatgct acagaaaaga gtccgacggc 900aaagaacacc tcgaagagtt
taccatgctg aacttctgcc agatgggatc gggatgcaca 960cgggaaaatc
ttgaaagcat aattacggac ttcctgaacc acctgggaat tgatttcaag
1020atcgtaggcg attcctgcat ggtctatggg gatacccttg atgtaatgca
cggagacctg 1080gaactttcct ctgcagtagt cggacccata
ccgcttgacc gggaatgggg tattgataaa 1140ccctggatag gggcaggttt
cgggctcgaa cgccttctaa aggttaaaca cgactttaaa 1200aatatcaaga
gagctgcaag gtccgagtct tactataacg ggatttctac caacctgtaa
1260
16 1259 DNA Artificial Sequence Synthetic Polynucleotide
16 atggataaga agccgctgga tgttctgatc tctgcgaccg gtctgtggat gtcccgtacc
60ggcacgctgc acaagatcaa gcactatgag atttctcgtt ctaaaatcta catcgaaatg
120gcgtgtggtg accatctggt tgtgaacaac tctcgttctt gtcgtcccgc
acgtgcattc 180cgttatcata aataccgtaa aacctgcaaa cgttgtcgtg
tttctgacga agatatcaac 240aacttcctga cccgttctac cgaaggcaaa
acctctgtta aagttaaagt tgtttctgag 300cgaaagtgaa aaaagcgatg
ccgaaatctg tttctcgtgc gccgaaaccg ctggaaaatc 360cggtttctgc
gaaagcgtct accgacacct ctcgttctgt tccgtctccg gcgaaatcta
420ccccgaactc tccggttccg acctctgcaa gtgcccccgc acttacgaag
agccagactg 480acaggcttga agtcctgtta aacccaaaag atgagatttc
cctgaattcc ggcaagcctt 540tcagggagct tgagtccgaa ttgctctctc
gcagaaaaaa agacctgcag cagatctacg 600cggaagaaag ggagaattat
ctggggaaac tcgagcgtga aattaccagg ttctttgtgg 660acaggggttt
tctggaaata aaatccccga tcctgatccc tcttgagtat atcgaaagga
720tgggcattga taatgatacc gaactttcaa aacagatctt cagggttgac
aagaacttct 780gcctgagacc catgcttgct ccaaaccttt acaactacct
gcgcaagctt gacagggccc 840tgcctgatcc aataaaaatt tttgaaatag
gcccatgcta cagaaaagag tccgacggca 900aagaacacct cgaagagttt
accatgctga acttctgcca gatgggatcg ggatgcacac 960gggaaaatct
tgaaagcata attacggact tcctgaacca cctgggaatt gatttcaaga
1020tcgtaggcga ttcctgcatg gtctatgggg atacccttga tgtaatgcac
ggagacctgg 1080aactttcctc tgcagtagtc ggacccatac cgcttgaccg
ggaatggggt attgataaac 1140cctggatagg ggcaggtttc gggctcgaac
gccttctaaa ggttaaacac gactttaaaa 1200atatcaagag agctgcaagg
tccgagtctt actataacgg gatttctacc aacctgtaa 1259
17 1261 DNA Artificial Sequence Synthetic Polynucleotide
17 atggataaga agccgctgga tgttctgatc
tctgcgaccg gtctgtggat gtcccgtacc 60ggcacgctgc acaagatcaa gcactatgag
atttctcgtt ctaaaatcta catcgaaatg 120gcgtgtggtg accatctggt
tgtgaacaac tctcgttctt gtcgtcccgc acgtgcattc 180cgttatcata
aataccgtaa aacctgcaaa cgttgtcgtg tttctgacga agatatcaac
240aacttcctga cccgttctac cgaaggcaaa accctctgtt aaagttaaag
ttgtttctga 300gccgaaagtg aaaaaagcga tgccgaaatc tgtttctcgt
gcgccgaaac cgctggaaaa 360tccggtttct gcgaaagcgt ctaccgacac
ctctcgttct gttccgtctc cggcgaaatc 420taccccgaac tctccggttc
cgacctctgc aagtgccccc gcacttacga agagccagac 480tgacaggctt
gaagtcctgt taaacccaaa agatgagatt tccctgaatt ccggcaagcc
540tttcagggag cttgagtccg aattgctctc tcgcagaaaa aaagacctgc
agcagatcta 600cgcggaagaa agggagaatt atctggggaa actcgagcgt
gaaattacca ggttctttgt 660ggacaggggt tttctggaaa taaaatcccc
gatcctgatc cctcttgagt atatcgaaag 720gatgggcatt gataatgata
ccgaactttc aaaacagatc ttcagggttg acaagaactt 780ctgcctgaga
cccatgcttg ctccaaacct ttacaactac ctgcgcaagc ttgacagggc
840cctgcctgat ccaataaaaa tttttgaaat aggcccatgc tacagaaaag
agtccgacgg 900caaagaacac ctcgaagagt ttaccatgct gaacttctgc
cagatgggat cgggatgcac 960acgggaaaat cttgaaagca taattacgga
cttcctgaac cacctgggaa ttgatttcaa 1020gatcgtaggc gattcctgca
tggtctatgg ggataccctt gatgtaatgc acggagacct 1080ggaactttcc
tctgcagtag tcggacccat accgcttgac cgggaatggg gtattgataa
1140accctggata ggggcaggtt tcgggctcga acgccttcta aaggttaaac
acgactttaa 1200aaatatcaag agagctgcaa ggtccgagtc ttactataac
gggatttcta ccaacctgta 1260a 1261
18 921 DNA Artificial Sequence Synthetic Polynucleotide
18 atggacgaat ttgaaatgat aaagagaaac
acatctgaaa ttatcagcga ggaagagtta 60agagaggttt taaaaaaaga tgaaaaatct
gctctgatag gttttgaacc aagtggtaaa 120atacatttag ggcattatct
ccaaataaaa aagatgattg atttacaaaa tgctggattt 180gatataatta
tattgttggc tgatttacac gcctatttaa accagaaagg agagttggat
240gagattagaa aaataggaga ttataacaaa aaagtttttg aagcaatggg
gttaaaggca 300aaatatgttt atggaagttc gttccagctt gataaggatt
atacactgaa tgtctataga 360ttggctttaa aaactacctt aaaaagagca
agaaggagta tggaacttat agcaagagag 420gatgaaaatc caaaggttgc
tgaagttatc tatccaataa tgcaggttaa tcctcttaat 480tatgagggcg
ttgatgttgc agttggaggg atggagcaga gaaaaataca catgttagca
540agggagcttt taccaaaaaa ggttgtttgt attcacaacc ctgtcttaac
gggtttggat 600ggagaaggaa agatgagttc ttcaaaaggg aattttatag
ctgttgatga ctctccagaa 660gagattaggg ctaagataaa gaaagcatac
tgcccagctg gagttgttga aggaaatcca 720ataatggaga tagctaaata
cttccttgaa tatcctttaa ccataaaaag gccagaaaaa 780tttggtggag
atttgacagt taatagctat gaggagttag agagtttatt taaaaataag
840gaattgcatc caatgcgctt aaaaaatgct gtagctgaag aacttataaa
gattttagag 900ccaattagaa agagattata a 921
19 921 DNA Artificial Sequence Synthetic Polynucleotide
19 atggacgaat ttgaaatgat aaagagaaac
acatctgaaa ttatcagcga ggaagagtta 60agagaggttt taaaaaaaga tgaaaaatct
gctctgatag gttttgaacc aagtggtaaa 120atacatttag ggcattatct
ccaaataaaa aagatgattg atttacaaaa tgctggattt 180gatataatta
tattgttggc tgatttacac gcctatttaa accagaaagg agagttggat
240gagattagaa aaataggaga ttataacaaa aaagtttttg aagcaatggg
gttaaaggca 300aaatatgttt atggaagttc gttccagctt gataaggatt
atacactgaa tgtctataga 360ttggctttaa aaactacctt aaaaagagca
agaaggagta tggaacttat agcaagagag 420gatgaaaatc caaaggttgc
tgaagttatc tatccaataa tgcaggttaa tcctcttcat 480tatgagggcg
ttgatgttgc agttggaggg atggagcaga gaaaaataca catgttagca
540agggagcttt taccaaaaaa ggttgtttgt attcacaacc ctgtcttaac
gggtttggat 600ggagaaggaa agatgagttc ttcaaaaggg aattttatag
ctgttgatga ctctccagaa 660gagattaggg ctaagataaa gaaagcatac
tgcccagctg gagttgttga aggaaatcca 720ataatggaga tagctaaata
cttccttgaa tatcctttaa ccataaaaag gccagaaaaa 780tttggtggag
atttgacagt taatagctat gaggagttag agagtttatt taaaaataag
840gaattgcatc caatgcgctt aaaaaatgct gtagctgaag aacttataaa
gattttagag 900ccaattagaa agagattata a 921
20 419 PRT Artificial Sequence Synthetic Polypeptide
20 Met Asp Lys Lys Pro Leu Asp Val Leu
Ile Ser Ala Thr Gly Leu Trp1 5 10 15Met Ser Arg Thr Gly Thr Leu His
Lys Ile Lys His Tyr Glu Val Ser 20 25 30Arg Ser Lys Ile Tyr Ile Glu
Met Ala Cys Gly Asp His Leu Val Val 35 40 45Asn Asn Ser Arg Ser Cys
Arg Thr Ala Arg Ala Phe Arg His His Lys 50 55 60Tyr Arg Lys Thr Cys
Lys Arg Cys Arg Val Ser Asp Glu Asp Ile Asn65 70 75 80Asn Phe Leu
Thr Arg Ser Thr Glu Gly Lys Thr Ser Val Lys Val Lys 85 90 95Val Val
Ser Ala Pro Lys Val Lys Lys Ala Met Pro Lys Ser Val Ser 100 105
110Arg Ala Pro Lys Pro Leu Glu Asn Pro Val Ser Ala Lys Ala Ser Thr
115 120 125Asp Thr Ser Arg Ser Val Pro Ser Pro Ala Lys Ser Thr Pro
Asn Ser 130 135 140Pro Val Pro Thr Ser Ala Pro Ala Pro Ser Leu Thr
Arg Ser Gln Leu145 150 155 160Asp Arg Val Glu Ala Leu Leu Ser Pro
Glu Asp Lys Ile Ser Leu Asn 165 170 175Ile Ala Lys Pro Phe Arg Glu
Leu Glu Ser Glu Leu Val Thr Arg Arg 180 185 190Lys Asn Asp Phe Gln
Arg Leu Tyr Thr Asn Asp Arg Glu Asp Tyr Leu 195 200 205Gly Lys Leu
Glu Arg Asp Ile Thr Lys Phe Phe Val Asp Arg Asp Phe 210 215 220Leu
Glu Ile Lys Ser Pro Ile Leu Ile Pro Ala Glu Tyr Val Glu Arg225 230
235 240Met Gly Ile Asn Asn Asp Thr Glu Leu Ser Lys Gln Ile Phe Arg
Val 245 250 255Asp Lys Asn Leu Cys Leu Arg Pro Met Leu Ala Pro Thr
Leu Tyr Asn 260 265 270Tyr Leu Arg Lys Leu Asp Arg Ile Leu Pro Asp
Pro Ile Lys Ile Phe 275 280 285Glu Val Gly Pro Cys Tyr Arg Lys Glu
Ser Asp Gly Lys Glu His Leu 290 295 300Glu Glu Phe Thr Met Val Asn
Phe Cys Gln Met Gly Ser Gly Cys Thr305 310 315 320Arg Glu Asn Leu
Glu Ser Leu Ile Lys Glu Phe Leu Asp Tyr Leu Glu 325 330 335Ile Asp
Phe Glu Ile Val Gly Asp Ser Cys Met Val Tyr Gly Asp Thr 340 345
350Leu Asp Ile Met His Gly Asp Leu Glu Leu Ser Ser Ala Val Val Gly
355 360 365Pro Val Pro Leu Asp Arg Glu Trp Gly Ile Asp Lys Pro Trp
Ile Gly 370 375 380Ala Gly Phe Gly Leu Glu Arg Leu Leu Lys Val Met
His Gly Phe Lys385 390 395 400Asn Ile Lys Arg Ala Ser Arg Ser Glu
Ser Tyr Tyr Asn Gly Ile Ser 405 410 415Thr Asn
Leu
21 454 PRT Artificial Sequence Synthetic Polypeptide
21 Met Asp Lys
Lys Pro Leu Asn Thr Leu Ile Ser Ala Thr Gly Leu Trp1 5 10 15Met Ser
Arg Thr Gly Thr Ile His Lys Ile Lys His His Glu Val Ser 20 25 30Arg
Ser Lys Ile Tyr Ile Glu Met Ala Cys Gly Asp His Leu Val Val 35 40
45Asn Asn Ser Arg Ser Ser Arg Thr Ala Arg Ala Leu Arg His His Lys
50 55 60Tyr Arg Lys Thr Cys Lys Arg Cys Arg Val Ser Asp Glu Asp Leu
Asn65 70 75 80Lys Phe Leu Thr Lys Ala Asn Glu Asp Gln Thr Ser Val
Lys Val Lys 85 90 95Val Val Ser Ala Pro Thr Arg Thr Lys Lys Ala Met
Pro Lys Ser Val 100 105 110Ala Arg Ala Pro Lys Pro Leu Glu Asn Thr
Glu Ala Ala Gln Ala Gln 115 120 125Pro Ser Gly Ser Lys Phe Ser Pro
Ala Ile Pro Val Ser Thr Gln Glu 130 135 140Ser Val Ser Val Pro Ala
Ser Val Ser Thr Ser Ile Ser Ser Ile Ser145 150 155 160Thr Gly Ala
Thr Ala Ser Ala Leu Val Lys Gly Asn Thr Asn Pro Ile 165 170 175Thr
Ser Met Ser Ala Pro Val Gln Ala Ser Ala Pro Ala Leu Thr Lys 180 185
190Ser Gln Thr Asp Arg Leu Glu Val Leu Leu Asn Pro Lys Asp Glu Ile
195 200 205Ser Leu Asn Ser Gly Lys Pro Phe Arg Glu Leu Glu Ser Glu
Leu Leu 210 215 220Ser Arg Arg Lys Lys Asp Leu Gln Gln Ile Tyr Ala
Glu Glu Arg Glu225 230 235 240Asn Tyr Leu Gly Lys Leu Glu Arg Glu
Ile Thr Arg Phe Phe Val Asp 245 250 255Arg Gly Phe Leu Glu Ile Lys
Ser Pro Ile Leu Ile Pro Leu Glu Tyr 260 265 270Ile Glu Arg Met Gly
Ile Asp Asn Asp Thr Glu Leu Ser Lys Gln Ile 275 280 285Phe Arg Val
Asp Lys Asn Phe Cys Leu Arg Pro Met Leu Ala Pro Asn 290 295 300Leu
Tyr Asn Tyr Leu Arg Lys Leu Asp Arg Ala Leu Pro Asp Pro Ile305 310
315 320Lys Ile Phe Glu Ile Gly Pro Cys Tyr Arg Lys Glu Ser Asp Gly
Lys 325 330 335Glu His Leu Glu Glu Phe Thr Met Leu Asn Phe Cys Gln
Met Gly Ser 340 345 350Gly Cys Thr Arg Glu Asn Leu Glu Ser Ile Ile
Thr Asp Phe Leu Asn 355 360 365His Leu Gly Ile Asp Phe Lys Ile Val
Gly Asp Ser Cys Met Val Tyr 370 375 380Gly Asp Thr Leu Asp Val Met
His Gly Asp Leu Glu Leu Ser Ser Ala385 390 395 400Val Val Gly Pro
Ile Pro Leu Asp Arg Glu Trp Gly Ile Asp Lys Pro 405 410 415Trp Ile
Gly Ala Gly Phe Gly Leu Glu Arg Leu Leu Lys Val Lys His 420 425
430Asp Phe Lys Asn Ile Lys Arg Ala Ala Arg Ser Glu Ser Tyr Tyr Asn
435 440 445Gly Ile Ser Thr Asn Leu 450
22 306 PRT Artificial Sequence Synthetic Polypeptide
22 Met Asp Glu Phe Glu Met Ile Lys Arg
Asn Thr Ser Glu Ile Ile Ser1 5 10 15Glu Glu Glu Leu Arg Glu Val Leu
Lys Lys Asp Glu Lys Ser Ala Leu 20 25 30Ile Gly Phe Glu Pro Ser Gly
Lys Ile His Leu Gly His Tyr Leu Gln 35 40 45Ile Lys Lys Met Ile Asp
Leu Gln Asn Ala Gly Phe Asp Ile Ile Ile 50 55 60Leu Leu Ala Asp Leu
His Ala Tyr Leu Asn Gln Lys Gly Glu Leu Asp65 70 75 80Glu Ile Arg
Lys Ile Gly Asp Tyr Asn Lys Lys Val Phe Glu Ala Met 85 90 95Gly Leu
Lys Ala Lys Tyr Val Tyr Gly Ser Ser Phe Gln Leu Asp Lys 100 105
110Asp Tyr Thr Leu Asn Val Tyr Arg Leu Ala Leu Lys Thr Thr Leu Lys
115 120 125Arg Ala Arg Arg Ser Met Glu Leu Ile Ala Arg Glu Asp Glu
Asn Pro 130 135 140Lys Val Ala Glu Val Ile Tyr Pro Ile Met Gln Val
Asn Pro Leu Asn145 150 155 160Tyr Glu Gly Val Asp Val Ala Val Gly
Gly Met Glu Gln Arg Lys Ile 165 170 175His Met Leu Ala Arg Glu Leu
Leu Pro Lys Lys Val Val Cys Ile His 180 185 190Asn Pro Val Leu Thr
Gly Leu Asp Gly Glu Gly Lys Met Ser Ser Ser 195 200 205Lys Gly Asn
Phe Ile Ala Val Asp Asp Ser Pro Glu Glu Ile Arg Ala 210 215 220Lys
Ile Lys Lys Ala Tyr Cys Pro Ala Gly Val Val Glu Gly Asn Pro225 230
235 240Ile Met Glu Ile Ala Lys Tyr Phe Leu Glu Tyr Pro Leu Thr Ile
Lys 245 250 255Arg Pro Glu Lys Phe Gly Gly Asp Leu Thr Val Asn Ser
Tyr Glu Glu 260 265 270Leu Glu Ser Leu Phe Lys Asn Lys Glu Leu His
Pro Met Arg Leu Lys 275 280 285Asn Ala Val Ala Glu Glu Leu Ile Lys
Ile Leu Glu Pro Ile Arg Lys 290 295 300Arg Leu 305
23 306 PRT Artificial Sequence Synthetic Polypeptide
23 Met Asp Glu Phe Glu Met Ile Lys Arg
Asn Thr Ser Glu Ile Ile Ser1 5 10 15Glu Glu Glu Leu Arg Glu Val Leu
Lys Lys Asp Glu Lys Ser Ala Leu 20 25 30Ile Gly Phe Glu Pro Ser Gly
Lys Ile His Leu Gly His Tyr Leu Gln 35 40 45Ile Lys Lys Met Ile Asp
Leu Gln Asn Ala Gly Phe Asp Ile Ile Ile 50 55 60Leu Leu Ala Asp Leu
His Ala Tyr Leu Asn Gln Lys Gly Glu Leu Asp65 70 75 80Glu Ile Arg
Lys Ile Gly Asp Tyr Asn Lys Lys Val Phe Glu Ala Met 85 90 95Gly Leu
Lys Ala Lys Tyr Val Tyr Gly Ser Ser Phe Gln Leu Asp Lys 100 105
110Asp Tyr Thr Leu Asn Val Tyr Arg Leu Ala Leu Lys Thr Thr Leu Lys
115 120 125Arg Ala Arg Arg Ser Met Glu Leu Ile Ala Arg Glu Asp Glu
Asn Pro 130 135 140Lys Val Ala Glu Val Ile Tyr Pro Ile Met Gln Val
Asn Pro Leu His145 150 155 160Tyr Glu Gly Val Asp Val Ala Val Gly
Gly Met Glu Gln Arg Lys Ile 165 170 175His Met Leu Ala Arg Glu Leu
Leu Pro Lys Lys Val Val Cys Ile His 180 185 190Asn Pro Val Leu Thr
Gly Leu Asp Gly Glu Gly Lys Met Ser Ser Ser 195 200 205Lys Gly Asn
Phe Ile Ala Val Asp Asp Ser Pro Glu Glu Ile Arg Ala 210 215 220Lys
Ile Lys Lys Ala Tyr Cys Pro Ala Gly Val Val Glu Gly Asn Pro225 230
235 240Ile Met Glu Ile Ala Lys Tyr Phe Leu Glu Tyr Pro Leu Thr Ile
Lys 245 250 255Arg Pro Glu Lys Phe Gly Gly Asp Leu Thr Val Asn Ser
Tyr Glu Glu 260 265 270Leu Glu Ser Leu Phe Lys Asn Lys Glu Leu His
Pro Met Arg Leu Lys 275 280 285Asn Ala Val Ala Glu Glu Leu Ile Lys
Ile Leu Glu Pro Ile Arg Lys 290 295 300Arg Leu 305
24 305 PRT Artificial Sequence Synthetic Polypeptide
24 Met Asp Glu Phe Glu Met Ile Lys Arg
Asn Thr Ser Glu Ile Ile Ser1 5 10 15Glu Glu Glu Leu Arg Glu Val Leu
Lys Lys Asp Glu Lys Ser Ala Tyr 20 25 30Ile Gly Phe Glu Pro Ser Gly
Lys Ile His Leu Gly His Tyr Leu Gln 35 40 45Ile Lys Lys Met Ile Asp
Leu Gln Asn Ala Gly Phe Asp Ile Ile Ile 50 55 60Leu Leu Ala Asp Leu
His Ala Tyr Leu Asn Gln Lys Gly Glu Leu Asp65 70 75 80Glu Ile Arg
Lys Ile Gly Asp Tyr Asn Lys Lys Val Phe Glu Ala Met 85 90 95Gly Leu
Lys Ala Lys Tyr Val Tyr Gly Ser Glu Phe Gln Leu Asp Lys 100 105
110Asp Tyr Thr Leu Asn Val Tyr Arg Leu Ala Leu Lys Thr Thr Leu Lys
115 120 125Arg Ala Arg Arg Ser Met Glu Leu Ile Ala Arg Glu Asp Glu
Asn Pro 130 135 140Lys Val Ala Glu Val Ile Tyr Pro Ile Met Gln Val
Asn Asp Ile His145 150 155 160Tyr Leu Gly Val Asp Val Ala
Val Gly Gly Met Glu Gln Arg Lys Ile 165 170 175His Met Leu Ala Arg
Glu Leu Leu Pro Lys Lys Val Val Cys Ile His 180 185 190Asn Pro Val
Leu Thr Gly Leu Asp Gly Glu Gly Lys Met Ser Ser Ser 195 200 205Lys
Gly Asn Phe Ile Ala Val Asp Asp Ser Pro Glu Glu Ile Arg Ala 210 215
220Lys Ile Lys Lys Ala Tyr Cys Pro Ala Gly Val Val Glu Gly Asn
Pro225 230 235 240Ile Met Glu Ile Ala Lys Tyr Phe Leu Glu Tyr Pro
Leu Thr Ile Lys 245 250 255Arg Pro Glu Lys Phe Gly Gly Asp Leu Thr
Val Asn Ser Tyr Glu Glu 260 265 270Leu Glu Ser Leu Phe Lys Asn Lys
Glu Leu His Pro Met Asp Leu Lys 275 280 285Asn Ala Val Ala Glu Glu
Leu Ile Lys Ile Leu Glu Pro Ile Arg Lys 290 295
300Arg 305
25 1260 DNA Artificial Sequence Synthetic Polynucleotide
25 atggataaga agccgctgga tgttctgatc tctgcgaccg gtctgtggat gtcccgtacc
60ggcacgctgc acaagatcaa gcactatgag atttctcgtt ctaaaatcta catcgaaatg
120gcgtgtggtg accatctggt tgtgaacaac tctcgttctt gtcgtcccgc
acgtgcattc 180cgttatcata aataccgtaa aacctgcaaa cgttgtcgtg
tttctgacga agatatcaac 240aacttcctga cccgttctac cgaaggcaaa
acctctgtta aagttaaagt tgtttctgag 300ccgaaagtga aaaaagcgat
gccgaaatct gtttctcgtg cgccgaaacc gctggaaaat 360ccggtttctg
cgaaagcgtc taccgacacc tctcgttctg ttccgtctcc ggcgaaatct
420accccgaact ctccggttcc gacctctgca agtgcccccg cacttacgaa
gagccagact 480gacaggcttg aagtcctgtt aaacccaaaa gatgagattt
ccctgaattc cggcaagcct 540ttcagggagc ttgagtccga attgctctct
cgcagaaaaa aagacctgca gcagatctac 600gcggaagaaa gggagaatta
tctggggaaa ctcgagcgtg aaattaccag gttctttgtg 660gacaggggtt
ttctggaaat aaaatccccg atcctgatcc ctcttgagta tatcgaaagg
720atgggcattg ataatgatac cgaactttca aaacagatct tcagggttga
caagaacttc 780tgcctgagac ccatgcttgc tccaaacctt tacaactacc
tgcgcaagct tgacagggcc 840ctgcctgatc caataaaaat ttttgaaata
ggcccatgct acagaaaaga gtccgacggc 900aaagaacacc tcgaagagtt
taccatgctg aacttctgcc agatgggatc gggatgcaca 960cgggaaaatc
ttgaaagcat aattacggac ttcctgaacc acctgggaat tgatttcaag
1020atcgtaggcg attcctgcat ggtctatggg gatacccttg atgtaatgca
cggagacctg 1080gaactttcct ctgcagtagt cggacccata ccgcttgacc
gggaatgggg tattgataaa 1140ccctggatag gggcaggttt cggactcgaa
cgccttctaa aggttaaaca cgactttaaa 1200aatatcaaga gagctgcaag
gtccgagtct tactataacg ggatttctac caacctgtaa 1260
26 419 PRT Artificial Sequence Synthetic Polypeptide
26 Met Asp Lys Lys Pro Leu Asp Val Leu
Ile Ser Ala Thr Gly Leu Trp1 5 10 15Met Ser Arg Thr Gly Thr Leu His
Lys Ile Lys His Tyr Glu Ile Ser 20 25 30Arg Ser Lys Ile Tyr Ile Glu
Met Ala Cys Gly Asp His Leu Val Val 35 40 45Asn Asn Ser Arg Ser Cys
Arg Pro Ala Arg Ala Phe Arg Tyr His Lys 50 55 60Tyr Arg Lys Thr Cys
Lys Arg Cys Arg Val Ser Asp Glu Asp Ile Asn65 70 75 80Asn Phe Leu
Thr Arg Ser Thr Glu Gly Lys Thr Ser Val Lys Val Lys 85 90 95Val Val
Ser Glu Pro Lys Val Lys Lys Ala Met Pro Lys Ser Val Ser 100 105
110Arg Ala Pro Lys Pro Leu Glu Asn Pro Val Ser Ala Lys Ala Ser Thr
115 120 125Asp Thr Ser Arg Ser Val Pro Ser Pro Ala Lys Ser Thr Pro
Asn Ser 130 135 140Pro Val Pro Thr Ser Ala Ser Ala Pro Ala Leu Thr
Lys Ser Gln Thr145 150 155 160Asp Arg Leu Glu Val Leu Leu Asn Pro
Lys Asp Glu Ile Ser Leu Asn 165 170 175Ser Gly Lys Pro Phe Arg Glu
Leu Glu Ser Glu Leu Leu Ser Arg Arg 180 185 190Lys Lys Asp Leu Gln
Gln Ile Tyr Ala Glu Glu Arg Glu Asn Tyr Leu 195 200 205Gly Lys Leu
Glu Arg Glu Ile Thr Arg Phe Phe Val Asp Arg Gly Phe 210 215 220Leu
Glu Ile Lys Ser Pro Ile Leu Ile Pro Leu Glu Tyr Ile Glu Arg225 230
235 240Met Gly Ile Asp Asn Asp Thr Glu Leu Ser Lys Gln Ile Phe Arg
Val 245 250 255Asp Lys Asn Phe Cys Leu Arg Pro Met Leu Ala Pro Asn
Leu Tyr Asn 260 265 270Tyr Leu Arg Lys Leu Asp Arg Ala Leu Pro Asp
Pro Ile Lys Ile Phe 275 280 285Glu Ile Gly Pro Cys Tyr Arg Lys Glu
Ser Asp Gly Lys Glu His Leu 290 295 300Glu Glu Phe Thr Met Leu Asn
Phe Cys Gln Met Gly Ser Gly Cys Thr305 310 315 320Arg Glu Asn Leu
Glu Ser Ile Ile Thr Asp Phe Leu Asn His Leu Gly 325 330 335Ile Asp
Phe Lys Ile Val Gly Asp Ser Cys Met Val Tyr Gly Asp Thr 340 345
350Leu Asp Val Met His Gly Asp Leu Glu Leu Ser Ser Ala Val Val Gly
355 360 365Pro Ile Pro Leu Asp Arg Glu Trp Gly Ile Asp Lys Pro Trp
Ile Gly 370 375 380Ala Gly Phe Gly Leu Glu Arg Leu Leu Lys Val Lys
His Asp Phe Lys385 390 395 400Asn Ile Lys Arg Ala Ala Arg Ser Glu
Ser Tyr Tyr Asn Gly Ile Ser 405 410 415Thr Asn
Leu
27 1260 DNA Artificial Sequence Synthetic Polynucleotide
27 atggataaga agccgctgga tgttctgatc tctgcgaccg gtctgtggat gtcccgtacc
60ggcacgctgc acaagatcaa gcactatgag atttctcgtt ctaaaatcta catcgaaatg
120gcgtgtggtg accatctggt tgtgaacaac tctcgttctt gtcgtcccgc
acgtgcattc 180cgttatcata aataccgtaa aacctgcaaa cgttgtcgtg
tttctgacga agatatcaac 240aacttcctga cccgttctac cgaaggcaaa
acctctgtta aagttaaagt tgtttctgag 300ccgaaagtga aaaaagcgat
gccgaaatct gtttctcgtg cgccgaaacc gctggaaaat 360ccggtttctg
cgaaagcgtc taccgacacc tctcgttctg ttccgtctcc ggcgaaatct
420accccgaact ctccggttcc gacctctgcg ccggcgccgt ctctgacccg
ttctcagctg 480gatcgtgttg aagcgctgct gtctccggaa gataaaatct
ctctgaacat cgcgaaaccg 540ttccgtgaac tggaatctga actggttacc
cgtcgtaaaa acgatttcca gcgtctgtac 600accaacgatc gtgaagacta
cctgggtaaa ctggaacgtg acatcaccaa attcttcgtt 660gaccgtgatt
tcctggaaat caaatctccg atcctgatcc cggcggaata cgttgaacgt
720atgggtatca acaacgatac cgaactgtct aaacagatct tccgtgttga
taaaaacctg 780tgcctgcgtc cgatgctggc gccgaccctg tacaactatc
tgcgtaaact ggatcgtatc 840ctgccggacc cgatcaaaat cttcgaagtt
ggtccgtgct accgtaaaga atctgacggt 900aaagaacacc tggaagagtt
caccatggtg aacttctgcc agatgggttc tggttgcacc 960cgtgagaacc
tggaatctct gatcaaagaa tttctggact acctggaaat cgacttcgaa
1020atcgttggtg actcctgcat ggtgtacggt gataccctgg acatcatgca
cggtgacctg 1080gaactgtctt ctgcggttgt tggtccggtt ccgctggatc
gtgaatgggg tatcgacaaa 1140ccgtggatcg gtgcgggttt cggtctggaa
cgtctgctga aagttatgca cggtttcaaa 1200aacatcaaac gtgcgtctcg
ttctgaatct tactacaacg gtatctctac caacctgtaa 1260
28 419 PRT Artificial Sequence Synthetic Polypeptide
28 Met Asp Lys Lys Pro Leu Asp Val Leu
Ile Ser Ala Thr Gly Leu Trp1 5 10 15Met Ser Arg Thr Gly Thr Leu His
Lys Ile Lys His Tyr Glu Ile Ser 20 25 30Arg Ser Lys Ile Tyr Ile Glu
Met Ala Cys Gly Asp His Leu Val Val 35 40 45Asn Asn Ser Arg Ser Cys
Arg Pro Ala Arg Ala Phe Arg Tyr His Lys 50 55 60Tyr Arg Lys Thr Cys
Lys Arg Cys Arg Val Ser Asp Glu Asp Ile Asn65 70 75 80Asn Phe Leu
Thr Arg Ser Thr Glu Gly Lys Thr Ser Val Lys Val Lys 85 90 95Val Val
Ser Glu Pro Lys Val Lys Lys Ala Met Pro Lys Ser Val Ser 100 105
110Arg Ala Pro Lys Pro Leu Glu Asn Pro Val Ser Ala Lys Ala Ser Thr
115 120 125Asp Thr Ser Arg Ser Val Pro Ser Pro Ala Lys Ser Thr Pro
Asn Ser 130 135 140Pro Val Pro Thr Ser Ala Pro Ala Pro Ser Leu Thr
Arg Ser Gln Leu145 150 155 160Asp Arg Val Glu Ala Leu Leu Ser Pro
Glu Asp Lys Ile Ser Leu Asn 165 170 175Ile Ala Lys Pro Phe Arg Glu
Leu Glu Ser Glu Leu Val Thr Arg Arg 180 185 190Lys Asn Asp Phe Gln
Arg Leu Tyr Thr Asn Asp Arg Glu Asp Tyr Leu 195 200 205Gly Lys Leu
Glu Arg Asp Ile Thr Lys Phe Phe Val Asp Arg Asp Phe 210 215 220Leu
Glu Ile Lys Ser Pro Ile Leu Ile Pro Ala Glu Tyr Val Glu Arg225 230
235 240Met Gly Ile Asn Asn Asp Thr Glu Leu Ser Lys Gln Ile Phe Arg
Val 245 250 255Asp Lys Asn Leu Cys Leu Arg Pro Met Leu Ala Pro Thr
Leu Tyr Asn 260 265 270Tyr Leu Arg Lys Leu Asp Arg Ile Leu Pro Asp
Pro Ile Lys Ile Phe 275 280 285Glu Val Gly Pro Cys Tyr Arg Lys Glu
Ser Asp Gly Lys Glu His Leu 290 295 300Glu Glu Phe Thr Met Val Asn
Phe Cys Gln Met Gly Ser Gly Cys Thr305 310 315 320Arg Glu Asn Leu
Glu Ser Leu Ile Lys Glu Phe Leu Asp Tyr Leu Glu 325 330 335Ile Asp
Phe Glu Ile Val Gly Asp Ser Cys Met Val Tyr Gly Asp Thr 340 345
350Leu Asp Ile Met His Gly Asp Leu Glu Leu Ser Ser Ala Val Val Gly
355 360 365Pro Val Pro Leu Asp Arg Glu Trp Gly Ile Asp Lys Pro Trp
Ile Gly 370 375 380Ala Gly Phe Gly Leu Glu Arg Leu Leu Lys Val Met
His Gly Phe Lys385 390 395 400Asn Ile Lys Arg Ala Ser Arg Ser Glu
Ser Tyr Tyr Asn Gly Ile Ser 405 410 415Thr Asn
Leu
29 1365 DNA Artificial Sequence Synthetic Polynucleotide
29 atggataaaa aaccactaaa cactctgata tctgcaaccg ggctctggat gtccaggacc
60ggaacaattc ataaaataaa acaccacgaa atttctcgaa gcaaaatcta tattgaaatg
120gcatgcggag accaccttgt tgtaaacaac tccaggagca gcaggcccgc
aagagcgctc 180aggtatcaca aatacaggaa gacctgcaaa cgctgcaggg
tttcggatga ggatctcaat 240aagttcctca caaaggcaaa cgaagaccag
acaagcgtaa aagtcaaggt cgtttctgag 300cctaccagaa cgaaaaaggc
aatgccaaaa tccgttgcga gagccccgaa acctcttgag 360aatacagaag
cggcacaggc tcaaccttct ggatctaaat tttcacctgc gataccggtt
420tccacccaag agtcagtttc tgtcccggca tctgtttcaa catcaatatc
aagcatttct 480acaggagcaa ctgcatccgc actggtaaaa gggaatacga
atcccattac atccatgtct 540gcccctgttc aggcaagtgc ccccgcactt
acgaagagcc agactgacag gcttgaagtc 600ctgttaaacc caaaagatga
gatttccctg aattccggca agcctttcag ggagcttgag 660tccgaattgc
tctctcgcag aaaaaaagac ctgcagcaga tctacgcgga agaaagggag
720aattatctgg ggaaactcga gcgtgaaatt accaggttct ttgtggacag
gggttttctg 780gaaataaaat ccccgatcct gatccctctt gagtatatcg
aaaggatggg cattgataat 840gataccgaac tttcaaaaca gatcttcagg
gttgacaaga acttctgcct gagacccatg 900cttgctccaa acctttacaa
ctacctgcgc aagcttgaca gggccctgcc tgatccaata 960aaaatttttg
aaataggccc atgctacaga aaagagtccg acggcaaaga acacctcgaa
1020gagtttacca tgctgaactt ctgccagatg ggatcgggat gcacacggga
aaatcttgaa 1080agcataatta cggacttcct gaaccacctg ggaattgatt
tcaagatcgt aggcgattcc 1140tgcatggtct atggggatac ccttgatgta
atgcacggag acctggaact ttcctctgca 1200gtagtcggac ccataccgct
tgaccgggaa tggggtattg ataaaccctg gataggggca 1260ggtttcgggc
tcgaacgcct tctaaaggtt aaacacgact ttaaaaatat caagagagct
1320gcaaggtccg agtcttacta taacgggatt tctaccaacc tgtaa
1365
30 454 PRT Artificial Sequence Synthetic Polypeptide
30 Met Asp Lys
Lys Pro Leu Asn Thr Leu Ile Ser Ala Thr Gly Leu Trp1 5 10 15Met Ser
Arg Thr Gly Thr Ile His Lys Ile Lys His His Glu Ile Ser 20 25 30Arg
Ser Lys Ile Tyr Ile Glu Met Ala Cys Gly Asp His Leu Val Val 35 40
45Asn Asn Ser Arg Ser Ser Arg Pro Ala Arg Ala Leu Arg Tyr His Lys
50 55 60Tyr Arg Lys Thr Cys Lys Arg Cys Arg Val Ser Asp Glu Asp Leu
Asn65 70 75 80Lys Phe Leu Thr Lys Ala Asn Glu Asp Gln Thr Ser Val
Lys Val Lys 85 90 95Val Val Ser Glu Pro Thr Arg Thr Lys Lys Ala Met
Pro Lys Ser Val 100 105 110Ala Arg Ala Pro Lys Pro Leu Glu Asn Thr
Glu Ala Ala Gln Ala Gln 115 120 125Pro Ser Gly Ser Lys Phe Ser Pro
Ala Ile Pro Val Ser Thr Gln Glu 130 135 140Ser Val Ser Val Pro Ala
Ser Val Ser Thr Ser Ile Ser Ser Ile Ser145 150 155 160Thr Gly Ala
Thr Ala Ser Ala Leu Val Lys Gly Asn Thr Asn Pro Ile 165 170 175Thr
Ser Met Ser Ala Pro Val Gln Ala Ser Ala Pro Ala Leu Thr Lys 180 185
190Ser Gln Thr Asp Arg Leu Glu Val Leu Leu Asn Pro Lys Asp Glu Ile
195 200 205Ser Leu Asn Ser Gly Lys Pro Phe Arg Glu Leu Glu Ser Glu
Leu Leu 210 215 220Ser Arg Arg Lys Lys Asp Leu Gln Gln Ile Tyr Ala
Glu Glu Arg Glu225 230 235 240Asn Tyr Leu Gly Lys Leu Glu Arg Glu
Ile Thr Arg Phe Phe Val Asp 245 250 255Arg Gly Phe Leu Glu Ile Lys
Ser Pro Ile Leu Ile Pro Leu Glu Tyr 260 265 270Ile Glu Arg Met Gly
Ile Asp Asn Asp Thr Glu Leu Ser Lys Gln Ile 275 280 285Phe Arg Val
Asp Lys Asn Phe Cys Leu Arg Pro Met Leu Ala Pro Asn 290 295 300Leu
Tyr Asn Tyr Leu Arg Lys Leu Asp Arg Ala Leu Pro Asp Pro Ile305 310
315 320Lys Ile Phe Glu Ile Gly Pro Cys Tyr Arg Lys Glu Ser Asp Gly
Lys 325 330 335Glu His Leu Glu Glu Phe Thr Met Leu Asn Phe Cys Gln
Met Gly Ser 340 345 350Gly Cys Thr Arg Glu Asn Leu Glu Ser Ile Ile
Thr Asp Phe Leu Asn 355 360 365His Leu Gly Ile Asp Phe Lys Ile Val
Gly Asp Ser Cys Met Val Tyr 370 375 380Gly Asp Thr Leu Asp Val Met
His Gly Asp Leu Glu Leu Ser Ser Ala385 390 395 400Val Val Gly Pro
Ile Pro Leu Asp Arg Glu Trp Gly Ile Asp Lys Pro 405 410 415Trp Ile
Gly Ala Gly Phe Gly Leu Glu Arg Leu Leu Lys Val Lys His 420 425
430Asp Phe Lys Asn Ile Lys Arg Ala Ala Arg Ser Glu Ser Tyr Tyr Asn
435 440 445Gly Ile Ser Thr Asn Leu 450
31 1260 DNA Artificial Sequence Synthetic Polynucleotide
31 atggataaga agccgctgga tgttctgatc
tctgcgaccg gtctgtggat gtcccgtacc 60ggcacgctgc acaagatcaa gcactatgag
atttctcgtt ctaaaatcta catcgaaatg 120gcgtgtggtg accatctggt
tgtgaacaac tctcgttctt gtcgtcccgc acgtgcattc 180cgttatcata
aataccgtaa aacctgcaaa cgttgtcgtg tttctggtga agatatcaac
240aacttcctga cccgttctac cgaaggcaaa acctctgtta aagttaaagt
tgtttctgag 300ccgaaagtga aaaaagcgat gccgaaatct gtttctcgtg
cgccgaaacc gctggaaaat 360ccggtttctg cgaaagcgtc taccgacacc
tctcgttctg ttccgtctcc ggcgaaatct 420accccgaact ctccggttcc
gacctctgca agtgcccccg cacttacgaa gagccagact 480gacaggcttg
aagtcctgtt aaacccaaaa gatgagattt ccctgaattc cggcaagcct
540ttcagggagc ttgagtccga attgctctct cgcagaaaaa aagacctgca
gcagatctac 600gcggaagaaa gggagaatta tctggggaaa ctcgagcgtg
aaattaccag gttctttgtg 660gacaggggtt ttctggaaat aaaatccccg
atcctgatcc ctcttgagta tatcgaaagg 720atgggcattg ataatgatac
cgaactttca aaacagatct tcagggttga caagaacttc 780tgcctgagac
ccatgatggc tccaaacatt tttaactacg ctcgcaagct tgacagggcc
840ctgcctgatc caataaaaat ttttgaaata ggcccatgct acagaaaaga
gtccgacggc 900aaagaacacc tcgaagagtt taccatgctg aacttctttc
agatgggatc gggatgcaca 960cgggaaaatc ttgaaagcat aattacggac
ttcctgaacc acctgggaat tgatttcaag 1020atcgtaggcg attcctgcat
ggtctatggg gatacccttg atgtaatgca cggagacctg 1080gaactttcct
ctgcagtagt cggacccata ccgcttgacc gggaatgggg tattgataaa
1140ccctggatag gggcaggttt cggactcgaa cgccttctaa aggttaaaca
cgactttaaa 1200aatatcaaga gagctgcaag gtccgagtct tactataacg
ggatttctac caacctgtaa 1260
32 419 PRT Artificial Sequence Synthetic Polypeptide
32 Met Asp Lys Lys Pro Leu Asp Val Leu Ile Ser Ala Thr
Gly Leu Trp1 5 10 15Met Ser Arg Thr Gly Thr Leu His Lys Ile Lys His
Tyr Glu Ile Ser 20 25 30Arg Ser Lys Ile Tyr Ile Glu Met Ala Cys Gly
Asp His Leu Val Val 35 40 45Asn Asn Ser Arg Ser Cys Arg Pro Ala Arg
Ala Phe Arg Tyr His Lys 50 55 60Tyr Arg Lys Thr Cys Lys Arg Cys Arg
Val Ser Gly Glu Asp Ile Asn65 70 75 80Asn Phe Leu Thr Arg Ser Thr
Glu Gly Lys Thr Ser Val Lys Val Lys 85 90 95Val Val Ser Glu Pro Lys
Val Lys Lys Ala Met Pro Lys Ser Val Ser 100 105 110Arg Ala Pro Lys
Pro Leu Glu Asn Pro Val Ser Ala Lys Ala Ser Thr 115 120 125Asp Thr
Ser Arg Ser Val Pro Ser Pro Ala Lys Ser Thr Pro Asn Ser 130
135 140Pro Val Pro Thr Ser Ala Ser Ala Pro Ala Leu Thr Lys Ser Gln
Thr145 150 155 160Asp Arg Leu Glu Val Leu Leu Asn Pro Lys Asp Glu
Ile Ser Leu Asn 165 170 175Ser Gly Lys Pro Phe Arg Glu Leu Glu Ser
Glu Leu Leu Ser Arg Arg 180 185 190Lys Lys Asp Leu Gln Gln Ile Tyr
Ala Glu Glu Arg Glu Asn Tyr Leu 195 200 205Gly Lys Leu Glu Arg Glu
Ile Thr Arg Phe Phe Val Asp Arg Gly Phe 210 215 220Leu Glu Ile Lys
Ser Pro Ile Leu Ile Pro Leu Glu Tyr Ile Glu Arg225 230 235 240Met
Gly Ile Asp Asn Asp Thr Glu Leu Ser Lys Gln Ile Phe Arg Val 245 250
255Asp Lys Asn Phe Cys Leu Arg Pro Met Met Ala Pro Asn Ile Phe Asn
260 265 270Tyr Ala Arg Lys Leu Asp Arg Ala Leu Pro Asp Pro Ile Lys
Ile Phe 275 280 285Glu Ile Gly Pro Cys Tyr Arg Lys Glu Ser Asp Gly
Lys Glu His Leu 290 295 300Glu Glu Phe Thr Met Leu Asn Phe Phe Gln
Met Gly Ser Gly Cys Thr305 310 315 320Arg Glu Asn Leu Glu Ser Ile
Ile Thr Asp Phe Leu Asn His Leu Gly 325 330 335Ile Asp Phe Lys Ile
Val Gly Asp Ser Cys Met Val Tyr Gly Asp Thr 340 345 350Leu Asp Val
Met His Gly Asp Leu Glu Leu Ser Ser Ala Val Val Gly 355 360 365Pro
Ile Pro Leu Asp Arg Glu Trp Gly Ile Asp Lys Pro Trp Ile Gly 370 375
380Ala Gly Phe Gly Leu Glu Arg Leu Leu Lys Val Lys His Asp Phe
Lys385 390 395 400Asn Ile Lys Arg Ala Ala Arg Ser Glu Ser Tyr Tyr
Asn Gly Ile Ser 405 410 415Thr Asn Leu
33 1260 DNA Artificial Sequence Synthetic Polynucleotide
33 atggataaga agccgctgga tgttctgatc
tctgcgaccg gtctgtggat gtcccgtacc 60ggcacgctgc acaagatcaa gcactatgag
atttctcgtt ctaaaatcta catcgaaatg 120gcgtgtggtg accatctggt
tgtgaacaac tctcgttctt gtcgtcccgc acgtgcattc 180cgttatcata
aataccgtaa aacctgcaaa cgttgtcgtg tttctggtga agatatcaac
240aacttcctga cccgttctac cgaaggcaaa acctctgtta aagttaaagt
tgtttctgag 300ccgaaagtga aaaaagcgat gccgaaatct gtttctcgtg
cgccgaaacc gctggaaaat 360ccggtttctg cgaaagcgtc taccgacacc
tctcgttctg ttccgtctcc ggcgaaatct 420accccgaact ctccggttcc
gacctctgcg ccggcgccgt ctctgacccg ttctcagctg 480gatcgtgttg
aagcgctgct gtctccggaa gataaaatct ctctgaacat cgcgaaaccg
540ttccgtgaac tggaatctga actggttacc cgtcgtaaaa acgatttcca
gcgtctgtac 600accaacgatc gtgaagacta cctgggtaaa ctggaacgtg
acatcaccaa attcttcgtt 660gaccgtgatt tcctggaaat caaatctccg
atcctgatcc cggcggaata cgttgaacgt 720atgggtatca acaacgatac
cgaactgtct aaacagatct tccgtgttga taaaaacctg 780tgcctgcgtc
cgatgatggc gccgaccatt tttaactatg ctcgtaaact ggatcgtatc
840ctgccggacc cgatcaaaat cttcgaagtt ggtccgtgct accgtaaaga
atctgacggt 900aaagaacacc tggaagagtt caccatggtg aacttctttc
agatgggttc tggttgcacc 960cgtgagaacc tggaatctct gatcaaagaa
tttctggact acctggaaat cgacttcgaa 1020atcgttggtg actcctgcat
ggtgtacggt gataccctgg acatcatgca cggtgacctg 1080gaactgtctt
ctgcggttgt tggtccggtt ccgctggatc gtgaatgggg tatcgacaaa
1140ccgtggatcg gtgcgggttt cggtctggaa cgtctgctga aagttatgca
cggtttcaaa 1200aacatcaaac gtgcgtctcg ttctgaatct tactacaacg
gtatctctac caacctgtaa 126034419PRTArtificial SequenceSynthetic
Polypeptide 34Met Asp Lys Lys Pro Leu Asp Val Leu Ile Ser Ala Thr
Gly Leu Trp1 5 10 15Met Ser Arg Thr Gly Thr Leu His Lys Ile Lys His
Tyr Glu Ile Ser 20 25 30Arg Ser Lys Ile Tyr Ile Glu Met Ala Cys Gly
Asp His Leu Val Val 35 40 45Asn Asn Ser Arg Ser Cys Arg Pro Ala Arg
Ala Phe Arg Tyr His Lys 50 55 60Tyr Arg Lys Thr Cys Lys Arg Cys Arg
Val Ser Gly Glu Asp Ile Asn65 70 75 80Asn Phe Leu Thr Arg Ser Thr
Glu Gly Lys Thr Ser Val Lys Val Lys 85 90 95Val Val Ser Glu Pro Lys
Val Lys Lys Ala Met Pro Lys Ser Val Ser 100 105 110Arg Ala Pro Lys
Pro Leu Glu Asn Pro Val Ser Ala Lys Ala Ser Thr 115 120 125Asp Thr
Ser Arg Ser Val Pro Ser Pro Ala Lys Ser Thr Pro Asn Ser 130 135
140Pro Val Pro Thr Ser Ala Pro Ala Pro Ser Leu Thr Arg Ser Gln
Leu145 150 155 160Asp Arg Val Glu Ala Leu Leu Ser Pro Glu Asp Lys
Ile Ser Leu Asn 165 170 175Ile Ala Lys Pro Phe Arg Glu Leu Glu Ser
Glu Leu Val Thr Arg Arg 180 185 190Lys Asn Asp Phe Gln Arg Leu Tyr
Thr Asn Asp Arg Glu Asp Tyr Leu 195 200 205Gly Lys Leu Glu Arg Asp
Ile Thr Lys Phe Phe Val Asp Arg Asp Phe 210 215 220Leu Glu Ile Lys
Ser Pro Ile Leu Ile Pro Ala Glu Tyr Val Glu Arg225 230 235 240Met
Gly Ile Asn Asn Asp Thr Glu Leu Ser Lys Gln Ile Phe Arg Val 245 250
255Asp Lys Asn Leu Cys Leu Arg Pro Met Met Ala Pro Thr Ile Phe Asn
260 265 270Tyr Ala Arg Lys Leu Asp Arg Ile Leu Pro Asp Pro Ile Lys
Ile Phe 275 280 285Glu Val Gly Pro Cys Tyr Arg Lys Glu Ser Asp Gly
Lys Glu His Leu 290 295 300Glu Glu Phe Thr Met Val Asn Phe Phe Gln
Met Gly Ser Gly Cys Thr305 310 315 320Arg Glu Asn Leu Glu Ser Leu
Ile Lys Glu Phe Leu Asp Tyr Leu Glu 325 330 335Ile Asp Phe Glu Ile
Val Gly Asp Ser Cys Met Val Tyr Gly Asp Thr 340 345 350Leu Asp Ile
Met His Gly Asp Leu Glu Leu Ser Ser Ala Val Val Gly 355 360 365Pro
Val Pro Leu Asp Arg Glu Trp Gly Ile Asp Lys Pro Trp Ile Gly 370 375
380Ala Gly Phe Gly Leu Glu Arg Leu Leu Lys Val Met His Gly Phe
Lys385 390 395 400Asn Ile Lys Arg Ala Ser Arg Ser Glu Ser Tyr Tyr
Asn Gly Ile Ser 405 410 415Thr Asn Leu351365DNAArtificial
SequenceSynthetic Polynucleotide 35atggataaaa aaccactaaa cactctgata
tctgcaaccg ggctctggat gtccaggacc 60ggaacaattc ataaaataaa acaccacgaa
atttctcgaa gcaaaatcta tattgaaatg 120gcatgcggag accaccttgt
tgtaaacaac tccaggagca gcaggcccgc aagagcgctc 180aggtatcaca
aatacaggaa gacctgcaaa cgctgcaggg tttcgggtga ggatctcaat
240aagttcctca caaaggcaaa cgaagaccag acaagcgtaa aagtcaaggt
cgtttctgag 300cctaccagaa cgaaaaaggc aatgccaaaa tccgttgcga
gagccccgaa acctcttgag 360aatacagaag cggcacaggc tcaaccttct
ggatctaaat tttcacctgc gataccggtt 420tccacccaag agtcagtttc
tgtcccggca tctgtttcaa catcaatatc aagcatttct 480acaggagcaa
ctgcatccgc actggtaaaa gggaatacga atcccattac atccatgtct
540gcccctgttc aggcaagtgc ccccgcactt acgaagagcc agactgacag
gcttgaagtc 600ctgttaaacc caaaagatga gatttccctg aattccggca
agcctttcag ggagcttgag 660tccgaattgc tctctcgcag aaaaaaagac
ctgcagcaga tctacgcgga agaaagggag 720aattatctgg ggaaactcga
gcgtgaaatt accaggttct ttgtggacag gggttttctg 780gaaataaaat
ccccgatcct gatccctctt gagtatatcg aaaggatggg cattgataat
840gataccgaac tttcaaaaca gatcttcagg gttgacaaga acttctgcct
gagacccatg 900atggctccaa acatttttaa ctacgctcgc aagcttgaca
gggccctgcc tgatccaata 960aaaatttttg aaataggccc atgctacaga
aaagagtccg acggcaaaga acacctcgaa 1020gagtttacca tgctgaactt
ctttcagatg ggatcgggat gcacacggga aaatcttgaa 1080agcataatta
cggacttcct gaaccacctg ggaattgatt tcaagatcgt aggcgattcc
1140tgcatggtct atggggatac ccttgatgta atgcacggag acctggaact
ttcctctgca 1200gtagtcggac ccataccgct tgaccgggaa tggggtattg
ataaaccctg gataggggca 1260ggtttcgggc tcgaacgcct tctaaaggtt
aaacacgact ttaaaaatat caagagagct 1320gcaaggtccg agtcttacta
taacgggatt tctaccaacc tgtaa 136536454PRTArtificial
SequenceSynthetic Polypeptide 36Met Asp Lys Lys Pro Leu Asn Thr Leu
Ile Ser Ala Thr Gly Leu Trp1 5 10 15Met Ser Arg Thr Gly Thr Ile His
Lys Ile Lys His His Glu Ile Ser 20 25 30Arg Ser Lys Ile Tyr Ile Glu
Met Ala Cys Gly Asp His Leu Val Val 35 40 45Asn Asn Ser Arg Ser Ser
Arg Pro Ala Arg Ala Leu Arg Tyr His Lys 50 55 60Tyr Arg Lys Thr Cys
Lys Arg Cys Arg Val Ser Gly Glu Asp Leu Asn65 70 75 80Lys Phe Leu
Thr Lys Ala Asn Glu Asp Gln Thr Ser Val Lys Val Lys 85 90 95Val Val
Ser Glu Pro Thr Arg Thr Lys Lys Ala Met Pro Lys Ser Val 100 105
110Ala Arg Ala Pro Lys Pro Leu Glu Asn Thr Glu Ala Ala Gln Ala Gln
115 120 125Pro Ser Gly Ser Lys Phe Ser Pro Ala Ile Pro Val Ser Thr
Gln Glu 130 135 140Ser Val Ser Val Pro Ala Ser Val Ser Thr Ser Ile
Ser Ser Ile Ser145 150 155 160Thr Gly Ala Thr Ala Ser Ala Leu Val
Lys Gly Asn Thr Asn Pro Ile 165 170 175Thr Ser Met Ser Ala Pro Val
Gln Ala Ser Ala Pro Ala Leu Thr Lys 180 185 190Ser Gln Thr Asp Arg
Leu Glu Val Leu Leu Asn Pro Lys Asp Glu Ile 195 200 205Ser Leu Asn
Ser Gly Lys Pro Phe Arg Glu Leu Glu Ser Glu Leu Leu 210 215 220Ser
Arg Arg Lys Lys Asp Leu Gln Gln Ile Tyr Ala Glu Glu Arg Glu225 230
235 240Asn Tyr Leu Gly Lys Leu Glu Arg Glu Ile Thr Arg Phe Phe Val
Asp 245 250 255Arg Gly Phe Leu Glu Ile Lys Ser Pro Ile Leu Ile Pro
Leu Glu Tyr 260 265 270Ile Glu Arg Met Gly Ile Asp Asn Asp Thr Glu
Leu Ser Lys Gln Ile 275 280 285Phe Arg Val Asp Lys Asn Phe Cys Leu
Arg Pro Met Met Ala Pro Asn 290 295 300Ile Phe Asn Tyr Ala Arg Lys
Leu Asp Arg Ala Leu Pro Asp Pro Ile305 310 315 320Lys Ile Phe Glu
Ile Gly Pro Cys Tyr Arg Lys Glu Ser Asp Gly Lys 325 330 335Glu His
Leu Glu Glu Phe Thr Met Leu Asn Phe Phe Gln Met Gly Ser 340 345
350Gly Cys Thr Arg Glu Asn Leu Glu Ser Ile Ile Thr Asp Phe Leu Asn
355 360 365His Leu Gly Ile Asp Phe Lys Ile Val Gly Asp Ser Cys Met
Val Tyr 370 375 380Gly Asp Thr Leu Asp Val Met His Gly Asp Leu Glu
Leu Ser Ser Ala385 390 395 400Val Val Gly Pro Ile Pro Leu Asp Arg
Glu Trp Gly Ile Asp Lys Pro 405 410 415Trp Ile Gly Ala Gly Phe Gly
Leu Glu Arg Leu Leu Lys Val Lys His 420 425 430Asp Phe Lys Asn Ile
Lys Arg Ala Ala Arg Ser Glu Ser Tyr Tyr Asn 435 440 445Gly Ile Ser
Thr Asn Leu 450371260DNAArtificial SequenceSynthetic Polynucleotide
37atggataaga agccgctgga tgttctgatc tctgcgaccg gtctgtggat gtcccgtacc
60ggcacgctgc acaagatcaa gcactatgag atttctcgtt ctaaaatcta catcgaaatg
120gcgtgtggtg accatctggt tgtgaacaac tctcgttctt gtcgtcccgc
acgtgcattc 180cgttatcata aataccgtaa aacctgcaaa cgttgtcgtg
tttctgacga agatatcaac 240aacttcctga cccgttctac cgaaggcaaa
acctctgtta aagttaaagt tgtttctgag 300ccgaaagtga aaaaagcgat
gccgaaatct gtttctcgtg cgccgaaacc gctggaaaat 360ccggtttctg
cgaaagcgtc taccgacacc tctcgttctg ttccgtctcc ggcgaaatct
420accccgaact ctccggttcc gacctctgca agtgcccccg cacttacgaa
gagccagact 480gacaggcttg aagtcctgtt aaacccaaaa gatgagattt
ccctgaattc cggcaagcct 540ttcagggagc ttgagtccga attgctctct
cgcagaaaaa aagacctgca gcagatctac 600gcggaagaaa gggagaatta
tctggggaaa ctcgagcgtg aaattaccag gttctttgtg 660gacaggggtt
ttctggaaat aaaatccccg atcctgatcc ctcttgagta tatcgaaagg
720atgggcattg ataatgatac cgaactttca aaacagatct tcagggttga
caagaacttc 780tgcctgagac ccatgcttgc tccaaacctt tacaactacc
tgcgcaagct tgacagggcc 840ctgcctgatc caataaaaat ttttgaaata
ggcccatgct acagaaaaga gtccgacggc 900aaagaacacc tcgaagagtt
taccatgctg tcgttcattc agatgggatc gggatgtaca 960cgggaaaatc
ttgaaagcat aattacggac ttcctgaacc acctgggaat tgatttcaag
1020atcgtaggcg attcctgcat ggtctatggg gatacccttg atgtaatgca
cggagacctg 1080gaactttcct ctgcagtagt cggacccata ccgcttgacc
gggaatgggg tattgataaa 1140ccctggatag gggcaggttt cgggctcgaa
cgccttctaa aggttaaaca cgactttaaa 1200aatatcaaga gagctgcaag
gtccgagtct tactataacg ggatttctac caacctgtaa 126038419PRTArtificial
SequenceSynthetic Polypeptide 38Met Asp Lys Lys Pro Leu Asp Val Leu
Ile Ser Ala Thr Gly Leu Trp1 5 10 15Met Ser Arg Thr Gly Thr Leu His
Lys Ile Lys His Tyr Glu Ile Ser 20 25 30Arg Ser Lys Ile Tyr Ile Glu
Met Ala Cys Gly Asp His Leu Val Val 35 40 45Asn Asn Ser Arg Ser Cys
Arg Pro Ala Arg Ala Phe Arg Tyr His Lys 50 55 60Tyr Arg Lys Thr Cys
Lys Arg Cys Arg Val Ser Asp Glu Asp Ile Asn65 70 75 80Asn Phe Leu
Thr Arg Ser Thr Glu Gly Lys Thr Ser Val Lys Val Lys 85 90 95Val Val
Ser Glu Pro Lys Val Lys Lys Ala Met Pro Lys Ser Val Ser 100 105
110Arg Ala Pro Lys Pro Leu Glu Asn Pro Val Ser Ala Lys Ala Ser Thr
115 120 125Asp Thr Ser Arg Ser Val Pro Ser Pro Ala Lys Ser Thr Pro
Asn Ser 130 135 140Pro Val Pro Thr Ser Ala Ser Ala Pro Ala Leu Thr
Lys Ser Gln Thr145 150 155 160Asp Arg Leu Glu Val Leu Leu Asn Pro
Lys Asp Glu Ile Ser Leu Asn 165 170 175Ser Gly Lys Pro Phe Arg Glu
Leu Glu Ser Glu Leu Leu Ser Arg Arg 180 185 190Lys Lys Asp Leu Gln
Gln Ile Tyr Ala Glu Glu Arg Glu Asn Tyr Leu 195 200 205Gly Lys Leu
Glu Arg Glu Ile Thr Arg Phe Phe Val Asp Arg Gly Phe 210 215 220Leu
Glu Ile Lys Ser Pro Ile Leu Ile Pro Leu Glu Tyr Ile Glu Arg225 230
235 240Met Gly Ile Asp Asn Asp Thr Glu Leu Ser Lys Gln Ile Phe Arg
Val 245 250 255Asp Lys Asn Phe Cys Leu Arg Pro Met Leu Ala Pro Asn
Leu Tyr Asn 260 265 270Tyr Leu Arg Lys Leu Asp Arg Ala Leu Pro Asp
Pro Ile Lys Ile Phe 275 280 285Glu Ile Gly Pro Cys Tyr Arg Lys Glu
Ser Asp Gly Lys Glu His Leu 290 295 300Glu Glu Phe Thr Met Leu Ser
Phe Ile Gln Met Gly Ser Gly Cys Thr305 310 315 320Arg Glu Asn Leu
Glu Ser Ile Ile Thr Asp Phe Leu Asn His Leu Gly 325 330 335Ile Asp
Phe Lys Ile Val Gly Asp Ser Cys Met Val Tyr Gly Asp Thr 340 345
350Leu Asp Val Met His Gly Asp Leu Glu Leu Ser Ser Ala Val Val Gly
355 360 365Pro Ile Pro Leu Asp Arg Glu Trp Gly Ile Asp Lys Pro Trp
Ile Gly 370 375 380Ala Gly Phe Gly Leu Glu Arg Leu Leu Lys Val Lys
His Asp Phe Lys385 390 395 400Asn Ile Lys Arg Ala Ala Arg Ser Glu
Ser Tyr Tyr Asn Gly Ile Ser 405 410 415Thr Asn
Leu39306PRTArtificial SequenceSynthetic Polypeptide 39Met Asp Glu
Phe Glu Met Ile Lys Arg Asn Thr Ser Glu Ile Ile Ser1 5 10 15Glu Glu
Glu Leu Arg Glu Val Leu Lys Lys Asp Glu Lys Ser Ala Leu 20 25 30Ile
Gly Phe Glu Pro Ser Gly Lys Ile His Leu Gly His Tyr Leu Gln 35 40
45Ile Lys Lys Met Ile Asp Leu Gln Asn Ala Gly Phe Asp Ile Ile Ile
50 55 60Leu Leu Ala Asp Leu His Ala Tyr Leu Asn Gln Lys Gly Glu Leu
Asp65 70 75 80Glu Ile Arg Lys Ile Gly Asp Tyr Asn Lys Lys Val Phe
Glu Ala Met 85 90 95Gly Leu Lys Ala Lys Tyr Val Tyr Gly Ser Ser Phe
Gln Leu Asp Lys 100 105 110Asp Tyr Thr Leu Asn Val Tyr Arg Leu Ala
Leu Lys Thr Thr Leu Lys 115 120 125Arg Ala Arg Arg Ser Met Glu Leu
Ile Ala Arg Glu Asp Glu Asn Pro 130 135 140Lys Val Ala Glu Val Ile
Tyr Pro Ile Met Gln Val Asn Pro Leu Asn145 150 155 160Tyr Glu Gly
Val Asp Val Ala Val Gly Gly Met Glu Gln Arg Lys Ile 165 170 175His
Met Leu Ala Arg Glu Leu Leu Pro Lys Lys Val Val Cys Ile His 180 185
190Asn Pro Val Leu Thr Gly Leu Asp Gly Glu Gly Lys Met Ser Ser Ser
195 200 205Lys Gly Asn Phe Ile Ala Val Asp Asp Ser Pro Glu Glu Ile
Arg Ala 210 215 220Lys Ile Lys Lys Ala Tyr Cys Pro Ala Gly Val Val
Glu Gly Asn
Pro225 230 235 240Ile Met Glu Ile Ala Lys Tyr Phe Leu Glu Tyr Pro
Leu Thr Ile Lys 245 250 255Arg Pro Glu Lys Phe Gly Gly Asp Leu Thr
Val Asn Ser Tyr Glu Glu 260 265 270Leu Glu Ser Leu Phe Lys Asn Lys
Glu Leu His Pro Met Arg Leu Lys 275 280 285Asn Ala Val Ala Glu Glu
Leu Ile Lys Ile Leu Glu Pro Ile Arg Lys 290 295 300Arg
Leu30540306PRTArtificial SequenceSynthetic Polypeptide 40Met Asp
Glu Phe Glu Met Ile Lys Arg Asn Thr Ser Glu Ile Ile Ser1 5 10 15Glu
Glu Glu Leu Arg Glu Val Leu Lys Lys Asp Glu Lys Ser Ala Leu 20 25
30Ile Gly Phe Glu Pro Ser Gly Lys Ile His Leu Gly His Tyr Leu Gln
35 40 45Ile Lys Lys Met Ile Asp Leu Gln Asn Ala Gly Phe Asp Ile Ile
Ile 50 55 60Leu Leu Ala Asp Leu His Ala Tyr Leu Asn Gln Lys Gly Glu
Leu Asp65 70 75 80Glu Ile Arg Lys Ile Gly Asp Tyr Asn Lys Lys Val
Phe Glu Ala Met 85 90 95Gly Leu Lys Ala Lys Tyr Val Tyr Gly Ser Ser
Phe Gln Leu Asp Lys 100 105 110Asp Tyr Thr Leu Asn Val Tyr Arg Leu
Ala Leu Lys Thr Thr Leu Lys 115 120 125Arg Ala Arg Arg Ser Met Glu
Leu Ile Ala Arg Glu Asp Glu Asn Pro 130 135 140Lys Val Ala Glu Val
Ile Tyr Pro Ile Met Gln Val Asn Pro Leu His145 150 155 160Tyr Glu
Gly Val Asp Val Ala Val Gly Gly Met Glu Gln Arg Lys Ile 165 170
175His Met Leu Ala Arg Glu Leu Leu Pro Lys Lys Val Val Cys Ile His
180 185 190Asn Pro Val Leu Thr Gly Leu Asp Gly Glu Gly Lys Met Ser
Ser Ser 195 200 205Lys Gly Asn Phe Ile Ala Val Asp Asp Ser Pro Glu
Glu Ile Arg Ala 210 215 220Lys Ile Lys Lys Ala Tyr Cys Pro Ala Gly
Val Val Glu Gly Asn Pro225 230 235 240Ile Met Glu Ile Ala Lys Tyr
Phe Leu Glu Tyr Pro Leu Thr Ile Lys 245 250 255Arg Pro Glu Lys Phe
Gly Gly Asp Leu Thr Val Asn Ser Tyr Glu Glu 260 265 270Leu Glu Ser
Leu Phe Lys Asn Lys Glu Leu His Pro Met Arg Leu Lys 275 280 285Asn
Ala Val Ala Glu Glu Leu Ile Lys Ile Leu Glu Pro Ile Arg Lys 290 295
300Arg Leu305
* * * * *